Eight multi-disciplinary teams tackle HPC and AI challenges 


It has been one year since the start of the pandemic, and as the world continues to adjust to the “new normal” of remote collaboration, Simon Fraser University (SFU) kicked off the first GPU hackathon of 2021. Eight teams from 13 institutions, spanning scientific domains from computational fluid dynamics, climate modeling, and physics to molecular genetics, hydrology, and chemistry, participated in the digital event.

Organized in partnership with Compute Canada, the hackathon was held remotely over four days from February to March utilizing SFU’s supercomputer, CEDAR. Mentors and teams met two weeks prior to the event to discuss challenges and goals and to complete preparatory work. The result: Each team made substantial progress in their projects during the hackathon despite different time zones, social distancing, and virtualized environments.

Online Format Continues to Deliver

Quickly adapting to the restrictions posed by COVID-19, GPU Hackathons became remote online events, with their own advantages and disadvantages. With more than 20 digital events now completed, the remote format has shown that most teams are fully at home with digital meeting spaces and tools. With guidance and planning, the event organizers had no difficulty handling multiple time zones, distributed teams, and a wide range of programming models, languages, and development tools.

“Although we weren’t able to enjoy the beautiful environment of SFU’s Burnaby Mountain campus, moving to the online format had the advantage of allowing us to work with more people and reach beyond SFU into the virtual world,” said Fred Popowich, scientific director of SFU’s Big Data Initiative. “Events like this hackathon really help us to move forward on our goal of making AI and machine learning more accessible.”

The digital format also enables more mentors to participate: with no travel eating into their time, they are more available to engage with their teams. Additionally, splitting the event over multiple weeks creates room for preparatory work before the main event and contributes to increased productivity once it begins.

Accelerating Hydrologic Routing with OpenACC

One of the hackathon teams tackled a problem rooted in the landscape of the Canadian prairies, which is dominated by numerous depressions that affect how water moves and is stored. The Prairie Region Inundation MApping (PRIMA) model is a hydrologic routing model that simulates the spatiotemporal surface water movement and storage dynamics over this landscape.

The Numerical Simlab team focused on the Water Redistribution and Routing (WRR) component of their code. In the serial code, the algorithm visits every pixel in the map and computes the average water elevation in each computing unit, which consists of a center cell and its eight neighboring cells. It then eliminates cells that exceed the average value, distributes the outflow to the remaining cells, and calculates the minimum water travel time. The team hoped to parallelize this algorithm on GPUs to accelerate the water flow simulation.

Figure 1: Water Redistribution and Routing (WRR) component flow
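In rough outline, the WRR step described above amounts to something like the following serial sketch (a loose paraphrase of that description for illustration, not the PRIMA source code):

/* Illustrative serial sketch of one WRR pass; a loose paraphrase of the
 * description above, not the PRIMA source code.
 * wse[i*nj + j] holds the water-surface elevation of cell (i, j); each interior
 * cell and its eight neighbors form one 3x3 "computing unit". */
static void wrr_pass_serial(int ni, int nj, double *wse)
{
    for (int i = 1; i < ni - 1; ++i) {
        for (int j = 1; j < nj - 1; ++j) {
            /* 1. Average water elevation over the 3x3 computing unit. */
            double avg = 0.0;
            for (int di = -1; di <= 1; ++di)
                for (int dj = -1; dj <= 1; ++dj)
                    avg += wse[(i + di) * nj + (j + dj)];
            avg /= 9.0;

            /* 2. Cells above the average are eliminated: their excess water
             *    becomes the outflow; count the lower cells that receive it. */
            double outflow = 0.0;
            int receivers = 0;
            for (int di = -1; di <= 1; ++di)
                for (int dj = -1; dj <= 1; ++dj) {
                    double w = wse[(i + di) * nj + (j + dj)];
                    if (w > avg)      outflow += w - avg;
                    else if (w < avg) ++receivers;
                }

            /* 3. Distribute the outflow to the remaining cells.  (The real
             *    model also calculates the minimum water travel time here.) */
            for (int di = -1; di <= 1; ++di)
                for (int dj = -1; dj <= 1; ++dj) {
                    double *w = &wse[(i + di) * nj + (j + dj)];
                    if (*w > avg)                       *w = avg;
                    else if (*w < avg && receivers > 0) *w += outflow / receivers;
                }
        }
    }
}

Because each unit writes into its neighbors' cells, updates from adjacent units overlap, which is exactly the conflict the team ran into when parallelizing, as described below.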

The team began by identifying and extracting the compute-intensive part of the code. They worked with mentors to port it to the GPU using OpenACC but encountered some challenges. First, the PRIMA code had a sorting stage ahead of the WRR component that had to be abandoned in order to parallelize the code. Next, they discovered that the computing units are interdependent: neighboring cells were being read and written simultaneously, producing incorrect output. Numerical Simlab changed the WRR algorithm to write data only to the center cell and used a sliding grid to divide the input matrix into slices, resolving the read/write conflicts.

Numerical Simlab ran tests on a small data set with different combinations of OpenACC clauses to gauge the achievable speedup. Using #pragma acc parallel, they achieved a 6X speedup over the run on a single CPU core, and further optimization with gang and worker clauses brought that to 41.7X. Testing on a larger system with the gang and worker clauses yielded a 171.3X speedup and showed good scaling potential. By the end of the hackathon, the team had gone from a serial code and no previous OpenACC experience to a parallelized code running on GPUs, along with a plan to continue tuning the WRR algorithm for multiple GPUs to increase the parallel scale.
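In code, the changes the team describes, writing only to the center cell of each unit and annotating the loops with gang and worker clauses, map onto something like the following sketch. The names, double buffering, and data clauses here are illustrative; the team's actual sliding-grid slicing scheme is more involved.

/* Illustrative OpenACC version: conflict-free because every iteration writes
 * only its own (center) cell into a separate output array.  Not the team's
 * code; boundary handling and the sliding-grid slicing are omitted. */
static void wrr_pass_acc(int ni, int nj, const double *wse, double *wse_out)
{
    #pragma acc parallel loop gang copyin(wse[0:ni*nj]) copy(wse_out[0:ni*nj])
    for (int i = 1; i < ni - 1; ++i) {
        #pragma acc loop worker vector
        for (int j = 1; j < nj - 1; ++j) {
            double avg = 0.0;
            for (int di = -1; di <= 1; ++di)
                for (int dj = -1; dj <= 1; ++dj)
                    avg += wse[(i + di) * nj + (j + dj)];
            avg /= 9.0;

            double outflow = 0.0;
            int receivers = 0;
            for (int di = -1; di <= 1; ++di)
                for (int dj = -1; dj <= 1; ++dj) {
                    double w = wse[(i + di) * nj + (j + dj)];
                    if (w > avg)      outflow += w - avg;
                    else if (w < avg) ++receivers;
                }

            /* Write only the center cell of this unit. */
            double w_c = wse[i * nj + j];
            if (w_c > avg)                       wse_out[i * nj + j] = avg;
            else if (w_c < avg && receivers > 0) wse_out[i * nj + j] = w_c + outflow / receivers;
            else                                 wse_out[i * nj + j] = w_c;
        }
    }
}

Swapping the input and output arrays between passes keeps one pass's writes from corrupting the reads of the next.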

Figure 2: Test results for big systems using gang and worker clauses.

Can Deep Learning Help Save Sea Birds?

Climate change and human activity have resulted in massive population declines of many bird species. Extensive monitoring must be done to understand which species are in distress and identify worrying population trends. Unfortunately, many types of birds—particularly seabirds—live in remote locations that are difficult and expensive to monitor.

Genome sequencing offers a more efficient way to monitor populations, since population dynamics leave recognizable signatures, a sort of fingerprint, in the genome; the difficulty is that many different types of events leave similar patterns. GeneTiX, a team of conservation geneticists from Queen’s University, wondered: Can deep neural networks trained on simulated genetic data identify the events responsible for generating contemporary genetic patterns? They came to the hackathon with the goal of optimizing the training of a convolutional neural network using simulated avian genomic data to estimate recent evolutionary history.

Although they had very little GPU experience, the team worked with mentors on NEvolve, a Python-based application using PyTorch, and focused their efforts on parallelizing the generation of simulated genetic data; converting this data into PyTorch tensors, a format that can easily be input into the neural network; and training and tuning the network.

By the end of the event, GeneTiX had a first working version of their simulation and training pipeline. They could generate a data set of approximately 100,000 simulations in 30 minutes and train the network in 11.5 minutes, whereas before the event the data conversion step would not even complete. The team also learned how to find bottlenecks in GPU pipelines using GPU and CPU profiling tools, and gained hands-on experience with a range of new tools, including cuDF, NVIDIA Nsight Systems, and Docker, among others.

Fun with Physics

Two teams came to the SFU hackathon with physics applications. The Swarm Canadian EFI Working Group from the University of Calgary worked on a collision-based extension of a general-purpose Monte Carlo N-Particle (MCNP) transport code they use for photoelectron transport in the Earth’s magnetic field. Because most general transport codes cannot trace particles in the presence of a realistic geomagnetic field, the team’s goal was to create an extension that could be easily embedded into any general transport code to compute, in parallel, realistic electron trajectories between collisions.

Team Swarm achieved an 8X speedup by implementing a sliding grid to speed up the geomagnetic field calculations and using CUDA to offload them to GPUs; the overall speedup for the application was 18X. “Our mentor’s time was very precious to us because he saved us months of work that would have occurred if we had tried and tested different approaches on our own,” said Alexei Kouznetsov, Swarm team member. “We are planning on using the work we did at the hackathon as a prototype for all future work on the project.”
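As a generic illustration of the offloading pattern (one GPU thread per particle evaluating the field at that particle's position), the sketch below uses a simple Earth-centered dipole approximation; it is not the team's realistic field model or their MCNP extension, and all names are illustrative.

#include <cuda_runtime.h>
#include <math.h>

/* Generic CUDA sketch: evaluate an Earth-centered dipole field at each particle
 * position, one thread per particle.  Illustrative only; the team's extension
 * uses a realistic geomagnetic field model. */
__global__ void dipole_field(const double3 *pos, double3 *B, int n)
{
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k >= n) return;

    const double k0 = 8.0e15;   /* roughly mu0*m/(4*pi) for Earth, in T*m^3 */
    double x = pos[k].x, y = pos[k].y, z = pos[k].z;
    double r = sqrt(x * x + y * y + z * z);
    double inv_r3 = 1.0 / (r * r * r);

    /* Dipole moment m_hat along -z; B = k0/r^3 * (3*(m_hat.r_hat)*r_hat - m_hat) */
    double mdotr = -z / r;
    B[k].x = k0 * inv_r3 * (3.0 * mdotr * x / r);
    B[k].y = k0 * inv_r3 * (3.0 * mdotr * y / r);
    B[k].z = k0 * inv_r3 * (3.0 * mdotr * z / r + 1.0);
}

/* Host-side launch: the field values feed the next collision-to-collision
 * trajectory step. */
void compute_fields(const double3 *d_pos, double3 *d_B, int n)
{
    int block = 256;
    dipole_field<<<(n + block - 1) / block, block>>>(d_pos, d_B, n);
    cudaDeviceSynchronize();
}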

The international team Larnd-Sim worked on the Liquid Argon Near Detector Simulator (Larnd-Sim), which simulates the electrical signals of charged particles inside a liquid argon Time Projection Chamber (TPC). They achieved a 10X speedup by using CuPy and optimizing their threading structure to better match the layout of their data in memory.

Teamwork Makes the Dream Work

Named for the Canadian Centre for Climate Modelling and Analysis, Team CCCma set out to use GPUs to parallelize the single most expensive portion of the atmospheric routines in the Canadian Atmospheric Model (CanAM5), part of the Canadian Earth System Model version 5 (CanESM5.0.3). The radiative transfer portion of CanAM5 accounts for about 20 percent of the overall computational time, so the team dedicated their efforts there. Using OpenACC, Team CCCma accelerated different subroutines of the radiative transfer module by between 2.5X and 12X.

Computational fluid mechanics team Mind Your Boltzmanners focused on Hipersol CFD, a lattice-Boltzmann solver for turbulent flows that features an adaptive mesh refinement (AMR) technique. The team wanted to extend the solver’s parallelism to GPUs and hybrid (MPI+GPU) platforms, and concentrated on porting and optimizing the streaming and collide kernels of the code. By the end of the event, they had ported the kernels to GPUs, optimized the code for better performance using OpenACC, and laid out a roadmap for future development.
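To give a flavor of the kind of kernel involved, here is a generic, uniform-grid D2Q9 streaming step written with OpenACC in "pull" form, so that each cell writes only its own distributions; it is not the Hipersol CFD code, which layers AMR on top of kernels like this.

/* Generic D2Q9 streaming step with OpenACC ("pull" form: each cell reads from
 * its upstream neighbors and writes only to itself, so there are no write
 * conflicts).  Illustrative only; not the Hipersol CFD kernels. */
static void stream(int nx, int ny, const double *f, double *f_new)
{
    /* D2Q9 lattice velocities */
    const int cx[9] = { 0, 1, 0, -1, 0, 1, -1, -1, 1 };
    const int cy[9] = { 0, 0, 1, 0, -1, 1, 1, -1, -1 };

    #pragma acc parallel loop collapse(2) copyin(f[0:nx*ny*9], cx, cy) copyout(f_new[0:nx*ny*9])
    for (int i = 0; i < nx; ++i) {
        for (int j = 0; j < ny; ++j) {
            for (int q = 0; q < 9; ++q) {
                int is = (i - cx[q] + nx) % nx;   /* periodic wrap, for brevity */
                int js = (j - cy[q] + ny) % ny;
                f_new[(i * ny + j) * 9 + q] = f[(is * ny + js) * 9 + q];
            }
        }
    }
}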

Team Computational Hydrosystems worked on the Mesh-free PARticle Simulator (MPARS), an open-source mesh-free solver for simulating fluid flows based on the Moving Particle Semi-implicit method. Originally written in C++ and partially accelerated with CUDA-C, the code still had room to optimize its GPU-accelerated parts further. Computational Hydrosystems applied several optimization techniques: reducing repetitive computations inside kernels; merging certain kernel functions into a single kernel to cut down on global memory accesses; implementing a reordering algorithm to avoid non-coalesced memory access in complex problems with unordered point data; and modifying a frequently used device function. Running scenarios with different combinations of these techniques, the team achieved speedups ranging from 4.4X to 20X.
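Kernel fusion, the second technique in that list, replaces a chain of kernels that pass intermediate results through global memory with a single kernel that keeps those values in registers. A minimal, generic illustration with hypothetical kernels, not the MPARS code:

/* Before: two kernels, with the intermediate array tmp making a round trip
 * through global memory between them. */
__global__ void scale(const float *in, float *tmp, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tmp[i] = a * in[i];
}

__global__ void add(const float *tmp, const float *b, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tmp[i] + b[i];
}

/* After: one fused kernel; the intermediate value never leaves a register,
 * removing a full write and read of tmp from global memory.
 * Launched as, e.g., scale_add<<<(n + 255) / 256, 256>>>(in, b, out, a, n). */
__global__ void scale_add(const float *in, const float *b, float *out,
                          float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a * in[i] + b[i];
}

The reordering step in the same list serves a complementary purpose: sorting the point data by a spatial index so that neighboring threads read neighboring addresses, which keeps memory accesses coalesced.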

Although the teams worked in different scientific disciplines and on different codes across multiple programming languages, one outcome was the same: they all learned a great deal, from identifying bottlenecks and trying new tools to GPU programming in general.

“I hope the participants enjoyed working with the SFU developers, exploring SFU’s supercomputer CEDAR and its infrastructure, and that they came away with new knowledge and connections—not just from the technology point of view but also from the people perspective,” added Popowich.

Izumi Barker

Izumi Barker is a program manager for GPU hackathons and bootcamps at NVIDIA and public relations director for OpenACC-Standard Organization, bringing more than twenty years of experience in communications, strategic marketing, and product management. Prior to her roles at NVIDIA and OpenACC Organization, Izumi held positions across multiple industries including University of Phoenix under Apollo Education Group, Cengage Learning, Bio-Rad Laboratories, Annual Reviews, Cystic Fibrosis Foundation, Ernst & Young, LLP, as well as several start-ups.