Hardware devices called graphics processing units (GPUs) were originally developed for a specific purpose: to render images, animations, and video on computer screens.

However, GPUs are also attractive for solving computationally intensive problems — such as those in science and engineering fields — because they can process large amounts of data at the same time, or in parallel. More recently, general-purpose GPU computing has emerged as a new platform for accelerating computations that would traditionally be handled by central processing units (CPUs).
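As a loose CPU-side illustration of that data-parallel pattern, the toy sketch below (hypothetical workload, not production GPU code) expresses the same arithmetic two ways: once per element in a serial loop, and once as a single operation over a whole array, which is the pattern a GPU executes across thousands of threads at once.

```python
import numpy as np

# Hypothetical workload: kinetic energy for 100,000 particles.
rng = np.random.default_rng(0)
mass = rng.uniform(0.1, 1.0, size=100_000)
velocity = rng.uniform(0.0, 10.0, size=100_000)

# Serial formulation: one element at a time.
def energy_serial(m, v):
    return [0.5 * mi * vi**2 for mi, vi in zip(m, v)]

# Data-parallel formulation: one operation applied across the whole
# array, the shape of computation a GPU spreads over its threads.
def energy_parallel(m, v):
    return 0.5 * m * v**2

serial = np.array(energy_serial(mass, velocity))
parallel = energy_parallel(mass, velocity)
assert np.allclose(serial, parallel)  # same physics, different execution
```

Both forms compute identical results; the parallel form simply exposes all the independent element-wise work at once so the hardware can schedule it concurrently.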

Currently, the U.S. Department of Energy (DOE) has several large-scale computing systems based on general-purpose GPUs (GPGPUs), including Summit at the Oak Ridge Leadership Computing Facility (OLCF), the upcoming Perlmutter at the National Energy Research Scientific Computing Center (NERSC), and Sierra at Lawrence Livermore National Laboratory. OLCF and NERSC are DOE Office of Science User Facilities at Oak Ridge and Lawrence Berkeley National Laboratories, respectively.

These supercomputers offer tremendous computing power for science and engineering applications, but software must be written to take full advantage of their GPGPU capabilities. Such programming requires a concerted effort among hardware developers, computer scientists, and domain scientists.


To facilitate such collaboration, the Computational Science Initiative (CSI) at DOE’s Brookhaven National Laboratory began hosting annual GPU hackathons three years ago. These CSI-hosted hackathons are part of the OLCF GPU hackathon series, which began in 2014. Various institutions across the United States and abroad host hackathons throughout the year.

“It is great to see the energy in the room,” said hackathon organizing committee member and CSI computational scientist Meifeng Lin. “Everyone is completely absorbed in their codes, and there is a lot of interaction between the teams. This year, it has been interesting to see the teams who brought applications that have not traditionally run on GPUs or high-performance computing platforms—for example, machine learning (ML) and high-energy physics (HEP). Encouraging communities who are not used to working with GPUs is one of the goals of the hackathon series.”

Exploring the implementation of model parallelism for their deep learning code, from left to right are graduate student Jaeyeong Yang of Seoul National University in Korea, CSI research associate Yihui (Ray) Ren, and CSI senior technology engineer Abid Malik. Splitting the computations of a model across multiple devices would make the training of large machine learning models more practical.
Source: Brookhaven National Laboratory

This year’s hackathon was held from September 23 through 27 in partnership with Oak Ridge and the University of Delaware.

Throughout the five-day coding workshop, GPU programming experts from Brookhaven, Lawrence Livermore, Oak Ridge, NVIDIA, Boston University, Columbia University, Stony Brook University (SBU), and the University of Tennessee, Knoxville, worked side by side with nine teams comprising users of large hybrid CPU-GPU systems.

The experts helped some teams with getting their scientific applications running on GPUs for the first time, and other teams with optimizing applications already running on GPUs. The teams’ applications spanned the fields of HEP (particle physics), astrophysics, chemistry, biology, ML, and geoscience.

Team’s Goal
In one case, Team FastCaloSim came to the hackathon with a code for simulating the ATLAS calorimeter. One of the largest particle physics experiments at CERN’s Large Hadron Collider (LHC) in Europe, ATLAS seeks to improve our understanding of the fundamental building blocks of matter and the forces that govern their interactions. The calorimeter is a detector that measures the energy of particles passing through it. More than 1,000 particles fly through the detector after each collision, and all of them must be accurately simulated in order to reconstruct the event. Traditionally, such calorimeter simulations have been run on CPUs, taking a significant fraction of the total simulation time.

“We are now reaching the capacity of available computational power,” said CERN physicist Tadej Novak, one of the developers of the standalone FastCaloSim code. “Especially with the future upgrade of the LHC to a higher luminosity, we need to investigate new, modern methods to reduce the calorimeter simulation time while still getting accurate energies. By decoupling this code from the other ATLAS codes, we could learn how to make it run on GPUs more efficiently. We started with single particles of well-defined energies, and obtained comparable physics results between the CPU and GPU versions. The code was about three times faster on GPUs at the end of the week. Now we’re trying to run more realistic physics events and speed things up further.”
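The validation strategy Novak describes (run the same physics on CPU and GPU, then compare outputs) can be sketched with a toy stand-in for a calorimeter simulation. Nothing below is the actual FastCaloSim code; it only shows the shape of such a port, where a per-particle loop becomes one scatter-add over all particles, the same pattern a CUDA atomic-add kernel applies per thread.

```python
import numpy as np

# Toy detector: each particle deposits its energy into the cell it hits.
rng = np.random.default_rng(1)
n_cells = 64
cell = rng.integers(0, n_cells, size=1_000)       # cell hit by each particle
energy = rng.exponential(5.0, size=1_000)         # deposited energy (GeV)

# "CPU-style" formulation: loop over particles one at a time.
def deposit_loop(cell, energy):
    totals = np.zeros(n_cells)
    for c, e in zip(cell, energy):
        totals[c] += e
    return totals

# "GPU-style" formulation: a single scatter-add over all particles.
def deposit_parallel(cell, energy):
    totals = np.zeros(n_cells)
    np.add.at(totals, cell, energy)               # unbuffered scatter-add
    return totals

# Validate the port the way the team did: compare the physics outputs.
assert np.allclose(deposit_loop(cell, energy), deposit_parallel(cell, energy))
```

Agreement between the two formulations is the check that matters before chasing speedups: a faster simulation that deposits energy in the wrong cells is worthless.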

Hot QCD was another particle physics team. Quantum chromodynamics, or QCD, is the theory of the force that holds together quarks and gluons, the elementary particles that make up protons, neutrons, and other particles.

“If you sit down at home, it would take months to achieve what we did in these five days with the help of our two mentors,” said Christian Schmidt, a senior researcher in the Theoretical High-Energy Physics Group at Bielefeld University in Germany.

Quarks and Gluons
Team Hot QCD’s code numerically simulates how quarks and gluons interact at very high temperatures and densities by discretizing the interactions on a lattice with space and time on the axes. A quark-gluon plasma (QGP) filled the early universe microseconds after the Big Bang, and today scientists recreate the QGP through collisions at the Relativistic Heavy Ion Collider (RHIC), a DOE Office of Science User Facility at Brookhaven, and the LHC. Team Hot QCD’s code calculates thermodynamic properties of the QGP called conserved charge fluctuations. Comparing these calculations with experimental data can be used to probe the phase diagram of QCD matter.

The hackathon provided an opportunity for the team to explore QUDA, a library of algorithms for solving the lattice QCD physics equations on GPUs, with QUDA developers as their mentors.

“We hadn’t implemented these algorithms before,” said Schmidt. “In some cases, we saw an improved performance in our code; we were able to obtain more statistics with the same computing time. One of the challenges with the lattice QCD calculations is that there is a lot of noise, and collecting a higher number of statistics means we can calculate fluctuations of conserved charges more precisely.”
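The statistics argument Schmidt makes can be illustrated with a toy model (not the Hot QCD code, and the Gaussian samples stand in for real lattice configurations): a conserved charge fluctuation is essentially the variance of a measured charge over sampled configurations, and the statistical error of that estimate shrinks roughly as one over the square root of the number of configurations.

```python
import numpy as np

# Toy illustration: fluctuation chi = <Q^2> - <Q>^2 over configurations.
rng = np.random.default_rng(2)

def fluctuation_error(n_configs, n_trials=2000):
    # Repeat the whole measurement n_trials times to estimate the
    # spread (statistical error) of the chi estimator itself.
    samples = rng.normal(0.0, 1.0, size=(n_trials, n_configs))
    chi = samples.var(axis=1)    # one chi estimate per trial
    return chi.std()             # spread of the estimates = stat. error

err_small = fluctuation_error(100)
err_large = fluctuation_error(400)   # 4x the statistics...
assert err_large < err_small         # ...roughly halves the error
```

Quadrupling the number of configurations cuts the error on chi roughly in half, which is why running more statistics in the same wall-clock time (as the QUDA-accelerated code allowed) directly sharpens the fluctuation measurement.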

Machine learning has emerged as a powerful tool for efficiently analyzing large data sets collected over time and space, such as for climate and neuroscience studies.

Machine Learning
Team Model Parallelism for Spatio-Temporal Learning (MP-STL) and NERSC Extreme Science Applications Program (NESAP) Extreme-Scale STL (LSTNet) came to the hackathon with spatio-temporal deep learning algorithms to port and optimize on GPUs. Deep learning, a type of ML that learns directly from raw data, is commonly used for classification tasks such as object detection.

“Our goal is to develop a model that can predict people’s fluid intelligence scores based on functional magnetic resonance imaging (fMRI) data obtained as tasks are completed by someone,” said Jaeyeong Yang, a graduate student in the Computational Clinical Science Laboratory at Seoul National University in Korea. “It is an open question in science whether this is possible or not. Currently, most image processing focuses on 2-D stationary images, such as photos of cats. The challenge in our case is that we have a large size of time-series 3-D fMRI images, more than 100 times larger than a regular photo.”

Yang and the other team members — who are focusing on whole genome sequence and brain segmentation data — will train scalable deep learning algorithms currently being developed by CSI computational scientist Shinjae Yoo. This genomics research leverages a GPU cluster at Brookhaven.

At the hackathon, Yoo and his collaborators from Brookhaven and NERSC explored using multiple GPUs at the same time to do ML-based multimodal analysis—particularly, combining different MRI brain imaging modalities with genetic data to study Alzheimer’s disease. They are using GPU nodes on Cori, an Intel Xeon Phi supercomputer at NERSC, for the multimodal brain imaging. The Alzheimer’s prediction code is the same as the one for fluid intelligence prediction but uses a different dataset and labels.
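The model parallelism the MP-STL team explored (splitting one model across devices, as in the photo caption above) can be sketched in a minimal form. All shapes and names below are hypothetical, and plain arrays stand in for separate GPUs: a layer too large for one device is split column-wise, each device computes its slice of the output, and the slices are concatenated.

```python
import numpy as np

# Hypothetical layer: input features too wide to train comfortably on
# one device (real fMRI volumes are far larger still).
rng = np.random.default_rng(3)
x = rng.normal(size=(8, 256))       # a batch of flattened input features
W = rng.normal(size=(256, 512))     # full layer weight matrix

# Single-device forward pass, for reference.
full = x @ W

# Model-parallel forward pass: each "device" holds half the columns of W.
W_dev0, W_dev1 = np.hsplit(W, 2)    # in practice, resident on separate GPUs
out0 = x @ W_dev0                   # computed on device 0
out1 = x @ W_dev1                   # computed on device 1
parallel = np.concatenate([out0, out1], axis=1)

assert np.allclose(full, parallel)  # identical output, half the weights per device
```

Each device stores and updates only its shard of the weights, which is what makes training models too large for any single GPU's memory practical; the cost is the communication needed to gather the output slices between layers.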

“This is a scalability challenge,” said Yoo. “By being able to analyze structural, diffusion, and functional MRI data in the context of genetic parameters, we expect better predictability of Alzheimer’s.”
