Led by Dr Richard Booth
FARGO is a GPU-enabled hydrodynamics code designed specifically for studying accretion discs. In 2016 the team extended FARGO to follow the dynamics of gas coupled to multiple dust species, treated as pressureless fluids, in order to model the effects of planets on the dust surface density in discs. This has been enormously successful for interpreting the structures seen in numerous images of protoplanetary discs from the Atacama Large Millimeter/submillimeter Array (ALMA). The aim of this project was to extend the code to also treat the evolution of dust due to coagulation and fragmentation, by porting an existing dust evolution code to run on GPUs.
Modeling dust evolution requires solving a coupled set of integro-differential equations that govern the growth and fragmentation of dust grains at each location in space. These equations are solved by discretizing the dust distribution into 100-200 species at each location in the disc. Implicit time integration is used, which requires solving a small linear system (one unknown per species) for each grid cell in the simulation. No single solve is expensive; the large computational cost arises because a solve is needed in every cell (of order 10^6 for current applications) at every time-step (~10^4 steps), i.e. of order 10^10 linear solves per simulation.
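To make the per-cell work concrete, the following is a minimal sketch of one linearised backward-Euler step for a single cell. It is not the project's code: the names `solve_dense` and `implicit_step` are hypothetical, the rate vector `f` and Jacobian `J` are assumed to be supplied by the caller, and a hand-written Gaussian elimination stands in for the linear algebra library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Vector = std::vector<double>;
using Matrix = std::vector<Vector>;  // dense, row-major, m x m

// Solve A x = b by Gaussian elimination with partial pivoting.
// A and b are taken by value because elimination overwrites them.
Vector solve_dense(Matrix A, Vector b) {
    const std::size_t m = b.size();
    assert(A.size() == m);
    for (std::size_t k = 0; k < m; ++k) {
        // Choose the row with the largest pivot in column k.
        std::size_t p = k;
        for (std::size_t i = k + 1; i < m; ++i)
            if (std::fabs(A[i][k]) > std::fabs(A[p][k])) p = i;
        std::swap(A[k], A[p]);
        std::swap(b[k], b[p]);
        assert(A[k][k] != 0.0);
        // Eliminate column k from the rows below.
        for (std::size_t i = k + 1; i < m; ++i) {
            const double factor = A[i][k] / A[k][k];
            for (std::size_t j = k; j < m; ++j) A[i][j] -= factor * A[k][j];
            b[i] -= factor * b[k];
        }
    }
    // Back substitution on the upper-triangular system.
    Vector x(m);
    for (std::size_t i = m; i-- > 0;) {
        double s = b[i];
        for (std::size_t j = i + 1; j < m; ++j) s -= A[i][j] * x[j];
        x[i] = s / A[i][i];
    }
    return x;
}

// One linearised backward-Euler step for dn/dt = f(n) in a single cell:
// solve (I - dt*J) delta = dt*f, then return n + delta.
// f is the rate evaluated at n and J is its Jacobian (both assumed inputs).
Vector implicit_step(const Vector& n, const Vector& f, const Matrix& J, double dt) {
    const std::size_t m = n.size();
    Matrix A(m, Vector(m, 0.0));
    Vector rhs(m);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < m; ++j)
            A[i][j] = (i == j ? 1.0 : 0.0) - dt * J[i][j];
        rhs[i] = dt * f[i];
    }
    const Vector delta = solve_dense(A, rhs);
    Vector n_new(m);
    for (std::size_t i = 0; i < m; ++i) n_new[i] = n[i] + delta[i];
    return n_new;
}
```

For a two-species toy problem with `J = {{-1, 1}, {1, -1}}`, `n = {1, 0}`, `f = {-1, 1}` and `dt = 1`, the step returns `{2/3, 1/3}`, which conserves the total. In a full simulation a step like this would be repeated for every cell at every time-step, with 100-200 unknowns rather than two.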
The dust evolution code is written in C++ and was developed primarily by me, with help from Giovanni Rosotti. The user base is expected to be small initially, consisting of a few post-docs and students who were originally based in Cambridge and are currently working with the dust-enabled FARGO code. I traveled to the hackathon on my own with the aim of porting the core of the grain-growth code to run on GPUs, then profiling the GPU implementation and beginning to optimize it.
The hackathon began with a discussion with the experts of the different options for porting the code. We settled on using OpenACC to accelerate the core of the code, which computes the coagulation and fragmentation terms in the dust evolution equations. I undertook this task while Jarno Rantaharju looked into the second major part of the code that required porting to the GPU: the time integration. Since the time integration relied on the Eigen linear algebra package to solve the linear systems associated with the implicit time-step, we needed to find a replacement designed to run on the GPU. We settled on the cuSOLVER package, part of the NVIDIA CUDA Toolkit.
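As an illustration of what directive-based porting of the source terms can look like, here is a minimal sketch and not the project's kernel. It assumes a toy model: a linear mass grid in which bin k holds grains of mass (k+1) units, a constant collision rate `K`, coagulation only (no fragmentation), and products heavier than the top bin simply leaving the grid. The function name `coagulation_rate` is hypothetical. Without an OpenACC compiler the pragma is ignored and the loop runs serially on the CPU.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Toy discrete coagulation rate for one cell (assumptions as in the text above):
//   dn_k/dt = 0.5 * K * sum_{i+j=k-1} n_i n_j  -  K * n_k * sum_j n_j
// The first term is the gain from pairs whose masses add up to that of bin k,
// the second is the loss of bin-k grains through collisions with any grain.
std::vector<double> coagulation_rate(const std::vector<double>& n, double K) {
    const int nbins = static_cast<int>(n.size());
    std::vector<double> dndt(n.size());
    const double* np = n.data();
    double* rp = dndt.data();

    // Each output bin is independent of the others, so the outer loop can be
    // offloaded as one parallel loop; the inner sums stay inside each iteration.
    #pragma acc parallel loop copyin(np[0:nbins]) copyout(rp[0:nbins])
    for (int k = 0; k < nbins; ++k) {
        double gain = 0.0;
        for (int i = 0; i < k; ++i) gain += np[i] * np[k - 1 - i];
        double total = 0.0;
        for (int j = 0; j < nbins; ++j) total += np[j];
        rp[k] = 0.5 * K * gain - K * np[k] * total;
    }
    return dndt;
}
```

With three bins, `n = {1, 0, 0}` and `K = 1`, the result is `{-1, 0.5, 0}`: two unit-mass grains are consumed for every double-mass grain produced, so mass is conserved. With only 100-200 bins the outer loop exposes at most a few hundred independent iterations.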
The port was completed successfully, with a working implementation of the grain growth module on the GPU. Tests confirmed that the GPU code gave correct results, and it achieved a speedup of a factor of ~3 over the CPU version. However, profiling showed that there was considerable room for improvement: each individual problem was too small to fill the GPU. We discussed options for improving this, aiming to exploit the extra parallelism available from solving the sub-problems for different cells at the same time, but ran out of time to complete the work during the hackathon.
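The batching idea can be sketched as follows. This is only an illustration of the approach discussed, not something that was implemented at the hackathon. It assumes a toy rate (constant collision rate `K`, coagulation only, bin k holding grains of mass (k+1) units), a cell-major flattened array, and a hypothetical function name `coagulation_rate_batched`. The point is that collapsing the cell and bin loops exposes ncells x nbins independent iterations to the GPU at once, instead of nbins per launch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// State layout (an assumption of this sketch): n[c * nbins + k] is bin k of cell c.
std::vector<double> coagulation_rate_batched(const std::vector<double>& n,
                                             int ncells, int nbins, double K) {
    assert(n.size() == static_cast<std::size_t>(ncells) * static_cast<std::size_t>(nbins));
    std::vector<double> dndt(n.size());
    const double* np = n.data();
    double* rp = dndt.data();
    const int total_size = ncells * nbins;

    // collapse(2) merges the cell and bin loops into a single iteration space,
    // so every (cell, bin) pair becomes an independent unit of parallel work.
    #pragma acc parallel loop collapse(2) copyin(np[0:total_size]) copyout(rp[0:total_size])
    for (int c = 0; c < ncells; ++c) {
        for (int k = 0; k < nbins; ++k) {
            const double* cell = np + c * nbins;  // this cell's bins
            double gain = 0.0;
            for (int i = 0; i < k; ++i) gain += cell[i] * cell[k - 1 - i];
            double total = 0.0;
            for (int j = 0; j < nbins; ++j) total += cell[j];
            rp[c * nbins + k] = 0.5 * K * gain - K * cell[k] * total;
        }
    }
    return dndt;
}
```

For two cells with three bins each, `n = {1, 0, 0, 2, 1, 0}` and `K = 1`, the result is `{-1, 0.5, 0, -6, -1, 2}`; each cell is treated independently of the other. The implicit linear solves would need an analogous batched treatment, which this sketch does not attempt.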