We are a team of graduate students from the University of Cambridge, working in the numerical simulations group at the Institute of Astronomy. When we learned that there would be a three-day GPU hackathon before DiRAC day, we were all excited to apply.
None of us had any experience of using GPUs, but we had all heard about the significant speed-ups they can provide. Alongside this, two of our members had been frustrated by the speed of some code they used to analyse their hydrodynamical simulations. With the dual prospects of learning about GPU programming and accelerating this analysis code, we decided to submit a team application to the hackathon.
Prior to the event we were given some interactive training material to start learning how to use GPUs. The hackathon was sponsored by Nvidia, who provided tutorials on CUDA and OpenACC as two possible ways to write GPU-aware code. CUDA is Nvidia’s API for interacting with a GPU; it is the more low-level approach of the two, in which you control memory allocation, data transfer and kernel definition explicitly. OpenACC is a programming standard, similar to OpenMP, which allows a more high-level approach to GPU programming. Using directives, you indicate to the compiler which regions you want to parallelise on the GPU, and then let the compiler optimise the low-level details.
We arrived in Swansea on Sunday ready to get to work. On the first day we had to decide on our acceleration strategy (CUDA, OpenACC, or both?) as well as figure out how best to modify the existing code. The code we were working on is a module of the simulation code AREPO that creates visualisations. AREPO’s main purpose is to run cosmological hydrodynamical simulations, for example modelling galaxy evolution. To visualise its outputs, AREPO can perform ray-tracing to create projected views of the simulation volume.
Our task was to transfer the simulation volume onto the GPU, so that the additive ray-tracing step could be performed in a massively parallel fashion. We chose the OpenACC approach, and so we spent the rest of that day (and the next) carefully choosing our compiler directives. Our main challenge was to ensure the simulation data was transferred correctly to the GPU. It felt like we were battling the PGI compiler for the whole two days, until finally we had a breakthrough late on Monday. The code finally compiled, and we immediately tried a test run. Compared to a reference time that we calculated for a CPU-only run, the new GPU code ran ~10 times faster!
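We cannot reproduce the AREPO module here, but the core idea can be sketched in miniature: an OpenACC `data` region keeps the cell data and the image resident on the GPU, a parallel loop deposits each cell's contribution onto the pixel grid, and an atomic update keeps the additive writes safe when many GPU threads hit the same pixel. All names and the simple nearest-pixel deposition below are illustrative assumptions, not AREPO's actual scheme.

```c
#define NPIX 256

/* Illustrative sketch of an additive projection (not AREPO's code):
   deposit each cell's mass onto a 2D image. The `data` region moves the
   cell arrays to the GPU once and copies the image back at the end; the
   atomic update serialises concurrent additions to the same pixel.
   Without an OpenACC compiler the pragmas are ignored and the loop runs
   serially on the CPU, giving the same image. */
void project(const float *x, const float *y, const float *mass,
             int n, float img[NPIX][NPIX]) {
    #pragma acc data copyin(x[0:n], y[0:n], mass[0:n]) copy(img[0:NPIX][0:NPIX])
    {
        #pragma acc parallel loop
        for (int p = 0; p < n; p++) {
            int i = (int)(x[p] * NPIX);   /* assumes coordinates in [0,1) */
            int j = (int)(y[p] * NPIX);
            #pragma acc atomic update
            img[i][j] += mass[p];
        }
    }
}
```

Getting the array shapes in the `copyin`/`copy` clauses right was exactly the kind of detail that consumed our first two days: if the compiler cannot work out how much data to transfer, the kernel reads garbage on the device.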
On the final day we did some further testing and profiling, and confirmed that the ray-tracing step was indeed massively accelerated by running on the GPU. Furthermore, the code now scaled very well with the desired image resolution. This speed-up allows us to make rapid projections, useful not just for creating images but also for making movies of the simulation volume. We’d like to thank the organisers at DiRAC, our hosts at the University of Swansea, and the sponsor Nvidia for running a really fun hackathon!