3 days of code, coffee, & GPUs
On the 9th of September 2018, the first DiRAC hackathon took place at Swansea University. The event gave DiRAC users an opportunity to explore the potential of GPUs in pushing their science to the next level of parallel computing. Several teams of explorers set off on their GPU adventure, aided by GPU trainer Wayne Gaudin from PGI Compilers and Tools (Dev Tech), sponsored by Nvidia, and supported by Dr Ed Bennett from Swansea University and Dr Jeffrey Salmond from Cambridge University.
The event was hosted at Swansea and facilitated by the DiRAC GPU systems hosted by Cambridge. DiRAC has 11 GPU machines, which are free to use subject to scientific project approval. The GPU system comprises 11 nodes, each containing 4 NVIDIA Tesla P100 GPUs. The NVIDIA Pascal architecture enables the Tesla P100 to deliver superior performance for HPC and hyperscale workloads. With more than 21 teraFLOPS of 16-bit floating-point (FP16) performance, Pascal is optimized to drive exciting new possibilities in deep learning applications. Pascal also delivers over 5 and 10 teraFLOPS of double- and single-precision performance, respectively, for HPC workloads.
Hardware was not the only thing offered by DiRAC & Swansea; expertise was also in abundance, in the form of RSEs such as Dr Mark Dawson, Dr Michele Mesiti, Dr Jarno Rantaharju, and Dr Chennakesava from Swansea, Jeffrey Salmond and Matt Archer from DiRAC Cambridge, and Matthias Wagner from Nvidia.
Hackathons are a place where ideas mix, and in September there was a great mix of teams: AREGPU, CURSE, FARGO, GRID, TROVE, and The Mighty Atom. They came together to share ideas and experience and to develop good practice, but most of all to see if GPUs would be a good fit for their research. There were teams from all across the UK, including Swansea University, the University of Edinburgh, the University of Cambridge and the UK Atomic Energy Authority.
AREGPU: A moving-mesh cosmological hydrodynamics solver used in astrophysical simulations, e.g. galaxy formation. The team intended to modify the visualisation module, which ray-traces through the simulation volume to produce projections of physical quantities, and to parallelise the ray-tracing procedure so that each GPU thread carries a ray.
CURSE: Written in Fortran with over 150k lines of code. A GPU version written in C had previously been investigated by the PI, in which one of the main subroutines (the evaluation of the equation of state) was ported. The project aim was to build on this, move more of the application over to the GPU, and make use of CUDA Fortran and OpenACC. Using these made the port possible without rewriting large parts of the code.
FARGO: The code solves a coupled set of integro-differential equations that govern the growth of dust grains, and was designed to allow easy coupling to hydrodynamic simulations. The target audience is people working on protoplanetary discs and planet formation. In this community it is becoming common to model the dynamics of gas and dust, although including growth is very much at the forefront of the field. The bulk of the numerical work consists of solving a small linear system (100 to 200 cells) for each grid point in the simulation, and in serial the costs of setting up and solving each linear system are comparable. The high cost comes from the fact that this must be done for each cell (of order 10^6 for current applications) and at each time-step (i.e. ~ 10^4 times).
GRID: Lattice QCD code, comprising a C++14 data-parallel template engine layer and physics code. This library is the current Plan of Record for the USQCD DOE Exascale Computing Project for cross-platform performance portability at the exascale. It is around 100k lines of code.
TROVE: A GPU diagonalizer for double-precision real, symmetric, dense (i.e. non-sparse), diagonally dominant matrices, intended to be very efficient for matrices with dimensions of the order N = 200,000-400,000 (and lower) and to work for N up to 1,000,000.
The Mighty Atom: Warp, a GPU-based neutron transport code from UC Berkeley. The code was produced by a PhD student around 2-3 years ago and has suffered significant bit rot since.
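To make the AREGPU plan above a little more concrete, here is a minimal sketch of the one-ray-per-thread idea. It is plain Python running on the CPU, and it assumes a uniform Cartesian grid with rays aligned to the z axis; neither assumption comes from the team's moving-mesh code, and the names `trace_ray` and `project` are hypothetical. The only point is that each ray's integral is independent of every other ray, which is what would let a GPU give each ray its own thread.

```python
# Hypothetical sketch: axis-aligned rays through a uniform Cartesian grid.
# The real code works on a moving mesh; this only illustrates the
# one-ray-per-thread pattern, not the team's implementation.


def trace_ray(density, ix, iy, dz):
    """Integrate density along z for the ray through column (ix, iy)."""
    total = 0.0
    for rho in density[ix][iy]:
        total += rho * dz
    return total


def project(density, dz):
    """Build a 2D projection; every (ix, iy) ray is independent of the others."""
    nx = len(density)
    ny = len(density[0])
    # On a GPU, each (ix, iy) pair here would map to one thread.
    return [[trace_ray(density, ix, iy, dz) for iy in range(ny)]
            for ix in range(nx)]


# A 2 x 3 x 4 grid of constant density 1.5, with cell depth 0.5.
grid = [[[1.5] * 4 for _ in range(3)] for _ in range(2)]
image = project(grid, dz=0.5)
```

Because no ray reads or writes another ray's result, the double loop in `project` needs no synchronisation, which is the property that makes this kind of projection a natural candidate for GPU offload.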
With all these different areas of expertise, the teams came together on a cool morning in September.
How the day went
With all non-local participants staying on site, the event started bright and early on Sunday the 9th of September. Gathered in the Wallace Building, introductions were made and objectives set for the 3-day GPU journey.
With a base knowledge of CUDA & OpenACC from two free Nvidia courses, reinforced by Wayne, all the teams took their first steps into the world of GPUs.
After a productive day with most groups successfully running their code on the GPU systems, everyone relaxed and discussed the day’s trials and triumphs with a pizza at Brewstone.
Monday was another 9am start, and with fruit, cakes and coffee the GPU experience continued.
In the evening the weary travellers relaxed in the award-winning, luxurious boutique-style hotel Morgans, where exotic ideas for GPUs mixed with the spices and aromas of beautifully prepared food in a relaxed and unstuffy environment. With the aid of sweet desserts and a small amount of alcohol, thoughts of what had been achieved and what was still to do dominated the conversations.
On the last day, with heads down and gritty determination, everyone focused on the final push to achieve each team's goal. This was not the focus of the lone worker, but the focus of a well-trained team, working together, looking out for each other and supporting each other. This support was also given between teams, and was not just apparent on the last day, but was a continuous theme of the whole 3-day hackathon.
In the afternoon of the last day all teams gathered to present their achievements over the 3 days, highlighting problems, solutions found, and their expectations of how this would advance their research and prepare them for the machines of the future.
On the last evening there was a celebration of what was achieved: not a big fanfare, but a relaxed, quiet reflection on a job well done. Mark Wilkinson was there to welcome participants, gauge reaction to this first DiRAC Hackathon, and assess interest in GPUs possibly playing a bigger part in the upcoming DiRAC3 systems.
Neurons and Cores
This hackathon was not just for the ‘knowledgeable’ ones, but also for the ‘I’ve done a bit’ ones, and the ‘would like to know’ ones. They all came with open minds and a very basic knowledge about GPUs. All participants stated that they would recommend the online training; here are some comments:
“The online material was great introduction, that gave us an idea of some of the key issues”
“Very useful, everything well explained & liked interactive aspect”
At the end of the 3 days great strides had been made by all the teams:
AREGPU: The GPU-accelerated projection routine achieved a ~10x speedup compared with the original CPU code run on one core.
FARGO: Successfully ported the grain growth module to the GPU (the rest of the code already runs on the GPU). The ported code runs on the GPU and gives correct results, with a 3× speedup and still room to optimise memory management.
GRID: Successfully implemented summation across GPU threads (formerly host only), and looked at Nvidia Thrust reductions, assessing whether these were reproducible. They implemented the first cut in lib/lattice/Lattice_reduction.h
TROVE: The team had varying GPU experience, so sharing experience was important. The problem was intended primarily as a learning experience, and it increased the team's confidence in working with ScaLAPACK, BLAS, cuBLAS and CUDA source. Pair programming allowed unfamiliar libraries to be understood quickly, which crucially reduced the iteration time of the build and testing process. The build setup and workflows will be useful for future code development.
The Mighty Atom: The code can now successfully (and robustly) run fission problems with modern nuclear data, and code changes were made for fusion problems (particles terminated correctly). It can now run up to 1,000,000,000 particles, although the interaction cross sections take up memory.
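A common reason for checking whether a parallel reduction is reproducible, as the GRID team did, is that floating-point addition is not associative, so the order in which partial sums are combined can change the result. Whether this was the team's specific concern is an assumption here. The sketch below illustrates the effect on the CPU in plain Python; the function names `sequential_sum` and `tree_sum` are illustrative and are not taken from GRID or Thrust.

```python
import math


def sequential_sum(values):
    """Add values left to right, as a single host loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def tree_sum(values):
    """Pairwise reduction with a fixed pairing, so the order never varies."""
    work = list(values)
    if not work:
        return 0.0
    while len(work) > 1:
        paired = [work[i] + work[i + 1] for i in range(0, len(work) - 1, 2)]
        if len(work) % 2:
            paired.append(work[-1])  # carry the unpaired last element forward
        work = paired
    return work[0]


# The same four numbers in two different orders.
values = [1.0e17, 1.0, -1.0e17, 1.0]
reordered = [1.0e17, -1.0e17, 1.0, 1.0]

order_a = sequential_sum(values)
order_b = sequential_sum(reordered)
fixed_tree = tree_sum(values)
exact = math.fsum(values)  # correctly rounded reference sum
```

Here `order_a` and `order_b` differ even though the inputs are the same set of numbers, while `tree_sum` gives the same answer on every call because its pairing is fixed. A fixed order buys reproducibility, not accuracy, which is why the sketch also compares against `math.fsum`.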
All teams agreed that the 3-day GPU event will have an impact on their research.
Looking to the Future
The hackathon was a great success, and apparently the participants agreed, with all participants stating that they would expect to use what they learnt in the future, and everyone reporting that the hackathon was a good or very good event, with comments like:
“Great experience, learned a lot”, “it was good fun”, and “GREAT FUN! GOOD EXPERIENCE!”
“We attended DiRAC’s “Nvidia GPU Hackathon” with limited knowledge — and almost no practical experience — of accelerating codes using GPUs. The training material provided by Nvidia gave us a broad view of using CUDA and OpenACC to achieve speed-ups using GPU hardware, but we really got to grips with it at the hackathon by getting stuck in and modifying code ourselves. Our goals were to gain some experience, and to try to speed-up a ray-tracing module used in the cosmological hydrodynamics code AREPO. Using OpenACC directives and PGI’s compiler, we eventually managed to gain a ~10x acceleration with a single GPU, compared to both single & multi CPU only runs. Alongside this acceleration we also gained an appreciation for the algorithmic regimes where GPUs are useful, and the important technical considerations associated with GPU programming. Since the hackathon we have put our new knowledge to use, modifying other codes to take advantage of GPU acceleration!”
Lewis H. Weinberger
After the success of the GPU hackathon, DiRAC will hold more of these events. DiRAC is here for the advancement of science, and to help you get your research to the next level. In the future there will be hackathons on different topics around the country. The next one is exploring ARM technology, coming soon in January 2019.
Watch this space for training, tweets and hackathons