Online Event
TBA 9am to 5pm
Present-day high-performance computing (HPC) can benefit from cluster-scale GPU compute power. Writing CUDA® applications that can correctly and efficiently utilise GPUs across a cluster requires a distinct set of skills. In this code camp, you’ll learn the tools and techniques needed to write CUDA C++ applications that can scale efficiently to clusters of NVIDIA GPUs.
You’ll do this by working on code from several CUDA C++ applications in an interactive cloud environment backed by several NVIDIA GPUs. You’ll gain exposure to a handful of multi-GPU programming methods, including CUDA-aware Message Passing Interface (MPI), before proceeding to the main focus of this course, NVSHMEM™.
NVSHMEM is a parallel programming interface based on OpenSHMEM that provides efficient and scalable communication for NVIDIA GPU clusters. NVSHMEM creates a global address space for data that spans the memory of multiple GPUs and can be accessed with fine-grained GPU-initiated operations, CPU-initiated operations, and operations on CUDA streams. NVSHMEM’s asynchronous, GPU-initiated data transfers eliminate synchronization overheads between the CPU and the GPU. They also enable long-running kernels that include both communication and computation, reducing overheads that can limit an application’s performance when strong scaling.
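As a rough illustration of the symmetric, one-sided model described above, the following host-only C++ sketch imitates a ring shift in which every processing element (PE) writes its rank into its neighbour's copy of a shared buffer. It does not use NVSHMEM itself: `SymmetricHeap`, `put` and `ring_shift` are hypothetical names invented for this sketch, and the PEs are simulated by a loop rather than running on separate GPUs. In real NVSHMEM code the write would be issued from inside a CUDA kernel with a call such as `nvshmem_int_p`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only model of a symmetric allocation (hypothetical, for illustration):
// every PE owns one buffer of the same size, so a (PE, offset) pair names a
// location anywhere in the "global" address space.
struct SymmetricHeap {
    std::vector<std::vector<int>> per_pe;  // per_pe[pe][offset]

    SymmetricHeap(int n_pes, std::size_t elems_per_pe)
        : per_pe(static_cast<std::size_t>(n_pes),
                 std::vector<int>(elems_per_pe, -1)) {}

    int n_pes() const { return static_cast<int>(per_pe.size()); }

    // One-sided put: the caller writes straight into the target PE's buffer
    // and the target takes no action. This stands in for the role that
    // nvshmem_int_p plays in GPU code.
    void put(std::size_t offset, int value, int target_pe) {
        assert(target_pe >= 0 && target_pe < n_pes());
        assert(offset < per_pe[static_cast<std::size_t>(target_pe)].size());
        per_pe[static_cast<std::size_t>(target_pe)][offset] = value;
    }
};

// Each PE writes its own rank into slot 0 of its right-hand neighbour.
// On a real cluster all PEs would do this concurrently and a barrier would
// follow before anyone reads; here the PEs are simply visited in a loop.
std::vector<int> ring_shift(int n_pes) {
    SymmetricHeap heap(n_pes, 1);
    for (int pe = 0; pe < n_pes; ++pe) {
        int peer = (pe + 1) % n_pes;
        heap.put(0, pe, peer);
    }
    std::vector<int> received(static_cast<std::size_t>(n_pes));
    for (int pe = 0; pe < n_pes; ++pe) {
        received[static_cast<std::size_t>(pe)] =
            heap.per_pe[static_cast<std::size_t>(pe)][0];
    }
    return received;
}
```

Calling `ring_shift(4)` returns the value each simulated PE received from its left-hand neighbour. The point of the sketch is that the sender alone decides when and where data lands, which is what lets a GPU kernel communicate without handing control back to the CPU.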
Overview
By the end of the code camp, you’ll understand the tools and techniques for accelerating C/C++ applications with CUDA across multiple GPUs and multiple nodes, and you’ll be able to:
- Use several methods for writing multi-GPU CUDA C++ applications
- Use a variety of multi-GPU communication patterns and understand their tradeoffs
- Write portable, scalable CUDA code with the single-program multiple-data (SPMD) paradigm using CUDA-aware MPI and NVSHMEM
- Improve multi-GPU SPMD code with NVSHMEM’s symmetric memory model and its ability to perform GPU-initiated data transfers
- Get practice with common multi-GPU coding paradigms like domain decomposition and halo exchanges
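For readers new to the last item in the list above, here is a minimal host-only C++ sketch of domain decomposition with a halo exchange on a 1D periodic grid. It is an illustration under stated assumptions, not course material: the names `Slice`, `decompose`, `exchange_halos` and `smooth_step` are hypothetical, the "ranks" are entries in a vector rather than separate GPUs, and the copies inside `exchange_halos` stand in for what would be MPI or NVSHMEM transfers in a real multi-GPU code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One rank's slice of a 1D grid: [left halo | interior cells | right halo].
using Slice = std::vector<double>;

// Domain decomposition: split `global` into `n_ranks` equal slices, each
// padded with one halo cell on either side.
std::vector<Slice> decompose(const std::vector<double>& global, int n_ranks) {
    assert(n_ranks > 0);
    const std::size_t ranks = static_cast<std::size_t>(n_ranks);
    assert(!global.empty() && global.size() % ranks == 0);
    const std::size_t local_n = global.size() / ranks;
    std::vector<Slice> slices(ranks, Slice(local_n + 2, 0.0));
    for (std::size_t r = 0; r < ranks; ++r) {
        for (std::size_t i = 0; i < local_n; ++i) {
            slices[r][i + 1] = global[r * local_n + i];
        }
    }
    return slices;
}

// Halo exchange with periodic boundaries: each rank sends its first and last
// interior cells to its neighbours' halo cells. Only halo cells are written,
// and only interior cells are read, so the loop order does not matter.
void exchange_halos(std::vector<Slice>& slices) {
    const std::size_t n = slices.size();
    for (std::size_t r = 0; r < n; ++r) {
        const std::size_t left = (r + n - 1) % n;
        const std::size_t right = (r + 1) % n;
        const std::size_t last = slices[r].size() - 2;  // last interior cell
        slices[right].front() = slices[r][last];  // fills right neighbour's left halo
        slices[left].back() = slices[r][1];       // fills left neighbour's right halo
    }
}

// One smoothing step: every interior cell becomes the average of its two
// neighbours. Returns the reassembled global grid.
std::vector<double> smooth_step(std::vector<Slice>& slices) {
    exchange_halos(slices);
    std::vector<double> out;
    for (const Slice& s : slices) {
        for (std::size_t i = 1; i + 1 < s.size(); ++i) {
            out.push_back(0.5 * (s[i - 1] + s[i + 1]));
        }
    }
    return out;
}
```

Because the halos carry the neighbouring values, the result of `smooth_step` should not depend on how many slices the grid was split into. That property is a useful correctness check when moving a single-GPU code to several GPUs.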
Prerequisites
Basic C/C++ competency, including familiarity with variable types, loops, conditional statements, functions, and array manipulations. Intermediate experience writing CUDA C/C++ applications is assumed. Participants must have passed or attended one of the following recent code camps:
- Fundamentals of Accelerated Computing with CUDA C/C++
- Accelerating CUDA C++ Applications with Multiple GPUs
About the Instructor
Athena Elafrou is a graduate of the Electrical and Computer Engineering (ECE) School of the National Technical University of Athens (NTUA). Since October 2020, she has been working as an HPC Consultant at the Research Computing Services (RCS) of the University of Cambridge. She is also pursuing a PhD degree with the parallel systems research group of CSlab@ECE/NTUA under the supervision of Professor Georgios Goumas and holds publications in top-tier journals and conferences in the area of HPC. She is also an NVIDIA DLI Ambassador at the University of Cambridge.
Registering Your Interest
There is a limited number of places available for this course. Your application will be treated as an expression of interest, and you are not guaranteed a place at the workshop.
After the application deadline has passed, submissions will be considered, and successful applicants will be offered a place by TBA.
This event is primarily open to those using one of the DiRAC facilities, but others will be considered if space allows.
Closing date is TBA