GPU training 2022

Computationally intensive CUDA® C++ applications in high-performance computing are becoming a staple in the armoury of researchers. DiRAC is offering a series of courses to help our researchers advance their codes and their own skills. These three code camps will take you from the fundamentals and, over the coming months, show you how to harness the power of multiple GPUs across multiple nodes.

We would be pleased if you join us on the journey.

If you have any questions or requests for further training, please contact: richard.regan@durham.ac.uk

Scaling CUDA C++ Applications to Multiple Nodes

Online Event

TBA 9am to 5pm

Present-day high-performance computing (HPC) can benefit from cluster-scale GPU compute power. Writing CUDA® applications that can correctly and efficiently utilise GPUs across a cluster requires a distinct set of skills. In this code camp, you’ll learn the tools and techniques needed to write CUDA C++ applications that can scale efficiently to clusters of NVIDIA GPUs.

You’ll do this by working on code from several CUDA C++ applications in an interactive cloud environment backed by several NVIDIA GPUs. You’ll gain exposure to a handful of multi-GPU programming methods, including CUDA-aware Message Passing Interface (MPI), before proceeding to the main focus of this course, NVSHMEM™.
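To give a flavour of the first of those methods, below is a minimal sketch of the CUDA-aware MPI pattern: device pointers are passed straight to MPI calls, with no staging through host memory. It assumes a CUDA-aware MPI build (for example Open MPI compiled with CUDA support) and one visible GPU per rank; the buffer size and names are illustrative, not course material.

    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        float* d_buf;
        cudaMalloc(&d_buf, 1024 * sizeof(float));   // device memory, never copied to the host here

        // With a CUDA-aware MPI, device buffers are legal arguments: the
        // library moves the data GPU-to-GPU (over NVLink/InfiniBand where available).
        if (rank == 0)
            MPI_Send(d_buf, 1024, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(d_buf, 1024, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }

Run with something like "mpirun -np 2 ./a.out", one GPU per rank.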

NVSHMEM is a parallel programming interface based on OpenSHMEM that provides efficient and scalable communication for NVIDIA GPU clusters. NVSHMEM creates a global address space for data that spans the memory of multiple GPUs and can be accessed with fine-grained GPU-initiated operations, CPU-initiated operations, and operations on CUDA streams. NVSHMEM’s asynchronous, GPU-initiated data transfers eliminate synchronization overheads between the CPU and the GPU. They also enable long-running kernels that include both communication and computation, reducing overheads that can limit an application’s performance when strong scaling.
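As an illustration of that model, in the sketch below each PE (one process per GPU) allocates a slot on the symmetric heap, and a kernel writes its PE number directly into its neighbour's memory with a GPU-initiated put. This is our sketch under simplifying assumptions, not course material: it assumes an NVSHMEM installation, one GPU per PE, and a launch with one process per GPU (for example via nvshmrun); exact compile flags vary by install.

    #include <nvshmem.h>
    #include <nvshmemx.h>
    #include <cstdio>

    __global__ void ring_put(int* dst, int my_pe, int n_pes) {
        // GPU-initiated communication: put my PE id into the next PE's buffer.
        nvshmem_int_p(dst, my_pe, (my_pe + 1) % n_pes);
    }

    int main() {
        nvshmem_init();
        int my_pe = nvshmem_my_pe();
        int n_pes = nvshmem_n_pes();
        cudaSetDevice(my_pe);                          // one GPU per PE (simplified)

        int* dst = (int*)nvshmem_malloc(sizeof(int));  // symmetric allocation, same address on every PE
        ring_put<<<1, 1>>>(dst, my_pe, n_pes);
        nvshmemx_barrier_all_on_stream(0);             // order the put before the read below
        cudaDeviceSynchronize();

        int received;
        cudaMemcpy(&received, dst, sizeof(int), cudaMemcpyDeviceToHost);
        printf("PE %d received %d\n", my_pe, received);

        nvshmem_free(dst);
        nvshmem_finalize();
        return 0;
    }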

Overview

At the conclusion of the code camp, you’ll have an understanding of the tools and techniques for accelerating C/C++ applications with CUDA across multiple GPUs on multiple nodes, and will be able to:

  • Apply several methods for writing multi-GPU CUDA C++ applications
  • Use a variety of multi-GPU communication patterns and understand their tradeoffs
  • Write portable, scalable CUDA code with the single-program multiple-data (SPMD) paradigm using CUDA-aware MPI and NVSHMEM
  • Improve multi-GPU SPMD code with NVSHMEM’s symmetric memory model and its ability to perform GPU-initiated data transfers
  • Get practice with common multi-GPU coding paradigms like domain decomposition and halo exchanges

Prerequisites

Basic C/C++ competency including familiarity with variable types, loops, conditional statements, functions, and array manipulations. Intermediate experience writing CUDA C/C++ applications is assumed. Participants must have passed or attended one of the recent code camps.

About the Instructor

Athena Elafrou is a graduate of the Electrical and Computer Engineering (ECE) School of the National Technical University of Athens (NTUA). Since October 2020, she has been working as an HPC Consultant at the Research Computing Services (RCS) of the University of Cambridge. She is also pursuing a PhD degree with the parallel systems research group of CSlab@ECE/NTUA under the supervision of Professor Georgios Goumas and holds publications in top-tier journals and conferences in the area of HPC. She is also an NVIDIA DLI Ambassador at the University of Cambridge.

Registering Your Interest

There are a limited number of places available for this course. Your application will be treated as an expression of interest and you are not guaranteed a place at the workshop.

After the application deadline has passed, submissions will be considered, and successful applicants will be offered a place by TBA.

This event is primarily open to those using one of the DiRAC facilities, but others will be considered if space allows.

Closing date is TBA

Accelerating CUDA C++ Applications with Multiple GPUs

Online Event

TBA 9am to 1pm

Computationally intensive CUDA® C++ applications in high-performance computing can be accelerated by using multiple GPUs, which can increase throughput and/or decrease your total runtime. When combined with the concurrent overlap of computation and memory transfers, computations can be scaled across multiple GPUs without increasing the cost of memory transfers. For projects with access to multi-GPU servers, these techniques enable you to achieve peak performance from GPU-accelerated applications. And it’s important to implement these single-node, multi-GPU techniques before scaling your applications across multiple nodes. 

This code camp covers how to write CUDA C++ applications that efficiently and correctly utilise all available GPUs in a single node, dramatically improving the performance of your applications and making the most cost-effective use of systems with multiple GPUs.
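A minimal sketch of that single-node pattern is below: each GPU gets its own stream, and asynchronous copies from pinned host memory overlap with kernels already running on the other devices. The kernel, sizes, and names are illustrative only, not course material.

    #include <cuda_runtime.h>
    #include <vector>

    __global__ void process(float* data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;
    }

    int main() {
        const int n = 1 << 20;
        int num_gpus = 0;
        cudaGetDeviceCount(&num_gpus);

        float* h_data;
        cudaMallocHost(&h_data, (size_t)num_gpus * n * sizeof(float));  // pinned memory: required for truly async copies

        std::vector<cudaStream_t> streams(num_gpus);
        std::vector<float*> d_data(num_gpus);

        for (int g = 0; g < num_gpus; ++g) {
            cudaSetDevice(g);                  // each GPU gets its own slice of the work...
            cudaStreamCreate(&streams[g]);     // ...and its own stream
            cudaMalloc(&d_data[g], n * sizeof(float));

            // Work issued to different GPUs' streams runs concurrently, so
            // these copies overlap with kernels running on other devices.
            cudaMemcpyAsync(d_data[g], h_data + (size_t)g * n, n * sizeof(float),
                            cudaMemcpyHostToDevice, streams[g]);
            process<<<(n + 255) / 256, 256, 0, streams[g]>>>(d_data[g], n);
            cudaMemcpyAsync(h_data + (size_t)g * n, d_data[g], n * sizeof(float),
                            cudaMemcpyDeviceToHost, streams[g]);
        }

        for (int g = 0; g < num_gpus; ++g) {   // wait for every GPU to drain its stream
            cudaSetDevice(g);
            cudaStreamSynchronize(streams[g]);
            cudaFree(d_data[g]);
            cudaStreamDestroy(streams[g]);
        }
        cudaFreeHost(h_data);
        return 0;
    }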

Overview

At the conclusion of the code camp, you’ll have an understanding of the tools and techniques for accelerating C/C++ applications with CUDA across multiple GPUs on a single node, and be able to:

  • Use concurrent CUDA streams to overlap memory transfers with GPU computation
  • Utilise all available GPUs on a single node to scale workloads across them
  • Combine the use of copy/compute overlap with multiple GPUs
  • Rely on the NVIDIA Nsight Systems timeline to observe improvement opportunities and the impact of the techniques covered in the workshop

Prerequisites

Basic C/C++ competency including familiarity with variable types, loops, conditional statements, functions, and array manipulations. Working knowledge of CUDA programming is assumed. Participants must have passed or attended the recent Fundamentals of Accelerated Computing with CUDA C/C++ code camp.

About the Instructor

Richard Regan is the DiRAC Systems Manager at Durham University. As Systems Manager, he is involved in the procurement, installation, and configuration of the COSMA HPC systems at Durham. He is also the Training Manager for DiRAC and is responsible for all training events, including the essentials training and the hackathon program. DiRAC is a national service that gives free HPC access to the astronomy, cosmology, high energy physics, and particle physics research communities.

Richard believes that the path to efficient research is through education, and maximising the efficiency of your code through good design.

Richard was trained as a digital engineer and then worked as a software engineer for over 15 years with companies such as British Steel, Rolls Royce, and Ingenico Futronic. He then spent 8 years teaching software engineering and discovering the joys of e-learning before joining the Institute of Computational Cosmology at Durham University. At the ICC he is part of the HPC support team, and for the last 5 years has been helping to steer the training for the DiRAC community as its Training Manager.

Registering Your Interest

There are a limited number of places available for this course. Your application will be treated as an expression of interest and you are not guaranteed a place at the workshop.

After the application deadline has passed, submissions will be considered, and successful applicants will be offered a place by TBA.

This event is primarily open to those using one of the DiRAC facilities, but others will be considered if space allows.

Closing date is TBA

Fundamentals of Accelerated Computing with CUDA C/C++

Online Event

12 July 9am to 5pm

The CUDA computing platform enables the acceleration of CPU-only applications to run on the world’s fastest massively parallel GPUs. DiRAC users have two large GPU clusters they can use, and several small development systems exist at all our sites.

This code camp teaches the fundamental tools and techniques for accelerating C/C++ applications to run on massively parallel GPUs with CUDA® and is your first step toward accelerating your applications on NVIDIA GPUs.

Overview

This course will teach C/C++ application acceleration using techniques such as:

  • Accelerating CPU-only applications to exploit their latent parallelism on GPUs
  • Utilizing essential CUDA memory management techniques to optimize accelerated applications
  • Exposing accelerated application potential for concurrency and exploiting it with CUDA streams
  • Leveraging command line and visual profiling to guide and check your work

Upon completion, you’ll be able to accelerate and optimize existing C/C++ CPU-only applications using the most essential CUDA tools and techniques. You’ll understand an iterative style of CUDA development that will allow you to ship accelerated applications fast.
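To give a feel for what you will write, here is a minimal sketch of a first CUDA kernel of the kind this course builds up to. It assumes only the CUDA Toolkit (compile with "nvcc add.cu -o add"); the grid and block sizes are illustrative.

    #include <cstdio>

    // Grid-stride loop: each GPU thread strides through the array, so the
    // kernel works for any array size and launch configuration.
    __global__ void add(int n, float* x, float* y) {
        int idx    = blockIdx.x * blockDim.x + threadIdx.x;
        int stride = gridDim.x * blockDim.x;
        for (int i = idx; i < n; i += stride) y[i] = x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;
        float *x, *y;
        cudaMallocManaged(&x, n * sizeof(float));  // unified memory: one pointer valid on CPU and GPU
        cudaMallocManaged(&y, n * sizeof(float));
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        add<<<256, 256>>>(n, x, y);   // launch 256 blocks of 256 threads
        cudaDeviceSynchronize();      // wait before the CPU reads the result

        printf("y[0] = %f (expect 3.0)\n", y[0]);
        cudaFree(x);
        cudaFree(y);
        return 0;
    }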

Prerequisites

Basic C/C++ competency including familiarity with variable types, loops, conditional statements, functions, and array manipulations. No previous knowledge of CUDA programming is assumed.

About the Instructor

Athena Elafrou is a graduate of the Electrical and Computer Engineering (ECE) School of the National Technical University of Athens (NTUA). Since October 2020, she has been working as an HPC Consultant at the Research Computing Services (RCS) of the University of Cambridge. She is also pursuing a PhD degree with the parallel systems research group of CSlab@ECE/NTUA under the supervision of Professor Georgios Goumas and holds publications in top-tier journals and conferences in the area of HPC. She is also an NVIDIA DLI Ambassador at the University of Cambridge.

Registering Your Interest

There are a limited number of places available for this course. Your application will be treated as an expression of interest and you are not guaranteed a place at the workshop.

After the application deadline has passed, submissions will be considered, and successful applicants will be offered a place by 8 July.

This event is primarily open to those using one of the DiRAC facilities, but others will be considered if space allows.

Closing date is the 6th of July

AMD Induction Training (AIT)

13th October 2021 (Registration is closed)

Introduction

This is a hands-on workshop where participants will use code snippets to illustrate the best approach to accessing the power of our new AMD Rome CPUs.

The workshop will cover:

  • The Rome microarchitecture & memory channels
  • Compilation & optimization
  • NUMA regions & pinning
  • Maths and scientific libraries
  • uProf, a new profiler
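As a flavour of the NUMA topics above, here is a minimal sketch using libnuma (link with -lnuma) to keep a buffer on a chosen NUMA node; on Rome, keeping a thread's data on its own node avoids cross-socket memory traffic. The node number and size are our illustrative choices, not workshop material.

    #include <numa.h>   // libnuma: link with -lnuma
    #include <cstdio>

    int main() {
        if (numa_available() < 0) {
            std::fprintf(stderr, "NUMA is not available on this system\n");
            return 1;
        }
        std::printf("NUMA nodes: %d\n", numa_max_node() + 1);

        // Allocate a buffer bound to node 0, so threads pinned to that node
        // (e.g. started under "numactl --cpunodebind=0") get local accesses.
        size_t bytes = 1 << 20;
        double* buf = static_cast<double*>(numa_alloc_onnode(bytes, 0));
        for (size_t i = 0; i < bytes / sizeof(double); ++i) buf[i] = 0.0;

        numa_free(buf, bytes);
        return 0;
    }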

Format

The virtual workshop will be run on multiple DiRAC sites so you will be able to experience exactly what you need to do on the system you use.

  • Support will be available from AMD and the local technical support teams.
  • A shared Slack channel will be available for asking questions, highlighting any issues, and sharing good practice between sites.

Target Audience

The target audience is researchers:

  • Who are interested in building and running their code on our new DiRAC-3 AMD systems.
  • Who want to run their code efficiently to get the best performance.
  • Who want to take advantage of the new AMD features and tools.

Requirements

Only basic experience of one of the following languages is required: C/C++, Fortran, or Python.

Registration

Closing date was the 6th of October. Registration is now closed.

N Ways to GPU Programming

30th Sep 2021, full day course, TBA

Learning Objectives

With the release of NVIDIA CUDA in 2007, different approaches to GPU programming have evolved. Each approach has its own advantages and disadvantages. By the end of this bootcamp session, participants will have a broader perspective on GPU programming approaches to help them select a programming model that better fits their application’s needs and constraints. The bootcamp will teach how to accelerate a real-world scientific application using the following methods:

  • Standard: C++ stdpar, Fortran Do-Concurrent
  • Directives: OpenACC, OpenMP
  • Programming Language Extensions: CUDA C, CUDA Fortran
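To make the contrast concrete, here is the same vector-add written three of the ways listed above. This is our illustrative sketch, not bootcamp material; in practice each variant is built with the matching compiler (for example nvc++ with -stdpar or -acc from the NVIDIA HPC SDK for the first two, and nvcc for the CUDA kernel).

    #include <algorithm>
    #include <execution>
    #include <vector>

    // 1. ISO C++ standard parallelism: with nvc++ -stdpar this algorithm
    //    call may be offloaded to the GPU with no CUDA in sight.
    void add_stdpar(std::vector<float>& a, const std::vector<float>& b) {
        std::transform(std::execution::par_unseq, a.begin(), a.end(),
                       b.begin(), a.begin(),
                       [](float x, float y) { return x + y; });
    }

    // 2. Directives: annotate the serial loop and let OpenACC move the data
    //    and generate the kernel (build with nvc++ -acc).
    void add_openacc(float* a, const float* b, int n) {
        #pragma acc parallel loop copy(a[0:n]) copyin(b[0:n])
        for (int i = 0; i < n; ++i) a[i] += b[i];
    }

    // 3. Language extension: an explicit CUDA C kernel with a manual launch
    //    configuration (build with nvcc).
    __global__ void add_cuda(float* a, const float* b, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) a[i] += b[i];
    }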

Bootcamp Outline

During this lab, we will be working on porting mini-applications in the molecular dynamics (MD) domain to GPUs. You can choose to work with either language version (C/C++ or Fortran) of this application.

Bootcamp Duration

The lab material will be presented in an 8-hour session. A link to the material will be available for download at the end of the lab.

Content Level

Beginner, Intermediate

Target Audience and Prerequisites

The target audience for this lab is researchers/graduate students and developers who are interested in learning about various ways of GPU programming to accelerate their scientific applications.

Basic experience with C/C++ or Fortran programming is needed. No GPU programming knowledge is required.

Registration

Registration is closed.

CUDA C/C++ CodeCamp – Itinerary

The CUDA C/C++ program

will follow the standard NVIDIA one-day course “Fundamentals of Accelerated Computing with CUDA C/C++”. This course explores the structure of an NVIDIA GPU, how to replace standard C/C++ methods with GPU kernels, and first steps into optimizing your code to get the best out of the GPU. You will learn how to:

  • Write code to be executed by a GPU accelerator
  • Expose and express data and instruction-level parallelism in C/C++ applications using CUDA
  • Utilize CUDA-managed memory and optimize memory migration using asynchronous prefetching
  • Leverage command line and visual profilers to guide your work
  • Utilize concurrent streams for instruction-level parallelism
  • Write GPU-accelerated CUDA C/C++ applications, or refactor existing CPU-only applications, using a profile-driven approach

Upon completion, you’ll be able to use CUDA to compile and launch CUDA kernels to accelerate your C/C++ applications on NVIDIA GPUs.
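As a flavour of the managed-memory and prefetching items above, here is a minimal sketch: cudaMallocManaged gives one pointer valid on both CPU and GPU, and cudaMemPrefetchAsync migrates the pages ahead of use instead of faulting them over on demand. Device 0 and the sizes are illustrative, not course material.

    #include <cuda_runtime.h>

    __global__ void scale(float* data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 3.0f;
    }

    int main() {
        const int n = 1 << 24;
        float* data;
        cudaMallocManaged(&data, n * sizeof(float));
        for (int i = 0; i < n; ++i) data[i] = 1.0f;   // pages first touched on the CPU

        int device = 0;
        // Prefetch to the GPU before the kernel runs, avoiding page faults
        // during the launch, then run the kernel.
        cudaMemPrefetchAsync(data, n * sizeof(float), device);
        scale<<<(n + 255) / 256, 256>>>(data, n);

        // Prefetch back to the host before the CPU reads the results.
        cudaMemPrefetchAsync(data, n * sizeof(float), cudaCpuDeviceId);
        cudaDeviceSynchronize();

        cudaFree(data);
        return 0;
    }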

It is expected that all participants have basic C/C++ competency, including familiarity with variable types, loops, conditional statements, functions, and array manipulations.

Day

10:00 Accelerating Applications with CUDA C/C++

12:00 Managing Accelerated Application Memory with CUDA C/C++ Unified Memory and nvprof

13:00 Lunch

14:00 Continue Managing Accelerated Application Memory with CUDA C/C++ Unified Memory and nvprof

15:00 Asynchronous Streaming, and Visual Profiling for Accelerated Applications with CUDA C/C++

17:00 Course End

CUDA C/C++ CodeCamp

Who will attend

There will be a mix of local/DiRAC researchers, supported by our DLI-registered trainer. All you need is a laptop and a willingness to learn.

Venue

The hackathon is located in the beautiful city of Durham at the Ogden Centre, Durham University, South Road DH1 3LE. The event will be held in room OCW017.

Accommodation

If required, individuals will have to pay for their rooms themselves.

We would recommend the hotel below:

Travelodge Durham

Staying at this venue is highly preferred, since it maximises networking opportunities and ensures all participants can get the most out of the event.

Travel & Meals

Participants are expected to cover their travel and meal costs.

Taxi

Carefree
0754 034 2450
Paddy’s Taxis
0191 386 6662
Sherburn Taxis
0191 372 3388

Important dates

  • 2nd March: application deadline
  • 17th March: Event Welcome

Contact

If you need any additional information, please do not hesitate to contact DiRAC’s Training Manager: Richard Regan.

  • Tel: 0191 3343632
  • email: richard.regan@durham.ac.uk

GPU CodeCamp Itinerary

Due to the varied group of participants, it has been decided that the first day will focus on CUDA Python and the second day on C/C++ with OpenACC.

The CUDA Python program

will follow the standard NVIDIA one day course “Fundamentals of Accelerated Computing with CUDA Python”. This course explores how to use Numba—the just-in-time, type-specializing Python function compiler—to accelerate Python programs to run on massively parallel NVIDIA GPUs. You’ll learn how to:

  • Use Numba to compile CUDA kernels from NumPy universal functions (ufuncs).
  • Use Numba to create and launch custom CUDA kernels.
  • Apply key GPU memory management techniques.

Upon completion, you’ll be able to use Numba to compile and launch CUDA kernels to accelerate your Python applications on NVIDIA GPUs.

The OpenACC program

will follow the Linux Academy “Introduction to OpenACC – NVIDIA OpenACC Online Lab”.

OpenACC.org, Amazon Web Services, NVIDIA, and Linux Academy have organized the Introduction to OpenACC lab. This lab consists of three instructor-led classes that include interactive lectures, dedicated Q&A sessions, and hands-on exercises. The lab covers analyzing performance, parallelizing, and optimizing code.

Experience programming in C, C++, or Fortran is helpful but not required. You do not need any prior experience with OpenACC directives or GPU programming to complete this lab.

Day 1

10:00 Introduction to CUDA with Numba

12:30 Lunch

13:30 Custom CUDA Kernels in Python with Numba

15:30 Multidimensional Grids and Shared Memory for CUDA Python with Numba

19:30 Evening Meal

Day 2

09:00 Introduction to OpenACC – NVIDIA OpenACC Online Lab

12:30 Lunch

13:30 Participants work on their codes

16:30 Feedback

17:00 The End

GPU CodeCamp Event Information

Who will attend

There will be a mix of local/DiRAC researchers, supported by an NVIDIA trainer. All you need is a laptop and a willingness to learn.

Venue

The hackathon is located in the beautiful city of Durham at the Ogden Centre, Durham University, South Road DH1 3LE. The event will be held in room OCW017.

Accommodation

DiRAC will support any DiRAC researchers wishing to attend this event by paying for their accommodation. Individuals will pay for the room themselves and then claim the cost back. DiRAC will only accept accommodation and breakfast costs from the approved hotel below:

Travelodge Durham

Staying at this venue is highly preferred, since it maximises networking opportunities and ensures all participants can get the most out of the event.

Travel & Meals

Participants are expected to cover their travel costs.

Taxi

Carefree
0754 034 2450
Paddy’s Taxis
0191 386 6662
Sherburn Taxis
0191 372 3388

As sponsor, NVIDIA has agreed to cover the costs of all meals during this event.

Important dates

  • 22nd November: application deadline
  • 25th November: successful applicants will receive an email confirming their place at the event
  • 11th December: Event Welcome

Contact

If you need any additional information, please do not hesitate to contact DiRAC’s Training Manager: Richard Regan.

  • Tel: 0191 3343632
  • email: richard.regan@durham.ac.uk