AMD Tools Workshop

28th-30th November 2022

Introduction

This is a hands-on workshop where participants will apply presented techniques to their own codes.

The workshop will cover:

  • Optimal process pinning for the AMD architecture
  • AMD Compilers and libraries
  • μProf tutorial

Format

This will be an in-person workshop, held in Durham. Participants will use their normal DiRAC site.

The workshop will run over three days. Each day will normally start with an introduction to a topic, followed by a lengthy practical session in which participants can apply what was taught to their own code. Support will be available from AMD and the DiRAC support teams. During the event, there will be feedback and Q&A sessions to help spread good practice and address any issues.

On the afternoon of the third day, teams will be given time to develop a presentation, which they will give at DiRAC Day on 8th December at UCL.


Target Audience

The target audience is researchers who want to:

  • Optimise their code for current and future AMD CPUs.
  • Get the most out of our new DiRAC-3 AMD systems.
  • Run their code efficiently to get the best performance.
  • Learn about and take advantage of the new AMD features and tools.
  • Build links with other research groups and AMD’s technical team.

Requirements

At least one member of each team needs good experience in a programming language and in-depth knowledge of the team’s own code. Technical support can be arranged if required.


Registration

Registration closed.

GPU training 2022

Computationally intensive CUDA® C++ applications in high-performance computing are becoming a staple in the armory of researchers. DiRAC is offering a series of courses to help our researchers advance their codes and their own skills. Over the next few months, these three code camps will take you from the fundamentals to harnessing the power of multiple GPUs across multiple nodes.

We would be pleased if you join us on the journey.

If you have any other questions or requests for other training please contact: richard.regan@durham.ac.uk

Scaling CUDA C++ Applications to Multiple Nodes

Online Event

TBA 9am to 5pm

Present-day high-performance computing (HPC) can benefit from cluster-scale GPU compute power. Writing CUDA® applications that can correctly and efficiently utilise GPUs across a cluster requires a distinct set of skills. In this code camp, you’ll learn the tools and techniques needed to write CUDA C++ applications that can scale efficiently to clusters of NVIDIA GPUs.

You’ll do this by working on code from several CUDA C++ applications in an interactive cloud environment backed by several NVIDIA GPUs. You’ll gain exposure to a handful of multi-GPU programming methods, including CUDA-aware Message Passing Interface (MPI), before proceeding to the main focus of this course, NVSHMEM™.

NVSHMEM is a parallel programming interface based on OpenSHMEM that provides efficient and scalable communication for NVIDIA GPU clusters. NVSHMEM creates a global address space for data that spans the memory of multiple GPUs and can be accessed with fine-grained GPU-initiated operations, CPU-initiated operations, and operations on CUDA streams. NVSHMEM’s asynchronous, GPU-initiated data transfers eliminate synchronization overheads between the CPU and the GPU. They also enable long-running kernels that include both communication and computation, reducing overheads that can limit an application’s performance when strong scaling.

Overview

At the conclusion of the code camp, you’ll have an understanding of the tools and techniques for accelerating C/C++ applications with CUDA on multiple GPUs across multiple nodes and be able to:

  • Apply several methods for writing multi-GPU CUDA C++ applications
  • Use a variety of multi-GPU communication patterns and understand their tradeoffs
  • Write portable, scalable CUDA code with the single-program multiple-data (SPMD) paradigm using CUDA-aware MPI and NVSHMEM
  • Improve multi-GPU SPMD code with NVSHMEM’s symmetric memory model and its ability to perform GPU-initiated data transfers
  • Get practice with common multi-GPU coding paradigms like domain decomposition and halo exchanges
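
The domain decomposition and halo exchange pattern named in the last bullet can be illustrated without a GPU. The following is a minimal, serial Python sketch, not CUDA, MPI or NVSHMEM code: a 1D array is split across hypothetical "ranks", each chunk is padded with one halo cell per side, the halos are filled from the neighbouring chunks, and a 3-point average is applied locally. The function names and the fixed boundary values are assumptions made for illustration only.

```python
def decompose(data, n_ranks):
    """Split data into n_ranks contiguous chunks, each padded with one halo cell per side."""
    size = len(data) // n_ranks  # assumption: len(data) is divisible by n_ranks
    return [[0.0] + data[r * size:(r + 1) * size] + [0.0] for r in range(n_ranks)]


def exchange_halos(chunks, left_boundary, right_boundary):
    """Fill each chunk's halo cells from the edge cells of its neighbours."""
    for r, chunk in enumerate(chunks):
        # Index -2 is the neighbour's last interior cell; index 1 is its first.
        chunk[0] = chunks[r - 1][-2] if r > 0 else left_boundary
        chunk[-1] = chunks[r + 1][1] if r < len(chunks) - 1 else right_boundary


def stencil_step(chunk):
    """Apply a 3-point average to the interior cells; halo cells are read-only inputs."""
    return [(chunk[i - 1] + chunk[i] + chunk[i + 1]) / 3.0
            for i in range(1, len(chunk) - 1)]


def run_decomposed(data, n_ranks, left_boundary=0.0, right_boundary=0.0):
    """Decompose, exchange halos, update each chunk, then gather the results."""
    chunks = decompose(data, n_ranks)
    exchange_halos(chunks, left_boundary, right_boundary)
    result = []
    for chunk in chunks:
        result.extend(stencil_step(chunk))
    return result


def run_global(data, left_boundary=0.0, right_boundary=0.0):
    """Reference: the same stencil applied to the undivided array."""
    return stencil_step([left_boundary] + list(data) + [right_boundary])
```

Because the halo cells carry exactly the neighbour values each chunk needs, `run_decomposed` gives the same result as `run_global`. In a real multi-GPU code the exchange step would be a communication call rather than a list assignment.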

Prerequisites

Basic C/C++ competency including familiarity with variable types, loops, conditional statements, functions, and array manipulations. Intermediate experience writing CUDA C/C++ applications is assumed. Participants must have passed or attended one of the recent code camps.

About the Instructor

Athena Elafrou is a graduate of the Electrical and Computer Engineering (ECE) School of the National Technical University of Athens (NTUA). Since October 2020, she has been working as an HPC Consultant at the Research Computing Services (RCS) of the University of Cambridge. She is also pursuing a PhD degree with the parallel systems research group of CSlab@ECE/NTUA under the supervision of Professor Georgios Goumas and holds publications in top-tier journals and conferences in the area of HPC. She is also an NVIDIA DLI Ambassador at the University of Cambridge.

Registering Your Interest

There are a limited number of places available for this course. Your application will be treated as an expression of interest and you are not guaranteed a place at the workshop.

After the application deadline has passed, submissions will be considered, and successful applicants will be offered a place by TBA.

This event is primarily open to those using one of the DiRAC facilities, but others will be considered if space allows.

Closing date is TBA

Accelerating CUDA C++ Applications with Multiple GPUs

Online Event

16th August 2022 9am to 1pm

* CANCELED by NVIDIA *

Computationally intensive CUDA® C++ applications in high-performance computing can be accelerated by using multiple GPUs, which can increase throughput and/or decrease your total runtime. When combined with the concurrent overlap of computation and memory transfers, computations can be scaled across multiple GPUs without increasing the cost of memory transfers. For projects with access to multi-GPU servers, these techniques enable you to achieve peak performance from GPU-accelerated applications. It is also important to implement these single-node, multi-GPU techniques before scaling your applications across multiple nodes.

This code camp covers how to write CUDA C++ applications that efficiently and correctly utilise all available GPUs in a single node, dramatically improving the performance of your applications and making the most cost-effective use of systems with multiple GPUs.

Overview

At the conclusion of the code camp, you’ll have an understanding of the tools and techniques for accelerating C/C++ applications with CUDA on multiple GPUs in a single node and be able to:

  • Use concurrent CUDA streams to overlap memory transfers with GPU computation
  • Scale workloads across all available GPUs on a single node
  • Combine the use of copy/compute overlap with multiple GPUs
  • Rely on the NVIDIA Nsight Systems Visual Profiler timeline to observe improvement opportunities and the impact of the techniques covered in the workshop
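
The benefit of copy/compute overlap can be estimated with simple arithmetic. The sketch below is a toy timing model in Python, not CUDA code, and every name and number in it is hypothetical. It assumes one copy engine and one compute engine, equal-sized chunks, and it ignores device-to-host copies and launch overheads.

```python
def serial_time(n_chunks, copy_t, compute_t):
    """Every copy and every kernel runs back to back, as on a single stream."""
    return n_chunks * (copy_t + compute_t)


def overlapped_time(n_chunks, copy_t, compute_t):
    """The copy for the next chunk may run while the current chunk computes."""
    copy_done = 0.0
    compute_done = 0.0
    for _ in range(n_chunks):
        copy_done += copy_t                    # copies are serialised on the copy engine
        start = max(copy_done, compute_done)   # a kernel needs its data and a free GPU
        compute_done = start + compute_t
    return compute_done
```

With 4 chunks, a copy time of 2 units and a compute time of 3 units per chunk, this model gives 20 units without overlap and 14 with overlap: only the first copy remains exposed. A profiler timeline is still needed to confirm what a real application does.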

Prerequisites

Basic C/C++ competency including familiarity with variable types, loops, conditional statements, functions, and array manipulations. Working knowledge of CUDA programming is assumed. Participants must have passed or attended the recent code camp:

About the Instructor

Richard Regan is the DiRAC Systems Manager at Durham University. As Systems Manager, he is involved in the procurement, installation, and configuration of the COSMA HPC systems at Durham. He is also the Training Manager for DiRAC and is responsible for all training events including the essentials training and the hackathon program. DiRAC is a national service that gives free HPC access to the astronomy, cosmology, high energy physics, and particle physics research communities.

Richard believes that the path to efficient research is through education, and maximising the efficiency of your code through good design.

Richard was trained as a digital engineer and then worked as a software engineer for over 15 years with companies such as British Steel, Rolls Royce, and Ingenico Futronic. He then spent 8 years teaching software engineering and discovering the joys of e-learning before joining the Institute of Computational Cosmology at Durham University. At the ICC he is part of the HPC support team, and for the last 5 years has been helping to steer the training for the DiRAC community as its Training Manager.

Registering Your Interest

There are a limited number of places available for this course. Your application will be treated as an expression of interest and you are not guaranteed a place at the workshop.

After the application deadline has passed, submissions will be considered, and successful applicants will be offered a place by TBA.

This event is primarily open to those using one of the DiRAC facilities, but others will be considered if space allows.

Sorry, event canceled.

Fundamentals of Accelerated Computing with CUDA C/C++

Online Event

12 July 9am to 5pm

The CUDA computing platform enables the acceleration of CPU-only applications to run on the world’s fastest massively parallel GPUs. DiRAC users have two large GPU clusters they can use, and several small development systems exist at all our sites.

This code camp teaches the fundamental tools and techniques for accelerating C/C++ applications to run on massively parallel GPUs with CUDA® and is your first step into accelerating your application with NVIDIA GPUs.

Overview

The CUDA computing platform enables the acceleration of CPU-only applications to run on the world’s fastest massively parallel GPUs. This course will teach C/C++ application acceleration using techniques such as:

  • Accelerating CPU-only applications to run their latent parallelism on GPUs
  • Utilizing essential CUDA memory management techniques to optimize accelerated applications
  • Exposing accelerated application potential for concurrency and exploiting it with CUDA streams
  • Leveraging command line and visual profiling to guide and check your work

Upon completion, you’ll be able to accelerate and optimize existing C/C++ CPU-only applications using the most essential CUDA tools and techniques. You’ll understand an iterative style of CUDA development that will allow you to ship accelerated applications fast.

Prerequisites

Basic C/C++ competency including familiarity with variable types, loops, conditional statements, functions, and array manipulations. No previous knowledge of CUDA programming is assumed.

About the Instructor

Athena Elafrou is a graduate of the Electrical and Computer Engineering (ECE) School of the National Technical University of Athens (NTUA). Since October 2020, she has been working as an HPC Consultant at the Research Computing Services (RCS) of the University of Cambridge. She is also pursuing a PhD degree with the parallel systems research group of CSlab@ECE/NTUA under the supervision of Professor Georgios Goumas and holds publications in top-tier journals and conferences in the area of HPC. She is also an NVIDIA DLI Ambassador at the University of Cambridge.

Registering Your Interest

There are a limited number of places available for this course. Your application will be treated as an expression of interest and you are not guaranteed a place at the workshop.

After the application deadline has passed, submissions will be considered, and successful applicants will be offered a place by 8 July.

This event is primarily open to those using one of the DiRAC facilities, but others will be considered if space allows.

REGISTRATION IS CLOSED

AMD Induction Training (AIT)

29th September 2022

Sorry, registration is closed.

Introduction

This is a hands-on workshop where participants will use code snippets to illustrate the best approach to accessing the power of our new AMD ROME CPUs based at Durham & Leicester. This training will also be useful to anyone using the AMD CPUs on the GPU systems at Cambridge and Edinburgh.

The workshop will cover:

  • The ROME Microarchitecture & Memory channels
  • Pinning processes
  • AMD Compilers and libraries
  • MPI considerations
  • μProf tutorial
  • Hands-on examples spread throughout the day
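
As a rough illustration of what pinning a process means at the operating-system level, here is a minimal Python sketch using the standard library on Linux. It is not taken from the workshop material, and the helper names are hypothetical. On a production system, pinning is normally configured through the batch scheduler or MPI launcher, following the site documentation.

```python
import os


def allowed_cpus():
    """Return the set of logical CPUs the calling process may currently run on."""
    return os.sched_getaffinity(0)  # pid 0 means "the calling process"


def pin_to_cpu(cpu_id):
    """Restrict the calling process to a single logical CPU (Linux only)."""
    os.sched_setaffinity(0, {cpu_id})
    return os.sched_getaffinity(0)
```

Calling `pin_to_cpu(min(allowed_cpus()))` restricts the process to the lowest-numbered CPU it is allowed to use. Choosing which cores to use for a given code depends on the memory and cache layout of the node, which is what the workshop topics above address.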

Format

The virtual workshop will be run on multiple DiRAC sites so you will be able to experience exactly what you need to do on the system you use.

  • Support will be there from AMD and the local technical support teams.
  • A shared Slack channel will be available to ask questions, highlight any issues and share good practice between sites.

Target Audience

The target audience is researchers who:

  • Are interested in building and running their code on our new DiRAC-3 AMD systems.
  • Want to run their code efficiently to get the best performance.
  • Want to take advantage of the new AMD features and tools.

Requirements

Only basic experience of a language such as C/C++ or Python is required.


Registration

Sorry, registration is closed.

N Ways to GPU Programming

30th Sep 2021, full day course, TBA

Learning Objectives

Since the release of NVIDIA CUDA in 2007, different approaches to GPU programming have evolved. Each approach has its own advantages and disadvantages. By the end of this bootcamp session, participants will have a broader perspective on GPU programming approaches to help them select a programming model that better fits their application’s needs and constraints. The bootcamp will teach how to accelerate a real-world scientific application using the following methods:

  • Standard: C++ stdpar, Fortran Do-Concurrent
  • Directives: OpenACC, OpenMP
  • Programming Language Extension: CUDA C, CUDA Fortran.

Bootcamp Outline

During this lab, we will work on porting mini-applications in the Molecular Simulation (MD) domain to GPUs. You can choose to work with either version of this application.

Bootcamp Duration

The lab material will be presented in an 8-hour session. A link to download the material is available at the end of the lab.

Content Level

Beginner, Intermediate

Target Audience and Prerequisites

The target audience for this lab is researchers, graduate students and developers who are interested in learning about various ways of GPU programming to accelerate their scientific applications.

Basic experience with C/C++ or Fortran programming is needed. No GPU programming knowledge is required.

Registration

Registration is closed.

CUDA C/C++ CodeCamp – Itinerary

The CUDA C/C++ program

The program will follow the standard NVIDIA one-day course “Fundamentals of Accelerated Computing with CUDA C/C++”. This course explores the structure of an NVIDIA GPU, how to replace standard C/C++ methods with GPU kernels, and the first steps in optimizing your code to get the best out of the GPU. You will learn how to:

  • Write code to be executed by a GPU accelerator
  • Expose and express data and instruction-level parallelism in C/C++ applications using CUDA
  • Utilize CUDA-managed memory and optimize memory migration using asynchronous prefetching
  • Leverage command line and visual profilers to guide your work
  • Utilize concurrent streams for instruction-level parallelism
  • Write GPU-accelerated CUDA C/C++ applications, or refactor existing CPU-only applications, using a profile-driven approach

Upon completion, you’ll be able to use CUDA to compile and launch CUDA kernels to accelerate your C/C++ applications on NVIDIA GPUs.
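
To give a feel for how a kernel launch maps threads onto array elements, the following Python sketch emulates a CUDA grid-stride loop on the CPU. It is an illustration only and is not course material: the emulated threads run one after another here, whereas on a GPU they run in parallel, and the variable names simply mirror CUDA's `blockIdx`, `blockDim` and `gridDim`.

```python
def emulate_kernel(n, grid_dim, block_dim):
    """Emulate a grid-stride loop and count how often each element is visited."""
    visits = [0] * n
    stride = grid_dim * block_dim                      # total number of emulated threads
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            idx = block_idx * block_dim + thread_idx   # this thread's first element
            while idx < n:
                visits[idx] += 1                       # stand-in for the per-element work
                idx += stride                          # jump ahead by the whole grid
    return visits
```

Whatever launch configuration is chosen, each of the `n` elements is visited exactly once, whether there are more threads than elements or fewer.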

All participants are expected to have basic C/C++ competency, including familiarity with variable types, loops, conditional statements, functions, and array manipulations.

Day

10:00 Accelerating Applications with CUDA C/C++

12:00 Managing Accelerated Application Memory with CUDA C/C++ Unified Memory and nvprof

13:00 Lunch

14:00 Continue Managing Accelerated Application Memory with CUDA C/C++ Unified Memory and nvprof

15:00 Asynchronous Streaming, and Visual Profiling for Accelerated Applications with CUDA C/C++

17:00 Course End

CUDA C/C++ CodeCamp

Who will attend

There will be a mix of local/DiRAC researchers, supported by a DLI-registered trainer. All you need is a laptop and a willingness to learn.

Venue

The hackathon is located in the beautiful city of Durham at the Ogden Centre, Durham University, South Road DH1 3LE. The event will be held in room OCW017.

Accommodation

If accommodation is required, individuals will have to pay for their rooms themselves.

We recommend the hotel below:

Travelodge Durham

Staying at this venue is highly preferred, since it maximises networking opportunities and ensures all participants can get the most out of the event.

Travel & Meals

Participants are expected to cover their travel and meal costs.

Taxi

  • Carefree: 0754 034 2450
  • Paddy’s Taxis: 0191 386 6662
  • Sherburn Taxis: 0191 372 3388

Important dates

  • 2nd March: application deadline
  • 17th March: event welcome

Contact

If you need any additional information, please do not hesitate to contact DiRAC’s Training Manager: Richard Regan.

  • Tel: 0191 3343632
  • email: richard.regan@durham.ac.uk

GPU CodeCamp Itinerary

Due to the varied group of participants, it has been decided that the first day will focus on CUDA Python and the second day on C/C++ with OpenACC.

The CUDA Python program

The program will follow the standard NVIDIA one-day course “Fundamentals of Accelerated Computing with CUDA Python”. This course explores how to use Numba, the just-in-time, type-specializing Python function compiler, to accelerate Python programs to run on massively parallel NVIDIA GPUs. You’ll learn how to:

  • Use Numba to compile CUDA kernels from NumPy universal functions (ufuncs).
  • Use Numba to create and launch custom CUDA kernels.
  • Apply key GPU memory management techniques.

Upon completion, you’ll be able to use Numba to compile and launch CUDA kernels to accelerate your Python applications on NVIDIA GPUs.

The OpenACC program

The program will follow the Linux Academy “Introduction to OpenACC – NVIDIA OpenACC Online Lab”.

OpenACC.org, Amazon Web Services, NVIDIA, and Linux Academy have organized the Introduction to OpenACC lab. This lab consists of three instructor-led classes that include interactive lectures, dedicated Q&A sessions, and hands-on exercises. The lab covers analyzing performance, parallelizing, and optimizing code.

Experience programming in C, C++, or Fortran is helpful but not required. You do not need any prior experience with OpenACC directives or GPU programming to complete this lab.

Day 1

10:00 Introduction to CUDA with Numba

12:30 Lunch

13:30 Custom CUDA Kernels in Python with Numba

15:30 Multidimensional Grids and Shared Memory for CUDA Python with Numba

19:30 Evening Meal

Day 2

09:00 Introduction to OpenACC – NVIDIA OpenACC Online Lab

12:30 Lunch

13:30 Participants work on their codes

16:30 Feedback

17:00 The End