Fundamentals of Accelerated Computing with CUDA C/C++
Overall, this course equips participants with the foundational skills and knowledge required to write efficient CUDA code, manage GPU memory, and leverage the full potential of GPU acceleration in their applications. The fundamentals course covers the following key topics:
CUDA Memory Management: Users learn the essential CUDA memory management techniques that are crucial for optimizing accelerated applications.
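As an illustration of one such technique (a sketch, not taken from the course materials), the example below allocates unified memory with cudaMallocManaged and prefetches it to the GPU before a kernel launch to avoid on-demand page migration; the kernel name and problem size are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: doubles each element in place.
__global__ void doubleElements(float *a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    float *a;

    // Unified memory is accessible from both host and device.
    cudaMallocManaged(&a, n * sizeof(float));
    for (int i = 0; i < n; ++i) a[i] = 1.0f;

    // Prefetching to the GPU avoids page faults and on-demand
    // migration during the kernel's execution.
    int device;
    cudaGetDevice(&device);
    cudaMemPrefetchAsync(a, n * sizeof(float), device);

    doubleElements<<<(n + 255) / 256, 256>>>(a, n);

    // Prefetch back to the host before CPU access.
    cudaMemPrefetchAsync(a, n * sizeof(float), cudaCpuDeviceId);
    cudaDeviceSynchronize();

    printf("a[0] = %f\n", a[0]);
    cudaFree(a);
    return 0;
}
```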
Concurrency and CUDA Streams: The course also explores how to identify opportunities for concurrency within accelerated applications and how to exploit them effectively with CUDA streams.
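A minimal sketch of the idea (the kernel and buffer sizes are hypothetical): work issued to different non-default streams may execute concurrently, while work in the same stream is serialized.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel used only to occupy the GPU.
__global__ void work(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 0.5f + 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *a, *b;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));

    // Two independent streams: the two launches below are not
    // ordered with respect to each other and may overlap.
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    work<<<(n + 255) / 256, 256, 0, s1>>>(a, n);
    work<<<(n + 255) / 256, 256, 0, s2>>>(b, n);

    cudaDeviceSynchronize();
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```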
Profiling Tools: Users also gain proficiency with both command-line and visual profiling tools. Profiling is a critical step in monitoring and optimizing CUDA code, ensuring that it performs as expected.
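A typical command-line profiling session with Nsight Systems might look like the following; the application name saxpy is hypothetical.

```shell
# Compile with nvcc, then collect a timeline with Nsight Systems.
nvcc -o saxpy saxpy.cu
nsys profile --stats=true -o saxpy-report ./saxpy

# The generated report file can then be opened in the
# Nsight Systems GUI to inspect the timeline visually.
```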
Accelerating CUDA C++ Applications with Multiple GPUs
This multi-GPU course builds upon the fundamentals course and is designed to empower users to fully harness the capabilities of multiple GPUs within a single node. It addresses the primary challenges of managing memory transfers and concurrency, and covers:
Concurrent CUDA Streams: Users learn how to use concurrent CUDA streams effectively to overlap memory transfers with GPU computation and optimize GPU utilization.
Scaling Workloads: The course provides insights into utilizing all available GPUs on a single node to efficiently scale workloads across these GPUs, allowing for increased parallel processing and faster computations.
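A hedged sketch of this pattern: cudaSetDevice selects which GPU subsequent CUDA calls target, so a loop over devices can assign each one a chunk of the workload. The kernel and chunk arithmetic are hypothetical, and the sketch assumes the element count divides evenly across GPUs.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel operating on one GPU's chunk of the data.
__global__ void work(float *chunk, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) chunk[i] += 1.0f;
}

int main() {
    int numGpus;
    cudaGetDeviceCount(&numGpus);

    const int n = 1 << 22;
    const int chunkSize = n / numGpus;  // assumes n divides evenly
    float **chunks = new float*[numGpus];

    // One chunk per GPU: cudaSetDevice makes the given device the
    // target of the allocation and kernel launch that follow.
    for (int d = 0; d < numGpus; ++d) {
        cudaSetDevice(d);
        cudaMalloc(&chunks[d], chunkSize * sizeof(float));
        work<<<(chunkSize + 255) / 256, 256>>>(chunks[d], chunkSize);
    }

    // Synchronize and clean up each device.
    for (int d = 0; d < numGpus; ++d) {
        cudaSetDevice(d);
        cudaDeviceSynchronize();
        cudaFree(chunks[d]);
    }
    delete[] chunks;
    return 0;
}
```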
Copy/Compute Overlap: Users explore how to combine the use of copy/compute overlap techniques with multiple GPUs for improved performance and efficiency.
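The combination can be sketched as follows (a minimal illustration, not the course's code): each GPU gets its own stream, and the copy-in, kernel, and copy-out for one GPU's chunk are issued asynchronously into that stream, so one device's transfers can overlap another's computation. Pinned host memory is used because truly asynchronous host-device copies require it; the kernel and sizes are hypothetical.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel operating on one GPU's chunk.
__global__ void work(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

int main() {
    int numGpus;
    cudaGetDeviceCount(&numGpus);

    const int n = 1 << 22;
    const int chunk = n / numGpus;  // assumes even division

    // Pinned (page-locked) host memory enables truly asynchronous
    // host<->device copies.
    float *h;
    cudaMallocHost(&h, n * sizeof(float));

    float **d = new float*[numGpus];
    cudaStream_t *streams = new cudaStream_t[numGpus];

    for (int g = 0; g < numGpus; ++g) {
        cudaSetDevice(g);
        cudaMalloc(&d[g], chunk * sizeof(float));
        cudaStreamCreate(&streams[g]);

        // Copy in, compute, copy out -- all in one stream per GPU,
        // so the GPUs' transfer and compute phases can overlap.
        cudaMemcpyAsync(d[g], h + g * chunk, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, streams[g]);
        work<<<(chunk + 255) / 256, 256, 0, streams[g]>>>(d[g], chunk);
        cudaMemcpyAsync(h + g * chunk, d[g], chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[g]);
    }

    for (int g = 0; g < numGpus; ++g) {
        cudaSetDevice(g);
        cudaStreamSynchronize(streams[g]);
        cudaStreamDestroy(streams[g]);
        cudaFree(d[g]);
    }
    cudaFreeHost(h);
    delete[] streams;
    delete[] d;
    return 0;
}
```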
Nsight™ Systems Visual Profiler: The course teaches how to use the NVIDIA Nsight™ Systems timeline to identify opportunities for improvement and to assess the impact of the techniques covered in the workshop.