EXTREME SCALING SERVICE (EDINBURGH)

EXTREME SCALING AT EDINBURGH

System name: Tursa

Tursa is optimised for particle physics calculations probing the fundamental properties of matter, in particular how the properties of strongly-interacting particles known as hadrons are determined by their constituent quarks and gluons. One goal of this programme is to identify discrepancies between the standard model underpinning our current understanding of particle physics and current experiments aimed at revealing new laws of physics. This requires incredibly precise calculations, matching the precision of experimental measurements at the Large Hadron Collider and other major international facilities, which in turn requires the generation of large Monte Carlo ensembles.
 
Efficient simulation of local, translationally-invariant quantum field theories requires not just the computational power of GPUs, but also low-latency communication between nodes. Tursa has one 200 Gb/s InfiniBand network card per GPU (4 cards per node) and a non-blocking network. Highly optimised I/O is also needed so that results can be moved rapidly into storage and re-loaded for future calculations, so our acceptance tests included a dedicated mini-app based on the I/O patterns of full-scale production runs.
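
As a rough illustration of what one card per GPU means for a node, the sketch below adds up the nominal link rates and converts from gigabits to gigabytes per second. It uses only the figures quoted above; the variable names are illustrative, and nominal rates ignore protocol overhead, so achieved bandwidth will be lower.

```python
# Nominal network figures quoted above.
NICS_PER_NODE = 4        # one InfiniBand card per GPU, four GPUs per node
LINK_RATE_GBIT_S = 200   # nominal rate per card, in gigabits per second

# Aggregate nominal rate per node, in gigabits and gigabytes per second.
aggregate_gbit_s = NICS_PER_NODE * LINK_RATE_GBIT_S
aggregate_gbyte_s = aggregate_gbit_s / 8  # 8 bits per byte

print(f"Per node: {aggregate_gbit_s} Gb/s, i.e. {aggregate_gbyte_s:.0f} GB/s nominal")
```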
 
The deployment of Tursa was an outstanding example of co-design in action. The table below shows how system performance increased between the pre-tender benchmarking tests and the final deployment. Initial measurements at the time of the tender process gave a sustained performance of 5.3 TF for a production run on 16 nodes. ATOS committed to 5.83 TF in their tender submission and, by the end of the formal DiRAC acceptance process, the delivered performance had risen to 6.15 TF. Over the three-month technical commissioning period, successive rounds of hardware and software tuning by DiRAC software engineers, ATOS engineers and the Edinburgh ACF technical team increased the sustained performance to 8.8 TF, with a peak of 9.9 TF.
 
Stage            16-Node Performance (TFlop)
Measured         5.3
Committed        5.83
Acceptance       6.15
Commissioning    8.8
Peak             9.9

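
To put the stages in proportion, the short sketch below expresses each 16-node figure as a multiple of the original pre-tender measurement. The numbers are those quoted in the text; the dictionary and function names are illustrative and not part of any DiRAC tooling.

```python
# 16-node sustained performance (TF) at each stage, as quoted in the text.
STAGES = {
    "Measured": 5.3,
    "Committed": 5.83,
    "Acceptance": 6.15,
    "Commissioning": 8.8,
    "Peak": 9.9,
}


def gain_over_baseline(stages, baseline="Measured"):
    """Return each stage's performance as a multiple of the baseline stage."""
    base = stages[baseline]
    return {name: value / base for name, value in stages.items()}


if __name__ == "__main__":
    for name, ratio in gain_over_baseline(STAGES).items():
        print(f"{name:<14}{STAGES[name]:>6.2f} TF   x{ratio:.2f}")
```

On these figures, commissioning lifted sustained performance to about 1.66 times the original measurement.
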
Since deployment, the widely discussed convergence of the computing requirements of AI and simulation workflows has continued apace. Tursa’s design specifications are very well matched to those needed for training large language models in particular, making Tursa the first UK example of a system that can support both capability QCD simulations and state-of-the-art AI calculations.
 
As it is GPU-based, Tursa is extremely energy efficient. Compared to its predecessor, Tursa delivers approximately 5 times more scientific throughput using only half the power.  
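
Taking the two quoted ratios at face value, the implied gain in throughput per unit of power is their quotient. A minimal sketch, with illustrative variable names and the approximate figures from the paragraph above:

```python
# Ratios relative to the predecessor system, as quoted above (approximate).
throughput_ratio = 5.0  # about 5 times more scientific throughput
power_ratio = 0.5       # about half the power

# Throughput per unit of power, relative to the predecessor.
efficiency_gain = throughput_ratio / power_ratio

print(f"Throughput per watt: roughly {efficiency_gain:.0f}x the predecessor")
```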

TURSA

TURSA COMPRISES
4272 AMD CPU CORES RUNNING AT 2.6/3.3GHz
178TB OF SYSTEM MEMORY
712 * A100 NVIDIA GPUs
200Gb/s HDR IB NON-BLOCKING INTERCONNECT
8PB TAPE BACKUP
 
TO SUPPORT THE REQUIRED WORKLOADS, EACH OF THE 112 GPU NODES HAS:
2 * AMD EPYC ROME CPUs, EACH WITH 12 CORES (24 CORES PER NODE)
1TB OF SYSTEM MEMORY, GIVING ∼ 40GB PER CPU CORE
4 * NVIDIA A100 GPU CARDS, EACH WITH 40GB OF ON-BOARD MEMORY AND 432 TENSOR CORES RUNNING AT 765/1310MHz, GIVING 27,648 CUDA CORES AND 160GB OF GPU MEMORY PER NODE
THE GPU CARDS ARE CONNECTED VIA NVLINK, GIVING MEMORY TRANSFER SPEED BETWEEN CARDS OF 4800Gb/s (600GB/s)
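
The per-node figures in the list above can be cross-checked with a few lines of arithmetic. This is a sketch only: the constant names are illustrative, and 1TB is assumed here to mean 1024GB.

```python
# Per-node figures quoted in the list above.
GPUS_PER_NODE = 4
GPU_MEMORY_GB = 40            # on-board memory per A100 card
CUDA_CORES_PER_NODE = 27_648
CPU_CORES_PER_NODE = 2 * 12   # two 12-core CPUs
SYSTEM_MEMORY_GB = 1024       # assumption: 1TB taken as 1024GB

# Derived quantities.
cuda_cores_per_gpu = CUDA_CORES_PER_NODE // GPUS_PER_NODE
gpu_memory_per_node_gb = GPUS_PER_NODE * GPU_MEMORY_GB
memory_per_cpu_core_gb = SYSTEM_MEMORY_GB / CPU_CORES_PER_NODE

print(f"CUDA cores per GPU:       {cuda_cores_per_gpu}")
print(f"GPU memory per node:      {gpu_memory_per_node_gb}GB")
print(f"System memory per core:   {memory_per_cpu_core_gb:.1f}GB")
```

The last figure comes out slightly above the rounded value of about 40GB per CPU core quoted in the list.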
 
CPU-ONLY PARTITIONS, COMPRISING 6 NODES EACH WITH:
DUAL AMD PROCESSORS (64 CORES EACH)
256GB RAM
 
STORAGE
4PB LUSTRE STORAGE
 
BENCHMARK CODES
THE BENCHMARK CODES USED FOR THE DESIGN AND TESTING OF THE TURSA SYSTEM WERE:
GRID: A KEY CODE FOR THE THEORETICAL PARTICLE PHYSICS COMMUNITY, GRID IS DIRAC’S MOST HIGHLY OPTIMISED SCIENCE CODE
GRIDIO: A BENCHMARK SPECIFICALLY DESIGNED TO TEST THE PERFORMANCE OF THE I/O AND STORAGE SYSTEMS FOR I/O PATTERNS MATCHING THOSE OF PRODUCTION CALCULATIONS
SOMBRERO: ANOTHER IMPORTANT CODE FOR PARTICLE PHYSICS, WHICH WAS USED FOR COMPARATIVE BENCHMARKING ACROSS ALL DIRAC-3 SYSTEMS

SITE-SPECIFIC USER GUIDE

Our site-specific user guide, hosted by EPCC, contains full instructions for using Tursa.

DATA MANAGEMENT PLAN

SCIENCE ON EXTREME SCALING SERVICE (EDINBURGH)