DiRAC Services support a significant portion of STFC’s science programme, providing simulation and data modelling resources for the UK Frontier Science theory community in particle physics, astroparticle physics, astrophysics, cosmology, solar system and planetary science, and nuclear physics (PPAN; collectively STFC Frontier Science). DiRAC services are optimised for these research communities and operate as a single distributed facility, which provides the range of architectures needed to deliver our world-leading science outcomes.
Information on how to apply for time on our Services can be found here, and how our Services map onto our Science agenda can be found here. The DiRAC Data Management Plan is available for download here.
Each of our DiRAC-3 sites has an online user guide covering the service, how to log on, and updates on downtime or upcoming server room and data centre maintenance.
We have also created a user-friendly GitHub Wiki page that brings the site and service information together in one place and shares the Tips and Tricks compiled by our Research Software Engineers.
Data Intensive Service
The Data Intensive Service is jointly hosted by the Universities of Cambridge and Leicester.
DiRAC has a part share of the CSD3 petascale HPC platform (Cumulus) hosted at the University of Cambridge.
The Cumulus system currently consists of several components:
- 412 Ice Lake CPU nodes, each with 2 x Intel Xeon Platinum 8368Q processors, 2.60GHz 38-core (76 cores per node):
  - 296 nodes with 256 GiB memory
  - 116 nodes with 512 GiB memory
- 672 Cascade Lake CPU nodes, each with 2 x Intel Xeon Platinum 8276 processors, 2.6GHz 28-core (56 cores per node):
  - 616 nodes with 192 GiB memory
  - 56 nodes with 384 GiB memory
- 384 Skylake CPU nodes, each with 2 x Intel Xeon Gold 6142 processors, 2.6GHz 16-core (32 cores per node) and 192 GiB memory
- 80 Ampere GPU nodes (Wilkes-3), each with 4x NVIDIA A100-SXM-80GB GPUs, 2x AMD EPYC 7763 processors, 1.8GHz 64-core (128 cores per node), and 1TiB RAM
The HPC interconnect is:
- Intel OmniPath, 2:1 blocking (Skylake)
- Mellanox HDR InfiniBand, 3:1 blocking (Cascade Lake, Ice Lake and Wilkes-3)
Storage consists of 23PiB of disk storage configured as multiple Lustre parallel filesystems.
The operating system is based on Red Hat Enterprise Linux, and resource management is performed by Slurm.
For more information see the site specific user guide.
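As a rough illustration of scale, the per-partition figures above can be summed into whole-system totals for the Cumulus CPU nodes. This is a minimal sketch that only transcribes the numbers listed above. The totals describe the whole platform, not DiRAC's part share, and they leave out the host CPUs of the Wilkes-3 GPU nodes.

```python
# Per-partition figures transcribed from the Cumulus component list.
# Each tuple: (partition label, node count, cores per node, memory per node in GiB)
CUMULUS_CPU_PARTITIONS = [
    ("Ice Lake (256 GiB)", 296, 76, 256),
    ("Ice Lake (512 GiB)", 116, 76, 512),
    ("Cascade Lake (192 GiB)", 616, 56, 192),
    ("Cascade Lake (384 GiB)", 56, 56, 384),
    ("Skylake", 384, 32, 192),
]


def totals(partitions):
    """Return (nodes, cores, memory in TiB) summed over all CPU partitions."""
    nodes = sum(n for _, n, _, _ in partitions)
    cores = sum(n * c for _, n, c, _ in partitions)
    mem_gib = sum(n * m for _, n, _, m in partitions)
    return nodes, cores, mem_gib / 1024  # 1 TiB = 1024 GiB


if __name__ == "__main__":
    nodes, cores, mem_tib = totals(CUMULUS_CPU_PARTITIONS)
    print(f"{nodes} CPU nodes, {cores} cores, {mem_tib:.1f} TiB RAM")
    # Wilkes-3: 80 GPU nodes with 4 A100 GPUs each
    print(f"{80 * 4} A100 GPUs on Wilkes-3")
```

Run as a script, this reports 1468 CPU nodes, 81232 cores and 340.5 TiB of RAM, plus 320 A100 GPUs on Wilkes-3.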
The DIaL system spec has:
- 25,600 AMD cores running at 2.25/3.4GHz
- 102TB of system memory
- 200Gbps HDR IB 3:1 blocking interconnect
- 4TB file space
Each of the 200 nodes has:
- 2 * AMD EPYC Rome 7742 CPUs, each with 64 cores, giving 128 cores per node running at 2.25/3.4GHz
- 512GB of system memory, giving 3.9GB per CPU core
- 200Gbps HDR IB interconnect
- Running CentOS7
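The system-level DIaL figures follow directly from the per-node specification. The short sketch below shows the arithmetic, assuming decimal units (1 TB = 1000 GB). The helper `system_totals` is a hypothetical name used only for illustration.

```python
def system_totals(nodes, sockets_per_node, cores_per_socket, mem_per_node_gb):
    """Return (total cores, total memory in decimal TB) for a homogeneous cluster."""
    cores = nodes * sockets_per_node * cores_per_socket
    mem_tb = nodes * mem_per_node_gb / 1000  # assumes 1 TB = 1000 GB
    return cores, mem_tb


# DIaL node spec: 200 nodes, 2 x 64-core EPYC 7742 CPUs, 512GB of memory each
dial_cores, dial_mem_tb = system_totals(200, 2, 64, 512)
print(dial_cores, dial_mem_tb)  # 25600 102.4
```

The result agrees with the quoted 25,600 cores and with the 102TB of system memory, which is 102.4TB rounded down.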
Data Intensive 2.5x
The DI system has two login nodes, Mellanox EDR interconnect in a 2:1 blocking setup and 3PB Lustre storage.
- 408 dual-socket nodes with Intel Xeon Skylake 6140 processors (two AVX-512 FMA units, 2.3GHz), each with 36 cores and 192 GB RAM; 14688 cores and 3.5PB storage in total.
- 1 x 6TB server with 144 cores (X6154 @ 3.0GHz base)
- 10 x 1.5TB servers, each with 36 cores (X6240 @ 2.3GHz base)
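The node counts above can be cross-checked against per-socket core counts. This sketch assumes Intel's published figure of 18 cores per socket for the Xeon Gold 6140, 6154 and 6240; those per-socket counts are not stated on this page. With that assumption, the 6TB server works out to an 8-socket machine and each 1.5TB server to a dual-socket machine.

```python
# Cores per socket for the Xeon parts named above.
# Assumption: Intel's published 18-core counts; not stated on this page.
CORES_PER_SOCKET = {"6140": 18, "6154": 18, "6240": 18}


def sockets(total_cores, model):
    """Number of sockets of a given model needed to supply total_cores."""
    per_socket = CORES_PER_SOCKET[model]
    if total_cores % per_socket:
        raise ValueError(f"{total_cores} cores is not a whole number of {model} sockets")
    return total_cores // per_socket


compute_cores = 408 * 2 * CORES_PER_SOCKET["6140"]  # dual-socket compute nodes
fat_sockets = sockets(144, "6154")                  # the 6TB server
medium_sockets = sockets(36, "6240")                # each 1.5TB server
print(compute_cores, fat_sockets, medium_sockets)   # 14688 8 2
```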
The DI System at Leicester is designed to offer fast, responsive I/O.
A site specific user guide is available here.
Memory Intensive Service (COSMA)
Memory Intensive 3 (COSMA8)
The DiRAC-3 Memory Intensive service (COSMA8) was installed in 2021 and became operational in October of that year. It comprises:
- 360 compute nodes, each with 128 cores (2x AMD 7H12 processors) and 1TB RAM, connected by a non-blocking HDR200 InfiniBand network
- 2x 2TB login nodes with 64 cores (dual AMD Rome 7542 processors)
- Two fat nodes with 4TB RAM and 128 cores
- GPU nodes with NVIDIA A100, V100 and AMD MI50 and MI100 GPUs
- 5PB bulk Lustre storage
- 1.2PB fast scratch storage (~350GBytes/s)
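To get a feel for the scale of COSMA8, the sketch below aggregates the compute-node memory and estimates how long it would take to write all of it to the fast scratch. This is an idealised estimate that assumes decimal units (1 TB = 1000 GB) and a sustained ~350 GB/s, which real jobs are unlikely to hit exactly. The helper names are hypothetical.

```python
NODES = 360
CORES_PER_NODE = 128      # 2 x 64-core AMD 7H12
RAM_PER_NODE_TB = 1.0
SCRATCH_GB_PER_S = 350.0  # "~350GBytes/s" quoted for the fast scratch


def aggregate(nodes, cores_per_node, ram_per_node_tb):
    """Return (total cores, total RAM in TB) across the compute nodes."""
    return nodes * cores_per_node, nodes * ram_per_node_tb


def full_memory_dump_seconds(ram_tb, bandwidth_gb_per_s):
    """Idealised time to write ram_tb terabytes at a sustained bandwidth (decimal units)."""
    return ram_tb * 1000 / bandwidth_gb_per_s


cosma8_cores, cosma8_ram_tb = aggregate(NODES, CORES_PER_NODE, RAM_PER_NODE_TB)
dump_s = full_memory_dump_seconds(cosma8_ram_tb, SCRATCH_GB_PER_S)
print(cosma8_cores, cosma8_ram_tb, round(dump_s))  # 46080 360.0 1029
```

The 360TB of aggregate RAM is consistent with the "more than 300 terabytes" that the cosmological simulations described on this page require. Under these assumptions a full-memory checkpoint takes roughly 17 minutes.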
Memory Intensive 2.5x (COSMA7)
The DiRAC-2.5x Memory Intensive service (COSMA7) was installed in 2018. It comprises:
- 2x 1.5TB and 1x 768GB login nodes with Intel Xeon 5120 Skylake processors (one AVX-512 FMA unit, 2.2GHz), 28 cores
- 452 compute nodes, each with 512 GB of RAM and 2 x X5120 processors at 2.2GHz, offering a total of 12,656 cores
The system is connected via Mellanox EDR in a 2:1 blocking configuration, with 512TB of fast I/O scratch space and 3.1PB of data space on Lustre.
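Memory per core is often the figure that matters when sizing a job. The sketch below compares COSMA7 with COSMA8 using the node specifications given above. It assumes COSMA8's 1TB per node means 1000GB, and the `describe` helper is a hypothetical name for illustration.

```python
SYSTEMS = {
    # name: (compute nodes, cores per node, RAM per node in GB)
    "COSMA8": (360, 128, 1000),  # assumes 1TB = 1000GB
    "COSMA7": (452, 28, 512),
}


def describe(name):
    """Return total cores and RAM per core (GB) for a COSMA system."""
    nodes, cores_per_node, ram_gb = SYSTEMS[name]
    return {
        "total_cores": nodes * cores_per_node,
        "ram_per_core_gb": ram_gb / cores_per_node,
    }


for name in SYSTEMS:
    info = describe(name)
    print(name, info["total_cores"], f"{info['ram_per_core_gb']:.1f} GB/core")
```

COSMA8 has far more cores in total, but COSMA7 offers more memory per core (about 18.3GB against about 7.8GB).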
Memory Intensive 2 (Formerly “Data Centric”, now COSMA6)
The COSMA6 cluster has about 9000 cores. Approximately 570 nodes, each with 128GB of memory, are connected via a Mellanox FDR10 InfiniBand fabric with 2:1 blocking. Storage capacity on COSMA6 is 2.6PB.
The IB fabric connects COSMA6 to the Lustre filesystem, with I/O performance of 10-11GB/s write and 5-6GB/s read.
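The asymmetric bandwidth means that reading data back takes roughly twice as long as writing it. The sketch below turns the quoted ranges into time ranges for a 1,000GB snapshot, a hypothetical size chosen only for illustration. It assumes the quoted aggregate rates are sustained for the whole transfer.

```python
WRITE_GB_PER_S = (10.0, 11.0)  # quoted COSMA6 write range
READ_GB_PER_S = (5.0, 6.0)     # quoted COSMA6 read range


def transfer_time_range(size_gb, rate_range):
    """Return (fastest, slowest) seconds to move size_gb at the quoted rate range."""
    low, high = rate_range
    return size_gb / high, size_gb / low


SNAPSHOT_GB = 1000  # hypothetical snapshot size
w_fast, w_slow = transfer_time_range(SNAPSHOT_GB, WRITE_GB_PER_S)
r_fast, r_slow = transfer_time_range(SNAPSHOT_GB, READ_GB_PER_S)
print(f"write {w_fast:.0f}-{w_slow:.0f} s, read {r_fast:.0f}-{r_slow:.0f} s")
```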
More information on the Memory Intensive (COSMA) system can be found here and further enquiries on the Memory Intensive Service can be emailed to email@example.com. A site specific user guide can be found here.
Extreme Scaling Service
DiRAC Extreme Scaling ‘Tursa’
Based in Edinburgh and locally named ‘Tursa’, this system is used predominantly by the GRID team. The service is aimed at computationally intensive codes with a relatively small data footprint per core but high data transfer requirements. Tursa has two clusters: a large GPU-based cluster of 112 nodes and a small AMD-based cluster.
The ES system spec has:
- 14592 AMD CPU cores running at 2.6/3.3GHz
- 114TB of system memory
- 448 * A100 Nvidia GPUs
- 200Gbps HDR IB non-blocking interconnect
- 8PB Tape backup
To support the required workloads each of the 112 GPU nodes has:
- 2 * AMD EPYC Rome 7H12 CPUs, each with 64 cores, giving 128 cores per node running at 2.6/3.3GHz
- 1TB of system memory, giving 7.8GB per CPU core
- 4 * NVIDIA A100 GPU cards, each with 6912 FP32 CUDA cores, 40GB of on-board memory, and 432 tensor cores running at 765/1410MHz, giving 27,648 CUDA cores and 160GB of GPU memory per node
- The GPU cards are connected via NVLink, giving a memory transfer speed between cards of 4800Gbps
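The per-node Tursa figures above can be rolled up into the system-level numbers, and the NVLink rate can be converted from gigabits to gigabytes per second. This is a minimal sketch under stated assumptions: decimal units (1TB = 1000GB, 8 bits per byte), and the guess that the 256 cores separating the 112 GPU nodes from the quoted 14,592-core total belong to the small AMD cluster, which this page does not spell out.

```python
A100_FP32_CUDA_CORES = 6912
A100_MEMORY_GB = 40
GPUS_PER_NODE = 4
GPU_NODES = 112
CPU_CORES_PER_NODE = 128   # 2 x 64-core EPYC 7H12
NODE_RAM_GB = 1000         # 1TB, assuming decimal units
NVLINK_GBPS = 4800         # quoted NVLink rate in gigabits per second

cuda_cores_per_node = GPUS_PER_NODE * A100_FP32_CUDA_CORES
gpu_mem_per_node_gb = GPUS_PER_NODE * A100_MEMORY_GB
total_gpus = GPU_NODES * GPUS_PER_NODE
gpu_node_cpu_cores = GPU_NODES * CPU_CORES_PER_NODE
mem_per_core_gb = NODE_RAM_GB / CPU_CORES_PER_NODE
nvlink_gbytes = NVLINK_GBPS / 8  # bits to bytes

# Assumption: the remainder of the quoted 14,592 cores is the small AMD cluster.
other_cores = 14592 - gpu_node_cpu_cores

print(cuda_cores_per_node, gpu_mem_per_node_gb, total_gpus)
print(gpu_node_cpu_cores, other_cores, mem_per_core_gb, nvlink_gbytes)
```

The per-core figure of 7.8125GB matches the 7.8GB quoted above, and 4800Gbps corresponds to 600GB/s between cards.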
DiRAC Extreme Scaling ‘Tesseract’
The Extreme Scaling Service is hosted by the University of Edinburgh. DiRAC Extreme Scaling (also known as Tesseract) is available to industry, commerce and academic researchers. General information on Tesseract, as well as the User Guide, is available here.
The Tesseract compute service is based around an HPE SGI 8600 system with 1476 compute nodes.
There are 1468 standard compute nodes, each with two 2.1 GHz, 12-core Intel Xeon (Skylake) Silver 4116 processors and 96 GB of memory. In addition, there are 8 GPU compute nodes, each with two 2.1 GHz, 12-core Intel Xeon (Skylake) Silver 4116 processors, 96 GB of memory, and 4 NVIDIA V100 (Volta) GPU accelerators connected over NVLink.
All compute nodes are connected together by a single Intel Omni-Path fabric and all nodes access the 3 PB Lustre file system.
As well as the fast, parallel Lustre storage, Tesseract also provides a tiered storage solution based on zero watt disk storage and tape storage built on the HPE DMF solution.
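The Tesseract node counts above imply the following system totals. This sketch assumes that the GPU nodes, which use the same two 12-core Silver 4116 CPUs and 96 GB of memory, are counted alongside the standard nodes.

```python
CPU_NODES = 1468
GPU_NODES = 8
CORES_PER_NODE = 2 * 12   # two 12-core Xeon Silver 4116 processors
MEM_PER_NODE_GB = 96
V100_PER_GPU_NODE = 4

nodes = CPU_NODES + GPU_NODES
cores = nodes * CORES_PER_NODE
mem_gb = nodes * MEM_PER_NODE_GB
gpus = GPU_NODES * V100_PER_GPU_NODE
mem_per_core_gb = MEM_PER_NODE_GB / CORES_PER_NODE

print(nodes, cores, mem_gb, gpus, mem_per_core_gb)  # 1476 35424 141696 32 4.0
```

The node total matches the 1476 compute nodes quoted above. The 4 GB per core is modest compared with the Memory Intensive systems, which fits the Extreme Scaling focus on codes with a small data footprint per core.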
Our Services Supporting our Science
DiRAC operates within a framework of well-established science cases which have been fully peer reviewed to deliver a transformative research programme aimed at creating novel and improved computing techniques and facilities. We tailor our Services’ architectures towards solving these science problems and, in doing so, help to underpin research covering the full remit of STFC’s astronomy, particle, nuclear and accelerator physics Science Challenges. Some brief illustrations of how our Services map onto our Science Agenda can be found below, and for more information please email the Project Office.
The Data Intensive Service addresses the problems associated with driving scientific discovery through the analysis of large data sets using a combination of modelling and simulation, e.g. the large-volume data sets from flagship astronomical satellites such as Planck and Gaia, and ground-based facilities such as the Square Kilometre Array (SKA). One project using the Data Intensive Service is looking at breaking resonances between migrating planets.
The Memory Intensive Service supports detailed and complex simulations of computational fluid dynamics problems, for example cosmological simulations of galaxy formation and evolution, which require access to very large amounts of memory (more than 300 terabytes) to enable codes to ‘follow’ structures as they form. The innovative design of this Service supports physically detailed simulations which can use an entire DiRAC machine for weeks or months at a time. More on the Virgo project, which uses the Memory Intensive Service, can be found here.
The Extreme Scaling Service supports codes that make full use of multi-petaflop HPC systems. DiRAC works with industry on the design of systems using Lattice QCD in theoretical particle physics as a driver. This field of physics provides theoretical input on the properties of hadrons to assist with the interpretation of data from experiments such as the Large Hadron Collider. To find out more about one of the Lattice QCD projects using the Extreme Scaling Service see the 2017 Science Highlights page.