DiRAC Services support a significant portion of STFC’s science programme, providing simulation and data-modelling resources for the UK Frontier Science theory community in particle physics, astroparticle physics, astrophysics, cosmology, solar system & planetary science and nuclear physics (PPAN; collectively STFC Frontier Science). DiRAC services are optimised for these research communities and operate as a single distributed facility providing the range of architectures needed to deliver our world-leading science outcomes.

Based at four University sites (Cambridge, Leicester, Durham and Edinburgh), we host three Services: Data Intensive (hosted jointly at Cambridge and Leicester), Memory Intensive and Extreme Scaling.

Information on how to apply for time on our Services can be found here, and how our Services map onto our Science agenda can be found here. The DiRAC Data Management Plan is available for download here.

For general enquiries, please email DiRAC Support or the Project Office.



Each of our DiRAC-3 sites has an online user guide providing all the information about the service, including how to log in, along with updates on downtime and upcoming server room or data centre maintenance.

We have also created a user-friendly GitHub Wiki to share the site and service information in one place, along with tips and tricks compiled by our Research Software Engineers.



Data Intensive Service

The Data Intensive Service is jointly hosted by the Universities of Cambridge and Leicester.

Data Intensive@Cambridge

DiRAC has a part share of the CSD3 petascale HPC platform (Cumulus) hosted at the University of Cambridge.

Cumulus

The Cumulus system currently consists of several components:

  • 544 Ice Lake CPU nodes, each with 2x Intel Xeon Platinum 8368Q processors, 2.60GHz, 38-core (76 cores per node):
      • 296 nodes with 256 GiB memory
      • 116 nodes with 512 GiB memory
      • DiRAC has a share of 267 nodes (20,292 cores; see the check below)
  • 672 Cascade Lake CPU nodes, each with 2x Intel Xeon Platinum 8276 processors, 2.6GHz, 28-core (56 cores per node):
      • 616 nodes with 192 GiB memory
      • 56 nodes with 384 GiB memory
      • DiRAC has a share of 119 nodes (6,664 cores)
  • 80 Ampere GPU nodes (Wilkes-3), each with 4x NVIDIA A100-SXM-80GB GPUs, 2x AMD EPYC 7763 processors, 1.8GHz, 64-core (128 cores per node) and 1TiB RAM
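
The DiRAC core shares quoted above are simple products of the node counts and cores per node; the short Python sketch below is illustrative only, with all numbers copied from this page.

    # Illustrative check of the DiRAC share figures quoted above.
    ice_lake_share = 267 * 76        # 267 nodes x 76 cores/node (2x 38-core Xeon 8368Q)
    cascade_lake_share = 119 * 56    # 119 nodes x 56 cores/node (2x 28-core Xeon 8276)

    print(f"Ice Lake share: {ice_lake_share} cores")          # 20292
    print(f"Cascade Lake share: {cascade_lake_share} cores")  # 6664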

The HPC interconnect is:

  • Intel OmniPath, 2:1 blocking (Skylake)
  • Mellanox HDR InfiniBand, 3:1 blocking (Cascade Lake, Ice Lake and Wilkes-3)

Storage consists of 23PiB of disk storage configured as multiple Lustre
parallel filesystems.

The operating system is based on RedHat Enterprise Linux, and resource
management is performed by Slurm.
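
Because scheduling is handled by Slurm, partition and node details can be inspected from a login node with the standard Slurm client tools. The following is a minimal, illustrative sketch that simply wraps the stock sinfo command from Python; it assumes only that the Slurm clients are on the login-node path and uses no site-specific partition names.

    # Minimal sketch: list Slurm partitions, node counts, CPUs and memory per node.
    import subprocess

    result = subprocess.run(
        ["sinfo", "--format=%P %a %D %c %m"],  # partition, availability, nodes, CPUs/node, memory/node (MB)
        capture_output=True, text=True, check=True,
    )
    print(result.stdout)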

For more information see the site-specific user guide.


Data Intensive@Leicester

Data Intensive 3

The DIaL system has:

  • 25,600 AMD cores running at 2.25/3.4GHz
  • 102TB of system memory
  • 200Gbps HDR IB 3:1 blocking interconnect
  • 4TB file space

Each of the 200 nodes has:

  • 2 * AMD EPYC Rome 7742 CPUs, each with 64 cores, giving 128 cores per node running at 2.25/3.4GHz (see the check below)
  • 512GB of system memory, giving 3.9GB per CPU core
  • 200Gbps HDR IB interconnect
  • Running CentOS 7
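
The headline DIaL figures above follow directly from this per-node specification; the illustrative sketch below (numbers copied from this page) recovers the total core count and approximate system memory.

    # Illustrative check: DIaL system totals from the per-node figures above.
    nodes = 200
    cores_per_node = 128        # 2x 64-core AMD EPYC 7742
    memory_per_node_gb = 512

    print(f"Total cores: {nodes * cores_per_node}")                       # 25600
    print(f"Total memory: ~{nodes * memory_per_node_gb / 1000:.0f} TB")   # ~102 TB
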
Data Intensive 2.5x

The DI system has two login nodes, Mellanox EDR interconnect in a 2:1 blocking setup and 3PB Lustre storage.

Cluster
  • 408 dual-socket nodes with Intel Xeon Skylake 6140 processors (two FMA units, AVX-512), 2.3GHz, 36 cores and 192 GB RAM per node; 14,688 cores and 3.5PB of storage in total.
Large-Memory
  • 1 x 6TB server with 144 cores (Xeon 6154 @ 3.0GHz base)
  • 10 x 1.5TB servers with 36 cores (Xeon 6240 @ 2.3GHz base)

The DI System at Leicester is designed to offer fast, responsive I/O.

Further information is available on the web page or by emailing Leicester support.

A site-specific user guide is available here.


Memory Intensive Service (COSMA)

The Memory Intensive Service is hosted by the University of Durham at the Institute for Computational Cosmology (ICC). The COSMA support web pages are available here.

Memory Intensive 3 (COSMA8)

The DiRAC-3 Memory Intensive service (COSMA8) was installed in 2021, and became operational in October of that year. It comprises:

  • 360 compute nodes, each with 128 cores (2x AMD 7H12 processors), 1TB RAM and a non-blocking HDR200 InfiniBand network
  • 2x 2TB login nodes with 64 cores (dual AMD Rome 7542 processors)
  • Two fat nodes with 4TB RAM and 128 cores
  • GPU nodes with NVIDIA A100, V100 and AMD MI50 and MI100 GPUs
  • 5PB bulk Lustre storage
  • 1.2PB fast scratch storage (~350GBytes/s)

Memory Intensive 2.5x (COSMA7)


The DiRAC-2.5x Memory Intensive service (COSMA7) was installed in 2018. It comprises:

  • 2x 1.5TB and 1x 768GB login nodes with Intel Xeon 5120 (Skylake) processors: one FMA unit, AVX-512, 2.2GHz, 28 cores

  • 452 compute nodes, each with 512 GB of RAM and 2x Xeon 5120 2.2GHz processors, offering a total of 12,656 cores (see the check below).

  • The system is connected via Mellanox EDR in a 2:1 blocking configuration, with 512TB of fast I/O scratch space and 3.1PB of data space on Lustre.
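
The quoted COSMA7 core count is consistent with 452 nodes of dual Xeon 5120 processors (28 cores per node, as for the login nodes); the illustrative sketch below reproduces it from the numbers on this page.

    # Illustrative check of the COSMA7 core count quoted above.
    nodes = 452
    cores_per_node = 28   # 2x Intel Xeon 5120 per node

    print(f"Total cores: {nodes * cores_per_node}")   # 12656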

Memory Intensive 2 (Formerly “Data Centric”, now COSMA6)

  • The COSMA6 cluster has about 9,000 cores. Approximately 570 nodes each offer 128GB of memory and are connected via a Mellanox FDR10 InfiniBand fabric in a 2:1 blocking configuration. Storage capacity on COSMA6 is 2.6PB.

  • The InfiniBand fabric connects COSMA6 to its Lustre filesystem, with I/O performance of 10-11GB/s for writes and 5-6GB/s for reads.

More information on the Memory Intensive (COSMA) system can be found here, and further enquiries on the Memory Intensive Service can be emailed to cosma-support@durham.ac.uk. A site-specific user guide can be found here.


Extreme Scaling Service

DiRAC Extreme Scaling ‘Tursa’

Based in Edinburgh and locally named ‘Tursa’, this system’s workload is dominated by the GRID team. The service aims to support computationally intensive codes with a relatively small data footprint per core but high data-transfer requirements. Tursa has two clusters: a large GPU-based cluster of 112 nodes and a smaller AMD CPU-based cluster.

The ES system has:

  • 14,592 AMD CPU cores running at 2.6/3.3GHz
  • 114TB of system memory
  • 448 * NVIDIA A100 GPUs
  • 200Gbps HDR IB non-blocking interconnect
  • 8PB tape backup

To support the required workloads, each of the 112 GPU nodes has:

  • 2 * AMD EPYC Rome 7H12 CPUs, each with 64 cores, giving 128 cores per node running at 2.6/3.3GHz
  • 1TB of system memory, giving 7.8GB per CPU core
  • 4 * NVIDIA A100 GPU cards, each with 6,912 FP32 CUDA cores, 40GB of on-board memory and 432 tensor cores running at 765/1410MHz, giving 27,648 CUDA cores and 160GB of GPU memory per node (see the check below)
  • The GPU cards are connected via NVLink, giving a memory transfer speed between cards of 4,800Gbps
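
The per-node GPU totals quoted above, and the overall GPU count for the 112-node cluster, are simple products of the figures on this page, as the illustrative sketch below shows.

    # Illustrative check of the Tursa GPU figures quoted above.
    gpu_nodes = 112
    gpus_per_node = 4
    cuda_cores_per_gpu = 6912
    gpu_memory_gb_per_gpu = 40

    print(f"CUDA cores per node: {gpus_per_node * cuda_cores_per_gpu}")       # 27648
    print(f"GPU memory per node: {gpus_per_node * gpu_memory_gb_per_gpu} GB") # 160
    print(f"Total A100 GPUs: {gpu_nodes * gpus_per_node}")                    # 448
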
Further information on the Extreme Scaling Service is available by emailing DiRAC Support. A site-specific user guide can be found here.

 

Our Services Supporting our Science

DiRAC operates within a framework of well-established, fully peer-reviewed science cases, delivering a transformative research programme aimed at creating novel and improved computing techniques and facilities. We tailor our Services’ architectures towards solving these science problems and, by doing so, help underpin research covering the full remit of STFC’s astronomy, particle, nuclear and accelerator physics Science Challenges. Some brief illustrations of how our Services map onto our Science Agenda can be found below; for more information please email the Project Office.

The Data Intensive Service addresses the problems associated with driving scientific discovery through the analysis of large data sets using a combination of modelling and simulation, e.g. the large-volume data sets from flagship astronomical satellites such as Planck and Gaia, and ground-based facilities such as the Square Kilometre Array (SKA).  One project using the Data Intensive Service is looking at breaking resonances between migrating planets.

The Memory Intensive Service supports detailed and complex simulations related to Computational Fluid Dynamic problems, for example cosmological simulations of galaxy formation and evolution, which require access to very large amounts of memory (more than 300 terabytes) to enable codes to ‘follow’ structures as they form.   The innovative design of this Service supports physically detailed simulations which can use an entire DiRAC machine for weeks or months at a time. More on the Virgo project, which uses the Memory Intensive Service can be found here.

The Extreme Scaling Service supports codes that make full use of multi-petaflop HPC systems. DiRAC works with industry on the design of systems using Lattice QCD in theoretical particle physics as a driver.   This field of physics provides theoretical input on the properties of hadrons to assist with the interpretation of data from experiments such as the Large Hadron Collider. To find out more about one of the Lattice QCD projects using the Extreme Scaling Service see the 2017 Science Highlights page.


The DiRAC Data Management Plan can be found here.

