The Data Intensive service provides a set of hardware options for projects from across all DiRAC research domains, driving scientific discovery by delivering a step-change in our capability to handle large datasets: performing and analysing precision theoretical simulations, then confronting them with the next generation of observational and experimental data. Such diverse workflows are best supported by a heterogeneous mix of architectures delivered across two DiRAC sites: Cambridge and Leicester.
System name: DIaC, part of CSD3 (Cambridge Service for Data-Driven Discovery)
Many DiRAC projects explore high-dimensional parameter spaces using statistical techniques generating large numbers of computationally-intensive models. There is also an increasing use of GPU acceleration in simulations, either to support post-processing of simulation data or to make use of AI-driven models. The Cambridge service supports these workflows through a mix of CPU and GPU nodes sharing a common parallel file system to ensure that workflows can use both architectures.
The Cambridge system uses OpenStack for deployment and presentation of services. Ongoing work with the UK-based SME StackHPC will enable DiRAC users to explore the potential benefits of this approach for their workflows, with the long-term goal of supporting workflows that require access to more than one DiRAC service for efficient completion.
The benchmark codes used for the design and testing of the CSD3 system were:
MILC – a particle physics QCD code, providing key results informing ongoing experiments at the precision frontier. The expensive step is calculating propagators for light quarks on gluon field backgrounds defined on large, fine space-time lattices. The output is stored for subsequent re-analysis, making I/O performance a key requirement for this work.
Arepo – a code used for cosmological zoom simulations to explore the inner regions of galaxies at high resolutions. Some outputs from running Arepo on CPUs are later processed using GPUs to add the effects of radiation.
GRChombo – an adaptive mesh refinement (AMR) numerical relativity code with applications ranging from early-universe cosmology to black hole mergers producing observable gravitational wave signatures.
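Since codes such as MILC store large outputs for subsequent re-analysis, I/O tuning on the system's Lustre filesystems can matter. A minimal sketch of striping an output directory across multiple storage targets follows; the path and stripe parameters are illustrative only, not CSD3 recommendations:

```shell
# Stripe new files in an output directory across 8 OSTs with a 4 MiB
# stripe size, spreading parallel writes over multiple storage targets.
# The path and values below are illustrative only.
lfs setstripe -c 8 -S 4m /lustre/project/output_dir

# Inspect the layout that new files in the directory will inherit
lfs getstripe /lustre/project/output_dir
```

On Lustre, a directory's striping applies to files created in it afterwards; suitable stripe counts depend on file sizes and the number of available OSTs.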
DiRAC has access to a share of 100 nodes, each with 4x NVIDIA A100 GPUs, dual 64-core AMD Milan processors and 1 TB RAM.
The HPC interconnect: Intel Omni-Path, 2:1 blocking (Skylake); Mellanox HDR InfiniBand, 3:1 blocking (Cascade Lake, Ice Lake and Wilkes-3).
Storage consists of 23 PiB of disk configured as multiple Lustre parallel filesystems, of which DiRAC has access to 4.8 PiB.
Resource management is performed by Slurm.
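As a concrete illustration of submitting work through Slurm, here is a minimal batch script targeting one of the GPU nodes; the partition, account, module and executable names are assumptions and the correct values should be taken from the CSD3 user guide:

```shell
#!/bin/bash
#SBATCH --job-name=gpu-example
#SBATCH --partition=ampere         # hypothetical partition name
#SBATCH --account=DIRAC-PROJECT    # placeholder project account
#SBATCH --nodes=1
#SBATCH --gres=gpu:4               # request all four A100s on a node
#SBATCH --time=01:00:00

module load cuda                   # module names vary by site
srun ./my_gpu_application          # placeholder executable
```

Such a script would be submitted with `sbatch job.sh`, and its progress monitored with `squeue -u $USER`.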
Our site-specific documentation, hosted by the University of Cambridge, provides a full user guide as well as a list of applications available on the CSD3 system.