Performance Analysis Workshop Series 2023, 20th April – 18th May 2023
With every new generation of computers, the gap between a machine's theoretical performance and the performance actually delivered by applications widens: codes struggle to exploit the hardware. It has therefore become critical for researchers and research software engineers in HPC to understand how well, and why, codes use the machinery as they do. Insight into performance behaviour can drive code evolution and ultimately become the means through which future advances through computing are facilitated.
This workshop series offers a comprehensive introduction to a selection of open-source tools that enable researchers to assess the performance behaviour of their code. The workshops will be augmented by revision sessions on core HPC know-how. We encourage participants to bring along their own codes so they can continually assess and improve them throughout the series.
16/17 February 2023 Durham University, Department of Computer Science, Durham, UK (hybrid, in person preferred) In collaboration with NVIDIA Networking
Durham’s Department of Computer Science, in collaboration with Durham’s DiRAC facilities and Durham’s ExCALIBUR H&ES installations, has organised a 1.5-day hackathon on how to use NVIDIA BlueField technology.
BlueField-empowered systems are supercomputers in which each individual networking card is equipped with additional ARM processors. These processors can, for example, take ownership of data movements between nodes (i.e. releasing the host from messaging-related work), manipulate message content while the messages fly through the network, own checkpointing, and so on.
During the hybrid workshop, participants will first get a brief intro into BlueField technology, and can then try out prepared exercises on these machines. After that, we host a series of talks and brainstorming sessions on how this technology could enable next-generation simulation software. Finally, NVIDIA’s experts will be available to help with some prototyping of ideas on BlueField cards.
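The offloading idea described above can be illustrated with a toy analogy, in which a dedicated worker thread stands in for the ARM cores on the networking card: the host hands messages over and immediately returns to computing, while the "DPU" owns all messaging work. This is a conceptual sketch only; a real BlueField deployment uses the card's own processors and NVIDIA's SDKs, not Python threads, and all names below are illustrative.

```python
import queue
import threading

# Toy analogy for BlueField-style offload: the host enqueues outgoing
# messages and keeps computing, while a separate "DPU" worker owns the
# messaging work. On a real system the worker would be the ARM cores on
# the network card; here it is just a thread.

send_queue = queue.Queue()
delivered = []

def dpu_worker():
    # Drain the queue until the host signals shutdown with None.
    while True:
        msg = send_queue.get()
        if msg is None:
            break
        delivered.append(msg.upper())  # stand-in for in-flight manipulation

worker = threading.Thread(target=dpu_worker)
worker.start()

# Host side: hand off messages, then immediately return to "compute".
for payload in ["halo-exchange", "checkpoint-chunk"]:
    send_queue.put(payload)
partial_sum = sum(range(1000))  # the host keeps computing meanwhile

send_queue.put(None)
worker.join()
```

The key point the sketch captures is that the host's compute loop never blocks on message handling; progress on communication is owned entirely by the offload engine.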
Innovation Placements 2023
DiRAC Innovation Placements provide doctoral students and early career researchers with an exciting opportunity to engage with industrial and academic partners on projects that deliver research impact and develop key skills. Offering the potential to gain industrial experience by working collaboratively with businesses, and fostering fundamental research competencies through academic mentoring, these competitive, fully funded internships address current challenges with cutting-edge solutions.
Objective
To investigate XDMoD to assess whether it is suitable as a DiRAC-wide accounting tool.
Summary of work undertaken
A demonstrator application was set up at DiRAC Cambridge, and basic functions were then tested. A summary of conclusions and next steps is in the report below.
Objective
To produce a digital repository for the sharing and archiving of benchmarking data for key DiRAC codes.
Summary of work undertaken
A wiki was created within the DiRAC instance of the Confluence package, currently hosted at the University of Edinburgh. This was created as a long-term repository for benchmarking data from the DiRAC systems. The wiki is intended to be updated as and when new benchmark results become available, for example during procurement activities or when new versions of applications are introduced. The repository has space for detailed information and comments from the benchmark runners to highlight special features of the run. It also has space for MPI profiling information from the run.
Initial data from existing benchmark runs has been loaded into the wiki.
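To make concrete what such a repository entry might hold, here is a minimal sketch of a single benchmark record with the kinds of fields described above (run details, runner comments, MPI profiling). Every field name and value is illustrative, not the actual DiRAC schema.

```python
# Hypothetical shape of one benchmark entry; all names and values are
# illustrative placeholders, not real DiRAC data.
benchmark_entry = {
    "code": "SWIFT",
    "version": "0.9.0",
    "system": "DiRAC Cambridge",
    "nodes": 16,
    "runtime_seconds": 1842.0,
    "notes": "Runner comments on special features of the run go here.",
    "mpi_profile": {"pct_time_in_mpi": 14.2, "dominant_call": "MPI_Waitall"},
}

def summarise(entry):
    """One-line summary suitable for a wiki table row."""
    return (f"{entry['code']} {entry['version']} on {entry['system']}: "
            f"{entry['runtime_seconds']:.0f} s on {entry['nodes']} nodes")
```

Keeping free-text notes and structured profiling data side by side is what lets later procurement comparisons recover the context of each run.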
DiRAC is excited to launch our inaugural research image competition and encourage our past and present users to submit aesthetically inspiring and scientifically interesting imagery which has been generated using the DiRAC facility over the past three years.
The competition is a wonderful opportunity to have your research imagery displayed across the DiRAC platform, on our website, on social media, and in print media (see our 2022 calendar, right) and will help promote interest in your area of scientific study.
There are two categories for submission and the winners of each category will be selected by a panel of experts.
The top image in each category will be awarded a £250 Amazon e-voucher, kindly donated by Q Associates, a Logicalis company.
Submissions are now closed.
Prize winners will be notified on 1st November and the results announced on our website and social media channels.
DiRAC 2022 Calendar comprising previously submitted images generated on DiRAC.
Themes:
Theme 1: Particle and Nuclear Physics
Theme 2: Astronomy, Cosmology and Solar & Planetary Science
Any imagery submitted to this competition could be used in future marketing/publicity materials relating to DiRAC in either digital or print and as such, should have visual impact and scientific interest. We will be producing a 2023 DiRAC image calendar, the imagery for which will be drawn from the submissions to this competition and a selection of the top images will be displayed in print at our annual DiRAC Science Day event.
Entry requirements:
Images must be generated as a result of research work carried out using the DiRAC facility
Images should not be older than three years
Digital images must be submitted in one of the following formats: JPEG, TIFF, PNG or PDF
All entries must be accompanied by an entry form (see details below)
Entries should be accompanied by a short description, of no more than 150 words, giving scientific context to the image
Images may be generated specifically for this competition, but should result from research performed within the last 3 years
Author names should be included
Up to three submissions may be made per person
Images should be of a sufficient size and resolution (300 dpi minimum)
File sizes should not be larger than 15 MB
Competition opens 1st September
The deadline for submissions is 5pm, Friday 14th October
Objective
To utilise ReFrame as a single wrapper for the suite of existing DiRAC and UCL Tier 2 benchmarks, with the aim of providing a single set of benchmarks that can be run as needed following system upgrades.
Summary of work undertaken
The following were successfully added to ReFrame:
The benchmarks for Swift and Grid
The benchmarks for CP2K
The benchmarks for HPGMG, IMB, and Sombrero (the latter is a mini-app for Swift)
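The "single wrapper" idea can be sketched as a registry that knows how to launch every benchmark in the suite, so the whole set can be re-run after a system upgrade with one command. ReFrame itself provides this (plus environment handling and sanity/performance checks); the snippet below is a deliberately simplified pure-Python stand-in, and the commands are placeholders rather than the real benchmark invocations.

```python
import subprocess

# Simplified stand-in for what ReFrame provides: a single registry of
# benchmarks that can all be (re)run after a system upgrade. The commands
# are placeholders -- the real suite drives Swift, Grid, CP2K, HPGMG, IMB
# and Sombrero through ReFrame test classes.
BENCHMARKS = {
    "sombrero": ["echo", "sombrero: 123.4 Gflop/s"],
    "imb":      ["echo", "imb: pingpong 1.2 us"],
}

def run_suite(selected=None):
    """Run every registered benchmark (or a named subset), collect stdout."""
    results = {}
    for name in (selected or BENCHMARKS):
        proc = subprocess.run(BENCHMARKS[name], capture_output=True, text=True)
        results[name] = proc.stdout.strip()
    return results
```

In the real set-up, adding a benchmark means adding one ReFrame test class rather than a dictionary entry, but the operational benefit is the same: one entry point for the whole suite.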
Work on the following benchmarks has progressed but is not yet complete, due to technical challenges:
Objective
To create a self-contained AI benchmark/workflow in the domain of synthetic brain imaging.
Summary of work undertaken
Training epochs were run for three provided model configurations on the UCL AI platform, on both a single GPU and multiple GPU devices. Several multi-day runs of ~100 epochs were completed. As the training scripts were configured to run for 100,000 epochs, and each epoch takes around an hour or more, ‘full’ runs of the model were not performed.
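The impracticality of a full run follows directly from the figures above; a quick back-of-envelope check, taking the one-hour-per-epoch lower bound quoted in the text:

```python
# Why 'full' runs were impractical: the training scripts are configured
# for 100,000 epochs at roughly an hour per epoch (the lower bound
# quoted above).
epochs = 100_000
hours_per_epoch = 1.0  # "around an hour or more"

total_hours = epochs * hours_per_epoch
total_years = total_hours / (24 * 365)
print(f"{total_hours:.0f} GPU-hours, roughly {total_years:.1f} years on one device")
```

Even with ideal scaling across many GPUs, this is far beyond a placement-length project, which is why the ~100-epoch runs were used instead.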
Python requirements and the package associated with the code were installed on the Cambridge HPC service (following the same set-up process documented in the repository README below).
Outputs
A public GitHub repository containing the open-source (GPL v3) release of the research code developed by King's College London. The GitHub repository (https://github.com/r-gray/3d_very_deep_vae) has a GPL v3 license file included.
The README file in the repository contains full details of how to install the dependencies and Python package, and includes platform-dependent requirement specifications with pinned versions for the supported operating system and Python version combinations. There is also documentation on how to run the model training with the example configurations provided.
Advanced Application & Systems Performance Analysis Tools
Objective
To produce an application that monitors workload usage of hardware components.
Summary of work undertaken
The assets from the Cloud Road-testing for UKRI Workloads work package were extended so that every platform has monitoring in place, giving visibility into how well a workload is using the hardware assigned to it.
The Jupyter Notebook and Linux machine platforms both ran an isolated Prometheus Node Exporter and Grafana stack. A similar stack was run on Slurm, alongside Slurm-specific information such as the jobs currently running.
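For context, the Node Exporter serves hardware metrics in Prometheus' plain-text exposition format, which Prometheus scrapes and Grafana then visualises. The sample line below uses a real Node Exporter metric name; the parser is a deliberately simplified illustration of the format (real deployments simply let Prometheus do the scraping).

```python
import re

# A sample line in Prometheus' text exposition format, as served by the
# Node Exporter, and a simplified parser for illustration only.
sample = 'node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6'

METRIC_RE = re.compile(r'^(\w+)(?:\{(.*)\})?\s+([0-9.eE+-]+)$')

def parse_metric(line):
    """Split one exposition line into (name, labels, value)."""
    name, raw_labels, value = METRIC_RE.match(line).groups()
    labels = {}
    if raw_labels:
        for kv in raw_labels.split(","):
            key, val = kv.split("=", 1)
            labels[key] = val.strip('"')
    return name, labels, float(value)
```

The label set (here, which CPU core and scheduler mode) is what lets dashboards slice utilisation per device, which is exactly the "how well is the workload using its hardware" question the monitoring stack answers.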
Outputs
OpenStack Cloud Dashboard – Azimuth was modified to link to a Grafana instance that can provide insights into the user's current usage and resource allocations. (There are, however, still some missing links before this is a full end-to-end prototype.)
DiRAC-wide Dashboard – there is no working prototype for a DiRAC-wide dashboard, but it was shown architecturally how the same approach could be adopted at each site and then aggregated centrally, using the same technologies.