Advanced Application & Systems Performance Analysis Tools
To produce an application that monitors how workloads use hardware components.
Summary of work undertaken
The assets from the Cloud Road-testing for UKRI Workloads work package were extended so that every platform includes monitoring, giving visibility into how well a workload uses the hardware assigned to it on that platform.
The Jupyter Notebook and Linux machine platforms each ran an isolated Prometheus Node Exporter and Grafana stack. A similar stack was run on Slurm, alongside Slurm-specific information such as the jobs currently running.
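As an illustration of the kind of visibility such a stack gives, the sketch below builds a PromQL query over the standard Node Exporter metric node_cpu_seconds_total and parses a response in the shape returned by the Prometheus HTTP API instant-query endpoint (/api/v1/query). It is a minimal sketch, not the work package's implementation: the base URL, the function names, the instance labels and the sample values are all assumptions made for illustration, and the sample response is hand-written rather than measured.

```python
import json
from urllib.parse import urlencode

# PromQL over a standard Node Exporter metric: percentage of CPU time that
# was not idle, averaged per instance over the last 5 minutes.
CPU_BUSY_QUERY = (
    "100 * (1 - avg by (instance) "
    '(rate(node_cpu_seconds_total{mode="idle"}[5m])))'
)


def build_query_url(base_url: str, promql: str) -> str:
    """Return an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(payload: str) -> dict[str, float]:
    """Map each instance label to its sample value from an instant-query response."""
    body = json.loads(payload)
    if body.get("status") != "success":
        raise ValueError(f"query failed: {body.get('error', 'unknown error')}")
    data = body["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each series carries its labels in "metric" and a [timestamp, "value"] pair.
    return {
        series["metric"].get("instance", "unknown"): float(series["value"][1])
        for series in data["result"]
    }


# Hand-written sample in the API's response shape (hypothetical instances and
# values, not real measurements).
SAMPLE_RESPONSE = json.dumps(
    {
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [
                {"metric": {"instance": "node-a:9100"}, "value": [1700000000, "37.5"]},
                {"metric": {"instance": "node-b:9100"}, "value": [1700000000, "82"]},
            ],
        },
    }
)

if __name__ == "__main__":
    # The base URL is an assumption; substitute the platform's Prometheus address.
    print(build_query_url("http://localhost:9090", CPU_BUSY_QUERY))
    for instance, busy in parse_instant_vector(SAMPLE_RESPONSE).items():
        print(f"{instance}: {busy:.1f}% CPU busy")
```

In a Grafana panel the same PromQL expression could be used directly as the query; the Python wrapper is only there to make the request and response shape explicit.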
OpenStack Cloud Dashboard – Azimuth was modified to link to a Grafana instance that provides insight into the user's current usage and resource allocations, although some links are still missing before this forms a full end-to-end prototype.
DiRAC-wide Dashboard – there is no working prototype for a DiRAC-wide dashboard, but the architecture work showed how the approach could be adopted at each site and then aggregated centrally using the same technologies.
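The report does not describe how the central aggregation would be implemented, so the following is only a sketch of the idea under stated assumptions: each site is assumed to export a small summary (total cores and average busy cores), and a central service combines them into per-site and overall utilisation figures. The SiteSummary type, the aggregate function, the site names and the numbers are all hypothetical. In practice the central layer might instead rely on features of the monitoring stack itself, such as Prometheus federation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteSummary:
    """Hypothetical per-site summary, as a site-local stack might export it."""

    site: str
    total_cores: int
    busy_cores: float  # average number of cores busy over the reporting window


def aggregate(summaries: list[SiteSummary]) -> dict:
    """Combine per-site summaries into per-site and overall utilisation."""
    total = sum(s.total_cores for s in summaries)
    busy = sum(s.busy_cores for s in summaries)
    per_site = {
        s.site: 100 * s.busy_cores / s.total_cores
        for s in summaries
        if s.total_cores > 0  # skip sites that report no capacity
    }
    # The overall figure is weighted by core count, so large sites count for more.
    overall = 100 * busy / total if total else 0.0
    return {
        "per_site_percent": per_site,
        "overall_percent": overall,
        "total_cores": total,
    }


if __name__ == "__main__":
    # Illustrative inputs only; these are not measurements from any DiRAC site.
    report = aggregate(
        [
            SiteSummary(site="site-a", total_cores=100, busy_cores=25.0),
            SiteSummary(site="site-b", total_cores=300, busy_cores=225.0),
        ]
    )
    print(report)
```

Weighting the overall figure by core count, rather than averaging the per-site percentages, keeps a small, idle site from masking heavy use at a large one.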