Hydra+: A universal Bayesian analysis of cosmic neutral hydrogen with a million-parameter data model

PI: Dr Phil Bull (University of Manchester)

Neutral hydrogen gas is the raw material of star formation, and is ubiquitous across cosmic time. The presence of neutral hydrogen, and how it is distributed in space, can tell us about the processes that helped form the first galaxies, and can also trace the way that (otherwise invisible) dark matter clumps together to form structures in the Universe.

We can detect neutral hydrogen through its weak radio emission, known as the 21cm line after the wavelength it is emitted at. However, radio emission from our own Galaxy and others nearby is brighter by a factor of tens of thousands, which makes disentangling the faint hydrogen signal a difficult task. This is made even more difficult by the small but complicated ways in which radio telescopes distort the signals that they receive.

We have developed a new data analysis tool called Hydra that attempts to build a model of all the complex effects that are present in the data, and then separate them using a very accurate statistical approach. The drawback of trying to model everything is that hundreds of thousands, or possibly even millions, of numbers must be estimated from the data. This is a tremendous computational problem.

During this project, we have shown that it is possible to successfully estimate such vast numbers of parameters using a method called Gibbs sampling, running on the COSMA8 system in Durham. Publications by Burba et al. (2024) and Glasscock et al. (2024) explain the method behind this, and how it can be used to estimate various parts of the model that explains the 21cm line data from the Hydrogen Epoch of Reionization Array (HERA), one of the leading experiments in this field.
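
To give a flavour of how Gibbs sampling works, the toy sketch below applies it to just two correlated parameters. It is an illustration only and is not taken from Hydra: the function names, the two-parameter Gaussian model, and the chosen correlation are all assumptions made for this example. The key idea is that each parameter is drawn in turn from its distribution given the current value of the other, which is a simple one-dimensional draw even when the joint problem is hard.

```python
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Toy Gibbs sampler (hypothetical example, not Hydra code).

    Draws (x, y) pairs from a zero-mean, unit-variance bivariate
    Gaussian with correlation rho by alternating conditional draws.
    """
    rng = random.Random(seed)
    # For this model, x given y is Gaussian with mean rho*y and
    # standard deviation sqrt(1 - rho^2), and likewise for y given x.
    cond_std = (1.0 - rho ** 2) ** 0.5
    x, y = 0.0, 0.0  # arbitrary starting point
    samples = []
    for step in range(burn_in + n_samples):
        x = rng.gauss(rho * y, cond_std)  # draw x given current y
        y = rng.gauss(rho * x, cond_std)  # draw y given the new x
        if step >= burn_in:  # discard the early, start-dependent draws
            samples.append((x, y))
    return samples


def sample_correlation(samples):
    """Estimate the correlation between x and y from the samples."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in samples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in samples) / n
    var_y = sum((y - mean_y) ** 2 for _, y in samples) / n
    return cov / (var_x * var_y) ** 0.5


samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
print(sample_correlation(samples))
```

The recovered correlation should land close to the value that was put in. In a realistic analysis the same alternating scheme is applied to large blocks of parameters rather than to two single numbers.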

Going forward, we will be able to build on these results to construct a progressively more realistic and accurate model of the data, with each step getting us closer to an unambiguous detection of the 21cm line from gas around the very first stars and galaxies in the Universe.

Fig. 14 from Glasscock et al. (2024), showing a simulated ‘ideal’ map of the radio emission on the sky (top left), and the map recovered by Hydra in spite of all of the complicated effects and distortions that were present in the data due to the radio instrumentation. The bottom panels show the fractional difference between the ideal and recovered signals and the statistical uncertainty, respectively; the recovery is very accurate.

Publications

  • Sensitivity of Bayesian 21 cm power spectrum estimation to foreground model errors 

Burba, Jacob; Bull, Philip; Wilensky, Michael J.; Kennedy, Fraser; Garsden, Hugh; Glasscock, Katrine A. MNRAS 535, 1 (2024).

  • Statistical estimation of full-sky radio maps from 21 cm array visibility data using Gaussian constrained realizations

Glasscock, Katrine A.; Bull, Philip; Burba, Jacob; Garsden, Hugh; Wilensky, Michael J. RASTI 3, 1 (2024).