In Search of an Interdisciplinary Solution for Scalable Planetary Characterisation

Facility Resource

PI: Kai Hou Yip

Our DiRAC project has been exploring the potential of artificial intelligence and machine learning to help us characterise the strange worlds of exoplanets. The field is now receiving far higher-quality data from JWST, but existing analysis pipelines struggle to analyse it comprehensively at the required precision.

One bottleneck is the slowness of atmospheric models. We have developed (PI: Tara Tahseen) an LSTM-based surrogate model that provides a substantial speedup (100x) over the original physical radiative transfer module in a general circulation model (a 3D climate model for planets), with very small differences in the simulated output (Figure 1, RHS, shows the percentage difference).
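The surrogate-modelling idea can be sketched in a few lines: run the expensive physical model offline to build a training set, fit a fast learned emulator, then call the emulator in place of the physical module at simulation time. The sketch below is illustrative only: it uses scikit-learn's MLPRegressor and a toy stand-in function rather than the project's actual LSTM and radiative transfer code, which are assumptions on our part.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Hypothetical stand-in for an expensive radiative transfer calculation:
# maps an atmospheric state vector to a heating-rate-like scalar.
def expensive_radiative_transfer(x):
    return np.sin(3 * x[:, 0]) + 0.5 * np.cos(2 * x[:, 1]) + 0.1 * x[:, 2]

# Build a training set by running the "physical model" offline.
X_train = rng.uniform(-1, 1, size=(2000, 3))
y_train = expensive_radiative_transfer(X_train)

# Fit a cheap surrogate (a small MLP here; the project used an LSTM).
surrogate = MLPRegressor(hidden_layer_sizes=(64, 64),
                         max_iter=2000, random_state=0)
surrogate.fit(X_train, y_train)

# At simulation time, the surrogate replaces the physical module.
X_test = rng.uniform(-1, 1, size=(500, 3))
y_true = expensive_radiative_transfer(X_test)
y_pred = surrogate.predict(X_test)
mae = np.mean(np.abs(y_pred - y_true))  # small residual vs. the physical model
```

The speedup comes from the emulator's cost being a fixed, small number of matrix multiplications, independent of the physical model's internal iterations.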


On the other hand, deploying models in production requires caution, as a model may be exposed to unknown data distributions. To this end, an investigation led by Luis Simões from ML Analytics has looked into building safety cages that safeguard the performance of machine learning models in production against data drawn from different distributions. Using Isolation Forest and SHAP values, we can flag examples that fall outside the training distribution (outliers), which helps maintain the model's performance (see Figure 2, RHS: low errors are maintained for up to 60% coverage of the test data).
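The safety-cage mechanism can be illustrated with a minimal sketch: fit an Isolation Forest on the training inputs, score incoming test examples, and only trust the model's predictions on the most in-distribution fraction (the "coverage"). All data, models, and thresholds below are synthetic assumptions for illustration, and the SHAP explanation step used in the actual investigation is omitted.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)
coef = np.array([1.0, -0.5, 0.3, 0.8])

# In-distribution training data and a simple model trained on it.
X_train = rng.normal(0, 1, size=(1000, 4))
y_train = X_train @ coef
model = Ridge().fit(X_train, y_train)

# The safety cage: an Isolation Forest fitted on the training inputs.
cage = IsolationForest(random_state=0).fit(X_train)

# Test set: half in-distribution, half shifted (out-of-distribution),
# where the true relationship has also changed, breaking the model.
X_in = rng.normal(0, 1, size=(250, 4))
X_out = rng.normal(4, 1, size=(250, 4))
X_test = np.vstack([X_in, X_out])
y_test = np.concatenate([X_in @ coef, X_out @ coef + 5.0])

# score_samples: higher means more in-distribution.
# Keep only the top 50% of test points (50% coverage).
scores = cage.score_samples(X_test)
keep = scores >= np.quantile(scores, 0.5)

err_all = np.mean(np.abs(model.predict(X_test) - y_test))
err_kept = np.mean(np.abs(model.predict(X_test[keep]) - y_test[keep]))
```

Sweeping the coverage threshold from low to high traces out an error-vs-coverage curve like the one in Figure 2: errors stay low while the cage only admits in-distribution examples, then rise once outliers start to pass.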


Apart from these, we have been busy organising the Ariel Data Challenge series (2025 edition coming up!) and Datathons for the local community; so far these have been held in Harwell (UK), Lisbon (Portugal) and Madrid (Spain).