We develop artificial intelligence systems that are reliable, transparent, and fair, helping users understand and trust the models they use.
As machine learning becomes embedded in operational systems across the geosciences, the question of trust becomes critical. Despite their impressive accuracy, black-box models often lack transparency, leaving users in the dark about how predictions are made or where they might fail. Hidden biases in such models can lead to serious real-world consequences. In other words, do we really know what our models are doing?
Our work focuses on building interpretable machine learning systems for geoscience applications. In a recent study (under review at JGR: Machine Learning and Computation), we explore the internal structure of neural networks trained to classify precipitation types, revealing interpretable “circuits” that resemble known physical processes. We show that these circuits remain stable across initializations and can be traced to specific phase-relevant features like temperature thresholds and moisture layers.
Operational ML systems are used for forecasting floods, monitoring food security, and predicting wildfire risk, but the lack of interpretability poses ongoing challenges.
We use sparse autoencoders to analyze the internal representations of neural networks trained to classify precipitation phase.
We find that key neurons emerge with well-defined physical meanings, encoding features like melting-layer structure or cloud-top height. These insights allow domain experts to interrogate the model, build confidence in its outputs, and identify conditions where it may be extrapolating dangerously.
Some hidden units form "circuits" with consistent physical interpretation across runs. This opens new doors for trust and explainability in atmospheric ML.
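The core idea can be sketched in a few lines of NumPy. This is an illustrative toy, not the study's code: a sparse autoencoder (SAE) is trained to reconstruct the recorded hidden-layer activations of a frozen classifier through an overcomplete ReLU bottleneck with an L1 sparsity penalty, which encourages each latent unit to respond to a single underlying feature. All shapes, hyperparameters, and the random stand-in for activations are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

n_neurons = 16   # width of the MLP hidden layer being analyzed
n_latents = 64   # overcomplete SAE dictionary (n_latents > n_neurons)
lam = 1e-3       # L1 sparsity coefficient
lr = 1e-2

# Stand-in for recorded hidden activations, shape (batch, n_neurons).
acts = rng.normal(size=(512, n_neurons))

W_e = rng.normal(scale=0.1, size=(n_neurons, n_latents))
b_e = np.zeros(n_latents)
W_d = rng.normal(scale=0.1, size=(n_latents, n_neurons))
b_d = np.zeros(n_neurons)

def sae_forward(x):
    """Reconstruction loss + L1 penalty on the sparse latent code."""
    pre = x @ W_e + b_e
    h = np.maximum(pre, 0.0)      # sparse latent code
    x_hat = h @ W_d + b_d         # reconstructed activations
    recon = np.mean((x - x_hat) ** 2)
    return recon + lam * np.mean(np.abs(h)), pre, h, x_hat

losses = []
for step in range(200):
    loss, pre, h, x_hat = sae_forward(acts)
    losses.append(loss)
    B = acts.shape[0]
    # Manual gradients of the combined loss (no autograd framework).
    d_xhat = 2.0 * (x_hat - acts) / (B * n_neurons)
    gW_d = h.T @ d_xhat
    gb_d = d_xhat.sum(0)
    d_h = d_xhat @ W_d.T + lam * (h > 0) / (B * n_latents)
    d_pre = d_h * (pre > 0)       # ReLU gradient mask
    gW_e = acts.T @ d_pre
    gb_e = d_pre.sum(0)
    for p, g in ((W_e, gW_e), (b_e, gb_e), (W_d, gW_d), (b_d, gb_d)):
        p -= lr * g

print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

After training, individual latent units (rather than the original entangled neurons) can be inspected for consistent physical meaning across inputs.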
Our goal is to create tools that make machine learning models more interpretable and accountable so that as these systems scale into critical domains, the people who rely on them can understand how and why decisions are being made.
Related Publications
2025
JGR: ML&C
Leveraging Sparse Autoencoders to Reveal Interpretable Features in Geophysical Models
Fraser King, Claire Pettersen, Derek Posselt, and 2 more authors
Journal of Geophysical Research: Machine Learning and Computation, 2025
Machine learning is an increasingly popular tool in the geosciences, offering new approaches to numerical weather prediction and complex dataset analysis. However, as reliance on these techniques grows, pressing questions about model transparency, internal biases, and trust emerge. While post-hoc explainability analyses can provide insights into how neural network (NN) outputs are generated, a robust framework for interpreting internal decision-making remains underdeveloped. We address this challenge by exploring a framework to better understand the inner structure of NNs using sparse autoencoders (SAEs). With simplified multilayer perceptrons (MLPs), we demonstrate that hidden layer neurons often exhibit polysemantic behavior, in which each feature is mapped to a linear combination of neurons, creating an overcomplete representation. This phenomenon, known as superposition, arises when networks encode more features than available neurons, causing neurons to respond to multiple, seemingly unrelated inputs. By introducing a regularized SAE that learns from the original MLP’s activations, we can disentangle these representations, resulting in a 33% reduction in the average number of sensitive inputs per neuron. Applied to a precipitation classification model, this framework reveals evidence of monosemantic behavior in which neurons respond to a single meaningful concept tied to specific physical phenomena such as temperature and fallspeed thresholds for precipitation phase partitioning. We observe similar monosemantic behavior in SAE activations from a snowfall rate regressor related to particle concentration intensity and vertical radar structures. This framework supports the development of more physically consistent interpretations of hidden neuron activations and improved trust in operational ML models across the geosciences.
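One plausible way to operationalize the "sensitive inputs per neuron" metric mentioned above is sketched below (this is an assumption for illustration, not the paper's exact procedure): a neuron is counted as sensitive to an input feature if perturbing that feature noticeably shifts the neuron's activation, averaged over samples. With more input features than hidden neurons, dense random weights produce neurons sensitive to many inputs, i.e. polysemantic behavior; the toy network, threshold, and perturbation size are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def hidden_acts(x, W, b):
    """One ReLU hidden layer of a toy MLP."""
    return np.maximum(x @ W + b, 0.0)

def sensitive_inputs_per_neuron(W, b, samples, eps=1e-2, tol=1e-3):
    """Count, per neuron, input features whose perturbation changes
    the neuron's activation by more than `tol` on average."""
    n_in, n_hidden = W.shape
    counts = np.zeros(n_hidden)
    base = hidden_acts(samples, W, b)
    for j in range(n_in):
        bumped = samples.copy()
        bumped[:, j] += eps
        delta = np.abs(hidden_acts(bumped, W, b) - base).mean(0)
        counts += delta > tol
    return counts

n_in, n_hidden = 8, 4            # more features than neurons: superposition regime
W = rng.normal(size=(n_in, n_hidden))
b = np.zeros(n_hidden)
samples = rng.normal(size=(256, n_in))

counts = sensitive_inputs_per_neuron(W, b, samples)
print("sensitive inputs per neuron:", counts)
# Dense random neurons respond to many inputs (polysemantic);
# a monosemantic unit would count close to 1.
```

The paper's reported 33% reduction corresponds to this average dropping once the SAE's disentangled latents replace the raw hidden neurons.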