Foundation Models
Examining the skill and alignment of GenAI for complex problems in the atmospheric sciences at global scales
Foundation Models promise a step change for weather and climate prediction using multimodal inputs, long contexts, and powerful generative priors. Our group studies when these models actually help, and how to make them skillful, interpretable, robust, and aligned for use across the atmospheric sciences!
So how do we do this?
Performance. We benchmark short-lead and medium-range forecasts with a focus on extremes, reliability, and trustworthiness. Here, we compare predictions across regions, seasons, sensors, and event types to learn where Foundation Models beat traditional Earth system model (ESM) baselines, and where classic statistical/ML methods still win.
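One standard reliability-aware verification score for ensemble forecasts is the continuous ranked probability score (CRPS). The source does not specify our metrics, so this is just an illustrative sketch of the sample-based kernel estimator; the toy ensembles and values are made up.

```python
import numpy as np

def crps_ensemble(ensemble, obs):
    """Sample-based CRPS estimate for one forecast-observation pair,
    via the kernel form: CRPS = E|X - y| - 0.5 * E|X - X'|.
    Lower is better; a perfect sharp forecast scores 0."""
    ensemble = np.asarray(ensemble, dtype=float)
    term1 = np.mean(np.abs(ensemble - obs))
    term2 = 0.5 * np.mean(np.abs(ensemble[:, None] - ensemble[None, :]))
    return float(term1 - term2)

# Toy comparison (hypothetical numbers): a sharp, well-centred
# ensemble scores lower than a biased one.
sharp = crps_ensemble([0.1, -0.1, 0.0], obs=0.0)
biased = crps_ensemble([2.1, 1.9, 2.0], obs=0.0)
```

Averaging such scores across regions, seasons, and event types is one way to make the "who wins where" comparison concrete.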
Interpretability. We connect internal representations to physics using attribution tests, sparse autoencoders/crosscoders, and counterfactuals. When a model works, we can potentially say something about why; when it fails, we can pinpoint where it breaks down and how early.
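A minimal version of the counterfactual idea is input-perturbation sensitivity: nudge one input at a time and watch how the prediction moves. The probe below is a hedged sketch, not our actual tooling; `toy_model` and the feature names are hypothetical stand-ins for a real forecast model.

```python
import numpy as np

def counterfactual_sensitivity(model, x, feature_names, delta=1e-2):
    """Crude counterfactual probe: perturb one input feature at a time
    and record the finite-difference slope of the model's output.
    `model` is any callable mapping an input vector to a scalar."""
    base = model(x)
    scores = {}
    for i, name in enumerate(feature_names):
        x_cf = x.copy()
        x_cf[i] += delta                      # counterfactual input
        scores[name] = (model(x_cf) - base) / delta
    return scores

# Hypothetical stand-in: output dominated by the first ("humidity") input.
toy_model = lambda x: 3.0 * x[0] + 0.1 * x[1]
x = np.array([0.5, 0.5])
scores = counterfactual_sensitivity(toy_model, x, ["humidity", "pressure"])
```

For deep models the same question is usually asked with gradient-based attribution, but the perturbation form makes the logic explicit.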
Security. We evaluate data-poisoning and backdoor risks in realistic open-science pipelines, and build integrity checks and red-team tests to keep operational deployments trustworthy.
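The simplest integrity check is a cryptographic manifest: record a digest of each dataset at ingest time and re-verify before training. This is a generic sketch using Python's standard `hashlib`, not a description of our pipeline; the file name is invented.

```python
import hashlib

def sha256_digest(data: bytes) -> str:
    """Hex SHA-256 digest of a blob of bytes."""
    return hashlib.sha256(data).hexdigest()

def verify_manifest(files, manifest):
    """Return names of files whose content no longer matches the
    recorded digest (possible tampering or corruption)."""
    return [name for name, blob in files.items()
            if sha256_digest(blob) != manifest.get(name)]

# Record digests at ingest time, re-check before training.
files = {"era5_subset.nc": b"original bytes"}     # hypothetical dataset
manifest = {name: sha256_digest(blob) for name, blob in files.items()}
files["era5_subset.nc"] = b"poisoned bytes"       # simulated tampering
flagged = verify_manifest(files, manifest)
```

Hash manifests catch silent substitution but not poisoning baked in before ingest, which is why red-team tests complement them.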
Alignment. We explore domain-informed objectives, human-in-the-loop evaluation, and transparent reporting that respect physical constraints and observational uncertainty for the public good. Still a lot to do here, and we are open to new alignment ideas.
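One concrete form a domain-informed objective can take is a soft physics penalty added to the data-fit term. The example below is purely illustrative (the conservation constraint and weighting are assumptions, not our published objective).

```python
import numpy as np

def constrained_loss(pred, target, lam=1.0):
    """Data-fit term plus a soft penalty for violating a toy
    conservation constraint: the global mean of the field
    should be preserved by the prediction."""
    mse = np.mean((pred - target) ** 2)
    conservation = (pred.mean() - target.mean()) ** 2
    return float(mse + lam * conservation)

target = np.array([1.0, 2.0, 3.0])
# Same pointwise error magnitude, but only one prediction preserves the mean.
balanced = constrained_loss(np.array([1.1, 2.0, 2.9]), target)
unbalanced = constrained_loss(np.array([1.1, 2.1, 3.0]), target)
```

The penalty weight `lam` trades off fidelity to observations against fidelity to the physical constraint, which is exactly where observational uncertainty enters the design.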
What’s next? Upcoming projects will stress-test frontier models like NVIDIA’s Climate in a Bottle (cBottle) and related GenAI forecasters. We are looking to expand to other Foundation Models in the future. Interested in working in this area or collaborating? Reach out!