JMLR

ClimSim-Online: A Large Multi-Scale Dataset and Framework for Hybrid Physics-ML Climate Emulation

Authors
Sungduk Yu, Zeyuan Hu, Akshay Subramaniam, Walter Hannah, Liran Peng, Jerry Lin, Mohamed Aziz Bhouri, Ritwik Gupta, Björn Lütjens, Justus C. Will, Gunnar Behrens, Julius J. M. Busecke, Nora Loose, Charles I Stern, Tom Beucler, Bryce Harrop, Helge Heuer, Benjamin R Hillman, Andrea Jenney, Nana Liu, Alistair White, Tian Zheng, Zhiming Kuang, Fiaz Ahmed, Elizabeth Barnes, Noah D. Brenowitz, Christopher Bretherton, Veronika Eyring, Savannah Ferretti, Nicholas Lutsko, Pierre Gentine, Stephan Mandt, J. David Neelin, Rose Yu, Laure Zanna, Nathan M. Urban, Janni Yuval, Ryan Abernathey, Pierre Baldi, Wayne Chuang, Yu Huang, Fernando Iglesias-Suarez, Sanket Jantre, Po-Lun Ma, Sara Shamekh, Guang Zhang, Michael Pritchard
Paper Information
  • Journal: Journal of Machine Learning Research
  • Added to Tracker: Sep 08, 2025
Abstract

Modern climate projections lack adequate spatial and temporal resolution due to computational constraints, leading to inaccuracies in representing critical processes, such as thunderstorms, that occur below the model's grid resolution. Hybrid methods combining physics with machine learning (ML) offer faster, higher-fidelity climate simulations by outsourcing compute-hungry, high-resolution simulations to ML emulators. However, these hybrid physics-ML simulations require domain-specific data and workflows that have been inaccessible to many ML experts. This paper is an extended version of our NeurIPS award-winning ClimSim dataset paper. The ClimSim dataset includes 5.7 billion pairs of multivariate input/output vectors spanning ten years at high temporal resolution, capturing the influence of high-resolution, high-fidelity physics on a host climate simulator's macro-scale state. In this extended version, we introduce a significant new contribution in Section 5: a cross-platform, containerized pipeline that integrates ML models into operational climate simulators for hybrid testing. We also implement various baseline ML models and hybrid simulators to highlight the ML challenges of building stable, skillful emulators. The data and code are publicly released to support the development of hybrid physics-ML and high-fidelity climate simulations. The data are available at https://huggingface.co/datasets/LEAP/ClimSim_high-res (high resolution), https://huggingface.co/datasets/LEAP/ClimSim_low-res (low resolution), and https://huggingface.co/datasets/LEAP/ClimSim_low-res_aqua-planet (aquaplanet); the code is available at https://leap-stc.github.io/ClimSim and https://github.com/leap-stc/climsim-online.
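The hybrid idea in the abstract (collect input/output pairs from an expensive high-resolution physics component, fit an emulator to them, then call the emulator inside the host model's time-stepping loop) can be illustrated with a toy sketch. Everything below is hypothetical and is not ClimSim's code, data format, or architecture: the "expensive physics" is a simple linear relaxation, the "host dynamics" is a neighbour-exchange rule on a ring of columns, and the emulator is an ordinary least-squares line rather than a neural network over multivariate vectors.

```python
"""Toy sketch of hybrid physics-ML emulation (hypothetical, not the ClimSim code).

A cheap "host" model advances a coarse state, while an expensive
"high-resolution physics" routine supplies a subgrid tendency. We collect
input/output pairs from the expensive routine, fit an emulator to them, and
then swap the emulator into the time-stepping loop.
"""
import random
from typing import Callable, List, Tuple


def expensive_subgrid_physics(x: float) -> float:
    # Hypothetical stand-in for a costly high-resolution simulation:
    # relaxes each column's state toward 1.0.
    return -0.5 * (x - 1.0)


def host_dynamics(state: List[float], dt: float) -> List[float]:
    # Hypothetical coarse-grid dynamics: each column exchanges with its
    # right-hand neighbour on a periodic ring (stand-in for resolved transport).
    n = len(state)
    return [state[i] + dt * 0.1 * (state[(i + 1) % n] - state[i]) for i in range(n)]


def fit_linear_emulator(pairs: List[Tuple[float, float]]) -> Callable[[float], float]:
    # Ordinary least-squares fit of y = a * x + b to the input/output pairs.
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    a = cov / var
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def step(state: List[float], dt: float, subgrid: Callable[[float], float]) -> List[float]:
    # One hybrid time step: resolved dynamics plus the subgrid tendency,
    # both evaluated on the state at the start of the step.
    advected = host_dynamics(state, dt)
    return [adv + dt * subgrid(x) for adv, x in zip(advected, state)]


def run(state: List[float], dt: float, n_steps: int,
        subgrid: Callable[[float], float]) -> List[float]:
    # Advance the coarse state n_steps times with the given subgrid component.
    for _ in range(n_steps):
        state = step(state, dt, subgrid)
    return state


rng = random.Random(0)
# Training data: (input state, output tendency) pairs from the expensive physics.
inputs = [rng.uniform(-2.0, 4.0) for _ in range(1000)]
pairs = [(x, expensive_subgrid_physics(x)) for x in inputs]
emulator = fit_linear_emulator(pairs)

# Compare a reference run (expensive physics) with a hybrid run (emulator).
initial = [rng.uniform(0.0, 2.0) for _ in range(8)]
reference = run(initial, dt=0.1, n_steps=50, subgrid=expensive_subgrid_physics)
hybrid = run(initial, dt=0.1, n_steps=50, subgrid=emulator)
max_err = max(abs(r - h) for r, h in zip(reference, hybrid))
print(f"max |reference - hybrid| after 50 steps: {max_err:.2e}")
```

Because the toy physics is exactly linear, the fitted emulator reproduces it up to floating-point error, so the hybrid run tracks the reference closely. With real high-resolution physics, the emulator is only approximate, and its errors can accumulate over many coupled time steps; this is the stability challenge the abstract refers to.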

Citation Information
APA Format
Sungduk Yu, Zeyuan Hu, Akshay Subramaniam, Walter Hannah, Liran Peng, Jerry Lin, Mohamed Aziz Bhouri, Ritwik Gupta, Björn Lütjens, Justus C. Will, Gunnar Behrens, Julius J. M. Busecke, Nora Loose, Charles I Stern, Tom Beucler, Bryce Harrop, Helge Heuer, Benjamin R Hillman, Andrea Jenney, Nana Liu, Alistair White, Tian Zheng, Zhiming Kuang, Fiaz Ahmed, Elizabeth Barnes, Noah D. Brenowitz, Christopher Bretherton, Veronika Eyring, Savannah Ferretti, Nicholas Lutsko, Pierre Gentine, Stephan Mandt, J. David Neelin, Rose Yu, Laure Zanna, Nathan M. Urban, Janni Yuval, Ryan Abernathey, Pierre Baldi, Wayne Chuang, Yu Huang, Fernando Iglesias-Suarez, Sanket Jantre, Po-Lun Ma, Sara Shamekh, Guang Zhang, & Michael Pritchard. ClimSim-Online: A Large Multi-Scale Dataset and Framework for Hybrid Physics-ML Climate Emulation. Journal of Machine Learning Research.
BibTeX Format
@article{paper513,
  title = {{ClimSim-Online}: A Large Multi-Scale Dataset and Framework for Hybrid Physics-{ML} Climate Emulation},
  author = {Sungduk Yu and Zeyuan Hu and Akshay Subramaniam and Walter Hannah and Liran Peng and Jerry Lin and Mohamed Aziz Bhouri and Ritwik Gupta and Björn Lütjens and Justus C. Will and Gunnar Behrens and Julius J. M. Busecke and Nora Loose and Charles I Stern and Tom Beucler and Bryce Harrop and Helge Heuer and Benjamin R Hillman and Andrea Jenney and Nana Liu and Alistair White and Tian Zheng and Zhiming Kuang and Fiaz Ahmed and Elizabeth Barnes and Noah D. Brenowitz and Christopher Bretherton and Veronika Eyring and Savannah Ferretti and Nicholas Lutsko and Pierre Gentine and Stephan Mandt and J. David Neelin and Rose Yu and Laure Zanna and Nathan M. Urban and Janni Yuval and Ryan Abernathey and Pierre Baldi and Wayne Chuang and Yu Huang and Fernando Iglesias-Suarez and Sanket Jantre and Po-Lun Ma and Sara Shamekh and Guang Zhang and Michael Pritchard},
  journal = {Journal of Machine Learning Research},
  url = {https://www.jmlr.org/papers/v26/24-1014.html}
}