ClimSim-Online: A Large Multi-Scale Dataset and Framework for Hybrid Physics-ML Climate Emulation
Paper Information
- Journal: Journal of Machine Learning Research
- Added to Tracker: Sep 08, 2025
Abstract
Modern climate projections lack adequate spatial and temporal resolution due to computational constraints, leading to inaccuracies in representing critical processes like thunderstorms that occur on the sub-resolution scale. Hybrid methods combining physics with machine learning (ML) offer faster, higher fidelity climate simulations by outsourcing compute-hungry, high-resolution simulations to ML emulators. However, these hybrid physics-ML simulations require domain-specific data and workflows that have been inaccessible to many ML experts. This paper is an extended version of our NeurIPS award-winning ClimSim dataset paper. The ClimSim dataset includes 5.7 billion pairs of multivariate input/output vectors spanning ten years at high temporal resolution, capturing the influence of high-resolution, high-fidelity physics on a host climate simulator's macro-scale state. In this extended version, we introduce a significant new contribution in Section 5, which provides a cross-platform, containerized pipeline to integrate ML models into operational climate simulators for hybrid testing. We also implement various baselines of ML models and hybrid simulators to highlight the ML challenges of building stable, skillful emulators. The data (https://huggingface.co/datasets/LEAP/ClimSim_high-res, also in a low-resolution version at https://huggingface.co/datasets/LEAP/ClimSim_low-res and an aquaplanet version at https://huggingface.co/datasets/LEAP/ClimSim_low-res_aqua-planet) and code (https://leap-stc.github.io/ClimSim and https://github.com/leap-stc/climsim-online) are publicly released to support the development of hybrid physics-ML and high-fidelity climate simulations.
Author Details
Sungduk Yu
Zeyuan Hu
Akshay Subramaniam
Walter Hannah
Liran Peng
Jerry Lin
Mohamed Aziz Bhouri
Ritwik Gupta
Björn Lütjens
Justus C. Will
Gunnar Behrens
Julius J. M. Busecke
Nora Loose
Charles I Stern
Tom Beucler
Bryce Harrop
Helge Heuer
Benjamin R Hillman
Andrea Jenney
Nana Liu
Alistair White
Tian Zheng
Zhiming Kuang
Fiaz Ahmed
Elizabeth Barnes
Noah D. Brenowitz
Christopher Bretherton
Veronika Eyring
Savannah Ferretti
Nicholas Lutsko
Pierre Gentine
Stephan Mandt
J. David Neelin
Rose Yu
Laure Zanna
Nathan M. Urban
Janni Yuval
Ryan Abernathey
Pierre Baldi
Wayne Chuang
Yu Huang
Fernando Iglesias-Suarez
Sanket Jantre
Po-Lun Ma
Sara Shamekh
Guang Zhang
Michael Pritchard
Citation Information
APA Format
Sungduk Yu, Zeyuan Hu, Akshay Subramaniam, Walter Hannah, Liran Peng, Jerry Lin, Mohamed Aziz Bhouri, Ritwik Gupta, Björn Lütjens, Justus C. Will, Gunnar Behrens, Julius J. M. Busecke, Nora Loose, Charles I Stern, Tom Beucler, Bryce Harrop, Helge Heuer, Benjamin R Hillman, Andrea Jenney, Nana Liu, Alistair White, Tian Zheng, Zhiming Kuang, Fiaz Ahmed, Elizabeth Barnes, Noah D. Brenowitz, Christopher Bretherton, Veronika Eyring, Savannah Ferretti, Nicholas Lutsko, Pierre Gentine, Stephan Mandt, J. David Neelin, Rose Yu, Laure Zanna, Nathan M. Urban, Janni Yuval, Ryan Abernathey, Pierre Baldi, Wayne Chuang, Yu Huang, Fernando Iglesias-Suarez, Sanket Jantre, Po-Lun Ma, Sara Shamekh, Guang Zhang & Michael Pritchard. ClimSim-Online: A Large Multi-Scale Dataset and Framework for Hybrid Physics-ML Climate Emulation. Journal of Machine Learning Research.
BibTeX Format
@article{paper513,
title = { ClimSim-Online: A Large Multi-Scale Dataset and Framework for Hybrid Physics-ML Climate Emulation },
author = {
Sungduk Yu
and Zeyuan Hu
and Akshay Subramaniam
and Walter Hannah
and Liran Peng
and Jerry Lin
and Mohamed Aziz Bhouri
and Ritwik Gupta
and Björn Lütjens
and Justus C. Will
and Gunnar Behrens
and Julius J. M. Busecke
and Nora Loose
and Charles I Stern
and Tom Beucler
and Bryce Harrop
and Helge Heuer
and Benjamin R Hillman
and Andrea Jenney
and Nana Liu
and Alistair White
and Tian Zheng
and Zhiming Kuang
and Fiaz Ahmed
and Elizabeth Barnes
and Noah D. Brenowitz
and Christopher Bretherton
and Veronika Eyring
and Savannah Ferretti
and Nicholas Lutsko
and Pierre Gentine
and Stephan Mandt
and J. David Neelin
and Rose Yu
and Laure Zanna
and Nathan M. Urban
and Janni Yuval
and Ryan Abernathey
and Pierre Baldi
and Wayne Chuang
and Yu Huang
and Fernando Iglesias-Suarez
and Sanket Jantre
and Po-Lun Ma
and Sara Shamekh
and Guang Zhang
and Michael Pritchard
},
journal = { Journal of Machine Learning Research },
url = { https://www.jmlr.org/papers/v26/24-1014.html }
}