ClimSim-Online: A Large Multi-scale Dataset and Framework for Hybrid ML-physics Climate Emulation

Bibliographic Details
Title: ClimSim-Online: A Large Multi-scale Dataset and Framework for Hybrid ML-physics Climate Emulation
Authors: Yu, Sungduk; Hu, Zeyuan; Subramaniam, Akshay; Hannah, Walter; Peng, Liran; Lin, Jerry; Bhouri, Mohamed Aziz; Gupta, Ritwik; Lütjens, Björn; Will, Justus C.; Behrens, Gunnar; Busecke, Julius J. M.; Loose, Nora; Stern, Charles I.; Beucler, Tom; Harrop, Bryce; Heuer, Helge; Hillman, Benjamin R.; Jenney, Andrea; Liu, Nana; White, Alistair; Zheng, Tian; Kuang, Zhiming; Ahmed, Fiaz; Barnes, Elizabeth; Brenowitz, Noah D.; Bretherton, Christopher; Eyring, Veronika; Ferretti, Savannah; Lutsko, Nicholas; Gentine, Pierre; Mandt, Stephan; Neelin, J. David; Yu, Rose; Zanna, Laure; Urban, Nathan; Yuval, Janni; Abernathey, Ryan; Baldi, Pierre; Chuang, Wayne; Huang, Yu; Iglesias-Suarez, Fernando; Jantre, Sanket; Ma, Po-Lun; Shamekh, Sara; Zhang, Guang; Pritchard, Michael
Publication Year: 2023
Collection: Computer Science
Physics (Other)
Subject Terms: Computer Science - Machine Learning, Physics - Atmospheric and Oceanic Physics
Description: Modern climate projections lack adequate spatial and temporal resolution due to computational constraints, leading to inaccuracies in representing critical processes like thunderstorms that occur on the sub-resolution scale. Hybrid methods combining physics with machine learning (ML) offer faster, higher fidelity climate simulations by outsourcing compute-hungry, high-resolution simulations to ML emulators. However, these hybrid ML-physics simulations require domain-specific data and workflows that have been inaccessible to many ML experts. As an extension of the ClimSim dataset (Yu et al., 2024), we present ClimSim-Online, which also includes an end-to-end workflow for developing hybrid ML-physics simulators. The ClimSim dataset includes 5.7 billion pairs of multivariate input/output vectors, capturing the influence of high-resolution, high-fidelity physics on a host climate simulator's macro-scale state. The dataset is global and spans ten years at a high sampling frequency. We provide a cross-platform, containerized pipeline to integrate ML models into operational climate simulators for hybrid testing. We also implement various ML baselines, alongside a hybrid baseline simulator, to highlight the ML challenges of building stable, skillful emulators. The data (https://huggingface.co/datasets/LEAP/ClimSim_high-res) and code (https://leap-stc.github.io/ClimSim and https://github.com/leap-stc/climsim-online) are publicly released to support the development of hybrid ML-physics and high-fidelity climate simulations.
Comment: This manuscript is an expanded version of our paper that received the Outstanding Paper Award at the NeurIPS 2023 conference.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2306.08754
Accession Number: edsarx.2306.08754
Database: arXiv