This repository contains the official implementation of the paper "Towards a Unified Copernicus Foundation Model for Earth Vision".
- 🌍 Copernicus-Pretrain: A massive-scale pretraining dataset with 18.7M aligned images from all major Copernicus Sentinel missions, spanning from the Earth's surface to its atmosphere.
- 🤖 Copernicus-FM: A unified foundation model capable of processing any spectral or non-spectral sensor modality using extended dynamic hypernetworks and flexible metadata encoding.
- 📊 Copernicus-Bench: A systematic evaluation benchmark with 15 hierarchical downstream tasks ranging from preprocessing to specialized applications for each Sentinel mission.
Copernicus-Pretrain is an extension of the SSL4EO-S12 dataset to all major Sentinel missions (S1-S5P). The images are organized into ~310K regional grid cells (0.25° × 0.25°, consistent with the ERA5 grid) that densely cover the entire land surface and near-land ocean, with time series from eight distinct Sentinel modalities.
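Because the grid follows ERA5's 0.25° spacing, a location can be matched to a cell with simple index arithmetic. The sketch below assumes that each regional cell is centered on a node of ERA5's regular latitude-longitude grid (721 latitudes from 90° to -90°, 1440 longitudes from 0° to 359.75°); the function names and the (row, col) indexing are hypothetical and not the dataset's own grid IDs.

```python
import math

RES = 0.25      # ERA5 grid spacing in degrees
N_LON = 1440    # longitude nodes: 0, 0.25, ..., 359.75
N_HALF = 360    # 90 / RES: latitude nodes above the equator

def era5_cell(lat, lon):
    """Return (row, col) of the ERA5 node nearest to (lat, lon).

    Assumption: cells are centered on ERA5 nodes; row 0 is the North Pole.
    """
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    # Round half up to the nearest multiple of RES (avoids banker's rounding).
    lat_idx = math.floor(lat / RES + 0.5)            # -360 .. 360
    lon_idx = math.floor(lon / RES + 0.5) % N_LON    # wraps negative longitudes
    return N_HALF - lat_idx, lon_idx

def node_center(row, col):
    """Latitude and longitude (0..360 convention) of an ERA5 node."""
    return 90.0 - row * RES, col * RES

def cell_bounds(row, col):
    """(south, north, west, east) of the 0.25° cell centered on a node."""
    lat, lon = node_center(row, col)
    half = RES / 2
    return lat - half, lat + half, lon - half, lon + half

# Usage: a point near Munich falls into the cell centered at (48.25, 11.5).
row, col = era5_cell(48.13, 11.58)
print(row, col, node_center(row, col))
```

Round-half-up via `math.floor(x + 0.5)` is used instead of `round()` so that points exactly on a cell edge are assigned consistently.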
🔽 Dataset access:
- Raw format (GeoTIFF): To be released soon.
- Streaming format (WebDataset): This version is available on HuggingFace.
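WebDataset shards are plain tar archives in which consecutive files sharing a key (the member path up to the first dot in the file name) form one sample. The stdlib-only sketch below illustrates that grouping convention; it is not the official loader, and member names such as `grid_000001.s2_toa.tif` are hypothetical rather than the actual layout of the Copernicus-Pretrain shards. In practice the `webdataset` library performs this grouping for you.

```python
import io
import re
import tarfile
from itertools import groupby

# Key = path up to the first "." in the basename; extension = the rest.
_KEY_RE = re.compile(r"^((?:.*/|)[^./]+)\.([^/]*)$")

def split_key(name):
    """Split a tar member name into (key, extension)."""
    m = _KEY_RE.match(name)
    if m is None:
        raise ValueError(f"no extension in member name: {name}")
    return m.group(1), m.group(2)

def iter_samples(fileobj):
    """Yield {"__key__": key, ext: bytes, ...} for runs of members sharing a key."""
    with tarfile.open(fileobj=fileobj, mode="r") as tar:
        members = [m for m in tar.getmembers() if m.isfile()]
        for key, group in groupby(members, key=lambda m: split_key(m.name)[0]):
            sample = {"__key__": key}
            for member in group:
                sample[split_key(member.name)[1]] = tar.extractfile(member).read()
            yield sample

def make_demo_shard():
    """Build a tiny in-memory shard with hypothetical member names."""
    buf = io.BytesIO()
    files = [("grid_000001.s1_grd.tif", b"s1"),
             ("grid_000001.s2_toa.tif", b"s2"),
             ("grid_000002.s1_grd.tif", b"s1b")]
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf

for sample in iter_samples(make_demo_shard()):
    print(sample["__key__"], sorted(k for k in sample if k != "__key__"))
```

Grouping only consecutive members (rather than collecting all files per key) is what lets WebDataset readers process shards sequentially without random access.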
📂 Further details: Copernicus-Pretrain/
Copernicus-FM is an extension of the DOFA foundation model that can process any spectral or non-spectral sensor modality using extended dynamic hypernetworks and flexible metadata encoding. The model is pretrained on the Copernicus-Pretrain dataset with masked image modeling and continual distillation.
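The central idea behind a DOFA-style dynamic hypernetwork is that the patch-embedding weights are not fixed per input channel; instead, a small network generates them for each band from its spectral properties, so one model can accept any number of bands. The numpy sketch below illustrates only that idea. The class and function names are hypothetical, the generator is an assumed two-layer MLP with random weights, and conditioning on both central wavelength and bandwidth is an assumption of this sketch, not a description of the released Copernicus-FM code.

```python
import numpy as np

def fourier_encode(x, dim, max_period=10000.0):
    """Sinusoidal encoding of a scalar (e.g. a wavelength in nm) into `dim` features."""
    half = dim // 2
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    angles = x * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

class SpectralHyperPatchEmbed:
    """Hypothetical patch embedding whose per-band kernels come from a hypernetwork."""

    def __init__(self, patch=4, embed_dim=8, enc_dim=16, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.patch, self.embed_dim, self.enc_dim = patch, embed_dim, enc_dim
        # Hypernetwork: 2-layer MLP mapping a band encoding to a flattened kernel.
        self.w1 = rng.normal(0.0, 0.1, (2 * enc_dim, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, embed_dim * patch * patch))

    def band_kernel(self, wavelength_nm, bandwidth_nm):
        # Encode central wavelength and bandwidth, then generate the kernel.
        code = np.concatenate([fourier_encode(wavelength_nm, self.enc_dim),
                               fourier_encode(bandwidth_nm, self.enc_dim)])
        h = np.maximum(code @ self.w1, 0.0)  # ReLU
        return (h @ self.w2).reshape(self.embed_dim, self.patch * self.patch)

    def __call__(self, image, wavelengths, bandwidths):
        # image: (C, H, W) with H and W divisible by the patch size.
        c, h, w = image.shape
        p = self.patch
        # Split every band into non-overlapping p x p patches: (C, N, p*p).
        patches = (image.reshape(c, h // p, p, w // p, p)
                        .transpose(0, 1, 3, 2, 4)
                        .reshape(c, (h // p) * (w // p), p * p))
        tokens = np.zeros((patches.shape[1], self.embed_dim))
        for band, wl, bw in zip(patches, wavelengths, bandwidths):
            # Each band contributes through its own generated kernel.
            tokens += band @ self.band_kernel(wl, bw).T
        return tokens

# Usage: nominal Sentinel-2 B2, B3, B4, B8 centers and bandwidths (nm).
embed = SpectralHyperPatchEmbed()
s2_like = np.random.default_rng(1).random((4, 16, 16))
s2_tokens = embed(s2_like, [490, 560, 665, 842], [65, 35, 30, 115])
print(s2_tokens.shape)  # 16 patches, each an 8-dim token
```

Because the per-band contributions are summed, the output width does not depend on the band count: a 2-band or 21-band input yields tokens of the same shape, which is what allows a single backbone to serve different sensors.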
🔽 Weights access: The model weights are available on HuggingFace.
📂 Further details: Copernicus-FM/
Copernicus-Bench is a systematic evaluation benchmark with 15 hierarchical downstream datasets organized into three levels of application, covering all major Sentinel missions (S1, S2, S3, and S5P). Of these, 9 are derived from existing datasets and 6 are newly curated.
Level | Name | Modality | Task | Source |
---|---|---|---|---|
L1 | Cloud-S2 | S2 TOA | segmentation (cloud) | CloudSEN12 |
L1 | Cloud-S3 | S3 OLCI | segmentation (cloud) | new |
L2 | EuroSAT-S1 | S1 GRD | classification (LULC) | EuroSAT-SAR |
L2 | EuroSAT-S2 | S2 TOA | classification (LULC) | EuroSAT |
L2 | BigEarthNet-S1 | S1 GRD | classification (LULC) | BigEarthNet v2.0 |
L2 | BigEarthNet-S2 | S2 SR | classification (LULC) | BigEarthNet v2.0 |
L2 | LC100Cls-S3 | S3 OLCI | classification (LULC) | new |
L2 | DFC2020-S1 | S1 GRD | segmentation (LULC) | DFC2020 |
L2 | DFC2020-S2 | S2 TOA | segmentation (LULC) | DFC2020 |
L2 | LC100Seg-S3 | S3 OLCI | segmentation (LULC) | new |
L3 | Flood-S1 | S1 GRD | change detection (flood) | Kuro Siwo |
L3 | LCZ-S2 | S2 TOA | classification (local climate zone) | So2Sat LCZ42 |
L3 | Biomass-S3 | S3 OLCI | regression (biomass) | new |
L3 | AQ-NO2-S5P | S5P NO2 | regression (air quality) | new |
L3 | AQ-O3-S5P | S5P O3 | regression (air quality) | new |
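When selecting tasks for an experiment, it can help to hold the table above as a small registry. The sketch below is a hypothetical helper, not part of the Copernicus-Bench code; it copies the table rows verbatim and filters them by level or Sentinel mission.

```python
from collections import Counter
from typing import NamedTuple

class BenchTask(NamedTuple):
    level: str
    name: str
    modality: str
    task: str
    source: str  # "new" marks a newly curated dataset

TASKS = [
    BenchTask("L1", "Cloud-S2", "S2 TOA", "segmentation (cloud)", "CloudSEN12"),
    BenchTask("L1", "Cloud-S3", "S3 OLCI", "segmentation (cloud)", "new"),
    BenchTask("L2", "EuroSAT-S1", "S1 GRD", "classification (LULC)", "EuroSAT-SAR"),
    BenchTask("L2", "EuroSAT-S2", "S2 TOA", "classification (LULC)", "EuroSAT"),
    BenchTask("L2", "BigEarthNet-S1", "S1 GRD", "classification (LULC)", "BigEarthNet v2.0"),
    BenchTask("L2", "BigEarthNet-S2", "S2 SR", "classification (LULC)", "BigEarthNet v2.0"),
    BenchTask("L2", "LC100Cls-S3", "S3 OLCI", "classification (LULC)", "new"),
    BenchTask("L2", "DFC2020-S1", "S1 GRD", "segmentation (LULC)", "DFC2020"),
    BenchTask("L2", "DFC2020-S2", "S2 TOA", "segmentation (LULC)", "DFC2020"),
    BenchTask("L2", "LC100Seg-S3", "S3 OLCI", "segmentation (LULC)", "new"),
    BenchTask("L3", "Flood-S1", "S1 GRD", "change detection (flood)", "Kuro Siwo"),
    BenchTask("L3", "LCZ-S2", "S2 TOA", "classification (local climate zone)", "So2Sat LCZ42"),
    BenchTask("L3", "Biomass-S3", "S3 OLCI", "regression (biomass)", "new"),
    BenchTask("L3", "AQ-NO2-S5P", "S5P NO2", "regression (air quality)", "new"),
    BenchTask("L3", "AQ-O3-S5P", "S5P O3", "regression (air quality)", "new"),
]

def tasks_for(mission, tasks=TASKS):
    """Names of tasks whose modality belongs to a mission, e.g. "S1" or "S5P"."""
    return [t.name for t in tasks if t.modality.split()[0] == mission]

def count_by_level(tasks=TASKS):
    """Number of datasets at each benchmark level."""
    return Counter(t.level for t in tasks)

print(count_by_level())
print(tasks_for("S3"))
```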
🔽 Dataset access: The benchmark datasets are available on HuggingFace.
📂 Further details: Copernicus-Bench/
This repository is licensed under the Apache License 2.0. The Copernicus-Pretrain dataset, the newly curated datasets in Copernicus-Bench, and the pretrained weights of Copernicus-FM are licensed under CC-BY-4.0.
@misc{wang2025unifiedcopernicusfoundationmodel,
title={Towards a Unified Copernicus Foundation Model for Earth Vision},
author={Yi Wang and Zhitong Xiong and Chenying Liu and Adam J. Stewart and Thomas Dujardin and Nikolaos Ioannis Bountos and Angelos Zavras and Franziska Gerken and Ioannis Papoutsis and Laura Leal-Taixé and Xiao Xiang Zhu},
year={2025},
eprint={2503.11849},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.11849},
}