
Datasets
This page lists the open-source datasets created during the Embed2Scale project:
SSL4EO-S12-downstream
SSL4EO-S12-downstream is an Earth Observation (EO) dataset of downstream tasks. It is released as a standalone dataset together with the NeuCo-Bench neural compression benchmarking framework.
SSL4EO-S12 v1.1
SSL4EO-S12 v1.1 is an updated version of the pre-training dataset SSL4EO-S12, which is a large-scale multimodal multitemporal dataset for unsupervised/self-supervised pre-training in Earth observation.
ClimateBenchPress
ClimateBenchPress is an open-source benchmark for lossy compression of climate data. The benchmark unifies data from heterogeneous sources, including both satellite and climate model data, under a common interface.
SSL4Eco
SSL4Eco is a Sentinel-2 dataset for pretraining geospatial foundation models. More specifically, the project proposes a recipe for building pretraining sets that capture the geographical and phenological diversity of ecosystems across the globe. The authors observe that this simple spatiotemporal sampling yields significant improvements on various downstream macroecological tasks.
