India IODC satellite data #223

Open
peterdudfield opened this issue Feb 5, 2024 · 6 comments

@peterdudfield
Collaborator

peterdudfield commented Feb 5, 2024

The long range forecast for RUVNL, as well as requiring extended ECMWF data (openclimatefix/nwp-consumer#138) and a new Meteomatics archive, necessitates the collection of Satellite data for India.

[Part of RUVNL]

@jacobbieker
Member

Processed data is currently being stored here: https://huggingface.co/datasets/openclimatefix/eumetsat-iodc, although it will eventually be collated and pushed to the GCP Public Dataset.

@peterdudfield
Collaborator Author

Do we know what years of data have been collected?

@jacobbieker
Member

No, it's currently filling in random timesteps from the whole archive, 2017-now. Once dagster has its external assets tracker thing, it'll be able to know what's on HF and we can more systematically fill in the dataset. It has some data from all years though.
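A minimal sketch of the more systematic fill described in this comment, assuming the set of timestamps already on HF is available as a Python set. The function name, the 15-minute cadence, and the example timestamps are illustrative assumptions and are not taken from the actual pipeline.

```python
from datetime import datetime, timedelta


def missing_timesteps(start, end, existing, step=timedelta(minutes=15)):
    """Return the expected timesteps in [start, end) that are not in `existing`, oldest first."""
    missing = []
    current = start
    while current < end:
        if current not in existing:
            missing.append(current)
        current += step
    return missing


# Hypothetical example: one hour of data with two timesteps already uploaded.
start = datetime(2019, 1, 1, 0, 0)
end = datetime(2019, 1, 1, 1, 0)
existing = {datetime(2019, 1, 1, 0, 0), datetime(2019, 1, 1, 0, 30)}

todo = missing_timesteps(start, end, existing)
for ts in todo:
    print(ts.isoformat())
```

Working from an explicit list of gaps, rather than sampling random timesteps, would avoid re-checking timesteps that are already present.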

@devsjc
Contributor

devsjc commented Feb 26, 2024

Currently at 16,000/~175,000.

@peterdudfield
Collaborator Author

Could speed it up by removing the Europe 15-minute data.
The range is currently 2017 to 2024. Shrink this down to 2019 to 2024.
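As a rough illustration of how much the shorter range reduces the work, here is a sketch that counts timesteps for both ranges. The 15-minute cadence and the choice to stop at 1 January of the end year are assumptions; the thread does not say whether the end year is inclusive.

```python
from datetime import datetime, timedelta


def count_timesteps(start_year, end_year, step_minutes=15):
    """Count timesteps from 1 Jan of start_year up to, but not including, 1 Jan of end_year."""
    span = datetime(end_year, 1, 1) - datetime(start_year, 1, 1)
    return span // timedelta(minutes=step_minutes)


full = count_timesteps(2017, 2024)     # assumed current range
reduced = count_timesteps(2019, 2024)  # proposed shorter range

print(f"2017 to 2024: {full} timesteps")
print(f"2019 to 2024: {reduced} timesteps")
print(f"reduction: {1 - reduced / full:.0%}")
```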

@devsjc changed the title from "Satellite India" to "India IODC satellite data" on Apr 19, 2024
@devsjc
Contributor

devsjc commented Apr 19, 2024

It seems the vast majority of this is now available on Hugging Face: https://huggingface.co/datasets/openclimatefix/eumetsat-iodc. This might be enough to mark the task as done, @peterdudfield?

Also, for future reference: as part of the handover from Jacob, I've been instructed to include this in the dataset hosted on Google Public Datasets, as opposed to updating the dataset hosted on Hugging Face. This is because GCP has faster reads for Zarr and can handle non-zipped yearly Zarr folders, as opposed to zipped Zarrs per timestep. See #223

@peterdudfield assigned devsjc and unassigned jacobbieker on May 31, 2024