Data preservation for future stability & reproducibility #41

evalieungh opened this issue Feb 26, 2025 · 1 comment
Labels
enhancement New feature or request

Comments

@evalieungh
Collaborator

@MichalTorma @trossi
It seems we are all having issues downloading the data needed to run ModGP. With the Capfitogen development, there will be even more data sources, which adds further vulnerability if one of the data providers fails, changes formats, or something else breaks.

How should we handle this? I dislike copying things that are better hosted elsewhere, but is there somewhere we could store a bundle of the global data and general downloads that will be common to several taxa? They are obviously too big for GitHub, and I'm not sure whether it's OK to upload other people's data on e.g. Zenodo. Are there other solutions that are openly available and stable enough that our prototype can live on for at least a couple of years?

@evalieungh evalieungh added the enhancement New feature or request label Feb 26, 2025
@MichalTorma
Collaborator

I feel your pain, and there is certainly a need for this kind of environmental data to be kept, and even accumulated over time, somewhere close to any future DT. I just don't see where that would be, since it's quite a demand on resources when we are talking global scale. I think the official plan is to have many of these sources available from some meteorological twin, and that way we could eliminate a lot of the redundancy here.
The idea behind BioDT, as I understand it, is to showcase example use cases, and it will have to be some other project or collaboration that allows it to live on...
I think it is important to find some long-term sustainable solution here, but I don't really have one... Zenodo might be feasible, but as you say, it's questionable practice to upload someone else's data.
One thing that might help, at least to identify issues early, would be some sort of integration test harness that runs on a regular basis at small scale (as mentioned in #32). I know it's not best practice to ping servers regularly for downloads that are never going to be used, but since they recently changed the public API response type, I don't feel too obliged to follow the standard courtesies...
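
To make the harness idea a bit more concrete, here is a minimal sketch in Python of what a small-scale check could look like. Everything in it is an assumption for illustration: the `Source`, `check_source`, and `run_checks` names are hypothetical, the URL is a placeholder, and none of this exists in ModGP today. It only requests the first kilobyte of each resource (servers that ignore the `Range` header will still send more) and flags a changed content type, which is the kind of breakage mentioned above.

```python
import urllib.request
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Source:
    name: str
    url: str            # placeholder URLs only; real endpoints are not listed here
    expected_type: str  # content type we expect, e.g. "application/json"


@dataclass
class Result:
    name: str
    ok: bool
    detail: str


def http_probe(url: str, timeout: float = 30.0) -> Tuple[int, str]:
    """Request only the first 1 KiB and return (HTTP status, content type)."""
    req = urllib.request.Request(url, headers={"Range": "bytes=0-1023"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        resp.read(1024)
        return resp.status, resp.headers.get_content_type()


def check_source(source: Source,
                 probe: Callable[[str], Tuple[int, str]] = http_probe) -> Result:
    """Probe one data source and report whether it still looks as expected."""
    try:
        status, content_type = probe(source.url)
    except Exception as exc:  # network errors, HTTP 4xx/5xx, timeouts
        return Result(source.name, False, f"request failed: {exc}")
    if status not in (200, 206):  # 206 = partial content for Range requests
        return Result(source.name, False, f"unexpected status {status}")
    if content_type != source.expected_type:
        return Result(source.name, False, f"content type changed: {content_type}")
    return Result(source.name, True, "ok")


def run_checks(sources: List[Source],
               probe: Callable[[str], Tuple[int, str]] = http_probe) -> List[Result]:
    """Check every source; one failing source does not stop the others."""
    return [check_source(s, probe) for s in sources]


if __name__ == "__main__":
    # Hypothetical example entry; replace with the real data providers.
    sources = [Source("example-provider", "https://example.org/api/data", "application/json")]
    for result in run_checks(sources):
        print(f"{result.name}: {'OK' if result.ok else 'FAIL'} ({result.detail})")
```

The `probe` argument is injectable so the logic can be tested without touching any server. A script like this could presumably be run from a scheduled CI job or cron, at whatever interval keeps the load on the providers negligible.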

2 participants