Create data package for spatialEpisim #3
Comments
Per the topic, sharing the pre-aggregated data in a package rather than as bundled data makes sense if the user is installing/running our Shiny app with This isn't necessary, however.
Preferred, given I've read more about it: https://dirk.eddelbuettel.com/code/drat.html
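A minimal sketch of how a user might point R at a drat repository, since a drat repository is an ordinary CRAN-like repository served from GitHub Pages. The account name used here and the existence of a published drat repository are assumptions for illustration; nothing in this thread confirms that one has been set up. `drat_repos` is a hypothetical helper, not part of drat.

```r
# Build a repository list that includes a drat repository hosted on GitHub Pages.
# ASSUMPTION: the account name and the existence of a published drat repository
# are hypothetical; this thread does not confirm either.
drat_repos <- function(account) {
  drat_url <- sprintf("https://%s.github.io/drat/", account)
  repos <- c(getOption("repos"), drat_url)
  names(repos)[length(repos)] <- account
  repos
}

repos <- drat_repos("KrishnamurthyLab")

# Needs network access and a published package, so it is left commented out:
# install.packages("spatialEpisim.data", repos = repos)
```

Passing `repos` explicitly keeps the user's global options untouched; `drat::addRepo()` is the alternative if changing the session's repository list is acceptable.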
Backup of data in our private Google Drive
Don't worry, it's restricted. Only people who've been granted access to that Google Drive folder can use the link; for others it leads nowhere.
@ashokkrish, please see the above comment for a link to a backup of ye olde data/ folder. Part of creating the data package will be to remove these files from the repository (in the

Having lots of large data files in a Git repository has a cost: because they are binary files, they affect Git's performance differently than textual files like source code, or small binary files. Large binary files slow Git down when it pushes and pulls, or when someone updates their repository from a very old copy to a new one. It's not a big deal, but when I remove the data from the repository it will also be removed from the repository's history, which would effectively break archived versions of the software from long ago, because the data they expect won't be available. What do you think?

#35 is related: if we're not given permission to redistribute the data, we'll need to remove it from our history as well. Otherwise we'd still be redistributing it, just in an older version of our software repository. Did the Copyright office at MRU have anything to say about that issue?
@ashokkrish, I believe we should publish the data that _we are allowed to redistribute_ under a compatible license, and publish it on Zenodo, which is supported by CERN and hosted in their data centre (so it's highly available and reliable). In my work with Prof. Jon Mee, I published the data that I created as a research output on Zenodo and Dryad. The only original data product I've produced in this research is the preaggregated WorldPop data. There's probably already an existing data product of this somewhere else; we could search for that and find out whether it's in a scientific data repository with high availability, or we can just go ahead with publishing that alone, and publish other original outputs separately. We need to be careful about re-hosting and re-distributing data we don't have rights to, like the GADM and WorldPop data which I haven't manipulated (preaggregated). We might need to prune those data from this repository.
@ashokkrish, you could also share the data that users will need through Google Drive with a public link, if you're confident in your rights to share the data.
I don't know whether Git LFS would help us more than it would hurt.
I have created a data package for spatialEpisim. Its R package sources are here: https://github.com/KrishnamurthyLab/spatialEpisim.data It ONLY contains the seed data and the preaggregated GeoTIFF data from WorldPop, because that's the data we have rights to redistribute.
A package is a good way to distribute the data, because the script which generated the data can be distributed alongside it. Essentially, "modularize" the data so that spatialEpisim itself is not just some heap of data, scripts, and then a Shiny app consuming that.
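As a rough sketch of the "script distributed alongside the data" idea, a data package commonly keeps a generating script (conventionally under `data-raw/`) that writes an `.rda` file into `data/`. The dataset name, columns, and values below are hypothetical placeholders, not the actual contents of spatialEpisim.data, and a temporary directory stands in for the package's `data/` folder so the example runs anywhere.

```r
# data-raw/seed_data.R (hypothetical file name)
# Regenerates a bundled dataset from its generating script.

# Hypothetical seed data; the real columns and values are not shown in this thread.
seed_data <- data.frame(
  location = c("A", "B"),
  infected = c(10L, 3L),
  stringsAsFactors = FALSE
)

# In a package this would be "data/"; a temporary directory keeps the sketch
# runnable outside of a package source tree.
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE, recursive = TRUE)

# Save the object under its own name, as R expects for files in data/.
rda_path <- file.path(data_dir, "seed_data.rda")
save(seed_data, file = rda_path, compress = "xz")

# Round-trip check: load into a fresh environment and compare.
restored <- new.env()
load(rda_path, envir = restored)
stopifnot(identical(restored$seed_data, seed_data))
```

Once such a package is installed, a consumer such as the Shiny app could load the object with `data("seed_data", package = "spatialEpisim.data")`, assuming a dataset of that name were exported.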
Remember, "code as data" also means "data as code" which implies using data