Create data package for spatialEpisim #3

bryce-carson · 2024-06-10T20:10:18Z

A package is a good way to distribute the data, because the script which generated the data can be distributed alongside it. Essentially, "modularize" the data so that spatialEpisim itself is not just some heap of data, scripts, and then a Shiny app consuming that.

Remember, "code as data" also means "data as code", which implies using data():

data(spatRastersAggregated, package = "spatialEpisim")

bryce-carson · 2024-07-16T00:49:25Z

Per the topic, sharing the pre-aggregated data in a package rather than as bundled data makes sense if the user is installing/running our Shiny app with shiny::runGitHub("ashokkrish/spatialEpisim"). This will, presumably, install all packages required and then download the application and run it. Shipping the data alongside the application is a good idea, but the data might be useful to someone else separately. It also simplifies the application marginally if we don't bother to read the data ourselves and only need to run data(spatRastersAggregated, package = "spatialEpisim").

This isn't necessary, however.
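For reference, here is a minimal sketch of what loading a packaged dataset with data() could look like inside the application. The helper name load_packaged_data is hypothetical, and mtcars from the base datasets package stands in for spatRastersAggregated so the example runs without any extra packages installed.

```r
# Hypothetical helper: load one dataset from a package into a private
# environment and return it, so nothing leaks into the global environment.
load_packaged_data <- function(name, package) {
  env <- new.env()
  # `package` must be a character string; an unquoted name would be
  # evaluated as a variable and fail.
  utils::data(list = name, package = package, envir = env)
  get(name, envir = env)
}

# Stand-in example: "mtcars" from "datasets" instead of the real data package.
cars <- load_packaged_data("mtcars", "datasets")
str(cars[1:3])
```

The application would then call the helper with the real dataset and package names instead of reading files from a data/ folder itself.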

bryce-carson · 2024-07-25T05:18:58Z

https://ropensci.org/blog/2021/06/22/setup-runiverse/

bryce-carson · 2024-07-26T17:19:09Z

Preferred, given I've read more about it: https://dirk.eddelbuettel.com/code/drat.html

bryce-carson · 2024-08-13T18:20:40Z

Backup of data in our private Google Drive

Don't worry, it's restricted. Only people who've been granted access to that Google Drive folder can utilize the link; for others it leads nowhere.

bryce-carson · 2024-08-13T18:25:44Z

@ashokkrish, please see the above comment for a link to a backup of ye olde data/ folder. Part of creating the data package will be to remove these files from the repository (in the leftover-spaghetti-casserole branch, at first).

Having lots of large data files in a Git repository has a cost: because they are binary files, they affect Git's performance differently than text files like source code, or small binary files, do. Large binary files slow Git down when it pushes and pulls, or when someone updates their repository from a very old copy to a new one.

It's not a big deal, but when I remove the data from the repository it will also be removed from the repository's history. That would effectively break archived versions of the software from long ago, because the data they expect will no longer be available.

What do you think?

#35 is related: if we're not given permission to redistribute the data, we'll need to remove it from our history as well. Otherwise we'd still be redistributing it, just in an older version of our software repository. Did the Copyright office at MRU have anything to say about that issue?
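Before pruning anything, it may help to see which files are actually large. The following is only a sketch: list_large_files and the 1 MB default threshold are illustrative choices, not part of spatialEpisim, and the data/ path is assumed to be relative to the repository root.

```r
# List files under `dir` whose size is at least `min_bytes`, largest first.
# Both the function name and the default threshold are illustrative.
list_large_files <- function(dir, min_bytes = 1e6) {
  paths <- list.files(dir, recursive = TRUE, full.names = TRUE)
  sizes <- file.size(paths)
  keep <- !is.na(sizes) & sizes >= min_bytes
  result <- data.frame(path = paths[keep], bytes = sizes[keep],
                       stringsAsFactors = FALSE)
  result[order(result$bytes, decreasing = TRUE), , drop = FALSE]
}

# Audit the repository's data/ folder, if it exists in the working directory.
if (dir.exists("data")) print(list_large_files("data"))
```

Note that this only inspects the working tree; files that were deleted earlier but still live in the history would need a separate check.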

bryce-carson · 2024-08-14T02:54:52Z

@ashokkrish, I believe we should publish the data that _we are allowed to redistribute_ under a compatible license, and publish this on Zenodo, which is supported by CERN and is hosted in their data centre (so it's highly available and reliable).

In my work with Prof. Jon Mee, I published the data that I created as a research output on Zenodo and Dryad.

The only original data product I've produced in this research is the preaggregated WorldPop data. There's probably already an existing data product of this somewhere else; we could search for that and find out whether it's in a scientific data repository with high availability, or we can just go ahead with publishing that alone, and publish other original outputs separately.

We need to be careful about re-hosting and redistributing data we don't have rights to, like the GADM and WorldPop data which I haven't manipulated (preaggregated). We might need to prune those data from this repository.

bryce-carson · 2024-08-14T03:01:44Z

@ashokkrish, you could also share the data that users will need through Google Drive with a public link, if you're confident in your rights to do so (your rights to share the data).

bryce-carson · 2024-08-14T04:15:36Z

https://docs.github.com/en/repositories/working-with-files/managing-large-files/collaboration-with-git-large-file-storage

I don't know whether LFS would help us more than it would hinder us.

bryce-carson · 2024-08-14T07:04:10Z

I have created a data package for spatialEpisim.

Its R package sources are here: https://github.com/KrishnamurthyLab/spatialEpisim.data
Its R package website is here: https://krishnamurthylab.github.io/spatialEpisim.data/

It ONLY contains the seed data and the preaggregated GeoTIFF data from WorldPop, because that's the data we have rights to redistribute.

bryce-carson mentioned this issue Jun 11, 2024

Concatenate CSV data into a spreadsheet #12

Closed

2 tasks

bryce-carson added the enhancement New feature or request label Jun 12, 2024

bryce-carson self-assigned this Jul 18, 2024

bryce-carson added the UNSTABLE The work is to be completed in the unstable branch label Jul 18, 2024

bryce-carson closed this as completed Aug 14, 2024
