
Main data repository for parameter estimates stored in the SCRC data registry


ScottishCovidResponse/DataRepository


DataRepository

This is a git repository for storing human-readable data for the SCRC data pipeline. We anticipate that the Data Products stored here will primarily be TOML files encoding epidemiological parameters.

Data Product naming

The expected format of the components of a data product in the pipeline is described in this file on SCRC Teams.

Naming conventions

We suggest that namespaces for Data Products contain only ASCII letters, ASCII digits, underscores, and dashes (A-Za-z0-9_-). Data Product names within a namespace can also include forward slashes (/) to denote structure. Component names in TOML files should use the same characters as namespaces (A-Za-z0-9_-); these are exactly the characters allowed in TOML bare keys. Component names in HDF5 files, like Data Product names, can also include forward slashes (/) to denote sub-components. None of these conventions are currently enforced, but we suggest that everyone follows them until we find a reason to change them.
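The suggested character sets can be checked with a couple of regular expressions. The sketch below is a minimal, hypothetical Python helper (the function names are illustrative and not part of any SCRC API). It also assumes that slash-separated names may not contain empty segments, i.e. no leading, trailing or doubled slashes, which the conventions above do not spell out.

```python
import re

# Hypothetical helpers for the suggested (not enforced) naming conventions.
# One "bare" segment: ASCII letters, digits, underscores and dashes,
# i.e. the characters allowed in TOML bare keys.
_SEGMENT = r"[A-Za-z0-9_-]+"
_BARE = re.compile(_SEGMENT)
# Slash-separated names; assumes no empty segments (no leading,
# trailing or doubled slashes).
_SLASHED = re.compile(rf"{_SEGMENT}(?:/{_SEGMENT})*")


def is_valid_namespace(name: str) -> bool:
    return _BARE.fullmatch(name) is not None


def is_valid_toml_component(name: str) -> bool:
    return _BARE.fullmatch(name) is not None


def is_valid_product_name(name: str) -> bool:
    return _SLASHED.fullmatch(name) is not None


def is_valid_hdf5_component(name: str) -> bool:
    return _SLASHED.fullmatch(name) is not None


print(is_valid_namespace("SCRC"))  # True
print(is_valid_product_name("human/infection/SARS-CoV-2/latent-period"))  # True
print(is_valid_toml_component("latent period"))  # False (contains a space)
```

Since the conventions are only suggestions, a check like this would most naturally run as a warning rather than a hard failure.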

Location within repository and filenames

Data Products in this repository should be stored in folders according to their namespace, data product name and version number. For example, version v0.0.1 of the human/infection/SARS-CoV-2/latent-period Data Product in the SCRC namespace should be found in SCRC/human/infection/SARS-CoV-2/latent-period and named v0.0.1.toml. Following this convention makes the repository easy to browse.
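As a minimal sketch, the location convention maps a namespace, Data Product name and version onto a repository-relative path. The helper name product_path is hypothetical; PurePosixPath is used so that the forward slashes in the product name become folder separators on any operating system.

```python
from pathlib import PurePosixPath


def product_path(namespace: str, product_name: str, version: str) -> PurePosixPath:
    """Return <namespace>/<data product name>/<version>.toml.

    The '/' characters inside product_name become nested folders.
    """
    return PurePosixPath(namespace, product_name, f"{version}.toml")


print(product_path("SCRC", "human/infection/SARS-CoV-2/latent-period", "v0.0.1"))
# SCRC/human/infection/SARS-CoV-2/latent-period/v0.0.1.toml
```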

TOML file format

Single-component TOML files

A TOML file can currently store three types of information:

  1. A simple point estimate of a parameter:

[latent-period]
type = "point-estimate"
value = 123.12

  2. The distribution of a parameter:

[latent-period]
type = "distribution"
distribution = "gamma"
shape = 1
scale = 2

  3. Empirical samples drawn from the distribution of a parameter:

[latent-period]
type = "samples"
samples = [1.0, 2.0, 3.0, 4.0, 5.0]

In the examples above, each data product file has a single component, called latent-period. If a data product has only one component, we suggest giving it the same name as the last part of the data product's name in the namespace; for human/infection/SARS-CoV-2/latent-period, this would be latent-period. This name will be used as the default if no component name is given in a function call.
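The default-name rule amounts to taking everything after the last forward slash in the Data Product name. The functions below are a hypothetical sketch of how a reader could apply that default; they do not reproduce the pipeline's actual API.

```python
from typing import Optional


def default_component(product_name: str) -> str:
    """Return the last '/'-separated part of a Data Product name."""
    return product_name.rsplit("/", 1)[-1]


def read_component(data: dict, product_name: str, component: Optional[str] = None) -> dict:
    """Hypothetical reader: fall back to the default component name.

    `data` is a parsed TOML file mapping component names to tables.
    """
    key = component if component is not None else default_component(product_name)
    return data[key]


data = {"latent-period": {"type": "point-estimate", "value": 123.12}}
print(read_component(data, "human/infection/SARS-CoV-2/latent-period")["value"])  # 123.12
```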

Multiple-component TOML files

You can have multiple components of any kind in a single data product. For example:

[latent-period]
type = "point-estimate"
value = 123.12

[asymptomatic-period] 
type = "point-estimate" 
value = 200.1

The only further constraint is that all component names (here latent-period and asymptomatic-period) must be distinct.
