
Embed the version information in the respective Loc-I datasets so users can easily find out which version it is and where to get more info #16

Open
jyucsiro opened this issue Sep 9, 2019 · 4 comments
Labels: help wanted, Priority: Medium

Comments

@jyucsiro

jyucsiro commented Sep 9, 2019

How do we describe version information in each of the Loc-I datasets (e.g. ASGS, Geofabric, GNAF)?

  • each Loc-I dataset should describe its version information in the metadata, in a way that is consistent across datasets

Second part: implement this for each Loc-I enabled dataset.

Add details to this issue ticket. We will also need to document this somewhere so it is communicated consistently to users.

@jyucsiro jyucsiro changed the title Embed the version information in the Loc-I dataset so users can easily find out which version it is and where to get more info Embed the version information in the respective Loc-I datasets so users can easily find out which version it is and where to get more info Sep 16, 2019
@dr-shorthair

I assume this will be an aspect of the dataset metadata; see CSIRO-enviro-informatics/asgs-dataset#8, CSIRO-enviro-informatics/geofabric-dataset#14 and CSIRO-enviro-informatics/gnaf-dataset#2.

In that context, there are a few ways to indicate version information:

  1. explicitly, through a comprehensive provenance statement with a date-time stamp - prov:wasGeneratedBy/prov:endedAtTime
  2. a date-time stamp - dct:modified - the time component is needed if there can be more than one update per day
  3. a version number - pav:version

where pav: is http://purl.org/pav/
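To make the three options concrete, here is a minimal sketch in Python (standard library only) that prints all three side by side as Turtle. The dataset IRI, the version string and the function names are placeholders I made up for illustration, not anything the Loc-I datasets currently use.

```python
from datetime import datetime, timezone

PREFIXES = """\
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix pav:  <http://purl.org/pav/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
"""


def xsd_datetime(moment: datetime) -> str:
    """Format a timezone-aware datetime as an xsd:dateTime literal in UTC."""
    stamp = moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f'"{stamp}"^^xsd:dateTime'


def version_metadata(dataset_iri: str, generated: datetime, version: str) -> str:
    """Return Turtle that carries all three kinds of version information."""
    when = xsd_datetime(generated)
    return (
        f"{PREFIXES}\n"
        f"<{dataset_iri}>\n"
        # Option 1: provenance statement with the time the activity ended.
        f"    prov:wasGeneratedBy [ a prov:Activity ; prov:endedAtTime {when} ] ;\n"
        # Option 2: date-time stamp on the dataset itself.
        f"    dct:modified {when} ;\n"
        # Option 3: version number.
        f'    pav:version "{version}" .\n'
    )


example = version_metadata(
    "http://example.org/dataset/demo",  # hypothetical IRI
    datetime(2019, 9, 30, 4, 15, 0, tzinfo=timezone.utc),
    "1.0",  # hypothetical version string
)
print(example)
```

The options are not mutually exclusive, so a dataset could carry any combination of them.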

@dr-shorthair

My general assumption would be that

  • a date-time stamp will be very specific, can be easily generated, and captured in the dct:modified element. This should be automatically updated when the ETL process is run
  • alongside this, the link to the source dataset, the details of the ETL process with any run-specific parameters, and the time the process completed must all be recorded in the provenance information (prov:wasGeneratedBy and prov:wasDerivedFrom)

The link to the source data should point to a specific version of that data, not to the source dataset in general.

@dr-shorthair

dr-shorthair commented Sep 30, 2019

@shaneseaton @ashleysommer @benjaminleighton Could I see an example of what the run-time parameters are, so that I can suggest how these could be recorded in a provenance record?

@dr-shorthair added the "help wanted" label on Sep 30, 2019
@dr-shorthair

@benjaminleighton wrote on Slack:

On minimalist provenance for #16 I think getting this completely right first time is going to be tricky. Would sticking a pav:version in that we manually increment be sufficient for now?
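If the interim approach is a manually incremented pav:version, a small helper could keep the bump consistent. This is only a sketch under assumptions: it expects the version to be a plain numeric literal such as "3" or "1.4" written with the `pav:` prefix in a Turtle file, and the function name is made up.

```python
import re

# Matches e.g.  pav:version "1.4"  and captures the numeric version string.
_VERSION = re.compile(r'(pav:version\s+")(\d+(?:\.\d+)*)(")')


def bump_pav_version(turtle_text: str) -> str:
    """Increment the last numeric component of the first pav:version literal."""

    def _increment(match: "re.Match[str]") -> str:
        parts = match.group(2).split(".")
        parts[-1] = str(int(parts[-1]) + 1)
        return match.group(1) + ".".join(parts) + match.group(3)

    new_text, count = _VERSION.subn(_increment, turtle_text, count=1)
    if count == 0:
        raise ValueError("no numeric pav:version literal found")
    return new_text


before = '<http://example.org/dataset/demo> pav:version "1.4" .'  # hypothetical
after = bump_pav_version(before)
print(after)
```

A text-level edit like this avoids re-serialising the whole metadata file, at the cost of only handling the simple literal form described above.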
