Need to constrain iderdown queries to appropriate dataset #21

Open
shaneseaton opened this issue Aug 18, 2019 · 11 comments

Comments

@shaneseaton
Contributor

Currently the queries are not constrained to the specified dataset. This can cause cross-dataset issues: for example, GNAF and GNAF16 use the same URIs for base types, so if both datasets are in the cache, entities from one could be confused with entities from the other.

@dr-shorthair

dr-shorthair commented Aug 19, 2019

Each "dataset" is considered to be a reg:Register, so that each "entity" (address, SLA, catchment etc) is associated with a dataset using the reg:register predicate. e.g.

<http://linked.data.gov.au/dataset/geofabric/contractedcatchment/12105364> rdf:type geofabric:ContractedCatchment ;
    reg:register <http://linked.data.gov.au/dataset/geofabric/contractedcatchment/> .

So each entity is a member

  • of a class such as <http://linked.data.gov.au/def/geofabric#ContractedCatchment> and
  • of a dataset such as http://linked.data.gov.au/dataset/geofabric/contractedcatchment/ .

These are different things.

Furthermore, lower level datasets are also a member of a higher level register, e.g.

<http://linked.data.gov.au/dataset/geofabric/contractedcatchment/> rdf:type reg:Register ;
    reg:register <http://linked.data.gov.au/dataset/geofabric> .

so if you want to constrain the query to the higher level datasets, then the query will have to include a property-path reg:register+
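
As a rough sketch of what such a constrained query might look like (untested against the cache, and assuming every entity carries both an rdf:type and a reg:register chain up to the top-level dataset), the class and the dataset could be constrained together:

```sparql
PREFIX reg: <http://purl.org/linked-data/registry#>
PREFIX geofabric: <http://linked.data.gov.au/def/geofabric#>

# Entities of one class, restricted to one top-level dataset.
# reg:register+ follows one or more reg:register links, so it matches
# members of the lower-level registers nested under the geofabric dataset.
SELECT ?entity WHERE {
  ?entity a geofabric:ContractedCatchment ;
          reg:register+ <http://linked.data.gov.au/dataset/geofabric> .
} LIMIT 100
```

The class constraint alone would not separate two datasets that share base-type URIs; the reg:register+ pattern is what ties each match to a single dataset.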

@dr-shorthair

dr-shorthair commented Aug 19, 2019

Looking at what datasets are present:

PREFIX reg: <http://purl.org/linked-data/registry#>
select * where { 
   ?d a reg:Register .
}

which results in

1 | http://linked.data.gov.au/dataset/geofabric/contractedcatchment/
2 | http://linked.data.gov.au/dataset/geofabric
3 | http://linked.data.gov.au/dataset/geofabric/drainagedivision/
4 | http://linked.data.gov.au/dataset/geofabric/riverregion/
5 | http://linked.data.gov.au/dataset/asgs2016/meshblock/
6 | http://linked.data.gov.au/dataset/asgs2016/stateorterritory/
7 | http://linked.data.gov.au/dataset/asgs2016/statisticalarealevel1/
8 | http://linked.data.gov.au/dataset/asgs2016/statisticalarealevel2/
9 | http://linked.data.gov.au/dataset/asgs2016/statisticalarealevel3/
10 | http://linked.data.gov.au/dataset/asgs2016/statisticalarealevel4/
11 | http://linked.data.gov.au/dataset/gnaf/address/
12 | http://linked.data.gov.au/dataset/gnaf/reg/
13 | http://linked.data.gov.au/dataset/gnaf/addressSite/
14 | http://linked.data.gov.au/dataset/gnaf/locality/
15 | http://linked.data.gov.au/dataset/gnaf/streetLocality/
16 | http://linked.data.gov.au/dataset/asgs2016/australia/
17 | http://linked.data.gov.au/dataset/asgs2016/reg/

i.e. the GNAF datasets are not dated.

@dr-shorthair

Other examples:

  1. find entities that are members of datasets that are subsets of a higher-level dataset:
PREFIX reg: <http://purl.org/linked-data/registry#>
select * where { 
  ?s reg:register+ <http://linked.data.gov.au/dataset/geofabric> .
} limit 100 
  2. find entities that are members of classes that are sub-classes of a more general class:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select * where { 
    ?d a ?c .
    ?c rdfs:subClassOf <http://linked.data.gov.au/def/geofabric#ReportingRegion> . 
} limit 100 
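
Note that the second query only matches direct sub-classes. If the class hierarchy is more than one level deep, and assuming the store does not apply RDFS inference, a property path might be needed here too. A minimal sketch:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# rdfs:subClassOf+ follows one or more subclass links, so indirect
# sub-classes of ReportingRegion are matched as well as direct ones.
SELECT ?d ?c WHERE {
    ?d a ?c .
    ?c rdfs:subClassOf+ <http://linked.data.gov.au/def/geofabric#ReportingRegion> .
} LIMIT 100
```

Using rdfs:subClassOf* instead would also return entities typed directly as ReportingRegion.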

@shaneseaton
Contributor Author

OK. Totally agree this is the way to do it, thanks for confirming it @dr-shorthair. The big task before we can do this, of course, is getting all the datasets to conform to this approach... I will get into it.

@shaneseaton
Contributor Author

I thought I would address this when looking into a refactor of the LDAPIs. Unfortunately the refactor didn't get far enough to replace all the LDAPIs, so it isn't the solution I was hoping for. Just flagging that this is still an issue.

Mainly, it's an issue because of the inconsistent way registers are registered within each other. Each dataset has a different way of doing it; we need to make it consistent for a tool to be able to navigate it sensibly.
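
One way to see the inconsistency might be to list every register alongside whatever parent it declares. This is only a sketch, and it assumes nesting is expressed with reg:register as in the examples earlier in this thread; registers that nest some other way would show up with no parent:

```sparql
PREFIX reg: <http://purl.org/linked-data/registry#>

# List each register with its declared parent register, if any.
# Rows with an unbound ?parent are either top-level datasets or
# registers that are not linked upwards via reg:register.
SELECT ?register ?parent WHERE {
  ?register a reg:Register .
  OPTIONAL { ?register reg:register ?parent . }
}
ORDER BY ?register
```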

@dr-shorthair

dr-shorthair commented Oct 28, 2019

Partly related: The problem Nick was trying to resolve was the absence of a standard property that is the inverse of rdfs:member. However, use of the predicate reg:register entails that the subject is a reg:RegisterItem and the object is a reg:Register, which is a little weird. 'Register' comes from the notion of 'registration' - i.e. submitting an item to be added to a list, and if it meets the acceptance criteria, getting issued a register-ID for it as evidence of having met the criteria. The definition of RegisterItem is "A metadata record for an entry in a register." which is not what we are managing here (see http://purl.org/linked-data/registry#). I'm all for using an existing class/predicate in preference to just making up a new one, but it looks like there is an overshoot here.

At the end of the day what I think we need is
(i) a class for datasets - this is loci:Dataset - I don't think we need reg:Register anywhere
(ii) a membership predicate - this could be rdfs:member - reg:register is just wrong
(iii) rules or constraints to say that

  • a Dataset can have either another Dataset or a Feature as members
  • Datasets and Features can be members of more than one Dataset

We don't need ereg:superregister etc. I think we can do all the knitting we need with simple SPARQL.
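
As an illustration of the kind of simple SPARQL this would allow, here is a sketch that finds items belonging to more than one dataset. It assumes membership has been remodelled as `<dataset> rdfs:member <item>`, which is a proposal in this comment and not how the data is currently published:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Items (Datasets or Features) that are direct members of more than
# one container, with the number of containers they belong to.
SELECT ?item (COUNT(DISTINCT ?dataset) AS ?datasetCount) WHERE {
  ?dataset rdfs:member ?item .
}
GROUP BY ?item
HAVING (COUNT(DISTINCT ?dataset) > 1)
ORDER BY DESC(?datasetCount)
LIMIT 100
```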

Just discussed this f2f with Jonno, and will attempt to rationalize all this in https://github.com/CSIRO-enviro-informatics/loci.cat/wiki/Rules-for-Loc-I-datasets

Note:
Strictly class or set-membership in RDF is handled by rdf:type, but that would require that the containers be defined as classes, e.g.

ASGS-2016 rdfs:subClassOf loci:Dataset .

rather than as individuals, like

ASGS-2016 rdf:type loci:Dataset .

which is the way we have done it and is fairly conventional. Meta-modelling often ends up with axle-wrapping ...

@jyucsiro
Contributor

@dr-shorthair on "I don't think we need reg:Register anywhere"

The issue is that this is baked into the pyldapi library (see https://github.com/RDFLib/pyLDAPI/blob/master/pyldapi/register_renderer.py#L224)

For this to change, we'd need to change that bit of code which renders items as reg:Register.

@dr-shorthair

Hmm. Well that indicates a modelling error in pyldapi IMHO. The Register ontology is clear - register items are metadata records, not data items.

@jyucsiro
Contributor

Is there an alternative to the Register ontology that can be proposed? It would need to be generic (i.e. not just for Dataset items).

@dr-shorthair

dr-shorthair commented Oct 31, 2019

Yeah - that is the issue.

As discussed the other day, I think the membership predicate is easy - rdfs:member - though it would require the query pattern to be reversed.

For the container I think the options are dcat:Dataset, void:Dataset, or loci:Dataset.

  • loci:Dataset is project specific
  • void:Dataset is strictly 'A set of RDF triples that are published, maintained or aggregated by a single provider', and 'triples' are not really the same as 'Features'
  • dcat:Dataset leans the other direction - it may be a collection of discrete items, but often is not

But I think I'd be inclined to go with dcat:Dataset and rdfs:member unless and until we come up with anything better.
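
To make the "reversed" query pattern concrete, here is a sketch of the earlier dataset-membership query rewritten for dcat:Dataset and rdfs:member. It assumes the data has been remodelled so that containers point at their members, which is not the case in the current cache:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcat: <http://www.w3.org/ns/dcat#>

# The dataset is now the subject and the members are the objects,
# the opposite direction to ?s reg:register+ <dataset>.
# rdfs:member+ walks down through nested datasets to their members.
SELECT ?item WHERE {
  <http://linked.data.gov.au/dataset/geofabric> a dcat:Dataset ;
      rdfs:member+ ?item .
} LIMIT 100
```

If the entity-first form is preferred, the inverse path `?item ^rdfs:member+ <http://linked.data.gov.au/dataset/geofabric>` expresses the same constraint.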

@jyucsiro
Contributor

For Loc-I, dcat:Dataset would probably be fine. However, I believe the pyldapi library's scope is more general than that. It might be good to push some requirements to that library from Loc-I.

3 participants