Explore SHACL validation for checking ontologies #102

Open
amoeba opened this issue Sep 3, 2021 · 7 comments
Collaborator

amoeba commented Sep 3, 2021

We discovered on Slack today that a property we expected (mosaic:hasBasis) wasn't present on every mosaic:Campaign, and it should have been. Manually checking the ontology after every change is time-consuming and error-prone. It'd be great to write a set of SHACL shape constraints that we could use to check for some of these things.

I'll build out a GHA that does some basic checking and then we could probably brainstorm a more complete set of checks and look at applying the process to the other ontologies.

Collaborator Author

amoeba commented Sep 3, 2021

This looks pretty promising. With just a simple constraint:

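@prefix sh: <http://www.w3.org/ns/shacl#> .
# Prefix inferred from the IRIs in the validation report below
@prefix mosaic: <https://purl.dataone.org/odo/MOSAIC_> .

mosaic:CampaignShape
    a sh:NodeShape ;
    sh:targetClass mosaic:00000001 ;
    # Every Campaign has at least one mosaic:hasBasis triple
    sh:property [
        sh:path mosaic:00000034 ;
        sh:minCount 1 ;
    ] .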

PySHACL catches the exact problem we saw today:

; pyshacl -s shapes.shacl -df xml -sf turtle ../MOSAiC.owl
Validation Report
Conforms: False
Results (4):
Constraint Violation in MinCountConstraintComponent (http://www.w3.org/ns/shacl#MinCountConstraintComponent):
	Severity: sh:Violation
	Source Shape: [ sh:minCount Literal("1", datatype=xsd:integer) ; sh:path <https://purl.dataone.org/odo/MOSAIC_00000034> ]
	Focus Node: odo:MOSAIC_00000005
	Result Path: <https://purl.dataone.org/odo/MOSAIC_00000034>
	Message: Less than 1 values on odo:MOSAIC_00000005-><https://purl.dataone.org/odo/MOSAIC_00000034>
Constraint Violation in MinCountConstraintComponent (http://www.w3.org/ns/shacl#MinCountConstraintComponent):
	Severity: sh:Violation
	Source Shape: [ sh:minCount Literal("1", datatype=xsd:integer) ; sh:path <https://purl.dataone.org/odo/MOSAIC_00000034> ]
	Focus Node: odo:MOSAIC_00000008
	Result Path: <https://purl.dataone.org/odo/MOSAIC_00000034>
	Message: Less than 1 values on odo:MOSAIC_00000008-><https://purl.dataone.org/odo/MOSAIC_00000034>
Constraint Violation in MinCountConstraintComponent (http://www.w3.org/ns/shacl#MinCountConstraintComponent):
	Severity: sh:Violation
	Source Shape: [ sh:minCount Literal("1", datatype=xsd:integer) ; sh:path <https://purl.dataone.org/odo/MOSAIC_00000034> ]
	Focus Node: odo:MOSAIC_00000019
	Result Path: <https://purl.dataone.org/odo/MOSAIC_00000034>
	Message: Less than 1 values on odo:MOSAIC_00000019-><https://purl.dataone.org/odo/MOSAIC_00000034>
Constraint Violation in MinCountConstraintComponent (http://www.w3.org/ns/shacl#MinCountConstraintComponent):
	Severity: sh:Violation
	Source Shape: [ sh:minCount Literal("1", datatype=xsd:integer) ; sh:path <https://purl.dataone.org/odo/MOSAIC_00000034> ]
	Focus Node: odo:MOSAIC_00000018
	Result Path: <https://purl.dataone.org/odo/MOSAIC_00000034>
	Message: Less than 1 values on odo:MOSAIC_00000018-><https://purl.dataone.org/odo/MOSAIC_00000034>
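For running this check from a script (for example as a CI step), a minimal Python sketch follows. It only wraps the command line shown above and pulls the focus nodes out of the text report. The function names are hypothetical, and it assumes `pyshacl` is on the PATH and exits with a non-zero status when the data graph does not conform.

```python
import re
import subprocess
from typing import List


def build_pyshacl_command(shapes_path: str, data_path: str) -> List[str]:
    """Build the same pyshacl invocation used in the comment above."""
    return [
        "pyshacl",
        "-s", shapes_path,  # SHACL shapes file
        "-df", "xml",       # data graph format (the .owl file is RDF/XML)
        "-sf", "turtle",    # shapes graph format
        data_path,
    ]


def focus_nodes(report_text: str) -> List[str]:
    """Extract every 'Focus Node:' value from a pyshacl text report."""
    pattern = r"^[ \t]*Focus Node:[ \t]*(\S+)"
    return re.findall(pattern, report_text, flags=re.MULTILINE)


def run_check(shapes_path: str, data_path: str) -> int:
    """Run pyshacl, print each failing node, and return the exit code."""
    result = subprocess.run(
        build_pyshacl_command(shapes_path, data_path),
        capture_output=True,
        text=True,
    )
    for node in focus_nodes(result.stdout):
        print(f"constraint violation on {node}")
    # Assumption: a non-zero code means "does not conform" or an error.
    return result.returncode
```

A CI job could call `run_check("shapes.shacl", "../MOSAiC.owl")` and fail the build when the returned code is not zero.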

amoeba added a commit to amoeba/sem-prov-ontologies that referenced this issue Sep 3, 2021
Collaborator Author

amoeba commented Sep 3, 2021

I just did a quick look-over to see what checks might make sense to implement as a first pass:

  • Every Campaign has a single Basis
  • Every Campaign has at least one Chief Scientist
  • Every Campaign hosts at least one Event (is this right?)
  • Every Event isHostedBy a single Campaign
  • Every Chief Scientist is a chief scientist of at least one Campaign
  • Every Deployment has a single deployedSystem
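The wording of these checks maps onto SHACL cardinality constraints: "a single" would be `sh:minCount 1` plus `sh:maxCount 1`, and "at least one" would be `sh:minCount 1` alone. The sketch below is one possible way to generate such shapes from Python rather than writing each by hand. `render_shape` and the shape names are hypothetical, and `ex:hasChiefScientist` is a placeholder because the real property IRI does not appear in this thread; only `mosaic:00000001` and `mosaic:00000034` come from the shape posted earlier.

```python
from typing import Optional


def render_shape(shape: str, target_class: str, path: str,
                 min_count: Optional[int] = None,
                 max_count: Optional[int] = None) -> str:
    """Render one SHACL node shape with a single cardinality-constrained property."""
    lines = [
        shape,
        "    a sh:NodeShape ;",
        f"    sh:targetClass {target_class} ;",
        "    sh:property [",
        f"        sh:path {path} ;",
    ]
    if min_count is not None:
        lines.append(f"        sh:minCount {min_count} ;")
    if max_count is not None:
        lines.append(f"        sh:maxCount {max_count} ;")
    lines.append("    ] .")
    return "\n".join(lines) + "\n"


# "Every Campaign has a single Basis": exactly one value, so min and max are both 1.
print(render_shape("mosaic:CampaignBasisShape", "mosaic:00000001",
                   "mosaic:00000034", min_count=1, max_count=1))

# "Every Campaign has at least one Chief Scientist": only a lower bound.
# ex:hasChiefScientist is a hypothetical placeholder, not a term from the ontology.
print(render_shape("mosaic:CampaignChiefScientistShape", "mosaic:00000001",
                   "ex:hasChiefScientist", min_count=1))
```

The printed Turtle still needs the `sh:`, `mosaic:`, and `ex:` prefix declarations before a validator can load it.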

Some of the other parts of the ontology are a bit confusing so I'll stop there and chat with @mpsaloha.

Collaborator

mpsaloha commented Sep 3, 2021

@amoeba those look like good suggestions for constraints! Note that the MOSAIC Ontology is in OWL, and so is OWA. Thus, while all Campaigns do have a Basis, that doesn't mean that all Campaigns in our Ontology must have an associated Basis, unless we decide to "require it" (hence SHACL, which is CWA). So the lack of a Basis or a Chief Scientist on some Campaign was not an "it should have been there" (as you phrased it in your first comment on this Issue), but rather an "it might be useful if it were there". There are LOTS OF additional "it might be useful" predicates I could have filled out in the MOSAIC Ontology, but I didn't, for lack of time or because I suspected they would not be leveraged in our Web UI. Happy to discuss this further if this doesn't make perfect sense.
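To make the distinction concrete: OWA is the open-world assumption (a missing statement is unknown, not false) and CWA is the closed-world assumption (a missing statement counts as absent). The toy sketch below uses plain Python tuples instead of a real RDF library, with made-up `ex:` names, to show what a closed-world minimum-count check reports. It illustrates the idea only and is not how SHACL engines are implemented.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"


def min_count_violations(triples: Set[Triple], target_class: str,
                         path: str, min_count: int = 1) -> List[str]:
    """Closed-world check: instances of target_class with too few values for path."""
    instances = sorted(s for s, p, o in triples
                       if p == RDF_TYPE and o == target_class)
    violations = []
    for node in instances:
        values = {o for s, p, o in triples if s == node and p == path}
        if len(values) < min_count:
            violations.append(node)
    return violations


# Toy graph: two campaigns, only one with a stated basis (all names are made up).
graph = {
    ("ex:campaignA", RDF_TYPE, "ex:Campaign"),
    ("ex:campaignA", "ex:hasBasis", "ex:basis1"),
    ("ex:campaignB", RDF_TYPE, "ex:Campaign"),
}

# Open-world reading: ex:campaignB may have a basis that is simply not recorded,
# so nothing in this graph is contradictory.
# Closed-world reading: the missing triple is reported as a violation.
print(min_count_violations(graph, "ex:Campaign", "ex:hasBasis"))
```

Under the closed-world reading the check flags `ex:campaignB`; under the open-world reading the same graph is merely incomplete.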

Collaborator Author

amoeba commented Sep 3, 2021

You might have to define OWA and CWA for me. Other than that, your comment makes sense.

What I want to do is help you and @laijasmine get the work you both need to do on MOSAiC done quickly and efficiently, so if we can add SHACL validation rules to help catch things like the hasBasis thing, then that'd make me happy.

Are any of the rules above ones you want?

@laijasmine

All of the above rules look good to me except for the last one, which I'm not sure about and Mark will need to confirm:
Every Deployment has a single deployedSystem

Collaborator Author

amoeba commented Sep 4, 2021

Thanks @laijasmine. I'll touch base with @mpsaloha at some point here.

Collaborator Author

amoeba commented Sep 11, 2021

We discussed part of this on our salmantics call this week and talked about the point above: should this ontology be comprehensive over all of the MOSAiC expedition, or cover just what PANGAEA or we have? We decided that we should aim to be comprehensive. We're presenting a PDF soon and are hoping to have some conversations about the ontology and the project as a whole with relevant folks.

I've merged an initial skeleton for this kind of checking onto the develop branch but haven't added all of the constraints I listed above. I'm going to leave this issue open with the intent to revisit this at some point.
