feat(RDF): Email/UUID should now be an IRI instead of String (+ code maintenance for easier implementation) #4323

Draft
wants to merge 30 commits into
base: master

svandenhoek
Contributor

@svandenhoek svandenhoek commented Oct 8, 2024

What are the main changes you did:

  • Improved code (readability):
    • Default namespaces moved to their own Enum
    • CoreDatatype retrieval & formatValue(row, column) moved to a class dedicated to ColumnType-specific behaviour.
      • Reduced duplicate code while streaming the cell data during conversion to Value.
      • This approach should allow the same CoreDatatype to be combined with different cell data retrieval, without needing the 2-layer approach currently implemented through formatValue() (the first layer differentiates references/FILE/others; the second layer defines 1 basic approach per CoreDatatype for the others).
      • Includes an internal SKIP type so that specific ColumnType values always return an empty Set.
    • rowsToRdf() split into separate functions for ontologies/data TableType.
      • Adjusted logic where the TableType check was done for each Row instead of once per Table.
  • Added a test that validates that RDF conversion logic is available for each ColumnType
    • Code should fail to build if an existing ColumnType is removed
    • Test should fail if new ColumnType is added without updating the RDF API for it.
  • Fixes fix(rdf): Email addresses should be IRIs, not literals #4226
  • UUIDs should now be urn:uuid: IRIs instead of literals
  • Fixes ColumnType.FILE not being present in RDF output (+ updated to show correct path)
  • In theory should make future implementations for the following easier:
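
As a rough illustration of the ColumnType-specific behaviour, the SKIP type and the coverage test described in the list above, here is a minimal, dependency-free sketch. All names (`DemoColumnType`, `DemoRdfMapping`, `ColumnTypeRdfSketch`) are hypothetical and not taken from the PR. The handful of column types, the choice of HEADING as the skipped type and the string-based "terms" are assumptions made only to keep the example small. The actual code presumably works with RDF4J values rather than strings.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical stand-in for the project's ColumnType (only a few values).
enum DemoColumnType { STRING, EMAIL, UUID, HEADING }

// Hypothetical per-type behaviour: how one raw cell value becomes RDF terms.
// Terms are plain strings here; this is an assumption for illustration only.
enum DemoRdfMapping {
  LITERAL(v -> Set.of("\"" + v + "\"")),
  MAILTO_IRI(v -> Set.of("<mailto:" + v + ">")),
  UUID_IRI(v -> Set.of("<urn:uuid:" + v + ">")),
  SKIP(v -> Set.of()); // always yields an empty set

  private final Function<String, Set<String>> converter;

  DemoRdfMapping(Function<String, Set<String>> converter) {
    this.converter = converter;
  }

  Set<String> convert(String raw) {
    return converter.apply(raw);
  }
}

final class ColumnTypeRdfSketch {
  // One place that ties every column type to exactly one behaviour.
  static final Map<DemoColumnType, DemoRdfMapping> MAPPINGS =
      new EnumMap<>(DemoColumnType.class);

  static {
    MAPPINGS.put(DemoColumnType.STRING, DemoRdfMapping.LITERAL);
    MAPPINGS.put(DemoColumnType.EMAIL, DemoRdfMapping.MAILTO_IRI);
    MAPPINGS.put(DemoColumnType.UUID, DemoRdfMapping.UUID_IRI);
    MAPPINGS.put(DemoColumnType.HEADING, DemoRdfMapping.SKIP);
  }

  // Coverage check: every column type without a mapping ends up in the result.
  static Set<DemoColumnType> unmappedTypes() {
    Set<DemoColumnType> missing = EnumSet.allOf(DemoColumnType.class);
    missing.removeAll(MAPPINGS.keySet());
    return missing;
  }

  static Set<String> toRdfTerms(DemoColumnType type, String raw) {
    DemoRdfMapping mapping = MAPPINGS.get(type);
    if (mapping == null) {
      throw new IllegalStateException("No RDF mapping for " + type);
    }
    return mapping.convert(raw);
  }

  public static void main(String[] args) {
    System.out.println("Unmapped types: " + unmappedTypes());
    System.out.println(toRdfTerms(DemoColumnType.EMAIL, "person@example.org"));
    System.out.println(toRdfTerms(DemoColumnType.HEADING, "ignored"));
  }
}
```

In this sketch, removing a `DemoColumnType` value that is still referenced in `MAPPINGS` breaks compilation, and adding a new value without a mapping makes `unmappedTypes()` non-empty, which is the kind of condition a test could assert on.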

how to test:

  1. Checkout master
  2. ./gradlew run
  3. Create DCAT DB with demo data
  4. curl http://localhost:8080/dcat/api/rdf > ~/Desktop/output_old.ttl
  5. Checkout PR
  6. ./gradlew run
  7. curl http://localhost:8080/dcat/api/rdf > ~/Desktop/output_new.ttl
  8. (Apache Jena command-line tools, run from ~/Desktop) rdfdiff output_new.ttl output_old.ttl TTL TTL
    Output should look like this (the models should differ only in the email-related triples):
$ rdfdiff output_new.ttl output_old.ttl TTL TTL
models are unequal

< [http://localhost:8080/dcat/api/rdf/ContactPersons?identifier=Person01, http://semanticscience.org/resource/SIO_001323, mailto:[email protected]]
< [http://localhost:8080/dcat/api/rdf/ContactPersons?identifier=Person01, http://localhost:8080/dcat/api/rdf/ContactPersons/column/email, mailto:[email protected]]
< [http://localhost:8080/dcat/api/rdf/ContactPersons?identifier=Person01, http://purl.obolibrary.org/obo/NCIT_C42775, mailto:[email protected]]
< [http://localhost:8080/dcat/api/rdf/ContactPersons/column/email, http://www.w3.org/2000/01/rdf-schema#range, http://www.w3.org/2001/XMLSchema#anyURI]
> [http://localhost:8080/dcat/api/rdf/ContactPersons?identifier=Person01, http://localhost:8080/dcat/api/rdf/ContactPersons/column/email, [email protected]]
> [http://localhost:8080/dcat/api/rdf/ContactPersons?identifier=Person01, http://semanticscience.org/resource/SIO_001323, [email protected]]
> [http://localhost:8080/dcat/api/rdf/ContactPersons?identifier=Person01, http://purl.obolibrary.org/obo/NCIT_C42775, [email protected]]
> [http://localhost:8080/dcat/api/rdf/ContactPersons/column/email, http://www.w3.org/2000/01/rdf-schema#range, http://www.w3.org/2001/XMLSchema#string]

Turtle output should now look something like this:

[...]

sio:SIO_001323 <mailto:[email protected]>;
  <http://purl.obolibrary.org/obo/NCIT_C42775> <mailto:[email protected]>;
  <http://localhost:8080/dcat/api/rdf/ContactPersons/column/email> <mailto:[email protected]>;

[...]

<http://localhost:8080/dcat/api/rdf/ContactPersons/column/email> a owl:DatatypeProperty;
  rdfs:range <http://www.w3.org/2001/XMLSchema#anyURI>;

[...]
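
The literal-to-IRI change visible in the output above (and the analogous `urn:uuid:` change for UUIDs) can be sketched with the Java standard library alone. The class and method names below are hypothetical and not taken from the PR. `java.net.URI` is used only to keep the sketch free of dependencies; the actual code presumably builds RDF4J IRI values instead.

```java
import java.net.URI;
import java.util.UUID;

final class IriConversionSketch {

  // Turns an email address into a mailto: IRI instead of a plain string literal.
  static URI emailToIri(String email) {
    return URI.create("mailto:" + email.trim());
  }

  // Turns a UUID string into a urn:uuid: IRI.
  // UUID.fromString rejects malformed input, and UUID.toString() yields the
  // canonical lowercase form.
  static URI uuidToIri(String uuid) {
    return URI.create("urn:uuid:" + UUID.fromString(uuid.trim()));
  }

  public static void main(String[] args) {
    // prints mailto:person@example.org
    System.out.println(emailToIri("person@example.org"));
    // prints urn:uuid:123e4567-e89b-12d3-a456-426614174000
    System.out.println(uuidToIri("123E4567-E89B-12D3-A456-426614174000"));
  }
}
```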

todo:

  • updated docs in case of new feature
  • added/updated tests
  • added/updated testplan to include a test for this fix, including ref to bug using # notation


sonarcloud bot commented Oct 8, 2024

@svandenhoek svandenhoek changed the title from "feat(RDF): Email should now be an IRI instead of String (+ code maintenance for easier implementation)" to "feat(RDF): Email/UUID should now be an IRI instead of String (+ code maintenance for easier implementation)" on Oct 10, 2024
@svandenhoek
Contributor Author

All tests should work once #4378 is merged and included in this branch.
