TravHarv subjects must be made when task is started, not when config builder is called #48

Open
cedricdcc opened this issue Apr 16, 2024 · 1 comment

@cedricdcc
Member

With the following config:

snooze-till-graph-age-minutes: 0
prefix:
  rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
  dcat: <http://www.w3.org/ns/dcat#>
  schema: <https://schema.org/>
  org: <http://www.w3.org/ns/org#>
  dct: <http://purl.org/dc/terms/>
  mi: <http://www.marineinfo.org/ns/ontology#>
assert:
  - subjects:
      literal:
        - http://dev.marineinfo.org/id/collection/947 # WoRMS ackn - direct
    paths:
      - "<http://www.w3.org/ns/dcat#resource> "
  - subjects:
      SPARQL: >
        SELECT DISTINCT ?s
        WHERE {
              [] <http://www.w3.org/ns/dcat#resource> ?s .
              }
    paths:
      - "<https://schema.org/author>"
  - subjects:
      SPARQL: >
        PREFIX schema: <https://schema.org/>
        SELECT DISTINCT ?s
        WHERE {
          ?ok <https://schema.org/author> ?authorid .
          ?authorid <https://schema.org/identifier> ?s .
        }
    paths:
      - "<https://schema.org/affiliation>"
      - "<https://schema.org/givenName>"
      - "<https://schema.org/familyName>"
  - subjects:
      SPARQL: >
        SELECT DISTINCT ?affid
        WHERE {
            ?s <https://schema.org/affiliation> ?affid .
        }
    paths:
      - "<https://schema.org/name>"

TravHarv does not dereference the publications of a given dataset on the first run. On the next run, however, it does.

The same issue was seen for the LWUA, where @laurianvm had to rerun the sembench container before the publications were dereferenced.

cedricdcc added this to the 0.0.3 milestone on Apr 16, 2024
cedricdcc changed the title from "TravHarv does not dereference on first run" to "TravHarv subjects must be made when task is started, not when config builder is called" on Apr 16, 2024
@cedricdcc
Member Author

While testing in kgap, it was found that the subjects for all tasks are made when config_builder is called, not when each task is started. As a result, many tasks have no subjects to dereference.
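
To illustrate the timing problem, here is a minimal, self-contained sketch. It is not TravHarv's code: `Task`, `objects_of`, `run`, the in-memory set of triples and the example.org IRIs are all made up for the illustration, and a plain Python function stands in for the SPARQL subject query. With `eager=True` every subject list is computed before any task runs (the behaviour described above); with `eager=False` each task asks for its subjects only when it starts.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]

DCAT_RESOURCE = "http://www.w3.org/ns/dcat#resource"
SCHEMA_AUTHOR = "https://schema.org/author"
COLLECTION = "http://dev.marineinfo.org/id/collection/947"
PUBLICATION = "http://example.org/pub/1"  # made-up IRI for the sketch

# Triples that "dereferencing" a subject would return (made-up data).
REMOTE: Graph = {
    (COLLECTION, DCAT_RESOURCE, PUBLICATION),
    (PUBLICATION, SCHEMA_AUTHOR, "http://example.org/person/1"),
}


@dataclass
class Task:
    """Hypothetical task: a subject selector plus the paths to follow."""
    select_subjects: Callable[[Graph], List[str]]
    paths: List[str]


def objects_of(predicate: str) -> Callable[[Graph], List[str]]:
    """Stand-in for a SPARQL subject query: all objects of `predicate`."""
    return lambda graph: sorted({o for (_, p, o) in graph if p == predicate})


def run(tasks: List[Task], graph: Graph, eager: bool) -> List[List[str]]:
    """Run all tasks in order and return the subjects each task used."""
    # eager: every subject list is computed up front, before any harvesting.
    precomputed = [t.select_subjects(graph) for t in tasks] if eager else None
    used = []
    for i, task in enumerate(tasks):
        # lazy: subjects are computed now, against the graph as it is now.
        subjects = precomputed[i] if eager else task.select_subjects(graph)
        used.append(subjects)
        for s, p, o in REMOTE:
            if s in subjects and p in task.paths:
                graph.add((s, p, o))  # simulate dereferencing
    return used


tasks = [
    Task(lambda graph: [COLLECTION], [DCAT_RESOURCE]),  # literal subject
    Task(objects_of(DCAT_RESOURCE), [SCHEMA_AUTHOR]),   # query-based subjects
]

store: Graph = set()
first_run = run(tasks, store, eager=True)    # second task gets no subjects
second_run = run(tasks, store, eager=True)   # graph is filled, so it works
lazy_run = run(tasks, set(), eager=False)    # works on the first run
```

In this sketch the eager first run leaves the second task with an empty subject list, a second eager run on the same store finds the publication, and the lazy variant finds it on the first run, which matches the symptom reported above.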

cedricdcc added a commit that referenced this issue May 7, 2024
- Added helper.py, containing functions that add prefix support to SPARQL queries and traversal harvesting paths.
- Deleted the __call__ functions and refactored config_builder and all subsequent files that used this __call__ method.
- Replaced the GraphNameMapper with the mapper of py-rdf-store and refactored all files that worked with it.
- Refactored the config builder's subjects property so that, when it is accessed, the subjects are fetched from the graph if that is required (when a SPARQL query is given instead of a list of subjects).
- Edited the .yml files that are used as configs so that the prefixes no longer contain <>, since these now cause issues for the helper function resolve_uri().

Issues that were affected by the changes in this commit are:
- #35
- #43
- #48
- #34
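
As a rough illustration of why the <> had to be dropped from the prefixes, here is a small sketch of prefix expansion. `expand_prefixed` is a hypothetical function written for this example. It is not the project's resolve_uri(), whose actual behaviour is not shown in this thread; the sketch only assumes that expansion amounts to concatenating the prefix value and the local name.

```python
from typing import Dict


def expand_prefixed(term: str, prefixes: Dict[str, str]) -> str:
    """Hypothetical helper: expand 'schema:author' to a full IRI string."""
    term = term.strip()
    if term.startswith("<") and term.endswith(">"):
        # Already a full IRI written in angle brackets.
        return term[1:-1]
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        # Plain concatenation of the namespace and the local name.
        return prefixes[prefix] + local
    raise ValueError(f"unknown prefix in {term!r}")


clean = {"schema": "https://schema.org/"}
bracketed = {"schema": "<https://schema.org/>"}

good = expand_prefixed("schema:author", clean)
bad = expand_prefixed("schema:author", bracketed)
full = expand_prefixed("<https://schema.org/author> ", clean)
```

Here `good` is `https://schema.org/author`, while `bad` becomes `<https://schema.org/>author`, which is not a valid IRI. A full IRI in angle brackets, as used in the paths of the config above, still passes through unchanged.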