provide proper prefix support in the travharv config #35

Open
3 tasks done
marc-portier opened this issue Apr 8, 2024 · 2 comments
@marc-portier
Member

marc-portier commented Apr 8, 2024

the prefix config in the yml should extend to

  • subject literals
  • subject-sparql
  • path assertions

an updated test-yml should show this is all actually working
(and if needed implementation fixes should make it work)
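
To make the three areas concrete, here is a minimal stdlib-only sketch of what "working" could mean for each of them. The dict layout, the key names (`prefix`, `subjects`, `sparql`, `paths`) and the `expand` helper are assumptions for illustration only, not the actual travharv config schema or implementation.

```python
from typing import Dict

# Hypothetical config, roughly as it might look after loading the yml into a dict.
config = {
    "prefix": {"ex": "https://example.org/", "schema": "https://schema.org/"},
    "subjects": ["ex:test", "https://demo.me/whatever"],    # subject literals
    "sparql": "SELECT ?s WHERE { ?s a schema:Dataset . }",   # subject-sparql
    "paths": ["schema:creator / schema:name"],               # path assertions
}


def expand(term: str, prefixes: Dict[str, str]) -> str:
    """Expand `pfx:local` to a full IRI when `pfx` is declared, else return the term unchanged."""
    pfx, sep, local = term.partition(":")
    if sep and pfx in prefixes:
        return prefixes[pfx] + local
    return term


prefixes = config["prefix"]

# 1. subject literals: every entry ends up as a full IRI
subjects = [expand(s, prefixes) for s in config["subjects"]]

# 2. subject-sparql: the declared prefixes are prepended as PREFIX lines
pfx_lines = "\n".join(f"PREFIX {p}: <{u}>" for p, u in prefixes.items())
sparql = f"{pfx_lines}\n{config['sparql']}"

# 3. path assertions: every step of the path is expanded (naive split on " / ")
paths = [
    " / ".join(f"<{expand(step, prefixes)}>" for step in path.split(" / "))
    for path in config["paths"]
]

print(subjects)
print(sparql)
print(paths)
```

The naive split on " / " only works for paths made of prefixed names; paths that contain bracketed IRIs need a real tokeniser.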

@marc-portier marc-portier modified the milestones: 0.0.2, 1.0.0 Apr 8, 2024
@marc-portier marc-portier changed the title show proper prefix support is working provide proper prefix support in the travharv config Apr 15, 2024
@marc-portier marc-portier modified the milestones: 1.0.0, 0.0.3 Apr 15, 2024
@marc-portier
Member Author

quick separate exercise showing how the config entries in the yml can be normalised using the prefix declarations

import re
from typing import Dict, List

import validators
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import NamespaceManager
# see https://rdflib.readthedocs.io/en/stable/namespaces_and_bindings.html


def makeNSM(pfx_declarations: Dict[str, str]) -> NamespaceManager:
    # wrap each declared base IRI in an rdflib Namespace
    pfxs = {k: Namespace(v) for k, v in pfx_declarations.items()}
    print(f"{pfxs=}")

    # bind_namespaces="none" starts without default bindings, so only the declared prefixes are known
    nsm = NamespaceManager(Graph(), bind_namespaces="none")
    for pf, ns in pfxs.items():
        nsm.bind(pf, ns, override=True)
    print(f"{list(nsm.namespaces())=}")
    return nsm


def resolve_uri(uri: str, nsm: NamespaceManager) -> URIRef:
    # TODO reconsider the validators trick -- we might want to explicitly demand <> surrounding the <uri>
    return URIRef(uri) if validators.url(uri) else nsm.expand_curie(uri)


def resolve_literals(literal_uris: List[str], nsm: NamespaceManager) -> List[URIRef]:
    return [resolve_uri(u, nsm) for u in literal_uris]


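def resolve_sparql(sparql: str, nsm: NamespaceManager) -> str:
    # prepend one PREFIX line per bound namespace so the query can use the declared prefixes
    pfxlines: str = "\n".join(f"PREFIX {p}: {u.n3()}" for p, u in nsm.namespaces())
    return f"{pfxlines}\n{sparql}"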


PPATH_RE: str = r'(([^<>\/\s]+)|<([^>]+)>)\s*\/'  # how to match parts of property-paths


def ppath_split(ppath: str) -> List[str]:
    # a trailing "/" is appended so the last part is also terminated by a separator
    return [m.group(2) or m.group(3) for m in re.finditer(pattern=PPATH_RE, string=ppath + "/")]


def resolve_ppaths(ppaths: List[str], nsm: NamespaceManager):
    return [
        " / ".join(resolve_uri(part, nsm).n3() for part in ppath_split(ppath)) for ppath in ppaths
    ]


def do():
    yml_pfx_declarations = dict(
        schema="https://schema.org/",
        ex="https://example.org/",
    )
    yml_literals = [
        "ex:test",
        "schema:Dataset",
        "https://demo.me/whatever",
    ]
    yml_sparql = """select * where { ?s schema:name ?n . }"""
    yml_ppaths = [
        "<https://demo.me/whatever> / ex:some",
        "ex:some",
        "<https://demo.me/whatever>",
        "schema:owner / schema:name",
    ]

    # make actual namespaces that can be used
    nsm: NamespaceManager = makeNSM(yml_pfx_declarations)

    literals = resolve_literals(yml_literals, nsm)
    print(f"{literals=}")
    sparql = resolve_sparql(yml_sparql, nsm)
    print(f"{sparql=}")
    ppaths = resolve_ppaths(yml_ppaths, nsm)
    print(f"{ppaths=}")


if __name__ == "__main__":
    do()
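
The property-path splitting is the least obvious step in the exercise, so here is a standalone, stdlib-only check of just that part. It uses the same pattern as PPATH_RE and returns a list; the sample inputs are taken from the exercise, plus one unbracketed IRI added as an assumption to show a limitation.

```python
import re
from typing import List

# Same pattern as PPATH_RE in the exercise: a bare token or a <bracketed IRI>,
# followed by optional whitespace and a "/" separator.
PPATH_RE = r'(([^<>\/\s]+)|<([^>]+)>)\s*\/'


def ppath_split(ppath: str) -> List[str]:
    # A trailing "/" is appended so the last part is also terminated by a separator.
    return [m.group(2) or m.group(3) for m in re.finditer(PPATH_RE, ppath + "/")]


print(ppath_split("<https://demo.me/whatever> / ex:some"))
# ['https://demo.me/whatever', 'ex:some']
print(ppath_split("schema:owner / schema:name"))
# ['schema:owner', 'schema:name']
print(ppath_split("https://demo.me/x"))
# ['https:', 'demo.me', 'x']
```

The last call shows that a full IRI inside a path has to be wrapped in <>: without the brackets every "/" in the IRI is read as a path separator.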

cedricdcc added a commit that referenced this issue May 7, 2024
- Added helper.py containing functions that allow for prefix support in SPARQL queries and traversal harvesting paths.
- Deleted the __call__ functions and refactored the code in config_builder and all subsequent files that used this __call__ method.
- Refactored all files that worked with the GraphNameMapper; the mapper of py-rdf-store is now used instead.
- Refactored the config builder property subjects so that, when called, they get the subjects from the graph if that is required (when a SPARQL query is given instead of a list of subjects).
- Edited the .yml files that are used as configs so they no longer contain the <> in the prefixes, since these would now cause issues for the helper function resolve_uri()

Issues affected by the changes in this commit:
- #35
- #43
- #48
- #34
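
The commit message does not say exactly how the <> break resolve_uri(). One plausible reading is that a prefix value is used as a namespace base that local names get appended to, so surrounding brackets end up inside the resulting IRI. A minimal stdlib-only sketch of that effect follows; `strip_angle_brackets` is a hypothetical helper, not something from helper.py.

```python
def strip_angle_brackets(iri: str) -> str:
    """Return the IRI without one pair of surrounding <>, if present (hypothetical helper)."""
    iri = iri.strip()
    if iri.startswith("<") and iri.endswith(">"):
        return iri[1:-1]
    return iri


old_style = "<https://example.org/>"   # prefix value as previously written in the yml
new_style = "https://example.org/"     # prefix value after the config edit

# Appending a local name to the bracketed form gives a malformed IRI.
print(old_style + "test")                        # <https://example.org/>test
print(strip_angle_brackets(old_style) + "test")  # https://example.org/test
print(new_style + "test")                        # https://example.org/test
```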
@marc-portier
Member Author

waiting for PR #51 to get merged with main branch
