N-Quads parsing error during normalization #10

Open
u-hubar opened this issue Aug 3, 2023 · 0 comments
Labels
bug Something isn't working


u-hubar commented Aug 3, 2023

Overview

I've run into an issue trying to normalize N-Quads with the URDNA2015 algorithm from the jsonld library (pyld.jsonld).

pyld.jsonld.JsonLdError: ('Could not convert input to RDF dataset before normalization.',)
Type: jsonld.NormalizeError
Cause: ('Error while parsing N-Quads invalid quad.',)
Type: jsonld.ParseError

Details

In dkg.js we're normalizing N-Quads using the following function:

async toNQuads(content, inputFormat) {
    const options = {
        algorithm: 'URDNA2015',
        format: 'application/n-quads',
    };

    if (inputFormat) {
        options.inputFormat = inputFormat;
    }

    const canonized = await jsonld.canonize(content, options);

    return canonized.split('\n').filter((x) => x !== '');
}

I've tried to reproduce the same logic in dkg.py, but I've run into issues trying to normalize N-Quads (JSON-LD input works fine). It may be either incorrect usage of the library on my side or a bug in jsonld, which no longer seems to be supported.

Python normalization function:

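from typing import Literal

from pyld import jsonld

# JSONLD, NQuads and the two exception classes are defined elsewhere in dkg.py.
# The match statement below requires Python 3.10+.


def normalize_dataset(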
    dataset: JSONLD | NQuads,
    input_format: Literal["JSON-LD", "N-Quads"] = "JSON-LD",
) -> NQuads:
    normalization_options = {
        "algorithm": "URDNA2015",
        "format": "application/n-quads",
    }

    match input_format.lower():
        case "json-ld" | "jsonld":
            pass
        case "n-quads" | "nquads":
            normalization_options["inputFormat"] = "application/n-quads"
        case _:
            raise DatasetInputFormatNotSupported(
                f"Dataset input format isn't supported: {input_format}. "
                "Supported formats: JSON-LD / N-Quads."
            )

    n_quads = jsonld.normalize(dataset, normalization_options)
    assertion = [quad for quad in n_quads.split("\n") if quad]

    if not assertion:
        raise InvalidDataset("Invalid dataset, no quads were extracted.")

    return assertion
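
The return annotation above suggests that `NQuads` is a list of strings, whereas an N-Quads parser generally works on a single document string. If the function is ever called with such a list, joining the lines first avoids passing the wrong type. A minimal sketch, assuming `NQuads` is `list[str]` (an assumption; the alias is not shown in this issue) and using `to_nquads_document` as a hypothetical helper name:

```python
from typing import List, Union


def to_nquads_document(dataset: Union[str, List[str]]) -> str:
    """Coerce a string or a list of quad lines into one newline-terminated document."""
    if isinstance(dataset, str):
        lines = dataset.split("\n")
    else:
        lines = list(dataset)
    # Drop blank lines and stray carriage returns, then end every quad with "\n".
    cleaned = [line.rstrip("\r") for line in lines if line.strip()]
    return "".join(line + "\n" for line in cleaned)
```

The result could then be passed as `dataset` to `jsonld.normalize` with `inputFormat` set. Whether this resolves the parse error reported here is untested.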

u-hubar added the bug label on Aug 3, 2023