Empty URLs have been incorrectly resolved in the downloaded turtle files #1

@isotes

Description

It seems that empty URLs are in some cases resolved to the URL of the enclosing RDF file. This leads to incorrect (and, at least for Münster, malformed) entries in the Turtle files.

Example from Münster_(Westfalen).rdf:

```turtle
<https://opendata.stadt-muenster.de/dataset/sporthallen-und-sportst%C3%A4tten-standorte/resource/96e271af-7e05-4c2e-9406-17a3535e88a2>
        a                    cat:Distribution ;
        dcterms:description  "" ;
        dcterms:format       "wms" ;
        dcterms:issued       "2019-07-01T17:33:24+02:00"^^xsd:date ;
        dcterms:modified     "2019-07-18T11:21:22+02:00"^^xsd:date ;
        dcterms:title        "Sporthallen und Sportstätten - Standorte - WMS-GetMap" ;
        cat:accessURL        <https://opendata.stadt-muenster.de/dataset/sporthallen-und-sportst%C3%A4tten-standorte/resource/96e271af-7e05-4c2e-9406-17a3535e88a2> ;
        cat:byteSize         "" ;
        cat:downloadURL      <file:///home/lisa/repos/crawling/target/Münster_(Westfalen).rdf> ;
        cat:mediaType        "" ;
        foaf:page            "https://opendata.stadt-muenster.de/dataset/sporthallen-und-sportst%C3%A4tten-standorte/resource/96e271af-7e05-4c2e-9406-17a3535e88a2" .
```

The value cat:downloadURL <file:///home/lisa/repos/crawling/target/Münster_(Westfalen).rdf> is incorrect, and it is also malformed because it contains an unencoded 'ü'.

Looking at the catalog entry on the website, the value should be empty: <dcat:downloadURL rdf:resource=""/>
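A plausible mechanism (not verified against the crawler's code): in RDF/XML, an rdf:resource value is a relative IRI reference that is resolved against the document's base IRI, and under RFC 3986 an empty reference resolves to the base itself. If the downloaded file is parsed from disk without an explicit base, that base is its file:// URL. The minimal sketch below uses Python's urllib.parse.urljoin as a stand-in for the parser's resolution step; the local path comes from the example above, while the web base URL and the resolve helper are hypothetical.

```python
from urllib.parse import urljoin

# Base IRI a parser would use when reading the downloaded copy from disk
# without an explicit base (path taken from the example above).
local_base = "file:///home/lisa/repos/crawling/target/Münster_(Westfalen).rdf"

# Hypothetical base IRI if the same document were parsed with a web URL as base.
web_base = "https://example.org/catalog.rdf"

def resolve(base: str, ref: str) -> str:
    """Resolve a (possibly relative) IRI reference against a base IRI (hypothetical helper)."""
    return urljoin(base, ref)

# rdf:resource="" is an empty relative reference, so it resolves to the base itself.
print(resolve(local_base, ""))  # the local file:// URL
print(resolve(web_base, ""))    # https://example.org/catalog.rdf
```

Either way the empty value turns into a non-empty URL; with a local base it additionally leaks the crawler's file path into the output.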

Grepping for 'home/lisa' in catalog/toLoad yields 2948 matches across various fields (at least dcat:accessURL, dcat:downloadURL, and vcard:hasURL). I did not check whether the cause is always an empty URL in the original data.
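To see which fields are affected and how often, the grep could be broken down per predicate. The following is a rough sketch, assuming the Turtle files are pretty-printed with one predicate/object pair per line (as in the example above) and use a .ttl extension; the function names are hypothetical and this is a text scan, not a real Turtle parser.

```python
import re
from collections import Counter
from pathlib import Path

# Matches "predicate <file://...>" on a single line (assumes one predicate per line).
FILE_IRI = re.compile(r"^\s*(\S+)\s+<(file://[^>]*)>")

def count_file_iris(text: str) -> Counter:
    """Count, per predicate, object IRIs that point to a local file."""
    counts: Counter = Counter()
    for line in text.splitlines():
        match = FILE_IRI.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def scan_directory(directory: str) -> Counter:
    """Aggregate per-predicate counts over all .ttl files below a directory."""
    total: Counter = Counter()
    for path in Path(directory).rglob("*.ttl"):
        total += count_file_iris(path.read_text(encoding="utf-8"))
    return total
```

For example, scan_directory("catalog/toLoad") would return a Counter keyed by predicate name such as cat:downloadURL. Subject lines are not counted, because a bare IRI with nothing after it does not match the pattern.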

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
