Accept JSON parsing errors in JSON-LD extractor #45

Open
@giordand

When JsonLdExtractor tries to parse the JSON-LD in some web pages, it raises ValueError: No JSON object could be decoded.
My solution is to catch the error in JsonLdExtractor._extract_items(self, node) and return an empty list by default. Catching it there matters because the extractor may already have detected microdata or RDFa in the page. The error only occurs with JSON-LD, and if we catch it in extruct.extract we lose that data:

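def _extract_items(self, node):
    try:
        data = json.loads(node.xpath('string()'))
    except ValueError as e:
        # Malformed JSON-LD: report it and skip this node so that
        # microdata/RDFa results from the same page are not lost.
        print(e)
        return []
    if isinstance(data, list):
        return data
    elif isinstance(data, dict):
        return [data]
    return []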
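
To make the intended behaviour easier to check outside the extractor, here is a minimal standalone sketch of the same idea. The function name parse_jsonld_text and the use of the logging module instead of print are assumptions for illustration, not part of extruct; it takes the script text directly rather than an lxml node.

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_jsonld_text(text):
    """Hypothetical helper: return a list of JSON-LD items, or [] on bad JSON."""
    try:
        data = json.loads(text)
    except ValueError as exc:
        # json.JSONDecodeError (Python 3) is a subclass of ValueError,
        # so this covers both Python 2 and Python 3.
        logger.warning("Skipping invalid JSON-LD block: %s", exc)
        return []
    if isinstance(data, list):
        return data
    if isinstance(data, dict):
        return [data]
    # Valid JSON that is neither an object nor an array (e.g. a bare string).
    return []
```

With this, a page whose JSON-LD block has a trailing comma or is empty yields no JSON-LD items instead of aborting the whole extraction, while well-formed blocks are returned as before.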
