This repository contains resources for the Open Information Extraction benchmark WiRe57.
It acts as a companion to the article
William Léchelle, Fabrizio Gotti, Philippe Langlais (2019). WiRe57: A Fine-Grained Benchmark for Open Information Extraction. LAW XIII 2019: The 13th Linguistic Annotation Workshop.
Open Information Extraction (OIE) systems, starting with TextRunner, seek to extract all relational tuples expressed in text, without being bound to an anticipated list of predicates. Such systems have recently been used for relation extraction, question answering, and building domain-targeted knowledge bases, among other applications.
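For illustration, here is a minimal sketch of the kind of tuples an OIE system extracts. The sentence and extractions are invented for this example and are not taken from the benchmark itself:

```python
# Illustrative only: an invented sentence and the open-domain tuples
# an OIE system might extract from it (not part of WiRe57).
sentence = "Marie Curie, born in Warsaw, received the Nobel Prize in 1903."

extractions = [
    ("Marie Curie", "was born in", "Warsaw"),
    ("Marie Curie", "received", "the Nobel Prize"),
    ("Marie Curie", "received the Nobel Prize in", "1903"),
]

for subj, rel, obj in extractions:
    print(f"({subj}; {rel}; {obj})")
```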
Subsequent extractors (ReVerb, Ollie, ClausIE, Stanford Open IE, OpenIE4, MinIE) have sought to improve yield and precision. Despite these improvements, the task definition remains underspecified and, to the best of our knowledge, no gold standard exists.
To mitigate this problem, we built a reference for the task of Open Information Extraction over five documents. We tentatively resolve a number of issues that arise, including inference and granularity, in order to better pinpoint the requirements of the task. We provide annotation guidelines specifying what is correct to extract and what is not. In turn, we use this reference to score existing Open IE systems. We address the non-trivial problem of evaluating system extractions against the reference tuples, and share our evaluation script. Among the seven extractors compared, we find MinIE to perform best.
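As a rough sketch of the token-overlap scoring involved in matching a predicted tuple against a gold one (the actual script in `code` is authoritative; the function name and the exact matching policy here are simplified assumptions):

```python
from collections import Counter

def token_overlap_scores(pred, gold):
    """Token-level precision/recall between a predicted and a gold tuple.

    Both tuples are (arg1, rel, arg2) strings. This is a simplified
    assumption of the matching used in the paper; the real script also
    aligns tuple parts and resolves one-to-one tuple matches.
    """
    pred_tokens = Counter(" ".join(pred).lower().split())
    gold_tokens = Counter(" ".join(gold).lower().split())
    common = sum((pred_tokens & gold_tokens).values())
    precision = common / max(sum(pred_tokens.values()), 1)
    recall = common / max(sum(gold_tokens.values()), 1)
    return precision, recall

p, r = token_overlap_scores(
    ("Marie Curie", "received", "the Nobel Prize"),
    ("Marie Curie", "received", "the Nobel Prize in 1903"),
)
print(f"precision={p:.2f}, recall={r:.2f}")
```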
The annotation guidelines define how to annotate English text with triples.
The `data` directory contains the following files:
- `README.md`: Specifies the source of the annotated documents.
- `WiRe57_343-manual-oie.json`: The WiRe57 manual reference.
- `WiRe57_extractions_by_ollie_clausie_openie_stanford_minie_reverb_props-export.json`: Extractions by state-of-the-art OIE systems on the WiRe57 documents.
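A minimal way to inspect the reference file, assuming only that it is standard JSON (the structure sketched in the comment is an assumption; inspect the file to confirm):

```python
import json

# Load the manual reference; the top-level structure is assumed to map
# document identifiers to annotated sentences (check before relying on it).
with open("data/WiRe57_343-manual-oie.json", encoding="utf-8") as f:
    reference = json.load(f)

print(type(reference))
if isinstance(reference, dict):
    for doc_id in list(reference)[:5]:
        print(doc_id)
```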
The `code` directory contains the evaluation script used in the paper.