Non-standard Relation Extraction metric? #5
Comments
Thank you for your comments. The (Bekoulis 2018) we have compared with is "Adversarial training for multi-context joint entity and relation extraction", not "Joint entity recognition and relation extraction as a multi-head selection problem".
Thank you for your answer, but I beg to differ. In "Adversarial training for multi-context joint entity and relation extraction", Bekoulis et al. also introduce the 3 evaluation settings ("Strict", "Boundaries" and "Relaxed") in section 4. For example, on ACE04 they report the Strict evaluation to compare with (Miwa and Bansal 2016). (Miwa and Bansal 2016) do compare with (Li and Ji 2014), but in my opinion this is a mistake, since they state in section 4.1 that they consider the type of an entity. I am not familiar with the SciERC and WLPC literature, but for the ACE datasets I am confident that most related works use the Strict evaluation setting.
Both SciERC and WLPC use span evaluation for relations.
I am still not convinced that (Miwa and Bansal 2016) used the same setting as you and am trying to get first-hand information on that. They say for ACE05: "We use the same data splits, preprocessing, and task settings as Li and Ji (2014) [...] We treat an entity as correct when its type and the region of its head are correct. We treat a relation as correct when its type and argument entities are correct". And for ACE04: "We follow the cross-validation setting of Chan and Roth (2011) and Li and Ji (2014), and the preprocessing and evaluation metrics of ACE05." I am positive that (Bekoulis 2018) used the Strict setting for their results, as they state and as one can see in their code. And I am very confident that (Li and Ji 2014) is the only work using your setting on ACE datasets. I agree that all of this is very confusing and, if anything, I thank you for releasing your code.
Since so many previous works on ACE are based on and compared with (Li and Ji 2014), I'm skeptical of the statement that "(Li and Ji 2014) is the only work using your setting on ACE datasets".
I am trying to run your model but it seems that the "glove.840B.300d.txt.filtered" file is missing for datasets other than genia and wlp. Could you kindly provide it? Or are we supposed to compute it from glove.840B.300d.txt? |
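In case it helps, here is a minimal sketch of how such a filtered file is typically produced, assuming it is simply the full GloVe file restricted to the dataset vocabulary. The vocabulary file format and the paths below are assumptions, not the repository's actual preprocessing pipeline:

```python
# Hypothetical sketch: build "glove.840B.300d.txt.filtered" by keeping only
# the GloVe vectors whose words appear in the dataset vocabulary.
# The vocabulary source (a plain-text word list) and the paths are assumptions.

def filter_glove(glove_path, vocab_path, out_path):
    # Load the dataset vocabulary (assumed to be one word per line).
    with open(vocab_path, encoding="utf-8") as f:
        vocab = {line.strip() for line in f if line.strip()}

    kept = 0
    with open(glove_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            # GloVe lines are "<word> <300 floats>"; split once on the first space.
            word = line.split(" ", 1)[0]
            if word in vocab:
                dst.write(line)
                kept += 1
    print(f"kept {kept} vectors for {len(vocab)} vocabulary words")

if __name__ == "__main__":
    filter_glove("glove.840B.300d.txt", "vocab.txt", "glove.840B.300d.txt.filtered")
```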
Hello,
In your paper you only specify the criterion for considering an entity as correct, not a relation.
From a quick look at your code in model1/relation_metrics.py, I understand that you consider a relation correct if the relation type is correct along with the spans of its two arguments.
That is, without considering the predicted entity types of the arguments.
If so, you use what (Bekoulis 2018) refers to as the "Boundaries" evaluation setting.
This means you cannot directly compare to previous works that take the entity type into account, i.e. the "Strict" evaluation setting as defined by (Bekoulis 2018).
As far as I know, (Li and Ji 2014) is the only related work using this "Boundaries" evaluation setting.
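To make the distinction concrete, here is a minimal sketch of the two settings. It is an illustration only, not the actual logic of model1/relation_metrics.py, and the relation representation used below is an assumption:

```python
# Illustrative sketch (not the repository's actual code) of the difference
# between the "Boundaries" and "Strict" relation evaluation settings.
# A relation is represented here as
# (head_span, head_type, tail_span, tail_type, relation_type); the exact
# representation in model1/relation_metrics.py may differ.

def boundaries_correct(pred, gold):
    # Relation type and argument spans must match; entity types are ignored.
    return pred[0] == gold[0] and pred[2] == gold[2] and pred[4] == gold[4]

def strict_correct(pred, gold):
    # Additionally requires the predicted entity types of both arguments to match.
    return boundaries_correct(pred, gold) and pred[1] == gold[1] and pred[3] == gold[3]

# Example: same spans and relation type, but a wrong head entity type.
gold = ((0, 2), "PER", (5, 6), "ORG", "WORK_FOR")
pred = ((0, 2), "ORG", (5, 6), "ORG", "WORK_FOR")
assert boundaries_correct(pred, gold)   # counted as correct under "Boundaries"
assert not strict_correct(pred, gold)   # counted as wrong under "Strict"
```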
FYI (Sanh 2019) also uses a different metric and its scores are already not comparable to previous work, as pointed out in this issue.
(Bekoulis 2018) = "Joint entity recognition and relation extraction as a multi-head selection problem"
Best regards,