Evaluation

André Pires edited this page Apr 21, 2017 · 5 revisions

To obtain a stable, comparable evaluation of the NER output across all the tools, each tool's output was converted to the CoNLL format and evaluated with the conlleval script. In addition, repeated 10-fold cross-validation was performed, and the results were averaged.

All results can be accessed here.

Steps

  1. Get output from tools
  2. Get gold-standard data for each fold
  3. Join both (script)
  4. Evaluate each fold (scripts)
  5. Compute average for each repeat (script)
  6. Compute global average (script)
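Step 3 above can be sketched as follows. This is a minimal illustration, not the project's actual script: it assumes gold data in two-column CoNLL format (`token gold-tag`) and a parallel list of predicted tags, and joins them into the three-column `token gold predicted` layout that conlleval expects. The function name `join_gold_and_predicted` is hypothetical.

```python
def join_gold_and_predicted(gold_lines, predicted_tags):
    """Join gold CoNLL lines ('token gold-tag') with predicted tags,
    producing 'token gold predicted' lines suitable for conlleval.
    Empty lines (sentence boundaries) are preserved as-is."""
    joined = []
    pred_iter = iter(predicted_tags)
    for line in gold_lines:
        if not line.strip():
            # keep sentence boundary
            joined.append("")
            continue
        token, gold = line.split()
        joined.append(f"{token} {gold} {next(pred_iter)}")
    return joined
```

For example, `join_gold_and_predicted(["John B-PER", "lives O"], ["B-PER", "O"])` yields `["John B-PER B-PER", "lives O O"]`.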

Averaging algorithm

  • Get the result files for each fold
  • Parse each result into a list containing the overall accuracy and the per-category scores
  • Create a dictionary for each category
    • Using category names as keys, and a list with precision, recall and FB1 as values
  • For each result
    • Accumulate the overall measures
    • Accumulate the measures for each category
  • Calculate the average of each measure
  • Calculate the macro-average for FB1
  • Print the results to a file
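The averaging steps above can be sketched as below. This is an illustrative reimplementation, not the linked script: it assumes each fold's parsed conlleval result is a dict with an `"accuracy"` float and a `"categories"` mapping from category name to a `(precision, recall, fb1)` tuple; all names are assumptions.

```python
from collections import defaultdict

def average_results(results):
    """Average parsed conlleval results across folds.

    `results`: list of per-fold dicts, each with an 'accuracy' float and a
    'categories' dict mapping category name -> (precision, recall, fb1).
    Returns (mean accuracy, per-category mean scores, macro-averaged FB1).
    """
    per_cat = defaultdict(lambda: [0.0, 0.0, 0.0])
    acc_total = 0.0
    n = len(results)
    for res in results:
        acc_total += res["accuracy"]
        # accumulate precision, recall and FB1 for each category
        for cat, (p, r, f) in res["categories"].items():
            per_cat[cat][0] += p
            per_cat[cat][1] += r
            per_cat[cat][2] += f
    averages = {cat: tuple(v / n for v in vals) for cat, vals in per_cat.items()}
    # macro-average: unweighted mean of the per-category FB1 averages
    macro_fb1 = sum(v[2] for v in averages.values()) / len(averages)
    return acc_total / n, averages, macro_fb1
```

Note the macro-average weights every category equally, regardless of how many entities of each type occur in the folds.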

Check script here.
