
NLP project notes: 2019 09 30 to 2019 10 04

Tom Longley edited this page Oct 7, 2019 · 2 revisions

By: MS/IM

This week, we looked at how BART works and tried out a few annotations. We went through the annotated files and their source text files to check whether the annotations capture what we are looking for. As far as we understand, the annotations currently focus on person, rank, title, organization and posting. Our questions and observations are below.

  1. Several people, ranks and titles have not been annotated. For example, the very first document under “annotated_sources” has only the following annotations:
T1	Person 729 743	Lamidi Adeosun
T2	Rank 721 728	General
T3	Organization 709 719	7 Division
T4	Title 675 707	General Officer Commanding (GOC)
R1	has_rank Arg1:T1 Arg2:T2	
R3	is_posted Arg1:T1 Arg2:T3	
T5	Person 8548 8558	Fatai Alli
T6	Rank 8534 8547	Major-General
T7	Organization 8522 8532	3 Division
T8	Role 8489 8521	General Officer Commanding (GOC)
R4	has_rank Arg1:T5 Arg2:T6	
R5	is_posted Arg1:T5 Arg2:T7	
R6	has_role Arg1:T5 Arg2:T8	
R2	has_title Arg1:T1 Arg2:T4

The following sentences, however, were not annotated:

  • The terrorists were engaged by troops of 23 Brigade Nigerian Army in a gun battle that led to the killing of quite a number of them, destruction and recovery of vehicles, motorcycles, various calibre of weapons and ammunition

  • These included four Hilux vehicles, three anti-aircraft guns and one 0.50-inch Browning machine gun

  • A statement by the Nigerian Army spokesman, Col. Sani Kukasheka Usman, said: “Boko Haram terrorists fleeing from the onslaught of troops met their waterloo in their attempt to enter Gombi town, Gombi Local Government Area of Adamawa State yesterday (Monday) evening.

  • Also, Brigadier-General Victor Ezeugwu, the Commanding Officer, 28 Task Force Brigade, Hong, described the recapture of Mararraba Mubi as a “watershed” in the fight against Boko Haram in Adamawa

  • “Boko Haram has been substantially degraded, the flow of arms and ammunition to them has reduced drastically, their financial support has been blocked by financial measures adopted by the international community,” he said. Also, Brigadier-General Victor Ezeugwu, the Commanding Officer, 28 Task Force Brigade, Hong, described the recapture of Mararraba Mubi as a “watershed” in the fight against Boko Haram in Adamawa

  • The Chairman of the National Information Centre, Mike Omeri, was quoted by AFP as stating that the pledge by Boko Haram was “an act of desperation and comes at a time when Boko Haram is suffering heavy losses”. The Islamists’ leader Abubakar Shekau made the announcement in an audio message on Saturday night after months of indication that Boko Haram was seeking a formal tie-up. Troops from Nigeria, Cameroun, Chad and Niger have recorded a series of successes against the militants since last month, pushing them out of captured territory in North-east Nigeria

Are we missing something?
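To check coverage more systematically than by reading each file, the annotation lines above could be parsed and summarised. The listing looks like tab-separated standoff annotations (`T` lines for text spans, `R` lines for relations), so the following is a minimal sketch under that assumption. The names `Entity`, `Relation` and `parse_ann` are our own and not part of any tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Entity:
    id: str
    type: str
    start: int
    end: int
    text: str


@dataclass
class Relation:
    id: str
    type: str
    arg1: str
    arg2: str


def parse_ann(ann_text):
    """Parse T lines (text spans) and R lines (binary relations).

    Assumes tab-separated fields as in the listing above. Spans written
    as several fragments (containing ';') are skipped in this sketch.
    """
    entities, relations = {}, []
    for line in ann_text.splitlines():
        fields = line.rstrip().split("\t")
        if len(fields) < 2:
            continue
        ann_id, body = fields[0], fields[1]
        if ann_id.startswith("T") and len(fields) >= 3 and ";" not in body:
            etype, start, end = body.split()
            entities[ann_id] = Entity(ann_id, etype, int(start), int(end), fields[2])
        elif ann_id.startswith("R"):
            rtype, arg1, arg2 = body.split()
            relations.append(
                Relation(ann_id, rtype, arg1.split(":", 1)[1], arg2.split(":", 1)[1])
            )
    return entities, relations


# A few lines copied from the annotation listing above.
sample = (
    "T1\tPerson 729 743\tLamidi Adeosun\n"
    "T2\tRank 721 728\tGeneral\n"
    "T3\tOrganization 709 719\t7 Division\n"
    "R1\thas_rank Arg1:T1 Arg2:T2\t\n"
    "R3\tis_posted Arg1:T1 Arg2:T3\t\n"
)

entities, relations = parse_ann(sample)
type_counts = Counter(e.type for e in entities.values())
print(type_counts)
for rel in relations:
    # Prints "Lamidi Adeosun has_rank General", then "Lamidi Adeosun is_posted 7 Division"
    print(entities[rel.arg1].text, rel.type, entities[rel.arg2].text)
```

Running this over every file in annotated_sources would give a per-document count of each entity type, which makes sparsely annotated documents easy to spot.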

  2. Why are places (countries, towns, local areas), weapons, dates and groups not being annotated? Would it not be useful to capture this information from the source documents?

For instance, https://github.com/security-force-monitor/fmtrpt_data holds information on training activities for non-US security forces arranged and funded by the United States Department of State and Department of Defence. Its data model has details of the student unit, training unit, start date, end date, program name, group, country, number of training and total cost.

Going by the above example, shouldn’t we do the same with news article txt files in nlp_starter_dataset/annotated_sources?
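To illustrate what annotating these extra types could look like, here is a rough sketch that produces annotation lines in the same tab-separated layout as the listing earlier, using one of the unannotated sentences quoted above. The type names `Group` and `Place` and the helper `to_standoff` are hypothetical, and the offsets are relative to the sentence rather than to the full source document.

```python
def to_standoff(text, mentions, first_id=1):
    """Return T lines for (entity_type, surface) pairs found in text.

    Only the first occurrence of each surface string is used, and
    strings that do not occur in the text are skipped rather than guessed.
    """
    lines = []
    next_id = first_id
    for etype, surface in mentions:
        start = text.find(surface)
        if start == -1:
            continue
        end = start + len(surface)
        lines.append(f"T{next_id}\t{etype} {start} {end}\t{surface}")
        next_id += 1
    return lines


# Sentence taken from the unannotated examples quoted above.
sentence = (
    "Boko Haram terrorists fleeing from the onslaught of troops met their "
    "waterloo in their attempt to enter Gombi town, Gombi Local Government "
    "Area of Adamawa State yesterday (Monday) evening."
)

# Hypothetical entity types; the current scheme does not define them.
mentions = [
    ("Group", "Boko Haram"),
    ("Place", "Gombi town"),
    ("Place", "Gombi Local Government Area"),
    ("Place", "Adamawa State"),
]

standoff_lines = to_standoff(sentence, mentions)
for line in standoff_lines:
    print(line)
```

In practice the spans would come from an annotator or a model rather than from string search; the sketch only shows that places and groups fit the same span-based format as the existing person, rank and organization annotations.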

  3. Also, regarding places, would it not be useful to capture the hierarchy of local areas and towns within countries, especially since we are also capturing the geographical footprint of these forces?

  4. For named-entity recognition, what would be the expected accuracy?

  5. Should we focus only on the nlp_starter_dataset when working on a model? We have also gone through the repo on unclassified training activities and would like to know whether it has been annotated too, as there seems to be a data model for that one.
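On the accuracy question above: named-entity recognition is usually scored with entity-level precision, recall and F1 rather than a single accuracy figure, so any target would probably be stated in those terms. Below is a minimal sketch of exact-match scoring over (start, end, type) spans. The gold spans are taken from the annotation listing earlier in these notes, while the predicted spans are invented purely for illustration and do not come from any model.

```python
def span_prf(gold, predicted):
    """Exact-match precision, recall and F1 over (start, end, type) spans.

    A predicted span counts as correct only if its offsets and its type
    both match a gold span exactly.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Gold spans: T1 to T4 from the annotation listing.
gold = {
    (729, 743, "Person"),
    (721, 728, "Rank"),
    (709, 719, "Organization"),
    (675, 707, "Title"),
}

# Hypothetical model output: two exact matches, one span with a wrong
# boundary, and the title missed entirely.
predicted = {
    (729, 743, "Person"),
    (721, 728, "Rank"),
    (700, 719, "Organization"),
}

precision, recall, f1 = span_prf(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Exact matching is strict: the organization span with the wrong start offset counts as both a false positive and a miss. A more lenient overlap-based match is a common alternative if boundaries matter less than finding the mention at all.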
