Linguistic Annotations #33
Comments
Hi @matyaskopp, I think it would be worth a try. I'd annotate a sample file with UDPipe, as you mentioned, and check whether its errors outweigh those produced by the current script. If they do, it would be better to implement the improvements in the existing script. What do you think? Annotating with the current Stanza-based script takes about an hour per file on my machine, so something more efficient would be better. Fingers crossed!
Hi @matyaskopp,
Hi @matyaskopp
I will annotate the sample with UDPipe and NameTag tomorrow.
Hi, @matyaskopp
@lucianadmacedo
@lucianadmacedo, I get 2.5 minutes per file: XML parsing on my laptop and annotation with the LINDAT service. I still have to implement the finalization script. I will then upload a sample to ParlaMint.ana.sample and update this pull request: clarin-eric/ParlaMint#692
I haven't run bin/ana_work_stanza.py; I have only checked the ParlaMint 2.1 result (ParlaMint.ana) and the Python source code. I can see the following problems:
- [...] pb elements, so it produces a different file (an unannotated TEI file cannot be reconstructed from the TEI.ana file)
- [...] pb is preserved, the result [...] due to incorrect XML parsing: PARLAMINT-ES-MC/ParlaMint/ParlaMint-ES_2020-01-04-CD200104.xml, line 560 in 0b3a40e
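The reconstructibility concern above can be checked mechanically. The following is a minimal sketch, not part of the existing scripts: it compares the sequence of pb elements in an unannotated TEI document with the sequence in its TEI.ana counterpart. The function names are hypothetical, and it assumes both documents use the standard TEI namespace.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace (assumed to be used by both documents).
TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def pb_signature(xml_text):
    """Return the attributes of every pb element, in document order."""
    root = ET.fromstring(xml_text)
    return [dict(pb.attrib) for pb in root.iter(TEI_NS + "pb")]


def pb_preserved(tei_xml, ana_xml):
    """True if the annotated document keeps the same pb elements in the same order."""
    return pb_signature(tei_xml) == pb_signature(ana_xml)
```

This only checks page breaks; a full round-trip test would also have to compare the text content after stripping the annotation layer.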
Fixing the annotation script would take a lot of work, so I suggest using the scripts from the ParCzech project. They use the LINDAT UDPipe service with the spanish-ancora-ud-2.10-220711 model and the NameTag annotation service, and they have been successfully reused in the ParlaMint-AT and ParlaMint-UA corpora, so I believe they will work for ParlaMint-ES too. My rough time estimate for the annotation is 1-2 days.
@calzada @lucianadmacedo, what do you think? Should I integrate this annotation into the Makefile and run it once the TEI version is ready?
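For orientation, here is a minimal sketch of calling the LINDAT UDPipe REST service with the model named above. It is not the ParCzech implementation. The endpoint URL and the parameter names (data, model, tokenizer, tagger, parser) follow my understanding of the public UDPipe REST API and should be checked against its documentation; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the public LINDAT UDPipe REST API.
UDPIPE_URL = "https://lindat.mff.cuni.cz/services/udpipe/api/process"
MODEL = "spanish-ancora-ud-2.10-220711"  # model named in this thread


def build_request(text, model=MODEL):
    """Build a POST request asking for tokenization, tagging and parsing."""
    params = {"data": text, "model": model,
              "tokenizer": "", "tagger": "", "parser": ""}
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(UDPIPE_URL, data=body)


def extract_conllu(response_bytes):
    """Pull the CoNLL-U output from the JSON response (assumed 'result' field)."""
    return json.loads(response_bytes.decode("utf-8"))["result"]


def annotate(text, model=MODEL, timeout=60):
    """Send text to the service and return CoNLL-U; requires network access."""
    with urllib.request.urlopen(build_request(text, model), timeout=timeout) as resp:
        return extract_conllu(resp.read())
```

Calling annotate("Hola mundo.") should return the analysis as a CoNLL-U string, which then still has to be merged back into the TEI structure.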