Build in support for standard entity linking corpora (TAC, ACQUAINT, Etc...) #26
Comments
Make a standard test case with abstract input and output classes, and specific implementations for standard entity linking corpora (also Wikipedia, IITB, etc.). Also implement an output class to generate/output standard metrics.
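A rough sketch of what such a test case could look like, to make the proposal concrete. All names here (`CorpusInput`, `ResultOutput`, `MetricsOutput`, `run_testcase`, the `Link` tuple shape) are hypothetical and not taken from the project; the metrics are plain micro-averaged precision, recall and F1 over exact link matches, which is only one of several conventions in use.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, Set, Tuple

# Hypothetical annotation shape: (mention text, character offset, target entity).
Link = Tuple[str, int, str]


@dataclass
class Document:
    doc_id: str
    text: str
    gold_links: Set[Link]


class CorpusInput(ABC):
    """Abstract input: yields documents with their gold-standard links."""

    @abstractmethod
    def documents(self) -> Iterator[Document]:
        ...


class ResultOutput(ABC):
    """Abstract output: collects predicted links and produces a report."""

    @abstractmethod
    def add(self, doc: Document, predicted: Set[Link]) -> None:
        ...

    @abstractmethod
    def report(self) -> Dict[str, float]:
        ...


class InMemoryCorpus(CorpusInput):
    """Stand-in for a corpus-specific reader (TAC, IITB, ...)."""

    def __init__(self, docs: Iterable[Document]) -> None:
        self._docs = list(docs)

    def documents(self) -> Iterator[Document]:
        return iter(self._docs)


class MetricsOutput(ResultOutput):
    """Micro-averaged precision/recall/F1 over exact link matches."""

    def __init__(self) -> None:
        self.tp = self.fp = self.fn = 0

    def add(self, doc: Document, predicted: Set[Link]) -> None:
        self.tp += len(predicted & doc.gold_links)
        self.fp += len(predicted - doc.gold_links)
        self.fn += len(doc.gold_links - predicted)

    def report(self) -> Dict[str, float]:
        p = self.tp / (self.tp + self.fp) if (self.tp + self.fp) else 0.0
        r = self.tp / (self.tp + self.fn) if (self.tp + self.fn) else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        return {"precision": p, "recall": r, "f1": f1}


def run_testcase(corpus: CorpusInput,
                 linker: Callable[[Document], Iterable[Link]],
                 output: ResultOutput) -> Dict[str, float]:
    """Feed every document through the linker and return the report."""
    for doc in corpus.documents():
        output.add(doc, set(linker(doc)))
    return output.report()
```

A corpus-specific reader would then only need to subclass `CorpusInput` and parse its own file format into `Document` objects; the linker under test is passed in as a plain callable.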
Good point, Edgar. Can you send your ~500 manually linked microblog posts to be included as test cases?
...and while we're at it, Daan, can you do the same with your test/dev set?
Edgar’s dataset has been superseded by an updated version; we have that one and use it to train tweet models. Let’s see if we can include it this afternoon. My annotation dataset is available, but without subtitles, as the copyright on those does not lie with me, so it is not suited for inclusion as a standard corpus.
Are those
Yes, the subtitles are unique if you have a channel and timestamp, or a program, day, and timecode, etc. I was planning to include a link to Uitzending Gemist for each episode, where the subtitles are available (but not for automatic download, AFAIK). I don’t know of any (legal) automatic way of downloading the subtitles right now.
There are subtitles in B&G's iMMix, but I don't know if they coincide with what you used...
Probably, but iMMix is not freely available, is it? On 19 Nov. 2013, at 11:21, IsaacHaze [email protected] wrote:
I remember Bouke mentioning an OAI endpoint being available for their metadata. I assumed that meant the iMMix catalogue, so I asked whether the subtitles were/are in there, and I think he said yes.
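If that endpoint speaks OAI-PMH (an assumption; the thread only says "OAI endpoint"), harvesting could be sketched roughly as below. The base URL is a placeholder, `list_records_url` and `record_identifiers` are hypothetical helper names, and whether the subtitles are actually exposed through the endpoint is not confirmed here. The sketch only builds the request URL and parses a response; it makes no network call.

```python
from typing import List, Optional
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    given, metadataPrefix must be left out.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def record_identifiers(xml_text: str) -> List[str]:
    """Extract the record identifiers from a ListRecords response."""
    root = ET.fromstring(xml_text)
    path = ".//oai:record/oai:header/oai:identifier"
    return [el.text for el in root.findall(path, OAI_NS)]
```

The metadata format that would carry subtitles (if any) depends on the endpoint; `oai_dc` is only the format every OAI-PMH repository is required to support.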
We are now handling the ACQUAINT stuff...
And made good progress, as I understand. Thanks, Ridho and Zhaochun!