Build in support for standard entity linking corpora (TAC, AQUAINT, etc.) #26

Open
graus opened this issue Nov 18, 2013 · 12 comments

Comments

@graus
Contributor

graus commented Nov 18, 2013

No description provided.

@ejmeij

ejmeij commented Nov 18, 2013

Create a standard test case with abstract input and output classes, plus specific implementations for the standard entity linking corpora (also Wikipedia, IITB, etc.).

Also implement an output class that generates and outputs standard metrics.
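
A minimal sketch of what such a test case could look like, assuming Python. Every name here (`Document`, `CorpusInput`, `ResultOutput`, `InMemoryCorpus`, `MetricsOutput`, `run_testcase`) is hypothetical and not existing project code, and representing links as sets of entity titles is an assumption too. Precision, recall and F1 are used as an example of "standard metrics"; the thread does not say which metrics are meant.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Iterator, Set


@dataclass(frozen=True)
class Document:
    """One corpus item: its text plus the gold-standard entity links."""
    doc_id: str
    text: str
    gold_links: FrozenSet[str]  # assumption: links are plain entity titles


class CorpusInput(ABC):
    """Abstract input; each corpus (TAC, IITB, ...) would get its own subclass."""

    @abstractmethod
    def documents(self) -> Iterator[Document]:
        ...


class ResultOutput(ABC):
    """Abstract output; receives the predicted links for each document."""

    @abstractmethod
    def add(self, document: Document, predicted: Set[str]) -> None:
        ...


class InMemoryCorpus(CorpusInput):
    """Toy stand-in for a real corpus reader."""

    def __init__(self, docs: Iterable[Document]) -> None:
        self._docs = list(docs)

    def documents(self) -> Iterator[Document]:
        return iter(self._docs)


class MetricsOutput(ResultOutput):
    """Accumulates counts for micro-averaged precision, recall and F1."""

    def __init__(self) -> None:
        self.tp = self.fp = self.fn = 0

    def add(self, document: Document, predicted: Set[str]) -> None:
        gold = set(document.gold_links)
        self.tp += len(gold & predicted)   # correct links
        self.fp += len(predicted - gold)   # spurious links
        self.fn += len(gold - predicted)   # missed links

    def metrics(self) -> dict:
        p = self.tp / (self.tp + self.fp) if self.tp + self.fp else 0.0
        r = self.tp / (self.tp + self.fn) if self.tp + self.fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        return {"precision": p, "recall": r, "f1": f1}


def run_testcase(corpus: CorpusInput,
                 linker: Callable[[str], Iterable[str]],
                 output: ResultOutput) -> ResultOutput:
    """Feed every document through the linker and hand the result to the output."""
    for doc in corpus.documents():
        output.add(doc, set(linker(doc.text)))
    return output
```

With this split, adding a new corpus would only mean writing another `CorpusInput` subclass, while the linker under test is passed in as any callable that maps text to a collection of entity titles.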

@IsaacHaze
Contributor

Good point, Edgar. Can you send your ~500 manually linked microblog posts so they can be included as test cases?

@IsaacHaze
Contributor

...and while we're at it, Daan, can you do the same with your test/dev set?

@dodijk
Contributor

dodijk commented Nov 19, 2013

Edgar’s dataset has been superseded by an updated version; we have that one and use it to train tweet models. Let’s see if we can include it this afternoon. My annotation dataset is available, but without the subtitles, as their copyright does not lie with me. So it’s not suited for inclusion as a standard corpus.

@larsmans
Contributor

Are those microblog posts (tweets) redistributable?

@IsaacHaze
Contributor

@larsmans: no, but the IDs are.

@dodijk: is there a unique way of referring to those subtitles? Can we download them on the fly from B&G's OAI-PMH endpoint (they do have one right now, don't they?)

@dodijk
Contributor

dodijk commented Nov 19, 2013

Yes, the subtitles can be uniquely identified if you have a channel and timestamp, or a program, day, and timecode, etc. I was planning to include a link to Uitzending Gemist for each episode, where the subtitles are available (but not for automatic download, AFAIK). I don’t know of any (legal) automatic way of downloading the subtitles right now.

@IsaacHaze
Contributor

There are subtitles in B&G's iMMix, but I don't know if they coincide with what you used...

@dodijk
Contributor

dodijk commented Nov 19, 2013

Probably, but iMMix is not freely available, is it?

On 19 Nov 2013, at 11:21, IsaacHaze wrote:

There /are/ subtitles in B&G's iMMix, but i don't know if they coincide with what you used...



@IsaacHaze
Contributor

I remember Bouke mentioning that an OAI endpoint is available for their metadata. I assumed that meant the iMMix catalogue, so I asked whether the subtitles are in there, and I think he said yes.

@sinorichard

We are now handling the AQUAINT corpus...

@dodijk
Contributor

dodijk commented Nov 19, 2013

And you've made good progress, as I understand it. Thanks, Ridho and Zhaochun!
