re-index DBpedias by not indexing disambiguation pages #34

Open
m1ci opened this issue Sep 18, 2015 · 3 comments

@m1ci
Contributor

m1ci commented Sep 18, 2015

Currently, we index all label-URI pairs; however, some pairs point to disambiguation pages. Issue freme-project/e-Entity#49 results from this.
We need to re-index DBpedias with the disambiguation pages removed.

To determine whether a URI refers to a disambiguation page, we can use the DBpedia disambiguation pages dataset: http://downloads.dbpedia.org/2015-04/core/disambiguations_en.nt.bz2
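A minimal sketch of how that filtering step could look, not the actual FREME indexing code. It assumes the dump is bz2-compressed N-Triples in which the subject of every triple is a disambiguation page (for example via a predicate such as `dbo:wikiPageDisambiguates`), and that the label-URI pairs are available as plain tuples. The function names `load_disambiguation_uris` and `filter_pairs` are hypothetical.

```python
import bz2
import re

# Matches the subject URI at the start of an N-Triples line.
SUBJECT_RE = re.compile(r"^<([^>]*)>")


def load_disambiguation_uris(path):
    """Collect the subject URI of every triple in a bz2-compressed N-Triples file.

    Assumption: each subject in the dump is a disambiguation page.
    """
    uris = set()
    with bz2.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.startswith("#"):
                continue  # skip comment lines
            match = SUBJECT_RE.match(line)
            if match:
                uris.add(match.group(1))
    return uris


def filter_pairs(pairs, disambiguation_uris):
    """Drop every (label, uri) pair whose URI is a disambiguation page."""
    return [(label, uri) for label, uri in pairs if uri not in disambiguation_uris]
```

With the dataset downloaded locally, `filter_pairs(pairs, load_disambiguation_uris("disambiguations_en.nt.bz2"))` would give the pairs to index. A set is used so that each membership check stays cheap even for a large dump.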

@m1ci m1ci changed the title re-index DBpedias by not including disambiguation pages re-index DBpedias by not indexing disambiguation pages Sep 18, 2015
@nilesh-c
Member

Hi Milan, doing only this will not solve the problem, because a disambiguation page is nothing more than a list of links to other pages. Even if the disambiguation page is removed, wrongly spotted entities like this will still get linked to other URIs.

For example, NOT (if it is recognised as an entity mention by the NER layer) will get linked to http://dbpedia.org/resource/Inverter_(logic_gate) because there is a mapping between NOT and that URI.
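To illustrate the point with a toy index (a sketch only: the disambiguation URI below is hypothetical, and `build_index` is an illustrative helper, not part of FREME NER):

```python
from collections import defaultdict

# Hypothetical disambiguation-page URI, used only for illustration.
DISAMBIGUATION_URI = "http://dbpedia.org/resource/NOT_(disambiguation)"
INVERTER_URI = "http://dbpedia.org/resource/Inverter_(logic_gate)"


def build_index(pairs, excluded_uris=frozenset()):
    """Map each label to its candidate URIs, skipping excluded URIs."""
    index = defaultdict(list)
    for label, uri in pairs:
        if uri not in excluded_uris:
            index[label].append(uri)
    return dict(index)


pairs = [("NOT", DISAMBIGUATION_URI), ("NOT", INVERTER_URI)]
index = build_index(pairs, excluded_uris={DISAMBIGUATION_URI})

# The label "NOT" is still indexed, so a spotted mention still gets a link.
print(index["NOT"])
```

Excluding the disambiguation page shrinks the candidate list for the label but does not remove the label itself, so the spotting error has to be handled elsewhere.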

@jnehring
Member

I think that an entity linked to a disambiguation page is wrong with 100% probability. When FREME NER has to choose another link instead, disambiguation may or may not work well, but at least there is a chance of success. So even if that does not solve the problem of NOT being detected as an entity, it might improve FREME NER performance on the DBpedia dataset.

But please don't implement this task now. It is just an idea that should not get lost. I am sure we have many ways to improve FREME NER, and we should implement the most promising improvements first.

@m1ci
Contributor Author

m1ci commented Jan 18, 2017

This is not a critical issue for FREME 1.0 and will be left open for future development.
