Hash Gazetteer and Feature Separator #11

Open
greenwoodma opened this issue May 4, 2020 · 4 comments

Comments

@greenwoodma
Contributor

The hash gazetteer doesn't support feature separators. In itself this isn't an issue, except that if you create an instance of the PR with the default values, it uses the ANNIE gazetteer files, which contain entries with features (see the state abbreviation list for an example). This means the entire line, features included, becomes the entry in the gazetteer and so will never match the documents.
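To make the failure mode concrete, here is a minimal sketch in Python (not the actual Java code of the PR). The sample line, the `:` separator, and both loader functions are assumptions made up for illustration; they are not taken from the ANNIE list files or the plugin source.

```python
# Hypothetical gazetteer line: an entry followed by a feature.
# The sample text and the ":" separator are assumptions for illustration only.
SAMPLE_LINE = "AL:state=Alabama"
FEATURE_SEPARATOR = ":"


def load_without_separator(lines):
    """Treat each whole line as one entry, as a separator-unaware loader would."""
    return {line.strip() for line in lines if line.strip()}


def load_with_separator(lines, separator):
    """Keep only the text before the first separator as the entry."""
    return {line.strip().split(separator, 1)[0] for line in lines if line.strip()}


naive = load_without_separator([SAMPLE_LINE])
aware = load_with_separator([SAMPLE_LINE], FEATURE_SEPARATOR)

# Tokens from a made-up document that mentions the abbreviation.
document_tokens = ["Birmingham", ",", "AL"]
naive_matches = [t for t in document_tokens if t in naive]
aware_matches = [t for t in document_tokens if t in aware]

print(naive_matches)  # the whole line was stored, so nothing matches
print(aware_matches)  # only the entry text was stored, so "AL" matches
```

In this sketch the separator-unaware loader stores the full line as its only entry, which is why a lookup for the bare abbreviation finds nothing.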

@johann-petrak
Contributor

So I guess the only way to deal with this properly would be to ask for a feature separator, ignore the features, and warn that they are being ignored?
On the other hand, what advantage does the HashGazetteer really have over any of the other gazetteers? Why is it still around?
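The "ask for a separator, ignore the features, warn" idea could look roughly like the following Python sketch. The function `load_entries` and its `feature_separator` parameter are hypothetical names, and this does not reflect how the HashGazetteer is actually implemented.

```python
import warnings


def load_entries(lines, feature_separator=None):
    """Load entries, dropping any features after the separator (hypothetical helper)."""
    entries = set()
    ignored = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if feature_separator and feature_separator in line:
            # Keep only the entry text; the features are discarded.
            line = line.split(feature_separator, 1)[0].strip()
            ignored += 1
        if line:
            entries.add(line)
    if ignored:
        # One summary warning rather than one per line.
        warnings.warn(f"Ignored features on {ignored} gazetteer line(s)")
    return entries
```

Emitting a single summary warning is a design choice of the sketch: it keeps large lists from flooding the log while still telling the user that features were dropped.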

@greenwoodma
Contributor Author

My suggestion would be to just remove the default value for the list URL from the metadata, so it doesn't default to using the ANNIE lists, which it doesn't support. I certainly wouldn't spend time updating the code.

@johann-petrak
Contributor

OK. If the hash gazetteer has no real advantage over the ANNIE/extended ones, maybe we can plan on deprecating or removing it? That may help to avoid confusion and cut down on support request emails on the list :)

@greenwoodma
Contributor Author

It's difficult to deprecate given it's in the ANNIE plugin, but I suppose we could make it a hidden PR so that any apps that use it won't break but people won't be able to create new instances. @ianroberts any thoughts on that?
