Improving ranking of Cree entries for English sense (WordNet) based search #1206

Open
1 task
aarppe opened this issue Dec 19, 2024 · 0 comments

@aarppe
Contributor

aarppe commented Dec 19, 2024

  • Tweak the ranking of the Cree entries within the senses. Currently we are using corpus-based lemma frequencies, when they exist, but we might also want to factor in glossary counts and dictionary-morpheme-based entry frequencies. [This needs an update of the source files with the frequencies, by @aarppe]

Originally posted by @aarppe in #1138 (comment)
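One simple way to "factor in" the extra frequency sources is a lexicographic back-off: order entries by corpus frequency, and use glossary counts and then morpheme-based frequencies to order entries that have no (or equal) corpus frequency. The sketch below assumes a hypothetical `CreeEntry` record with illustrative field names; it is not the project's actual data model, and the issue does not specify how the sources should be weighted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CreeEntry:
    # Hypothetical record; field names are illustrative, not the project's schema.
    lemma: str
    corpus_freq: Optional[int] = None  # corpus-based lemma frequency, if attested
    glossary_count: int = 0            # occurrences in glossaries (core vocabulary)
    morpheme_freq: float = 0.0         # dictionary-morpheme-based aggregate frequency


def rank_key(entry: CreeEntry) -> tuple:
    # Corpus frequency dominates (missing counts as 0); glossary count and
    # morpheme-based frequency only break ties between equal corpus frequencies.
    return (entry.corpus_freq or 0, entry.glossary_count, entry.morpheme_freq)


def rank_within_sense(entries: list[CreeEntry]) -> list[CreeEntry]:
    # Highest key first; sorted() is stable, so full ties keep input order.
    return sorted(entries, key=rank_key, reverse=True)
```

Because corpus frequency always dominates here, the corpus skew described below still carries over; a blended score (for example, the rank fusion sketched after the option list) would dampen it.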

The corpus-based lemma frequencies cover only part of the Cree entries in CW and the other dictionary resources, and they are skewed by the composition of the corpora that we have. The following options could be considered:

  1. Include the glossary-based rankings. This will ensure that core vocabulary (some 3,000 entries) is ranked higher.
  2. Include dictionary-based morpheme aggregate rankings. This will ensure that all entries in CW (over 30k) receive a ranking, which will also cover most entries in the other sources.
  3. Include the extent of matches of English search terms (the lexical parts remaining after English phrase analysis) with the English definitions of the Cree entries under each sense.
  4. Include an improved form of vector similarity between the English search terms and the English definitions of the Cree entries.
  5. A ranked combination of 1-4 above.
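Option 5 leaves open how the four signals are combined. As one possibility, the sketch below uses reciprocal rank fusion (RRF), a standard rank-aggregation technique chosen here for illustration, not something the issue prescribes. It assumes per-entry dicts with hypothetical keys (`lemma`, `definition`, `glossary_count`, `morpheme_freq`) and that the English query has already been reduced to its lexical terms. Option 4 is approximated by bag-of-words cosine similarity as a stand-in for whatever improved vector similarity is eventually used.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(text: str) -> list[str]:
    # Lowercased alphabetic tokens; a crude stand-in for real English analysis.
    return re.findall(r"[a-z]+", text.lower())


def term_overlap(query_terms: list[str], definition: str) -> float:
    # Option 3: share of the query terms that occur in the English definition.
    if not query_terms:
        return 0.0
    words = set(tokens(definition))
    return sum(term in words for term in query_terms) / len(query_terms)


def cosine(query_terms: list[str], definition: str) -> float:
    # Option 4 stand-in: cosine similarity of bag-of-words count vectors.
    # An improved version would likely use word or sentence embeddings instead.
    q, d = Counter(query_terms), Counter(tokens(definition))
    dot = sum(count * d[word] for word, count in q.items())
    norm = math.sqrt(sum(c * c for c in q.values())) * math.sqrt(sum(c * c for c in d.values()))
    return dot / norm if norm else 0.0


def competition_ranks(scores: dict[str, float]) -> dict[str, int]:
    # "1224" ranking: tied scores share a rank, so a tie favours no entry.
    return {lemma: 1 + sum(other > s for other in scores.values())
            for lemma, s in scores.items()}


def reciprocal_rank_fusion(signals: list[dict[str, float]], k: int = 60) -> list[str]:
    # Option 5: each signal adds 1 / (k + rank); the highest fused score ranks first.
    fused: dict[str, float] = defaultdict(float)
    for scores in signals:
        for lemma, rank in competition_ranks(scores).items():
            fused[lemma] += 1.0 / (k + rank)
    # Remaining ties are broken alphabetically so the output is deterministic.
    return sorted(fused, key=lambda lemma: (-fused[lemma], lemma))


def rank_entries(entries: list[dict], query_terms: list[str]) -> list[str]:
    signals = [
        {e["lemma"]: e["glossary_count"] for e in entries},                          # option 1
        {e["lemma"]: e["morpheme_freq"] for e in entries},                           # option 2
        {e["lemma"]: term_overlap(query_terms, e["definition"]) for e in entries},   # option 3
        {e["lemma"]: cosine(query_terms, e["definition"]) for e in entries},         # option 4
    ]
    return reciprocal_rank_fusion(signals)
```

RRF only needs each signal's ordering, so raw glossary counts, morpheme aggregates and similarity scores never have to be put on a common scale. `k = 60` is the conventional default; a per-signal weight could be added if, say, glossary membership should count more than vector similarity.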