feat(web): provide lexicon probabilities directly on the search path 📚 #11868
User Test Results: user tests are not required.
LGTM -- nits only
}
};
return;
}
}

get entries(): string[] {
get entries() {
const totalWeight = this.totalWeight;
ditto
@@ -35,6 +35,10 @@ describe('Trie traversal abstractions', function() {
it('traversal with simple internal nodes', function() {
var model = new TrieModel(jsonFixture('tries/english-1000'));

// Prob: entry weight / total weight
// "the" is the highest-weighted word in the fixture.
const PROB_OF_THE = 1000 / 500500;
Can we move this and `PROB_OF_TROUBLE` outside the functions so we can use them throughout?
Changes in this pull request will be available for download in Keyman version 18.0.70-alpha
This PR was originally part of #10973.
To traverse a full lexicon efficiently for dictionary-based wordbreaking, it's best to provide the relevant probability data as directly as possible. Fortunately, it's easy to make this O(1) on the lexical model's internal iterator, the `LexiconTraversal` type. Recomputing it via the model's `.predict` method would take O(log(N)) time instead.

Note that this provides two different probability value types:
There are uses for this outside of dictionary-based wordbreaking, too. The latter 'probability' listed above can be useful for optimizing the correction-search - if a path only produces low-frequency words, we should consider other paths that could yield higher-frequency words first.
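
As a rough illustration of the idea, here is a minimal sketch of a trie whose traversal exposes probabilities in O(1). Everything in it is hypothetical (`SketchTrie`, `SketchTraversal`, `p`, `maxP`) and is not the actual `LexiconTraversal` API. It assumes each node stores its own raw weight plus the largest weight anywhere beneath it, so both values are a single division by the total weight, mirroring the "entry weight / total weight" computation in the test above.

```typescript
// Hypothetical, simplified trie; these names are illustrative, not Keyman's real types.
interface SketchNode {
  children: Record<string, SketchNode>;
  weight: number;    // raw weight of the word ending at this node (0 if none)
  maxWeight: number; // largest raw weight found at or below this node
}

function makeNode(): SketchNode {
  return { children: {}, weight: 0, maxWeight: 0 };
}

class SketchTrie {
  readonly root: SketchNode = makeNode();
  totalWeight = 0;

  // Assumption: each word is added at most once.
  add(word: string, weight: number): void {
    let node = this.root;
    node.maxWeight = Math.max(node.maxWeight, weight);
    for (let i = 0; i < word.length; i++) {
      const ch = word.charAt(i);
      let next: SketchNode | undefined = node.children[ch];
      if (!next) {
        next = makeNode();
        node.children[ch] = next;
      }
      // Keep the per-path maximum up to date while inserting.
      next.maxWeight = Math.max(next.maxWeight, weight);
      node = next;
    }
    node.weight = weight;
    this.totalWeight += weight;
  }

  traverse(): SketchTraversal {
    return new SketchTraversal(this, this.root);
  }
}

class SketchTraversal {
  constructor(private readonly trie: SketchTrie, private readonly node: SketchNode) {}

  child(ch: string): SketchTraversal | undefined {
    const next: SketchNode | undefined = this.node.children[ch];
    return next ? new SketchTraversal(this.trie, next) : undefined;
  }

  // Probability of the word ending exactly here: one division, no search.
  get p(): number { return this.node.weight / this.trie.totalWeight; }

  // Best probability reachable along this path: an upper bound for search ordering.
  get maxP(): number { return this.node.maxWeight / this.trie.totalWeight; }
}

const lexicon = new SketchTrie();
lexicon.add('the', 6);
lexicon.add('then', 3);
lexicon.add('to', 1);

const t = lexicon.traverse().child('t');
const the = t?.child('h')?.child('e');
console.log(t?.p, t?.maxP);     // 0 0.6
console.log(the?.p, the?.maxP); // 0.6 0.6
```

A correction-search could, under these assumptions, order its candidate paths by `maxP` so that paths leading only to low-frequency words are explored last.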
There's also notable potential for merging / blending two different models together via their `LexiconTraversal` iterators in this manner. Given our upcoming push toward #11872, this would be a fantastic way to achieve that goal: create a stand-in model for the OS's dictionary and blend it with the loaded lexical model via traversals.

@keymanapp-test-bot skip
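
To make the blending idea concrete, here is a minimal sketch under stated assumptions. The `ProbTraversal` interface is a hypothetical, pared-down shape and not the real `LexiconTraversal` type, `fromWords` is a toy stand-in that is not O(1), and linear interpolation with a fixed `weightA` is just one possible blending rule chosen for illustration.

```typescript
// Minimal shape assumed for illustration; the real LexiconTraversal type differs.
interface ProbTraversal {
  readonly p: number; // probability of the word ending at this position (0 if none)
  child(ch: string): ProbTraversal | undefined;
}

// Toy traversal over a word -> probability table. Not O(1); only a stand-in.
function fromWords(probs: Record<string, number>, prefix = ''): ProbTraversal {
  return {
    p: Object.prototype.hasOwnProperty.call(probs, prefix) ? probs[prefix] : 0,
    child(ch: string): ProbTraversal | undefined {
      const next = prefix + ch;
      const exists = Object.keys(probs).some((w) => w.indexOf(next) === 0);
      return exists ? fromWords(probs, next) : undefined;
    },
  };
}

// Walks two traversals in lockstep; a path exists if either model has it.
class BlendedTraversal implements ProbTraversal {
  constructor(
    private readonly a: ProbTraversal | undefined,
    private readonly b: ProbTraversal | undefined,
    private readonly weightA: number, // share given to model A, in [0, 1]
  ) {}

  get p(): number {
    const pa = this.a ? this.a.p : 0;
    const pb = this.b ? this.b.p : 0;
    return this.weightA * pa + (1 - this.weightA) * pb;
  }

  child(ch: string): ProbTraversal | undefined {
    const a = this.a ? this.a.child(ch) : undefined;
    const b = this.b ? this.b.child(ch) : undefined;
    return a || b ? new BlendedTraversal(a, b, this.weightA) : undefined;
  }
}

// Helper: follow a whole word from a starting traversal.
function walk(start: ProbTraversal, word: string): ProbTraversal | undefined {
  let cur: ProbTraversal | undefined = start;
  for (let i = 0; cur && i < word.length; i++) {
    cur = cur.child(word.charAt(i));
  }
  return cur;
}

const loaded = fromWords({ cat: 0.5, car: 0.5 }); // stands in for the loaded lexical model
const osDict = fromWords({ cat: 0.2, cab: 0.8 }); // stands in for an OS dictionary model
const blended = new BlendedTraversal(loaded, osDict, 0.75);
console.log(walk(blended, 'cat')?.p, walk(blended, 'cab')?.p);
```

Because each side's probability is available directly on its traversal, the blended value costs one multiply-add per node, and words known to only one model remain reachable.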