feat(web): provide lexicon probabilities directly on the search path 📚 #11868

jahorton · 2024-06-25T05:46:18Z

This PR was originally part of #10973.

To traverse a full lexicon efficiently for dictionary-based wordbreaking, it's best to provide the relevant probability data as directly as possible. Fortunately, it's easy to make this O(1) on the lexical model's internal iterator, the LexiconTraversal type. Recomputing it via the model's .predict method would take O(log(N)) time instead.

Note that this provides two different probability value types:

- The probability of each reached entry.
- The probability of the highest-frequency entry either represented by the current node or by any of its descendants.
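To illustrate the two values, here is a minimal sketch of a trie that stores both at build time, so reading either one during traversal is a plain field access. The type and function names (`SketchNode`, `SketchEntry`, `buildTrie`, `descend`) are hypothetical and are not the actual LexiconTraversal API; the shared field name `p` only mirrors the "maxP -> p" rename in this PR's commit list.

```typescript
// Hypothetical types for illustration only; not the real Keyman trie types.
interface SketchEntry {
  text: string;
  p: number; // probability of this exact entry: weight / totalWeight
}

interface SketchNode {
  children: Record<string, SketchNode>;
  entries: SketchEntry[]; // entries that end exactly at this node
  p: number;              // highest entry probability at or below this node
}

function newNode(): SketchNode {
  return { children: {}, entries: [], p: 0 };
}

// Builds the trie once. Every node on the path to a word records the best
// probability reachable through it, so later reads need no recomputation.
function buildTrie(weights: Array<[string, number]>): SketchNode {
  const totalWeight = weights.reduce((sum, pair) => sum + pair[1], 0);
  const root = newNode();
  for (const [word, weight] of weights) {
    const p = weight / totalWeight;
    let node = root;
    node.p = Math.max(node.p, p);
    for (let i = 0; i < word.length; i++) {
      const ch = word.charAt(i);
      let child = node.children[ch];
      if (!child) {
        child = newNode();
        node.children[ch] = child;
      }
      child.p = Math.max(child.p, p);
      node = child;
    }
    node.entries.push({ text: word, p: p });
  }
  return root;
}

// Follows a prefix one character at a time, as a traversal would.
function descend(root: SketchNode, prefix: string): SketchNode | undefined {
  let node: SketchNode | undefined = root;
  for (let i = 0; i < prefix.length && node; i++) {
    node = node.children[prefix.charAt(i)];
  }
  return node;
}

// Made-up weights; total weight is 10.
const sketchRoot = buildTrie([["the", 6], ["then", 3], ["to", 1]]);
```

In this sketch, `descend(sketchRoot, "t")` reaches a node with no entries of its own, yet its `p` already reflects the best word beneath it ("the"), which is the second value type described above.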

There are uses for this outside of dictionary-based wordbreaking, too. The latter 'probability' listed above can be useful for optimizing the correction-search - if a path only produces low-frequency words, we should consider other paths that could yield higher-frequency words first.
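As a rough sketch of that optimization: if each node carries the best probability found at or below it, a best-first walk can always expand the most promising path next, and entries come out in non-increasing probability order. The names here (`PNode`, `QueueItem`, `bestFirst`) and the sample probabilities are hypothetical, and a sorted array stands in for a real priority queue; this is not the actual correction-search code.

```typescript
// Hypothetical node shape for illustration; not the actual Keyman types.
interface PNode {
  p: number; // best entry probability at or below this node
  entries: { text: string; p: number }[];
  children: PNode[];
}

type QueueItem =
  | { kind: "node"; p: number; node: PNode }
  | { kind: "entry"; p: number; text: string };

// Returns up to `limit` words in non-increasing probability order.
function bestFirst(root: PNode, limit: number): string[] {
  const queue: QueueItem[] = [{ kind: "node", p: root.p, node: root }];
  const results: string[] = [];
  while (queue.length > 0 && results.length < limit) {
    // A plain sort stands in for a priority queue to keep the sketch short.
    queue.sort((x, y) => y.p - x.p);
    const best = queue.shift()!;
    if (best.kind === "entry") {
      results.push(best.text);
      continue;
    }
    // Expanding a node: its own entries and its children compete by p.
    for (const entry of best.node.entries) {
      queue.push({ kind: "entry", p: entry.p, text: entry.text });
    }
    for (const child of best.node.children) {
      queue.push({ kind: "node", p: child.p, node: child });
    }
  }
  return results;
}

function leafNode(text: string, p: number): PNode {
  return { p: p, entries: [{ text: text, p: p }], children: [] };
}

// "a" (0.1) and "at" (0.5) share a branch; "be" (0.4) sits on another.
const nodeA: PNode = { p: 0.5, entries: [{ text: "a", p: 0.1 }], children: [leafNode("at", 0.5)] };
const nodeB: PNode = { p: 0.4, entries: [], children: [leafNode("be", 0.4)] };
const demoRoot: PNode = { p: 0.5, entries: [], children: [nodeA, nodeB] };
```

Because a node's `p` is never lower than anything beneath it, the low-frequency entry "a" is deferred until the higher-frequency "at" and "be" have been produced, even though "a" is reached earlier in the trie.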

There's also notable potential for merging / blending two different models via their LexiconTraversal iterators in this manner. Given our upcoming push toward #11872, this would be a great way to achieve that goal: create a stand-in model for the OS's dictionary and blend it with the loaded lexical model via traversals.

@keymanapp-test-bot skip

keymanapp-test-bot · 2024-06-25T05:46:22Z

User Test Results

Test specification and instructions

User tests are not required

Test Artifacts

Android
Developer
iOS
- Keyman for iOS (simulator image)
- FirstVoices Keyboards for iOS (simulator image)
- TestFlight internal PR build version - 18.0.60 (0.11868.11420)
Keyboards
- Test Keyboards
Web
- KeymanWeb Test Home
Windows

mcdurdin

LGTM -- nits only

common/models/templates/src/trie-model.ts

mcdurdin · 2024-07-05T05:09:53Z

common/models/templates/src/trie-model.ts

        }
      };
      return;
    }
  }

-  get entries(): string[] {
+  get entries() {
+    const totalWeight = this.totalWeight;



mcdurdin · 2024-07-05T05:13:29Z

common/models/templates/test/test-trie-traversal.js

@@ -35,6 +35,10 @@ describe('Trie traversal abstractions', function() {
  it('traversal with simple internal nodes', function() {
    var model = new TrieModel(jsonFixture('tries/english-1000'));

+    // Prob:  entry weight / total weight
+    // "the" is the highest-weighted word in the fixture.
+    const PROB_OF_THE = 1000 / 500500;


Can we move this and PROB_OF_TROUBLE outside the functions so we can use them throughout?


keyman-server · 2024-07-08T18:03:30Z

Changes in this pull request will be available for download in Keyman version 18.0.70-alpha

jahorton added 5 commits June 21, 2024 15:04

feat(common/models): lexicon traversals - probability access

abb3264

fix(common/models): basic unit test patchup

8e058a5

feat(common/models/templates): unit test enhancements for new feature

17fefaa

change(web): maxP -> p, to ensure common field name for node and entry

0c47fcd

fix(common/models/templates): unit test patchup after maxP -> p

27bc021

jahorton requested a review from mcdurdin as a code owner June 25, 2024 05:46

keymanapp-test-bot bot added the user-test-missing User tests have not yet been defined for the PR label Jun 25, 2024

keymanapp-test-bot bot added this to the A18S5 milestone Jun 25, 2024

github-actions bot added common/ common/models/ common/models/types/ common/models/templates/ feat web/ labels Jun 25, 2024

keymanapp-test-bot bot removed the user-test-missing User tests have not yet been defined for the PR label Jun 25, 2024

This was referenced Jun 25, 2024

change(common/models/templates): rework Trie predict method to utilize traversals 📚 #11870

Merged

feat(web): dictionary-based wordbreaking - draft form with demo page 📚 #10973

Draft

Base automatically changed from refactor/common/models/priority-queue-as-util to master July 3, 2024 01:13

mcdurdin approved these changes Jul 5, 2024


darcywong00 modified the milestones: A18S5, A18S6 Jul 5, 2024

change(common/models): enact changes per PR review

14b453c

github-actions bot added web/ and removed web/ labels Jul 8, 2024

jahorton merged commit 6ecb461 into master Jul 8, 2024
15 checks passed

jahorton deleted the feat/common/models/lexicon-traversal-probs branch July 8, 2024 07:37
