Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(web): check for low-probability exact + exact-key correction matches 📚 #11876

Merged
merged 6 commits into from
Jul 25, 2024

Conversation

jahorton
Copy link
Contributor

@jahorton jahorton commented Jun 26, 2024

One long-time pet peeve of mine when it comes to auto-correct: it should never correct away from a perfectly-valid word of the language. Even if it's is a more common word in English than its, I believe that its should be left in place. (I find it quite the pain on iOS, as I've had to fight iOS to leave its before.)

Even if we were to auto-correct away, we should ensure that its is, at least, easily accessible on the banner as an option so that a user may at least prevent auto-correction that way. (Admittedly, iOS does present it... I probably need to adapt.) That is... we need to detect exact matches + exact-key matches and ensure they're always visible, with priority. This is something our engine currently doesn't do well, but thanks to #11869, we now have the perfect tool to remedy the issue.

Secondly... the ModelCompositor.predict method is long and rather "monolithic". It'd be nice to spin off as much "keep"-related handling as possible into its own method.

Unit tests for the new functionality will be added with #11944. Ideally, this PR should not be merged until all PRs in the sequence up to and including #11944 are ready to go.

@keymanapp-test-bot skip

@keymanapp-test-bot keymanapp-test-bot bot added the user-test-missing User tests have not yet been defined for the PR label Jun 26, 2024
@mcdurdin
Copy link
Member

mcdurdin commented Jul 5, 2024

One long-time pet peeve of mine when it comes to auto-correct: it should never correct away from a perfectly-valid word of the language. Even if it's is a more common word in English than its, I believe that its should be left in place. (I find it quite the pain on iOS, as I've had to fight iOS to leave its before.)

I am tempted to make you feel better by saying "there, their, they're"...

Interestingly, I've seen other users who love this. its vs it's is impossible to get right without some grammatical awareness of course... but in the more general case, I wonder if this should be surfaced as an option for users?

@jahorton
Copy link
Contributor Author

jahorton commented Jul 5, 2024

One long-time pet peeve of mine when it comes to auto-correct: it should never correct away from a perfectly-valid word of the language. Even if it's is a more common word in English than its, I believe that its should be left in place. (I find it quite the pain on iOS, as I've had to fight iOS to leave its before.)

I am tempted to make you feel better by saying "there, their, they're"...

Interestingly, I've seen other users who love this. its vs it's is impossible to get right without some grammatical awareness of course... but in the more general case, I wonder if this should be surfaced as an option for users?

I was actually thinking something similar, making it a configurable option.

@mcdurdin
Copy link
Member

mcdurdin commented Jul 5, 2024

I was actually thinking something similar, making it a configurable option.

Copy link
Member

@mcdurdin mcdurdin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this LGTM; I haven't reviewed the changes to the tests in detail as my flight is about to start descent and I wanted to get this in before that!

}

const directEntries = rootTraversal.entries;
// `Set` requires Chrome 38+, which is more recent than Chrome 35.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are moving away from ES5 (#11881), does that bump our minimum version of Chrome?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does. This PR's main content was written months ago, before we were looking at directly dropping it. #11881 isn't yet merged, either.

common/models/templates/src/trie-model.ts Outdated Show resolved Hide resolved
common/models/templates/src/trie-model.ts Outdated Show resolved Hide resolved
common/models/templates/src/trie-model.ts Outdated Show resolved Hide resolved
common/models/templates/src/trie-model.ts Outdated Show resolved Hide resolved
common/web/lm-worker/src/main/model-compositor.ts Outdated Show resolved Hide resolved
if(keyed(tuple.correction.sample) == keyedPrefix) {
if(predictedWord == truePrefix) {
tuple.matchLevel = SuggestionSimilarity.exact;
keepOption = this.toAnnotatedSuggestion(tuple.prediction.sample, 'keep', models.QuoteBehavior.noQuotes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to add a comment for this and the next 3 lines of code because it's not entirely obvious what's happening here, even with the comments on ll.426-428

common/web/lm-worker/src/main/model-compositor.ts Outdated Show resolved Hide resolved
@darcywong00 darcywong00 modified the milestones: A18S5, A18S6 Jul 5, 2024
Base automatically changed from feat/web/auto-prediction to master July 8, 2024 07:37
@jahorton
Copy link
Contributor Author

jahorton commented Jul 9, 2024

Noticed this while working on focused unit tests for the followup to #11940: even with this PR in place, we're not currently auto-selecting something like can't with priority when the current context is cant - where there's no exact context match, but there is an exact-key match. I'll want to fix that, whether it be within this PR or within a descendant.

The probability-ratio requirement should probably only apply to suggestions within the same "similarity tier". If it's a lower tier, we should probably straight-up ignore its probability component for the sum used in the thresholding ratio.

@darcywong00 darcywong00 modified the milestones: A18S6, A18S7 Jul 19, 2024
@keymanapp-test-bot keymanapp-test-bot bot removed the user-test-missing User tests have not yet been defined for the PR label Jul 23, 2024
@jahorton jahorton merged commit 980ed88 into master Jul 25, 2024
18 of 19 checks passed
@jahorton jahorton deleted the fix/web/low-probability-exact-matching branch July 25, 2024 03:10
@keyman-server
Copy link
Collaborator

Changes in this pull request will be available for download in Keyman version 18.0.75-alpha

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

4 participants