refactor(web): utilize new tokenization-transition methodology 🚂 #14877

jahorton · 2025-10-01T19:45:14Z

We can now reasonably integrate all the new parts designed to help with token-merge and token-split scenarios. Swapping our tokenization strategy to the one introduced over the last several PRs lets us handle such cases, as long as we assume that there is only one true tokenization. (Handling for cases where we drop that assumption will come later; see #14970 for the start of multi-tokenization support.)
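As a rough illustration of what "token-merge" and "token-split" mean here, the sketch below compares two tokenizations of the same text and reports which case applies. The `TransitionKind` type and `classifyTransition` function are hypothetical names made up for this example; they are not part of the predictive-text worker, and the real transition logic also has to account for the incoming keystroke.

```typescript
// Hypothetical illustration only: these names do not come from the Keyman codebase.
type TransitionKind = 'merge' | 'split' | 'none';

// Compares two tokenizations that cover the same underlying text.
function classifyTransition(before: string[], after: string[]): TransitionKind {
  // Tokenizations of different text are out of scope for this sketch.
  if (before.join('') !== after.join('')) {
    return 'none';
  }
  // Fewer tokens for the same text: some tokens were merged.
  if (after.length < before.length) {
    return 'merge';
  }
  // More tokens for the same text: some token was split.
  if (after.length > before.length) {
    return 'split';
  }
  return 'none';
}
```

For example, `classifyTransition(["can", "'", "t"], ["can't"])` reports a merge, while `classifyTransition(["applesauce"], ["apple", "sauce"])` reports a split.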

Note that we still need to keep the previous tokenization method's alignment style around for a PR: quite a number of unit tests for suggestions and reversions reference it and are currently written to rely on it. Updating the related methods will be its own unit of work, but fortunately we can "frankenstein" the two parts together until those changes are made.

That said, all unit tests are passing with the new tokenization internals in place, which I consider a good sign! The remaining step is to update suggestions and reversions so that they make better use of the new data.

The user tests will come in #14880, two PRs after this one.

Build-bot: skip release:web
Test-bot: skip

keymanapp-test-bot · 2025-10-01T19:45:18Z

User Test Results

Test specification and instructions

User tests are not required

Test Artifacts

Web
- KeymanWeb Test Home

jahorton · 2025-10-07T13:58:03Z

web/src/engine/predictive-text/worker-thread/src/main/correction/context-state.ts

+    // We actually will want to build `preservationTransform`s based on the path
+    // leading to each correction/suggestion.  But, until now, we've just built
+    // it based upon the actual input transform - so we'll maintain (temporarily)
+    // as a transitional state.
+
+    const bestResultAnalysis = tokenizationAnalysis;
+    // inputTransform is the ideal transform we found.
+
+    // If tokens were inserted, emit an empty transform; this prevents
+    // suggestions from replacing the "current" token.
+    if(bestResultAnalysis.inputs[0].sample.has(1)) {


Self note: this section could use a rework and reword.
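The rule described in the excerpt's comments can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code from `context-state.ts`: the `Transform` shape, the `TokenizedInput` map (keyed by token offset, with `1` meaning a newly inserted token), and the function name `buildPreservationTransform` are all hypothetical stand-ins chosen for illustration.

```typescript
// Hypothetical stand-in for the engine's transform type.
interface Transform {
  insert: string;
  deleteLeft: number;
}

// Assumption: maps a token offset (0 = current token, 1 = newly inserted
// token) to the portion of the input applied to that token.
type TokenizedInput = Map<number, Transform>;

function buildPreservationTransform(
  inputTransform: Transform,
  tokenized: TokenizedInput
): Transform {
  // If the input inserted a new token, emit an empty transform so that
  // suggestions do not replace the "current" token.
  if (tokenized.has(1)) {
    return { insert: '', deleteLeft: 0 };
  }
  // Otherwise keep the transitional behavior: reuse the actual input transform.
  return inputTransform;
}
```

With a map holding only offset `0`, the function returns the input transform unchanged; once offset `1` is present, it returns the empty transform.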

github-project-automation bot added this to Keyman Oct 1, 2025

github-project-automation bot moved this to Todo in Keyman Oct 1, 2025

github-actions bot added web/ web/predictive-text/ feat labels Oct 1, 2025

keymanapp-test-bot bot added the epic-autocorrect label Oct 1, 2025

keymanapp-test-bot bot changed the title ~~feat(web): utilize new tokenization-transition methodology~~ feat(web): utilize new tokenization-transition methodology 🚂 Oct 1, 2025

keymanapp-test-bot bot added this to the A19S13 milestone Oct 1, 2025

jahorton force-pushed the feat/web/link-new-tokenization-transitioner branch from 04c3e63 to e8583bc Compare October 2, 2025 14:51

jahorton force-pushed the feat/web/evaluate-precomputed-tokenization branch from fad631a to 68c8c0c Compare October 2, 2025 16:36

jahorton force-pushed the feat/web/link-new-tokenization-transitioner branch from 870b785 to a9b6970 Compare October 2, 2025 16:39

jahorton force-pushed the feat/web/evaluate-precomputed-tokenization branch from 68c8c0c to 7ca6f27 Compare October 2, 2025 16:40

jahorton changed the base branch from feat/web/evaluate-precomputed-tokenization to feat/web/interim-legacy-keyer October 2, 2025 16:41

jahorton force-pushed the feat/web/link-new-tokenization-transitioner branch from a9b6970 to c96a604 Compare October 2, 2025 16:42

jahorton changed the title ~~feat(web): utilize new tokenization-transition methodology 🚂~~ refactor(web): utilize new tokenization-transition methodology 🚂 Oct 2, 2025

jahorton added 2 commits October 3, 2025 08:28

change(web): remove old transitionTo method in favor of new tokenizat…

0cf7930

…ion-transition method

jahorton force-pushed the feat/web/link-new-tokenization-transitioner branch from c96a604 to 0cf7930 Compare October 3, 2025 13:32

github-actions bot added the refactor label Oct 3, 2025

jahorton commented Oct 7, 2025

jahorton added 2 commits October 7, 2025 09:20

change(web): clean up computation of preservationTransform

39dd3b5

docs(web): thoughts toward upcoming changes for preservation transforms

c3e5c66

darcywong00 modified the milestones: A19S13, A19S14 Oct 11, 2025

Base automatically changed from feat/web/interim-legacy-keyer to epic/autocorrect October 16, 2025 14:49

jahorton changed the base branch from epic/autocorrect to fix/web/base-context-state-validation October 16, 2025 14:58

Merge branch 'fix/web/base-context-state-validation' into feat/web/li…

d376c77

…nk-new-tokenization-transitioner

Base automatically changed from fix/web/base-context-state-validation to epic/autocorrect October 16, 2025 15:36

jahorton mentioned this pull request Oct 16, 2025

refactor(web): use tokenization analysis to maintain delayed reversion feature 🚂 #14880

Merged

jahorton requested review from ermshiperete and mcdurdin October 16, 2025 16:20

jahorton marked this pull request as ready for review October 16, 2025 16:20

ermshiperete approved these changes Oct 17, 2025

jahorton merged commit 5be0ab7 into epic/autocorrect Oct 20, 2025
7 of 8 checks passed

jahorton deleted the feat/web/link-new-tokenization-transitioner branch October 20, 2025 13:36

github-project-automation bot moved this from Todo to Done in Keyman Oct 20, 2025

4 participants
