@jahorton
Contributor

@jahorton jahorton commented Oct 1, 2025

We're now at a point where we can reasonably integrate all the new parts designed to help with token-merge and token-split scenarios. By swapping our tokenization strategy to the one introduced over the last several PRs, we gain the ability to handle such cases under the assumption that there is only one true tokenization. (Handling for cases where we drop that assumption will come later - see #14970 for the start of multi-tokenization support.)
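
For readers new to the terms, here is a rough sketch of what a token-merge and a token-split look like from the outside. Everything in it is an assumption for illustration: the `tokenize` regex and the `classifyTransition` heuristic are hypothetical stand-ins, not the wordbreaker or the transition logic from these PRs.

```typescript
// Hypothetical illustration only; not the engine's wordbreaker or transition code.
type TransitionKind = 'merge' | 'split' | 'other';

// Naive tokenizer assumed for this sketch: a word may contain internal
// apostrophes ("can't"), while whitespace runs and any other single
// character become their own tokens.
function tokenize(context: string): string[] {
  return context.match(/[A-Za-z]+(?:'[A-Za-z]+)*|\s+|[^A-Za-z\s]/g) ?? [];
}

// Heuristic: if the text did not shrink but the token count dropped, earlier
// tokens were merged; if the text did not grow but the token count rose, a
// token was split. Anything else is an ordinary edit.
function classifyTransition(before: string, after: string): TransitionKind {
  const beforeCount = tokenize(before).length;
  const afterCount = tokenize(after).length;

  if (afterCount < beforeCount && after.length >= before.length) {
    return 'merge';
  }
  if (afterCount > beforeCount && after.length <= before.length) {
    return 'split';
  }
  return 'other';
}

// Typing "t" after "can'" joins the tokens "can" and "'" into "can't".
const mergeExample = classifyTransition("can'", "can't");
// Deleting that "t" again breaks "can't" back into "can" and "'".
const splitExample = classifyTransition("can't", "can'");
```

Under this toy tokenizer, simply starting a new word (`"can"` to `"can "`) is classified as `'other'`, since it adds a token without re-dividing existing text.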

Note that we still need to keep the previous tokenization method's alignment style around for now - it's referenced by quite a number of unit tests for suggestions and reversions, which are currently written to rely on it. Updating the related methods will be its own unit of work... but fortunately, we can "frankenstein" the two parts together until those changes are made.

That said, all unit tests are passing with the new tokenization internals in place - I consider this a good sign! Just gotta update suggestions and reversions to better leverage the data too.

The user tests will come in #14880, two PRs after this one.

Build-bot: skip release:web
Test-bot: skip

@keymanapp-test-bot

keymanapp-test-bot bot commented Oct 1, 2025

User Test Results

Test specification and instructions

User tests are not required

Test Artifacts

@keymanapp-test-bot keymanapp-test-bot bot changed the title from "feat(web): utilize new tokenization-transition methodology" to "feat(web): utilize new tokenization-transition methodology 🚂" Oct 1, 2025
@keymanapp-test-bot keymanapp-test-bot bot added this to the A19S13 milestone Oct 1, 2025
@jahorton jahorton force-pushed the feat/web/link-new-tokenization-transitioner branch from 04c3e63 to e8583bc October 2, 2025 14:51
@jahorton jahorton force-pushed the feat/web/evaluate-precomputed-tokenization branch from fad631a to 68c8c0c October 2, 2025 16:36
@jahorton jahorton force-pushed the feat/web/link-new-tokenization-transitioner branch from 870b785 to a9b6970 October 2, 2025 16:39
@jahorton jahorton force-pushed the feat/web/evaluate-precomputed-tokenization branch from 68c8c0c to 7ca6f27 October 2, 2025 16:40
@jahorton jahorton changed the base branch from feat/web/evaluate-precomputed-tokenization to feat/web/interim-legacy-keyer October 2, 2025 16:41
@jahorton jahorton force-pushed the feat/web/link-new-tokenization-transitioner branch from a9b6970 to c96a604 October 2, 2025 16:42
@jahorton jahorton changed the title from "feat(web): utilize new tokenization-transition methodology 🚂" to "refactor(web): utilize new tokenization-transition methodology 🚂" Oct 2, 2025
Note that we still need to keep the previous tokenization method's alignment style around for now - it's referenced by quite a number of unit tests for suggestions and reversions, which are currently written to leverage that.  Updating the related methods will be its own unit of work.
@jahorton jahorton force-pushed the feat/web/link-new-tokenization-transitioner branch from c96a604 to 0cf7930 October 3, 2025 13:32
Comment on lines 278 to 288
// We will actually want to build `preservationTransform`s based on the path
// leading to each correction/suggestion. But, until now, we've just built
// it based upon the actual input transform - so we'll maintain that behavior
// (temporarily) as a transitional state.

const bestResultAnalysis = tokenizationAnalysis;
// inputTransform is the ideal transform we found.

// If tokens were inserted, emit an empty transform; this prevents
// suggestions from replacing the "current" token.
if(bestResultAnalysis.inputs[0].sample.has(1)) {

Self note: this section could use a rework and reword.
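
As a starting point for that rework, the behavior described in the comments above might be sketched roughly as follows. This is an assumption-laden illustration, not the code under review: the `Transform` shape is simplified, and `buildPreservationTransform` and its `tokensInserted` flag are hypothetical names standing in for the `bestResultAnalysis.inputs[0].sample.has(1)` check.

```typescript
// Simplified transform shape assumed for this sketch: text to insert plus
// the number of characters to delete to the left of the caret.
interface Transform {
  insert: string;
  deleteLeft: number;
}

// Hypothetical helper. `tokensInserted` stands in for whatever check the real
// code performs to detect that the input added a new token.
function buildPreservationTransform(
  inputTransform: Transform,
  tokensInserted: boolean
): Transform {
  if (tokensInserted) {
    // Emit an empty transform so that suggestions do not replace the
    // "current" token.
    return { insert: '', deleteLeft: 0 };
  }
  // Transitional behavior: base the result on the actual input transform.
  return { ...inputTransform };
}

const typed: Transform = { insert: 't', deleteLeft: 0 };
const whenTokenInserted = buildPreservationTransform(typed, true);
const whenNoTokenInserted = buildPreservationTransform(typed, false);
```

Returning a copy rather than the original object keeps later edits to the result from mutating the input transform.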

@darcywong00 darcywong00 modified the milestones: A19S13, A19S14 Oct 11, 2025
Base automatically changed from feat/web/interim-legacy-keyer to epic/autocorrect October 16, 2025 14:49
@jahorton jahorton changed the base branch from epic/autocorrect to fix/web/base-context-state-validation October 16, 2025 14:58
Base automatically changed from fix/web/base-context-state-validation to epic/autocorrect October 16, 2025 15:36
@jahorton jahorton marked this pull request as ready for review October 16, 2025 16:20
@jahorton jahorton merged commit 5be0ab7 into epic/autocorrect Oct 20, 2025
7 of 8 checks passed
@jahorton jahorton deleted the feat/web/link-new-tokenization-transitioner branch October 20, 2025 13:36
@github-project-automation github-project-automation bot moved this from Todo to Done in Keyman Oct 20, 2025
