
Conversation

@nrfulton (Contributor) commented Aug 29, 2025

DRAFT

Adds HuggingFace._generate_from_context_with_kv_cache, re-introducing some of the stuff from our early hacking in May/June. (Consider this a sign we're finally getting back to the fun stuff again.)

Stuff that still needs to be done:

  • lmcache integration (currently using a dict, which has some... pitfalls... to put it mildly)
  • Component.parts() needs an implementation and walks, similar to the old span default_formatter_walk.
  • Heuristics for excluding improperly marked blocks/parts
  • Think about how this integrates with alora code
  • Perhaps refactor at least the HF backend code
  • Add use_kv_caching flag to generate_from_context.
  • Add benchmarking (quality impacts and time savings)
  • Tests, examples, tutorials

Also still to be worked out: how this fits into a model that uses apply_chat_template or any
other parser/renderer. Note that there's still a bug arising from the
chance that other substrings also "hit" on the cached
contents. We don't anticipate this happening often in practice because of
how KV cache smashing should typically be used, but it's something we
need to address by introducing the use of sentinel values, or indexing
string machines, or something else along those lines. A minimal sketch of
the pitfall follows.
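
A minimal sketch of the pitfall, assuming a plain string-keyed dict cache; every name here (kv_cache, find_hit, mark) is illustrative rather than the actual implementation:

```python
# Illustrative only: a plain string-keyed dict cache and the naive
# substring lookup that triggers the bug described above.
kv_cache: dict[str, object] = {"The capital of France": "<kv states>"}

def find_hit(prompt: str) -> str | None:
    # Any cached string that merely *appears* in the prompt "hits",
    # even at a position other than the one its KV states were
    # computed for.
    for block in kv_cache:
        if block in prompt:
            return block
    return None

# Accidental hit: the cached text occurs mid-prompt, where reusing its
# KV states would be wrong.
assert find_hit("Q: The capital of France is?") is not None

# One mitigation floated above: wrap deliberately cached blocks in
# sentinel characters so only marked spans can ever match.
OPEN, CLOSE = "\u241e", "\u241f"  # record/unit separators, rare in text

def mark(text: str) -> str:
    return f"{OPEN}{text}{CLOSE}"
```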

Committed with --no-verify because the point of this code is documentation.
@nrfulton nrfulton marked this pull request as draft August 29, 2025 18:25
@guicho271828 (Contributor) commented:

I was wondering how it was implemented. The way I envisioned it is a token-wise trie represented as a flat dictionary that stores the legacy KV cache format token-by-token, splitting the third dimension of [batch_size, num_heads, seq_len, head_dim] into per-token [1, num_heads, 1, head_dim] tensors.

This

  • avoids variable-length stitching between partial KV cache entries, which might be error-prone. Instead, every traversal is token-by-token.
  • separates cache retrieval into a simple, data-structure-oriented class, clearly named KVCacheTrie.
  • is C++/Rust-reimplementation friendly due to the simpler data structure.
  • has a single retrieval call right before submitting a string query to the LLM, which allows transparent access to the cache. This is different from the way it appears to be done here, which requires a big _generate_from_context_with_kv_cache method.

Thoughts?
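
For concreteness, here is a minimal sketch of that idea; KVCacheTrie and its method names are hypothetical, it assumes the legacy [batch_size, num_heads, seq_len, head_dim] tensor format, and it tracks a single layer for brevity (a real cache would hold one (k, v) pair per layer per node):

```python
import torch

class KVCacheTrie:
    """Token-wise trie over per-token KV slices, stored in flat dicts."""

    def __init__(self) -> None:
        self._children: dict[tuple[int, int], int] = {}  # (node, token_id) -> child node
        self._kv: dict[int, tuple[torch.Tensor, torch.Tensor]] = {}  # node -> (k, v)
        self._next = 1  # node 0 is the root

    def insert(self, tokens: list[int], keys: torch.Tensor, values: torch.Tensor) -> None:
        # keys/values: [1, num_heads, seq_len, head_dim] with seq_len == len(tokens)
        node = 0
        for i, tok in enumerate(tokens):
            child = self._children.get((node, tok))
            if child is None:
                child = self._next
                self._next += 1
                self._children[(node, tok)] = child
                # One [1, num_heads, 1, head_dim] slice per token.
                self._kv[child] = (keys[:, :, i:i + 1, :], values[:, :, i:i + 1, :])
            node = child

    def longest_prefix(self, tokens: list[int]):
        """Return (n_matched, keys, values) for the longest cached prefix."""
        node, slices = 0, []
        for tok in tokens:
            child = self._children.get((node, tok))
            if child is None:
                break
            slices.append(self._kv[child])
            node = child
        if not slices:
            return 0, None, None
        ks = torch.cat([k for k, _ in slices], dim=2)
        vs = torch.cat([v for _, v in slices], dim=2)
        return len(slices), ks, vs
```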

@guicho271828 (Contributor) commented:

Also:

  • Allows sharing the cache when two components share a prefix.
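
A usage sketch continuing the hypothetical KVCacheTrie above, showing how prefix sharing falls out of the structure:

```python
import torch

num_heads, head_dim = 8, 64
k = torch.randn(1, num_heads, 4, head_dim)
v = torch.randn(1, num_heads, 4, head_dim)

trie = KVCacheTrie()
trie.insert([1, 2, 3, 4], k, v)                # component A's tokens
n, ks, vs = trie.longest_prefix([1, 2, 3, 9])  # component B shares [1, 2, 3]
assert n == 3 and ks.shape == (1, num_heads, 3, head_dim)
```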

@mergify (bot) commented Sep 3, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

@nrfulton nrfulton changed the title KV Blocks feat: KV Blocks Sep 4, 2025
@nrfulton (Contributor, Author) commented Sep 4, 2025

I was wondering how it was implemented. The way I envisioned it is a token-wise trie represented as a flat dictionary that stores the legacy KV cache format token-by-token...

Thoughts?

Before, or in addition to, "rolling our own", we should first try to leverage existing tooling in the KV cache management ecosystem (lmcache seems closest to the sort of thing we need).

Review thread on this diff hunk:

```python
value: str | None,
meta: dict[str, Any] | None = None,
*,
cache: bool = False,
```
@nrfulton (Contributor, Author) commented:

per @HendrikStrobelt, change the name to use_cache or something like that.
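
A sketch of the suggested rename; the enclosing function isn't visible in this hunk, so its name and return annotation are placeholders:

```python
from typing import Any

def example_constructor(  # hypothetical name; not visible in the hunk
    value: str | None,
    meta: dict[str, Any] | None = None,
    *,
    use_cache: bool = False,  # renamed from `cache` per the suggestion
) -> None:
    ...
```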
