[Hybrid] [Kernel] Fix chunk scan kernel when BLOCK_SIZE_DSTATE > 128 #28295

tdoublep · 2025-11-07T12:47:32Z

Purpose

Falcon-H1 models currently fail on main. #24683 introduced a bug in the chunk scan kernel that affects models using BLOCK_SIZE_DSTATE > 128.

The error looks like:

(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845] triton.compiler.errors.CompilationError: at 170:12:
(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845]             prev_states_ptr
(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845]             + offs_n[None, :] * prev_states_hdim
(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845]             + offs_k_dstate[:, None] * prev_states_dstate
(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845]         )
(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845]         for k in range(0, dstate, BLOCK_SIZE_K):
(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845]             C = tl.load(
(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845]                 C_ptrs,
(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845]                 mask=(offs_m[:, None] < chunk_size_limit)
(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845]                 & (offs_k_dstate[None, :] < dstate - k),
(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845]                 other=0.0,
(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845]             )
(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845]             if not HAS_INITSTATES and (seq_idx != seq_idx_prev):
(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845]             ^
(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845] AssertionError("Mismatched type for prev_states between then block (<['256', '64'], bf16>) and else block (<['64', '256'], bf16>)")
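
To make the failure easier to read, here is a minimal pure-Python sketch of the rule the compiler is enforcing: a variable assigned in both branches of an `if`/`else` must end up with the same shape in each. The helper names (`broadcast_shape`, `check_branches`) and the tile sizes `BLOCK_K = 64`, `BLOCK_N = 256` are hypothetical, chosen only to reproduce the shapes printed in the error. It also assumes the else branch is the load from the pointer tile shown in the log. This is not the kernel code itself.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def broadcast_shape(a: Shape, b: Shape) -> Shape:
    """Shape of an elementwise sum of two tiles under NumPy-style broadcasting."""
    if len(a) != len(b):
        raise ValueError(f"rank mismatch: {a} vs {b}")
    out = []
    for x, y in zip(a, b):
        if x != y and 1 not in (x, y):
            raise ValueError(f"cannot broadcast {a} with {b}")
        out.append(max(x, y))
    return tuple(out)


def check_branches(name: str, then_shape: Shape, else_shape: Shape) -> Shape:
    """Mimic the compiler check: both branches must assign the same shape."""
    if then_shape != else_shape:
        raise TypeError(
            f"Mismatched type for {name} between then block {then_shape} "
            f"and else block {else_shape}"
        )
    return then_shape


# Hypothetical tile sizes (assumption), picked to match the shapes in the log.
BLOCK_K, BLOCK_N = 64, 256

# offs_n[None, :] has shape (1, BLOCK_N); offs_k_dstate[:, None] has shape (BLOCK_K, 1).
# Their sum, and hence a load through those pointers, has shape (BLOCK_K, BLOCK_N).
loaded_shape = broadcast_shape((1, BLOCK_N), (BLOCK_K, 1))

# A zero tile with the two dimensions swapped does not match the loaded tile.
try:
    check_branches("prev_states", (BLOCK_N, BLOCK_K), loaded_shape)
except TypeError as err:
    print(err)

# With the dimensions in the same order as the pointer tile, the check passes.
fixed_shape = check_branches("prev_states", (BLOCK_K, BLOCK_N), loaded_shape)
```

Under these assumptions the zero-initialised tile has to use the same dimension order as the pointer arithmetic, which is consistent with the shapes reported in the assertion.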

Test Plan

vllm serve tiiuae/Falcon-H1-7B-Base

lm_eval --model local-completions --tasks gsm8k --num_fewshot 5 --batch_size auto \
    --model_args model=tiiuae/Falcon-H1-7B-Base,base_url=http://0.0.0.0:8000/v1/completions,num_concurrent=50,max_retries=3,tokenized_requests=False

Test Result

Using v0.10.2 (before the bug was introduced):

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.7278|±  |0.0123|
|     |       |strict-match    |     5|exact_match|↑  |0.7233|±  |0.0123|

After this PR:

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.7149|±  |0.0124|
|     |       |strict-match    |     5|exact_match|↑  |0.7104|±  |0.0125|

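The results agree to within statistical noise: the scores differ by about 0.013 on both filters, which is roughly one standard error of a single run (about 0.012).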
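
As a rough check of that comparison, the sketch below recomputes the gap between the two runs and sets it against the combined standard error. The numbers are copied from the two tables above. Treating the two runs as independent and combining their standard errors in quadrature is an assumption of this sketch, not something the evaluation harness reports.

```python
import math

# (value, stderr) pairs copied from the gsm8k tables above.
RESULTS = {
    "flexible-extract": {"before": (0.7278, 0.0123), "after": (0.7149, 0.0124)},
    "strict-match": {"before": (0.7233, 0.0123), "after": (0.7104, 0.0125)},
}


def compare(before, after):
    """Return (difference, combined stderr, ratio) for two (value, stderr) pairs."""
    diff = before[0] - after[0]
    # Assumption: the runs are independent, so the errors add in quadrature.
    combined = math.sqrt(before[1] ** 2 + after[1] ** 2)
    return diff, combined, diff / combined


summary = {name: compare(r["before"], r["after"]) for name, r in RESULTS.items()}
for name, (diff, combined, ratio) in summary.items():
    print(f"{name}: diff={diff:.4f} combined_stderr={combined:.4f} ratio={ratio:.2f}")
```

A ratio below 1 means the gap is smaller than one combined standard error, so a difference of this size is not evidence of an accuracy regression.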


Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>

gemini-code-assist

Code Review

This pull request addresses a critical bug in the chunk scan kernel that caused compilation errors for models with BLOCK_SIZE_DSTATE > 128. The issue was a shape mismatch for the prev_states tensor between two conditional branches within the kernel, leading to a Triton compiler AssertionError. The fix correctly aligns the tensor shapes by changing the dimensions passed to tl.zeros, which resolves the error. The change is precise and effectively solves the described problem. The pull request is well-documented with a clear explanation of the bug and validation through test results. The fix is correct and looks good to merge.

…llm-project#28295) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Signed-off-by: George D. Torres <gdavtor@gmail.com>

…llm-project#28295) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Signed-off-by: Bram Wasti <bwasti@meta.com>

Fix chunk scan kernel when BLOCK_SIZE_DSTATE > 128

ca00b03

Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>

tdoublep requested a review from heheda12345 November 7, 2025 12:47

gemini-code-assist bot reviewed Nov 7, 2025

heheda12345 approved these changes Nov 11, 2025

heheda12345 enabled auto-merge (squash) November 11, 2025 04:59

github-actions bot added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Nov 11, 2025

Merge branch 'main' into fix-chunk-scan-falcon-h1

437ccce

heheda12345 merged commit e0c910b into vllm-project:main Nov 14, 2025
45 checks passed

bwasti pushed a commit to bwasti/vllm that referenced this pull request Nov 17, 2025

[Hybrid] [Kernel] Fix chunk scan kernel when BLOCK_SIZE_DSTATE > 128 (v…

686923d

…llm-project#28295) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Signed-off-by: Bram Wasti <bwasti@meta.com>
