@tdoublep tdoublep commented Nov 7, 2025

Purpose

Falcon-H1 models currently fail on main. #24683 introduced a bug in the chunk scan kernel that affects models that use BLOCK_SIZE_DSTATE > 128.

The error looks like:

(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845] triton.compiler.errors.CompilationError: at 170:12:
(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845]             prev_states_ptr
(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845]             + offs_n[None, :] * prev_states_hdim
(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845]             + offs_k_dstate[:, None] * prev_states_dstate
(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845]         )
(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845]         for k in range(0, dstate, BLOCK_SIZE_K):
(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845]             C = tl.load(
(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845]                 C_ptrs,
(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845]                 mask=(offs_m[:, None] < chunk_size_limit)
(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845]                 & (offs_k_dstate[None, :] < dstate - k),
(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845]                 other=0.0,
(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845]             )
(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845]             if not HAS_INITSTATES and (seq_idx != seq_idx_prev):
(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845]             ^
(EngineCore_DP0 pid=2551783) ERROR 11-07 07:47:05 [core.py:845] AssertionError("Mismatched type for prev_states between then block (<['256', '64'], bf16>) and else block (<['64', '256'], bf16>)")
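
The assertion above says that a value assigned in both branches of an `if` inside the kernel must have the same shape and dtype in each branch. The following is a minimal pure-Python sketch of that rule, not Triton's actual implementation: `BlockType` and `merge_branch_types` are hypothetical names invented for illustration, and only the two shapes and the dtype are taken from the error message.

```python
from typing import NamedTuple, Tuple


class BlockType(NamedTuple):
    """Hypothetical stand-in for a block type: a shape plus a dtype name."""
    shape: Tuple[int, ...]
    dtype: str


def merge_branch_types(name: str, then_type: BlockType, else_type: BlockType) -> BlockType:
    """Return the common type of a variable assigned in both branches.

    Raises AssertionError when the branches disagree, mimicking the kind of
    check reported in the log (this is an illustration, not Triton code).
    """
    if then_type != else_type:
        raise AssertionError(
            f"Mismatched type for {name} between then block ({then_type}) "
            f"and else block ({else_type})"
        )
    return then_type


# Shapes and dtype copied from the error message.
then_type = BlockType((256, 64), "bf16")
else_type = BlockType((64, 256), "bf16")

try:
    merge_branch_types("prev_states", then_type, else_type)
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True

# Once both branches produce the same shape, the merge succeeds.
fixed = merge_branch_types("prev_states", BlockType((64, 256), "bf16"), else_type)
```

The two shapes are transposes of each other, so the element count is identical and only the dimension order differs between the branches.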

Test Plan

vllm serve tiiuae/Falcon-H1-7B-Base
lm_eval --model local-completions --tasks gsm8k --num_fewshot 5 --batch_size auto \
    --model_args model=tiiuae/Falcon-H1-7B-Base,base_url=http://0.0.0.0:8000/v1/completions,num_concurrent=50,max_retries=3,tokenized_requests=False

Test Result

Using v0.10.2 (before the bug was introduced):

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.7278|±  |0.0123|
|     |       |strict-match    |     5|exact_match|↑  |0.7233|±  |0.0123|

After this PR:

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.7149|±  |0.0124|
|     |       |strict-match    |     5|exact_match|↑  |0.7104|±  |0.0125|

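The results match within statistical noise: both metrics differ by 0.0129, which is smaller than the combined standard error of the two runs (about 0.017).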
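
The comparison between the two tables can be checked with a few lines of Python. This is a sketch that assumes the two evaluation runs are independent, so their standard errors combine in quadrature; `difference_in_sigmas` is a hypothetical helper name, and the numbers are copied from the tables above.

```python
import math


def difference_in_sigmas(a: float, se_a: float, b: float, se_b: float) -> float:
    """Absolute difference between two scores, in units of the combined
    standard error (assumes the two measurements are independent)."""
    combined_se = math.hypot(se_a, se_b)  # sqrt(se_a**2 + se_b**2)
    return abs(a - b) / combined_se


# (value, stderr) for v0.10.2 versus this PR, from the gsm8k tables.
flexible = difference_in_sigmas(0.7278, 0.0123, 0.7149, 0.0124)
strict = difference_in_sigmas(0.7233, 0.0123, 0.7104, 0.0125)

print(f"flexible-extract: {flexible:.2f} combined standard errors apart")
print(f"strict-match:     {strict:.2f} combined standard errors apart")
```

Both differences come out below one combined standard error, which is consistent with reading the change as noise rather than a regression.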


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
@tdoublep tdoublep requested a review from heheda12345 November 7, 2025 12:47
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses a critical bug in the chunk scan kernel that caused compilation errors for models with BLOCK_SIZE_DSTATE > 128. The issue was a shape mismatch for the prev_states tensor between two conditional branches within the kernel, leading to a Triton compiler AssertionError. The fix correctly aligns the tensor shapes by changing the dimensions passed to tl.zeros, which resolves the error. The change is precise and effectively solves the described problem. The pull request is well-documented with a clear explanation of the bug and validation through test results. The fix is correct and looks good to merge.

@heheda12345 heheda12345 enabled auto-merge (squash) November 11, 2025 04:59
@github-actions github-actions bot added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Nov 11, 2025
@heheda12345 heheda12345 merged commit e0c910b into vllm-project:main Nov 14, 2025
45 checks passed
geodavic pushed a commit to geodavic/vllm that referenced this pull request Nov 16, 2025
…llm-project#28295)

Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: George D. Torres <gdavtor@gmail.com>
bwasti pushed a commit to bwasti/vllm that referenced this pull request Nov 17, 2025
…llm-project#28295)

Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: Bram Wasti <bwasti@meta.com>