[Bugfix] Spec decode + structured output + spec model max len edge case #28298
Conversation
Signed-off-by: Andy Lo <andy@mistral.ai>
Code Review
This pull request provides a solid fix for a critical bug involving speculative decoding and structured outputs, where stale speculative tokens could cause an assertion failure. The main change in vllm/v1/core/sched/scheduler.py to explicitly clear request.spec_token_ids after each scheduling step is correct and directly addresses the root cause. The subsequent simplification in update_draft_token_ids is a good cleanup that improves code clarity. Furthermore, the new test case in tests/v1/spec_decode/test_max_len.py is well-constructed to reproduce the specific edge case and prevent future regressions. The changes are of high quality and I see no issues.
njhill
left a comment
Thanks a lot @andylolu2 for this fix! And for updating the test.
Does your test update now exercise the `input_fits_in_drafter == False` case? If so, that's great; I had just noticed yesterday that no test was added when that check was put in.
@njhill I've updated the PR to address the comments, thanks for the review! Yes, the test now asserts that prompt_len is less than draft_max_len and that prompt_len + output_len is greater than draft_max_len, so it triggers the `input_fits_in_drafter == False` path.
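For illustration only, a minimal sketch of the boundary conditions described above (the names below are hypothetical, not the identifiers actually used in tests/v1/spec_decode/test_max_len.py):

```python
# Hypothetical values chosen so that the prompt alone fits in the drafter but
# the prompt plus generated output does not, which is exactly the edge case.
draft_max_len = 128   # max model length of the speculative (draft) model
prompt_len = 100      # prompt fits in the drafter at the start of generation
output_len = 64       # but prompt + output eventually exceeds draft_max_len

assert prompt_len < draft_max_len
assert prompt_len + output_len > draft_max_len
# Once the running length crosses draft_max_len, input_fits_in_drafter becomes
# False for the request, so no new draft tokens are proposed for it.
```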
njhill
left a comment
Thanks again @andylolu2
Purpose
Fixes #27210
Error:
Cause:
The current implementation relies on `scheduler.update_draft_token_ids` being called to reset previously-used `spec_token_ids`. However, `scheduler.update_draft_token_ids` is not guaranteed to be called after every step (currently it is skipped when model execution does not produce any new draft tokens, e.g. when `input_fits_in_drafter` is False). This results in stale `spec_token_ids` that don't necessarily follow the grammar after the output tokens from the current step have been accepted, leading to the assertion error above.

Proposed fix: Clear the spec token ids of a request immediately once it has been scheduled.
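A minimal sketch of this idea (the `Request` stand-in and `schedule_step` helper below are illustrative assumptions, not the actual code in vllm/v1/core/sched/scheduler.py):

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    # Hypothetical stand-in for the scheduler's request object; only the field
    # relevant to this fix is shown.
    spec_token_ids: list[int] = field(default_factory=list)

def schedule_step(scheduled_requests: list[Request]) -> None:
    """Clear speculative tokens as soon as a request has been scheduled."""
    for request in scheduled_requests:
        # ... existing per-request scheduling work would happen here ...
        # Drop draft tokens carried over from the previous step unconditionally,
        # so stale drafts can never be checked against an updated grammar state.
        request.spec_token_ids = []
```

Under this scheme, `update_draft_token_ids` only needs to append fresh draft tokens and no longer has to reset leftovers from earlier steps, which is what allows the simplification mentioned in the review.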
Test Plan
Extended current tests to cover this edge case.
Test Result
No more assertion errors.