
Conversation

@andylolu2 (Contributor) commented Nov 7, 2025

Purpose

Fixes #27210

Error:

  File "/home/ubuntu/vllm/vllm/v1/core/sched/scheduler.py", line 897, in get_grammar_bitmask
    bitmask = self.structured_output_manager.grammar_bitmask(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/vllm/vllm/v1/structured_output/__init__.py", line 263, in grammar_bitmask
    assert structured_output_request.grammar.accept_tokens(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

Cause:

The current implementation relies on scheduler.update_draft_token_ids being called to reset previously used spec_token_ids. However, scheduler.update_draft_token_ids is not guaranteed to run after every step: it is skipped when model execution produces no new draft tokens, e.g. when input_fits_in_drafter is False. The request is then left with stale spec_token_ids that don't necessarily follow the grammar after the output tokens from the current step have been accepted, which triggers the assertion error above.
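
To make the failure concrete, here is a toy reproduction of the sequence of events (a sketch with illustrative names; ToyGrammar and its accept_tokens signature are invented for this example and are not the real vLLM classes):

    # A grammar FSM can only accept tokens that continue from its
    # current state.
    class ToyGrammar:
        def __init__(self) -> None:
            self.position = 0  # number of tokens the FSM has consumed

        def accept_tokens(self, start: int, tokens: list[int]) -> bool:
            if start != self.position:
                return False  # tokens don't extend the current state
            self.position += len(tokens)
            return True

    grammar = ToyGrammar()
    spec_token_ids = [7, 8, 9]  # draft tokens from an earlier step

    # Step N: the bitmask path advances the FSM over the draft tokens.
    assert grammar.accept_tokens(0, spec_token_ids)

    # The drafter produces no new tokens (input_fits_in_drafter is
    # False), so update_draft_token_ids is never called and the stale
    # spec_token_ids stick around.

    # Step N+1: the stale tokens are replayed against an FSM that has
    # already advanced, and the assert fires.
    assert grammar.accept_tokens(0, spec_token_ids)  # AssertionError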

Proposed fix: Clear the spec token ids of a request immediately once it has been scheduled.
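
A minimal sketch of that idea (the Request type and take_spec_tokens helper here are hypothetical stand-ins; the actual change lives in vllm/v1/core/sched/scheduler.py and differs in detail):

    from dataclasses import dataclass, field

    @dataclass
    class Request:
        spec_token_ids: list[int] = field(default_factory=list)

    def take_spec_tokens(request: Request) -> list[int]:
        # Hand this step's speculative tokens to the scheduler output,
        # then clear them on the request so that a later step which
        # produces no new draft tokens cannot replay stale ones against
        # the grammar FSM.
        scheduled = request.spec_token_ids
        request.spec_token_ids = []
        return scheduled

Because the tokens are cleared at the point they are consumed, correctness no longer depends on update_draft_token_ids being called after every step.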

Test Plan

Extended current tests to cover this edge case.

Test Result

No more assertion errors.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@gemini-code-assist (bot) left a comment

Code Review

This pull request provides a solid fix for a critical bug involving speculative decoding and structured outputs, where stale speculative tokens could cause an assertion failure. The main change in vllm/v1/core/sched/scheduler.py to explicitly clear request.spec_token_ids after each scheduling step is correct and directly addresses the root cause. The subsequent simplification in update_draft_token_ids is a good cleanup that improves code clarity. Furthermore, the new test case in tests/v1/spec_decode/test_max_len.py is well-constructed to reproduce the specific edge case and prevent future regressions. The changes are of high quality and I see no issues.

@njhill njhill added the bug Something isn't working label Nov 7, 2025
@njhill (Member) left a comment

Thanks a lot @andylolu2 for this fix! And for updating the test.

Does your test update now exercise the input_fits_in_drafter == False case? If so, that's great; I had just noticed yesterday that there was no test added when that check was put in.

@andylolu2 (Contributor, Author) commented Nov 8, 2025

> Thanks a lot @andylolu2 for this fix! And for updating the test.
>
> Does your test update now exercise the input_fits_in_drafter == False case? If so, that's great; I had just noticed yesterday that there was no test added when that check was put in.

@njhill I've updated the PR to address the comments, thanks for the review!

Yes, the test now asserts that prompt_len is less than draft_max_len and that prompt_len + output_len is greater than draft_max_len, so it triggers the input_fits_in_drafter == False case.
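
In other words, the test picks lengths shaped roughly like this (values are illustrative; the real assertions live in tests/v1/spec_decode/test_max_len.py):

    prompt_len = 80      # the prompt alone fits in the drafter
    output_len = 60
    draft_max_len = 100  # the drafter's maximum input length

    # The prompt fits, but prompt + generated output does not, so
    # partway through generation input_fits_in_drafter flips to False.
    assert prompt_len < draft_max_len
    assert prompt_len + output_len > draft_max_len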

@njhill (Member) left a comment

Thanks again @andylolu2

@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 8, 2025
@njhill njhill enabled auto-merge (squash) November 8, 2025 15:59
@njhill njhill disabled auto-merge November 8, 2025 17:37
@njhill njhill enabled auto-merge (squash) November 8, 2025 17:38
@njhill njhill merged commit 4760413 into vllm-project:main Nov 8, 2025
47 checks passed
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Nov 13, 2025
…se (vllm-project#28298)

Signed-off-by: Andy Lo <andy@mistral.ai>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>

Labels

  • bug (Something isn't working)
  • ready (ONLY add when PR is ready to merge/full CI is needed)
  • speculative-decoding
  • structured-output
  • v1

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Bug]: EAGLE Spec Decoding + Structured Outputs causes FSM Crash

2 participants