Conversation

@vshampor (Contributor) commented Oct 16, 2025

The original PR adding the XAttention reference code was not completely aligned with the original code. The following were fixed:

  • global block scoring was replaced with K-row local scoring, as in the original code
  • the antidiagonal stride direction was reversed to conform to the original code and algorithm
  • causal masking was introduced so that non-causal blocks are effectively never selected and the remaining blocks are scored properly
  • block selection now correctly always keeps the "diagonal" blocks and the entire column of oldest blocks at k_block_dim == 0
  • block selection now correctly calculates the required attention mass, which should amount to a threshold portion of the total K-row block sum and always include the diagonal/first-in-row blocks

Added more E2E tests that compare against results from the original code, which are fixed in the test code as references. Note that the original non-Triton code contained bugs, which had to be fixed in order to obtain correct references (see vshampor/x-attention@fdc5c34).
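The selection rule described in the last two bullets can be sketched as follows. This is a hypothetical, simplified illustration, not the actual OpenVINO implementation: the function name, the score layout (`scores[q_block][k_block]` for a single head), and the greedy fill-up order are assumptions. It shows how the first-in-row and diagonal blocks are always kept, seed the cumulative sum, and how further causal blocks are added until the `threshold` portion of the row's causal block mass is covered:

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical sketch of the per-row block selection described above.
// For each query-block row, the first-in-row (k == 0) and "diagonal"
// blocks are always kept and their scores seed the cumulative sum; the
// remaining causal blocks are then added greedily, highest score first,
// until the kept blocks reach `threshold` of the row's causal mass.
std::set<std::pair<std::size_t, std::size_t>> select_blocks(
    const std::vector<std::vector<double>>& scores,  // [q_blocks][k_blocks]
    double threshold) {
    std::set<std::pair<std::size_t, std::size_t>> selected;
    const std::size_t num_q = scores.size();
    const std::size_t num_k = scores.empty() ? 0 : scores[0].size();
    for (std::size_t q = 0; q < num_q; ++q) {
        // Causal limit: the "diagonal" k index for this query-block row.
        const std::size_t diag = num_k - num_q + q;
        double total = 0.0;
        for (std::size_t k = 0; k <= diag; ++k) {
            total += scores[q][k];  // causal part of the row only
        }
        const double required = threshold * total;
        double cumsum = 0.0;
        std::vector<std::pair<double, std::size_t>> rest;
        for (std::size_t k = 0; k <= diag; ++k) {
            if (k == 0 || k == diag) {
                selected.insert({q, k});  // always keep these blocks
                cumsum += scores[q][k];
            } else {
                rest.push_back({scores[q][k], k});
            }
        }
        std::sort(rest.rbegin(), rest.rend());  // descending by score
        for (const auto& [score, k] : rest) {
            if (cumsum >= required) {
                break;  // threshold portion of causal mass reached
            }
            selected.insert({q, k});
            cumsum += score;
        }
    }
    return selected;
}
```

For example, with uniform scores and `threshold == 0.5`, only the mandatory first-in-row and diagonal blocks end up selected, since they alone already cover half the causal mass in each row.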

@vshampor vshampor requested a review from a team as a code owner October 16, 2025 16:07
@github-actions github-actions bot added the category: Core OpenVINO Core (aka ngraph) label Oct 16, 2025
@vshampor vshampor force-pushed the fix_xattn_reference branch from 6373407 to 6a73e15 Compare October 16, 2025 20:36
@peterchen-intel (Contributor) commented:
Link related PRs: #32399, #32064, #32086

@vshampor vshampor force-pushed the fix_xattn_reference branch from 1b1ebc5 to 586f3f2 Compare October 17, 2025 12:16
Comment on lines +276 to +284
if ((k_block_idx ==
(blocked_attention_scores_shape[2] - blocked_attention_scores_shape[1] + q_block_idx)) ||
k_block_idx == 0) {
// We preserve first-in-row and diagonal blocks always, and include their score in the
// cumulative sum. The target for the rest of the blocks in row is to fill up the
// rest of the attention mass fraction so that with the diagonal and first blocks they
// comprise the `threshold` portion of the entire causal attention mass in this row
retval[head_idx].insert({q_block_idx, k_block_idx});
cumsum += current_score;
@luo-cheng2021 can you confirm that the GPU uses the same logic?

cc @peterchen-intel

@WeldonWangwang (Contributor) commented Oct 19, 2025:
Confirmed, this keeps the same logic as the GPU kernel, thanks.

@vshampor vshampor added this pull request to the merge queue Oct 18, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 18, 2025
@peterchen-intel peterchen-intel added this pull request to the merge queue Oct 19, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 19, 2025
@vshampor vshampor added this pull request to the merge queue Oct 19, 2025
Merged via the queue into openvinotoolkit:master with commit 56d62ff Oct 19, 2025
207 checks passed
@vshampor vshampor deleted the fix_xattn_reference branch October 19, 2025 14:42