Commit 57751e7

[GPU] Fixed wrong idx for applying attn_sink in pa_sdpa_opt (#32731)
### Details:
- The sink is applied to each query index (q_idx) instead of head_idx.
- No practical configuration currently activates this code path, but one may in the future, e.g., iGPU + sink + PA.

### Tickets:
- *ticket-id*
1 parent 662d21e commit 57751e7

1 file changed, +1 -1 lines changed

src/plugins/intel_gpu/src/graph/impls/ocl_v2/paged_attention_opt.cl

Lines changed: 1 addition & 1 deletion
@@ -399,7 +399,7 @@ KERNEL(pa_sdpa_opt)(
 GET_VECTOR_ELEMENT(exp_sum, q_idx) = sub_group_reduce_add(GET_VECTOR_ELEMENT(exp_sum, q_idx));
 #ifdef HAS_SINK_INPUT
 const uint head_idx = get_global_id(1);
-GET_VECTOR_ELEMENT(exp_sum, head_idx) += (native_exp(TO_SOFTMAX_ACCUMULATOR_TYPE(sink_ptr[head_idx] - GET_VECTOR_ELEMENT(qk_max, q_idx))));
+GET_VECTOR_ELEMENT(exp_sum, q_idx) += (native_exp(TO_SOFTMAX_ACCUMULATOR_TYPE(sink_ptr[head_idx] - GET_VECTOR_ELEMENT(qk_max, q_idx))));
 #endif
 }
