ngxson (Collaborator) commented Nov 13, 2025

Add support for build_inp_attn_scale() in llm_build_llama

CC @ggerganov

CISC (Collaborator) commented Nov 17, 2025

Hmmm, I think this is basically the same as in Grok-2, except there the scale is calculated from temperature_length:
https://github.com/sgl-project/sglang/blob/86d10d220f665092f93a3f6e8a31a65a36a4f376/python/sglang/srt/layers/attention/triton_ops/decode_attention.py#L307-L311

I never implemented it (though the metadata is already there, as you can see) because I was unsure exactly where it should go.
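For readers comparing the two approaches, here is a minimal sketch of what a position-dependent attention scale can look like in each case. It is not taken from this PR or from the linked kernel: the function names, the default values, and both formulas are assumptions chosen for illustration, so check them against `build_inp_attn_scale()` and the sglang source before relying on them.

```python
import math


def position_attn_scale(pos: int, floor_scale: float = 8192.0,
                        attn_scale: float = 0.1) -> float:
    """Hypothetical per-position scale that grows in logarithmic steps.

    Assumption: the scale stays at 1.0 for the first `floor_scale`
    positions and then increases by log steps weighted by `attn_scale`.
    The defaults are illustrative values, not confirmed by this PR.
    """
    return math.log(math.floor((pos + 1) / floor_scale) + 1.0) * attn_scale + 1.0


def temperature_length_scale(pos: int, temperature_len: int) -> float:
    """Hypothetical scale derived from a temperature length.

    Assumption: no scaling up to `temperature_len`, then
    log2(pos) / log2(temperature_len) beyond it.
    """
    if temperature_len <= 1 or pos <= temperature_len:
        return 1.0
    return math.log2(pos) / math.log2(temperature_len)


if __name__ == "__main__":
    for pos in (0, 8191, 65535, 1 << 20):
        print(pos, position_attn_scale(pos), temperature_length_scale(pos, 1024))
```

In both variants the scale is a function of the token position only, so it can be precomputed once per batch and multiplied into the queries, which is one possible answer to the "where should it go" question above.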

Labels

model, python
