llama: add attn temperature tuning for llama arch (non-iswa) #17239

ngxson · 2025-11-13T14:37:43Z

Add support for build_inp_attn_scale() in llm_build_llama
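
For readers unfamiliar with attention temperature tuning, here is a minimal sketch of the per-position scale such an input would typically carry. It assumes the Llama 4-style formula `log(floor((pos + 1) / floor_scale) + 1) * attn_scale + 1`. The helper name `attn_temp_scales` and its parameters are hypothetical and are not taken from this PR or from llama.cpp's API.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of llama.cpp): computes one attention
// temperature scale per token position, assuming the Llama 4-style formula
//   scale(pos) = log(floor((pos + 1) / floor_scale) + 1) * attn_scale + 1
std::vector<float> attn_temp_scales(const std::vector<int32_t> & positions,
                                    float floor_scale,
                                    float attn_scale) {
    std::vector<float> out;
    out.reserve(positions.size());
    for (int32_t pos : positions) {
        // Number of whole floor_scale-sized blocks completed at this position.
        const float steps = std::floor((static_cast<float>(pos) + 1.0f) / floor_scale);
        // The scale stays at exactly 1.0 until the first block is completed,
        // then grows logarithmically with the block count.
        out.push_back(std::log(steps + 1.0f) * attn_scale + 1.0f);
    }
    return out;
}
```

With illustrative values `floor_scale = 8192` and `attn_scale = 0.1` (assumed here, not read from the PR), every position below 8191 gets a scale of exactly 1.0, so short contexts are unaffected and only long contexts see the attention sharpened.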

CISC · 2025-11-17T21:37:19Z

Hmmm, I think this is basically the same as in Grok-2, except there the scale is calculated from temperature_length:
https://github.com/sgl-project/sglang/blob/86d10d220f665092f93a3f6e8a31a65a36a4f376/python/sglang/srt/layers/attention/triton_ops/decode_attention.py#L307-L311

Never implemented it (but the metadata is there as you can see) because I was unsure exactly where it should go.

llama: add attn temperature tuning for llama arch (non-iswa)

4fc6c2c

DajanaV mentioned this pull request Nov 13, 2025

UPSTREAM PR #17239: llama: add attn temperature tuning for llama arch (non-iswa) auroralabs-loci/llama.cpp#193

Open

github-actions bot added the model (Model specific) and python (python script changes) labels Nov 13, 2025

ngxson added 2 commits November 17, 2025 21:45

Merge branch 'master' into xsn/llama4_scaling

0e52ba2

update conversion script

13369dd

make sure to use rope_yarn_log_mul

bf4ef6d
