Releases: CodeLinaro/llama.cpp
b6277
CUDA: Accelerate MXFP4 table lookup using `__byte_perm` (#15451)
* CUDA: optimize get_int_from_table_16
* CUDA: use v_perm_b32 to replace byte_perm on AMD GPUs
* revise documentation

Co-authored-by: xix <xiapc@outlook.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
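The core trick is that a 16-entry int8 dequantization table fits in four 32-bit registers, so four table lookups can be done per instruction with `__byte_perm` instead of four scalar loads; on AMD GPUs the same idea maps to the `v_perm_b32` instruction. Below is a hedged sketch of the technique under assumed names and packing, not the upstream `get_int_from_table_16` kernel:

```cpp
// Hedged sketch: a 16-entry int8 table packed into four 32-bit words
// (t0..t3 hold table bytes 0..15); each nibble of `idx` is one 4-bit index.
// __byte_perm(x, y, s) picks one result byte per selector nibble of s from
// the 8-byte concatenation y:x, so one perm covers half of the table.
__device__ __forceinline__ unsigned int table16_lookup_x4(
        unsigned int t0, unsigned int t1, unsigned int t2, unsigned int t3,
        unsigned int idx) {
    const unsigned int sel = idx & 0x07070707u;        // low 3 bits of each index
    const unsigned int lo  = __byte_perm(t0, t1, sel); // candidates from entries 0-7
    const unsigned int hi  = __byte_perm(t2, t3, sel); // candidates from entries 8-15
    // Per-byte mask built from bit 3 of each index nibble: 0xFF where idx >= 8.
    const unsigned int mask = ((idx >> 3) & 0x01010101u) * 0xFFu;
    return (lo & ~mask) | (hi & mask);                 // four looked-up bytes
}
```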
b6167
llama : add 18-layer model type for Gemma 3-270m (#15319)
This commit adds support for the 18-layer model type in the Gemma3 series, which is the size of the Gemma3-270m model. The motivation for this commit is that this was the only change required for Gemma3-270m to be converted to GGUF format and used with llama.cpp. Once the model has been converted and uploaded to Hugging Face, it can be used like this:

```console
$ ./build/bin/llama-cli -hf ggml-org/gemma-3-270m-GGUF:Q8_0
```
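The change itself is small. A hedged sketch of its shape, assuming the usual pattern by which llama.cpp derives the model-type label from the layer count (identifier names are illustrative, not the exact diff):

```cpp
// Gemma3 sizes are distinguished by layer count; Gemma3-270m has 18 layers.
switch (hparams.n_layer) {
    case 18: type = LLM_TYPE_270M; break; // new: Gemma3-270m
    case 26: type = LLM_TYPE_1B;   break; // existing Gemma3 sizes
    // ...
}
```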
b6140
HIP: disable sync warp shuffle operators from clr amd_warp_sync_funct…
b6029
embeddings: fix extraction of CLS pooling results (#14927)
* embeddings: fix extraction of CLS pooling results
* merge RANK pooling into CLS case for inputs
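From the caller's side, pooled results are read per sequence. A hedged usage sketch (setup and error handling omitted; assumes a context created with CLS pooling):

```cpp
// With CLS (or, after this fix, RANK) pooling enabled on the context,
// each sequence yields one pooled vector instead of per-token embeddings.
const int n_embd = llama_model_n_embd(model);
const float * emb = llama_get_embeddings_seq(ctx, /*seq_id=*/0);
if (emb != nullptr) {
    // emb points at n_embd floats: the pooled embedding for sequence 0.
    // For RANK pooling the single reranking score is extracted the same way,
    // which is why the two cases can share one code path.
}
```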
b5797
ci : disable fast-math for Metal GHA CI (#14478)
* ci : disable fast-math for Metal GHA CI
  ggml-ci
* cont : remove -g flag
  ggml-ci
b5752
batch : fix check for empty sequences in memory (#14364)
* batch : fix check for empty sequences in memory
  ggml-ci
* cont : reuse the var
  ggml-ci
b5689
cmake: remove shader-gen step-targets from ggml-vulkan (#14226)
* Remove step-targets from vulkan-shaders-gen
* Unset DESTDIR when building vulkan-shaders-gen
b5686
common : suggest --jinja when autodetection fails (#14222)
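`--jinja` switches chat-template handling to the Jinja-based engine; when a model's embedded template cannot be autodetected, passing the flag explicitly is the fallback this change now points users to. A usage sketch (the model path is a placeholder):

```console
$ ./build/bin/llama-cli -m model.gguf --jinja
```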
b5627
llama : support GEGLU for jina-bert-v2 (#14090)
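GEGLU is a gated FFN activation: the up-projection produces two halves [a | b] and the output is GELU(a) elementwise-multiplied by b. A minimal reference sketch of the semantics (not the ggml kernel):

```cpp
#include <cmath>
#include <vector>

// tanh approximation of GELU, as commonly used in transformer FFNs.
static float gelu(float x) {
    return 0.5f * x * (1.0f + std::tanh(0.7978845608f * (x + 0.044715f * x * x * x)));
}

// GEGLU: split the up-projection output into [a | b], return GELU(a) * b.
static std::vector<float> geglu(const std::vector<float> & up) {
    const size_t n = up.size() / 2;
    std::vector<float> out(n);
    for (size_t i = 0; i < n; ++i) {
        out[i] = gelu(up[i]) * up[n + i];
    }
    return out;
}
```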
b5548
CUDA: fix typo in FlashAttention code (#13926)