
Releases: CodeLinaro/llama.cpp

b6277

25 Aug 22:46
74f52f7
CUDA: Accelerate MXFP4 table lookup using `__byte_perm` (#15451)

* CUDA: optimize get_int_from_table_16

* CUDA: use v_perm_b32 to replace byte_perm on AMD GPUs

* revise documentation

---------

Co-authored-by: xix <xiapc@outlook.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

b6167

14 Aug 16:57
7a0de96
llama : add 18-layer model type for Gemma 3-270m (#15319)

This commit adds support for the 18-layer model type in the Gemma3
series, which is the size of the Gemma3-270m model.

The motivation for this commit is that this was the only change required for
Gemma3-270m to be converted to GGUF format and used with llama.cpp.

Once the model has been converted and uploaded to Huggingface it can be
used like this:
```console
$ ./build/bin/llama-cli -hf ggml-org/gemma-3-270m-GGUF:Q8_0
```

b6140

13 Aug 01:38
b049315
HIP: disable sync warp shuffle operators from clr amd_warp_sync_funct…

b6029

30 Jul 07:35
a118d80
embeddings: fix extraction of CLS pooling results (#14927)

* embeddings: fix extraction of CLS pooling results

* merge RANK pooling into CLS case for inputs

b5797

02 Jul 03:54
de56944
ci : disable fast-math for Metal GHA CI (#14478)

* ci : disable fast-math for Metal GHA CI

ggml-ci

* cont : remove -g flag

ggml-ci

b5752

24 Jun 19:03
62af464
batch : fix check for empty sequences in memory (#14364)

* batch : fix check for empty sequences in memory

ggml-ci

* cont : reuse the var

ggml-ci

b5689

18 Jun 05:56
c465030
cmake: remove shader-gen step-targets from ggml-vulkan (#14226)

* Remove step-targets from vulkan-shaders-gen

* Unset DESTDIR when building vulkan-shaders-gen

b5686

16 Jun 21:38
e434e69
common : suggest --jinja when autodetection fails (#14222)

b5627

10 Jun 20:37
3678b83
llama : support GEGLU for jina-bert-v2 (#14090)

b5548

30 May 23:37
e562eec
CUDA: fix typo in FlashAttention code (#13926)