Releases: CodeLinaro/llama.cpp
b6277
CUDA: Accelerate MXFP4 table lookup using `__byte_perm` (#15451)
* CUDA: optimize get_int_from_table_16
* CUDA: use v_perm_b32 to replace byte_perm on AMD GPUs
* revise documentation

Co-authored-by: xix <xiapc@outlook.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
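The core trick is that a 16-entry int8 dequantization table fits in four 32-bit registers, so four table lookups can be done per instruction with `__byte_perm` instead of four scalar loads; on AMD GPUs the same idea maps to the `v_perm_b32` instruction. Below is a hedged sketch of the technique under assumed names and packing, not the upstream `get_int_from_table_16` kernel:

```cpp
// Hedged sketch: a 16-entry int8 table packed into four 32-bit words
// (t0..t3 hold table bytes 0..15); each nibble of `idx` is one 4-bit index.
// __byte_perm(x, y, s) picks one result byte per selector nibble of s from
// the 8-byte concatenation y:x, so one perm covers half of the table.
__device__ __forceinline__ unsigned int table16_lookup_x4(
        unsigned int t0, unsigned int t1, unsigned int t2, unsigned int t3,
        unsigned int idx) {
    const unsigned int sel = idx & 0x07070707u;        // low 3 bits of each index
    const unsigned int lo  = __byte_perm(t0, t1, sel); // candidates from entries 0-7
    const unsigned int hi  = __byte_perm(t2, t3, sel); // candidates from entries 8-15
    // Per-byte mask built from bit 3 of each index nibble: 0xFF where idx >= 8.
    const unsigned int mask = ((idx >> 3) & 0x01010101u) * 0xFFu;
    return (lo & ~mask) | (hi & mask);                 // four looked-up bytes
}
```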
b6167
llama : add 18-layer model type for Gemma 3-270m (#15319)
This commit adds support for the 18-layer model type in the Gemma3 series, which is the size of the Gemma3-270m model. The motivation for this commit is that this was the only change required for Gemma3-270m to be converted to GGUF format and used with llama.cpp. Once the model has been converted and uploaded to Hugging Face, it can be used like this:

```console
$ ./build/bin/llama-cli -hf ggml-org/gemma-3-270m-GGUF:Q8_0
```
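The change itself is small. A hedged sketch of its shape, assuming the usual pattern by which llama.cpp derives the model-type label from the layer count (identifier names are illustrative, not the exact diff):

```cpp
// Gemma3 sizes are distinguished by layer count; Gemma3-270m has 18 layers.
switch (hparams.n_layer) {
    case 18: type = LLM_TYPE_270M; break; // new: Gemma3-270m
    case 26: type = LLM_TYPE_1B;   break; // existing Gemma3 sizes
    // ...
}
```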
b6140
HIP: disable sync warp shuffle operators from clr amd_warp_sync_funct…
b6029
embeddings: fix extraction of CLS pooling results (#14927)
* embeddings: fix extraction of CLS pooling results
* merge RANK pooling into CLS case for inputs
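From the caller's side, pooled results are read per sequence. A hedged usage sketch (setup and error handling omitted; assumes a context created with CLS pooling):

```cpp
// With CLS (or, after this fix, RANK) pooling enabled on the context,
// each sequence yields one pooled vector instead of per-token embeddings.
const int n_embd = llama_model_n_embd(model);
const float * emb = llama_get_embeddings_seq(ctx, /*seq_id=*/0);
if (emb != nullptr) {
    // emb points at n_embd floats: the pooled embedding for sequence 0.
    // For RANK pooling the single reranking score is extracted the same way,
    // which is why the two cases can share one code path.
}
```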
b5797
ci : disable fast-math for Metal GHA CI (#14478)
* ci : disable fast-math for Metal GHA CI
  ggml-ci
* cont : remove -g flag
  ggml-ci
b5752
batch : fix check for empty sequences in memory (#14364)
* batch : fix check for empty sequences in memory
  ggml-ci
* cont : reuse the var
  ggml-ci
b5689
cmake: remove shader-gen step-targets from ggml-vulkan (#14226)
* Remove step-targets from vulkan-shaders-gen
* Unset DESTDIR when building vulkan-shaders-gen
b5686
common : suggest --jinja when autodetection fails (#14222)
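`--jinja` switches chat-template handling to the Jinja-based engine; when a model's embedded template cannot be autodetected, passing the flag explicitly is the fallback this change now points users to. A usage sketch (the model path is a placeholder):

```console
$ ./build/bin/llama-cli -m model.gguf --jinja
```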
b5627
llama : support GEGLU for jina-bert-v2 (#14090)
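GEGLU is a gated FFN activation: the up-projection produces two halves [a | b] and the output is GELU(a) elementwise-multiplied by b. A minimal reference sketch of the semantics (not the ggml kernel):

```cpp
#include <cmath>
#include <vector>

// tanh approximation of GELU, as commonly used in transformer FFNs.
static float gelu(float x) {
    return 0.5f * x * (1.0f + std::tanh(0.7978845608f * (x + 0.044715f * x * x * x)));
}

// GEGLU: split the up-projection output into [a | b], return GELU(a) * b.
static std::vector<float> geglu(const std::vector<float> & up) {
    const size_t n = up.size() / 2;
    std::vector<float> out(n);
    for (size_t i = 0; i < n; ++i) {
        out[i] = gelu(up[i]) * up[n + i];
    }
    return out;
}
```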
b5548
CUDA: fix typo in FlashAttention code (#13926)