Remilia/update remote #4

RemiliaForever · 2025-07-28T03:43:48Z

ggml-ci

* llama : reuse compute graphs ggml-ci
* llama-bench : add graph reuse parameter ggml-ci
* cont : remove the parameter and the sched resets ggml-ci
* graph : rename update() to can_reuse() ggml-ci
* params : remove is_same() ggml-ci
* graph : set res->params in llm_graph_context constructor ggml-ci
* graph : avoid set_max_nodes in llm_graph_result ggml-ci
* kv-cache : reuse llama_context's graph result instance ggml-ci
* context : reset the previous graph result upon memory updates ggml-ci
* batch : llama_ubatch now carries its data instead of pointing to balloc ggml-ci
* merge : fix build ggml-ci
* graph : fix can_reuse() checks when flash-attention is disabled
* graph : move llm_graph_result impl in source file + debug env ggml-ci

ggml-ci

* Add Ernie4.5 MoE
* Fix Flake errors.
* Properly encode/decode MoE layer step
* Correct tensor mappings (.weight)
* Pass and read n_ff_exp
* n_ff_shexp calculation and further minor changes
* Rope fixes.
* .gitignore fix
* Add unit32 cast for Linux builds
* Apply suggestions from code review
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Further fixes from code review
* Fix trailing whitespace
* Reenable missing experts error
* Code style from code review
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Fix non-MoE regression
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

Without that condition, this debug log clutters the screen for every batch processed during prompt processing, or for every token generated in Kobold.cpp.

ggml-ci

* graph : avoid huge warm-up graphs for MoE models ggml-ci
* cont : bump max nodes to 8x model tensors

* Fix Gemma3n not executed as CUDA_GRAPH on NVGPUs
  Gemma3n uses Matrix-Matrix addition as part of their input processing, wrongly triggering CUDA_GRAPH disablement on NVGPUs even when batch-size of 1 is used.
* Exclude `project_per_layer_input` by matching node names
  This ensures that all other graphs which don't exhibit this pattern do not have their behavior changed.
* Revert unnecessary formatting changes

ggml-ci

* Documentation: Rewrote and updated the "Without docker" portion of the Vulkan backend build documentation.
* Documentation: Reorganize build.md's Vulkan section.

…74) (#14707)

* imatrix : allow processing multiple chunks per batch
* perplexity : simplify filling the batch
* imatrix : fix segfault when using a single chunk per batch
* imatrix : use GGUF to store imatrix data
* imatrix : fix conversion problems
* imatrix : use FMA and sort tensor names
* py : add requirements for legacy imatrix convert script
* perplexity : revert changes
* py : include imatrix converter requirements in toplevel requirements
* imatrix : avoid using designated initializers in C++
* imatrix : remove unused n_entries
* imatrix : allow loading mis-ordered tensors
  Sums and counts tensors no longer need to be consecutive.
* imatrix : more sanity checks when loading multiple imatrix files
* imatrix : use ggml_format_name instead of std::string concatenation
  Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
* quantize : use unused imatrix chunk_size with LLAMA_TRACE
* common : use GGUF for imatrix output by default
* imatrix : two-way conversion between old format and GGUF
* convert : remove imatrix to gguf python script
* imatrix : use the function name in more error messages
* imatrix : don't use FMA explicitly
  This should make comparisons between the formats easier because this matches the behavior of the previous version.
* imatrix : avoid returning from void function save_imatrix
* imatrix : support 3d tensors with MUL_MAT
* quantize : fix dataset name loading from gguf imatrix
* common : move string_remove_suffix from quantize and imatrix
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* imatrix : add warning when legacy format is written
* imatrix : warn when writing partial data, to help guess dataset coverage
  Also make the legacy format store partial data by using neutral values for missing data. This matches what is done at read-time for the new format, and so should get the same quality in case the old format is still used.
* imatrix : avoid loading model to convert or combine imatrix
* imatrix : avoid using imatrix.dat in README
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* ggml/ggml-vulkan/test-backend-ops: adds CONV_2D for Vulkan
* ggml-vulkan: adds f32 scalar shader to compute 2D convolution directly with gemm (no need for im2col),
* test-backend-ops: adds test_case_ref to check the validity/performance of ops against reference implementations having different graphs, adds tests
* Performance fixes: minimized branch divergence, uses collectives to eliminate redundant calculation, macros removed.
* Kernel shared memory size check
* Updates test-backend-ops to support graphs for performance measurement.
* Apple/Win32 compile errors fixed
* Subgroup size used to determine tile size -> fixes llvmpipe errors.
* Collectives disabled by default.
* Intel support is disabled as the performance is poor.
* Conv2d enabled for Intel with disabled collectives, disabled for Apple
* test-backend-ops modifications are reverted
* Trailing spaces and missing override fixed.
* Triggering pipeline relaunch.
* Code formatted with .clang-format.

…14785)
* Documentation: Revised and further improved the Vulkan instructions for Linux users in build.md.
* Minor: Revise step 2 of the Vulkan instructions for Linux users in build.md

The tid is decomposed into "ow + ky*OW + kx*OW*KH". Change "ksize" to match.

* kleidiai: add support for get_rows
* apply fixes based on code review
* apply more fixes based on code review

* add conv2d kernel
* fix trailing whitespace
* whitespace fixe
* handle f16 input and f16 kernel, more opt
* resolve conflicts
* use enqueue_ndrange_kernel

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* implement bf16 cpy ops and enable bf16 cont
* deduplicate copy functions
* deduplicate checks

* Mtmd: add a way to select device for vision encoder
* simplify
* format
* Warn user if manual device selection failed
* initialize backend to nullptr

…n imatrix file (#12718)
* Add --show-statistics option
* Add --show-statistics logic
* Add tensor name parsing
* Tidy output format
* Fix typo in title
* Improve tensor influence ranking
* Add better statistics
* Change statistics' sort order
* Add Cosine Similarity
* Add header search path
* Change header search path to private
* Add weighted statistics per layer
* Update report title
* Refactor compute_statistics out of main
* Refactor compute_cossim out of load_imatrix
* Refactor compute_statistics out of load_imatrix
* Move imatrix statistics calculation into its own functions
* Add checks and validations
* Remove unnecessary include directory
* Rename labels
* Add m_stats getter and refactor compute_statistics out of load_imatrix
* Refactor variable names
* Minor cosmetic change
* Retrigger checks (empty commit)
* Rerun checks (empty commit)
* Fix unnecessary type promotion
  Co-authored-by: compilade <git@compilade.net>
* Reverting change to improve code readability
* Rerun checks (empty commit)
* Rerun checks (empty commit)
* Rerun checks - third time's the Charm 🤞 (empty commit)
* Minor cosmetic change
* Update README
* Fix typo
* Update README
* Rerun checks (empty commit)
* Re-implement changes on top of #9400
* Update README.md
* Update README
* Update README.md
  Co-authored-by: compilade <git@compilade.net>
* Update README.md
  Co-authored-by: compilade <git@compilade.net>
* Update README.md
* Remove duplicate option in print_usage()
* Update README.md
* Update README.md
  Co-authored-by: compilade <git@compilade.net>
* Update README.md
  Co-authored-by: compilade <git@compilade.net>
* Remove input check
* Remove commented out code
---------
Co-authored-by: compilade <git@compilade.net>

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* weight format to nz for 310p
* remove quant weight format to nz
* clean code
* fix
* make the conditions for converting weights to NZ format consistent
* clean code

* Update llama-memory-recurrent.cpp
  handle saving/loading null layers in recurrent memory
* fixed styling issues and updated comments
* fix styling issue
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

ggml-ci

* CUDA: fix quantized KV cache + multiple sequences
* Update ggml/src/ggml-cuda/fattn-common.cuh
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

RemiliaForever · 2025-07-28T03:46:22Z

BTW, should we disable CI on our forked repo?

vinovo · 2025-08-03T17:59:51Z

> BTW, should we disable CI on our forked repo?

Looks like most are passing. I think we can disable the one that is failing due to a permission problem.

CISC and others added 30 commits July 16, 2025 20:52

ci : disable failing vulkan crossbuilds (#14723)

1ba45d4

batch : fix uninitialized has_cpl flag (#14733)

ad57d3e

ggml-ci

kv-cache : opt mask set input (#14600)

d9b6910

ggml-ci

llama : fix parallel processing for lfm2 (#14705)

086cf81

kv-cache : fix k-shift for multiple streams (#14742)

d6fb3f6

ggml-ci

nix : use optionalAttrs for env mkDerivation attrset argument (#14726)

760b448

convert : fix Ernie4.5 MoE without shared experts (#14746)

670e136

use max work group size for device to replace the magic number (#14732)

349ea79

graph : Pass the graph placeholder message in debug mode (#14748)

09651d0

Without that condition, this debug log clutters the screen for every batch processed during prompt processing, or for every token generated in Kobold.cpp.

graph : refactor context to not pass gf explicitly (#14629)

8f974bc

ggml-ci

CUDA: set_rows + cpy.cu refactor (#14712)

f9a31ee

model : add EXAONE 4.0 support (#14630)

e0cb5c5

model : fix build after merge conflict (#14754)

eacdeb5

graph : avoid huge warm-up graphs for MoE models (#14753)

d498af3

* graph : avoid huge warm-up graphs for MoE models ggml-ci
* cont : bump max nodes to 8x model tensors

parallel : add option for different RNG seeds (#14757)

2adf8d8

ggml-ci

graph : fix graph reuse reset of params (#14760)

9fb1042

ggml-ci

metal : fuse add, mul + add tests (#14596)

bf9087f

ggml-ci

sync : ggml

b172309

Documentation: Update build.md's Vulkan section (#14736)

f0d4d17

* Documentation: Rewrote and updated the "Without docker" portion of the Vulkan backend build documentation.
* Documentation: Reorganize build.md's Vulkan section.

Vulkan: Fix fprintf format-security warning (#14770)

83f5872

vulkan: Add logging for bf16 features to ggml_vk_print_gpu_info (#13274) (#14707)

d4b91ea

Contrib: add 0cc4m as codeowner for Vulkan backend (#14775)

36c1532

Clang-format: local files first + fix BinPacking (#14779)

938b785

Documentation: Further revisions to the Vulkan section in build.md (#14785)

b526ad2

* Documentation: Revised and further improved the Vulkan instructions for Linux users in build.md.
* Minor: Revise step 2 of the Vulkan instructions for Linux users in build.md

docs : fix link for tools/perplexity in README.md (#14780)

2be60cb

jeffbolznv and others added 25 commits July 21, 2025 13:35

vulkan/cuda: Fix im2col when KW!=KH (#14789)

c2e058f

The tid is decomposed into "ow + ky*OW + kx*OW*KH". Change "ksize" to match.
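
The decomposition described in this commit message can be illustrated with a small sketch. It is not the actual CUDA or Vulkan kernel code: the function names `compose_tid` and `decompose_tid` are hypothetical, and it assumes only the layout stated above, `tid = ow + ky*OW + kx*OW*KH`, so that the stride of `kx` ("ksize") is `OW * KH`.

```python
def compose_tid(ow: int, ky: int, kx: int, OW: int, KH: int) -> int:
    """Flatten (ow, ky, kx) into one index using the layout from the commit message."""
    return ow + ky * OW + kx * OW * KH


def decompose_tid(tid: int, OW: int, KH: int) -> tuple[int, int, int]:
    """Recover (ow, ky, kx) from a flat index (hypothetical helper, not the kernel code)."""
    ksize = OW * KH            # stride of kx; must match the layout above
    kx = tid // ksize
    ky = (tid % ksize) // OW
    ow = tid % OW
    return ow, ky, kx


# Round-trip check on a non-square kernel (KW != KH).
OW, KH, KW = 5, 2, 3
for kx in range(KW):
    for ky in range(KH):
        for ow in range(OW):
            tid = compose_tid(ow, ky, kx, OW, KH)
            assert decompose_tid(tid, OW, KH) == (ow, ky, kx)
```

With `ksize = OW * KH` the round trip holds for any kernel shape. A stride built from `KW` instead gives the same value only when `KW == KH`, which is consistent with the commit title.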

docs : fix backends table in README.md (#14796)

2ba1333

kleidiai: add support for get_rows (#14676)

9220426

* kleidiai: add support for get_rows
* apply fixes based on code review
* apply more fixes based on code review

sycl: Fix im2col (#14797)

cd465d8

opencl: add conv2d kernel (#14403)

6c9ee3b

* add conv2d kernel
* fix trailing whitespace
* whitespace fixe
* handle f16 input and f16 kernel, more opt
* resolve conflicts
* use enqueue_ndrange_kernel

opencl: fix im2col when KW!=KH (#14803)

38d3af1

cuda: remove linking to cublasLt (#14790)

48b86c4

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

server : allow setting --reverse-prompt arg (#14799)

adef817

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

opencl: remove unreachable return (#14806)

8e6f8bc

cuda : implement bf16 cpy ops and enable bf16 cont (#14763)

e28c0b8

* implement bf16 cpy ops and enable bf16 cont
* deduplicate copy functions
* deduplicate checks

Mtmd: add a way to select device for vision encoder (#14236)

c8ade30

* Mtmd: add a way to select device for vision encoder
* simplify
* format
* Warn user if manual device selection failed
* initialize backend to nullptr

llama : add model type detection for rwkv7 7B&14B (#14816)

d4d1522

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

vulkan: fix rms_norm_mul to handle broadcasting dim0 (#14817)

84712b6

ggml : model card yaml tab->2xspace (#14819)

acd6cb1

CUDA: add fused rms norm (#14800)

8c988fa

CANN: weight format to NZ for Ascend310P3 (#14407)

14c28df

* weight format to nz for 310p
* remove quant weight format to nz
* clean code
* fix
* make the conditions for converting weights to NZ format consistent
* clean code

ggml: fix loongarch quantize_row_q8_1 error (#14827)

6c88b3b

tests : add non-cont K,V FA tests

18f3b5f

ggml-ci

CUDA: fix quantized KV cache + multiple sequences (#14822)

07a19e2

* CUDA: fix quantized KV cache + multiple sequences
* Update ggml/src/ggml-cuda/fattn-common.cuh
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

ci : correct label refactor->refactoring (#14832)

221c0e0

CUDA: fix compilation with GGML_CUDA_F16 (#14837)

b284197

CUDA: fix overflow in FA, tune performance (#14840)

a86f52b

Merge branch 'remote'

34b99d0

RemiliaForever requested a review from vinovo July 28, 2025 03:47

vinovo approved these changes Aug 3, 2025

vinovo merged commit 0f417cf into main Aug 3, 2025
50 of 51 checks passed
