SYCL: Add set_rows support for quantized types #14883


Merged — 7 commits merged on Jul 28, 2025

Conversation

@qnixsynapse (Collaborator) commented Jul 26, 2025

This change adds support for the GGML_OP_SET_ROWS operation for various quantized tensor types (Q8_0, Q5_1, Q5_0, Q4_1, Q4_0, IQ4_NL) and the BF16 type in the SYCL backend.

The quantization/dequantization copy kernels were moved from cpy.cpp to cpy.hpp to make them available for set_rows.cpp.

This addresses part of the TODOs mentioned in the code.
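The copy kernels moved into cpy.hpp quantize source rows on the fly as they are written into the destination tensor. As a rough scalar illustration of the Q8_0 scheme (32 values per block sharing one scale; simplified here with a float scale rather than the fp16 scale ggml actually stores, and with hypothetical helper names), a quantize/dequantize round trip looks like this:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Simplified stand-in for ggml's block_q8_0: 32 values share one scale.
// (Real ggml stores the scale as fp16; float is used here for brevity.)
constexpr int QK8_0 = 32;
struct block_q8_0 {
    float  d;          // per-block scale
    int8_t qs[QK8_0];  // quantized values
};

// Quantize one row of floats into Q8_0 blocks.
// The row length must be a multiple of QK8_0.
std::vector<block_q8_0> quantize_row_q8_0(const std::vector<float> & x) {
    assert(x.size() % QK8_0 == 0);
    std::vector<block_q8_0> out(x.size() / QK8_0);
    for (size_t b = 0; b < out.size(); ++b) {
        float amax = 0.0f;  // absolute max within the block
        for (int i = 0; i < QK8_0; ++i) {
            amax = std::max(amax, std::fabs(x[b * QK8_0 + i]));
        }
        const float d  = amax / 127.0f;
        const float id = d != 0.0f ? 1.0f / d : 0.0f;
        out[b].d = d;
        for (int i = 0; i < QK8_0; ++i) {
            out[b].qs[i] = (int8_t) std::lround(x[b * QK8_0 + i] * id);
        }
    }
    return out;
}

// Dequantize Q8_0 blocks back to floats.
std::vector<float> dequantize_row_q8_0(const std::vector<block_q8_0> & blocks) {
    std::vector<float> out(blocks.size() * QK8_0);
    for (size_t b = 0; b < blocks.size(); ++b) {
        for (int i = 0; i < QK8_0; ++i) {
            out[b * QK8_0 + i] = blocks[b].d * blocks[b].qs[i];
        }
    }
    return out;
}
```

In the SYCL backend the per-block work runs in parallel across work-items; set_rows additionally reads a 64-bit row index per source row to select which destination row receives the quantized blocks.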

Please note: I have also added support for the BF16 type. I am aware that not all GPUs support it. If there is a way to disable it in device_supports_op without declaring a compiler definition such as GGML_SYCL_F16, please let me know. Until then, please don't merge this.

Performance comparison

| Model | Batch size | Test | t/s master (LLAMA_SET_ROWS=0) | t/s sycl/set_rows_q_n_bf16 (LLAMA_SET_ROWS=1) | Speedup |
| --- | --- | --- | --- | --- | --- |
| qwen3 1.7B Q8_0 | 64 | pp1024 | 720.70 | 717.94 | 1.00 |
| qwen3 1.7B Q8_0 | 128 | pp1024 | 1328.80 | 1333.39 | 1.00 |
| qwen3 1.7B Q8_0 | 256 | pp1024 | 2367.32 | 2350.50 | 0.99 |
| qwen3 1.7B Q8_0 | 512 | pp1024 | 3806.25 | 3762.84 | 0.99 |
| qwen3 1.7B Q8_0 | 1024 | pp1024 | 3858.31 | 3798.22 | 0.98 |

Performance is nearly identical for quantized set_rows, with room for further improvement in the future.

Update: A block size of 256 gives the best result so far on an A750 GPU:

| Model | Batch size | Test | t/s master (LLAMA_SET_ROWS=0) | t/s sycl/set_rows_q_n_bf16 (LLAMA_SET_ROWS=1) | Speedup |
| --- | --- | --- | --- | --- | --- |
| qwen3 1.7B Q8_0 | 64 | pp1024 | 719.77 | 719.82 | 1.00 |
| qwen3 1.7B Q8_0 | 128 | pp1024 | 1332.75 | 1336.66 | 1.00 |
| qwen3 1.7B Q8_0 | 256 | pp1024 | 2365.29 | 2348.92 | 0.99 |
| qwen3 1.7B Q8_0 | 512 | pp1024 | 3805.34 | 3782.11 | 0.99 |
| qwen3 1.7B Q8_0 | 1024 | pp1024 | 3854.83 | 3825.46 | 0.99 |
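The "block size" tuned here is the SYCL work-group size used when launching the kernel. SYCL's nd_range launch requires the global size to be a multiple of the work-group size, so the launch arithmetic pads the element count up and the kernel guards against out-of-range indices. A minimal sketch of that padding (helper names are illustrative, not the backend's actual API):

```cpp
#include <cassert>
#include <cstdint>

// Integer ceiling division: how many work-groups are needed to cover n elements.
constexpr int64_t ceil_div(int64_t n, int64_t d) {
    return (n + d - 1) / d;
}

// Round the global work size up to a multiple of the work-group size,
// as required by SYCL's nd_range launch model. Work-items whose global id
// is >= n_elements simply return early inside the kernel.
constexpr int64_t padded_global_size(int64_t n_elements, int64_t wg_size) {
    return ceil_div(n_elements, wg_size) * wg_size;
}
```

For example, launching over 1000 elements with a work-group size of 256 dispatches 4 work-groups (1024 work-items), of which the last 24 are masked off in the kernel body.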

@github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language) labels on Jul 26, 2025
@Alcpz (Collaborator) commented Jul 28, 2025

The bfloat16 extension states that it's supported in all GPUs, with the caveat that unsupported GPUs emulate the behavior in software: https://github.com/intel/llvm/blob/27dab6ce45c073ffbe7706747d6feee80a94dd49/sycl/doc/extensions/experimental/sycl_ext_oneapi_bfloat16_math_functions.asciidoc#overview

I'd say it's safe to merge from a usability perspective. I haven't seen any mechanism in DPC++ to discern at runtime whether bfloat16 is natively supported on the device, so I don't know how the performance implications of software emulation could be avoided.

@Rbiessy (Collaborator) left a review comment:

LGTM!

@qnixsynapse merged commit cd1fce6 into master on Jul 28, 2025
47 checks passed
@qnixsynapse deleted the sycl/set_rows_q_n_bf16 branch on July 28, 2025 at 15:02
BradHutchings added a commit to BradHutchings/Mmojo-Server that referenced this pull request Jul 28, 2025
SYCL: Add set_rows support for quantized types (ggml-org#14883)
3 participants