opencl: tiled mul_mat with local memory for f16 and f32 #14809

lhez · 2025-07-22T07:08:26Z

This PR adds another variant of tiled matmul for f32 and f16. The new kernels also use local memory for tiling and largely follow the standard tiling pattern. The main difference from #14535 is that the tiles from both src0 and src1 are transposed.
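
For readers unfamiliar with the pattern, here is a minimal CPU-side sketch in Python of tiled matmul with both tiles stored transposed. It is not the kernel from this PR. The names `tiled_matmul_transposed`, `reference_matmul`, and `tile` are hypothetical, the tile buffers merely stand in for OpenCL local memory, and the layout (both inputs row-major with K contiguous, output indexed `c[n][m]`) is an assumption based on the usual ggml `mul_mat` convention.

```python
def tiled_matmul_transposed(a, b, tile=4):
    """Compute c[n][m] = sum_k a[m][k] * b[n][k] one tile at a time.

    a is M x K and b is N x K, both given as row-major lists of lists.
    """
    m_dim, k_dim, n_dim = len(a), len(a[0]), len(b)
    c = [[0.0] * m_dim for _ in range(n_dim)]

    for n0 in range(0, n_dim, tile):          # one "work-group" per output tile
        for m0 in range(0, m_dim, tile):
            for k0 in range(0, k_dim, tile):  # march along the shared K dimension
                # Stand-ins for local memory, filled transposed: index [k][row].
                # Out-of-range entries stay 0.0 so edge tiles need no special case.
                a_t = [[0.0] * tile for _ in range(tile)]
                b_t = [[0.0] * tile for _ in range(tile)]
                for r in range(tile):
                    for k in range(tile):
                        if m0 + r < m_dim and k0 + k < k_dim:
                            a_t[k][r] = a[m0 + r][k0 + k]
                        if n0 + r < n_dim and k0 + k < k_dim:
                            b_t[k][r] = b[n0 + r][k0 + k]
                # A real kernel would place a work-group barrier here.
                for n in range(min(tile, n_dim - n0)):
                    for m in range(min(tile, m_dim - m0)):
                        acc = 0.0
                        for k in range(tile):
                            acc += a_t[k][m] * b_t[k][n]
                        c[n0 + n][m0 + m] += acc
    return c


def reference_matmul(a, b):
    """Untiled reference with the same c[n][m] indexing."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_a in a]
            for row_b in b]
```

With the tiles stored K-major, the inner product loop reads consecutive `m` (or `n`) entries for a fixed `k`, which is the access pattern the transposition is meant to illustrate. Whether the actual kernels index their tiles this way should be checked against the PR diff.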

Results on Adreno 830:

master

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| qwen2 1.5B F16 | 2.88 GiB | 1.54 B | OpenCL | 99 | pp512 | 145.45 ± 2.56 |
| qwen2 1.5B F16 | 2.88 GiB | 1.54 B | OpenCL | 99 | tg128 | 17.68 ± 0.14 |

this PR

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| qwen2 1.5B F16 | 2.88 GiB | 1.54 B | OpenCL | 99 | pp512 | 174.30 ± 10.32 |
| qwen2 1.5B F16 | 2.88 GiB | 1.54 B | OpenCL | 99 | tg128 | 17.73 ± 0.12 |
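
As a rough reading of the two tables, the relative change in throughput can be computed from the mean t/s values. The helper name `relative_speedup` is hypothetical, and the calculation ignores the reported ± spread, so treat the result as an estimate only.

```python
def relative_speedup(baseline_tps, new_tps):
    """Percentage change in tokens/s relative to the baseline."""
    return (new_tps - baseline_tps) / baseline_tps * 100.0


# Mean t/s values copied from the master and PR tables.
pp512 = relative_speedup(145.45, 174.30)
tg128 = relative_speedup(17.68, 17.73)

print(f"pp512: {pp512:+.1f}%")  # prompt processing
print(f"tg128: {tg128:+.1f}%")  # token generation
```

This gives roughly +19.8% for pp512 and +0.3% for tg128. The tg128 difference is smaller than the reported run-to-run spread, so only the prompt-processing gain looks meaningful.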

github-actions bot added the ggml and OpenCL labels Jul 22, 2025

opencl: add mul_mat_f32_f32_l4_lm and mul_mat_f16_f32_l4_lm

33e5f11

lhez force-pushed the mul-mm-f16-f32-lm branch from 5666ed9 to 33e5f11 July 29, 2025 05:02

lhez marked this pull request as ready for review July 29, 2025 06:29

lhez requested review from max-krasnyansky and rmatif July 29, 2025 06:40
