Q2k interleaving implementation - x86/x64 SIMD #14373
I tested this on a 13900K with GCC 13 and Clang 19, but the improvement is not very significant. Repacking has a significant cost: it increases load time and prevents the use of mmap. As it stands, I find this very hard to justify for AVX2. It may make sense for AVX512, but I cannot test that.
GCC-13:
Clang-19:
Hi @slaren, Thanks
Hi @slaren, @ggerganov,
```c
}
// Store the accumulated values
for (int i = 0; i < 16; i++) {
```
Deduplicate the generic GEMV and GEMM implementations following #14897.
After that, feel free to merge.
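The generic implementations serve as the scalar reference that the SIMD kernels are checked against. The sketch below is a simplified illustration only: it uses plain float weights instead of quantized Q2_K blocks, and both the function name and the 8-row interleaved layout (within each group of 8 rows, element k of row r stored at w[k * 8 + r]) are assumptions, not the code in ggml.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generic GEMV over rows interleaved 8 at a time.
   Within each group of 8 rows, element k of row r lives at wg[k * 8 + r],
   so a SIMD kernel could load 8 consecutive floats to cover all 8 rows. */
static void gemv_interleaved8_generic(size_t nrows, size_t ncols,
                                      const float *w, const float *x, float *y) {
    assert(nrows % 8 == 0); /* layout assumes whole groups of 8 rows */
    for (size_t g = 0; g < nrows / 8; g++) {
        const float *wg = w + g * 8 * ncols; /* start of this row group */
        float acc[8] = {0};
        for (size_t k = 0; k < ncols; k++) {
            for (int r = 0; r < 8; r++) {
                acc[r] += wg[k * 8 + r] * x[k];
            }
        }
        /* Store the accumulated values */
        for (int r = 0; r < 8; r++) {
            y[g * 8 + r] = acc[r];
        }
    }
}
```

A SIMD version would replace the inner `r` loop with one 8-wide multiply-add, which is why a shared generic version is useful as a single correctness reference for both the AVX2 and AVX512 paths.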
Hi @slaren, @ggerganov, The code has been updated to deduplicate the generic implementations. Please let us know if it is ready to merge. Thanks
Block Interleaving Formats
Block_Q2_Kx8:
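The exact byte layout of Block_Q2_Kx8 is not spelled out here. As a hedged sketch, assuming it groups one Q2_K super-block from each of 8 consecutive rows and interleaves their 2-bit quants in fixed-size chunks, the repacking step could look like the following. The field sizes follow ggml's block_q2_K with QK_K = 256 (fp16 scales stored here as raw uint16_t bits); the struct names, the function name and the 8-byte chunk size are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define QK_K 256            /* K-quant super-block size */
#define INTERLEAVE_BYTES 8  /* hypothetical interleave chunk size */

/* Simplified Q2_K block with the same field sizes as ggml's block_q2_K. */
typedef struct {
    uint8_t  scales[QK_K / 16]; /* 4-bit scales and mins for 16 sub-blocks */
    uint8_t  qs[QK_K / 4];      /* 2-bit quants, 4 per byte */
    uint16_t d;                 /* fp16 super-block scale (raw bits) */
    uint16_t dmin;              /* fp16 super-block min scale (raw bits) */
} block_q2_K_sketch;

/* Hypothetical 8-row interleaved block: one super-block from each of 8 rows. */
typedef struct {
    uint16_t d[8];
    uint16_t dmin[8];
    uint8_t  scales[8 * (QK_K / 16)];
    uint8_t  qs[8 * (QK_K / 4)];
} block_q2_Kx8_sketch;

static void make_block_q2_Kx8_sketch(const block_q2_K_sketch in[8],
                                     block_q2_Kx8_sketch *out) {
    assert((QK_K / 4) % INTERLEAVE_BYTES == 0);
    for (int r = 0; r < 8; r++) {
        out->d[r]    = in[r].d;
        out->dmin[r] = in[r].dmin;
        memcpy(out->scales + r * (QK_K / 16), in[r].scales, QK_K / 16);
    }
    /* Round-robin interleave: chunk c of row r lands at chunk slot c * 8 + r. */
    const int chunks = (QK_K / 4) / INTERLEAVE_BYTES;
    for (int c = 0; c < chunks; c++) {
        for (int r = 0; r < 8; r++) {
            memcpy(out->qs + (c * 8 + r) * INTERLEAVE_BYTES,
                   in[r].qs + c * INTERLEAVE_BYTES,
                   INTERLEAVE_BYTES);
        }
    }
}
```

With this kind of layout, one contiguous load reads the same chunk position from all 8 rows. It also shows why repacked weights cannot be served straight from the mmapped file: the in-memory layout no longer matches the on-disk one.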
Performance Impact:
Gains of ~5.5% were seen with the AVX2 version and ~25.5% with the AVX512 version over the base commit with GCC on Linux.
GCC Linux:
Q2_K Model:
GCC Version = 12.3
Clang Linux:
Larger gains of ~26.3% were seen with the AVX2 version and ~53.9% with the AVX512 version over the base commit with Clang on Linux.
Q2_K Model:
Clang Version = 20.1.0
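Assuming the gains above are throughput ratios (tokens per second, as reported by llama-bench), a percentage gain over the base commit is (new / base - 1) * 100. A minimal helper, with a hypothetical function name:

```c
#include <assert.h>

/* Relative throughput gain in percent: (new / base - 1) * 100.
   Both arguments are throughputs, e.g. tokens per second. */
static double percent_gain(double base_tps, double new_tps) {
    assert(base_tps > 0.0); /* a zero baseline has no meaningful ratio */
    return (new_tps / base_tps - 1.0) * 100.0;
}
```

For example, going from a hypothetical 100 t/s to 126.3 t/s gives a gain of about 26.3%.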
The model tested was https://huggingface.co/bartowski/Phi-3-mini-4k-instruct-GGUF
The PR was tested on an AMD Ryzen 5 9600X, which supports the following flags by default:
CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
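Since the PR provides separate AVX2 and AVX512 paths, one way to choose between them at build time is through the feature macros GCC and Clang define for the target; for example, building with -march=native on the CPU above would typically define both __AVX2__ and __AVX512F__. The function below is a hypothetical sketch; ggml's actual selection logic may differ.

```c
/* Hypothetical compile-time selection of a Q2_K GEMV path.
   __AVX512F__ and __AVX2__ are predefined by GCC/Clang when the
   target (e.g. -march=native) enables those instruction sets. */
static const char *q2_K_gemv_path(void) {
#if defined(__AVX512F__)
    return "avx512";
#elif defined(__AVX2__)
    return "avx2";
#else
    return "generic"; /* scalar fallback, e.g. the shared generic code */
#endif
}
```

Because the check happens at compile time, a binary built without AVX512 enabled always takes the AVX2 or generic path, even on a CPU that reports AVX512 = 1.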
Perplexity was also measured with the Q2_K model and found to be similar to the base commit.
The perplexity results are tabulated as follows: