Conversation

@johnnynunez

No description provided.

@gemini-code-assist

Summary of Changes

Hello @johnnynunez, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on improving and fixing the build process for ARM architectures, specifically on aarch64 Linux. It integrates more precise ARM CPU instruction detection, applies appropriate compiler optimizations to leverage advanced ARM features, and refines core matrix multiplication routines to ensure compatibility and optimal performance. These changes are vital for enhancing the project's support and efficiency on ARM-based systems.

Highlights

  • ARM CPU Instruction Detection: Enhanced the setup.py script to accurately detect ARM CPU features like Scalable Vector Extension (SVE) and NEON on aarch64 Linux systems, improving the build configuration for ARM architectures.
  • ARM-specific CMake Flags: Introduced explicit C and C++ compiler flags (-march=armv8.6-a+...) for aarch64 Linux builds within setup.py, enabling a broad range of advanced ARMv8.6-a features for optimized performance (a hedged sketch of this wiring follows this list).
  • Quantization Matrix Multiplication Adjustments: Modified iqk_mul_mat.inc and tinyblas_cpu_sgemm.inc to adapt quantized matrix multiplication functions for ARM, including changes to function signatures and to how scales are loaded for the Q8_0 and Q8_1 types using NEON intrinsics.
  • Simplified ARM SGEMM Dispatch: In sgemm.cpp, the ARM-specific SGEMM (Single-precision General Matrix Multiply) dispatch logic was simplified to directly utilize ARMv8.2+ optimized functions, removing runtime hardware capability checks for a more streamlined approach.
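The concrete '+feature' suffixes behind "-march=armv8.6-a+..." are elided above. Purely as a hedged illustration, the setup.py wiring described in the second highlight would look roughly like the sketch below; the feature list shown is an assumption, not the PR's actual string:

    import platform

    def add_aarch64_flags(cmake_args):
        # Hedged sketch: the '+' feature modifiers below are illustrative
        # assumptions; the PR hardcodes its own armv8.6-a feature string.
        if platform.system() == "Linux" and platform.machine() == "aarch64":
            arch = "-march=armv8.6-a+sve+sve2+fp16+dotprod"  # assumed example
            cmake_args += [
                "-DCMAKE_C_FLAGS=" + arch,
                "-DCMAKE_CXX_FLAGS=" + arch,
            ]
        return cmake_args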


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to add support for ARM builds, but it introduces several critical issues that need to be addressed. The changes to the build script break support for Windows and use hardcoded CPU features that will limit ARM compatibility. More critically, changes in sgemm.cpp break builds for all non-ARM architectures by unconditionally selecting ARM-specific kernels. There are also some performance and clarity issues in the NEON implementation for quantized matrix multiplication. Please review the suggested changes to resolve these regressions and improve the implementation.

Comment on lines 181 to +214
(previous implementation, removed by this PR)

    if sys.platform.startswith("linux"):
        with open('/proc/cpuinfo', 'r', encoding="utf-8") as cpu_f:
            cpuinfo = cpu_f.read()
        flags_line = [line for line in cpuinfo.split('\n') if line.startswith('flags')][0]
        flags = flags_line.split(':')[1].strip().split(' ')
        # fancy with AVX512-VL, AVX512-BW, AVX512-DQ, AVX512-VNNI
        for flag in flags:
            if 'avx512bw' in flag:
                return 'fancy'
        for flag in flags:
            if 'avx512' in flag:
                return 'avx512'
        for flag in flags:
            if 'avx2' in flag:
                return 'avx2'
        raise ValueError("Unsupported cpu Instructions: {}".format(flags_line))
    elif sys.platform == "win32":
        from cpufeature.extension import CPUFeature

        if CPUFeature.get("AVX512bw", False):
            return 'fancy'
        if CPUFeature.get("AVX512f", False):
            return 'avx512'
        if CPUFeature.get("AVX2", False):
            return 'avx2'
        raise ValueError("Unsupported cpu Instructions: {}".format(str(CPUFeature)))
    else:
        raise ValueError("Unsupported platform: {}".format(sys.platform))

(new implementation, added by this PR)

    if platform.machine() == "aarch64":
        # Adapt this part based on GH200's /proc/cpuinfo
        for line in cpuinfo.split('\n'):
            if line.startswith('Features'):
                features_line = line
                features = features_line.split(':')[1].strip().split(' ')
                if 'sve' in features:  # Example: Scalable Vector Extension
                    return 'sve'  # Or a custom label
                elif 'neon' in features:
                    return 'neon'
                else:
                    print("Using generic Arm CPU instructions")
                    return 'native_arm'  # Or a default Arm label
        print("Warning: Could not find 'Features' line in /proc/cpuinfo on aarch64. Using native.")
        return 'native'  # Fallback for aarch64 if 'Features' not found
    else:  # Assume x86-like if not aarch64
        for line in cpuinfo.split('\n'):
            if line.startswith('flags'):
                flags_line = line
                flags = flags_line.split(':')[1].strip().split(' ')
                if 'avx512bw' in flags:
                    return 'fancy'
                elif 'avx512' in flags:
                    return 'avx512'
                elif 'avx2' in flags:
                    return 'avx2'
                raise ValueError("Unsupported cpu Instructions: {}".format(flags_line))
        print("Warning: Could not find 'flags' line in /proc/cpuinfo on x86-like. Using native.")
        return 'native'  # Fallback for x86-like if 'flags' not found


high

This change removes support for Windows (win32) and other non-Linux platforms from the get_cpu_instruct method. The previous implementation had specific logic for Windows and a fallback for other platforms. On non-Linux systems, this function will now return None, which will likely cause failures later in the build process. While this PR focuses on ARM (which is often Linux-based), breaking existing platform support is a significant regression. Please restore the support for other platforms like Windows to avoid breaking builds for other users.
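For reference, here is a minimal sketch (not the PR's code) of how the previous win32 branch could be kept alongside the new aarch64 detection, assuming the cpufeature package remains a dependency on Windows; the 'asimd' check is an addition based on how aarch64 kernels report NEON in /proc/cpuinfo:

    import platform
    import sys

    def get_cpu_instruct():
        # Hedged sketch only: keeps the original win32/cpufeature branch while
        # adding the aarch64 Features-based detection from this PR.
        if sys.platform.startswith("linux"):
            with open('/proc/cpuinfo', 'r', encoding="utf-8") as cpu_f:
                cpuinfo = cpu_f.read()
            if platform.machine() == "aarch64":
                for line in cpuinfo.split('\n'):
                    if line.startswith('Features'):
                        features = line.split(':')[1].strip().split(' ')
                        if 'sve' in features:
                            return 'sve'
                        # aarch64 kernels list NEON as 'asimd' in Features
                        if 'asimd' in features or 'neon' in features:
                            return 'neon'
                return 'native'
            for line in cpuinfo.split('\n'):
                if line.startswith('flags'):
                    flags = line.split(':')[1].strip().split(' ')
                    if any('avx512bw' in f for f in flags):
                        return 'fancy'
                    if any('avx512' in f for f in flags):
                        return 'avx512'
                    if any('avx2' in f for f in flags):
                        return 'avx2'
                    raise ValueError("Unsupported cpu Instructions: {}".format(line))
            return 'native'
        elif sys.platform == "win32":
            from cpufeature.extension import CPUFeature
            if CPUFeature.get("AVX512bw", False):
                return 'fancy'
            if CPUFeature.get("AVX512f", False):
                return 'avx512'
            if CPUFeature.get("AVX2", False):
                return 'avx2'
            raise ValueError("Unsupported cpu Instructions: {}".format(str(CPUFeature)))
        else:
            raise ValueError("Unsupported platform: {}".format(sys.platform))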

Comment on lines +4453 to +4455
        const block_q8_0 * y4 = (const block_q8_0 *)y[iy] + i;
        float16_t d_val = GGML_FP16_TO_FP32(y4->d);
        return vdup_n_f16(d_val);


medium

The conversion from ggml_fp16_t to float16_t via GGML_FP16_TO_FP32 is inefficient. It involves a round trip from half-precision to single-precision and back to half-precision, which is unnecessary and adds overhead in an inline function. You can directly use y4->d with vdup_n_f16 as the compiler can handle the cast.

        return vdup_n_f16(y4->d);


@KMSorSMS KMSorSMS Oct 24, 2025


@johnnynunez Since you have tested on your device, could you please attach a screenshot of it running and your platform info from nvidia-smi and lscpu, for example? (Our next version will drop support for llamafile for performance reasons.)


@johnnynunez johnnynunez Oct 24, 2025


GH200 @KMSorSMS
[Screenshot attached: 2025-10-24 07:44:36]

Architecture:             aarch64
  CPU op-mode(s):         64-bit
  Byte Order:             Little Endian
CPU(s):                   64
  On-line CPU(s) list:    0-63
Vendor ID:                ARM
  Model name:             Neoverse-V2
    Model:                0
    Thread(s) per core:   1
    Core(s) per socket:   64
    Socket(s):            1
    Stepping:             r0p0
    BogoMIPS:             2000.00
    Flags:                  fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh bti
NUMA:                     
  NUMA node(s):           9
  NUMA node0 CPU(s):      0-63
  NUMA node1 CPU(s):      
  NUMA node2 CPU(s):      
  NUMA node3 CPU(s):      
  NUMA node4 CPU(s):      
  NUMA node5 CPU(s):      
  NUMA node6 CPU(s):      
  NUMA node7 CPU(s):      
  NUMA node8 CPU(s):      
Vulnerabilities:          
  Gather data sampling:   Not affected
  Itlb multihit:          Not affected
  L1tf:                   Not affected
  Mds:                    Not affected
  Meltdown:               Not affected
  Mmio stale data:        Not affected
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Not affected
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; __user pointer sanitization
  Spectre v2:             Not affected
  Srbds:                  Not affected
  Tsx async abort:        Not affected

@johnnynunez

[screenshot attached]

@johnnynunez johnnynunez changed the title from "[NVIDIA] Fix ARM builds" to "Fix ARM builds" on Oct 24, 2025
@ovowei ovowei requested a review from KMSorSMS October 24, 2025 10:16
@KMSorSMS KMSorSMS requested a review from chenht2022 October 24, 2025 12:11
@KMSorSMS

KMSorSMS commented Oct 29, 2025

@johnnynunez I think we should not hardcode the arch features. Have you tried native?

        if platform.system() == "Linux" and platform.machine() == "aarch64":
            # Using -march=native is more portable as it optimizes for the host CPU.
            cmake_args += [
                "-DCMAKE_C_FLAGS=-march=native",
                "-DCMAKE_CXX_FLAGS=-march=native",
            ]

Going further, I suppose we can add some feature detection to the CMakeLists (https://github.com/kvcache-ai/ktransformers/blob/main/csrc/ktransformers_ext/CMakeLists.txt). Will you make this change, or should I try to find an ARM machine to fix and test it?

Hey, I'm just wondering if you missed this? You may also want to resolve the conflict with the current main branch. Sorry that we have blocked this PR: we are going to migrate our matmul kernel part to other high-performance libraries, so this feature (llamafile) will be deprecated in the future, and the GPTQ or AWQ quantization methods seem more efficient for CPU hardware. But an update to the llamafile part is welcome, and if it merges smoothly, we will accept it.
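As a hedged sketch of the non-hardcoded direction suggested above (the label names and -march mappings below are illustrative assumptions, not the repository's actual API):

    import platform

    # Hypothetical mapping from the label returned by the PR's CPU detection
    # (e.g. 'sve' or 'neon') to a -march string, instead of one hardcoded flag.
    ARM_MARCH_BY_LABEL = {
        "sve": "-march=armv8.2-a+sve",   # assumed baseline for SVE-capable cores
        "neon": "-march=armv8.2-a",      # assumed NEON/ASIMD baseline
    }

    def arm_cmake_flags(cpu_label):
        if platform.system() == "Linux" and platform.machine() == "aarch64":
            march = ARM_MARCH_BY_LABEL.get(cpu_label, "-march=native")
            return ["-DCMAKE_C_FLAGS=" + march, "-DCMAKE_CXX_FLAGS=" + march]
        return []

    # Example: cmake_args += arm_cmake_flags(get_cpu_instruct())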

@johnnynunez

(Quoting KMSorSMS's comment above about not hardcoding the arch features and trying -march=native.)

I'm coming home after the PyTorch conference. I will check when I arrive.

@KMSorSMS

ok, thanks for your assistance ❤️.
