Fix ARM builds #1529

Conversation
Summary of Changes: Hello @johnnynunez, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed: this pull request focuses on improving and fixing the build process for ARM architectures.
Code Review
This pull request aims to add support for ARM builds, but it introduces several critical issues that need to be addressed. The changes to the build script break support for Windows and use hardcoded CPU features that will limit ARM compatibility. More critically, changes in sgemm.cpp break builds for all non-ARM architectures by unconditionally selecting ARM-specific kernels. There are also some performance and clarity issues in the NEON implementation for quantized matrix multiplication. Please review the suggested changes to resolve these regressions and improve the implementation.
Previous implementation (removed):

```python
if sys.platform.startswith("linux"):
    with open('/proc/cpuinfo', 'r', encoding="utf-8") as cpu_f:
        cpuinfo = cpu_f.read()
    flags_line = [line for line in cpuinfo.split(
        '\n') if line.startswith('flags')][0]
    flags = flags_line.split(':')[1].strip().split(' ')
    # fancy with AVX512-VL, AVX512-BW, AVX512-DQ, AVX512-VNNI
    for flag in flags:
        if 'avx512bw' in flag:
            return 'fancy'
    for flag in flags:
        if 'avx512' in flag:
            return 'avx512'
    for flag in flags:
        if 'avx2' in flag:
            return 'avx2'
    raise ValueError(
        "Unsupported cpu Instructions: {}".format(flags_line))
elif sys.platform == "win32":
    from cpufeature.extension import CPUFeature

    if CPUFeature.get("AVX512bw", False):
        return 'fancy'
    if CPUFeature.get("AVX512f", False):
        return 'avx512'
    if CPUFeature.get("AVX2", False):
        return 'avx2'
    raise ValueError(
        "Unsupported cpu Instructions: {}".format(str(CPUFeature)))
else:
    raise ValueError("Unsupported platform: {}".format(sys.platform))
```

New implementation (added):

```python
if platform.machine() == "aarch64":
    # Adapt this part based on GH200's /proc/cpuinfo
    for line in cpuinfo.split('\n'):
        if line.startswith('Features'):
            features_line = line
            features = features_line.split(':')[1].strip().split(' ')
            if 'sve' in features:  # Example: Scalable Vector Extension
                return 'sve'  # Or a custom label
            elif 'neon' in features:
                return 'neon'
            else:
                print("Using generic Arm CPU instructions")
                return 'native_arm'  # Or a default Arm label
    print("Warning: Could not find 'Features' line in /proc/cpuinfo on aarch64. Using native.")
    return 'native'  # Fallback for aarch64 if 'Features' not found
else:  # Assume x86-like if not aarch64
    for line in cpuinfo.split('\n'):
        if line.startswith('flags'):
            flags_line = line
            flags = flags_line.split(':')[1].strip().split(' ')
            if 'avx512bw' in flags:
                return 'fancy'
            elif 'avx512' in flags:
                return 'avx512'
            elif 'avx2' in flags:
                return 'avx2'
            raise ValueError(
                "Unsupported cpu Instructions: {}".format(flags_line))
    print("Warning: Could not find 'flags' line in /proc/cpuinfo on x86-like. Using native.")
    return 'native'  # Fallback for x86-like if 'flags' not found
```
This change removes support for Windows (win32) and other non-Linux platforms from the get_cpu_instruct method. The previous implementation had specific logic for Windows and a fallback for other platforms. On non-Linux systems, this function will now return None, which will likely cause failures later in the build process. While this PR focuses on ARM (which is often Linux-based), breaking existing platform support is a significant regression. Please restore the support for other platforms like Windows to avoid breaking builds for other users.
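A minimal sketch of one way to keep both paths: retain the original Linux/win32 dispatch and nest the new aarch64 branch inside the Linux path. This is illustrative only, assuming the enclosing function is get_cpu_instruct (as the comment above suggests) and that the cpufeature package remains a Windows dependency. Note also that 64-bit ARM kernels report Advanced SIMD as `asimd`, not `neon`, in the Features line of /proc/cpuinfo, so a literal 'neon' check would never match there:

```python
import platform
import sys


def get_cpu_instruct():
    # Hypothetical merged detection: keeps the original win32 branch
    # and adds an aarch64 branch on Linux.
    if sys.platform.startswith("linux"):
        with open('/proc/cpuinfo', 'r', encoding="utf-8") as cpu_f:
            cpuinfo = cpu_f.read()
        if platform.machine() == "aarch64":
            for line in cpuinfo.split('\n'):
                if line.startswith('Features'):
                    features = line.split(':')[1].strip().split(' ')
                    if 'sve' in features:
                        return 'sve'
                    # 64-bit kernels expose NEON as 'asimd', never 'neon'
                    if 'asimd' in features:
                        return 'neon'
            print("Warning: no usable 'Features' line; using native.")
            return 'native'
        # x86-like: substring match, since real flags are e.g. 'avx512bw'
        for line in cpuinfo.split('\n'):
            if line.startswith('flags'):
                flags = line.split(':')[1].strip().split(' ')
                if any('avx512bw' in f for f in flags):
                    return 'fancy'
                if any('avx512' in f for f in flags):
                    return 'avx512'
                if any('avx2' in f for f in flags):
                    return 'avx2'
                raise ValueError(
                    "Unsupported cpu Instructions: {}".format(line))
        print("Warning: no 'flags' line; using native.")
        return 'native'
    elif sys.platform == "win32":
        from cpufeature.extension import CPUFeature
        if CPUFeature.get("AVX512bw", False):
            return 'fancy'
        if CPUFeature.get("AVX512f", False):
            return 'avx512'
        if CPUFeature.get("AVX2", False):
            return 'avx2'
        raise ValueError(
            "Unsupported cpu Instructions: {}".format(str(CPUFeature)))
    else:
        raise ValueError("Unsupported platform: {}".format(sys.platform))
```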
```cpp
const block_q8_0 * y4 = (const block_q8_0 *)y[iy] + i;
float16_t d_val = GGML_FP16_TO_FP32(y4->d);
return vdup_n_f16(d_val);
```
The conversion from ggml_fp16_t to float16_t via GGML_FP16_TO_FP32 is inefficient: it makes a round trip from half precision to single precision and back to half precision, which adds unnecessary overhead in an inline function. You can pass y4->d to vdup_n_f16 directly and let the compiler handle the conversion:

```cpp
return vdup_n_f16(y4->d);
```
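A caveat on this suggestion: it is only a pure half-precision broadcast if ggml_fp16_t is the native __fp16 type on this target, which ggml's headers typically select when __ARM_NEON is defined (as it is for the builds this PR targets). If ggml_fp16_t were instead the plain uint16_t bit-pattern fallback, the implicit conversion would treat the raw bits as an integer value, and the GGML_FP16_TO_FP32 round trip would be the correct, if slower, form.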
@johnnynunez Since you have tested on your device, could you please attach a screenshot of it running, plus your platform info (from nvidia-smi and lscpu, for example)? Also note that our next version will drop support for llamafile, for performance reasons.
GH200 @KMSorSMS

```
Architecture:             aarch64
  CPU op-mode(s):         64-bit
  Byte Order:             Little Endian
CPU(s):                   64
  On-line CPU(s) list:    0-63
Vendor ID:                ARM
  Model name:             Neoverse-V2
    Model:                0
    Thread(s) per core:   1
    Core(s) per socket:   64
    Socket(s):            1
    Stepping:             r0p0
    BogoMIPS:             2000.00
    Flags:                fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh bti
NUMA:
  NUMA node(s):           9
  NUMA node0 CPU(s):      0-63
  NUMA node1 CPU(s):
  NUMA node2 CPU(s):
  NUMA node3 CPU(s):
  NUMA node4 CPU(s):
  NUMA node5 CPU(s):
  NUMA node6 CPU(s):
  NUMA node7 CPU(s):
  NUMA node8 CPU(s):
Vulnerabilities:
  Gather data sampling:   Not affected
  Itlb multihit:          Not affected
  L1tf:                   Not affected
  Mds:                    Not affected
  Meltdown:               Not affected
  Mmio stale data:        Not affected
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Not affected
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; __user pointer sanitization
  Spectre v2:             Not affected
  Srbds:                  Not affected
  Tsx async abort:        Not affected
```
Hey, I'm just wondering if you missed seeing this? You may also want to fix the conflict with the current main branch. Sorry that we blocked this PR: we are going to migrate our matrix-kernel part to other high-performance libraries, so this feature (llamafile) will be deprecated in the future, and the GPTQ or AWQ quant methods seem more efficient for CPU hardware. But you are welcome to update the llamafile part, and if it merges smoothly, we will accept it.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
I'm heading home after the PyTorch conference. I will check when I arrive.
ok, thanks for your assistance ❤️.
