Can't load 98 GB model despite having 96 GB RAM + 24 GB VRAM #17396
Answered by taronaeo · ZuppaTuscana asked this question in Q&A
Any idea why I can't load this model? According to this benchmark https://huggingface.co/ubergarm/GLM-4.6-GGUF/discussions/5, I should be able to.
taronaeo answered on Nov 22, 2025:
You ran out of memory on the GPU. If you want it to spill over into system memory, you need to provide the GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 flag, for example:

GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 ./build/bin/llama-server -m /home/user/Documents/ik_llama.cpp/models/GLM-4.6-smol-IQ2_KS-00001-of-00003.gguf --alias GLM-4.6-IQ2_KS --ctx-size 32768 --n-gpu-layers 99 -ot exps=CPU -fa 1 -ub 4096 -b 4096 --threads 8 --host 127.0.0.1 --port 8080 -cram -1
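For reference, a minimal sketch of the same approach with the environment variable exported for the whole shell session instead of prefixed to a single command. The model path and the (trimmed-down) server options are simply carried over from the command above and should be adjusted to your own setup:

# Let CUDA allocations fall back to system RAM via unified memory once VRAM is full.
# This changes where the weights can live; layers that spill out of VRAM will run noticeably slower.
export GGML_CUDA_ENABLE_UNIFIED_MEMORY=1

# Launch the server as before (options taken from the command above; adjust the model path for your setup).
./build/bin/llama-server \
  -m /home/user/Documents/ik_llama.cpp/models/GLM-4.6-smol-IQ2_KS-00001-of-00003.gguf \
  --ctx-size 32768 --n-gpu-layers 99 -ot exps=CPU \
  --threads 8 --host 127.0.0.1 --port 8080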
Answer selected by ZuppaTuscana