
Commit 3ba2f74

Reduce KV cache data type support to f32, f16, q8_0 and q4_0

1 parent: 518dd98

File tree: 3 files changed (+3 −4 lines)

examples/server.ps1 (2 additions, 2 deletions)

```diff
@@ -23,7 +23,7 @@ Specifies the number of layers offloaded into the GPU.
 Specifies the models context length it was trained on.
 
 .PARAMETER kvCacheDataType
-Specifies the KV cache data type (options: f32, f16, q8_0, q4_0, q4_1, iq4_nl, q5_0, or q5_1).
+Specifies the KV cache data type (options: f32, f16, q8_0, q4_0).
 
 .PARAMETER verbose
 Increases the verbosity of the llama.cpp server.
@@ -38,7 +38,7 @@ Increases the verbosity of the llama.cpp server.
 .\server.ps1 -model "C:\models\openchat-3.5-0106.Q5_K_M.gguf" -contextSize 4096 -numberOfGPULayers 10
 
 .EXAMPLE
-.\server.ps1 -model "C:\models\openchat-3.5-0106.Q5_K_M.gguf" -port 8081
+.\server.ps1 -model "C:\models\openchat-3.5-0106.Q5_K_M.gguf" -port 8081 -kvCacheDataType q8_0
 
 .EXAMPLE
 .\server.ps1 -model "..\vendor\llama.cpp\models\openchat-3.5-0106.Q5_K_M.gguf" -verbose
```
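The diff above only narrows the documented option list; presumably the script's param block enforces the same set. A hypothetical sketch of such validation in PowerShell (the parameter name comes from the diff; the ValidateSet attribute and the f16 default are assumptions, not taken from the commit):

```powershell
param(
    # Accept only the four KV cache data types this commit keeps;
    # any other value fails parameter binding before the server starts.
    # The f16 default is assumed here for illustration.
    [ValidateSet("f32", "f16", "q8_0", "q4_0")]
    [string] $kvCacheDataType = "f16"
)

Write-Host "KV cache data type: $kvCacheDataType"
```

With ValidateSet, an invalid value such as `q5_1` is rejected by PowerShell itself, so the script needs no manual option checking.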

rebuild_llama.cpp.ps1 (0 additions, 1 deletion)

```diff
@@ -153,7 +153,6 @@ switch ($blasAccelerator) {
 cmake `
 -DLLAMA_CUDA=ON `
 -DLLAMA_CCACHE=OFF `
--DLLAMA_CUDA_FA_ALL_QUANTS=ON `
 ..
 }
```

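Dropping `-DLLAMA_CUDA_FA_ALL_QUANTS=ON` means llama.cpp's CUDA FlashAttention kernels are, as I understand the flag, compiled only for the default KV cache quantization combinations rather than for every quant pairing, which both matches the trimmed option list above and shortens build times. The resulting configure step would reduce to something like this sketch (run from a build directory; generator and paths assumed):

```powershell
# Sketch of the CUDA configure step after this commit.
# Without LLAMA_CUDA_FA_ALL_QUANTS, FlashAttention kernels are built
# only for the default KV cache quantization combinations.
cmake `
    -DLLAMA_CUDA=ON `
    -DLLAMA_CCACHE=OFF `
    ..
```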
vendor/llama.cpp (submodule pointer updated)
