1 parent 845a05e commit 0f89d32
examples/server_phind_codellama_34b_v2_32K.ps1
@@ -0,0 +1,15 @@
+Start-Process "http://127.0.0.1:8080"
+
+# We are increasing the context size of a Llama 2 model from 4096 tokens
+# to 32768 tokens, which is a ctx_scale of 8.0. The parameter formula is:
+#
+# --rope-freq-scale = 1 / ctx_scale
+# --rope-freq-base = 10000 * ctx_scale
+
+../vendor/llama.cpp/build/bin/Release/server `
+ --model "../vendor/llama.cpp/models/Phind-CodeLlama-34B-v2/model-quantized-q4_K_M.gguf" `
+ --ctx-size 32768 `
+ --rope-freq-scale 0.125 `
+ --rope-freq-base 80000 `
+ --threads 16 `
+ --n-gpu-layers 10
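The scaling arithmetic in the script's comment can be sketched as a small helper (Python here purely for illustration; the function name and defaults are hypothetical, assuming Llama 2's native 4096-token context and the default RoPE frequency base of 10000):

```python
# Derive llama.cpp RoPE flags for an extended context window.
# Assumes: native context 4096 (Llama 2), default RoPE freq base 10000.
def rope_params(target_ctx, base_ctx=4096, base_freq=10000):
    ctx_scale = target_ctx / base_ctx          # e.g. 32768 / 4096 = 8.0
    rope_freq_scale = 1 / ctx_scale            # --rope-freq-scale
    rope_freq_base = base_freq * ctx_scale     # --rope-freq-base
    return rope_freq_scale, rope_freq_base

scale, freq_base = rope_params(32768)
print(scale, freq_base)  # 0.125 80000.0
```

For the 32K target used here this reproduces the flag values in the script: `--rope-freq-scale 0.125` and `--rope-freq-base 80000`.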