
Commit dd8368c

Rename binaries
1 parent 48d078f commit dd8368c

File tree

5 files changed: +35 additions, -28 deletions

- CHANGELOG.md
- README.md
- examples/count_tokens.ps1
- examples/server.ps1
- vendor/llama.cpp

CHANGELOG.md

Lines changed: 7 additions & 0 deletions
@@ -4,6 +4,13 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [1.19.0] - 2024-06-13
+
+### Changed
+- [Server] Change binary `server` to `llama-server` to match renaming in llama.cpp project
+- [Tools] Change binary `tokenize` to `llama-tokenize` to match renaming in llama.cpp project
+- [Documentation] Update examples to match the state of the llama.cpp project
+
 ## [1.18.0] - 2024-06-05

 ### Added
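
For scripts that call these binaries by path, a minimal compatibility sketch (a hypothetical helper, not part of this commit) could prefer the new names and fall back to the pre-rename ones:

```PowerShell
# Hypothetical helper (not part of this commit): prefer the renamed llama.cpp
# binaries and fall back to the old names if they are absent.
function Resolve-LlamaBinary {

    param(
        [string] $basePath,
        [string] $newName,
        [string] $oldName
    )

    $newBinary = Join-Path $basePath "${newName}.exe"
    $oldBinary = Join-Path $basePath "${oldName}.exe"

    if (Test-Path $newBinary) { return $newBinary }
    if (Test-Path $oldBinary) { return $oldBinary }

    throw "Neither ${newName} nor ${oldName} found in ${basePath}."
}

# Example: resolves to llama-server.exe on 1.19.0 builds, server.exe on older ones.
$serverBinary = Resolve-LlamaBinary ".\vendor\llama.cpp\build\bin\Release" "llama-server" "server"
```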

README.md

Lines changed: 25 additions & 25 deletions
@@ -84,12 +84,12 @@ To build llama.cpp binaries for a Windows environment with the best available BL

 ### 7. Download a large language model

-Download a large language model (LLM) with weights in the GGUF format into the `./vendor/llama.cpp/models` directory. You can for example download the [OpenChat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106) 7B model in a quantized GGUF format:
+Download a large language model (LLM) with weights in the GGUF format into the `./vendor/llama.cpp/models` directory. You can for example download the [openchat-3.6-8b-20240522](https://huggingface.co/openchat/openchat-3.6-8b-20240522) 8B model in a quantized GGUF format:

-* https://huggingface.co/TheBloke/openchat-3.5-0106-GGUF/resolve/main/openchat-3.5-0106.Q5_K_M.gguf
+* https://huggingface.co/bartowski/openchat-3.6-8b-20240522-GGUF/blob/main/openchat-3.6-8b-20240522-Q5_K_M.gguf

 > [!TIP]
-> See the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) for best in class open source LLMs.
+> See the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) and [LMSYS Chatbot Arena Leaderboard](https://chat.lmsys.org/?leaderboard) for best in class open source LLMs.

 ## Usage

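One way to fetch the model from PowerShell is sketched below; it assumes the direct-download (`resolve`) variant of the URL linked above rather than the `blob` page:

```PowerShell
# Download sketch. Assumption: the "resolve" download URL that corresponds to
# the "blob" page linked above. Fetches the weights into the models directory.
$modelUrl = "https://huggingface.co/bartowski/openchat-3.6-8b-20240522-GGUF/resolve/main/openchat-3.6-8b-20240522-Q5_K_M.gguf"
$modelPath = ".\vendor\llama.cpp\models\openchat-3.6-8b-20240522-Q5_K_M.gguf"

Invoke-WebRequest -Uri $modelUrl -OutFile $modelPath
```
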
@@ -98,7 +98,7 @@ Download a large language model (LLM) with weights in the GGUF format into the `
 You can easily chat with a specific model by using the [.\examples\server.ps1](./examples/server.ps1) script:

 ```PowerShell
-.\examples\server.ps1 -model ".\vendor\llama.cpp\models\openchat-3.5-0106.Q5_K_M.gguf"
+.\examples\server.ps1 -model ".\vendor\llama.cpp\models\openchat-3.6-8b-20240522-Q5_K_M.gguf"
 ```

 > [!NOTE]
@@ -115,13 +115,13 @@ Get-Help -Detailed .\examples\server.ps1
 You can now chat with the model:

 ```PowerShell
-./vendor/llama.cpp/build/bin/Release/main `
-    --model "./vendor/llama.cpp/models/openchat-3.5-0106.Q5_K_M.gguf" `
+./vendor/llama.cpp/build/bin/Release/llama-cli `
+    --model "./vendor/llama.cpp/models/openchat-3.6-8b-20240522-Q5_K_M.gguf" `
     --ctx-size 8192 `
     --threads 16 `
-    --n-gpu-layers 32 `
+    --n-gpu-layers 33 `
     --reverse-prompt '[[USER_NAME]]:' `
-    --prompt-cache "./cache/openchat-3.5-0106.Q5_K_M.gguf.prompt" `
+    --prompt-cache "./cache/openchat-3.6-8b-20240522-Q5_K_M.gguf.prompt" `
     --file "./vendor/llama.cpp/prompts/chat-with-vicuna-v1.txt" `
     --color `
     --interactive
@@ -132,11 +132,11 @@ You can now chat with the model:
 You can start llama.cpp as a webserver:

 ```PowerShell
-./vendor/llama.cpp/build/bin/Release/server `
-    --model "./vendor/llama.cpp/models/openchat-3.5-0106.Q5_K_M.gguf" `
+./vendor/llama.cpp/build/bin/Release/llama-server `
+    --model "./vendor/llama.cpp/models/openchat-3.6-8b-20240522-Q5_K_M.gguf" `
     --ctx-size 8192 `
     --threads 16 `
-    --n-gpu-layers 32
+    --n-gpu-layers 33
 ```

 And then access llama.cpp via the webinterface at:
@@ -154,20 +154,20 @@ rope_frequency_base = 10000 * context_scale
 ```

 > [!NOTE]
-> To increase the context size of an [OpenChat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106) model from its original context size of `8192` to `32768` means, that the `context_scale` is `4.0`. The `rope_frequency_scale` will then be `0.25` and the `rope_frequency_base` equals `40000`.
+> To increase the context size of an [openchat-3.6-8b-20240522](https://huggingface.co/openchat/openchat-3.6-8b-20240522) model from its original context size of `8192` to `32768` means, that the `context_scale` is `4.0`. The `rope_frequency_scale` will then be `0.25` and the `rope_frequency_base` equals `40000`.

 To extend the context to 32k execute the following:

 ```PowerShell
-./vendor/llama.cpp/build/bin/Release/main `
-    --model "./vendor/llama.cpp/models/openchat-3.5-0106.Q5_K_M.gguf" `
+./vendor/llama.cpp/build/bin/Release/llama-cli `
+    --model "./vendor/llama.cpp/models/openchat-3.6-8b-20240522-Q5_K_M.gguf" `
     --ctx-size 32768 `
     --rope-freq-scale 0.25 `
     --rope-freq-base 40000 `
     --threads 16 `
-    --n-gpu-layers 32 `
+    --n-gpu-layers 33 `
     --reverse-prompt '[[USER_NAME]]:' `
-    --prompt-cache "./cache/openchat-3.5-0106.Q5_K_M.gguf.prompt" `
+    --prompt-cache "./cache/openchat-3.6-8b-20240522-Q5_K_M.gguf.prompt" `
     --file "./vendor/llama.cpp/prompts/chat-with-vicuna-v1.txt" `
     --color `
     --interactive
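
The RoPE values used above follow from the formulas earlier in this section; a small sketch of the arithmetic (variable names are illustrative):

```PowerShell
# Worked example of the RoPE scaling formulas from this section.
$originalContextSize = 8192
$targetContextSize = 32768

$contextScale = $targetContextSize / $originalContextSize   # 4.0
$ropeFrequencyScale = 1 / $contextScale                      # 0.25
$ropeFrequencyBase = 10000 * $contextScale                   # 40000

Write-Host "--rope-freq-scale $ropeFrequencyScale --rope-freq-base $ropeFrequencyBase"
```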
@@ -178,12 +178,12 @@ To extend the context to 32k execute the following:
 You can enforce a specific grammar for the response generation. The following will always return a JSON response:

 ```PowerShell
-./vendor/llama.cpp/build/bin/Release/main `
-    --model "./vendor/llama.cpp/models/openchat-3.5-0106.Q5_K_M.gguf" `
+./vendor/llama.cpp/build/bin/Release/llama-cli `
+    --model "./vendor/llama.cpp/models/openchat-3.6-8b-20240522-Q5_K_M.gguf" `
     --ctx-size 8192 `
     --threads 16 `
-    --n-gpu-layers 32 `
-    --prompt-cache "./cache/openchat-3.5-0106.Q5_K_M.gguf.prompt" `
+    --n-gpu-layers 33 `
+    --prompt-cache "./cache/openchat-3.6-8b-20240522-Q5_K_M.gguf.prompt" `
     --prompt "The scientific classification (Taxonomy) of a Llama: " `
     --grammar-file "./vendor/llama.cpp/grammars/json.gbnf"
     --color
@@ -194,11 +194,11 @@ You can enforce a specific grammar for the response generation. The following wi
 Execute the following to measure the perplexity of the GGML formatted model:

 ```PowerShell
-./vendor/llama.cpp/build/bin/Release/perplexity `
-    --model "./vendor/llama.cpp/models/openchat-3.5-0106.Q5_K_M.gguf" `
+./vendor/llama.cpp/build/bin/Release/llama-perplexity `
+    --model "./vendor/llama.cpp/models/openchat-3.6-8b-20240522-Q5_K_M.gguf" `
     --ctx-size 8192 `
     --threads 16 `
-    --n-gpu-layers 32 `
+    --n-gpu-layers 33 `
     --file "./vendor/wikitext-2-raw-v1/wikitext-2-raw/wiki.test.raw"
 ```

@@ -208,15 +208,15 @@ You can easily count the tokens of a prompt for a specific model by using the [.

 ```PowerShell
 .\examples\count_tokens.ps1 `
-    -model ".\vendor\llama.cpp\models\openchat-3.5-0106.Q5_K_M.gguf" `
+    -model ".\vendor\llama.cpp\models\openchat-3.6-8b-20240522-Q5_K_M.gguf" `
     -file ".\prompts\chat_with_llm.txt"
 ```

 To inspect the actual tokenization result you can use the `-debug` flag:

 ```PowerShell
 .\examples\count_tokens.ps1 `
-    -model ".\vendor\llama.cpp\models\openchat-3.5-0106.Q5_K_M.gguf" `
+    -model ".\vendor\llama.cpp\models\openchat-3.6-8b-20240522-Q5_K_M.gguf" `
     -prompt "Hello Word!" `
     -debug
 ```

examples/count_tokens.ps1

Lines changed: 1 addition & 1 deletion
@@ -76,7 +76,7 @@ if ($debug) {
 }

 # We are only interested in the numerical token IDs array format like [1, 2, 3].
-$tokensPythonArrayString = Invoke-Expression "${llamaCppPath}\build\bin\Release\tokenize.exe ``
+$tokensPythonArrayString = Invoke-Expression "${llamaCppPath}\build\bin\Release\llama-tokenize ``
     --log-disable ``
     --ids ``
     $(if ($modelPath) {"--model '${modelPath}'"}) ``
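
The rest of the script is not part of this diff; a plausible continuation that turns the captured `[1, 2, 3]`-style string into a token count might look like this:

```PowerShell
# Plausible continuation (not shown in this diff): parse the "[1, 2, 3]"-style
# output of llama-tokenize --ids and report the number of tokens.
$tokens = $tokensPythonArrayString | ConvertFrom-Json

Write-Host "Number of tokens: $($tokens.Count)"
```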

examples/server.ps1

Lines changed: 1 addition & 1 deletion
@@ -280,7 +280,7 @@ Start-Job -Name 'BrowserJob' -ScriptBlock {

 Write-Host "Starting llama.cpp server with custom options..." -ForegroundColor "Yellow"

-$command = "${llamaCppPath}\build\bin\Release\server ``
+$command = "${llamaCppPath}\build\bin\Release\llama-server ``
     --n-predict 1024 ``
     --log-disable ``
     --port '${port}' ``
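
Once llama-server is running, it can also be exercised over HTTP from PowerShell. The sketch below assumes the llama.cpp server's `/completion` endpoint and port `8080`; both are assumptions, since the script above takes the port from `${port}`:

```PowerShell
# Minimal smoke test against a running llama-server instance (assumes the
# llama.cpp HTTP API /completion endpoint and port 8080; adjust the port to
# whatever was passed to server.ps1).
$body = @{ prompt = "Hello, llama!"; n_predict = 32 } | ConvertTo-Json

Invoke-RestMethod -Method Post `
    -Uri "http://127.0.0.1:8080/completion" `
    -ContentType "application/json" `
    -Body $body
```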

vendor/llama.cpp

Lines changed: 1 addition & 1 deletion (submodule commit updated)
