This repository was archived by the owner on Jul 4, 2025. It is now read-only.

Commit 03e8913

Merge pull request #194 from janhq/docs-fix-27-11
Minor fix on Nitro Docs
2 parents 0544364 + 70bde1a commit 03e8913

22 files changed: +165 −220 lines

docs/docs/demos/chatbox-vid.mdx

Lines changed: 0 additions & 24 deletions
This file was deleted.

docs/docs/examples/chatbox.md

Lines changed: 0 additions & 63 deletions
This file was deleted.

docs/docs/examples/jan.md

Lines changed: 1 addition & 0 deletions
@@ -1,5 +1,6 @@
 ---
 title: Nitro with Jan
+description: Nitro integrates with Jan to enable a ChatGPT-like functional app, optimized for local AI.
 ---
 
 You can effortlessly utilize Nitro through [Jan](https://jan.ai/), as it is fully integrated with all its functions. With Jan, using Nitro becomes straightforward without the need for any coding.

docs/docs/examples/openai-node.md

Lines changed: 19 additions & 10 deletions
@@ -1,9 +1,10 @@
 ---
 title: Nitro with openai-node
+description: Nitro integration guide for Node.js.
 ---
 
 You can migrate from OAI API or Azure OpenAI to Nitro using your existing NodeJS code quickly
-> The ONLY thing you need to do is to override `baseURL` in `openai` init with `Nitro` URL
+> The **ONLY** thing you need to do is to override `baseURL` in `openai` init with `Nitro` URL
 - NodeJS OpenAI SDK: https://www.npmjs.com/package/openai
 
 ## Chat Completion
@@ -240,17 +241,23 @@ embedding();
 </table>
 
 ## Audio
-Coming soon
+
+:::info Coming soon
+:::
 
 ## How to reproduce
-1. Step 1: Dependencies installation
-```
+
+**Step 1:** Dependencies installation
+
+```bash
 npm install --save openai typescript
 # or
 yarn add openai
 ```
-2. Step 2: Fill `tsconfig.json`
-```json
+
+**Step 2:** Fill `tsconfig.json`
+
+```js
 {
   "compilerOptions": {
     "moduleResolution": "node",
@@ -263,7 +270,9 @@ yarn add openai
     "lib": ["es2015"]
   }
 ```
-3. Step 3: Fill `index.ts` file with code
-3. Step 4: Build with `npx tsc`
-4. Step 5: Run the code with `node dist/index.js`
-5. Step 6: Enjoy!
+
+**Step 3:** Fill `index.ts` file with code.
+
+**Step 4:** Build with `npx tsc`.
+
+**Step 5:** Run the code with `node dist/index.js`.
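Note: the `index.ts` contents referenced in Step 3 fall outside these hunks. As a minimal sketch of what that file might contain — assuming the `http://localhost:3928/v1/` endpoint and `sk-xxx` placeholder key used in the openai-python examples below, and a placeholder model name — the `baseURL` override looks like this:

```typescript
import OpenAI from 'openai';

// Point the stock openai-node client at the local Nitro server.
// The only change from ordinary OpenAI usage is `baseURL`.
const openai = new OpenAI({
  baseURL: 'http://localhost:3928/v1/', // Nitro's OpenAI-compatible endpoint
  apiKey: 'sk-xxx', // placeholder; assumed not validated by the local server
});

async function chatCompletion(): Promise<void> {
  const stream = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo', // placeholder name; assumption: the locally loaded model serves the request
    messages: [{ role: 'user', content: 'Say this is a test' }],
    stream: true,
  });
  // Print streamed tokens as they arrive.
  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
  }
}

chatCompletion();
```

Build with `npx tsc` and run with `node dist/index.js`, per Steps 4 and 5.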

docs/docs/examples/openai-python.md

Lines changed: 28 additions & 19 deletions
@@ -1,10 +1,11 @@
 ---
 title: Nitro with openai-python
+description: Nitro integration guide for Python.
 ---
 
 
 You can migrate from OAI API or Azure OpenAI to Nitro using your existing Python code quickly
-> The ONLY thing you need to do is to override `baseURL` in `openai` init with `Nitro` URL
+> The **ONLY** thing you need to do is to override `baseURL` in `openai` init with `Nitro` URL
 - Python OpenAI SDK: https://pypi.org/project/openai/
 
 ## Chat Completion
@@ -22,7 +23,10 @@ import asyncio
 from openai import AsyncOpenAI
 
 # gets API Key from environment variable OPENAI_API_KEY
-client = AsyncOpenAI(base_url="http://localhost:3928/v1/", api_key="sk-xxx")
+client = AsyncOpenAI(
+    base_url="http://localhost:3928/v1/",
+    api_key="sk-xxx"
+)
 
 
 async def main() -> None:
@@ -74,22 +78,16 @@ asyncio.run(main())
 ```python
 from openai import AzureOpenAI
 
-openai.api_key = '...' # Default is environment variable AZURE_OPENAI_API_KEY
+openai.api_key = '...' # Default is AZURE_OPENAI_API_KEY
 
 stream = AzureOpenAI(
     api_version=api_version,
-    # https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/create-resource?pivots=web-portal#create-a-resource
     azure_endpoint="https://example-endpoint.openai.azure.com",
 )
 
 completion = client.chat.completions.create(
     model="deployment-name", # e.g. gpt-35-instant
-    messages=[
-        {
-            "role": "user",
-            "content": "How do I output all files in a directory using Python?",
-        },
-    ],
+    messages=[{"role": "user", "content": "Say this is a test"}],
     stream=True,
 )
 for part in stream:
@@ -115,11 +113,15 @@ import asyncio
 from openai import AsyncOpenAI
 
 # gets API Key from environment variable OPENAI_API_KEY
-client = AsyncOpenAI(base_url="http://localhost:3928/v1/", api_key="sk-xxx")
+client = AsyncOpenAI(base_url="http://localhost:3928/v1/",
+                     api_key="sk-xxx")
 
 
 async def main() -> None:
-    embedding = await client.embeddings.create(input='Hello How are you?', model='text-embedding-ada-002')
+    embedding = await client.embeddings.create(
+        input='Hello How are you?',
+        model='text-embedding-ada-002'
+    )
     print(embedding)
 
 asyncio.run(main())
@@ -140,7 +142,10 @@ client = AsyncOpenAI(api_key="sk-xxx")
 
 
 async def main() -> None:
-    embedding = await client.embeddings.create(input='Hello How are you?', model='text-embedding-ada-002')
+    embedding = await client.embeddings.create(
+        input='Hello How are you?',
+        model='text-embedding-ada-002'
+    )
     print(embedding)
 
 asyncio.run(main())
@@ -173,13 +178,17 @@ print(embeddings)
 </table>
 
 ## Audio
-Coming soon
+
+:::info Coming soon
+:::
 
 ## How to reproduce
-1. Step 1: Dependencies installation
-```
+**Step 1:** Dependencies installation.
+
+```bash title="Install OpenAI"
 pip install openai
 ```
-3. Step 2: Fill `index.py` file with code
-4. Step 3: Run the code with `python index.py`
-5. Step 5: Enjoy!
+
+**Step 2:** Fill `index.py` file with code.
+
+**Step 3:** Run the code with `python index.py`.

docs/docs/examples/palchat.md

Lines changed: 9 additions & 6 deletions
@@ -1,5 +1,6 @@
 ---
 title: Nitro with Pal Chat
+description: Nitro integration guide for mobile device usage.
 ---
 
 This guide demonstrates how to use Nitro with Pal Chat, enabling local AI chat capabilities on mobile devices.
@@ -15,15 +16,15 @@ Pal is a mobile app available on the App Store. It offers a customizable chat pl
 **1. Start Nitro server**
 
 Open your terminal:
-```
+```bash title="Run Nitro"
 nitro
 ```
 
 **2. Download Model**
 
 Use these commands to download and save the [Llama2 7B chat model](https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/tree/main):
 
-```bash
+```bash title="Get a model"
 mkdir model && cd model
 wget -O llama-2-7b-model.gguf https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_M.gguf?download=true
 ```
@@ -34,7 +35,7 @@ wget -O llama-2-7b-model.gguf https://huggingface.co/TheBloke/Llama-2-7B-Chat-GG
 
 To load the model, use the following command:
 
-```
+```bash title="Load model to the server"
 curl http://localhost:3928/inferences/llamacpp/loadmodel \
   -H 'Content-Type: application/json' \
   -d '{
@@ -44,11 +45,13 @@ curl http://localhost:3928/inferences/llamacpp/loadmodel \
   }'
 ```
 
-**4. Config Pal Chat**
+**4. Configure Pal Chat**
+
+In the `OpenAI API Key` field, just type any random text (e.g. key-xxxxxx).
 
-Adjust the `provide custom host` setting under `advanced settings` in Pal Chat to connect with Nitro. Enter your LAN IPv4 address (It should be something like 192.xxx.x.xxx).
+Adjust the `provide custom host` setting under `advanced settings` in Pal Chat with your LAN IPv4 address (a series of numbers like 192.xxx.x.xxx).
 
-> For instruction read: [How to find your IP](https://support.microsoft.com/en-us/windows/find-your-ip-address-in-windows-f21a9bbc-c582-55cd-35e0-73431160a1b9)
+> For instructions: [How to find your IP](https://support.microsoft.com/en-us/windows/find-your-ip-address-in-windows-f21a9bbc-c582-55cd-35e0-73431160a1b9)
 
 ![PalChat](img/pal.png)
 

docs/docs/features/chat.md

Lines changed: 2 additions & 1 deletion
@@ -1,10 +1,11 @@
 ---
 title: Chat Completion
+description: Inference engine for chat completion, the same as OpenAI's
 ---
 
 The Chat Completion feature in Nitro provides a flexible way to interact with any local Large Language Model (LLM).
 
-## Single Request Example
+### Single Request Example
 
 To send a single query to your chosen LLM, follow these steps:
 
docs/docs/features/cont-batch.md

Lines changed: 9 additions & 14 deletions
@@ -1,20 +1,19 @@
 ---
 title: Continuous Batching
+description: Nitro's continuous batching combines multiple requests, enhancing throughput.
 ---
 
-## What is continous batching?
+Continuous batching boosts throughput and minimizes latency in large language model (LLM) inference. This technique groups multiple inference requests, significantly improving GPU utilization.
 
-Continuous batching is a powerful technique that significantly boosts throughput in large language model (LLM) inference while minimizing latency. This process dynamically groups multiple inference requests, allowing for more efficient GPU utilization.
+**Key Advantages:**
 
-## Why Continuous Batching?
+- Increased Throughput.
+- Reduced Latency.
+- Efficient GPU Use.
 
-Traditional static batching methods can lead to underutilization of GPU resources, as they wait for all sequences in a batch to complete before moving on. Continuous batching overcomes this by allowing new sequences to start processing as soon as others finish, ensuring more consistent and efficient GPU usage.
+**Implementation Insight:**
 
-## Benefits of Continuous Batching
-
-- **Increased Throughput:** Improvement over traditional batching methods.
-- **Reduced Latency:** Lower p50 latency, leading to faster response times.
-- **Efficient Resource Utilization:** Maximizes GPU memory and computational capabilities.
+To evaluate its effectiveness, compare continuous batching with traditional methods. For more details on benchmarking, refer to this [article](https://www.anyscale.com/blog/continuous-batching-llm-inference).
 
 ## How to use continuous batching
 Nitro's `continuous batching` feature allows you to combine multiple requests for the same model execution, enhancing throughput and efficiency.
@@ -30,8 +29,4 @@ curl http://localhost:3928/inferences/llamacpp/loadmodel \
 }'
 ```
 
-For optimal performance, ensure that the `n_parallel` value is set to match the `thread_num`, as detailed in the [Multithreading](features/multi-thread.md) documentation.
-
-### Benchmark and Compare
-
-To understand the impact of continuous batching on your system, perform benchmarks comparing it with traditional batching methods. This [article](https://www.anyscale.com/blog/continuous-batching-llm-inference) will help you quantify improvements in throughput and latency.
+For optimal performance, ensure that the `n_parallel` value is set to match the `thread_num`, as detailed in the [Multithreading](features/multi-thread.md) documentation.
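Note: the full `loadmodel` payload is not shown in this diff. As an illustrative sketch of the `n_parallel`/`thread_num` pairing the docs recommend — using the `loadmodel` endpoint from the Pal Chat guide above, with the other body fields (`llama_model_path`, `cont_batching`) as assumptions — a request might look like:

```typescript
// Hypothetical sketch: loading a model with continuous batching enabled.
// `n_parallel` is kept equal to `thread_num`, per the recommendation above.
async function loadModelWithBatching(): Promise<void> {
  const res = await fetch('http://localhost:3928/inferences/llamacpp/loadmodel', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      llama_model_path: '/model/llama-2-7b-model.gguf', // assumed field name and path
      cont_batching: true, // assumed flag enabling continuous batching
      n_parallel: 4,       // matched to thread_num for optimal performance
      thread_num: 4,
    }),
  });
  console.log(await res.json());
}

loadModelWithBatching();
```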
