|
1 | | -# Nitro - Embeddable AI |
2 | | -<p align="center"> |
3 | | - <img alt="nitrologo" src="https://raw.githubusercontent.com/janhq/nitro/main/assets/Nitro%20README%20banner.png"> |
4 | | -</p> |
| 1 | +# Cortex Monorepo |
| 2 | + |
| 3 | +This monorepo contains two projects: CortexJS and CortexCPP. |
| 4 | + |
| 5 | +## CortexJS: Stateful Business Backend |
| 6 | + |
| 7 | +* All of the stateful endpoints (see the illustrative call after this list):
| 8 | + + /threads |
| 9 | + + /messages |
| 10 | + + /models |
| 11 | + + /runs |
| 12 | + + /vector_store |
| 13 | + + /settings |
| 14 | + + /?auth |
| 15 | + + … |
| 16 | +* Database & Filesystem |
| 17 | +* API Gateway |
| 18 | +* Authentication & Authorization |
| 19 | +* Observability |
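| | +
| | +For illustration only, a call to one of these stateful endpoints might look like the sketch below; the port, route shape, and payload are assumptions, not a published spec:
| | +
| | +```bash
| | +# Hypothetical example: create a thread via the stateful CortexJS API.
| | +# The port (1337), route, and payload shape are assumptions for illustration.
| | +curl http://localhost:1337/threads \
| | +  -H "Content-Type: application/json" \
| | +  -d '{ "title": "My first thread" }'
| | +```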
| 20 | + |
| 21 | +## CortexCPP: Stateless Embedding Backend |
| 22 | + |
| 23 | +* All of the high-performance, stateless endpoints (see the illustrative call after this list):
| 24 | + + /chat/completion |
| 25 | + + /audio |
| 26 | + + /fine_tuning |
| 27 | + + /embeddings |
| 28 | + + /load_model |
| 29 | + + /unload_model |
| 30 | +* Kernel - Hardware Recognition |
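| | +
| | +For illustration, a stateless chat call might look like the sketch below; the port and the OpenAI-style payload shape are assumptions:
| | +
| | +```bash
| | +# Hypothetical example: a stateless chat completion request to CortexCPP.
| | +# The port (3928) and OpenAI-style payload are assumptions for illustration.
| | +curl http://localhost:3928/chat/completion \
| | +  -H "Content-Type: application/json" \
| | +  -d '{
| | +        "messages": [
| | +          { "role": "user", "content": "Hello!" }
| | +        ]
| | +      }'
| | +```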
| 31 | + |
| 32 | +## Project Structure |
5 | 33 |
|
6 | | -<p align="center"> |
7 | | - <a href="https://nitro.jan.ai/docs">Documentation</a> - <a href="https://nitro.jan.ai/api-reference">API Reference</a> |
8 | | - - <a href="https://github.com/janhq/nitro/releases/">Changelog</a> - <a href="https://github.com/janhq/nitro/issues">Bug reports</a> - <a href="https://discord.gg/AsJ8krTT3N">Discord</a> |
9 | | -</p> |
10 | | - |
11 | | -> ⚠️ **Nitro is currently in Development**: Expect breaking changes and bugs! |
12 | | -
|
13 | | -## Features |
14 | | -- Fast Inference: Built on top of the cutting-edge inference library llama.cpp, modified to be production ready. |
15 | | -- Lightweight: Only 3MB, ideal for resource-sensitive environments. |
16 | | -- Easily Embeddable: Simple integration into existing applications, offering flexibility. |
17 | | -- Quick Setup: Approximately 10-second initialization for swift deployment. |
18 | | -- Enhanced Web Framework: Incorporates the Drogon C++ framework to boost web-service efficiency.
19 | | - |
20 | | -## About Nitro |
21 | | - |
22 | | -Nitro is a high-efficiency C++ inference engine for edge computing, powering [Jan](https://jan.ai/). It is lightweight and embeddable, ideal for product integration. |
23 | | - |
24 | | -The zipped Nitro binary is only ~3 MB with minimal dependencies (CUDA is required only for GPU use), making it ideal for any edge or server deployment 👍.
| 34 | +``` |
| 35 | +. |
| 36 | +├── cortex-js/ |
| 37 | +│ ├── package.json |
| 38 | +│ ├── README.md |
| 39 | +│ ├── Dockerfile |
| 40 | +│ ├── docker-compose.yml |
| 41 | +│ ├── src/ |
| 42 | +│ │ ├── controllers/ |
| 43 | +│ │ ├── modules/ |
| 44 | +│ │ ├── services/ |
| 45 | +│ │ └── ... |
| 46 | +│ └── ... |
| 47 | +├── cortex-cpp/ |
| 48 | +│ ├── app/ |
| 49 | +│ │ ├── controllers/ |
| 50 | +│ │ ├── models/ |
| 51 | +│ │ ├── services/ |
| 52 | +│ │ ├── ?engines/ |
| 53 | +│ │ │ ├── llama.cpp |
| 54 | +│ │ │ ├── tensorrt-llm |
| 55 | +│ │ │ └── ... |
| 56 | +│ │ └── ... |
| 57 | +│ ├── CMakeLists.txt |
| 58 | +│ ├── config.json |
| 59 | +│ ├── Dockerfile |
| 60 | +│ ├── docker-compose.yml |
| 61 | +│ ├── README.md |
| 62 | +│ └── ... |
| 63 | +├── scripts/ |
| 64 | +│ └── ... |
| 65 | +├── README.md |
| 66 | +├── package.json |
| 67 | +├── Dockerfile |
| 68 | +├── docker-compose.yml |
| 69 | +└── docs/ |
| 70 | + └── ... |
| 71 | +``` |
25 | 72 |
|
26 | | -> Read more about Nitro at https://nitro.jan.ai/ |
| 73 | +## Installation |
27 | 74 |
|
28 | | -### Repo Structure |
| 75 | +### NPM Install |
29 | 76 |
|
| 77 | +* Pre-install script: |
| 78 | +```bash |
| 79 | +# npm pre-install script; platform-specific (macOS / Windows / Linux)
30 | 80 | ``` |
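| | +
| | +A minimal sketch of what such a platform-specific pre-install step might do, assuming it selects a per-platform binary (the artifact names are placeholders):
| | +
| | +```bash
| | +# Hypothetical pre-install sketch: pick a platform-specific artifact.
| | +# The artifact names below are placeholders, not real release files.
| | +case "$(uname -s)" in
| | +  Darwin)               ARTIFACT="cortex-macos"   ;;
| | +  Linux)                ARTIFACT="cortex-linux"   ;;
| | +  MINGW*|MSYS*|CYGWIN*) ARTIFACT="cortex-windows" ;;
| | +  *) echo "Unsupported platform" >&2; exit 1 ;;
| | +esac
| | +echo "Selected artifact: $ARTIFACT"
| | +```
| | +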
31 | | -. |
32 | | -├── controllers |
33 | | -├── docs |
34 | | -├── llama.cpp -> Upstream llama C++ |
35 | | -├── nitro_deps -> Dependencies of the Nitro project as a sub-project |
36 | | -└── utils |
| 81 | +* Tag-based install:
| 82 | +```bash
| 83 | +npm install @janhq/cortex |
| 84 | +npm install @janhq/cortex#cuda |
| 85 | +npm install @janhq/cortex#cuda-avx512 |
| 86 | +npm install @janhq/cortex#cuda-avx |
37 | 87 | ``` |
38 | 88 |
|
39 | | -## Quickstart |
| 89 | +### CLI Install Script |
40 | 90 |
|
41 | | -**Step 1: Install Nitro** |
| 91 | +```bash |
| 92 | +cortex init (AVX2 + CUDA)
42 | 93 |
|
43 | | -- For Linux and MacOS |
| 94 | +Enable GPU Acceleration? |
| 95 | +1. Nvidia (default) - detected |
| 96 | +2. AMD |
| 97 | +3. Mac Metal |
44 | 98 |
|
45 | | - ```bash |
46 | | - curl -sfL https://raw.githubusercontent.com/janhq/nitro/main/install.sh | sudo /bin/bash - |
47 | | - ``` |
| 99 | +Enter your choice: |
48 | 100 |
|
49 | | -- For Windows |
| 101 | +CPU Instructions |
| 102 | +1. AVX2 (default) - recommended based on detected hardware
| 103 | +2. AVX (old CPU) |
| 104 | +3. AVX512 |
50 | 105 |
|
51 | | - ```bash |
52 | | - powershell -Command "& { Invoke-WebRequest -Uri 'https://raw.githubusercontent.com/janhq/nitro/main/install.bat' -OutFile 'install.bat'; .\install.bat; Remove-Item -Path 'install.bat' }" |
53 | | - ``` |
| 106 | +Enter your choice: |
54 | 107 |
|
55 | | -**Step 2: Downloading a Model** |
| 108 | +Downloading cortex-cuda-avx.so........................25% |
56 | 109 |
|
57 | | -```bash |
58 | | -mkdir model && cd model |
59 | | -wget -O llama-2-7b-model.gguf https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_M.gguf?download=true |
60 | | -``` |
| 110 | +Cortex is ready! |
61 | 111 |
|
62 | | -**Step 3: Run Nitro server** |
| 112 | +It seems like you have installed models from other applications. Do you want to import them? |
| 113 | +1. Import from /Users/HOME/jan/models |
| 114 | +2. Import from /Users/HOME/lmstudio/models |
| 115 | +3. Import everything |
63 | 116 |
|
64 | | -```bash title="Run Nitro server" |
65 | | -nitro |
| 117 | +Importing from /Users/HOME/jan/models..................17% |
66 | 118 | ``` |
67 | 119 |
|
68 | | -**Step 4: Load model** |
| 120 | +## Backend (Jan app)
69 | 121 |
|
70 | | -```bash title="Load model" |
71 | | -curl http://localhost:3928/inferences/llamacpp/loadmodel \ |
72 | | - -H 'Content-Type: application/json' \ |
73 | | - -d '{ |
74 | | - "llama_model_path": "/model/llama-2-7b-model.gguf", |
75 | | - "ctx_len": 512, |
76 | | - "ngl": 100
77 | | - }' |
| 122 | +```http
| 123 | +POST /settings |
| 124 | +{ |
| 125 | + "gpu_enabled": true, |
| 126 | + "gpu_family": "Nvidia", |
| 127 | + "cpu_instructions": "AVX2" |
| 128 | +} |
78 | 129 | ``` |
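| | +
| | +For illustration, the same settings could be applied with a curl call like the sketch below; the host, port, and route prefix are assumptions:
| | +
| | +```bash
| | +# Hypothetical example: apply hardware settings via the backend.
| | +# Host and port are assumptions; the payload mirrors the block above.
| | +curl -X POST http://localhost:1337/settings \
| | +  -H "Content-Type: application/json" \
| | +  -d '{
| | +        "gpu_enabled": true,
| | +        "gpu_family": "Nvidia",
| | +        "cpu_instructions": "AVX2"
| | +      }'
| | +```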
79 | 130 |
|
80 | | -**Step 5: Making an Inference** |
81 | | - |
82 | | -```bash title="Nitro Inference" |
83 | | -curl http://localhost:3928/v1/chat/completions \ |
84 | | - -H "Content-Type: application/json" \ |
85 | | - -d '{ |
86 | | - "messages": [ |
87 | | - { |
88 | | - "role": "user", |
89 | | - "content": "Who won the world series in 2020?" |
90 | | - }, |
91 | | - }
92 | | - }' |
93 | | -``` |
| 131 | +## Client Library Configuration |
94 | 132 |
|
95 | | -Table of parameters |
96 | | - |
97 | | -| Parameter | Type | Description | |
98 | | -|------------------|---------|--------------------------------------------------------------| |
99 | | -| `llama_model_path` | String | The file path to the LLaMA model. | |
100 | | -| `ngl` | Integer | The number of GPU layers to use. | |
101 | | -| `ctx_len` | Integer | The context length for the model operations. | |
102 | | -| `embedding` | Boolean | Whether to use embedding in the model. | |
103 | | -| `n_parallel` | Integer | The number of parallel operations. | |
104 | | -| `cont_batching` | Boolean | Whether to use continuous batching. | |
105 | | -| `user_prompt` | String | The prompt to use for the user. | |
106 | | -| `ai_prompt` | String | The prompt to use for the AI assistant. | |
107 | | -| `system_prompt` | String | The prompt to use for system rules. | |
108 | | -| `pre_prompt` | String | The prompt to use for internal configuration. | |
109 | | -| `cpu_threads` | Integer | The number of threads to use for inferencing (CPU MODE ONLY) | |
110 | | -| `n_batch` | Integer | The batch size for prompt eval step | |
111 | | -| `caching_enabled` | Boolean | Whether to enable prompt caching |
112 | | -| `clean_cache_threshold` | Integer | Number of chats that triggers a cache clean-up |
113 | | -| `grp_attn_n` | Integer | Group attention factor in self-extend |
114 | | -| `grp_attn_w` | Integer | Group attention width in self-extend |
115 | | -| `mlock` | Boolean | Prevent the system from swapping the model to disk (macOS) |
116 | | -| `grammar_file` | String | Constrain sampling with GBNF grammars by providing a path to a grammar file |
117 | | -| `model_type` | String | Model type to use: `llm` or `embedding` (default: `llm`) |
118 | | - |
119 | | -***OPTIONAL***: You can run Nitro on a different port (e.g., 5000 instead of 3928) by launching it manually from the terminal:
120 | | -```zsh |
121 | | -./nitro 1 127.0.0.1 5000   # [thread_num] [host] [port] [uploads_folder_path]
122 | | -``` |
123 | | -- thread_num : the number of threads for the Nitro web server
124 | | -- host : the host address, normally 127.0.0.1 or 0.0.0.0
125 | | -- port : the port Nitro is served on
126 | | -- uploads_folder_path : custom path for file uploads in Drogon.
127 | | - |
128 | | -Nitro server is compatible with the OpenAI format, so you can expect the same output as the OpenAI ChatGPT API. |
129 | | - |
130 | | -## Compile from source |
131 | | -To compile Nitro, please visit [Compile from source](docs/docs/new/build-source.md).
132 | | - |
133 | | -## Download |
134 | | - |
135 | | -<table> |
136 | | - <tr> |
137 | | - <td style="text-align:center"><b>Version Type</b></td> |
138 | | - <td colspan="2" style="text-align:center"><b>Windows</b></td> |
139 | | - <td colspan="2" style="text-align:center"><b>MacOS</b></td> |
140 | | - <td colspan="2" style="text-align:center"><b>Linux</b></td> |
141 | | - </tr> |
142 | | - <tr> |
143 | | - <td style="text-align:center"><b>Stable (Recommended)</b></td> |
144 | | - <td style="text-align:center"> |
145 | | - <a href='https://github.com/janhq/nitro/releases/download/v0.3.22/nitro-0.3.22-win-amd64.tar.gz'> |
146 | | - <img src='./docs/static/img/windows.png' style="height:15px; width: 15px" /> |
147 | | - <b>CPU</b> |
148 | | - </a> |
149 | | - </td> |
150 | | - <td style="text-align:center"> |
151 | | - <a href='https://github.com/janhq/nitro/releases/download/v0.3.22/nitro-0.3.22-win-amd64-cuda.tar.gz'> |
152 | | - <img src='./docs/static/img/windows.png' style="height:15px; width: 15px" /> |
153 | | - <b>CUDA</b> |
154 | | - </a> |
155 | | - </td> |
156 | | - <td style="text-align:center"> |
157 | | - <a href='https://github.com/janhq/nitro/releases/download/v0.3.22/nitro-0.3.22-mac-amd64.tar.gz'> |
158 | | - <img src='./docs/static/img/mac.png' style="height:15px; width: 15px" /> |
159 | | - <b>Intel</b> |
160 | | - </a> |
161 | | - </td> |
162 | | - <td style="text-align:center"> |
163 | | - <a href='https://github.com/janhq/nitro/releases/download/v0.3.22/nitro-0.3.22-mac-arm64.tar.gz'> |
164 | | - <img src='./docs/static/img/mac.png' style="height:15px; width: 15px" /> |
165 | | - <b>M1/M2</b> |
166 | | - </a> |
167 | | - </td> |
168 | | - <td style="text-align:center"> |
169 | | - <a href='https://github.com/janhq/nitro/releases/download/v0.3.22/nitro-0.3.22-linux-amd64.tar.gz'> |
170 | | - <img src='./docs/static/img/linux.png' style="height:15px; width: 15px" /> |
171 | | - <b>CPU</b> |
172 | | - </a> |
173 | | - </td> |
174 | | - <td style="text-align:center"> |
175 | | - <a href='https://github.com/janhq/nitro/releases/download/v0.3.22/nitro-0.3.22-linux-amd64-cuda.tar.gz'> |
176 | | - <img src='./docs/static/img/linux.png' style="height:15px; width: 15px" /> |
177 | | - <b>CUDA</b> |
178 | | - </a> |
179 | | - </td> |
180 | | - </tr> |
181 | | - <tr style="text-align: center"> |
182 | | - <td style="text-align:center"><b>Experimental (Nightly Build)</b></td>
183 | | - <td style="text-align:center" colspan="6"> |
184 | | - <a href='https://github.com/janhq/nitro/actions/runs/8146271749'> |
185 | | - <b>GitHub Actions artifacts</b>
186 | | - </a> |
187 | | - </td> |
188 | | - </tr> |
189 | | -</table> |
190 | | - |
191 | | -Download the latest version of Nitro at https://nitro.jan.ai/ or visit the **[GitHub Releases](https://github.com/janhq/nitro/releases)** to download any previous release. |
192 | | - |
193 | | -## Nightly Build |
194 | | - |
195 | | -Nightly build is a process where the software is built automatically every night. This helps in detecting and fixing bugs early in the development cycle. The process for this project is defined in [`.github/workflows/build.yml`](.github/workflows/build.yml) |
196 | | - |
197 | | -You can join our Discord server [here](https://discord.gg/FTk2MvZwJH) and go to channel [github-nitro](https://discordapp.com/channels/1107178041848909847/1151022176019939328) to monitor the build process. |
198 | | - |
199 | | -The nightly build is triggered at 2:00 AM UTC every day. |
200 | | - |
201 | | -The nightly build can be downloaded from the URL posted in the Discord channel. Open the URL in a browser and download the build artifacts from there.
202 | | - |
203 | | -## Manual Build |
204 | | - |
205 | | -Manual build is a process where the software is built manually by the developers. This is usually done when a new feature is implemented or a bug is fixed. The process for this project is defined in [`.github/workflows/build.yml`](.github/workflows/build.yml) |
206 | | - |
207 | | -It is similar to the nightly build process, except that it is triggered manually by the developers. |
208 | | - |
209 | | -### Contact |
210 | | - |
211 | | -- For support, please file a GitHub ticket. |
212 | | -- For questions, join our Discord [here](https://discord.gg/FTk2MvZwJH). |
213 | | -- For long-form inquiries, please email hello@jan.ai. |
214 | | - |
215 | | -## Star History |
216 | | - |
217 | | -[](https://star-history.com/#janhq/nitro&Date) |
| 133 | +TBD |