This repository was archived by the owner on Jul 4, 2025. It is now read-only.

Commit 1962f93

louis-jan and namchuai authored
feat: cortex project structure (#533)
Signed-off-by: James <namnh0122@gmail.com>
Co-authored-by: James <namnh0122@gmail.com>
1 parent 49f72aa · commit 1962f93

File tree

233 files changed: +7681 -17959 lines changed


.gitignore

Lines changed: 0 additions & 568 deletions
Large diffs are not rendered by default.

Dockerfile

Lines changed: 15 additions & 0 deletions
````diff
@@ -0,0 +1,15 @@
+FROM node:14
+
+WORKDIR /app
+
+COPY package*.json ./
+
+RUN npm install
+
+COPY . .
+
+RUN npm run build
+
+EXPOSE 3000
+
+CMD ["npm", "start"]
````
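A minimal sketch of building and running this image locally; the `cortex-js` tag is illustrative (not from the commit), and the port mapping follows the `EXPOSE 3000` line above:

```bash
# Build the image from the directory containing the Dockerfile
# (the "cortex-js" tag is illustrative, not part of this commit)
docker build -t cortex-js .

# Run it, publishing the exposed port 3000 on the host
docker run --rm -p 3000:3000 cortex-js
```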

README.md

Lines changed: 111 additions & 195 deletions
````diff
@@ -1,38 +1,88 @@
-# Nitro - Embeddable AI
-<p align="center">
-  <img alt="nitrologo" src="https://raw.githubusercontent.com/janhq/nitro/main/assets/Nitro%20README%20banner.png">
-</p>
+# Cortex Monorepo
+
+This monorepo contains two projects: CortexJS and CortexCPP.
+
+## CortexJS: Stateful Business Backend
+
+* All of the stateful endpoints:
+  + /threads
+  + /messages
+  + /models
+  + /runs
+  + /vector_store
+  + /settings
+  + /?auth
+  + …
+* Database & Filesystem
+* API Gateway
+* Authentication & Authorization
+* Observability
+
+## CortexCPP: Stateless Embedding Backend
+
+* All of the high performance, stateless endpoints:
+  + /chat/completion
+  + /audio
+  + /fine_tuning
+  + /embeddings
+  + /load_model
+  + /unload_model
+* Kernel - Hardware Recognition
+
+## Project Structure
 
-<p align="center">
-  <a href="https://nitro.jan.ai/docs">Documentation</a> - <a href="https://nitro.jan.ai/api-reference">API Reference</a>
-  - <a href="https://github.com/janhq/nitro/releases/">Changelog</a> - <a href="https://github.com/janhq/nitro/issues">Bug reports</a> - <a href="https://discord.gg/AsJ8krTT3N">Discord</a>
-</p>
-
-> ⚠️ **Nitro is currently in Development**: Expect breaking changes and bugs!
-
-## Features
-- Fast Inference: Built on top of the cutting-edge inference library llama.cpp, modified to be production ready.
-- Lightweight: Only 3MB, ideal for resource-sensitive environments.
-- Easily Embeddable: Simple integration into existing applications, offering flexibility.
-- Quick Setup: Approximately 10-second initialization for swift deployment.
-- Enhanced Web Framework: Incorporates drogon cpp to boost web service efficiency.
-
-## About Nitro
-
-Nitro is a high-efficiency C++ inference engine for edge computing, powering [Jan](https://jan.ai/). It is lightweight and embeddable, ideal for product integration.
-
-The binary of nitro after zipped is only ~3mb in size with none to minimal dependencies (if you use a GPU need CUDA for example) make it desirable for any edge/server deployment 👍.
+```
+.
+├── cortex-js/
+│   ├── package.json
+│   ├── README.md
+│   ├── Dockerfile
+│   ├── docker-compose.yml
+│   ├── src/
+│   │   ├── controllers/
+│   │   ├── modules/
+│   │   ├── services/
+│   │   └── ...
+│   └── ...
+├── cortex-cpp/
+│   ├── app/
+│   │   ├── controllers/
+│   │   ├── models/
+│   │   ├── services/
+│   │   ├── ?engines/
+│   │   │   ├── llama.cpp
+│   │   │   ├── tensorrt-llm
+│   │   │   └── ...
+│   │   └── ...
+│   ├── CMakeLists.txt
+│   ├── config.json
+│   ├── Dockerfile
+│   ├── docker-compose.yml
+│   ├── README.md
+│   └── ...
+├── scripts/
+│   └── ...
+├── README.md
+├── package.json
+├── Dockerfile
+├── docker-compose.yml
+└── docs/
+    └── ...
+```
 
-> Read more about Nitro at https://nitro.jan.ai/
+## Installation
 
-### Repo Structure
+### NPM Install
 
+* Pre-install script:
+```bash
+npm pre-install script; platform specific (MacOS / Windows / Linux)
 ```
-.
-├── controllers
-├── docs
-├── llama.cpp -> Upstream llama C++
-├── nitro_deps -> Dependencies of the Nitro project as a sub-project
-└── utils
+* Tag based:
+```json
+npm install @janhq/cortex
+npm install @janhq/cortex#cuda
+npm install @janhq/cortex#cuda-avx512
+npm install @janhq/cortex#cuda-avx
 ```
 
````
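The new README names routes but not request or response shapes. Purely as a hedged illustration: if cortex-cpp keeps the OpenAI-compatible interface and default port 3928 that the removed Nitro README documents below (both are assumptions, not stated in this commit), a call to the stateless chat endpoint would look something like:

```bash
# Hypothetical request; the port and OpenAI-style body are carried over
# from the removed Nitro README, not specified by this commit.
curl http://localhost:3928/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello, Cortex"}
    ]
  }'
```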

````diff
@@ -39,41 +89,42 @@
-## Quickstart
+### CLI Install Script
 
-**Step 1: Install Nitro**
+```bash
+cortex init (AVX2 + Cuda)
 
-- For Linux and MacOS
+Enable GPU Acceleration?
+1. Nvidia (default) - detected
+2. AMD
+3. Mac Metal
 
-```bash
-curl -sfL https://raw.githubusercontent.com/janhq/nitro/main/install.sh | sudo /bin/bash -
-```
+Enter your choice:
 
-- For Windows
+CPU Instructions
+1. AVX2 (default) - Recommend based on what the user has
+2. AVX (old CPU)
+3. AVX512
 
-```bash
-powershell -Command "& { Invoke-WebRequest -Uri 'https://raw.githubusercontent.com/janhq/nitro/main/install.bat' -OutFile 'install.bat'; .\install.bat; Remove-Item -Path 'install.bat' }"
-```
+Enter your choice:
 
-**Step 2: Downloading a Model**
+Downloading cortex-cuda-avx.so........................25%
 
-```bash
-mkdir model && cd model
-wget -O llama-2-7b-model.gguf https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_M.gguf?download=true
-```
+Cortex is ready!
 
-**Step 3: Run Nitro server**
+It seems like you have installed models from other applications. Do you want to import them?
+1. Import from /Users/HOME/jan/models
+2. Import from /Users/HOME/lmstudio/models
+3. Import everything
 
-```bash title="Run Nitro server"
-nitro
+Importing from /Users/HOME/jan/models..................17%
 ```
 
-**Step 4: Load model**
+## Backend (jan app)
 
-```bash title="Load model"
-curl http://localhost:3928/inferences/llamacpp/loadmodel \
-  -H 'Content-Type: application/json' \
-  -d '{
-    "llama_model_path": "/model/llama-2-7b-model.gguf",
-    "ctx_len": 512,
-    "ngl": 100,
-  }'
+```json
+POST /settings
+{
+  "gpu_enabled": true,
+  "gpu_family": "Nvidia",
+  "cpu_instructions": "AVX2"
+}
 ```
 
````
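The `POST /settings` snippet above gives only a request body. A minimal sketch of the corresponding call, assuming the CortexJS server built from the new Dockerfile is listening on its exposed port 3000 (the commit pins neither host nor port):

```bash
# Hypothetical request; the body comes from the README diff above, while
# host and port are assumptions based on the Dockerfile's EXPOSE 3000.
curl -X POST http://localhost:3000/settings \
  -H "Content-Type: application/json" \
  -d '{
    "gpu_enabled": true,
    "gpu_family": "Nvidia",
    "cpu_instructions": "AVX2"
  }'
```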

````diff
@@ -80,138 +131,3 @@
-**Step 5: Making an Inference**
-
-```bash title="Nitro Inference"
-curl http://localhost:3928/v1/chat/completions \
-  -H "Content-Type: application/json" \
-  -d '{
-    "messages": [
-      {
-        "role": "user",
-        "content": "Who won the world series in 2020?"
-      },
-    ]
-  }'
-```
+## Client Library Configuration
 
-Table of parameters
-
-| Parameter | Type | Description |
-|------------------|---------|--------------------------------------------------------------|
-| `llama_model_path` | String | The file path to the LLaMA model. |
-| `ngl` | Integer | The number of GPU layers to use. |
-| `ctx_len` | Integer | The context length for the model operations. |
-| `embedding` | Boolean | Whether to use embedding in the model. |
-| `n_parallel` | Integer | The number of parallel operations. |
-| `cont_batching` | Boolean | Whether to use continuous batching. |
-| `user_prompt` | String | The prompt to use for the user. |
-| `ai_prompt` | String | The prompt to use for the AI assistant. |
-| `system_prompt` | String | The prompt to use for system rules. |
-| `pre_prompt` | String | The prompt to use for internal configuration. |
-| `cpu_threads` | Integer | The number of threads to use for inferencing (CPU MODE ONLY) |
-| `n_batch` | Integer | The batch size for prompt eval step |
-| `caching_enabled` | Boolean | To enable prompt caching or not |
-| `clean_cache_threshold` | Integer | Number of chats that will trigger clean cache action |
-| `grp_attn_n` | Integer | Group attention factor in self-extend |
-| `grp_attn_w` | Integer | Group attention width in self-extend |
-| `mlock` | Boolean | Prevent system swapping of the model to disk in macOS |
-| `grammar_file` | String | You can constrain the sampling using GBNF grammars by providing path to a grammar file |
-| `model_type` | String | Model type we want to use: llm or embedding, default value is llm |
-
-***OPTIONAL***: You can run Nitro on a different port like 5000 instead of 3928 by running it manually in terminal
-```zsh
-./nitro 1 127.0.0.1 5000 ([thread_num] [host] [port] [uploads_folder_path])
-```
-- thread_num : the number of thread that nitro webserver needs to have
-- host : host value normally 127.0.0.1 or 0.0.0.0
-- port : the port that nitro got deployed onto
-- uploads_folder_path: custom path for file uploads in Drogon.
-
-Nitro server is compatible with the OpenAI format, so you can expect the same output as the OpenAI ChatGPT API.
-
-## Compile from source
-To compile nitro please visit [Compile from source](docs/docs/new/build-source.md)
-
-## Download
-
-<table>
-  <tr>
-    <td style="text-align:center"><b>Version Type</b></td>
-    <td colspan="2" style="text-align:center"><b>Windows</b></td>
-    <td colspan="2" style="text-align:center"><b>MacOS</b></td>
-    <td colspan="2" style="text-align:center"><b>Linux</b></td>
-  </tr>
-  <tr>
-    <td style="text-align:center"><b>Stable (Recommended)</b></td>
-    <td style="text-align:center">
-      <a href='https://github.com/janhq/nitro/releases/download/v0.3.22/nitro-0.3.22-win-amd64.tar.gz'>
-        <img src='./docs/static/img/windows.png' style="height:15px; width: 15px" />
-        <b>CPU</b>
-      </a>
-    </td>
-    <td style="text-align:center">
-      <a href='https://github.com/janhq/nitro/releases/download/v0.3.22/nitro-0.3.22-win-amd64-cuda.tar.gz'>
-        <img src='./docs/static/img/windows.png' style="height:15px; width: 15px" />
-        <b>CUDA</b>
-      </a>
-    </td>
-    <td style="text-align:center">
-      <a href='https://github.com/janhq/nitro/releases/download/v0.3.22/nitro-0.3.22-mac-amd64.tar.gz'>
-        <img src='./docs/static/img/mac.png' style="height:15px; width: 15px" />
-        <b>Intel</b>
-      </a>
-    </td>
-    <td style="text-align:center">
-      <a href='https://github.com/janhq/nitro/releases/download/v0.3.22/nitro-0.3.22-mac-arm64.tar.gz'>
-        <img src='./docs/static/img/mac.png' style="height:15px; width: 15px" />
-        <b>M1/M2</b>
-      </a>
-    </td>
-    <td style="text-align:center">
-      <a href='https://github.com/janhq/nitro/releases/download/v0.3.22/nitro-0.3.22-linux-amd64.tar.gz'>
-        <img src='./docs/static/img/linux.png' style="height:15px; width: 15px" />
-        <b>CPU</b>
-      </a>
-    </td>
-    <td style="text-align:center">
-      <a href='https://github.com/janhq/nitro/releases/download/v0.3.22/nitro-0.3.22-linux-amd64-cuda.tar.gz'>
-        <img src='./docs/static/img/linux.png' style="height:15px; width: 15px" />
-        <b>CUDA</b>
-      </a>
-    </td>
-  </tr>
-  <tr style="text-align: center">
-    <td style="text-align:center"><b>Experimental (Nighlty Build)</b></td>
-    <td style="text-align:center" colspan="6">
-      <a href='https://github.com/janhq/nitro/actions/runs/8146271749'>
-        <b>GitHub action artifactory</b>
-      </a>
-    </td>
-  </tr>
-</table>
-
-Download the latest version of Nitro at https://nitro.jan.ai/ or visit the **[GitHub Releases](https://github.com/janhq/nitro/releases)** to download any previous release.
-
-## Nightly Build
-
-Nightly build is a process where the software is built automatically every night. This helps in detecting and fixing bugs early in the development cycle. The process for this project is defined in [`.github/workflows/build.yml`](.github/workflows/build.yml)
-
-You can join our Discord server [here](https://discord.gg/FTk2MvZwJH) and go to channel [github-nitro](https://discordapp.com/channels/1107178041848909847/1151022176019939328) to monitor the build process.
-
-The nightly build is triggered at 2:00 AM UTC every day.
-
-The nightly build can be downloaded from the url notified in the Discord channel. Please access the url from the browser and download the build artifacts from there.
-
-## Manual Build
-
-Manual build is a process where the software is built manually by the developers. This is usually done when a new feature is implemented or a bug is fixed. The process for this project is defined in [`.github/workflows/build.yml`](.github/workflows/build.yml)
-
-It is similar to the nightly build process, except that it is triggered manually by the developers.
-
-### Contact
-
-- For support, please file a GitHub ticket.
-- For questions, join our Discord [here](https://discord.gg/FTk2MvZwJH).
-- For long-form inquiries, please email hello@jan.ai.
-
-## Star History
-
-[![Star History Chart](https://api.star-history.com/svg?repos=janhq/nitro&type=Date)](https://star-history.com/#janhq/nitro&Date)
+TBD
````

assets/Nitro README banner.png

-868 KB
Binary file not shown.

assets/placeholder

Lines changed: 0 additions & 1 deletion
This file was deleted.

0 commit comments
