Run LLMs on-device in React Native using MLX Swift.
## Requirements

- iOS 26.0+

## Installation

```sh
npm install react-native-nitro-mlx react-native-nitro-modules
```

Then run `pod install`:

```sh
cd ios && pod install
```

## Usage

### Download a model

```ts
import { ModelManager } from 'react-native-nitro-mlx'

await ModelManager.download('mlx-community/Qwen3-0.6B-4bit', (progress) => {
  console.log(`Download progress: ${(progress * 100).toFixed(1)}%`)
})
```
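To avoid re-downloading a model that is already on-device, you can guard the call with `ModelManager.isDownloaded` (documented in the API reference below). A minimal sketch:

```ts
const modelId = 'mlx-community/Qwen3-0.6B-4bit'

// Only download if the model is not already on-device.
if (!(await ModelManager.isDownloaded(modelId))) {
  await ModelManager.download(modelId, (progress) => {
    console.log(`Download progress: ${(progress * 100).toFixed(1)}%`)
  })
}
```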
### Load a model and generate

```ts
import { LLM } from 'react-native-nitro-mlx'

await LLM.load('mlx-community/Qwen3-0.6B-4bit', {
  onProgress: (progress) => {
    console.log(`Loading: ${(progress * 100).toFixed(0)}%`)
  }
})

const response = await LLM.generate('What is the capital of France?')
console.log(response)
```
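Both `load` and `generate` return promises, so failures (for example, prompting before a model has finished loading) can be handled with an ordinary `try`/`catch`. A minimal sketch; the exact errors thrown are not specified here:

```ts
try {
  await LLM.load('mlx-community/Qwen3-0.6B-4bit')
  const response = await LLM.generate('What is the capital of France?')
  console.log(response)
} catch (error) {
  // Handle load/generation failures (the exact error shape depends on the package).
  console.error('Generation failed:', error)
}
```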
You can provide conversation history or few-shot examples when loading the model:

```ts
await LLM.load('mlx-community/Qwen3-0.6B-4bit', {
  onProgress: (progress) => {
    console.log(`Loading: ${(progress * 100).toFixed(0)}%`)
  },
  additionalContext: [
    { role: 'user', content: 'What is machine learning?' },
    { role: 'assistant', content: 'Machine learning is...' },
    { role: 'user', content: 'Can you explain neural networks?' }
  ]
})
```
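Since `LLMMessage` also accepts a `'system'` role (see the API reference below), the same option can carry a system prompt. A minimal sketch:

```ts
await LLM.load('mlx-community/Qwen3-0.6B-4bit', {
  additionalContext: [
    // A system message steers the model's behavior for the whole session.
    { role: 'system', content: 'You are a concise assistant. Answer in one sentence.' }
  ]
})
```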
### Stream a response

```ts
let response = ''

await LLM.stream('Tell me a story', (token) => {
  response += token
  console.log(response)
})
```
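In a React component you would typically accumulate tokens into state rather than logging them. A minimal sketch; the component and state names are illustrative:

```tsx
import { useState } from 'react'
import { Button, Text, View } from 'react-native'
import { LLM } from 'react-native-nitro-mlx'

function StoryScreen() {
  const [story, setStory] = useState('')

  const tellStory = async () => {
    setStory('')
    // Append each token to state as it arrives.
    await LLM.stream('Tell me a story', (token) => {
      setStory((prev) => prev + token)
    })
  }

  return (
    <View>
      <Button title="Tell me a story" onPress={tellStory} />
      <Text>{story}</Text>
    </View>
  )
}
```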
### Stop generation

```ts
LLM.stop()
```

## API

### LLM

#### Methods

| Method | Description |
|---|---|
| `load(modelId: string, options?: LLMLoadOptions): Promise<void>` | Load a model into memory |
| `generate(prompt: string): Promise<string>` | Generate a complete response |
| `stream(prompt: string, onToken: (token: string) => void): Promise<string>` | Stream tokens as they're generated |
| `stop(): void` | Stop the current generation |
#### LLMLoadOptions

| Property | Type | Description |
|---|---|---|
| `onProgress` | `(progress: number) => void` | Optional callback invoked with loading progress (0-1) |
| `additionalContext` | `LLMMessage[]` | Optional conversation history or few-shot examples to provide to the model |
#### LLMMessage

| Property | Type | Description |
|---|---|---|
| `role` | `'user' \| 'assistant' \| 'system'` | The role of the message sender |
| `content` | `string` | The message content |
#### Properties

| Property | Description |
|---|---|
| `isLoaded: boolean` | Whether a model is loaded |
| `isGenerating: boolean` | Whether generation is in progress |
| `modelId: string` | The currently loaded model ID |
| `debug: boolean` | Enable debug logging |
### ModelManager

#### Methods

| Method | Description |
|---|---|
| `download(modelId: string, onProgress: (progress: number) => void): Promise<string>` | Download a model from Hugging Face |
| `isDownloaded(modelId: string): Promise<boolean>` | Check if a model is downloaded |
| `getDownloadedModels(): Promise<string[]>` | Get list of downloaded models |
| `deleteModel(modelId: string): Promise<void>` | Delete a downloaded model |
| `getModelPath(modelId: string): Promise<string>` | Get the local path of a model |
#### Properties

| Property | Description |
|---|---|
| `debug: boolean` | Enable debug logging |
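Together, these methods are enough to build a simple storage-management flow. A minimal sketch; the model ID is just an example:

```ts
import { ModelManager } from 'react-native-nitro-mlx'

// Inspect everything cached on-device.
for (const modelId of await ModelManager.getDownloadedModels()) {
  console.log(`${modelId} -> ${await ModelManager.getModelPath(modelId)}`)
}

// Free up space by deleting a model that is no longer needed.
const unused = 'mlx-community/Qwen3-0.6B-4bit'
if (await ModelManager.isDownloaded(unused)) {
  await ModelManager.deleteModel(unused)
}
```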
## Models

Any MLX-compatible model from Hugging Face should work. For convenience, the package also exports an `MLXModel` enum of pre-defined models that are more likely to run well on-device:
```ts
import { MLXModel, ModelManager } from 'react-native-nitro-mlx'

await ModelManager.download(MLXModel.Llama_3_2_1B_Instruct_4bit, (progress) => {
  console.log(`Download progress: ${(progress * 100).toFixed(1)}%`)
})
```

| Model | Enum Key | Hugging Face ID |
|---|---|---|
| **Llama 3.2 (Meta)** | | |
| Llama 3.2 1B 4-bit | `Llama_3_2_1B_Instruct_4bit` | `mlx-community/Llama-3.2-1B-Instruct-4bit` |
| Llama 3.2 1B 8-bit | `Llama_3_2_1B_Instruct_8bit` | `mlx-community/Llama-3.2-1B-Instruct-8bit` |
| Llama 3.2 3B 4-bit | `Llama_3_2_3B_Instruct_4bit` | `mlx-community/Llama-3.2-3B-Instruct-4bit` |
| Llama 3.2 3B 8-bit | `Llama_3_2_3B_Instruct_8bit` | `mlx-community/Llama-3.2-3B-Instruct-8bit` |
| **Qwen 2.5 (Alibaba)** | | |
| Qwen 2.5 0.5B 4-bit | `Qwen2_5_0_5B_Instruct_4bit` | `mlx-community/Qwen2.5-0.5B-Instruct-4bit` |
| Qwen 2.5 0.5B 8-bit | `Qwen2_5_0_5B_Instruct_8bit` | `mlx-community/Qwen2.5-0.5B-Instruct-8bit` |
| Qwen 2.5 1.5B 4-bit | `Qwen2_5_1_5B_Instruct_4bit` | `mlx-community/Qwen2.5-1.5B-Instruct-4bit` |
| Qwen 2.5 1.5B 8-bit | `Qwen2_5_1_5B_Instruct_8bit` | `mlx-community/Qwen2.5-1.5B-Instruct-8bit` |
| Qwen 2.5 3B 4-bit | `Qwen2_5_3B_Instruct_4bit` | `mlx-community/Qwen2.5-3B-Instruct-4bit` |
| Qwen 2.5 3B 8-bit | `Qwen2_5_3B_Instruct_8bit` | `mlx-community/Qwen2.5-3B-Instruct-8bit` |
| **Qwen 3** | | |
| Qwen 3 1.7B 4-bit | `Qwen3_1_7B_4bit` | `mlx-community/Qwen3-1.7B-4bit` |
| Qwen 3 1.7B 8-bit | `Qwen3_1_7B_8bit` | `mlx-community/Qwen3-1.7B-8bit` |
| **Gemma 3 (Google)** | | |
| Gemma 3 1B 4-bit | `Gemma_3_1B_IT_4bit` | `mlx-community/gemma-3-1b-it-4bit` |
| Gemma 3 1B 8-bit | `Gemma_3_1B_IT_8bit` | `mlx-community/gemma-3-1b-it-8bit` |
| **Phi 3.5 Mini (Microsoft)** | | |
| Phi 3.5 Mini 4-bit | `Phi_3_5_Mini_Instruct_4bit` | `mlx-community/Phi-3.5-mini-instruct-4bit` |
| Phi 3.5 Mini 8-bit | `Phi_3_5_Mini_Instruct_8bit` | `mlx-community/Phi-3.5-mini-instruct-8bit` |
| **Phi 4 Mini (Microsoft)** | | |
| Phi 4 Mini 4-bit | `Phi_4_Mini_Instruct_4bit` | `mlx-community/Phi-4-mini-instruct-4bit` |
| Phi 4 Mini 8-bit | `Phi_4_Mini_Instruct_8bit` | `mlx-community/Phi-4-mini-instruct-8bit` |
| **SmolLM (HuggingFace)** | | |
| SmolLM 1.7B 4-bit | `SmolLM_1_7B_Instruct_4bit` | `mlx-community/SmolLM-1.7B-Instruct-4bit` |
| SmolLM 1.7B 8-bit | `SmolLM_1_7B_Instruct_8bit` | `mlx-community/SmolLM-1.7B-Instruct-8bit` |
| **SmolLM2 (HuggingFace)** | | |
| SmolLM2 1.7B 4-bit | `SmolLM2_1_7B_Instruct_4bit` | `mlx-community/SmolLM2-1.7B-Instruct-4bit` |
| SmolLM2 1.7B 8-bit | `SmolLM2_1_7B_Instruct_8bit` | `mlx-community/SmolLM2-1.7B-Instruct-8bit` |
| **OpenELM (Apple)** | | |
| OpenELM 1.1B 4-bit | `OpenELM_1_1B_4bit` | `mlx-community/OpenELM-1_1B-4bit` |
| OpenELM 1.1B 8-bit | `OpenELM_1_1B_8bit` | `mlx-community/OpenELM-1_1B-8bit` |
| OpenELM 3B 4-bit | `OpenELM_3B_4bit` | `mlx-community/OpenELM-3B-4bit` |
| OpenELM 3B 8-bit | `OpenELM_3B_8bit` | `mlx-community/OpenELM-3B-8bit` |
Browse more models at [huggingface.co/mlx-community](https://huggingface.co/mlx-community).
## License

MIT