MLX INFERENCE is an OpenAI API-compatible inference service based on MLX-LM and MLX-VLM, providing the following endpoints:
- `/v1/chat/completions` - Chat completion interface (sample request below)
- `/v1/responses` - Response interface
- `/v1/models` - List available models
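For example, a chat completion request can be sent with curl (a sketch assuming the service runs locally on port 8002, as in the startup command below; the model ID is only an example, substitute one your deployment actually serves):

```bash
# Example chat completion request; the model ID is an assumed placeholder
curl http://localhost:8002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Llama-3.2-3B-Instruct-4bit",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```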
Install dependencies:

```bash
pip install -r requirements.txt
```

```bash
# Copy environment file
cp .env.example .env
```

Execute in the project root directory:
```bash
uvicorn mlx_Inference:app --workers 1 --port 8002
```

Parameters:

- `--workers`: Number of worker processes
- `--port`: Service port number
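Once the service is running, a quick way to check it is to list the available models (a minimal check, assuming the service is reachable on localhost at the `--port` chosen above):

```bash
# List the models the service exposes; confirms the server is up
curl http://localhost:8002/v1/models
```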
- Compatible with OpenAI API specifications
- Backend inference uses MLX-LM and MLX-VLM and supports mlx-community models
- Easy to deploy and use
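Because the service follows the OpenAI API specifications, existing OpenAI clients can be pointed at it by changing the base URL. As a sketch, the `/v1/responses` endpoint can be exercised the same way, assuming it mirrors OpenAI's Responses API request shape (the model ID is again an example):

```bash
# Hypothetical request to /v1/responses, assuming the OpenAI Responses request shape
curl http://localhost:8002/v1/responses \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Llama-3.2-3B-Instruct-4bit",
    "input": "Write a haiku about Apple Silicon."
  }'
```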
