FastAPI backend for Arabic Speech Recognition, Text Correction, and Text-to-Speech (TTS). It is built on our own models, DeepAr for Arabic speech-to-text and AraFix for text correction, both included in this project as Git submodules. The models are also published on our CUAIStudents HuggingFace organization, where you can explore the datasets and checkpoints used in training. This backend exposes them as REST APIs for speech recognition, text correction, and voice generation.
Make sure ffmpeg and Python are installed on your system.
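If you are not sure whether both are already available, a small check like the one below can help. This is only a sketch: the `check_prerequisites` helper is hypothetical (not part of this repository), and the minimum version of 3.11 is an assumption taken from the install commands in this README, not a documented requirement.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 11)):
    """Report whether ffmpeg is on PATH and Python meets a minimum version.

    min_python is an assumed threshold, not a requirement stated by the project.
    """
    return {
        # shutil.which returns None when the executable is not found on PATH
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # compare only (major, minor) against the requested minimum
        "python": sys.version_info[:2] >= tuple(min_python),
    }
```

Calling `check_prerequisites()` returns a dict with two booleans; a `False` entry means the matching install step is still needed.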
macOS:
# Install Homebrew (if not installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Install ffmpeg
brew install ffmpeg
# Install Python (if needed)
brew install python@3.11

Ubuntu/Debian:
# Update package list
sudo apt update
# Install ffmpeg
sudo apt install ffmpeg
# Install Python
sudo apt install python3.11 python3.11-venv

Clone with submodules (includes DeepAr + AraFix):
git clone --recurse-submodules -b git-submodules https://github.com/NourhanMahmoudd/Cairo-Dictionary-AI-FastAPI-Backend.git <any_dir>
cd <any_dir>/backend

Create a virtual environment and install dependencies:
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt

Copy example config:
cp .env.example .env

Edit .env if needed. Most values are optional.
uvicorn app.main:app --reload --reload-exclude 'logs/*'

- API Documentation: http://localhost:8000/docs
- Health Check: http://localhost:8000/health
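To confirm from a script that the server is up, you can poll the health endpoint. The sketch below uses only the standard library. The helper names `health_url` and `is_healthy` are hypothetical, and since the response body of `/health` is not described here, the check looks only at the HTTP status code.

```python
import urllib.error
import urllib.request


def health_url(base_url="http://localhost:8000"):
    """Build the health check URL from the server's base URL."""
    return base_url.rstrip("/") + "/health"


def is_healthy(base_url="http://localhost:8000", timeout=2.0):
    """Return True if GET /health answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Server not running, connection refused, timeout, or non-2xx status
        return False
```

With the server started as shown above, `is_healthy()` should return `True`; before startup it returns `False` rather than raising.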
POST /api/v1/speech/transcribe
- Upload audio (file)
- Optional: reference_text for accuracy comparison.

- POST /api/v1/nlp/correct → returns corrected Arabic text + accuracy.
- POST /api/v1/nlp/compare → compare transcription vs reference text.

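As an illustration of calling the transcription endpoint, the sketch below builds a multipart upload with the standard library. The field names `file` and `reference_text` come from the list above. Everything else is an assumption: that `reference_text` travels as a form field in the same multipart body, the `application/octet-stream` content type, and the hypothetical helper name `build_transcribe_request`. The response schema is not documented here, so the sketch stops at building the request.

```python
import urllib.request
import uuid


def build_transcribe_request(audio_bytes, filename, reference_text=None,
                             base_url="http://localhost:8000"):
    """Build a multipart/form-data POST for /api/v1/speech/transcribe."""
    boundary = uuid.uuid4().hex
    # Part 1: the audio upload, sent under the form field name "file"
    header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    parts = [header + audio_bytes + b"\r\n"]
    # Part 2 (optional): reference text, assumed here to be a plain form field
    if reference_text is not None:
        parts.append((
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="reference_text"\r\n\r\n'
            f"{reference_text}\r\n"
        ).encode("utf-8"))
    # Closing boundary marks the end of the multipart body
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/speech/transcribe",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen` while the server is running. The interactive docs at /docs show the exact parameters the server expects.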
POST /api/v1/tts/generate
- Input: text + optional lang, voice, rate, volume.
- Returns: MP3 audio stream.

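A possible client for the TTS endpoint is sketched below. The parameter names `text`, `lang`, `voice`, `rate`, and `volume` come from the list above. It is an assumption that they are sent as a JSON body (check /docs for the actual request shape), and the helper names `build_tts_request` and `save_tts` are hypothetical.

```python
import json
import urllib.request

# Optional parameters named in this README; their accepted values are not documented here
TTS_OPTIONS = {"lang", "voice", "rate", "volume"}


def build_tts_request(text, base_url="http://localhost:8000", **options):
    """Build a POST for /api/v1/tts/generate, assuming a JSON request body."""
    unknown = set(options) - TTS_OPTIONS
    if unknown:
        raise ValueError(f"unsupported TTS options: {sorted(unknown)}")
    payload = {"text": text, **options}
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/tts/generate",
        # ensure_ascii=False keeps Arabic text readable in the encoded body
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_tts(text, out_path, base_url="http://localhost:8000", **options):
    """Send the request and write the returned MP3 stream to out_path."""
    request = build_tts_request(text, base_url=base_url, **options)
    with urllib.request.urlopen(request) as resp, open(out_path, "wb") as out:
        out.write(resp.read())
```

For example, `save_tts("مرحبا", "hello.mp3")` would write the generated audio to `hello.mp3` when the server is running and accepts a JSON body.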
project-root/
│
├── backend/                      # Backend API
│   ├── app/
│   │   ├── helper/               # Helper modules
│   │   ├── main.py               # FastAPI application
│   │   ├── models/               # ML model integration
│   │   │   ├── AraFix-V3.0/      # AraFix model files
│   │   │   ├── DeepAr/           # DeepAr model files
│   │   │   ├── araFix.py         # AraT5 Text Corrector
│   │   │   └── deepAr.py         # Arabic Whisper model
│   │   ├── routes/               # API endpoints
│   │   │   ├── speech.py         # Speech transcription endpoints
│   │   │   ├── text.py           # Text correction endpoint
│   │   │   └── tts.py            # Text-to-speech endpoint
│   │   ├── schemas/              # Pydantic models
│   │   └── utils/                # Utility functions
│   │       └── comparison.py     # Comparison utilities
│   ├── logs_config.yaml          # Logging configuration
│   ├── requirements.txt          # Python dependencies
│   └── .env                      # Environment variables
└── README.md                     # This file

Made with ❤️ for Arabic speech recognition