A powerful real-time audio transcription application that provides live captions for your system's audio output with built-in translation capabilities. Perfect for accessibility, language learning, international meetings, or any situation where you need visual representation and translation of spoken content.
Released: August 27, 2025
- 🌍 Real-Time Translation: Live translation of captions into 50+ languages
- 🔄 Dual Translation Engines: Choose between ArgosTranslate and MarianMT, both of which run offline
- ⚡ Instant Translation: Translate both final text and partial text in real-time
- 📱 Smart Language Detection: Automatic model management for translation pairs
- Improved Performance: Optimized audio processing for better real-time performance
- Enhanced UI: Refined user interface with translation controls and status indicators
- Better Model Management: Automatic downloading for both speech recognition and translation models
- Stability Improvements: Fixed memory leaks and improved application stability
- Updated Dependencies: Added translation libraries and upgraded existing dependencies
- Optimized memory usage during long transcription and translation sessions
- Enhanced error handling for translation model loading
- Improved startup time with lazy model loading
- Better handling of system audio configuration changes
- Streamlined translation pipeline for minimal latency
- Live speech-to-text using Vosk offline speech recognition
- Audio loopback capture from system speakers
- Multi-threaded processing for smooth performance
- Configurable audio buffer sizes for optimal accuracy/speed balance
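The capture-and-recognize loop described above can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: the function names `extract_caption` and `run_captions`, the 16 kHz sample rate, and the way the loopback device is looked up by the default speaker's name are all assumptions. It relies only on SoundCard and Vosk calls that exist in those libraries (`get_microphone(..., include_loopback=True)`, `recorder(...)`, `KaldiRecognizer.AcceptWaveform`, `Result`, `PartialResult`).

```python
import json


def extract_caption(vosk_json: str):
    """Return (text, is_final) from a Vosk result string.

    Vosk reports finished utterances under "text" and in-progress
    hypotheses under "partial".
    """
    payload = json.loads(vosk_json)
    if "text" in payload:
        return payload["text"], True
    return payload.get("partial", ""), False


def run_captions(model_path: str, block_size: int = 3000, sample_rate: int = 16000):
    """Print captions for whatever the default speaker is playing (hypothetical helper)."""
    # Third-party imports are local so extract_caption works without them installed.
    import soundcard as sc
    from vosk import KaldiRecognizer, Model

    recognizer = KaldiRecognizer(Model(model_path), sample_rate)
    # Assumption: the loopback device can be found via the default speaker's name.
    loopback = sc.get_microphone(sc.default_speaker().name, include_loopback=True)

    with loopback.recorder(samplerate=sample_rate) as recorder:
        while True:
            # record() returns float32 samples shaped (frames, channels).
            frames = recorder.record(numframes=block_size)
            mono = frames.mean(axis=1).clip(-1.0, 1.0)
            # Vosk expects 16-bit mono PCM bytes.
            pcm16 = (mono * 32767).astype("int16").tobytes()
            if recognizer.AcceptWaveform(pcm16):
                text, is_final = extract_caption(recognizer.Result())
            else:
                text, is_final = extract_caption(recognizer.PartialResult())
            if text:
                print("FINAL  " if is_final else "partial", text)
```

In a GUI application the loop would typically run on a worker thread and hand captions to the interface through a queue, which is one way to read the "multi-threaded processing" item above.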
- Real-time translation of live captions as they appear
- 50+ target languages supported for translation
- Dual translation backends:
- ArgosTranslate: Completely offline, privacy-focused
- MarianMT: High-quality Hugging Face models, also offline
- Automatic model management - download translation models on demand
- Instant partial translation - even incomplete sentences get translated live
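As a rough sketch of how the two backends and on-demand model download could fit together, the snippet below dispatches on a backend name. The helper names (`marian_model_name`, `ensure_argos_package`, `translate`) are hypothetical, and using the `Helsinki-NLP/opus-mt-<src>-<tgt>` checkpoints for MarianMT is an assumption about which Hugging Face models are meant. The Argos Translate and Transformers calls themselves are part of those libraries' public APIs.

```python
def marian_model_name(src: str, tgt: str) -> str:
    """Hugging Face model id for a pair (assumes the Helsinki-NLP naming scheme)."""
    return f"Helsinki-NLP/opus-mt-{src}-{tgt}"


def ensure_argos_package(src: str, tgt: str) -> None:
    """Download and install the Argos package for src -> tgt if it is missing."""
    import argostranslate.package as pkg

    installed = pkg.get_installed_packages()
    if any(p.from_code == src and p.to_code == tgt for p in installed):
        return
    pkg.update_package_index()  # needs internet on first use
    match = next(
        (p for p in pkg.get_available_packages()
         if p.from_code == src and p.to_code == tgt),
        None,
    )
    if match is None:
        raise LookupError(f"No Argos package for {src} -> {tgt}")
    pkg.install_from_path(match.download())


def translate(text: str, src: str, tgt: str, backend: str = "ArgosTranslate") -> str:
    """Translate text with the chosen backend; src and tgt are ISO codes like 'en'."""
    if backend == "ArgosTranslate":
        import argostranslate.translate

        ensure_argos_package(src, tgt)
        return argostranslate.translate.translate(text, src, tgt)
    if backend == "MarianMT":
        from transformers import MarianMTModel, MarianTokenizer

        name = marian_model_name(src, tgt)
        tokenizer = MarianTokenizer.from_pretrained(name)  # cached after first download
        model = MarianMTModel.from_pretrained(name)
        batch = tokenizer([text], return_tensors="pt", padding=True)
        output = model.generate(**batch)
        return tokenizer.decode(output[0], skip_special_tokens=True)
    raise ValueError(f"Unknown translation backend: {backend!r}")
```

For live captions, the MarianMT model and tokenizer would normally be loaded once and reused, since reloading them for every partial caption would add noticeable latency.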
- 50+ languages and dialects supported for speech recognition including:
- English (multiple variants), Chinese, Russian, French, German, Spanish
- Portuguese, Arabic, Hindi, Japanese, Korean, Italian, Dutch
- And many more regional variants
- Automatic model downloading and management
- Easy language switching through settings
- Caption Styling: Custom colors, fonts, and sizes
- Window Properties: Adjustable width, height, opacity, and padding
- Theme Support: Dark/Light mode compatibility
- Overlay Positioning: Draggable caption window
- Real-time preview of all visual changes
- Audio processing parameters (block size, delay thresholds)
- Translation engine selection and target language configuration
- Searchable dropdown menus for easy navigation
- Settings persistence with JSON configuration
- First-run setup wizard with guided tour
- Modern CustomTkinter interface with translation controls
- Right-click context menu for quick access
- Tooltips and helpful descriptions for all settings
- Responsive, scrollable settings panel
- One-click model installation for both speech and translation
- Operating System: Windows 10/11, macOS 10.14+, or Linux
- Python: 3.7 or higher
- RAM: Minimum 4GB (8GB recommended for translation features)
- Storage: 1GB - 10GB depending on language and translation models
- Audio: System audio output (speakers/headphones)
- Internet: Required for initial model downloads only
customtkinter>=5.0.0
soundcard>=0.4.0
vosk>=0.3.45
numpy>=1.19.0
requests>=2.25.0
argostranslate>=1.9.0
transformers>=4.21.0
torch>=1.12.0
sentencepiece>=0.1.97
protobuf>=3.20.0
# Clone the repository
git clone https://github.com/oscurprof/Realtime-Subtitles-Generator-using-Python.git
cd Realtime-Subtitles-Generator-using-Python
# Install dependencies
pip install -r requirements.txt
# Run the application
python LivescriptV2.01.py
- Download the latest release from Releases
- Extract the archive
- Install dependencies:
pip install -r requirements.txt
- Run:
python LivescriptV2.01.py
pip install livescript-captions
livescript
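If the application fails to start after installation, one way to check the environment is a short script like the one below. It is a sketch only: the helper name `missing_modules` is hypothetical, and the import names are assumed to match the packages listed in the requirements (note that `protobuf` is imported as `google.protobuf`).

```python
import importlib.util
import sys

# Import names assumed to correspond to the entries in requirements.txt.
REQUIRED_MODULES = [
    "customtkinter", "soundcard", "vosk", "numpy", "requests",
    "argostranslate", "transformers", "torch", "sentencepiece",
    "google.protobuf",  # import name of the protobuf package
]


def missing_modules(names):
    """Return the subset of module names that cannot be found."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "google") is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    if sys.version_info < (3, 7):
        print("Python 3.7 or higher is required")
    for name in missing_modules(REQUIRED_MODULES):
        print(f"Missing: {name}")
```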
- Launch the application - The About window will appear with usage instructions
- Right-click the caption overlay to access settings
- Select your spoken language - The app will automatically download the required speech recognition model
- Enable translation (optional) - Choose your target language and translation engine
- Customize appearance to your preferences
- Start speaking or play audio - Captions will appear in real-time, with live translation if enabled!
- Move Window: Left-click and drag the caption bar
- Access Settings: Right-click anywhere on the caption window
- Quick Restart: Use the restart option when changing languages or models
- Enable Translation: Settings → Translation → Check "Enable Translation"
- Choose Engine: Select ArgosTranslate (recommended) or MarianMT
- Select Target Language: Choose from 50+ available languages
- Download Models: App will automatically download required translation models
- Start Translating: Speak or play audio - see live translated captions!
- Colors: Subtitle and background colors with color picker
- Typography: Font family, size, and styling options
- Layout: Window dimensions, padding, and opacity controls
- Positioning: Draggable overlay with size customization
- Spoken Language: 50+ supported languages for speech recognition
- Audio Processing: Block size and delay threshold adjustments
- Model Management: Automatic installation and status monitoring
- Enable/Disable: Toggle real-time translation
- Translation Engine: Choose between ArgosTranslate and MarianMT
- Target Language: Select from 50+ supported languages
- Model Status: View download progress and model availability
- Block Size: Lower = faster response, Higher = better accuracy
  - 1000-2000: Fast response (good for real-time chat)
  - 3000-4000: Balanced (recommended for most users)
  - 5000-8000: High accuracy (good for detailed transcription)
- Delay Threshold: How long to wait before clearing captions during silence
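To make the two parameters concrete, the sketch below shows how much audio one block holds and how a delay threshold might be used to clear captions after silence. It assumes the block size is measured in audio frames and that audio is captured at 16 kHz; neither is stated in this README. `block_duration` and `CaptionClearer` are hypothetical names, not part of the application.

```python
import time


def block_duration(block_size: int, sample_rate: int = 16000) -> float:
    """Seconds of audio in one block (sample rate is an assumed value)."""
    return block_size / sample_rate


class CaptionClearer:
    """Tracks silence and reports when the caption should be cleared."""

    def __init__(self, delay_threshold: float = 3.0, clock=time.monotonic):
        self.delay_threshold = delay_threshold
        self._clock = clock
        self._last_speech = clock()

    def on_text(self, text: str) -> None:
        """Record the time of the most recent non-empty caption."""
        if text.strip():
            self._last_speech = self._clock()

    def should_clear(self) -> bool:
        """True once the silence has lasted at least delay_threshold seconds."""
        return self._clock() - self._last_speech >= self.delay_threshold
```

Under these assumptions a block size of 3000 corresponds to about 0.19 seconds of audio per recognition step, and 8000 to 0.5 seconds, which is why larger blocks feel slower but give the recognizer more context.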
Settings are automatically saved to settings.json:
{
  "subtitle_color": "#FFFFFF",
  "background_color": "#000000",
  "background_opacity": 0.75,
  "subtitle_size": 16,
  "subtitle_font": "Arial",
  "language": "English",
  "block_size": 3000,
  "window_width": 1200,
  "window_height": 70,
  "delay_threshold": 3.0,
  "translation_enabled": false,
  "translation_backend": "ArgosTranslate",
  "translation_target_language": "Spanish",
  "version": "2.0.1"
}
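A configuration file like this can be read and written with the standard library alone. The following is a minimal sketch, assuming missing or corrupt files should fall back to defaults; `load_settings`, `save_settings`, and the shortened `DEFAULTS` dictionary are illustrative names and values taken from the example above, not the application's own loader.

```python
import json
from pathlib import Path

# A subset of the keys shown above, used as fallback values.
DEFAULTS = {
    "language": "English",
    "block_size": 3000,
    "delay_threshold": 3.0,
    "translation_enabled": False,
    "translation_backend": "ArgosTranslate",
    "translation_target_language": "Spanish",
}


def load_settings(path="settings.json"):
    """Return defaults overlaid with whatever the file provides."""
    settings = dict(DEFAULTS)
    try:
        stored = json.loads(Path(path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return settings  # missing or corrupt file: keep defaults
    if isinstance(stored, dict):
        settings.update(stored)
    return settings


def save_settings(settings, path="settings.json"):
    """Write settings as indented JSON."""
    Path(path).write_text(json.dumps(settings, indent=2), encoding="utf-8")
```

Overlaying stored values on defaults means that keys added in a newer version still get sensible values when an older settings file is loaded.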
- Speech Recognition Models: Downloaded to application directory (30MB - 5GB)
- Translation Models:
- ArgosTranslate: ~50-200MB per language pair
- MarianMT: ~300MB - 1GB per language pair
- Models are reusable and only need to be downloaded once
- All models work completely offline after download
- Completely offline - No internet required after setup
- Privacy-focused - All processing happens locally
- 50+ language pairs supported
- Smaller model sizes (~50-200MB per language pair)
- Good translation quality for most use cases
- High-quality translations using Hugging Face models
- Also completely offline
- Professional-grade accuracy for supported language pairs
- Larger model sizes (~300MB - 1GB per language pair)
- Best for professional/academic use
Popular language pairs include:
- English ↔ Spanish, French, German, Italian, Portuguese, Russian, Chinese, Japanese, Korean, Arabic, Hindi
- Spanish ↔ English, French, Italian, Portuguese
- French ↔ English, German, Spanish, Italian
- German ↔ English, French, Spanish, Italian
- And many more combinations!
- Solution: Check that your system has audio output devices
- Windows: Ensure "Stereo Mix" or similar loopback device is enabled
- macOS: Grant microphone permissions in System Preferences
- Linux: Install and configure PulseAudio or ALSA
- Solution: Open Settings → Select your language → Wait for automatic download
- Manual: Download models from Vosk Models
- Solution:
- Check that translation is enabled in Settings
- Verify target language is selected
- Wait for translation model download to complete
- Try switching translation backends (ArgosTranslate ↔ MarianMT)
- Solution:
- Try MarianMT backend for higher quality (if language pair supported)
- Ensure clear audio input for better source transcription
- Check that source language matches the speech recognition setting
- Solution:
- Check internet connection
- Try different language pair
- Switch to alternative translation backend
- Manually install models if needed
- Solution:
- Use ArgosTranslate for better performance
- Increase block size for less frequent processing
- Close other resource-intensive applications
- Consider using smaller translation models
- For speed: Use ArgosTranslate, smaller models, lower block sizes
- For accuracy: Use MarianMT, larger models, higher block sizes
- For battery life: Disable translation when not needed, use smaller models
We welcome contributions! Here's how you can help:
- 🐛 Bug Reports: Report issues via GitHub Issues
- 💡 Feature Requests: Suggest new features or improvements
- 🌍 Language Support: Help test and improve language and translation models
- 📖 Documentation: Improve guides, README, or code comments
- 🔧 Code: Submit pull requests for bug fixes or new features
- 🌐 Translation Testing: Help test translation accuracy for different language pairs
git clone https://github.com/oscurprof/Realtime-Subtitles-Generator-using-Python.git
cd Realtime-Subtitles-Generator-using-Python
pip install -r requirements.txt
# Make your changes
python LivescriptV2.01.py # Test your changes
- Fork the repository
- Create a feature branch:
git checkout -b feature-name
- Make your changes and test thoroughly
- Test translation features with multiple language pairs
- Commit with descriptive messages
- Push to your fork and submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Vosk - Offline speech recognition toolkit
- CustomTkinter - Modern GUI framework
- SoundCard - Audio capture library
- ArgosTranslate - Open-source offline translation
- Hugging Face Transformers - MarianMT translation models
- Community Contributors - Thanks to everyone who has contributed!
- 🌟 Star this repo if you find it helpful!
- 🐛 Report bugs via GitHub Issues
- 💬 Discussions for questions and feature requests
- 📧 Email: oscurprof@gmail.com
- 🔗 LinkedIn: oscurprof
- Cloud Translation APIs - Integration with Google Translate, DeepL, etc.
- Multi-Speaker Recognition - Distinguish between different speakers
- Export Functionality - Save transcriptions and translations to text files
- Hotkey Support - Keyboard shortcuts for common actions
- Advanced Translation Options - Context-aware translation, custom dictionaries
- Translation History - Save and review past translations
- Batch Translation - Translate saved transcription files
- More Language Pairs - Expand ArgosTranslate and MarianMT support
- Custom Translation Models - Support for user-trained models
- Translation Quality Indicators - Confidence scores for translations
- Bidirectional Translation - Translate both directions simultaneously
- v2.0.1 - Added real-time translation with ArgosTranslate and MarianMT support
- v1.01 - Initial release with core transcription functionality
- v1.00 - Beta testing and development