A powerful iOS application that automatically transcribes and summarizes YouTube videos using advanced AI models. Built with SwiftUI and leveraging cutting-edge machine learning technologies.
- YouTube Video Processing: Extract audio from YouTube videos automatically
- AI-Powered Transcription: Convert speech to text using WhisperKit (OpenAI's Whisper model)
- Intelligent Summarization: Generate concise summaries using MLX Gemma 3n model
- Offline Processing: All AI processing happens locally on your device for privacy
- History Management: Keep track of all your transcriptions and summaries
- Search Functionality: Easily find past transcriptions
- Modern SwiftUI Interface: Clean, intuitive design with smooth animations
- Real-time Progress Tracking: Monitor transcription and summarization progress
- Cross-Platform Support: Optimized for iPhone, iPad, and macOS (designed for iPad)
- Efficient Caching: Smart caching system to avoid reprocessing
- Error Handling: Comprehensive error management with user-friendly messages
- SwiftUI: Modern declarative UI framework
- SwiftData: Data persistence and management
- WhisperKit: On-device speech recognition using OpenAI's Whisper
- MLX Swift: Apple's machine learning framework for efficient LLM inference
- Combine: Reactive programming for data flow management
- Whisper: State-of-the-art speech recognition model
- Gemma 3n (2B 4-bit): Efficient language model for text summarization
- MVVM Pattern: Clean separation of concerns
- Async/Await: Modern concurrency handling
- Observable Objects: Reactive state management
- Dependency Injection: Modular and testable code structure
- Xcode 26.0 or later
- iOS 26.0 or later (> iPhone 15)
- macOS 26.0 or later (for Mac version)
The project uses Swift Package Manager for dependency management:
- WhisperKit - On-device speech recognition
- MLX Swift Examples - Machine learning framework
- Clone the repository:
git clone https://github.com/uplg/summary-swift.git
cd summary-swift- Open the project in Xcode:
open summary.xcodeproj- Build and run using XCode
- if you have an "old" iPhone (< 15), you might need to use macOS (Designed for iPad)
- it may work on iPhones / iPad but it's not fully tested as I don't have a recent Apple device except a Mac, UI follows most recent guidelines and works great but memory issues while loading Gemma 3n on an iPhone 13, backend URL needs to be changed as it is hardcoded to localhost:8000.
- Also, iOS simulator don't handle metal fully, so you might need to use a real device to test the app.
- Start the Backend: The app use the FastAPI backend found in Video-summarize for easy yt-dlp usage. Make sure to set it up following the instructions in the repository.
- Launch the App: Open Summary on your iOS device
- Enter YouTube URL: Paste any YouTube video URL in the input field
- Start Processing: Tap the process button to begin transcription
- Monitor Progress: Watch real-time progress updates for each processing stage
- View Results: Access transcription and summary in the detail view
- Manage History: Browse, search, and delete past transcriptions
The app follows this processing pipeline:
- URL Validation: Verify YouTube URL format and accessibility
- Metadata Extraction: Retrieve video title, duration, and thumbnail
- Audio Download: Extract and download audio from YouTube video
- Model Loading: Initialize WhisperKit and MLX models (first run only)
- Transcription: Convert audio to text using Whisper model
- Summarization: Generate summary using Gemma 3n model
- Storage: Save results to local database with SwiftData
- Local Processing: All AI operations happen on-device
- No Data Collection: No personal data is sent to external servers
- Secure Storage: Transcriptions stored locally using SwiftData
- Cache Management: Automatic cleanup of temporary files
This project is actively maintained and continuously improved. Current focus areas:
- Performance optimizations for larger videos
- Enhanced error handling and recovery
- Additional language support for summarization
- Export functionality for transcriptions
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
If you encounter any issues or have questions, please open an issue on GitHub or contact the development team.
Built with ❤️ using SwiftUI, WhisperKit, and MLX


