Status: Open
Labels: enhancement (New feature or request)
Description
Requested Feature
Add comprehensive MLX Whisper support to the docling ASR pipeline to deliver a 5x performance improvement on Apple Silicon devices through automatic, hardware-aware model selection. The integration should be completely transparent to users: they keep using the regular Whisper models and automatically get MLX optimization whenever it is beneficial.
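A minimal sketch of the detection and fallback logic described above, assuming it lives in Python like the rest of docling. The function names are hypothetical, not docling's API. To avoid a PyTorch dependency, this sketch detects Apple Silicon from platform.system() and platform.machine() instead of torch.backends.mps.is_available(), and it treats "MLX Whisper available" as "the mlx_whisper package is importable".

```python
from __future__ import annotations

import importlib.util
import platform


def is_apple_silicon(system: str | None = None, machine: str | None = None) -> bool:
    """True on macOS running natively on arm64 (Apple Silicon)."""
    # Arguments allow testing; by default the current interpreter is inspected.
    system = system if system is not None else platform.system()
    machine = machine if machine is not None else platform.machine()
    return system == "Darwin" and machine == "arm64"


def mlx_whisper_installed() -> bool:
    """True if the mlx_whisper package can be found, without importing it."""
    return importlib.util.find_spec("mlx_whisper") is not None


def select_whisper_backend(
    apple_silicon: bool | None = None, mlx_available: bool | None = None
) -> str:
    """Hypothetical helper: "mlx" only when both checks pass, else "native"."""
    if apple_silicon is None:
        apple_silicon = is_apple_silicon()
    if mlx_available is None:
        mlx_available = mlx_whisper_installed()
    return "mlx" if apple_silicon and mlx_available else "native"


if __name__ == "__main__":
    print("Selected Whisper backend:", select_whisper_backend())
```

An x86_64 Python running under Rosetta reports machine() as "x86_64" and therefore falls back to native Whisper, which matches MLX's requirement for a native arm64 interpreter.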
User Need Addressed
- Performance: Apple Silicon users currently experience slower ASR performance compared to native MLX implementations
- Simplicity: Users shouldn't need to manually configure MLX-specific models or pipelines
- Compatibility: The solution should work seamlessly across all platforms with appropriate fallbacks
- Transparency: Existing code should work unchanged while gaining performance benefits
Key Requirements
- Automatic Detection: Detect Apple Silicon (MPS) and MLX Whisper availability
- Transparent Integration: Users keep using the regular WHISPER_TURBO, WHISPER_BASE, etc. model specs
- Smart Fallback: Fall back to native Whisper on non-Apple Silicon systems
- Complete Coverage: Support all Whisper model sizes (tiny, small, base, medium, large, turbo)
- CLI Enhancement: Automatic pipeline detection for audio files
- Type Safety: Proper type annotations and mypy compliance
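The CLI requirement (automatic pipeline detection for audio files) could be as simple as a file-suffix check. This is a hypothetical sketch: the extension list and the pipeline names "asr" and "standard" are placeholders, not docling's actual CLI options, and an explicit user choice always takes precedence.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical list of audio suffixes; the formats docling actually accepts may differ.
AUDIO_EXTENSIONS = frozenset({".wav", ".mp3", ".m4a", ".flac", ".ogg"})


def is_audio_file(path: str | Path) -> bool:
    """Classify by suffix only (case-insensitive); file contents are not inspected."""
    return Path(path).suffix.lower() in AUDIO_EXTENSIONS


def pick_pipeline(inputs: list[str], requested: str | None = None) -> str:
    """Honor an explicit choice; otherwise use "asr" when every input is audio."""
    if requested is not None:
        return requested
    if inputs and all(is_audio_file(p) for p in inputs):
        return "asr"
    return "standard"
```

Mixed audio and document inputs fall back to the standard pipeline here; a real implementation might instead route each file to its own pipeline.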
Alternatives
Alternative 1: Manual MLX Configuration
- Approach: Require users to explicitly configure MLX models
- Rejected: Adds complexity and breaks the transparency principle
- Why: Users would need to know about MLX-specific models and configuration
Alternative 2: Separate MLX Pipeline
- Approach: Create a separate MLX-specific ASR pipeline
- Rejected: Fragments the user experience and requires manual pipeline selection
- Why: Users would need to choose between native and MLX pipelines
Alternative 3: Runtime Detection Only
- Approach: Only detect MLX at runtime without automatic model selection
- Rejected: Doesn't provide the seamless experience users expect
- Why: Users would still need to manually configure MLX models
Chosen Solution: Automatic Hardware-Aware Selection
- Approach: Embed automatic MLX/Native Whisper selection directly into model specs
- Benefits:
  - Completely transparent to users
  - Automatic performance optimization
  - Backward compatible
  - Works across all interfaces (Python API, CLI, examples)
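One way to embed the MLX/native selection directly into the model specs is to resolve the backend once, when the spec constants are defined, so code that imports WHISPER_TURBO or WHISPER_BASE needs no changes. In this sketch WhisperSpec, make_whisper_spec, and the mlx-community/whisper-{size} repo ids are hypothetical placeholders; the native ids are the model names accepted by openai-whisper's whisper.load_model().

```python
from __future__ import annotations

import importlib.util
import platform
from dataclasses import dataclass

# Model names accepted by openai-whisper's load_model().
WHISPER_SIZES = ("tiny", "base", "small", "medium", "large", "turbo")


@dataclass(frozen=True)
class WhisperSpec:  # hypothetical spec type, not docling's actual class
    size: str
    backend: str  # "mlx" or "native"
    repo_id: str  # model identifier handed to the chosen backend


def _mlx_usable() -> bool:
    # Native arm64 macOS plus an importable mlx_whisper package.
    return (
        platform.system() == "Darwin"
        and platform.machine() == "arm64"
        and importlib.util.find_spec("mlx_whisper") is not None
    )


def make_whisper_spec(size: str, use_mlx: bool) -> WhisperSpec:
    if size not in WHISPER_SIZES:
        raise ValueError(f"unknown Whisper size: {size!r}")
    if use_mlx:
        # Hypothetical repo naming; check the real ids of the converted MLX models.
        return WhisperSpec(size, "mlx", f"mlx-community/whisper-{size}")
    return WhisperSpec(size, "native", size)


# Resolved once at import time, so user code just imports the constants.
_USE_MLX = _mlx_usable()
ALL_WHISPER_SPECS = {size: make_whisper_spec(size, _USE_MLX) for size in WHISPER_SIZES}
WHISPER_TURBO = ALL_WHISPER_SPECS["turbo"]
WHISPER_BASE = ALL_WHISPER_SPECS["base"]
```

Resolving at import time keeps the constants plain values; the trade-off is that installing mlx_whisper after the module has been imported has no effect until the next interpreter start.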