A comprehensive, real-time emotion recognition system that combines Facial Emotion Recognition (FER) and Textual Emotion Recognition (TER) with advanced multimodal fusion capabilities and Furhat robot integration. This system provides accurate emotion detection through computer vision and natural language processing, with support for interactive robotics applications.
- Facial Emotion Recognition: Real-time emotion detection from a camera feed using CNN models trained on FER2013
- Textual Emotion Recognition: Voice-to-text emotion analysis using DistilBERT-based models
- Multimodal Fusion: Advanced fusion strategies (confidence-based, weighted average, and formula-based)
- Furhat Integration: Complete social robot platform integration with real-time emotion feedback
- Real-time Processing: Live emotion recognition with interactive GUI interfaces
- Robust Architecture: Comprehensive error handling and fallback mechanisms
- Comprehensive Testing: Full test suite with system health checks and validation
The system includes comprehensive testing capabilities to ensure reliability:
# Run comprehensive system tests
python tests/test_multimodal_system.py # Complete system validation
python tests/test_fer_model.py # FER component testing
python tests/test_furhat_integration.py # Furhat integration testing
python tests/test_formula_fusion.py # Fusion algorithm testing
python tests/test_corrected_formula.py # Mathematical validation
# Validate system configuration and dependencies
python demos/demo_multimodal_usage.py
The test suite validates:
- Model loading and initialization
- Camera and microphone availability
- Emotion fusion algorithm accuracy
- Robot integration functionality
- Error handling and fallback mechanisms
fer_and_ter_model/
├── src/                          # Source code
│   ├── fer/                      # Facial Emotion Recognition
│   │   └── camera_fer_inference.py
│   ├── ter/                      # Textual Emotion Recognition
│   │   ├── voice_ter_inference.py
│   │   └── setup_voice_ter.py
│   ├── multimodal/               # Multimodal Fusion
│   │   └── multimodal_emotion_inference.py
│   ├── furhat/                   # Furhat Robot Integration
│   │   └── furhat_multimodal_emotion_inference.py
│   └── utils/                    # Shared utilities
├── models/                       # Trained models
│   ├── fer2013_final_model.pth
│   └── ter_distilbert_model/
├── datasets/                     # Dataset files
│   └── multimodal_emotion_dataset.json
├── notebooks/                    # Jupyter notebooks
│   ├── fer2013_model_training.ipynb
│   ├── multimodal_emotion_fusion.ipynb
│   ├── multimodal_emotion_recognition.ipynb
│   └── textual_emotion_recognition_distilbert.ipynb
├── tests/                        # Test scripts
│   ├── test_corrected_formula.py
│   ├── test_fer_model.py
│   ├── test_formula_fusion.py
│   ├── test_furhat_integration.py
│   └── test_multimodal_system.py
├── demos/                        # Demo scripts
│   ├── demo_furhat_usage.py
│   ├── demo_multimodal_usage.py
│   ├── demo_usage.py
│   └── demo_voice_ter.py
├── docs/                         # Documentation
│   ├── DATASET_SETUP.md
│   ├── FER_PROJECT_SUMMARY.md
│   ├── FURHAT_INTEGRATION_SUMMARY.md
│   ├── IMPLEMENTATION_SUMMARY.md
│   ├── JUNIE.md
│   └── README_*.md files
├── requirements/                 # Requirements files
│   ├── requirements_backup.txt
│   ├── requirements_camera_inference.txt
│   ├── requirements_furhat.txt
│   ├── requirements_multimodal.txt
│   └── requirements_voice_ter.txt
└── README.md                     # This file
- Python 3.8+ (3.11+ recommended)
- Webcam for facial emotion recognition
- Microphone for voice/text emotion recognition
- GPU support recommended for optimal performance
- Clone the repository:

  git clone https://github.com/kudosscience/fer_and_ter_model.git
  cd fer_and_ter_model

- Install the package:

  # Install with all components
  pip install -e ".[all]"

  # OR install specific components:
  pip install -e ".[multimodal]"        # For multimodal processing
  pip install -e ".[furhat]"            # For Furhat robot integration
  pip install -e ".[camera_inference]"  # For FER only
  pip install -e ".[voice_ter]"         # For TER only

- Alternative: Install from requirements files:

  # Choose based on your use case:
  pip install -r requirements/requirements_multimodal.txt        # Recommended
  pip install -r requirements/requirements_furhat.txt            # For robot integration
  pip install -r requirements/requirements_camera_inference.txt  # FER only
  pip install -r requirements/requirements_voice_ter.txt         # TER only
# Run the comprehensive multimodal demo
python demos/demo_multimodal_usage.py
# Try individual components
python demos/demo_usage.py # Basic FER demo
python demos/demo_voice_ter.py # TER demo
python demos/demo_furhat_usage.py # Furhat integration demo
After installation, you can use these console commands:
# Run individual components
fer-camera # Launch facial emotion recognition
ter-voice # Launch textual emotion recognition
multimodal-emotion # Launch multimodal system
furhat-emotion # Launch Furhat integration
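These commands are typically exposed as setuptools console-script entry points when the package is installed with `pip install -e`. The excerpt below is only an illustrative sketch of that mechanism; the module paths and `main` callables shown are assumptions, not the project's verified configuration.

```python
# Illustrative setup.py excerpt mapping console commands to entry points.
# NOTE: the module paths and "main" callables below are hypothetical examples.
from setuptools import setup

setup(
    name="fer_and_ter_model",
    entry_points={
        "console_scripts": [
            "fer-camera=fer.camera_fer_inference:main",
            "ter-voice=ter.voice_ter_inference:main",
            "multimodal-emotion=multimodal.multimodal_emotion_inference:main",
            "furhat-emotion=furhat.furhat_multimodal_emotion_inference:main",
        ],
    },
)
```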
- Real-time Processing: Live camera feed emotion detection with GUI overlay
- CNN Architecture: Deep learning model trained on FER2013 dataset
- 7 Emotion Classes: Happy, Sad, Angry, Fear, Surprise, Disgust, Neutral
- High Accuracy: Optimized model with robust preprocessing pipeline
- Fallback Support: Graceful handling of camera unavailability
Usage:
# Direct script execution
python src/fer/camera_fer_inference.py
# Console command
fer-camera
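For orientation, here is a minimal single-frame sketch of how camera-based FER typically works: detect a face, preprocess it to a 48x48 grayscale patch, and classify it. It assumes the checkpoint at `models/fer2013_final_model.pth` stores a full PyTorch model object and uses OpenCV's bundled Haar cascade; the actual script's preprocessing, label order, and model loading may differ.

```python
# Minimal FER sketch (assumptions noted in the text above; not the project's exact code).
import cv2
import torch

# Label order is an assumption; check the training notebook for the real mapping.
EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

# Assumes the checkpoint holds a full model object (not a bare state_dict).
model = torch.load("models/fer2013_final_model.pth", map_location="cpu")
model.eval()

face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

cap = cv2.VideoCapture(0)
ret, frame = cap.read()
cap.release()

if ret:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_detector.detectMultiScale(gray, 1.1, 5):
        face = cv2.resize(gray[y:y + h, x:x + w], (48, 48))
        tensor = torch.from_numpy(face).float().div(255.0).view(1, 1, 48, 48)
        with torch.no_grad():
            probs = torch.softmax(model(tensor), dim=1)[0]
        print(EMOTIONS[int(probs.argmax())], f"{float(probs.max()):.2f}")
```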
- Voice-to-Text Pipeline: Real-time speech recognition and emotion analysis
- DistilBERT Model: State-of-the-art transformer-based emotion classification
- Multi-emotion Support: Comprehensive emotion category detection
- Robust Processing: Advanced text preprocessing and normalization
- Audio Fallback: Multiple audio input handling strategies
Usage:
# Direct script execution
python src/ter/voice_ter_inference.py
# Console command
ter-voice
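Roughly, the voice pipeline transcribes speech to text and then classifies the text with the fine-tuned DistilBERT checkpoint. The sketch below assumes the `models/ter_distilbert_model/` directory from the project tree and the free Google recognizer from the `speech_recognition` package; the actual script's audio backend and label set may differ.

```python
# Minimal TER sketch: speech -> text -> emotion (assumptions noted above).
import speech_recognition as sr
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# 1) Transcribe a short utterance from the default microphone.
recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)
    audio = recognizer.listen(source, phrase_time_limit=5)
text = recognizer.recognize_google(audio)

# 2) Classify the transcript with the fine-tuned DistilBERT model.
model_dir = "models/ter_distilbert_model"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)
model.eval()

inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
label_id = int(probs.argmax())
print(text, "->", model.config.id2label[label_id], f"{float(probs[label_id]):.2f}")
```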
The core innovation of this system is the combination of FER and TER predictions, which yields higher accuracy than either modality alone:
- Multiple Fusion Strategies:
  - Confidence-based: Selects the prediction with the highest confidence score
  - Weighted Average: Combines predictions with 60% facial, 40% textual weighting
  - Formula-based: Advanced mathematical fusion using a custom algorithm
- Real-time Integration: Simultaneous processing of visual and audio streams
- Adaptive Fallback: Works with a single modality when needed
- Interactive GUI: Live visualization of both streams and fusion results
Usage:
# Default multimodal processing
python src/multimodal/multimodal_emotion_inference.py
# With specific fusion strategy
python src/multimodal/multimodal_emotion_inference.py --fusion confidence_based
python src/multimodal/multimodal_emotion_inference.py --fusion weighted_average
python src/multimodal/multimodal_emotion_inference.py --fusion formula_based
# Console command
multimodal-emotion
Complete social robotics platform integration for interactive emotion recognition:
- Furhat Remote API: Official SDK integration for robot communication
- Interactive Responses: Robot gestures, speech, and LED feedback based on detected emotions
- Voice Integration: Uses robot's microphone for natural interaction
- Real-time Feedback: Immediate emotional responses and social cues
- Robust Connection: Graceful fallback when robot unavailable
Robot Capabilities:
- Emotional gesture mapping (BigSmile, Frown, Surprised, etc.)
- LED color changes reflecting emotional states
- Speech synthesis for emotion acknowledgment
- Interactive conversation flow
Usage:
# Furhat integration (requires robot connection)
python src/furhat/furhat_multimodal_emotion_inference.py
# With fusion strategy
python src/furhat/furhat_multimodal_emotion_inference.py --fusion formula_based
# Console command
furhat-emotion
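As a reference point, the snippet below shows the general shape of robot feedback through the Furhat Remote API Python client (`furhat-remote-api`). The emotion-to-gesture mapping is illustrative only; the project's actual mapping, speech, and LED behaviour live in the script above.

```python
# Illustrative Furhat feedback sketch; the mapping below is a made-up example.
from furhat_remote_api import FurhatRemoteAPI

furhat = FurhatRemoteAPI("localhost")  # or the robot's IP address

GESTURE_FOR_EMOTION = {
    "Happy": "BigSmile",
    "Sad": "ExpressSad",
    "Surprise": "Surprise",
}

def react(emotion: str) -> None:
    """Acknowledge a detected emotion with a gesture, an LED change, and speech."""
    gesture = GESTURE_FOR_EMOTION.get(emotion)
    if gesture:
        furhat.gesture(name=gesture)
    furhat.set_led(red=50, green=200, blue=50)
    furhat.say(text=f"You seem {emotion.lower()}.")

react("Happy")
```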
Detailed documentation for each component is available in the `docs/` directory:
- IMPLEMENTATION_SUMMARY.md - Complete technical implementation overview
- FER_PROJECT_SUMMARY.md - Facial emotion recognition deep dive
- FURHAT_INTEGRATION_SUMMARY.md - Robot integration guide
- DATASET_SETUP.md - Dataset preparation and training
- README_multimodal.md - Multimodal system usage and configuration
- README_furhat.md - Furhat robot setup and integration
- README_camera_inference.md - FER system configuration
- README_voice_ter.md - Voice/text emotion recognition setup
# Use custom FER model
python src/multimodal/multimodal_emotion_inference.py \
--fer_model ./path/to/custom_fer_model.pth
# Use custom TER model
python src/multimodal/multimodal_emotion_inference.py \
--ter_model ./path/to/custom_ter_model/
# Confidence-based fusion (chooses most confident prediction)
python src/multimodal/multimodal_emotion_inference.py --fusion confidence_based
# Weighted average fusion (60% facial, 40% textual)
python src/multimodal/multimodal_emotion_inference.py --fusion weighted_average
# Formula-based fusion (mathematical optimization)
python src/multimodal/multimodal_emotion_inference.py --fusion formula_based
# GPU acceleration (if available)
export CUDA_VISIBLE_DEVICES=0
python src/multimodal/multimodal_emotion_inference.py
# CPU-only mode
export CUDA_VISIBLE_DEVICES=""
python src/multimodal/multimodal_emotion_inference.py
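Internally, device selection usually follows the standard PyTorch pattern: if `CUDA_VISIBLE_DEVICES` hides all GPUs, `torch.cuda.is_available()` returns False and inference falls back to the CPU. A generic sketch (not the project's exact code):

```python
import torch

# Picks the GPU when one is visible, otherwise the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Running inference on: {device}")

# Placeholder model and batch, just to show the .to(device) pattern.
model = torch.nn.Linear(10, 7).to(device)
batch = torch.randn(1, 10, device=device)
with torch.no_grad():
    output = model(batch)
```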
The system follows a modular architecture allowing easy extension and modification:
- `src/` - Core source code with modular components
- `models/` - Pre-trained models (FER CNN, TER DistilBERT)
- `datasets/` - Training and evaluation datasets
- `tests/` - Comprehensive test suite
- `demos/` - Usage examples and demonstrations
- `docs/` - Detailed technical documentation
- `requirements/` - Component-specific dependency management
- Create module in appropriate `src/` subdirectory
- Add requirements to relevant requirements file
- Create tests in `tests/` directory
- Add demo in `demos/` directory
- Update documentation in `docs/`
This project uses professional development practices:
- Modular Design: Each component is self-contained and reusable
- Comprehensive Testing: Full test coverage with validation scripts
- Documentation: Extensive documentation for all components
- Package Management: Proper Python packaging with setuptools
- Console Integration: Command-line tools for easy usage
Camera not detected:
# Test camera access
python -c "import cv2; print('Camera available:', cv2.VideoCapture(0).isOpened())"
Microphone issues:
# Test microphone access
python -c "import speech_recognition as sr; print('Microphone available:', len(sr.Microphone.list_microphone_names()) > 0)"
Model loading errors:
- Ensure models are downloaded and in the `models/` directory
- Check file permissions and paths
- Verify CUDA availability for GPU models
Furhat connection issues:
- Verify robot IP address and port
- Check network connectivity
- Ensure Furhat Remote API service is running
- Use GPU acceleration when available
- Close other applications using camera/microphone
- Ensure adequate lighting for facial recognition
- Use external microphone for better voice recognition
- Run system health check before important usage
- Architecture: Custom CNN trained on FER2013
- Input: 48x48 grayscale facial images
- Output: 7 emotion classes (Happy, Sad, Angry, Fear, Surprise, Disgust, Neutral)
- Performance: Optimized for real-time inference with a robust preprocessing pipeline
- Architecture: DistilBERT-based transformer
- Input: Text transcribed from speech
- Output: Multi-dimensional emotion classification
- Features: Advanced text preprocessing and normalization
- Confidence-based: Selects highest confidence prediction
- Weighted Average: Optimized 60/40 facial/textual weighting
- Formula-based: Mathematical fusion using correlation analysis
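To make the first two strategies concrete, here is a small sketch that operates on per-emotion probability dictionaries from the two models, using the 60/40 weighting stated above; the formula-based strategy is implemented in the project's notebooks and tests and is not reproduced here.

```python
# Sketch of confidence-based and weighted-average fusion over probability dicts.

def weighted_average_fusion(fer_probs, ter_probs, w_face=0.6, w_text=0.4):
    """Blend the two distributions with fixed modality weights (60/40)."""
    emotions = set(fer_probs) | set(ter_probs)
    return {e: w_face * fer_probs.get(e, 0.0) + w_text * ter_probs.get(e, 0.0)
            for e in emotions}

def confidence_based_fusion(fer_probs, ter_probs):
    """Return the label from whichever modality is more confident."""
    fer_best = max(fer_probs, key=fer_probs.get)
    ter_best = max(ter_probs, key=ter_probs.get)
    return fer_best if fer_probs[fer_best] >= ter_probs[ter_best] else ter_best

fer = {"Happy": 0.7, "Neutral": 0.2, "Sad": 0.1}
ter = {"Happy": 0.4, "Neutral": 0.5, "Sad": 0.1}

fused = weighted_average_fusion(fer, ter)
print(max(fused, key=fused.get))          # Happy (0.6*0.7 + 0.4*0.4 = 0.58)
print(confidence_based_fusion(fer, ter))  # Happy (0.7 >= 0.5)
```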
- Research: Emotion recognition research and experimentation
- Education: Teaching multimodal AI and emotion recognition
- Healthcare: Patient emotion monitoring and therapy assistance
- Human-Computer Interaction: Emotional interfaces and feedback systems
- Social Robotics: Interactive robots with emotional intelligence
- Accessibility: Emotion-aware assistive technologies
The system has been validated across multiple scenarios:
- Real-time Processing: < 100ms latency for combined FER+TER
- Accuracy: Improved performance through multimodal fusion
- Robustness: Graceful degradation with single modality
- Scalability: Modular architecture supports easy extension
This project is licensed under the MIT License - see the LICENSE file for details.
Henry Ward
- GitHub: @kudosscience
- Email: 45144290+kudosscience@users.noreply.github.com
- FER2013 dataset contributors
- Hugging Face Transformers library
- OpenCV community
- Furhat Robotics platform
- PyTorch and scikit-learn teams
For issues, questions, or contributions:
- Check the Issues page
- Review the comprehensive documentation in `docs/`
- Run the system health check: `python demos/demo_multimodal_usage.py`
- Create a new issue with detailed information
Built with ❤️ for multimodal emotion recognition research and applications