This project focuses on solving a multi-class facial expression recognition problem using the FER2013 dataset. Through transfer learning, sampling strategies, face alignment, and explainable AI techniques, a robust and real-time expression recognition system has been developed.
- Dataset and Class Imbalance
- Sampling Strategies
- Model Architectures
- Face Alignment
- Explainability with LIME
- Model Comparisons
- Real-Time Application
The FER2013 dataset contains 48x48 grayscale facial images labeled across 7 emotion classes: Angry, Disgust, Fear, Happy, Sad, Surprise, and Neutral.
The original dataset is highly imbalanced.
Original label distribution showing class imbalance.
To address the imbalance, three sampling strategies were applied: undersampling, oversampling, and a hybrid of the two.
Class distribution after applying undersampling and oversampling.
Balanced class distribution using hybrid sampling.
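
As a rough illustration of the hybrid strategy, the sketch below resamples dataset indices so that every class ends up with the same number of samples. Classes above the target are undersampled without replacement and classes below it are oversampled with replacement. The function name `hybrid_resample`, the use of the median class count as the target, and the toy label counts are all assumptions made for this example, not the project's actual settings.

```python
import numpy as np

def hybrid_resample(labels, target=None, seed=0):
    """Return shuffled indices that give every class exactly `target` samples.

    Classes larger than `target` are undersampled (no replacement);
    classes smaller than `target` are oversampled (with replacement).
    `target` defaults to the median class count (an illustrative choice).
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    if target is None:
        target = int(np.median(counts))
    picked = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(labels == cls)
        # Sample with replacement only when the class is too small.
        replace = count < target
        picked.append(rng.choice(idx, size=target, replace=replace))
    out = np.concatenate(picked)
    rng.shuffle(out)
    return out

# Toy imbalanced label vector (made-up counts, not the FER2013 distribution).
toy_labels = np.repeat(np.arange(7), [400, 40, 410, 720, 480, 320, 500])
balanced_idx = hybrid_resample(toy_labels)
balanced_counts = np.bincount(toy_labels[balanced_idx], minlength=7)
```

Oversampling by index repetition only duplicates existing images, so in practice it is usually paired with augmentation on the training set.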
Four transfer learning-based CNN architectures were fine-tuned:
- XceptionNet (best performing)
- MobileNet-V3-Large-100
- EfficientNet-B0
- ResNet-18
Each model was evaluated using Precision, Accuracy, Recall, and F1 Score.
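
For reference, the four metrics can be derived from a confusion matrix as in the minimal sketch below. The helper name `classification_metrics` and the toy labels are assumptions for illustration. The source does not say how per-class scores were averaged, so the macro average shown here is only one possible choice.

```python
import numpy as np

def classification_metrics(y_true, y_pred, num_classes):
    """Compute accuracy plus per-class precision, recall and F1."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # cm[i, j] counts samples of true class i predicted as class j.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)

    tp = np.diag(cm).astype(float)
    predicted_total = cm.sum(axis=0)  # column sums: times each class was predicted
    actual_total = cm.sum(axis=1)     # row sums: true samples per class

    # Guard against division by zero for classes never predicted or never present.
    precision = np.divide(tp, predicted_total, out=np.zeros_like(tp),
                          where=predicted_total > 0)
    recall = np.divide(tp, actual_total, out=np.zeros_like(tp),
                       where=actual_total > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp),
                   where=denom > 0)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "macro_f1": f1.mean(),
    }

# Toy example with three classes.
report = classification_metrics(
    y_true=[0, 0, 1, 1, 2, 2],
    y_pred=[0, 1, 1, 1, 2, 0],
    num_classes=3,
)
```

On an imbalanced dataset, accuracy alone can look good while a small class such as Disgust is rarely predicted correctly, which is why the per-class values are worth inspecting.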
Face alignment was applied using MTCNN (Multi-task Cascaded Convolutional Networks), improving model focus and performance.
Without Alignment | With Alignment |
---|---|
![]() | ![]() |
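
The source does not describe the exact alignment procedure. One common approach, sketched below as an assumption, uses the two eye landmarks that MTCNN returns and rotates the face about the midpoint between the eyes so that the eye line becomes horizontal. The function names and the landmark coordinates are hypothetical.

```python
import numpy as np

def eye_alignment_matrix(left_eye, right_eye):
    """Build a 2x3 affine matrix that levels the line between the eyes.

    Returns the matrix and the measured tilt in degrees. The rotation is
    about the midpoint between the eyes, so that point stays fixed.
    """
    left_eye = np.asarray(left_eye, dtype=float)
    right_eye = np.asarray(right_eye, dtype=float)
    dx, dy = right_eye - left_eye
    angle = np.arctan2(dy, dx)  # tilt of the eye line in radians
    center = (left_eye + right_eye) / 2.0

    # Rotate by -angle to cancel the tilt.
    c, s = np.cos(-angle), np.sin(-angle)
    rotation = np.array([[c, -s], [s, c]])
    offset = center - rotation @ center
    return np.hstack([rotation, offset[:, None]]), np.degrees(angle)

def apply_affine(matrix, points):
    """Apply a 2x3 affine matrix to an (N, 2) array of (x, y) points."""
    points = np.asarray(points, dtype=float)
    return points @ matrix[:, :2].T + matrix[:, 2]

# Hypothetical eye landmarks in pixel coordinates (x, y).
eyes = np.array([[30.0, 40.0], [70.0, 50.0]])
matrix, tilt_degrees = eye_alignment_matrix(eyes[0], eyes[1])
aligned_eyes = apply_affine(matrix, eyes)
```

The same matrix can then be handed to an image-warping routine to rotate the face crop itself before it is resized for the network.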
LIME (Local Interpretable Model-Agnostic Explanations) was used to visualize model attention.
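
To give a feel for what LIME does, the sketch below implements a heavily simplified version of the idea in NumPy: hide random groups of pixels, record how the class score changes, and fit a weighted linear model whose coefficients indicate which regions matter. It is not the `lime` library and not the project's code. The fixed grid segments (instead of superpixels), the distance measure, the single-image `predict_fn` signature, and all names are assumptions made for illustration.

```python
import numpy as np

def lime_grid_explanation(image, predict_fn, class_idx, grid=4,
                          num_samples=500, kernel_width=0.25, seed=0):
    """Return a (grid, grid) array of region importances for `class_idx`."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    # Assign every pixel to one of grid*grid rectangular segments.
    rows = np.arange(h) * grid // h
    cols = np.arange(w) * grid // w
    segments = rows[:, None] * grid + cols[None, :]
    n_seg = grid * grid

    # Random on/off patterns over segments; the first keeps the full image.
    masks = rng.integers(0, 2, size=(num_samples, n_seg))
    masks[0] = 1
    baseline = image.mean()  # hidden segments are filled with the mean intensity
    outputs = np.empty(num_samples)
    for i, mask in enumerate(masks):
        keep = mask[segments].astype(bool)
        if image.ndim == 3:
            keep = keep[..., None]
        outputs[i] = predict_fn(np.where(keep, image, baseline))[class_idx]

    # Samples closer to the original image (fewer hidden segments) weigh more.
    distance = 1.0 - masks.mean(axis=1)
    weights = np.exp(-(distance ** 2) / kernel_width ** 2)

    # Weighted ridge regression of the class score on the on/off pattern.
    X = np.hstack([masks, np.ones((num_samples, 1))])
    gram = X.T @ (weights[:, None] * X) + 1e-3 * np.eye(n_seg + 1)
    coef = np.linalg.solve(gram, X.T @ (weights * outputs))
    return coef[:n_seg].reshape(grid, grid)

# Toy "model": the score for class 1 is the brightness of the top-left quadrant.
def toy_predict(img):
    s = float(img[:24, :24].mean())
    return np.array([1.0 - s, s])

toy_image = np.zeros((48, 48))
toy_image[:24, :24] = 1.0
heatmap = lime_grid_explanation(toy_image, toy_predict, class_idx=1)
```

In this toy setup the four top-left cells of `heatmap` receive the largest coefficients, because they are the only regions the toy model looks at. With a real classifier, `predict_fn` would wrap the network's forward pass.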
The following figures illustrate class-wise metrics before and after alignment for the XceptionNet model:
Before Alignment | After Alignment |
---|---|
![]() | ![]() |
The final models were exported to ONNX format and deployed in a real-time facial expression recognition system.
Pipeline:
- Face detection via MTCNN.
- 96x96 RGB face crops fed to ONNX model.
- Predicted emotion displayed on screen.
- Runs at a high frame rate with low latency.
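
The steps above can be sketched roughly as follows, covering only the pre- and post-processing around the model call. Several details are assumptions rather than facts from this project: the nearest-neighbour resize, the `[0, 1]` scaling, the channels-first batch layout, and the class index order (taken from the order in which the classes are listed earlier). `fake_model` is a hypothetical stand-in for the real inference call, which with ONNX Runtime would typically be an `InferenceSession.run` call.

```python
import numpy as np

# Assumed to match the model's output order.
EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

def preprocess_crop(crop, size=96):
    """Turn a face crop (H, W) or (H, W, 3) into a (1, 3, size, size) float batch."""
    crop = np.asarray(crop, dtype=np.float32)
    if crop.ndim == 2:
        # Replicate a grayscale crop across three channels.
        crop = np.stack([crop] * 3, axis=-1)
    h, w = crop.shape[:2]
    # Nearest-neighbour resize using integer index maps.
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    resized = crop[ys[:, None], xs[None, :]]
    scaled = resized / 255.0                 # assumed normalisation
    return scaled.transpose(2, 0, 1)[None]   # channels-first plus batch axis

def decode_logits(logits):
    """Map raw model outputs to (label, probability) via softmax."""
    logits = np.asarray(logits, dtype=np.float64).ravel()
    exp = np.exp(logits - logits.max())      # subtract max for numerical stability
    probs = exp / exp.sum()
    best = int(probs.argmax())
    return EMOTIONS[best], float(probs[best])

def fake_model(batch):
    """Hypothetical stand-in that always returns the same logits."""
    return np.array([[0.1, -1.0, 0.2, 2.5, 0.0, 0.3, 0.4]])

# Random pixels standing in for a detected face crop.
crop = np.random.default_rng(0).integers(0, 256, size=(120, 100, 3), dtype=np.uint8)
batch = preprocess_crop(crop)
label, confidence = decode_logits(fake_model(batch))
```

Whatever preprocessing is used at inference time has to match the one used during training, so the scaling step here should be replaced with the training-time normalisation.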
- Sampling strategies and custom loss functions help mitigate class imbalance.
- Face alignment significantly improves model accuracy and reliability.
- ONNX conversion and simplification enable real-time applications.
- LIME visualizations aid in interpreting model behavior and misclassifications.
- FER2013 is grayscale and low-resolution, limiting fine-grained feature learning.
- Extremely low sample count in classes like Disgust still causes occasional misclassifications.
- Real-world deployment requires more diverse and high-resolution datasets.