This project tackles the common challenge of class imbalance in credit card fraud detection by leveraging Generative Adversarial Networks (GANs) to create synthetic fraud transaction data. The goal is to improve machine learning model performance by generating realistic synthetic samples of the minority class.
Credit card fraud is a rare yet high-impact event. Traditional oversampling techniques (e.g., SMOTE) may not accurately replicate the distribution of fraudulent transactions. This project uses GANs to create realistic synthetic fraud samples that balance the dataset and improve model recall and F1-score, so that more fraudulent transactions are caught.
- Python
- Pandas, NumPy, Matplotlib, Seaborn
- Scikit-learn
- TensorFlow / Keras (GANs)
- PCA (Principal Component Analysis)
- ML Algorithms: Logistic Regression, Random Forest, XGBoost
- Data Preprocessing: Normalized and cleaned the dataset for training.
- Imbalance Analysis: Visualized the class distribution to show the imbalance.
- GAN Training: Developed and trained a GAN to synthesize fraud data.
- PCA Visualization: Compared real vs. synthetic data using PCA plots.
- Model Evaluation: Compared model performance with and without GAN data.
- Performance Boost: Demonstrated improved F1-score and recall.
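
The preprocessing, imbalance check, and with/without comparison listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the data comes from scikit-learn's `make_classification` rather than the real transactions, and the `synthetic` rows are jittered copies of real fraud rows standing in for GAN output. The `evaluate` helper and all variable names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Toy imbalanced data standing in for the real transactions (assumption).
X, y = make_classification(
    n_samples=5000, n_features=10, n_informative=5,
    weights=[0.98, 0.02], random_state=0,
)
print("class counts (legit, fraud):", np.bincount(y))

# Hold out a test set of real rows only; synthetic rows never go in here.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Fit the scaler on the training split only to avoid leaking test statistics.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

def evaluate(X_fit, y_fit):
    """Train on the given rows and score recall and F1 on the real test set."""
    model = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)
    pred = model.predict(X_test)
    return {"recall": recall_score(y_test, pred), "f1": f1_score(y_test, pred)}

# Placeholder for GAN output: jittered copies of real fraud rows (assumption).
fraud_train = X_train[y_train == 1]
n_needed = int((y_train == 0).sum() - (y_train == 1).sum())
idx = rng.integers(0, len(fraud_train), size=n_needed)
synthetic = fraud_train[idx] + rng.normal(scale=0.1, size=(n_needed, X_train.shape[1]))

# Augment the training set until both classes have the same number of rows.
X_aug = np.vstack([X_train, synthetic])
y_aug = np.concatenate([y_train, np.ones(n_needed, dtype=y_train.dtype)])

baseline = evaluate(X_train, y_train)
augmented = evaluate(X_aug, y_aug)
print("without synthetic data:", baseline)
print("with synthetic data:   ", augmented)
```

Scoring both models on the same untouched test split keeps the comparison fair: any change in recall or F1 then comes from the added training rows, not from a different evaluation set.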
- The Generator network learns to produce realistic fraud samples from noise.
- The Discriminator learns to distinguish between real and fake transactions.
- Together, the adversarial setup drives the generator to create highly realistic data.
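
To make the adversarial loop concrete, here is a deliberately tiny, framework-free NumPy sketch. It is not the project's TensorFlow/Keras model: the generator is a single linear layer, the discriminator is a logistic regression, and `real_fraud` is random placeholder data. All names and hyperparameters are assumptions chosen for illustration. It only shows how the two alternating updates fit together; a model this small should not be expected to produce realistic transactions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(logits):
    # Clip to keep np.exp from overflowing on extreme logits.
    return 1.0 / (1.0 + np.exp(-np.clip(logits, -30.0, 30.0)))

# Hypothetical stand-in for scaled real fraud rows.
n_features, noise_dim, batch, lr = 4, 8, 64, 0.05
real_fraud = rng.normal(loc=2.0, scale=0.5, size=(512, n_features))

# Generator: linear map from noise to feature space.
W_g = rng.normal(scale=0.1, size=(noise_dim, n_features))
b_g = np.zeros(n_features)
# Discriminator: logistic regression giving P(row is real).
w_d = rng.normal(scale=0.1, size=n_features)
b_d = 0.0

def generate(n):
    """Sample noise and map it through the generator."""
    z = rng.normal(size=(n, noise_dim))
    return z, z @ W_g + b_g

for step in range(500):
    x_real = real_fraud[rng.integers(0, len(real_fraud), size=batch)]
    z, x_fake = generate(batch)

    # Discriminator step: minimize -log D(real) - log(1 - D(fake)).
    p_real = sigmoid(x_real @ w_d + b_d)
    p_fake = sigmoid(x_fake @ w_d + b_d)
    w_d -= lr * (x_real.T @ (p_real - 1.0) + x_fake.T @ p_fake) / batch
    b_d -= lr * (np.sum(p_real - 1.0) + np.sum(p_fake)) / batch

    # Generator step: minimize -log D(fake), the non-saturating generator loss.
    p_fake = sigmoid(x_fake @ w_d + b_d)
    grad_x = np.outer(p_fake - 1.0, w_d) / batch  # gradient w.r.t. fake rows
    W_g -= lr * (z.T @ grad_x)
    b_g -= lr * grad_x.sum(axis=0)

# Draw synthetic rows from the trained generator.
_, synthetic_fraud = generate(100)
print(synthetic_fraud.shape)
```

In a Keras version the same two steps would typically be written with separate optimizers for the generator and the discriminator, with the gradients computed automatically instead of by hand.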
The following PCA plot illustrates how closely the synthetic fraud samples (orange) generated by the GAN align with the real fraud transactions (blue) in the feature space.
Blue = Real Fraud | Orange = Synthetic Fraud
A strong overlap suggests that the GAN has captured the main structure of the real fraud distribution, at least along the first two principal components.
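
A comparison like this could be produced along the following lines. This is a hedged sketch: `real_fraud` and `synthetic_fraud` are random placeholders for the project's arrays, `plot_overlap` is a hypothetical helper, and fitting the PCA on the real rows only (so that real data defines the axes) is an assumption, not necessarily what the project does.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Placeholder arrays; in practice these would be the scaled real and
# GAN-generated fraud rows with identical feature columns.
real_fraud = rng.normal(size=(300, 10))
synthetic_fraud = rng.normal(size=(300, 10))

# Fit the projection on real fraud only, then apply it to both sets.
pca = PCA(n_components=2).fit(real_fraud)
real_2d = pca.transform(real_fraud)
synth_2d = pca.transform(synthetic_fraud)

def plot_overlap(real_2d, synth_2d):
    """Scatter real (blue) and synthetic (orange) points in PCA space."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots()
    ax.scatter(real_2d[:, 0], real_2d[:, 1], s=10, alpha=0.5,
               color="tab:blue", label="Real Fraud")
    ax.scatter(synth_2d[:, 0], synth_2d[:, 1], s=10, alpha=0.5,
               color="tab:orange", label="Synthetic Fraud")
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    ax.legend()
    return fig

print("explained variance ratio:", pca.explained_variance_ratio_)
```

Calling `plot_overlap(real_2d, synth_2d)` returns a Matplotlib figure. The explained variance ratio is worth checking alongside the plot, since two components may cover only part of the variance and overlap in 2D does not guarantee overlap in the full feature space.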
This project demonstrates the practical use of Generative AI in fraud analytics. By combining GANs and machine learning, it showcases an advanced technique to solve real-world problems like data imbalance, making it ideal for a data science and AI portfolio.