This project detects fraudulent transactions with machine learning. It uses Logistic Regression as a baseline model and applies SMOTE to the training data to counter class imbalance, so that the model does not simply learn to predict the majority (non-fraud) class.
- Goal: Identify fraudulent transactions in a highly imbalanced dataset.
- Model: Logistic Regression
- Techniques Used:
  - Data preprocessing
  - SMOTE (Synthetic Minority Over-sampling Technique)
  - Model training and evaluation
  - Confusion matrix and classification metrics
- The dataset contains transaction records labeled as:
  - 0: Non-fraudulent
  - 1: Fraudulent
- Highly imbalanced: Fraud cases are rare compared to non-fraud.
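
To make the imbalance concrete, the sketch below rebuilds a label column from the test-set class counts reported in the results (56,864 non-fraud, 98 fraud) and computes each class's share with pandas. The variable names are illustrative, and the ratio in the full dataset may differ from this test split.

```python
import pandas as pd

# Labels rebuilt from the test-set support counts reported in the results
# (56,864 non-fraud, 98 fraud). In the real project the labels would come
# from the loaded dataset instead.
labels = pd.Series([0] * 56864 + [1] * 98, name="label")

counts = labels.value_counts()                # absolute count per class
shares = labels.value_counts(normalize=True)  # fraction per class
fraud_rate = shares.loc[1]

print(counts)
print(f"Fraud share: {fraud_rate:.4%}")
```

With fewer than 2 fraud cases per 1,000 transactions, a model that always predicts "non-fraud" would still score above 99% accuracy, which is why the results are read through precision and recall.
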
- Python
- scikit-learn
- imbalanced-learn
- pandas, numpy
- matplotlib / seaborn (optional for visualization)
- Data Preprocessing
  - Feature selection and scaling
  - Train-test split
- Balancing the Training Set
  - Applied SMOTE to oversample the minority class
- Model Training
  - Trained Logistic Regression on the balanced dataset
- Evaluation
  - Confusion matrix
  - Classification report
  - Accuracy, precision, recall, F1-score
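Confusion matrix (rows are actual classes, columns are predicted classes):

Actual \ Predicted | 0 (Non-Fraud) | 1 (Fraud) |
---|---|---|
0 (Non-Fraud) | 55406 | 1458 |
1 (Fraud) | 8 | 90 |
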
Class | Precision | Recall | F1-score | Support |
---|---|---|---|---|
0 (Non-Fraud) | 1.00 | 0.97 | 0.99 | 56864 |
1 (Fraud) | 0.06 | 0.92 | 0.11 | 98 |
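- Accuracy: 97.4% (dominated by the majority class, so it says little about fraud detection on its own)
- High recall for fraud: 92% of fraud cases detected (90 of 98)
- Low precision for fraud: only 6% of transactions flagged as fraud are actually fraudulent (1,458 false positives against 90 true positives)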
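
The figures above follow directly from the confusion matrix. The short check below recomputes them with NumPy, assuming the scikit-learn layout in which rows are actual classes and columns are predicted classes; that layout is consistent with the support counts in the table.

```python
import numpy as np

# Confusion matrix reported above: rows = actual, columns = predicted.
cm = np.array([[55406, 1458],
               [8, 90]])
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)   # share of fraud alerts that are real fraud
recall = tp / (tp + fn)      # share of real fraud that is caught
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / cm.sum()

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"f1={f1:.2f} accuracy={accuracy:.3f}")
# prints: precision=0.06 recall=0.92 f1=0.11 accuracy=0.974
```

Put differently, the model raises about 16 false alarms for every fraud case it correctly flags (1,458 / 90), while missing 8 of the 98 fraud cases.
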
Oussama
Passionate about building scalable systems, solving technical challenges, and exploring data-driven solutions.
Feel free to reach out or contribute to the project!