A production-ready machine learning system for detecting fraudulent credit card transactions using XGBoost, MLflow, and MinIO.
| Metric | Score | Status |
|---|---|---|
| Accuracy | 99.87% | 🏆 Excellent |
| AUC | 99.79% | 🏆 Excellent |
| Precision | 93.28% | ✅ Very Good |
| Recall | 83.21% | |
| F1 Score | 87.96% | ✅ Very Good |
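F1 is the harmonic mean of precision and recall, so the F1 row can be checked against the two rows above it. A minimal check in plain Python, using the table's own precision and recall values:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the metrics table
print(f"{f1_score(0.9328, 0.8321):.4f}")  # → 0.8796
```

The result agrees with the reported 87.96%.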
- Docker & Docker Compose
- Python 3.9+
- Git
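Before installing anything, you may want to confirm these prerequisites are present. The sketch below is a hypothetical helper, not part of the repository. It checks the interpreter version and looks for the tools on `PATH`. Newer Docker installs ship Compose as the `docker compose` subcommand rather than a separate `docker-compose` binary, so adjust `REQUIRED_TOOLS` if that applies to you.

```python
import shutil
import sys
from typing import Iterable, List, Sequence

MIN_PYTHON = (3, 9)
# Hypothetical list mirroring the prerequisites above
REQUIRED_TOOLS = ("docker", "docker-compose", "git")


def python_ok(version: Sequence[int] = tuple(sys.version_info[:3]),
              minimum: Sequence[int] = MIN_PYTHON) -> bool:
    """True if (major, minor) of `version` is at least `minimum`."""
    return tuple(version[:2]) >= tuple(minimum)


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the command-line tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def check_prerequisites() -> List[str]:
    """Human-readable problems; an empty list means everything was found."""
    problems = []
    if not python_ok():
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required")
    problems.extend(f"'{tool}' not found on PATH" for tool in missing_tools())
    return problems
```

Call `check_prerequisites()` and resolve anything it reports before running `docker-compose up -d`.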
```bash
git clone https://github.com/your-username/transfraud-model.git
cd transfraud-model
```

```bash
# Start MLflow, MinIO, and monitoring services
docker-compose up -d
```

```bash
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

```bash
# Create data directory and add your dataset
mkdir -p data
# Place your credit-card-transaction-fraud-dataset.csv in the data/ folder
```

```bash
python src/train.py
```

```
transfraud-model/
├── src/                    # Source code
│   ├── train.py            # Main training script
│   ├── data_utils.py       # Data loading & validation
│   ├── features.py         # Feature engineering
│   ├── eval_utils.py       # Model evaluation
│   └── __init__.py
├── data/                   # Dataset directory
├── artifacts/              # Generated artifacts
│   ├── models/             # Trained models
│   ├── plots/              # Evaluation plots
│   ├── reports/            # Performance reports
│   └── analysis/           # Feature analysis
├── logs/                   # Training logs
├── docker-compose.yml      # Service orchestration
├── requirements.txt        # Python dependencies
└── README.md               # This file
```
- XGBoost - Gradient boosting for fraud detection
- Scikit-learn - Data preprocessing & evaluation
- Pandas & NumPy - Data manipulation
- MLflow - Experiment tracking & model registry
- MinIO - S3-compatible artifact storage
- Docker - Containerization
- Boto3 - AWS SDK for MinIO integration
- Matplotlib/Seaborn - Performance visualization
- Structured Logging - Comprehensive training logs
- Time-based Features: Hour, day, month extraction
- Location Features: Distance between cardholder (`lat`, `long`) and merchant (`merch_lat`, `merch_long`) coordinates
- Categorical Encoding: Merchant, category, location encoding
- Transaction Patterns: Amount analysis, frequency features
- Class Imbalance Handling: XGBoost `scale_pos_weight` adjustment
- Threshold Optimization: Classification threshold tuned for the recall/precision trade-off
- Multiple Model Strategies: Balanced, recall-optimized, high-recall variants
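The imbalance and threshold items above can be illustrated with a small, dependency-free sketch. The XGBoost documentation suggests `count(negative) / count(positive)` as a starting value for `scale_pos_weight`. The threshold rule here (best precision among thresholds that keep recall at or above a target, defaulting to the 80% minimum listed later in this README) is an assumption for illustration and not necessarily what `train.py` does.

```python
from typing import Optional, Sequence


def scale_pos_weight(labels: Sequence[int]) -> float:
    """XGBoost's suggested starting value: count(negative) / count(positive)."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no positive (fraud) examples")
    return negatives / positives


def pick_threshold(labels: Sequence[int], scores: Sequence[float],
                   min_recall: float = 0.80) -> Optional[float]:
    """Threshold with the best precision among those whose recall >= min_recall.

    Candidate thresholds are the distinct predicted scores; a transaction is
    flagged when its score >= threshold. Returns None if no candidate qualifies.
    """
    total_pos = sum(labels)
    best = None  # (precision, threshold)
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        if total_pos == 0 or tp + fp == 0:
            continue
        recall = tp / total_pos
        precision = tp / (tp + fp)
        if recall >= min_recall and (best is None or precision > best[0]):
            best = (precision, t)
    return None if best is None else best[1]
```

The first value would be passed as `scale_pos_weight=` to `xgboost.XGBClassifier`; the second would be applied to `predict_proba` scores on a validation split, never on the test set.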
```bash
MLFLOW_TRACKING_URI=http://localhost:5001
MLFLOW_S3_ENDPOINT_URL=http://localhost:9000
AWS_ACCESS_KEY_ID=minio
AWS_SECRET_ACCESS_KEY=minio123
```

- MLflow: http://localhost:5001
- MinIO: http://localhost:9000
- MinIO Console: http://localhost:9001
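To talk to MinIO directly (for example, to list stored artifacts), a boto3 S3 client can be pointed at the endpoint above using the same environment variables. This is a sketch; the helper names are hypothetical, and boto3 is imported lazily so the configuration logic can be used without it installed.

```python
import os
from typing import Mapping


def minio_client_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Keyword arguments for boto3.client('s3', ...) built from the variables above."""
    return {
        "endpoint_url": env.get("MLFLOW_S3_ENDPOINT_URL", "http://localhost:9000"),
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }


def make_minio_client(env: Mapping[str, str] = os.environ):
    """Create an S3 client aimed at MinIO (boto3 from requirements.txt, assumed)."""
    import boto3

    return boto3.client("s3", **minio_client_kwargs(env))
```

With the services running and the variables exported, `make_minio_client().list_buckets()` should return the buckets MinIO is serving.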
Access experiment results at: http://localhost:5001
- Parameter tracking
- Metric comparison
- Artifact storage
- Model versioning
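A minimal sketch of logging parameters and metrics to that MLflow server with the standard tracking calls (`set_tracking_uri`, `set_experiment`, `start_run`, `log_params`, `log_metrics`). The experiment name is hypothetical, and the `tracker` argument exists only so the function can be exercised without a running server; by default it uses the real `mlflow` module.

```python
import os
from typing import Mapping, Optional


def log_training_run(params: Mapping[str, object],
                     metrics: Mapping[str, float],
                     experiment: str = "transfraud-demo",  # hypothetical experiment name
                     tracker: Optional[object] = None) -> None:
    """Record one training run's parameters and metrics in MLflow."""
    if tracker is None:
        import mlflow as tracker  # installed via requirements.txt (assumed)
    tracker.set_tracking_uri(os.environ.get("MLFLOW_TRACKING_URI", "http://localhost:5001"))
    tracker.set_experiment(experiment)
    with tracker.start_run():
        tracker.log_params(dict(params))
        tracker.log_metrics({name: float(value) for name, value in metrics.items()})
```

For example, `log_training_run({"max_depth": 6}, {"recall": 0.8321})` would create one run whose parameters and metrics then appear side by side in the MLflow UI.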
Access stored models at: http://localhost:9001
- Organized model versions
- Evaluation plots
- Performance reports
- Feature analysis
- ROC Curves & Confusion Matrices
- Feature Importance Analysis
- Classification Reports
- Model Performance Metrics
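For readers who want to see what sits behind the confusion matrix and ROC AUC, here is a small plain-Python sketch. It is illustrative only; on real data, `sklearn.metrics.confusion_matrix` and `sklearn.metrics.roc_auc_score` are the practical choices. AUC is computed here as the probability that a randomly chosen fraud case scores higher than a randomly chosen legitimate one, with ties counted as half.

```python
from typing import Sequence


def confusion_counts(labels: Sequence[int], preds: Sequence[int]) -> dict:
    """Count TP, FP, TN, FN for binary labels (1 = fraud)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(labels, preds):
        if p == 1:
            counts["tp" if y == 1 else "fp"] += 1
        else:
            counts["fn" if y == 1 else "tn"] += 1
    return counts


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUC; O(P * N), so only for small illustrations."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```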
```bash
python src/train.py

# With debug logging
python src/train.py --log-level DEBUG

# Single model strategy (faster)
python src/train.py --single-model
```

- Data Validation - Schema and quality checks
- Feature Engineering - Time, location, categorical features
- Model Selection - Multiple XGBoost variants compared
- Threshold Optimization - Recall-precision balance
- Artifact Generation - Plots, reports, analysis
- Model Registration - MLflow model registry
- Storage - MinIO artifact organization
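The pipeline above runs as an ordered sequence of stages. The sketch below shows one way to structure that, together with an `argparse` parser for the two flags documented earlier. How `train.py` actually parses `--log-level` and `--single-model`, and its default log level, are assumptions here; the stage functions are hypothetical placeholders that each take and return a shared state dict.

```python
import argparse
import logging
from typing import Callable, List, Optional, Sequence, Tuple

logger = logging.getLogger("transfraud")  # hypothetical logger name


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse the two documented flags (default log level assumed to be INFO)."""
    parser = argparse.ArgumentParser(description="Train the fraud detection model")
    parser.add_argument("--log-level", default="INFO",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR"])
    parser.add_argument("--single-model", action="store_true",
                        help="train one model strategy instead of comparing several")
    return parser.parse_args(argv)


Stage = Tuple[str, Callable[[dict], dict]]


def run_pipeline(stages: List[Stage], state: dict) -> dict:
    """Run stages in order, passing state along; stop at the first failure."""
    for name, stage in stages:
        logger.info("Starting stage: %s", name)
        try:
            state = stage(state)
        except Exception:
            logger.exception("Stage failed: %s", name)
            raise
        logger.info("Finished stage: %s", name)
    return state
```

A caller would typically run `args = parse_args()`, then `logging.basicConfig(level=args.log_level)`, and pass a list such as `[("data_validation", validate), ("feature_engineering", build_features), ...]` to `run_pipeline`. Failing fast means a broken validation step never produces a registered model.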
```csv
amt,is_fraud,merchant,category,city,state,lat,long,merch_lat,merch_long,trans_date_trans_time
29.99,0,store_A,gas,New York,NY,40.7128,-74.0060,40.7130,-74.0062,2023-01-01 08:30:00
```

- `amt`: Transaction amount (numeric)
- `is_fraud`: Fraud indicator (binary, 0/1)
- `merchant`, `category`: Transaction details
- Location fields: `lat`, `long`, `merch_lat`, `merch_long`
- Timestamp: `trans_date_trans_time`
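Given that schema, row-level validation and the time and distance features described earlier can be sketched with the standard library alone. This assumes the column names and timestamp format shown in the sample; using day of week for the "day" feature and the haversine formula for distance are illustrative choices, not necessarily what `features.py` implements.

```python
import csv
import io
import math
from datetime import datetime

REQUIRED_COLUMNS = {"amt", "is_fraud", "merchant", "category", "lat", "long",
                    "merch_lat", "merch_long", "trans_date_trans_time"}


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres (mean Earth radius 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def row_features(row: dict) -> dict:
    """Validate one CSV row and derive simple time and distance features."""
    missing = REQUIRED_COLUMNS - set(row)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    if row["is_fraud"] not in ("0", "1"):
        raise ValueError("is_fraud must be 0 or 1")
    ts = datetime.strptime(row["trans_date_trans_time"], "%Y-%m-%d %H:%M:%S")
    return {
        "amt": float(row["amt"]),
        "hour": ts.hour,
        "day_of_week": ts.weekday(),  # Monday = 0
        "month": ts.month,
        "distance_km": haversine_km(float(row["lat"]), float(row["long"]),
                                    float(row["merch_lat"]), float(row["merch_long"])),
    }


SAMPLE = (
    "amt,is_fraud,merchant,category,city,state,lat,long,merch_lat,merch_long,trans_date_trans_time\n"
    "29.99,0,store_A,gas,New York,NY,40.7128,-74.0060,40.7130,-74.0062,2023-01-01 08:30:00\n"
)
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(row_features(rows[0]))
```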
The model prioritizes these patterns for fraud detection:
- Transaction Amount Patterns
- Geographic Anomalies
- Time-based Suspicious Activity
- Merchant Category Risk Levels
- User Behavioral Patterns
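One simple way to quantify "merchant category risk levels" is the observed fraud rate per category. This is an assumed approach for illustration, not the project's documented method. To avoid leaking labels into evaluation, compute it on the training split only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def fraud_rate_by(rows: Iterable[Mapping[str, str]], key: str = "category") -> Dict[str, float]:
    """Share of fraudulent rows for each distinct value of `key`."""
    totals: Dict[str, int] = defaultdict(int)
    frauds: Dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        frauds[row[key]] += int(row["is_fraud"])
    return {value: frauds[value] / totals[value] for value in totals}
```

The same function works for `key="merchant"` or `key="state"`. Rare values produce noisy rates, so smoothing toward the global fraud rate is a common refinement.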
- 83% Fraud Detection Rate (Recall)
- 93% Alert Accuracy (Precision)
- <7% False Alerts (1 - Precision)
- 99.8% Overall Accuracy
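Precision, the share of alerts that are wrong, and the false positive rate are easy to conflate on imbalanced data. The counts below are hypothetical (100,000 transactions, 500 of them fraud), chosen only to roughly match the recall and precision in the metrics table. They show that about 7% of alerts are false while only a tiny fraction of legitimate transactions gets flagged. The same imbalance is why overall accuracy stays high even when some fraud is missed.

```python
def alert_rates(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Rates derived from confusion-matrix counts (1 = fraud)."""
    return {
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "false_discovery_rate": fp / (tp + fp),  # share of alerts that are wrong
        "false_positive_rate": fp / (fp + tn),   # share of legit transactions flagged
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts, not results from this project
print(alert_rates(tp=416, fp=30, tn=99_470, fn=84))
```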
Minimum Standards:
- Recall: >80%
- Precision: >85%
- F1 Score: >85%
- AUC: >95%

Access real-time metrics at:
- MLflow: http://localhost:5001
- MinIO: http://localhost:9001
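The minimum standards listed above can be enforced as an automatic gate before a model is registered. A minimal sketch, assuming metrics are stored as fractions (0.80 rather than 80%); the dictionary keys are hypothetical names.

```python
from typing import Dict, List

# Minimum standards from this README, as fractions
MIN_STANDARDS: Dict[str, float] = {"recall": 0.80, "precision": 0.85, "f1": 0.85, "auc": 0.95}


def failed_standards(metrics: Dict[str, float],
                     standards: Dict[str, float] = MIN_STANDARDS) -> List[str]:
    """Names of metrics that are missing or do not exceed their minimum."""
    return [name for name, minimum in standards.items()
            if metrics.get(name, float("-inf")) <= minimum]


# Values from the metrics table at the top of this README
print(failed_standards({"recall": 0.8321, "precision": 0.9328, "f1": 0.8796, "auc": 0.9979}))
```

An empty list means the model passes. A training script could exit with a non-zero status (for example `sys.exit(1)`) whenever the list is non-empty, so a regressed model is never promoted.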
Data Validation → Model Training → Evaluation → Registry → Deployment
- Data quality checks
- Model performance validation
- Artifact integrity verification
- Storage confirmation
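Artifact integrity verification can be as simple as comparing SHA-256 digests of the local artifact and the copy read back from storage. This is a sketch with hypothetical helper names. Downloading the object from MinIO (for example with boto3's `download_file`) is left to the caller.

```python
import hashlib
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def sha256_of(path: PathLike, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(local_path: PathLike, downloaded_path: PathLike) -> bool:
    """True if the copy fetched back from storage matches the local original."""
    return sha256_of(local_path) == sha256_of(downloaded_path)
```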
- Model training pipeline
- Experiment tracking
- Artifact storage
- Model versioning
- Performance monitoring
- Docker containerization
- Real-time inference API
- Automated retraining
- Advanced feature engineering
- A/B testing framework
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Commit your changes: `git commit -m 'Add amazing feature'`
- Push to the branch: `git push origin feature/amazing-feature`
- Open a Pull Request
- Follow PEP 8 guidelines
- Include type hints
- Add comprehensive logging
- Update documentation
This project is licensed under the MIT License - see the LICENSE file for details.
- Dataset: Kaggle Fraud Detection
- MLflow for experiment tracking
- XGBoost for robust gradient boosting
- MinIO for S3-compatible storage
Ready for Production 🚀
For questions or support, please open an issue or reach out to me at nestorabiawuh@gmail.com.