
Fraud Detection Model 🔍

A production-ready machine learning system for detecting fraudulent credit card transactions using XGBoost, MLflow, and MinIO.


📊 Model Performance

Metric      Score     Status
Accuracy    99.87%    🏆 Excellent
AUC         99.79%    🏆 Excellent
Precision   93.28%    ✅ Very Good
Recall      83.21%    ⚠️ Good (Improving)
F1 Score    87.96%    ✅ Very Good

🚀 Quick Start

Prerequisites

  • Docker & Docker Compose
  • Python 3.9+
  • Git

1. Clone & Setup

git clone https://github.com/your-username/transfraud-model.git
cd transfraud-model

2. Start Services

# Start MLflow, MinIO, and monitoring services
docker-compose up -d

3. Install Dependencies

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt

4. Prepare Data

# Create data directory and add your dataset
mkdir -p data
# Place your credit-card-transaction-fraud-dataset.csv in data/ folder

5. Train Model

python src/train.py

🏗️ Project Architecture

transfraud-model/
├── src/                    # Source code
│   ├── train.py           # Main training script
│   ├── data_utils.py      # Data loading & validation
│   ├── features.py        # Feature engineering
│   ├── eval_utils.py      # Model evaluation
│   └── __init__.py
├── data/                   # Dataset directory
├── artifacts/             # Generated artifacts
│   ├── models/            # Trained models
│   ├── plots/             # Evaluation plots
│   ├── reports/           # Performance reports
│   └── analysis/          # Feature analysis
├── logs/                  # Training logs
├── docker-compose.yml     # Service orchestration
├── requirements.txt       # Python dependencies
└── README.md             # This file

🛠️ Tech Stack

Core ML

  • XGBoost - Gradient boosting for fraud detection
  • Scikit-learn - Data preprocessing & evaluation
  • Pandas & NumPy - Data manipulation

MLOps & Infrastructure

  • MLflow - Experiment tracking & model registry
  • MinIO - S3-compatible artifact storage
  • Docker - Containerization
  • Boto3 - AWS SDK for MinIO integration

Monitoring & Visualization

  • Matplotlib/Seaborn - Performance visualization
  • Structured Logging - Comprehensive training logs

📈 Model Features

Feature Engineering

  • Time-based Features: Hour, day, month extraction
  • Location Features: Distance calculations
  • Categorical Encoding: Merchant, category, location encoding
  • Transaction Patterns: Amount analysis, frequency features (see the sketch below)
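
The exact transformations live in src/features.py; the following is only a minimal sketch of the time, distance, and encoding features described above, assuming a DataFrame with the columns listed under Data Requirements (the haversine helper is illustrative, not necessarily the project's implementation).

import numpy as np
import pandas as pd

def add_basic_features(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative time, distance, and categorical features (not the exact src/features.py logic)."""
    out = df.copy()

    # Time-based features from the transaction timestamp
    ts = pd.to_datetime(out["trans_date_trans_time"])
    out["hour"] = ts.dt.hour
    out["day_of_week"] = ts.dt.dayofweek
    out["month"] = ts.dt.month

    # Haversine distance (km) between cardholder and merchant coordinates
    lat1, lon1 = np.radians(out["lat"]), np.radians(out["long"])
    lat2, lon2 = np.radians(out["merch_lat"]), np.radians(out["merch_long"])
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    out["distance_km"] = 2 * 6371.0 * np.arcsin(np.sqrt(a))

    # Simple integer encoding for the categorical fields
    for col in ["merchant", "category", "city", "state"]:
        out[col + "_enc"] = out[col].astype("category").cat.codes

    return out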

Model Optimization

  • Class Imbalance Handling: Scale positive weight adjustment
  • Threshold Optimization: Dynamic classification thresholds
  • Multiple Model Strategies: Balanced, recall-optimized, and high-recall variants (see the sketch below)
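
Both ideas can be expressed in a few lines; the sketch below shows class weighting via scale_pos_weight and threshold selection from the precision-recall curve. Hyperparameter values and the min_precision floor are illustrative, not the settings used in src/train.py.

import numpy as np
from sklearn.metrics import precision_recall_curve
from xgboost import XGBClassifier

def train_with_threshold(X_train, y_train, X_val, y_val, min_precision=0.85):
    """Sketch: weight the minority class, then pick a decision threshold on validation data."""
    # Counter class imbalance: weight positives by the negative/positive ratio
    spw = (y_train == 0).sum() / max((y_train == 1).sum(), 1)
    model = XGBClassifier(
        n_estimators=300, max_depth=6, learning_rate=0.1,
        scale_pos_weight=spw, eval_metric="auc",
    )
    model.fit(X_train, y_train)

    # Pick the lowest threshold whose precision still meets the floor (maximises recall)
    proba = model.predict_proba(X_val)[:, 1]
    precision, recall, thresholds = precision_recall_curve(y_val, proba)
    ok = np.where(precision[:-1] >= min_precision)[0]
    threshold = float(thresholds[ok[0]]) if len(ok) else 0.5
    return model, threshold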

🔧 Configuration

Environment Variables

MLFLOW_TRACKING_URI=http://localhost:5001
MLFLOW_S3_ENDPOINT_URL=http://localhost:9000
AWS_ACCESS_KEY_ID=minio
AWS_SECRET_ACCESS_KEY=minio123
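
When training runs outside docker-compose, these variables need to be exported in the shell. MLflow's S3 artifact store picks up MLFLOW_S3_ENDPOINT_URL and the AWS_* credentials directly from the environment, so a script typically only sets the tracking URI explicitly; a minimal sketch:

import os
import mlflow

# Point the MLflow client at the local tracking server; artifact uploads to MinIO
# are configured implicitly through MLFLOW_S3_ENDPOINT_URL and the AWS_* variables.
mlflow.set_tracking_uri(os.environ.get("MLFLOW_TRACKING_URI", "http://localhost:5001"))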

Docker Services

docker-compose.yml orchestrates the MLflow tracking server (http://localhost:5001), the MinIO object store (S3 API on http://localhost:9000, console on http://localhost:9001), and the supporting monitoring services; see the file itself for the full service definitions.

📊 Monitoring & Evaluation

MLflow Tracking

Access experiment results at: http://localhost:5001

  • Parameter tracking
  • Metric comparison
  • Artifact storage
  • Model versioning
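
As a rough sketch of what a training run logs (the experiment name, parameter values, and artifact filename here are illustrative, not the exact ones written by src/train.py):

import mlflow

mlflow.set_experiment("fraud-detection")  # hypothetical experiment name
with mlflow.start_run(run_name="xgboost-balanced"):
    mlflow.log_params({"max_depth": 6, "scale_pos_weight": 170.0})
    mlflow.log_metrics({"recall": 0.8321, "precision": 0.9328, "auc": 0.9979})
    mlflow.log_artifact("artifacts/plots/roc_curve.png")  # assumed plot filename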

MinIO Artifacts

Access stored models at: http://localhost:9001

  • Organized model versions
  • Evaluation plots
  • Performance reports
  • Feature analysis
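
Stored artifacts can also be listed programmatically through any S3 client; a minimal boto3 sketch, assuming a bucket named "mlflow" (check docker-compose.yml for the actual bucket):

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="minio",
    aws_secret_access_key="minio123",
)

# "mlflow" is an assumed bucket name; see docker-compose.yml for the real one
for obj in s3.list_objects_v2(Bucket="mlflow").get("Contents", []):
    print(obj["Key"], obj["Size"])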

Generated Artifacts

  • ROC Curves & Confusion Matrices
  • Feature Importance Analysis
  • Classification Reports
  • Model Performance Metrics

🎯 Model Training

Basic Training

python src/train.py

Advanced Options

# With debug logging
python src/train.py --log-level DEBUG

# Single model strategy (faster)
python src/train.py --single-model

Training Process

  1. Data Validation - Schema and quality checks
  2. Feature Engineering - Time, location, categorical features
  3. Model Selection - Multiple XGBoost variants compared
  4. Threshold Optimization - Recall-precision balance
  5. Artifact Generation - Plots, reports, analysis
  6. Model Registration - MLflow model registry
  7. Storage - MinIO artifact organization

🗂️ Data Requirements

Expected Dataset Format

amt,is_fraud,merchant,category,city,state,lat,long,merch_lat,merch_long,trans_date_trans_time
29.99,0,store_A,gas,New York,NY,40.7128,-74.0060,40.7130,-74.0062,2023-01-01 08:30:00

Required Columns

  • amt: Transaction amount (numeric)
  • is_fraud: Fraud indicator (0/1, binary)
  • merchant, category: Transaction details
  • Location fields: lat, long, merch_lat, merch_long
  • Timestamp: trans_date_trans_time
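
A lightweight schema check along these lines catches malformed files before training starts; this is only a sketch, as the actual validation lives in src/data_utils.py.

import pandas as pd

REQUIRED_COLUMNS = {
    "amt", "is_fraud", "merchant", "category", "city", "state",
    "lat", "long", "merch_lat", "merch_long", "trans_date_trans_time",
}

def load_and_validate(path="data/credit-card-transaction-fraud-dataset.csv"):
    """Load the dataset and fail fast if required columns are missing or mistyped."""
    df = pd.read_csv(path)
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Dataset is missing required columns: {sorted(missing)}")
    if not set(df["is_fraud"].unique()) <= {0, 1}:
        raise ValueError("is_fraud must be binary (0/1)")
    return df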

🔍 Model Interpretation

Key Features

The model prioritizes these patterns for fraud detection:

  1. Transaction Amount Patterns
  2. Geographic Anomalies
  3. Time-based Suspicious Activity
  4. Merchant Category Risk Levels
  5. User Behavioral Patterns
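
Which of these patterns actually dominates a trained model can be read off its feature importances; a minimal sketch, assuming a fitted XGBClassifier and the list of feature names used for training:

import pandas as pd

def top_features(model, feature_names, n=10):
    """Return the n most important features of a fitted XGBClassifier."""
    importances = pd.Series(model.feature_importances_, index=feature_names)
    return importances.sort_values(ascending=False).head(n)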

Business Impact

  • 83% Fraud Detection Rate (Recall)
  • 93% Alert Accuracy (Precision)
  • <7% False Alarms among flagged transactions (1 − Precision)
  • 99.8% Overall Accuracy

🚨 Alerting & Monitoring

Performance Thresholds

Minimum Standards:
  - Recall: >80%
  - Precision: >85%
  - F1 Score: >85%
  - AUC: >95%
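
These gates can be enforced programmatically after evaluation; a hedged sketch, with the floors copied from the standards above and the example call using the reported model metrics:

MINIMUM_STANDARDS = {"recall": 0.80, "precision": 0.85, "f1": 0.85, "auc": 0.95}

def check_quality_gates(metrics):
    """Raise if any evaluation metric falls below its minimum standard."""
    failures = {
        name: (metrics.get(name, 0.0), floor)
        for name, floor in MINIMUM_STANDARDS.items()
        if metrics.get(name, 0.0) < floor
    }
    if failures:
        raise RuntimeError(f"Model failed quality gates: {failures}")

check_quality_gates({"recall": 0.8321, "precision": 0.9328, "f1": 0.8796, "auc": 0.9979})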

Monitoring Dashboard

Access real-time metrics through the MLflow UI (http://localhost:5001) and the MinIO console (http://localhost:9001).

🔄 CI/CD Pipeline

Automated Workflow

  1. Data Validation → 2. Model Training → 3. Evaluation → 4. Registry → 5. Deployment

Quality Gates

  • Data quality checks
  • Model performance validation
  • Artifact integrity verification
  • Storage confirmation

🛡️ Production Readiness

✅ Completed

  • Model training pipeline
  • Experiment tracking
  • Artifact storage
  • Model versioning
  • Performance monitoring
  • Docker containerization

🔄 In Progress

  • Real-time inference API
  • Automated retraining
  • Advanced feature engineering
  • A/B testing framework

🤝 Contributing

Development Setup

  1. Fork the repository
  2. Create feature branch: git checkout -b feature/amazing-feature
  3. Commit changes: git commit -m 'Add amazing feature'
  4. Push to branch: git push origin feature/amazing-feature
  5. Open Pull Request

Code Standards

  • Follow PEP 8 guidelines
  • Include type hints
  • Add comprehensive logging
  • Update documentation

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Dataset: Kaggle Fraud Detection
  • MLflow for experiment tracking
  • XGBoost for robust gradient boosting
  • MinIO for S3-compatible storage

Ready for Production 🚀

For questions or support, please open an issue or reach out to me at nestorabiawuh@gmail.com.
