This repository contains a fraud detection pipeline for financial transactions, leveraging data preprocessing, feature engineering, class imbalance handling (SMOTE), and a diverse set of machine learning models (Logistic Regression, Random Forest, LightGBM, CatBoost, XGBoost, and ensemble methods).
Highlights:
- Novel feature engineering (time-based features, transaction amount bucketing, etc.)
- Handling imbalanced data via SMOTE
- Boosting algorithms (LightGBM, XGBoost, CatBoost) for high-dimensional data
- Advanced neural network approach with a supervised AutoEncoder for anomaly detection
- Stacking and voting ensembles for robust, high AUC-ROC performance
Our best model (LightGBM) achieved an AUC-ROC of 0.89 on the Vesta Corporation dataset.
We use the Vesta Corporation dataset from the Kaggle IEEE-CIS Fraud Detection competition (https://www.kaggle.com/competitions/ieee-fraud-detection/overview), which includes:
- Transaction data (TransactionID, card info, transaction amount, time, etc.)
- Identity data (Device info, etc.)
Due to size and privacy concerns, the real dataset is not included in this repo.
Key columns:
- TransactionID
- isFraud (target)
- TransactionDT, TransactionAmt
- Categorical features (ProductCD, card1, card2, etc.)
- Identity features (DeviceType, DeviceInfo)
- Data Preprocessing
  - Missing value imputation
  - High-correlation feature removal via a correlation heatmap (see the sketch below)
  - Encoding categorical features (one-hot or label encoding)
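As an illustration of the correlation-pruning step, here is a minimal sketch that drops one column from every highly correlated pair. It assumes a pandas DataFrame `df` of numeric features, and the 0.95 threshold is an illustrative choice rather than the value used in the notebook:

```python
import numpy as np
import pandas as pd

def drop_highly_correlated(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Drop one column from each pair whose absolute correlation exceeds threshold."""
    corr = df.corr(numeric_only=True).abs()
    # Keep only the upper triangle so each pair is inspected exactly once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)
```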
 
- Feature Engineering
  - Transaction amount bucketing (micro, small, etc.)
  - Time-based features (day-of-week, hour-of-day)
  - Email domain grouping (e.g., major providers vs. niche); see the sketch below
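The sketch below shows one plausible version of these features, assuming the raw Kaggle columns (`TransactionDT` is a seconds offset from a fixed reference time; `P_emaildomain` holds the purchaser's email domain). The bucket edges and the list of major providers are illustrative:

```python
import pandas as pd

def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # TransactionDT is a time delta in seconds, so integer division recovers
    # hour-of-day and day-of-week cycles
    out["hour_of_day"] = (out["TransactionDT"] // 3600) % 24
    out["day_of_week"] = (out["TransactionDT"] // (3600 * 24)) % 7
    # Bucket the transaction amount; cut points here are illustrative
    out["amt_bucket"] = pd.cut(
        out["TransactionAmt"],
        bins=[0, 10, 50, 200, 1000, float("inf")],
        labels=["micro", "small", "medium", "large", "huge"],
    )
    # Group email domains: major providers keep their name, everything else
    # (including missing values) collapses into "other"
    major = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "aol.com"}
    out["email_group"] = out["P_emaildomain"].where(out["P_emaildomain"].isin(major), "other")
    return out
```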
 
- Handling Class Imbalance
  - SMOTE (Synthetic Minority Oversampling Technique) to oversample the minority (fraud) class; see the sketch below
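A minimal sketch of this step with `imbalanced-learn`, using synthetic stand-in data; the key detail is that SMOTE is fit on the training split only, so synthetic points never leak into evaluation:

```python
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Stand-in data with ~3% positives, mimicking the rarity of fraud
X, y = make_classification(n_samples=10_000, n_features=20, weights=[0.97], random_state=42)

# Split first, oversample second: only the training portion is resampled
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
print(f"fraud rate before: {y_train.mean():.3f}, after: {y_res.mean():.3f}")
```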
 
- Model Training
  - Logistic Regression and Random Forest as baselines
  - LightGBM, CatBoost, and XGBoost as boosting methods
  - Hyperparameter tuning via Bayesian Optimization
  - AUC-ROC as the primary metric (see the training sketch below)
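Continuing from the SMOTE sketch above (`X_res`, `y_res`, `X_val`, `y_val`), here is a minimal LightGBM run with AUC-based early stopping; the hyperparameters are placeholders, not the values found by the Bayesian search:

```python
import lightgbm as lgb
from sklearn.metrics import roc_auc_score

model = lgb.LGBMClassifier(
    n_estimators=2000,
    learning_rate=0.05,
    num_leaves=64,  # placeholder values; the notebook tunes these
    random_state=42,
)
model.fit(
    X_res, y_res,
    eval_set=[(X_val, y_val)],
    eval_metric="auc",
    callbacks=[lgb.early_stopping(stopping_rounds=100)],
)
print(f"Validation AUC-ROC: {roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]):.3f}")
```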
 
- Ensemble Methods
  - Voting (soft voting across LightGBM, CatBoost, XGBoost, etc.)
  - Stacking with a meta-learner (see the sketch below)
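Both ensembles can be expressed directly in scikit-learn, as in the sketch below; the base models use default settings here rather than the tuned configurations:

```python
import lightgbm as lgb
import xgboost as xgb
from catboost import CatBoostClassifier
from sklearn.ensemble import StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression

base = [
    ("lgbm", lgb.LGBMClassifier(random_state=42)),
    ("xgb", xgb.XGBClassifier(eval_metric="auc", random_state=42)),
    ("cat", CatBoostClassifier(verbose=0, random_state=42)),
]
# Soft voting averages the base models' predicted probabilities
voter = VotingClassifier(estimators=base, voting="soft")
# Stacking trains a logistic-regression meta-learner on out-of-fold predictions
stacker = StackingClassifier(estimators=base, final_estimator=LogisticRegression(max_iter=1000))
# Both fit like any scikit-learn estimator, e.g. voter.fit(X_res, y_res)
```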
 
- AutoEncoder (Optional Neural Approach)
  - A supervised autoencoder that outputs a fraud probability (or uses reconstruction error); see the sketch below
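One way to realize this is a shared encoder feeding both a reconstruction head and a sigmoid fraud head, as in the Keras sketch below; the layer sizes and loss weights are illustrative assumptions, not the architecture used in the notebook:

```python
import tensorflow as tf
from tensorflow.keras import Model, layers

def build_supervised_autoencoder(n_features: int) -> Model:
    inputs = tf.keras.Input(shape=(n_features,))
    # Shared encoder compresses each transaction into a small latent code
    h = layers.Dense(128, activation="relu")(inputs)
    latent = layers.Dense(32, activation="relu")(h)
    # Decoder head reconstructs the input (the reconstruction-error signal)
    recon = layers.Dense(128, activation="relu")(latent)
    recon = layers.Dense(n_features, name="reconstruction")(recon)
    # Supervised head predicts fraud probability from the latent code
    fraud = layers.Dense(1, activation="sigmoid", name="fraud")(latent)
    model = Model(inputs, [recon, fraud])
    model.compile(
        optimizer="adam",
        loss={"reconstruction": "mse", "fraud": "binary_crossentropy"},
        loss_weights={"reconstruction": 0.5, "fraud": 1.0},  # illustrative weighting
    )
    return model
```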
 
| Model               | AUC-ROC  |
|---------------------|----------|
| Logistic Regression | 0.80     |
| Random Forest       | 0.855    |
| LightGBM            | **0.89** |
| CatBoost            | 0.881    |
| XGBoost             | 0.874    |
| Voting Ensemble     | 0.86     |
| Stacking            | 0.88     |
| AutoEncoder         | 0.86     |
LightGBM emerges as the top performer with 0.89 AUC-ROC, balancing speed and accuracy on this high-dimensional dataset.
- Clone the repo:

  ```bash
  git clone https://github.com/YourUser/transaction-fraud-detection.git
  cd transaction-fraud-detection
  ```

- Set up the environment (create a `requirements.txt` if you like):

  ```bash
  conda create -n fraud python=3.8
  conda activate fraud
  pip install -r requirements.txt
  ```

- Run the notebook, adjusting paths as needed to point to your dataset:

  ```bash
  jupyter notebook notebooks/main.ipynb
  ```
- Explore other techniques for class imbalance (e.g., ADASYN, cost-sensitive learning).
- Investigate deeper neural network architectures or specialized anomaly detection methods.
- Implement real-time streaming pipelines (Spark Streaming, Kafka) for transaction-level fraud detection.
- Dataset by Vesta Corporation [https://www.kaggle.com/competitions/ieee-fraud-detection/overview].
- Project under Dr. Yanjie Fu, Arizona State University.
This project is released under the MIT License. That means you’re free to use, modify, and distribute the code, but you do so at your own risk.
Author: Varshith Dupati 
GitHub: @dvarshith 
Email: dvarshith942@gmail.com 
Issues: Please open an issue on this repo if you have questions or find bugs.