This project aims to predict customer churn using machine learning techniques. The primary goal is to build a predictive model that can determine whether a customer will churn (leave) based on their attributes. The project utilizes various classification algorithms and evaluates model performance using precision, recall, and accuracy metrics.
Customer churn refers to the phenomenon where customers stop doing business with a company. It is crucial for companies to predict churn, as this helps in customer retention strategies. In this project, we use machine learning algorithms to predict whether a customer will churn based on several features like customer demographics, subscription details, and usage behavior.
- Logistic Regression
- Random Forest Classifier
- K-Nearest Neighbors (KNN)
The models are trained, evaluated, and compared based on their performance on a given dataset.
To get started with the project, follow the steps below:
- Clone the repository: `git clone https://github.com/codehass/Customer-Churn-Prediction.git`
- Navigate to the project directory: `cd Customer-Churn-Prediction`
- Install the required dependencies. You can install the required Python libraries via `pip` (make sure you have Python 3.6+ installed): `pip install -r requirements.txt`
  Alternatively, you can install the dependencies manually: `pip install pandas numpy scikit-learn matplotlib seaborn pytest`
The data used in this project comes from the Customer Churn Prediction dataset. The dataset contains customer information, such as demographics, account information, usage patterns, and whether the customer has churned.
The dataset is assumed to be in CSV format (`data-68e11476082f9096032105.csv`), with a `Churn` column indicating whether a customer has churned (Yes) or not (No).
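For illustration, a minimal sketch of loading the raw CSV with `pandas` and checking the `Churn` label (the file name is taken from above; the actual loading logic lives in `pipeline.py`):

```python
import pandas as pd

# Load the raw churn dataset (file name as referenced above).
df = pd.read_csv("data-68e11476082f9096032105.csv")

# Quick sanity checks: shape, missing values, and class balance of the target.
print(df.shape)
print(df.isna().sum())
print(df["Churn"].value_counts())  # expected values: "Yes" / "No"
```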
- Loading the Data: The raw data is loaded using `pandas`.
- Feature Engineering: The features are processed to clean the data, handle missing values, encode categorical variables, and scale numerical features.
- Train-Test Split: The data is split into training and testing datasets using `train_test_split` from `sklearn`.
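The exact preprocessing is implemented in `pipeline.py`; the sketch below only illustrates these three steps with scikit-learn, using hypothetical column names (`tenure`, `monthly_charges`, `contract_type`) in place of the real features:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer

df = pd.read_csv("data-68e11476082f9096032105.csv")

# Map the target to 0/1 and separate it from the features.
y = df["Churn"].map({"Yes": 1, "No": 0})
X = df.drop(columns=["Churn"])

# Hypothetical column groups; the real pipeline derives these from the dataset.
numeric_cols = ["tenure", "monthly_charges"]
categorical_cols = ["contract_type"]

# Scale numerical features and one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# Split into train/test sets, stratifying on the target to preserve class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

X_train_processed = preprocess.fit_transform(X_train)
X_test_processed = preprocess.transform(X_test)
```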
Three machine learning models are used for churn prediction:
- Logistic Regression
- Random Forest Classifier
- K-Nearest Neighbors (KNN)
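As a rough sketch (the notebook may use different hyperparameters), the three classifiers can be instantiated and fitted on the preprocessed training data from the previous step like this:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

# Fit each model on the preprocessed training data.
for name, model in models.items():
    model.fit(X_train_processed, y_train)
```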
Each model is trained on the processed dataset and evaluated based on:
- Accuracy: Overall correctness of the model.
- Precision: Correct positive predictions divided by all positive predictions.
- Recall: Correct positive predictions divided by actual positives.
- F1-Score: Harmonic mean of precision and recall.
Precision-Recall (PR) curves are plotted to visualize the trade-off between precision and recall for each model.
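For illustration, the metrics and PR curves could be computed with scikit-learn as follows, assuming the `models` dictionary and test split from the sketches above (the notebook's actual plotting code may differ):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score, PrecisionRecallDisplay
)

# Report accuracy, precision, recall, and F1 for each model on the test set.
for name, model in models.items():
    y_pred = model.predict(X_test_processed)
    print(
        f"{name}: "
        f"accuracy={accuracy_score(y_test, y_pred):.3f}, "
        f"precision={precision_score(y_test, y_pred):.3f}, "
        f"recall={recall_score(y_test, y_pred):.3f}, "
        f"f1={f1_score(y_test, y_pred):.3f}"
    )

# Overlay the Precision-Recall curves of all three models on one plot.
ax = plt.gca()
for name, model in models.items():
    PrecisionRecallDisplay.from_estimator(
        model, X_test_processed, y_test, name=name, ax=ax
    )
plt.show()
```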
- `pipeline.py`: Contains functions for data preprocessing and splitting the dataset.
- `eda_analysis.ipynb`: Jupyter notebook for exploratory data analysis (EDA) and model experimentation.
- `tests/`: Contains unit tests for various parts of the pipeline, ensuring that the data processing and modeling steps function correctly.
The model's performance is evaluated based on several metrics:
- Confusion Matrix
- Precision-Recall Curve
The precision-recall curve is plotted for all three models, and the one with the best trade-off between precision and recall is selected as the final model for churn prediction.
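As an illustrative sketch (not the notebook's exact code), a confusion matrix for one of the trained models could be produced like this:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Confusion matrix for one of the trained models (here: Random Forest).
y_pred = models["Random Forest"].predict(X_test_processed)
cm = confusion_matrix(y_test, y_pred)
ConfusionMatrixDisplay(cm, display_labels=["No churn", "Churn"]).plot()
plt.show()
```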
1. Train the Models
Open `eda_analysis.ipynb` and run the cells in sequence. This notebook trains the three models: Logistic Regression, Random Forest, and K-Nearest Neighbors. The notebook will:
- Load the dataset
- Preprocess the data
- Train each model
- Evaluate each model's performance based on accuracy, precision, recall, and F1-score
- Plot Precision-Recall curves for comparison
2. View Results
After running the training and evaluation steps, you will see evaluation metrics such as:
- Confusion Matrix
- Precision-Recall Curve
These metrics will help you compare the performance of the models and choose the best one.
This project includes unit tests to ensure the correctness of various parts of the code, including:
- Data processing and splitting: Verify the consistency and correctness of train-test splits.
- Model evaluation: Ensure that the models are evaluated correctly.
To run the tests, execute `pytest` from the project root. This will run all the tests in the `tests/` directory.
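For reference, a hypothetical test checking split consistency might look like the following; the file and function names here are assumptions for illustration, not the repository's actual tests:

```python
# tests/test_pipeline.py -- hypothetical example of a split-consistency test.
import pandas as pd
from sklearn.model_selection import train_test_split


def test_train_test_split_is_consistent():
    # Small synthetic frame standing in for the churn dataset.
    df = pd.DataFrame({"feature": range(100), "Churn": ["Yes", "No"] * 50})
    X = df.drop(columns=["Churn"])
    y = df["Churn"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )

    # The split should preserve all rows and keep features aligned with labels.
    assert len(X_train) + len(X_test) == len(df)
    assert list(X_train.index) == list(y_train.index)
    assert list(X_test.index) == list(y_test.index)
```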
Contributions are welcome! If you find a bug or want to improve the project, feel free to fork the repository and submit a pull request.
To contribute:
- Fork the repository
- Create a new branch
- Make your changes
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.