This project builds a complete Machine Learning pipeline to predict customer churn using real telecom data.
The goal is to predict whether a customer is likely to leave the telecom service (churn) based on their usage and profile.
- Source: Telco Customer Churn (Kaggle)
- Rows: 7,043 customers
- Target column: `Churn` (Yes/No)
- Tuned with `GridSearchCV`
- Evaluated with Accuracy, Confusion Matrix, and Classification Report
- Explained using SHAP and LIME
- Data Loading using Pandas
- Data Cleaning & Preprocessing (see the loading/cleaning sketch after this list)
  - Handled missing values
  - Encoded categorical variables
  - Converted `TotalCharges` to numeric
  - Target column: `Churn` → binary
- Train/Test Split
- Model Training with XGBoost + hyperparameter tuning (see the tuning sketch after this list)
- Model Interpretation (see the SHAP/LIME sketch after this list)
  - Global + local explanations using SHAP
  - Individual predictions explained using LIME
- Saving Outputs (see the saving sketch after this list)
  - Model saved as `.pkl`
  - Predictions saved as `.csv` and `.db`
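
The loading and cleaning steps correspond roughly to the sketch below. The column names (`TotalCharges`, `Churn`, `customerID`) come from the Telco dataset; the exact imputation and encoding choices in the notebook may differ.

```python
import pandas as pd

# Load the raw Kaggle dataset (7,043 rows).
df = pd.read_csv("Telco-Customer-Churn.csv")

# TotalCharges is read as text because of blank entries; coerce to numeric and
# drop the rows that become missing (the notebook may impute them instead).
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
df = df.dropna(subset=["TotalCharges"])

# Map the target to binary and one-hot encode the remaining categoricals;
# cast to float so downstream libraries get a purely numeric matrix.
df["Churn"] = df["Churn"].map({"Yes": 1, "No": 0})
X = pd.get_dummies(df.drop(columns=["Churn", "customerID"]), drop_first=True).astype(float)
y = df["Churn"]
```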
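
A minimal sketch of the split and `GridSearchCV` tuning step, reusing `X` and `y` from the sketch above. The split ratio, seed, and search grid are illustrative; the grid simply includes the best values reported in the results below.

```python
from sklearn.model_selection import GridSearchCV, train_test_split
from xgboost import XGBClassifier

# Hold out a test set for evaluation and explanation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Tune a small XGBoost grid with cross-validation.
param_grid = {
    "n_estimators": [50, 100, 200],
    "learning_rate": [0.01, 0.1],
    "max_depth": [3, 4, 6],
}
search = GridSearchCV(
    XGBClassifier(eval_metric="logloss"),
    param_grid,
    scoring="accuracy",
    cv=5,
)
search.fit(X_train, y_train)
model = search.best_estimator_
print(search.best_params_)  # reported best: n_estimators=50, learning_rate=0.1, max_depth=4
```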
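
The interpretation step can look roughly like this, reusing `model`, `X_train`, and `X_test` from the sketches above. The specific plots and the LIME output file name are illustrative.

```python
import shap
from lime.lime_tabular import LimeTabularExplainer

# Global + local explanations with SHAP (TreeExplainer supports XGBoost).
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test)  # global feature importance
shap.force_plot(explainer.expected_value, shap_values[0], X_test.iloc[0], matplotlib=True)

# One individual prediction explained with LIME.
lime_explainer = LimeTabularExplainer(
    X_train.values,
    feature_names=X_train.columns.tolist(),
    class_names=["No Churn", "Churn"],
    mode="classification",
)
exp = lime_explainer.explain_instance(X_test.iloc[0].values, model.predict_proba)
exp.save_to_file("lime_explanation.html")  # illustrative output file name
```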
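
Saving the trained model and predictions uses the file names listed in the files table below; the SQLite table name and the prediction column names here are assumptions.

```python
import pickle
import sqlite3

# Persist the tuned model.
with open("churn_model.pkl", "wb") as f:
    pickle.dump(model, f)

# Collect test-set predictions next to the true labels.
predictions = X_test.copy()
predictions["actual_churn"] = y_test.values
predictions["predicted_churn"] = model.predict(X_test)
predictions.to_csv("churn_predictions.csv", index=False)

# Mirror the same predictions into a SQLite database.
conn = sqlite3.connect("churn_results.db")
predictions.to_sql("predictions", conn, if_exists="replace", index=False)
conn.close()
```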
- **Best Parameters:** `n_estimators=50`, `learning_rate=0.1`, `max_depth=4`
- **Accuracy:** 79.3%
- **Classification Report:**
| Class | Precision | Recall | F1-Score |
|---|---|---|---|
| No Churn | 0.83 | 0.90 | 0.87 |
| Churn | 0.65 | 0.49 | 0.56 |
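
For reference, metrics like the ones above can be computed with scikit-learn, assuming the tuned `model` and the held-out `X_test` / `y_test` from the training sketch:

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))  # ~0.79 reported above
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=["No Churn", "Churn"]))
```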


| File | Description |
|---|---|
| `churn_model.pkl` | Trained XGBoost model |
| `churn_predictions.csv` | Saved test predictions |
| `churn_results.db` | SQLite version of the predictions |
| `Telco-Customer-Churn.csv` | Raw dataset |
| `churn_notebook.ipynb` | Full end-to-end ML notebook |
| `README.md` | This project overview |
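
A quick way to reuse the saved artifacts (the `predictions` table name inside `churn_results.db` is an assumption carried over from the saving sketch above):

```python
import pickle
import sqlite3

import pandas as pd

# Reload the trained model.
with open("churn_model.pkl", "rb") as f:
    model = pickle.load(f)

# Read the stored predictions back out of SQLite.
conn = sqlite3.connect("churn_results.db")
preds = pd.read_sql("SELECT * FROM predictions", conn)
conn.close()
print(preds.head())
```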
- Data Cleaning & Preprocessing
- Feature Engineering
- XGBoost + Hyperparameter Tuning
- Model Interpretation (SHAP + LIME)
- SQLite Integration
- Git & GitHub project documentation
Sushma Sandanshiv
🔗 LinkedIn
💻 GitHub