This project optimizes a loan approval strategy using machine learning. The goal is to predict, from applicant features and financial data, whether a loan application should be approved ("Y") or denied ("N").
An application has been deployed to allow users to check the status of their loan approval. The app uses the trained machine learning models to predict loan approval based on the provided applicant details.
In this project, we aim to develop a machine learning model that helps in making informed loan approval decisions. The project explores various classification models, such as Logistic Regression, Decision Tree, and Random Forest, to determine the best approach for predicting loan approval outcomes.
The dataset includes the following features:
- Applicant Demographics: Information such as age, gender, marital status, education, etc.
- Loan Details: Loan amount, loan term, purpose, etc.
- Financial Information: Applicant’s income, credit score, existing debts, and financial history.
- Target Variable:
  - Y: The loan was approved.
  - N: The loan was denied.
| Feature | Description |
|---|---|
| Age | Age of the applicant |
| Credit Score | Credit score of the applicant |
| Loan Amount | Amount requested by the applicant |
| Annual Income | Annual income of the applicant |
| Loan Approval (Target) | Whether the loan was approved (Y) or denied (N) |
The dataset was cleaned and preprocessed, including:
- Handling missing values
- Encoding categorical features
- Scaling numerical features to improve model performance
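The three preprocessing steps above could be wired together roughly as follows. This is a minimal sketch using scikit-learn, not the notebook's actual code: the column names (`age`, `annual_income`, `education`), the sample values, and the choice of median/most-frequent imputation are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny illustrative frame; column names and values are made up for this sketch.
df = pd.DataFrame({
    "age": [25, 40, np.nan, 33],
    "annual_income": [30000.0, 85000.0, 52000.0, np.nan],
    "education": ["Graduate", "Not Graduate", np.nan, "Graduate"],
})

numeric_cols = ["age", "annual_income"]
categorical_cols = ["education"]

# Numeric columns: fill gaps with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: fill gaps with the most frequent value, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        ("cat", categorical_steps, categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)

X = preprocessor.fit_transform(df)
print(X.shape)
```

Keeping the steps inside a single `ColumnTransformer` means the same imputation and scaling statistics learned on the training data can be reused when new applications are scored.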
The following machine learning models were used for training:
- Logistic Regression
- Decision Tree
- Random Forest
The models were evaluated on accuracy, precision, recall, F1-score, and the confusion matrix to select the best model for loan approval prediction.
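A comparison of this kind might look like the sketch below. It trains the three model types on synthetic data from `make_classification`, which stands in for the loan dataset, so the numbers it prints will not match the project's results. The hyperparameters, the 80/20 split, and the mapping of label 1 to "Y" are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the loan data: label 1 plays the role of "Y", 0 of "N".
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Precision, recall and F1 are reported for the positive class only.
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, pos_label=1),
        "recall": recall_score(y_test, y_pred, pos_label=1),
        "f1": f1_score(y_test, y_pred, pos_label=1),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }

for name, m in results.items():
    print(f"{name}: acc={m['accuracy']:.2f} prec={m['precision']:.2f} "
          f"rec={m['recall']:.2f} f1={m['f1']:.2f}")
```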
| Model | Accuracy | Precision (Y) | Recall (Y) | F1-Score (Y) |
|---|---|---|---|---|
| Logistic Regression | 0.79 | 0.76 | 0.99 | 0.86 |
| Decision Tree | 0.69 | 0.76 | 0.78 | 0.77 |
| Random Forest | 0.76 | 0.75 | 0.94 | 0.83 |
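Logistic Regression outperformed the other models, with the highest accuracy (0.79), recall (0.99), and F1-score (0.86) for the positive class (Y). A recall of 0.99 means it identifies nearly all loans that should be approved, while its precision (0.76) is on par with the other two models.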
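As a quick consistency check on the table, the F1-score is the harmonic mean of precision and recall, so it can be recomputed from the other two columns. The helper name `f1_from` below is hypothetical; the input values are copied from the table.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall) for class Y, copied from the results table.
reported = {
    "Logistic Regression": (0.76, 0.99),
    "Decision Tree": (0.76, 0.78),
    "Random Forest": (0.75, 0.94),
}

f1_by_model = {
    name: round(f1_from(p, r), 2) for name, (p, r) in reported.items()
}
print(f1_by_model)
# {'Logistic Regression': 0.86, 'Decision Tree': 0.77, 'Random Forest': 0.83}
```

The recomputed values agree with the F1-Score (Y) column to two decimal places.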
To run this project, ensure you have Python 3.x installed. It's recommended to use a virtual environment to keep dependencies isolated.
- Clone the repository: `git clone https://github.com/your-username/loan-approval-strategy-optimization`
- Install the required dependencies: `pip install -r requirements.txt`
- Launch the Jupyter notebook for exploration and model development: `jupyter notebook`
The project is implemented in a Jupyter notebook, where the following tasks are performed:
- Data Preprocessing: Cleaning, encoding, and scaling the data.
- Model Training: Training Logistic Regression, Decision Tree, and Random Forest models.
- Model Evaluation: Evaluating the models using key performance metrics.
You can use the provided code to experiment with other models or datasets to improve the loan approval prediction system.
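One way to try an additional model is to score it with cross-validation before comparing it against the three above. The sketch below is only an illustration: `GradientBoostingClassifier` is an arbitrary choice not used in the project, and the synthetic data from `make_classification` stands in for the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the loan data (label 1 ~ "Y", label 0 ~ "N").
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Candidate model to compare against the existing ones.
candidate = GradientBoostingClassifier(random_state=0)

# 5-fold cross-validated F1 for the positive class.
scores = cross_val_score(candidate, X, y, cv=5, scoring="f1")
print(f"F1 per fold: {scores.round(2)}, mean: {scores.mean():.2f}")
```

Cross-validation gives a steadier estimate than a single train/test split, which matters when the candidate models score close to each other.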
Logistic Regression emerged as the best model, with the highest accuracy (0.79) and F1-score (0.86) and a recall of 0.99 for approved loans, indicating its effectiveness in detecting loan approval cases.
The project demonstrates how machine learning can be used to optimize loan approval strategies, aiding financial institutions in making more accurate and data-driven decisions.
An application has been deployed to check the status of loan approval, enabling users to interact with the model in a user-friendly way.