End-to-end machine learning pipeline project. Insurance Premium Prediction is a machine learning project that predicts insurance premium prices from input data.

ishanbakshi91/ML-Project

ML-Project

End-to-end ML pipeline project with modular coding

Problem Statement

The purpose of this project is to examine the features in the data and observe how they relate to one another. The ML model uses several features of individuals, such as age, physical/family condition, and location, together with their existing medical expenses, to predict their future medical expenses. This helps a medical insurance company decide what premium to charge.



About Project

This project aims to provide individuals with a personalised estimate of their healthcare needs, enabling them to choose a health insurance plan that aligns with those needs. By focusing on the projected costs from our study, customers can prioritise the health-related aspects of insurance policies over less relevant features.

Tech Stack used:

  • Python Modular Coding
  • Machine Learning
  • MongoDB Database
  • Jupyter Notebook
  • Git
  • CI/CD Pipeline
  • Streamlit

Setup

Create a conda environment

conda create -p venv python=3.11 -y

Activate the conda environment (it was created at a path with -p, so activate it by path)

conda activate ./venv

Install the dependencies listed in the requirements file

pip install -r requirements.txt
  • Add files to git: git add . or git add <file_name>
  • Check the git status: git status
  • View all versions maintained by git: git log
  • Create a version (commit the staged changes): git commit -m "message"
  • Send versions/changes to GitHub: git push origin main

Project Architecture

[Image: project architecture diagram]

Project Pipeline

  1. Data Ingestion
  2. Data Validation
  3. Data Transformation
  4. Model Training
  5. Model Evaluation
  6. Model Deployment
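
One way to picture how these stages fit together is a small runner that passes the output of each stage to the next. This is only a sketch: `run_pipeline` and the lambda stages are hypothetical placeholders, not the project's actual modules, and the toy age values are made up for illustration.

```python
def run_pipeline(stages, artifact=None):
    """Run each (name, function) stage in order, feeding each output forward."""
    history = []
    for name, stage in stages:
        artifact = stage(artifact)
        history.append(name)
    return artifact, history


# Hypothetical stand-ins for the first three stages, using toy age values.
stages = [
    ("data_ingestion", lambda _: [18, 25, 40, -3]),
    ("data_validation", lambda ages: [a for a in ages if a >= 0]),
    ("data_transformation", lambda ages: [a / 100 for a in ages]),
]

result, history = run_pipeline(stages)
print(history)
print(result)
```

Keeping every stage behind the same "take an artifact, return an artifact" shape is one common way to make stages independently testable and replaceable.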

1. Data Ingestion

  • Data ingestion is the process by which raw data is extracted from one or more sources and then prepared for training machine learning models.
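
As a rough illustration of ingestion, the sketch below reads CSV records and splits them into training and test sets using only the standard library. The column names (`age`, `smoker`, `region`, `expenses`), the sample rows, and the `ingest` helper are assumptions made up for this example; the project itself lists MongoDB in its tech stack, and how it actually loads data is not shown here.

```python
import csv
import io
import random

# Made-up sample data; the columns are assumed, not taken from the project.
SAMPLE_CSV = """age,smoker,region,expenses
20,no,north,2000
30,yes,south,9000
40,no,east,4000
50,yes,west,15000
60,no,north,6000
"""


def ingest(source, test_fraction=0.2, seed=42):
    """Read CSV rows from a file-like object and split them into train/test."""
    rows = list(csv.DictReader(source))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


train_rows, test_rows = ingest(io.StringIO(SAMPLE_CSV))
print(len(train_rows), len(test_rows))
```

The same function would accept an open file handle in place of the in-memory string.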

2. Data Validation

  • Data validation is an integral part of the ML pipeline. It checks the quality of the source data before a new model is trained.
  • It focuses on checking that the statistics of the new data are as expected (e.g. feature distribution, number of categories, etc.).
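
A minimal sketch of such checks might look like the following. The expected columns, the allowed `smoker` values, and the 0 to 120 age range are all assumptions chosen for illustration, and `validate_rows` is a hypothetical helper rather than part of the project.

```python
# Assumed schema for illustration only.
EXPECTED_COLUMNS = {"age", "smoker", "region", "expenses"}
ALLOWED_SMOKER = {"yes", "no"}


def validate_rows(rows):
    """Return a list of readable problems; an empty list means the rows passed."""
    problems = []
    for i, row in enumerate(rows):
        missing = EXPECTED_COLUMNS - set(row)
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
            continue
        try:
            age = float(row["age"])
            expenses = float(row["expenses"])
        except ValueError:
            problems.append(f"row {i}: age or expenses is not numeric")
            continue
        if not 0 <= age <= 120:
            problems.append(f"row {i}: age {age} outside 0-120")
        if expenses < 0:
            problems.append(f"row {i}: negative expenses")
        if row["smoker"] not in ALLOWED_SMOKER:
            problems.append(f"row {i}: unexpected smoker value {row['smoker']!r}")
    return problems


rows = [
    {"age": "30", "smoker": "no", "region": "north", "expenses": "4000"},
    {"age": "200", "smoker": "maybe", "region": "south", "expenses": "5000"},
    {"age": "41", "smoker": "yes", "region": "east"},
]
problems = validate_rows(rows)
for problem in problems:
    print(problem)
```

Returning a list of problems instead of raising on the first one makes it easier to report everything wrong with a batch at once.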

3. Data Transformation

  • Data transformation is the process of converting raw data into a format or structure that is more suitable for model building.
  • It is an essential step in feature engineering and makes it easier to discover insights.
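
Two common transformations for tabular data are standardising numeric columns and one-hot encoding categorical ones. The sketch below shows both with the standard library; the column names and the `fit_transformer`/`transform` helpers are hypothetical, and the source does not say which transformations the project applies.

```python
from statistics import mean, pstdev


def fit_transformer(rows, numeric_cols, categorical_cols):
    """Learn scaling statistics and category lists from the training rows."""
    params = {"numeric": {}, "categories": {}}
    for col in numeric_cols:
        values = [float(r[col]) for r in rows]
        # Fall back to 1.0 so a constant column does not divide by zero.
        params["numeric"][col] = (mean(values), pstdev(values) or 1.0)
    for col in categorical_cols:
        params["categories"][col] = sorted({r[col] for r in rows})
    return params


def transform(row, params):
    """Turn one raw row into a flat list of numeric features."""
    features = []
    for col, (mu, sigma) in params["numeric"].items():
        features.append((float(row[col]) - mu) / sigma)
    for col, categories in params["categories"].items():
        features.extend(1.0 if row[col] == c else 0.0 for c in categories)
    return features


rows = [{"age": "20", "smoker": "yes"}, {"age": "40", "smoker": "no"}]
params = fit_transformer(rows, numeric_cols=["age"], categorical_cols=["smoker"])
print(transform(rows[0], params))
```

Fitting the statistics on training rows only, then reusing `params` for new rows, keeps information from the test set out of training.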

4. Model Training

  • Model training in machine learning is the process in which a machine learning (ML) algorithm is fed with sufficient training data to learn from.
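
To make "learning from training data" concrete, here is a sketch that fits a one-feature least-squares line from age to expenses. Linear regression is used purely as a stand-in, since the source does not state which algorithm the project trains, and the data points are made up.

```python
def fit_linear(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Made-up training data: ages and the matching yearly expenses.
ages = [20, 30, 40, 50]
expenses = [2500, 3500, 4500, 5500]

slope, intercept = fit_linear(ages, expenses)
print(slope, intercept)
```

The "learning" here is just choosing the slope and intercept that minimise the squared error on the training pairs; more complex models do the same thing with more parameters.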

5. Model Evaluation

  • Model evaluation is the process of using different evaluation metrics to understand a machine learning model’s performance, as well as its strengths and weaknesses.
  • Model evaluation is important to assess the efficacy of a model during initial research phases, and it also plays a role in model monitoring.
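
For a regression task like premium prediction, two widely used metrics are root mean squared error (RMSE) and the coefficient of determination (R²). The sketch below computes both by hand; the true and predicted values are made up, and the source does not say which metrics the project reports.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: typical size of a prediction error."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2_score(y_true, y_pred):
    """Share of the variance in y_true that the predictions explain."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up values for illustration.
y_true = [1000, 2000, 3000]
y_pred = [1100, 1900, 3000]

print(round(rmse(y_true, y_pred), 2))
print(round(r2_score(y_true, y_pred), 2))
```

RMSE is in the same units as the target (here, currency), which makes it easy to interpret, while R² gives a scale-free view that is convenient for comparing models.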

6. Model Deployment

  • Deployment is the method by which we integrate a machine-learning model into the production environment to make practical business decisions based on data.
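
At its simplest, deployment means persisting a trained model and loading it where predictions are served. The sketch below saves a toy linear model as JSON and reloads it to make a prediction. The model format, the file name, and the `save_model`, `load_model`, and `predict_premium` helpers are assumptions for illustration; a front end such as Streamlit (listed in the tech stack) could call a function like `predict_premium`, but how the project wires this up is not shown in the source.

```python
import json
import os
import tempfile


def save_model(model, path):
    """Persist model parameters as JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(model, fh)


def load_model(path):
    """Load model parameters saved by save_model."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def predict_premium(model, age):
    """Apply the toy linear model to a single age value."""
    return model["slope"] * age + model["intercept"]


# Hypothetical trained parameters.
model = {"slope": 100.0, "intercept": 500.0}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.json")
    save_model(model, path)
    served = load_model(path)

estimate = predict_premium(served, age=35)
print(estimate)
```

Separating the prediction function from the user interface keeps the serving logic testable without starting a web app.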
