🔴 Predicting Insurance Claim Amounts 🔴 This project analyzes the Medical Cost Personal Insurance Dataset to understand key factors influencing healthcare expenses. Through data cleaning, visualization, and feature engineering, important patterns in age, BMI, smoking, and region were uncovered.

Abdullah321Umar/DevelopersHub-DataScience-Analytics_Internship-TASK1


🌟 Task 1: Insurance Claim Cost Prediction | 🔍 Data Intelligence Behind Healthcare Expenses

A Data Analytics & Machine Learning Project by Abdullah Umar

🌐 Prologue: Decoding the Economics of Healthcare Through Data

In today's rapidly evolving healthcare landscape, understanding what drives medical expenses is more important than ever. Policyholders and insurance companies alike want to know why medical costs vary and how future claims can be predicted more accurately. In this project, I analyze the Medical Cost Personal Insurance Dataset to interpret and predict insurance claim charges using data analytics and machine learning, turning raw healthcare data into insights about how demographic and lifestyle factors shape medical expenditure. 📊💡


🎯 Project Overview: Predicting Medical Insurance Charges with Precision

The Insurance Claim Cost Prediction Project is an analytical and predictive modeling study designed to:

  • Explore the trends behind medical insurance charges
  • Understand the impact of variables such as age, BMI, region, and smoking
  • Build a machine learning model that predicts insurance claim amounts
  • Visualize key patterns with clear, informative plots

This project combines data science with health-domain analytics to support smarter, more transparent decision-making.

🧩1️⃣ Dataset Foundation: The Blueprint of Healthcare Costs

The dataset provides a detailed summary of individuals insured under a health insurance plan with these key features:

📊 Dataset Composition

  • Total Records: 1,338

Features:

  • age: Age of the insured individual
  • sex: Gender
  • BMI: Body mass index
  • children: Number of dependents
  • smoker: Smoking status
  • region: Geographical location
  • charges: Actual medical claims (target variable)
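A minimal loading sketch, assuming the data is a CSV with the lowercase column names used in the widely shared `insurance.csv` copy of this dataset (so `bmi`, not `BMI`). The helper `load_insurance` and the two sample rows are hypothetical, made up so the snippet runs without the file.

```python
import io

import pandas as pd

# Assumed column names (lowercase, as in the commonly shared insurance.csv).
EXPECTED_COLUMNS = ["age", "sex", "bmi", "children", "smoker", "region", "charges"]


def load_insurance(source) -> pd.DataFrame:
    """Read the dataset from a path or file-like object and check its schema."""
    df = pd.read_csv(source)
    missing = set(EXPECTED_COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return df[EXPECTED_COLUMNS]


# Made-up rows standing in for the real file so the sketch runs on its own.
sample = io.StringIO(
    "age,sex,bmi,children,smoker,region,charges\n"
    "30,female,26.5,1,no,northeast,5000.0\n"
    "45,male,31.2,2,yes,southwest,38000.0\n"
)
df = load_insurance(sample)
print(df.shape)  # (2, 7)
print(df.dtypes)
```

Pointed at the real file (for example `load_insurance("insurance.csv")`, a hypothetical path), the shape should match the 1,338 records and 7 columns listed above.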

💡 Insight:

These features hold the potential to reveal how lifestyle, demographics, and personal choices contribute to medical expenses.


🧼2️⃣ Data Preparation & Refinement: Crafting Clean and Reliable Data

A comprehensive preprocessing pipeline was implemented to ensure that the dataset was clean, consistent, and ready for modeling:

🔧 Operations Performed

  • Checked for missing values (dataset confirmed clean)
  • Transformed categorical variables using Label Encoding
  • Conducted outlier detection and analysis
  • Performed feature scaling where necessary
  • Explored data distributions using statistical summaries
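The encoding and outlier steps above can be sketched with scikit-learn's `LabelEncoder` and a pandas quantile check. The write-up does not say which outlier method was used, so the 1.5 × IQR rule on `charges` is an assumption, as are the helper `preprocess` and the toy rows.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

CATEGORICAL = ["sex", "smoker", "region"]


def preprocess(df: pd.DataFrame):
    """Return a label-encoded copy of df and a boolean mask of outlying charges."""
    # Missing values: the dataset is reported clean, so stop if that ever changes.
    if df.isna().any().any():
        raise ValueError("unexpected missing values")

    out = df.copy()
    # Label Encoding: each category becomes an integer, assigned in sorted order.
    for col in CATEGORICAL:
        out[col] = LabelEncoder().fit_transform(out[col])

    # Outliers (assumed rule): charges more than 1.5 * IQR beyond the quartiles.
    q1, q3 = out["charges"].quantile([0.25, 0.75])
    iqr = q3 - q1
    outliers = (out["charges"] < q1 - 1.5 * iqr) | (out["charges"] > q3 + 1.5 * iqr)
    return out, outliers


# Made-up rows for illustration only.
toy = pd.DataFrame({
    "age": [20, 30, 40, 50, 60],
    "sex": ["male", "female", "male", "female", "male"],
    "bmi": [22.0, 25.0, 30.0, 35.0, 28.0],
    "children": [0, 1, 2, 0, 3],
    "smoker": ["no", "no", "yes", "no", "no"],
    "region": ["northeast", "southeast", "southwest", "northwest", "northeast"],
    "charges": [2000.0, 3000.0, 4000.0, 5000.0, 60000.0],
})
encoded, outliers = preprocess(toy)
print(encoded[CATEGORICAL])
print(outliers.tolist())  # [False, False, False, False, True]
```

Label encoding gives `region` an arbitrary numeric order; tree models tolerate this, while a linear model is usually better served by one-hot columns from `pd.get_dummies`.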

💡 Insight:

Quality data lays the groundwork for accurate predictions. Preprocessing ensures the model learns from consistent, correctly encoded inputs rather than from data errors.


🎨3️⃣ Exploratory Visual Intelligence: Bringing Healthcare Data to Life

Visualization is the heart of this project. Using Matplotlib and Seaborn with bright themes, I created a series of colorful, informative plots:

🌈 15+ Visuals Crafted

Some highlights include:

  • Age vs. Medical Charges: line and scatter patterns revealing cost escalation
  • BMI Distribution: understanding weight-related risks
  • Smoker vs. Non-Smoker Charges: the biggest cost gap visualized
  • Charges by Region: geographic differences in healthcare expenses
  • Correlation Heatmap: relationships influencing claim amounts
  • Children vs. Charges: impact of the number of dependents
  • Sex-wise Cost Comparison
  • BMI Category vs. Charges (Obese, Overweight, Fit)
  • Boxplots, Histograms, Pairplots, Countplots, KDE plots, and more
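The BMI-category plot implies a binning step. This sketch assumes the standard adult BMI cut-offs (18.5, 25, 30), maps the normal range to the "Fit" label used above, and adds an "Underweight" band that the original list does not name; the helper names and toy rows are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Assumed cut-offs: standard adult BMI bands, with "Fit" for the normal range.
BMI_BINS = [0, 18.5, 25, 30, float("inf")]
BMI_LABELS = ["Underweight", "Fit", "Overweight", "Obese"]


def add_bmi_category(df: pd.DataFrame) -> pd.DataFrame:
    """Add a bmi_category column; right=False makes each band include its lower edge."""
    out = df.copy()
    out["bmi_category"] = pd.cut(out["bmi"], bins=BMI_BINS, labels=BMI_LABELS, right=False)
    return out


def plot_cost_drivers(df: pd.DataFrame):
    """Side-by-side boxplots of charges by smoker status and by BMI category."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
    sns.boxplot(data=df, x="smoker", y="charges", ax=ax1)
    ax1.set_title("Smoker vs. Non-Smoker Charges")
    sns.boxplot(data=df, x="bmi_category", y="charges", ax=ax2)
    ax2.set_title("BMI Category vs. Charges")
    fig.tight_layout()
    return fig


# Made-up rows for illustration only.
toy = add_bmi_category(pd.DataFrame({
    "bmi": [17.0, 22.0, 25.0, 33.0, 36.0, 24.9],
    "smoker": ["no", "no", "yes", "no", "yes", "yes"],
    "charges": [1500.0, 2500.0, 21000.0, 9000.0, 45000.0, 18000.0],
}))
print(toy["bmi_category"].tolist())
# ['Underweight', 'Fit', 'Overweight', 'Obese', 'Obese', 'Fit']
fig = plot_cost_drivers(toy)
```

Call `fig.savefig("cost_drivers.png")` (a hypothetical file name) to write the figure to disk.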

💡 Insight:

These visuals turn complex healthcare data into accessible insights, exposing hidden drivers of medical costs.


🤖4️⃣ Predictive Modeling: Machine Learning Behind Insurance Claims

To predict insurance charges, multiple regression approaches were tested:

πŸ” Models Implemented

  • Linear Regression
  • Random Forest Regressor
  • Decision Tree Regressor After evaluating performance:
  • πŸ”₯ Random Forest delivered the most accurate and stable predictions
  • Metrics like MAE, MSE, and RΒ² Score confirmed model reliability
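A sketch of the three-model comparison, assuming a single 80/20 train/test split and default hyperparameters (the original settings are not stated). Features are assumed to be numeric already, for example after label encoding. The helper `compare_models` is hypothetical, and the data below is synthetic, so the printed scores say nothing about the insurance dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor


def compare_models(X, y, seed: int = 42) -> dict:
    """Fit the three regressors on one train/test split and score them on the test part."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    models = {
        # Scaling only matters for the linear model; the scaler is fit on training data only.
        "Linear Regression": make_pipeline(StandardScaler(), LinearRegression()),
        "Decision Tree": DecisionTreeRegressor(random_state=seed),
        "Random Forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results[name] = {
            "MAE": mean_absolute_error(y_test, pred),
            "MSE": mean_squared_error(y_test, pred),
            "R2": r2_score(y_test, pred),
        }
    return results


# Synthetic stand-in data (NOT the insurance dataset) with a non-linear target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 3))
y = 1000 * X[:, 0] + 5000 * (X[:, 1] > 0.5) * X[:, 2] + rng.normal(0, 50, 300)
for name, scores in compare_models(X, y).items():
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

One split can rank close models differently from run to run; `sklearn.model_selection.cross_val_score(model, X, y, scoring="r2", cv=5)` gives a steadier comparison.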

💡 Insight:

Tree-based models such as Random Forest capture non-linear relationships and feature interactions that a linear model misses, enabling smarter premium pricing strategies.
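One way to see which inputs a Random Forest leans on is its `feature_importances_` attribute. In this sketch the synthetic target mimics the kind of interaction a linear model cannot express: a binary flag (loosely analogous to smoking status) switches a continuous effect on and off. The column names and the helper `rf_importances` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def rf_importances(X: pd.DataFrame, y, seed: int = 0) -> pd.Series:
    """Fit a random forest and return impurity-based importances, largest first."""
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(X, y)
    return pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)


# Synthetic stand-in data (NOT the insurance dataset): "flag" gates the effect
# of "level", while "trend" adds only a weak linear signal.
rng = np.random.default_rng(1)
X = pd.DataFrame({
    "trend": rng.uniform(0, 1, 400),
    "flag": rng.integers(0, 2, 400),
    "level": rng.uniform(0, 1, 400),
})
y = 200 * X["trend"] + 20000 * X["flag"] * X["level"] + rng.normal(0, 100, 400)
print(rf_importances(X, y))
```

Impurity-based importances can favor continuous or high-cardinality features; `sklearn.inspection.permutation_importance` on held-out data is a common cross-check.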


📌5️⃣ Analytical Insights & Key Discoveries

🧭 Major Findings:

  • Smokers have drastically higher medical charges compared to non-smokers
  • BMI strongly impacts medical costs, especially in obesity ranges
  • Age is a major cost driver, with expenses rising steadily in older individuals
  • Region affects charges, hinting at lifestyle and cost-of-living differences
  • Families with more children tend to have slightly higher, though fairly stable, costs
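Each finding can be checked with a few pandas aggregations. This is a minimal sketch assuming lowercase column names; the helper `summarize_drivers` is hypothetical and the toy rows are made up, so run it on the full dataset to reproduce the findings. Pearson correlation only measures linear association, so a modest `corr_bmi` would not by itself rule out a stronger effect in the obesity range.

```python
import pandas as pd


def summarize_drivers(df: pd.DataFrame) -> dict:
    """Numeric checks mirroring each finding: group medians/means and correlations."""
    return {
        "median_by_smoker": df.groupby("smoker")["charges"].median(),
        "corr_age": df["age"].corr(df["charges"]),
        "corr_bmi": df["bmi"].corr(df["charges"]),
        "mean_by_region": df.groupby("region")["charges"].mean(),
        "mean_by_children": df.groupby("children")["charges"].mean(),
    }


# Made-up rows for illustration only.
toy = pd.DataFrame({
    "age": [22, 35, 48, 61, 29, 55],
    "bmi": [21.0, 27.5, 31.0, 34.0, 24.0, 29.0],
    "children": [0, 1, 2, 0, 1, 3],
    "smoker": ["no", "no", "no", "no", "yes", "yes"],
    "region": ["northeast", "southeast", "southwest", "northwest", "southeast", "northeast"],
    "charges": [2000.0, 5000.0, 9000.0, 14000.0, 20000.0, 42000.0],
})
for name, value in summarize_drivers(toy).items():
    print(name, value, sep="\n")
```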

💡 Inference:

These insights help insurance providers design better policies while enabling individuals to understand financial health risks linked to lifestyle.


🧰 6️⃣ Tools & Technologies

🐍 Programming Language

  • Python

📊 Libraries Used

  • Pandas
  • NumPy
  • Matplotlib
  • Seaborn
  • Scikit-learn
  • Plotly (optional)

💡 Workflow

A structured workflow moved the project from data preprocessing → visualization → modeling → insights.
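The preprocessing → modeling part of this workflow can be packaged as one scikit-learn `Pipeline`, so new records receive exactly the encoding used in training. This sketch assumes lowercase column names and uses `OrdinalEncoder`, which applies the same sorted integer mapping per column as the label encoding described earlier; the training rows are synthetic and the helper names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

CATEGORICAL = ["sex", "smoker", "region"]
NUMERIC = ["age", "bmi", "children"]


def build_pipeline(seed: int = 42) -> Pipeline:
    """Encoding and model in one object; unseen categories map to -1 at predict time."""
    encode = ColumnTransformer(
        [("cat", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1), CATEGORICAL)],
        remainder="passthrough",  # numeric columns pass through unchanged
    )
    return Pipeline([
        ("encode", encode),
        ("model", RandomForestRegressor(n_estimators=100, random_state=seed)),
    ])


# Synthetic training rows (NOT the real dataset), generated only so the sketch runs.
rng = np.random.default_rng(7)
n = 200
train = pd.DataFrame({
    "age": rng.integers(18, 65, n),
    "sex": rng.choice(["female", "male"], n),
    "bmi": rng.uniform(16, 45, n).round(1),
    "children": rng.integers(0, 5, n),
    "smoker": rng.choice(["no", "yes"], n),
    "region": rng.choice(["northeast", "northwest", "southeast", "southwest"], n),
})
train["charges"] = 250 * train["age"] + 20000 * (train["smoker"] == "yes") + rng.normal(0, 1000, n)

pipe = build_pipeline().fit(train[CATEGORICAL + NUMERIC], train["charges"])
new_person = pd.DataFrame([{
    "age": 40, "sex": "female", "bmi": 28.0, "children": 2, "smoker": "no", "region": "southwest",
}])
print(pipe.predict(new_person[CATEGORICAL + NUMERIC]))
```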


🌟7️⃣ Concluding Thoughts: The Story Behind Insurance Claims

Medical expenses aren't just numbers; they reflect lifestyle choices, health conditions, and demographic realities. This project highlights how data analytics can demystify insurance costs and empower better decision-making for:

  • Individuals
  • Healthcare planners
  • Insurance companies

From understanding risks to building predictive systems, this project showcases the power of data in shaping the future of health insurance.

🌍 Epilogue: Beyond Predictions

Healthcare analytics isn't just about predicting expenses; it's about understanding people. Through data, we uncover patterns that help improve lives, promote healthier choices, and strengthen policy transparency.

“Data doesn't just predict costs; it reveals the story behind every claim.”

Author: Abdullah Umar, Data Science & Analytics Intern at DevelopersHub Corporation




Task 1 Statement: (image)

Bright Background Plots Preview: (15 plot images)

