🩺 This project uses Random Forest classification to predict liver cirrhosis stages (1, 2, or 3) based on patient records from a Mayo Clinic study. By analyzing key clinical indicators such as bilirubin, albumin, and copper levels, the model supports early diagnosis and medical decision-making with interpretable machine learning.

SBanditaDas/Liver-Cirrhosis-Stage-Classification-Using-Clinical-Data-And-ML-Algorithms


🧾 Liver Cirrhosis Stage Detection – Predicting Disease Progression from Clinical Data

Using Random Forest classification to predict liver cirrhosis stages based on patient data from a Mayo Clinic study (1974–1984).



Overview

This project predicts the histologic stage of liver cirrhosis using clinical data collected from patients over a 10-year period. The pipeline includes data cleaning, encoding, normalization, exploratory analysis, and Random Forest classification, all built and executed in a Kaggle kernel.


Project Problem

Early detection of liver cirrhosis progression can improve treatment outcomes. This project aims to:

  • Predict cirrhosis stage (1, 2, or 3) from patient data
  • Identify key clinical indicators of disease severity
  • Support medical decision-making with interpretable ML models

Dataset

  • Source: Mayo Clinic study on primary biliary cirrhosis (1974–1984)

|    |   N_Days | Status   | Drug    |   Age | Sex   | Ascites   | Hepatomegaly   | Spiders   | Edema   |   Bilirubin |   Cholesterol |   Albumin |   Copper |   Alk_Phos |   SGOT |   Tryglicerides |   Platelets |   Prothrombin |   Stage |
|---:|---------:|:---------|:--------|------:|:------|:----------|:---------------|:----------|:--------|------------:|--------------:|----------:|---------:|-----------:|-------:|----------------:|------------:|--------------:|--------:|
|  0 |     2221 | C        | Placebo | 18499 | F     | N         | Y              | N         | N       |         0.5 |           149 |      4.04 |      227 |        598 |  52.7  |              57 |         256 |           9.9 |       1 |
|  1 |     1230 | C        | Placebo | 19724 | M     | Y         | N              | Y         | N       |         0.5 |           219 |      3.93 |       22 |        663 |  45    |              75 |         220 |          10.8 |       2 |
|  2 |     4184 | C        | Placebo | 11839 | F     | N         | N              | N         | N       |         0.5 |           320 |      3.54 |       51 |       1243 | 122.45 |              80 |         225 |          10   |       2 |
|  3 |     2090 | D        | Placebo | 16467 | F     | N         | N              | N         | N       |         0.7 |           255 |      3.74 |       23 |       1024 |  77.5  |              58 |         151 |          10.2 |       2 |
|  4 |     2105 | D        | Placebo | 21699 | F     | N         | Y              | N         | N       |         1.9 |           486 |      3.54 |       74 |       1052 | 108.5  |             109 |         151 |          11.5 |       1 |
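
The `Age` values in the preview (for example 18499) are far too large to be years and appear to be recorded in days, like `N_Days`. The sketch below converts them to years under that assumption. The `Age_years` column name and the 365.25 divisor are illustrative choices, not taken from the notebook.

```python
import pandas as pd

# Age values copied from the five preview rows above.
# With the full file this would start from: pd.read_csv("liver_cirrhosis.csv")
preview = pd.DataFrame({"Age": [18499, 19724, 11839, 16467, 21699]})

# Assumption: Age is stored in days, so divide by the average year length.
DAYS_PER_YEAR = 365.25
preview["Age_years"] = (preview["Age"] / DAYS_PER_YEAR).round(1)

print(preview)
```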

Tools & Technologies

  • Python (Pandas, NumPy, Scikit-learn, Seaborn, Matplotlib)
  • Kaggle Kernels (Notebook execution and visualization)
  • GitHub (Version control and portfolio hosting)

Project Structure

liver_cirrhosis_stage_detection/
│
├── README.md
├── liver_cirrhosis.csv                  # Dataset
├── liver_cirrhosis_stage_detection.ipynb  # Kaggle notebook
├── visuals/                             # Plots and charts
│   ├── stage_distribution.png
│   ├── heatmap.png
│   └── confusion_matrix.png

Data Cleaning & Preparation

  • Encoded categorical features (Sex, Drug, Edema, etc.)
  • Normalized numerical features using StandardScaler
  • Verified absence of missing values
  • Split dataset into training and test sets
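
The preparation steps listed above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it uses a few columns from the five preview rows, one-hot encoding via `pd.get_dummies` (the notebook may use a different encoder), and an assumed 80/20 split with `random_state=42`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Five preview rows from the Dataset section, restricted to a few columns.
# With the real data this would be: df = pd.read_csv("liver_cirrhosis.csv")
df = pd.DataFrame({
    "Sex": ["F", "M", "F", "F", "F"],
    "Ascites": ["N", "Y", "N", "N", "N"],
    "Hepatomegaly": ["Y", "N", "N", "N", "Y"],
    "Bilirubin": [0.5, 0.5, 0.5, 0.7, 1.9],
    "Albumin": [4.04, 3.93, 3.54, 3.74, 3.54],
    "Copper": [227, 22, 51, 23, 74],
    "Stage": [1, 2, 2, 2, 1],
})

# Verify there are no missing values before transforming anything.
assert not df.isna().any().any(), "unexpected missing values"

X = df.drop(columns=["Stage"])
y = df["Stage"]

# Treat every non-numeric column as categorical and one-hot encode it.
cat_cols = X.select_dtypes(exclude="number").columns.tolist()
num_cols = X.select_dtypes(include="number").columns.tolist()
X = pd.get_dummies(X, columns=cat_cols, drop_first=True)
X[num_cols] = X[num_cols].astype("float64")

# Split first, then fit the scaler on the training rows only,
# so that test-set statistics do not leak into the scaling.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
X_train, X_test = X_train.copy(), X_test.copy()

scaler = StandardScaler()
X_train[num_cols] = scaler.fit_transform(X_train[num_cols])
X_test[num_cols] = scaler.transform(X_test[num_cols])

print(X_train.head())
```

Fitting the scaler after the split is a design choice in this sketch. Tree-based models such as Random Forest are largely insensitive to feature scaling, so the normalization mainly matters if other model types are tried on the same features.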

Exploratory Data Analysis (EDA)

Stage Distribution:

  • Balanced across stages 1, 2, and 3

Feature Correlations:

  • Strong correlation between Bilirubin, Albumin, and Stage

Visuals:

  • Correlation heatmap
  • Boxplots for key features
  • Stage distribution bar chart
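
The stage distribution and correlation heatmap could be produced along these lines. This is a sketch on the five preview rows only, so the numbers it yields are not meaningful; it uses plain Matplotlib, whereas the notebook may rely on Seaborn. The figure layout and colour map are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Numeric columns from the five preview rows in the Dataset section.
df = pd.DataFrame({
    "Bilirubin": [0.5, 0.5, 0.5, 0.7, 1.9],
    "Albumin": [4.04, 3.93, 3.54, 3.74, 3.54],
    "Copper": [227, 22, 51, 23, 74],
    "Stage": [1, 2, 2, 2, 1],
})

# Number of patients per stage, ordered by stage label.
stage_counts = df["Stage"].value_counts().sort_index()

# Pairwise Pearson correlations between all numeric columns.
corr = df.corr()

fig, (ax_bar, ax_heat) = plt.subplots(1, 2, figsize=(10, 4))

ax_bar.bar(stage_counts.index.astype(str), stage_counts.values)
ax_bar.set_xlabel("Stage")
ax_bar.set_ylabel("Patients")
ax_bar.set_title("Stage distribution")

im = ax_heat.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax_heat.set_xticks(range(len(corr.columns)))
ax_heat.set_xticklabels(corr.columns, rotation=45, ha="right")
ax_heat.set_yticks(range(len(corr.columns)))
ax_heat.set_yticklabels(corr.columns)
ax_heat.set_title("Correlation heatmap")
fig.colorbar(im, ax=ax_heat)

fig.tight_layout()
# fig.savefig("visuals/heatmap.png")  # uncomment to write the image to disk
```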


Modeling & Evaluation

  • Model: Random Forest Classifier
  • Accuracy: ~85% on test set
  • Evaluation Metrics:
    • Precision, Recall, F1-score
    • Confusion Matrix
    • Feature Importance
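
A minimal sketch of the training and evaluation loop is shown below. It runs on synthetic three-class data from `make_classification` as a stand-in for the real features, so its scores say nothing about the ~85% figure above. The hyperparameters (`n_estimators=200`, `random_state=0`) and the stratified 80/20 split are assumptions, not the notebook's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 600 samples, 8 features, 3 classes.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5, n_redundant=0,
    n_classes=3, random_state=0,
)
y = y + 1  # relabel classes 0..2 as stages 1..3

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Overall accuracy plus per-stage precision, recall, and F1-score.
acc = accuracy_score(y_test, y_pred)
print(f"Accuracy: {acc:.3f}")
print(classification_report(y_test, y_pred))

# Rows are true stages, columns are predicted stages.
cm = confusion_matrix(y_test, y_pred, labels=[1, 2, 3])
print(cm)

# Impurity-based feature importances; they sum to 1 across features.
importances = model.feature_importances_
for idx in importances.argsort()[::-1]:
    print(f"feature_{idx}: {importances[idx]:.3f}")
```

With the real data, `X` and `y` would be the prepared feature matrix and the `Stage` column, and the feature indices would map to the clinical column names.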

How to Run This Project

  1. Clone the repository:
git clone https://github.com/SBanditaDas/Liver-Cirrhosis-Stage-Detection.git

Author & Contact

Sushree Bandita Das

📧 Email: sushreebanditadas01@gmail.com

S_Bandita_Das sushree-bandita-das-160651309 SBanditaDas dasbanditasushree

