Using Random Forest classification to predict liver cirrhosis stages based on patient data from a Mayo Clinic study (1974–1984).
- Overview
- Project Problem
- Dataset
- Tools & Technologies
- Project Structure
- Data Cleaning & Preparation
- Exploratory Data Analysis (EDA)
- Modeling & Evaluation
- How to Run This Project
- Author & Contact
This project predicts the histologic stage of liver cirrhosis using clinical data collected from patients over a 10-year period. The pipeline includes data cleaning, encoding, normalization, exploratory analysis, and Random Forest classification, all built and executed in a Kaggle kernel.
Early detection of liver cirrhosis progression can improve treatment outcomes. This project aims to:
- Predict cirrhosis stage (1, 2, or 3) from patient data
- Identify key clinical indicators of disease severity
- Support medical decision-making with interpretable ML models
- Source: Mayo Clinic study on primary biliary cirrhosis (1974–1984)
| | N_Days | Status | Drug | Age | Sex | Ascites | Hepatomegaly | Spiders | Edema | Bilirubin | Cholesterol | Albumin | Copper | Alk_Phos | SGOT | Tryglicerides | Platelets | Prothrombin | Stage |
|---:|---------:|:---------|:--------|------:|:------|:----------|:---------------|:----------|:--------|------------:|--------------:|----------:|---------:|-----------:|-------:|----------------:|------------:|--------------:|--------:|
| 0 | 2221 | C | Placebo | 18499 | F | N | Y | N | N | 0.5 | 149 | 4.04 | 227 | 598 | 52.7 | 57 | 256 | 9.9 | 1 |
| 1 | 1230 | C | Placebo | 19724 | M | Y | N | Y | N | 0.5 | 219 | 3.93 | 22 | 663 | 45 | 75 | 220 | 10.8 | 2 |
| 2 | 4184 | C | Placebo | 11839 | F | N | N | N | N | 0.5 | 320 | 3.54 | 51 | 1243 | 122.45 | 80 | 225 | 10 | 2 |
| 3 | 2090 | D | Placebo | 16467 | F | N | N | N | N | 0.7 | 255 | 3.74 | 23 | 1024 | 77.5 | 58 | 151 | 10.2 | 2 |
| 4 | 2105 | D | Placebo | 21699 | F | N | Y | N | N | 1.9 | 486 | 3.54 | 74 | 1052 | 108.5 | 109 | 151 | 11.5 | 1 |
- Python (Pandas, NumPy, Scikit-learn, Seaborn, Matplotlib)
- Kaggle Kernels (Notebook execution and visualization)
- GitHub (Version control and portfolio hosting)
liver_cirrhosis_stage_detection/
│
├── README.md
├── liver_cirrhosis.csv                      # Dataset
├── liver_cirrhosis_stage_detection.ipynb    # Kaggle notebook
└── visuals/                                 # Plots and charts
    ├── stage_distribution.png
    ├── heatmap.png
    └── confusion_matrix.png
- Encoded categorical features (Sex, Drug, Edema, etc.)
- Normalized numerical features using StandardScaler
- Verified absence of missing values
- Split dataset into training and test sets
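The preparation steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's exact code: the five sample rows from the Dataset table stand in for `liver_cirrhosis.csv`, and the one-hot encoding, the `test_size=0.4` split, and fitting the scaler on the training split only (to avoid leaking test statistics) are all assumptions.

```python
import io

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Sample rows copied from the Dataset table; in the notebook this would
# presumably be: df = pd.read_csv("liver_cirrhosis.csv")
SAMPLE_CSV = """N_Days,Status,Drug,Age,Sex,Ascites,Hepatomegaly,Spiders,Edema,Bilirubin,Cholesterol,Albumin,Copper,Alk_Phos,SGOT,Tryglicerides,Platelets,Prothrombin,Stage
2221,C,Placebo,18499,F,N,Y,N,N,0.5,149,4.04,227,598,52.7,57,256,9.9,1
1230,C,Placebo,19724,M,Y,N,Y,N,0.5,219,3.93,22,663,45,75,220,10.8,2
4184,C,Placebo,11839,F,N,N,N,N,0.5,320,3.54,51,1243,122.45,80,225,10,2
2090,D,Placebo,16467,F,N,N,N,N,0.7,255,3.74,23,1024,77.5,58,151,10.2,2
2105,D,Placebo,21699,F,N,Y,N,N,1.9,486,3.54,74,1052,108.5,109,151,11.5,1
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Verify there are no missing values before transforming anything.
missing_total = int(df.isna().sum().sum())

X = df.drop(columns="Stage")
y = df["Stage"]

# Numeric columns get scaled; every remaining column is treated as categorical.
numeric_cols = X.select_dtypes(include="number").columns.tolist()
categorical_cols = [c for c in X.columns if c not in numeric_cols]

# Split first so the scaler only ever sees training data (an assumption;
# the notebook may scale before splitting).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)

# One-hot encode categoricals and standardize numerics in a single step.
# sparse_threshold=0.0 forces a dense NumPy array as output.
preprocess = ColumnTransformer(
    [
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ("num", StandardScaler(), numeric_cols),
    ],
    sparse_threshold=0.0,
)
X_train_prepared = preprocess.fit_transform(X_train)
X_test_prepared = preprocess.transform(X_test)

print("Missing values:", missing_total)
print("Train shape:", X_train_prepared.shape, "Test shape:", X_test_prepared.shape)
```

With only five rows this demonstrates the mechanics rather than producing a usable split; on the full dataset a smaller test fraction and a stratified split on `Stage` would be more typical.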
Stage Distribution:
- Balanced across stages 1, 2, and 3
Feature Correlations:
- Strong correlation between Bilirubin, Albumin, and Stage
Visuals:
- Correlation heatmap
- Boxplots for key features
- Stage distribution bar chart
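The numbers behind the stage distribution chart and the correlation heatmap can be computed along these lines. This is a sketch under assumptions: the helper names `stage_distribution` and `stage_correlations` are hypothetical, and the data is a few columns of the five sample rows from the Dataset table, so the values it prints say nothing about the full dataset.

```python
import io

import pandas as pd

# A few columns of the sample rows from the Dataset table, used as a stand-in
# for the full liver_cirrhosis.csv file.
SAMPLE_CSV = """Sex,Bilirubin,Albumin,Stage
F,0.5,4.04,1
M,0.5,3.93,2
F,0.5,3.54,2
F,0.7,3.74,2
F,1.9,3.54,1
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))


def stage_distribution(frame):
    """Count patients per stage, ordered by stage number."""
    return frame["Stage"].value_counts().sort_index()


def stage_correlations(frame):
    """Pearson correlation of each numeric feature with Stage, strongest first."""
    corr = frame.select_dtypes(include="number").corr()
    return corr["Stage"].drop("Stage").sort_values(key=abs, ascending=False)


stage_counts = stage_distribution(df)
stage_corr = stage_correlations(df)

print(stage_counts)
print(stage_corr)

# For the plots, seaborn can draw the same information, for example:
#   sns.countplot(x="Stage", data=df)
#   sns.heatmap(df.select_dtypes(include="number").corr(), annot=True)
```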
- Model: Random Forest Classifier
- Accuracy: ~85% on test set
- Evaluation Metrics:
  - Precision, Recall, F1-score
  - Confusion Matrix
  - Feature Importance
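A training and evaluation loop of this kind might look like the sketch below. To keep it runnable without the CSV, it uses synthetic three-class data from `make_classification`, which is NOT the cirrhosis dataset, so its scores are unrelated to the ~85% reported above. The hyperparameters (`n_estimators=200`, `test_size=0.2`) and the `feature_i` names are assumptions as well.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the cirrhosis dataset): 3 classes, 8 features.
X, y = make_classification(
    n_samples=600,
    n_features=8,
    n_informative=5,
    n_redundant=0,
    n_classes=3,
    random_state=42,
)
y = y + 1  # relabel classes 0, 1, 2 as stages 1, 2, 3
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

# Stratify so each stage keeps the same share in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Overall accuracy plus per-stage precision, recall, and F1-score.
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
print(classification_report(y_test, y_pred))

# Rows are true stages, columns are predicted stages.
cm = confusion_matrix(y_test, y_pred, labels=[1, 2, 3])
print(cm)

# Impurity-based importances; they sum to 1 across all features.
importances = pd.Series(
    model.feature_importances_, index=feature_names
).sort_values(ascending=False)
print(importances)
```

On the real data, `X` and `y` would be the prepared feature matrix and the `Stage` column, and `feature_names` would be the encoded column names.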
- Clone the repository:
  git clone https://github.com/SBanditaDas/Liver-Cirrhosis-Stage-Detection.git
- Open the notebook in Kaggle or Jupyter: liver_cirrhosis_stage_detection.ipynb
- Run all cells to reproduce results and visuals
Sushree Bandita Das
Email: sushreebanditadas01@gmail.com


