Second-hand car price estimator built with scikit-learn (Linear Reg., Random Forest, Gradient Boosting, Voting ensemble) and wrapped in a Streamlit UI.

HamzaHassan9320/autotrader-price-regression

AutoTrader Price Regression

Open in Streamlit

Screenshot of the Streamlit UI

Predicting second‑hand car prices with classic tabular ML.
Data: 402 006 rows · 12 columns (target = price).
Models: Linear Regression · Random Forest · Gradient Boosting · Voting Ensemble.


Table of contents

  1. Project motivation
  2. Data
  3. Quick start
  4. Notebook & code guide
  5. Results at a glance
  6. Model interpretation
  7. Directory layout

1 · Project motivation

Buying a used car is a price‑sensitive decision.
The goal is to build transparent, reproducible baselines that predict price given mileage, age, fuel type and a handful of categorical descriptors.
The focus is clean code and solid discussion rather than coursework grades.


2 · Data

  • Source: AutoTrader extract supplied by Manchester Metropolitan University.
    The licence prohibits redistribution, so the CSV is not committed to this repository.
  • Rows: 402 006 · Columns: 12 (all except price are used as predictors).
  • Cleaning steps
    • Trim outliers in mileage & price via 1.5 × IQR.
    • Drop cars registered before 1975.
    • Mode‑impute gaps in fuel_type, body_type, standard_colour.
  • Engineered features
    • vehicle_age = 2024 - year_of_registration
    • mileage_to_age_ratio = mileage / vehicle_age

See notebooks/01_autotrader_walkthrough.ipynb for the exact code.
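
For orientation, the cleaning steps and engineered features listed above can be sketched in pandas roughly as follows. This is a minimal sketch, not the code in src/data.py or src/features.py: the function names, the order in which the two columns are trimmed, the handling of missing values, and the clipping of vehicle_age to avoid dividing by zero are all assumptions.

```python
import pandas as pd


def trim_iqr(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose `column` value lies within [Q1 - k*IQR, Q3 + k*IQR].

    Rows with a missing value in `column` are dropped as a side effect.
    """
    q1, q3 = df[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    return df[df[column].between(q1 - k * iqr, q3 + k * iqr)]


def clean_and_engineer(df: pd.DataFrame, reference_year: int = 2024) -> pd.DataFrame:
    """Hypothetical helper mirroring the steps listed in the Data section."""
    # Trim outliers one column at a time (assumption: mileage first, then price).
    for col in ("mileage", "price"):
        df = trim_iqr(df, col)

    # Drop cars registered before 1975 (rows with a missing year are dropped too).
    df = df[df["year_of_registration"] >= 1975].copy()

    # Mode-impute gaps in the categorical descriptors.
    for col in ("fuel_type", "body_type", "standard_colour"):
        df[col] = df[col].fillna(df[col].mode().iloc[0])

    # Engineered features.
    df["vehicle_age"] = reference_year - df["year_of_registration"]
    # Assumption: treat cars registered in the reference year as age 1 in the
    # ratio so that the division never hits zero.
    df["mileage_to_age_ratio"] = df["mileage"] / df["vehicle_age"].clip(lower=1)
    return df


# Toy frame using the column names from this README; the values are made up.
toy = pd.DataFrame({
    "mileage": [10_000, 20_000, 30_000, 40_000, 50_000, 5_000_000],
    "price": [5_000, 6_000, 7_000, 8_000, 9_000, 7_500],
    "year_of_registration": [2024, 2020, 2015, 2010, 1970, 2012],
    "fuel_type": ["Petrol", None, "Petrol", "Diesel", "Petrol", "Petrol"],
    "body_type": ["Hatchback", "SUV", None, "Hatchback", "Saloon", "SUV"],
    "standard_colour": ["Black", "White", "Black", None, "Red", "Blue"],
})
cleaned = clean_and_engineer(toy)
print(cleaned[["vehicle_age", "mileage_to_age_ratio"]])
```

In the toy frame the 5 000 000-mile advert is removed by the IQR rule and the 1970 registration by the year filter.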


3 · Quick start

# clone repo
git clone https://github.com/hamzahassan9320/autotrader-price-regression.git
cd autotrader-price-regression

# place the CSV in the expected location
mkdir -p data
cp /path/to/Adverts.csv data/

# set up environment
conda create -n autotrader-price python=3.10
conda activate autotrader-price
pip install -r requirements.txt

# full pipeline
python -m src.train --csv data/Adverts.csv

# run the Streamlit app locally
streamlit run app.py

Tested with Python 3.10 and scikit‑learn 1.3.2.

4 · Notebook & code guide

| file | purpose |
| --- | --- |
| notebooks/01_autotrader_walkthrough.ipynb | data snapshot, EDA, demos |
| src/data.py | load + cleanse CSV |
| src/features.py | feature engineering & preprocessing |
| src/models.py | pipelines · param grids · grid‑search helper |
| src/train.py | one‑shot CLI training run; saves models & plots |
| src/visualise.py | regenerates figures in docs/images/ |
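
To give a feel for what "pipelines · param grids · grid‑search helper" might look like, here is a minimal scikit-learn sketch covering the four model families named in this README. It is not the content of src/models.py: the function names, the numeric/categorical column split, the choice of scaler and encoder, and the hyperparameter values are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    VotingRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Column names follow the Data section; grouping them this way is an assumption.
NUMERIC = ["mileage", "vehicle_age", "mileage_to_age_ratio"]
CATEGORICAL = ["fuel_type", "body_type", "standard_colour"]


def make_preprocessor() -> ColumnTransformer:
    """Scale numeric columns and one-hot encode categorical ones."""
    return ColumnTransformer([
        ("num", StandardScaler(), NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])


def make_pipelines(random_state: int = 42) -> dict[str, Pipeline]:
    """Hypothetical factory: one preprocessing + model pipeline per family."""
    base = {
        "linear_regression": LinearRegression(),
        "random_forest": RandomForestRegressor(n_estimators=50, random_state=random_state),
        "gradient_boosting": GradientBoostingRegressor(random_state=random_state),
    }
    models = dict(base)
    # The voting ensemble averages the predictions of the three base models.
    models["voting_ensemble"] = VotingRegressor(
        estimators=[(name, clone(est)) for name, est in base.items()]
    )
    return {
        name: Pipeline([("prep", make_preprocessor()), ("model", est)])
        for name, est in models.items()
    }


def grid_search(pipeline: Pipeline, param_grid: dict, X, y, cv: int = 3) -> GridSearchCV:
    """Hypothetical helper: exhaustive search scored by (negated) MAE."""
    search = GridSearchCV(pipeline, param_grid, scoring="neg_mean_absolute_error", cv=cv)
    return search.fit(X, y)


# Toy data with made-up values, only to show the calls end to end.
toy = pd.DataFrame({
    "mileage": [10_000 + 3_000 * i for i in range(30)],
    "vehicle_age": [1 + i % 10 for i in range(30)],
    "fuel_type": ["Petrol", "Diesel", "Hybrid"] * 10,
    "body_type": ["Hatchback", "SUV"] * 15,
    "standard_colour": ["Black", "White", "Red", "Blue", "Grey"] * 6,
})
toy["mileage_to_age_ratio"] = toy["mileage"] / toy["vehicle_age"]
toy_price = 20_000 - 0.05 * toy["mileage"] - 500 * toy["vehicle_age"]

pipelines = make_pipelines()
search = grid_search(
    pipelines["random_forest"], {"model__n_estimators": [10, 20]}, toy, toy_price
)
print(search.best_params_)
```

Keeping preprocessing inside each pipeline means the scaler and encoder are refit on every cross-validation fold, which avoids leaking statistics from held-out rows into training.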

5 · Results at a glance

| model | CV MAE ↓ | Test R² |
| --- | --- | --- |
| Linear Regression | 1 642 ± 394 | 0.79 |
| Random Forest | 1 831 ± 51 | 0.90 |
| Gradient Boosting | 2 742 ± 95 | 0.87 |
| Voting Ensemble | 1 894 ± 44 | 0.89 |

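Random Forest achieves the best test R² (0.90) with a stable CV MAE (± 51) and no visible over‑fit. Linear Regression has the lowest mean CV MAE (1 642), but its spread (± 394) is far larger.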
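
The two metrics reported here can be computed along the following lines. This is a sketch under stated assumptions rather than the procedure in src/train.py: it assumes the "±" is the standard deviation of MAE across cross-validation folds, that R² is measured on a held-out 20 % split, and it uses synthetic data from make_regression in place of the AutoTrader CSV. The evaluate helper is hypothetical.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold, cross_val_score, train_test_split


def evaluate(model, X, y, n_splits: int = 5, random_state: int = 42) -> dict:
    """Return CV MAE (mean and std over folds) and R² on a held-out test split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    # scikit-learn maximises scores, so MAE is exposed as a negated value.
    neg_mae = cross_val_score(
        model, X_train, y_train, cv=cv, scoring="neg_mean_absolute_error"
    )
    mae = -neg_mae
    model.fit(X_train, y_train)
    return {
        "cv_mae_mean": mae.mean(),
        "cv_mae_std": mae.std(),
        "test_r2": r2_score(y_test, model.predict(X_test)),
    }


# Synthetic stand-in for the real data set.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
scores = evaluate(LinearRegression(), X, y)
print(f"CV MAE {scores['cv_mae_mean']:.1f} ± {scores['cv_mae_std']:.1f}, "
      f"test R² {scores['test_r2']:.2f}")
```

Reporting the fold spread next to the mean is what makes it possible to compare a model with a low but noisy MAE against one with a slightly higher but stable MAE.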


6 · Model interpretation

  • SHAP beeswarm → global drivers (top features: vehicle_age, mileage).
  • SHAP waterfall → why a single advert (row 39) is priced ± £9 k.
  • Partial dependence → price drops near‑linearly with age; flattening after ~15 yrs hints at a market floor.

All figures live in docs/images/, regenerated by src/visualise.py.


7 · Directory layout

.
├── data/                # <empty> – you add Adverts.csv locally
├── notebooks/           # single exploratory notebook
├── src/                 # reusable code
├── configs/             # YAML config(s)
├── docs/images/         # plots for README
└── requirements.txt
