This project presents a full-stack machine learning pipeline that forecasts the Air Quality Index (AQI) using:
- Multi-step pollutant prediction via Echo State Networks (ESNs)
- Future AQI classification using XGBoost
- Experiment tracking and metric logging with MLflow
- Predict AQI categories (`0–4`, mapped to Good → Very Poor) for the next 6 time steps
- Use previously forecasted pollutant values to make these predictions
- Log and compare all results using MLflow
- Built individual Echo State Network models per pollutant (`pm2.5`, `pm10`, `co`, etc.)
- Trained models can be found in the `models` folder
- Forecasted each pollutant 12 steps ahead using delay-embedded temporal inputs
- Visualized each forecast vs ground truth
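
The ESN steps above can be sketched roughly as follows. This is a minimal NumPy illustration, not the project's actual model code: the class `TinyESN`, the helper `delay_embed`, the hyperparameters, and the synthetic sine series are all assumptions made for the example. It also assumes a direct multi-output readout (all 12 steps predicted at once); the project may instead forecast recursively.

```python
import numpy as np

def delay_embed(series, width):
    """Return a matrix whose row i is series[i : i + width]."""
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - width + 1
    return np.stack([series[i:i + width] for i in range(n_rows)])

class TinyESN:
    """Hypothetical minimal Echo State Network with a ridge-regression readout."""

    def __init__(self, n_inputs, n_reservoir=100, spectral_radius=0.9,
                 leak=0.5, ridge=1e-6, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-0.5, 0.5, (n_reservoir, n_inputs))
        W = rng.uniform(-0.5, 0.5, (n_reservoir, n_reservoir))
        # Rescale the recurrent weights to the requested spectral radius.
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W = W
        self.leak = leak
        self.ridge = ridge
        self.W_out = None

    def _states(self, X):
        # Run the reservoir over the rows of X in time order, starting from zero state.
        x = np.zeros(self.W.shape[0])
        states = []
        for u in X:
            pre = self.W_in @ u + self.W @ x
            x = (1 - self.leak) * x + self.leak * np.tanh(pre)
            states.append(x.copy())
        return np.array(states)

    def fit(self, X, Y):
        # Only the linear readout is trained; reservoir weights stay fixed.
        S = self._states(X)
        A = S.T @ S + self.ridge * np.eye(S.shape[1])
        self.W_out = np.linalg.solve(A, S.T @ Y)
        return self

    def predict(self, X):
        return self._states(X) @ self.W_out

# Synthetic stand-in for one pollutant series (not real data).
series = np.sin(np.linspace(0, 20 * np.pi, 600))
n_lags, horizon = 8, 12

# Each window holds n_lags past values followed by the next 12 values.
windows = delay_embed(series, n_lags + horizon)
X, Y = windows[:, :n_lags], windows[:, n_lags:]

split = 400
esn = TinyESN(n_inputs=n_lags).fit(X[:split], Y[:split])
forecast = esn.predict(X[split:])  # one 12-step forecast per test window
```

Because `predict` restarts the reservoir from a zero state, the first few test forecasts in this sketch are less reliable; a real pipeline would typically discard a warm-up period or carry the state over from training.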
- Used predicted pollutant values as inputs
- Trained multi-output XGBoost classifier to predict AQI levels at `t+1` to `t+6`
- Labels categorized into AQI classes: `Good (0)` to `Very Poor (4)`
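
To illustrate how targets for a multi-output classifier over `t+1` to `t+6` might be laid out, here is a small pure-Python sketch. The helper name `make_horizon_targets` and the toy label sequence are hypothetical and are not taken from the project code.

```python
def make_horizon_targets(labels, n_steps=6):
    """Return one row per time t holding the class labels at t+1 ... t+n_steps."""
    n_rows = len(labels) - n_steps  # the last n_steps points have no full future window
    return [list(labels[t + 1 : t + 1 + n_steps]) for t in range(n_rows)]

# Toy AQI class sequence using the 0 (Good) to 4 (Very Poor) encoding.
aqi_classes = [0, 1, 1, 2, 3, 4, 4, 3, 2]
targets = make_horizon_targets(aqi_classes)

for t, row in enumerate(targets):
    print(f"t={t}: {row}")
```

Each column of `targets` corresponds to one forecast step. One common way to train on such a matrix is to fit one classifier per column, for example by wrapping `xgboost.XGBClassifier` in scikit-learn's `MultiOutputClassifier`; whether this project does exactly that is not stated here.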
- Logged all training metrics (`f1_macro`, `precision`, `recall`, `accuracy`) for each step
- Saved classification reports and confusion matrices
- Auto-generated CSV summary of all MLflow runs
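
The per-step metrics listed above can be computed along these lines. This is a dependency-free sketch; the function name `macro_metrics` is hypothetical, and it assumes `precision` and `recall` are macro-averaged like `f1_macro`, which the project may or may not do.

```python
def macro_metrics(y_true, y_pred, n_classes=5):
    """Compute macro-averaged precision, recall, F1, and overall accuracy."""
    precisions, recalls, f1s = [], [], []
    pairs = list(zip(y_true, y_pred))
    for c in range(n_classes):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # Classes with no predictions or no true samples score 0 here.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    return {
        "f1_macro": sum(f1s) / n_classes,
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "accuracy": accuracy,
    }

# Toy example with three classes for one forecast step.
metrics = macro_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_classes=3)
print(metrics)
```

A dictionary like this can be passed to `mlflow.log_metrics` inside an active run, with the forecast step supplied through its `step` argument.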
The dataset consists of pollution measurements for various Indian cities, with features:
`date`, `aqi`, `co`, `no`, `no2`, `o3`, `so2`, `pm2_5`, `pm10`, `nh3`
- Address class imbalance using:
  - SMOTE / RandomOverSampler
  - Class-weighted loss
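
As a rough illustration of the class-weighted option, the sketch below derives "balanced" weights, where each class gets `n_samples / (n_classes * class_count)`. The helper names and the toy labels are hypothetical; this is one common weighting scheme, not necessarily the one the project will adopt.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency (the 'balanced' heuristic)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}

def per_sample_weights(labels):
    """Expand class weights into one weight per training sample."""
    weights = balanced_class_weights(labels)
    return [weights[c] for c in labels]

# Toy imbalanced label set: class 0 is twice as frequent as classes 1 and 2.
labels = [0, 0, 0, 0, 1, 1, 2, 2]
class_weights = balanced_class_weights(labels)
sample_weights = per_sample_weights(labels)
print(class_weights)
```

Per-sample weights of this kind can be supplied to `XGBClassifier.fit` through its `sample_weight` argument, so that rare AQI classes contribute more to the loss.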
📦 MLflow Run Summary
- `mlflow_summary_baseline.csv`
- Includes metrics, params, and tags for every model run.