# Core Machine Learning Concepts in the Housing Price Prediction Practical

This document explains the primary machine learning concepts demonstrated in the Housing Price prediction practical notebook. The notebook builds an end-to-end pipeline, from data acquisition to model evaluation, using the California housing dataset.

## 1. Data Loading and Preprocessing

- **Dataset Acquisition:**
  - The notebook uses `fetch_california_housing` from `sklearn.datasets` to load the California housing dataset.
- **Data Structuring:**
  - A Pandas DataFrame is created from the dataset features, and a `target` column is appended that holds the median house value of each district (in units of $100,000).
- **Initial Exploration:**
  - A preview of the DataFrame is displayed to verify the data structure and to inspect feature values.

## 2. Feature Selection and Splitting

- **Feature Selection:**
  - Two features (`MedInc` and `AveRooms`) are removed from the DataFrame to create the feature matrix *X*.
  - The target variable *y* is set as the `target` column.
- **Train/Test Split:**
  - The dataset is partitioned into training and test sets using an 80/20 split. A fixed random state makes the split reproducible, and the held-out 20% gives an estimate of performance on unseen data.

## 3. Model Training

- **Linear Regression Model:**
  - A `LinearRegression` model is instantiated and fitted on the training data.
  - After training, the model is used to predict housing prices for the test set.

## 4. Model Evaluation

- **Metric Computation:**
  - The notebook calculates several regression metrics:
    - Mean Absolute Error (MAE)
    - Mean Squared Error (MSE)
    - Root Mean Squared Error (RMSE)
    - R² Score (Coefficient of Determination)
- **Visualization:**
  - A matplotlib plot compares the evaluation metrics side by side, using a logarithmic scale so that metrics of different magnitudes remain readable.
- **Output Display:**
  - The calculated metrics are printed to provide a quantitative measure of the model’s performance.

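
The steps in Sections 1 to 4 can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names `load_housing_frame` and `run_pipeline`, the seed value `42`, and the choice to build *X* from every column except the two removed features and `target` are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def load_housing_frame() -> pd.DataFrame:
    """Load the California housing features and append a `target` column."""
    data = fetch_california_housing()  # downloads and caches the data on first use
    df = pd.DataFrame(data.data, columns=data.feature_names)
    df["target"] = data.target  # median house value, in units of $100,000
    return df


def run_pipeline(df, dropped=("MedInc", "AveRooms"), seed=42):
    """Split the data, fit a LinearRegression, and return test-set metrics."""
    # Assumption: X is every column except the removed features and the target.
    X = df.drop(columns=[*dropped, "target"])
    y = df["target"]

    # 80/20 split; the seed value is illustrative, not taken from the notebook.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )

    model = LinearRegression().fit(X_train, y_train)
    y_pred = model.predict(X_test)

    mse = mean_squared_error(y_test, y_pred)
    return {
        "MAE": mean_absolute_error(y_test, y_pred),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),  # RMSE is the square root of MSE
        "R2": r2_score(y_test, y_pred),
    }
```

Under these assumptions, `run_pipeline(load_housing_frame())` returns a dictionary of the four metrics, which can be printed or passed on to a plotting routine. Computing RMSE from the already computed MSE avoids scoring the predictions twice.
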
This practical notebook serves as a compact end-to-end example of applying fundamental machine learning techniques (data loading, train/test splitting, linear regression, and metric-based evaluation) to a real-world housing dataset for predictive modeling.
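
As a supplement to the visualization step in Section 4, the metric comparison plot might look something like the sketch below. The function name `plot_metrics`, the bar-chart layout, and the labels are assumptions, and the numbers in the usage lines are placeholders for illustration only, not results from the notebook.

```python
import matplotlib.pyplot as plt


def plot_metrics(metrics):
    """Draw one bar per metric on a logarithmic y-axis and return the Axes."""
    names = list(metrics)
    values = [metrics[name] for name in names]

    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(names, values)
    ax.set_yscale("log")  # keeps small and large metric values readable together
    ax.set_ylabel("Metric value (log scale)")
    ax.set_title("Linear regression evaluation metrics")
    fig.tight_layout()
    return ax


# Placeholder values, chosen only to exercise the function.
example_metrics = {"MAE": 0.6, "MSE": 0.64, "RMSE": 0.8, "R2": 0.5}
ax = plot_metrics(example_metrics)
# plt.show() would display the figure in an interactive session.
```

One caveat with this design: a logarithmic axis cannot represent zero or negative values, so a negative R² (possible for a poorly fitting model) would not be drawn as a bar.
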