
UNSUPERVISED_LEARNING

TYehan edited this page Feb 27, 2025 · 1 revision

Core Machine Learning Concepts in the Unsupervised Learning Practical

This document explains the primary machine learning concepts demonstrated in the Unsupervised Learning practical notebook. The notebook showcases customer segmentation using K-Means clustering on a real-world dataset.

1. Data Loading and Preprocessing

  • Dataset Acquisition:
    • The Mall Customers dataset is loaded from a CSV file hosted at a remote URL.
  • Data Structuring:
    • The data is read into a pandas DataFrame, and the two features used for clustering, "Annual Income (k$)" and "Spending Score (1-100)", are selected.
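A minimal sketch of the loading step, assuming `pandas.read_csv` is used. The notebook's actual URL is not reproduced, so the hypothetical `load_features` helper accepts any path, URL, or file-like object. The three sample rows (and the extra `CustomerID`/`Age` columns) are made up for illustration and are not taken from the real dataset.

```python
import io

import pandas as pd

# The two clustering features named in the notebook description.
FEATURES = ["Annual Income (k$)", "Spending Score (1-100)"]


def load_features(source):
    """Read a Mall Customers-style CSV and keep only the clustering features.

    `source` can be a URL, a file path, or a file-like object; anything
    accepted by pandas.read_csv works. (Hypothetical helper.)
    """
    df = pd.read_csv(source)
    return df[FEATURES]


# Hypothetical rows for illustration only; not real customer data.
sample_csv = io.StringIO(
    "CustomerID,Age,Annual Income (k$),Spending Score (1-100)\n"
    "1,25,20,70\n"
    "2,40,60,50\n"
    "3,33,90,15\n"
)
X = load_features(sample_csv)
print(X.shape)  # (3, 2): three customers, two features
```

In the notebook itself, `source` would be the remote CSV URL.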

2. Cluster Initialization and Evaluation

  • K-Means Clustering:
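    • K-Means partitions the customers into k clusters. It repeatedly assigns each point to its nearest centroid and then recomputes each centroid as the mean of the points assigned to it.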
  • Range of Clusters Evaluation:
    • Candidate numbers of clusters, k = 2 through 10, are each evaluated.
  • Performance Metrics:
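    • Inertia: The sum of squared distances from each point to its cluster centroid (the within-cluster sum of squares). It shrinks as k grows, so it is read for an "elbow" where improvements level off rather than simply minimized.
    • Silhouette Score: Compares how close each point is to its own cluster with how close it is to the nearest other cluster. It ranges from -1 to 1, and higher values indicate better-separated clusters.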
    • Both metrics are printed for each k to help identify the optimal number of clusters.
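The evaluation loop can be sketched as follows, assuming scikit-learn's `KMeans` and `silhouette_score` (the notebook's exact code is not shown here). The `evaluate_k` helper is hypothetical, and `make_blobs` generates a synthetic 2-D stand-in for the income/spending features rather than the real data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def evaluate_k(X, k_values=range(2, 11), random_state=42):
    """Fit K-Means for each k; return {k: (inertia, silhouette)}. (Hypothetical helper.)"""
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        inertia = model.inertia_                 # within-cluster sum of squares
        silhouette = silhouette_score(X, labels)  # mean silhouette over all points
        results[k] = (inertia, silhouette)
        print(f"k={k:2d}  inertia={inertia:10.2f}  silhouette={silhouette:.3f}")
    return results


# Synthetic data standing in for the two customer features (not the real dataset).
X_demo, _ = make_blobs(n_samples=200, centers=5, cluster_std=1.0, random_state=0)
scores = evaluate_k(X_demo)

# One simple rule: take the k with the highest silhouette score.
best_k = max(scores, key=lambda k: scores[k][1])
print("highest silhouette at k =", best_k)
```

Picking the silhouette maximum is only one heuristic. In practice it is read together with the inertia "elbow", which is how the notebook combines both metrics.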

3. Model Training and Optimizing Clusters

  • Optimal Cluster Selection:
    • Based on the inertia and silhouette scores, the optimal number of clusters is determined (in this case, k=5).
  • Final Model Fitting:
    • A K-Means model with k=5 is trained on the data.
  • Cluster Labeling:
    • Each customer is assigned a cluster label based on the fitted model.
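A minimal sketch of the final fit and labeling step, assuming scikit-learn's `KMeans` with k = 5. The `segment_customers` helper, the `Cluster` column name, and the fifteen customer rows are hypothetical and chosen only for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans

FEATURES = ["Annual Income (k$)", "Spending Score (1-100)"]


def segment_customers(df, features, n_clusters=5, random_state=42):
    """Fit the final K-Means model and attach a cluster label to each row.

    Returns a labeled copy of `df` (the input is not modified) and the model.
    (Hypothetical helper.)
    """
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labeled = df.copy()
    labeled["Cluster"] = model.fit_predict(df[features])
    return labeled, model


# Hypothetical customers forming five loose groups; not real data.
customers = pd.DataFrame({
    "Annual Income (k$)": [15, 16, 17, 55, 56, 57, 60, 61, 85, 86, 87, 20, 21, 88, 89],
    "Spending Score (1-100)": [80, 82, 79, 50, 49, 51, 52, 48, 15, 14, 16, 10, 12, 85, 83],
})
labeled, model = segment_customers(customers, FEATURES)
print(labeled["Cluster"].value_counts().sort_index())
```

Returning a copy instead of mutating the input keeps the raw features untouched for later steps.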

4. Visualization of Clusters

  • Scatter Plot:
    • A two-dimensional scatter plot visualizes customer clusters using "Annual Income (k$)" and "Spending Score (1-100)".
  • Centroid Display:
    • Cluster centroids are highlighted in red to indicate the central point of each cluster.
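The plot can be sketched with matplotlib, assuming the notebook colors points by cluster label and draws centroids in red. The `plot_clusters` helper, the colormap, the marker style, and the synthetic points are illustrative assumptions. The non-interactive `Agg` backend is selected so the sketch also runs without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend; in a notebook, plt.show() would be used instead
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans


def plot_clusters(X, labels, centroids,
                  xlabel="Annual Income (k$)", ylabel="Spending Score (1-100)"):
    """Scatter points colored by cluster and overlay centroids in red. (Hypothetical helper.)"""
    fig, ax = plt.subplots(figsize=(7, 5))
    ax.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis", s=30, alpha=0.8)
    ax.scatter(centroids[:, 0], centroids[:, 1], c="red", marker="X", s=200,
               label="Centroids")
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.set_title("Customer segments")
    ax.legend()
    return fig, ax


# Synthetic points around five centers, standing in for the real features.
rng = np.random.default_rng(0)
centers = [(20, 80), (55, 50), (85, 15), (20, 15), (85, 80)]
X = np.vstack([rng.normal(c, 3.0, size=(20, 2)) for c in centers])

model = KMeans(n_clusters=5, n_init=10, random_state=42).fit(X)
fig, ax = plot_clusters(X, model.labels_, model.cluster_centers_)

buf = io.BytesIO()
fig.savefig(buf, format="png")  # render to memory instead of a file
```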

This practical notebook is an end-to-end example of unsupervised machine learning. It walks through data preparation, cluster evaluation, model training, and visualization.
