UNSUPERVISED_LEARNING
TYehan edited this page Feb 27, 2025
This document explains the primary machine learning concepts demonstrated in the Unsupervised Learning practical notebook. The notebook showcases customer segmentation using K-Means clustering on a real-world dataset.
Dataset Acquisition:
- The Mall Customers dataset is loaded from a CSV file at a remote URL.
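A minimal sketch of the loading step, assuming pandas is used for it (the next step mentions a Pandas DataFrame). The notebook's actual URL is not given on this page, so the helper name `load_customers` and the in-memory sample are hypothetical stand-ins. The sample rows are made up, and only the two quoted column names come from this page.

```python
import io

import pandas as pd


def load_customers(source):
    """Read customer data from a URL, a local path, or a file-like object.

    pandas.read_csv accepts all three, so the remote CSV URL used by the
    notebook could be passed here directly.
    """
    return pd.read_csv(source)


# Made-up rows so the sketch runs offline; these are NOT the real dataset.
sample_csv = io.StringIO(
    "CustomerID,Annual Income (k$),Spending Score (1-100)\n"
    "1,15,39\n"
    "2,16,81\n"
    "3,17,6\n"
)
customers = load_customers(sample_csv)
print(customers.head())
```

Passing a URL string instead of `sample_csv` would fetch the file over the network in the same call.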
Data Structuring:
- The data is held in a pandas DataFrame, and the two features used for clustering, "Annual Income (k$)" and "Spending Score (1-100)", are selected.
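Feature selection of this kind could look like the sketch below. The DataFrame contents are made-up rows, and the names `customers`, `FEATURES`, and `X` are illustrative rather than taken from the notebook.

```python
import pandas as pd

FEATURES = ["Annual Income (k$)", "Spending Score (1-100)"]

# Made-up rows standing in for the loaded dataset.
customers = pd.DataFrame(
    {
        "CustomerID": [1, 2, 3, 4],
        "Annual Income (k$)": [15, 16, 70, 72],
        "Spending Score (1-100)": [39, 81, 30, 90],
    }
)

# Indexing with a list of column names returns a 2-D sub-DataFrame:
# one row per customer, one column per feature.
X = customers[FEATURES]
print(X.shape)  # (4, 2)
```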
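K-Means Clustering:
- The K-Means algorithm partitions the customers into k clusters. It assigns each point to its nearest centroid, recomputes each centroid as the mean of its assigned points, and repeats until the centroids stop moving.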
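To make the idea concrete, here is a from-scratch sketch of the standard K-Means iteration (Lloyd's algorithm) in NumPy. It is not the notebook's code, which may well rely on a library implementation. The function name `kmeans_lloyd`, its parameters, and the toy points are all hypothetical.

```python
import numpy as np


def _assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)


def kmeans_lloyd(points, k, n_iter=100, seed=0):
    """Cluster `points` (n_samples x n_features) into k groups.

    Returns (labels, centroids).
    """
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]

    for _ in range(n_iter):
        labels = _assign(points, centroids)
        # Update step: move each centroid to the mean of its points.
        # An empty cluster keeps its previous centroid.
        new_centroids = np.array(
            [
                points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                for j in range(k)
            ]
        )
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids stopped moving
        centroids = new_centroids

    # Final assignment so labels always match the returned centroids.
    return _assign(points, centroids), centroids


# Two obvious groups of toy points (made up for illustration).
toy = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels, centroids = kmeans_lloyd(toy, k=2)
print(labels)
print(centroids)
```

The result depends on the random initial centroids, which is why library implementations typically run several initialisations and keep the best one.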
Range of Clusters Evaluation:
- Candidate cluster counts k from 2 to 10 are evaluated.
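Performance Metrics:
- Inertia: The sum of squared distances from each point to its assigned cluster center. Lower values mean tighter clusters, but inertia generally keeps falling as k increases, so it is usually read by looking for an "elbow" where the improvement levels off.
- Silhouette Score: Compares how close each point is to its own cluster with how close it is to the nearest other cluster. It ranges from -1 to 1, and higher is better.
- Both metrics are printed for each k to help identify the optimal number of clusters.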
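The evaluation loop can be sketched as follows, assuming scikit-learn's `KMeans` and `silhouette_score` (this page does not name the library). The data is a synthetic stand-in with made-up cluster centers, and `n_init=10` and `random_state=42` are illustrative choices, not settings confirmed by the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the two selected features (made-up values).
rng = np.random.default_rng(0)
centers = np.array([[25, 20], [25, 80], [55, 50], [85, 15], [85, 85]])
X = np.vstack([c + rng.normal(scale=5.0, size=(40, 2)) for c in centers])

results = {}
for k in range(2, 11):  # k = 2, 3, ..., 10
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    results[k] = {
        # Sum of squared distances to the nearest cluster center.
        "inertia": model.inertia_,
        # Mean silhouette coefficient over all samples.
        "silhouette": silhouette_score(X, model.labels_),
    }
    print(
        f"k={k:2d}  inertia={results[k]['inertia']:10.1f}  "
        f"silhouette={results[k]['silhouette']:.3f}"
    )
```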
Optimal Cluster Selection:
- Based on the inertia and silhouette scores, the optimal number of clusters is determined (in this case, k=5).
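One simple selection rule is to take the k with the highest silhouette score. The sketch below shows only that rule. How the notebook weighs inertia against silhouette is not stated here, and both the helper name `best_k_by_silhouette` and the scores are placeholders rather than the notebook's output.

```python
def best_k_by_silhouette(silhouette_by_k):
    """Return the k whose silhouette score is highest."""
    return max(silhouette_by_k, key=silhouette_by_k.get)


# Placeholder scores for illustration only; NOT results from the notebook.
example_scores = {2: 0.1, 3: 0.2, 4: 0.3, 5: 0.6, 6: 0.4}
chosen_k = best_k_by_silhouette(example_scores)
print(chosen_k)  # 5
```

In practice the silhouette maximum is usually cross-checked against the elbow in the inertia curve before settling on a value.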
Final Model Fitting:
- A K-Means model with k=5 is trained on the data.
Cluster Labeling:
- Each customer is assigned a cluster label based on the fitted model.
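Fitting the final model and attaching labels might look like this, again assuming scikit-learn and pandas. The data is synthetic, and the `Cluster` column name, `n_init`, and `random_state` are assumptions for the sake of a runnable example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

FEATURES = ["Annual Income (k$)", "Spending Score (1-100)"]

# Synthetic stand-in for the customer data (made-up values).
rng = np.random.default_rng(0)
centers = np.array([[25, 20], [25, 80], [55, 50], [85, 15], [85, 85]])
points = np.vstack([c + rng.normal(scale=5.0, size=(40, 2)) for c in centers])
customers = pd.DataFrame(points, columns=FEATURES)

# Train with k=5 and store each customer's cluster index (0..4).
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
customers["Cluster"] = kmeans.fit_predict(customers[FEATURES])

print(customers["Cluster"].value_counts().sort_index())
print(kmeans.cluster_centers_)
```

`fit_predict` trains the model and returns the label of each row in one call; the same labels are also available afterwards as `kmeans.labels_`.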
Scatter Plot:
- A two-dimensional scatter plot visualizes customer clusters using "Annual Income (k$)" and "Spending Score (1-100)".
Centroid Display:
- Cluster centroids are highlighted in red to indicate the central point of each cluster.
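A plot along these lines can be produced with Matplotlib, which is an assumption since this page does not name the plotting library. The data is synthetic, and the colormap, marker shape, sizes, and title are illustrative choices.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the two selected features (made-up values).
rng = np.random.default_rng(0)
centers = np.array([[25, 20], [25, 80], [55, 50], [85, 15], [85, 85]])
X = np.vstack([c + rng.normal(scale=5.0, size=(40, 2)) for c in centers])

kmeans = KMeans(n_clusters=5, n_init=10, random_state=42).fit(X)

fig, ax = plt.subplots(figsize=(8, 6))
# Customers, colored by cluster label.
ax.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap="viridis", s=30)
# Centroids drawn on top in red.
ax.scatter(
    kmeans.cluster_centers_[:, 0],
    kmeans.cluster_centers_[:, 1],
    c="red",
    marker="X",
    s=200,
    label="Centroids",
)
ax.set_xlabel("Annual Income (k$)")
ax.set_ylabel("Spending Score (1-100)")
ax.set_title("Customer segments (K-Means, k=5)")
ax.legend()
```

In a notebook the figure renders inline; in a script, `fig.savefig(...)` writes it to a file.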
This practical notebook is an end-to-end example of applying unsupervised machine learning, covering data loading and preparation, cluster evaluation, model training, and visualization.