Customer Segmentation Analysis

This project aims to perform customer segmentation based on demographic and transaction data from a wholesale company. Using unsupervised clustering techniques, customers will be grouped into several segments based on shared characteristics. The goal is to better understand customer behavior, allowing the company to maximize the value of each customer and tailor more effective marketing strategies.

🎯 The Challenge

Understanding a diverse customer base is crucial for any business. Treating all customers the same can lead to ineffective marketing and missed opportunities. The challenge is to move beyond a one-size-fits-all approach and identify distinct groups within the customer data.

The goal of this project is to build a clustering model that answers the question: "What are the distinct customer archetypes in our data?" by using customer personality and transaction data. This will enable the business to create targeted campaigns and improve customer satisfaction.

💾 Dataset

The dataset used in this project is "Customer Personality Analysis" available on Kaggle.

Dataset Link

🛠️ Tech Stack

Data Manipulation & Analysis: pandas, numpy
Data Visualization: matplotlib, seaborn, plotly
Machine Learning: scikit-learn, scikit-learn-extra, yellowbrick
Environment: Jupyter Notebook

🚀 Getting Started

Clone the repository or download the project files.

Create and activate a virtual environment (recommended):

# Create the environment
python -m venv .venv

# Activate on Windows (PowerShell)
.\.venv\Scripts\Activate.ps1

# Activate on macOS/Linux
source .venv/bin/activate

Install the required dependencies from within the notebook by running its first few cells, which install scikit-learn-extra and yellowbrick.
Launch Jupyter Notebook and open notebook.ipynb.

📊 Analysis Workflow

The analysis in this notebook follows these steps:

Data Cleaning:
- Handling missing values in the Income column by removing the corresponding rows.
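A minimal sketch of this step, assuming the data is already in a pandas DataFrame. The toy frame below is made up for illustration and only stands in for the real dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the raw data (values are invented for illustration).
raw = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Income": [58138.0, np.nan, 71613.0, np.nan],
})

# Drop only the rows where Income is missing, then renumber the index.
cleaned = raw.dropna(subset=["Income"]).reset_index(drop=True)

print(f"Removed {len(raw) - len(cleaned)} rows with missing Income")
```

Restricting `dropna` to the Income column keeps rows that are complete in Income even if another column happens to be empty.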
Feature Engineering:
- Creating new features such as Customer_For, Age, Spent, Living_With, Children, Family_Size, and Is_Parent to better represent customer characteristics.
- Simplifying the Education and Marital_Status categories.
- Removing redundant or irrelevant features.
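The features above could be derived roughly as follows. This is a sketch under assumptions, not the notebook's exact code: the raw column names (Year_Birth, Dt_Customer, Kidhome, Teenhome, the Mnt* spending columns), the day-month-year date format, the reference year, and the category mappings are all assumed here and should be checked against the actual data.

```python
import pandas as pd

# Toy rows using assumed raw column names; values are invented.
df = pd.DataFrame({
    "Year_Birth": [1957, 1984, 1965],
    "Dt_Customer": ["04-09-2012", "08-03-2014", "21-08-2013"],
    "Marital_Status": ["Single", "Together", "Married"],
    "Education": ["Graduation", "PhD", "Basic"],
    "Kidhome": [0, 1, 0],
    "Teenhome": [0, 0, 1],
    "MntWines": [635, 11, 426],
    "MntMeatProducts": [546, 6, 127],
})

# Assumed date format: day-month-year.
df["Dt_Customer"] = pd.to_datetime(df["Dt_Customer"], format="%d-%m-%Y")

# Customer_For: tenure in days relative to the newest enrollment (assumption).
df["Customer_For"] = (df["Dt_Customer"].max() - df["Dt_Customer"]).dt.days

# Age: relative to a hypothetical reference year.
REFERENCE_YEAR = 2021
df["Age"] = REFERENCE_YEAR - df["Year_Birth"]

# Spent: total across every spending column (assumed to start with "Mnt").
spend_cols = [c for c in df.columns if c.startswith("Mnt")]
df["Spent"] = df[spend_cols].sum(axis=1)

# Living_With: collapse marital status into two groups (assumed mapping).
partnered = {"Married", "Together"}
df["Living_With"] = df["Marital_Status"].map(
    lambda status: "Partner" if status in partnered else "Alone"
)

# Children, Family_Size, Is_Parent follow from the household columns.
df["Children"] = df["Kidhome"] + df["Teenhome"]
df["Family_Size"] = df["Living_With"].map({"Alone": 1, "Partner": 2}) + df["Children"]
df["Is_Parent"] = (df["Children"] > 0).astype(int)

# Education: merge detailed levels into three tiers (assumed mapping).
education_map = {
    "Basic": "Undergraduate",
    "2n Cycle": "Undergraduate",
    "Graduation": "Graduate",
    "Master": "Postgraduate",
    "PhD": "Postgraduate",
}
df["Education"] = df["Education"].map(education_map)

# Drop raw columns that the derived features make redundant.
df = df.drop(columns=["Year_Birth", "Dt_Customer", "Marital_Status"])
print(df.head())
```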
Outlier Handling:
- Removing outliers in the Age and Income features to improve model quality.
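One simple way to do this is with fixed cutoffs. The thresholds below are hypothetical placeholders, not values taken from the notebook; they should be chosen after inspecting the real distributions.

```python
import pandas as pd

# Toy data with one implausible age and one extreme income (invented values).
customers = pd.DataFrame({
    "Age": [34, 52, 121, 45, 67],
    "Income": [42000.0, 58000.0, 36000.0, 666666.0, 71000.0],
})

# Hypothetical cutoffs; adjust them to the real data.
MAX_AGE = 90
MAX_INCOME = 600_000

# Keep only rows that fall below both cutoffs.
keep = (customers["Age"] < MAX_AGE) & (customers["Income"] < MAX_INCOME)
filtered = customers[keep].reset_index(drop=True)

print(f"Dropped {len(customers) - len(filtered)} outlier rows")
```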
Data Preprocessing:
- Label Encoding: Converting categorical features (Education, Living_With) into a numerical format.
- Standard Scaling: Standardizing all numerical features to zero mean and unit variance so that each feature contributes on a comparable scale.
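A short sketch of both preprocessing steps with scikit-learn, using a made-up four-row frame. The Income and Spent columns are included only as example numerical features.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Toy feature table (invented values).
features = pd.DataFrame({
    "Education": ["Graduate", "Postgraduate", "Undergraduate", "Graduate"],
    "Living_With": ["Alone", "Partner", "Partner", "Alone"],
    "Income": [58138.0, 46344.0, 71613.0, 26646.0],
    "Spent": [1617, 27, 776, 53],
})

# Label-encode each categorical column with its own encoder.
# LabelEncoder assigns integer codes in sorted order of the class names.
encoded = features.copy()
for col in ["Education", "Living_With"]:
    encoded[col] = LabelEncoder().fit_transform(encoded[col])

# Standardize every column to zero mean and unit variance.
scaler = StandardScaler()
scaled = pd.DataFrame(scaler.fit_transform(encoded), columns=encoded.columns)

print(scaled.round(2))
```

Note that label encoding imposes an arbitrary ordering on the categories, which distance-based clustering will treat as meaningful. One-hot encoding is a common alternative when that is a concern.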
Dimensionality Reduction:
- Using Principal Component Analysis (PCA) to reduce the data's dimensions to 3 principal components, which capture most of the data's variance.
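The reduction can be sketched as below on synthetic data. How much variance three components retain depends on the data, so it is worth checking `explained_variance_ratio_` on the real features rather than assuming a figure.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled feature matrix (200 rows, 8 features).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 8))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)  # make two features correlated

X_scaled = StandardScaler().fit_transform(X)

# Project onto the first three principal components.
pca = PCA(n_components=3)
X_pca = pca.fit_transform(X_scaled)

# Share of total variance captured by each component, and their sum.
explained = pca.explained_variance_ratio_
print("Per component:", explained.round(3))
print("Cumulative:", round(float(explained.sum()), 3))
```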
Clustering Modeling:
- Elbow Method: Using KElbowVisualizer with a K-Means model to determine the number of clusters (k). The elbow plot suggests k=4.
- K-Medoids: Applying the K-Medoids algorithm to group the data into the 4 determined segments.
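To illustrate the two ideas without extra dependencies, the sketch below swaps in plain scikit-learn and NumPy: the elbow curve is computed by hand from K-Means inertia instead of KElbowVisualizer, and K-Medoids is a small alternating implementation written for this example rather than the scikit-learn-extra class. The data is synthetic, and the function `k_medoids` is a hypothetical helper, not part of any library.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances

# Synthetic 3-D points standing in for the PCA output.
X, _ = make_blobs(n_samples=300, centers=4, n_features=3,
                  cluster_std=0.8, random_state=0)

# Elbow method: record K-Means inertia for a range of k and look for the bend.
inertias = {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_
print({k: round(v, 1) for k, v in inertias.items()})


def k_medoids(points, n_clusters, n_iter=100, seed=0):
    """Alternating K-Medoids sketch; returns (labels, medoid_indices)."""
    rng = np.random.default_rng(seed)
    dist = pairwise_distances(points)  # full pairwise Euclidean distances
    medoids = rng.choice(len(points), size=n_clusters, replace=False)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest medoid.
        labels = dist[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for j in range(n_clusters):
            members = np.flatnonzero(labels == j)
            if members.size:
                # Update step: pick the member with the smallest total
                # distance to the other members of its cluster.
                within = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[j] = members[within.argmin()]
        if np.array_equal(new_medoids, medoids):
            break  # converged: medoids did not change
        medoids = new_medoids
    labels = dist[:, medoids].argmin(axis=1)
    return labels, medoids


labels, medoid_idx = k_medoids(X, n_clusters=4)
print("Cluster sizes:", np.bincount(labels, minlength=4))
```

Unlike K-Means centroids, each medoid is an actual data point, so a segment can be described by a real representative customer.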
Visualization:
- Creating a 3D scatter plot visualization of the clustering results to see the distribution of each customer segment.
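A 3D scatter of this kind can be drawn with matplotlib roughly as follows. The points and labels here are random placeholders, and the axis names PC1 to PC3 are assumed labels for the three principal components.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Random placeholders for the 3 principal components and cluster labels.
rng = np.random.default_rng(0)
components = rng.normal(size=(120, 3))
labels = rng.integers(0, 4, size=120)

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(projection="3d")

# Color each point by its cluster label.
points = ax.scatter(
    components[:, 0], components[:, 1], components[:, 2],
    c=labels, cmap="viridis", s=20,
)
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.set_zlabel("PC3")
ax.set_title("Customer segments in PCA space")
fig.colorbar(points, ax=ax, label="Cluster")
```

In a notebook, drop the `matplotlib.use("Agg")` line and call `plt.show()` to display the figure.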

📄 License

This project is licensed under the MIT License.

Repository: sandiindika/Grocery-Customer-Segmentation
