hasancatalgol/fl-cmeans-clustering

Fuzzy C-Means (FCM) Clustering

This project applies Fuzzy C-Means (FCM) clustering (via scikit-fuzzy) to the Default of Credit Card Clients dataset using two features:

  • LIMIT_BAL — credit limit
  • BILL TOTAL — sum of the six monthly bills, BILL_AMT1 + … + BILL_AMT6

The pipeline:

  1. Load data/credit_card_clients.csv (your exact path).
  2. Create BILL TOTAL.
  3. Scale ['LIMIT_BAL', 'BILL TOTAL'] to [0,1].
  4. Run FCM for a sweep of cluster counts (c = 2..10).
  5. Plot FPC vs c and a grid of mini-scatter plots.
  6. Pick the c with the highest FPC and plot the final clustering.
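Steps 2–3 can be sketched in a few lines of NumPy. This is an illustrative stand-in (the tiny arrays below are hypothetical rows, not the real dataset), but the transformations match the pipeline:

```python
import numpy as np

# Hypothetical stand-in for the loaded data: one row per client,
# a LIMIT_BAL column and six monthly bill columns BILL_AMT1..BILL_AMT6.
limit_bal = np.array([20000.0, 120000.0, 90000.0, 50000.0])
bill_amts = np.array([
    [3913.0, 3102.0, 689.0, 0.0, 0.0, 0.0],
    [2682.0, 1725.0, 2682.0, 3272.0, 3455.0, 3261.0],
    [29239.0, 14027.0, 13559.0, 14331.0, 14948.0, 15549.0],
    [46990.0, 48233.0, 49291.0, 28314.0, 28959.0, 29547.0],
])

# Step 2: BILL TOTAL = BILL_AMT1 + ... + BILL_AMT6.
bill_total = bill_amts.sum(axis=1)

# Step 3: min-max scale both features to [0, 1]. FCM uses Euclidean
# distance, so unscaled features would let the larger-magnitude
# column dominate the clustering.
X = np.column_stack([limit_bal, bill_total])
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
```

Step 4 then feeds `X_scaled` to scikit-fuzzy's `cmeans` once per candidate `c` (note that scikit-fuzzy expects the data transposed, features-by-samples).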

What is Fuzzy C-Means?

FCM finds $c$ centers $v_i$ and a membership matrix $U=[u_{ik}]$ where each sample $x_k$ has soft memberships:

  • $u_{ik}\in[0,1]$, and $\sum_{i=1}^c u_{ik}=1$ for each $k$ (every column of $U$ sums to 1).
  • Unlike K-Means’ hard labels, FCM tells you how much each point belongs to each cluster.

Objective (Bezdek)

Minimize the fuzzy within-cluster SSE with fuzzifier $m>1$ (typically $m\approx 1.6$–$2.0$):

$$ J_m(U,V)=\sum_{i=1}^{c}\sum_{k=1}^{n} u_{ik}^{m}\,\lVert x_k - v_i\rVert^2. $$

Alternating updates (Euclidean)

Centers

$$ v_i=\frac{\sum_{k=1}^{n} u_{ik}^{m}\,x_k}{\sum_{k=1}^{n} u_{ik}^{m}}. $$

Memberships

$$ u_{ik}=\left(\sum_{j=1}^{c}\left(\frac{\lVert x_k-v_i\rVert}{\lVert x_k-v_j\rVert}\right)^{\frac{2}{m-1}}\right)^{-1}. $$

Stopping: stop when $\lVert U^{(t)}-U^{(t-1)}\rVert_{\infty}<\text{error}$ or when maxiter is reached.
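The two update rules translate almost line for line into NumPy. The sketch below is an illustrative re-implementation for clarity, not the scikit-fuzzy routine the project actually calls; function name and defaults are hypothetical:

```python
import numpy as np

def fcm(X, c, m=2.0, error=1e-5, maxiter=300, seed=0):
    """Plain-NumPy Fuzzy C-Means. X: (n, d) data. Returns (centers, U)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0)                       # each column sums to 1

    for _ in range(maxiter):
        Um = U ** m
        # Center update: weighted means with weights u_ik^m.
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Squared Euclidean distances d2[i, k] = ||x_k - v_i||^2.
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)              # guard: point sitting on a center
        # Membership update: u_ik proportional to d2_ik^(-1/(m-1)),
        # which is the closed form of the ratio sum in the formula above.
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0)
        if np.abs(U_new - U).max() < error:  # stopping criterion on U
            U = U_new
            break
        U = U_new
    return V, U
```

On two well-separated blobs this recovers near-crisp memberships; in the project itself, `skfuzzy.cluster.cmeans` does the same alternation.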


Choosing the number of clusters

We report the Fuzzy Partition Coefficient (FPC) per $c$:

$$ \mathrm{FPC}=\frac{1}{n}\sum_{k=1}^{n}\sum_{i=1}^{c} u_{ik}^{2}. $$

  • Higher is better. $\mathrm{FPC}\approx 1/c$ means very fuzzy/overlapping partitions.
  • In this 2-feature run, FPC peaks at $c=2$ and then decreases as $c$ grows.

(Optionally, another index you may see is Xie–Beni:
$\displaystyle \mathrm{XB}=\frac{\sum_{i,k} u_{ik}^{m}\lVert x_k-v_i\rVert^2}{n\,\min_{i\ne j}\lVert v_i-v_j\rVert^2}$; lower is better. If two centers coincide, the denominator $\to 0$ and XB $\to \infty$.)


Results

1) FPC vs number of clusters

FPC vs c

How to read:

  • The curve shows FPC for c = 2..10.
  • Pick the peak (here it’s at c = 2).
  • The downward slope after c=2 means adding more clusters makes the partition fuzzier (less crisp separation) for these two features.

2) Cluster grids (c = 2…10)

Cluster grids

What you’re seeing:

  • Each panel is an FCM run for a specific c.
  • Colors = hard labels from the fuzzy memberships (argmax across clusters).
  • Black/red squares = cluster centers (in scaled space).
  • As c increases, the algorithm keeps subdividing the dense region at low limit / low bill totals.
  • FPC shown in each title steadily declines with c, indicating the split becomes less crisp.

Takeaway: For these two features, few clusters (especially c=2) summarize the structure best. Large c just slices the same mass in arbitrary ways.


3) Final clustering (chosen c by FPC)

Final scatter

Interpretation:

  • Two broad groups appear in scaled space:
    1. smaller limits & small bills;
    2. higher limits & larger bills.
  • The “X” markers are the fuzzy centers.
  • Remember: points near the boundary have non-trivial memberships in both clusters; colors show the hard label for visualization only.

Practical tips

  • Always scale features before FCM (Euclidean metric).
  • If you switch to more features, avoid heavy collinearity (or use a compact subset).
  • If you ever get FPC ≈ 1/c across all c, that’s a degenerate run (uniform memberships). Rerun with different initializations or adjust features/scale.
  • Soft memberships are great for: thresholding borderline points, ranking “how typical” a point is for a cluster, and flagging outliers (low max-membership).
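The last tip works directly off the membership matrix. A minimal sketch, with a hypothetical $U$ and an arbitrary 0.6 threshold for "borderline":

```python
import numpy as np

# Hypothetical (c=2, n=6) membership matrix; columns sum to 1.
U = np.array([
    [0.97, 0.91, 0.55, 0.48, 0.10, 0.52],
    [0.03, 0.09, 0.45, 0.52, 0.90, 0.48],
])

hard_labels = U.argmax(axis=0)   # what the scatter-plot colors show
typicality = U.max(axis=0)       # how "typical" each point is for its cluster
borderline = typicality < 0.6    # low max-membership: near a boundary, worth a look
```

Here points 3, 4 and 6 would be flagged as borderline even though each still receives a hard label for plotting.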
