Commit 132c0c7

committed
minor cleanup / rewrite for conciseness
1 parent 417813a commit 132c0c7

1 file changed

+85
-70
lines changed

doc/examples_sphinx-gallery/stochastic_variability.py

Lines changed: 85 additions & 70 deletions
@@ -5,52 +5,61 @@
55
Stochastic Variability in Community Detection Algorithms
66
=========================================================
77
8-
This example demonstrates the variability of stochastic community detection methods by analyzing the consistency of multiple partitions using similarity measures normalized mutual information (NMI), variation of information (VI), rand index (RI) on both random and structured graphs.
8+
This example demonstrates the use of stochastic community detection methods to check whether a network possesses a strong community structure, and whether the partitionings we obtain are meaningful. Many community detection algorithms are randomized, and return somewhat different results after each run, depending on the random seed that was set. When there is a robust community structure, we expect these results to be similar to each other. When the community structure is weak or non-existent, the results may be noisy and highly variable. We will employ several partition similarity measures to analyse the consistency of the results, including the normalized mutual information (NMI), the variation of information (VI), and the Rand index (RI).
99
1010
"""
1111
# %%
12-
# Import libraries
1312
import igraph as ig
1413
import matplotlib.pyplot as plt
1514
import itertools
15+
import random
1616

1717
# %%
18-
# First, we generate a graph.
19-
# Load the karate club network
18+
# .. note::
19+
# We set a random seed to ensure that the results look exactly the same in
20+
# the gallery. You don't need to do this when exploring randomness.
21+
random.seed(42)
22+
23+
# %%
24+
# We will use Zachary's karate club dataset [1]_, a classic example of a network
25+
# with a strong community structure:
2026
karate = ig.Graph.Famous("Zachary")
2127

2228
# %%
23-
#For the random graph, we use an Erdős-Rényi :math:`G(n, m)` model, where 'n' is the number of nodes
24-
#and 'm' is the number of edges. We set 'm' to match the edge count of the empirical (Karate Club)
25-
#network to ensure structural similarity in terms of connectivity, making comparisons meaningful.
26-
n_nodes = karate.vcount()
27-
n_edges = karate.ecount()
28-
#Generate an Erdős-Rényi graph with the same number of nodes and edges
29-
random_graph = ig.Graph.Erdos_Renyi(n=n_nodes, m=n_edges)
29+
# We will compare it to an Erdős-Rényi :math:`G(n, m)` random network having
30+
# the same number of vertices and edges. The parameters 'n' and 'm' refer to the
31+
# vertex and edge count, respectively. Since this is a random network, it should
32+
# have no community structure.
33+
random_graph = ig.Graph.Erdos_Renyi(n=karate.vcount(), m=karate.ecount())
3034

3135
# %%
32-
# Now, lets plot the graph to visually understand them.
36+
# First, let us plot the two networks for a visual comparison:
3337

3438
# Create subplots
35-
fig, axes = plt.subplots(1, 2, figsize=(12, 6))
39+
fig, axes = plt.subplots(1, 2, figsize=(12, 6), subplot_kw={'aspect': 'equal'})
3640

37-
# Karate Club Graph
38-
layout_karate = karate.layout("fr")
41+
# Karate club network
3942
ig.plot(
40-
karate, layout=layout_karate, target=axes[0], vertex_size=30, vertex_color="lightblue", edge_width=1,
41-
vertex_label=[str(v.index) for v in karate.vs], vertex_label_size=10
43+
karate, target=axes[0],
44+
vertex_color="lightblue", vertex_size=30,
45+
vertex_label=range(karate.vcount()), vertex_label_size=10,
46+
edge_width=1
4247
)
43-
axes[0].set_title("Karate Club Network")
48+
axes[0].set_title("Karate club network")
4449

45-
# Erdős-Rényi Graph
46-
layout_random = random_graph.layout("fr")
50+
# Random network
4751
ig.plot(
48-
random_graph, layout=layout_random, target=axes[1], vertex_size=30, vertex_color="lightcoral", edge_width=1,
49-
vertex_label=[str(v.index) for v in random_graph.vs], vertex_label_size=10
52+
random_graph, target=axes[1],
53+
vertex_color="lightcoral", vertex_size=30,
54+
vertex_label=range(random_graph.vcount()), vertex_label_size=10,
55+
edge_width=1
5056
)
51-
axes[1].set_title("Erdős-Rényi Random Graph")
57+
axes[1].set_title("Erdős-Rényi random network")
58+
59+
plt.show()
60+
5261
# %%
53-
# Function to compute similarity between partitions
62+
# Function to compute similarity between partitions using various methods:
5463
def compute_pairwise_similarity(partitions, method):
5564
similarities = []
5665

@@ -61,74 +70,80 @@ def compute_pairwise_similarity(partitions, method):
6170
return similarities
6271

6372
# %%
64-
# We have used, stochastic community detection using the Louvain method, iteratively generating partitions and computing similarity metrics to assess stability.
65-
# The Louvain method is a modularity maximization approach for community detection.
66-
# Since exact modularity maximization is NP-hard, the algorithm employs a greedy heuristic that processes vertices in a random order.
67-
# This randomness leads to variations in the detected communities across different runs, which is why results may differ each time the method is applied.
68-
def run_experiment(graph, iterations=50):
69-
partitions = [graph.community_multilevel().membership for _ in range(iterations)]
73+
# The Leiden method, accessible through :meth:`igraph.Graph.community_leiden()`,
74+
# is a modularity maximization approach for community detection. Since exact
75+
# modularity maximization is NP-hard, the algorithm employs a greedy heuristic
76+
# that processes vertices in a random order. This randomness leads to
77+
# variation in the detected communities across different runs, which is why
78+
# results may differ each time the method is applied. The following function
79+
# runs the Leiden algorithm multiple times:
80+
def run_experiment(graph, iterations=100):
81+
partitions = [graph.community_leiden(objective_function='modularity').membership for _ in range(iterations)]
7082
nmi_scores = compute_pairwise_similarity(partitions, method="nmi")
7183
vi_scores = compute_pairwise_similarity(partitions, method="vi")
7284
ri_scores = compute_pairwise_similarity(partitions, method="rand")
7385
return nmi_scores, vi_scores, ri_scores
7486

7587
# %%
76-
# Run experiments
88+
# Run the experiment on both networks:
7789
nmi_karate, vi_karate, ri_karate = run_experiment(karate)
7890
nmi_random, vi_random, ri_random = run_experiment(random_graph)
7991

8092
# %%
81-
# Lastly, lets plot probability density histograms to understand the result.
82-
fig, axes = plt.subplots(3, 2, figsize=(12, 10))
93+
# Finally, let us plot histograms of the pairwise similarities of the obtained
94+
# partitionings to understand the result:
95+
fig, axes = plt.subplots(2, 3, figsize=(12, 6))
8396
measures = [
84-
(nmi_karate, nmi_random, "NMI", 0, 1), # Normalized Mutual Information (0-1, higher = more similar)
85-
(vi_karate, vi_random, "VI", 0, None), # Variation of Information (0+, lower = more similar)
86-
(ri_karate, ri_random, "RI", 0, 1), # Rand Index (0-1, higher = more similar)
97+
# Normalized Mutual Information (0-1, higher = more similar)
98+
(nmi_karate, nmi_random, "NMI", 0, 1),
99+
# Variation of Information (0+, lower = more similar)
100+
(vi_karate, vi_random, "VI", 0, max(vi_karate + vi_random)),
101+
# Rand Index (0-1, higher = more similar)
102+
(ri_karate, ri_random, "RI", 0, 1),
87103
]
88104
colors = ["red", "blue", "green"]
89105

90106
for i, (karate_scores, random_scores, measure, lower, upper) in enumerate(measures):
91-
# Karate Club histogram
92-
axes[i][0].hist(
93-
karate_scores, bins=20, alpha=0.7, color=colors[i], edgecolor="black",
94-
density=True # Probability density
107+
# Karate club histogram
108+
axes[0][i].hist(
109+
karate_scores, bins=20, range=(lower, upper),
110+
density=True, # Probability density
111+
alpha=0.7, color=colors[i], edgecolor="black"
95112
)
96-
axes[i][0].set_title(f"Probability Density of {measure} - Karate Club Network")
97-
axes[i][0].set_xlabel(f"{measure} Score")
98-
axes[i][0].set_ylabel("Density")
99-
axes[i][0].set_xlim(lower, upper) # Set axis limits explicitly
100-
101-
# Erdős-Rényi Graph histogram
102-
axes[i][1].hist(
103-
random_scores, bins=20, alpha=0.7, color=colors[i], edgecolor="black",
104-
density=True
113+
axes[0][i].set_title(f"{measure} - Karate club network")
114+
axes[0][i].set_xlabel(f"{measure} score")
115+
axes[0][i].set_ylabel("PDF")
116+
117+
# Random network histogram
118+
axes[1][i].hist(
119+
random_scores, bins=20, range=(lower, upper), density=True,
120+
alpha=0.7, color=colors[i], edgecolor="black"
105121
)
106-
axes[i][1].set_title(f"Probability Density of {measure} - Erdős-Rényi Graph")
107-
axes[i][1].set_xlabel(f"{measure} Score")
108-
axes[i][1].set_xlim(lower, upper) # Set axis limits explicitly
122+
axes[1][i].set_title(f"{measure} - Random network")
123+
axes[1][i].set_xlabel(f"{measure} score")
124+
axes[1][i].set_ylabel("PDF")
109125

110126
plt.tight_layout()
111127
plt.show()
112128

113129
# %%
114-
# We have compared the probability density of NMI, VI, and RI for the Karate Club network (structured) and an Erdős-Rényi random graph.
130+
# We have compared the pairwise similarities using the NMI, VI, and RI measures
131+
# between partitionings obtained for the karate club network (strong community
132+
# structure) and a comparable random graph (which lacks communities).
115133
#
116-
# **NMI (Normalized Mutual Information):**
117-
#
118-
# - Karate Club Network: The distribution is concentrated near 1, indicating high similarity across multiple runs, suggesting stable community detection.
119-
# - Erdős-Rényi Graph: The values are more spread out, with lower NMI scores, showing inconsistent partitions due to the lack of clear community structures.
134+
# The Normalized Mutual Information (NMI) and Rand Index (RI) both quantify
135+
# similarity, and take values from :math:`[0,1]`. Higher values indicate more
136+
# similar partitionings, with a value of 1 attained when the partitionings are
137+
# identical.
120138
#
121-
# **VI (Variation of Information):**
139+
# The Variation of Information (VI) is a distance measure. It takes values from
140+
# :math:`[0,\infty)`, with lower values indicating higher similarity. Identical
141+
# partitionings have a distance of zero.
122142
#
123-
# - Karate Club Network: The values are low and clustered, indicating stable partitioning with minor variations across runs.
124-
# - Erdős-Rényi Graph: The distribution is broader and shifted toward higher VI values, meaning higher partition variability and less consistency.
125-
#
126-
# **RI (Rand Index):**
127-
#
128-
# - Karate Club Network: The RI values are high and concentrated near 1, suggesting consistent clustering results across multiple iterations.
129-
# - Erdős-Rényi Graph: The distribution is more spread out, but with lower RI values, confirming unstable community detection.
130-
#
131-
# **Conclusion**
132-
#
133-
# The Karate Club Network exhibits strong, well-defined community structures, leading to consistent results across runs.
134-
# The Erdős-Rényi Graph, being random, lacks clear communities, causing high variability in detected partitions.
143+
# For the karate club network, NMI and RI values are concentrated near 1, while
144+
# VI is concentrated near 0, suggesting a robust community structure. In contrast,
145+
# the values obtained for the random network are much more spread out, showing
146+
# inconsistent partitionings due to the lack of a clear community structure.
147+
148+
# %%
149+
# .. [1] W. Zachary: "An Information Flow Model for Conflict and Fission in Small Groups". Journal of Anthropological Research 33, no. 4 (1977): 452–73. https://www.jstor.org/stable/3629752
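
Note on the similarity measures: the diff above hides the body of `compute_pairwise_similarity`, so the following is only a rough, igraph-free sketch of what a pairwise comparison of partitionings involves. It is not the code from the example file. The names `rand_index`, `variation_of_information` and `pairwise_scores` are hypothetical, and the use of natural logarithms for VI is an assumption made for this illustration. Memberships are plain lists of community labels, one per vertex.

```python
import itertools
import math
from collections import Counter


def rand_index(a, b):
    """Fraction of vertex pairs on which two membership lists agree.

    A pair "agrees" when both partitionings put the two vertices together,
    or both keep them apart. Needs at least two vertices.
    """
    pairs = list(itertools.combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def variation_of_information(a, b):
    """VI(A, B) = H(A) + H(B) - 2 I(A; B), with natural logarithms (assumed)."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))

    def entropy(counts):
        return -sum(c / n * math.log(c / n) for c in counts.values())

    # Mutual information from the joint and marginal label counts
    mutual_info = sum(
        c / n * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )
    return entropy(count_a) + entropy(count_b) - 2 * mutual_info


def pairwise_scores(partitions, measure):
    """Apply `measure` to every unordered pair of membership lists."""
    return [measure(p, q) for p, q in itertools.combinations(partitions, 2)]


# Three toy partitionings of four vertices: the first two are identical up
# to relabeling, the third cuts across them.
p1, p2, p3 = [0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]
ri_scores = pairwise_scores([p1, p2, p3], rand_index)
vi_scores = pairwise_scores([p1, p2, p3], variation_of_information)
```

In this toy case the identical pair has a Rand index of 1 and a VI of 0, while the crossing partitionings score lower on RI and higher on VI, which is the pattern the histograms in the example contrast between the karate club and random networks.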
