minor cleanup / rewrite for conciseness

szhorvat · szhorvat · commit 132c0c79c0ad · 2025-07-20T13:34:41.000+08:00
diff --git a/doc/examples_sphinx-gallery/stochastic_variability.py b/doc/examples_sphinx-gallery/stochastic_variability.py
@@ -5,52 +5,61 @@
 Stochastic Variability in Community Detection Algorithms
 =========================================================
 
-This example demonstrates the variability of stochastic community detection methods by analyzing the consistency of multiple partitions using similarity measures normalized mutual information (NMI), variation of information (VI), rand index (RI) on both random and structured graphs.
+This example demonstrates how stochastic community detection methods can be used to check whether a network possesses a strong community structure, and whether the partitionings we obtain are meaningful. Many community detection algorithms are randomized and return somewhat different results after each run, depending on the random seed that was set. When there is a robust community structure, we expect these results to be similar to each other. When the community structure is weak or non-existent, the results may be noisy and highly variable. We will employ several partition similarity measures to analyse the consistency of the results: the normalized mutual information (NMI), the variation of information (VI), and the Rand index (RI).
 
 """
 # %%
-# Import libraries
 import igraph as ig
 import matplotlib.pyplot as plt
 import itertools
+import random
 
 # %%
-# First, we generate a graph.
-# Load the karate club network
+# .. note::
+#   We set a random seed to ensure that the results look exactly the same in
+#   the gallery. You don't need to do this when exploring randomness.
+random.seed(42)
+
+# %%
+# We will use Zachary's karate club dataset [1]_, a classic example of a network
+# with a strong community structure:
 karate = ig.Graph.Famous("Zachary")
   
 # %%
-#For the random graph, we use an Erdős-Rényi :math:`G(n, m)` model, where 'n' is the number of nodes 
-#and 'm' is the number of edges. We set 'm' to match the edge count of the empirical (Karate Club) 
-#network to ensure structural similarity in terms of connectivity, making comparisons meaningful.
-n_nodes = karate.vcount()
-n_edges = karate.ecount()
-#Generate an Erdős-Rényi graph with the same number of nodes and edges
-random_graph = ig.Graph.Erdos_Renyi(n=n_nodes, m=n_edges)
+# We will compare it to an Erdős-Rényi :math:`G(n, m)` random network having
+# the same number of vertices and edges. The parameters 'n' and 'm' refer to the
+# vertex and edge count, respectively. Since this is a random network, it should
+# have no community structure.
+random_graph = ig.Graph.Erdos_Renyi(n=karate.vcount(), m=karate.ecount())
 
 # %%
-# Now, lets plot the graph to visually understand them.
+# First, let us plot the two networks for a visual comparison:
 
 # Create subplots
-fig, axes = plt.subplots(1, 2, figsize=(12, 6))
+fig, axes = plt.subplots(1, 2, figsize=(12, 6), subplot_kw={'aspect': 'equal'})
 
-# Karate Club Graph
-layout_karate = karate.layout("fr")
+# Karate club network
 ig.plot(
-    karate, layout=layout_karate, target=axes[0], vertex_size=30, vertex_color="lightblue", edge_width=1,
-    vertex_label=[str(v.index) for v in karate.vs], vertex_label_size=10
+    karate, target=axes[0],
+    vertex_color="lightblue", vertex_size=30,
+    vertex_label=range(karate.vcount()), vertex_label_size=10,
+    edge_width=1
 )
-axes[0].set_title("Karate Club Network")
+axes[0].set_title("Karate club network")
 
-# Erdős-Rényi Graph
-layout_random = random_graph.layout("fr")
+# Random network
 ig.plot(
-    random_graph, layout=layout_random, target=axes[1], vertex_size=30, vertex_color="lightcoral", edge_width=1,
-    vertex_label=[str(v.index) for v in random_graph.vs], vertex_label_size=10
+    random_graph, target=axes[1],
+    vertex_color="lightcoral", vertex_size=30,
+    vertex_label=range(random_graph.vcount()), vertex_label_size=10,
+    edge_width=1
 )
-axes[1].set_title("Erdős-Rényi Random Graph")
+axes[1].set_title("Erdős-Rényi random network")
+
+plt.show()
+
 # %%
-# Function to compute similarity between partitions
+# Helper that computes the similarity of every pair of partitions with a given method:
 def compute_pairwise_similarity(partitions, method):
     similarities = []
     
@@ -61,74 +70,80 @@ def compute_pairwise_similarity(partitions, method):
     return similarities
 
 # %%
-# We have used, stochastic community detection using the Louvain method, iteratively generating partitions and computing similarity metrics to assess stability.
-# The Louvain method is a modularity maximization approach for community detection. 
-# Since exact modularity maximization is NP-hard, the algorithm employs a greedy heuristic that processes vertices in a random order. 
-# This randomness leads to variations in the detected communities across different runs, which is why results may differ each time the method is applied.
-def run_experiment(graph, iterations=50):
-    partitions = [graph.community_multilevel().membership for _ in range(iterations)]
+# The Leiden method, accessible through :meth:`igraph.Graph.community_leiden()`,
+# can be used for modularity maximization (``objective_function='modularity'``).
+# Since exact modularity maximization is NP-hard, the algorithm employs a greedy
+# heuristic that processes vertices in a random order. This randomness leads to
+# variation in the detected communities across runs, so the results may differ
+# each time the method is applied. The following function runs the Leiden
+# algorithm repeatedly and computes the pairwise similarities of the results:
+def run_experiment(graph, iterations=100):
+    partitions = [graph.community_leiden(objective_function='modularity').membership for _ in range(iterations)]
     nmi_scores = compute_pairwise_similarity(partitions, method="nmi")
     vi_scores = compute_pairwise_similarity(partitions, method="vi")
     ri_scores = compute_pairwise_similarity(partitions, method="rand")
     return nmi_scores, vi_scores, ri_scores
 
 # %%
-# Run experiments
+# Run the experiment on both networks:
 nmi_karate, vi_karate, ri_karate = run_experiment(karate)
 nmi_random, vi_random, ri_random = run_experiment(random_graph)
 
 # %%
-# Lastly, lets plot probability density histograms to understand the result.
-fig, axes = plt.subplots(3, 2, figsize=(12, 10))
+# Finally, let us plot histograms of the pairwise similarities of the obtained
+# partitionings to understand the result:
+fig, axes = plt.subplots(2, 3, figsize=(12, 6))
 measures = [
-    (nmi_karate, nmi_random, "NMI", 0, 1),  # Normalized Mutual Information (0-1, higher = more similar)
-    (vi_karate, vi_random, "VI", 0, None),  # Variation of Information (0+, lower = more similar)
-    (ri_karate, ri_random, "RI", 0, 1),  # Rand Index (0-1, higher = more similar)
+    # Normalized Mutual Information (0-1, higher = more similar)
+    (nmi_karate, nmi_random, "NMI", 0, 1),
+    # Variation of Information (0+, lower = more similar)
+    (vi_karate, vi_random, "VI", 0, max(vi_karate + vi_random)),
+    # Rand Index (0-1, higher = more similar)
+    (ri_karate, ri_random, "RI", 0, 1),
 ]
 colors = ["red", "blue", "green"]
 
 for i, (karate_scores, random_scores, measure, lower, upper) in enumerate(measures):
-    # Karate Club histogram
-    axes[i][0].hist(
-        karate_scores, bins=20, alpha=0.7, color=colors[i], edgecolor="black",
-        density=True  # Probability density
+    # Karate club histogram
+    axes[0][i].hist(
+        karate_scores, bins=20, range=(lower, upper),
+        density=True,  # Probability density
+        alpha=0.7, color=colors[i], edgecolor="black"
     )
-    axes[i][0].set_title(f"Probability Density of {measure} - Karate Club Network")
-    axes[i][0].set_xlabel(f"{measure} Score")
-    axes[i][0].set_ylabel("Density")
-    axes[i][0].set_xlim(lower, upper)  # Set axis limits explicitly
-
-    # Erdős-Rényi Graph histogram
-    axes[i][1].hist(
-        random_scores, bins=20, alpha=0.7, color=colors[i], edgecolor="black",
-        density=True
+    axes[0][i].set_title(f"{measure} - Karate club network")
+    axes[0][i].set_xlabel(f"{measure} score")
+    axes[0][i].set_ylabel("PDF")
+
+    # Random network histogram
+    axes[1][i].hist(
+        random_scores, bins=20, range=(lower, upper), density=True,
+        alpha=0.7, color=colors[i], edgecolor="black"
     )
-    axes[i][1].set_title(f"Probability Density of {measure} - Erdős-Rényi Graph")
-    axes[i][1].set_xlabel(f"{measure} Score")
-    axes[i][1].set_xlim(lower, upper)  # Set axis limits explicitly
+    axes[1][i].set_title(f"{measure} - Random network")
+    axes[1][i].set_xlabel(f"{measure} score")
+    axes[1][i].set_ylabel("PDF")
 
 plt.tight_layout()
 plt.show()
 
 # %%
-# We have compared the probability density of NMI, VI, and RI for the Karate Club network (structured) and an Erdős-Rényi random graph.
+# We have compared the pairwise similarities, measured with NMI, VI, and RI, of
+# the partitionings obtained for the karate club network (strong community
+# structure) and for a comparable random graph (which lacks communities).
 #
-# **NMI (Normalized Mutual Information):**
-# 
-# - Karate Club Network: The distribution is concentrated near 1, indicating high similarity across multiple runs, suggesting stable community detection.
-# - Erdős-Rényi Graph: The values are more spread out, with lower NMI scores, showing inconsistent partitions due to the lack of clear community structures.
+# The Normalized Mutual Information (NMI) and the Rand Index (RI) both quantify
+# similarity and take values in :math:`[0,1]`. Higher values indicate more
+# similar partitionings, with a value of 1 attained when the partitionings are
+# identical.
 #
-# **VI (Variation of Information):**
+# The Variation of Information (VI) is a distance measure. It takes values in
+# :math:`[0, \log n]`, where :math:`n` is the number of vertices, with lower
+# values indicating higher similarity. Identical partitionings have distance zero.
 #
-# - Karate Club Network: The values are low and clustered, indicating stable partitioning with minor variations across runs.
-# - Erdős-Rényi Graph: The distribution is broader and shifted toward higher VI values, meaning higher partition variability and less consistency.
-#
-# **RI (Rand Index):**
-#
-# - Karate Club Network: The RI values are high and concentrated near 1, suggesting consistent clustering results across multiple iterations.
-# - Erdős-Rényi Graph: The distribution is more spread out, but with lower RI values, confirming unstable community detection.
-#
-# **Conclusion**
-# 
-# The Karate Club Network exhibits strong, well-defined community structures, leading to consistent results across runs.  
-# The Erdős-Rényi Graph, being random, lacks clear communities, causing high variability in detected partitions.
+# For the karate club network, NMI and RI values are concentrated near 1, while
+# VI values are concentrated near 0, suggesting a robust community structure. In
+# contrast, the values obtained for the random network are much more spread out,
+# showing inconsistent partitionings due to the lack of a clear community structure.
+
+# %%
+# .. [1] W. Zachary: "An Information Flow Model for Conflict and Fission in Small Groups". Journal of Anthropological Research 33, no. 4 (1977): 452–73. https://www.jstor.org/stable/3629752
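The example obtains NMI, VI and the Rand index from igraph, which exposes them through `igraph.compare_communities()`. The standalone sketch below re-derives the textbook definitions using only the Python standard library, to make explicit what the histograms measure. It rests on a few assumptions. Logarithms are natural. NMI is normalized by the mean of the two partition entropies, as in the Danon et al. variant. These conventions may differ in detail from igraph's implementation. All function names are hypothetical helpers and are not part of igraph.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical helpers (not part of igraph) implementing the textbook
# definitions of NMI, VI and the Rand index for two membership lists.
# Assumptions: natural logarithms, NMI normalized by the mean entropy.


def _entropy(cluster_sizes, n):
    """Shannon entropy (natural log) of a partition given its cluster sizes."""
    return -sum((c / n) * math.log(c / n) for c in cluster_sizes)


def _entropies_and_mi(a, b):
    """Return H(A), H(B) and the mutual information I(A;B)."""
    if len(a) != len(b):
        raise ValueError("membership lists must have the same length")
    n = len(a)
    h_a = _entropy(Counter(a).values(), n)
    h_b = _entropy(Counter(b).values(), n)
    h_ab = _entropy(Counter(zip(a, b)).values(), n)  # joint entropy H(A,B)
    return h_a, h_b, h_a + h_b - h_ab


def variation_of_information(a, b):
    """VI(A,B) = H(A) + H(B) - 2 I(A;B); zero for identical partitions."""
    h_a, h_b, mi = _entropies_and_mi(a, b)
    return h_a + h_b - 2 * mi


def normalized_mutual_information(a, b):
    """NMI(A,B) = 2 I(A;B) / (H(A) + H(B)); one for identical partitions."""
    h_a, h_b, mi = _entropies_and_mi(a, b)
    if h_a + h_b == 0:
        return 1.0  # both partitions are a single cluster, hence identical
    return 2 * mi / (h_a + h_b)


def rand_index(a, b):
    """Fraction of vertex pairs on which the two partitions agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


p1 = [0, 0, 0, 1, 1, 1]
p2 = [1, 1, 1, 0, 0, 0]  # same grouping as p1, only the labels differ
p3 = [0, 0, 1, 1, 2, 2]  # a different, finer grouping

for name, other in [("p2", p2), ("p3", p3)]:
    print(
        f"p1 vs {name}: "
        f"NMI={normalized_mutual_information(p1, other):.3f}, "
        f"VI={variation_of_information(p1, other):.3f}, "
        f"RI={rand_index(p1, other):.3f}"
    )
```

Relabeling communities does not change any of the three measures, so `p1` and `p2` count as identical partitionings. The Rand index is computed here by brute force over all vertex pairs. That approach is adequate for a 34-vertex network, but its cost grows quadratically with the number of vertices.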