[BUG] Rendering restarts until timeout for huge graphs #729

@ltcompounder

Description

Describe the bug
When I plot a graph with 95,157 nodes and 12,000,000 edges, Graphistry attempts to render it, but near the end of the render it "refreshes" the process and starts again. This cycle repeats until it reaches "herding stray GPUs" and then times out.

I also tried this with 1,000 nodes and 2,000 edges, and it behaves the same way. The behavior is intermittent: if I try again at a different time, it might work.

To Reproduce
Code, including data, that can be run without editing:

import matplotlib.cm as cm
import matplotlib.colors as mcolors
import numpy as np
import pandas as pd
import scipy.sparse as sp
import graphistry
#graphistry.register(api=3, username='...', password='...')

num_nodes = 95157
num_edges = 12000000
num_clusters = 744

print(f"Generating a sparse graph with {num_nodes} nodes and {num_edges} edges...")

rows = np.random.randint(0, num_nodes, num_edges)
cols = np.random.randint(0, num_nodes, num_edges)
data = np.ones(num_edges, dtype=int)

adj_matrix_csr = sp.csr_matrix((data, (rows, cols)), shape=(num_nodes, num_nodes))
adj_matrix_csr.setdiag(0)
adj_matrix_csr.eliminate_zeros()
source_nodes, target_nodes = adj_matrix_csr.nonzero()

edges_df = pd.DataFrame({
    'source': source_nodes,
    'destination': target_nodes,
    'edge_weight': 1
})

edges_df.drop_duplicates(subset=['source', 'destination'], inplace=True)

print(f"Generated {len(edges_df)} unique edges.")

cluster_labels = np.random.randint(0, num_clusters, num_nodes)

nodes_df = pd.DataFrame({
    'node': np.arange(num_nodes),
    'type': cluster_labels
})

print(f"Generated {len(nodes_df)} nodes with {num_clusters} cluster labels.")

print("Binding data to PyGraphistry and plotting...")

g = graphistry.edges(edges_df, 'source', 'destination').nodes(nodes_df, 'node')

# cm.get_cmap was removed in Matplotlib 3.9; resample the registered colormap instead.
# 'hsv' or 'rainbow' are good for max distinctness, but not perceptually uniform.
cmap = cm.viridis.resampled(num_clusters)
colors_list = [mcolors.rgb2hex(cmap(i)) for i in range(num_clusters)]

custom_cluster_colors_all = {i: colors_list[i] for i in range(num_clusters)}

g = g.encode_point_color(
    'type',
    categorical_mapping=custom_cluster_colors_all
).plot()

print("Plotting command issued. Check your browser or Jupyter output for the visualization.")
g
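For triaging at smaller sizes (such as the 1,000-node, 2,000-edge case mentioned above), the generation step could be wrapped in a parameterized helper. This is only a sketch: `make_random_graph`, its `seed` parameter, and the choice of 10 clusters are hypothetical and not part of the original script, and it uses plain NumPy and pandas filtering rather than the SciPy sparse matrix used above.

```python
import numpy as np
import pandas as pd


def make_random_graph(num_nodes, num_edges, num_clusters, seed=0):
    """Build node and edge DataFrames for a random directed graph.

    Hypothetical helper: samples edge endpoints uniformly, then drops
    self-loops and duplicate (source, destination) pairs, so the result
    may contain fewer than num_edges edges.
    """
    rng = np.random.default_rng(seed)
    source = rng.integers(0, num_nodes, num_edges)
    destination = rng.integers(0, num_nodes, num_edges)

    edges_df = pd.DataFrame({'source': source, 'destination': destination})
    # Remove self-loops, then duplicate edges.
    edges_df = edges_df[edges_df['source'] != edges_df['destination']]
    edges_df = edges_df.drop_duplicates(subset=['source', 'destination'])
    edges_df = edges_df.reset_index(drop=True)
    edges_df['edge_weight'] = 1

    nodes_df = pd.DataFrame({
        'node': np.arange(num_nodes),
        'type': rng.integers(0, num_clusters, num_nodes),
    })
    return nodes_df, edges_df


# Small case; the cluster count of 10 is an arbitrary assumption.
nodes_df, edges_df = make_random_graph(1_000, 2_000, num_clusters=10)
print(f"{len(nodes_df)} nodes, {len(edges_df)} unique edges")
```

The resulting frames can be bound the same way as in the script above, with `graphistry.edges(edges_df, 'source', 'destination').nodes(nodes_df, 'node')`. Fixing the seed keeps the generated graph identical across runs, which may help separate data-dependent failures from time-dependent ones.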

Expected behavior
The graph should render with the colored nodes.

Actual behavior
The rendering process keeps restarting when it's almost completed.

Screenshots

graphistry-timeout.mov

Browser environment (please complete the following information):

  • OS: macOS
  • Browser: Chrome, Firefox
  • Version: Chrome 138

Graphistry GPU server environment

  • Where run: Hub

PyGraphistry API client environment

  • Where run: Graphistry 2.43.4, Jupyter Notebook 4.3.6
  • Version: 0.41.0
  • Python version: 3.13.2
