@JoseJVS (Collaborator) commented Oct 15, 2025

This is an updated version of the latest MPI comm branch from Bruno's fork.
I have cleaned up unrelated benchmarking scripts as well as old automake-related scripts and deprecated C++ examples.
I also bumped the version to 2.0.0.

golosio added 30 commits March 4, 2025 01:05
…rce nodes are actually used in the connections
… fast building of the maps when the source neurons of a remote connection command are in a sequence of contiguous integers
…rce neurons of a remote connection command are in a sequence of contiguous integers in target host (RemoteConnectSource)
…n the source neurons of a remote connection command are in a sequence of contiguous integers in source host (RemoteConnectTarget)
…en the source neurons of a remote connection command are in a sequence of contiguous integers in source host (RemoteConnectTarget)
…GPU memory allocation, with additional timers for remote connection creation
…ber of MPI processes with the command SetNHosts(n_hosts)
…rs needed only for specific choices of the algorithm
…allocating arrays useful for output spike buffer only using the number of local nodes rather than local+image nodes
@jhnnsnk (Collaborator) left a comment

Thanks for this tremendous work! Could you please also remove the raster plots and the data, unless they are needed for tests? The Python examples and tests should also be revised.

@jhnnsnk (Collaborator) left a comment

Thank you for this tremendous work! 👍

@JoseJVS (Collaborator, Author) commented Oct 31, 2025

Ping @lucapontisso @gmtiddia

@lucapontisso (Collaborator)

> Ping @lucapontisso @gmtiddia

Thanks, Jose, for the huge work!
Sorry for the late reply. I will conclude the review over the weekend.

@gmtiddia (Collaborator) commented Oct 31, 2025

I've just had a look at the code; it is OK for me. Thanks a lot for the huge work!

@golosio (Collaborator) commented Oct 31, 2025

Thank you for all this work, Jose! However, I have to ask you one more thing, for practical reasons. The Python tests, previously in the folder python/test and usually launched with the bash scripts test_all.sh and test_mpi.sh, no longer work, because the data files for the test folder have been moved. I know this is a temporary solution, because as soon as possible they should be handled in a similar way to the NEST (CPU) tests; however, until we have that solution, it would be better to keep them working in the old way, because they are used after every change to the code to check that everything is working properly. For the same reason, I ask you to put back all the files that were in the folder python/hpc_benchmark/test/, i.e. in the subfolders data_check, data_check_dfi, test_hpc_benchmark_hg, test_hpc_benchmark_p2p, test_hpc_benchmark_wg, and test_hpc_benchmark_wg_dfi, as well as the files in the Potjans_2014 folder.

@JoseJVS (Collaborator, Author) commented Nov 3, 2025

Dear @golosio, I have gone over all the tests and indeed noticed some broken ones; however, this is not due to the reorganization, but partly to incorrect test design and partly to outdated tests.

In particular, for the various hpc_benchmark tests, a binary comparison of the produced raster plots will show differences when different matplotlib versions are used.
I have confirmed this in both the mpi_comm_dev2 branch of your repo and this pull request's mpi_comm_clean branch; the firing rate comparison, however, remains correct.
To summarize: all check.sh scripts that compare against the raster plots stored in the data* directories fail due to differing matplotlib versions (at least on my setup), those that compare against raster plots generated by the p2p test succeed (i.e., all raster plots generated with the same setup), and all check2.sh scripts succeed (all firing rate comparisons).
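For illustration only (this is not the actual check2.sh script, whose details may differ), a firing-rate comparison of that kind could look roughly like the sketch below; the file names and tolerances are placeholders.

```python
# Minimal sketch of a rate-based check (placeholder file names and tolerances):
# compare firing rates from a reference run and the current run instead of
# binary-comparing raster-plot images, which depend on the matplotlib version.
import numpy as np

reference = np.loadtxt("data_check/firing_rates.dat")  # hypothetical reference rates
current = np.loadtxt("test_run/firing_rates.dat")      # hypothetical rates from this run

# A small tolerance absorbs harmless numerical differences between runs
# while still catching real changes in network activity.
if np.allclose(reference, current, rtol=1e-3, atol=1e-6):
    print("Firing rate check passed")
else:
    raise SystemExit("Firing rate check FAILED")
```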
There is also a check_kernel_params.sh bash script that attempts to run a nonexistent Python file, but this issue is already present in the mpi_comm_dev2 branch of your repo.

As for the other tests, most of those found in the new_features directory (previously test_new_features) are outdated and no longer execute properly (I believe this is due to updates to the command for setting neuron parameters).

As for the test_all.sh scripts you mention, previously found in the test directory and now in static_tests, I have successfully run all tests after reorganizing the mpi_comm_clean branch.
I did remove one file I should not have; I restored it in the commits I recently pushed and confirmed that the rest of the test_* bash scripts work correctly.

Finally, the files in the Potjans_2014 directory contain only performance information from different timers and, from what I found, are not used in any test.
In any case, performance data should not be used for verification tests: it is highly volatile, depends on the hardware used, and is, for all practical purposes, unique to the specific run.
For the purpose of storing performance information, we have published our benchmarking data on Zenodo.

If you have issues running any of the tests that I confirmed to work for me, I would be happy to discuss and possibly debug with you. For the moment, however, I did not find any issues other than those stated above, and I would in any case suggest fixing the outdated tests in another pull request.

@golosio (Collaborator) commented Nov 3, 2025

Thank you @JoseJVS, I confirm that all the tests are working for me as well. The only issue I found is that, after upgrading the CUDA toolkit to version 13, I receive the error:
... error: namespace "std" has no member "vector"
unless I include the <vector> header at the top of remote_spike.h:

#ifndef REMOTE_SPIKE_H
#define REMOTE_SPIKE_H
#include <vector>

I use the Potjans_2014 script as a quick check that code changes do not affect the simulation time on the same hardware. Concerning the hpc_benchmark tests, I agree, but until we prepare clean tests they are the fastest way to check that the different MPI communication schemes work consistently, and test_hpc_benchmark_wg_dfi was the only test we had for the distributed fixed_indegree connection rule. But clearly I can keep such a script outside the repo for this purpose, if you think that is better.

@JoseJVS (Collaborator, Author) commented Nov 3, 2025

Dear @golosio, I have added the <vector> include in remote_spike.h; I guess we will soon need to perform thorough tests of compatibility across CUDA versions...

As for the raster plot comparisons, I agree that these can be used as a temporary solution until we develop a proper verification pipeline.
In the case of test_hpc_benchmark_wg_dfi, I think it makes sense to use this model as our verification for the distributed fixed indegree rule. However, the issue I was alluding to in my previous comment is that precisely this test directory contains the check_kernel_params.sh script, which executes a hpc_benchmark_wg_check_kernel_params.py that does not exist.
Perhaps you could push this Python script to your repo so that I can then include it in this test directory?

As for the Potjans_2014, I agree that having a quick performance reference is handy when developing changes; however, that performance data is only applicable to your development workstation and cannot serve as a reference for anyone else.
Of course, documenting the performance behavior of our code across different hardware is also important (which is why we publish our data on Zenodo). As a compromise, I would suggest either adding your hardware specification and performance data directly to the README (which you would have to keep updated whenever we optimize something), or defining a naming standard for these performance files and adding that pattern to the .gitignore, so that any developer can produce their own performance references for their development machine without interfering with the commit history or git status.

@golosio (Collaborator) commented Nov 3, 2025

Thank you again @JoseJVS. The check_kernel_params.sh and hpc_benchmark_wg_check_kernel_params.py were temporary checks and are now obsolete; you did well to remove them. We have to develop some checks for the kernel parameter configuration, but I would do that in a separate pull request. About the Potjans_2014, I think I did not explain myself well: I do not need the data folder, with the raster plots or other data; I would only need the scripts to run the simulation. But if you think it would be better to keep the scripts in a separate repository, that is fine. By the way, the folders Potjans_2014_hc and Potjans_2014_s are obsolete and should be removed.

@JoseJVS (Collaborator, Author) commented Nov 3, 2025

Thanks for the quick back-and-forth, @golosio!
I have just pushed a commit where I removed check_kernel_params.sh.
I also see we have reached an agreement on the data files of the Potjans_2014 model.
For now I think it is fine to leave the scripts of this model in our repo, as they are still NEST GPU specific (I did remove the directories Potjans_2014_hc and Potjans_2014_s, though).
Once we have unified our interface with PyNEST we can begin the discussion on migrating the model scripts to the dedicated (new) microcircuit repository managed by the NEST Initiative.

@lucapontisso (Collaborator)

Dear all,
I tried the new version (JoseJVS:mpi_comm_clean, up to commit ec5f6a0)
on Leonardo, and I receive this error when trying to import nestgpu in Python:

GPUassert: invalid device symbol /leonardo/home/userexternal/nest-gpu/src/input_spike_buffer.h 857
terminate called after throwing an instance of 'ngpu_exception'
what(): CUDA error

Loaded modules: cuda/12.2, openmpi/4.1.6--gcc--12.2.0-cuda-12.2

With the mpi_comm branch of Bruno's fork, everything is fine in the same environment.

@JoseJVS (Collaborator, Author) commented Nov 6, 2025

Dear @lucapontisso,

When you mention Bruno's mpi_comm branch, do you mean this mpi_comm branch or this mpi_comm_dev2 branch?

The current clean-up branch in this pull request is branched off of mpi_comm_dev2.

@lucapontisso (Collaborator) commented Nov 7, 2025

> Dear @lucapontisso,
>
> When you mention Bruno's mpi_comm branch do you mean: this mpi_comm branch or this mpi_comm_dev2 branch?
>
> The current clean-up branch in this pull request is branched off of mpi_comm_dev2.

Dear @JoseJVS,

the mpi_comm branch (https://github.com/golosio/nest-gpu/tree/mpi_comm) works for me on Leonardo.
I just tried mpi_comm_dev2 (again on Leonardo) and I receive the error I mentioned before when trying to import nestgpu in Python:

GPUassert: invalid device symbol /leonardo/home/userexternal/lpontis1/NEST-GPU-TEST/nest-gpu-bruno-fork/nest-gpu/src/input_spike_buffer.h 857
terminate called after throwing an instance of 'ngpu_exception'
what(): CUDA error
Aborted

@lucapontisso (Collaborator) commented Nov 7, 2025

I understood the problem.
My build script was compiling for a slightly different NVIDIA architecture than the A100 (sm_89 instead of sm_80). It worked anyway up to the mpi_comm branch, but not with the newest version.
After changing the script and recompiling, everything works.
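(Added for reference, not part of the original report: device code built only for sm_89 cannot be loaded on an sm_80 GPU such as the A100, which matches the "invalid device symbol" error above. A quick way to see which architecture to target is to query the GPU's compute capability, e.g. with the hedged sketch below; it assumes a driver recent enough that nvidia-smi supports the compute_cap query field.)

```python
# Hedged sketch: print each visible GPU's compute capability so the build can
# target the matching sm_XX (e.g. 8.0 -> sm_80 for an A100).
# Assumes a recent NVIDIA driver whose nvidia-smi supports the compute_cap field.
import subprocess

csv = subprocess.run(
    ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
).stdout

for line in csv.strip().splitlines():
    name, cap = (field.strip() for field in line.split(","))
    print(f"{name}: compute capability {cap} -> compile for sm_{cap.replace('.', '')}")
```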

@JoseJVS (Collaborator, Author) commented Nov 7, 2025

Ah okay, good to know about the sm setting, @lucapontisso.
I was under the impression that the Ampere architecture could handle sm=89, but I guess we can investigate this further in a different PR.
Am I right to understand you also approve of this PR now?

@lucapontisso (Collaborator)

> Ah okay, interesting to note @lucapontisso about the sm information. I was under the impression that the Ampere architecture could handle sm=89, but I guess we can investigate this further on a different PR. Am I right to understand you also approve of this PR now?

Done!
Thank you again @JoseJVS for the tremendous work :)

@JoseJVS (Collaborator, Author) commented Nov 8, 2025

Thanks for all the reviews!

@JoseJVS merged commit 105e807 into nest:nest-gpu-2.0-mpi-comm on Nov 8, 2025.