MPI crash with large activation meshes in get_microxs_and_flux #3570

@azylstra

Bug Description

When using openmc.deplete.get_microxs_and_flux, activation mesh sizes that are too large trigger MPI errors from an internal broadcast, even when the script is not run under MPI.

Steps to Reproduce

A reproduction can be obtained by modifying the neutronics workshop task 11.3, specifically the script linked here. I modified that file as follows:

  1. On line 93 change regular_mesh = openmc.RegularMesh().from_domain(my_geometry, dimension=1000) to regular_mesh = openmc.RegularMesh().from_domain(my_geometry, dimension=(x,x,x)) where x will change below.
  2. On line 111 add a few energy groups, e.g. change energies=[0,30e6] to energies=[0,10e6,20e6,30e6]
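
Putting those two edits together, the relevant portion of the modified script looks roughly like this (my_geometry, my_model, and chain_file stand in for whatever the workshop script defines earlier; the keyword names follow the openmc.deplete.get_microxs_and_flux signature as I understand it):

    import openmc
    import openmc.deplete

    # Step 1 (line 93): large activation mesh covering the whole geometry
    regular_mesh = openmc.RegularMesh().from_domain(my_geometry, dimension=(100, 100, 100))

    # Step 2 (line 111): flux and micro-XS tally over the mesh with a few energy groups
    flux_in_each_mesh_voxel, all_micro_xs = openmc.deplete.get_microxs_and_flux(
        model=my_model,
        domains=regular_mesh,
        energies=[0, 10e6, 20e6, 30e6],
        chain_file=chain_file,
    )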

This runs fine with the activation mesh dimension=(10,10,10). However, changing to dimension=(100,100,100) crashes as follows:

running neutron transport to activate materials
Authorization required, but no authorization protocol specified
Authorization required, but no authorization protocol specified
                                %%%%%%%%%%%%%%%
                           %%%%%%%%%%%%%%%%%%%%%%%%
                        %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                      %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                   %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                                    %%%%%%%%%%%%%%%%%%%%%%%%
                                     %%%%%%%%%%%%%%%%%%%%%%%%
                 ###############      %%%%%%%%%%%%%%%%%%%%%%%%
                ##################     %%%%%%%%%%%%%%%%%%%%%%%
                ###################     %%%%%%%%%%%%%%%%%%%%%%%
                ####################     %%%%%%%%%%%%%%%%%%%%%%
                #####################     %%%%%%%%%%%%%%%%%%%%%
                ######################     %%%%%%%%%%%%%%%%%%%%
                #######################     %%%%%%%%%%%%%%%%%%
                 #######################     %%%%%%%%%%%%%%%%%
                 ######################     %%%%%%%%%%%%%%%%%
                  ####################     %%%%%%%%%%%%%%%%%
                    #################     %%%%%%%%%%%%%%%%%
                     ###############     %%%%%%%%%%%%%%%%
                       ############     %%%%%%%%%%%%%%%
                          ########     %%%%%%%%%%%%%%
                                      %%%%%%%%%%%

                 | The OpenMC Monte Carlo Code
       Copyright | 2011-2025 MIT, UChicago Argonne LLC, and contributors
         License | https://docs.openmc.org/en/latest/license.html
         Version | 0.15.3-dev63
     Commit Hash | dc123439c450fab2607399179ff172ebea4884c8
       Date/Time | 2025-09-14 12:18:13
   MPI Processes | 1
  OpenMP Threads | 48

 Reading model XML file 'model.xml' ...
 Reading chain file: /fsx/Software/openmc/depletion/chain_endf_b8.0.xml...
 Reading cross sections XML file...
 Reading Fe56 from /fsx/Software/openmc/endfb-viii.0-hdf5/neutron/Fe56.h5
 Reading Fe57 from /fsx/Software/openmc/endfb-viii.0-hdf5/neutron/Fe57.h5
 Reading Al27 from /fsx/Software/openmc/endfb-viii.0-hdf5/neutron/Al27.h5
 Minimum neutron data temperature: 294 K
 Maximum neutron data temperature: 294 K
 Preparing distributed cell instances...
 Writing summary.h5 file...
 Maximum neutron transport energy: 150000000 eV for Fe56

 ===============>     FIXED SOURCE TRANSPORT SIMULATION     <===============

 Simulating batch 1
 Simulating batch 2
 Simulating batch 3
 Simulating batch 4
 Simulating batch 5
 Simulating batch 6
 Simulating batch 7
 Simulating batch 8
 Simulating batch 9
 Simulating batch 10
 Creating state point statepoint.10.h5...

 =======================>     TIMING STATISTICS     <=======================

 Total time for initialization     = 9.0662e-01 seconds
   Reading cross sections          = 5.0505e-01 seconds
 Total time in simulation          = 2.5169e+01 seconds
   Time in transport only          = 1.8216e+01 seconds
   Time in active batches          = 2.5169e+01 seconds
   Time accumulating tallies       = 2.2075e+00 seconds
   Time writing statepoints        = 4.7443e+00 seconds
 Total time for finalization       = 1.1492e+02 seconds
 Total time elapsed                = 1.4377e+02 seconds
 Calculation Rate (active)         = 39731.8 particles/second

 ============================>     RESULTS     <============================

 Leakage Fraction            = 1.08901 +/- 0.00058

Traceback (most recent call last):
  File "/fsx/users/alex/mcproblem/users/alex/openmc/R2S_bug/3_R2S_regularmesh_based_shutdown_dose_rate.py", line 108, in <module>
    flux_in_each_mesh_voxel, all_micro_xs = openmc.deplete.get_microxs_and_flux(
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/alex/.ignite/venvs/mcproblem/lib/python3.11/site-packages/openmc/deplete/microxs.py", line 143, in get_microxs_and_flux
    rr_tally = comm.bcast(rr_tally)
               ^^^^^^^^^^^^^^^^^^^^
  File "src/mpi4py/MPI.src/Comm.pyx", line 2113, in mpi4py.MPI.Comm.bcast
  File "src/mpi4py/MPI.src/msgpickle.pxi", line 767, in mpi4py.MPI.PyMPI_bcast
  File "src/mpi4py/MPI.src/msgpickle.pxi", line 773, in mpi4py.MPI.PyMPI_bcast
mpi4py.MPI.Exception: MPI_ERR_ARG: invalid argument of some other kind
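
The failing call is the unconditional comm.bcast(rr_tally) inside get_microxs_and_flux shown in the traceback above (comm being the communicator already used in that module). A minimal sketch of the kind of guard that would presumably sidestep the failure on single-rank runs (a suggestion for illustration, not what the current OpenMC code does):

    # Sketch of a possible guard in openmc/deplete/microxs.py: the pickle-based
    # broadcast presumably only exists to distribute the tally read on rank 0 to
    # the other ranks, so it could be skipped entirely when there is one rank.
    if comm.size > 1:
        rr_tally = comm.bcast(rr_tally)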

While running the transport, OpenMC's memory usage peaks at ~3.7%, so there should be plenty of memory available. But why is an MPI broadcast occurring at all with just one MPI process, and is it possible to catch the case of large tallies so as to avoid whatever underlying MPI limit is being hit here?
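
For what it's worth, a back-of-envelope estimate suggests why this mesh crosses some limit while (10,10,10) does not. The reaction count and bytes-per-bin figure below are assumptions for illustration, and the ~2 GiB threshold is my guess at the underlying limit (a 32-bit count somewhere in the broadcast of the pickled tally), not something I have confirmed:

    # Assumption-laden estimate of the reaction-rate tally size per tallied nuclide
    n_voxels = 100 * 100 * 100     # dimension=(100, 100, 100)
    n_groups = 3                   # energies=[0, 10e6, 20e6, 30e6]
    n_reactions = 7                # assumed number of transmutation reactions tallied
    bytes_per_bin = 2 * 8          # assumed: sum and sum-of-squares stored as doubles

    per_nuclide = n_voxels * n_groups * n_reactions * bytes_per_bin
    print(f"~{per_nuclide / 2**20:.0f} MiB per tallied nuclide")  # ~320 MiB with these numbers

    # With more than a handful of nuclides tallied from the depletion chain, the
    # full tally would exceed 2 GiB, which may be the limit being hit here.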

Environment

Ubuntu, openmc 0.15.3, mpi4py 4.0.3
