foraxe commented Dec 12, 2025

Problem

Running perftest with -d fp16 or -d bf16 fails with:

AttributeError: module 'nvshmem.bindings' has no attribute 'fp16_sum_reduce_on_stream'
AttributeError: module 'nvshmem.bindings' has no attribute 'bf16_sum_reduce_on_stream'

Root Cause

collective_on_buffer constructs binding function names using the user-provided dtype directly (e.g., fp16_sum_reduce_on_stream), but the actual bindings use:

  • half for fp16

  • bfloat16 for bf16

Solution

Added a dtype alias mapping in nvshmem4py/nvshmem/core/collective.py to normalize user-friendly shorthand names to their binding-compatible equivalents:

  • fp16 → half

  • bf16 → bfloat16
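
The normalization described above could look roughly like the sketch below. The identifiers `DTYPE_ALIASES`, `normalize_dtype`, and `binding_name` are hypothetical and chosen for illustration; the actual code in collective.py may be structured differently. Only the alias pairs and the `<dtype>_<op>_reduce_on_stream` naming pattern are taken from this PR.

```python
# Illustrative sketch only: these names are hypothetical, not the
# identifiers used in nvshmem4py/nvshmem/core/collective.py.
DTYPE_ALIASES = {
    "fp16": "half",      # user shorthand -> name used by the bindings
    "bf16": "bfloat16",
}


def normalize_dtype(dtype: str) -> str:
    """Map a shorthand dtype to its binding name; pass other names through."""
    return DTYPE_ALIASES.get(dtype, dtype)


def binding_name(dtype: str, op: str) -> str:
    """Build the binding function name after normalizing the dtype."""
    return f"{normalize_dtype(dtype)}_{op}_reduce_on_stream"


print(binding_name("fp16", "sum"))   # half_sum_reduce_on_stream
print(binding_name("bf16", "sum"))   # bfloat16_sum_reduce_on_stream
print(binding_name("float", "sum"))  # float_sum_reduce_on_stream
```

Using `dict.get` with the input as the default keeps every dtype that already matches a binding name unchanged, so only the two shorthands are affected.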

Changes

  • nvshmem4py/nvshmem/core/collective.py: Added a dtype_aliases dict and applied normalization before constructing binding function names.

Testing

fp16

Command:

OMPI_ALLOW_RUN_AS_ROOT=1 \
 OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1 \
mpirun -np 4 -N 4 --bind-to none \
  python nvshmem4py/perftest/reduction_on_stream.py \
    -b 2M -e 2M -f 2 \
    -d fp16 -o sum \
    -w 5 -n 20

Former error:

AttributeError: module 'nvshmem.bindings' has no attribute 'fp16_sum_reduce_on_stream'. Did you mean: 'int16_sum_reduce_on_stream'?

Now:

size(B)     count       type      latency(us)       min_lat(us)       max_lat(us)       algbw(GB/s)    busbw(GB/s)
2097152     1048576     half-sum  27.4416002        25.088            33.824            76.422         114.634

bf16

Command:

OMPI_ALLOW_RUN_AS_ROOT=1 \
 OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1 \
mpirun -np 4 -N 4 --bind-to none \
  python nvshmem4py/perftest/reduction_on_stream.py \
    -b 2M -e 2M -f 2 \
    -d bf16 -o sum \
    -w 5 -n 20

Former error:

AttributeError: module 'nvshmem.bindings' has no attribute 'bf16_sum_reduce_on_stream'. Did you mean: 'int16_sum_reduce_on_stream'?

Now:

size(B)     count       type           latency(us)       min_lat(us)       max_lat(us)       algbw(GB/s)    busbw(GB/s)
2097152     1048576     bfloat16-sum   34.4000001        25.280            147.360           60.964         91.446
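
As a sanity check on the two result rows, the bandwidth columns can be reproduced from the size and latency columns. This is a sketch under two assumptions not stated in the PR: that GB means 10^9 bytes, and that busbw uses the common allreduce factor 2(n-1)/n with n = 4 PEs (from `-np 4`). The helper names are hypothetical.

```python
def algbw_gbps(size_bytes: int, latency_us: float) -> float:
    """Algorithm bandwidth in GB/s (assuming 1 GB = 1e9 bytes)."""
    # bytes per microsecond equals 1e6 bytes/s, i.e. 1e-3 GB/s
    return size_bytes / latency_us / 1e3


def allreduce_busbw_gbps(algbw: float, n_pes: int) -> float:
    """Bus bandwidth, assuming the 2(n-1)/n allreduce correction factor."""
    return algbw * 2 * (n_pes - 1) / n_pes


# Values copied from the fp16 and bf16 result rows above.
for label, latency_us in (("half-sum", 27.4416002), ("bfloat16-sum", 34.4000001)):
    alg = algbw_gbps(2097152, latency_us)
    bus = allreduce_busbw_gbps(alg, n_pes=4)
    print(f"{label}: algbw={alg:.3f} GB/s, busbw={bus:.3f} GB/s")
```

Under these assumptions the computed values agree with the reported algbw and busbw columns to three decimal places.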

Known Issue (Out of Scope)

The following dtypes are listed in the perftest argument parser choices but have no corresponding bindings:

  • ulonglong — missing ulonglong_*_reduce_on_stream bindings

  • ptrdiff — missing ptrdiff_*_reduce_on_stream bindings

These will fail at runtime if used. Consider adding bindings or removing these from the supported choices in a future PR.
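
One way to catch such gaps before runtime is to check each parser choice against the bindings with `hasattr`. The sketch below is illustrative: `missing_bindings` is a hypothetical helper, and `fake_bindings` is a stand-in object so the example runs without NVSHMEM installed. With the real package, one would presumably pass the `nvshmem.bindings` module instead.

```python
from types import SimpleNamespace


def missing_bindings(bindings, dtypes, op="sum"):
    """Return the dtypes that have no `<dtype>_<op>_reduce_on_stream` attribute."""
    return [
        d for d in dtypes
        if not hasattr(bindings, f"{d}_{op}_reduce_on_stream")
    ]


# Stand-in for the real bindings module (assumption for this example only).
fake_bindings = SimpleNamespace(
    int_sum_reduce_on_stream=lambda *args: None,
    half_sum_reduce_on_stream=lambda *args: None,
)

print(missing_bindings(fake_bindings, ["int", "half", "ulonglong", "ptrdiff"]))
# ['ulonglong', 'ptrdiff']
```

A check like this could run when the argument parser is built, so unsupported choices are rejected with a clear message instead of an AttributeError.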

Dtype Compatibility Analysis

User Input   Binding Function                 Exists?   Notes
int          int_sum_reduce_on_stream         Yes
int32        int32_sum_reduce_on_stream       Yes
uint32       uint32_sum_reduce_on_stream      Yes
int64        int64_sum_reduce_on_stream       Yes
uint64       uint64_sum_reduce_on_stream      Yes
long         long_sum_reduce_on_stream        Yes
longlong     longlong_sum_reduce_on_stream    Yes
ulonglong    (none)                           No        Missing: no ulonglong_sum_reduce_on_stream
size         size_sum_reduce_on_stream        Yes
ptrdiff      (none)                           No        Missing: no ptrdiff_sum_reduce_on_stream
float        float_sum_reduce_on_stream       Yes
double       double_sum_reduce_on_stream      Yes
fp16         half_sum_reduce_on_stream        Yes       Maps to half (with this fix)
bf16         bfloat16_sum_reduce_on_stream    Yes       Maps to bfloat16 (with this fix)
