Communication and compute on separate Streams do not overlap #64

@garrett361

Description

Cross-posting this issue from ipex, in case the torch-ccl team is not aware of it.

Key issues:

  • Compute and collective communications do not overlap on Intel GPU devices
  • Collectives block the host thread instead of launching a kernel and returning immediately, as they do on NVIDIA devices (a minimal repro sketch follows this list)
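For concreteness, here is a minimal repro sketch along these lines. The setup is my assumption, not code from the original report: a two-rank `torchrun` launch, the `nccl` backend on CUDA or the `ccl` backend registered by `oneccl_bindings_for_pytorch` on XPU, and a recent PyTorch where `torch.xpu` exposes `Stream`/`stream` analogues of the CUDA stream APIs (the exact XPU-side APIs may differ across PyTorch/IPEX versions).

```python
import torch
import torch.distributed as dist


def main() -> None:
    # Pick the device stack: XPU (Intel) if available, otherwise CUDA (NVIDIA).
    use_xpu = hasattr(torch, "xpu") and torch.xpu.is_available()
    if use_xpu:
        import oneccl_bindings_for_pytorch  # noqa: F401  (registers the "ccl" backend)
        backend, device_mod = "ccl", torch.xpu
    else:
        backend, device_mod = "nccl", torch.cuda

    dist.init_process_group(backend=backend)
    rank = dist.get_rank()
    device = torch.device("xpu" if use_xpu else "cuda", rank)
    device_mod.set_device(device)

    comm_tensor = torch.randn(2**24, device=device)  # ~64 MB all_reduce payload
    a = torch.randn(4096, 4096, device=device)
    side_stream = device_mod.Stream(device=device)

    with torch.profiler.profile() as prof:
        for _ in range(10):
            # Launch the collective on a side stream so it could, in principle,
            # overlap with compute enqueued on the default stream.
            with device_mod.stream(side_stream):
                work = dist.all_reduce(comm_tensor, async_op=True)
            # Keep the default stream busy while the collective is in flight.
            for _ in range(4):
                a = a @ a
            work.wait()
        device_mod.synchronize()

    if rank == 0:
        prof.export_chrome_trace("trace.json")
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Run with e.g. `torchrun --nproc-per-node=2 repro.py`. On an A100, the resulting trace shows the `all_reduce` launch returning immediately and the collective kernel running concurrently with the matmuls; on a Max 1550, the traces below show neither.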

The PyTorch profiler traces highlight the issues (copied from the other thread):

A100 Trace

[image: nvidia_a100_trace]

Non-blocking kernel launch and comms/compute overlap.

Intel Max 1550 Trace

[image: intel_1550_trace]

Blocking kernel launch and no comms/compute overlap.
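
The blocking-launch half of the issue can also be observed from the host side without a profiler. A small timing sketch under the same assumptions as above (`measure_launch_blocking` is a hypothetical helper; `device_mod` is `torch.cuda` or `torch.xpu`):

```python
import time

import torch
import torch.distributed as dist


def measure_launch_blocking(comm_tensor: torch.Tensor, device_mod) -> None:
    """Time the host-side cost of launching an async all_reduce."""
    device_mod.synchronize()
    t0 = time.perf_counter()
    work = dist.all_reduce(comm_tensor, async_op=True)
    t_launch = time.perf_counter() - t0  # host time spent in the launch itself
    work.wait()
    device_mod.synchronize()
    t_total = time.perf_counter() - t0  # launch plus the collective itself
    print(f"launch: {t_launch * 1e3:.3f} ms, total: {t_total * 1e3:.3f} ms")
```

On NVIDIA devices the launch time should be microseconds, far below the total; the traces above suggest that on the Max 1550 the two are roughly equal, because `all_reduce` only returns once the collective completes.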

See the other thread for more details.
