
MeshGraphNet Performance: Automatically use Transformer Engine for LayerNorm #1036


Merged

coreyjadams merged 24 commits into NVIDIA:main from te-ln on Aug 4, 2025

Conversation

coreyjadams
Collaborator

PhysicsNeMo Pull Request

Description

This PR provides performance enhancements for MeshGraphNet on GPUs by automatically using Transformer Engine for layer norm. Here's what happens:

  • The MeshGraphMLP modules previously had a parameter to select the norm backend; the default was PyTorch.
  • Switching between the two backends (after a save and restore, for example) previously caused issues, because Transformer Engine adds an extra component to the state dict for fp8 normalizations.
  • Now physicsnemo has a dynamic layer norm layer: use TE if available, and fall back to torch otherwise (see the sketch after this list). It also handles the state dict differences to ensure the results are consistent after reload. Since MGN does not use fp8, dropping the extra fp8 state affects nothing.
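
A minimal sketch of the selection logic described above, not the actual physicsnemo implementation (which also reconciles TE's extra fp8 state-dict entries); the function name and the accepted values of the env variable are illustrative assumptions:

```python
# Hedged sketch: prefer Transformer Engine's LayerNorm when usable,
# fall back to torch.nn.LayerNorm otherwise.
import os

import torch
import torch.nn as nn

try:
    import transformer_engine.pytorch as te  # GPU-only dependency

    TE_AVAILABLE = True
except ImportError:
    TE_AVAILABLE = False


def make_layer_norm(hidden_size: int, eps: float = 1e-5) -> nn.Module:
    """Return a TE LayerNorm on supported GPUs, else a torch LayerNorm."""
    # PHYSICSNEMO_FORCE_TE is the env variable referenced in this PR;
    # treating "false" as "force the torch backend" is an assumption.
    forced = os.environ.get("PHYSICSNEMO_FORCE_TE", "").lower()
    use_te = TE_AVAILABLE and torch.cuda.is_available() and forced != "false"
    if use_te:
        return te.LayerNorm(hidden_size, eps=eps)
    return nn.LayerNorm(hidden_size, eps=eps)
```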

A note about test changes: Transformer Engine is not supported on CPU, so the tests are modified to let physicsnemo select the layer norm backend for best performance, except when a test runs on CPU: there the backend is forced to torch instead of TE (see the fixture sketch below). The MGN tests therefore exercise both the torch and TE backends, and both show agreement after restoring from file.
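
A hypothetical pytest pattern for the CPU-only behavior; the PR's actual fixtures and the variable's accepted values may differ:

```python
# Hedged sketch: force the torch layer norm backend for a CPU-only test.
import pytest


@pytest.fixture
def cpu_layernorm(monkeypatch):
    # Assumed semantics: "false" forces the torch backend, so the test
    # never touches Transformer Engine on CPU.
    monkeypatch.setenv("PHYSICSNEMO_FORCE_TE", "false")


def test_mesh_graph_mlp_cpu(cpu_layernorm):
    # ...build the model on CPU and assert outputs as usual...
    pass
```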

On synthetic data, the performance improvement is substantial on large graphs. I measured with both PyG and DGL, up to 200k nodes and 1M+ edges (anything larger goes out of memory). On small graphs the performance is similar; on large graphs Transformer Engine is better, especially during training.

Additionally, PyG is faster than DGL in all measurements.

Including performance measurements here in the PR for posterity.

Float32

Training: [benchmark chart]
Inference: [benchmark chart]

Float16

Training: [benchmark chart]
Inference: [benchmark chart]

BFloat16

Training: [benchmark chart]
Inference: [benchmark chart]

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.

Dependencies

@coreyjadams coreyjadams added the labels "! - Release" (PRs or Issues relating to a release) and "3 - Ready for Review" (Ready for review by team) on Jul 29, 2025
@coreyjadams coreyjadams self-assigned this Jul 29, 2025
@coreyjadams
Collaborator Author

/blossom-ci

@coreyjadams
Collaborator Author

FYI: this PR looks bigger than it is. Many tests are updated, but most are not really changed; I use an env variable to force the torch version of layer norm in tests that explicitly use the CPU, which prevents Transformer Engine from being used on CPU models.

There are new tests for the layer norm itself, however, that should get reviewed before merge.

@coreyjadams
Collaborator Author

/blossom-ci

@coreyjadams
Collaborator Author

/blossom-ci

@coreyjadams
Collaborator Author

/blossom-ci

Update docstring to use torch layernorm (for CPU tests).
Disable TE for docstring tests.
@coreyjadams
Collaborator Author

/blossom-ci

@coreyjadams
Collaborator Author

/blossom-ci

@coreyjadams
Collaborator Author

/blossom-ci

Collaborator

@Alexey-Kamenev Alexey-Kamenev left a comment

LGTM with minor comments!

coreyjadams and others added 2 commits August 1, 2025 13:48
- remove warnings about deprecation
- add a warning if the env variable PHYSICSNEMO_FORCE_TE is set to an unexpected value (sketched below)
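
A hedged sketch of the env-variable validation described in the commit above; the accepted values are an assumption, not taken from the PR:

```python
# Warn (rather than silently ignore) when PHYSICSNEMO_FORCE_TE holds an
# unexpected value.
import os
import warnings

ACCEPTED = {"1", "true", "0", "false"}  # assumed accepted values

value = os.environ.get("PHYSICSNEMO_FORCE_TE")
if value is not None and value.lower() not in ACCEPTED:
    warnings.warn(
        f"PHYSICSNEMO_FORCE_TE is set to {value!r}, which is not an "
        f"expected value ({sorted(ACCEPTED)}); it will be ignored."
    )
```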
@NickGeneva
Collaborator

/blossom-ci

@coreyjadams
Collaborator Author

/blossom-ci

@coreyjadams
Collaborator Author

/blossom-ci

@coreyjadams coreyjadams merged commit 0dde722 into NVIDIA:main Aug 4, 2025
1 check passed
@coreyjadams coreyjadams deleted the te-ln branch August 4, 2025 11:57