MeshGraphNet Performance: Automatically Use Transformer Engine for LayerNorm. #1036
Conversation
… training speed up on GPU.
FYI: this PR looks bigger than it is. Many tests are updated, but most are not substantively changed; I use environment variables to force the torch version of LayerNorm in tests that explicitly run on CPU, which prevents Transformer Engine from being used for CPU models. There are, however, new tests for LayerNorm that should be reviewed before merge.
Update docstring to use torch layernorm (for CPU tests).
Disable TE for docstring tests.
LGTM with minor comments!
- Remove warnings about deprecation.
- Add a warning if the environment variable PHYSICSNEMO_FORCE_TE is set, but to an unexpected value (a sketch of this follows below).
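A minimal sketch of the suggested validation, assuming PHYSICSNEMO_FORCE_TE is expected to hold a boolean-like string; the accepted spellings and the helper name `resolve_force_te` are illustrative, not PhysicsNeMo's actual implementation:

```python
import os
import warnings


def resolve_force_te() -> bool | None:
    """Parse PHYSICSNEMO_FORCE_TE; warn (rather than error) on unexpected values."""
    raw = os.environ.get("PHYSICSNEMO_FORCE_TE")
    if raw is None:
        return None  # unset: let the backend be chosen automatically
    value = raw.strip().lower()
    if value in ("1", "true", "yes", "on"):
        return True
    if value in ("0", "false", "no", "off"):
        return False
    warnings.warn(
        f"PHYSICSNEMO_FORCE_TE is set to unexpected value {raw!r}; "
        "expected a boolean-like string such as 'true' or 'false'. "
        "Falling back to automatic backend selection."
    )
    return None
```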
PhysicsNeMo Pull Request
Description
This PR provides performance enhancements for MeshGraphNet on GPUs by automatically using Transformer Engine (TE) for LayerNorm: when a model runs on GPU, the TE LayerNorm implementation is selected; on CPU, it falls back to the torch implementation.
A note about test changes: Transformer Engine is not supported on CPU, so the tests are modified to let PhysicsNeMo select the LayerNorm backend for best performance, except in CPU tests, where the backend is forced to torch instead of TE. The MGN tests therefore exercise both the torch and TE backends, and both show agreement after restoring from file.
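For illustration, here is a minimal sketch of what that backend selection could look like. The helper name `get_layernorm_class` and the exact semantics of PHYSICSNEMO_FORCE_TE are assumptions for this sketch, not the PR's actual code:

```python
import os

import torch


def get_layernorm_class(device: torch.device):
    """Pick the LayerNorm backend: TE on CUDA, torch on CPU or when forced."""
    forced = os.environ.get("PHYSICSNEMO_FORCE_TE", "").strip().lower()
    if device.type == "cuda" and forced not in ("0", "false"):
        try:
            import transformer_engine.pytorch as te

            return te.LayerNorm  # fused CUDA implementation
        except ImportError:
            pass  # TE not installed: fall back to torch
    return torch.nn.LayerNorm


# Example: a CPU test sets PHYSICSNEMO_FORCE_TE=false, so this returns
# torch.nn.LayerNorm even on machines where CUDA is available.
norm_cls = get_layernorm_class(torch.device("cpu"))
layer = norm_cls(128)
```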
On synthetic data, the performance improvement is substantial on large graphs. I measured with both PyG and DGL, up to 200k nodes and over 1M edges (larger graphs run out of memory). On small graphs the backends perform similarly; on large graphs Transformer Engine is faster, especially during training.
Additionally, PyG is faster than DGL in all measurements.
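For context, here is a rough sketch of how such a per-layer comparison can be timed with CUDA events. This is not the harness that produced the numbers below; it assumes Transformer Engine is installed and a CUDA device is available, and the hidden size is an arbitrary choice:

```python
import torch
import transformer_engine.pytorch as te


def time_layernorm(ln, x, iters=100):
    """Average forward+backward time in ms, measured with CUDA events."""
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        ln(x).sum().backward()  # training path: include the backward pass
        x.grad = None
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters


hidden = 256  # illustrative hidden size
# Node-feature-like input at the scale mentioned above (~200k nodes).
x = torch.randn(200_000, hidden, device="cuda", requires_grad=True)

torch_ms = time_layernorm(torch.nn.LayerNorm(hidden).cuda(), x)
te_ms = time_layernorm(te.LayerNorm(hidden).cuda(), x)
print(f"torch LayerNorm: {torch_ms:.3f} ms  |  TE LayerNorm: {te_ms:.3f} ms")
```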
Including performance measurements here in the PR for posterity.
Float32
Training: [performance plot]
Inference: [performance plot]
Float16
Training: [performance plot]
Inference: [performance plot]
BFloat16
Training: [performance plot]
Inference: [performance plot]
Checklist
Dependencies