coreyjadams
Collaborator

PhysicsNeMo Pull Request

This PR updates the transolver model and introduces an external aero example that uses it. Summary of changes to the model code:

  • Transolver is extended to irregular meshes and 3D data; previously these cases were not actually usable.
  • Transolver's interface is improved for readability.
  • The PhysicsAttention layer has been consolidated to improve readability and reduce code duplication between 2D, 3D, and irregular inputs.
  • The PhysicsAttention layer has been modified in several ways:
    • Using einops rearrange to manipulate data shapes
    • Using matmuls directly instead of einsum
    • Reordering the weight normalization and projection; this improves numerical stability for lower precisions.
  • The Transolver_block implementation can now use transformer_engine for most layers.
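The matmul and reordering points above can be illustrated with a small NumPy sketch. This is not the code from the PR: the function names, tensor shapes, and `eps` value are assumptions chosen for illustration, `np.swapaxes` stands in for an einops `rearrange`, and "normalize the slice weights first, then project" is only one plausible reading of the reordering described. The sketch shows that the two orderings agree numerically in fp32, while the normalize-first form only ever produces weighted averages of the inputs, so its intermediates do not grow with the number of mesh points.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along one axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def project_then_normalize(x, w, eps=1e-5):
    """einsum form: sum features into slices, then divide by the slice mass."""
    tokens = np.einsum("bnc,bng->bgc", x, w)          # (B, G, C), grows with N
    return tokens / (w.sum(axis=1)[..., None] + eps)  # divide by (B, G, 1)


def normalize_then_project(x, w, eps=1e-5):
    """matmul form: normalize the weights over points first, then project."""
    w_norm = w / (w.sum(axis=1, keepdims=True) + eps)  # (B, N, G)
    # (B, G, N) @ (B, N, C) -> (B, G, C); each token is a weighted average of x
    return np.swapaxes(w_norm, 1, 2) @ x


# Illustrative sizes only: batch, mesh points, channels, slices.
B, N, C, G = 2, 1000, 8, 4
rng = np.random.default_rng(0)
x = rng.standard_normal((B, N, C)).astype(np.float32)
w = softmax(rng.standard_normal((B, N, G)).astype(np.float32), axis=-1)

tokens_a = project_then_normalize(x, w)
tokens_b = normalize_then_project(x, w)
print(tokens_a.shape, np.abs(tokens_a - tokens_b).max())
```

Because the point dimension `N` is just the contracted axis of a batched matmul, the same code path serves flattened 2D grids, 3D grids, and irregular meshes.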

Additionally, the darcy_transolver example was updated to accommodate these changes, and still converges like the paper result.

A summary of the external aero example:

  • The model can be trained on the PhysicsNeMo Curator outputs from DoMINO; if you have a dataset in Zarr format that works for DoMINO, it can work for Transolver.
  • The model uses an irregular mesh, and only surface OR volume data can be used at a given time.
  • The surface data uses the mesh centers + normals, the volume data uses just mesh centers. SDF could be added but hasn't been.
  • The training script supports fp32, bf16, fp16, and fp8 "in principle". In reality, only fp32 and bf16 are known to be stable for the surface data. Volumetric experiments are ongoing.
  • The dataloading is ready to use domain parallelism, but the training script has not enabled it yet. The model should be uniquely scalable due to the physics-attention state projections.
  • The training normalizes the targets to mean 0, std 1, but un-normalizes the targets and predictions for relative L2 calculations.
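The last point, normalizing targets for training but un-normalizing before computing relative L2, can be sketched in plain Python. The data values, helper names, and the constant prediction error below are hypothetical and are not taken from the training script; the sketch only shows why the metric is evaluated in physical units, since the same prediction scores very differently in the two spaces.

```python
import math
from statistics import fmean, pstdev


def relative_l2(pred, target):
    """||pred - target||_2 / ||target||_2 over a flat list of values."""
    num = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)))
    den = math.sqrt(sum(t ** 2 for t in target))
    return num / den


# Hypothetical target field in physical units (e.g. a surface quantity).
targets = [101.0, 99.5, 102.3, 98.7, 100.5]
mean, std = fmean(targets), pstdev(targets)


def normalize(values):
    """Map physical values to mean 0, std 1 (what the loss would see)."""
    return [(v - mean) / std for v in values]


def denormalize(values):
    """Map normalized values back to physical units (what the metric sees)."""
    return [v * std + mean for v in values]


norm_targets = normalize(targets)
# Stand-in for a model output: normalized targets plus a constant error.
norm_preds = [t + 0.1 for t in norm_targets]

rel_norm = relative_l2(norm_preds, norm_targets)
rel_phys = relative_l2(denormalize(norm_preds), denormalize(norm_targets))
print(f"normalized space: {rel_norm:.4f}, physical units: {rel_phys:.4f}")
```

In normalized space the target norm no longer reflects the mean of the field, so the relative error there is not comparable with values reported in physical units.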

There is not yet a good interface for inference with a trained model. For the surface data, considering the simplicity of the preprocessing and reuse of domino inputs, it should be straightforward.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.

Dependencies

coreyjadams and others added 20 commits June 27, 2025 07:34
…he 'fix' version of the example easier to run, scalable, usable at full resolution, and faster
…lelism, this is a

disk to gpu pipeline that natively optimizes bandwidth when domain parallel.
@coreyjadams
Collaborator Author

/blossom-ci

@coreyjadams coreyjadams added ! - Release PRs or Issues releating to a release 3 - Ready for Review Ready for review by team labels Aug 1, 2025
@coreyjadams
Collaborator Author

Some updates on this PR:

  • fp8 training appears to be functional and stable. It does not, however, produce a significant speedup yet.
  • No training curves or results are included in the example README, since they are still being validated.
  • Some volumetric functions are present, but in general volumetric training is WIP. It's not supported yet, and the README makes that clear.
  • The same can be said for domain parallelism: WIP, not supported yet.

The goal here was to have a model that leverages transformer-engine, so the focus was on training stability, model compatibility, and inference.

@coreyjadams coreyjadams added the 5 - DO NOT MERGE Hold off on merging; see PR for details label Aug 1, 2025
@@ -0,0 +1,210 @@
# Transolver for External Aerodynamics on Irregular Meshes
Collaborator

Looks good and comprehensive overall. I am not calling out minor nitpicks. We should have a section on customization: if a user needs to adapt this to a new problem, say CFD data from an internal flow problem, what is the guidance on where to look to customize? Assume it's VTK data, so the datapipe can be reused, but which other elements would need modification?

introduces modifications for improved numerical stability and compatibility with NVIDIA
TransformerEngine.

The training workflow for Transolver leverages the same input datasets as DoMINO. For
Collaborator

Perhaps "the same input datasets as other models". We are specifically framing things in the context of DoMINO, which can be avoided.

@coreyjadams coreyjadams mentioned this pull request Aug 6, 2025
5 tasks
@coreyjadams
Collaborator Author

This has been divided into two PRs and merged separately. Closing.

@coreyjadams coreyjadams closed this Aug 7, 2025