Construct trajectory from MLMD data when present #1252

mcgalcode · 2025-06-16T16:40:07Z

WIP

esoteric-ephemera · 2025-06-16T16:44:23Z

I'll move the OSZICAR patch to a separate PR so that it can go out sooner. Thanks for the catch!

codecov-commenter · 2025-06-16T16:45:52Z

Codecov Report

❌ Patch coverage is 9.09091% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.47%. Comparing base (4bb6f72) to head (101ba98).
⚠️ Report is 479 commits behind head on main.

Files with missing lines	Patch %	Lines
emmet-core/emmet/core/vasp/calculation.py	9.09%	10 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1252      +/-   ##
==========================================
- Coverage   89.52%   89.47%   -0.05%     
==========================================
  Files         150      150              
  Lines       15311    15320       +9     
==========================================
+ Hits        13707    13708       +1     
- Misses       1604     1612       +8

gpetretto · 2025-06-24T08:58:15Z

Hi @mcgalcode @esoteric-ephemera, it is nice to see that there is much going on to improve the support for trajectories in MD simulations. I think this is a broad topic, and it might be better to limit the discussion to a few places. However, I have at least one comment that is specific to this PR. As I partially mentioned in #872 and materialsproject/atomate2#515, my experience is that it is relatively easy to generate very large trajectories (on the order of 100,000 steps). In my case it was already a challenge to parse the outputs from the vasprun.xml with pymatgen, both in terms of time and memory. For this reason I would suggest not making a full copy of all the md_data and then discarding the structures, because I expect that for a long trajectory the deepcopy could add considerable overhead.

esoteric-ephemera · 2025-06-24T15:40:48Z

@gpetretto for these really long MD runs, have you tried parsing a trajectory from the vaspout.h5? The Vaspout parser in pymatgen should only load objects as needed, and you could in principle just load in the ionic step data

Asking primarily because it might be better to switch to vaspout.h5 parsing overall (more calculation detail, faster parsing)

gpetretto · 2025-06-25T20:44:34Z

Thanks for the comment @esoteric-ephemera. In some of the cases I was forced to work with the vasprun.xml, since the data was provided by someone else. In general, I totally agree that especially for long MD trajectories extracting the data from the h5 file is way more convenient. Are you considering introducing the option to parse from the hdf5 file in general in emmet? This would be interesting.

I should also add that for long MD runs, in my opinion, serializing hundreds of thousands of pymatgen Structures to JSON is probably not the best strategy. In such cases, I would consider storing the trajectory in some other format. I see that there is ongoing work to support pyarrow, but I don't know if it is planned to address MD trajectories as well. In any case, I suppose that it will still mean parsing the data and dumping it to a specific format. I keep thinking that it might be better to directly store the output files that contain the trajectory (i.e. XDATCAR or vaspout.h5), so that they can be parsed with other tools to do the MD analysis at a later stage.
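To illustrate the storage concern, here is a toy sketch (not emmet or pyarrow code) comparing a row-wise list of per-frame dicts with a column-wise layout of the same data, both serialized to JSON. The frame keys and the `to_columnar` helper are hypothetical and chosen only for illustration; real trajectories would also carry positions, forces, and so on.

```python
import json


def to_columnar(frames):
    """Pivot a list of per-frame dicts into one dict of per-key lists.

    Assumes every frame has the same keys as the first frame.
    """
    keys = frames[0].keys()
    return {key: [frame[key] for frame in frames] for key in keys}


# Hypothetical per-frame scalar properties for a 1000-step run.
frames = [
    {"energy": -10.0 - 0.1 * i, "temperature": 300.0 + i}
    for i in range(1000)
]

columns = to_columnar(frames)

# Row-wise JSON repeats every key once per frame; column-wise stores it once.
row_bytes = len(json.dumps(frames))
col_bytes = len(json.dumps(columns))
print(f"row-wise: {row_bytes} bytes, column-wise: {col_bytes} bytes")
```

The column-wise form is roughly the shape that columnar formats such as parquet work with, which is one reason they tend to suit long trajectories better than a JSON list of per-frame objects.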

esoteric-ephemera · 2025-06-26T00:06:10Z

Yeah, the long-term plan is to use vaspout.h5 parsing if possible, if only because this is where VASP appears to be focusing development (new features are only added to vaspout.h5, not vasprun.xml)

The pyarrow/parquet emmet.core.trajectory.Trajectory class should natively support MD runs (it's intended to replace pymatgen's Trajectory) but there isn't a constructor directly from XDATCAR or vaspout.h5 to it. I've started adding this to atomate2 forcefield MD runs as well.

Happy to add parsing directly from VASP output if it would be useful!

esoteric-ephemera · 2025-07-18T18:11:46Z

Hey @mcgalcode can I do anything to help move this forward? Looks like it just needs precommit / mypy cleanup?

esoteric-ephemera · 2025-09-15T19:49:27Z

emmet-core/emmet/core/vasp/calculation.py

+            if vasprun.incar.get("ML_LMLFF"):
+                # Note that md_data includes the structures, but 
+                # to avoid redundance, we'll copy then remove them
+                frame_properties = copy.deepcopy(vasprun.md_data)


Is there a way to avoid the deepcopy here? Could take up a lot of memory / CPU
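One possible way to avoid the deepcopy, sketched under the assumption that `vasprun.md_data` is a list of per-frame dicts with a `"structure"` key alongside the other properties: build new shallow dicts that simply skip that key. The `strip_structures` helper and the stand-in data below are hypothetical, not part of emmet or pymatgen.

```python
def strip_structures(md_data):
    """Return per-frame property dicts without the "structure" entry.

    Only new outer dicts are created; the remaining values are shared
    with the input rather than copied, and the input is not modified.
    """
    return [
        {key: value for key, value in frame.items() if key != "structure"}
        for frame in md_data
    ]


# Stand-in for vasprun.md_data; the keys are assumed for illustration.
md_data = [
    {"structure": object(), "energy": {"total": -10.0}, "forces": [[0.0, 0.0, 0.1]]},
    {"structure": object(), "energy": {"total": -10.2}, "forces": [[0.0, 0.0, -0.1]]},
]

frame_properties = strip_structures(md_data)
```

Because the values are shared with the original, this is only safe if nothing mutates `frame_properties` in place afterwards; if that is a concern, the non-structure values could be copied selectively instead.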

mcgalcode added 2 commits June 16, 2025 09:37

Construct trajectory from MLMD data when present

5ec392b

Add oszicar to discoverable files

101ba98

tsmathis mentioned this pull request Jun 16, 2025

Bug fixes, continuing PEP585 support, PropertyDoc schema #1253

Merged

esoteric-ephemera reviewed Sep 15, 2025

tsmathis marked this pull request as draft October 1, 2025 19:31
