Skip to content

Conversation

@mcgalcode
Copy link
Contributor

WIP

@esoteric-ephemera
Copy link
Collaborator

I'll move the OSZICAR patch to a separate PR so that can go out sooner thanks for the catch!

@codecov-commenter
Copy link

codecov-commenter commented Jun 16, 2025

Codecov Report

❌ Patch coverage is 9.09091% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.47%. Comparing base (4bb6f72) to head (101ba98).
⚠️ Report is 479 commits behind head on main.

Files with missing lines Patch % Lines
emmet-core/emmet/core/vasp/calculation.py 9.09% 10 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1252      +/-   ##
==========================================
- Coverage   89.52%   89.47%   -0.05%     
==========================================
  Files         150      150              
  Lines       15311    15320       +9     
==========================================
+ Hits        13707    13708       +1     
- Misses       1604     1612       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@gpetretto
Copy link
Contributor

Hi @mcgalcode @esoteric-ephemera, it is nice to see that there is much going on to improve the support for trajectories in MD simulations. I think this is a broad topic and maybe it would be better to limit the discussions in a few places. However I have at least one comment that is specific to this PR. As I have partially mentioned in #872 and materialsproject/atomate2#515, my experience is that it is relatively easy to generate very large trajectories (in the order of the 100000 steps). In my case it was already a challange to parse the outputs from the vasprun.xml with pymatgen, both in terms of time and memory. In this respect I would suggest to avoid making a full copy of all the md_data and then discard the structures, because I expect that for a long trajectory the deepcopy may result in a considerable overhead.

@esoteric-ephemera
Copy link
Collaborator

@gpetretto for these really long MD runs, have you tried parsing a trajectory from the vaspout.h5? The Vaspout parser in pymatgen should only load objects as needed, and you could in principle just load in the ionic step data

Asking primarily because it might be better to switch to vaspout.h5 parsing overall (more calculation detail, faster parsing)

@gpetretto
Copy link
Contributor

Thanks for the comment @esoteric-ephemera. In some of the cases I was forced to work with the vasprun.xml, since the data was provided by someone else. In general, I totally agree that especially for long MD trajectories extracting the data from the h5 file is way more convenient. Are you considering introducing the option to parse from the hdf5 file in general in emmet? This would be interesting.

I should also add that for long MD, in my opinion, serializing hundreds of thousands of pymatgen Structures in JSON is probably not the best strategy. In such cases, I would consider storing the trajectory in some other format. I see that there is work ongoing to support pyarrow, but I don't know if it is planned to address MD trajectories as well. In any case I suppose that it will still imply parsing the data and dumping to a specific format. I keep thinking that maybe it would be better storing directly the output files that contain the trajectory (i.e. XDATCAR or vaspout.h5), so that they can be parsed with other tools to make the MD analysis at a later stage.

@esoteric-ephemera
Copy link
Collaborator

esoteric-ephemera commented Jun 26, 2025

Yeah the long term plan is to use vaspout.h5 parsing if possible, if only because this is what VASP appears to be developing (new features are only added to vaspout.h5, not vasprun.xml)

The pyarrow/parquet emmet.core.trajectory.Trajectory class should natively support MD runs (it's intended to replace pymatgen's Trajectory) but there isn't a constructor directly from XDATCAR or vaspout.h5 to it. I've started adding this to atomate2 forcefield MD runs as well.

Happy to add parsing directly from VASP output if it would be useful!

@esoteric-ephemera
Copy link
Collaborator

Hey @mcgalcode can I do anything to help move this forward? Looks like it just needs precommit / mypy cleanup?

if vasprun.incar.get("ML_LMLFF"):
# Note that md_data includes the structures, but
# to avoid redundance, we'll copy then remove them
frame_properties = copy.deepcopy(vasprun.md_data)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to avoid the deepcopy here? Could take up a lot of memory / CPU

@tsmathis tsmathis marked this pull request as draft October 1, 2025 19:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants