-
Notifications
You must be signed in to change notification settings - Fork 78
Construct trajectory from MLMD data when present #1252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
I'll move the OSZICAR patch to a separate PR so that can go out sooner thanks for the catch! |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1252 +/- ##
==========================================
- Coverage 89.52% 89.47% -0.05%
==========================================
Files 150 150
Lines 15311 15320 +9
==========================================
+ Hits 13707 13708 +1
- Misses 1604 1612 +8 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Hi @mcgalcode @esoteric-ephemera, it is nice to see that there is much going on to improve the support for trajectories in MD simulations. I think this is a broad topic and maybe it would be better to limit the discussions in a few places. However I have at least one comment that is specific to this PR. As I have partially mentioned in #872 and materialsproject/atomate2#515, my experience is that it is relatively easy to generate very large trajectories (in the order of the 100000 steps). In my case it was already a challange to parse the outputs from the vasprun.xml with pymatgen, both in terms of time and memory. In this respect I would suggest to avoid making a full copy of all the |
|
@gpetretto for these really long MD runs, have you tried parsing a trajectory from the vaspout.h5? The Vaspout parser in pymatgen should only load objects as needed, and you could in principle just load in the ionic step data Asking primarily because it might be better to switch to vaspout.h5 parsing overall (more calculation detail, faster parsing) |
|
Thanks for the comment @esoteric-ephemera. In some of the cases I was forced to work with the vasprun.xml, since the data was provided by someone else. In general, I totally agree that especially for long MD trajectories extracting the data from the h5 file is way more convenient. Are you considering introducing the option to parse from the hdf5 file in general in emmet? This would be interesting. I should also add that for long MD, in my opinion, serializing hundreds of thousands of pymatgen Structures in JSON is probably not the best strategy. In such cases, I would consider storing the trajectory in some other format. I see that there is work ongoing to support pyarrow, but I don't know if it is planned to address MD trajectories as well. In any case I suppose that it will still imply parsing the data and dumping to a specific format. I keep thinking that maybe it would be better storing directly the output files that contain the trajectory (i.e. XDATCAR or vaspout.h5), so that they can be parsed with other tools to make the MD analysis at a later stage. |
|
Yeah the long term plan is to use vaspout.h5 parsing if possible, if only because this is what VASP appears to be developing (new features are only added to vaspout.h5, not vasprun.xml) The pyarrow/parquet Happy to add parsing directly from VASP output if it would be useful! |
|
Hey @mcgalcode can I do anything to help move this forward? Looks like it just needs precommit / mypy cleanup? |
| if vasprun.incar.get("ML_LMLFF"): | ||
| # Note that md_data includes the structures, but | ||
| # to avoid redundance, we'll copy then remove them | ||
| frame_properties = copy.deepcopy(vasprun.md_data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to avoid the deepcopy here? Could take up a lot of memory / CPU
WIP