
Conversation

@amas0 (Collaborator) commented Dec 2, 2025

Submission Checklist

  • Run unit tests
  • Declare copyright holder and open-source license: see below

Summary

This PR enables save_metric=1 for cmdstan sampling with adaptation, which writes a new metric JSON file containing the step size, inv_metric, and metric type for the sampling run. The corresponding properties on CmdStanMCMC objects now lazily source this information from these files when accessed. With the source of this information changed, this PR can also remove all of the code that parsed adaptation info out of the Stan CSV files.
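To make the change concrete, here is a rough sketch of the resulting workflow, assuming a typical cmdstanpy model (the file names are placeholders; `step_size`, `metric_type`, and `metric` are the existing CmdStanMCMC properties whose data source changes):

```python
from cmdstanpy import CmdStanModel

# Hypothetical model and data files, for illustration only.
model = CmdStanModel(stan_file="bernoulli.stan")
fit = model.sample(data="bernoulli.data.json")

# After this change, these properties are read lazily from the metric
# JSON file(s) instead of being parsed out of the Stan CSV headers:
print(fit.step_size)    # adapted step size, one entry per chain
print(fit.metric_type)  # e.g. 'diag_e'
print(fit.metric)       # inv_metric, stacked across chains

# The metric JSON written by CmdStan looks roughly like
# {"stepsize": 0.93, "metric_type": "diag_e", "inv_metric": [0.52]}
# (key names assumed from CmdStan's save_metric output).
```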

Importantly, this PR proposes adding pydantic as a dependency to cmdstanpy. As we source more information from JSON files (the metric files here, and soon the config files), we want to be more careful about I/O validation. Pydantic is now a fairly standard way to parse and validate data in the Python ecosystem, and I think it is a good fit for our needs.
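For concreteness, a validated parser for the metric file could look something like the sketch below (pydantic v2 style; the class name, field names, and exact JSON keys are illustrative assumptions, not necessarily what this PR implements):

```python
from typing import List, Literal, Union

from pydantic import BaseModel


class SamplerMetric(BaseModel):
    """Illustrative schema for the save_metric JSON output."""

    stepsize: float
    metric_type: Literal["unit_e", "diag_e", "dense_e"]
    # A diagonal metric is stored as a vector, a dense metric as a matrix.
    inv_metric: Union[List[float], List[List[float]]]


raw = '{"stepsize": 0.93, "metric_type": "diag_e", "inv_metric": [0.52, 1.1]}'
metric = SamplerMetric.model_validate_json(raw)  # raises ValidationError on bad input
print(metric.metric_type)  # diag_e
```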

Closes #713.

@WardBrian In the issue comments, we discussed updating the save_csvfiles methods to also include the other output files. This PR doesn't include that change; I could incorporate it here, but I think it makes sense to handle it as a standalone PR.

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company): myself

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

@WardBrian (Member) commented

I'll need some time to look this over, but at first glance it looks pretty good. I don't think pydantic would make a ton of sense for this alone, but as you note, we will probably want something more powerful when loading the config files.

I do worry about being too strict about I/O, since it could mean cmdstanpy becomes unusable with development versions, or right when a new version is released before cmdstanpy has itself updated. Right now this will most likely 'just work' as long as you don't request information related to the part of the config that is new or different in that cmdstan version, but if we were doing full validation, it would fail even if you only need some unrelated piece.

@amas0 (Collaborator, Author) commented Dec 3, 2025

Yeah, if it were just this, I wouldn't be interested in adding the dependency, but with potentially quite a few output files that need to be parsed, I think it will be useful. We could always implement the equivalent structures without pydantic; it would just be a bit of extra work.

Good point about strictness -- perhaps we could have validation issues raise warnings instead of errors, and only be strict where a failure would indicate that something has actually gone wrong?
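As a sketch of that idea (not what the PR implements), validation errors could be downgraded to warnings at parse time, so only code paths that genuinely depend on the affected fields would fail:

```python
import warnings
from typing import Optional, Type

from pydantic import BaseModel, ValidationError


def parse_leniently(model_cls: Type[BaseModel], raw: str) -> Optional[BaseModel]:
    """Validate CmdStan output, downgrading schema mismatches to warnings so
    cmdstanpy keeps working against newer or older CmdStan output formats."""
    try:
        return model_cls.model_validate_json(raw)
    except ValidationError as err:
        warnings.warn(f"Could not validate CmdStan output file: {err}")
        return None
```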

@WardBrian (Member) left a comment
First set of questions!

This necessitates using `from __future__ import annotations`, but will eventually become the default behavior.
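For context, the future import defers annotation evaluation (PEP 563), which is what allows the `X | Y` union syntax in annotations on Python versions before 3.10. A minimal illustration:

```python
from __future__ import annotations  # defer annotation evaluation (PEP 563)


# Without the future import, `float | None` raises a TypeError at definition
# time on Python < 3.10; with it, the annotation is stored as a string and
# never eagerly evaluated.
def adapted_step_size() -> float | None:
    return None
```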
@WardBrian (Member) left a comment

I think this is pretty close to good; just a couple of small questions and test comments.

@WardBrian (Member) left a comment

I think this is good to go! Sorry for the lag time between reviews.

@WardBrian (Member) commented

(assuming those test failures are just noise?)

@amas0 (Collaborator, Author) commented Dec 17, 2025

No worries!

There was one numerical test failure; that was just noise.

I'm scratching my head over these Windows failures, though. I tried to replicate them locally in a VM, and all the tests passed. Have you seen anything like this before?

@WardBrian (Member) commented

I wouldn't be shocked if there were some weird difference, but it's hard to tell from the test output as it is. Maybe print the metric files, and put `[]` inside the `all` call so it's a list comprehension rather than a generator expression? That will make it more obvious which index is the problem.
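To illustrate the suggestion (with hypothetical data standing in for the real test's metric values):

```python
import numpy as np


def test_metrics_match():
    metrics = [np.array([0.5, 1.2]), np.array([0.5, 1.3])]   # hypothetical values
    expected = [np.array([0.5, 1.2]), np.array([0.5, 1.2])]
    # With a generator expression, pytest can only report `assert False`:
    #     assert all(np.allclose(a, b) for a, b in zip(metrics, expected))
    # With a list comprehension, pytest's assertion rewriting prints the
    # evaluated list, e.g. `all([True, False])`, exposing the failing index:
    assert all([np.allclose(a, b) for a, b in zip(metrics, expected)])
```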

@amas0 (Collaborator, Author) commented Dec 17, 2025

🤷

I guess we're good.

@WardBrian (Member) commented

huh, alright!

amas0 merged commit a2da636 into stan-dev:develop on Dec 17, 2025. 16 checks passed.
