Description
Describe the bug
I am currently using arviz for an undergrad internship project.
When using compare() to weight different models with Stacking or Pseudo-BMA, arviz seems to calculate different elpds and weights than I would expect. The compare() code in “stats.py” appears to carry out two steps differently from how I would expect: the step where the Importance Sampling / pointwise log-likelihood values are calculated, and the step where the Stacking weights are calculated.

I’ve written some of my own code following the process in this paper: https://arxiv.org/pdf/1704.02030 (Yao et al. 2017). The main difference is in the Importance Sampling step, where I use a more complex LOO process outlined in this paper: https://arxiv.org/abs/2410.03507 (Nguyen et al. 2024). The first paper states that the Importance Sampling method arviz uses and the LOO-based IS method Nguyen et al. use are equivalent, or at least proportional.

I’d like to find out whether the differences in the calculated weights between these two methods are due to this difference in the Importance Sampling step, whether that is expected, or whether there is a bug in my code.
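For reference, here is a minimal sketch of the two weighting schemes as I understand them from Yao et al., written with numpy only. It is not arviz's implementation: the function names `pseudo_bma_weights` and `stacking_weights` are my own, the input is assumed to be an `(n_obs, n_models)` array of pointwise LOO log predictive densities, and the stacking objective is maximised with a simple EM-style fixed-point iteration rather than a constrained optimiser.

```python
import numpy as np


def pseudo_bma_weights(elpd_pointwise):
    """Plain Pseudo-BMA weights (no Bayesian bootstrap).

    elpd_pointwise: array of shape (n_obs, n_models), assumed to hold the
    pointwise LOO log predictive densities of each candidate model.
    """
    elpd = np.asarray(elpd_pointwise, dtype=float).sum(axis=0)
    # Subtract the maximum before exponentiating to avoid underflow.
    unnormalised = np.exp(elpd - elpd.max())
    return unnormalised / unnormalised.sum()


def stacking_weights(elpd_pointwise, n_iter=5000, tol=1e-12):
    """Stacking weights: maximise sum_i log(sum_k w_k * exp(lpd_ik)) on the simplex.

    Uses an EM-style multiplicative update (an assumption of this sketch,
    chosen because it needs nothing beyond numpy).
    """
    lpd = np.asarray(elpd_pointwise, dtype=float)
    # Rescaling each row by a constant does not change the maximiser,
    # so shift by the row maximum for numerical stability.
    dens = np.exp(lpd - lpd.max(axis=1, keepdims=True))
    n_models = dens.shape[1]
    w = np.full(n_models, 1.0 / n_models)
    for _ in range(n_iter):
        mixture = dens @ w  # mixture density at each observation
        w_new = w * (dens / mixture[:, None]).mean(axis=0)
        converged = np.abs(w_new - w).max() < tol
        w = w_new
        if converged:
            break
    return w / w.sum()
```

Feeding the same pointwise matrix into both functions shows how far apart the two schemes can be on their own: Pseudo-BMA depends only on the summed elpd per model, while stacking depends on the full pointwise pattern, so stacking can select a different model even when the elpd totals are close.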
To Reproduce
I have linked a notebook and a .py file.
https://github.com/antdifo66/antoniobma/tree/main/arvizforum
The code fits three slightly different candidate polynomial models to noisy data generated from a more complicated polynomial model. It then uses arviz’s compare() function to model average between the candidate models. I also do the same analysis using code that I’ve written, in which I have tried to replicate the equations given in the papers above. The resulting elpds (and more importantly, the weights) differ between the two implementations.
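To narrow down where the two pipelines diverge, it may help to compare the pointwise elpd values before any weights are computed. Below is a minimal sketch of the raw (unsmoothed) importance-sampling LOO estimate, assuming a `(n_draws, n_obs)` array of log-likelihood values from the posterior samples. The function name `is_loo_pointwise` is hypothetical, and this is neither arviz's Pareto-smoothed estimator nor the method of Nguyen et al.; it is only a baseline to check both against.

```python
import numpy as np


def is_loo_pointwise(log_lik):
    """Raw importance-sampling LOO estimate of the pointwise elpd.

    log_lik: array of shape (n_draws, n_obs) with log p(y_i | theta_s).
    Uses importance ratios r_s proportional to 1 / p(y_i | theta_s), which gives
        p(y_i | y_-i) ~= 1 / mean_s(1 / p(y_i | theta_s)).
    No Pareto smoothing is applied (an intentional simplification).
    """
    log_lik = np.asarray(log_lik, dtype=float)
    n_draws = log_lik.shape[0]
    neg = -log_lik
    # Numerically stable log-sum-exp over the draws axis.
    shift = neg.max(axis=0)
    log_sum = shift + np.log(np.exp(neg - shift).sum(axis=0))
    return np.log(n_draws) - log_sum
```

If these values already differ noticeably from the pointwise values arviz reports (I believe `az.loo(idata, pointwise=True)` exposes them as `loo_i`), the discrepancy is in the Importance Sampling step; if they agree, the weight calculation is the more likely source.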
Expected behavior
The elpds and weights should be the same, or at least quite similar. Sometimes the weights are off by 10%, and sometimes different models are picked out entirely (usually in the case of stacking).
Additional context
Other libraries: emcee, pandas, getdist, numpy, matplotlib, tqdm. Windows 11, arviz version 0.20.0.