Improve AD system, clear memory explosions, add dmnormAD and PDinverse_logdet #1574

Open. Wants to merge 24 commits into devel.

Conversation

@perrydv (Contributor) commented Jul 13, 2025

There is a lot in the PR, and there are also some loose ends as I put this up.

Summary of changes

  • New function PDinverse_logdet(mat, prec_param) that returns a flattened vector with the upper triangular elements of the inverse of mat (if prec_param is FALSE, so mat is a covariance), with log det(mat) appended as a final element. The result has length n*(n+1)/2 + 1, where mat is n x n (see the layout sketch after this list). There is also an AD atomic for PDinverse_logdet. The reason to do this is that working from the Cholesky is useful for both the inverse and the log det, but we don't want to put the Cholesky on an AD tape because that is inefficient. Also, the forward- and reverse-mode AD needs for the log det can use the inverse, so it helps to have both in the same operation.
  • New distribution dmnormAD that takes the result of PDinverse_logdet as an input. This allows a more efficient AD tape to be recorded because CppAD is good at achieving efficiency for a quadratic form (i.e., not doing Cholesky solves as we would to get only the value).
  • Changes to CppAD to make its "dynamic" (what we call "update") tape more efficient for no-ops like +0 or *1. CppAD had this kind of efficiency for the primary "variable" tape but not for the "dynamic" tape, and we use that heavily, so we were getting burned there. I am also doing a PR to get these changes into CppAD, but that is delayed while I work on it for our package(s) here.
  • Changes to execution of forward and reverse mode steps in our C++ core "getDerivs" function, which in some cases were wasteful.
  • Extension of the AD system to allow additional arguments "outInds", "inDir" and "outDir" (illustrated after this list).
    • outInds is analogous to wrt but for the output or "y" dimensions. If one has m output values and wants derivatives for only some of them, outInds can control that, potentially saving work.
    • inDir allows a single input direction (weights for a linear combination of x directions) to be used in forward mode for order 1. (Forward mode is not used for order 2.) This reduces work in some cases; it is not currently used, but it was built alongside outDir because they go hand in hand.
    • outDir allows a single output direction (derivatives will be taken of the inner product of outDir [i.e. weights] and y). For some needs in Laplace, this can save a large amount of computing.
  • Probably some other things that I'm forgetting.
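
For concreteness, here is a small sketch of the return layout described above for PDinverse_logdet. Two caveats: the column-major ordering of the upper-triangular elements is an assumption not confirmed in this PR, as is whether the function is callable directly from R rather than only within a compiled nimbleFunction:

```r
## Hypothetical sketch of the PDinverse_logdet return layout.
## For an n x n covariance mat (prec_param = FALSE), the result is
## c(upper-triangular elements of solve(mat), log(det(mat))),
## a vector of length n*(n+1)/2 + 1.
n <- 3
Sigma <- diag(n) + 0.5                 # an example positive-definite covariance
res <- PDinverse_logdet(Sigma, prec_param = FALSE)
inv_upper <- res[1:(n*(n+1)/2)]        # upper triangle of solve(Sigma)
logdet    <- res[n*(n+1)/2 + 1]        # log(det(Sigma))
```

And here is a plain-R illustration (no taping involved) of what the new direction arguments compute, using a toy function. This is conceptual only, not the nimDerivs API:

```r
f <- function(x) c(sum(x^2), prod(x))   # toy f: R^3 -> R^2
J <- function(x) rbind(2 * x,           # row 1: d(sum(x^2))/dx
                       prod(x) / x)     # row 2: d(prod(x))/dx (valid when x has no zeros)
x <- c(1, 2, 3)
v <- c(1, 0, 0)                         # an input direction (inDir)
w <- c(0, 1)                            # an output direction (outDir)
J(x) %*% v                              # inDir: directional derivative J v (forward mode)
t(w) %*% J(x)                           # outDir: gradient of the scalar w . y (reverse mode)
J(x)[2, , drop = FALSE]                 # outInds = 2: derivatives of output 2 only
```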

Performance:

  • We have had two working cases of memory explosion and long run times with Laplace in the development package nimbleQuad. One is a spatial model and the other a crossed random effects GLMM. Together with changes I will PR on nimbleQuad, the changes here make the memory explosions go away and make the run times much faster.

Loose ends

  • For PDinverse_logdet, the atomic for prec_param=TRUE IS NOT IMPLEMENTED. This is a real gap and needs attention.
  • I added a single test file test-ADPDinverse_logdet with a test of the function on its own and in a model with dmnormAD. There is room for more thorough testing, e.g. of dmnormAD on its own.
  • Methods of the PDinverse_logdet atomic are still inlined in the .h file (easier for development) and could be moved over to the .cpp (with inline removed).
  • I have not modified documentation at all. I would be comfortable leaving outDir, inDir, and outInds undocumented for purposes of the next release in order not to delay further.
  • Lifting of alternative parameterizations of dmnormAD does not work. I expect any of us could get to the bottom of this. @paciorek and I started to discuss whether dmnormAD should really have a separate name from dmnorm or should instead become another alternative parameterization of dmnorm.

Naming

  • PDinverse_logdet: PD is for "positive definite". It seems verbose but is accurate...?
  • dmnormAD: This may or may not work well as a name. The "AD" indicates "recommended for AD", however it is not the case that it only works with AD (which is how it might sound). Also see last "loose end".

@paul-vdb (Contributor)

@perrydv @paciorek For a prior on the random effects that uses this new dmnormAD, will we want to use the manually known gradient and Hessian of the prior distribution or leave it to the tape?

@paciorek (Contributor)

I would think that this doesn't change our previous thinking that we would generally just extract the known derivative info based on the MVN rather than using AD to get it.

@perrydv (Contributor, Author) commented Jul 15, 2025

It's worth comparing, because in this case (where the log-prior Hessian is constant, or constant for purposes of the inner optimization), CppAD with some of these changes is capable of more or less recording it as a constant operation and returning that known Hessian very fast.
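
For reference, the known derivative info being discussed consists of the standard multivariate-normal identities (with mean $\mu$ and precision $P$):

```latex
\log p(x) = \tfrac{1}{2}\log\det P - \tfrac{n}{2}\log(2\pi) - \tfrac{1}{2}(x-\mu)^\top P\,(x-\mu),
\qquad
\nabla_x \log p(x) = -P\,(x-\mu),
\qquad
\nabla_x^2 \log p(x) = -P .
```

so the Hessian is indeed constant whenever $P$ is fixed during the inner optimization.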

@paciorek (Contributor) commented Jul 16, 2025

Using getParam with the precision for dmnormAD when dmnormAD is provided with the cov returns an upper-triangular matrix rather than the full precision matrix.

This breaks our conjugate updating in some cases as we use the full prec. And could break other things or user code.

The most minimal fix is, I think, to modify calc_dmnorm_prec_ldet_AltParams to ensure the full precision is returned (leaving PDinverse_logdet as an internal function that returns only the upper triangle by design). I will plan to do that, at least for now, to continue with testing, but will want @perrydv's input.
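
For illustration, reconstructing a full symmetric matrix from upper-triangular elements is straightforward. The helper below is a hypothetical sketch, not code from the PR, and it assumes column-major ordering of the upper triangle, which may differ from what PDinverse_logdet actually uses:

```r
## Hypothetical helper: rebuild a full symmetric n x n matrix from the
## n*(n+1)/2 upper-triangular elements (column-major order assumed).
fullFromUpperTri <- function(u, n) {
  M <- matrix(0, n, n)
  M[upper.tri(M, diag = TRUE)] <- u    # fill the upper triangle, incl. diagonal
  M + t(M) - diag(diag(M))             # mirror to the lower triangle
}
```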

@paciorek (Contributor)

I am working through lifting PDinverse_logdet. It's a bit of a hassle as the dimension of the lifted node is 1-d of length $n^2+1$, which doesn't fit the pattern of how we generally handle dimensionality of lifted nodes.

@paciorek (Contributor)

I am setting up a set of tests in test-ADPDinverse_logdet.R (renamed as test-ADdmnorm.R) on my own branch automate_dmnormAD, where I am also changing all model use of dmnorm to dmnormAD when buildDerivs=TRUE.

Also there is a bug in transposing in calc_dmnorm_prec_ldet_AltParams() that I will fix on fix-cppad-memory-explosion.

@paciorek (Contributor)

Ok, on this branch I have:

  • fixed a linear algebra bug in calc_dmnorm_prec_ldet_AltParams
  • set things up to lift PDinverse_logdet
  • fixed an argument error in calling the new rmnorm from R

In a separate PR with branch automate_dmnormAD, I will discuss changes I've made to use dmnormAD whenever model derivs are enabled, with a bunch of testing added that requires the automation change.

Also, for no particularly good reason, I am fixing the getParam issue of returning only the upper triangle of the precision Cholesky on the automate_dmnormAD branch.

@perrydv (Contributor, Author) commented Jul 23, 2025

Thanks @paciorek. Let me know when you want to go over this stuff.

@perrydv (Contributor, Author) commented Jul 23, 2025

@paciorek A question for you: How important is it to support the case where the user provides the precision rather than the covariance with dmnormAD? I presume we should support it for completeness, but I ask because, for example, users should not invert the covariance in the model just because they can, as it will result in less efficient AD. If we're going to support it, should I just jump in on the nimble and nimbleQuad branches you've also worked on at this point, or wait for a merge and then go from there?

@paul-vdb (Contributor)

@perrydv I think it matters for future sparsity computation we might include, as the precision matrix for spatial problems will often be sparse when the covariance is not (see "Bayesian Spatial Modelling with R-INLA"). So I think it is important for nimbleQuad, and even important before we have sparsity if we want to support these spatial models.
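
As a standard example of this point (not from the PR): a stationary AR(1) process has a dense covariance matrix but a tridiagonal (sparse) precision matrix:

```r
## Dense covariance, sparse (tridiagonal) precision for an AR(1) process.
n <- 5; rho <- 0.8
Sigma <- rho^abs(outer(1:n, 1:n, "-")) / (1 - rho^2)  # dense covariance
round(solve(Sigma), 10)                                # tridiagonal precision
```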


@paciorek (Contributor)

Yeah, I agree with @paul-vdb's point.

@paciorek (Contributor)

@perrydv here is what I am seeing in terms of test failures:

  1. test-ADPDinverse_logdet.R had its first test fail because cov was not defined. I fixed that.
  2. test-ADPDinverse_logdet.R had its second test fail because of a segfault that I am hoping you'll look at. It's in the complicated test_ADmodelCalculate stuff, but hopefully it won't be too hard to get into.
    I commented that test out to allow later tests to run but didn't actually re-run testing, so not sure what would happen. All later tests in the batch are not AD-related, so hopefully ok.
  3. test-ADdists.R is failing with numerical comparisons, but it seems to mainly (hopefully only) relate to 2nd derivs and to CderivsJacF1 having all zeros. The first case where you can dig into it is the second (Dirichlet) test.

Commits added to the branch:

  • Automatically use dmnormAD if building model derivs. Handle conjugacy and PG with dmnormAD declarations.
  • Fix dmnormAD conjugacy.
  • Fix getting full prec for dmnormAD.
  • Add testing for dmnormAD.
  • Trap use of cholesky with dmnormAD and fix length of lifted node.
  • Make minor edits to test-ADdmnorm.R.
  • Add option to avoid using dmnormAD.
  • Fix first PDinverse test.

@paciorek (Contributor)

I merged automate_dmnormAD into fix-cppad-memory-explosion, so the segfault is now in the commented-out second test in test-ADdmnorm.R, which replaces test-ADPDinverse_logdet.R. There are now also new failures in test-ADdmnorm.R. I will take a look at those.

@paciorek (Contributor)

test-ADdmnorm.R passes tests locally (apart from the commented-out second test). I'm going to re-push to trigger testing again, as I don't see what could cause the weird failures seen in the CI output.

As far as the seg-faulting goes, it seems this may be related to our long-standing issue of seg faults in testing when running tests one after another. If I re-run the first test after running it once, it seg faults. If I run the second test alone, it runs, and the only failures are because the Jacobian from the {0,1} order derivs is now equal but not identical to the Jacobian from the {0,1,2} order derivs, plus a minor out-of-tolerance issue. I presume this is due to Perry's changes to how forward and reverse mode are being used. At least for now I am changing AD_test_utils.R to check for equality rather than identity.
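
The "equal but not identical" distinction is the usual floating-point one: two mathematically equivalent AD pathways can differ in the last bits, so comparisons should use a tolerance:

```r
a <- 0.1 + 0.2
b <- 0.3
identical(a, b)          # FALSE: the bit patterns differ
isTRUE(all.equal(a, b))  # TRUE: equal within numerical tolerance
```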

Running the second test first and then the first test seems to succeed, so for now I am going to reverse the order.

@paciorek (Contributor)

Ok, @perrydv, I think I've got this down to the point where the only test failures come from R and C deriv values not matching in test-ADdists.R, mainly for order-2 derivs, and in cases where the compiled deriv is exactly equal to 0.

Can you take a look and see what you think? As I mentioned above, it looks like the first test failure where you can dig into it is the second test in test-ADdists.R, which is a test involving the Dirichlet distribution.

@perrydv (Contributor, Author) commented Aug 1, 2025

Thanks @paciorek for the work to isolate where I should look.

It appears that part of the new pathways we use in some cases for derivs (subgraph_jac_rev) is not compatible with the combination of double-taping and CppAD conditionals. It is not specific to ddirch; that is just the first test that uses a version with a non-fixed log argument. This is a glitch, but I think we can work around it.

Most of our uses of conditionals are for the log argument of distributions. We could replace

```cpp
dens = CppAD::CondExpEq(give_log, Type(1), dens, exp(dens));
```

with

```cpp
dens = give_log * dens + (1 - give_log) * exp(dens);
```

or with an atomic.

The number of operations is small and may not be worse in complexity than the conditional on the AD tape.
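
A quick sanity check of the proposed identity in plain R (outside of any taping): for give_log in {0, 1}, the blend reproduces the conditional exactly, at the cost of always evaluating both branches:

```r
dens_blend <- function(dens, give_log) give_log * dens + (1 - give_log) * exp(dens)
dens <- -1.2
dens_blend(dens, 1) == dens        # TRUE: log density returned unchanged
dens_blend(dens, 0) == exp(dens)   # TRUE: density on the natural scale
```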

The other CppAD conditional uses are for boundaries such as in dunif or dhalfflat. We could handle those with an atomic, which might be as good or better anyway.

That is my current thought. Open to other ideas.
