Improve AD system, clear memory explosions, add dmnormAD and PDinverse_logdet #1574
Conversation
I would think that this doesn't change our previous thinking that we would generally just extract the known derivative info based on the MVN rather than using AD to get it.
It's worth comparing, because in this case (where the log prior Hessian is constant, or constant for purposes of inner optimization), CppAD with some of the changes is capable of more or less recording it as a constant operation and returning that known Hessian very fast.
Using dmnormAD breaks our conjugate updating in some cases, as we use the full precision. The most minimal fix is, I think, to modify the conjugacy processing to handle dmnormAD declarations.
I am working through lifting PDinverse_logdet. It's a bit of a hassle, as the dimension of the lifted node is 1-d of length n*(n+1)/2 + 1.
I am setting up a set of tests of this functionality. Also, there is a bug in transposing that I've run into.
Ok, on this branch I have a set of changes in place. There will be a separate PR on another branch. Also, for not-good reasons, I am fixing an additional issue.
Thanks @paciorek. Let me know when you want to go over this stuff.
@paciorek A question for you: how important is it to support the case where the user provides the precision rather than the covariance with dmnormAD? I presume we should support it for completeness, but I'm asking because, for example, users should not invert the covariance in the model just because they can, or it will result in less efficient AD. If we're going to support it, should I just try to jump in on the nimble and nimbleQuad branches you've also worked on at this point, or wait for a merge and then go from there?
@perrydv I think it matters for future sparsity computation we might include, as the precision matrix for spatial problems will often be sparse when the covariance is not. See "Bayesian Spatial Modelling with R-INLA". So I think it is important for nimbleQuad, and even important before we have sparsity if we want to support these spatial models.
Yeah, I agree with @paul-vdb's point.
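To make the concern concrete, here is an illustrative nimbleCode sketch (the model and the names `Q`, `mu`, `x`, and `n` are hypothetical, not from this PR). A GMRF-style spatial model typically has a sparse precision matrix whose inverse, the covariance, is dense:

```r
library(nimble)

## Hypothetical sketch: Q is a sparse spatial precision matrix (e.g. from a
## CAR/GMRF structure); its inverse, the covariance, is generally dense.
code <- nimbleCode({
  ## Natural parameterization if a precision form is supported:
  x[1:n] ~ dmnorm(mu[1:n], prec = Q[1:n, 1:n])
})

## The pattern to avoid if only a covariance form were supported: inverting
## inside the model puts a dense inverse() on the AD tape and loses sparsity.
codeAvoid <- nimbleCode({
  Sigma[1:n, 1:n] <- inverse(Q[1:n, 1:n])
  x[1:n] ~ dmnorm(mu[1:n], cov = Sigma[1:n, 1:n])
})
```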
Comment out seg-faulting PDinverse_logdet test.
@perrydv here is what I am seeing in terms of test failures.
* Automatically use dmnormAD if building model derivs. Handle conjugacy and PG with dmnormAD declarations.
* Fix dmnormAD conjugacy.
* Fix getting full prec for dmnormAD.
* Add testing for dmnormAD.
* Trap use of cholesky with dmnormAD and fix length of lifted node.
* Make minor edits to test-ADdmnorm.R.
* Add option to avoid using dmnormAD.
* Fix first PDinverse test.
I merged.
As far as the seg-faulting, it seems this may be related to our long-standing issue of having seg faults in testing when running tests one after another. If I re-run the first test after running it once, it seg faults. If I run the second test alone, it runs, and the only failures are because the Jacobian from the {0,1} order derivs is now equal but not identical to the Jacobian from the {0,1,2} order derivs, plus a minor out-of-tolerance issue. I presume this is due to Perry's changes to how forward and reverse mode are being used. At least for now, I am checking for equality rather than exact identity. Running the second test first and then the first test seems to succeed, so for now I am going to reverse the order.
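For concreteness, a sketch of the check being described (the object names `derivs_01` and `derivs_012` are hypothetical): the Jacobians from the two calls should agree numerically even though they are no longer bit-identical, so the comparison needs to be `expect_equal` rather than `identical`.

```r
library(testthat)

## Hypothetical objects: Jacobians from derivative calls requested with
## order = c(0, 1) and order = c(0, 1, 2), as described above.
jac_01  <- derivs_01$jacobian
jac_012 <- derivs_012$jacobian

## This can now fail: the values agree but are not bit-identical ...
## expect_identical(jac_01, jac_012)

## ... so compare numerically instead:
expect_equal(jac_01, jac_012)
```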
Ok, @perrydv, I think I've got this down to the only test failures coming from R and C deriv values not matching. Can you take a look and see what you think? As I mentioned above, the first test failure where you can dig in is the second test in the test file.
Thanks @paciorek for the work to isolate where I should look. It appears that part of the problem involves the new pathways we use in some cases for derivs together with our use of conditionals on the AD tape. Most of our uses of conditionals could be replaced: the number of operations involved is small and may not be worse in complexity than the conditional on the AD tape. The other CppAD conditional uses are for boundaries, such as in distributions with bounded support. That is my current thought. Open to other ideas.
There is a lot in the PR, and there are also some loose ends as I put this up.
Summary of changes
- A new function `PDinverse_logdet(mat, prec_param)` that returns a flattened vector with the upper-triangular elements of the inverse of `mat` (if `prec_param` is `FALSE`, so `mat` is a covariance), with `log det(mat)` appended as a final element. The result has length n*(n+1)/2 + 1, where `mat` is n x n. There is also an AD atomic for `PDinverse_logdet`. The reason to do this is that for both the inverse and the log det, working from the Cholesky is useful, but we don't want to put the Cholesky on an AD tape because it is inefficient. And the forward- and reverse-mode AD needs for the log det can use the inverse, so it helps to have both in the same operation. (A plain-R sketch of the returned structure follows after this list.)
- A new distribution `dmnormAD` that takes the result of `PDinverse_logdet` as an input. This allows a more efficient AD tape to be recorded, because CppAD is good at achieving efficiency for a quadratic form (i.e., not doing Cholesky solves as we would for only the value).
- A new argument `outInds`, analogous to `wrt` but for the output or "y" dimensions. If one has `m` output values and wants derivatives only for some of them, `outInds` can control that, potentially saving work.
- A new argument `inDir` allows a single input direction (weights for a linear combination of x directions) to be used in forward mode for order 1. (Forward mode is not used for order 2.) This reduces work in some cases; it is not currently used, but it was built alongside `outDir` because they go hand in hand.
- A new argument `outDir` allows a single output direction (derivatives will be taken of the inner product of `outDir` [i.e., weights] and y). For some needs in Laplace, this can save a large amount of computing. (A usage sketch for these three arguments follows after this list.)
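As referenced in the list above, here is a plain-R sketch of the structure described for `PDinverse_logdet` and of how `dmnormAD` can use it. This is a reference illustration, not the package implementation (in particular, the flattening order of the triangle is illustrative), covering the covariance case (`prec_param = FALSE`):

```r
## Plain-R reference sketch, covariance case (prec_param = FALSE).
## Returns the upper-triangular elements of mat^{-1} with log det(mat)
## appended, so length n*(n+1)/2 + 1 for an n x n mat.
PDinverse_logdet_ref <- function(mat) {
  U <- chol(mat)                     # one Cholesky, kept off any AD tape
  inv <- chol2inv(U)                 # inverse from the Cholesky factor
  logdet <- 2 * sum(log(diag(U)))    # log det from the Cholesky diagonal
  c(inv[upper.tri(inv, diag = TRUE)], logdet)
}

## Sketch of the dmnormAD log density given that result: a quadratic form
## plus the stored log determinant -- no Cholesky solve needed on the tape.
dmnormAD_ref <- function(x, mean, pd) {
  n <- length(x)
  prec <- matrix(0, n, n)
  prec[upper.tri(prec, diag = TRUE)] <- pd[seq_len(n * (n + 1) / 2)]
  prec <- prec + t(prec) - diag(diag(prec), n)  # rebuild symmetric inverse
  d <- x - mean
  -0.5 * (n * log(2 * pi) + pd[length(pd)] + sum(d * (prec %*% d)))
}
```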
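And a hypothetical usage sketch of the three new arguments. The names `outInds`, `inDir`, and `outDir` are from this PR, but attaching them to a `nimDerivs`-style call, and the objects `f`, `x`, `w`, and `u`, are my assumptions for illustration:

```r
## Hypothetical illustration only -- not a confirmed calling convention.
## Suppose f(x) maps R^p -> R^m inside a nimbleFunction and we request
## order-1 derivatives.

## Full m x p Jacobian:
d_all  <- nimDerivs(f(x), wrt = "x", order = 1)

## outInds: derivatives only for outputs 1 and 3 (rows of the Jacobian),
## analogous to wrt but on the "y" side.
d_some <- nimDerivs(f(x), wrt = "x", order = 1, outInds = c(1, 3))

## outDir: derivatives of the inner product of weights w with y, i.e.
## t(w) %*% J in a single reverse-mode pass -- the big saving for Laplace.
d_wdir <- nimDerivs(f(x), wrt = "x", order = 1, outDir = w)

## inDir: a single input direction u, i.e. J %*% u via one forward-mode
## sweep at order 1 (forward mode is not used for order 2).
d_udir <- nimDerivs(f(x), wrt = "x", order = 1, inDir = u)
```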
Performance:
I tested with two models using `nimbleQuad`: one is a spatial model and the other a crossed random effects GLMM. Along with changes I will PR on `nimbleQuad`, use of the changes here makes the memory explosions go away and makes the run times way faster.
Loose ends
- For `PDinverse_logdet`, the atomic for `prec_param=TRUE` IS NOT IMPLEMENTED. This is a real gap and needs attention.
- There is a new test file `test-ADPDinverse_logdet` with a test of the function on its own and in a model with `dmnormAD`. There is room for more thorough testing, e.g. of `dmnormAD` on its own.
- Leave `outDir`, `inDir`, and `outInds` undocumented for purposes of the next release, in order not to delay further.
- `dmnormAD` does not work. I expect any of us could get to the bottom of this. @paciorek and I started to discuss whether `dmnormAD` should really have a separate name from `dmnorm` or should instead become another alternative parameterization.

Naming
- `PDinverse_logdet`: "PD" is for "positive definite". It seems verbose but is accurate...?
- `dmnormAD`: This may or may not work well as a name. The "AD" indicates "recommended for AD"; however, it is not the case that it only works with AD (which is how it might sound). Also see the last "loose end".