added StatsAPI.mss(m::LinearMixedModel) #816

ajinkya-k · 2025-03-22T22:26:30Z

related to #514

Did behavior change? Did you add need features? If so, please update NEWS.md

add entry in NEWS.md
after opening this PR, add a reference and run docs/NEWS-update.jl to update the cross-references.

Should we release your changes right away? If so, bump the version:

I've bumped the version appropriately

codecov · 2025-03-22T22:32:38Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.34%. Comparing base (6bfbebe) to head (eb75ca9).
Report is 1 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #816   +/-   ##
=======================================
  Coverage   97.33%   97.34%           
=======================================
  Files          36       36           
  Lines        3495     3499    +4     
=======================================
+ Hits         3402     3406    +4     
  Misses         93       93

Flag	Coverage Δ
current	`96.99% <100.00%> (+<0.01%)`	⬆️
minimum	`97.34% <100.00%> (+<0.01%)`	⬆️
nightly	`96.92% <100.00%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

dmbates · 2025-03-30T17:58:14Z

Why is the test under @test_broken?

Is this formula correct if the fixed-effects doesn't include a constant? Usually when mean(m.y) occurs in an expression like this it represents the fitted values from a trivial model y ~ 1 but the comparison is only meaningful if that model is nested within the current model.

Would it be better to write the loop as sum(abs2, mean(m.y) .- fitted(m))?
(This is not a big deal and it may be the dot-broadcasting will avoid allocations entirely so the current form is better. I don't know if sum(dot_broadcast_expression) expands the dot-broadcast loop to include the sum operation.) Testing confirms that this is not a big deal

julia> @be m1 sum(abs2.(mean(_.y) .- fitted(_)))
Benchmark: 1804 samples with 4 evaluations
 min    6.531 μs (69 allocs: 5.516 KiB)
 median 7.146 μs (69 allocs: 5.516 KiB)
 mean   13.561 μs (69 allocs: 5.516 KiB, 0.11% gc time)
 max    8.763 ms (69 allocs: 5.516 KiB, 99.60% gc time)

julia> @be m1 sum(abs2, mean(_.y) .- fitted(_))
Benchmark: 2960 samples with 4 evaluations
 min    6.729 μs (68 allocs: 5.484 KiB)
 median 7.052 μs (68 allocs: 5.484 KiB)
 mean   8.180 μs (68 allocs: 5.484 KiB, 0.13% gc time)
 max    856.917 μs (68 allocs: 5.484 KiB, 96.40% gc time)

ajinkya-k · 2025-03-30T18:00:52Z

I put it under test broken because the same value in R only matches up to two digits, and I wanted to get comments . Sorry this should have been a draft PR

ajinkya-k · 2025-03-30T18:51:17Z

@dmbates Is model sum of squares even defined if there is no intercept in the model?

ajinkya-k · 2025-03-30T18:53:18Z

I used the definition here (though it uses "explained" instead of "model"): https://en.wikipedia.org/wiki/Explained_sum_of_squares

test/fit.jl

ajinkya-k · 2025-03-31T02:48:32Z

Is there an easy way to detect if an intercept has been included in the formula? I am thinking of adding code to throw an error if mss is called on a mixed model without intercept.

palday · 2025-03-31T03:01:22Z

Would it be better to write the loop as sum(abs2, mean(m.y) .- fitted(m))?
(This is not a big deal and it may be the dot-broadcasting will avoid allocations entirely so the current form is better. I don't know if sum(dot_broadcast_expression) expands the dot-broadcast loop to include the sum operation.) Testing confirms that this is not a big deal

Broadcast fusion reduces the number of arrays allocated, but fundamentally the limit is that fitted(m) allocates a vector equal in length to the response.

palday · 2025-03-31T03:04:50Z

Is there an easy way to detect if an intercept has been included in the formula? I am thinking of adding code to throw an error if mss is called on a mixed model without intercept.

I pushed a commit with an example for you. 😄

ajinkya-k · 2025-03-31T03:06:50Z

Is there an easy way to detect if an intercept has been included in the formula? I am thinking of adding code to throw an error if mss is called on a mixed model without intercept.

I pushed a commit with an example for you. 😄

Thanks!

PS: How did you push to my branch?

palday · 2025-03-31T03:12:18Z

PS: How did you push to my branch?

You have "allow maintainers to make edits" checked:

ajinkya-k · 2025-03-31T04:59:45Z

PS: How did you push to my branch?

You have "allow maintainers to make edits" checked:

That is what i thought, thanks for confirming!

ajinkya-k · 2025-03-31T05:00:39Z

Based on this: https://timnewbold.github.io/teaching_resources/GLMs.html , mss is not well defined (or atleast useful) for generalized linear mixed models, is that correct? If so, I think we should still have a method that just throws an error and tells the user to work with difference in deviances instead. Thoughts?

dmbates · 2025-03-31T15:38:40Z

Based on this: https://timnewbold.github.io/teaching_resources/GLMs.html , mss is not well defined (or atleast useful) for generalized linear mixed models, is that correct? If so, I think we should still have a method that just throws an error and tells the user to work with difference in deviances instead. Thoughts?

Yes. I should have responded earlier but I have been occupied with family matters for the last couple of days.

This is another example of a computation shortcut that, strictly speaking, applies only to linear models and only has relevance if you are doing the computation by hand. Unfortunately, people focus on the formula and not on why the result should be of use.

If you have a linear model in which the constant is within the column span of the model matrix, X, and the sum of squared residuals is the measure of goodness of fit then you may want to compare the model that you fit to a trivial model consisting only of a constant. This formula is from a Pythagorean relationship of the residual from the full model and the residual for the constant model so that it provides the difference in the two sums of squared residuals. Hence this is the sum of squared residuals accounted for but the model

For a GLM or a LMM the criterion for estimation of the coefficients isn't minimizing the sum of squared residuals so the rationale for looking at this quantity doesn't exist. If you want to compare the model that you fit versus a trivial model just use a likelihood ratio test.

So much of what is taught about analysis of variance is based on computational shortcuts that are irrelevant when you are using computers to fit the model. Fisher understood the geometry of the tests but needed to make them available to people who didn't have his insight into n-dimensional geometry so he summarized the results in tables. If you had to fit every model by hand calculation you quickly came to employ computational shortcuts, but that is no longer important. Unfortunately formulas like this survive and people think they must be meaningful in their own right.

dmbates · 2025-03-31T15:47:00Z

I see that this PR adds a hasintercept method based on the formula. As I said, if you did want to make this statistic available (and I don't think we should), the technical criterion is whether the constant vector lies in the column span of X. That is, is the constant model contained in the one that was fit. So if you fit a model y ~ 0 + f where f is a categorical factor, the formula doesn't have an intercept term but the constant vector is contained in the column span of X.

ajinkya-k · 2025-03-31T15:49:23Z

@dmbates In that case we can just discard this. I was just adding this to implement all relevant functions in StatsAPI to be fully compliant with that API, though perhaps we don't have to

dmbates · 2025-03-31T16:28:46Z

Thanks. I think we are better off not including such methods and will close the PR. I do appreciate your work on this.

added StatsAPI.mss(m::LinearMixedModel)

43bab38

ajinkya-k force-pushed the ahk/statsapi branch from 7671d40 to 43bab38 Compare March 30, 2025 02:57

fixed MSS formula for linear mixed effects models

b15ba2e

palday reviewed Mar 31, 2025

View reviewed changes

test/fit.jl Outdated Show resolved Hide resolved

clean up mss definition a little

2e3bb5c

palday and others added 2 commits March 30, 2025 22:16

tweak tolerance

eb75ca9

tweaked error message for MSS

c7c365f

Merge branch 'main' into ahk/statsapi

9a0a329

dmbates closed this Mar 31, 2025

added StatsAPI.mss(m::LinearMixedModel) #816

added StatsAPI.mss(m::LinearMixedModel) #816

Uh oh!

Conversation

ajinkya-k commented Mar 22, 2025

Uh oh!

codecov bot commented Mar 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

dmbates commented Mar 30, 2025

Uh oh!

ajinkya-k commented Mar 30, 2025

Uh oh!

ajinkya-k commented Mar 30, 2025

Uh oh!

ajinkya-k commented Mar 30, 2025

Uh oh!

Uh oh!

ajinkya-k commented Mar 31, 2025

Uh oh!

palday commented Mar 31, 2025

Uh oh!

palday commented Mar 31, 2025

Uh oh!

ajinkya-k commented Mar 31, 2025

Uh oh!

palday commented Mar 31, 2025

Uh oh!

ajinkya-k commented Mar 31, 2025

Uh oh!

ajinkya-k commented Mar 31, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dmbates commented Mar 31, 2025

Uh oh!

dmbates commented Mar 31, 2025

Uh oh!

ajinkya-k commented Mar 31, 2025

Uh oh!

dmbates commented Mar 31, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov bot commented Mar 22, 2025 •

edited

Loading

ajinkya-k commented Mar 31, 2025 •

edited

Loading