Skip to content

Conversation

ajinkya-k
Copy link
Contributor

related to #514

Did behavior change? Did you add need features? If so, please update NEWS.md

  • add entry in NEWS.md
  • after opening this PR, add a reference and run docs/NEWS-update.jl to update the cross-references.

Should we release your changes right away? If so, bump the version:

  • I've bumped the version appropriately

Copy link

codecov bot commented Mar 22, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.34%. Comparing base (6bfbebe) to head (eb75ca9).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #816   +/-   ##
=======================================
  Coverage   97.33%   97.34%           
=======================================
  Files          36       36           
  Lines        3495     3499    +4     
=======================================
+ Hits         3402     3406    +4     
  Misses         93       93           
Flag Coverage Δ
current 96.99% <100.00%> (+<0.01%) ⬆️
minimum 97.34% <100.00%> (+<0.01%) ⬆️
nightly 96.92% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@dmbates
Copy link
Collaborator

dmbates commented Mar 30, 2025

Why is the test under @test_broken?

Is this formula correct if the fixed-effects doesn't include a constant? Usually when mean(m.y) occurs in an expression like this it represents the fitted values from a trivial model y ~ 1 but the comparison is only meaningful if that model is nested within the current model.

Would it be better to write the loop as sum(abs2, mean(m.y) .- fitted(m))?
(This is not a big deal and it may be the dot-broadcasting will avoid allocations entirely so the current form is better. I don't know if sum(dot_broadcast_expression) expands the dot-broadcast loop to include the sum operation.) Testing confirms that this is not a big deal

julia> @be m1 sum(abs2.(mean(_.y) .- fitted(_)))
Benchmark: 1804 samples with 4 evaluations
 min    6.531 μs (69 allocs: 5.516 KiB)
 median 7.146 μs (69 allocs: 5.516 KiB)
 mean   13.561 μs (69 allocs: 5.516 KiB, 0.11% gc time)
 max    8.763 ms (69 allocs: 5.516 KiB, 99.60% gc time)

julia> @be m1 sum(abs2, mean(_.y) .- fitted(_))
Benchmark: 2960 samples with 4 evaluations
 min    6.729 μs (68 allocs: 5.484 KiB)
 median 7.052 μs (68 allocs: 5.484 KiB)
 mean   8.180 μs (68 allocs: 5.484 KiB, 0.13% gc time)
 max    856.917 μs (68 allocs: 5.484 KiB, 96.40% gc time)

@ajinkya-k
Copy link
Contributor Author

I put it under test broken because the same value in R only matches up to two digits, and I wanted to get comments . Sorry this should have been a draft PR

@ajinkya-k
Copy link
Contributor Author

@dmbates Is model sum of squares even defined if there is no intercept in the model?

@ajinkya-k
Copy link
Contributor Author

I used the definition here (though it uses "explained" instead of "model"): https://en.wikipedia.org/wiki/Explained_sum_of_squares

@ajinkya-k
Copy link
Contributor Author

Is there an easy way to detect if an intercept has been included in the formula? I am thinking of adding code to throw an error if mss is called on a mixed model without intercept.

@palday
Copy link
Member

palday commented Mar 31, 2025

Would it be better to write the loop as sum(abs2, mean(m.y) .- fitted(m))?
(This is not a big deal and it may be the dot-broadcasting will avoid allocations entirely so the current form is better. I don't know if sum(dot_broadcast_expression) expands the dot-broadcast loop to include the sum operation.) Testing confirms that this is not a big deal

Broadcast fusion reduces the number of arrays allocated, but fundamentally the limit is that fitted(m) allocates a vector equal in length to the response.

@palday
Copy link
Member

palday commented Mar 31, 2025

Is there an easy way to detect if an intercept has been included in the formula? I am thinking of adding code to throw an error if mss is called on a mixed model without intercept.

I pushed a commit with an example for you. 😄

@ajinkya-k
Copy link
Contributor Author

Is there an easy way to detect if an intercept has been included in the formula? I am thinking of adding code to throw an error if mss is called on a mixed model without intercept.

I pushed a commit with an example for you. 😄

Thanks!

PS: How did you push to my branch?

@palday
Copy link
Member

palday commented Mar 31, 2025

PS: How did you push to my branch?

You have "allow maintainers to make edits" checked:

image

@ajinkya-k
Copy link
Contributor Author

PS: How did you push to my branch?

You have "allow maintainers to make edits" checked:

That is what i thought, thanks for confirming!

@ajinkya-k
Copy link
Contributor Author

ajinkya-k commented Mar 31, 2025

Based on this: https://timnewbold.github.io/teaching_resources/GLMs.html , mss is not well defined (or atleast useful) for generalized linear mixed models, is that correct? If so, I think we should still have a method that just throws an error and tells the user to work with difference in deviances instead. Thoughts?

@dmbates
Copy link
Collaborator

dmbates commented Mar 31, 2025

Based on this: https://timnewbold.github.io/teaching_resources/GLMs.html , mss is not well defined (or atleast useful) for generalized linear mixed models, is that correct? If so, I think we should still have a method that just throws an error and tells the user to work with difference in deviances instead. Thoughts?

Yes. I should have responded earlier but I have been occupied with family matters for the last couple of days.

This is another example of a computation shortcut that, strictly speaking, applies only to linear models and only has relevance if you are doing the computation by hand. Unfortunately, people focus on the formula and not on why the result should be of use.

If you have a linear model in which the constant is within the column span of the model matrix, X, and the sum of squared residuals is the measure of goodness of fit then you may want to compare the model that you fit to a trivial model consisting only of a constant. This formula is from a Pythagorean relationship of the residual from the full model and the residual for the constant model so that it provides the difference in the two sums of squared residuals. Hence this is the sum of squared residuals accounted for but the model

For a GLM or a LMM the criterion for estimation of the coefficients isn't minimizing the sum of squared residuals so the rationale for looking at this quantity doesn't exist. If you want to compare the model that you fit versus a trivial model just use a likelihood ratio test.

So much of what is taught about analysis of variance is based on computational shortcuts that are irrelevant when you are using computers to fit the model. Fisher understood the geometry of the tests but needed to make them available to people who didn't have his insight into n-dimensional geometry so he summarized the results in tables. If you had to fit every model by hand calculation you quickly came to employ computational shortcuts, but that is no longer important. Unfortunately formulas like this survive and people think they must be meaningful in their own right.

@dmbates
Copy link
Collaborator

dmbates commented Mar 31, 2025

I see that this PR adds a hasintercept method based on the formula. As I said, if you did want to make this statistic available (and I don't think we should), the technical criterion is whether the constant vector lies in the column span of X. That is, is the constant model contained in the one that was fit. So if you fit a model y ~ 0 + f where f is a categorical factor, the formula doesn't have an intercept term but the constant vector is contained in the column span of X.

@ajinkya-k
Copy link
Contributor Author

@dmbates In that case we can just discard this. I was just adding this to implement all relevant functions in StatsAPI to be fully compliant with that API, though perhaps we don't have to

@dmbates
Copy link
Collaborator

dmbates commented Mar 31, 2025

Thanks. I think we are better off not including such methods and will close the PR. I do appreciate your work on this.

@dmbates dmbates closed this Mar 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants