Commit 25cd719

Merge branch 'main' of github.com:JuliaStats/MixedModels.jl into pa/deprecation-removal-5.0
2 parents accba9a + 718af55 commit 25cd719

14 files changed: +334 -347 lines changed

NEWS.md

Lines changed: 6 additions & 0 deletions
@@ -11,7 +11,10 @@ MixedModels v5.0.0 Release Notes
 - Internal code around optimization in profiling has been restructured so that fitting done during calls to `profile` respects the `backend` and `optimizer` settings. [#853]
 - The `prfit!` convenience function has been removed. [#853]
 - The `dataset` and `datasets` functions have been removed. They are now housed in `MixedModelsDatasets`. [#854]
+- The local implementation of `fulldummy` and the nesting syntax has been removed and a dependency on RegressionFormulae.jl for their implementation has been added. [#855]
 - One argument `predict(::GeneralizedLinearMixedModel)`, i.e. prediction on the original data, now supports the `type` keyword argument. [#856]
+- `isnested(A::ReMat, B::ReMat)` is now a method of `StatsModels.isnested`. [#858]
+- [BREAKING] `likelihoodratiotest` has been reworked to be a thin wrapper around `StatsModels.lrtest`. The historical difference in behavior in terms of nesting checks created some confusion. Users advanced enough to create models with non-obvious nesting are assumed to be advanced enough to manually compute the likelihood ratio test. The function `likelihoodratiotest` and associated `LikelihoodRatioTest` type (now with a type parameter for number of models) have been kept to enable printing of test results with model formulae. Most users should not notice a difference in behavior, but the display has been slightly changed and the internal field structure has changed. [#858]
 
 MixedModels v4.38.0 Release Notes
 ==============================
@@ -678,3 +681,6 @@ Package dependencies
 [#853]: https://github.com/JuliaStats/MixedModels.jl/issues/853
 [#854]: https://github.com/JuliaStats/MixedModels.jl/issues/854
 [#856]: https://github.com/JuliaStats/MixedModels.jl/issues/856
+[#855]: https://github.com/JuliaStats/MixedModels.jl/issues/855
+[#856]: https://github.com/JuliaStats/MixedModels.jl/issues/856
+[#858]: https://github.com/JuliaStats/MixedModels.jl/issues/858

Project.toml

Lines changed: 4 additions & 1 deletion
@@ -17,8 +17,10 @@ MixedModelsDatasets = "7e9fb7ac-9f67-43bf-b2c8-96ba0796cbb6"
 NLopt = "76087f3c-5699-56af-9a33-bf431cd00edd"
 PooledArrays = "2dfb63ee-cc39-5dd5-95bd-886bf059d720"
 PrecompileTools = "aea7be01-6a6a-4083-8856-8a6e6704d82a"
+Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
 ProgressMeter = "92933f4c-e287-5a05-a399-4b506db050ca"
 Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
+RegressionFormulae = "545c379f-4ec2-4339-9aea-38f2fb6a8ba2"
 SparseArrays = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
 StaticArrays = "90137ffa-7385-5640-81b9-e52037218182"
 Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
@@ -61,6 +63,7 @@ NLopt = "0.6, 1"
 PRIMA = "0.2"
 PooledArrays = "0.5, 1"
 PrecompileTools = "1"
+Printf = "1"
 ProgressMeter = "1.7"
 Random = "1"
 RegressionFormulae = "0.1.3"
@@ -93,4 +96,4 @@ Suppressor = "fd094767-a336-5f1f-9728-57cf17d0bbfb"
 Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
 
 [targets]
-test = ["Aqua", "DataFrames", "ExplicitImports", "FiniteDiff", "ForwardDiff", "InteractiveUtils", "PRIMA", "RegressionFormulae", "StableRNGs", "Suppressor", "Test"]
+test = ["Aqua", "DataFrames", "ExplicitImports", "FiniteDiff", "ForwardDiff", "InteractiveUtils", "PRIMA", "StableRNGs", "Suppressor", "Test"]

docs/make.jl

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@ makedocs(;
 "rankdeficiency.md",
 "mime.md",
 "derivatives.md",
+"formula_syntax.md",
 "api.md",
 ],
 )

docs/src/constructors.md

Lines changed: 0 additions & 3 deletions
@@ -188,9 +188,6 @@ DisplayAs.Text(ans) # hide
 (Notice that the variance component for `days: 1` is estimated as zero, so the correlations for this component are undefined and expressed as `NaN`, not a number.)
 
 An alternative is to force all the levels of `days` as indicators using `fulldummy` encoding.
-```@docs
-fulldummy
-```
 ```@example Main
 fit(MixedModel, @formula(reaction ~ 1 + days + (1 + fulldummy(days)|subj)), sleepstudy,
 contrasts = Dict(:days => DummyCoding()))

docs/src/formula_syntax.md

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
+
+# Formula syntax
+
+MixedModels.jl uses the variant of the Wilkinson-Rogers (1973) notation for models of (co)variance implemented by [StatsModels.jl](https://juliastats.org/StatsModels.jl/stable/formula/#Modeling-tabular-data).
+Additionally, MixedModels.jl extends this syntax to use the pipe `|` as the grouping operator.
+Further extensions are provided by [RegressionFormulae.jl](https://github.com/kleinschmidt/RegressionFormulae.jl?tab=readme-ov-file), in particular the use of the slash `/` as the nesting operator and the use of the caret `^` to indicate main effects and interactions up to a specified order.
+Currently, MixedModels.jl loads RegressionFormulae.jl by default, though this may change in a future release.
+If you require specific functionality from RegressionFormulae.jl, it is best to load it directly so that you can control the version used.
+
+## General rules
+
+- "Addition" (`+`) indicates additive, i.e., main effects: `a + b` indicates main effects of `a` and `b`.
+- "Multiplication" (`*`) indicates crossing: main effects and interactions between two terms: `a * b` indicates main effects of `a` and `b` as well as their interaction.
+- Usual algebraic rules apply (associativity and distributivity):
+- `(a + b) * c` is equivalent to `a * c + b * c`
+- `a * b * c` corresponds to main effects of `a`, `b`, and `c`, as well as all three two-way interactions and the three-way interaction.
+- Categorical terms are expanded into the associated indicators/contrast variables. See the [StatsModels.jl documentation on contrasts](https://juliastats.org/StatsModels.jl/stable/contrasts/) for more information.
+- Interactions are expressed with the ampersand (`&`). (This is in contrast to R, which uses the colon `:` for this operation.) `a&b` is the interaction of `a` and `b`. For categorical terms, appropriate combinations of indicators/contrast variables are generated.
+- Tilde (`~`) is used to separate response from predictors.
+- The intercept is indicated by `1`.
+- `y ~ 1 + (a + b) * c` is read as:
+- The response variable is `y`.
+- The model contains an intercept.
+- The model contains main effects of `a`, `b`, and `c`.
+- The model contains interactions between `a` and `c` and between `b` and `c` but not `a` and `b`.
+- An intercept is included by default, i.e. there is an implicit `1 + ` in every formula. The intercept may be suppressed by including a `0 + ` in the formula. (In contrast to R, the use of `-1` is **not** supported.)
+
+### MixedModels.jl provided extensions
+
+- The pipe operator (`|`) indicates grouping or blocking.
+- `(1 + a | subject)` indicates "by-subject random effects for the intercept and main effect `a`".
+- This is in line with the usual statistical reading of `|` as "conditional on".
+
+### RegressionFormulae.jl provided extensions
+
+- "Exponentiation" (`^`) works like repeated multiplication and generates all multiplicative and additive terms up to the given order.
+- `(a + b + c)^2` generates `a + b + c + a&b + a&c + b&c`, but not `a&b&c`.
+- The presence of interaction terms within the base will result in redundant terms and is currently unsupported.
+- `fulldummy(a)` assigns "contrasts" to `a` that include all indicator columns (dummy variables) and an intercept column. The resulting overparameterization is generally useful in the fixed effects only as part of nesting.
+- The slash operator (`/`) indicates nesting:
+- `a / b` is read as "`b` is nested within `a`".
+- `a / b` expands to `a + fulldummy(a) & b`.
+- It is generally not necessary to specify nesting in the blocking variables when the inner levels are unique across outer levels. In other words, in a study with children (`C1`, `C2`, etc.) nested within schools (`S1`, `S2`, etc.),
+- it is not necessary to specify the nesting when `C1` identifies a unique child across schools. In that case, intercept-only random effects terms can be written as `(1|C) + (1|S)`.
+- it is necessary to specify the nesting when child identifiers are re-used across schools, e.g. `C1` refers to a child in `S1` and a different child in `S2`. In this case, the nested syntax `(1|S/C)` expands to `(1|S) + (1|S&C)`. The interaction term in the second blocking variable generates unique labels for each child across schools.
+
+
+
+## Mixed models in Wilkinson-Rogers and mathematical notation
+
+Models fit with MixedModels.jl are generally linear mixed-effects models with unconstrained random effects covariance matrices and homoskedastic, normally distributed residuals.
+Under these assumptions, the model specification
+
+`response ~ 1 + (age + sex) * education * n_children + (1 | subject)`
+
+corresponds to the statistical model
+
+```math
+\begin{align*}
+\left(Y |\mathcal{B}=b\right) &\sim N\left(X\beta + Zb, \sigma^2 I \right) \\
+\mathcal{B} &\sim N\left(0, G\right)
+\end{align*}
+```
+
+for which we wish to obtain the maximum-likelihood estimates for ``G`` and thus the fixed-effects ``\beta``.
+
+- The model contains no restrictions on ``G``, except that it is positive semidefinite.
+- The response ``Y`` is the value of a given response.
+- The fixed-effects design matrix ``X`` consists of columns for
+- the intercept, age, sex, education, and number of children (contrast coded as appropriate)
+- the interaction of all lower order terms, excluding interactions between age and sex
+- The random-effects design matrix ``Z`` includes a column for
+- the intercept for each subject
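
Reviewer sketch (not part of this commit): the expansion rules listed under "General rules" in the new `docs/src/formula_syntax.md` can be checked with StatsModels.jl alone. The snippet below assumes StatsModels.jl is installed. The names `y`, `a`, `b`, and `c` are placeholders, and no data is needed because the formula is only constructed, not fitted.

```julia
using StatsModels  # assumed to be installed; provides the @formula macro

# Placeholder variable names; building a formula does not require a table.
f = @formula(y ~ 1 + (a + b) * c)

# `f.lhs` holds the response term and `f.rhs` holds the predictor terms
# after `*` has been expanded into main effects and interactions.
println("response: ", f.lhs)
for t in f.rhs
    println("term: ", t)
end
```

By the rules quoted above, the printed terms should be the intercept, `a`, `b`, `c`, and the interactions of `a` with `c` and of `b` with `c`, with no interaction between `a` and `b`. The `|`, `/`, and `^` extensions are deliberately left out of this sketch, since they depend on MixedModels.jl and RegressionFormulae.jl.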

src/MixedModels.jl

Lines changed: 4 additions & 2 deletions
@@ -23,8 +23,10 @@ using MixedModelsDatasets: dataset
 using PooledArrays: PooledArrays, PooledArray
 using NLopt: NLopt
 using PrecompileTools: PrecompileTools, @setup_workload, @compile_workload
+using Printf: @sprintf
 using ProgressMeter: ProgressMeter, Progress, finish!, next!
 using Random: Random, AbstractRNG, randn!
+using RegressionFormulae: fulldummy
 using SparseArrays: SparseArrays, SparseMatrixCSC, SparseVector, dropzeros!, nnz
 using SparseArrays: nonzeros, nzrange, rowvals, sparse
 using StaticArrays: StaticArrays, SVector
@@ -36,12 +38,12 @@ using StatsAPI:
 loglikelihood, meanresponse, modelmatrix, nobs, pvalue, predict, r2, residuals
 using StatsAPI: response, responsename, stderror, vcov, weights
 using StatsBase: StatsBase, CoefTable, model_response, summarystats
-using StatsFuns: log2π, normccdf
+using StatsFuns: chisqccdf, log2π, normccdf
 using StatsModels: StatsModels, AbstractContrasts, AbstractTerm, CategoricalTerm
 using StatsModels: ConstantTerm, DummyCoding, EffectsCoding, FormulaTerm, FunctionTerm
 using StatsModels: HelmertCoding, HypothesisCoding, InteractionTerm, InterceptTerm
 using StatsModels: MatrixTerm, SeqDiffCoding, TableRegressionModel
-using StatsModels: apply_schema, drop_term, formula, lrtest, modelcols, @formula
+using StatsModels: apply_schema, drop_term, formula, lrtest, modelcols, isnested, @formula
 using StructTypes: StructTypes
 using Tables: Tables, columntable
 using TypedTables: TypedTables, DictTable, FlexTable, Table
