Simple SEM with Multiple Imputations

Mattan S. Ben-Shachar edited this page Jul 27, 2021 · 1 revision

The data:

# Make some fake data with missing values
data_with_missing_values <- mice::ampute(mtcars, prop = 0.5)$amp
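Before imputing, it can help to check how much is actually missing. A minimal sketch in base R, assuming the same `ampute()` call as above; the seed value and the name `amputed` are illustrative additions, not part of the original page:

```r
# Assumed seed, only so that this sketch is repeatable
set.seed(123)

# Same amputation call as above, stored under a hypothetical name
amputed <- mice::ampute(mtcars, prop = 0.5)$amp

# Number of missing values per variable
missing_per_variable <- colSums(is.na(amputed))
print(missing_per_variable)

# Share of rows with at least one missing value (roughly `prop`)
incomplete_share <- mean(!complete.cases(amputed))
print(incomplete_share)
```

Because `ampute()` draws the missingness at random, the counts will differ between runs unless a seed is set.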

Impute with mice

data_mice <- mice::mice(data_with_missing_values)
#>  iter imp variable
#>   1   1  mpg  cyl  disp  hp  qsec  vs  am  gear  carb
#>   ...

# Get all imputed data sets (in a list)
imputed_data_frames <- lapply(seq_len(data_mice$m),
                              mice::complete,
                              data = data_mice)
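The list of completed data sets can also be requested in one call with `mice::complete(..., action = "all")`. The sketch below rebuilds the objects under hypothetical names (`amputed`, `imp`) with an assumed seed, and checks that both routes give the same data frames:

```r
# Assumed seed, only so that this sketch is repeatable
set.seed(123)

# Hypothetical stand-ins for the objects created above
amputed <- mice::ampute(mtcars, prop = 0.5)$amp
imp <- mice::mice(amputed, printFlag = FALSE)  # printFlag = FALSE hides the iteration log

# Route 1: one call to complete() per imputation, as above
via_lapply <- lapply(seq_len(imp$m), mice::complete, data = imp)

# Route 2: ask complete() for all imputed data sets at once
via_all <- mice::complete(imp, action = "all")

# Compare the two lists element by element
same <- vapply(
  seq_len(imp$m),
  function(i) isTRUE(all.equal(via_lapply[[i]], via_all[[i]])),
  logical(1)
)
print(same)
```

Either list should be usable as the `data` argument in the fitting step; the `lapply()` form simply makes the loop over imputations explicit.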

Fit with semTools + lavaan

# The model: a simple mediation (am -> mpg -> cyl) with labeled paths
m <- "
cyl ~ b*mpg + c*am
mpg ~ a*am

indirect := a*b
   total := indirect + c
"
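Read as equations, the model syntax above describes a simple mediation: am predicts mpg (path a), and both mpg and am predict cyl (paths b and c). A sketch in LaTeX, where the intercept $\nu$ and the residuals $\varepsilon_1, \varepsilon_2$ are notation assumed here for illustration. Because cyl is declared as ordered in the fitting call, the second equation is written for a latent response $\mathrm{cyl}^{*}$ underlying the observed categories:

```latex
\begin{align}
\mathrm{mpg}     &= \nu + a\,\mathrm{am} + \varepsilon_1 \\
\mathrm{cyl}^{*} &= b\,\mathrm{mpg} + c\,\mathrm{am} + \varepsilon_2 \\
\mathrm{indirect} &= a \cdot b \\
\mathrm{total}    &= a \cdot b + c
\end{align}
```

The last two lines correspond to the `:=` definitions in the model string.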

fit.l <- semTools::cfa.mi(m, data = imputed_data_frames,
                          ordered = "cyl")
summary(fit.l, standardized = TRUE)
#> lavaan.mi object based on 5 imputed data sets. 
#> See class?lavaan.mi help page for available methods. 
#> 
#> Convergence information:
#> The model converged on 5 imputed data sets 
#> 
#> Rubin's (1987) rules were used to pool point and SE estimates across 5 imputed data sets, and to calculate degrees of freedom for each parameter's t test and CI.
#> 
#> Parameter Estimates:
#> 
#>   Standard errors                           Robust.sem
#>   Information                                 Expected
#>   Information saturated (h1) model        Unstructured
#> 
#> Regressions:
#>                    Estimate  Std.Err  t-value       df  P(>|t|)   Std.lv  Std.all
#>   cyl ~                                                                          
#>     mpg        (b)   -0.211    0.031   -6.826      Inf    0.000   -0.211   -0.999
#>     am         (c)    0.080    0.200    0.397      Inf    0.691    0.080    0.033
#>   mpg ~                                                                          
#>     am         (a)    6.643    1.688    3.935      Inf    0.000    6.643    0.585
#> 
#> Intercepts:
#>                    Estimate  Std.Err  t-value       df  P(>|t|)   Std.lv  Std.all
#>    .cyl               0.000                                        0.000    0.000
#>    .mpg              17.011    1.269   13.406      Inf    0.000   17.011    3.002
#> 
#> Thresholds:
#>                    Estimate  Std.Err  t-value       df  P(>|t|)   Std.lv  Std.all
#>     cyl|t1           -4.615    0.662   -6.974      Inf    0.000   -4.615   -3.850
#>     cyl|t2           -3.924    0.575   -6.829 9692.654    0.000   -3.924   -3.274
#> 
#> Variances:
#>                    Estimate  Std.Err  t-value       df  P(>|t|)   Std.lv  Std.all
#>    .cyl               0.057                                        0.057    0.040
#>    .mpg              21.109    6.377    3.311      Inf    0.001   21.109    0.658
#> 
#> Scales y*:
#>                    Estimate  Std.Err  t-value       df  P(>|t|)   Std.lv  Std.all
#>     cyl               1.000                                        1.000    1.000
#> 
#> Defined Parameters:
#>                    Estimate  Std.Err  t-value       df  P(>|t|)   Std.lv  Std.all
#>     indirect         -1.404    0.435   -3.228      Inf    0.001   -1.404   -0.585
#>     total            -1.324    0.451   -2.935      Inf    0.003   -1.325   -0.551
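The defined parameters can be sanity-checked by hand from the pooled path estimates printed above. A small base R sketch; the variable names are illustrative, and the values are copied from the rounded output, so the results agree with the reported `indirect` and `total` estimates only approximately:

```r
# Pooled path estimates copied from the Regressions table above
a_path <- 6.643    # mpg ~ am
b_path <- -0.211   # cyl ~ mpg
c_path <- 0.080    # cyl ~ am (direct effect)

# Same definitions as in the model syntax
indirect <- a_path * b_path
total <- indirect + c_path

print(round(indirect, 3))  # about -1.402, reported as -1.404
print(round(total, 3))     # about -1.322, reported as -1.324
```

The small gap is consistent with rounding in the printed table, and may also reflect how the defined parameters are pooled across imputations.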

Created on 2021-07-27 by the reprex package (v2.0.0)
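The summary above notes that Rubin's (1987) rules were used to pool estimates and standard errors. A minimal base R sketch of those rules for a single parameter; the five estimates and standard errors are made-up numbers for illustration, not values from the fit, and this is not a reproduction of semTools internals:

```r
# Hypothetical per-imputation estimates and standard errors for one parameter
est <- c(6.2, 6.9, 6.5, 7.0, 6.6)
se  <- c(1.5, 1.6, 1.4, 1.7, 1.5)
m   <- length(est)

# Pooled point estimate: the mean across imputations
pooled_est <- mean(est)

# Within-imputation variance: mean of the squared standard errors
within_var <- mean(se^2)

# Between-imputation variance: variance of the estimates
between_var <- var(est)

# Total variance and pooled standard error
total_var <- within_var + (1 + 1 / m) * between_var
pooled_se <- sqrt(total_var)

# Rubin's (1987) degrees of freedom for the t test
r  <- (1 + 1 / m) * between_var / within_var
df <- (m - 1) * (1 + 1 / r)^2

print(c(estimate = pooled_est, se = pooled_se, df = df))
```

The pooled standard error is larger than the average within-imputation standard error whenever the estimates differ across imputations, which is how the extra uncertainty from the missing data enters the result.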
