library(MIIVefa)
#> This is version 0.1.2 of MIIVefa.
#> MIIVefa is BETA software! Please report any bugs.
library(mnormt)
MIIVefa is data-driven algorithm for Exploratory Factor Analysis (EFA) that uses Model Implied Instrumental Variables (MIIVs). The method starts with a one factor model and arrives at a suggested model with enhanced interpretability that allows cross-loadings and correlated errors.
1, Prepare your data.
The input dataframe should be in a wide format: columns being different observations and rows being the specific data entries.
Column names should be clearly labeled.
2, Installing MIIVefa.
In the R console, enter and execute ‘install.packages(“MIIVefa”)’ or ‘devtools::install_github(“https://github.com/lluo0/MIIVefa”)’ after installing the “devtools” package.
Load the MIIVefa by executing ‘library(MIIVefa)’ after installing.
3, Running miivefa.
The only necessarily required input is the raw data matrix.
All 4 arguments are shown below.
‘sigLevel’ is the significance level with a default of 0.05. ‘scalingCrit’ is the specified criterion for selecting the scaling indicator whenever a new latent factor is created and the default is ‘sargan+factorloading_R2.’ And ‘CorrelatedErrors’ is a vector containing correlated error relations between observed variables with a default of NULL.
The output of a miivefa object contains 2 parts:
1, a suggested model, of which the syntax is identical to a ‘lavaan’ model. Accessible via output$model.
2, a miivsem model fit of the suggested model. The suggested model is run and evaluated using ‘MIIvsem’ and all miivsem attributes can be accessed. Accessible via output$fit.
We now demonstrate the usage and performance of MIIVefa across different conditions using three simulations and three empirical examples.
We first demonstrate the ability of MIIVefa to recover the true DGM even for complicated data structures. The simulation example consists of a total of 20 observed variables on 4 latent factors. While each factor contains 5 primary variables that load on each, there are two variables (x15 and x20) that also crossload on two factors. The code to generate this simulation example is:
seed <- 1235
#generate latent factor values
eta <- rmnorm(n=500,
mean = c(0,0,0,0),
varcov = matrix(c(1,.5, .5, .5,
.5,1, .5, .5,
.5, .5,1, .5,
.5, .5, .5,1), nrow = 4))
#generate errors
seed <- 1235
e <- rmnorm(n=500,
varcov = diag(.25, nrow = 20))
lambda <- cbind(c(1, .8, .75, .7, .65, rep(0,9), .4, rep(0,5)),
c(rep(0,5), 1, .8, .75, .7, .65, rep(0,9), .3),
c(rep(0,10), 1, .8, .75, .7, .65, rep(0,5)),
c(rep(0,15), 1, .8, .75, .7, .65))
rep <- 500
#obtain observed variable values
sim1 <- eta %*% t(lambda) + e
#create column names
colnames(sim1) <- paste0("x", 1:ncol(sim1))
#make it a data frame
sim1 <- as.data.frame(sim1)
And we run MIIVefa opting for a significance level of .01.
miivefa(sim1, .01)
#> $model
#> [1] "f1=~x1+x2+x3+x4+x5+x15\nf2=~x6+x7+x8+x9+x10+x20\nf3=~x11+x12+x13+x14+x15\nf4=~x16+x17+x18+x19+x20"
#>
#> $fit
#> MIIVsem (0.5.8) results
#>
#> Number of observations 500
#> Number of equations 16
#> Estimator MIIV-2SLS
#> Standard Errors standard
#> Missing listwise
#>
#>
#> Parameter Estimates:
#>
#>
#> STRUCTURAL COEFFICIENTS:
#> Estimate Std.Err z-value P(>|z|) Sargan df P(Chi)
#> f1 =~
#> x1 1.000
#> x2 0.799 0.030 26.952 0.000 18.497 17 0.358
#> x3 0.725 0.030 24.333 0.000 23.849 17 0.124
#> x4 0.687 0.030 22.914 0.000 22.207 17 0.177
#> x5 0.609 0.027 22.483 0.000 13.382 17 0.710
#> x15 0.405 0.034 11.830 0.000 9.969 15 0.822
#> f2 =~
#> x6 1.000
#> x7 0.826 0.033 25.259 0.000 12.179 17 0.789
#> x8 0.752 0.032 23.487 0.000 19.961 17 0.276
#> x9 0.725 0.031 23.159 0.000 14.616 17 0.623
#> x10 0.671 0.030 22.232 0.000 17.150 17 0.444
#> x20 0.310 0.035 8.842 0.000 20.938 15 0.139
#> f3 =~
#> x11 1.000
#> x15 0.655 0.034 19.063 0.000 9.969 15 0.822
#> x12 0.803 0.032 25.356 0.000 15.981 17 0.525
#> x13 0.737 0.028 25.955 0.000 19.933 17 0.278
#> x14 0.684 0.028 24.344 0.000 18.303 17 0.370
#> f4 =~
#> x16 1.000
#> x20 0.635 0.032 19.549 0.000 20.938 15 0.139
#> x17 0.800 0.030 26.949 0.000 12.985 17 0.737
#> x18 0.690 0.028 24.497 0.000 15.218 17 0.580
#> x19 0.660 0.026 24.912 0.000 17.005 17 0.454
#>
#> INTERCEPTS:
#> Estimate Std.Err z-value P(>|z|)
#> x1 0.000
#> x10 -0.002 0.027 -0.059 0.953
#> x11 0.000
#> x12 0.031 0.030 1.034 0.301
#> x13 0.019 0.026 0.717 0.473
#> x14 0.009 0.026 0.357 0.721
#> x15 0.047 0.027 1.751 0.080
#> x16 0.000
#> x17 0.043 0.029 1.478 0.139
#> x18 0.005 0.028 0.198 0.843
#> x19 0.039 0.026 1.510 0.131
#> x2 0.018 0.027 0.651 0.515
#> x20 -0.030 0.028 -1.061 0.289
#> x3 0.032 0.028 1.139 0.255
#> x4 0.041 0.028 1.457 0.145
#> x5 0.025 0.026 0.998 0.318
#> x6 0.000
#> x7 -0.037 0.029 -1.243 0.214
#> x8 -0.023 0.029 -0.808 0.419
#> x9 0.019 0.028 0.678 0.498
#>
#> VARIANCES:
#> Estimate Std.Err z-value P(>|z|)
#> f1 0.997
#> f2 0.927
#> f3 1.005
#> f4 1.097
#> x1 0.257
#> x10 0.245
#> x11 0.219
#> x12 0.289
#> x13 0.241
#> x14 0.241
#> x15 0.230
#> x16 0.253
#> x17 0.256
#> x18 0.260
#> x19 0.227
#> x2 0.221
#> x20 0.251
#> x3 0.251
#> x4 0.261
#> x5 0.224
#> x6 0.269
#> x7 0.235
#> x8 0.282
#> x9 0.258
#>
#> COVARIANCES:
#> Estimate Std.Err z-value P(>|z|)
#> f1 ~~
#> f2 0.436
#> f3 0.497
#> f4 0.411
#> f2 ~~
#> f3 0.453
#> f4 0.454
#> f3 ~~
#> f4 0.426
#>
#> attr(,"class")
#> [1] "miivefa" "list"
We can see that MIIVefa was able to recover the exact data structure in the DGM, including the two crossloading variables x15 and x20. One of the main differences between the output and that from most other EFA approaches is the simplicity of the final model. Because only relevant factor loadings are included in the final model, it does not require additional subjective factor loading trimming for interpretation.
The first empirical example comes from Holzinger and Swineford (1937), which contains mental ability test scores of seventh- and eighth- grade children from two schools. The original dataset has a total of 26 test scores but a subset of 9 test scores is more widely known in the literature and it is usually used to demonstrate the usage and performance of CFA. We used this 9 test score subset available in lavaan (Rosseel, 2012).
library(lavaan)
#> This is lavaan 0.6-15
#> lavaan is FREE software! Please report any bugs.
holzingerdata <- lavaan::HolzingerSwineford1939[,7:15]
head(holzingerdata)
#> x1 x2 x3 x4 x5 x6 x7 x8 x9
#> 1 3.333333 7.75 0.375 2.333333 5.75 1.2857143 3.391304 5.75 6.361111
#> 2 5.333333 5.25 2.125 1.666667 3.00 1.2857143 3.782609 6.25 7.916667
#> 3 4.500000 5.25 1.875 1.000000 1.75 0.4285714 3.260870 3.90 4.416667
#> 4 5.333333 7.75 3.000 2.666667 4.50 2.4285714 3.000000 5.30 4.861111
#> 5 4.833333 4.75 0.875 2.666667 4.00 2.5714286 3.695652 6.30 5.916667
#> 6 5.333333 5.00 2.250 1.000000 3.00 0.8571429 4.347826 6.65 7.500000
The 9 test scores in this empirical example (N=301) are scores for tests on: visual perception (X1), cubes (X2), lozenges (X3), paragraph comprehension (X4), sentence completion (X5), word meaning (X6), speeded addition (X7), speeded counting of dots (X8), and speeded discrimination on straight and curved capitals (X9). The widely used CFA model for it is a hypothesized 3 factor model of mental abilities on visual, textual, and speed tasks. MIIVefa recovered an almost identical model to the hypothesized 3 factor model, with the only difference being it also recovered one crossloading: speeded discrimination on straight and curved capitals on both the speed the visual factor. The code to run MIIVefa and the output containing both the recovered model and parameter estimates are as follow:
miivefa(holzingerdata, sigLevel = .01, 'sargan+factorloading_R2')
#> $model
#> [1] "f1=~x6+x4+x5\nf2=~x1+x2+x3+x9\nf3=~x8+x9+x7"
#>
#> $fit
#> MIIVsem (0.5.8) results
#>
#> Number of observations 301
#> Number of equations 6
#> Estimator MIIV-2SLS
#> Standard Errors standard
#> Missing listwise
#>
#>
#> Parameter Estimates:
#>
#>
#> STRUCTURAL COEFFICIENTS:
#> Estimate Std.Err z-value P(>|z|) Sargan df P(Chi)
#> f1 =~
#> x6 1.000
#> x4 1.079 0.064 16.909 0.000 6.251 6 0.396
#> x5 1.197 0.071 16.768 0.000 12.279 6 0.056
#> f2 =~
#> x1 1.000
#> x2 0.632 0.099 6.378 0.000 7.528 6 0.275
#> x3 0.727 0.097 7.491 0.000 15.556 6 0.016
#> x9 0.368 0.088 4.162 0.000 2.207 4 0.698
#> f3 =~
#> x8 1.000
#> x9 0.665 0.107 6.212 0.000 2.207 4 0.698
#> x7 0.767 0.122 6.267 0.000 21.817 6 0.001
#>
#> INTERCEPTS:
#> Estimate Std.Err z-value P(>|z|)
#> x1 0.000
#> x2 2.970 0.494 6.016 0.000
#> x3 -1.337 0.483 -2.768 0.006
#> x4 0.702 0.149 4.715 0.000
#> x5 1.724 0.166 10.398 0.000
#> x6 0.000
#> x7 -0.054 0.679 -0.079 0.937
#> x8 0.000
#> x9 -0.115 0.583 -0.198 0.843
#>
#> VARIANCES:
#> Estimate Std.Err z-value P(>|z|)
#> f1 0.843
#> f2 0.792
#> f3 0.630
#> x1 0.567
#> x2 1.106
#> x3 0.839
#> x4 0.372
#> x5 0.446
#> x6 0.356
#> x7 0.768
#> x8 0.392
#> x9 0.547
#>
#> COVARIANCES:
#> Estimate Std.Err z-value P(>|z|)
#> f1 ~~
#> f2 0.371
#> f3 0.156
#> f2 ~~
#> f3 0.217
#>
#> attr(,"class")
#> [1] "miivefa" "list"
MIIVefa recovered the general three factor structure. The crossloading of the speeded discrimination on straight and curved capitals task for both speed and visual ability was an interesting but not surprising recovery, as it is indeed a test that involves both mental abilities of differentiating between distinct types of capitals visually and being able to do so quickly.
The second empirical example comes from Bergh et al. (2016) and it consists items measuring openness, agreeableness, and prejudice against minorities. The dataset is available in the R package MPsychoR (Mair, 2020) and we used a subset of it excluding demographics.
The 10 items in this example (N=861) are 3 agreeableness indicators (A1-A3), 3 openness indicators (O1-O3), and 4 prejudice indicators: ethnic prejudice (EP), sexism (SP), sexual prejudice against gays and lesbians (HP), and prejudice against people with mentally disabilities (DP). The four prejudice items were hypothesized as indicators to the generalized prejudice (GP) factor (Bergh et al., 2016). This dataset was also used in Mair (2018, p.49-52) to demonstrate multilevel CFA on the generalized prejudice factor. Here we included all 10 variables to run MIIVefa. The code and output are as follows:
library(MPsychoR)
data("Bergh")
miivefa(Bergh[,1:10], .01, 'sargan+factorloading_R2')
#> $model
#> [1] "f1=~O2+O1+O3+HP\nf2=~EP+SP+DP+HP+A2\nf3=~A3+A1+A2"
#>
#> $fit
#> MIIVsem (0.5.8) results
#>
#> Number of observations 861
#> Number of equations 7
#> Estimator MIIV-2SLS
#> Standard Errors standard
#> Missing listwise
#>
#>
#> Parameter Estimates:
#>
#>
#> STRUCTURAL COEFFICIENTS:
#> Estimate Std.Err z-value P(>|z|) Sargan df P(Chi)
#> f1 =~
#> O2 1.000
#> O1 1.063 0.040 26.280 0.000 15.887 7 0.026
#> O3 1.217 0.043 28.139 0.000 7.207 7 0.408
#> HP -0.753 0.208 -3.613 0.000 12.292 5 0.031
#> f2 =~
#> EP 1.000
#> HP 0.634 0.158 4.016 0.000 12.292 5 0.031
#> SP 0.879 0.051 17.394 0.000 17.013 7 0.017
#> DP 0.753 0.040 18.924 0.000 17.801 7 0.013
#> A2 -0.189 0.027 -6.905 0.000 2.129 5 0.831
#> f3 =~
#> A3 1.000
#> A2 0.772 0.033 23.704 0.000 2.129 5 0.831
#> A1 0.958 0.030 31.768 0.000 16.801 7 0.019
#>
#> INTERCEPTS:
#> Estimate Std.Err z-value P(>|z|)
#> A1 -0.091 0.111 -0.817 0.414
#> A2 1.336 0.156 8.567 0.000
#> A3 0.000
#> DP 0.560 0.081 6.908 0.000
#> EP 0.000
#> HP 2.583 0.979 2.637 0.008
#> O1 -0.159 0.142 -1.122 0.262
#> O2 0.000
#> O3 -0.644 0.152 -4.243 0.000
#> SP 0.366 0.103 3.559 0.000
#>
#> VARIANCES:
#> Estimate Std.Err z-value P(>|z|)
#> A1 0.071
#> A2 0.067
#> A3 0.035
#> DP 0.125
#> EP 0.224
#> HP 2.108
#> O1 0.075
#> O2 0.078
#> O3 0.051
#> SP 0.248
#> f1 0.142
#> f2 0.283
#> f3 0.196
#>
#> COVARIANCES:
#> Estimate Std.Err z-value P(>|z|)
#> f1 ~~
#> f2 -0.112
#> f3 0.043
#> f2 ~~
#> f3 -0.104
#>
#> attr(,"class")
#> [1] "miivefa" "list"
MIIVefa recovered a 3 factor model that generally separated openness, agreeable, and generalized prejudice. It also recovered two crossloadings. The first one is HP being crossloaded negatively on the openness factor. The other prejudice indicators, however, loaded solely on the GP factor. This crossloading detection suggested that although being more open appeared to be associated with a lower level of prejudice against people with minority sexual orientations, openness was not necessarily related to a lower level of other types prejudice. This could have meaningful empirical implications such as that the other types of prejudice are likely complicated by other factors. The second crossloading is the second agreeableness indicator being crossloaded negatively on the GP factor. This indicated that this agreeableness indicators is closely related to the GP factor. However, the specific questions/items for the agreeableness and openness indicators were not provided.
The third empirical example is taken from Sidanius and Pratto (2001) which measures Social Dominance Orientation (SDO) across five years. The dataset is again available in the R package MPsychoR (Mair, 2020) and we used a subset of it of items from only two years. The subset (N =612) contains a total of 8 variables, which are consisted of 4 items taken from year 1996 and 2000: ‘It’s probably a good thing that certain groups are at the top and other groups are at the bottom’ (I1), ‘Inferior groups should stay in their place’ (I2), ‘We should do what we can to equalize conditions for different groups (reversed)’ (I3), and ‘Increased social equality is beneficial to society (reversed)’ (I4). The code and recovered model are as follows when we first omit the correlations between the same items taken at different years:
data("SDOwave")
sdowavedata <- SDOwave[,
c(paste0(c('I1.','I2.','I3.','I4.'), 1996),
paste0(c('I1.','I2.','I3.','I4.'), 2000))]
miivefa(sdowavedata, .01)
#> $model
#> [1] "f1=~I4.2000+I3.2000\nf2=~I4.1996\nf3=~I2.1996\nf4=~I1.2000\nf5=~I1.1996\nf6=~I3.1996\nf7=~I2.2000"
#>
#> $fit
#> MIIVsem (0.5.8) results
#>
#> Number of observations 612
#> Number of equations 1
#> Estimator MIIV-2SLS
#> Standard Errors standard
#> Missing listwise
#>
#>
#> Parameter Estimates:
#>
#>
#> STRUCTURAL COEFFICIENTS:
#> Estimate Std.Err z-value P(>|z|) Sargan df P(Chi)
#> f1 =~
#> I4.2000 1.000
#> I3.2000 1.066 0.064 16.578 0.000 10.749 5 0.057
#>
#> INTERCEPTS:
#> Estimate Std.Err z-value P(>|z|)
#> I3.2000 0.204 0.146 1.401 0.161
#> I4.2000 0.000
#>
#> VARIANCES:
#> Estimate Std.Err z-value P(>|z|)
#> I1.1996 0.977
#> I1.2000 0.897
#> I2.1996 0.834
#> I2.2000 0.555
#> I3.1996 1.071
#> I3.2000 0.632
#> I4.1996 0.963
#> I4.2000 0.461
#> f1 1.212
#> f2 0.888
#> f3 0.501
#> f4 0.690
#> f5 0.930
#> f6 1.186
#> f7 0.218
#>
#> COVARIANCES:
#> Estimate Std.Err z-value P(>|z|)
#> f1 ~~
#> f2 0.694
#> f3 0.270
#> f4 0.479
#> f5 0.284
#> f6 0.824
#> f7 0.409
#> f2 ~~
#> f3 0.517
#> f4 0.362
#> f5 0.653
#> f6 1.647
#> f7 0.257
#> f3 ~~
#> f4 0.431
#> f5 0.971
#> f6 0.506
#> f7 0.321
#> f4 ~~
#> f5 0.564
#> f6 0.299
#> f7 0.601
#> f5 ~~
#> f6 0.537
#> f7 0.260
#> f6 ~~
#> f7 0.237
#>
#> attr(,"class")
#> [1] "miivefa" "list"
We can see that the final recovered 7 factor model was hard to interpret, similar to what we see in Simulation 2 where most variables were placed in their own latent factors. However, after we included the four pairs of correlated errors, the recovered model was very different. The code and MIIVefa output are as follows:
miivefa(sdowavedata, .01,
correlatedErrors =
'I1.1996~~I1.2000
I2.1996~~I2.2000
I3.1996~~I3.2000
I4.1996~~I4.2000')
#> $model
#> [1] "f1=~I4.1996+I3.1996\nf2=~I4.2000+I3.2000\nf3=~I2.1996+I1.1996\nf4=~I1.2000+I2.2000\nI1.1996~~I1.2000\n I2.1996~~I2.2000\n I3.1996~~I3.2000\n I4.1996~~I4.2000"
#>
#> $fit
#> MIIVsem (0.5.8) results
#>
#> Number of observations 612
#> Number of equations 4
#> Estimator MIIV-2SLS
#> Standard Errors standard
#> Missing listwise
#>
#>
#> Parameter Estimates:
#>
#>
#> STRUCTURAL COEFFICIENTS:
#> Estimate Std.Err z-value P(>|z|) Sargan df P(Chi)
#> f1 =~
#> I4.1996 1.000
#> I3.1996 0.886 0.067 13.235 0.000 5.389 3 0.145
#> f2 =~
#> I4.2000 1.000
#> I3.2000 1.054 0.087 12.082 0.000 7.354 3 0.061
#> f3 =~
#> I2.1996 1.000
#> I1.1996 1.199 0.127 9.472 0.000 6.404 3 0.094
#> f4 =~
#> I1.2000 1.000
#> I2.2000 0.786 0.085 9.197 0.000 6.681 3 0.083
#>
#> INTERCEPTS:
#> Estimate Std.Err z-value P(>|z|)
#> I1.1996 0.036 0.218 0.166 0.868
#> I1.2000 0.000
#> I2.1996 0.000
#> I2.2000 -0.086 0.175 -0.493 0.622
#> I3.1996 0.580 0.167 3.465 0.001
#> I3.2000 0.229 0.194 1.183 0.237
#> I4.1996 0.000
#> I4.2000 0.000
#>
#> VARIANCES:
#> Estimate Std.Err z-value P(>|z|)
#> I1.1996 0.745
#> I1.2000 0.821
#> I2.1996 0.525
#> I2.2000 0.302
#> I3.1996 0.755
#> I3.2000 0.705
#> I4.1996 0.026
#> I4.2000 0.417
#> f1 1.855
#> f2 1.202
#> f3 0.806
#> f4 0.764
#>
#> COVARIANCES:
#> Estimate Std.Err z-value P(>|z|)
#> I1.1996 ~~
#> I1.2000 0.136
#> I2.1996 ~~
#> I2.2000 0.056
#> I3.1996 ~~
#> I3.2000 0.133
#> I4.1996 ~~
#> I4.2000 -0.173
#> f1 ~~
#> f2 0.798
#> f3 0.522
#> f4 0.338
#> f2 ~~
#> f3 0.251
#> f4 0.507
#> f3 ~~
#> f4 0.339
#>
#> attr(,"class")
#> [1] "miivefa" "list"
Now the recovered model became a four factor model that not only separated the same items from different years into different factors, but also separated I1, I2 and I3, I4. This is an interesting recovery, because I3 and I4 are actually reversed coded items. This could potentially imply that not wanting to actively get involved to increase social equality does NOT necessarily equal to wanting to assert social dominance.
Bergh, R., Akrami, N., Sidanius, J., & Sibley, C. G. (2016). Is group membership necessary for understanding generalized prejudice? a re-evaluation of why prejudices are interrelated. Journal of personality and social psychology, 111 (3), 367.
Holzinger, K. J., & Swineford, F. (1937). The bi-factor method. Psychometrika, 2 (1), 41–54.
Mair, P. (2018). Modern psychometrics with r. Springer.
Mair, P. (2020). Mpsychor: Modern psychometrics with r [R package version 0.10-8]. https://cran.r-project.org/package=MPsychoR
Rosseel, Y. (2012). Lavaan: An r package for structural equation modeling. Journal of statistical software, 48, 1–36.
Sidanius, J., & Pratto, F. (2001). Social dominance: An intergroup theory of social hierarchy and oppression. Cambridge University Press