powerNLSEM

Model-implied simulation-based power estimation (MSPE) for nonlinear Structural Equation Modeling (NLSEM)

This is an R package to use the model-impled simulation-based power estimation (MSPE) method to find the minimum required sample size for a given power within a nonlinear Structural Equation Model (NLSEM) for several parameters of interest (POI). The package was created as a supplement to the publication Irmer et al. (2024b) and its theory is based on Irmer et al. (2024a). Here, a probit regression model with \(\sqrt{n}\) as a predictor is fit to significance decisions for single parameters (using the \(z\)-test) within simulated data to describe the relationship between power and sample size \(n\) (Irmer et al., 2024b).

Install the latest working version from Github

This requires the package devtools.

install.packages("devtools")
devtools::install_github("jpirmer/powerNLSEM", build_vignettes = TRUE)

Use build_vignettes = TRUE to be able to see the documentation linked in “Getting Started”.

Load the package:

library(powerNLSEM)
#> Loading required package: ggplot2
#> Package 'powerNLSEM' version 0.1.2
#> Type 'citation("powerNLSEM")' for citing this R package in publications.

Write Model

The powerNLSEM packages uses lavaan syntax (Rosseel, 2012) to describe the model including population values:

model <- "
# measurement models
X =~ 1*x1 + 0.8*x2 + 0.7*x3
Y =~ 1*y1 + 0.85*y2 + 0.78*y3
Z =~ 1*z1 + 0.9*z2 + 0.6*z3

# structural models
Y ~ 0.3*X + .2*Z +  .2*X:Z

# residual variances
Y~~.7975*Y
X~~1*X
Z~~1*Z

# covariances
X~~0.5*Z

# measurement error variances
x1~~.1*x1
x2~~.2*x2
x3~~.3*x3
z1~~.2*z1
z2~~.3*z2
z3~~.4*z3
y1~~.5*y1
y2~~.4*y2
y3~~.3*y3
"

All parameters in the model are given by the user, otherwise lavaan’s defaults are used. These are 1 for variances, 0.5 for covariance and 0 for all other coefficients. Hence, not stating a coefficient will result in zero-effects for which the power is just the level of significance (\(\alpha\) or the type-I error).

Interactions among latent variables have not yet been included into lavaan (version 0.6.13), which is why this is handeled by the powerNLSEM package by translating the syntax into syntax for which nonlinear models can be estimated. For now these are LMS (latent moderated structural equations, Klein & Moosbrugger, 2000), which needs an installation of Mplus (Muthén & Muthén, 1998-2017), UPI (unconstrained product indicator approach, Marsh et al., 2004, Kelava & Brandt, 2009), which further makes use of the semTools package (Jorgensen et al., 2022) to compute the product indicators in a matched or unmatched way (including different ways of centering the indicators), a factor score approach using the SL-method (named after Skrondal & Laake, 2001, as studied in Ng & Chang, 2020) and a scale mean regression based path analysis where the latent variables are collapsed to means of the indicators per latent variables and path analysis is used to fit the NLSEM.

#> # structural models
#> Y ~ 0.3*X + .2*Z +  .2*X:Z
#> 
#> # residual variances
#> Y~~.7975*Y

States the structural model of the NLSEM as \(Y=.3X + .2Z + .2XZ + \varepsilon_Y\), where \(\varepsilon_Y\sim\mathcal{N}(0,.7975)\), i.e., the variance \(\mathbb{V}ar[\varepsilon_Y]=.7975\). We are interested how large the sample size needs to be for a power of 80% for the three regression coefficients. We will use the adaptive search algorithm to find the necessary sample size.

Plots

The powerNLSEM package offers several plots, which visualize the power:

plot(Result_Power)

plots the model implied power for the POI vs. sample size N. The vertical line indicates the necessary sample size found be the adaptive search algorithm. The horizontal line indicates the desired power level.

plot(Result_Power, se = TRUE)

Within this plot the standard errors of the power_modeling_method are included into the plot.

plot(Result_Power, se = TRUE, plot = "empirical")

plots the empirical power per sample size and fits a LOESS fit to the resulting data. All plots indicate that the linear effect of Z has the smallest power.

Find other sample sizes from a fitted model

One can also find other sample sizes for power values other than that the process has been optimized for by using the reanalyze.powerNlSEM function.

reanalyse.powerNLSEM(Result_Power, 
                     powerLevels = c(.5, .6, .7, .8, .9, .95))
#> $Npower
#> [1] 166 191 227 328 601 906
#> 
#> $power
#> [1] 0.50 0.60 0.70 0.80 0.90 0.95
#> 
#> $beta
#> [1] 0.50 0.40 0.30 0.20 0.10 0.05
#> 
#> $alpha
#> [1] 0.05
#> 
#> $alpha_power_modeling
#> [1] 0.05
#> 
#> $method
#> [1] "UPI"
#> 
#> $test
#> [1] "onesided"
#> 
#> $search_method
#> [1] "adaptive"
#> 
#> $power_modeling_method
#> [1] "probit"
#> 
#> attr(,"class")
#> [1] "powerNLSEM.reanalyzed" "list"

These new values can also be plotted into the plot

plot(Result_Power, se = TRUE, 
     power_aim = c(.5, .6, .7, .8, .9, .95))

were we see that some of the power values actually fall out of the support for which sample sizes had been drawn indicating that these values might be less precise.

Further, if we want more precision in the power modeling process we can alter alpha_power_modeling to a lower value.

reanalyse.powerNLSEM(Result_Power, 
                     powerLevels = c(.5, .6, .7, .8, .9),
     alpha_power_modeling = .001)
#> $Npower
#> [1]  182  207  300 -Inf -Inf
#> 
#> $power
#> [1] 0.5 0.6 0.7 0.8 0.9
#> 
#> $beta
#> [1] 0.5 0.4 0.3 0.2 0.1
#> 
#> $alpha
#> [1] 0.05
#> 
#> $alpha_power_modeling
#> [1] 0.001
#> 
#> $method
#> [1] "UPI"
#> 
#> $test
#> [1] "onesided"
#> 
#> $search_method
#> [1] "adaptive"
#> 
#> $power_modeling_method
#> [1] "probit"
#> 
#> attr(,"class")
#> [1] "powerNLSEM.reanalyzed" "list"

plot(Result_Power, se = TRUE, 
     power_aim = c(.5, .6, .7, .8, .9),
     alpha_power_modeling = .001)

If we wish to not use confidence bands in the power modeling process we can use alpha_power_modeling = 1.

reanalyse.powerNLSEM(Result_Power, 
                     powerLevels = c(.5, .6, .7, .8, .9),
     alpha_power_modeling = 1)
#> $Npower
#> [1] 127 158 196 246 323
#> 
#> $power
#> [1] 0.5 0.6 0.7 0.8 0.9
#> 
#> $beta
#> [1] 0.5 0.4 0.3 0.2 0.1
#> 
#> $alpha
#> [1] 0.05
#> 
#> $alpha_power_modeling
#> [1] 1
#> 
#> $method
#> [1] "UPI"
#> 
#> $test
#> [1] "onesided"
#> 
#> $search_method
#> [1] "adaptive"
#> 
#> $power_modeling_method
#> [1] "probit"
#> 
#> attr(,"class")
#> [1] "powerNLSEM.reanalyzed" "list"

plot(Result_Power, se = TRUE, 
     power_aim = c(.5, .6, .7, .8, .9),
     alpha_power_modeling = 1)

If we choose alpha_power_modeling = 1 within the adaptive search algorithm using powerNLSEM, then the sample sizes get optimized for that value. However, this is not adviced since in approx. half of the replications (retrials of the adaptive algorithm or brute algorithm) the sample size will actually be smaller than that resulting in the desired power rate.

Checking Results

The required sample size can be double checked by running a simulation using just this sample size. This is presented for a linear SEM next, as other functions exist which can independently check the results. First we need to formulate the population model and the analysis model. We use a simplified version of the model used above.

populationModel <- "
# measurement models
X =~ 1*x1 + 0.8*x2 + 0.7*x3
Y =~ 1*y1 + 0.85*y2 + 0.78*y3

# structural models
Y ~ 0.3*X 

# residual variances
Y~~.91*Y
X~~1*X

# measurement error variances
x1~~.1*x1
x2~~.2*x2
x3~~.3*x3
y1~~.5*y1
y2~~.4*y2
y3~~.3*y3
"

Now we fit the MSPE to estimate power for the effect "Y~X" for this linear SEM. We use method = "UPI" as the product indicator approach without any nonlinear effects simplifies to standard SEM.

Simple <- powerNLSEM(model = populationModel, POI = c("Y~X"), method = "UPI",
                     search_method = "adaptive", steps = 2, 
                     power_modeling_method = "probit",
                     R = 200, power_aim = .8, alpha = .05, 
                     seed = 2024, CORES = 1)
#> Initiating smart search to find simulation based N for power of 0.8 within 2 steps
#> and in total 200 replications. Ns are drawn randomly...
#> Step 1 of 2. Fitting 67 models with Ns in [75, 225].
#> Step 2 of 2. Fitting 133 models with Ns in [75, 112].

The required sample size is

Simple$N
#> [1] 99

Now, we simulate data for this particular sample size and compute the resulting power rate.

VerifyRes <- powerNLSEM(model = populationModel, POI = c("Y~X"), method = "UPI",
                       search_method = "bruteforce", Ns = Simple$N,
                       R = 200, seed = 2024, CORES = 1)
#> Initiating brute force search to find simulation based N for power of 0.8 within 200 replications...
#> Fitting 200 models with Ns in [99, 99].

The argument $powersPerN gives the power rate per selected sample size which is what we need here:

summary(VerifyRes)$powersPerN
#>   Ns       Y~X
#> 1 99 0.8505155

This power rate is close to the desired power rate of \(.8\). In fact the confidence interval [0.8, 0.901] just excludes \(.8\). This is desired, as the computed required sample size is derived from the lower bound of the confidence band around the predicted power rate. Hence, with alpha_power_modeling = .05, we have that in 2.5% of cases the computed sample size will result in a power rate smaller than the desired power rate of .8. Hence, in most cases, the computed power rate will be larger. With increasing R this slight overestimation becomes smaller.

Literature

Irmer, J. P., Klein, A. G., & Schermelleh-Engel, K. (2024a). A General Model-Implied Simulation-Based Power Estimation Method for Correctly and Misspecfied Models: Applications to Nonlinear and Linear Structural Equation Models. Behavior Research Methods. https://doi.org/10.31219/osf.io/pe5bj

Irmer, J. P., Klein, A. G., & Schermelleh-Engel, K. (2024b). Estimating Power in Complex Nonlinear Structural Equation Modeling Including Moderation Effects: The powerNLSEM R-Package. Behavior Research Methods. https://doi.org/10.3758/s13428-024-02476-3

Jorgensen, T. D., Pornprasertmanit, S., Schoemann, A. M., & Rosseel, Y. (2022). semTools: Useful tools for structural equation modeling. R package version 0.5-6. Retrieved from https://CRAN.R-project.org/package=semTools

Kelava, A., & Brandt, H. (2009). Estimation of nonlinear latent structural equation models using the extended unconstrained approach. Review of Psychology, 16(2), 123–131.

Klein, A. G., & Moosbrugger, H. (2000). Maximum likelihood estimation of latent interaction effects with the LMS method. Psychometrika, 65(4), 457–474. https://doi.org/10.1007/BF02296338

Marsh, H. W., Wen, Z. & Hau, K. T. (2004). Structural equation models of latent interactions: Evaluation of alternative estimation strategies and indicator construction. Psychological Methods, 9(3), 275–300. https://doi.org/10.1037/1082-989X.9.3.275

Muthén, L., & Muthén, B. (1998-2017). Mplus user’s guide (Eighth ed.). Los Angeles, CA: Muthén & Muthén.

Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36. https://doi.org/10.18637/jss.v048.i02

Skrondal, A., & Laake, P. (2001). Regression among factor scores. Psychometrika, 66(4), 563-575. https://doi.org/10.1007/BF02296196