powerNLSEM

This is an R package that uses the model-implied simulation-based power estimation (MSPE) method to find the minimum required sample size for a given power for several parameters of interest (POI) within a nonlinear structural equation model (NLSEM). The package was created as a supplement to Irmer et al. (2024b), and its theory is based on Irmer et al. (2024a). A probit regression model with \(\sqrt{n}\) as the predictor is fit to the significance decisions for single parameters (based on the \(z\)-test) across the simulated data sets to describe the relationship between power and sample size \(n\) (Irmer et al., 2024b).
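The core idea of the power model can be illustrated in base R. The sketch below is hypothetical: the significance decisions are simulated from an assumed power curve \(\Phi(-2 + 0.17\sqrt{n})\) instead of coming from fitted NLSEMs, and it is not powerNLSEM's implementation. A probit regression of the 0/1 decisions on \(\sqrt{n}\) is fitted, and \(\Phi(\hat\beta_0 + \hat\beta_1\sqrt{N}) = 0.8\) is solved for \(N\).

```r
# Hypothetical illustration with simulated significance decisions; the power
# curve pnorm(-2 + 0.17 * sqrt(n)) is an assumption, not powerNLSEM output.
set.seed(2024)
n   <- sample(100:400, size = 2000, replace = TRUE)                 # sample size per replication
sig <- rbinom(length(n), size = 1, prob = pnorm(-2 + 0.17 * sqrt(n)))  # 1 = significant

# Probit regression of the significance decisions on sqrt(n)
fit <- glm(sig ~ sqrt(n), family = binomial(link = "probit"))
b   <- unname(coef(fit))

# Solve pnorm(b[1] + b[2] * sqrt(N)) = power_aim for N
power_aim <- 0.8
N_hat <- ((qnorm(power_aim) - b[1]) / b[2])^2
ceiling(N_hat)
```

Modeling power on \(\sqrt{n}\) mirrors the fact that the \(z\)-statistic of a fixed effect grows roughly with \(\sqrt{n}\), so the probit curve can be inverted in closed form.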

Write Model

The powerNLSEM package uses lavaan syntax (Rosseel, 2012) to describe the model, including the population values:

model <- "
# measurement models
X =~ 1*x1 + 0.8*x2 + 0.7*x3
Y =~ 1*y1 + 0.85*y2 + 0.78*y3
Z =~ 1*z1 + 0.9*z2 + 0.6*z3

# structural models
Y ~ 0.3*X + .2*Z +  .2*X:Z

# residual variances
Y~~.7975*Y
X~~1*X
Z~~1*Z

# covariances
X~~0.5*Z

# measurement error variances
x1~~.1*x1
x2~~.2*x2
x3~~.3*x3
z1~~.2*z1
z2~~.3*z2
z3~~.4*z3
y1~~.5*y1
y2~~.4*y2
y3~~.3*y3
"

All population parameters should be specified by the user; otherwise lavaan's defaults are used. These are 1 for variances, 0.5 for covariances and 0 for all other coefficients. Hence, omitting a coefficient results in a zero effect, for which the power simply equals the level of significance (\(\alpha\), the Type I error rate).
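Why a zero effect has "power" equal to \(\alpha\) can be checked with a small simulation. This sketch uses ordinary regression with a true slope of 0 as a stand-in for an NLSEM coefficient (an assumption for illustration) and applies a one-sided \(z\)-test to each replication; the share of rejections should be close to \(\alpha = .05\).

```r
# Stand-in simulation: y is independent of x, so the true slope is 0.
set.seed(1)
alpha <- 0.05
reject <- replicate(2000, {
  x <- rnorm(200)
  y <- rnorm(200)                                   # true effect of x on y is 0
  est <- summary(lm(y ~ x))$coefficients["x", ]     # estimate and standard error
  unname(est["Estimate"] / est["Std. Error"] > qnorm(1 - alpha))  # one-sided z-test
})
mean(reject)   # rejection rate under H0, approximately alpha
```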

Interactions among latent variables are not yet supported by lavaan (version 0.6.13), so powerNLSEM handles them by translating the syntax into syntax for methods that can estimate nonlinear models. Currently these are: LMS (latent moderated structural equations; Klein & Moosbrugger, 2000), which requires an installation of Mplus (Muthén & Muthén, 1998-2017); UPI (unconstrained product indicator approach; Marsh et al., 2004; Kelava & Brandt, 2009), which uses the semTools package (Jorgensen et al., 2022) to compute matched or unmatched product indicators (with different ways of centering the indicators); a factor score approach using the SL-method (named after Skrondal & Laake, 2001, as studied in Ng & Chang, 2020); and a scale mean regression, in which each latent variable is replaced by the mean of its indicators and the NLSEM is fit as a path analysis.
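Two of the simpler ideas above can be sketched in base R: building matched, mean-centered product indicators (\(x_1z_1, x_2z_2, x_3z_3\)) as used by UPI, and the scale mean regression. The data-generating code mirrors the population model stated above; the helper `make_indicators()` is hypothetical, and this is not how powerNLSEM or semTools implement these steps. Note that the scale mean regression ignores measurement error, so its estimates generally differ from the population values.

```r
set.seed(2024)
n <- 1000
# Latent X and Z with unit variances and covariance 0.5 via a Cholesky factor
lat <- matrix(rnorm(2 * n), n) %*% chol(matrix(c(1, 0.5, 0.5, 1), 2))
X <- lat[, 1]; Z <- lat[, 2]
Y <- 0.3 * X + 0.2 * Z + 0.2 * X * Z + rnorm(n, sd = sqrt(0.7975))

# Hypothetical helper: indicator = loading * latent + measurement error
make_indicators <- function(eta, loadings, err_var) {
  sapply(seq_along(loadings), function(i)
    loadings[i] * eta + rnorm(length(eta), sd = sqrt(err_var[i])))
}
x <- make_indicators(X, c(1, 0.8, 0.7), c(0.1, 0.2, 0.3))
z <- make_indicators(Z, c(1, 0.9, 0.6), c(0.2, 0.3, 0.4))
y <- make_indicators(Y, c(1, 0.85, 0.78), c(0.5, 0.4, 0.3))

# (a) Scale mean regression: latent variables replaced by indicator means
Xm <- rowMeans(x); Zm <- rowMeans(z); Ym <- rowMeans(y)
fit_mean <- lm(Ym ~ Xm * Zm)
coef(fit_mean)

# (b) Matched, mean-centered product indicators for UPI
xc <- scale(x, scale = FALSE)
zc <- scale(z, scale = FALSE)
prod_ind <- xc * zc                    # column i = centered x_i times centered z_i
colnames(prod_ind) <- paste0("x", 1:3, "z", 1:3)
head(prod_ind, 3)
```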

#> # structural models
#> Y ~ 0.3*X + .2*Z +  .2*X:Z
#> 
#> # residual variances
#> Y~~.7975*Y

This part of the model states the structural model of the NLSEM as \(Y=.3X + .2Z + .2XZ + \varepsilon_Y\), where \(\varepsilon_Y\sim\mathcal{N}(0,.7975)\), i.e., the residual variance is \(\mathbb{V}ar[\varepsilon_Y]=.7975\). We are interested in how large the sample size needs to be to reach a power of 80% for each of the three regression coefficients, and we use the adaptive search algorithm to find it.
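As a worked example of what these population values imply: with \(X, Z\) bivariate normal, unit variances and \(\mathbb{C}ov[X,Z]=.5\), we have \(\mathbb{V}ar[XZ] = 1 + .5^2 = 1.25\) and \(\mathbb{C}ov[X,XZ]=\mathbb{C}ov[Z,XZ]=0\), so \(\mathbb{V}ar[Y] = .3^2 + .2^2 + 2(.3)(.2)(.5) + .2^2(1.25) + .7975 = 1.0375\). The sketch below computes this and checks it by simulating latent scores; the normality of \(X\) and \(Z\) is an assumption of the sketch.

```r
# Implied variance of Y, assuming X and Z are bivariate normal with unit
# variances and covariance 0.5 (as in the model syntax above)
b_X <- 0.3; b_Z <- 0.2; b_XZ <- 0.2; cov_XZ <- 0.5; res_var <- 0.7975
var_prod <- 1 + cov_XZ^2                   # Var(X * Z) for centered normal X, Z
var_Y <- b_X^2 + b_Z^2 + 2 * b_X * b_Z * cov_XZ + b_XZ^2 * var_prod + res_var
var_Y                                       # 1.0375

# Monte Carlo check with simulated latent scores
set.seed(2024)
n   <- 1e6
lat <- matrix(rnorm(2 * n), n) %*% chol(matrix(c(1, cov_XZ, cov_XZ, 1), 2))
X <- lat[, 1]; Z <- lat[, 2]
Y <- b_X * X + b_Z * Z + b_XZ * X * Z + rnorm(n, sd = sqrt(res_var))
var(Y)                                      # close to var_Y
```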

Adaptive Search

After stating the model, we can call the powerNLSEM function, which uses the adaptive search algorithm to find the minimum sample size that reaches the desired power at a given Type I error rate, here for our latent moderation model estimated with the unconstrained product indicator approach (UPI, with matched products). For computational reasons, only a very small number of replications and only two search steps are used (for further information see also Irmer et al., 2024b):

library(powerNLSEM)

Result_Power <- powerNLSEM(model = model, 
                           POI = c("Y~X", "Y~Z", "Y~X:Z"), 
                           method = "UPI",
                           search_method = "adaptive", 
                           steps = 2, # for computational reasons, better >= 10
                           power_modeling_method = "probit",
                           R = 200, # for computational reasons, better >= 2000
                           power_aim = .8, 
                           alpha = .05, 
                           alpha_power_modeling = .05,
                           CORES = 1, 
                           seed = 2024)
#> Initiating smart search to find simulation based N for power of 0.8 within 2 steps
#> and in total 200 replications. Ns are drawn randomly...
#> Step 1 of 2. Fitting 67 models with Ns in [140, 420].
#> Step 2 of 2. Fitting 133 models with Ns in [140, 270].

The argument model receives the model stated above in lavaan syntax. POI lists the Parameters Of Interest: here we are interested in the power for the linear effects of X and Z on Y and for the interaction X:Z on Y, i.e., all structural effects of the model, hence POI = c("Y~X", "Y~Z", "Y~X:Z"). method = "UPI" selects the unconstrained product indicator approach (Marsh et al., 2004; Kelava & Brandt, 2009). search_method = "adaptive" selects the adaptive search; an alternative is "bruteforce" (see the documentation for details). steps = 2 sets the number of adaptive search steps, and power_modeling_method = "probit" means that the significance decisions per parameter are modeled with a probit regression model.

R is the total number of replications fitted. The value 200 used here is small and should be increased for higher precision; much smaller values may cause unwanted behavior of the search algorithm because the estimated power curve may be too flat. A good suggestion is \(R\ge 2000\), and for precise results \(R\ge 10^5\). power_aim is the desired power level (here 0.8) for which the adaptive algorithm is optimized and for which \(N\) is found. alpha is the Type I error rate for the significance decision in each replication (level of significance, here 0.05).

alpha_power_modeling is the Type I error rate used within the power modeling process: it determines the confidence band whose lower bound for the power is solved for \(\hat{N}\). Because only the lower bound of this two-sided band is used, \(\hat{N}\) ensures the desired power with a Type I error rate of alpha_power_modeling divided by 2. CORES is the number of CPU cores used to estimate the models (here 1; increasing it reduces the runtime). As this is a random search algorithm, a seed is set via seed for reproducibility.
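The role of alpha_power_modeling can be sketched with a fitted probit power curve. This is a hypothetical illustration, not powerNLSEM's code: it assumes a two-sided Wald band on the probit linear predictor and solves the lower bound of the power for \(N\) with `uniroot()`. The significance decisions are again simulated from an assumed curve \(\Phi(-2 + 0.17\sqrt{n})\), and `solve_N()` is a hypothetical helper.

```r
set.seed(2024)
n   <- sample(140:420, size = 2000, replace = TRUE)
sig <- rbinom(length(n), 1, pnorm(-2 + 0.17 * sqrt(n)))   # assumed true power curve
fit <- glm(sig ~ sqrt(n), family = binomial(link = "probit"))

# Hypothetical helper: smallest N whose lower band limit reaches power_aim
solve_N <- function(fit, power_aim, alpha_power_modeling) {
  z <- qnorm(1 - alpha_power_modeling / 2)              # two-sided band quantile
  lower_gap <- function(N) {
    p <- predict(fit, newdata = data.frame(n = N), type = "link", se.fit = TRUE)
    pnorm(p$fit - z * p$se.fit) - power_aim            # lower bound minus target
  }
  uniroot(lower_gap, interval = c(10, 5000))$root
}

N_point <- solve_N(fit, 0.8, alpha_power_modeling = 1)     # z = 0: point estimate
N_lower <- solve_N(fit, 0.8, alpha_power_modeling = 0.05)  # uses the lower bound
c(point = ceiling(N_point), lower_bound = ceiling(N_lower))
```

Solving the lower bound instead of the point estimate always yields a larger \(N\), which is the price paid for the guarantee described above.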

The output object Result_Power is a list with the following elements:

names(Result_Power)
#>  [1] "N"                     "N_trials"              "est"                  
#>  [4] "se"                    "Ns"                    "fitOK"                
#>  [7] "truth"                 "power"                 "beta"                 
#> [10] "alpha"                 "alpha_power_modeling"  "method"               
#> [13] "search_method"         "power_modeling_method" "test"                 
#> [16] "convergenceRate"       "Performance"           "AveragePerformance"   
#> [19] "seed"                  "model"                 "runtime"              
#> [22] "call"                  "args"

When we apply the summary function to the output of the powerNLSEM function, we get an overview of the most important information about the estimation, with some visual highlights:

summary(Result_Power)
#> -----------------------------------------------------------------------------
#> Model-Implied Simulation-Based Power Estimation: powerNLSEM 0.1.2
#> 
#> Parameters of Interest (POI):
#> Y~X, Y~Z, Y~X:Z
#> 
#> True Values for POI:
#>   Y~X   Y~Z Y~X:Z 
#>   0.3   0.2   0.2 
#> 
#> 
#> Method:
#> UPI
#> 
#> Test:
#> onesided z-Test
#> 
#> Power (optimized for):
#> 0.8
#> 
#> Type I error/Alpha (for significance decision/z-Tests):
#> 0.05
#> 
#> Power Modeling:
#> probit
#> 
#> Type I error/Alpha (for power modeling):
#> 0.05
#> 
#> R (number of replications):
#> 200
#> 
#> Convergence Rate:
#> 0.965 (converged samples: 193)
#> 
#> Seed:
#> 2024
#> 
#> -------------------------------Results---------------------------------------
#> Desired Sample Size:
#> 328
#> 
#> Estimation Performance:
#>                      Y~X           Y~Z       Y~X:Z
#> Bias          0.01267340 -0.0009403001 0.001975726
#> absolute Bias 0.07162958  0.0673287802 0.057565311
#> relative Bias 0.04224466 -0.0047015005 0.009878631
#> RWMSE         0.08962312  0.0841871999 0.073139976
#>  *weighted bias, absolute bias, relative bias,
#>     and root weighted mean squared error
#> -----------------------------------------------------------------------------
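The printed numbers suggest that the relative bias is the (weighted) bias divided by the true value of each POI. The check below copies the values from the summary; the weights themselves are not shown, so only this ratio is reproduced, and the relationship is an inference from the output rather than documented behavior.

```r
# Values copied from the summary output above
truth <- c("Y~X" = 0.3, "Y~Z" = 0.2, "Y~X:Z" = 0.2)
bias  <- c("Y~X" = 0.01267340, "Y~Z" = -0.0009403001, "Y~X:Z" = 0.001975726)
rel_bias <- bias / truth          # relative bias = bias / true value
rel_bias
```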

Result_Power contains far more information than the summary displays.

Result_Power$N
#> [1] 328

is the necessary sample size to ensure that the power is \(\ge .8\).

dim(Result_Power$est) # dimensions
#> [1] 200   3

head(Result_Power$est) # first 6 rows
#>         Y~X       Y~Z     Y~X:Z
#> 1 0.4259735 0.1642598 0.1517858
#> 2 0.3364663 0.2582934 0.1800099
#> 3 0.3177444 0.1617880 0.1866130
#> 4 0.2699603 0.2074662 0.1787506
#> 5 0.1727635 0.2363284 0.2591777
#> 6 0.3324757 0.1781149 0.1983582

is the data.frame containing all parameter estimates, from which the significance decisions are computed using the corresponding standard errors in

head(Result_Power$se) # first 6 rows
#>          Y~X        Y~Z      Y~X:Z
#> 1 0.06618769 0.06789268 0.04367723
#> 2 0.07393442 0.06911093 0.04735818
#> 3 0.05915466 0.06298318 0.04493189
#> 4 0.05719477 0.05868278 0.03609598
#> 5 0.06024498 0.05513426 0.05130208
#> 6 0.06596830 0.06104587 0.04105743
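The summary reports a one-sided \(z\)-test. Assuming it rejects when est/se exceeds qnorm(1 - alpha), the decisions for the six replications printed above can be recomputed from the copied values:

```r
alpha <- 0.05
poi <- c("Y~X", "Y~Z", "Y~X:Z")
# First six rows of Result_Power$est and Result_Power$se, copied from the output
est <- matrix(c(0.4259735, 0.1642598, 0.1517858,
                0.3364663, 0.2582934, 0.1800099,
                0.3177444, 0.1617880, 0.1866130,
                0.2699603, 0.2074662, 0.1787506,
                0.1727635, 0.2363284, 0.2591777,
                0.3324757, 0.1781149, 0.1983582),
              ncol = 3, byrow = TRUE, dimnames = list(NULL, poi))
se  <- matrix(c(0.06618769, 0.06789268, 0.04367723,
                0.07393442, 0.06911093, 0.04735818,
                0.05915466, 0.06298318, 0.04493189,
                0.05719477, 0.05868278, 0.03609598,
                0.06024498, 0.05513426, 0.05130208,
                0.06596830, 0.06104587, 0.04105743),
              ncol = 3, byrow = TRUE, dimnames = list(NULL, poi))
z   <- est / se                      # z-statistics per replication and POI
sig <- z > qnorm(1 - alpha)          # one-sided significance decisions
sig                                  # all TRUE for these six rows
```

With the fitted object, the same comparison applied to Result_Power$est and Result_Power$se (presumably restricted to rows where Result_Power$fitOK is TRUE) would give the decisions that enter the power model.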

head(Result_Power$fitOK)
#> [1] TRUE TRUE TRUE TRUE TRUE TRUE

is a vector of logicals indicating whether each model converged and its results are trustworthy enough to be used in power modeling, with

Result_Power$convergenceRate
#> [1] 0.965

being the convergence rate.
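The convergence rate is the share of converged replications: the summary reports 193 converged samples out of \(R = 200\), i.e., \(193/200 = .965\). A minimal reconstruction with a stand-in vector:

```r
# Stand-in for Result_Power$fitOK: 193 converged and 7 non-converged replications
fitOK <- c(rep(TRUE, 193), rep(FALSE, 7))
sum(fitOK)    # 193 converged samples
mean(fitOK)   # 0.965, the convergence rate
```

For the fitted object, mean(Result_Power$fitOK) should presumably reproduce Result_Power$convergenceRate.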

Result_Power$N_trials
#> [1] 242 328

are the necessary sample sizes calculated in each step of the adaptive search algorithm. They can be used for diagnostics, e.g., to check whether the algorithm has converged.
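To see how such step-wise estimates arise, here is a toy adaptive search. It is a simplified, hypothetical illustration and not the algorithm implemented in powerNLSEM: each step draws sample sizes from an interval, simulates significance decisions from an assumed power curve, refits the probit model on all decisions so far, solves for \(N\), and narrows the interval around that estimate with a made-up rule. The per-step split \(R \cdot s / \sum_{k=1}^{\text{steps}} k\) is also an assumption; it happens to reproduce the 67/133 split printed in the log above for steps = 2 and R = 200.

```r
set.seed(2024)
true_power <- function(N) pnorm(-2 + 0.17 * sqrt(N))   # assumed, unknown in practice

adaptive_search <- function(power_aim = 0.8, steps = 5, R = 2000,
                            interval = c(50, 1000)) {
  R_step <- round(R * seq_len(steps) / sum(seq_len(steps)))  # more replications later
  N_all <- numeric(0); sig_all <- numeric(0); N_trials <- numeric(steps)
  for (s in seq_len(steps)) {
    N_new   <- sample(interval[1]:interval[2], R_step[s], replace = TRUE)
    sig_new <- rbinom(R_step[s], 1, true_power(N_new))       # simulated decisions
    N_all   <- c(N_all, N_new); sig_all <- c(sig_all, sig_new)
    fit <- glm(sig_all ~ sqrt(N_all), family = binomial(link = "probit"))
    b   <- unname(coef(fit))
    N_trials[s] <- ((qnorm(power_aim) - b[1]) / b[2])^2      # N for power_aim
    half <- diff(interval) / 4                                # hypothetical shrink rule
    interval <- round(pmax(10, c(N_trials[s] - half, N_trials[s] + half)))
  }
  ceiling(N_trials)
}

N_trials <- adaptive_search()
N_trials   # estimates should stabilize across steps
```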

Result_Power$power
#> [1] 0.8

Result_Power$beta
#> [1] 0.2

Result_Power$alpha
#> [1] 0.05

are the desired power level, the corresponding \(\beta\)-error level (Type II error level: \(\beta=\mathbb{P}(H_0|H_1)\), since power \(= 1-\beta = \mathbb{P}(H_1 | H_1)\), where \(\mathbb{P}(H_0|H_1)\) denotes the probability of deciding for \(H_0\) when \(H_1\) is true) and the desired \(\alpha\)-error level (Type I error level: \(\alpha=\mathbb{P}(H_1|H_0)\)).

Result_Power$search_method
#> [1] "adaptive"

Result_Power$power_modeling_method
#> [1] "probit"

Result_Power$runtime
#>    user  system elapsed 
#>   4.570   0.065   4.635

Result_Power$seed # general seed
#> [1] 2024

head(Result_Power$args$seeds) # seeds within each simulation
#> [1] 373379621 774681276 889107345 711740832 543694458 652342271

contain information on the search algorithm (here “adaptive”, alternatively “bruteforce”), the method used to model the power (here “probit”, i.e., a probit regression model), the runtime, the general seed and the seeds used within each simulation, which allow individual results such as non-convergences to be replicated.

Plots

The powerNLSEM package offers several plots, which visualize the power:

plot(Result_Power)

plots the model-implied power for each POI against the sample size N. The vertical line indicates the necessary sample size found by the adaptive search algorithm; the horizontal line indicates the desired power level.

plot(Result_Power, se = TRUE)

This plot additionally shows the standard errors of the power model fitted with the chosen power_modeling_method.

plot(Result_Power, se = TRUE, plot = "empirical")

plots the empirical power per sample size together with a LOESS smoother fitted to these values. All plots indicate that the linear effect of Z has the smallest power.
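The idea behind the empirical plot can be sketched in base R: average the 0/1 significance decisions per sample size and smooth them with `stats::loess()`. The decisions below are simulated from an assumed power curve; this illustrates the concept and is not the code behind plot(..., plot = "empirical").

```r
set.seed(2024)
N   <- sample(seq(140, 420, by = 10), 2000, replace = TRUE)
sig <- rbinom(length(N), 1, pnorm(-2 + 0.17 * sqrt(N)))   # assumed power curve

emp <- aggregate(sig, by = list(N = N), FUN = mean)       # empirical power per N
names(emp)[2] <- "power"
smooth <- loess(power ~ N, data = emp)                    # LOESS smoother

plot(emp$N, emp$power, xlab = "N", ylab = "empirical power", ylim = c(0, 1))
lines(emp$N, predict(smooth), col = "blue")
abline(h = 0.8, lty = 2)                                  # desired power level
```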

Find other sample sizes from a fitted model

Sample sizes for power levels other than the one the search was optimized for can be obtained with the reanalyse.powerNLSEM function:

reanalyse.powerNLSEM(Result_Power, 
                     powerLevels = c(.5, .6, .7, .8, .9, .95))
#> $Npower
#> [1] 166 191 227 328 601 906
#> 
#> $power
#> [1] 0.50 0.60 0.70 0.80 0.90 0.95
#> 
#> $beta
#> [1] 0.50 0.40 0.30 0.20 0.10 0.05
#> 
#> $alpha
#> [1] 0.05
#> 
#> $alpha_power_modeling
#> [1] 0.05
#> 
#> $method
#> [1] "UPI"
#> 
#> $test
#> [1] "onesided"
#> 
#> $search_method
#> [1] "adaptive"
#> 
#> $power_modeling_method
#> [1] "probit"
#> 
#> attr(,"class")
#> [1] "powerNLSEM.reanalyzed" "list"

These new values can also be added to the plot:

plot(Result_Power, se = TRUE, 
     power_aim = c(.5, .6, .7, .8, .9, .95))

The plot shows that some of the requested power levels correspond to sample sizes outside the range from which sample sizes were drawn during the search (\(N\in[140, 420]\)), indicating that these values might be less precise.
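Which of the reanalyzed sample sizes lie outside the searched range can be checked directly. The values are copied from the reanalyse.powerNLSEM output above, and the range \([140, 420]\) is taken from the log of the adaptive search.

```r
power_levels <- c(.5, .6, .7, .8, .9, .95)
Npower  <- c(166, 191, 227, 328, 601, 906)      # from reanalyse.powerNLSEM()
support <- c(140, 420)                          # Ns drawn during the search
outside <- Npower < support[1] | Npower > support[2]
power_levels[outside]   # 0.90 0.95
Npower[outside]         # 601 906
```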

Further, if we want more certainty in the power modeling process, we can lower alpha_power_modeling, which widens the confidence band and makes the resulting sample sizes more conservative:

reanalyse.powerNLSEM(Result_Power, 
                     powerLevels = c(.5, .6, .7, .8, .9),
                     alpha_power_modeling = .001)
#> $Npower
#> [1]  182  207  300 -Inf -Inf
#> 
#> $power
#> [1] 0.5 0.6 0.7 0.8 0.9
#> 
#> $beta
#> [1] 0.5 0.4 0.3 0.2 0.1
#> 
#> $alpha
#> [1] 0.05
#> 
#> $alpha_power_modeling
#> [1] 0.001
#> 
#> $method
#> [1] "UPI"
#> 
#> $test
#> [1] "onesided"
#> 
#> $search_method
#> [1] "adaptive"
#> 
#> $power_modeling_method
#> [1] "probit"
#> 
#> attr(,"class")
#> [1] "powerNLSEM.reanalyzed" "list"


plot(Result_Power, se = TRUE, 
     power_aim = c(.5, .6, .7, .8, .9),
     alpha_power_modeling = .001)

If we do not want to use confidence bands in the power modeling process, we can set alpha_power_modeling = 1:

reanalyse.powerNLSEM(Result_Power, 
                     powerLevels = c(.5, .6, .7, .8, .9),
                     alpha_power_modeling = 1)
#> $Npower
#> [1] 127 158 196 246 323
#> 
#> $power
#> [1] 0.5 0.6 0.7 0.8 0.9
#> 
#> $beta
#> [1] 0.5 0.4 0.3 0.2 0.1
#> 
#> $alpha
#> [1] 0.05
#> 
#> $alpha_power_modeling
#> [1] 1
#> 
#> $method
#> [1] "UPI"
#> 
#> $test
#> [1] "onesided"
#> 
#> $search_method
#> [1] "adaptive"
#> 
#> $power_modeling_method
#> [1] "probit"
#> 
#> attr(,"class")
#> [1] "powerNLSEM.reanalyzed" "list"


plot(Result_Power, se = TRUE, 
     power_aim = c(.5, .6, .7, .8, .9),
     alpha_power_modeling = 1)

If alpha_power_modeling = 1 is chosen within the adaptive search of powerNLSEM, the sample sizes are optimized for that value. However, this is not advised, since in approximately half of the replications (repeated runs of the adaptive or brute-force algorithm) the resulting sample size will actually be smaller than the one needed to reach the desired power.
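The "approximately half" can be seen from a short derivation. It assumes the power model is the probit curve in \(\sqrt{N}\) and that the band is a two-sided Wald band on the linear predictor (an assumption about the implementation); \(\alpha\) here denotes alpha_power_modeling and \(N^\ast\) the true required sample size with \(\Phi(\eta(N^\ast)) = \) power_aim.

```latex
\begin{aligned}
\hat\eta(N) &= \hat\beta_0 + \hat\beta_1\sqrt{N}, \qquad
\hat N_\alpha = \min\Big\{N : \Phi\big(\hat\eta(N) - z_{1-\alpha/2}\,\widehat{SE}[\hat\eta(N)]\big) \ge \text{power\_aim}\Big\},\\
\mathbb{P}\big(\hat N_\alpha < N^\ast\big) &\approx \mathbb{P}\big(\hat\eta(N^\ast) - \eta(N^\ast) > z_{1-\alpha/2}\,\widehat{SE}[\hat\eta(N^\ast)]\big) \approx \frac{\alpha}{2},\\
\alpha = 1:\quad z_{0.5} = 0 \;&\Rightarrow\; \hat N_1 = \left(\frac{\Phi^{-1}(\text{power\_aim}) - \hat\beta_0}{\hat\beta_1}\right)^{2}
\quad\text{and}\quad \mathbb{P}\big(\hat N_1 < N^\ast\big) \approx \tfrac{1}{2}.
\end{aligned}
```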
