The package semlbci
(Cheung &
Pesigan, 2023) includes functions for finding the likelihood-based
confidence intervals (LBCIs) of parameters in the output of a structural
equation modeling (SEM) function. Currently, it supports the output from
lavaan::lavaan()
and its wrappers, such as
lavaan::sem()
and lavaan::cfa()
.
The latest stable version can be installed from GitHub:
Further information about semlbci
can be found in Cheung and Pesigan
(2023).
The package has a dataset, simple_med
, with three
variables, x
, m
, and y
. Let us
fit a simple mediation model to this dataset.
library(semlbci)
data(simple_med)
dat <- simple_med
head(dat)
#> x m y
#> 1 -0.3447375 7.284273 -5.636897
#> 2 -0.3658919 -5.449121 -4.525402
#> 3 -0.8294968 -7.016254 -7.823257
#> 4 -0.3389654 4.367018 1.563098
#> 5 -0.9628162 -4.015469 -7.288511
#> 6 -1.0749302 -11.538140 -4.153572
library(lavaan)
mod <-
"
m ~ a*x
y ~ b*m
ab := a * b
"
# We set fixed.x = FALSE because we will also find the LBCIs for
# standardized solution
fit <- sem(mod, simple_med, fixed.x = FALSE)
To illustrate how to find the LBCIs for user-defined parameters, we
labelled the m ~ x
path by a
, the
y ~ m
path by b
, and defined the indirect
effect, ab
, by a * b
.
This is the summary:
summary(fit, standardized = TRUE)
#> lavaan 0.6.17 ended normally after 1 iteration
#>
#> Estimator ML
#> Optimization method NLMINB
#> Number of model parameters 5
#>
#> Number of observations 200
#>
#> Model Test User Model:
#>
#> Test statistic 10.549
#> Degrees of freedom 1
#> P-value (Chi-square) 0.001
#>
#> Parameter Estimates:
#>
#> Standard errors Standard
#> Information Expected
#> Information saturated (h1) model Structured
#>
#> Regressions:
#> Estimate Std.Err z-value P(>|z|) Std.lv Std.all
#> m ~
#> x (a) 1.676 0.431 3.891 0.000 1.676 0.265
#> y ~
#> m (b) 0.535 0.073 7.300 0.000 0.535 0.459
#>
#> Variances:
#> Estimate Std.Err z-value P(>|z|) Std.lv Std.all
#> .m 34.710 3.471 10.000 0.000 34.710 0.930
#> .y 40.119 4.012 10.000 0.000 40.119 0.790
#> x 0.935 0.094 10.000 0.000 0.935 1.000
#>
#> Defined Parameters:
#> Estimate Std.Err z-value P(>|z|) Std.lv Std.all
#> ab 0.897 0.261 3.434 0.001 0.897 0.122
The main function to find the LBCIs for free parameters is
semlbci()
. This should be the only function used by normal
users. We will first illustrate its usage by some examples, and then
present other technical details in the following section.
All free parameters can be specified in lavaan
style.
For example, the path from m
to y
is denoted
by "y ~ m"
, and the covariance or correlation between
x
and m
(not in the example) is denoted by
"x ~~ m"
(order does not matter).
Since version 0.10.4.25, lavaan
model syntax operators
can be used to represent all parameters of the same type:
"~"
for regression paths, "~~"
for variances
and covariances, "=~"
for factor loadings, and
":="
for all user-defined parameters. For example, the
following call and the one above find LBCIs for the same set of
parameters:
The output is the parameter table of the fitted lavaan
object, with two columns added, lbci_lb
and
lbci_ub
, the likelihood-based lower bounds and upper
bounds, respectively.
out
#>
#> Results:
#> id lhs op rhs label est lbci_lb lbci_ub lb ub cl_lb cl_ub
#> 1 1 m ~ x a 1.676 0.828 2.525 0.832 2.520 0.950 0.950
#> 2 2 y ~ m b 0.535 0.391 0.679 0.391 0.679 0.950 0.950
#>
#> Annotation:
#> * lbci_lb, lbci_ub: The lower and upper likelihood-based bounds.
#> * est: The point estimates from the original lavaan output.
#> * lb, ub: The original lower and upper bounds, extracted from the
#> original lavaan output. Usually Wald CIs for free parameters and
#> delta method CIs for user-defined parameters
#> * cl_lb, cl_ub: One minus the p-values of chi-square difference tests
#> at the bounds. Should be close to the requested level of
#> confidence, e.g., .95 for 95% confidence intervals.
#>
#> Call:
#> semlbci(sem_out = fit, pars = c("~"))
In this example, the point estimate of the unstandardized coefficient
from x
to m
is 1.676, and the LBCI is 0.828 to
2.525.
By default, factor loadings, covariances (except for error
covariances), and regression paths are included in the search.
Therefore, pars
can be omitted, although the search will
take time to run for a big model. In this case, it is advised to enable
parallel processing by add parallel = TRUE
and set
ncpus
to the number of processes to run (these arguments
are explained later):
out <- semlbci(sem_out = fit,
parallel = TRUE,
ncpus = 6)
print(out,
annotation = FALSE)
#>
#> Results:
#> id lhs op rhs label est lbci_lb lbci_ub lb ub cl_lb cl_ub
#> 1 1 m ~ x a 1.676 0.828 2.525 0.832 2.520 0.950 0.950
#> 2 2 y ~ m b 0.535 0.391 0.679 0.391 0.679 0.950 0.950
#> 6 6 ab := a*b ab 0.897 0.427 1.464 0.385 1.409 0.950 0.950
For users familiar with the column names, the annotation can be
disabled by calling print()
and add
annotation = FALSE
:
print(out, annotation = FALSE)
#>
#> Results:
#> id lhs op rhs label est lbci_lb lbci_ub lb ub cl_lb cl_ub
#> 1 1 m ~ x a 1.676 0.828 2.525 0.832 2.520 0.950 0.950
#> 2 2 y ~ m b 0.535 0.391 0.679 0.391 0.679 0.950 0.950
#> 6 6 ab := a*b ab 0.897 0.427 1.464 0.385 1.409 0.950 0.950
The results can also be printed in a lavaan
-like format
by calling print()
, setting sem_out
to the
original fit object (fit
in this example), and add
output = "lavaan"
:
print(out,
sem_out = fit,
output = "lavaan")
#> Likelihood-Based CI Notes:
#>
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#> bounds.
#>
#> Parameter Estimates:
#>
#> Standard errors Standard
#> Information Expected
#> Information saturated (h1) model Structured
#>
#> Regressions:
#> Estimate Std.Err z-value P(>|z|) lb.lower lb.upper
#> m ~
#> x (a) 1.676 0.431 3.891 0.000 0.828 2.525
#> y ~
#> m (b) 0.535 0.073 7.300 0.000 0.391 0.679
#>
#> Defined Parameters:
#> Estimate Std.Err z-value P(>|z|) lb.lower lb.upper
#> ab 0.897 0.261 3.434 0.001 0.427 1.464
By default, the original confidence intervals will not be printed.
See the help page of print.semlbci()
for other options
available.
To find the LBCI for a user-defined parameter, use
label :=
, where label
is the label used in the
model specification. The definition of this parameter can be omitted.
The content after :=
will be ignored by
semlbci()
.
out <- semlbci(sem_out = fit,
pars = c("ab := "))
print(out,
sem_out = fit,
output = "lavaan")
#> Likelihood-Based CI Notes:
#>
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#> bounds.
#>
#> Parameter Estimates:
#>
#> Standard errors Standard
#> Information Expected
#> Information saturated (h1) model Structured
#>
#> Defined Parameters:
#> Estimate Std.Err z-value P(>|z|) lb.lower lb.upper
#> ab 0.897 0.261 3.434 0.001 0.427 1.464
In this example, the point estimate of the indirect effect is 0.897, and the LBCI is 0.427 to 1.464.
(Note: In some examples, we added annotation = FALSE
to
suppress the annotation in the printout to minimize the length of this
vignette.)
By the default, the unstandardized solution is used by
semlbci()
. If the LBCIs for the standardized solution
solution are needed, set standardized = TRUE
.
This one also works:
This is the printout, in lavaan
-style:
print(out,
sem_out = fit,
output = "lavaan")
#> Likelihood-Based CI Notes:
#>
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#> bounds.
#>
#> Standardized Estimates Only
#>
#> Standard errors Standard
#> Information Expected
#> Information saturated (h1) model Structured
#>
#> Regressions:
#> Standardized Std.Err z-value P(>|z|) lb.lower lb.upper
#> m ~
#> x (a) 0.265 0.066 4.035 0.000 0.133 0.389
#> y ~
#> m (b) 0.459 0.056 8.215 0.000 0.342 0.561
If LBCIs are for the standardized solution and output
set to "lavaan"
when printing the results, the parameter
estimates, standard errors, z-values, and p-values are
those from the standardized solution.
The LBCIs for standardized user-defined parameters can be requested similarly.
out <- semlbci(sem_out = fit,
pars = c("ab :="),
standardized = TRUE)
print(out,
sem_out = fit,
output = "lavaan")
#> Likelihood-Based CI Notes:
#>
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#> bounds.
#>
#> Standardized Estimates Only
#>
#> Standard errors Standard
#> Information Expected
#> Information saturated (h1) model Structured
#>
#> Defined Parameters:
#> Standardized Std.Err z-value P(>|z|) lb.lower lb.upper
#> ab 0.122 0.034 3.538 0.000 0.059 0.194
out <- semlbci(sem_out = fit,
standardized = TRUE)
print(out,
sem_out = fit,
output = "lavaan")
#> Likelihood-Based CI Notes:
#>
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#> bounds.
#>
#> Standardized Estimates Only
#>
#> Standard errors Standard
#> Information Expected
#> Information saturated (h1) model Structured
#>
#> Regressions:
#> Standardized Std.Err z-value P(>|z|) lb.lower lb.upper
#> m ~
#> x (a) 0.265 0.066 4.035 0.000 0.133 0.389
#> y ~
#> m (b) 0.459 0.056 8.215 0.000 0.342 0.561
#>
#> Defined Parameters:
#> Standardized Std.Err z-value P(>|z|) lb.lower lb.upper
#> ab 0.122 0.034 3.538 0.000 0.059 0.194
semlbci()
sem_out
and pars
: The Fit Object and the
ParametersThe only required argument for semlbci()
is
sem_out
, the fit object from lavaan::lavaan()
or its wrappers (e.g., lavann::cfa()
and
lavaan::sem()
). By default, semlbci()
will
find the LBCIs for all free parameters (except for variances and error
variances) and user-defined parameters, which can take a long time for a
model with many parameters. Moreover, LBCI is usually used when
Wald-type confidence interval may not be suitable, for example, forming
the confidence interval for an indirect effect or a parameter in the
standardized solution. These parameters may have sampling distributions
that are asymmetric or otherwise substantially nonnormal due to bounded
parameter spaces or other reasons.
Therefore, it is recommended to call semlbci()
without
specifying any parameters. If the time to run is long, then call
semlbci()
only for selected parameters. The argument
pars
should be a model syntax or a vector of strings which
specifies the parameters for which LBCIs will be formed (detailed
below).
If time is not a concern, for example, when users want to explore the
LBCIs for all free and user-defined parameters in a final model, then
pars
can be omitted to request the LBCIs for all free
parameters (except for variances and covariances) and user-defined
parameters (if any) in a model.
ciperc
: The Level of ConfidenceBy default, the 95% LBCIs for the unstandardized solution will be
formed. To change the level of confidence, set the argument
ciperc
to the desired coverage probability, e.g., .95 for
95%, .90 for 90%.
standardized
: Whether Standardized Solution Is
UsedBy default, the LBCIs for the unstandardized solution will be formed.
If the LBCIs for the standardized solution are desired, set
standardized = TRUE
. Note that for some models it can be
much slower to find the LBCIs for the standardized solution than for the
unstandardized solution.
parallel
and ncpus
The search for the bounds needs to be done separately for each bound
and this can take a long time for a model with many parameters and/or
with equality constraints. Therefore, parallel processing should always
be enabled by setting parallel
to TRUE
and
ncpus
to a number smaller than the number of available
cores. For example, without parallel processing, the following search
took about 28 seconds on Intel i7-8700:
data(HolzingerSwineford1939)
mod_test <-
'
visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
'
fit_cfa <- cfa(model = mod_test,
data = HolzingerSwineford1939)
semlbci(fit_cfa)
With parallel processing enabled and using 6 cores, it took about 20 seconds.
The speed difference can be much greater for a model with many parameters and some equality constraints.
Enabling parallel processing also has the added benefit of showing the progress in real time.
try_k_more_times
and semlbci_out
For some models and some parameters, the search may be difficult. By
default, semlbci()
will try two more times, successively
changing some settings internally. If still failed in forming the LBCI,
users can try to set try_k_more_times
to a larger number
slightly larger than 2 (the default value) to see whether it can help
forming the LBCI. This can be done without forming other LBCIs again if
the output of semlbci()
is passed to the new call using
semlbci_out
.
For example, assume that some LBCIs could not be found in the first run:
We can call semlbci()
again, increasing
try_k_more_times
to 5, and set semlbci_out
to
lbci_some_failed
.
It will only form LBCIs for parameters failed in the first one. The
output, lbci_try_again
, will have the original LBCIs plus
the new ones, if the search succeeds.
For detailed documentation of other arguments, please refer to the
help page of semlbci()
. Advanced users who want to tweak
the optimization options can check the help pages of
ci_bound_wn_i()
and ci_i_one()
,
semlbci()
supports multiple-group models. For example,
this is a two-group confirmatory factor analysis model with equality
constraints:
data(HolzingerSwineford1939)
mod_cfa <-
'
visual =~ x1 + v(lambda2, lambda2)*x2 + v(lambda3, lambda3)*x3
textual =~ x4 + v(lambda5, lambda5)*x5 + v(lambda6, lambda6)*x6
speed =~ x7 + v(lambda8, lambda8)*x8 + v(lambda9, lambda9)*x9
'
fit_cfa <- cfa(model = mod_cfa,
data = HolzingerSwineford1939,
group = "school")
The factor correlations between group are not constrained to be equal.
parameterEstimates(fit_cfa)[c(22, 23, 58, 59), ]
#> lhs op rhs block group label est se z pvalue ci.lower
#> 22 visual ~~ textual 1 1 0.416 0.097 4.271 0.000 0.225
#> 23 visual ~~ speed 1 1 0.169 0.064 2.643 0.008 0.044
#> 58 visual ~~ textual 2 2 0.437 0.099 4.423 0.000 0.243
#> 59 visual ~~ speed 2 2 0.314 0.079 3.958 0.000 0.158
#> ci.upper
#> 22 0.606
#> 23 0.294
#> 58 0.631
#> 59 0.469
This is the LBCI for covariance between visual ability and textual ability:
fcov <- semlbci(fit_cfa,
pars = c("visual ~~ textual"))
print(fcov,
sem_out = fit_cfa,
output = "lavaan")
#> Likelihood-Based CI Notes:
#>
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#> bounds.
#>
#> Parameter Estimates:
#>
#> Standard errors Standard
#> Information Expected
#> Information saturated (h1) model Structured
#>
#>
#> Group 1 [Pasteur]:
#>
#> Covariances:
#> Estimate Std.Err z-value P(>|z|) lb.lower lb.upper
#> visual ~~
#> textual 0.416 0.097 4.271 0.000 0.221 0.654
#>
#>
#> Group 2 [Grant-White]:
#>
#> Covariances:
#> Estimate Std.Err z-value P(>|z|) lb.lower lb.upper
#> visual ~~
#> textual 0.437 0.099 4.423 0.000 0.263 0.663
This is the LBCI for correlation between visual ability and textual ability:
fcor <- semlbci(fit_cfa,
pars = c("visual ~~ textual"),
standardized = TRUE)
print(fcor,
sem_out = fit_cfa,
output = "lavaan")
#> Likelihood-Based CI Notes:
#>
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#> bounds.
#>
#> Standardized Estimates Only
#>
#> Standard errors Standard
#> Information Expected
#> Information saturated (h1) model Structured
#>
#>
#> Group 1 [Pasteur]:
#>
#> Covariances:
#> Standardized Std.Err z-value P(>|z|) lb.lower lb.upper
#> visual ~~
#> textual 0.485 0.087 5.601 0.000 0.291 0.640
#>
#>
#> Group 2 [Grant-White]:
#>
#> Covariances:
#> Standardized Std.Err z-value P(>|z|) lb.lower lb.upper
#> visual ~~
#> textual 0.540 0.086 6.317 0.000 0.357 0.692
Note that the example above can take more than one minute to one if parallel processing is not enabled.
semlbci()
also supports the robust LBCI proposed by Falk
(2018). To form robust LBCI, the model must be fitted with robust test
statistics requested (e.g., estimator = "MLR"
). To request
robust LBCIs, add robust = "satorra.2000"
when calling
semlbci()
.
We use the simple mediation model as an example:
fit_robust <- sem(mod, simple_med,
fixed.x = FALSE,
estimator = "MLR")
fit_lbci_ab_robust <- semlbci(fit_robust,
pars = "ab := ",
robust = "satorra.2000")
print(fit_lbci_ab_robust,
sem_out = fit_robust,
output = "lavaan")
#> Likelihood-Based CI Notes:
#>
#> - lb.lower, lb.upper: The lower and upper likelihood-based confidence
#> bounds.
#>
#> Parameter Estimates:
#>
#> Standard errors Sandwich
#> Information bread Observed
#> Observed information based on Hessian
#>
#> Defined Parameters:
#> Estimate Std.Err z-value P(>|z|) lb.lower lb.upper
#> ab 0.897 0.305 2.941 0.003 0.353 1.571
semlbci()
support forming the LBCIs for most free
parameters. Not illustrated above but LBCIs can be formed for path
coefficients between latent variables and also user-defined parameters
based on latent-level parameters, such as an indirect effect from one
latent variable to another.
More examples can be found in the “examples” folders in the OSF page for this package.
The following is a summary of the limitations of
semlbci()
. Please refer to check_sem_out()
for
the full list of limitations. This function is called by
semlbci()
to check the sem_out
object, and
will raise warnings or errors as appropriate.
The function semlbci()
currently supports
lavaan::lavaan()
results estimated by maximum likelihood
(ML
), full information maximum likelihood for missing data
(fiml
), and their robust variants (e.g.,
MLM
).
This package currently supports single and multiple group models with continuous variables. It may work for a model with ordered variables but this is not officially tested.
The current and preferred method is the one proposed by Wu and Neale
(2012), illustrated by Pek and Wu (2015). The current implementation in
semlbci()
does not check whether a parameter is near its
boundary. The more advanced methods by Pritikin, Rappaport, and Neale
(2017) will be considered in future development.
A detailed presentation of the internal workflow of
semlbci()
can be found in the
vignette("technical_workflow", package = "semlbci")
. Users
interested in calling the lowest level function,
ci_bound_wn_i()
, can see some illustrative examples in
vignette("technical_searching_one_bound", package = "semlbci")
.
Cheung, S. F., & Pesigan, I. J. A. (2023). semlbci: An R package for forming likelihood-based confidence intervals for parameter estimates, correlations, indirect effects, and other derived parameters. Structural Equation Modeling: A Multidisciplinary Journal. 30(6), 985–999. https://doi.org/10.1080/10705511.2023.2183860
Falk, C. F. (2018). Are robust standard errors the best approach for interval estimation with nonnormal data in structural equation modeling? Structural Equation Modeling: A Multidisciplinary Journal, 25(2), 244-266. https://doi.org/10.1080/10705511.2017.1367254
Pek, J., & Wu, H. (2015). Profile likelihood-based confidence intervals and regions for structural equation models. Psychometrika, 80(4), 1123–1145. https://doi.org/10.1007/s11336-015-9461-1
Pritikin, J. N., Rappaport, L. M., & Neale, M. C. (2017). Likelihood-based confidence intervals for a parameter with an upper or lower bound. Structural Equation Modeling: A Multidisciplinary Journal, 24(3), 395-401. https://doi.org/10.1080/10705511.2016.1275969
Wu, H., & Neale, M. C. (2012). Adjusted confidence intervals for a bounded parameter. Behavior Genetics, 42(6), 886–898. https://doi.org/10.1007/s10519-012-9560-z