The empirical sandwich variance estimator is known to underestimate \(V(\theta)\) in small samples (Fay and Graubard 2001). Particularly in the context of GEE, many authors have proposed corrections that modify components of \(\hat{\Sigma}\) and/or assume that \(\hat{\theta}\) follows a \(t\) (or \(F\)), rather than Normal, distribution with some estimated degrees of freedom. Many of the proposed corrections modify some combination of the \(A_i\), \(A_m\), \(B_i\), or \(B_m\) matrices.
geex provides an API that allows users to specify functions that utilize these matrices to form corrections. A finite sample correction function at a minimum takes the argument components, which is an object of class sandwich_components. For example,
correct_by_nothing <- function(components){
  A <- grab_bread(components)
  B <- grab_meat(components)
  compute_sigma(A = A, B = B)
}
is a correctly formed function that performs no correction. Additional arguments may also be specified, as in the bias correction example below.
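As a sketch of a correction function taking an additional argument (this function is not part of geex; the name correct_by_inflation and the constant k are purely illustrative), one could rescale the uncorrected sandwich estimate by a user-supplied factor:

# Illustrative sketch only: inflate the uncorrected sandwich estimate by k.
correct_by_inflation <- function(components, k){
  A <- grab_bread(components)
  B <- grab_meat(components)
  k * compute_sigma(A = A, B = B)
}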
The geex package includes the bias correction and degrees of freedom corrections proposed by Fay and Graubard (2001) in the correct_by_fay_bias and correct_by_fay_df functions, respectively. The following demonstrates the construction and use of the bias correction. Fay and Graubard (2001) proposed the modified variance estimator \(\hat{\Sigma}^{bc}(b) = A_m^{-1} B_m^{bc}(b) \{A_m^{-1}\}^{\intercal}/m\), where:
\[\begin{equation} \label{eq:bc} B^{bc}_m(b) = \sum_{i = 1}^m H_i(b) B_i H_i(b)^{\intercal}, \end{equation}\]
\[\begin{equation} \label{eq:H} H_i(b) = \{1 - \min(b, \{A_i A^{-1}\}_{jj}) \}^{-1/2}, \end{equation}\]
and \(W_{jj}\) denotes the \((j, j)\) element of a matrix \(W\). When \(\{A_i A^{-1}\}_{jj}\) is close to 1, the adjustment to \(\hat{\Sigma}^{bc}(b)\) may be extreme, so the constant \(b\) is chosen by the analyst to limit over-adjustment.
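The capping role of \(b\) can be seen numerically; the snippet below is purely illustrative (the values of h standing in for \(\{A_i A^{-1}\}_{jj}\) are made up) and uses only base R:

# Illustrative only: h stands in for {A_i A^{-1}}_{jj}; with b = 0.75 the
# adjustment factor {1 - min(b, h)}^(-1/2) cannot exceed (1 - 0.75)^(-1/2) = 2.
h <- c(0.10, 0.50, 0.90, 0.99)
(1 - pmin(0.75, h))^(-0.5)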
The bias corrected estimator \(\hat{\Sigma}^{bc}(b)\) can be implemented
in geex
by the following function:
bias_correction <- function(components, b){
  A    <- grab_bread(components)
  A_i  <- grab_bread_list(components)
  B_i  <- grab_meat_list(components)
  Ainv <- solve(A)

  H_i <- lapply(A_i, function(m){
    diag( (1 - pmin(b, diag(m %*% Ainv) ) )^(-0.5) )
  })

  Bbc_i <- lapply(seq_along(B_i), function(i){
    H_i[[i]] %*% B_i[[i]] %*% H_i[[i]]
  })
  Bbc <- apply(simplify2array(Bbc_i), 1:2, sum)

  compute_sigma(A = A, B = Bbc)
}
The compute_sigma function simply computes \(A^{-1} B \{A^{-1}\}^{\intercal}\). Note that geex computes \(A_m\) and \(B_m\) as the sums of \(A_i\) and \(B_i\) rather than the means, hence the appropriate function in the apply call is sum and not mean. To use this bias correction, the m_estimate function accepts a named list of corrections to perform through its corrections argument. Each element of the list is built with correction, which takes the correction function to apply plus any of its arguments other than components.
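For instance, the built-in Fay and Graubard bias correction could be requested as sketched below; my_eefun, my_data, and the unit variable "id" are hypothetical placeholders, and the choice b = 0.75 is arbitrary.

# Sketch: requesting the built-in Fay-Graubard bias correction.
# my_eefun, my_data, and "id" are hypothetical placeholders.
m_estimate(
  estFUN = my_eefun, data = my_data, units = "id",
  corrections = list(
    fay_bias = correction(correct_by_fay_bias, b = 0.75)))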
Here we compare the geex implementation of GEE with an exchangeable correlation matrix to Fay's saws package. The estimating equations are:
\[\begin{equation} \label{gee} \sum_{i= 1}^m \psi(\mathbf{Y}_i, \mathbf{X}_i, \beta) = \sum_{i = 1}^m \mathbf{D}_i^{\intercal} \mathbf{V}_i^{-1} (\mathbf{Y}_i - \mathbf{\mu}(\beta)) = 0 \end{equation}\]
where \(\mathbf{D}_i = \partial
\mathbf{\mu}/\partial \mathbf{\beta}\). The covariance matrix is
modeled by \(\mathbf{V}_i = \phi
\mathbf{A}_i^{0.5} \mathbf{R}(\alpha) \mathbf{A}_i^{0.5}\). The
matrix \(\mathbf{R}(\alpha)\) is the
“working” correlation matrix, which in this example is an exchangeable
matrix with off diagonal elements \(\alpha\). The matrix \(\mathbf{A}_i\) is a diagonal matrix with
elements containing the variance functions of \(\mu\). The equations in \(\eqref{gee}\) can be translated into an
eeFUN
as:
gee_eefun <- function(data, formula, family){
  X <- model.matrix(object = formula, data = data)
  Y <- model.response(model.frame(formula = formula, data = data))
  n <- nrow(X)
  function(theta, alpha, psi){
    mu <- family$linkinv(X %*% theta)
    Dt <- t(X) %*% diag(as.numeric(mu), nrow = n)
    A  <- diag(as.numeric(family$variance(mu)), nrow = n)
    R  <- matrix(alpha, nrow = n, ncol = n)
    diag(R) <- 1
    V  <- psi * (sqrt(A) %*% R %*% sqrt(A))
    Dt %*% solve(V) %*% (Y - mu)
  }
}
This eeFUN
treats the correlation parameter \(\alpha\) and scale parameter \(\phi\) as fixed, though some estimation
algorithms use an iterative procedure that alternates between estimating
\(\beta\) and these parameters. By
customizing the root finding function, such an algorithm could be
implemented using geex
[see
vignette("geex_root_solvers")
for more information].
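As a sketch of that entry point (not a full iterative algorithm), root finding could instead be left on for this eeFUN by supplying starting values through setup_root_control; holding alpha = 0 and psi = 1 fixed, and the starting values themselves, are illustrative assumptions only:

# Sketch only: solve for beta with alpha and psi held at illustrative fixed
# values; setup_root_control() passes `start` to the default root solver.
m_estimate(
  estFUN = gee_eefun, data = warpbreaks, units = 'wool',
  outer_args = list(formula = breaks ~ tension, family = gaussian()),
  inner_args = list(alpha = 0, psi = 1),
  root_control = setup_root_control(start = c(30, 0, 0)))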
We use this example to compare covariance estimates with those obtained from the gee function, so root finding computations are turned off and the gee \(\beta\) estimates are supplied as the roots instead. Estimates for \(\alpha\) and \(\phi\) are also extracted from the gee results and passed to m_estimate. This example shows that an eeFUN can accept additional arguments to be passed to either the outer (data) function or the inner (theta) function. Unlike previous examples, the independent units are the types of wool, which is set in m_estimate by the units argument.
g <- gee::gee(breaks ~ tension, id = wool, data = warpbreaks,
              corstr = "exchangeable")
guo <- saws::geeUOmega(g)

library(geex)
results <- m_estimate(
  estFUN = gee_eefun, data = warpbreaks,
  units = 'wool', roots = coef(g), compute_roots = FALSE,
  outer_args = list(formula = breaks ~ tension,
                    family = gaussian()),
  inner_args = list(alpha = g$working.correlation[1, 2],
                    psi = g$scale),
  corrections = list(
    bias_correction_.1 = correction(bias_correction, b = .1),
    bias_correction_.3 = correction(bias_correction, b = .3)))
In the geex output, the item corrections contains a list with the result of computing each element of the list passed to the corrections argument. Comparing the geex results to the corresponding results from the saws::geeUOmega function, the maximum difference for any of the corrected covariance matrix estimates is 1.1e-09.
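The corrected matrices can be pulled out of the results object; the sketch below assumes a get_corrections accessor for the geex results (an assumption about the accessor name) together with the element names given in the corrections list above:

# Sketch: extract the corrected covariance estimates computed above.
# Assumes a get_corrections() accessor for the geex results object.
corrected <- get_corrections(results)
corrected$bias_correction_.1
corrected$bias_correction_.3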