geex
This vignette implements examples from, and is meant to be read alongside, Stefanski and Boos (2002) (SB). Examples 4-9 use the geexex data set. For information on how the data are generated, run help('geexex').
library(geex)
Example 4 calculates an instrumental variable estimator. The variables (\(Y_3, W_1, Z_1\)) are observed according to:
\[ Y_{3i} = \alpha + \beta X_{1i} + \sigma_{\epsilon}\epsilon_{1, i} \]
\[ W_{1i} =\beta X_{1i} + \sigma_{U}\epsilon_{2, i} \]
and
\[ Z_{1i} = \gamma + \delta X_{1i} + \sigma_{\tau}\epsilon_{3, i}. \]
\(Y_3\) is a linear function of a latent variable \(X_1\), whose coefficient \(\beta\) is of interest. \(W_1\) is a measurement of the unobserved \(X_1\), and \(Z_1\) is an instrumental variable for \(X_1\). The instrumental variable estimator \(\hat{\beta}_{IV}\) is the ratio of the least squares regression slope of \(Y_3\) on \(Z_1\) to the slope of \(W_1\) on \(Z_1\). The \(\psi\) function below includes this estimator as \(\hat{\theta}_4 = \hat{\beta}_{IV}\). In the example data, 100 observations are generated where \(X_1 \sim \Gamma(\text{shape} = 5, \text{rate} = 1)\), \(\alpha = 2\), \(\beta = 3\), \(\gamma = 2\), \(\delta = 1.5\), \(\sigma_{\epsilon} = \sigma_{\tau} = 1\), \(\sigma_U = .25\), and \(\epsilon_{\cdot, i} \sim N(0,1)\).
\[ \psi(Y_{3i}, Z_{1i}, W_{1i}, \mathbf{\theta}) = \begin{pmatrix} \theta_1 - Z_{1i} \\ \theta_2 - W_{1i} \\ (Y_{3i} - \theta_3 W_{1i})(\theta_2 - W_{1i}) \\ (Y_{3i} - \theta_4 W_{1i})(\theta_1 - Z_{1i}) \\ \end{pmatrix} \]
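As a quick check of this ratio form (an illustration added here, not part of the original example code), \(\hat{\beta}_{IV}\) can be computed directly from two least squares fits:

# Illustrative check: the IV estimator as a ratio of OLS slopes,
# slope(Y3 ~ Z1) / slope(W1 ~ Z1); should agree with theta_4 below
b_yz <- coef(lm(Y3 ~ Z1, data = geexex))['Z1']
b_wz <- coef(lm(W1 ~ Z1, data = geexex))['Z1']
unname(b_yz / b_wz)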
SB4_estFUN <- function(data){
  Z1 <- data$Z1; W1 <- data$W1; Y3 <- data$Y3
  function(theta){
    c(theta[1] - Z1,
      theta[2] - W1,
      (Y3 - (theta[3] * W1)) * (theta[2] - W1),
      (Y3 - (theta[4] * W1)) * (theta[1] - Z1)
    )
  }
}
estimates <- m_estimate(
  estFUN = SB4_estFUN,
  data   = geexex,
  root_control = setup_root_control(start = c(1, 1, 1, 1)))
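Throughout these examples, results are pulled from the fitted object with coef and vcov, as in the comparisons that follow:

coef(estimates)  # estimated roots of the estimating equations
vcov(estimates)  # empirical sandwich variance estimate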
The R packages AER and ivpack can compute the IV estimator and sandwich variance estimators, respectively, which is done below for comparison.
ivfit <- AER::ivreg(Y3 ~ W1 | Z1, data = geexex)
iv_se <- ivpack::cluster.robust.se(ivfit, clusterid = 1:nrow(geexex))

coef(ivfit)[2]
coef(estimates)[4]
iv_se[2, 'Std. Error']
sqrt(vcov(estimates)[4, 4])
This example shows the link between influence curves and the Hodges-Lehmann estimator.
\[ \psi(Y_{2i}, \theta) = \begin{pmatrix} IC_{\hat{\theta}_{HL}}(Y_2; \theta_0) - (\theta_1 - \theta_0) \\ \end{pmatrix} \]
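For reference, a standard result gives the influence curve of the one-sample Hodges-Lehmann estimator in closed form:

\[ IC_{\hat{\theta}_{HL}}(y; \theta_0) = \frac{F_0(y - \theta_0) - 1/2}{\int f_0^2(u) \, du}, \]

where \(F_0\) and \(f_0\) are the distribution and density functions of the assumed symmetric model. The numerator and denominator are computed separately in the functions that follow.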
The functions used in SB5_estFUN are defined below:
F0 <- function(y, theta0, distrFUN = pnorm){
  distrFUN(y - theta0, mean = 0)
}

f0 <- function(y, densFUN){
  densFUN(y, mean = 0)
}

integrand <- function(y, densFUN = dnorm){
  f0(y, densFUN = densFUN)^2
}

IC_denom <- integrate(integrand, lower = -Inf, upper = Inf)$value
SB5_estFUN <- function(data){
  Yi <- data[['Y2']]
  function(theta){
    (1/IC_denom) * (F0(Yi, theta[1]) - 0.5)
  }
}
estimates <- m_estimate(
  estFUN = SB5_estFUN,
  data = geexex,
  root_control = setup_root_control(start = 2))
The hl.loc function of the ICSNP package computes the Hodges-Lehmann estimator, and SB gave a closed form estimator for the variance.
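That closed form follows from the influence curve above: under the model, \(F_0(Y - \theta_0)\) is uniform on \((0, 1)\), so \(E[IC^2] = \frac{1}{12}/(\int f_0^2)^2\), giving asymptotic variance

\[ V(\hat{\theta}_{HL}) = \frac{1}{12 \left( \int f_0^2(u) \, du \right)^2 n}, \]

which is what Sigma_cls computes below.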
theta_cls <- ICSNP::hl.loc(geexex$Y2)
Sigma_cls <- 1/(12 * IC_denom^2) / nrow(geexex)

results <- list(geex = list(parameters = coef(estimates), vcov = vcov(estimates)),
                cls = list(parameters = theta_cls, vcov = Sigma_cls))
results
## $geex
## $geex$parameters
## [1] 2.026376
##
## $geex$vcov
## [,1]
## [1,] 0.01040586
##
##
## $cls
## $cls$parameters
## [1] 2.024129
##
## $cls$vcov
## [1] 0.01047198
This example illustrates estimation with a nonsmooth \(\psi\) function for computing the Huber (1964) estimator of the center of a symmetric distribution.
\[ \psi_k(Y_{1i}, \theta) = \begin{cases} Y_{1i} - \theta & \text{when } |Y_{1i} - \theta| \leq k \\ k \cdot \text{sgn}(Y_{1i} - \theta) & \text{when } |Y_{1i} - \theta| > k \\ \end{cases} \]
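Where it exists, the derivative of \(\psi_k\) is simply an indicator,

\[ \psi_k'(x) = \begin{cases} 1 & \text{when } |x| < k \\ 0 & \text{when } |x| > k, \\ \end{cases} \]

a fact used in the variance comparison later in this example.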
SB6_estFUN <- function(data, k = 1.5){
  Y1 <- data$Y1
  function(theta){
    x <- Y1 - theta[1]
    if(abs(x) <= k) x else sign(x) * k
  }
}
estimates <- m_estimate(
  estFUN = SB6_estFUN,
  data = geexex,
  root_control = setup_root_control(start = 3))
The point estimate from geex is compared to the estimate obtained from the huber function in the MASS package. The variance estimate is compared to an estimator suggested by SB: \(A_m = \sum_i [-\psi_k'(Y_i - \hat{\theta})]/m\) and \(B_m = \sum_i \psi_k^2(Y_i - \hat{\theta})/m\), where \(\psi_k'\) is the derivative of \(\psi_k\) where it exists.
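These combine in the usual sandwich form,

\[ \hat{V} = \frac{A_m^{-1} B_m A_m^{-1}}{m}, \]

which for this scalar parameter is simply \(B_m/(A_m^2 m)\), as computed in Sigma_cls below.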
theta_cls <- MASS::huber(geexex$Y1, k = 1.5, tol = 1e-10)$mu

psi_k <- function(x, k = 1.5){
  if(abs(x) <= k) x else sign(x) * k
}

A <- mean(unlist(lapply(geexex$Y1, function(y){
  x <- y - theta_cls
  -numDeriv::grad(psi_k, x = x)
})))

B <- mean(unlist(lapply(geexex$Y1, function(y){
  x <- y - theta_cls
  psi_k(x = x)^2
})))

## closed form covariance
Sigma_cls <- matrix(1/A * B * 1/A / nrow(geexex))
results <- list(geex = list(parameters = coef(estimates), vcov = vcov(estimates)),
                cls = list(parameters = theta_cls, vcov = Sigma_cls))
results
## $geex
## $geex$parameters
## [1] 4.82061
##
## $geex$vcov
## [,1]
## [1,] 0.08356179
##
##
## $cls
## $cls$parameters
## [1] 4.999386
##
## $cls$vcov
## [,1]
## [1,] 0.0928935
Approximation of \(\psi\) with geex is EXPERIMENTAL.
Example 7 illustrates calculation of sample quantiles using M-estimation and approximation of the \(\psi\) function.
\[ \psi(Y_{1i}, \theta) = \begin{pmatrix} 0.5 - I(Y_{1i} \leq \theta_1) \\ \end{pmatrix} \]
SB7_estFUN <- function(data){
  Y1 <- data$Y1
  function(theta){
    0.5 - (Y1 <= theta[1])
  }
}
Note that \(\psi\) is not differentiable; however, geex includes the ability to approximate the \(\psi\) function via an approx_control object. The FUN in an approx_control must be a function that takes in the \(\psi\) function, modifies it, and returns a function of theta. For this example, I approximate \(\psi\) with a spline function. The eval_theta argument is used to modify the basis of the spline.
spline_approx <- function(psi, eval_theta){
  y <- Vectorize(psi)(eval_theta)
  f <- splinefun(x = eval_theta, y = y)
  function(theta) f(theta)
}
estimates <- m_estimate(
  estFUN = SB7_estFUN,
  data = geexex,
  root_control = setup_root_control(start = 4.7),
  approx_control = setup_approx_control(FUN = spline_approx,
                                        eval_theta = seq(3, 6, by = .05)))
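As a point of reference (a check added here, not in the original output), the exact root of this step-function \(\psi\) is the sample median of \(Y_1\), which can be computed directly:

# The estimating equation 0.5 - I(Y1 <= theta) is solved at the sample median
median(geexex$Y1)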
A comparison of the variance is not obvious, so no comparison is made.
## $geex
## $geex$parameters
## [1] 4.7
##
## $geex$vcov
## [,1]
## [1,] 0.1773569
##
##
## $cls
## $cls$parameters
## [1] 4.708489
##
## $cls$vcov
## [1] NA
Example 8 demonstrates robust regression for estimating \(\beta\) from 100 observations generated from \(Y_4 = 0.1 + 0.1 X_{1i} + 0.5 X_{2i} + \epsilon_i\), where \(\epsilon_i \sim N(0, 1)\), \(X_1\) is defined as above, and the first half of the observations have \(X_{2i} = 1\) while the rest have \(X_{2i} = 0\).
\[ \psi_k(Y_{4i}, \theta) = \begin{pmatrix} \psi_k(Y_{4i} - \mathbf{x}_i^T \beta) \mathbf{x}_i \end{pmatrix} \]
psi_k <- function(x, k = 1.345){
  if(abs(x) <= k) x else sign(x) * k
}

SB8_estFUN <- function(data){
  Yi <- data$Y4
  xi <- model.matrix(Y4 ~ X1 + X2, data = data)
  function(theta){
    r <- Yi - xi %*% theta
    c(psi_k(r) %*% xi)
  }
}
estimates <- m_estimate(
  estFUN = SB8_estFUN,
  data = geexex,
  root_control = setup_root_control(start = c(0, 0, 0)))
m <- MASS::rlm(Y4 ~ X1 + X2, data = geexex, method = 'M')
theta_cls <- coef(m)
Sigma_cls <- vcov(m)

results <- list(geex = list(parameters = coef(estimates), vcov = vcov(estimates)),
                cls = list(parameters = theta_cls, vcov = Sigma_cls))
results
## $geex
## $geex$parameters
## [1] -0.04050369 0.14530196 0.30181589
##
## $geex$vcov
## [,1] [,2] [,3]
## [1,] 0.05871932 -0.0101399730 -0.0133230841
## [2,] -0.01013997 0.0021854268 0.0003386202
## [3,] -0.01332308 0.0003386202 0.0447117633
##
##
## $cls
## $cls$parameters
## (Intercept) X1 X2
## -0.0377206 0.1441181 0.2988842
##
## $cls$vcov
## (Intercept) X1 X2
## (Intercept) 0.07309914 -0.0103060747 -0.0241724792
## X1 -0.01030607 0.0020579145 0.0005364106
## X2 -0.02417248 0.0005364106 0.0431120686
Example 9 illustrates estimation of a generalized linear model.
\[ \psi(Y_i, \theta) = \begin{pmatrix} D_i(\beta)\frac{Y_i - \mu_i(\beta)}{V_i(\beta) \tau} \end{pmatrix} \]
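For the logistic model fit here (canonical link with \(\tau = 1\)), \(D_i(\beta) = \partial \mu_i / \partial \beta = \mathbf{x}_i \mu_i(1 - \mu_i)\) and \(V_i(\beta) = \mu_i(1 - \mu_i)\), so the estimating function reduces to the familiar logistic score equations:

\[ \psi(Y_i, \theta) = \mathbf{x}_i (Y_i - \mu_i(\beta)). \]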
SB9_estFUN <- function(data){
  Y <- data$Y5
  X <- model.matrix(Y5 ~ X1 + X2, data = data, drop = FALSE)
  function(theta){
    lp <- X %*% theta
    mu <- plogis(lp)
    D  <- t(X) %*% dlogis(lp)
    V  <- mu * (1 - mu)
    D %*% solve(V) %*% (Y - mu)
  }
}
estimates <- m_estimate(
  estFUN = SB9_estFUN,
  data = geexex,
  root_control = setup_root_control(start = c(.1, .1, .5)))
Compare the point estimates to glm coefficients and the covariance matrix to the estimate from sandwich.
m9 <- glm(Y5 ~ X1 + X2, data = geexex, family = binomial(link = 'logit'))
theta_cls <- coef(m9)
Sigma_cls <- sandwich::sandwich(m9)

results <- list(geex = list(parameters = coef(estimates), vcov = vcov(estimates)),
                cls = list(parameters = theta_cls, vcov = Sigma_cls))
results
## $geex
## $geex$parameters
## [1] -1.1256071 0.3410619 -0.1148368
##
## $geex$vcov
## [,1] [,2] [,3]
## [1,] 0.35202094 -0.058906883 -0.101528787
## [2,] -0.05890688 0.012842435 0.004357355
## [3,] -0.10152879 0.004357355 0.185455144
##
##
## $cls
## $cls$parameters
## (Intercept) X1 X2
## -1.1256070 0.3410619 -0.1148368
##
## $cls$vcov
## (Intercept) X1 X2
## (Intercept) 0.35201039 -0.058903546 -0.101534539
## X1 -0.05890355 0.012841392 0.004358926
## X2 -0.10153454 0.004358926 0.185456314
Example 10 illustrates testing equality of success probabilities.
\[ \psi(Y_i, n_i, \theta) = \begin{pmatrix} \frac{(Y_i - n_i \theta_2)^2}{n_i \theta_2( 1 - \theta_2 )} - \theta_1 \\ Y_i - n_i \theta_2 \end{pmatrix} \]
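These equations have closed form roots: summing the second component over the \(k\) games and setting it to zero gives the pooled success probability, and the first component then gives its chi-square-style companion:

\[ \tilde{\theta}_2 = \tilde{p} = \frac{\sum_i Y_i}{\sum_i n_i}, \qquad \tilde{\theta}_1 = \frac{1}{k} \sum_i \frac{(Y_i - n_i \tilde{p})^2}{n_i \tilde{p}(1 - \tilde{p})}. \]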
SB10_estFUN <- function(data){
  Y <- data$ft_made
  n <- data$ft_attp
  function(theta){
    p <- theta[2]
    c(((Y - (n * p))^2)/(n * p * (1 - p)) - theta[1],
      Y - n * p)
  }
}
estimates <- m_estimate(
  estFUN = SB10_estFUN,
  data = shaq,
  units = 'game',
  root_control = setup_root_control(start = c(.5, .5)))
V11 <- function(p) {
  k <- nrow(shaq)
  sumn <- sum(shaq$ft_attp)
  sumn_inv <- sum(1/shaq$ft_attp)
  term2_n <- 1 - (6 * p) + (6 * p^2)
  term2_d <- p * (1 - p)
  term2 <- term2_n/term2_d
  term3 <- ((1 - (2 * p))^2) / ((sumn/k) * p * (1 - p))
  2 + (term2 * (1/k) * sumn_inv) - term3
}
p_tilde <- sum(shaq$ft_made)/sum(shaq$ft_attp)
V11_hat <- V11(p_tilde)/23
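The value 35.51 used in the p-value computation below is the observed sum of the per-game chi-square contributions; assuming that interpretation, it can be recomputed directly from the data (a check added here for clarity):

# Observed test statistic: sum over games of (Y - n*p)^2 / (n*p*(1-p));
# dividing by the 23 games gives theta_1
sum((shaq$ft_made - shaq$ft_attp * p_tilde)^2 /
      (shaq$ft_attp * p_tilde * (1 - p_tilde)))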
# Compare variance estimates
V11_hat
## [1] 0.0783097
vcov(estimates)[1, 1]
## [1] 0.1929791
# Note the differences in the p-values
pnorm(35.51/23, mean = 1, sd = sqrt(V11_hat), lower.tail = FALSE)
## [1] 0.02596785
pnorm(coef(estimates)[1],
mean = 1,
sd = sqrt(vcov(estimates)[1, 1]),
lower.tail = FALSE)
## [1] 0.1078138
This example shows that the empirical sandwich variance estimator may differ from other sandwich variance estimators that make assumptions about the structure of the \(A\) and \(B\) matrices.