geex
In the basic set-up, M-estimation applies to estimators of the \(p \times 1\) parameter \(\theta=(\theta_1, \theta_2, \dots, \theta_p)^{\intercal}\) which can be obtained as solutions to an equation of the form
\[\begin{equation} \label{eq:psi} \sum_{i = 1}^m \psi(O_i, \theta) = 0, \end{equation}\]
where \(O_1, \ldots, O_m\) are \(m\) independent and identically distributed (iid) random variables, and the function \(\psi\) returns a vector of length \(p\) and does not depend on \(i\) or \(m\) (Stefanski and Boos 2002, hereafter SB). See SB for the case where the \(O_i\) are independent but not necessarily identically distributed. The roots of \(\eqref{eq:psi}\) are referred to as M-estimators and denoted by \(\hat{\theta}\). M-estimators can be solved for analytically in some cases, or computed numerically in general. Under certain regularity conditions, the asymptotic properties of \(\hat{\theta}\) can be derived from Taylor series approximations, the law of large numbers, and the central limit theorem. In brief, let \(\theta_0\) be the true parameter value defined by \(\int \psi(o, \theta_0) dF(o) = 0\), where \(F\) is the distribution function of \(O\). Let \(\dot{\psi}(o, \theta) = \partial \psi(o, \theta)/\partial \theta^{\intercal}\), \(A(\theta_0) = E[-\dot{\psi}(O_1, \theta_0)]\), and \(B(\theta_0) = E[\psi(O_1, \theta_0)\psi(O_1, \theta_0)^{\intercal}]\). Then under suitable regularity assumptions, \(\hat{\theta}\) is consistent and asymptotically Normal, i.e.,
\[\begin{equation*} \label{eq:variance} \sqrt{m}(\hat{\theta} - \theta_0) \overset{d}{\to} N(0, V(\theta_0)) \text{ as } m \to \infty, \end{equation*}\]
where \(V(\theta_0) = A(\theta_0)^{-1} B(\theta_0) \{A(\theta_0)^{-1} \}^{\intercal}\). The sandwich form of \(V(\theta_0)\) suggests several possible large sample variance estimators. For some problems, the analytic form of \(V(\theta_0)\) can be derived and estimators of \(\theta_0\) and other unknowns simply plugged into \(V(\theta_0)\). Alternatively, \(V(\theta_0)\) can be consistently estimated by the empirical sandwich variance estimator, where the expectations in \(A(\theta)\) and \(B(\theta)\) are replaced with their empirical counterparts. Let \(A_i = - \dot{\psi}(O_i, \theta)|_{\theta = \hat{\theta}}\), \(A_m = m^{-1} \sum_{i = 1}^m A_i\), \(B_i = \psi(O_i, \hat{\theta}) \psi(O_i, \hat{\theta})^{\intercal}\), and \(B_m = m^{-1} \sum_{i = 1}^m B_i\). The empirical sandwich estimator of the variance of \(\hat{\theta}\) is:
\[\begin{equation} \label{eq:esve} \hat{\Sigma} = A_m^{-1} B_m \{A_m^{-1}\}^{\intercal}/m . \end{equation}\]
The geex package provides an application programming interface (API) for carrying out M-estimation. The analyst provides a function, called estFUN, corresponding to \(\psi(O_i, \theta)\) that maps data \(O_i\) to a function of \(\theta\). Numerical derivatives approximate \(\dot{\psi}\), so that evaluating \(\hat{\Sigma}\) is entirely a computational exercise. No analytic derivations are required from the analyst.
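To make this concrete, below is a minimal sketch (not geex internals) of how the empirical sandwich estimator \(\eqref{eq:esve}\) could be assembled by hand from any estFUN-style \(\psi\). The function name sandwich_by_hand and the obs_list argument are hypothetical, and numDeriv::jacobian supplies the numerical derivative of \(\psi\).

# Minimal sketch (not geex internals) of the empirical sandwich estimator.
# `psi` maps one observation O_i to a function of theta (an estFUN);
# `obs_list` is a list holding the m observations; both names are hypothetical.
sandwich_by_hand <- function(psi, obs_list, theta_hat){
  p <- length(theta_hat)
  A_m <- B_m <- matrix(0, p, p)
  for (O_i in obs_list) {
    psi_i <- psi(O_i)                                  # psi(O_i, .) as a function of theta
    A_m <- A_m - numDeriv::jacobian(psi_i, theta_hat)  # A_i = -psi_dot at theta-hat
    B_m <- B_m + tcrossprod(psi_i(theta_hat))          # B_i = psi psi^T at theta-hat
  }
  A_m <- A_m / length(obs_list)
  B_m <- B_m / length(obs_list)
  solve(A_m) %*% B_m %*% t(solve(A_m)) / length(obs_list)  # Sigma-hat
}

For the estFUNs in this paper, obs_list could be, e.g., split(geexex, seq_len(nrow(geexex))).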
Consider estimating the population mean \(\theta = E[Y_i]\) using the sample mean \(\hat{\theta} = m^{-1} \sum_{i=1}^m Y_i\) of \(m\) iid random variables \(Y_1, \dots, Y_m\). The estimator \(\hat{\theta}\) can be expressed as the solution to the following estimating equation:
\[ \sum_{i = 1}^m (Y_i - \theta) = 0, \]
which is equivalent to solving \(\eqref{eq:psi}\) with \(O_i = Y_i\) and \(\psi(O_i, \theta) = Y_i - \theta\). An estFUN is a translation of \(\psi\) into an R function whose first argument is data and which returns a function whose first argument is theta. An estFUN corresponding to the estimating equation for the sample mean of \(Y\) is:
meanFUN <- function(data){
  function(theta){
    data$Y - theta
  }
}
If an estimator fits into the above framework, then the user need only specify estFUN. No other programming is required to obtain point and variance estimates. The remaining sections provide examples of translating \(\psi\) into an estFUN.
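As a quick illustration of that workflow, applying m_estimate (introduced below) to meanFUN might look like the following sketch, where the data frame dat and its column Y are made up for illustration:

# Hypothetical data; meanFUN expects a column named Y.
dat <- data.frame(Y = rnorm(100, mean = 3))
fit <- m_estimate(
  estFUN = meanFUN,
  data   = dat,
  root_control = setup_root_control(start = 0))
coef(fit)  # point estimate, essentially equal to mean(dat$Y)
vcov(fit)  # empirical sandwich variance estimate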
The three examples of M-estimation from SB are presented here for demonstration. For these examples, the data are \(O_i = \{Y_{1i}, Y_{2i}\}\), where \(Y_1 \sim N(5, 16)\) and \(Y_2 \sim N(2, 1)\), with \(m = 100\); these data are included in the geexex dataset.
The first example estimates the population mean (\(\theta_1\)) and variance (\(\theta_2\)) of \(Y_1\). The solutions to the estimating equations below are the sample mean \(\hat{\theta}_1 = m^{-1} \sum_{i = 1}^m Y_{1i}\) and sample variance \(\hat{\theta}_2 = m^{-1} \sum_{i = 1}^m (Y_{1i} - \hat{\theta}_1)^2\).
\[ \psi(Y_{1i}, \theta) = \begin{pmatrix} Y_{1i} - \theta_1 \\ (Y_{1i} - \theta_1)^2 - \theta_2 \end{pmatrix} \]
SB1_estfun <- function(data){
  Y1 <- data$Y1
  function(theta){
    c(Y1 - theta[1],
      (Y1 - theta[1])^2 - theta[2])
  }
}
The primary geex function is m_estimate, which requires two inputs: estFUN (the \(\psi\) function) and data (the data frame containing \(O_i\) for \(i = 1, \dots, m\)). The package defaults to rootSolve::multiroot for estimating roots of \(\eqref{eq:psi}\), though the solver algorithm can be specified in the root_control argument. Starting values for rootSolve::multiroot are also passed via the root_control argument; e.g.,
library(geex)

results <- m_estimate(
  estFUN = SB1_estfun,
  data   = geexex,
  root_control = setup_root_control(start = c(1, 1)))
n <- nrow(geexex)
A <- diag(1, nrow = 2)

B <- with(geexex, {
  Ybar <- mean(Y1)
  B11 <- mean( (Y1 - Ybar)^2 )
  B12 <- mean( (Y1 - Ybar) * ((Y1 - Ybar)^2 - B11) )
  B22 <- mean( ((Y1 - Ybar)^2 - B11)^2 )
  matrix(
    c(B11, B12,
      B12, B22),
    nrow = 2
  )
})

# closed form roots
theta_cls <- c(mean(geexex$Y1),
               # since var() divides by n - 1, not n
               var(geexex$Y1) * (n - 1) / n)
# closed form sigma
Sigma_cls <- (solve(A) %*% B %*% t(solve(A))) / n

comparison <- list(geex = list(estimates = coef(results), vcov = vcov(results)),
                   cls  = list(estimates = theta_cls, vcov = Sigma_cls))
The m_estimate function returns an object of the S4 class geex, which contains an estimates slot and a vcov slot for \(\hat{\theta}\) and \(\hat{\Sigma}\), respectively. These slots can be accessed by the functions coef (or roots) and vcov. The point estimates obtained for \(\theta_1\) and \(\theta_2\) are analogous to the base R functions mean and var (after multiplying by \((m-1)/m\) for the latter). SB gave a closed form for \(A(\theta_0)\) (an identity matrix) and for \(B(\theta_0)\) (not shown here), and suggested plugging in sample moments to compute \(B(\hat{\theta})\). The point and variance estimates using both geex and the analytic solutions are shown below. The maximum absolute difference between either the point or variance estimates is 4e-11, demonstrating excellent agreement between the numerical results obtained from geex and the closed form solutions for this set of estimating equations and data.
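Printing the comparison list assembled above displays both sets of results:

comparison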
## $geex
## $geex$estimates
## [1] 5.044563 10.041239
##
## $geex$vcov
## [,1] [,2]
## [1,] 0.10041239 0.03667969
## [2,] 0.03667969 2.49219638
##
##
## $cls
## $cls$estimates
## [1] 5.044563 10.041239
##
## $cls$vcov
## [,1] [,2]
## [1,] 0.10041239 0.03667969
## [2,] 0.03667969 2.49219638
This example calculates a ratio estimator and illustrates the delta method via M-estimation. The estimating equations target the means of \(Y_1\) and \(Y_2\), labelled \(\theta_1\) and \(\theta_2\), as well as the estimand \(\theta_3 = \theta_1/ \theta_2\).
\[ \psi(Y_{1i}, Y_{2i}, \theta) = \begin{pmatrix} Y_{1i} - \theta_1 \\ Y_{2i} - \theta_2 \\ \theta_1 - \theta_3\theta_2 \end{pmatrix} \]
The solution to \(\eqref{eq:psi}\) for this \(\psi\) function yields \(\hat{\theta}_3 = \bar{Y}_1 / \bar{Y}_2\), where \(\bar{Y}_j\) denotes the sample mean of \(Y_{j1}, \ldots, Y_{jm}\) for \(j=1,2\): the first two components give \(\hat{\theta}_1 = \bar{Y}_1\) and \(\hat{\theta}_2 = \bar{Y}_2\), and summing the third component yields \(m(\hat{\theta}_1 - \hat{\theta}_3\hat{\theta}_2) = 0\), so \(\hat{\theta}_3 = \hat{\theta}_1/\hat{\theta}_2\).
SB2_estfun <- function(data){
  Y1 <- data$Y1; Y2 <- data$Y2
  function(theta){
    c(Y1 - theta[1],
      Y2 - theta[2],
      theta[1] - (theta[3] * theta[2])
    )
  }
}
results <- m_estimate(
  estFUN = SB2_estfun,
  data   = geexex,
  root_control = setup_root_control(start = c(1, 1, 1)))
# Comparison to an analytically derived sandwich estimator
A <- with(geexex, {
  matrix(
    c( 1, 0, 0,
       0, 1, 0,
      -1, mean(Y1)/mean(Y2), mean(Y2)),
    byrow = TRUE, nrow = 3)
})

B <- with(geexex, {
  matrix(
    c(var(Y1)    , cov(Y1, Y2), 0,
      cov(Y1, Y2), var(Y2)    , 0,
      0          , 0          , 0),
    byrow = TRUE, nrow = 3)
})

## closed form roots
theta_cls <- c(mean(geexex$Y1), mean(geexex$Y2))
theta_cls[3] <- theta_cls[1]/theta_cls[2]
## closed form covariance
Sigma_cls <- (solve(A) %*% B %*% t(solve(A))) / nrow(geexex)

comparison <- list(geex = list(estimates = coef(results), vcov = vcov(results)),
                   cls  = list(estimates = theta_cls, vcov = Sigma_cls))
SB gave closed form expressions for \(A(\theta_0)\) and \(B(\theta_0)\), into which appropriate estimates of the matrix components are plugged and compared to the results from geex. The point estimates again show excellent agreement (maximum absolute difference 4.4e-16), while the covariance estimates differ in the third decimal (maximum absolute difference 1e-03); the small covariance discrepancy arises because var() and cov() divide by \(m - 1\), whereas the empirical sandwich estimator uses \(m\).
comparison
## $geex
## $geex$estimates
## [1] 5.044563 2.012793 2.506250
##
## $geex$vcov
## [,1] [,2] [,3]
## [1,] 0.100412389 -0.008718712 0.06074328
## [2,] -0.008718712 0.010693092 -0.01764626
## [3,] 0.060743283 -0.017646263 0.05215103
##
##
## $cls
## $cls$estimates
## [1] 5.044563 2.012793 2.506250
##
## $cls$vcov
## [,1] [,2] [,3]
## [1,] 0.10142666 -0.00880678 0.06135685
## [2,] -0.00880678 0.01080110 -0.01782451
## [3,] 0.06135685 -0.01782451 0.05267781
This example extends Example 1 to again illustrate the delta method. The estimating equations target not only the mean (\(\theta_1\)) and variance (\(\theta_2\)) of \(Y_1\), but also the standard deviation (\(\theta_3\)) and the log of the variance (\(\theta_4\)) of \(Y_1\).
\[ \psi(Y_{1i}, \mathbf{\theta}) = \begin{pmatrix} Y_{1i} - \theta_1 \\ (Y_{1i} - \theta_1)^2 - \theta_2 \\ \sqrt{\theta_2} - \theta_3 \\ \log(\theta_2) - \theta_4 \end{pmatrix} \]
SB3_estfun <- function(data){
  Y1 <- data$Y1
  function(theta){
    c(Y1 - theta[1],
      (Y1 - theta[1])^2 - theta[2],
      sqrt(theta[2]) - theta[3],
      log(theta[2]) - theta[4])
  }
}
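As in the previous examples, m_estimate computes the roots and the empirical sandwich variance; the starting values in this sketch are an assumption, and the call defines the results object used in the comparison below:

# Starting values are assumed for this sketch; theta[2] must stay
# positive for sqrt() and log() to be defined.
results <- m_estimate(
  estFUN = SB3_estfun,
  data   = geexex,
  root_control = setup_root_control(start = c(1, 1, 1, 1)))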
## closed form roots
theta_cls <- numeric(4)
theta_cls[1] <- mean(geexex$Y1)
theta_cls[2] <- sum((geexex$Y1 - theta_cls[1])^2)/nrow(geexex)
theta_cls[3] <- sqrt(theta_cls[2])
theta_cls[4] <- log(theta_cls[2])

## Compare to closed form ##
theta2 <- theta_cls[2]
mu3 <- moments::moment(geexex$Y1, order = 3, central = TRUE)
mu4 <- moments::moment(geexex$Y1, order = 4, central = TRUE)
## closed form covariance
Sigma_cls <- matrix(
  c(theta2, mu3, mu3/(2*sqrt(theta2)), mu3/theta2,
    mu3, mu4 - theta2^2, (mu4 - theta2^2)/(2*sqrt(theta2)), (mu4 - theta2^2)/theta2,
    mu3/(2 * sqrt(theta2)), (mu4 - theta2^2)/(2*sqrt(theta2)), (mu4 - theta2^2)/(4*theta2), (mu4 - theta2^2)/(2*theta2^(3/2)),
    mu3/theta2, (mu4 - theta2^2)/theta2, (mu4 - theta2^2)/(2*theta2^(3/2)), (mu4/theta2^2) - 1),
  nrow = 4, byrow = TRUE) / nrow(geexex)

comparison <- list(geex = list(estimates = coef(results), vcov = vcov(results)),
                   cls  = list(estimates = theta_cls, vcov = Sigma_cls))
SB again provided closed forms for \(A(\theta_0)\) and \(B(\theta_0)\), which are compared to the geex results. The maximum absolute difference between geex and the closed form estimates for both the parameters and the covariance is 3.8e-11.
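Printing the comparison list again displays both sets of results:

comparison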
## $geex
## $geex$estimates
## [1] 5.044563 10.041239 3.168791 2.306700
##
## $geex$vcov
## [,1] [,2] [,3] [,4]
## [1,] 0.100412389 0.03667969 0.005787646 0.003652905
## [2,] 0.036679687 2.49219638 0.393240840 0.248196105
## [3,] 0.005787646 0.39324084 0.062049026 0.039162582
## [4,] 0.003652905 0.24819611 0.039162582 0.024717678
##
##
## $cls
## $cls$estimates
## [1] 5.044563 10.041239 3.168791 2.306700
##
## $cls$vcov
## [,1] [,2] [,3] [,4]
## [1,] 0.100412389 0.03667969 0.005787646 0.003652905
## [2,] 0.036679687 2.49219638 0.393240840 0.248196105
## [3,] 0.005787646 0.39324084 0.062049026 0.039162582
## [4,] 0.003652905 0.24819611 0.039162582 0.024717678
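The agreement figures quoted throughout can be checked directly from each example's comparison list; a minimal sketch:

# Maximum absolute differences between geex and closed form results
max(abs(comparison$geex$estimates - comparison$cls$estimates))
max(abs(comparison$geex$vcov - comparison$cls$vcov))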