The spfilteR
package provides tools to implement the
eigenvector-based spatial filtering (ESF) approach put forth by Griffith (2003) in linear and generalized linear
regression models. It allows users to obtain eigenvectors from a
transformed connectivity matrix and supports the supervised and
unsupervised creation of a synthetic spatial filter. For the
unsupervised approach, eigenvectors can be selected based on either i)
the maximization of model fit, ii) the minimization of residual
autocorrelation, iii) the statistical significance of residual
autocorrelation, or iv) the significance level of candidate
eigenvectors. Alternatively, all eigenvectors in the search set may be
included so that no selection takes place. While the functions provide a high degree of flexibility, only a minimum of code is required to implement the unsupervised ESF approach.
This vignette focuses solely on the workflow to illustrate the functionality of the spfilteR package. For a discussion of the ESF methodology, interested readers are referred to other sources, e.g., Griffith, Chun, and Li (2019) or Tiefelsdorf and Griffith (2007).
A stable release version of the spfilteR package is available on CRAN, and the latest development version can be downloaded from GitHub.
# CRAN release
install.packages("spfilteR")
# GitHub
library(devtools)
devtools::install_github("sjuhl/spfilteR")
Together with the main functions, the package contains an artificial dataset (fakedataset) and a connectivity matrix (W) connecting the 100 cross-sections on a regular grid using the rook criterion of adjacency. For illustrative purposes, the following examples utilize the fake dataset.
# load the package
library(spfilteR)
# load the dataset
data("fakedata")
# take a look at the connectivity matrix and the variables
dim(W)
#> [1] 100 100
head(fakedataset)
#>   ID indicator count       x1        x2         x3  negative negcount
#> 1  1         0     0 11.29514  5.527426  6.6406409 -2.338103        9
#> 2  2         0     4 13.92705  7.047870  4.0049350 -2.048737        3
#> 3  3         1     1 17.31431 10.525834 -0.7022595 -1.868379        5
#> 4  4         1     4 15.21778  7.817901  1.5948141 -2.737914        2
#> 5  5         1     3 15.56298  7.903947 -1.2708973 -1.117427        2
#> 6  6         1     2 17.43941 11.254842 -1.4158475  1.577442        2
Besides an ID
variable, the dataset contains a binary
indicator
variable, two count variables, and four
continuous variables.
In a first step, the function MI.vec()
checks for the
presence of spatial autocorrelation in the continuous variables by means
of Moran’s I (see also Cliff and Ord
1981, 1972). The function further allows users to define the
alternative hypothesis in order to obtain p-values.
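For reference, with n units, connectivity weights w_ij, and sample mean x̄, Moran’s I of a variable x takes the familiar form (see Cliff and Ord 1981):

$$I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.$$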
# select continuous variables
cont <- cbind(fakedataset$x1, fakedataset$x2, fakedataset$x3, fakedataset$negative)
colnames(cont) <- c("x1", "x2", "x3", "negative")
# Moran test of spatial autocorrelation
(I <- MI.vec(x = cont, W = W, alternative = 'greater'))
#>                     I          EI        VarI         zI           pI
#> x1        0.311178214 -0.01010101 0.005344193  4.3948248 5.543107e-06 ***
#> x2        0.168757531 -0.01010101 0.005344193  2.4466317 7.209904e-03  **
#> x3        0.009112739 -0.01010101 0.005344193  0.2628276 3.963417e-01
#> negative -0.244064424 -0.01010101 0.005344193 -3.2004192 9.993139e-01
The output suggests that the variables x1
and
x2
are positively autocorrelated at conventional levels of
statistical significance. Moreover, the standardized value of Moran’s
I (zI
) indicates that the variable
negative
is negatively autocorrelated. We can use the
function MI.vec()
and specify
alternative = 'lower'
to assess the significance of the
negative autocorrelation:
MI.vec(x = fakedataset$negative, W = W, alternative = 'lower')
#>            I          EI        VarI        zI           pI
#> 1 -0.2440644 -0.01010101 0.005344193 -3.200419 0.0006861391 ***
Since the Moran coefficient is a global measure of spatial
autocorrelation, spatial heterogeneity constitutes a problem for this
statistic. More specifically, the simultaneous presence of positive and
negative spatial autocorrelation at different scales cannot be revealed
by the classical Moran’s I. To circumvent this problem, the
function MI.decomp()
decomposes the Moran coefficient into
a positively and a negatively autocorrelated part and performs a
permutation procedure to assess the significance (Dray 2011):
# decompose Moran's I
(I.dec <- MI.decomp(x = cont, W = W, nsim = 100))
#>                 I+       VarI+        pI+             I-       VarI-        pI-
#> x1       0.3894789 0.001434660 0.00990099 **  -0.0783007 0.002050915 1.00000000
#> x2       0.3048369 0.001739862 0.00990099 **  -0.1360794 0.001902856 0.99009901
#> x3       0.2247566 0.001527913 0.27722772     -0.2156438 0.002159211 0.39603960
#> negative 0.1131614 0.001539866 1.00000000     -0.3572259 0.001516440 0.00990099 **
#>          pItwo.sided
#> x1        0.01980198 *
#> x2        0.01980198 *
#> x3        0.55445545
#> negative  0.01980198 *
Note that the global Moran’s I is the sum of I+ and I-:
# I = 'I+' + 'I-'
cbind(I[, "I"], I.dec[, "I+"] + I.dec[, "I-"])
#>                  [,1]         [,2]
#> x1        0.311178214  0.311178214
#> x2        0.168757531  0.168757531
#> x3        0.009112739  0.009112739
#> negative -0.244064424 -0.244064424
Assume that we wish to regress x1 on x2 and test for residual autocorrelation using the function MI.resid().
# define variables
y <- fakedataset$x1
X <- cbind(1, fakedataset$x2)

# OLS regression
ols.resid <- resid(lm(y ~ X))
# Moran test of residual spatial autocorrelation
MI.resid(resid = ols.resid, W = W, alternative = 'greater')
#>          I          EI        VarI       zI           pI
#> 1 0.350568 -0.01010101 0.005344193 4.933643 4.035495e-07 ***
The results show that the residuals are significantly autocorrelated, which violates the assumption of uncorrelated errors. In order to
resolve this problem of spatial autocorrelation in regression residuals,
the function lmFilter()
estimates a spatially filtered
linear regression model using an unsupervised stepwise regression to
identify relevant eigenvectors derived from the transformed connectivity
matrix. Below, the unsupervised eigenvector search algorithm selects
eigenvectors based on the reduction in residual autocorrelation. The
output is an object of class spfilter.
# ESF model
(lm.esf <- lmFilter(y = y, x = X, W = W, objfn = 'MI', positive = TRUE,
                    ideal.setsize = TRUE, tol = .2))
#> 8 out of 22 candidate eigenvectors selected
summary(lm.esf, EV = TRUE)
#>
#> - Spatial Filtering with Eigenvectors (Linear Model) -
#>
#> Coefficients (OLS):
#>              Estimate         SE      p-value
#> (Intercept) 9.2169316 0.66572188 5.088489e-24 ***
#> beta_1      0.9947113 0.07995667 2.932913e-21 ***
#>
#> Adjusted R-squared:
#>   Initial  Filtered
#> 0.4673945 0.7257458
#>
#> Filtered for positive spatial autocorrelation
#> 8 out of 22 candidate eigenvectors selected
#> Objective Function: "MI"
#>
#> Summary of selected eigenvectors:
#>        Estimate       SE      p-value  partialR2      VIF        MI
#> ev_13 -9.580545 1.447552 2.550912e-09 0.23208263 1.005781 0.6302019 ***
#> ev_4  -3.239103 1.481047 3.133258e-02 0.02873360 1.049729 0.9257835   *
#> ev_10 -5.529712 1.453592 2.589498e-04 0.07901543 1.013360 0.7303271 ***
#> ev_2   4.885225 1.444218 1.064466e-03 0.06058390 1.001660 1.0004147  **
#> ev_9   3.458918 1.469630 2.076982e-02 0.03041775 1.034209 0.7638378   *
#> ev_20  4.188058 1.445305 4.720290e-03 0.04411345 1.002998 0.4539879  **
#> ev_5  -2.863850 1.452437 5.170840e-02 0.02055372 1.011899 0.8968632   .
#> ev_19  3.805433 1.443517 9.873617e-03 0.03671349 1.000798 0.4615722  **
#>
#> Moran's I (Residuals):
#>             Observed    Expected   Variance         z      p-value
#> Initial   0.35056797 -0.01192610 0.01207299 3.2990854 0.0004850019 ***
#> Filtered -0.03446177 -0.07716672 0.04880722 0.1933019 0.4233612531
While the print
method shows that 8 eigenvectors were
selected from the candidate set consisting of 22 eigenvectors, the
summary
method provides additional information. Besides the
ordinary least squares (OLS) parameter estimates, the output shows that
the ESF model filters for positive spatial autocorrelation using the
minimization of residual autocorrelation as the objective function during
the eigenvector search. A comparison between the filtered and the
nonspatial OLS model with respect to model fit and residual
autocorrelation is also provided. Since the option EV is set to TRUE, the summary method also displays
information on the selected eigenvectors. As the results show, the ESF
model successfully removes the spatial pattern from model residuals.
The plot
method allows for an easy visualization of the
results. The graph displays the Moran coefficient associated with each
of the eigenvectors. The shaded area signifies the candidate set, and selected eigenvectors are illustrated by black dots.
plot(lm.esf)
Moreover, lmFilter()
also allows users to select
eigenvectors based on alternative selection criteria:
### Alternative selection criteria:
# maximization of model fit
lmFilter(y = y, x = X, W = W, objfn = 'R2', positive = TRUE, ideal.setsize = TRUE)
#> 11 out of 22 candidate eigenvectors selected
# significance of residual autocorrelation
lmFilter(y = y, x = X, W = W, objfn = 'pMI', sig = .1, bonferroni = FALSE,
positive = TRUE, ideal.setsize = TRUE)
#> 3 out of 22 candidate eigenvectors selected
# significance of eigenvectors
lmFilter(y = y, x = X, W = W, objfn = 'p', sig = .1, bonferroni = TRUE,
positive = TRUE, ideal.setsize = TRUE)
#> 3 out of 22 candidate eigenvectors selected
# all eigenvectors in the candidate set
lmFilter(y = y, x = X, W = W, objfn = 'all', positive = TRUE, ideal.setsize = TRUE)
#> 22 out of 22 candidate eigenvectors selected
If users wish to select eigenvectors based on individual selection
criteria, they can obtain the eigenvectors using the function
getEVs()
and perform a supervised selection procedure using
the basic stats::lm()
command.
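As a minimal sketch of this supervised workflow, the snippet below adds a hand-picked subset of eigenvectors to the regression. It assumes that getEVs() returns the eigenvectors in its vectors element (see ?getEVs); the choice of the first two eigenvectors is purely illustrative.

# supervised eigenvector selection (sketch); assumes getEVs() stores the
# eigenvectors in 'vectors' -- the subset chosen here is purely illustrative
EV <- getEVs(W = W, covars = NULL)
E.sel <- EV$vectors[, 1:2]
summary(lm(y ~ X + E.sel))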
The ESF approach outlined above easily extends to generalized linear
models (GLMs) as well (see also Griffith, Chun,
and Li 2019). Therefore, the spfilteR package contains the function glmFilter(), which uses maximum
likelihood estimation (MLE) to fit a spatially filtered GLM and performs
an unsupervised eigenvector search based on alternative objective
functions.
Except for minor differences, glmFilter() works similarly to the lmFilter() function discussed above. The option model defines the model to be estimated. Currently, unsupervised spatial filtering is possible for logit, probit, and Poisson models. Moreover, the option optim.method
specifies the optimization method used to maximize the log-likelihood
function. Finally, resid.type
allows users to define the
type of residuals used to calculate Moran’s I, and boot.MI
is an integer specifying the number of bootstrap
permutations to obtain the variance of Moran’s I.
# define outcome variables
y.bin <- fakedataset$indicator
y.count <- fakedataset$count
# ESF logit model
(logit.esf <- glmFilter(y = y.bin, x = NULL, W = W, objfn = 'p', model = 'logit',
                        optim.method = 'BFGS', sig = .1, bonferroni = FALSE,
                        positive = TRUE, ideal.setsize = FALSE, alpha = .25,
                        resid.type = 'deviance', boot.MI = 100))
#> 5 out of 31 candidate eigenvectors selected
# ESF probit model
(probit.esf <- glmFilter(y = y.bin, x = NULL, W = W, objfn = 'p', model = 'probit',
                         optim.method = 'BFGS', sig = .1, bonferroni = FALSE,
                         positive = TRUE, ideal.setsize = FALSE, alpha = .25,
                         resid.type = 'deviance', boot.MI = 100))
#> 5 out of 31 candidate eigenvectors selected
# ESF poisson model
(poisson.esf <- glmFilter(y = y.count, x = NULL, W = W, objfn = 'BIC', model = 'poisson',
                          optim.method = 'BFGS', positive = TRUE, ideal.setsize = FALSE,
                          alpha = .25, resid.type = 'deviance', boot.MI = 100))
#> 7 out of 31 candidate eigenvectors selected
Again, if users wish to define individual selection criteria or fit a
GLM currently not implemented in glmFilter(), they can
obtain the eigenvectors using the getEVs()
command and
perform supervised eigenvector selection using the standard stats::glm()
function.
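As before, a minimal sketch of this supervised route, again assuming getEVs() stores the eigenvectors in its vectors element and using a purely illustrative eigenvector subset, could look as follows:

# supervised GLM (sketch); the eigenvector subset is purely illustrative
EV <- getEVs(W = W, covars = NULL)
summary(glm(y.bin ~ EV$vectors[, 1:2], family = binomial(link = 'logit')))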