Spatio-temporal data are common in practice. Such data often have complex structures that are difficult to describe and estimate. To overcome this difficulty, the R package SpTe2M has been developed to estimate the underlying spatio-temporal mean and covariance structures. The package also includes tools for online process monitoring to detect change-points in a spatio-temporal process over time. More specifically, it implements the nonparametric spatio-temporal data modeling methods described in Yang and Qiu (2018, 2019, and 2022), as well as the online spatio-temporal process monitoring methods discussed in Qiu and Yang (2021 and 2023) and Yang and Qiu (2020). In this vignette, we demonstrate the main functions in SpTe2M by using the Florida influenza-like illness (ILI) data, a built-in dataset of the package.
The Florida ILI dataset, named ili_dat in SpTe2M, contains daily ILI incidence rates in 67 Florida counties during the years 2012-2014. The observed ILI incidence rates were collected by the Electronic Surveillance System for the Early Notification of Community-based Epidemics of the Florida Department of Health. In addition to the variable Rate (i.e., ILI incidence rate), the dataset includes the following 7 variables: County, Date, Lat (i.e., latitude of the geometric center of a county), Long (i.e., longitude of the geometric center of a county), Time, Temp (i.e., air temperature), and RH (i.e., relative humidity). First of all, let us install and load the package SpTe2M and take a quick look at the ILI data.
First, we run the following R code to install the package from CRAN and load it in R:
install.packages('SpTe2M')
library(SpTe2M)
Then, we load the built-in dataset ili_dat and use the R function head() to display its first 6 rows.
data(ili_dat)
head(ili_dat)
#> County Date Lat Long Time Rate Temp RH
#> 1 alachua 1/1/2012 29.65320 -82.32440 0.00273224 8.020372e-06 60.5 78.6
#> 2 baker 1/1/2012 30.27236 -82.21351 0.00273224 0.000000e+00 62.0 78.6
#> 3 bay 1/1/2012 30.16190 -85.65297 0.00273224 3.532404e-05 61.5 78.6
#> 4 bradford 1/1/2012 29.96893 -82.12255 0.00273224 0.000000e+00 61.0 78.6
#> 5 brevard 1/1/2012 28.70765 -80.89049 0.00273224 0.000000e+00 68.0 78.6
#> 6 broward 1/1/2012 26.05192 -80.14526 0.00273224 2.527846e-05 71.0 78.6
Next, we make some geographic maps to investigate the spatio-temporal patterns of ILI incidence. To do this, we first combine ili_dat with the Florida county boundary data provided by the package maps (the package mapproj, which supplies map projections, is also loaded) and then create the maps by using the function ggplot() in the package ggplot2.
library(maps)
library(ggplot2)
library(mapproj)
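# convert the Florida county boundaries in the maps package into a data frame
FL <- map_data('county','florida')
# the 6th column (subregion) holds the county names; rename it to match ili_dat
names(FL)[6] <- 'County'
# only plot maps on June 15 and December 15 of each year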
subdat <- subset(ili_dat,Date%in%c('6/15/2012','6/15/2013','6/15/2014',
'12/15/2012','12/15/2013','12/15/2014'))
subdat$Date <- factor(subdat$Date,levels=c('6/15/2012','6/15/2013','6/15/2014',
'12/15/2012','12/15/2013','12/15/2014'))
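# merge ILI data with map data (by the common column County)
mydat <- merge(FL,subdat)
# merge() sorts rows by County, so restore the drawing order of the polygon vertices
mydat <- mydat[order(mydat$order),]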
# make maps using ggplot
maps <- ggplot(data=mydat,aes(long,lat,group=group,fill=Rate))+geom_polygon()+
facet_wrap(~Date,ncol=3)+geom_path(colour='grey10',lwd=0.5)+
scale_fill_gradient2('',low='cyan',mid='white',high='navy',
guide='colorbar',limits=c(0,0.0003),na.value='orange',
breaks=c(0,0.0001,0.0002,0.0003),
labels=c('0','1e-4','2e-4','3e-4'))+
guides(fill=guide_colorbar(barwidth=25,barheight=1,direction='horizontal'),
cex=1)+theme_bw(base_size=15)+xlab('longitude')+ylab('latitude')+
theme(legend.position='bottom',
axis.ticks=element_blank(),
line=element_blank(),
axis.text=element_blank(),
panel.border=element_rect(color='black',linewidth=1.2),
axis.line=element_line(colour='black'),
legend.margin=margin(-10,0,0,0),
legend.box.margin=margin(0,0,0,0))
# display the maps
maps
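The merge() step above joins the two data frames on their shared column, County, and by default sorts the result by that column, which can disturb the order in which each county's polygon vertices are drawn. The toy example below illustrates this with made-up coordinates and rates (the data frames FL_toy and rates_toy are hypothetical and purely illustrative), and shows how the vertex order can be restored with the order column that map_data() returns.

```r
# Hypothetical toy map data: two county polygons whose vertices are listed
# in drawing order, mimicking the 'order' column of map_data() output
FL_toy <- data.frame(long   = c(0, 1, 1, 5, 6, 6),
                     lat    = c(0, 0, 1, 0, 0, 1),
                     group  = c(1, 1, 1, 2, 2, 2),
                     order  = 1:6,
                     County = c('b', 'b', 'b', 'a', 'a', 'a'))
# Hypothetical incidence rates, for illustration only
rates_toy <- data.frame(County = c('a', 'b'), Rate = c(2e-5, 5e-5))

# merge() joins on the shared column 'County' and sorts the rows by it
mydat_toy <- merge(FL_toy, rates_toy)
mydat_toy$County        # "a" "a" "a" "b" "b" "b"

# restore the vertex drawing order before plotting polygons
mydat_toy <- mydat_toy[order(mydat_toy$order), ]
mydat_toy$order         # 1 2 3 4 5 6
```

Sorting by order keeps every polygon's vertices in sequence, so geom_polygon() traces each county boundary correctly.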
From the maps, it can be seen that ILI incidence rates have an obvious seasonal pattern, with more ILI cases in the winters and fewer in the summers. Moreover, the incidence in the winter of 2014 seems unusual, because the rates on 12/15/2014 are much higher than those on 12/15/2012 and 12/15/2013. Next, we apply the package SpTe2M to this dataset to estimate the spatio-temporal mean and covariance structures and to monitor the ILI incidence sequentially over time.
In this part, we estimate the spatio-temporal mean and covariance structures for the year 2013 by using the functions spte_meanest() and spte_covest() in SpTe2M. To this end, we first use the following code to extract the observed ILI data in year 2013.
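# n: days in 2013; m: number of counties
# N1: rows for 2012 (a leap year with 366 days); N2: rows for 2013
n <- 365; m <- 67; N1 <- (n+1)*m; N2 <- n*m
# extract the observed ILI data in year 2013
ili13 <- ili_dat[(N1+1):(N1+N2),]
y13 <- ili13$Rate; st13 <- ili13[,c('Lat','Long','Time')]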
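The indices N1 and N2 above rely on the row layout of ili_dat: 67 rows per day with the county varying fastest (as the head() output suggests), and 366 days in the leap year 2012. The base-R sketch below checks that rows (N1+1):(N1+N2) cover exactly the days of 2013; it builds a hypothetical data frame toy with that assumed layout instead of using ili_dat itself.

```r
m    <- 67                                    # number of Florida counties
days <- seq(as.Date('2012-01-01'), as.Date('2014-12-31'), by = 'day')
length(days)                                  # 1096 = 366 + 365 + 365
# hypothetical data frame with the assumed (day, county) row layout of ili_dat
toy <- data.frame(Date   = rep(days, each = m),
                  County = rep(seq_len(m), times = length(days)))
n  <- 365            # number of days in 2013
N1 <- (n + 1) * m    # rows for 2012, a leap year with 366 days
N2 <- n * m          # rows for 2013
toy13 <- toy[(N1 + 1):(N1 + N2), ]
range(toy13$Date)    # "2013-01-01" "2013-12-31"
```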
After obtaining the ILI data in year 2013, we estimate the spatio-temporal mean structure by applying the function spte_meanest() to the extracted data and store the result as mu.est; its component muhat holds the estimated means used in the plots below.
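The bandwidths ht and hs suggest that spte_meanest() is a kernel smoothing method. To give a feel for what a kernel-based nonparametric mean estimator does, the sketch below implements a plain one-dimensional local linear kernel estimator in base R on simulated data. It is a simplified illustration under our own assumptions (time only, Gaussian kernel, no spatial component, no adjustment for correlation), not the estimator implemented in spte_meanest(); the function local_linear() is hypothetical.

```r
# Hypothetical helper: local linear estimate of the mean function at x0
local_linear <- function(x0, x, y, h) {
  w <- dnorm((x - x0) / h)        # Gaussian kernel weights centred at x0
  X <- cbind(1, x - x0)           # local intercept and local slope
  fit <- lm.wfit(X, y, w)         # weighted least squares fit
  unname(fit$coefficients[1])     # intercept = estimated mean at x0
}

set.seed(2)
x <- seq(0, 1, length.out = 300)
y <- 1 + 2 * x + rnorm(length(x), sd = 0.1)   # simulated data, true mean 1 + 2x
est <- sapply(c(0.25, 0.5, 0.75), local_linear, x = x, y = y, h = 0.05)
est   # close to the true means 1.5, 2.0, 2.5
```

A larger h averages over more observations (smoother, possibly more biased estimates), while a smaller h follows the data more closely; this is why the bandwidth choice matters.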
Note that we do not specify the arguments ht and hs when using the function spte_meanest(). In such a case, the two bandwidths ht and hs are selected automatically by the modified cross-validation procedure (cf. Yang and Qiu 2018).
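The modified cross-validation procedure of Yang and Qiu (2018) is not reproduced here. As a hedged illustration of the general idea behind choosing a bandwidth by cross-validation, the sketch below applies ordinary leave-one-out cross-validation, which treats observations as independent, to a one-dimensional Nadaraya-Watson smoother on simulated data. The function loo_cv() and the bandwidth grid h_grid are hypothetical choices.

```r
set.seed(1)
tt <- seq(0, 1, length.out = 200)
yy <- sin(2 * pi * tt) + rnorm(length(tt), sd = 0.3)   # simulated data

# Hypothetical helper: leave-one-out CV score of a Nadaraya-Watson smoother
loo_cv <- function(h, x, y) {
  W <- dnorm(outer(x, x, '-') / h)   # Gaussian kernel weights for all pairs
  diag(W) <- 0                       # drop each point from its own fit
  yhat <- as.vector(W %*% y) / rowSums(W)
  mean((y - yhat)^2)                 # leave-one-out prediction error
}

h_grid <- seq(0.01, 0.2, by = 0.005)
cv     <- sapply(h_grid, loo_cv, x = tt, y = yy)
h_opt  <- h_grid[which.min(cv)]      # bandwidth with the smallest CV score
h_opt
```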
To visually check whether the estimated means describe the observed ILI data well, we use the code below to plot the estimated means for the 4 Florida counties Broward, Lake, Pinellas, and Seminole, along with the observed ILI incidence rates at the 4 counties.
# reshape estimated and observed rates into 365 x 67 matrices (rows: days, columns: counties)
mu <- mu.est$muhat; mu <- t(matrix(mu,m,n))
obs <- t(matrix(y13,m,n))
ids <- c(6,34,52,57) # IDs for Broward, Lake, Pinellas, Seminole
par(mfrow=c(2,2),mgp=c(2.4,1,0))
par(mar=c(3.5,3.5,1.5,0.5))
plot(1:365,mu[,ids[1]],type='l',lty=1,lwd=1.5,xaxt='n',
ylim=c(0,8e-5),main='Broward',xlab='Time',ylab='Incidence rate',
cex=1.2,cex.lab=1.3,cex.axis=1.2,cex.main=1.3)
points(1:365,obs[,ids[1]],cex=1)
axis(1,cex.axis=1.2,at=c(1+c(1,62,123,184,245,306, 367)),
label=c('Jan','Mar','May','July','Sep','Nov','Jan'))
par(mar=c(3.5,3,1.5,1))
plot(1:365,mu[,ids[2]],type='l',lty=1,lwd=1.5,xaxt='n',
ylim=c(0,8e-5),main='Lake',xlab='Time',ylab='',
cex=1.2,cex.lab=1.3,cex.axis=1.2,cex.main=1.3)
points(1:365,obs[,ids[2]],cex=1)
axis(1,cex.axis=1.2,at=c(1+c(1,62,123,184,245,306, 367)),
label=c('Jan','Mar','May','July','Sep','Nov','Jan'))
par(mar=c(3.5,3.5,1.5,0.5))
plot(1:365,mu[,ids[3]],type='l',lty=1,lwd=1.5,xaxt='n',
ylim=c(0,8e-5),main='Pinellas',xlab='Time',ylab='Incidence rate',
cex=1.2,cex.lab=1.3,cex.axis=1.2,cex.main=1.3)
points(1:365,obs[,ids[3]],cex=1)
axis(1,cex.axis=1.2,at=c(1+c(1,62,123,184,245,306, 367)),
label=c('Jan','Mar','May','July','Sep','Nov','Jan'))
par(mar=c(3.5,3,1.5,1))
plot(1:365,mu[,ids[4]],type='l',lty=1,lwd=1.5,xaxt='n',
ylim=c(0,8e-5),main='Seminole',xlab='Time',ylab=' ',
cex=1.2,cex.lab=1.3,cex.axis=1.2,cex.main=1.3)
points(1:365,obs[,ids[4]],cex=1)
axis(1,cex.axis=1.2,at=c(1+c(1,62,123,184,245,306, 367)),
label=c('Jan','Mar','May','July','Sep','Nov','Jan'))
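The expression t(matrix(mu,m,n)) above turns a long vector, in which the county index varies fastest within each day, into an n x m matrix with one row per day and one column per county; this is why mu[,ids[1]] is the daily series for Broward. The small base-R example below shows the rearrangement with toy sizes and a hypothetical vector v whose values encode (day, county).

```r
m <- 3; n <- 4                    # toy sizes: 3 counties, 4 days
day    <- rep(1:n, each = m)      # county index varies fastest, as in ili_dat
county <- rep(1:m, times = n)
v <- 10 * day + county            # e.g. 23 encodes day 2, county 3
M <- t(matrix(v, m, n))           # n x m matrix: row = day, column = county
dim(M)                            # 4 3
M[2, 3]                           # 23
M[, 1]                            # 11 21 31 41: the series for county 1
```

matrix() fills column by column, so each column of matrix(v, m, n) holds one day; transposing puts days in rows, which is the layout the plotting code expects.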