SpTe2M: Nonparametric Modeling and Monitoring of Spatio-Temporal Data

Kai Yang, Peihua Qiu

September 29, 2023

Introduction

Spatio-temporal data are common in practice. Such data often have complex structures that are difficult to describe and estimate. To overcome this difficulty, the R package SpTe2M has been developed to estimate the underlying spatio-temporal mean and covariance structures. The package also includes tools for online process monitoring, which detect change-points in a spatio-temporal process over time. More specifically, it implements the nonparametric spatio-temporal data modeling methods described in Yang and Qiu (2018, 2019, and 2022), as well as the online spatio-temporal process monitoring methods discussed in Qiu and Yang (2021 and 2023) and Yang and Qiu (2020). In this vignette, we demonstrate the main functions in SpTe2M by using the Florida influenza-like illness (ILI) data, which is a built-in dataset of the package.

Florida ILI Data

The Florida ILI dataset, named ili_dat in SpTe2M, contains daily ILI incidence rates for 67 Florida counties during the years 2012-2014. The observed ILI incidence rates were collected by the Electronic Surveillance System for the Early Notification of Community-based Epidemics of the Florida Department of Health. In addition to observations of the variable Rate (i.e., ILI incidence rate), the dataset also includes observations of the following 7 variables: County, Date, Lat (i.e., latitude of the geometric center of a county), Long (i.e., longitude of the geometric center of a county), Time, Temp (i.e., air temperature), and RH (i.e., relative humidity). First of all, let us install and load the package SpTe2M and take a quick look at the ILI data.

Load ILI data

First, we run the following R code to install the package from CRAN and load the package in R:

install.packages("SpTe2M")
library(SpTe2M)

Then, we read the built-in dataset ili_dat and use the R function head() to display its first 6 rows.

data(ili_dat)
head(ili_dat)
#>     County     Date      Lat      Long       Time         Rate Temp   RH
#> 1  alachua 1/1/2012 29.65320 -82.32440 0.00273224 8.020372e-06 60.5 78.6
#> 2    baker 1/1/2012 30.27236 -82.21351 0.00273224 0.000000e+00 62.0 78.6
#> 3      bay 1/1/2012 30.16190 -85.65297 0.00273224 3.532404e-05 61.5 78.6
#> 4 bradford 1/1/2012 29.96893 -82.12255 0.00273224 0.000000e+00 61.0 78.6
#> 5  brevard 1/1/2012 28.70765 -80.89049 0.00273224 0.000000e+00 68.0 78.6
#> 6  broward 1/1/2012 26.05192 -80.14526 0.00273224 2.527846e-05 71.0 78.6

Create ILI maps

Next, we make some geographic maps to investigate the spatio-temporal patterns of the ILI incidence. To do this, we first combine ili_dat with the map shape files included in the packages maps and mapproj and then create maps by using the function ggplot() in the package ggplot2.

library(maps)
library(ggplot2)
library(mapproj)
# turn the shape files in the maps package into a data frame
FL <- map_data('county','florida')
names(FL)[6] <- 'County'
# Only plot maps on Jun 15 and Dec 15
subdat <- subset(ili_dat,Date%in%c('6/15/2012','6/15/2013','6/15/2014',
                          '12/15/2012','12/15/2013','12/15/2014'))
subdat$Date <- factor(subdat$Date,levels=c('6/15/2012','6/15/2013','6/15/2014',
                                       '12/15/2012','12/15/2013','12/15/2014'))
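# merge ILI data with map data (matched on the shared column County)
mydat <- merge(FL,subdat)
# restore the polygon vertex order, which merge() is not guaranteed to keep
mydat <- mydat[order(mydat$group,mydat$order),]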
# make maps using ggplot
maps <- ggplot(data=mydat,aes(long,lat,group=group,fill=Rate))+geom_polygon()+
  facet_wrap(~Date,ncol=3)+geom_path(colour='grey10',lwd=0.5)+
  scale_fill_gradient2('',low='cyan',mid='white',high='navy',
                       guide='colorbar',limits=c(0,0.0003),na.value='orange', 
                       breaks=c(0,0.0001,0.0002,0.0003), 
                       labels=c('0','1e-4','2e-4','3e-4'))+
  guides(fill=guide_colorbar(barwidth=25,barheight=1,direction='horizontal'),
         cex=1)+theme_bw(base_size=15)+xlab('longitude')+ylab('latitude')+
  theme(legend.position='bottom',
        axis.ticks=element_blank(),
        line=element_blank(),
        axis.text=element_blank(),
        panel.border=element_rect(color='black',linewidth=1.2),
        axis.line=element_line(colour='black'),
        legend.margin=margin(-10,0,0,0),
        legend.box.margin=margin(0,0,0,0))
# display the maps
maps

The maps show that ILI incidence rates have obvious seasonal patterns, with more ILI cases in the winters and fewer ILI cases in the summers. Moreover, there seems to be an unusual pattern of ILI incidence rates in the winter of 2014, because the incidence rates on 12/15/2014 are much higher than those on 12/15/2012 and 12/15/2013. Next, we apply the package SpTe2M to this dataset to estimate the spatio-temporal mean and covariance structures and to monitor the ILI incidence sequentially over time.
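The seasonal pattern seen in the maps can also be checked numerically by averaging the incidence rates within each calendar month. The sketch below is not part of SpTe2M: monthly_mean_rate() is a hypothetical helper, and it assumes only that the data frame has a Date column stored as text in month/day/year format and a numeric Rate column, as in the head() output shown earlier. A small toy data frame with made-up values stands in for ili_dat so that the code runs on its own.

```r
# Hypothetical helper (not part of SpTe2M): average Rate within each
# calendar month, assuming Date is stored as text in m/d/Y format.
monthly_mean_rate <- function(dat) {
  d <- as.Date(as.character(dat$Date), format = "%m/%d/%Y")
  ym <- format(d, "%Y-%m")      # month label, e.g. "2012-01"
  tapply(dat$Rate, ym, mean)    # one mean per month, named by label
}

# Toy data standing in for ili_dat (values are made up for illustration)
toy <- data.frame(
  Date = c("1/1/2012", "1/2/2012", "6/15/2012", "12/15/2012"),
  Rate = c(2e-5, 4e-5, 1e-5, 5e-5)
)
toy_means <- monthly_mean_rate(toy)
print(toy_means)
```

With the package loaded, the same helper could be applied as monthly_mean_rate(ili_dat) to compare winter and summer months across the three years.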

ILI Data Modeling

In this part, we estimate the spatio-temporal mean and covariance structures for the year 2013 by using the functions spte_meanest() and spte_covest() in SpTe2M. To this end, we first use the following code to extract the ILI data observed in 2013.

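# n: days in a non-leap year; m: number of counties
# N1: rows for 2012 (a leap year with n+1 days); N2: rows for 2013
n <- 365; m <- 67; N1 <- (n+1)*m; N2 <- n*m
# extract the observed ILI data in year 2013
ili13 <- ili_dat[(N1+1):(N1+N2),]
# response vector and the (Lat, Long, Time) coordinates of each observation
y13 <- ili13$Rate; st13 <- ili13[,c('Lat','Long','Time')]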
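The index arithmetic above relies on two facts: 2012 is a leap year, so its block holds (n+1)*m = 366*67 rows, and the rows are assumed to be sorted by date first and by county second, as the head() output suggests. The sketch below rebuilds that layout with made-up county labels and confirms that rows (N1+1):(N1+N2) cover exactly the calendar year 2013. The data frame toy is illustrative only and does not use the real ILI data.

```r
n <- 365; m <- 67
N1 <- (n + 1) * m   # rows belonging to 2012 (366 days)
N2 <- n * m         # rows belonging to 2013 (365 days)

# Toy layout: every day from 2012 to 2014, each repeated for m counties
days <- seq(as.Date("2012-01-01"), as.Date("2014-12-31"), by = "day")
toy <- data.frame(
  Date   = rep(days, each = m),
  County = rep(paste0("county", 1:m), times = length(days))
)

# Rows selected by the same index range as in the vignette code
idx13 <- (N1 + 1):(N1 + N2)
format(range(toy$Date[idx13]))
#> [1] "2013-01-01" "2013-12-31"
```

If the real data were sorted differently, for example by county first, this index range would not isolate 2013, and subsetting on the Date column would be the safer choice.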

Estimate spatio-temporal mean

After obtaining the ILI data in year 2013, the spatio-temporal mean structure can be estimated by applying the function spte_meanest() to the extracted data.

# estimate the mean
mu.est <- spte_meanest(y=y13,st=st13)

Note that we do not specify the arguments ht and hs when calling the function spte_meanest(). In this case, the two bandwidths ht and hs are selected automatically by the modified cross-validation procedure (cf. Yang and Qiu 2018).
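To give a feel for what data-driven bandwidth selection does, the sketch below runs ordinary leave-one-out cross-validation for a one-dimensional kernel smoother in base R. It is a simplified stand-in, not the modified cross-validation procedure of Yang and Qiu (2018) and not the code used inside spte_meanest(): the function names epan() and loo_cv_score(), the Epanechnikov kernel, the candidate grid h_grid, and the simulated data are all assumptions made for illustration.

```r
set.seed(1)
# Simulated 1-D data: smooth seasonal signal plus noise (illustrative only)
t_obs <- seq(0, 1, length.out = 200)
y_obs <- sin(2 * pi * t_obs) + rnorm(200, sd = 0.3)

# Epanechnikov kernel, zero outside [-1, 1]
epan <- function(u) 0.75 * pmax(1 - u^2, 0)

# Leave-one-out CV score of a local constant (Nadaraya-Watson) smoother
loo_cv_score <- function(h, t, y) {
  W <- epan(outer(t, t, "-") / h)   # kernel weights between all pairs
  diag(W) <- 0                      # leave each point out of its own fit
  fit <- as.vector(W %*% y) / rowSums(W)
  mean((y - fit)^2)                 # mean squared prediction error
}

# Evaluate a small grid of candidate bandwidths and keep the best one
h_grid <- c(0.02, 0.05, 0.1, 0.2, 0.4)
scores <- sapply(h_grid, loo_cv_score, t = t_obs, y = y_obs)
h_best <- h_grid[which.min(scores)]
print(data.frame(h = h_grid, cv = scores))
print(h_best)
```

A very small bandwidth follows the noise and a very large one flattens the seasonal signal, so the cross-validation score is typically smallest somewhere in between. In SpTe2M, the bandwidths can instead be fixed by hand by passing ht and hs to spte_meanest().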

To visually check whether the estimated means describe the observed ILI data well, we use the code below to plot the estimated means for the 4 Florida counties Broward, Lake, Pinellas, and Seminole, along with the observed ILI incidence rates at the 4 counties.

mu <- mu.est$muhat; mu <- t(matrix(mu,m,n))
obs <- t(matrix(y13,m,n))
ids <- c(6,34,52,57) # IDs for Broward, Lake, Pinellas, Seminole
par(mfrow=c(2,2),mgp=c(2.4,1,0))
par(mar=c(3.5,3.5,1.5,0.5))
plot(1:365,mu[,ids[1]],type='l',lty=1,lwd=1.5,xaxt='n',
     ylim=c(0,8e-5),main='Broward',xlab='Time',ylab='Incidence rate',
     cex=1.2,cex.lab=1.3,cex.axis=1.2,cex.main=1.3)
points(1:365,obs[,ids[1]],cex=1)
axis(1,cex.axis=1.2,at=c(1+c(1,62,123,184,245,306, 367)),
     label=c('Jan','Mar','May','July','Sep','Nov','Jan'))
par(mar=c(3.5,3,1.5,1))
plot(1:365,mu[,ids[2]],type='l',lty=1,lwd=1.5,xaxt='n',
     ylim=c(0,8e-5),main='Lake',xlab='Time',ylab='',
     cex=1.2,cex.lab=1.3,cex.axis=1.2,cex.main=1.3)
points(1:365,obs[,ids[2]],cex=1)
axis(1,cex.axis=1.2,at=c(1+c(1,62,123,184,245,306, 367)),
     label=c('Jan','Mar','May','July','Sep','Nov','Jan'))
par(mar=c(3.5,3.5,1.5,0.5))
plot(1:365,mu[,ids[3]],type='l',lty=1,lwd=1.5,xaxt='n',
     ylim=c(0,8e-5),main='Pinellas',xlab='Time',ylab='Incidence rate',
     cex=1.2,cex.lab=1.3,cex.axis=1.2,cex.main=1.3)
points(1:365,obs[,ids[3]],cex=1)
axis(1,cex.axis=1.2,at=c(1+c(1,62,123,184,245,306, 367)),
     label=c('Jan','Mar','May','July','Sep','Nov','Jan'))
par(mar=c(3.5,3,1.5,1))
plot(1:365,mu[,ids[4]],type='l',lty=1,lwd=1.5,xaxt='n',
     ylim=c(0,8e-5),main='Seminole',xlab='Time',ylab=' ',
     cex=1.2,cex.lab=1.3,cex.axis=1.2,cex.main=1.3)
points(1:365,obs[,ids[4]],cex=1)
axis(1,cex.axis=1.2,at=c(1+c(1,62,123,184,245,306, 367)),
     label=c('Jan','Mar','May','July','Sep','Nov','Jan'))
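
The reshaping step at the top of the plotting code, t(matrix(mu,m,n)), works because the fitted values are assumed to be stored day by day, with the m counties of one day followed by the m counties of the next. A toy example with m = 3 locations and n = 2 days, using made-up values, shows how such a vector becomes an n-by-m matrix with one column per location.

```r
m <- 3; n <- 2
# Made-up values coded as day*10 + location, stored day by day
v <- c(11, 12, 13, 21, 22, 23)

# matrix() fills column by column, so each column holds one day;
# transposing gives one row per day and one column per location
M <- t(matrix(v, m, n))
M
#>      [,1] [,2] [,3]
#> [1,]   11   12   13
#> [2,]   21   22   23

# time series for location 2 across the two days
M[, 2]
#> [1] 12 22
```

This is why mu[,ids[1]] in the plotting code gives the 365 daily estimates for a single county.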