The synlik
package provides Synthetic Likelihood methods for intractable likelihoods. The package is meant to be
as general purpose as possible: as long as you are able to simulate data from your model you should be able to fit it.
A synlik
object is mainly composed of the simulator
, which is the function that simulates data from the model of interest.
In addition, it is possible to specify a function summaries
, which transforms the data generated by simulator
into
summary statistics. The simulator
can generate any kind of output, as long as summaries
is able to transform it into a matrix
where each row is a simulated vector of statistics. If summaries
is not specified, then simulator
has to output such a matrix.
Here we set-up a synlik
object representing the Ricker map. The observations are given by \(Y_t \sim Pois(\phi N_t)\), where the hidden state has the following dynamics: \(N_t = r N_{t-1} exp( -N_{t-1} + e_t )\). This is how we create the object:
library(synlik)
## Loading required package: Rcpp
ricker_sl <- synlik(simulator = rickerSimul,
summaries = rickerStats,
param = c( logR = 3.8, logSigma = log(0.3), logPhi = log(10) ),
extraArgs = list("nObs" = 50, "nBurn" = 50)
)
Here:
rickerSimul
and rickerStats
are functions provided by synlik
.param
is a named vector that contains the log-parameters that will be used by rickerSimul(param, nsim, extraArgs, ...)
. extraArgs
contains additional arguments required by rickerSimul
, see ?rickerSimul
for details.Now we are ready to simulate data from the object:
ricker_sl@data <- simulate(ricker_sl, nsim = 1, seed = 54)
Here ricker_sl@data
is just a vector, but synlik allows the simulator to simulate any kind of object, so it is often
necessary encapsulate an adequate plotting function into the object:
ricker_sl@plotFun <- function(input, ...) plot(drop(input), type = 'l', ylab = "Pop", xlab = "Time", ...)
plot(ricker_sl)
If we want to simulate several datasets we simply do:
tmp <- simulate(ricker_sl, nsim = 10)
dim(tmp)
## [1] 10 50
So far we have just simulated data, not summary statistics. In this particular example rickerStats
needs to be passed the reference data in ricker_sl@data
in order to be able to calculate the statistics. We can do that by using the slot extraArgs
:
ricker_sl@extraArgs$obsData <- ricker_sl@data
Now we are ready to simulate summary statistics:
tmp <- simulate(ricker_sl, nsim = 2, stats = TRUE)
tmp
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 0.8016316 0.0005366660 2.427694e-05 0.03743647 -0.2248205 36.90 17
## [2,] 0.9022046 0.0003317784 1.959014e-05 0.05564859 -0.2073172 37.38 21
## [,8] [,9] [,10] [,11] [,12] [,13]
## [1,] 3180.570 -903.5390 -370.0129 -526.6177 46.91435 -203.1856
## [2,] 3555.596 -863.9911 -930.2048 547.7001 -842.12995 225.4435
and to check their approximate normality:
checkNorm(ricker_sl)
If we want to estimate the value of the synthetic likelihood at a certain location in the parameter space, we can do it by using the function slik
as follows:
slik(ricker_sl,
param = c(logR = 3.8, logSigma = log(0.3), logPhi = log(10)),
nsim = 1e3)
## [1] -20.69029
We can also look at slices of this function wrt each parameter:
slice(object = ricker_sl,
ranges = list("logR" = seq(3.5, 3.9, by = 0.01),
"logPhi" = seq(2, 2.6, by = 0.01),
"logSigma" = seq(-2, -0.5, by = 0.02)),
param = c(logR = 3.8, logSigma = log(0.3), logPhi = log(10)),
nsim = 1000)
Finally we can have 2D slices with respect to pairs of parameters:
slice(object = ricker_sl,
ranges = list("logR" = seq(3.5, 3.9, by = 0.02),
"logPhi" = seq(2, 2.6, by = 0.02)),
pairs = TRUE,
param = c(logR = 3.8, logSigma = log(0.3), logPhi = log(10)),
nsim = 1000,
multicore = TRUE,
ncores = 2)
Notice that here we have used the multicore
option, which distributes the computation among different cores or cluster nodes.
Also slik
provides this option, but for such a simple model the time needed to set up the cluster is longer than the simulation time.
The unknown model parameters can be estimated by MCMC, using the smcmc
function.
Here is an example (you might want to increase niter
):
ricker_sl <- smcmc(ricker_sl,
initPar = c(3.2, -1, 2.6),
niter = 10,
burn = 3,
priorFun = function(input, ...) sum(input),
propCov = diag(c(0.1, 0.1, 0.1))^2,
nsim = 500)
Notice that priorFun
returns the log-density of the prior. If we have not reached convergence we can do some more MCMC iterations by using the continue
generic:
ricker_sl <- continue(ricker_sl, niter = 10)
Finally we can plot the MCMC output (here we plot a pre-computed object):
data(ricker_smcmc)
addline1 <- function(parNam, ...) abline(h = ricker_smcmc@param[parNam], lwd = 2, lty = 2, col = 3)
addline2 <- function(parNam, ...) abline(v = ricker_smcmc@param[parNam], lwd = 2, lty = 2, col = 3)
plot(ricker_smcmc, addPlot1 = "addline1", addPlot2 = "addline2")
## [1] "Plotting the MCMC chains"
## Press <Enter> to see the next plot...
## [1] "The posterior densities"
## Press <Enter> to see the next plot...
## [1] "Plotting the log-likelihood chain"
## Press <Enter> to see the next plot...
## [1] "Plotting correlation structure of the posterior sample"
## Press <Enter> to see the next plot...
were we have added the green dotted lines, indicating the position of the true parameters.
As a more challenging example, let us consider the Blowfly model proposed by Wood (2010):
\[
N_{t} = R_t + S_t,
\]
where
\[
R_t \sim \text{Pois}(PN_{t-\tau}e^{-\frac{N_{t-\tau}}{N_0}}e_t),
\]
represents delayed recruitment process, while:
\[
S_t \sim \text{binom}(e^{-\delta\epsilon_t}, N_{t-1}),
\]
denotes the adult survival process. Finally \(e_t\) and \(\epsilon_t\) are independent gamma distributed random variables, with unit means and variances equal to \(\sigma_p^2\) and \(\sigma_d^2\) respectively. We start by creating a synlik
object:
blow_sl <- synlik(simulator = blowSimul,
summaries = blowStats,
param = log( c( "delta" = 0.16, "P" = 6.5, "N0" = 400,
"var.p" = 0.1, "tau" = 14, "var.d" = 0.1) ),
extraArgs = list("nObs" = 200, "nBurn" = 200, "steps" = 1),
plotFun = function(input, ...){
plot(drop(input), type = 'l', ylab = "Pop", xlab = "Time", ...)
}
)
for more details see ?blow_sl
. As before we simulate data and we store it back into the object:
blow_sl@data <- simulate(blow_sl, seed = 84)
blow_sl@extraArgs$obsData <- blow_sl@data
As for the Ricker example, blowStats
needs the observed data to calculate the statistics, hence we store it
into extraArgs$obsData
. Notice that this is just a peculiarity of the chosen summaries
function.
Then, we can estimate the parameters by adaptive MCMC (you might want to increase niter
):
blow_sl <- smcmc(blow_sl,
initPar = log( c( "delta" = 0.1, "P" = 8, "N0" = 600,
"sig.p" = 0.2, "tau" = 17, "sig.d" = 0.3) ),
niter = 2,
burn = 0,
propCov = diag(rep(0.001, 6)),
nsim = 500,
prior = function(input, ...){
sum(input) +
dunif(input[4], log(0.01), log(1), log = TRUE) +
dunif(input[6], log(0.01), log(1), log = TRUE)
},
targetRate = 0.15,
multicore = FALSE
)
We plot the results on the original scale (here we plot a pre-computed object):
data(blow_smcmc)
tmpTrans <- rep("exp", 6)
names(tmpTrans) <- names(blow_smcmc@param)
plot(blow_smcmc, trans = tmpTrans)
## [1] "Plotting the MCMC chains"
## Press <Enter> to see the next plot...
## [1] "The posterior densities"
## Press <Enter> to see the next plot...
## [1] "Plotting the log-likelihood chain"
## Press <Enter> to see the next plot...
## [1] "Plotting correlation structure of the posterior sample"
## Press <Enter> to see the next plot...
The package includes some of Nicholson (1954) experimental datasets, which can be loaded into the object:
data(bf1)
blow_sl@data <- bf1$pop
blow_sl@extraArgs$obsData <- blow_sl@data
Then it is possible to fit the model using the experimental data by MCMC.
Let us consider a model that does not produce time series data: the alpha-stable distribution, as
described in Nolan (2012). For a quick reference do library("stableDist")
and ?rstable
.
As a first step we need to wrap the function rstable
, which simulates random variables from this distribution,
into a function that fits the synlik
framework:
stableSimul <- function(param, nsim, extraArgs, ...)
{
if( !is.loaded("stabledist") ) library(stabledist)
# Some sanity check
if( !( c("nObs") %in% names(extraArgs) ) ) stop("extraArgs should contain nObs")
nObs <- extraArgs$nObs
stopifnot( length(param) == 4 )
param[c(1, 3)] <- exp(param[c(1, 3)])
if(abs(param[1] - 1) < 0.01) stop("alpha == 1 is not allowed")
# Actual simulation
output <- rstable(nObs * nsim, alpha = param[1], beta = param[2], gamma = param[3], delta = param[4])
return( matrix(output, nsim, nObs) )
}
We need also to set up a function that calculates the summary statistics, which in our case are 10 quantiles:
stableStats <- function(x, extraArgs, ...){
if (!is.matrix(x)) x <- matrix(x, 1, length(x))
X0 <- t( apply(x, 1, quantile, probs = seq(0.1, 0.9, length.out = 10)) )
unname(X0)
}
We then create a synlik
object:
stable_sl <- synlik( simulator = stableSimul,
summaries = stableStats,
param = c(alpha = log(1.5), beta = 0.1, gamma = log(1), delta = 2),
extraArgs = list("nObs" = 1000),
plotFun = function(input, ...) hist(input, xlab = "x", main = "The data", ...)
)
stable_sl@data <- simulate(stable_sl, seed = 67)
plot(stable_sl)
As we see from the histogram, the distribution is quite fat-tailed.
As before, can then estimate the parameters by MCMC (you might want to increase niter
):
stable_sl <- smcmc(stable_sl,
initPar = c(alpha = log(1.7), beta = -0.1, gamma = log(1.3), delta = 1.5),
niter = 2,
burn = 0,
priorFun = function(input, ...) {
dunif(input[1], log(1), log(2), log = TRUE) +
dunif(input[2], -1, 1, log = TRUE)
},
propCov = diag(c(0.1, 0.1, 0.1, 0.1))^2,
targetRate = 0.25,
nsim = 200)
# plot(stable_sl, trans = c("alpha" = "exp", "gamma" = "exp"))
By slicing the likelihood we check whether the model is well identified:
slice(object = stable_sl,
ranges = list("alpha" = log(seq(1.2, 1.9, by = 0.05)),
"beta" = seq(-0.5, 0.5, by = 0.05),
"gamma" = log(seq(0.5, 1.9, by = 0.05)),
"alpha" = seq(1.1, 2.2, by = 0.05)),
param = stable_sl@param,
trans = list("alpha" = "exp", "gamma" = "exp"),
nsim = 1000,
multicore = TRUE,
ncores = 2)
We probably could do better by putting some more effort in choosing the summary statistics.
Simon N Wood. Statistical inference for noisy nonlinear ecological dynamic systems. Nature, 466(7310):1102–1104, 2010.
Alexander J Nicholson. An outline of the dynamics of animal populations. Australian Journal of Zoology, 2(1):9–65, 1954.
Diethelm Wuertz, Martin Maechler and Rmetrics core team members. (2013). stabledist: Stable Distribution Functions. R package version 0.6–6.