nimbleNoBounds
This vignette assumes the reader is already familiar with using NIMBLE to perform Markov chain Monte Carlo (MCMC).
The principal motivation of this package is that bounds on prior distributions can impair MCMC performance when posterior distributions are located close to those bounds. In particular, when using Metropolis-Hastings, proposals made beyond the known bounds of a distribution will inflate rejection rates. Moreover, in adaptive Metropolis-Hastings, such increased rejection rates will lead to shrinking the Metropolis-Hastings kernel, thereby increasing the auto-correlation in the accepted samples and reducing the effective sample size. The result can therefore be reduced MCMC efficiency.
This package offers a simple solution to avoid this scenario. Common bounded probability distributions (beta, uniform, exponential, gamma, chi-squared, half-flat, inverse-gamma, log-normal, Weibull) are transformed, via log or logit transformations, to the unbounded real line. The change of variables technique is used to obtain the transformed distributions. The resulting distributions are provided here for use in NIMBLE models.
In this vignette we explore the potential efficiency gains with a simple toy example.
nimbleNoBounds
The following transformed distributions are available.
Function | Transform | Distribution | Corresponding R function |
---|---|---|---|
dLogChisq |
Log | Chi-squared | dchisq |
dLogExp |
Log | Exponential | dexp |
dLogGamma |
Log | Gamma | dgamma |
dLogHalfflat |
Log | Half-flat | dhalfflat |
dLogInvgamma |
Log | Inverse-gamma | dinvgamma |
dLogLnorm |
Log | Log-normal | dlnorm |
dLogWeib |
Log | Weibull | dweib |
dLogitBeta |
Logit | Beta | dbeta |
dLogitUnif |
Logit | Uniform | dunif |
Note: the naming convention is d[Transform][Distribution]
. Only d
and r
functions have been provided. I do not plan to add p
or q
functions, but am open to collaboration if somebody else is motivated to add them.
The following toy problem compares sampling of bounded and unbounded distributions.
First, we create NIMBLE models with both classical bounded distributions, and with unbounded transformed equivalents.
library(nimbleNoBounds) # Also loads nimble
library(coda) # For MCMC diagnostics
## Model with bounded priors
boundedDistributionsModel <- nimbleCode({
b ~ dbeta(b1, b2)
c ~ dchisq(c1)
e ~ dexp(e1)
g ~ dgamma(g1, g2)
ig ~ dinvgamma(i1, i2)
l ~ dlnorm(l1, l2)
u ~ dunif(u1, u2)
w ~ dweib(w1, w2)
})
## Model with unbounded priors
unboundedDistributionsModel <- nimbleCode({
## Priors transformed to real line
tb ~ dLogitBeta(b1, b2)
tc ~ dLogChisq(c1)
te ~ dLogExp(e1)
tg ~ dLogGamma(g1, g2)
tig ~ dLogInvgamma(i1, i2)
tl ~ dLogLnorm(l1, l2)
tu ~ dLogitUnif(u1, u2)
tw ~ dLogWeib(w1, w2)
## Back-transformations
b <- ilogit(tb)
c <- exp(tc)
e <- exp(te)
g <- exp(tg)
ig <- exp(tig) ## Currently doesn't compile if a node is called "i"
l <- exp(tl)
u <- ilogit(tu)*(u2-u1)+u1 ## Scaled and shifted output from ilogit transformation
w <- exp(tw)
})
## List of (fixed) parameters
const = list(
b1=1 , b2=11, ## Beta
c1=2 , ## Chi-squared
i1=2.5 , i2=0.01, ## Inverse-gamma
u1=-6 , u2=66, ## Uniform
e1=0.1 , ## Exponential
g1=0.1 , g2=10, ## Gamma
l1=-3 , l2=0.1, ## Log-normal
w1=2 , w2=2 ## Weibull
)
## Build and compile models
rBounded <- nimbleModel(boundedDistributionsModel, constants=const)
rUnbounded <- nimbleModel(unboundedDistributionsModel, constants=const)
cBounded <- compileNimble(rBounded)
cUnbounded <- compileNimble(rUnbounded)
## Lists of nodes
stochNodesB = rBounded$getNodeNames(stochOnly=TRUE)
stochNodesU = rUnbounded$getNodeNames(stochOnly=TRUE)
monitorNodes = stochNodesB
distributions = substring(rBounded$getDistribution(monitorNodes), 2, 99)
Now we perform MCMC using NIMBLE’S univariate Metropolis-Hastings sampler. We use B
to indicate bounded and U
to indicate unbounded. The aim is to calculate the efficiency gain when sampling each distribution on the real line. This is done within a loop so that between-run variation can be accounted for. Efficiency is calculated as the ratio of effective sample size divided by time spent executing a given sampler. The efficiency gain is then calculated as the ratio (unbounded over bounded) of the efficiencies.
## Configure an MCMC for each model
configureMcmcB = configureMCMC(cBounded, monitors=monitorNodes)
configureMcmcU = configureMCMC(cUnbounded, monitors=monitorNodes)
## Remove posterior predictive samplers & replace with univariate Metropolis-Hastings
configureMcmcB$removeSamplers()
configureMcmcU$removeSamplers()
for (nd in stochNodesB) configureMcmcB$addSampler(nd)
for (nd in stochNodesU) configureMcmcU$addSampler(nd)
print(configureMcmcB)
print(configureMcmcU)
## Build & compile the MCMCs
rMcmcB <- buildMCMC(configureMcmcB)
rMcmcU <- buildMCMC(configureMcmcU)
cMcmcB <- compileNimble(rMcmcB)
cMcmcU <- compileNimble(rMcmcU)
## The following loop takes approximately 10 minutes.
## You can optionally skip running the loop and load the data object efficiencyGain from the package.
## Or simply just reduce nSamples and nReps.
runLoop = FALSE
if (runLoop) {
nSamples = 1E5
nReps = 111
efficiencyGain = matrix(0, ncol=length(monitorNodes), nrow=nReps)
for (ii in 1:nReps) {
print(ii)
## Run MCMC
cMcmcB$run(niter=nSamples, time = TRUE)
cMcmcU$run(niter=nSamples, time = TRUE)
## Extract samples
samplesB = as.matrix(cMcmcB$mvSamples)
samplesU = as.matrix(cMcmcU$mvSamples)
## Calculate effective sample size
(effB = effectiveSize(samplesB))
(effU = effectiveSize(samplesU))
## Time spent in each sampler
timeB = cMcmcB$getTimes()
timeU = cMcmcU$getTimes()
## Calculate sampling efficiency
(efficiencyB = effB / timeB)
(efficiencyU = effU / timeU)
## Calculate efficiencyGain
(efficiencyGain[ii,] = efficiencyU / efficiencyB)
}
colnames(efficiencyGain) = distributions
## write.table(efficiencyGain, file=here::here("nimbleNoBounds/data/efficiencyGain.csv"), sep=";", row.names = FALSE)
}
Finally, some box-plots of the efficiency gain per distribution, annotated with the with mean and 95% CIs.
data(efficiencyGain, package="nimbleNoBounds")
boxplot(log10(efficiencyGain), ylab="log10 Efficiency Gain", ylim=c(-1, 4))
abline(h=0)
for (ii in 1:length(distributions)) {
text(ii, -0.7, paste("x", signif(mean(efficiencyGain[,distributions[ii]]), 3)))
text(ii, -0.9, paste0("(", paste(signif(quantile(efficiencyGain[,distributions[ii]], c(0.025, 0.975)), 3), collapse=" - "), ")"), cex=0.7)
}
These results show that: