Hierarchical Bayesian models

Parametric Bayesian framework

Currently, serosv only has models under parametric Bayesian framework

Proposed approach

Prevalence has a parametric form $\pi(a_i, \alpha)$ where $\alpha$ is a parameter vector

One can constraint the parameter space of the prior distribution $P(\alpha)$ in order to achieve the desired monotonicity of the posterior distribution $P(\pi_1, \pi_2, ..., \pi_m|y,n)$

Where:

$n = (n_1, n_2, ..., n_m)$ and $n_i$ is the sample size at age $a_i$

$y = (y_1, y_2, ..., y_m)$ and $y_i$ is the number of infected individual from the $n_i$ sampled subjects

Farrington

Refer to Chapter 10.3.1

Proposed model

The model for prevalence is as followed

$\pi (a) = 1 - exp\{ \frac{\alpha_1}{\alpha_2}ae^{-\alpha_2 a} + \frac{1}{\alpha_2}(\frac{\alpha_1}{\alpha_2} - \alpha_3)(e^{-\alpha_2 a} - 1) -\alpha_3 a \}$

For likelihood model, independent binomial distribution are assumed for the number of infected individuals at age $a_i$

$y_i \sim Bin(n_i, \pi_i), \text{ for } i = 1,2,3,...m$

The constraint on the parameter space can be incorporated by assuming truncated normal distribution for the components of $\alpha$ , $\alpha = (\alpha_1, \alpha_2, \alpha_3)$ in $\pi_i = \pi(a_i,\alpha)$

$\alpha_j \sim \text{truncated } \mathcal{N}(\mu_j, \tau_j), \text{ } j = 1,2,3$

The joint posterior distribution for $\alpha$ can be derived by combining the likelihood and prior as followed

$P(\alpha|y) \propto \prod^m_{i=1} \text{Bin}(y_i|n_i, \pi(a_i, \alpha)) \prod^3_{i=1}-\frac{1}{\tau_j}\text{exp}(\frac{1}{2\tau^2_j} (\alpha_j - \mu_j)^2)$

Where the flat hyperprior distribution is defined as followed:
- $\mu_j \sim \mathcal{N}(0, 10000)$
- $\tau^{-2}_j \sim \Gamma(100,100)$

The full conditional distribution of $\alpha_i$ is thus $P(\alpha_i|\alpha_j,\alpha_k, k, j \neq i) \propto -\frac{1}{\tau_i}\text{exp}(\frac{1}{2\tau^2_i} (\alpha_i - \mu_i)^2) \prod^m_{i=1} \text{Bin}(y_i|n_i, \pi(a_i, \alpha))$

Fitting data

To fit Farrington model, use hierarchical_bayesian_model() and define type = "far2" or type = "far3" where

type = "far2" refers to Farrington model with 2 parameters ( $\alpha_3 = 0$ )
type = "far3" refers to Farrington model with 3 parameters ( $\alpha_3 > 0$ )

df <- mumps_uk_1986_1987
model <- hierarchical_bayesian_model(df, type="far3")
#> 
#> SAMPLING FOR MODEL 'fra_3' NOW (CHAIN 1).
#> Chain 1: Rejecting initial value:
#> Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
#> Chain 1:   Stan can't start sampling from this initial value.
#> Chain 1: Rejecting initial value:
#> Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
#> Chain 1:   Stan can't start sampling from this initial value.
#> Chain 1: 
#> Chain 1: Gradient evaluation took 6.6e-05 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.66 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1: 
#> Chain 1: 
#> Chain 1: Iteration:    1 / 5000 [  0%]  (Warmup)
#> Chain 1: Iteration:  500 / 5000 [ 10%]  (Warmup)
#> Chain 1: Iteration: 1000 / 5000 [ 20%]  (Warmup)
#> Chain 1: Iteration: 1500 / 5000 [ 30%]  (Warmup)
#> Chain 1: Iteration: 1501 / 5000 [ 30%]  (Sampling)
#> Chain 1: Iteration: 2000 / 5000 [ 40%]  (Sampling)
#> Chain 1: Iteration: 2500 / 5000 [ 50%]  (Sampling)
#> Chain 1: Iteration: 3000 / 5000 [ 60%]  (Sampling)
#> Chain 1: Iteration: 3500 / 5000 [ 70%]  (Sampling)
#> Chain 1: Iteration: 4000 / 5000 [ 80%]  (Sampling)
#> Chain 1: Iteration: 4500 / 5000 [ 90%]  (Sampling)
#> Chain 1: Iteration: 5000 / 5000 [100%]  (Sampling)
#> Chain 1: 
#> Chain 1:  Elapsed Time: 1.576 seconds (Warm-up)
#> Chain 1:                0.485 seconds (Sampling)
#> Chain 1:                2.061 seconds (Total)
#> Chain 1:
#> Warning: There were 1729 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them.
#> Warning: Examine the pairs() plot to diagnose sampling problems
#> Warning: The largest R-hat is 1.28, indicating chains have not mixed.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#r-hat
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#tail-ess

model$info
#>                       mean      se_mean           sd          2.5%
#> alpha1        1.359200e-01 1.718392e-03 6.634252e-03  1.267890e-01
#> alpha2        1.950533e-01 1.140411e-03 8.105212e-03  1.809270e-01
#> alpha3        7.567629e-03 3.736334e-04 5.954382e-03  6.839367e-04
#> tau_alpha1    3.526243e-01 1.867970e-01 9.495085e-01  4.317908e-06
#> tau_alpha2    1.668595e+00 1.274720e+00 3.000895e+00  9.079584e-06
#> tau_alpha3    8.960579e-02 4.787153e-02 2.607501e-01  3.228223e-06
#> mu_alpha1    -5.020807e+00 3.040566e+00 3.468501e+01 -9.811121e+01
#> mu_alpha2     1.092032e-01 1.214227e+00 2.812453e+01 -6.665632e+01
#> mu_alpha3    -9.867857e+00 8.021242e+00 4.499586e+01 -9.719373e+01
#> sigma_alpha1  7.096216e+01 1.549036e+01 3.165892e+02  5.643009e-01
#> sigma_alpha2  5.147320e+01 1.300805e+01 2.839421e+02  3.507393e-01
#> sigma_alpha3  4.555268e+02 3.579400e+02 9.370173e+03  9.808250e-01
#> lp__         -2.535145e+03 6.460925e-01 3.605881e+00 -2.543527e+03
#>                        25%           50%           75%         97.5%      n_eff
#> alpha1        1.305291e-01  1.337552e-01  1.408133e-01  1.504679e-01  14.905261
#> alpha2        1.912298e-01  1.929279e-01  1.984758e-01  2.153343e-01  50.513286
#> alpha3        3.710596e-03  6.979474e-03  8.457078e-03  2.479728e-02 253.969761
#> tau_alpha1    1.000828e-03  2.804623e-02  7.209248e-02  3.140354e+00  25.837941
#> tau_alpha2    3.931032e-03  7.702037e-02  1.193477e+00  8.128890e+00   5.542067
#> tau_alpha3    2.567107e-04  5.291966e-04  1.052404e-02  1.039482e+00  29.668421
#> mu_alpha1    -5.232470e+00 -2.381782e+00  7.419195e-01  6.725207e+01 130.129197
#> mu_alpha2    -1.684258e+00  3.198146e-01  2.342777e+00  6.354279e+01 536.500895
#> mu_alpha3    -4.227952e+01 -3.980619e+00  5.248327e+00  9.663215e+01  31.467480
#> sigma_alpha1  3.724389e+00  5.971215e+00  3.160979e+01  4.812418e+02 417.704858
#> sigma_alpha2  9.153621e-01  3.603273e+00  1.594949e+01  3.318847e+02 476.469754
#> sigma_alpha3  9.747849e+00  4.347019e+01  6.241343e+01  5.565829e+02 685.290783
#> lp__         -2.537527e+03 -2.533755e+03 -2.532478e+03 -2.529927e+03  31.148233
#>                   Rhat
#> alpha1       1.1107023
#> alpha2       1.0246646
#> alpha3       0.9997907
#> tau_alpha1   1.0059599
#> tau_alpha2   1.3294355
#> tau_alpha3   1.0125176
#> mu_alpha1    1.0224104
#> mu_alpha2    1.0010558
#> mu_alpha3    1.0505264
#> sigma_alpha1 1.0025618
#> sigma_alpha2 1.0075432
#> sigma_alpha3 1.0011065
#> lp__         1.1035163
plot(model)
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's fill values.

Log-logistic

Proposed approach

The model for seroprevalence is as followed

$\pi(a) = \frac{\beta a^\alpha}{1 + \beta a^\alpha}, \text{ } \alpha, \beta > 0$

The likelihood is specified to be the same as Farrington model ( $y_i \sim Bin(n_i, \pi_i)$ ) with

$\text{logit}(\pi(a)) = \alpha_2 + \alpha_1\log(a)$

Where $\alpha_2 = \text{log}(\beta)$

The prior model of $\alpha_1$ is specified as $\alpha_1 \sim \text{truncated } \mathcal{N}(\mu_1, \tau_1)$ with flat hyperprior as in Farrington model

$\beta$ is constrained to be positive by specifying $\alpha_2 \sim \mathcal{N}(\mu_2, \tau_2)$

The full conditional distribution of $\alpha_1$ is thus

$P(\alpha_1|\alpha_2) \propto -\frac{1}{\tau_1} \text{exp} (\frac{1}{2 \tau_1^2} (\alpha_1 - \mu_1)^2) \prod_{i=1}^m \text{Bin}(y_i|n_i,\pi(a_i, \alpha_1, \alpha_2) )$

And $\alpha_2$ can be derived in the same way

Fitting data

To fit Log-logistic model, use hierarchical_bayesian_model() and define type = "log_logistic"

df <- rubella_uk_1986_1987
model <- hierarchical_bayesian_model(df, type="log_logistic")
#> 
#> SAMPLING FOR MODEL 'log_logistic' NOW (CHAIN 1).
#> Chain 1: 
#> Chain 1: Gradient evaluation took 1e-05 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.1 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1: 
#> Chain 1: 
#> Chain 1: Iteration:    1 / 5000 [  0%]  (Warmup)
#> Chain 1: Iteration:  500 / 5000 [ 10%]  (Warmup)
#> Chain 1: Iteration: 1000 / 5000 [ 20%]  (Warmup)
#> Chain 1: Iteration: 1500 / 5000 [ 30%]  (Warmup)
#> Chain 1: Iteration: 1501 / 5000 [ 30%]  (Sampling)
#> Chain 1: Iteration: 2000 / 5000 [ 40%]  (Sampling)
#> Chain 1: Iteration: 2500 / 5000 [ 50%]  (Sampling)
#> Chain 1: Iteration: 3000 / 5000 [ 60%]  (Sampling)
#> Chain 1: Iteration: 3500 / 5000 [ 70%]  (Sampling)
#> Chain 1: Iteration: 4000 / 5000 [ 80%]  (Sampling)
#> Chain 1: Iteration: 4500 / 5000 [ 90%]  (Sampling)
#> Chain 1: Iteration: 5000 / 5000 [100%]  (Sampling)
#> Chain 1: 
#> Chain 1:  Elapsed Time: 0.345 seconds (Warm-up)
#> Chain 1:                0.262 seconds (Sampling)
#> Chain 1:                0.607 seconds (Total)
#> Chain 1:
#> Warning: There were 587 divergent transitions after warmup. See
#> https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> to find out why this is a problem and how to eliminate them.
#> Warning: Examine the pairs() plot to diagnose sampling problems
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#tail-ess

model$type
#> [1] "log_logistic"
plot(model)
#> Warning: No shared levels found between `names(values)` of the manual scale and the
#> data's fill values.