In this vignette, we will explain how to compute a Bayes factor for mixtures of equality and inequality-constrained hypotheses for multinomial models.
As example for a mixture of equality and inequality-constrained
hypotheses in multinomial models, we will use the dataset,
peas
, which is included in the package
multibridge
. The dataset provides the categorization of
crossbreeds between a plant variety that produced round yellow peas with
a plant variety that produced wrinkled green peas. This dataset contains
the phenotypes of peas from 556 plants that were categorized either as
(1) round and yellow, (2) wrinkled and yellow, (3) round and green, or
(4) wrinkled and green. Furthermore, in the context of the evaluation of
mixture of equality and inequality-constrained hypotheses the dataset
was discussed in Sarafoglou et al.
(2021).
library(multibridge)
data(peas)
peas
## peas counts
## 1 roundYellow 315
## 2 wrinkledYellow 101
## 3 roundGreen 108
## 4 wrinkledGreen 32
The model that we will use assumes that the vector of observations \(x_1, \cdots, x_K\) in the \(K\) categories follow a multinomial distribution. The parameter vector of the multinomial model, \(\theta_1, \cdots, \theta_K\), contains the probabilities of observing a value in a particular category; here, it reflects the probabilities that the peas show one of the four phenotypes. The parameter vector \(\theta_1, \cdots, \theta_K\) is drawn from a Dirichlet distribution with concentration parameters \(\alpha_1, \cdots, \alpha_K\). The model can be described as follows:
\[\begin{align} x_1, \cdots, x_K &\sim \text{Multinomial}(\sum_{k = 1}^K x_k, \theta_1, \cdots, \theta_K) \\ \theta_1, \cdots, \theta_K &\sim \text{Dirichlet}(\alpha_1, \cdots, \alpha_K) \\ \end{align}\]
Based on the Mendelian laws of inheritance we test the informed hypothesis \(\mathcal{H}_r\) that the number of peas that will be categorized as “round and yellow” will be highest, since both traits are dominant in the parent plants and should thus appear in the offspring. Furthermore, the Mendelian laws of inheritance predict that the phenotypes “wrinkled and yellow” and “round and green” occur second most often and the probability to fall into one of the two categories is equal, due to the fact that in each case one of the traits is dominant. Consequently, “wrinkled and green” peas should appear least often. This informed hypothesis will be tested against the encompassing hypothesis \(\mathcal{H}_e\) without constraints:
\[\begin{align*} \mathcal{H}_m &: \theta_{1} > \theta_{2} = \theta_{3} > \theta_{4} \\ \mathcal{H}_e &: \theta_1, \theta_2, \theta_{3}, \theta_{4}. \end{align*}\]
To compute the Bayes factor in favor of the restricted hypothesis, \(\text{BF}_{re}\), we need to specify (1) a vector containing the number of observations, (2) the restricted hypothesis, (3) a vector with concentration parameters, (4) the labels of the categories of interest (i.e., the manifestation of the peas).
x <- peas$counts
# Test the following restricted Hypothesis:
# Hr: roundYellow > wrinkledYellow = roundGreen > wrinkledGreen
Hr <- c('roundYellow > wrinkledYellow = roundGreen > wrinkledGreen')
# Prior specification
# We assign a uniform Dirichlet distribution, that is, we set all concentration parameters to 1
a <- c(1, 1, 1, 1)
categories <- peas$peas
With this information, we can now conduct the analysis with the
function mult_bf_informed()
. Since we are interested in
quantifying evidence in favor of the informed hypothesis, we set the
Bayes factor type to BFre
. For reproducibility, we are also
setting a seed with the argument seed
:
results <- multibridge::mult_bf_informed(x=x,Hr=Hr, a=a, factor_levels=categories,
bf_type = 'BFre', seed = 2020)
We can get a quick overview of the results by using the implemented
summary()
method:
m1 <- summary(results)
m1
## Bayes factor analysis
##
## Hypothesis H_e:
## All parameters are free to vary.
##
## Hypothesis H_r:
## roundYellow > wrinkledYellow = roundGreen > wrinkledGreen
##
## Bayes factor estimate BFre:
##
## 64.379
##
## Based on 1 independent equality-constrained hypothesis
## and 1 independent inequality-constrained hypothesis.
##
## Relative Mean-Square Error:
##
## 0.000932
##
## Posterior Median and Credible Intervals Of Marginal Beta Distributions:
## alpha beta 2.5% 50% 97.5%
## 1 roundYellow 1 + 315 3 + 241 0.523 0.5640 0.6050
## 2 wrinkledYellow 1 + 101 3 + 455 0.151 0.1820 0.2150
## 3 roundGreen 1 + 108 3 + 448 0.163 0.1940 0.2280
## 4 wrinkledGreen 1 + 32 3 + 524 0.041 0.0584 0.0799
The summary of the results shows the Bayes factor estimate, the
evaluated informed hypothesis and the posterior parameter estimates of
the marginal beta distributions (based on the encompassing model). The
data show evidence in factor of our informed hypothesis: The data is
64.4 more likely to have occurred under the informed hypothesis than
under the encompassing hypothesis. We can also further decompose the
Bayes factor into an equality constrained Bayes factor (i.e., the Bayes
factor that evaluates the equality constraints against the encompassing
hypothesis) and an inequality constrained Bayes factor (i.e., the Bayes
factor that evaluates the inequality constraints against the
encompassing hypothesis given that the equality constraints hold). We
can access this information with the S3
method
bayes_factor
bayes_list <- bayes_factor(results)
bayes_list$bf_table
## bf_type bf_total bf_equalities bf_inequalities
## 1 LogBFer -4.16479226 -2.33226549 -1.8325268
## 2 BFer 0.01553294 0.09707557 0.1600087
## 3 BFre 64.37930720 10.30125246 6.2496582
# Bayes factors in favor for informed hypothesis
bfre <- bayes_list$bf_table[bayes_list$bf_table$bf_type=='BFre', ]
Based on this summary table of the Bayes factors, we can infer the following:
Details on the decomposition of the Bayes factor can be found in Sarafoglou et al. (2021).