The mlpwr
package is a powerful tool for comprehensive
power analysis and design optimization in research. It addresses
challenges in optimizing study designs for power across multiple
dimensions while considering cost constraints. By combining Monte Carlo
simulations, surrogate modeling techniques, and cost functions,
mlpwr
enables researchers to model the relationship between
design parameters and statistical power, allowing for efficient
exploration of the parameter space.
Using Monte Carlo simulation, mlpwr
estimates
statistical power across different design configurations by generating
simulated datasets and performing hypothesis tests on these. A surrogate
model, such as linear regression, logistic regression, support vector
regression (SVR), or Gaussian process regression, is then fitted to
approximate the power function. This facilitates the identification of
optimal design parameter values.
The mlpwr
package offers two primary types of outputs
based on specified goals and constraints. Researchers can obtain study
design parameters that yield the desired power level at the lowest
possible cost, taking budget limitations and resource availability into
account. Alternatively, researchers can identify design parameters that
maximize power within a given cost threshold, enabling informed resource
allocation.
In conclusion, the mlpwr
package provides a
comprehensive and flexible tool for power analysis and design
optimization. It guides users through the process of optimizing study
designs, enhancing statistical power, and making informed decisions
within their research context.
For more details, refer to Zimmer & Debelak (2023).
In this Vignette we will apply the mlpwr package to an ANOVA setting.
ANOVA, or Analysis of Variance, is a statistical test used to compare the means of three or more groups. It is commonly employed when examining the differences between multiple treatments, interventions, or populations.
Before conducting an ANOVA test, certain assumptions should be met:
ANOVA is robust to violations of the second and third assumption, especially with large sample sizes.
The formula for calculating the test statistic (F-value) in ANOVA is as follows: ANOVA formula:
\[ F = \frac{{\text{MS}_{\text{between}}}}{{\text{MS}_{\text{within}}}} \]
where:
\(\text{MS}_{\text{between}}\) represents the mean sum of squares between groups, calculated as:
\[ \text{MS}_{\text{between}} = \frac{{SS_{\text{between}}}}{{df_{\text{between}}}} \]
\(\text{MS}_{\text{within}}\) represents the mean sum of squares within groups, calculated as:
\[ \text{MS}_{\text{within}} = \frac{{SS_{\text{within}}}}{{df_{\text{within}}}} \]
\(SS_{\text{between}}\) represents the sum of squares between groups, calculated as the sum of squared deviations of the group means from the overall mean, weighted by the number of observations in each group.
\(SS_{\text{within}}\) represents the sum of squares within groups, calculated as the sum of squared deviations of the individual observations from their respective group means.
\(df_{\text{between}}\) represents the degrees of freedom between groups, which is equal to the number of groups minus one.
\(df_{\text{within}}\) represents the degrees of freedom within groups, which is equal to the total number of observations minus the number of groups.
The F-value is then compared against the critical value from the F-distribution with degrees of freedom \(df_{\text{between}}\) and \(df_{\text{within}}\).
In R, you can perform ANOVA using the aov()
function,
specifying the outcome variable and the grouping variable as
arguments.
For further information about ANOVA consult Armstrong et al. (2000).
Let’s consider an example to illustrate the use of ANOVA. Suppose someone has developed a psychological test to assess cognitive abilities. After deployment of the test, practitioners report varying performance in the test by individuals from different cultural backgrounds. The test is designed to measure problem-solving skills, which should not be affected by cultural background, and would be unfair if it was. We now want to investigate if the test score really differs between people from different backgrounds and conduct an ANOVA for that purpose, comparing the test score for people from different countries.
A corresponding dataset would look like this:
## score group
## 1 4.466840 A
## 2 3.722784 A
## 3 6.642591 B
## 4 5.115805 B
## 5 5.993559 C
## 6 4.930376 C
Where the group variable encodes the country of origin. We could conduct an ANOVA test like so:
## Df Sum Sq Mean Sq F value Pr(>F)
## group 2 3.485 1.7424 2.604 0.221
## Residuals 3 2.008 0.6692
We have conducted a one-way ANOVA and the group factor was significant at an alpha level of 0.05. This indicates a statistically significant different in group means, or put differently a statistically significant difference between our three countries. To see which groups differ exactly, we would need to perform post-hoc tests. A post-hoc test can be performed using Tukey’s Honestly Significant Difference (HSD). For more details consult this tuorial.
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = score ~ group, data = dat)
##
## $group
## diff lwr upr p adj
## B-A 1.7843860 -1.633974 5.202746 0.2206410
## C-A 1.3671557 -2.051204 4.785515 0.3478421
## C-B -0.4172303 -3.835590 3.001129 0.8720866
We see that at our specified alpha = 0.05
only the
difference between group A and B is statistically significant as the
p-value is below 0.05.
To ensure that we find statistical significant evidence for this
unfairness effect, if it exists, we perform an a-priori power analysis
before our ANOVA test. We want to find out how many countries and
participants per country we would need to find a statistically
significant effect. We use the mlpwr
package for this
purpose and tidyr
is used for the simulation. If it is your
first time using these packages, you have to install them like so:
Now the packages are permanently installed on your computer. To use them in R you need to (re-)load them every time you start a session.
mlpwr
relies on simulations for the power analysis. Thus
we need to write a function that simulates the data and conducts an
ANOVA based on our assumptions. The input to this function need to be
the parameters we want to investigate/optimize for in our power
analysis. For our scenario a function like this would do:
simfun_anova <- function(n, n.groups) {
groupmeans <- rnorm(n.groups, sd = 0.2)
dat <- sapply(groupmeans, function(x) rnorm(n, mean = x, sd = 1))
dat <- gather(as.data.frame(dat))
res <- aov(value ~ key, data = dat)
summary(res)[[1]][1, 5] < 0.01
}
This function takes two inputs: n
and
n.groups
, the number of participants per group (country)
and the number of groups (countries). These are the design parameters we
want to optimize for in the power analysis. We want to find out how many
participants we need per country and how many different countries to
achieve a certain power level.
The function first generates data for each group, by generating a
groupmean
for every group. We assume the data to be
normally distributed, thus rnorm
is used on the groupmeans
to generate individual observations. In line with the ANOVA assumptions
we expect all groups to have the same variance \[\sigma = 1\]. We set
alpha = 0.01
to be very strict and accept the alternative
hypothesis if the p-value is below that. Our function outputs the result
of the hypothesis test in a TRUE/FALSE format. This is important, as the
find.design
function of mlpwr we will use for the power
analysis expects this kind of output from the simulation.
A similar function for an ANOVA is implemented in the example simulation function of mlpwr and can be accessed like so:
mlpwr
allows for the option to weigh the design
parameters with a cost function. When optimizing multiple parameters,
submitting a cost function is necessary. This allows for easy study
design optimization under cost considerations. For our example let us
assume that recruiting additional participants for a country is not as
expensive as recruiting people from a whole new country. We can simulate
this in a cost function like so:
This assumes that the cost for an additional participant in a group is 1, while an additional group costs 10 times that of an additional participant.
The previous section showed how we can perform a data simulation with
a subsequent hypothesis test and construct a cost function for automatic
weighing between the design parameters. Now we perform a power
simulation with mlpwr
.
A power simulation can be done using the
find.design
function. For our purpose we submit 5 parameters
to it:
Note: additional specifications are possible (see documentation) but not necessary. For most parameters like the choice of surrogate function the default values should already be good choices.
The find.design
function needs to reiterate a data
simulation multiple times. For this purpose it expects a data generating
function (DGF) as its main input. The DGF takes the design parameters
(here n, n.groups
) as input and must output a logical value
of whether the hypothesis test was TRUE or FALSE.
With the specified parameters we can perform the power analysis. We set a random seed for reproducibility reasons.
set.seed(112)
res <- find.design(simfun = simfun_anova, boundaries = list(n = c(5,200), n.groups = c(5,30)),
power = .8, costfun = costfun_anova, evaluations = 2000)
Now we can summarize the results using the summary
command.
##
## Call:
## find.design(simfun = simfun_anova, boundaries = list(n = c(5,
## 200), n.groups = c(5, 30)), power = 0.8, evaluations = 2000,
## costfun = costfun_anova)
##
## Design: n = 109, n.groups = 7
##
## Power: 0.79726, SE: 0.01847, Cost: 179
## Evaluations: 2000, Time: 12.33, Updates: 32
## Surrogate: Gaussian process regression
As we can see the calculated sample size for the desired power of 0.8
is nA =
, nB =
. The estimated power for this
sample size is 0.79726 with a standard error of 0.01847.The summary
additionally reports the number of simulation function evaluations, the
time until termination in seconds, and the number of surrogate model
updates. See Zimmer &
Debelak (2023) for more details. We can also plot our power
simulation and look at the calculated function using the
plot
function.
Confidence Intervals (gray) are printed in addition to the estimated
power curve (black), so we can get a feel for how the design parameter
(here n, n.groups
) influence the power level and also where
the prediction is more or less uncertain. The black dots show us the
simulated data. This concludes our power analysis.