Welcome to PowRPriori! A priori power analysis is a
critical step in designing robust and efficient studies. This package is
designed to make power analysis for complex linear models (LMs, LMMs,
GLMs, and GLMMs) intuitive, user-friendly and accessible. It ensures
robust estimations by using Monte-Carlo-style data simulation as a
foundation for the power analysis. One of the main goals of the package
is to provide a framework for a priori power analyses which does not
necessitate a deep understanding of the underlying statistics or ins and
outs of the model specifications. Ideally, we can focus on your research
design and expected effect sizes with the package doing the rest for us
without overly complicated function parameters.
This vignette will guide us through the entire workflow of a power analysis using a realistic example. We will cover: 1. Defining a study design. 2. Specifying expected effects based on theory. 3. Running the power simulation. 4. Interpreting and visualizing the results.
Finally, we will show brief examples of how to adapt the workflow for other common study designs.
Let’s imagine a common research scenario. We want to test a new cognitive training program.
Research Question: Does our new training program
(group: Treatment vs. Control) improve memory scores from a
pre-test to a post-test (time) more than in a control
group?
This is a classic 2x2 mixed design, with group as a
between-subject factor and time as a within-subject factor.
Since time is a within-subject factor, which can thus be
seen as “nested” within each individual measured, we can elegantly
analyze such a research design using a linear mixed effects model. The
important questions for our power analysis are, what does our study
design look like, what are the effect sizes we can realistically expect
(based on the existing literature) and what are reasonable estimates for
the random effects structure?
First, we translate our study design into a
PowRPriori_design object. We specify the name of our
subject ID (subject), our between-subject factors, and our
within-subject factors. Note that the id parameter does not
necessarily have to represent a person but can be any analysis unit on
the lowest level of your mixed effects design (e.g. a plot in a
garden).
Here we need to translate our research question into a statistical
model which can be tested. This package uses lme4-style formula to do
that. Our research question implies an interaction effect, since we want
to know if the effect of time is different between the
Control and the Treatment groups. We will also include a random
intercept for subject to account for the fact that
measurements from the same person are correlated (representing the
“nesting” of the data).
Our lme4-style model formula is:
score ~ group * time + (1 | subject)
On the left hand side of the tilde is our outcome variable and on the
right hand side our combination of fixed (group * time) and
random ((1 | subject)) effects.
This is the most critical part of our power analysis. We need to
define the effect sizes which we expect in our study. Ideally, there is
existing literature or an underlying theory that guides us here. We
could, however, also simulate however many effect sizes we like using
the package. In fact, varying the effect sizes can generally be a good
idea to get more robust estimates. If we wanted to, we could directly
define the fixed effects. However, to correctly simulate the data and,
more importantly, fit the model, the names of the coefficients need to
be specified such as the model fitting engine (lme4 in the
case of this package) expects them. To make this process easier,
PowRPriori gives we two options.
First, if we already know the values of the coefficients, we can use
the get_fixed_effects_structure helper function to let
PowRPriori automatically print a ready-to-use code snippet
to the console containing all correctly named coefficients and which is
structured so that PowRPriori understands it. The values
are left as placeholders (...), so all that is left to do
is fill them. Second, if we do not know the values of the coefficients,
PowRPRiori also allows us to think in terms of expected
outcomes and calculate the necessary coefficients for us. Let’s go
through both scenarios.
We will start with the case where we do not know the values of the coefficients, but can estimate reasonable values for the mean outcome scores in each condition.
Let’s assume the memory score is on a 100-point scale. We hypothesize the following mean scores for each condition:
-The Control group starts at 50 and improves slightly to 52 (a small practice effect).
-The Treatment group also starts at 50 but improves significantly to 60.
We create a data frame with these expected means:
expected_means <- tidyr::expand_grid(
group = c("Control", "Treatment"),
time = c("pre", "post")
)
# Assign the means based on our hypothesis
expected_means$mean_score <- c(50, 52, 50, 60)
knitr::kable(expected_means, caption = "Expected Mean Scores for Each Condition")| group | time | mean_score |
|---|---|---|
| Control | pre | 50 |
| Control | post | 52 |
| Treatment | pre | 50 |
| Treatment | post | 60 |
In more complex examples, the sequence of the means might not be as
straightforward as in this case. Then we can simply look at the
resulting dataframe created by expand_grid to see the
generated sequence of conditions.
Now that we have our dataframe containing the expected mean outcomes,
we use the helper function
fixed_effects_from_average_outcome() to translate these
intuitive means into the regression coefficients our model needs.
my_fixed_effects <- fixed_effects_from_average_outcome(
formula = score ~ group * time,
outcome = expected_means
)
#Note the naming of the coefficients is exactly as `lme4` expects them to be. Do not change these names!
print(my_fixed_effects)
#> $`(Intercept)`
#> [1] 50
#>
#> $groupTreatment
#> [1] 0
#>
#> $timepost
#> [1] 2
#>
#> $`groupTreatment:timepost`
#> [1] 8Let us now look at a scenario where we already know the values for
the coefficient. We will simply assume that we already knew the values
produced by the fixed_effects_from_average_outcome function
before. We will also use my_design here, which we created
above. In combination with supplying the formula, the process of
obtaining the correct structure for the fixed effects becomes rather
simple.
get_fixed_effects_structure(formula = score ~ group * time + (1 | subject), design = my_design)
#> Copy the following code and fill the placeholders (...) with your desired coefficients:
#>
#> fixed_effects <- list(
#> `(Intercept)` = ...,
#> groupTreatment = ...,
#> timepost = ...,
#> `groupTreatment:timepost` = ...
#> )Now we can fill in the values:
We also need to define the values of the random effects structure
(i.e. the variance components). Again, ideally, we have some sort of
guide for the values of these so that we can assign realistic values.
The random effects can influence the model fitting process considerably
when using mixed effects model, especially in more complex scenarios.
This means that we’ll need to strike a balance between realistic values
and tweaking it so that the model can be fit. Fortunately,
PowRPriori lets us know if the model fitting process is
problematic during simulation.
For now, let’s assume the standard deviation of our subjects’ baseline scores (the random intercept) is 8 points, and the residual standard deviation (random noise) unexplained by our model is 12 points.
We can use get_random_effects_structure() to get a
template very similar to what get_fixed_effects_structure()
provides.
# This helps us get the correct names
get_random_effects_structure(score ~ group * time + (1|subject), my_design)
#> Copy the following structure and fill the placeholders (...) with your random effects values:
#>
#> random_effects <- list(
#> subject = list(
#> `(Intercept)` = ...
#> ),
#> sd_resid = ...
#> )Now we can again fill in the values:
We now have all the ingredients to run our power simulation. We want
to find the sample size needed to achieve 80% power at an alpha level of
5% for our key hypothesis: the groupTreatment:timepost
interaction. We’ll start with N=30 and increase in steps of 10. Note
that neither the 80% power nor the 5% alpha level are set in stone.
Depending on the context, we might also want to adjust these.
# NOTE: This function can take a few minutes to run.
# For a real analysis, n_sims should be 1000 or higher (default value is 2000 simulations).
# We use a low n_sims here for a quick example.
power_results <- power_sim(
formula = score ~ group * time + (1 | subject),
design = my_design,
fixed_effects = my_fixed_effects,
random_effects = my_random_effects,
test_parameter = "groupTreatment:timepost",
n_start = 30,
n_increment = 10,
n_sims = 200, # Use >= 1000 for real analysis
power_crit = 0.80,
alpha = 0.05,
parallel_plan = "sequential"
)After the simulation is complete, we can inspect the results object.
The summary() function provides a detailed overview,
including the power table and a check of the parameter recovery (which
confirms our simulation ran as expected).
summary(power_results)
#> --- Power Simulation Results ---
#>
#> Family: gaussianPower Table:
#> Formula: score ~ group * time + (1 | subject)
#>
#> subject power_groupTreatment.timepost ci_lower_groupTreatment.timepost
#> 1 30 0.230 0.172
#> 2 40 0.315 0.251
#> 3 50 0.425 0.356
#> 4 60 0.460 0.391
#> 5 70 0.500 0.431
#> 6 80 0.570 0.501
#> 7 90 0.595 0.527
#> 8 100 0.605 0.537
#> 9 110 0.725 0.663
#> 10 120 0.735 0.674
#> 11 130 0.725 0.663
#> 12 140 0.815 0.761
#> ci_upper_groupTreatment.timepost n_singular n_convergence n_identifiable
#> 1 0.288 8 0 0
#> 2 0.379 5 0 0
#> 3 0.494 2 0 0
#> 4 0.529 2 0 0
#> 5 0.569 0 0 0
#> 6 0.639 0 0 0
#> 7 0.663 0 0 0
#> 8 0.673 0 0 0
#> 9 0.787 0 0 0
#> 10 0.796 0 0 0
#> 11 0.787 0 0 0
#> 12 0.869 0 0 0
#> n_other_warnings
#> 1 0
#> 2 0
#> 3 0
#> 4 0
#> 5 0
#> 6 0
#> 7 0
#> 8 0
#> 9 0
#> 10 0
#> 11 0
#> 12 0
#>
#> --- Parameter Recovery (Fixed Effects) ---
#> True_Value Avg_Estimated Bias
#> Intercept 50 50.085 0.085
#> groupTreatment 0 -0.420 -0.420
#> timepost 2 1.777 -0.223
#> groupTreatment:timepost 8 8.436 0.436
#>
#> --- Parameter Recovery (Random Effects) ---
#>
#> Standard Deviations & Correlations:
#>
#> Group: subject
#> Parameter True_Value Avg_Estimated Bias
#> 1 Intercept 8 7.915 -0.085
#>
#> Group: sd_resid
#> Parameter True_Value Avg_Estimated Bias
#> 1 sd_resid 12 12.049 0.049
#>
#>
#> Intra-Class Correlations (ICC):
#>
#> Level Implied_True_ICC Avg_Estimated_ICC Bias
#> 1 subject 0.308 0.301 -0.006
#>
#>
#> --- Note ---
#> Small bias values indicate that the simulation correctly
#> estimated the a-priori specified effect sizes.From this table, we can see that we need approximately N=140 subjects to reach our desired 80% power at an alpha level of 5%. Also, we can see that our model had some singular fits, especially when the sample size was smaller. Although sometimes singular fits can happen, if they happen often, this can mean that our model specifications might be problematic. Right now, we do not need to delve further into diagnostics, but investigating our random effects parameters would probably be useful here.
A plot of the power curve is often an intuitive way to present the overall trajectory of the power across the increasing sample sizes.
This plot visually confirms our finding from the table.
Another useful aspect to investigate is whether the data our
simulation produced looks plausible or conforms to our initial
intention. To investigate this, we can call the
plot_sim_model function with type = "data" to
plot sample data from our PowRPriori object.
The PowRPriori workflow is flexible. Here are brief
examples for other common designs.
We could also design our study so that the intervention is applied to
whole classes, not individual pupils. We can use the hierarchical
structure in define_design to specify this. In the
following scenario, the intervention is applied at class level, i.e. if
a class is in the intervention group, every pupil in the respective
class is also in the intervention group.
cluster_design <- define_design(
id = "pupil",
nesting_vars = list(class = 1:20), # 20 classes
between = list(
class = list(intervention = c("yes", "no")) # Intervention at class level
),
within = list(
time = c("pre", "post")
)
)
# The rest of the workflow (specifying effects, running power_sim)
# would follow the same logic as the main example.If our outcome is binary (e.g., pass/fail), we can specify
family = "binomial". We then specify our expected effects
as probabilities (between 0 and 1).
# We expect the Control group to have a 50% pass rate at both times.
# The Treatment group starts at 50% but improves to a 75% pass rate.
glmm_probs <- expand_grid(
group = c("Control", "Treatment"),
time = c("pre", "post")
)
glmm_probs$pass_prob <- c(0.50, 0.50, 0.50, 0.75)
# The fixed effects are calculated from these probabilities
glmm_fixed_effects <- fixed_effects_from_average_outcome(
formula = passed ~ group * time,
outcome = glmm_probs,
family = "binomial"
)
# Note: For binomial models, sd_resid is not specified in random_effects. You could also use generate_random_effects_structure again as before.
glmm_random_effects <- list(
subject = list(Intercept = 2.0)
)
# The power_sim() call would then include `family = "binomial"`.This vignette demonstrated the complete workflow for conducting a
power analysis with PowRPriori. By focusing on a clear
definition of the design and an intuitive specification of the expected
effects, the package aims to make power simulation a straightforward and
robust process.