Basic tools for time-to-event trial simulation and testing

Keaven Anderson

Overview

This vignette demonstrates the lower-level routines in the simtrial package specifically related to trial generation and statistical testing.

The routines are as follows:

Application of the above is demonstrated using higher-level routines sim_pw_surv() and sim_fixed_n() to generate simulations and weighted logrank analysis for a stratified design.

The intent has been to write these routines in the spirit of the tidyverse approach (alternately referred to as data wrangling, tidy data, R for Data Science, or split-apply-combine). The other objectives are to have an easily documentable and validated package that is easy to use and efficient as a broadly-useful tool for simulation of time-to-event clinical trials.

The package could be extended in many ways in the future, including:

library(simtrial)
library(gt)
library(dplyr)

Randomization

Fixed block randomization with an arbitrary block contents is performed as demonstrated below. In this case we have a block size of 5 with one string repeated twice in each block and three other strings appearing once each.

randomize_by_fixed_block(n = 10, block = c("A", "Dog", "Cat", "Cat"))
#>  [1] "Cat" "Cat" "A"   "Dog" "Cat" "Cat" "Dog" "A"   "Dog" "Cat"

More normally, with a default of blocks of size four:

randomize_by_fixed_block(n = 20)
#>  [1] 0 1 1 0 0 1 1 0 0 0 1 1 1 0 1 0 0 1 0 1

Enrollment

Piecewise constant enrollment can be randomly generated as follows. Note that duration is specifies interval durations for constant rates; the final rate is extended as long as needed to generate the specified number of observations.

rpwexp_enroll(
  n = 20,
  enroll_rate = data.frame(
    duration = c(1, 2),
    rate = c(2, 5)
  )
)
#>  [1] 0.2447973 0.4380971 0.8612095 1.1413581 1.1989323 1.5872067 1.6325434
#>  [8] 1.6647893 1.8136604 1.9181315 2.0372685 2.3017409 2.3240102 2.8257518
#> [15] 3.1450429 3.2953514 3.3423569 3.5583331 3.7639825 4.0224348

Time-to-event and time-to-dropout

Time-to-event and time-to-dropout random number generation for observations is generated with piecewise exponential failure times. For a large number of observations, a log-plot of the time-to-failure

x <- rpwexp(
  10000,
  fail_rate = data.frame(
    rate = c(1, 3, 10),
    duration = c(.5, .5, 1)
  )
)
plot(
  sort(x),
  (10000:1) / 10001,
  log = "y",
  main = "PW Exponential simulated survival curve",
  xlab = "Time", ylab = "P{Survival}"
)

Generating a trial

Ideally, this might be done with a routine where generation of randomization, and time-to-event data could be done in a modular fashion plugged into a general trial generation routine. For now, stratified randomization, piecewise constant enrollment, fixed block randomization and piecewise exponential failure rates support a flexible set of trial generation options for time-to-event endpoint trials. At present, follow this format very carefully as little checking of input has been developed to-date. The methods used here have all be demonstrated above, but here they are combined in a single routine to generate a trial. Note that in the generated output dataset, cte is the calendar time of an event or dropout, whichever comes first, and fail is an indicator that cte represents an event time.

First we set up input variables to make the later call to sim_pw_surv() more straightforward to read.

stratum <- data.frame(stratum = c("Negative", "Positive"), p = c(.5, .5))

block <- c(rep("control", 2), rep("experimental", 2))

enroll_rate <- data.frame(rate = c(3, 6, 9), duration = c(3, 2, 1))

fail_rate <- data.frame(
  stratum = c(rep("Negative", 4), rep("Positive", 4)),
  period = rep(1:2, 4),
  treatment = rep(c(rep("control", 2), rep("experimental", 2)), 2),
  duration = rep(c(3, 1), 4),
  rate = log(2) / c(4, 9, 4.5, 10, 4, 9, 8, 18)
)
dropout_rate <- data.frame(
  stratum = c(rep("Negative", 4), rep("Positive", 4)),
  period = rep(1:2, 4),
  treatment = rep(c(rep("control", 2), rep("experimental", 2)), 2),
  duration = rep(c(3, 1), 4),
  rate = rep(c(.001, .001), 4)
)
x <- sim_pw_surv(
  n = 400,
  stratum = stratum,
  block = block,
  enroll_rate = enroll_rate,
  fail_rate = fail_rate,
  dropout_rate = dropout_rate
)

head(x) |>
  gt() |>
  fmt_number(columns = c("enroll_time", "fail_time", "dropout_time", "cte"), decimals = 2)
stratum enroll_time treatment fail_time dropout_time cte fail
Negative 0.21 experimental 8.65 1,558.49 8.86 1
Positive 0.33 experimental 30.96 630.77 31.29 1
Positive 1.40 control 0.18 995.62 1.58 1
Negative 1.51 control 0.70 985.93 2.22 1
Negative 2.23 control 8.92 841.02 11.15 1
Negative 2.47 experimental 0.21 18.72 2.68 1

Cutting data for analysis

There two ways to cut off data in the generated dataset x from above. The first uses a calendar cutoff date. The output only includes the time from randomization to event or dropout (tte), and indicator that this represents and event (event), the stratum in which the observation was generated (stratum) and the treatment group assigned (treatment). Observations enrolled after the input cut_date are deleted and events and censoring from x that are after the cut_date are censored at the specified cut_date.

y <- cut_data_by_date(x, cut_date = 5)

head(y) |>
  gt() |>
  fmt_number(columns = "tte", decimals = 2)
tte event stratum treatment
4.79 0 Negative experimental
4.67 0 Positive experimental
0.18 1 Positive control
0.70 1 Negative control
2.77 0 Negative control
0.21 1 Negative experimental

For instance, if we wish to cut the entire dataset when 50 events are observed in the Positive stratum we can use the get_cut_date_by_event function as follows:

cut50Positive <- get_cut_date_by_event(filter(x, stratum == "Positive"), 50)
y50Positive <- cut_data_by_date(x, cut50Positive)

with(y50Positive, table(stratum, event))
#>           event
#> stratum     0  1
#>   Negative 47 66
#>   Positive 57 50

Perhaps the most common way to cut data is with an event count for the overall population, which is done using the cut_data_by_event function. Note that if there are tied events at the date the cte the count is reached, all are included. Also, if the count is never reached, all event times are included in the cut - with no indication of an error.

y150 <- cut_data_by_event(x, 150)
table(y150$event, y150$treatment)
#>    
#>     control experimental
#>   0      53           59
#>   1      78           72

Generating a counting process dataset

Once we have cut data for analysis, we can create a dataset that is very simple to use for weighted logrank tests. A slightly more complex version could be developed in the future to enable Kaplan-Meier-based tests. We take the dataset y150 from above and process it into this format. The counting process format is further discussed in the next section where we compute a weighted logrank test.

ten150 <- counting_process(y150, arm = "experimental")

head(ten150) |>
  gt() |>
  fmt_number(columns = c("tte", "o_minus_e", "var_o_minus_e"), decimals = 2)
stratum events n_event_tol tte n_risk_tol n_risk_trt s o_minus_e var_o_minus_e
Negative 1 1 0.06 130 65 1.0000000 0.50 0.25
Negative 1 0 0.17 128 64 0.9923077 −0.50 0.25
Negative 1 0 0.18 127 64 0.9845553 −0.50 0.25
Negative 1 1 0.21 126 64 0.9768029 0.49 0.25
Negative 1 1 0.21 125 63 0.9690505 0.50 0.25
Negative 1 0 0.38 123 61 0.9612981 −0.50 0.25

Logrank and weighted logrank testing

Now stratified logrank and stratified weighted logrank tests are easily generated based on the counting process format. Each record in the counting process dataset represents a tte at which one or more events occurs; the results are stratum-specific. Included in the observation is the number of such events overall (events) and in the experimental treatment group (txevents), the number at risk overall (atrisk) and in the experimental treatment group (txatrisk) just before tte, the combined treatment group Kaplan-Meier survival estimate (left-continuous) at tte, the observed events in experimental group minus the expected at tte based on an assumption that all at risk observations are equally likely to have an event at any time, and the variance for this quantity (Var).

To generate a stratified logrank test and a corresponding one-sided p-value, we simply do the following:

z <- with(ten150, sum(o_minus_e) / sqrt(sum(var_o_minus_e)))
c(z, pnorm(z))
#> [1] -0.7193799  0.2359535

A Fleming-Harrington \(\rho=1\), \(\gamma=2\) is nearly as simple. We again compute a z-statistic and its corresponding one-sided p-value.

xx <- mutate(ten150, w = s * (1 - s)^2)
z <- with(xx, sum(o_minus_e * w) / sum(sqrt(var_o_minus_e * w^2)))
c(z, pnorm(z))
#> [1] -0.01961574  0.49217496

For Fleming-Harrington tests, a routine has been built to do these tests for you:

fh00 <- y150 |> wlr(weight = fh(rho = 0, gamma = 0))
fh01 <- y150 |> wlr(weight = fh(rho = 0, gamma = 1))
fh10 <- y150 |> wlr(weight = fh(rho = 1, gamma = 0))
fh11 <- y150 |> wlr(weight = fh(rho = 1, gamma = 1))

temp_tbl <- fh00 |>
  unlist() |>
  as.data.frame() |>
  cbind(fh01 |> unlist() |> as.data.frame()) |>
  cbind(fh10 |> unlist() |> as.data.frame()) |>
  cbind(fh11 |> unlist() |> as.data.frame())

colnames(temp_tbl) <- c("Test 1", "Test 2", "Test 3", "Test 4")
temp_tbl
#>                        Test 1             Test 2             Test 3
#> method                    WLR                WLR                WLR
#> parameter  FH(rho=0, gamma=0) FH(rho=0, gamma=1) FH(rho=1, gamma=0)
#> estimation   -4.3680382958728 -0.487102597620562  -3.88093569825224
#> se           6.07194965769659   2.24485101526141    4.3605070085087
#> z          -0.719379860196309 -0.216986603702892 -0.890019369463077
#>                        Test 4
#> method                    WLR
#> parameter  FH(rho=1, gamma=1)
#> estimation -0.544175279271521
#> se           1.14814409403658
#> z          -0.473960787759958

If we wanted to take the minimum of these for a MaxCombo test, we would first use fh_weight() to compute a correlation matrix for the above z-statistics as follows. Note that the ordering of rho_gamma and g in the argument list is opposite of the above. The correlation matrix for the z-values is now in V1-V4. We can compute a p-value for the MaxCombo as follows using mvtnorm::pmvnorm(). Note the arguments for GenzBretz() which are more stringent than the defaults; we have also used these more stringent parameters in the example in the help file.

y150 |>
  maxcombo(rho = c(0, 0, 1, 1), gamma = c(0, 1, 0, 1))
#> $method
#> [1] "MaxCombo"
#> 
#> $parameter
#> [1] "FH(0, 0) + FH(0, 1) + FH(1, 0) + FH(1, 1)"
#> 
#> $z
#> [1] -0.7193799 -0.2169866 -0.8900194 -0.4739608
#> 
#> $p_value
#> [1] 0.2870442

Simplification for 2-arm trials

The sim_fixed_n() routine combines much of the above to go straight to generating tests for individual trials so that cutting data and analyzing do not need to be done separately. Here the argument structure is meant to be simpler than for sim_pw_surv().

stratum <- data.frame(stratum = "All", p = 1)
enroll_rate <- data.frame(
  duration = c(2, 2, 10),
  rate = c(3, 6, 9)
)
fail_rate <- data.frame(
  stratum = "All",
  duration = c(3, 100),
  fail_rate = log(2) / c(9, 18),
  hr = c(0.9, 0.6),
  dropout_rate = rep(0.001, 2)
)
block <- rep(c("experimental", "control"), 2)
rho_gamma <- data.frame(rho = 0, gamma = 0)

Now we simulate a trial 2 times and cut data for analysis based on timing_type = 1:5 which translates to:

  1. the planned study duration,
  2. targeted event count is achieved,
  3. planned minimum follow-up after enrollment is complete,
  4. the maximum of 1 and 2,
  5. the maximum of 2 and 3.
sim_fixed_n(
  n_sim = 2, # Number of simulations
  sample_size = 500, # Trial sample size
  target_event = 350, # Targeted events at analysis
  stratum = stratum, # Study stratum
  enroll_rate = enroll_rate, # Enrollment rates
  fail_rate = fail_rate, # Failure rates
  total_duration = 30, # Planned trial duration
  block = block, # Block for treatment
  timing_type = 1:5, # Use all possible data cutoff methods
  rho_gamma = rho_gamma # FH test(s) to use; in this case, logrank
) |>
  gt() |>
  fmt_number(columns = c("ln_hr", "z", "duration"))
#> Backend uses sequential processing.
method parameter estimation se z event ln_hr cut duration sim
WLR FH(rho=0, gamma=0) -9.421199 5.167395 −1.82 107 −0.36 Planned duration 30.00 1
WLR FH(rho=0, gamma=0) -41.415307 9.226372 −4.49 350 −0.48 Targeted events 67.19 1
WLR FH(rho=0, gamma=0) -46.927432 9.352622 −5.02 361 −0.53 Minimum follow-up 72.17 1
WLR FH(rho=0, gamma=0) -41.415307 9.226372 −4.49 350 −0.48 Max(planned duration, event cut) 67.19 1
WLR FH(rho=0, gamma=0) -46.927432 9.352622 −5.02 361 −0.53 Max(min follow-up, event cut) 72.17 1
WLR FH(rho=0, gamma=0) -3.272526 5.212866 −0.63 109 −0.12 Planned duration 30.00 2
WLR FH(rho=0, gamma=0) -41.231880 9.235871 −4.46 350 −0.48 Targeted events 67.29 2
WLR FH(rho=0, gamma=0) -41.597357 9.388460 −4.43 362 −0.47 Minimum follow-up 69.47 2
WLR FH(rho=0, gamma=0) -41.231880 9.235871 −4.46 350 −0.48 Max(planned duration, event cut) 67.29 2
WLR FH(rho=0, gamma=0) -41.597357 9.388460 −4.43 362 −0.47 Max(min follow-up, event cut) 69.47 2

If you look carefully, you should be asking why the cutoff with the planned number of events is so different than the other data cutoff methods. To explain, we note that generally you will want sample_size above to match the enrollment specified in enroll_rate:

enroll_rate |> summarize(
  "Targeted enrollment based on input enrollment rates" = sum(duration * rate)
)
#>   Targeted enrollment based on input enrollment rates
#> 1                                                 108

The targeted enrollment takes, on average, 30 months longer than the sum of the enrollment durations in enroll_rate (14 months) at the input enrollment rates. To achieve the input sample_size of 500, the final enrollment rate is assumed to be steady state and extends in each simulation until the targeted enrollment is achieved. The planned duration of the trial is taken as 30 months as specified in total_duration. The targeted minimum follow-up is

total_duration <- 30 # From above
total_duration - sum(enroll_rate$duration)
#> [1] 16

It is thus, implicit that the last subject was enrolled 16 months prior to the duration given for the cutoff with “Minimum follow-up” cutoff in the simulations above. The planned duration cutoff is given in the total_duration argument which results in a much earlier cutoff.