UnplanSimon

Introduction
Overview of Simon’s Two-Stage Design
- Design Parameters and Constraints
- Example 1 (Designing a Study Using Simon’s Two-Stage Design)
Adaptive Threshold Simon Design (ATS Simon)
- Example 2 (ATS Simon design for Addressing Under-or Over-Enrollment in Simon’s Two-Stage Design based on Alpha Spending Function)
Adaptive Threshold and Sample Size Simon Design (ATSS Simon)
- Example 3 (Adaptive Threshold and Sample Size Simon Design (ATSS Simon) for Under- and/or Over-enrollment)
- Summary
Post-Trial Inference for ATS and ATSS Simon Designs with Under- and Over-Enrollment
References

Introduction

The UnplanSimon package serves three purposes, including:

Providing an Adaptive Threshold Simon Design (ATS Simon) method for Simon’s two-stage design in oncology trials when the realized sample sizes in the \(1^{st}\) and/or \(2^{nd}\) stage(s) are different from the planned sample sizes in the \(1^{st}\) and/or \(2^{nd}\) stage(s). The proposed ATS Simon design tries to follow sample sizes of the original design, to that end, this design updates the original thresholds of \((r_1, r)\) in the \(1^{st}\) and/or the \(2^{nd}\) stage(s) to satisfy the type I error rate as the original planned design (note: power will decrease if the realized sample size is smaller than the original one).
Providing an Adaptive Threshold and Sample Size Simon Design (ATSS Simon) method for Simon’s two-stage design in oncology trials when the realized sample sizes in the \(1^{st}\) and/or \(2^{nd}\) stage(s) are different from the planned sample sizes in the \(1^{st}\) and/or \(2^{nd}\) stage(s). The proposed ATSS Simon method updates not only the original threshold of \((r_{1}^{*}, r^*)\) but the original sample sizes of \((n_{1}^{*}, n^*)\) to satisfy the type I error rate and power requirements as the original planned design (note: unlike the ATS Simon design, the sample size here will also be updated to satisfy the original power). In addition, the ATSS Simon design also satisfies the other criteria as in the originally planned design, such as minimizing the average sample size under the null hypothesis \(H_0\).
Providing comprehensive post-trial inference tools at the end of the trial for Simon’s two-stage design when under- or over-enrollment occurs. This includes the computation of point estimates, confidence intervals, and p-values for the proposed ATS and ATSS Simon design methods.

This vignette introduces functionalities in {UnplanSimon} tailored for
designing single-arm clinical trials using the proposed method. Examples featuring Simon’s two-stage designs from {clinfun} demonstrate the application of these functions within our package.

To begin, install and load {UnplanSimon} and {clinfun}:

library(clinfun)
library(UnplanSimon)

Overview of Simon’s Two-Stage Design

Simon’s two-stage design is a statistical method used in clinical trials, particularly in Phase II studies, to evaluate the efficacy of a new treatment while minimizing the number of patients exposed to potentially ineffective treatments. It consists of two stages:

Stage 1:
-A predetermined number of patients (\(n_1\)) are enrolled and treated.
-If the number of responses (positive outcomes) in these patients meets or exceeds a certain threshold (\(r_1\)), the trial continues to the second stage.
-If the number of responses is equal to or below this threshold, the trial is stopped early for futility, concluding that the treatment is ineffective.

Stage 2:
-An additional number of patients (\(n_2\)) are enrolled and treated.
-The total number of responses from both stages is then evaluated against a final threshold (\(r\)).
-If the total number of responses exceeds the final threshold, the treatment is considered promising for further study.
-If not, the treatment is considered ineffective.

Simon’s two-stage design aims to reduce the number of patients receiving ineffective treatment while ensuring that promising treatments are identified efficiently. It balances the need for early stopping in case of futility with the need for sufficient data to make a reliable decision about the treatment’s efficacy (Simon, 1989).

From the above description, when designing a trial using Simon’s two-stage design, there are four key design parameters that need to be specified:

\(r_1\): Threshold in stage 1
\(r\): Threshold in stage 2
\(n_1\): Number of patients in stage 1
\(n\): Number of patients in stages 1 and 2, e.g., (\(n_1 + n_2\))

Design Parameters and Constraints

In order to determine the design parameters for Simon’s two-stage design, the followings are required:

\(p_0\): Unacceptable efficacy rate
\(p_1\): Desirable efficacy rate
\(\alpha\): Type I error rate
\(\beta\): Type II error rate

That is,

\(H_0\): The true treatment response rate is less than or equal to some unacceptable level (\(p \leq p_0\))
\(H_1\): The true treatment response rate is greater than or equal to some desirable level (\(p \geq p_1\))

Example 1 (Designing a Study Using Simon’s Two-Stage Design)

One trial employed Simon’s two-stage design, considering an overall response rate of 25% as unacceptable and 45% as desirable. The trial set a type I error rate at 10% and aimed for a target power of 90%, corresponding to a 10% type II error rate.

From the above, we have:

\(p_0\): 25% overall response rate
\(p_1\): 45% overall response rate
\(\alpha\): 10% type I error rate
\(\beta\): 10% type II error rate

Thus, the hypotheses used for testing are:

\(H_0\): The true overall response rate is less than or equal to 25% (\(p \leq 0.25\))
\(H_1\): The true overall response rate is greater than or equal to 45% (\(p \geq 0.45%\))

Now, using ph2simon() function from clinfun R package, specify these parameters and print the resulting object.

library(clinfun)
# Specify the parameters and constraints
trial = ph2simon(0.25, 0.45, 0.1, 0.1)
# Print
trial
#> 
#>  Simon 2-stage Phase II design 
#> 
#> Unacceptable response rate:  0.25 
#> Desirable response rate:  0.45 
#> Error rates: alpha =  0.1 ; beta =  0.1 
#> 
#>            r1 n1  r  n EN(p0) PET(p0)   qLo   qHi
#> Minimax     5 23 13 39  31.50  0.4685 0.752 1.000
#> Admissible  3 15 13 40  28.47  0.4613 0.026 0.752
#> Optimal     3 14 14 44  28.36  0.5213 0.000 0.026

This output provides the computed design parameters (\(r_1\), \(r\), \(n_1\), \(n\)) as well as \(EN(p_0)\) and \(PET(p_0)\) for three designs, optimal, minimax, and admissible designs.

As expected, the optimal design has the smallest expected sample size under the Null (\(EN(p_0)\) = 28.36), whereas the minimax design has the smallest maximum sample size (\(n\) = 39).

Adaptive Threshold Simon Design (ATS Simon)

In single arm phase II studies, under-enrollment or over-enrollment can be common issues when using Simon’s two-stage design. For rare diseases, it can be particularly challenging to recruit additional patients quickly in cases of under-enrollment. Therefore, we propose an Adaptive Threshold Simon design to assist clinical investigators in making decisions based on the actual sample size instead of the planned one while still adhering to the original design framework when under- or over-enrollment occurs.

A thorough literature review identified numerous clinical trials utilizing Simon’s two-stage designs, where the proportion of patients classified as inevaluable for response surpassed thresholds of 20%, 30%, and even 40%. The factors contributing to inevaluability included patients failing to complete the requisite number of cycles for response evaluation, being deemed ineligible upon central review, early withdrawal due to noncompliance, among other considerations (Ji, 2022), leading to under-enrollment during at Go/NoGo decision-making
timepoint in studies. Over-enrollment in multi-site trials often occurs due to variations in patient recruitment rates across different locations.

It is important to note that deviations our methods address, such as under-enrollment and over-enrollment, are incidental or non-informative, indicating they arise from unforeseen circumstances or factors rather than deliberate bias or systematic error.

We identify an updated \(1^{st}\) stage threshold \(r_1^{*}\) based on the actual \(1^{st}\) stage sample size \(n_{1}^*\) such that the updated design’s probability of early termination (PET) under the null hypothesis is the closest to the original PET under the null hypothesis. \[ P\left(Y_{1} \leq r_1^* \mid n_{1}^{*}, p_{0}\right) \approx P\left(Y_{1} \leq r_1 \mid n_{1}, p_{0}\right)\] The primary objective of Simon’s two-stage design, as well as our proposed ATS or ATSS Simon design, in single arm phase II studies, is to identify and eliminate ineffective drugs as early as possible, Therefore, in cases of deviations in sample size, the updated type I error rate (\(\alpha\)) of a new method (including the operating characteristic, like probability of early termination) should still be close to the original one.

In the field of group sequential design, the alpha-spending function is a natural way to reallocate the type I error rate (\(\alpha\)). We have adopted this method in our proposed ATS Simon’s design.

Based on the identified \(r_1^{*}\), we then define the alpha-spending function based on the actual total sample size (\(n^{*}\)). Specifically, we use the Lan-DeMets spending function (Lan-DeMets et al., 1983).

\[ \alpha(n^{*}) = \begin{cases} {2-2 \Phi\left(\frac{z_{1-\alpha/2}}{(n^*/n)^{1/2}}\right)} & \text{if } n^{*} \leq n \\ \alpha & \text{if } n^{*} > n \end{cases} \]
where \(\alpha\) here is the type I error rate in the original Simon two-stage design.

We can see, if the actual sample size \(n^{*} > n\), the updated alpha \(\alpha(n^{*})\) will be at most \(\alpha\) (the original one) and if the actual total sample size \(n^{*} < n\), the \(\alpha(n^{*})\) may be smaller than \(\alpha\) due to smaller sample size. Therefore, ATS Simon’s design can control the type I error rate at or below the original level. Based on the above information, we identify a smallest integer \(r^{*}\) as the threshold at \(2^{nd}\) stage such that \[ P\left(Y_{1} > r_1^{*}, Y > r^{*} \mid n_{1}^{*}, n^{*}, p_{0}\right) \approx \alpha(n^{*})\]

Here, \(n_1^{*}\) denotes as the actual sample size at the \(1^{st}\) stage; \(n^{*}\) denotes as the actual sample size of the two stages.

Rationale of identifying a smallest integer \(r^{*}\) is that probability \(P\left(Y_{1} > r_1^{*}, Y > r^{*} \mid n_{1}^{*}, n^{*}, p\right)\) is a decreasing function of the threshold \(r^{*}\) in stage 2. So, if we find such a \(r^{*}\) under \(p_0\) and based on the fact of \(\alpha(n^{*}) \leq \alpha\), the type I error rate is well controlled. Since the trends of alpha and power are the same (i.e., if alpha increases, power also increases), if alpha is close to the original value, the power will also be close to the original value. This means we can simultaneously maintain the original power or minimize power loss. That is, if we select a larger \(r^{*}\) instead of the smallest one, we cannot minimize the power loss.

Example 2 (ATS Simon design for Addressing Under-or Over-Enrollment in Simon’s Two-Stage Design based on Alpha Spending Function)

Example 2.1 (Under-enrollment in \(1^{st}\) stage)

Firstly, let’s introduce the usage of ATS_Design(). Under-enrollment in the first stage is more critical based on our understanding, so we will focus on this scenario for illustration purposes. In the optimal Simon’s two-stage design described in Example 1, the planned sample size for the \(1^{st}\) and \(2^{nd}\) stages are 14 and 30, respectively.

Suppose we currently have outcome data for only 11 patients in the \(1^{st}\) stage and assume the \(2^{nd}\) stage sample size for evaluable patients remains as planned, We can use ATS_Design() to provide updated thresholds for the interim analysis, without needing to wait for the number of evaluable patients to reach 14. This means the input n1_star is 11 and n_star is 41 (=11+30 instead of the original 14+30 as the total sample size), as the original optimal design specifies a sample size of 30 for the second stage.

ATS_Design(n1=14,n=44,n1_star=11,n_star=41,r1=3,r=14,p0=0.25,p1=0.45,alpha=0.1)

The updated design parameters, \((r_1^{*},r^{*})\), by ATS Simon design method are (2, 14). We also output \((n_1^{*},n^{*})\) and they are just actual sample sizes of stage 1 and the total sample size. The type I error rate of this updated design is 0.06, which is below the original type I error constraint of 0.1. As we know, if alpha decreases, power will also decrease. Therefore, the current power of 85.4% is lower than the original design’s 90% as expected, but still close to the original power. Additionally, the probability of early termination is 0.455, which is close to the original design’s 0.521.

Now, suppose the number of patients responding to the new treatment in the first stage is > 2. In that case, we will make a “Go” decision and recruit an additional 30 patients for the second stage of the study.

Example 2.2 (Under-enrollment in both \(1^{st}\) and \(2^{nd}\) stages)

We now consider an under-enrollment scenario in the \(2^{nd}\) stage, in addition to the \(1^{st}\) stage’s under-enrollment.

For example, if the actual sample size in the \(2^{nd}\) stage is 28 (the planned one is 30), indicating under-enrollment. Now the total sample size is 39, e.g., the input n_satr = 39.

Using the updated design parameters with sample sizes at stages 1 and 2 of \((n_1^{*},n^{*})\) = (11, 39), and the other original design parameters in the following code, we can determine the updated design parameters and operating characteristics.

ATS_Design(n1=14,n=44,n1_star=11,n_star=39,r1=3,r=14,p0=0.25,p1=0.45,alpha=0.1)

Our updated design parameters \((r_1^{*},r^{*})\) are now (2, 13). We also output \((n_1^{*},n^{*})\) and they are just actual sample sizes of stage 1 and the total sample size. The updated type I error rate is 0.077, which is below the constraint of 0.1.

Example 2.3 (Under-enrollment in \(1^{st}\) and Over-enrollment in \(2^{nd}\) stage)

As another example, suppose the actual sample size in the \(2^{nd}\) stage is now 31, indicating over-enrollment. Therefore, the input n_satr would be 42.

ATS_Design(n1=14,n=44,n1_star=11,n_star=42,r1=3,r=14,p0=0.25,p1=0.45,alpha=0.1)

The updated design parameters \((r_1^{*},r^{*})\) are now (2, 14). We also output \((n_1^{*},n^{*})\) and they are just actual sample sizes of stage 1 and the total sample size.The type I error for our ATS Simon design is 0.071, which is below the constraint of 0.1.

Adaptive Threshold and Sample Size Simon Design (ATSS Simon)

Rather than offering the above ATS design by merely adjusting thresholds based on the realized sample size, we present another option that follows the original Simon’s two-stage design algorithm: the Adaptive Threshold and Sample Size Simon (ATSS Simon) method. This approach extends Simon’s two-stage design by simultaneously adjusting both thresholds and sample size.

The overall strategy of this method is detailed as below:

\(\bullet\) Scenario 1: When under-enrollment or over-enrollment occurs at the \(1^{st}\) stage, we identify the design parameters \((r_{1}^{*}, r^{*}, n^{*})\) based on the actual sample size \(n_{1}^*\) at the \(1^{st}\) to satisfy the significance level \(\alpha\) and power \(1-\beta\): \[P\left(Y_{1} > r_1^*, Y > r^* \mid n_{1}^{*}, n^{*}, p_{0}\right) \leq \alpha\] \[P\left(Y_{1} > r_1^*, Y > r^* \mid n_{1}^{*}, n^{*}, p_{1}\right) \geq 1-\beta\] In addition, the design parameters \((r_{1}^{*}, r^{*}, n^{*})\) satisfies the same criteria as in the Optimal Simon’s Two Stage: minimizing the average sample size under the null hypothesis.

\(\bullet\) Scenario 2 (\(n^{*}\) changes again): When a Go decision has been made, the realized sample size \(n^{**}\) may be again different from \(n^{*}\). Further adjustment of the threshold at the \(2^{nd}\) stage is needed. So, we update again this threshold \(r^{*}\) such that we identify a smallest integer \(r^{**}\) given the design parameters \((r_{1}^{*}, n_{1}^{*}, n^{**})\) satisfying \[P\left(Y_{1} > r_1^*, Y > r^{**} \mid n_{1}^{*}, n^{**}, p_{0}\right) \leq \alpha \] Here, \(n_1^{*}\) denotes as the actual sample size at the \(1^{st}\) stage; \(n^{*}\) denotes as the actual total sample size of the two stages after the interim analysis based on \(n_1^{*}\)； \(n^{**}\) denotes as the actual total sample size of the two stages if \(n^{*}\) is again updated.

Example 3 (Adaptive Threshold and Sample Size Simon Design (ATSS Simon) for Under- and/or Over-enrollment)

In the optimal Simon’s two-stage design described in Example 1, the planned sample size for the \(1^{st}\) and \(2^{nd}\) stages are 14 and 30, respectively.

Example 3.1 (Under-enrollment at \(1^{st}\) stage)

Suppose we currently have outcome data for only 11 patients in the \(1^{st}\) stage and assume the \(2^{nd}\) stage sample size remains as planned. We can use ATSS_Design_Stage1() to implement the introduced ATSS Simon algorithm. Here, the parameter n1_satr is 11.

Note: Different from the previous ATS examples, we now should be noted that it is unnecessary to input n_star as a parameter, since the current ATSS method will compute an updated total sample size based on the sample size deviation of the stage 1 from the original design.

ATSS_Design_Stage1(p0=0.25,p1=0.45,n1_star=11,alpha=0.1,beta=0.1)

The updated design parameters \((r_1^{*},r^{*},n_1^{*},n^{*})\) are (2, 15, 11, 47). The type I error for our redesign is 0.09, which is controlled well and < 0.1. The updated power for our redesigned study is 0.901, surpassing the constraint of 0.9. This increase is due to the ATSS Simon design method, which now provides a larger total sample size of 47, compared to 44 in the original design.

Example 3.2 (Under-enrollment at \(1^{st}\) and \(2^{nd}\) stages)

If more than 2 patients respond to the new treatment in the first stage of our redesigned study, we will proceed to recruit 36 additional patients (47 total minus the 11 from the \(1^{st}\) stage). However, if now the actual sample size in the second stage is 34, indicating under-enrollment, we can use the function ATSS_Design_Stage2() to update the design parameters based on the realized total sample size. This adjustment means the parameter n_double_star is now 45.

Note: In the following code, n1_star and n_double_star are fixed since the trial is complete. Essentially, the ATSS algorithm only updates the threshold for stage 2, r_star. In this example, however, r_star remains unchanged compared to Example 3.1.

ATSS_Design_Stage2(p0=0.25,p1=0.45,r1_star=2,n1_star=11,n_double_star=45,alpha=0.1)

Our updated design parameters \((r_1^{*},r^{*},n_1^{*},n^{*})\) are (2, 15, 11, 45). And the updated type I error rate is 0.066 smaller than the original one of 0.1 and the updated power is 0.878, both are due to the realized smaller total sample size, though in this example, the updated r_star is still 15.

Example 3.3 (Under-enrollment at \(1^{st}\) and Over-enrollment at \(2^{nd}\) stage)

Suppose more than 2 patients respond to the new treatment at the end of the \(1^{st}\) stage of our redesigned study. In that case, we will make a “Go” decision to recruit 36 more patients into our study. However, if the actual sample size in the \(2^{nd}\) stage is 37, indicating over-enrollment, we can use the function ATSS_Design_Stage2() to update the design parameters based on the actual total sample size. In the following code, this means the parameter n_double_star is now 48.

ATSS_Design_Stage2(p0=0.25,p1=0.45,r1_star=2,n1_star=11,n_double_star=48,alpha=0.1)

Our updated design parameters \((r_1^{*},r^{*},n_1^{*},n^{*})\) are (2, 16, 11, 48). And updated type I error rate is 0.061 and updated power is 0.884.

Note: Compared to Example 3.2, although the current total sample size is larger (48 vs. 45), we expect to see an increase in power and type I error rate larger than those in Example 3.2. However, while the power is as expected, the type I error rate is not. This is because the optimally searched threshold at stage 2, r_star, is 16, compared to 15 in Example 3.2. Our algorithm dictates controlling the type I error rate and could only identify 16 as the threshold.

## Our algorithm searching process
result <- data.frame(
  r_star = round(c(2.0000000, 3.0000000, 4.0000000, 5.0000000, 6.0000000, 
                   7.0000000, 8.0000000, 9.0000000, 10.0000000, 11.0000000, 
                   12.0000000, 13.0000000, 14.0000000, 15.0000000, 16.0000000)),
  alpha = round(c(0.5447991, 0.5447929, 0.5447130, 0.5442052, 0.5421068, 
                  0.5357601, 0.5207790, 0.4920445, 0.4460005, 0.3831060, 
                  0.3087428, 0.2317242, 0.1611787, 0.1035884, 0.0614173), 3),
  power = round(c(0.9347765, 0.9347765, 0.9347765, 0.9347765, 0.9347763, 
                  0.9347752, 0.9347684, 0.9347366, 0.9346115, 0.9341919, 
                  0.9329744, 0.9298792, 0.9229204, 0.9089765, 0.8839142), 3)
)
print(result)
#>    r_star alpha power
#> 1       2 0.545 0.935
#> 2       3 0.545 0.935
#> 3       4 0.545 0.935
#> 4       5 0.544 0.935
#> 5       6 0.542 0.935
#> 6       7 0.536 0.935
#> 7       8 0.521 0.935
#> 8       9 0.492 0.935
#> 9      10 0.446 0.935
#> 10     11 0.383 0.934
#> 11     12 0.309 0.933
#> 12     13 0.232 0.930
#> 13     14 0.161 0.923
#> 14     15 0.104 0.909
#> 15     16 0.061 0.884

Summary

Both the proposed ATS and ATSS Simon design methods can address under- and over-enrollment. The ATS algorithm updates the thresholds based on the actual sample sizes, maintaining control over the type I error rate \(\alpha\) but potentially resulting in lower power than the original design during under-enrollment. In contrast, the ATSS algorithm updates both the thresholds and sample sizes, thereby controlling the \(\alpha\) while maintaining power similar to the original design, though it may require a larger sample size.

Post-Trial Inference for ATS and ATSS Simon Designs with Under- and Over-Enrollment

Key design details of two-stage single-arm trials are frequently left unreported, and their statistical inference is often not conducted in a way that mitigates the bias introduced by interim analyses. This issue is particularly concerning given the increasing reliance on non-randomized trials, which now represent a significant portion of the evidence on treatment effectiveness in rare, biomarker-defined patient subgroups (Grayling, 2021). Motivated by this problem, we inherit some widely used post-trial inference methods for ATS and ATSS Simon Designs with Under- and Over-Enrollment.

Point Estimate

When a multistage trial is ended, we also want to estimate the true response probability \({\pi}_{p}\) of the new therapy. The most commonly used estimator is the sample response rate, i.e. the maximum likelihood estimator (MLE): \[ \hat{\pi}_{p} = \frac{S}{N} \] However, in multi-stage designs like Simon’s Two-Stage design, we observe only extreme cases by crossing the threshold in the first stage, and hence the MLE is biased. In other words, the MLE is biased due to the sequential nature of the trial. This is known as the optional sampling effect. Here, we adopt Jung’s method for the estimation of the binomial probability in multistage clinical trials (Jung, 2004). Based on the Rao-Blackwell theorem, they derived the uniformly minimum variance unbiased estimator (UMVUE) as the conditional expectation of an unbiased estimator, which in this case is simply the maximum likelihood estimator based only on the first stage data, given the sufficient statistic. Let \(M\) denote the stopping stage (\(2^{nd}\) stage in our context) and let \(S=S_M\) denote the total number of responders accumulated up to the stopping stage. For observation \((m,s)\), the UMVUE of the response rate \(p\) is given by: \[ \hat{\pi}_{p} = \begin{cases} \frac{S}{n_1} & \text{if } m = 1 \\ \frac{\sum_{x_{1}=\left(r_{1}+1\right) \vee\left(S-n_{2}\right)}^{S \wedge n_{1}} {n_1 \choose x_1} {n_2-1 \choose S-x_1-1}}{ \sum_{x_{1}=\left(r_{1}+1\right) \vee\left(S-n_{2}\right)}^{S \wedge n_{1}} {n_1 \choose x_1} {n_2 \choose S-x_1}} & \text{if } m = 2 \end{cases} \]
At stage \(m\), we may accrue slightly more (or possibly less) patients than planned sample size as we introduced previously, especially in multicenter trials. UMVUE also provides an unbiased estimator for the conditional expectation by using all realized sample size.

Confidence intervals:

A conventional approach for constructing confidence intervals is to use the Clopper-Pearson exact confidence interval, disregarding the group sequential nature of the trial (Clopper et al., 1934). \[P(Y \geq y \mid p_{L}) = \sum_{k=y}^{n}{n \choose k}p_{L}^{k}(1-p_{L})^{n-k}=\frac{\alpha}{2}\] \[P(Y \leq y \mid p_{U}) = \sum_{k=y}^{n}{n \choose k}p_{U}^{k}(1-p_{U})^{n-k}=\frac{\alpha}{2}\] We now focus on constructing confidence intervals by considering deviations in sample sizes from the planned ones in the \(1^{st}\) and/or \(2^{nd}\) stages. Jung (2004) proposed a method especially when the treatment successfully goes to the second stage. Let \(M\) denote the stage at which a trial is terminated, and \(S\) denote the number of responders at stage \(M\). This method constructs confidence intervals based on the stochastic ordering of the distribution of \((M,S)\) with respect to the response rate \(p\).

\[P(\hat{p_{u}}(M,S) \geq \hat{p_{u}}(m,s) \mid p_{L}) = \frac{\alpha}{2}\] \[P(\hat{p_{u}}(M,S) \leq \hat{p_{u}}(m,s) \mid p_{U}) = \frac{\alpha}{2}\] However, the Clopper-Pearson confidence interval is known to be conservative, with the actual confidence level being bounded below by \((1-\alpha)\). Jung’s method also inherits this conservatism (Porcher et al., 2012).

To correct for this conservative nature, Porcher extended the Jung’s confidence interval with a mid-\(p\) approach (Porcher et al., 2012). This method is our recommended approach, even though all the above methods have also been integrated into the {UnplanSimon} R package.
\[P(\hat{p_{u}}(M,S) > \hat{p_{u}}(m,s) \mid p_{L}) + \frac{1}{2}P(P(\hat{p_{u}}(M,S) = \hat{p_{u}}(m,s) \mid p_{L})) = \frac{\alpha}{2}\] \[P(\hat{p_{u}}(M,S) < \hat{p_{u}}(m,s) \mid p_{U}) + \frac{1}{2}P(P(\hat{p_{u}}(M,S) = \hat{p_{u}}(m,s) \mid p_{U})) = \frac{\alpha}{2}\]

\(p\)-Value:

Kunzmann et al. (2024) raised a question whether a frequentist inferential framework for two-stage designs has a consistent test decision among \(p\)-values, point estimates, and confidence intervals. In practice, however, the consistency in those test decisions is often presumed by the non-statistical readership of published trial results and ambiguous situation can be avoided by using a consistent framework.

So here, we used the \(p\)-value defined as the probability of obtaining more extreme estimates toward \(H_1\) than the observed one when \(H_0\) is true based on the UMVUE ordering (Jung, 2006). Hence, for testing \(H_0:p=p_0\) against \(H_0:p=p_0 (p_0<p_1)\), the \(p\)-value for an estimate \(\hat{p}(m,s)\) will be given as \[ p_s = \begin{cases} 1-\sum_{(i,j):\hat{p_u}(i,j) < \hat{p_u}(m,s)}f_{p_0}(i,j) & \text{if } m = 1 \\ \sum_{(i,j):\hat{p_u}(i,j)\geq \hat{p_u}(m,s)}f_{p_0}(i,j) & \text{if } m = 2 \end{cases} \] It can be rewritten as \[ p_s = \begin{cases} P(Y_1 \geq s \mid p_0) & \text{if } m = 1 \\ \sum_{y_1=r_1+1}^{n_1} P(Y_1 =y_1 \mid p_0) P(Y_2 \leq s-y_1 \mid p_0) & \text{if } m = 2 \end{cases} \]

Example 4 (Post-Trial Inference usage for ATS and ATSS Simon Designs)

Example 4.1 (Post-Trial Inference for ATS Simon Design with Under-Enrollment)

Suppose we use the ATS Simon’s design method to address the problem of under-enrollment. The original Simon’s Two-Stage design is \((r_1,r,n_1,n) = (3,14,14,44)\). Considering the under-enrollment problem only at the \(1^{st}\) stage like Example 2.1, the new design is \((r_1^{*},r^*,n_1^{*},n^{*}) = (2,14,11,41)\). Especially, it should be noted that the input alpha here is the updated type I error constraint \(\alpha(n^{*})\) if we used the ATS Simon’s design. In Example 2.1, the updated type I error constraint \(\alpha(n^{*})\) is 0.088. So it means the input ’alpha here is 0.088. If the new treatment successfully enters the second stage and 20 patients respond to it, this means the parameter m is 2 and s is 20 under this context.

We now compute the point estimate (UMVUE), confidence interval and \(p\)-value based on the above setting. Note here, the option "CP" means Clopper-Pearson exact confidence interval，option "Jung" means Jung’s confidence interval and option "MIDp" means mid-\(p\) approach confidence interval.

SimonAnalysis(m=2, s=20, n1=11, n2=30, r1=2, r=14, alpha=0.088, quantile=c(0.025,0.975), CI_option = "CP", p0=0.25)

SimonAnalysis(m=2, s=20, n1=11, n2=30, r1=2, r=14, alpha=0.088, quantile=c(0.025,0.975), CI_option = "Jung", p0=0.25)

SimonAnalysis(m=2, s=20, n1=11, n2=30, r1=2, r=14, alpha=0.088, quantile=c(0.025,0.975), CI_option = "MIDp", p0=0.25)

Note: the computed UMVUEs and p values are the same since the above introduced same UMVUE method and approach for computing p value applied to all the three ways of computing the CIs. And the input alpha here is the updated type I error constraint \(\alpha(n^{*})\) if we used the ATS Simon’s design.

We can see the point estimate (UMVUE) for the response rate is 0.494; Clopper-Pearson exact confidence interval is (0.347, 0.630), Jung’s confidence interval is (0.329, 0.629), mid-\(p\) approach confidence interval is (0.339, 0.641). Also, we see that \(p\)-value is 0.001 smaller than the updated type I error constraint \(\alpha(n^{*})\) of 0.088. So we can reject the null hypothesis \(H_0\): The true treatment response rate is less than or equal to some unacceptable level (\(p \leq p_0=0.25\)).

Example 4.2 (Post-Trial Inference for ATSS Simon Design with Under-Enrollment)

Suppose we use the ATSS Simon’s design method to address the problem of under-enrollment. The original Simon’s Two-Stage design is \((r_1,r,n_1,n) = (3,14,14,44)\). Considering the under-enrollment problem only at the \(1^{st}\) stage like Example 3.1, the new design is \((r_1^{*},r^*,n_1^{*},n^{*}) = (2,15,11,47)\). If the new treatment successfully enters the second stage and 22 patients respond to it, this means the parameter m is 2 and s is 22 under this context.

SimonAnalysis(m=2, s=22, n1=11, n2=36, r1=2, r=15, alpha=0.1, quantile=c(0.025,0.975), CI_option = "CP", p0=0.25)

SimonAnalysis(m=2, s=22, n1=11, n2=36, r1=2, r=15, alpha=0.1, quantile=c(0.025,0.975), CI_option = "Jung", p0=0.25)

SimonAnalysis(m=2, s=22, n1=11, n2=36, r1=2, r=15, alpha=0.1, quantile=c(0.025,0.975), CI_option = "MIDp", p0=0.25)

We can see the point estimate (UMVUE) for the response rate is 0.478; Clopper-Pearson exact confidence interval is (0.342, 0.597), Jung’s confidence interval is (0.322, 0.604), mid-\(p\) approach confidence interval is (0.330, 0.615). Also, we see that \(p\)-value is 0.001 smaller than the type I error constraint \(\alpha\) 0.01. So we can reject the null hypothesis \(H_0\): The true treatment response rate is less than or equal to some unacceptable level (\(p \leq p_0=0.25\)).

References

Simon, R. (1989). Optimal two-stage designs for phase II clinical trials. Controlled clinical trials, 10(1), 1-10. doi:10.1016/0197-2456(89)90015-9

Ji, L., Whangbo, J., Levine, J. E., & Alonzo, T. A. (2022). Inefficiency of two-stage designs in phase II oncology clinical trials with high proportion of inevaluable patients. Contemporary clinical trials, 120, 106849. doi:10.1016/j.cct.2022.106849

Gordon Lan, K. K., & DeMets, D. L. (1983). Discrete sequential boundaries for clinical trials. Biometrika, 70(3), 659-663. doi:10.1093/biomet/70.3.659

Grayling, M. J., & Mander, A. P. (2021). Two-stage single-arm trials are rarely analyzed effectively or reported adequately. JCO Precision Oncology, 5, 1813-1820. https://doi.org/10.1200/PO.21.00276

Jung, S. H., & Kim, K. M. (2004). On the estimation of the binomial probability in multistage clinical trials. Statistics in medicine, 23(6), 881-896. https://doi.org/10.1002/sim.1653

Clopper, C. J., & Pearson, E. S. (1934). The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika, 26(4), 404-413. https://doi.org/10.2307/2331986

Porcher, R., & Desseaux, K. (2012). What inference for two-stage phase II trials?. BMC medical research methodology, 12, 1-13. https://doi.org/10.1186/1471-2288-12-117

Kunzmann, K. (2024). Optimal Adaptive Designs for Early Phase II Trials in Clinical Oncology (Doctoral dissertation). https://archiv.ub.uni-heidelberg.de/volltextserver/34225/

Jung, S. H., Owzar, K., George, S. L., & Lee, T. (2006). P-value calculation for multistage phase II cancer clinical trials. Journal of Biopharmaceutical Statistics, 16(6), 765-775. https://doi.org/10.1080/10543400600825645