NIHPA Author Manuscript: PMC2673021
# Between-Arm Comparisons in Randomized Phase II Trials

^{1}E-mail: sinho.jung@duke.edu

## SUMMARY

In a phase II trial, we may randomize patients to multiple arms of experimental therapies and evaluate their efficacy to determine if any of them is worthy of a large-scale phase III trial. Usually the primary objective of such a study is to identify experimental therapies that are efficacious compared to a historical control. Each arm is independently evaluated using a standard design for a single-arm phase II trial, e.g. Simon’s optimal or minimax design. When more than one arm is accepted through such a randomized trial, we may want to select the winner(s) among them. There are methods for between-arm comparisons in the literature, but most of them have drawbacks: they have a large false selection probability (type I error) when the competing arms have a small difference in efficacy, or the statistical tests used in the selection procedure do not properly reflect the small sample sizes and multi-stage designs of the trials. In this paper, we propose between-arm comparison methods for selection in randomized phase II trials that address these issues.

**Keywords:** Pick-the-winner, Type I error, Type II error, Two-stage design, Uniformly minimum variance unbiased estimator

## 1 Introduction

Phase II clinical trials are designed to screen out experimental therapies with low efficacy before they proceed to a large-scale phase III trial. Often, we have multiple experimental therapies for efficacy screening with respect to the same patient population. Usually, the resources for clinical trials are limited, so we may want to choose only a small number of therapies, ideally one therapy, to be compared with a standard therapy through a phase III trial. In this setting, we may take one of two approaches: (i) Conduct multiple separate phase II trials, one for each experimental therapy, and evaluate them independently using a standard design method for a single-arm phase II trial; (ii) Conduct a single phase II trial with multiple arms, randomize patients into the arms, and choose the best arm(s) using a selection method. The former approach requires more research resources due to the multiplicity of the studies. Also, the individual phase II trials may have different patient characteristics, so that the comparison among different therapies can be biased.

To avoid these issues, the second approach is attractive. However, the statistical approaches for analyzing randomized phase II trials are limited. Simon, Wittes and Ellenberg (1985) consider randomizing *n* patients to each of *K* treatment arms through a single stage and picking the winner, the arm with the largest estimated response rate, among them. This approach is based on the statistical methods of ranking and selection, the basic concepts of which were introduced over 50 years ago by Bechhofer (1954), with a substantial literature since that time. They show that, depending on the design setting, *n* = 16 to 70 patients are required for a 0.9 correct selection probability when there exists a difference of 0.15 in response rate among the *K* arms. Liu, LeBlanc and Desai (1999) point out that this approach has a high selection probability even when the treatment arms have the same response rates. Sargent and Goldberg (2001) consider a similar approach by allowing selection based on other factors when the difference in observed response rates is small.

Thall, Simon and Ellenberg (1989) consider studies with one control and *K* experimental arms. In the first stage, *n*_{1} patients are randomized to each of the *K* experimental arms, and the winner is chosen for the second stage if its observed efficacy is larger than that for the historical control by 10%. The trial is stopped early if the winner does not satisfy this condition. In the second stage, *n*_{2} patients are randomized to each of the control arm and the experimental arm selected from stage 1, and one-sided testing is conducted to see if the experimental arm is better than the control. They require *n*_{1} = 30 to 80 patients and *n*_{2} = 90 to 140 patients under different design settings.

Palmer (1991) proposes a two-stage design for selection of the best of three treatments. In stage 1, cohorts of three patients are randomized to Arms A, B and C, and a decision is made to continue to accrue the next cohort or to stop and choose the better two arms. In stage 2, cohorts of two patients are randomized to the two arms chosen at stage 1, and a decision is made to continue to accrue the next cohort or to stop and choose the winner. Given the maximum number of patients available for the study, the stopping time for each stage is chosen to minimize the number of future failures using a Bayesian approach. This method requires rapid determination of responses to be able to apply the sequential tests.

Steinberg and Venzon (2002) propose two-stage designs for a phase II trial with two experimental arms. In stage 1, *n*_{1} patients are randomized to each arm. The trial is stopped after stage 1 if the difference in the number of responders between the two arms is larger than *d*, which is chosen so that, when the two arms have a difference of 0.15 in response rate, the probability of selecting the inferior arm is controlled at a specified level. Otherwise, the trial proceeds to stage 2 to randomize an additional *n*_{2} patients to each arm. After stage 2, the winner is chosen based on the cumulative responses through the two stages. Given *n* = *n*_{1} + *n*_{2}, one can set *n*_{1} = *n*_{2} = *n*/2, or choose (*n*_{1}, *n*_{2}) to minimize the expected sample size for the specified response rates with a difference of 0.15. This approach does not control the overall error probabilities through the two stages.

Most of these existing methods do not accurately control the type I error and the power for the whole selection procedure. Furthermore, they do not allow unequal designs among different arms. We propose exact and efficient between-arm comparison methods for analyzing randomized phase II trials designed for independent evaluation of each arm. The proposed methods can be used for comparing the response data from multiple single-arm trials on competing therapies with similar patient populations as well. We use the uniformly minimum variance unbiased estimator (UMVUE) since, as shown by Jung and Kim (2004), for two-stage phase II trial designs the maximum likelihood estimator (MLE) can be seriously biased, while the efficiency of the UMVUE is comparable to that of the MLE. In Section 2, we briefly review the UMVUE for multi-stage designs. We derive between-arm comparison methods under various conditions in Section 3. Some numerical studies are conducted in Section 4.

## 2 UMVUE - Review

We consider a two-stage design for a single-arm phase II trial in this section. For an experimental cancer therapy, let *p*_{0} denote the maximum unacceptable response rate and *p*_{1} denote the minimum acceptable response rate (*p*_{0} < *p*_{1}). Also, let *p* denote the true response rate of the therapy. A typical two-stage phase II trial is conducted as follows. During stage 1, *n*_{1} patients are enrolled and treated. If the number of responders is less than or equal to *a*_{1}, the trial is terminated for lack of efficacy and it is concluded that the treatment does not warrant further investigation (i.e., accept *H*_{0}: *p* = *p*_{0}). Otherwise, the study is continued to stage 2, during which an additional *n*_{2} patients are enrolled and treated. If the cumulative number of responders after stage 2 does not exceed *a*, it is concluded that the treatment lacks sufficient efficacy (i.e., accept *H*_{0}: *p* = *p*_{0}). Otherwise, it is concluded that the treatment has sufficient efficacy, and the treatment will be considered for further investigation in subsequent trials (i.e., accept *H*_{1}: *p* = *p*_{1}).

Refer to Simon (1989), Jung, Carey and Kim (2001), and Jung et al. (2004) for the search for optimal two-stage designs. One may employ an upper boundary to stop the trial early when significantly high efficacy is observed at stage 1 (Chang et al., 1987; Spiegelhalter, Freedman and Blackburn, 1986). However, since there is no compelling ethical argument for such stopping and it is rarely used in practice, we consider early stopping only for lack of efficacy in this paper.

A two-stage design is defined by the number of patients to be accrued during stages 1 and 2, *n*_{1} and *n*_{2}, and the boundary values *a*_{1} and *a* (*a*_{1} < *a*). So, we specify a two-stage design by (*a*_{1}/*n*_{1}, *a*/*n*), where *n* = *n*_{1} + *n*_{2} is called the maximum sample size. Let *M* denote the stopping stage and *S* = *S*_{M} denote the total number of responders accumulated up to the stopping stage.

The UMVUE *p̂* = *p̂*(*m, s*) of *p* is given as

$$\hat{p}(m,s)=\begin{cases} s/n_1 & \text{if } m=1,\\[6pt] \dfrac{\sum_{x_1=(a_1+1)\vee(s-n_2)}^{s\wedge n_1}\binom{n_1-1}{x_1-1}\binom{n_2}{s-x_1}}{\sum_{x_1=(a_1+1)\vee(s-n_2)}^{s\wedge n_1}\binom{n_1}{x_1}\binom{n_2}{s-x_1}} & \text{if } m=2, \end{cases}$$

where *u* ∧ *v* = min(*u, v*) and *u* ∨ *v* = max(*u, v*). Refer to Jung and Kim (2004) for details. The distribution of the UMVUE is derived using the probability mass function of (*M, S*). In a two-stage design with lower stopping boundaries only, it is given as

$$f(m,s\mid p)=\begin{cases}\binom{n_1}{s}\,p^{s}(1-p)^{n_1-s} & \text{if } m=1,\ 0\le s\le a_1,\\[6pt] \displaystyle\sum_{x_1=(a_1+1)\vee(s-n_2)}^{s\wedge n_1}\binom{n_1}{x_1}\binom{n_2}{s-x_1}\,p^{s}(1-p)^{n-s} & \text{if } m=2,\ a_1+1\le s\le n.\end{cases}$$
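For concreteness, the probability mass function and the UMVUE above can be computed directly from binomial coefficients. The following is a minimal Python sketch (the function names and argument layout are our own, not from the paper):

```python
from math import comb

def pmf(m, s, p, a1, n1, n2):
    """P(M = m, S = s | p) for a two-stage design (a1/n1, a/n) with a
    lower stopping boundary only; n = n1 + n2."""
    n = n1 + n2
    if m == 1:  # stopped after stage 1, which requires s <= a1
        return comb(n1, s) * p**s * (1 - p)**(n1 - s) if 0 <= s <= a1 else 0.0
    if not (a1 + 1 <= s <= n):  # reaching stage 2 requires passing the boundary
        return 0.0
    lo, hi = max(a1 + 1, s - n2), min(s, n1)  # feasible stage-1 counts x1
    coef = sum(comb(n1, x1) * comb(n2, s - x1) for x1 in range(lo, hi + 1))
    return coef * p**s * (1 - p)**(n - s)

def umvue(m, s, a1, n1, n2):
    """Jung-Kim UMVUE of the response rate p given the outcome (m, s)."""
    if m == 1:
        return s / n1
    lo, hi = max(a1 + 1, s - n2), min(s, n1)
    num = sum(comb(n1 - 1, x1 - 1) * comb(n2, s - x1) for x1 in range(lo, hi + 1))
    den = sum(comb(n1, x1) * comb(n2, s - x1) for x1 in range(lo, hi + 1))
    return num / den
```

For the design (*a*_{1}/*n*_{1}, *a*/*n*) = (4/21, 10/45) used later in Example 2, `umvue(2, 12, 4, 21, 24)` should be close to the estimate 0.295 reported there.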

## 3 Comparison and Selection for Randomized Phase II Designs

We consider two-arm randomized phase II studies in which each arm follows a two-stage design for independent evaluation as the primary objective. The following example study is used throughout this section.

### Example 1

Suppose that we randomize non-Hodgkin lymphoma patients who relapsed from a rituximab-containing combination regimen to rituximab alone (Arm R, *n* = 90) or rituximab+lenalidomide (Arm R+L, *n* = 45) with 2-to-1 probability. The two arms have the following two-stage designs:

Arm R: (*a*_{1}/*n*_{1}*, a*/*n*) = (10/57, 19/90) for 4% type I error at *p*_{0} = 0.15 and 95% power at *p*_{1} = 0.30.

Arm R+L: (*a*_{1}/*n*_{1}*, a*/*n*) = (4/21, 10/45) for 5% type I error at *p*_{0} = 0.15 and 89% power at *p*_{1} = 0.35.

Arm R is a potential control arm for a future phase III trial in case Arm R+L is accepted in this trial, but it is included in this phase II trial because there are not enough historical data on the regimen. Twice as many patients will be accrued to Arm R as to Arm R+L to allow more precise estimation of the clinical parameters to be used in designing a future phase III trial. Arm R+L may not be investigated further if it does not appear to be more efficacious than Arm R. We want to compare the two arms accounting for the two-stage design of each arm.
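The error rates stated for each single-arm design can be checked directly from binomial probabilities: a design (*a*_{1}/*n*_{1}, *a*/*n*) accepts *H*_{1} when stage 1 yields more than *a*_{1} responses and the cumulative count exceeds *a*. A brief Python sketch (function names are ours):

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial probability P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def accept_prob(p, a1, n1, a, n):
    """P(accept H1 | p) for a two-stage design (a1/n1, a/n): pass stage 1
    with more than a1 responses, then exceed a responses in total."""
    n2 = n - n1
    return sum(binom_pmf(x1, n1, p) *
               sum(binom_pmf(x2, n2, p) for x2 in range(max(0, a + 1 - x1), n2 + 1))
               for x1 in range(a1 + 1, n1 + 1))
```

For Arm R+L, `accept_prob(0.15, 4, 21, 10, 45)` should be close to the stated 5% type I error, and `accept_prob(0.35, 4, 21, 10, 45)` close to the stated 89% power; the Arm R design can be checked the same way.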

In general, we call the two arms *x* and *y*, respectively. For an outcome (*m*_{k}, *s*_{k}), let *p̂*_{k} = *p̂*_{k}(*m*_{k}, *s*_{k}) denote the UMVUE of the true response probability *p*_{k} in arm *k* (= *x*, *y*).

#### 3.1 When Both Arms Have Identical Two-Stage Designs

In this subsection, we assume that the two arms have the same two-stage design (*a*_{1}/*n*_{1}*, a*/*n*) for independent evaluation.

##### 3.1.1 When One Arm Is a Control

One may want to conduct a randomized phase II trial to evaluate an experimental therapy comparing with a prospective control. This may happen when the control therapy has been used as a standard without formal evaluation through a prospective study, or needs more testing in an extended patient population. In this case, the prospective control arm may also be evaluated using a standard two-stage design for phase II trials.

When such a trial is completed, we may want to test whether the experimental arm (arm *y*) is better than the control (arm *x*). The hypotheses associated with this type of comparison are

$$H_0: p_y = p_x \quad \text{against} \quad H_1: p_y > p_x.$$

This is a one-sided test. So in this case, we usually would not want to accept the experimental arm *y* if it is not accepted in the independent evaluation. Thus, we want to accept the experimental arm (i.e., reject *H*_{0}) if it is accepted in the independent evaluation, i.e. *m*_{y} = 2 and *s*_{y} > *a*, and *p̂*_{y} − *p̂*_{x} ≥ *c* for a chosen critical value *c*. Let 𝒮 = {(*m*, *s*): *m* = 1, 0 ≤ *s* ≤ *a*_{1}} ∪ {(*m*, *s*): *m* = 2, *a*_{1} + 1 ≤ *s* ≤ *n*} denote the sample space of each arm defined by the design (*a*_{1}/*n*_{1}, *a*/*n*). Then, given a true response probability *p* = *p*_{x} = *p*_{y} under *H*_{0}, the probability of rejecting *H*_{0} is

$$h(c \mid p) = \sum_{(m_x,s_x)\in\mathcal{S}}\ \sum_{(m_y,s_y)\in\mathcal{S}} I(m_y = 2,\ s_y > a,\ \hat{p}_y - \hat{p}_x \ge c)\, f(m_x, s_x \mid p)\, f(m_y, s_y \mid p), \tag{1}$$

where *I*(·) is the indicator function and *f*(*m*, *s*|*p*) denotes the probability mass function of (*M*, *S*) under the common two-stage design. More generally, the probability of an event *A* ⊂ 𝒮 × 𝒮 is calculated as

$$P(A) = \sum_{((m_x,s_x),(m_y,s_y)) \in A} f(m_x, s_x \mid p)\, f(m_y, s_y \mid p).$$

In contrast to common asymptotic tests, such as the two-sample t-test, the operating characteristics of our exact test depend on the null response probability *p*, an unknown nuisance parameter. In order to remove the nuisance parameter, we control the type I error by maximizing the probability in (1) over the whole parameter space *p* ∈ [0, 1], or over a subset of interest ℬ ⊂ [0, 1]. See Berger and Boos (1994) for the rationale for such an approach. Given *α*, we want to choose a critical value *c* = *c*_{α} so that the probability of accepting arm *y* is no larger than *α* under *H*_{0}, i.e.

$$\max_{p \in \mathcal{B}} h(c_\alpha \mid p) \le \alpha. \tag{2}$$

We will refer to probability (2) as the type I error. Let *p*_{0} denote the response rate of a historical control. Then, we may choose a small interval such as ℬ = [*p*_{0} − 0.2, *p*_{0} + 0.2]. In our experience, the maximum type I error usually occurs within this range. Of course, if we want type I error control under any possible situation, we have to choose ℬ = [0, 1]. We use the latter in this paper.

Let *H*(*c*) = max_{p∈ℬ} *h*(*c*|*p*). Obviously, *h*(*c*|*p*) is monotone in *c*. Given *c*, however, *h*(*c*|*p*) can have local maxima over *p* ∈ ℬ. For example, when both arms have the same design as that of Arm R+L in Example 1, (*a*_{1}/*n*_{1}, *a*/*n*) = (4/21, 10/45), Figure 1 displays *h*(*c* = 0.1|*p*) over *p* ∈ [0, 1]. Note that there are two maxima. So, given *α*, calculation of the critical value *c*_{α} requires a two-stage numerical search procedure: for a given critical value *c*, *H*(*c*) is calculated by a grid search for the maximum of *h*(*c*|*p*) over *p* ∈ [0, 1]; and since *h*(*c*|*p*) is monotone in *c* for any *p* ∈ [0, 1], *H*(*c*) is also monotone in *c*, so the critical value *c* = *c*_{α} satisfying *H*(*c*_{α}) = *α* can be obtained by the bisection method.
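The two-stage numerical search just described can be sketched in Python as follows, for the case where both arms share one design. The helper names, the grid resolution approximating the maximization over ℬ = [0, 1], and the bisection tolerance are our own choices for illustration:

```python
from math import comb

def pmf(m, s, p, a1, n1, n2):
    """P(M = m, S = s | p) for a two-stage design with lower boundary a1."""
    n = n1 + n2
    if m == 1:
        return comb(n1, s) * p**s * (1 - p)**(n1 - s)
    lo, hi = max(a1 + 1, s - n2), min(s, n1)
    coef = sum(comb(n1, x) * comb(n2, s - x) for x in range(lo, hi + 1))
    return coef * p**s * (1 - p)**(n - s)

def umvue(m, s, a1, n1, n2):
    """Jung-Kim UMVUE of p for outcome (m, s)."""
    if m == 1:
        return s / n1
    lo, hi = max(a1 + 1, s - n2), min(s, n1)
    num = sum(comb(n1 - 1, x - 1) * comb(n2, s - x) for x in range(lo, hi + 1))
    den = sum(comb(n1, x) * comb(n2, s - x) for x in range(lo, hi + 1))
    return num / den

def critical_value(alpha, a1, n1, n2, a, grid=101, tol=1e-3):
    """Bisection for the smallest c with H(c) = max_p h(c|p) <= alpha,
    both arms sharing the design (a1/n1, a/n); p maximized over a grid."""
    n = n1 + n2
    space = [(1, s) for s in range(a1 + 1)] + [(2, s) for s in range(a1 + 1, n + 1)]
    est = {ms: umvue(*ms, a1, n1, n2) for ms in space}

    def h(c, p):  # rejection probability at common response rate p
        f = {ms: pmf(*ms, p, a1, n1, n2) for ms in space}
        return sum(f[x] * f[y] for x in space for y in space
                   if y[0] == 2 and y[1] > a and est[y] - est[x] >= c)

    def H(c):  # grid search over the nuisance parameter p
        return max(h(c, i / (grid - 1)) for i in range(grid))

    lo, hi = 0.0, 1.0  # H(c) is non-increasing in c
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if H(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi
```

For the shared design of Example 2, (*a*_{1}/*n*_{1}, *a*/*n*) = (4/21, 10/45) and *α* = 0.1, this search should land near the value *c*_{α} = 0.1520 reported in the paper; a finer grid over *p* would be used in practice.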

Given *p*_{x} and *p*_{y} = *p*_{x} + Δ (Δ > 0), the probability of correct comparison, called the power, is calculated as

$$\sum_{(m_x,s_x)\in\mathcal{S}}\ \sum_{(m_y,s_y)\in\mathcal{S}} I(m_y = 2,\ s_y > a,\ \hat{p}_y - \hat{p}_x \ge c_\alpha)\, f(m_x, s_x \mid p_x)\, f(m_y, s_y \mid p_y).$$

Suppose that arm *y* is accepted in the independent evaluation, and *ĉ* = *p̂*_{y} − *p̂*_{x} denotes the observed difference from the data. Then, one may want to see how significant the evidence is against *H*_{0}. To this end, we may calculate a p-value by

$$\text{p-value} = \max_{p \in \mathcal{B}} h(\hat{c} \mid p).$$

##### Example 2

Suppose that arm *x* is a control and arm *y* is an experimental therapy, both with the same two-stage design (*a*_{1}/*n*_{1}, *a*/*n*) = (4/21, 10/45) as in Arm R+L of Example 1. With *α* = 0.1, we have *c*_{α} = 0.1520, and the type I error is maximized at *p*_{x} = *p*_{y} = 0.2692. With Δ = 0.2, the power is 0.669 for (*p*_{x}, *p*_{y}) = (0.15, 0.35), 0.649 for (0.2, 0.4), and 0.639 for (0.25, 0.45). With Δ = 0.25, the power is 0.809 for (*p*_{x}, *p*_{y}) = (0.15, 0.4), 0.796 for (0.2, 0.45), and 0.800 for (0.25, 0.5). When we have (*m*_{x}, *s*_{x}) = (2, 12) (*p̂*_{x} = 0.295), we have p-value = 0.3064 if (*m*_{y}, *s*_{y}) = (2, 15) (*p̂*_{y} = 0.342); p-value = 0.1123 if (*m*_{y}, *s*_{y}) = (2, 20) (*p̂*_{y} = 0.445); and p-value = 0.0145 if (*m*_{y}, *s*_{y}) = (2, 25) (*p̂*_{y} = 0.556).

Note that the above comparison rule controls the type I error of selecting the experimental arm when both arms have an equal response rate. This rule may be considered too strict. A phase II trial is designed not to show the superiority of an experimental therapy over the control, but to screen out ineffective therapies. If an experimental therapy is shown to have efficacy no worse than the control, its superiority may be investigated in a phase III trial using a more definitive endpoint, such as overall survival. In this sense, one may want to loosen the control of the type I error somewhat in the phase II design. Let *δ* (> 0) denote the maximum clinically insignificant difference in response rate, e.g. *δ* = 0.05. Suppose that we do not care about falsely accepting arm *y* as long as *p*_{y} is within *δ* of *p*_{x}, i.e. *p*_{y} > *p*_{x} − *δ*. In this case, the hypotheses may be modified to

$$H_0: p_y \le p_x - \delta \quad \text{against} \quad H_1: p_y > p_x - \delta.$$

We choose a critical value *c* = *c*_{α} satisfying

$$\max_{p \in \mathcal{B}} \sum_{(m_x,s_x)\in\mathcal{S}}\ \sum_{(m_y,s_y)\in\mathcal{S}} I(m_y = 2,\ s_y > a,\ \hat{p}_y - \hat{p}_x \ge c_\alpha)\, f(m_x, s_x \mid p)\, f(m_y, s_y \mid p - \delta) \le \alpha.$$

Given *p*_{x} and *p*_{y} = *p*_{x} + Δ, the power is calculated as

$$\sum_{(m_x,s_x)\in\mathcal{S}}\ \sum_{(m_y,s_y)\in\mathcal{S}} I(m_y = 2,\ s_y > a,\ \hat{p}_y - \hat{p}_x \ge c_\alpha)\, f(m_x, s_x \mid p_x)\, f(m_y, s_y \mid p_y).$$

_{x}For an observed difference *ĉ* = * _{y}* −

*, the p-value is calculated as*

_{x}We will allow maximum clinically insignificant difference *δ* in the remainder of this paper if not stated otherwise.

##### Example 3

Consider Example 2 with *δ* = 0.05 and *α* = 0.1. We have *c*_{α} = 0.0925, and the type I error is maximized at (*p*_{x}, *p*_{y}) = (0.3138, 0.2638). With Δ = 0.2, the power is 0.799 for (*p*_{x}, *p*_{y}) = (0.15, 0.35), 0.820 for (0.2, 0.4), and 0.827 for (0.25, 0.45). When we have (*m*_{x}, *s*_{x}) = (2, 12), we have p-value = 0.1640 if (*m*_{y}, *s*_{y}) = (2, 15); p-value = 0.0529 if (*m*_{y}, *s*_{y}) = (2, 20); and p-value = 0.0051 if (*m*_{y}, *s*_{y}) = (2, 25).

##### 3.1.2 When Both Arms Are Experimental

Suppose now that there are two experimental therapies, *x* and *y*, under investigation. The primary objective is to evaluate each therapy compared to a historical control. As a secondary analysis, we want to compare the two experimental arms and choose one to be investigated further in a phase III trial. Given the maximum clinically negligible difference *δ*, the hypotheses may be expressed as

$$H_0: |p_x - p_y| \le \delta \quad \text{against} \quad H_1: |p_x - p_y| > \delta.$$

In this case, the associated testing is two-sided. As in the one-sided case, we do not want to select an experimental arm if it is not accepted in the independent evaluation. That is, we want to select an experimental arm if it is accepted in the independent evaluation and the UMVUE is significantly larger than that of the other arm.

For a chosen critical value *c*, we select arm *x* if

$$A_x = \{m_x = 2,\ s_x > a,\ \hat{p}_x - \hat{p}_y \ge c\}$$

is true, and arm *y* if

$$A_y = \{m_y = 2,\ s_y > a,\ \hat{p}_y - \hat{p}_x \ge c\}$$

is true. Since the two arms have the identical design (*a*_{1}/*n*_{1}, *a*/*n*), the error probabilities *P*(*A*_{x} | *p*_{y} = *p*_{x} + *δ*) and *P*(*A*_{y} | *p*_{y} = *p*_{x} − *δ*) are identical. Using this result, we obtain the critical value *c* = *c*_{α} so that the false selection probability under *H*_{0} does not exceed *α*, i.e.

$$\max_{p \in \mathcal{B}} P(A_y \mid p_x = p,\ p_y = p - \delta) \le \alpha.$$

Note that the probabilities *P*(*A*_{x}) and *P*(*A*_{y}) will be unequal if the two arms have different designs. Cases with different designs will be discussed in the next section.

Given *p*_{x} and *p*_{y} = *p*_{x} + Δ, the power is calculated as

$$P(A_y \mid p_x,\ p_y = p_x + \Delta).$$

Suppose that arm *y* is accepted in the independent evaluation and *ĉ* = *p̂*_{y} − *p̂*_{x} (> 0) denotes the observed difference in UMVUE from a randomized phase II trial. Then, proceeding as before, we calculate

$$\text{p-value} = \max_{p \in \mathcal{B}} P(m_y = 2,\ s_y > a,\ \hat{p}_y - \hat{p}_x \ge \hat{c} \mid p_x = p,\ p_y = p - \delta).$$

We select neither arm if both arms are rejected in the independent evaluation, and select both arms if both arms are accepted in the independent evaluation and |*p̂*_{y} − *p̂*_{x}| < *c*_{α}.

##### Example 4

Consider Example 3, but with both arms considered as experimental. With *δ* = 0.05 and *α* = 0.1, we have *c*_{α} = 0.1520, and the type I error is maximized at (*p*_{x}, *p*_{y}) = (0.2565, 0.3065), where the order is unimportant. With Δ = 0.2, the power is 0.669 for (*p*_{x}, *p*_{y}) = (0.15, 0.35), 0.649 for (0.20, 0.40), and 0.639 for (0.25, 0.45). When we observe (*m*_{x}, *s*_{x}) = (2, 12), we have p-value = 0.3280 if (*m*_{y}, *s*_{y}) = (2, 15); p-value = 0.1058 if (*m*_{y}, *s*_{y}) = (2, 20); and p-value = 0.0102 if (*m*_{y}, *s*_{y}) = (2, 25).

If we choose *δ* = 0.1, then we have *c*_{α} = 0.0925 for *α* = 0.1, and the type I error is maximized at (*p*_{x}, *p*_{y}) = (0.2460, 0.3460), where the order is again unimportant. With Δ = 0.2, the power is 0.799 for (*p*_{x}, *p*_{y}) = (0.15, 0.35), 0.820 for (0.20, 0.40), and 0.827 for (0.25, 0.45). When we observe (*m*_{x}, *s*_{x}) = (2, 12), we have p-value = 0.1501 if (*m*_{y}, *s*_{y}) = (2, 15), p-value = 0.0442 if (*m*_{y}, *s*_{y}) = (2, 20), and p-value = 0.0032 if (*m*_{y}, *s*_{y}) = (2, 25).

#### 3.2 When the Two Arms Have Different Two-Stage Designs

In a randomized phase II trial, we may want to use different designs for different arms. For example, we may want to have more patients in the control arm to allow more efficient estimation of parameters in patient subgroups to be used in designing a phase III trial. Or, we may want to use a less strict early stopping rule in the control arm. If we want to compare two experimental therapies evaluated by separate single-arm phase II trials, it is very likely that the two trials will have different designs. In this section, we consider selection problems when two arms have different 2-stage designs.

In Section 3.1, we considered phase II trials randomizing patients to two arms with exactly the same two-stage designs for independent evaluation. In that case, we did not want to select an arm that was rejected in the independent evaluation. However, when the two arms have different two-stage designs, the selection rules in this section are based only on the comparison of the estimators of the response rates.

##### 3.2.1 When One Arm Is a Control

As before, let *x* be the control arm and *y* the experimental arm. For a maximal clinically negligible difference *δ*, we want to test

$$H_0: p_y \le p_x - \delta \quad \text{against} \quad H_1: p_y > p_x - \delta.$$

We choose a critical value *c* = *c*_{α} satisfying

$$\max_{p \in \mathcal{B}} \sum_{(m_x,s_x)\in\mathcal{S}_x}\ \sum_{(m_y,s_y)\in\mathcal{S}_y} I(\hat{p}_y - \hat{p}_x \ge c_\alpha)\, f_x(m_x, s_x \mid p)\, f_y(m_y, s_y \mid p - \delta) \le \alpha,$$

where 𝒮_{k}, *p̂*_{k}(·, ·), and *f*_{k}(·, ·|·) are the design-specific sample space, UMVUE, and probability mass function, respectively, for arm *k* = *x*, *y*.

The power for Δ and *p*_{x} (*p*_{y} = *p*_{x} + Δ), defined as

$$\sum_{(m_x,s_x)\in\mathcal{S}_x}\ \sum_{(m_y,s_y)\in\mathcal{S}_y} I(\hat{p}_y - \hat{p}_x \ge c_\alpha)\, f_x(m_x, s_x \mid p_x)\, f_y(m_y, s_y \mid p_y),$$

can be calculated similarly to the type I error. For an observed difference *ĉ* = *p̂*_{y} − *p̂*_{x}, the p-value is calculated as

$$\max_{p \in \mathcal{B}} \sum_{(m_x,s_x)\in\mathcal{S}_x}\ \sum_{(m_y,s_y)\in\mathcal{S}_y} I(\hat{p}_y - \hat{p}_x \ge \hat{c})\, f_x(m_x, s_x \mid p)\, f_y(m_y, s_y \mid p - \delta).$$

##### Example 5

Consider *δ* = 0.05 in Example 1. Then with *α* = 0.1, we have *c*_{α} = 0.0717, and the type I error is maximized at (*p*_{x}, *p*_{y}) = (0.2185, 0.1685). With Δ = 0.2, the power is 0.933 for (*p*_{x}, *p*_{y}) = (0.25, 0.45), 0.926 for (0.30, 0.50), and 0.922 for (0.35, 0.55). Table 1 displays p-values for our exact method.
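When the two arms have different designs, the type I error calculation simply runs over arm-specific sample spaces, estimators, and probability mass functions. The following Python sketch evaluates the rejection probability at a given nuisance value *p*, under our reading of the Section 3.2.1 rule (*p̂*_{y} − *p̂*_{x} ≥ *c*, with no acceptance requirement); the helper names and the `(a1, n1, n2)` design triples are our own conventions:

```python
from math import comb

def outcomes(a1, n1, n2):
    """Sample space plus UMVUE and pmf evaluators for design (a1/n1, a/n)."""
    n = n1 + n2
    space = [(1, s) for s in range(a1 + 1)] + [(2, s) for s in range(a1 + 1, n + 1)]

    def pmf(m, s, p):
        if m == 1:
            return comb(n1, s) * p**s * (1 - p)**(n1 - s)
        lo, hi = max(a1 + 1, s - n2), min(s, n1)
        coef = sum(comb(n1, x) * comb(n2, s - x) for x in range(lo, hi + 1))
        return coef * p**s * (1 - p)**(n - s)

    def est(m, s):  # Jung-Kim UMVUE
        if m == 1:
            return s / n1
        lo, hi = max(a1 + 1, s - n2), min(s, n1)
        num = sum(comb(n1 - 1, x - 1) * comb(n2, s - x) for x in range(lo, hi + 1))
        den = sum(comb(n1, x) * comb(n2, s - x) for x in range(lo, hi + 1))
        return num / den

    return space, est, pmf

def type1(c, p, delta, ctrl, expt):
    """P(umvue_y - umvue_x >= c) at p_x = p and p_y = p - delta, where ctrl
    and expt are (a1, n1, n2) design triples for arms x and y."""
    sx, ex, fx = outcomes(*ctrl)
    sy, ey, fy = outcomes(*expt)
    fxd = {o: fx(*o, p) for o in sx}          # pmf of arm x at p
    fyd = {o: fy(*o, p - delta) for o in sy}  # pmf of arm y at p - delta
    exd = {o: ex(*o) for o in sx}
    eyd = {o: ey(*o) for o in sy}
    return sum(fxd[ox] * fyd[oy] for ox in sx for oy in sy
               if eyd[oy] - exd[ox] >= c)
```

With the Example 1 designs, `ctrl = (10, 57, 33)` and `expt = (4, 21, 24)`, evaluating `type1(0.0717, 0.2185, 0.05, ctrl, expt)` should be close to the *α* = 0.1 attained at the maximizing point reported in Example 5; maximizing this quantity over a grid of *p* and bisecting on *c* recovers *c*_{α} as in Section 3.1.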

_{x}, p_{y}##### 3.2.2 When Both Arms Are Experimental

Suppose that both arms are experimental with different designs. For a maximal clinically negligible difference *δ*, we want to test

$$H_0: |p_x - p_y| \le \delta \quad \text{against} \quad H_1: |p_x - p_y| > \delta.$$

We choose a critical value *c* = *c*_{α} satisfying

$$\max_{p \in \mathcal{B}} \left\{ P(\hat{p}_x - \hat{p}_y \ge c_\alpha \mid p_x = p,\ p_y = p - \delta) + P(\hat{p}_y - \hat{p}_x \ge c_\alpha \mid p_x = p - \delta,\ p_y = p) \right\} \le \alpha. \tag{3}$$

Note that the two misselection errors on the left-hand side of (3) are not the same if the two arms have different designs. We fail to select one arm against the other if |*p̂*_{x} − *p̂*_{y}| < *c*_{α}.

The power for Δ and *p*_{x} (*p*_{y} = *p*_{x} + Δ) is

$$P(\hat{p}_y - \hat{p}_x \ge c_\alpha \mid p_x,\ p_y = p_x + \Delta).$$

For an observed difference *ĉ* = |*p̂*_{x} − *p̂*_{y}|, the p-value is calculated as the left-hand side of (3) with *c*_{α} replaced by *ĉ*.

##### Example 6

Suppose that both arms in Example 1 are experimental. Then with *δ* = 0.05 and *α* = 0.1, we have *c*_{α} = 0.1174, and the type I error is maximized at (*p*_{x}, *p*_{y}) = (0.2775, 0.2275), where the order is unimportant. With Δ = 0.2, the power is 0.826 for (*p*_{x}, *p*_{y}) = (0.25, 0.45), 0.831 for (0.30, 0.50), and 0.838 for (0.35, 0.55). Table 2 reports p-values for some chosen outcomes for our exact method.

#### 3.3 Extension to More than Two Arms

In this section we consider two situations, one where all arms are experimental and another where one of them is a control. Each arm is independently compared to a historical control through a two-stage design. For arm *k* = 0, 1, …, *K*, let 𝒮_{k}, *p̂*_{k}(·, ·), and *f*_{k}(·, ·|·) denote the sample space, UMVUE, and probability mass function, respectively, which are specific to the design of each arm. If all arms have the same two-stage design, we can drop the subscripts.

##### 3.3.1 When There Is One Control Arm and *K* Experimental Arms

Suppose that patients are randomized to a control (Arm 0) and *K* experimental arms (Arms 1, …, *K*). We want to identify experimental arms that are significantly efficacious compared to the control arm. When *K* ≥ 2, we have to control the familywise error rate (FWER) to adjust for the multiplicity of the testing; the marginal type I error control applied in the previous sections would inflate the misselection probability. For a maximal clinically negligible difference *δ*, we want to test

$$H_0: p_k \le p_0 - \delta \ \text{for all}\ k = 1, \ldots, K$$

against

$$H_1: p_k > p_0 - \delta \ \text{for some}\ k = 1, \ldots, K.$$

Given a FWER level *α*, such as 0.1, we accept Arm *k* (= 1, …, *K*) if *p̂*_{k} − *p̂*_{0} ≥ *c*_{α}, where the critical value *c* = *c*_{α} satisfies

$$\max_{p \in \mathcal{B}} P\left( \max_{1 \le k \le K} (\hat{p}_k - \hat{p}_0) \ge c_\alpha \,\middle|\, p_0 = p,\ p_1 = \cdots = p_K = p - \delta \right) \le \alpha.$$

If more than one arm is accepted, we may conduct pairwise comparisons among the accepted arms, as described in Sections 3.1.2 and 3.2.2, to identify a smaller number of arms for a phase III trial as a secondary analysis.

Under a specific alternative hypothesis,

$$H_a: p_k = p_0 + \Delta_k, \quad k = 1, \ldots, K,$$

with Δ_{k} > 0, the power is obtained as

$$P\left( \hat{p}_k - \hat{p}_0 \ge c_\alpha \ \text{for some}\ k = 1, \ldots, K \,\middle|\, H_a \right).$$

##### 3.3.2 When All *K* Arms Are Experimental

Suppose that patients are randomized to *K* experimental arms. In this case, we want to test

$$H_0: |p_k - p_l| \le \delta \ \text{for all}\ 1 \le k < l \le K$$

against

$$H_1: |p_k - p_l| > \delta \ \text{for some}\ 1 \le k < l \le K.$$

We reject *H*_{0} if max_{1≤k≤K} *p̂*_{k} − min_{1≤k≤K} *p̂*_{k} ≥ *c*_{α}, where the critical value *c* = *c*_{α} satisfies

$$\max_{(p_1, \ldots, p_K) \in H_0} P\left( \max_{1 \le k \le K} \hat{p}_k - \min_{1 \le k \le K} \hat{p}_k \ge c_\alpha \right) \le \alpha.$$

Under a specific alternative hypothesis *H*_{a}: (*p*_{1}, …, *p*_{K}), the power is obtained as

$$P\left( \max_{1 \le k \le K} \hat{p}_k - \min_{1 \le k \le K} \hat{p}_k \ge c_\alpha \,\middle|\, p_1, \ldots, p_K \right).$$

## 4 Discussion

We propose between-arm comparison methods for randomized phase II trials. If one wants to compare two therapies evaluated through two separate single-arm phase II trials, our methods can be used as long as the two trials are conducted on similar populations. Each arm to be compared may have a two-stage design for independent evaluation of the therapy, so that statistical procedures based on single-stage designs, such as the two-sample t-test, may give biased results. Our methods compare two arms exactly, reflecting the design aspects and the small sample sizes. We have considered two-stage designs, but the extension to multi-stage designs is straightforward. The between-arm comparisons proposed in this paper are conducted when competing experimental therapies are independently evaluated against a historical control. Jung (2007) proposed design methods for the case when patients are randomized between a prospective control and experimental therapies, and each experimental arm is compared with the control through multiple stages.

## References

- Bechhofer RE. A single-sample multiple decision procedure for ranking means of normal populations with known variances. Annals of Mathematical Statistics. 1954;25:16–39.
- Berger RL, Boos DD. P values maximized over a confidence set for the nuisance parameter. Journal of the American Statistical Association. 1994;89:1012–1016.
- Chang MN, Therneau TM, Wieand HS, Cha SS. Designs for group sequential phase II clinical trials. Biometrics. 1987;43:865–874. [PubMed]
- Jung SH. Randomized phase II trials with a prospective control. To appear in Statistics in Medicine, 2007.
- Jung SH, Carey M, Kim KM. Graphical search for two-stage designs for phase II clinical trials. Controlled Clinical Trials. 2001;22:367–372. [PubMed]
- Jung SH, Kim KM. On the estimation of the binomial probability in multistage clinical trials. Statistics in Medicine. 2004;23:881–896. [PubMed]
- Jung SH, Lee TY, Kim KM, George SL. Admissible two-stage designs for phase II cancer clinical trials. Statistics in Medicine. 2004;23:561–569. [PubMed]
- Liu PY, LeBlanc M, Desai M. False positive rates of randomized phase II designs. Controlled Clinical Trials. 1999;20:343–352. [PubMed]
- Palmer CR. A comparative phase II clinical trials procedure for choosing the best of three treatments. Statistics in Medicine. 1991;10:1327–1340. [PubMed]
- Sargent DJ, Goldberg RM. A flexible design for multiple armed screening trials. Statistics in Medicine. 2001;20:1051–1060. [PubMed]
- Simon R. Optimal two-stage designs for phase II clinical trials. Controlled Clinical Trials. 1989;10:1–10. [PubMed]
- Simon R, Wittes RE, Ellenberg SS. Randomized phase II clinical trials. Cancer Treatment Reports. 1985;69:1375–1381. [PubMed]
- Spiegelhalter DJ, Freedman LS, Blackburn PR. Monitoring clinical trials: Conditional or Predictive power? Controlled Clinical Trials. 1986;7:8–17. [PubMed]
- Steinberg SM, Venzon DJ. Early selection in a randomized phase II clinical trial. Statistics in Medicine. 2002;21:1711–1726. [PubMed]
- Thall PF, Simon R, Ellenberg SS. A two-stage design for choosing among several experimental treatments and a control in clinical trials. Biometrics. 1989;45:537–547. [PubMed]


Between-Arm Comparisons in Randomized Phase II Trials. NIHPA Author Manuscript. 2009;19(3):456.
