- J Clin Oncol
- PMC3434989

# Randomized Phase II Trial Designs With Biomarkers

## Abstract

Efficient development of targeted therapies that may only benefit a fraction of patients requires clinical trial designs that use biomarkers to identify sensitive subpopulations. Various randomized phase III trial designs have been proposed for definitive evaluation of new targeted treatments and their associated biomarkers (eg, enrichment designs and biomarker-stratified designs). Before proceeding to phase III, randomized phase II trials are often used to decide whether the new therapy warrants phase III testing. In the presence of a putative biomarker, the phase II trial should also provide information as to what type of biomarker phase III trial is appropriate. A randomized phase II biomarker trial design is proposed, which, after completion, recommends the type of phase III trial to be used for the definitive testing of the therapy and the biomarker. The recommendations include the possibility of proceeding to a randomized phase III of the new therapy with or without using the biomarker and also the possibility of not testing the new therapy further. Evaluations of the proposed trial design using simulations and published data demonstrate that it works well in providing recommendations for phase III trial design.

## INTRODUCTION

The primary purpose of phase II trials is to screen whether new therapies are worthy of definitive testing in randomized phase III trials. In the development of a targeted therapy, there is often a biomarker that can potentially identify patients for whom the therapy will work. In this case, it is desirable that the phase II trial address not only whether the therapy should move forward to phase III definitive evaluation, but also whether this definitive evaluation should include the biomarker. There are various ways to incorporate a biomarker into a phase III trial design.^{1} The phase III design that provides the most information about the biomarker and the therapy is the biomarker-stratified design, in which the treatment assignment is randomized between the new and standard treatments for all patients, with a separate assessment of the treatment effect in the biomarker-positive and biomarker-negative subgroups. Because the trial needs to provide a sufficiently accurate estimate of the effect of the therapy in the biomarker-positive subgroup (and possibly the biomarker-negative subgroup), the required sample size will be larger than that for a standard phase III trial, which does not have a biomarker component. When the preliminary data suggest strongly that the therapy will not work in the biomarker-negative subgroup, an enrichment phase III trial that randomly assigns only biomarker-positive patients and assesses the effect of the therapy in this subgroup is appropriate. Although an enrichment design randomly assigns fewer patients than the biomarker-stratified design or the standard phase III trial design (with no biomarker), it must screen approximately as many patients for biomarker status as would be randomly assigned in a biomarker-stratified trial.

The purpose of this article is to propose a specific randomized phase II trial design that can be used to guide decision making for further development of an experimental therapy. In particular, the proposed approach is developed to optimize making one of four possible decisions after analysis of the phase II trial results (Table 1): one, perform a randomized phase III trial with a biomarker-enrichment design; two, perform a randomized phase III trial with a biomarker-stratified design; three, perform a randomized phase III trial without using the biomarker; and four, drop consideration of the new therapy. We evaluate the proposed design via simulation under different sets of hypothesized treatment effects for the biomarker-positive and biomarker-negative subgroups. We also evaluate the proposed designs by retrospectively assessing how they would have performed if they had been applied to some recently published clinical situations with completed randomized trials that considered treatment effects in biomarker-defined subgroups. We end with a discussion of some extensions and modifications of the proposed design.

## PROPOSED TRIAL DESIGN

Randomized phase II trials typically use an intermediate end point like progression-free survival (PFS) to obtain the results earlier than overall survival (OS) and to be able to target larger treatment effects than one would target for OS. The proposed phase II design uses this end point. We assume that at the time the study is designed, there is some preliminary rationale suggesting that the biomarker-positive patients are likely to derive the most benefit from the new therapy, but benefit for the biomarker-negative patients cannot be ruled out. Patients are randomly assigned to the experimental and control treatments, and the biomarker status of each patient is recorded. The idea is to use the observed treatment effects (hazard ratios) in the biomarker-positive and -negative subgroups, as well as overall, to guide decision making.

With a specified sample size and after sufficient follow-up, a decision is made concerning further development of the new treatment and biomarker as described in Figure 1. In step 1, the null hypothesis that PFS is the same for both treatment arms in the biomarker-positive subgroup is tested (at the one-sided .10 significance level). If this test in step 1 does not reject (ie, we have not demonstrated that the experimental treatment is better than the control in the biomarker-positive subgroup), then in step 2A, the null hypothesis that PFS is the same for both treatment arms for all randomly assigned patients is tested (at the one-sided .05 significance level). If this test in step 2A does not reject, the design recommends no further testing of the new therapy. If this test in step 2A does reject (ie, the new treatment is better than the control in the whole population), then the conclusion is that the new treatment is potentially active, but the biomarker is not useful, and the design recommends dropping the biomarker and performing a standard randomized phase III trial.

Figure 1 (not reproduced here). Abbreviations: H_{0}, null hypothesis in the overall group; H_{0(+)}, null hypothesis in the biomarker-positive subgroup; HR, hazard ratio; ...

If the test rejects the null hypothesis in step 1 (ie, the new treatment is better than the control in the biomarker-positive subgroup), then the recommendation (step 2B) is based on a two-sided 80% CI for the hazard ratio (control over new treatment hazard) in the biomarker-negative subgroup. If the whole CI is below 1.3, then there is strong evidence that the new treatment is, at best, only marginally helpful for the biomarker-negative patients, and a biomarker-enrichment phase III trial is recommended. If the whole CI is above 1.5, then there is evidence that the treatment works sufficiently well in the biomarker-negative patients (and therefore the biomarker is not useful), and the recommendation is to drop the biomarker and perform a standard randomized phase III trial. If neither of these conditions holds, then a phase III biomarker-stratified design is recommended, because there is insufficient information on how well the treatment works in the biomarker-negative subgroup.
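The decision rule in steps 1, 2A, and 2B can be written down compactly. The following Python sketch is one way to encode it. The function names, the argument layout, and the convention that a test rejects when its P value is at or below the significance level are illustrative assumptions, not details taken from the authors' program. The helper `approx_hr_ci` relies on the common large-sample approximation that the standard error of the log hazard ratio is about 2/sqrt(d) for d events under 1:1 randomization; this is also an assumption and not necessarily how the CI in Figure 1 is computed.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def approx_hr_ci(hr_estimate, n_events, level=0.80):
    """Approximate two-sided CI for a hazard ratio.

    Assumption (not from the article): SE of log(HR) ~ 2 / sqrt(n_events),
    the usual large-sample approximation under 1:1 randomization.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * 2 / sqrt(n_events)
    center = log(hr_estimate)
    return (exp(center - half_width), exp(center + half_width))


def recommend_phase3(p_pos, p_overall, ci_neg,
                     alpha_pos=0.10, alpha_overall=0.05,
                     lower_cut=1.3, upper_cut=1.5):
    """Return the phase III recommendation described in the text.

    p_pos     : one-sided P value, biomarker-positive subgroup (step 1)
    p_overall : one-sided P value, all randomized patients (step 2A)
    ci_neg    : (low, high) two-sided 80% CI for the hazard ratio
                (control over new treatment), biomarker-negative subgroup
    """
    if p_pos > alpha_pos:
        # Step 1 does not reject: fall back to the overall comparison (2A).
        if p_overall <= alpha_overall:
            return "standard phase III (drop biomarker)"
        return "no further testing"

    # Step 1 rejects: use the biomarker-negative CI (step 2B).
    low, high = ci_neg
    if high < lower_cut:
        return "enrichment"
    if low > upper_cut:
        return "standard phase III (drop biomarker)"
    return "biomarker-stratified"


# Hypothetical inputs, for illustration only.
ci = approx_hr_ci(1.0, 56)
print(ci, recommend_phase3(0.03, 0.20, ci))
```

Under these assumptions, an observed hazard ratio of 1.0 from 56 events in the biomarker-negative subgroup gives an approximate 80% CI of about 0.71 to 1.41. That interval straddles the 1.3 cutoff, so the sketch returns the biomarker-stratified recommendation rather than enrichment.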

### Sample Size Considerations

In addition to the decision rule at analysis time, as delineated in Figure 1, part of trial design is the specification of sample size to have sufficient power to make appropriate decisions. Because the underlying assumption is that the benefit of the new therapy is greatest among biomarker-positive patients, sample size considerations are driven by the biomarker-positive subgroup. Consider a clinical setting where the median PFS with standard treatment is relatively low (eg, < 6 months), and we hope that the experimental therapy will double the median PFS in the biomarker-positive subgroup (ie, a hazard ratio of 2 in this subgroup). Detecting this effect with 90% power at the one-sided .10 significance level would require 56 PFS events (progressions or deaths) in the biomarker-positive subgroup. We would also want at least approximately this many events in the biomarker-negative subgroup to help with the decision making. There are various combinations of accrual and follow-up that will yield a specified number of events for a given median PFS (Appendix, online only).
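As a rough cross-check on the number of events, the sketch below applies the widely used Schoenfeld approximation for a log-rank comparison with 1:1 randomization, d = 4(z_alpha + z_beta)^2 / (ln HR)^2. The text does not state which formula the authors used, so the choice of approximation and the function name `events_required` are assumptions.

```python
from math import ceil, log
from statistics import NormalDist


def events_required(hazard_ratio, alpha_one_sided, power):
    """Schoenfeld approximation for the number of events (1:1 allocation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha_one_sided)
    z_beta = z.inv_cdf(power)
    return 4 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2


# Design targets from the text: HR = 2, one-sided alpha = .10, power = 90%.
d = events_required(2.0, 0.10, 0.90)
print(round(d, 1), ceil(d))
```

This approximation gives roughly 55 events, within an event or two of the 56 quoted above; small differences of this kind are expected when a different formula or rounding convention is used.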

### Simulations

We perform simulations using the design for accrual and follow-up described in the Appendix (online only) and Figure 1 for decision making. We consider the eight scenarios for the true median PFS (and hazard ratios) in the biomarker-positive and biomarker-negative subgroups, and four possible prevalences for biomarker positivity (Table 2). For the different biomarker-positivity prevalences, the average simulated sample sizes in the biomarker-positive and -negative subgroups are 70 and 140 (20% prevalence), 70 and 133 (33% prevalence), 75 and 75 (50% prevalence), and 133 and 65 (67% prevalence), respectively.

When the treatment does not work in either the biomarker-positive or biomarker-negative subgroups (scenario 1), we drop consideration of using the new therapy 87% to 88% of the time (depending on the prevalence of positivity), which is the correct decision. When the experimental agent works only in the biomarker-positive subgroup (hazard ratio, 2; scenario 2), the best recommendation would be an enrichment design, with an acceptable recommendation being to perform a biomarker-stratified design (in which the utility of the biomarker would become apparent in that larger trial). For this scenario, the probability of either of these recommendations is ≥ 89%.

For scenarios 3 and 4, the experimental therapy works (hazard ratios, 1.5 and 1.75, respectively), but the biomarker is not useful. Under these scenarios, the best recommendation would be a phase III trial design with no biomarker, with an acceptable recommendation being to perform a biomarker-stratified design. For scenario 3, the probability of making one of these recommendations is ≥ 75%, and for scenario 4, the probability is ≥ 93%. Note that under scenario 3 (with hazard ratio of 1.5), the probability of no further testing is as high as 22%. This is a consequence of targeting a 2.0 hazard ratio in the biomarker-positive subgroup and the limited sample size of a phase II trial, as described in Discussion.

For scenarios 5 and 6, the experimental therapy works in the biomarker-positive subgroup (hazard ratio, 2.0), but not as well in the biomarker-negative subgroup (hazard ratios, 1.5 and 1.25, respectively). For scenario 5, the best recommendation would be a biomarker-stratified design or a phase III trial with no biomarker; the probability of making one of these recommendations is ≥ 93%. For scenario 6, the best recommendation would be an enrichment design, with an acceptable recommendation being to perform a biomarker-stratified design; the probability of making one of these recommendations is ≥ 89%.

Scenario 7 represents a situation in which the experimental therapy works in the biomarker-positive subgroup (but with a hazard ratio of 1.75 and not 2.0), and is slightly harmful in the biomarker-negative subgroup (hazard ratio, 0.75). The best recommendation would be an enrichment design, with an acceptable recommendation being a biomarker-stratified design. (Presumably, the biomarker-stratified design would include the possibility of stopping early for futility/inefficacy in the biomarker-negative subgroup.^{1}) The probability of making one of these recommendations is 78% to 93%. There is a fairly high probability (approximately 20%) of abandoning the experimental treatment when the positivity prevalence is ≤ 50%, similar to scenario 3.

Finally, when the biomarker is prognostic but the treatment does not work in either biomarker subgroup (scenario 8), then the recommendation is for no further testing of the new therapy 87% to 89% of the time.

## RETROSPECTIVE EVALUATION OF PUBLISHED DATA

To illustrate application of the proposed design, we use summary data from previously published randomized phase II and III trials that evaluated biomarker-subgroup treatment effects. Table 3 contains the summary statistics for the biomarker-subgroup treatment effects along with the hypothetical phase III trial designs that would have been recommended using our proposed approach. The evaluation was performed by simulating 50,000 phase II trial data sets, treating the observed summary statistics in Table 3 as true values, and then using the sample sizes and follow-up as described in the Appendix (online only) and the decision-making algorithm as described in Figure 1.

In trials 1 to 4, the hazard ratio in the biomarker-positive subgroup is ≥ 2 (in contrast to trials 5 and 6), corresponding to the treatment effect exceeding the targeted effect for the design. For these four settings, application of the proposed trial design recommended no future testing ≤ 2% of the time. The probabilities of the other three recommendations depend on the hazard ratio in the biomarker-negative subgroup in a logical way (eg, recommending an enrichment design almost always when the hazard ratio is 0.35 in trial 2).

Trial 6 illustrates a situation in which there is a clear biomarker-by-treatment interaction (ie, the hazard ratios are different for the biomarker subgroups), but with only a modest benefit of the new therapy in the biomarker-positive patients and no benefit in the biomarker-negative patients. The correct decision here is arguably an enrichment design or to have no further testing, depending on whether one considers a PFS hazard ratio of 1.47 clinically meaningful. However, the proposed design recommends a biomarker-stratified design 37% of the time, reflecting the limitations of a relatively small sample size. A similar limitation of a small sample size is seen in the probabilities of the recommendations in trial 7, where because the new treatment is potentially not useful, the recommendation for no further testing should ideally be higher than 45% in this setting.

It is interesting to note that for the two phase II studies in Table 3 (studies 1 and 5), the most likely recommendation from the proposed design was an enrichment trial. In fact, study 5 was followed by an enrichment phase III trial in Met-positive patients (NCT01456325), and study 1 was followed by an enrichment phase II trial in *KRAS*-mutated patients (NCT01395758).

## DISCUSSION

We have targeted a doubling of median PFS (hazard ratio of 2) in the biomarker-positive subgroup, a value that is larger than one would typically use for a screening randomized phase II trial in an unselected population. However, for a targeted agent that only benefits a fraction of the population, a hazard ratio of 2 in the targeted population corresponds to a smaller hazard ratio in the unselected population. Moreover, when median PFS values are low, recent experience with targeted therapies demonstrates the feasibility of achieving such effects.^{2,10–15} Therefore, we believe that targeting a hazard ratio of 2 in a biomarker-positive subgroup is appropriate when the median PFS is relatively short (eg, < 6 months). It is possible to target a smaller PFS hazard ratio than 2 in the biomarker-positive subgroup (eg, in settings with median PFS > 6 months), but this would require a larger phase II sample size. One can use the computer program (available at http://brb.nci.nih.gov/Data/FreidlinB/RP2BM) to modify the proposed design to target a smaller hazard ratio by adjusting the sample size and CI cutoffs in step 2B of Figure 1. Another computer program allows use of response rates instead of PFS as the phase II trial end point (Appendix, online only).

In practice, randomized evaluation of a new therapy is implemented either by adding the new therapy (A) to the standard of care (B) using the so-called add-on design that randomly assigns patients to either A + B or B (eg, trials 1 and 3 to 7 in Table 3), or by a head-to-head comparison of the new therapy against the standard of care, in which case patients are randomly assigned to either therapy A or therapy B (eg, trial 2 in Table 3). In settings where the standard of care has proven clinical benefit, the design should incorporate an aggressive interim inefficacy/futility look, especially in biomarker-negative patients.^{16} For example, we recommend the following rule in each biomarker subgroup: if after half of the required events are observed, the estimate of the hazard ratio (control over experimental) is ≤ 1, then accrual to the subgroup stops (if the biomarker-positive subgroup is stopped for futility, then the entire study is stopped).^{1} Another practical concern is that the biomarker status may not be available for a fraction of patients. In theory, when this fraction is relatively low, investigators may consider randomly assigning these patients and including them in the overall comparison in step 2A.
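The interim futility rule recommended above is simple enough to state as code. The sketch below is illustrative only: the function name, the subgroup labels, and the returned strings are hypothetical, and the function is meant to be called once, at the interim look.

```python
def interim_futility_action(subgroup, events_observed, events_required,
                            hr_estimate):
    """Apply the interim rule from the text to one biomarker subgroup.

    subgroup    : "positive" or "negative" (labels are an assumption)
    hr_estimate : estimated hazard ratio, control over experimental
    """
    if 2 * events_observed < events_required:
        return "continue"  # fewer than half the required events so far
    if hr_estimate > 1.0:
        return "continue"  # estimate favors the experimental arm
    if subgroup == "positive":
        # Futility in the biomarker-positive subgroup stops everything.
        return "stop entire study"
    return "stop accrual to biomarker-negative subgroup"


# Hypothetical interim data: 28 of 56 events, estimated HR of 0.9.
print(interim_futility_action("negative", 28, 56, 0.9))
```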

The recommended phase II trial design with biomarkers can be incorporated into a phase II/III design strategy.^{17} This strategy allows one to use phase II patients in the phase III evaluation and streamlines transition from phase II to III components of drug development. After the patients have been accrued on the phase II trial and have been observed for the required number of phase II events in the biomarker-positive subgroup, the decision is made as to whether to proceed to phase III evaluation and, if so, what phase III trial design to recommend. If an enrichment phase III design is recommended, accrual begins again but only in the biomarker-positive subgroup. If a biomarker-stratified or a phase III design without the biomarker is recommended, the trial continues accruing all patients. Note that in the biomarker setting, because of the data-driven phase III trial design selection, the phase II/III design strategy needs to adjust for inflation in the nominal phase III type I error.

We have assumed that a randomized phase II trial will be performed before embarking on a phase III trial. In some situations, early nonrandomized data in a targeted subgroup will be so dramatic as to lead directly to enrichment phase III trials restricted to the targeted subgroup. For example, a dramatic response rate for crizotinib in early studies in patients with ALK-positive non–small-cell lung cancer led to additional phase II and III trials restricted to ALK-positive patients.^{18,19} However, there is the possibility that without sufficient testing of biomarker-negative patients, further testing of the therapy may be unnecessarily restricted. For example, there was a 26% to 35% response rate seen for 23 patients with ALK-negative non–small-cell lung cancer,^{19} raising the question of whether the phase III trials should have been restricted to ALK-positive patients. In this case, results of a randomized phase II trial using the proposed design could have helped to inform the choice of the phase III trial design.

## Appendix

As noted in the text, there are various combinations of accrual and follow-up that will yield a specified number of events for a given median progression-free survival (PFS). For our simulations, we specified sample sizes that are 25% larger than the target number of events. For example, for a target of 56 events in the biomarker-positive subgroup, we would accrue 70 patients in this subgroup and have follow-up until the 56th event, at which time the analysis would be performed. We also assumed that the accrual is sufficiently fast to have the 56th event occur after 70 patients are enrolled.

One complication is that depending on the prevalence of biomarker positivity, one can accrue 70 patients in one biomarker-subgroup much sooner than in the other. To accommodate this, we allow some overaccrual (up to 140 patients in each subgroup) before stopping accrual in that subgroup. In particular, the suggested accrual scheme is as follows. Accrue until there are 70 biomarker-positive patients or 140 biomarker-negative patients, whichever comes first. If the 70 biomarker-positive patients are accrued first, then keep accrual open if there are fewer than 70 biomarker-negative patients accrued, until either 70 biomarker-negative patients are accrued or 140 biomarker-positive patients are accrued (at which time accrual stops). If the 140 biomarker-negative patients are accrued first, then stop accrual of biomarker-negative patients and keep accruing biomarker-positive patients, until there are 70 biomarker-positive patients accrued. In all situations, the analysis is performed when events are observed in 80% of the enrolled biomarker-positive patients. This rule ensures that there are always at least 70 patients enrolled and at least 56 events observed in the biomarker-positive subgroup, with the overall trial sample size between 140 and 210 patients.
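The accrual scheme can be followed patient by patient. The sketch below is a minimal reading of the rules in this paragraph, assuming patients arrive one at a time and are biomarker positive independently with a fixed prevalence. The function names are hypothetical, and rounding the 80% event target up to a whole number is an assumption.

```python
import random


def simulate_accrual(prevalence, rng, min_n=70, max_n=140):
    """Return (n_positive, n_negative) enrolled under the accrual scheme."""
    pos = neg = 0
    # Accrue until 70 positives or 140 negatives, whichever comes first.
    while pos < min_n and neg < max_n:
        if rng.random() < prevalence:
            pos += 1
        else:
            neg += 1
    if pos >= min_n:
        # Positives filled first: stay open to everyone until there are
        # 70 negatives or 140 positives.
        while neg < min_n and pos < max_n:
            if rng.random() < prevalence:
                pos += 1
            else:
                neg += 1
    else:
        # Negatives reached the cap first: only positives are enrolled
        # from here on, until there are 70 of them.
        pos = min_n
    return pos, neg


def events_at_analysis(n_positive):
    """Events in 80% of enrolled positives, rounded up (an assumption)."""
    return (4 * n_positive + 4) // 5


rng = random.Random(2012)  # arbitrary seed for reproducibility
for prevalence in (0.20, 0.33, 0.50, 0.67):
    runs = [simulate_accrual(prevalence, rng) for _ in range(2000)]
    mean_pos = sum(p for p, _ in runs) / len(runs)
    mean_neg = sum(n for _, n in runs) / len(runs)
    print(prevalence, round(mean_pos), round(mean_neg))
```

Under this reading, every simulated trial enrolls at least 70 biomarker-positive patients and between 140 and 210 patients overall, which matches the bounds stated above.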

A computer program (available at http://brb.nci.nih.gov/Data/FreidlinB/RP2BM) allows evaluation of the design under the user's choice of (1) alpha error levels for steps 1 and 2A; (2) cutoffs and CI level for the hazard ratio in step 2B; (3) biomarker prevalence; (4) median PFS in each arm in each biomarker subgroup; and (5) accrual parameters (eg, minimum sample size for the biomarker-positive subgroup; the program documentation describes the other accrual parameters). For defaults, the program uses the values given in Figure 1. The program is based on simulating multiple replications of hypothetical trials. In each replication, PFS outcome is generated using an exponential distribution, patient biomarker status is generated as a binary random variable with the underlying prevalence, and patient entry times are generated using a uniform distribution.
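To make the data-generating step concrete, the following sketch draws patient-level data for one replication in the way the paragraph describes. It is not the authors' program. The function names, the dictionary layout, the fixed number of patients, the 1:1 coin-flip randomization, and the example medians are all assumptions made for illustration, and the accrual scheme and the analysis step are left out.

```python
import random
from math import log


def exp_rate(median):
    """Hazard rate of the exponential distribution with the given median."""
    return log(2) / median


def simulate_patients(n, prevalence, medians, accrual_duration, rng):
    """Generate one replication of patient-level data.

    medians maps (biomarker_positive, arm) to the median PFS, with arm
    being "new" or "control". Times share the unit of accrual_duration.
    """
    patients = []
    for _ in range(n):
        positive = rng.random() < prevalence        # binary biomarker status
        arm = rng.choice(["new", "control"])        # assumed 1:1 randomization
        entry = rng.uniform(0.0, accrual_duration)  # uniform entry time
        pfs = rng.expovariate(exp_rate(medians[(positive, arm)]))
        patients.append({"positive": positive, "arm": arm,
                         "entry": entry, "event_time": entry + pfs})
    return patients


# Hypothetical setting: treatment doubles median PFS only in
# biomarker-positive patients (medians in months).
medians = {(True, "new"): 8.0, (True, "control"): 4.0,
           (False, "new"): 4.0, (False, "control"): 4.0}
cohort = simulate_patients(140, 0.5, medians, 12.0, random.Random(1))
print(len(cohort), cohort[0]["arm"])
```

With exponential PFS, the hazard ratio (control over new treatment) equals the ratio of the medians (new over control), so the medians above correspond to a hazard ratio of 2 in biomarker-positive patients and 1 in biomarker-negative patients.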

We have used PFS as the end point for our phase II proposal, but it is also possible to use response rates. The corresponding phase II design would target an improvement in response rates in the biomarker-positive subgroup (new therapy compared with standard therapy; eg, an absolute 30% improvement), and then would form a CI for the response rate difference in the biomarker-negative subgroup in step 2B of Figure 1 for the decision making. For example, if the whole CI is below 10%, an enrichment phase III trial is recommended; if the whole CI is above 20%, a randomized phase III trial without using the biomarker is recommended; and if neither of these conditions holds, a phase III biomarker-stratified design is recommended. A computer program is available to construct tables like Table 1 for a response rate end point for any sample size and CI cutoff in step 2B (http://brb.nci.nih.gov/Data/FreidlinB/RP2BM_R).

## Footnotes

Authors' disclosures of potential conflicts of interest and author contributions are found at the end of this article.

## AUTHORS' DISCLOSURES OF POTENTIAL CONFLICTS OF INTEREST

The author(s) indicated no potential conflicts of interest.

## AUTHOR CONTRIBUTIONS

**Manuscript writing:** All authors

**Final approval of manuscript:** All authors

## REFERENCES

**American Society of Clinical Oncology**

Randomized Phase II Trial Designs With Biomarkers. Journal of Clinical Oncology. Sep 10, 2012; 30(26):3304.