1 Introduction

I would like to congratulate Professor Stallard on an important contribution to the literature on adaptive enrichment trials (Stallard, 2022). This discussion focuses on how the paper addresses barriers that have prevented wider use of adaptive enrichment designs in practice. I finish by discussing remaining barriers that could be overcome with future methodological work.

2 Enrichment and Adaptive Enrichment

Enrichment of a randomized controlled trial (RCT) refers to the concept of restricting recruitment to patients who may benefit from the treatment in the intervention arm. All clinical trials with inclusion and exclusion criteria are enriched to some extent. When “enriched” trials are discussed, it is usually in a more specific context: a trial that restricts enrolment based on a biomarker thought to be associated with the likely effect of the intervention. An enriched trial has advantages when this assumption is correct: (1) it avoids recruiting patients who would experience the negative side-effects of the intervention without benefiting from it; (2) it allows a more powerful trial, as the treatment effect estimate is not diluted by patients who receive no benefit; (3) if the treatment is expensive compared to measuring the biomarker, it results in a less expensive trial.

On the other hand, when the assumption behind the enrichment is wrong, there are important drawbacks. Some patients who would have benefited from the intervention will miss out. Perhaps more cold-heartedly, for new pharmaceutical interventions the company will have unnecessarily reduced its potential market share.

Adaptive enrichment designs are a conceptually appealing way of opening the trial to the benefits of enrichment while minimizing the risks. The trial begins by recruiting a broader range of patients. At an interim analysis, patient outcome and biomarker data are used to decide whether to continue broader recruitment, or to narrow recruitment to a biomarker-defined subgroup that shows more promise of benefit.

Despite the appeal, there is little evidence of the adaptive enrichment design being widely used. A PubMed search for (“clinical trial”[Publication Type]) AND “adaptive enrichment”[Title/Abstract] returns, at the time of writing, five results. A search of www.clinicaltrials.gov for “adaptive enrichment” yields three results. This is backed up by relatively recent reviews that took a more systematic approach, such as Mistry et al. (2017) and Bothwell et al. (2018).

3 Contribution of the Paper

As pointed out in the paper, most methods for adaptive enrichment trials assume there is a prespecified biomarker that divides patients into two categories: “biomarker-negative” and “biomarker-positive.” Further, the treatment effect is assumed to be equal to or greater in the biomarker-positive group than in the biomarker-negative group. This simplifies the set of possible decisions at the interim analysis to either continuing the broader recruitment or narrowing it to biomarker-positive patients.

Fewer papers have considered adaptive enrichment with a continuous biomarker. Simon and Simon (2013) provided a major contribution to the area. Their method uses continuous biomarker measurements in an adaptive enrichment trial: at interim analyses, patient outcome data are used to select and update a cutpoint of the biomarker that is used as an enrolment criterion, and subsequent patients are enrolled only if their biomarker measurements meet the new cutpoint.

The method controls the Type I error rate of a null hypothesis of the form

$$H_0: \mu_T(x) \le \mu_C(x) \quad \text{for all } x, \qquad (1)$$

where $x$ represents the level of the biomarker and $\mu_j(x)$ the location parameter of the outcome distribution when a patient with biomarker level $x$ is treated with arm $j$. The intervention and control arms are labeled $T$ and $C$, respectively. Type I error rate control is achieved through defining a test statistic that uses all patients in the trial.
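
To make the enrolment mechanics concrete, here is a minimal Python sketch of a two-stage trial in which the cutpoint is re-estimated at an interim analysis and later patients are enrolled only if their biomarker exceeds it. The outcome model, effect curve, and cutpoint-selection rule are illustrative assumptions of my own and are not taken from Simon and Simon (2013); in particular, the ad hoc selection rule shown here does not by itself confer the Type I error rate control described above.

```python
import numpy as np

rng = np.random.default_rng(1)

def treatment_effect(x):
    # Illustrative assumption: the effect grows linearly with biomarker level x.
    return 0.5 * x

def simulate_patient(x, arm):
    # Normal outcome; the intervention arm (T) adds the biomarker-dependent effect.
    return rng.normal(loc=treatment_effect(x) if arm == "T" else 0.0, scale=1.0)

def run_two_stage_trial(n1=100, n2=100, candidate_cutpoints=np.linspace(0, 1.5, 7)):
    # Stage 1: broad recruitment, biomarker X ~ N(0, 1), 1:1 randomization.
    x1 = rng.normal(size=n1)
    arm1 = rng.choice(["T", "C"], size=n1)
    y1 = np.array([simulate_patient(x, a) for x, a in zip(x1, arm1)])

    # Interim: pick the cutpoint maximizing an ad hoc standardized effect estimate.
    def z_stat(x, arm, y, lam):
        keep = x >= lam
        t, c = y[keep & (arm == "T")], y[keep & (arm == "C")]
        if min(len(t), len(c)) < 5:
            return -np.inf
        se = np.sqrt(t.var(ddof=1) / len(t) + c.var(ddof=1) / len(c))
        return (t.mean() - c.mean()) / se

    lam_hat = max(candidate_cutpoints, key=lambda lam: z_stat(x1, arm1, y1, lam))

    # Stage 2: enrol only patients whose biomarker meets the selected cutpoint
    # (oversample candidates, then keep the first n2 who qualify).
    x2 = rng.normal(size=5 * n2)
    x2 = x2[x2 >= lam_hat][:n2]
    arm2 = rng.choice(["T", "C"], size=len(x2))
    y2 = np.array([simulate_patient(x, a) for x, a in zip(x2, arm2)])
    return lam_hat, (x1, arm1, y1), (x2, arm2, y2)

lam_hat, stage1, stage2 = run_two_stage_trial()
print(f"Selected cutpoint: {lam_hat:.2f}; stage 2 patients enrolled: {len(stage2[0])}")
```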

The setup of the method proposed by Stallard (2022) is similar but has important differences. It is assumed that there is a separate null hypothesis for each possible cutpoint in the biomarker, with $\Lambda$ representing the set of possible cutpoints. For a particular cutpoint, $\lambda \in \Lambda$, the corresponding null hypothesis, $H_{0\lambda}$, is

$$H_{0\lambda}: \theta_\lambda \le 0, \qquad (2)$$

where $\theta_\lambda$ represents the treatment effect of intervention versus control for patients with biomarker level above $\lambda$. If one wanted to link the hypotheses in (1) and (2), $\theta_\lambda$ could be expressed as

$$\theta_\lambda = \frac{\int_\lambda^\infty \{\mu_T(x) - \mu_C(x)\} f(x)\,\mathrm{d}x}{\int_\lambda^\infty f(x)\,\mathrm{d}x},$$

where $f(x)$ represents the probability density function of the biomarker in the population of patients who are otherwise eligible for the trial.
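
As a small numerical illustration of this link (using hypothetical choices of $\mu_T$, $\mu_C$, and $f$ that are not from either paper), $\theta_\lambda$ can be evaluated by numerical integration:

```python
import numpy as np
from scipy import integrate, stats

# Illustrative assumptions: a standard normal biomarker and a treatment
# effect that increases linearly with the biomarker level.
f = stats.norm(0, 1).pdf              # biomarker density f(x)
mu_T = lambda x: 0.5 * x              # location under intervention
mu_C = lambda x: 0.0 * x              # location under control

def theta(lam, upper=10):
    # theta_lambda = E[mu_T(X) - mu_C(X) | X > lambda]
    num, _ = integrate.quad(lambda x: (mu_T(x) - mu_C(x)) * f(x), lam, upper)
    den, _ = integrate.quad(f, lam, upper)
    return num / den

for lam in (0.0, 0.5, 1.0):
    print(f"theta({lam}) = {theta(lam):.3f}")
```

With a monotone effect curve like this illustrative one, $\theta_\lambda$ is nondecreasing in $\lambda$, which connects to the monotonicity assumption discussed in Section 4.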

The other main difference is that Stallard's final test statistic only uses data from stage 1 and stage 2 patients whose biomarker level exceeds that cutpoint. This has several consequences.

First, it complicates the derivation of critical values that control the Type I error rate, which motivates the main contribution of the paper. An impressive set of derivations is given that provides critical values controlling the Type I error rate despite the selection at stage 1. This is done for six possible selection rules for the cutpoint at stage 1 and utilizes the combination-testing framework of Bauer and Köhne (1994) to control the family-wise error rate across the family of null hypotheses $\{H_{0\lambda} : \lambda \in \Lambda\}$.
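
For readers less familiar with combination testing, the sketch below (my own illustration, not code from the paper) implements the basic Fisher product criterion of Bauer and Köhne (1994), ignoring early-stopping boundaries for simplicity: stage-wise p-values $p_1$ and $p_2$ are combined and the null hypothesis is rejected when $p_1 p_2 \le c_\alpha = \exp\{-\tfrac{1}{2}\chi^2_{4,1-\alpha}\}$. How the stage-wise p-values are constructed so that they remain valid despite the cutpoint selection is the technical core of Stallard (2022) and is not reproduced here.

```python
import math
from scipy import stats

def bauer_kohne_reject(p1, p2, alpha=0.025):
    """Basic Fisher product combination criterion of Bauer and Köhne (1994).

    Under the null, -2*log(p1*p2) follows a chi-squared distribution with
    4 degrees of freedom, so the null is rejected when p1*p2 <= c_alpha.
    """
    c_alpha = math.exp(-0.5 * stats.chi2.ppf(1 - alpha, df=4))
    return p1 * p2 <= c_alpha

# For alpha = 0.025, c_alpha is roughly 0.0038.
print(bauer_kohne_reject(0.04, 0.06))  # p1*p2 = 0.0024 <= c_alpha -> True
```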

Second, Stallard's method offers potential for improved power compared to the Simon and Simon approach, because it does not include stage 1 participants with biomarker levels below the selected cutpoint. The gain in power can be considerable. For example, in Table 3, selection rule 1 increases the power of a trial from 76.4% to 84.2%, a gain that would otherwise require a greater than 20% increase in sample size.
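
As a rough check on the sample-size interpretation (my own back-of-the-envelope normal-approximation calculation, not a figure from the paper), the required sample size for a two-arm comparison is proportional to $(z_{1-\alpha} + z_{1-\beta})^2$, so raising power from 76.4% to 84.2% at a one-sided 2.5% level corresponds to an inflation factor of roughly 1.22:

```python
from scipy.stats import norm

# Normal-approximation sample-size ratio for raising power from 76.4% to 84.2%
# at a one-sided 2.5% significance level (illustrative check, not from the paper).
z_alpha = norm.ppf(0.975)
ratio = ((z_alpha + norm.ppf(0.842)) / (z_alpha + norm.ppf(0.764))) ** 2
print(f"Required sample-size inflation: {ratio:.2f}")  # roughly 1.22
```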

Third, the null hypotheses being tested are (at least, in my opinion) more interpretable, which is an advantage for this rather complex trial design. Rejecting $H_{0\lambda}$ provides evidence that there is a treatment effect for patients with biomarker level above $\lambda$, whereas rejecting $H_0$ only provides evidence of the existence of some group for which there is a treatment effect. This difficulty in interpreting rejection of $H_0$ is discussed in Simon and Simon (2013); I feel the approach proposed in Stallard (2022) is one way to overcome it.

4 Monotonicity of Biomarkers

A potential drawback of the proposed method is what I will refer to as a “monotonicity” assumption. In the paper, this is phrased as “We assume that higher values of $x$ are associated with larger treatment effects,” although more precisely this is a nondecreasing assumption (as the null scenario permits no association between biomarker and treatment effect).

The sensitivity of the method to this assumption depends on two factors: (1) whether it is often violated in practice; (2) what the effect of violating it is on the properties of the method.

Related to the first factor, I suspect that there are settings where continuous biomarkers might not follow the monotonicity assumption. One might envision an intervention having a bigger relative effect as a continuous measure of disease severity increases (as there is a bigger improvement to be had) but the effect then declining at some point for very severe cases. This is an example of a so-called “U-shaped” association, which Calabrese and Baldwin (2001) show is not uncommon in biology.
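
To illustrate how such a non-monotone relationship undermines the assumption (a purely hypothetical simulation of my own, not an analysis from either paper), the sketch below defines an effect curve that rises and then falls with severity and shows that the average treatment effect above a cutpoint can decrease as the cutpoint increases:

```python
import numpy as np
from scipy import integrate, stats

# Hypothetical inverted-U effect curve: rises with severity x, declines for the
# most severe cases; hypothetical severity distribution X ~ N(1, 1).
effect = lambda x: np.maximum(0.0, 0.8 * x - 0.3 * x**2)
f = stats.norm(1.0, 1.0).pdf

def mean_effect_above(lam, upper=8.0):
    # Average treatment effect among patients with severity above the cutpoint lam.
    num, _ = integrate.quad(lambda x: effect(x) * f(x), lam, upper)
    den, _ = integrate.quad(f, lam, upper)
    return num / den

for lam in (0.5, 1.5, 2.5):
    print(f"mean effect for x > {lam}: {mean_effect_above(lam):.3f}")
```

Under such a curve, the mean effect above the cutpoint falls as the cutpoint rises, so a selection rule built on the monotonicity assumption could steer the trial toward a poorly performing subgroup.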

The second factor is not clear. I do not actually see where the monotonicity assumption is used in the methodology of the paper. I suspect that if deviations from monotonicity do cause Type I error rate inflation, it might be related to the statement “a p-value that controls the type I error at the nominal level obtained from the distribution function of the test statistic under $H_{0\lambda}$, which can be calculated under the point null $\theta_\lambda = 0$.” Perhaps deviations from monotonicity create cases where this point null does not correspond to the largest possible Type I error rate.

Simon and Simon's method does not require the monotonicity assumption for Type I error rate control, so it may be preferable in some settings if Stallard's method proves sensitive to the assumption.

5 Remaining Barriers to the Use of Adaptive Enrichment

The proposed method contributes to improving the power of adaptive enrichment and ensuring the conclusion of the trial is clear. I believe this will help encourage further use of adaptive enrichment designs. Nevertheless, there are remaining barriers.

The main one is that many clinical areas lack compelling predictive biomarkers that would allow an adaptive enrichment design to add value. Generally, to specify a reliable predictive biomarker there must already be good information from RCTs. With increasing amounts of biological data available on patients, it would be beneficial to consider how a predictive biomarker signature could be developed during a trial and then used to enrich it; this would avoid the need to prespecify a predictive biomarker.

Another barrier, common to adaptive designs in general, is that an outcome suitable for basing the adaptation on must be observed relatively quickly compared with the length of the recruitment period (Wason et al., 2019). The case study in this paper is a nice theoretical example, but an adaptive enrichment design would likely not have been suitable in reality: with mortality as the outcome, the event times take too long to observe, and there are few events before a year of follow-up. Even with a more quickly observed outcome, the possibility of recruiting patients in stage 1 whose data will not be used at all (because their outcome was not observed before the interim analysis and their biomarker level is below the selected threshold) raises some ethical issues.

6 Conclusion

Adaptive enrichment designs are conceptually very appealing and deserve to be used more often. The proposed method provides advantages that may encourage further use, although some additional investigation of assumptions and extensions would be useful.

Data Availability Statement

Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

References

Bauer, P. and Köhne, K. (1994) Evaluation of experiments with adaptive interim analyses. Biometrics, 50(4), 1029–1041.

Bothwell, L.E., Avorn, J., Khan, N.F. and Kesselheim, A.S. (2018) Adaptive design clinical trials: a review of the literature and ClinicalTrials.gov. BMJ Open, 8(2), 187–200.

Calabrese, E.J. and Baldwin, L.A. (2001) U-shaped dose-responses in biology, toxicology, and public health. Annual Review of Public Health, 22(1), 15–33.

Mistry, P., Dunn, J.A. and Marshall, A. (2017) A literature review of applied adaptive design methodology within the field of oncology in randomised controlled trials and a proposed extension to the CONSORT guidelines. BMC Medical Research Methodology, 17(1), 108.

Simon, N. and Simon, R. (2013) Adaptive enrichment designs for clinical trials. Biostatistics, 14(4), 613–625.

Stallard, N. (2022) Adaptive enrichment designs with a continuous biomarker. Biometrics, accepted. https://doi.org/10.1111/biom.13644

Wason, J.M.S., Brocklehurst, P. and Yap, C. (2019) When to keep it simple—adaptive designs are not always useful. BMC Medicine, 17, 152.
