Allan I Pack, Ulysses J Magalang, Bhajan Singh, Samuel T Kuna, Brendan T Keenan, Greg Maislin, To RCT or not to RCT? Depends on the question. A response to McEvoy et al., Sleep, Volume 44, Issue 4, April 2021, zsab042, https://doi.org/10.1093/sleep/zsab042
We are grateful to McEvoy et al. [1] for their response to what they describe as our “provocative commentary” [2]. The commentary was written to stimulate discussion that will advance our field. In their response, the authors interpreted our commentary as a general criticism of randomized controlled trials (RCTs). It is not. We agree that a properly conducted RCT within the appropriate target population can provide important and generalizable evidence. However, we are concerned about a reluctance to consider alternate study designs when the research question posed is unlikely to be answered with an RCT design. We share a view expressed by others (including the current NIH Deputy Director for Extramural Research) that RCTs have significant limitations—they have become too expensive, take too long, and involve too much bureaucracy [3].
In our commentary, we focus on the specific case of recent RCTs assessing the benefits of continuous positive airway pressure (CPAP) on reducing the risk of cardiovascular events, and ask whether these studies fully answered the questions they sought to investigate. For the reasons described, we conclude that these RCTs should not be treated as definitive, and provide alternative methodologies to more appropriately answer the proposed questions. In our view, rather than pit different methodologies against each other in a battle for the title of “best,” consideration of the pros and cons of different strategies in specific situations is warranted.
Understanding the question asked and answered by an RCT is essential. McEvoy et al. [1] appropriately emphasize that the specific RCTs being discussed were secondary prevention studies. As such, results should not be extrapolated to answer the question of whether the treatment of obstructive sleep apnea (OSA) with CPAP will improve primary prevention. This lack of generalizability is further exacerbated by the heterogeneous nature of OSA—from both a symptomatic [4–8] and physiological [9] perspective—a major point we sought to emphasize. With such a heterogeneous disorder, excluding subsets of people with OSA, whether through arbitrary eligibility criteria or through a person’s inability or unwillingness to be randomized, reduces both the scope of the inference and the generalizability of results. It also runs the risk of excluding the very individuals who have the greatest increased cardiovascular risk from OSA.
We propose, as have others [10], that excluding the most excessively sleepy individuals is likely to be particularly problematic. McEvoy et al. [1] argue that the recent demonstration that the excessively sleepy subgroup in the Sleep Heart Health Study is at increased risk of cardiovascular events [8] may be a spurious association. This is, however, not the first study to indicate that excessive sleepiness could be a “symptomatic biomarker” for someone with OSA who is particularly at risk for cardiovascular consequences. Kapur et al. [11] showed that the association between OSA and hypertension is stronger in those with excessive sleepiness compared to those without. The association was not significant in the non-sleepy [11]. A population-based study showed that those with snoring and excessive daytime sleepiness had an increased risk for hypertension, while those with only one such complaint had no increased risk [12]. Gooneratne et al. [13] showed that elderly individuals with OSA who are excessively sleepy had increased mortality, but those with OSA without sleepiness did not. Xie et al. [14] demonstrated that the re-infarction rate in individuals who were found to have OSA following their initial myocardial infarction was greater in those with sleepiness compared to those without. There is also evidence that sleepiness per se in the absence of OSA is not a risk factor for cardiovascular events. Using data obtained from the Sleep Heart Health Study, we did not find increased cardiovascular events in those with an excessively sleepy phenotype who did not have OSA [8]. This is compatible with results from the other studies described above [11–13].
Thus, there are multiple pieces of evidence that individuals with OSA and sleepiness are likely to have increased cardiovascular risk. Importantly, contrary to the assertion by McEvoy et al. [1], we do not argue for a study only in this group. Instead, we argue that it is essential to include these individuals in any study attempting to answer the question of whether CPAP has benefits for secondary or primary prevention of cardiovascular events, and their exclusion from or unwillingness to participate in randomized trials should lead to more cautious interpretations of previous results.
McEvoy et al. [1] are correct that it is not clear, at least at this time, why some individuals with the same degree of sleep-disordered breathing have excessive sleepiness and others do not. This is the topic of ongoing research. The key question is why individuals can have marked sleep disruption as a result of OSA and yet not be sleepy. The same is true for response to sleep loss [15]. Some individuals are very sensitive to the effects of acute sleep deprivation, while others are resistant [15]. Response to sleep loss is a biological trait [15] and highly heritable [16], with some gene variants having been identified [17, 18]. Whether genetic differences explain why some OSA subjects are sleepy while others are not is unknown. It is conceivable that differences in the molecular response to sleep-disordered breathing explain not only the differential effect on sleepiness, but also on cardiovascular outcomes.
Beyond the concerns surrounding excessive sleepiness, we contend that inadequate adherence is an equally important issue that limited the ability of recent trials to answer their proposed questions. McEvoy et al. [1] recognize that CPAP adherence in recent RCTs is suboptimal and potentially driven by a less symptomatic study sample. Although the authors cite short-term clinical RCTs as evidence that this level of CPAP adherence is typical clinically, it is much lower than recently reported adherence for the first 90 days of use among over 2 million clinical patients prescribed CPAP worldwide [19]. In that study, mean (±SD) CPAP use was 5.1 ± 2.5 hours over all nights and 6.0 ± 2.0 hours on nights that CPAP was used. (We do not yet have similar data for longer-term use, but we hope these data will become available.) In contrast, in the recent ISAACC study [20], CPAP adherence was only 2.8 ± 2.7 hours/night. Thus, available data suggest that adherence to CPAP in these published studies is not representative of what occurs clinically. Ultimately, as with the exclusion of sleepy patients, a lack of adequate adherence due to sample characteristics limits the scope and generalizability of the RCTs. The question, “Does treatment with CPAP have cardiovascular benefits?” is not being answered. Instead, the RCTs are providing evidence as to whether suboptimal adherence to CPAP within less symptomatic adults with OSA prevents secondary cardiovascular events. We can all agree that enhancing CPAP adherence is an important component of future studies, as recently recommended [21].
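To make the distinction between the two reported summaries concrete, the following minimal sketch (Python, using simulated nightly usage data rather than the actual device data from the cited studies) computes mean use averaged over all nights and mean use restricted to nights on which the device was used.

```python
# A minimal sketch with simulated nightly usage data (not the actual data from
# the cited studies) illustrating the two adherence summaries quoted above.
import numpy as np

rng = np.random.default_rng(1)
hours = rng.normal(6.0, 1.5, size=90).clip(0, 12)  # 90 nights of CPAP use, in hours
hours[rng.random(90) < 0.15] = 0.0                 # roughly 15% of nights with no use

mean_all_nights = hours.mean()                     # zero-use nights pull this average down
mean_nights_used = hours[hours > 0].mean()         # conditional on any use that night
print(f"Over all nights: {mean_all_nights:.1f} h; on nights used: {mean_nights_used:.1f} h")
```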
Fundamental to our argument is that we need long-term studies that include excessively sleepy individuals and have adequate CPAP adherence. The question is how best to accomplish these goals. We certainly agree with McEvoy et al. [1] that more sophisticated adaptive designs have the potential to increase the efficiency of randomized trials. However, adding adaptive elements is unlikely to overcome the specific ethical (e.g. patients with severe sleepiness cannot be randomized to no care) and feasibility (e.g. the unwillingness of symptomatic patients and/or their providers to be randomized to no treatment for a prolonged time) limitations of RCTs that have resulted in study samples that do not reflect real-world populations. As evidenced by the APPLES study [22] and noted in our commentary [2], more symptomatic patients remain less willing to participate in trials when they may be randomized to no therapy for long periods of time, regardless of whether or not they are explicitly excluded. While modifying eligibility criteria to enrich for specific subgroups with more promising results may reduce required sample sizes, it also affects the external validity of the study conclusions. Similarly, re-randomizing participants with poor CPAP adherence to alternative interventions may improve the efficacy of their OSA treatment regimen, but does not change the study population. While McEvoy et al. [1] state that adults with severe sleepiness or hypoxia represented <5% of individuals who otherwise screened positive in SAVE, information on exactly what proportion of patients seen in sleep clinics actually meet all inclusion criteria for recent RCTs is not presented. This is a crucial piece of information for understanding the generalizability of the results.
In our commentary [2], we proposed that a causal inference strategy based on propensity score (PS) designs within an observational setting can address the challenge of including excessively sleepy patients in long-term studies. While they did not provide specifics, McEvoy et al. [1] criticized this approach as “fraught with difficulty.” However, these same methods were used within their own RCTs to ask whether adequate adherence to CPAP benefits cardiovascular endpoints. Essentially, we propose adopting this same approach as the primary study design, rather than as a secondary analysis of RCTs with negative results. Currently, there is often an instinctively negative view of such approaches and a belief that only randomized trials can produce the evidence we need. However, there is evidence to refute this [23, 24].
Historically, this negative view of observational studies was driven in part by the experience with hormone replacement therapy (HRT) in women. More than one observational study (for reviews, see refs. [25, 26]) found a benefit of HRT for reduction in cardiovascular events, while subsequent randomized trials gave the opposite result [25, 26]. The results of the RCTs were published [27] in a “blaze of publicity” [26], which had a broad impact on the perceived reliability of observational studies. Initially [26], it was assumed that the discrepancy was the result of healthy user bias [28–31]. However, this proved not to be the case [26]. Instead, the discrepancy arose because observational studies included younger women who chose to start HRT around the time of menopause, while women in the RCTs were, on average, more than 15 years older when they started HRT. After accounting for this difference, consistent evidence that HRT is beneficial for all-cause and cardiovascular disease-related mortality when administered to healthy, younger postmenopausal women emerged from observational studies, clinical trials, and meta-analyses (see Table 1 in [25]). Thus, the apparent discordance of study findings was not due to the failure of observational studies or the supremacy of randomized designs. When both approaches investigated similar questions by studying the same target population, they led to the same conclusion [25]. This experience is consistent with a broader literature showing that observational designs and RCTs lead to the same overall conclusions in multiple areas [23, 24].
The fundamental difference between RCTs and PS-designed observational studies is that randomization results in expected balance for all observed and unobserved covariates, while PS designs ensure balance for all measured covariates, and for unobserved covariates only to the extent that they are correlated with covariates included in the PS design. The expectation of balance in RCTs is reasonable, particularly if the sample size is large. However, even with large sample sizes, there can be residual imbalance that is likely to affect outcomes [32]. Often, the balance in observed covariates achieved in PS designs is at least as good as that found when comparing the two arms of an RCT [33]. The challenge for propensity score designs is identifying all of the relevant covariates in advance. As indicated in our commentary [2], this requires specification of a very rich set of covariates. If this is done, there is less likelihood of unrecognized confounding due to an unmeasured covariate that is not correlated, at least to some extent, with the measured covariates. Importantly, new techniques also allow one to assess how large an effect an unrecognized confounder would need to have to negate the observed result [34, 35]. Put simply, if only very large effects can overturn the conclusion, the findings are robust to unmeasured confounding. The authors of this approach contend, and we agree, that this type of analysis should be presented in all observational designs. Unfortunately, while there are evolving guidelines for RCTs—the CONSORT criteria [36–39]—there are no equivalent guidelines for publication of studies using PS designs. This is an important area of opportunity.
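As an illustration of the balance argument, the sketch below (Python, with simulated data and hypothetical covariates; it is not the analysis used in any of the cited studies) estimates a propensity score by logistic regression, forms inverse-probability-of-treatment weights, and compares standardized mean differences for each measured covariate before and after weighting.

```python
# Minimal sketch of a propensity score weighting workflow on simulated data:
# estimate the PS, form inverse-probability-of-treatment weights, and check
# covariate balance via standardized mean differences (SMD) before vs. after weighting.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 4))                                # measured covariates (e.g. age, BMI, ...)
p_treat = 1 / (1 + np.exp(-(0.5 * X[:, 0] - 0.3 * X[:, 1])))
treated = rng.binomial(1, p_treat)                         # confounded "adequate CPAP use" indicator

ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]
w = np.where(treated == 1, 1 / ps, 1 / (1 - ps))           # IPTW weights

def smd(x, t, weights=None):
    """Standardized mean difference; values below ~0.1 are commonly taken as balanced."""
    weights = np.ones_like(x) if weights is None else weights
    m1 = np.average(x[t == 1], weights=weights[t == 1])
    m0 = np.average(x[t == 0], weights=weights[t == 0])
    pooled_sd = np.sqrt((x[t == 1].var() + x[t == 0].var()) / 2)
    return (m1 - m0) / pooled_sd

for j in range(X.shape[1]):
    print(f"covariate {j}: SMD raw={smd(X[:, j], treated):+.3f}, "
          f"weighted={smd(X[:, j], treated, w):+.3f}")
```

In this simulation the raw SMDs for the covariates that drive treatment are clearly nonzero, while the weighted SMDs shrink toward zero; in a real PS design the same diagnostic would be reported for the full, rich covariate set.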
Leveraging propensity score designs will allow researchers to more efficiently conduct efficacy studies that are both large-scale and well-powered for all endpoints. While McEvoy et al. [1] are correct that fewer individuals are required for studying secondary prevention when assuming high annual composite event rates, we note that our statement concerning required sample sizes was an example taken from Javaheri et al. [10] for studying less frequent events. Many assumptions and variations can change the numbers of events and subjects required, and not all assumptions can be definitively discerned from the provided description. However, the estimate that approximately 10,000 participants per arm are needed can be reproduced by assuming 3 years of enrollment and follow-up. When follow-up time is extended (or the assumed event rate increased), we agree that a smaller number of subjects is needed. For example, assuming enrollment over 3 years and a maximum follow-up of 6 years, a 1.5% annual event rate in untreated subjects, and a 25% reduction in compliant PAP subjects, 3272 subjects per arm (total N = 6544) are required for 80% power at a two-sided α = 0.05 [40]. With additional adjustment for loss to follow-up, planned propensity score trimming, or maintaining power for secondary endpoints, the required total sample size quickly increases. Regardless of the details, properly examining a primary composite outcome and secondary composite endpoints for cerebrovascular and coronary events will require a study larger than anything yet accomplished. Performing this study under a randomized design is likely to take many years to complete and carry a huge cost. Propensity score designs allow leveraging of clinical databases to answer the question with much less investment in time and treasure.
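For readers who wish to reproduce numbers of this magnitude, the sketch below (Python, standard library only) applies a Schoenfeld-type events calculation for a log-rank comparison, assuming exponential event times and uniform accrual under the assumptions stated above. This is an illustrative approximation rather than the specific method of reference [40], but it yields roughly 3,300 subjects per arm, in the same range as the figure quoted.

```python
# Illustrative sample-size approximation (Schoenfeld-type events formula with
# exponential event times and uniform accrual); assumptions mirror the text above,
# but this is not necessarily the exact method used in reference [40].
from math import exp, log, ceil
from statistics import NormalDist

def prob_event(hazard, accrual_years, total_years):
    """Probability a subject's event is observed, given uniform accrual and
    administrative censoring at study end (exponential event times)."""
    a, T = accrual_years, total_years
    return 1 - (exp(-hazard * (T - a)) - exp(-hazard * T)) / (hazard * a)

alpha, power = 0.05, 0.80
annual_rate_control = 0.015      # 1.5% annual event rate in untreated subjects
hazard_ratio = 0.75              # 25% reduction with adequate PAP adherence
accrual, total = 3.0, 6.0        # 3-year enrollment, 6-year maximum follow-up

z = NormalDist().inv_cdf
events_needed = (z(1 - alpha / 2) + z(power)) ** 2 / (0.25 * log(hazard_ratio) ** 2)

p_control = prob_event(annual_rate_control, accrual, total)
p_treated = prob_event(annual_rate_control * hazard_ratio, accrual, total)
n_total = ceil(events_needed / ((p_control + p_treated) / 2))

print(f"events needed ~ {ceil(events_needed)}")                 # ~380 events
print(f"total N ~ {n_total} (~{ceil(n_total / 2)} per arm)")    # ~3,300 per arm
```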
Nonrandomized designs should not be dismissed outright. As stated by Black many years ago [41], this should not be a general debate about whether RCTs or observational designs are better. Rather, the debate should focus on which study design is optimal for the question being posed. RCTs remain a powerful tool in the arsenal. However, occasions arise when the RCT design is not appropriate. We propose that studying OSA while enrolling excessively sleepy individuals and evaluating individuals with adequate CPAP adherence is one such area. The question is how to proceed in answering important scientific and clinical questions if RCTs are not feasible. Other fields have taken the view that carefully designed observational studies using propensity scores are a very rational approach in this circumstance. As a prime example, RCTs of medical devices against controls have become difficult to implement due to the wide availability of effective devices. Thus, the FDA Center for Devices and Radiological Health (CDRH) has actively sought development of new statistical methodology and guidelines [42], and has outlined when observational studies may lead to valid inferences that can provide evidence for approval of medical devices [43].
While we appreciate the thoughtful response of McEvoy et al. [1], the field of sleep medicine continues to be challenged by important unanswered questions. How can we ensure individuals have adequate CPAP adherence? Should excessively sleepy subjects continue to be excluded from studies of the effect of CPAP on cardiovascular events, despite the importance of this clinical symptom? How can we include these patients in our studies given ethical and feasibility issues? Both we [2] and Javaheri et al. [10] have made different proposals on how to accomplish this, and we welcome other ideas. Rather than throwing RCTs out with the bathwater, we argue that there is room in the tub for both RCTs and observational designs [44].
Funding
The study was supported by a grant from the National Heart, Lung, and Blood Institute (NHLBI) (grant number P01 HL094307).
Conflict of interest statement. A.I.P. is the John Miclot Professor of Medicine. Funds for this endowment are provided by the Philips Respironics Foundation. S.T.K. has received grant support from Philips Respironics. U.J.M. has received grant support from Hill-Rom and Philips Respironics.
References