We are grateful to McEvoy et al. [1] for their response to what they describe as our “provocative commentary.” [2] The commentary was written to stimulate discussion that will advance our field. In their response, the authors interpreted our commentary as a general criticism of randomized controlled trials (RCTs). It is not. We agree that a properly conducted RCT within the appropriate target population can provide important and generalizable evidence. However, we are concerned about a reluctance to consider alternate study designs when the research question posed is unlikely to be answered with an RCT design. We share a view expressed by others (including by the current Deputy Director for Extramural Research of NIH) that RCTs have significant limitations—they have become too expensive, take too long, and there is too much bureaucracy [3].

In our commentary, we focus on the specific case of recent RCTs assessing the benefits of continuous positive airway pressure (CPAP) on reducing the risk of cardiovascular events, and ask whether these studies fully answered the questions they sought to investigate. For the reasons described, we conclude that these RCTs should not be treated as definitive, and provide alternative methodologies to more appropriately answer the proposed questions. In our view, rather than pit different methodologies against each other in a battle for the title of “best,” consideration of the pros and cons of different strategies in specific situations is warranted.

Understanding the question asked and answered by an RCT is essential. McEvoy et al. [1] appropriately emphasize that the specific RCTs being discussed were secondary prevention studies. As such, results should not be extrapolated to answer the question of whether the treatment of obstructive sleep apnea (OSA) with CPAP will improve primary prevention. This lack of generalizability is further exacerbated by the heterogeneous nature of OSA—from both a symptomatic [4–8] and physiological [9] perspective—a major point we sought to emphasize. With such a heterogeneous disorder, the exclusion of proportions of people with OSA based on arbitrary exclusion criteria or a person’s inability or unwillingness to be randomized reduces both the scope of the inference and the generalizability of results. It also runs the risk of excluding the very individuals who have the greatest increased cardiovascular risk from OSA.

We propose, as have others [10], that excluding the most excessively sleepy individuals is likely to be particularly problematic. McEvoy et al. [1] argue that the recent demonstration that the excessively sleepy subgroup in the Sleep Heart Health Study is at increased risk of cardiovascular events [8] may be a spurious association. This is, however, not the first study to indicate that excessive sleepiness could be a “symptomatic biomarker” for someone with OSA who is particularly at risk for cardiovascular consequences. Kapur et al. [11] showed that the association between OSA and hypertension is stronger in those with excessive sleepiness compared to those without. The association was not significant in the non-sleepy [11]. A population-based study showed that those with snoring and excessive daytime sleepiness had an increased risk for hypertension, while those with only one such complaint had no increased risk [12]. Gooneratne et al. [13] showed that elderly individuals with OSA who are excessively sleepy had increased mortality, but those with OSA without sleepiness did not. Xie et al. [14] demonstrated that the re-infarction rate in individuals who were found to have OSA following their initial myocardial infarction was greater in those with sleepiness compared to those without. There is also evidence that sleepiness per se in the absence of OSA is not a risk factor for cardiovascular events. Using data obtained from the Sleep Heart Health Study, we did not find increased cardiovascular events in those with an excessively sleepy phenotype who did not have OSA [8]. This is compatible with results from the other studies described above [11–13].

Thus, there are multiple pieces of evidence that individuals with OSA and sleepiness are likely to have increased cardiovascular risk. Importantly, contrary to the assertion by McEvoy et al. [1], we do not argue for a study only in this group. Instead, we argue that it is essential to include these individuals in any study attempting to answer the question of whether CPAP has benefits for secondary or primary prevention of cardiovascular events, and their exclusion from or unwillingness to participate in randomized trials should lead to more cautious interpretations of previous results.

McEvoy et al. [1] are correct that it is not clear, at least at this time, why some individuals with the same degree of sleep-disordered breathing have excessive sleepiness and others do not. This is the topic of ongoing research. The key question is why individuals can have marked sleep disruption as a result of OSA and yet not be sleepy. The same is true for response to sleep loss [15]. Some individuals are very sensitive to the effects of acute sleep deprivation, while others are resistant [15]. Response to sleep loss is a biological trait [15] and highly heritable [16], with some gene variants having been identified [17, 18]. Whether genetic differences explain why some OSA subjects are sleepy while others are not is unknown. It is conceivable that differences in the molecular response to sleep-disordered breathing explain not only the differential effect on sleepiness, but also on cardiovascular outcomes.

Beyond the concerns surrounding excessive sleepiness, we contend that inadequate adherence is an equally important issue that limited the ability of recent trials to answer their proposed questions. McEvoy et al. [1] recognize that CPAP adherence in recent RCTs is suboptimal and potentially driven by a less symptomatic study sample. Although the authors cite short-term clinical RCTs as evidence that this level of CPAP adherence is typical clinically, it is much lower than the recently reported data for the first 90 days of use from over 2 million clinical patients prescribed CPAP worldwide [19]. In this recent study, mean (±SD) CPAP use was 5.1 ± 2.5 hours over all nights and 6.0 ± 2.0 hours on nights that CPAP was used. (Currently, we do not have similar data for longer-term use, but we hope these data will become available.) In contrast, for the recent ISAACC study [20], in particular, CPAP adherence was only 2.8 ± 2.7 hours/night. Thus, available data suggest that adherence to CPAP in these published studies is not representative of what occurs clinically. Ultimately, as with the exclusion of sleepy patients, a lack of adequate adherence due to sample characteristics limits the scope and generalizability of the RCTs. The question, “Does treatment with CPAP have cardiovascular benefits?” is not being answered. Instead, the RCTs are providing evidence as to whether suboptimal adherence to CPAP within less symptomatic adults with OSA prevents secondary cardiovascular events. We can all agree that enhancing CPAP adherence is an important component of future studies, as recently recommended [21].

Fundamental to our argument is that we need long-term studies that include excessively sleepy individuals and have adequate CPAP adherence. The question is how best to accomplish these goals. We certainly agree with McEvoy et al. [1] that more sophisticated adaptive designs have the potential to increase the efficiency of randomized trials. However, adding adaptive elements is unlikely to overcome the specific ethical (e.g. patients with severe sleepiness cannot be randomized to no care) and feasibility (e.g. the unwillingness of symptomatic patients and/or their providers to be randomized to no treatment for a prolonged time) limitations of RCTs that have resulted in study samples that do not reflect real-world populations. As evidenced by the APPLES study [22] and noted in our commentary [2], more symptomatic patients remain less willing to participate in trials when they may be randomized to no therapy for long periods of time, regardless of whether or not they are explicitly excluded. While modifying eligibility criteria to enrich for specific subgroups with more promising results may reduce required sample sizes, it also affects the external validity of the study conclusions. Similarly, re-randomizing participants with poor CPAP adherence to alternative interventions may improve the efficacy of their OSA treatment regimen, but does not change the study population. While McEvoy et al. [1] state that adults with severe sleepiness or hypoxia represented <5% of individuals who otherwise screened positive in SAVE, information on exactly what proportion of patients seen in sleep clinics actually meet all inclusion criteria for recent RCTs is not presented. This is a crucial piece of information for understanding the generalizability of the results.

In our commentary [2], we proposed that a causal inference strategy based on propensity score (PS) designs within an observational setting can address the challenge of including excessively sleepy patients in long-term studies. While they did not provide specifics, McEvoy et al. [1] criticized this approach as “fraught with difficulty.” However, these same methods were utilized in their own RCTs to answer the question of the benefit of adequate adherence to CPAP on cardiovascular endpoints. Essentially, we propose adopting this same approach as the primary study design, rather than a secondary analysis of RCTs with negative results. Currently, it seems that there is often an instinctive and negative view of such approaches, and a belief that only randomized trials can produce the evidence we need. However, there is evidence to refute this [23, 24].

Historically, this negative view of observational studies was driven in part by the experience with hormone replacement therapy (HRT) in women. More than one observational study (for reviews, see refs. [25, 26]) found a benefit of HRT for reduction in cardiovascular events, while subsequent randomized trials gave the opposite result [25, 26]. The results of the RCTs were published [27] in a “blaze of publicity,” [26] which had a broad impact on the perceived reliability of observational studies. Initially [26], it was assumed that the discrepancy was the result of healthy user bias [28–31]. However, this proved not to be the case [26]. Instead, the discrepancy was due to the fact that observational studies included younger women who chose to start HRT around the time of menopause, while women in the RCTs were, on average, more than 15 years older when they started HRT. After accounting for this difference, consistent evidence that HRT is beneficial for all-cause and cardiovascular disease-related mortality when administered to healthy, younger postmenopausal women emerged from observational studies, clinical trials and meta-analyses (see Table 1 in [25]). Thus, the apparent discordance of study findings was not due to the failure of observational studies or the supremacy of randomized designs. When both approaches investigated similar questions by studying the same target population, they led to the same conclusion [25]. This experience is consistent with a broader literature showing that observational designs and RCTs lead to the same overall conclusion in multiple areas [23, 24].

The fundamental difference between the RCTs and PS-designed observational studies is that randomization results in expected balance for all observed and unobserved covariates, while PS designs ensure balance for all measured covariates and for unobserved covariates to the extent they are correlated with covariates included in the PS design. The expectation of balance in RCTs is reasonable, particularly if the sample size is large. However, even with large sample sizes, there can be residual imbalance that is likely to affect outcomes [32]. Often, the balance in observed covariates achieved in PS designs is at least as good as that found when comparing two arms in an RCT [33]. The challenge for propensity score designs is identifying all of the relevant covariates in advance. As indicated in our commentary [2], this requires specification of a very rich set of covariates. If this is done, there is less likelihood of any unrecognized confounding due to an unmeasured covariate that is not correlated, at least to some extent, with the measured covariates. Importantly, new techniques also allow one to assess how large an effect an unrecognized confounder would need to have to negate the observed result [34, 35]. Put simply, if only very large effects can overturn the conclusion, the findings are robust to unmeasured confounding. The authors of this approach contend, and we agree, that this type of analysis should be presented in all observational designs. Unfortunately, while there are evolving guidelines for RCTs (the CONSORT criteria [36–39]), there are no equivalent guidelines for publication of studies using PS designs. This is an important area of opportunity.
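
As a minimal illustration of this kind of sensitivity analysis, the E-value [35] for a risk ratio RR greater than 1 is RR + sqrt(RR × (RR − 1)); ratios below 1 are inverted first. The sketch below is only an illustration under those definitions, not code from the cited papers. The function names and the example numbers are hypothetical and do not come from any study discussed here.

```python
import math


def e_value(rr: float) -> float:
    """E-value for a point estimate on the risk-ratio scale."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:  # protective effects: work with the inverse ratio
        rr = 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1.0))


def e_value_ci(lower: float, upper: float) -> float:
    """E-value for the confidence limit closest to the null (RR = 1)."""
    if lower <= 1.0 <= upper:
        return 1.0  # the interval already includes the null
    limit = lower if lower > 1.0 else upper
    return e_value(limit)


# Hypothetical inputs for illustration only (not results from any study).
print(round(e_value(0.75), 2))
print(round(e_value_ci(0.60, 0.94), 2))
```

For a hypothetical risk ratio of 0.75, the E-value is 2.0: an unmeasured confounder would need to be associated with both treatment and outcome by a risk ratio of at least 2, over and above the measured covariates, to fully explain away the estimate.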

Leveraging propensity score designs will allow researchers to more efficiently conduct efficacy studies that are both large-scale and well-powered for all endpoints. While McEvoy et al. [1] are correct that fewer individuals are required for studying secondary prevention when assuming high annual composite event rates, we note that our statement concerning required sample sizes was an example taken from Javaheri et al. [10] for studying less frequent events. Many assumptions and variations can result in differences in the numbers of events and subjects required, and not all assumptions can be definitively discerned from the provided description. However, the requirement that approximately 10,000 participants per arm are needed can be reproduced by assuming a 3-year enrollment and follow-up time. When follow-up time is extended (or assumed event rate increased), we agree that a smaller number of subjects is needed. For example, assuming enrollment over 3 years and maximum follow-up of 6 years, a 1.5% annual event rate in untreated subjects, and a 25% reduction in compliant PAP subjects, 3272 subjects per arm (total N = 6544) are required for 80% power at a two-sided α = 0.05 [40]. With additional adjustment for loss to follow-up, planned propensity score trimming, or to maintain power for secondary endpoints, the required total sample size quickly increases. Regardless of the details, properly examining a primary composite outcome and secondary composite endpoints for cerebrovascular and coronary events will require a study larger than anything yet accomplished. Performing this study under a randomized design is likely to take many years to complete and carry a huge cost. Propensity score designs allow leveraging of clinical databases to answer the question with much less investment in time and treasure.
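
The sample size example above can be approximated with a standard events-based calculation. The sketch below is not the PASS 16 procedure [40]; it is an approximation that assumes Schoenfeld's formula for the number of events required by a log-rank test, exponential event times, uniform enrollment, 1:1 allocation, and no loss to follow-up. It also reads "maximum follow-up of 6 years" as a total study duration of 6 years and treats the 1.5% annual event rate as a constant hazard of 0.015 per year; both are our interpretation. All function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80):
    """Schoenfeld approximation: total events needed with 1:1 allocation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = z.inv_cdf(power)
    return 4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2


def event_probability(hazard, accrual_years, total_years):
    """P(event observed) for exponential event times and uniform accrual."""
    a, t = accrual_years, total_years
    # Average survival over follow-up times spread uniformly on [t - a, t].
    mean_survival = (math.exp(-hazard * (t - a)) - math.exp(-hazard * t)) / (hazard * a)
    return 1 - mean_survival


def subjects_per_arm(annual_rate, hazard_ratio, accrual_years, total_years,
                     alpha=0.05, power=0.80):
    """Subjects per arm so that the expected event count meets the target."""
    events = required_events(hazard_ratio, alpha, power)
    p_control = event_probability(annual_rate, accrual_years, total_years)
    p_treated = event_probability(annual_rate * hazard_ratio, accrual_years, total_years)
    total = events / ((p_control + p_treated) / 2)
    return math.ceil(total / 2)


# Assumed inputs: 1.5% annual rate, 25% reduction, 3-year accrual, 6-year study.
print(subjects_per_arm(0.015, 0.75, accrual_years=3, total_years=6))
```

Under these assumptions the calculation gives roughly 3,300 subjects per arm, close to but not identical with the 3272 quoted above; the gap reflects the simplifications listed. Lengthening the study or raising the assumed event rate lowers the required number, in line with the argument in the text.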

Nonrandomized designs should not be dismissed outright. As stated by Black many years ago [41], this should not be a general debate about whether RCTs or observational designs are better. Rather, the debate should focus on which study design is optimal for the question being posed. RCTs remain a powerful tool in the arsenal. However, occasions arise when the RCT design is not appropriate. We propose that studying OSA while enrolling excessively sleepy individuals and evaluating individuals with adequate CPAP adherence is one such area. The question is how to proceed in answering important scientific and clinical questions if RCTs are not feasible. Other fields have taken the view that carefully designed observational studies using propensity scores are a very rational approach in this circumstance. As a prime example, RCTs of medical devices against controls have become difficult to implement due to the wide availability of effective devices. Thus, the FDA Center for Devices and Radiological Health (CDRH) has actively sought development of new statistical methodology and guidelines [42], and has outlined when observational studies may lead to valid inference that can provide evidence for approval of medical devices [43].

While we appreciate the thoughtful response of McEvoy et al. [1], the field of sleep medicine continues to be challenged by important unanswered questions. How can we ensure individuals have adequate CPAP adherence? Should excessively sleepy subjects continue to be excluded from studies of the effect of CPAP on cardiovascular events, despite the importance of this clinical symptom? How can we include these patients in our studies given ethical and feasibility issues? Both we [2] and Javaheri et al. [10] have made different proposals on how to accomplish this, and we welcome other ideas. Rather than throwing RCTs out with the bathwater, we argue that there is room in the tub for both RCTs and observational designs [44].

Funding

The study was supported by a grant from the National Heart, Lung, and Blood Institute (NHLBI) (grant number P01 HL094307).

Conflict of interest statement. A.I.P. is the John Miclot Professor of Medicine. Funds for this endowment are provided by the Philips Respironics Foundation. S.T.K. has received grant support from Philips Respironics. U.J.M. has received grant support from Hill-Rom and Philips Respironics.

References

1. McEvoy RD, et al. Response to Pack et al. Randomized clinical trials of cardiovascular disease in obstructive sleep apnea; understanding and overcoming bias. Sleep. 2021;44(4). doi:10.1093/sleep/zsab019

2. Pack AI, et al. Randomized clinical trials of cardiovascular disease in obstructive sleep apnea: understanding and overcoming bias. Sleep. 2020;44(2). doi:10.1093/sleep/zsaa229

3. Lauer MS, et al. Efficient design of clinical trials and epidemiological research: is it possible? Nat Rev Cardiol. 2017;14(8):493–501.

4. Ye L, et al. The different clinical faces of obstructive sleep apnoea: a cluster analysis. Eur Respir J. 2014;44(6):1600–1607.

5. Keenan BT, et al. Recognizable clinical subtypes of obstructive sleep apnea across international sleep centers: a cluster analysis. Sleep. 2018;41(3). doi:10.1093/sleep/zsx214

6. Kim J, et al. Symptom-based subgroups of Koreans with obstructive sleep apnea. J Clin Sleep Med. 2018;14(3):437–443.

7. Pien GW, et al. Changing faces of obstructive sleep apnea: treatment effects by cluster designation in the Icelandic Sleep Apnea Cohort. Sleep. 2018;41(3). doi:10.1093/sleep/zsx201

8. Mazzotti DR, et al. Symptom subtypes of obstructive sleep apnea predict incidence of cardiovascular outcomes. Am J Respir Crit Care Med. 2019;200(4):493–506.

9. Zinchuk AV, et al. Polysomnographic phenotypes and their cardiovascular implications in obstructive sleep apnoea. Thorax. 2018;73(5):472–480.

10. Javaheri S, et al. CPAP treatment and cardiovascular prevention: we need to change the design and implementation of our trials. Chest. 2019;156(3):431–437.

11. Kapur VK, et al.; Sleep Heart Health Study Group. Sleep disordered breathing and hypertension: does self-reported sleepiness modify the association? Sleep. 2008;31(8):1127–1132.

12. Lindberg E, et al. Snoring and daytime sleepiness as risk factors for hypertension and diabetes in women–a population-based study. Respir Med. 2007;101(6):1283–1290.

13. Gooneratne NS, et al. Sleep disordered breathing with excessive daytime sleepiness is a risk factor for mortality in older adults. Sleep. 2011;34(4):435–442.

14. Xie J, et al. Excessive daytime sleepiness independently predicts increased cardiovascular risk after myocardial infarction. J Am Heart Assoc. 2018;7(2):e007221.

15. Van Dongen HP, et al. Systematic interindividual differences in neurobehavioral impairment from sleep loss: evidence of trait-like differential vulnerability. Sleep. 2004;27(3):423–433.

16. Kuna ST, et al. Heritability of performance deficit accumulation during acute sleep deprivation in twins. Sleep. 2012;35(9):1223–1233.

17. He Y, et al. The transcriptional repressor DEC2 regulates sleep length in mammals. Science. 2009;325(5942):866–870.

18. Pellegrino R, et al. A novel BHLHE41 variant is associated with short sleep and resistance to sleep deprivation in humans. Sleep. 2014;37(8):1327–1336.

19. Cistulli PA, et al. Short-term CPAP adherence in obstructive sleep apnea: a big data analysis using real world data. Sleep Med. 2019;59:114–116.

20. Sanchez-de-la-Torre M, et al. Effect of obstructive sleep apnoea and its treatment with continuous positive airway pressure on the prevalence of cardiovascular events in patients with acute coronary syndrome (ISAACC study): a randomised controlled trial. Lancet Respir Med. 2020;8(4):359–367.

21. Sawyer AM, et al. Where to next for optimizing adherence in large-scale trials of continuous positive airway pressure? Sleep Med Clin. 2021;16(1):125–144.

22. Kushida CA, et al. Effects of continuous positive airway pressure on neurocognitive function in obstructive sleep apnea patients: the Apnea Positive Pressure Long-term Efficacy Study (APPLES). Sleep. 2012;35(12):1593–1602.

23. Benson K, et al. A comparison of observational studies and randomized, controlled trials. N Engl J Med. 2000;342(25):1878–1886.

24. Concato J, et al. Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med. 2000;342(25):1887–1892.

25. Lobo RA, et al. Back to the future: hormone replacement therapy as part of a prevention strategy for women at the onset of menopause. Atherosclerosis. 2016;254:282–290.

26. Lobo RA. Hormone-replacement therapy: current thinking. Nat Rev Endocrinol. 2017;13(4):220–231.

27. National Institutes of Health. NHLBI stops trial of estrogen plus progestin due to increased breast cancer risk and lack of overall benefit. South Med J. 2002;95(8):795–797.

28. Shrank WH, et al. Healthy user and related biases in observational studies of preventive interventions: a primer for physicians. J Gen Intern Med. 2011;26(5):546–550.

29. Silverman SL, et al. Healthy users, healthy adherers, and healthy behaviors? J Bone Miner Res. 2011;26(4):681–682.

30. Kinjo M, et al. Potential contribution of lifestyle and socioeconomic factors to healthy user bias in antihypertensives and lipid-lowering drugs. Open Heart. 2017;4(1):e000417.

31. Eurich DT, et al. Development and validation of an index score to adjust for healthy user bias in observational studies. J Popul Ther Clin Pharmacol. 2017;24(3):e79–e89.

32. Krauss A. Why all randomised controlled trials produce biased results. Ann Med. 2018;50(4):312–322.

33. Rubin DB. The design versus the analysis of observational studies for causal effects: parallels with the design of randomized trials. Stat Med. 2007;26(1):20–36.

34. VanderWeele TJ, et al. Bias formulas for sensitivity analysis of unmeasured confounding for general outcomes, treatments, and confounders. Epidemiology. 2011;22(1):42–52.

35. VanderWeele TJ, et al. Sensitivity analysis in observational research: introducing the E-value. Ann Intern Med. 2017;167(4):268–274.

36. Moher D. CONSORT: an evolving tool to help improve the quality of reports of randomized controlled trials. Consolidated Standards of Reporting Trials. JAMA. 1998;279(18):1489–1491.

37. Rennie D. CONSORT revised–improving the reporting of randomized trials. JAMA. 2001;285(15):2006–2007.

38. Moher D, et al. CONSORT 2010 explanation and elaboration: updated guidelines for reporting parallel group randomised trials. BMJ. 2010;340:c869.

39. Juszczak E, et al. Reporting of multi-arm parallel-group randomized trials: extension of the CONSORT 2010 statement. JAMA. 2019;321(16):1610–1620.

40. NCSS LLC. PASS 16 Power Analysis and Sample Size Software. Kaysville, UT: NCSS LLC; 2018.

41. Black N. Why we need observational studies to evaluate the effectiveness of health care. BMJ. 1996;312(7040):1215–1218.

42. FDA Center for Devices and Radiological Health. Design Considerations for Pivotal Clinical Investigations for Medical Devices: Guidance for Industry, Clinical Investigators, Institutional Review Boards and Food and Drug Administration Staff. 2013.

43. Yue LQ. Statistical and regulatory issues with the application of propensity score analysis to nonrandomized medical device clinical studies. J Biopharm Stat. 2007;17(1):1–13; discussion 15.

44. Faraoni D, et al. Randomized controlled trials vs. observational studies: why not just live together? BMC Anesthesiol. 2016;16(1):102.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
