Fabio Barili, Nick Freemantle, Thierry Folliguet, Claudio Muneretto, Michele De Bonis, Martin Czerny, Jean Francois Obadia, Nawwar Al-Attar, Nikolaos Bonaros, Jolanda Kluin, Roberto Lorusso, Prakash Punjabi, Rafael Sadaba, Piotr Suwalski, Umberto Benedetto, Andreas Böning, Volkmar Falk, Miguel Sousa-Uva, Pieter A. Kappetein, Lorenzo Menicanti, The flaws in the detail of an observational study on transcatheter aortic valve implantation versus surgical aortic valve replacement in intermediate-risks patients, European Journal of Cardio-Thoracic Surgery, Volume 51, Issue 6, June 2017, Pages 1031–1035, https://doi.org/10.1093/ejcts/ezx058
Abstract
The PARTNER group recently published a comparison between the latest generation SAPIEN 3 transcatheter aortic valve implantation (TAVI) system (Edwards Lifesciences, Irvine, CA, USA) and surgical aortic valve replacement (SAVR) in intermediate-risk patients, apparently demonstrating superiority of TAVI and suggesting that TAVI might be the preferred treatment method in this risk class of patients. Nonetheless, assessment of the non-randomized methodology used in this comparison reveals challenges that should be addressed in order to elucidate the validity of the results. The study by Thourani and colleagues raises several major methodological concerns: suboptimal methods in propensity score analysis with evident misspecification of the propensity scores (PS; no adjustment for the most significantly different covariates: left ventricular ejection fraction, moderate–severe mitral regurgitation and associated procedures); use of PS quintiles rather than matching; inference drawn from unadjusted Kaplan–Meier curves, although the authors correctly acknowledged the need for balancing-score adjustment for confounding factors in order to obtain unbiased estimates of the treatment effect; evidence of poor fit; lack of data on valve-related death.
These methodological flaws invalidate direct comparison between treatments and cannot support the authors’ conclusions that TAVI with SAPIEN 3 in intermediate-risk patients is superior to surgery and might be the preferred treatment alternative to surgery.
GENERAL CONSIDERATION
The development and availability of a transcatheter approach for treating severe aortic valve stenosis [transcatheter aortic valve implantation (TAVI)] has warranted clinical trials and observational studies to evaluate the safety and short-/long-term outcomes of newly designed prostheses in order to compare them with surgical aortic valve replacement (SAVR), the gold standard treatment [1, 2]. The new treatment was initially reserved for patients with absolute contraindications to surgery. Subsequently, the evidence of safety of the new devices, as well as the maturation of experience with this technology, led to the expansion of indications to higher risk patients [3, 4]. Nonetheless, technology runs fast, and new prostheses are regularly launched on the market claiming better performance and wider indications and hence requiring new evidence [5]. The PARTNER group recently published a comparison between the latest generation SAPIEN 3 TAVI system (Edwards Lifesciences, Irvine, CA, USA) and SAVR in intermediate-risk patients, apparently demonstrating superiority of TAVI and suggesting that TAVI might be the preferred treatment method in this risk class of patients [6]. These favourable results of the transcatheter approach in intermediate-risk patients could lead decision-makers and the scientific community to consider TAVI as the new standard of care in a wider population of patients with severe aortic stenosis. The recent Food and Drug Administration (FDA) approval of expanded indications for the SAPIEN 3 device based on these data somewhat supports this position (http://www.fda.gov/NewsEvents/Newsroom/PressAnnouncements/ucm517281.htm?source=govdelivery&utm_medium=email&utm_source=govdelivery).
Nonetheless, assessment of the non-randomized methodology used in this comparison reveals challenges that should be addressed in order to elucidate the validity of the results. The study is observational, employing propensity scores (PS), risk scores that can be used to match patients with a similar likelihood of receiving treatment, since non-random differences in baseline will lead to bias in comparisons between treatment conditions [7–9]. PS analysis can be used to create a ‘quasi-randomized’ comparison, but the approach has well-known intrinsic limitations and pitfalls including the misspecification of the PS, effects of unknown biases and confounding by indication [9–13]. Hence, unlike properly randomized trials, the use of the PS does not assure the internal validity of the analyses, and decision-makers and the scientific communities need to be wary of making inference from their results [11]. The PS study by Thourani et al. has a number of major design flaws, and its results have clear signs of bias [6].
THE ASSUMPTION OF ‘IGNORABILITY’ AND THE EFFECTS OF PROPENSITY SCORE MISSPECIFICATION
The first important step in PS analysis is the careful specification of the risk algorithm, as omission of important confounding factors (i.e. getting it wrong) will lead to biased estimation of the treatment effect. The objective is that, as a result of the PS conditioning on the relevant explanatory variables, the treatment will be independent of potential outcomes. This conditional independence assumption is called ‘ignorability’, ‘unconfoundedness’ or ‘selection on observables’ and, crucially, it is always held as an assumption, because it is not directly testable [14]. In order to assume that treatment assignment is ‘otherwise ignorable’ [9–15], the very first step is the inclusion in the PS algorithm of all known and available confounding factors, as explanatory variables that affect both treatment assignment and outcome confound the observed relationship between treatment and outcome [9–15]. The PS is compromised when important variables influencing selection have not been collected or considered, and misspecification of the PS by excluding known confounders has been demonstrated to lead to largely biased results [10].
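The assumption described above can be written compactly in potential-outcomes notation. The symbols below are not used in the original text and are introduced only for illustration: T is the treatment indicator (1 for TAVI, 0 for SAVR), X the vector of measured baseline covariates, Y(0) and Y(1) the potential outcomes under each treatment, and e(X) the propensity score.

```latex
e(X) = \Pr(T = 1 \mid X), \qquad
\bigl(Y(0), Y(1)\bigr) \perp\!\!\!\perp T \mid X, \qquad
0 < e(X) < 1
\;\Longrightarrow\;
\bigl(Y(0), Y(1)\bigr) \perp\!\!\!\perp T \mid e(X).
```

The implication holds only when X contains every confounder. If a variable that affects both T and the outcome is left out of X, the conditional independence on the left may fail, and conditioning on e(X) no longer removes the bias.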
The study by Thourani et al. was designed to compare the outcomes of an observational study on the latest generation SAPIEN 3 TAVI System (Edwards Lifesciences, Irvine, CA, USA) with results of the surgical group of the PARTNER 2A trial [5, 6, 16]. The 2 groups were not homogeneous, as shown in the baseline characteristics, and Thourani et al. planned PS stratification before analysing outcomes [6]. The use of PS stratification rather than precise matching is surprising, as it is by design limited in the extent to which systematic differences between the comparator groups may be accounted for. Indeed, there were important differences between the comparator samples. The comparative analysis of patients’ baseline characteristics and baseline variables included in the PS algorithm showed that the most significantly different characteristics between the 2 groups (left ventricular ejection fraction, P-value <0.0001; Society of Thoracic Surgeons (STS) score, P-value 0.0002; moderate or severe mitral regurgitation, P-value <0.0001) were omitted in the PS generation, together with other significant factors (frail condition and mean gradient). The STS score was developed to estimate early mortality, and it has been demonstrated to be a predictor of long-term mortality as well [17–20]. Several studies and meta-analyses demonstrated that both left ventricular ejection fraction and moderate/severe mitral regurgitation affect early and late outcomes, also in patients who undergo TAVI [21–24]. These factors, affecting both treatment assignment and outcomes, are hence major confounders that should be included in the PS. Their omission may violate the ‘ignorability’ assumption and, consequently, may lead to selection bias.
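The effect of leaving a confounder out of the adjustment can be illustrated with a small worked calculation. This is a minimal sketch with invented numbers, not data from either study: `X` stands for a hypothetical binary confounder (for instance, a reduced ejection fraction), the treatment is given less often when `X = 1`, and the outcome risk depends on `X` only, so the true treatment effect is zero by construction. The function name and all probabilities are assumptions made for the example.

```python
def crude_and_adjusted_risk_difference(p_x, p_treat_given_x, risk):
    """Exact population-level risk differences for one binary confounder X.

    p_x              -- P(X = 1)
    p_treat_given_x  -- {x: P(T = 1 | X = x)}
    risk             -- {(t, x): P(event | T = t, X = x)}
    Returns (crude, adjusted): the crude difference ignores X, the adjusted
    one averages the within-stratum differences over the distribution of X.
    """
    p_stratum = {0: 1.0 - p_x, 1: p_x}

    # Joint probabilities of (treatment arm, confounder level).
    treated = {x: p_stratum[x] * p_treat_given_x[x] for x in (0, 1)}
    control = {x: p_stratum[x] * (1.0 - p_treat_given_x[x]) for x in (0, 1)}

    # Crude risks: each arm carries its own mix of X = 0 and X = 1 patients.
    risk_treated = sum(treated[x] * risk[(1, x)] for x in (0, 1)) / sum(treated.values())
    risk_control = sum(control[x] * risk[(0, x)] for x in (0, 1)) / sum(control.values())
    crude = risk_treated - risk_control

    # Adjusted: compare like with like inside each stratum of X.
    adjusted = sum(p_stratum[x] * (risk[(1, x)] - risk[(0, x)]) for x in (0, 1))
    return crude, adjusted


# Hypothetical inputs: 30% of patients have X = 1; they are treated less often
# (20% vs 60%) and have a higher event risk (30% vs 10%) in BOTH arms.
crude, adjusted = crude_and_adjusted_risk_difference(
    p_x=0.3,
    p_treat_given_x={0: 0.6, 1: 0.2},
    risk={(0, 0): 0.10, (1, 0): 0.10, (0, 1): 0.30, (1, 1): 0.30},
)
print(f"crude risk difference:    {crude:+.4f}")
print(f"adjusted risk difference: {adjusted:+.4f}")
```

With these invented inputs the crude comparison suggests a benefit of roughly 6.7 percentage points for the treated arm, while the comparison within levels of `X` shows none. A PS that omits `X` cannot remove this difference.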
Moreover, further potential confounders not collected in the study are associated procedures, such as myocardial revascularization. These increase the risk of perioperative mortality and morbidity, as widely demonstrated by the STS score and EuroSCORE [17–27], and they could represent important confounders to be included in the PS algorithm. Nonetheless, although patients with non-complex coronary disease requiring revascularization were included provided that a treatment plan for the coronary disease was agreed before enrolment [5, 6, 16], no information on associated myocardial revascularization in the TAVI group has been reported [6, 16]. Some information on the SAVR group can be derived from the published PARTNER 2A trial, where a total of 86 of 944 patients (9.1%) had concomitant procedures during surgery and 137 of 944 patients (14.5%) underwent associated coronary artery bypass grafting [5]. Thus, a proportion ranging between 14.5% and 23.6% had concomitant surgical procedures in the SAVR group of the PARTNER 2A trial, indicating an increased risk of mortality and morbidity and potentially a major confounder. The need for a deeper analysis of associated procedures in the study by Thourani et al. is also strengthened by the significantly different proportion of myocardial revascularization in the PARTNER 2A trial (137 of 944, 14.5% in the SAVR group; 39 of 994, 3.9% in the TAVI group; χ² P-value <0.0001) [5].
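The proportions and the χ² comparison quoted in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes a surgical denominator of 944 patients, the figure used with the 9.1% and 14.5% proportions, and a TAVI denominator of 994. It uses a Pearson χ² test without continuity correction, which is an assumption, since the text does not state which variant was applied. The helper names are illustrative.

```python
import math


def percent(k, n):
    """Proportion k/n expressed as a percentage, rounded to one decimal."""
    return round(100 * k / n, 1)


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value). With 1 degree of freedom the upper-tail
    probability equals erfc(sqrt(statistic / 2)).
    """
    observed = ((a, b), (c, d))
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    n = a + b + c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2, math.erfc(math.sqrt(chi2 / 2))


SAVR_N, TAVI_N = 944, 994            # denominators as quoted in the text
savr_concomitant, savr_cabg = 86, 137
tavi_revasc = 39

lower = percent(savr_cabg, SAVR_N)                     # only the CABG patients
upper = percent(savr_concomitant + savr_cabg, SAVR_N)  # if the two sets do not overlap
chi2, p_value = pearson_chi2_2x2(
    savr_cabg, SAVR_N - savr_cabg,
    tavi_revasc, TAVI_N - tavi_revasc,
)
print(f"SAVR concomitant procedures: between {lower}% and {upper}%")
print(f"Revascularization, SAVR vs TAVI: chi2 = {chi2:.1f}, p = {p_value:.2e}")
```

The range of 14.5% to 23.6% follows from whether the 86 patients with concomitant procedures overlap with the 137 who had bypass grafting. The test statistic is large enough that the P-value falls far below 0.0001 whichever variant of the test is used.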
In summary, these differences in baseline characteristics between study groups reflect a different clinical selection of patients that can influence outcomes and should be balanced to avoid biased estimation of treatment effect.
CONFOUNDING BY INDICATION AND ASSESSING THE PERFORMANCE OF THE PROPENSITY SCORE
Confounding by indication is the situation where, although all known confounders have been balanced, allocation to treatment is not otherwise ignorable but instead subject to some latent (unrecognized or unmeasured) process associated with those who are treated. This confounding cannot be measured directly but only tangentially through its effects and hence the effort should be focused on performance analysis of PS [11].
The first useful precaution against unsafe inference from an observational study is to compare it with a known treatment effect and bridge from that point to consider further questions. A further diagnostic step should be the evaluation of PS performance by testing for heterogeneity of the treatment effect across the range of the PS. A comparison between 2 well-balanced groups should lead to a homogeneous treatment effect across the range of the PS, while heterogeneous effects will raise concern.
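One way to test for heterogeneity of the treatment effect across PS strata is Cochran's Q statistic applied to stratum-specific log odds ratios. The sketch below is an illustration under stated assumptions, not the analysis performed in either paper: the choice of Cochran's Q, the function names and the event counts in all five quintiles are hypothetical. The closed-form tail probability is valid only for an even number of degrees of freedom, which is the case with five strata.

```python
import math


def log_odds_ratio(events_t, n_t, events_c, n_c):
    """Log odds ratio and Woolf variance for one stratum (no empty cells assumed)."""
    a, b = events_t, n_t - events_t
    c, d = events_c, n_c - events_c
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def cochran_q(strata):
    """Cochran's Q for homogeneity of log odds ratios across strata.

    strata -- iterable of (events_treated, n_treated, events_control, n_control)
    Returns (Q, p_value); under homogeneity Q follows chi-square with k - 1 df.
    """
    estimates = [log_odds_ratio(*s) for s in strata]
    weights = [1 / var for _, var in estimates]  # inverse-variance weights
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    q = sum(w * (est - pooled) ** 2 for w, (est, _) in zip(weights, estimates))
    return q, chi2_sf_even_df(q, len(estimates) - 1)


# Hypothetical quintiles: the same effect in every stratum ...
homogeneous = [(20, 200, 20, 200)] * 5
# ... versus an effect that reverses direction between the extreme quintiles.
heterogeneous = [
    (10, 200, 40, 200),
    (20, 200, 20, 200),
    (20, 200, 20, 200),
    (20, 200, 20, 200),
    (40, 200, 10, 200),
]

q_hom, p_hom = cochran_q(homogeneous)
q_het, p_het = cochran_q(heterogeneous)
print(f"homogeneous strata:   Q = {q_hom:.2f}, p = {p_hom:.3f}")
print(f"heterogeneous strata: Q = {q_het:.2f}, p = {p_het:.2e}")
```

A small P-value in such a test indicates that the treatment effect varies with the PS, which is the pattern that raises concern about residual imbalance or confounding by indication.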

Treatment effect of TAVR versus surgery on all-cause mortality and stroke in PARTNER 2A randomized trial and PARTNER 2A SAPIEN 3 observational study.

Treatment effect of TAVR versus surgery on composite outcome (death, stroke and moderate or severe aortic regurgitation at 1 year) across the quintiles of propensity score in the PARTNER 2A SAPIEN 3 observational study.
TO ADJUST OR NOT TO ADJUST, THIS IS ANOTHER QUESTION
The concerns also increase in the second part of the study, the time-to-event analyses. The study is based on the evidence that the groups are different and that biased estimates of treatment effects need to be accounted for by balancing the covariates with PS methods [6]. Nonetheless, after employing PS stratification for comparing dichotomous outcomes, the authors surprisingly did not undertake any type of adjustment in the time-to-event analysis and presented simple unadjusted Kaplan–Meier estimates and curves, making inference on their results [6]. This is counter-intuitive, and the curves are not interpretable, as they are simply a first-step evaluation before adjustment. Stating in the results that ‘important differences between TAVR and surgery for each end-point are observed in the first several months’ is inappropriate until the findings are confirmed by adjusted analyses. Making inference on unadjusted outcomes derived from biased groups should be avoided [9, 13].
IS THERE AN OUTCOME MISSING?
In the PARTNER 2 SAPIEN 3 observational study, clinical outcomes were reported as defined by Valve Academic Research Consortium 2 definitions [6, 28]. The Valve Academic Research Consortium 2 definitions recommend capturing the cause of death with a careful review and, among the mortality causes to be reported, all valve-related deaths are included. Valve-related mortality and morbidity represent the main outcomes to evaluate the safety and short-/long-term follow-up after valvular treatment, as they are the most specific index of early–late performance. In a comparison between 2 treatment options for valvular disease considering 2 homogeneous groups, we might reasonably expect to observe similar non-cardiovascular and cardiac non-valve-related mortality, while the treatment effect would be expressed in differences in valve-related mortality [29]. Nonetheless, in the PARTNER 2 SAPIEN 3 observational study, only all-cause mortality, non-cardiac and cardiac death were reported, with no information on valve-related mortality shown. Therefore, it is not possible to differentiate prosthesis-related events from prosthesis-unrelated deaths, such as those caused by non-embolic myocardial infarction, defined as cardiac but non-valve-related death [28, 29].
CONCLUSIONS
As shown, the study comparing SAPIEN 3 TAVR and surgical AVR [6] raises several major methodological concerns:
suboptimal methods in PS analysis with evident misspecification of the PS (no adjustment for the most significantly different covariates: left ventricular ejection fraction, moderate–severe mitral regurgitation and associated procedures);
use of PS quintiles rather than matching;
inference drawn from unadjusted Kaplan–Meier curves, although the authors correctly acknowledged the need for balancing-score adjustment for confounding factors in order to obtain unbiased estimates of the treatment effect;
evidence of poor fit; and
lack of data on valve-related death.
These methodological flaws invalidate direct comparison between treatments and cannot support the authors’ conclusions that TAVI with SAPIEN 3 in intermediate-risk patients is superior to surgery and might be the preferred treatment alternative to surgery.
Conflict of interest: F.B. reports personal fees from St Jude Medical, outside the submitted work. N.B. reports a research grant from Edwards Lifesciences and speaker’s honoraria from Edwards Lifesciences, Medtronic and Abbott (outside the submitted work). R.L. is the principal investigator of the PERSIST-AVR Trial. P.S. is a consultant for Atricure and Medtronic (outside the submitted work).
REFERENCES
Author notes
†Presented at the Postgraduate Course of the 30th Annual Meeting of the European Association for Cardio-Thoracic Surgery, Barcelona, Spain, 2 October 2016.
‡The first two authors contributed equally to this study.