Abstract

The PARTNER group recently published a comparison between the latest generation SAPIEN 3 transcatheter aortic valve implantation (TAVI) system (Edwards Lifesciences, Irvine, CA, USA) and surgical aortic valve replacement (SAVR) in intermediate-risk patients, apparently demonstrating superiority of TAVI and suggesting that TAVI might be the preferred treatment method in this risk class of patients. Nonetheless, assessment of the non-randomized methodology used in this comparison reveals challenges that should be addressed in order to elucidate the validity of the results. The study by Thourani and colleagues raises several major methodological concerns: suboptimal methods in propensity score analysis with evident misspecification of the propensity scores (PS; no adjustment for the most significantly different covariates: left ventricular ejection fraction, moderate–severe mitral regurgitation and associated procedures); use of PS quintiles rather than matching; inference on unadjusted Kaplan–Meier curves, although the authors correctly acknowledged the need for balancing-score adjustment for confounding factors in order to obtain unbiased estimates of the treatment effect; evidence of poor fit; lack of data on valve-related death.

These methodological flaws invalidate direct comparison between treatments, and the study cannot support the authors’ conclusions that TAVI with SAPIEN 3 in intermediate-risk patients is superior to surgery and might be the preferred treatment alternative to surgery.

GENERAL CONSIDERATION

The development and availability of a transcatheter approach for treating severe aortic valve stenosis [transcatheter aortic valve implantation (TAVI)] has warranted clinical trials and observational studies to evaluate the safety and short-/long-term outcomes of newly designed prostheses in order to compare them with surgical aortic valve replacement (SAVR), the gold standard treatment [1, 2]. The new treatment was initially reserved for patients with absolute contraindications to surgery. Subsequently, the evidence of safety of the new devices, as well as the maturation of experience with this technology, has led to the expansion of indications to higher risk patients [3, 4]. Nonetheless, technology runs fast, and new prostheses are regularly launched on the market claiming better performance and wider indications and hence requiring new evidence [5]. The PARTNER group recently published a comparison between the latest generation SAPIEN 3 TAVI system (Edwards Lifesciences, Irvine, CA, USA) and SAVR in intermediate-risk patients, apparently demonstrating superiority of TAVI and suggesting that TAVI might be the preferred treatment method in this risk class of patients [6]. These favourable results of the transcatheter approach in intermediate-risk patients could lead decision-makers and the scientific community to consider TAVI as the new standard of care in a wider population of patients with severe aortic stenosis. The recent Food and Drug Administration (FDA) approval of expanded indications for the SAPIEN 3 device, based on these data, somewhat supports this position (http://www.fda.gov/NewsEvents/Newsroom/PressAnnouncements/ucm517281.htm?source=govdelivery&utm_medium=email&utm_source=govdelivery).

Nonetheless, assessment of the non-randomized methodology used in this comparison reveals challenges that should be addressed in order to elucidate the validity of the results. The study is observational, employing propensity scores (PS), risk scores that can be used to match patients with a similar likelihood of receiving treatment, since non-random differences in baseline will lead to bias in comparisons between treatment conditions [7–9]. PS analysis can be used to create a ‘quasi-randomized’ comparison, but the approach has well-known intrinsic limitations and pitfalls including the misspecification of the PS, effects of unknown biases and confounding by indication [9–13]. Hence, unlike properly randomized trials, the use of the PS does not assure the internal validity of the analyses, and decision-makers and the scientific communities need to be wary of making inference from their results [11]. The PS study by Thourani et al. has a number of major design flaws, and its results have clear signs of bias [6].

THE ASSUMPTION OF ‘IGNORABILITY’ AND THE EFFECTS OF PROPENSITY SCORE MISSPECIFICATION

The first important step in PS analysis is the careful specification of the risk algorithm, as omission of important confounding factors (i.e. getting it wrong) will lead to biased estimation of the treatment effect. The objective is that, as a result of conditioning on the PS built from the relevant explanatory variables, the treatment will be independent of potential outcomes. This conditional independence assumption is called ‘ignorability’, ‘unconfoundedness’ or ‘selection on observables’ and, crucially, it is always held as an assumption, because it is not directly testable [14]. In order to assume that treatment assignment is ‘otherwise ignorable’ [9–15], the very first step is the inclusion in the PS algorithm of all known and available confounding factors, because explanatory variables that affect both treatment assignment and outcome confound the observed relationship between treatment and outcome [9–15]. The PS is compromised when important variables influencing selection have not been collected or considered, and misspecification of the PS by excluding known confounders has been demonstrated to lead to largely biased results [10].
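The assumption described above is often written in potential-outcome notation. The symbols below (T for treatment, X for the measured covariates, Y(0) and Y(1) for the potential outcomes, e(X) for the PS) are introduced here only for illustration and are not taken from the study under discussion.

```latex
e(X) = \Pr(T = 1 \mid X), \qquad
\bigl(Y(0),\, Y(1)\bigr) \perp T \mid X, \qquad
0 < e(X) < 1
\;\;\Longrightarrow\;\;
\bigl(Y(0),\, Y(1)\bigr) \perp T \mid e(X)
```

Read this way, conditioning on e(X) can only remove confounding carried by the covariates that enter X; a confounder left out of X is not balanced by the score.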

The study by Thourani et al. was designed to compare the outcomes of an observational study on the latest generation SAPIEN 3 TAVI System (Edwards Lifesciences, Irvine, CA, USA) with results of the surgical group of the PARTNER 2A trial [5, 6, 16]. The 2 groups were not homogeneous, as shown by their baseline characteristics, and Thourani et al. planned PS stratification before analysing outcomes [6]. The use of PS stratification rather than precise matching is surprising, as stratification is by design limited in the extent to which it can account for systematic differences between the comparator groups. Indeed, there were important differences between the comparator samples. The comparative analysis of patients’ baseline characteristics and of the baseline variables included in the PS algorithm showed that the most significantly different characteristics between the 2 groups (left ventricular ejection fraction, P-value <0.0001; Society of Thoracic Surgeons (STS) score, P-value 0.0002; moderate or severe mitral regurgitation, P-value <0.0001) were omitted from the PS generation, together with other significant factors (frail condition and mean gradient). The STS score was developed to estimate early mortality, and it has been demonstrated to be a predictor of long-term mortality as well [17–20]. Several studies and meta-analyses have demonstrated that both left ventricular ejection fraction and moderate/severe mitral regurgitation affect early and late outcomes, also in patients who undergo TAVI [21–24]. These factors, affecting both treatment assignment and outcomes, are hence major confounders that should be included in the PS. Their omission may violate the ‘ignorability’ assumption and, consequently, may lead to selection bias.
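The effect of leaving a confounder out of the PS can be illustrated with a small exact calculation. This is a minimal sketch on an invented population, not a reanalysis of the study: the two binary confounders, the functions `p_treat` and `p_event`, and every probability in them are hypothetical. The outcome is built so that treatment has no effect at all; any non-zero stratified risk difference is therefore pure bias.

```python
from collections import defaultdict
from itertools import product

# Hypothetical population: two independent binary confounders x1 and x2,
# each present in half of the patients. All numbers are invented.
def p_treat(x1, x2):
    """Probability of receiving the treatment given both confounders."""
    return 0.2 + 0.3 * x1 + 0.4 * x2

def p_event(x1, x2):
    """Probability of the outcome; independent of treatment (true effect = 0)."""
    return 0.05 + 0.05 * x1 + 0.15 * x2

def stratified_risk_difference(ps_covariates):
    """Exact treated-minus-control risk difference after stratifying on a PS
    that uses only the covariate positions listed in ps_covariates."""
    cells = [(x, 0.25) for x in product((0, 1), repeat=2)]  # (covariates, mass)

    # PS = P(treated | covariates used): average p_treat over omitted covariates.
    num, den = defaultdict(float), defaultdict(float)
    for x, w in cells:
        key = tuple(x[i] for i in ps_covariates)
        num[key] += w * p_treat(*x)
        den[key] += w

    # Per PS stratum: [mass, treated mass, control mass, treated events, control events]
    strata = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0, 0.0])
    for x, w in cells:
        key = tuple(x[i] for i in ps_covariates)
        ps = round(num[key] / den[key], 10)  # stratum label = PS value
        pt = p_treat(*x)
        s = strata[ps]
        s[0] += w
        s[1] += w * pt
        s[2] += w * (1 - pt)
        s[3] += w * pt * p_event(*x)
        s[4] += w * (1 - pt) * p_event(*x)

    # Stratum-weighted average of the within-stratum risk differences.
    return sum(w * (yt / t - yc / c) for w, t, c, yt, yc in strata.values())

full_ps = stratified_risk_difference((0, 1))       # PS uses both confounders
misspecified_ps = stratified_risk_difference((0,))  # PS omits x2

print(f"PS with both confounders: {full_ps:+.4f}")
print(f"PS omitting x2:           {misspecified_ps:+.4f}")
```

In this toy setting the correctly specified PS recovers the null effect, while the PS that omits `x2` reports a spurious difference of several percentage points, even though stratification on that score was carried out exactly.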

Moreover, further potential confounders not collected in the study are associated procedures, such as myocardial revascularization. These increase the risk of perioperative mortality and morbidity, as widely demonstrated by the STS score and EuroSCORE [17–27], and they could represent important confounders to be included in the PS algorithm. Nonetheless, although patients with non-complex coronary disease requiring revascularization were included if a treatment plan for the coronary disease was agreed before enrolment [5, 6, 16], no information on associated myocardial revascularization in the TAVI group has been reported [6, 16]. Some information on the SAVR group can be derived from the published PARTNER 2A trial, where a total of 86 of 944 patients (9.1%) had concomitant procedures during surgery and 137 of 944 patients (14.5%) underwent associated coronary artery bypass grafting [5]. Thus, a proportion ranging between 14.5% and 23.6% had concomitant surgical procedures in the SAVR group of the PARTNER 2A trial, indicating an increased risk of mortality and morbidity and potentially a major confounder. The need for a deeper analysis of associated procedures in the study by Thourani et al. is also strengthened by the significantly different proportion of myocardial revascularization in the PARTNER 2A trial (137 of 944, 14.5% in the SAVR group; 39 of 994, 3.9% in the TAVI group; χ² P-value <0.0001) [5].
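The arithmetic in the preceding paragraph can be checked with a few lines of code. The sketch below uses the counts quoted above, taking the SAVR denominator as 944 (the figure consistent with the reported 14.5%). The helper names are illustrative, and the test shown is a plain Pearson χ² without continuity correction, which is an assumption about how the quoted P-value was obtained.

```python
import math

def pct(count, total):
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

savr_n, tavi_n = 944, 994            # as-treated group sizes quoted in the text
savr_cabg, savr_other = 137, 86      # SAVR: bypass grafting, other concomitant procedures
tavi_revasc = 39                     # TAVI: myocardial revascularization

lower = pct(savr_cabg, savr_n)                # if the two SAVR categories fully overlap
upper = pct(savr_cabg + savr_other, savr_n)   # if they do not overlap at all
stat, p_value = chi_square_2x2(
    savr_cabg, savr_n - savr_cabg,
    tavi_revasc, tavi_n - tavi_revasc,
)

print(f"SAVR concomitant procedures: between {lower}% and {upper}%")
print(f"Revascularization: {pct(savr_cabg, savr_n)}% vs {pct(tavi_revasc, tavi_n)}%")
print(f"chi-square = {stat:.1f}, P = {p_value:.2g}")
```

The lower and upper bounds correspond to the two extreme cases in which the two SAVR categories overlap completely or not at all.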

In summary, these differences in baseline characteristics between study groups reflect a different clinical selection of patients that can influence outcomes and should be balanced to avoid biased estimation of treatment effect.

CONFOUNDING BY INDICATION AND ASSESSING THE PERFORMANCE OF THE PROPENSITY SCORE

Confounding by indication is the situation where, although all known confounders have been balanced, allocation to treatment is not otherwise ignorable but is instead subject to some latent (unrecognized or unmeasured) process associated with those who are treated. This confounding cannot be measured directly but only tangentially through its effects; hence, effort should be focused on analysing the performance of the PS [11].

The first useful precaution against unsafe inference from an observational study is to compare it with a known treatment effect and bridge from that point to consider further questions. A further diagnostic step should be the evaluation of PS performance by testing for heterogeneity of the treatment effect across the range of the PS. A comparison between 2 well-balanced groups should lead to a homogeneous treatment effect across the range of the PS, while heterogeneous effects will raise concern.
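One common way to test for such heterogeneity is Cochran's Q statistic applied to the stratum-specific log relative risks. The sketch below is only an illustration of that diagnostic: the stratum counts are invented, not taken from any of the studies discussed, and the function names are hypothetical. The P-value uses the closed form of the χ² survival function, which is valid only for an even number of degrees of freedom (quintiles give 4).

```python
import math

def log_rr(events_t, n_t, events_c, n_c):
    """Log relative risk and its large-sample variance."""
    rr = (events_t / n_t) / (events_c / n_c)
    var = 1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c
    return math.log(rr), var

def chi2_sf_even_df(x, df):
    """P(chi2_df > x), closed form valid for even df only."""
    assert df % 2 == 0, "closed form requires an even number of degrees of freedom"
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

def cochran_q(strata):
    """Cochran's Q test of homogeneity of the log relative risk across strata."""
    estimates = [log_rr(*s) for s in strata]
    weights = [1 / var for _, var in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    q = sum(w * (est - pooled) ** 2 for w, (est, _) in zip(weights, estimates))
    df = len(strata) - 1
    return q, df, chi2_sf_even_df(q, df)

# Invented counts per PS quintile: (events treated, n treated, events control, n control).
homogeneous = [(20, 200, 40, 200)] * 5
heterogeneous = [
    (10, 200, 40, 200),
    (15, 200, 40, 200),
    (20, 200, 40, 200),
    (30, 200, 40, 200),
    (40, 200, 40, 200),
]

q_hom, df_hom, p_hom = cochran_q(homogeneous)
q_het, df_het, p_het = cochran_q(heterogeneous)
print(f"homogeneous strata:   Q = {q_hom:.2f}, df = {df_hom}, P = {p_hom:.3f}")
print(f"heterogeneous strata: Q = {q_het:.2f}, df = {df_het}, P = {p_het:.4f}")
```

A small P-value here does not identify which confounder is missing; it only signals that the treatment effect changes across the range of the PS, which is the warning sign described above.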

The treatment effect of the observational study by Thourani et al. [6] can be compared with that of the PARTNER 2A randomized trial [5]. As shown in Fig. 1, the relative risk of the main outcome (all-cause death or disabling stroke) differs significantly between the 2 studies (interaction P-value = 0.0001), which militates against drawing strong conclusions from the observational study. Moreover, a deeper analysis of the treatment effect across the PS quintiles shows that the treatment effect may not be homogeneous across classes, showing a decreasing pattern through the strata (Fig. 2). Only the treatment effect in the fifth quintile is similar to the PARTNER 2A trial effect. It can be hypothesized that in patients with a low likelihood of TAVI (lower quintiles of the PS) there is important information that the PS did not capture, so that the match was made with inappropriately low-risk individuals, leading to a not otherwise ignorable treatment assignment [11].

Figure 1: Treatment effect of TAVR versus surgery on all-cause mortality and stroke in PARTNER 2A randomized trial and PARTNER 2A SAPIEN 3 observational study.

Figure 2: Treatment effect of TAVR versus surgery on composite outcome (death, stroke and moderate or severe aortic regurgitation at 1 year) across the quintiles of propensity score in the PARTNER 2A SAPIEN 3 observational study.
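A comparison of the treatment effect between 2 studies, as in Fig. 1, is typically carried out as a test of interaction on the log relative risks. The sketch below shows one standard form of that test (difference of log relative risks divided by its standard error). It is not necessarily the calculation behind the figure, and the event counts for `study_a` and `study_b` are invented placeholders, not data from the PARTNER studies.

```python
import math

def log_rr(events_t, n_t, events_c, n_c):
    """Log relative risk and its large-sample variance."""
    rr = (events_t / n_t) / (events_c / n_c)
    var = 1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c
    return math.log(rr), var

def interaction_test(study_a, study_b):
    """Two-sided z-test that two independent studies share the same relative risk."""
    est_a, var_a = log_rr(*study_a)
    est_b, var_b = log_rr(*study_b)
    z = (est_a - est_b) / math.sqrt(var_a + var_b)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    ratio_of_rr = math.exp(est_a - est_b)
    return ratio_of_rr, z, p_value

# Invented counts: (events treated, n treated, events control, n control).
study_a = (190, 1000, 200, 1000)  # hypothetical randomized comparison
study_b = (80, 1000, 130, 1000)   # hypothetical observational comparison

ratio, z, p_value = interaction_test(study_a, study_b)
print(f"ratio of relative risks = {ratio:.2f}, z = {z:.2f}, P = {p_value:.4f}")
```

The test assumes the 2 studies are independent; if they share a comparator group, the variance of the difference would need to account for that overlap.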

TO ADJUST OR NOT TO ADJUST, THIS IS ANOTHER QUESTION

The concerns also increase in the second part of the study, the time-to-event analyses. The study is based on the evidence that the groups are different and that biased estimates of treatment effects need to be accounted for by balancing the covariates with PS methods [6]. Nonetheless, after employing PS stratification for comparing dichotomous outcomes, the authors surprisingly did not undertake any type of adjustment in the time-to-event analysis and presented simple unadjusted Kaplan–Meier estimates and curves, making inference on their results [6]. This is counter-intuitive and the curves are not interpretable, as they are simply a first-step evaluation before adjustment. Stating in the results that ‘important differences between TAVR and surgery for each end-point are observed in the first several months’ is inappropriate until the findings are confirmed by adjusted analyses. Making inference on unadjusted outcomes derived from biased groups should be avoided [9, 13].
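As an illustration of what an adjusted time-to-event analysis could look like, one option among several is a Cox model stratified by PS quintile. The notation below (T for treatment, s for the PS quintile, h for the hazard) is introduced here for illustration only; it is not the analysis performed in the study and not a prescription of a single correct method.

```latex
h(t \mid T, s) = h_{0s}(t)\,\exp(\beta T), \qquad s = 1, \dots, 5, \qquad
\mathrm{HR}_{\text{adjusted}} = \exp(\beta)
```

Each quintile keeps its own baseline hazard, so treated and control patients are compared only within strata of similar PS. The sketch assumes proportional hazards within each stratum and, like any PS method, adjusts only for the covariates that entered the score.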

IS THERE AN OUTCOME MISSING?

In the PARTNER 2 SAPIEN 3 observational study, clinical outcomes were reported as defined by the Valve Academic Research Consortium 2 definitions [6, 28]. These definitions recommend capturing the cause of death with a careful review and, among the causes of mortality to be reported, all valve-related deaths are included. Valve-related mortality and morbidity represent the main outcomes for evaluating safety and short-/long-term follow-up after valvular treatment, as they are the most specific indices of early and late performance. In a comparison between 2 treatment options for valvular disease in 2 homogeneous groups, we might reasonably expect to observe similar non-cardiovascular and cardiac non-valve-related mortality, while the treatment effect would be expressed in differences in valve-related mortality [29]. Nonetheless, in the PARTNER 2 SAPIEN 3 observational study, only all-cause mortality, non-cardiac and cardiac death were reported, with no information on valve-related mortality. It is therefore not possible to differentiate prosthesis-related deaths from prosthesis-unrelated deaths, such as those caused by non-embolic myocardial infarction, which is defined as cardiac but non-valve-related death [28, 29].

CONCLUSIONS

As shown, the study comparing SAPIEN 3 TAVR and surgical AVR [6] raises several major methodological concerns:

  • suboptimal methods in PS analysis with evident misspecification of the PS (no adjustment for the most significantly different covariates: left ventricular ejection fraction, moderate–severe mitral regurgitation and associated procedures);

  • use of PS quintiles rather than matching;

  • inference on unadjusted Kaplan–Meier curves, although the authors correctly acknowledged the need for balancing-score adjustment for confounding factors in order to obtain unbiased estimates of the treatment effect;

  • evidence of poor fit; and

  • lack of data on valve-related death.

These methodological flaws invalidate direct comparison between treatments, and the study cannot support the authors’ conclusions that TAVI with SAPIEN 3 in intermediate-risk patients is superior to surgery and might be the preferred treatment alternative to surgery.

Conflict of interest: F.B. reports personal fees from St Jude Medical, outside the submitted work. N.B. reports a research grant from Edwards Lifesciences and speaker’s honoraria from Edwards Lifesciences, Medtronic and Abbott (outside the submitted work). R.L. is the principal investigator of the PERSIST-AVR Trial. P.S. is a consultant for Atricure and Medtronic (outside the submitted work).

REFERENCES

1. Vahanian A, Alfieri O, Andreotti F, Antunes MJ, Barón-Esquivias G, Baumgartner H et al. Guidelines on the management of valvular heart disease (version 2012): the Joint Task Force on the Management of Valvular Heart Disease of the European Society of Cardiology (ESC) and the European Association for Cardio-Thoracic Surgery (EACTS). Eur J Cardiothorac Surg 2012;42:S1–44.

2. Nishimura RA, Otto CM, Bonow RO, Carabello BA, Erwin JP 3rd, Guyton RA et al. 2014 AHA/ACC guideline for the management of patients with valvular heart disease: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. J Thorac Cardiovasc Surg 2014;148:e1–132.

3. Mack MJ, Leon MB, Smith CR, Miller DC, Moses JW, Tuzcu EM et al. 5-year outcomes of transcatheter aortic valve replacement or surgical aortic valve replacement for high surgical risk patients with aortic stenosis (PARTNER 1): a randomised controlled trial. Lancet 2015;385:2477–84.

4. Kapadia SR, Leon MB, Makkar RR, Tuzcu EM, Svensson LG, Kodali S et al. 5-year outcomes of transcatheter aortic valve replacement compared with standard treatment for patients with inoperable aortic stenosis (PARTNER 1): a randomised controlled trial. Lancet 2015;385:2485–91.

5. Leon MB, Smith CR, Mack MJ, Makkar RR, Svensson LG, Kodali SK et al. Transcatheter or surgical aortic-valve replacement in intermediate-risk patients. N Engl J Med 2016;374:1609–20.

6. Thourani VH, Kodali S, Makkar RR, Herrmann HC, Williams M, Babaliaros V et al. Transcatheter aortic valve replacement versus surgical valve replacement in intermediate-risk patients: a propensity score analysis. Lancet 2016;387:2218–25.

7. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983;70:41–55.

8. Rosenbaum PR, Rubin DB. Reducing bias in observational studies using subclassification on the propensity score. J Am Stat Assoc 1984;79:516–24.

9. Blackstone E. Comparing apples and oranges. J Thorac Cardiovasc Surg 2002;123:8–15.

10. Drake C. Effects of misspecification of the propensity score on estimators of treatment effects. Biometrics 1993;49:1231–36.

11. Freemantle N, Marston L, Walters K, Wood J, Reynolds MR, Petersen I. Making inferences on treatment effects from real world data: propensity scores, confounding by indication, and other perils for the unwary in observational research. BMJ 2013;347:f6409.

12. Rosenbaum PR. Optimal matching for observational studies. J Am Stat Assoc 1989;84:1024–32.

13. D'Agostino RB Jr. Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Stat Med 1998;17:2265–81.

14. Xie Y, Brand JE, Jann B. Estimating heterogeneous treatment effects with observational data. Sociol Methodol 2012;42:314–47.

15. Rubin DB. Estimating causal effects from large data sets using propensity scores. Ann Intern Med 1997;127:757–63.

16. Kodali S, Thourani VH, White J, Malaisrie SC, Lim S, Greason KL et al. Early clinical and echocardiographic outcomes after SAPIEN 3 transcatheter aortic valve replacement in inoperable, high-risk and intermediate-risk patients with aortic stenosis. Eur Heart J 2016;37:2252–62.

17. O'Brien SM, Shahian DM, Filardo G, Ferraris VA, Haan CK, Rich JB et al. The Society of Thoracic Surgeons 2008 cardiac surgery risk models: part 2–isolated valve surgery. Ann Thorac Surg 2009;88(Suppl 1):S23–42.

18. Shahian DM, He X, Jacobs JP, Rankin JS, Welke KF, Filardo G et al. The Society of Thoracic Surgeons isolated aortic valve replacement (AVR) composite score: a report of the STS quality measurement task force. Ann Thorac Surg 2012;94:2166–71.

19. Barili F, Pacini D, D'Ovidio M, Ventura M, Alamanni F, Di Bartolomeo R et al. Reliability of modern scores to predict long-term mortality after isolated aortic valve operations. Ann Thorac Surg 2016;101:599–605.

20. Barili F, Pacini D, Capo A, Ardemagni E, Pellicciari G, Zanobini M et al. Reliability of new scores in predicting perioperative mortality after isolated aortic valve surgery: a comparison with the Society of Thoracic Surgeons score and logistic EuroSCORE. Ann Thorac Surg 2013;95:1539–44.

21. Eleid MF, Goel K, Murad MH, Erwin PJ, Suri RM, Greason KL et al. Meta-analysis of the prognostic impact of stroke volume, gradient, and ejection fraction after transcatheter aortic valve implantation. Am J Cardiol 2015;116:989–94.

22. Sannino A, Losi MA, Schiattarella GG, Gargiulo G, Perrino C, Stabile E et al. Meta-analysis of mortality outcomes and mitral regurgitation evolution in 4,839 patients having transcatheter aortic valve implantation for severe aortic stenosis. Am J Cardiol 2014;114:875–82.

23. Schubert SA, Yarboro LT, Madala S, Ayunipudi K, Kron IL, Kern JA et al. Natural history of coexistent mitral regurgitation after aortic valve replacement. J Thorac Cardiovasc Surg 2016;151:1032–9.

24. Tan TC, Flynn AW, Chen-Tournoux A, Rudski LG, Mehrotra P, Nunes MC et al. Risk prediction in aortic valve replacement: incremental value of the preoperative echocardiogram. J Am Heart Assoc 2015;4:e002129.

25. Shahian DM, O'Brien SM, Filardo G, Ferraris VA, Haan CK, Rich JB et al. The Society of Thoracic Surgeons 2008 cardiac surgery risk models: part 3–valve plus coronary artery bypass grafting surgery. Ann Thorac Surg 2009;88(Suppl 1):S43–62.

26. Nashef SA, Roques F, Sharples LD, Nilsson J, Smith C, Goldstone AR et al. EuroSCORE II. Eur J Cardiothorac Surg 2012;41:734–44.

27. Barili F, Pacini D, Capo A, Rasovic O, Grossi C, Alamanni F et al. Does EuroSCORE II perform better than its original versions? A multicentre validation study. Eur Heart J 2013;34:22–9.

28. Kappetein AP, Head SJ, Généreux P, Piazza N, van Mieghem NM, Blackstone EH et al. Updated standardized endpoint definitions for transcatheter aortic valve implantation: the Valve Academic Research Consortium-2 consensus document. Eur J Cardiothorac Surg 2012;42:S45–60.

29. Akins CW, Miller DC, Turina MI, Kouchoukos NT, Blackstone EH, Grunkemeier GL et al. Guidelines for reporting mortality and morbidity after cardiac valve interventions. J Thorac Cardiovasc Surg 2008;135:732–8.

Author notes

†Presented at the Postgraduate Course of the 30th Annual Meeting of the European Association for Cardio-Thoracic Surgery, Barcelona, Spain, 2 October 2016.

‡The first two authors contributed equally to this study.