Norman R Williams, Hannah Patrick, Francesca Fiorentino, Alexander Allen, Manuj Sharma, Mišel Milošević, Fergus Macbeth, Tom Treasure, Pulmonary Metastasectomy in Colorectal Cancer (PulMiCC) randomized controlled trial: a systematic review of published responses, European Journal of Cardio-Thoracic Surgery, Volume 62, Issue 1, July 2022, ezac253, https://doi.org/10.1093/ejcts/ezac253
Abstract
The objective of this review was to assess the nature and tone of the published responses to the Pulmonary Metastasectomy in Colorectal Cancer (PulMiCC) randomized controlled trial.
Published articles that cited the PulMiCC trial were identified from Clarivate Web of Science (© Clarivate 2021). Duplicates and self-citations were excluded and relevant text was extracted. Four independent researchers rated the extracts independently using agreed scales for the representativeness of trial data and the textual tone. The ratings were aggregated and summarized. Two PulMiCC authors carried out a thematic analysis of the extracts.
Sixty-four citations were identified and relevant text was extracted and examined. The consensus rating for data inclusion was a median of 0.25 out of 6 (range 0–5.25, interquartile range 0–1.5) and, for textual tone, the median rating was 1.87 out of 6 (range 0–5.75, interquartile range 1–3.5). The majority of citations did not provide adequate representation of the PulMiCC data and the overall textual tone was dismissive. Although some were supportive, many discounted the findings because the trial closed early and was underpowered to show non-inferiority. Two misinterpreted the authors’ conclusions but there was an acceptance that 5-year survival was much higher than widely assumed.
Published comments reveal a widespread reluctance to consider seriously the results of a carefully conducted randomized trial. This may be ‘motivated reasoning’, prompted by results that challenge accepted practice. There is also a widespread misunderstanding: although PulMiCC, with 93 patients, was underpowered to test non-inferiority, it still provides reliable evidence to undermine the widespread belief in a major survival benefit from metastasectomy.
INTRODUCTION
It ought to be remembered that there is nothing more difficult to take in hand, more perilous to conduct, or more uncertain in its success, than to take the lead in the introduction of a new order of things.
This … arises partly … from the incredulity of men, who do not readily believe in new things until they have had a long experience of them. Niccolò Machiavelli, The Prince
The prospective cohort study Pulmonary Metastasectomy in Colorectal Cancer (PulMiCC), with its embedded randomized controlled trial (RCT), was presented to the European Association for Cardiothoracic Surgery at its 35th Annual Meeting in Barcelona, in 2021. The study was proposed at the Workshop on Pulmonary Metastasectomy in Cluj-Napoca, Romania, in 2006 before a meeting of the European Society of Thoracic Surgeons [1]. The launch of the PulMiCC trial was announced in the full report of the European Society of Thoracic Surgeons Lung Metastasectomy Project in 2010 [2].
Lung metastasectomy for colorectal cancer (CRC) is based on surgical follow-up studies since the 1970s, gaining momentum in the 1980s and 1990s [3]. The publication in 1997 of the International Registry of Lung Metastases with data on 5206 patients [4] showed that survival after metastasectomy was better if the metastases were solitary and there was a longer elapsed time since primary resection. There were no RCTs, but there had been 1 comparative study published in 1980 [5]. Åberg and colleagues identified patients in the pre-metastasectomy era whose characteristics would make them candidates for metastasectomy in the 1980s. They found, in a small series of a dozen patients, that survival was similar to that attributed to lung metastasectomy (Fig. 1). Åberg and Treasure [6] returned to the debate in 2016 in an EJCTS editorial making the case for an evidence-based approach. Schirren et al. [7] countered in an editorial opening ‘Surgery for lung metastases is a pillar of modern thoracic surgery’. The cited data for CRC showed a 60% 5-year survival. The Society of Thoracic Surgeons (STS) Expert Consensus Document states that ‘metastatic disease survival is assumed to be zero’ [8]. The implication is that all 5-year survival is attributable to metastasectomy.
Figure 1: Control patients were sought from the records in the era before the introduction of lung metastasectomy, from which time they would have been candidates for the operation. Original graph by Åberg [5] reproduced with permission. There is no difference in 5-year survival. Survival for non-operated patients was about 20%, not zero.
The two-stage PulMiCC study recruited well; 512 patients consented to stage 1 registration in 25 centres between 2010 and 2016 [9]. After assessment, 28 did not fit the study inclusion criteria leaving 484 patients. After stage 2 consent, 93 patients were randomly assigned to lung metastasectomy or control [10]. The sample size calculation for non-inferiority of control versus metastasectomy required 300 randomized patients. The opinions underpinning 2 prominently published papers [7, 8] indicating a supposed 5-year survival gain, attributable to metastasectomy, from 0% to 60% made randomization difficult. Accrual into the RCT slowed and the trial steering committee closed the trial and instigated analysis [9, 10]. Before making the decision, the trialists were asked to investigate the reasons for not randomizing. Full details are provided in the first RCT report [9] and the full 512 participant cohort study report [11]. The 3 most actively recruiting centres analysed reasons in 155 non-randomized patients. Of them, 41 fully informed patients had chosen to make their own decision and chose metastasectomy or not in approximately equal numbers. For 78 fully eligible patients, the clinician made the decision and 99% had a metastasectomy [9, 11]. Of the 36 remaining patients, 10 had primary lung cancer and 9 were deemed ineligible by local team decisions.
The elective cohort provided a wealth of trial-quality prospective data on 391 patients, of whom 263 (67%) had an elective lung metastasectomy [11]. Five-year survival (Graphical Abstract) after metastasectomy was 58.5% [95% confidence interval (CI): 52.0–64.8] confirming that the PulMiCC cohort replicated the best of ‘real-world’ results. Critically important were the hitherto missing data on those who were clinically selected to not have metastasectomy. Their 5-year survival, 24.0% (95% CI: 16.9–31.9), was much higher than assumed.
From baseline data characteristics collected to RCT standards, there is reliable information about prognostic factors. The proportion of solitary metastases in the 263 electively operated patients was 69% vs 35% in the 128 unoperated. Fewer operated patients had raised carcinoembryonic antigen (CEA) (12% vs 20%). By meta-analysis [12], the hazard ratios were 2.04 for non-solitary metastases and 1.91 for elevated CEA. That is about twice the likelihood of death for each. The 5-year death rates were 41% and 76%, a difference compatible with the hazard ratios. Also, fewer operated patients had liver involvement (28% vs 36%); they had better lung function (FEV1 96% vs 87%) and a higher rate of zero Eastern Cooperative Oncology Group performance scores (68% vs 36%) and were on average 5 years younger (67 vs 72 years). In the RCT, there was excellent balance in metastasis numbers, CEA, primary cancer stage, the interval since primary resection, liver involvement, lung function, performance status, age and sex. There was no survival difference at any time point. We cannot escape the conclusion that the perception of survival benefit in uncontrolled observational follow-up studies is mainly—maybe all—due to the selection of those more likely to survive. Because of the wide confidence intervals, we cannot exclude a small eventual difference in survival but it cannot be as large as is widely believed or what patients are told.
Opening the discussion at European Association for Cardiothoracic Surgery in 2021, Tim Batchelor remarked that the PulMiCC trial had received ‘a mixed reception’. That prompted this systematic review of publications citing the PulMiCC RCT [9, 10] to investigate its reception.
MATERIALS AND METHODS
Ethics statement
The material for analysis is from 64 papers, all of which are published [14–77]. Central ethical approval for the PulMiCC trial was granted by the National Research Ethics Committee London—Hampstead (No. 10/H0720/5) but the work referred to here is all previously published [9–11].
Search methods
A search was conducted for publications providing survival results of the PulMiCC RCT and the cohort study [9–11]. Citations were derived from Clarivate Web of Science (© Copyright Clarivate 2021) on 31 October 2021. The publications were searched for content related to PulMiCC.
Rating publications citing the PulMiCC randomized controlled trial
Potential raters were identified by Hannah Patrick and Tom Treasure. They needed the experience of systematic reviewing and no prior involvement in PulMiCC. Four PulMiCC-independent researchers volunteered: Alexander Allen, Francesca Fiorentino, Hannah Patrick and Manuj Sharma.
Before presenting the material to the raters, the papers were filed in alphabetical order by the surname of the first author and individually searched for all text related to PulMiCC. Blocks of text were extracted and copied verbatim, including all statements and comments about PulMiCC, erring on the side of over inclusion. Word counts were made with MS Word and quartiles were calculated using MS Excel. The blocks of text were assigned an identity by sequential numbering masking the authors and their affiliations.
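The word counts and quartiles described above can be reproduced outside MS Word and MS Excel. The following is a minimal sketch, assuming whitespace-delimited word counting and the inclusive quartile method (chosen on the assumption that it corresponds to Excel's QUARTILE.INC; the source does not say which Excel function was used). The extracts and the helper names `word_count` and `quartiles` are hypothetical.

```python
from statistics import quantiles


def word_count(text):
    """Count whitespace-delimited words in one extracted block of text."""
    return len(text.split())


def quartiles(counts):
    """Return [Q1, median, Q3] using the inclusive method (an assumption)."""
    return quantiles(sorted(counts), n=4, method="inclusive")


# Hypothetical extracts for illustration only; these are not quotations from the 64 papers.
extracts = [
    "The trial closed early",
    "The trial was stopped because of poor accrual and was underpowered",
    "Results are relevant to multidisciplinary discussions",
]

counts = [word_count(text) for text in extracts]
print(counts, quartiles(counts))
```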
The extent of representation of PulMiCC data from none to a full summary of the results was rated using a numeric ordinal rating system from 0 to 6. On a similar system, the tone of the comments from dismissive to supportive was rated 0–6.
The agreed scales were:
(1) Representation of the data in the PulMiCC RCT
None | Omits CIs and significance | Representative | Full and fair summary | |||
---|---|---|---|---|---|---|
0 | 1 | 2 | 3 | 4 | 5 | 6 |
(2) Tone of the text related to the PulMiCC RCT
Dismissive | Balanced appraisal | Supportive | ||||
---|---|---|---|---|---|---|
0 | 1 | 2 | 3 | 4 | 5 | 6 |
Working individually in undisclosed practice runs, the raters were invited to refine and agree on the scales as fit for purpose. They then returned their 64 ratings individually. These were entered on a single spreadsheet by the corresponding author. The standard deviation of the 4 ratings was calculated for each of the 64 papers giving a simple but robust indication of the spread of the ratings. The rows of 4 ratings were colour coded to indicate how close or dispersed they were and returned to the raters who could see their colleagues’ ratings alongside their own, providing an opportunity to reconsider them in a Delphi consensus process. Inter-rater reliability was assessed by ordinal weighted agreement coefficients and confidence limits calculated by the method of Gwet [13] [Inter-Rater Reliability using the SAS System, 2nd Edition, K Gwet, 2021, AgreeStat Analytics] using SAS software (copyright © 2021 SAS Institute Inc., Cary, NC, USA).
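The per-paper spread calculation described above can be sketched as follows. This is an illustration only: the ratings are invented, the function name `summarize_ratings` is hypothetical, and the sample standard deviation is used because the source does not say whether the sample or population form was calculated.

```python
from statistics import mean, stdev


def summarize_ratings(ratings_by_paper):
    """Return {paper_id: (mean, sample SD)} for the raters' scores on each paper."""
    summary = {}
    for paper_id, ratings in ratings_by_paper.items():
        summary[paper_id] = (mean(ratings), stdev(ratings))
    return summary


# Hypothetical ratings from 4 raters on the 0-6 scale (not data from the review).
example = {
    1: [0, 0, 0, 0],  # unanimous
    2: [1, 2, 2, 3],  # moderate spread
    3: [0, 2, 4, 6],  # widely dispersed, a candidate for reconsideration in round 2
}

for paper_id, (avg, sd) in summarize_ratings(example).items():
    print(paper_id, round(avg, 2), round(sd, 2))
```

A larger standard deviation flags a paper on which the raters disagreed, which is the information the colour coding conveyed to the raters before the second round.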
RESULTS
Of 123 titles found, 46 were by PulMiCC authors and in 12 there was no reference found to PulMiCC (Fig. 2). There was only one independent citation to the cohort study, which was excluded leaving 64 publications for analysis [14–77], of which 40 cited the preliminary report [9], 12 cited the full RCT report [10], and 12 cited both.

Figure 2: Sankey flow diagram. On the left are the numbers of publications retrieved. Citations to either or both randomized controlled trial reports (N = 64) are categorized above. Reasons for exclusion (N = 59) are shown below. More details are in the text.
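The counts in the flow diagram can be checked with simple arithmetic. The sketch below uses only the numbers reported in the text; the dictionary labels are paraphrases, and treating the three citation groups as mutually exclusive is an assumption.

```python
# Numbers as reported in the Results; labels are paraphrased for illustration.
retrieved = 123
excluded = {
    "by PulMiCC authors": 46,
    "no reference to PulMiCC found": 12,
    "cited the cohort study only": 1,
}
included = retrieved - sum(excluded.values())

# Assumed to be mutually exclusive groups of the included publications.
cited = {
    "preliminary report only": 40,
    "full RCT report only": 12,
    "both reports": 12,
}

print("excluded:", sum(excluded.values()))
print("included:", included)
print("citation groups sum to included:", sum(cited.values()) == included)
```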
All 64 publications were from teams engaged in the local treatment of lung metastases. Three sub-groups were identified: 28 original research reports, 24 opinion pieces (editorials, commentaries and letters) and 12 reviews (10 narrative and 2 systematic) (Fig. 2). The contributors were thoracic surgical teams (30/64), other interventionalists (16/64), colorectal oncology multidisciplinary groups (12/64), head and neck surgeons (4/64) and hepatobiliary surgical groups (2/64). Of the 28 research papers, 17 were follow-up studies on the treatment of lung metastases (Table 1).
Author | Start year | End year | Study | Intervention | Pathology | N | Solitary mets (%) | Survival 5 years (%) | Median months |
---|---|---|---|---|---|---|---|---|---|
Corsini [28] | 2011 | 2017 | SC F-up | Surgery | Colorectal | 194 | NF | 57 | 76 |
Dudek [30] | 2008 | 2018 | SC F-up | Surgery | Head and neck | 44 | 48 | 41 | 28 |
Dudek [31] | 2008 | 2018 | SC F-up | Surgery | Mixed | 281 | 57 | 47 | NF |
Forster [32] | 2003 | 2018 | SC F-up | 1st surgery | Mixed | 198 | 61 | 56 | NF |
| | | | | Repeat | Mixed | 66 | NF | 79 | 32 |
Fukada [33] | 2000 | 2019 | SC F-up | Surgery | Colorectal | 126 | 71 | 61 | |
Gossling [37] | 1985 | 2019 | SC F-up | Surgery | Colorectal | 59 | 53 | 50 | 58 |
Mammana [42] | 2001 | 2017 | SC F-up | Surgery | Colorectal | 129 | 89 | NF | 90 |
Markowiak [44] | 2009 | 2017 | SC F-up | Surgery | Mixed | 251 | 86 | 50 | 61 |
Palma [55] | 2012 | 2016 | RCT | SABR | Mixed | 66 | 46 | 42 | 50 |
| | | | | Control | | 33 | 36 | 18 | 28 |
Sponholz [60] | 1999 | 2014 | SC F-up | Surgery | Colorectal | 233 | 47 | 47 | 57 |
van Dorp [65] | 2012 | 2017 | DLCA | Surgery | Mixed | 2090 | 70 | NA | NA |
Vidarsdottir [66] | 2000 | 2014 | SC F-up | Surgery | Colorectal | 216 | 70 | 56 | 68 |
Yaftian [68] | 2000 | 2017 | MC F-up | Surgery | Mixed | 476 | 58 | 50 | NF |
Yildiz [69] | 2012 | 2019 | SC F-up | Surgery | Colorectal | 33 | 91 | NF | 55 |
Yun [70] | 2011 | 2017 | SC F-up | Surgery | Colorectal | 173 | 61 | 52 | NF |
Zhao [73] | 2001 | 2018 | PMS | Surgery | Nasopharyngeal | 45 | NF | 76 | NF |
| | | | | Control | | 22 | NF | 48 | NF |
Zhong [75] | 2008 | 2014 | RSC | RFA ± surgery | Colorectal | 60 | NF | 44 | 52 |
DLCA: Dutch Lung Cancer Audit; MC F-up: multicentre follow-up study; NA: not available; NF: not found; PMS: propensity matched case control study; RCT: randomized controlled trial; RFA: radiofrequency ablation; RSC: retrospective single centre; SABR: stereotactic ablative radiotherapy; SC F-up: single-centre follow-up study.
The 64 blocks of texts provided for the raters totalled 8444 words, of individual lengths varying from 19 to 673 (median 81, interquartile range 39–164). The numbers of each of the 0–6 ratings assigned in the first and second rounds and the range, median and interquartile range are given in Table 2. The inter-rater reliability association coefficients (Gwet’s AC2 with confidence interval) are given in Table 3 and Fig. 3 showing higher association coefficients after the second round. There were some differences in inter-rater reliability within the subgroups of papers. The presentation of PulMiCC numerical data was predictably the easier task and had ‘very good’ reliability. The rating of textual tone showed more variation, but the association was ‘good’ for all categories.

Figure 3: The inter-rater reliability coefficients in Table 3 are shown in vertical bars (left for data and right for textual tone) with the 95% confidence intervals for all 64 publications (All) and for opinion pieces (Op), research papers (Res) and reviews (Rev).
Quartiles | Data R1 | Data R2 | Text R1 | Text R2 |
---|---|---|---|---|
Minimum | 0 | 0 | 0 | 0 |
25% | 0 | 0 | 0 | 0 |
Median | 1 | 1 | 3 | 2 |
75% | 1 | 1 | 3 | 2 |
Maximum | 4 | 2 | 4 | 4 |
Scale 0–6 | Data R1 | Data R2 | Text R1 | Text R2 |
0 | 30 | 31 | 2 | 2 |
1 | 22 | 27 | 17 | 29 |
2 | 7 | 6 | 12 | 23 |
3 | 4 | 0 | 23 | 6 |
4 | 1 | 0 | 7 | 4 |
5 | 0 | 0 | 3 | 0 |
6 | 0 | 0 | 0 | 0 |
Total | 64 | 64 | 64 | 64 |
Upper panel: distribution of the dispersion of the 4 ratings for each publication, where 0 indicates unanimity and 1 indicates no ratings more than 1 rank apart, with a possible maximum of 6. Ratings were for data and textual tone. The dispersion was reduced by the Delphi process. Lower panel: the averaged ratings from 0 to 6. These data are illustrated in Fig. 4.
N | Category | Round 1 | Round 2 | Diff | SE (diff) | t-Stat | P-Value |
---|---|---|---|---|---|---|---|
Data | |||||||
64 | All publications | 0.9411 | 0.9645 | 0.0234 | 0.0107 | 2.1820 | 0.0328 |
24 | Opinion | 0.9492 | 0.9629 | 0.0137 | 0.0108 | 1.2674 | 0.2177 |
28 | Research | 0.9394 | 0.9485 | 0.0091 | 0.0071 | 1.2834 | 0.2102 |
12 | Reviews | 0.8481 | 0.9440 | 0.0958 | 0.0667 | 1.4368 | 0.1786 |
Text | |||||||
64 | All publications | 0.6293 | 0.7739 | 0.1446 | 0.0262 | 5.5252 | <0.0001 |
24 | Opinion | 0.6344 | 0.7837 | 0.1494 | 0.0476 | 3.1389 | 0.0046 |
28 | Research | 0.6244 | 0.6858 | 0.0614 | 0.0401 | 1.5316 | 0.1373 |
12 | Reviews | 0.7437 | 0.7745 | 0.0308 | 0.0404 | 0.7630 | 0.4615 |
The table shows inter-rater agreement coefficients in the 2 rounds. The t-stat is the statistic from a paired t-test for testing 2 correlated agreement coefficients, assuming ordinal weights using the method described by Gwet. All the agreement coefficients increased significantly (P < 0.05) when all 64 publications were considered. See also Fig. 3 depicting the data from round 2.
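The internal consistency of the table can be checked by recomputing two columns from the others. The sketch below assumes that Diff is round 2 minus round 1 and that the t-statistic is Diff divided by its standard error; small discrepancies are expected because the published values are rounded to 4 decimal places. It does not reproduce Gwet's method for estimating the coefficients or their standard errors, and the names `TABLE_3` and `recompute` are hypothetical.

```python
# (label, round 1, round 2, reported diff, SE of diff, reported t) copied from the table.
TABLE_3 = [
    ("Data, all publications", 0.9411, 0.9645, 0.0234, 0.0107, 2.1820),
    ("Data, opinion", 0.9492, 0.9629, 0.0137, 0.0108, 1.2674),
    ("Data, research", 0.9394, 0.9485, 0.0091, 0.0071, 1.2834),
    ("Data, reviews", 0.8481, 0.9440, 0.0958, 0.0667, 1.4368),
    ("Text, all publications", 0.6293, 0.7739, 0.1446, 0.0262, 5.5252),
    ("Text, opinion", 0.6344, 0.7837, 0.1494, 0.0476, 3.1389),
    ("Text, research", 0.6244, 0.6858, 0.0614, 0.0401, 1.5316),
    ("Text, reviews", 0.7437, 0.7745, 0.0308, 0.0404, 0.7630),
]


def recompute(row):
    """Return (round 2 minus round 1, reported diff divided by its SE)."""
    _, round1, round2, reported_diff, se, _ = row
    return round2 - round1, reported_diff / se


for row in TABLE_3:
    diff, t_stat = recompute(row)
    print(f"{row[0]}: diff {diff:.4f} (reported {row[3]}), t {t_stat:.3f} (reported {row[5]})")
```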
The majority of ratings for data content were <2 (58/64) indicating that they did not provide sufficient PulMiCC data to inform a reader (Table 2 and Fig. 4). For textual tone, the comments were predominantly dismissive, with 35/64 rated at ≤2, rather than balanced or supportive. The patterns of the presentation of the data and textual tone are illustrated in Fig. 4 and the relationship between them is illustrated in Fig. 5.

Figure 4: Above, the arithmetic mean of the 4 ratings for the presentation of PulMiCC data, set out in ascending sequence to aid interpretation; below, a similar display of the ratings of textual tone.

Figure 5: The ratings of PulMiCC ‘data presentation’ (horizontal axis) plotted against the ratings of ‘textual tone’ (vertical axis). The overall trend was that publications presenting fuller data had a more supportive textual tone. Top left are authors who cited PulMiCC and commented favourably without providing data in support. R = 0.43, r² = 0.185; linear regression y = 0.39x + 1.85 (95% confidence interval for the slope: 0.18–0.60).
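As a worked illustration of the reported fit, the sketch below reads the regression as textual tone = 0.39 × data rating + 1.85, which is an interpretation of the caption rather than a statement from the authors. It shows that the reported r² is the square of R and what tone rating the line predicts at a given data rating. The function name `predicted_tone` is hypothetical.

```python
# Coefficients as reported in the figure caption.
SLOPE = 0.39
INTERCEPT = 1.85
R = 0.43


def predicted_tone(data_rating):
    """Tone rating predicted by the fitted line for a data rating on the 0-6 scale."""
    return SLOPE * data_rating + INTERCEPT


# Share of the variance in tone accounted for by the data rating.
r_squared = R ** 2

print(round(r_squared, 3))
for rating in (0, 2, 4):
    print(rating, round(predicted_tone(rating), 2))
```

On this reading, a publication presenting no PulMiCC data would be expected to have a tone rating below the midpoint of the scale, and less than a fifth of the variation in tone is accounted for by the data rating.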
Textual analysis
Opinions on the methods of the PulMiCC trial
Of the 6 publications commenting on the method and conduct of the trial, 5 were favourable [16, 18, 40, 53, 54]. It was noted to be ‘the world’s only randomized pulmonary metastasectomy study’ and that the ‘well-constructed study showed no advantage in the surgical arm’ [16]. Other favourable comments were ‘The results of the PulMiCC study are impressive’ [40] and ‘The PulMiCC study had the most interesting design and showed no advantage from lung metastasectomy’ [53].
There was only one response overtly critical of the nature of the study. In this rhetorical question, the authors invoked the parachute analogy. ‘Would you perform a randomized trial of whether to deploy a parachute when jumping out of an airplane at high altitudes?’ [56]
Power considerations
Twelve publications referred to the question of the ‘power’ of PulMiCC, but the statistical issues were not addressed in any detail [15, 20, 25, 35, 36, 39, 50, 58, 64, 65, 67, 70]. Most of the publications questioning the power failed to include any substantial data that could support their claim.
Reasons for discounting the conclusions of the PulMiCC trial
Nine texts described PulMiCC as a ‘failure’ [16, 77] or a ‘failed trial’ [24, 30, 51, 55, 58, 65, 67] often in a short comment after which its findings were, to varying degrees, discounted.
Fourteen publications said that the trial was ‘stopped’ [19–21, 25, 37, 38, 41, 43, 45, 50, 52, 62, 63, 70, 77], that it ‘closed early’ [22, 29, 32, 57, 59, 64, 69, 77] or ‘prematurely’ [26, 51, 53, 54, 76].
The numbers of patients in the RCT—65 in the first report [9] and 93 in the full report [10]—were often seen as sufficient reason to discount the findings, for example ‘small sample size precluded definitive conclusions’ [15], ‘due to poor accrual’ [20], ‘failed to accrue patients adequately’ [24], ‘insufficient number’ [31], ‘small sample size’ [34] and ‘poor recruitment’ [35].
However, 1 author interpreted the small size as evidence of resistance of the clinical teams: ‘Its small size again bears testament to entrenched surgical practice’ [16].
PulMiCC’s refutation of the assumed zero survival without metastasectomy
One firm conclusion of PulMiCC was that the STS ‘zero survival’ assumption [8] was refuted. This was acknowledged in 10 publications [5, 17, 18, 20, 23, 34, 44, 65, 67, 74], with 1 author, counter to the STS Consensus Statement [8], writing ‘The 5-year OS of large numbers of unselected patients with stage IV colorectal cancer has been >8% even before potential improvements from recent advances in systemic therapies’ [23].
The question of survival benefit
Three commentators explicitly considered that the PulMiCC RCT results were a signal that there might be no survival benefit [15, 39, 76] or ‘that any survival advantage from resection of colorectal lung metastases is, in all likelihood, very much smaller than has been assumed’ [48]. But it was clear that most authors discounted the possibility that there was no benefit, and there seemed to be a note of incredulity in this statement: ‘some authors openly doubt that PM might even provide a survival advantage’ [33]. Others suggested that the results show benefit: ‘the partial results of this suspended trial should be considered, in my opinion, as further support for the local treatment of pulmonary metastases’ [59]. Another paper misinterpreted the conclusion of the PulMiCC trialists, writing ‘We agree with the authors of the PulMiCC trial, who state that (although non-significant), a hazard ratio of 0.82 suggests that it is likely that in some patients, for whom isolated lung metastasis remains the only remnant of their otherwise fully-treated colorectal cancer, pulmonary metastasectomy is likely to convey benefit’ [65].
Is PulMiCC applicable to practice?
Amongst the comments was one specifically noting the applicability to clinical practice: ‘These findings are interesting, relevant, and important to keep in mind during both multidisciplinary tumour board discussions as well as in informed consent discussion with patients’ [18]. There were some personal plaudits for the PulMiCC trialists who were called ‘the true trail blazers in challenging established dogma surrounding the treatment of CRC metastases’ and ‘The PulMiCC trial was intrepid in reminding us to assess the true benefit of therapies we provide, particularly in conditions like metastatic CRC where cure is rarely guaranteed’ [15].
DISCUSSION
The analysis of the 64 citing publications showed a ‘mixed reception’ to the PulMiCC trial, with opinions ranging from dismissal of the trial’s worth to some very supportive comments (Fig. 4). But only a small minority reported the results adequately. Most oversimplified the key issues and 44/64 were rated <3 on the 0–6 scale. But those who commented on the PulMiCC conclusions implicitly agreed that the 5-year survival benefit from metastasectomy was likely to be much less than the widely believed 40%, and none of the 64 publications restated the zero assumption of the STS Consensus Statement [8].
The phrase ‘a negative trial’ for a study that does not show a treatment effect is an oversimplification. High-quality RCT data where none existed before may answer part of the bigger question. Most observational studies report ∼40% 5-year survival [12] and by implication attribute all of that to metastasectomy. PulMiCC is the first study to report on potential metastasectomy candidates who remained unoperated, and their survival was 20–30% [9, 10]. So the prior belief in a ∼40% improvement in the 5-year survival from metastasectomy is seriously challenged. A mathematical modelling study using cancer registry data, undertaken during the planning of the sample size for the PulMiCC study, had found that the 40% 5-year survival then widely reported [3] could be explained by case selection [78]. The improvement attributable to metastasectomy may in fact be only 10% or less. In the Introduction, we stress the importance of the non-randomized cohort of 391 operated and non-operated patients [11] in giving context to the RCT data (Fig. 2). As seen in the visual abstract, the elective non-operated cohort, who were the less-favoured patients, shows a clear refutation of the ‘zero survival’ assumption.
No quantitative data were found in 31/64 citing publications and a further 27 (ratings of <2/6) gave no more than the number of randomized patients. In publications where there were little or no data, there was often a summary dismissal of the PulMiCC RCT (Fig. 5). In fairness, at the time of writing, the authors of 40/64 of the publications had only seen the incomplete report of 65 randomized patients, published in 2019. The full report was published in 2020, but it seems that it was not seen by many of the 64 sets of authors. This makes no material difference, however, because the increased number, from 65 to 93 randomized patients, narrowed the confidence intervals but did not change the conclusion. The impression is that the authors were very ready to disregard PulMiCC. Some controlled data are surely an improvement on none at all. The STS Expert Consensus had stated ‘Only a randomized clinical trial will definitively determine the value of PM for colorectal cancer’. The findings of the PulMiCC study in its totality, with patients treated by clinical decision and random assignment, make high ‘value’ unlikely.
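The point made above, that moving from 65 to 93 randomized patients narrowed the confidence intervals, can be illustrated with a rough calculation. The sketch below is not the trial's analysis. It assumes that the width of a confidence interval scales with 1/√n and that the number of events grew in proportion to the number of patients; the function name `relative_ci_width` is hypothetical.

```python
import math


def relative_ci_width(n_before: int, n_after: int) -> float:
    """Ratio of confidence-interval widths (after / before).

    Assumption (not from the trial report): interval width is
    proportional to 1 / sqrt(n), so the ratio is sqrt(n_before / n_after).
    """
    if n_before <= 0 or n_after <= 0:
        raise ValueError("sample sizes must be positive")
    return math.sqrt(n_before / n_after)


# First report: 65 randomized patients; full report: 93.
ratio = relative_ci_width(65, 93)
print(f"CI width ratio (93 vs 65 patients): {ratio:.2f}")  # prints 0.84
```

Under these assumptions the interval narrows by roughly 16%, a modest gain in precision.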
All 64 of the publications citing the PulMiCC RCT were from authors involved in the local treatment of lung metastases and among them were 17 reports of clinical follow-up with a total of 4795 patients (Table 1). There may well be vested interests in maintaining the status quo, which raises the possibility of ‘motivated reasoning’. Human beings tend to place more reliance on information that confirms their beliefs and seek arguments against evidence that contradicts them [79]. But as can be seen from the textual analysis, there were also laudatory statements commending the trialists for their efforts. A group of hepato-biliary authors agreed that the practice of metastasectomy had expanded due to a lack of contrary evidence, subtitling their letter ‘When to Draw the Line’ [15].
The ‘pillar of modern thoracic surgery’ authors alluded to the parachute analogy [7] as did one of the commenting papers in this rhetorical question: ‘Would you perform a randomized trial of whether to deploy a parachute when jumping out of an airplane at high altitude?’ [56] The analogy is to a circumstance when death is virtually certain within seconds, but patients who are offered lung metastasectomy are not at imminent risk of death. Lung metastasectomy is not remotely analogous to a parachute jump [80].
It is perhaps understandable that so many authors dismissed the findings of the PulMiCC RCT, especially on seeing the first publication, which included only 65 patients. However, 7 of the 17 clinical reports (Table 1) [30, 32, 37, 55, 69, 73, 75] drew conclusions from groups of 33–66 patients (mean 43), which is fewer than PulMiCC, yet their authors nevertheless commented on the RCT as small [37, 73], insufficient [30], poorly accruing [55, 69] and a failure [75]. Now that the full study including the observational cohorts is published, the results need to be considered carefully and the uncertainty about the extent of benefit from metastasectomy should be acknowledged and honestly discussed with those highly selected patients to whom it is offered.
Thoracic surgeons are aware of the incursions being made into the treatment of lung metastases by stereotactic radiotherapy (SABR/SBRT) and image-guided thermal ablation. Trials making direct comparisons between these methods have been mooted, to see if similar results can be shown with less-invasive methods. If such trials are proposed and designed, funders and ethicists might reasonably ask to check on the foundations of what is claimed to be ‘a pillar of thoracic surgery’. Amongst the many things we have learned in the time of COVID-19 is that it is possible to get regulatory approval at speed and that patients are willing to come forward to be randomized, but this can only happen if the medical profession can admit to not having all the answers.
Presented at the 35th Annual Meeting of the European Association for Cardio-Thoracic Surgery, Barcelona, Spain, 13–16 October 2021.
Acknowledgements
We are grateful to the 512 patients, the many research nurses, trials staff and the 25 Principal Investigators at the local sites who contributed to the PulMiCC study.
Funding
PulMiCC was funded by Cancer Research UK Grant No. C7678/A11393. The work for this review was not funded and was undertaken with the help of researchers who gave their time unpaid.
Conflict of interest: none declared.
Data Availability Statement
All data used in this study are available on appropriately justified request to the corresponding author.
Author contributions
Norman R. Williams: Conceptualization; Formal analysis; Investigation; Methodology; Project administration; Writing—review & editing. Hannah Patrick: Formal analysis; Methodology; Writing—review & editing. Francesca Fiorentino: Formal analysis; Writing—review & editing. Alexander Allen: Formal analysis; Investigation; Writing—review & editing. Manuj Sharma: Formal analysis; Writing—review & editing. Mišel Milošević: Investigation; Writing—review & editing. Fergus Macbeth: Conceptualization; Formal analysis; Investigation; Writing—review & editing. Tom Treasure: Conceptualization; Formal analysis; Investigation; Methodology; Project administration; Software; Validation; Visualization; Writing—original draft; Writing—review & editing.
Reviewer information
European Journal of Cardio-Thoracic Surgery thanks Larry R Kaiser and the other, anonymous reviewer(s) for their contribution to the peer review process of this article.
References
ABBREVIATIONS
- CEA: Carcinoembryonic antigen
- CRC: Colorectal cancer
- CI: Confidence interval
- PulMiCC: Pulmonary Metastasectomy in Colorectal Cancer
- RCT: Randomized controlled trial
- STS: Society of Thoracic Surgeons