ABSTRACT

Introduction

Lung cancer screening (LCS) uptake is low. Assessing patients’ cigarette pack-years and years since quitting is challenging given the lack of documentation in structured electronic health record data.

Materials and Methods

We used a convenience sample of patients with a chest CT scan in the Veterans Health Administration. We abstracted data on cigarette use from electronic health record notes to determine LCS eligibility based on the 2021 U.S. Preventive Services Task Force age and cigarette use eligibility criteria. We used these data as the “ground truth” of LCS eligibility and compared them with structured data regarding tobacco use and a COPD diagnosis. We calculated sensitivity and specificity and constructed fast-and-frugal decision trees.

Results

For 50-80–year-old veterans identified as former or current tobacco users, we obtained 94% sensitivity and 47% specificity. For 50-80–year-old veterans identified as current tobacco users, we obtained 59% sensitivity and 79% specificity. Our fast-and-frugal decision tree that included a COPD diagnosis had a sensitivity of 69% and a specificity of 60%.

Conclusion

These results can help health care systems make their LCS outreach efforts more efficient and give administrators and researchers a simple method to estimate their number of possibly eligible patients.

INTRODUCTION

Low-dose CT for lung cancer screening (LCS) reduces lung cancer mortality and is widely recommended, but uptake continues to be low.1–5 Among multiple barriers, assessing patients’ pack-years (average packs per day multiplied by years of using cigarettes) and years since quitting is challenging given the lack of documentation in structured electronic health record (EHR) data.6–9 Health care systems must therefore deploy resource-intensive processes to collect data to determine LCS eligibility for individual patients.10 In addition, clinicians and health care systems cannot set performance standards for offering LCS to their patients without a reliable denominator of eligible patients.11 Lastly, researchers and policy makers are hampered by the lack of structured EHR data regarding LCS eligibility when trying to study the effectiveness of LCS using low-dose CT compared with no screening.12

We sought to evaluate the utility of structured EHR data for assessing LCS eligibility. Many organizations have limited structured data on tobacco use, so we also included data on a common medical condition among eligible patients, COPD, which is associated with increased cigarette smoke exposure. We prioritized simple algorithms over multivariable prediction models to emphasize the real-world capacity of clinicians and health care systems to identify patients with high, low, and intermediate likelihood of being eligible for LCS.

METHODS

We used a cohort from the VA Corporate Data Warehouse (CDW) (VA Portland Health Care System Institutional Review Board #3225) as a convenience sample.13–15 First, we included patients from 2012 to 2017 from two systems with structured data codes for incidental pulmonary nodules identified on routine, non-screening, chest imaging. Second, we included randomly selected patients from the same institutions from 2012 to 2017 who obtained a routine, non-screening chest CT without structured data codes for pulmonary nodules.

Trained researchers abstracted data from clinical notes (i.e., unstructured data) in the EHR regarding pack-years and years since quitting cigarettes. Clinical encounter notes were searched in the EHR using the text term “smoke”. We defined the index date as the date of the patient’s chest CT scan that first identified a nodule or their first chest CT scan for non-nodule patients. We first reviewed notes 1 year before the index date to document the most recent pack-years and quit date. If this information was not found in the previous year, we (1) reviewed encounter notes 1 year after the index date, then (2) all encounter notes before the index date, and finally, (3) all notes after the index date, always starting with the note closest in time. We used age at index date.
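The tiered review order described above can be sketched in a few lines of Python. This is only an illustration of the ordering logic, not the abstraction tool the researchers used: the function name `review_order`, the 365-day definition of a year, and the choice to treat a note dated on the index date as "before" are all assumptions made for the example.

```python
from datetime import date, timedelta


def review_order(note_dates, index_date):
    """Order note dates the way the chart review is described:
    tier 0: notes in the year before the index date,
    tier 1: notes in the year after the index date,
    tier 2: all earlier notes,
    tier 3: all later notes,
    with the note closest to the index date first within each tier.
    Assumptions: one year = 365 days; a note on the index date is tier 0.
    """
    year = timedelta(days=365)

    def tier(d):
        if index_date - year <= d <= index_date:
            return 0
        if index_date < d <= index_date + year:
            return 1
        if d < index_date - year:
            return 2
        return 3

    return sorted(note_dates, key=lambda d: (tier(d), abs((d - index_date).days)))
```

A reviewer would walk this list from the front and stop at the first note that documents pack-years and quit date.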

We then retrieved structured data from the VA CDW to determine LCS eligibility compared to manually abstracted data. The VA CDW includes data on tobacco use, not cigarette use per se. We recorded tobacco use status (categorized as never, former, or current) of the temporally closest structured data record in the CDW before or after the index date.13 We defined COPD diagnosis by ICD codes in the CDW (ICD-9: 491.xx, 492.xx, and 493.2; ICD-10: J41.x, J42.x, J43.x, and J44.x), requiring two occurrences at least 1 month apart with at least one code recorded before the index scan. We did not assess for other diagnoses that could impact LCS eligibility.
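As a rough illustration of the COPD definition above, the following Python sketch flags a patient when two qualifying codes are at least 1 month apart and at least one precedes the index scan. It is not the query run against the CDW; the function names, the dotted code format, the 30-day reading of "1 month", and the strict "before" comparison are assumptions made for the example.

```python
from datetime import date

# Code families listed in the text; dotted string format is an assumption.
COPD_PREFIXES = ("491", "492", "493.2", "J41", "J42", "J43", "J44")


def is_copd_code(code):
    """True if the code falls in one of the listed ICD-9 or ICD-10 families."""
    return code.strip().upper().startswith(COPD_PREFIXES)


def has_copd_diagnosis(records, index_date, min_gap_days=30):
    """records: iterable of (icd_code, date_recorded) pairs.

    Requires two qualifying codes at least `min_gap_days` apart, with at
    least one recorded before the index date. Checking the earliest and
    latest qualifying dates is sufficient for both conditions.
    """
    dates = sorted(d for code, d in records if is_copd_code(code))
    if not dates or dates[0] >= index_date:
        return False  # no qualifying code before the index scan
    return (dates[-1] - dates[0]).days >= min_gap_days
```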

We classified the “ground truth” of LCS eligibility from manually abstracted EHR data based on the 2021 U.S. Preventive Services Task Force (USPSTF) criteria: (1) age 50 to 80, (2) at least 20 pack-years of cigarette use, and (3) current cigarette use or former cigarette use with quitting within the past 15 years.2 Participants with missing pack-years and years since quitting were categorized as ineligible. We calculated sensitivity, specificity, balanced accuracy, and positive and negative predictive values. We constructed fast-and-frugal decision trees (FFTs) using R version 4.1.0 and the FFTrees package.16 Fast-and-frugal decision trees are dichotomous classification trees that are designed to efficiently facilitate patient categorization using limited resources.17,18 For the FFTs, we first excluded veterans outside the 50-80–year-old age criteria, leaving an analytic sample of 4,365 subjects. Of note, a small number of veterans were not included in the COPD FFT since they had missing COPD diagnosis data because their CDW identifying information did not perfectly match our cohort identifiers.
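The eligibility rule and accuracy measures above can be written out as a short Python sketch. It is illustrative only and is not the authors' analysis code, which was written in R. The function names are hypothetical, the inclusive 15-year boundary is an assumption, and a missing value (`None`) for pack-years or for years since quitting in a former user is treated as ineligible, mirroring the handling of missing data described in the text.

```python
def pack_years(packs_per_day, years_smoked):
    """Pack-years: average packs per day multiplied by years of cigarette use."""
    return packs_per_day * years_smoked


def lcs_eligible(age, pack_yrs, current_use, years_since_quit):
    """Age and cigarette-use criteria from the 2021 USPSTF recommendation."""
    if not 50 <= age <= 80:
        return False
    if pack_yrs is None or pack_yrs < 20:
        return False
    if current_use:
        return True
    # Former use: must have quit within 15 years (boundary assumed inclusive).
    return years_since_quit is not None and years_since_quit <= 15


def diagnostic_metrics(predicted, truth):
    """Compare a structured-data rule (predicted) with chart review (truth).

    Assumes both classes occur in `truth` and `predicted`, so no
    denominator is zero.
    """
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)
    fp = sum(1 for p, t in pairs if p and not t)
    fn = sum(1 for p, t in pairs if not p and t)
    tn = sum(1 for p, t in pairs if not p and not t)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }
```

Balanced accuracy is the mean of sensitivity and specificity, so a rule with 94% sensitivity and 47% specificity has a balanced accuracy of 70.5%.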

RESULTS

We included 5,289 patients with both text notes from the EHR and at least one structured data tobacco status, of whom 2,177 met the 2021 USPSTF LCS eligibility criteria. Among those eligible, 1,018 (46.8%) and 1,159 (53.2%) patients were former and current cigarette users, respectively, with a mean of 51.8 ± 31.0 pack-years. Among LCS-eligible patients with non-missing COPD diagnosis (n = 2,077), 995 (47.9%) subjects had a COPD diagnosis.

Including both current and former tobacco users had high sensitivity but low specificity (Table I). After applying the 50-80–year-old age criteria and including veterans identified as former or current tobacco users in the EHR, we obtained 94% sensitivity, but specificity was only 47% (model 1). Conversely, after applying the 50-80–year-old age criteria and including veterans identified as current tobacco users, we obtained 59% sensitivity and 79% specificity (model 2).

TABLE I. Model Results

| Model | Sensitivity (%) | Specificity (%) | Balanced accuracy (%) | PPV (%) | NPV (%) | Potentially eligible group (%)a |
| --- | --- | --- | --- | --- | --- | --- |
| Model 1: 50-80 years old and current or former tobacco use | 94 | 47 | 71 | 49 | 94 | 68 |
| Model 2: 50-80 years old and current tobacco use | 59 | 79 | 69 | 60 | 78 | 35 |
| FFT 1: 50-67 years old and current tobacco use | 48 | 76 | 62 | 60 | 66 | 28 |
| FFT 2: 50-80 years old and current tobacco use (or) 50-67 years old and COPD diagnosis | 69 | 60 | 65 | 57 | 72 | 41 |

a Size of the group relative to the original cohort of 5,289 subjects after applying all inclusion/exclusion criteria.

Abbreviations: FFT: fast-and-frugal decision tree; NPV: negative predictive value; PPV: positive predictive value.

The first FFT analysis determined a pathway that eliminated patients not currently using cigarettes (Fig. 1A). Of the remaining patients, it selected patients aged 67 and younger as likely eligible for LCS. The sensitivity was 48%, the specificity was 76%, and the cohort of potentially eligible patients was reduced to 28% of its original size. A second FFT (Fig. 1B) included a total of 4,145 patients, as an additional 220 were excluded for a missing COPD diagnosis status in the structured data. This FFT first classified all patients whose structured data indicated current smoking as likely eligible. Among those remaining, it then excluded subjects without a COPD diagnosis and those older than 67 years. This FFT with COPD diagnosis had a sensitivity of 69% and a specificity of 60%.
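Read as decision rules, the two trees reduce to a handful of sequential checks. The Python sketch below restates them for illustration; it is not output from the FFTrees package, and the function names and the `tobacco_status` string values are assumptions. Both functions apply the 50-80 age filter first, as the analysis did before fitting the trees.

```python
def fft1_likely_eligible(age, tobacco_status):
    """FFT 1: current tobacco use, then age 67 or younger."""
    if not 50 <= age <= 80:
        return False
    if tobacco_status != "current":
        return False  # first exit: not currently using tobacco
    return age <= 67  # second exit: older than 67 is classified unlikely


def fft2_likely_eligible(age, tobacco_status, copd):
    """FFT 2: current tobacco use, or COPD diagnosis and age 67 or younger."""
    if not 50 <= age <= 80:
        return False
    if tobacco_status == "current":
        return True  # first exit: current use is classified likely eligible
    if not copd:
        return False  # second exit: no COPD diagnosis
    return age <= 67  # third exit: older than 67 is classified unlikely
```

Each check is a single yes/no question with an immediate exit, which is what makes the trees usable without a prediction model.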

FIGURE 1.

Fast-and-frugal decision trees, indicating FNs, FPs, TNs, and TPs at each node that met the 2021 USPSTF LCS eligibility criteria: (A) FFT 1 and (B) FFT 2. Abbreviations: FFT = fast-and-frugal decision tree; FN = false negative; FP = false positive; LCS = lung cancer screening; TN = true negative; TP = true positive; USPSTF = U.S. Preventive Services Task Force.

DISCUSSION

We found that more than 40% of patients in our cohort met the 2021 USPSTF LCS eligibility criteria regarding age and cigarette use, and structured EHR data could be used to identify these patients with reasonable accuracy. However, our results suggest that the VA must collect specific data on pack-years and years since quitting if it wants to systematically and equitably offer LCS to all potentially eligible veterans. Our highly sensitive model 1 could be used to exclude many patients who are unlikely to be eligible for LCS. When resources or technical limitations make it challenging to collect cigarette use history information, health care systems could use this model to exclude many patients while rarely missing one who would be potentially eligible for LCS.

Limitations

Documentation of smoking status, pack-years, and time since quitting is subject to reporting errors in the EHR and is based on self-report, which is vulnerable to recall and moderator biases.21,22 Lung cancer screening coordinators or researchers may obtain different information compared to that obtained during routine care.8,21 Veterans Health Administration administrative data classify veterans based on tobacco use status rather than cigarette use per se, although we previously found these data had 81% agreement with EHR information regarding cigarette use.13 Lastly, we evaluated two primary characteristics involved in LCS eligibility, age and cigarette use, but we did not assess for exclusionary factors such as significant comorbidities or desire to undergo screening. Notably, since severe COPD may limit a person’s life expectancy, our COPD FFT may include a substantial number of patients who are ineligible for LCS.19

CONCLUSION

We found two simple models with reasonable, balanced accuracy: model 2 and FFT 2. Using either of these models, health care systems could substantially reduce the number of patients targeted for further outreach or assessment but at a cost of missing many patients who are potentially LCS eligible. Notably, since COPD increases the risk of lung cancer, our second FFT model could be used to target a high-risk subgroup of potentially eligible patients who may particularly benefit from screening.19 Importantly, this study expands on prior work demonstrating that age and tobacco use are reasonably good predictors of LCS eligibility but now addresses this question by using the 2021 USPSTF criteria for eligibility and adding COPD into the model.20

ACKNOWLEDGMENTS

The authors would like to acknowledge the help and support of Matthew Howard, BS; Tara Thomas, BS; Sujata Thakurta, MPA:HA; and Philip Tostado, MA, for their countless hours of manual chart review.

FUNDING

This study and Dr Slatore are supported by an award from the DVA (Clinical Services Research & Development Epidemiology-007-15S and Health Services Research & Development Investigator Initiated Research 16-003). It was also supported by resources from the Center to Improve Veteran Involvement in Care, VA Portland Health Care System, Portland, OR, and the Seattle Epidemiologic Research & Information Center, VA Puget Sound Health Care System, Seattle, WA. Dr Crothers is supported by VA ICU000170A.

CONFLICT OF INTEREST STATEMENT

None declared.

CLINICAL TRIAL REGISTRATION

Not applicable.

INSTITUTIONAL REVIEW BOARD (HUMAN SUBJECTS)

This study was approved by the VA Portland Health Care System Institutional Review Board #3225.

INSTITUTIONAL ANIMAL CARE AND USE COMMITTEE (IACUC)

Not applicable.

INDIVIDUAL AUTHOR CONTRIBUTION STATEMENT

All authors (1) made substantial contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (2) contributed to drafting the article for important intellectual content; and (3) provided final approval of the version to be published. C.G.S. takes responsibility for the content of the manuscript, including data and analysis.

DATA AVAILABILITY

The data underlying this article cannot be shared publicly due to ethical and privacy concerns.

INSTITUTIONAL CLEARANCE

Not applicable.

REFERENCES

1. Jonas DE, Reuland DS, Reddy SM, et al.: Screening for lung cancer with low-dose computed tomography: an evidence review for the U.S. Preventive Services Task Force. Agency for Healthcare Research and Quality (US). 2021. Available at http://www.ncbi.nlm.nih.gov/books/NBK568573/; accessed April 14, 2022.

2. US Preventive Services Task Force: Screening for lung cancer: US Preventive Services Task Force recommendation statement. JAMA 2021; 325(10): 962–70.

3. National Center for Health Promotion and Disease Prevention: Screening for Lung Cancer. Prevention.VA.gov. Available at https://www.prevention.va.gov/preventing_diseases/screening_for_lung_cancer.asp, Updated March 18, 2022; accessed April 14, 2022.

4. Boudreau JH, Miller DR, Qian S, Nunez ER, Caverly TJ, Wiener RS: Access to lung cancer screening in the Veterans Health Administration: does geographic distribution match need in the population? Chest 2021; 160(1): 358–67.

5. Fedewa SA, Bandi P, Smith RA, Silvestri GA, Jemal A: Lung cancer screening rates during the COVID-19 pandemic. Chest 2022; 161(2): 586–9.

6. Triplette M, Thayer JH, Pipavath SN, Crothers K: Poor uptake of lung cancer screening: opportunities for improvement. J Am Coll Radiol 2019; 16(4): 446–50.

7. Carter-Harris L, Gould MK: Multilevel barriers to the successful implementation of lung cancer screening: why does it have to be so hard? Ann Am Thorac Soc 2017; 14(8): 1261–5.

8. Modin HE, Fathi JT, Gilbert CR, et al.: Pack-year cigarette smoking history for determination of lung cancer screening eligibility. Comparison of the electronic medical record versus a shared decision-making conversation. Ann Am Thorac Soc 2017; 14(8): 1320–5.

9. Fathi JT, White CS, Greenberg GM, Mazzone PJ, Smith RA, Thomson CC: The integral role of the electronic health record and tracking software in the implementation of lung cancer screening – a call to action to developers: a white paper from the National Lung Cancer Roundtable. Chest 2020; 157(6): 1674–9.

10. Kinsinger LS, Anderson C, Kim J, et al.: Implementation of lung cancer screening in the Veterans Health Administration. JAMA Intern Med 2017; 177(3): 399–406.

11. Kahn JM, Gould MK, Krishnan JA, et al.: An official American Thoracic Society workshop report: developing performance measures from clinical practice guidelines. Ann Am Thorac Soc 2014; 11(4): S186–95.

12. Carson SS, Goss CH, Patel SR, et al.: An official American Thoracic Society research statement: comparative effectiveness research in pulmonary, critical care, and sleep medicine. Am J Respir Crit Care Med 2013; 188(10): 1253–61.

13. Golden SE, Hooker ER, Shull S, et al.: Validity of Veterans Health Administration structured data to determine accurate smoking status. Health Informatics J 2020; 26(3): 1507–15.

14. Hedstrom GH, Hooker ER, Howard M, et al.: The chain of adherence for incidentally-detected pulmonary nodules after an initial radiologic imaging study: a multi-system observational study. Ann Am Thorac Soc 2022; 19(8): 1379–89.

15. VA Information Resource Center: VIReC Research User Guides: VA Corporate Data Warehouse. VIReC.Research.VA.gov. Available at https://www.virec.research.va.gov/Resources/RUGs.asp, Updated June 27, 2016; accessed March 8, 2022.

16. Phillips N, Neth H, Woike J, Gaissmaier W: FFTrees: A toolbox to create, visualize, and evaluate fast-and-frugal decision trees. Judgm Decis Mak 2017; 12(4): 344–68.

17. Green L, Mehr DR: What alters physicians’ decisions to admit to the coronary care unit? J Fam Pract 1997; 45(3): 219–26.

18. Zusman NL, Radoslovich SS, Smith SJ, Tanski M, Gundle KR, Yoo JU: Physical examination is predictive of cauda equina syndrome: MRI to rule out diagnosis is unnecessary. Glob Spine J 2022; 12(2): 209–14.

19. Rivera MP, Tanner NT, Silvestri GA, et al.: Incorporating coexisting chronic illness into decisions about patient selection for lung cancer screening. An official American Thoracic Society research statement. Am J Respir Crit Care Med 2018; 198(2): e3–13.

20. Triplette M, Donovan LM, Crothers K, Madtes DK, Au DH: Prediction of lung cancer screening eligibility using simplified criteria. Ann Am Thorac Soc 2019; 16(10): 1280–5.

21. Patel N, Miller DP, Snavely AC, et al.: A comparison of smoking history in the electronic health record with self-report. Am J Prev Med 2020; 58(4): 591–5.

22. Kukhareva PV, Caverly TJ, Li H, et al.: Inaccuracies in electronic health records smoking data and a potential approach to address resulting underestimation in determining lung cancer screening eligibility. J Am Med Inform Assoc 2022; 29(5): 779–88.

Author notes

The DVA did not have a role in the conduct of the study; in the collection, management, analysis, and interpretation of data; or in the preparation of the manuscript. The views expressed in this article are those of the authors and do not necessarily represent the views of the DVA or the U.S. Government.

This work is written by (a) US Government employee(s) and is in the public domain in the US.