Kenneth Gundle, Elizabeth R Hooker, Sara E Golden, Sarah Shull, Kristina Crothers, Anne C Melzer, Christopher G Slatore, Use of Veterans Health Administration Structured Data to Identify Patients Eligible for Lung Cancer Screening, Military Medicine, Volume 188, Issue 7-8, July/August 2023, Pages e2419–e2423, https://doi.org/10.1093/milmed/usad017
ABSTRACT
Lung cancer screening (LCS) uptake is low. Assessing patients’ cigarette pack-years and years since quitting is challenging given the lack of documentation in structured electronic health record data.
We used a convenience sample of patients with a chest CT scan in the Veterans Health Administration. We abstracted data on cigarette use from electronic health record notes to determine LCS eligibility based on the 2021 U.S. Preventive Services Task Force age and cigarette use eligibility criteria. We used these data as the “ground truth” of LCS eligibility and compared them with structured data regarding tobacco use and a COPD diagnosis. We calculated sensitivity and specificity and constructed fast-and-frugal decision trees.
For 50-80–year-old veterans identified as former or current tobacco users, we obtained 94% sensitivity and 47% specificity. For 50-80–year-old veterans identified as current tobacco users, we obtained 59% sensitivity and 79% specificity. Our fast-and-frugal decision tree that included a COPD diagnosis had a sensitivity of 69% and a specificity of 60%.
These results can help health care systems make their LCS outreach efforts more efficient and give administrators and researchers a simple method to estimate their number of possibly eligible patients.
INTRODUCTION
Low-dose CT for lung cancer screening (LCS) reduces lung cancer mortality and is widely recommended, but uptake continues to be low.1–5 Among multiple barriers, assessing patients’ pack-years (average packs per day multiplied by years of using cigarettes) and years since quitting is challenging given the lack of documentation in structured electronic health record (EHR) data.6–9 Health care systems must therefore deploy resource-intensive processes to collect data to determine LCS eligibility for individual patients.10 In addition, clinicians and health care systems cannot set performance standards for offering LCS to their patients without a reliable denominator of eligible patients.11 Lastly, researchers and policy makers are hampered by the lack of structured EHR data regarding LCS eligibility when trying to study the effectiveness of LCS using low-dose CT compared to that of not performing this service.12
We sought to evaluate the utility of structured EHR data for assessing LCS eligibility. Many organizations have limited structured data on tobacco use, so we also included data on a common medical condition among eligible patients, COPD, which is associated with increased cigarette smoke exposure. We prioritized simple algorithms over multivariable prediction models to emphasize the real-world capacity of clinicians and health care systems to identify patients with high, low, and intermediate likelihood of being eligible for LCS.
METHODS
We used a cohort from the VA Corporate Data Warehouse (CDW) (VA Portland Health Care System Institutional Review Board #3225) as a convenience sample.13–15 First, we included patients from 2012 to 2017 from two systems with structured data codes for incidental pulmonary nodules identified on routine, non-screening, chest imaging. Second, we included randomly selected patients from the same institutions from 2012 to 2017 who obtained a routine, non-screening chest CT without structured data codes for pulmonary nodules.
Trained researchers abstracted data from clinical notes (i.e., unstructured data) in the EHR regarding pack-years and years since quitting cigarettes. Clinical encounter notes were searched in the EHR using the text term “smoke”. We defined the index date as the date of the patient’s chest CT scan that first identified a nodule or their first chest CT scan for non-nodule patients. We first reviewed notes 1 year before the index date to document the most recent pack-years and quit date. If this information was not found in the previous year, we (1) reviewed encounter notes 1 year after the index date, then (2) all encounter notes before the index date, and finally, (3) all notes after the index date, always starting with the note closest in time. We used age at index date.
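The review order described above can be sketched in a few lines of Python. This is an illustrative reading of the protocol, not the abstraction tool the researchers used: the function name `review_order`, the 365-day window, and the choice to count a note dated on the index date as "before" are all assumptions.

```python
from datetime import date, timedelta


def review_order(note_dates, index_date, window_days=365):
    """Return note dates in the order a reviewer would read them.

    Hypothetical helper; the 365-day window stands in for "1 year".
    """
    window = timedelta(days=window_days)

    def tier(d):
        if index_date - window <= d <= index_date:
            return 0  # notes in the year before the index date
        if index_date < d <= index_date + window:
            return 1  # notes in the year after the index date
        if d < index_date - window:
            return 2  # all remaining earlier notes
        return 3      # all remaining later notes

    # Within each tier, the note closest in time to the index date comes first.
    return sorted(note_dates, key=lambda d: (tier(d), abs(d - index_date)))


index = date(2015, 3, 1)
notes = [date(2017, 1, 1), date(2012, 3, 1), date(2015, 6, 1),
         date(2014, 9, 1), date(2015, 1, 10)]
ordered = review_order(notes, index)
```

A reviewer would stop at the first note in this order that documents pack-years and quit date.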
We then retrieved structured data from the VA CDW to determine LCS eligibility compared to manually abstracted data. The VA CDW includes data on tobacco use, not cigarette use per se. We recorded tobacco use status (categorized as never, former, or current) of the temporally closest structured data record in the CDW before or after the index date.13 We defined COPD diagnosis by ICD codes in the CDW (ICD-9: 491.xx, 492.xx, and 493.2; ICD-10: J41.x, J42.x, J43.x, and J44.x), requiring two occurrences at least 1 month apart with at least one code recorded before the index scan. We did not assess for other diagnoses that could impact LCS eligibility.
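As a rough illustration of the COPD definition above, the following Python sketch checks a list of dated ICD codes. It assumes that "1 month" means 30 days and that "before the index scan" is a strict inequality; the function names and the `(code, date)` input format are hypothetical and do not reflect the CDW schema.

```python
from datetime import date, timedelta

# Code families taken from the text: ICD-9 491.xx, 492.xx, 493.2; ICD-10 J41-J44.
COPD_PREFIXES = ("491", "492", "493.2", "J41", "J42", "J43", "J44")


def is_copd_code(code):
    """True if the code falls in one of the listed COPD code families."""
    return code.strip().upper().startswith(COPD_PREFIXES)


def has_copd_diagnosis(coded_dates, index_date, min_gap_days=30):
    """coded_dates: iterable of (icd_code, date) pairs (assumed format)."""
    dates = sorted(d for code, d in coded_dates if is_copd_code(code))
    # At least one qualifying code must precede the index scan.
    if not dates or dates[0] >= index_date:
        return False
    # Some pair is at least min_gap_days apart exactly when the earliest
    # and latest qualifying codes are.
    return dates[-1] - dates[0] >= timedelta(days=min_gap_days)
```

Comparing only the earliest and latest qualifying dates is sufficient because any pair that satisfies the gap implies the extremes do as well.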
We classified the “ground truth” of LCS eligibility from manually abstracted EHR data based on the 2021 U.S. Preventive Services Task Force (USPSTF) criteria: (1) age 50 to 80, (2) at least 20 pack-years of cigarette use, and (3) current cigarette use or former cigarette users who quit within 15 years.2 Participants with missing pack-years and years since quitting were categorized as ineligible. We calculated sensitivity, specificity, balanced accuracy, and positive and negative predictive values. We constructed fast-and-frugal decision trees (FFTs) using R version 4.1.0 and the FFTrees package.16 Fast-and-frugal decision trees are dichotomous classification trees that are designed to efficiently facilitate patient categorization using limited resources.17,18 For the FFTs, we first excluded veterans outside the 50-80–year-old age criteria, leaving an analytic sample of 4,365 subjects. Of note, a small number of veterans were not included in the COPD FFT since they had missing COPD diagnosis data because their CDW identifying information did not perfectly match our cohort identifiers.
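The three eligibility criteria can be written as a small Python check. This is a minimal sketch under stated assumptions: current cigarette use is encoded as zero years since quitting, a patient with either value missing is treated as ineligible, and the 15-year boundary is inclusive. The function names are illustrative only.

```python
from typing import Optional


def pack_years(packs_per_day: float, years_smoked: float) -> float:
    """Average packs per day multiplied by years of cigarette use."""
    return packs_per_day * years_smoked


def uspstf_2021_eligible(age: float,
                         pack_yrs: Optional[float],
                         years_since_quit: Optional[float]) -> bool:
    """Age and cigarette-use criteria only; None marks a missing value.

    Assumption: years_since_quit == 0 represents current cigarette use.
    """
    if pack_yrs is None or years_since_quit is None:
        return False  # missing smoking history is categorized as ineligible
    return 50 <= age <= 80 and pack_yrs >= 20 and years_since_quit <= 15


# A 60-year-old who currently smokes 1.5 packs per day and has done so for 20 years.
example = uspstf_2021_eligible(60, pack_years(1.5, 20), 0)
```

This check covers only age and cigarette use; it does not model comorbidities or other factors that could affect eligibility.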
RESULTS
We included 5,289 patients with both text notes from the EHR and at least one structured data tobacco status, of whom 2,177 met the 2021 USPSTF LCS eligibility criteria. Among those eligible, 1,018 (46.8%) and 1,159 (53.2%) patients were former and current cigarette users, respectively, with average pack-years of 51.8 ± 31.0. Among LCS-eligible patients with non-missing COPD diagnosis (n = 2,077), 995 (47.9%) subjects had a COPD diagnosis.
Including both current and former tobacco users had high sensitivity but low specificity (Table I). After applying the 50-80–year-old age criteria and including veterans identified as former or current tobacco users in the EHR, we obtained 94% sensitivity, but specificity was only 47% (model 1). Conversely, after applying the 50-80–year-old age criteria and including veterans identified as current tobacco users, we obtained 59% sensitivity and 79% specificity (model 2).
Model | Sensitivity (%) | Specificity (%) | Balanced accuracy (%) | PPV (%) | NPV (%) | Potentially eligible group (%)a |
---|---|---|---|---|---|---|
Model 1: 50-80 years old and current or former tobacco use | 94 | 47 | 71 | 49 | 94 | 68 |
Model 2: 50-80 years old and current tobacco use | 59 | 79 | 69 | 60 | 78 | 35 |
FFT 1: 50-67 years old and current tobacco use | 48 | 76 | 62 | 60 | 66 | 28 |
FFT 2: 50-80 years old and current tobacco use (or) 50-67 years old and COPD diagnosis | 69 | 60 | 65 | 57 | 72 | 41 |
a The size of the group relative to the original cohort of 5,289 subjects after applying all inclusion/exclusion criteria.
Abbreviations: FFT: fast-and-frugal decision tree; NPV: negative predictive value; PPV: positive predictive value.
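For readers who want to compute the same summary measures on their own data, the sketch below derives them from paired lists of ground-truth eligibility and model flags. The function name and the toy data are hypothetical and do not reproduce the study's counts. Here `flagged_fraction` is relative to the list passed in, whereas the table reports group size relative to the original cohort.

```python
def _ratio(num, den):
    # Return NaN rather than raising when a denominator is empty.
    return num / den if den else float("nan")


def screening_metrics(truth, flagged):
    """truth, flagged: equal-length sequences of booleans."""
    pairs = list(zip(truth, flagged))
    tp = sum(1 for t, f in pairs if t and f)
    fn = sum(1 for t, f in pairs if t and not f)
    fp = sum(1 for t, f in pairs if not t and f)
    tn = sum(1 for t, f in pairs if not t and not f)
    sens = _ratio(tp, tp + fn)
    spec = _ratio(tn, tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "balanced_accuracy": (sens + spec) / 2,  # mean of sensitivity and specificity
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "flagged_fraction": _ratio(tp + fp, len(pairs)),
    }


# Synthetic toy data: 4 truly eligible and 6 ineligible patients.
truth = [True, True, True, True, False, False, False, False, False, False]
flagged = [True, True, True, False, True, True, False, False, False, False]
metrics = screening_metrics(truth, flagged)
```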
The first FFT analysis determined a pathway that eliminated patients not currently using cigarettes (Fig. 1A). Of the remaining patients, it selected patients aged 67 and younger as likely eligible for LCS. The sensitivity was 48%, the specificity was 76%, and the cohort of potentially eligible patients was reduced to 28% of its original size. A second FFT (Fig. 1B) included a total of 4,145 patients, as an additional 220 were excluded for a missing COPD diagnosis status in the structured data. This FFT first classified all patients with structured data indicating current tobacco use as likely eligible. Among those remaining, it then excluded subjects who lacked a COPD diagnosis or were over 67 years old. This FFT with COPD diagnosis had a sensitivity of 69% and a specificity of 60%.
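The two trees reduce to short decision rules, sketched below in Python from the prose and the table rows. This is a hand-written rendering rather than output from the FFTrees package. The function names, the status strings, and the order of the COPD and age checks in the second rule are assumptions; either order yields the same classification.

```python
def in_analytic_age_range(age):
    # Both trees were fit after restricting the sample to ages 50-80.
    return 50 <= age <= 80


def fft1_likely_eligible(age, tobacco_status):
    """FFT 1: current tobacco use, then age 67 or younger."""
    if not in_analytic_age_range(age):
        return False
    if tobacco_status != "current":
        return False  # first exit: not currently using tobacco
    return age <= 67  # second exit: older patients classified as unlikely


def fft2_likely_eligible(age, tobacco_status, has_copd):
    """FFT 2: current tobacco use, or COPD diagnosis with age 67 or younger."""
    if not in_analytic_age_range(age):
        return False
    if tobacco_status == "current":
        return True   # first exit: current use is classified as likely eligible
    if not has_copd:
        return False  # second exit: no COPD diagnosis
    return age <= 67  # third exit: age cue among those with COPD
```

For example, a 72-year-old former tobacco user with COPD is classified as unlikely eligible by both rules, while a 72-year-old current user is flagged by the second rule only.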

FIGURE 1. Fast-and-frugal decision trees, indicating FNs, FPs, TNs, and TPs at each node that met the 2021 USPSTF LCS eligibility criteria: (A) FFT 1 and (B) FFT 2. Abbreviations: FFT = fast-and-frugal decision tree; FN = false negative; FP = false positive; LCS = lung cancer screening; TN = true negative; TP = true positive; USPSTF = U.S. Preventive Services Task Force.
DISCUSSION
We found that more than 40% of patients in our cohort met the 2021 USPSTF LCS eligibility criteria regarding age and cigarette use, and structured EHR data could be used to identify these patients with reasonable accuracy. However, our results suggest that the VA must collect specific data on pack-years and years since quitting if it wants to systematically and equitably offer LCS to all potentially eligible veterans. Our highly sensitive model 1 could be used to exclude many patients who are unlikely to be eligible for LCS. When resources or technical limitations make it challenging to collect cigarette use history information, health care systems could use this model to exclude many patients while rarely missing one who would be potentially eligible for LCS.
Limitations
Documentation of smoking status, pack-years, and time since quitting is subject to reporting errors in the EHR and is based on self-report, which is vulnerable to recall and moderator biases.21,22 Lung cancer screening coordinators or researchers may obtain different information compared to that obtained during routine care.8,21 Veterans Health Administration administrative data classify veterans based on tobacco use status rather than cigarette use per se, although we previously found these data had 81% agreement with EHR information regarding cigarette use.13 Lastly, we evaluated two primary characteristics involved in LCS eligibility, age and cigarette use, but we did not assess for exclusionary factors such as significant comorbidities or desire to undergo screening. Notably, since severe COPD may limit a person’s life expectancy, our COPD FFT may include a substantial number of patients who are ineligible for LCS.19
CONCLUSION
We found two simple models with reasonable, balanced accuracy: model 2 and FFT 2. Using either of these models, health care systems could substantially reduce the number of patients targeted for further outreach or assessment but at a cost of missing many patients who are potentially LCS eligible. Notably, since COPD increases the risk of lung cancer, our second FFT model could be used to target a high-risk subgroup of potentially eligible patients who may particularly benefit from screening.19 Importantly, this study expands on prior work demonstrating that age and tobacco use are reasonably good predictors of LCS eligibility but now addresses this question by using the 2021 USPSTF criteria for eligibility and adding COPD into the model.20
ACKNOWLEDGMENTS
The authors would like to acknowledge the help and support of Matthew Howard, BS; Tara Thomas, BS; Sujata Thakurta, MPA:HA; and Philip Tostado, MA, for their countless hours of manual chart review.
FUNDING
This study and Dr Slatore are supported by an award from the DVA (Clinical Services Research & Development Epidemiology-007-15S and Health Services Research & Development Investigator Initiated Research 16-003). It was also supported by resources from the Center to Improve Veteran Involvement in Care, VA Portland Health Care System, Portland, OR, and the Seattle Epidemiologic Research & Information Center, VA Puget Sound Health Care System, Seattle, WA. Dr Crothers is supported by VA ICU000170A.
CONFLICT OF INTEREST STATEMENT
None declared.
CLINICAL TRIAL REGISTRATION
Not applicable.
INSTITUTIONAL REVIEW BOARD (HUMAN SUBJECTS)
This study was approved by the VA Portland Health Care System Institutional Review Board (#3225).
INSTITUTIONAL ANIMAL CARE AND USE COMMITTEE (IACUC)
Not applicable.
INDIVIDUAL AUTHOR CONTRIBUTION STATEMENT
All authors have made substantial contributions to the (1) conception and design, acquisition of data, or analysis and interpretation of data; (2) have contributed to drafting the article for important intellectual content; and (3) have provided final approval of the version to be published. C.G.S. takes responsibility for the content of the manuscript, including data and analysis.
DATA AVAILABILITY
The data underlying this article cannot be shared publicly due to ethical and privacy concerns.
INSTITUTIONAL CLEARANCE
Not applicable.
REFERENCES
Author notes
The DVA did not have a role in the conduct of the study; in the collection, management, analysis, and interpretation of data; or in the preparation of the manuscript. The views expressed in this article are those of the authors and do not necessarily represent the views of the DVA or the U.S. Government.