Chao Yan, Monika E Grabowska, Rut Thakkar, Alyson L Dickson, Peter J Embí, QiPing Feng, Joshua C Denny, Vern Eric Kerchberger, Bradley A Malin, Wei-Qi Wei, Beyond Phecodes: leveraging PheMAP to identify patients lacking diagnosis codes in electronic health records, Journal of the American Medical Informatics Association, Volume 32, Issue 6, June 2025, Pages 1007–1014, https://doi.org/10.1093/jamia/ocaf055
Abstract
Objective
Diagnosis codes documented in electronic health records (EHR) are often relied upon to clinically phenotype patients for biomedical research. However, these diagnoses can be incomplete or inaccurate, leading to false negatives when searching for patients with phenotypes of interest. This study aims to determine whether PheMAP, a comprehensive knowledgebase that integrates multiple clinical terminologies beyond diagnosis codes to capture phenotypes, can effectively identify patients lacking relevant EHR diagnosis codes.
Materials and Methods
We investigated a collection of 3.5 million patient records from Vanderbilt University Medical Center’s EHR and focused on 4 well-studied phenotypes: (1) type 2 diabetes mellitus (T2DM), (2) dementia, (3) prostate cancer, and (4) sensorineural hearing loss. We applied PheMAP to match structured concepts in patient records and calculated a phenotype risk score (PheScore) to indicate patient-phenotype similarity. Patients meeting predefined PheScore criteria but lacking diagnosis codes were identified. Clinically knowledgeable experts adjudicated randomly selected patients per phenotype as Positive, Possibly Positive, or Negative.
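The screening step described above (match a phenotype's concepts against a patient's structured record, score the match, then keep high-scoring patients who have no diagnosis code) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a PheMAP-style mapping from each phenotype to weighted concepts, assumes the PheScore is the sum of the weights of matched concepts, and uses a hypothetical threshold in place of the study's "predefined PheScore criteria". All concept identifiers, weights, and patient records here are made up.

```python
from typing import Dict, List, Set

# Hypothetical PheMAP-style entry for one phenotype: concept ID -> weight.
# IDs and weights are illustrative only, not taken from PheMAP.
T2DM_WEIGHTS: Dict[str, float] = {
    "dx:E11": 4.0,           # a diagnosis-code concept
    "med:metformin": 3.0,    # a medication concept
    "lab:hba1c_high": 2.5,   # a lab-result concept
}
T2DM_DX_CODES: Set[str] = {"dx:E11"}  # concepts that count as a diagnosis code


def phe_score(patient_concepts: Set[str], weights: Dict[str, float]) -> float:
    """Assumed scoring rule: sum the weights of phenotype concepts present in the record."""
    return sum(w for concept, w in weights.items() if concept in patient_concepts)


def flag_missing_diagnosis(
    patients: Dict[str, Set[str]],
    weights: Dict[str, float],
    dx_codes: Set[str],
    threshold: float,
) -> List[str]:
    """Return IDs of patients whose score meets the (hypothetical) threshold
    but whose record contains none of the phenotype's diagnosis codes."""
    flagged = []
    for pid, concepts in patients.items():
        has_dx = bool(concepts & dx_codes)
        if not has_dx and phe_score(concepts, weights) >= threshold:
            flagged.append(pid)
    return flagged


# Made-up records for illustration.
patients = {
    "p1": {"med:metformin", "lab:hba1c_high"},  # high score, no diagnosis code
    "p2": {"dx:E11", "med:metformin"},          # already has a diagnosis code
    "p3": {"lab:hba1c_high"},                   # score below threshold
}

if __name__ == "__main__":
    print(flag_missing_diagnosis(patients, T2DM_WEIGHTS, T2DM_DX_CODES, threshold=4.0))
```

Only `p1` is flagged: it carries enough weighted evidence (medication plus lab concepts) to pass the threshold, yet no diagnosis code, which is the kind of patient the study sent for expert review.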
Results
Our approach indicated that 5.3% of patients lacked a diagnosis for T2DM, 4.5% for dementia, 2.2% for prostate cancer, and 0.2% for sensorineural hearing loss. Counting both Positive and Possibly Positive cases as correct, expert review indicated 100% precision for dementia and sensorineural hearing loss, and 90.0% and 85.0% precision for T2DM and prostate cancer, respectively. Excluding Possibly Positive cases, the precision for T2DM and prostate cancer was 88.9% and 81.3%, respectively.
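The two precision figures reported above differ only in how Possibly Positive adjudications are treated. The sketch below computes both from a list of expert labels. It assumes that "excluding Possibly Positive cases" means dropping them from both the numerator and the denominator; the abstract does not state this explicitly. The label counts in the usage example are illustrative and are not the study's review data.

```python
from collections import Counter
from typing import Iterable


def precision(labels: Iterable[str], count_possible: bool = True) -> float:
    """Precision among flagged patients from expert labels.

    count_possible=True: Positive and Possibly Positive both count as correct.
    count_possible=False: Possibly Positive cases are removed entirely
    (assumed interpretation of "excluding Possibly Positive cases").
    """
    c = Counter(labels)
    pos, possible, neg = c["Positive"], c["Possibly Positive"], c["Negative"]
    if count_possible:
        correct, total = pos + possible, pos + possible + neg
    else:
        correct, total = pos, pos + neg
    return correct / total if total else float("nan")


# Illustrative labels only: 7 Positive, 1 Possibly Positive, 2 Negative.
labels = ["Positive"] * 7 + ["Possibly Positive"] + ["Negative"] * 2

if __name__ == "__main__":
    print(round(precision(labels), 3))                        # inclusive definition
    print(round(precision(labels, count_possible=False), 3))  # Possibly Positive dropped
```

With these made-up labels the inclusive precision is 8/10 and the exclusive precision is 7/9, so the exclusive figure comes out lower, matching the direction of the change reported for T2DM and prostate cancer.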
Conclusions
Leveraging the clinical terminologies incorporated by PheMAP can effectively identify patients with phenotypes who lack EHR diagnosis codes, thereby improving phenotyping quality and the reliability of related research.