-
PDF
- Split View
-
Views
-
Cite
Cite
Liesbet Van Bulck, Meghan Reading Turchioe, Maxim Topaz, Jiyoun Song, Exploring the full potential of the electronic health record: the application of natural language processing for clinical practice, European Journal of Cardiovascular Nursing, Volume 24, Issue 2, March 2025, Pages 332–337, https://doi.org/10.1093/eurjcn/zvae091
- Share Icon Share
Abstract
The electronic health record (EHR) contains valuable patient data and offers opportunities to administer and analyse patients’ individual needs longitudinally. However, most information in the EHR is currently stored in unstructured text notations. Natural language processing (NLP), a branch of artificial intelligence that enables computers to understand, interpret, and generate human language, can be used to delve into unstructured text data to uncover valuable insights and knowledge. This article discusses different types of NLP, the potential of NLP for cardiovascular nursing, and how to get started with NLP as a clinician.
Gain familiarity with the fundamentals of natural language processing (NLP) in cardiovascular nursing, including an understanding of key NLP terminology and recognition of NLP’s potential applications in cardiovascular nursing.
Comprehend the practical implementation of NLP in specific cardiovascular nursing research projects.
Familiarize oneself with user-friendly NLP tools and learn how to initiate NLP practices as a clinician.
The problem
The need for more personalized care in healthcare is undeniable, as the ‘one size fits all’ approach often falls short in meeting the distinctive and unique needs of patients.1 To truly deliver effective care, healthcare providers must have a comprehensive overview of the individual needs, including both symptoms and outcomes, of each patient.
However, achieving this level of personalization can be challenging, given the time constraints on patients and healthcare workers. Traditional methods to assess individual patient needs, such as surveys and patient-reported outcome measures, while valuable, can be excessively time-consuming for patients to complete and healthcare workers to analyse.2 Indeed, it is demanding to assess and analyse individual needs for every patient longitudinally, given the high patient volumes and limited resources of the healthcare system. As the healthcare industry continues to evolve, finding innovative ways to strike a balance between personalized care and efficiency becomes increasingly important to enhance patient outcomes and overall healthcare quality.2
Electronic health records (EHRs) may offer opportunities in this regard, as the clinical record contains valuable patient data about different kinds of patient needs. Indeed, the EHR contains information about the underlying condition, symptoms, and deterioration of health.3 However, ∼70–80% of patients’ clinical information is stored as unstructured text notations.4 A manual analysis of these text fields would be too time-consuming, but text mining and natural language processing (NLP), definitions included in the glossary in Table 1, can be solutions in this regard. Recent studies have shown the potential of NLP to identify symptoms in unstructured encounter notes in the EHR.5,6 Hence, using NLP, information from unstructured notes can be used to contribute to a more comprehensive overview of a patient’s needs.
Term . | Definition . |
---|---|
F1 score | A measure of a model’s accuracy that considers both precision and recall. The F1 score is the harmonic mean of precision and recall, providing a single metric that balances both |
Natural language processing (NLP) | NLP is a branch of artificial intelligence that focuses on enabling computers to understand, interpret, and generate human language. It includes a variety of techniques that help to extract meaning from narrative data |
Neural network | A computational model inspired by the human brain, consisting of layers of interconnected nodes (neurons). Neural networks are capable of learning complex patterns from data |
Supervised machine learning model | A type of machine learning where the model is trained on a labelled data set. The algorithm learns to map input data to the correct output based on examples provided during training |
Text mining | Text mining is a scientific process that delves into unstructured text data to uncover valuable insights and knowledge. In this procedure, large amounts of text data are examined and interpreted using NLP techniques. |
Unsupervised machine learning model | A type of machine learning where the model is trained on data without labelled responses. The algorithm tries to learn the patterns and structure from the input data itself |
Word embedding model | A word embedding model is a type of NLP technique used to represent words as numerical vectors in a multidimensional space. These vectors capture the semantic meaning of words based on their context in a corpus of text |
Term . | Definition . |
---|---|
F1 score | A measure of a model’s accuracy that considers both precision and recall. The F1 score is the harmonic mean of precision and recall, providing a single metric that balances both |
Natural language processing (NLP) | NLP is a branch of artificial intelligence that focuses on enabling computers to understand, interpret, and generate human language. It includes a variety of techniques that help to extract meaning from narrative data |
Neural network | A computational model inspired by the human brain, consisting of layers of interconnected nodes (neurons). Neural networks are capable of learning complex patterns from data |
Supervised machine learning model | A type of machine learning where the model is trained on a labelled data set. The algorithm learns to map input data to the correct output based on examples provided during training |
Text mining | Text mining is a scientific process that delves into unstructured text data to uncover valuable insights and knowledge. In this procedure, large amounts of text data are examined and interpreted using NLP techniques. |
Unsupervised machine learning model | A type of machine learning where the model is trained on data without labelled responses. The algorithm tries to learn the patterns and structure from the input data itself |
Word embedding model | A word embedding model is a type of NLP technique used to represent words as numerical vectors in a multidimensional space. These vectors capture the semantic meaning of words based on their context in a corpus of text |
Term . | Definition . |
---|---|
F1 score | A measure of a model’s accuracy that considers both precision and recall. The F1 score is the harmonic mean of precision and recall, providing a single metric that balances both |
Natural language processing (NLP) | NLP is a branch of artificial intelligence that focuses on enabling computers to understand, interpret, and generate human language. It includes a variety of techniques that help to extract meaning from narrative data |
Neural network | A computational model inspired by the human brain, consisting of layers of interconnected nodes (neurons). Neural networks are capable of learning complex patterns from data |
Supervised machine learning model | A type of machine learning where the model is trained on a labelled data set. The algorithm learns to map input data to the correct output based on examples provided during training |
Text mining | Text mining is a scientific process that delves into unstructured text data to uncover valuable insights and knowledge. In this procedure, large amounts of text data are examined and interpreted using NLP techniques. |
Unsupervised machine learning model | A type of machine learning where the model is trained on data without labelled responses. The algorithm tries to learn the patterns and structure from the input data itself |
Word embedding model | A word embedding model is a type of NLP technique used to represent words as numerical vectors in a multidimensional space. These vectors capture the semantic meaning of words based on their context in a corpus of text |
Term . | Definition . |
---|---|
F1 score | A measure of a model’s accuracy that considers both precision and recall. The F1 score is the harmonic mean of precision and recall, providing a single metric that balances both |
Natural language processing (NLP) | NLP is a branch of artificial intelligence that focuses on enabling computers to understand, interpret, and generate human language. It includes a variety of techniques that help to extract meaning from narrative data |
Neural network | A computational model inspired by the human brain, consisting of layers of interconnected nodes (neurons). Neural networks are capable of learning complex patterns from data |
Supervised machine learning model | A type of machine learning where the model is trained on a labelled data set. The algorithm learns to map input data to the correct output based on examples provided during training |
Text mining | Text mining is a scientific process that delves into unstructured text data to uncover valuable insights and knowledge. In this procedure, large amounts of text data are examined and interpreted using NLP techniques. |
Unsupervised machine learning model | A type of machine learning where the model is trained on data without labelled responses. The algorithm tries to learn the patterns and structure from the input data itself |
Word embedding model | A word embedding model is a type of NLP technique used to represent words as numerical vectors in a multidimensional space. These vectors capture the semantic meaning of words based on their context in a corpus of text |
A solution: natural language processing
Text mining, also known as text analytics, is a scientific process that delves into unstructured text data to uncover valuable insights and knowledge. In this procedure, large amounts of text data are examined and interpreted using NLP techniques. Natural language processing is a branch of artificial intelligence (AI) that focuses on enabling computers to understand, interpret, and generate human language.7 It includes a variety of techniques that help to extract meaning from narrative data, under rule-based NLP and machine learning NLP.
Different types of natural language processing
Rule-based natural language processing
Rule-based NLP is a well-established part of NLP that uses predefined rules and patterns to extract meaningful information and comprehend textual data.8 To discern and elucidate the structures and semantics within the textual content, it is necessary to construct linguistic rules, regular expressions, and grammar-based patterns. For example, if we want to detect a symptom such as shortness of breath in a clinical note, several rules and patterns must be created. Some rules will indicate the presence of the symptom and will contain words such as ‘dyspnoea’ or ‘difficulty breathing’. Other rules will ensure that ‘no shortness of breath’ is not interpreted as if the symptom is occurring by adding predefined negation terms such as ‘no’ or ‘without’ related to the signs. Typically, such rules are formulated by linguists, domain specialists with medical or nursing backgrounds, or NLP practitioners, with each rule tailored to the specific constraints of the analysis job.
Rule-based NLP is particularly useful in domain-specific contexts, where it is important to accurately identify specific terms or medical codes. This approach is good for tasks that can be clearly defined by rules, such as pulling out specific information from medical records. It allows for precise and controlled processing of data, which is essential for tasks like identifying symptoms in patient notes. User-friendly rule-based NLP software has been developed to help healthcare workers with limited knowledge and coding skills set up their own rule-based NLP projects (for more information, see How to get started with natural language processing section).
Nevertheless, rule-based NLP approaches also have limitations, such as constrained scalability and adaptability, since the rules will have to be adjusted for each setting and purpose. Creating and updating a large set of rules for understanding complex language can be a lot of work. Since language keeps changing, these systems might not keep up well with new or changing text types, requiring regular rule updates. This means they might not always work well for big or changing NLP projects.
Machine learning natural language processing
Machine learning NLP is a different approach that overcomes some of the limitations of rule-based NLP. It uses algorithms and statistical models to help computers understand and create human language. This method allows NLP systems to learn and improve on their own by analysing large amounts of text, helping them grasp complex language patterns and contexts. In machine learning NLP, various algorithms are employed, including supervised models for text classification and sentiment analysis, unsupervised methods for clustering and topic modelling, and deep learning methods such as neural network and transformer models, which excel at sequence-to-sequence analysis. Its main advantage is its ability to understand and use complex language features and meanings from large text collections, making it highly effective for tasks where rule-based methods struggle due to language complexity and constant changes.
A noteworthy advancement in machine learning NLP is the emergence of large language models (LLMs), which play a pivotal role in generative AI. Large language models, such as generative pre-trained transformer (GPT) models, excel at understanding and generating human-like text by learning from vast amounts of data. These models utilize advanced techniques to capture intricate linguistic features and semantic nuances from large text collections, enabling them to generate coherent and contextually relevant responses. By leveraging the capabilities of generative AI, LLMs enhance the performance of NLP systems in handling complex language tasks with improved accuracy and fluency.
The potential of natural language processing for cardiovascular nursing
A review from 2023 highlights that the application of NLP in nursing notes remains rather limited. According to Mitha et al.,9 only 43 studies had employed NLP techniques in this context. Notably, the bulk of these studies (86%) were conducted prior to 2021, covering diverse topics such as cardiac symptoms, mortality, and fall risks.9 A 2019 review that synthesized the literature on the use of NLP to process or analyse symptom information, a core area of nursing research, identified 14 articles.5 Of these articles, five studies focused on the cardiology specialty.5 A 2022 review identified 37 studies developing and applying NLP in various areas of cardiology, more specifically health failure, imaging, coronary artery disease, electrophysiology, general cardiology, and valvular heart disease.10 Often, rule-based NLP methods were used to identify patients with a specific diagnosis and/or extract disease severity.10 Notably, only a few of these studies focused on nursing and were published in nursing journals. Hence, up to now, there seems to be relatively little use of NLP within the cardiovascular nursing domain. For NLP to gain broader acceptance in cardiovascular nursing, there is a pressing need for enhanced education among nurses, increased research efforts, and the development of comprehensive guidelines for reporting and implementing NLP technologies in nursing.11
However, already now, it is clear that NLP holds significant potential for improving patient care and healthcare processes and can have several important applications.9 Natural language processing can, for example, be used to analyse and extract valuable insights from EHRs, facilitating the efficient retrieval of patient information such as medical histories, medications, and treatment plans.12,13 Moreover, it can also aid in automating clinical documentation, thereby reducing the administrative burden on nurses, and allowing them to allocate more time to direct patient care.14 Furthermore, NLP-driven analytics can assist in risk assessment and prediction by identifying early warning signs and high-risk factors, thereby enabling proactive interventions.15,16 Additionally, NLP-powered chatbots and virtual assistants can enhance patient communication, provide timely information, and support education.17,18 Indeed, overall, NLP has the potential to revolutionize cardiovascular nursing by streamlining processes, improving data-driven decision-making, and enhancing patient-centred care.9
Examples of the use of natural language processing in cardiovascular research projects
Use of rule-based natural language processing to identify symptoms of atrial fibrillation
Turchioe and colleagues6,19 identified trajectories and clusters of co-occurring symptoms of patients undergoing catheter ablation for atrial fibrillation using NLP and machine learning. In their research, 1293 patients who underwent ablation for their atrial fibrillation were included. Rule-based NLP was used to identify symptoms in inpatient and outpatient notes such as nursing notes, encounter notes, and discharge summaries. The NLP software NimbleMiner was used, which is an R application that allows the creation of a symptom vocabulary using word embeddings and concepts of interest.20 The accuracy of the NLP tool in this context has been evaluated by comparing the results with a subset of 400 randomly selected notes that clinical experts manually annotated. The model achieved acceptable performance (average F-score = 0.81). The most frequently documented symptoms were dyspnoea, oedema, chest pain, anxiety, fatigue, and palpitations.19 Additionally, they identified unique patterns of symptom resolution post-ablation that differed by patients’ personal and clinical characteristics.6
Use of rule-based natural language processing to develop an automated algorithm to identify early risks of hospitalizations or emergency department visits in patients receiving home healthcare
Song et al.21 developed an automated NLP algorithm for the identification of language at risk for hospitalization or emergency department (ED) visits in patients receiving home healthcare. In this study, the Omaha System was used, which is a nursing terminology that describes health problems and symptoms that occur in the community setting. Concepts indicative of the risk of hospitalization or ED visits were identified and were automatically searched for in clinical notes using an NLP algorithm. The developed NLP algorithm, for which NimbleMiner was also used, was evaluated compared with gold-standard manual review and it performed relatively well in identifying these concepts in clinical notes (average F-score = 0.84). When the algorithm was applied to a large subset of narrative notes (2.3 million notes), 18% were identified as having at least one indication of a risk of hospitalization or ED visits.21 The risk factors identified by NLP algorithms in clinical notes were validated as precise indicators when compared with factors recorded in the clinical notes of patients with hospitalization or ED visits, compared with those without. The study showed that NLP can automatically extract information from narrative clinical notes to improve the understanding of a patient’s needs.
Use of machine learning natural language processing to analyse nursing notes and discharge summaries for the prediction of 30-day rehospitalization among patients with heart failure
In this article by Kang et al.,22 free-text physician discharge summary notes and nursing notes were analysed using the data mining package Orange3. After applying NLP, a machine learning pipeline was used to analyse the data further and build a model that could predict 30-day rehospitalization. For the physician discharge summaries, the best model reached an area under the curve of 0.74 and an F1 score of 0.61. For the nursing notes, the best model had an area under the curve of 0.85 and an F1 score of 0.80. Hence, nursing notes were found to be superior to physician discharge notes in developing NLP or machine learning predictive models for 30-day rehospitalization in patients with heart failure. The study underscores the utility of using unstructured data and NLP in the event prediction for heart failure.22
How to get started with natural language processing
This section provides concrete information for healthcare workers who would like to start using NLP. It includes a step-by-step process (see Central Illustration) for performing NLP and information on available text-mining tools and platforms. Please note that this section emphasizes no-coding solutions.

A step-by-step process for performing natural language processing.
Step-by-step process for performing natural language processing
Step 1: pre-processing of unstructured texts
The pre-processing of unstructured texts is the first step to ensure the effectiveness of text mining. This involves converting them into structured data suitable for machine learning. A crucial step in this process includes text pre-processing tasks like tokenization (i.e. breaking text into words or phrases), stemming (i.e. reducing words to their root form), and removing stop words (i.e. common words like ‘the’ or ‘and’ that may not contribute significantly to analysis).
Step 2: feature extraction step
The feature extraction step aims to convert words into numerical representations for computer comprehension based on their relevance within the document and in the entire collection. To do so, word embedding models (e.g. Word2Vec) can be used. In these models, words from a collection of text are mapped to a multidimensional space, where the distance and direction between the vectors reflect the semantic relationships between the words. This enables computers to understand the meaning of words based on their contextual usage in the text.
(for rule-based Natural Language Processing) Step 3: determination of predefined rules
Then, the predefined rules used to identify specific patterns or information in text data can be determined and formulated using the rule-based NLP approach. Initial rules are tested within the library (i.e. a set of predefined words or expressions reflecting the concept of interest) to confirm its ability to execute them accurately. It is important to check how well these rules work by using some example data to see how good the system finds patterns, pulls out information, and analyses language based on the rules we have set.
(for machine learning Natural Language Processing) Step 3: training of predictive models
The pre-processed and feature-extracted data will be used to train predictive models or perform information extraction tasks in a machine learning–based NLP approach. These models learn patterns and relationships from the data during the training process. Once trained, machine learning–based NLP models can be applied to new, unseen data. This model has the advantage of generalizing patterns from the training data to make predictions on new, similar data.
Evaluation of performance and presentation of natural language processing findings
The performance of an NLP model can be evaluated using various metrics and techniques. Often, precision, recall, and the F1 score are used. Precision is the ratio of true positive predictions to the total predicted positives. Hence, it indicates the accuracy of positive predictions. Recall is the ratio of true positive predictions to the total actual positives. Hence, this measures the model’s ability to identify all relevant instances. The F1 score is the mean of precision and recall. Besides, receiver operating characteristic curves, area under the curves, or confusion matrices can also be used to express the performance of NLP models. It is essential to present the performance metrics of the NLP model when describing the findings.
Some available text-mining tools and platforms for no-coding solutions
For certain nurses and other healthcare workers, applying text mining is challenging, as, often, there is the need for coding skills to set up and adjust these systems. Even though text mining can find important information in lots of clinical texts, it usually needs some programming knowledge to get started and make changes. Learning to code can be challenging for certain healthcare workers, which, in turn, makes the application of NLP inaccessible to numerous healthcare providers. To solve this problem, easy-to-use, no-coding-needed tools for NLP have become popular. These user-friendly tools do not require coding skills, come with ready-to-use features, and save time.
One of these tools is the Konstanz Information Miner (KNIME) platform (https://www.knime.com/), a free and open-source data analytic platform. The platform works smoothly with other libraries and tools, making it highly suitable for performing various scientific tasks like sorting texts, analysing opinions, finding themes, recognizing named entities in large text collections, and analysing texts in custom ways. The KNIME platform can be downloaded and run locally on one’s computer, avoiding the need to send data to external servers.
Another text-mining tool is NimbleMiner, which is a user-friendly codeless rule-based NLP application built on the R ShinyApp platform.20 NimbleMiner empowers users to harness the power of rule-based NLP without the need for extensive coding expertise. It simplifies the extraction of analytical insights from text data, allowing users to unveil associations, expand synonym usage, identify patterns, and assign categorical labels to the resulting data set. This all-encompassing functionality is seamlessly integrated into an intuitive and user-friendly interface, enhancing the efficiency of the analytical process and making it accessible to a broader audience.
With these tools, clinicians can leverage the power of text mining without extensive coding knowledge, making it easier to extract meaningful information from clinical narratives, EHRs, and medical literature. Furthermore, collaborations between clinicians and data scientists or informaticians can contribute to developing tailored text-mining solutions that align with clinical needs, while minimizing coding burdens.
Challenges of using natural language processing
Several challenges and considerations must be addressed when employing NLP in healthcare. Firstly, data privacy is paramount; ensuring patient confidentiality is essential, often necessitating de-identified and encrypted data. Secondly, the success of NLP hinges on the quality of input data. Inaccuracies or inconsistencies in the content of the notes or throughout the labelling or feature extraction process can lead to flawed outcomes. Thirdly, the complexity of clinical notes poses a significant challenge for NLP systems, necessitating specialized training for users to effectively navigate and utilize these tools. Fourthly, interoperability issues arise due to the diverse EHR platforms in use, which may not seamlessly integrate with NLP tools, often resulting in the absence of built-in NLP functionalities within EHR systems. Fifthly, the deployment of NLP and AI in healthcare raises ethical concerns regarding transparency and patient autonomy, emphasizing the need for stakeholder awareness of data usage and the importance of rigorous validation and certification of NLP systems before their integration into EHRs for clinical decision-making. Indeed, transparency and the ‘black box’ phenomenon have also been identified as a significant barrier to implementation.23 The black box phenomenon occurs when an algorithm reaches a conclusion without users being able to comprehend the underlying mechanisms of the system, which is an important barrier for clinicians to implement AI models in clinical practice. Moreover, there is a growing concern over the variance in documentation completeness and quality across disadvantaged or minority groups, presenting significant challenges for the data sets employed in NLP research.24 This underscores the necessity of addressing these disparities to ensure the equitable development and application of NLP technologies in healthcare.
Future directions for natural language processing in cardiovascular nursing
Natural language processing algorithms excel in processing unstructured data from various healthcare sources such as medical records, physician/nursing notes, and patient-reported outcomes. Through the analysis of textual data, intricate patterns, trends, and correlations can be unearthed, often beyond the scope of human observation. Based on these insights, NLP algorithms assist healthcare providers in tailoring treatment plans to meet individual patients’ unique needs and preferences. Applying NLP algorithms in individualized healthcare can enhance the efficiency and accuracy of clinical decision-making. Ultimately, it promotes a patient-centred approach to healthcare that prioritizes individualized care and enhances the quality of patient care.
Since this is a new research area, many research opportunities are available to better understand how this technology can be used to improve healthcare provision and outcomes. Future research could, for example, focus on investigating and developing tools for clinical decision support, predictive analytics, patient engagement, nursing documentation, and interdisciplinary collaboration. Also, future research should assess and document relevant performance indicators and employ established standard nursing terminologies to facilitate the scalability of methodologies and results in the future.9 There is a need for exploratory research as well as more advanced research such as randomized controlled clinical trails. In future research, it will be important to involve patients and healthcare workers in evaluating and developing NLP and AI in general for healthcare. Nurses need to be familiar with NLP so that they can effectively engage in the development and implementation of NLP tools that may impact their clinical care.
Conclusions
Although NLP is not yet widely used in cardiovascular nursing, it is clear that NLP has the potential to revolutionize this domain by streamlining processes, improving data-driven decision-making, and enhancing patient-centred care. We suggest clinicians get familiar with the basics of NLP and find out how NLP could make their work and research easier and better. Also for nurses and other healthcare professionals who lack the time to immerse themselves in programming and coding, there are easy-to-use NLP tools available that do not require coding skills and that can come into play.
Funding
This work was supported by the Fonds Wetenschappelijk Onderzoek/Research Foundation Flanders (FWO) (grant number 1159522N to L.V.B.), by NINR (R00NR019124) to M.R.T., and by NHLBI (1K99HL169940) to J.S.
Author contributions
All authors meet the criteria for authorship. L.V.B., J.S., and M.R.T. drafted the manuscript. L.V.B., J.S., M.R.T., and M.T. critically revised and intellectually enhanced the manuscript and approved the final version.
Data availability
Not applicable.
References
Author notes
Conflict of interest: The authors have no relationships relevant to the contents of this paper to disclose. Throughout the preparation of this manuscript, ChatGPT was used to assist the authors in refining their writing process as well as ensuring the coherence of the sentence structure and checking grammatical errors. After using this tool, the authors reviewed and edited the content as needed and took full responsibility for the publication’s content.
Comments