Liu Yang, Mudan Ren, Shuifa Sun, Ji Lu, Yirong Wu, Investigation on the preferences for data quality assessment indicators of electronic health records: user-oriented perspective, JAMIA Open, Volume 7, Issue 4, December 2024, ooae142, https://doi.org/10.1093/jamiaopen/ooae142
Abstract
This study aims to investigate whether different types of electronic health record (EHR) users have distinct preferences for data quality assessment indicators (DQAI) and explore how these preferences can guide the enhancement of EHR systems and the optimization of related policies.
High-frequency indicators were identified by a systematic literature review to construct a DQAI system, which was assessed by a user-oriented investigation involving doctors, nurses, hospital supervisors, and clinical researchers. The entropy weight method and fuzzy comprehensive evaluation model were employed for the system comprehensive evaluation. Exploratory factor analysis was used to construct dimensions, and visualization analysis was utilized to explore preferences at both the indicator and dimension levels.
Sixteen indicators were identified to construct the DQAI system and grouped into 2 dimensions: structural and relational. The DQAI system achieved a comprehensive evaluation score of 90.445, corresponding to a “very important” membership level (62.5%). Doctors and nurses gave higher mean scores (4.43-4.66 out of 5) than supervisors (3.73-4.55 out of 5). Researchers emphasized credibility, with a mean score of 4.79 out of 5.
The findings reveal that different types of EHR users exhibit distinct preferences for the DQAI at both indicator and dimension levels. Doctors and nurses thought that all indicators were important, clinical researchers emphasized credibility, and supervisors focused mainly on accuracy. Indicators in the relational dimension were generally more valued than structural ones. Doctors and nurses prioritized indicators of relational dimension, while researchers and supervisors leaned towards indicators of structural dimension. These insights suggest that tailored approaches in EHR system development and policy-making could enhance EHR data quality.
This study underscores the importance of user-centered approaches in optimizing EHR systems, highlighting diverse user preferences at both indicator and dimension levels.
Lay Summary
This study investigates how different types of users of electronic health records (EHRs), including doctors, nurses, hospital supervisors, and clinical researchers, prioritize various data quality indicators. By understanding these preferences, this study aims to improve EHR data quality, optimize data management, and guide related policy-making. Unlike previous studies that relied on subjective expert experiences or literature reviews, we used quantitative statistical methods to build a scientific and reliable data quality assessment indicator (DQAI) system. Additionally, we broadened the scope of previous studies, which often focused only on specific contexts or events, by incorporating the perspectives of diverse EHR users. By emphasizing the views of those directly involved with EHR data, our findings highlight the impact of user perceptions on their willingness to use EHR systems, which directly influences data quality. The results showed that relational and structural indicators were generally valued by doctors and nurses, while researchers prioritized credibility and supervisors focused on accuracy. These insights suggest that tailored strategies in EHR system development can better meet users’ needs, ultimately enhancing data quality and healthcare outcomes.
Introduction
With the increasing application of digital health technology, electronic health record (EHR) data, as a primary source of healthcare information,1 play a crucial role in this digital transformation. The quality of EHR data has become a research focus. High-quality EHR data are essential for enhancing the efficiency of digital healthcare, facilitating data analysis, and supporting clinical decision-making processes. However, a lack of trust in EHR data quality remains an obstacle to its utilization.2
One widely adopted quality concept comes from Juran, who defined quality as “fitness-for-use.”3,4 In terms of data quality (DQ), data are deemed of sufficient quality when they meet the needs of specific users pursuing specific goals. Therefore, the evaluation of DQ must be tailored to the context of the task for which the data are intended.5,6 Under the concept of fitness-for-use, studies on EHR DQ assessment indicators (DQAI) have been conducted from multiple perspectives. First, different information systems show different degrees of data integration, so different DQAI, such as accuracy, completeness, and consistency, need to be evaluated.7 Second, different data types, such as structured, unstructured, and semi-structured data, call for different emphases among the DQAI. Since structured data are easier to standardize and use than semi-structured and unstructured data,8 many scholars have studied the impact of structured data on improving EHR DQ.7 The indicators assessed include completeness,8,9 correctness,8 consistency,8 conformance,9 and plausibility.9 Third, some studies are conducted for different purposes. By evaluating the DQ of secondary clinical data, researchers can determine whether the data are suitable for their specific research purposes.3,10,11 In studies oriented toward clinical healthcare, poor-quality data in primary healthcare settings have been observed to risk inadequate patient care, and different DQAI have been applied to identify the issues.12,13
Most studies on EHR DQAI under fitness-for-use analyze data quality from one particular perspective, such as different systems, data types, or intended uses, an approach we term the object-oriented perspective. In many cases, these studies neglect the actual experiences and requirements of end-users directly engaged with EHR data (referred to as the user-oriented perspective), an oversight that can significantly affect EHR DQ. Therefore, in this study, we attempt to expand the research focus from the object-oriented to the user-oriented perspective. Generally, EHR users can include doctors, nurses, hospital supervisors, clinical researchers, health organizations, and industry partners. This study specifically targets the users most directly involved in EHR data generation and application: doctors, nurses, hospital supervisors, and clinical researchers. We first constructed an EHR DQAI system (some preliminary results have been reported previously14). Then, using data from a questionnaire survey, we applied a quantitative methodology to delineate DQAI dimensions and explore the views of different groups on the DQAI system at both the indicator level and the dimension level. Finally, we provide recommendations for improving EHR data quality and policy-making from a user-oriented standpoint.
Materials and methods
Study setting
The procedure of this study can be divided into 5 primary steps: (1) performing a literature review; (2) refining indicators; (3) designing a questionnaire and conducting a survey; (4) constructing dimensions and evaluating the DQAI system; and (5) analyzing the different views of different types of users.
Literature review
A literature review was performed under the guidelines of Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) (see Figure 1).15 A search was conducted up to October 2022 in 2 databases: PubMed, a database for biomedical literature; and Web of Science, a comprehensive citation index database for core journals. The search keywords “Data Quality,” “Electronic Health Record*,” EHR, “Electronic Medical Record*,” and EMR were combined using Boolean operators. Abstracts were screened based on the systematic review checklist of the Critical Appraisal Skills Programme (CASP).16,17 Each abstract was scored by 3 independent assessors, and decisions were made based on consensus or majority vote. This process yielded a set of key papers for further indicator refinement.

Figure 1. Procedure and result of literature review. CASP = Critical Appraisal Skills Programme; EHR = electronic health record; WOS = Web of Science.
Indicator refinement
Based on the set of key papers, the EHR DQAI were refined in 3 main steps (see Figure 2): (1) listing and normalizing all relevant indicators mentioned within the set; (2) determining the frequencies of the listed indicators and retaining the indicators with higher frequencies; and (3) discussing and retaining additional important indicators based on input from team members and domain experts. Eventually, the retained indicators with the highest frequencies and the indicators suggested by experts, enriched with definitions and examples, formed an EHR DQAI system for the follow-up analysis.

Figure 2. Process of indicator refinement. DQAI = data quality assessment indicators; EHR = electronic health record.
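The three refinement steps described above (normalize, count, retain) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the synonym map is a small hypothetical subset in the spirit of Table 1 (where a term appears under more than one harmonized indicator, the sketch arbitrarily picks one), the toy corpus is made up, and counting each indicator at most once per paper is our assumption rather than a rule stated in the study.

```python
from collections import Counter

# Hypothetical synonym map: raw term -> harmonized indicator (illustrative subset only).
SYNONYMS = {
    "correctness": "accuracy",
    "comprehensiveness": "completeness",
    "currency": "timeliness",
    "concordance": "consistency",
}


def normalize(term):
    """Map a raw term to its harmonized indicator name."""
    key = term.strip().lower()
    return SYNONYMS.get(key, key)


def refine(papers, min_count, expert_additions=()):
    """Keep indicators mentioned in more than `min_count` papers, plus expert picks."""
    counts = Counter()
    for terms in papers:
        # Assumption: each harmonized indicator counts at most once per paper.
        counts.update({normalize(t) for t in terms})
    kept = {name for name, n in counts.items() if n > min_count}
    kept.update(expert_additions)  # step (3): indicators retained on expert advice
    return sorted(kept), counts


# Toy corpus: each inner list holds the terms one made-up paper mentions.
papers = [
    ["Accuracy", "completeness"],
    ["correctness", "currency"],
    ["accuracy", "comprehensiveness", "portability"],
]
kept, counts = refine(papers, min_count=1, expert_additions=["portability"])
print(kept)
print(counts.most_common())
```

In the toy run the threshold is 1 so that the small corpus produces a non-trivial result; the `expert_additions` argument mirrors how an indicator below the frequency threshold can still be retained in step (3).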
Survey design and conduct
Survey design
To investigate the preferences of different types of EHR users, a questionnaire was developed based on the DQAI system, in which a 5-point Likert-type scale was used to measure their perception (score 1-5: from very unimportant to very important). A higher score indicates higher importance of an indicator.
Survey conduct
This study adopted a stratified random sampling method to ensure the representativeness of the questionnaire results. Three institutions in a third-tier city in central China were selected as the survey sites, including 2 Class III Grade A hospitals (the highest hospital level in China, accounting for 33% of such hospitals in the city) and one research-oriented university (accounting for 50% of the universities in the city). This city, as a sub-central city of a developed province, has a concentration of medical resources and a high level of healthcare informatization, so the sample represents an above-average level in China. Coordinators were then appointed at each institution to facilitate the survey.
Before conducting the online survey, we ensured that participants could understand the questionnaire content. On the one hand, we prepared instructional guidelines for the questionnaire to help respondents understand the process and objectives. On the other hand, we invited 2 experienced trainers to supervise the test coordinators during the survey implementation.
Comprehensive evaluation
To handle the “fuzziness” of subjective ratings and make the results more objective and scientific, the entropy weight method and the fuzzy comprehensive evaluation model were used to comprehensively evaluate the DQAI system. The entropy weight method was used to determine the weight of each indicator: the smaller the entropy of an indicator, the more information it contains and the higher its weight. In general, the higher the comprehensive evaluation score, the better the effect.18 The comprehensive evaluation comprised the following steps: (1) establishing the factor set, the evaluation set, and the set of entropy weights; (2) constructing the fuzzy relationship matrix; (3) performing the fuzzy comprehensive evaluation; and (4) calculating the comprehensive evaluation score. Appendix S1 provides detailed steps.
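The four steps above can be illustrated with a minimal Python sketch. It follows a common textbook formulation of the entropy weight method and the fuzzy comprehensive evaluation, not necessarily the exact procedure in Appendix S1. The toy ratings, the weighted-average composition operator, and the grade scores (20, 40, 60, 80, 100) are all assumptions made for illustration.

```python
import math


def entropy_weights(ratings):
    """Entropy weights for the columns (indicators) of a respondents x indicators table."""
    n = len(ratings)
    m = len(ratings[0])
    divergences = []
    for j in range(m):
        column = [row[j] for row in ratings]
        total = sum(column)
        shares = [x / total for x in column]
        # Shannon entropy scaled to [0, 1] by ln(n); terms with p = 0 contribute 0.
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(n)
        # Lower entropy means more information, hence a larger divergence and weight.
        divergences.append(max(0.0, 1.0 - entropy))
    spread = sum(divergences)
    if spread <= 1e-12:
        return [1.0 / m] * m  # no column carries information: fall back to equal weights
    return [d / spread for d in divergences]


def membership_matrix(ratings, levels=(1, 2, 3, 4, 5)):
    """Fuzzy relation matrix: row j holds the share of respondents per level for indicator j."""
    n = len(ratings)
    m = len(ratings[0])
    return [
        [sum(1 for row in ratings if row[j] == level) / n for level in levels]
        for j in range(m)
    ]


def fuzzy_evaluate(weights, relation, grade_scores):
    """Weighted-average operator B = W . R, followed by score = B . V."""
    n_levels = len(relation[0])
    membership = [
        sum(w * row[k] for w, row in zip(weights, relation)) for k in range(n_levels)
    ]
    score = sum(b * v for b, v in zip(membership, grade_scores))
    return membership, score


# Toy data: 4 respondents rating 3 indicators on a 5-point Likert scale (made up).
ratings = [
    [5, 4, 5],
    [4, 4, 5],
    [5, 3, 5],
    [5, 5, 4],
]
weights = entropy_weights(ratings)
relation = membership_matrix(ratings)
# Assumption: the five levels map to grade scores 20..100.
membership, score = fuzzy_evaluate(weights, relation, grade_scores=(20, 40, 60, 80, 100))
print([round(w, 3) for w in weights])
print([round(b, 3) for b in membership], round(score, 2))
```

Under this formulation, the membership vector gives the share of weighted opinion at each importance level, and the level with the largest share is read off as the overall membership level.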
Statistical analysis and dimension construction
Descriptive statistical analyses, such as demographics, were used to describe the sample distribution. Kendall’s W coefficient was used to describe the consistency of participants in the survey. The Kruskal-Wallis test was conducted to examine the differences in each indicator among different types of users. Exploratory factor analysis (EFA)19 using the principal component analysis method was performed for structural validity analysis to define the dimensions of the EHR DQAI system.
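As an illustration of the agreement measure mentioned above, the following sketch computes Kendall’s W with the usual tie correction, treating each respondent as a rater and each indicator as an item. It covers only Kendall’s W (not the Kruskal-Wallis test or EFA), uses made-up ratings, and is not the authors’ implementation, whose software is not stated here.

```python
def average_ranks(values):
    """Rank values from 1..m, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def kendalls_w(ratings):
    """Kendall's W with tie correction; rows are raters, columns are items."""
    n = len(ratings)     # number of raters
    m = len(ratings[0])  # number of items
    rank_sums = [0.0] * m
    tie_term = 0.0
    for row in ratings:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
        # Tie correction: sum of (t^3 - t) over groups of tied values for this rater.
        for value in set(row):
            t = row.count(value)
            tie_term += t ** 3 - t
    mean_sum = n * (m + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    denominator = n ** 2 * (m ** 3 - m) - n * tie_term
    if denominator == 0:
        raise ValueError("W is undefined when every rater scores all items identically")
    return 12 * s / denominator


# Made-up Likert ratings: 3 respondents x 4 indicators.
toy = [
    [5, 4, 4, 3],
    [5, 5, 4, 3],
    [4, 5, 4, 2],
]
print(round(kendalls_w(toy), 3))
```

Values near 1 indicate that respondents rank the indicators in nearly the same order, while values near 0 indicate little agreement.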
Preference analysis
Preference analysis for different types of users was conducted through heat maps, kernel density plots, and scatter plots. The analysis was performed at both the indicator level and the dimension level.
Results
EHR DQAI
As shown in Figure 2, we obtained 29 papers following the PRISMA literature review process. Additionally, based on experts’ advice, we included an official Chinese standard document issued by the National Medical Products Administration (NMPA). A set of 30 key papers was finally obtained. From this corpus, we extracted 41 indicators, which were subsequently normalized (see Table 1) and analyzed for frequency. Based on the frequency analysis, we retained indicators that appeared more than 4 times. One exception was “portability,” which was retained because it is included in the NMPA standard we supplemented. We finally extracted 16 indicators for our EHR DQAI system (see Table 2),14 which were also used to develop the questionnaire, titled the Importance of Electronic Health Records Data Quality Indicators (see Appendix S2).
Harmonized indicator | Terminologies used in the literature |
---|---|
accuracy | accuracy, correctness |
completeness | completeness, comprehensiveness |
timeliness | timeliness, contemporaneous, currency |
consistency | consistency, concordance |
precision | precision, correctness |
conformance | conformance, conformity, formal medical coding schemes |
uniqueness | uniqueness, duplication |
credibility | credibility, integrity, believability, reliability, trustworthiness |
plausibility | plausibility |
traceability | traceability, attributable, contextualization |
portability | portability |
usability | usability |
accessibility | accessibility, available, linkability |
relevance | relevance, representativeness |
applicability | applicability, flexibility, predictive value, adaptability |
understandability | understandability, legible, interpretability, comprehensiveness, definition, readability |
Indicator | Definition | Example | Reference |
---|---|---|---|
accuracy | The patient information recorded in the EHR system is consistent with the actual situation of the patient. | Whether the patient’s information is registered accurately (name, age, etc.). | 3,10,12,20–39 |
completeness | The patient information recorded in the EHR system is detailed and complete. | Whether the patient’s personal information and condition information are complete. | 3,9,10,12,13,20,21,23–44 |
timeliness | The patient status recorded in the EHR system is timely and effective. | When the patient’s condition changes, relevant information is updated in time. | 3,12,13,20–22,25–30,32–37,39,41,43 |
consistency | The degree of internal and external consistency of EHR data should meet the indicators claimed by managers. | The patient EHR is consistent within the department and between internal and external departments such as joint diagnosis teams and medical insurance. | 3,10,12,21,23,25–39,41–43 |
precision | The qualitative or quantitative precision of the EHR data should meet the level claimed by the data set producer. | The localization of the breast mass should be done according to the combination of the clock face and the quadrant, and the height should be recorded to the centimeter. | 12,21,28,31,37 |
conformance | EHR data are stored, processed, and circulated in a standardized format. | The information of the EHR is registered according to the department’s digital code and the patient's ID number. | 9,21,23,24,30,32,35,39,40,42,44 |
uniqueness | There is no duplication of EHR data records; some data must remain unique. | Patient medical records are registered with the ID number as the primary key, which must be kept unique to avoid duplication. | 12,21,24,25,28,31,32,37,38 |
credibility | EHR data are sourced from professional institutions; data are frequently reviewed. | The diagnosis data in the EHR need to be filled in by the attendings of the professional hospital. | 21,23,25–28,30,33–35,38–40 |
plausibility | The values of EHR data are reasonable. | Whether data such as the patient’s age and condition conform to common sense. | 3,9,24,27,29,32,39,40,44 |
traceability | Ensure the auditability of the EHR data access trail and change trail. | EHR data can track information such as diagnosis and treatment institutions and doctors. | 25,28,31,34,36,37 |
portability | The extent to which EHR data can be stored, replaced, or transferred from one system to another while maintaining the existing quality claimed by the data producer. | EHR data can be copied and stored from medical institutions to insurance institutions according to certain specifications, providing support for medical insurance settlements. | 37 |
usability | EHR data should be available at the level claimed by the data administrator. | The attendings can retrieve the patient’s previous physical examination data from the EHR system to effectively diagnose and judge the current disease. | 21,22,32,37,38 |
accessibility | EHR data can be easily accessed and extracted with a user-friendly interface. | A professional website lets attendings and patients easily access and obtain EHR data. | 20–24,34,36–39 |
relevance | There is some desired correlation between EHR data. | The patient ID number of the EHR is associated with information such as diagnosis, treatment, and medical reimbursement. | 21,22,25,26,30,32,34 |
applicability | The content recorded in the EHR is suitable for the patient's health management and disease diagnosis and treatment; the extracted data are suitable for the research or diagnosis carried out. | The patient’s physical examination information recorded in the EHR provides relevant support for the diagnosis. | 22,26,28,31,32 |
understandability | The preview and interpretation levels of EHR data should be at the level claimed by the data collector. | The information conveyed by EHR data can be understood clearly and easily by doctors or other users. | 20–24,32,34,36–38 |
Indicator . | Definition . | Example . | Reference . |
---|---|---|---|
accuracy | The patient information recorded in the EHR system is consistent with the actual situation of the patient. | Whether the patient’s information is registered accurately (name, age, etc). | 3,10,12,20–39 |
completeness | The patient information recorded in the EHR system is detailed and complete. | Whether the patient’s personal information and condition information are complete. | 3,9,10,12,13,20,21,23–44 |
timeliness | The patient status recorded in the EHR system is timely and effective. | When the patient’s condition changes, relevant information is updated in time. | 3,12,13,20–22,25–30,32–37,39,41,43 |
consistency | The degree of internal and external consistency of EHR data should meet the indicators claimed by managers. | The patient EHR is consistent within the department and between internal and external departments such as joint diagnosis teams and medical insurance. | 3,10,12,21,23,25–39,41–43 |
precision | The qualitative or quantitative precision of the EHR data should meet the level claimed by the data set producer. | The localization of the breast mass should be done according to the combination of the clock face and the quadrant, and the height should be recorded to the centimeter. | 12,21,28,31,37 |
conformance | EHR data are stored, processed, and circulated in a standardized format. | The information of the EHR is registered according to the department’s digital code and the patient's ID number. | 9,21,23,24,30,32,35,39,40,42,44 |
uniqueness | There is no duplication of EHR data records; some data must remain unique. | The patient medical record data are registered according to the ID number as the main keyword needs to be kept unique to avoid duplication. | 12,21,24,25,28,31,32,37,38 |
credibility | EHR data are sourced from professional institutions; data are frequently reviewed. | The diagnosis data in the EHR need to be filled in by the attendings of the professional hospital. | 21,23,25–28,30,33–35,38–40 |
plausibility | The value of EHR data are reasonable. | Whether the data such as the patient’s age and condition conform to common sense. | 3,9,24,27,29,32,39,40,44 |
traceability | Ensure the auditability of the EHR data access trail and change trail. | EHR data can track information such as diagnosis and treatment institutions and doctors. | 25,28,31,34,36,37 |
portability | The extent to which EHR data can be stored, replaced, or transferred from one system to another while maintaining the existing quality claimed by the data producer | EHR data can be copied and stored from medical institutions to insurance institutions according to certain specifications, providing support for medical insurance settlements. | 37 |
usability | EHR data should be available at the level claimed by the data administrator. | The attendings can retrieve the patient’s previous physical examination data from the EHR system to effectively diagnose and judge the current disease. | 21,22,32,37,38 |
accessibility | EHR data can be easily accessed and extracted with a user-friendly interface. | EHR data provide a professional website for the attendings and patients to easily access and obtain. | 20–24,34,36–39 |
relevance | There is some desired correlation between EHR data. | The patient ID number of the EHR is associated with information such as diagnosis, treatment, and medical reimbursement. | 21,22,25,26,30,32,34 |
applicability | The content recorded in the EHR is suitable for the patient's health management and disease diagnosis and treatment; the extracted data are suitable for the research or diagnosis carried out. | The patient’s physical examination information recorded in the EHR provides relevant support for the diagnosis. | 22,26,28,31,32 |
understandability | The preview and interpretation levels of EHR data should be at the level claimed by the data collector. | The information explained by EHR data can be understood by doctors or other users clear and easily. | 20–24,32,34,36–38 |
EHR = electronic health record.
Indicator . | Definition . | Example . | Reference . |
---|---|---|---|
accuracy | The patient information recorded in the EHR system is consistent with the actual situation of the patient. | Whether the patient’s information is registered accurately (name, age, etc). | 3,10,12,20–39 |
completeness | The patient information recorded in the EHR system is detailed and complete. | Whether the patient’s personal information and condition information are complete. | 3,9,10,12,13,20,21,23–44 |
timeliness | The patient status recorded in the EHR system is timely and effective. | When the patient’s condition changes, relevant information is updated in time. | 3,12,13,20–22,25–30,32–37,39,41,43 |
consistency | The degree of internal and external consistency of EHR data should meet the indicators claimed by managers. | The patient EHR is consistent within the department and between internal and external departments such as joint diagnosis teams and medical insurance. | 3,10,12,21,23,25–39,41–43 |
precision | The qualitative or quantitative precision of the EHR data should meet the level claimed by the data set producer. | The localization of the breast mass should be done according to the combination of the clock face and the quadrant, and the height should be recorded to the centimeter. | 12,21,28,31,37 |
conformance | EHR data are stored, processed, and circulated in a standardized format. | The information of the EHR is registered according to the department’s digital code and the patient's ID number. | 9,21,23,24,30,32,35,39,40,42,44 |
uniqueness | There is no duplication of EHR data records; some data must remain unique. | The patient medical record data are registered according to the ID number as the main keyword needs to be kept unique to avoid duplication. | 12,21,24,25,28,31,32,37,38 |
credibility | EHR data are sourced from professional institutions; data are frequently reviewed. | The diagnosis data in the EHR need to be filled in by the attendings of the professional hospital. | 21,23,25–28,30,33–35,38–40 |
plausibility | The value of EHR data are reasonable. | Whether the data such as the patient’s age and condition conform to common sense. | 3,9,24,27,29,32,39,40,44 |
traceability | Ensure the auditability of the EHR data access trail and change trail. | EHR data can track information such as diagnosis and treatment institutions and doctors. | 25,28,31,34,36,37 |
portability | The extent to which EHR data can be stored, replaced, or transferred from one system to another while maintaining the existing quality claimed by the data producer | EHR data can be copied and stored from medical institutions to insurance institutions according to certain specifications, providing support for medical insurance settlements. | 37 |
usability | EHR data should be available at the level claimed by the data administrator. | The attendings can retrieve the patient’s previous physical examination data from the EHR system to effectively diagnose and judge the current disease. | 21,22,32,37,38 |
accessibility | EHR data can be easily accessed and extracted with a user-friendly interface. | EHR data provide a professional website for the attendings and patients to easily access and obtain. | 20–24,34,36–39 |
relevance | There is some desired correlation between EHR data. | The patient ID number of the EHR is associated with information such as diagnosis, treatment, and medical reimbursement. | 21,22,25,26,30,32,34 |
applicability | The content recorded in the EHR is suitable for the patient's health management and disease diagnosis and treatment; the extracted data are suitable for the research or diagnosis carried out. | The patient’s physical examination information recorded in the EHR provides relevant support for the diagnosis. | 22,26,28,31,32 |
understandability | The preview and interpretation levels of EHR data should be at the level claimed by the data collector. | The information explained by EHR data can be understood by doctors or other users clear and easily. | 20–24,32,34,36–38 |
Indicator . | Definition . | Example . | Reference . |
---|---|---|---|
accuracy | The patient information recorded in the EHR system is consistent with the actual situation of the patient. | Whether the patient’s information is registered accurately (name, age, etc). | 3,10,12,20–39 |
completeness | The patient information recorded in the EHR system is detailed and complete. | Whether the patient’s personal information and condition information are complete. | 3,9,10,12,13,20,21,23–44 |
timeliness | The patient status recorded in the EHR system is timely and effective. | When the patient’s condition changes, relevant information is updated in time. | 3,12,13,20–22,25–30,32–37,39,41,43 |
consistency | The degree of internal and external consistency of EHR data should meet the indicators claimed by managers. | The patient EHR is consistent within the department and between internal and external departments such as joint diagnosis teams and medical insurance. | 3,10,12,21,23,25–39,41–43 |
precision | The qualitative or quantitative precision of the EHR data should meet the level claimed by the data set producer. | The localization of the breast mass should be done according to the combination of the clock face and the quadrant, and the height should be recorded to the centimeter. | 12,21,28,31,37 |
conformance | EHR data are stored, processed, and circulated in a standardized format. | The information of the EHR is registered according to the department’s digital code and the patient's ID number. | 9,21,23,24,30,32,35,39,40,42,44 |
uniqueness | There is no duplication of EHR data records; some data must remain unique. | The patient medical record data are registered according to the ID number as the main keyword needs to be kept unique to avoid duplication. | 12,21,24,25,28,31,32,37,38 |
credibility | EHR data are sourced from professional institutions; data are frequently reviewed. | The diagnosis data in the EHR need to be filled in by the attendings of the professional hospital. | 21,23,25–28,30,33–35,38–40 |
plausibility | The value of EHR data are reasonable. | Whether the data such as the patient’s age and condition conform to common sense. | 3,9,24,27,29,32,39,40,44 |
traceability | Ensure the auditability of the EHR data access trail and change trail. | EHR data can track information such as diagnosis and treatment institutions and doctors. | 25,28,31,34,36,37 |
portability | The extent to which EHR data can be stored, replaced, or transferred from one system to another while maintaining the existing quality claimed by the data producer | EHR data can be copied and stored from medical institutions to insurance institutions according to certain specifications, providing support for medical insurance settlements. | 37 |
usability | EHR data should be available at the level claimed by the data administrator. | The attendings can retrieve the patient’s previous physical examination data from the EHR system to effectively diagnose and judge the current disease. | 21,22,32,37,38 |
accessibility | EHR data can be easily accessed and extracted with a user-friendly interface. | EHR data provide a professional website for the attendings and patients to easily access and obtain. | 20–24,34,36–39 |
relevance | There is some desired correlation between EHR data. | The patient ID number of the EHR is associated with information such as diagnosis, treatment, and medical reimbursement. | 21,22,25,26,30,32,34 |
applicability | The content recorded in the EHR is suitable for the patient's health management and disease diagnosis and treatment; the extracted data are suitable for the research or diagnosis carried out. | The patient’s physical examination information recorded in the EHR provides relevant support for the diagnosis. | 22,26,28,31,32 |
understandability | The preview and interpretation levels of EHR data should be at the level claimed by the data collector. | The information conveyed by EHR data can be understood clearly and easily by doctors or other users. | 20–24,32,34,36–38 |
EHR = electronic health record.
Descriptive statistics
In the survey, we distributed 210 questionnaires, collected 198 questionnaires, and eliminated 3 invalid questionnaires, resulting in a total of 195 valid questionnaires (92.86%).
We report demographic details for gender, age, educational background, job role, years of clinical work experience, years of EHR system usage, and skill level in using the EHR system (see Table 3). The average age of participants was 36.74 years, the average length of clinical work experience was 13.73 years, the average length of EHR system usage was 7.92 years, and almost three fourths of the participants were skilled or very skilled in using the EHR system. The proportions of women among hospital supervisors, doctors, nurses, and researchers were 72.7% (8/11), 38.0% (35/92), 94.1% (64/68), and 25.0% (6/24), respectively.
Variable | Category | N | Percent in total (%)
---|---|---|---
Gender | Male | 82 | 42.1
 | Female | 113 | 57.9
Age (years) | −28 | 39 | 20.0
 | 28-35 | 52 | 26.7
 | 35-40 | 38 | 19.5
 | 40-50 | 56 | 28.7
 | 50- | 10 | 5.1
Degree | Associate | 3 | 1.5
 | Bachelor | 115 | 59.0
 | Master | 69 | 35.4
 | PhD or MD | 8 | 4.1
Job role | Hospital supervisor | 11 | 5.6
 | Doctor | 92 | 47.2
 | Nurse | 68 | 34.9
 | Clinical researcher | 24 | 12.3
Years of clinical work | −5 | 49 | 25.1
 | 5-10 | 35 | 17.9
 | 10-15 | 31 | 15.9
 | 15-25 | 51 | 26.2
 | 25- | 29 | 14.9
Years of EHR system use | −5 | 64 | 32.8
 | 5-10 | 92 | 47.2
 | 10-15 | 29 | 14.9
 | 15-25 | 7 | 3.6
 | 25- | 3 | 1.5
EHR skill level | Very unskilled | 3 | 1.5
 | Unskilled | 4 | 2.1
 | Generally skilled | 43 | 22.1
 | Skilled | 107 | 54.9
 | Very skilled | 38 | 19.5
EHR = electronic health record.
Consistency
Cronbach’s α coefficient was 0.980 and Kendall’s W coefficient was 0.730, indicating that the questionnaire has high internal consistency and that the results are reliable.
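As an illustration of the internal-consistency check, the following is a minimal sketch of Cronbach’s α computed from a respondents-by-items score matrix. The function name and the small rating matrix are hypothetical and are not the study data; the sketch assumes complete responses and sample variances.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point ratings: 5 respondents x 4 indicators (illustration only).
ratings = [
    [5, 5, 4, 5],
    [4, 4, 4, 4],
    [3, 3, 2, 3],
    [5, 4, 5, 5],
    [2, 2, 2, 1],
]
alpha = cronbach_alpha(ratings)
```

Values of α close to 1 indicate that the items vary together across respondents, which is the sense in which the questionnaire is described as internally consistent.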
Kruskal-Wallis test
The Kruskal-Wallis (KW) test was used to examine differences among the 4 groups of participants on each of the 16 indicators. The P-values after the Bonferroni correction are shown in Figure 3 and ranged from .002 to .057. Only the adjusted P-value for timeliness exceeded .05 (P = .057), indicating no significant difference; the adjusted P-values for the other 15 indicators were all below .05, demonstrating statistically significant differences among the 4 types of users.
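The test for a single indicator can be sketched as follows. This is a minimal pure-Python illustration, not the authors' analysis code: the function names and the four small score lists are hypothetical, the P-value uses the closed-form chi-square tail for 3 degrees of freedom (which matches 4 groups only), and the Bonferroni step assumes the raw P-value is multiplied by the number of tests, since the exact correction scheme is not described here.

```python
import math


def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of score lists."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    x = max(x, 0.0)
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def bonferroni(p_values):
    """Bonferroni adjustment: multiply by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical scores for one indicator from 4 user groups (illustration only).
groups = [[5, 5, 4, 5, 4], [5, 4, 4, 5, 5], [4, 4, 3, 4, 5], [3, 4, 3, 4, 3]]
h_stat = kruskal_wallis_h(groups)
p_raw = chi2_sf_df3(h_stat)
```

In practice a statistics package would normally be used instead; for example, SciPy provides `scipy.stats.kruskal`, which returns the H statistic and P-value for any number of groups.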

Entropy weight
As an intermediate step in the comprehensive evaluation, the entropy weight method was applied, yielding entropy weights ranging from 5.33% to 7.62% (refer to Table S1).
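The entropy weight method can be sketched as below. This follows the common textbook formulation (column proportions, normalized Shannon entropy, weights proportional to 1 minus entropy); whether the authors applied an additional normalization step is not stated, so that detail, the function name, and the toy matrix are assumptions.

```python
import math


def entropy_weights(matrix):
    """Entropy weights for the columns of a respondents x indicators matrix.

    Assumes all scores are positive (for example 1-5 ratings) and that there
    are at least two respondents.
    """
    n = len(matrix)
    m = len(matrix[0])
    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        proportions = [x / total for x in column]
        # Shannon entropy, normalized to [0, 1] by ln(n).
        entropy = -sum(p * math.log(p) for p in proportions if p > 0) / math.log(n)
        divergences.append(1 - entropy)
    spread = sum(divergences)
    if spread <= 0:
        raise ValueError("no column carries any variation")
    return [d / spread for d in divergences]


# Hypothetical ratings: 3 respondents x 2 indicators (illustration only).
scores = [[5, 5], [4, 1], [5, 3]]
weights = entropy_weights(scores)
```

An indicator on which respondents disagree more has lower entropy and therefore receives a larger weight; with 16 indicators, perfectly equal weights would be 6.25% each, which is consistent with the reported range being fairly narrow around that value.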
Fuzzy comprehensive evaluation
The fuzzy comprehensive evaluation of the DQAI system was performed using the weighted average M(∧,+) operator and the maximum membership rule. As shown in Table 4, the membership degree of “very important” was 62.5% and the comprehensive evaluation score was 90.445 points. The overall evaluation of the DQAI system was therefore “very important,” indicating strong acceptance of the system among the survey respondents.
 | Very unimportant | Unimportant | Generally important | Important | Very important
---|---|---|---|---|---
Membership weights | 0.017 | 0.004 | 0.03 | 0.325 | 0.625 (a)
Composite score | 90.445 (b) | | | |
(a) The score signifies a ‘very important’ classification at a 62.5% membership level.
(b) The score of 90.445 points denotes the outcome of the comprehensive evaluation.
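The fuzzy comprehensive evaluation step can be sketched as follows. The sketch reads the weighted average operator as an ordinary weighted sum of membership degrees, which is an assumption about how the operator was applied; the grade scores of 20 to 100, the function names, and the toy ratings are also hypothetical, so the output is not intended to reproduce the reported composite score.

```python
GRADES = [
    "very unimportant",
    "unimportant",
    "generally important",
    "important",
    "very important",
]
GRADE_SCORES = [20, 40, 60, 80, 100]  # assumed scoring of the 5 grades


def membership_matrix(ratings, n_grades=5):
    """Share of respondents choosing each grade, per indicator.

    ratings: respondents x indicators, integer scores 1..n_grades.
    Returns an indicators x grades matrix whose rows each sum to 1.
    """
    n = len(ratings)
    m = len(ratings[0])
    matrix = []
    for j in range(m):
        counts = [0] * n_grades
        for row in ratings:
            counts[row[j] - 1] += 1
        matrix.append([c / n for c in counts])
    return matrix


def fuzzy_evaluate(weights, memberships):
    """Weighted-sum composition, maximum membership rule, and composite score."""
    n_grades = len(memberships[0])
    vector = [
        sum(w * row[k] for w, row in zip(weights, memberships))
        for k in range(n_grades)
    ]
    best = max(range(n_grades), key=lambda k: vector[k])
    score = sum(b * s for b, s in zip(vector, GRADE_SCORES))
    return vector, GRADES[best], score


# Hypothetical data: 4 respondents x 2 indicators with equal weights.
toy_ratings = [[5, 4], [5, 5], [4, 5], [5, 3]]
vector, label, score = fuzzy_evaluate([0.5, 0.5], membership_matrix(toy_ratings))
```

With these toy inputs the membership vector is [0, 0, 0.125, 0.25, 0.625], the maximum membership rule selects "very important", and the composite score is 90.0.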
Dimension construction
The structural validity of the DQAI system was analyzed using principal component analysis. The Kaiser-Meyer-Olkin (KMO) value was 0.959 and the Bartlett sphericity test was significant (P < .001). With the number of factors fixed at 2 and a varimax rotation applied, the total variance explained reached 82.416%, indicating that the 2 factors retain most of the information in the indicators (refer to Table S2). Among the 16 indicators, 9 (accuracy, completeness, timeliness, consistency, precision, conformance, uniqueness, credibility, and plausibility) loaded on common factor 1, while the remaining 7 (traceability, portability, usability, accessibility, relevance, applicability, and understandability) loaded on common factor 2. The loadings on the 2 common factors ranged from 0.652 to 0.847. Cronbach’s α was then recomputed for each common factor, giving 0.967 (common factor 1) and 0.968 (common factor 2). These results show that the questionnaire has solid structural validity and that the 16 indicators divide well into 2 common factors.
For the first common factor, we observed that the indicators it contains can be easily quantified and standardized, and they can be used to describe the characteristics of the data itself. Therefore, we designate the 9 indicators belonging to common factor 1 as structural indicators, which form the structural dimension. For the second common factor, its indicators are more suited to characterizing attributes that are intimately associated with the generation, utilization, and dissemination of data. Therefore, we designate the 7 indicators belonging to common factor 2 as relational indicators, which form the relational dimension.
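The grouping above can be written down as a simple mapping from indicators to dimensions. The sketch below summarizes one respondent's ratings as an unweighted mean per dimension; this scoring rule and the function name are assumptions made for illustration, and the study may have used factor scores instead.

```python
STRUCTURAL = [
    "accuracy", "completeness", "timeliness", "consistency", "precision",
    "conformance", "uniqueness", "credibility", "plausibility",
]
RELATIONAL = [
    "traceability", "portability", "usability", "accessibility",
    "relevance", "applicability", "understandability",
]


def dimension_scores(response):
    """Return (structural mean, relational mean) for one respondent.

    response: dict mapping each of the 16 indicator names to a 1-5 rating.
    """
    structural = sum(response[name] for name in STRUCTURAL) / len(STRUCTURAL)
    relational = sum(response[name] for name in RELATIONAL) / len(RELATIONAL)
    return structural, relational


# Hypothetical respondent: rates everything 4 except accuracy, rated 5.
example = {name: 4 for name in STRUCTURAL + RELATIONAL}
example["accuracy"] = 5
structural_mean, relational_mean = dimension_scores(example)
```

Each respondent then becomes a single point with one coordinate per dimension, which is the kind of input a dimension-level scatter plot requires.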
Preferences analysis for 4 different types of EHR users
Indicator-level preferences analysis
We can observe the mean values for the 4 groups of EHR users on each indicator from the heatmap (see Figure 4). For the 16 indicators, 3 key insights emerge in a highly intuitive way: First, there is a striking similarity in the mean scores between doctors (ranging from 4.43 to 4.66) and nurses (ranging from 4.53 to 4.63) on the heatmap, indicating a high level of agreement regarding the importance of all indicators within our DQAI system. This correlation is expected, given their primary roles as the principal producers and consumers of EHR data, underscoring their recognition of the significance of each indicator in ensuring data quality. Second, researchers rated credibility the highest, with a mean of 4.79, while their mean values for other indicators are slightly lower than those of doctors and nurses. As secondary users of EHR data, researchers rely on high credibility of data to ensure accuracy in tasks such as training computer-assisted diagnosis models. Third, aside from accuracy, supervisors exhibit a lower concern for other indicators, as evidenced by their mean values for portability (3.73), understandability (3.82), and accessibility (3.91), with a large range of mean values (3.73-4.55). Unlike operational and technical EHR users, supervisors, from a managerial perspective, primarily focus on accuracy to obtain precise reports.

Figure 4. Heatmap of the mean score for each indicator across the 4 job roles.
The detailed density distribution of scores for the 4 groups is shown in Figure 5. The X-axis represents the scores (1-5). The Y-axis represents the density at each score; for example, the more participants who chose a score of 5, the higher the density at 5.

Figure 5. Density distribution of scores for each indicator across the 4 job roles.
The overall density distributions of the different types of users are similar on most indicators, with higher density at scores of 4 and 5. In general, the highest density for doctors is at a score of 5, while for nurses it is at a score of 4. Doctors show a more pronounced gap between scores of 4 and 5 (higher density at 5 and lower density at 4), while nurses show a smaller gap (scores of 4 and 5 are more evenly distributed). This may relate to factors such as job-related stress, professional roles, personality traits, or demographic differences. For instance, 94.1% of the nurses are female, which may be associated with a lower tendency to consistently give the highest score of 5. The density plots also corroborate the observations from the heatmap analysis. For example, researchers show only slight density differences between scores of 4 and 5 on most indicators, but a clear concentration at 5 for credibility. Supervisors, in contrast, have the widest density range, indicating more dispersed ratings, which aligns with their generally lower concern for these indicators as shown in Figure 4.
Dimension-level preference analysis
As shown in Figure 6, the X and Y coordinates in the scatter plot represent the relational and structural dimensions, respectively. Samples with different job roles are shown with different colors and shapes. Based on Figures 4 and 6, we can observe the following: (1) Hospital supervisors are more inclined towards the quantifiable structural dimension. (2) Doctors place strong emphasis on both the relational and structural dimensions, although some extremely low values are observed. (3) Nurses also attach great importance to both dimensions, similar to doctors; the difference is that the dots in the scatter plot are more concentrated for nurses. The indicators with the top 3 mean values for nurses are accuracy, uniqueness, and plausibility. (4) Clinical researchers pay more attention to quantifiable structural indicators, similar to hospital supervisors. The indicators with the top 3 mean values for clinical researchers are conformance, uniqueness, and credibility.

Figure 6. Dimension-level perspective across the 4 job roles.
Discussion
This study developed an EHR DQAI system comprising 16 indicators, which can be categorized into 2 dimensions: the structural dimension and the relational dimension. The structural dimension indicators can be readily standardized and pertain to the structure of the data itself, while the relational dimension includes indicators associated with the dynamics of data production, utilization, and sharing. The preference analysis shows that different types of users have different preferences toward the EHR DQAI system at both the indicator level and the dimension level. At the indicator level, doctors and nurses generally attach great importance to all indicators, while supervisors generally attach less importance to the indicators, with the exception of accuracy. Researchers place a similar level of importance on all indicators except credibility, which they rate highest. At the dimension level, doctors and nurses show great interest in both dimensions, while clinical researchers and hospital supervisors focus more on structural dimension indicators.
Doctors and nurses are the 2 largest sample groups in this study and the largest producers and users of EHR data; they believe that convenient data input and fast access to accurate EHR data can greatly improve their work efficiency. Therefore, they recognize the importance of the DQAI system and prefer the relational dimension, since relational indicators help them operate and share EHR data easily. The difference between doctors and nurses is that the means for nurses are more evenly distributed, which may be due to factors such as differences in gender traits, education, job responsibilities, and pressure between the 2 groups. For doctors, some extremely low values are observed in the 2 dimensions (see Figure 4), suggesting that doctors working under high pressure are more likely to have negative emotions toward EHR data. Most nurses are female (94.1%). They are more sensitive to details such as whether the data are consistent with common sense or whether there are multiple cases with different keywords for the same patient. Therefore, they score high on plausibility and accessibility, and their distribution on the scatter plot is also more concentrated in the high-scoring area of the relational dimension. Although hospital supervisors do not attach great importance to the DQAI system overall, they are clearly interested in structural dimension indicators, especially accuracy. They are usually the owners and users of EHR data, determining the content and permissions for data retrieval, and they usually focus more on the efficiency and accuracy of obtaining reports than on data generation and other operational tasks. Researchers often need to reuse EHR data for research and modeling. For example, they may reuse EHR data to develop a model that predicts the risk of patient death, and data quality directly affects the effectiveness of the model. Therefore, credibility is particularly important for them.
Because different types of users have different needs and preferences in different data usage scenarios, we not only systematically constructed a DQAI system but also analyzed and verified it through an empirical investigation. Our study makes 3 main contributions. First, we used statistical methods to categorize the dimensions of the DQAI system. This approach addresses the shortcomings of previous studies that relied on expert experience or literature review.3,11 Second, we broadened the perspective of previous research, which was limited to an object-oriented viewpoint.8–10 From a user-oriented perspective, we focused on the perceptions and opinions of EHR data users, as these influence their willingness to use EHR systems, which, in turn, affects the quality of EHR data. Third, the results of this study are rooted in real-world usage scenarios, enabling the formulation of practical recommendations for improving EHR data quality and hospital management. Specific recommendations include: (1) Doctors and nurses, as the main producers and users of EHR data, could benefit from more user-friendly interfaces designed using relational indicators. (2) Hospital supervisors, who act as both users and regulators, emphasize data accuracy and the overall performance of EHR DQAI systems. Data reporting functions can be enhanced based on their needs, enabling easy and quick retrieval of reports. (3) For clinical researchers, data credibility is crucial. Because they are secondary users of EHR data, they can be provided with convenient data querying and analysis tools tailored to their practices.
This study has several limitations. First, the sample size is relatively small. In the future, we plan to increase the sample size, specifically including more hospital supervisors and clinical researchers. Second, the participants of the empirical investigation were selected from one area in China. A possible line of future research is to validate the results of this study using data collected from other areas in China or from other countries. Third, this study focuses on preferences for DQAI across different job roles. We plan to expand this study by analyzing the preferences from other perspectives.
Conclusions
This study develops and evaluates an EHR DQAI system composed of 16 indicators in 2 dimensions using a systematic approach. It takes a user-oriented perspective that has been overlooked in the past and discovers differences in preferences for DQAI among different user groups. This can provide new insights for the optimization and development of EHR systems, as well as policy formulation by relevant agencies, thereby enhancing the quality of EHR data throughout its lifecycle, from inception in data production to application in policy-making.
Acknowledgments
The authors would like to acknowledge the support of the National Social Science Foundation of China, Beijing Normal University, and Yichang Central People’s Hospital.
Author contributions
Study concepts and design: Liu Yang, Yirong Wu; data evaluation and collecting: Mudan Ren, Ji Lu; data analysis: Liu Yang, Shuifa Sun; original manuscript drafting: Liu Yang; editing manuscript: Yirong Wu, Shuifa Sun; approval of the final version of the submitted manuscript: all authors; agree to ensure any questions related to the work are appropriately resolved: all authors.
Supplementary material
Supplementary material is available at JAMIA Open online.
Funding
This work was supported by the project “Research on data quality of electronic medical record based on data semantics” of the National Social Science Foundation of China, grant number 20BTQ-066.
Conflicts of interest
There are no competing interests to declare.
Data availability
The research data can be made available upon reasonable request from the corresponding author. The data are not publicly available due to privacy restrictions.