Abstract

Objectives

This study aims to investigate whether different types of electronic health record (EHR) users have distinct preferences for data quality assessment indicators (DQAI) and explore how these preferences can guide the enhancement of EHR systems and the optimization of related policies.

Materials and Methods

High-frequency indicators were identified by a systematic literature review to construct a DQAI system, which was assessed by a user-oriented investigation involving doctors, nurses, hospital supervisors, and clinical researchers. The entropy weight method and fuzzy comprehensive evaluation model were employed for the comprehensive evaluation of the system. Exploratory factor analysis was used to construct dimensions, and visualization analysis was utilized to explore preferences at both the indicator and dimension levels.

Results

Sixteen indicators were identified to construct the DQAI system and grouped into 2 dimensions: structural and relational. The DQAI system achieved a comprehensive evaluation score of 90.445, corresponding to a “very important” membership level (62.5%). Doctors and nurses had higher mean scores (4.43-4.66 out of 5) than supervisors (3.73-4.55 out of 5). Researchers emphasized credibility, with a mean score of 4.79 out of 5.

Discussion

The findings reveal that different types of EHR users exhibit distinct preferences for the DQAI at both indicator and dimension levels. Doctors and nurses considered all indicators important, clinical researchers emphasized credibility, and supervisors focused mainly on accuracy. Indicators in the relational dimension were generally more valued than structural ones. Doctors and nurses prioritized indicators in the relational dimension, while researchers and supervisors leaned towards indicators in the structural dimension. These insights suggest that tailored approaches in EHR system development and policy-making could enhance EHR data quality.

Conclusion

This study underscores the importance of user-centered approaches in optimizing EHR systems, highlighting diverse user preferences at both indicator and dimension levels.

Lay Summary

This study investigates how different types of users of electronic health records (EHRs), including doctors, nurses, hospital supervisors, and clinical researchers, prioritize various data quality indicators. By understanding these preferences, this study aims to improve EHR data quality, optimize data management, and guide related policy-making. Unlike previous studies that relied on subjective expert experiences or literature reviews, we used quantitative statistical methods to build a scientific and reliable data quality assessment indicator (DQAI) system. Additionally, we broadened the scope of previous studies, which often focused only on specific contexts or events, by incorporating the perspectives of diverse EHR users. By emphasizing the views of those directly involved with EHR data, our findings highlight the impact of user perceptions on their willingness to use EHR systems, which directly influences data quality. The results showed that relational and structural indicators were generally valued by doctors and nurses, while researchers prioritized credibility and supervisors focused on accuracy. These insights suggest that tailored strategies in EHR system development can better meet users’ needs, ultimately enhancing data quality and healthcare outcomes.

Introduction

With the increasing application of digital health technology, electronic health record (EHR) data, as a primary source of healthcare information,1 play a crucial role in this digital transformation. The quality of EHR data has become a research focus. High-quality EHR data are essential for enhancing the efficiency of digital healthcare, facilitating data analysis, and supporting clinical decision-making processes. However, a lack of trust in EHR data quality remains an obstacle to their utilization.2

One widely adopted quality concept comes from Juran, who defined quality as “fitness-for-use.”3,4 In terms of data quality (DQ), data are deemed of sufficient quality when they meet the needs of specific users pursuing specific goals. Therefore, the evaluation of DQ must be tailored to the context of the task for which the data are intended.5,6 Under the concept of fitness-for-use, studies on EHR DQ assessment indicators (DQAI) have been conducted from multiple perspectives. First, different information systems show different degrees of data integration, so different DQAI, such as accuracy, completeness, and consistency, need to be evaluated.7 Second, for different data types, such as structured data, unstructured data, and semi-structured data, different focuses on DQAI have been noted. Since structured data are easier to standardize and use than semi-structured and unstructured data,8 many scholars have studied the impact of structured data on improving EHR DQ.7 The DQAI examined include completeness,8,9 correctness,8 consistency,8 conformance,9 and plausibility.9 Third, some studies are conducted for different purposes. By evaluating the DQ of secondary clinical data, researchers can determine whether the data are suitable for their specific research purposes.3,10,11 Studies oriented toward clinical care have observed that poor-quality data in primary healthcare settings could lead to inadequate patient care, with different DQAI being applied to identify the issues.12,13

Most studies on EHR DQAI under fitness-for-use analyze data quality from a particular perspective, such as different systems, data types, or intended uses, an approach we term the object-oriented perspective. In many cases, these studies neglect the actual experiences and requirements of end-users directly engaged with EHR data (referred to as the user-oriented perspective), an oversight that can significantly influence EHR DQ. Therefore, in this study, we attempt to expand the research focus from the object-oriented to the user-oriented. Generally, EHR users can include doctors, nurses, hospital supervisors, clinical researchers, health organizations, and industry partners. This study specifically targets the users most directly involved in EHR data generation and application: doctors, nurses, hospital supervisors, and clinical researchers. We first constructed an EHR DQAI system (some preliminary results have been briefly reported14). Then, using data from a questionnaire survey, a quantitative methodology was used to delineate DQAI dimensions and explore the views of different groups on the DQAI system at both the indicator level and dimension level. Ultimately, some recommendations were provided to improve EHR data quality and policy-making from a user-oriented standpoint.

Materials and methods

Study setting

The procedure of this study can be divided into 5 primary steps: (1) performing a literature review; (2) refining indicators; (3) designing a questionnaire and conducting a survey; (4) constructing dimensions and evaluating the DQAI system; and (5) analyzing the different views of different types of users.

Literature review

A literature review was performed under the guidelines of Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) (see Figure 1).15 A search was conducted up to October 2022 in 2 databases: PubMed, a database for biomedical literature; and Web of Science, a comprehensive citation index database for core journals. The search keywords “Data Quality,” “Electronic Health Record*,” EHR, “Electronic Medical Record*,” and EMR were combined using Boolean operators. Abstracts were screened based on the systematic review checklist of the Critical Appraisal Skills Programme (CASP).16,17 Each abstract was scored by 3 independent assessors, and decisions were made based on consensus or majority vote. This process yielded a set of key papers for further indicator refinement.

PRISMA flow diagram illustrating the systematic review process. A total of 1738 records were retrieved from PubMed and Web of Science databases. After removing 490 duplicates and 78 records published before 2010, 1170 records remained for screening. Of these, 767 were excluded based on titles that did not include the term ‘quality.’ The remaining 403 records underwent abstract screening, where 374 were excluded for lacking direct relevance to fitness-for-use and data quality based on CASP. One additional document, a Chinese official standard, was included. A total of 30 studies were included in the final review.
Figure 1.

Procedure and result of literature review. CASP = Critical Appraisal Skills Programme; EHR = electronic health record; WOS = Web of Science.
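The record counts reported for the PRISMA flow can be checked with simple arithmetic. The following is only a bookkeeping sketch: the variable names are illustrative, and the numbers are the ones stated in the figure description.

```python
# Counts as reported for the PRISMA flow (variable names are illustrative).
retrieved = 1738             # records from PubMed and Web of Science
duplicates = 490             # duplicate records removed
before_2010 = 78             # records published before 2010
excluded_by_title = 767      # titles without the term "quality"
excluded_by_abstract = 374   # abstracts excluded after CASP-based screening
added_standard = 1           # Chinese official standard added on expert advice

screened = retrieved - duplicates - before_2010
abstracts = screened - excluded_by_title
from_review = abstracts - excluded_by_abstract
included = from_review + added_standard

print(screened, abstracts, from_review, included)  # 1170 403 29 30
```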

Indicator refinement

Based on the set of key papers, the EHR DQAI were refined in 3 main steps (see Figure 2): (1) listing and normalizing all relevant indicators mentioned within the set; (2) determining the frequencies of the listed indicators and retaining the indicators with higher frequencies; and (3) discussing and retaining additional important indicators based on input from team members and domain experts. Eventually, the retained indicators with the highest frequency and the indicators suggested by experts, enriched with definitions and examples, formed an EHR DQAI system for the follow-up analysis.
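The three refinement steps can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure actually used in the study: the synonym map is a small hypothetical excerpt, the three "papers" are made up, and the cutoff `min_count` and the function names are assumptions introduced for the example.

```python
from collections import Counter

# Hypothetical synonym map: a small excerpt in the spirit of terminology
# harmonization, not the study's full mapping.
SYNONYMS = {
    "correctness": "accuracy",
    "comprehensiveness": "completeness",
    "currency": "timeliness",
    "concordance": "consistency",
}

def normalize(term: str) -> str:
    """Lower-case a raw term and map known synonyms to one harmonized name."""
    key = term.strip().lower()
    return SYNONYMS.get(key, key)

def indicator_frequencies(papers: list[list[str]]) -> Counter:
    """Count, for each harmonized indicator, how many papers mention it."""
    counts = Counter()
    for terms in papers:
        # A set ensures each paper contributes at most once per indicator.
        counts.update({normalize(t) for t in terms})
    return counts

def retain(counts: Counter, min_count: int, expert_picks: set[str]) -> list[str]:
    """Keep indicators at or above the cutoff, plus expert-suggested ones."""
    kept = {name for name, n in counts.items() if n >= min_count}
    return sorted(kept | expert_picks)

# Toy input: three made-up papers, each listing the terms it uses.
papers = [
    ["Accuracy", "Completeness", "Currency"],
    ["Correctness", "Comprehensiveness"],
    ["accuracy", "portability"],
]
counts = indicator_frequencies(papers)
selected = retain(counts, min_count=2, expert_picks={"portability"})
print(counts)
print(selected)
```

Counting each indicator once per paper, rather than once per mention, is a design choice made here so that a single paper using several synonyms does not inflate the frequency.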

Flow diagram illustrating the process of developing the EHR Data Quality Assessment Indicator (DQAI) system. Indicators from 30 key studies were standardized and listed. According to the ‘fitness for use’ principle, relevant indicators were retained, and irrelevant ones were excluded. Then, high-frequency indicators were prioritized, while low-frequency indicators were removed. This process resulted in 16 high-frequency, ‘fitness for use’-aligned indicators, which were defined and supplemented with examples to finalize the DQAI system.
Figure 2.

Process of indicator refinement. DQAI = data quality assessment indicators; EHR = electronic health record.

Survey design and conduct

Survey design

To investigate the preferences of different types of EHR users, a questionnaire was developed based on the DQAI system, in which a 5-point Likert-type scale was used to measure their perception (score 1-5: from very unimportant to very important). A higher score indicates higher importance of an indicator.

Survey conduct

This study adopted a stratified random sampling method to ensure the representativeness of the questionnaire results. Three institutions in a third-tier city in central China were selected as the survey sites: 2 Class III Grade A hospitals (the highest hospital level in China; these 2 account for 33% of such hospitals in the city) and 1 research-oriented university (accounting for 50% of the universities in the city). This city, as a sub-central city of a developed province, has a concentration of medical resources and a high level of healthcare informatization, so the sample represents an above-average level in China. Coordinators were then appointed at each institution to facilitate the survey.

Before conducting the online survey, we ensured that participants could understand the questionnaire content. On the one hand, we prepared instructional guidelines for the questionnaire to help respondents understand the process and objectives. On the other hand, we invited 2 experienced trainers to supervise the test coordinators during the survey implementation.

Comprehensive evaluation

To reduce the subjectivity inherent in “fuzzy” judgments and make the results more objective, the entropy weight method and the fuzzy comprehensive evaluation model were used to comprehensively evaluate the DQAI system. The entropy weight method was utilized to determine the weight of each indicator: the smaller the entropy of an indicator, the more information it contains and the higher its weight. In general, the higher the comprehensive evaluation score, the better the effect.18 The process of the comprehensive evaluation was composed of the following steps: (1) establishing the factor set, the evaluation set, and the set of entropy weights; (2) constructing the fuzzy relationship matrix; (3) performing the fuzzy comprehensive evaluation; and (4) calculating the comprehensive evaluation score. Appendix S1 provides detailed steps.
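The four steps can be illustrated with a common textbook formulation of the entropy weight method combined with a weighted-average fuzzy operator. This is a sketch under stated assumptions and not necessarily the exact procedure in Appendix S1: the function names, the toy score matrix, and the grade values (20, 40, 60, 80, 100) used to turn the membership vector into a single score are all assumptions made for illustration.

```python
import math

def entropy_weights(scores):
    """Entropy weights for the indicators (columns) of a respondents x indicators matrix."""
    n, m = len(scores), len(scores[0])
    k = 1.0 / math.log(n)                      # scales entropy into [0, 1]
    divergences = []
    for j in range(m):
        column = [row[j] for row in scores]
        total = sum(column)
        # Share of indicator j's total score contributed by each respondent.
        shares = [x / total for x in column]
        entropy = -k * sum(p * math.log(p) for p in shares if p > 0)
        # Smaller entropy -> larger divergence -> larger weight.
        divergences.append(max(0.0, 1.0 - entropy))
    d_sum = sum(divergences)
    if d_sum < 1e-12:                          # all columns uniform: equal weights
        return [1.0 / m] * m
    return [d / d_sum for d in divergences]

def membership_matrix(scores, grades=(1, 2, 3, 4, 5)):
    """Fuzzy relationship matrix: row j is the share of respondents giving indicator j each grade."""
    n, m = len(scores), len(scores[0])
    return [[sum(1 for row in scores if row[j] == g) / n for g in grades]
            for j in range(m)]

def fuzzy_evaluate(weights, relation):
    """Weighted-average operator: B_k = sum_j w_j * r_jk."""
    return [sum(w * row[k] for w, row in zip(weights, relation))
            for k in range(len(relation[0]))]

def comprehensive_score(membership, grade_values=(20, 40, 60, 80, 100)):
    """Collapse the membership vector into one score (grade_values are assumed)."""
    return sum(b * v for b, v in zip(membership, grade_values))

# Toy data: 4 respondents rating 3 indicators on a 5-point Likert scale.
scores = [
    [5, 4, 5],
    [5, 5, 4],
    [4, 3, 5],
    [5, 5, 5],
]
weights = entropy_weights(scores)
relation = membership_matrix(scores)
membership = fuzzy_evaluate(weights, relation)
score = comprehensive_score(membership)
print(weights)
print(membership)
print(score)
```

In this toy matrix the second indicator has the most dispersed ratings, so it has the lowest entropy and receives the largest weight. The membership level of the system is read off as the grade with the largest entry in the membership vector.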

Statistical analysis and dimension construction

Descriptive statistics, such as demographic summaries, were used to describe the sample distribution. Kendall’s W coefficient was used to describe the consistency of participants in the survey. The Kruskal-Wallis test was conducted to examine the differences in each indicator among different types of users. Exploratory factor analysis (EFA)19 using the principal component analysis method was performed for structural validity analysis to define the dimensions of the EHR DQAI system.
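For readers unfamiliar with the two rank-based statistics, the sketch below computes tie-corrected Kendall’s W and the tie-corrected Kruskal-Wallis H statistic using only the Python standard library. It is illustrative only: the function names and the toy inputs are assumptions, and it stops at the test statistics (a p-value for H would usually be taken from a chi-square distribution with one fewer degrees of freedom than the number of groups). The software actually used in the study is not stated in this section.

```python
from collections import Counter

def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1            # positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks

def tie_term(values):
    """Sum of t^3 - t over groups of tied values (0 when there are no ties)."""
    return sum(t ** 3 - t for t in Counter(values).values())

def kendalls_w(ratings):
    """Tie-corrected Kendall's W; ratings[r][i] is rater r's score for item i."""
    m, n = len(ratings), len(ratings[0])
    ranked = [average_ranks(row) for row in ratings]
    rank_sums = [sum(row[i] for row in ranked) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    ties = sum(tie_term(row) for row in ratings)
    # Undefined (zero denominator) if every rater scores all items identically.
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * ties)

def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
    correction = 1 - tie_term(pooled) / (n_total ** 3 - n_total)
    return h / correction

# Toy usage: 3 raters scoring 4 indicators, then one indicator across 3 user groups.
w = kendalls_w([[5, 4, 4, 3], [5, 5, 4, 3], [4, 4, 5, 3]])
h = kruskal_wallis_h([[5, 5, 4, 5], [4, 3, 4], [5, 4, 4, 3]])
print(w, h)
```

Because Likert responses contain many ties, both functions use average ranks and the standard tie corrections; without them the statistics would be understated.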

Preference analysis

Preference analysis for different types of users was conducted through heat maps, kernel density plots, and scatter plots. The analysis was performed at both the indicator and dimension levels.

Results

EHR DQAI

As shown in Figure 1, we obtained 29 papers following the PRISMA literature review process. Additionally, based on experts’ advice, we included an official Chinese standard document issued by the National Medical Products Administration (NMPA). A set of 30 key papers was finally obtained. From this corpus, we extracted 41 indicators, which were subsequently normalized (see Table 1) and analyzed for frequency. Based on the frequency analysis, we retained indicators that appeared more than 4 times. One exception was “portability,” which was retained due to its inclusion in the NMPA standard we supplemented. We finally extracted 16 indicators for our EHR DQAI system (see Table 2),14 which were also used to develop the questionnaire, named the Importance of Electronic Health Records Data Quality Indicators (see Appendix S2).

Table 1.

Harmonization of terminologies.

Harmonized indicator | Terminologies used in the literature
accuracy | accuracy, correctness
completeness | completeness, comprehensiveness
timeliness | timeliness, contemporaneous, currency
consistency | consistency, concordance
precision | precision, correctness
conformance | conformance, conformity, formal medical coding schemes
uniqueness | uniqueness, duplication
credibility | credibility, integrity, believability, reliability, trustworthiness
plausibility | plausibility
traceability | traceability, attributable, contextualization
portability | portability
usability | usability
accessibility | accessibility, available, linkability
relevance | relevance, representativeness
applicability | applicability, flexibility, predictive value, adaptability
understandability | understandability, legible, interpretability, comprehensiveness, definition, readability
Table 2.

The EHR DQAI system.

Indicator | Definition | Example | Reference
accuracy | The patient information recorded in the EHR system is consistent with the actual situation of the patient. | Whether the patient’s information is registered accurately (name, age, etc). | 3,10,12,20–39
completeness | The patient information recorded in the EHR system is detailed and complete. | Whether the patient’s personal information and condition information are complete. | 3,9,10,12,13,20,21,23–44
timeliness | The patient status recorded in the EHR system is timely and effective. | When the patient’s condition changes, relevant information is updated in time. | 3,12,13,20–22,25–30,32–37,39,41,43
consistency | The degree of internal and external consistency of EHR data should meet the indicators claimed by managers. | The patient EHR is consistent within the department and between internal and external departments such as joint diagnosis teams and medical insurance. | 3,10,12,21,23,25–39,41–43
precision | The qualitative or quantitative precision of the EHR data should meet the level claimed by the data set producer. | The localization of the breast mass should be done according to the combination of the clock face and the quadrant, and the height should be recorded to the centimeter. | 12,21,28,31,37
conformance | EHR data are stored, processed, and circulated in a standardized format. | The information of the EHR is registered according to the department’s digital code and the patient’s ID number. | 9,21,23,24,30,32,35,39,40,42,44
uniqueness | There is no duplication of EHR data records; some data must remain unique. | The patient medical record data are registered with the ID number as the main keyword, which needs to be kept unique to avoid duplication. | 12,21,24,25,28,31,32,37,38
credibility | EHR data are sourced from professional institutions; data are frequently reviewed. | The diagnosis data in the EHR need to be filled in by the attendings of the professional hospital. | 21,23,25–28,30,33–35,38–40
plausibility | The values of EHR data are reasonable. | Whether the data such as the patient’s age and condition conform to common sense. | 3,9,24,27,29,32,39,40,44
traceability | Ensure the auditability of the EHR data access trail and change trail. | EHR data can track information such as diagnosis and treatment institutions and doctors. | 25,28,31,34,36,37
portability | The extent to which EHR data can be stored, replaced, or transferred from one system to another while maintaining the existing quality claimed by the data producer. | EHR data can be copied and stored from medical institutions to insurance institutions according to certain specifications, providing support for medical insurance settlements. | 37
usability | EHR data should be available at the level claimed by the data administrator. | The attendings can retrieve the patient’s previous physical examination data from the EHR system to effectively diagnose and judge the current disease. | 21,22,32,37,38
accessibility | EHR data can be easily accessed and extracted with a user-friendly interface. | EHR data provide a professional website for the attendings and patients to easily access and obtain. | 20–24,34,36–39
relevance | There is some desired correlation between EHR data. | The patient ID number of the EHR is associated with information such as diagnosis, treatment, and medical reimbursement. | 21,22,25,26,30,32,34
applicability | The content recorded in the EHR is suitable for the patient’s health management and disease diagnosis and treatment; the extracted data are suitable for the research or diagnosis carried out. | The patient’s physical examination information recorded in the EHR provides relevant support for the diagnosis. | 22,26,28,31,32
understandability | The preview and interpretation levels of EHR data should be at the level claimed by the data collector. | The information explained by EHR data can be understood by doctors or other users clearly and easily. | 20–24,32,34,36–38

EHR = electronic health record.

Descriptive statistics

In the survey, we distributed 210 questionnaires and collected 198. After eliminating 3 invalid questionnaires, 195 valid questionnaires remained (a valid response rate of 92.86%).

We report demographic details for gender, age, educational background, job role, years of clinical work experience, years of EHR system usage, and skill level in using the EHR system (see Table 3). The average age of participants was 36.74 years, the average length of clinical work experience was 13.73 years, the average length of EHR system usage was 7.92 years, and almost three-fourths of the participants were skilled or very skilled in using the EHR system. The proportions of women among hospital supervisors, doctors, nurses, and researchers were 72.7% (8/11), 38.0% (35/92), 94.1% (64/68), and 25.0% (6/24), respectively.

Table 3.

Demographic information (N in total = 195).

Variable | Category | N | Percent in total (%)
Gender | Male | 82 | 42.1
 | Female | 113 | 57.9
Age | −28 | 39 | 20.0
 | 28-35 | 52 | 26.7
 | 35-40 | 38 | 19.5
 | 40-50 | 56 | 28.7
 | 50- | 10 | 5.1
Degree | Associate | 3 | 1.5
 | Bachelor | 115 | 59.0
 | Master | 69 | 35.4
 | PhD or MD | 8 | 4.1
Job role | Hospital supervisor | 11 | 5.6
 | Doctor | 92 | 47.2
 | Nurse | 68 | 34.9
 | Clinical researcher | 24 | 12.3
Years of clinical work experience | −5 | 49 | 25.1
 | 5-10 | 35 | 17.9
 | 10-15 | 31 | 15.9
 | 15-25 | 51 | 26.2
 | 25- | 29 | 14.9
Years of EHR system usage | −5 | 64 | 32.8
 | 5-10 | 92 | 47.2
 | 10-15 | 29 | 14.9
 | 15-25 | 7 | 3.6
 | 25- | 3 | 1.5
Skill level of using EHR system | Very unskilled | 3 | 1.5
 | Unskilled | 4 | 2.1
 | Generally skilled | 43 | 22.1
 | Skilled | 107 | 54.9
 | Very skilled | 38 | 19.5

EHR = electronic health record.

Consistency

Cronbach’s α coefficient was 0.980, indicating high internal consistency of the questionnaire, and Kendall’s W coefficient was 0.730 (χ² = 2266.213, P < .05), indicating agreement among respondents. Together, these suggest that the results are reliable.
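As a reading aid, the internal-consistency statistic can be sketched in a few lines. The function below computes Cronbach's α from a respondents-by-items score table using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of totals). The ratings in `toy` are hypothetical and are not the study data, and `cronbach_alpha` is an illustrative name rather than the authors' code.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a table of scores.

    rows: list of respondents, each a list of k item scores.
    """
    k = len(rows[0])
    # Sample variance of each item (column) across respondents.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 1-5 ratings from five respondents on four indicators.
toy = [
    [5, 5, 4, 5],
    [4, 4, 4, 4],
    [3, 4, 3, 3],
    [5, 4, 5, 5],
    [2, 3, 2, 3],
]
alpha = cronbach_alpha(toy)
print(f"alpha = {alpha:.3f}")
```

Values close to 1 indicate that the items move together across respondents, which is how the reported 0.980 should be read.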

Kruskal-Wallis test

The Kruskal-Wallis (KW) test was employed to examine the differences among the 4 groups of participants across the 16 indicators. The P-values after the Bonferroni correction are shown in Figure 3 and ranged from .002 to .057. Only the adjusted P-value for timeliness exceeded .05 (P = .057), indicating no significant difference; the adjusted P-values for the other 15 indicators were all less than .05, demonstrating statistically significant differences among the 4 types of users.

Figure 3.

P-values of the Kruskal-Wallis test, with Bonferroni correction applied, for each indicator.
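The test for a single indicator can be sketched as follows, using only the Python standard library. The sketch computes the tie-corrected Kruskal-Wallis H statistic for 4 groups, converts it to a P-value with the closed-form chi-square tail for 3 degrees of freedom, and applies a Bonferroni adjustment. Two points are assumptions: the ratings are hypothetical, and the correction is taken to be across the 16 indicators, which this section does not state explicitly.

```python
import math
from collections import Counter


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    # Assign each distinct value the average (mid) rank of its tied block.
    counts = Counter(pooled)
    rank_of, start = {}, 1
    for value in sorted(counts):
        t = counts[value]
        rank_of[value] = start + (t - 1) / 2
        start += t
    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Standard correction factor for ties.
    tie_term = sum(t ** 3 - t for t in counts.values())
    return h / (1 - tie_term / (n ** 3 - n))


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def bonferroni(p, n_tests):
    """Bonferroni-adjusted P-value, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical 1-5 ratings of one indicator by the 4 user groups.
ratings = {
    "supervisor": [3, 4, 3, 4, 3],
    "doctor": [5, 5, 4, 5, 4],
    "nurse": [4, 5, 4, 4, 5],
    "researcher": [4, 4, 5, 3, 4],
}
h = kruskal_wallis_h(list(ratings.values()))
p_raw = chi2_sf_df3(h)  # 4 groups give 3 degrees of freedom
p_adj = bonferroni(p_raw, n_tests=16)  # assumed: one test per indicator
print(f"H = {h:.3f}, raw P = {p_raw:.4f}, adjusted P = {p_adj:.4f}")
```

Because Likert-type ratings contain many ties, the tie correction matters here; without it, H would be understated.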

Entropy weight

As an intermediate step in the comprehensive evaluation, the entropy weight method was applied, yielding entropy weights for the 16 indicators that ranged from 5.33% to 7.62% (refer to Table S1).
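For readers unfamiliar with the method, a minimal sketch of the usual entropy weight calculation is given below. Each indicator column is normalized to proportions, its information entropy is computed, and the weights are proportional to one minus the entropy, so that indicators with more dispersed ratings receive larger weights. The data in `toy` are hypothetical, and the normalization shown (dividing by the column sum) is one common variant assumed here, not necessarily the one the authors used.

```python
import math


def entropy_weights(rows):
    """Entropy weights for a respondents-by-indicators table of positive scores."""
    m, n = len(rows), len(rows[0])
    divergences = []
    for j in range(n):
        column = [row[j] for row in rows]
        total = sum(column)
        proportions = [x / total for x in column]
        # Entropy scaled by ln(m) so that it lies between 0 and 1.
        entropy = -sum(p * math.log(p) for p in proportions if p > 0) / math.log(m)
        divergences.append(1 - entropy)
    scale = sum(divergences)
    return [d / scale for d in divergences]


# Hypothetical 1-5 ratings: four respondents, three indicators.
toy = [
    [4, 1, 5],
    [4, 5, 5],
    [4, 2, 4],
    [5, 5, 5],
]
weights = entropy_weights(toy)
print([round(w, 3) for w in weights])
```

With 16 indicators, perfectly equal weights would each be 1/16 = 6.25%, so the reported range of 5.33% to 7.62% indicates weights that stay fairly close to uniform.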

Fuzzy comprehensive evaluation

The fuzzy comprehensive evaluation of the DQAI system was performed using the weighted average M (∧,+) operator and the maximum membership rule. As shown in Table 4, the membership degree of “very important” was 62.5% and the comprehensive evaluation score was 90.445 points. The overall evaluation of the DQAI system was therefore “very important,” indicating strong acceptance of the system among the survey respondents.

Table 4.

Result of comprehensive evaluation

 | Very unimportant | Unimportant | Generally important | Important | Very important
Membership weights | 0.017 | 0.004 | 0.03 | 0.325 | 0.625 (a)
Composite score | 90.445 (b)

(a) The score signifies a ‘very important’ classification at a 62.5% membership level.

(b) The score of 90.445 points denotes the outcome of the comprehensive evaluation.
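The two steps named above, weighted-average composition and the maximum membership rule, can be sketched as follows. The composition is implemented as an ordinary multiply-and-sum weighted average, which is an assumption about how the operator was applied. The small `weights` and `membership` inputs are hypothetical, while `reported` is the membership vector from Table 4. The composite score is not recomputed because this section does not give the grade values used for scoring.

```python
GRADES = [
    "Very unimportant",
    "Unimportant",
    "Generally important",
    "Important",
    "Very important",
]


def fuzzy_evaluate(weights, membership):
    """Weighted-average composition: b_k = sum over indicators j of w_j * r_jk."""
    n_grades = len(membership[0])
    return [
        sum(w * row[k] for w, row in zip(weights, membership))
        for k in range(n_grades)
    ]


def max_membership(vector):
    """Return the grade whose membership degree is largest."""
    best = max(range(len(vector)), key=vector.__getitem__)
    return GRADES[best]


# Hypothetical example: two indicators with equal weights. Each membership
# row holds the share of respondents choosing each of the five grades.
weights = [0.5, 0.5]
membership = [
    [0.0, 0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.2, 0.2, 0.6],
]
vector = fuzzy_evaluate(weights, membership)

# Membership vector reported in Table 4.
reported = [0.017, 0.004, 0.03, 0.325, 0.625]
print(max_membership(vector), "/", max_membership(reported))
```

Applied to the vector in Table 4, the maximum membership rule selects the last grade, which matches the "very important" classification reported in the text.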

Dimension construction

The structural validity of the DQAI system was analyzed using principal component analysis. The Kaiser-Meyer-Olkin (KMO) value was 0.959 and the Bartlett sphericity test was significant (P < .001). With the number of factors fixed at 2 and a varimax rotation applied, the total variance explained reached 82.416%, indicating that the 2 factors retain most of the information in the 16 indicators (refer to Table S2). Of the 16 indicators, 9 (accuracy, completeness, timeliness, consistency, precision, conformance, uniqueness, credibility, and plausibility) loaded on common factor 1, while the remaining 7 (traceability, portability, usability, accessibility, relevance, applicability, and understandability) loaded on common factor 2. The loadings on the 2 common factors ranged from 0.652 to 0.847. Cronbach’s α coefficients, recalculated for each common factor, were 0.967 (common factor 1) and 0.968 (common factor 2). These results show that the questionnaire possesses solid structural validity and that the 16 indicators divide cleanly into 2 common factors.

For the first common factor, we observed that the indicators it contains can be easily quantified and standardized, and they can be used to describe the characteristics of the data itself. Therefore, we designate the 9 indicators belonging to common factor 1 as structural indicators, which form the structural dimension. For the second common factor, its indicators are more suited to characterizing attributes that are intimately associated with the generation, utilization, and dissemination of data. Therefore, we designate the 7 indicators belonging to common factor 2 as relational indicators, which form the relational dimension.

Preferences analysis for 4 different types of EHR users

Indicator-level preferences analysis

The heatmap in Figure 4 shows the mean values of the 4 groups of EHR users on each indicator. Across the 16 indicators, 3 key observations emerge. First, the mean scores of doctors (ranging from 4.43 to 4.66) and nurses (ranging from 4.53 to 4.63) are strikingly similar, indicating a high level of agreement regarding the importance of all indicators within our DQAI system. This agreement is expected, given their roles as the principal producers and consumers of EHR data, and underscores their recognition of the significance of each indicator in ensuring data quality. Second, researchers rated credibility the highest, with a mean of 4.79, while their mean values for the other indicators are slightly lower than those of doctors and nurses. As secondary users of EHR data, researchers rely on highly credible data to ensure accuracy in tasks such as training computer-assisted diagnosis models. Third, aside from accuracy, supervisors show less concern for the indicators, as evidenced by their mean values for portability (3.73), understandability (3.82), and accessibility (3.91) and by the wide range of their mean values (3.73-4.55). Unlike operational and technical EHR users, supervisors take a managerial perspective and focus primarily on accuracy in order to obtain precise reports.

Figure 4.

The heatmap of the mean score of each indicator for the 4 different types of roles.

The detailed density distribution of scores for the 4 groups is shown in Figure 5. The X-axis represents the scores (1-5), and the Y-axis represents the density at each score: the more participants who choose a score of 5, the higher the density at 5.

Figure 5.

The density distribution of scores for each indicator across the 4 different types of roles (X-axis: score from 1 to 5; Y-axis: density at each score).

The overall density distributions of the different types of users on each indicator are broadly similar, with higher densities at scores of 4 and 5. In general, the highest density for doctors is at a score of 5, while for nurses it is at a score of 4. Doctors show a more pronounced gap between these 2 scores (a higher density at 5 and a lower density at 4), while nurses show a smaller gap (scores of 4 and 5 are more closely distributed). This may relate to factors such as job-related stress, professional roles, personality traits, or demographic differences. For instance, 94.1% of nurses are female, which may be associated with a lower tendency to consistently give the highest score of 5. Additionally, the density plots corroborate the observations made in the heatmap analysis. For example, researchers show only slight density differences between scores of 4 and 5, except for credibility, where most ratings are at 5. Supervisors, in contrast, have the widest density range, indicating more dispersed ratings, which aligns with their generally lower concern for these indicators as highlighted in Figure 4.

Dimension-level preference analysis

As shown in Figure 6, the X and Y coordinates in the scatter plot represent the relational and structural dimensions, respectively. Samples with different job roles are distinguished by color and shape. Based on Figures 4 and 6, we can observe the following: (1) Hospital supervisors are more inclined towards the quantifiable structural dimension. (2) Doctors place strong emphasis on both the relational and structural dimensions, although some extremely low values are observed. (3) Nurses attach great importance to both dimensions, similar to doctors; the difference is that the dots in the scatter plot are more concentrated for nurses. The indicators with the top 3 mean values for nurses are accuracy, uniqueness, and plausibility. (4) Clinical researchers pay more attention to quantifiable structural indicators, similar to hospital supervisors. The indicators with the top 3 mean values for clinical researchers are conformance, uniqueness, and credibility.

Figure 6.

The dimension-level perspective on the 4 different types of job roles. The X-axis represents the relational dimension and the Y-axis represents the structural dimension. Blue squares denote hospital supervisors, orange circles denote doctors, green circles denote nurses, and purple stars denote clinical researchers.
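One simple way to place a respondent on such a plot is sketched below. It is an assumption, not the authors' stated procedure: each coordinate is taken as the unweighted mean of that respondent's ratings over the indicators of one dimension, whereas the published figure may instead use factor scores. The indicator groupings follow the factor analysis reported above, and the example response is hypothetical.

```python
STRUCTURAL = [
    "accuracy", "completeness", "timeliness", "consistency", "precision",
    "conformance", "uniqueness", "credibility", "plausibility",
]
RELATIONAL = [
    "traceability", "portability", "usability", "accessibility",
    "relevance", "applicability", "understandability",
]


def dimension_scores(response):
    """Return (relational, structural) mean ratings for one respondent.

    response: dict mapping each of the 16 indicator names to a 1-5 rating.
    The order matches the plot axes: X = relational, Y = structural.
    """
    relational = sum(response[name] for name in RELATIONAL) / len(RELATIONAL)
    structural = sum(response[name] for name in STRUCTURAL) / len(STRUCTURAL)
    return relational, structural


# Hypothetical respondent who rates everything 5 except portability.
response = {name: 5 for name in STRUCTURAL + RELATIONAL}
response["portability"] = 3
x, y = dimension_scores(response)
print(f"relational = {x:.2f}, structural = {y:.2f}")
```

Under this reading, a point above the diagonal marks a respondent who rates structural indicators higher than relational ones.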

Discussion

In this study, we developed an EHR DQAI system that includes 16 indicators. These indicators can be categorized into 2 dimensions: the structural dimension and the relational dimension. The structural dimension indicators can be readily standardized and pertain to the structure of the data itself, while the relational dimension includes indicators associated with the dynamics of data production, utilization, and sharing. Additionally, the results of the preference analysis show that different types of users demonstrate different preferences toward the EHR DQAI system, at both the indicator level and the dimension level. At the indicator level, doctors and nurses generally attach great importance to all indicators, while supervisors generally attach less importance to the indicators, with the exception of accuracy. Researchers place a similar level of importance on all indicators except credibility, which they rate especially highly. At the dimension level, doctors and nurses show great interest in both dimensions, while clinical researchers and hospital supervisors focus more on structural dimension indicators.

Doctors and nurses are the 2 largest sample groups in this study and the main producers and users of EHR data; for them, convenient data input and fast access to accurate EHR data can greatly improve work efficiency. Therefore, they recognize the importance of the DQAI system and prefer the relational dimension, since relational indicators help them operate and share EHR data easily. The difference between doctors and nurses is that nurses’ scores are more evenly distributed, which may be due to various factors such as differences in gender, education, job responsibilities, and pressure between the 2 groups. For doctors, some extremely low values are observed in the 2 dimensions (see Figure 6), suggesting that doctors working under high pressure may be more likely to have negative emotions toward EHR data. Most nurses are female (94.1%). They are more sensitive to details such as whether the data are consistent with common sense or whether there are multiple records with different keywords for the same patient. Therefore, they score high on plausibility and uniqueness, and their distribution on the relational dimension of the scatter plot is also more concentrated in high-scoring areas. Although hospital supervisors do not attach great importance to the DQAI system overall, they are clearly interested in structural dimension indicators, especially accuracy. They are usually the owners and users of EHR data, determining the content and permissions for data retrieval, and they usually focus more on the efficiency and accuracy of obtaining reports than on data generation and other operational tasks. Researchers often need to reuse EHR data for research and modeling. For example, they may reuse EHR data to develop a model that predicts the risk of patient death, and data quality directly affects the effectiveness of the model. Therefore, credibility is particularly important for them.

In this study, considering that different types of users have different needs and preferences in different data usage scenarios, we not only systematically constructed a DQAI system but also analyzed and verified it through an empirical investigation. The main contributions of our study are as follows. First, we used statistical methods to categorize the dimensions of the DQAI system. This approach addresses the shortcomings of previous studies that relied on expert experience or literature review.3,11 Second, we broadened the perspective of previous research, which was limited to an object-oriented viewpoint.8–10 From a user-oriented perspective, we focused on the perceptions and opinions of EHR data users, as these influence their willingness to use EHR systems, which, in turn, affects the quality of EHR data. Third, the results of this study are rooted in real-world usage scenarios, enabling the formulation of practical recommendations for improving EHR data quality and hospital management. Specific recommendations include: (1) Doctors and nurses, as the main producers and users of EHR data, could benefit from more user-friendly interfaces designed using relational indicators. (2) Hospital supervisors, who act as both users and regulators, emphasize data accuracy and the overall performance of EHR DQAI systems. Data reporting functions could be enhanced based on their needs, enabling easy and quick retrieval of reports. (3) For clinical researchers, data credibility is crucial. Because they are secondary users of EHR data, convenient data querying and analysis tools tailored to their practices could be provided.

There are several limitations in this study. First, the sample size is relatively small. In the future, we plan to increase the sample size, specifically including more hospital supervisors and clinical researchers. Second, the participants of the empirical investigation were selected from one area in China. A possible line of future research is to validate the results of this study using data collected from other areas in China or from other countries. Third, this study focuses on the preferences for DQAI among different job roles. We plan to expand this study by analyzing preferences from other perspectives.

Conclusions

This study develops and evaluates an EHR DQAI system composed of 16 indicators in 2 dimensions using a systematic approach. It takes a user-oriented perspective that has been overlooked in the past and discovers differences in preferences for DQAI among different user groups. This can provide new insights for the optimization and development of EHR systems, as well as policy formulation by relevant agencies, thereby enhancing the quality of EHR data throughout its lifecycle, from inception in data production to application in policy-making.

Acknowledgments

The authors would like to thank the National Social Science Foundation of China, Beijing Normal University, and Yichang Central People’s Hospital for their support.

Author contributions

Study concepts and design: Liu Yang, Yirong Wu; data collection and evaluation: Mudan Ren, Ji Lu; data analysis: Liu Yang, Shuifa Sun; original manuscript drafting: Liu Yang; manuscript editing: Yirong Wu, Shuifa Sun; approval of the final version of the submitted manuscript: all authors; agreement to ensure that any questions related to the work are appropriately resolved: all authors.

Supplementary material

Supplementary material is available at JAMIA Open online.

Funding

This work was supported by the project “Research on data quality of electronic medical record based on data semantics” of the National Social Science Foundation of China, grant number 20BTQ-066.

Conflicts of interest

There are no competing interests to declare.

Data availability

The research data can be made available upon reasonable request from the corresponding author. The data are not publicly available due to privacy restrictions.

References

1. Dai MF, Meng Q. Opportunities and challenges in data mining and data analysis of health care big data. Chin J Health Inform Manag. 2017;14:126-130.
2. Stephens KA, Lee ES, Estiri H, Jung H. Examining researcher needs and barriers for using electronic health data for translational research. AMIA Jt Summits Transl Sci Proc. 2015;2015:168-172.
3. Weiskopf NG, Weng CH. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc. 2013;20:144-151.
4. Juran JM, Godfrey AB, Hoogstoel RE, et al. Juran’s Quality Handbook. 5th ed. McGraw Hill; 1999.
5. Wang RY, Strong DM. Beyond accuracy: what data quality means to data consumers. J Manag Inform Syst. 1996;12:5-33.
6. Strong DM, Lee YW, Wang RY. Data quality in context. Commun ACM. 1997;40:103-110.
7. Batini C, Cappiello C, Francalanci C, Maurino A. Methodologies for data quality assessment and improvement. ACM Comput Surv. 2009;41:1-52.
8. Taggart J, Liaw ST, Yu H. Structured data quality reports to improve EHR data quality. Int J Med Inform. 2015;84:1094-1098.
9. Kamdje-Wabo G, Gradinger T, Löbe M, et al. Towards structured data quality assessment in the German medical informatics initiative: initial approach in the mii demonstrator study. Stud Health Technol Inform. 2019;264:1508-1509.
10. Botsis T, Hartvigsen G, Chen F, Weng C. Secondary use of EHR: data quality issues and informatics opportunities. Summit Transl Bioinform. 2010;2010:1-5.
11. Razzaghi H, Greenberg J, Bailey LC. Developing a systematic approach to assessing data quality in secondary use of clinical data based on intended use. Learn Health Syst. 2022;6:e10264.
12. Ehsani-Moghaddam B, Martin K, Queenan JA. Data quality in healthcare: a report of practical experience with the Canadian primary care sentinel surveillance network data. Health Inf Manag. 2021;50:88-92.
13. Fu S, Wen A, Schaeferle GM, et al. Assessment of data quality variability across two EHR systems through a case study of post-surgical complications. AMIA Annu Symp Proc. 2022;2022:196-205.
14. Yang L, Li XL, Li SP, Wu YR. Study on the construction of an electronic medical record data quality assessment index system. J Med Inform. 2023;44:28-32,38.
15. Tricco AC, Lillie E, Zarin W, et al. PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med. 2018;169:467-473.
16. CASP systematic review checklist. Accessed November 24, 2022. https://casp-uk.net/casp-tools-checklists/
17. Zeng X, Zhang Y, Kwong JSW, et al. The methodological quality assessment tools for preclinical and clinical studies, systematic review and meta-analysis, and clinical practice guideline: a systematic review. J Evid Based Med. 2015;8:2-10.
18. Liu Y, Liang ZR. Human literacy evaluation index of medical laboratory students based on entropy weight and fuzzy comprehensive evaluation method. Chin J Clin Lab Mgt. 2021;9:182-186.
19. Stapleton CD. Basic concepts in exploratory factor analysis (EFA) as a tool to evaluate score validity: a right-brained approach. 1997. Accessed July 18, 2023. https://files.eric.ed.gov/fulltext/ED407419.pdf
20. Farzandipour M, Karami M, Arbabi M, et al. Quality of patient information in emergency department. Int J Health Care Qual Assur. 2019;32:108-119.
21. Chen H, Hailey D, Wang N, et al. A review of data quality assessment methods for public health information systems. Int J Environ Res Public Health. 2014;11:5170-5207.
22. Laberge M, Shachak A. Developing a tool to assess the quality of socio-demographic data in community health centres. Appl Clin Inform. 2013;4:1-11.
23. Hinds A, Lix LM, Smith M, et al. Quality of administrative health databases in Canada: a scoping review. Can J Public Health. 2016;107:e56-e61.
24. Cho S, Weng C, Kahn MG, et al. Identifying data quality dimensions for person-generated wearable device data: multi-method study. JMIR Mhealth Uhealth. 2021;9:e31618.
25. Aerts H, Kalra D, Sáez C, et al. Quality of hospital electronic health record (EHR) data based on the international consortium for health outcomes measurement (ICHOM) in heart failure: pilot data quality assessment study. JMIR Med Inform. 2021;9:e27842.
26. Johnson SG, Speedie S, Simon G, et al. A data quality ontology for the secondary use of EHR data. AMIA Symp. 2015;15:1937-1946.
27. Diaz-Garelli JF, Bernstam EV, Lee M, et al. DataGauge: a practical process for systematically designing and implementing quality assessments of repurposed clinical data. eGEMS (Wash DC). 2019;7:32.
28. Sáez C, Martínez-Miranda J, Robles M, et al. Organizing data quality assessment of shifting biomedical data. Stud Health Technol Inf. 2012;180:721-725.
29. Weiskopf NG, Bakken S, Hripcsak G, et al. A data quality assessment guideline for electronic health record data reuse. eGEMS (Wash DC). 2017;5:14.
30. Dungey S, Beloff N, Puri S, Boggon R, Williams T, Tate AR. A pragmatic approach for measuring data quality in primary care databases. In: 2014 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI). 2014:797-800.
31. García R, Sáez C, Muñoz-Soler V, et al. Construction of quality-assured infant feeding process of care data repositories: definition and design (part 1). Comput Biol Med. 2015;67:95-103.
32. Bian J, Lyu T, Loiacono A, et al. Assessing the practice of data quality evaluation in a national clinical data research network through a systematic scoping review in the era of real-world data. J Am Med Inform Assoc. 2020;27:1999-2010.
33. Feder SL. Data quality in electronic health records research: quality domains and assessment methods. West J Nurs Res. 2018;40:753-766.
34. Badr NG. Guidelines for health IT addressing the quality of data in EHR information systems. In: 2019 the International Conference on Health Informatics (Healthinf). 2019:169-181.
35. Makeleni N, Cilliers L. Critical success factors to improve data quality of electronic medical records in public healthcare institutions. SA J Inform Manag. 2021;23:a1230.
36. Dong CY, Yao C, Gao S, et al. Strengthening clinical research source data management in hospitals to promote data quality of clinical research in China. JEBM. 2019;19:1255-1261.
37. National Medical Products Administration. Artificial Intelligence Medical Devices Quality Requirements and Evaluation, Part 2: General Requirements for Datasets. YY/T 1833.2-2022. NMPA; 2022.
38. Alvarez Sanchez R, Beristain Iraola A, Epelde Unanue G, et al. TAQIH, a tool for tabular data quality assessment and improvement in the context of health data. Comput Methods Programs Biomed. 2019;181:104824.
39. Iturry MD, Alves-Souza SN, Ito M, da Silva SA. Data Quality in health records: a literature review. In: 2021 IEEE 16th Iberian Conference on Information Systems and Technologies (CISTI). 2021:1-6.
40. Lee K, Weiskopf N, Pathak J. A framework for data quality assessment in clinical research datasets. AMIA Symp. 2017;2017:1080-1089.
41. Johnson SG, Speedie S, Simon G, et al. Application of an ontology for characterizing data quality for a secondary use of EHR data. Appl Clin Inform. 2016;7:69-88.
42. Jetley G, Zhang H. Electronic health records in IS research: quality issues, essential thresholds and remedial actions. Decis Support Syst. 2019;126:113137.
43. Terry AL, Stewart M, Cejic S, et al. A basic model for assessing primary health care electronic medical record data quality. BMC Med Inform Decis Mak. 2019;19:30-11.
44. Kahn MG, Callahan TJ, Barnard J, et al. A harmonized data quality assessment terminology and framework for the secondary use of electronic health record data. eGEMS (Wash DC). 2016;4:1244.

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]
