As long ago as 1960, AJ Schneider wrote “… practical medicine is basically founded on comparison” (1). Reference intervals (RIs), the most widely used comparison tool in laboratory medicine, should ideally be influenced only by the population served and not by the analytical systems used, provided these are sufficiently harmonized/standardized. It is therefore theoretically possible, and desirable, to apply common reference intervals in a specific area. A paper from Canada, published in this issue of Clinical Chemistry (2), gives important information on the feasibility of this approach.

The study uses large amounts of laboratory data (big data), collected from several community laboratories in Canada, and analyzes them with a sophisticated statistical approach (the refineR algorithm) to derive “indirect” RIs potentially applicable to the Canadian population. The authors additionally perform a verification step, distributing 60 samples from apparently healthy volunteers to check the applicability of the obtained results regardless of the analytical system or the matrix (serum or plasma).

This very elegant study reveals that the situation is still far from optimal and the harmonization of RIs remains a substantial challenge, but paves the way for further improvements.

The so-called “indirect approach,” which relies on the use of large amounts of data already stored in the laboratory databases for the definition or the verification of RI, displays several advantages over the conventional direct approach, including reduced costs and easier application in specific age groups or with uncommon sample types (3). Many indirect algorithms have been developed so far and can often be implemented using freely available statistical software. However, the indirect approach also presents some limitations or critical issues.

Firstly, big data will be “contaminated” with results from diseased individuals. Regardless of the statistical method used (4), and depending on the population selected and on the measurand, it is impossible to exclude pathological patients completely without additional information (e.g., body mass index [BMI] or medications). Although modern algorithms may tolerate contamination of up to 20% to 30%, data cleaning remains a fundamental step in obtaining reliable RIs. Many strategies have been devised for data cleaning, including the elimination of frankly pathological results (those exceeding a chosen multiple of the upper reference limit [URL]); the elimination of repeated results (removing all of a patient’s results, in some cases including the first one); and the elimination of results from patients who show other test abnormalities, carry a specific diagnosis, or take specific medications (5). Limited availability of information and the subjectivity of some criteria may constrain the data selection step and consequently greatly affect RI determination. In this regard, big data technology (considered here not solely as a large amount of data but as multiple interconnected databases storing different types of information), coupled with machine learning methods to identify patient conditions or variables potentially associated with pathological results, may prove extremely valuable (6).
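Two of the cleaning rules listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (patient identifier, result), the function name `clean_results`, and the cut-off of 3 times the URL are hypothetical choices, not those of the Canadian paper or of any published algorithm.

```python
from collections import Counter

def clean_results(records, upper_limit, multiple=3.0):
    """Apply two illustrative cleaning rules to (patient_id, value) tuples.

    upper_limit: current upper reference limit (URL) for the measurand.
    multiple: hypothetical cut-off, expressed as a multiple of the URL.
    """
    # Count how many results each patient contributes to the data set.
    counts = Counter(pid for pid, _ in records)
    kept = []
    for pid, value in records:
        if counts[pid] > 1:
            # Repeated testing: drop every result for this patient,
            # including the first one.
            continue
        if value > multiple * upper_limit:
            # Frankly pathological result.
            continue
        kept.append((pid, value))
    return kept

# Made-up results for illustration only.
records = [("a", 30), ("b", 45), ("b", 52), ("c", 400), ("d", 22)]
cleaned = clean_results(records, upper_limit=40, multiple=3.0)
print(cleaned)  # [('a', 30), ('d', 22)]
```

Rules based on diagnoses, medications, or other test abnormalities would need the additional linked information discussed above and are not attempted here.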

Secondly, the lack of control of the pre-analytical phase may be problematic: serum vs plasma, fasting vs non-fasting, sample treatment before analysis, etc. Many enthusiasts for the indirect approach may view this aspect as a strength rather than a weakness, because the strict control of pre-analytical conditions required by the traditional RI approach may be very different from routine clinical and laboratory practice. However, the lack of knowledge, or control, of pre-analytical conditions may seriously hinder the implementation of a calculated RI in the laboratory. The cases of potassium and total protein presented in the Canadian paper (2) clearly demonstrate the importance of this aspect.

Thirdly, big data approaches may suffer from issues related to lack of standardization and/or selectivity of the analytical methods used. This represents a critical issue in RI determination by the indirect approach, both within a single laboratory (drift effect, change of the analytical method requiring a comparison study) and, particularly, when data from different laboratories implementing different methods are used, with the risk of wider RI limits and, therefore, lower diagnostic accuracy. Moreover, the calculated RI limits may be greatly affected by the combination of data taken from the different laboratories (a different percentage of results could, at least theoretically, lead to a different RI, depending on the bias between methods).
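The dependence of a pooled RI on the mix of contributing laboratories can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that two methods give normally distributed results in healthy individuals with the same standard deviation and a constant bias between them; the numbers (mean 40, bias 3, SD 2.5, arbitrary units) and function names are hypothetical and are not taken from the Canadian paper.

```python
from statistics import NormalDist

def mixture_quantile(p, components, lo=-1e3, hi=1e3, tol=1e-9):
    """Quantile p of a mixture of normals, found by bisection.

    components: list of (weight, mean, sd) tuples with weights summing to 1.
    """
    def cdf(x):
        return sum(w * NormalDist(m, s).cdf(x) for w, m, s in components)

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def pooled_ri(share_b, mean_a=40.0, bias=3.0, sd=2.5):
    """Central 95% interval when a fraction share_b of results comes
    from method B, which reads `bias` units higher than method A."""
    comps = [(1 - share_b, mean_a, sd), (share_b, mean_a + bias, sd)]
    return mixture_quantile(0.025, comps), mixture_quantile(0.975, comps)

for share in (0.0, 0.2, 0.5, 0.8, 1.0):
    lower, upper = pooled_ri(share)
    print(f"share of method B = {share:.1f}: RI {lower:.1f} to {upper:.1f}")
```

Under these assumptions both limits shift as the share of method B grows, and the interval is wider for any mixed data set than for either method alone, which is the loss of diagnostic accuracy referred to above.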

All these aspects clearly emerge from the paper, underlining the fact that, to reach harmonization of RIs, preliminary harmonization of the pre-analytical and analytical phases is mandatory.

Specifically regarding the analytical phase, the paper presents several examples of insufficient harmonization. The case of albumin is paradigmatic: this measurand has relevant clinical impact (7) and, despite the availability of an international reference material (ERM-DA470k/IFCC human serum) (8), standardization and selectivity problems are well known (9) and the practical consequences of using a nonspecific method are clear (10). Nonetheless, 2 different types of dye-binding methods remain extremely popular, which not only precludes the definition of a common reference interval but also calls into question the general applicability of the proposed decision limits. In addition, issues with free thyroxine (FT4) are well known, and efforts toward thyroid-stimulating hormone (TSH) harmonization are still not completely successful (11), with results from certain manufacturers lower than the group mean, as is clearly shown in the Canadian paper (2). The very large and well-presented supplemental data (2) repay careful examination in this respect.

Alanine aminotransferase (ALT) is a special case that deserves comment. The ALT upper reference limits recommended by the American College of Gastroenterology (ACG) (12) are decision limits, not RIs, and have been criticized as having the potential to lead to overdiagnosis (13). Interestingly, the ALT indirect RI found in the Canadian paper corresponds well with the direct RI defined for individuals with BMI <25 kg/m² (13). Moreover, we must underline that, in the case of enzymes, the measurand is defined by the method, so it is conceptually wrong to combine results obtained with and without pyridoxal phosphate, even if the results are similar in healthy individuals.

In the Canadian paper (2), several arbitrary choices, although in part statistically based, were made in the selection and cleaning steps, in aggregating results measured by different methods for some measurands, and in selecting the criteria for RI validation with reference individuals. Although in many cases the authors’ decisions seem reasonable, this subjectivity should be carefully considered as a potential weakness of the approach.

Interestingly, the Canadian paper validated the calculated common indirect RI using 60 apparently healthy subjects [a larger sample size than required by the binomial method proposed in the CLSI EP28-A3c (14)]. Although a subjective acceptability criterion was used (80%), this additional verification approach should be applauded, stressing as it does the importance of and need for direct validation of indirect RI. The possible impact of new common derived RI could also be evaluated by comparing the resulting new prevalence of pathological results (outside RI) obtained by applying the new RI with the historical rate of the laboratory.
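The arithmetic behind this kind of verification is simple enough to sketch. The first function below computes the fraction of reference results falling inside a candidate RI, to be compared with an acceptability threshold such as 80%; how exactly the Canadian paper applied its criterion is not described here, so this reading is an assumption. The second function gives the binomial probability of passing the commonly cited EP28-A3c check (20 reference individuals, no more than 2 results outside the RI) when the RI truly excludes 5% of healthy individuals. Function names and sample values are hypothetical.

```python
from math import comb

def fraction_within(values, lower, upper):
    """Fraction of reference results that fall inside [lower, upper]."""
    inside = sum(lower <= v <= upper for v in values)
    return inside / len(values)

def prob_at_most_outside(n, max_outside, p_outside=0.05):
    """Binomial probability that at most max_outside of n results lie
    outside the RI when each does so with probability p_outside."""
    return sum(
        comb(n, k) * p_outside**k * (1 - p_outside) ** (n - k)
        for k in range(max_outside + 1)
    )

# Made-up results from five reference individuals, for illustration only.
sample = [36.0, 41.5, 44.0, 47.2, 39.8]
coverage = fraction_within(sample, lower=35.0, upper=45.0)
print(coverage >= 0.80)  # True: 4 of 5 results fall inside the interval

# Chance that a correct RI passes the 20-sample check.
print(round(prob_at_most_outside(20, 2), 3))  # 0.925
```

The second figure shows that even a perfectly valid RI fails a 20-sample verification roughly 1 time in 13, one reason a larger verification set is welcome.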

In conclusion, big data provides an important resource in relation to RIs. For many measurands, the use of big data provides a complementary approach to the traditional way of defining common RIs, which is often hindered by both theoretical and practical issues. The availability of large amounts of data also allows robust evaluation of the need for partitioning according to age, sex, or other factors. Machine learning algorithms may help in extracting knowledge from big data to generate a cleaner data set that excludes diseased individuals. This might mean excluding patient results based on medications, the presence of diseases, specific clinical indications for blood drawing, or even associations between variables, whether previously known or unforeseen. Theoretically, it may be possible to distinguish effects due to analytical differences from those related to ethnicity, life habits, or environment. Taking these considerations into account, for some measurands, where methods give comparable results and pre-analytical conditions are similar, indirect methods can represent a quasi-direct approach applied a posteriori, without additional costs and resources. It is also worth noting that the specific indirect approach presented by the authors may be applied to verify the state of harmonization/standardization of both the analytical and pre-analytical phases.

So, to reply to the question presented in the title, big data alone does not provide a complete solution to the challenge of harmonizing reference intervals, but it will be a great help in achieving it.

Author Contributions

The corresponding author takes full responsibility that all authors on this publication have met the following required criteria of eligibility for authorship: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; (c) final approval of the published article; and (d) agreement to be accountable for all aspects of the article thus ensuring that questions related to the accuracy or integrity of any part of the article are appropriately investigated and resolved. Nobody who qualifies for authorship has been omitted from the list.

Ferruccio Ceriotti (Conceptualization-Lead, Data curation-Equal, Writing—original draft-Lead, Writing—review & editing-Equal), and Matteo Vidali (Conceptualization-Supporting, Writing—original draft-Supporting, Writing—review & editing-Equal).

Authors’ Disclosures or Potential Conflicts of Interest

No authors declared any potential conflicts of interest.

References

1. Schneider AJ. Some thoughts on normal, or standard, values in clinical medicine. Pediatrics 1960;26:973-84.

2. Bohn MK, Bailey D, Balion C, Cembrowski G, Collier C, De Guire V, et al. Reference interval harmonization: harnessing the power of big data analytics to derive common reference intervals across populations and testing platforms. Clin Chem 2023;69:991-1008.

3. Jones GRD, Haeckel R, Loh TP, Sikaris K, Streichert T, Katayev A, et al. Indirect methods for reference interval determination – review and recommendations. Clin Chem Lab Med 2018;57:20-9.

4. Haeckel R, Wosniok W, Streichert T; Members of the Section Guide Limits of the DGKL. Review of potentials and limitations of indirect approaches for estimating reference limits/intervals of quantitative procedures in laboratory medicine. J Lab Med 2021;45:35-53.

5. Farrell CL, Nguyen L. Indirect reference intervals: harnessing the power of stored laboratory data. Clin Biochem Rev 2019;40:99-111.

6. Poole S, Schroeder LF, Shah N. An unsupervised learning method to identify reference intervals from a clinical database. J Biomed Inform 2016;59:276-84.

7. Ceriotti F, Fernandez-Calle P, Klee GG, Nordin G, Sandberg S, Streichert T, et al. Criteria for assigning laboratory measurands to models for analytical performance specifications defined in the 1st EFLM Strategic Conference. Clin Chem Lab Med 2017;55:189-94.

8. Zegers I, Keller T, Schreiber W, Sheldon J, Albertini R, Blirup-Jensen S, et al. Characterization of the new serum protein reference material ERM-DA470k/IFCC: value assignment by immunoassay. Clin Chem 2010;56:1880-8.

9. Bachmann LM, Yu M, Boyd JC, Bruns DE, Miller WG. State of harmonization of 24 serum albumin measurement procedures and implications for medical decisions. Clin Chem 2017;63:770-9.

10. Pasqualetti S, Aloisio E, Panteghini M. Letter to the editor: Serum albumin in COVID-19: a good example in which analytical and clinical performance of a laboratory test are strictly intertwined. Hepatology 2021;74:2905-7.

11. Barth JH, Luvai A, Jassam N, Mbagaya W, Kilpatrick ES, Narayanan D, Spoors S. Comparison of method-related reference intervals for thyroid hormones: studies from a prospective reference population and a literature review. Ann Clin Biochem 2018;55:107-12.

12. Kwo PY, Cohen SM, Lim JK. ACG Clinical guideline: evaluation of abnormal liver chemistries. Am J Gastroenterol 2017;112:18-35.

13. Panteghini M, Adeli K, Ceriotti F, Sandberg S, Horvath AR. American Liver guidelines and cutoffs for “normal” ALT: a potential for overdiagnosis. Clin Chem 2017;63:1196-8.

14. CLSI. Defining, establishing, and verifying reference intervals in the clinical laboratory; approved guideline. Wayne (PA): Clinical and Laboratory Standards Institute; 2010.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/pages/standard-publication-reuse-rights)