With interest I read the study by Swaminathan et al., in which the performance of faecal myeloperoxidase [fMPO] as a biomarker of endoscopic disease activity was assessed in patients with inflammatory bowel disease [IBD].1 Additionally, its performance was compared with that of C-reactive protein [CRP] and faecal calprotectin [fCal], both established biomarkers of disease activity. The study concluded that fMPO is an accurate biomarker of endoscopic activity in IBD and that it predicted a more complicated IBD course during follow-up.

Patients with IBD suffer from fluctuating disease activity that is difficult to detect and monitor. To avoid repeated endoscopies, other means to monitor disease activity are widely used, eg, clinical indices, CRP, and fCal. However, these often demonstrate inconsistent associations and lack specificity,2 emphasising the need for additional biomarkers capable of reflecting intestinal inflammation. To assess the performance of fMPO, fCal, and CRP in predicting endoscopic disease activity, the authors leveraged receiver operating characteristic [ROC] statistics and ROC curves, the latter visualising performance of a biomarker/model across different classification thresholds.

Although ROC curves are commonly used to assess biomarker performance, this assessment can be optimistically biased if based on a single evaluation of data.3 Therefore, testing the ability of a biomarker/model to generalise to new cases, ie, a sample [‘test sample’] independent of the sample used to derive the biomarker/model [‘training sample’], is recommended. Using all cases from the original analysis, instead of this train/test approach, may result in overly optimistic estimates of biomarker performance. To overcome this issue, external validation of biomarker performance in independent cohorts is pivotal to determine generalisability. In addition, internal validation or cross-validation methods [eg, leave-pair-out cross-validation [LPO], k-fold cross-validation, or bootstrap resampling] can be used to generate more realistic estimates, with confidence intervals that are wider because they reflect the variability between samples. These methods repeatedly split or resample the original dataset into ‘training’ and ‘test’ subsets, either randomly or not, and average the results to indicate roughly how biomarker performance is affected by changes in the ‘training’ data.
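The train/test principle described above can be illustrated with a minimal Python sketch. It is not the analysis of Swaminathan et al.; the data are synthetic, and the function names `youden_j`, `best_cutoff`, and `cross_validated_j` are hypothetical. The sketch assumes a single continuous biomarker for which higher values indicate active disease, selects the cut-off that maximises sensitivity plus specificity minus one (Youden's J) on the training folds only, and then classifies the held-out fold with that cut-off.

```python
import random


def youden_j(values, labels, cut):
    """Sensitivity + specificity - 1 when 'value >= cut' is called positive."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = sum(1 for v, y in zip(values, labels) if y == 1 and v >= cut)
    tn = sum(1 for v, y in zip(values, labels) if y == 0 and v < cut)
    return tp / n_pos + tn / n_neg - 1.0


def best_cutoff(values, labels):
    """Observed value with the highest J in this sample (lowest value wins ties)."""
    return max(sorted(set(values)), key=lambda c: youden_j(values, labels, c))


def cross_validated_j(values, labels, k=5, seed=0):
    """Choose the cut-off on k-1 folds, classify the held-out fold, pool results.

    Folds are random but not stratified, so every training split is assumed
    to contain both active and inactive cases.
    """
    idx = list(range(len(values)))
    random.Random(seed).shuffle(idx)
    called_positive = [False] * len(values)
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        cut = best_cutoff([values[i] for i in train], [labels[i] for i in train])
        for i in test:
            called_positive[i] = values[i] >= cut
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = sum(1 for c, y in zip(called_positive, labels) if y == 1 and c)
    tn = sum(1 for c, y in zip(called_positive, labels) if y == 0 and not c)
    return tp / n_pos + tn / n_neg - 1.0


if __name__ == "__main__":
    # Synthetic example only: 40 inactive (0) and 40 active (1) cases.
    rng = random.Random(1)
    labels = [0] * 40 + [1] * 40
    values = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]
    cut = best_cutoff(values, labels)
    print("apparent J (cut-off chosen and tested on all cases):",
          round(youden_j(values, labels, cut), 3))
    print("5-fold cross-validated J:",
          round(cross_validated_j(values, labels), 3))
```

The apparent estimate reuses the same cases to choose and to test the cut-off, whereas the cross-validated estimate never tests a case with a cut-off that case helped to select. A gap between the two gives a rough indication of the optimism discussed above.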

Aside from cross-validation, there are also accepted methods to compare correlated area under the ROC curve [AUROC] estimates, eg, DeLong’s test,4 and to identify ‘optimal cut-off points’ that maximise the balance between sensitivity and specificity, eg, Youden’s J-statistic.5 Although the presented results do hint at these methods, the authors do not clearly state what criteria were used to compare fMPO with fCal and CRP or to determine optimal cut-off points. Finally, standard errors or confidence intervals were not reported; these would help readers to better judge biomarker performance.
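To make the point about uncertainty concrete, the following Python sketch shows one way in which an AUROC and the difference between two correlated AUROCs could be reported with confidence intervals. It is a sketch under stated assumptions: it uses a paired percentile bootstrap, which is a different technique from DeLong’s test, the data are synthetic, and the names `auroc`, `bootstrap_ci`, `marker_a`, and `marker_b` are hypothetical.

```python
import random


def auroc(values, labels):
    """AUROC as the probability that a random active case scores higher than
    a random inactive case (ties count one half)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(stat, labels, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(idx), where idx is a list of
    case indices resampled with replacement. Resamples that contain only one
    class are redrawn, because the AUROC is undefined for them."""
    rng = random.Random(seed)
    n = len(labels)
    draws = []
    while len(draws) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        if 0 < sum(labels[i] for i in idx) < n:
            draws.append(stat(idx))
    draws.sort()
    return draws[int(alpha / 2 * n_boot)], draws[int((1 - alpha / 2) * n_boot) - 1]


if __name__ == "__main__":
    # Synthetic example only: two correlated markers measured in the same cases.
    rng = random.Random(2)
    labels = [0] * 50 + [1] * 50
    shared = [rng.gauss(0.0, 1.0) for _ in labels]
    marker_a = [s + 1.2 * y + rng.gauss(0.0, 0.5) for s, y in zip(shared, labels)]
    marker_b = [s + 0.8 * y + rng.gauss(0.0, 0.5) for s, y in zip(shared, labels)]

    def auc_on(marker):
        return lambda idx: auroc([marker[i] for i in idx], [labels[i] for i in idx])

    def auc_difference(idx):
        # The same resampled cases are used for both markers (paired design).
        return auc_on(marker_a)(idx) - auc_on(marker_b)(idx)

    print("AUROC marker A:", round(auroc(marker_a, labels), 3),
          "95% CI:", bootstrap_ci(auc_on(marker_a), labels))
    print("AUROC difference A - B, 95% CI:", bootstrap_ci(auc_difference, labels))
```

Resampling the same cases for both markers preserves the correlation between them, which is the feature that makes methods for correlated AUROCs necessary. An interval for the difference that excludes zero would suggest that one marker discriminates better than the other in that sample.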

The efforts of Swaminathan et al. in identifying novel biomarkers for disease activity are important and can only be applauded. However, validation, whether performed internally, externally, or both, remains critical to precisely determine the clinical utility of biomarkers.

Funding

No funding was received for this work.

Conflict of Interest

ARB has no conflict of interest to declare.

References

1. Swaminathan A, Borichevsky GM, Edwards TS, et al. Faecal myeloperoxidase as a biomarker of endoscopic disease activity in inflammatory bowel disease. J Crohns Colitis 2022. doi:10.1093/ecco-jcc/jjac098. Online ahead of print.

2. Lewis JD. The utility of biomarkers in the diagnosis and therapy of inflammatory bowel disease. Gastroenterology 2011;140:1817–26.e2. doi:10.1053/j.gastro.2010.11.058.

3. Hanczar B, Hua J, Sima C, Weinstein J, Bittner M, Dougherty ER. Small-sample precision of ROC-related estimates. Bioinformatics 2010;26:822–30. doi:10.1093/bioinformatics/btq037.

4. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44:837–45. doi:10.2307/2531595.

5. Schisterman EF, Perkins NJ, Liu A, Bondell H. Optimal cut-point and its corresponding Youden index to discriminate individuals using pooled blood samples. Epidemiology 2005;16:73–81. doi:10.1097/01.ede.0000147512.81966.ba.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/pages/standard-publication-reuse-rights)