Derivation and External Validation of a Clinical Prediction Model for Viral Diarrhea Etiology in Bangladesh

Abstract

Background

Antibiotics are commonly overused for diarrheal illness in many low- and middle-income countries, partly due to a lack of diagnostics to identify viral cases, in which antibiotics are not beneficial. This study aimed to develop clinical prediction models to predict risk of viral-only diarrhea across all ages, using routinely collected demographic and clinical variables.

Methods

We used a derivation dataset from 10 hospitals across Bangladesh and a separate validation dataset from the icddr,b Dhaka Hospital. The primary outcome was viral-only etiology determined by stool quantitative polymerase chain reaction. Multivariable logistic regression models were fit and externally validated; discrimination was quantified using area under the receiver operating characteristic curve (AUC) and calibration assessed using calibration plots.

Results

Viral-only diarrhea was common in all age groups (<1 year, 41.4%; 18–55 years, 17.7%). A forward stepwise model had AUC of 0.82 (95% confidence interval [CI], .80–.84) while a simplified model with age, abdominal pain, and bloody stool had AUC of 0.81 (95% CI, .78–.82). In external validation, the models performed adequately although less robustly (AUC, 0.72 [95% CI, .70–.74]).

Conclusions

Prediction models consisting of 3 routinely collected variables can accurately predict viral-only diarrhea in patients of all ages in Bangladesh and may help support efforts to reduce inappropriate antibiotic use.

Keywords: antimicrobial resistance, Bangladesh, clinical prediction model, diarrhea, viruses

Despite significant reductions in mortality over the past several decades, diarrheal disease remains the second most common acute condition globally, causing >6 billion episodes and 1.3 million deaths annually [1, 2]. While a major cause of morbidity worldwide, diarrheal diseases disproportionately affect patients in low- and middle-income countries (LMICs) in communities with poor access to healthcare, safe water, and sanitation [2]. Although the majority of diarrhea episodes are self-limiting and the mainstay of treatment is rehydration, clinicians must also make decisions regarding the appropriate use of antibiotics [3]. For most cases of diarrhea, antibiotics are not beneficial, particularly for viral etiologies of diarrhea in which antibiotics have no role. Guidelines from the World Health Organization (WHO) recommend avoiding antibiotics for treating most cases of diarrhea, and only recommend antibiotics when there is a suspicion of Vibrio cholerae infection with severe dehydration, suspicion of Shigella infection as indicated by bloody stool, or concurrent illness such as severe malnutrition [4]. However, despite guidelines, overuse of antibiotics for viral diarrhea remains widespread, particularly in LMICs, partly due to a lack of diagnostic testing to guide management, shortages of trained healthcare providers, nonprescription antibiotic availability, and patient expectations for antibiotics [5, 6].

Inappropriate antibiotic use can lead to increased costs, adverse effects (eg, hemolytic uremic syndrome in Shiga toxin–producing Escherichia coli), and antimicrobial resistance (AMR), which has been identified as a serious global public health concern [7–9]. Recent studies have demonstrated widespread resistance among numerous enteric pathogens with multidrug-resistant V cholerae, Shigella spp, and Campylobacter identified in Asia, Africa, and Latin America [10–15]. AMR impedes the treatment of patients and outbreak management, and drug-resistant pathogens quickly spread worldwide [16–18]. Ideally, antibiotic use decisions would be guided by molecular diagnostics or stool cultures; however, these tests are unavailable in the vast majority of LMICs [19, 20]. As a result, clinicians often use syndromic guidelines or clinical suspicion to decide on antimicrobial use [4, 20]. Unfortunately, physician judgment and syndromic guidelines poorly predict the need for antibiotics; a study of patients with diarrhea in Kenya showed that syndrome-based guidelines for Shigella led to the failure to diagnose shigellosis in nearly 90% of cases [21]. Clinician assumptions regarding etiologies of diarrhea that are not evidence-based may lead to poor management decisions. Better tools that assist clinician decision-making without relying on costly laboratory tests are urgently needed.

Much of the existing research on diarrhea epidemiology in LMICs has focused primarily on children <5 years old, while there remains a paucity of data on diarrheal etiology in older individuals [22, 23]. Prior research has shown that viral etiologies of diarrhea (eg, rotavirus, norovirus) are most common in younger children, whereas bacterial causes predominate in older children, adults, and the elderly globally [24–27]. However, recent research using multiplex molecular platforms suggests that viral cases of diarrhea in older individuals may be more common than previously described, with adults remaining highly susceptible to viral enteric infections [28, 29]. Additionally, there is scant evidence on the clinical predictors most associated with viral etiologies in older children and adults that may help clinicians identify patients with viral diarrhea who do not warrant antibiotic use. Clinical tools to better determine diarrhea etiology at the point-of-care without relying on laboratory tests are greatly needed to reduce antibiotic overuse while conserving scarce healthcare resources. The aim of this study was to develop and validate new clinical prediction models to predict the risk of viral-only diarrhea etiology among patients of all ages to assist clinician decision-making for appropriate use of antibiotics.

METHODS

Study Design and Setting

This was a retrospective study using datasets from 2 established diarrhea epidemiology surveillance systems in Bangladesh: (1) the Institute of Epidemiology, Disease Control and Research nationwide diarrhea surveillance network, which included data from 10 sentinel hospital sites across Bangladesh; and (2) the routine diarrheal surveillance system at the icddr,b in Dhaka, Bangladesh (Supplementary Figure 1). The icddr,b provides free clinical services to >100 000 patients with diarrhea each year in the capital city of Dhaka and surrounding rural districts and is not part of the nationwide surveillance network. The 10 hospital sites are public, mostly district-level hospitals, which provide free medical services to their surrounding catchment areas.

Study Populations: Derivation and Validation Datasets

The nationwide surveillance network (n = 2516) was used as the derivation dataset and the icddr,b diarrhea surveillance (n = 3000) as the validation dataset. In the derivation dataset, collected from February 2014 through June 2018, the first patient <5 years of age and the first patient >5 years of age were enrolled each week at each of the 10 sites. In the validation dataset, collected during August 2014–June 2017, every 50th patient presenting with acute diarrhea was enrolled. Diarrhea was defined as ≥3 loose or liquid stools within 24 hours or <3 liquid stools causing dehydration in the last 24 hours; for children <2 months of age, diarrhea was defined as a change in stool habits from the usual frequency or nature of stool [30]. Patients were included if they met the case definition, had a stool sample with diagnostic testing performed, had a valid test result, and had no other severe comorbidity (eg, respiratory illness, acute cardiovascular symptoms, or severe neurological disorder).

Predictor Variables

All demographic and clinical variables collected during routine clinical care and available in both derivation and validation datasets were considered for inclusion in the models, including age (predetermined age categories <12 months, 12–23 months, 24–59 months, 5–17 years, 18–55 years, and >55 years), sex (male or female), duration of diarrhea (days), severe diarrhea (yes/no >10 episodes of diarrhea in the past 24 hours), dehydration status (none, some, severe), history of vomiting (yes/no), bloody stool (yes/no), and abdominal pain (yes/no). Dehydration status was determined as none, some, or severe dehydration by the treating physician following WHO criteria [31]. Given the known association between season and diarrhea epidemiology in Bangladesh, season was included as a predictor using the patient encounter date. Prior research on diarrhea seasonality has considered Bangladesh to have 3 primary seasons: winter (November–February, cool/dry), summer (March–May, hot/dry), and monsoon (June–October, rainy/wet) [32, 33].
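The two derived predictors described above, age category and season, can be sketched as simple lookups. This is a minimal illustration rather than the study's code: the function names `age_category` and `season` are hypothetical, and the handling of boundary ages (for example, an adult older than 55 but not yet 56) is an assumption, since the source gives only the category labels.

```python
from datetime import date

def age_category(age_years: float) -> str:
    """Map age in years to the six predetermined age categories.

    Boundary handling is an assumption; the source lists labels only.
    """
    months = age_years * 12
    if months < 12:
        return "<12 mo"
    if months < 24:
        return "12-23 mo"
    if months < 60:
        return "24-59 mo"
    if age_years < 18:
        return "5-17 y"
    if age_years <= 55:
        return "18-55 y"
    return ">55 y"

def season(encounter: date) -> str:
    """Map an encounter date to the 3 seasons described in the text."""
    month = encounter.month
    if month in (11, 12, 1, 2):
        return "winter"   # November-February, cool/dry
    if month in (3, 4, 5):
        return "summer"   # March-May, hot/dry
    return "monsoon"      # June-October, rainy/wet

print(age_category(1.5), season(date(2015, 7, 1)))
```

Keeping both mappings as pure functions makes it straightforward to apply the same coding to the derivation and validation datasets.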

Microbiological Testing and Attribution of Diarrhea Etiology

A stool sample was collected from each enrolled patient. Custom multiplex TaqMan Array Cards (TACs) containing compartmentalized probe-based quantitative real-time polymerase chain reaction (qPCR) assays for 32 pathogens were used to test for a broad range of pathogens at the icddr,b laboratory (pathogen targets are listed in Supplementary Table 1); the TAC laboratory methods have been described in full previously [34].

The primary outcome (dependent) variable was defined as identification of only viral pathogen(s) (“viral-only”) (ie, no bacterial or parasitic pathogens) on TAC PCR. This clinically relevant outcome was selected because patients with viral-only diarrhea do not warrant antibiotics. Stool samples without detection of any viral targets, or with evidence of coinfection with viral and bacterial or protozoal pathogens based on TAC results, were classified as not having viral-only diarrhea. A TAC cycle threshold (Ct) of <30 was used to define a positive result for pathogen detection based on expert consensus.
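The outcome definition above reduces to a small rule over per-target Ct values. The sketch below is illustrative only: `is_viral_only`, the dictionary layout, and the example Ct values are assumptions, and the target names are simply pathogens mentioned elsewhere in the text, not the full 32-target panel.

```python
VIRAL = "viral"

def is_viral_only(ct_values, target_kind, cutoff=30.0):
    """Return True when at least one viral target is positive and no
    bacterial or parasitic target is positive.

    ct_values:   dict of target name -> Ct (None means no amplification)
    target_kind: dict of target name -> "viral", "bacterial", or "parasitic"
    cutoff:      a target is positive when its Ct is below this value
    """
    positive = [t for t, ct in ct_values.items()
                if ct is not None and ct < cutoff]
    has_viral = any(target_kind[t] == VIRAL for t in positive)
    has_nonviral = any(target_kind[t] != VIRAL for t in positive)
    return has_viral and not has_nonviral

# Hypothetical sample: rotavirus clearly positive, Shigella amplifying late.
kinds = {"rotavirus": "viral", "norovirus": "viral", "Shigella": "bacterial"}
sample = {"rotavirus": 22.1, "norovirus": None, "Shigella": 33.0}

print(is_viral_only(sample, kinds))               # Ct <30 rule
print(is_viral_only(sample, kinds, cutoff=35.0))  # looser Ct <35 rule
```

Exposing the cutoff as a parameter shows why the choice of Ct threshold matters: the same sample can move between outcome classes when a late-amplifying bacterial target crosses the threshold.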

Data Analysis, Model Development, and Validation

Descriptive analyses using frequencies with percentages, medians with interquartile ranges (IQRs), or means with standard deviations were performed as appropriate. Comparisons between the derivation and validation datasets were conducted with Pearson χ² test or Mann-Whitney U test. Univariable logistic regression was performed to assess the unadjusted associations between viral diarrhea etiology and predictors, with magnitudes of effect given as odds ratios (ORs) and their respective 95% confidence intervals (CIs).
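As a worked illustration of the Pearson χ² comparison, the sketch below tests the male-sex counts reported in Table 1 (1387 of 2516 vs 1752 of 3000). The helper `chi2_2x2` is hypothetical, and the use of a Yates continuity correction is an assumption; the source does not say whether one was applied.

```python
import math

def chi2_2x2(a, b, c, d, yates=True):
    """Pearson chi-square test (1 df) for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value). The continuity correction is optional.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Rows: derivation, validation. Columns: male, female.
stat, p = chi2_2x2(1387, 2516 - 1387, 1752, 3000 - 1752)
print(f"chi2 = {stat:.2f}, p = {p:.3f}")
```

Under these assumptions the P value rounds to .02, which agrees with the value shown for male sex in Table 1.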

Three multivariable logistic regression models were fit using clinically relevant candidate predictors available from the derivation dataset, with each model’s included predictors shown in Supplementary Table 2. Model 1 (full model) contained all clinically relevant predictors available in both the derivation and validation datasets and the season variable; model 2 (forward stepwise model) was fit using forward stepwise selection using P < .1 for entry and P > .2 for removal; model 3 (simplified model) contained only 3 predictors (age, bloody stool, abdominal pain) selected a priori based on diarrhea syndromic guidelines regarding viral versus bacterial causes of diarrhea [4, 35, 36]. The predictors' adjusted associations with viral diarrhea etiology were expressed as adjusted ORs (aORs) and their respective 95% CIs. The models were then externally validated using the icddr,b surveillance dataset.

Model performance was calculated using the area under the receiver operating characteristic curve (AUC) to evaluate discrimination. The DeLong method was used to calculate 95% CIs around the AUC. Calibration was assessed by comparing the predicted probability to the observed probability of viral-only diarrhea and reported using calibration-in-the-large and calibration slope [37]. Flexible calibration curves were obtained using locally estimated scatterplot smoothing using val.prob.ci.2 in R software as recommended in the literature [38]. Nagelkerke pseudo R² was calculated to provide a global measure of the estimated explained variance of the models on a new dataset. From the receiver operating characteristic curves, probability cutoffs were used to calculate sensitivity and specificity and positive and negative predictive values [39, 40]. R software (R Foundation for Statistical Computing, Vienna, Austria) was used for all analyses. The Transparent Reporting of a Multivariable Prediction Model for Individual Diagnosis (TRIPOD) Checklist for Prediction Model Validation was used.
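For readers less familiar with the discrimination measure, the AUC equals the probability that a randomly chosen viral-only case receives a higher predicted probability than a randomly chosen case that is not viral-only. The sketch below computes it by direct pairwise comparison; the function name `auc` and the toy scores are illustrative assumptions, and the study itself used R with DeLong confidence intervals, which this sketch does not reproduce.

```python
def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison.

    Counts the share of (positive, negative) pairs in which the positive
    case has the higher score; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy predicted probabilities and outcomes (made up for illustration).
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
toy_labels = [1, 0, 1, 0, 0]
print(auc(toy_scores, toy_labels))
```

The pairwise form is quadratic in sample size, so it is suited to explanation rather than to datasets of several thousand patients, where a rank-based implementation would be preferable.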

Sensitivity Analysis

Two sensitivity analyses were conducted. First, an alternative categorization for age was explored using classification and regression trees (CART) from step_discretize_cart in the R package embed (version 1.0.0) to optimally discretize the age variable using supervised binning. Second, different Ct values were explored to evaluate the effect of using TAC PCR Ct cutoff thresholds of <25 and <35 for attribution of diarrhea etiology.
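The idea behind supervised binning of age can be illustrated with a single-split search that picks the age threshold giving the purest two groups. This is a simplified sketch, not the R routine named above: `gini` and `best_split` are hypothetical helpers, Gini impurity is an assumed splitting criterion, and the ages and outcomes are invented toy values.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 outcomes."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(ages, outcomes):
    """Return the age threshold that minimises the size-weighted Gini
    impurity of the two resulting bins (age < cut vs age >= cut)."""
    pairs = list(zip(ages, outcomes))
    values = sorted(set(ages))
    best_score, best_cut = float("inf"), None
    # Candidate cuts are midpoints between consecutive distinct ages.
    for lo, hi in zip(values, values[1:]):
        cut = (lo + hi) / 2
        left = [y for a, y in pairs if a < cut]
        right = [y for a, y in pairs if a >= cut]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_score, best_cut = score, cut
    return best_cut

# Toy data: 1 = viral-only, 0 = not viral-only (invented for illustration).
toy_ages = [0.5, 1, 2, 3, 6, 20, 40, 60]
toy_outcomes = [1, 1, 1, 0, 0, 0, 0, 0]
print(best_split(toy_ages, toy_outcomes))
```

A full tree-based discretizer repeats this search within each bin and applies stopping rules, which is how more than two age categories could arise.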

RESULTS

Participant Characteristics

In the derivation dataset, 2532 patients had samples available for testing and 2516 (99.4%) had valid results. In the validation dataset, 3154 patients had samples available for testing, of whom 3000 (95.1%) had valid results for inclusion in the analysis (Figure 1). The derivation cohort had slightly more females, fewer patients with severe dehydration, and older patients (median age, 22 [IQR, 1–40] years in the derivation cohort compared to 7 [IQR, 0–32] years in the validation cohort). Further characteristics of the study populations are shown in Table 1.

Figure 1.

Study flow diagram.

Table 1.

Clinical Characteristics of the Study Population

Characteristic	Derivation (Nationwide) (n = 2516)	Validation (icddr,b) (n = 3000)	P Value
Age, y, median (IQR)	22 (1–40)	7 (0–32)	<.01^a
Age category			<.01^b
<12 mo	488 (19.4)	950 (31.7)
12–23 mo	355 (14.1)	401 (13.4)
24–59 mo	148 (5.9)	122 (4.1)
5–17 y	136 (5.4)	135 (4.5)
18–55 y	1159 (46.1)	1241 (41.4)
>55 y	230 (9.1)	151 (5.0)
Male sex	1387 (55.1)	1752 (58.4)	.02^b
Duration of diarrhea, d, median (IQR)	3 (2–3)	2 (1–2)	<.01^a
Abdominal pain	1563 (62.1)	1762 (58.7)	.01^b
>10 episodes of diarrhea in 24 h	1821 (72.4)	2170 (72.3)	.97^b
Blood in stool	29 (1.2)	46 (1.5)	.22^b
Presence of vomiting	1786 (71.0)	2303 (76.8)	<.01^b
Dehydration severity			<.01^b
None	922 (36.6)	983 (32.8)
Some	1426 (56.7)	1179 (39.3)
Severe	168 (6.7)	838 (27.9)
Season			<.01^b
Winter	733 (29.1)	1106 (36.9)
Summer	517 (20.6)	957 (31.9)
Monsoon	1266 (50.3)	937 (31.2)

Data are presented as No. (%) unless otherwise indicated.

Abbreviation: IQR, interquartile range.

^a Mann-Whitney U test.

^b χ² test.

Diarrhea Etiology

The prevalence of viral-only etiology of diarrhea was similar between the 2 study cohorts with 802 (31.9%) of patients in the derivation dataset and 1016 (33.9%) of patients in the validation dataset having viral-only pathogens detected (Supplementary Table 3). Coinfection of viral with nonviral pathogens was found in 104 (4.1%) of the derivation cohort and 309 (10.3%) of the validation cohort. Pathogens detected using TAC PCR are listed in Supplementary Table 3.

In unadjusted analysis, longer duration of diarrhea (OR, 1.41 [95% CI, 1.32–1.49]) and having severe dehydration (OR, 1.76 [95% CI, 1.26–2.46] for severe vs no dehydration) predicted higher odds of viral-only diarrhea, whereas older age (OR, 0.07 [95% CI, .06–.08] for adults 18–55 years of age vs <12 months), abdominal pain (OR, 0.25 [95% CI, .21–.30]), presenting during monsoon season (OR, 0.45 [95% CI, .37–.55] for monsoon vs winter season), and bloody stool (OR, 0.24 [95% CI, .06–.69]) predicted lower odds of viral-only diarrhea (Supplementary Table 4).
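A quick consistency check on reported odds ratios: if the intervals are Wald intervals (an assumption; the source does not state the interval type), they are symmetric on the log-odds scale, so the point estimate is the geometric mean of the two bounds. The helper `or_from_ci` below is hypothetical and only approximate, because the printed bounds are rounded.

```python
import math

def or_from_ci(lower, upper):
    """Point estimate implied by a CI that is symmetric on the log scale."""
    return math.exp((math.log(lower) + math.log(upper)) / 2)

# Intervals taken from the unadjusted results in the text.
severe_dehydration = or_from_ci(1.26, 2.46)
abdominal_pain = or_from_ci(0.21, 0.30)
print(round(severe_dehydration, 2), round(abdominal_pain, 2))
```

Both values agree with the reported estimates of 1.76 and 0.25, which is the pattern one would expect from log-scale intervals.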

The adjusted associations of predictors with viral-only diarrhea etiology in the 3 prediction models are shown in Table 2. In the full model (model 1), longer duration of diarrhea (aOR, 1.11 [95% CI, 1.04–1.20]) and having severe dehydration (aOR, 2.32 [95% CI, 1.53–3.54] for severe vs no dehydration) predicted higher odds of viral-only etiology, while older age (aOR, 0.07 [95% CI, .04–.11] for age >55 years vs <12 months), abdominal pain (aOR, 0.63 [95% CI, .50–.79]), presenting during monsoon season (aOR, 0.59 [95% CI, .47–.75] for monsoon vs winter/dry season), and male sex (aOR, 0.80 [95% CI, .64–.98]) predicted lower odds of viral-only diarrhea (Table 2). In the simplified model (model 3), older age (aOR, 0.06 [95% CI, .04–.10] for age >55 years vs <12 months), abdominal pain (aOR, 0.63 [95% CI, .51–.79]), and bloody stool (aOR, 0.28 [95% CI, .07–1.02]) predicted lower odds of viral-only diarrhea.

Table 2.

Adjusted Association of Predictors of Viral-Only Diarrhea Etiology Among 3 Candidate Prediction Models in the National Surveillance (Derivation) Dataset, Bangladesh, 2014–2018

Characteristic	Model 1 (Full Model)	Model 2 (Forward Stepwise)	Model 3 (Simplified)
Characteristic	OR (95% CI)	OR (95% CI)	OR (95% CI)
Age category
<12 mo	Ref	Ref	Ref
12–23 mo	0.81 (.60–1.10)	0.81 (.60–1.09)	0.84 (.63–1.13)
24–59 mo	0.44 (.29–.65)	0.43 (.29–.64)	0.39 (.26–.57)
5–17 y	0.10 (.06–.18)	0.10 (.06–.18)	0.09 (.05–.15)
18–55 y	0.09 (.07–.12)	0.09 (.07–.12)	0.08 (.06–.11)
>55 y	0.07 (.04–.11)	0.07 (.04–.11)	0.06 (.04–.10)
Male sex	0.80 (.64–.98)	0.80 (.65–.98)	…
Duration of diarrhea, d	1.11 (1.04–1.20)	1.11 (1.04–1.19)	…
Abdominal pain	0.63 (.50–.79)	0.63 (.50–.79)	0.63 (.51–.79)
>10 episodes of diarrhea in 24 h	1.09 (.87–1.38)	…	…
Blood in stool	0.28 (.08–1.03)	0.28 (.08–1.02)	0.28 (.07–1.02)
Presence of vomiting	0.98 (.78–1.23)	…	…
Dehydration severity
None	Ref	Ref	…
Some	1.18 (.95–1.48)	1.18 (.95–1.47)	…
Severe	2.32 (1.53–3.54)	2.33 (1.54–3.55)	…
Season
Winter	Ref	Ref	…
Summer	1.00 (.75–1.33)	1.00 (.75–1.33)	…
Monsoon	0.59 (.47–.75)	0.59 (.47–.75)	…

Abbreviations: CI, confidence interval; OR, odds ratio; Ref, reference group.
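To show how the model 3 odds ratios in Table 2 combine into a predicted probability, the sketch below multiplies a baseline odds by the relevant adjusted odds ratios and converts the result to a probability. It is not a usable clinical tool: the model intercept is not reported in this text, so `baseline_odds` (the odds for an infant <12 months without abdominal pain or bloody stool) must be supplied by the caller, and the value 1.0 used in the example is purely a placeholder. The function name and dictionary are also hypothetical.

```python
# Model 3 adjusted odds ratios from Table 2 (reference: age <12 mo).
# Age bands are labelled as in the Methods.
AGE_OR = {
    "<12 mo": 1.0,
    "12-23 mo": 0.84,
    "24-59 mo": 0.39,
    "5-17 y": 0.09,
    "18-55 y": 0.08,
    ">55 y": 0.06,
}
ABDOMINAL_PAIN_OR = 0.63
BLOODY_STOOL_OR = 0.28

def viral_only_probability(age_band, abdominal_pain, bloody_stool, baseline_odds):
    """Combine odds ratios multiplicatively, then convert odds to probability.

    baseline_odds is NOT reported in the source; callers must provide it.
    """
    odds = baseline_odds * AGE_OR[age_band]
    if abdominal_pain:
        odds *= ABDOMINAL_PAIN_OR
    if bloody_stool:
        odds *= BLOODY_STOOL_OR
    return odds / (1 + odds)

# Placeholder baseline odds of 1.0, chosen only to make the arithmetic visible.
print(viral_only_probability("18-55 y", True, False, baseline_odds=1.0))
```

Multiplying odds ratios is equivalent to adding coefficients on the log-odds scale, which is how a logistic regression model produces its linear predictor.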

Model characteristics including AUC and pseudo R² were similar among the 3 models, with AUC of 0.82 (95% CI, .80–.84) for the full model (model 1) and forward stepwise model (model 2) and AUC of 0.81 (95% CI, .78–.82) for the simplified model (model 3), as shown in Table 3. Pseudo R² was similar for all models at 0.26 for models 1 and 2 and 0.25 for model 3, indicating that the models explained a moderate amount of the total variability.

Table 3.

Model Characteristics Using the Area Under Receiver Operating Characteristic Curve and Pseudo R² for Each Model in the National Surveillance (Derivation) Dataset, Bangladesh, 2014–2018

Model	Included Predictors	AUC (95% CI)	Pseudo R²
Model 1 (full model)	Age, sex, duration, >10 diarrhea episodes/24 h, abdominal pain, vomiting, bloody stool, dehydration severity, season	0.82 (.80–.84)	0.26
Model 2 (forward stepwise)	Age, sex, duration, abdominal pain, bloody stool, dehydration severity, season	0.82 (.80–.84)	0.26
Model 3 (simplified)	Age, abdominal pain, bloody stool	0.81 (.78–.82)	0.25

Abbreviations: AUC, area under the receiver operating characteristic curve; CI, confidence interval.

In external validation, all models performed similarly although less robustly (AUC, 0.72 [95% CI, .70–.74]) (Figure 2). Calibration was similar for all models, with slope of 0.73 (95% CI, .66–.81) and intercept of −0.26 (95% CI, −.35 to −.18) for model 1 and slope of 0.69 (95% CI, .63–.76) and intercept of −0.21 (95% CI, −.30 to −.12) for model 3 (Table 4). Calibration plots for each model are shown in Supplementary Figure 2. At a specificity level of 70%, model sensitivity was 82%, 83%, and 82% for models 1, 2, and 3, respectively. Sensitivity, specificity, positive predictive value, and negative predictive values for the 3 prediction models for viral-only diarrhea are shown in Table 5.

Figure 2.

Area under the receiver operating characteristic curve (AUC) for model 1 (A), model 2 (B), and model 3 (C) in the derivation (left) and validation (right) datasets.

Table 4.

Model Performance Using the Area Under Receiver Operating Characteristic Curve, Calibration-in-the-Large (α), and Calibration Slope (β) for Each Model in the Validation Dataset, Bangladesh, 2014–2017

Model	AUC (95% CI)	α: Calibration-in-the-Large (95% CI)	β: Calibration Slope (95% CI)
Model 1 (full model)	0.72 (.70–.74)	−.26 (−.35 to −.18)	.73 (.66–.81)
Model 2 (forward stepwise selection)	0.72 (.70–.73)	−.27 (−.35 to −.18)	.73 (.66–.81)
Model 3 (simplified)	0.72 (.70–.74)	−.21 (−.30 to −.12)	.69 (.63–.76)

Abbreviations: AUC, area under the receiver operating characteristic curve; CI, confidence interval.
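For reference, the two calibration measures in Table 4 are conventionally defined through two logistic recalibration models fit in the validation data. The formulation below is the standard one and is offered as a reading aid; the symbols LP (the linear predictor, ie, the log-odds predicted by the derivation model) and a (the intercept of the slope model) are notation introduced here, not taken from the source.

```latex
\begin{align}
\operatorname{logit}\Pr(Y = 1) &= \alpha + \mathrm{LP}
  && \text{(calibration-in-the-large; LP entered as an offset)} \\
\operatorname{logit}\Pr(Y = 1) &= a + \beta \,\mathrm{LP}
  && \text{(calibration slope)}
\end{align}
```

Perfect calibration corresponds to α = 0 and β = 1. Under this reading, the negative α values in Table 4 suggest that predicted risks were on average somewhat higher than observed risks in the validation cohort, and slopes below 1 suggest that predictions were too extreme, with high predicted risks too high and low predicted risks too low.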

Table 5.

Sensitivity, Specificity, Positive Predictive Value, and Negative Predictive Value for the 3 Prediction Models for Viral-Only Diarrhea

Specificity	Sensitivity	PPV	NPV	Probability Cutoff
Model 1
0.6	0.84	0.50	0.89	0.15
0.7	0.82	0.56	0.89	0.17
0.8	0.77	0.64	0.88	0.33
0.9	0.53	0.71	0.80	0.61
Model 2
0.6	0.85	0.50	0.89	0.15
0.7	0.83	0.56	0.89	0.17
0.8	0.77	0.64	0.88	0.33
0.9	0.53	0.71	0.81	0.61
Model 3
0.6	0.85	0.50	0.89	0.15
0.7	0.82	0.53	0.89	0.17
0.8	0.74	0.65	0.87	0.41
0.9	0.50	0.71	0.79	0.67

Abbreviations: NPV, negative predictive value; PPV, positive predictive value.
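Predictive values depend on prevalence as well as on sensitivity and specificity. The sketch below applies Bayes' rule to one row of Table 5 (sensitivity 0.82 at specificity 0.70), assuming the viral-only prevalence of 31.9% reported for the derivation dataset; the source does not state which prevalence underlies Table 5, so this is an assumption, and `predictive_values` is a hypothetical helper.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Model 1 row at 70% specificity; prevalence assumed to be 31.9%.
ppv, npv = predictive_values(0.82, 0.70, 0.319)
print(round(ppv, 2), round(npv, 2))
```

Under that assumption the result matches the PPV of 0.56 and NPV of 0.89 listed for model 1. In a setting with a different prevalence of viral-only diarrhea, the same cutoff would yield different predictive values.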

Sensitivity Analyses

In sensitivity analysis, different Ct values were used to attribute viral-only etiology (Supplementary Table 5). Model performance at varying thresholds was similar to that at the original cutoff of <30. Using Ct <25, model discrimination was similar, with AUC of 0.82 (95% CI, .80–.84), 0.81 (95% CI, .80–.84), and 0.80 (95% CI, .79–.82) for models 1, 2, and 3, respectively (Supplementary Table 6). Using Ct <35, AUC was 0.80 (95% CI, .78–.82), 0.80 (95% CI, .78–.82), and 0.78 (95% CI, .76–.80) for models 1, 2, and 3, respectively (Supplementary Table 7). Using binary categorization for age (age <4 vs ≥4 years), the models performed nearly identically to the original models using the a priori clinically determined age categories, with AUC 0.82 (95% CI, .80–.84) for models 1 and 2 and AUC 0.82 (95% CI, .78–.82) for model 3 (Supplementary Table 8).

DISCUSSION

In this study, we have derived and externally validated several clinical prediction models that can adequately discriminate viral-only diarrhea from other etiologies using relatively few, easily collected predictor variables that are commonly obtained during routine clinical care, using data from these 2 large surveillance systems in Bangladesh. Our findings provide an improved understanding of the key predictors of viral-only diarrhea etiology among patients of all ages and add insights regarding the prevalence of viral-only diarrhea etiologies in older age groups, which may allow for more judicious use of antibiotics.

Prior research has shown that up to 50%–80% of children <5 years of age inappropriately receive antibiotics for diarrhea, the majority of whom have viral etiologies [19, 41–44]. Additionally, a reanalysis of results from the Global Enteric Multicenter Study (GEMS) estimated that there were approximately 12.6 inappropriately treated diarrhea cases for each appropriately treated case, with viruses among the leading etiologies inappropriately treated with antibiotics [45]. While multiple factors contribute to high levels of antibiotic use, 1 major contributing factor is a lack of evidence-based tools to assist clinicians with assessing the risk of viral diarrhea etiology, which may subsequently influence antibiotic prescribing behaviors. Clinical prediction models such as those developed in this study may provide clinicians with evidence-based tools to better identify patients with viral-only diarrhea and increase their confidence in following guidelines when antibiotic use is not indicated. Recently, a randomized crossover trial of a clinical decision support tool for viral diarrhea prediction among pediatric patients in Bangladesh and Mali found that a 10% increase in predicted probability of viral-only diarrhea was associated with a 14% decrease in the odds of antibiotic prescribing [46].
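Because the trial result is reported on the odds scale, its effect on the probability of prescribing depends on the starting point. The sketch below works through the arithmetic under two stated assumptions: that the 14% decrease corresponds to an odds ratio of 0.86 per step, and a hypothetical baseline prescribing probability of 0.50 that is not taken from the trial.

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Convert a probability to odds, scale by an odds ratio, convert back."""
    odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Assumption: a 14% decrease in odds is an odds ratio of 0.86.
# The 0.50 baseline is hypothetical and chosen only for illustration.
new_prob = apply_odds_ratio(baseline_prob=0.50, odds_ratio=0.86)
```

With these illustrative inputs the prescribing probability falls from 0.50 to roughly 0.46, which shows that a 14% decrease in odds is not the same as a 14% decrease in probability.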

Consistent with prior studies, this study shows that age is a major predictor of viral-only etiology diarrhea, with the predicted odds of viral-only etiology decreasing with increasing patient age and dropping off particularly after early childhood [47]. In our sensitivity analysis using CART to discretize age, age was optimally split into a binary variable differentiating young children (<4 years) from older children and adults (≥4 years). This binary variable performed nearly as well as the original age categories, indicating that age remains one of the most important epidemiological predictors of viral-only diarrhea. However, viral diarrhea still constitutes a substantial burden of diarrhea in adults, who are often presumed to have nonviral diarrhea by clinicians based on age alone. In our study, nearly 1 in 5 adults aged 18–55 years had viral-only pathogens detected on TAC PCR, indicating that viruses remain an important cause of diarrhea in older populations. This finding may be driven by the high incidence of viral pathogens in young children, as adults in this age group are often parents and caregivers of young children. Our findings suggest that clinicians should maintain a high index of suspicion for viral diarrhea in infants and young children while avoiding broad generalizations of presumed nonviral etiology among older children and adults. It is also important for clinicians to assess for additional clinical characteristics such as bloody stool and abdominal pain in all age groups, as currently recommended in clinical guidelines, including those from the WHO [4, 36]. Models such as those developed here may assist clinicians with making more accurate assessments of risk of viral versus nonviral diarrhea, based on more than a single clinical symptom.
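The idea of letting a tree-based method choose a single age cut point can be sketched as a one-split search that minimizes weighted Gini impurity, the usual splitting criterion for CART classification trees. This is a simplified illustration on hypothetical toy data, not a reproduction of the study's analysis; the function names and the data are assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a set of binary labels (1 = viral-only)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_age_split(ages: Sequence[float],
                   labels: Sequence[int]) -> Tuple[Optional[float], float]:
    """Find the age threshold that minimizes weighted Gini impurity.

    Candidate thresholds are midpoints between consecutive distinct ages.
    Returns (threshold, weighted impurity); the left group is age < threshold.
    """
    n = len(ages)
    distinct = sorted(set(ages))
    best_threshold: Optional[float] = None
    best_score = float("inf")
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2.0
        left = [y for a, y in zip(ages, labels) if a < threshold]
        right = [y for a, y in zip(ages, labels) if a >= threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical toy data (ages in years), not study data.
toy_ages = [0.5, 1, 2, 3, 5, 12, 30, 45]
toy_viral = [1, 1, 1, 0, 0, 0, 1, 0]
split_age, split_impurity = best_age_split(toy_ages, toy_viral)
```

On the toy data the search picks a threshold that separates the youngest children from everyone else, while still leaving one viral-only adult on the older side of the split, echoing the point that a binary age rule does not rule out viral etiology in adults.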

Bloody stool and abdominal pain predicted lower odds of viral diarrhea, consistent with prior guidelines and studies indicating their association with invasive bacterial etiologies [4, 36, 48]. Our simplified prediction model (model 3), consisting of only age, abdominal pain, and bloody stool, performed nearly as well as the alternative models. Model 2, which eliminated the variables of diarrhea severity and vomiting, performed nearly identically to model 1, which contained all candidate variables. We found that diarrhea severity and vomiting were not predictive of viral-only diarrhea, indicating that assessing for these symptoms is of limited benefit for risk prediction of viral etiology diarrhea. Given the models’ similar performance, the simplified model may be most feasible for use in high-volume and highly resource-constrained settings where a rapid assessment using only a few variables is desirable, while model 2, which adds a clinical assessment of dehydration and accounts for seasonal variation, is optimal and could be used when resources allow.

Season was found to be predictive of viral-only etiology, with presentation during the monsoon versus winter season having lower predicted odds of viral-only diarrhea. This finding is consistent with the known seasonal cycles of diarrhea in Bangladesh, with rotavirus peaks typically occurring in the winter and cholera peaks at the beginning and end of monsoon season [49–51]. Interestingly, severe dehydration (vs no dehydration) was found to be a positive predictor of viral-only etiology of diarrhea. This was unexpected, as some bacterial causes of diarrhea (eg, cholera) are characterized by high-volume stool output leading to severe dehydration very rapidly [4]. This finding may be related to patients with viral diarrhea having a longer duration of symptoms and may reflect greater delays in seeking care for viral diarrhea, which may be more insidious, in contrast to bacterial causes, which may cause more rapid-onset and alarming symptoms that prompt care-seeking.

This study has several limitations. The data were collected exclusively at hospitals and therefore may be skewed toward patients with more severe illness; these study findings may therefore not be generalizable to patients presenting to outpatient clinics. Additionally, while the data were collected from diverse settings throughout Bangladesh, the prediction model may not be generalizable to other countries with different epidemiology of diarrhea, particularly locations that have initiated new rotavirus vaccination campaigns.

As evidenced by the modest pseudo R², other predictor variables associated with viral diarrhea etiology may have improved the performance of the model; however, including them was not possible given the retrospective nature of this study. For example, while data on history of fever were collected in the derivation dataset, this variable was not collected in the validation dataset, which precluded its inclusion in the models. However, given the variable association of fever with both viral and bacterial pathogens (eg, Shigella and Salmonella often cause fever, whereas other bacteria such as V cholerae rarely do), the inclusion of this variable is of questionable utility [36]. Further research using a wider array of easily collected clinical variables may improve model performance across diverse settings.

There is currently no gold standard for attributing etiology using TAC; therefore, a Ct value of <30 was set a priori based on expert consultation, although this may not be the ideal cutoff. However, our sensitivity analysis using varying Ct values found similar results to the primary analysis, suggesting that this threshold is likely reasonable when using similar methodology. Additionally, differences in the age, sex, and symptomatology of the populations that visit the hospitals in the 2 datasets indicate that the epidemiology of diarrhea may vary depending on healthcare facility type, which may have contributed to decreased performance of the models in external validation.

Last, while the aim of this study was to develop a prediction model for risk of viral-only etiology, a prediction model for nonviral/bacterial causes was also considered. We opted to predict viral-only etiologies since this information would be most clinically actionable (ie, clinicians could avoid prescribing antibiotics given consensus on the nonuse of antibiotics for viral diarrhea), whereas there is no uniform standard for treatment of bacterial causes, although most cases still do not warrant antibiotic use per WHO recommendations. Nonetheless, further research examining the preferences of clinicians on the type of prediction model output for use in clinical decision-making is needed.

CONCLUSIONS

We have derived and retrospectively validated several clinical prediction models for viral-only diarrhea in Bangladesh that can be applied to patients of all ages. Further research to develop this model into a mobile application may help clinicians identify patients with viral diarrhea who do not warrant antibiotics, without the need for laboratory stool diagnostics, thereby supporting the reduction of inappropriate antibiotic use.

Supplementary Data

Supplementary materials are available at Open Forum Infectious Diseases online. Consisting of data provided by the authors to benefit the reader, the posted materials are not copyedited and are the sole responsibility of the authors, so questions or comments should be addressed to the corresponding author.

Notes

Acknowledgments. The authors thank all individuals who participated in this study.

Disclaimer. The content is solely the responsibility of the authors and does not necessarily represent the official views of the funders. The funders had no role in the study design, data collection, data analysis, interpretation of data, or in the writing or decision to submit the manuscript for publication.

Ethical approval. As only de-identified data were used, institutional review board approval was not required for this study.

Data availability. The de-identified dataset is available upon reasonable request to the corresponding author.

Financial support. This work was supported by the Bill & Melinda Gates Foundation (grant number OPP1131107 to F. Q.) and the National Institute of Allergy and Infectious Diseases (grant number R01AI135114 to D. L.).

References

James, Abate, et al. Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet 2018; 392:1789–858.
