Development and validation of a deep learning algorithm for the prediction of serum creatinine in critically ill patients

Abstract

Objectives

Serum creatinine (SCr) is the primary biomarker for assessing kidney function; however, it may lag behind true kidney function, especially in instances of acute kidney injury (AKI). The objective of this work was to develop Nephrocast, a deep-learning model to predict next-day SCr in adult patients treated in the intensive care unit (ICU).

Materials and Methods

Nephrocast was trained and validated, temporally and prospectively, using electronic health record data of adult patients admitted to the ICU in the University of California San Diego Health (UCSDH) between January 1, 2016 and June 22, 2024. The model features consisted of demographics, comorbidities, vital signs and laboratory measurements, and medications. Model performance was evaluated by mean absolute error (MAE) and root-mean-square error (RMSE) and compared against the prediction day’s SCr as a reference.

Results

A total of 28 191 encounters met the eligibility criteria, corresponding to 105 718 patient-days. The median (interquartile range [IQR]) MAE and RMSE in the internal test set were 0.09 (0.085-0.09) mg/dL and 0.15 (0.146-0.152) mg/dL, respectively. In the prospective validation, the MAE and RMSE were 0.09 mg/dL and 0.14 mg/dL, respectively. The model’s performance was superior to that of the reference SCr.

Discussion and Conclusion

Our model demonstrated good performance in predicting next-day SCr by leveraging clinical data routinely collected in the ICU. The model could aid clinicians in identifying high-risk patients for AKI, predicting AKI trajectory, and informing the dosing of renally eliminated drugs.

Lay Summary

Assessment of kidney function is important in critically ill patients. Serum creatinine (SCr) is the most common biomarker used for this purpose; however, it may lag behind kidney function. To overcome this limitation, we have developed and prospectively validated a machine learning model to predict next-day SCr in critically ill adult patients using data derived from the University of California San Diego Health System. The model features consisted of demographics, comorbidities, vital signs, laboratory measurements, and medications. When tested, our model demonstrated good performance in predicting next-day SCr by leveraging data routinely collected in the hospital system. The model could aid clinicians in identifying patients at high risk of kidney dysfunction and guide the dosing of drugs affected by kidney function.

Background

Acute kidney injury (AKI) is a major source of mortality and morbidity in hospitalized patients.^1–4 It has been estimated that 40%-60% of patients will experience at least one AKI event during their intensive care unit (ICU) stay.⁵,⁶ The management of AKI is an ongoing challenge, especially in critically ill patients.⁷ Drugs cleared by the kidneys often have a narrow therapeutic window between efficacy and toxicity.⁸ The margin for error is even smaller with nephrotoxic drugs, for which the therapeutic window frequently shifts due to AKI.^9–13 Accurate assessment of glomerular filtration rate (GFR) is crucial when initiating and adjusting the dose of renally eliminated drugs in patients with AKI.^14–17 Broadly, there are 2 ways to estimate kidney function: measuring the urinary clearance of solutes and estimating clearance from observed serum solute levels. Because solute clearance varies with body size, clearance can be indexed to body surface area to produce an estimated GFR as another measure of kidney function.¹⁸

A common urine-based method of estimating kidney function is creatinine clearance (CrCl), which is measured using a 24-h urine collection paired with a serum creatinine (SCr) measurement.^19–21 The inconvenience of collecting urine over such a long period limits its utility in clinical settings.^22–24 To mitigate this limitation, several equations have been developed to estimate kidney function from a single SCr measurement, including the Cockcroft-Gault (CG)²⁵ and Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equations.^26–28 Both measured CrCl and the SCr-based estimation equations assume that SCr is at steady state. While this assumption is reasonable in ambulatory settings, it is rarely met in the ICU, where substantial fluctuations in kidney function occur on 35%-40% of patient-days.²⁹ These equations are inaccurate during AKI because changes in SCr lag behind changes in true kidney function, especially in rapidly progressing AKI.^30–32 If future SCr could be anticipated, SCr-based estimation equations could more accurately reflect GFR.
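The quantities in the two paragraphs above can be illustrated with a short Python sketch (Python being the language used in this study). It computes a timed-urine CrCl, indexes a clearance to 1.73 m² of body surface area, and applies the Cockcroft-Gault equation to a single SCr value. The Mosteller body-surface-area formula and all function names are illustrative assumptions, not part of the study pipeline.

```python
import math


def measured_crcl(urine_cr_mg_dl: float, urine_volume_ml: float,
                  scr_mg_dl: float, collection_min: float = 1440.0) -> float:
    """Timed-urine creatinine clearance (mL/min): (U_Cr x V) / (S_Cr x t)."""
    return (urine_cr_mg_dl * urine_volume_ml) / (scr_mg_dl * collection_min)


def bsa_mosteller(height_cm: float, weight_kg: float) -> float:
    """Body surface area (m^2) by the Mosteller formula (an assumed choice)."""
    return math.sqrt(height_cm * weight_kg / 3600.0)


def index_to_bsa(clearance_ml_min: float, bsa_m2: float) -> float:
    """Express a clearance per 1.73 m^2 of body surface area."""
    return clearance_ml_min * 1.73 / bsa_m2


def cockcroft_gault(age_years: float, weight_kg: float, scr_mg_dl: float,
                    female: bool) -> float:
    """Cockcroft-Gault CrCl estimate (mL/min) from a single SCr value."""
    crcl = (140.0 - age_years) * weight_kg / (72.0 * scr_mg_dl)
    return crcl * 0.85 if female else crcl
```

For a 60-year-old, 72-kg man, `cockcroft_gault(60, 72, 1.0, False)` gives 80 mL/min, whereas substituting an SCr of 1.5 mg/dL gives about 53 mL/min. Because the equation assumes steady state, which SCr value is plugged in matters during AKI.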

While many machine learning models have been previously developed to predict the onset of AKI based on SCr change, these models typically do not estimate future SCr.^33–36 This limits the utility of these models in predicting AKI trajectory, differentiating between transient and persistent AKI, and informing the dosing of renally eliminated drugs. Achieving an accurate and dynamic prediction of kidney function could address these constraints. In this study, we aim to develop and validate Nephrocast, a deep-learning model capable of accurately predicting next-day SCr levels in critically ill adult patients.

Materials and methods

Study cohort

Data were extracted from the electronic health record (EHR) system of the University of California San Diego (UCSD) Health System between January 1, 2016 and June 30, 2023 for Nephrocast development and internal validation. The UCSD Health System consists of 2 academic medical centers, including a Level I Trauma Center, that provide critical care across a wide range of specialties, including medical, cardiovascular, and surgical ICUs. Data collected from July 1, 2023 to November 30, 2023 were used for temporal validation. Nephrocast was validated prospectively using data collected from January 1, 2024 to June 22, 2024. Patients were eligible for inclusion if they were aged ≥18 years and had spent at least 24 h in an ICU. Patients were excluded if they had a diagnosis of chronic kidney disease (CKD) stage 5 or end-stage kidney disease (ESKD). Patient-days were excluded if the patient received renal replacement therapy (RRT) on the prediction day or within the 7 days preceding it, or if the ICU stay extended beyond 14 days. Approval was obtained from the University of California San Diego Institutional Review Board (IRB) with a waiver of informed consent (#800257).
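The inclusion and exclusion rules above can be expressed as two predicates over patient-day records. The record layout and field names are hypothetical, and the text leaves open whether the 14-day rule drops whole encounters or only later days; this sketch assumes the latter and excludes patient-days after ICU day 14.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatientDay:
    # Hypothetical record layout; field names are illustrative only.
    encounter_id: str
    age_years: float
    icu_hours_total: float               # total ICU time in the encounter
    ckd5_or_eskd: bool                   # CKD stage 5 or ESKD diagnosis
    icu_day: int                         # 1-based ICU day of the prediction day
    days_since_last_rrt: Optional[int]   # 0 = RRT on prediction day; None = no RRT


def encounter_eligible(d: PatientDay) -> bool:
    """Encounter-level inclusion and exclusion criteria."""
    return d.age_years >= 18 and d.icu_hours_total >= 24 and not d.ckd5_or_eskd


def patient_day_eligible(d: PatientDay, rrt_lookback_days: int = 7,
                         max_icu_day: int = 14) -> bool:
    """Patient-day exclusions: recent RRT and (assumed reading) ICU days after day 14."""
    recent_rrt = (d.days_since_last_rrt is not None
                  and d.days_since_last_rrt <= rrt_lookback_days)
    return encounter_eligible(d) and not recent_rrt and d.icu_day <= max_icu_day
```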

Outcome definition

The outcome variable, next-day SCr, was defined as the SCr value measured at 6:00 am on the next patient-day. If no measurement was available at 6:00 am, the measurement closest to 6:00 am within a 6-h window centered on 6:00 am was selected. If no SCr measurement for the following day fell within this window, no prediction was made for that patient-day during model training.
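The outcome-selection rule can be sketched with the standard datetime module. The sketch reads the 6-h window centered on 6:00 am as ±3 h (the `half_window_hours` parameter); the function name and the input format (a list of timestamp and value pairs) are assumptions.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence, Tuple


def next_day_target(measurements: Sequence[Tuple[datetime, float]],
                    prediction_day: datetime,
                    half_window_hours: float = 3.0) -> Optional[float]:
    """SCr closest to 06:00 on the day after prediction_day, within +/- half_window_hours.

    Returns None when no measurement falls in the window, in which case the
    patient-day would not be used for training.
    """
    anchor = (prediction_day.replace(hour=6, minute=0, second=0, microsecond=0)
              + timedelta(days=1))
    window = timedelta(hours=half_window_hours)
    in_window = [(abs(t - anchor), v) for t, v in measurements
                 if abs(t - anchor) <= window]
    if not in_window:
        return None
    # Pick the measurement with the smallest distance from 06:00.
    return min(in_window, key=lambda pair: pair[0])[1]
```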

Clinical features as predictors

Variables consisted of 50 vital signs and laboratory measurements, 6 demographic features, 11 Systemic Inflammatory Response Syndrome (SIRS) and Sequential Organ Failure Assessment (SOFA) criteria, 11 medication features, and 62 comorbidities. Vital signs and laboratory variables were compiled at an hourly resolution into non-overlapping bins, using the median value for variables with multiple measurements per hour. The last observed value was carried forward for up to 24 h if no new measurement was available, and all remaining missing values were imputed with the mean. For each vital sign and laboratory measurement, 2 additional features were calculated: the slope of change and the mean value over the previous 72 h.
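The preprocessing steps can be sketched for a single variable in plain Python: hourly median binning, last observation carried forward for up to 24 h, mean imputation, and the 72-h slope and mean features. Using an ordinary least-squares slope per hour, computing the trend features after imputation, and passing the imputation mean in from the training data are assumptions, since the text does not specify them.

```python
import math
import statistics
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def hourly_bins(obs: Sequence[Tuple[float, float]], n_hours: int) -> List[Optional[float]]:
    """Median of all measurements in each non-overlapping 1-h bin (None if empty)."""
    bins: Dict[int, List[float]] = defaultdict(list)
    for t_hours, value in obs:            # t_hours: hours since ICU admission
        h = math.floor(t_hours)
        if 0 <= h < n_hours:
            bins[h].append(value)
    return [statistics.median(bins[h]) if h in bins else None for h in range(n_hours)]


def carry_forward(series: List[Optional[float]], max_hours: int = 24) -> List[Optional[float]]:
    """Carry the last observation forward for at most max_hours after it was measured."""
    out: List[Optional[float]] = []
    last_val: Optional[float] = None
    last_h = 0
    for h, v in enumerate(series):
        if v is not None:
            last_val, last_h = v, h
            out.append(v)
        elif last_val is not None and h - last_h <= max_hours:
            out.append(last_val)
        else:
            out.append(None)
    return out


def impute_mean(series: List[Optional[float]], fill_value: float) -> List[float]:
    """Fill remaining gaps with a (training-set) mean supplied by the caller."""
    return [fill_value if v is None else v for v in series]


def trend_features(series: Sequence[float], window: int = 72) -> Tuple[float, float]:
    """Least-squares slope (units per hour) and mean over the last `window` hours.

    `series` must be non-empty (i.e., already imputed).
    """
    tail = list(series[-window:])
    n = len(tail)
    mean_v = sum(tail) / n
    if n < 2:
        return 0.0, mean_v
    mean_x = (n - 1) / 2
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_v) for x, y in zip(range(n), tail))
    return sxy / sxx, mean_v
```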

Model development and validation

The initial Nephrocast model consisted of a feedforward neural network with 2 hidden layers of 128 and 64 units. To capture relevant time-based patterns, we integrated a multi-headed attention layer, which allowed the model to weight the most informative time steps and improved its predictive accuracy. The final layer of the model produces a single value, the predicted next-day SCr. The model was trained with L2 regularization and the Adam optimizer, and hyperparameters were tuned with Bayesian hyperparameter optimization.³⁷,³⁸
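The architecture described above can be sketched as a NumPy forward pass. The text states only the 2 hidden layers (128 and 64 units), a multi-headed attention layer, and a single-value output; the input projection width (64), the number of heads (4), placing self-attention over hourly time steps before the dense layers, and mean pooling over time are assumptions made for illustration. Training (L2 regularization, Adam, Bayesian hyperparameter search) is omitted; in TensorFlow, which the study used, `tf.keras.layers.MultiHeadAttention` would be a trainable counterpart to the attention function here.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(n_features: int, d_model: int = 64, n_heads: int = 4, seed: int = 0) -> dict:
    """Randomly initialized weights; sizes other than 128/64 are illustrative assumptions."""
    rng = np.random.default_rng(seed)

    def dense(n_in: int, n_out: int):
        return rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_out)), np.zeros(n_out)

    p = {"n_heads": n_heads}
    p["W_in"], p["b_in"] = dense(n_features, d_model)    # per-hour input projection
    for name in ("q", "k", "v", "o"):
        p["W_" + name], _ = dense(d_model, d_model)      # attention projections
    p["W1"], p["b1"] = dense(d_model, 128)               # hidden layer 1 (128 units)
    p["W2"], p["b2"] = dense(128, 64)                    # hidden layer 2 (64 units)
    p["W3"], p["b3"] = dense(64, 1)                      # single output: next-day SCr
    return p


def multi_head_self_attention(h: np.ndarray, p: dict):
    """Scaled dot-product self-attention over time steps; h is (batch, time, d_model)."""
    B, T, D = h.shape
    H = p["n_heads"]
    dk = D // H

    def split(x):  # (B, T, D) -> (B, H, T, dk)
        return x.reshape(B, T, H, dk).transpose(0, 2, 1, 3)

    q, k, v = split(h @ p["W_q"]), split(h @ p["W_k"]), split(h @ p["W_v"])
    weights = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(dk))    # (B, H, T, T)
    context = (weights @ v).transpose(0, 2, 1, 3).reshape(B, T, D)  # merge heads
    return context @ p["W_o"], weights


def forward(x: np.ndarray, p: dict):
    """x: (batch, hours, features) -> predicted next-day SCr of shape (batch,)."""
    h = x @ p["W_in"] + p["b_in"]
    context, weights = multi_head_self_attention(h, p)
    pooled = context.mean(axis=1)                        # average over time (assumption)
    z = np.maximum(pooled @ p["W1"] + p["b1"], 0.0)      # ReLU
    z = np.maximum(z @ p["W2"] + p["b2"], 0.0)           # ReLU
    return (z @ p["W3"] + p["b3"])[:, 0], weights
```

Returning the attention weights alongside the prediction makes it possible to inspect which hours each head attends to, which is the intuition behind the "prioritize critical timestamps" description.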

To evaluate Nephrocast’s performance, we employed 10-fold cross-validation with a 90:10 split at the encounter level for training and testing in each fold. All prediction days from a given encounter were assigned entirely to either the training or the testing set, preventing data leakage between the two. The best-performing model from cross-validation was then used for temporal validation.
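An encounter-level split of this kind can be sketched as grouped k-fold assignment: encounters, not patient-days, are shuffled and dealt into 10 folds, so each fold holds out about 10% of encounters together with all of their days. The function name and seed handling are illustrative; scikit-learn's `sklearn.model_selection.GroupKFold` provides a comparable library implementation.

```python
import random
from typing import Hashable, Iterator, List, Sequence, Tuple


def grouped_kfold(encounter_ids: Sequence[Hashable], k: int = 10,
                  seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs in which every patient-day of an
    encounter falls in the same fold, so no encounter spans train and test."""
    unique = sorted(set(encounter_ids), key=str)   # deterministic order before shuffling
    random.Random(seed).shuffle(unique)
    fold_of = {enc: i % k for i, enc in enumerate(unique)}
    for fold in range(k):
        test = [i for i, e in enumerate(encounter_ids) if fold_of[e] == fold]
        train = [i for i, e in enumerate(encounter_ids) if fold_of[e] != fold]
        yield train, test
```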

To validate Nephrocast in a production environment for real-time performance assessment, we leveraged an existing cloud-based infrastructure designed to directly access UCSDH EHR data using Fast Healthcare Interoperability Resources (FHIR) and Health Level 7 (HL7) standards with OAuth 2.0 authentication, as previously described by Boussina et al.³⁹ The schematic diagram of this “silent mode” prospective validation environment is shown in Figure 1. The input feature set (including demographics, comorbidities, vital signs, laboratory measurements, and medications) was extracted by the platform from January 1, 2024 to June 22, 2024 and passed to Nephrocast to predict next-day SCr levels.
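The platform itself is described by Boussina et al rather than here, so the following is only a hypothetical illustration of one ingestion step: mapping a single FHIR R4 Observation resource, already parsed from JSON into a dict, to a (timestamp, SCr) pair. The use of LOINC code 2160-0 for serum creatinine, the accepted status values, and the mg/dL unit check are assumptions about how SCr results might be coded at a given site.

```python
from datetime import datetime
from typing import Optional, Tuple

# Assumption: SCr is coded with LOINC 2160-0 (Creatinine [Mass/volume] in Serum or Plasma).
SCR_LOINC = "2160-0"
USABLE_STATUSES = {"final", "amended", "corrected"}


def parse_scr_observation(resource: dict) -> Optional[Tuple[datetime, float]]:
    """Map one FHIR R4 Observation (parsed JSON dict) to (time, SCr in mg/dL)."""
    if resource.get("resourceType") != "Observation":
        return None
    if resource.get("status") not in USABLE_STATUSES:
        return None
    codings = resource.get("code", {}).get("coding", [])
    if not any(c.get("system") == "http://loinc.org" and c.get("code") == SCR_LOINC
               for c in codings):
        return None
    qty = resource.get("valueQuantity", {})
    unit = qty.get("code", qty.get("unit"))
    if unit != "mg/dL" or "value" not in qty:
        return None
    when = resource.get("effectiveDateTime")
    if when is None:
        return None
    # Python 3.10's datetime.fromisoformat does not accept a trailing "Z".
    when_dt = datetime.fromisoformat(when.replace("Z", "+00:00"))
    return when_dt, float(qty["value"])
```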

Figure 1.

Schematic diagram of the Nephrocast prospective validation pipeline. Abbreviations: AWS = Amazon Web Services; EC2 = Elastic Compute Cloud; FHIR = Fast Healthcare Interoperability Resources; HL7 = Health Level 7; RDS = Relational Database Service.

To interpret the model’s predictions, we calculated feature importance scores, which quantify the relative contribution of each input feature to the model’s output. Higher scores indicate a greater influence of the corresponding feature on the predicted outcome; for instance, a feature with a high importance score is one whose variation substantially changes the predicted next-day SCr, which may indicate clinical relevance.⁴⁰
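The text does not state how the importance scores were computed. As one common, model-agnostic option, permutation importance measures how much the error grows when a single feature column is shuffled; the sketch below implements it in plain Python for any callable predictor. It is a swapped-in illustration, not necessarily the authors' method; scikit-learn's `sklearn.inspection.permutation_importance` offers a library implementation.

```python
import random
from typing import Callable, List, Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict: Callable[[List[List[float]]], List[float]],
                           X: List[List[float]], y: Sequence[float],
                           n_repeats: int = 5, seed: int = 0) -> List[float]:
    """Mean increase in MAE when each feature column is shuffled independently."""
    rng = random.Random(seed)
    base = mae(y, predict(X))
    scores = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)                      # break the feature-outcome link
            X_perm = [row[:j] + [c] + row[j + 1:] for row, c in zip(X, col)]
            increases.append(mae(y, predict(X_perm)) - base)
        scores.append(sum(increases) / n_repeats)
    return scores
```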

Evaluation of predicted SCr

In this study, reference SCr was defined as the laboratory-measured SCr on the day the prediction was made. Predicted SCr was defined as the SCr predicted by Nephrocast for the next patient-day. Measured SCr was defined as the laboratory-measured SCr on the next patient-day. To illustrate, consider a hypothetical ICU patient whose SCr levels on the second and third days of the ICU stay are 1.0 mg/dL and 1.5 mg/dL, respectively, and suppose the goal is to predict SCr on the third day. In this case, the reference SCr would be 1.0 mg/dL, the measured SCr would be 1.5 mg/dL, and the predicted SCr would be Nephrocast’s prediction for day 3.

Error was defined as the difference between predicted SCr and measured SCr.

Error = Predicted SCr - Measured SCr.

Errors were summarized using mean absolute error (MAE) and root mean squared error (RMSE), with the latter being more sensitive to large errors.

\begin{aligned} \mathrm{MAE} &= \frac{1}{N}\sum_{i=1}^{N} \left| \mathrm{Predicted\ SCr}_{i} - \mathrm{Measured\ SCr}_{i} \right|, \\ \mathrm{RMSE} &= \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left( \mathrm{Predicted\ SCr}_{i} - \mathrm{Measured\ SCr}_{i} \right)^{2}}, \end{aligned}

where $N$ is the number of analyzed SCr observations.
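The two summary metrics follow directly from the error definition and formulas above. This short sketch computes them; the function names are illustrative.

```python
import math
from typing import List, Sequence


def errors(predicted: Sequence[float], measured: Sequence[float]) -> List[float]:
    """Error = Predicted SCr - Measured SCr, per observation."""
    return [p - m for p, m in zip(predicted, measured)]


def mae(predicted: Sequence[float], measured: Sequence[float]) -> float:
    e = errors(predicted, measured)
    return sum(abs(x) for x in e) / len(e)


def rmse(predicted: Sequence[float], measured: Sequence[float]) -> float:
    e = errors(predicted, measured)
    return math.sqrt(sum(x * x for x in e) / len(e))
```

With predicted values [1.0, 1.2, 0.9, 2.0] and measured values [1.1, 1.2, 0.8, 1.6] mg/dL, the errors are -0.1, 0, 0.1, and 0.4 mg/dL, giving an MAE of 0.15 mg/dL and an RMSE of about 0.21 mg/dL; the single 0.4 mg/dL error pulls RMSE above MAE, which is the sensitivity to large errors noted above.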

Following the approach of Huang et al, reference SCr served as a baseline model: its error relative to measured SCr was compared with that of Nephrocast to assess the clinical usefulness of the model. This baseline assumes that SCr remains unchanged between the prediction day and the next day, reflecting standard clinical practice.⁴¹ Additionally, we trained a multivariable linear regression model with L2 regularization using Nephrocast’s predictors and compared its performance with that of Nephrocast. To assess Nephrocast’s performance during significantly fluctuating kidney function, we evaluated it on days of unstable kidney function, defined as days with a change of ≥30% or ≥0.3 mg/dL in SCr between the reference SCr and the measured SCr on the following patient-day. Bland-Altman plots were used to assess the agreement between predicted and measured SCr.⁴²
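The evaluation logic in this paragraph (flagging unstable days, comparing Nephrocast and reference-SCr errors on all and unstable days, and computing the Bland-Altman bias with 95% limits of agreement) can be sketched as follows. Function names and the use of MAE only are illustrative; the ±1.96 SD limits are the conventional Bland-Altman choice rather than a detail stated in the text.

```python
import math
import statistics
from typing import Dict, Sequence, Tuple


def is_unstable(reference: float, measured: float,
                rel_change: float = 0.30, abs_change: float = 0.3) -> bool:
    """Unstable day: SCr change of >=30% or >=0.3 mg/dL from reference to measured."""
    change = abs(measured - reference)
    return change >= abs_change or change >= rel_change * reference


def mae(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def compare_to_reference(reference: Sequence[float], predicted: Sequence[float],
                         measured: Sequence[float]) -> Dict[str, Tuple[float, float]]:
    """(Nephrocast MAE, reference-SCr MAE) on all days and on unstable days."""
    idx = [i for i, (r, m) in enumerate(zip(reference, measured)) if is_unstable(r, m)]
    pick = lambda xs: [xs[i] for i in idx]
    unstable = ((mae(pick(predicted), pick(measured)), mae(pick(reference), pick(measured)))
                if idx else (math.nan, math.nan))
    return {"all": (mae(predicted, measured), mae(reference, measured)),
            "unstable": unstable}


def bland_altman(predicted: Sequence[float], measured: Sequence[float]) -> Tuple[float, float, float]:
    """Bias and 95% limits of agreement (bias +/- 1.96 SD of the differences)."""
    diffs = [p - m for p, m in zip(predicted, measured)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```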

AKI definition and staging

AKI was defined according to the 2012 Kidney Disease Improving Global Outcomes (KDIGO) AKI guideline criteria using the peak-to-baseline SCr ratio.⁸ Baseline SCr was defined as SCr at hospital admission. The urine output criterion was not applied because urine output data were sparse.
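A minimal sketch of the staging rule, assuming (as stated above) that only the peak-to-baseline SCr ratio is used and that baseline is the admission SCr. KDIGO also defines AKI by an absolute rise of ≥0.3 mg/dL within 48 h and assigns stage 3 for SCr ≥4.0 mg/dL or initiation of RRT; those criteria, and urine output, are deliberately omitted here to mirror the text. The function name is illustrative.

```python
from typing import Sequence


def kdigo_stage_by_ratio(baseline_scr: float, scr_values: Sequence[float]) -> int:
    """0 = no AKI; 1-3 = KDIGO stage from the peak-to-baseline SCr ratio alone."""
    ratio = max(scr_values) / baseline_scr
    if ratio >= 3.0:
        return 3
    if ratio >= 2.0:
        return 2
    if ratio >= 1.5:
        return 1
    return 0
```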

Descriptive analyses and software

Patient characteristics were summarized using descriptive statistics, including mean (SD), median (interquartile range [IQR]), or counts (%), as appropriate. Continuous variables were compared using the Wilcoxon rank-sum test. All tests were two-sided, with significance set at the 5% level. Python 3.10 was used for analysis, NumPy 1.23.5 for all data preprocessing, and TensorFlow 2.13.0 for implementing the deep learning model.⁴³
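The Wilcoxon rank-sum test mentioned above can be sketched with the standard library using the large-sample normal approximation, with average ranks for ties. This mirrors the statistic returned by `scipy.stats.ranksums`, which likewise applies no tie or continuity correction; the function name `rank_sum_test` is illustrative, and the text does not state which implementation the study used.

```python
import math
from typing import List, Sequence, Tuple


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test via the large-sample normal approximation.

    Returns (z, p). Ties receive average ranks; no tie or continuity correction
    is applied to the variance.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    values = [v for v, _ in pooled]
    ranks: List[float] = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[j + 1] == values[i]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)   # rank sum of x
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value
    return z, p
```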

Results

Study population

A total of 25 243 encounters met the eligibility criteria in the training dataset, corresponding to 95 111 patient-days. In the training set, males represented 60.8% of the cohort, and the median (IQR) age was 61.5 (48.4-71.5) years. The most common ethnicity was White (12 781, 50.63%). The median (IQR) ICU length of stay was 3.67 (2.26-6.54) days. During their hospital stay, 94.4% of patients were admitted to the medical ICU and 37.8% to the surgical ICU at some point. The median (IQR) baseline SCr was 0.68 (0.52-0.89) mg/dL. Overall, 10.5% of patients had a diagnosis of CKD. The median (IQR) SOFA II score was 5 (3-8). The mortality rate during hospital stay was 6.16%. Overall, 46.4% of patients developed AKI, and 25.5%, 12.8%, and 8.1% had stage I, II, and III AKI, respectively. Unstable patient-days accounted for 11.5% of all patient-days (Table 1). The demographic and clinical characteristics of the temporal and prospective validation cohorts were comparable to those of the training cohort.

Table 1.

Patient characteristics.

Variable	Training (N = 25 243)	Temporal validation (N = 1378)	Prospective validation (N = 1570)
Patient-days, n	95 111	5153	5454
Age, median (IQR), years	61.5 (48.4-71.5)	61.1 (46.1-70.8)	63.5 (50.6-73.2)
Sex, n (%)
Male	15 348 (60.8%)	831 (60.3%)	937 (59.7%)
Female	9905 (39.2%)	547 (39.7%)	633 (40.3%)
Ethnicity, n (%)
Black	1913 (7.58%)	99 (7.18%)	99 (6.31%)
White	12 781 (50.63%)	644 (46.73%)	782 (49.81%)
Asian	1477 (5.85%)	88 (6.39%)	109 (6.94%)
Other	9072 (35.94%)	547 (39.64%)	580 (36.94%)
SOFA score, median (IQR)	5 (3-8)	5 (3-8)	5 (2-9)
ICU length of stay, median (IQR), days	3.67 (2.26-6.54)	3.83 (2.35-7.18)	3.77 (2.49-6.80)
Unit type, n^a
MICU	23 817	1454	1563
Other	4687	273	246
ICU mortality, n (%)	1555 (6.16%)	61 (4.43%)	81 (5.16%)
Baseline SCr, median (IQR), mg/dL^b	0.68 (0.52-0.89)	0.65 (0.49-0.87)	0.67 (0.51-0.89)
AKI, stage, n (%)^c
I	6443 (25.5%)	382 (27.7%)	421 (26.8%)
II	3230 (12.8%)	182 (13.2%)	234 (14.9%)
III	2049 (8.1%)	121 (8.8%)	130 (8.3%)
Unstable patient-days, n (%)^d	10 948 (11.51%)	616 (11.95%)	633 (9.81%)
Comorbidities, n (%)
Anemia	1211 (4.8%)	86 (6.2%)	116 (7.4%)
Chronic kidney disease	1281 (10.5%)	97 (7.0%)	114 (7.3%)
Coronary artery disease	2034 (8.1%)	115 (8.3%)	178 (11.3%)
Diabetes	2438 (9.7%)	150 (10.9%)	211 (13.4%)
Hypertension	3697 (14.6%)	236 (17.1%)	321 (20.4%)
Liver disease	116 (0.5%)	10 (0.7%)	21 (1.3%)
Severe sepsis/septic shock	2039 (8.1%)	103 (7.5%)	103 (6.6%)
Mechanical ventilation, n (%)	11 468 (45.43%)	667 (48.40%)	769 (48.98%)

^a Patients may undergo transfers to various units throughout their hospitalization.

^b Baseline SCr was defined as the first measurement during hospital stay.

^c AKI was defined according to the 2012 Kidney Disease Improving Global Outcomes (KDIGO) criteria using the peak-to-baseline SCr ratio⁸.

^d Unstable patient-days were defined as those in which there was a change of 30% or 0.3 mg/dL or more in SCr concentration between the reference SCr and the measured SCr on the following patient-day.

Abbreviations: AKI = acute kidney injury; ICU = intensive care unit; MICU = medical intensive care unit; SCr = serum creatinine; SICU = surgical intensive care unit; SOFA = sequential organ failure assessment.

Model performance

Nephrocast exhibited a small prediction error and outperformed reference SCr in internal testing, temporal validation, and prospective validation (Table 2). The median (IQR) MAE and RMSE across all days in the internal test dataset were 0.09 (0.085-0.09) mg/dL and 0.15 (0.146-0.152) mg/dL, compared with an MAE of 0.13 (0.131-0.135) mg/dL and an RMSE of 0.25 (0.245-0.253) mg/dL for reference SCr. On unstable days, the median (IQR) MAE and RMSE in the internal test set were 0.20 (0.197-0.203) mg/dL and 0.31 (0.307-0.327) mg/dL, respectively, superior to the MAE of 0.54 (0.532-0.548) mg/dL and RMSE of 0.67 (0.663-0.694) mg/dL of reference SCr. The model performance in the training set is shown in Table S1.

Table 2.

Summary of model performance on all days and unstable days.

Cohort and metric	All days: Nephrocast	All days: Reference SCr	All days: P-value	Unstable days: Nephrocast	Unstable days: Reference SCr	Unstable days: P-value
Test set
MAE, median (IQR), mg/dL	0.09 (0.085, 0.09)	0.13 (0.131, 0.135)	<.01	0.20 (0.197, 0.203)	0.54 (0.532, 0.548)	<.01
RMSE, median (IQR), mg/dL	0.15 (0.146, 0.152)	0.25 (0.245, 0.253)	<.01	0.31 (0.307, 0.327)	0.67 (0.663, 0.694)	<.01
Temporal validation
MAE, mg/dL	0.08	0.13	N/A	0.19	0.54	N/A
RMSE, mg/dL	0.14	0.25	N/A	0.31	0.66	N/A
Prospective validation
MAE, mg/dL	0.09	0.13	N/A	0.18	0.50	N/A
RMSE, mg/dL	0.14	0.23	N/A	0.28	0.60	N/A

Reference SCr was defined as the measured SCr on the day of making the prediction. Predicted SCr was defined as the SCr predicted by our model for the next patient-day. Measured SCr was defined as the laboratory-measured SCr on the next patient-day. Model error was defined as the difference between predicted SCr and measured SCr, which was compared against the difference between reference SCr and measured SCr. Error was summarized using MAE and RMSE. Unstable patient-days were defined as those in which there was a change of 30% or 0.3 mg/dL or more in SCr concentration between the reference SCr and the measured SCr on the following patient-day.

Abbreviations: IQR = interquartile range; MAE = mean absolute error; RMSE = root mean squared error; SCr = serum creatinine.

Model performance was comparable between training and validation. In temporal validation, Nephrocast’s MAE and RMSE across all days were 0.08 mg/dL and 0.14 mg/dL, respectively, superior to the reference SCr MAE of 0.13 mg/dL and RMSE of 0.25 mg/dL. On unstable days, the temporal validation MAE and RMSE were 0.19 mg/dL and 0.31 mg/dL, respectively, superior to the reference SCr MAE of 0.54 mg/dL and RMSE of 0.66 mg/dL. In the prospective cohort, Nephrocast’s MAE and RMSE across all days were 0.09 mg/dL and 0.14 mg/dL, respectively, outperforming the reference SCr MAE of 0.13 mg/dL and RMSE of 0.23 mg/dL. On unstable days, Nephrocast’s MAE and RMSE were 0.18 mg/dL and 0.28 mg/dL, respectively, superior to the reference SCr MAE of 0.50 mg/dL and RMSE of 0.60 mg/dL.

When compared with the regularized multivariable linear regression model, Nephrocast performance was statistically superior (Table S2). The difference was particularly notable on days of unstable kidney function (MAE, 0.20 vs 0.24 mg/dL; RMSE, 0.31 vs 0.43 mg/dL; P < .01). Additionally, Nephrocast demonstrated consistent performance throughout the first 8 days of ICU stay (Figure 2) and across a wide range of SCr changes (Figure 3) on both stable and unstable days.

Figure 2.

Temporal trends of prediction error for all days and unstable days. Reference SCr was defined as the measured SCr on the day of making the prediction. Predicted SCr was defined as the SCr predicted by Nephrocast for the next patient-day. Measured SCr was defined as the laboratory-measured SCr on the next patient-day. Model error was defined as the difference between predicted SCr and measured SCr, which was compared against the difference between reference SCr and measured SCr. Unstable days were defined as those in which there was a change of 30% or 0.3 mg/dL or more in SCr concentration between the reference SCr and the measured SCr on the following patient-day. Abbreviation: SCr = serum creatinine.

Figure 3.

Bland-Altman plots of predicted and measured serum creatinine in the internal test, temporal validation, and prospective validation datasets for all days and unstable days. Unstable days were defined as those in which there was a change of 30% or 0.3 mg/dL or more in SCr concentration between the reference SCr and the measured SCr on the following patient-day. Abbreviations: SCr = serum creatinine; SD = standard deviation.

Feature importance

A total of 241 features were assessed (Table S3). The top 15 most important continuous and categorical features across all days are shown in Figure 4. The most important continuous feature was ICU length of stay. Of the top continuous features, 3 were related to SCr, 2 were related to hospital and ICU length of stay, and the remainder were laboratory results and vital signs. The most important categorical feature was gender. Among the top categorical features, 4 were components of the SIRS score, 2 were related to ICU unit type, and the remainder were related to comorbidities and medical conditions. The top 15 most important features on unstable days are shown in Figure S1; on those days, SCr-related features ranked as the most important continuous features.

Figure 4.

Top 15 features in order of importance for all days in terms of continuous and categorical features. Abbreviations: CMS = Centers for Medicare & Medicaid Services; SCr = serum creatinine; SIRS = systemic inflammatory response syndrome; SOFA = sequential organ failure assessment; WBC = white blood cell.

Discussion

In this study, we developed Nephrocast, a deep-learning model to predict next-day SCr in critically ill adult patients using EHR data. The model demonstrated good performance in internal testing, temporal validation, and prospective validation. It also performed consistently across different clinical scenarios, including stable days, unstable days, the first 8 days of ICU stay, and wide ranges of SCr concentrations. Our model outperformed reference SCr and penalized multivariable linear regression as baseline models, especially on days of unstable kidney function, suggesting an advantage of the deep learning approach in this setting. The results of the prospective validation support integrating Nephrocast into clinical workflows to evaluate its operational and clinical impact.

To date, limited work has been done to predict kidney function using machine learning techniques. Huang et al constructed a machine learning model to predict next-day CrCl derived from measured 24-h urine collection, achieving an RMSE of 18.1 (95% CI 17.9-18.3) mL/min in external validation; however, their model depends on prior CrCl measurements to make the next-day prediction.⁴¹ Unlike 24-h urine collection, SCr is inexpensive to measure and has a short laboratory turnaround time, making it the primary biomarker for estimating kidney function in hospital settings.⁴⁴,⁴⁵ In clinical practice, renal drug dose adjustments have relied on estimates derived from the CG equation and, more recently, on estimated GFR based on the MDRD and CKD-EPI equations.⁴⁶ Although these equations assume steady-state SCr, they persist in clinical practice as the main method to guide dose adjustments in patients with AKI due to the lack of practical alternatives.⁴⁷ Non-steady-state equations, such as the Jelliffe and kinetic GFR equations, provide a more accurate assessment of kidney function in patients with AKI.⁴⁸,⁴⁹ These equations require 2 SCr measurements obtained at different times, which creates an opportunity to incorporate the SCr predicted by our model into these equations and thereby estimate GFR more accurately.⁵⁰ Regardless of the clinician’s choice of equation, a reliable prediction of SCr is required to estimate kidney function accurately in patients who are not at steady state.
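To make this idea concrete, the kinetic GFR equation in its commonly published form can take a predicted next-day SCr as its second value. In this sketch, the function name, the use of a baseline CrCl as the steady-state clearance, and the maximum possible daily SCr rise (`max_rise_per_day`, which must be estimated for each patient) are assumptions for illustration, not part of Nephrocast.

```python
def kinetic_gfr(baseline_scr: float, baseline_crcl: float,
                scr_1: float, scr_2: float, hours: float,
                max_rise_per_day: float) -> float:
    """Kinetic GFR (mL/min) from two SCr values taken `hours` apart.

    KeGFR = (baseline SCr x baseline CrCl / mean SCr)
            x (1 - 24 x change in SCr / (hours x max daily SCr rise))
    scr_2 may be a predicted next-day value (hypothetical use of a model output).
    """
    mean_scr = (scr_1 + scr_2) / 2.0
    delta_scr = scr_2 - scr_1
    steady_state_part = baseline_scr * baseline_crcl / mean_scr
    kinetic_correction = 1.0 - (24.0 * delta_scr) / (hours * max_rise_per_day)
    return steady_state_part * kinetic_correction
```

For example, with a baseline SCr of 0.8 mg/dL and a baseline CrCl of 100 mL/min, a measured SCr of 1.0 mg/dL today, a predicted SCr of 1.4 mg/dL 24 h later, and an assumed maximum rise of 1.5 mg/dL/day, `kinetic_gfr(0.8, 100, 1.0, 1.4, 24, 1.5)` returns about 48.9 mL/min. When both SCr values equal the baseline, the correction term is 1 and the function returns the baseline clearance.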

Emerging evidence suggests that AKI classification based on early SCr trajectory could provide clinically meaningful insights into AKI risk stratification. Takkavatakarn et al recently identified 8 distinct SCr trajectories in critically ill patients with sepsis. These trajectories differed significantly in their risk of acute kidney disease (AKD), of AKD or mortality by day 7, and of AKD or mortality by hospital discharge.⁵¹ Similarly, Bhatraju et al evaluated SCr trajectories during the first 72 h of ICU stay in critically ill patients with AKI and demonstrated that patients with non-resolving AKI trajectories had a higher risk of mortality than those with resolving trajectories.⁵² By accurately predicting next-day SCr levels, our model could inform clinicians early about the SCr trajectory, enabling a patient-specific approach to AKI management.
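One way a predicted next-day SCr could feed a trajectory view is to stage both today's observed value and tomorrow's predicted value. The sketch below uses a simplified, SCr-only reading of the KDIGO criteria (urine output and RRT criteria omitted, the 7-day baseline window not enforced). The function names and the 0.3 mg/dL cutoff for labeling a trajectory "worsening" or "improving" are illustrative assumptions, not the trajectory definitions used by Takkavatakarn et al or Bhatraju et al.

```python
def kdigo_stage_from_scr(scr: float, baseline_scr: float, min_scr_prior_48h: float) -> int:
    """Simplified KDIGO AKI stage (0-3) from SCr criteria alone.

    Urine output and RRT criteria are ignored, and the 7-day window for the
    ratio-to-baseline criterion is not enforced.
    """
    ratio = scr / baseline_scr
    if ratio >= 3.0 or scr >= 4.0:  # acute-rise requirement for >= 4.0 is not checked
        return 3
    if ratio >= 2.0:
        return 2
    if ratio >= 1.5 or scr - min_scr_prior_48h >= 0.3:
        return 1
    return 0


def next_day_outlook(scr_yesterday, scr_today, scr_predicted, baseline_scr, delta=0.3):
    """Return (direction, stage today, stage implied by the predicted next-day SCr).

    delta (mg/dL) is a hypothetical cutoff for calling the trajectory
    'worsening' or 'improving'; it is not a published trajectory definition.
    """
    # With daily values only, yesterday stands in for today's 48-h window (a simplification).
    stage_today = kdigo_stage_from_scr(scr_today, baseline_scr, scr_yesterday)
    # For tomorrow, the two preceding daily values fall within the 48-h window.
    stage_next = kdigo_stage_from_scr(scr_predicted, baseline_scr, min(scr_yesterday, scr_today))
    change = scr_predicted - scr_today
    if change >= delta:
        direction = "worsening"
    elif change <= -delta:
        direction = "improving"
    else:
        direction = "stable"
    return direction, stage_today, stage_next


if __name__ == "__main__":
    print(next_day_outlook(1.2, 1.8, 2.3, baseline_scr=1.0))  # ('worsening', 1, 2)
    print(next_day_outlook(2.1, 2.2, 1.6, baseline_scr=1.0))  # ('improving', 2, 1)
```

Returning the stage pair alongside the direction keeps the raw information visible, so a clinician sees both where the patient is today and where the prediction would place them tomorrow.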

There is ongoing discussion about whether antimicrobial dose adjustments should be deferred for the first 24-48 h in hospitalized patients with acute infections and known AKI, especially for agents with a wide therapeutic index, such as β-lactams.⁵³,⁵⁴ Crass et al argued that because AKI resolves within 48-72 h in >50% of patients and SCr lags behind changes in GFR, dosage adjustments based on equations derived from patients with CKD or normal renal function could result in subtherapeutic antimicrobial concentrations and potentially a decreased clinical response.⁵⁵ Unlike other AKI models, which only predict AKI onset, our model predicts next-day SCr, thus offering insights into both the onset and the recovery of AKI and potentially informing clinicians’ decisions regarding dosage adjustments. For example, in patients with a recent onset of AKI, if our model predicts a significant improvement in kidney function over the next 24 h, indicated by a notable decline in the predicted SCr level, the clinician may consider not adjusting standard doses. Conversely, if the model predicts declining kidney function, indicated by an increase in the predicted SCr level, the clinician may consider intensified monitoring and dose adjustment if criteria are met.⁴¹ It is also worth noting that the decision to adjust a dose is drug-dependent. For drugs with a narrow therapeutic index, such as vancomycin or aminoglycosides, a conservative approach with more frequent monitoring may be required. In contrast, for drugs with a wide therapeutic index, such as cephalosporins, more aggressive dosing may be warranted in critically ill patients. These scenarios and considerations must be taken into account when developing a clinical protocol to incorporate the model’s predictions into clinical decision-making or when conducting prospective implementation studies.
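The dosing considerations above can be written down as a small rule table. The sketch is hypothetical throughout: the enum names, the 0.3 mg/dL cutoff for a "notable" predicted change, and the wording of each suggestion are illustrative assumptions rather than a protocol evaluated in this study, and the drug classes in the comments are only the examples named in the text.

```python
from enum import Enum


class TherapeuticIndex(Enum):
    NARROW = "narrow"  # e.g. vancomycin, aminoglycosides
    WIDE = "wide"      # e.g. cephalosporins


class Suggestion(Enum):
    INTENSIFY_MONITORING = "intensify monitoring; adjust dose if criteria are met"
    KEEP_STANDARD_DOSE = "consider not adjusting the standard dose"
    MONITOR_CONSERVATIVELY = "conservative approach with more frequent monitoring"
    USUAL_PRACTICE = "no trajectory-based change to usual practice"


def dose_review_suggestion(
    scr_today: float,
    scr_predicted: float,
    recent_aki_onset: bool,
    index: TherapeuticIndex,
    delta: float = 0.3,
) -> Suggestion:
    """Map a predicted next-day SCr change to a review prompt (hypothetical rules).

    delta (mg/dL) is an assumed cutoff for a 'notable' predicted change.
    """
    change = scr_predicted - scr_today
    if change >= delta:
        # Predicted decline in kidney function (rising SCr).
        return Suggestion.INTENSIFY_MONITORING
    if recent_aki_onset and change <= -delta:
        # Predicted recovery shortly after AKI onset.
        if index is TherapeuticIndex.NARROW:
            return Suggestion.MONITOR_CONSERVATIVELY
        return Suggestion.KEEP_STANDARD_DOSE
    return Suggestion.USUAL_PRACTICE


if __name__ == "__main__":
    print(dose_review_suggestion(1.8, 1.3, True, TherapeuticIndex.WIDE).value)
    print(dose_review_suggestion(1.8, 1.3, True, TherapeuticIndex.NARROW).value)
```

Keeping the rules as plain branches makes each threshold easy to audit and replace once an actual, drug-specific protocol has been agreed upon.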

The set of important predictors identified by our model aligns with previous research on AKI prediction. Song et al conducted a systematic review of AKI prediction models and showed that creatinine-related variables were the most common significant predictors across machine learning models; blood urea nitrogen and urine output were also important predictors, but to a lesser extent.⁵⁶ Similarly, in the work of Huang et al to predict next-day CrCl, variables such as “CrCl of the previous day” and “mean CrCl of all past days during ICU stay” were ranked as highly important.⁴¹ These findings are expected, given the strong correlation between repeated, longitudinal creatinine-based measurements in the same patient. Nonetheless, this dependence might limit the utility of these models in patients whose SCr measurements are infrequent or far apart. Gender ranked as a feature of high importance in our model, which could be attributed to gender-dependent differences in muscle mass and creatinine generation.⁵⁷ Additionally, epidemiological studies continue to confirm a higher incidence of AKI in men than in women.⁵⁸ The relationship between gender and AKI remains an interesting research topic that requires further elucidation. Our model also identified other clinical predictors indicative of systemic infection and organ dysfunction, underscoring the complex relationship between illness severity and SCr levels.⁵⁹

Several limitations should be acknowledged. First, it is well established that creatinine has several drawbacks as a biomarker for estimating kidney function in critically ill patients. Catabolic conditions may increase or decrease creatinine production.⁶⁰ Fluid resuscitation can increase clearance and dilute SCr, lowering the measured SCr concentration. Medications commonly prescribed in the ICU setting (eg, cefazolin, albumin, dopamine) can interfere with the creatinine assay, producing biased results. Because tubular secretion of creatinine increases as GFR declines, a change in SCr will not be observed until 50% of GFR has been lost.⁶⁰ Several biomarkers have been evaluated as potential alternatives or adjuncts to SCr, but their clinical adoption has been limited.⁶¹ Perhaps the most notable example is cystatin C (CysC), a low-molecular-weight protein that is produced at a constant rate by all nucleated cells, filtered at the glomerulus, and not reabsorbed. Unlike SCr, CysC is less affected by sex, muscle mass, nutritional status, and frailty. In a study conducted at Mayo Clinic, CysC utilization in the ICU increased from 4 tests/1000 patient-days in 2011 to 44 tests/1000 patient-days in 2018, and CysC was measured 6.4-fold more often in ICU patients than in non-ICU patients.⁶² While this increase in CysC utilization is considerable, it is far from comparable to SCr utilization. Additionally, the Food and Drug Administration guidance on pharmacokinetic studies in kidney disease recommends using SCr and contemporary steady-state equations to estimate kidney function for drug labeling purposes, and drug manufacturers have not yet incorporated CysC into their pharmacokinetic studies or labeled dose recommendations.
Given the slow uptake of routine CysC measurement in clinical settings and the lack of CysC-based drug dosing guidance, SCr will continue to serve as the standard biomarker for estimating kidney function and guiding drug dosing, emphasizing the need for machine learning models that can compensate for the shortcomings of SCr.

Second, although we included a prospective validation, our model has not been externally validated, which limits the generalizability of our findings and potentially the performance of our model if implemented in a different health system.⁶³,⁶⁴ Additionally, further work is required to investigate the integration of Nephrocast’s predictions into clinical practice and to develop a best practice advisory in the EHR system.⁶⁵,⁶⁶ Third, the exclusion of encounters with RRT on the prediction day or in the previous 7 days, encounters of patients with CKD stage 5 or ESKD, and encounters beyond 14 days from the ICU admission date might have introduced selection bias. While these decisions were made to ensure the reliability of our model, they further limit the generalizability of this work. Fourth, potential predictors of AKI that were not available through the data pipeline were not included in the model. Nephrotoxic drug count, drug concentration, and each drug’s nephrotoxic risk have been shown to be significant predictors in AKI models.⁶⁷–⁷¹ Similarly, undergoing major surgical procedures is known to increase the risk of AKI.⁷² Including these predictors could potentially improve the performance of Nephrocast. Lastly, previous clinical trials of EHR alerts did not demonstrate a statistically significant benefit for AKI-related outcomes, despite showing that these alerts were associated with discontinuation of nephrotoxic medications, increased fluid resuscitation, optimized hemodynamic parameters, and timely nephrologist consultations.⁶⁶,⁷³ Subsequent research should prioritize identifying patient subgroups that would benefit from these clinical interventions.⁷⁴

Conclusions

By leveraging clinical data routinely collected in the ICU, we developed a deep learning model to predict next-day SCr in critically ill adult patients. Our model demonstrated superior performance compared to the reference SCr, especially in cases of unstable kidney function. This capability holds promise in assisting clinicians in identifying high-risk patients for AKI, predicting AKI trajectory, and informing the dosing of renally eliminated drugs. Further work is needed to externally validate the model’s performance, explore its clinical applications, and integrate SCr prediction into clinical workflows and decision-making.

Author contributions

Ghodsieh Ghanbari, Jonathan Y. Lam, Supreeth P. Shashikumar, Atul Malhotra, Shamim Nemati, and Zaid Yousif conceptualized and designed the study. Ghodsieh Ghanbari and Jonathan Y. Lam performed data extraction and analysis. Ghodsieh Ghanbari, Jonathan Y. Lam, Supreeth P. Shashikumar, Linda Awdishu, Karandeep Singh, Atul Malhotra, Shamim Nemati, and Zaid Yousif were involved in data interpretation and writing the manuscript. All authors read and approved the final manuscript.

Supplementary material

Supplementary material is available at JAMIA Open online.

Funding

G.G. and J.Y.L. acknowledge grant funding from the United States National Library of Medicine (award number T15LM011271). S.N. acknowledges grant funding from the United States National Library of Medicine (award number R01LM013998) and the United States National Institute of General Medical Sciences (award number R35GM143121).

Conflicts of interest

S.N., S.P.S., and A.M. are cofounders of, advisors to, and equity holders in Healcisio Inc, a predictive analytics startup. The terms of this arrangement have been reviewed and approved by UC San Diego in accordance with its conflict-of-interest policies. A.M. is funded by the National Institutes of Health. A.M. reports income related to medical education from Livanova, Eli Lilly, Zoll, and Powell Mansfield. ResMed provided a philanthropic donation to UCSD. K.S.’s institution receives grant funding from the National Institute of Diabetes and Digestive and Kidney Diseases; K.S.’s institution previously received grant funding from Teva Pharmaceuticals; K.S. previously consulted for Flatiron Health. L.A. serves as a clinical consultant for MediBeacon. Z.Y. serves as a clinical consultant for Takeda Pharmaceuticals.

Data availability

The code supporting this research is available at https://github.com/NematiLab/Serum_Creatinine_Prediction. No public repository exists for the study data because they contain protected health information. Further inquiries can be directed to the corresponding author.

References

Andonovic, Traynor, Shaw, Sim MAB, Mark, Puxty KA. Short- and long-term outcomes of intensive care patients with acute kidney disease. eClinicalMedicine. 2022;101291.
