AI-guided virtual biopsy: Automated differentiation of cerebral gliomas from other benign and malignant MRI findings using deep learning

Distribution of Age and Gender in the Cohort Across Various MRI Findings

MRI finding	Age (mean ± SD, years)	Gender (male/female, n)	Gender (male/female, %)
Metastatic lesions	59.28 ± 12.36	228/286	44.35/55.65
Inflammatory lesions	41.94 ± 14.57	142/224	42.26/57.74
Intracranial hemorrhages	62.68 ± 16.64	56/43	56.57/43.43
Meningioma	63.99 ± 13.31	25/58	30.12/69.88
Gliomas	54.76 ± 13.74	136/82	62.35/37.65


Table 1.


Magnetic Resonance Imaging

The MRI examinations were performed at a single center on 1.5 T (MAGNETOM Aera, MAGNETOM Avanto, MAGNETOM Espree, MAGNETOM Sonata, MAGNETOM Symphony) and 3 T (Biograph mMR, MAGNETOM Skyra, MAGNETOM Vida) MR scanners from a single vendor (Siemens Healthineers). The study period spans from March 2002 to May 2023. Three MR sequences were selected for the radiomics analysis: FLAIR, noncontrast T1-weighted, and contrast-enhanced T1-weighted.

Preprocessing

The first preprocessing step was to resample all 3 sequences (FLAIR, contrast-enhanced T1-weighted, and noncontrast-enhanced T1-weighted) to a uniform isotropic voxel size of 1 × 1 × 1 mm³. Resampling was performed with Advanced Normalization Tools in Python (ANTsPy),³² a Python package that wraps ANTs,³³ a C++ biomedical image processing library, and builds on the statistical capabilities of ANTsR.³⁴ ANTsPy integrates these tools with NumPy, scikit-learn, and the broader Python ecosystem.³²
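As a rough illustration of what resampling to 1 mm isotropic voxels implies for the image grid, the sketch below computes the number of voxels per axis after resampling. The input dimensions and voxel spacing are hypothetical values, not taken from the study, and the rounding convention is an assumption; ANTsPy may handle grid edges differently.

```python
def resampled_shape(shape, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Voxels per axis after resampling from `spacing` to `new_spacing` (both in mm).

    The physical extent of each axis (voxels * spacing) is kept approximately
    constant; rounding to the nearest integer is an assumption of this sketch.
    """
    return tuple(
        max(1, round(n * s / t)) for n, s, t in zip(shape, spacing, new_spacing)
    )


# Hypothetical FLAIR volume: 256 x 256 in-plane voxels of 0.9 mm, 30 slices of 5 mm.
original_shape = (256, 256, 30)
original_spacing = (0.9, 0.9, 5.0)

new_shape = resampled_shape(original_shape, original_spacing)
print(new_shape)
```

Thick-slice acquisitions gain many interpolated slices under this scheme, which is worth keeping in mind when texture features are later computed along the slice direction.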

To anonymize the data and remove extracranial structures, skull stripping was performed with HD-BET,³⁵ a publicly available brain extraction algorithm with state-of-the-art performance. To bring all sequences into the same space, the FLAIR and contrast-enhanced T1-weighted sequences were then coregistered to the noncontrast-enhanced T1-weighted images using ANTsPy’s registration module³⁶ with a rigid transformation, specifically a translation.

The coregistered images were then used to generate automatic tumor segmentations (Figure 1) with HD-GLIO,²⁴,³⁷ an open-source algorithm based on the nnU-Net architecture.³⁸ HD-GLIO was trained on FLAIR, contrast-enhanced T1-weighted, noncontrast-enhanced T1-weighted, and T2-weighted sequences. Because the study cohort lacked T2-weighted sequences, FLAIR was used as a surrogate for segmentation, an approach that Haubold et al. validated against manual segmentations of cerebral gliomas.³¹

Figure 1.

Examples of fully automated segmentations and their coregistration with the respective FLAIR, noncontrast T1-weighted sequence, and contrast-enhanced T1-weighted sequence.

Feature Extraction

After segmentation, PyRadiomics³⁹,⁴⁰ was used to derive radiomic features from the segmented regions. The extracted feature set comprised first-order statistics, shape-based geometric features, and texture features derived from the Gray Level Co-Occurrence Matrix (GLCM), Gray Level Run Length Matrix, Gray Level Size Zone Matrix, Neighboring Gray Tone Difference Matrix, and Gray Level Dependence Matrix. Features were also extracted from filtered versions of the images, namely the Wavelet, Laplacian of Gaussian (LoG), Local Binary Pattern 3D (LBP3D), and Gradient transformations.
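To make one of these texture families concrete, the following minimal sketch builds a GLCM for a tiny 2D image and derives the contrast feature from it. It is a simplification for illustration only: it uses a single horizontal offset on a hypothetical 3 × 3 image with 3 gray levels, whereas PyRadiomics operates on 3D regions, aggregates several directions, and discretizes intensities first.

```python
from collections import Counter


def glcm(image, offset=(0, 1), symmetric=True):
    """Normalized gray level co-occurrence matrix for one 2D offset.

    Returns a dict mapping (gray_level_i, gray_level_j) to the probability
    that a voxel with level i has a neighbor with level j at `offset`.
    """
    d_row, d_col = offset
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            rr, cc = r + d_row, c + d_col
            if 0 <= rr < rows and 0 <= cc < cols:
                i, j = image[r][c], image[rr][cc]
                counts[(i, j)] += 1
                if symmetric:
                    # Count the pair in the opposite direction as well.
                    counts[(j, i)] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def glcm_contrast(p):
    """Contrast: co-occurrence probabilities weighted by squared level difference."""
    return sum(prob * (i - j) ** 2 for (i, j), prob in p.items())


# Hypothetical image with gray levels 0, 1, and 2.
image = [
    [0, 0, 1],
    [0, 1, 1],
    [2, 2, 1],
]
p = glcm(image)
contrast = glcm_contrast(p)
print(contrast)
```

Higher contrast indicates that neighboring voxels tend to differ strongly in intensity, which is the kind of local heterogeneity such features are meant to capture.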

Train Test Split

The MR examinations of the 1280 patients were assigned to subgroups according to the underlying diagnosis, as described above. Discrete subcohorts were then established for model training and evaluation. Every subcohort was split into 80% training and 20% test data, and the splits were stratified so that the ratio of positive to negative cases was preserved in both subsets.
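A stratified 80/20 split of this kind can be sketched as follows. This is a minimal standard-library illustration, not the authors' code: the seed, the per-class rounding, and the function name `stratified_split` are assumptions, so the resulting counts can differ by a case from those reported in Table 2.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Split indices into train and test sets, preserving the class ratio."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Each class contributes the same fraction of its cases to the test set.
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# 218 gliomas (label 1) and 1062 other findings (label 0), as in the cohort.
labels = [1] * 218 + [0] * 1062
train, test = stratified_split(labels)

n_pos_train = sum(labels[i] for i in train)
n_pos_test = sum(labels[i] for i in test)
print(n_pos_train, len(train) - n_pos_train, n_pos_test, len(test) - n_pos_test)
```

Splitting within each class, rather than over the pooled cohort, is what keeps the glioma fraction nearly identical in the training and test sets.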

This study delineated 5 discrete subcohorts. The distribution of both positive and negative cases within each of these subcohorts is detailed in Table 2 for reference.

Table 2.

Distribution of Positive and Negative Cases in Different Subcohorts.

Cohort description	Train (positive/negative)	Test (positive/negative)
Gliomas vs. all other entities	174/849	44/213
Gliomas vs. metastatic lesions	174/411	44/103
Gliomas vs. meningioma	174/66	44/17
Gliomas vs. intracranial bleeding	174/79	44/20
Gliomas vs. inflammatory lesions	174/293	44/73


Feature Selection

To mitigate noise from redundant or closely correlated features, BorutaPy,⁴¹ a Python implementation of the Boruta algorithm,⁴² was used for feature selection. As an all-relevant feature selection method, it aims to identify every feature that carries information about the outcome rather than a minimal subset. Ensemble methods based on decision trees, such as Random Forest, Gradient Boosted Trees, and Extra Trees classifiers, are well suited to capturing complex, nonlinear relationships between features, especially when the number of samples is small relative to the number of features (the “small n, large p” setting).⁴¹ The XGBoost algorithm⁴³ was used as the estimator within BorutaPy.

Parameter Optimization and Model Evaluation

The XGBoost hyperparameters were tuned with the Tree-structured Parzen Estimator sampler of the Optuna framework.⁴⁴,⁴⁵ Each optimization run comprised 100 trials, in which hyperparameters were sampled from a predefined search space. Within each trial, a 10-fold cross-validation was performed, and the optimization aimed to maximize the F1 score on the held-out folds. Every subcohort underwent the same optimization procedure.
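The procedure described above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: the search ranges, the seed, and the function name `tune_xgboost` are hypothetical, since the source does not report the search space. The third-party libraries are imported inside the function so that the sketch can be read and loaded without them installed.

```python
def tune_xgboost(X, y, n_trials=100, seed=42):
    """Tune XGBoost with Optuna's TPE sampler, maximizing mean 10-fold F1.

    X and y are the training features and binary labels. The search ranges
    below are assumptions made for illustration only.
    """
    import optuna
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from xgboost import XGBClassifier

    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)

    def objective(trial):
        params = {
            "n_estimators": 100,
            "max_depth": trial.suggest_int("max_depth", 2, 8),
            "learning_rate": trial.suggest_float("eta", 0.01, 1.0, log=True),
            "gamma": trial.suggest_float("gamma", 0.0, 1.0),
            "reg_lambda": trial.suggest_float("lambda", 1e-3, 1.0, log=True),
            "reg_alpha": trial.suggest_float("alpha", 0.0, 1.0),
            "objective": "binary:logistic",
            "random_state": seed,
        }
        model = XGBClassifier(**params)
        # Mean F1 score over the 10 held-out folds of the training data.
        return cross_val_score(model, X, y, cv=cv, scoring="f1").mean()

    study = optuna.create_study(
        direction="maximize",
        sampler=optuna.samplers.TPESampler(seed=seed),
    )
    study.optimize(objective, n_trials=n_trials)
    return study.best_params, study.best_value
```

Calling `tune_xgboost(X_train, y_train)` only on the training portion of a subcohort keeps the test set untouched during tuning.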

During training, the models were optimized for the F1 score, the harmonic mean of precision and recall, which was chosen to balance these 2 components of performance. When selecting the final model for each classification task, we prioritized the area under the curve (AUC) of the receiver-operating characteristic (ROC) curve. The AUC was chosen because it summarizes the model’s discriminative power across all decision thresholds, making it a reliable indicator of final model performance.
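The two metrics can be written out in a few lines. The sketch below computes the F1 score from precision and recall and the AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The precision and recall inputs are the rounded values reported in Table 4 for glioma vs. all other pathologies; the labels and scores used for the AUC are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def roc_auc(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Rounded precision and recall from Table 4 (glioma vs. all other pathologies).
f1_all = f1_score(0.75, 0.68)
print(round(f1_all, 2))

# Hypothetical predicted probabilities for 3 gliomas and 4 other findings.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(auc)
```

The F1 score depends on the chosen decision threshold, whereas the AUC depends only on how the scores rank the cases, which is why the two can lead to different model choices.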

Table 3 presents the optimized parameters specific to each subcohort. To mitigate the risk of data leakage, hyperparameter tuning was exclusively conducted on the training dataset.

Table 3.

Optimal Hyperparameters for Each Subcohort Selected Through Optuna

Parameter	Glioma vs. all other Pathologies	Glioma vs. metastasis	Glioma vs. inflammatory lesions	Glioma vs. intracerebral hemorrhage	Glioma vs. meningioma
booster	gbtree	gbtree	gbtree	dart	gbtree
grow_policy	depthwise	lossguide	depthwise	lossguide	depthwise
n_estimators	100	100	100	100	100
scale_pos_weight	4.885057	2.362069	1.683908	0.454023	0.37931
gamma	0	0.319064	0	0.001264	0
max_depth	6	3	6	5	6
lambda	1	0.043813	1	0.953963	1
alpha	0	0.398513	0	0.708249	0
eta	0.3	0.119043	0.3	0.691486	0.3
sample_type	uniform	uniform	uniform	uniform	uniform
normalize_type	tree	tree	tree	forest	tree
rate_drop	0	0	0	0.312374	0
skip_drop	0	0	0	0.988103	0


All models were trained with the objective function binary:logistic and a random_state of 42.
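The scale_pos_weight values in Table 3 are close to the ratio of negative to positive training cases in Table 2, which is the typical value suggested in the XGBoost documentation for imbalanced data. The sketch below recomputes that ratio from the Table 2 training counts; that the authors derived the values this way is an assumption, not something the source states.

```python
# (positive, negative) training cases per subcohort, taken from Table 2.
train_counts = {
    "all other entities": (174, 849),
    "metastatic lesions": (174, 411),
    "inflammatory lesions": (174, 293),
    "intracranial bleeding": (174, 79),
    "meningioma": (174, 66),
}


def scale_pos_weight(n_pos, n_neg):
    """Ratio of negative to positive cases, used to reweight the positive class."""
    return n_neg / n_pos


weights = {
    name: scale_pos_weight(n_pos, n_neg)
    for name, (n_pos, n_neg) in train_counts.items()
}
for name, weight in weights.items():
    print(f"{name}: {weight:.6f}")
```

Four of the five ratios reproduce the Table 3 values to six decimals. For gliomas vs. all other entities the ratio is 4.879310, whereas the tabulated 4.885057 would correspond to 850 negative training cases.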

For feature selection, BorutaPy provides a parameter named “perc,” which sets the percentile of the shadow-feature importances used as the threshold that a real feature must exceed, and thereby controls how many features are selected. Lower values of “perc” admit more false positives as relevant features but leave out fewer genuinely relevant ones. To identify the most advantageous feature subset, various values of this parameter were systematically tested. Following feature selection, the XGBoost hyperparameters were optimized for each distinct feature set. An individual XGBoost model was then trained for each subcohort using the tuned hyperparameters and the selected features. Among the resulting models, the one that attained the highest AUC on the test set was designated as the definitive model.
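The effect of “perc” can be illustrated with a single, simplified comparison between real and shadow features. In the sketch below, all importance values and feature names are hypothetical, and only one iteration is shown; the actual Boruta procedure repeats this comparison over many iterations with reshuffled shadow features and applies a statistical test to the accumulated hits.

```python
def percentile(values, perc):
    """Percentile with linear interpolation between order statistics."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    pos = (len(ordered) - 1) * perc / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def hits(real_importances, shadow_importances, perc):
    """Real features whose importance exceeds the shadow threshold."""
    threshold = percentile(shadow_importances, perc)
    return [name for name, imp in real_importances.items() if imp > threshold]


# Hypothetical importances from one model fit on real plus shuffled features.
shadow = [0.01, 0.02, 0.03, 0.05, 0.09]
real = {"feature_a": 0.12, "feature_b": 0.06, "feature_c": 0.04, "feature_d": 0.01}

strict = hits(real, shadow, perc=100)   # threshold is the maximum shadow importance
lenient = hits(real, shadow, perc=50)   # threshold is the median shadow importance
print(strict, lenient)
```

With perc=100 only the feature that beats the strongest shadow feature is a hit, while perc=50 lowers the bar to the median shadow importance and lets weaker features through.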

Baseline Evaluation With a Dummy Classifier

To benchmark the performance of our models, we employed a dummy classifier as a baseline. This classifier provides a reference point by generating predictions without utilizing any learned patterns from the data. Specifically, it operates using a “stratified” strategy, which accounts for class imbalance by generating predictions proportional to the class distribution within the dataset. This ensures that the predictions reflect the inherent imbalance in the data, rather than assuming uniform class probabilities.

The dummy classifier’s performance was evaluated using the same metrics as the primary models, including the AUC.
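What such a baseline is expected to achieve can be worked out directly. The sketch below assumes a classifier that labels each case as positive at random with a probability equal to the positive-class fraction of the training data, as scikit-learn's DummyClassifier(strategy="stratified") does; the source does not name the implementation, so this correspondence is an assumption. The counts are those of the gliomas vs. all other entities subcohort in Table 2.

```python
def stratified_dummy_expectation(train_pos, train_neg, test_pos, test_neg):
    """Expected metrics of a classifier that guesses in proportion to class frequency."""
    p = train_pos / (train_pos + train_neg)      # probability of predicting "positive"
    test_prev = test_pos / (test_pos + test_neg)  # positive fraction in the test set
    sensitivity = p          # a true positive is flagged with probability p
    specificity = 1 - p      # a true negative is cleared with probability 1 - p
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        # Guesses are independent of the truth, so the expected share of true
        # positives among flagged cases is approximately the test prevalence.
        "precision": test_prev,
        "accuracy": test_prev * p + (1 - test_prev) * (1 - p),
    }


expected = stratified_dummy_expectation(174, 849, 44, 213)
for name, value in expected.items():
    print(f"{name}: {value:.3f}")
```

Because the guesses carry no information about the true label, the expected balanced accuracy is 0.5 and the expected AUC is at chance level, which is the reference against which the trained models are compared.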

Human Reader Evaluation

To provide a benchmark for human performance, 2 experienced neuroradiologists, each with more than 10 years of professional experience, independently evaluated half of the test dataset. Their assessments were conducted under the same conditions as the neural network to ensure a fair comparison. They were provided with only the 3 MRI sequences—FLAIR, noncontrast-enhanced T1-weighted, and contrast-enhanced T1-weighted—without access to additional clinical information or other imaging sequences.

Preparation of the Manuscript

For linguistic assistance in composing the manuscript, ChatGPT (Version GPT-4.0), developed by OpenAI, was employed.

Results

Overall, the models performed well in discriminating between gliomas and the various subgroups of other intracranial pathologies. Various combinations of feature selection settings and hyperparameter optimization were used to build the models. The best model for each classification task was selected based on its performance during hyperparameter optimization with Optuna. These models were then evaluated on the held-out test set, which was used exclusively for the final assessment. The corresponding AUC, balanced accuracy, F1 score, precision, sensitivity, and specificity values are presented in Table 4.

Table 4.

Machine Learning Models, Number of Selected Features, and Performance Metrics (Area Under the Curve [AUC], Balanced Accuracy, F1 Score, Precision, Sensitivity, Specificity)

	Glioma vs. all other pathologies	Glioma vs. metastasis	Glioma vs. inflammatory lesions	Glioma vs. intracerebral hemorrhage	Glioma vs. meningioma
Base	XGB	RF	XGB	RF	XGB
AUC	0.94	0.96	1.0	0.99	0.98
Balanced accuracy	0.82	0.83	0.97	0.98	0.88
F1 score	0.71	0.77	0.96	0.98	0.96
Precision	0.75	0.82	0.94	1.0	0.92
Recall/sensitivity	0.68	0.73	0.98	0.96	1.0
Specificity	0.95	0.93	0.96	1.0	0.77
No of features	47	33	48	22	56


To provide a benchmark for model performance, the dummy classifier was evaluated on the same dataset. Its AUC values were consistently close to chance level, for example 0.45, 0.50, 0.51, 0.55, and 0.62.

In the context of distinguishing between gliomas and the other selected intracranial pathologies, consistently good results were achieved with AUC values >0.9.

The model differentiating gliomas from metastases achieved a high AUC of 0.96 (sensitivity 0.73, specificity 0.93). It was built on 33 features retained by feature selection. The ROC curve for these predictions is shown in Figure 2A.

Figure 2.

Receiver-operating characteristic (ROC) curves for predictive models discriminating gliomas from metastases (A), inflammatory lesions (B), intracerebral hemorrhages (C), and meningiomas (D).

The model differentiating gliomas from inflammatory lesions achieved an AUC of 1.0 (sensitivity 0.98, specificity 0.96) and is based on 48 selected features. Figure 2B shows the ROC curve for this model.

The model distinguishing gliomas from intracerebral hemorrhages achieved an AUC of 0.99 (sensitivity 0.96, specificity 1.0). For this model, 22 features were selected. The ROC curve is shown in Figure 2C.

The model differentiating gliomas from meningiomas obtained an AUC of 0.98 (sensitivity 1.0, specificity 0.77) using a total of 56 selected features. Figure 2D presents the ROC curve for this model.

For distinguishing gliomas from a combined group of intracranial pathologies (metastases, inflammatory lesions, intracerebral hemorrhages, and meningiomas), the model achieved an AUC of 0.94 (sensitivity 0.68, specificity 0.95) using a set of 30 features determined through feature selection. Figure 3 shows the ROC curve.
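To show how the threshold-dependent metrics for the combined task relate to one another, the sketch below derives them from a confusion matrix. The counts are hypothetical: the source does not report a confusion matrix, and these values were chosen only because they sum to the test set sizes in Table 2 (44 gliomas, 213 other findings) and reproduce the rounded metrics in Table 4. Other counts may be equally consistent.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard binary classification metrics from confusion matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": 2 * tp / (2 * tp + fp + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical counts: 30 of 44 gliomas detected, 10 of 213 other findings flagged.
m = classification_metrics(tp=30, fn=14, fp=10, tn=203)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

Under these assumed counts, about 10 non-glioma cases would pass the filter as gliomas while 14 gliomas would be held back, which illustrates the trade-off between specificity and sensitivity.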

Figure 3.

Receiver-operating characteristic (ROC) curve for the predictive model discriminating gliomas from all other pathologies (metastases, inflammatory lesions, intracerebral hemorrhages, and meningiomas).


Additionally, 2 neuroradiologists with more than 10 years of professional experience achieved excellent results in distinguishing gliomas from the combined group of intracranial pathologies. Their performance included a sensitivity of 0.77 with a specificity of 0.99, and a sensitivity of 0.91 with a specificity of 0.97, respectively.

Discussion

Our study focuses on differentiating gliomas from other common intracranial pathologies by analyzing radiomic features from routine cranial MRI scans in a fully automated pipeline. It therefore introduces an important safety net for algorithms with a focus on virtual biopsy of cerebral gliomas.

Our models demonstrated excellent performance in distinguishing gliomas from other intracranial pathologies. All pairwise models achieved an AUC of at least 0.96 (gliomas vs. metastases [AUC 0.96], vs. inflammatory lesions [AUC 1.0], vs. intracerebral hemorrhages [AUC 0.99], vs. meningiomas [AUC 0.98]). These results align with the existing literature, in which previous studies have consistently demonstrated that various entities can be differentiated effectively using MRI.⁴⁶⁻⁵¹ For instance, Tsolaki et al. achieved high classification performance in the automatic differentiation of glioblastomas and metastases based on 3T MR spectroscopy and perfusion data.⁴⁸ As early as 2009, initial studies involving perfusion maps and manual region of interest measurements showed that various cerebral pathologies can be differentiated based on image features. The study by Zacharaki et al. demonstrated the feasibility of distinguishing between different types of intracranial tumors. They successfully differentiated between metastases, meningiomas, gliomas, and glioblastomas in a relatively small cohort of 102 patients, achieving a model sensitivity range of 85%–87%. Although our models exhibit higher sensitivity and are based on a more extensive dataset, this early work by Zacharaki et al. highlighted the potential of machine learning for such differentiation tasks. Despite this potential, the low accuracy, the need for manual segmentation, and the complexity of imaging protocols have, to this day, hindered the integration of these approaches into clinical practice.⁴⁶

In addition to comparing gliomas with individual intracranial pathologies, we also trained a prediction model to differentiate between gliomas and a combined group of intracranial pathologies, including metastases, inflammatory lesions, intracerebral hemorrhages, and meningiomas. While the individual models were highly accurate in distinguishing specific pathologies (eg, gliomas vs. metastases, gliomas vs. meningiomas), their use requires prior knowledge of the type of lesion being analyzed, which conflicts with the goal of a fully automated, biopsy-independent diagnostic process. In the combined scenario, our model achieved an AUC of 0.94. The combined model is more closely aligned with real-world clinical circumstances, where the underlying pathology is frequently unknown before further workup, making it a more practical approach for noninvasive diagnosis. However, the AUC in this setting was slightly lower than the AUCs for distinguishing gliomas from individual intracranial pathologies. This could be because the group with which gliomas are compared is very heterogeneous, which makes the differentiation more difficult. For an adequate safety net, however, it is crucial to differentiate gliomas from the most common pathologies as a whole and not from a single pathology.

The results of our study, alongside prior research, highlight the performance of both humans and AI in this context. Rauschecker et al. (2020) reported that an AI system for MRI-based diagnosis achieved an accuracy of 91%, comparable to the 86% achieved by academic neuroradiologists, while significantly outperforming less specialized radiologists (radiology residents 56%, general radiologists 57%). In our study, the 2 experienced neuroradiologists demonstrated sensitivities of 77% and 91% in distinguishing gliomas from the combined group of other pathologies. By comparison, our algorithm achieved a sensitivity of 68% for the same task. However, specificity is of particular importance in our study, as the primary goal was to develop an algorithm capable of preventing non-glioma intracranial pathologies from being incorrectly routed for virtual biopsy evaluation. In this regard, our algorithm showed promising results, achieving a specificity of 95%, which was comparable to the neuroradiologists’ specificities of 99% and 97%. These findings underscore the algorithm’s reliability in minimizing false positives and its potential as a safety control mechanism in clinical practice.

Studies on the differentiation of cerebral lesions often require complex MRI protocols, such as perfusion imaging.⁴⁶⁻⁴⁸ Additionally, many studies were conducted on specific MRI machines or limited to particular field strengths,⁴⁶⁻⁴⁸,⁵¹ constraints that can yield promising results but raise concerns about the generalizability of the approach. To circumvent these limitations, our objective was to adopt an approach with broad generalizability. For this purpose, we employed the prediction model by Haubold et al.,³¹ which makes predictions based on 3 MRI sequences that are nearly universally acquired in brain imaging: FLAIR, noncontrast-enhanced T1-weighted, and contrast-enhanced T1-weighted sequences. Furthermore, akin to Haubold et al.,³¹ we used a diverse set of MRI scanners operating at 1.5 and 3 Tesla to differentiate gliomas from other intracranial pathologies.

Another limitation of other studies aiming to distinguish intracranial pathologies is their reliance on manual or semi-automated segmentation, which may introduce biases due to human influence and hinders clinical implementation because manual segmentation is laborious. To mitigate these potential biases, we employed automated tumor segmentation using HD-GLIO.²⁴,³⁷ HD-GLIO is an algorithm based on the nnU-Net architecture³⁸ and was trained on FLAIR, contrast-enhanced T1-weighted, noncontrast-enhanced T1-weighted, and T2-weighted sequences. Notably, Haubold et al. have previously demonstrated that, for the segmentation of cerebral gliomas, the network achieves high segmentation quality without a T2-weighted sequence (Dice score of 0.81 ± 0.13).³¹ However, we have not explicitly shown in the present study that other pathologies are well segmented by this network. The reason is that this study was intended to place a control functionality in front of the virtual biopsy of cerebral gliomas, so that other pathologies are not incorrectly assigned to genetic profiles of cerebral gliomas. If separate dedicated segmentation networks were used, a pooled comparison that closely matches this functionality would not be possible. The use of separate segmentation networks would also contradict the starting assumption that the pathology is unknown.

Overall, our study successfully achieved its primary objective of developing and evaluating a noninvasive AI-based model for distinguishing gliomas from other prevalent intracranial pathologies. The consistently high AUC values attained by the models for differentiation between gliomas and other common intracranial pathologies underscore the fulfillment of our primary research goal. The inclusion of a diverse and extensive dataset ensures that our findings possess a high degree of generalizability, rendering them relevant for a broad clinical context. The utilization of universally applicable MRI sequences and the incorporation of automated tumor segmentation to mitigate human-induced biases collectively enhance the study’s contributions.

Nevertheless, despite the promising results, our study has limitations. First, it was retrospective and conducted at a single center. Further validation of these findings should involve a prospective multicenter study. Second, while several key differential diagnoses for gliomas were examined in this work, there are other intracranial pathologies for which differentiation models should be developed in future research. However, the pathologies chosen for this study are among the most common, so the omission of rarer pathologies is, given their lower incidence, expected to result in a relatively small number of misdiagnoses.

Although MRIs from different 1.5 and 3 Tesla MRI scanners were included in our study for generalizability, it is important to point out that these scanners were all from a single manufacturer, which could potentially bias the results. However, it is important to emphasize that our study included a very large cohort of 1280 patients, which included a variety of MRI protocols. The size of the patient group and the variety of MRI techniques employed add to the robustness of our findings.

Conclusions

In summary, our study demonstrated a versatile solution for a noninvasive fully automated AI-based differentiation of cerebral gliomas from other intracranial pathologies. It shows a possible approach for the introduction of control functionalities in the analysis of the genetic profile of cerebral gliomas. The introduction of such control functionalities is an important next step before the clinical implementation of a virtual biopsy of cerebral gliomas.

Funding

M.H. received financial support from the Clinician Scientist Program of the University Medicine Essen Clinician Scientist Academy (UMEA), which is funded by the German Research Foundation (DFG) (FU 356/12-2). The DFG did not have any influence on the study design, data collection, data interpretation, data analysis, or report writing.

Acknowledgments

We acknowledge support by the Open Access Publication Fund of the University of Duisburg-Essen.

Conflict of interest statement

The authors declare no potential conflicts of interest.

Authorship statement

Designing the experiments: M.H., J.H. Implementation and writing the manuscript: M.H., V.P., R.H., J.H. Reviewing and correcting the manuscript: M.H., L.S., H.S., Y.L., M.O., C.D., M.F., F.N., L.U., J.H., V.P., R.H., M.G., N.G., K.W.

Data availability

The data supporting the findings of this study will be made available upon reasonable request.

References

1. Weller, Wick, Aldape, et al. Glioma. Nat Rev Dis Primers. 2015;15017.
2. Louis, Perry, Reifenberger, et al. The 2016 World Health Organization Classification of Tumors of the Central Nervous System: a summary. Acta Neuropathol. 2016;131:803–820.
3. Louis, Perry, Wesseling, et al. The 2021 WHO Classification of Tumors of the Central Nervous System: a summary. Neuro Oncol. 2021:1231–1251.
4. Riche, Marijon, Amelot, et al. Severity, timeline, and management of complications after stereotactic brain biopsy. J Neurosurg. 2022;136:867–876.
5. Qin, Huang, Dong, et al. Stereotactic biopsy for lesions in brainstem and deep brain: a single-center experience of 72 cases. Braz J Med Biol Res. 2021:e11335.
6. Cheng, Zhao, et al. Complications of stereotactic biopsy of lesions in the sellar region, pineal gland, and brainstem: a retrospective, single-center study. Medicine (Baltim). 2020:e18572.
7. Chen, Hsu, Erich Wu, et al. Stereotactic brain biopsy: single center retrospective analysis of complications. Clin Neurol Neurosurg. 2009;111:835–839.
8. Malone, Yang, Hershman, et al. Complications following stereotactic needle biopsy of intracranial tumors. World Neurosurg. 2015:1084–1089.
9. Riche, Amelot, Peyre, et al. Complications after frame-based stereotactic brain biopsy: a systematic review. Neurosurg Rev. 2021:301–307.
10. Haubold, Demircioglu, Gratz, et al. Non-invasive tumor decoding and phenotyping of cerebral gliomas utilizing multiparametric 18F-FET PET-MRI and MR Fingerprinting. Eur J Nucl Med Mol Imaging. 2020:1435–1445.
11. Gutta, Acharya, Shiroishi, Hwang, Nayak KS. Improved glioma grading using deep convolutional neural networks. AJNR Am J Neuroradiol. 2021:233–239.
12. Xie, Chen, Fang, et al. Textural features of dynamic contrast-enhanced MRI derived model-free and model-based parameter maps in glioma grading. J Magn Reson Imaging. 2018:1099–1111.
13. Skogen, Schulz, Dormagen, et al. Diagnostic performance of texture analysis on MRI in grading cerebral gliomas. Eur J Radiol. 2016:824–829.
14. Tian, Yan, Zhang, et al. Radiomics strategy for glioma grading using texture features from multiparametric MRI. J Magn Reson Imaging. 2018:1518–1528.
15. Cluceru, Interian, Phillips, et al. Improving the noninvasive classification of glioma genetic subtype with deep learning and diffusion-weighted imaging. Neuro-Oncology. 2022:639–652.
16. Cho, Lee, Kim, Park. Classification of the glioma grading using radiomics analysis. PeerJ. 2018:e5982.
17. Lohmann, Galldiks, Kocher, et al. Radiomics in neuro-oncology: basics, workflow, and applications. Methods. 2021;188:112–121.
18. Kocher, Ruge, Galldiks, Lohmann. Applications of radiomics and machine learning for radiotherapy of malignant brain tumors. Strahlenther Onkol. 2020;196:856–867.
19. Bera, Braman, Gupta, Velcheti, Madabhushi. Predicting cancer outcomes with radiomics and artificial intelligence in radiology. Nat Rev Clin Oncol. 2022:132–146.
20. Kickingereder, Burth, Wick, et al. Radiomic profiling of glioblastoma: identifying an imaging predictor of patient survival with improved performance over established clinical and radiologic risk models. Radiology. 2016;280:880–889.
21. Parmar, Haubold, Salhöfer, et al. Fully automated MR-based virtual biopsy of primary CNS lymphomas. Neurooncol Adv. 2024:vdae022.
https://danielhomola.com/feature%20selection/phd/borutapy-an-all-relevant-feature-selection-method/
22. Beig, Bera, Prasanna, et al. Radiogenomic-based survival risk stratification of tumor habitat on Gd-T1w MRI is associated with biological processes in glioblastoma. Clin Cancer Res. 2020:1866–1876.
23. Lao, Chen, et al. A deep learning-based radiomics model for prediction of survival in glioblastoma multiforme. Sci Rep. 2017:10353.
24. Kickingereder, Isensee, Tursunova, et al. Automated quantitative tumour response assessment of MRI in neuro-oncology with artificial neural networks: a multicentre, retrospective study

Lancet Oncol.

2019

;

(

728

–

740

25.

Liu

Liang

, et al.

Radiomics can differentiate high-grade glioma from brain metastasis: a systematic review and meta-analysis

Eur Radiol.

2022

;

(

8039

–

8051

26.

Liu

Qian

, et al.

Genotype prediction of ATRX mutation in lower-grade gliomas using an MRI radiomics signature

Eur Radiol.

2018

;

(

2960

–

2968

27.

Regnard

Lanseur

Ventre

, et al.

Assessment of performances of a deep learning algorithm for the detection of limbs and pelvic fractures, dislocations, focal bone lesions, and elbow effusions on trauma X-rays

Eur J Radiol.

2022

;

154

110447

28.

Shofty

Artzi

Ben Bashat

, et al.

MRI radiomics analysis of molecular alterations in low-grade gliomas

Int J Comput Assist Radiol Surg.

2018

;

(

563

–

571

29.

Yogananda

CGB

Shah

Nalawade

, et al.

MRI-based deep-learning method for determining glioma MGMT promoter methylation status

AJNR Am J Neuroradiol.

2021

;

(

845

–

852

30.

Akbari

Bakas

Pisapia

, et al.

In vivo evaluation of EGFRvIII mutation in primary glioblastoma patients via complex multiparametric MRI signature

Neuro Oncol

2018

;

(

1068

–

1079

31.

Haubold

Hosch

Parmar

, et al.

Fully automated MR based virtual biopsy of cerebral gliomas

Cancers (Basel)

2021

;

(

6186

32.

Advanced Normalization Tools in Python [Internet].

Advanced Normalization Tools Ecosystem

;

2021

[cited 2021 Nov 16]. https://github.com/ANTsX/ANTsPy

33.

Advanced Normalization Tools [Internet].

Advanced Normalization Tools Ecosystem

;

2021

[cited 2021 Nov 16]. https://github.com/ANTsX/ANTs

34.

ANTsR [Internet].

Advanced Normalization Tools Ecosystem

;

2021

[cited 2021 Nov 16]. https://github.com/ANTsX/ANTsR

35.

Isensee

Schell

Pflueger

, et al.

Automated brain extraction of multisequence MRI using artificial neural networks

Hum Brain Mapp.

2019

;

(

4952

–

4964

36.

Registration—ANTsPy master documentation [Internet]. [cited 2021 Nov 16]. https://antspyx.readthedocs.io/en/latest/registration.html

37.

Isensee

Jäger

Kohl

SAA

Petersen

Maier-Hein

KH.

Automated design of deep learning methods for biomedical image segmentation

Nat Methods.

2021

;

(

203

–

211

38.

Isensee

Jaeger

Kohl

SAA

Petersen

Maier-Hein

KH.

nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation

Nat Methods.

2021

;

(

203

–

211

39.

van Griethuysen

JJM

Fedorov

Parmar

, et al.

Computational radiomics system to decode the radiographic phenotype

Cancer Res.

2017

;

(

e104

–

e107

40.

pyradiomics v3.1.0 [Internet].

Artificial Intelligence in Medicine (AIM) Program

;

2023

[cited 2023 Sep 21]. https://github.com/AIM-Harvard/pyradiomics

41.

Daniel Homola [Internet].

2015

[cited 2021 Oct 20].

BorutaPy

42.

Kursa

Rudnicki

WR.

Feature selection with the Boruta package

J Stat Soft.

2010

;

(

1-13

. http://www.jstatsoft.org/v36/i11/.

Crossref

43.

Chen

Guestrin

XGBoost: a scalable tree boosting system

Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

2016

Aug 13;

785

–

794

44.

Akiba

Sano

Yanase

Ohta

Koyama

Optuna: A Next-generation Hyperparameter Optimization Framework

arXiv

:2623 [cs, stat] [Internet].

2019

Jul 25 [cited 2021 Oct 20]; http://arxiv.org/abs/1907.10902

OpenURL Placeholder Text

45.

Optuna: A hyperparameter optimization framework [Internet].

optuna

;

2021

[cited 2021 Oct 20]. https://github.com/optuna/optuna

46.

Zacharaki

Wang

Chawla

, et al.

Classification of brain tumor type and grade using MRI texture and shape in a machine learning scheme

Magn Reson Med.

2009

;

(

1609

–

1618

47.

Zacharaki

Kanas

Davatzikos

Investigating machine learning techniques for MRI-based classification of brain neoplasms

Int J Comput Assist Radiol Surg.

2011

;

(

821

–

828

48.

Tsolaki

Svolos

Kousi

, et al.

Automated differentiation of glioblastomas from intracranial metastases using 3T MR spectroscopic and perfusion data

Int J Comput Assist Radiol Surg.

2013

;

(

751

–

761

49.

Ayadi

Elhamzi

Charfi

Atri

Deep CNN for brain tumor classification

Neural Process Lett.

2021

;

(

671

–

700

Crossref