Abstract

Radiomics and artificial intelligence carry the promise of increased precision in oncologic imaging assessments owing to their ability to harness thousands of occult digital imaging features embedded in conventional medical imaging data. While powerful, these technologies suffer from a number of sources of variability that currently impede clinical translation. To overcome this impediment, these sources of variability must be controlled through harmonization of imaging data acquisition across institutions, construction of standardized imaging protocols that maximize the acquisition of these features, harmonization of post-processing techniques, and big data resources that properly power studies for hypothesis testing. Accomplishing this will require multidisciplinary and multi-institutional collaboration.

Introduction to radiomics and artificial intelligence in radiology

Radiomics leverages conventional medical imaging to extract occult digital features embedded in the image that are reflective of tissue histology, biological activity, functional properties, and more. These features can be quantitatively extracted from image regions of interest (ROI) to yield radiomic feature patterns that reflect tissue architecture and are influenced by genetic expression, tissue microenvironment, and effects of therapeutic intervention. As a result, these radiomic patterns can serve as an “imaging phenotype” of malignancy that can be used as a biomarker providing clinically significant information in a noninvasive manner. The imaging phenotype can be integrated with conventional radiologic parameters such as lesion size, presence of contrast enhancement, Fluorodeoxyglucose (FDG) avidity on PET-CT,1 and the presence of restricted diffusion on MRI2 to generate a more complete imaging description of disease.

Artificial intelligence (AI) methods are used at various stages of a radiomics pipeline. AI is an umbrella term that refers to the use of computer algorithms and machines to automatically perform intelligent tasks.3 Machine learning (ML) employs statistical methods to “learn” and improve from “experience” (ie, data)4 and can be applied to the classification and prediction of the imaging phenotype.5 This can be achieved through human hand-crafted imaging features or autonomously through deep learning (DL)—an automated computer architecture that uses multi-layered neural networks to map input images into desirable outputs (such as tumour segmentation, prognostic predictions, etc.).5 Machine learning/DL methods are of paramount importance in radiomics since they can learn from large, multi-dimensional imaging datasets to build classification models and classify new data.

The pipeline for radiomics, like any other quantitative imaging analysis workflow, starts with the acquisition, selection, and curation of the medical images, including tissue segmentation and quality control. Then features are either manually extracted from the segmented volume of interest (VOI) for ML methods, or the VOI is passed directly through a DL network, which then models features into the different target variables of clinical importance. If manual extraction is performed, the extracted features undergo further feature-engineering steps, such as feature selection, to prevent the modelling algorithm from overfitting the data. Figure 1 delineates the workflow of a typical radiomics analysis.

Figure 1. Typical radiomics workflow. The workflow starts in the clinic by collecting clinical and imaging data. The imaging data are converted to files amenable to radiomics analysis through several steps described in "Image Acquisition and Conversion to Segmentable Data," "Image Pre-Processing," and "Image Segmentation." Then the imaging data, along with the clinical data, go through a machine learning and/or deep learning pipeline to model specific questions about the disease, such as recurrence risk.

Image acquisition and conversion to segmentable data

Radiomic analysis can be performed on any medical imaging data including CT, MRI, ultrasound, and PET. To ensure a high-quality analysis, standardized imaging acquisition protocols can reduce unwanted technical sources of variability including reconstruction kernel, timing and dose of intravenous contrast, repetition time and echo time on MRI, and radiotracer uptake time and dose on PET.6 Image data are exported from the Picture Archiving and Communication System and converted to a segmentable file format for use with segmentation software such as ITK-SNAP.7
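As a minimal sketch of this conversion step, the following assumes SimpleITK is installed and that the hypothetical folder dicom_dir/ contains a single DICOM series exported from PACS; it reads the series and writes a NIfTI volume that tools such as ITK-SNAP or 3D Slicer can open.

```python
import SimpleITK as sitk

# Read a DICOM series exported from PACS (folder name is hypothetical)
reader = sitk.ImageSeriesReader()
dicom_files = reader.GetGDCMSeriesFileNames("dicom_dir/")
reader.SetFileNames(dicom_files)
image = reader.Execute()

# Write a single segmentable volume (NIfTI) for use in ITK-SNAP or 3D Slicer
sitk.WriteImage(image, "case001_ct.nii.gz")
print(image.GetSize(), image.GetSpacing())
```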

Image pre-processing

Medical imaging contains non-biological variability, termed "batch effects."8,9 These effects can arise from differences in scanner hardware and software, heterogeneity in imaging acquisition protocols, and technical image artefacts such as motion, and they can affect radiomics evaluation of the disease. While it is currently not possible to eliminate all technical variability, that introduced by a specific scanner at a specific site can be limited by comparing image variability on phantom studies.10 In contrast, variability introduced by differences in acquisition sites, scanners, and acquisition parameters can be addressed through harmonization in the image domain (image pre-processing) and the feature domain ("Feature Harmonization"). Computational methods of image pre-processing include the following (a minimal code sketch follows the list):

  1. Image resampling: Resampling to homogenize the image resolution11 across multiple scans.

  2. Normalization: Homogenize arbitrary signal intensities and filter outliers across multiple scans.12

  3. Discretization: Group pixels into bins of similar intensity ranges.11

  4. Bias field correction: Homogenize spatial signal variation (for MRIs).11,13
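The sketch below illustrates how steps 1-4 might be implemented with SimpleITK and NumPy; the resampled spacing, intensity bounds, and bin count are arbitrary example values, not recommendations.

```python
import SimpleITK as sitk
import numpy as np

image = sitk.ReadImage("case001_ct.nii.gz", sitk.sitkFloat32)

# 1. Image resampling to an isotropic 1 x 1 x 1 mm grid (example spacing)
new_spacing = (1.0, 1.0, 1.0)
old_size, old_spacing = image.GetSize(), image.GetSpacing()
new_size = [int(round(sz * sp / ns)) for sz, sp, ns in zip(old_size, old_spacing, new_spacing)]
resampled = sitk.Resample(image, new_size, sitk.Transform(), sitk.sitkBSpline,
                          image.GetOrigin(), new_spacing, image.GetDirection())

# 2. Normalization: z-score after clipping outlier intensities (example percentiles)
arr = sitk.GetArrayFromImage(resampled)
arr = np.clip(arr, np.percentile(arr, 0.5), np.percentile(arr, 99.5))
arr = (arr - arr.mean()) / arr.std()

# 3. Discretization: group voxel intensities into a fixed number of bins (example: 64)
binned = np.digitize(arr, np.linspace(arr.min(), arr.max(), 64))

# 4. Bias field correction (MRI only): N4 algorithm, shown here commented out
# corrected = sitk.N4BiasFieldCorrection(sitk.ReadImage("case001_t1.nii.gz", sitk.sitkFloat32))
```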

Image segmentation

Medical images are segmented via manual, automatic, or semi-automatic methods to determine the 2D ROI or 3D VOI.14 Manual segmentation refers to delineation of the tumour volume by a human expert, while automatic segmentation is performed by algorithms. In semi-automatic segmentation, users aid the software in determining segmentation parameters such as tissue margins and image window selection.15 Examples of semi-automatic segmentation methods include region-growing methods (ie, GrowCut in 3D Slicer [www.slicer.org]), feature space methods (ie, the Multichannel Markov Random Field Framework),16 and annotation tools such as graph cuts, level sets, and active contours.17 Along with requiring user input, these methods can be limited by the nature of the imaging data; for example, GrowCut works best with homogeneous and bright CT scans when segmenting lung tumours.18
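As a minimal illustration of the region-growing idea (a generic connected-threshold filter, not GrowCut itself), the sketch below uses SimpleITK to grow a mask from a user-supplied seed voxel; the seed coordinates and intensity bounds are hypothetical example values.

```python
import SimpleITK as sitk

image = sitk.ReadImage("case001_ct.nii.gz", sitk.sitkFloat32)

# The user supplies a seed voxel inside the lesion and an intensity range (example values)
seed = (120, 145, 60)   # (x, y, z) index, hypothetical
mask = sitk.ConnectedThreshold(image, seedList=[seed], lower=-50.0, upper=150.0, replaceValue=1)

# Save the binary mask for later feature extraction
sitk.WriteImage(sitk.Cast(mask, sitk.sitkUInt8), "case001_mask.nii.gz")
```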

In contrast, fully automated methods do not require user interaction and typically involve DL-based methods that train models on labelled medical images (a "training dataset") and subsequently apply those models to an experimental dataset.19 Because DL-based methods use convolutional neural networks (CNNs), the deterministic nature of their output can avoid intra- and inter-observer variability.20

Hand-crafted feature curation

Feature extraction

The image ROIs/VOIs contain numerous features which can be extracted to construct an imaging phenotype. Currently, radiologists provide qualitative assessments and simple quantitative lesion measurements such as lesion size, attenuation (CT), or signal intensity (MRI). Through radiomic analysis these features can be quantitated and classified.17 The commonly used radiomic features include (see Figure 2):

Figure 2. Different types of hand-crafted features. (From left to right) (i) intensity-based features are related to statistics of the grey-level intensity of each voxel; (ii) histogram-based features are related to the statistics of intensities after grouping them into bins; (iii) volumetric and morphologic features describe the volume and shape of the VOI (image generated using ITK-SNAP7); (iv) textural features relate the spatial relationship of the grey-level intensities; and (v) higher-order features are extracted after transforming the image through filters (image from Ref. 21).

  1. Intensity- and histogram-based features: first-order statistical features that model the voxel intensities. Intensity-based features describe the distribution of the grey levels in the ROI and histogram-based features are extracted after grouping intensities into bins (discretization). These features are not concerned with the spatial relationship of intensities.22,23

  2. Volumetric and morphologic features: associated with the VOI and the shape and geometry of the ROI.22

  3. Texture-based features: second-order statistics which quantify the spatial arrangement of the voxel intensities.24 These include the grey-level run length matrix (GLRLM), grey-level co-occurrence matrix (GLCM), grey-level size zone matrix, neighbouring grey tone difference matrix, and local binary pattern; these matrices quantify textural variation based on the spatial arrangement of grey-level intensities.25

  4. Higher-order features: extracted after applying filters such as Gaussian and Gabor filters to capture complex patterns in the data,25 or extracted using artificial neural networks ("deep" features).6

During feature extraction, factors other than image-processing parameters that influence feature values for different feature families should be accounted for, including grid distance (used in textural features), feature aggregation (combining different values of one feature into a single value), and distance weighting (which emphasizes local intensities; used in PyRadiomics).23,26
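A minimal sketch of hand-crafted feature extraction with the open-source PyRadiomics package26 is shown below; the file names are hypothetical and the default extractor settings are used only for illustration (in practice the settings should pin bin width, resampling, and the feature classes listed above).

```python
from radiomics import featureextractor

# Build an extractor and restrict it to a few example feature classes
extractor = featureextractor.RadiomicsFeatureExtractor()
extractor.disableAllFeatures()
for feature_class in ("firstorder", "shape", "glcm", "glrlm"):
    extractor.enableFeatureClassByName(feature_class)

# Image and segmentation mask from the previous steps (hypothetical file names)
features = extractor.execute("case001_ct.nii.gz", "case001_mask.nii.gz")
radiomic_values = {k: v for k, v in features.items() if not k.startswith("diagnostics")}
print(len(radiomic_values), "features extracted")
```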

Feature harmonization

Feature harmonization is often used to create a homogeneous set of features when using images acquired from different scanners.8,9 Harmonization can adjust for undesirable variations within an imaging dataset, or "batch," generated during image acquisition.27 The concept of batch-effect removal comes from genomics, where adjusting batch effects across multiple microarray gene expression datasets is necessary.9,27 Methods commonly employed for radiomic analysis include27 location-scale, matrix factorization, and discretization methods. For small datasets, ComBat, an empirical Bayes method,9 is particularly valuable for feature-level harmonization.21

ComBat is a location-scale method that uses Bayes estimations of the mean and variance of features in each batch to transform them to a unified mean and variance.27 This technique has been implemented in multiple languages (ie, R, Python, and MATLAB)28 and has shown promising results in mitigating batch-effect-induced differences in radiomic features in lung cancer.22 Limitations of ComBat include the assumption that technical errors are normally distributed and the fact that only a single batch effect can be corrected at a time.29 Several methods are being tested to address these limitations, including a Gaussian Mixture Model ComBat for improved batch-effect correction and a Nested ComBat method for multiple batch-effect correction.29
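The location-scale idea behind ComBat can be illustrated with a simple per-batch standardization. The sketch below, on synthetic data, aligns each feature's mean and variance across scanners but deliberately omits ComBat's empirical Bayes shrinkage and covariate preservation, for which dedicated implementations (eg, the packages described in Ref. 28) should be used.

```python
import numpy as np
import pandas as pd

# Synthetic example: rows = scans, columns = radiomic features; batch = scanner label per scan
rng = np.random.default_rng(0)
features = pd.DataFrame(rng.normal(size=(20, 5)), columns=[f"f{i}" for i in range(5)])
batch = pd.Series(["scannerA"] * 10 + ["scannerB"] * 10)

def naive_location_scale(features: pd.DataFrame, batch: pd.Series) -> pd.DataFrame:
    """Shift/scale each batch to the pooled mean and standard deviation of every feature.

    Mirrors the location-scale step of ComBat without its empirical Bayes estimation.
    """
    pooled_mean, pooled_std = features.mean(), features.std()
    harmonized = features.copy()
    for b in batch.unique():
        idx = batch == b
        harmonized.loc[idx] = (features.loc[idx] - features.loc[idx].mean()) / features.loc[idx].std()
        harmonized.loc[idx] = harmonized.loc[idx] * pooled_std + pooled_mean
    return harmonized

harmonized = naive_location_scale(features, batch)
```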

Feature selection

Since not all of the numerous extracted features are useful, feature selection is the next important step for building a robust and generalizable radiomics model.11 Having too many features, termed "high dimensionality," can lead to model overfitting, such that the model might not work for new data it has not previously evaluated.30 Through feature selection, the number of features is reduced to create a more robust and reproducible signature by considering two factors: feature stability and feature redundancy. Feature stability refers to the robustness of an imaging feature to training sample set variability. Feature redundancy refers to removing features that are highly correlated with one another or similar in what they characterize,17 since these are of little value.

There are multiple approaches to assess feature stability based on the imaging data available for comparison. If a test-retest dataset (the same image from the same patient and scanner obtained a few minutes apart) is available, then the intra-class correlation coefficient (ICC), an index from 0 to 1 that reflects test-retest agreement, can be calculated for each feature.31–33 If multiple phantom images from the same and different scanners can be acquired,20 then the concordance correlation coefficient (CCC) and dynamic range (DR) can be calculated for each feature.34,35 Features with high CCC, DR, and ICC (ie, CCC and DR > 0.9, ICC > 0.75) are considered to have good test-retest/inter-observer agreement, biological range, and reproducibility35,36 and are regarded as stable; unstable features are removed using these cutoffs. Furthermore, if multiple image segmentations are available (same image, different radiologist or algorithm), the ICC of the features across the different annotations can be evaluated.11,20
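A minimal sketch of stability filtering on a hypothetical test-retest dataset is shown below: Lin's concordance correlation coefficient is computed per feature between the test and retest extractions, and features below the 0.9 cutoff mentioned above are dropped; the data are synthetic placeholders.

```python
import numpy as np
import pandas as pd

def concordance_cc(x: np.ndarray, y: np.ndarray) -> float:
    """Lin's concordance correlation coefficient between two measurement vectors."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return 2 * cov / (vx + vy + (mx - my) ** 2)

# test / retest: rows = patients, columns = identical radiomic features (synthetic data)
rng = np.random.default_rng(1)
test = pd.DataFrame(rng.normal(size=(30, 8)), columns=[f"f{i}" for i in range(8)])
retest = test + rng.normal(scale=0.1, size=test.shape)   # small re-measurement noise

ccc = pd.Series({col: concordance_cc(test[col].to_numpy(), retest[col].to_numpy())
                 for col in test.columns})
stable_features = ccc[ccc > 0.9].index.tolist()           # CCC > 0.9 cutoff from the text
```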

Once feature stability has been assessed, the next step is to eliminate redundancy in feature sets. This can be done through supervised (requiring labels) or unsupervised (not requiring labels)5 methods. As a first step in a supervised approach, a pairwise correlation test can be performed to remove features with high correlation,37 followed by further supervised feature selection using filter, wrapper, or ML-based methods. Correlation clusters and heatmaps are helpful for visualizing the performance of feature sets derived from different selection methods when training radiomic models.11,20,37 Feature selection methods can also be incorporated during model building, including embedded methods such as the least absolute shrinkage and selection operator (LASSO). With LASSO, the model is regularized by "shrinking" the feature weights and setting the weights of non-contributing features to zero.38 Finally, unsupervised feature selection methods can be used to reduce feature dimensionality, including clustering, t-distributed stochastic neighbour embedding, and principal component analysis.20
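The sketch below illustrates one possible redundancy-removal chain on hand-crafted features with a binary label: a pairwise correlation filter followed by an embedded LASSO-style step (L1-penalized logistic regression) that zeroes out non-contributing features. The data are synthetic and the correlation cutoff of 0.95 is an arbitrary example value.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X = pd.DataFrame(rng.normal(size=(100, 20)), columns=[f"f{i}" for i in range(20)])
y = rng.integers(0, 2, size=100)                      # hypothetical binary target

# 1. Pairwise correlation filter: drop one feature from each highly correlated pair
corr = X.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.95).any()]
X_reduced = X.drop(columns=to_drop)

# 2. Embedded selection with L1-penalized (LASSO-style) logistic regression
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
model.fit(StandardScaler().fit_transform(X_reduced), y)
selected = X_reduced.columns[model.coef_.ravel() != 0].tolist()
```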

This summary of the feature selection process highlights the variability that can be introduced by the choice of selection method, which has been reported in the literature,38 including when evaluating CT images of lung cancer patients39 and when devising radiomic predictors of tissue histology.37 It is challenging to standardize feature selection methods given their dependency on the available data, and the choice of method should be based on the characteristics of a given imaging dataset. An understanding of the choice of feature selection method as a source of variability is important, and methods should be reported in detail in radiomic studies to ensure reproducibility.20

Featureless DL methods

Deep learning methods are becoming increasingly popular in different parts of the radiomics pipeline (see “Image Segmentation” for DL-based image segmentation). Featureless DL methods, which do not require hand-crafted radiomic features, are particularly interesting since they avoid additional steps of image pre-processing and feature engineering, and the resulting sources of variability from these steps. In DL techniques, multiple layers of neural networks, with varying modules (convolution/pooling) and activation functions, represent the data non-linearly. For example, the first layer might represent edges, the second can identify motifs in the edges, and the third can distinguish objects from the motifs.

Supervised DL methods such as CNNs, patch-/pixel-based ML,40 and recurrent neural networks41,42 are commonly used when ample labelled data are available. CNNs are popular in image recognition43 and serve as useful feature-extraction layers of a DL architecture. In the convolutional layers of a CNN, the input image is divided into overlapping partial images through filters or kernels.43 Successive layers down-sample the image and extract semantic information until all the information is converted into target variables in the final layer.19 Alternatively, unsupervised methods can be used, including autoencoders (AE)44 and restricted Boltzmann machine methods.5 Semi-supervised approaches can combine supervised networks with unsupervised generative models to extract additional information about the image.45
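As a schematic illustration (a PyTorch sketch with arbitrary layer sizes, not drawn from any cited architecture), the following shows how convolution and pooling layers progressively down-sample an input image before a final layer maps the extracted information to a target variable.

```python
import torch
from torch import nn

class TinyCNN(nn.Module):
    """Minimal CNN: conv/pool layers down-sample the image; the final layer outputs the target."""
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(16 * 16 * 16, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)                  # feature-extraction layers
        return self.classifier(x.flatten(1))  # final layer produces the target variable

# Batch of 4 single-channel 64 x 64 image patches (synthetic)
logits = TinyCNN()(torch.randn(4, 1, 64, 64))
```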

DL methods are generally more flexible than hand-crafted feature selection and can be modified for various tasks including segmentation, registration, and lesion detection. For example, a CNN trained on images of skin lesions achieved dermatologist-level classification of skin cancer,46 and a DL radiomics model using chest CT could predict distant metastasis.47 However, DL methods involve a larger number of parameters and require more data than hand-crafted methods.48 Data augmentation can mitigate the issue of sparse data to some extent. Transfer learning can also be used: a model pretrained on an unrelated dataset (eg, natural images) is fine-tuned on the target training dataset to create a model that generalizes to the target data.5,49
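A minimal transfer-learning sketch in PyTorch is shown below, assuming torchvision >= 0.13: an ImageNet-pretrained ResNet-18 is reused as a frozen feature extractor and only a new two-class head is trained, a common strategy when labelled medical images are scarce. This illustrates the general idea rather than the recipe of any cited study.

```python
import torch
from torch import nn
from torchvision import models

# Load an ImageNet-pretrained backbone (torchvision >= 0.13 weights API)
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the pretrained feature extractor
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head with a new 2-class layer (eg, responder vs non-responder)
model.fc = nn.Linear(model.fc.in_features, 2)

# Only the new head's parameters are optimized during fine-tuning
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One hypothetical training step on a dummy batch of 3-channel 224 x 224 images
images, labels = torch.randn(4, 3, 224, 224), torch.tensor([0, 1, 0, 1])
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```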

Model development

Data pre-processing

When using hand-crafted radiomic features, variables that impact model training must be addressed, including feature scaling, missing values, and class imbalance.20 Feature scaling refers to normalization methods used to prevent features with greater ranges from dominating the model.50 Missing values are a common problem in real-world datasets and can be addressed by using simple statistics or complex modelling to impute them.20 Class imbalance refers to the skewness of the dataset towards a certain classification label or predictive value (or range) and can be addressed by oversampling; the synthetic minority oversampling technique (SMOTE) generates synthetic instances (feature sets) from real ones in the minority class.51 When using DL for modelling, data augmentation can be a useful approach to tackle class imbalance.
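A minimal sketch of these three pre-processing steps with scikit-learn and imbalanced-learn is shown below; the feature matrix and labels are synthetic placeholders.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 10))
X[rng.random(X.shape) < 0.05] = np.nan           # sprinkle in missing values
y = np.array([0] * 50 + [1] * 10)                # imbalanced binary labels

X = SimpleImputer(strategy="median").fit_transform(X)              # impute missing values
X = StandardScaler().fit_transform(X)                              # feature scaling (z-score)
X_balanced, y_balanced = SMOTE(random_state=0).fit_resample(X, y)  # oversample minority class
```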

Feature classification

Once a set of non-redundant and robust features has been curated, a model is built to answer the specific clinical question, termed the "target variable." Target variables can be discrete (ie, presence or absence of recurrence at 5 years) or continuous (ie, survival analysis). In the feature classification step of ML, the algorithm learns how to model the clinical question using examples of feature-target pairs (supervised ML) or by learning intrinsic patterns in the training feature set (unsupervised ML). The specific classification method is selected based on the medical question asked. Supervised classification algorithms, such as logistic regression,52 linear/non-linear support vector machines,53 random forests,39 and Naïve Bayes,37 can be tested for discrete variables. For continuous target variables, linear regression and regression trees are valuable. Features can also be clustered into intrinsic imaging phenotypes using unsupervised methods such as agglomerative hierarchical clustering22,54 (Figure 3). For survival analysis, the Kaplan-Meier curve,22,55 Cox regression,56 random survival forests,49 and support vector survival methods46 are typically used to explore variables or phenotypes that may impact survival time.
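As a minimal illustration, the sketch below fits a random forest to a discrete target and a Cox proportional hazards model (via the lifelines package) to a survival target; the data are synthetic, and the choice of algorithms simply follows the examples named above rather than any single cited study.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from lifelines import CoxPHFitter

rng = np.random.default_rng(4)
X = pd.DataFrame(rng.normal(size=(80, 5)), columns=[f"f{i}" for i in range(5)])

# Discrete target (eg, recurrence at 5 years): supervised classification
y = rng.integers(0, 2, size=80)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Censored survival target: Cox proportional hazards regression
survival = X.copy()
survival["time"] = rng.exponential(scale=24, size=80)    # follow-up in months, synthetic
survival["event"] = rng.integers(0, 2, size=80)          # 1 = progression observed
cph = CoxPHFitter().fit(survival, duration_col="time", event_col="event")
```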

Figure 3. Hand-crafted radiomics feature modelling results from Singh et al.22 Top: hierarchical clustering of radiomics features recognizes two statistically significant (P = .02) imaging phenotypes. Bottom: survival analysis (Kaplan-Meier curves) showing progression-free survival probability for the two phenotypes using clinical covariates (PD-L1 expression, Eastern Cooperative Oncology Group (ECOG) status, body mass index (BMI), and smoking status) and radiomics phenotypes.

Model validation and evaluation

Once a radiomic model has been developed, it must be validated. There are several ways to validate radiomics models, including quantification of predictive ability. Model validation is performed by separating the imaging data into training, validation, and testing sets. The training set is used during model building. Validation sets are used to fine-tune the ML settings (hyperparameters) and probe changes in the model's performance. Sometimes training and validation are done on the same set, and resampling techniques such as bootstrapping and cross-validation are used for validation.5,20 The test set is used to evaluate the performance of the optimized model. To evaluate the generalizability of the model, the test set can be curated from another scanner or site.
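The sketch below illustrates this splitting scheme with scikit-learn: a held-out test set is set aside first, cross-validated grid search on the training set plays the role of the validation step, and the tuned model is evaluated once on the test set. The data and grid values are arbitrary examples.

```python
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(5)
X, y = rng.normal(size=(120, 8)), rng.integers(0, 2, size=120)

# Hold out a test set that is never touched during model building
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Cross-validation on the training set stands in for a separate validation set
grid = GridSearchCV(RandomForestClassifier(random_state=0),
                    param_grid={"n_estimators": [100, 300], "max_depth": [3, None]},
                    cv=5, scoring="roc_auc")
grid.fit(X_train, y_train)

test_score = grid.score(X_test, y_test)   # single final evaluation of the tuned model
```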

Several numerical metrics and graphical representations are used to evaluate performance. For binary classification, sensitivity (true positive rate) and specificity (true negative rate) are commonly used. The receiver operating characteristic (ROC) curve reveals the relationship between the true positive rate (y-axis) and the false positive rate (x-axis). The area under the ROC curve represents the probability that the classifier ranks a randomly chosen positive instance higher than a randomly chosen negative one. For survival analysis, the c-statistic47 is used to evaluate the discriminative ability of the model. The Kaplan-Meier curve (Figure 3) is helpful for estimating survival among different clusters of imaging phenotypes22 and can be evaluated using the log-rank test.57
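A minimal sketch of these evaluation tools is shown below, continuing the synthetic examples above: scikit-learn computes the ROC curve and AUC for a binary classifier, and lifelines provides the Kaplan-Meier estimator and log-rank test for comparing two imaging phenotypes.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

rng = np.random.default_rng(6)

# Binary classification: ROC curve and area under the curve
y_true = rng.integers(0, 2, size=100)
y_score = y_true * 0.4 + rng.random(100) * 0.6          # synthetic classifier scores
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

# Survival: Kaplan-Meier curve per phenotype cluster and a log-rank comparison
time = rng.exponential(scale=20, size=100)               # synthetic follow-up (months)
event = rng.integers(0, 2, size=100)                     # 1 = progression/death observed
phenotype = rng.integers(0, 2, size=100)                 # hypothetical cluster labels

km = KaplanMeierFitter().fit(time[phenotype == 0], event[phenotype == 0], label="phenotype 0")
result = logrank_test(time[phenotype == 0], time[phenotype == 1],
                      event_observed_A=event[phenotype == 0],
                      event_observed_B=event[phenotype == 1])
p_value = result.p_value
```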

Applications of radiomics and AI in oncologic imaging

Cancer screening and lesion detection

Multiple studies have shown a potential role for radiomics in optimizing lesion detection on cancer screening scans or routine clinical scans. For example, ML-based radiomic approaches have shown promise for distinguishing pancreatic ductal adenocarcinoma from normal pancreas in both diagnostic58 and prediagnostic59 CT scans and for detecting premalignant colorectal polyps using CT colonography.60 Radiomics has also shown promise in categorizing highly suspicious prostate cancers using multiparametric MRI (mpMRI)61–63 and in automated segmentation of tumour subregions.64 Other tissue-detection studies have addressed organs-at-risk contouring for radiotherapy using DL methods.65,66 By using these radiomic approaches, the sensitivity and specificity of lesion detection may be enhanced.

Tissue characterization

Radiomics has also shown promise for predicting tissue histology, including differentiating between benign and malignant lesions. If successful, this approach could assist in characterizing lesions that are in a location difficult to biopsy or during longitudinal follow-up of cancer patients, reducing the need for repeat biopsies. For example, DL (multiparametric magnetic resonance transfer learning) has been demonstrated to learn discriminative imaging features and categorize prostate cancers.67 Radiomics has been shown to successfully differentiate between benign and malignant lesions on mammography, ultrasound, and DCE-MRI images68,69 and to distinguish different types of breast cancers.52,53 Radiomic features were also shown to predict the histology of lung tumours on CT.37,70 With this technology comes the potential of a "virtual biopsy" performed on lesions while still in situ, which alleviates sampling error arising from the internal tissue heterogeneity of tumours.71

Prediction of tissue microenvironment and its effect on response

In addition to predicting tissue histology, radiomics-derived features have been demonstrated to predict tumour genomic expression characteristics and aspects of the tumour microenvironment. This is possible because genomic expression and the dynamics of the tumour microenvironment result in tissue structural changes that are detectable by radiomics. Aerts et al31 developed prognostic radiomic signatures from lung and head-and-neck cancer (HNC) reflective of intratumoural heterogeneity and genomic expression. Radiomic signatures have been described that are associated with tumour features such as the degree of tumour mutational burden and the presence of tumour-infiltrating lymphocytes, both considered predictive of tumour response to anti-PD-L1/PD-1 therapy.72

Peritumoural tissue features can also be of predictive value, for example, on PET imaging in non-small-cell lung cancer (NSCLC) and cervical cancer.73 Since radiomics can detect these subtle tissue architectural features, radiomic signatures have been constructed that can predict oncologic therapy response. For example, radiomics can be leveraged to predict hypoxia in HNC patients, stratifying patients who would respond well to chemoradiotherapy based on tissue hypoxia.74 Another study has shown that spatial heterogeneity on MRI of glioblastoma tumours is associated with 12-month overall survival and altered gene expression patterns.75

Characterization of local response to therapy and prediction of recurrence or metastasis

Radiomics can also be of value in therapy response analysis, allowing for earlier determination of therapy response or failure and discrimination between posttreatment changes and residual viable tumour. For example, delta radiomics, the longitudinal study of features on serial imaging, was shown to be a good predictor of necrosis versus progression in brain metastases after radiosurgery.76 A phenotype of tumour heterogeneity extracted from preoperative DCE-MRI could predict the 10-year recurrence rate,77 and radiomics models built on CT images of NSCLC patients after radiotherapy78 could predict the development of distant metastasis. Hand-crafted features from mpMRI used to train an artificial neural network successfully predicted different response groups after preoperative chemoradiation therapy for locally advanced rectal cancer.79 Another study used a combination of clinical information and CT scans to predict progression-free survival in patients with NSCLC undergoing first-line immunotherapy.22

Barriers to clinical translation

Variability in acquired images

Radiomic approaches seek to leverage imaging to quantify biological variations in tissue. To be viable for clinical translation, radiomic models must be generalizable and enable accurate phenotyping on new data "unseen" by the model. A major confounder to this goal is non-biological variation in medical images related to scanner variability (manufacturer, model, hardware, sampling rate),12 inconsistent adherence to imaging protocols,12,80 variability in image reconstruction algorithms,81,82 hybrid protocols,6 and radiomics processing software.23 As a result, different features are extracted even in the same patient when imaged at different sites. This makes the use of multi-institutional data for radiomics challenging,6 which is a major caveat for the validation of radiomic models and eventual clinical translation.

Feature redundancy

The feature extraction process can produce a large number of features; the Image Biomarker Standardization Initiative (IBSI) alone has standardized 169 radiomic features.23 A large number of features can lead to model overfitting during training and weaken model performance on new patient data. Among extracted features, it is sometimes difficult to establish which are relevant, and heterogeneous feature selection methods ("Feature Selection") can negatively impact the reproducibility of these features as biomarkers.

Feature and model interpretability

Feature interpretability becomes an issue especially with higher-order features such as texture features (ie, GLRLM, GLCM) and features derived after applying filters. "Deep" features also suffer from low interpretability. Deep learning models are essentially "black boxes," and their layers cannot be readily interpreted, leading to scepticism as to the value of the model.

Heterogeneity in segmentation methods

The use of different segmentation methods can produce different radiomics signatures from the same image. This issue most significantly impacts manual segmentation, particularly when multiple annotators are involved.14,83 Semi-automatic segmentation, although it reduces manual labour, can introduce variability through varying degrees of user input. In addition, differences in user decision-making about what to include in a VOI (ie, solid portions of tumour vs necrotic components) introduce variability. An example of this is whether or not to include the peritumoural regions when predicting disease outcomes,73,84 a decision which has not been standardized by the field. While automatic segmentation techniques avoid the variability associated with user input, these methods work well on homogeneous tumours but are challenged by irregular and heterogeneous tumours.17

Impact of comorbidities and therapies on tumour architecture

The power of radiomic analysis lies in the ability to leverage the digital imprint of the tumour architecture and its interactions with the microenvironment. However, this level of granularity in tissue characterization means that clinical variables, such as those imparted by existing comorbidities and therapies, become more important as factors impacting the radiomic signal. Even among oncologic therapies, there is an increasing number of drug classes, each shutting down tumour growth through a different strategy, most recently immunotherapeutics, which work by inciting inflammation within the tumour. Systemic therapy can be co-administered with other modalities such as radiation therapy, which incites inflammation and DNA damage within the tissues in and around the site of disease.

Furthermore, each individual patient presents with alterations in specific organs that may contribute to even further heterogeneity in the imaging appearance of lesions; for example, a lung cancer in a patient with advanced background emphysema versus in a nonsmoker, or hepatocellular carcinoma in patients with variable degrees of liver cirrhosis. These patient factors may limit the generalizability of a given prediction model, and more research is needed to elucidate the effect of comorbidities and interventions on radiomic signatures.

Sample size and lack of big data

The impact of these sources of variability can be mitigated with large datasets that include clinical and imaging data, preferably from multi-institutional sources. Obtaining such datasets is fraught with logistical and medico-legal issues that so far have limited most radiomic research to retrospective analyses of datasets significantly smaller than those leveraged in past computer vision breakthroughs.85 The need for "big data" is most pressing when developing radiomic models in populations that are inherently more heterogeneous (ie, multiple possible interventions, histologies, or patient populations).86

Tools needed for clinical translation

Automated segmentation of lesions

Radiomic models perform best with homogeneous methods of segmentation, whether automatic or semi-automatic.14 Automatic segmentation, an active area of research, can save time in radiomics analyses, which require large datasets with homogeneous pre-processing. Automatic segmentation methods in development include hierarchical CNNs for breast tumours in DCE-MRI,87 GAN-based segmentation for liver and brain tumours,88 and U-Net-based methods.89,90 U-Net architectures are of particular interest since they do not require as many training samples91 as some of the other approaches, a limiting factor for DL in medical imaging.

Multicentre collaboration

Multicentre collaboration is of paramount importance in creating generalizable, clinically relevant radiomic models. Moreover, it is important to validate these imaging biomarkers in multicentre prospective clinical trials through multi-institutional collaboration between imaging scientists, radiologists, and medical oncologists. Centralized learning (CL) initiatives, in which multiple institutions share patient data at a centralized location and develop clinical models, are commonly used in radiomics. However, CL fails to work for a large number of institutions, especially on a global scale, due to concerns such as privacy, data ownership, and technical challenges.92 Conversely, federated learning, in which the parameters of ML models trained at different centres are aggregated to form a consensus model, offers a decentralized approach to multicentre collaboration.92,93

Standardization and harmonization of workflow

Adherence to standardized imaging acquisition protocols and proper reporting of the reconstruction algorithms will be important for creating high-quality multicentre aggregate data, both in real-world imaging acquisition and in clinical trials. The IBSI74 recently recommended strict adherence to standardized radiomics workflows and proper reporting of the methods used in the different steps of the pipeline to optimize outcomes. Realistically, however, regional practice differences even within a single health system may limit optimal standardization. Merging multicentre studies for ML and statistical models can, therefore, lead to poor results. Hence, the development of harmonization methods for imaging acquisition differences is vital to maximize the efficiency of multicentre studies. This is being studied in both the image and feature domains.27 Image-domain harmonization includes the image-processing techniques described in "Image Pre-Processing," while feature-level harmonization includes batch-effect corrections using methods such as ComBat.27,29 Further optimization of these methods would enable faster integration of radiomics into clinical practice.

Conclusions and future directions of the field

This review aims to provide an overview of the radiomics workflow, its potential clinical uses, and the remaining challenges currently limiting its integration into routine clinical practice. We also aimed to integrate a brief discussion of the relevant AI tools used in the field of radiomics, while being aware of the limitations of tackling these vast subjects in a short review.

A future goal of radiomics is to integrate seamlessly into the clinical workflow and augment radiological interpretation through quantitative measures: to assist radiologists in providing high-quality imaging interpretations, not to replace them. Radiomics has shown great promise not only in screening, diagnosis, and prognosis but also in areas such as therapy response prediction and assessment. ML-/DL-assisted radiomics can give treating clinicians better insight into disease heterogeneity, progression, and therapy response at an individual level and help build targeted treatments, a step towards precision medicine. This is particularly promising for heterogeneous diseases such as cancer. Hence, it is crucial for researchers, developers, and clinicians to work together to bring this technology to the clinical realm.

Funding

None declared.

Conflicts of interest

None declared.

References

1. Hofman MS, Hicks RJ. How we read oncologic FDG PET/CT. Cancer Imaging. 2016;16(1):35.

2. Finelli PF. Diagnostic approach to restricted-diffusion patterns on MR imaging. Neurol Clin Pract. 2012;2(4):287-293.

3. Hamet P, Tremblay J. Artificial intelligence in medicine. Metabolism. 2017;69S:S36-S40.

4. Morgan MB, Mates JL. Applications of artificial intelligence in breast imaging. Radiol Clin North Am. 2021;59(1):139-148.

5. Avanzo M, Wei L, Stancanello J, et al. Machine and deep learning methods for radiomics. Med Phys. 2020;47(5):e185-e202.

6. Papanikolaou N, Matos C, Koh DM. How to develop a meaningful radiomic signature for clinical use in oncologic patients. Cancer Imaging. 2020;20(1):33.

7. Yushkevich PA, Piven J, Hazlett HC, et al. User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. Neuroimage. 2006;31(3):1116-1128.

8. Fortin JP, Parker D, Tunc B, et al. Harmonization of multi-site diffusion tensor imaging data. Neuroimage. 2017;161:149-170.

9. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007;8(1):118-127.

10. Guiot J, Vaidyanathan A, Deprez L, et al. A review in radiomics: making personalized medicine a reality via routine imaging. Med Res Rev. 2022;42(1):426-440.

11. van Timmeren JE, Cester D, Tanadini-Lang S, Alkadhi H, Baessler B. Radiomics in medical imaging—"how-to" guide and critical reflection. Insights Imaging. 2020;11(1):91.

12. Li Y, Ammari S, Balleyguier C, Lassau N, Chouzenoux E. Impact of preprocessing and harmonization methods on the removal of scanner effects in brain MRI radiomic features. Cancers (Basel). 2021;13(12).

13. Song J, Zhang Z. Brain tissue segmentation and bias field correction of MR image based on spatially coherent FCM with nonlocal constraints. Comput Math Methods Med. 2019;2019:4762490.

14. Haarburger C, Muller-Franzes G, Weninger L, Kuhl C, Truhn D, Merhof D. Radiomics feature reproducibility under inter-rater variability in segmentations of CT images. Sci Rep. 2020;10(1):12688.

15. Gering D, Kotrotsou A, Young-Moxon B, et al. Measuring efficiency of semi-automated brain tumor segmentation by simulating user interaction. Front Comput Neurosci. 2020;14:32.

16. Ashraf AB, Gavenonis SC, Daye D, Mies C, Rosen MA, Kontos D. A multichannel Markov random field framework for tumor segmentation with an application to classification of gene expression-based breast cancer recurrence risk. IEEE Trans Med Imaging. 2013;32(4):637-648.

17. Rizzo S, Botta F, Raimondi S, et al. Radiomics: the facts and the challenges of image analysis. Eur Radiol Exp. 2018;2(1):36.

18. Parmar C, Rios Velazquez E, Leijenaar R, et al. Robust radiomics feature quantification using semiautomatic volumetric segmentation. PLoS One. 2014;9(7):e102107.

19. Cheng PM, Montagnon E, Yamashita R, et al. Deep learning: an update for radiologists. Radiographics. 2021;41(5):1427-1445.

20. Stanzione A, Cuocolo R, Ugga L, et al. Oncologic imaging and radiomics: a walkthrough review of methodological challenges. Cancers (Basel). 2022;14(19).

21. Lucia F, Visvikis D, Vallieres M, et al. External validation of a combined PET and MRI radiomics model for prediction of recurrence in cervical cancer patients treated with chemoradiotherapy. Eur J Nucl Med Mol Imaging. 2019;46(4):864-877.

22. Singh A, Horng H, Roshkovan L, et al. Development of a robust radiomic biomarker of progression-free survival in advanced non-small cell lung cancer patients treated with first-line immunotherapy. Sci Rep. 2022;12(1):9993.

23. Zwanenburg A, Vallieres M, Abdalah MA, et al. The image biomarker standardization initiative: standardized quantitative radiomics for high-throughput image-based phenotyping. Radiology. 2020;295(2):328-338.

24. Hatt M, Tixier F, Pierce L, Kinahan PE, Le Rest CC, Visvikis D. Characterization of PET/CT images using texture analysis: the past, the present… any future? Eur J Nucl Med Mol Imaging. 2017;44(1):151-165.

25. Aerts HJ. The potential of radiomic-based phenotyping in precision medicine: a review. JAMA Oncol. 2016;2(12):1636-1642.

26. van Griethuysen JJM, Fedorov A, Parmar C, et al. Computational radiomics system to decode the radiographic phenotype. Cancer Res. 2017;77(21):e104-e107.

27. Da-Ano R, Visvikis D, Hatt M. Harmonization strategies for multicenter radiomics investigations. Phys Med Biol. 2020;65(24):24TR02.

28. Orlhac F, Eertink JJ, Cottereau AS, et al. A guide to ComBat harmonization of imaging biomarkers in multicenter studies. J Nucl Med. 2022;63(2):172-179.

29. Horng H, Singh A, Yousefi B, et al. Generalized ComBat harmonization methods for radiomic features with multi-modal distributions and multiple batch effects. Sci Rep. 2022;12(1):4493.

30. Altman N, Krzywinski M. The curse(s) of dimensionality. Nat Methods. 2018;15(6):399-400.

31. Aerts HJ, Velazquez ER, Leijenaar RT, et al. Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nat Commun. 2014;5:4006.

32. van Timmeren JE, Leijenaar RTH, van Elmpt W, et al. Test-retest data for radiomics feature stability analysis: generalizable or study-specific? Tomography. 2016;2(4):361-365.

33. Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med. 2016;15(2):155-163.

34. Euler A, Laqua FC, Cester D, et al. Virtual monoenergetic images of dual-energy CT—impact on repeatability, reproducibility, and classification in radiomics. Cancers (Basel). 2021;13(18).

35. Baeßler B, Weiss K, Pinto Dos Santos D. Robustness and reproducibility of radiomics in magnetic resonance imaging: a phantom study. Invest Radiol. 2019;54(4):221-228.

36. Balagurunathan Y, Kumar V, Gu Y, et al. Test-retest reproducibility analysis of lung CT image features. J Digit Imaging. 2014;27(6):805-823.

37. Wu W, Parmar C, Grossmann P, et al. Exploratory study to identify radiomics classifiers for lung cancer histology. Front Oncol. 2016;6:71.

38. Jovic A, Brkic K, Bogunovic N. A review of feature selection methods with applications. In: 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Croatia. IEEE; 2015:1200-1205.

39. Parmar C, Grossmann P, Bussink J, Lambin P, Aerts H. Machine learning methods for quantitative radiomic biomarkers. Sci Rep. 2015;5:13087.

40. Suzuki K. Pixel-based machine learning in medical imaging. Int J Biomed Imaging. 2012;2012:792079.

41. Schmidt RM. Recurrent Neural Networks (RNNs): a gentle introduction and overview. arXiv.org, 2019.

42. Cho K, van Merrienboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar. Association for Computational Linguistics; 2014:1724-1734.

43. Lang N. Using convolutional neural network for image classification. Medium, 2021. Accessed January 10, 2022. https://towardsdatascience.com/using-convolutional-neural-network-for-image-classification-5997bfd0ede4.

44. Li F, Qiao H, Zhang B. Discriminatively boosted image clustering with fully convolutional auto-encoders. Pattern Recognition. 2018;83:161-173.

45. Kingma PD, Rezende DJ, Mohamed S, Welling M. Semi-supervised learning with deep generative models. arXiv 1406.5298, 2014.

46. Van Belle V, Pelckmans K, Van Huffel S, Suykens JA. Support vector methods for survival analysis: a comparison between ranking and regression approaches. Artif Intell Med. 2011;53(2):107-118.

47. Uno H, Cai T, Pencina MJ, D'Agostino RB, Wei LJ. On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat Med. 2011;30(10):1105-1117.

48. Bizzego A, Bussola N, Salvalai D, et al. Integrating deep and radiomics features in cancer bioimaging. In: 2019 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), July 9-11, 2019, Italy. 2019:1-8.

49. Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS. Random survival forests. Ann Appl Stat. 2008;2(3).

50. Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: machine learning in Python. arXiv 1201.0490, 2012.

51. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. JAIR. 2002;16:321-357.

52. Lee SE, Han K, Kwak JY, Lee E, Kim EK. Radiomics of US texture features in differential diagnosis between triple-negative breast cancer and fibroadenoma. Sci Rep. 2018;8(1):13546.

53. Wang J, Kato F, Oyama-Manabe N, et al. Identifying triple-negative breast cancer using background parenchymal enhancement heterogeneity on dynamic contrast-enhanced MRI: a pilot radiomics study. PLoS One. 2015;10(11):e0143308.

54. Ward JH. Hierarchical grouping to optimize an objective function. J Am Stat Assoc. 1963;58(301):236-244.

55. Bland JM, Altman DG. Survival probabilities (the Kaplan-Meier method). BMJ. 1998;317(7172):1572.

56. Cox DR. Regression models and life-tables. J R Stat Soc Series B (Methodol). 1972;34(2):187-202.

57. Schober P, Vetter TR. Kaplan-Meier curves, log-rank tests, and Cox regression for time-to-event data. Anesth Analg. 2021;132(4):969-970.

58. Chen PT, Chang D, Yen H, et al. Radiomic features at CT can distinguish pancreatic cancer from noncancerous pancreas. Radiol Imaging Cancer. 2021;3(4):e210010.

59. Mukherjee S, Patra A, Khasawneh H, et al. Radiomics-based machine-learning models can detect pancreatic cancer on prediagnostic computed tomography scans at a substantial lead time before clinical diagnosis. Gastroenterology. 2022;163(5):1435-1446.e3.

60. Grosu S, Wesp P, Graser A, et al. Machine learning-based differentiation of benign and premalignant colorectal polyps detected with CT colonography in an asymptomatic screening population: a proof-of-concept study. Radiology. 2021;299(2):326-335.

61. Stoyanova R, Takhar M, Tschudi Y, et al. Prostate cancer radiomics and the promise of radiogenomics. Transl Cancer Res. 2016;5(4):432-447.

62. Khalvati F, Wong A, Haider MA. Automated prostate cancer detection via comprehensive multi-parametric magnetic resonance imaging texture feature models. BMC Med Imaging. 2015;15:27.

63. Algohary A, Viswanath S, Shiradkar R, et al. Radiomic features on MRI enable risk categorization of prostate cancer patients on active surveillance: preliminary findings. J Magn Reson Imaging. 2018.

64. Li Q, Bai H, Chen Y, et al. A fully-automatic multiparametric radiomics model: towards reproducible and prognostic imaging signature for prediction of overall survival in glioblastoma multiforme. Sci Rep. 2017;7(1):14331.

65. Ibragimov B, Xing L. Segmentation of organs-at-risks in head and neck CT images using convolutional neural networks. Med Phys. 2017;44(2):547-557.

66. Lustberg T, van Soest J, Gooding M, et al. Clinical evaluation of atlas and deep learning based automatic contouring for lung cancer. Radiother Oncol. 2018;126(2):312-317.

67. Yuan Y, Qin W, Buyyounouski M, et al. Prostate cancer classification with multiparametric MRI transfer learning model. Med Phys. 2019;46(2):756-765.

68. Antropova N, Huynh BQ, Giger ML. A deep feature fusion methodology for breast cancer diagnosis demonstrated on three imaging modality datasets. Med Phys. 2017;44(10):5162-5171.

69. Sapate SG, Mahajan A, Talbar SN, Sable N, Desai S, Thakur M. Radiomics based detection and characterization of suspicious lesions on full field digital mammograms. Comput Methods Programs Biomed. 2018;163:1-20.

70. Ferreira Junior JR, Koenigkam-Santos M, Cipriano FEG, Fabro AT, Azevedo-Marques PM. Radiomics-based features for pattern recognition of lung cancer histopathology and metastases. Comput Methods Programs Biomed. 2018;159:23-30.

71. Dagogo-Jack I, Shaw AT. Tumour heterogeneity and resistance to cancer therapies. Nat Rev Clin Oncol. 2018;15(2):81-94.

72. Sun R, Limkin EJ, Vakalopoulou M, et al. A radiomics approach to assess tumour-infiltrating CD8 cells and response to anti-PD-1 or anti-PD-L1 immunotherapy: an imaging biomarker, retrospective multicohort study. Lancet Oncol. 2018;19(9):1180-1191.

73. Hao H, Zhou Z, Li S, et al. Shell feature: a new radiomics descriptor for predicting distant failure after radiotherapy in non-small cell lung cancer and cervix cancer. Phys Med Biol. 2018;63(9):095007.

74. Crispin-Ortuzar M, Apte A, Grkovski M, et al. Predicting hypoxia status using a combination of contrast-enhanced computed tomography and [18F]-fluorodeoxyglucose positron emission tomography radiomics features. Radiother Oncol. 2018;127(1):36-42.

75. Lee J, Narang S, Martinez JJ, Rao G, Rao A. Associating spatial diversity features of radiologically defined tumor habitats with epidermal growth factor receptor driver status and 12-month survival in glioblastoma: methods and preliminary investigation. J Med Imaging (Bellingham). 2015;2(4):041006.

76. Zhang Z, Yang J, Ho A, et al. A predictive model for distinguishing radiation necrosis from tumour progression after gamma knife radiosurgery based on radiomic features from MR images. Eur Radiol. 2018;28(6):2255-2263.

77. Chitalia RD, Rowland J, McDonald ES, et al. Imaging phenotypes of breast cancer heterogeneity in preoperative breast dynamic contrast enhanced magnetic resonance imaging (DCE-MRI) scans predict 10-year recurrence. Clin Cancer Res. 2020;26(4):862-869.

78. Huynh E, Coroller TP, Narayan V, et al. CT-based radiomic analysis of stereotactic body radiation therapy patients with lung cancer. Radiother Oncol. 2016;120(2):258-266.

79. Nie K, Shi L, Chen Q, et al. Rectal cancer: assessment of neoadjuvant chemoradiation outcome based on radiomics of multiparametric MRI. Clin Cancer Res. 2016;22(21):5256-5264.

80. Cuocolo R, Stanzione A, Ponsiglione A, et al. Prostate MRI technical parameters standardization: a systematic review on adherence to PI-RADSv2 acquisition protocol. Eur J Radiol. 2019;120:108662.

81. Kim H, Park CM, Lee M, et al. Impact of reconstruction algorithms on CT radiomic features of pulmonary tumors: analysis of intra- and inter-reader variability and inter-reconstruction algorithm variability. PLoS One. 2016;11(10):e0164924.

82. Meyer M, Ronald J, Vernuccio F, et al. Reproducibility of CT radiomic features within the same patient: influence of radiation dose and CT reconstruction settings. Radiology. 2019;293(3):583-591.

83. Joskowicz L, Cohen D, Caplan N, Sosna J. Inter-observer variability of manual contour delineation of structures in CT. Eur Radiol. 2019;29(3):1391-1399.

84. Perez-Morales J, Tunali I, Stringfield O, et al. Peritumoral and intratumoral radiomic features predict survival outcomes among patients diagnosed in lung cancer screening. Sci Rep. 2020;10(1):10528.

85. Varoquaux G, Cheplygina V. Machine learning for medical imaging: methodological failures and recommendations for the future. NPJ Digit Med. 2022;5(1):48.

86. Willemink MJ, Koszek WA, Hardell C, et al. Preparing medical imaging data for machine learning. Radiology. 2020;295(1):4-15.

87. Zhang J, Saha A, Zhu Z, Mazurowski MA. Hierarchical convolutional neural networks for segmentation of breast tumors in MRI with application to radiogenomics. IEEE Trans Med Imaging. 2019;38(2):435-447.

88. Rezaei M, Nappi JJ, Lippert C, Meinel C, Yoshida H. Generative multi-adversarial network for striking the right balance in abdominal image segmentation. Int J Comput Assist Radiol Surg. 2020;15(11):1847-1858.

89. Haarburger C, Schock J, Truhn D, et al. Radiomic feature stability analysis based on probabilistic segmentations. In: 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI). 2020:1188-1192.

90. Siddique N, Paheding S, Elkin CP, Devabhaktuni V. U-Net and its variants for medical image segmentation: a review of theory and applications. IEEE Access. 2021;9:82031-82057.

91. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015. Lecture Notes in Computer Science, vol 9351. Springer, Cham; 2015:234-241.

92. Sheller MJ, Edwards B, Reina GA, et al. Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Sci Rep. 2020;10(1):12598.

93. Tresp V, Marc Overhage J, Bundschus M, Rabizadeh S, Fasching PA, Yu S. Going digital: a survey on digitalization and large-scale data analytics in healthcare. Proc IEEE. 2016;104(11):2180-2206.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.