Abstract

Recent developments in deep learning have demonstrated its feasibility in liver malignancy diagnosis using ultrasound (US) images. However, most of these methods require manual selection and annotation of US images by radiologists, which limits their practical application. On the other hand, US videos provide more comprehensive morphological information about liver masses and their relationships with surrounding structures than US images, potentially leading to a more accurate diagnosis. Here, we developed a fully automated artificial intelligence (AI) pipeline that imitates the workflow of radiologists for detecting liver masses and diagnosing liver malignancy. In this pipeline, we designed an automated mass-guided strategy that uses segmentation information to direct the diagnostic models to focus on liver masses, thus increasing diagnostic accuracy. The diagnostic models for US videos utilize bi-directional convolutional long short-term memory modules with an attention-boosted module to learn and fuse spatiotemporal information from consecutive video frames. Using a large-scale dataset of 50 063 US images and video frames from 11 468 patients, we developed and tested the AI pipeline and investigated its applications. A dataset of annotated US images is available at https://doi.org/10.5281/zenodo.7272660.

Introduction

Liver malignancy is the fourth leading cause of cancer-related death worldwide and ranks sixth in terms of incident cases [1]. China alone accounts for more than half of all liver malignancy-related deaths worldwide [2]. Early detection and diagnosis of liver malignancy, as well as timely treatment, are crucial for improving patient survival. Ultrasound (US) is a flexible, safe, low-cost and real-time examination tool that employs the pulse-echo principle to produce an anatomical tomogram and detect abnormalities such as liver masses [3]. It is commonly used as the first-line liver imaging method for the monitoring, screening and diagnosis of liver malignancy [4]. Despite its extensive usage, the accuracy of US-based detection of liver malignancy varies widely [5, 6], owing to the fact that US is highly dependent on the expertise, experience and attention to detail of radiologists [2]. Therefore, it is vital to develop computer-aided diagnosis systems to help radiologists improve diagnostic accuracy.

Recent developments of deep learning models in artificial intelligence (AI) [7] have demonstrated their feasibility in medical imaging [8, 9, 10], including diagnosing liver malignancy using US images [11]. However, a fully automated AI pipeline that is robust to varying US image conditions and meets the standards of actual clinical application is lacking. Many previous studies used small datasets [12–14], lacked external testing to evaluate their models’ robustness and reliability, or required radiologists to manually select images from US videos and annotate mass regions [15–17], hence limiting their clinical applications. A distinct challenge is the heterogeneity of US images: liver images exhibit varying levels of brightness and noise, diverse liver shapes and various sizes and locations of liver masses. This heterogeneity makes it difficult to develop a robust AI model for the detection and diagnosis of liver malignancy. Another challenge is the opacity of the decision-making process of deep learning models, which hinders the translation of AI systems to clinical applications.

Meanwhile, although studies have reported AI models with considerable accuracy for detecting liver malignancy, the majority of these models were developed on static US images that were manually selected by radiologists [14, 15, 18–20]. In static US images, however, the overall morphology of liver masses and their relationships with surrounding structures, which are critical for radiologists in making a diagnosis, are most likely lost [21]. US videos, on the other hand, can provide more comprehensive information on the morphology and texture of liver masses, which AI models can employ to achieve a more accurate diagnosis of liver malignancy [22]. However, not all video frames are suitable for diagnostic analysis; radiologists always search for frames that clearly display liver masses. Therefore, a comprehensive model for US videos should be designed to maximize their advantages while limiting their disadvantages.

In this study, we aim to develop a comprehensive AI pipeline that imitates the workflow of radiologists for liver malignancy diagnosis using US images and video frames. This pipeline is a hierarchical, fully automated system that integrates deep learning segmentation and classification methods to perform scan-liver segmentation, mass detection, mass segmentation and diagnostic analysis (Figure 1A). In this pipeline, we designed an automated mass-guided strategy to incorporate mass segmentation information into the diagnostic network so that the diagnostic models can focus on the mass regions in the US images, thus increasing prediction accuracy and making the results more interpretable. On top of this, we took advantage of the bi-directional convolutional long short-term memory (BConvLSTM) network [23], which is capable of extracting spatiotemporal information from US videos, and developed Attention-Boosted BConvLSTM-based diagnostic models. Not only can the models learn morphological information about liver masses and their relationships with surrounding structures from consecutive frames, but they also pay particular attention to key frames that clearly show liver masses.

Figure 1. The overview of the AI pipeline. (A) The AI pipeline for liver malignancy diagnosis. First, a segmentation framework was applied to US images to hierarchically segment scan regions and livers. Then, a CNN detection model and a segmentation model were applied to the segmented liver images to detect and segment liver masses, respectively. Finally, the diagnostic models used a combination of US images and clinical factors for (i) liver malignancy diagnosis and subtype predictions, (ii) comparison with the performance of radiologists and (iii) development of new models for US videos. (B) A three-stage segmentation framework. This framework was trained on manually annotated images to segment scan regions from images, livers from other organs and masses from livers. The trained networks were integrated into the AI pipeline in (A). See also Supplementary Figure S2a for a more detailed model description. (C) An Attention-Boosted BConvLSTM-based classification model for US videos. For every t consecutive US frames, the CNN models’ outputs from the AI pipeline were input into the BConvLSTM layers, and then the Attention-Boosted module learned and fused spatiotemporal information for liver malignancy diagnosis. See also Supplementary Figure S2b for a more detailed model description.

The AI pipeline was trained, validated and internally tested on a large-scale cohort of 43 746 US images covering a variety of US equipment, examination settings and histological subtypes. It was externally tested on two datasets with a total of 6317 images, which demonstrated its robustness and efficiency across a variety of US imaging conditions and liver mass types. Our model outperformed junior radiologists and was comparable to mid-level radiologists in terms of accuracy on an independent cohort. To investigate the potential clinical applications of our AI pipeline, we simulated a scenario using consensus from both radiologists and the AI pipeline in decision-making. Moreover, experimental results showed that our video-based diagnostic models provide more accurate predictions than image-based models for diagnosing liver malignancy, increasing the area under the receiver operating characteristic curve (AUC) from 0.967 to 0.983 (with clinical factors) and from 0.943 to 0.966 (without clinical factors) at the patient level.

Materials and methods

Data collection

We constructed a large US dataset by combining data from three geographical regions in China: Guangzhou, a city in Guangdong province (the Guangzhou cohort); Foshan, a city also in Guangdong province (the Foshan cohort) and Yichang, a city in Hubei province (the Yichang cohort). A total of 50 063 US images from 11 468 patients in these three cohorts were obtained. We also collected serological examination results for patients with liver masses in both the Guangzhou and Foshan cohorts (Supplementary Tables S1 and S2). The US devices and definitions of liver masses are summarized in Supplementary Appendices 1.1 and 1.2, respectively. Ethics committee approvals were obtained at all participating institutions, and all participants signed a consent form.

To collect the US images and video clips, the examining physicians performed two-dimensional US scans of the liver according to the routine procedure. Each video was clipped while the mass appeared in the visual field. Meanwhile, the images with the major sections of the masses were kept if intrahepatic masses were found and clearly displayed. All US images and clinical factors were first de-identified to remove any patient-related information. A subset of 735 US images was annotated for segmentation model development, including 435 images of malignant masses, 200 images of benign masses and 100 images of normal livers. Two radiologists with >10 years of experience annotated and verified the data, respectively. In the event of a disagreement, they discussed the case and reached a consensus.

Implementation of the AI pipeline

As shown in Figure 1A, the AI pipeline consisted of four components: scan-liver segmentation, mass detection, mass segmentation and diagnostic analysis of liver masses. First, the scan-liver segmentation model received US images as input and produced normalized liver images. Second, the model for liver mass detection predicted whether or not the liver images contained masses. Then, the liver images containing masses were processed for mass segmentation. Finally, the mass-guided deep learning models integrated liver images with patients’ clinical factors to conduct diagnostic analysis, including malignant versus benign liver mass classification and histological subtype prediction. This design reduced several kinds of noise in the original images, such as US background noise, human operation biases and device-dependent variations, while also providing greater generalization. The following sections detail the development of the three-stage liver-mass segmentation models (Figure 1B and Supplementary Figure S2a) for the first and third components, as well as the implementation of the classification models for the second and fourth components.
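
To make the control flow concrete, the sketch below chains the four components in the order just described, including the fallback to the clinical-factor-only C-model introduced under ‘Development of diagnostic models’ when no mass is detected. All names are hypothetical stand-ins for the trained models rather than the authors’ implementation.

```python
import numpy as np

def run_pipeline(us_image, clinical, seg_scan, seg_liver, seg_mass,
                 detect_mass, lmc_net, c_model):
    """Hypothetical orchestration of the four pipeline components; every
    model argument is a callable placeholder for a trained network."""
    # Component 1: scan-liver segmentation (scan region first, then liver)
    liver = seg_liver(seg_scan(us_image))

    # Component 2: mass detection on the normalized liver image
    if not detect_mass(liver):
        # No mass detected: fall back to the clinical-factor-only C-model
        return c_model(clinical) if clinical is not None else 0.0

    # Component 3: mass segmentation; component 4: mass-guided diagnosis,
    # assuming liver is (C, H, W) and the mass map is stacked as an
    # additional input channel
    mass = seg_mass(liver)
    return lmc_net(np.concatenate([liver, mass[None]], axis=0), clinical)
```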

Development of three-stage segmentation framework

A raw US image contains a background area outside the US scan region and the echogenicity of other unrelated organs and tissues, which may interfere with the classification models for liver malignancy diagnosis. We propose an automated three-stage segmentation framework to solve this problem and ensure device compatibility and performance. As shown in Figure 1B and Supplementary Figure S2a, the framework includes three stages: segmenting US scan regions from images (Stage 1), livers from the scan regions (Stage 2) and masses from livers (Stage 3). Together, they decomposed a multi-class segmentation problem into a sequence of three binary segmentation problems according to sub-region hierarchy.

At the first stage, we down-sampled the raw US images to a low resolution of |$128\times 128$| and segmented the scan regions using the scan-region segmentation models. We then calculated the bounding boxes with the segmentation results and cropped them from the original images, thereby removing the areas outside the scan regions. At the second stage, we segmented the liver regions from the cropped images using the liver segmentation models, thereby removing other organs, background and noise. At this stage, the inputs (scan regions) were normalized to |$256\times 256$| to balance computational cost and accuracy. The segmented liver regions were then cropped from the original images and normalized to |$512\times 512$|. At the third stage, the mass segmentation models took the normalized liver images to segment liver masses for diagnostic analysis.
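
The following sketch illustrates this coarse-to-fine cascade under the stated resolutions; the segmentation models are placeholder callables returning binary masks, and the helper names are ours, not the authors’.

```python
import cv2
import numpy as np

def crop_to_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Crop an image to the bounding box of a (possibly lower-resolution,
    assumed non-empty) predicted mask."""
    mask_full = cv2.resize(mask, image.shape[:2][::-1],
                           interpolation=cv2.INTER_NEAREST)
    ys, xs = np.nonzero(mask_full)
    return image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

def three_stage_segment(raw: np.ndarray, scan_model, liver_model, mass_model):
    """Coarse-to-fine cascade: scan region (128 x 128), liver (256 x 256),
    then mass segmentation on the normalized liver image (512 x 512)."""
    # Stage 1: locate the US scan region at low resolution
    scan = crop_to_mask(raw, scan_model(cv2.resize(raw, (128, 128))))

    # Stage 2: segment the liver within the cropped scan region
    liver = crop_to_mask(scan, liver_model(cv2.resize(scan, (256, 256))))

    # Stage 3: segment masses on the normalized 512 x 512 liver image
    liver = cv2.resize(liver, (512, 512))
    return liver, mass_model(liver)
```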

We assessed three widely used deep learning semantic segmentation models as the backbone for the framework: Fully Convolutional Network (FCN) [24], U-Net [25] and DeepLabV3 [26]. FCN consists of multiple convolutional and max-pooling layers, followed by up-sampling layers to identify pixel-wise labels and predict segmentation masks. Compared with FCN, U-Net adds horizontal concatenation operations that combine high-resolution features in the contracting path with the up-sampled output. In this way, the successive convolution layers can learn to assemble more precise outputs and increase localization accuracy. DeepLabV3 utilizes the Atrous Spatial Pyramid Pooling module to probe convolutional features at multiple scales, thus boosting the model’s segmentation performance at multiple scales. To improve their generalizability, we pretrained the models on the Microsoft Common Objects in Context (COCO) dataset [27], which is a large-scale semantic segmentation dataset with 2.5 million labeled instances in 328 000 images. Moreover, we used data augmentation that included rotation, brightness adjustment, horizontal/vertical flips and elastic deformations during the training stage to allow the models to learn slight variations of images. The training settings are summarized in Supplementary Appendix 1.3.
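
As a minimal sketch of this setup, the snippet below loads a COCO-pretrained DeepLabV3 from torchvision (with a ResNet-50 backbone, an assumption, since the exact checkpoint is not stated here) and composes augmentations of the kinds listed above with illustrative parameters; in practice the geometric transforms must be applied jointly to each image and its mask.

```python
import torch.nn as nn
from torchvision import transforms
from torchvision.models.segmentation import (
    DeepLabV3_ResNet50_Weights, deeplabv3_resnet50)

# COCO-pretrained DeepLabV3; the final classifier layer is replaced to
# predict two classes (background versus target region).
model = deeplabv3_resnet50(
    weights=DeepLabV3_ResNet50_Weights.COCO_WITH_VOC_LABELS_V1)
model.classifier[4] = nn.Conv2d(256, 2, kernel_size=1)

# Rotation, brightness adjustment, flips and elastic deformation, as in
# the training recipe above (parameter values are illustrative).
augment = transforms.Compose([
    transforms.RandomRotation(degrees=10),
    transforms.ColorJitter(brightness=0.2),
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.ElasticTransform(alpha=30.0),
])
```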

Development of diagnostic models

To simulate the clinical pathways of US examination, we developed a hierarchical diagnostic system that integrated segmentation networks into classification networks. The classification networks included two sequential diagnostic procedures to address common clinical scenarios. First, a ‘mass detection’ network is called to differentiate between US images with masses (abnormal) and those without (normal). Second, a ‘diagnostic analysis’ network is called to classify liver masses as malignant or benign and perform histological subtype prediction.

For the ‘mass detection’ network, segmented liver images from Stage 2 of the segmentation framework were first resized to |$512\times 512$| and then analyzed by a DenseNet121-based convolutional neural network (CNN) model to differentiate between normal and abnormal livers. For comparison, we also developed a CNN model using the original US images, without liver segmentation, as inputs. We trained, validated and internally tested the classification models on the US image data from the Guangzhou cohort, with a random patient-level split of 70%:10%:20%.

Various combinations of clinical information were used to develop the ‘diagnostic analysis’ models to classify liver masses as malignant or benign. Specifically, using the segmented masses from Stage 3 of the segmentation framework, we proposed an automated mass-guided strategy that incorporated the mass segmentation information into the diagnostic network, which classified liver masses without (LM-Net) or with clinical factors (LMC-Net). In detail, we added the mass segmentation regions on liver images as an additional input channel to the diagnostic network, thereby enhancing the network’s ability to detect significant mass features and utilize them for diagnosis. For LMC-Net, we incorporated clinical factors into this architecture with a fully connected layer. Features of clinical factors were then concatenated with the output features of the liver and mass branches and analyzed by two fully connected layers. After the fully connected layers, a softmax computation layer produced probabilities for the classification tasks. In this study, we intended to evaluate the diagnostic benefit of using both the images and the clinical factors. For this purpose, we developed two additional models: a liver image-only diagnostic model based on the CNN model (L-Net) and a machine learning (ML) classifier (C-model) using gradient-boosted decision trees on clinical factors. An additional benefit is that the clinical factor-only model (C-model) can make a diagnosis when no mass is detected.
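
One plausible reading of this architecture is sketched below, with the mass map entering as a fourth input channel of a single DenseNet121 backbone and the clinical factors joining through a fully connected branch; since the text also mentions separate liver and mass branches, the layer sizes and wiring here are illustrative assumptions, not the exact design.

```python
import torch
import torch.nn as nn
from torchvision.models import densenet121

class LMCNetSketch(nn.Module):
    """Illustrative mass-guided fusion: liver image (3 channels) plus mass
    segmentation map (1 channel), concatenated with clinical features."""

    def __init__(self, n_clinical: int, n_classes: int = 2):
        super().__init__()
        backbone = densenet121(weights=None)
        # Accept 4 input channels: RGB liver image + mass segmentation map
        backbone.features.conv0 = nn.Conv2d(4, 64, kernel_size=7,
                                            stride=2, padding=3, bias=False)
        backbone.classifier = nn.Identity()     # expose 1024-d features
        self.backbone = backbone
        self.clinical_fc = nn.Linear(n_clinical, 64)
        self.head = nn.Sequential(              # two FC layers, then softmax
            nn.Linear(1024 + 64, 256), nn.ReLU(),
            nn.Linear(256, n_classes),
        )

    def forward(self, liver_and_mask, clinical):
        img_feat = self.backbone(liver_and_mask)
        clin_feat = torch.relu(self.clinical_fc(clinical))
        logits = self.head(torch.cat([img_feat, clin_feat], dim=1))
        return torch.softmax(logits, dim=1)
```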

Explanation of decision-making

We used Gradient-weighted Class Activation Mapping (Grad-CAM) to discover how much each liver region in the US images contributed to the classification of malignant versus benign masses, as performed by the deep learning models. We applied Grad-CAM to the final convolutional layer of the CNN architectures to highlight the regions important for prediction.
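
A minimal Grad-CAM sketch is shown below: it hooks the chosen convolutional layer, weights its activations by the spatial mean of their gradients with respect to the target class score and rectifies the result. The model and layer arguments are placeholders for any CNN in the pipeline.

```python
import torch

def grad_cam(model, layer, x, target_class):
    """Return a normalized class activation map for one input batch x."""
    acts, grads = [], []
    h1 = layer.register_forward_hook(lambda m, i, o: acts.append(o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    model(x)[0, target_class].backward()               # gradient of the class score
    h1.remove(); h2.remove()
    weights = grads[0].mean(dim=(2, 3), keepdim=True)  # global-average gradients
    cam = torch.relu((weights * acts[0]).sum(dim=1))   # weighted feature maps
    return cam / cam.max().clamp(min=1e-8)             # scale to [0, 1]
```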

We also adopted the Shapley additive explanations (SHAP) method to illustrate the effect of clinical features on the C-model. SHAP is an effective method that provides explainability of the model, with the advantage of both local and global interpretability [28, 29].
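
The snippet below reproduces this analysis on toy data with the shap library, assuming the C-model is a scikit-learn gradient-boosted tree classifier; the feature names and data are synthetic placeholders.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic stand-in for the clinical-factor table (AFP, serum enzymes, ...)
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 4)), columns=["AFP", "GGT", "AST", "ALP"])
y = (X["AFP"] + 0.5 * X["GGT"] > 0).astype(int)

c_model = GradientBoostingClassifier().fit(X, y)
explainer = shap.TreeExplainer(c_model)
shap_values = explainer.shap_values(X)

# Instance-level view (as in Figure 4A/B) and global view (as in Figure 4C)
shap.force_plot(explainer.expected_value, shap_values[0], X.iloc[0], matplotlib=True)
shap.summary_plot(shap_values, X)
```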

Comparison with radiologists and performance enhancement by AI

Twelve radiologists were selected and separated into three groups based on their work experience: junior-level (fewer than 8 years), mid-level (8–12 years) and senior-level (>12 years), with four radiologists in each group. They independently made diagnoses for patients in the Foshan cohort according to the examined US images and clinical factors. In the same cohort, we employed the LMC-Net model to predict liver-malignancy probabilities.

We then investigated the potential clinical applications of LMC-Net in the diagnosis of liver malignancy. We simulated a scenario using consensus derived from both radiologists and our AI model, in which the AI model was deployed as a ‘second reader’ of the diagnostic decisions of radiologists [30]. We randomly divided the four senior radiologists and the four junior radiologists into four groups, each consisting of one senior and one junior radiologist. In each group, the junior radiologist served as the first reader. When the AI model agreed with the junior radiologist’s decision, the decision was considered final. In the event of a disagreement, the senior radiologist’s opinion was sought.
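
The decision rule of this simulation is simple enough to state directly; the function below mirrors it, with boolean malignancy calls as placeholders for the readers’ actual diagnoses.

```python
def consensus_diagnosis(junior: bool, ai: bool, senior: bool) -> tuple[bool, bool]:
    """Second-reader protocol: the junior radiologist reads first; if the
    AI model agrees, that decision is final, otherwise the senior
    radiologist arbitrates. Returns (decision, senior_consulted)."""
    if junior == ai:
        return junior, False   # consensus reached, no senior workload
    return senior, True        # disagreement escalated to the senior reader
```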

Histological subtype prediction

To assess the AI model’s performance in distinguishing more detailed histological subtypes, we selected two subsets from the Guangzhou cohort, one with malignant liver masses and the other with benign liver masses. The malignant subset included 915 images from 412 patients with hepatocellular carcinoma (HCC), 78 images from 36 patients with intrahepatic cholangiocarcinoma (ICC) and 457 images from 188 patients with metastases. The benign subset included 4123 images from 1527 patients with hemangioma, 5005 images from 1832 patients with liver cysts and 250 images from 113 patients with focal nodular hyperplasia (FNH). Malignant masses and FNHs were confirmed by biopsy or post-surgery pathology, whereas hemangiomas and cysts were confirmed by enhanced imaging. We employed a 5-fold cross-validation strategy to train and validate the LMC-Net model, with a random patient-level split of 80%:20% for training and validation in each fold.
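
A patient-level split like this can be implemented with grouped cross-validation, as in the sketch below with toy arrays; the use of scikit-learn’s GroupKFold is our illustration, not necessarily the authors’ tooling.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy data: 10 images from 6 patients; grouping by patient ID keeps all
# images of a patient in a single fold, preventing leakage across folds.
patient_ids = np.array([0, 0, 1, 2, 2, 3, 3, 4, 5, 5])
images = np.arange(10).reshape(-1, 1)
labels = np.array([0, 0, 1, 0, 0, 1, 1, 0, 1, 1])

for fold, (train_idx, val_idx) in enumerate(
        GroupKFold(n_splits=5).split(images, labels, groups=patient_ids)):
    print(f"fold {fold}: validation patients {sorted(set(patient_ids[val_idx]))}")
```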

Attention-boosted BConvLSTM-based models for US videos

In practice, radiologists examine and analyze multiple consecutive US video frames for morphological and texture information of liver masses, as well as other information such as mass sizes, to diagnose liver malignancy. To imitate radiologists’ decision-making process, we proposed Attention-Boosted BConvLSTM-based diagnostic models for US videos (Figure 1C and Supplementary Figure S2b), where the BConvLSTM network captures spatiotemporal information from US videos [31]. The BConvLSTM used two ConvLSTMs [32], recurrent layers designed for spatiotemporal data, to process the input video frames along forward and backward paths. Since the frames in the BConvLSTM-based model should not be regarded as equally important, we proposed an attention-boosted module to weight the frames by mass-attention values such that critical frames in US videos receive more attention. Given a sequence of t frames |${x}_1,\dots, {x}_i,\dots, {x}_t$| from a video, we utilized the softmax computation of the sizes of the mass regions within each frame to calculate the mass-attention values |${\alpha}_i$| as follows:

|$${\alpha}_i=\frac{\exp \left({s}_i+\varepsilon \right)}{\sum_{j=1}^t\exp \left({s}_j+\varepsilon \right)},$$|

where |${s}_i$| represents the proportion of the segmented mass region in the whole liver image, and |$\varepsilon$| was set to 0.01 to alleviate the adverse impact of segmentation errors. The attention-boosted module was added after the BConvLSTM layers. Let |${H}_1,\dots, {H}_t$| denote the hidden state tensors of the BConvLSTM layers. The output of the attention-boosted module |${Y}_t$| was calculated as the weighted sum of the hidden states and mass-attention values as follows:

|$${Y}_t=\sum_{i=1}^t{\alpha}_i{H}_i.$$|

In this study, we developed two Attention-Boosted BConvLSTM-based diagnostic models for US videos, one using only US videos (LM-VNet) and the other using a combination of US videos and clinical factors (LMC-VNet). We constructed the BConvLSTM-based diagnostic network using the backbone of LM-Net followed by two BConvLSTM layers, each with a kernel size of |$3\times 3$|. The LM-Net was pretrained on the US image dataset to address the problem of small-scale video data. Considering the strong similarity of adjacent frames, we sub-sampled one-third of the input video frames from the original 15 frames-per-second stream. For each frame, the LM-Net backbone extracted convolutional features and fed them into the BConvLSTM layers. For every 16 consecutive frames, the BConvLSTM layers extracted the spatiotemporal features as 16 corresponding hidden state tensors, and then the attention-boosted module and an average pooling layer compressed them into one output tensor. The output tensor was the input of the fully connected (FC) layer in the LM-VNet model, whereas in the LMC-VNet model, the output tensor concatenated with the features extracted from clinical factors was the input of the FC layer. A softmax activation function following the FC layer predicted the malignant probabilities. We calculated video-level malignant probabilities by taking a weighted sum of the predicted malignant probabilities from all frames, using the frame-level mass-attention values as weights. In comparison, we also evaluated the LM-Net and LMC-Net models by treating US video frames as individual images. We averaged the probabilities for each patient when there was more than one image/video per patient. The training settings are summarized in Supplementary Appendix 1.4.
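
The attention-boosted fusion itself reduces to a few lines, sketched below for the two equations above; the BConvLSTM layers are omitted (standard PyTorch has no built-in ConvLSTM), and the hidden states are assumed to be stacked along the first axis.

```python
import torch

def mass_attention(sizes: torch.Tensor, eps: float = 0.01) -> torch.Tensor:
    """Mass-attention values alpha_i: softmax over per-frame mass-region
    proportions s_i, with epsilon = 0.01 as in the text."""
    return torch.softmax(sizes + eps, dim=0)

def attention_boosted_pool(hidden: torch.Tensor, sizes: torch.Tensor) -> torch.Tensor:
    """Fuse hidden states H_1..H_t of shape (t, C, H, W) into a single
    output tensor Y_t as their attention-weighted sum."""
    alpha = mass_attention(sizes)                         # (t,)
    return (alpha.view(-1, 1, 1, 1) * hidden).sum(dim=0)  # (C, H, W)
```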

Evaluating the models

We used Intersection over Union (IoU) as the metric to evaluate segmentation performance (Supplementary Table S3). The IoU is the area of overlap between the predicted segmentation region and the ground truth divided by the area of their union. We used a variety of metrics to assess classification performance, including sensitivity, specificity, precision, accuracy and AUC. Sensitivity, specificity, precision and accuracy were determined at the operating point maximizing the Youden index. The confidence intervals for the difference between two values were calculated by the bootstrap method [33] with 1000 repeats. A two-sided permutation test with 10 000 trials was used to generate P-values for the difference [34], and a P-value < 0.05 was considered statistically significant. We drew smoothed receiver operating characteristic (ROC) curves using the pROC package [35].
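
Both evaluation recipes are standard; a compact sketch follows, with the ROC arrays assumed to come from a routine such as scikit-learn’s roc_curve.

```python
import numpy as np

def iou(pred: np.ndarray, truth: np.ndarray) -> float:
    """Intersection over Union between two binary masks."""
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return float(inter / union) if union else 1.0

def youden_threshold(fpr: np.ndarray, tpr: np.ndarray, thresholds: np.ndarray):
    """ROC operating point maximizing the Youden index
    J = sensitivity + specificity - 1."""
    return thresholds[np.argmax(tpr - fpr)]
```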

Results

Clinical characteristics

The Guangzhou cohort served as the training, validation and internal-testing dataset for our AI pipeline. It consisted of a US image dataset and a US video dataset, totaling 43 746 US images and video frames from 10 997 patients. The US image dataset comprised 25 087 images from 10 831 patients, including images of normal livers, benign masses and malignant masses. The US video dataset comprised 205 video clips containing 18 659 US frames captured from 166 patients during the appearance of a mass.

The Foshan and Yichang cohorts served as two external test datasets to assess the generalizability and applicability of our AI pipeline. The Foshan cohort also served as a prospective study, with 673 US images from 370 patients. The Yichang cohort was a US video dataset, which included 5644 frames from 101 patients captured during the appearance of a mass. Meanwhile, for each video clip, we also collected the US image that displayed the clearest mass. In this way, the video datasets of the Guangzhou and Yichang cohorts could be used to evaluate the performance of the AI pipeline on US videos as well as US images. Details are summarized in Table 1 and Supplementary Figure S1.

Table 1. Patient demographic statistics in the developmental/internal-testing dataset (Guangzhou) and external test datasets (Foshan and Yichang)

| | Image dataset (Guangzhou) | Video dataset (Guangzhou) | External test set 1 (Foshan) | External test set 2 (Yichang) |
| --- | --- | --- | --- | --- |
| Number of US images | 25 087 | 18 659 (205 clips) | 673 | 5644 (101 clips) |
| Number of patients | 10 831 | 166 | 370 | 101 |
| Age mean, years (std) | 46.46 (14.39) | 52.83 (13.97) | 48.63 (14.78) | 49.25 (16.22) |
| Male (%) | 6408 (59.16%) | 118 (71.08%) | 199 (53.78%) | 55 (54.46%) |
| Normal images | 10 241 | – | 245 | – |
| Benign images | 9549 | 8475 | 218 | 1708 |
| Malignant images | 5297 | 10 184 | 164 | 3936 |
| Normal patients (%) | 4284 (39.55%) | – | 73 (19.73%) | – |
| Benign patients (%) | 4569 (42.18%) | 103 (62.05%) | 218 (58.92%) | 45 (44.55%) |
| Malignant patients (%) | 1978 (18.26%) | 63 (37.95%) | 79 (21.35%) | 56 (55.45%) |

Performance of segmentation models

The results of the segmentation models for our three-stage segmentation framework are summarized in Supplementary Table S3. All models had at least 0.97 IoU for scan-region segmentation, 0.92 IoU for liver segmentation and 0.71 IoU for liver mass segmentation. DeepLabV3 performed the best with 0.988, 0.940 and 0.758 IoU in the US scan region, liver and mass segmentations, respectively. As a result of its superior overall performance, we chose DeepLabV3 as the backbone of our segmentation framework.

Figure 2 shows three examples of DeepLabV3 segmentation results for US scan regions, livers and liver masses, as well as comparisons to manual annotations. The AI framework’s ability to perform precise US image segmentations was clearly demonstrated in the nearly perfect agreements between human annotations and segmentations. As a result, the segmentation system alone might be used as a visualization tool for radiologists to highlight lesion areas.

Figure 2. Three examples illustrating the segmentation framework’s results. The first column displays the original US images. The second column displays the manually segmented US images. The third column displays the segmented US images generated by the AI segmentation framework. The fourth and fifth columns display saliency maps (using Grad-CAM) of the diagnostic models with (LM-Net) and without (L-Net) the mass-guided strategy, respectively. The colors indicate the regions that the AI models prioritize when performing malignancy diagnosis, with red indicating a greater contribution to the prediction results, white a moderate contribution and blue a minor contribution.

Performance of diagnostic models for US images

As shown in Figure 3A and Supplementary Table S4, our approach was able to detect abnormal livers from normal livers using segmented liver images with an AUC of 0.990 [95% confidence interval (95% CI): 0.986–0.992], a sensitivity of 95.1%, a specificity of 96.1%, a precision of 97.4% and an accuracy of 95.5%. In comparison, the CNN model using original US images without segmented livers as inputs had worse performance, with an AUC of 0.963 (95% CI: 0.960–0.966), a sensitivity of 87.9%, a specificity of 94.2%, a precision of 96.2% and an accuracy of 91.9%.

Figure 3. The performance of the diagnostic system on the Guangzhou cohort. (A) A comparison of the ROC curves for CNN models that detect liver masses using original US images and liver images. (B) A comparison of the ROC curves for four AI models using different combinations of clinical factors and US images for classifying benign versus malignant masses. C-model: an ML model using only clinical factors; L-Net: a deep learning diagnostic model using liver images; LM-Net: a deep learning diagnostic model using a combination of liver images and mass segmentation information; LMC-Net: a deep learning diagnostic model using a combination of liver images, mass segmentation information and clinical factors. (C) AUC values of the CNN model trained with different numbers of liver images for liver mass detection and (D) AUC values of the LMC-Net model trained with different numbers of liver images for liver malignancy diagnosis.

We evaluated the performance of the classification models for classifying liver masses as malignant or benign on the internal test dataset containing 20% of the patients with liver masses from the Guangzhou cohort; the results are shown in Figure 3B and Supplementary Table S5. The LM-Net model without clinical factors achieved an accuracy of 89.3% and an AUC of 0.940 (95% CI: 0.927–0.954), and the LMC-Net model with clinical factors achieved an accuracy of 91.5% and an AUC of 0.968 (95% CI: 0.960–0.975). In comparison, the clinical-factor-only C-model had an accuracy of 81.0% and an AUC of 0.885 (95% CI: 0.880–0.889), whereas the image-only L-Net had an accuracy of 85.2% and an AUC of 0.916 (95% CI: 0.907–0.924). These results demonstrated that the mass-guided strategy on liver images improved the diagnostic accuracy by 4.1% and that adding clinical factors improved the image-based diagnostic accuracy by 2.2%.

To evaluate the performance gains of our proposed models with increasing data size, we randomly sampled 500, 1000, 2000, 4000, 8000 and 16 000 images from our dataset for training a liver mass detection model, and 500, 1000, 2000, 4000 and 8000 images for training a liver malignancy diagnosis model. After training, we evaluated model performance on the internal-testing dataset. Each experiment was repeated five times, and the mean and standard deviation of the AUC values were reported. As shown in Figure 3C and D, performance improved as the number of training samples increased. Notably, the models achieved an AUC of 0.978 for liver mass detection and 0.955 for liver malignancy diagnosis once the training data exceeded 4000 images.

Explanation of decision-making

To examine the basis of the decision-making process of the AI models, we first applied a visual explanation algorithm called Grad-CAM [36] in conjunction with the CNN architecture model to profile and compare the attention regions of the liver images with (LM-Net) and without (L-Net) the mass-guided strategy. As shown in Figure 2, the saliency maps of mass-guided model (LM-Net) enabled a greater focus on the liver masses, which were the critical regions for diagnostic analysis, and thus produced a better classification accuracy.

Using the SHAP explainer on the C-model, we examined the significance of the clinical factors and displayed the results in Figure 4. Figure 4A and B shows the instance-level interpretations for patients with malignant and benign liver masses, respectively. Figure 4C and D illustrates the global feature attributions over the whole dataset. The alpha-fetoprotein (AFP) level was identified as the most significant factor in liver malignancy diagnosis. The serum enzyme levels, including γ-glutamyl transpeptidase, aspartate aminotransferase, alanine aminotransferase and alkaline phosphatase, also contributed substantially to the diagnosis. These findings are consistent with current knowledge, indicating that the model appropriately accounts for clinical factors.

Figure 4. The effects of clinical features on the ML model (C-model) as determined by SHAP. (A) Clinical feature contributions for a patient diagnosed with a malignant liver mass. The horizontal axis represents the prediction probability. Features contributing to an increase of the probability are highlighted in red, whereas those contributing to a decrease are highlighted in blue. (B) Clinical feature contributions for a patient diagnosed with a benign liver mass. (C) Distribution of the effects of each clinical feature on the global-level output. The colors represent the features’ values, with red as high and blue as low. Features to the left of the bar contribute negatively to the malignancy prediction, whereas features to the right contribute positively. (D) Average effect of each clinical feature. AFP: alpha-fetoprotein (ng/ml); ALB: albumin (g/L); GGT: gamma-glutamyl transferase (U/L); AST: aspartate aminotransferase (U/L); HBsAg: hepatitis B surface antigen; ALP: alkaline phosphatase (U/L); ALT: alanine aminotransferase (U/L); CEA: carcinoembryonic antigen (μg/L); TBIL: total bilirubin (μmol/L); DBIL: direct bilirubin (μmol/L); Sex: 1 (female), 2 (male).

Independent external testing of the AI models

We evaluated the AI models using two external datasets from geographically distinct regions (Foshan and Yichang).

Using the Foshan cohort, we first applied the CNN model to all 673 liver images from 370 patients to detect the existence of liver masses. As shown in Figure 5A, the model had a sensitivity of 86.7%, a specificity of 88.0% and an AUC of 0.945 (95% CI: 0.933–0.955). For 297 patients (382 images) with liver masses, we then applied the LMC-Net to classify these images as malignant or benign. As shown in Figure 5B, the model had a sensitivity of 82.7%, a specificity of 92.7% and an AUC of 0.928 (95% CI: 0.902–0.950). These results demonstrated robust performance of the AI models on external datasets.

Figure 5. The performance of the diagnostic system on the external test datasets and its comparison to radiologists. (A) The ROC curve for the CNN model using liver images to detect liver masses on the external test dataset (Foshan). (B) The ROC curve for the LMC-Net model for classifying benign versus malignant masses on the external test dataset (Foshan). The results include the mean diagnostic accuracies of junior, mid-level and senior radiologists and the consensus decision reached by radiologists and the AI model. (C) The ROC curve for the LM-Net for classifying benign versus malignant masses using the images from the Yichang cohort.

Using the Yichang cohort, we applied the LM-Net to classify 101 US images from 101 patients as malignant or benign. As shown in Figure 5C, the LM-Net model had a sensitivity of 85.4%, a specificity of 77.8% and an AUC of 0.885 (95% CI: 0.828–0.922). These results validated the generalizability of the AI model.

Comparison with radiologists and performance enhancement by AI

Comparison of LMC-Net with the judgement of 12 US radiologists on liver malignancy diagnosis using the Foshan cohort is shown in Figure 5B and Supplementary Table S8. Specifically, the sensitivity of the deep learning model was comparable to that of the mid-level radiologists (82.7% versus 82.6%, P > 0.05) at a respectable specificity of 92.7%, but significantly higher than that of junior radiologists (82.7% versus 75.6%, P < 0.001).

In the simulation study, the combination of human and AI resulted in overall performance that was better than that of senior radiologists alone (Accuracy: 91.3% versus 89.5%), while saving 79.6% of senior radiologists’ labor (Figure 5B, Supplementary Table S9). These results demonstrated that the AI model could improve the performance of junior radiologists and reduce the workload of senior radiologists.

Histological subtype prediction

The LMC-Net model was able to differentiate HCC from the other subtypes with an AUC of 0.796 (95% CI: 0.763–0.828), ICC from the other subtypes with an AUC of 0.692 (95% CI: 0.609–0.775) and metastases from the other subtypes with an AUC of 0.779 (95% CI: 0.741–0.812) (Figure 6A). For the benign subset, the LMC-Net model was able to differentiate FNH from the other subtypes with an AUC of 0.881 (95% CI: 0.848–0.912), liver cyst from the other subtypes with an AUC of 0.930 (95% CI: 0.923–0.937) and hemangioma from the other subtypes with an AUC of 0.903 (95% CI: 0.895–0.911) (Figure 6B). These results demonstrated the utility of AI models in histological subtype prediction.

Figure 6. The performance of the LMC-Net models for malignant and benign subtype classifications. (A) ROC curves for the malignant subtype classification. (B) ROC curves for the benign subtype classification.

AI performance for US videos

As shown in Figure 7A and Supplementary Table S6, the LM-VNet model without clinical factors had a sensitivity of 87.3%, a specificity of 91.0%, a precision of 87.9% and an AUC of 0.966 (95% CI: 0.955–0.977), whereas the LMC-VNet model with clinical factors had better performance with a sensitivity of 90.9%, a specificity of 93.5%, a precision of 92.5% and an AUC of 0.983 (95% CI: 0.972–0.991). In comparison, if we replaced the US videos with individual images and applied the LM-Net model and the LMC-Net model to these images, performance decreased. The LM-Net model had a sensitivity of 84.7%, a specificity of 88.8%, a precision of 87.1% and an AUC of 0.943 (95% CI: 0.924–0.958), whereas the LMC-Net model had a sensitivity of 88.1%, a specificity of 91.9%, a precision of 89.6% and an AUC of 0.967 (95% CI: 0.955–0.979). These results demonstrated the importance of spatiotemporal information between consecutive US frames, which should be incorporated into the AI models for more accurate diagnosis.

We validated the LM-VNet on the video dataset of the Yichang cohort. As shown in Figure 7B, the LM-VNet had a sensitivity of 86.0%, a specificity of 84.3%, a precision of 85.6% and an AUC of 0.901 (95% CI: 0.873–0.921), which demonstrated the robustness of the video-based model.

Figure 7. The performance comparison of the diagnostic system using US image and video data. (A) The ROC curves for four diagnostic models in classifying benign versus malignant masses using the developmental dataset (Guangzhou). LM-VNet: a video model using a combination of liver images and mass segmentation information. LMC-VNet: a video model using a combination of liver images, mass segmentation information and clinical factors. (B) The ROC curves for the LM-VNet model for classifying benign versus malignant masses in the external test dataset (Yichang).

We investigated the contributions of various modules in our model through an ablation study. For the clinical factors, the diagnostic models integrated with clinical factors (LMC-) outperformed the models without clinical factors (LM-) by roughly 2% in AUC, demonstrating that the clinical factors do contribute to the diagnosis. For the BConvLSTM module, the video diagnostic models based solely on the BConvLSTM module, without the Attention-Boosted module (w/o AB), outperformed the LM-Net and LMC-Net models that treated video frames as distinct images by roughly 1% in AUC. This result indicates that including spatiotemporal information in the diagnosis is advantageous. For the Attention-Boosted module, the LM-VNet and LMC-VNet models, both of which had the BConvLSTM and Attention-Boosted modules, outperformed the BConvLSTM (w/o AB) models by nearly 1% in AUC, indicating the importance of paying attention to the critical frames.

We compared our proposed methods with conventional ML solutions, TextureRF, TextureSVM and TextureANN [7], and with other deep learning methods, ModelLB and ModelLBC [15]. As shown in Supplementary Table S6, all deep learning methods outperformed the conventional ML methods because of their superior feature extraction ability, and our proposed models outperformed the previous deep learning models.

Figure 8 shows a video case of a malignant mass. When the liver masses were clearly visible in the frames such as Frames 16 and 27, the segmentation model was able to provide precise mass contours, and the diagnostic models predicted high malignant probabilities. However, when the liver masses were not clearly visible such as Frame 1, the clinical factors could make more contributions to diagnostic analysis. Overall, the video-based diagnostic models outperformed the image-based diagnostic models. Additional cases are shown in Supplementary Figure S3.

Figure 8. An example of a video case with HCC. (A) Malignant probabilities on all frames by four models. The image-based models (LM-Net* and LMC-Net*) produced malignant probabilities for each frame. The video-based models (LM-VNet and LMC-VNet) produced malignant probabilities based on information from both the current and previous frames. (B) Four samples of the video frames and their corresponding mass segmentation results by the mass segmentation model.

Conclusion and discussion

In this study, we developed an AI pipeline for fully automated liver malignancy screening and diagnosis using large-scale US datasets. The pipeline followed the clinical practice of US examination, including detecting and segmenting liver masses, classifying them as either malignant or benign and subsequently making histological subtype predictions. In this process, an automated mass-guided strategy was designed to incorporate segmentation information into the diagnostic networks. Serological examinations and other clinical factors were integrated with US images to make a comprehensive diagnosis. Moreover, we proposed attention-boosted BConvLSTM-based diagnostic models that improve diagnostic accuracy by imitating how radiologists examine US videos in the real world. The pipeline was evaluated on multiple cohorts and demonstrated high accuracy for detecting liver masses and differentiating between benign and malignant masses.

Independently, we developed a three-stage framework that segments target regions in the order of US scan regions, livers and masses, from larger to smaller (Figure 1B and Supplementary Figure S2a). Similar challenges occur in other medical image analyses, including CT scans, for which a multi-stage segmentation framework was proposed to solve this problem and ensure device compatibility and performance [37]. This design addressed major challenges in US image analysis to build a robust and generalizable AI system. Specifically, the three original US images in Figure 2 displayed varying levels of brightness, noise outside scan regions, diverse liver shapes and various mass sizes. The proposed framework was able to reduce variations in scan regions and the interference of extraneous noise by segmenting US scan regions, to remove irrelevant parts (organs) of images by segmenting livers and to provide localized liver masses for downstream diagnosis.

According to the diagnostic models’ classification results and saliency maps shown in the last two columns of Figure 2, the mass-guided strategy can direct the diagnostic model to focus on liver masses, boundaries and adjacent areas to produce the most accurate diagnoses. This is similar to how radiologists use information such as mass sizes, mass features and boundary features, to make diagnoses, thereby elevating confidence in the AI model’s predictions.

In clinical practice, it is critical to integrate various clinical data to make correct diagnoses [15]. The serological examination is an important reference point for radiologists when determining whether a mass is benign or malignant [38]. For example, patients with chronic liver disease who have elevated AFP levels are suggested to have an increased risk of HCC [39, 40]. Inspired by this, we developed multiple AI diagnostic models based on various sources of clinical data, as shown in Figure 3B. The combined information significantly improved diagnostic accuracy, increasing the AUC from 0.940 without serological examinations (LM-Net) to 0.968 with serological examinations (LMC-Net). This increase demonstrated that US images and serological examinations complemented one another in malignancy diagnosis. The external test on the Foshan cohort confirmed that the diagnostic accuracy (an AUC of 0.928 by LMC-Net in Figure 5B) was comparable to that of mid-level radiologists.

In real-world situations, radiologists identify liver masses by viewing US videos instead of a few static US images. US videos can provide more comprehensive morphological and texture information on liver masses and their relationships with surrounding structures [21]. In this study, we proposed Attention-Boosted BConvLSTM models based on US videos, which offer a greater potential for integration into existing US diagnostic systems and may provide better diagnostic accuracy in a real clinical setting. The attention modules assessed the quality of the frames and calculated their weights using the mass segmentation results. As shown in Figure 8, the segmentation model provided precise mass contours for frames showing clear masses, such as Frames 16 and 27, but struggled to segment the mass region when the liver mass was not clearly displayed, such as in Frames 1 and 20. Our AI models could integrate weighted features from current and previous frames to reduce the adverse impact of low-quality frames and provide a more accurate diagnosis.

In previous studies, Virmani et al. [41] adopted an SVM-based method on 56 US images from 56 patients to differentiate between HCC and normal cases, achieving 88.8% accuracy, with 90.0% sensitivity for detecting normal cases and 86.6% sensitivity for HCC cases. Xi et al. [12] developed deep learning models to differentiate benign from malignant focal liver lesions using 911 images from 596 patients, achieving an accuracy of 84%. Brehar et al. [7] compared deep learning models and conventional ML models on 1331 annotated images from 268 patients; the deep learning models achieved 0.95 AUC and 91% accuracy, outperforming the 0.72 AUC and 66% accuracy of the ML methods. Shen et al. [14] established a prediction model using a logistic regression algorithm on 266 patients to discriminate between malignant and benign liver lesions, achieving 0.942 AUC and 90.6% accuracy. In our study, we developed our models on a large-scale US image dataset of 25 087 images, and they outperformed these studies with 0.990 AUC and 95.5% accuracy for liver mass detection, and 0.968 AUC and 91.5% accuracy for liver malignancy diagnosis. Moreover, the robustness of our methods was validated on external datasets.

Several limitations of our study warrant additional investigation. First, owing to the limited number of patients with biopsy or post-surgery pathology results, we only investigated the models’ performance on three subtype classifications for the malignant and benign masses, respectively. More effort should be made in the future to classify patients into more comprehensive subtypes. Second, of all the proposed AI models, LMC-VNet provided the most accurate diagnosis. However, owing to the lack of serological examination results in the Yichang cohort, we did not test the LMC-VNet model on this external dataset. In the future, we hope to perform extensive tests on the model. Third, this study was conducted on US data from China, where hepatitis B-related liver malignancy accounts for the majority of liver malignancy cases. Our model may therefore be biased toward this type of malignancy. As more data are gathered from various geographical regions around the world [2], we anticipate that better models will be developed.

In summary, we developed an application of deep learning models to automate liver mass detection and classification. AI-assisted liver malignancy screening models have the potential to reduce medical costs, while improving screening efficiency and accuracy at all levels of health care, especially primary care. We have shown that our AI models can increase radiologists’ diagnostic accuracy, especially for less experienced radiologists, and may aid in the prognosis and treatment of patients with liver malignancy.

Data availability

The de-identified data are available at https://doi.org/10.5281/zenodo.7272660. The dataset, despite being open to public access, is subject to copyright. Any use of the data contained within this dataset must receive appropriate acknowledgement and credit.

Code availability

We provided the Python source code, which is available at https://github.com/AndlierXu/AI-liver-ultrasound/.

Key points
  • We proposed a fully automated AI pipeline that imitates the workflow of radiologists for detecting liver masses and diagnosing liver malignancy using a large-scale dataset of US images and videos.

  • We developed video-based diagnostic models that could naturally be integrated into existing US diagnostic systems. We demonstrated that they provided a higher diagnostic accuracy than image-based models in the clinical setting.

  • Our AI models can increase radiologists’ diagnostic accuracy, especially for less experienced radiologists, and may aid in the prognosis and treatment of patients with liver malignancy.

Acknowledgments

We would like to thank the anonymous reviewers for their valuable suggestions.

Funding

This work was supported by the National Key R&D Program of China (2021YFF1201303 and 2019YFB1404804), the National Natural Science Foundation of China (grants 61872218 and 61906105), the Guoqiang Institute of Tsinghua University, the Tsinghua University Initiative Scientific Research Program, the Beijing National Research Center for Information Science and Technology (BNRist) and the Tsinghua-Qingdao Institute of Data Science.

Yiming Xu is a PhD candidate in the Department of Computer Science and Technology at Tsinghua University. His research interests include clinical/medical informatics.

Bowen Zheng is a physician at The Third Affiliated Hospital of Sun Yat-Sen University. Her research interests include clinical/medical informatics.

Xiaohong Liu holds a PhD from the Department of Computer Science and Technology at Tsinghua University. His research interests include clinical/medical informatics.

Tao Wu is a physician at The Third Affiliated Hospital of Sun Yat-Sen University, whose research interests include clinical/medical informatics.

Jinxiu Ju holds a PhD from The Third Affiliated Hospital of Sun Yat-Sen University, whose research interests include clinical/medical informatics.

Shijie Wang holds a PhD from The Third Affiliated Hospital of Sun Yat-Sen University. His research interests include clinical/medical informatics.

Yufan Lian is a physician at The Third Affiliated Hospital of Sun Yat-Sen University, whose research interests include clinical/medical informatics.

Hongjun Zhang is a physician at The Third Affiliated Hospital of Sun Yat-Sen University, whose research interests include clinical/medical informatics.

Tong Liang is a physician at the Foshan Traditional Chinese Medicine Hospital, whose research interests include clinical/medical informatics.

Ye Sang is a physician at China Three Gorges University and Yichang Central People’s Hospital, whose research interests include clinical/medical informatics.

Rui Jiang is an Associate Professor in the Department of Automation and BNRist at Tsinghua University. His research interests include clinical/medical informatics and bioinformatics.

Guangyu Wang is a Professor in the School of Information and Communication Engineering at Beijing University of Posts and Telecommunications. Her research interests include clinical/medical informatics.

Jie Ren is a Professor at The Third Affiliated Hospital of Sun Yat-Sen University. Her research interests include clinical/medical informatics.

Ting Chen is a Professor in the Department of Computer Science and Technology, the Institute of Artificial Intelligence and BNRist at Tsinghua University. His research interests include clinical/medical informatics and bioinformatics.

References

1. Bray F, Ferlay J, Soerjomataram I, et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2018;68(6):394–424.

2. Akinyemiju T, Abera S, Ahmed M, et al. The burden of primary liver cancer and underlying etiologies from 1990 to 2015 at the global, regional, and national level: results from the global burden of disease study 2015. JAMA Oncol 2017;3(12):1683–91.

3. Wilkinson R. Principles of real-time two-dimensional B-scan ultrasonic imaging. J Med Eng Technol 1981;5(1):21–9.

4. Tchelepi H, Ralls PW. Ultrasound of focal liver masses. Ultrasound Q 2004;20(4):155–69.

5. Bolondi L. Screening for hepatocellular carcinoma in cirrhosis. J Hepatol 2003;39(6):1076–84.

6. Samoylova ML, Mehta N, Roberts JP, et al. Predictors of ultrasound failure to detect hepatocellular carcinoma. Liver Transpl 2018;24(9):1171–7.

7. Brehar R, Mitrea DA, Vancea F, et al. Comparison of deep-learning and conventional machine-learning methods for the automatic recognition of the hepatocellular carcinoma areas from ultrasound images. Sensors 2020;20(11):3085.

8. Yasaka K, Akai H, Abe O, et al. Deep learning with convolutional neural network for differentiation of liver masses at dynamic contrast-enhanced CT: a preliminary study. Radiology 2018;286(3):887–96.

9. Hu HT, Wang W, Chen LD, et al. Artificial intelligence assists identifying malignant versus benign liver lesions using contrast-enhanced ultrasound. J Gastroenterol Hepatol 2021;36(10):2875–83.

10. Marya NB, Powers PD, Fujii-Lau L, et al. Application of artificial intelligence using a novel EUS-based convolutional neural network model to identify and distinguish benign and malignant hepatic masses. Gastrointest Endosc 2021;93(5):1121–1130.e1.

11. Nishida N, Yamakawa M, Shiina T, et al. Current status and perspectives for computer-aided ultrasonic diagnosis of liver lesions using deep learning technology. Hepatol Int 2019;13(4):416–21.

12. Xi IL, Wu J, Guan J, et al. Deep learning for differentiation of benign and malignant solid liver lesions on ultrasonography. Abdominal Radiol 2021;46(2):534–43.

13. Hassan TM, Elmogy M, Sallam E-S. Diagnosis of focal liver diseases based on deep learning technique for ultrasound images. Arabian J Sci Eng 2017;42(8):3127–40.

14. Shen H, Lv G, Lin H, et al. Development of an ultrasound prediction model to discriminate between malignant and benign liver lesions. Ultrasound Med Biol 2020;46(4):952–8.

15. Yang Q, Wei J, Hao X, et al. Improving B-mode ultrasound diagnostic performance for focal liver lesions using deep learning: a multicentre study. EBioMedicine 2020;56:102777.

16. Xu SS-D, Chang CC, Su CT, et al. Classification of hepatocellular carcinoma and liver abscess by applying neural network to ultrasound images. Sensors Mater 2020;32(8):2659–753.

17. Nishida N, Yamakawa M, Shiina T, et al. Artificial intelligence (AI) models for the ultrasonographic diagnosis of liver tumors and comparison of diagnostic accuracies between AI and human experts. J Gastroenterol 2022;57(4):309–21.

18. Yamada A. Deep learning promotes B-mode ultrasound screening for focal liver lesions. EBioMedicine 2020;56:102814.

19. Hwang YN, Lee JH, Kim GY, et al. Classification of focal liver lesions on ultrasound images by extracting hybrid textural features and using an artificial neural network. Biomed Mater Eng 2015;26(s1):S1599–611.

20. Schmauch B, Herent P, Jehanno P, et al. Diagnosis of focal liver lesions from ultrasound using deep learning. Diagn Interv Imaging 2019;100(4):227–33.

21. Chen C, Wang Y, Niu J, et al. Domain knowledge powered deep learning for breast cancer diagnosis based on contrast-enhanced ultrasound videos. IEEE Trans Med Imaging 2021;40(9):2439–51.

22. Tesanic DM, Merz E. Artifacts in 3D prenatal sonography. Ultraschall in der Medizin–Eur J Ultrasound 2020;41(3):286–91.

23. Song H, Wang W, Zhao S, et al. Pyramid dilated deeper ConvLSTM for video salient object detection. In: Proceedings of the European Conference on Computer Vision (ECCV). Munich, Germany: Springer, Cham, 2018;715–31.

24. Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015;3431–40.

25. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Munich, Germany: Springer, 2015;234–41.

26. Chen L-C, Papandreou G, Schroff F, et al. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587, 2017.

27. Lin T-Y, et al. Microsoft COCO: common objects in context. In: European Conference on Computer Vision. Zurich, Switzerland: Springer, 2014;740–55.

28. Lundberg SM, Erion G, Chen H, et al. From local explanations to global understanding with explainable AI for trees. Nat Mach Intell 2020;2(1):56–67.

29. Lundberg S, Lee S-I. A unified approach to interpreting model predictions. arXiv preprint arXiv:1705.07874, 2017.

30. McKinney SM, Sieniek M, Godbole V, et al. International evaluation of an AI system for breast cancer screening. Nature 2020;577(7788):89–94.

31. Xingjian S, Chen Z, Wang H, Yeung DY. Convolutional LSTM network: a machine learning approach for precipitation nowcasting. In: Advances in Neural Information Processing Systems. Montreal, Quebec, Canada: Curran Associates, Inc., 2015;802–10.

32. Rahman SA, Adjeroh DA. Deep learning using convolutional LSTM estimates biological age from physical activity. Sci Rep 2019;9(1):1–15.

33. Efron B, Tibshirani RJ. An Introduction to the Bootstrap. Boca Raton, FL, USA: Chapman and Hall/CRC, 1994.

34. Chihara LM, Hesterberg TC. Mathematical Statistics with Resampling and R. Hoboken, NJ, USA: John Wiley & Sons, 2018.

35. Robin X, Turck N, Hainard A, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinform 2011;12(1):1–8.

36. Selvaraju RR, Cogswell M, Das A, et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017;618–26.

37. Zhang K, Liu X, Shen J, et al. Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography. Cell 2020;181(6):1423–1433.e11.

38. Schwartz JM, et al. Clinical Features and Diagnosis of Hepatocellular Carcinoma. Waltham: UpToDate, 2019.

39. Tsukuma H, Hiyama T, Tanaka S, et al. Risk factors for hepatocellular carcinoma among patients with chronic liver disease. N Engl J Med 1993;328(25):1797–801.

40. Tzartzeva K, Singal AG. Testing for AFP in combination with ultrasound improves early liver cancer detection. Expert Rev Gastroenterol Hepatol 2018;12(10):947–9.

41. Virmani J, Kumar V, Kalra N, et al. SVM-based characterization of liver ultrasound images using wavelet packet texture descriptors. J Digit Imaging 2013;26(3):530–43.

Author notes

Yiming Xu, Bowen Zheng, Xiaohong Liu contributed equally to this work.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]