Abstract

Genome sequencing data have become increasingly important in personalized medicine and diagnosis. However, accurately detecting genomic variants remains challenging. Traditional variant detection methods rely on manual inspection or predefined rules, which can be time-consuming and error-prone. Consequently, deep learning–based approaches to variant detection have gained attention because they can automatically learn genomic features that distinguish true variants from artifacts. In this review, we discuss recent advances in deep learning–based algorithms for detecting small variants and structural variants in genomic data, together with their advantages and limitations.

INTRODUCTION

Genetic variants can be classified into three main categories based on their size: single-nucleotide variants (SNVs) [1], short insertions and deletions (Indels ≤ 50 bp) and structural variants (SVs > 50 bp) [2], which can include deletion, insertion, duplication, inversion and translocation mutations [3]. Deletions and duplications of genomic fragments are referred to as copy number variations (CNVs) [4, 5]. Moreover, complex structural variants (CSVs) can occur through a combination of simple SV events [6, 7]. Variant calling refers to identifying nucleotide differences in an individual's genome relative to a reference sequence, which is critical for understanding human phenotypic diversity and human diseases such as cancer [8–14]. It plays a vital role in both research and clinical applications of human genome sequencing [15, 16]. Although there have been significant improvements in sequencing technology, accurately detecting genetic variation from billions of short and noisy sequence reads remains a challenging task [17–21].
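The size categories above can be written as a simple classification rule. The sketch below is illustrative only: it assumes VCF-style REF/ALT alleles in which Indels share a leading padding base, and the function name `classify_by_size` is hypothetical.

```python
def classify_by_size(ref: str, alt: str, max_indel_len: int = 50) -> str:
    """Assign a REF/ALT allele pair to the size categories described above.

    Assumes VCF-style alleles where Indels share a leading padding base,
    so the event length is the difference between the allele lengths.
    """
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    length = abs(len(alt) - len(ref))
    if length == 0:
        # Equal-length multi-base substitutions fall outside the three categories.
        return "other"
    # Indels are <= 50 bp; longer insertions or deletions count as SVs.
    return "Indel" if length <= max_indel_len else "SV"


print(classify_by_size("A", "G"))             # SNV
print(classify_by_size("A", "ATT"))           # Indel
print(classify_by_size("A" + "C" * 60, "A"))  # SV
```

Only insertion- and deletion-type events can be sized this way; inversions and translocations are usually written with symbolic alleles or breakend notation and need separate handling.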

When detecting SNVs and short Indels, the conventional approach is to identify non-reference bases among the reads covering each position. Probabilistic modeling then plays a crucial role in inferring the most likely genotype, or in estimating the probability that an observed difference is a true variant rather than an artifact.
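A minimal sketch of such a probabilistic model, assuming a diploid genome, independent reads and a single per-base error rate ε (real callers use per-read base and mapping qualities and more elaborate priors): for a genotype carrying g alternative alleles (g = 0, 1, 2), a read shows the alternative allele with probability p_g = (g/2)(1 − ε) + ((2 − g)/2)ε, so observing a alternative and r reference reads has likelihood proportional to p_g^a (1 − p_g)^r. The function name and default values are hypothetical.

```python
import math


def genotype_posteriors(ref_count, alt_count, error_rate=0.01,
                        priors=(1 / 3, 1 / 3, 1 / 3)):
    """Posterior probabilities of the diploid genotypes 0/0, 0/1 and 1/1.

    Simplified model: each read independently samples one of the two
    chromosomes and is misread with probability `error_rate` (0 < e < 0.5).
    The binomial coefficient is omitted because it is the same for all genotypes.
    """
    log_joint = []
    for g, prior in zip((0, 1, 2), priors):  # g = number of alt alleles
        # Probability that a single read shows the alternative allele.
        p_alt = (g / 2) * (1 - error_rate) + ((2 - g) / 2) * error_rate
        log_joint.append(alt_count * math.log(p_alt)
                         + ref_count * math.log(1 - p_alt)
                         + math.log(prior))
    # Normalize in log space to avoid underflow at high depth.
    top = max(log_joint)
    weights = [math.exp(x - top) for x in log_joint]
    total = sum(weights)
    return {gt: w / total for gt, w in zip(("0/0", "0/1", "1/1"), weights)}


post = genotype_posteriors(ref_count=14, alt_count=12)
print(max(post, key=post.get))  # 0/1
```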

For SV detection, there are two primary categories of calling methods. De novo assembly–based methods assemble the raw reads into longer sequences (contigs) and compare them with a reference genome [22–24]. In principle, these methods can detect all types of variation and are less biased by the reference sequence. However, accurately assembling a genome can be quite challenging, especially for heterogeneous sequences. Read alignment–based methods, on the other hand, detect variants by directly aligning short paired-end reads or long reads to a reference genome. In short-read whole genome sequencing (WGS) data, SV signals typically come from read depth, discordant read pairs and split reads [3, 25]. Accurate variant calling in whole exome sequencing (WES) data, by contrast, has been largely limited by technical issues such as a high error rate [26–29]. Long reads can span highly repetitive or low-complexity regions, thereby improving alignment quality and enabling the detection of more SVs than short reads [25, 30–35]. The two main single-molecule sequencing platforms, Pacific Biosciences (PacBio) [36] and Oxford Nanopore Technologies (ONT) [37], have significantly improved the performance of various genome applications, particularly genome assembly and variant calling [38–43].
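To make the discordant-read-pair signal concrete, the sketch below labels a single read pair with the SV type it suggests: mates that map too far apart, too close together, in an unexpected orientation or to different chromosomes each point to a different event. It assumes a standard forward-reverse Illumina library with a roughly normal insert-size distribution, approximates the insert size by the distance between mate start positions, and uses a hypothetical `ReadPair` record rather than any real BAM API. Real callers cluster many such pairs and combine them with read depth and split reads.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical minimal record for one aligned read pair."""
    chrom1: str
    pos1: int      # mapping start of mate 1
    strand1: str   # '+' or '-'
    chrom2: str
    pos2: int      # mapping start of mate 2
    strand2: str


def classify_pair(pair: ReadPair, mean_insert: float, sd_insert: float,
                  n_sd: float = 3.0) -> str:
    """Label a read pair with the SV signal it suggests, if any."""
    if pair.chrom1 != pair.chrom2:
        return "translocation"           # mates on different chromosomes
    # Order mates by position to check orientation and distance.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)])
    if left_strand == right_strand:
        return "inversion"               # both mates on the same strand
    if left_strand == "-" and right_strand == "+":
        return "duplication"             # outward-facing mates (tandem duplication)
    insert = right_pos - left_pos        # crude insert-size estimate
    if insert > mean_insert + n_sd * sd_insert:
        return "deletion"                # mates map too far apart
    if insert < mean_insert - n_sd * sd_insert:
        return "insertion"               # mates map too close together
    return "concordant"


print(classify_pair(ReadPair("chr1", 1000, "+", "chr1", 1400, "-"), 400, 50))  # concordant
print(classify_pair(ReadPair("chr1", 1000, "+", "chr1", 3000, "-"), 400, 50))  # deletion
```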

Sequencing technology has advanced and sequencing costs have fallen sharply, enabling high coverage with much lower sequencing error rates. Even so, accurately detecting all variants in the human genome remains challenging because of factors such as the complexity of the genome itself [44–47]. To address this, large-scale genome studies often apply multiple calling methods and integrate their results into a unified call set [6, 48]. However, false-positive variant calls remain an issue [25], and heuristic filters and manual inspection with visualization software are commonly used to remove them [49, 50]. These approaches can be time-consuming and difficult to optimize for different sequencing datasets, so an effective model-based variant detection method is urgently needed.

Machine learning techniques typically treat variant calling, including both the calling and the filtering of genomic variants, as a classification task and use supervised learning to build models that predict the presence or absence of variants. For instance, forestSV [51] and SV2 [52] derive features from read alignments and use random forests [53] and support vector machines [54], respectively, to detect SVs.

Deep learning, an increasingly popular machine learning technique, is used in diverse fields such as image recognition [55], language translation [56], gaming [57, 58] and the life sciences [59–62]. In genomics, deep learning models have shown promise in calling genetic variants accurately, surpassing traditional methods. DeepVariant [63], the first deep learning–based variant calling method, marked a shift from traditional statistical methods toward deep learning approaches. Deep learning methods, led by DeepVariant, have come to dominate short-read variant calling and have also made progress in long-read variant calling, overcoming the challenges posed by high base error rates. Together with advances in sequencing technology, these methods have greatly improved the detection of genetic variants, opening up new possibilities in genomics research and clinical applications.

The present review aims to provide a comprehensive overview of deep learning–based approaches for variant calling, explaining and comparing them in detail. Deep learning approaches are particularly noteworthy because they aim to reduce the need for expert input and to increase the automation of the analysis. This review has two main sections, the first on small variants and the second on structural variants. It covers key topics including tensor encoding, training datasets and the neural network (NN) architectures used in variant calling, and each section discusses specific methods as examples.

DEEP LEARNING OF GENOME VARIANT CALLING

In this paper, we provide a comprehensive review of variant detection techniques based on deep learning models. To help understand these methods, we present a general workflow for variant detection in Figure 1. Some tools are standalone, integrating candidate generation and deep learning–based calling, whereas others must be combined with upstream variant calling software that performs the preprocessing and produces the candidate variants.

Figure 1

General workflow of deep learning–based variant calling methods. The upper part shows input and preprocessing; the lower part shows the deep learning methods. First, reads are aligned to the reference genome and candidate variants are detected. The candidate variants are then encoded and fed into a trained neural network to obtain high-confidence variant calls.
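The two stages of this workflow can be sketched as a small driver function. In this hypothetical outline, alignment and candidate generation are assumed to have been done upstream; `encode` and `model` stand in for a tool-specific tensor encoder and a trained network, and the genotype labels and probability cutoff are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

GENOTYPES = ("0/0", "0/1", "1/1")


@dataclass(frozen=True)
class Candidate:
    chrom: str
    pos: int
    ref: str
    alt: str


@dataclass
class Call:
    candidate: Candidate
    genotype: str
    probability: float


def call_variants(candidates: Sequence[Candidate],
                  encode: Callable[[Candidate], object],
                  model: Callable[[object], Sequence[float]],
                  min_prob: float = 0.9) -> List[Call]:
    """Encode each candidate, classify it and keep confident variant calls."""
    calls = []
    for cand in candidates:
        tensor = encode(cand)   # e.g. a pileup image around the candidate
        probs = model(tensor)   # probabilities over GENOTYPES
        best = max(range(len(GENOTYPES)), key=lambda i: probs[i])
        # Drop homozygous-reference and low-confidence predictions.
        if GENOTYPES[best] != "0/0" and probs[best] >= min_prob:
            calls.append(Call(cand, GENOTYPES[best], probs[best]))
    return calls


# Toy stand-ins for a real encoder and a trained network.
def toy_encode(cand: Candidate) -> int:
    return cand.pos


def toy_model(tensor: int) -> Sequence[float]:
    return (0.02, 0.95, 0.03) if tensor == 100 else (0.97, 0.02, 0.01)


cands = [Candidate("chr1", 100, "A", "G"), Candidate("chr1", 200, "C", "T")]
print(call_variants(cands, toy_encode, toy_model))
```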

SNV and Indel calling

In this section, we introduce several classic deep learning methods for SNV and Indel detection, which are summarized in Table 1.

Table 1

Summary of SNV and Indel detection methods based on deep learning

Model | Variant type | Read type | Region | Tensor encoding | Training set | Neural network | Advantages; limitations | Source code | Year
DeepVariant | SNV, Indel | Short read | WGS, WES | Pileup image | Public | CNN | High accuracy, robust reliability; requires considerable computational resources | https://github.com/google/deepvariant/ | 2018
CNNScoreVariants | SNV, Indel | Short read | WGS, WES | Pileup image | Public | CNN | Improved encoding technique; enhanced predictive accuracy; encoding redundancy | https://github.com/broadinstitute/gatk | 2020
Clairvoyante | SNV, Indel | Long read (PacBio, ONT) | WGS | Pileup image | Public | CNN | Few parameters; outstanding performance on long-read data; cannot identify multi-allelic variants or Indels ≥4 bases; does not consider base quality | https://github.com/aquaskyline/Clairvoyante | 2019
Clair | SNV, Indel | Long read (PacBio, ONT) | WGS | Pileup image | Public | Bi-LSTM | Considerable improvements in precision, recall and speed; Indel calling on Nanopore data needs improvement | https://github.com/HKUBAL/Clair | 2020
NanoCaller | SNV, Indel | Long read (PacBio, ONT) | WGS | Pileup image | Public | CNN | Uses long-range haplotype information; few parameters; incorrect alignments in low-complexity regions; potentially inaccurate Indel detection in nucleotide repeats of Nanopore data | https://github.com/WGLab/NanoCaller | 2021
PEPPER-Margin-DeepVariant | SNV, Indel | Long read (ONT, PacBio HiFi) | WGS | Full-alignment image | Public | CNN + RNN | Haplotype-aware; encodes more features; state-of-the-art variant calling results on Nanopore data; low Indel calling accuracy on Nanopore data | PEPPER: https://github.com/kishwarshafin/pepper; Margin: https://github.com/UCSC-nanopore-cgl/margin; DeepVariant: https://github.com/google/deepvariant | 2021
Clair3 | SNV, Indel | Long read (ONT) | WGS | Full-alignment image and pileup image | Public | CNN + Bi-LSTM | Combines pileup-based and full-alignment variant calling; fast runtime and excellent performance | https://github.com/HKUBAL/Clair3 | 2022
NanoSNP | SNV | Long read (ONT) | WGS | Pileup image and haplotype image | Public | Bi-LSTM | Combines long-range haplotype features and short-range pileup features; best performance on Nanopore data; cannot identify SNPs from short reads | https://github.com/huangnengCSU/NanoSNP.git | 2023

Tensor encoding

DeepVariant [63] treats variant calling as an image classification problem: sequencing data are encoded as an image (tensor), which is then classified to identify genetic variants. DeepVariant learns the relationship between read pileup images and true genotype calls. The process starts by identifying candidate SNPs and Indels: mapped reads are processed with a local read assembly procedure based on a De Bruijn graph [64], and the best haplotypes are selected with a hidden Markov model (HMM) [65]. A Smith–Waterman-like algorithm [66] is used to realign reads, and only high-quality reads are considered for variant calling. Candidate variants that pass the threshold are encoded as 3-channel RGB pileup images, one image per candidate site, in which the first row represents the reference sequence and the remaining rows represent the reads.
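The pileup-image idea can be illustrated in simplified form. The sketch below assumes reads already realigned to a fixed window (one character per reference position, '-' for a deletion) and assigns the three channels to base identity, base quality and strand; the channel assignment, the intensity values and the `pileup_image` function are illustrative assumptions, not DeepVariant's exact encoding.

```python
# Hypothetical intensity for each base in the "base" channel.
BASE_VALUE = {"A": 250, "G": 180, "T": 100, "C": 30}
EMPTY = (0, 0, 0)


def pileup_image(ref, reads, height=10, max_qual=40):
    """Build a height x len(ref) image of (base, quality, strand) pixels.

    ref   -- reference bases for the window
    reads -- list of (aligned_seq, base_quals, is_reverse), where aligned_seq
             has one character per reference position ('-' = deletion)
    """
    width = len(ref)
    # First row: the reference, drawn at maximal quality on the forward strand.
    image = [[(BASE_VALUE[b], 255, 240) for b in ref]]
    for seq, quals, is_reverse in reads[: height - 1]:
        row = []
        for base, qual in zip(seq, quals):
            if base not in BASE_VALUE:  # deletion or N: leave the pixel empty
                row.append(EMPTY)
                continue
            qual_value = int(255 * min(qual, max_qual) / max_qual)
            strand_value = 70 if is_reverse else 240
            row.append((BASE_VALUE[base], qual_value, strand_value))
        image.append(row)
    # Pad with empty rows so every candidate yields an image of equal height.
    while len(image) < height:
        image.append([EMPTY] * width)
    return image


img = pileup_image("ACGT", [("ACGT", [40, 40, 20, 40], False),
                            ("AC-T", [30, 30, 0, 30], True)], height=4)
```

Fixing the image height (and capping the number of reads) is what lets every candidate site be fed to the same CNN input layer.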

CNNScoreVariants [67] retains both read and reference sequence information and additionally encodes mapping quality and read flags. Unlike DeepVariant, it encodes base qualities directly into the base channels of the read tensor, treating each quality as a confidence level for the corresponding base call. It uses one-hot encoding for the reference tensor and p-hot encoding for the read tensor. The reference tensor is a 2D tensor of reference bases centered on the variant, whereas the read tensor is a three-dimensional tensor whose width spans genomic sites and whose height spans the reads in the pileup. The channels of the reference tensor are the four DNA bases; in the read tensor, the first four channels encode the bases weighted by base quality, and the remaining five channels encode read flags representing strand, pairing and mapping quality.
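A minimal sketch of the two encodings, assuming that p-hot places 1 − ε on the called base and spreads the Phred-scaled error probability ε = 10^(−Q/10) evenly over the other three bases. This is one plausible reading of the description above, and the exact CNNScoreVariants encoding may differ; the function names are hypothetical.

```python
BASES = "ACGT"


def one_hot(base: str) -> list:
    """Reference-tensor encoding: 1 for the observed base, 0 elsewhere."""
    return [1.0 if b == base else 0.0 for b in BASES]


def p_hot(base: str, qual: int) -> list:
    """Read-tensor encoding: a probability vector over A, C, G, T."""
    err = 10 ** (-qual / 10)           # Phred quality -> error probability
    vec = [err / 3] * 4                # error mass shared by the other bases
    vec[BASES.index(base)] = 1 - err   # called base keeps 1 - error
    return vec


print(one_hot("G"))
print(p_hot("A", 20))
```

Unlike a one-hot vector, the p-hot vector lets a low-quality base call contribute less certainty to the read tensor without needing a separate quality channel.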

Clairvoyante [68], designed specifically for calling SNPs and Indels in single-molecule sequencing data, encodes read and reference sequence information in a three-dimensional tensor. The first dimension represents the genomic position, the second represents counts of the four bases in the reads, and the third represents four different counting schemes, which yield four matrices encoding (1) the reference sequence and the reads supporting it, (2) inserted sequences, (3) deleted base pairs and (4) alternative alleles.
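To make the four counting schemes concrete, the sketch below fills a position × base × scheme count tensor from a few toy reads. It is a simplified illustration: reads are assumed to be pre-aligned to the window, only A, C, G, T and '-' occur, and raw counts are used without whatever normalization Clairvoyante applies; all names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

BASES = "ACGT"
# Channel indices for the four counting schemes described above.
REF_SUPPORT, INSERTION, DELETION, ALT_ALLELE = range(4)


def count_tensor(ref: str,
                 reads: Sequence[Tuple[str, Dict[int, str]]]
                 ) -> List[List[List[int]]]:
    """Return tensor[position][base][scheme] of read counts.

    Each read is (aligned, insertions): `aligned` has one character per
    reference position ('-' = deleted base) and `insertions` maps a window
    position to the bases inserted after it.
    """
    tensor = [[[0] * 4 for _ in BASES] for _ in ref]
    for aligned, insertions in reads:
        for pos, (ref_base, read_base) in enumerate(zip(ref, aligned)):
            if read_base == "-":
                tensor[pos][BASES.index(ref_base)][DELETION] += 1
            elif read_base == ref_base:
                tensor[pos][BASES.index(ref_base)][REF_SUPPORT] += 1
            else:
                tensor[pos][BASES.index(read_base)][ALT_ALLELE] += 1
        for pos, inserted in insertions.items():
            for base in inserted:
                tensor[pos][BASES.index(base)][INSERTION] += 1
    return tensor


t = count_tensor("ACGT", [("ACGT", {}), ("AC-T", {}), ("AAGT", {1: "TT"})])
```

Because the tensor stores counts rather than individual reads, its size does not depend on coverage, which is part of why pileup-based encodings are fast.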

Clair [69], an improved version of Clairvoyante, uses four prediction tasks to address Clairvoyante's limitations, including its inability to call multi-allelic variants and long Indels. Clair uses an encoding similar to Clairvoyante's, except that the second dimension of its three-dimensional tensor is twice as large, recording counts of the four bases separately for the positive and negative strands.

NanoCaller [70] is designed for long-read sequencing data and uses long-range haplotype information from phased reads to improve variant calling accuracy. Unlike tools such as DeepVariant, Clairvoyante and Clair, which use a local window around the candidate site, NanoCaller builds its SNP pileup images from heterozygous SNP sites that can lie far from the candidate site. For SNP candidate sites, read and reference sequence information is encoded into a three-dimensional tensor covering alternative alleles, candidate sites and reference sequence bases; the pileup image represents the different bases at each site with five channels, the fifth of which represents the reference sequence. For Indel candidate sites, the pileup image is also a three-dimensional tensor, built by combining matrices from all reads, from reads in one phase and from reads in the other phase at the candidate site. Its three dimensions denote the base (or deletion), the pileup columns of the realigned sequences and the stacked matrices.

PEPPER-Margin-DeepVariant [71], like NanoCaller, uses a haplotype-aware strategy for variant calling in long-read sequencing data. PEPPER-SNP, a submodule of PEPPER, calls SNPs using a tensor encoding in which each genomic site is represented by 10 features. Bases are color-coded; each row represents a feature and each column a reference genome site. Observations are encoded as weights, displayed as the alpha (transparency) of each base. Another submodule, PEPPER-HP, considers both SNVs and Indels as candidate variants and generates haplotype-specific likelihoods for each candidate. Like PEPPER-SNP, PEPPER-HP uses an encoding in which each column is indexed by two values: the reference position and the insertion allele at that position.

Clairvoyante, Clair and NanoCaller are pileup-based, which gives them an advantage in runtime. PEPPER-Margin-DeepVariant is based on full-alignment variant calling, which tends to yield higher precision and recall. Clair3 [72], the successor of Clair, combines the advantages of both approaches: it uses pileup calling for the majority of candidates and full-alignment calling only for difficult candidates, achieving fast runtime and excellent performance. Clair3's input consists of a 2D tensor for pileup calling and a three-dimensional tensor for full-alignment calling, encoding genomic sites, features and other information relevant to variant calling.
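The pileup/full-alignment split can be sketched as a two-stage dispatch. This is a hypothetical outline: the two models are passed in as plain functions returning a (genotype, quality) pair, and a fixed quality cutoff stands in for however Clair3 actually selects the candidates that go to full-alignment calling.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

Prediction = Tuple[str, float]  # (genotype, quality)


def two_stage_call(candidates: Iterable[Hashable],
                   pileup_model: Callable[[Hashable], Prediction],
                   full_alignment_model: Callable[[Hashable], Prediction],
                   min_pileup_qual: float = 20.0) -> Dict[Hashable, Prediction]:
    """Call all candidates with the fast pileup model and re-call only the
    low-confidence ones with the slower full-alignment model."""
    results = {}
    for cand in candidates:
        genotype, qual = pileup_model(cand)
        if qual < min_pileup_qual:
            # Difficult candidate: fall back to the more accurate model.
            genotype, qual = full_alignment_model(cand)
        results[cand] = (genotype, qual)
    return results
```

The design trades a small amount of accuracy on easy sites for speed: the expensive model only runs on the fraction of candidates where the cheap model is unsure.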

NanoSNP [73] is an SNP calling method for low-coverage Nanopore sequencing reads that combines long-range haplotype features with short-range pileup features for precise SNP identification. First, a pileup model extracts pileup features from the aligned reads to predict SNP sites, and these SNPs are used to phase the reads. The haplotype model then uses both the predicted SNPs and the phased reads to extract per-haplotype information, including the nucleotide distribution, base quality and mapping quality. These data form the feature tensor used to generate a pileup image for the candidate SNP site. In addition, a group of high-quality heterozygous SNP sites near the candidate SNP site is selected to form the long-range haplotype image.
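A minimal sketch of the phasing and per-haplotype feature steps, assuming reads are given as simple position-to-base maps and the heterozygous SNPs have known haplotype alleles. The majority-vote read assignment here is a deliberately simple stand-in, not NanoSNP's actual phasing procedure, and both function names are hypothetical.

```python
from typing import Dict, Tuple

Read = Dict[int, str]  # reference position -> observed base


def phase_reads(reads: Dict[str, Read],
                het_snps: Dict[int, Tuple[str, str]]) -> Dict[str, int]:
    """Assign each read to haplotype 1 or 2 by majority vote over the
    heterozygous SNPs it covers; 0 means the read could not be phased."""
    assignment = {}
    for read_id, bases in reads.items():
        score = 0
        for pos, (hap1_allele, hap2_allele) in het_snps.items():
            observed = bases.get(pos)
            if observed == hap1_allele:
                score += 1
            elif observed == hap2_allele:
                score -= 1
        if score > 0:
            assignment[read_id] = 1
        elif score < 0:
            assignment[read_id] = 2
        else:
            assignment[read_id] = 0
    return assignment


def haplotype_base_counts(reads: Dict[str, Read],
                          assignment: Dict[str, int], site: int):
    """Nucleotide distribution at `site`, computed separately per haplotype."""
    counts = {hap: dict.fromkeys("ACGT", 0) for hap in (1, 2)}
    for read_id, bases in reads.items():
        hap = assignment[read_id]
        base = bases.get(site)
        if hap in counts and base in counts[hap]:  # skip unphased reads
            counts[hap][base] += 1
    return counts
```

Separating the counts by haplotype is what makes a heterozygous SNP stand out: each haplotype looks homozygous for a different allele, even when the combined pileup is noisy.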

Training set

Deep learning classification relies heavily on the quality of the training labels, so a gold-standard dataset is crucial for high-performance inference. Because multiple benchmark datasets are available, deep learning–based SNV and Indel detection methods often use publicly available datasets for training. The training datasets must nevertheless be chosen carefully, because they directly affect the model's performance and robustness. The Genome in a Bottle (GIAB) project [74–76] is a widely used public resource of samples with high-quality genomic sequences and benchmark variant calls, and it is often used as a source of training data for these methods [77], such as DeepVariant [63], Clairvoyante [68] and CNNScoreVariants [67].
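Training labels are typically derived by comparing candidate variants with a benchmark call set, restricted to its high-confidence regions. The sketch below assumes a truth set given as a dictionary keyed by (chrom, pos, ref, alt), confident regions given as half-open intervals in the same coordinate system, and exact allele matching. Production labelers also normalize alleles and compare haplotypes, so this is only an outline; `label_candidates` is a hypothetical name.

```python
from typing import Dict, List, Sequence, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def label_candidates(candidates: Sequence[Variant],
                     truth: Dict[Variant, str],
                     confident: Dict[str, List[Tuple[int, int]]]
                     ) -> List[Tuple[Variant, str]]:
    """Label candidates with truth-set genotypes inside confident regions.

    Candidates outside the confident intervals are skipped because their
    labels cannot be trusted; candidates inside those intervals that are
    absent from the truth set are labeled '0/0' (false positives).
    """
    labeled = []
    for variant in candidates:
        chrom, pos = variant[0], variant[1]
        intervals = confident.get(chrom, [])
        if not any(start <= pos < end for start, end in intervals):
            continue
        labeled.append((variant, truth.get(variant, "0/0")))
    return labeled
```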

Network architecture

Most deep learning frameworks for genome variant calling rely primarily on convolutional neural networks (CNNs) [78, 79], a few use recurrent neural networks (RNNs) [80], and some combine both. CNN models for SNV and Indel detection typically consist of an input layer, convolutional layers, fully connected layers, a softmax layer and an output layer. RNN models instead consist of an input layer, LSTM layers, fully connected layers, a softmax layer and an output layer. Some deep learning methods use two networks for variant calling. The network architecture is illustrated in Figure 2.

Figure 2

Hierarchical architecture of the network layers in deep learning methods. Most deep learning frameworks for genome variant calling are based on CNN models, a few are based on RNN models, and some methods incorporate both networks.

DeepVariant [63] is the first CNN-based approach for detecting genome variants. It adopts the Inception architecture [81–83]: an image input layer is first created and connected to the ConvNetJuly2015v2 [81] CNN with nine partitions. The final output layer is a softmax layer, fully connected to the previous layer, with three categories representing the probabilities of the three genotypes (homozygous reference, heterozygous and homozygous alternative).
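To illustrate the output layer, the sketch below turns three logits into genotype probabilities with a softmax and reports the most likely genotype together with a Phred-scaled quality, −10·log10(1 − p). The Phred conversion is a common convention used here as an assumption, not taken from the DeepVariant source code, and the function names are hypothetical.

```python
import math

GENOTYPES = ("hom_ref", "het", "hom_alt")


def softmax(logits):
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def genotype_from_logits(logits, max_gq=99):
    """Return (genotype, probabilities, Phred-scaled quality)."""
    probs = softmax(logits)
    best = max(range(len(GENOTYPES)), key=probs.__getitem__)
    p_error = 1.0 - probs[best]
    if p_error <= 0.0:
        gq = max_gq  # avoid log10(0) when the model is fully certain
    else:
        gq = min(max_gq, round(-10 * math.log10(p_error)))
    return GENOTYPES[best], probs, gq


print(genotype_from_logits([0.0, 5.0, 0.0]))
```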

CNNScoreVariants [67] is a CNN-based method that applies 1D convolutions to the reference tensor and 2D convolutions to the read tensor, alternating between the genomic-site axis and the pileup-read axis. Max pooling is applied along the pileup axis but not along the sequence axis, owing to the discrete nature of DNA sequence data. After several convolution layers, the spatial dimensions of the tensor are flattened into a single 1D vector, which is merged with batch-normalized variant annotations [82], passed through fully connected layers and forwarded to the final softmax layer. Performance is further improved by a skip connection that concatenates the normalized annotations with the penultimate dense layer or with all deeper layers.
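The sketch below shows what pooling only along the pileup axis means for a single-channel read tensor: rows (reads) are grouped and max-pooled, while the number of genomic sites (columns) is preserved. The pool size and function name are illustrative assumptions.

```python
def max_pool_reads(read_tensor, pool=2):
    """Max-pool a [reads][sites] matrix along the read axis only.

    Groups of `pool` consecutive reads are collapsed into one row;
    the site axis keeps its full resolution.
    """
    pooled = []
    for start in range(0, len(read_tensor), pool):
        group = read_tensor[start:start + pool]
        # zip(*group) iterates over sites, yielding one column per site.
        pooled.append([max(column) for column in zip(*group)])
    return pooled


print(max_pool_reads([[1, 5, 0], [3, 2, 4], [0, 7, 1]]))  # [[3, 5, 4], [0, 7, 1]]
```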

Clairvoyante [68] is a multitask five-layer CNN with three convolutional layers, each with a different number of kernels, and two fully connected layers. Pooling is performed after each convolutional layer. It makes four sets of predictions for each input: one set is computed from the first fully connected layer and the remaining three from the second fully connected layer, and the categories within each set are mutually exclusive.

NanoCaller [70] uses two CNNs, one for SNP detection and one for Indel detection. Both models have three convolutional layers with different kernel sizes, followed by a flatten layer and fully connected layers. The two models differ in how they compute probabilities: the SNP model has two independent pathways for base probabilities and zygosity probabilities, whereas the Indel model uses two fully connected hidden layers to compute probabilities for four zygosity scenarios. The final output combines the SNP and Indel network calls.

Clair [69] is a five-layer RNN with four tasks, consisting of two bidirectional long short-term memory (Bi-LSTM) [84] layers and three fully connected layers. Each Bi-LSTM layer has 256 cells, and the first fully connected layer performs transposition and splitting. Dropout is applied to the second Bi-LSTM layer and to the second and third fully connected layers. Each of the four task outputs is preceded by its own independent penultimate layer (the third fully connected layer), which keeps the outputs of the tasks independent.

Clair3 [72] employs two networks: a pileup network with two Bi-LSTM layers of different sizes, and a full-alignment network based on a residual network (ResNet) with three standard residual blocks. The pileup network omits the transpose-split layer for greater speed. The full-alignment network adds a convolutional layer to each block to adjust the channel dimensionality. A spatial pyramid pooling (SPP) layer [85] serves as the pooling layer, creating different receptive fields with three pooling scales per channel and producing a fixed-length output for the next layer.
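Spatial pyramid pooling can be sketched for a single channel as follows: at each pyramid level the feature map is divided into n × n bins and max-pooled within each bin, so inputs of different sizes all produce a vector of the same length. The levels (1, 2, 4) are an assumption chosen for illustration, not necessarily the scales Clair3 uses, and the function names are hypothetical.

```python
def _ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool one channel into n x n bins per level; the output length is
    sum(n * n for n in levels) regardless of the input height and width."""
    height, width = len(feature_map), len(feature_map[0])
    out = []
    for n in levels:
        for i in range(n):
            # Row bin i covers [floor(i*h/n), ceil((i+1)*h/n)), which is never empty.
            r0, r1 = (i * height) // n, _ceil_div((i + 1) * height, n)
            for j in range(n):
                c0, c1 = (j * width) // n, _ceil_div((j + 1) * width, n)
                out.append(max(feature_map[r][c]
                               for r in range(r0, r1)
                               for c in range(c0, c1)))
    return out


fm = [[4 * r + c for c in range(4)] for r in range(4)]
print(len(spatial_pyramid_pool(fm)))  # 21
```

The fixed output length is what allows a variable-size full-alignment tensor to feed fully connected layers of a fixed width.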

NanoSNP [73] comprises two networks: the pileup model and the haplotype model. The pileup model has two Bi-LSTM layers with tanh activation, which extract features from the input tensor sequence, followed by fully connected layers with softmax activation that compute the SNP probabilities. The haplotype model contains a pileup image processing module and a haplotype image processing module that share the same structure. Both modules capture feature correlations at the SNP site, and their outputs are combined through a fully connected layer for SNP prediction.

Methods for detecting SNVs and Indels in short-read sequencing data, such as DeepVariant and CNNScoreVariants, are applicable to both WGS and WES data. Long-read sequencing has two major platforms, PacBio and ONT, and Clairvoyante, Clair and NanoCaller can analyze data from both. Although Clairvoyante and Clair can also process Illumina data, they perform better on PacBio and ONT data. Because ONT sequencing has a higher error rate than PacBio, several methods target ONT data specifically: PEPPER-Margin-DeepVariant was designed primarily for Nanopore reads, and Clair3 and NanoSNP were specifically developed to process data from the ONT platform.

SV calling

SV calling is a more complex task than SNP and Indel calling because SVs come in many types and are inherently complex. The following section introduces several classic deep learning methods for SV detection, which are summarized in Table 2.

Table 2

Summary of SV detection methods based on deep learning

Model | Variant type | Read type | Region | Tensor encoding | Training set | Neural network | Advantages; limitations | Source code | Year
DeepSV | SV (deletion) | Short read | WGS | Image | Public | CNN | Calls long deletions; works with noisy training data | https://github.com/CSuperlei/DeepSV | 2019
Samplot-ML | SV (deletion) | Short read | WGS | Image | Public | CNN | Calls long deletions | https://github.com/mchowdh200/samplot-ml | 2020
TensorSV | SV (deletion, duplication, inversion) | Short read | WGS | Image | Public | CNN | Detects deletions, duplications and inversions; effective genotyping; fast training and inference | https://github.com/timothyjamesbecker/TensorSV | 2020
DeepCNV | SV (CNV) | Microarray data and short read | WGS | Image | Synthetic | CNN + DNN | Detects CNVs with enhanced confidence; fewer false positives and fewer failures to replicate associations; relies on CNV callers to generate raw CNV calls | https://github.com/CAG-CNV/DeepCNV | 2021
DeepSVFilter | SV | Short read | WGS | Image | Pretraining | CNN | Filters SVs; uses transfer learning and data augmentation to handle small datasets; cannot work on WES data; relatively small number of high-confidence SVs used to construct the training set | https://github.com/yongzhuang/DeepSVFilter | 2021
BreakNet | SV (deletion) | Long read (PacBio) | WGS | Feature matrix | Public | CNN + Bi-LSTM | Detects deletions; stable performance on low-coverage data | https://github.com/luojunwei/BreakNet | 2021
MAMnet | SV (deletion, insertion) | Long read (PacBio, ONT) | WGS | Feature matrix | Public | CNN + Bi-LSTM | Calls insertions and deletions; improved performance on low-coverage data | https://github.com/micahvista/MAMnet | 2022
svBreak | SV | Short read | WGS | Feature matrix | Synthetic | CNN | Detects breakpoints of 7 common SV types | https://github.com/BDanalysis/svBreak | 2022
DECoNT | SV (CNV) | Short read | WES | Problem formulation | Public | Bi-LSTM | Enhanced call accuracy; reliable germline CNV detection on WES datasets; relies on existing variant callers | https://github.com/ciceklab/DECoNT | 2022
CNV-espresso | SV (CNV) | Short read | WES | Image | Pretraining | CNN | Validation tool; detects rare CNVs; cannot detect general CNVs; cannot improve sensitivity | https://github.com/ShenLab/CNV-Espresso | 2022
SVision | SV (CSV) | Long read (PacBio, ONT) | WGS | Image | Synthetic | CNN | Detects both simple and complex SVs | https://github.com/xjtu-omics/SVision | 2022
Cue | SV (CSV) | Short read (extensible) | WGS | Image | Synthetic | Stacked hourglass network | Calls and genotypes both simple and complex SVs; learns complex SV abstractions directly from data; can be extended to other sequencing platforms | https://github.com/PopicLab/cue | 2023
ModelVariant typeFragment sizeRegionTensor encodingTraining setNeural networkAdvantage & limitationSource codeYear
DeepSVSV (deletion)Short readWGSImagePublicCNNCall long deletions; work with noisy training datahttps://github.com/CSuperlei/DeepSV2019
Samplot-MLSV (deletion)Short readWGSImagePublicCNNCall long deletionshttps://github.com/mchowdh200/samplot-ml2020
TensorSVSV (deletion, duplication, inversion)Short readWGSImagePublicCNNDetect deletions, duplications and inversions; effective genotyping; quick training and inference speedshttps://github.com/timothyjamesbecker/TensorSV2020
DeepCNVSV (CNV)Microarray data and short readWGSImageSyntheticCNN + DNNDetect CNVs; enhanced confidence; fewer false positives and failures in replicating associations;
rely on the CNV callers to generate raw CNV calls
https://github.com/CAG-CNV/DeepCNV2021
DeepSVFilterSVShort readWGSImagePretrainingCNNFilter SVs; employ transfer learning and data augmentation to deal with small datasets;
cannot work on WES data; relatively small number of high confidence SVs used to construct the training set
https://github.com/yongzhuang/DeepSVFilter2021
BreakNetSV (deletion)Long read
(PacBio)
WGSFeature matrixPublicCNN + Bi-LSTMDetect deletions; stable performance on low coverage datahttps://github.com/luojunwei/BreakNet2021
MAMnetSV (deletion, insertion)Long read
(PacBio, ONT)
WGSFeature matrixPublicCNN + Bi-LSTMCall insertions and deletions; improved performance on low coverage datahttps://github.com/micahvista/MAMnet2022
svBreakSVShort readWGSFeature matrixSyntheticCNNDetect 7 common SV breakpointshttps://github.com/BDanalysis/svBreak2022
DECoNTSV (CNV)Short readWESProblem formulationpublicBi-LSTMEnhanced call accuracy, reliable germline CNV detection on WES datasets;
reliance on existing variation callers
https://github.com/ciceklab/DECoNT2022
CNV-espressoSV (CNV)Short readWESImagePretrainingCNNValidation tool; detect rare CNV;
cannot detect general CNV; cannot improve sensitivity
https://github.com/ShenLab/CNV-Espresso2022
SVisionSV (CSV)Long read
(PacBio, ONT)
WGSImageSyntheticCNNDetect both simple and complex SVshttps://github.com/xjtu-omics/SVision2022
CueSV (CSV)Short read
(extensible)
WGSImageSyntheticStacked hourglass networkCall and genotype both simple and complex SVs; learn complex SV abstractions directly from data; can be extended to different sequencing platformshttps://github.com/PopicLab/cue2023
Table 2

Summary of SV detection methods based on deep learning

| Model | Variant type | Fragment size | Region | Tensor encoding | Training set | Neural network | Advantage & limitation | Source code | Year |
|---|---|---|---|---|---|---|---|---|---|
| DeepSV | SV (deletion) | Short read | WGS | Image | Public | CNN | Call long deletions; work with noisy training data | https://github.com/CSuperlei/DeepSV | 2019 |
| Samplot-ML | SV (deletion) | Short read | WGS | Image | Public | CNN | Call long deletions | https://github.com/mchowdh200/samplot-ml | 2020 |
| TensorSV | SV (deletion, duplication, inversion) | Short read | WGS | Image | Public | CNN | Detect deletions, duplications and inversions; effective genotyping; quick training and inference speeds | https://github.com/timothyjamesbecker/TensorSV | 2020 |
| DeepCNV | SV (CNV) | Microarray data and short read | WGS | Image | Synthetic | CNN + DNN | Detect CNVs; enhanced confidence; fewer false positives and failures in replicating associations; rely on the CNV callers to generate raw CNV calls | https://github.com/CAG-CNV/DeepCNV | 2021 |
| DeepSVFilter | SV | Short read | WGS | Image | Pretraining | CNN | Filter SVs; employ transfer learning and data augmentation to deal with small datasets; cannot work on WES data; relatively small number of high-confidence SVs used to construct the training set | https://github.com/yongzhuang/DeepSVFilter | 2021 |
| BreakNet | SV (deletion) | Long read (PacBio) | WGS | Feature matrix | Public | CNN + Bi-LSTM | Detect deletions; stable performance on low-coverage data | https://github.com/luojunwei/BreakNet | 2021 |
| MAMnet | SV (deletion, insertion) | Long read (PacBio, ONT) | WGS | Feature matrix | Public | CNN + Bi-LSTM | Call insertions and deletions; improved performance on low-coverage data | https://github.com/micahvista/MAMnet | 2022 |
| svBreak | SV | Short read | WGS | Feature matrix | Synthetic | CNN | Detect 7 common SV breakpoints | https://github.com/BDanalysis/svBreak | 2022 |
| DECoNT | SV (CNV) | Short read | WES | Problem formulation | Public | Bi-LSTM | Enhanced call accuracy, reliable germline CNV detection on WES datasets; reliance on existing variation callers | https://github.com/ciceklab/DECoNT | 2022 |
| CNV-espresso | SV (CNV) | Short read | WES | Image | Pretraining | CNN | Validation tool; detect rare CNV; cannot detect general CNV; cannot improve sensitivity | https://github.com/ShenLab/CNV-Espresso | 2022 |
| SVision | SV (CSV) | Long read (PacBio, ONT) | WGS | Image | Synthetic | CNN | Detect both simple and complex SVs | https://github.com/xjtu-omics/SVision | 2022 |
| Cue | SV (CSV) | Short read (extensible) | WGS | Image | Synthetic | Stacked hourglass network | Call and genotype both simple and complex SVs; learn complex SV abstractions directly from data; can be extended to different sequencing platforms | https://github.com/PopicLab/cue | 2023 |

Tensor encoding

DeepSV [86] combines several sources of information to identify long deletions in sequence data. It encodes sequences as (R, G, B) images, assigning each nucleotide (A, T, C or G) a base color that is slightly modified to incorporate deletion signatures. The visualization combines the key signatures of deletions: read depth, split reads and discordant read pairs. Read depth is depicted through pileup images, while split reads and discordant read pairs are integrated by adjusting the base color of each mapped base according to the signature. The color coding also takes into account whether a read is paired, whether the pair is concordant or discordant, the mapping quality and the mapping type.
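
A minimal Python sketch of this kind of pileup encoding follows, assuming that each read occupies one image row and that split-read and discordant-pair signatures darken the base color by fixed factors. The palette, the factors and the names `encode_pileup` and `scale` are hypothetical, and mapping quality, mapping type and gaps inside reads are ignored; it illustrates the idea rather than reproducing DeepSV's encoder.

```python
# Hypothetical base palette; DeepSV's actual colors are not reproduced here.
BASE_COLOURS = {
    "A": (255, 0, 0),
    "C": (0, 255, 0),
    "G": (0, 0, 255),
    "T": (255, 255, 0),
}
UNKNOWN_BASE = (128, 128, 128)  # e.g. an 'N' in the read
EMPTY = (0, 0, 0)               # position not covered by this read

# Hypothetical factors that "slightly modify" a base color when the read
# carries a deletion signature.
SPLIT_READ_FACTOR = 0.8
DISCORDANT_PAIR_FACTOR = 0.6


def scale(colour, factor):
    """Scale every channel of an (R, G, B) tuple and clamp to 0..255."""
    return tuple(max(0, min(255, round(c * factor))) for c in colour)


def encode_pileup(reads, width):
    """Encode a pileup window as rows of (R, G, B) pixels, one row per read.

    reads: list of (start, sequence, is_split, is_discordant), where start is
    the read's first aligned column inside the window.
    """
    image = []
    for start, sequence, is_split, is_discordant in reads:
        row = [EMPTY] * width
        for offset, base in enumerate(sequence):
            column = start + offset
            if not 0 <= column < width:
                continue  # this part of the read lies outside the window
            colour = BASE_COLOURS.get(base, UNKNOWN_BASE)
            if is_split:
                colour = scale(colour, SPLIT_READ_FACTOR)
            if is_discordant:
                colour = scale(colour, DISCORDANT_PAIR_FACTOR)
            row[column] = colour
        image.append(row)
    return image
```

Because every read fills its own row, the read depth at a position is simply the number of non-empty pixels in that column.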

Samplot-ML [87] also targets long deletions. It uses Samplot [88] to generate an image of each candidate deletion; as in DeepSV, the image combines read depth, discordant read pair and split read signals.

DeepCNV [89] uses both image data and metadata for CNV detection, generating the image files automatically with the auxiliary visualization program of PennCNV [90]. Each CNV call is represented by a log R ratio (LRR) and a B allele frequency (BAF) scatter plot [91] showing the candidate CNV segment and its flanking regions. The LRR plot shows the SNP genotypes in the region and the BAF plot covers the same region; DeepCNV color-codes the SNPs to distinguish them. Pixel values of these images are normalized to the range 0 to 1. In addition, PennCNV produces a summary of 13 quality-control features, which are also normalized. This approach substantially improves the reliability of CNV calling, reducing both false positives and failures to replicate CNV associations.
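
The two normalization steps can be sketched in Python as follows. The source states only that the pixels and the 13 features are "normalized", so dividing 8-bit pixel values by 255 and min-max scaling each feature across calls are assumptions, and the function names are hypothetical.

```python
def scale_pixels(image, max_value=255):
    """Map 8-bit pixel values to the range [0, 1]."""
    return [[p / max_value for p in row] for row in image]


def min_max_columns(rows):
    """Min-max normalize each feature column of a list of feature vectors.

    Each column is rescaled to [0, 1]; a constant column maps to 0.0 to
    avoid dividing by zero.
    """
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [0.0 if hi == lo else (v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]
```

In this sketch each row would hold the quality-control features of one CNV call; the example below uses three features instead of 13 for brevity.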

CNV-espresso [92] is another CNV method, but it is limited to rare CNVs. It encodes the read depth signal of each candidate CNV as an image in which the x-axis represents the CNV's coordinates in the human genome and the y-axis represents the normalized read depth.
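
One simple way to turn such a depth profile into an image is to fill each column from the bottom up to its normalized depth, as sketched below. The rasterization scheme, the `y_max` cap and the function name are assumptions made for illustration; CNV-espresso's actual rendering may differ.

```python
def depth_to_image(normalised_depth, height=10, y_max=2.0):
    """Binary image where column x is filled from the bottom up to the depth of bin x.

    Depths are clipped to [0, y_max]; row 0 is the top of the image.
    """
    width = len(normalised_depth)
    image = [[0] * width for _ in range(height)]
    for x, depth in enumerate(normalised_depth):
        clipped = min(max(depth, 0.0), y_max)
        level = round(clipped / y_max * height)  # number of filled pixels
        for k in range(level):
            image[height - 1 - k][x] = 1
    return image
```

A heterozygous deletion would then appear as a run of columns filled to roughly half the height of the surrounding columns.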

DeepSVFilter [93] filters SVs in short-read WGS data by encoding SV signals as images, treating SV filtering as a binary classification problem. Each SV breakpoint is represented as a three-dimensional tensor image with read depth, split read and discordant read pair channels, in which pixels covered by a read are encoded as 255 and all others as 0. When a single SV has two breakpoints, a separate image is generated for each and the two are spliced vertically into a complete SV image. Once trained, DeepSVFilter can filter SV call sets produced by any detection method.
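
The sketch below builds such a three-channel binary tensor and splices two breakpoint images vertically. It assumes one row per read, half-open `(start, end)` read intervals in window coordinates and a channel-first layout; these choices and all function names are hypothetical.

```python
def channel_image(intervals, width, height):
    """One binary channel: one row per read, 255 where the read covers a column."""
    image = [[0] * width for _ in range(height)]
    for row, (start, end) in enumerate(intervals[:height]):  # extra reads are dropped
        for column in range(max(0, start), min(width, end)):
            image[row][column] = 255
    return image


def breakpoint_image(depth_reads, split_reads, discordant_reads, width, height):
    """Channel-first 3 x height x width tensor: read depth, split read, discordant pair."""
    return [
        channel_image(reads, width, height)
        for reads in (depth_reads, split_reads, discordant_reads)
    ]


def splice_vertically(first_breakpoint, second_breakpoint):
    """Stack the rows of two breakpoint images, channel by channel."""
    return [top + bottom for top, bottom in zip(first_breakpoint, second_breakpoint)]
```

Splicing keeps the three channels separate and doubles the image height, so the classifier sees both breakpoints of the SV in one input.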

BreakNet [94] is designed to identify deletions in long reads. It divides the reference into subregions and, from the alignment data, builds a feature matrix for each subregion in which each row corresponds to an aligned long read and each column indicates whether that read carries a deletion at the corresponding position. BreakNet sorts the rows by deletion count and keeps the top n; if there are fewer than n rows, the remaining elements are set to 0. Although BreakNet performs consistently on low-coverage data, the features it extracts restrict it to detecting deletions.
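
A sketch of this matrix construction follows. It assumes deletion intervals have already been extracted from the alignments (for example from CIGAR `D` operations) as half-open reference intervals, and it interprets "deletion count" as the number of deleted positions in the subregion; both are assumptions, and the function name is hypothetical.

```python
def deletion_feature_matrix(read_deletions, region_start, region_length, n_rows):
    """Build an n_rows x region_length 0/1 matrix of per-read deletion flags.

    read_deletions: for each read, a list of half-open (start, end) deletion
    intervals in reference coordinates.
    """
    rows = []
    for deletions in read_deletions:
        row = [0] * region_length
        for start, end in deletions:
            for pos in range(max(start, region_start), min(end, region_start + region_length)):
                row[pos - region_start] = 1
        rows.append(row)
    rows.sort(key=sum, reverse=True)  # reads with the most deleted positions first
    rows = rows[:n_rows]              # keep the top n reads
    rows.extend([0] * region_length for _ in range(n_rows - len(rows)))  # zero padding
    return rows
```

Fixing the row count at n gives every subregion the same input shape regardless of local coverage.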

MAMnet [95] detects insertions and deletions by comparing long reads with the reference genome. The reference is split into contiguous subregions, to which the reads are aligned; the average read coverage is calculated for selected subregions, and the reads overlapping each subregion are collected. MAMnet then computes nine features at each base position and averages them, building a signature matrix for every subregion. A logarithmic transformation is applied to this matrix to stabilize deep neural network (DNN) training.
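
The final normalization can be sketched as follows. The source does not spell out the averaging step, so this sketch assumes it means dividing each raw per-base feature by the subregion's mean coverage before applying log(1 + x); the nine feature values are taken as given, and the function name is hypothetical.

```python
import math

N_FEATURES = 9  # MAMnet computes nine features per base position


def signature_matrix(position_features, mean_coverage):
    """position_features: one list of nine raw feature values per base position."""
    if mean_coverage <= 0:
        raise ValueError("mean_coverage must be positive")
    matrix = []
    for features in position_features:
        if len(features) != N_FEATURES:
            raise ValueError(f"expected {N_FEATURES} features, got {len(features)}")
        # Assumed normalization by coverage, then log(1 + x) to compress large values.
        matrix.append([math.log1p(value / mean_coverage) for value in features])
    return matrix
```

Using log(1 + x) rather than log(x) keeps zero-valued features at zero instead of producing negative infinity.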

svBreak [96] identifies the breakpoints of common SV types. It extracts 12 SV-related features at each genome site from the reads aligned to the reference genome. Each feature takes the value 1, 0 or −1, indicating that the site is positive, negative or uncertain for that feature, respectively. svBreak then builds a data matrix in which each row corresponds to a site and each column to a feature. This representation allows seven common types of SV breakpoints to be called and distinguished simultaneously.
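
The site-by-feature matrix can be sketched as below. The 12 feature names are placeholders because the actual features are not listed here, and treating a feature with no evidence at a site as uncertain (−1) is an assumption.

```python
POSITIVE, NEGATIVE, UNCERTAIN = 1, 0, -1  # value encoding as described for svBreak

# Hypothetical placeholder names for the 12 SV-related features.
FEATURES = [f"feature_{i}" for i in range(12)]


def site_feature_matrix(site_calls):
    """One row per site, one column per feature.

    site_calls: list of dicts mapping feature name -> POSITIVE/NEGATIVE/UNCERTAIN.
    Features missing for a site are treated as UNCERTAIN (an assumption).
    """
    matrix = []
    for calls in site_calls:
        row = []
        for name in FEATURES:
            value = calls.get(name, UNCERTAIN)
            if value not in (POSITIVE, NEGATIVE, UNCERTAIN):
                raise ValueError(f"invalid value {value!r} for {name}")
            row.append(value)
        matrix.append(row)
    return matrix
```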

SVision [97] is a deep learning–based multi-object recognition framework that detects and characterizes complex structural variants, which are often missed because they involve multiple breakpoints. Its encoder takes variant feature sequences (VAR), derived from variant-supporting reads, together with the reference genome sequence (REF), and encodes their similarities and dissimilarities into images with three channels representing matched, duplicated and inverted segments. For each VAR and REF pair, the encoder identifies matched and unmatched bases to create a VAR-to-REF image and a REF-to-REF image. Subtracting the REF-to-REF image from the corresponding VAR-to-REF image removes background noise and yields a denoised image that highlights the features of each variant.
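
The denoising idea can be illustrated with k-mer dot plots, as in the sketch below. It builds only the matched and inverted (reverse-complement) channels, omits the duplicated channel, and assumes VAR and REF have equal length so that the two images align cell by cell; SVision's real encoder differs, and all names here are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def similarity_image(query, reference, k):
    """Two-channel k-mer dot plot: forward matches and inverted matches."""
    q, r = kmers(query, k), kmers(reference, k)
    return {
        "matched": [[int(a == b) for b in r] for a in q],
        "inverted": [[int(a == reverse_complement(b)) for b in r] for a in q],
    }


def denoised_image(var_seq, ref_seq, k=3):
    """Subtract the REF-to-REF image from the VAR-to-REF image, clipping at zero."""
    if len(var_seq) != len(ref_seq):
        raise ValueError("this sketch assumes equal-length VAR and REF sequences")
    var_to_ref = similarity_image(var_seq, ref_seq, k)
    ref_to_ref = similarity_image(ref_seq, ref_seq, k)
    return {
        channel: [
            [max(0, v - r) for v, r in zip(v_row, r_row)]
            for v_row, r_row in zip(var_to_ref[channel], ref_to_ref[channel])
        ]
        for channel in var_to_ref
    }
```

For an inversion such as `denoised_image("AAAAGGGG", "AAAACCCC")`, the shared `AAAA` prefix cancels in the matched channel, while the inverted channel keeps the block where `GGGG` matches the reverse complement of `CCCC`.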

Cue [98] detects and genotypes deletions, duplications, inversions, inverted duplications and inversions flanked by deletions longer than 5 kb; the last two are classified as complex SVs. It learns complex mutation patterns directly from data by converting read sequences into images that capture multiple sequence signals between two genome intervals. A pre-trained neural network then generates Gaussian-response confidence maps for each image, which encode the position, type and genotype of the SVs in that image. Finally, Cue refines the high-confidence SV predictions and maps them from image coordinates back to genome coordinates.
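
The confidence-map idea, borrowed from keypoint detection, can be sketched as follows: an SV is rendered as a Gaussian peak, recovered as a local maximum above a threshold, and mapped back to genome coordinates. The single-channel map, the threshold, the fixed `bin_size` and the function names are assumptions; Cue's maps also encode SV type and genotype, which this sketch leaves out.

```python
import math


def gaussian_heatmap(height, width, centre, sigma=1.0):
    """Render a confidence map with a Gaussian peak at centre = (row, col)."""
    centre_row, centre_col = centre
    return [
        [
            math.exp(-((r - centre_row) ** 2 + (c - centre_col) ** 2) / (2 * sigma ** 2))
            for c in range(width)
        ]
        for r in range(height)
    ]


def find_peaks(heatmap, threshold=0.5):
    """Return (row, col) of pixels at or above threshold that are >= all 8 neighbours."""
    height, width = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(height):
        for c in range(width):
            value = heatmap[r][c]
            if value < threshold:
                continue
            neighbours = [
                heatmap[nr][nc]
                for nr in range(max(0, r - 1), min(height, r + 2))
                for nc in range(max(0, c - 1), min(width, c + 2))
                if (nr, nc) != (r, c)
            ]
            if all(value >= n for n in neighbours):
                peaks.append((r, c))
    return peaks


def to_genome(peak, x_start, y_start, bin_size):
    """Map a pixel to genome positions: columns index interval x, rows interval y."""
    row, col = peak
    return x_start + col * bin_size, y_start + row * bin_size
```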

Training set

Training data are vital for machine learning, and higher-quality data generally lead to better performance. For SV calling, however, training data are scarce and no universally accepted gold standard exists, so SV training sets must be constructed. Three main approaches are used: publicly available datasets, artificially generated (synthetic) data and pretraining.

Publicly available datasets are a common source of training data, but their scale and heterogeneity may not cover all SV types and variations. Algorithms that follow this approach include DeepSV [86], DECoNT [99], Samplot-ML [87], TensorSV [100], BreakNet [94] and MAMnet [95]. DeepSV, DECoNT and Samplot-ML use the 1000 Genomes Project dataset [1]. TensorSV uses three datasets, including those of the 1000 Genomes Project and the Human Genome SV project [101]; BreakNet uses read alignment files from four extensively studied individuals; and MAMnet uses six datasets from different sequencing technologies.

Generating synthetic data is another approach, although synthetic data may not fully represent real-world SV diversity. SVision [97], svBreak [96], DeepCNV [89] and Cue [98] use this approach. SVision trains its CNN on a mix of real and simulated SVs to obtain a balanced representation of SV types, with VISOR [102] used to supplement the training data for inversions, duplications and tandem duplications. svBreak trains on simulated SV breakpoints. DeepCNV collects CNV calls produced by other CNV callers on WGS data [6]: raw CNV call files are obtained from the 1000 Genomes FTP site and false positives are filtered out.
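
A minimal sketch of how labelled synthetic SVs can be produced is shown below: one simple SV is applied to a reference sequence and its type and coordinates are returned as the label. This is a generic illustration, not VISOR's interface or any reviewed tool's pipeline; the function names, the restriction to DEL, INV and tandem DUP, and the one-SV-per-sequence design are assumptions. In practice, reads would then be simulated from the altered sequence and aligned back to the reference.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def apply_sv(reference, sv_type, start, length):
    """Return the alternative haplotype after applying one simple SV.

    DEL removes [start, start + length); INV replaces it with its reverse
    complement; DUP inserts a tandem copy directly after it.
    """
    end = start + length
    if not 0 <= start < end <= len(reference):
        raise ValueError("SV interval must lie inside the reference")
    segment = reference[start:end]
    if sv_type == "DEL":
        return reference[:start] + reference[end:]
    if sv_type == "INV":
        return reference[:start] + segment.translate(COMPLEMENT)[::-1] + reference[end:]
    if sv_type == "DUP":
        return reference[:end] + segment + reference[end:]
    raise ValueError(f"unsupported SV type: {sv_type}")


def simulate_labelled_sv(reference, sv_types, length, seed=0):
    """Place one SV of a random type at a random position; return (alt, label)."""
    rng = random.Random(seed)  # a fixed seed keeps the training set reproducible
    sv_type = rng.choice(sv_types)
    start = rng.randrange(0, len(reference) - length + 1)
    alt = apply_sv(reference, sv_type, start, length)
    return alt, {"type": sv_type, "start": start, "end": start + length}
```

Drawing the SV type uniformly from `sv_types` is one simple way to keep the classes balanced, which is the motivation the text gives for mixing simulated and real SVs.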

Pretraining methods first train a model on a large dataset to learn general SV features and then fine-tune it on a smaller, task-specific dataset with limited SV annotations. This mitigates the shortage of training data and can improve accuracy. DeepSVFilter [93] follows this approach: it first identifies SVs in the short-read WGS data of selected samples using advanced SV calling methods [44–46] and then merges all of the SVs into a unified set. CNV-espresso [92] is another example, building its training set from parent–offspring trio exome sequencing data.

In summary, building a comprehensive and representative training set for deep learning–based SV analysis requires careful evaluation of the available data and of the diversity of SV types, and in practice a combination of these approaches may be needed to achieve optimal performance.

Network architecture

Neural networks for SV detection typically use a CNN, an RNN (usually a Bi-LSTM) or a combination of the two. A CNN handles spatial structure well: convolutional and pooling layers extract local features, and fully connected layers combine them. It therefore captures both local patterns and global characteristics automatically, which benefits image classification tasks. A Bi-LSTM captures sequential (temporal) dependencies by considering both preceding and succeeding positions, which is advantageous for modelling features along a genomic sequence. Combining a CNN with a Bi-LSTM extracts both spatial and sequential patterns and can improve classification performance. The same reasoning applies when genomic sequence information is encoded as feature matrices rather than images.
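
The bidirectional idea can be illustrated with a deliberately tiny recurrent model. The sketch below uses a scalar Elman RNN cell, not an LSTM (it has no gates), with hypothetical fixed weights; it only shows how one left-to-right pass and one right-to-left pass give each position access to both upstream and downstream context.

```python
import math


def rnn_pass(inputs, w_x, w_h, bias):
    """Scalar Elman RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + bias), with h_0 = 0."""
    h, states = 0.0, []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + bias)
        states.append(h)
    return states


def bidirectional(inputs, fwd_params=(0.5, 0.8, 0.0), bwd_params=(0.5, 0.8, 0.0)):
    """Pair a left-to-right state with a right-to-left state at every position."""
    forward = rnn_pass(inputs, *fwd_params)
    backward = rnn_pass(inputs[::-1], *bwd_params)[::-1]  # re-align to input order
    return list(zip(forward, backward))
```

Changing only the last input alters the backward state at the first position but leaves the forward state there unchanged, which is the property a Bi-LSTM exploits.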

svBreak [96] employs seven CNN models, each acting as a binary classifier, to detect different types of genomic variation. Each network comprises five convolutional layers and three fully connected layers, with the convolutional layers extracting local features from the input. The activation layers apply the rectified linear unit (ReLU) function [103] as a non-linear mapping of the weighted outputs, and the pooling layers use a 2 × 2 window. Finally, fully connected layers in the output stage integrate the extracted features while minimizing the loss of feature information, enabling accurate detection of genomic variations.
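
The activation and pooling steps can be sketched in a few lines. The source gives only the 2 × 2 pooling size, so max pooling with stride 2 is an assumption, and the function names are hypothetical.

```python
def relu(matrix):
    """Element-wise rectified linear unit."""
    return [[max(0, v) for v in row] for row in matrix]


def max_pool_2x2(matrix):
    """Non-overlapping 2 x 2 max pooling (stride 2); an odd trailing row/column is dropped."""
    return [
        [
            max(matrix[r][c], matrix[r][c + 1], matrix[r + 1][c], matrix[r + 1][c + 1])
            for c in range(0, len(matrix[0]) - 1, 2)
        ]
        for r in range(0, len(matrix) - 1, 2)
    ]
```

Each pooling step halves both dimensions of the feature map, which is what keeps the stack of five convolutional layers computationally manageable.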

DECoNT [99] is a neural network that refines CNV calls and predicts their category (deletion, duplication or no call). A Bi-LSTM with 128 hidden neurons in each direction analyzes the read depth signal. It is followed by a batch normalization layer and two fully connected layers, which receive the Bi-LSTM output together with the previous CNV prediction, encoded as a one-hot vector. The first fully connected layer has 100 neurons with ReLU activation, and the output layer has three neurons with softmax activation that give the probability of each event. A weighted cross-entropy loss is used to train the network for precise CNV prediction, in the same way as for categorical CNV prediction.
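
The input assembly and the loss can be sketched as follows, assuming the class order deletion, duplication, no call, a 256-value Bi-LSTM output (two concatenated 128-unit directions) and a per-example weighted cross-entropy of the form −w[y]·log p[y]. The class weights in the usage below are hypothetical, and frameworks differ in how they average weighted losses over a batch.

```python
import math

CLASSES = ("deletion", "duplication", "no_call")


def one_hot(label):
    """Encode the previous CNV call as a one-hot vector over CLASSES."""
    if label not in CLASSES:
        raise ValueError(f"unknown class: {label}")
    return [1.0 if c == label else 0.0 for c in CLASSES]


def dense_input(bilstm_output, previous_call):
    """Concatenate the Bi-LSTM output with the one-hot previous call."""
    return list(bilstm_output) + one_hot(previous_call)


def softmax(logits):
    m = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(logits, target_index, class_weights):
    """Per-example weighted cross-entropy: -w[y] * log(p[y])."""
    probs = softmax(logits)
    return -class_weights[target_index] * math.log(probs[target_index])
```

Up-weighting the rarer deletion and duplication classes relative to "no call" is the usual reason for using a weighted loss in this setting.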

BreakNet [94] uses CNN and LSTM [104] networks to analyze genomic deletions. The CNN module first downsamples the input matrix with an average pooling layer to reduce the computational load and then applies six convolution blocks, each consisting of a Conv2D layer, a squeeze-and-excitation (SE) optimization layer and a max pooling layer, with the ReLU activation function providing non-linearity. The bidirectional RNN (BRNN) module has two Bi-LSTM layers with 64 LSTM units each, which process the feature vectors from the CNN module and capture information in both directions. Two fully connected layers classify the vectors from the BRNN module, with dropout applied after each layer to improve generalization. The final outputs are computed with the sigmoid function.

MAMnet [95] combines CNN and LSTM models to detect genomic variations. Each variation feature matrix is processed as a single time step and transformed into a vector by the CNN, which stacks repeated conv blocks, each comprising a convolution layer, a max pooling layer, a batch normalization layer and a squeeze-and-excitation (SE) optimization layer [105]. A Bi-LSTM network then captures sequential information across multiple time steps in both directions. Finally, three fully connected layers integrate the information to produce the prediction: the first two are each followed by a dropout layer, and the last uses a sigmoid activation function.
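
The SE layer used by both BreakNet and MAMnet can be sketched for one-dimensional feature maps: a global-average "squeeze", a bottleneck "excitation" that produces one gate per channel, and a rescaling of each channel. Biases are omitted and the weights are passed in by hand purely for illustration; in the networks above they are learned, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def squeeze_excite(feature_maps, w_reduce, w_expand):
    """Squeeze-and-excitation over C channels of 1-D feature maps.

    feature_maps: C lists of activations.
    w_reduce: (C // r) x C weights; w_expand: C x (C // r) weights.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(fm) / len(fm) for fm in feature_maps]
    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid gates in (0, 1).
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w_reduce]
    gates = [sigmoid(sum(w * hj for w, hj in zip(row, hidden))) for row in w_expand]
    # Scale: reweight each channel by its gate.
    return [[gate * v for v in fm] for gate, fm in zip(gates, feature_maps)]
```

The gates let the network emphasize informative feature channels and suppress less useful ones at little extra cost.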

DeepCNV [89] uses a CNN to process the image data and a DNN to process the metadata. The CNN branch is a chain of convolutional layers with 3 × 3 receptive fields and a fixed stride of 1 pixel, using the LeakyReLU [106] activation function and 2 × 2 max pooling. The metadata are modeled by a four-layer DNN. The outputs of the two branches are combined and passed to a fully connected layer of 50 neurons, followed by a single sigmoid output node whose score indicates whether a call is a true or false positive. The model is trained with the RMSprop optimizer.
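
For reference, one RMSprop update can be sketched as follows: a running average of squared gradients scales each parameter's step. The learning rate, decay `rho` and `eps` values are illustrative defaults, not DeepCNV's settings, and the function name is hypothetical.

```python
import math


def rmsprop_step(params, grads, sq_avg, lr=0.001, rho=0.9, eps=1e-8):
    """One RMSprop update, returning (new_params, new_sq_avg).

    sq_avg is a running average of squared gradients; dividing each gradient
    by the root of that average damps steps along steep directions.
    """
    new_sq = [rho * s + (1 - rho) * g * g for s, g in zip(sq_avg, grads)]
    new_params = [p - lr * g / (math.sqrt(s) + eps) for p, g, s in zip(params, grads, new_sq)]
    return new_params, new_sq
```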

DeepSVFilter [93] applies transfer learning, using CNN models pre-trained for image classification on the ImageNet dataset [107–112]. The pre-trained layers are followed by a dropout layer, a fully connected layer and a softmax layer. The network is trained on SV images with the Adam optimizer and a cross-entropy loss that compares the predicted probability with the true class label, and the number of epochs is limited to prevent overfitting. The trained CNN processes candidate SV images and assigns each SV a score between 0 and 1 indicating the likelihood that it is a true SV.

SVision [97] classifies sequence differences in similarity images with a network based on the AlexNet architecture [55], which has five convolutional layers and three fully connected layers; its first layer accepts images of SVision's specific dimensions. The CNN is trained by transfer learning: its parameters are initialized with the best parameter set from the ImageNet competition and then fine-tuned on the training data. Training uses the cross-entropy loss, and the features extracted during training can also be used by traditional machine learning classifiers.

Cue [98] builds on the fourth-order stacked hourglass CNN [113, 114], originally developed for human pose estimation, to consolidate information across multiple scales. The network starts with a convolutional backbone module, followed by four hourglass modules, each comprising residual modules and max-pooling layers for downscaling, as well as upsampling layers and skip connections that restore the output resolution. After each hourglass module, Cue applies intermediate supervision: it generates an intermediate confidence map prediction and computes a loss on it. This allows estimates and features to be iteratively re-evaluated at every scale.
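
Intermediate supervision amounts to summing a loss over the prediction of every hourglass module, as sketched below. The per-map mean squared error is an assumption (it is the usual choice for confidence-map regression in pose estimation), and the function names are hypothetical.

```python
def mse(pred, target):
    """Mean squared error between two equally sized 2-D maps."""
    diffs = [(p - t) ** 2 for p_row, t_row in zip(pred, target) for p, t in zip(p_row, t_row)]
    return sum(diffs) / len(diffs)


def intermediate_supervision_loss(stack_outputs, target):
    """Sum the loss of every hourglass module's prediction against the same target map."""
    return sum(mse(pred, target) for pred in stack_outputs)
```

Because every module is penalized, early modules receive a direct training signal, and later modules can refine their estimates.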

The types of SVs that different methods can detect vary considerably: for instance, DeepSV is designed specifically for long deletions, whereas svBreak can identify seven common types of SV breakpoints. Methods that detect SVs in short-read data are generally applicable to WGS data, and only a few, such as DECoNT and CNV-espresso, are suitable for WES data. In addition, some methods, such as Cue, can perform variant calling and variant type classification on their own, whereas others, such as DeepSVFilter and CNV-espresso, must be combined with upstream variant calling software such as Manta [44], Delly [45] or Lumpy [46].

CONCLUSION

Genomics is experiencing massive data growth across biomedicine and computational biology. Analytical methods have shifted from basic statistics toward machine learning, and in particular deep learning, making analysis more data-driven. This paper reviews 20 deep learning–based methods for genome variation detection. We divide them into small variation and structural variation detection methods and focus on their tensor encoding techniques, training sets and neural network models. Many of these methods represent genetic variations as images, turning variant calling into an image classification problem. This approach may outperform traditional methods and is a promising direction for general SV discovery. Although these methods focus primarily on germline variation detection, they could also be applied to somatic variation detection given appropriate training sets.

Each method has advantages and limitations, and users can choose an appropriate method based on the type of variation to be detected and the fragment size of the data. Variation types fall into two main groups: small variants and SVs. Small variants include SNVs and Indels; most methods detect both, whereas NanoSNP is designed primarily for SNVs. SVs include duplications, deletions, insertions, inversions and other complex structural variants; duplications and deletions are collectively called CNVs. For instance, DeepSV focuses on deletions, DeepCNV on CNVs and SVision on CSVs. By fragment size, data are either short-read or long-read. For SV detection, short-read data should be further distinguished as WGS or WES: for example, DeepSVFilter is used for WGS data and DECoNT for WES data, whereas small-variant methods such as DeepVariant and CNNScoreVariants do not require this distinction. Long-read data need to be distinguished by platform, PacBio or ONT: for instance, BreakNet is mainly used for PacBio data, Clair primarily for ONT data, and MAMnet is suitable for both. When several methods are available for the same variant type and data type, users can choose according to their order in our tables, since methods listed later are typically improvements on earlier ones; for example, Clair is a successor of Clairvoyante, and NanoCaller further improves on Clair. In addition, some methods, such as DeepVariant and Cue, can be used on their own, whereas others, such as DeepSVFilter, CNV-espresso and Samplot-ML, need to be integrated with other tools. We recommend that users choose the method that suits the type of variation and data they need to analyze, using our tables for guidance.
If users have sufficient data, they may consider training models on their own data; if data are insufficient, it is advisable to use well-trained existing models.

Future research may focus on scalable tools that can handle complex variants, adapt to new technologies and integrate different sequencing techniques. Current deep learning methods struggle to handle short and long reads accurately at the same time, so further work should aim to support multiple sequencing platforms. In addition, more features of the alignment information should be explored to improve variant calling accuracy. Accurately identifying multiple types of SV also remains difficult, as no single method can precisely identify all types of genomic variation across all types of sequencing data. Research should therefore aim to develop deep learning methods that precisely detect different types of SVs while accounting for the unique characteristics of each sequencing data type.

Key Points
  • This article reviews recent progress in deep learning methods for genome variant calling, covering SNV and Indel calling as well as SV calling.

  • We explore tensor encoding methods, which mainly include image-based and multi-dimensional tensor-based methods.

  • We analyze the datasets used for variant calling: small variant detection typically uses gold-standard datasets, whereas SV detection datasets often require some form of processing, such as synthetic data generation or pretraining.

  • We study neural network models, which mainly include three types: CNN-based models, RNN-based models and CNN-RNN hybrid models.

FUNDING

This work was supported by the National Natural Science Foundation of China (Project 62072140) and the Heilongjiang Provincial Science and Technology Department (No. 2022ZXJ03C01).

Author Biographies

Ren Junjun is a PhD student at the Harbin Institute of Technology, School of Computer Science and Technology, Harbin, China.

Zhang Zhengqian is a PhD student at the Harbin Institute of Technology, School of Computer Science and Technology, Harbin, China.

Wu Ying is a Master’s student at the Harbin Institute of Technology, School of Computer Science and Technology, Harbin, China.

Wang Jialiang is a Master’s student at the Harbin Institute of Technology, School of Computer Science and Technology, Harbin, China.

Liu Yongzhuang is an associate professor at the Harbin Institute of Technology, School of Computer Science and Technology, Harbin, China.

References

1. Altshuler DM, Durbin RM, Abecasis GR, et al. A global reference for human genetic variation. Nature 2015;526(7571):68–74.

2. Kosugi S, Momozawa Y, Liu XX, et al. Comprehensive evaluation of structural variation detection algorithms for whole genome sequencing. Genome Biol 2019;20:18.

3. Alkan C, Coe BP, Eichler EE. Applications of next-generation sequencing genome structural variation discovery and genotyping. Nat Rev Genet 2011;12(5):363–76.

4. Conrad DF, Pinto D, Redon R, et al. Origins and functional impact of copy number variation in the human genome. Nature 2010;464(7289):704–12.

5. Mills RE, Walter K, Stewart C, et al. Mapping copy number variation by population-scale genome sequencing. Nature 2011;470(7332):59–65.

6. Sudmant PH, Rausch T, Gardner EJ, et al. An integrated map of structural variation in 2,504 human genomes. Nature 2015;526(7571):75–81.

7. Collins RL, Brand H, Redin CE, et al. Defining the diverse spectrum of inversions, complex structural variation, and chromothripsis in the morbid human genome. Genome Biol 2017;18:21.

8. Weischenfeldt J, Symmons O, Spitz F, Korbel JO. Phenotypic impact of genomic structural variation: insights from and for human disease. Nat Rev Genet 2013;14(2):125–38.

9. Macintyre G, Ylstra B, Brenton JD. Sequencing structural variants in cancer for precision therapeutics. Trends Genet 2016;32(9):530–42.

10. Stankiewicz P, Lupski JR. Structural variation in the human genome and its role in disease. Annu Rev Med 2010;61:437–55.

11. Collins RL, Glessner JT, Porcu E, et al. A cross-disorder dosage sensitivity map of the human genome. Cell 2022;185:3041–3055.e25.

12. Dinneen TJ, Ghrálaigh FN, Walsh R, et al. How does genetic variation modify ND-CNV phenotypes? Trends Genet 2022;38(2):140–51.

13. Scott AJ, Chiang C, Hall IM. Structural variants are a major source of gene expression differences in humans and often affect multiple nearby genes. Genome Res 2021;31(12):2249–57.

14. Shastry BS. SNPs in disease gene mapping, medicinal drug development and evolution. J Hum Genet 2007;52(11):871–80.

15. Collins RL, Brand H, Karczewski KJ, et al. A structural variation reference for medical and population genetics. Nature 2020;581:444–51.

16. Redon R, Ishikawa S, Fitch KR, et al. Global variation in copy number in the human genome. Nature 2006;444(7118):444–54.

17. Goodwin S, McPherson JD, McCombie WR. Coming of age: ten years of next-generation sequencing technologies. Nat Rev Genet 2016;17(6):333–51.

18. Nielsen R, Paul JS, Albrechtsen A, Song YS. Genotype and SNP calling from next-generation sequencing data. Nat Rev Genet 2011;12(6):443–51.

19. Li H. Toward better understanding of artifacts in variant calling from high-coverage samples. Bioinformatics 2014;30(20):2843–51.

20. Goldfeder RL, Priest JR, Zook JM, et al. Medical implications of technical accuracy in genome sequencing. Genome Med 2016;8:12.

21. DePristo MA, Banks E, Poplin R, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 2011;43:491–8.

22. Li H. FermiKit: assembly-based variant calling for Illumina resequencing data. Bioinformatics 2015;31(22):3694–6.

23. Chen K, Chen L, Fan X, et al. TIGRA: a targeted iterative graph routing assembler for breakpoint assembly. Genome Res 2014;24(2):310–7.

24. Mahmoud M, Gobet N, Cruz-Dávalos DI, et al. Structural variant calling: the long and the short of it. Genome Biol 2019;20(1):14.

25. Ho SVS, Urban AE, Mills RE. Structural variation in the sequencing era. Nat Rev Genet 2020;21(3):171–89.

26. Jiang Y, Turinsky AL, Brudno M. The missing indels: an estimate of indel variation in a human genome and analysis of factors that impede detection. Nucleic Acids Res 2015;43(15):7217–28.

27. Krumm N, Sudmant PH, Ko A, et al. Copy number variation detection and genotyping from exome sequence data. Genome Res 2012;22(8):1525–32.

28. Ameur A, Kloosterman WP, Hestand MS. Single-molecule sequencing: towards clinical applications. Trends Biotechnol 2019;37(1):72–85.

29. Van Hout CV, Tachmazidou I, Backman JD, et al. Exome sequencing and characterization of 49,960 individuals in the UK biobank. Nature 2020;586(7831):749–56.

30. Ebert P, Audano PA, Zhu QH, et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 2021;372:eabf7117.

31. Sedlazeck FJ, Rescheneder P, Smolka M, et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat Methods 2018;15:461–8.

32. Jiang T, Liu YZ, Jiang Y, et al. Long-read-based human genomic structural variation detection with cuteSV. Genome Biol 2020;21(1):24.

33. Mantere T, Kersten S, Hoischen A. Long-read sequencing emerging in medical genetics. Front Genet 2019;10:14.

34. Hastings PJ, Lupski JR, Rosenberg SM, Ira G. Mechanisms of change in gene copy number. Nat Rev Genet 2009;10(8):551–64.

35. Logsdon GA, Vollger MR, Eichler EE. Long-read human genome sequencing and its applications. Nat Rev Genet 2020;21(10):597–614.

36. Eid J, Fehr A, Gray J, et al. Real-time DNA sequencing from single polymerase molecules. Science 2009;323(5910):133–8.

37. Branton D, Deamer DW, Marziali A, et al. The potential and challenges of nanopore sequencing. Nat Biotechnol 2008;26(10):1146–53.

38. Sedlazeck FJ, Lee H, Darby CA, Schatz MC. Piercing the dark matter: bioinformatics of long-range sequencing and mapping. Nat Rev Genet 2018;19(6):329–46.

39. Huddleston J, Chaisson MJP, Steinberg KM, et al. Discovery and genotyping of structural variation from long-read haploid genome sequence data. Genome Res 2017;27(5):677–85.

40. Jain M, Koren S, Miga KH, et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat Biotechnol 2018;36:338–45.

41. Chen Y, Nie F, Xie SQ, et al. Efficient assembly of nanopore reads via highly accurate and intact error correction. Nat Commun 2021;12(1):10.

42. Ni P, Huang N, Nie F, et al. Genome-wide detection of cytosine methylations in plant from Nanopore data using deep learning. Nat Commun 2021;12(1):11.

43. Wenger AM, Peluso P, Rowell WJ, et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat Biotechnol 2019;37:1155–62.

44. Chen XY, Schulz-Trieglaff O, Shaw R, et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 2016;32(8):1220–2.

45. Rausch T, Zichner T, Schlattl A, et al. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 2012;28(18):i333–9.

46. Layer RM, Chiang C, Quinlan AR, Hall IM. LUMPY: a probabilistic framework for structural variant discovery. Genome Biol 2014;15(6):R84.

47. Wala JA, Bandopadhayay P, Greenwald NF, et al. SvABA: genome-wide detection of structural variants and indels by local assembly. Genome Res 2018;28(4):581–91.

48. Nagasaki M, Yasuda J, Katsuoka F, et al. Rare variant discovery by deep whole-genome sequencing of 1,070 Japanese individuals. Nat Commun 2015;6:13.

49. Robinson JT, Thorvaldsdóttir H, Wenger AM, et al. Variant review with the integrative genomics viewer. Cancer Res 2017;77(21):e31–4.

50. Quinlan AR, Hall IM. Characterizing complex structural variation in germline and somatic genomes. Trends Genet 2012;28(1):43–53.

51. Michaelson JJ, Sebat J. forestSV: structural variant discovery through statistical learning. Nat Methods 2012;9:819–21.

52. Antaki D, Brandler WM, Sebat J. SV2: accurate structural variation genotyping and de novo mutation detection from whole genomes. Bioinformatics 2018;34(10):1774–7.

53. Breiman L. Random forests. Machine Learning 2001;45(1):5–32.

54. Cortes C, Vapnik V. Support-vector networks. Machine Learning 1995;20(3):273–97.

55. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Communications of the ACM 2017;60(6):84–90.

56. Wu Y, Schuster M, Chen Z, et al. Google's neural machine translation system: bridging the gap between human and machine translation. arXiv:1609.08144, 2016.

57. Silver D, Huang A, Maddison CJ, et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016;529:484–9.

58. Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning. Nature 2015;518(7540):529–33.

59. Min S, Lee B, Yoon S. Deep learning in bioinformatics. Brief Bioinform 2017;18(5):851–69.

60. Alipanahi B, Delong A, Weirauch MT, Frey BJ. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol 2015;33:831–8.

61. Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 2015;12(10):931–4.

62. Xiong HY, Alipanahi B, Lee LJ, et al. The human splicing code reveals new insights into the genetic determinants of disease. Science 2015;347(6218):9.

63. Poplin R, Chang PC, Alexander D, et al. A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol 2018;36:983–7.

64. Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 2008;18(5):821–9.

65. Tang M, Hasan MS, Zhu HX, et al. Vi-HMM: a novel HMM-based method for sequence variant identification in short-read data. Hum Genomics 2019;13:12.

66. Smith TF, Waterman MS. Identification of common molecular subsequences. J Mol Biol 1981;147(1):195–7.

67. Friedman S, Gauthier L, Farjoun Y, Banks E. Lean and deep models for more accurate filtering of SNP and INDEL variant calls. Bioinformatics 2020;36(7):2060–7.

68. Luo RB, Sedlazeck FJ, Lam TW, et al. A multi-task convolutional deep neural network for variant calling in single molecule sequencing. Nat Commun 2019;10:11.

69. Luo RB, Wong CL, Wong YS, et al. Exploring the limit of using a deep neural network on pileup data for germline variant calling. Nature Machine Intelligence 2020;2(4):220–7.

70. Ahsan MU, Liu Q, Fang L, Wang K. NanoCaller for accurate detection of SNPs and indels in difficult-to-map regions from long-read sequencing by haplotype-aware deep neural networks. Genome Biol 2021;22(1):33.

71. Shafin K, Pesout T, Chang PC, et al. Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads. Nat Methods 2021;18:1322–32.

72. Zheng ZX, Li SM, Su JH, et al. Symphonizing pileup and full-alignment for deep learning-based long-read variant calling. Nature Computational Science 2022;2:797–803.

73. Huang N, Xu MH, Nie F, et al. NanoSNP: a progressive and haplotype-aware SNP caller on low-coverage nanopore sequencing data. Bioinformatics 2023;39(1):9.

74. Wagner J, Olson ND, Harris L, et al. Benchmarking challenging small variants with linked and long reads. Cell Genomics 2022;2(5):100128.

75. Olson ND, Wagner J, McDaniel J, et al. PrecisionFDA Truth Challenge V2: calling variants from short and long reads in difficult-to-map regions. Cell Genomics 2022;2(5):100129.

76. Zook JM, Salit M. Genomes in a bottle: creating standard reference materials for genomic variation—why, what and how? Genome Biol 2011;12:18–8.

77. Zook JM, Chapman B, Wang J, et al. Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nat Biotechnol 2014;32(3):246–51.

78. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521(7553):436–44.

79. Zhang QR, Zhang M, Chen TH, et al. Recent advances in convolutional neural network acceleration. Neurocomputing 2019;323:37–51.

80. Alom MZ, Taha TM, Yakopcic C, et al. A state-of-the-art survey on deep learning theory and architectures. Electronics 2019;8(3):66.

81. Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Seattle, WA: IEEE, 2016, pp. 2818–26.

82. Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Bach F, Blei D (eds). 32nd International Conference on Machine Learning. Lille, France: Journal of Machine Learning Research, 2015, pp. 448–56.

83. Szegedy C, Liu W, Jia YQ, et al. Going deeper with convolutions. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, MA: IEEE, 2015, pp. 1–9.

84. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9(8):1735–80.

85. He KM, Zhang XY, Ren SQ, Sun J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 2015;37(9):1904–16.

86. Cai L, Wu YF, Gao JY. DeepSV: accurate calling of genomic deletions from high-throughput sequencing data using deep convolutional neural network. BMC Bioinformatics 2019;20(1):17.

87. Chowdhury M, Layer RM. Learning what a good structural variant looks like. bioRxiv 2020.

88. Belyeu JR, Chowdhury M, Brown J, et al. Samplot: a platform for structural variant visual validation and automated filtering. Genome Biol 2021;22(1):13.

89. Glessner JT, Hou XR, Zhong C, et al. DeepCNV: a deep learning approach for authenticating copy number variations. Brief Bioinform 2021;22(5):10.

90. Wang K, Li MY, Hadley D, et al. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res 2007;17(11):1665–74.

91. Lima LD, Wang K. PennCNV in whole-genome sequencing data. BMC Bioinformatics 2017;18:8.

92. Tan RJ, Shen YF. Accurate in silico confirmation of rare copy number variant calls from exome sequencing data using transfer learning. Nucleic Acids Res 2022;50(21):8.

93. Liu YZ, Huang YL, Wang GH, Wang Y. A deep learning approach for filtering structural variants in short read sequencing data. Brief Bioinform 2021;22(4):9.

94. Luo JW, Ding HY, Shen JQ, et al. BreakNet: detecting deletions using long reads and a deep learning approach. BMC Bioinformatics 2021;22(1):13.

95. Ding HY, Luo JW. MAMnet: detecting and genotyping deletions and insertions based on long reads and a deep learning approach. Brief Bioinform 2022;23(5):10.

96. Wang SQ, Li J, Haque AKA, et al. svBreak: a new approach for the detection of structural variant breakpoints based on convolutional neural network. Biomed Res Int 2022;2022:1–8.

97. Lin JD, Wang SB, Audano PA, et al. SVision: a deep learning approach to resolve complex structural variants. Nat Methods 2022;19:1230–3.

98. Popic V, Rohlicek C, Cunial F, et al. Cue: a deep-learning framework for structural variant discovery and genotyping. Nat Methods 2023;20:559–68.

99. Özden F, Alkan C, Çiçek AE. Polishing copy number variant calls on exome sequencing data via deep learning. Genome Res 2022;32(6):1170–82.

100. Becker TJ, Shin DG. TensorSV: structural variation inference using tensors and variable topology neural networks. In: Park T, Cho YR, Hu X, et al. (eds). IEEE International Conference on Bioinformatics and Biomedicine (IEEE BIBM). Electronic network: IEEE Computer Society, 2020, pp. 1356–60.

101. Chaisson MJP, Sanders AD, Zhao XF, et al. Multi-platform discovery of haplotype-resolved structural variation in human genomes. Nat Commun 2019;10:16.

102. Bolognini D, Sanders A, Korbel JO, et al. VISOR: a versatile haplotype-aware structural variant simulator for short- and long-read sequencing. Bioinformatics 2020;36(4):1267–9.

103. Yarotsky D. Error bounds for approximations with deep ReLU networks. Neural Netw 2017;94:103–14.

104. Sherstinsky A. Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Physica D 2020;404:132306.

105. Hu J, Shen L, Albanie S, et al. Squeeze-and-excitation networks. IEEE Trans Pattern Anal Mach Intell 2020;42(8):2011–23.

106. Anthimopoulos M, Christodoulidis S, Ebner L, et al. Lung pattern classification for interstitial lung diseases using a deep convolutional neural network. IEEE Trans Med Imaging 2016;35(5):1207–16.

107. Deng J, Dong W, Socher R, et al. ImageNet: a large-scale hierarchical image database. In: IEEE-Computer-Society Conference on Computer Vision and Pattern Recognition Workshops. Miami Beach, FL: IEEE, 2009, pp. 248–55.

108. Szegedy C, Ioffe S, Vanhoucke V, Alemi A. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: 31st AAAI Conference on Artificial Intelligence. Vol. 31. San Francisco, CA: Association for the Advancement of Artificial Intelligence, 2017, pp. 4278–84.

109. Sandler M, Howard A, Zhu ML, et al. MobileNetV2: inverted residuals and linear bottlenecks. In: 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, UT: IEEE, 2018, pp. 4510–20.

110. Zoph B, Vasudevan V, Shlens J, et al. Learning transferable architectures for scalable image recognition. In: 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, UT: IEEE, 2018, pp. 8697–710.

111. Liu CX, Zoph B, Neumann M, et al. Progressive neural architecture search. In: Ferrari V, Hebert M, Sminchisescu C, et al. (eds). 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer International Publishing AG, 2018, pp. 19–35.

112. Howard AG, Zhu M, Chen B, et al. MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861, 2017.

113. Newell A, Yang KU, Deng J. Stacked hourglass networks for human pose estimation. In: Leibe B, Matas J, Sebe N, et al. (eds). 14th European Conference on Computer Vision (ECCV). Amsterdam: Springer International Publishing AG, 2016, pp. 483–99.

114. Newell A, Huang Z, Deng J. Associative embedding: end-to-end learning for joint detection and grouping. In: Guyon I, Luxburg UV, Bengio S, et al. (eds). 31st Annual Conference on Neural Information Processing Systems (NIPS). Long Beach, CA: Neural Information Processing Systems (NIPS), 2017.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/pages/standard-publication-reuse-rights)