-
PDF
- Split View
-
Views
-
Cite
Cite
Yasuto Ishii, Atsushi Toyoda, Alec Lewis, Angus Davison, Osamu Miura, Kazuki Kimura, Satoshi Chiba, Chromosome-Level Genome Assembly of the Asian Tramp Snail Bradybaena similaris (Stylommatophora: Camaenidae), Genome Biology and Evolution, Volume 17, Issue 5, May 2025, evaf070, https://doi.org/10.1093/gbe/evaf070
- Share Icon Share
Abstract
While terrestrial land snails have long been used for evolutionary research, a lack of high-quality genomic resources has impeded recent progress. Bradybaena snails in particular have numerous intriguing traits that make them a good model for studying evolution, including shell pattern polymorphism and convergent evolution. They are also introduced and invasive across the world. In this study, we present a chromosome-level genome assembly of the Asian tramp snail Bradybaena similaris, utilizing 88-fold Illumina short-read sequences, 125-fold Nanopore long-read sequences, 63-fold PacBio HiFi sequences, and 47-fold Hi-C sequences. The assembled genome of 2.18 Gb is anchored to 28 chromosomes and exhibits high completeness (single copy, 91.7%; duplicates, 7.1%) and contiguity (N50 of 75.6 Mb). Additionally, we also obtained a high-quality transcriptome for annotation. This resource represents the first chromosome-level assembly for snails in the superfamily Helicoidea, which includes more than 5,000 species of terrestrial snails, and will facilitate genomic study in Bradybaena and, more broadly, in the superfamily Helicoidea.
While the Helicoidea is the largest land snail superfamily, consisting of more than 5,000 species, many of interest to evolutionary studies, no chromosome-level assembly is available for any species. Previously, the genus Bradybaena in the Helicoidea has been studied in the context of speciation, adaptation, and invasive biology and thereby has high potential for further research. In this study, we present a chromosome-level assembly and transcriptome of the Asian tramp snail Bradybaena similaris. These high-quality genomic resources will facilitate research on related species and eventually enhance our understanding of many areas of evolutionary biology.
Introduction
Terrestrial snails are relevant to many scientific fields. In the context of invasion biology, they are recognized as both agricultural and public health pests, leading to extensive research on their control (Hirano et al. 2020; Yonow et al. 2023). Unfortunately, in conservation biology, terrestrial snails are recognized as holding a position of significant risk (Chiba and Cowie 2016), representing the animal group with the most extinctions since the 16th century, with many species now requiring conservation efforts (Lydeard et al. 2004). Terrestrial snails also often serve as indicators of environmental pollution and so have been used in the study of anthropogenic impacts on the environment (Baroudi et al. 2020).
Terrestrial snails are a key group also in evolutionary biology (Chiba and Cowie 2016), especially species in the Helicoidea, the most speciose land snail superfamily, including more than 5,000 species (MolluscaBase 2024, accessed on 4th September 2024). For example, the genera Satsuma and Euhadra have been studied with the potential to understand speciation and also the origins of variation in left-right asymmetry (Ueshima and Asami 2003; Davison et al. 2005; Hoso et al. 2010; Richards et al. 2017). Also, research on Cepaea, Theba, and Euhadra continues to provide insight in understanding the evolution and genetics of shell color (Knigge et al. 2017; Ito et al. 2023; Johansen et al. 2023; Chowdhury et al. 2024). For another instance, in studies of behavioral evolution, Cornu and Euhadra have been major subjects of research on sexual conflict and mating behavior (Koene 2006; Kimura et al. 2013).
Despite these advantages as excellent model systems, genomic research has been scarce in terrestrial snails, partly due to a lack of high-quality genome assemblies (Davison and Neiman 2021; Linscott et al. 2022). As of now, genome assemblies are available for only eight land snail and slug species (supplementary table S1, Supplementary Material online). Among these, only the four genera, including Achatina, Arion, Megaustenia, and Meghimatium, have chromosome-level assemblies (Guo et al. 2019; Liu et al. 2021; Chen et al. 2022; Chetruengchai et al. 2024; Sun et al. 2024). Only two genome assemblies have been formally published for Helicoidea (Candidula: Chueca et al. 2021; Cepaea: Saenko et al. 2021), and no chromosome-level assembly has been obtained.
Bradybaena snails in the Helicoidea are particularly intriguing in the context of speciation and adaptation, where similar shell morphologies have evolved independently (Hirano et al. 2014, 2019). As shell morphology is a key trait for assortative mating (Kimura et al. 2015), then this may lead to parallel speciation. In the context of adaptation, the shell color and band pattern of Bradybaena snails are also important. Bradybaena snails have polymorphism in shell color and band pattern, with the variation likely maintained through balancing selection (Komai and Emura 1955). Each phenotype is determined by a single locus in the form of a supergene (Komai and Emura 1955), a concept that has recently received renewed attention (Berdan et al. 2022). Also, Bradybaena snails are also highly invasive around the world. Bradybaena similaris, native to East and Southeast Asia, has been introduced to every continent except Antarctica (Serniotti et al. 2019). This species damages not only crop but also hosts parasites, some of which can cause diseases in humans and livestock (Serniotti et al. 2019). Genomic resources could support further understanding of evolutionary innovations that make these invasive species and by facilitating the development of control strategies (North et al. 2021).
Here, we represent a chromosome-level genome assembly of B. similaris. Using high-coverage Nanopore long reads as well as PacBio HiFi reads, we successfully assembled a high-quality genome. Additionally, we obtained a high-quality transcriptome using newly sequenced RNA-seq data for annotation. The value in these genomic resources is their potential to facilitate research on Bradybaena snails, and more generally helicoidean snails, that can greatly benefit our understanding of multiple topics within evolutionary biology.
Results and Discussion
Sequencing yielded 192 Gb Illumina paired short-read sequences, 272 Gb Nanopore long-read sequences, 136 Gb PacBio HiFi sequences, and 101 Gb Hi-C library sequences (supplementary table S2, Supplementary Material online). Genome size and heterozygosity estimated by a k-mer-based method were 1,872,818,176 bp and 1.38%, respectively (Fig. 1b; Ranallo-Benavidez et al. 2020). An initial assembly generated by hifiasm (Cheng et al. 2022) had a size of 2,423,902,034 bp and consisted of 554 contigs (supplementary table S3, Supplementary Material online). After removing haplotigs, the assembly contained 279 contigs with the size of 2,289,349,004 bp and N50 of 17,829,327 bp (supplementary table S3, Supplementary Material online). Haplotig purging reduced the duplicated BUSCOs by about half (supplementary table S3, Supplementary Material online). This assembly was then scaffolded using Hi-C reads, producing a final assembly of 2,289,373,504 bp, comprising 82 scaffolds (Table 1). Some mis-joined chromosomal scaffolds were observed (supplementary fig. S1, Supplementary Material online), and they were split manually. Of these, 54 scaffolds (containing 270 contigs; 2,271,669,131 bp) were anchored onto 28 megascaffolds (Fig. 1c; supplementary table S4, Supplementary Material online), which corresponds to the haploid chromosome number of the species (Inaba 1959). The unanchored scaffolds are size of 17,704,373 bp and consisted of 57 contigs. Telomeric sequences were found in all but one of the chromosomal scaffolds; 15 and 12 chromosomal scaffolds had telomeric sequences on both sides and on one side, respectively (supplementary table S4, Supplementary Material online). The assembled genome size was larger than the genome size estimated using the k-mer method. This discrepancy has been observed in other large-genome organisms and is likely due to a high proportion of repeats (Pfenninger et al. 2022; Miura et al. 2023). The assembled genome has a higher contiguity and completeness than that of the other terrestrial snails (supplementary table S1, Supplementary Material online). The relatively high rate of BUSCO duplication (Table 1) should be a characteristic of land snails. The similar trend (duplication rate > 5%) is observed in land snail genomes, such as Candidula, 7.1%; Cepaea, 15.1%; Achatina, 6.9%; Arion, 6%; and Oreohelix, 9.1% (Guo et al. 2019; Chueca et al. 2021; Saenko et al. 2021; Chen et al. 2022; Linscott et al. 2022). The present study used metazoa_odb10 as a reference which is based on OrthoDB v10 (Kriventseva et al. 2019). OrthoDB v10 includes no stylommatophoran land snail genome (https://v10-1.orthodb.org/, accessed on 2025 March 29). Besides, stylommatophoran land snails experienced a whole-genome duplication event (Liu et al. 2021; Chen et al. 2022). Hence, the high rate of duplication could be explained by the taxon-specific duplication event. Future research should explore what core genes are duplicated in land snail genomes.

Overview of the results. a) Photograph of B. similaris. b) GenomeScope k-mer profile plot for B. similaris, displaying observed k-mer frequency (bar plot), full model fitted by GenomeScope (black line). len, estimated genome size; uniq, the proportion of non-repetitive sequences; aa, homozygosity rate; ab, heterozygosity rate; kcov, mean k-mer coverage for heterozygous bases; err, error rates of the reads; dup, average rate of read duplications; k, k-mer size; p, ploidy. c) Hi-C contact map for B. similaris with chromosomal scaffolds (blue boxes). Darker red indicates higher contact density. d) Comparison of OMArk statistics for Panpulmonata snails. All data were sourced from OMArk web server. The shown species and accession numbers (in parentheses) are the following: Biomphalaria glabrata (BglaB1, UP001165740, and GCF_947242115.1), Biomphalaria pfeifferi (GCA_030265305.1), B. similaris (this study), Bulinus truncatus (GCA_021962125.1), C. unifasciata (UP000678393 and GCA_905116865.2), Elysia chlorotica (UP000271974 and GCA_003991915.1), Elysia crispate (GCA_033675545.1), Elysia marginata (GCA_019649035.1), Lymnaea stagnalis (GCA_964033795.1), Physella acuta (GCF_028476545.1), and Plakobranchus ocellatus (GCA_019648995.1).
The statistics for the genome assembly and annotation for B. similaris, after removing contaminants
Analysis . | Statistics . | Value . |
---|---|---|
Assembly | Assembled genome size | 2,289,373,504 |
Number of contigs | 327 | |
Number of scaffolds | 82 | |
Contig N50 (Mb) | 17.8 | |
Contig N90 (Mb) | 5.0 | |
Scaffold N50 (Mb) | 75.6 | |
Scaffold N90 (Mb) | 56.1 | |
BUSCO completeness (%) | 98.8 | |
Single copy (%) | 91.7 | |
Duplicated (%) | 7.1 | |
Fragmented (%) | 0.3 | |
Missing (%) | 0.9 | |
Annotation | # of genes | 29,226 |
# of annotated genes | 25,062 | |
Annotated using: | ||
SwissProt | 16,107 | |
TrEMBL | 24,313 | |
EggNOG | 22,863 | |
RefSeq invertebrate | 23,452 | |
BUSCO completeness (%) | 98.5 | |
Single copy (%) | 91.8 | |
Duplicated (%) | 6.7 | |
Fragmented (%) | 0.4 | |
Missing (%) | 1.1 |
Analysis . | Statistics . | Value . |
---|---|---|
Assembly | Assembled genome size | 2,289,373,504 |
Number of contigs | 327 | |
Number of scaffolds | 82 | |
Contig N50 (Mb) | 17.8 | |
Contig N90 (Mb) | 5.0 | |
Scaffold N50 (Mb) | 75.6 | |
Scaffold N90 (Mb) | 56.1 | |
BUSCO completeness (%) | 98.8 | |
Single copy (%) | 91.7 | |
Duplicated (%) | 7.1 | |
Fragmented (%) | 0.3 | |
Missing (%) | 0.9 | |
Annotation | # of genes | 29,226 |
# of annotated genes | 25,062 | |
Annotated using: | ||
SwissProt | 16,107 | |
TrEMBL | 24,313 | |
EggNOG | 22,863 | |
RefSeq invertebrate | 23,452 | |
BUSCO completeness (%) | 98.5 | |
Single copy (%) | 91.8 | |
Duplicated (%) | 6.7 | |
Fragmented (%) | 0.4 | |
Missing (%) | 1.1 |
The statistics for the genome assembly and annotation for B. similaris, after removing contaminants
Analysis . | Statistics . | Value . |
---|---|---|
Assembly | Assembled genome size | 2,289,373,504 |
Number of contigs | 327 | |
Number of scaffolds | 82 | |
Contig N50 (Mb) | 17.8 | |
Contig N90 (Mb) | 5.0 | |
Scaffold N50 (Mb) | 75.6 | |
Scaffold N90 (Mb) | 56.1 | |
BUSCO completeness (%) | 98.8 | |
Single copy (%) | 91.7 | |
Duplicated (%) | 7.1 | |
Fragmented (%) | 0.3 | |
Missing (%) | 0.9 | |
Annotation | # of genes | 29,226 |
# of annotated genes | 25,062 | |
Annotated using: | ||
SwissProt | 16,107 | |
TrEMBL | 24,313 | |
EggNOG | 22,863 | |
RefSeq invertebrate | 23,452 | |
BUSCO completeness (%) | 98.5 | |
Single copy (%) | 91.8 | |
Duplicated (%) | 6.7 | |
Fragmented (%) | 0.4 | |
Missing (%) | 1.1 |
Analysis . | Statistics . | Value . |
---|---|---|
Assembly | Assembled genome size | 2,289,373,504 |
Number of contigs | 327 | |
Number of scaffolds | 82 | |
Contig N50 (Mb) | 17.8 | |
Contig N90 (Mb) | 5.0 | |
Scaffold N50 (Mb) | 75.6 | |
Scaffold N90 (Mb) | 56.1 | |
BUSCO completeness (%) | 98.8 | |
Single copy (%) | 91.7 | |
Duplicated (%) | 7.1 | |
Fragmented (%) | 0.3 | |
Missing (%) | 0.9 | |
Annotation | # of genes | 29,226 |
# of annotated genes | 25,062 | |
Annotated using: | ||
SwissProt | 16,107 | |
TrEMBL | 24,313 | |
EggNOG | 22,863 | |
RefSeq invertebrate | 23,452 | |
BUSCO completeness (%) | 98.5 | |
Single copy (%) | 91.8 | |
Duplicated (%) | 6.7 | |
Fragmented (%) | 0.4 | |
Missing (%) | 1.1 |
The proportion of repeat content was estimated to be 61.28% (1.40 Gb), with long interspersed nuclear elements (LINEs) constituting a significant portion of the repeat content (19.51%; details are shown in supplementary table S5, Supplementary Material online). The proportion of LINEs is smaller than those of some terrestrial snails and slugs: Candidula unifasciata, 25.0% (Chueca et al. 2021); Cepaea nemoralis, 30.5% (Saenko et al. 2021); and Meghimatium bilineatum, 34.1% (Sun et al. 2024).
The genome was annotated using BRAKER3 (Gabriel et al. 2024), with 87.91 Gb RNA-seq data and OrthoDB 11 (Kuznetsov et al. 2023). Supplementary table S6, Supplementary Material online, shows the mapping rates for ten RNA-seq data used in this analysis. The number of predicted genes was 29,227. Functional annotation and contaminant detection were performed for the predicted genes using EnTAP (Hart et al. 2020). Any gene derived from bacteria was regarded as a putative contaminant. Using four protein sequence databases as a reference (SwissProt, TrEMBL, EggNOG, RefSeq invertebrate), a total of 25,062 genes were functionally annotated (Table 1). Among the annotated genes, 16,631 genes had Gene Ontology terms contained in EggNOG (Table 1). Only one gene was identified as a putative contaminant. Homology search was performed for this putative contaminant gene. The most homologous sequence was sourced from Candidula land snail (e-value = 3.16e-40), the second one was Bulinus freshwater snail (e-value = 1.97e-27), and the third one was Physella freshwater snail (e-value = 4.77e-26). Given this gene is found also in other molluscan genomes, we regarded that this gene is not a contaminant. The number of genes in the transcriptome is comparable to those of other land snails (supplementary table S5, Supplementary Material online).
Almost all metazoan core genes were found in the assembly and the transcriptome (98.8% and 98.5%, respectively; Table 1). The quality of the transcriptome was also assessed with OMArk (Nevers et al. 2024), which quantifies not only the completeness as BUSCO does (i.e. measuring how many expected genes are present in the genome) but also the consistency (i.e. the proportion of protein sequences correctly assigned to known gene families within the same lineage: Nevers et al. 2024). According to OMArk, 56.39% of the transcriptome was classified as “consistent,” 11.24% as “inconsistent,” and 32.37% as “unknown.” No gene was classified as “contaminant.” Although the relatively high proportion of inconsistent genes might suggest that the genome annotation can be improved, the quality was higher than that of most other Panpulmonata proteomes (i.e. higher “consistent” rate and lower “inconsistent” or “unknown” rate; Fig. 1d). The lack of model species in related taxa (Davison and Neiman 2021) may lead the relatively high proportion of inconsistent and unknown genes, as similar trends have been observed in other Panpulmonata proteomes (Fig. 1d).
Conclusions
We presented a chromosome-level genome assembly for B. similaris utilizing Illumina paired-read sequences, Nanopore long-read sequences, HiFi long-read sequences, and Hi-C sequences. The assembled genome was 2.18 Gb in size, with N50 of 75.6 Mb and 98.8% metazoan BUSCO completeness (single copy, 91.7%; duplicates, 7.1%). This quality is much higher than that of the other published terrestrial snail genomes. The estimated transcriptome was also high-quality, containing 98.5% metazoan BUSCO (single copy, 91.8%; duplicates, 6.7%). These high-quality genome resources will accelerate the genomic study of terrestrial snails, particularly Bradybaena snails.
Materials and Methods
Sampling, DNA Extraction, and Sequencing
Three adult B. similaris individuals were collected for genome sequencing in Sendai, Miyagi, Japan (38.259661N 140.861679E) in September 2023 (Fig. 1a; supplementary table S2, Supplementary Material online). No permission is required to collect this species. All snail shells were pale-colored and lacked bands. These three individuals were separately used for short-read sequencing, long-read sequencing, and Hi-C sequencing (supplementary table S2, Supplementary Material online). DNA was extracted from a piece of foot using Nanobind Tissue Big DNA Kit (PacBio, US) for short-read sequencing and QIAGEN Genomic-tip 100/G (QIAGEN, Germany) for long-read sequencing.
An Illumina short-read library was prepared with a TruSeq DNA PCR-Free Library Prep Kit (Illumina, US). Two runs of 150 bp paired-end sequencing were then performed on a NovaSeq 6000 (Illumina) using the NovaSeq 6000 SP Reagent Kit v1.5 (Illumina). The Nanopore reads were obtained using a PromethION, a Ligation Sequencing Kit V14, and a R10.4.1 flowcell (Nanopore, UK). The PacBio library was prepared using a SMRTbell prep kit 3.0 and sequenced on both Sequel II and Sequel IIe platforms, using a Sequel II SMRT Cell 8 M along with a Sequel II Binding Kit 3.2 and Sequel II Sequencing Kit 2.0 (PacBio). These library preparations and sequencings were performed at the National Institute of Genetics Japan (NIG, Japan). An Hi-C library was prepared using Proximo Hi-C kit (Animal) following the manufacture's instruction. Sequencing was conducted on a DNBseq-g400 (MGI, China) at BGI.
RNA-seq was performed for annotation. An additional individual was collected at Sendai, Miyagi, Japan (38.259550N 140.850820E; sample ID: MNKS 6468). Tissue of the dart sac was preserved in RNAlater (QIAGEN) until extraction. Total RNA was extracted using ISOGEN (NIPPON GENE, Japan). Library preparation and sequencing were performed at BGI. DNBseq-g400 was utilized for 150 bp paired-end sequencing.
De Novo Genome Assembly
De novo assembly was initially performed with Hifiasm 0.19.5-r587 (Cheng et al. 2022) using Nanopore as well as HiFi reads. Haplotigs were purged by purge_dups 1.2.5 (Guan et al. 2020), applying the “-e” option in the get_seqs module. The Hifiasm assembly was then used as input for Hi-C scaffolding. Hi-C reads were mapped onto the assembly using bwa 0.7.17-r1188 (Li, 2013) and samtools 1.17 (Danecek et al 2021). Then, the mapped reads were used for Hi-C scaffolding using YaHS 1.2 (Zhou et al. 2023). The contact map was visualized, and four pairs of scaffolds were clearly mis-joined into four chromosomes each, so these scaffolds were manually split. The final assembly was generated using Juicer 1.2 (Durand et al. 2016), Juicer tools 1.9.9, and JuiceBox 2.15.0.0 (Robinson et al. 2018).
Genome features, including genome size, heterozygosity, and repetitiveness, were estimated using GenomeScope 2.0 (Ranallo-Benavidez et al. 2020). The canonical k-mer distribution was produced from the short-read data with KMC 3.1.1 (Kokot et al. 2017) with k-mer length and maximum coverage set as 31 and 10,000, respectively.
Genome Annotation
Prior to genome annotation, the repeat content was masked. RepeatModeler 2.0.4 (Flynn et al. 2020) was used to build a species-specific repeat content library. Using this library, the repeat content was masked using RepeatMasker 4.1.5 (http://www.repeatmasker.org). Both software tools were implemented in TETools 1.8 (https://github.com/Dfam-consortium/TETools).
Gene prediction was performed for the soft-masked sequence using BRAKER 3.0.8 (Gabriel et al. 2024). RNA-seq data (available at NCBI under accession numbers SRR8040510–SRR8040517, SRR6981555, and newly obtained data) and protein data (molluscan proteomes in OrthoDB 11; Kuznetsov et al. 2023) were used as reference data. The tissue for RNA-seq was sourced from whole body, embryo, digestive gland, and dart sac, among which dart sac data are newly obtained in this study. Functional annotation and contaminant detection for the predicted genes were carried out with EnTAP 1.3.0 (Hart et al. 2020). Four databases (SwissProt, TrEMBL, EggNOG, and RefSeq invertebrate) were referenced (O’Leary et al. 2016; Huerta-Cepas et al. 2019; UniProt Consortium 2023). Genes derived from bacteria were regarded as putative contaminants. Gene Ontology terms were assigned through EggNOG database. Homology search was performed for the putative contaminant transcriptome. blastp 2.14.0+ (Camacho et al. 2009) was adopted for this homology search against the NCBI nucleotide database.
Telomeric sequences on the final assembly were identified using the TeloExplorer module implemented in quarTeT 1.2.5 (Lin et al. 2023). An option, “-c animal,” was applied in this analysis. The gene content completeness of the assemblies and the transcriptome was assessed using BUSCO 5.7.1 (Manni et al. 2021), by searching against the metazoan core genes (metazoa_odb10). OMArk 0.3.0 (Nevers et al. 2024) was also used to quantify the quality of the annotation. To remove splicing variants, a transcript was extracted for each gene.
Supplementary Material
Supplementary material is available at Genome Biology and Evolution online.
Acknowledgments
We express our sincere gratitude to Mason Linscott, Yuki Kimura, Masanori Tatani, Christine Parent, Margrethe Johansen, Gemma Collins, and Takahiro Hirano for the technical support and advice. We also thank Sachiko Horie and Masayuki Maki for the experimental assistance. ChatGPT was utilized for some English editing. This work was supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI [grant numbers 21H02556 and 22H04925 (PAGS), 22K06383 and 24K02094], JSPS Summer Programme Fellowship, Daiwa Foundation Small Grant (grant number 6495/15181), Biotechnology and Biological Sciences Research Council grant award (grant number BB/Z514500/1), and Environment Research and Technology Development Fund (grant number JPMEERF20244001) of the Environment Restoration and Conservation Agency (ERCA) provided by the Ministry of the Environment of Japan.
Data Availability
All raw data and assembly were deposited to DDBJ (BioProject accession number: PRJDB16720). All data are available from NCBI under the corresponding BioProject (verified on 2025 March 28). The accession numbers for each sequence are DRR528210 for Illumina short-read sequences, DRR623704–623707 for PacBio HiFi sequences, DRR527377 for Nanopore long-read sequences, DRR624568 for Hi-C sequences, and DRR633686 for RNA-seq sequences.