Abstract

Semisulcospira habei is a freshwater snail species endemic to the Lake Biwa drainage and belongs to a species group that radiated within the lake system. We report the chromosome-scale genome assembly of S. habei, including eight megascaffolds larger than 150 Mb. The genome assembly size is about 2.0 Gb with an N50 of 237 Mb. We modeled 41,547 protein-coding genes by ab initio gene prediction based on the transcriptome data set, and the BUSCO completeness of the annotated genes is 92.2%. Repeat elements comprise approximately 76% of the genome assembly. The Hi-C contact map shows seven well-resolved scaffolds that correspond to the basic haploid chromosome number of S. habei inferred from a preceding karyotypic study, and one scaffold with a complicated mosaic pattern that likely represents a complex of multiple supernumerary chromosomes. The genome assembly reported here is a high-quality genomic resource for disentangling the genomic background of the adaptive radiation of Semisulcospira and will also facilitate evolutionary studies in the superfamily Cerithioidea.

Significance

Ancient lakes host fascinating examples of adaptive radiation. Lake Biwa, an ancient lake in Japan, harbors many endemic species, including a species flock of the gastropod genus Semisulcospira. The Semisulcospira snails in Lake Biwa can be a useful model system for studying the mechanisms of species diversification, yet no high-quality genomic reference has been available for this group. We report a chromosome-scale genome assembly of Semisulcospira habei, which will serve as an essential genomic resource for future research on the adaptive radiation of Semisulcospira and, more broadly, for evolutionary studies of the superfamily Cerithioidea.

Introduction

Lake Biwa is an ancient lake in central Japan with approximately 4 million years of history and harbors more than 60 endemic species and subspecies (Nishino 2012). The freshwater snails of the genus Semisulcospira in the Lake Biwa drainage system comprise 18 species (Sawada and Fuke 2022, 2023), the highest within-genus diversity among all Lake Biwa endemic taxa. Two genetically and morphologically distinct Semisulcospira groups exist in Lake Biwa: the Semisulcospira niponica group and the Semisulcospira nakasekoae group (Sawada and Fuke 2022). Each group comprises nine species. Recent genomic work based on reduced-representation sequencing demonstrated that these endemic Semisulcospira groups diversified concurrently in response to the enlargement of the lake about 0.4 million years ago (Miura et al. 2019). Their rapid species diversification within a limited geographical scale provides an ideal study system for evaluating the genetic background of species diversification, as shown in cichlid fishes in ancient African lakes (Malinsky et al. 2018; Nakamura et al. 2021) and finches in the Galapagos Islands (Lamichhaney et al. 2015, 2018).

Semisulcospira habei (fig. 1A) belongs to the S. niponica group and is the only living species that also appears in the fossil record from the Paleo-Lake Biwa deposits (Matsuoka 1987; Nishino and Watanabe 2000). The other living Semisulcospira species in Lake Biwa are absent from the fossil record, perhaps because of their recent diversification (Miura et al. 2019). This fossil evidence suggests that S. habei is a candidate stem species that seeded the radiation of the S. niponica group. A high-quality draft genome assembly of S. habei should therefore be an essential resource for disentangling the ecological and evolutionary mechanisms of species radiation in Semisulcospira, including the identification of genes that facilitated the rapid species diversification of these snails in Lake Biwa.

Fig. 1. (A) Photograph of S. habei at the Uji River. (B) Hi-C contact map showing the eight megascaffolds. (C) Circos plot showing the distribution of coding genes, SINEs, LINEs, DNA transposons, and simple repeats across the eight megascaffolds. Genes and repeat elements were counted in 1-Mb sliding windows with a step of 0.5 Mb. The maximum of the y axis was set to 60 for coding genes and to 650 for the repeat elements.

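The window counts described in the legend of fig. 1C can be illustrated with a minimal Python sketch. It assumes that each feature is assigned to a window by its start coordinate, that windows are half-open intervals, and that the last windows are truncated at the end of the scaffold; none of these details is stated in the text, and the function name `window_counts` is hypothetical.

```python
from bisect import bisect_left


def window_counts(positions, scaffold_length, window=1_000_000, step=500_000):
    """Count features per sliding window along one scaffold.

    positions: 0-based start coordinates of features (genes or repeats).
    Returns a list of (window_start, window_end, count) tuples, where each
    window is the half-open interval [window_start, window_end).
    Assumption: a feature is counted by its start coordinate only.
    """
    ordered = sorted(positions)
    results = []
    start = 0
    while start < scaffold_length:
        end = min(start + window, scaffold_length)
        # Binary search gives the number of starts inside [start, end).
        lo = bisect_left(ordered, start)
        hi = bisect_left(ordered, end)
        results.append((start, end, hi - lo))
        start += step
    return results


if __name__ == "__main__":
    # Toy coordinates, not data from the S. habei assembly.
    toy = [100, 600_000, 900_000, 1_400_000]
    for row in window_counts(toy, scaffold_length=2_000_000):
        print(row)
```

Because the step is half the window size, each feature away from the scaffold ends contributes to two consecutive windows, which smooths the tracks drawn in the plot.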

Results and Discussion

The superfamily Cerithioidea includes more than 200 genera and approximately 1,100 species that are distributed worldwide across tropical, subtropical, and warm temperate regions and inhabit a variety of marine, brackish water, and freshwater habitats (Strong et al. 2011). Despite their ubiquity in aquatic habitats, only three genome assemblies were available in the INSD database, and two of them are of low genome coverage with less than 25% BUSCO completeness (supplementary table S1, Supplementary Material online). Our genome assembly is the first chromosome-scale assembly in this large molluskan superfamily.

We used about 250 Gb of PacBio long-read sequences, 240 Gb of Illumina paired-end short-read sequences, and 160 Gb of Hi-C library sequences to assemble the genome of S. habei (supplementary table S2, Supplementary Material online). We estimated the genome size of S. habei as 1.95 Gb based on the k-mer distribution and 1.89 Gb based on the back-mapping technique. The size of the final assembly was about 1.98 Gb, slightly larger than the estimates based on the short-read sequences. Repetitive elements often bias short-read-based estimation toward smaller genome sizes (Heckenhauer et al. 2022; Pfenninger et al. 2022). This may explain the discrepancy between the assembly size and the genome size estimates.

The final assembly is composed of 7,743 contigs and 578 scaffolds (table 1), and approximately 98.9% of the sequence is contained in the eight megascaffolds. The mapping rate of the Illumina short reads against the final assembly was 99.7%, and 99.5% of the reads were properly paired. We estimated that the total repeat content of the S. habei genome is about 76% (supplementary table S3, Supplementary Material online), which is similar to that of other freshwater and terrestrial snails with large genomes (∼3 Gb genome size, repeat content: 71–77%; Schell et al. 2017; Guo et al. 2019; Saenko et al. 2021) but much higher than that of other freshwater snails (genome size <1 Gb, repeat content: 11–48%; Gomes-dos-Santos et al. 2019). Long interspersed nuclear elements (LINEs) were the most abundant class, occupying about 20% of the genome (supplementary table S3, Supplementary Material online). Long terminal repeat (LTR) retrotransposons were the second largest group, occupying about 10% of the genome. Nearly 30% of the repeat elements were not classified into known categories.

Table 1

Statistics for the Genome Assembly and Annotation of S. habei

| Item | Category | Value |
|---|---|---|
| Assembly statistics | Assembled genome size (bp) | 1,984,187,800 |
| | Number of scaffolds | 578 |
| | Number of contigs | 7,743 |
| | N50 (bp) | 236,511,961 |
| | N90 (bp) | 178,179,616 |
| | GC content (%) | 45.0 |
| | BUSCO completeness (%) | 94.4 |
| | BUSCO single copy (%) | 93.1 |
| | BUSCO duplicated (%) | 1.3 |
| | BUSCO fragmented (%) | 3.1 |
| | BUSCO missing (%) | 2.5 |
| Protein-coding genes | Number of genes | 41,547 |
| | Number of transcripts | 45,269 |
| | Number of annotated transcripts | 32,409 |
| | Annotated in Uniprot_Swiss-prot | 14,367 |
| | Annotated in Uniprot_TrEMBL | 28,904 |
| | Annotated in RefSeq (invertebrates) | 28,789 |
| | Annotated in InterPro | 30,114 |
| | Annotated in EggNOG | 29,214 |
| | EggNOG, with GO terms | 24,676 |
| | EggNOG, with KEGG pathways | 8,983 |
| | BUSCO completeness (%) | 92.2 |
| | BUSCO single copy (%) | 89.0 |
| | BUSCO duplicated (%) | 3.2 |
| | BUSCO fragmented (%) | 4.0 |
| | BUSCO missing (%) | 3.8 |

The BUSCO completeness was estimated using the metazoan core gene database.
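For readers unfamiliar with the contiguity statistics in table 1, the following Python sketch shows the standard definition of N50 and N90: the length of the shortest sequence in the smallest set of longest sequences that together cover at least 50% (or 90%) of the assembly. The function name `nx` and the toy lengths are illustrative only and are not taken from the S. habei assembly.

```python
def nx(lengths, percent=50):
    """Return the Nx statistic (N50 by default) for a list of sequence lengths.

    Sequences are accumulated from longest to shortest; the length at which
    the running total first reaches `percent` of the assembly is returned.
    Integer arithmetic avoids floating-point rounding at the threshold.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 100 >= percent * total:
            return length
    return min(lengths)  # unreachable for percent <= 100


if __name__ == "__main__":
    toy_scaffolds = [100, 80, 60, 40, 20]  # toy values, total = 300
    print("N50:", nx(toy_scaffolds, 50))
    print("N90:", nx(toy_scaffolds, 90))
```

An N50 close to the length of a typical chromosome, as reported in table 1, indicates that most of the assembly resides in chromosome-sized scaffolds.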

We annotated the genome by ab initio gene prediction based on 11 Gb of S. habei transcriptome sequences (supplementary table S2, Supplementary Material online). The final assembly contains 41,547 protein-coding genes (45,269 transcripts; table 1). The number of genes modeled in our assembly is larger than that of other freshwater snail genomes (14,000–24,000 genes) but is comparable to that of some marine and terrestrial snails (Masonbrink et al. 2019; Saenko et al. 2021; Patra et al. 2023). The mean coding sequence length is 1,400 bp, and about 82% of genes have multiple exons (6.9 exons on average). Of the protein-coding genes modeled in our assembly, 32,409 genes were successfully annotated in one or more of the following databases: SwissProt, TrEMBL, EggNOG, InterPro, and RefSeq Invertebrate (table 1), and 24,676 genes have gene ontology (GO) terms in the EggNOG database. The BUSCO analyses detected 94.4% of the metazoan core genes in the genome assembly and 92.2% in the annotated genes (table 1), suggesting that the assembly covers the majority of genes in the S. habei genome. The BlobTools analysis indicated that bacterial contamination was negligible (supplementary fig. S1, Supplementary Material online).

Burch (1968) inferred from his karyotypic observations that the basic haploid chromosome number of S. habei is seven (2N = 14). He further reported three to six supernumerary chromosomes (B chromosomes) of variable length in S. habei. The sizes of these supernumerary chromosomes were comparable to those of the other chromosomes in S. habei. The Hi-C contact map shows seven well-resolved scaffolds and one scaffold with a complicated mosaic pattern (chr4 in fig. 1B). We consider that the seven well-resolved scaffolds correspond to the seven basic chromosomes. Because supernumerary chromosomes often experience large structural rearrangements (Endo et al. 2008), we postulate that the scaffold with the mosaic pattern represents a complex of multiple supernumerary chromosomes. We confirmed that the observed mosaicism was not an artifact of the long-read–based scaffolding, because we obtained a similar mosaic pattern when this scaffolding step was omitted. Supernumerary chromosomes are often not functional and contain no essential genes (Camacho et al. 2000). However, in our assembly, the amount and distribution pattern of coding genes and repeat elements on chr4 are comparable to those of the other chromosomes (fig. 1C), suggesting that chr4 is functional and perhaps essential for the survival of S. habei. Future genomic studies in combination with cytological techniques are required to evaluate the supernumerary status and the detailed genomic function of chr4.

Despite the large and repetitive nature of the S. habei genome, the assembly reported here is of high quality and will be an essential genomic resource for evolutionary studies of cerithioidean snails. In particular, it will substantially contribute to understanding the genomic background of the adaptive radiation of Semisulcospira in the ancient Lake Biwa.

Materials and Methods

Study Sample, DNA Extraction, and Quality Evaluation

We collected a male S. habei from the Uji River, the main outlet of Lake Biwa. Semisulcospira habei is characterized by grid-like nodes on the shell surface and an elongated conical shell outline with fewer basal cords on the body whorl (Davis 1969). High-molecular-weight genomic DNA (HMW-gDNA) was extracted from fresh snail tissue using cetyltrimethylammonium bromide (CTAB) and NucleoBond HMW DNA (Macherey-Nagel, Germany). In brief, about 110 mg of foot tissue was cut into small pieces (<0.5 mm²) and digested in 3 ml of 2× CTAB solution with 200 µl of Proteinase K solution at 50 °C for about 2 h. The HMW-gDNA was washed once with 3 ml of chloroform, and the water phase containing the HMW-gDNA was transferred to the NucleoBond HMW DNA column and cleaned up following the manufacturer's protocol. In the final step of the DNA extraction, we added 3.5 ml of isopropanol to 5 ml of Buffer H5 to precipitate the DNA. The HMW-gDNA was briefly washed with 70% ethanol, air-dried for approximately 10 min, and dissolved in Tris-HCl buffer. The purity and concentration of the extracted HMW-gDNA were evaluated using a BioSpec-nano spectrometer (Shimadzu, Japan) and a Qubit fluorometer (Thermo Fisher Scientific, United States). The approximate length of the extracted DNA fragments was evaluated using pulsed-field agarose gel electrophoresis.

Whole Genome Sequencing

We sequenced the genome of S. habei using two runs of the PacBio Sequel II sequencer. The PacBio library preparation for continuous long reads (CLRs) and the sequencing were performed with the SMRTbell Express Template Prep Kit 2.0 and the Sequel II Binding Kit 2.0/Sequencing Kit 2.0 at the National Institute of Genetics, Japan (NIG). We also obtained short-read sequences using the Illumina NovaSeq 6000 sequencer. The 150 bp paired-end sequencing was performed using the TruSeq DNA PCR-Free Library Prep Kit and the NovaSeq 6000 SP Reagent Kit v1.5 at NIG. We quality-filtered the short-read sequences using fastp (Chen et al. 2018), eliminating low-quality bases with a quality score below Q30.
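To make the Q30 threshold concrete, the Python sketch below decodes a FASTQ quality string and reports the fraction of bases below a given Phred score. It assumes the Phred+33 encoding commonly used for Illumina FASTQ files and is an illustration of the quality scale only; it does not reproduce the filtering logic of fastp, and the function names are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    Assumption: Phred+33 (Sanger/Illumina 1.8+) encoding.
    """
    return [ord(char) - offset for char in quality_string]


def error_probability(q):
    """Base-calling error probability implied by Phred score q: 10^(-q/10)."""
    return 10 ** (-q / 10)


def fraction_below(quality_string, threshold=30, offset=33):
    """Fraction of bases in a read whose Phred score is below `threshold`."""
    scores = phred_scores(quality_string, offset)
    if not scores:
        return 0.0
    return sum(score < threshold for score in scores) / len(scores)


if __name__ == "__main__":
    toy_quality = "IIII5555????"  # toy string: Q40, Q20, and Q30 bases
    print(phred_scores(toy_quality))
    print(fraction_below(toy_quality, threshold=30))
    print(error_probability(30))
```

A score of Q30 corresponds to an expected error rate of one in 1,000 base calls, which is why it is a common cutoff for short-read data.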

Estimation of Genome Size

We used two short-read sequence-based approaches to estimate the genome size of S. habei. First, we estimated the genome size using the k-mer distribution. We used KMC (Kokot et al. 2017) to count canonical 21-mers from the short-read sequences and produced a k-mer count histogram with a maximum coverage threshold of 1 million. The obtained histogram was analyzed by GenomeScope (Vurture et al. 2017) to estimate the genome size and average heterozygosity. Second, we used the back-mapping method, which estimates the genome size by dividing the total number of bases in the short-read sequences by the peak coverage obtained by mapping the reads back to the final assembly (Schell et al. 2017). We used a Perl wrapper script (backmap.pl) of ModEst (Pfenninger et al. 2022) to execute the back-mapping process.
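Both approaches can be illustrated with a short Python sketch. The first part counts canonical k-mers (each k-mer is merged with its reverse complement) and builds the count histogram of the kind that is passed to GenomeScope; the second part applies the back-mapping relation described above, genome size = total sequenced bases / peak coverage. This is a toy illustration with hypothetical function names, not the implementation of KMC, GenomeScope, or ModEst, and the example values are not from this study.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_histogram(reads, k=21):
    """Map each multiplicity to the number of distinct canonical k-mers seen that often."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip k-mers containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


def backmap_genome_size(total_bases, peak_coverage):
    """Back-mapping estimate: total sequenced bases divided by the modal coverage."""
    if peak_coverage <= 0:
        raise ValueError("peak_coverage must be positive")
    return total_bases / peak_coverage


if __name__ == "__main__":
    toy_reads = ["ACGTA", "TACGT"]  # the second read is the reverse complement of the first
    print(kmer_histogram(toy_reads, k=3))
    print(backmap_genome_size(total_bases=1_000, peak_coverage=10))
```

Counting canonical k-mers ensures that reads from the two DNA strands contribute to the same k-mer, so the histogram peak reflects the sequencing depth of the genome rather than of a single strand.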

De Novo Genome Assembly

The genome assembly of S. habei was constructed based on the PacBio long-read data set using Flye v. 2.9 (Kolmogorov et al. 2019) with genome size set to 2 Gb and assembly coverage of 100. Duplicated contigs were removed from the assembly using the long-read sequences by Purge_haplotigs v. 1.1.2 (Roach et al. 2018) and Redundans v. 0.14a (Pryszcz and Gabaldón 2016). The resultant assembly was used as input for further scaffolding using the long-read sequences by Longstitch v. 1.0.4 (Coombe et al. 2021). The assembly was then polished using the short-read sequences by Pilon v. 1.23. Finally, the paired-end RNA-seq data set was used for scaffolding the assembly using P_RNA_scaffolder (Zhu et al. 2018).

Hi-C Scaffolding

We obtained additional sequences using Hi-C technology to achieve a chromosome-scale assembly. To construct the Hi-C library, we used about 20 mg of columellar muscle tissue from the same individual that was used for the genome sequencing. The Hi-C library was made using an Arima-HiC+ kit (Arima Genomics, United States) and sequenced using the NovaSeq 6000. The library construction and sequencing were performed at NIG. The Hi-C data set was processed using Juicer (Durand et al. 2016b) followed by 3D-DNA v. 180114 (Dudchenko et al. 2017) to scaffold the genome assembly at the chromosomal scale. Juicebox v. 1.11.08 (Durand et al. 2016a) was used to manually review the assembly errors. We eliminated the scaffolds with more than 50% of ambiguous bases (or assembly gaps) and without any protein-coding genes, and the resulting 578 scaffolds were defined as the final assembly.
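The final filtering step can be sketched in Python as follows. The sketch assumes that ambiguous bases are represented by the character N and, as its default, that a scaffold was removed only when both conditions held (more than 50% ambiguous bases and no protein-coding genes); the text could also be read as removing scaffolds that meet either condition, which the hypothetical `require_both` flag switches to. The function names are illustrative.

```python
def ambiguous_fraction(sequence):
    """Fraction of ambiguous bases (assumed to be encoded as N) in a scaffold."""
    if not sequence:
        return 0.0
    return sequence.upper().count("N") / len(sequence)


def keep_scaffold(sequence, n_genes, max_n_fraction=0.5, require_both=True):
    """Decide whether a scaffold is retained in the final assembly.

    require_both=True  : drop only if gap-rich AND without genes (assumed reading).
    require_both=False : drop if gap-rich OR without genes (alternative reading).
    """
    gap_rich = ambiguous_fraction(sequence) > max_n_fraction
    gene_free = n_genes == 0
    if require_both:
        drop = gap_rich and gene_free
    else:
        drop = gap_rich or gene_free
    return not drop


if __name__ == "__main__":
    # Toy scaffolds: (name, sequence, number of predicted genes).
    toy = [("s1", "ACGTNN", 0), ("s2", "NNNNAC", 0), ("s3", "NNNNAC", 3)]
    kept = [name for name, seq, genes in toy if keep_scaffold(seq, genes)]
    print(kept)
```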

Genome Annotation

We used RepeatModeler v. 2.0.3 (Flynn et al. 2020) to construct a species-specific library of transposable elements and repeats for S. habei. The obtained species-specific library was combined with the known repeat library of Mollusca from the RepeatMasker database. This combined library was used to detect and soft-mask the repeat elements using RepeatMasker v. 4.1.4 (http://www.repeatmasker.org).

We performed transcriptome sequencing using tissues from the same individual to build the transcriptome database for S. habei (table 1). The head and middle part of the body, including foot and mantle tissues, and the remaining part, including gonadal and hepatopancreas tissues, were fixed separately using RNAlater (Qiagen, United States). These two parts were used for the transcriptome library construction and sequencing. The transcriptome sequencing was performed at Macrogen Japan. The RNA-seq data set was aligned to the final assembly using Hisat2 v. 2.2.1 (Kim et al. 2019). We used Braker2 v. 2.1.6 (Brůna et al. 2021) to perform ab initio gene prediction with the aid of Augustus (Stanke et al. 2008) and GeneMark (Brůna et al. 2020). Untranslated regions were also detected using Gushr (Hoff and Stanke 2019). Protein sequences of the predicted genes were constructed using Gffread v. 0.12.7 (Pertea and Pertea 2020). We then performed functional annotation of the predicted genes using the EnTAP v. 0.10.8 pipeline (Hart et al. 2020) based on the SwissProt, TrEMBL, EggNOG, InterPro, and RefSeq Invertebrate databases. We evaluated the gene content completeness of the final assembly and annotation using BUSCO v.5 by counting the presence of the metazoan core genes (Manni et al. 2021). Finally, BlobTools v.1.1.1 (Laetsch and Blaxter 2017) was used to assess potential bacterial contamination based on the NCBI nonredundant nucleotide database and the UniProt reference proteome database.

Supplementary Material

Supplementary data are available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).

Acknowledgments

We thank Prof. Y. Sakakibara and V. Jayakumar for their valuable advice in constructing the genome assembly. We also thank T. Saito and K. Morita for their assistance in the sampling of S. habei. This work was supported by JSPS KAKENHI Grant Numbers 16H06279 (PAGS), 20K06788, and 23H02540.

Data Availability

The whole genome sequences of S. habei were deposited at DDBJ under BioProject accession number PRJDB16287. The accession number of the PacBio long-read sequences is DRR494364, and that of the Illumina short-read sequences is DRR494361. The sequences of the Hi-C library were deposited at DDBJ under accession number DRR494363. The transcriptome short-read sequences were deposited under accession number DRR494362. The final chromosome-scale assembly was deposited at DDBJ under accession numbers BTPG01000001–BTPG01000578. The genome and genome annotation results have also been deposited in Figshare under the DOI: 10.6084/m9.figshare.24180666.

Literature Cited

Brůna T, Hoff KJ, Lomsadze A, Stanke M, Borodovsky M. 2021. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom Bioinform. 3:lqaa108.

Brůna T, Lomsadze A, Borodovsky M. 2020. GeneMark-EP+: eukaryotic gene prediction with self-training in the space of genes and proteins. NAR Genom Bioinform. 2:lqaa026.

Burch J. 1968. Cytotaxonomy of some Japanese Semisulcospira (Streptoneura: Pleuroceridae). J Conchyliol. 107:3–51.

Camacho JPM, Sharbel TF, Beukeboom LW. 2000. B-chromosome evolution. Philos Trans R Soc Lond Ser B Biol Sci. 355:163–178.

Chen S, Zhou Y, Chen Y, Gu J. 2018. Fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34:i884–i890.

Coombe L, et al. 2021. Longstitch: high-quality genome assembly correction and scaffolding using long reads. BMC Bioinformatics 22:1–13.

Davis G. 1969. A taxonomic study of some species of Semisulcospira in Japan (Mesogastropoda: Pleuroceridae). Malacologia 7:211–294.

Dudchenko O, et al. 2017. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356:92–95.

Durand NC, et al. 2016a. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 3:99–101.

Durand NC, et al. 2016b. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3:95–98.

Endo TR, et al. 2008. Dissection of rye B chromosomes, and nondisjunction properties of the dissected segments in a common wheat background. Genes Genet Syst. 83:23–30.

Flynn JM, et al. 2020. Repeatmodeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci USA. 117:9451–9457.

Gomes-dos-Santos A, Lopes-Lima M, Castro LFC, Froufe E. 2019. Molluscan genomics: the road so far and the way forward. Hydrobiologia 847:1705–1726.

Guo Y, et al. 2019. A chromosomal-level genome assembly for the giant African snail Achatina fulica. GigaScience 8:giz124.

Hart AJ, et al. 2020. EnTAP: bringing faster and smarter functional annotation to non-model eukaryotic transcriptomes. Mol Ecol Resour. 20:591–604.

Heckenhauer J, et al. 2022. Genome size evolution in the diverse insect order Trichoptera. GigaScience 11:giac011.

Hoff KJ, Stanke M. 2019. Predicting genes in single genomes with AUGUSTUS. Curr Protoc Bioinform. 65:e57.

Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. 2019. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol. 37:907–915.

Kokot M, Długosz M, Deorowicz S. 2017. KMC 3: counting and manipulating k-mer statistics. Bioinformatics 33:2759–2761.

Kolmogorov M, Yuan J, Lin Y, Pevzner PA. 2019. Assembly of long, error-prone reads using repeat graphs. Nat Biotechnol. 37:540–546.

Laetsch DR, Blaxter ML. 2017. Blobtools: interrogation of genome assemblies. F1000Res. 6:1287.

Lamichhaney S, et al. 2015. Evolution of Darwin's finches and their beaks revealed by genome sequencing. Nature 518:371–375.

Lamichhaney S, et al. 2018. Rapid hybrid speciation in Darwin's finches. Science 359:224–228.

Malinsky M, et al. 2018. Whole-genome sequences of Malawi cichlids reveal multiple radiations interconnected by gene flow. Nat Ecol Evol. 2:1940–1955.

Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. 2021. BUSCO Update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol. 38:4647–4654.

Masonbrink RE, et al. 2019. An annotated genome for Haliotis rufescens (red abalone) and resequenced green, pink, pinto, black, and white abalone species. Genome Biol Evol. 11:431–438.

Matsuoka K. 1987. Malacofaunal succession in Pliocene to Pleistocene non-marine sediments in the Omi and Ueno basins, central Japan. J Earth Sci Nagoya Univ. 35:23–115.

Miura O, Urabe M, Nishimura T, Nakai K, Chiba S. 2019. Recent lake expansion triggered the adaptive radiation of freshwater snails in the ancient Lake Biwa. Evol Lett. 3:43–54.

Nakamura H, et al. 2021. Genomic signatures for species-specific adaptation in Lake Victoria cichlids derived from large-scale standing genetic variation. Mol Biol Evol. 38:3111–3125.

Nishino M. 2012. Biodiversity of Lake Biwa. In: Kawanabe H, Nishino M, Maehata M, editors. Lake Biwa: interactions between nature and people. Dordrecht, Heidelberg, New York and London: Springer. p. 31–35.

Nishino M, Watanabe N. 2000. Evolution and endemism in Lake Biwa, with special reference to its gastropod mollusc fauna. Adv Ecol Res. 31:151–180.

Patra AK, et al. 2023. Genome assembly of the Korean intertidal mud-creeper Batillaria attramentaria. Sci Data. 10:498.

Pertea G, Pertea M. 2020. GFF utilities: GffRead and GffCompare. F1000Res. 9:304.

Pfenninger M, Schönnenbeck P, Schell T. 2022. Modest: accurate estimation of genome size from next generation sequencing data. Mol Ecol Resour. 22:1454–1464.

Pryszcz LP, Gabaldón T. 2016. Redundans: an assembly pipeline for highly heterozygous genomes. Nucleic Acids Res. 44:e113.

Roach MJ, Schmidt SA, Borneman AR. 2018. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics 19:1–10.

Saenko SV, Groenenberg DS, Davison A, Schilthuizen M. 2021. The draft genome sequence of the grove snail Cepaea nemoralis. G3 (Bethesda) 11:jkaa071.

Sawada N, Fuke Y. 2022. Systematic revision of the Japanese freshwater snail Semisulcospira decipiens (Mollusca: Semisulcospiridae): implications for diversification in the ancient Lake Biwa. Invertebr Syst. 36:1139–1177.

Sawada N, Fuke Y. 2023. Diversification in ancient Lake Biwa: integrative taxonomy reveals overlooked species diversity of the Japanese freshwater snail genus Semisulcospira (Mollusca: Semisulcospiridae). Contrib Zool. 92:1–37.

Schell T, et al. 2017. An annotated draft genome for Radix auricularia (Gastropoda, Mollusca). Genome Biol Evol. 9:585–592.

Stanke M, Diekhans M, Baertsch R, Haussler D. 2008. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24:637–644.

Strong EE, et al. 2011. Phylogeny of the gastropod superfamily Cerithioidea using morphology and molecules. Zool J Linn Soc. 162:43–89.

Vurture GW, et al. 2017. Genomescope: fast reference-free genome profiling from short reads. Bioinformatics 33:2202–2204.

Zhu B-H, et al. 2018. P_RNA_scaffolder: a fast and accurate genome scaffolder using paired-end RNA-sequencing reads. BMC Genomics 19:1–13.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Associate Editor: John Wang