A high-quality de novo genome assembly for clapper rail (Rallus crepitans)

Table 1.

Contiguity statistics for Rallus crepitans assemblies comparing the Dovetail HiC Assembly and the bRalCre1.1 assembly.

Statistic	Dovetail HiC Assembly	bRalCre1.1
Scaffolds	13,226	12,159
Total length (Mb)	994.8	1,107.5
N50 (Mb)	82.7	82.9
N90 (Mb)	10.8	12.2
L50	4	4
L90	18	20
Longest scaffold (Mb)	204.0	204.6
Ns (count)	4,085,069	3,899,784
Gaps (count)	42,269	41,488
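
The Ns and Gaps rows in Table 1 can be derived from a FASTA file by counting ambiguous bases. The Python sketch below is a minimal illustration, not the tool the authors used. It assumes that a gap is any maximal run of one or more N (or n) characters; some assembly-statistics programs count only runs above a minimum length, so their totals may differ. The function names are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumption: a "gap" is any maximal run of one or more N/n characters.
# Some assembly-statistics tools apply a minimum run length instead.
GAP_RE = re.compile(r"[Nn]+")


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {sequence_name: sequence} dict."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    # Join wrapped lines so an N run split across a line break is counted once.
    return {seq_name: "".join(parts) for seq_name, parts in chunks.items()}


def count_ns_and_gaps(seqs: Dict[str, str]) -> Tuple[int, int]:
    """Return (total N bases, number of N runs) summed over all sequences."""
    total_ns = 0
    total_gaps = 0
    for seq in seqs.values():
        runs = GAP_RE.findall(seq)
        total_gaps += len(runs)
        total_ns += sum(len(run) for run in runs)
    return total_ns, total_gaps


if __name__ == "__main__":
    toy_fasta = [
        ">scaffold_1",
        "ACGTNNN",       # this 5-base run continues on the next line
        "NNACGTNNACG",
        ">scaffold_2 soft-masked",
        "acgtnnnacgt",
    ]
    print(count_ns_and_gaps(read_fasta(toy_fasta)))  # (10, 3)
```

Joining wrapped lines before searching matters: counting line by line would split the first run into two gaps.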

Table 2.

Estimates of assembly completeness using the BUSCO aves_odb10 database (n = 8,338 BUSCOs), showing the improvement in completeness from the Dovetail HiC Assembly to the bRalCre1.1 assembly (which includes the Z chromosome).

BUSCO category	Dovetail HiC Assembly (count)	Dovetail HiC Assembly (%)	bRalCre1.1 (count)	bRalCre1.1 (%)
Complete BUSCOs	7130	85.6	7671	92.0
Complete and single-copy BUSCOs	7117	85.4	7616	91.3
Complete and duplicated BUSCOs	13	0.2	55	0.7
Fragmented BUSCOs	314	3.8	216	2.6
Missing BUSCOs	894	10.6	451	5.4
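
Because the BUSCO categories in Table 2 are nested (complete = single-copy + duplicated; complete + fragmented + missing = n), the count columns can be checked for internal consistency and the change between the two assemblies computed directly. The Python sketch below does this using only the counts reported in the table; the class and function names are hypothetical and are not part of BUSCO.

```python
from dataclasses import dataclass
from typing import Dict

N_BUSCOS = 8338  # number of BUSCOs in aves_odb10, as stated in the caption


@dataclass(frozen=True)
class BuscoCounts:
    single: int      # complete and single-copy
    duplicated: int  # complete and duplicated
    fragmented: int
    missing: int

    @property
    def complete(self) -> int:
        return self.single + self.duplicated

    @property
    def total(self) -> int:
        return self.complete + self.fragmented + self.missing


def change(before: BuscoCounts, after: BuscoCounts) -> Dict[str, int]:
    """Per-category difference in counts (after minus before)."""
    return {
        "complete": after.complete - before.complete,
        "fragmented": after.fragmented - before.fragmented,
        "missing": after.missing - before.missing,
    }


dovetail = BuscoCounts(single=7117, duplicated=13, fragmented=314, missing=894)
bralcre1 = BuscoCounts(single=7616, duplicated=55, fragmented=216, missing=451)

for name, counts in [("Dovetail HiC Assembly", dovetail), ("bRalCre1.1", bralcre1)]:
    # Every BUSCO should fall into exactly one of complete, fragmented, or missing.
    assert counts.total == N_BUSCOS, name
    print(name, "complete =", counts.complete)

print(change(dovetail, bralcre1))
```

Run as-is, the sketch reports 541 more complete BUSCOs, 98 fewer fragmented, and 443 fewer missing in bRalCre1.1.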

Contig re-assembly using SPAdes produced 55,026 contigs with a total length of 1.1 Gb, an N50 of 58.0 kb (L50 = 4,904), an N90 of 9.5 kb (L90 = 22,795), and a maximum contig length of 907 kb. We identified 24,773 contigs that did not align to macrochromosomes in the Dovetail HiC Assembly and submitted these to Dovetail for re-scaffolding, which produced 12,193 scaffolds with an N50 of 15.3 Mb (L50 = 5) and an N90 of 8 kb (L90 = 673). The longest scaffold in the re-assembly was 76.1 Mb long and aligned primarily to the chicken Z chromosome. We then merged the macrochromosomes from the Dovetail HiC Assembly with these scaffolds, which represent the microchromosomes and unplaced contigs, and polished the merged assembly. Afterward, we removed 4 contigs that the NCBI FCS tools identified as alphaproteobacteria or eukaryotic viruses, masked 44 bases that corresponded to known adapter sequences, and removed 5 contigs identified as mitochondrial contamination. The contact map showed that HiRise performed well when scaffolding large (>100 kb) macro- and microchromosomes (Supplementary Fig. 1), although we could not discern a shift in the distribution of scaffold lengths that would differentiate microchromosomes from unplaced scaffolds (Supplementary Data). MitoFinder assembled a contig representing the mitochondrial genome that was similar in length (17.1 kb) to the mitochondrial genomes of other rail species.
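
The N50/L50 and N90/L90 values above follow the usual definitions. Sort sequence lengths from longest to shortest; Nx is the length of the sequence at which the running total first reaches x% of the total assembly length, and Lx is the number of sequences needed to reach that point. The Python sketch below implements those definitions. It is an illustration only, not the software used to generate the reported statistics, and the function name is hypothetical.

```python
from typing import List, Tuple


def nx_lx(lengths: List[int], x: int) -> Tuple[int, int]:
    """Return (Nx, Lx) for a list of contig or scaffold lengths.

    Nx is the length of the sequence at which the cumulative length of the
    longest-first sorted sequences first reaches x% of the total; Lx is the
    number of sequences summed to get there.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    if not 0 < x <= 100:
        raise ValueError("x must be between 1 and 100")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length, count
    raise AssertionError("unreachable: the running total always reaches 100%")


if __name__ == "__main__":
    toy_lengths = [20, 100, 40, 80, 60]  # total = 300
    print("N50, L50 =", nx_lx(toy_lengths, 50))  # (80, 2)
    print("N90, L90 =", nx_lx(toy_lengths, 90))  # (40, 4)
```

The two numbers summarize contiguity from complementary directions: a larger N50 and a smaller L50 both indicate that more of the assembly is contained in fewer, longer sequences.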

The final version of the assembly, bRalCre1.1, included 12,159 scaffolds and contigs with a total length of 1.1 Gb, an N50 of 82.9 Mb (L50 = 4), an N90 of 12.2 Mb (L90 = 20), and a maximum scaffold length of 204.6 Mb (Table 1). BUSCO completeness estimates for bRalCre1.1 improved on those from the Dovetail HiC Assembly (Table 2), although some BUSCOs remained fragmented (n = 216; 2.6%) or were not detected (n = 451; 5.4%). Merqury results suggested that bRalCre1.1 was relatively complete (k-mer completeness = 91.4%) and accurate (consensus quality value [QV] = 55.2, or >99.999% accuracy). Repetitive elements made up ∼9% of the assembly (Supplementary Table 2), and most of these repeats were retroelements.
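
The accuracy figure quoted for the Merqury consensus quality follows from the Phred scale, on which QV = −10 log10(E) for a per-base error rate E, so accuracy = 1 − 10^(−QV/10). The Python sketch below performs that conversion. It does not reproduce how Merqury estimates E from k-mers, only the final QV-to-accuracy arithmetic, and the function names are hypothetical.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error rate."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate: float) -> float:
    """Inverse conversion: per-base error rate to Phred-scaled QV."""
    if not 0 < error_rate < 1:
        raise ValueError("error rate must be between 0 and 1 (exclusive)")
    return -10 * math.log10(error_rate)


def qv_to_accuracy_percent(qv: float) -> float:
    """Percent of bases expected to be correct at a given QV."""
    return 100 * (1 - qv_to_error_rate(qv))


if __name__ == "__main__":
    qv = 55.2  # consensus QV reported for bRalCre1.1
    print(f"error rate ~ {qv_to_error_rate(qv):.2e}")  # about 3 errors per million bases
    print(f"accuracy ~ {qv_to_accuracy_percent(qv):.4f}%")
```

Each additional 10 QV units corresponds to a tenfold reduction in the expected error rate, which is why a QV above 50 already implies better than 99.999% accuracy.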

The bRalCre1.1 assembly is the second for a species in the genus Rallus and one of six assemblies representing taxa within the Rallidae. It is among the most contiguous assemblies for the family (Supplementary Table 1), and its availability will facilitate investigations of salinity tolerance, interspecific hybridization, and mechanisms of speciation in clapper and king rails.

Data availability

All short-insert, Chicago, and HiC sequencing data are available as part of NCBI BioProject PRJNA926626. The Whole Genome Shotgun project for bRalCre1.1 has been deposited at DDBJ/ENA/GenBank under the accession JAQOTC000000000. The version described in this paper is version JAQOTC010000000. Supplementary Table 1, Supplementary Fig. 1, a list of the steps used to assemble the genome (including the Python code), GenomeScope results, the PretextMap, Merqury results, RepeatMasker annotations, and results from BUSCO analyses of other rallid genomes are available from FigShare (https://doi.org/10.6084/m9.figshare.21983261).

Acknowledgments

We thank the Dovetail Genomics staff members for facilitating this work. We thank Andre Moncrieff for assistance with specimen collection and voucher specimen preparation and Donna Dittmann for assistance with the tissue loan. We thank LSUMZ staff for curation of the voucher specimen and associated samples.

Funding

Funding for this project was provided by the University of Delaware College of Agriculture and Natural Resources Seed Grant. E.C.E. was supported by the University of Delaware. B.C.F. and R.T.B. were partially supported by DEB-1655624. Portions of this research were conducted with high performance computing resources provided by Louisiana State University (http://www.hpc.lsu.edu) and the Louisiana Optical Network Infrastructure (https://loni.org).

Literature Cited

Allio, Schomaker-Bastos, Romiguier, Prosdocimi, Nabholz, Delsuc. MitoFinder: efficient automated large-scale extraction of mitogenomic data in target enrichment phylogenomics. Mol Ecol Resour. 2020:892–905. doi:10.1111/1755-0998.13160.

Alonge, Soyk, Ramakrishnan, Wang, Goodwin, Sedlazeck, Lippman, Schatz. RaGOO: fast and accurate reference-guided scaffolding of draft genomes. Genome Biol. 2019:224. doi:10.1186/s13059-019-1829-6.

Andrey, Dmitry, Alla, Anton. Using SPAdes de novo assembler. Curr Protoc Bioinformatics. 2020:e102.

Bankevich, Nurk, Antipov, Gurevich, Dvorkin, Kulikov, Lesin, Nikolenko, Pham, Prjibelski, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012:455–477. doi:10.1089/cmb.2012.0021.

Bolger, Lohse, Usadel. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014:2114–2120. doi:10.1093/bioinformatics/btu170.

Buckner, Sanders, Faircloth, Chakrabarty. The critical importance of vouchers in genomics. eLife. 2021:e68264. doi:10.7554/eLife.68264.

Bushnell. BBMap: a fast, accurate, splice-aware aligner. 9th Annual Genomics of Energy & Environment Meeting; 2014 March 17–20; Walnut Creek, CA. https://sourceforge.net/projects/bbmap/.

Chapman, Sunkara, Luo, Schroth, Rokhsar. Meraculous: De Novo genome assembly with short paired-end reads. PLoS One. 2011:e23501. doi:10.1371/journal.pone.0023501.

Conway, Hucmcst, Moldenhaltr. Extra-renal salt excretion in Clapper and King rails. Comp Biochem Physiol. 1988:671–674. doi:10.1016/0300-9629(88)90946-2.

Del-Rio, Rego, Whitney, Schunck, Silveira, Faircloth, Brumfield. Displaced clines in an avian hybrid zone (Thamnophilidae: Rhegmatorhina) within an Amazonian interfluve. Evolution. 2021:455–475. doi:10.1111/evo.14377.

Garcia-R, Gonzalez-Orozco, Trewick. Contrasting patterns of diversification in a bird family (Aves: Gruiformes: Rallidae) are revealed by analysis of geospatial distribution of species and phylogenetic diversity. Ecography. 2019:500–510.

Greenberg, Maldonado, Droege, McDonald. Tidal marshes: a global perspective on the evolution and conservation of their terrestrial vertebrates. BioScience. 2006:675–685. doi:10.1641/0006-3568(2006)56[675:TMAGPO]2.0.CO;2.

Greenlaw, Elphick, Post, Rising. Saltmarsh sparrow. In: Rodewald, editor. Birds of the World. Ithaca (NY): Cornell Lab of Ornithology; 2018.

Kent, Sugnet, Furey, Roskin, Pringle, Zahler, Haussler. The human genome browser at UCSC. Genome Res. 2002:996–1006.

Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997 [preprint, not peer reviewed]. 26 May 2013. doi:10.48550/arXiv.1303.3997.

Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018:3094–3100. doi:10.1093/bioinformatics/bty191.

Handsaker, Wysoker, Fennell, Ruan, Homer, Marth, Abecasis, Durbin; 1000 Genome Project Data Processing Subgroup. The sequence alignment/map format and SAMtools. Bioinformatics. 2009:2078–2079. doi:10.1093/bioinformatics/btp352.

Luo, Liu, Leung, Ting, Sadakane, Yamashita, Lam. MEGAHIT v1.0: a fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods. 2016;102. doi:10.1016/j.ymeth.2016.02.020.

Lieberman-Aiden, van Berkum, Williams, Imakaev, Ragoczy, Telling, Amit, Lajoie, Sabo, Dorschner, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science. 2009;326(5950):289–293. doi:10.1126/science.1181369.

Liu, Schröder, Schmidt. Musket: a multistage k-mer spectrum-based error corrector for Illumina sequence data. Bioinformatics. 2013:308–315. doi:10.1093/bioinformatics/bts690.

Maley. Ecological speciation of King rails (Rallus elegans) and Clapper rails (Rallus longirostris) [Ph.D. dissertation]. Baton Rouge (LA): Louisiana State University; 2012. http://digitalcommons.lsu.edu/gradschool_dissertations/1773.

Maley, Brumfield. Mitochondrial and next-generation sequence data used to infer phylogenetic relationships and species limits in the clapper/king rail complex / datos mitocondriales y de la próxima generación usados para inferir relaciones filogenéticas y límites de especi. Condor. 2013;115:316–329. doi:10.1525/cond.2013.110138.

Manni, Berkeley, Seppey, Simão, Zdobnov. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol. 2021:4647–4654. doi:10.1093/molbev/msab199.

Olson. Toward a less imperfect understanding of the systematics and biogeography of the clapper and king rail complex. In: Dickerman, editor. The Era of Allan R. Phillips. Albuquerque (NM): Horizon Communications; 1997. p. –111.

Putnam, O’Connell, Stites, Rice, Blanchette, Calef, Troll, Fields, Hartley, Sugnet, et al. Chromosome-scale shotgun assembly using an in vitro method for long-range linkage. Genome Res. 2016:342–350. doi:10.1101/gr.193474.115.

Quinlan, Hall. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010:841–842. doi:10.1093/bioinformatics/btq033.

Recuerda, Vizueta, Cuevas-Caballé, Blanco, Rozas, Milá. Chromosome-level genome assembly of the common chaffinch (Aves: Fringilla coelebs): a valuable resource for evolutionary biology. Genome Biol Evol. 2021:evab034. doi:10.1093/gbe/evab034.

Rhie, Walenz, Koren, Phillippy. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 2020:245. doi:10.1186/s13059-020-02134-9.

Rush, Gaines, Eddleman, Conway. Clapper rail (Rallus crepitans), version 1.0. In: Rodewald, editor. Birds of the World. Ithaca (NY): Cornell Lab of Ornithology; 2020.

Salter, Johnson, Stafford III, Herrin Jr, Schilling, Cedotal, Brumfield, Faircloth. A highly contiguous reference genome for Northern Bobwhite (Colinus virginianus). G3 (Bethesda). 2019:3929–3932. doi:10.1534/g3.119.400609.

Shakya, Haryoko, Irham, Suparno, Prawiradilaga, Sheldon. Genomic investigation of colour polymorphism and phylogeographic variation among populations of black-headed bulbul (Brachypodius atriceps) in insular Southeast Asia. Mol Ecol. 2021:4757–4770.

Shriver, Gibbs, Vickery, Gibbs, Hodgman, Jones, Jacques. Concordance between morphological and molecular markers in assessing hybridization between sharp-tailed sparrows in New England. The Auk. 2005;122:–107.

Smit, Hubley. RepeatModeler Open-1.0. 2008. http://www.repeatmasker.org.

Smit, Hubley, Green. RepeatMasker Open-4.0. 2013. http://www.repeatmasker.org.

Sullivan, Wood, Iliff, Bonney, Fink, Kelling. eBird: a citizen-based bird observation network in the biological sciences. Biol Conserv. 2009;142:2282–2292. doi:10.1016/j.biocon.2009.05.006.

Vurture, Sedlazeck, Nattestad, Underwood, Fang, Gurtowski, Schatz. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics. 2017:2202–2204. doi:10.1093/bioinformatics/btx153.

Walker, Abeel, Shea, Priest, Abouelliel, Sakthikumar, Cuomo, Zeng, Wortman, Young, et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 2014:e112963. doi:10.1371/journal.pone.0112963.