-
PDF
- Split View
-
Views
-
Cite
Cite
Lindsay S Miles, Hannah Waterman, Nadia A Ayoub, Jessica E Garb, Robert A Haney, Michael S Rosenberg, Trevor J Krabbenhoft, Brian C Verrelli, Insight into the adaptive role of arachnid genome-wide duplication through chromosome-level genome assembly of the Western black widow spider, Journal of Heredity, Volume 115, Issue 3, May 2024, Pages 241–252, https://doi.org/10.1093/jhered/esae018
- Share Icon Share
Abstract
Although spiders are one of the most diverse groups of arthropods, the genetic architecture of their evolutionary adaptations is largely unknown. Specifically, ancient genome-wide duplication occurring during arachnid evolution ~450 mya resulted in a vast assembly of gene families, yet the extent to which selection has shaped this variation is understudied. To aid in comparative genome sequence analyses, we provide a chromosome-level genome of the Western black widow spider (Latrodectus hesperus)—a focus due to its silk properties, venom applications, and as a model for urban adaptation. We used long-read and Hi-C sequencing data, combined with transcriptomes, to assemble 14 chromosomes in a 1.46 Gb genome, with 38,393 genes annotated, and a BUSCO score of 95.3%. Our analyses identified high repetitive gene content and heterozygosity, consistent with other spider genomes, which has led to challenges in genome characterization. Our comparative evolutionary analyses of eight genomes available for species within the Araneoidea group (orb weavers and their descendants) identified 1,827 single-copy orthologs. Of these, 155 exhibit significant positive selection primarily associated with developmental genes, and with traits linked to sensory perception. These results support the hypothesis that several traits unique to spiders emerged from the adaptive evolution of ohnologs—or retained ancestrally duplicated genes—from ancient genome-wide duplication. These comparative spider genome analyses can serve as a model to understand how positive selection continually shapes ancestral duplications in generating novel traits today within and between diverse taxonomic groups.
Introduction
Spiders, members of the class Arachnida, have remarkably unique characteristics for which a genomic understanding could reveal applications for fields ranging from medicine and materials science to ecology and evolutionary biology (Hesselberg and Gálvez 2023). Spiders are renowned for their venomous capabilities related to capturing prey, with applications in drug development (Netirojjanakul and Miranda 2017), as well as for the properties of their silk, in particular strength and lightweight nature, while providing insights into biomaterials engineering, medical, and defense technology (Gatesy et al. 2001; Ayoub et al. 2007). From an ecological and evolutionary perspective, they have complex social behaviors, predatory strategies, web-building and courtship traits that highlight novelty in communication, and adaptation to diverse habitats including human-built ecosystems (Avilés 1997; Lowe et al. 2014; Tong et al. 2022). Finally, spiders have a deep phylogenetic history with origins in the Paleozoic—and are the second most diverse arthropod group behind beetles at >51,000 species (World Spider Catalog 2023)—with global impacts on insect pests, agriculture, and ecosystem sustainability (Michalko et al. 2019; Roberts-McEwen et al. 2022; Stojanowska et al. 2023).
Despite these biological and practical applications, the genomic understanding of spiders has been hindered by challenges posed by their complex genomes, such as their relatively large size—with a significant portion being repetitive DNA—and demonstrated high heterozygosity within species (Sanggaard et al. 2014). An added potential difficulty is historical genome-wide duplication that took place prior to the divergence of scorpions and spiders ~450 mya (Schwager et al. 2017). This event has had profound effects on genetic architecture and gene family evolution, including venom and silk (Sanggaard et al. 2014; Clarke et al. 2015; Gendreau et al. 2017; Sanchez-Herrero 2019) and developmental regulation (Schwager et al. 2007, 2017; Di et al. 2015; Leite et al. 2018; Aase-Remedios et al. 2023), yet the role that positive selection played in shaping variation following this ancient genome-wide duplication is just beginning to be understood (Cerca et al. 2021). One example of where this genome complexity and duplication has created uncertainty is with respect to sex chromosome evolution. Sex chromosome systems are diverse in spiders (Kořínková and Král 2012; Sember et al. 2020), yet with only a few comparative genome studies, we find significant variation in both the number and size of these chromosomes (Sheffer et al. 2022; Zhu et al. 2022). While these challenges have been obstacles, they also present opportunities to understand how gene duplication has shaped the evolution of unique spider phenotypes. In recent years, the field of genomics, especially for non-model organisms, has been revolutionized by advancements in sequencing technologies that have the potential to unravel insights into spider biology, evolution, and ecological interactions.
The Western black widow spider, Latrodectus hesperus, has become a model species for eco-evolutionary research related to its biomedical and materials applications. They are cob-web building spiders in the Theridiidae family whose silk has been studied extensively due to its extreme tensile strength, among the strongest on record (Gosline et al. 1999; Swanson et al. 2006; Zhao et al. 2010; Lane et al. 2013). Additionally, their venom has been well studied as it is composed of multiple toxins, including latrotoxins which are a family of neurotoxic proteins (Ushkaryov et al. 2004; He et al. 2013). One of these latrotoxins (alpha-latrotoxin) is known to be vertebrate-specific and has been shown to trigger neurotransmitter exocytosis, resulting in extreme pain (Blair 1934; Ushkaryov et al. 1992, 2004; Orlova et al. 2000; Ashton et al. 2001). The diversity of venom, silk, and other key proteins have been revealed through a variety of studies, including transcriptomic and proteomic data (Clarke et al. 2014; Haney et al. 2014; Miles et al. 2020). However, as has been the case for spiders in general, the complex history of gene duplication and genomic architecture has complicated our understanding of key traits and their evolution, both within populations and across species, and argues for chromosomal assembly approaches to overcome these obstacles.
A key characteristic of L. hesperus is its ability to thrive in human-built environments, specifically with behavioral, ecological, and evolutionary differences between urban and non-urban populations (Johnson et al. 2012, 2014; Trubl et al. 2012; Miles et al. 2018a, 2018b). In urban areas, L. hesperus forms dense aggregations in close proximity to humans (Trubl et al. 2012), creating a health concern given their highly toxic, vertebrate-specific venom (Vetter and Ibister 2008). Population genetic analyses indicate that urban areas facilitate L. hesperus gene flow, with higher genetic diversity within and lower genetic differentiation between populations compared to non-urban areas (Miles et al. 2018a, 2018b), a pattern contrary to the conventional wisdom of how cities fragment populations, but one which is more common than originally believed (Miles et al. 2019). Identifying the genetic basis of local adaptation, particularly to the novel urban environment, is a major focus for urban evolutionary biology (Schell 2018; Rivkin et al. 2019; Perrier et al. 2020; Verrelli et al. 2022; Winchell et al. 2023b). However, these studies are rare owing to the fact that non-model organisms that typically inhabit these areas have not been prioritized for genomic research until recently (Johnson and Munshi-South 2017; Szulkin et al. 2020; Santangelo et al. 2022; Winchell et al. 2022; Winchell et al. 2023a).
Here, we sequenced, assembled, and annotated the genome of L. hesperus to compare with several other available spider genomes within the Araneoidea (Fig. 1)—the well-studied group that includes orb weavers and represents ~25% of all described spider species (Wheeler et al. 2017). Several chromosome-level spider genomes have recently become available (Escuer et al. 2022; Sheffer et al. 2022; Wang et al. 2022), which allow for comparative evolutionary genomic structure analyses. Specifically, with chromosome-level genomes, we can better identify orthologous genes and test hypotheses of the role of gene duplication in spider genome evolution, a task that has been difficult as a result of the complexity of these genomes. Our study contributes to the understanding not only of spider evolutionary genomics in general but also to aid in further investigations of the population genetics and transcriptomics of the Western black widow spider as a model of urban adaptation.

Phylogenetic tree of araneomorph spiders. Rooted species tree produced in Orthofinder analysis using proteomes from high-quality annotated spider genomes analyzed in the current study (highlighted in boxes). Dates at nodes are in millions of years and obtained from Garb et al. (2004), Garrison et al. (2016), and Liu et al. (2016). Images of species obtained from Creative Commons as follows: Dysdera crocata (by Tone Killick is licensed under CC BY-NC-SA 2.0.); Stegodyphus dumicola (by Bernard Dupont CC BY-NC 2.0); Parasteatoda tepidariorum (by Judy Gallagher is licensed under CC BY 2.0.); Latrodectus hesperus (by Ferrous Femur is licensed under CC BY-NC-SA 2.0); Latrodectus elegans (Kananbala et al. 2012); Tetragnatha sp. (by sankax is licensed under CC BY-NC 2.0.); Oedothorax gibbosus (by Rhithrogena22 is licensed under CC BY 2.0.); Hylyphantes graminicola (by dhobern is licensed under CC BY 2.0.); Argiope bruennichi (by Rinaldo R is licensed under CC BY-NC-SA 2.0.); Trichonephila antipodiana (by tombenson76 is licensed under CC BY-NC-ND 2.0).
Materials and methods
Sample collection and sequencing
Adult female L. hesperus spiders were collected from Phoenix, AZ, United States of America, in June 2018, and then stored at −80 °C prior to DNA extraction. The whole body of one wild adult female spider was mechanically homogenized in liquid nitrogen and high molecular weight DNA was extracted using Qiagen Genomic-tip 500/G kit following the manufacturer’s protocol. DNA from the wild-caught individual was sequenced using eight flowcells on an Oxford Nanopore GridION (6 rapid SQK-RAD004 on version R9.4 and two ligation kits SQK-LSK109 using long fragment buffer on version R10 flowcells). Hi-C libraries were prepared from an adult female spider by Phase Genomics and sequenced on an Illumina NovaSeq 6000 to scaffold contigs into chromosomes.
RNA was collected from a single L. hesperus female within 24 h of collection from the wild for four tissues: silk glands (SRR17250295), venom glands (SRR17250293), ovary (SRR17250292), and cephalothorax (SRR17250291). Total RNA was isolated from the tissue samples in TRIzol (Invitrogen), purified using the RNeasy kit (Qiagen), and contaminating DNA was removed with Turbo DNase (Ambion). The cDNA libraries for each individual tissue sample were generated with the TruSeq RNA Sample Preparation Kit (Illumina). RNA from these four tissues was then sequenced using Illumina paired-end, 150 bp sequencing in single lanes of HiSeq 4000 through Novogene. Raw sequence data were run through TrimGalore (Martin 2011) with default settings to detect and remove adaptors.
Genome assembly
Following nanopore DNA sequencing of L. hesperus, base calling was conducted with Guppy v 3.0.6 (Oxford Nanopore Technologies—ONT, Oxford, United Kingdom). The raw fastq files were assembled using Flye v 2.8.3 (Kolmogorov et al. 2019) with a minimum of 10,000 overlaps between reads (-m) and two iterations of Illumina polishing (-i). The output from Flye was also polished with Pilon v 1.23 (Walker et al. 2014) using raw Illumina reads from a separate adult female L. hesperus that was previously sequenced for the i5k project (NCBI PRJNA168123). To assess genome completeness, BUSCO v 3.0.2 (Simão et al. 2015) was run against the arthropod.odb9 database. Pilon was run over multiple iterations, with BUSCO run after each iteration, until BUSCO scores showed diminishing or plateauing returns. The Pilon-polished assembly was then run through Purge Haplotigs (Roach et al. 2018) to eliminate redundant contigs. Microbial and other contaminant sequences were removed from the assembly through Kraken v 2.0.8 (Wood and Salzberg 2014) with the database comprised of archaea, bacteria, plasmid, viral, human, fungi, plant, protozoa, nr, nt, UniVec, and UniVec_Core. The contig-level assembly was scaffolded using Hi-C data in Juicer v 1.6 (Durand et al. 2016) and then piped into 3D-DNA v180922 (Dudchenko et al. 2017) under the parameters of input_size 200,000, mapq 60, and editor-coarse-resolution 50,000, followed by manual edits in Juicebox Tools.
Heterozygosity and sex chromosome analysis
In L. hesperus, sex is determined through the X1X2 system, where females have two X chromosomes and males have only one X chromosome (Král et al. 2006; Zhao et al. 2010). Therefore, while females will have heterozygous sites for their sex chromosome data, males are expected to be invariant (e.g. other than sequencing error) due to having only one sex chromosome. To identify the sex chromosomes through sequence analysis using this expectation of heterozygosity as rationale, the RNA-seq reads of one female that was created during annotation and from two males from NCBI (Accession: GSE95367, Clarke et al. 2017) were mapped to the reference genome assembled here using BWA v 0.7.17 (Li and Durbin 2010). Variants within repeats and indels were removed using mpileup in bcftools v 1.7 (Li 2011). An estimate of the per chromosome heterozygosity was calculated (number of variable sites per nucleotide across each chromosome) for each of the female and two male libraries.
Characterization of repeat content
Repetitive sequences were identified and classified using a combination of de novo approaches at the DNA level. Repetitive sequences were identified using RepeatModeler v 2.0.1 (Flynn et al. 2020) with default parameters. The RepeatModeler generated library and the transposable element (TE) library included in RepeatMasker (Smit et al. 2013) were input to MAKER v 3.01.03 (Holt and Yandell 2011; Campbell et al. 2014) for additional repeat masking. Certain families of genes in L. hesperus such as those related to venom and silk function are known to be highly repetitive (Hayashi et al. 1999; Vasanthavada et al. 2007; Ayoub et al. 2013; Clarke et al. 2014; Haney et al. 2014); thus, complex repeats were hard-masked while simple repeats were softmasked (softmask = 1).
Gene prediction and annotation
The MAKER pipeline v 3.01.03 was used for gene annotation with three rounds. The first round of MAKER uses evidence from ESTs and proteins to produce gene predictions and annotations. Proteomes from the three most closely related available annotated genomes within Araneoidea—Parasteatoda tepidariorum (RefSeq GCF_000365465.3), Oedothorax gibbosus (RefSeq GCA_019343175.1), and Argiope bruennichi (RefSeq GCA_015342795.1)—were input as evidence to align to the repeat masked reference genome. Transcriptome evidence was created from the RNA-seq data that was pooled across the four tissues and assembled de novo using the Trinity pipeline v 2.6.6 (Grabherr et al. 2011) with the –trimmomatic flag. The first round of MAKER with evidence alignment was used to train the gene prediction software SNAP v 2013-02-16 (Korf 2004) and Augustus v 19.12.2006 (Stanke et al. 2008). To avoid poor training performance, only gene models with a minimum length of 50 amino acids and a minimum annotation edit distance (AED) of 0.25 were included to train SNAP. Augustus training was performed using BUSCO v 3.1.0 (Simão et al. 2015), specifying P. tepidariorum as the initial gene model and using the BUSCO arthropoda_odb10 database.
A model gff was created by GeMoMa (Keilwagen et al. 2019) to be used in ab initio annotation with MAKER. This model gff was created from a sample of available annotated reference genomes across a diverse group of arthropods including Drosophila albomicans (RefSeq GCF_009650485.1), Coccinella septempunctata (RefSeq GCF_907165205.1), A. bruennichi (RefSeq GCA_015342795.1), and O. gibbosus (RefSeq GCA_019343175.1). This ab initio round of MAKER was performed using the GeMoMa alignment and output from the SNAP and Augustus gene models. Next, the output from this round of MAKER was then input to SNAP and AUGUSTUS to retrain gene model predictions, then these gene models were used in a final round of ab initio annotation in MAKER. Lastly, in a final curation of the list, we manually added genes known to be highly repetitive (e.g. silk and venom genes), and thus likely unintentionally masked in our first round of annotations (see Supplementary Material).
Genome synteny analysis
To visualize the concordant genome architecture compared to L. hesperus across evolutionary time, we sampled from the few available Araneoidea chromosomal-level genome assemblies that are also annotated. With this strategy, we used the most closely related available species, Latrodectus elegans, and a second more distantly related species, A. bruennichi, for a genome synteny analysis using CoGe (Comparative Genomics, genomevolution.org; Lyons et al. 2008). CoGe’s SynMap2 was used to identify syntenic gene pairs with LASTZ searches and the DAGChainer algorithm (Haug-Baltzell et al. 2017), requiring a maximum distance between two matches (-D) = 20 and a minimum of five aligned pairs (-A). Circos (Krzywinski et al. 2009) was used to generate plots based on synteny using the DAGchainer align coords file produced by CoGe SynMap2 (Haug-Baltzell et al. 2017). DAGChainer results were filtered to retain only one LASTz hit per query with the highest percent sequence identity.
Positive selection analysis
Orthologous gene clusters were identified using Orthofinder v2.5.4 (Emms and Kelly 2019) across eight assembled and annotated araneoid genomes (Table 1, Fig. 1): P. tepidariorum (NCBI PRJNA316108, PRJNA167405), L. hesperus (this study), L. elegans (NCBI PRJNA745004), Tetragnatha kauaiensis (https://doi.org/10.5061/dryad.b2rbnzsgr), O. gibbosus (NCBI PRJNA681589), Hylyphantes graminicola (https://doi.org/10.11922/sciencedb.01162), A. bruennichi (http://gigadb.org/dataset/100837), and Trichonephila antipodiana (http://gigadb.org/dataset/100868). The protein sequences were downloaded from their corresponding databases and edited to extract only the longest transcript variant per gene. Additionally, this list was curated to include only single-copy orthologs.
Scientific name . | Common name . | Family . | Genome size (Gbp) . | Scaffold/chromosome . | N50 . | BUSCO (%) . |
---|---|---|---|---|---|---|
Latrodectus hesperus | Western black widow spider | Theridiidae | 1.45 | 14 chromosomes | 120 Mb | 95.3 |
Latrodectus elegansa | Asian black widow spider | Theridiidae | 1.74 | 14 chromosomes | 114.3 Mb | 98.4 |
Parasteatoda tepidariorumb | House spider | Theridiidae | 1.20 | 59853 scaffolds | 465.5 kb | 98 |
Tetragnatha kauaiensisc | Long jawed orb weaver | Tetragnathidae | 1.08 | 3925 scaffolds | 2 Mb | 89.7 |
Oedothorax gibbosusd | Dwarf spider | Linyphiidae | 0.80 | 13 chromosomes | 979 kb | 94.8 |
Hylyphantes graminicolae | Sheet-web spider | Linyphiidae | 0.90 | 103 scaffolds | 77 Mb | 95.4 |
Argiope bruennichif | European wasp spider | Araneidae | 1.67 | 13 chromosomes | 124 Mb | 91.1 |
Trichonephila antipodianag | Batik golden web spider | Araneidae | 2.29 | 377 scaffolds | 172 Mb | 94.8 |
Scientific name . | Common name . | Family . | Genome size (Gbp) . | Scaffold/chromosome . | N50 . | BUSCO (%) . |
---|---|---|---|---|---|---|
Latrodectus hesperus | Western black widow spider | Theridiidae | 1.45 | 14 chromosomes | 120 Mb | 95.3 |
Latrodectus elegansa | Asian black widow spider | Theridiidae | 1.74 | 14 chromosomes | 114.3 Mb | 98.4 |
Parasteatoda tepidariorumb | House spider | Theridiidae | 1.20 | 59853 scaffolds | 465.5 kb | 98 |
Tetragnatha kauaiensisc | Long jawed orb weaver | Tetragnathidae | 1.08 | 3925 scaffolds | 2 Mb | 89.7 |
Oedothorax gibbosusd | Dwarf spider | Linyphiidae | 0.80 | 13 chromosomes | 979 kb | 94.8 |
Hylyphantes graminicolae | Sheet-web spider | Linyphiidae | 0.90 | 103 scaffolds | 77 Mb | 95.4 |
Argiope bruennichif | European wasp spider | Araneidae | 1.67 | 13 chromosomes | 124 Mb | 91.1 |
Trichonephila antipodianag | Batik golden web spider | Araneidae | 2.29 | 377 scaffolds | 172 Mb | 94.8 |
Genome data are from aWang et al. (2022), bSchwager et al. (2017), cCerca et al. (2021), dHendrickx et al. (2021), eZhu et al. (2022), fSheffer et al. (2022), and gFan et al. (2021).
Scientific name . | Common name . | Family . | Genome size (Gbp) . | Scaffold/chromosome . | N50 . | BUSCO (%) . |
---|---|---|---|---|---|---|
Latrodectus hesperus | Western black widow spider | Theridiidae | 1.45 | 14 chromosomes | 120 Mb | 95.3 |
Latrodectus elegansa | Asian black widow spider | Theridiidae | 1.74 | 14 chromosomes | 114.3 Mb | 98.4 |
Parasteatoda tepidariorumb | House spider | Theridiidae | 1.20 | 59853 scaffolds | 465.5 kb | 98 |
Tetragnatha kauaiensisc | Long jawed orb weaver | Tetragnathidae | 1.08 | 3925 scaffolds | 2 Mb | 89.7 |
Oedothorax gibbosusd | Dwarf spider | Linyphiidae | 0.80 | 13 chromosomes | 979 kb | 94.8 |
Hylyphantes graminicolae | Sheet-web spider | Linyphiidae | 0.90 | 103 scaffolds | 77 Mb | 95.4 |
Argiope bruennichif | European wasp spider | Araneidae | 1.67 | 13 chromosomes | 124 Mb | 91.1 |
Trichonephila antipodianag | Batik golden web spider | Araneidae | 2.29 | 377 scaffolds | 172 Mb | 94.8 |
Scientific name . | Common name . | Family . | Genome size (Gbp) . | Scaffold/chromosome . | N50 . | BUSCO (%) . |
---|---|---|---|---|---|---|
Latrodectus hesperus | Western black widow spider | Theridiidae | 1.45 | 14 chromosomes | 120 Mb | 95.3 |
Latrodectus elegansa | Asian black widow spider | Theridiidae | 1.74 | 14 chromosomes | 114.3 Mb | 98.4 |
Parasteatoda tepidariorumb | House spider | Theridiidae | 1.20 | 59853 scaffolds | 465.5 kb | 98 |
Tetragnatha kauaiensisc | Long jawed orb weaver | Tetragnathidae | 1.08 | 3925 scaffolds | 2 Mb | 89.7 |
Oedothorax gibbosusd | Dwarf spider | Linyphiidae | 0.80 | 13 chromosomes | 979 kb | 94.8 |
Hylyphantes graminicolae | Sheet-web spider | Linyphiidae | 0.90 | 103 scaffolds | 77 Mb | 95.4 |
Argiope bruennichif | European wasp spider | Araneidae | 1.67 | 13 chromosomes | 124 Mb | 91.1 |
Trichonephila antipodianag | Batik golden web spider | Araneidae | 2.29 | 377 scaffolds | 172 Mb | 94.8 |
Genome data are from aWang et al. (2022), bSchwager et al. (2017), cCerca et al. (2021), dHendrickx et al. (2021), eZhu et al. (2022), fSheffer et al. (2022), and gFan et al. (2021).
Positive selection analysis of our orthologous gene output from Orthofinder was conducted using PAML (Álvarez-Carretero et al. 2023). Although the ortholog analysis used alignments of amino acid sequences, PAML uses analyses of divergence at synonymous (dS) and nonsynonymous (dN) sites, requiring aligned nucleotide sequences of codons. Thus, the matching nucleotide sequence for each species’ amino acid sequence within an ortholog group was identified from each of their respective genome and transcriptome project datasets, with their codons substituted for the aligned amino acid sequences. In cases where a single nucleotide sequence could not be matched to each of the amino acid sequences for each of the eight species, or if ambiguity arose as a result of uncertainty in paralogy vs. orthology (e.g. nucleotide sequence codes for a slightly different amino acid sequence as a result of gene duplication), we omitted that entire orthogroup so that each resulting selection analysis was performed on all eight species. While this strategy may remove genes, downstream statistical and functional relevance interpretation (i.e. gene ontology [GO] analyses, see below) will be more informative as the same list of genes is compared for all eight species. In addition, this strategy facilitates the computational and statistical effort as we used a new PAML batch feature in CODEML that enables multiple genes to be tested for the same set of taxa for which genome/transcriptome data are available (Álvarez-Carretero et al. 2023). This filtering ultimately resulted in a list of 1,827 orthologs (see Supplementary Table 1).
The nucleotide sequence alignment for each of the 1,827 ortholog groups of eight species was fit to the M7 and M8 site models in a PAML batch run (Álvarez-Carretero et al. 2023). M7 is the null model that constrains dN/dS = 1 across sites across the tree, whereas M8 allows dN/dS to vary across sites and be greater than 1. If the M8 model has a significantly higher likelihood (or fit to the data), than the M7 model, then this is considered evidence for positive selection. For each of the 1,827 PAML ortholog analyses, we generated the LRT statistic, which is twice the log-likelihood difference between the M7 and M8 models (2Δℓ). This statistic was used in a chi-square test with two degrees of freedom (in this case, the number of free parameters between these two models). To correct for the many tests in evaluating this list of 1,827 genes, we controlled the FDR (false discovery rate) by conservatively adjusting P values to determine statistical significance (Benjamini and Hochberg 1995; McDonald 2014).
We conducted analyses to determine if our list of putative positive selection genes are enriched for certain biological processes and pathways. This list of genes was BLASTed to the Drosophila melanogaster genome (i.e. in the absence of a fully annotated appropriate arachnid reference genome available for GO reference analyses, this is the most comprehensive annotated database for arthropods), the top hit was retained, and the alias gene symbol was identified using Flybase (Gramates et al. 2022). We used the online g:Profiler software (Kolberg et al. 2023) with D. melanogaster as the reference database to conduct a GO category and enrichment analysis for the positively selected genes, following recommendations provided by Reimand et al. (2019).
Results and discussion
Genome assembly and annotation
Our L. hesperus chromosome-level assembly provided resolution into the genome size and architecture for comparative analyses. An initial estimate of L. hesperus genome size was 1.3 Gb, with 12 autosomes and two sex chromosomes based on karyotyping (Zhao et al. 2010). Following this estimate, the i5k initiative (Thomas et al. 2020) generated a draft genome from multiple runs of Illumina short reads with an assembly size of 1.2 Gb across 161,595 contigs with an N50 of 39.5kb. Here, we initially generated 10,957,884 Oxford Nanopore reads, containing 1.23 Gb of sequence data with approximately 37× genome coverage. Hi-C Scaffolding produced 14 scaffolds ranging from 50 to 130 Mb in length with scaffold N50 = 120 Mb, with our final assembly of 1.46 Gb in length (Supplementary Table 2 and Supplementary Figure 1) corresponding to the expected size and number of chromosomes. Although previous flow cytometry studies have found significant variation in spider genome sizes, from 700 Mb to 7 Gb (Gregory and Shorthouse 2003; Král et al. 2006; Sanggaard et al. 2014), assembled Araneoidea genomes range from 0.8 Gb to 2.5 Gb (Table 1). However, spider genomes predicted to be much larger (e.g. 7 Gb; Gregory and Shorthouse 2003) have not been sequenced yet, likely due to their size. Ultimately, this variance in genome size estimates could have implications for genotype-to-phenotype relationships. That is, the variance could reflect non-coding regions (i.e. genome size differences do not mean more genes), which may or may not have gene regulatory consequences, or it may reflect the history of gene duplications in size and number.
When compared to other araneoid spiders for which genome assemblies are available (Table 1, Fig. 1, Supplementary Table 3), our final assembly had an overall BUSCO complete score of 95.3%—consistent with others—after several rounds of filtering and assembly (see Supplementary Material). Specifically, BUSCO scores are consistently >90% for these other spider assemblies and N50 is >100 Mb for chromosome-level and near chromosome-level assemblies. Thus, regardless of assembly type (chromosome or scaffold), the BUSCO scores reflect conserved arthropod genes and do not appear to reflect spider-specific genes. In the end, while our assembly is of high quality, we expect that more spider genomes using advanced long-read technology (i.e. not only increased sequencing depth) will further resolve the complex nature of this and other spider genome assemblies.
Our gene annotation analysis using MAKER3 predicted a total of 34,986 genes. Thomas et al. (2020) estimated gene counts for arthropods in general to range from 10,000 to 40,000 using the same sequencing platforms and annotation tools. Given the diversity of arthropods sequenced and their specialized structures and secretions, it is not surprising that there would be considerable differences in the total number of genes for individual arthropod species. For spiders, the total number of genes has been estimated at ~30,000, regardless of the sequencing platform or annotation tool used (e.g. Sanggaard et al. 2014; Thomas et al. 2020). The consistency of these estimates suggests that the total count of genes reflects biological and not methodological explanations. Typically, spiders have substantial gene duplications compared to other arthropods (Schwager et al. 2017), which could explain the consistent estimates of 30,000 genes in spiders compared to the estimates of 15,000 genes in many other arthropods. This observation has two potential implications. First, it points to the potential role of genome-wide duplication in arachnid evolution in general, as well as the potential role of gene duplication in the evolution of Latrodectus-specific traits. Thus, further comparative spider chromosomal assemblies produced from longer reads, which help distinguish between single and multiple-copy genes, could provide more insight into accurate gene numbers and architecture.
In addition to the MAKER pipeline, we further annotated an additional 3,407 genes (see Supplementary Material). The MAKER pipeline relies on repetitive elements being masked prior to gene prediction and it relies on previous evidence of genes (i.e. from previously annotated genomes) to annotate genomes (Campbell et al. 2014). Thus, gene families that are the result of recent duplications could be masked by the annotation pipeline and inadvertently omitted from our annotation. In fact, our previous work has identified that Latrodectus-specific venom and silk genes are the result of recent gene duplications and have highly repetitive elements (Clarke et al. 2014; Haney et al. 2014), and our manual annotation here identified 120 venom and silk genes that were not captured in the initial annotation. This simple example shows that, as is the case with many under-studied groups, standard annotation pipelines capture well-documented and evolutionarily conserved genes but will miss the large majority of those genes that are taxon-specific and that define pathways and traits potentially unique to these species.
Genome content and rearrangement
To understand the evolutionary history of genome structure and gene organization within L. hesperus, we first investigated repeat content among araneoid spiders. In our L. hesperus genome, a total of 5.1% of complete BUSCO genes were duplicated and the repeat content across the genome was 31.26% of the total assembly, including 7.34% DNA transposons (Supplementary Table 4). While estimates of repeat content from other spiders in this study are comparable, such as 34.64% in A. bruennichi (Sheffer et al. 2022), some are considerably higher with 54% in Dysdera silvatica (Sánchez-Herrero et al. 2019), 60% in Acanthoscurria geniculata and 54% in Stegodyphus mimosarum (Sanggaard et al. 2014). This repeat content includes not only elements such as LINEs and SINEs but also genes and other coding material. These estimates have great implications for the role of repeat DNA influencing trait biology across spiders; however, with only a handful of genomes, clearly more studies are needed to determine how these current observations truly reflect spider repeat content evolution in general.
Genomic changes within arthropods can be both lineage- and species-specific (Thomas et al. 2020); therefore, we conducted a synteny analysis to identify potential concordant genome architecture. In looking at the few available araneoid chromosomal-level genome assemblies that are also annotated, we conducted an analysis between L. hesperus and the most closely related available species, L. elegans, and a second more distantly related species, A. bruennichi. First, we identified 106 off-diagonal syntenic blocks (containing a minimum of five genes), across 558 genes within the L. hesperus self-to-self syntenic map that were randomly distributed across the genome (Supplementary Fig. 1). Many of these self-to-self syntenic blocks likely represent remnants of ancient arachnid gene duplication. Next, we identified chromosomal synteny between L. hesperus and L. elegans with 14 chromosomes each and 313 syntenic blocks across 8,292 genes (Fig. 2a). This high level of colinearity suggests that despite a history of gene duplication, very few rearrangements have occurred recently within the Latrodectus clade.

Chromosomal synteny of the Western black widow spider (Latrodectus hesperus) genome. a) The synteny relationships between the more closely-related L. elegans (left, Wang et al. 2022) and L. hesperus (right) reveal conservation among the 14 predicted Latrodectus chromosomes, while b) the synteny between L. hesperus (left) and A. bruennichi (right, Sheffer et al. 2022) reveal genomic rearrangements among more distantly-related araneoids (see Fig. 1 for phylogenetic relationships) in the predicted 14 and 13 chromosomes, respectively. Chromosomes 6 and 9 in L. hesperus are the two putative sex chromosomes (see Fig. 3). Images of species obtained from Creative Commons as follows: Latrodectus hesperus (by Ferrous Femur is licensed under CC BY-NC-SA 2.0); Latrodectus elegans (Kananbala et al. 2012); Argiope bruennichi (by Rinaldo R is licensed under CC BY-NC-SA 2.0).
The L. hesperus to A. bruennichi syntenic map displayed 656 syntenic blocks across 5,370 genes between homologous chromosomes, 14 to 13, respectively, and has more chromosomal rearrangement than between the two Latrodectus species, as expected due to the longer time since divergence (~120 mya, Figs. 1 and 2b). Previous karyotypes of L. hesperus identified 12 autosomes and two sex chromosomes (Zhao et al. 2010), and A. bruennichi karyotyping identified 11 autosomes and two sex chromosomes (Sheffer et al. 2022); thus, the total number of chromosomes identified through genome sequencing match the expectation from the karyotypic analysis. The current genome synteny analysis allows for the identification of where chromosomal rearrangements have occurred as opposed to detecting only the change in the number of chromosomes. There are several regions of chromosomal rearrangements that may contribute to these differences in chromosome numbers among these species where it appears as though a few chromosomes from the ancestral genome have broken apart and then rearranged. For example, the A. bruennichi chromosome 6 shows syntenic blocks with chromosomes 10 and 5 in L. hesperus, and the A. bruennichi chromosome 12 shows syntenic blocks with chromosomes 1 and 5 in L. hesperus. Additional annotated chromosomal assemblies in Araneoidea, such as that of P. tepidariorum (which is more closely related to L. hesperus than A. bruennichi), might provide further resolution on the genomic rearrangements identified here. However, our results showing high collinearity of the chromosomes in comparisons with both closely related (L. elegans) and distantly-related (A. bruennichi) spiders analyzed here suggests that chromosome synteny has been generally preserved and that inter-chromosomal rearrangements were historically rare (or consistently removed by selection) in araneoid spiders.
Genome estimates of heterozygosity
Estimates of heterozygosity from our current assembly, 0.2% to 0.4% per chromosome, may have a variety of explanations, ranging from social behavior to population structure. For example, although some genome estimates from other spiders like that from the velvet spider (0.02%) are much lower, our estimate for L. hesperus is closer to the higher range in spiders and that of the tarantula (0.34%), a pattern that is consistent with certain types of social behavior that result in higher and lower inbreeding, respectively (Sanggaard et al. 2014). Our estimates of heterozygosity here from a sampled individual from the city of Phoenix are consistent with our previous population-level genome-wide estimates, where urban spiders have higher heterozygosity as a result of higher connectivity among urban areas compared to non-urban areas (Miles et al. 2018a, 2018b). Thus, this current result may suggest that non-urban samples could lead to more efficient genome assembly, at least for this species. Related, this population genetic structure could also influence our interpretation of genome-wide estimates of heterozygosity. For example, we might assume that if we sampled an individual for this current study from a non-urban area it would result in much lower estimates of genome diversity. To this point, because life history characteristics like social behavior can result in reduced heterozygosity, population structure that reduces genetic diversity in some areas, as is the case with L. hesperus in non-urban areas, could mimic the expected relationship between social behavior, inbreeding, and lower genome-wide diversity. Thus, while single genome sequences can be valuable in estimates of species genetic diversity in association with phenotypic traits like behavior (Tong et al. 2022), it is also necessary to recognize the need for complementary population genetic diversity estimates for these association studies.
Sex chromosome evolution
Our chromosomal assembly of L. hesperus can also be valuable in understanding sex chromosome evolution in spiders. Many different sex chromosome systems are found within spiders (e.g. X1X20, X1X2X30, X1X2X3X40, X1X2X3Y, Kořínková and Král 2012); however, the X1X20 sex-determining system, with X1X2 males and X1X1X2X2 females, is thought to be the ancestral system (Sember et al. 2020), especially in theridiids (Král et al. 2006). Indeed, our analysis using sequence data from the single female sequenced in this study as well as RNA sequencing data from each of two male L. hesperus spiders collected from a previous study (Clarke et al. 2017) identified two pairs of X chromosomes consistent with previous karyotyping of L. hesperus (e.g. X1X1X2X2 female and X1X2 male; Zhao et al. 2010). We found that for males, the average heterozygosity across all chromosomes was 0.00257 ± 0.00102 and for the female, it was 0.00277 ± 0.00049. Both males and the female showed decreased heterozygosity on chromosomes 6 (males: 0.00030, 0.00059; female: 0.00193) and 9 (male: 0.00027, 0.00068; female: 0.00199) compared to the other chromosomes (Fig. 3). Previous karyotype analyses also support this evidence of sex chromosomes of different sizes (Kořínková and Král 2012); however, consensus has been difficult from previous genome studies. For example, another araneoid chromosomal-level assembly with H. graminicola suggested that the two sex chromosomes were the two smallest chromosomes (Zhu et al. 2022); whereas, male-to-female genomic comparisons of known sex-linked markers in A. bruennichi found that chromosomes 9 and 10, which are not the smallest and map to our L. hesperus chromosomes 9 and 6, respectively (Fig. 2b), appear to be the sex chromosomes (Sheffer et al. 2022). Thus, our genome assembly lends additional support to previous evidence suggesting that sex chromosomes in araneoids are non-homologous and of different sizes. As more spider genomic and transcriptomic data are collected, it will be interesting to see how variation in sex chromosome size and number evolves among closely- and distantly-related taxa.

Ratio of heterozygosity to homozygosity for each of the Western black widow spider (Latrodectus hesperus) chromosomes between sexes. Each chromosome shows the ratio estimates for the datasets of one female and two male spiders. Ratios of heterozygosity to homozygosity on chromosomes 6 and 9 are reduced compared to other chromosomes for the male datasets compared to the female dataset, reflecting the two putative sex chromosomes. Heterozygosity ratios were averaged across a sliding window of 200,000 bp.
Genes and pathways under positive selection
The PAML site-model analysis of 1,827 orthologs from Orthofinder2 among eight araneoid species (Table 1, Fig. 1), identified 190 genes that met the criteria for statistical significance for positive selection (see Supplementary Table 1). Of these genes, 155 could be assigned to enrichment analyses as they were functionally identified from previous annotations in this group (e.g. house spider genome project) and could be identified as an orthologous gene from a BLAST to D. melanogaster. Unlike other clustering methods, the GO enrichment analyses from g:Profiler provide “highlight driver terms” from a two-step process that groups significant terms into related sub-ontologies, after which it identifies leading gene sets (Kolberg et al. 2023). Significant major driver terms (Supplementary Table 5) identified multicellular organismal development (GO:0007275), including motor neuron axon guidance (GO:0008045), regionalization (GO:0003002), locomotion (GO:0040011), netrin receptor activity (GO:0005042), and anchoring junction (GO:0070161).
Although these analyses point to signatures within genes broadly linked to body development, it may be the case that we have also identified signatures of positive selection at genes representing the development of sensory detection. For example, while the genes Ush and Ham are expressed in heart development, hematopoiesis, and dorsal closure, and are associated with neural stem fate in the house spider, P. tepidariorum, Ham also shows significant expression in the precheliceral domain, where structures such as the brain and eyes develop (Leite et al. 2022). The netrin pathway is linked ventrally with conserved roles in axon guidance in other arthropods, hemichordates, and chordates (Hiramoto et al. 2000; Lowe et al. 2006); however, it also controls axon guidance of Drosophila visual projection neurons (Zhang et al. 2023). The genes Lim1a, Pax6.1, and Pax6.2 are also essential in head and neural development across arthropods and animals in general (Shawlot and Behringer 1995; Lilly et al. 1999; Zhu et al. 2017), but are found in developing eyes in Drosophila and jellyfish, with phylogenetic studies pointing to the entire Pax gene repertoire having evolved by duplication from a single gene that still has homologs in poriferans (Hill et al. 2010). Pax2 is specifically localized to the development of sensory hairs and compound eyes in Drosophila; in the house spider, Pax2 has paralogs expressed in the peripheral nervous system, including eye development, and is similarly associated with sensory hair development (Janeschik et al. 2022). Related, we also find several genes high on our PAML significance list (Supplementary Table 1), in Cnga and Bsg, that also support cone/photoreceptor development in arthropods. Thus, these patterns may reflect a genomic signature of selection for sensory perception and behavioral development across diverse spiders. That is, while several genes are linked directly to eye and visual development, many other genes are linked to the nervous system and axon guidance, and limb and sensory hair development. This latter pattern could point to the architecture behind the deeper evolution of sensory organ development across spiders given the important role that spider appendages play in detecting vibrations and web-building for courting mates, communication, and capturing prey—an area that has been understudied in spiders from a genetics perspective (Leite et al. 2022).
These results provide strong support for the hypothesis that “ohnologs” (or retained ancestrally duplicated genes) from ancient genome-wide duplications that preceded the divergence of scorpions and spiders ~450 mya (Schwager et al. 2017) have played a large role in the diversification and subsequent rapid positive selection in early spider development. Specifically, several studies have shown that spiders and some other arachnids have retained significantly more Hox duplicated copies compared to other arthropods, with duplication and expression repertoires being highly conserved even across orders (Di et al. 2015; Leite et al. 2018, 2022; Harper et al. 2021). It might be expected that developmental ohnologs, such as the Hox clusters, would be retained for conserved body plan patterning; however, what is surprising is that among all orthologs analyzed here, Hox genes are enriched for positive selection—and not purifying selection—across hundreds of millions of years of spider evolution. A similar pattern has also been seen in the most conserved protein family, the collagens, across ~450 My of vertebrate evolution (Stover and Verrelli 2011; Stover et al. 2022), where bone and tissue-related variation can be highly deleterious, but duplications also provide new variation for adaptive evolution. Paradoxically, while developmental genes are some of the most evolutionarily conserved, ohnologs can provide new potential functional variation and may reflect fertile ground for subtle adaptive phenotypic changes seen across spider evolution in organ and limb development and sensory perception through both sub- and neo-functionalization.
Conclusions and future directions
A limitation here, and with de novo genome sequence assembly using non-model organisms in general, is a shortage of spider taxon-specific annotated genomes. The paucity is even more startling given the estimated divergence among spiders is over 300 Mya (Wheeler et al. 2017; Fernández et al. 2018), yet we have dozens of annotated genomes available within groups such as primates and Drosophila, even though they have divergence times on the order of a magnitude lower than spiders. This contrast in older divergence times and fewer sampled genomes for spiders means that we are likely to identify orthologous groups that are much older, and thus, likely reflect a conserved set of genes long under strong purifying selection (or we would not be able to align and identify them). While previous studies have identified these gene families having unusually high gene copies following ancient genome-wide duplication (Leite et al. 2018; Aase-Remedios et al. 2023), our study is the first to independently identify associated signatures of positive selection (i.e. these genes were not identified specifically a priori to test for selection). These results highlight the important role that structural genes involved in evolutionarily conserved phenotypes play in the adaptive evolution of novel traits such as body segmentation and sensory perception in spiders.
That comparative analyses reveal adaptation in only a few available genome sequences signals the need for further investigations into these pathways within spiders to shed light on how genome-wide duplication has provided evolutionary fodder within this under-studied group. For example, it is unclear whether the pattern identified here for positive selection on ohnologs for development and sensory perception genes predates araneoids. We emphasize the need for more spider genomes to understand not only these more ancient evolutionary events but to fill in gaps associated with species-specific changes not found in other groups—such as their unique venom and silk production—many of which we are not able to yet fully compare given the distant relationship of the taxa. More genome analyses of sister species within the Latrodectus genus would also reveal more about black widow spider biology and the divergent phenotypes that have evolved as these species have adapted to the increasing urban footprint.
Acknowledgments
This study benefited from collaborations with colleagues in the National Science Foundation Research Coordination Network (RCN): Eco-Evolutionary Dynamics in an Urban Planet: Underlying Mechanisms and Ecosystem Feedbacks. We thank Associate Editor, Dr. Alexander Suh, and anonymous reviewers for their comments leading to a revised manuscript.
Supplementary Material
Supplementary material is available at Journal of Heredity online.
Funding
This work was supported by institutional funds from Virginia Commonwealth University to B.C.V. and from University at Buffalo to T.J.K, with an Arthur A. Schomburg Fellowship to H.W.
Author contributions
Lindsay Miles (Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Visualization, Writing – original draft, Writing – review & editing), Hannah Waterman (Data curation, Formal analysis, Investigation, Methodology, Visualization, Writing – original draft, Writing – review & editing), Nadia Ayoub (Conceptualization, Formal analysis, Investigation, Methodology, Writing – review & editing), Jessica Garb (Conceptualization, Formal analysis, Investigation, Methodology, Writing – review & editing), Robert Haney (Conceptualization, Formal analysis, Investigation, Methodology, Writing – review & editing), Michael Rosenberg (Data curation, Formal analysis, Investigation, Methodology, Writing – review & editing), Trevor Krabbenhoft (Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Visualization, Writing – original draft, Writing – review & editing), and Brian Verrelli (Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Visualization, Writing – original draft, Writing – review & editing)
Data availability
Data collected in this study are available from NCBI (Project PRJNA785544; Submission #SUB10731191, #SUB10795822).
References
Author notes
Lindsay S Miles and Hannah Waterman Equally contributing lead authors.