Abstract

Caused by both eukaryotic dinoflagellates and prokaryotic cyanobacteria, harmful algal blooms are events of severe ecological, economic, and public health consequence, and their incidence has become more common of late. Despite coordinated research efforts to identify and characterize the genomes of harmful algal bloom-causing organisms, the genomic basis and evolutionary origins of paralytic shellfish toxins produced by harmful algal blooms remain at best incomplete. The paralytic shellfish toxin saxitoxin has an especially complex genomic architecture and enigmatic phylogenetic distribution, spanning dinoflagellates and multiple cyanobacterial genera. Using filtration and extraction techniques to target the desired cyanobacteria from nonaxenic culture, coupled with a combination of short- and long-read sequencing, we generated a reference-quality hybrid genome assembly for Heteroscytonema crispum UTEX LB 1556, a freshwater, paralytic shellfish toxin-producing cyanobacterium thought to have the largest known genome in its phylum. We report a complete, novel biosynthetic gene cluster for the paralytic shellfish toxin saxitoxin. Leveraging this biosynthetic gene cluster, we find support for the hypothesis that paralytic shellfish toxin production has appeared in divergent Cyanobacteria lineages through widespread and repeated horizontal gene transfer. This work demonstrates the utility of long-read sequencing and metagenomic assembly toward advancing our understanding of paralytic shellfish toxin biosynthetic gene cluster diversity and suggests a mechanism for the origin of paralytic shellfish toxin biosynthetic genes.

Significance

Saxitoxin, a leading agent of harmful algal blooms, is produced by several highly divergent cyanobacteria lineages and is encoded by a large biosynthetic gene cluster. These observations have led to competing hypotheses regarding how the gene cluster has appeared in such divergent strains, namely, whether the cluster is exceptionally ancient or has been subject to repeated horizontal gene transfer. Here, we explore these hypotheses using a newly assembled genome of the harmful algal bloom-causing Heteroscytonema crispum and comparative genomic approaches, including phylogenetic analysis of hundreds of homologous gene clusters. We provide evidence that saxitoxin biosynthesis genes have been subject to widespread horizontal gene transfer and that transfers of multiple genes can explain the toxin's enigmatic phylogenetic distribution. Our findings resolve this longstanding question, pointing to horizontal gene transfer as a primary mechanism for spreading the ability to produce harmful toxins and providing a framework for understanding toxin gene cluster evolution.

Introduction

Harmful algal blooms (HABs) have been increasing in number, frequency, and intensity in both coastal regions and lakes worldwide (Anderson 2014). The most common HAB-related illnesses include paralytic shellfish poisoning, ciguatera poisoning, neurotoxic shellfish poisoning, amnesic shellfish poisoning, and diarrhetic shellfish poisoning (Grattan et al. 2016). Paralytic shellfish poisoning is caused both by eukaryotic dinoflagellates and prokaryotic cyanobacteria, with a variety of paralytic shellfish toxins (PSTs) produced. The most abundant and widely distributed PST, saxitoxin, can have widespread negative effects on public and ecosystem health (Christensen and Khan 2020), in addition to potential pharmaceutical applications (Abed et al. 2009). Despite the environmental, economic, and public health impacts, our understanding of the genomic basis of saxitoxin production remains modest.

The evolutionary genomic origins of saxitoxin production and the underlying sxt gene clusters remain an open question that is key to the identification of saxitoxin-producing bacterial strains and our understanding of saxitoxin biosynthetic pathways (Dittmann et al. 2013). In addition to being produced in two separate domains of life, saxitoxin production has a broad and patchy distribution across the cyanobacterial phylogeny. The divergence between saxitoxin-producing Oscillatoriales (e.g. Microseira wollei) and Nostocales (e.g. Cylindrospermopsis raciborskii T3) is estimated to be over 2 billion years (Murray et al. 2011). This sparse phylogenetic distribution, along with molecular phylogenetic analysis, has led to the hypothesis that the sxt gene cluster has arisen in multiple divergent strains through an extensive history of horizontal gene transfer (HGT), potentially with Proteobacteria or Actinobacteria (Kellmann et al. 2008; Moustafa et al. 2009). An alternative hypothesis is that the sxt gene cluster is exceptionally ancient, and a history of gene loss, rearrangement, and natural selection has led to its current phylogenetic distribution (Murray et al. 2011). However, this question has not been addressed using recent cyanobacterial genome assemblies and modern statistical methods that can compare these hypotheses directly.

Investigation into the genomic basis of PST biosynthetic pathways has been hindered by the difficulty of creating axenic cultures due to cyanobacteria living in close relationship with other bacteria (Rippka et al. 1979; Cornet et al. 2018). Recently developed metagenomic binning techniques offer a promising solution to this challenge (Marter et al. 2021) by constructing individual genomes (metagenome assembled genomes) from a collection of contigs based on genetic signatures and abundance. Given sufficient contig lengths, such techniques can facilitate the identification and characterization of biosynthetic gene clusters even in mixed cultures. Commonly used short-read sequencing technologies are often unable to produce data sufficient to generate fully contiguous genome assemblies, which has hampered resolution of long, biosynthetic gene clusters. Long-read sequencing platforms offer an opportunity to close the gaps in these fragmented assemblies (Koren and Phillippy 2015). The combination of metagenomic binning and long-read sequencing shows promise in generating highly contiguous genomes from nonaxenic cyanobacterial cultures.

The cyanobacterial species Heteroscytonema crispum includes multiple strains that are known to produce saxitoxin (Sendall and McGregor 2018). Interestingly, different strains of H. crispum have differently structured sxt biosynthetic gene clusters, resulting in different toxin profiles (Cullen et al. 2018), suggesting a dynamic genomic architecture of saxitoxin production. In addition, H. crispum has no saxitoxin-producing close relatives (Sendall and McGregor 2018) and is in a key phylogenetic position for elucidating the evolutionary dynamics of the sxt gene cluster. Therefore, the genome sequence of H. crispum is poised to offer crucial insight into the genomic architecture and evolutionary origins of saxitoxin production. However, there is no available genome sequence for the species. The strain H. crispum UTEX LB 1556, the subject of this study, is known to produce a suite of PSTs, including saxitoxin (Sendall and McGregor 2018). Isolating cyanobacteria such as H. crispum UTEX LB 1556 is very challenging and time-consuming, necessitating the use of metagenomic binning to identify the cyanobacterial portion of metagenomic assemblies.

Here, we present the first genome assembly for H. crispum using a combination of long reads and high coverage Illumina short-read sequencing (i.e. a hybrid approach). With the benefit of this new assembly, we detect and characterize a novel biosynthetic gene cluster for saxitoxin with 53 protein-coding genes. We then identify homologous gene regions across hundreds of publicly available genomes and use phylogeny-based methods to detect signatures of HGT and natural selection. We uncover evidence for extensive horizontal transfer of sxt genes across Bacteria, providing novel support that HGT can explain the sporadic distribution of saxitoxin production.

Results

Genome Assembly and Annotation

Sequencing and assembly efforts produced a high-quality draft genome of H. crispum UTEX LB1556 totaling ∼8.11 Mb in size (Table 1; supplementary table S1, Supplementary Material online) displaying evidence of high completeness (>99%) and low contamination (<1%). Calculated values for genome size and GC content were in line with published work for close relatives (Chen et al. 2021). Binning with MetaWRAP (Uritskiy et al. 2018) was able to accurately group cyanobacterial contigs (n = 61) into a clearly defined genomic bin (Fig. 1).

Identification of the cyanobacterial contigs in the H. crispum UTEX LB1556 assembly. Two blob plots that display the GC content of contigs (x axis) against their abundance (y axis) are shown. Left panel: the associated phylum-level annotations are labeled, indicated by the colored dots. Cyanobacterial contigs (lower middle of the plot, yellow) clearly resolve into a separate cluster. Right panel: the GC content of the same contigs is plotted, this time with the corresponding bin assignments labeled in color. Bin 11 (lower middle of the plot, brown) is observed in the same location as the cyanobacterial cluster from the left panel.
Fig. 1.

Identification of the cyanobacterial contigs in the H. crispum UTEX LB1556 assembly. Two blob plots that display the GC content of contigs (x axis) against their abundance (y axis) are shown. Left panel: the associated phylum-level annotations are labeled, indicated by the colored dots. Cyanobacterial contigs (lower middle of the plot, yellow) clearly resolve into a separate cluster. Right panel: the GC content of the same contigs is plotted, this time with the corresponding bin assignments labeled in color. Bin 11 (lower middle of the plot, brown) is observed in the same location as the cyanobacterial cluster from the left panel.

Table 1

Summary of the completed genome annotations for H. crispum UTEX LB 1556

AttributeValue
Genome size (bp)8,102,112
DNA coding (bp)4,529,466
GC content (%)43
Protein-coding genes9,305
Pseudogenes578
Genes with predicted functions1,884
Genes assigned to COGs727
Genes with signal peptides501
Biosynthetic gene clusters10
tRNA52
rRNA2
sRNA15
tmRNA1
ncRNA16
Antisense RNA4
Cis-regulatory elements7
Ribozymes1
CRISPRs74
Rho-dependent terminators29,763
Rho-independent terminators6,431
Hairpins153,422
AttributeValue
Genome size (bp)8,102,112
DNA coding (bp)4,529,466
GC content (%)43
Protein-coding genes9,305
Pseudogenes578
Genes with predicted functions1,884
Genes assigned to COGs727
Genes with signal peptides501
Biosynthetic gene clusters10
tRNA52
rRNA2
sRNA15
tmRNA1
ncRNA16
Antisense RNA4
Cis-regulatory elements7
Ribozymes1
CRISPRs74
Rho-dependent terminators29,763
Rho-independent terminators6,431
Hairpins153,422
Table 1

Summary of the completed genome annotations for H. crispum UTEX LB 1556

AttributeValue
Genome size (bp)8,102,112
DNA coding (bp)4,529,466
GC content (%)43
Protein-coding genes9,305
Pseudogenes578
Genes with predicted functions1,884
Genes assigned to COGs727
Genes with signal peptides501
Biosynthetic gene clusters10
tRNA52
rRNA2
sRNA15
tmRNA1
ncRNA16
Antisense RNA4
Cis-regulatory elements7
Ribozymes1
CRISPRs74
Rho-dependent terminators29,763
Rho-independent terminators6,431
Hairpins153,422
AttributeValue
Genome size (bp)8,102,112
DNA coding (bp)4,529,466
GC content (%)43
Protein-coding genes9,305
Pseudogenes578
Genes with predicted functions1,884
Genes assigned to COGs727
Genes with signal peptides501
Biosynthetic gene clusters10
tRNA52
rRNA2
sRNA15
tmRNA1
ncRNA16
Antisense RNA4
Cis-regulatory elements7
Ribozymes1
CRISPRs74
Rho-dependent terminators29,763
Rho-independent terminators6,431
Hairpins153,422

A large complement of protein-coding genes (7,281) in H. crispum UTEX LB1556 were annotated, 2,144 of which had predicted functions. Taken together, the annotation generated here is comprehensive, identifying a broad cross-section of functional components of the genome. antiSMASH (Medema et al. 2011), which annotates biosynthetic gene clusters, failed to detect any saxitoxin biosynthetic gene clusters in the assembly, possibly because its database lacks a full sxt cluster. This provided the impetus to pursue homology search-based approaches, described in the following section.

Identification of the Saxitoxin Biosynthetic Gene Cluster

Given that the automated annotation effort did not yield any obvious toxin biosynthetic gene clusters, an alternative approach utilizing direct homology searches of reference sxt clusters was performed. This approach revealed strong hits to the H. crispum UCFS10 saxitoxin reference cluster (Cullen et al. 2018) in H. crispum UTEX LB1556, mainly on contig seq12. Additional assembly and manual annotation efforts (see Materials and Methods) revealed a 63,916-bp contig containing the complete saxitoxin cluster. This contig was used to join the two contigs produced by MetaWRAP containing the sxt cluster. Overall, the cluster is composed of 53 annotated genes, with 27 genes clearly orthologous to the H. crispum UCFS10 cluster and 26 additional genes predicted (hypothetical proteins, transposases, or genes with other functions; see Fig. 2 and supplementary table S2, Supplementary Material online).

An illustration of the H. crispum saxitoxin biosynthetic gene cluster identified in this work. Coding genes are represented by arrows, and the direction indicates the gene orientation (e.g. a gene on the forward strand will point from left to right). Gene IDs are written directly on the gene where space permits; otherwise, the Gene ID is indicated by the color in the legend. The precise coordinates of the genes with respect to the H. crispum assembly are provided in supplementary table S2, Supplementary Material online.
Fig. 2.

An illustration of the H. crispum saxitoxin biosynthetic gene cluster identified in this work. Coding genes are represented by arrows, and the direction indicates the gene orientation (e.g. a gene on the forward strand will point from left to right). Gene IDs are written directly on the gene where space permits; otherwise, the Gene ID is indicated by the color in the legend. The precise coordinates of the genes with respect to the H. crispum assembly are provided in supplementary table S2, Supplementary Material online.

Comparative Genomic Analysis of the sxt Gene Cluster

Searches for homologous sxt gene clusters using cblaster (Gilchrist and Chooi 2021) uncovered 384 genomes with significant similarity to at least three clustered genes (mean = 4.78, SD = 4.75, max = 56) in the newly annotated sxt gene cluster in H. crispum UTEX LB 1556. Interestingly, there were no significant hits to Dinoflagellate genomes, possibly due to the absence of genomes in NCBI belonging to Dinoflagellate species known to produce saxitoxin (e.g. Alexandrium, Gymnodinium, and Pyrodinium). While a large proportion (42%) of the homologous gene clusters belonged to the order Nostocales, a considerable number were found in other cyanobacterial orders, e.g. Oscillatoriales, Desertifilales, Chroococcales, Pseudanabaenales, and Synechococcales. Homologous clusters were also found in genomes of non-cyanobacterial phyla including Actinomycetota, Campylobacterota, Firmicutes, Myxococcota, Planctomycetota, Pseudomonadota, and Verrucomicrobiota. Only 20 of the homologous clusters contained what appeared to be a complete biosynthetic gene cluster (>20 genes; Fig. 3a), including clusters from C. raciborskii, Raphidiopsis brookii, Aphanizomenon gracile, Dolichospermum circinale, and M. wollei. Apart from Symplocastrum sp. BBK-W-15 of the order Oscillatoriales, all the strains with putatively complete sxt gene clusters are known to produce saxitoxin. Along with M. wollei, Symplocastrum is the most evolutionarily distant strain from H. crispum UTEX LB 1556 that contains an apparently complete sxt cluster. Outside of the strains known to produce saxitoxin (Fig. 3a), smaller numbers of sxt homologs were found in other strains known to produce different toxins including Microcystis, Sphaerospermopsis, and Planktothrix.

The structure of the sxt gene cluster and patterns of HGT among cyanobacterial strains. a) Gene clusters with at least 20 unique BLAST hits to the H. crispum sxt genes are displayed. The phylogeny on the left displays evolutionary relationships of genomes based on the 16S rRNA gene, estimated with IQ-TREE. Vertical arrows between branches represent the top five most common transfer events between branches and are colored according to the number of inferred gene transfers. sxt genes are represented by horizontal arrows on the right and are colored according to their IDs as determined by protein similarity. Lines between gene clusters in closely related strains connect genes with at least 80% amino-acid identity. Only one representative for each of A. gracile and C. raciborskii is displayed. b) The estimated rate of HGT across the sxt cluster (bold vertical line) is significantly higher (P < 0.01) than the mean rate across 100 randomly sampled gene sets (distribution shown to the left of the sxt cluster line) of the same size as the sxt cluster.
Fig. 3.

The structure of the sxt gene cluster and patterns of HGT among cyanobacterial strains. a) Gene clusters with at least 20 unique BLAST hits to the H. crispum sxt genes are displayed. The phylogeny on the left displays evolutionary relationships of genomes based on the 16S rRNA gene, estimated with IQ-TREE. Vertical arrows between branches represent the top five most common transfer events between branches and are colored according to the number of inferred gene transfers. sxt genes are represented by horizontal arrows on the right and are colored according to their IDs as determined by protein similarity. Lines between gene clusters in closely related strains connect genes with at least 80% amino-acid identity. Only one representative for each of A. gracile and C. raciborskii is displayed. b) The estimated rate of HGT across the sxt cluster (bold vertical line) is significantly higher (P < 0.01) than the mean rate across 100 randomly sampled gene sets (distribution shown to the left of the sxt cluster line) of the same size as the sxt cluster.

The genomic structure of the sxt cluster is quite variable across Cyanobacteria and even shows structural variation among H. crispum strains (Fig. 3a). In fact, the sxt cluster in H. crispum UTEX LB1556 contains two nontransposase-associated genes with no clear homology in other gene clusters (pscORF13 and pscORF19; supplementary table S2, Supplementary Material online) and two genes found only in the other H. crispum strains (sxtM2 and pscORF12; supplementary table S2, Supplementary Material online). Among H. crispum strains UTEX LB 1556, UCFS10, and UCFS15, the genes sxtH, sxtM1, sxtM3, sxtDIOX, and sxtT have undergone rearrangements relative to the other genes in the cluster (Fig. 3a). As has been found in previous studies, a core set of sxt genes are found in saxitoxin-producing strains, including sxtA-sxtI, sxtP-sxtS, and sxtU (Murray et al. 2011).

Evidence for sxt Gene Cluster Evolution Driven by HGT

We found that sxt genes have undergone widespread and repeated HGT in their history and that H. crispum has been a key donor and recipient of sxt genes with other saxitoxin-producing lineages. All but one sxt gene family in our analysis showed significant statistical support (Δ Akaike information criterion [AIC] > 5) for HGT in its history for at least one nonrecombinant segment (supplementary table S2, Supplementary Material online). In fact, GeneRax (Morel et al. 2020) estimated an average of 12.79 (max = 41) transfer events per gene family and an average rate of 0.189 transfers per branch. Interestingly, the rate of gene transfers was approximately equal to the rate of losses (0.180 events/branch) and much higher than the rate of gene duplications (0.0548 events/branch). As had been hypothesized in previous studies, n = 16 genes showed evidence for HGT between Cyanobacteria and other phyla, including sxtO, sxtU, sxtW, sxtJ, sxtK, sxtI, and sxtH (supplementary table S3, Supplementary Material online). Notably, the average rate of HGT across the sxt cluster (0.189 events/branch) was significantly higher than the background rate across the genome, estimated from 100 sets of randomly sampled gene families (Fig. 3b; mean transfer rate = 0.155 events/branch, SD = 0.00878, P < 0.01). This result suggests that while HGT is common in Cyanobacteria, sxt genes have been transferred across strains at an exceptionally high rate.

While sxt gene homologs were present in 384 genomes in our analysis, we found evidence that transfers of sets of genes between a small number of lineages have largely driven the origins and distribution of complete sxt gene clusters (Fig. 3a). By counting the frequencies of inferred transfer events (supplementary table S3, Supplementary Material online), we found that 85% of all sxt genes in the cluster had signatures of HGT between H. crispum, M. wollei, Symplocastrum, and C. raciborskii/R. brookii (Fig. 3a). H. crispum and Symplocastrum sp. BBK-W-15 shared the most HGT events, by far, with 24 genes transferred. Outside of the top four pairs of branches, in terms of number of inferred HGT events (as shown in Fig. 3a), no other branch shared more than four transferred genes (supplementary table S3, Supplementary Material online). This result suggests that while individual sxt genes may have been frequently transferred between many lineages throughout their history, complete sxt clusters have originated as the result of HGT of many genes between saxitoxin-producing lineages.

We found only limited evidence for differential rates of molecular evolution in sxt genes among lineages. Only 3 out of 53 genes exhibited significantly different dN/dS rates in saxitoxin-producing lineages compared with non toxin-producers (supplementary table S2, Supplementary Material online). Together, these results suggest that sxt gene sequences have remained relatively conserved across time, primarily persisting in saxitoxin producers, and that the evolution of the sxt cluster has been shaped by large-scale evolutionary genomic events (i.e. rearrangements, horizontal transfers, and losses).

Discussion

A greater understanding of the genomic basis of HABs can help enable effective mitigation and prevention of their negative effects on ecological, public, and economic health. In this study, we leveraged new genomic technologies and bioinformatic approaches to assemble and annotate a high-quality genome of the understudied H. crispum UTEX LB 1556, a producer of the PST saxitoxin. The genome assembly showed evidence of high levels of completeness and contiguity, as well as low levels of contamination (Table 1). A comprehensive metagenomic binning workflow allowed us to assemble this high-quality genome despite the presence of numerous other taxa in the culture (Fig. 1). Moreover, the addition of PacBio reads to the H. crispum assembly (Table 1) more than tripled the observed N50 value from the initial genome assembly generated for this project (data not shown). Our multipronged annotation effort identified many different types of genomic features that will fuel future work in cyanobacterial genomic research (Table 1).

We report a complete saxitoxin biosynthetic gene cluster for the UTEX LB 1556 strain of H. crispum (Figs. 2 and 3; supplementary table S2, Supplementary Material online), composed of 53 genes spanning roughly 47 kb. The finding of a saxitoxin gene cluster is consistent with what is known about H. crispum and this strain specifically: members of the genus Heteroscytonema are known to produce multiple secondary metabolites (Sendall and McGregor 2018), at least two H. crispum strains have previously been shown to have saxitoxin gene clusters (Cullen et al. 2018), and saxitoxin production in the UTEX LB 1556 strain was detected directly in this study, ranging from 17 to 79 ng/mL saxitoxin (data not shown). Comparing the membership of the cluster reported here with that of the H. crispum strains from Cullen et al. (2018) reveals a high degree of similarity. Indeed, all 17 sxt genes reported in the H. crispum UCFS10 and UCFS15 were observed in the H. crispum UTEX LB 1556 cluster identified in this study. While the majority of sxt genes in UTEX LB 1556 were in synteny with UCFS10 and UCFS15, sxtH, sxtM1, sxtM3, sxtDIOX, and sxtT have undergone rearrangements. The only gene that was notably absent in the UTEX LB 1556 cluster is the gene referred to only as ABC transporter by Cullen et al. (2018).

The patchy phylogenetic distribution of saxitoxin production among Cyanobacteria has been the subject of extensive study and debate (Dittmann et al. 2013). If the distribution of the sxt gene cluster could be explained by an exceptionally ancient origin and persistence across many divergent cyanobacterial lineages, as suggested by Murray et al. (2011), one would expect differential rates of molecular evolution between toxin producers and non toxin-producers, as the sxt genes would be subjected to differential selection pressures depending on gene and cluster function. Contrary to the findings of Murray et al., we found only limited evidence for differential rates of molecular evolution in sxt genes among lineages. Instead, using dense sampling of genomes and new statistical approaches, we found that while gene loss and duplication have played a role, HGT is the primary driver of saxitoxin evolution and can explain its phylogenetic distribution (supplementary table S2, Supplementary Material online; Fig. 3). Previous molecular phylogenetic analyses have shown that several sxt genes have phylogenies that do not match the presumptive species history, supporting the inference of HGT (Kellmann et al. 2008; Moustafa et al. 2009; Cullen et al. 2018). Our study is the first to find strong statistical support for this hypothesis, using an approach that explicitly models HGT, duplication, and loss in the context of a species tree. In fact, we found that every sxt gene in the H. crispum UTEX LB 1556 genome has experienced HGT in its history and that the transfer rate of sxt genes was significantly higher than the genomic background rate (Fig. 3b). This result suggests that the transfer of sxt genes could confer a selective advantage for strains that are able to acquire these genes.

We observed only a small number of divergent cyanobacterial strains with complete sxt clusters, despite high rates of sxt gene transfer across Cyanobacteria. Interestingly, the lineages with complete sxt clusters were found to have exchanged large sets of sxt genes with each other, ranging from 6 to 24 genes (Fig. 3a). These observations suggest that (i) rare combinations of HGT events are necessary to assemble the full sxt cluster and (ii) transfers of large genomic regions with multiple sxt genes may have driven the origins of saxitoxin production in the small number of divergent strains. Further studies into the geographic distributions and interactions between saxitoxin-producing Cyanobacteria will be key to monitoring the potential to spread this ability. Overall, these results point to the importance of HGT not only in the general evolution of prokaryotic genomes but also in the evolution of key traits with widespread effects, such as HABs.

In summary, this work presents a new, high-quality genome assembly for H. crispum along with a comprehensive genome annotation. This work also identified and cataloged a novel saxitoxin biosynthesis gene cluster in H. crispum UTEX LB 1556. It is anticipated that the products of this study will support ongoing investigations into the molecular, biochemical, and ecological basis for HABs (Abed et al. 2009).

Materials and Methods

Culturing of H. crispum UTEX LB1556 and Sampling

Batch cultures of the nonaxenic strain H. crispum UTEX LB 1556 were grown in MB3N liquid media in 50- and 250-mL vented, sterile tissue culture flasks. These were grown in environmental chambers at 20 °C between 25 and 100 µmol m−2 s−1. The in-house cultures were subcultured at various intervals ranging from 2 weeks to 2 months and sampled at each inoculation time point. Triplicate whole culture cyanobacterial samples collected at these points were preserved with saxitoxin diluent, provided by the Eurofins Technologies Enzyme-Linked Immunosorbent Assay (ELISA) Saxitoxins (PSP) (EC 2002/225 Compliant) kit, PN 52255B. The vials were stored frozen at −20 °C until analysis, then thawed at room temperature, and lysed before screening.

Concentrated samples of biomass, referred to here as pellets, were collected concurrently with ELISA vial samples for later genomic processing. To remove as many nonfilamentous, smaller bacteria as possible, some samples were filtered through an 11-µm filter and biomass collected on the filter was resuspended in sterile media prior to pelleting. The biomass was concentrated in an ultracentrifuge at low speeds (<5,000 × g). The supernatant was discarded, and the concentrated biomass was resuspended in nuclease free water. Pellets were washed and pelleted, and the supernatant was removed for a total of three rinse cycles. The resulting wet pellets were stored at −20 °C until ready to extract genomic DNA. Excessive pipetting and vortexing were avoided during these steps to maintain the integrity of high molecular weight DNA. The wet weight of each pellet was calculated prior to freezing.

Toxin Screening

Prior to each ELISA, samples were partially or fully lysed using two freeze–thaw cycles followed by a chilled sonication bath. A vial containing the saxitoxin ELISA kit diluent (Eurofins Technologies, 52255B) was processed using the same lysis method and included in the plate as a laboratory reagent blank (LRB). The absorbance of the ELISA plates was read at room temperature at 450 nm. Cyanotoxin levels were calculated by interpolating absorbance values in a standard curve generated for each unique plate using either a four-parameter nonlinear regression or a logarithmic curve. Samples, standards, LRBs, and controls were tested in at least duplicate analytical well replicates and averaged for analysis. Samples lower than Standard 1 (0.02 ppb) were reported as below detection.

Extraction of Genomic DNA

Genomic DNA was extracted from each cyanobacterial culture using a multistep procedure. Culture pellets were flash frozen in liquid nitrogen and ground with a sterile micropestle prior to digestion with lysozyme and extraction with the MasterPure Gram Positive DNA Purification Kit (Lucigen). DNA extracts were subsequently cleaned and concentrated with the Genomic DNA Clean & Concentrator-25 Kit (Zymo). Each column was eluted three times in 35 to 50 uL of Zymo Elution Buffer. The elution buffer was warmed to 37 °C before dissolving DNA pellets in the initial extraction, and the Zymo elution buffer was warmed to 65 °C before each elution from the columns for the cleaning and concentration procedure. The yield and purity of the cyanobacterial DNA were assessed using a NanoDrop One spectrophotometer (NanoDrop One Microvolume UV-Vis Spectrophotometer; Thermo Fisher, Waltham, MA, USA). Purity was assessed using both A260/A230 and A260/A280 values; samples with a A260/A280 value between 1.8 and 2.0 and a A260/A230 value close to or above 2.0 were considered relatively pure genomic DNA samples suitable for sequencing.

Illumina Libraries

DNA extractions were quantitated fluorometrically with Quant-iT PicoGreen kits (Thermo Fisher), and DNA molecular weight ranges were estimated using 1.2% agarose FlashGel gels (Lonza) or Bioanalyzer High Sensitivity DNA kits (Agilent) prior to DNA Prep library (Illumina) preparation. Library sizes and quantities were assayed using the Bioanalyzer High Sensitivity DNA Kit and KAPA Library Quantification Kit (Roche), respectively, followed by sequencing on the MiSeq platform (Illumina).

Pacific Biosciences Libraries

DNA samples were quantitated fluorometrically with Quant-iT PicoGreen kits (Thermo Fisher), and DNA molecular weight ranges were estimated using Femto Pulse 55-kb BAC Analysis kits (Agilent). Given the limited quantity and quality of DNA available, low input libraries were prepared from DNA samples containing the highest average molecular weight, resulting in libraries with average insert sizes of ∼1, 5.3, and 6.1. The 1-kb insert library was prepared from an early DNA sample and omitted the size selection step to remove library fragments smaller than 3 kb.

Raw Read Processing, Short-Read Assembly, and Metagenomic Binning

Illumina MiSeq reads were processed using fastp (Chen et al. 2018) (v.0.20.1), which removed adapters and low-quality sequences. PacBio circular consensus sequencing (CCS) reads (n = 775,182) with predicted accuracy ≥ 99% (Q20) were collected, and along with the Illumina MiSeq reads (n = 38,488,136; 19,244,068 read pairs), were used as input to Unicycler (Wick et al. 2017), which generated a hybrid assembly by first assembling the short-read data and then using long reads to bridge gaps. Unicycler was run in the default “normal” bridging mode, which provides a highly contiguous assembly with a low risk of misassembly. The assembly was filtered for length and the contigs were binned using the MetaWRAP pipeline (Uritskiy et al. 2018) (v.1.2), which used CONCOCT (Alneberg et al. 2014), MaxBin2 (Wu et al. 2015), and metaBat2 (Kang et al. 2019) as the initial binning tools. This was followed by a bin refinement step using the BIN_REFINEMENT module, which combines the outputs from the initial binning tools to remove contamination without sacrificing completeness as assessed using CheckM (Parks et al. 2015). Completeness was defined as the number of expected single-copy genes present in at least one copy divided by the total number of expected single-copy genes and expressed in percentage terms. Contamination was calculated as the number of expected single-copy genes present in multiple copies divided by the total number of expected single-copy genes, also expressed in percentage format. Finally, the MetaWRAP REASSEMBLE_BINS module was used to assign reads to bins, reassemble the bin-specific reads, and assess completeness and contamination with CheckM to determine the “best” version of each bin, yielding the final set of bins determined to be at least 70% complete and at most 10% contaminated. Mash (Ondov et al. 2016) and Kraken2 (Wood et al. 2019) were run on the final selected contigs to confirm the identity of the sample.

Genome Annotation and Identification of the sxt Gene Cluster

To identify putative toxin-producing biosynthetic gene clusters, the genome was annotated using a custom bioinformatics pipeline. This pipeline relies on a suite of gene callers to annotate genes, including BBTools (Bushnell 2017), Glimmer3 (Delcher et al. 2007), MetaGeneAnnotator (Nogushi et al. 2008), PGAP (Tatusova et al. 2016), and Prodigal (Hyatt et al. 2010), and includes a module to make consensus gene calls from these predictions as well as reference protein alignments (computed with NCBI BLAST+ [Camacho et al. 2009]). We also annotated noncoding RNAs (Infernal [Nawrocki et al. 2013], ARAGORN [Laslett and Canback 2004], and Prokka [Seemann 2014]), signal peptides (SignalP [Petersen et al. 2011]), CRISPRs (CRISPRCasFinder [Couvin et al. 2018]), transcription terminators (RhoTermPredict [Di Salvo et al. 2019] and TransTermHP [Kingsford et al. 2007]), and biosynthetic gene clusters (antiSMASH [Medema et al. 2011]).

To determine if there was evidence supporting the presence of an sxt biosynthetic gene cluster, Prodigal was used to identify translated coding regions within the H. crispum genome assembly. These coding regions were used to query selected reference clusters using BLASTP (Camacho et al. 2009). Matches were required to be at least 100 aa long with 70% identity. The homology search yielded the highest number of positive results when querying the H. crispum UCFS10 strain reference gene clusters. The results of this homology search were used to build the H. crispum UTEX LB1556 sxt cluster annotation. Gene names were assigned according to a simple procedure: if the gene had a positive hit to a known sxt gene from the H. crispum UCFS10 reference, then the reference gene name was used. If the gene was a supporting member of the cluster, then the gene was labeled “pscORF” (putative saxitoxin cluster Open Reading Frame) followed by a uniquely identifying integer (“pscORF1,” “pscORF2,” and so forth).

As the saxitoxin gene cluster was found on the ends of two scaffolds (see Results), PacBio reads (n = 560,296) were used to confirm that the entire cluster was syntenic. Reads were mapped against the H. crispum UCFS10 cluster (Cullen et al. 2018) using minimap2 with default parameters (Li 2018). Reads that mapped against the reference (n = 1,969) were assembled using miniasm (Li 2016) with default parameters and the -s 200 flag, producing four contigs. These four contigs were utilized in a second round of minimap2 to recruit more PacBio reads, yielding 4,674 mapped reads in total. These reads were provided as input to a second round of miniasm, resulting in ten contigs, the largest of which (63,916 bp) contained the sxt gene cluster. These procedures confirmed that the two scaffolds assembled with MetWRAP containing the sxt cluster could be joined using the miniasm assembly.

Comparative Analysis of the sxt Gene Cluster

To explore the evolutionary origins of the sxt biosynthetic gene cluster in the newly assembled H. crispum UTEX LB 1556 genome, homologous gene clusters were first identified in publicly available bacterial and protist genomes. cblaster v.3.16 (Gilchrist et al. 2021) was used to identify clusters of genes homologous to our identified sxt genes in database of all NCBI GenBank cyanobacteria genomes (n = 4,275), NCBI RefSeq Representative bacteria genomes (n = 14,429), NCBI GenBank Protist genomes (n = 1,055), and sxt gene cluster references selected from NCBI GenBank (n = 10; supplementary table S4, Supplementary Material online). The sets of reference genomes were selected to maximize sensitivity and taxonomic breadth while maintaining computational feasibility for downstream analyses. Gene cluster hits from cblaster were retained if gene hits had a minimum sequence identity of 30%, minimum coverage of 0.5, a minimum of three gene hits, and a maximum of 20 kb between genes in a cluster. Gene order and orientation of homologous clusters with at least 20 genes were visualized using clinker v.0.0.24 (Gilchrist and Chooi 2021).

A phylogenetic tree of strains with hits to the H. crispum UTEX LB1556 sxt cluster was constructed using 16S rRNA sequences pulled from the genome annotations and from SILVA for the sxt reference clusters (supplementary table S4, Supplementary Material online). 16S sequences were aligned using MAFFT v.7.508 (Katoh and Standley 2013), and spurious sequences > 30% divergent from the consensus sequence were removed using CIAlign (Tumescheit et al. 2022). A maximum-likelihood phylogeny was estimated with IQ-TREE v.2.2.03 (Minh et al. 2020) using the best-fit model of molecular evolution selected with ModelFinder (Kalyaanamoorthy et al. 2017).

GeneRax v2.0.4 (Morel et al. 2020) was used to test for signatures of horizontal transfer of genes between genomes. GeneRax reconciles gene-family trees with a species tree using a statistical model of gene duplication, loss, and horizontal transfer. First, OrthoFinder v.2.5.4 (Emms and Kelly 2019) was run with default settings to cluster sxt genes into orthologous gene families (i.e. “phylogenetic hierarchical orthogroups”). Nucleotide sequences for each gene family were aligned using TranslatorX (Abascal et al. 2010), which translates nucleotide sequences to amino acid, aligns with MAFFT, and then back-translates to nucleotides. To remove spurious homologs and potentially erroneously annotated genes, sequences with >50% divergence from the consensus sequence were removed with CIAlign (Tumescheit et al. 2022). As recombination can mislead phylogeny-based tests for natural selection and HGT, nonrecombinant segments of each alignment were identified using GARD (Kosakovsky Pond et al. 2006). Nonrecombinant alignment partitions were retained if they were at least 100 bp long, contained at least four sequences, and contained a sequence from H. crispum UTEX LB 1556. Parameters were estimated for the undated DTL (duplication, transfer, loss) and DL (duplication, loss) GeneRax models for nonrecombinant segments of each gene-family alignment using the 16S phylogeny as the species tree and allowing different evolutionary rates for each gene family. To estimate gene-wise statistical support for HGT, log-likelihoods for the DTL (three parameters, including HGT) and DL (two parameters, excluding HGT) models were compared using the AIC (Akaike 1973). An AIC value > 5 was considered significant support for HGT.

As cyanobacteria genes across the genome are thought to have extensive signatures of HGT (Gogarten et al. 2002; Zhaxybayeva et al. 2006; Morel et al. 2020), we sought to test whether rates of HGT in the sxt cluster were higher than expected compared with the genomic background. To that end, we estimated rates of HGT for 100 sets of randomly sampled gene families equal in size to the sxt cluster. The distribution of transfer rates across this set was compared with the average rate for the sxt cluster to calculate an empirical P-value.

To test the hypothesis that toxin-producing lineages have distinct rates of sxt gene molecular evolution compared with nontoxin-producing lineages, sxt gene-family alignments were screened for signatures of natural selection using the branch-site unrestricted statistical test for episodic diversification (BUSTED [Murrell et al. 2015]), a dN/dS-based method in the HyPhy v.2.5 package (Kosakovsky Pond 2020). For each gene tree, we tested whether saxitoxin-producing strains had different dN/dS rates relative to nontoxin producers, signifying differential selection pressures. P-values were obtained using likelihood ratio tests and corrected for multiple testing using the Benjamini–Hochberg procedure (Benjamini and Hochberg 1995).

Supplementary Material

Supplementary material is available at Genome Biology and Evolution online.

Acknowledgments

The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of DHS or the US Government. DHS does not endorse any products or commercial services mentioned in this presentation. In no event shall DHS, BNBI, or NBACC have any responsibility or liability for any use, misuse, inability to use, or reliance upon the information contained herein. In addition, no warranty of fitness for a particular purpose, merchantability, accuracy, or adequacy is provided regarding the contents of this document. The US Government (USG) retains and the publisher, by accepting the article for publication, acknowledges that the USG retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for USG purposes.

Funding

This work was funded under Agreement No. HSHQDC-15-C-00064 awarded to Battelle National Biodefense Institute (BNBI) by the Department of Homeland Security (DHS) Science and Technology Directorate (S&T) for the management and operation of the National Biodefense Analysis and Countermeasures Center (NBACC), a Federally Funded Research and Development Center.

Data Availability

Raw Illumina and PacBio data have been deposited to NCBI Sequence Read Archive under BioProject PRJNA1174761. The H. crispum UTEX LB 1556 genome assembly has been deposited to NCBI GenBank, accession number JBIPDF000000000. A phylogenetic tree in Newick format of genomes analyzed in this study based on the 16S rRNA sequence alignment has been deposited to FigShare (https://doi.org/10.6084/m9.figshare.26426965.v1). The reconciled gene trees in Newick format and gene family alignments in fasta format for each gene family present in supplementary table S2, Supplementary Material online have been deposited to FigShare (https://doi.org/10.6084/m9.figshare.26426989.v1).

Literature Cited

Abascal
 
F
,
Zardoya
 
R
,
Telford
 
MJ
.
Translatorx: multiple alignment of nucleotide sequences guided by amino acid translations
.
Nucleic Acids Res
.
2010
:
38
(
suppl_2
):
W7
W13
. .

Abed
 
RMM
,
Dobretsov
 
S
,
Sudesh
 
K
.
Applications of cyanobacteria in biotechnology
.
J Appl Microbiol
.
2009
:
106
(
1
):
1
12
. .

Akaike
 
H
. Information theory and an extension of the maximum likelihood principle. In:
Petrov
 
B
,
Csaki
 
F
, editors.
International symposium on information theory
.
Budapest
:
Akademiai Kiado
;
1973
. p.
267
281
.

Alneberg
 
J
,
Bjarnason
 
BS
,
de Bruijn
 
I
,
Schirmer
 
M
,
Quick
 
J
,
Ijaz
 
UZ
,
Lahti
 
L
,
Loman
 
NJ
,
Andersson
 
AF
,
Quince
 
C
.
Binning metagenomic contigs by coverage and composition
.
Nat Methods
.
2014
:
11
(
11
):
1144
1146
. .

Anderson
 
D
. HABs in a changing world: a perspective on harmful algal blooms, their impacts, and research and management in a dynamic era of climactic and environmental change. In:
Kim
 
HG
,
Reguera
 
B
,
Hallegraeff
 
G
,
Lee
 
CK
,
Han
 
MS
,
Choi
 
JK
, editors.
Harmful Algae 2012, Proceedings of the 15th International Conference on Harmful Algae
.
International Society for the Study of Harmful Algae
;
2014
. p.
3
25
.

Benjamini
 
Y
,
Hochberg
 
Y
.
Controlling the false discovery rate: a practical and powerful approach to multiple testing
.
J R Stat Soc Ser B Stat Methodol
.
1995
:
57
(
1
):
289
300
. .

Bushnell
 
B
.
2017
. BBtools. [Online] [accessed 2022 January 21]. http://sourceforge.net/projects/bbmap/.

Camacho
 
C
,
Coulouris
 
G
,
Avagyan
 
V
,
Ma
 
N
,
Papadopoulos
 
J
,
Bealer
 
K
,
Madden
 
TL
.
BLAST+: architecture and applications
.
BMC Bioinformatics
.
2009
:
10
(
1
):
421
. .

Chen
 
M-Y
,
Teng
 
W-K
,
Zhao
 
L
,
Hu
 
C-X
,
Zhou
 
Y-K
,
Han
 
B-P
,
Song
 
L-R
,
Shu
 
W-S
.
Comparative genomics reveals insights into cyanobacterial evolution and habitat adaptation
.
ISME J
.
2021
:
15
(
1
):
211
227
. .

Chen
 
S
,
Zhou
 
Y
,
Chen
 
Y
,
Gu
 
J
.
Fastp: an ultra-fast all-in-one FASTQ preprocessor
.
Bioinformatics
.
2018
:
34
(
17
):
i884
i890
. .

Christensen
 
VG
,
Khan
 
E
.
Freshwater neurotoxins and concerns for human, animal, and ecosystem health: a review of anatoxin-a and saxitoxin
.
Sci Total Environ
.
2020
:
736
:
139515
. .

Cornet
 
L
,
Bertrand
 
AR
,
Hanikenne
 
M
,
Javaux
 
EJ
,
Wilmotte
 
A
,
Baurain
 
D
.
Metagenomic assembly of new (sub)polar Cyanobacteria and their associated microbiome from non-axenic cultures
.
Microb Genom
.
2018
:
4
(
9
):
e000212
. .

Couvin
 
D
,
Bernheim
 
A
,
Toffano-Nioche
 
C
,
Touchon
 
M
,
Michalik
 
J
,
Néron
 
B
,
Rocha
 
EPC
,
Vergnaud
 
G
,
Gautheret
 
D
,
Pourcel
 
C
.
CRISPRCasFinder, an update of CRISRFinder, includes a portable version, enhanced performance and integrates search for Cas proteins
.
Nucleic Acids Res
.
2018
:
46
(
W1
):
W246
W251
. .

Cullen
 
A
,
D’Agostino
 
PM
,
Mazmouz
 
R
,
Pickford
 
R
,
Wood
 
S
,
Neilan
 
BA
.
Insertions within the saxitoxin biosynthetic cluster results in differential toxin profiles
.
ACS Chem Biol
.
2018
:
13
(
11
):
3107
3114
. .

Delcher
 
AL
,
Bratke
 
KA
,
Powers
 
EC
,
Salzberg
 
SL
.
Identifying bacterial genes and endosymbiont
.
Bioinformatics
.
2007
:
23
(
6
):
673
679
. .

Di Salvo
 
M
,
Puccio
 
S
,
Peano
 
C
,
Lacour
 
S
,
Alifano
 
P
.
RhoTermPredict: an algorithm for predicting Rho-dependent transcription terminators based on Escherichia coli, Bacillus subtilis and Salmonella enterica databases
.
BMC Bioinformatics
.
2019
:
20
(
1
):
117
. .

Dittmann
 
E
,
Fewer
 
DP
,
Neilan
 
BA
.
Cyanobacterial toxins: biosynthetic routes and evolutionary roots
.
FEMS Microbiol Rev
.
2013
:
37
(
1
):
23
43
. .

Emms
 
DM
,
Kelly
 
S
.
OrthoFinder: phylogenetic orthology inference for comparative genomics
.
Genome Biol
.
2019
:
20
(
1
):
238
. .

Gilchrist
 
CL
,
Booth
 
TJ
,
van Wersch
 
B
,
van Grieken
 
L
,
Medema
 
MH
,
Chooi
 
Y-H
.
Cblaster: a remote search tool for rapid identification and visualization of homologous gene clusters
.
Bioinform Adv
.
2021
:
1
(
1
):
vbab016
. .

Gilchrist
 
CL
,
Chooi
 
YH
.
Clinker & clustermap.js: automatic generation of gene cluster comparison figures
.
Bioinformatics
.
2021
:
37
(
16
):
2473
2475
. .

Gogarten
 
JP
,
Doolittle
 
WF
,
Lawrence
 
JG
.
Prokaryotic evolution in light of gene transfer
.
Mol Biol Evol
.
2002
:
19
(
12
):
2226
2238
. .

Grattan
 
LM
,
Holobaugh
 
S
,
Morris
 
JG
Jr
.
Harmful algal blooms and public health
.
Harmful Algae
.
2016
:
57
(
Part B
):
2
8
. .

Hyatt
 
D
,
Chen
 
G-L
,
Locascio
 
PF
,
Land
 
ML
,
Larimer
 
FW
,
Hauser
 
LJ
.
Prodigal: prokaryotic gene recognition and translation initiation site identification
.
BMC Bioinformatics
.
2010
:
8
(
1
):
119
. .

Kalyaanamoorthy
 
S
,
Minh
 
BQ
,
Wong
 
TKF
,
von Haeseler
 
A
,
Jermiin
 
LS
.
ModelFinder: fast model selection for accurate phylogenetic estimates
.
Nat Methods
.
2017
:
14
(
6
):
587
589
. .

Kang
 
DD
,
Li
 
F
,
Kirton
 
E
,
Thomas
 
A
,
Egan
 
R
,
An
 
H
,
Wang
 
Z
.
MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies
.
PeerJ
.
2019
:
7
:
e7359
. .

Katoh
 
K
,
Standley
 
DM
.
MAFFT multiple sequence alignment software version 7: improvements in performance and usability
.
Mol Biol Evol
.
2013
:
30
(
4
):
772
780
. .

Kellmann
 
R
,
Michali
 
TK
,
Neilan
 
BA
.
Identification of a saxitoxin biosynthesis gene with a history of frequent horizontal gene transfers
.
J Mol Evol
.
2008
:
67
(
5
):
526
538
. .

Kingsford
 
CL
,
Ayanbule
 
K
,
Salzberg
 
SL
.
Rapid, accurate, computational discovery of Rho-independent transcription terminators illuminates their relationship to DNA uptake
.
Genome Biol
.
2007
:
8
(
2
):
R22
. .

Koren
 
S
,
Phillippy
 
AM
.
One chromosome, one contig: complete microbial genomes from long-read sequencing and assembly
.
Curr Opin Microbiol
.
2015
:
23
:
110
120
. .

Kosakovsky Pond
 
SL
,
Poon
 
AFY
,
Velazquez
 
R
,
Weaver
 
S
,
Hepler
 
NL
,
Murrell
 
B
,
Shank
 
SD
,
Magalis
 
BR
,
Bouvier
 
D
,
Nekrutenko
 
A
, et al.  
Hyphy 2.5—a customizable platform for evolutionary hypothesis testing using phylogenies
.
Mol Biol Evol
.
2020
:
37
(
1
):
295
299
. .

Kosakovsky Pond
 
SL
,
Posada
 
D
,
Gravenor
 
MB
,
Woelk
 
CH
,
Frost
 
SDW
.
GARD: a genetic algorithm for recombination detection
.
Bioinformatics
.
2006
:
22
(
24
):
3096
3098
. .

Laslett
 
D
,
Canback
 
B
.
ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences
.
Nucleic Acids Res
.
2004
:
32
(
1
):
11
16
. .

Li
 
H
.
Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences
.
Bioinformatics
.
2016
:
32
(
14
):
2103
2110
. .

Li
 
H
.
Minimap2: pairwise alignment for nucleotide sequences
.
Bioinformatics
.
2018
:
34
(
18
):
3094
3100
. .

Marter
 
P
,
Huang
 
S
,
Brinkmann
 
H
,
Pradella
 
S
,
Jarek
 
M
,
Rohde
 
M
,
Bunk
 
B
,
Petersen
 
J
.
Filling the gaps in the cyanobacterial tree of life-metagenome analysis of Stigonema ocellatum DSM 106950, Chlorogloea purpurea SAG 13.99 and Gomphosphaeria aponina DSM 107014
.
Genes (Basel)
.
2021
:
12
(
3
):
389
. .

Medema
 
MH
,
Blin
 
K
,
Cimermancic
 
P
,
de Jager
 
V
,
Zakrzewski
 
P
,
Fischbach
 
MA
,
Weber
 
T
,
Takano
 
E
,
Breitling
 
R
.
antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences
.
Nucleic Acids Res
.
2011
:
39
(
suppl_2
):
W339
W346
. .

Minh
 
BQ
,
Schmidt
 
HA
,
Chernomor
 
O
,
Schrempf
 
D
,
Woodhams
 
MD
,
von Haeseler
 
A
,
Lanfear
 
R
.
IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era
.
Mol Biol Evol
.
2020
:
37
(
5
):
1530
1534
. .

Morel
 
B
,
Kozlov
 
AM
,
Stamatakis
 
A
,
Szöllősi
 
GJ
.
GeneRax: a tool for species-tree-aware maximum likelihood-based gene family tree inference under gene duplication, transfer, and loss
.
Mol Biol Evol
.
2020
:
37
(
9
):
2763
2774
. .

Moustafa
 
A
,
Loram
 
JE
,
Hackett
 
JD
,
Anderson
 
DM
,
Plumley
 
FG
,
Bhattacharya
 
D
.
Origin of saxitoxin biosynthetic genes in cyanobacteria
.
PLoS One
.
2009
:
4
(
6
):
e5758
. .

Murray
 
SA
,
Mihali
 
TK
,
Neilan
 
BA
.
Extraordinary conservation, gene loss, and positive selection in the evolution of an ancient neurotoxin
.
Mol Biol Evol
.
2011
:
28
(
3
):
1173
1182
. .

Murrell
 
B
,
Weaver
 
S
,
Smith
 
MD
,
Wertheim
 
JO
,
Murrell
 
S
,
Aylward
 
A
,
Eren
 
K
,
Pollner
 
T
,
Martin
 
DP
,
Smith
 
DM
, et al.  
Gene-wide identification of episodic selection
.
Mol Biol Evol
.
2015
:
32
(
5
):
1365
1371
. .

Nawrocki
 
EP
,
Kolbe
 
DL
,
Eddy
 
SR
.
Infernal 1.1: 100-fold faster RNA homology searches
.
Bioinformatics
.
2013
:
29
(
22
):
2933
2935
. .

Nogushi
 
H
,
Taniguchi
 
T
,
Itoh
 
T
.
MetaGeneAnnotator: detecting species-specific patterns of ribosomal binding site for precise gene prediction in anonymous prokaryotic and phage genomes
.
DNA Res
.
2008
:
15
(
6
):
387
396
. .

Ondov
 
BD
,
Treangen
 
TJ
,
Melsted
 
P
,
Mallonee
 
AB
,
Bergman
 
NH
,
Koren
 
S
,
Phillippy
 
AM
.
Mash: fast genome and metagenome distance estimation using MinHash
.
Genome Biol
.
2016
:
17
(
1
):
132
. .

Parks
 
DH
,
Imelfort
 
M
,
Skennerton
 
CT
,
Hugenholtz
 
P
,
Tyson
 
GW
.
Checkm: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes
.
Genome Res
.
2015
:
25
(
7
):
1043
1055
. .

Petersen
 
TN
,
Brunak
 
S
,
von Heijne
 
G
,
Nielsen
 
H
.
Signalp 4.0: discriminating signal peptides from transmembrane regions
.
Nat Methods
.
2011
:
8
(
10
):
785
786
. .

Rippka
 
R
,
Deruelles
 
J
,
Rippka
 
R
,
Herdman
 
M
,
Waterbury
 
JB
.
Generic assignments, strain histories and properties of pure cultures of Cyanobacteria
.
J Gen Microbiol
.
1979
:
111
:
1
61
. .

Seemann
 
T
.
Prokka: rapid prokaryotic genome annotation
.
Bioinformatics
.
2014
:
30
(
14
):
2068
2069
. .

Sendall
 
BC
,
McGregor
 
GB
.
Cryptic diversity within the Scytonema complex: characterization of the paralytic shellfish toxin producer Heterosyctonema crispum, and the establishment of the family Heteroscytonemataceae (Cyanobacteria/Nostocales)
.
Harmful Algae
.
2018
:
80
:
158
170
. .

Tatusova
 
T
,
DiCuccio
 
M
,
Badretdin
 
A
,
Chetvernin
 
V
,
Nawrocki
 
EP
,
Zaslavsky
 
L
,
Lomsadze
 
A
,
Pruitt
 
KD
,
Borodovsky
 
M
,
Ostell
 
J
.
NCBI prokaryotic genome annotation pipeline
.
Nucleic Acids Res
.
2016
:
44
(
14
):
6614
6624
. .

Tumescheit
 
C
,
Firth
 
AE
,
Brown
 
K
.
CIAlign: a highly customisable command line tool to clean, interpret and visualise multiple sequence alignments
.
PeerJ
.
2022
:
10
:
e12983
. .

Uritskiy
 
GV
,
DiRuggiero
 
J
,
Taylor
 
J
.
MetaWRAP-a flexible pipeline for genome-resolved metagenomic data analysis
.
Microbiome
.
2018
:
6
(
1
):
158
. .

Wick
 
RR
,
Judd
 
LM
,
Gorrie
 
CL
,
Holt
 
KE
.
Unicycler: resolving bacterial genome assemblies from short and long sequencing reads
.
PLoS Comput Biol
.
2017
:
13
(
6
):
e1005595
. .

Wood
 
DE
,
Lu
 
J
,
Langmead
 
B
.
Improved metagenomic analysis with Kraken 2
.
Genome Biol
.
2019
:
20
(
1
):
257
. .

Wu
 
Y-W
,
Simmons
 
BA
,
Winger
 
SW
.
MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets
.
Bioinformatics
.
2015
:
32
(
4
):
605
607
. .

Zhaxybayeva
 
O
,
Gogarten
 
JP
,
Charlebois
 
RL
,
Doolittle
 
WF
,
Papke
 
RT
.
Phylogenetic analyses of cyanobacterial genomes: quantification of horizontal gene transfer events
.
Genome Res
.
2006
:
16
(
9
):
1099
1108
. .

Author notes

David B. Stern and R. Taylor Raborn contributed equally.

This work is written by (a) US Government employee(s) and is in the public domain in the US.

Supplementary data