Wei Du, Gaoyang Li, Nicholas Ho, Landon Jenkins, Drew Hockaday, Jiankang Tan, Huansheng Cao, CyanoPATH: a knowledgebase of genome-scale functional repertoire for toxic cyanobacterial blooms, Briefings in Bioinformatics, Volume 22, Issue 4, July 2021, bbaa375, https://doi.org/10.1093/bib/bbaa375
Abstract
CyanoPATH is a database that curates and analyzes the common genomic functional repertoire for cyanobacterial harmful algal blooms (CyanoHABs) in eutrophic waters. Based on the literature of empirical studies and genome/protein databases, it summarizes four types of information: common biological functions (pathways) driving CyanoHABs, customized pathway maps, classification of blooming type based on databases, and the genomes of cyanobacteria. A total of 19 pathways are reconstructed, which are involved in the utilization of macronutrients (e.g. carbon, nitrogen, phosphorus and sulfur), micronutrients (e.g. zinc, magnesium and iron) and other resources (e.g. light and vitamins) and in stress resistance (e.g. lead and copper). These pathways, comprising both transport and biochemical reactions, are reconstructed with proteins from NCBI and reactions from KEGG and visualized with self-created transport/reaction maps. The pathways are hierarchical and consist of subpathways, protein/enzyme complexes and constituent proteins. New cyanobacterial genomes can be annotated and visualized for these pathways and compared with existing species. This genomic functional repertoire is useful in analyzing aquatic metagenomes and metatranscriptomes in CyanoHAB research. Most importantly, it establishes a link between genome and ecology. All the reference proteins, pathways, maps and genomes are free to download at http://www.csbg-jlu.info/CyanoPATH.
Introduction
CyanoHABs (cyanobacterial harmful algal blooms) have been widespread toxic aquatic hazards since the 1870s [1–3]. This century-long dominance of dozens of species of cyanobacteria has now expanded to a global scale owing to increasing water eutrophication [4, 5]. CyanoHABs have been shown to degrade water quality and cause toxic effects on human health, leading to substantial economic loss [6–12]. To make matters worse, CyanoHABs continue to intensify under current climate warming irrespective of mitigation efforts [13–16], as cyanobacteria grow better at high temperatures [17–19].

Figure 1. Data collection and curation, pathway reconstruction and visualization, and construction of CyanoPATH and its two applications.
The substantial global socioeconomic costs and environmental impacts caused by CyanoHABs call for effective mitigation, which in turn makes a complete understanding of the cause of CyanoHABs a research priority [20]. CyanoHABs, like any other form of ecological dominance, are generally viewed as a consequence of the synergistic interactions between elevated nutrients/resources and the superior ecophysiology of blooming cyanobacteria. The driving role of nutrients has been well established with a scientific consensus: CyanoHABs are primarily driven by elevated nutrient loading from human activities [8, 21], occurring within a short period of time, particularly in waters of long residence time [22, 23]. Additionally, one underappreciated capability of CyanoHAB species is the utilization of small-molecule organics, such as amino acids and simple sugars [24–27].
In contrast to the well-established roles of elevated nutrients, the ‘superior’ biological functions that assimilate the elevated nutrients and convert them to biomass for cell growth, and in turn bloom formation, have rarely been identified. There are at least four obstacles to identifying them: (1) the existing individual pathways are empirically inferred in different species and thus may differ among blooming species; (2) it is notoriously difficult and time-consuming to create mutants in cyanobacteria to genetically verify the ecologically inferred pathways [28]; (3) no control species (i.e. non-blooming cyanobacteria) have been studied against blooming cyanobacteria to confirm the uniqueness of the inferred functions; and (4) the many species of blooming cyanobacteria differ in morphology, genomic features and behavior, which makes the common ecophysiology even more complex [29].
Our goal is, by tackling some of the aforementioned difficulties, to systematically curate all the ecologically inferred genomic functions into one comprehensive set of ecophysiology. This set of biological functions consists of two types of metabolic pathways: nutrient utilization and stress resistance. Nutrients include macronutrients (e.g. carbon, nitrogen, phosphorus and sulfur) [30–37] and trace elements [21, 38], distinguished by the amounts needed for cyanobacterial growth. Stress resistance pathways include those against elevated stressors to cyanobacteria in eutrophic waters, such as heavy metals and antibiotics [39–43], predation (through production of toxins) [8] and ultraviolet light [44, 45]. Furthermore, we classified the cyanobacteria as blooming or nonblooming based on the literature [8, 21, 46] and by searching databases [47]. Finally, we stored and shared this functional genomic repertoire in a database, CyanoPATH. CyanoPATH can be used in four ways: (1) it provides a whole set of biological functions, instead of individual pathways, for researchers to refer to in CyanoHAB research; (2) it can serve as a tool for genome annotation to assess a new species' potential to form blooms and also for analyzing metagenomes and metatranscriptomes; (3) it allows users to compare the different capabilities among blooming species and between blooming and nonblooming species; (4) it is the first database showcasing direct links between genome and ecology, and thus provides a novel perspective for molecular ecology research on a genome scale.
Database implementation
Overview of CyanoPATH
The entire database was built using the steps outlined in Figure 1. In general, protein sequences and reactions are collected and curated after they are identified through searches of the ecological and molecular biology literature. Pathways are then reconstructed and visualized in custom pathway maps. A pathway prediction for new cyanobacterial species is provided as a third module. These steps are detailed below.
Table 1. The classification of central and accessory pathways and their protein reference sources
| Pathway | Subpathway | Role in CyanoHABs | Source‡ | Literature |
|---|---|---|---|---|
| Photosynthesis | | Harvest light and fix carbon for growth | | |
| | Phycobilisome | Light harvesting complex | SEED | [1] |
| | CO2 concentrating mechanism | CO2 concentrating for carbon fixation | SEED, KEGG | [1, 2] |
| | Photosystem II | Capturing light, splitting water and producing oxygen | SEED, KEGG | [1, 2] |
| | Photosystem I | Cross-membrane electron transfer for energy production | SEED, KEGG | [1, 2] |
| Nutrient assimilation | Nitrogen | Assimilate nitrogen for growth | | |
| | Ammonia | Assimilable nitrogen | CB | [3] |
| | Nitrate/Nitrite | Nitrogen source for ammonia | CB | [4, 5] |
| | N2 fixation | Producing ammonia | CB | [6] |
| | Urea | Producing ammonia | CB | [7] |
| | Phosphorus | Assimilate phosphorus for growth | | |
| | Phosphate | Inorganic phosphorus | HB | [8] |
| | Glycerol-3-phosphate | Organic phosphorus | HB | [9] |
| | Glycerophosphoryl diesters | Organic phosphorus | HB | [9] |
| | Phosphonate | Organic phosphorus | HB | [9] |
| | Sulfur | Assimilate sulfur for growth | | |
| | Sulfate/thiosulfate | Inorganic sulfur | HB | [10–12] |
| | Sulfonate | Organic sulfur | HB | [10, 13] |
| | Organics | Assimilate small-molecule organic carbon or nitrogen for growth | | |
| | Simple sugars | Organic carbon | HB | [14, 15] |
| | N-acetyl glucosamine | Organic carbon and nitrogen | HB | [16, 17] |
| | Amino acids | Organic carbon and nitrogen | HB | [18–22] |
| | Trace metal | Assimilate cofactors of enzymes and metalloproteins for growth | | |
| | MnO₄²⁻ | Cofactors of enzymes and metalloproteins | HB | [23, 24] |
| | Cu²⁺ | Cofactors of enzymes and metalloproteins | HB | [25] |
| | Mn²⁺ | Cofactors of enzymes and metalloproteins | HB | [23] |
| | Mg²⁺ | Cofactors of enzymes and metalloproteins | HB | [26] |
| | Fe³⁺ | Cofactors of enzymes and metalloproteins | HB/CB [27] | [28, 29] |
| | Ni²⁺ | Cofactors of enzymes and metalloproteins | HB | [30] |
| | Vitamin | Coenzymes | | |
| | Thiamine | Coenzymes | HB | [31] |
| | Riboflavin | Coenzymes | HB | [32–34] |
| | Niacin | Coenzymes | HB | [35] |
| | Biotin | Coenzymes | HB | [35] |
| | Folate | Coenzymes | HB | [36, 37] |
| | Cobalamin | Coenzymes | HB | [38] |
| Stress resistance | Heavy metal | Photosynthesis; inhibit growth | | |
| | Cu⁺ | Inducing oxygenic species | HB | [39–41] |
| | CrO₄²⁻ | Mutagenic effects | HB | [39–41] |
| | Ni²⁺ | Inhibiting enzymes and causing oxidative stress | HB | [39–41] |
| | Zn²⁺ | Interfering with metal uptake | HB | [39–41] |
| | Zn²⁺/Co²⁺/Cd²⁺ | Interfering with metal storage | HB | [39–41] |
| | Multidrug | Inhibit growth | HB | [42–44] |
| | Antibiotics/Inhibitors | Interfering with key biological processes | HB | [42–44] |
| | UV radiation | DNA damage, photosynthesis damage | CB | [45] |
| | Toxin production | Against predation | CB | [46–51] |
| | Low temperature | Resist low temperatures | | |
| | (Poly-) unsaturated fatty acids | Increasing membrane fluidity | HB/CB | [52–54] |
| | Buoyancy regulation | Position cells in water column for better resource for growth | CB | [55] |
| | Osmoprotectant | Maintain osmotic homeostasis | HB/CB | [1, 2] |
| | Redox balance | Resist oxidative stress | HB/CB | [1, 2] |
‡CB: cyanobacteria; HB: heterotrophic bacteria; KEGG: Kyoto Encyclopedia of Genes and Genomes; SEED: the SEED genome annotation database; TCDB: Transporter Classification Database.
Data collection, processing and annotation
This database comprises five types of biological data: genome sequences of 178 cyanobacteria, reference sequences of the proteins in each pathway, reactions catalyzed by enzymes in the pathways, custom scalable vector graphics (SVG) maps of the pathways and the classification of blooming types of the cyanobacteria with whole genome sequences. The proteins/enzymes were identified from the literature listed in Table 1; their sequences were retrieved from NCBI GenBank and the reactions from the SEED genome annotation database (SEED) [48] and Kyoto Encyclopedia of Genes and Genomes (KEGG) [49]. The genome sequences were downloaded from the NCBI Genome Database in May 2019. We classified the species as blooming or non-blooming based on a curated classification [8, 21, 46, 47]. Species not listed in the original classification were assigned by querying their species/genus names in the Web of Science: if publications reporting bloom formation were found, the species was classified as blooming, and otherwise as non-blooming. More specifically, we used as the query the strain denomination first, then the species epithet (if the strain denomination yielded insufficient hits), or the genus name (in a few cases). The hits were then further separated into genomic/physiological or environmental studies using appropriate keywords. Lastly, the records were grouped as bloom-forming or nonbloom-forming [47].

Figure 2. An overview of the pathways associated with CyanoHABs supported by literature, which consist of two types: nutrient utilization (red labels in purple boxes) and stress resistance (red labels in purple boxes). (This picture is reproduced under the Creative Commons CC BY license for open access, which requires no permission.)

Figure 3. The main pages of CyanoPATH. A: the Genome page that stores 178 genomes with associated metrics; B: the Pathway summary page for each genome; C: the Nitrogen utilization pathway in Anabaena sp. 90; D: the major transporters in nitrate and nitrite utilization; E: the protein constituents of the nitrate ABC transporter; F: user uploading sequences for prediction; G: the predicted pathways.
Reconstruction of metabolic pathways
We first included four photosynthetic modules: photosystems I (PS-I) and II (PS-II), phycobilisome and CO2-concentrating mechanism. The reference proteins for these four modules were extracted from the SEED subsystems [48] and the KEGG database [49]. The rest of the pathways, which are involved in resource assimilation and stress resistance among bloomers, were manually curated from the literature, the SEED subsystems and KEGG databases, and the Transporter Classification Database (TCDB) [102] (Table 1).
For the proteins/enzymes, we retrieved their associated reactions (enzymatic or transport reactions) from BioCyc [103] and KEGG and then assembled the reaction charts into pathways. For pathways that have not been delineated, or whose proteins have not been functionally characterized, the proteins/enzymes were grouped together without reactions, substrates or products. All the pathways were drawn as maps in the SVG format.
Homolog search in the cyanobacterial genomes
All protein sequences of each genome were searched against the curated reference proteins of each pathway using BLASTP [104]. Cyanobacteria-derived reference proteins were used where available; otherwise, proteins derived from heterotrophic bacteria were used. Significant hits in the BLASTP output were retained if they had an E-value < 10⁻⁷ and length coverage > 65% [29].
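The hit filter described above can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: it assumes BLASTP was run with the tabular output `-outfmt "6 qseqid sseqid length slen evalue"`, and it assumes that "length coverage" means alignment length divided by the length of the reference (subject) protein, which the text does not specify. The names `Hit`, `parse_hits` and `filter_hits` are hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple

E_VALUE_MAX = 1e-7    # E-value threshold stated in the text
MIN_COVERAGE = 0.65   # length-coverage threshold stated in the text


class Hit(NamedTuple):
    query: str        # protein from the genome being annotated
    subject: str      # curated reference protein
    evalue: float
    coverage: float   # ASSUMPTION: alignment length / reference length


def parse_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse tab-separated rows: qseqid, sseqid, length, slen, evalue."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        qseqid, sseqid, length, slen, evalue = line.split("\t")
        yield Hit(qseqid, sseqid, float(evalue), int(length) / int(slen))


def filter_hits(hits: Iterable[Hit]) -> List[Hit]:
    """Keep hits that pass both the E-value and the coverage thresholds."""
    return [h for h in hits
            if h.evalue < E_VALUE_MAX and h.coverage > MIN_COVERAGE]
```

If coverage is instead meant relative to the query protein, only the denominator in `parse_hits` would change (`qlen` in place of `slen`).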
Database construction
CyanoPATH was constructed on a CentOS Linux server (version 7.6). The web services were built using Apache (version 2.4.18). The presentation and logic layers were implemented using Web 2.0 technologies (HTML5, CSS3 and JavaScript with the jQuery library) and the PHP server-side scripting language. All data were stored in an optimized MySQL relational database. The keyword-based search engine was implemented with the Sphinx Open Source Search Server (http://sphinxsearch.com) and integrated into CyanoPATH using the iframe (inline frame) HTML tag.
Web interface and functions
All the pathways associated with bloom formation are summarized in Figure 2; they are involved in either nutrient utilization or stress resistance. In CyanoPATH, these pathways, the genomes and the prediction function are organized in an easy-to-use interface (Figure 3). The 178 cyanobacterial genomes are placed under the ‘Genome’ button, each with genome metrics such as genome size, GC content and the number of proteins, along with the taxonomy ID, a link to the NCBI genome assembly and the blooming classification.
The genomes are connected to the pathways under the ‘Pathway’ button. Each pathway has its own customized map. The pathway components present in a genome are colored red when that genome is selected from the dropdown menu (Figure 3C). The pathways are further divided into subpathways, protein complexes (e.g. ABC transporters NrtABCD for nitrate) and single protein constituents (PstA, PstB, NrtC or NrtD), which are connected hierarchically from pathway to single proteins (Figure 3C–E).
CyanoPATH also has a pathway prediction tool, under the ‘Prediction’ button. Its role is to search the proteins in a genome and annotate them into the pathways. For this function, the input should be protein sequences in FASTA format, pasted into the text box or uploaded as a file (Figure 3F). When the run is completed, the results are displayed as a flat table and the system also sends a notification to the email address provided by the user. Each prediction job is assigned a job ID, which users can use to download the results. The predicted pathways include the names of the input proteins, pathways, subpathways, components and the BLASTP E-value (Figure 3G).
Finally, instructions on how to use this database are provided under the ‘Help’ button. To enable users to perform pathway analyses locally, we make available the reference sequences and the pathway maps in SVG and XML formats as downloadable files, which can be found on the ‘Download’ page without login or registration. All the data for CyanoPATH are publicly available at http://www.csbg-jlu.info/CyanoPATH.
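One way to picture the pathway-to-protein hierarchy is as a small tree. The following Python sketch is illustrative only and does not reflect how CyanoPATH stores its data: the `Node` class and the `highlighted` function are hypothetical, the subunit names NrtA and NrtB are inferred from the complex name NrtABCD, and the rule that a parent is highlighted when at least one of its constituent proteins is found in the genome is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    name: str
    level: str  # "pathway", "subpathway", "complex" or "protein"
    children: List["Node"] = field(default_factory=list)


def highlighted(node: Node, present: Set[str]) -> Set[str]:
    """Return names of nodes to color for a genome.

    ASSUMPTION: a protein is colored if it is in `present`; any higher
    level is colored if at least one of its descendants is colored.
    """
    if node.level == "protein":
        return {node.name} if node.name in present else set()
    found: Set[str] = set()
    for child in node.children:
        found |= highlighted(child, present)
    if found:
        found.add(node.name)
    return found


# Example hierarchy for nitrate uptake, following the levels in the text.
nitrogen = Node("Nitrogen", "pathway", [
    Node("Nitrate/Nitrite", "subpathway", [
        Node("NrtABCD", "complex", [
            Node("NrtA", "protein"), Node("NrtB", "protein"),
            Node("NrtC", "protein"), Node("NrtD", "protein"),
        ]),
    ]),
])
```

Calling `highlighted(nitrogen, {"NrtA", "NrtC"})` would return the two proteins plus every level above them; a stricter rule (for example, requiring all subunits of a complex) would only change the condition inside `highlighted`.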
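Before submitting sequences to a prediction tool of this kind, it can help to check that the input is well-formed FASTA. The sketch below is a minimal, hypothetical parser (the function name `parse_fasta` and its error rules are assumptions, not part of CyanoPATH): it uses the first whitespace-delimited token of each header as the protein name and joins multi-line sequences.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {protein name: sequence}.

    Raises ValueError for sequence data before the first header,
    for empty headers and for duplicated protein names.
    """
    records: Dict[str, List[str]] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            if not tokens:
                raise ValueError("empty FASTA header")
            name = tokens[0]
            if name in records:
                raise ValueError("duplicate sequence name: " + name)
            records[name] = []
        else:
            if name is None:
                raise ValueError("sequence data before first header")
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}
```

For example, `parse_fasta(">p1 nitrate transporter\nMKT\nLLV\n")` yields a single record named `p1` with the two sequence lines joined.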
Comparison between blooming and nonblooming species
As the genomes in CyanoPATH are classified as either blooming or nonblooming, one can compare the pathways between any two genomes, with results obtained from CyanoPATH. Here we show one example of such comparisons. Using a standardized index calculated from the number of proteins in each pathway, each of the CyanoHAB-associated pathways and the control (core metabolism) pathways is compared between blooming and nonblooming species. The results show that the CyanoHAB-associated pathways are more enriched in blooming than in nonblooming species, while the core pathways show no differences between the two groups (Figure 4).
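A two-group comparison of this kind can be sketched with a Wilcoxon rank-sum (Mann-Whitney U) test. The pure-Python version below is an illustration under stated assumptions, not the authors' analysis: it takes two lists of per-genome index values (how the standardized index is computed is not specified in the text), uses average ranks for ties, and returns a two-sided P value from the normal approximation without continuity correction, which is rough for small samples. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic for `a`, two-sided P value, normal approximation)."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    combined = list(a) + list(b)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: subtract sum(t^3 - t) over groups of tied values.
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u1, 1.0  # all values identical: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

In practice a tested implementation such as `scipy.stats.mannwhitneyu` would be preferable; the sketch only makes the ranking and tie handling explicit.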

Figure 4. Comparison of the CyanoHAB-associated pathways and core metabolic pathways in terms of the number of proteins in each pathway between blooming and non-blooming strains. B: blooming; nB: non-blooming. P values of Wilcoxon tests are displayed on top of the panels.
Generality of pathway-set-based functional genomics
CyanoPATH showcases that the link between genome and complex phenome is not one-to-one, as in ‘one gene, one phenotype’, but rather complex. Supported by extensive empirical studies, CyanoHABs provide a good case of this. However, the approach is not limited to this case. Cancer biology (data generated by the TCGA Research Network: https://www.cancer.gov/tcga) and pathogen research [105] have also accumulated extensive data that can be explored for causal relationships between genome and phenome through a set of pathways.
Conclusions
The database CyanoPATH, the long-awaited and much-needed curation of ecologically inferred biological functions (i.e. common ecophysiology) of CyanoHABs on a genome scale, represents a major step forward in understanding and controlling CyanoHABs. It is a systematic curation of genomic functional repertoire (genomic ecophysiology) from individual empirical studies and from different species. It also provides a classification of the blooming types of all existing cyanobacteria with whole genome sequences and a means to obtain such a classification through literature search. Furthermore, CyanoPATH provides bloom ecologists with a common set of biological functions to refer to and further assess. Lastly, CyanoPATH establishes, from a new angle, a link between genomics and ecology in general.
Author contributions
HC conceived the research and designed the project with WD and GL. All authors carried out the research, with WD, GL and NH on the technical side, and LJ, DH, JT and HC on the data and materials preparation side. All authors wrote and reviewed the manuscript.
CyanoPATH is a knowledgebase of empirically inferred metabolic pathways driving cyanobacterial blooms.
CyanoPATH can annotate whole genomes of cyanobacteria for the presence of bloom-driving pathways.
CyanoPATH provides a case of the link between genome and phenome, i.e. cyanobacterial blooms in natural eutrophic waters.
Data availability
CyanoPATH and associated data are publicly available at http://www.csbg-jlu.info/CyanoPATH.
Conflict of interest
The authors declare no conflict of interest.
Funding
This work was supported by the National Natural Science Foundation of China (61872418) and the Natural Science Foundation of Jilin Province (20180101050JC).
Wei Du is an associate professor in computer science and studies bioinformatics in biomedical science and genomics at Jilin University, China.
Gaoyang Li is a postdoctoral researcher at Tongji University and his research lies in biomedical science and systems biology.
Nicholas Ho is an undergraduate research intern who majors in computer science and mathematics and does research at the Biodesign Center for Fundamental and Applied Microbiomics at Arizona State University.
Landon Jenkins is an undergraduate research intern who majors in genetics and does research at the Biodesign Center for Fundamental and Applied Microbiomics at Arizona State University.
Drew Hockaday is an undergraduate research intern who majors in microbiology and does research at the Biodesign Center for Fundamental and Applied Microbiomics at Arizona State University.
Jiankang Tan is a principal investigator in environmental health at Lishui Ecology and Environment Bureau.
Huansheng Cao is an assistant professor at Duke Kunshan University, who studies environmental science, bioinformatics and systems biology.
References
Author notes
Wei Du, Gaoyang Li and Nicholas Ho contributed equally to this study.