Abstract

Plants are important sources of food, and plant products are essential to modern human life. Plants are increasingly gaining importance as sources of drugs and fuel, as bioremediation tools and as tools for recombinant technology. Given these applications, database infrastructure for plant model systems deserves much more attention. The study of plant biological pathways, of the interconnections between them and of plant systems biology as a whole has generally lagged behind human systems biology. In this article we review plant pathway databases and the resources that are currently available. We lay out trends and challenges in the ongoing efforts to integrate plant pathway databases, and the applications of database integration. We also discuss how progress in non-plant communities can serve as an example for improving the plant pathway database landscape and thereby allow quantitative modeling of plant biosystems. We propose Good Database Practice as a possible model for collaboration and to ease future integration efforts.

INTRODUCTION

A biological pathway is a programmed sequence of molecular events in a cell. This chain of events executes a particular cellular function or brings about a specific biological effect. Knowledge of an organism’s pathways is essential to understand a biological system at different levels, from simple metabolism to complex regulatory reactions. Many pathways are complex and hierarchical, and are themselves interconnected to form, participate in or regulate a network of events. Over the last couple of decades, there has been an exponential increase in information on these pathways, their components and their functions [1]. This stems from advances in genomics and proteomics and from high-throughput technologies such as microarrays and two-hybrid screens. For numerous species, this has increased our knowledge of normal pathways as well as of rogue/aberrant pathways that lead to a variety of diseases. Examples include pathways that lead to cancer [2] or to aberrant leaf development in plants [3]. The production of large amounts of data necessitates the creation of pathway databases and repositories, where information about the pathways, along with their molecular components and reactions, is stored. These data sets often become data sources in their own right and are shared with the public, which partly explains the large number of databases that exist today [1].

Simultaneously, technological advances that enable access to and discovery of novel pathway information have led to the creation of many more pathway databases [1] targeting different organisms, processes and mechanisms. The availability of such vast amounts of information in an ordered format has prompted new questions. Ideker and colleagues [4] have raised questions pertinent to evolutionary and comparative biology, e.g. ‘considering that the protein sequences and structures are conserved, could the protein-interaction networks be conserved as well? Is there a minimal set of pathways that is required by all living organisms? Can the evolutionary distance be measured at the network connectivity level rather than at the DNA or protein level?’ Answers to these and other questions will increase our understanding of living systems, which in turn may raise new questions, at other levels, that are currently unimaginable. Information aggregated from different pathway databases is often more useful than information from individual databases, and integrating it can reveal novel information about a system.

Information from pathway databases has been used for different purposes. Information analysis and data mining hold the potential to discover orthologous/analogous pathways and pathway components in other related organisms [5]. For example, organisms that are difficult to cultivate in vitro, and are therefore less amenable to laboratory studies, could be examined in silico through a study of orthologs. Iterative expansion of pathway data can be used to build models of biological mechanisms based on hypotheses derived from these initial data; see Bumgarner and Yeung [6] for a recent review. Models can (and should) in turn generate experimentally verifiable predictions.
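As a minimal illustration of the ortholog-based approach, the sketch below projects pathway membership from a well-studied reference organism onto a less tractable target organism. It assumes a precomputed one-to-one ortholog table; all gene and pathway identifiers are hypothetical placeholders, and a real analysis would also have to handle one-to-many orthology and ortholog confidence scores.

```python
# Hypothetical reference data: pathway -> member genes in a well-studied organism.
REFERENCE_PATHWAYS = {
    "pathway_A": {"refGene1", "refGene2", "refGene3"},
    "pathway_B": {"refGene4", "refGene5"},
}

# Hypothetical one-to-one ortholog table: reference gene -> target-organism gene.
ORTHOLOGS = {"refGene1": "tgtGene1", "refGene2": "tgtGene2", "refGene4": "tgtGene4"}


def transfer_pathways(ref_pathways, orthologs):
    """Project reference pathways onto a target organism via orthologs.

    Returns pathway -> (set of target genes, fraction of reference members
    that have an ortholog in the target).
    """
    projected = {}
    for name, members in ref_pathways.items():
        hits = [gene for gene in members if gene in orthologs]
        target_genes = {orthologs[gene] for gene in hits}
        projected[name] = (target_genes, len(hits) / len(members))
    return projected


if __name__ == "__main__":
    result = transfer_pathways(REFERENCE_PATHWAYS, ORTHOLOGS)
    for name, (genes, coverage) in sorted(result.items()):
        print(name, sorted(genes), f"coverage={coverage:.2f}")
```

The coverage fraction is only a crude signal: a low value may reflect incomplete annotation of the target genome rather than a missing pathway.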

Pathway database analysis can be used to find patterns in pathways that are related to a disease [7] and to aid the identification of new drug targets [8]. Another idea is targeted drug discovery by screening a complete pathway rather than a single pathway component [9]. Pathway analysis can also be used to identify molecular switches that lead to disease and to turn them off efficiently without affecting the rest of the system. A recent study on riboswitches illustrates how components of a pathway can be re-engineered to control the expression of multiple genes [10].
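One widely used way to relate a disease-associated gene list to pathways is over-representation analysis. The sketch below computes a one-sided hypergeometric p-value for a single pathway. It is a generic statistical illustration, not the specific method of refs [7-9], and the gene counts in the usage lines are hypothetical.

```python
from math import comb


def enrichment_p_value(total_genes, pathway_genes, selected_genes, overlap):
    """One-sided hypergeometric test: P(X >= overlap).

    total_genes     N: genes in the background (e.g. all annotated genes)
    pathway_genes   K: background genes that belong to the pathway
    selected_genes  n: genes in the list of interest (e.g. disease-associated)
    overlap         k: selected genes that fall in the pathway
    """
    denominator = comb(total_genes, selected_genes)
    upper = min(pathway_genes, selected_genes)
    # Sum the probabilities of observing k, k+1, ..., min(K, n) pathway genes.
    numerator = sum(
        comb(pathway_genes, i) * comb(total_genes - pathway_genes, selected_genes - i)
        for i in range(overlap, upper + 1)
    )
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 background genes, a 50-gene pathway,
    # 200 disease-associated genes, 8 of which fall in the pathway.
    print(enrichment_p_value(20000, 50, 200, 8))
```

When many pathways are tested at once, the resulting p-values need to be corrected for multiple testing.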

Compared with the exponential increase in human/animal pathway databases, development of plant pathway databases has been modest and fewer applications have resulted; plant pathway databases have remained relatively under-utilized. This gap is all the more concerning given the importance of plants as sources of food, fiber and fuel. Examples from non-plant resources and their applications can serve as inspiration for plant scientists who wish to control pathways, for instance to produce crops with longer shelf life or enhanced immunity to plant pathogens.

In this review, we provide an overview of existing plant pathway databases, examine current progress and discuss how the information contained in the databases has been used in the past and can be used in the future. We use examples from existing plant pathway databases to showcase the potential of database integration. Non-plant integration applications are discussed to suggest future potential. Finally, we discuss how existing information can be further enriched, organized and used for practical applications. We also highlight the acute need for robust, long-term and user-friendly interactive databases.

The pathway database landscape

Pathguide [1], an online pathway resource meta-database, provides an overview of more than 300 biological pathway resources that have been developed to date. These include pathway databases, tools for data analysis, visualization and data extrapolation, and other (peripheral) databases that can be linked with pathway databases to provide additional information. Some databases are specific to a particular organism, e.g. AraCyc [11] deals with the metabolic pathways of Arabidopsis thaliana. Some pathway databases are specific to a certain disorder or disease, e.g. the Human Cancer Protein Interaction Network (HCPIN) [12]; others contain information about a certain system in an organism, e.g. InnateDB [13], a repository for pathways involved in the innate immune system of humans and mice.

Plant pathway databases are fewer in number than human pathway databases (Figure 1) and much less diverse. Although awareness of the importance of plants as food crops is increasing, only limited resources appear to have been devoted to uncovering and understanding plant pathways. A comparison of the number of genomes sequenced to date for mammals and higher plants (Figure 2) shows that plants receive less attention from the sequencing community than mammals. The absolute numbers differ between the databases (some sites are kept more current than others), but the trend remains the same. Many biologically, medically and economically important plants differ in their physiology, and secondary metabolism is important from a pharmacological point of view. Therefore, many more genomes need to be sequenced, proteomes studied and pathways uncovered for the optimal utilization of plants. While lower numbers of sequenced genomes do not fully explain the lack of pathway databases, they certainly contribute to it.

Figure 1: Pathway resources with plants and humans annotated as major organisms, out of the 328 resources available in Pathguide. Inclusive: databases containing several other major organisms apart from plants or humans; dedicated: databases dedicated to plants or humans; other: databases for other organisms, databases for numerous organisms that may also include human and plant information, and pathway tools. Numbers indicate the actual number of resources available in each category in Pathguide.

Figure 2: A comparison of genomes sequenced for mammals and higher plants, using data from the NCBI Genome Database (http://www.ncbi.nlm.nih.gov/genomes/static/gpstat.html), the Genome Pages at EBI (http://www.ebi.ac.uk/genomes/eukaryota.html) and the GOLD database (http://www.genomesonline.org/cgi-bin/GOLD/bin/gold.cgi?page_requested=Complete+Published). Numbers in the bars indicate the number of genomes sequenced.

Most plant pathway databases cover broad network types for a given organism, e.g. metabolic or regulatory networks in A. thaliana or soybean. However, no specialized databases yet deal with pathways for specific processes such as plant immunity, plant growth or the control of plant organ size.

For the purpose of this review, pathway databases are broadly classified into four types: metabolic pathways, gene regulatory networks, protein–protein interaction networks, and signaling pathways.

‘Metabolic pathways’ were the earliest pathways to be discovered and are the best studied. They are represented by a series of enzymatic reactions that transform small molecules, and they have been elaborated and characterized for many organisms. Table 1 presents an overview of available metabolic pathway databases dedicated to different plant species and the sites that host them. Metabolic pathway databases such as MetaCyc [14] contain experimentally verified metabolic pathways and enzyme information for more than 2000 organisms and can be used to predict orthologous pathways in another organism whose genome has been sequenced and annotated. A dedicated portal for plant metabolic pathway databases is SolCyc (available at http://solcyc.solgenomics.net/). SolCyc is a Pathway Tools-based (and thus MetaCyc-inferred) pathway genome database (PGDB) currently containing small-molecule metabolism data for five plants belonging to the family Solanaceae: tomato, potato, tobacco, pepper and petunia.
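To show the idea behind inferring pathways from a reference database such as MetaCyc, the sketch below calls a reference pathway "present" when enough of its reactions have a matching enzyme (EC number) in the target genome annotation. This is a deliberately simplified rule with a hypothetical 50% threshold; the PathoLogic module of Pathway Tools uses a more elaborate scoring scheme that is not reproduced here.

```python
# Reference pathways as lists of EC numbers, one per reaction.
# The first entry uses the real EC numbers of the first three glycolytic steps
# (hexokinase, glucose-6-phosphate isomerase, phosphofructokinase);
# the second pathway and its EC labels are hypothetical placeholders.
REFERENCE_PATHWAYS = {
    "glycolysis_first_steps": ["2.7.1.1", "5.3.1.9", "2.7.1.11"],
    "hypothetical_pathway": ["hypothetical_ec_1", "hypothetical_ec_2"],
}


def predict_pathways(reference, annotated_ecs, min_fraction=0.5):
    """Return pathway -> fraction of its reactions with an annotated enzyme,
    keeping only pathways that reach the (hypothetical) min_fraction threshold."""
    predicted = {}
    for name, ecs in reference.items():
        covered = sum(1 for ec in ecs if ec in annotated_ecs)
        fraction = covered / len(ecs)
        if fraction >= min_fraction:
            predicted[name] = fraction
    return predicted


if __name__ == "__main__":
    # Hypothetical EC annotation of a newly sequenced genome.
    genome_ecs = {"2.7.1.1", "5.3.1.9"}
    print(predict_pathways(REFERENCE_PATHWAYS, genome_ecs))
```

Reactions in a predicted pathway that have no assigned enzyme ("pathway holes") are natural targets for manual curation, which is where curated resources such as AraCyc add value.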

Table 1: Overview of plant species-specific metabolic pathway databases

Organism | Database | Version | Location
Arabidopsis thaliana | AraCyc | 7.0.0.0 | http://www.arabidopsis.org/biocyc/index.jsp
Arabidopsis thaliana | AraCyc | 6.0.0.0 | http://pathway.gramene.org/ARA/class-tree?object=Pathways
Oryza sativa japonica | RiceCyc | 3.0.0.0 | http://pathway.gramene.org/RICE/class-tree?object=Pathways
Sorghum bicolor | SorghumCyc | 1.0.0.0 | http://pathway.gramene.org/SORGHUM/class-tree?object=Pathways
Medicago truncatula | MedicCyc | 1.0.1.1 | http://pathway.gramene.org/MEDIC/class-tree?object=Pathways
Solanum lycopersicum | LycoCyc | 2.0.1.1 | http://pathway.gramene.org/LYCO/class-tree?object=Pathways
Solanum lycopersicum | LycoCyc | 2.0.0.0 | http://solcyc.solgenomics.net/LYCO/server.html
Capsicum | CapCyc | 1.0.1.1 | http://pathway.gramene.org/CAP/class-tree?object=Pathways
Capsicum | CapCyc | 2.1.0.0 | http://solcyc.solgenomics.net/CAP/server.html
Solanum tuberosum | PotatoCyc | 1.0.1.1 | http://pathway.gramene.org/POTATO/class-tree?object=Pathways
Solanum tuberosum | PotatoCyc | 1.1.0.0 | http://solcyc.solgenomics.net/POTATO/organism-summary?object=POTATO
Coffea canephora | CoffeaCyc | 1.1.1.0 | http://pathway.gramene.org/COFFEA/class-tree?object=Pathways
Coffea canephora | CoffeaCyc | 1.1.0.0 | http://solcyc.solgenomics.net/COFFEA/organism-summary?object=COFFEA
Vitis vinifera | VitisNet | - | http://www.sdstate.edu/aes/vitis/pathways.cfm
Populus trichocarpa | PoplarCyc | 2.0.0.0 | http://pmn.plantcyc.org/POPLAR/server.html?
Petunia x hybrida | PetuniaCyc | 2.1.1.0 | http://solcyc.solgenomics.net/PET/server.html?
Solanum melongena | SolaCyc | 1.2.0.0 | http://solcyc.solgenomics.net/SOLA/organism-summary?object=SOLA
Nicotiana tabacum | NicotianaCyc | 1.1.0.0 | http://solcyc.solgenomics.net/TOBACCO/server.html?

The pathways section of the Gramene database [15] (a database for grasses such as rice, maize, sorghum, barley, oats, wheat and rye) contains the known and predicted biochemical pathways of rice (RiceCyc) and sorghum (SorghumCyc), both of which are curated by Gramene and were built using the PathoLogic module of Pathway Tools. The website also mirrors the known and predicted biochemical pathways from SolCyc, AraCyc, EcoCyc and the MetaCyc reference database.

AraCyc, the ‘gold standard’ for A. thaliana, was built using the PathoLogic module of Pathway Tools with MetaCyc. In addition, AraCyc uses manual curation to enrich its data. The trade-off is slower progress in completing the network, but the end result is thoroughly documented and more accurately structured. One can argue that databases are of higher quality when domain experts scrutinize the available literature and curate them manually. Experts can apply their scientific experience and intuition to find facts in a way that no algorithm can yet mimic. However, this depends on the availability of such experts, and for genome-wide projects it is certainly challenging to involve all those who could contribute.

The success of AraCyc has led to a broader, plant-centric rather than organism-centric initiative, the Plant Metabolic Network (PMN) (available at http://www.plantcyc.org/). This is a collaborative project to build a broad network of plant metabolic pathway databases. PlantCyc, which incorporates some data from MetaCyc, is the central component of PMN: a database of manually curated or reviewed information on metabolic pathways shared across more than 300 plant species. PlantCyc serves as a reference database, while PMN also contains single-species/taxon-based databases. Additionally, PMN includes a small number of pathways that are known to be present in other organisms and are predicted to exist in plants.

‘Gene regulatory networks’ consist of transcription factors and the genes they regulate. These networks comprise protein–DNA interactions and may also include the regulation of target genes by sRNAs/miRNAs. A regulatory network is formed by a series of events in which the regulation of one gene leads to the control of another. An example of a regulatory network database is the Arabidopsis Gene Regulatory Information Server (AGRIS) [16], which contains information on transcription factors in A. thaliana and the cis-regulatory elements they act upon. AGRIS currently consists of three databases: AtcisDB, AtTFDB and AtRegNet. AtcisDB contains the upstream regions of annotated A. thaliana genes and describes experimentally validated and predicted cis-regulatory elements. AtTFDB holds information on transcription factors grouped into 50 conserved domain families. AtRegNet describes direct interactions between transcription factors and target genes. AGRIS also contains a Regulatory Networks Interaction Module (ReIN), which allows the creation, visualization and identification of regulatory networks in A. thaliana. While AGRIS contains data from sequence annotations, TRANSFAC [17] is a gene regulatory network database containing data on transcription factors, their experimentally proven binding sites and the genes they regulate in 300 species. TRANSFAC is one of the few proprietary plant database resources in Pathguide.
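A gene regulatory network of the kind described by AtRegNet can be represented as a directed graph from transcription factors to their targets. The sketch below, with hypothetical gene names, finds every gene reachable downstream of a given factor by breadth-first search. This is the kind of query a network module such as ReIN might answer, but we do not claim it reflects ReIN's implementation.

```python
from collections import deque

# Hypothetical regulatory edges: transcription factor -> genes it directly regulates.
REGULATES = {
    "TF1": ["TF2", "geneA"],
    "TF2": ["geneB", "TF3"],
    "TF3": ["geneC", "TF1"],  # feedback loop back to TF1
}


def downstream_targets(network, start):
    """All genes reachable from `start` along regulatory edges (breadth-first search).

    The `seen` set prevents infinite loops on feedback cycles; `start` itself is
    included only if a cycle leads back to it.
    """
    seen = set()
    queue = deque([start])
    while queue:
        regulator = queue.popleft()
        for target in network.get(regulator, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


if __name__ == "__main__":
    print(sorted(downstream_targets(REGULATES, "TF2")))
```

Breadth-first order also makes it easy to record how many regulatory steps separate each target from the starting factor, should that be needed.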

PlantCARE [18] is a database of plant cis-acting regulatory elements in which data on the regulatory sites are extracted from the literature and supplemented with predicted data. PlantCARE provides confidence levels for the experimental evidence, the functional information and the position on the promoter. Additionally, a plant DNA query sequence can be searched for cis-regulatory elements using a query tool in PlantCARE.

PlantTFDB [19] is a recently constructed database that contains transcription factors from 49 plant species, grouped into 58 families. Each transcription factor is comprehensively annotated with respect to functional domains, 3D structures, gene ontology, gene expression information from expressed sequence tags (ESTs) and microarrays and annotations from other databases.

AthaMap [20] is a genome-wide map of published or experimentally determined transcription factor binding sites (TFBS) in A. thaliana, which also includes predicted sites. AthaMap allows users to search a genomic sequence or a gene and display the potential TFBS. It also provides search functionality for user-defined potential co-localization elements. Genes of interest can be analyzed to identify common TFBSs; conversely, genes that harbor specific TFBS can also be identified using AthaMap.
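The simplest form of binding-site search is matching a consensus sequence on both DNA strands. The sketch below scans a sequence for a consensus written in IUPAC nucleotide codes; the motif and sequences are illustrative only. Production tools such as AthaMap typically also rely on position weight matrices and score thresholds, which are not shown here.

```python
# IUPAC nucleotide codes -> the bases each code allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an (unambiguous) DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def matches(site, motif):
    """True if every base of `site` is allowed by the IUPAC code at that motif position."""
    return all(base in IUPAC[code] for base, code in zip(site, motif))


def scan(sequence, motif):
    """Return (start, strand) for each motif match on either strand.

    Starts are 0-based forward-strand coordinates of the matching window.
    """
    sequence = sequence.upper()
    k = len(motif)
    hits = []
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if matches(window, motif):
            hits.append((i, "+"))
        if matches(reverse_complement(window), motif):
            hits.append((i, "-"))
    return hits


if __name__ == "__main__":
    # Illustrative consensus TTGACY (Y = C or T) on a toy sequence.
    print(scan("AATTGACTGGCCGGTCAACC", "TTGACY"))
```

Palindromic motifs will be reported twice at the same position, once per strand; a real tool would usually merge such hits.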

Gene co-expression network databases for plants are under development. Such databases contain information on the co-expression of genes across a large number of experimental conditions. They can be used to identify genes involved in a given function, to identify cis-regulatory elements and to construct regulatory networks (although co-expression does not necessarily imply co-regulation [21]), and they can help address many other biological problems. Some examples of gene co-expression networks and their applications are discussed in the Supplementary Data.
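A basic co-expression network links genes whose expression profiles across many conditions are strongly correlated. The sketch below uses Pearson correlation with a hypothetical cut-off of |r| >= 0.9 on toy expression values for five hypothetical conditions; real co-expression databases use far more conditions and often other measures (for example, mutual rank).

```python
from itertools import combinations
from math import sqrt

# Toy expression profiles (hypothetical values across five conditions).
PROFILES = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [2, 4, 6, 8, 10],
    "geneC": [5, 4, 3, 2, 1],
    "geneD": [1, 3, 2, 5, 1],
}


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def coexpression_edges(profiles, threshold=0.9):
    """Return (gene1, gene2, r) for every pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


if __name__ == "__main__":
    for edge in coexpression_edges(PROFILES):
        print(edge)
```

Keeping strongly negative correlations, as done here, is a design choice: anti-correlated genes may be under opposing regulation, but some databases report only positive co-expression.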

‘Protein–protein interaction pathways’ contain all interactions, stable or transient, between identical or different proteins that are important for the functioning of a cell. Such interactions take place during protein modification, protein transport, protein oligomerization that activates or inactivates a protein, chaperone-assisted protein folding, signal transduction, etc. The A. thaliana protein interactome database (AtPID) [22] is one database that records such interactions. It contains protein interaction pairs identified through manual text mining or predicted in silico using various bioinformatics methods, along with confirmed interaction pairs.

It is now recognized that the experiments used to generate protein interaction data (e.g. yeast two-hybrid screens) often give false positives as well as false negatives, so this type of data must be used with caution. To judge whether a result is reliable, one needs to know the type of experiment and the conditions used, as well as details of the results. Whether an interaction is plausible in vivo can then be assessed rationally on the basis of several factors, including the domains involved in the interaction and the type of interaction. The IntAct database [23], which contains protein–protein interaction information on several organisms including plants, records such detailed information.

Another database, the Predicted Arabidopsis Interactome Resource (PAIR) [24], predicts potential interactions in A. thaliana using a support vector machine (SVM) model, a machine learning approach, combined with careful preparation of training examples, selection of indirect evidence and tight control of false positives. We believe that PAIR is currently the most accurate and comprehensive database of A. thaliana protein–protein interactions.

Combining interaction data generated by experimental and predictive methods increases the coverage of an interactome and can yield more reliable information: when the same interaction is detected by different methods, one can reasonably place more confidence in it. STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) [25] is a multi-organism database (not limited to the kingdom Plantae) that includes all available protein–protein interactions. It scores and weights this information and augments it with predicted interactions and automated text-mining results. STRING includes both physical and functional information on the interactions, which adds an extra measure of reliability to the interaction data.
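The intuition that agreement between independent methods raises confidence can be written as a simple probabilistic combination: if each evidence channel i reports a confidence s_i and the channels are assumed to be independent, the probability that all of them are wrong is the product of (1 - s_i). The sketch below implements that rule. STRING's published scoring follows a similar idea but adds corrections (for example, for the prior probability of a random interaction) that are omitted here; the channel names and values are hypothetical.

```python
def combine_scores(channel_scores):
    """Combine per-channel confidences in [0, 1], assuming independent channels.

    The interaction is wrongly supported only if every channel is wrong,
    so the combined score is 1 - prod(1 - s_i).
    """
    prob_all_wrong = 1.0
    for score in channel_scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
        prob_all_wrong *= 1.0 - score
    return 1.0 - prob_all_wrong


if __name__ == "__main__":
    # Hypothetical evidence for one protein pair.
    evidence = {"experiments": 0.6, "coexpression": 0.3, "text_mining": 0.4}
    print(round(combine_scores(evidence.values()), 3))  # prints 0.832
```

The independence assumption is the weak point: two channels derived from the same underlying experiment would inflate the combined score.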

‘Signaling pathways’ comprise the molecular networks of signal transduction cascades. These transmit information from one part of a cell to another (intracellular, e.g. from the cytoplasm to the nucleus) or from one cell to another (intercellular, e.g. from one neuron to another). Extracellular stimuli can also activate or inhibit a pathway and thus change the cellular environment. Signaling pathways often involve protein–protein interactions at different levels, such as protein modification (e.g. protein phosphorylation), protein translocation and protein complex formation or dissociation. Several signaling pathway databases exist for non-plant eukaryotes, for example SPIKE [26]. INOH (hosted at http://www.inoh.org/) is a signaling pathway database for Drosophila melanogaster. SignaLink (hosted at http://signalink.org/) is a cross-species database that includes pathways from human, D. melanogaster and Caenorhabditis elegans. In contrast, few plant signaling pathway databases exist, and they lack the quality and efficiency of their non-plant counterparts. DRASTIC [27], a database resource for the analysis of signal transduction in cells developed by the Scottish Crop Research Institute (SCRI), was one of the first relational databases in this area. It included ESTs and genes regulated in response to various environmental factors such as pathogens, chemical exposure, drought, salt and low temperature, with data collected from refereed journals. However, this reference resource is no longer available.

Recently, a stress-responsive transcription factor database, STIFDB [28], has been created for A. thaliana. It contains abiotic stress response genes that were found to be upregulated in microarray experiments, with options to identify possible transcription factor binding sites. PathoPlant [29, 30] is another relational database; it contains components of signal transduction pathways related to plant pathogenesis, as well as microarray data on genes expressed in response to pathogens.

There is a glaring need for plant signaling pathway databases that contain, and regularly update as they are discovered, all proven and potential/putative signaling pathways in plants. MAPK signaling cascades were discovered in plants more than 15 years ago [31], and analogues of pathways that were previously known only in animals are now being found as well. For example, ionotropic glutamate receptors (iGluRs), which are involved in excitatory neurotransmission pathways, have been studied extensively in the animal kingdom and are included in several pathway databases. Glutamate receptor-like proteins (GLRs) were reported in A. thaliana in 1998 [32]. Since then, transgenic and pharmacological studies have suggested that these proteins, in A. thaliana and other plants, are involved in a wide array of pathways. Suggested functions include Ca2+ allocation [33], carbon/nitrogen sensing [34], regulation of abscisic acid and water balance [35], coordination of mitosis in the root apical meristem [36], light signal transduction [37] and resistance to fungal infections [38]. Both MAPKs and glutamate receptor-like proteins from A. thaliana are included in a few plant pathway databases such as AtPID. However, a biologist looking for pathways involved in resistance to fungal infections, for example, is unlikely to come across the glutamate receptor-like system immediately; conversely, it is difficult to find, by keyword, all the plant pathways in which glutamate receptor-like proteins are involved. Such databases would be essential to ‘de-specialize’ information and make it available to a wider range of scientists. This also highlights the need for such databases to be freely available, so that biologists interested in a particular pathway can retrieve all the relevant information regardless of the system or field they work with (plant, animal, microbial and so on).
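Part of the ‘de-specialization’ problem is a matter of indexing: if pathway records from many sources were indexed by both protein and keyword, a query for a protein would return every pathway it participates in, regardless of the field that produced the record. The minimal sketch below uses hypothetical records and identifiers (GLR_like_1 and the PROT_* names are placeholders, not real gene IDs).

```python
from collections import defaultdict

# Hypothetical pathway records gathered from different sources.
PATHWAYS = [
    {"id": "P1", "name": "Fungal resistance response", "proteins": {"GLR_like_1", "PROT_X"}},
    {"id": "P2", "name": "Calcium signaling", "proteins": {"GLR_like_1", "PROT_Y"}},
    {"id": "P3", "name": "Root meristem mitosis", "proteins": {"PROT_Z"}},
]


def build_indexes(pathways):
    """Build protein -> pathway IDs and keyword -> pathway IDs inverted indexes."""
    by_protein = defaultdict(set)
    by_keyword = defaultdict(set)
    for record in pathways:
        for protein in record["proteins"]:
            by_protein[protein].add(record["id"])
        for word in record["name"].lower().split():
            by_keyword[word].add(record["id"])
    return by_protein, by_keyword


def search(by_keyword, query):
    """Return IDs of pathways whose names contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(by_keyword.get(words[0], set()))
    for word in words[1:]:
        result &= by_keyword.get(word, set())
    return result


if __name__ == "__main__":
    by_protein, by_keyword = build_indexes(PATHWAYS)
    print(sorted(by_protein["GLR_like_1"]))
    print(search(by_keyword, "fungal resistance"))
```

Plain keyword matching only works if records use shared vocabulary; in practice, controlled vocabularies or ontologies would be needed to link, say, "fungal" and "pathogen" records.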

Signaling mechanisms such as sugar signaling [39], light signaling [40] and jasmonate signaling [41], together with their components, have been discovered in plants and call for dedicated pathway databases. The plant properties affected by these signaling pathways indicate that the pathways are cross-connected. It is important to understand these pathways and to integrate this information with other databases to obtain a more complete picture, which would enable plant scientists to modulate particular plant properties without affecting other mechanisms and pathways.

Pathway visualization tools

Visualization of pathway data is important not only for understanding the data but also for analyzing them and building valid hypotheses. To address these requirements, many pathway/network visualization tools with different functionalities have been constructed. The level of visualization these tools offer ranges from simple two-dimensional pathway maps, like those provided by KEGG, to three-dimensional and hierarchical visualizations in immersive virtual reality (C6) environments, like those provided by MetNetGE [42]. Interactive visualization, as provided by GenMAPP [43], allows users to analyze, edit and modify pathways based on their own experimental data. In a recent review, Gehlenborg et al. [44] thoroughly surveyed available pathway visualization tools and broadly divided them into two partly overlapping categories: tools focused on automated methods for interpreting and exploring large biological networks, and tools focused on the assembly and curation of pathways. Many of these tools integrate with public databases, allowing users to analyze and visualize their own data. Another exhaustive overview of visualization tools has been presented by Suderman and Hallett [45]. For a critical evaluation of the requirements for biological visualization tools, based on interviews conducted to understand user needs for pathway analysis, see [46].
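Many visualization tools accept simple graph descriptions as input. As one example, the sketch below writes a small pathway as a Graphviz DOT directed graph, with metabolites as nodes and enzymes as edge labels; the result can be rendered with the Graphviz `dot` program (e.g. `dot -Tpng pathway.dot -o pathway.png`). The choice of DOT is ours for illustration only; the tools cited above use their own formats.

```python
def quote(text):
    """Wrap a string in double quotes, escaping embedded quotes for DOT."""
    return '"' + text.replace('"', '\\"') + '"'


def pathway_to_dot(name, steps):
    """Render (substrate, product, enzyme) steps as a Graphviz DOT digraph."""
    lines = [f"digraph {quote(name)} {{"]
    for substrate, product, enzyme in steps:
        lines.append(f"  {quote(substrate)} -> {quote(product)} [label={quote(enzyme)}];")
    lines.append("}")
    return "\n".join(lines)


# First two steps of glycolysis, used as a small example.
STEPS = [
    ("glucose", "glucose-6-phosphate", "hexokinase"),
    ("glucose-6-phosphate", "fructose-6-phosphate", "glucose-6-phosphate isomerase"),
]

if __name__ == "__main__":
    print(pathway_to_dot("glycolysis_first_steps", STEPS))
```

Writing the text format directly keeps the example dependency-free; larger projects would more likely export to a tool-specific format or use a graph library.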

Pathway database evolution through integration

Each pathway database holds its own variety of information, which has proved challenging for scientists who want to access and use it. Information is scattered across various databases that differ not only in the type of data they contain but also in the form in which the data are stored. Moreover, in a living cell, pathways are vastly interconnected. Integration of pathway databases thus becomes imperative for understanding a biological mechanism in its entirety. Researchers interested in a particular biological mechanism should be able to find and access all the data they need easily, without the difficult process of transferring data between databases that are built on different platforms.
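One basic step in integration is mapping identifiers from one database onto another. The sketch below merges pathway annotations from two hypothetical sources into the identifier space of the first, using a hypothetical cross-reference table, and reports records that cannot be mapped instead of silently dropping them. It assumes pathway names have already been harmonized, which in practice is a separate (semantic) problem.

```python
# Hypothetical source A: gene ID (A's namespace) -> pathway names.
SOURCE_A = {"A:001": {"glycolysis"}, "A:002": {"flavonoid_biosynthesis"}}

# Hypothetical source B uses its own identifiers for (partly) the same genes.
SOURCE_B = {"B:xyz": {"drought_response"}, "B:abc": {"glycolysis"}, "B:999": {"light_signaling"}}

# Hypothetical cross-reference table: B identifier -> A identifier.
ID_MAP = {"B:xyz": "A:001", "B:abc": "A:003"}


def integrate(source_a, source_b, id_map):
    """Merge B's annotations into A's identifier space.

    Returns (merged annotations, list of B identifiers that could not be mapped).
    Source A is copied, not modified.
    """
    merged = {gene: set(pathways) for gene, pathways in source_a.items()}
    unmapped = []
    for b_id, pathways in source_b.items():
        a_id = id_map.get(b_id)
        if a_id is None:
            unmapped.append(b_id)
            continue
        merged.setdefault(a_id, set()).update(pathways)
    return merged, unmapped


if __name__ == "__main__":
    merged, unmapped = integrate(SOURCE_A, SOURCE_B, ID_MAP)
    print(merged)
    print("unmapped:", unmapped)
```

Reporting unmapped records matters because silently losing them would make an integrated database look more complete than it is.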

One of the biggest challenges to database integration is diversity. Existing databases differ syntactically, in their data file formats and retrieval methods, and semantically, in their terminologies and data models [47]. Several pathway resources listed in Pathguide are not machine-readable, yet machine-readability is an essential requirement for automatic data retrieval and processing. Recognition of these challenges has prompted increased efforts to establish standards for representing and exchanging pathway models. The Systems Biology Markup Language (SBML) has established itself as one such standard for storing and sharing computational models of biological networks [48]. Another, BioPAX [49], was developed for detailed pathway depiction and data exchange, and was used in the development of MetNet [50]. PSI-MI [51] allows data exchange for protein–protein interactions, while CellML [52] enables storage and exchange of computer-based mathematical models. Other data exchange formats are only peripherally associated with network data but can serve as input for software packages that determine such networks. The Chemical Markup Language (CML) can describe small molecules and ligands that participate in networks [53], whereas the Protein Markup Language (ProML), along with its predecessor PDB, can characterize larger binding partners [54]. The Microarray Gene Expression Markup Language (MAGE-ML) can serve as input for determining gene co-expression networks under various conditions [55]. The Ondex eXchange Language (OXL) format claims superiority over a range of formats [56], but it is more general and requires more coding to implement correctly. Finally, a database can provide an Application Programming Interface (API) [57], but each API requires some study of its peculiarities, as it applies to only one particular database.
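To make "machine-readable" concrete, the sketch below reads reactions out of a tiny SBML document using only Python's standard XML parser. It is a minimal illustration, assuming an SBML Level 2 Version 4 file (other levels and versions use different namespace URIs); the model, species and reaction identifiers are invented for the example, and a production pipeline would more likely use a dedicated SBML library.

```python
import xml.etree.ElementTree as ET

# Namespace for SBML Level 2 Version 4; other levels/versions use other URIs.
SBML_NS = {"sbml": "http://www.sbml.org/sbml/level2/version4"}

# Toy model (illustrative identifiers): one reaction, glucose -> glucose-6-phosphate.
TOY_SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_model">
    <listOfCompartments>
      <compartment id="cytosol"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="glc" name="glucose" compartment="cytosol"/>
      <species id="g6p" name="glucose-6-phosphate" compartment="cytosol"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="hexokinase" reversible="false">
        <listOfReactants><speciesReference species="glc"/></listOfReactants>
        <listOfProducts><speciesReference species="g6p"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


def read_reactions(sbml_text):
    """Return {reaction_id: (reactant_ids, product_ids)} from an SBML string."""
    root = ET.fromstring(sbml_text)
    reactions = {}
    for rxn in root.iterfind(".//sbml:reaction", SBML_NS):
        reactants = [ref.get("species") for ref in
                     rxn.iterfind("sbml:listOfReactants/sbml:speciesReference", SBML_NS)]
        products = [ref.get("species") for ref in
                    rxn.iterfind("sbml:listOfProducts/sbml:speciesReference", SBML_NS)]
        reactions[rxn.get("id")] = (reactants, products)
    return reactions
```

Because the structure is fixed by the standard, the same few lines work for any conforming file; a pathway drawing offers no such hook.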

Formats that allow too many options make it difficult to provide an easy-to-use interface for end-users. All of these standards are now used by at least some pathway databases and are certainly steps in the right direction. Laudable as these efforts are, the proliferation of data formats creates its own problems: providers need to decide which formats to support, and supporting each format is laborious and resource-intensive. Consequently, data often still need to be converted from one format into another [58].
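Format conversion is often lossy. As a hedged illustration, the function below flattens reactions into the Simple Interaction Format (SIF), the tab-delimited edge list that Cytoscape reads. Mapping every reactant-product pair to one edge labelled with the reaction identifier is an assumption made for this sketch; other mappings are equally possible, and all of them discard stoichiometry, compartments and kinetics.

```python
def reactions_to_sif(reactions):
    """Convert {reaction_id: (reactants, products)} into SIF text.

    Each reactant-product pair becomes one line "source<TAB>type<TAB>target",
    with the reaction id used as the interaction type. Stoichiometry,
    reversibility and compartments are not representable and are dropped.
    """
    lines = []
    for rxn_id, (reactants, products) in sorted(reactions.items()):
        for source in reactants:
            for target in products:
                lines.append(f"{source}\t{rxn_id}\t{target}")
    return "\n".join(lines)


# Usage: a two-substrate, two-product reaction yields four edges.
toy = {"hexokinase": (["glc", "atp"], ["g6p", "adp"])}
print(reactions_to_sif(toy))
```

The printed edge list no longer records that ATP is a cofactor rather than a carbon source, which is exactly the kind of information that must be recovered when converting back.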

Ongoing efforts to automate data access and retrieval make the process much simpler for biologists. KEGG [59], a comprehensive resource for metabolic pathways, originally contained data curated manually from the literature, with pathways available only as simple drawings. All KEGG pathway maps have since been redrawn using KegSketch, and the resulting KGML+ files [60] are machine-readable and editable.
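The practical gain from redrawn, XML-based maps is that relationships can be extracted programmatically. The sketch below parses a simplified KGML-style snippet (a `pathway` root with `entry` and `relation` elements) and returns gene-gene relations. It assumes the basic KGML layout only; real KGML+ files carry additional graphics elements, and the entry names here are placeholders rather than real KEGG identifiers.

```python
import xml.etree.ElementTree as ET

# Simplified KGML-style snippet; "GENE_A", "GENE_B" and "CPD_X" are placeholders.
TOY_KGML = """<pathway name="path:ath00000" org="ath" number="00000" title="Toy pathway">
  <entry id="1" name="ath:GENE_A" type="gene"/>
  <entry id="2" name="ath:GENE_B" type="gene"/>
  <entry id="3" name="cpd:CPD_X" type="compound"/>
  <relation entry1="1" entry2="2" type="ECrel">
    <subtype name="compound" value="3"/>
  </relation>
</pathway>"""


def kgml_relations(kgml_text):
    """Return (name1, name2, relation_type) tuples for every relation element."""
    root = ET.fromstring(kgml_text)
    # Relations refer to entries by their numeric id, so build a lookup first.
    names = {entry.get("id"): entry.get("name") for entry in root.iter("entry")}
    return [(names[rel.get("entry1")], names[rel.get("entry2")], rel.get("type"))
            for rel in root.iter("relation")]
```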

Plant pathway database integration is challenging because far fewer plant genomes have been sequenced than genomes of other life forms, which makes inferences based on homology more difficult, and because data resources on plant pathways are more dispersed [61]. The uniqueness of the secondary metabolism found in many plant species adds another layer of complexity. It is therefore even more important for plant pathway databases to start incorporating and supporting existing standard formats for better integration of information and knowledge extraction. The positive side of having a limited number of plant pathway databases is that standardization needs to be applied to fewer pathways, which entails less work than would be required in other settings.

Supplementary Table S1 lists the plant database resources available to date, with a short description and further information such as availability, included organisms, whether the database is listed in Pathguide, access methods, data sources and supported standard formats (if any). As Figures 1 and 2, Table 1 and Supplementary Table S1 show, plant databases are still far from overwhelmed by information and diversity. This makes standardization and implementation efforts much more realistic for plants than for other systems. Furthermore, other systems could follow suit by learning from the successes and challenges of plant pathway database integration projects. It would therefore be tremendously useful for all upcoming plant pathway databases to follow universal standards right from their conception. Perhaps journals should accept publications of databases only if they conform to what we term Good Databasing Practice (GDbP) standards (Table 2), thereby making these standards common practice. Similar requirements are already in place for microarray and sequencing results.

Table 2:

Overview of Good Databasing Practices

Good databasing practice | Usefulness
--- | ---
Easy user access | Easy access even for non-specialists
Integrated visualization tools | Eases understanding and analysis of large data sets
Standard ontology | Eases data exchange
Possibility to integrate data from other databases | Expands the available information
Proper documentation of stored data; provision of the source and reliability of original data | Makes it possible to trace information back to the original source if required; enables judgment of the accuracy of inferred information
Provision of risk factors and probability of error propagation when deriving orthologs in another species | Allows particular data to be used with caution when inferring a pathway or an ortholog
Good user support | Good response time to user queries
Regular update/maintenance | Up-to-date information; removal of errors and bugs
Regular and professional data curation and annotation | Manual curation of data/annotations removes errors generated by automatic data retrieval; annotations, both derived from the source and inferred, help describe an entity or an event
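Two of the practices above, documenting provenance and stating the risk of error propagation when deriving orthologs, can be built into the data structure itself. The sketch below is hypothetical throughout: the `Annotation` record, the identity cut-off and the rule "inferred confidence = source confidence x sequence identity" are illustrative choices, not an established scoring scheme, and the gene names are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene: str
    pathway: str
    confidence: float  # 0..1
    evidence: str      # "curated" or "inferred"
    source: str        # provenance: where the assignment came from


def transfer_by_orthology(annotations, orthologs, min_identity=0.4):
    """Project pathway assignments onto orthologs in another species.

    orthologs maps a source gene to a list of (target gene, fractional identity).
    Transfers below min_identity are skipped; the rest are flagged "inferred"
    and down-weighted by identity (a hypothetical penalty) so that errors are
    not propagated at full confidence.
    """
    inferred = []
    for ann in annotations:
        for target, identity in orthologs.get(ann.gene, []):
            if identity < min_identity:
                continue  # too divergent to trust the transfer
            inferred.append(Annotation(
                gene=target,
                pathway=ann.pathway,
                confidence=round(ann.confidence * identity, 3),
                evidence="inferred",
                source=f"ortholog of {ann.gene}",
            ))
    return inferred


# Usage with placeholder identifiers.
curated = [Annotation("AtGENE1", "starch synthesis", 0.9, "curated", "literature")]
orthologs = {"AtGENE1": [("OsGENE1", 0.8), ("OsGENE2", 0.3)]}
print(transfer_by_orthology(curated, orthologs))
```

Keeping `evidence` and `source` on every record is what lets a user later separate curated facts from inferences and trace each inference back.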

Applications of pathway database integration

Pathway database integration offers many potential advantages for biologists and software developers alike. If successful, it will enable numerous applications, some of which may be unforeseeable today. To illustrate its potential, a few case studies are presented below.

One study [62] integrated transcriptomics data with data on three metabolic pathways: fatty acid synthesis genes from the Arabidopsis Lipid Gene Database [63] (http://lipids.plantbiology.msu.edu/), starch metabolism genes from the Starch Metabolism Network project (http://www.starchmetnet.org/) and the original references for leucine catabolism. The result was a picture that no individual study could have shown by itself. The integration revealed that each of these pathways is structured as a co-expressed module and that these modules may be organized hierarchically. The transcripts from each module co-accumulate across a wide range of environmental and genetic perturbations and developmental stages.

In another case study [61], A. thaliana pathways from protein interaction databases were integrated with co-expression data using the Ondex system (http://www.ondex.org/). This made it possible to determine whether interacting protein partners are co-expressed and at what levels.

AraGEM [64] is an interesting example of using database integration to obtain richer information about a system. It is an attempt to build a genome-scale reconstruction of the primary metabolic network of A. thaliana. It used A. thaliana metabolic genome information from KEGG as a core, enriched with information on the cellular compartmentalization of metabolic pathways from the literature and from databases such as AraPerox [65] and The Arabidopsis Information Resource (TAIR) [66], among others. A total of 75 essential primary metabolism reactions were identified for which genetic information was unknown. The resulting genome-scale model was then used to construct a metabolic flux model of plant metabolism representing both photosynthetic and non-photosynthetic cell types, and the model was validated by simulating plant metabolic functions inferred from the literature. AraGEM exemplifies how genome-scale models can first be built and then used to explore highly complex, compartmentalized eukaryotic networks and to construct and examine testable, non-trivial hypotheses.
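Genome-scale flux models of this kind are commonly analyzed with constraint-based methods such as flux balance analysis, which optimizes an objective subject to the steady-state constraint S·v = 0, where S is the stoichiometric matrix (metabolites x reactions) and v the flux vector. The sketch below only builds S and checks that constraint for a three-reaction toy network; the optimization step would need a linear-programming solver and is deliberately left out. Reaction and metabolite names are invented.

```python
def stoichiometric_matrix(reactions, metabolites):
    """Build S with one row per metabolite and one column per reaction.

    reactions maps reaction id -> {metabolite: coefficient}
    (negative = consumed, positive = produced). Columns follow sorted ids.
    """
    rxn_ids = sorted(reactions)
    matrix = [[reactions[r].get(m, 0) for r in rxn_ids] for m in metabolites]
    return matrix, rxn_ids


def is_steady_state(matrix, fluxes, tol=1e-9):
    """True if S·v = 0, i.e. every internal metabolite is balanced."""
    return all(abs(sum(s * v for s, v in zip(row, fluxes))) <= tol
               for row in matrix)


# Toy network: uptake -> A, A -> B, B -> export.
toy = {"uptake": {"A": 1}, "convert": {"A": -1, "B": 1}, "export": {"B": -1}}
S, order = stoichiometric_matrix(toy, ["A", "B"])
print(order)                                # column order of the fluxes
print(is_steady_state(S, [2.0, 2.0, 2.0]))  # balanced
print(is_steady_state(S, [2.0, 1.0, 2.0]))  # B accumulates
```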

A thorough literature search on plant pathways and newly discovered mechanisms can suggest new applications of database integration. In plants, for example, hormonal and defense signaling pathways have been found to cross-talk through shared components [67]. Integrating these two types of information could point to new targets for counteracting the microbial components that decrease plant resistance and lead to disease.
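Once pathways are stored with consistent identifiers, candidate cross-talk points can be found mechanically as components shared between pathways. The sketch below is a minimal illustration of that idea; the pathway names and component labels are hypothetical placeholders, and in practice shared membership is only a starting point that still needs biological validation.

```python
from itertools import combinations


def crosstalk_candidates(pathways):
    """Return {(pathway1, pathway2): shared components} for overlapping pairs.

    pathways maps pathway name -> iterable of component identifiers.
    Pairs with no shared component are omitted.
    """
    result = {}
    for p, q in combinations(sorted(pathways), 2):
        shared = sorted(set(pathways[p]) & set(pathways[q]))
        if shared:
            result[(p, q)] = shared
    return result


# Hypothetical components for illustration only.
pathways = {
    "hormone": ["R1", "K1", "TF1"],
    "defense": ["K1", "TF2", "TF1"],
    "growth": ["X1"],
}
print(crosstalk_candidates(pathways))
```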

Additional examples of applications of database integration are presented in the supplementary material.

Non-plant references and opportunities for the future

Research on human systems has already benefited from integrating information from different pathway databases. For example, a meta-analysis of type 2 diabetes was conducted to find genes involved in the disease. It used various types of data: medical reviews, phenotype information, proteome analysis results, candidate gene lists from previous studies, differential gene expression and time-series microarray studies [68]. The study also incorporated information from several pathway databases, including KEGG, Reactome [69], BioCyc [70], GO [71], IntAct and TRANSFAC, to add pathway information and to derive cellular network information on these genes. This allowed the identification of 213 genes with overall disease relevance, pointing to common, tissue-independent processes related to the disease, and also identified genes showing changes specific to a single study.
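One simple way to combine heterogeneous gene lists is vote counting: rank genes by how many independent sources support them. The sketch below illustrates that idea only; it is not the statistical procedure of [68], and the source names, gene labels and the `min_sources` cut-off are hypothetical.

```python
from collections import Counter


def rank_by_support(gene_lists, min_sources=2):
    """Rank genes by the number of sources listing them.

    gene_lists maps source name -> list of genes. A gene is counted at most
    once per source; genes supported by fewer than min_sources are dropped.
    Ties are broken alphabetically.
    """
    counts = Counter()
    for genes in gene_lists.values():
        counts.update(set(genes))  # de-duplicate within a source
    supported = (g for g, n in counts.items() if n >= min_sources)
    return sorted(supported, key=lambda g: (-counts[g], g))


# Hypothetical sources and genes.
sources = {
    "review": ["G1", "G2"],
    "expression": ["G2", "G3", "G2"],
    "proteome": ["G2", "G3"],
}
print(rank_by_support(sources))
```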

In another study [72], an integrated human interactome network was constructed from physical, direct binary protein–protein interactions. Data were retrieved from a variety of sources: the Biomolecular Interaction Network Database (BIND), BioGRID, DIP, GeneRIF, IntAct, MINT and Reactome, each of which plays a particular role in the integration scheme. BIND [73] contains data from large-scale cell mapping studies and molecular interactions in PDB. BioGRID [74] holds protein and genetic interaction information, including information from the primary literature. DIP [75] contains experimentally determined protein–protein interactions. Gene Reference Into Function (GeneRIF) [76] contains short texts about curated articles relevant to known genes. IntAct contains highly curated interaction data from the literature or from direct depositions, curated by experienced curators. MINT [77] focuses on experimentally verified protein–protein interactions, and Reactome is a knowledgebase containing interaction data in different pathways. A Hepatitis C virus (HCV)-host infection network, generated experimentally and by text mining, was then overlaid on this integrated interactome network, a type of meta-integration. This led to the identification of previously unknown functional pathways in HCV biology and pathogenesis. A similar approach applied to crop plants and their pathogens could reveal plant host–pathogen interactions and the pathways involved in pathogenesis. This in turn could lead to methods for conferring pathogen resistance on crop plants or for targeting these pathways in the pathogen.
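The merging step behind such an integrated interactome can be sketched as a union of edge lists that records which databases support each interaction. This assumes the interactions have already been mapped to a common identifier space (in practice a substantial task in itself) and treats binary interactions as undirected; the protein identifiers below are placeholders.

```python
def merge_interactions(sources):
    """Union binary interactions from several databases, keeping provenance.

    sources maps database name -> list of (protein_a, protein_b) pairs.
    Interactions are undirected, so (A, B) and (B, A) become the same edge.
    Returns {(a, b): set of supporting database names}.
    """
    merged = {}
    for db, pairs in sources.items():
        for a, b in pairs:
            edge = tuple(sorted((a, b)))  # canonical orientation
            merged.setdefault(edge, set()).add(db)
    return merged


# Placeholder identifiers, assumed already mapped to one namespace.
sources = {
    "IntAct": [("P1", "P2")],
    "MINT": [("P2", "P1"), ("P2", "P3")],
    "DIP": [("P1", "P2")],
}
print(merge_interactions(sources))
```

Keeping the set of supporting databases, rather than a bare edge list, lets downstream users weight edges by independent support.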

Not only can plant science benefit from animal pathway databases and integration efforts; animal biologists can in turn benefit from plant pathway studies by asking whether pathways discovered so far only in plants also exist in animals, and how similar or different the pathway networks common to plants and animals are. Such a feedback loop opens many opportunities: can we unlock more evolutionary secrets? Can we become better at harnessing plants for our use? Could human diseases be modeled experimentally in plants if plants and animals do share common pathways? The potential applications, and the potential for knowledge creation, are considerable.

A survey of integrated pathway databases and tools

There are two approaches to database integration: using integration tools and using already integrated databases [78] (which ideally are rebuilt periodically to stay current). Both play a very important role in easing data integration for biologists. These tools can also be used for other purposes, such as data visualization, pathway prediction, pathway gap filling and biological network analysis. Applications of pathway databases and tools further our knowledge of pathways and of the inner workings of living systems.

Pathway database tools for plant systems are important because information is widely dispersed across several databases that lack consistency with one another. There is a growing need to bring this information together in a standard format to aid access and model building. Plants show marked heterogeneity among species (e.g. in secondary metabolism [79]). This makes it even more important to integrate pathway data for all important plant species and to design tools that highlight interspecies similarities and differences.

Arabidopsis Reactome [80], a separate version of Reactome, is a knowledgebase of biological processes in A. thaliana and several other plant species. It integrates pathway information curated in-house with information from KEGG and AraCyc, and provides a platform for navigating and discovering interconnected pathways in A. thaliana. Its data model is built on reactions and their interconnections; modified proteins, proteins localized in different compartments and protein complexes are each treated as entities in their own right. The model also allows protein isoforms, paralogues and splice variants to be generalized while keeping the individual components traceable. It contains both directly curated and inferred data, with annotations that distinguish between the two.
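The modelling choices described above can be sketched with a few immutable record types. The class and field names below are hypothetical and do not reproduce the actual Reactome schema; they only show how treating each protein state and location as a distinct entity, grouping variants in a traceable set, and flagging inferred reactions fit together. The protein label is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalEntity:
    """A protein in a given compartment and modification state."""
    name: str
    compartment: str
    modifications: tuple = ()


@dataclass(frozen=True)
class Complex:
    name: str
    compartment: str
    components: tuple  # PhysicalEntity or Complex members


@dataclass(frozen=True)
class EntitySet:
    """Generalizes isoforms, paralogues or splice variants; members stay traceable."""
    name: str
    members: tuple


@dataclass(frozen=True)
class Reaction:
    name: str
    inputs: tuple
    outputs: tuple
    inferred: bool = False  # False: directly curated; True: inferred
    evidence: str = ""


def connects(upstream, downstream):
    """Reactions are interconnected when an output of one is an input of the other."""
    return bool(set(upstream.outputs) & set(downstream.inputs))


# Same protein, three distinct entities (placeholder name).
cyt = PhysicalEntity("PROT_X", "cytosol")
nuc = PhysicalEntity("PROT_X", "nucleus")
phos = PhysicalEntity("PROT_X", "nucleus", ("phospho-site",))
import_rxn = Reaction("nuclear import", inputs=(cyt,), outputs=(nuc,))
phos_rxn = Reaction("phosphorylation", inputs=(nuc,), outputs=(phos,),
                    inferred=True, evidence="inferred from ortholog")
print(connects(import_rxn, phos_rxn), connects(phos_rxn, import_rxn))
```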

Tools such as CORNET [81] help integrate A. thaliana microarray expression data. The CORNET data sets were obtained from the Gene Expression Omnibus (GEO) [82] and from experiments carried out on Affymetrix ATH1 arrays. The corresponding meta-data were also retrieved (these are unstructured and hence cumbersome to retrieve and parse automatically), including information about sample tissues, treatments and sampling time points, along with protein interaction data, localization data and functional information. The meta-data were manually assigned ontology terms from the Plant Ontology [83–85], the Microarray Gene Expression Data (MGED) Ontology (MO) [86] and the Plant Environmental Ontology (EO) (www.gramene.org/plant_ontology/index.html#eo). Protein–protein interactions (PPIs) were obtained from BIND, IntAct, BioGRID, DIP, MINT and TAIR; predicted PPIs were obtained from the BAR Arabidopsis interaction viewer [87] and AtPID. Information was also obtained from the CORNET authors' own study [88]. Localization data were obtained from SUBA [89], iPSORT [90], LOCtree [91], MITOPRED [92], MitoProt [93], MultiLoc [94], PeroxiP [95], Predotar [96], SubLoc [97], TargetP [98] and WoLF_PSORT [99]; Table S2 provides a short description of these resources. CORNET includes all available data along with the related meta-data. The tool provides a reliability score for each result based on the user-supplied search options, parameters and thresholds, and a visualization tool additionally lets users distinguish more reliable predictions from less reliable ones.

CORNET aims to provide functional context to genes and conversely, to provide an ability to predict functions of genes that have unknown functions. It is a tool that could also, in the future, use the information on A. thaliana to extrapolate networks in other plant species.

Many pathway resources use only general localization predictors. CORNET, in contrast, also attempted to use species-specific localization information: it draws on both 'general' localization predictors and SUBA, an A. thaliana-specific localization database that was the only species-specific resource available at the time. SUBA contains data retrieved from the literature, from experiments and from prediction tools. It has become clearer over time that using organism-specific predictors and combining multiple general predictors is likely to yield more accurate localization predictions [100–103]. General predictors may be unsuitable for individual organisms because they are trained on proteins from a variety of organisms (and can suffer from sampling bias). Localization data from any single predictor need to be treated with caution, since false positives included in integrated databases would amplify incorrect information. Fortunately for plants, some organism-specific localization predictors have recently become available, e.g. AtSubP (Arabidopsis) [103] and RSLpred (rice) [104]. These should be used when integrating pathway information for the respective species; if a tool similar to CORNET were developed for rice, RSLpred would be an important source of protein localization data. The need for localization predictors specific to a variety of plants cannot be overemphasized if networks are to be extrapolated reliably.
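The argument for combining predictors and favouring organism-specific ones can be expressed as a weighted vote. In the sketch below, the weight of 2.0 for species-specific predictors and the 0.5 support cut-off are hypothetical choices, and the per-predictor calls in the usage example are invented for illustration (the predictor names are taken from the text, their outputs are not). Returning `None` when no compartment reaches the cut-off keeps weakly supported calls out of an integrated database instead of amplifying them.

```python
from collections import Counter


def consensus_localization(predictions, species_specific=(),
                           species_weight=2.0, min_support=0.5):
    """Weighted vote over per-predictor compartment calls.

    predictions maps predictor name -> predicted compartment. Predictors in
    species_specific get a higher (hypothetical) weight. Returns
    (compartment, support), or (None, support) if the best compartment's
    share of the total weight is below min_support.
    """
    votes = Counter()
    for predictor, compartment in predictions.items():
        votes[compartment] += species_weight if predictor in species_specific else 1.0
    total = sum(votes.values())
    if not total:
        return None, 0.0
    compartment, score = votes.most_common(1)[0]
    support = round(score / total, 2)
    return (compartment if support >= min_support else None), support


# Invented calls for one protein.
calls = {"AtSubP": "plastid", "TargetP": "plastid",
         "WoLF_PSORT": "mitochondrion", "MultiLoc": "cytosol"}
print(consensus_localization(calls, species_specific={"AtSubP"}))
```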

The ‘MetNet’ platform contains both metabolic and regulatory networks of A. thaliana, soybean [50] and grapevine. It integrates metabolic data from AraCyc and regulatory data from AGRIS with additional, manually curated signal transduction pathways (in A. thaliana). The pathway information is integrated with other resources that supply gene-related information, such as TAIR, GO classifications (retrieved through TAIR) and MapMan [105]. Protein information, including subcellular localization of the entities, is obtained from PPDB [106], AMPDB [107], AtNoPDB [108], AraPerox, PLprot [109], SUBA and BRENDA [110]. Metabolite data from ChEBI [111], PubChem [112], KEGG and NCI [113] are also integrated into the database. Because there are large gaps in knowledge about the functions of many A. thaliana genes, MetNet aims to support the formulation of testable hypotheses. MetNet supports various types of users and data retrieval methods: MetNet Online (available at http://metnetonline.org/) is an online interface to MetNet, MetNetAPI is an Application Programming Interface to the platform that facilitates automated data retrieval [57], and a plug-in exists for the CellDesigner environment [114].

‘VitisNet’ [115] is a web-based tool for grapevine (Vitis vinifera) that integrates metabolomic, proteomic and transcriptomic information into molecular networks, such as metabolic or signaling networks, and presents a molecular network model. VitisNet allows visualization of genes and biochemical pathways involved in growth, fruiting cycles and environmental stress responses. VitisNet data are now also available in MetNet.

‘MetaCrop’ [116] contains manually curated metabolic pathway information for crop plants (with special emphasis on seeds and tubers), along with information on reactions, locations, transport processes, kinetics, taxonomy and literature. MetaCrop has an easy-to-use web interface and allows automatic export of information for the creation of metabolic models.

Pathway database maintenance—an easily overlooked detail

Although Pathguide lists more than 300 pathway resources, at least 30 of them are no longer functional. At the time of writing this review (October 2010), inaccessible databases not marked as non-functional in Pathguide included aMAZE [117,118], Sentra [119] and EMP [120], among others. Other databases may change location; during the preparation of this article this happened with AtPID, so the AtPID publication now refers to an outdated URL. Several of these databases contained high-quality data, and their unavailability is a loss from several angles. aMAZE, for example, boasted an excellent data model: it could deal with metabolic, protein–protein interaction, gene regulation, subcellular localization, signal transduction and transport data, and thus had the capacity to integrate a large variety of data. Its current absence is a significant loss to the scientific community at large. While papers exist for many of these projects, the technical details of an implementation can often only be obtained by communicating with the implementing team. Anyone attempting a similar project elsewhere would therefore have to retrace the time and steps needed to reach the quality of aMAZE. Similarly, Arabidopsis Reactome, another database dedicated to A. thaliana, is currently no longer being developed, as continuing the project requires new funding initiatives.

Because of their ever-expanding and evolving nature, pathway databases (like any other scientific database) need to be maintained, curated and developed over the long term. Finding financial support for long-term maintenance of pathway databases is challenging. One possibility is to raise funds by requiring licenses for database use, but this restricts open access to the information and can thus hinder the development of the field [121]. It is also unfeasible for smaller projects that attract limited attention but may be useful as part of integration efforts. Solutions are needed to ensure continued funding for especially promising databases (without promoting an uncontrolled proliferation of new platforms) and to avoid losing valuable information in established resources. Losing such databases is not only a loss to the scientific community but also a waste of the resources spent on creating and developing them in the first place. Funding agencies could, for example, continue funding database projects they have already funded, provided that the projects follow GDbP standards, with compliance continually and rigorously monitored and reported by an independent workgroup. Another solution could be to integrate especially promising databases into more permanent structures such as Gramene or NCBI.

Funding for The Arabidopsis Information Resource (TAIR) is a recent example of the search for alternative funding sources. NSF funding for TAIR was to be phased out over the following 3 years (http://www.nature.com/news/2009/091118/full/462258b.html). To secure continued maintenance, TAIR has recently launched a corporate sponsorship program. The idea is to avoid subscription requirements for the corporate sector and thus keep the resource open and free of login requirements, allowing continued open access to the data for all scientists. TAIR has already secured several corporate sponsors through this program. Such programs would certainly help at least some databases survive. However, they are not a real alternative to public funding, as they could introduce a corporate bias into the system: only databases able to find corporate sponsors would survive. Various funding models for such community resources (which are not necessarily research projects in their own right) have recently received more attention [122, 123] and could be applied to plant pathway database integration and maintenance. Funding a community resource requires a different approach from that used for conventional research projects, and the various funding scenarios for databases need to be discussed and revised, a recommendation also made by Bastow and Leonelli [123].

CONCLUSION

Pathway databases play an important role in advancing our knowledge of biological functions and mechanisms. Increased understanding of living systems as a whole can, in turn, aid successful application design in silico, in vitro and in vivo.

Plants are important as veritable food, drug and fuel sources, as well as bioremediation and biotechnological tools. This provides a strong incentive to create better, more integrated and easily accessible plant pathway databases. Such efforts would lead to discovery and elucidation of the yet unknown components involved in various pathways and their function. This would also result in the creation of testable models that can further enrich the knowledge on plant systems. This then could lead to the design of more specialized intervention technologies along with potential commercial applications: innovation as a result of integration.

SUPPLEMENTARY DATA

Supplementary data are available online at http://bib.oxfordjournals.org/.

Key Points

  • Considering the importance of plants for human life, and for all life on Earth in general, plant pathway databases (and support for their continued existence after creation) deserve more attention than they have thus far received.

  • Plant pathway databases, being fewer in number, are more amenable to format standardization and data integration. Upcoming plant pathway databases should strive to follow standard formats and aim at database integration from the start. This can be facilitated through the formulation of Good Database Practice (GDbP).

  • Plant pathway database integration could help expand both basic and applied aspects of information utilization.

  • It is important that pathway databases/tools/resources be regularly curated and maintained. This demands improved strategies to provide continual financial support. The loss of a good database is expensive in terms of the years of hard work required to make a good database/tool/resource in the first place.

Acknowledgements

The authors are thankful to Prof. Eve Syrkin Wurtele and Dr Julie Dickerson at Iowa State University, who supported us in a number of ways with encouragement, sound advice, guidance and lots of good ideas. The authors thank the anonymous reviewers for their constructive criticisms and suggestions which have helped improve this article.

FUNDING

This material is based upon work supported by the National Science Foundation under Awards EEC-0813570 and MCB-0951170.

References

1. Bader GD, Cary MP, Sander C. Pathguide: a pathway resource list. Nucleic Acids Res 2006;34:D504-6.
2. Korc M. Pathways for aberrant angiogenesis in pancreatic cancer. Mol Cancer 2003;2:8.
3. Tsiantis M, Brown MI, Skibinski G, et al. Disruption of auxin transport is associated with aberrant leaf development in maize. Plant Physiol 1999;121:1163-8.
4. Kelley BP, Sharan R, Karp RM, et al. Conserved pathways within bacteria and yeast as revealed by global protein network alignment. Proc Natl Acad Sci USA 2003;100:11394-9.
5. Galperin MY, Koonin EV. Who's your neighbor? New computational approaches for functional genomics. Nat Biotechnol 2000;18:609-13.
6. Bumgarner RE, Yeung KY. Methods for the inference of biological pathways and networks. Methods Mol Biol 2009;541:225-45.
7. Pradines J, Rudolph-Owen L, Hunter J, et al. Detection of activity centers in cellular pathways using transcript profiling. J Biopharm Stat 2004;14:701-21.
8. Apic G, Ignjatovic T, Boyer S, et al. Illuminating drug discovery with biological pathways. FEBS Lett 2005;579:1872-7.
9. Schreiber SL. Target-oriented and diversity-oriented organic synthesis in drug discovery. Science 2000;287:1964-9.
10. Dixon N, Duncan JN, Geerlings T, et al. Reengineering orthogonally selective riboswitches. Proc Natl Acad Sci USA 2010;107:2830-5.
11. Mueller LA, Zhang P, Rhee SY. AraCyc: a biochemical pathway database for Arabidopsis. Plant Physiol 2003;132:453-60.
12. Huang YJ, Hang D, Lu LJ, et al. Targeting the human cancer pathway protein interaction network by structural genomics. Mol Cell Proteomics 2008;7:2048-60.
13. Lynn DJ, Winsor GL, Chan C, et al. InnateDB: facilitating systems-level analyses of the mammalian innate immune response. Mol Syst Biol 2008;4:218.
14. Caspi R, Foerster H, Fulcher CA, et al. The MetaCyc Database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Res 2008;36:D623-31.
15. Liang C, Jaiswal P, Hebbard C, et al., Gramene: a growing plant comparative genomics resource, Nucleic Acids Res, 2008, vol. 36 (pg. D947-53)
16. Davuluri RV, Sun H, Palaniswamy SK, et al., AGRIS: Arabidopsis gene regulatory information server, an information resource of Arabidopsis cis-regulatory elements and transcription factors, BMC Bioinformatics, 2003, vol. 4 pg. 25
17. Matys V, Fricke E, Geffers R, et al., TRANSFAC: transcriptional regulation, from patterns to profiles, Nucleic Acids Res, 2003, vol. 31 (pg. 374-8)
18. Lescot M, Dehais P, Thijs G, et al., PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences, Nucleic Acids Res, 2002, vol. 30 (pg. 325-7)
19. Guo AY, Chen X, Gao G, et al., PlantTFDB: a comprehensive plant transcription factor database, Nucleic Acids Res, 2008, vol. 36 (pg. D966-9)
20. Bulow L, Steffens NO, Galuschka C, et al., AthaMap: from in silico data to real transcription factor binding sites, In Silico Biol, 2006, vol. 6 (pg. 243-52)
21. Stuart JM, Segal E, Koller D, et al., A gene-coexpression network for global discovery of conserved genetic modules, Science, 2003, vol. 302 (pg. 249-55)
22. Cui J, Li P, Li G, et al., AtPID: Arabidopsis thaliana protein interactome database--an integrative platform for plant systems biology, Nucleic Acids Res, 2008, vol. 36 (pg. D999-1008)
23. Aranda B, Achuthan P, Alam-Faruque Y, et al., The IntAct molecular interaction database in 2010, Nucleic Acids Res, 2010, vol. 38 (pg. D525-31)
24. Lin M, Shen X, Chen X, PAIR: the predicted Arabidopsis interactome resource, Nucleic Acids Res, 2011 [Epub ahead of print 2010]
25. Jensen LJ, Kuhn M, Stark M, et al., STRING 8--a global view on proteins and their functional interactions in 630 organisms, Nucleic Acids Res, 2009, vol. 37 (pg. D412-6)
26. Elkon R, Vesterman R, Amit N, et al., SPIKE--a database, visualization and analysis tool of cellular signaling pathways, BMC Bioinformatics, 2008, vol. 9 pg. 110
27. Button DK, Gartland KM, Ball LD, et al., DRASTIC--INSIGHTS: querying information in a plant gene expression database, Nucleic Acids Res, 2006, vol. 34 (pg. D712-6)
28. Shameer K, Ambika S, Varghese SM, et al., STIFDB-Arabidopsis Stress Responsive Transcription Factor DataBase, Int J Plant Genomics, 2009, vol. 2009 pg. 583429
29. Bulow L, Schindler M, Choi C, et al., PathoPlant: a database on plant-pathogen interactions, In Silico Biol, 2004, vol. 4 (pg. 529-36)
30. Bulow L, Schindler M, Hehl R, PathoPlant: a platform for microarray expression data to analyze co-regulated genes involved in plant defense responses, Nucleic Acids Res, 2007, vol. 35 (pg. D841-5)
31. Rodriguez MC, Petersen M, Mundy J, Mitogen-activated protein kinase signaling in plants, Annu Rev Plant Biol, 2010, vol. 61 (pg. 621-49)
32. Lam HM, Chiu J, Hsieh MH, et al., Glutamate-receptor genes in plants, Nature, 1998, vol. 396 (pg. 125-6)
33. Kim SA, Kwak JM, Jae SK, et al., Overexpression of the AtGluR2 gene encoding an Arabidopsis homolog of mammalian glutamate receptors impairs calcium utilization and sensitivity to ionic stress in transgenic plants, Plant Cell Physiol, 2001, vol. 42 (pg. 74-84)
34. Kang J, Turano FJ, The putative glutamate receptor 1.1 (AtGLR1.1) functions as a regulator of carbon and nitrogen metabolism in Arabidopsis thaliana, Proc Natl Acad Sci USA, 2003, vol. 100 (pg. 6872-7)
35. Kang J, Mehta S, Turano FJ, The putative glutamate receptor 1.1 (AtGLR1.1) in Arabidopsis thaliana regulates abscisic acid biosynthesis and signaling to control development and water loss, Plant Cell Physiol, 2004, vol. 45 (pg. 1380-9)
36. Li J, Zhu S, Song X, et al., A rice glutamate receptor-like gene is critical for the division and survival of individual cells in the root apical meristem, Plant Cell, 2006, vol. 18 (pg. 340-9)
37. Brenner ED, Martinez-Barboza N, Clark AP, et al., Arabidopsis mutants resistant to S(+)-beta-methyl-alpha, beta-diaminopropionic acid, a cycad-derived glutamate receptor agonist, Plant Physiol, 2000, vol. 124 (pg. 1615-24)
38. Kang S, Kim HB, Lee H, et al., Overexpression in Arabidopsis of a plasma membrane-targeting glutamate receptor from small radish increases glutamate-mediated Ca2+ influx and delays fungal infection, Mol Cells, 2006, vol. 21 (pg. 418-27)
39. Bolouri-Moghaddam MR, Le Roy K, Xiang L, et al., Sugar signalling and antioxidant network connections in plant cells, FEBS J, 2010, vol. 277 (pg. 2022-37)
40. Chory J, Light signal transduction: an infinite spectrum of possibilities, Plant J, 2010, vol. 61 (pg. 982-991)
41. Chung HS, Niu Y, Browse J, et al., Top hits in contemporary JAZ: an update on jasmonate signaling, Phytochemistry, 2009, vol. 70 (pg. 1547-59)
42. Jia M, Choi SY, Reiners D, et al., MetNetGE: interactive views of biological networks and ontologies, BMC Bioinformatics, 2010, vol. 11 pg. 469
43. Dahlquist KD, Salomonis N, Vranizan K, et al., GenMAPP, a new tool for viewing and analyzing microarray data on biological pathways, Nat Genet, 2002, vol. 31 (pg. 19-20)
44. Gehlenborg N, O'Donoghue SI, Baliga NS, et al., Visualization of omics data for systems biology, Nat Methods, 2010, vol. 7 (pg. S56-68)
45. Suderman M, Hallett M, Tools for visually exploring biological networks, Bioinformatics, 2007, vol. 23 (pg. 2651-9)
46. Saraiya P, North C, Duca K, Visualizing biological pathways: requirements analysis, systems evaluation and research agenda, Inf Vis, 2005, vol. 4 (pg. 191-205)
47. Cary MP, Bader GD, Sander C, Pathway information for systems biology, FEBS Lett, 2005, vol. 579 (pg. 1815-20)
48. Hucka M, Finney A, Sauro HM, et al., The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models, Bioinformatics, 2003, vol. 19 (pg. 524-31)
49. Luciano JS, PAX of mind for pathway researchers, Drug Discov Today, 2005, vol. 10 (pg. 937-42)
50. Wurtele ES, Li L, Berleant D, et al., MetNet: Systems Biology Software for Arabidopsis, In: Nikolau BJ, Wurtele ES (eds), Concepts in Plant Metabolomics, 2007, Springer (pg. 145-58)
51. Hermjakob H, Montecchi-Palazzi L, Bader G, et al., The HUPO PSI's molecular interaction format--a community standard for the representation of protein interaction data, Nat Biotechnol, 2004, vol. 22 (pg. 177-83)
52. Lloyd CM, Halstead MD, Nielsen PF, CellML: its future, present and past, Prog Biophys Mol Biol, 2004, vol. 85 (pg. 433-50)
53. Liao YM, Ghanadan H, The chemical markup language, Anal Chem, 2002, vol. 74 (pg. 389A-90A)
54. Hanisch D, Zimmer R, Lengauer T, ProML--the protein markup language for specification of protein sequences, structures and families, In Silico Biol, 2002, vol. 2 (pg. 313-24)
55. Spellman PT, Miller M, Stewart J, et al., Design and implementation of microarray gene expression markup language (MAGE-ML), Genome Biol, 2002, vol. 3 pg. RESEARCH0046
56. Taubert J, Sieren KP, Hindle M, et al., The OXL format for the exchange of integrated datasets, J Integr Bioinf, 2007, vol. 4 (pg. 62-75)
57. Sucaet Y, Wurtele ES, MetNetAPI: A flexible method to access and manipulate biological network data from MetNet, BMC Res Notes, 2010, vol. 3 pg. 312
58. Heath AP, Kavraki LE, Computational challenges in systems biology, Comput Sci Rev, 2009, vol. 3 (pg. 1-17)
59. Kanehisa M, Goto S, KEGG: kyoto encyclopedia of genes and genomes, Nucleic Acids Res, 2000, vol. 28 (pg. 27-30)
60. Kanehisa M, Goto S, Furumichi M, et al., KEGG for representation and analysis of molecular networks involving diseases and drugs, Nucleic Acids Res, 2010, vol. 38 (pg. D355-60)
61. Lysenko A, Hindle MM, Taubert J, et al., Data integration for plant genomics--exemplars from the integration of Arabidopsis thaliana databases, Brief Bioinform, 2009, vol. 10 (pg. 676-93)
62. Mentzen WI, Peng J, Ransom N, et al., Articulation of three core metabolic processes in Arabidopsis: fatty acid biosynthesis, leucine catabolism and starch metabolism, BMC Plant Biol, 2008, vol. 8 pg. 76
63. Beisson F, Koo AJ, Ruuska S, et al., Arabidopsis genes involved in acyl lipid metabolism. A 2003 census of the candidates, a study of the distribution of expressed sequence tags in organs, and a web-based database, Plant Physiol, 2003, vol. 132 (pg. 681-97)
64. de Oliveira Dal'Molin CG, Quek LE, Palfreyman RW, et al., AraGEM, a genome-scale reconstruction of the primary metabolic network in Arabidopsis, Plant Physiol, 2010, vol. 152 (pg. 579-89)
65. Reumann S, Ma C, Lemke S, et al., AraPerox. A database of putative Arabidopsis proteins from plant peroxisomes, Plant Physiol, 2004, vol. 136 (pg. 2587-608)
66. Poole RL, The TAIR database, Methods Mol Biol, 2007, vol. 406 (pg. 179-212)
67. Grant MR, Jones JD, Hormone (dis)harmony moulds plant health and disease, Science, 2009, vol. 324 (pg. 750-752)
68. Rasche A, Al-Hasani H, Herwig R, Meta-analysis approach identifies candidate genes and associated molecular networks for type-2 diabetes mellitus, BMC Genomics, 2008, vol. 9 pg. 310
69. Matthews L, Gopinath G, Gillespie M, et al., Reactome knowledgebase of human biological pathways and processes, Nucleic Acids Res, 2009, vol. 37 (pg. D619-22)
70. Karp PD, Ouzounis CA, Moore-Kochlacs C, et al., Expansion of the BioCyc collection of pathway/genome databases to 160 genomes, Nucleic Acids Res, 2005, vol. 33 (pg. 6083-9)
71. Ashburner M, Ball CA, Blake JA, et al., Gene ontology: tool for the unification of biology. The Gene Ontology Consortium, Nat Genet, 2000, vol. 25 (pg. 25-9)
72. de Chassey B, Navratil V, Tafforeau L, et al., Hepatitis C virus infection protein network, Mol Syst Biol, 2008, vol. 4 pg. 230
73. Bader GD, Betel D, Hogue CW, BIND: the Biomolecular Interaction Network Database, Nucleic Acids Res, 2003, vol. 31 (pg. 248-50)
74. Stark C, Breitkreutz BJ, Reguly T, et al., BioGRID: a general repository for interaction datasets, Nucleic Acids Res, 2006, vol. 34 (pg. D535-9)
75. Xenarios I, Salwinski L, Duan XJ, et al., DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions, Nucleic Acids Res, 2002, vol. 30 (pg. 303-5)
76. Lu Z, Cohen KB, Hunter L, GeneRIF quality assurance as summary revision, Pac Symp Biocomput, 2007 (pg. 269-80)
77. Chatr-aryamontri A, Ceol A, Palazzi LM, et al., MINT: the Molecular INTeraction database, Nucleic Acids Res, 2007, vol. 35 (pg. D572-4)
78. Neerincx PB, Leunissen JA, Evolution of web services in bioinformatics, Brief Bioinform, 2005, vol. 6 (pg. 178-188)
79. Schwab W, Metabolome diversity: too few genes, too many metabolites?, Phytochemistry, 2003, vol. 62 (pg. 837-49)
80. Tsesmetzis N, Couchman M, Higgins J, et al., Arabidopsis reactome: a foundation knowledgebase for plant systems biology, Plant Cell, 2008, vol. 20 (pg. 1426-36)
81. De Bodt S, Carvajal D, Hollunder J, et al., CORNET: a user-friendly tool for data mining and integration, Plant Physiol, 2010, vol. 152 (pg. 1167-79)
82. Barrett T, Edgar R, Gene expression omnibus: microarray data storage, submission, retrieval, and analysis, Methods Enzymol, 2006, vol. 411 (pg. 352-69)
83. Bruskiewich R, Coe EH, Jaiswal P, et al., The plant ontology consortium and plant ontologies, Comp Funct Genomics, 2002, vol. 3 (pg. 137-42)
84. Avraham S, Tung CW, Ilic K, et al., The Plant Ontology Database: a community resource for plant structure and developmental stages controlled vocabulary and annotations, Nucleic Acids Res, 2008, vol. 36 (pg. D449-54)
85. Ilic K, Kellogg EA, Jaiswal P, et al., The plant structure ontology, a unified vocabulary of anatomy and morphology of a flowering plant, Plant Physiol, 2007, vol. 143 (pg. 587-99)
86. Whetzel PL, Parkinson H, Causton HC, et al., The MGED Ontology: a resource for semantics-based description of microarray experiments, Bioinformatics, 2006, vol. 22 (pg. 866-73)
87. Geisler-Lee J, O'Toole N, Ammar R, et al., A predicted interactome for Arabidopsis, Plant Physiol, 2007, vol. 145 (pg. 317-29)
88. De Bodt S, Proost S, Vandepoele K, et al., Predicting protein-protein interactions in Arabidopsis thaliana through integration of orthology, gene ontology and co-expression, BMC Genomics, 2009, vol. 10 pg. 288
89. Heazlewood JL, Verboom RE, Tonti-Filippini J, et al., SUBA: the Arabidopsis Subcellular Database, Nucleic Acids Res, 2007, vol. 35 (pg. D213-8)
90. Bannai H, Tamada Y, Maruyama O, et al., Extensive feature detection of N-terminal protein sorting signals, Bioinformatics, 2002, vol. 18 (pg. 298-305)
91. Nair R, Rost B, Mimicking cellular sorting improves prediction of subcellular localization, J Mol Biol, 2005, vol. 348 (pg. 85-100)
92. Guda C, Fahy E, Subramaniam S, MITOPRED: a genome-scale method for prediction of nucleus-encoded mitochondrial proteins, Bioinformatics, 2004, vol. 20 (pg. 1785-94)
93. Claros MG, MitoProt, a Macintosh application for studying mitochondrial proteins, Comput Appl Biosci, 1995, vol. 11 (pg. 441-7)
94. Hoglund A, Donnes P, Blum T, et al., MultiLoc: prediction of protein subcellular localization using N-terminal targeting sequences, sequence motifs and amino acid composition, Bioinformatics, 2006, vol. 22 (pg. 1158-65)
95. Emanuelsson O, Elofsson A, von Heijne G, et al., In silico prediction of the peroxisomal proteome in fungi, plants and animals, J Mol Biol, 2003, vol. 330 (pg. 443-56)
96. Small I, Peeters N, Legeai F, et al., Predotar: A tool for rapidly screening proteomes for N-terminal targeting sequences, Proteomics, 2004, vol. 4 (pg. 1581-90)
97. Chen H, Huang N, Sun Z, SubLoc: a server/client suite for protein subcellular location based on SOAP, Bioinformatics, 2006, vol. 22 (pg. 376-7)
98. Emanuelsson O, Brunak S, von Heijne G, et al., Locating proteins in the cell using TargetP, SignalP and related tools, Nat Protoc, 2007, vol. 2 (pg. 953-71)
99. Horton P, Park KJ, Obayashi T, et al., WoLF PSORT: protein localization predictor, Nucleic Acids Res, 2007, vol. 35 (pg. W585-7)
100. Nielsen H, Brunak S, von Heijne G, Machine learning approaches for the prediction of signal peptides and other protein sorting signals, Protein Eng, 1999, vol. 12 (pg. 3-9)
101. Bender A, van Dooren GG, Ralph SA, et al., Properties and prediction of mitochondrial transit peptides from Plasmodium falciparum, Mol Biochem Parasitol, 2003, vol. 132 (pg. 59-66)
102. Schneider G, Fechner U, Advances in the prediction of protein targeting signals, Proteomics, 2004, vol. 4 (pg. 1571-80)
103. Kaundal R, Saini R, Zhao PX, Combining machine learning and homology-based approaches to accurately predict subcellular localization in Arabidopsis, Plant Physiol, 2010, vol. 154 (pg. 36-54)
104. Kaundal R, Raghava GP, RSLpred: an integrative system for predicting subcellular localization of rice proteins combining compositional and evolutionary information, Proteomics, 2009, vol. 9 (pg. 2324-42)
105. Thimm O, Blasing O, Gibon Y, et al., MAPMAN: a user-driven tool to display genomics data sets onto diagrams of metabolic pathways and other biological processes, Plant J, 2004, vol. 37 (pg. 914-39)
106. Sun Q, Zybailov B, Majeran W, et al., PPDB, the Plant Proteomics Database at Cornell, Nucleic Acids Res, 2009, vol. 37 (pg. D969-74)
107. Heazlewood JL, Tonti-Filippini JS, Gout AM, et al., Experimental analysis of the Arabidopsis mitochondrial proteome highlights signaling and regulatory components, provides assessment of targeting prediction programs, and indicates plant-specific mitochondrial proteins, Plant Cell, 2004, vol. 16 (pg. 241-56)
108. Brown JW, Shaw PJ, Shaw P, et al., Arabidopsis nucleolar protein database (AtNoPDB), Nucleic Acids Res, 2005, vol. 33 (pg. D633-6)
109. Kleffmann T, Hirsch-Hoffmann M, Gruissem W, et al., plprot: a comprehensive proteome database for different plastid types, Plant Cell Physiol, 2006, vol. 47 (pg. 432-6)
110. Chang A, Scheer M, Grote A, et al., BRENDA, AMENDA and FRENDA the enzyme information system: new content and tools in 2009, Nucleic Acids Res, 2009, vol. 37 (pg. D588-92)
111. Degtyarenko K, de Matos P, Ennis M, et al., ChEBI: a database and ontology for chemical entities of biological interest, Nucleic Acids Res, 2008, vol. 36 (pg. D344-50)
112. Wang Y, Xiao J, Suzek TO, et al., PubChem: a public information system for analyzing bioactivities of small molecules, Nucleic Acids Res, 2009, vol. 37 (pg. W623-33)
113. Sitzmann M, Filippov IV, Nicklaus MC, Internet resources integrating many small-molecule databases, SAR QSAR Environ Res, 2008, vol. 19 (pg. 1-9)
114. Van Hemert JL, Dickerson JA, PathwayAccess: CellDesigner plugins for pathway databases, Bioinformatics, 2010, vol. 26 (pg. 2345-6)
115. Grimplet J, Cramer GR, Dickerson JA, et al., VitisNet: “Omics” Integration through Grapevine Molecular Networks, PLoS ONE, 2009, vol. 4 pg. e8365
116. Grafahrend-Belau E, Weise S, Koschutzki D, et al., MetaCrop: a detailed database of crop plant metabolism, Nucleic Acids Res, 2008, vol. 36 (pg. D954-8)
117. Lemer C, Antezana E, Couche F, et al., The aMAZE LightBench: a web interface to a relational database of cellular processes, Nucleic Acids Res, 2004, vol. 32 (pg. D443-8)
118. van Helden J, Naim A, Lemer C, et al., From molecular activities and processes to biological function, Brief Bioinform, 2001, vol. 2 (pg. 81-93)
119. D'Souza M, Glass EM, Syed MH, et al., Sentra: a database of signal transduction proteins for comparative genome analysis, Nucleic Acids Res, 2007, vol. 35 (pg. D271-3)
120. Selkov E, Basmanova S, Gaasterland T, et al., The metabolic pathway collection from EMP: the enzymes and metabolic pathways database, Nucleic Acids Res, 1996, vol. 24 (pg. 26-8)
121. Philippi S, Kohler J, Addressing the problems with life-science databases for traditional uses and systems biology, Nat Rev Genet, 2006, vol. 7 (pg. 482-8)
122. Chandras C, Weaver T, Zouberakis M, et al., Models for financial sustainability of biological databases and resources, Database, 2009, vol. 106 (pg. 17475-80)
123. Bastow R, Leonelli S, Sustainable digital infrastructure, EMBO Rep, 2010, vol. 11 (pg. 730-4)

Supplementary data