Evolution and applications of plant pathway resources and databases

Table 1:

Overview of plant species specific metabolic pathway databases

Organism	Database	Version	Location
Arabidopsis thaliana	AraCyc	7.0.0.0	http://www.arabidopsis.org/biocyc/index.jsp
	AraCyc	6.0.0.0	http://pathway.gramene.org/ARA/class-tree?object=Pathways
Oryza sativa japonica	RiceCyc	3.0.0.0	http://pathway.gramene.org/RICE/class-tree?object=Pathways
Sorghum bicolor	SorghumCyc	1.0.0.0	http://pathway.gramene.org/SORGHUM/class-tree?object=Pathways
Medicago truncatula	MedicCyc	1.0.1.1	http://pathway.gramene.org/MEDIC/class-tree?object=Pathways
Solanum lycopersicum	LycoCyc	2.0.1.1	http://pathway.gramene.org/LYCO/class-tree?object=Pathways
	LycoCyc	2.0.0.0	http://solcyc.solgenomics.net/LYCO/server.html
Capsicum	CapCyc	1.0.1.1	http://pathway.gramene.org/CAP/class-tree?object=Pathways
	CapCyc	2.1.0.0	http://solcyc.solgenomics.net/CAP/server.html
Solanum tuberosum	PotatoCyc	1.0.1.1	http://pathway.gramene.org/POTATO/class-tree?object=Pathways
	PotatoCyc	1.1.0.0	http://solcyc.solgenomics.net/POTATO/organism-summary?object=POTATO
Coffea canephora	CoffeaCyc	1.1.1.0	http://pathway.gramene.org/COFFEA/class-tree?object=Pathways
	CoffeaCyc	1.1.0.0	http://solcyc.solgenomics.net/COFFEA/organism-summary?object=COFFEA
Vitis vinifera	VitisNet		http://www.sdstate.edu/aes/vitis/pathways.cfm
Populus trichocarpa	PoplarCyc	2.0.0.0	http://pmn.plantcyc.org/POPLAR/server.html?
Petunia x hybrida	PetuniaCyc	2.1.1.0	http://solcyc.solgenomics.net/PET/server.html?
Solanum melongena	SolaCyc	1.2.0.0	http://solcyc.solgenomics.net/SOLA/organism-summary?object=SOLA
Nicotiana tabacum	NicotianaCyc	1.1.0.0	http://solcyc.solgenomics.net/TOBACCO/server.html?

The pathways section of the Gramene database [15] (a database for grasses such as rice, maize, sorghum, barley, oats, wheat and rye) contains the known and predicted biochemical pathways of rice (RiceCyc) and sorghum (SorghumCyc), both of which are curated by Gramene and were built using the PathoLogic module of Pathway Tools. The website also mirrors the known and predicted biochemical pathways from SolCyc, AraCyc, EcoCyc and the MetaCyc reference databases.

The ‘gold standard’ AraCyc database for A. thaliana was built using the PathoLogic module of Pathway Tools with MetaCyc. In addition, AraCyc relies on manual curation to enrich its data. The trade-off is slower progress in completing the network, yet the end result is thoroughly documented and has a more accurate structure. One can argue that databases are of higher quality when domain experts scrutinize the available literature and manually curate them, because experts can apply their scientific experience and intuition in a way that no algorithm can yet mimic. However, this depends on the availability of such experts, and for genome-wide projects it is certainly challenging to bring together all the experts potentially involved.

The success of AraCyc has led to a broader, plant-centric rather than organism-centric initiative, the Plant Metabolic Network (PMN) (available at http://www.plantcyc.org/). This is a collaborative project to build a broad network of plant metabolic pathway databases. PlantCyc, which incorporates some data from MetaCyc, is the central feature of PMN. It is a database containing manually curated or reviewed information about shared metabolic pathways present in more than 300 plant species. PlantCyc serves as a reference database, while PMN also contains databases dedicated to a single species or taxon. Additionally, PMN has a small number of pathways that are known to be present in other organisms and are predicted to exist in plants.

‘Gene regulatory networks’ consist of transcription factors and the genes that they regulate. These networks comprise protein–DNA interactions and may also include sRNA/miRNA and sRNA/miRNA target gene regulation. A regulatory network is formed by a series of events where regulation of one gene leads to the control of another. An example of a regulatory network database is the Arabidopsis Gene Regulatory Information Server (AGRIS) [16], which contains information on the transcription factors and cis-regulatory elements that are regulated by them in A. thaliana. AGRIS presently consists of three databases: AtcisDB, AtTFDB and AtRegNet. AtcisDB contains upstream regions of annotated A. thaliana genes and describes the experimentally validated and predicted cis-regulatory elements. AtTFDB holds information on the transcription factors grouped into 50 conserved domain families. AtRegNet describes direct interactions between transcription factors and target genes. AGRIS also contains a Regulatory Networks Interaction Module (ReIN), which allows creation, visualization and identification of regulatory networks in A. thaliana. While AGRIS contains data from sequence annotations, TRANSFAC [17] is a gene regulatory network database that contains data on transcription factors, their experimentally proven binding sites and the genes they regulate in 300 species. TRANSFAC is one of the few proprietary plant database resources in PathGuide.

PlantCARE [18] is a database of plant cis-acting regulatory elements where the data on the transcription sites are extracted from literature supplemented with predicted data. PlantCARE provides levels of confidence for experimental evidence, functional information and position of the promoter. Additionally, a plant DNA query sequence can be searched for cis-regulatory elements using a query tool in PlantCARE.

PlantTFDB [19] is a recently constructed database that contains transcription factors from 49 plant species, grouped into 58 families. Each transcription factor is comprehensively annotated with respect to functional domains, 3D structures, gene ontology, gene expression information from expressed sequence tags (ESTs) and microarrays and annotations from other databases.

AthaMap [20] is a genome-wide map of published or experimentally determined transcription factor binding sites (TFBS) in A. thaliana. It also includes predicted sites. AthaMap allows searching for a genomic sequence or a gene to display the potential TFBS. It also provides search functionality for user defined potential co-localization elements. Genes of interest can be analyzed for identification of common TFBSs. Conversely, genes that harbor specific TFBS can also be identified using AthaMap.
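
The two search directions described above (from a sequence to its potential sites, and from a site to the sequences that carry it) can be illustrated with a very small sketch. It uses exact string matching on both strands, which is a deliberate simplification: binding-site resources typically rely on richer models such as position weight matrices, and nothing here reflects how AthaMap itself is implemented. The promoter sequence and the choice of motifs are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def find_sites(sequence, motif):
    """Return sorted (position, strand) hits of an exact motif, 0-based."""
    sequence, motif = sequence.upper(), motif.upper()
    rc = reverse_complement(motif)
    hits = set()
    for i in range(len(sequence) - len(motif) + 1):
        window = sequence[i:i + len(motif)]
        if window == motif:
            hits.add((i, "+"))
        # Palindromic motifs are reported once, on the forward strand.
        if window == rc and rc != motif:
            hits.add((i, "-"))
    return sorted(hits)


# Hypothetical promoter fragment, used only for illustration.
promoter = "TTGACACGTGGCAATTGTCAA"
forward_and_reverse = find_sites(promoter, "TTGAC")
palindromic = find_sites(promoter, "CACGTG")
```

Running the same function over a collection of promoters and keeping those with at least one hit gives the converse query, i.e. the genes that harbor a given site.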

Gene co-expression network databases for plants are under development. Such databases contain information on the co-expression of genes across a large number of experimental conditions. They can be used to identify genes involved in a certain function, to identify cis-regulatory elements and to construct regulatory networks (although co-expression does not necessarily mean co-regulation [21]), and they can assist with many other biological problems. Some examples of gene co-expression networks and their applications are discussed in the Supplementary Data.
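
As a minimal sketch of how a co-expression network can be derived from expression data, the following Python fragment computes pairwise Pearson correlations across conditions and keeps the gene pairs above a threshold. The gene names, expression values and the 0.9 cut-off are hypothetical and chosen only for illustration; real resources draw on far larger compendia and may use other similarity measures.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpression_edges(profiles, threshold=0.9):
    """Return (gene_a, gene_b, r) for pairs with |r| >= threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= threshold:
            edges.append((a, b, round(r, 3)))
    return edges


# Hypothetical expression values for three genes across five conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}
edges = coexpression_edges(profiles)
```

An edge in such a network indicates only that two profiles rise and fall together; as noted above, it does not by itself establish co-regulation.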

‘Protein–protein interaction pathways’ contain all interactions, stable or transient, between same or different proteins that are important for the functioning of a cell. Protein–protein interactions take place during protein modification, protein transport, protein oligomerization for activity/non-activity, chaperone assisted protein folding, signal transduction, etc. Protein–protein interaction pathways contain information on all these interactions. The A. thaliana protein interactome database (AtPID) is one such database [22]. It contains protein interaction pairs found through manual text mining or in silico predictions using various bioinformatics methods, along with protein pairs that have been confirmed.

It is now recognized that the experiments required to generate protein interaction data (e.g. yeast two-hybrid systems) often give false positives as well as false negatives, and hence it is important to use this type of data with caution. To discern whether a certain result is reliable, one needs to know the type of experiment and the conditions used, as well as details about the results. A rational assessment as to whether an interaction is truly possible in vivo can be made based on a variety of factors, including the domains involved in the interaction and the type of interaction. The IntAct database [23], which contains protein–protein interaction information on several organisms including plant systems, includes this level of detail.

Another database, the Predicted Arabidopsis Interactome Resource (PAIR) [24], predicts the potential interactions in A. thaliana using a support vector machine (SVM) model (a machine learning approach), combined with careful preparation of example data, selection of indirect evidence and tight control of false positives. We believe that the PAIR database is currently the most accurate and comprehensive database on A. thaliana protein–protein interactions.

Combining interaction data generated through experimental and predictive methods increases the coverage of an interactome and can lead to more reliable information. When the same data is obtained through different methods one can reasonably expect more accurate data. STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) [25] is a multi-organism (not limited to the kingdom Plantae) database that includes all available protein–protein interactions. It scores and weighs this information and augments it with predicted interactions and automated text-mining results. STRING includes both physical and functional information on the interactions. This adds an extra measure of reliability to the interaction data.
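
The intuition that agreement between independent methods raises confidence can be illustrated with a simple probabilistic combination. The sketch below assumes that each evidence channel yields an independent confidence score between 0 and 1 and combines the scores as 1 - prod(1 - s_i). This is an illustrative simplification, not a description of the scoring scheme that STRING itself applies, and the channel names and values are hypothetical.

```python
def combined_confidence(scores):
    """Combine independent per-channel confidences as 1 - prod(1 - s)."""
    remaining_doubt = 1.0
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
        # Each channel removes a fraction s of the doubt that is left.
        remaining_doubt *= (1.0 - s)
    return 1.0 - remaining_doubt


# Hypothetical confidences for one protein pair from three channels.
evidence = {"experimental": 0.6, "predicted": 0.5, "text_mining": 0.3}
total = combined_confidence(evidence.values())
```

Under the independence assumption the combined value is never lower than the strongest single channel, which mirrors the expectation that concordant evidence from different methods is more reliable than any one method alone.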

‘Signaling pathways’ comprise molecular networks in the signal transduction cascade. These are involved in transmission of information from one part of the cell to another (intracellular, e.g. from the cytoplasm to the nucleus) or from one cell to another (intercellular, e.g. from one neuron to another). Extracellular stimuli can also bring about the activation or inhibition of a pathway and thus a change in the cellular environment. Signaling pathways often involve protein–protein interactions at different levels like protein modification (e.g. protein phosphorylation), protein translocation and protein complex formation or dissociation. Several signaling pathway databases, for example SPIKE [26], exist for non-plant eukaryotes. INOH (hosted at http://www.inoh.org/) is a signaling pathway database for Drosophila melanogaster. SignaLink (hosted at http://signalink.org/) is a cross-species database that includes pathways from human, D. melanogaster and Caenorhabditis elegans. In contrast, few plant signaling pathway databases exist, and those that do lack the quality and efficiency of their non-plant counterparts. DRASTIC [27], the database resource for analysis of signal transduction in cells developed by the Scottish Crop Research Institute (SCRI), was one of the first relational databases in this area. It included ESTs and genes regulated in response to various environmental factors like pathogens, chemical exposure, drought, salt and low temperature. The data were collected from refereed journals. However, this reference resource is no longer available.

Recently, the stress response transcription factor database, STIFDB [28], has been created for A. thaliana. It contains the abiotic stress response genes that were found to be upregulated in microarray experiments, with options to identify possible transcription factor binding sites. PathoPlant [29, 30] is another relational database that contains components of signal transduction pathways related to plant pathogenesis. It also contains microarray data of genes expressed in response to pathogens.

There is a glaring need for plant signaling pathway databases that contain and regularly update all proven and potential/putative signaling pathways in plants as these are discovered. MAPK signaling cascades were discovered more than 15 years ago in plants [31]. Analogues of pathways that were only known in animals are now being found as well. For example, glutamate receptors (iGluRs) that are involved in excitatory neurotransmission pathways have been extensively studied in the animal kingdom and have been included in several pathway databases. Glutamate receptor-like proteins (GLRs) were reported in 1998 in A. thaliana [32]. Since then, transgenic plant studies and pharmacological studies have suggested that these proteins, in A. thaliana and other plants, are involved in a wide array of pathways. Suggested functions include Ca²⁺ allocation [33], carbon/nitrogen sensing [34], regulation of abscisic acid and water balance [35], coordinating mitosis in root apical meristem [36], light signal transduction [37] and resistance to fungal infections [38]. Both MAPKs and glutamate receptor-like proteins from A. thaliana are included in a few plant pathway databases like AtPID. However, it is difficult for a biologist looking for pathways involved in resistance to fungal infections, for example, to come immediately across the glutamate receptor-like system, or conversely to find all the plant pathways that glutamate receptor-like proteins are involved in by using a keyword. Such databases would be essential to ‘de-specialize’ information and make it available to a wider range of scientists. This also highlights the need for such databases to be freely available, so that biologists with an interest in a particular pathway can retrieve all the relevant information available, irrespective of the system or field in which they work (plant, animal, microbial and so on).

Signaling pathway mechanisms like sugar signaling [39], light signaling [40], jasmonate signaling [41] and their components have been discovered in plants and call for dedicated pathway databases. Looking at the signaling pathways and the properties that these affect in plants, it can be concluded that these pathways cross-connect. It is important to understand these pathways and to integrate this information with other databases in order to obtain a more complete picture which would then enable plant scientists to modulate certain plant properties without affecting other mechanisms and pathways.

Pathway visualization tools

Visualization of pathway data is important not only to understand the data, but also to analyze them and to build valid hypotheses based on them. To address these requirements, many pathway/network visualization tools have been constructed with different functionalities. The level of visualization that these tools offer ranges from simple two-dimensional pathway maps like those provided by KEGG, to three-dimensional and hierarchical visualizations in immersive virtual reality (C6) environments like those provided by MetNetGE [42]. Interactive visualization, as provided by GenMAPP [43], allows users to analyze, edit and modify the pathways based on their own experimental data. Gehlenborg et al. [44] have thoroughly reviewed the available pathway visualization tools and have broadly divided them into two partly overlapping categories: tools focused on automated methods for interpreting and exploring large biological networks, and tools focused on assembly and curation of pathways. Many of these tools integrate with public databases, allowing the users to analyze and visualize their own data. Another exhaustive overview of visualization tools has been presented by Suderman and Hallett [45]. For a critical evaluation of the requirements for biological visualization tools based on interviews conducted to understand the needs for pathway analysis, see ref. [46].

Pathway database evolution through integration

An individual pathway database holds a variety of information. This has proved to be challenging for scientists who want to access and use this information. Information is scattered across various databases that differ not only in the type of data they contain, but also the form in which they exist. Additionally, in an actual living cell, the pathways are vastly interconnected. Integration of pathway databases thus becomes imperative in order to understand a biological mechanism in its entirety. Researchers interested in a particular biological mechanism should be able to easily find and access all the data they need, without having to go through the difficult process of shifting data from different databases that are based on different platforms.

One of the biggest challenges to the integration of databases is their diversity. The existing databases have syntactic differences in the form of data file formats and retrieval methods and semantic differences in the terminologies and data models [47]. Several pathway database resources listed in Pathguide are not machine-readable. Machine-readability is an essential requirement for automatic data retrieval and processing. Recognition of these challenges has prompted increased efforts to establish pathway ontology standards for defining models. Systems Biology Markup Language (SBML) has presented itself as one such standard for storing and sharing computational models of biological networks [48]. Another, BioPAX [49], was developed for detailed pathway depiction and for permitting data exchange, as used in the development of MetNet [50]. PSI-MI [51] allows data exchange for protein–protein interactions, while CellML [52] enables storage and exchange of computer-based mathematical models. Other data exchange formats exist that are peripherally associated with network data and can serve as input for other software packages that determine such networks. The Chemical Markup Language (CML) can be used to describe small molecules and ligands that participate in networks [53], whereas the Protein Markup Language (ProML), along with its predecessor PDB, can be used to characterize larger binding partners [54]. The Microarray Gene Expression Markup Language (MAGE-ML) can be used as input to determine gene co-expression networks under various conditions [55]. The Ondex eXchange Language (OXL) format claims superiority over a range of formats [56], but is more general and requires more coding to implement correctly. Finally, an Application Programming Interface (API) can be provided [57], but because each API applies to only one particular database, its peculiarities also require some study.
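
To illustrate what machine-readability offers in practice, the following sketch reads a small SBML-style document with Python's standard XML parser and lists each reaction with its reactants and products. The embedded model (identifiers, compartment and a deliberately simplified reaction) is a made-up fragment written for this example, and a production workflow would more likely rely on a dedicated SBML library than on generic XML parsing.

```python
import xml.etree.ElementTree as ET

# A made-up, minimal SBML Level 2 fragment used only for illustration.
SBML_TEXT = """
<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_model">
    <listOfCompartments>
      <compartment id="cytosol"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="glucose" compartment="cytosol"/>
      <species id="g6p" compartment="cytosol"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="hexokinase">
        <listOfReactants><speciesReference species="glucose"/></listOfReactants>
        <listOfProducts><speciesReference species="g6p"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""

# Namespace prefix mapping for the SBML level/version used in the fragment.
NS = {"sbml": "http://www.sbml.org/sbml/level2/version4"}


def list_reactions(sbml_text):
    """Return {reaction_id: (reactant_ids, product_ids)} from SBML text."""
    root = ET.fromstring(sbml_text.strip())
    reactions = {}
    for rxn in root.findall("sbml:model/sbml:listOfReactions/sbml:reaction", NS):
        reactants = [ref.get("species") for ref in
                     rxn.findall("sbml:listOfReactants/sbml:speciesReference", NS)]
        products = [ref.get("species") for ref in
                    rxn.findall("sbml:listOfProducts/sbml:speciesReference", NS)]
        reactions[rxn.get("id")] = (reactants, products)
    return reactions


reactions = list_reactions(SBML_TEXT)
```

Because the element names and nesting are fixed by the standard, the same few lines work on any file in that format, which is precisely what is not possible for resources that publish pathways only as images or free text.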

Providing an easy-to-use interface for end-users is challenging with formats that allow too many options. All of these standards are now being used by at least some pathway databases and are certainly steps in the right direction. While each is a laudable effort in its own right, the proliferation of different data formats creates its own problems: providers need to decide which formats to support, and supporting each format is a laborious and resource-intensive effort. As a result, data often still need to be converted from one format into another [58].

Ongoing efforts to automate data access and retrieval make the process much simpler for a biologist. KEGG [59] is a comprehensive resource for metabolic pathways. Its data were originally curated manually from the literature, and its pathways existed as simple drawings. All pathway maps in KEGG have since been redrawn using KegSketch. The resulting KGML+ files [60] are machine-readable and editable.

Plant pathway database integration is a challenge as far fewer plant genomes have been sequenced compared to other life forms (which makes it more difficult to base inferences on homology) and the data resources on plant pathways are more dispersed [61]. The uniqueness of secondary metabolism that exists in many systems adds another layer of complexity. It is, therefore, even more important for plant pathway databases to start incorporating and supporting already existing standard formats for better integration of information and knowledge extraction. The positive side of having a limited number of plant pathway databases is that standardization needs to be applied to a smaller number of pathways. This entails less work than what would be required in other settings.

Supplementary Table S1 shows the plant database resources available to date, with a short description and other information such as the availability of these databases, the organisms included, whether the database is included in Pathguide, access to the database, data sources and standard formats supported (if any). As can be seen from Figures 1 and 2, Table 1 and Supplementary Table S1, plant databases are still far from being overwhelmed with information and diversity load. This makes their standardization and implementation efforts much more realistic than for other systems. Furthermore, this in itself can pave the way for other systems to follow suit by learning from the successes and challenges of plant pathway database integration projects. It would therefore be a tremendously useful exercise for all upcoming plant pathway databases to follow universal standardization right from their conception. Perhaps journals should only accept the publication of databases that conform to what we term Good Databasing Practice (GDbP) standards (Table 2), thereby forcing these to become standard practice. Such practices have already been incorporated for microarray and sequencing results.

Table 2:

Overview of Good Databasing Practices

Good databasing practice	Usefulness
Easy user access	Easy access for even the non-specialists
Integrated visualization tools	Ease understanding and analysis of large data sets
Standard ontology	Ease of data exchange
Possibility to integrate data from other databases	Expansion of available information
Proper documentation of stored data; provision of source and reliability of original data	Possibility to get back to the original source if required, enable judgment of accuracy of inferred information
Provision of risk factors and probability of error propagation when deriving orthologs in another species	Using particular data with caution when inferring a pathway or an ortholog
Good user support	Good response time to user queries
Regular update/maintenance	Update information; removal of errors, bugs
Regular and professional data curation and annotation	Manual curation of the data/annotations to remove errors generated by automatic data retrieval; annotation—both derived from source and inferred—help describing an entity or an event

Applications of pathway database integration

Pathway database integration yields many potential advantages for the biologist and software developer alike. If successful, numerous applications will follow, many of which will be surprising or even unthinkable today. To better appreciate the potential of integration, a few case studies from other fields are presented.

One study [62] integrated transcriptomics data with data from three metabolic pathways: fatty acid synthesis genes from the Arabidopsis Lipid Gene Database [63] (http://lipids.plantbiology.msu.edu/), starch metabolism genes from the Starch Metabolism Network project (http://www.starchmetnet.org/) and the original references for leucine catabolism. This led to a picture that no individual study was able to show by itself. The integration revealed that each of these pathways is structured as a co-expressed module, with the possibility that these modules exist in a hierarchical organization. The transcripts from each module co-accumulate over a wide range of environmental and genetic perturbations and developmental stages.

In another case study [61], A. thaliana pathways from protein interaction databases were integrated with co-expression data using the Ondex system (http://www.ondex.org/). This method enabled the determination of co-expression of the interacting protein partners and the levels of expression.
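
The kind of integration described in this case study can be pictured as a join between two data sets keyed on protein or gene identifiers. The sketch below is not the Ondex workflow; it simply annotates hypothetical interaction pairs with a hypothetical co-expression score, assuming that both resources already share a common identifier scheme (in practice, mapping identifiers between resources is often a substantial part of the work).

```python
def annotate_interactions(interactions, coexpression):
    """Attach a co-expression score to each interaction pair, if known."""
    # Use an order-independent key so that (a, b) also matches (b, a).
    scores = {frozenset(pair): r for pair, r in coexpression.items()}
    annotated = []
    for a, b in interactions:
        # Pairs without expression data are kept and marked with None.
        annotated.append((a, b, scores.get(frozenset((a, b)))))
    return annotated


# Hypothetical identifiers and correlation values.
interactions = [("P1", "P2"), ("P2", "P3"), ("P1", "P4")]
coexpression = {("P2", "P1"): 0.91, ("P2", "P3"): 0.12}
result = annotate_interactions(interactions, coexpression)
```

Keeping pairs that lack expression data, rather than dropping them, makes gaps in coverage visible instead of silently shrinking the interactome.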

An interesting example of using database integration to obtain enhanced information about a system is AraGEM [64]. AraGEM is an attempt at building a genome-scale reconstruction of the primary metabolic network in A. thaliana. It used A. thaliana metabolic genome information from KEGG as a core, enriched with information on the cellular compartmentalization of metabolic pathways from the literature and from databases including AraPerox [65] and The Arabidopsis Information Resource (TAIR) [66]. A total of 75 essential primary metabolism reactions were identified for which genetic information was unknown. The resulting genome-scale model was then used to construct a metabolic flux model of plant metabolism representing both photosynthetic and non-photosynthetic cell types. The model was validated by simulation of plant metabolic functions inferred from the literature. AraGEM exemplifies how genome-scale models can first be built and then used to explore highly complex and compartmentalized eukaryotic networks and to construct and examine testable, non-trivial hypotheses.

A thorough literature search on plant pathways and newly discovered mechanisms can enable design of new applications through database integration. In plants, for example, hormonal and defense signaling pathways have been found to cross-talk through identical components [67]. An integration of these two types of information can point towards new targets to counteract the microbial components that decrease plant resistance and lead to disease.

Additional examples of applications of database integration are presented in the supplementary material.

Non-plant references and opportunities for the future

Human databases have already benefitted from integration of information from different pathway databases. For example, a meta-analysis study of Type-2 diabetes was conducted to find different genes that are involved in the disease. Various types of data were used: medical reviews, phenotype information, proteome analysis results, candidate gene lists from previous studies, differential gene expression and time series microarray studies [68]. The study also incorporated information from several pathway databases including KEGG, Reactome [69], BioCyc [70], GO [71], IntAct and TRANSFAC to add pathway information and to derive cellular network information on these genes. This allowed identification of 213 genes with overall disease relevance indicating common, tissue-independent processes related to the disease and also identified genes showing changes with respect to a single study.

In another study [72], an integrated human interactome network was constructed using physical and direct binary protein–protein interactions. Data were retrieved from a variety of sources: the Biomolecular Interaction Network Database (BIND), BioGRID, DIP, GeneRIF, IntAct, MINT and Reactome. Each of these plays a particular role in the integration scheme. BIND [73] contains data from large-scale cell mapping studies and molecular interactions in PDB. BioGRID [74] has protein and genetic interaction information as well as information from primary literature. DIP [75] contains experimentally determined protein–protein interactions. Gene Reference into Function (GeneRIF) [76] contains short text about curated articles that are relevant to known genes. IntAct contains highly curated interaction data from literature or direct deposition by experienced curators. MINT [77] focuses on experimentally verified protein–protein interactions, and Reactome is a knowledgebase containing interaction data in different pathways. The Hepatitis C virus (HCV)-host infection network, generated experimentally and from text mining, was also incorporated on top of this integrated interactome network, a type of meta-integration. This led to the identification of previously unknown functional pathways of HCV biology and its pathogenesis. A similar approach applied to crop plants and their pathogens could reveal plant host–pathogen interactions and the pathways involved in pathogenesis. This could in turn lead to the development of methods to confer pathogen resistance on crop plants or to target these pathways against the pathogen.
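The core step of such an integration, merging binary interactions from several sources while keeping track of which source reported each one, can be sketched as follows. This is an illustrative sketch rather than the procedure of [72]: the source names, protein identifiers and function names are hypothetical, and it assumes that identifiers have already been mapped to a common namespace.

```python
from collections import defaultdict


def merge_interactions(sources):
    """Merge undirected binary interactions from several sources.

    sources maps a source name to an iterable of (proteinA, proteinB) pairs.
    Returns a dict mapping each unordered pair to the set of reporting sources.
    """
    merged = defaultdict(set)
    for name, pairs in sources.items():
        for a, b in pairs:
            if a == b:
                continue  # self-interactions are ignored in this sketch
            merged[frozenset((a, b))].add(name)
    return dict(merged)


def supported_by(merged, min_sources):
    """Pairs reported by at least min_sources independent sources."""
    return sorted(
        tuple(sorted(pair))
        for pair, names in merged.items()
        if len(names) >= min_sources
    )


# Hypothetical toy input; (P1, P2) and (P2, P1) are the same interaction.
sources = {
    "db1": [("P1", "P2"), ("P2", "P3")],
    "db2": [("P2", "P1"), ("P3", "P4")],
}

merged = merge_interactions(sources)
print(len(merged))
print(supported_by(merged, 2))
```

Storing each pair as a `frozenset` makes the interaction undirected, so the same pair listed in opposite order by two databases is counted once with two supporting sources.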

Not only can plant science benefit from the animal pathway database and integration examples; animal biologists can in turn benefit from the study of plant pathways by asking whether pathways discovered only in plants to date also exist in animals, or how similar or different the pathway networks are that exist in both plants and animals. Many opportunities become available through such a feedback loop. Can we unlock more evolutionary secrets? Can we become better at harnessing plants for our use? Could human diseases be experimentally modeled in plants if common pathways do indeed exist for plants and animals? Applications are endless and the potential for knowledge creation extreme.

A survey of integrated pathway databases and tools

Two approaches exist to perform database integration: through the use of tools and through already integrated databases [78] (that hopefully get rebuilt periodically to stay current). Pathway database integration tools along with integrated pathway databases play a very important role in easing data integration for biologists. These tools can also be used for various other purposes like data visualization, pathway prediction, pathway gap-filling and biological network analysis. Applications of pathway databases and tools help further knowledge of the pathways and of the inner workings of living systems.

Pathway database tools for plant systems are important because of the widely dispersed information within several databases and a lack of consistency among these databases. A growing need exists to bring this information together in a standard format to aid access and model-building. Plants show more heterogeneity among different species (e.g. in terms of secondary metabolism [79]). This makes it even more important to integrate pathway data for all important plant species and to design tools that would aid in pointing out interspecies similarities and differences.

Arabidopsis Reactome [80], a separate version of Reactome, represents a knowledgebase of biological processes in A. thaliana and several other plant species. It integrates pathway information curated in-house, as well as from KEGG and AraCyc. It also provides a platform to navigate and discover interconnected pathways in A. thaliana. The data model of Arabidopsis Reactome uses reactions and their interconnections; it treats protein modifications, proteins localized in different compartments, as well as protein complexes, as entities in their own right. It furthermore allows generalization of protein isoforms, paralogues and splice variants with a possibility of tracing these components back. The model contains both real and inferred data along with proper annotations that allow distinction between the two.

Tools like CORNET [81] help integrate A. thaliana related microarray expression data. The data sets for CORNET were obtained from Gene Expression Omnibus (GEO) [82] and from experiments carried out on Affymetrix ATH1 arrays. Also retrieved were the corresponding meta-data (which are unstructured and hence cumbersome to retrieve and parse automatically), including information about sample tissues, treatments and sampling time points, protein interaction data, localization data and functional information. The meta-data carry manually assigned ontology terms from the Plant ontology [83–85], the Microarray gene expression data (MGED) ontology (MO) [86] and the Plant environmental ontology (EO) (www.gramene.org/plant_ontology/index.html#eo). Protein–protein interactions were obtained from BIND, IntAct, BioGRID, DIP, MINT and TAIR. Predicted protein–protein interactions were obtained from the BAR Arabidopsis interaction viewer [87] and AtPID. Information was also obtained from the CORNET authors' own study [88]. Localization data were obtained from SUBA [89], iPSORT [90], LOCtree [91], MITOPRED [92], MitoProt [93], MultiLoc [94], PeroxiP [95], Predotar [96], SubLoc [97], TargetP [98] and WoLF_PSORT [99]; Table S2 provides a short description of these resources. CORNET includes all available data along with related meta-data. The tool then provides a reliability score for each result based on the search options, parameters and thresholds supplied by the user. A visualization tool additionally allows users to distinguish more reliable predictions from less reliable ones.

CORNET aims to provide functional context to genes and conversely, to provide an ability to predict functions of genes that have unknown functions. It is a tool that could also, in the future, use the information on A. thaliana to extrapolate networks in other plant species.

Many pathway resources use only general localization predictors. In contrast, CORNET has made an attempt to also use species-specific localization information. Thus, CORNET uses localization data from both ‘general’ localization predictors and from SUBA, an A. thaliana specific localization database, which was the only species-specific resource available at the time. SUBA contains data retrieved from literature, experiments and prediction tools. It has become clearer over time that the use of organism-specific predictors and of multiple (general) predictors is likely to lead to more accurate predicted localization [100–103]. Predictions from general predictors may not be suitable for predicting localization in an individual organism, as these prediction tools are trained on proteins from a variety of organisms (and can suffer from sampling bias). Localization data from any single predictor need to be treated with caution, keeping in mind that inclusion of false positives in integrated databases would amplify the wrong information. Fortunately for plants, some organism-specific localization predictors have recently become available, e.g. AtSubP (Arabidopsis) [103] and RSLpred (rice) [104]. These should be used when integrating pathway information for the respective species. If a tool similar to CORNET is developed for rice, RSLpred would definitely be an important resource for protein localization data. The need for localization predictors specific to a variety of plants cannot be emphasized enough for a more reliable extrapolation of networks.
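One simple way to avoid relying on a single predictor is to accept a localization only when a clear majority of predictors agree. The sketch below illustrates this under stated assumptions: it is not how CORNET or SUBA combine evidence, the predictor names and the function `consensus_localization` are hypothetical, and all predictors are weighted equally.

```python
from collections import Counter


def consensus_localization(predictions, min_agreement=0.5):
    """Majority vote over several localization predictors.

    predictions maps predictor name -> predicted compartment (or None when
    the predictor made no call). Returns (compartment, fraction of votes);
    the compartment is None unless the top label wins strictly more than
    min_agreement of the votes cast.
    """
    votes = Counter(label for label in predictions.values() if label is not None)
    total = sum(votes.values())
    if total == 0:
        return None, 0.0
    label, count = votes.most_common(1)[0]
    fraction = count / total
    if fraction > min_agreement:
        return label, fraction
    return None, fraction


# Hypothetical predictor outputs for one protein.
agreeing = {
    "predictor_1": "chloroplast",
    "predictor_2": "chloroplast",
    "predictor_3": "mitochondrion",
    "predictor_4": None,  # no call; does not count as a vote
}
split = {"predictor_1": "chloroplast", "predictor_2": "mitochondrion"}

print(consensus_localization(agreeing))
print(consensus_localization(split))
```

Returning the vote fraction alongside the label lets a downstream integration step store the level of agreement instead of discarding it, so that weakly supported localizations are not propagated as if they were certain.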

The ‘MetNet’ platform contains both metabolic and regulatory networks of A. thaliana, soybean [50] and grapevine. It is an attempt to integrate metabolic data from AraCyc and regulatory data from AGRIS, with additional manually curated signal transduction pathways (in A. thaliana). The pathway information is integrated with other resources like TAIR, GO-classifications (retrieved through TAIR) and MapMan [105] that supply gene related information. Protein information is obtained from PPDB [106], AMPDB [107], AtNoPDB [108], AraPerox, PLprot [109], SUBA and BRENDA [110]. These also provide the subcellular localization information for the entities. Metabolite data from ChEBI [111], PubChem [112], KEGG, NCI [113] are also integrated into the database. As there are large holes in the information on the function of a large number of genes in A. thaliana, MetNet is aimed at formulating testable hypotheses. MetNet supports various types of users and data retrieval methods. MetNet Online (available at http://metnetonline.org/) is an online interface to MetNet. MetNetAPI is an Application Programming Interface to the platform that facilitates automated data retrieval [57] and a plug-in exists for the CellDesigner environment [114].

‘VitisNet’ [115] is a web-based tool for grapevine (Vitis vinifera) that integrates metabolomic, proteomic and transcriptomic pathway information within molecular networks like metabolic or signaling networks and presents a molecular network model. VitisNet allows visualization of genes and biochemical pathways involved in growth, fruiting cycles and environmental stress response. Data from VitisNet are now also available in MetNet.

‘MetaCrop’ [116] contains manually curated metabolic pathway information in crop plants (with special emphasis on seeds and tubers), along with a wide variety of other factors like reactions, location, transport processes, kinetics, taxonomy and literature. MetaCrop has an easy to use web interface and allows automatic export of information for creation of metabolic models.

Pathway database maintenance—an easily overlooked detail

Although Pathguide lists more than 300 pathway resources, at least 30 of these databases and resources are no longer functional. At the time of writing this review (October 2010), inaccessible databases that are not marked as non-functional in Pathguide include aMAZE [117,118], Sentra [119] and EMP [120] among others. Other databases may change location. During the preparation of this article, this happened with AtPID; the publication on AtPID is now destined to refer to an incorrect URL. Several of these databases contained high quality data, and their unavailability is a loss from several angles. For example, aMAZE boasted an excellent data model. It could deal with metabolic, protein–protein interaction, gene regulation, sub-cellular localization, signal transduction and transport data and thus had the capacity to integrate a large variety of data. Its current absence is a significant loss to the scientific community at large. While papers do exist for many of these projects, the technical details of an implementation can often only be obtained through communication with the implementing team. This effectively means that anyone who ventures to do the same elsewhere in the world will have to reinvest the time and retrace the steps needed to achieve the quality of aMAZE. Similarly, Arabidopsis Reactome is another dedicated database on A. thaliana that is currently no longer being developed, as the continuation of this project requires new funding initiatives.

Due to their ever expanding and evolving nature, pathway databases (like any other scientific database) need to be maintained, curated and developed on a long-term basis. Finding financial support for long-term maintenance of pathway databases is a challenging task. One possibility is to raise funds by establishing license purchase requirements for the use of databases, but this restricts open access to the information contained therein and can thus hinder the development of the field [121]. In addition, this is unfeasible for smaller projects that attract limited attention, but may be useful as part of integration efforts. Solutions are needed to ensure provision of continued funding for especially promising databases (without promoting an uncontrolled proliferation of new platforms) and to avoid the loss of valuable information in established resources. Loss of such databases is not only a loss to the scientific community, but is also a waste of the resources that were spent on the creation and development of an excellent database in the first place. Funding agencies could, for example, provide continued funding to the database projects that they have already funded, provided that the projects follow the GDbP standards, with compliance continually and rigorously monitored and reported by an independent workgroup. Another solution could be an integration of especially promising databases into more permanent structures such as Gramene or NCBI.

The funding of The Arabidopsis Information Resource (TAIR) can serve as a recent example of the search for alternative funding sources. NSF funding for TAIR is to be phased out over the next 3 years (http://www.nature.com/news/2009/091118/full/462258b.html). For its continued maintenance, TAIR has recently come up with a corporate sponsorship program. The idea is to avoid subscription requirements for the corporate sector and thus keep the resource open and free of login requirements, thereby allowing continued open access to the data for all scientists. TAIR has already secured several corporate sponsors through this program. Such programs would certainly help the survival of at least some databases. However, this is not a real alternative to public funding, as such a solution could end up introducing a corporate bias into the system: only those databases that are able to find corporate sponsorship would survive. Various funding models for these community resources (that are not necessarily research projects in their own right) have recently received more attention [122, 123]. These could be applied for plant pathway database integration and maintenance. Funding a community resource requires a different approach compared to more conventional research projects. Various scenarios for databases need to be discussed and changed, a recommendation also posited by Bastow and Leonelli [123].

CONCLUSION

Pathway databases play an important role in advancing our knowledge of the biological functions and mechanisms. Increased understanding of living systems as a whole can, in turn, aid successful application design in silico, in vitro and in vivo.

Plants are important as veritable food, drug and fuel sources, as well as bioremediation and biotechnological tools. This provides a strong incentive to create better, more integrated and easily accessible plant pathway databases. Such efforts would lead to discovery and elucidation of the yet unknown components involved in various pathways and their function. This would also result in the creation of testable models that can further enrich the knowledge on plant systems. This then could lead to the design of more specialized intervention technologies along with potential commercial applications: innovation as a result of integration.

SUPPLEMENTARY DATA

Supplementary data are available online at http://bib.oxfordjournals.org/.

Key Points

Considering the importance of plants in human life and in general all life on Earth, plant pathway databases, and support for their continued existence after creation, deserve more attention than they have thus far received.
Plant pathway databases, being fewer in number, are more amenable to format standardization and data integration. Upcoming plant pathway databases should strive to follow standard formats and aim at database integration from the start. This can be facilitated through the formulation of Good Database Practice (GDbP).
Plant pathway database integration could help expand both basic and applied aspects of information utilization.
It is important that pathway databases/tools/resources be regularly curated and maintained. This demands improved strategies to provide continual financial support. The loss of a good database is expensive in terms of the years of hard work required to make a good database/tool/resource in the first place.

Acknowledgements

The authors are thankful to Prof. Eve Syrkin Wurtele and Dr Julie Dickerson at Iowa State University, who supported us in a number of ways with encouragement, sound advice, guidance and lots of good ideas. The authors thank the anonymous reviewers for their constructive criticisms and suggestions which have helped improve this article.

FUNDING

This material is based upon work supported by the National Science Foundation under Awards EEC-0813570 and MCB-0951170.

References

1. Bader GD, Cary MP, Sander C. Pathguide: a pathway resource list. Nucleic Acids Res, 2006, vol. 34 (pg. D504-6)

2. Korc M. Pathways for aberrant angiogenesis in pancreatic cancer. Mol Cancer, 2003, vol. 2, pg. 8

3. Tsiantis M, Brown MI, Skibinski G, et al. Disruption of auxin transport is associated with aberrant leaf development in maize. Plant Physiol, 1999, vol. 121 (pg. 1163-8)

4. Kelley BP, Sharan R, Karp RM, et al. Conserved pathways within bacteria and yeast as revealed by global protein network alignment. Proc Natl Acad Sci USA, 2003, vol. 100 (pg. 11394-9)

5. Galperin MY, Koonin EV. Who's your neighbor? New computational approaches for functional genomics. Nat Biotechnol, 2000, vol. 18 (pg. 609-13)

6. Bumgarner RE, Yeung KY. Methods for the inference of biological pathways and networks. Methods Mol Biol, 2009, vol. 541 (pg. 225-45)

7. Pradines J, Rudolph-Owen L, Hunter J, et al. Detection of activity centers in cellular pathways using transcript profiling. J Biopharm Stat, 2004, vol. 14 (pg. 701-21)

8. Apic G, Ignjatovic T, Boyer S, et al. Illuminating drug discovery with biological pathways. FEBS Lett, 2005, vol. 579 (pg. 1872-7)

9. Schreiber SL. Target-oriented and diversity-oriented organic synthesis in drug discovery. Science, 2000, vol. 287 (pg. 1964-9)

10. Dixon N, Duncan JN, Geerlings T, et al. Reengineering orthogonally selective riboswitches. Proc Natl Acad Sci USA, 2010, vol. 107 (pg. 2830-5)

11. Mueller LA, Zhang P, Rhee SY. AraCyc: a biochemical pathway database for Arabidopsis. Plant Physiol, 2003, vol. 132 (pg. 453-60)

12. Huang YJ, Hang D, Lu LJ, et al. Targeting the human cancer pathway protein interaction network by structural genomics. Mol Cell Proteomics, 2008, vol. 7 (pg. 2048-60)

13. Lynn DJ, Winsor GL, Chan C, et al. InnateDB: facilitating systems-level analyses of the mammalian innate immune response. Mol Syst Biol, 2008, vol. 4, pg. 218

14. Caspi R, Foerster H, Fulcher CA, et al. The MetaCyc Database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Res, 2008, vol. 36 (pg. D623-31)

15. Liang C, Jaiswal P, Hebbard C, et al. Gramene: a growing plant comparative genomics resource. Nucleic Acids Res, 2008, vol. 36 (pg. D947-53)

16. Davuluri RV, Sun H, Palaniswamy SK, et al. AGRIS: Arabidopsis gene regulatory information server, an information resource of Arabidopsis cis-regulatory elements and transcription factors. BMC Bioinformatics, 2003, vol. 4, pg. 25

17. Matys V, Fricke E, Geffers R, et al. TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res, 2003, vol. 31 (pg. 374-8)

18. Lescot M, Dehais P, Thijs G, et al. PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences. Nucleic Acids Res, 2002, vol. 30 (pg. 325-7)

19. Guo AY, Chen X, Gao G, et al. PlantTFDB: a comprehensive plant transcription factor database. Nucleic Acids Res, 2008, vol. 36 (pg. D966-9)

20. Bulow L, Steffens NO, Galuschka C, et al. AthaMap: from in silico data to real transcription factor binding sites. In Silico Biol, 2006, vol. 6 (pg. 243-52)

21. Stuart JM, Segal E, Koller D, et al. A gene-coexpression network for global discovery of conserved genetic modules. Science, 2003, vol. 302 (pg. 249-55)

22. Cui J, Li P, Li G, et al. AtPID: Arabidopsis thaliana protein interactome database--an integrative platform for plant systems biology. Nucleic Acids Res, 2008, vol. 36 (pg. D999-1008)

23. Aranda B, Achuthan P, Alam-Faruque Y, et al. The IntAct molecular interaction database in 2010. Nucleic Acids Res, 2010, vol. 38 (pg. D525-31)

24. Lin M, Shen X, Chen X. PAIR: the predicted Arabidopsis interactome resource. Nucleic Acids Res, 2011 [Epub ahead of print 2010]

25. Jensen LJ, Kuhn M, Stark M, et al. STRING 8--a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res, 2009, vol. 37 (pg. D412-6)

26. Elkon R, Vesterman R, Amit N, et al. SPIKE--a database, visualization and analysis tool of cellular signaling pathways. BMC Bioinformatics, 2008, vol. 9, pg. 110

27. Button DK, Gartland KM, Ball LD, et al. DRASTIC--INSIGHTS: querying information in a plant gene expression database. Nucleic Acids Res, 2006, vol. 34 (pg. D712-6)

28. Shameer K, Ambika S, Varghese SM, et al. STIFDB-Arabidopsis Stress Responsive Transcription Factor DataBase. Int J Plant Genomics, 2009, vol. 2009, pg. 583429

29. Bulow L, Schindler M, Choi C, et al. PathoPlant: a database on plant-pathogen interactions. In Silico Biol, 2004, vol. 4 (pg. 529-36)

30. Bulow L, Schindler M, Hehl R. PathoPlant: a platform for microarray expression data to analyze co-regulated genes involved in plant defense responses. Nucleic Acids Res, 2007, vol. 35 (pg. D841-5)

31. Rodriguez MC, Petersen M, Mundy J. Mitogen-activated protein kinase signaling in plants. Annu Rev Plant Biol, 2010, vol. 61 (pg. 621-49)

32. Lam HM, Chiu J, Hsieh MH, et al. Glutamate-receptor genes in plants. Nature, 1998, vol. 396 (pg. 125-6)

33. Kim SA, Kwak JM, Jae SK, et al. Overexpression of the AtGluR2 gene encoding an Arabidopsis homolog of mammalian glutamate receptors impairs calcium utilization and sensitivity to ionic stress in transgenic plants. Plant Cell Physiol, 2001, vol. 42 (pg. 74-84)

34. Kang J, Turano FJ. The putative glutamate receptor 1.1 (AtGLR1.1) functions as a regulator of carbon and nitrogen metabolism in Arabidopsis thaliana. Proc Natl Acad Sci USA, 2003, vol. 100 (pg. 6872-7)

35. Kang J, Mehta S, Turano FJ. The putative glutamate receptor 1.1 (AtGLR1.1) in Arabidopsis thaliana regulates abscisic acid biosynthesis and signaling to control development and water loss. Plant Cell Physiol, 2004, vol. 45 (pg. 1380-9)

36. Li J, Zhu S, Song X, et al. A rice glutamate receptor-like gene is critical for the division and survival of individual cells in the root apical meristem. Plant Cell, 2006, vol. 18 (pg. 340-9)

37. Brenner ED, Martinez-Barboza N, Clark AP, et al. Arabidopsis mutants resistant to S(+)-beta-methyl-alpha, beta-diaminopropionic acid, a cycad-derived glutamate receptor agonist. Plant Physiol, 2000, vol. 124 (pg. 1615-24)

38. Kang S, Kim HB, Lee H, et al. Overexpression in Arabidopsis of a plasma membrane-targeting glutamate receptor from small radish increases glutamate-mediated Ca2+ influx and delays fungal infection. Mol Cells, 2006, vol. 21 (pg. 418-27)

39. Bolouri-Moghaddam MR, Le Roy K, Xiang L, et al. Sugar signalling and antioxidant network connections in plant cells. FEBS J, 2010, vol. 277 (pg. 2022-37)

40. Chory J. Light signal transduction: an infinite spectrum of possibilities. Plant J, 2010, vol. 61 (pg. 982-991)

41. Chung HS, Niu Y, Browse J, et al. Top hits in contemporary JAZ: an update on jasmonate signaling. Phytochemistry, 2009, vol. 70 (pg. 1547-59)

42. Jia M, Choi SY, Reiners D, et al. MetNetGE: interactive views of biological networks and ontologies. BMC Bioinformatics, 2010, vol. 11, pg. 469

43. Dahlquist KD, Salomonis N, Vranizan K, et al. GenMAPP, a new tool for viewing and analyzing microarray data on biological pathways. Nat Genet, 2002, vol. 31 (pg. 19-20)

44. Gehlenborg N, O'Donoghue SI, Baliga NS, et al. Visualization of omics data for systems biology. Nat Methods, 2010, vol. 7 (pg. S56-68)

45. Suderman M, Hallett M. Tools for visually exploring biological networks. Bioinformatics, 2007, vol. 23 (pg. 2651-9)

46. Saraiya P, North C, Duca K. Visualizing biological pathways: requirements analysis, systems evaluation and research agenda. Inf Vis, 2005, vol. 4 (pg. 191-205)

47. Cary MP, Bader GD, Sander C. Pathway information for systems biology. FEBS Lett, 2005, vol. 579 (pg. 1815-20)

48. Hucka M, Finney A, Sauro HM, et al. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics, 2003, vol. 19 (pg. 524-31)

49. Luciano JS. PAX of mind for pathway researchers. Drug Discov Today, 2005, vol. 10 (pg. 937-42)

50. Wurtele ES, Li L, Berleant D, et al. MetNet: Systems Biology Software for Arabidopsis. In: Nikolau BJ, Wurtele ES (eds). Concepts in Plant Metabolomics. Springer, 2007 (pg. 145-58)

51. Hermjakob H, Montecchi-Palazzi L, Bader G, et al. The HUPO PSI's molecular interaction format--a community standard for the representation of protein interaction data. Nat Biotechnol, 2004, vol. 22 (pg. 177-83)

52. Lloyd CM, Halstead MD, Nielsen PF. CellML: its future, present and past. Prog Biophys Mol Biol, 2004, vol. 85 (pg. 433-50)

53. Liao YM, Ghanadan H. The chemical markup language. Anal Chem, 2002, vol. 74 (pg. 389A-90A)

54. Hanisch D, Zimmer R, Lengauer T. ProML--the protein markup language for specification of protein sequences, structures and families. In Silico Biol, 2002, vol. 2 (pg. 313-24)

55. Spellman PT, Miller M, Stewart J, et al. Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biol, 2002, vol. 3, pg. RESEARCH0046

56. Taubert J, Sieren KP, Hindle M, et al. The OXL format for the exchange of integrated datasets. J Integr Bioinf, 2007, vol. 4 (pg. 62-75)

57. Sucaet Y, Wurtele ES. MetNetAPI: A flexible method to access and manipulate biological network data from MetNet. BMC Res Notes, 2010, vol. 3, pg. 312

58. Heath AP, Kavraki LE. Computational challenges in systems biology. Comput Sci Rev, 2009, vol. 3 (pg. 1-17)

59. Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res, 2000, vol. 28 (pg. 27-30)

60. Kanehisa M, Goto S, Furumichi M, et al. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res, 2010, vol. 38 (pg. D355-60)

61. Lysenko A, Hindle MM, Taubert J, et al. Data integration for plant genomics--exemplars from the integration of Arabidopsis thaliana databases. Brief Bioinform, 2009, vol. 10 (pg. 676-93)

62. Mentzen WI, Peng J, Ransom N, et al. Articulation of three core metabolic processes in Arabidopsis: fatty acid biosynthesis, leucine catabolism and starch metabolism. BMC Plant Biol, 2008, vol. 8, pg. 76

63. Beisson F, Koo AJ, Ruuska S, et al. Arabidopsis genes involved in acyl lipid metabolism. A 2003 census of the candidates, a study of the distribution of expressed sequence tags in organs, and a web-based database. Plant Physiol, 2003, vol. 132 (pg. 681-97)

64. de Oliveira Dal'Molin CG, Quek LE, Palfreyman RW, et al. AraGEM, a genome-scale reconstruction of the primary metabolic network in Arabidopsis. Plant Physiol, 2010, vol. 152 (pg. 579-89)

65. Reumann S, Ma C, Lemke S, et al. AraPerox. A database of putative Arabidopsis proteins from plant peroxisomes. Plant Physiol, 2004, vol. 136 (pg. 2587-608)

66. Poole RL. The TAIR database. Methods Mol Biol, 2007, vol. 406 (pg. 179-212)

67. Grant MR, Jones JD. Hormone (dis)harmony moulds plant health and disease. Science, 2009, vol. 324 (pg. 750-752)

68. Rasche A, Al-Hasani H, Herwig R. Meta-analysis approach identifies candidate genes and associated molecular networks for type-2 diabetes mellitus. BMC Genomics, 2008, vol. 9, pg. 310

69. Matthews L, Gopinath G, Gillespie M, et al. Reactome knowledgebase of human biological pathways and processes. Nucleic Acids Res, 2009, vol. 37 (pg. D619-22)

70. Karp PD, Ouzounis CA, Moore-Kochlacs C, et al. Expansion of the BioCyc collection of pathway/genome databases to 160 genomes. Nucleic Acids Res, 2005, vol. 33 (pg. 6083-9)

71. Ashburner M, Ball CA, Blake JA, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 2000, vol. 25 (pg. 25-9)

72. de Chassey B, Navratil V, Tafforeau L, et al. Hepatitis C virus infection protein network. Mol Syst Biol, 2008, vol. 4, pg. 230

73. Bader GD, Betel D, Hogue CW. BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res, 2003, vol. 31 (pg. 248-50)

74. Stark C, Breitkreutz BJ, Reguly T, et al. BioGRID: a general repository for interaction datasets. Nucleic Acids Res, 2006, vol. 34 (pg. D535-9)

75. Xenarios I, Salwinski L, Duan XJ, et al. DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res, 2002, vol. 30 (pg. 303-5)

76. Lu Z, Cohen KB, Hunter L. GeneRIF quality assurance as summary revision. Pac Symp Biocomput, 2007 (pg. 269-80)

77. Chatr-aryamontri A, Ceol A, Palazzi LM, et al. MINT: the Molecular INTeraction database. Nucleic Acids Res, 2007, vol. 35 (pg. D572-4)

78. Neerincx PB, Leunissen JA. Evolution of web services in bioinformatics. Brief Bioinform, 2005, vol. 6 (pg. 178-188)

79. Schwab W. Metabolome diversity: too few genes, too many metabolites? Phytochemistry, 2003, vol. 62 (pg. 837-49)

80. Tsesmetzis N, Couchman M, Higgins J, et al. Arabidopsis reactome: a foundation knowledgebase for plant systems biology. Plant Cell, 2008, vol. 20 (pg. 1426-36)

81. De Bodt S, Carvajal D, Hollunder J, et al. CORNET: a user-friendly tool for data mining and integration. Plant Physiol, 2010, vol. 152 (pg. 1167-79)

82. Barrett T, Edgar R. Gene expression omnibus: microarray data storage, submission, retrieval, and analysis. Methods Enzymol, 2006, vol. 411 (pg. 352-69)

83. Bruskiewich R, Coe EH, Jaiswal P, et al. The plant ontology consortium and plant ontologies. Comp Funct Genomics, 2002, vol. 3 (pg. 137-42)

84. Avraham S, Tung CW, Ilic K, et al. The Plant Ontology Database: a community resource for plant structure and developmental stages controlled vocabulary and annotations. Nucleic Acids Res, 2008, vol. 36 (pg. D449-54)

85. Ilic K, Kellogg EA, Jaiswal P, et al. The plant structure ontology, a unified vocabulary of anatomy and morphology of a flowering plant. Plant Physiol, 2007, vol. 143 (pg. 587-99)

86. Whetzel PL, Parkinson H, Causton HC, et al. The MGED Ontology: a resource for semantics-based description of microarray experiments. Bioinformatics, 2006, vol. 22 (pg. 866

-

73

)

87

Geisler-Lee

J

O'Toole

N

Ammar

R

et al. ,

A predicted interactome for Arabidopsis

,

Plant Physiol

,

2007

, vol.

145

(pg.

317

-

29

)

88

De Bodt

S

Proost

S

Vandepoele

K

et al. ,

Predicting protein-protein interactions in Arabidopsis thaliana through integration of orthology, gene ontology and co-expression

,

BMC Genomics

,

2009

, vol.

10

pg.

288

89

Heazlewood

JL

Verboom

RE

Tonti-Filippini

J

et al. ,

SUBA: the Arabidopsis Subcellular Database

,

Nucleic Acids Res

,

2007

, vol.

35

(pg.

D213

-

8

)

90

Bannai

H

Tamada

Y

Maruyama

O

et al. ,

Extensive feature detection of N-terminal protein sorting signals

,

Bioinformatics

,

2002

, vol.

18

(pg.

298

-

305

)

91

Nair

R

Rost

B

,

Mimicking cellular sorting improves prediction of subcellular localization

,

J Mol Biol

,

2005

, vol.

348

(pg.

85

-

100

)

92

Guda

C

Fahy

E

Subramaniam

S

,

MITOPRED: a genome-scale method for prediction of nucleus-encoded mitochondrial proteins

,

Bioinformatics

,

2004

, vol.

20

(pg.

1785

-

94

)

93

Claros

MG

,

MitoProt, a Macintosh application for studying mitochondrial proteins

,

Comput Appl Biosci

,

1995

, vol.

11

(pg.

441

-

7

)

94

Hoglund

A

Donnes

P

Blum

T

et al. ,

MultiLoc: prediction of protein subcellular localization using N-terminal targeting sequences, sequence motifs and amino acid composition

,

Bioinformatics

,

2006

, vol.

22

(pg.

1158

-

65

)

95

Emanuelsson

O

Elofsson

A

von Heijne

G

et al. ,

In silico prediction of the peroxisomal proteome in fungi, plants and animals

,

J Mol Biol

,

2003

, vol.

330

(pg.

443

-

56

)

96

Small

I

Peeters

N

Legeai

F

et al. ,

Predotar: A tool for rapidly screening proteomes for N-terminal targeting sequences

,

Proteomics

,

2004

, vol.

4

(pg.

1581

-

90

)

97

Chen

H

Huang

N

Sun

Z

,

SubLoc: a server/client suite for protein subcellular location based on SOAP

,

Bioinformatics

,

2006

, vol.

22

(pg.

376

-

7

)

98

Emanuelsson

O

Brunak

S

von Heijne

G

et al. ,

Locating proteins in the cell using TargetP, SignalP and related tools

,

Nat Protoc

,

2007

, vol.

2

(pg.

953

-

71

)

99

Horton

P

Park

KJ

Obayashi

T

et al. ,

WoLF PSORT: protein localization predictor

,

Nucleic Acids Res

,

2007

, vol.

35

(pg.

W585

-

7

)

100

Nielsen

H

Brunak

S

von Heijne

G

,

Machine learning approaches for the prediction of signal peptides and other protein sorting signals

,

Protein Eng

,

1999

, vol.

12

(pg.

3

-

9

)

101

Bender

A

van Dooren

GG

Ralph

SA

et al. ,

Properties and prediction of mitochondrial transit peptides from Plasmodium falciparum

,

Mol Biochem Parasitol

,

2003

, vol.

132

(pg.

59

-

66

)

102

Schneider

G

Fechner

U

,

Advances in the prediction of protein targeting signals

,

Proteomics

,

2004

, vol.

4

(pg.

1571

-

80

)

103

Kaundal

R

Saini

R

Zhao

PX

,

Combining machine learning and homology-based approaches to accurately predict subcellular localization in Arabidopsis

,

Plant Physiol

,

2010

, vol.

154

(pg.

36

-

54

)

104

Kaundal

R

Raghava

GP

,

RSLpred: an integrative system for predicting subcellular localization of rice proteins combining compositional and evolutionary information

,

Proteomics

,

2009

, vol.

9

(pg.

2324

-

42

)

105

Thimm

O

Blasing

O

Gibon

Y

et al. ,

MAPMAN: a user-driven tool to display genomics data sets onto diagrams of metabolic pathways and other biological processes

,

Plant J

,

2004

, vol.

37

(pg.

914

-

39

)

106

Sun

Q

Zybailov

B

Majeran

W

et al. ,

PPDB, the Plant Proteomics Database at Cornell

,

Nucleic Acids Res

,

2009

, vol.

37

(pg.

D969

-

74

)

107

Heazlewood

JL

Tonti-Filippini

JS

Gout

AM

et al. ,

Experimental analysis of the Arabidopsis mitochondrial proteome highlights signaling and regulatory components, provides assessment of targeting prediction programs, and indicates plant-specific mitochondrial proteins

,

Plant Cell

,

2004

, vol.

16

(pg.

241

-

56

)

108

Brown

JW

Shaw

PJ

Shaw

P

et al. ,

Arabidopsis nucleolar protein database (AtNoPDB)

,

Nucleic Acids Res

,

2005

, vol.

33

(pg.

D633

-

6

)

109

Kleffmann

T

Hirsch-Hoffmann

M

Gruissem

W

et al. ,

plprot: a comprehensive proteome database for different plastid types

,

Plant Cell Physiol

,

2006

, vol.

47

(pg.

432

-

6

)

110

Chang

A

Scheer

M

Grote

A

et al. ,

BRENDA, AMENDA and FRENDA the enzyme information system: new content and tools in (2009)

,

Nucleic Acids Res

,

2009

, vol.

37

(pg.

D588

-

92

)

111

Degtyarenko

K

de Matos

P

Ennis

M

et al. ,

ChEBI: a database and ontology for chemical entities of biological interest

,

Nucleic Acids Res

,

2008

, vol.

36

(pg.

D344

-

50

)

112

Wang

Y

Xiao

J

Suzek

TO

et al. ,

PubChem: a public information system for analyzing bioactivities of small molecules

,

Nucleic Acids Res

,

2009

, vol.

37

(pg.

W623

-

33

)

113

Sitzmann

M

Filippov

IV

Nicklaus

MC

,

Internet resources integrating many small-molecule databases

,

SAR QSAR Environ Res

,

2008

, vol.

19

(pg.

1

-

9

)

114

Van Hemert

JL

Dickerson

JA

,

PathwayAccess: CellDesigner plugins for pathway databases

,

Bioinformatics

,

2010

, vol.

26

(pg.

2345

-

6

)

115

Grimplet

J

Cramer

GR

Dickerson

JA

et al. ,

VitisNet: “Omics” Integration through Grapevine Molecular Networks

,

PLoS ONE

,

2009

, vol.

4

pg.

e8365

116

Grafahrend-Belau

E

Weise

S

Koschutzki

D

et al. ,

MetaCrop: a detailed database of crop plant metabolism

,

Nucleic Acids Res

,

2008

, vol.

36

(pg.

D954

-

8

)

117

Lemer

C

Antezana

E

Couche

F

et al. ,

The aMAZE LightBench: a web interface to a relational database of cellular processes

,

Nucleic Acids Res

,

2004

, vol.

32

(pg.

D443

-

8

)

118

van Helden

J

Naim

A

Lemer

C

et al. ,

From molecular activities and processes to biological function

,

Brief Bioinform

,

2001

, vol.

2

(pg.

81

-

93

)

119

D'Souza

M

Glass

EM

Syed

MH

et al. ,

Sentra: a database of signal transduction proteins for comparative genome analysis

,

Nucleic Acids Res

,

2007

, vol.

35

(pg.

D271

-

3

)

120

Selkov

E

Basmanova

S

Gaasterland

T

et al. ,

The metabolic pathway collection from EMP: the enzymes and metabolic pathways database

,

Nucleic Acids Res

,

1996

, vol.

24

(pg.

26

-

8

)

121

Philippi

S

Kohler

J

,

Addressing the problems with life-science databases for traditional uses and systems biology

,

Nat Rev Genet

,

2006

, vol.

7

(pg.

482

-

8

)

122

Chandras

C

Weaver

T

Zouberakis

M

et al. ,

Models for financial sustainability of biological databases and resources

,

Database

,

2009

, vol.

106

(pg.

17475

-

80

)