Social pathway annotation: extensions of the systems biology metabolic modelling assistant

The SPA extensions comprise two main components: a metabolic pathway model repository, which enables the social annotation of the metabolic pathways stored in it, and a bibliographic analysis tool, which helps users understand the context of the research carried out on a selected metabolic pathway.

Metabolic pathway model repository

Using the SBMM Assistant, users can retrieve complete metabolic pathways that integrate information from several data repositories. The first step in annotating and curating a metabolic pathway is to save the model of interest locally. The model is then uploaded to the metabolic pathway model repository together with the organism, a description (as complete as possible) and a model name; the owner and the creation date are implicit and are added to the model automatically.
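As a minimal sketch of the upload step, assuming a repository entry is a simple record (the `PathwayModel` class and `upload_model` function are hypothetical, not SPA's code), the user supplies the name, organism and description, while the owner and creation date are filled in automatically. Treating name and organism as mandatory is also an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class PathwayModel:
    """Hypothetical repository entry for an uploaded metabolic pathway model."""
    name: str
    organism: str
    description: str
    sbml: str    # content of the locally saved SBML file
    owner: str   # implicit: the user performing the upload
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def upload_model(current_user: str, name: str, organism: str,
                 description: str, sbml: str) -> PathwayModel:
    """Create a repository entry; owner and creation date are added automatically."""
    if not name.strip() or not organism.strip():
        raise ValueError("a model name and an organism are required")
    return PathwayModel(name=name, organism=organism, description=description,
                        sbml=sbml, owner=current_user)


entry = upload_model("alice", "Glycolysis", "Homo sapiens",
                     "Human glycolysis model derived from a yeast model", "<sbml/>")
print(entry.owner, entry.created.isoformat())
```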

The social curation process includes several steps (Figure 2). In the first step, users comment and vote on the whole metabolic pathway. Votes fall into four categories (Correct, Correct with minor mistakes, Correct with major mistakes and Incorrect), and a chart shows the proportion of votes cast in each category (Figure 3). Comments and votes provide emerging semantics about the quality of the metabolic pathway; using them, a user can produce a new version of the pathway that fixes the mistakes detected in the original version. The versioning tool organizes the versions of a metabolic pathway as a tree, whose root is the model uploaded by a user, whose leaves are curated versions of the pathway and whose internal nodes are partly curated intermediate versions.
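The version tree described above can be sketched as follows. This is a minimal illustration, not SPA's implementation: the `Version` class and the `classify` helper are hypothetical. Each version keeps a link to the version it was derived from, and nodes are labelled as the root, partly curated intermediate versions or curated leaves.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Version:
    """One node of a hypothetical version tree."""
    label: str
    author: str
    parent: Version | None = None
    children: list[Version] = field(default_factory=list)

    def derive(self, label: str, author: str) -> Version:
        """Create a new version that corrects this one."""
        child = Version(label, author, parent=self)
        self.children.append(child)
        return child


def classify(root: Version) -> dict[str, str]:
    """Label each node: root (uploaded model), intermediate or curated (leaf)."""
    roles: dict[str, str] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node.parent is None:
            roles[node.label] = "root"
        elif not node.children:
            roles[node.label] = "curated"
        else:
            roles[node.label] = "intermediate"
        stack.extend(node.children)
    return roles


original = Version("v0", "alice")
v1 = original.derive("v1", "bob")
v1a = v1.derive("v1a", "carol")
v2 = original.derive("v2", "dave")
print(classify(original))
```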

Figure 2:

Curation cycle, in which comments, votes and versioning are the main steps of the process.

Figure 3:

Model voting process.

The metabolic pathway model repository allows users to save models, share them with other users and create improved versions of existing models. To support this, we have introduced model ownership: a model (or a version of a model) can be modified only by its owner, whereas other users can only add new versions or comments. When a user begins to edit a model, data are automatically downloaded from the different databases, and the user can start working on the metabolic pathway. Both newly created and existing metabolic models can be curated and stored locally as SBML files. The tool supports several curation tasks, including the modification of reaction data (concentrations, reaction parameters, kinetic laws, stoichiometric equations, etc.) and the addition of new components (substrates, products, activators, inhibitors, etc.) to the metabolic pathway (Figure 4).
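The ownership rule can be sketched with a simplified in-memory model. The `SharedModel` class is hypothetical, and giving the author of a derived version ownership of that version is our assumption: only the owner may edit a model, while any user may comment on it or derive a new version.

```python
from dataclasses import dataclass, field


@dataclass
class SharedModel:
    """Hypothetical model (or version) with an owner, comments and derived versions."""
    name: str
    owner: str
    reactions: dict = field(default_factory=dict)   # reaction ID -> kinetic law
    comments: list = field(default_factory=list)
    versions: list = field(default_factory=list)

    def edit(self, user: str, reaction_id: str, kinetic_law: str) -> None:
        # Only the owner may change the model itself.
        if user != self.owner:
            raise PermissionError(f"{user} does not own {self.name}")
        self.reactions[reaction_id] = kinetic_law

    def comment(self, user: str, text: str) -> None:
        # Any user may comment.
        self.comments.append((user, text))

    def new_version(self, user: str) -> "SharedModel":
        # Any user may derive a version; here the deriving user owns the copy.
        child = SharedModel(f"{self.name} ({user})", user, dict(self.reactions))
        self.versions.append(child)
        return child
```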

Figure 4:

Reaction Update Tool. This figure shows how a reaction can be completed by adding new substrates and products.

The metabolic pathway model repository provides a navigation panel for discovering curated metabolic pathways. This panel lists not only user pathways but also their versions. Thus, different users can produce different kinds of annotations on other users’ pathways, such as new versions, votes and comments (Figure 3). When a user viewing a pathway model detects errors or inconsistencies, he/she can vote with one of four options: Correct, Correct with minor mistakes, Correct with major mistakes and Incorrect. The vote should be accompanied by comments describing the model errors. These SPA extensions facilitate communication between researchers, enabling them to interact and take part in discussions about experimental data.
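A minimal sketch of the voting scheme, assuming one vote per user and requiring a comment for every category other than Correct (both are assumptions, not documented SPA behaviour), computes the per-category proportions that a chart such as the one in Figure 3 would display:

```python
from __future__ import annotations

from collections import Counter
from enum import Enum


class Vote(Enum):
    CORRECT = "Correct"
    MINOR = "Correct with minor mistakes"
    MAJOR = "Correct with major mistakes"
    INCORRECT = "Incorrect"


def cast_vote(ballots: dict[str, tuple[Vote, str]], user: str,
              vote: Vote, comment: str) -> None:
    """Record one vote per user; a later vote replaces an earlier one.

    Assumption: any vote other than CORRECT must describe the mistakes found."""
    if vote is not Vote.CORRECT and not comment.strip():
        raise ValueError("please describe the mistakes you found")
    ballots[user] = (vote, comment)


def vote_proportions(ballots: dict[str, tuple[Vote, str]]) -> dict[Vote, float]:
    """Share of votes in each category, e.g. for drawing the chart."""
    counts = Counter(vote for vote, _ in ballots.values())
    total = sum(counts.values())
    return {v: counts[v] / total if total else 0.0 for v in Vote}
```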

Bibliographic analysis tool

Retrieved metabolic pathways contain information from different databases, which may include bibliographic references for the data they store. To take advantage of this information, SPA extends the SBMM Assistant to capture the relationships between bibliographic references and the different pathway components. This extension provides a way to analyse the networks of publications related to an enzyme or a complete metabolic pathway, and enables users to locate the researchers working in a given field. When a user retrieves information about a metabolic pathway of interest, he/she is presented with a graph containing two types of nodes (researchers and publications) and two types of connections (author connections and co-author connections).

The tool uses different icons in the bibliographic network of a metabolic pathway to distinguish between the two node types (publications and authors). It is useful for following the relationship between a publication of interest and the author’s ‘neighbourhood’ (the authors working in the same research area), for querying and visualising related publications in PubMed, and for looking up relevant background information on an author (Figure 5).
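The network described above can be sketched with plain dictionaries, assuming the input is a mapping from publication identifiers to author lists (a hypothetical format; the publication IDs and names below are placeholders). Author connections link researchers to their publications, co-author connections link researchers who share a publication, and an author’s ‘neighbourhood’ is approximated here as the researchers within a given number of co-author hops.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import combinations


def build_network(publications: dict[str, list[str]]):
    """Return the two connection types of the bibliographic network.

    authored:  researcher -> publications (author connections)
    coauthors: researcher -> researchers sharing a publication (co-author connections)
    """
    authored: dict[str, set[str]] = defaultdict(set)
    coauthors: dict[str, set[str]] = defaultdict(set)
    for pub, authors in publications.items():
        for author in authors:
            authored[author].add(pub)
        for a, b in combinations(sorted(set(authors)), 2):
            coauthors[a].add(b)
            coauthors[b].add(a)
    return authored, coauthors


def neighbourhood(pub: str, publications: dict[str, list[str]],
                  coauthors: dict[str, set[str]], depth: int = 1) -> set[str]:
    """Authors of `pub` plus researchers within `depth` co-author hops of them."""
    seen = set(publications[pub])
    frontier = set(seen)
    for _ in range(depth):
        frontier = {n for a in frontier for n in coauthors.get(a, set())} - seen
        seen |= frontier
    return seen


papers = {  # hypothetical publication IDs and author names
    "P1": ["Smith", "Lee"],
    "P2": ["Lee", "Garcia"],
    "P3": ["Garcia", "Kim"],
}
authored, coauthors = build_network(papers)
print(sorted(neighbourhood("P1", papers, coauthors)))  # ['Garcia', 'Lee', 'Smith']
```

Increasing `depth` widens the neighbourhood one co-author hop at a time; with `depth=2`, Kim is also reached through Garcia.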

Figure 5:

Example bibliographic network for the enzyme EC 1.2.4.1 (pyruvate dehydrogenase). Main authors are linked with red arrows; publications are shown as green boxes and authors as blue boxes.

USE CASES

As use cases, we asked two groups of users to curate a malformed metabolic pathway using the SPA extensions. The first group consisted of biologists with basic notions of metabolic modelling; such users can detect basic mistakes and perform simple curation tasks with the tool. The second group consisted of biologists with an average-to-high understanding of metabolic modelling but little expertise in the use of modelling tools; such users can perform more complex curation tasks. Neither group was particularly familiar with IT tools, because we aimed to evaluate the ‘assistant’ capability of the tool. For the test, we developed two metabolic pathway models that were malformed versions of an already curated metabolic model, the Teusink model of glycolysis in Saccharomyces cerevisiae [19], in SBML format (Figure 6). The test groups had to curate them by building a Homo sapiens glycolysis model. The test models contained several errors, including missing species, malformed kinetic equations and nonexistent reactions.
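One of these error types, species referenced by reactions but never declared, can also be flagged mechanically. The following sketch is not part of SPA and uses only the Python standard library: it parses an SBML Level 2 Version 4 document and reports, for each reaction, the species it references that are missing from `listOfSpecies`. The toy model is a hypothetical fragment, not the test model used in the experiment.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace of SBML Level 2 Version 4 documents.
NS = {"sbml": "http://www.sbml.org/sbml/level2/version4"}


def undeclared_species(sbml_text: str) -> dict[str, set[str]]:
    """Map each reaction ID to the species it references but the model never declares."""
    model = ET.fromstring(sbml_text).find("sbml:model", NS)
    if model is None:
        return {}
    declared = {s.get("id") for s in model.iterfind("sbml:listOfSpecies/sbml:species", NS)}
    problems: dict[str, set[str]] = {}
    for reaction in model.iterfind("sbml:listOfReactions/sbml:reaction", NS):
        # Reactants, products and modifiers all point at species by ID.
        referenced = {r.get("species")
                      for r in reaction.iterfind(".//sbml:speciesReference", NS)}
        referenced |= {m.get("species")
                       for m in reaction.iterfind(".//sbml:modifierSpeciesReference", NS)}
        missing = referenced - declared
        if missing:
            problems[reaction.get("id")] = missing
    return problems


SAMPLE = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="test_glycolysis">
    <listOfSpecies>
      <species id="GLCi" compartment="cytosol"/>
      <species id="G6P" compartment="cytosol"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="HK">
        <listOfReactants>
          <speciesReference species="GLCi"/>
          <speciesReference species="ATP"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="G6P"/>
          <speciesReference species="ADP"/>
        </listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""

print(undeclared_species(SAMPLE))  # ATP and ADP are referenced but never declared
```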

Figure 6:

Original Teusink model with deliberately introduced mistakes.

The selected users were able to carry out the curation of the proposed models. The less-experienced users focused their annotations on pointing out errors, but did not modify the model because of their inexperience with modelling tasks.

In contrast, the more-experienced users were able to discover and correct the introduced errors despite their limited knowledge of IT tools. These users created new versions of the proposed model in which the introduced errors were removed. Specifically, they introduced the correct annotations for the following purposes:

to add new compounds (such as dihydroxyacetone phosphate);
to add new reactions (such as glucose transport);
to change the compartment of some compounds; and
to replace compounds (for example, substituting the less specific term Triose-phosphate with Glyceraldehyde-3-phosphate, the product of Aldolase and the substrate of Glyceraldehyde-3-phosphate dehydrogenase).
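Replacing a compound amounts to updating every reaction role in which it appears. As a minimal sketch, assuming reactions are plain records of substrates, products, activators and inhibitors (the `Reaction` class and the identifiers `TrioseP`, `GAP`, etc. are hypothetical, and the reactions are simplified), the substitution produces a new version while leaving the original untouched, in line with the versioning approach:

```python
from __future__ import annotations

from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class Reaction:
    """Hypothetical, simplified reaction record (kinetic laws omitted)."""
    rid: str
    substrates: list[str]
    products: list[str]
    activators: list[str] = field(default_factory=list)
    inhibitors: list[str] = field(default_factory=list)


def replace_compound(reactions: list[Reaction], old: str, new: str) -> list[Reaction]:
    """Return a new version where `new` takes every role played by `old`.

    The input list is deep-copied, so the original version is left untouched."""
    version = deepcopy(reactions)
    for reaction in version:
        for role in (reaction.substrates, reaction.products,
                     reaction.activators, reaction.inhibitors):
            role[:] = [new if compound == old else compound for compound in role]
    return version


original = [
    Reaction("ALD", substrates=["F16BP"], products=["DHAP", "TrioseP"]),
    Reaction("GAPDH", substrates=["TrioseP", "NAD"], products=["BPG", "NADH"]),
]
curated = replace_compound(original, "TrioseP", "GAP")
```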

Users could also modify the reactions themselves, changing not only products and substrates but also activators, inhibitors, kinetic laws and kinetic parameters. However, in the described use cases they did not do so, because this would have required more specific knowledge of the metabolic pathway or an analysis of the bibliography.

RELATED WORK

We analysed some of the best-known metabolic modelling applications (Table 1), namely COPASI [10], Payao [6] (based on CellDesigner [4]), ByoDyn [7], BioPP [8], WikiPathways [9] and Sycamore [11], with respect to their social networking capabilities. We also found social curation efforts in a more specific area, kinetic literature curation, for example in the SABIO-RK [20] database. The characteristics considered include the ability to offer a powerful simulation service, a kinetic service providing data stored in the literature or in databases, and bibliographic tools. Pathway model management was also considered, because it is the main requirement for building a social platform. In relation to model management, we checked model-versioning capabilities to determine whether the tools support the management of improvements and corrections to shared models. Pathway model versioning is needed for social metabolic pathway curation because it allows multiple ways of curating the same model to be evaluated, leaving the community to choose the best among them. In this way, community knowledge, expressed as model annotations, can be used to prune invalid pathway models.

Table 1:


Comparison of social and distributed capabilities across common modelling applications

Capability  Payao  ByoDyn  BioPP  WikiPathways  COPASI  Sycamore  SBMM SPA
SS          ×      ×       –      –             ×       ×         –
KS          –      ×       –      –             –       ×         ×
BT          –      –       –      –             –       –         ×
MM          ×      ×       ×      ×             –       ×         ×
MV          –      –       –      ×             –       –         ×
MTC         ×      –       ×      ×             –       –         ×

SS, simulation service; KS, kinetic searcher; BT, bibliography tools; MM, model management; MV, model versioning; MTC, model tagging and commenting; ×, functional capability; –, capability not available.


CellDesigner [4], first released in 2003, emerged as one of the most widely used process diagram editors for biochemical networks. It offered a useful, SBML-compatible graphical user interface that visualized metabolic models using a draft of the SBGN standard, although its simulation capabilities needed improvement. CellDesigner now provides a plugin API that allows users to extend the functionality of the application and to connect it to SBW-powered [21] simulators. It also provides a feature for searching SABIO-RK [20]. It is a free-use Java program downloadable from http://www.celldesigner.org.

The BioModels database [22] was published in January 2006 as the first attempt to centralize correctly curated kinetic models. BioModels works as follows: (i) the owner of a model submits it to BioModels; (ii) the BioModels curators carry out a curation process (consistency checking, curation and simulation); (iii) each part of the model is annotated; and (iv) finally, the model is published and made openly accessible to third parties. The database has since been improved [23] and is still growing, with >450 reactions and models. A feature allowing users to collaborate in the curation process could increase the quality of the models available in this repository. BioModels is SBML-compatible, and its models are available at http://www.ebi.ac.uk/biomodels-main.

COPASI [10], first published in December 2006, is arguably the most complete SBML-compatible simulator for biochemical networks. However, COPASI lacks the collaborative features needed for social curation and simulation. It is available at http://www.copasi.org in both a free version and a commercial one.

ByoDyn [7], also launched in 2006, started as a promising SBML-compatible editor and simulator for metabolic pathways, providing a remote manager and a repository for managing the metabolic models generated with the application. It is a web-based solution available at http://cbbl.imim.es:8080/ByoDyn.

In May 2007, BioPP [8] was published. It allows users to export an SBML model to HTML, either for their own purposes or to make it accessible to the scientific community, with model elements hyperlinked to related data repositories. It is well suited to deploying final solutions for fully detailed information searches, but it does not allow the tagging, annotation or versioning of models, which are the mechanisms needed for efficient curation by a medium-to-large community of curators. Thus, it is currently useful only for small communities. In summary, by 2007 there were two groups of metabolic pathway applications: on one side, stand-alone editors and simulators for metabolic models; on the other, applications introducing features for curating metabolic pathways collaboratively. However, the integration of these functionalities was still insufficient for social curation, and could not prevent a growing number of groups from curating their own metabolic models separately. A turning point was the publication of WikiPathways [9] in April 2008. This wiki was the first application to provide the essential features for social curation, allowing pathways to be managed, edited and versioned efficiently and easily. However, it does not support the export of pathways to the SBML [18], BioPAX [24], CellML [25] or PSI-MI [26] formats, which would allow users to exchange and simulate models correctly, to annotate each reaction or compound with standard identifiers, and to use the models with the dozens of applications that support these standards (e.g. SBML). It currently has perhaps the largest user community (>1000 users). It is available at http://www.wikipathways.org.

With the publication of Sycamore (June 2008) and the SBMM Assistant (January 2009), the concept of assistance was introduced to help users curate metabolic models. Both approaches provide automatic kinetic searching. Sycamore is a web-browser application able to construct, simulate and analyse metabolic models. It is useful for building annotated kinetic models because it provides resources for extracting data from SABIO-RK. It does not provide capabilities for social curation, but it does provide tools for users to manage their models. The SBMM Assistant integrates curated kinetic databases such as SABIO-RK, BRENDA, ChEBI, KEGG and UniProt, managed by the KOMF [27] mediator using the AMMO ontology [28]. Sycamore is an SBML-compatible web-based application available at http://sycamore.eml.org/sycamore under a free academic-use license, and the SBMM Assistant is an SBML-compatible Java application available at http://www.sbmm.uma.es under a Creative Commons license.
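The mediator architecture mentioned above can be illustrated generically. The sketch below is not the KOMF mediator or its API: it assumes hypothetical per-database wrappers that already map their results onto shared concept names, sends one query to each, skips unreachable sources and groups the answers by concept.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical record format: {"concept": <shared concept name>, "value": <raw value>}.
Record = Dict[str, str]
# A wrapper queries one database for an enzyme (EC number) and maps its fields
# onto shared concept names; here wrappers are plain callables.
Wrapper = Callable[[str], List[Record]]


def mediate(ec_number: str, wrappers: Dict[str, Wrapper]) -> Dict[str, List[Record]]:
    """Send one query to every wrapper and merge the answers by concept."""
    merged: Dict[str, List[Record]] = defaultdict(list)
    for source, wrapper in wrappers.items():
        try:
            records = wrapper(ec_number)
        except OSError:
            # An unreachable source should not abort the whole search.
            continue
        for record in records:
            # Keep track of where each value came from.
            merged[record["concept"]].append({**record, "source": source})
    return dict(merged)
```

Grouping by a shared concept name is what lets values from different databases be compared side by side; in a real system, that mapping is where an ontology would do its work.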

Finally, Payao appeared in April 2010; it uses CellDesigner to display SBML models. It supports privilege levels, the tagging of targets (such as reactions) and commenting on these tags, all in real time and concurrently. Models can be managed in three ways (all, favorites, own). Payao does not currently support model versioning, although model updating is planned; nevertheless, it is well on the way to building a useful community of users for curating metabolic pathways. It is an SBML-compatible web application available at http://sblab.celldesigner.org/Payao10/bin.

All the tools discussed above fall short of providing a complete set of features for the community curation of metabolic pathways, such as searching capabilities, a metabolic pathway repository, annotation tools and metabolic pathway versioning. Thus, existing tools solve the problem only partly, whereas SPA aims to provide a complete solution for metabolic pathway curation based on community knowledge. Versioning is an important characteristic for community curation because it is a means of producing improved versions of metabolic pathways; SPA shares this characteristic with WikiPathways. Payao, BioPP and WikiPathways also offer social curation capabilities. Additionally, SPA provides a bibliographic tool, a unique feature that enables the analysis of the context of a given metabolic pathway.

In databases such as SABIO-RK, curation is restricted to a small group of curators. These curation tasks therefore require considerable effort to curate a relatively low volume of data, a situation that could be improved by social curation capabilities. Software solutions that support social curation would thus provide powerful tools for curating high volumes of data easily and at low cost.

Integrating social capabilities with data tools (e.g. a kinetic searcher) is necessary to enable the collaborative development of knowledge by both new and experienced curators.

DISCUSSION AND CONCLUSIONS

Curation tools must evolve towards social curation [29], because it is impossible (in terms of both people and resources) to manage the huge and constantly increasing quantity of biological information produced by experimental methodologies.

In the future, all new tools, upgrades of existing ones and collaborations should be based on standards such as the SBML interchange format, the SBGN graphical notation and MIRIAM, the standard for the minimum information requested in the annotation of biochemical models. These standards will provide the common ground for work across multiple users and tools.

We have developed a novel tool that tests some basic theoretical aspects of the future of data curation [30], based on metabolic pathway curation. Compared with conventional metabolic pathway curation, this solution offers the following advantages:

many eyes can look for model inconsistencies, whereas in conventional curation only a few do. This can improve the quality of the results, because controlled discussions help to choose the best options, and it can also speed up curation, because the cumulative experience of many users helps to prevent bottlenecks;
the controlled creation of models and versions produces new, ordered knowledge, which would not be the case if many files were shared as an unordered collection of names;
the use of a controlled vocabulary keeps curation errors to a minimum and also helps with the organisation, maintenance and querying of the constantly incoming metabolic pathway models; and
building the extensions on an easy-to-use yet powerful graphical user interface makes the curation task easier for inexperienced users.

This tool has been tested with two user groups in a small-scale experiment. Because these results are limited by the small number of users involved in the use cases, future work will extend the use cases to enough users to ensure statistical significance; these experiments will allow us to assess the usability of SPA and the usefulness of the collaborative curation of metabolic pathways. The following upgrades are planned for the near future:

Currently, changes are annotated manually by the user. We intend to provide an automatic change controller that tracks the differences between a model or version and its other versions.
The bibliography tool would be improved by establishing relationships between an article and its citations.
Social curation allows users to annotate inconsistencies within a metabolic pathway. However, parallel curation efforts may be started for the same metabolic pathway. In future work, we plan to detect such situations so that similar efforts on the same pathway can be combined. This requires a mechanism for finding resources shared between models, so that users can propose removing duplicate pathways or converting a model into a version of another model.
The owner of a kinetic model will be able to act as its moderator, delegate this task to another user or let the community assume the moderator role.
We believe that, in the near future, a necessary improvement for social curation will be the management of kinetic model results (e.g. simulation results). This capability will enable an in-depth analysis of model quality and usefulness.
Social networking can also support teaching, and the tool could prove effective in improving students’ skills in this field. Thus, the tool could be included in courses in which pathway modelling is taught.
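Both the automatic change controller and the detection of parallel curation efforts are future work. As a minimal sketch, assuming each version can be reduced to sets of species and reaction identifiers, both could rest on set comparisons: a diff between two versions, and a Jaccard overlap score that flags pairs of models as possible duplicates. The function names and the 0.8 threshold are illustrative assumptions, not part of SPA.

```python
from __future__ import annotations


def diff_versions(old: dict[str, set[str]], new: dict[str, set[str]]) -> dict[str, set[str]]:
    """Added and removed identifiers between two versions of a model.

    Each version is given as {"species": {...}, "reactions": {...}}."""
    changes: dict[str, set[str]] = {}
    for kind in ("species", "reactions"):
        before, after = old.get(kind, set()), new.get(kind, set())
        changes[f"added_{kind}"] = after - before
        changes[f"removed_{kind}"] = before - after
    return changes


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two sets of identifiers (1.0 means identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def likely_duplicates(models: dict[str, set[str]],
                      threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Pairs of models whose identifier sets overlap at least `threshold`."""
    names = sorted(models)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = jaccard(models[first], models[second])
            if score >= threshold:
                pairs.append((first, second, score))
    return pairs
```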

Key points

A bibliographic tool that allows users to discover scientific networks through the links between papers related to a metabolic pathway, reaction or biological component.
A curation tool that enables individual users to edit metabolic pathway elements (metabolites, reactions and their kinetics).
A social network tool that provides users with a way to store metabolic pathway models and curate them collaboratively.

FUNDING

Plan Andaluz de Investigación (BIO-267, P07-CVI-02999, P07-TIC-02978, TIC-136), the Spanish Ministry of Sciences and Innovation (TIN2008-04844, SAF2008-02522, PS09/02216) and Fundación Ramón Areces. The ‘CIBER de Enfermedades Raras’ is an initiative from the ISCIII (Spain).

Acknowledgements

We would like to thank Daniel Pastor for his constant testing of the tool and advice, Amine Kerzazi for his continuous maintenance of the data wrappers and Ian Morilla for his invaluable help in composing the tutorial. We would also like to mention the undergraduate students enrolled in a Metabolic Biochemistry Group and the graduate students enrolled in a master’s course on ‘Analysis and Modelling of Complex Biological Systems’ for testing the system in the use cases described in this manuscript.

References

1. Kitano H. Computational systems biology. Nature 2002;420:206-10.

2. Reyes-Palomares A, Montanez R, Real-Chicharro A, et al. Systems biology metabolic modeling assistant: an ontology-based tool for the integration of metabolic data in kinetic modeling. Bioinformatics 2009;25:834-5.

3. Le Novere N, Finney A, Hucka M, et al. Minimum information requested in the annotation of biochemical models (MIRIAM). Nat Biotechnol 2005;23:1509-15.

4. Funahashi A, Tanimura N, Morohashi M, et al. CellDesigner: a process diagram editor for gene-regulatory and biochemical networks. BIOSILICO 2003;1:159-62.

5. Kitano H. A graphical notation for biochemical networks. BIOSILICO 2003;1:169-76.

6. Matsuoka Y, Ghosh S, Kikuchi N, et al. Payao: A Community Platform for SBML Pathway Model Curation. Bioinformatics 2010;26:1381-3.

7. ByoDyn: integrative tool for Systems Biology. http://cbbl.imim.es:8080/ByoDyn (2010, date last accessed).

8. Viswanathan GA, Nudelman G, Patil S, et al. BioPP: a tool for web-publication of biological networks. BMC Bioinformatics 2007;8:168.

9. Pico AR, Kelder T, van Iersel MP, et al. WikiPathways: pathway editing for the people. PLoS Biol 2008;6:e184.

10. Hoops S, Sahle S, Gauges R, et al. COPASI – a COmplex PAthway SImulator. Bioinformatics 2006;22:3067-74.

11. Weidemann A, Richter S, Stein M, et al. SYCAMORE – a systems biology computational analysis and modeling research environment. Bioinformatics 2008;24:1463-4.

12. The Universal Protein Resource (UniProt) 2009. Nucleic Acids Res 2009;37:D169-74.

13. Kanehisa M, Araki M, Goto S, et al. KEGG for linking genomes to life and the environment. Nucleic Acids Res 2008;36:D480-4.

14. Degtyarenko K, de Matos P, Ennis M, et al. ChEBI: a database and ontology for chemical entities of biological interest. Nucleic Acids Res 2008;36:D344-50.

15. Chang A, Scheer M, Grote A, et al. BRENDA, AMENDA and FRENDA the enzyme information system: new content and tools in 2009. Nucleic Acids Res 2009;37:D588-92.

16. Rojas I, Golebiewski M, Kania R, et al. Storing and annotating of kinetic data. In Silico Biol 2007;7:S37-44.

17. Navas-Delgado I, Aldana-Montes JF. Extending SD-Core for Ontology-based Data Integration. j-jucs 2009;15:3201-30.

18. Hucka M, Finney A, Sauro HM, et al. The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 2003;19:524-31.

19. Teusink B, Passarge J, Reijenga CA, et al. Can yeast glycolysis be understood in terms of in vitro kinetics of the constituent enzymes? Testing biochemistry. Eur J Biochem 2000;267:5313-29.

20. Funahashi A, Jouraku A, Matsuoka Y, et al. Integration of CellDesigner and SABIO-RK. In Silico Biol 2007;7:S81-90.

21. Hucka M, Finney A, Sauro HM, et al. The ERATO Systems Biology Workbench: enabling interaction and exchange between software tools for computational biology. Pac Symp Biocomput 2002:450-61.

22. Le Novere N, Bornstein B, Broicher A, et al. BioModels Database: a free, centralized database of curated, published, quantitative kinetic models of biochemical and cellular systems. Nucleic Acids Res 2006;34:D689-91.

23. Li C, Donizelli M, Rodriguez N, et al. BioModels Database: an enhanced, curated and annotated resource for published quantitative kinetic models. BMC Syst Biol 2010;4:92.

24. BioPAX – biological pathways exchange language. http://www.biopax.org (23 September 2010, date last accessed).

25. CellML. http://www.cellml.org/ (23 September 2010, date last accessed).

26. PSI-MI. http://www.psidev.info/ (23 September 2010, date last accessed).

27. Roldan-Garcia Mdel M, Navas-Delgado I, Kerzazi A, et al. KA-SB: from data integration to large scale reasoning. BMC Bioinformatics 2009;10(Suppl 10):S5.

28. AMMO ontology. http://cath.gisum.uma.es:8080/ontologies/AMMO.owl (2010, date last accessed).

29. Lambert P, Tan L, Turner K, et al. Data curation standards and social science occupational information resources. Int J Digital Curation 2007;2:73-91.