STITCH 2: an interaction network database for small molecules and proteins

ABSTRACT

In recent years, the publicly available knowledge on interactions between small molecules and proteins has increased steadily. To create a network of interactions, STITCH aims to integrate the data dispersed over the literature and various databases of biological pathways, drug–target relationships and binding affinities. In STITCH 2, the number of relevant interactions is increased by incorporating BindingDB, PharmGKB and the Comparative Toxicogenomics Database. The resulting network can be explored interactively or used as the basis for large-scale analyses. To facilitate links to other chemical databases, we adopt InChIKeys, which allow chemicals to be identified by a short, checksum-like string. STITCH 2 connects proteins from 630 organisms to over 74 000 different chemicals, including 2200 drugs. STITCH can be accessed at http://stitch.embl.de/.

INTRODUCTION

The effects of small molecules on organisms have long been the focus of biochemistry and pharmacology. In recent years, there has been a considerable increase in the number of high-throughput screens performed using chemical libraries (1–3). At the same time, the molecular targets of individual chemicals are being studied in ever greater detail (4,5). There is also great interest in chemical biology approaches that use small molecules to perturb cellular functions (6). For the design and interpretation of these studies, the context of the chemicals and proteins needs to be considered. For example, in the case of high-content screening for specific cellular effects, it is important to know whether the active chemicals already have known activities that can explain the observed effects, or whether novel mechanisms of action might be present. Therefore, we have developed a ‘search tool for interactions of chemicals’ (STITCH) both as a large-scale, downloadable database of interaction data and as an interactive web tool for the exploration of interaction networks (Figure 1). Since its first release (7), STITCH has been accessed by over one hundred scientists each week and has been used as a source of protein–chemical associations, e.g. by Prathipati et al. (8), who used the STITCH network to automatically extract the targets of anti-tuberculosis compounds in Mycobacterium tuberculosis.

Figure 1. Interaction network around aspirin. Human proteins predicted to interact with aspirin according to different sources of evidence are shown. Edges are colored according to the source of evidence (magenta: experimental information, cyan: manually curated databases, yellow: text-mining). Clicking on the node ‘aspirin’ will display a pop-up showing the structure and description.

Here, we present the second version of STITCH. In addition to the sources of protein–chemical interactions included in the previous version—PDSP K_i Database (9), Protein Data Bank (PDB) (10), KEGG (11), Reactome (12), NCI-Nature Pathway Interaction Database (http://pid.nci.nih.gov), DrugBank (13) and MATADOR (14)—we now further include interactions imported from GLIDA (15), PharmGKB (16,17), Comparative Toxicogenomics Database (CTD) (18) and BindingDB (19). These added databases mainly provide information on interactions between human proteins and drugs or drug-like molecules.

The imported sources of information are scored separately and then combined with information from text-mining (7). Databases that contain manually annotated interactions receive high scores, while interactions based on experimental information are scored according to the confidence or relevance of the reported interaction. The number of high-confidence (score ≥ 0.7) human chemical–protein interactions increased from 51 000 to 85 000. For these high-confidence interactions, the number of interacting human proteins increased from 5300 to 7400 (as STITCH is locus-based, only one gene product is counted per gene).
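The text above does not spell out how the separately scored sources are merged. As a rough illustration only, the following sketch assumes that each evidence channel yields an independent, probability-like score and combines the channels with a noisy-OR before applying the 0.7 high-confidence cutoff quoted above. The noisy-OR rule, the function names, the channel names and the example scores are all assumptions made for this example and are not taken from the STITCH pipeline.

```python
from math import prod

HIGH_CONFIDENCE = 0.7  # cutoff quoted in the text


def combine_scores(channel_scores):
    """Noisy-OR combination of per-channel scores (an assumption, not the published formula)."""
    scores = list(channel_scores)
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
    return 1.0 - prod(1.0 - s for s in scores)


def high_confidence(interactions, cutoff=HIGH_CONFIDENCE):
    """Keep the interactions whose combined score reaches the cutoff."""
    kept = {}
    for pair, channels in interactions.items():
        score = combine_scores(channels.values())
        if score >= cutoff:
            kept[pair] = round(score, 3)
    return kept


# Made-up example scores, for illustration only.
example = {
    ("chemical_A", "protein_X"): {"experimental": 0.5, "textmining": 0.5},
    ("chemical_A", "protein_Y"): {"textmining": 0.3},
}
print(high_confidence(example))
```

Under this assumed rule, two moderately confident channels reinforce each other, so an interaction can pass the cutoff even if no single source would on its own.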

INCREASING THE NUMBER OF SPECIFIED ACTIONS

The STITCH network is created by mapping interactions from the sources mentioned above and from text-mining onto a consolidated set of chemicals that has been derived from PubChem, assigning a confidence score for each interaction (7). The newly derived protein–chemical and chemical–chemical associations are then complemented with protein–protein interactions from the STRING database (20). In the previous version of STITCH (7), we began to import ‘actions’ derived from natural language processing (NLP), pathway databases and interaction databases. These actions specify the nature of the interaction independent of the source of interaction information. For example, a ‘binding’ action could be derived from a binding affinity database and an ‘inhibition’ action could be imported from NLP. We have greatly extended the set of available actions by further importing action types from GLIDA (15), PharmGKB (16,17), CTD (18), BindingDB (19) and a manually annotated set of interactions. This set of interactions has been curated from DrugBank (13) records, results from NLP analysis of PubMed abstracts, Medical Subject Headings (MeSH) pharmacological actions, Anatomical Therapeutic Chemical classification (ATC) entries and a review paper on drugs and their targets (21). An action has been assigned to 81% of the high-confidence human chemical–protein interactions. The number of available edges with a high-confidence action annotation increased from 44 000 to 65 000 human chemical–protein interactions.

HANDLING OF CHEMICAL STRUCTURES

As described previously (7), STITCH creates a consolidated set of chemicals from PubChem (22) by merging stereoisomers and salt forms of the same molecule into one compound. This is done to ensure that all information about the same biologically active entity is merged. While this works very well for drugs that can be supplied in different formulations (e.g. different salt forms), it also has limitations, especially regarding carbohydrates. It is our long-term goal to associate interactions both with the individual isomer and the merged structure. For now, we have taken the step of explicitly displaying all the different compounds that have been deemed biologically equivalent (Figure 2).

Figure 2. Different structural scaffolds corresponding to aspirin. For the drug aspirin, a link to PubChem and a short description is shown. Different salts of aspirin that will have the same bioactivity have been consolidated and merged with the main, uncharged form. Below each chemical structure, the first part of the InChIKey is shown, corresponding to an encoded (hashed) description of the structure. This short string can be used to search for more information about the compound on the Internet.

Recently, the International Union of Pure and Applied Chemistry (IUPAC) has standardized an open format for chemical structures, namely the IUPAC International Chemical Identifier (InChI). In addition to the existing capability to search chemical structures using SMILES strings, we have now also implemented a search for InChIs. We use the tool Open Babel to convert InChIs to SMILES strings, which are in turn searched against our chemical database using hashed fingerprints as implemented in the open-source Chemistry Development Kit (23). Furthermore, we have implemented a search for InChIKeys, which are short strings that represent an encoded (hashed) form of the chemical structure. InChIKeys consist of two parts, the first of which is based on the chemical connectivity, whereas the second part contains information about stereochemistry, tautomers and other structural variations. As STITCH currently considers structures with the same connectivity to be equivalent (thus merging stereoisomers), only the first part of the InChIKey is queried against our chemical database. We also use this part of the InChIKey to provide links to Google and ChemSpider.
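The use of the first InChIKey part as a lookup key can be illustrated with a minimal sketch. It assumes that the first part is the block of 14 uppercase letters before the first hyphen, and it groups keys that share this block. The function names are hypothetical, and the two placeholder keys are invented strings that do not correspond to real compounds; only the aspirin key is a real identifier.

```python
import re
from collections import defaultdict

# Assumed format of the first InChIKey block: 14 uppercase letters.
_FIRST_BLOCK = re.compile(r"^[A-Z]{14}$")


def connectivity_block(inchikey):
    """Return the part of an InChIKey before the first hyphen."""
    first = inchikey.strip().upper().split("-", 1)[0]
    if not _FIRST_BLOCK.match(first):
        raise ValueError(f"not a valid InChIKey first block: {first!r}")
    return first


def group_by_connectivity(keys):
    """Group full InChIKeys that share the same first block."""
    groups = defaultdict(list)
    for key in keys:
        groups[connectivity_block(key)].append(key)
    return dict(groups)


ASPIRIN = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"  # InChIKey of aspirin
# Placeholder keys (not real compounds) that differ only after the first hyphen.
PLACEHOLDERS = ["AAAAAAAAAAAAAA-BBBBBBBBSA-N", "AAAAAAAAAAAAAA-CCCCCCCCSA-N"]

print(connectivity_block(ASPIRIN))
print(group_by_connectivity([ASPIRIN] + PLACEHOLDERS))
```

Because the lookup ignores everything after the first hyphen, two keys that differ only in their second part fall into the same group, which mirrors the merging of stereoisomers described above.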

USER INTERFACE IMPROVEMENTS

Many proteins, especially drug targets, have a large number of high-scoring interactions with small molecules in the STITCH network. A network centered on such a protein will show only chemicals unless a very large number of interaction partners is requested (Figure 3a). To let the user see more of the context of the query protein, we now offer the option to show a network in which proteins and chemicals each make up more than a third of the nodes (Figure 3b). When this option is selected, only a limited number of the highest-scoring chemicals are displayed. Further chemicals are omitted in favor of proteins (or vice versa) and the number of omitted nodes is shown to the user. If the network consists only of chemicals and no proteins are available at the current settings (e.g. due to a minimum score limit), the option to show more context is not offered.
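One way such a balanced selection could work is sketched below. The procedure is an assumption made for illustration and is not the STITCH implementation: it reserves roughly a third of the available slots for the best partners of each node type, fills the remaining slots by score, and reports the higher-scoring partners that were left out. The `Partner` type, the function name and the example scores are hypothetical.

```python
import math
from typing import NamedTuple


class Partner(NamedTuple):
    name: str
    kind: str      # "chemical" or "protein"
    score: float   # confidence score of the interaction with the query


def select_with_context(partners, limit=10):
    """Pick up to `limit` partners, reserving a third of the slots per kind when available."""
    if limit < 2:
        raise ValueError("limit must be at least 2")
    ranked = sorted(partners, key=lambda p: p.score, reverse=True)
    quota = math.ceil(limit / 3)

    # Reserve `quota` slots for the best partners of each kind, if available.
    chosen = []
    for kind in ("chemical", "protein"):
        chosen.extend([p for p in ranked if p.kind == kind][:quota])

    # Fill the remaining slots purely by score.
    chosen_names = {p.name for p in chosen}
    rest = [p for p in ranked if p.name not in chosen_names]
    chosen.extend(rest[: limit - len(chosen)])
    chosen.sort(key=lambda p: p.score, reverse=True)

    # Partners that outscore the weakest displayed node but were left out.
    chosen_names = {p.name for p in chosen}
    floor = min((p.score for p in chosen), default=0.0)
    skipped = [p for p in ranked if p.name not in chosen_names and p.score > floor]
    return chosen, skipped


# Made-up scores: 25 strong chemical partners and 4 weaker protein partners.
chemicals = [Partner(f"chem{i}", "chemical", 0.999 - 0.001 * i) for i in range(25)]
proteins = [Partner(f"prot{i}", "protein", 0.90 - 0.01 * i) for i in range(4)]
shown, skipped = select_with_context(chemicals + proteins, limit=10)
print(sum(p.kind == "protein" for p in shown), "proteins shown;",
      len(skipped), "partners skipped")
```

With these invented scores, a plain top-10 list would contain chemicals only, whereas the balanced selection trades lower-ranked chemicals for the available proteins and can tell the user how many stronger partners were omitted.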

Figure 3. Interactions of prostaglandin-endoperoxide synthase 1 (PTGS1). (a) The highest-scoring interaction partners of PTGS1 are non-steroidal anti-inflammatory drugs (NSAIDs). As the confidence scores for these interactions are very high, no interacting proteins are shown. (b) The user may ask STITCH to display more of the interaction context and to let at least one-third of the interaction partners be proteins. In this case, STITCH is skipping 19 high-scoring chemicals in order to include four interacting proteins. In both networks, the color of the edge corresponds to the type of connected nodes (e.g. green: chemical–protein interaction) and the width of the edge correlates with the confidence score.

Previously, STITCH required the user to select an organism when searching for interactions with a chemical. This is no longer required. When no organism is selected, the organism with the highest-scoring interaction partners is chosen. If multiple organisms have equal scores, human and several model organisms are preferentially selected. (Human is one of the highest-ranking species for 60% of the chemicals with protein–chemical interactions.) For example, the binding between the antipsychotic agent fluspiperone and the 5-hydroxytryptamine (serotonin) receptor 7 has only been studied in mouse and rat. Consequently, a user searching for this compound would be directed to the protein–chemical interaction network in mouse. It is also possible to restrict the search to different levels of the NCBI taxonomy (24), e.g. bacteria, fungi or rodents.
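The default organism choice described above can be sketched as follows. The sketch assumes that each organism is summarized by the score of its best interaction partner and that ties are broken by a fixed preference list. The order of that list, the function name and the example scores are assumptions; the numbers used as keys are NCBI taxonomy identifiers.

```python
# NCBI taxonomy identifiers; the order of this preference list is an assumption.
PREFERRED = [9606, 10090, 10116]  # human, mouse, rat


def pick_organism(best_score_by_taxon, preferred=PREFERRED):
    """Return the taxon with the highest score; break ties via `preferred`."""
    if not best_score_by_taxon:
        raise ValueError("no interactions for this chemical")
    top = max(best_score_by_taxon.values())
    tied = [t for t, s in best_score_by_taxon.items() if s == top]
    for taxon in preferred:
        if taxon in tied:
            return taxon
    return min(tied)  # arbitrary but deterministic fallback


# Made-up scores: a tie between human and mouse is resolved in favor of human.
print(pick_organism({9606: 0.9, 10090: 0.9, 7227: 0.8}))
# Made-up scores: no human data, mouse and rat tie, so mouse is selected.
print(pick_organism({10090: 0.95, 10116: 0.95}))
```

The second call mirrors the situation in the fluspiperone example, where an interaction studied only in mouse and rat leads the user to the mouse network.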

While central repositories of gene annotations exist, no comparable centralized resource is available for chemicals. To be able to display text annotations for chemicals, we have imported information from the following databases: DrugBank (13), the National Cancer Institute (NCI) thesaurus (25), and MeSH descriptors and qualifiers. Using STITCH's dictionary of chemical synonyms, we mapped compounds from these databases to STITCH identifiers. In cases where descriptions are available for different forms of the same compound (e.g. different salt forms, which have been merged in STITCH), we have automatically assigned the description of the main compound. Any remaining inconsistencies were manually resolved. For each chemical, we have assigned the text annotation from only one source, prioritizing sources as follows: NCI (descriptions), DrugBank (descriptions), DrugBank (pharmacology), DrugBank (drug category), MeSH (pharmacological action), NCI (tags) and MeSH (scope note). Descriptions are available for 33 352 chemicals, covering 33% of the chemicals with interactions.
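The source prioritization listed above amounts to a first-match lookup, which the following minimal sketch illustrates. The priority order is taken from the text; the tuple labels, the function name and the example annotation texts are hypothetical.

```python
# Priority order as listed in the text; the tuple labels are hypothetical.
PRIORITY = [
    ("NCI", "description"),
    ("DrugBank", "description"),
    ("DrugBank", "pharmacology"),
    ("DrugBank", "drug category"),
    ("MeSH", "pharmacological action"),
    ("NCI", "tags"),
    ("MeSH", "scope note"),
]


def pick_annotation(available):
    """Return (source_key, text) for the highest-priority non-empty annotation, or None."""
    for key in PRIORITY:
        text = available.get(key)
        if text:
            return key, text
    return None


# Placeholder texts, for illustration only.
example = {
    ("MeSH", "scope note"): "placeholder scope note",
    ("DrugBank", "pharmacology"): "placeholder pharmacology text",
}
print(pick_annotation(example))
```

Only one annotation is returned per chemical, so a lower-priority text is used solely when no higher-priority source provides one.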

USE CASES

The STITCH homepage offers several short tutorials to introduce the different query options (e.g. searching for a single identifier or multiple chemical structures). A search for ‘aspirin’ on the homepage will lead to the interaction network shown in Figure 1. Here, the main interactors of the drug are shown in human (which is selected automatically as described above). The known main targets, PTGS1 and PTGS2, are connected by very high scores. While most interaction partners are backed up by evidence from manually curated databases and are therefore very reliable, one interaction is derived only from text-mining: COX1 is actually a false positive arising from an ambiguous synonym.

Taken together, STITCH 2 offers an enlarged set of protein–chemical interactions, extended inter-database operability, increased query options and an improved user interface. STITCH can be accessed at http://stitch.embl.de/. Users can explore the interaction network interactively or download the complete set of interactions. In addition, we provide an application programming interface (API) to let scripts resolve identifiers and retrieve interaction networks either as an image or in standard network formats (20).

FUNDING

Klaus Tschira Foundation (to M.K. and A.B.). Novo Nordisk Foundation Center for Protein Research (partial). Funding for open access charge: European Molecular Biology Laboratory.

Conflict of interest statement. None declared.

REFERENCES

1. Han L, Wang Y, Bryant SH. A survey of across-target bioactivity results of small molecules in PubChem. Bioinformatics (2009) 25:2251–2255.

2. Zanzoni A, Soler-López M, Aloy P. A network medicine approach to human disease. FEBS Lett. (2009) 583:1759–1765.

3. Peterson RT. Chemical biology and the limits of reductionism. Nature Chem. Biol. (2008) 4:635–638.

4. Ovaa H, van Leeuwen F. Chemical biology approaches to probe the proteome. Chembiochem: Eur. J. Chem. Biol. (2008) 9:2913–2919.

5. Rix U, Superti-Furga G. Target profiling of small molecules by chemical proteomics. Nature Chem. Biol. (2009) 5:616–624.

6. Edwards AM, Bountra C, Kerr DJ, Willson TM. Open access chemical and clinical probes to support drug discovery. Nature Chem. Biol. (2009) 5:436–440.

7. Kuhn M, von Mering C, Campillos M, Jensen LJ, Bork P. STITCH: interaction networks of chemicals and proteins. Nucleic Acids Res. (2008) 36:D684–D688.

8. Prathipati P, Ma NL, Manjunatha UH, Bender A. Fishing the target of antitubercular compounds: in silico target deconvolution model development and validation. J. Proteome Res. (2009) 8:2788–2798.

9. Roth B, Lopez E, Patel S, Kroeze W. The multiplicity of serotonin receptors: uselessly diverse molecules or an embarrassment of riches? Neuroscientist (2000) 6:262.

10. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The Protein Data Bank. Nucleic Acids Res. (2000) 28:242.

11. Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M. From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res. (2006) 34:D354–D357.

12. Joshi-Tope G, Gillespie M, Vastrik I, D'Eustachio P, Schmidt E, de Bono B, Jassal B, Gopinath GR, Wu GR, Matthews L, et al. Reactome: a knowledgebase of biological pathways. Nucleic Acids Res. (2005) 33:D428–D432.

13. Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, Chang Z, Woolsey J. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. (2006) 34:D668–D672.

14. Günther S, Kuhn M, Dunkel M, Campillos M, Senger C, Petsalaki E, Ahmed J, Urdiales EG, Gewiess A, Jensen LJ, et al. SuperTarget and Matador: Resources for exploring drug-target relationships. Nucleic Acids Res. (2008) D919–D922.

15. Okuno Y, Yang J, Taneishi K, Yabuuchi H, Tsujimoto G. GLIDA: GPCR-ligand database for chemical genomic drug discovery. Nucleic Acids Res. (2006) 34:D673–D677.

16. Gong L, Owen RP, Gor W, Altman RB, Klein TE. PharmGKB: an integrated resource of pharmacogenomic data and knowledge. Curr. Protoc. Bioinformatics (2008) Chapter 14(Unit 14):17.

17. Hewett M, Oliver DE, Rubin DL, Easton KL, Stuart JM, Altman RB, Klein TE. PharmGKB: the Pharmacogenetics Knowledge Base. Nucleic Acids Res. (2002) 30:163–165.

18. Davis AP, Murphy CG, Saraceni-Richards CA, Rosenstein MC, Wiegers TC, Mattingly CJ. Comparative Toxicogenomics Database: a knowledgebase and discovery tool for chemical-gene-disease networks. Nucleic Acids Res. (2009) 37:D786–D792.

19. Liu T, Lin Y, Wen X, Jorissen RN, Gilson MK. BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Res. (2007) 35:D198–D201.

20. Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, Muller J, Doerks T, Julien P, Roth A, Simonovic M, et al. STRING 8—a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res. (2009) 37:D412–D416.

21. Imming P, Sinning C, Meyer A. Drugs, their targets and the nature and number of drug targets. Nature Rev. Drug Disc. (2006) 5:821–834.

22. Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, Federhen S, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. (2007) 35:D5–D12.

23. Steinbeck C, Hoppe C, Kuhn S, Floris M, Guha R, Willighagen E. Recent developments of the chemistry development kit (CDK)—an open-source java library for chemo- and bioinformatics. Curr. Pharm. Des. (2006) 12:2111–2120.

24. Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, Federhen S, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. (2007) 35:D5–D12.

25. Sioutos N, de Coronado S, Haber MW, Hartel FW, Shaiu WL, Wright LW. NCI Thesaurus: a semantic model integrating cancer-related clinical and molecular information. J. Biomed. Inform. (2007) 40:30–43.

© The Author(s) 2009. Published by Oxford University Press.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
