RAID v2.0: an updated resource of RNA-associated interactions across organisms

Abstract

With the development of biotechnologies and computational prediction algorithms, the number of experimentally identified and computationally predicted RNA-associated interactions has grown rapidly in recent years. However, these interactions are scattered across a wide variety of resources and organisms, and a fully comprehensive view of diverse RNA-associated interactions is still not available for any species. Hence, we have updated the RAID database to version 2.0 (RAID v2.0, www.rna-society.org/raid/) by integrating experimental and computationally predicted interactions from manual curation of the literature and from other database resources under one common framework. The new developments in RAID v2.0 include (i) an over 850-fold increase in the number of RNA-associated interactions compared with the previous version; (ii) numerous integrated resources that provide experimental or computational prediction evidence for each RNA-associated interaction; (iii) a reliability assessment for each RNA-associated interaction based on an integrative confidence score; and (iv) an increase in species coverage to 60. Consequently, RAID v2.0 contains more than 5.27 million RNA-associated interactions, including more than 4 million RNA–RNA interactions and more than 1.2 million RNA–protein interactions, involving nearly 130 000 RNA/protein symbols across 60 species.

INTRODUCTION

Recent developments have indicated that diverse RNA-associated (RNA–RNA/RNA–protein) interactions, like protein–protein interactions, are fundamental to cellular processes and essential for a system-level understanding of cellular behavior (1–4). In recent years, a wide variety of experimental and computational prediction techniques have therefore produced a growing number of RNA-associated interaction data sets. Most of these interactions are available in a variety of databases (5–9). Several databases primarily collect and manually curate RNA-associated interactions with experimental evidence from the literature. Other databases take a more general perspective on diverse RNAs and their partners in specific cellular processes. Still other resources predict RNA-associated interactions using computational algorithms. However, a fully comprehensive view of diverse RNA-associated interactions is still not available for any particular species.

Because the regulatory crosstalk between diverse RNAs and proteins remains poorly understood, we updated the RAID database (5) to version 2.0 (RAID v2.0, http://www.rna-society.org/raid/) by integrating experimental and computationally predicted interactions from manual curation of the literature and from 18 other resources under one common framework (Figure 1). Accordingly, RAID v2.0 offers several distinctive advantages: (i) integration of numerous resources, including experimental and computational prediction databases as well as manual curation of the literature (more than 5.27 million RNA-associated interactions, an over 850-fold increase compared with the previous version); (ii) an integrative confidence score for each RNA-associated interaction, on the premise that an integrated scoring strategy gives higher confidence when independent types of evidence agree; and (iii) mapping of RNA-associated interactions across numerous species to facilitate studies of homology (coverage increased to 60 species).

Figure 1.

Flowchart of database construction and statistics of RNA categories and interactions. (A) Overview of the RAID v2.0 database; (B) percentage of each RNA category in the RAID v2.0 database; (C) number of RNA–RNA/RNA–protein interactions for each RNA category in the RAID v2.0 database (bar heights are log10-transformed).


DATA COLLECTION

To update this version of the RAID database, we first screened the literature in the PubMed database (mainly from 2000–2016) with the following keyword combinations: (i) RNA–RNA interactions: (RNA symbols or RNA category names) and (RNA symbols or RNA category names) and (e.g. interaction or binding); (ii) RNA–protein interactions: (RNA symbols or RNA category names) and (protein symbols) and (e.g. interaction or binding). The relevant hits were downloaded and prepared systematically for further manual data curation. Second, RAID v2.0 integrated diverse RNA-associated interactions from 18 other databases, including ChIPBase (10), LncRNA2Target (11), LncRNADisease (7), miR2Disease (12), miRTarBase (13), MNDR (14), ncRDeathDB (8), NPInter (15), OncomiRDB (16), sRNATarBase (17), starBase (6), TransmiR (18) and ViRBase (19), as well as five computational prediction databases (DroID (20), EIMMo (21), miRanda (22), miRDB (23) and TargetScan (9)).
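The keyword combinations described above can be illustrated with a minimal sketch. The exact query strings used for the PubMed screening are not given in the text, so the function names `any_of` and `build_query`, the quoting style and the example terms (HOTAIR, EZH2) are assumptions chosen only for illustration.

```python
from typing import Sequence


def any_of(terms: Sequence[str]) -> str:
    """Join the terms of one group with OR and wrap the group in parentheses."""
    if not terms:
        raise ValueError("each term group needs at least one term")
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(
    group_a: Sequence[str],
    group_b: Sequence[str],
    relation_terms: Sequence[str] = ("interaction", "binding"),
) -> str:
    """Combine two interactor groups and the relation terms with AND.

    This mirrors the pattern (A) and (B) and (interaction or binding);
    it is a hypothetical helper, not the authors' actual search script.
    """
    return " AND ".join(any_of(g) for g in (group_a, group_b, relation_terms))


# (i) RNA-RNA pattern: RNA terms on both sides (illustrative terms only).
rna_rna_query = build_query(["miRNA", "microRNA"], ["lncRNA", "long non-coding RNA"])

# (ii) RNA-protein pattern: RNA terms on one side, protein symbols on the other.
rna_protein_query = build_query(["lncRNA", "HOTAIR"], ["EZH2"])

print(rna_rna_query)
print(rna_protein_query)
```

Keeping each group as a separate list makes it straightforward to generate one query per RNA symbol or category name and to extend the relation terms beyond the two examples the text mentions.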

For the RNA/protein names collected from different resources, RAID v2.0 mapped these names to either an official gene symbol or a miRBase ID and cross-referenced them to the NCBI alias, HGNC ID, Ensembl ID, OMIM ID, HPRD ID and UniProtKB protein accession, among others. Furthermore, to facilitate access to information from external resources, we linked the Entrez ID, miRBase accession and UniProtKB protein accession to the NCBI Gene, miRBase and UniProt (24) databases, respectively, from which a substantial amount of genome-associated data can be retrieved efficiently.
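A minimal sketch of this kind of name normalization is shown below. The alias table is a tiny hand-written stand-in (p53 for TP53 and HER2 for ERBB2 are well-known gene aliases); the actual mapping tables and procedure used for RAID v2.0 are not described in the text, and `ALIAS_TO_SYMBOL` and `normalize_symbol` are hypothetical names.

```python
from typing import Optional

# Hypothetical alias table: lower-cased alias -> official gene symbol.
# A real pipeline would load this from a gene-information resource.
ALIAS_TO_SYMBOL = {
    "tp53": "TP53",
    "p53": "TP53",
    "erbb2": "ERBB2",
    "her2": "ERBB2",
}


def normalize_symbol(name: str) -> Optional[str]:
    """Return the official symbol for a collected name, or None if unmapped."""
    key = name.strip().lower()
    return ALIAS_TO_SYMBOL.get(key)


collected_names = ["p53", " HER2 ", "TP53", "not_a_gene"]
for raw in collected_names:
    print(repr(raw), "->", normalize_symbol(raw))
```

Returning `None` for unmapped names, rather than guessing, leaves such entries visible for manual curation.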

INTEGRATIVE CONFIDENCE SCORES

In RAID v2.0, the RNA-associated interactions are collected from different types of resources under one common framework, including experimental, literature mining and computational prediction evidence. Furthermore, as in the miRTarBase database, the experimental evidence in RAID v2.0 was manually divided into strong experimental evidence (e.g. RNA immunoprecipitation and luciferase reporter assay) and weak experimental evidence (e.g. ChIP-seq and CLIP-seq), depending on the nature and qualitative annotation of the experimental method. Because different types and amounts of evidence support each RNA-associated interaction, the interactions stored in RAID v2.0 are not equally reliable. Because it is difficult for a user to assess the quality of each interaction, we developed an integrative confidence score system to facilitate the evaluation of the reliability of each RNA-associated interaction (25). An integrative confidence score that combines the scores from all of these evidence resources gives an overall estimate of the reliability of each RNA-associated interaction.

In principle, we assume that (i) experimental evidence contributes more to the confidence score than evidence derived from computational prediction algorithms; (ii) strong experimental evidence, which has a lower false positive rate, is more reliable than weak experimental evidence; and (iii) RNA-associated interactions supported by more evidence resources should be given higher confidence scores than those supported by fewer evidence resources. Therefore, we first assign quantitative confidence scores (strong experimental evidence: s_s, weak experimental evidence: s_w, computational prediction database: s_p) to each RNA-associated interaction based on the evidence types and the number of evidence resources as follows:

\begin{equation*} s_i = \begin{cases} 0, & x = 0 \\ \dfrac{w_i}{1 + e^{-x}}, & x > 0 \end{cases} \tag{1} \end{equation*}

where i denotes the evidence type (s: strong experimental evidence; w: weak experimental evidence; p: computational prediction database) and x is the number of evidence resources of that type. We set the weight factors w_s, w_w and w_p to 1, 0.75 and 0.25, respectively.
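Equation (1) can be written as a short function. This is a sketch that follows the formula and weight factors stated in the text; the dictionary keys `"strong"`, `"weak"` and `"predicted"` and the function name `evidence_score` are naming assumptions, not part of RAID v2.0.

```python
import math

# Weight factors w_s, w_w and w_p as stated in the text.
WEIGHTS = {"strong": 1.0, "weak": 0.75, "predicted": 0.25}


def evidence_score(evidence_type: str, n_resources: int) -> float:
    """Per-type score s_i: 0 when x = 0, otherwise w_i / (1 + exp(-x))."""
    if n_resources < 0:
        raise ValueError("the number of evidence resources cannot be negative")
    if n_resources == 0:
        return 0.0
    return WEIGHTS[evidence_type] / (1.0 + math.exp(-n_resources))


for evidence_type in WEIGHTS:
    scores = [round(evidence_score(evidence_type, x), 3) for x in range(4)]
    print(evidence_type, scores)
```

Because the logistic term rises from about 0.73 at x = 1 towards 1 as x grows, each score s_i is bounded above by its weight w_i, so additional resources of the same type give diminishing gains.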

Finally, an integrative confidence score (S) is calculated as:

\begin{equation*} S = 1 - \prod_{i} (1 - s_i) \tag{2} \end{equation*}

Hence, as illustrated in Supplementary Figure S1, the integrative confidence score reflects both the types of evidence and the number of resources supporting each RNA-associated interaction. The resulting score ranges between 0 and 1, and only well-supported interactions obtain a value close to 1. The score is therefore an effective tool for filtering interactions.
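The combination in Equation (2) can be sketched as follows, using the formulas and weight factors stated in the text. The function name `integrative_score`, the dictionary keys and the example evidence counts are illustrative assumptions and are not taken from RAID v2.0 records.

```python
import math
from typing import Mapping

# Weight factors w_s, w_w and w_p as stated in the text.
WEIGHTS = {"strong": 1.0, "weak": 0.75, "predicted": 0.25}


def integrative_score(resource_counts: Mapping[str, int]) -> float:
    """Compute S = 1 - prod_i (1 - s_i) from per-type resource counts."""
    remaining = 1.0  # running product of (1 - s_i)
    for evidence_type, x in resource_counts.items():
        if x < 0:
            raise ValueError("resource counts cannot be negative")
        # Equation (1): s_i is 0 without resources, else w_i / (1 + exp(-x)).
        s_i = 0.0 if x == 0 else WEIGHTS[evidence_type] / (1.0 + math.exp(-x))
        remaining *= 1.0 - s_i
    return 1.0 - remaining


# Illustrative evidence profiles (counts of supporting resources per type).
profiles = {
    "one prediction database": {"strong": 0, "weak": 0, "predicted": 1},
    "one weak experiment": {"strong": 0, "weak": 1, "predicted": 0},
    "one strong experiment": {"strong": 1, "weak": 0, "predicted": 0},
    "one resource of each type": {"strong": 1, "weak": 1, "predicted": 1},
}
for label, counts in profiles.items():
    print(f"{label}: S = {integrative_score(counts):.3f}")
```

The product form means that evidence types reinforce each other: an interaction supported by one resource of each type scores about 0.90, higher than any single type alone, in line with assumption (iii) that agreement between independent evidence raises confidence.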

DATABASE CONTENT AND CONSTRUCTION

In total, RAID v2.0 contains 5 272 396 RNA-associated entries (an over 850-fold increase compared with the previous version), including over 4 million RNA–RNA interactions and over 1.2 million RNA–protein interactions, involving 129 857 RNA/protein symbols. RAID v2.0 covers at least 13 RNA categories (circRNA, lncRNA, miRNA, mRNA, miscRNA, pseudogenes, rRNA, scRNA, sncRNA, snoRNA, snRNA, sRNA and tRNA) and 60 species in seven groups (bacteria, fungi, insects, nematodes, plants, vertebrates and viruses). More importantly, each RNA-associated interaction in RAID v2.0 is provided with an integrative confidence score, so users can select RNA-associated interactions using a threshold of their own choosing.
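Threshold-based selection can be sketched as below. The record layout (`Interaction` with two interactor fields and a score), the placeholder names and the example scores are assumptions made for illustration; they do not reflect the actual RAID v2.0 export format or real entries.

```python
from typing import Iterable, List, NamedTuple


class Interaction(NamedTuple):
    """Hypothetical record: two interactors and an integrative confidence score."""
    interactor_a: str
    interactor_b: str
    score: float


def filter_by_confidence(
    interactions: Iterable[Interaction], threshold: float
) -> List[Interaction]:
    """Keep interactions whose confidence score is at least the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie between 0 and 1")
    return [item for item in interactions if item.score >= threshold]


# Placeholder entries with made-up scores, for demonstration only.
example = [
    Interaction("miRNA_A", "mRNA_B", 0.18),
    Interaction("lncRNA_C", "protein_D", 0.73),
    Interaction("miRNA_E", "mRNA_F", 0.90),
]

for item in filter_by_confidence(example, threshold=0.5):
    print(item.interactor_a, item.interactor_b, item.score)
```

A higher threshold trades coverage for reliability; for instance, under the weight factors in the text a threshold above 0.25 would exclude interactions supported only by computational prediction databases, since their score cannot exceed w_p.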

A ‘Homology’ option has been added to the ‘Detail Information’ page to help users investigate the conservation of RNA-associated interactions among RNA orthologs/paralogs obtained from miRBase and NCBI HomoloGene (Supplementary Figure S2). In the current version, more than 80 000 RNAs/proteins have homology information.

In RAID v2.0, we have also modified the display of the predicted binding sites for RNA-associated interactions because several tools used in the previous version are no longer available. For RNA–RNA interactions, the binding sites and scores are predicted with miRanda (22) and RIsearch (26). For RNA–protein interactions, PRIdictor is used to predict RNA-binding residues in proteins. Additionally, RAID v2.0 presents the experimentally verified RNA-binding sites in proteins documented in the RBPDB (27), RsiteDB (28) and PDB (29) databases (Supplementary Figure S2).

On the updated ‘Browse’ page, users can access RAID v2.0 via three different paths: ‘Interaction Type’, ‘Species’ and ‘Detection Methods’. For user convenience, the page is organized as a tree view, and users can obtain browse results by clicking a node.

CONCLUSION AND FUTURE DIRECTIONS

In past decades, numerous protein–protein interaction databases have been established, including the widely used STRING database. This has led to a more comprehensive understanding of protein functions and cellular processes. However, recent developments have indicated that protein–protein interactions represent perhaps only half of the story in cells. The RNA-associated interactome is likely to be much larger and more complex than we can imagine. Currently, diverse RNA-associated interactions are scattered over a wide variety of resources and organisms, and a fully comprehensive view of all diverse RNA-associated interactions is still not available for any species. Consequently, we have updated the RAID database to version 2.0 by integrating manually curated literature and 18 other database resources under one common framework and by providing an integrative confidence score for each RNA-associated interaction. RAID v2.0 aims to provide a comprehensive and reliably assessed collection of RNA-associated interactions across organisms. Furthermore, because each RNA-associated interaction has an integrative confidence score, users can filter the RNA-associated interaction network at any threshold.

In the future, we will expand the database with more information, including RNA-binding domain annotation and 2D and 3D RNA structures, and we will improve the current computational prediction algorithm to generate our own predicted data. As more RNA-related information emerges, we may also improve the integrative confidence scoring strategy. We will follow new research progress closely and will continuously curate and update the reference data. Hence, complemented by the successful PPI databases, RAID will provide a valuable framework for better understanding the functional organization of the cell.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

Natural Science Foundation of Heilongjiang Province of China [C2015027]; Scientific Research Fund of Heilongjiang Provincial Education Department [12541426]; WeihanYu Youth Science Fund Project of Harbin Medical University. Funding for open access charge: Natural Science Foundation of Heilongjiang Province of China [C2015027]; Scientific Research Fund of Heilongjiang Provincial Education Department [12541426]; WeihanYu Youth Science Fund Project of Harbin Medical University.

Conflict of interest statement. None declared.

REFERENCES

1. Sumazin, Yang, Chiu H.S., Chung W.J., Iyer, Llobet-Navas, Rajbhandari, Bansal, Guarnieri, Silva et al. An extensive microRNA-mediated network of RNA-RNA interactions regulates established oncogenic pathways in glioblastoma. Cell 2011; 147: 370–381.

2. Guttman, Rinn J.L. Modular regulatory principles of large non-coding RNAs. Nature 2012; 482: 339–346.

3. Huerta-Cepas, Szklarczyk, Forslund, Cook, Heller, Walter M.C., Rattei, Mende D.R., Sunagawa, Kuhn et al. eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 2016; D286–D293.

4. Szklarczyk, Franceschini, Wyder, Forslund, Heller, Huerta-Cepas, Simonovic, Roth, Santos, Tsafou K.P. et al. STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2015; D447–D452.

5. Zhang, Chen, Yang, Fan, Dong, Liu, Tan et al. RAID: a comprehensive resource for human RNA-associated (RNA-RNA/RNA-protein) interaction. RNA 2014; 989–993.

6. Li J.H., Liu, Zhou, Qu L.H., Yang J.H. starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Res. 2014; D92–D97.

7. Chen, Wang, Qiu, Liu, Chen, Zhang, Yan, Cui. LncRNADisease: a database for long-non-coding RNA-associated diseases. Nucleic Acids Res. 2013; D983–D986.

8. Huang, Kang, Zhang, Jin, Tan, Zhang et al. ncRDeathDB: A comprehensive bioinformatics resource for deciphering network organization of the ncRNA-mediated cell death system. Autophagy 2015; 1917–1926.

9. Agarwal, Bell G.W., Nam J.W., Bartel D.P. Predicting effective microRNA target sites in mammalian mRNAs. Elife 2015; e05005.

10. Yang J.H., Jiang, Zhou, Qu L.H. ChIPBase: a database for decoding the transcriptional regulation of long non-coding RNA and microRNA genes from ChIP-Seq data. Nucleic Acids Res. 2013; D177–D187.

11. Jiang, Wang, Zhang, Jin, Han, Tan, Peng, Liu et al. LncRNA2Target: a database for differentially expressed genes after lncRNA knockdown or overexpression. Nucleic Acids Res. 2015; D193–D196.

12. Jiang, Wang, Hao, Juan, Teng, Zhang, Wang, Liu. miR2Disease: a manually curated database for microRNA deregulation in human disease. Nucleic Acids Res. 2009; D98–D104.

13. Chou C.H., Chang N.W., Shrestha, Hsu S.D., Lin Y.L., Lee W.H., Yang C.D., Hong H.C., Wei T.Y., Tu S.J. et al. miRTarBase 2016: updates to the experimentally validated miRNA-target interactions database. Nucleic Acids Res. 2016; D239–D247.

14. Wang, Chen, Kang, Fan, Yang et al. Mammalian ncRNA-disease repository: a global view of ncRNA-mediated disease network. Cell Death Dis. 2013; e765.

15. Hao, Yuan, Luo, Zhao, Chen. NPInter v3.0: an upgraded database of noncoding RNA-associated interactions. Database (Oxford) 2016; 2016: baw057.

16. Wang, Ding. OncomiRDB: a database for the experimentally verified oncogenic and tumor-suppressive microRNAs. Bioinformatics 2014; 2237–2238.

17. Wang, Liu, Zhao, Wang, Cao. sRNATarBase 3.0: an updated database for sRNA-target interactions in bacteria. Nucleic Acids Res. 2016; D248–D253.

18. Wang, Qiu, Cui. TransmiR: a transcription factor-microRNA regulation database. Nucleic Acids Res. 2010; D119–D122.

19. Wang, Miao, Jin, Wang, Qian et al. ViRBase: a resource for virus-host ncRNA-associated interactions. Nucleic Acids Res. 2015; D578–D582.

20. Murali, Pacifico, Guest, Roberts G.G. 3rd, Finley R.L. Jr. DroID 2011: a comprehensive, integrated resource for protein, transcription factor, RNA and gene interactions for Drosophila. Nucleic Acids Res. 2011; D736–D743.

21. Gaidatzis, van Nimwegen, Hausser, Zavolan. Inference of miRNA targets using evolutionary conservation and pathway analysis. BMC Bioinformatics 2007.

22. Betel, Koppal, Agius, Sander, Leslie. Comprehensive modeling of microRNA targets predicts functional non-conserved and non-canonical sites. Genome Biol. 2010; R90.

23. Wong, Wang. miRDB: an online resource for microRNA target prediction and functional annotations. Nucleic Acids Res. 2015; D146–D152.

24. UniProt. UniProt: a hub for protein information. Nucleic Acids Res. 2015; D204–D212.

25. Guo, Liu, Zheng. SynLethDB: synthetic lethality database toward discovery of selective and sensitive anticancer drug targets. Nucleic Acids Res. 2016; D1011–D1017.

26. Wenzel, Akbasli, Gorodkin. RIsearch: fast RNA-RNA interaction search using a simplified nearest-neighbor energy model. Bioinformatics 2012; 2738–2746.

27. Cook K.B., Kazan, Zuberi, Morris, Hughes T.R. RBPDB: a database of RNA-binding specificities. Nucleic Acids Res. 2011; D301–D308.

28. Shulman-Peleg, Nussinov, Wolfson H.J. RsiteDB: a database of protein binding pockets that interact with RNA nucleotide bases. Nucleic Acids Res. 2009; D369–D373.

29. Rose P.W., Prlic, Bluhm W.F., Christie C.H., Dutta, Green R.K., Goodsell D.S., Westbrook J.D., Woo et al. The RCSB Protein Data Bank: views of structural biology for basic and applied research and education. Nucleic Acids Res. 2015; D345–D356.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]
