ABSTRACT

Advances in high-throughput next-generation sequencing technology have reshaped the transcriptomic research landscape. However, exploration of these massive data remains a daunting challenge. In this study, we describe a novel database, deepBase, which we have developed to facilitate the comprehensive annotation and discovery of small RNAs from transcriptomic data. The current release of deepBase contains deep sequencing data from 185 small RNA libraries from diverse tissues and cell lines of seven organisms: human, mouse, chicken, Ciona intestinalis, Drosophila melanogaster, Caenorhabditis elegans and Arabidopsis thaliana. By analyzing ∼14.6 million unique reads that perfectly mapped to more than 284 million genomic loci, we annotated and identified ∼380 000 unique ncRNA-associated small RNAs (nasRNAs), ∼1.5 million unique promoter-associated small RNAs (pasRNAs), ∼4.0 million unique exon-associated small RNAs (easRNAs) and ∼6 million unique repeat-associated small RNAs (rasRNAs). Furthermore, 2038 miRNA and 1889 snoRNA candidates were predicted by miRDeep and snoSeeker. All of the mapped reads can be grouped into about 1.2 million RNA clusters. For the purpose of comparative analysis, deepBase provides an integrative, interactive and versatile display. A convenient search option, related publications and other useful information are also provided for further investigation. deepBase is available at: http://deepbase.sysu.edu.cn/.

INTRODUCTION

Next-generation ‘deep-sequencing’ technologies have enabled the detection and profiling of both known and novel small noncoding RNAs (ncRNAs) at unprecedented sensitivity and depth (1–3). Most studies to date have used 454 and Solexa technologies to discover new and different ncRNA classes in a multitude of species, including human (4–6), mouse (5,7,8), chicken (9,10), Ciona intestinalis (11), Drosophila melanogaster (12–15), Caenorhabditis elegans (16–18) and Arabidopsis thaliana (19–22). However, the analysis of these massive and heterogeneous deep sequencing data sets poses several challenges, including effective data mapping, annotation and visualization; efficient data storage and retrieval; integration and interpretation of data from multiple technological platforms, tissues and cell lines; and customizing the analysis so that a variety of biological questions can be addressed. Although the above-mentioned studies have targeted some of these individual steps in a specific genome, an integrated database that can meet all these basic needs for deep sequencing data is not yet available for animal and plant genomes.

Recent studies have shown that many small RNAs derived from annotated genomic elements, such as long ncRNAs, transcription start sites (TSSs) and transposable elements (TEs), can modulate diverse biological functions (6,23–29), raising the possibility that a large group of small RNAs originating from annotated genomic elements may still be hiding in eukaryotic genomes (6,23–29). However, in the past, sequence reads mapped to non-miRNA or non-piRNA gene families have been routinely discarded and not analyzed further. Intriguingly, a large number of highly abundant small RNAs derived from known ncRNAs often span the entire RNA locus, indicating that we not only can recapitulate known ncRNAs but also can identify novel ncRNAs by grouping these nearby small RNAs into clusters.

In this study, we describe the newly developed deepBase database for the comprehensive annotation and mining of deep sequencing data from 185 small RNA libraries from diverse tissues and cell lines of seven organisms (Figure 1). deepBase contains millions of small RNAs derived from known ncRNAs, protein-coding genes and repeat elements, as well as a massive number of unannotated small RNAs. In addition, we report about 1.2 million RNA clusters that include multiple classes of infrastructural ncRNAs (e.g. tRNAs, rRNAs, snRNA and snoRNAs), miRNAs, piRNA precursors and repeat-associated siRNA precursors, as well as numerous novel ncRNAs, some of which can be predicted as novel miRNAs and snoRNAs by miRDeep (5) and our snoSeeker programme (30). Finally, deepBase provides an integrative, interactive and versatile web graphical interface to display these data and facilitate transcriptomic research and the discovery of novel ncRNAs.

Figure 1. The basic framework of deepBase. All results generated by deepBase are deposited in relational databases and displayed in the visual browser and web page. The web-interface programmes and browser can be accessed by a wide range of research biologists to analyze and visualize data over the internet.


MATERIALS AND METHODS

One hundred and eighty-five small RNA libraries from diverse tissues and cell lines of seven organisms were compiled from 34 related studies (Supplementary Data) and downloaded from the NCBI GEO website (31). Known ncRNAs were downloaded from Ensembl (32) or the UCSC bioinformatics website (33), or were obtained from the literature. All known miRNAs were downloaded from miRBase [release 13.0, (34)]. Human and Arabidopsis snoRNAs were downloaded from snoRNABase (35) and the Plant snoRNA Database (36), respectively. All known tRNAs were downloaded from the Genomic tRNA Database (37). All refSeq genes and repeat elements (38,39) for animal genomes were downloaded from the UCSC bioinformatics website (33). Human (UCSC hg18), mouse (NCBI Build 37), chicken (Gallus gallus, v2.1), C. intestinalis (JGI v2.0), D. melanogaster (BDGP Release 5) and C. elegans (WS190) genome sequences were downloaded from the UCSC bioinformatics website (33). Arabidopsis (TAIR8) genome sequences, repeat elements and protein-coding genes were downloaded from the TAIR website (40).

All deep sequencing data downloaded from the NCBI GEO database were in SOFT format, and some raw data included 3′ adapters or barcodes. Where the raw data included 3′ adapters or barcodes, we clipped the reads using our in-house Perl scripts. After adapter removal, sequences shorter than 15 nt were discarded, as were low-complexity reads (41). All unique adapter-free reads in each library were mapped to the seven genomes using Bowtie (version 0.9.9.3) (42) with the options -f -k 200 -v 0, and only perfect matches over the entire read length were retained. The parameters -k 200 -v 0 instruct Bowtie to report up to 200 perfect hits for each read (42). Combining the mapped reads from all libraries, we found a total of ∼14.6 million unique reads that perfectly mapped to more than 284 million genomic loci. Finally, up to 50 perfect hits to each genome were considered per query read in the subsequent analysis. Because mismatches between the genome and the small RNA reads are not allowed, the current release of deepBase does not contain isomiRs with at least one mismatch to the genome (4). Such mismatches are usually generated by the addition of untemplated nucleotides to the 3′ terminus of miRNAs (4) or by RNA editing (4,43,44). The large amount of data generated in such a large-scale screen requires appropriate computational means for storage and processing. For this task, a MySQL database was created to store the mapped reads.
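As an illustration only, the adapter clipping, length filtering and collapsing to unique reads described above can be sketched in Python as follows. This is not the in-house Perl code: the function names, the `min_overlap` parameter and the handling of partial adapters at the read end are assumptions made for the example, and the low-complexity filter is omitted because its criteria are not given here.

```python
from typing import Iterable, Iterator, Optional

MIN_LENGTH = 15  # reads shorter than 15 nt are discarded (value from the text)


def clip_adapter(read: str, adapter: str, min_overlap: int = 6) -> str:
    """Return the read with its 3' adapter removed.

    min_overlap is an assumed parameter: a partial adapter at the very end
    of the read is clipped only if at least this many adapter bases match.
    """
    pos = read.find(adapter)
    if pos != -1:  # the full adapter occurs inside the read
        return read[:pos]
    # Otherwise look for the longest adapter prefix hanging off the 3' end.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def preprocess(reads: Iterable[str], adapter: Optional[str]) -> Iterator[str]:
    """Clip adapters (if any), drop short reads and yield each unique sequence once."""
    seen = set()
    for read in reads:
        seq = clip_adapter(read, adapter) if adapter else read
        if len(seq) < MIN_LENGTH or seq in seen:
            continue
        seen.add(seq)
        yield seq
```

The unique sequences produced in this way would then be written to a FASTA file and passed to Bowtie with the options quoted above.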

We define an RNA cluster as a group of small RNAs in which each small RNA is ≤70 nt from its nearest neighbour and whose cluster length is ≥45 nt. These parameters were determined based on our statistics for the distribution of the distance between two nearest neighbouring reads that mapped to known ncRNAs (Supplementary Data). Our analysis revealed that more than 92% of the known ncRNA precursors can be grouped into clusters (Supplementary Data). RNAfold (45) was applied to predict the RNA secondary structures of ncRNAs and RNA clusters.
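The cluster definition above can be illustrated with a short Python sketch. It assumes that reads are given as 1-based inclusive intervals on a single chromosome and strand, and that the distance between neighbours is measured from the end of the growing cluster to the start of the next read; neither detail is specified in the text, and the function name is hypothetical.

```python
from typing import List, Tuple

MAX_GAP = 70     # each read lies <= 70 nt from its nearest neighbour (from the text)
MIN_LENGTH = 45  # a cluster must span >= 45 nt (from the text)

Interval = Tuple[int, int]  # (start, end), 1-based inclusive


def cluster_reads(reads: List[Interval],
                  max_gap: int = MAX_GAP,
                  min_length: int = MIN_LENGTH) -> List[Interval]:
    """Group mapped reads into clusters and keep those that are long enough."""
    clusters: List[Interval] = []
    cur_start = cur_end = None
    for start, end in sorted(reads):
        if cur_start is None:
            cur_start, cur_end = start, end
        elif start - cur_end <= max_gap:
            # Overlapping or within the allowed gap: extend the current cluster.
            cur_end = max(cur_end, end)
        else:
            # Gap too large: close the current cluster and start a new one.
            if cur_end - cur_start + 1 >= min_length:
                clusters.append((cur_start, cur_end))
            cur_start, cur_end = start, end
    if cur_start is not None and cur_end - cur_start + 1 >= min_length:
        clusters.append((cur_start, cur_end))
    return clusters
```

In this sketch an isolated 22-nt read never forms a cluster on its own, because it cannot reach the 45-nt minimum length.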

Novel miRNA and snoRNA candidates were predicted from deep sequencing data by a modified miRDeep (5) and snoSeeker (30), respectively. For snoRNA prediction, RNA cluster sequences, extended by an additional 100 nt in both the 5′ and 3′ directions for each of the species, were extracted as the snoSeeker input data set. We applied the snoSeeker programme (30) to these RNA clusters with the following options: guide C/D ≥ 37.5 bits, orphan C/D ≥ 26.5 bits, guide H/ACA ≥ 40 bits and orphan H/ACA ≥ 27.0 bits. Novel snoRNA candidates that significantly overlapped with exons, repeat elements or other known ncRNAs were discarded. For miRNA prediction, the modified miRDeep programme (5) was run with default score options. To improve the search speed of miRDeep, we introduced the following modifications: (i) the sequence reads were mapped to the genome using Bowtie (42) rather than BLAST (41), and (ii) the sequences were extracted from the large genomes using our fetchSeq programme, which is written in C and is available from the authors upon request.
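For illustration, the 100-nt flank extension and the snoSeeker score cut-offs quoted above can be written as a small Python sketch. The names `extend_cluster`, `passes_cutoff` and `MIN_BITS` are hypothetical and are not part of snoSeeker, and clamping the extended interval to the chromosome bounds is an assumption of this example.

```python
from typing import Tuple

FLANK = 100  # nt added on each side of an RNA cluster (value from the text)

# Minimum bit scores quoted in the text, keyed by (snoRNA family, class).
MIN_BITS = {
    ("C/D", "guide"): 37.5,
    ("C/D", "orphan"): 26.5,
    ("H/ACA", "guide"): 40.0,
    ("H/ACA", "orphan"): 27.0,
}


def extend_cluster(start: int, end: int, chrom_length: int,
                   flank: int = FLANK) -> Tuple[int, int]:
    """Extend a 1-based inclusive interval by `flank` nt on both sides.

    The result is clamped to [1, chrom_length] (an assumption of this sketch).
    """
    return max(1, start - flank), min(chrom_length, end + flank)


def passes_cutoff(family: str, kind: str, bits: float) -> bool:
    """Return True if a candidate's bit score reaches the cut-off for its class."""
    return bits >= MIN_BITS[(family, kind)]
```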

Relative expression analysis was performed to determine the expression preferences of individual miRNAs and other ncRNAs across all small RNA libraries. The number of reads matching a particular ncRNA was counted, and each ncRNA count from each library was normalized to the total read number for that library. The normalized count of a particular ncRNA in a particular library was then divided by the sum of the normalized counts for that ncRNA across all libraries. These normalized values were transformed to 100 percentiles, and each bar in the heatmap represents the normalized level. Except for miRNAs, the heatmap reflects only a rough measure of total ncRNA expression, because most of the reads mapped to the other ncRNA species might be degradation products.
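The normalization described above can be sketched in Python as follows. The sketch reads the final transformation as scaling each library's share to a 0–100 range, which is one interpretation of the text rather than a confirmed detail, and the function and argument names are hypothetical.

```python
from typing import Dict


def relative_expression(counts: Dict[str, int],
                        totals: Dict[str, int]) -> Dict[str, float]:
    """Relative expression of one ncRNA across libraries, on a 0-100 scale.

    counts: reads matching the ncRNA in each library (missing library = 0 reads).
    totals: total read number of each library.
    """
    # Step 1: normalize each count to the library's total read number.
    normalized = {lib: counts.get(lib, 0) / totals[lib] for lib in totals}
    # Step 2: divide by the sum of normalized counts across all libraries.
    denom = sum(normalized.values())
    if denom == 0:
        return {lib: 0.0 for lib in totals}
    # Step 3: express each library's share on a 0-100 scale (assumed reading).
    return {lib: 100.0 * value / denom for lib, value in normalized.items()}
```

Dividing by the library size first means that a deeply sequenced library does not dominate the profile simply because it contains more reads.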

deepBase DATABASE

Annotation and identification of about 380 000 nasRNAs from millions of deep sequencing reads

Recent studies have shown that many small RNAs generated from long ncRNAs by specific biogenesis pathways can modulate and silence gene expression, indicating that further investigation of these small RNA data sets is worthwhile for discovering novel functional small RNAs (23,24,46). Moreover, miRNA-offset RNAs (moRs) generated from 60-nt pre-miRNAs have been identified in C. intestinalis, suggesting an intrinsic property of the miRNA processing machinery (11). In this study, all mapped sequences were intersected against all types of long ncRNAs, including miRNA precursors (miRBase v13), snoRNAs, tRNAs, rRNAs, snRNAs, scRNAs, Mt_tRNAs and misc_RNAs. A total of ∼58 800 unique reads and ∼380 000 unique ncRNA-associated small RNAs (nasRNAs) originated from 2013 miRNA precursors and the other 9719 known long ncRNAs (Table 1), respectively. All reads overlapping these RNA genes were stored in the MySQL database for searching and browsing in deepBase.

Table 1. Statistics in deepBase

                    Human        Mouse         Chicken   C. intestinalis   D. melanogaster   C. elegans   Arabidopsis
small RNA library   9            63            4         4                 31                25           49
Unique read         1 456 537    1 490 531     137 801   340 879           2 522 289         3 156 821    5 478 930
Locus number        22 437 894   215 546 228   782 488   3 590 208         19 760 563        7 402 057    14 613 634
nasRNA              49 703       99 657        10 370    5448              63 565            137 904      12 507
pasRNA              62 791       105 413       5633      46 411            142 645           459 139      697 750
easRNA              160 347      354 524       6666      1687              751 728           1 990 763    674 086
rasRNA              616 070      658 476       8099      34 300            1 409 439         293 658      2 907 928
RNA cluster         151 245      538 138       8801      62 583            77 113            215 226      114 235
Predicted miRNA     705          588           275       /                 134               336          /
Predicted snoRNA    378          603           124       263               145               197          179

Statistics indicating the numbers of small RNA library, unique read mapped to one or more loci, locus number, ncRNA-associated small RNAs (nasRNAs), promoter-associated small RNAs (pasRNAs), exon-associated small RNAs (easRNAs), repeat-associated small RNAs (rasRNAs), RNA cluster, predicted miRNAs and snoRNAs for the seven organisms, including human, mouse, chicken, C. intestinalis, D. melanogaster, C. elegans and Arabidopsis. Arabidopsis miRNA data are not present in the table because miRDeep (5) cannot effectively predict plant miRNAs. C. intestinalis miRNAs have been predicted previously by miRDeep (11).


Annotation and identification of abundant pasRNAs and easRNAs

A new class of transcripts was recently reported to originate near the expected TSSs upstream of protein-coding sequences (6,25–27,29). The existence of these promoter-associated small RNAs (pasRNAs) challenges our simplistic models of how the DNA sequences known as ‘promoters’ define TSSs (28). Moreover, many endogenous small interfering RNAs (endo-siRNAs) derived from protein-coding regions modulate gene expression and silencing (47,48). Thus, a genome-wide investigation of all of these small RNAs remains desirable because of the light it could shed on their biogenesis and function. In this study, all mapped reads were also intersected against the known refSeq genes and against the 350 nucleotides upstream and 150 nucleotides downstream of TSSs. Mapped reads overlapping TSSs were designated as pasRNAs (49,50). We divided the small RNAs overlapping exons into sense and antisense exon-associated small RNAs (easRNAs) according to their strand. A total of ∼1.5 million unique pasRNAs and ∼4.0 million unique easRNAs were identified from TSSs and protein-coding sequences, producing the most comprehensive database of pasRNAs to date (Table 1).
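As a minimal sketch of the intersection step described above, the following Python code builds the TSS window (350 nt upstream, 150 nt downstream) in a strand-aware manner and labels a read as sense or antisense relative to a feature. It assumes 1-based inclusive coordinates and treats any overlap with the window as sufficient; the function names are hypothetical and do not correspond to the deepBase pipeline.

```python
from typing import Tuple

UPSTREAM = 350    # nt upstream of the TSS (value from the text)
DOWNSTREAM = 150  # nt downstream of the TSS (value from the text)


def promoter_window(tss: int, strand: str) -> Tuple[int, int]:
    """Return the 1-based inclusive window around a TSS, oriented by gene strand."""
    if strand == "+":
        return max(1, tss - UPSTREAM), tss + DOWNSTREAM
    if strand == "-":
        # On the minus strand, "upstream" lies at higher genomic coordinates.
        return max(1, tss - DOWNSTREAM), tss + UPSTREAM
    raise ValueError("strand must be '+' or '-'")


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two inclusive intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def is_pasrna(read_start: int, read_end: int, tss: int, gene_strand: str) -> bool:
    """True if the read overlaps the promoter window of the given TSS."""
    w_start, w_end = promoter_window(tss, gene_strand)
    return overlaps(read_start, read_end, w_start, w_end)


def orientation(read_strand: str, feature_strand: str) -> str:
    """Label a read as sense or antisense relative to an exon or other feature."""
    return "sense" if read_strand == feature_strand else "antisense"
```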

Annotation and identification of abundant rasRNAs

A major system that controls the activity of TEs in flies and vertebrates is mediated by Piwi-interacting RNAs (piRNAs), 24–30 nucleotide RNAs that are bound by Piwi-class effectors (51–53). Previously, these piRNAs were grouped together based on their genomic location as repeat-associated small interfering RNAs (rasiRNAs) (54–61). Recent studies have also shown that many small interfering RNAs (siRNAs) from TEs play important roles in plants, fungi, Drosophila and vertebrates (54–61). To annotate and identify these repeat-associated small RNAs (rasRNAs), all mapped reads were also intersected with RepeatMasker annotations (38,39). The mapped small RNAs overlapping repeats were divided into sense and antisense rasRNAs. A total of ∼3.0 million unique sense and ∼3.0 million unique antisense rasRNAs were identified from repeat elements, producing the most comprehensive database of rasRNAs to date (Table 1).

RNA clusters and novel ncRNA discovery

After annotating and identifying the nasRNAs, we found that a large number of highly abundant ncRNA-associated small RNAs often span part of, or even the entire, RNA locus. Thus, an analysis of genomic clustering can be used to identify novel ncRNAs, hunt for hidden transcripts and determine whether small RNAs and clusters are differentially expressed in the sampled tissues. To cluster these small RNAs, we grouped all the mapped reads into about 1.2 million RNA clusters according to their distance (details in ‘Materials and Methods’ section). These clusters ranged in size from 45 nt to thousands of nt. All RNA clusters were intersected with known ncRNAs, and 1684 and 8364 RNA clusters were found to overlap known miRNAs and ncRNAs, respectively (Supplementary Data). Moreover, we found that 285 530 RNA clusters overlapped with the evolutionarily conserved elements generated by the PhastCons programme (62) in five organisms (Supplementary Data). These data suggest the possibility that a large group of novel ncRNAs, and perhaps even a novel class of ncRNAs, may still be hiding in eukaryotic genomes. To test this hypothesis, we applied a modified miRDeep (5) and our snoSeeker programme (30) to the deep sequencing data and these RNA clusters (details in ‘Materials and Methods’ section). We identified 1161 novel miRNA and 857 novel snoRNA candidates, in addition to 877 known miRNAs and 1032 known snoRNAs.

WEB INTERFACE

deepBase provides a variety of interfaces and graphical visualizations to facilitate analysis of the massive and heterogeneous small RNA data sets from different tissues, cell lines and technology platforms. We have also developed a new visualization tool, the deepView genome browser, to provide a quick overview of a particular region in the genome and to visually correlate various types of features (Figure 2, Supplementary Data). The deepView browser in deepBase provides an integrated view of mapped reads, known and predicted ncRNAs, protein-coding genes and RNA clusters and their expression peaks (Figure 2, Supplementary Data). Clicking a prediction or gene of interest launches a multiple-alignment trace viewer that displays all traces of genes or links to external resources such as NCBI, UCSC, miRBase and TAIR to obtain more comprehensive information. The libView browsers provide graphical comparisons of multiple libraries for the distribution of length and 5′-terminal nucleotide of small RNAs (Supplementary Data). We also provide the nasView graphical browser to facilitate comparisons across multiple small RNA libraries for ncRNAs, including miRNAs, snoRNAs, tRNAs, rRNAs, snRNAs, scRNAs, Mt_tRNAs and misc_RNAs (Supplementary Data). Expression profiles for ncRNAs are also provided to test for differential expression patterns among different tissues and cell lines (Supplementary Data). For small RNAs derived from diverse RNAs, RNA clusters and predicted ncRNAs, the database provides the sequence, genomic location, RNA secondary structures, references and annotations.

Figure 2. Snapshot of the deepView browser. (a) The controls directly underneath position the browser over a specific region in the genome. (b) RNA genes from Ensembl or the literature. (c) refSeq Gene. (d) microRNA gene from miRBase v13. (e) RNA clusters generated by this study. (f) The predicted snoRNAs from deep sequencing data using snoSeeker. (g) The predicted miRNA genes from deep sequencing data using miRDeep. (h) Strand-specific cluster expression peak (mapped small RNA density) generated for diverse tissues and cell lines. (i) Reads mapped to the genome.


deepBase provides a variety of search functions, including a keyword function for searching small RNA, ncRNA and RNA cluster information, and a BLAST (41) function for performing searches against sets of small RNA sequences. The search results are linked to the full database records.

DISCUSSION AND CONCLUSIONS

By mapping and annotating ∼66 million unique sequences derived from 185 small RNA libraries of diverse tissues and cell lines from seven organisms (Supplementary Data), we have provided a comprehensive integrated map of the diverse small RNAs, including miRNAs, piRNAs, endo-siRNAs, nasRNAs, pasRNAs, easRNAs and rasRNAs, in these genomes. In addition to recapitulating known small RNAs, we provide enhanced resolution and novel findings owing to the integration of the large number of small RNA libraries of diverse tissues and cell lines. Moreover, the ∼1.2 million RNA clusters identified in this study have shown an extensive and complex transcriptional map in the seven genomes.

Our initial analysis of these RNA clusters reveals that (i) these clusters cover thousands of known ncRNAs and protein exons (Supplementary Data) and (ii) additional members of known ncRNA (miRNA and snoRNA) families were identified from deep sequencing data using miRDeep (5) and snoSeeker (30). However, the most intriguing result of our study is the numerous predicted RNA clusters that could not be assigned to known annotated RNAs. Some of these overlapped with the evolutionarily conserved phastCons elements (62), indicating their important functions. By contrast, many of these RNA clusters might not be functional, but rather ‘junk’ RNA generated as a by-product of cellular activities. To determine whether these RNA clusters are evidence of important new biochemical pathways, it will ultimately be necessary to test their function by new experimental or computational methods. Nevertheless, our findings indicate that future investigation of the RNA clusters is worthwhile for discovering novel ncRNAs and even novel ncRNA classes.

In comparison with other databases related to deep sequencing small RNA data sets, including FANTOM4 (29,63) and the Gene Expression Omnibus (GEO) Short Read Archive (SRA) (31), deepBase focuses on the mapping, annotation, mining and visualization of deep sequencing data from multiple technological platforms, tissues and cell lines of different organisms, and on customizing the analysis so that a variety of biological questions can be addressed. The GEO SRA mainly offers the submission, storage and retrieval of deep sequencing data (31), whereas FANTOM4 currently provides a genome browser for displaying its own data and contains deep sequencing data only from the human monocytic cell line THP-1 (29,63). Finally, the data and the integrative, interactive and versatile display provided by the deepBase database will aid future experimental and computational studies in the discovery of novel ncRNAs and transcriptomes.

FUTURE DIRECTIONS

Next-generation sequencing technologies have played a vital role in improving our understanding of functional genomics. As new genome builds and genome-wide high-throughput deep sequencing data from different species, cell lines, tissues and conditions become available, we will continuously maintain and update the database. The Automatic Mapping, Annotating and Mining Tools (AutoMAMT) in deepBase are run on our high-performance computer servers. Indeed, we have already updated deepBase for the human genome (hg19 version) using AutoMAMT. At present, deepBase has integrated an additional 52 small RNA libraries, which are annotated and mapped to the latest human assembly version (hg19). We will continue to extend the disk capacity and improve the performance of our computer servers for storing new sequencing data. Stand-alone graphical user interface (GUI) software tools (http://deepbase.sysu.edu.cn/deepTools.php) will continue to be released in deepBase. Bench biologists can use these stand-alone GUI tools to manipulate and analyze their own data, or data downloaded from deepBase, locally on personal computers. The integration of transcriptome data sets from the deepBase database with other deep sequencing research (1–3), such as genomic mRNA-Seq, methylC-Seq and ChIP-Seq, will contribute to functional annotation of the genome and to a deeper understanding of genomic and cellular dynamics and features.

AVAILABILITY

deepBase is freely available at http://deepbase.sysu.edu.cn/. The deepBase data files can be freely downloaded and used according to the GNU Public License.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

The authors are grateful to Dr Daniel Gautheret for his useful communications.

FUNDING

National Natural Science Foundation of China (No. 30830066, 30771151, 30900820); National Basic Research Program (No. 2005CB724600) from the Ministry of Science and Technology of China, the funds from the Ministry of Education of China and Guangdong Province (No. IRT0447, NSF05200303, 9451027501002591); China Postdoctoral Science Foundation (No. 4109898); Young Teacher Fund of Sun Yat-sen University (No. 3171917). Funding for open access charge: National Basic Research Program (No. 2005CB724600) from the Ministry of Science and Technology of China.

Conflict of interest statement. None declared.

REFERENCES

1.

Lister
R
,
Gregory
BD
,
Ecker
JR
.
Next is now: new technologies for sequencing of genomes, transcriptomes, and beyond
.
Curr. Opin. Plant Biol.
(
2009
)
12
:
107
118
.

2.

Mardis
ER
.
Next-generation DNA sequencing methods
.
Annu. Rev. Genomics Hum. Genet.
(
2008
)
9
:
387
402
.

3.

Mardis
ER
.
The impact of next-generation sequencing technology on genetics
.
Trends Genet.
(
2008
)
24
:
133
141
.

4.

Morin
RD
,
O'Connor
MD
,
Griffith
M
,
Kuchenbauer
F
,
Delaney
A
,
Prabhu
AL
,
Zhao
Y
,
McDonald
H
,
Zeng
T
,
Hirst
M
, et al.
Application of massively parallel sequencing to microRNA profiling and discovery in human embryonic stem cells
.
Genome Res.
(
2008
)
18
:
610
621
.

5.

Friedlander
MR
,
Chen
W
,
Adamidi
C
,
Maaskola
J
,
Einspanier
R
,
Knespel
S
,
Rajewsky
N
.
Discovering microRNAs from deep sequencing data using miRDeep
.
Nature Biotechnol.
(
2008
)
26
:
407
415
.

6.

Seila
AC
,
Calabrese
JM
,
Levine
SS
,
Yeo
GW
,
Rahl
PB
,
Flynn
RA
,
Young
RA
,
Sharp
PA
.
Divergent transcription from active promoters
.
Science
(
2008
)
322
:
1849
1851
.

7.

Lau
NC
,
Seto
AG
,
Kim
J
,
Kuramochi-Miyagawa
S
,
Nakano
T
,
Bartel
DP
,
Kingston
RE
.
Characterization of the piRNA complex from rat testes
.
Science
(
2006
)
313
:
363
367
.

8.

Babiarz
JE
,
Ruby
JG
,
Wang
Y
,
Bartel
DP
,
Blelloch
R
.
Mouse ES cells express endogenous shRNAs, siRNAs, and other Microprocessor-independent, Dicer-dependent small RNAs
.
Genes Dev.
(
2008
)
22
:
2773
2785
.

9.

Glazov
EA
,
Cottee
PA
,
Barris
WC
,
Moore
RJ
,
Dalrymple
BP
,
Tizard
ML
.
A microRNA catalog of the developing chicken embryo identified by a deep sequencing approach
.
Genome Res.
(
2008
)
18
:
957
964
.

10.

Rathjen
T
,
Pais
H
,
Sweetman
D
,
Moulton
V
,
Munsterberg
A
,
Dalmay
T
.
High throughput sequencing of microRNAs in chicken somites
.
FEBS Lett.
(
2009
)
583
:
1422
1426
.

11.

Shi
W
,
Hendrix
D
,
Levine
M
,
Haley
B
.
A distinct class of small RNAs arises from pre-miRNA-proximal regions in a simple chordate
.
Nat. Struct. Mol. Biol.
(
2009
)
16
:
183
189
.

12.

Ruby
JG
,
Stark
A
,
Johnston
WK
,
Kellis
M
,
Bartel
DP
,
Lai
EC
.
Evolution, biogenesis, expression, and target predictions of a substantially expanded set of Drosophila microRNAs
.
Genome Res.
(
2007
)
17
:
1850
1864
.

13.

Kawamura
Y
,
Saito
K
,
Kin
T
,
Ono
Y
,
Asai
K
,
Sunohara
T
,
Okada
TN
,
Siomi
MC
,
Siomi
H
.
Drosophila endogenous small RNAs bind to Argonaute 2 in somatic cells
.
Nature
(
2008
)
453
:
793
797
.

14.

Chung
WJ
,
Okamura
K
,
Martin
R
,
Lai
EC
.
Endogenous RNA interference provides a somatic defense against Drosophila transposons
.
Curr. Biol.
(
2008
)
18
:
795
802
.

15.

Czech
B
,
Malone
CD
,
Zhou
R
,
Stark
A
,
Schlingeheyde
C
,
Dus
M
,
Perrimon
N
,
Kellis
M
,
Wohlschlegel
JA
,
Sachidanandam
R
, et al.
An endogenous small interfering RNA pathway in Drosophila
.
Nature
(
2008
)
453
:
798
802
.

16.

Ruby
JG
,
Jan
C
,
Player
C
,
Axtell
MJ
,
Lee
W
,
Nusbaum
C
,
Ge
H
,
Bartel
DP
.
Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenous siRNAs in C., elegans
.
Cell
(
2006
)
127
:
1193
1207
.

17.

Batista
PJ
,
Ruby
JG
,
Claycomb
JM
,
Chiang
R
,
Fahlgren
N
,
Kasschau
KD
,
Chaves
DA
,
Gu
W
,
Vasale
JJ
,
Duan
S
, et al.
PRG-1 and 21U-RNAs interact to form the piRNA complex required for fertility in C., elegans
.
Mol. Cell
(
2008
)
31
:
67
78
.

18.

Kato
M
,
de Lencastre
A
,
Pincus
Z
,
Slack
FJ
.
Dynamic expression of small non-coding RNAs, including novel microRNAs and piRNAs/21U-RNAs, during Caenorhabditis elegans development
.
Genome Biol.
(
2009
)
10
:
R54
.

19.

Axtell
MJ
,
Jan
C
,
Rajagopalan
R
,
Bartel
DP
.
A two-hit trigger for siRNA biogenesis in plants
.
Cell
(
2006
)
127
:
565
577
.

20.

Rajagopalan
R
,
Vaucheret
H
,
Trejo
J
,
Bartel
DP
.
A diverse and evolutionarily fluid set of microRNAs in Arabidopsis thaliana
.
Genes Dev.
(
2006
)
20
:
3407
3425
.

21.

Kasschau
KD
,
Fahlgren
N
,
Chapman
EJ
,
Sullivan
CM
,
Cumbie
JS
,
Givan
SA
,
Carrington
JC
.
Genome-wide profiling and analysis of Arabidopsis siRNAs
.
PLoS Biol.
(
2007
)
5
:
e57
.

22.

Backman
TW
,
Sullivan
CM
,
Cumbie
JS
,
Miller
ZA
,
Chapman
EJ
,
Fahlgren
N
,
Givan
SA
,
Carrington
JC
,
Kasschau
KD
.
Update of ASRP: the Arabidopsis small RNA Project database
.
Nucleic Acids Res.
(
2008
)
36
:
D982
D985
.

23.

Ender
C
,
Krek
A
,
Friedlander
MR
,
Beitzinger
M
,
Weinmann
L
,
Chen
W
,
Pfeffer
S
,
Rajewsky
N
,
Meister
G
.
A human snoRNA with microRNA-like functions
.
Mol. Cell
(
2008
)
32
:
519
528
.

24.

Lee
HC
,
Chang
SS
,
Choudhary
S
,
Aalto
AP
,
Maiti
M
,
Bamford
DH
,
Liu
Y
.
qiRNA is a new type of small interfering RNA induced by DNA damage
.
Nature
(
2009
)
459
:
274
277
.

25.

He
Y
,
Vogelstein
B
,
Velculescu
VE
,
Papadopoulos
N
,
Kinzler
KW
.
The antisense transcriptomes of human cells
.
Science
(
2008
)
322
:
1855
1857
.

26.

Sultan
M
,
Schulz
MH
,
Richard
H
,
Magen
A
,
Klingenhoff
A
,
Scherf
M
,
Seifert
M
,
Borodina
T
,
Soldatov
A
,
Parkhomchuk
D
, et al.
A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome
.
Science
(
2008
)
321
:
956
960
.

27.

Core
LJ
,
Waterfall
JJ
,
Lis
JT
.
Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters
.
Science
(
2008
)
322
:
1845
1848
.

28.

Buratowski
S
.
Transcription. Gene expression—where to start?
Science
(
2008
)
322
:
1804
1805
.

29. Taft RJ, Glazov EA, Cloonan N, Simons C, Stephen S, Faulkner GJ, Lassmann T, Forrest AR, Grimmond SM, Schroder K, et al. Tiny RNAs associated with transcription start sites in animals. Nature Genet. (2009) 41:572–578.

30. Yang JH, Zhang XC, Huang ZP, Zhou H, Huang MB, Zhang S, Chen YQ, Qu LH. snoSeeker: an advanced computational package for screening of guide and orphan snoRNA genes in the human genome. Nucleic Acids Res. (2006) 34:5112–5123.

31. Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C, Kim IF, Soboleva A, Tomashevsky M, Edgar R. NCBI GEO: mining tens of millions of expression profiles--database and tools update. Nucleic Acids Res. (2007) 35:D760–D765.

32. Hubbard TJ, Aken BL, Ayling S, Ballester B, Beal K, Bragin E, Brent S, Chen Y, Clapham P, Clarke L, et al. Ensembl 2009. Nucleic Acids Res. (2009) 37:D690–D697.

33. Kuhn RM, Karolchik D, Zweig AS, Wang T, Smith KE, Rosenbloom KR, Rhead B, Raney BJ, Pohl A, Pheasant M, et al. The UCSC Genome Browser Database: update 2009. Nucleic Acids Res. (2009) 37:D755–D761.

34. Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ. miRBase: tools for microRNA genomics. Nucleic Acids Res. (2008) 36:D154–D158.

35. Lestrade L, Weber MJ. snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs. Nucleic Acids Res. (2006) 34:D158–D162.

36. Brown JW, Echeverria M, Qu LH, Lowe TM, Bachellerie JP, Huttenhofer A, Kastenmayer JP, Green PJ, Shaw P, Marshall DF. Plant snoRNA database. Nucleic Acids Res. (2003) 31:432–435.

37. Chan PP, Lowe TM. GtRNAdb: a database of transfer RNA genes detected in genomic sequence. Nucleic Acids Res. (2009) 37:D93–D97.

38. Jurka J. Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. (2000) 16:418–420.

39. Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0 (1996–2007). http://www.repeatmasker.org (2 November 2009, date last accessed).

40. Swarbreck D, Wilks C, Lamesch P, Berardini TZ, Garcia-Hernandez M, Foerster H, Li D, Meyer T, Muller R, Ploetz L, et al. The Arabidopsis Information Resource (TAIR): gene structure and function annotation. Nucleic Acids Res. (2008) 36:D1009–D1014.

41. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. (1997) 25:3389–3402.

42. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. (2009) 10:R25.

43. Kawahara Y, Zinshteyn B, Sethupathy P, Iizasa H, Hatzigeorgiou AG, Nishikura K. Redirection of silencing targets by adenosine-to-inosine editing of miRNAs. Science (2007) 315:1137–1140.

44. Reid JG, Nagaraja AK, Lynn FC, Drabek RB, Muzny DM, Shaw CA, Weiss MK, Naghavi AO, Khan M, Zhu H, et al. Mouse let-7 miRNA populations exhibit RNA editing that is constrained in the 5′-seed/cleavage/anchor regions and stabilize predicted mmu-let-7a:mRNA duplexes. Genome Res. (2008) 18:1571–1581.

45. Hofacker IL. Vienna RNA secondary structure server. Nucleic Acids Res. (2003) 31:3429–3431.

46. Gabel HW, Ruvkun G. The exonuclease ERI-1 has a conserved dual role in 5.8S rRNA processing and RNAi. Nat. Struct. Mol. Biol. (2008) 15:531–533.

47. Okamura K, Balla S, Martin R, Liu N, Lai EC. Two distinct mechanisms generate endogenous siRNAs from bidirectional transcription in Drosophila melanogaster. Nat. Struct. Mol. Biol. (2008) 15:998.

48. Borsani O, Zhu J, Verslues PE, Sunkar R, Zhu JK. Endogenous siRNAs derived from a pair of natural cis-antisense transcripts regulate salt tolerance in Arabidopsis. Cell (2005) 123:1279–1291.

49. Fejes-Toth K, Kapranov P, Foissac SK, Sotirova V, Sachidanandam R, Willingham AT, Duttagupta R, Dumais E, Hannon GJ, Gingeras TR. Post-transcriptional processing generates a diversity of 5′-modified long and short RNAs. Nature (2009) 457:1028–1032.

50. Kapranov P, Cheng J, Dike S, Nix DA, Duttagupta R, Willingham AT, Stadler PF, Hertel J, Hackermuller J, Hofacker IL, et al. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science (2007) 316:1484–1488.

51. Klattenhoff C, Theurkauf W. Biogenesis and germline functions of piRNAs. Development (2008) 135:3–9.

52. Aravin AA, Hannon GJ, Brennecke J. The Piwi-piRNA pathway provides an adaptive defense in the transposon arms race. Science (2007) 318:761–764.

53. Lin H. piRNAs in the germ line. Science (2007) 316:397.

54. Llave C, Kasschau KD, Rector MA, Carrington JC. Endogenous and silencing-associated small RNAs in plants. Plant Cell (2002) 14:1605–1619.

55. Reinhart BJ, Weinstein EG, Rhoades MW, Bartel B, Bartel DP. MicroRNAs in plants. Genes Dev. (2002) 16:1616–1626.

56. Aravin AA, Lagos-Quintana M, Yalcin A, Zavolan M, Marks D, Snyder B, Gaasterland T, Meyer J, Tuschl T. The small RNA profile during Drosophila melanogaster development. Dev. Cell (2003) 5:337–350.

57. Buhler M, Spies N, Bartel DP, Moazed D. TRAMP-mediated RNA surveillance prevents spurious entry of RNAs into the Schizosaccharomyces pombe siRNA pathway. Nat. Struct. Mol. Biol. (2008) 15:1015–1023.

58. Tam OH, Aravin AA, Stein P, Girard A, Murchison EP, Cheloufi S, Hodges E, Anger M, Sachidanandam R, Schultz RM, et al. Pseudogene-derived small interfering RNAs regulate gene expression in mouse oocytes. Nature (2008) 453:534–538.

59. Watanabe T, Totoki Y, Toyoda A, Kaneda M, Kuramochi-Miyagawa S, Obata Y, Chiba H, Kohara Y, Kono T, Nakano T, et al. Endogenous siRNAs from naturally formed dsRNAs regulate transcripts in mouse oocytes. Nature (2008) 453:539–543.

60. Ghildiyal M, Seitz H, Horwich MD, Li C, Du T, Lee S, Xu J, Kittler EL, Zapp ML, Weng Z, et al. Endogenous siRNAs derived from transposons and mRNAs in Drosophila somatic cells. Science (2008) 320:1077–1081.

61. Faulkner GJ, Kimura Y, Daub CO, Wani S, Plessy C, Irvine KM, Schroder K, Cloonan N, Steptoe AL, Lassmann T, et al. The regulated retrotransposon transcriptome of mammalian cells. Nature Genet. (2009) 41:563–571.

62. Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. (2005) 15:1034–1050.

63. Suzuki H, Forrest AR, van Nimwegen E, Daub CO, Balwierz PJ, Irvine KM, Lassmann T, Ravasi T, Hasegawa Y, de Hoon MJ, et al. The transcriptional network that controls growth arrest and differentiation in a human myeloid leukemia cell line. Nature Genet. (2009) 41:553–562.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

