Logic programming to infer complex RNA expression patterns from RNA-seq data

Weirick, Tyler; Militello, Giuseppe; Ponomareva, Yuliya; John, David; Döring, Claudia; Dimmeler, Stefanie; Uchida, Shizuka

doi:10.1093/bib/bbw117

Abstract

To meet the increasing demand in the field, numerous long noncoding RNA (lncRNA) databases are available. Although many lncRNAs are specifically expressed in certain cell types and/or in a time-dependent manner, most lncRNA databases fall short of providing such profiles. We developed a strategy using logic programming to handle the complex organization of organs, their tissues and cell types as well as gender and developmental time points. To showcase this strategy, we introduce ‘RenalDB’ (http://renaldb.uni-frankfurt.de), a database providing expression profiles of RNAs in major organs focusing on kidney tissues and cells. RenalDB uses logic programming to describe complex anatomy, sample metadata and logical relationships defining expression, enrichment or specificity. We validated the content of RenalDB with biological experiments and functionally characterized two long intergenic noncoding RNAs: LOC440173 is important for cell growth or cell survival, whereas PAXIP1-AS1 is a regulator of cell death. We anticipate RenalDB will be used as a first step toward functional studies of lncRNAs in the kidney.

gene expression, kidney, lncRNA, microarray, RNA-seq

Introduction

A noncoding RNA (ncRNA) is any expressed transcript that is not translated into a protein. ncRNAs largely outnumber protein-coding transcripts and contain many sub-classes, such as ribosomal RNAs, transfer RNAs and microRNAs. Long noncoding RNAs (lncRNAs) are defined as any ncRNA longer than 200 nt [1]. Interest in lncRNAs stems from their high abundance, their importance to many biological functions and the relatively small number of lncRNAs characterized so far, which indicates a wealth of discoveries waiting to be made [2–9]. RNA sequencing (RNA-seq) is an essential tool for studying lncRNAs and is widely used to screen for them. Furthermore, the majority of RNA-seq data from published studies is publicly and freely available. Thus, collections of these data can be re-analyzed to test hypotheses beyond those of the studies in which they were published.

Once a set of RNA-seq data sets has been assembled, there are a number of methods for describing the expression specificity of RNAs. Each method falls into one of two categories. The first category describes whether a sequence is tissue specific or ubiquitously expressed. The second describes how specific a sequence is to a certain tissue (referred to as global and relative specificity, respectively, in this article) [10]. A basic requirement of many of these methods is that the samples must be at a similar level of the anatomical hierarchy. This is a concern because the expression profiles of some lncRNAs are known to be much more complicated than simple expression in one tissue [11].

To meet the above needs, we propose to extend the current methods with logic programming, which allows much greater logical nuance to be used when processing data. Logic programming is a programming paradigm based on formal logic and is composed of facts, rules and queries [12]. Here, we created sets of logic programming facts describing the metadata of RNA-seq samples and the anatomical organization of kidneys. We then created logic programming rules describing what expression, enrichment and specificity mean in these contexts. These facts and rules were used to perform logic programming queries to find the hierarchical expression, enrichment and specificity of RNAs within various RNA-seq data sets. To this end, we introduce a new relational database and Datalog knowledge base for nephrology called ‘RenalDB’ (http://renaldb.uni-frankfurt.de) to serve the needs of researchers working with kidneys and to serve as an example of the logic programming technique for bioinformaticians. Additionally, logic programming is used within RenalDB to extend the utility of our SQL-based advanced search and to determine the layout of the hierarchical tree structures shown as expression heat maps in RenalDB. Furthermore, given that most of the available lncRNA databases were released without biological validation, we provide biological experiments validating the expression of lncRNAs included in RenalDB as well as functional data, to give potential users confidence in RenalDB.

Methods

The RenalDB database

The primary data analysis for RenalDB was performed using the Snakemake workflow engine [13]. The Snakemake pipelines used in this study are provided at https://bitbucket.org/tweirick/renaldb. In the pipeline, RNA-seq data sets were downloaded from the NCBI Sequence Read Archive (SRA) as SRA files [14, 15]. Fastq-dump (version 2.1.7) was used to convert the SRA files to fastq files (http://www.ncbi.nlm.nih.gov/sra). STAR [16] (version 2.5.1b) was used to align the reads using the genome annotation files from the Ensembl database (http://www.ensembl.org/info/data/ftp/index.html; version 83). HTSeq [17] (version 0.6.1.p2) was used to extract read counts. Conditions in which <25% of sequences were expressed were discarded. For paired-end reads, gene-level and transcript-level counts were obtained. For single-end reads, only the gene-level counts were measured to avoid problems assigning counts to similar isoforms. DESeq2 [18] (version 3.2) was used to perform between-sample normalization on the read counts. Finally, for sequence length normalization, the normalized count of each sequence was divided by the sequence’s effective length and multiplied by 1e3, which is similar to the calculation of Fragments Per Kilobase of transcript per Million mapped reads (FPKM) values. The FPKM calculation also includes the total library size in its denominator; however, this division was omitted here because between-sample normalization had already been applied.
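The length normalization described above can be illustrated with a minimal Python sketch. It assumes the between-sample-normalized counts and the effective lengths are already available as dictionaries; the function name, the identifiers and the numbers are hypothetical and are not taken from the RenalDB pipeline.

```python
def length_normalize(normalized_counts, effective_lengths, scale=1e3):
    """Divide each normalized count by the sequence's effective length
    (in nucleotides) and multiply by `scale` (1e3 in the text).

    No library-size term is applied, on the assumption that
    between-sample normalization was done beforehand.
    """
    result = {}
    for seq_id, count in normalized_counts.items():
        length = effective_lengths[seq_id]
        if length <= 0:
            raise ValueError(f"effective length must be positive: {seq_id}")
        result[seq_id] = count / length * scale
    return result


# Hypothetical example: equal counts, different effective lengths.
counts = {"gene_a": 500.0, "gene_b": 500.0}
lengths = {"gene_a": 1000.0, "gene_b": 2500.0}
values = length_normalize(counts, lengths)
```

In this toy case, the longer sequence receives the lower value even though both sequences have the same normalized count, which is the intended effect of the length correction.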

The sample information and analyzed RNA-seq data were stored in a MySQL database (Figure 1A). Datalog knowledge bases are shown in Figure 1B–D, which include examples of the code describing kidney anatomy (Figure 1B), experiments and their corresponding accession IDs (Figure 1C) and relationships between anatomical objects and the corresponding information about their RNA expression patterns (Figure 1D). The web interface was built using the Django web framework, and the Datalog processing was handled by the pyDatalog package. Gene Ontology (GO) annotations were obtained via the BioMart Community Portal (www.biomart.org) [19]. For each GO term, a link to AmiGO 2 (http://amigo.geneontology.org/amigo) [20] is provided. In RenalDB, the UGAHash accession system was used for primary IDs [9]. RenalDB will be updated twice a year to include the latest publicly available RNA-seq data sets.

Figure 1.

RenalDB. (A) The database schema for the relational database portion of RenalDB. (B–D) Examples of the knowledge bases used in RenalDB. Knowledge bases are separated based on the type of information they contain. (B) An example of the facts describing kidney anatomy. This series of facts describes the relationships (e.g. contains, develops from) among anatomical objects (e.g. organism, tissue, cell). (C) An example of the facts describing the experiments included in RenalDB. (D) An example of the rules describing how expression, enrichment and specificity are defined. These high-level logical statements are then used as queries on the anatomical and experimental databases to determine whether the gene/transcript is expressed, enriched or specific to various anatomical objects.

Culturing of cells, quantitative reverse transcription polymerase chain reaction and siRNAs

‘Human Embryonic Kidney 293’ (HEK-293) cells were cultured in growth medium consisting of DMEM with low glucose and pyruvate (Life Technologies) supplemented with 10% FBS (Life Technologies) and antibiotics (100 units of penicillin and 100 μg of streptomycin per ml, Sigma-Aldrich) at 37 °C in a humidified atmosphere containing 5% CO₂.

RNA was isolated with TRIzol reagent, purified and treated with TURBO DNase (Life Technologies) before reverse transcription. The primer pairs were designed using Primer3 (http://bioinfo.ut.ee/primer3-0.4.0/) [21] and validated in silico with UCSC In-Silico polymerase chain reaction (PCR; https://genome.ucsc.edu/cgi-bin/hgPcr). Each primer pair was then tested experimentally to confirm the presence of a single band of the expected size. The list of primer pairs used in this study can be found in Supplementary Table S1.

For human tissues, purified RNA was purchased from commercial vendors as follows: Human Total RNA Master Panel II (Clontech, #636643, Lot Number 1202050A); and human heart (Amsbio, #R1234122-50, Lot Number A804058).

Transient transfection of siRNA duplexes (MISSION, Sigma-Aldrich; 10 nM and 100 nM final concentration for LOC440173 and PAXIP1-AS1, respectively; Supplementary Table S1) was carried out using RNAiMax (Life Technologies) according to the manufacturer’s protocol. The corresponding amount of control siRNA (MISSION Negative control SIC002, confidential sequence; Sigma-Aldrich) was used. Forty-eight hours after the transfection of siRNAs, cells were exposed to TRIzol to extract RNA.

After the purification and treatment of RNA with TURBO DNase (Life Technologies), 1 μg of RNA was reverse transcribed with SuperScript VILO Master Mix (Life Technologies). The first-strand cDNA was diluted to a concentration of 5 ng/μl. For quantitative reverse transcription polymerase chain reaction (qRT-PCR), 1 μl (5 ng) of the cDNA template was used with Fast SYBR Green Master Mix (Life Technologies) on a StepOne Plus Real-Time PCR System (Applied Biosystems) with the following thermal cycling conditions: 95 °C for 20 s followed by 40 cycles of 95 °C for 3 s and 60 °C for 30 s. Relative fold expression was calculated by the 2^–ΔΔCt method using GAPDH as an internal control.
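As a worked illustration of the 2^–ΔΔCt calculation, the following Python sketch computes relative fold expression from four Ct values. The function name and the Ct values are hypothetical and are shown only to make the arithmetic explicit; they are not measurements from this study.

```python
def relative_fold_expression(ct_target_treated, ct_ref_treated,
                             ct_target_control, ct_ref_control):
    """Relative fold expression by the 2^-ddCt method.

    dCt  = Ct(target) - Ct(internal control), per condition
    ddCt = dCt(treated) - dCt(control)
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target needs two more cycles after siRNA
# treatment, while the internal control is unchanged.
fold = relative_fold_expression(
    ct_target_treated=27.0, ct_ref_treated=18.0,
    ct_target_control=25.0, ct_ref_control=18.0,
)
```

With these illustrative values, ΔΔCt is 2, so the target is at one quarter of its control level.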

Cell viability assay

A total of 200 000 HEK-293 cells were plated in each well of a six-well plate. On the following day, siRNAs were transfected. Twenty-four hours after the transfection, hydrogen peroxide (Sigma-Aldrich, #H1009) was added at a final concentration of 50 µM. The next day, cells were detached, stained with Trypan Blue (Sigma-Aldrich, #T8154) and counted using a Neubauer chamber.

Microarray experiments and data analysis

GeneChip® Human Gene 1.0 ST Arrays (Affymetrix) were used according to the manufacturer's protocol and scanned. The CEL files were analyzed through the updated version of the noncoder web interface (http://noncoder.mpi-bn.mpg.de) [22] using the pipeline set up for the Gene Array Analyzer web interface (http://gaa.mpi-bn.mpg.de) [23]. After normalization by Robust Multi-array Average [24] and the application of moderated t-statistics via the Limma package [25], Transcript Cluster IDs that did not match a gene or that matched multiple genes were discarded. Then, the standard deviation across samples was calculated for each Transcript Cluster ID. For a gene that matched multiple Transcript Cluster IDs, the Transcript Cluster ID with the highest standard deviation across samples was kept for further analysis.
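The final selection step, keeping one Transcript Cluster ID per gene, can be sketched in Python as follows. This is a minimal illustration, assuming that ambiguous Transcript Cluster IDs have already been removed and that the sample standard deviation is used (the text does not state which estimator was applied); all identifiers and values are hypothetical.

```python
from statistics import stdev


def select_most_variable_cluster(cluster_to_gene, expression):
    """For each gene, keep the transcript cluster whose expression has
    the highest standard deviation across samples.

    cluster_to_gene: cluster ID -> gene (one gene per cluster, i.e. the
                     ambiguous clusters are assumed to be filtered out)
    expression:      cluster ID -> list of normalized values per sample
    """
    best = {}  # gene -> (cluster ID, standard deviation)
    for cluster_id, gene in cluster_to_gene.items():
        sd = stdev(expression[cluster_id])  # sample SD (assumption)
        if gene not in best or sd > best[gene][1]:
            best[gene] = (cluster_id, sd)
    return {gene: cluster_id for gene, (cluster_id, _) in best.items()}


# Hypothetical data: GENE_X matches two clusters, GENE_Y matches one.
cluster_to_gene = {"tc_1": "GENE_X", "tc_2": "GENE_X", "tc_3": "GENE_Y"}
expression = {
    "tc_1": [5.0, 5.1, 4.9],
    "tc_2": [4.0, 6.0, 8.0],
    "tc_3": [7.0, 7.5, 7.2],
}
selected = select_most_variable_cluster(cluster_to_gene, expression)
```

Here tc_2 is kept for GENE_X because its values vary more across samples than those of tc_1.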

All the microarray data in this study were deposited in the Gene Expression Omnibus (GSE74325). The analyzed data can be accessed via our noncoder web interface (http://noncoder.mpi-bn.mpg.de/) [22] using ‘Kidney’ as the user name and password.

GO analyses were performed using DAVID (https://david.ncifcrf.gov/home.jsp) [26].

Statistics

Data are presented as mean ± SEM. A two-sample, two-tailed, heteroscedastic Student’s t-test was performed to calculate P-values using Microsoft Excel.
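For readers who wish to reproduce these statistics outside a spreadsheet, the sketch below computes the mean ± SEM and the test statistic of a heteroscedastic (unequal-variance, i.e. Welch-type) two-sample t-test in plain Python. It is an illustration under stated assumptions: the function names and data are hypothetical, and the P-value itself is not computed because it requires the t-distribution, which the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance


def mean_sem(values):
    """Return (mean, standard error of the mean) using the sample variance."""
    return mean(values), sqrt(variance(values) / len(values))


def welch_t(sample_a, sample_b):
    """Return the t statistic and the Welch-Satterthwaite degrees of
    freedom for a two-sample test that does not assume equal variances."""
    var_a = variance(sample_a) / len(sample_a)
    var_b = variance(sample_b) / len(sample_b)
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(var_a + var_b)
    dof = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(sample_a) - 1) + var_b ** 2 / (len(sample_b) - 1)
    )
    return t_stat, dof


# Hypothetical relative expression values (three replicates each).
control = [1.0, 1.1, 0.9]
knockdown = [0.4, 0.5, 0.3]
control_mean, control_sem = mean_sem(control)
t_stat, dof = welch_t(control, knockdown)
```

A two-tailed P-value would then be obtained from the t-distribution with `dof` degrees of freedom, for example with a statistics package.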

Results

Survey of public lncRNA databases with expression data

Increasing research interest in the field of lncRNAs has prompted the building of databases to cover the expression profiles of lncRNAs in various conditions and organisms. Currently, there are 19 public databases that contain expression data for lncRNAs (Table 1). These are ALDB [27], ANGIOGENES [28], C-It-Loci [8], ChIPBase [29], Co-LncRNA [30], deepBase v2.0 [31], Expression Atlas [32], GEO Profiles [33], LncBase v.2 [34], Human Body Map long intergenic noncoding RNAs (lincRNAs) [35], lncRNA2function [36], lncRNAdb v2.0 [37], lncRNAMap [38], lncRNAtor [39], MTD [40], NONCODE 2016 [41], NRED [42], TANRIC [43] and TF2lncRNA [44]. Of note, some lncRNA-focused databases, such as LNCipedia [45], were not included in Table 1 because they do not contain expression profiles. The general trend of these public databases is to provide a comprehensive view of the expression of lncRNAs in various conditions. In all databases except NRED, the expression profiles are based on RNA-seq data, as only a few types of microarrays are designed for lncRNAs [22]. Most of the public databases (indicated by superscript a in Table 1) are designed to provide the expression profiles of protein-coding genes as well. The availability of expression profiles of protein-coding genes is useful, as they can be used to validate a certain expression pattern (e.g. guilt-by-association for tissue specificity). Furthermore, when protein-coding genes are included, GO terms can be used to infer the possible biological functions of a lncRNA from its co-expression with protein-coding genes, as is the case for Co-LncRNA, lncRNA2function and lncRNAtor.

Table 1.

List of public databases of lncRNAs with their expression profiles

Database Name	Organism(s)	Samples	Technology	URL
ALDB^a	Chicken, cow, pig	Tissues	RNA-seq	http://res.xaut.edu.cn/aldb/index.jsp
ANGIOGENES^a	Human, mouse, zebrafish	Cell lines, tissues	RNA-seq	http://angiogenes.uni-frankfurt.de
C-It-Loci^a	Human, mouse, zebrafish	Tissues	RNA-seq	http://c-it-loci.uni-frankfurt.de
ChIPBase^a	Human	Tissues	RNA-seq	http://deepbase.sysu.edu.cn/chipbase/index.php
Co-LncRNA^a	Human	Cancers, cell lines, tissues	RNA-seq	http://www.bio-bigdata.com/Co-LncRNA/
deepBase v2.0	Chicken, chimpanzee, cow, gorilla, fly, frog, human, monkey, mouse, opossum, platypus, rat, worm, zebrafish	Cell lines, tissues	RNA-seq	http://biocenter.sysu.edu.cn/deepBase/index.php
Expression Atlas^a	Many	Many	Many	https://www.ebi.ac.uk/gxa/home
GEO Profiles^a	Many	Many	Many	https://www.ncbi.nlm.nih.gov/geoprofiles/
LncBase v.2	Human, mouse	Cell lines, tissues	RNA-seq	http://carolina.imis.athena-innovation.gr/index.php?r=lncbasev2
Human Body Map lincRNAs^a	Human	Tissues	FISH, RNA-seq	http://www.broadinstitute.org/genome_bio/human_lincrnas/
lncRNA2function^a	Human	Tissues	RNA-seq	http://mlg.hit.edu.cn/lncrna2function/index.jsp
lncRNAdb v2.0	Human	Tissues	RNA-seq	http://www.lncrnadb.org
lncRNAMap	Human	Cancers, tissues	RNA-seq	http://lncrnamap.mbc.nctu.edu.tw/php/index.php
lncRNAtor^a	Fly, human, mouse, worm, zebrafish	Cancers, diseases, tissues	RNA-seq	http://lncrnator.ewha.ac.kr/index.htm
MTD^a	Human, pig, rat, mouse	Cell lines, tissues	RNA-seq	http://mtd.cbi.ac.cn
NONCODE 2016	Human, mouse	Tissues	RNA-seq	http://www.noncode.org
NRED^a	Human, mouse	Cell lines, tissues	ISH, microarray	http://jsm-research.imb.uq.edu.au/nred/cgi-bin/ncrnadb.pl
TANRIC	Human	Cancers, cell lines	RNA-seq	http://ibl.mdanderson.org/tanric/_design/basic/index.html
TF2lncRNA	Human	Tissues	RNA-seq	http://mlg.hit.edu.cn/tf2lncrna/index.jsp

^aThe databases that contain both protein-coding genes and lncRNAs.

FISH, fluorescent in situ hybridization; ISH, in situ hybridization.

It is generally accepted in the field that lncRNAs are poorly conserved from one species to another at the sequence level [3, 4]. Nevertheless, for the purpose of biological experiments, it is important to know the species conservation of lncRNAs of interest, as it is not always possible to perform biological experiments (e.g. gain/loss-of-function) in human subjects, which leads to the use of model organisms (e.g. mouse, zebrafish) for in vivo experiments [46]. To provide evidence of the evolutionary conservation of lncRNAs, deepBase v2.0, MTD and NONCODE 2016 offer such information based on sequence similarity via BLAST, while ANGIOGENES and C-It-Loci use three types of conservation. The first type is based on the concept of ‘positional conservation’ [8, 47]: a genomic locus spanning two homologous protein-coding genes is considered conserved when these protein-coding genes are conserved between/among organisms. By defining this locus as conserved, any lncRNA in this locus is also considered conserved between/among organisms. The second type is based on ultraconserved elements, which are species-conserved regions that have been shown to be transcriptional regulators of key developmental genes [48, 49]. The third type is based on species-conserved cis-regulatory elements (enhancers) that have been experimentally validated in transgenic mice [50, 51]. As the intention of biological databases should be to assist researchers in their biological experiments, it is imperative that an option to check the evolutionary conservation of lncRNAs is provided.

As lncRNAs are expressed more tissue-specifically but generally at lower levels than protein-coding genes [35, 52–55], many databases contain expression data from various tissues. In most cases, these databases are built to prompt further knowledge discovery by a user with a defined hypothesis (e.g. ‘In which tissue is a lncRNA of interest expressed?’). Among these databases, ALDB, C-It-Loci, MTD and deepBase v2.0 offer a way to screen for a list of lncRNAs expressed in a target tissue. In addition, ANGIOGENES, C-It-Loci and MTD allow for the comparison of expressed lncRNAs across various cell lines and tissues. With this feature, these three databases provide a set of predefined hypotheses that can be used directly to screen for tissue-expressed, enriched and/or specific lncRNAs as well as protein-coding genes. This feature is important for in silico screening, as a researcher can obtain a set of interesting lncRNAs to be studied further in their tissue of interest.

As more and more studies are conducted on lncRNAs, it has become evident that lncRNAs are expressed in a more cell-type-specific manner than protein-coding genes [11, 56]. As different cell types make up a tissue, it is important that such information is provided. Of the databases listed in Table 1, Human Body Map lincRNAs (called ‘lincRNA-FISH catalog’) and NRED provide cell-type-specific expression of lncRNAs via in situ hybridization data, while MTD offers such data from RNA-seq. Given that it is possible to sequence at the single-cell level [57–60], more such data sets will become available for inclusion in the databases. However, as lncRNAs are known to be expressed at lower levels than protein-coding genes [5, 8, 52], it might be difficult to comprehensively cover the lncRNA transcriptome at the single-cell level; thus, it will be more helpful to include RNA-seq data of tissue compartments (e.g. hypothalamus of the brain) in the databases, as in the case of MTD.

Taken together, although there are public databases for lncRNAs providing their expression patterns in various tissues, most of these databases fall short of offering comprehensive profiling of lncRNAs that covers their cell-type-specific expression. Such information will be important especially for in vivo studies, as it gives a clue about where to look for phenotypes on ablating a lncRNA, as in the case of knockout mice [61–63]. Furthermore, for the utilization of model organisms, it is of utmost importance that the databases provide information on the evolutionary conservation of lncRNAs to allow for more functional studies. More importantly, most of the databases are released to the public without validation experiments, especially functional assays that go beyond expression profiling by RT-PCR. This leaves users to validate the content of the databases themselves by performing biological experiments, which are costly and time-consuming. From the perspective of product building, it is not good practice to release a product (i.e. a database) without extensive validation of its content, in this case by biological validation experiments.

Building of RenalDB for kidney-related RNA expressions

Given the above situation, we attempted to build an expression database for lncRNAs and protein-coding genes across organisms to offer comprehensive profiling of transcripts in one tissue. For this purpose, we chose kidney as a model, as this tissue is present in all vertebrates, is related to human health (e.g. diabetes) and has a modest but diverse set of RNA-seq experiments available. Kidneys are complex organs that perform many important functions, including filtering excess organic molecules from the blood and regulating blood pressure via the secretion of hormones. To maintain their various functions, they are composed of many cell types, whose transcriptomes require careful profiling. Furthermore, the widely used kidney-derived cell line HEK-293 [64] is available, which allows for easy experimental manipulation (e.g. transfection of plasmids and siRNAs).

To collect RNA-seq data sets of kidneys, various databases (e.g. Gene Expression Omnibus DataSets, PubMed and SRA) were searched manually. Because the quality of the genomic sequence information and gene annotations varies across organisms, we chose three well-annotated organisms for further study: human, mouse and zebrafish (Supplementary Table S2). Available data sets included whole kidney, kidney sub-tissues, isolated cell types and even single cells. To integrate data across these levels, we propose a database and analysis programs using logic programming. Logic programming is a programming paradigm based on formal logic, using a set of logical sentences consisting of facts, rules and queries to solve a given problem [65]. For example, consider a transcript expressed in the renal cortex. The renal cortex is located within the kidney. When whole kidney is sequenced under the same conditions, the same transcript should be expressed (Figure 2A). One could even descend to the level of cell types (e.g. endothelial cells isolated from interlobular arteries, which are located within the kidney cortex). Similarly, all sequences expressed within these endothelial cells are expressed in the kidney. Furthermore, it is well known that high-abundance sequences can overwhelm lower-abundance sequences, so a transcript from a rare cell type may go undetected in a whole-kidney sample. Thus, logic programming can be a useful tool for integrating RNA-seq data at different hierarchical levels and beyond. This can be accomplished by modeling the anatomical and experimental relationships (Figure 2B), creating rules to define various types of expression characteristics (Figure 2C) and then using queries to determine the expression characteristics of a given RNA (Figure 2D).
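The facts, rules and queries described above can be illustrated with a small sketch. RenalDB itself uses Datalog via the pyDatalog package; the example below instead uses plain Python sets and functions so that it runs without additional dependencies, and it should be read as an illustration of the idea rather than as the RenalDB knowledge base. The anatomical terms follow the example in the text, whereas the data structures, function names and the detection of ‘RNA_1’ in endothelial cells are hypothetical.

```python
# Facts: (parent, child) pairs meaning "parent contains child".
CONTAINS = {
    ("kidney", "renal_cortex"),
    ("renal_cortex", "interlobular_artery"),
    ("interlobular_artery", "endothelial_cell"),
    ("kidney", "renal_medulla"),
}

# Facts: (rna, anatomical object) pairs for direct experimental detection.
DETECTED = {("RNA_1", "endothelial_cell")}


def descendants(part):
    """Rule: all objects contained in `part`, directly or indirectly."""
    found = set()
    stack = [part]
    while stack:
        current = stack.pop()
        for parent, child in CONTAINS:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def expressed_in(rna, part):
    """Rule: an RNA is expressed in `part` if it was detected in `part`
    itself or in anything that `part` contains."""
    return any((rna, p) in DETECTED for p in {part} | descendants(part))


def query_expression(rna):
    """Query: every anatomical object in which `rna` is expressed."""
    parts = {p for pair in CONTAINS for p in pair}
    return sorted(p for p in parts if expressed_in(rna, p))


hits = query_expression("RNA_1")
```

In this sketch, ‘RNA_1’ is reported for the endothelial cells, the interlobular artery, the renal cortex and the kidney, but not for the renal medulla, because expression is propagated only upward along the ‘contains’ relationship.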

Figure 2.

Logic programming. (A) Kidney anatomy. The original image was obtained from (Blausen.com staff. ‘Blausen gallery 2014’. Wikiversity Journal of Medicine 1 (2); doi:10.15347/wjm/2014.010; ISSN 20018762) and modified by showing only anatomical terms related to this figure. The image of endothelial cells was obtained from https://commons.wikimedia.org/wiki/File:Diagram_of_epithelial_cells_CRUK_033.svg. (B) A sample knowledge base describing the kidney anatomy and experimental data relationships with natural language comments explaining what each Datalog statement represents. (C) Another knowledge base describing some simple logical relationships (i.e. ‘contains’ and ‘expressed in’) with natural language comments explaining what each statement represents. (D) A sample Datalog query using the knowledge bases in (B) and (C) to determine the expression of ‘RNA_1’.

The above concept is further extended in RenalDB through the advanced search functions, which can handle arbitrarily complex combinations of search tags and Boolean operators (‘and’, ‘or’, ‘not’). The search is used in the [LOCI] view (Figure 3A) and in the [VENN] view (Figure 3B). The [LOCI] view displays rows of sequences with Universal Genomic Accessions (UGAs), names and other high-level descriptive data [9]. Clicking on a UGA leads to the sequence view (Figure 3C). The sequence view contains detailed information about the corresponding sequence, such as general annotation information, links to other databases provided via CORS requests to the UGAHash server (so that accessions stay up to date automatically) [9], links to the UCSC Genome Browser [66], associated GO terms and the corresponding links to the AmiGO 2 database [20]. Furthermore, the sequence’s expression data are available as a heatmap that displays expression strength within hierarchical tree structures showing the ‘contains’ and ‘develops to’ relationships described by the logical models. Numerical values of expression profiles are also provided in table format when the [Numeric Values] tab is clicked. The samples in both of these views can be grouped or ungrouped based on sex, age and strain. The [VENN] view allows users to visualize up to three searches as a Venn diagram. The numbers shown in each Venn diagram are clickable; once clicked, the list of associated genes and/or transcripts is displayed in the [LOCI] view. Special attention should be paid to the power of the search, as it includes some of the logic programming capabilities. For example, searching for RNAs expressed in kidney (EXPRESSED:Kidney) will yield not only RNAs directly detected in the kidney but also those detected in any child component of the kidney. Similarly, an RNA found to be specific to a child component of the kidney will also be listed as ‘kidney specific’.
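The way search tags and Boolean operators combine with the anatomical hierarchy can be sketched with Python sets. This is a minimal illustration of the semantics only; RenalDB's actual search is SQL-based and extended with logic programming, and the data, helper names and the second tag (Heart) below are hypothetical. Only the tag form EXPRESSED:Kidney is taken from the text.

```python
# Hypothetical anatomy: each object lists its direct child components.
CHILDREN = {
    "Kidney": ["Renal cortex", "Renal medulla"],
    "Renal cortex": [],
    "Renal medulla": [],
    "Heart": [],
}

# Hypothetical direct detections per anatomical object.
DETECTED = {
    "Kidney": {"RNA_1"},
    "Renal cortex": {"RNA_2"},
    "Renal medulla": set(),
    "Heart": {"RNA_1", "RNA_3"},
}

ALL_RNAS = set().union(*DETECTED.values())


def expressed(part):
    """Tag EXPRESSED:<part>: RNAs detected in `part` or in any of its
    child components (recursively)."""
    hits = set(DETECTED.get(part, set()))
    for child in CHILDREN.get(part, []):
        hits |= expressed(child)
    return hits


def op_and(a, b):
    return a & b


def op_or(a, b):
    return a | b


def op_not(a):
    return ALL_RNAS - a


# EXPRESSED:Kidney and not EXPRESSED:Heart
kidney_not_heart = op_and(expressed("Kidney"), op_not(expressed("Heart")))
```

Because each tag evaluates to a set, arbitrarily nested combinations of ‘and’, ‘or’ and ‘not’ reduce to set intersection, union and complement, and up to three such result sets can be compared directly in a Venn diagram.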

Figure 3.

Usage of RenalDB. (A) A [LOCI] view from RenalDB showing search query using search tags and Boolean operators. (B) Example of a query on the [VENN] view. (C) Example from sequence view showing the basic annotation data, the graphical expression overview for a gene, GO terms and homologs. The grouping of expression values can be modified using the checkbox list and updated by clicking the [GO] button.

Validity of RenalDB and functional data of lncRNAs

To validate the content of RenalDB, we screened for lncRNAs; more specifically, lincRNAs, which are located between protein-coding genes on the genome [3]. The reason for focusing specifically on lincRNAs (instead of sense overlapping lncRNAs, for example) is that it is experimentally difficult to separate the expression of a target lncRNA from that of a nearby protein-coding gene when some of their sequences overlap and the likelihood of sharing promoter sequences is high.

From RenalDB, we selected 22 lincRNAs that are expressed in the human kidney and performed RT-PCR experiments using cDNA generated from 10 human tissues to validate their expression patterns in the kidney. As a result, 18 of 22 lincRNAs are expressed in the kidney, and some are enriched in the kidney compared with other tissues (Figure 4). Of note, the sources of total RNA differ from those of the publicly available RNA-seq data included in RenalDB. Furthermore, we set the number of PCR cycles to 35, which may result in lowly expressed lincRNAs not being detected. Based on the RT-PCR results and conservation among organisms, we chose two lincRNAs (LOC440173 and PAXIP1-AS1) and characterized them further.

Figure 4.

RT-PCR experiment of selected kidney-expressed lincRNAs. To be consistent, 35 cycles of PCR were used for all primer pairs. GAPDH and RPLP0 were used as loading controls. Those lincRNAs whose expression could not be detected with 35 cycles of PCR are marked in blue, while the two lincRNAs used for further experiments are marked in red.


It is a well-known fact that many lncRNAs have distinct expression patterns in the cell (e.g. expressed exclusively in the nucleus). To determine subcellular localization of the selected lincRNAs, nuclear and cytoplasmic fractions of RNA were prepared from HEK-293 cells, and lincRNAs were detected by RT-PCR experiment (Figure 5A). The result indicates that LOC440173 is expressed in both the nuclear and cytoplasmic fractions, whereas PAXIP1-AS1 is exclusively detected in the nucleus.

Figure 5.

Expression of two selected lincRNAs and characterization of LOC440173. (A) Subcellular localization of LOC440173 and PAXIP1-AS1 in HEK-293 cells. For GAPDH, the primer pair targeting its intron between exons 2 and 3 was used. A representative image from three independent assays is shown. (B) Efficiency of silencing by siRNAs; n = 3. (C) Volcano plot comparing silencing of LOC440173 and siScr. Genes selected above the threshold of 1.5-fold and p < 0.05 are colored in red; n = 2. (D) Morphologies of cells on siRNA transfection. The scale bar represents 100 μm.


Although the above expression profiling experiments are informative, the biological functions of the selected lincRNAs remain unknown without further experiments. To this end, LOC440173 was silenced by siRNAs. Compared with the control (siRNA against a scramble control sequence, termed ‘siScr’ hereafter), the expression of LOC440173 was efficiently silenced (Figure 5B). From these samples, total RNA was isolated and subjected to microarrays (Figure 5C). When a fold-change threshold of 1.5 and a p-value cutoff of 0.05 were applied, 80 up- and 375 downregulated genes were identified (Supplementary Tables S3 and S4). GO analysis was performed on these genes (Supplementary Tables S5 and S6). Among upregulated genes, GO terms related to protein transport and cell division are enriched, whereas GO terms related to cell migration and growth are enriched among downregulated genes. These findings are consistent with the morphology of the cells, which revealed a reduction of cell numbers after LOC440173 silencing (Figure 5D), suggesting that LOC440173 is important for cell growth or cell survival.
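The thresholding step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the gene names and values are hypothetical, and the sketch assumes that ‘1.5-fold’ is applied symmetrically (at least 1.5-fold up, or at most 1/1.5-fold down) with inclusive bounds, which the text does not specify.

```python
import math

# Hypothetical per-gene results: (gene, fold change versus siScr, p-value).
results = [
    ("geneA", 2.0, 0.01),
    ("geneB", 0.5, 0.03),
    ("geneC", 1.2, 0.001),
    ("geneD", 1.8, 0.20),
]

FOLD_THRESHOLD = 1.5  # fold-change threshold stated in the text
P_CUTOFF = 0.05       # p-value cutoff stated in the text


def classify(fold_change, p_value):
    """Return 'up', 'down' or None under the fold-change and p-value cutoffs."""
    if p_value >= P_CUTOFF:
        return None
    log2_fc = math.log2(fold_change)
    limit = math.log2(FOLD_THRESHOLD)
    # Assumption: the threshold is symmetric on the log2 scale.
    if log2_fc >= limit:
        return "up"
    if log2_fc <= -limit:
        return "down"
    return None


up = [gene for gene, fc, p in results if classify(fc, p) == "up"]
down = [gene for gene, fc, p in results if classify(fc, p) == "down"]
```

Working on the log2 scale treats a 1.5-fold increase and a 1.5-fold decrease as equally large changes, which is the usual convention behind volcano plots.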

Next, PAXIP1-AS1 was analyzed in a similar manner. Although its official name is ‘PAXIP1 Antisense RNA 1’, PAXIP1-AS1 does not overlap with the protein-coding gene PAXIP1 on the genome. After silencing of PAXIP1-AS1 (Figure 6A) followed by a microarray experiment (Figure 6B), the same set of threshold values (1.5-fold and p < 0.05) was applied. There were 91 up- and 39 downregulated genes (Supplementary Tables S7 and S8). GO analysis was performed on these genes (Supplementary Tables S9 and S10). Aside from various GO terms related to metabolic processes enriched among upregulated genes, many GO terms related to cell death are enriched in both up- and downregulated genes. To test whether silencing of PAXIP1-AS1 affects cell viability, cells were treated with hydrogen peroxide, and the surviving cells were counted and normalized to those of the corresponding siScr cells (Figure 6C). Compared with siScr cells, the survival of PAXIP1-AS1-silenced cells was improved, particularly when compared with cells after LOC440173 silencing. These experiments suggest that PAXIP1-AS1 is a regulator of cell death. However, the mechanism of its action is unknown. To further elucidate the mechanism, we determined whether PAXIP1-AS1 could cis-regulate the expression of the nearby protein-coding gene PAXIP1. On silencing of PAXIP1-AS1, downregulation of PAXIP1 was recorded (Figure 6D). Given that Paxip1 (also known as ‘PTIP’) homozygous mutant mice die by embryonic day 9.5 via accumulation of DNA damage [67], the modulation of cell death-related genes on silencing of PAXIP1-AS1 might be owing to decreased expression of PAXIP1. However, further research is required to clearly define this mechanism.

Figure 6.

Characterization of PAXIP1-AS1. (A) Efficiency of silencing by siRNAs; n = 3. (B) Volcano plot comparing silencing of PAXIP1-AS1 and siScr. Genes selected above the threshold of 1.5-fold and p < 0.05 are colored in red; n = 2. (C) Cell viability on the treatment with hydrogen peroxide; n = 3. In the case of PAXIP1-AS1, the numbers of surviving cells were normalized to the corresponding siScr (100 nM), whereas those of LOC440173-silenced cells were normalized to the corresponding siScr (10 nM). (D) Expression of PAXIP1 on silencing of PAXIP1-AS1; n = 3.


Discussion

Our survey of public databases for lncRNAs shows that most current databases do not provide detailed, genome-wide profiling of the cell-type-specific expression of lncRNAs. As a step toward providing such information, we built a knowledge database, RenalDB, to comprehensively cover the transcriptomes of human, mouse and zebrafish kidneys. Although some databases (e.g. Expression Atlas [32] and GEO Profiles [33]) contain more data than RenalDB, none are able to filter data by tissue enrichment or specificity. Furthermore, RenalDB is the only one of these databases to use logic programming. Expression Atlas does not have an advanced search option, while GEO Profiles does offer an advanced search with many options; however, it suffers from a lack of curation. By curation we mean humans going over the data and resolving discrepancies in metadata. This poses a problem for performing advanced searches. For example, consider simple metadata such as ‘sex’. One would expect the values to be something like ‘Male’, ‘Female’ or ‘Unknown’. However, the actual metadata are much messier, with multiple related headings and values. While building RenalDB, we encountered various headings, including SEX, sex, Sex, Gender, GENDER and mouse gender. The values under these headings contained various labels, such as M, Male, male, None, N, U, Mixed and pooled. This is a metadata heading with only a few possible answers, and yet it became complicated. The sample source metadata tags are more complex, especially because many of the cell types and tissues have synonyms, for example, ‘Renal Cortex’ versus ‘Kidney Cortex’. We standardized all metadata within RenalDB using text similarity clustering with OpenRefine [68]. Furthermore, we extensively searched the GEO/SRA archives for all kidney-related samples. Owing to this situation, we only considered kidney-related metadata with high-quality samples available.
The curated metadata can be found here: http://renaldb.uni-frankfurt.de/static/cit/data/samples_dump.20160210.tsv
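To make the ‘sex’ example concrete, the following Python sketch normalizes the heterogeneous headings and labels listed above with a simple lookup table. It is not the procedure used for RenalDB, which relied on text similarity clustering in OpenRefine; the function name, the target vocabulary and the treatment of ambiguous labels (e.g. mapping ‘N’ and ‘U’ to ‘Unknown’, or adding ‘F’ and ‘Female’) are assumptions made for illustration.

```python
# Headings observed in the text, compared case-insensitively.
SEX_HEADINGS = {"sex", "gender", "mouse gender"}

# Assumed mapping from raw labels to a controlled vocabulary.
SEX_VALUES = {
    "m": "Male",
    "male": "Male",
    "f": "Female",        # assumption: not listed in the text
    "female": "Female",   # assumption: not listed in the text
    "mixed": "Mixed",
    "pooled": "Mixed",
    "none": "Unknown",
    "n": "Unknown",       # assumption: 'N' is read as 'none'
    "u": "Unknown",
}


def normalize_sex(sample):
    """Return a standardized sex label from a dict of raw sample metadata."""
    for heading, value in sample.items():
        if heading.strip().lower() in SEX_HEADINGS:
            return SEX_VALUES.get(str(value).strip().lower(), "Unknown")
    return "Unknown"


examples = [
    {"GENDER": "M"},
    {"mouse gender": "pooled"},
    {"Sex": "None"},
    {"tissue": "Kidney Cortex"},
]
labels = [normalize_sex(sample) for sample in examples]
```

A fixed lookup table only covers labels that have already been seen; clustering-based tools are better suited to discovering new variants, such as the tissue synonyms mentioned above.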

When we searched for similar transcriptomics databases focused on the kidney, we found only two that are currently available: Renal Gene Expression Database (RGED) [69] and Toxygates [70]. Both databases contain microarray data but not RNA-seq data, which are now increasingly used in laboratories around the world. This fact alone makes RenalDB a valuable tool for researchers working in the field of nephrology. Furthermore, both RGED and Toxygates contain information only for the selected sets of protein-coding genes present on the microarray platforms used. In comparison, RenalDB covers whole transcriptomes, including all protein-coding genes and lncRNAs currently annotated by the most widely used informational database, Ensembl.

In the field of nephrology, the following lncRNAs have been identified and studied in detail: Arid2-IR [71], H19 [72], HOTAIR [73], RCCRT1 [74], TapSAKI [75] and Xist [76]. Given that many lncRNAs are expressed in various cell types and parts of the kidney, more functional evidence is necessary to comprehensively understand the transcriptomes of kidney and their contributions to the functionalities of kidneys across organisms. Compared with the reports about the lncRNA databases, this study provides the functional data of lncRNAs along with the applicability of the lncRNA database itself. This point is important, as it should not be up to the users to verify the content of the database being built and introduced to the public, although the database itself might have been built using the previously published high-throughput data sets. It is imperative to note that it is the responsibility of the software developer and his/her team to provide the functional data of such database. In conclusion, this study should set the standard for the further building of bioinformatics tools with the confidence guaranteed to the users.

Key Points

There are public databases providing transcriptomics data for expressions of lncRNAs.
There is a lack of cell-type-specific databases for lncRNAs targeting a specific tissue.
RenalDB provides a convenient way to screen for RNAs that are expressed, enriched and/or specific in the kidney, its sub-tissues and its cell types.
Experimental evidence helps demonstrate the validity of databases being introduced.

Supplementary Data

Supplementary data are available online at http://bib.oxfordjournals.org/.

Tyler Weirick is a senior Research Technologist at the University of Louisville and a PhD student at the Institute of Cardiovascular Regeneration (Uchida Lab), who is focused on elucidating the evolutional conservation of long noncoding RNAs (lncRNAs).

Giuseppe Militello is a senior Research Technologist at the University of Louisville and a PhD student at the Institute of Cardiovascular Regeneration (Uchida Lab), who is working with lncRNAs in the skeletal muscle.

Yuliya Ponomareva is a PhD student at the Institute of Cardiovascular Regeneration (Uchida Lab), who is working with lncRNAs in the heart and stem cells.

David John is a PhD student at the Institute of Cardiovascular Regeneration (Uchida Lab), who is developing computational algorithms and pipelines to identify RNA modification events.

Dr Claudia Döring is a bioinformatics scientist and laboratory manager of the RNA laboratory at the Dr Senckenberg Institute of Pathology, who is focused on gene expression and next-generation sequencing analysis especially in lymphoma diseases.

Prof. Dr Stefanie Dimmeler is the director of the Institutes of Cardiovascular Regeneration.

Dr Shizuka Uchida is an Associate Professor of Medicine at the University of Louisville and an Independent Junior Group Leader at the Institute of Cardiovascular Regeneration. His laboratory (‘Cardiovascular Bioinformatics’: http://heartlncrna.github.io) is interested in elucidating the functions of lncRNAs using dry and wet laboratory techniques.

Acknowledgements

The authors would like to thank Wenjun Jin for excellent technical assistance.

Funding

The LOEWE Center for Cell and Gene Therapy (State of Hessen) (to S.U. and S.D.); the Deutsche Forschungsgemeinschaft (SFB834 to S.U. and S.D.); the German Center for Cardiovascular Research (DZHK) (to S.U. and S.D.); and the startup funding from the Mansbach Family, the Gheens Foundation and other generous supporters at the University of Louisville (to S.U.).

References

1. Lander ES, Linton LM, Birren B, et al. Initial sequencing and analysis of the human genome. Nature 2001;409:860–921.
2. Mercer TR, Gerhardt DJ, Dinger ME, et al. Targeted RNA sequencing reveals the deep complexity of the human transcriptome. Nat Biotechnol 2012;30:99–104.
3. Uchida S, Dimmeler S. Long noncoding RNAs in cardiovascular diseases. Circ Res 2015;116:737–50.
4. Uchida S, Gellert P, Braun T. Deeply dissecting stemness: making sense to non-coding RNAs in stem cells. Stem Cell Rev 2012;8:78–86.
5. Weirick T, Militello G, Muller R, et al. The identification and characterization of novel transcripts from RNA-seq data. Brief Bioinform 2016;17:678–85.
6. Boeckel JN, Jae N, Heumuller AW, et al. Identification and characterization of Hypoxia-regulated endothelial circular RNA. Circ Res 2015;117:884–90.
7. Michalik KM, You X, Manavski Y, et al. Long noncoding RNA MALAT1 regulates endothelial cell function and vessel growth. Circ Res 2014;114:1389–97.
8. Weirick T, John D, Dimmeler S, et al. C-It-Loci: a knowledge database for tissue-enriched loci. Bioinformatics 2015;31:3537–43.
9. Weirick T, John D, Uchida S. Resolving the problem of multiple accessions of the same transcript deposited across various public databases. Brief Bioinform 2016, doi: 10.1093/bib/bbw017.
10. Pine PS, Rosenzweig BA, Thompson KL. An adaptable method using human mixed tissue ratiometric controls for benchmarking performance on gene expression microarrays in clinical laboratories. BMC Biotechnol 2011;11:38.
11. Goff LA, Groff AF, Sauvageau M, et al. Spatiotemporal expression and transcriptional perturbations by long noncoding RNAs in the mouse brain. Proc Natl Acad Sci USA 2015;112:6855–62.
12. Baral C, Gelfond M. Logic programming and knowledge representation. J Log Program 1994;19:73–148.
13. Koster J, Rahmann S. Snakemake–a scalable bioinformatics workflow engine. Bioinformatics 2012;28:2520–2.
14. Kodama Y, Shumway M, Leinonen R. The sequence read archive: explosive growth of sequencing data. Nucleic Acids Res 2012;40:D54–6.
15. McWilliam H, Li W, Uludag M, et al. Analysis tool web services from the EMBL-EBI. Nucleic Acids Res 2013;41:W597–600.
16. Dobin A, Davis CA, Schlesinger F, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 2013;29:15–21.
17. Anders S, Pyl PT, Huber W. HTSeq–a Python framework to work with high-throughput sequencing data. Bioinformatics 2015;31:166–9.
18. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 2014;15:550.
19. Smedley D, Haider S, Durinck S, et al. The BioMart community portal: an innovative alternative to large, centralized data repositories. Nucleic Acids Res 2015;43:W589–98.
20. Carbon S, Ireland A, Mungall CJ, et al. AmiGO: online access to ontology and annotation data. Bioinformatics 2009;25:288–9.
21. Untergasser A, Cutcutache I, Koressaar T, et al. Primer3–new capabilities and interfaces. Nucleic Acids Res 2012;40:e115.
22. Gellert P, Ponomareva Y, Braun T, et al. Noncoder: a web interface for exon array-based detection of long non-coding RNAs. Nucleic Acids Res 2013;41:e20.
23. Gellert P, Teranishi M, Jenniches K, et al. Gene array analyzer: alternative usage of gene arrays to study alternative splicing events. Nucleic Acids Res 2012;40:2414–25.
24. Irizarry RA, Hobbs B, Collin F, et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 2003;4:249–64.
25. Ritchie ME, Phipson B, Wu D, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 2015;43:e47.
26. Huang da W, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 2009;4:44–57.
27. Li A, Zhang J, Zhou Z, et al. ALDB: a domestic-animal long noncoding RNA database. PLoS One 2015;10:e0124003.
28. Muller R, Weirick T, John D, et al. ANGIOGENES: knowledge database for protein-coding and noncoding RNA genes in endothelial cells. Sci Rep 2016;6:32475.
29. Yang JH, Li JH, Jiang S, et al. ChIPBase: a database for decoding the transcriptional regulation of long non-coding RNA and microRNA genes from ChIP-Seq data. Nucleic Acids Res 2013;41:D177–87.
30. Zhao Z, Bai J, Wu A, et al. Co-LncRNA: investigating the lncRNA combinatorial effects in GO annotations and KEGG pathways based on human RNA-Seq data. Database 2015;2015.
31. Zheng LL, Li JH, Wu J, et al. deepBase v2.0: identification, expression, evolution and function of small RNAs, LncRNAs and circular RNAs from deep-sequencing data. Nucleic Acids Res 2016;44:D196–202.
32. Petryszak R, Keays M, Tang YA, et al. Expression atlas update–an integrated database of gene and protein expression in humans, animals and plants. Nucleic Acids Res 2016;44:D746–52.
33. Barrett T, Wilhite SE, Ledoux P, et al. NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res 2013;41:D991–5.
34. Paraskevopoulou MD, Georgakilas G, Kostoulas N, et al. DIANA-LncBase: experimentally verified and computationally predicted microRNA targets on long non-coding RNAs. Nucleic Acids Res 2013;41:D239–45.
35. Cabili MN, Trapnell C, Goff L, et al. Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev 2011;25:1915–27.
36. Jiang Q, Ma R, Wang J, et al. LncRNA2Function: a comprehensive resource for functional investigation of human lncRNAs based on RNA-seq data. BMC Genomics 2015;16(Suppl 3):S2.
37. Quek XC, Thomson DW, Maag JL, et al. lncRNAdb v2.0: expanding the reference database for functional long noncoding RNAs. Nucleic Acids Res 2015;43:D168–73.
38. Chan WL, Huang HD, Chang JG. lncRNAMap: a map of putative regulatory functions in the long non-coding transcriptome. Comput Biol Chem 2014;50:41–9.
39. Park C, Yu N, Choi I, et al. lncRNAtor: a comprehensive resource for functional investigation of long non-coding RNAs. Bioinformatics 2014;30:2480–5.
40. Sheng X, Wu J, Sun Q, et al. MTD: a mammalian transcriptomic database to explore gene expression and regulation. Brief Bioinform 2016, in press.
41. Zhao Y, Li H, Fang S, et al. NONCODE 2016: an informative and valuable data source of long non-coding RNAs. Nucleic Acids Res 2015;44:D203–8.
42. Dinger ME, Pang KC, Mercer TR, et al. NRED: a database of long noncoding RNA expression. Nucleic Acids Res 2009;37:D122–6.
43. Li J, Han L, Roebuck P, et al. TANRIC: an interactive open platform to explore the function of lncRNAs in cancer. Cancer Res 2015;75:3728–37.
44. Jiang Q, Wang J, Wang Y, et al. TF2LncRNA: identifying common transcription factors for a list of lncRNA genes from ChIP-Seq data. Biomed Res Int 2014;2014:317642.
45. Volders PJ, Verheggen K, Menschaert G, et al. An update on LNCipedia: a database for annotated human lncRNA sequences. Nucleic Acids Res 2015;43:D174–80.
46. Uchida S, Schneider A, Wiesnet M, et al. An integrated approach for the systematic identification and characterization of heart-enriched genes with unknown functions. BMC Genomics 2009;10:100.
47. Ulitsky I, Shkumatava A, Jan CH, et al. Conserved function of lincRNAs in vertebrate embryonic development despite rapid sequence evolution. Cell 2011;147:1537–50.
48. Bejerano G, Pheasant M, Makunin I, et al. Ultraconserved elements in the human genome. Science 2004;304:1321–5.
49. Dimitrieva S, Bucher P. UCNEbase–a database of ultraconserved non-coding elements and genomic regulatory blocks. Nucleic Acids Res 2013;41:D101–9.
50. Pennacchio LA, Ahituv N, Moses AM, et al. In vivo enhancer analysis of human conserved non-coding sequences. Nature 2006;444:499–502.
51. Visel A, Minovitsky S, Dubchak I, et al. VISTA enhancer browser–a database of tissue-specific human enhancers. Nucleic Acids Res 2007;35:D88–92.
52. Derrien T, Johnson R, Bussotti G, et al. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res 2012;22:1775–89.
53. Clark MB, Mercer TR, Bussotti G, et al. Quantitative gene profiling of long noncoding RNAs with targeted RNA sequencing. Nat Methods 2015;12:339–42.
54. Molyneaux BJ, Goff LA, Brettler AC, et al. DeCoN: genome-wide analysis of in vivo transcriptional dynamics during pyramidal neuron fate selection in neocortex. Neuron 2015;85:275–88.
55. Werber M, Wittler L, Timmermann B, et al. The tissue-specific transcriptomic landscape of the mid-gestational mouse embryo. Development 2014;141:2325–30.
56. Mercer TR, Dinger ME, Sunkin SM, et al. Specific expression of long noncoding RNAs in the mouse brain. Proc Natl Acad Sci USA 2008;105:716–21.
57. Tang F, Barbacioru C, Bao S, et al. Tracing the derivation of embryonic stem cells from the inner cell mass by single-cell RNA-Seq analysis. Cell Stem Cell 2010;6:468–78.
58. Tang F, Barbacioru C, Nordman E, et al. RNA-Seq analysis to capture the transcriptome landscape of a single cell. Nat Protoc 2010;5:516–35.
59. Trapnell C, Cacchiarelli D, Grimsby J, et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol 2014;32:381–6.
60. Yan L, Yang M, Guo H, et al. Single-cell RNA-Seq profiling of human preimplantation embryos and embryonic stem cells. Nat Struct Mol Biol 2013;20:1131–9.
61. Grote P, Wittler L, Hendrix D, et al. The tissue-specific lncRNA Fendrr is an essential regulator of heart and body wall development in the mouse. Dev Cell 2013;24:206–14.
62. Lai KM, Gong G, Atanasio A, et al. Diverse phenotypes and specific transcription patterns in twenty mouse lines with ablated LincRNAs. PLoS One 2015;10:e0125522.
63. Sauvageau M, Goff LA, Lodato S, et al. Multiple knockout mouse models reveal lincRNAs are required for life and brain development. Elife 2013;2:e01749.
64. Stepanenko AA, Dmitrenko VV. HEK293 in cell biology and cancer research: phenotype, karyotype, tumorigenicity, and stress-induced genome-phenotype evolution. Gene 2015;569:182–90.
65. Eklund P, Klawonn F. Neural fuzzy logic programming. IEEE Trans Neural Netw 1992;3:815–8.
66. Kent WJ, Sugnet CW, Furey TS, et al. The human genome browser at UCSC. Genome Res 2002;12:996–1006.
67. Cho EA, Prindle MJ, Dressler GR. BRCT domain-containing protein PTIP is essential for progression through mitosis. Mol Cell Biol 2003;23:1666–73.
68. Ham K. OpenRefine (version 2.5). http://openrefine.org. Free, open-source tool for cleaning and transforming data. J Med Libr Assoc 2013;101:233–4.
69. Zhang Q, Yang B, Chen X, et al. Renal Gene Expression Database (RGED): a relational database of gene expression profiles in kidney disease. Database 2014;2014, in press.
70. Nystrom-Persson J, Igarashi Y, Ito M, et al. Toxygates: interactive toxicity analysis on a hybrid microarray and linked data platform. Bioinformatics 2013;29:3080–6.
71. Zhou Q, Huang XR, Yu J, et al. Long noncoding RNA Arid2-IR is a novel therapeutic target for renal inflammation. Mol Ther 2015;23:1034–43.
72. Kanwar YS, Pan X, Lin S, et al. Imprinted mesodermal specific transcript (MEST) and H19 genes in renal development and diabetes. Kidney Int 2003;63:1658–70.
73. Wu Y, Liu J, Zheng Y, et al. Suppressed expression of long non-coding RNA HOTAIR inhibits proliferation and tumourigenicity of renal carcinoma cells. Tumour Biol 2014;35:11887–94.
74. Song S, Wu Z, Wang C, et al. RCCRT1 is correlated with prognosis and promotes cell migration and invasion in renal cell carcinoma. Urology 2014;84:730 e731–7.
75. Lorenzen JM, Schauerte C, Kielstein JT, et al. Circulating long noncoding RNA TapSaki is a predictor of mortality in critically ill patients with acute kidney injury. Clin Chem 2015;61:191–201.
76. Huang YS, Hsieh HY, Shih HM, et al. Urinary Xist is a potential biomarker for membranous nephropathy. Biochem Biophys Res Commun 2014;452:415–21.

Author notes

Tyler Weirick and Giuseppe Militello contributed equally to this work.

Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/journals/pages/about_us/legal/notices)
