Nucleic Acids Res. Jan 2010; 38(Database issue): D123–D130.
Published online Dec 4, 2009. doi:  10.1093/nar/gkp943
PMCID: PMC2808990

deepBase: a database for deeply annotating and mining deep sequencing data

Abstract

Advances in high-throughput next-generation sequencing technology have reshaped the transcriptomic research landscape. However, exploration of these massive data remains a daunting challenge. In this study, we describe a novel database, deepBase, which we have developed to facilitate the comprehensive annotation and discovery of small RNAs from transcriptomic data. The current release of deepBase contains deep sequencing data from 185 small RNA libraries from diverse tissues and cell lines of seven organisms: human, mouse, chicken, Ciona intestinalis, Drosophila melanogaster, Caenorhabditis elegans and Arabidopsis thaliana. By analyzing ~14.6 million unique reads that perfectly mapped to more than 284 million genomic loci, we annotated and identified ~380 000 unique ncRNA-associated small RNAs (nasRNAs), ~1.5 million unique promoter-associated small RNAs (pasRNAs), ~4.0 million unique exon-associated small RNAs (easRNAs) and ~6 million unique repeat-associated small RNAs (rasRNAs). Furthermore, 2038 miRNA and 1889 snoRNA candidates were predicted by miRDeep and snoSeeker. All of the mapped reads can be grouped into about 1.2 million RNA clusters. For the purpose of comparative analysis, deepBase provides an integrative, interactive and versatile display. A convenient search option, related publications and other useful information are also provided for further investigation. deepBase is available at: http://deepbase.sysu.edu.cn/.

INTRODUCTION

Next-generation ‘deep-sequencing’ technologies have enabled the detection and profiling of both known and novel small noncoding RNAs (ncRNAs) at unprecedented sensitivity and depth (1–3). Most studies to date have used 454 and Solexa technologies to discover new and different ncRNA classes in a multitude of species, including human (4–6), mouse (5,7,8), chicken (9,10), Ciona intestinalis (11), Drosophila melanogaster (12–15), Caenorhabditis elegans (16–18) and Arabidopsis thaliana (19–22). However, the analysis of these massive and heterogeneous deep sequencing data sets poses several challenges, including effective data mapping, annotation and visualization; efficient data storage and retrieval; integration and interpretation of data from multiple technological platforms, tissues and cell lines; and customizing the analysis so that a variety of biological questions can be addressed. Although the above-mentioned studies have targeted some of these individual steps in a specific genome, an integrated database that can meet all these basic needs for deep sequencing data is not yet available for animal and plant genomes.

Recent studies have shown that many small RNAs derived from annotated genomic elements, such as long ncRNAs, transcription start sites (TSSs) and transposable elements (TEs), can modulate diverse biological functions (6,23–29), raising the possibility that a large group of small RNAs originating from annotated genomic elements may still be hiding in eukaryotic genomes (6,23–29). However, in the past, sequence reads mapped to non-miRNA or non-piRNA gene families have been routinely discarded and not analyzed further. Intriguingly, a large number of highly abundant small RNAs derived from known ncRNAs often span the entire RNA locus, indicating that we not only can recapitulate known ncRNAs but also can identify novel ncRNAs by grouping these nearby small RNAs into clusters.

In this study, we describe the newly developed deepBase database for the comprehensive annotation and mining of deep sequencing data from 185 small RNA libraries from diverse tissues and cell lines of seven organisms (Figure 1). deepBase contains millions of small RNAs derived from known ncRNAs, protein-coding genes and repeat elements, as well as a massive number of unannotated small RNAs. In addition, we report about 1.2 million RNA clusters that include multiple classes of infrastructural ncRNAs (e.g. tRNAs, rRNAs, snRNA and snoRNAs), miRNAs, piRNA precursors and repeat-associated siRNA precursors, as well as numerous novel ncRNAs, some of which can be predicted as novel miRNAs and snoRNAs by miRDeep (5) and our snoSeeker programme (30). Finally, deepBase provides an integrative, interactive and versatile web graphical interface to display these data and facilitate transcriptomic research and the discovery of novel ncRNAs.

Figure 1.
The basic framework of deepBase. All results generated by deepBase are deposited in relational databases and displayed in the visual browser and web page. The web-interface programmes and browser can be accessed by a wide range of research biologists ...

MATERIALS AND METHODS

One hundred and eighty-five small-RNA libraries from diverse tissues and cell lines from seven organisms were compiled from 34 related studies (Supplementary Table S1) and downloaded from the NCBI GEO website (31). Known ncRNAs were downloaded from Ensembl (32) or UCSC bioinformatics website (33), or were obtained from the literature. All known miRNAs were downloaded from miRBase [release 13.0, (34)]. Human and Arabidopsis snoRNAs were downloaded from snoRNABase (35) and the Plant snoRNA Database (36), respectively. All known tRNAs were downloaded from the Genomic tRNA Database (37). All refSeq genes and repeat elements (38,39) for animal genomes were downloaded from the UCSC bioinformatics website (33). Human (UCSC hg18), mouse (NCBI Build 37), chicken (Gallus gallus, v2.1), C. intestinalis (JGI v2.0), D. melanogaster (BDGP Release 5) and C. elegans (WS190) genome sequences were downloaded from the UCSC bioinformatics website (33). Arabidopsis (TAIR8) genome sequences, repeat elements and protein-coding genes were downloaded from the TAIR website (40).

All deep sequencing data downloaded from the NCBI GEO database were in SOFT format, and some raw data included 3′ adapters or barcodes, which we clipped from the reads using our in-house Perl scripts. After adapter removal, sequences shorter than 15 nt were discarded. Low-complexity reads were also discarded (41). All unique adapter-free reads in each library were mapped to the seven genomes using Bowtie (version 0.9.9.3) (42) with the options -f -k 200 -v 0, and only perfect matches over their entire length were retained. The parameters -k 200 -v 0 instruct Bowtie to report up to 200 perfect hits for each read (42). Combining the mapped reads from all libraries, we found a total of ~14.6 million unique reads that perfectly mapped to more than 284 million genomic loci. Finally, up to 50 perfect hits to each genome were considered per query read in the subsequent analysis. Because mismatches between the genome and the small RNA reads are not allowed, the current deepBase does not contain isomiRs with at least one mismatch to the genome (4). These mismatches are usually generated by the addition of untemplated nucleotides to the 3′ terminus of miRNAs (4) or by RNA editing (4,43,44). The large amount of data generated in such a large-scale screen requires appropriate computational means for storage and processing. For this task, a MySQL database was created to store the mapped reads.
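The read pre-processing described above can be illustrated with a minimal Python sketch. It is not the authors' in-house Perl code: the exact-match adapter rule, the function names and the example adapter are hypothetical, and the handling of the 50-hit limit (truncating the hit list rather than discarding the read) is an assumption, since the text does not say which was done. Low-complexity filtering and the Bowtie mapping itself are not reproduced here.

```python
from collections import Counter

MIN_LENGTH = 15  # reads shorter than 15 nt are discarded (value from the text)
MAX_HITS = 50    # up to 50 perfect hits per read are used downstream (value from the text)


def clip_adapter(read, adapter):
    """Remove the 3' adapter and everything after it (assumed exact-match rule)."""
    idx = read.find(adapter)
    return read if idx == -1 else read[:idx]


def collapse_reads(reads, adapter):
    """Clip adapters, drop short reads and collapse to unique sequences with counts."""
    counts = Counter()
    for read in reads:
        clipped = clip_adapter(read, adapter)
        if len(clipped) >= MIN_LENGTH:
            counts[clipped] += 1
    return counts


def cap_hits(hits_by_read, max_hits=MAX_HITS):
    """Keep at most max_hits genomic hits per read (assumed truncation)."""
    return {read: hits[:max_hits] for read, hits in hits_by_read.items()}
```

For example, `collapse_reads(raw_reads, "CTGTAGGC")` would return a mapping from each unique clipped sequence to its read count, which is the form in which unique reads are counted throughout the paper.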

We define an RNA cluster as a group of small RNAs in which each small RNA is ≤70 nt from its nearest neighbour and whose cluster length is ≥45 nt. These parameters were determined based on our statistics for the distribution of the distance between two nearest neighbouring reads that mapped to known ncRNAs (Supplementary Table S2). Our analysis revealed that more than 92% of the known ncRNA precursors can be grouped into clusters (Supplementary Table S2). RNAfold (45) was applied to predict the RNA secondary structures of ncRNAs and RNA clusters.
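The cluster definition above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the deepBase implementation: reads are assumed to be grouped per chromosome and strand, coordinates are assumed to be 1-based and inclusive, and the distance to the nearest neighbour is taken as the gap between the current cluster end and the next read start. The function name `cluster_reads` is hypothetical.

```python
from itertools import groupby

MAX_GAP = 70          # each small RNA is <= 70 nt from its nearest neighbour
MIN_CLUSTER_LEN = 45  # clusters shorter than 45 nt are not reported


def cluster_reads(reads):
    """Group reads given as (chrom, strand, start, end) tuples into clusters."""
    clusters = []

    def close(chrom, strand, start, end):
        # Report the cluster only if it reaches the minimum length.
        if end - start + 1 >= MIN_CLUSTER_LEN:
            clusters.append((chrom, strand, start, end))

    ordered = sorted(reads, key=lambda r: (r[0], r[1], r[2], r[3]))
    for (chrom, strand), group in groupby(ordered, key=lambda r: (r[0], r[1])):
        cur_start = cur_end = None
        for _, _, start, end in group:
            if cur_start is None:
                cur_start, cur_end = start, end
            elif start - cur_end <= MAX_GAP:
                # Close enough to the running cluster: extend it.
                cur_end = max(cur_end, end)
            else:
                close(chrom, strand, cur_start, cur_end)
                cur_start, cur_end = start, end
        close(chrom, strand, cur_start, cur_end)
    return clusters
```

Under these assumptions, two 22-nt reads at positions 100–121 and 150–171 on the same strand merge into one 72-nt cluster, whereas an isolated 22-nt read is dropped because it falls below the 45-nt minimum.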

Novel miRNA and snoRNA candidates were predicted from deep sequencing data by a modified miRDeep (5) and snoSeeker (30), respectively. RNA cluster sequences, extended by an additional 100 nt in both the 5′- and 3′-directions for each of the species, were extracted as the snoSeeker input data set. We applied the snoSeeker programme (30) to these RNA clusters with the following options: guide C/D ≥ 37.5 bits, orphan C/D ≥ 26.5 bits, guide H/ACA ≥ 40 bits and orphan H/ACA ≥ 27.0 bits. Novel snoRNA candidates that significantly overlapped with exons, repeat elements or other known ncRNAs were discarded. miRDeep (5) was run with its default score options. To improve the search speed of miRDeep, we introduced the following modifications: (i) the sequence reads were mapped to the genome using Bowtie (42), rather than BLAST (41), and (ii) the sequences were extracted from the large genomes using our fetchSeq programme, which was written in C (the programme is available from the authors upon request).
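As a rough illustration of the snoSeeker input preparation and score cut-offs listed above, the following Python sketch extends a cluster by 100 nt on each side and checks a candidate score against the stated thresholds. It does not call snoSeeker: the function names, the clamping of coordinates to the chromosome ends and the shape of the score input are assumptions made for this example only.

```python
FLANK = 100  # clusters are extended by 100 nt in both directions (value from the text)

# Minimum bit scores quoted in the text, keyed by (snoRNA family, kind).
MIN_BITS = {
    ("C/D", "guide"): 37.5,
    ("C/D", "orphan"): 26.5,
    ("H/ACA", "guide"): 40.0,
    ("H/ACA", "orphan"): 27.0,
}


def extend_cluster(start, end, chrom_length, flank=FLANK):
    """Extend a 1-based inclusive interval, clamped to the chromosome (assumed)."""
    return max(1, start - flank), min(chrom_length, end + flank)


def passes_cutoff(family, kind, bits):
    """Return True if a candidate score meets the threshold for its class."""
    return bits >= MIN_BITS[(family, kind)]
```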

Relative expression analysis was used to determine the expression preferences of individual miRNAs and ncRNAs across all small RNA libraries. The number of reads matching a particular ncRNA was calculated. Each ncRNA count from each library was normalized to the total read number for that library. The normalized count of a particular ncRNA in a particular library was then divided by the sum of the normalized counts for that ncRNA across all libraries. These normalized counts were transformed to 100 percentiles, and each bar in the heatmap represents the normalized level. Except for miRNAs, the heatmap reflects only a rough measure of total ncRNA expression, because most of the reads mapped to the other ncRNA species might be degradation products.
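The normalization described above amounts to a two-step calculation, sketched here in Python. The function name and the input dictionaries are hypothetical, and the final step is interpreted as scaling the per-library fractions to a 0–100 range; the paper's exact percentile transformation may differ.

```python
def relative_expression(counts, library_totals):
    """Return the share (0-100) of one ncRNA's expression in each library.

    counts: reads matching the ncRNA, per library.
    library_totals: total read number, per library.
    """
    # Step 1: normalize each count to the size of its library.
    normalized = {
        lib: counts.get(lib, 0) / total for lib, total in library_totals.items()
    }
    # Step 2: divide by the sum of normalized counts across all libraries.
    total = sum(normalized.values())
    if total == 0:
        return {lib: 0.0 for lib in normalized}
    return {lib: 100.0 * value / total for lib, value in normalized.items()}
```

With 50 reads in a library of 1000 reads and 100 reads in a library of 4000 reads, the normalized counts are 0.05 and 0.025, so the first library accounts for two-thirds of the relative expression even though its raw count is lower.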

deepBase DATABASE

Annotation and identification of about 380 000 nasRNAs from millions of deep sequencing reads

Recent studies have shown that many small RNAs generated from long ncRNAs by specific biogenesis pathways can modulate and silence gene expression, indicating that further investigation of these small RNA data sets is worthwhile for discovering novel functional small RNAs (23,24,46). Moreover, miRNA-offset RNAs (moRs) generated from 60-nt pre-miRNAs have been identified in C. intestinalis, suggesting an intrinsic property of the miRNA processing machinery (11). In this study, all mapped sequences were intersected against all types of long ncRNAs, including miRNA precursors (miRBase v13), snoRNAs, tRNAs, rRNAs, snRNAs, scRNAs, Mt_tRNAs and misc_RNAs. A total of ~58 800 unique reads and ~380 000 unique ncRNA-associated small RNAs (nasRNAs) originated from 2013 miRNA precursors and the other 9719 known long ncRNAs (Table 1), respectively. All reads overlapping these RNA genes were stored in the MySQL database for searching and browsing in deepBase.

Table 1.
Statistics in deepBase

Annotation and identification of abundant pasRNAs and easRNAs

A new class of transcripts was recently reported to originate near the expected TSSs upstream of protein-coding sequences (6,25–27,29). The existence of these promoter-associated small RNAs (pasRNAs) challenges our simplistic models of how the DNA sequences known as ‘promoters’ define TSSs (28). Moreover, many endogenous small interfering RNAs (endo-siRNAs) derived from protein-coding regions modulate gene expression and silencing (47,48). Thus, a genome-wide investigation of all of these small RNAs remains desirable because of the light it could shed on their biogenesis and function. In this study, all mapped reads were also intersected against the known refSeq genes and the upstream 350 nucleotides and downstream 150 nucleotides of TSSs. Those mapped reads overlapping TSSs were designated as pasRNAs (49,50). We divided the small RNAs overlapping with exons into sense and antisense exon-associated small RNAs (easRNAs) according to their strand. A total of ~1.5 million unique pasRNAs and ~4.0 million unique easRNAs were identified from TSSs and protein-coding sequences, producing the most comprehensive database of pasRNAs to date (Table 1).
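The TSS window and the sense/antisense split described above can be sketched in Python as follows. This is an assumption-laden illustration, not the deepBase code: the window is assumed to be oriented by gene strand (so "upstream" means lower coordinates on the plus strand and higher coordinates on the minus strand), coordinates are assumed to be 1-based and inclusive, and all function names are hypothetical.

```python
UPSTREAM = 350    # nucleotides upstream of the TSS (value from the text)
DOWNSTREAM = 150  # nucleotides downstream of the TSS (value from the text)


def tss_window(tss, gene_strand):
    """Return the (start, end) promoter window around a TSS, oriented by strand."""
    if gene_strand == "+":
        return tss - UPSTREAM, tss + DOWNSTREAM
    return tss - DOWNSTREAM, tss + UPSTREAM


def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def is_pasrna(read_start, read_end, tss, gene_strand):
    """True if a mapped read overlaps the promoter window of the given TSS."""
    win_start, win_end = tss_window(tss, gene_strand)
    return overlaps(read_start, read_end, win_start, win_end)


def orientation(read_strand, feature_strand):
    """Classify a read as sense or antisense relative to an annotated feature."""
    return "sense" if read_strand == feature_strand else "antisense"
```

The same `orientation` helper would apply to any stranded annotation, so a read on the minus strand overlapping a plus-strand exon would be labelled an antisense easRNA under these assumptions.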

Annotation and identification of abundant rasRNAs

A major system that controls the activity of TEs in flies and vertebrates is mediated by Piwi-interacting RNAs (piRNAs), 24–30 nucleotide RNAs that are bound by Piwi-class effectors (51–53). Previously, these piRNAs were grouped together based on their genomic location as repeat-associated small interfering RNAs (rasiRNAs) (54–61). Recent studies have also shown that many small interfering RNAs (siRNAs) from TEs play important roles in plants, fungi, Drosophila and vertebrates (54–61). To annotate and identify these repeat-associated small RNAs (rasRNAs), all mapped reads were also intersected with RepeatMasker annotations (38,39). These mapped small RNAs-overlapping repeats were divided into sense and antisense rasRNAs. A total of ~3.0 million unique sense and ~3.0 million unique antisense rasRNAs were identified from repeat elements, producing the most comprehensive database for rasRNAs to date (Table 1).

RNA clusters and novel ncRNA discovery

When we finished the annotation and identification of nasRNAs, we found that a large number of highly abundant ncRNA-associated small RNAs often span part of and even the entire RNA locus. Thus, an analysis of genomic clustering can be used to identify novel ncRNAs, hunt for hidden transcripts and determine whether small RNAs and clusters are differentially expressed in the sampled tissues. To cluster these small RNAs, we grouped all the mapped reads into about 1.2 million RNA clusters according to their distance (details in ‘Materials and Methods’ section). These clusters ranged in size from 45 nt to thousands of nt. All RNA clusters were intersected with known ncRNAs, and 1684 and 8364 RNA clusters were found to overlap known miRNAs and ncRNAs, respectively (Supplementary Table S3). Moreover, we found that 285 530 RNA clusters overlapped with the evolutionarily conserved elements generated by the PhastCons programme (62) in five organisms (Supplementary Table S3). These data suggest the possibility that a large group of novel ncRNAs, and perhaps even a novel class of ncRNAs, may still be hiding in eukaryotic genomes. To test the hypothesis, we applied a modified miRDeep (5) and our snoSeeker programmes (30) to the deep sequencing data and these RNA clusters (details in ‘Materials and Methods’ section). We identified 1161 novel miRNA and 857 novel snoRNA candidates, in addition to 877 known miRNAs and 1032 known snoRNAs.

WEB INTERFACE

deepBase provides a variety of interfaces and graphical visualization to facilitate analysis of the massive and heterogeneous small RNA data sets from different tissues, cell lines and technology platforms. We have also developed a new visualization tool, deepView genome browser, to provide a quick overview of a particular region in the genome and for visually correlating various types of features (Figure 2, Supplementary Figures S1–S4). The deepView browser in deepBase provides an integrated view of mapped reads, known and predicted ncRNAs, protein-coding genes and RNA clusters and their expression peaks (Figure 2, Supplementary Figures S1–S4). Clicking a prediction or gene of interest launches a multiple-alignment trace viewer that displays all traces of genes or links to external resources such as NCBI, UCSC, miRBase and TAIR to obtain more comprehensive information. The libView browsers provide the graphical comparisons of multiple libraries for the distribution of length and 5′-terminal nucleotide of small RNAs (Supplementary Figure S5). We also provide the nasView graphical browser to facilitate the comparisons of multiple small RNA libraries of ncRNAs, including miRNAs, snoRNAs, tRNAs, rRNAs, snRNAs, scRNAs, Mt_tRNAs and misc_RNAs (Supplementary Figure S6). The expression profiles for ncRNAs are also provided to test for differential expression pattern among different tissues and cell lines (Supplementary Figure S7). For small RNAs derived from diverse RNAs, RNA clusters and predicted ncRNAs, the database provides the sequence, genomic location, RNA secondary structures, references and annotations.

Figure 2.
Snapshot of the deepView browser. (a) The controls directly underneath position the browser over a specific region in the genome. (b) RNA genes from Ensembl or the literature. (c) refSeq Gene. (d) microRNA gene from miRBase v13. (e) RNA clusters generated ...

deepBase provides a variety of search functions, including keyword function for searching small RNA, ncRNA and RNA cluster information, and a BLAST (41) function for performing searches against sets of small RNA sequences. The search results are linked to the full database records.

DISCUSSION AND CONCLUSIONS

By mapping and annotating ~66 million unique sequences derived from 185 small RNA libraries of diverse tissues and cell lines from seven organisms (Supplementary Table S1), we have provided a comprehensive integrated map of the diverse small RNAs, including miRNAs, piRNAs, endo-siRNAs, nasRNAs, pasRNAs, easRNAs and rasRNAs, in these genomes. In addition to recapitulating known small RNAs, we provide enhanced resolution and novel findings owing to the integration of the large number of small RNA libraries of diverse tissues and cell lines. Moreover, the ~1.2 million RNA clusters identified in this study have shown an extensive and complex transcriptional map in the seven genomes.

Our initial analysis of these RNA clusters reveals that (i) these clusters cover thousands of known ncRNAs and protein exons (Supplementary Table S3) and (ii) additional members of known ncRNA (miRNA and snoRNA) families were identified from deep sequencing data using miRDeep (5) and snoSeeker (30). However, the most intriguing result of our study is the numerous predicted RNA clusters that could not be assigned to known annotated RNAs. Some of these overlapped with the evolutionarily conserved phastCons elements (62), indicating their important functions. By contrast, many of these RNA clusters might not be functional, but rather ‘junk’ RNA generated as a by-product of cellular activities. To determine whether these RNA clusters are evidence of important new biochemical pathways, it will ultimately be necessary to test their function by new experimental or computational methods. Nevertheless, our findings indicate that future investigation of the RNA clusters is worthwhile for discovering novel ncRNAs and even novel ncRNA classes.

In comparison with other databases related to deep sequencing small RNA data sets, including FANTOM4 (29,63) and the Gene Expression Omnibus (GEO) Short Read Archive (SRA) (31), deepBase focuses on the mapping, annotation, mining and visualization of deep sequencing data from multiple technological platforms, tissues and cell lines of different organisms, and on customizing the analysis so that a variety of biological questions can be addressed. The GEO SRA mainly offers the submission, storage and retrieval of deep sequencing data (31), whereas FANTOM4 currently provides a genome browser for displaying its own data and contains deep sequencing data only from the human monocytic cell line THP-1 (29,63). Finally, the data and the integrative, interactive and versatile display provided by the deepBase database will aid future experimental and computational studies in the discovery of novel ncRNAs and transcriptomes.

FUTURE DIRECTIONS

Next-generation sequencing technologies have played a vital role in improving our understanding of functional genomics. As new genome builds and genome-wide high-throughput deep sequencing data from different species, cell lines, tissues and conditions become available, we will continuously maintain and update the database. The Automatic Mapping, Annotating and Mining Tools (AutoMAMT) in deepBase run on our high-performance computer servers. Indeed, we have updated deepBase for the human genome (hg19 version) using AutoMAMT. At present, deepBase has integrated an additional 52 small RNA libraries, which are annotated and mapped to the latest human assembly version (hg19). We will continue to extend the disk volume and improve the performance of our computer servers for storing the new sequencing data. Stand-alone graphical user interface (GUI) software tools (http://deepbase.sysu.edu.cn/deepTools.php) will be continuously released in deepBase. Bench biologists can use these stand-alone GUI tools to manipulate and analyze their own data or data downloaded from deepBase locally on personal computers. The integration of transcriptome datasets from the deepBase database with other deep sequencing research (1–3), such as genomic mRNA-Seq, methylC-Seq and ChIP-Seq, will contribute to functional annotation of the genome and to a deeper understanding of genomic and cellular dynamics and features.

AVAILABILITY

deepBase is freely available at http://deepbase.sysu.edu.cn/. The deepBase data files can be freely downloaded and used according to the GNU Public License.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

National Natural Science Foundation of China (No. 30830066, 30771151, 30900820); National Basic Research Program (No. 2005CB724600) from the Ministry of Science and Technology of China, the funds from the Ministry of Education of China and Guangdong Province (No. IRT0447, NSF05200303, 9451027501002591); China Postdoctoral Science Foundation (No. 4109898); Young Teacher Fund of Sun Yat-sen University (No. 3171917). Funding for open access charge: National Basic Research Program (No. 2005CB724600) from the Ministry of Science and Technology of China.

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

The authors are grateful to Dr Daniel Gautheret for his useful communications.

REFERENCES

1. Lister R, Gregory BD, Ecker JR. Next is now: new technologies for sequencing of genomes, transcriptomes, and beyond. Curr. Opin. Plant Biol. 2009;12:107–118. [PMC free article] [PubMed]
2. Mardis ER. Next-generation DNA sequencing methods. Annu. Rev. Genomics Hum. Genet. 2008;9:387–402. [PubMed]
3. Mardis ER. The impact of next-generation sequencing technology on genetics. Trends Genet. 2008;24:133–141. [PubMed]
4. Morin RD, O’Connor MD, Griffith M, Kuchenbauer F, Delaney A, Prabhu AL, Zhao Y, McDonald H, Zeng T, Hirst M, et al. Application of massively parallel sequencing to microRNA profiling and discovery in human embryonic stem cells. Genome Res. 2008;18:610–621. [PMC free article] [PubMed]
5. Friedlander MR, Chen W, Adamidi C, Maaskola J, Einspanier R, Knespel S, Rajewsky N. Discovering microRNAs from deep sequencing data using miRDeep. Nature Biotechnol. 2008;26:407–415. [PubMed]
6. Seila AC, Calabrese JM, Levine SS, Yeo GW, Rahl PB, Flynn RA, Young RA, Sharp PA. Divergent transcription from active promoters. Science. 2008;322:1849–1851. [PMC free article] [PubMed]
7. Lau NC, Seto AG, Kim J, Kuramochi-Miyagawa S, Nakano T, Bartel DP, Kingston RE. Characterization of the piRNA complex from rat testes. Science. 2006;313:363–367. [PubMed]
8. Babiarz JE, Ruby JG, Wang Y, Bartel DP, Blelloch R. Mouse ES cells express endogenous shRNAs, siRNAs, and other Microprocessor-independent, Dicer-dependent small RNAs. Genes Dev. 2008;22:2773–2785. [PMC free article] [PubMed]
9. Glazov EA, Cottee PA, Barris WC, Moore RJ, Dalrymple BP, Tizard ML. A microRNA catalog of the developing chicken embryo identified by a deep sequencing approach. Genome Res. 2008;18:957–964. [PMC free article] [PubMed]
10. Rathjen T, Pais H, Sweetman D, Moulton V, Munsterberg A, Dalmay T. High throughput sequencing of microRNAs in chicken somites. FEBS Lett. 2009;583:1422–1426. [PubMed]
11. Shi W, Hendrix D, Levine M, Haley B. A distinct class of small RNAs arises from pre-miRNA-proximal regions in a simple chordate. Nat. Struct. Mol. Biol. 2009;16:183–189. [PMC free article] [PubMed]
12. Ruby JG, Stark A, Johnston WK, Kellis M, Bartel DP, Lai EC. Evolution, biogenesis, expression, and target predictions of a substantially expanded set of Drosophila microRNAs. Genome Res. 2007;17:1850–1864. [PMC free article] [PubMed]
13. Kawamura Y, Saito K, Kin T, Ono Y, Asai K, Sunohara T, Okada TN, Siomi MC, Siomi H. Drosophila endogenous small RNAs bind to Argonaute 2 in somatic cells. Nature. 2008;453:793–797. [PubMed]
14. Chung WJ, Okamura K, Martin R, Lai EC. Endogenous RNA interference provides a somatic defense against Drosophila transposons. Curr. Biol. 2008;18:795–802. [PMC free article] [PubMed]
15. Czech B, Malone CD, Zhou R, Stark A, Schlingeheyde C, Dus M, Perrimon N, Kellis M, Wohlschlegel JA, Sachidanandam R, et al. An endogenous small interfering RNA pathway in Drosophila. Nature. 2008;453:798–802. [PMC free article] [PubMed]
16. Ruby JG, Jan C, Player C, Axtell MJ, Lee W, Nusbaum C, Ge H, Bartel DP. Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenous siRNAs in C. elegans. Cell. 2006;127:1193–1207. [PubMed]
17. Batista PJ, Ruby JG, Claycomb JM, Chiang R, Fahlgren N, Kasschau KD, Chaves DA, Gu W, Vasale JJ, Duan S, et al. PRG-1 and 21U-RNAs interact to form the piRNA complex required for fertility in C. elegans. Mol. Cell. 2008;31:67–78. [PMC free article] [PubMed]
18. Kato M, de Lencastre A, Pincus Z, Slack FJ. Dynamic expression of small non-coding RNAs, including novel microRNAs and piRNAs/21U-RNAs, during Caenorhabditis elegans development. Genome Biol. 2009;10:R54. [PMC free article] [PubMed]
19. Axtell MJ, Jan C, Rajagopalan R, Bartel DP. A two-hit trigger for siRNA biogenesis in plants. Cell. 2006;127:565–577. [PubMed]
20. Rajagopalan R, Vaucheret H, Trejo J, Bartel DP. A diverse and evolutionarily fluid set of microRNAs in Arabidopsis thaliana. Genes Dev. 2006;20:3407–3425. [PMC free article] [PubMed]
21. Kasschau KD, Fahlgren N, Chapman EJ, Sullivan CM, Cumbie JS, Givan SA, Carrington JC. Genome-wide profiling and analysis of Arabidopsis siRNAs. PLoS Biol. 2007;5:e57. [PMC free article] [PubMed]
22. Backman TW, Sullivan CM, Cumbie JS, Miller ZA, Chapman EJ, Fahlgren N, Givan SA, Carrington JC, Kasschau KD. Update of ASRP: the Arabidopsis small RNA Project database. Nucleic Acids Res. 2008;36:D982–D985. [PMC free article] [PubMed]
23. Ender C, Krek A, Friedlander MR, Beitzinger M, Weinmann L, Chen W, Pfeffer S, Rajewsky N, Meister G. A human snoRNA with microRNA-like functions. Mol. Cell. 2008;32:519–528. [PubMed]
24. Lee HC, Chang SS, Choudhary S, Aalto AP, Maiti M, Bamford DH, Liu Y. qiRNA is a new type of small interfering RNA induced by DNA damage. Nature. 2009;459:274–277. [PMC free article] [PubMed]
25. He Y, Vogelstein B, Velculescu VE, Papadopoulos N, Kinzler KW. The antisense transcriptomes of human cells. Science. 2008;322:1855–1857. [PMC free article] [PubMed]
26. Sultan M, Schulz MH, Richard H, Magen A, Klingenhoff A, Scherf M, Seifert M, Borodina T, Soldatov A, Parkhomchuk D, et al. A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science. 2008;321:956–960. [PubMed]
27. Core LJ, Waterfall JJ, Lis JT. Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science. 2008;322:1845–1848. [PMC free article] [PubMed]
28. Buratowski S. Transcription. Gene expression—where to start? Science. 2008;322:1804–1805. [PMC free article] [PubMed]
29. Taft RJ, Glazov EA, Cloonan N, Simons C, Stephen S, Faulkner GJ, Lassmann T, Forrest AR, Grimmond SM, Schroder K, et al. Tiny RNAs associated with transcription start sites in animals. Nature Genet. 2009;41:572–578. [PubMed]
30. Yang JH, Zhang XC, Huang ZP, Zhou H, Huang MB, Zhang S, Chen YQ, Qu LH. snoSeeker: an advanced computational package for screening of guide and orphan snoRNA genes in the human genome. Nucleic Acids Res. 2006;34:5112–5123. [PMC free article] [PubMed]
31. Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C, Kim IF, Soboleva A, Tomashevsky M, Edgar R. NCBI GEO: mining tens of millions of expression profiles—database and tools update. Nucleic Acids Res. 2007;35:D760–D765. [PMC free article] [PubMed]
32. Hubbard TJ, Aken BL, Ayling S, Ballester B, Beal K, Bragin E, Brent S, Chen Y, Clapham P, Clarke L, et al. Ensembl 2009. Nucleic Acids Res. 2009;37:D690–D697. [PMC free article] [PubMed]
33. Kuhn RM, Karolchik D, Zweig AS, Wang T, Smith KE, Rosenbloom KR, Rhead B, Raney BJ, Pohl A, Pheasant M, et al. The UCSC Genome Browser Database: update 2009. Nucleic Acids Res. 2009;37:D755–D761. [PMC free article] [PubMed]
34. Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ. miRBase: tools for microRNA genomics. Nucleic Acids Res. 2008;36:D154–D158. [PMC free article] [PubMed]
35. Lestrade L, Weber MJ. snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs. Nucleic Acids Res. 2006;34:D158–D162. [PMC free article] [PubMed]
36. Brown JW, Echeverria M, Qu LH, Lowe TM, Bachellerie JP, Huttenhofer A, Kastenmayer JP, Green PJ, Shaw P, Marshall DF. Plant snoRNA database. Nucleic Acids Res. 2003;31:432–435. [PMC free article] [PubMed]
37. Chan PP, Lowe TM. GtRNAdb: a database of transfer RNA genes detected in genomic sequence. Nucleic Acids Res. 2009;37:D93–D97. [PMC free article] [PubMed]
38. Jurka J. Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 2000;16:418–420. [PubMed]
39. Smit AFA, Hubley R, Green P. 1996–2007. [(2 November 2009, date last accessed)]. RepeatMasker Open-3.0. http://www.repeatmasker.org.
40. Swarbreck D, Wilks C, Lamesch P, Berardini TZ, Garcia-Hernandez M, Foerster H, Li D, Meyer T, Muller R, Ploetz L, et al. The Arabidopsis Information Resource (TAIR): gene structure and function annotation. Nucleic Acids Res. 2008;36:D1009–D1014. [PMC free article] [PubMed]
41. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. [PMC free article] [PubMed]
42. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25. [PMC free article] [PubMed]
43. Kawahara Y, Zinshteyn B, Sethupathy P, Iizasa H, Hatzigeorgiou AG, Nishikura K. Redirection of silencing targets by adenosine-to-inosine editing of miRNAs. Science. 2007;315:1137–1140. [PMC free article] [PubMed]
44. Reid JG, Nagaraja AK, Lynn FC, Drabek RB, Muzny DM, Shaw CA, Weiss MK, Naghavi AO, Khan M, Zhu H, et al. Mouse let-7 miRNA populations exhibit RNA editing that is constrained in the 5′-seed/cleavage/anchor regions and stabilize predicted mmu-let-7a: mRNA duplexes. Genome Res. 2008;18:1571–1581. [PMC free article] [PubMed]
45. Hofacker IL. Vienna RNA secondary structure server. Nucleic Acids Res. 2003;31:3429–3431. [PMC free article] [PubMed]
46. Gabel HW, Ruvkun G. The exonuclease ERI-1 has a conserved dual role in 5.8S rRNA processing and RNAi. Nat. Struct. Mol. Biol. 2008;15:531–533. [PMC free article] [PubMed]
47. Okamura K, Balla S, Martin R, Liu N, Lai EC. Two distinct mechanisms generate endogenous siRNAs from bidirectional transcription in Drosophila melanogaster. Nat. Struct. Mol. Biol. 2008;15:998. [PubMed]
48. Borsani O, Zhu J, Verslues PE, Sunkar R, Zhu JK. Endogenous siRNAs derived from a pair of natural cis-antisense transcripts regulate salt tolerance in Arabidopsis. Cell. 2005;123:1279–1291. [PMC free article] [PubMed]
49. Fejes-Toth K, Kapranov P, Foissac SK, Sotirova V, Sachidanandam R, Willingham AT, Duttagupta R, Dumais E, Hannon GJ, Gingeras TR. Post-transcriptional processing generates a diversity of 5′-modified long and short RNAs. Nature. 2009;457:1028–1032. [PMC free article] [PubMed]
50. Kapranov P, Cheng J, Dike S, Nix DA, Duttagupta R, Willingham AT, Stadler PF, Hertel J, Hackermuller J, Hofacker IL, et al. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science. 2007;316:1484–1488. [PubMed]
51. Klattenhoff C, Theurkauf W. Biogenesis and germline functions of piRNAs. Development. 2008;135:3–9. [PubMed]
52. Aravin AA, Hannon GJ, Brennecke J. The Piwi-piRNA pathway provides an adaptive defense in the transposon arms race. Science. 2007;318:761–764. [PubMed]
53. Lin H. piRNAs in the germ line. Science. 2007;316:397. [PubMed]
54. Llave C, Kasschau KD, Rector MA, Carrington JC. Endogenous and silencing-associated small RNAs in plants. Plant Cell. 2002;14:1605–1619. [PMC free article] [PubMed]
55. Reinhart BJ, Weinstein EG, Rhoades MW, Bartel B, Bartel DP. MicroRNAs in plants. Genes Dev. 2002;16:1616–1626. [PMC free article] [PubMed]
56. Aravin AA, Lagos-Quintana M, Yalcin A, Zavolan M, Marks D, Snyder B, Gaasterland T, Meyer J, Tuschl T. The small RNA profile during Drosophila melanogaster development. Dev. Cell. 2003;5:337–350. [PubMed]
57. Buhler M, Spies N, Bartel DP, Moazed D. TRAMP-mediated RNA surveillance prevents spurious entry of RNAs into the Schizosaccharomyces pombe siRNA pathway. Nat. Struct. Mol. Biol. 2008;15:1015–1023. [PMC free article] [PubMed]
58. Tam OH, Aravin AA, Stein P, Girard A, Murchison EP, Cheloufi S, Hodges E, Anger M, Sachidanandam R, Schultz RM, et al. Pseudogene-derived small interfering RNAs regulate gene expression in mouse oocytes. Nature. 2008;453:534–538. [PMC free article] [PubMed]
59. Watanabe T, Totoki Y, Toyoda A, Kaneda M, Kuramochi-Miyagawa S, Obata Y, Chiba H, Kohara Y, Kono T, Nakano T, et al. Endogenous siRNAs from naturally formed dsRNAs regulate transcripts in mouse oocytes. Nature. 2008;453:539–543. [PubMed]
60. Ghildiyal M, Seitz H, Horwich MD, Li C, Du T, Lee S, Xu J, Kittler EL, Zapp ML, Weng Z, et al. Endogenous siRNAs derived from transposons and mRNAs in Drosophila somatic cells. Science. 2008;320:1077–1081. [PMC free article] [PubMed]
61. Faulkner GJ, Kimura Y, Daub CO, Wani S, Plessy C, Irvine KM, Schroder K, Cloonan N, Steptoe AL, Lassmann T, et al. The regulated retrotransposon transcriptome of mammalian cells. Nature Genet. 2009;41:563–571. [PubMed]
62. Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005;15:1034–1050. [PMC free article] [PubMed]
63. Suzuki H, Forrest AR, van Nimwegen E, Daub CO, Balwierz PJ, Irvine KM, Lassmann T, Ravasi T, Hasegawa Y, de Hoon MJ, et al. The transcriptional network that controls growth arrest and differentiation in a human myeloid leukemia cell line. Nature Genet. 2009;41:553–562. [PubMed]

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press