Copyright © 2008 The Author(s)

Human and mouse introns are linked to the same processes and functions through each genome's most frequent non-conserved motifs

Bioinformatics and Pattern Discovery Group, IBM Thomas J. Watson Research Center, PO Box 218, Yorktown Heights, NY 10598, USA

*To whom correspondence should be addressed. Phone: +(914) 945 1384, Fax: +(914) 945 4217, Email: rigoutso/at/us.ibm.com

Received February 18, 2008; Accepted March 19, 2008.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

We identified the most frequent, variable-length DNA sequence motifs in the human and mouse genomes and sub-selected those with multiple recurrences in the intergenic and intronic regions and at least one additional exonic instance in the corresponding genome. We discovered that these motifs have virtually no overlap with intronic sequences that are conserved between human and mouse, and thus are genome-specific. Moreover, we found that these motifs span a substantial fraction of previously uncharacterized human and mouse intronic space. Surprisingly, we found that these genome-specific motifs are over-represented in the introns of genes belonging to the same biological processes and molecular functions in both the human and mouse genomes even though the underlying sequences are not conserved between the two genomes. In fact, the processes and functions that are linked to these genome-specific sequence-motifs are distinct from the processes and functions which are associated with intronic regions that are conserved between human and mouse. The findings show that intronic regions from different genomes are linked to the same processes and functions in the absence of underlying sequence conservation.
We highlight the ramifications of this observation with a concrete example that involves the microsatellite instability gene MLH1.

INTRODUCTION

Most searches for regulatory sequences have operated on the premise that functional motifs ought to be conserved across orthologous sequences (1–7). This cross-species conservation constraint has frequently proven to be a sufficient condition for the identification of regulatory regions. However, there is increasing evidence that it may not be a necessary one (8–12). Recently, the search for putative regulatory sequences has extended to introns: although introns were predicted to harbor regulatory signals (13,14), some of which were subsequently discovered (2,11,15–25), their true significance remains elusive. The renewed interest in introns has been fueled in part by the discovery of intronic microRNA precursors that may not always engage the canonical microRNA pathway (23,26).

Parallel work has begun to reveal a complex picture of the organization and functional richness of genomes. Arguably, the ENCODE project has provided the most recent major discoveries along these lines; in particular, it found that the human genome is pervasively transcribed in a complex manner (27–29). Following this and related work, it is now clear that eukaryotic genomes must contain more functional elements than previously estimated. Human genome regions can be classified into three broad categories with respect to the extent of their evolutionary conservation and their coding potential: (a) sequences that are under strong evolutionary constraint and represent ~5% of the human genome (18,22); (b) conserved non-exonic sequences that are more frequent than expected (30) but do not necessarily comprise functional elements (31); and (c) non-conserved, non-exonic sequences, a category with an unexpectedly high number of functional elements (29).
Adding to this complex landscape, several repeat-element fragments appear to undergo strong purifying selection and to have been exapted into functional elements (1,20,32–35). Moreover, conserved non-exonic regions of repetitive origin appear near developmental genes, suggesting that mobile elements may play a role in gene regulation (32), whereas a special class of fairly long stretches of DNA, termed ultraconserved elements, has been found to be exceptionally well conserved across several genomes (2).

In this paper, we present results from our exploration of intronic space in the human and mouse genomes. We analyzed human and mouse intronic sequences using ‘pyknons’ as our tool; pyknons are previously described sets of very frequent, genome-specific DNA sequence motifs that were shown to have a number of interesting and functionally meaningful properties (11,36). The analysis that follows shows that pyknons span a substantial fraction of previously uncharacterized intronic space and that, as a set, they are distinct from repeat elements. The key finding of the analysis presented below is that, in both the human and mouse genomes, these very frequent motifs are over-represented in the introns of genes belonging to the same set of biological processes and molecular functions, even though the underlying sequences are not conserved. Moreover, the intronic instances of these motifs are linked to processes and functions that do not overlap with those linked to intronic regions conserved between human and mouse. Finally, we show that a subset of the pyknons co-localizes extensively with human and mouse piRNAs (37,38) inside human and mouse intronic sequences, respectively. The presentation concludes with a discussion of the potential relevance of these findings in the disease context, through an analysis of the introns of the microsatellite instability gene MLH1.
Our results suggest that extended regions of human and mouse introns are involved in conserved functional links that do not depend on underlying sequence conservation.

METHODS

Data sources

We obtained human and mouse chromosomal sequences and genomic region coordinates for transcripts, exons, 5′UTRs, CDSs and 3′UTRs, as well as GO annotations, from ENSEMBL release 42. Human/mouse pairwise alignments and repeat regions corresponding to the same genome assembly version (NCBI36) were obtained with the help of the UCSC Genome Browser. The human and mouse piRNA sequences were obtained from the supplementary material of previous work (37,38).

Computing the pyknon sets

Pyknons were recomputed to reflect changes from ENSEMBL Release 39 to Release 42 (39). For this, we used the parallel version of a pattern discovery algorithm that we developed earlier (40). The input comprised the intergenic and intronic sequences of the human and of the mouse genome, respectively, but excluded intergenic and intronic segments that were the reverse complement of the 5′ untranslated, amino acid coding or 3′ untranslated regions of some human gene; more details can be found elsewhere (11). This exclusion ensures that the discovered patterns are not connected to the sequences of known genes, protein motifs or domains, or to the reverse complement of such sequences. The pattern discovery algorithm that we used requires the setting of three parameters: L, W and K. The parameter L controls the minimum possible length of the discovered patterns but has no bearing on their maximum length, which is not constrained in any way. The parameter W satisfies the inequality W ≥ L and controls the ‘degree of conservation’ across the various instances of the reported patterns: smaller (resp. larger) values of W tolerate fewer (resp. more) mismatches across the instances. Since we are interested only in patterns whose instances are identical, we set W = L (i.e.
the discovered patterns contained no ‘wildcards’). The parameter K sets the minimum number of appearances a pattern must have before the algorithm reports it. For a given choice of L, W and K, the algorithm is guaranteed to report all patterns that have K or more appearances in the processed input and in which any L consecutive (but not necessarily contiguous) positions span at most W positions. Human and mouse pyknons were computed using L = 16, W = 16 and K = 30. These values of L and K ensure statistical significance (11).

Computing region overlaps

For any given pair of region sets (for example, pyknon instances and repeat regions), we computed their overlap by counting the genomic positions that belong to both sets. All sequences were compared with one another in their 5′ → 3′ direction. As a preprocessing step, we converted each set of regions into a non-redundant set of non-overlapping sequences to avoid double-counting (e.g. different transcripts of the same gene, or genes that overlap). The probability of obtaining a given overlap by chance, given the sizes of the two sets, was computed using the hypergeometric distribution.

Analyzing gene GO terms

For each gene, we computed: (i) its intronic sequence, i.e. the union of the introns of all of its transcripts; (ii) its associated GO term set, i.e. the union of the GO term sets associated with its transcripts; and (iii) the concentration of pyknons, conserved elements and repeats in the gene's intronic sequence. We define the concentration of a given set of elements (conserved regions, repeats, pyknons, or combinations thereof) as the number of nucleotides of the gene's intronic sequence that are covered by that set, divided by the total length of the gene's intronic sequence.
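The overlap, hypergeometric and concentration computations described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes regions are given as half-open (start, end) integer intervals on a single chromosome, and all function names (merge_intervals, overlap, hypergeom_tail, concentration) are hypothetical. The hypergeometric tail treats the genome as `total` positions, `size_a` of which belong to the first set, and asks how likely it is that `size_b` randomly placed positions hit at least `k` of them.

```python
import math
from typing import List, Tuple

# Hypothetical representation: half-open [start, end) intervals on one chromosome.
Interval = Tuple[int, int]


def merge_intervals(regions: List[Interval]) -> List[Interval]:
    """Collapse overlapping/adjacent intervals so no position is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered(regions: List[Interval]) -> int:
    """Number of positions covered by an already-merged interval set."""
    return sum(end - start for start, end in regions)


def overlap(a: List[Interval], b: List[Interval]) -> int:
    """Positions shared by two merged interval sets (linear two-pointer sweep)."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def _log_comb(n: int, k: int) -> float:
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_tail(k: int, total: int, size_a: int, size_b: int) -> float:
    """P(X >= k), X ~ Hypergeometric(population=total, successes=size_a, draws=size_b)."""
    log_denom = _log_comb(total, size_b)
    p = 0.0
    for x in range(k, min(size_a, size_b) + 1):
        if size_b - x > total - size_a:
            continue  # impossible: more draws outside set A than positions exist
        p += math.exp(_log_comb(size_a, x) + _log_comb(total - size_a, size_b - x) - log_denom)
    return min(p, 1.0)


def concentration(elements: List[Interval], intron: List[Interval]) -> float:
    """Fraction of a gene's merged intronic sequence covered by a set of elements."""
    intron = merge_intervals(intron)
    size = covered(intron)
    return overlap(merge_intervals(elements), intron) / size if size else 0.0
```

Summing the tail term by term is only practical for small sets; at genome scale a library routine such as SciPy's `scipy.stats.hypergeom.sf(k - 1, M, n, N)` (with M = total, n = size_a, N = size_b) returns the same tail probability.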
We tested each GO term x separately for enrichment by comparing two distributions of concentrations: that of the genes whose GO term set contains x versus that of the genes whose set does not. First, we compared the two distributions with a t-test statistic, using Student's t-distribution as an approximation. In addition, we performed random permutations and found that the results agreed with those obtained from the t-test analysis. This analysis yielded the initial P-values; we then used global random permutations to determine the P-value cutoff that ensures a 5% false discovery rate. Finally, a stability test confirmed that the low P-values were not due to the presence of a few extreme values in the distribution of concentrations.

Locating piRNA instances in introns

To locate instances of piRNAs, or of their reverse complements, in intronic regions, we slid a window of length equal to each piRNA's length along these regions. For each placement of the window, the sequence of the piRNA (or of its reverse complement) was compared to the underlying substring, and a similarity was computed as the fraction of matching nucleotides. As reported in Supplementary Data 1 of (38), the pyrosequencing-derived piRNA sequences that were retained and reported contained at most two defects along the length of a read (an error of ≤~5%, i.e. a similarity of ≥~95%). Consequently, in our searches of intronic sequences, we permitted at most two nucleotide mismatches along the length of a piRNA. As a control, we generated a shuffled version of the intronic regions and sought piRNA instances therein using exactly the same criteria. The number of instances found in the shuffled intronic sequences provided an estimate of the expected number of false positives: at the 95% similarity threshold that we used, the false positive rate was less than 0.00005 for both human and mouse.
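A minimal sketch of the sliding-window piRNA search, assuming plain uppercase ACGT strings; the function names are hypothetical. Each window is compared against both the piRNA and its reverse complement, and a hit is reported when the mismatch count is within the budget (two, as in the text). The shuffle used for the control is an assumption: the text does not specify how the intronic regions were shuffled, so a simple mononucleotide shuffle is shown.

```python
import random
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_instances(pirna: str, region: str, max_mismatch: int = 2) -> List[Tuple[int, str]]:
    """Slide a len(pirna) window along region; return (offset, strand) hits,
    where strand '+' is the piRNA itself and '-' its reverse complement."""
    probes = {"+": pirna, "-": reverse_complement(pirna)}
    w = len(pirna)
    hits: List[Tuple[int, str]] = []
    for i in range(len(region) - w + 1):
        window = region[i:i + w]
        for strand, probe in probes.items():
            if mismatches(probe, window) <= max_mismatch:
                hits.append((i, strand))
    return hits


def shuffled_control(region: str, seed: int = 0) -> str:
    """Assumed control: mononucleotide shuffle preserving base composition."""
    rng = random.Random(seed)
    chars = list(region)
    rng.shuffle(chars)
    return "".join(chars)
```

A piRNA that is its own reverse complement would be reported on both strands at the same offset; counting hits in `shuffled_control(region)` with the same budget gives the false-positive estimate described above.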
RESULTS

We processed the sequences of the human and mouse genomes using the pyknon discovery methodology outlined previously (see Methods section as well as ref. (11)) and generated the corresponding pyknon sets. By definition, each pyknon is a recurrent motif whose sequence has a minimum length, a minimum number of intact copies in the intergenic and intronic regions of the genome, and at least one additional copy in an exonic region. The choices for minimum length (≥16 nucleotides) and minimum copy number (≥30 intact copies) ensure the pyknons’ statistical significance (11). It should be stressed that pyknons are discovered by processing a genome in isolation; consequently, their sequences are not necessarily conserved in other genomes or present in cross-species aligned sequences (11). The human and mouse pyknon sets contain 209 432 and 128 064 members, respectively. These pyknons are predominantly short (~16–17 nucleotides). Moreover, to the extent that this can be deduced using RNA folding programs, pyknons do not exhibit any characteristic secondary structure. The nucleotide composition of the pyknons is essentially identical to that of the entire genome (A = 30.4%, C = 20.0%, G = 20.6% and T = 28.8% for the pyknons versus A ≈ 29.5%, C ≈ 20.5%, G ≈ 20.5% and T ≈ 29.5% for the human genome). Finally, as reported earlier (11), a large fraction of pyknons (~75%) have at least 100 exact intergenic and/or intronic copies.

Intronic instances of pyknons are distinct from human-mouse conserved regions and from known repeat elements

To simplify the presentation, we introduce what we will refer to as the ‘intra-genomic conservation’ model. This genome-centered ‘conservation’ manifests itself in the form of sequence fragments that have multiple intact instances in the genome under consideration. These sequence fragments are assumed to have a minimum length and a minimum number of copies.

[Figure 1]
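To make the pyknon definition concrete, here is a deliberately simplified sketch. It is not the TEIRESIAS-based procedure of ref. (40): it counts exact fixed-length L-mers instead of discovering maximal variable-length patterns, counts overlapping occurrences, skips windows containing N, and does not exclude segments that are reverse complements of genic regions. The function names and toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def kmer_counts(seqs: Iterable[str], k: int) -> Counter:
    """Count every exact k-mer (overlapping occurrences included), skipping Ns."""
    counts: Counter = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def pyknon_candidates(noncoding: Iterable[str], exonic: Iterable[str],
                      L: int = 16, K: int = 30) -> Dict[str, int]:
    """Fixed-length stand-in for the pyknon criteria:
    >= K intact intergenic/intronic copies plus >= 1 exonic copy."""
    frequent = {m: c for m, c in kmer_counts(noncoding, L).items() if c >= K}
    in_exons = set()
    for s in exonic:
        for i in range(len(s) - L + 1):
            if s[i:i + L] in frequent:
                in_exons.add(s[i:i + L])
    return {m: c for m, c in frequent.items() if m in in_exons}
```

With the toy inputs `["AAAACCCC", "AAAAGGGG", "AAAATTTT"]` as non-coding sequence, `["TTAAAATT"]` as exonic sequence, `L=4` and `K=3`, only `AAAA` qualifies, with three non-coding copies.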
Do intra-genomically conserved pyknons arise in genomic neighborhoods that were not previously characterized? We addressed this question by measuring (i) the extent of overlap between pyknon instances and cross-species conserved regions and (ii) the extent of overlap between pyknon instances and repeat regions. Since the set of pyknons generally does not contain both a sequence and its reverse complement, we computed these overlaps by comparing all sequences in the 5′ → 3′ direction.

[Figure 2]
We next extended this decomposition analysis to the intergenic and exonic regions of human and mouse. The results are shown in Figure 3.

Do pyknons merely reflect genomic oddities, or are they somehow linked to specific biological processes and molecular functions? We examine this next. To ensure that our findings pertain to sequences that are transcribed, the rest of the analysis focuses solely on human and mouse introns.

Intronic instances of pyknons are linked to the same processes and functions in human and mouse even though the underlying sequences are genome-specific and thus not conserved

To determine potential associations with biological processes and molecular functions, we analyzed the GO terms (41) with which human and mouse genes are tagged. We labeled each gene's introns with the GO terms of the corresponding gene products and separately tested whether conserved regions, repeats and pyknon instances show a higher-than-random concentration in the introns of genes associated with certain GO terms (see Methods section). It is important to stress that, had we used actual coverage (i.e. the number of covered nucleotides) instead of concentration, longer introns would have been favored, and the GO analysis would simply have rediscovered the well-known fact that genes associated with certain GO terms (e.g. development) tend to have much longer transcripts. Indeed, in all the regions we analyzed for GO term enrichment, coverage values correlate strongly with gene length (correlation >0.90 in all cases), whereas concentration values do not (absolute value of the correlation <0.05 in all cases).

First, we explored the possibility of links between human-mouse conserved intronic regions and GO terms. In both human and mouse, we identified more than 500 GO terms (at different levels of the GO hierarchy) that are significantly enriched in intronic regions conserved between human and mouse.
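The per-GO-term comparison of concentration distributions (see Methods section) can be sketched as follows. Two assumptions should be flagged: the statistic shown is Welch's unequal-variance t, since the text says only that a t-test statistic was used, and the P-value is estimated by permuting gene labels rather than read from Student's t-distribution, which keeps the example within the Python standard library. The test is one-sided because the question is enrichment. Function names and toy concentrations are hypothetical.

```python
import random
import statistics
from typing import List, Sequence


def welch_t(x: Sequence[float], y: Sequence[float]) -> float:
    """Welch's t statistic for mean(x) - mean(y); each group needs >= 2 values."""
    se = (statistics.variance(x) / len(x) + statistics.variance(y) / len(y)) ** 0.5
    if se == 0:
        return 0.0
    return (statistics.mean(x) - statistics.mean(y)) / se


def enrichment_pvalue(with_term: Sequence[float], without_term: Sequence[float],
                      n_perm: int = 1000, seed: int = 0) -> float:
    """One-sided permutation P-value that genes annotated with a GO term
    have higher intronic concentrations than genes without it."""
    rng = random.Random(seed)
    observed = welch_t(with_term, without_term)
    pooled: List[float] = list(with_term) + list(without_term)
    n = len(with_term)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign gene labels at random
        if welch_t(pooled[:n], pooled[n:]) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

Because concentration is normalized by intron length, the inputs here are per-gene fractions; running the same test on raw coverage would mainly recover the gene-length effect described above.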
For clarity, Table 1A includes only those biological processes from the top three levels of the GO hierarchy that are enriched in conserved human and mouse introns (see Supplementary Data for the complete table). We define the similarity of two lists of GO terms as the percentage of GO terms in the shorter list that are common to both lists. By this measure, the full lists of significantly enriched GO terms from the human and mouse analyses are 83% similar (Table 2); this result is not surprising, since the conserved elements come from aligned orthologous sequences.
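The list-similarity measure defined above is an overlap coefficient expressed as a percentage. A minimal sketch, assuming duplicate terms within a list are collapsed; the function name is hypothetical and the GO identifiers in the usage line are placeholders, not results from this study.

```python
from typing import Iterable


def go_list_similarity(terms_a: Iterable[str], terms_b: Iterable[str]) -> float:
    """Percentage of the shorter list's GO terms that also occur in the other list."""
    a, b = set(terms_a), set(terms_b)
    shorter = a if len(a) <= len(b) else b
    if not shorter:
        return 0.0
    return 100.0 * len(a & b) / len(shorter)
```

For placeholder lists `["GO:a", "GO:b", "GO:c", "GO:d"]` and `["GO:b", "GO:c", "GO:d", "GO:e", "GO:f"]`, three of the four terms in the shorter list are shared, giving 75.0. Normalizing by the shorter list means the score is not penalized when one genome simply yields more enriched terms.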
Next, we repeated the analysis separately for the intronic instances of the human and mouse pyknons and identified more than 200 significantly enriched GO terms in each of the two genomes. Table 1B includes only the high-level biological processes (see Supplementary Data for the complete table). To ensure that the observed pyknon-related GO term enrichment is not due to the conserved elements (42) that co-localize with pyknons (0.7% and 0.4% overlap in human and mouse, respectively; see Figure 2) […]

The first notable result of our study stems from the comparison of the two complete lists (see Supplementary Data) of significantly enriched GO terms in human and mouse pyknons: we find that these lists are 75% similar (Table 2). If we consider only high-level biological processes, the human and mouse lists are identical; these shorter lists are the ones presented in Table 1B. What is particularly surprising here is that the same GO terms are enriched in human and mouse introns even though pyknons essentially do not lie inside intronic regions that are conserved between human and mouse. Note that this functional connection resulted from the analysis of the intronic regions of gene transcripts that contain instances of pyknon sequences; as such, it is orthogonal to the analogous exonic findings that we described earlier (11).

Cross-genome-conserved sequences and intra-genome-conserved pyknons respectively are linked to non-overlapping lists of processes and functions

The second notable result of our study arises from the comparison of the enriched GO terms associated with intronic ‘conserved’ regions and with intronic ‘pyknon’ instances, respectively: the overlap of the two lists of GO terms is very small, ranging from 0% to 4% (Table 2). This is remarkable because it suggests that distinct intronic sequences are linked to distinct regulatory networks in the human and mouse genomes.
Table 3 summarizes these results in the context of the classical inter-genomic conservation model (intronic sequences conserved between human and mouse) and of the intra-genomic conservation model (intronic space covered by organism-specific pyknon sequences) that we introduced above.
A subset of pyknons co-localizes extensively with piRNAs inside human and mouse introns

We note that Table 1B includes ‘meiosis’ as one of the cellular processes associated with the intronic instances of pyknons. Recently, a new class of short RNAs, the piRNAs, was found to accumulate at the onset of meiosis and was reported in three different organisms, namely human, mouse and rat (37,38,43). The distinct association of piRNAs with the meiotic step, together with the fact that some of the cloned mouse sequences were reported to map to introns (38), led us to investigate the possibility of a pyknon–piRNA connection in intronic sequences. After locating all the piRNA instances in introns, we calculated their overlap with conserved regions, repeats and pyknons, and expressed each overlap as an enrichment relative to what would be expected at random. We repeated the same analysis for the reverse complements of piRNAs, as well as for the intersection of piRNAs and their reverse complements, i.e. the intronic regions that are covered both by a piRNA and by the reverse complement of a (possibly different) piRNA. The results are summarized in Supplementary Figure 1A. Essentially, we find that piRNA instances, as well as the instances of their reverse complements, are depleted in conserved intronic regions, somewhat enriched in repeat elements present in introns, and highly enriched in intronic pyknon instances. Again, all sequence comparisons are done in the 5′→3′ direction. Supplementary Figure 1B shows the ‘recall’ percentages of piRNAs by conserved, repeat and pyknon elements. These recall values indicate that pyknon elements capture (i.e. describe) piRNAs much better than conserved regions and repeat elements do: the overlaps of the intronic piRNA instances with intronic pyknons and with intronic repeats have P-values of ~10⁻¹⁰ and ~10⁻³, respectively. Even though a large fraction of piRNAs co-localizes with pyknons, the converse is not true.
Indeed, only a small fraction of the intronic regions occupied by human and mouse pyknons co-localizes with piRNAs (9% and 12% for human and mouse, respectively, at ≥95% similarity). In other words, our pyknon collections contain many sequences that do not co-localize with piRNAs. Does this mean that pyknons merely correspond to piRNAs that have not yet been sequenced? Or do pyknons capture molecular classes beyond piRNAs? If we consider only pyknons that are not similar to the known piRNAs at the sequence level, most of the previously enriched GO terms survive, but ‘meiosis’ disappears from the list of significant GO terms. The latter result holds even when we permit a false discovery rate as high as 50%. Given that piRNAs have so far been found only during meiosis, and presumably play a role in this process, the fact that the subset of pyknons not associated with the known piRNAs is not linked to meiosis suggests that pyknon sequences capture piRNAs but also other, currently unidentified, categories of molecules.

Case study: pyknons, introns, piRNAs and the MLH1 gene

Arguably, the picture that is emerging from the above analysis is complicated. We highlight this observation with a concrete example that also shows the relevance of these results in the disease context. The complete list of GO terms (see Supplementary Data) that are significantly enriched in pyknon-containing regions of the human and mouse introns includes the terms ‘GO:0006281/DNA repair’ and ‘GO:0006298/mismatch repair’. We emphasize that these two terms are uniquely associated with pyknons, as shown in the Supplementary Data; thus, the results that we describe next are associated neither with conserved regions nor with known repeat elements.
A search of the ENSEMBL database (39) for human genes labeled with these two GO terms identifies a number of entries; among them is MLH1, a gene that has been associated with hereditary non-polyposis colorectal cancer and other types of carcinomas, microsatellite instability, etc. (44–47). The human MLH1 transcript has 17 introns, whereas its mouse orthologue has 18. Table 4 lists a few examples of human and mouse pyknons that are present in the introns of MLH1; as can be seen, their distribution and copy numbers across MLH1's introns are rather complex. Also shown is the total number of genomic copies of each listed pyknon; we provide this number only as a reference since, as we showed above, the pyknons are already over-represented in the introns of genes belonging to specific GO processes. We further examined whether the 10 human pyknons shown in Table 4 had any instances in the mouse introns of MLH1: we found no such cases, even when we allowed as many as 15% of the pyknon positions to be mismatched. Conversely, we examined whether the four mouse pyknons shown in Table 4 had any instances in the human introns of MLH1: again, we found no such cases, even when we allowed as many as 15% of the pyknon positions to be mismatched. In addition to their MLH1 instances, each of the shown pyknons has thousands of intact copies in other parts of the corresponding genome. It should also be pointed out that several of the listed pyknons form reverse-complement pairs (e.g. GTATTTTTAGTAGAGA and TCTCTACTAAAAATAC); the members of each pair generally have different copy numbers and appear either in the same intron or in different introns of MLH1. Finally, we note that 17 known human piRNAs and 23 known mouse piRNAs can be found intact, in either the sense or the antisense direction, in nine human introns and six mouse introns of MLH1, respectively. The sequences and the identity of these introns are shown in Table 5.
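Two checks in this case study are easy to make concrete: detecting reverse-complement pairs among listed pyknons, and searching for approximate instances when up to 15% of a pyknon's positions may be mismatched (for a 16-nt pyknon, 0.15 × 16 = 2.4, i.e. at most two mismatches). The sketch below is hypothetical; in particular, rounding the mismatch budget down and searching both strands are our assumptions.

```python
from typing import Iterable, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def reverse_complement_pairs(pyknons: Iterable[str]) -> List[Tuple[str, str]]:
    """Unordered pairs (p, rc(p)) in which both members occur in the input."""
    present = set(pyknons)
    return sorted((p, reverse_complement(p)) for p in present
                  if reverse_complement(p) in present and p < reverse_complement(p))


def mismatch_budget(length: int, fraction: float) -> int:
    """Largest mismatch count not exceeding `fraction` of the positions."""
    # Small epsilon guards against float error (e.g. 0.29 * 100 = 28.999...).
    return int(length * fraction + 1e-9)


def has_approx_instance(pattern: str, seq: str, fraction: float = 0.15) -> bool:
    """True if pattern or its reverse complement matches some window of seq
    with at most mismatch_budget(len(pattern), fraction) mismatches."""
    budget = mismatch_budget(len(pattern), fraction)
    w = len(pattern)
    for probe in (pattern, reverse_complement(pattern)):
        for i in range(len(seq) - w + 1):
            if sum(a != b for a, b in zip(probe, seq[i:i + w])) <= budget:
                return True
    return False
```

Applied to the example in the text, `reverse_complement("GTATTTTTAGTAGAGA")` yields `TCTCTACTAAAAATAC`, so the two pyknons are reported as a pair.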
DISCUSSION

Beginning in the early 1980s with the analysis of amino acid coding sequences (48), an argument was made in support of the hypothesis that ‘sequence conservation implies functional conservation’. The hypothesis was quickly extended to include non-coding sequences and has since been fueling the biological sequence analysis revolution (49). Underlying this hypothesis is an inter-genomic model of conservation according to which genomic regions with functional significance undergo negative selection. In contrast, recent work provided the first examples showing that the same type of functional information may exist in multiple genomes in the absence of discernible underlying sequence conservation (9,12). The analysis that we presented above proceeded along similar lines: using organism-specific pyknon sequences from the human and mouse genomes, we showed that functional conservation in the absence of sequence conservation is rather pronounced. Our results revealed an important role for what we defined as the ‘intra-genomic conservation’ model and led to the following surprising result: although pyknons are present in intronic regions that are not conserved between human and mouse, they nonetheless exhibit a preference for the introns of genes belonging to the same biological processes and molecular functions in these two genomes. Analogously, intronic regions that are conserved between human and mouse are also associated with specific biological processes and functions. Notably, however, these two sets of processes and functions have an insignificant overlap; this indicates that distinct intronic regions in human and mouse are associated with distinct biological processes and molecular functions, suggesting the involvement of introns in regulation. Our findings have intriguing implications for intronic evolution.
With respect to the conserved intronic regions that we examined, it is apparent that these can be traced back to a common ancestor of the human and mouse genomes. In contrast, the intronic regions that correspond to pyknon instances suggest a substantially more complicated situation: their extent and the high number of the pyknons’ genomic copies suggest that the same basic mechanism may be at work in both the human and the mouse genome. This presumed mechanism, operating on sequences that are not conserved between these two genomes, would have given rise to the currently extant collection of introns. It is interesting to note that this presumed mechanism appears to preferentially ‘target’ (actively or passively) the introns of genes that are linked to specific functions, giving rise to the entries of Table 1B (as opposed to those of Table 1A). It is not clear at the moment how this presumed mechanism, acting in an apparently independent manner in two distinct genomes, has managed to ‘delineate’ the pyknon sequences and to ‘arrange’ them inside the transcripts of genes in a manner that, for both the human and mouse genomes, favors destinations belonging to the same set of processes and functions. These functional links exist without conservation of the underlying sequence, and are in agreement with current thinking that sequence conservation is not a prerequisite for functional relevance (8). Evidence has been steadily accumulating in support of a functional significance for introns (50): a large fraction of the known microRNAs as well as snoRNAs originate in intronic space; mutations in intronic sequences have been linked to desirable phenotypes; ncRNAs with as yet uncharacterized regulatory roles have been found to originate in intronic space; etc. More recently, intronic sequences were linked to a putative regulatory mechanism for modulating the membrane properties and ion channel gradients of hippocampal neurons (51).
Such findings, together with the ones that we have presented above, support a much more active role for introns. This role is perhaps part of a much more pronounced RNA-driven layer of regulation, as conjectured earlier (52).

SUPPLEMENTARY DATA

At the website http://cbcsrv.watson.ibm.com/pyknons_introns.html the user can access the human and mouse pyknon sequences discussed above.

[Supplementary Data 1]
ACKNOWLEDGEMENTS

The authors would like to thank Tien Huynh for his assistance with some of the computations.

Funding to pay the Open Access publication charges for this article was provided by IBM Research.

Conflict of interest statement. None declared.

REFERENCES

1. Bejerano G, Lowe CB, Ahituv N, King B, Siepel A, Salama SR, Rubin EM, Kent WJ, Haussler D. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature. 2006;441:87–90.
2. Bejerano G, Pheasant M, Makunin I, Stephen S, Kent WJ, Mattick JS, Haussler D. Ultraconserved elements in the human genome. Science. 2004;304:1321–1325.
3. Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature. 2003;423:241–254.
4. Xie X, Lu J, Kulbokas EJ, Golub TR, Mootha V, Lindblad-Toh K, Lander ES, Kellis M. Systematic discovery of regulatory motifs in human promoters and 3′ UTRs by comparison of several mammals. Nature. 2005;434:338–345.
5. Mattick JS, Makunin IV. Non-coding RNA. Hum. Mol. Genet. 2006;15(Spec No 1):R17–R29.
6. Lai EC, Burks C, Posakony JW. The K box, a conserved 3′ UTR sequence motif, negatively regulates accumulation of enhancer of split complex transcripts. Development. 1998;125:4077–4088.
7. Dubchak I, Brudno M, Loots GG, Pachter L, Mayor C, Rubin EM, Frazer KA. Active conservation of noncoding sequences revealed by three-way species comparisons. Genome Res. 2000;10:1304–1306.
8. Pang KC, Frith MC, Mattick JS. Rapid evolution of noncoding RNAs: lack of conservation does not mean lack of function. Trends Genet. 2006;22:1–5.
9. Fisher S, Grice EA, Vinton RM, Bessling SL, McCallion AS. Conservation of RET regulatory function from human to zebrafish without sequence similarity. Science. 2006;312:276–279.
10. Miranda KC, Huynh T, Tay Y, Ang YS, Tam WL, Thomson AM, Lim B, Rigoutsos I. A pattern-based method for the identification of MicroRNA binding sites and their corresponding heteroduplexes. Cell. 2006;126:1203–1217.
11. Rigoutsos I, Huynh T, Miranda K, Tsirigos A, McHardy A, Platt D. Short blocks from the noncoding parts of the human genome have instances within nearly all known genes and relate to biological processes. Proc. Natl Acad. Sci. USA. 2006;103:6605–6610.
12. McGaughey DM, Vinton RM, Huynh J, Al-Saif A, Beer MA, McCallion AS. Metrics of sequence constraint overlook regulatory sequences in an exhaustive analysis at phox2b. Genome Res. 2008;18:252–260.
13. Mattick JS. Introns: evolution and function. Curr. Opin. Genet. Dev. 1994;4:823–831.
14. Mattick JS, Gagen MJ. The evolution of controlled multitasked gene networks: the role of introns and other noncoding RNAs in the development of complex organisms. Mol. Biol. Evol. 2001;18:1611–1630.
15. Simons C, Pheasant M, Makunin IV, Mattick JS. Transposon-free regions in mammalian genomes. Genome Res. 2006;16:164–172.
16. Pollard KS, Salama SR, Lambert N, Lambot MA, Coppens S, Pedersen JS, Katzman S, King B, Onodera C, Siepel A, et al. An RNA gene expressed during cortical development evolved rapidly in humans. Nature. 2006;443:167–172.
17. Sironi M, Menozzi G, Comi GP, Cagliani R, Bresolin N, Pozzoli U. Analysis of intronic conserved elements indicates that functional complexity might represent a major source of negative selection on non-coding sequences. Hum. Mol. Genet. 2005;14:2533–2546.
18. Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005;15:1034–1050.
19. Nakaya HI, Amaral PP, Louro R, Lopes A, Fachel AA, Moreira YB, El-Jundi TA, da Silva AM, Reis EM, Verjovski-Almeida S. Genome mapping and expression analyses of human intronic noncoding RNAs reveal tissue-specific patterns and enrichment in genes related to regulation of transcription. Genome Biol. 2007;8:R43.
20. Brosius J. RNAs from all categories generate retrosequences that may be exapted as novel genes or regulatory elements. Gene. 1999;238:115–134.
21. Dixon RJ, Eperon IC, Samani NJ. Complementary intron sequence motifs associated with human exon repetition: a role for intragenic, inter-transcript interactions in gene expression. Bioinformatics. 2007;23:150–155.
22. Margulies EH, Cooper GM, Asimenos G, Thomas DJ, Dewey CN, Siepel A, Birney E, Keefe D, Schwartz AS, Hou M, et al. Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. Genome Res. 2007;17:760–774.
23. Ruby JG, Jan CH, Bartel DP. Intronic microRNA precursors that bypass Drosha processing. Nature. 2007;448:83–86.
24. Sironi M, Menozzi G, Comi GP, Cagliani R, Bresolin N, Pozzoli U. Analysis of intronic conserved elements indicates that functional complexity might represent a major source of negative selection on non-coding sequences. Hum. Mol. Genet. 2005;14:2533–2546.
25. Storici F, Bebenek K, Kunkel TA, Gordenin DA, Resnick MA. RNA-templated DNA repair. Nature. 2007;447:338–341.
26. Bartel DP. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell. 2004;116:281–297.
27. Kapranov P, Cheng J, Dike S, Nix DA, Duttagupta R, Willingham AT, Stadler PF, Hertel J, Hackermueller J, Hofacker IL, et al. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science. 2007;316:1484–1488.
28. Kapranov P, Willingham AT, Gingeras TR. Genome-wide transcription and the implications for genomic organization. Nat. Rev. Genet. 2007;8:413–423.
29. ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007;447:799–816.
30. Dermitzakis ET, Reymond A, Antonarakis SE. Conserved non-genic sequences – an unexpected feature of mammalian genomes. Nat. Rev. Genet. 2005;6:151–157.
31. Nobrega MA, Zhu Y, Plajzer-Frick I, Afzal V, Rubin EM. Megabase deletions of gene deserts result in viable mice. Nature. 2004;431:988–993.
32. Lowe CB, Bejerano G, Haussler D. Thousands of human mobile element fragments undergo strong purifying selection near developmental genes. Proc. Natl Acad. Sci. USA. 2007;104:8005–8010.
33. Iwashita S, Osada N, Itoh T, Sezaki M, Oshima K, Hashimoto E, Kitagawa-Arita Y, Takahashi I, Masui T, Hashimoto K, et al. A transposable element-mediated gene divergence that directly produces a novel type bovine Bcnt protein including the endonuclease domain of RTE-1. Mol. Biol. Evol. 2003;20:1556–1563.
34. Lev-Maor G, Sorek R, Shomron N, Ast G. The birth of an alternatively spliced exon: 3′ splice-site selection in Alu exons. Science. 2003;300:1288–1291.
35. Gotea V, Makalowski W. Do transposable elements really contribute to proteomes? Trends Genet. 2006;22:260–267.
36. Meynert A, Birney E. Picking pyknons out of the human genome. Cell. 2006;125:836–838.
37. Aravin A, Gaidatzis D, Pfeffer S, Lagos-Quintana M, Landgraf P, Iovino N, Morris P, Brownstein MJ, Kuramochi-Miyagawa S, Nakano T, et al. A novel class of small RNAs bind to MILI protein in mouse testes. Nature. 2006;442:203–207.
38. Girard A, Sachidanandam R, Hannon GJ, Carmell MA. A germline-specific class of small RNAs binds mammalian Piwi proteins. Nature. 2006;442:199–202.
39. Stabenau A, McVicker G, Melsopp C, Proctor G, Clamp M, Birney E. The Ensembl core software libraries. Genome Res. 2004;14:929–933.
40. Rigoutsos I, Floratos A. Combinatorial pattern discovery in biological sequences: The TEIRESIAS algorithm. Bioinformatics. 1998;14:55–67.
41.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 2000;25:25–29. [PubMed] 42. Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, et al. Initial sequencing and comparative analysis of the mouse genome. Nature. 2002;420:520–562. [PubMed] 43. Lau NC, Seto AG, Kim J, Kuramochi-Miyagawa S, Nakano T, Bartel DP, Kingston RE. Characterization of the piRNA complex from rat testes. Science. 2006;313:363–367. [PubMed] 44. Bronner CE, Baker SM, Morrison PT, Warren G, Smith LG, Lescoe MK, Kane M, Earabino C, Lipford J, Lindblom A. Mutation in the DNA mismatch repair gene homologue hMLH1 is associated with hereditary non-polyposis colon cancer. Nature. 1994;368:258–261. [PubMed] 45. Nystroem-Lahti M, Kristo P, Nicolaides NC, Chang SY, Aaltonen LA, Moisio AL, Jaervinen HJ, Mecklin JP, Kinzler KW, Vogelstein B. Founding mutations and Alu-mediated recombination in hereditary colon cancer. Nat. Med. 1995;1:1203–1206. [PubMed] 46. Leung SY, Yuen ST, Chung LP, Chu KM, Chan AS, Ho JC. hMLH1 promoter methylation and lack of hMLH1 expression in sporadic gastric carcinomas with high-frequency microsatellite instability. Cancer Res. 1999;59:159–164. [PubMed] 47. Alazzouzi H, Domingo E, Gonzalez S, Blanco I, Armengol M, Espin E, Plaja A, Schwartz S, Capella G, Schwartz S., Jr. Low levels of microsatellite instability characterize MLH1 and MSH2 HNPCC carriers before tumor diagnosis. Hum. Mol. Genet. 2005;14:235–239. [PubMed] 48. Doolittle RF, Hunkapiller MW, Hood LE, Devare SG, Robbins KC, Aaronson SA, Antoniades HN. Simian sarcoma virus onc gene, v-sis, is derived from the gene (or genes) encoding a platelet-derived growth factor. Science. 1983;221:275–277. [PubMed] 49. Gusfield D. 
Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge (England); New York: Cambridge University Press; 1997. 50. Mattick JS, Makunin IV. Small regulatory RNAs in mammals. Hum. Mol. Genet. 2005;14(Spec No 1):R121–R132. [PubMed] 51. Bell TJ, Miyashiro KY, Sul JY, McCullough R, Buckley PT, Jochems J, Meaney DF, Haydon P, Cantor C, Parsons TD, et al. Cytoplasmic BK(Ca) channel intron-containing mRNAs contribute to the intrinsic excitability of hippocampal neurons. Proc. Natl Acad. Sci. USA. 2008;105:1901–1906. [PubMed] 52. Mattick JS. A new paradigm for developmental biology. J. Exp. Biol. 2007;210:1526–1547. [PubMed] |