Proc Natl Acad Sci U S A. Apr 25, 2006; 103(17): 6605–6610.
Published online Apr 24, 2006. doi:  10.1073/pnas.0601688103
PMCID: PMC1447521
Genetics

Short blocks from the noncoding parts of the human genome have instances within nearly all known genes and relate to biological processes

Abstract

Using an unsupervised pattern-discovery method, we processed the human intergenic and intronic regions and catalogued all variable-length patterns with identically conserved copies and multiplicities above what is expected by chance. Among the millions of discovered patterns, we found a subset of 127,998 patterns, termed pyknons, which have additional nonoverlapping instances in the untranslated and protein-coding regions of 30,675 transcripts from 20,059 human genes. The pyknons arrange combinatorially in the untranslated and coding regions of numerous human genes where they form mosaics. Consecutive instances of pyknons in these regions show a strong bias in their relative placement, favoring distances of ≈22 nucleotides. We also found pyknons to be enriched in a statistically significant manner in genes involved in specific processes, e.g., cell communication, transcription, regulation of transcription, signaling, transport, etc. For ≈1/3 of the pyknons, the intergenic/intronic instances of their reverse complement lie within 380,084 nonoverlapping regions, typically 60–80 nucleotides long, which are predicted to form double-stranded, energetically stable, hairpin-shaped RNA secondary structures; additionally, the pyknons subsume ≈40% of the known microRNA sequences, thus suggesting a possible link with posttranscriptional gene silencing and RNA interference. Cross-genome comparisons reveal that many of the pyknons have instances in the 3′ UTRs of genes from other vertebrates and invertebrates where they are overrepresented in similar biological processes, as in the human genome. These unexpected findings suggest potential unique functional connections between the coding and noncoding parts of the human genome.

Keywords: junk DNA, pattern discovery, posttranscriptional gene silencing, pyknons, RNA interference

The intergenic and intronic regions comprise most of the genomic sequence of higher organisms. Even though recent work suggested their participation in a regulatory role (1, 2), the true function of these regions remains largely elusive. The search for conserved motifs, presumed to be regulatory and control signals, upstream of the 5′ UTRs of genes has been the focus of research activities for many years (3–7).

Recently, researchers began studying the 3′ UTRs of genes, where they discovered functionally significant conserved regions, in direct analogy to the cis-motifs of promoter regions (8). Comparative analyses permitted the study of conservation in the vicinity of genes and elsewhere in the genome (9–13) but were carried out on only a handful of organisms at a time because of the magnitude of the involved computations (14–17).

The analysis of 3′ UTRs intensified after they were discovered to contain binding sites that are targeted by short interfering RNAs and result in the posttranscriptional control of the corresponding gene's expression through either mRNA degradation or translational inhibition (18–27). Accumulating evidence that noncoding RNAs control developmental and physiological processes (28–32) and that a considerable part of the human genome is transcribed (33) led researchers to identify functional elements (34) in areas of the genome that are not associated with protein-coding regions.

Here, we examine whether highly specific patterns exist within a single genome that may act as targets or sources for putative regulatory activity or as a “vocabulary” for as yet undiscovered mechanisms. Our analysis represents a substantial point of departure from previous efforts. First, we carry out all of the analysis on a single genome. Second, we seek patterns in the intergenic and intronic regions of the genome (not the UTRs or protein coding regions). Third, our patterns transcend chromosomal boundaries. And fourth, we rely on the unsupervised discovery of recurrent variable-length sequence fragments instead of using searching schemes. We discovered >66 million motifs with multiplicities well above what is expected by chance. A sizeable subset of these motifs, referred to as the pyknons, have one or more additional instances in the UTRs and coding regions (CRs) of almost all known human genes and exhibit properties that suggest a possibly extensive link between the genome's nongenic and genic regions and a connection with posttranscriptional gene silencing (PTGS) and RNA interference (RNAi).

Results

Pattern-Discovery Step.

Using a version of a pattern-discovery algorithm we developed earlier (35), modified to handle very large inputs, we sought variable-length motifs that are identically conserved across all of their instances, comprise at least L = 16 nucleotides, and appear at least K = 40 times in the processed input (see Supporting Text, which is published as supporting information on the PNAS web site, regarding the values of L and K). The algorithm guarantees the reporting of all composition-maximal and length-maximal patterns satisfying these parameters (see Supporting Text). The input comprised the intergenic and intronic sequences of the human genome from ENSEMBL Rel. 31 (36) and totaled 6,039,720,050 nucleotides. The input did not include the 5′ UTRs, amino acid coding regions, or 3′ UTRs of any human genes, nor their reverse complements. This exclusion ensures that any discovered patterns are not connected to the sequences of known genes, protein motifs, or domains (see Supporting Text for details). This step generated an initial set Pinit of more than 66 million variable-length, statistically significant patterns (see Methods). The Supporting Text contains information on the properties of Pinit's entries.
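The two thresholds can be illustrated with a toy counter. The sketch below is not the authors' algorithm (35), which reports composition-maximal and length-maximal variable-length patterns; it is a simplified stand-in, assumed for illustration, that counts fixed-length L-mers on the given strand only and keeps those seen at least K times. The function name `frequent_kmers` is hypothetical, and this approach would not scale to billions of nucleotides.

```python
from collections import Counter

def frequent_kmers(sequences, L=16, K=40):
    """Toy stand-in (hypothetical): return every length-L substring that occurs
    at least K times across the given sequences. Overlapping occurrences are
    counted, windows containing 'N' are skipped, and only the given strand is scanned."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - L + 1):
            kmer = seq[i:i + L]
            if "N" not in kmer:
                counts[kmer] += 1
    return {kmer: n for kmer, n in counts.items() if n >= K}
```

A true maximal-pattern miner would additionally extend each surviving seed to the longest string that keeps the same set of occurrences, which this sketch does not attempt.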

Notation/Convention.

We will use CRs to refer to the translated, amino acid coding part of exons and also associate the colors blue, red, and yellow with 5′ UTRs, CRs, and 3′ UTRs, respectively.

Determining Which of the Discovered Patterns Have Additional Instances in the 5′ UTRs, CRs, or 3′ UTRs of Known Genes.

We considered the members of Pinit in order of decreasing value of the product (length-of-pattern × copy-number-of-pattern), ensuring that longer and more frequent patterns are considered before shorter and less frequent ones. We kept a pattern p only if none of its untranslated/CR instances collided with a previously kept pattern (see Supporting Text). After filtering the kept patterns for low complexity with nseg (37), we generated three pattern sets P5′utr, Pcr, and P3′utr that contained 12,267, 54,396, and 67,544 patterns with one or more instances in 5′ UTRs, CRs, and 3′ UTRs, respectively. P5′utr ∪ Pcr ∪ P3′utr contained 127,998 patterns; because the three set sizes sum to 134,207, the three pattern sets are largely disjoint. We refer to these 127,998 patterns as pyknons. See Supporting Text for information on the sets P5′utr, Pcr, and P3′utr.
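The greedy ordering and collision check can be sketched as follows. The exact collision rule is given in the Supporting Text; this sketch assumes, for illustration, that a "collision" means any overlap between a candidate's genic instance and a genic instance already claimed by a kept pattern in the same transcript. The function name and the `(pattern, copy_number, genic_instances)` input layout are hypothetical.

```python
def select_pyknons(candidates):
    """Hypothetical greedy filter.
    candidates: list of (pattern, copy_number, genic_instances), where genic_instances
    is a list of (transcript_id, start) positions of the pattern in UTRs/CRs.
    Patterns are visited in decreasing order of len(pattern) * copy_number; a pattern
    is kept only if none of its genic instances overlaps one already claimed."""
    occupied = {}  # transcript_id -> list of claimed half-open intervals [start, end)
    kept = []
    ordered = sorted(candidates, key=lambda c: len(c[0]) * c[1], reverse=True)
    for pattern, _copies, instances in ordered:
        if not instances:  # pyknons must have at least one genic instance
            continue
        spans = [(tid, s, s + len(pattern)) for tid, s in instances]
        collides = any(
            s < e2 and s2 < e  # half-open interval overlap test
            for tid, s, e in spans
            for s2, e2 in occupied.get(tid, [])
        )
        if collides:
            continue
        kept.append(pattern)
        for tid, s, e in spans:
            occupied.setdefault(tid, []).append((s, e))
    return kept
```

Visiting long, frequent patterns first means a shorter pattern that lies inside an already kept one is rejected rather than double-counted.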

The pyknons exhibit a number of properties that connect the nongenic and genic regions of the human genome in unexpected ways, as discussed below.

The pyknons have one or more instances within nearly all known genes.

The 127,998 pyknons that we originally discovered in the human intergenic and intronic regions have an additional 226,874 nonoverlapping copies in the 5′ UTRs, CRs, or 3′ UTRs of 20,059 genes (30,675 transcripts). That is, >90% of all human genes contain one or more pyknon instances. The pyknons in P5′utr cover 3.82% of the 6,947,437 nucleotides in human 5′ UTRs; the pyknons in Pcr cover 3.04% of the 50,737,024 nucleotides in human CRs; and the pyknons in P3′utr cover 7.33% of the 25,597,040 nucleotides in human 3′ UTRs.

The pyknons arrange combinatorially in many human 5′ UTRs, CRs, and 3′ UTRs, forming mosaics.

The number of pyknon instances in human transcripts is skewed (see Supporting Text). More than 16,000 transcripts contain at least 4, whereas ≈2,200 transcripts contain 20 or more pyknon instances in their UTRs and CRs. In those cases where we find many pyknons, they arrange combinatorially and form mosaics. Fig. 1 shows an example of such a combinatorial arrangement in the 3′ UTRs of birc4 (an apoptosis inhibitor) and nine other human genes. The 3′ UTR of birc4 contains 100 instances of 95 distinct pyknons; of these, 22 are also present in the 3′ UTRs of the other nine genes shown. One or more instances of the 95 pyknons from birc4's 3′ UTR exist in the 3′ UTRs of 2,306 transcripts (data not shown). The Supporting Text includes examples of similar combinatorial arrangements of pyknons in the 5′ UTRs and CRs of known genes. Recall that we initially discovered the pyknons in an input that included neither transcribed gene-related sequences nor their reverse complement.

Fig. 1.
Pyknons in the 3′ UTRs of the apoptosis inhibitor birc4 (shown above the horizontal line) and nine other genes. The sequences below the line contain some of birc4's pyknons, but in different arrangements; they also contain instances of other pyknons ...

The pyknons account for 1/6 of the human intergenic and intronic regions.

The intergenic and intronic copies of the pyknons span 692,393,548 positions on the forward and reverse strands. For those pyknons whose reverse complements are not already in the list of 127,998 pyknons, their Watson-strand instances impose constraints on their Crick-strand instances. Taking this observation into account and recalculating shows that pyknons and their reverse complement cover 898,424,004 positions or ≈1/6 of the human intergenic/intronic regions.
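Coverage figures such as those above count each genomic position once, even if several pattern instances cover it. A minimal sketch of that bookkeeping, assuming instances are given as half-open intervals on a single sequence (the function name is hypothetical), merges the sorted intervals and sums the merged lengths:

```python
def covered_fraction(intervals, region_length):
    """Fraction of a region's positions covered by at least one half-open
    interval [start, end); overlapping or touching intervals are merged first."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start  # close the previous merged block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # extend the current merged block
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / region_length
```

Multiplying the result by 10,000 gives a "covered positions per 10,000 nucleotides" figure of the kind used to compare genomes of different sizes.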

The pyknons are nonredundant.

We clustered the pyknons using a blastn-based scheme (38). Because our collection includes pyknon pairs whose members are the reverse complement of one another, we had to ensure that the clustering scheme did not overcount: when comparing sequences A and B, we examined for redundancy both the pair (A,B) and the pair (reverse-complement-of-A,B). Clustering at sequence-similarity thresholds of X = 70%, 80%, and 90%, we obtained 32,621, 44,417, and 89,159 clusters, respectively (see Supporting Text for details). The high numbers of surviving clusters suggest that the pyknons are largely distinct.
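The strand-aware redundancy test can be sketched as below. Here a simple ungapped identity over equal-length sequences is swapped in for blastn's alignment-based similarity, and the single-pass greedy clustering is likewise an assumption; all function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def identity(a, b):
    """Fraction of matching positions for equal-length, ungapped sequences
    (a hypothetical stand-in for blastn-derived similarity)."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

def redundant(a, b, threshold):
    """Redundant if either (A, B) or (reverse-complement-of-A, B) meets the threshold."""
    return max(identity(a, b), identity(reverse_complement(a), b)) >= threshold

def greedy_cluster(seqs, threshold):
    """Each sequence joins the first cluster whose representative it is redundant with."""
    reps, clusters = [], []
    for s in seqs:
        for i, r in enumerate(reps):
            if redundant(s, r, threshold):
                clusters[i].append(s)
                break
        else:
            reps.append(s)
            clusters.append([s])
    return clusters
```

Without the reverse-complement comparison, a pyknon and its reverse complement would land in separate clusters and inflate the cluster count.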

On pyknons and repeat elements.

One thousand two hundred ninety-two pyknons (1.0%) have instances occurring exclusively inside repeat elements, as determined with RepeatMasker (Smit, A. & Green, P. RepeatMasker: http://ftp.genome.washington.edu). Seventy-nine pyknons have instances exclusively in repeat-free regions. The remaining 126,627 pyknons (98.9% of total) have instances both inside repeat elements and in repeat-free regions. See Supporting Text for details.

The pyknons are distinct from the “ultraconserved elements.”

Fifty-two pyknons have instances in 46 of the 481 ultraconserved elements (9) and cover 0.67% of the 126,007 positions: uc.73+ contains four pyknons; uc.23+, uc.66+, uc.143+, and uc.414+ each contain two pyknons; the remaining 41 elements contain a single pyknon each.

The pyknons are associated with specific biological processes.

For 663 Gene Ontology (GO) terms (39) describing biological processes at varying levels of detail, we found that the corresponding genes had either a significant enrichment or a significant depletion in pyknon instances; Table 1 shows a partial list of GO terms that are enriched or depleted in pyknons. The full list appears in Table 4, which is published as supporting information on the PNAS web site.
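The statistical test behind the enrichment/depletion calls is not described in this section. Purely as an illustration of how such a call can be made, the sketch below assumes a gene-level one-sided hypergeometric test: N genes in total, K of them containing pyknon instances, n genes annotated with a GO term, and k of those containing pyknons. The function names and the gene-level framing are assumptions, not the paper's procedure.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws):
    a one-sided enrichment p-value."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / total

def hypergeom_cdf(k, N, K, n):
    """P(X <= k): a one-sided depletion p-value."""
    total = comb(N, n)
    lower = max(0, n - (N - K))
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(lower, k + 1)) / total
```

In practice, testing hundreds of GO terms would also call for a multiple-testing correction.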

Table 1.
Partial list of biological processes whose corresponding genes show significant enrichment (green cells) or depletion (red cells) in pyknon instances in their 5′ UTR, CR, or 3′ UTR

The relative positioning of pyknons in 5′ UTRs, CRs, and 3′ UTRs is strongly biased, but consecutive pyknon instances are not correlated.

We examined the distances between the starting points of consecutive pyknons, separately for the 5′ UTRs, CRs, and 3′ UTRs: Fig. 2 shows the resulting probability density functions. The curves have similar shapes, pronounced peaks at abscissas 18 and 22, and a preference for distances between 18 and 31 nucleotides, suggesting a tight packing of pyknons in these regions that favors these distances. We considered the possibility that the pyknon instances are fragments of larger regions that are conserved in genic and nongenic regions. Let b be a pyknon instance in a 5′ UTR, CR, or 3′ UTR, and let us assume that, unknown to us, b is part of a larger conserved unit B. Then B will span an area larger than the one delineated by b, and there will be length(B) − length(b) + 1 strings of length length(b) within B (b itself included) that would have as many identically conserved intergenic and intronic copies as b. We checked this in 3′ UTRs by taking each instance of a pyknon in P3′utr, shifting it by +d (respectively −d), generating a new string, and locating the new string's instances in the human intergenic and intronic regions. Had the pyknons been part of larger conserved units, then for some values of d, the number of intergenic and intronic copies of the newly formed strings would have remained identical to those of the starting strings. On the other hand, if the pyknons were not part of larger units, then the shifted strings would straddle the original strings' "natural boundaries," and the number of their intergenic/intronic copies would change drastically. See Supporting Text for the results for pyknons in 3′ UTRs and separately for the intergenic and intronic regions; the curves for d = 0 correspond to the pyknons in P3′utr. Note that, even for a shift of d = +2, the derived new strings have strikingly fewer intergenic and intronic copies than the pyknons in P3′utr. We obtained similar results for negative values of d (data not shown).
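The shift test can be sketched on toy strings. This is a minimal illustration assuming exact string matching on a single background string; the function names are hypothetical, and a genome-scale version would use an index rather than repeated `str.find` scans.

```python
def count_occurrences(text, pattern):
    """Count possibly overlapping exact occurrences of pattern in text."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count

def shifted_copy_counts(utr, start, length, background, shifts):
    """For the instance utr[start:start+length], take the same-length window
    shifted by d and count its exact copies in the background sequence.
    Shifts that run past either end of the UTR are skipped."""
    counts = {}
    for d in shifts:
        s = start + d
        if s < 0 or s + length > len(utr):
            continue
        counts[d] = count_occurrences(background, utr[s:s + length])
    return counts
```

If the instance were part of a longer conserved unit, some nonzero shifts would keep the same copy number as d = 0; a sharp drop for every d ≠ 0 is the pattern the text reports.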

Fig. 2.
Probability density functions for the distance between the starting points of consecutive instances of pyknons, shown separately for 5′ UTRs, CRs, and 3′ UTRs. The distributions have long tails, and only a portion is shown. Note the peaks ...
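The distributions in Fig. 2 are built from start-to-start distances between consecutive instances within each transcript. A minimal sketch, assuming instance start positions are already grouped by transcript for one region type (the function name is hypothetical), returns the normalized histogram, i.e., the empirical distribution over integer distances:

```python
from collections import Counter

def consecutive_distance_pdf(starts_by_transcript):
    """starts_by_transcript: dict mapping transcript id -> list of pyknon instance
    start positions within one region type (5' UTR, CR, or 3' UTR).
    Returns {distance: probability} over start-to-start distances of consecutive instances."""
    distances = Counter()
    for starts in starts_by_transcript.values():
        ordered = sorted(starts)
        for a, b in zip(ordered, ordered[1:]):
            distances[b - a] += 1
    total = sum(distances.values())
    return {d: n / total for d, n in sorted(distances.items())} if total else {}
```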

The pyknons are possibly linked to PTGS.

The most conspicuous feature of Fig. 2 is the preference for distances typically encountered in the context of PTGS. Recall that each of the 127,998 pyknons has one or more instances in the untranslated and coding regions of human genes. For each pyknon, we generated its reverse complement β, identified all of β's intergenic and intronic instances, and, using the Vienna package (40), predicted the RNA structure and folding energy of the immediately surrounding neighborhoods. We discarded structures that were predicted to self-hybridize locally or whose predicted folding energies were >−30 kcal/mol (1 kcal = 4.18 kJ). We also discarded structures that contained either a single large bulge or many unmatched bases. Each of the surviving regions was predicted to fold into a hairpin-shaped RNA structure that had a straightforward arm–loop–arm architecture, contained very small bulges, if any, and was energetically very stable. The analysis identified 380,084 nonoverlapping regions predicted to form hairpin-shaped structures (298,197 in intergenic and 81,887 in intronic sequences). These 380,084 regions contained instances of the reverse complement of 37,421 pyknons (29.24% of total). Most of these regions are between 60 and 80 nucleotides long. See Supporting Text for the density of the surviving regions per 10,000 nucleotides on each chromosome. Recall that the typical pyknon length is similar to that of a microRNA and that there is a straightforward sense–antisense relationship between segments of the 380,084 hairpins and the pyknon instances in human 5′ UTRs/CRs/3′ UTRs. We also note that the 81,887 hairpins that originate in introns account for 21,727 of the 37,421 hairpin-linked pyknons and lie within transcribed regions. If pyknons are, indeed, connected to PTGS, then Fig. 2 suggests that (i) in addition to 3′ UTRs, PTGS is likely effected through the 5′ UTRs and amino acid coding regions, and (ii) RNAi products in animals likely fall into distinct categories with preferences for lengths of 18, 22, 24, 26, 29, 30, and 31 nucleotides.
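The hairpin criteria can be expressed as a filter over a predicted structure in dot-bracket notation plus its minimum free energy, as a folding tool such as RNAfold from the Vienna package reports. The sketch below does not call the Vienna package; it assumes the structure and energy were obtained beforehand. The 60–80 nt window and the −30 kcal/mol cutoff come from the text, while the rule "at most four unpaired bases inside the stem" is a hypothetical stand-in for "very small bulges, if any".

```python
def is_simple_hairpin(structure, energy, min_len=60, max_len=80,
                      max_energy=-30.0, max_unpaired_in_stem=4):
    """Accept a dot-bracket structure if it is a single arm-loop-arm hairpin:
    length in [min_len, max_len], energy <= max_energy (kcal/mol), every '('
    before every ')', balanced brackets, and few unpaired bases inside the stem.
    max_unpaired_in_stem is a hypothetical threshold."""
    if not (min_len <= len(structure) <= max_len) or energy > max_energy:
        return False
    if "(" not in structure or structure.count("(") != structure.count(")"):
        return False
    first_close, last_open = structure.index(")"), structure.rindex("(")
    if last_open > first_close:  # a '(' after a ')' means more than one stem
        return False
    left_arm = structure[structure.index("("):last_open + 1]
    right_arm = structure[first_close:structure.rindex(")") + 1]
    return left_arm.count(".") + right_arm.count(".") <= max_unpaired_in_stem
```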

The pyknons relate to known microRNAs.

We formed the union of the RNA family database Rfam (34) and the pyknon collection and clustered it with a blastn-based scheme, using a pairwise sequence-similarity threshold of 70% (i.e., up to six mismatches over 22 nucleotides). When comparing two sequences A and B, we examined for redundancy the pairs (A,B) and (reverse-complement-of-A,B). In total, 1,087 known microRNAs clustered with 689 pyknons across 279 of the 32,994 resulting clusters. See also Supporting Text.

The pyknons relate to recently discovered 3′ UTR motifs.

We compared the pyknons in P3′utr to the 72 8-mer motifs that were recently reported to be conserved in human, mouse, rat, and dog 3′ UTRs (32). We say that one of these 8-mer motifs coincides with a pyknon of length ℓ if one of the following conditions holds: the 8-mer motif agrees with letters ℓ−7 through ℓ of the pyknon ("type 0" agreement); the 8-mer motif agrees with letters ℓ−8 through ℓ−1 ("type 1" agreement); or the 8-mer motif agrees with letters ℓ−9 through ℓ−2 ("type 2" agreement). Of the 72 reported conserved 8-mer motifs, 39 were in type 0 agreement, 10 in type 1 agreement, and 7 in type 2 agreement with one or more pyknons from P3′utr. The remaining 16 8-mer motifs did not match any of the pyknons in P3′utr. In summary, the pyknons that we derived by intragenomic analysis overlap with 56 of the 72 motifs that were discovered through cross-species comparisons.
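The three agreement types translate directly into suffix-aligned slices. In the sketch below (function names hypothetical), the 1-based letters ℓ−7..ℓ, ℓ−8..ℓ−1, and ℓ−9..ℓ−2 become the 0-based half-open slices [ℓ−8, ℓ), [ℓ−9, ℓ−1), and [ℓ−10, ℓ−2). Reporting the lowest type found across all pyknons for a motif is an assumption about how multiple matches are tallied.

```python
def agreement_type(motif, pyknon):
    """Return 0, 1, or 2 if the 8-mer matches letters l-7..l, l-8..l-1, or
    l-9..l-2 (1-based, l = len(pyknon)) of the pyknon; None otherwise."""
    if len(motif) != 8:
        raise ValueError("expected an 8-mer")
    for t in range(3):
        end = len(pyknon) - t
        start = end - 8
        if start >= 0 and pyknon[start:end] == motif:
            return t
    return None

def best_agreement(motif, pyknons):
    """Lowest agreement type of the motif over a collection of pyknons, or None."""
    types = [t for p in pyknons if (t := agreement_type(motif, p)) is not None]
    return min(types) if types else None
```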

Human pyknons are also present in other genomes, where they associate with similar biological processes.

Table 2 shows, for each of seven genomes in turn, how many positions in region X of the genome at hand are covered by the human pyknons contained in set Px, where X ∈ {5′ UTR, CR, 3′ UTR}. To account for length differences, we report the number of covered positions per 10,000 nucleotides. Table 3 shows how many of the human pyknons contained in set Px are also present in region X of the genome under consideration. For each of the seven analyzed genomes, Table 3 also shows the number of intergenic and intronic positions covered by (i) all human pyknons and (ii) those human pyknons that have instances in the corresponding genome's 5′ UTRs/CRs/3′ UTRs. Notably, >600 million nucleotides that are associated with nongenic copies of pyknons in the human genome are absent from the mouse and rat genomes. Interestingly, the human pyknons have many instances in the intergenic and intronic regions of the phylogenetically distant worm and fruit fly genomes, covering ≈1.6 million nucleotides in each.

Table 2.
Number of positions per 10,000 nucleotides that are covered by instances of the human pyknons
Table 3.
Number of human pyknons that are conserved in the human genome and the corresponding region of the jth genome for seven genomes and for each of 5′ UTR, CR, and 3′ UTR

A set of 6,160 human-genome-derived pyknons is simultaneously present in human and mouse 3′ UTRs, whereas a second set of 388 pyknons is simultaneously present in human, mouse, and fruit fly 3′ UTRs. Strikingly, we found these two sets of pyknons to be significantly overrepresented in the same biological processes in these other genomes (i.e., mouse and fruit fly) as in the human genome, even though the pyknons were initially discovered by processing the human genome in isolation (see Table 5, which is published as supporting information on the PNAS web site). The common processes include regulation of transcription, cell communication, signal transduction, etc. Finally, for each of the 388 pyknons in this second set, we manually analyzed ≈130-nucleotide-long neighborhoods centered on the instances of each pyknon across the human, mouse, and fruit fly 3′ UTRs, for a total of >4,000 neighborhoods. Notably, we did not find any instance of syntenic conservation across the three genomes.

Discussion

We explored the existence of links between coding and noncoding sequences of the human genome and identified 127,998 pyknons with a combined 226,874 nonoverlapping instances in the 5′ UTRs, CRs, or 3′ UTRs of 30,675 human transcripts (20,059 genes). In transcripts that contained multiple pyknon instances, we were surprised to find the pyknons arranging themselves combinatorially, forming mosaics. Further analysis revealed that the UTRs and/or CRs of genes associated with specific biological processes are significantly enriched/depleted in pyknons.

We also found that the pyknon placement in 5′ UTRs, CRs, and 3′ UTRs is strongly biased: the starting positions of consecutive pyknons show a clear preference for distances between 18 and 31 nucleotides. Importantly, we found an apparent lack of correlation between consecutive pyknon instances in these regions. The observed bias in the relative placement of the pyknons is conspicuously reminiscent of lengths that are associated with small RNA molecules that induce PTGS, suggesting the hypothesis that the pyknons' instances in these regions correspond to binding sites for small RNAs. Analysis of the regions immediately surrounding the intergenic and intronic instances of the reverse complement of the 127,998 discovered pyknons revealed that 29.24% of the pyknons have instances within 380,084 distinct, nonoverlapping regions, typically 60–80 nucleotides in length, that are predicted to fold into hairpin-shaped RNA secondary structures with folding energies ≤−30 kcal/mol. Many of these predicted hairpin-shaped structures are located inside known introns and, thus, will be part of transcribed regions. Our analysis also suggests that PTGS may be effected through the genes' 5′ UTRs and amino acid coding regions, in addition to their 3′ UTRs. Another suggestion is that RNAi products in animals likely fall into distinct categories, with preferences for lengths of 18, 22, 24, 26, 29, 30, and 31 nucleotides. Notably, through sequence-based analysis, we showed that ≈40% of the known microRNAs are similar to 689 pyknons and that the pyknons subsume 56 of the 72 recently reported 3′ UTR motifs, lending further support to the possibility of a connection between the pyknons and RNAi/PTGS.

The intergenic/intronic copies of the 127,998 pyknons constrain almost 900 million nucleotides of the human genome. Instances of human pyknons are also found in the nongenic and genic regions of the worm, fruit fly, chicken, mouse, rat, and dog genomes, and the number of human pyknons found decreases with phylogenetic distance. Strikingly, the human pyknons that we found inside the 3′ UTRs of mouse and fruit fly were overrepresented in the same biological processes as in the human genome. We note that >600 million bases, which correspond to identically conserved intergenic/intronic copies of human pyknons, are not present in the mouse and rat genomes.

The fact that some of the intergenic/intronic copies of pyknons originate in repeat elements may lead one to assume that our analysis has merely “rediscovered” such elements. However, as mentioned above and in the Supporting Text, >50,000 of the pyknons have many of their instances in repeat-free regions. Moreover, the typical length of a pyknon is substantially smaller than, e.g., that of an Alu element. It was recently reported that genes can achieve evolutionary novelty through the “careful” incorporation of Alu elements in their coding regions (41, 42). Also, the “pack-mule” paradigm revealed that entire genes, large fragments from a single gene, or fragments from multiple genes can be “hijacked” by transposable elements (43). “Fortuitous coincidence” is generally considered the prevailing mechanism by which such potential is unleashed. In contrast to this view, the combinatorial arrangement of the pyknons within the untranslated and coding regions of genes, together with the large number of instances in these regions, their tight packing, and the association of pyknons with specific biological processes, suggests that their placement is not accidental and likely serves a specific purpose. Our findings do not rule out a link with transposable elements; instead, they seem to support a dynamic view of a genome (44) that has learned to respond, and likely continues to do so, to environmental challenges or “stress” in a controlled, organized manner.

The results of the analysis suggest the existence of an extensive link between the noncoding and gene-coding parts in animal genomes. It is conceivable that this link could be the result of integration into the genome of dsRNA-breakdown products. Because many genes are known to give rise to antisense transcripts, it is possible that these genes were, at some point, subjected to RNAi-mediated dsRNA breakdown, which, in turn, gave rise to products ≈20 nucleotides in length. The latter, through repeated integration, could have eventually given rise to the numerous intergenic and intronic copies of the pyknons that we have identified. However, this explanation would have to be reconciled with four of our findings. First, the pyknons have identically conserved copies in nongenic regions. Second, pyknons appear to favor a specific size and, in genic regions, a specific relative placement. Third, slight modification of the 3′ UTR instances of the pyknons, by either prepending or appending immediately neighboring positions, results in new strings whose intergenic and intronic copies are markedly decreased. And fourth, we can discover human pyknons in other organisms, such as the mouse and the fruit fly, where they exhibit a persistent enrichment within specific processes, yet are not the result of syntenic conservation. It may well be that we are seeing traces of an organized, coordinated activity that involves nearly all known genes. The existence of a pyknon-based regulatory layer that is massive in scope and extent, originates in the noncoding part of the genome, operates through the genes' UTRs and CRs, and is linked to PTGS is a tantalizing possibility. Moreover, the observed disparity in the number of intergenic/intronic positions covered by human pyknons in the human and the phylogenetically close mouse/rat genomes suggests that pyknons and, thus, the presumed regulatory layer, may be organism-specific to some degree ("pyknome"). Exploring these possibilities might eventually help explain the apparent lack of correlation between the number of amino acid coding genes in an organism and the organism's apparent complexity.

Methods

Under the assumption that all four nucleotides are independent and identically distributed, we estimate the probability of a pattern of length $l$ to be $p = 4^{-l}$. The probability of observing $k$ instances of a given pattern in a database of size $D$ ($D \gg 1$) is then $\Pr_k \approx (pD)^k e^{-pD}/k!$ (Poisson distribution). The least specific pattern that our method will discover is one that is the shortest possible (i.e., $l = L = 16$) and appears the fewest allowed number of times (i.e., $k = K = 40$). If $D = 6.0 \times 10^9$ bases (i.e., all chromosomes, both strands), then $\Pr_k = 1.95 \times 10^{-43}$. In Supporting Text, we recalculate $\Pr_k$ using the nucleotides' natural probability of occurrence. Whether we assume equiprobable nucleotides or use their natural frequency of occurrence in our calculations, even the least specific pattern remains statistically significant. Alternatively, we can estimate the significance of our patterns using z scores: for the least specific patterns of length 16 with only 40 intergenic/intronic copies, we obtain the remarkably high value of $z = 32.66$; longer patterns and patterns with more copies have even higher z scores. These analyses separately confirm that every one of our discovered patterns is statistically significant and not the result of a random process. These conclusions hold true for the reverse complements of the discovered patterns and for the pyknons, the latter being a subset of the discovered patterns Pinit.
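Both numbers can be reproduced directly. The sketch evaluates the Poisson term in log space, since $40!$ is about $8 \times 10^{47}$, and assumes the z score is the Poisson form $z = (k - \lambda)/\sqrt{\lambda}$ with $\lambda = pD$; the text does not spell out the z-score formula, but this choice reproduces the reported 32.66. The function names are hypothetical.

```python
from math import exp, lgamma, log, sqrt

def poisson_pmf(k, lam):
    """Pr_k = lam^k e^(-lam) / k!, computed in log space to avoid overflow."""
    return exp(k * log(lam) - lam - lgamma(k + 1))

def least_specific_pattern_stats(L=16, K=40, D=6.0e9):
    """Return (expected count, Pr_K, z score) for a length-L pattern seen K times."""
    p = 4.0 ** -L              # probability of one specific length-L string at a position
    lam = p * D                # expected number of occurrences, about 1.397
    pr = poisson_pmf(K, lam)   # about 1.95e-43
    z = (K - lam) / sqrt(lam)  # Poisson mean and variance are both lam; about 32.66
    return lam, pr, z
```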


Acknowledgments

We thank Annie Visviki, Laxmi Parida, Alan Grossfield, and the anonymous reviewers for comments and suggestions on the manuscript.

Abbreviations

CR
coding region
PTGS
posttranscriptional gene silencing
RNAi
RNA interference.

Footnotes

Conflict of interest statement: No conflicts declared.

Pyknon: from the Greek adjective πυκνός/πυκνή/πυκνόν, meaning "serried, dense, frequent."

References

1. Mattick J. S. Nat. Rev. Genet. 2004;5:316–323.
2. Ruvkun G. Science. 2001;294:797–799.
3. Ettwiller L. M., Rung J., Birney E. Genome Res. 2003;13:883–895.
4. Brazma A., Jonassen I., Vilo J., Ukkonen E. Genome Res. 1998;8:1202–1215.
5. Lenhard B., Sandelin A., Mendoza L., Engstrom P., Jareborg N., Wasserman W. W. J. Biol. 2003;2:13.
6. Sinha S., Tompa M. Nucleic Acids Res. 2002;30:5549–5560.
7. Wasserman W. W., Sandelin A. Nat. Rev. Genet. 2004;5:276–287.
8. Hobert O. Trends Biochem. Sci. 2004;29:462–468.
9. Bejerano G., Pheasant M., Makunin I., Stephen S., Kent W. J., Mattick J. S., Haussler D. Science. 2004;304:1321–1325.
10. Dubchak I., Brudno M., Loots G. G., Pachter L., Mayor C., Rubin E. M., Frazer K. A. Genome Res. 2000;10:1304–1306.
11. Frazer K. A., Sheehan J. B., Stokowski R. P., Chen X., Hosseini R., Cheng J. F., Fodor S. P., Cox D. R., Patil N. Genome Res. 2001;11:1651–1659.
12. Jareborg N., Birney E., Durbin R. Genome Res. 1999;9:815–824.
13. Miziara M. N., Riggs P. K., Amaral M. E. Genet. Mol. Res. 2004;3:465–473.
14. Boffelli D., McAuliffe J., Ovcharenko D., Lewis K. D., Ovcharenko I., Pachter L., Rubin E. M. Science. 2003;299:1391–1394.
15. Dermitzakis E. T., Kirkness E., Schwarz S., Birney E., Reymond A., Antonarakis S. E. Genome Res. 2004;14:852–859.
16. Kellis M., Patterson N., Endrizzi M., Birren B., Lander E. S. Nature. 2003;423:241–254.
17. Wasserman W. W., Palumbo M., Thompson W., Fickett J. W., Lawrence C. E. Nat. Genet. 2000;26:225–228.
18. Brennecke J., Hipfner D. R., Stark A., Russell R. B., Cohen S. M. Cell. 2003;113:25–36.
19. Elbashir S. M., Lendeckel W., Tuschl T. Genes Dev. 2001;15:188–200.
20. Fire A., Xu S., Montgomery M. K., Kostas S. A., Driver S. E., Mello C. C. Nature. 1998;391:806–811.
21. Johnston R. J., Hobert O. Nature. 2003;426:845–849.
22. Lau N. C., Lim L. P., Weinstein E. G., Bartel D. P. Science. 2001;294:858–862.
23. Lee R. C., Ambros V. Science. 2001;294:862–864.
24. Lim L. P., Glasner M. E., Yekta S., Burge C. B., Bartel D. P. Science. 2003;299:1540.
25. Moss E. G., Lee R. C., Ambros V. Cell. 1997;88:637–646.
26. Reinhart B. J., Slack F. J., Basson M., Pasquinelli A. E., Bettinger J. C., Rougvie A. E., Horvitz H. R., Ruvkun G. Nature. 2000;403:901–906.
27. Slack F. J., Basson M., Liu Z., Ambros V., Horvitz H. R., Ruvkun G. Mol. Cell. 2000;5:659–669.
28. Ambros V., Lee R. C., Lavanway A., Williams P. T., Jewell D. Curr. Biol. 2003;13:807–818.
29. Mattick J. S., Makunin I. V. Hum. Mol. Genet. 2005;14:R121–R132.
30. Poy M. N., Eliasson L., Krutzfeldt J., Kuwajima S., Ma X., Macdonald P. E., Pfeffer S., Tuschl T., Rajewsky N., Rorsman P., Stoffel M. Nature. 2004;432:226–230.
31. Woolfe A., Goodson M., Goode D. K., Snell P., McEwen G. K., Vavouri T., Smith S. F., North P., Callaway H., Kelly K., et al. PLoS Biol. 2005;3:e7.
32. Xie X., Lu J., Kulbokas E. J., Golub T. R., Mootha V., Lindblad-Toh K., Lander E. S., Kellis M. Nature. 2005;434:338–345.
33. Cheng J., Kapranov P., Drenkow J., Dike S., Brubaker S., Patel S., Long J., Stern D., Tammana H., Helt G., et al. Science. 2005;308:1149–1154.
34. Griffiths-Jones S., Bateman A., Marshall M., Khanna A., Eddy S. R. Nucleic Acids Res. 2003;31:439–441.
35. Rigoutsos I., Floratos A. Bioinformatics. 1998;14:55–67.
36. Stabenau A., McVicker G., Melsopp C., Proctor G., Clamp M., Birney E. Genome Res. 2004;14:929–933.
37. Wootton J. C., Federhen S. Comput. Chem. 1993;17:149–163.
38. Altschul S. F., Gish W., Miller W., Myers E. W., Lipman D. J. J. Mol. Biol. 1990;215:403–410.
39. Ashburner M., Ball C. A., Blake J. A., Botstein D., Butler H., Cherry J. M., Davis A. P., Dolinski K., Dwight S. S., Eppig J. T., et al. Nat. Genet. 2000;25:25–29.
40. Hofacker I. L., Fontana W., Stadler P., Bonhoeffer L. S., Tacker M., Schuster P. Monatsh. Chem. 1994;125:167–188.
41. Iwashita S., Osada N., Itoh T., Sezaki M., Oshima K., Hashimoto E., Kitagawa-Arita Y., Takahashi I., Masui T., Hashimoto K., Makalowski W. Mol. Biol. Evol. 2003;20:1556–1563.
42. Lev-Maor G., Sorek R., Shomron N., Ast G. Science. 2003;300:1288–1291.
43. Jiang N., Bao Z., Zhang X., Eddy S. R., Wessler S. R. Nature. 2004;431:569–573.
44. Jorgensen R. A. Cold Spring Harbor Symp. Quant. Biol. 2004;69:349–354.
