Proc Natl Acad Sci U S A. Apr 28, 2009; 106(17): 7079–7082.
Published online Apr 8, 2009. doi:  10.1073/pnas.0900523106
PMCID: PMC2667148

Rapid repetitive element-mediated expansion of piRNA clusters in mammalian evolution


Piwi-interacting RNAs (piRNAs) are ≈30 nucleotide noncoding RNAs that may be involved in transposon silencing in mammalian germline cells. Most piRNA sequences are found in a small number of genomic regions referred to as clusters, which range from 1 to hundreds of kilobases. We studied the evolution of 140 rodent piRNA clusters, 103 of which do not overlap protein-coding genes. Phylogenetic analysis revealed that 14 clusters were acquired after rat–mouse divergence and another 44 after rodent–primate divergence. Most clusters originated in a process analogous to the duplication of protein-coding genes by ectopic recombination, via insertions of long sequences that were mediated by flanking chromosome-specific repetitive elements (REs). Source sequences for such insertions are often located on the same chromosomes and also harbor clusters. The rate of piRNA cluster expansion is higher than that of any known gene family and, in contrast to other large gene families, there was not a single cluster loss. These observations suggest that piRNA cluster expansion is driven by positive selection, perhaps caused by the need to silence the ever-expanding repertoire of mammalian transposons.

Keywords: arms race, molecular evolution, small RNA, positive selection

Eukaryotic genomes contain a variety of small noncoding RNAs, including microRNAs (miRNAs), repeat-associated small interfering RNAs (rasiRNAs), small interfering RNAs (siRNAs), and Piwi-interacting RNAs (piRNAs). miRNAs regulate the expression of protein-coding genes, rasiRNAs are involved in transposon silencing, and siRNAs play a dual role in silencing genes and transposons (1). Because of a number of similarities between rasiRNAs in Drosophila and piRNAs in mammals, rasiRNAs are considered to be a subclass of piRNAs. Thus, mammalian piRNAs are hypothesized to also be involved in transposon silencing, although they may perform other functions as well (2). Some noncoding RNAs, in particular miRNAs, evolve very slowly (3). In contrast, small-scale evolution of piRNA sequences proceeds at a rate typical of nonfunctional genomic regions (4). Here, we consider the large-scale evolution of mammalian piRNA clusters.


Recent Acquisition of Many piRNA Clusters.

Thus far, 140 rat and mouse piRNA clusters have been described, each of which is most likely transcribed as a unit and subsequently processed into mature piRNAs (2, 4–8). We studied the evolution of each of these clusters within their genomic contexts (Table S1). For this purpose, we obtained regions that included 2 flanking protein-coding genes on either side of a cluster and constructed pairwise alignments of orthologous rat, mouse, human, dog, and cow regions. Thirty-seven clusters overlap protein-coding genes, often spanning several exons and introns. All of these clusters are ancestral, being present in rat, mouse, and human, which is not surprising because protein-coding genes are generally conserved. Among the remaining 103 clusters, each of which is contained within an intergenic region, only 43 are ancestral. The other 60 intergenic clusters were acquired recently. Fourteen were acquired after rat–mouse divergence, being present in 1 rodent (sister 1) and absent (aligned reliably against a gap) in the other rodent (sister 2) and in human (outgroup) (Fig. 1). Another 44 were acquired between rodent–primate and rat–mouse divergences, being present in rat and mouse and absent in human and in dog and/or cow. Evolution of 2 clusters is obscure because of a lack of reliable rodent–human alignments of the intergenic regions harboring them.

Fig. 1.
Acquisition of a cluster-harboring sequence. Alignment of a cluster-harboring segment (yellow) in sister 1 to a gap in sister 2 and in an outgroup indicates that this segment was acquired in the lineage of sister 1 after it diverged from the lineage of ...

Ectopic Recombination as a Mechanism of piRNA Cluster Origin.

Close similarity between rat and mouse genomes made it possible to reconstruct the course of events that led to the acquisition of 9 of the 14 rat- or mouse-specific clusters (Table 1). All 9 clusters arose via insertions of long DNA segments. Paralogs from which these sequences most likely originated (source paralogs) were identified for 7 insertions along with several more distant paralogs in 6 cases. Six source paralogs are located on the same chromosome, between 192 and 259,000 Kb from the site of insertion. In these cases, the source paralog is similar not only to the inserted sequence, but also to segments upstream and/or downstream of the site of insertion, indicating that the insertion was mediated by ectopic (nonallelic homologous) recombination of these REs (9) (Fig. 2). In addition to flanking acquired clusters and source paralogs, several REs are present in other locations. Often, all copies are confined to a single chromosome and sometimes also to 1 (either rat or mouse) genome. In the single case where the source paralog is located on a different chromosome, a copy of an L1 transposable element in the inserted sequence probably mediated the insertion. Out of 7 identified source paralogs, 5 are known to harbor clusters and, because not all rodent clusters are known (10), others may as well. Although source paralogs could not be found for 2 insertions, perhaps because some regions of rodent genomes are still not sequenced, the presence of low copy-number REs flanking these insertions suggests that ectopic recombination was the mechanism of these insertions as well.

Table 1.
piRNA clusters acquired after rat–mouse divergence
Fig. 2.
Schematic used to identify ectopic recombination as the mechanism of an insertion. (A) Architecture of a typical cluster-harboring genomic region. Two protein-coding genes (orange and green) flank an intergenic region containing an acquired cluster (blue) ...

The remaining 5 clusters acquired after rat–mouse divergence most likely arose via 3 independent events, because 2 pairs of related nearby clusters were probably acquired together. All of these clusters have several paralogs, including other known clusters. Their mechanisms of acquisition remain unclear because of unusually high rat–mouse divergence of their genomic regions, which prevents identification of the exact coordinates of cluster-harboring insertions and source paralogs. However, because these clusters are surrounded by REs, and their paralogs are mostly confined to the same chromosomes, their acquisitions were probably also because of ectopic recombination.

It is likely that the same mechanisms led to the 44 more distant acquisitions, although similarity between rodent and human genomes was insufficient to identify the precise locations of cluster-harboring insertions. At least 1 rodent paralog was found for 35 of these clusters, and multiple paralogs are present in most cases, many of which are known clusters. The absence of paralogs in 9 cases could be because of either incomplete rodent genome sequences or longer times since cluster origin, which may have allowed some of them to diverge beyond recognition. In contrast to the pattern observed with acquired clusters, only 14 out of 80 ancestral clusters have an identifiable paralog, including 4 with multiple paralogs.

Two Distinct Subpopulations of Clusters.

Because of the presence of REs, genomic contexts of 60 acquired clusters are remarkably unstable. Only 13 are located within genomic regions that were preserved after acquisition of the cluster. The remaining 47 are within regions that underwent major rearrangements, including insertions, deletions, and inversions of genes and large (>100 Kb) segments of DNA. The pattern is very different for ancestral clusters. The genomic context was preserved in all 3 species for 66 ancestral cluster regions and was disrupted by nearby rearrangements for the other 14. Thus, clusters can be divided into 2 rather distinct subpopulations: stable and expanding.

The high rate of cluster acquisition and large-scale evolution of their genomic regions is unusual for mammalian genomes (9). To quantify this contrast, we randomly chose 103 intergenic cluster-like segments in rat or mouse as controls. With the exception of 7 control sequences for which no reliable alignments with human segments could be obtained, all control segments are ancestral. Thus, no clear cases of acquisition or loss were encountered. Furthermore, genomic regions harboring control segments are generally stable, as only 8 of them underwent major rearrangements.

Unremarkable Small-Scale Evolution of piRNA Clusters.

In contrast to their rapid large-scale evolution, small-scale evolution of clusters proceeds at rates typical for mammalian genomes (4). For 38 ancestral intergenic clusters within collinear genomic contexts, the mean cluster conservations are 0.59 in mouse–rat and 0.11 in rodent–human comparisons, respectively. The corresponding mean conservations are 0.54 and 0.12 for intergenic sequences surrounding these clusters, 0.53 and 0.13 for intergenic sequences between flanking genes, and 0.56 and 0.13 for control segments.


Expansion of piRNA clusters, which are in effect noncoding genes, closely parallels the expansion of protein-coding genes by gene duplication. A variety of mechanisms are responsible for the duplication of protein-coding genes (11), including ectopic recombination (9, 12–14). Like piRNA clusters, protein-coding genes arising by ectopic recombination are often confined to the same chromosomes as their ancestral genes for 2 reasons: they are also flanked by mostly chromosome-specific REs (12), and the rate of intrachromosomal ectopic recombination is higher (15). Thus, the tendency of clusters to reside on a small number of chromosomes (4–8) is likely because of their mechanism of origin.

Approximately 43% (60/140) of all rodent piRNA clusters arose after rodent–primate divergence, and this fraction increases to 58% if clusters that overlap protein-coding genes are excluded. This exceeds the highest known expansion rate for a family of mammalian genes, that of olfactory receptors, 33% of which were acquired in mouse after rodent–primate divergence (16). Furthermore, gene losses are common for all large families of genes, including olfactory receptors (16) and miRNAs, which are lost at the same rate with which they are acquired (17). However, not a single cluster loss was observed, although our method of analysis could readily detect such events.
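
The fractions quoted above follow directly from the cluster counts given in the text. The short sketch below reproduces the arithmetic; the variable names are illustrative only.

```python
# Counts taken from the text.
total_clusters = 140      # all described rat and mouse clusters
overlapping_genes = 37    # clusters overlapping protein-coding genes
acquired = 60             # intergenic clusters acquired after rodent-primate divergence

# Clusters lying entirely within intergenic regions.
intergenic = total_clusters - overlapping_genes

frac_all = acquired / total_clusters
frac_intergenic = acquired / intergenic

print(f"intergenic clusters: {intergenic}")
print(f"acquired / all clusters: {frac_all:.1%}")
print(f"acquired / intergenic clusters: {frac_intergenic:.1%}")
```

Rounded to whole percentages, these are the 43% and 58% figures, both above the 33% cited for olfactory receptors.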

Rapid expansion of piRNA clusters during the course of mammalian evolution is most likely driven by positive selection. Although the presence of REs increases the rates of both insertions and deletions (18), deletions usually occur at a much higher rate than insertions (Table 8.1 in ref. 9). Thus, 60 cluster acquisitions without a single loss cannot be because of mutational pressure. More generally, long insertions are unlikely to be selectively neutral, and only beneficial ones can be fixed (19). Data on copy-number variants (CNVs) overlapping clusters within rat and mouse populations can be used to investigate selection on cluster acquisitions. Because positive selection increases the rate of evolution, but does not induce any long-lasting polymorphisms, the McDonald-Kreitman test (20) would indicate positive selection if such CNVs are rare. Although currently available data on rat (21) and mouse (22, 23) seem to be consistent with this, analysis of a larger number of wild-type rat and mouse genotypes is necessary. If piRNAs are indeed involved in transposon silencing, it is natural to assume that selection for cluster acquisitions is caused by an arms race between expanding families of mammalian transposons and piRNA clusters.

Materials and Methods

Classification of Rodent piRNA Clusters.

Genomic locations of 100 rat and 94 mouse piRNA clusters were obtained from ref. 4. Clusters labeled as rat–mouse orthologs were checked to ensure that they were located between orthologous flanking genes; if not, they were analyzed as distinct clusters. Two clusters were not analyzed because of the poor quality of available sequences in their genomic regions.

Phylogenetic Analysis.

Sequences of clusters, along with the 4 closest flanking genes, were downloaded from GenBank (24) at ftp://ftp.ncbi.nih.gov/genomes/. Orthologous segments from the other rodent, human, dog, and cow genomes were identified by applying BLASTP (25) to flanking genes. Pairwise alignments of these regions between the cluster-containing rodent(s) and all other species were constructed with OWEN (26). Parameters were initially strict, with a requirement of 16 successive matches and P < 10^−8, and were progressively relaxed to 8 successive matches and P < 0.001. A cluster was considered to be conserved within a pair of species when it was part of an unambiguous alignment.

Identification of Paralogs.

Insertion sites of clusters acquired after rat–mouse divergence corresponded to alignment gaps. Paralogs for inserted sequences were identified using BLASTN (24) and BLAT (27). All paralogs were aligned against the inserted sequence with OWEN, and the paralog with the best alignment was assumed to be the source. Paralogs were also found in the same way for REs and all remaining clusters, with the requirement that BLAT alignments covered >50% of the query sequence.
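
The two selection rules in this paragraph (best alignment as the presumed source, and a >50% query-coverage requirement for BLAT hits) can be sketched as follows. This is a minimal illustration under stated assumptions: the `Hit` record, its field names, and the numerical `alignment_score` are hypothetical, since the text does not specify how the "best" OWEN alignment was scored or how hits were stored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """Hypothetical record for one candidate paralog."""
    name: str
    query_length: int         # length of the query sequence (bases)
    aligned_query_bases: int  # query bases covered by the alignment
    alignment_score: float    # assumed numerical quality of the alignment


def filter_by_coverage(hits: List[Hit], min_fraction: float = 0.5) -> List[Hit]:
    """Keep hits whose alignment covers more than min_fraction of the query."""
    return [
        h for h in hits
        if h.aligned_query_bases / h.query_length > min_fraction
    ]


def pick_source_paralog(hits: List[Hit]) -> Optional[Hit]:
    """Return the hit with the best alignment, or None if there are no hits."""
    return max(hits, key=lambda h: h.alignment_score, default=None)


hits = [
    Hit("paralog_a", 1000, 400, 90.0),
    Hit("paralog_b", 1000, 600, 80.0),
    Hit("paralog_c", 1000, 500, 70.0),
]
print([h.name for h in filter_by_coverage(hits)])
print(pick_source_paralog(hits).name)
```

The coverage threshold is strict (greater than 50%), so a hit covering exactly half of the query is excluded in this sketch.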

Measurement of Small-Scale Evolution.

Rat–mouse and rodent–human divergences for ancestral cluster regions were measured for clusters, surrounding intergenic segments, and intergenic sequences between flanking genes. Divergences between cluster-containing sequences acquired after rat–mouse divergence and their source paralogs were also calculated for clusters and surrounding inserted segments. Conservation scores were computed by dividing the number of matching bases by the length of the regions of interest.
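
A conservation score of this kind can be sketched in a few lines. The example assumes that the region of interest is given as a string of bases in one species, and that the second string holds the aligned base from the other species at each position, with '-' where nothing aligns; measuring length in positions of the first species is an assumption, as the text does not state it.

```python
def conservation_score(region: str, partner: str) -> float:
    """Number of matching bases divided by the length of the region.

    region  : bases of the region of interest in one species
    partner : aligned base from the other species at each position,
              '-' where the region has no aligned counterpart
    """
    if len(region) != len(partner):
        raise ValueError("region and partner must have the same length")
    if not region:
        raise ValueError("region must not be empty")

    # Count positions where both species carry the same nucleotide.
    matches = sum(
        1
        for x, y in zip(region.upper(), partner.upper())
        if x == y and x in "ACGT"
    )
    return matches / len(region)


# 7 of 10 positions match; unaligned positions count against the score.
print(conservation_score("ACGTACGTAC", "ACGT--GTTC"))
```

Because unaligned positions stay in the denominator, long gaps lower the score just as mismatches do.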

We are deeply indebted to John Kim for proposing that we study the evolution of piRNAs. We also thank Michael Lynch, Fyodor Kondrashov, Yegor Bazykin, and David Ginsburg for their helpful comments and suggestions.


The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/cgi/content/full/0900523106/DCSupplemental.


1. Sontheimer EJ, Carthew RW. Silence from within: Endogenous siRNAs and miRNAs. Cell. 2005;122:9–12.
2. Aravin AA, Hannon GJ, Brennecke J. The Piwi-piRNA pathway provides an adaptive defense in the transposon arms race. Science. 2007;318:761–764.
3. Shabalina SA, Koonin EV. Origins and evolution of eukaryotic RNA interference. Trends Ecol Evol. 2008;23:578–587.
4. Lau NC, et al. Characterization of the piRNA complex from rat testes. Science. 2006;313:363–367.
5. Girard A, Sachidanandam R, Hannon GJ, Carmell MA. A germline-specific class of small RNAs binds mammalian Piwi proteins. Nature. 2006;442:199–202.
6. Aravin AA, et al. A novel class of small RNAs bind to MILI protein in mouse testes. Nature. 2006;442:203–207.
7. Watanabe T, et al. Identification and characterization of two novel classes of small RNAs in the mouse germline: Retrotransposon-derived siRNAs in oocytes and germline small RNAs in testes. Genes Dev. 2006;20:1732–1743.
8. Grivna ST, Beyret E, Wang Z, Lin H. A novel class of small RNAs in mouse spermatogenic cells. Genes Dev. 2006;20:1709–1714.
9. Lynch M. In: The Origins of Genome Architecture. Lynch M, editor. Sunderland, MA: Sinauer Associates; 2007. pp. 197–202.
10. Betel D, Sheridan R, Marks DS, Sander C. Computational analysis of mouse piRNA sequence and biogenesis. PLoS Comp Biol. 2007;3:2219–2227.
11. Cusack BP, Wolfe KH. Not born equal: Increased rate asymmetry in relocated and retrotransposed rodent gene duplicates. Mol Biol Evol. 2006;24:679–686.
12. Yang S, et al. Repetitive element-mediated recombination as a mechanism for new gene origination in Drosophila. PLoS Genet. 2008;4:78–87.
13. Lupski JR, Stankiewicz P. Genomic disorders: Molecular mechanisms for rearrangements and conveyed phenotypes. PLoS Genet. 2005;1:627–633.
14. Kwan-Wood GL, Jeffreys AJ. Processes of de novo duplication of human α-globin genes. Proc Natl Acad Sci USA. 2007;104:10950–10955.
15. Lichten M, Haber JE. Position effects in ectopic and allelic mitotic recombination in Saccharomyces cerevisiae. Genetics. 1989;123:261–268.
16. Niimura Y, Nei M. Evolutionary changes of the number of olfactory receptor genes in the human and mouse lineages. Gene. 2005;346:23–28.
17. Lu J, et al. The birth and death of microRNA genes in Drosophila. Nat Genet. 2008;40:351–355.
18. Lupski JR. Genomic rearrangements and sporadic disease. Nat Genet. 2007;39:S43–S47.
19. Kondrashov FA, Kondrashov AS. Role of selection in fixation of gene duplications. J Theor Biol. 2006;239:141–151.
20. Smith NGC, Eyre-Walker A. Adaptive protein evolution in Drosophila. Nature. 2002;415:1022–1024.
21. Guryev V, et al. Distribution and functional impact of DNA copy number variation in the rat. Nat Genet. 2008;40:538–545.
22. Graubert TA, et al. A high-resolution map of segmental DNA copy number variation in the mouse genome. PLoS Genet. 2007;3:21–29.
23. She X, Cheng Z, Zollner S, Church DM, Eichler EE. Mouse segmental duplication and copy number variation. Nat Genet. 2008;40:909–914.
24. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank. Nucleic Acids Res. 2008;36:D25–D30.
25. Altschul SF, et al. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402.
26. Ogurtsov AY, Roytberg MA, Shabalina SA, Kondrashov AS. OWEN: Aligning long collinear regions of genomes. Bioinformatics. 2002;18:1703–1704.
27. Karolchik D, et al. The UCSC Genome Browser Database: Update. Nucleic Acids Res. 2008;36:D773–D779.
