Proc Natl Acad Sci U S A. Jul 26, 2011; 108(30): 12533–12538.
Published online Jul 11, 2011. doi:  10.1073/pnas.1019732108
PMCID: PMC3145732
Plant Biology

Genome-wide landscape of polyadenylation in Arabidopsis provides evidence for extensive alternative polyadenylation


Alternative polyadenylation (APA) has been shown to play an important role in gene expression regulation in animals and plants. However, the extent of sense and antisense APA at the genome level is not known. We developed a deep-sequencing protocol that queries the junctions of 3′UTR and poly(A) tails and confidently maps the poly(A) tags to the annotated genome. The results of this mapping show that 70% of Arabidopsis genes use more than one poly(A) site, excluding microheterogeneity. Analysis of the poly(A) tags reveals extensive APA in introns and coding sequences, the results of which can significantly alter transcript sequences and their encoded proteins. Although the interplay of intron splicing and polyadenylation potentially defines poly(A) site use in introns, the polyadenylation signals leading to the use of poly(A) sites in protein-coding regions (CDS) are distinct from the rest of the genome. Interestingly, a large number of poly(A) sites correspond to putative antisense transcripts that overlap with the promoter of the associated sense transcript, a mode previously demonstrated to regulate sense gene expression. Our results suggest that APA plays a far greater role in gene expression in plants than previously expected.

Keywords: alternative processing, antisense transcription, nonstop mRNAs

The polyadenylation of mRNA is an important step in gene expression in eukaryotes. With few exceptions, mature eukaryotic mRNAs possess a poly(A) tract, which in turn functions to facilitate transport of the mRNA to the cytoplasm and its subsequent stabilization and translation. The poly(A) tail contributes regulatory information to each of these processes through interactions with RNA processing factors and poly(A)-binding proteins. The process of polyadenylation also contributes to regulation by “determining” the composition of the mRNA apart from the poly(A) tail. Thus, the position along the gene where the pre-mRNA is processed and polyadenylated determines the sequence content in terms of exons and regulatory motifs. If a gene possesses more than one polyadenylation site, then the nature of the expressed mRNA can be altered via differential choice of these sites, a process that is called alternative polyadenylation, or APA. That APA may be important is suggested by the observations that more than 50% of human and plant genes have multiple poly(A) sites (1–5). APA may be an important factor in the regulation of genes associated with cancer and with early embryo development in animals (6–8). APA has also been implicated in global control of gene expression in neuronal cells in humans (9), and in the responses of genes to stress and developmental cues in Caenorhabditis elegans (10).

In plants, there are many documented cases of APA (11–13). Perhaps the best-studied example of APA in plants involves the network of genes that control flowering time in Arabidopsis. One regulatory factor, FY, is a core polyadenylation complex subunit; this protein acts in concert with an RNA-binding protein, FCA, to promote polyadenylation within an intron in transcripts encoded by the FCA gene (14). Two other core polyadenylation factor subunits, CstF77 and CstF64, and a novel RNA-binding protein, FPA, control APA of antisense transcripts encoded by the FLC gene (15, 16); these antisense transcripts are involved in transcriptional regulation of sense FLC mRNAs through chromatin modifications in the vicinity of the sense FLC promoter. The regulation of these two genes thus provides examples of two modes of APA, involving intronic polyadenylation and 3′ end processing of antisense transcripts.

Plant poly(A) site datasets (3, 17) have been assembled from the analysis and curation of the results of EST and full-length cDNA sequencing projects. Unfortunately, these projects are not specially targeted to the identification of poly(A) sites, nor are they high-throughput. With this consideration in mind, a strategy designed to specifically query the mRNA-poly(A) junction on a transcriptome-wide basis was developed and used to study poly(A) site choice in Arabidopsis leaves and seeds. The results obtained using this strategy reveal an extensive network of potential APA in Arabidopsis, including unanticipated and novel modes of APA. In addition, the results corroborate other reports suggestive of wide-spread antisense transcription in Arabidopsis, and provide a dataset of poly(A) sites associated with antisense transcripts. Finally, they provide evidence for tissue-specific poly(A) site choice.


Preparation and Characterization of cDNA Tags That Query Polyadenylation Sites.

To study Arabidopsis poly(A) sites on a genome-wide basis, short DNA tags that include the mRNA-poly(A) site junction [called poly(A) tags, or PATs hereafter] were prepared and sequenced; the starting materials for these samples were RNA isolated from dry seeds and the leaves of young seedlings. The initial sequences were processed and mapped to the Arabidopsis reference genome. After removing potential internal priming candidates and eliminating tags that mapped to chloroplast and mitochondria genomes and to miscellaneous RNAs (primarily rRNAs), a collection of tags that defined more than 280,000 individual poly(A) sites was obtained (Table S1). Because poly(A) site microheterogeneity is ubiquitous in plants (3, 4), poly(A) sites in the same gene that are located within 24 nt of each other were clustered so as to define a poly(A) site cluster (PAC). This process yielded more than 71,000 PACs with an average of 54 PATs per PAC (Table S1). Of these PACs, 57,473 were in the “sense” orientation with respect to an annotated gene, 24,013 were in the antisense orientation with respect to an annotated gene, and 14,149 fell between annotated genes (in intergenic regions). (Note that some PACs may be identified as both “sense” and “antisense,” if their respective genes are themselves overlapping.) The sense PACs could be mapped to more than 18,000 genes; over 70% of these genes possessed more than one PAC (Fig. S1). More than 70% of these PACs could be linked to MPSS signatures, and almost 80% of poly(A) sites found in pooled EST and cDNA collections were represented in this set of PACs (Dataset S1). For a set of previously characterized genes (18), the patterns of mRNA 3′ ends determined by 3′-RACE were recapitulated in the leaf PAT dataset (Fig. S2). These results indicate that the PACs identified and analyzed in this study define authentic poly(A) sites with high confidence.
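The clustering of nearby poly(A) sites into PACs can be illustrated with a short sketch. The exact procedure is described in SI Methods and is not reproduced here; the code below assumes a simple chaining rule (a site joins the current cluster if it lies within 24 nt of the previous site in the same gene), ignores strand, and uses a hypothetical input layout of (gene, position, tag count) tuples. The choice of the most heavily tagged position as the cluster representative is likewise an assumption.

```python
from collections import defaultdict

MAX_GAP = 24  # nt; the clustering distance stated in the text


def _summarize(gene, members):
    """Collapse a list of (position, tag_count) pairs into one PAC record."""
    return {
        "gene": gene,
        "start": members[0][0],
        "end": members[-1][0],
        "pat_count": sum(count for _, count in members),
        # Representative site: most tags, leftmost on ties (an assumption).
        "peak": max(members, key=lambda m: (m[1], -m[0]))[0],
    }


def cluster_sites(sites, max_gap=MAX_GAP):
    """Group poly(A) sites into PACs, gene by gene.

    sites: iterable of (gene_id, position, tag_count) tuples
           (hypothetical layout, not the authors' file format).
    """
    by_gene = defaultdict(list)
    for gene, pos, count in sites:
        by_gene[gene].append((pos, count))

    pacs = []
    for gene, entries in by_gene.items():
        entries.sort()
        current = [entries[0]]
        for pos, count in entries[1:]:
            if pos - current[-1][0] <= max_gap:
                current.append((pos, count))  # extend the open cluster
            else:
                pacs.append(_summarize(gene, current))
                current = [(pos, count)]  # start a new cluster
        pacs.append(_summarize(gene, current))
    return pacs


example = [("A", 100, 5), ("A", 110, 20), ("A", 134, 1), ("A", 200, 3), ("B", 50, 2)]
pacs = cluster_sites(example)
```

With chaining, a cluster can span more than 24 nt when several sites follow one another closely; a fixed-window rule would behave differently, and which of the two matches the published pipeline is not stated in the main text.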

Evidence for Different Classes of Poly(A) Signals.

More than 55,000 of the PACs represented in the PAT dataset were oriented in the “sense” orientation with respect to an associated annotated Arabidopsis gene. Roughly 83% of all sense-oriented PACs mapped to the 3′ UTRs of known genes, and the remaining 17% were located in annotated coding sequences (CDS), introns, or 5′-UTRs (Fig. 1). To gain further insight into PACs that lie within some of these different genomic regions, the relative base composition of the sequences surrounding these sites was studied; such analyses have proven effective in identifying important sequence trends and probable cis elements (17, 19). For this process, PACs located in 3′-UTRs were chosen as the “default,” and the position-by-position variations from the relative nucleotide compositions surrounding these sites were assessed using a χ2 test. The results showed that sequences surrounding PACs located within 3′ UTRs and introns were very similar (Fig. 2A), with the only discernible difference being localized to the cleavage site. This difference probably reflects the marked tendency of plant introns to be U-rich (20, 21) and thus to have a lower C content than would be seen near a “normal” poly(A) site. Other than this, the base composition profiles of 3′-UTR (as in Fig. 2B) and intronic PACs are indistinguishable from those published by others (17, 19).
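A position-by-position comparison of this kind can be sketched as follows. This is an illustration only: it assumes equal-length RNA sequences (alphabet ACGU) aligned on the cleavage site, and it computes a Pearson χ2 statistic for a 2 × 4 table of base counts (reference set versus test set) at each position. The function names are hypothetical, and the statistic is not converted to a P value or to the negative-log scale plotted in Fig. 2A.

```python
BASES = "ACGU"


def base_counts(seqs):
    """Per-position base counts for equal-length sequences aligned on the poly(A) site."""
    length = len(seqs[0])
    counts = [dict.fromkeys(BASES, 0) for _ in range(length)]
    for seq in seqs:
        for i, base in enumerate(seq):
            counts[i][base] += 1
    return counts


def chi2_homogeneity(row_a, row_b):
    """Pearson chi-square statistic for a 2 x 4 table of base counts."""
    n_a, n_b = sum(row_a.values()), sum(row_b.values())
    n = n_a + n_b
    stat = 0.0
    for base in BASES:
        col = row_a[base] + row_b[base]
        if col == 0:
            continue  # base absent from both sets at this position
        for observed, row_total in ((row_a[base], n_a), (row_b[base], n_b)):
            expected = row_total * col / n
            stat += (observed - expected) ** 2 / expected
    return stat


def chi2_profile(reference, test):
    """Chi-square statistic at every aligned position (reference vs. test set)."""
    return [chi2_homogeneity(r, t)
            for r, t in zip(base_counts(reference), base_counts(test))]


# Toy data: position 0 is identical in both sets, position 1 differs.
profile = chi2_profile(["AU", "AU", "AU", "AU"], ["AU", "AU", "AG", "AG"])
```

Positions where the two sets have the same composition give a statistic of zero, so peaks in the profile mark the positions where the test set departs from the 3′-UTR "default."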

Fig. 1.
Genome-wide distribution of sense PACs. Various genomic regions defined by the TAIR9 annotation of the Arabidopsis genome (45) are listed above the representation of a generic gene. The percent of all sense PACs that fall within these regions is listed ...
Fig. 2.
Position-by-position analysis of average base composition of the regions surrounding PACs within different genomic regions. (A) An χ2 analysis. For this analysis, the negative log of the χ2 metric was plotted as a function of position ...

In contrast, PACs located within CDS were very different (Fig. 2A). There was little preference for U across the entire region analyzed, but there was an increased occurrence of G throughout this region (Figs. 2 B and C). Although the regions within 30 nt of PACs located in 3′ UTRs had a distinctive sequential enrichment for A and then U (Fig. 2B), the corresponding regions surrounding CDS PACs were A+G-rich (Fig. 2C). For several reasons, it is apparent that PATs that map to CDS were not the results of internal priming at short tracts of A or within A+G-rich regions. The bioinformatic filters eliminate PATs that coincide with stretches of A that are six or more nucleotides in length. AAAA tracts that coincide with poly(A) sites constitute only 4.3% of all such tracts that are located within all annotated cDNAs, and poly(A) site-associated AAA tracts constitute 3.9% of all such tracts. Poly(A) site-associated AAAA and AAA tracts constitute 1.2% of all CDS-localized AAAA and AAA sequences, indicating that there is not an enrichment for PATs at short runs of A in protein-coding regions; if anything, the converse is true. More than 80% of genes that lack CDS-associated PACs possess A+G-rich tracts (see SI Methods), but about 70% of genes with CDS-associated PACs possess such sites. Finally, for a small set of representative CDS-localized sites, cDNA populations prepared using RNA to which an RNA adapter had been ligated and an adapter-specific reverse transcriptase primer (as opposed to an oligo-dT primer) included molecules with poly(A) tracts at CDS-localized positions predicted by the PAT data (Dataset S1). Taken together, these results indicate that CDS-associated PACs are not attributable to internal priming by reverse transcriptase.
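The internal-priming filter mentioned above (removal of PATs that coincide with runs of six or more A) can be sketched as below. Only the six-nucleotide threshold comes from the text; the size of the window examined around the mapped site (`flank`), the 0-based coordinate convention, and the function name are assumptions made for illustration, and the genomic sequence is assumed to be given on the strand of the transcript.

```python
import re

# Six or more consecutive A residues, the threshold given in the text.
A_RUN = re.compile(r"A{6,}")


def is_internal_priming_candidate(genome_seq, site, flank=10):
    """Return True if the window around `site` contains a run of >= 6 A.

    genome_seq: transcript-strand genomic sequence (assumed input).
    site:       0-based position of the mapped poly(A) site.
    flank:      nucleotides inspected on each side (an assumed value).
    """
    start = max(0, site - flank)
    end = min(len(genome_seq), site + flank + 1)
    return A_RUN.search(genome_seq[start:end].upper()) is not None


flagged = is_internal_priming_candidate("GCGCGCGCGCAAAAAAGCGCGCGC", 10)
kept = is_internal_priming_candidate("GCGCAAAAAGCGCGCGCGC", 5)
```

In this toy case the first site sits next to a genomic AAAAAA run and would be discarded, whereas the second, next to only five A residues, would be retained.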

About 6% of the sense PACs were located in introns. The average lengths of introns with and without PACs were 411 and 160 nt, and the medians were 270 and 99 nt, respectively (Fig. S3A); these differences are of high statistical significance (Wilcoxon tests, P value < 2.2e-16). The strengths of the 5′ splice sites (ss) and 3′ss of these introns were estimated using position-specific scoring matrix scores (5). As shown in Fig. S3 B and C, the 5′ss scores of introns with PACs are lower than those of introns without PACs (Wilcoxon tests, P value < 2.5e-9), whereas the 3′ss scores in these two groups were not appreciably different (P value > 0.06).
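As an illustration of how a position-specific scoring matrix assigns a strength to a splice site, the sketch below builds a log-odds matrix from a handful of aligned sites and scores candidate sequences against it. The training sequences are toy data, not the matrix of ref. 5; the uniform background frequency and the pseudocount are also assumptions (the Arabidopsis genome is A+T-rich, so a real analysis would likely use genome-specific background frequencies).

```python
import math

BASES = "ACGT"


def build_pssm(sites, background=0.25, pseudocount=1.0):
    """Log-odds matrix (base 2) from equal-length aligned sites."""
    length = len(sites[0])
    denom = len(sites) + pseudocount * len(BASES)
    pssm = []
    for i in range(length):
        column = {}
        for base in BASES:
            count = sum(1 for s in sites if s[i] == base)
            freq = (count + pseudocount) / denom
            column[base] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def score(pssm, seq):
    """Sum of per-position log-odds; higher means closer to the consensus."""
    return sum(column[base] for column, base in zip(pssm, seq))


# Toy 5' splice-site sequences (first six intron positions), for illustration only.
training = ["GTAAGT", "GTAAGT", "GTGAGT", "GTAAGA"]
pssm = build_pssm(training)
strong = score(pssm, "GTAAGT")
weak = score(pssm, "GTCCTC")
```

A sequence matching the consensus at every position receives the highest score, and each departure lowers it; score distributions for two groups of introns can then be compared with a rank-based test such as the Wilcoxon test used in the text.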

A small number (292) of sense PACs were located within 5′UTRs of corresponding genes. Of these sense PACs, 229 are flanked by upstream genes that are transcribed in the same direction; these may represent incomplete annotations and read-through transcription from characterized genes. The remaining 63 sense PACs, however, cannot be derived from neighboring transcription units in these ways.

Polyadenylation Sites Associated with Antisense Transcripts.

A large fraction of the PACs were derived from antisense transcripts. Of the 24,013 antisense PACs that were identified, 17,693 mapped to within the transcribed portions of the corresponding “sense” genes (Fig. 3). Of these antisense PACs, 10,163 were located in regions where adjacent genes are transcribed in a convergent fashion, so as to yield overlapping 3′UTRs or transcripts (classes 1 and 2 in Fig. 3; the distribution of PACs among the four classes is shown in Fig. 3A). An additional 5,334 were situated within 2,000 bp of an adjacent gene oriented in a way so as to yield potential read-through transcripts extending to the respective antisense PAC (class 3 in Fig. 3). Of these antisense PACs, 2,196 could not be associated with adjacent, convergently transcribed genes (class 4 in Fig. 3). Together, these antisense PACs affect 4,443 protein-coding genes as well as 92 genes that are not protein-coding; the latter include genes that encode 2 miRNAs, 61 “other RNA,” 20 pseudogenes, 7 transposable elements, 1 tRNA, and 1 snRNA (Dataset S1).

Fig. 3.
Classification of antisense PACs that map to annotated genes. The four classes mentioned in the text are illustrated and numbered as shown. In these representations, the “color-coding” of different regions (CDS, intron, and so forth) is ...

Of these antisense PACs, 6,320 mapped to the promoter regions of genes that are transcribed from the opposite strand. These antisense PACs covered the promoters of 3,338 sense genes (Dataset S1); 40% of these genes had more than one antisense PAC in their promoter regions. Of the antisense PACs, 3,510 were associated with an annotated gene that is transcribed in the opposite direction (class P1 in Fig. 4; the distribution of PAC among the four classes is shown in Fig. 4A); these instances involved divergently transcribed genes that are relatively small, so that the transcripts fall within the 2,000-bp window that is used to define the promoter database. These sites probably do not represent authentic antisense transcripts, but rather small divergently transcribed genes.

Fig. 4.
Classification of antisense PACs that map to promoters of annotated genes. The four classes mentioned in the text are illustrated and numbered as shown. As in Fig. 3, the presence of multiple arrows means that the results for antisense PACs that map to ...

Of the remaining 2,810 promoter-localized antisense PACs, 320 were associated with an overlapping gene that is transcribed in the opposite direction (class P2 in Fig. 4); this class of PAC occurred in 139 genes. A further 1,748 PACs, in 1,246 genes, were within 2,000 bp of an adjacent, convergently transcribed gene whose annotation does not suggest any overlap, but that may nonetheless be associated (through transcriptional read-through or gene misannotation) with the antisense PAC (class P3 in Fig. 4). In 533 genes, 742 promoter-situated antisense PACs could not be associated with nearby genes that are transcribed in the same orientation (class P4 in Fig. 4).

Of the 24,013 antisense PACs, 18,351 were seen only in leaves (12,345) or seeds (6,006). These tissue-specific PACs affect 5,886 and 2,989 genes in leaves and seeds, respectively. Of these PACs, 72% mapped to the transcription units of corresponding sense target genes, and 28% mapped to promoters. These results raise the possibility that tissue-specific antisense transcription might contribute to regulation of cognate sense target genes. To test this theory, two studies were conducted. For one study, possible “sense” targets of leaf-specific antisense PACs that were classified as shown in Fig. 3 were identified in the mRNA stability database described by Narsai et al. (22) and the average half-lives of possible target mRNAs determined. The results indicate that, as a whole, genes whose transcripts are predicted to overlap with mRNAs defined by antisense PACs (classes 1 and 2 in Fig. 3) were slightly less stable than the average Arabidopsis mRNA (Fig. 3B). Interestingly, the mRNA targets of leaf-specific antisense PACs that themselves could not be definitively associated with known transcripts (classes 3 and 4 in Fig. 3) tended to be more stable than the average Arabidopsis mRNA (Fig. 3B), with transcripts targeted by so-called “orphan” antisense PACs (class 4 in Fig. 3) being about 50% more stable than the average mRNA.

The second study entailed an expression analysis of genes targeted by tissue-specific antisense PACs, drawing upon microarray data available from the Nottingham Arabidopsis Stock Centre [NASC (23)]. Separate analyses of the targets associated with the tissue-specific PACs described in Figs. 3 and 4 were conducted, so as to distinguish between effects due to the generation of overlapping transcripts (Fig. 3) and those due to possible alterations of promoter function (Fig. 4). In both leaves and seeds, genes with overlapping 3′UTRs had expression levels that were largely indistinguishable from the overall average (Fig. 3C, case 1). Gene pairs that encode overlapping transcripts were expressed at slightly lower levels (Fig. 3C, case 2). In contrast, genes whose transcripts are complementary to PACs that may not be associated with known, annotated transcripts showed higher expression levels on average (Fig. 3C, cases 3 and 4); this was especially apparent in cases where the antisense PACs are orphans, with no known associated transcript (Fig. 3C, case 4). The numbers of genes available for analysis were lower for the sets shown in Fig. 4, and some of the analogous comparisons could not be made (indicated by “ND” in Fig. 4B). Those results that could be obtained showed that, in leaves, genes in classes “P1” and “P3” in Fig. 4 were largely indistinguishable from the average of all genes. In seeds, genes in classes P1, P3, and P4 all showed higher expression than average.

Analyses of Tissue-Specific Alternative Polyadenylation.

As was the case with the antisense PACs, the majority (73%) of sense PACs were found only in leaves (51%) or seeds (22%). (This is not unexpected; when considering all but the lower quartile of genes present on the NASC arrays used for the analysis described in Figs. 3C and 4B, 74% of the genes are “present” only in the leaf or seed arrays.) The genome-wide distributions of these PACs, in terms of location in UTRs, introns, and so forth, as well as being sense or antisense, were indistinguishable from the distributions shown in Fig. 1. Likewise, there was no indication that the polyadenylation signal in leaves and seeds is different, although there was a tendency for sequences farther than 150 nt upstream from the poly(A) site to be less U-rich in seeds than in leaves (Fig. S4). This reduced U content is consistent with a closer positioning of the poly(A) site to the protein-coding region (that is inherently lower in U-content than the UTRs). Direct examination showed this to be the case; thus, the median 3′UTR length in seeds was some 30 nt less than the median length of 3′UTRs in leaves (Fig. S5).

Twenty-seven percent of the sense PACs represent genes expressed in both seeds and leaves, and most of these possess two or more PACs. This finding raises the possibility that differential poly(A) site choice may contribute to the regulation of 3′UTR length in different tissues, along the lines of the differences seen in genes expressed only in leaves and seeds. To explore this possibility, genes that possess at least two PACs that are represented by enough PATs to permit identification of possible poly(A) site switching, and that show a dramatic difference in the uses of the most abundant two sites in leaves and seeds were identified. A total of 8,254 genes possess at least two PACs with tag per million ≥ 3 and are expressed in both leaves and seeds. Of these, 113 (or 1.4%) exhibited differential poly(A) site choice in leaves and seeds, with at least a 50% shift in polyadenylation from one site to another (Dataset S1). The overwhelming majority (193) of these 226 alternatively-used PACs fell within the 3′UTRs of the affected genes, but 22 alternative sites were located in protein-coding regions, 5 in introns, and 6 within 5′UTRs.
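The switching criterion can be made concrete with a small sketch. The precise definition used for Dataset S1 is not given in the main text, so the code below adopts one plausible reading and labels it as such: a PAC is eligible if its combined leaf-plus-seed abundance is at least 3 tags per million, the two most abundant eligible PACs of a gene are compared, and the "shift" is the absolute change between tissues in the fraction of tags assigned to the first of the two sites. The function names and the dictionary layout are hypothetical.

```python
MIN_TPM = 3.0    # tags per million; threshold quoted in the text
MIN_SHIFT = 0.5  # "at least a 50% shift" between the two sites


def usage_shift(leaf_tpm, seed_tpm, min_tpm=MIN_TPM):
    """Change in relative use of a gene's two most abundant PACs.

    leaf_tpm, seed_tpm: dicts mapping PAC id -> tags per million for one gene.
    Returns None when the gene does not qualify for the comparison.
    """
    pacs = sorted(set(leaf_tpm) | set(seed_tpm))
    combined = {p: leaf_tpm.get(p, 0.0) + seed_tpm.get(p, 0.0) for p in pacs}
    eligible = [p for p in pacs if combined[p] >= min_tpm]  # assumed reading
    if len(eligible) < 2:
        return None
    first, second = sorted(eligible, key=lambda p: -combined[p])[:2]

    leaf_total = leaf_tpm.get(first, 0.0) + leaf_tpm.get(second, 0.0)
    seed_total = seed_tpm.get(first, 0.0) + seed_tpm.get(second, 0.0)
    if leaf_total == 0 or seed_total == 0:
        return None  # not expressed in both tissues

    leaf_frac = leaf_tpm.get(first, 0.0) / leaf_total
    seed_frac = seed_tpm.get(first, 0.0) / seed_total
    return abs(leaf_frac - seed_frac)


def is_switch(leaf_tpm, seed_tpm, min_shift=MIN_SHIFT):
    """True if the gene shows at least `min_shift` change in site usage."""
    shift = usage_shift(leaf_tpm, seed_tpm)
    return shift is not None and shift >= min_shift


shift = usage_shift({"pac1": 90.0, "pac2": 10.0}, {"pac1": 20.0, "pac2": 80.0})
```

In this toy gene the first site accounts for 90% of tags in leaves but 20% in seeds, a shift of 0.7, so the gene would be counted as switching; a gene with 60% versus 50% usage would not.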


Scope and Implications of Alternative Polyadenylation in Arabidopsis.

Previous large-scale studies of APA in Arabidopsis have suggested that the phenomenon is widespread, affecting as many as 25% of all genes in the model plant (2). The results presented in this article suggest that the scope of the phenomenon is much broader, with 70% or more of all genes possessing two or more identifiable PACs. APA may occur within introns and even protein-coding regions. However, by far the most common sort of APA involves the presence of multiple PACs within the 3′UTR of a gene. This involvement affects about 13,000 genes, more than 70% of those to which PATs can be mapped, and carries with it significant potential for regulation; for example, in animals, alternative poly(A) site choice can affect both the length of the 3′UTR and consequently the inclusion or exclusion of regulatory features, such as microRNA target sites or other mRNA destabilizing sequences (6–8, 10). The observation that 3′UTRs in mRNAs expressed in seeds are shorter, on average, than those seen in mRNAs present in leaves (Fig. S5) raises the possibility of tissue-specific regulation via the 3′UTR in plants. However, tissue-specific poly(A) site choice in and of itself may not contribute greatly toward this possible mode of regulation. This theory follows from the observation that, of more than 8,000 genes expressed in both leaves and seeds that possess two or more PACs, only 113 exhibit detectable changes in their poly(A) site profiles, a finding that suggests that genome-wide switches in poly(A) site choice are not a feature of tissue-specific gene expression, at least as far as differences between leaves and seeds. However, that there are more than 100 instances of such switching is interesting, and raises the possibility that small numbers of genes may be regulated in part by tissue-specific changes in poly(A) site profiles. Studies with other tissues and developmental stages are needed to better gauge the scope of this phenomenon.

An additional mode of APA involves the use of poly(A) sites that are situated within introns. This mode of APA may affect as many as 2,100 genes in Arabidopsis. Interestingly, there seem to be no differences in terms of possible polyadenylation signals associated with poly(A) sites located in introns or 3′UTRs (Fig. 2). Instead, Arabidopsis introns that possess PACs are longer and have weaker 5′ss (Fig. S3), properties that are also seen in mammalian introns that possess polyadenylation sites (5). This finding suggests that APA within introns may be determined at the level of splicing, and that the dynamic between splicing and polyadenylation may be an important determinant in gene expression. Of course, the underlying mechanisms are likely to be more subtle and malleable than merely a summation of intron length, splice site strength, and presence of a polyadenylation signal. This finding follows from the realization that the intron properties reported here are global averages, and that instances of intronic APA involving short introns and introns with optimal splice sites can be seen; thus, although intron length and splice-site choice may be contributing factors in many cases, other mechanisms must also be at work.

Almost 10% of all PACs in Arabidopsis reside within protein-coding regions, and more than 4,000 genes may be affected by the process. These sites pose an interesting paradox, one that cannot be easily resolved at this time. Polyadenylation within a protein-coding region will usually result in an mRNA that lacks a translation termination codon. In yeast, such RNAs (termed “nonstop RNAs”) are unstable, being subject to degradation by a novel RNA surveillance mechanism (nonstop decay) (24, 25). The products of translation of nonstop RNAs are also unstable because of proteasome-mediated degradation (26). In mammals, nonstop RNAs are subject to translational inhibition, possibly mediated by ribosome stalling along the poly(A) tract (27). The inhibition of function of nonstop RNAs (by mRNA and protein degradation, and by translation inhibition) is typically associated with a larger suite of mRNA quality-control mechanisms that serve to limit the production of aberrant proteins, such as may arise from defective RNAs (28, 29); such defective RNAs are thought to be derived from errors in the process of mRNA biogenesis. The widespread occurrence of CDS-associated PACs (involving more than 10% of known Arabidopsis genes) (Fig. 1), the possible existence of a novel poly(A) signal associated with these PACs (Fig. 2), and the relative abundance of PATs that define CDS-associated PACs (Dataset S1) all suggest that the production of these RNAs is not a matter of mistakes in transcription, RNA processing, or other steps. Rather, these RNAs would seem to be a part of the “normal” transcriptional output of the plant. How the plant can tolerate such large quantities of RNAs that, in other systems, are deleterious to growth remains to be determined.

A very small number of PACs could be mapped to 5′UTRs. Most of these could conceivably be linked to adjacent genes, and probably represent instances where transcription extends into a downstream gene. However, for 63 of these PACs, this is not a viable explanation for their existence. These PACs may represent as-yet unidentified genes, or they may reflect mechanisms such as the termination of transcription shortly after initiation. Of course, this latter possibility seems remote, given that polyadenylation typically does not occur near promoters (30, 31). For the time being, the nature of these PACs remains unresolved.

Approximately 20% of the PACs identified in this study fall in unannotated parts of the Arabidopsis genome (Dataset S1). The nature of these PACs remains to be determined; they likely reflect a combination of incomplete annotation, new protein-coding genes, and noncanonical transcripts (such as noncoding RNAs or so-called cryptic unstable transcripts). Regardless, their existence is demonstrative of a considerable unidentified transcriptome in Arabidopsis.

Antisense RNA-Associated Polyadenylation in Arabidopsis.

A surprisingly large proportion of the PACs identified in this study, more than 33%, are derived from antisense transcripts that in turn potentially affect more than 6,000 genes. This result is in good agreement with other genome-wide studies in plants that suggest the existence of several thousand antisense transcripts (32–39). Forty-two percent of antisense PACs occur in cases of known overlapping transcripts (or parts thereof); 22% of antisense PACs fall within 2,000 bp downstream of an annotated gene, albeit outside of the annotated transcription unit itself. These PACs may represent instances of misannotation of the associated genes, extended read-through transcription of these annotated genes, or the existence of smaller, as-yet unidentified transcripts. An additional 9% of antisense PACs cannot be associated with nearby annotated genes and represent a population of unidentified transcripts. Twelve percent of the antisense PACs (classes P2, P3, and P4 in Fig. 4) fall within the promoter of the corresponding sense gene; of these, 7% may be associated with nearby genes, but 3% cannot be associated with an identifiable annotated gene. Thus, our results indicate that between 2,900 and 10,000 antisense PACs may be derived from as-yet unidentified transcription units.

Antisense transcription and the attendant cis-antisense RNAs may affect gene expression in numerous ways, in both negative and positive senses (40). Inspection of the sets of antisense PACs and respective targets provides possible examples of both sorts of regulation, in that targets that show either low or high expression (compared with the global average) can be associated with many of the leaf- or seed-specific antisense PACs (these examples may be identified in Dataset S1). However, there is an interesting difference among the classes of antisense PAC listed in Figs. 3 and 4, one that raises some questions as to the impact of antisense transcription on gene expression. Thus, the mRNA targets of known antisense transcripts (cases 1 and 2 in Fig. 3) tend to be somewhat less stable than the average Arabidopsis mRNA (Fig. 3B), and the expression levels of these genes are also slightly lower, on average, than the genome-wide average expression (Fig. 3C). The reductions in stability and expression levels in these cases are modest (at best); in this respect, the results described here are consistent with other genome-wide studies (33). However, the trend serves to reinforce a common perception that antisense transcription serves to down-regulate the expression of target genes, probably via the induction of siRNA production and attendant degradation of mRNAs homologous to the siRNAs.

Other classes of genes with antisense PACs that map to the respective transcription unit are more curious. As a whole, the putative “targets” of these classes, cases 3 and 4 in Fig. 3, show significantly greater mRNA stabilities (Fig. 3B) and higher expression levels (Fig. 3C) than the “average” Arabidopsis gene. Although these results are only correlative, they raise the possibility that antisense transcription may be rather widely associated with elevated gene expression. The positive effects on global mRNA stability shown in Fig. 3B (cases 3 and 4) and on overall expression levels shown in Fig. 3C (cases 3 and 4) could reflect a tendency for antisense RNAs to directly impact mRNA metabolism, perhaps by limiting the access of miRNAs, siRNAs, or regulatory proteins to the target mRNA (41). The positive effects seen in Fig. 4B (cases P3 and P4) suggest an additional possibility, that many of the antisense PACs documented here may represent the “unwanted” products of bidirectional transcription that has been noted in other systems (42, 43). Such products are to be expected from genes that are highly active and thus yield greater quantities of steady-state (“sense”) mRNAs.

Antisense transcription has been implicated in epigenetic regulation via mechanisms that impact promoters, such that siRNAs that target promoters lead to chromatin modifications and DNA methylation that in turn decrease promoter activity (16, 34, 44). It is tempting to speculate that the wide-spread promoter-localized antisense polyadenylation summarized in Fig. 4 is related to this mechanism of regulation. However, the results shown in Fig. 4B are not consistent with such a scenario, if the primary consequence of siRNA-mediated epigenetic regulation is a diminution of gene expression.


To summarize, next-generation DNA sequencing has been used to capture and characterize the mRNA-poly(A) junctions present in RNA isolated from Arabidopsis leaves and seeds. The results reveal a substantial extent of alternative poly(A) site choice in Arabidopsis, including a unique and unanticipated mode of polyadenylation that is directed to sites lying within protein-coding regions. The results also corroborate other studies that reveal an extensive degree of antisense transcription in Arabidopsis, and raise questions about the impacts that antisense transcription may have on gene expression.


Detailed methods for the preparation and analysis of cDNA tags that query the mRNA-poly(A) junction are described in the SI Methods. Briefly, cDNA was prepared from total or poly(A)-enriched RNA from Arabidopsis leaves or seeds using an anchored primer that contained sequences to permit incorporation, by PCR, of one of the two Illumina adapters required for paired-end sequencing (see Table S2 for a list of primers). Double-stranded cDNA was digested with one of two restriction enzymes (NlaIII or TaiI); these enzymes recognize four-base sequences and leave an unpaired four-base end. Adapters containing sequences to permit the incorporation, again by PCR, of the other Illumina-compatible sequence were ligated to the digested cDNA that was then purified and amplified using a limited number of PCR cycles. Amplified tags were submitted for Illumina sequencing.

The bioinformatic analysis of the raw sequence data used both in-house tools and third-party software for data processing, integration, and analysis. These tools are described in SI Methods. The mapping results and other metrics are summarized in Dataset S1 and Fig. S6.


We thank Carol Von Lanken for technical assistance. This work was supported by the US National Science Foundation (IOS-0817818), the National Institutes of Health (GM07719201), the Kentucky Science and Engineering Foundation (KSEF-2061-RDE-013), the University of Kentucky Executive Vice President for Research, the Miami University Research Office and IT Service (the use of Redhawk Cluster), the National Natural Science Foundation of China (60774033), the Special Research Fund for the Doctoral Program of Higher Education (20070384003 and 20090121110022), and Xiamen University's National 211 Project (0630-E62000). X.W. was a visiting doctoral student and was supported by Xiamen University and the Miami University Botany Department.


The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Data deposition: The sequence reported in this paper has been deposited in the National Center for Biotechnology Information Short Reads Archive (accession no. SRA028410).

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1019732108/-/DCSupplemental.


1. Lee JY, Yeh I, Park JY, Tian B. PolyA_DB 2: mRNA polyadenylation sites in vertebrate genes. Nucleic Acids Res. 2007;35(Database issue):D165–D168.
2. Meyers BC, et al. Analysis of the transcriptional complexity of Arabidopsis thaliana by massively parallel signature sequencing. Nat Biotechnol. 2004;22:1006–1011.
3. Shen Y, et al. Genome level analysis of rice mRNA 3′-end processing signals and alternative polyadenylation. Nucleic Acids Res. 2008;36:3150–3161.
4. Tian B, Hu J, Zhang H, Lutz CS. A large-scale analysis of mRNA polyadenylation of human and mouse genes. Nucleic Acids Res. 2005;33:201–212.
5. Tian B, Pan Z, Lee JY. Widespread mRNA polyadenylation events in introns indicate dynamic interplay between polyadenylation and splicing. Genome Res. 2007;17:156–165.
6. Ji Z, Lee JY, Pan Z, Jiang B, Tian B. Progressive lengthening of 3′ untranslated regions of mRNAs by alternative polyadenylation during mouse embryonic development. Proc Natl Acad Sci USA. 2009;106:7028–7033.
7. Ji Z, Tian B. Reprogramming of 3′ untranslated regions of mRNAs by alternative polyadenylation in generation of pluripotent stem cells from different cell types. PLoS ONE. 2009;4:e8419.
8. Mayr C, Bartel DP. Widespread shortening of 3′UTRs by alternative cleavage and polyadenylation activates oncogenes in cancer cells. Cell. 2009;138:673–684.
9. Flavell SW, et al. Genome-wide analysis of MEF2 transcriptional program reveals synaptic target genes and neuronal activity-dependent polyadenylation site selection. Neuron. 2008;60:1022–1038.
10. Mangone M, et al. The landscape of C. elegans 3′UTRs. Science. 2010;329:432–435.
11. Hunt AG. Messenger RNA 3′-end formation and the regulation of gene expression. In: Bassett CL, editor. Regulation of Gene Expression in Plants: The Role of Transcript Structure and Processing. New York: Springer; 2007. pp. 101–122.
12. Hunt AG. Messenger RNA 3′ end formation in plants. Curr Top Microbiol Immunol. 2008;326:151–177.
13. Xing D, Li QQ. Alternative polyadenylation and gene expression regulation in plants. Wiley Interdisciplinary Reviews: RNA. 2010;2:445–458.
14. Simpson GG, Dijkwel PP, Quesada V, Henderson I, Dean C. FY is an RNA 3′ end-processing factor that interacts with FCA to control the Arabidopsis floral transition. Cell. 2003;113:777–787.
15. Hornyik C, Terzi LC, Simpson GG. The spen family protein FPA controls alternative cleavage and polyadenylation of RNA. Dev Cell. 2010;18:203–213.
16. Liu F, Marquardt S, Lister C, Swiezewski S, Dean C. Targeted 3′ processing of antisense transcripts triggers Arabidopsis FLC chromatin silencing. Science. 2010;327:94–97.
17. Loke JC, et al. Compilation of mRNA polyadenylation signals in Arabidopsis revealed a new signal element and potential secondary structures. Plant Physiol. 2005;138:1457–1468.
18. Zhang J, et al. A polyadenylation factor subunit implicated in regulating oxidative signaling in Arabidopsis thaliana. PLoS ONE. 2008;3:e2410.
19. Graber JH, Cantor CR, Mohr SC, Smith TF. In silico detection of control signals: mRNA 3′-end-processing sequences in diverse species. Proc Natl Acad Sci USA. 1999;96:14055–14060.
20. Reddy AS. Alternative splicing of pre-messenger RNAs in plants in the genomic era. Annu Rev Plant Biol. 2007;58:267–294.
21. Schuler MA. Splice site requirements and switches in plants. Curr Top Microbiol Immunol. 2008;326:39–59.
22. Narsai R, et al. Genome-wide analysis of mRNA decay rates and their determinants in Arabidopsis thaliana. Plant Cell. 2007;19:3418–3436.
23. Craigon DJ, et al. NASCArrays: A repository for microarray data generated by NASC's transcriptomics service. Nucleic Acids Res. 2004;32(Database issue):D575–D577.
24. Frischmeyer PA, et al. An mRNA surveillance mechanism that eliminates transcripts lacking termination codons. Science. 2002;295:2258–2261.
25. van Hoof A, Frischmeyer PA, Dietz HC, Parker R. Exosome-mediated recognition and degradation of mRNAs lacking a termination codon. Science. 2002;295:2262–2264.
26. Ito-Harashima S, Kuroha K, Tatematsu T, Inada T. Translation of the poly(A) tail plays crucial roles in nonstop mRNA surveillance via translation repression and protein destabilization by proteasome in yeast. Genes Dev. 2007;21:519–524.
27. Akimitsu N, Tanaka J, Pelletier J. Translation of nonSTOP mRNA is repressed post-initiation in mammalian cells. EMBO J. 2007;26:2327–2338.
28. Atkinson GC, Baldauf SL, Hauryliuk V. Evolution of nonstop, no-go and nonsense-mediated mRNA decay and their termination factor-derived components. BMC Evol Biol. 2008;8:290.
29. Isken O, Maquat LE. Quality control of eukaryotic mRNA: Safeguarding cells from abnormal mRNA function. Genes Dev. 2007;21:1833–1856.
30. Guo J, Garrett M, Micklem G, Brogna S. Poly(A) signals located near the 5′ end of genes are silenced by a general mechanism that prevents premature 3′-end processing. Mol Cell Biol. 2011;31:639–651.
31. Sanfaçon H, Hohn T. Proximity to the promoter inhibits recognition of cauliflower mosaic virus polyadenylation signal. Nature. 1990;346:81–84.
32. Coram TE, Settles ML, Chen X. Large-scale analysis of antisense transcription in wheat using the Affymetrix GeneChip Wheat Genome Array. BMC Genomics. 2009;10:253.
33. Henz SR, et al. Distinct expression patterns of natural antisense transcripts in Arabidopsis. Plant Physiol. 2007;144:1247–1255.
34. Jen CH, Michalopoulos I, Westhead DR, Meyer P. Natural antisense transcripts with coding capacity in Arabidopsis may have a regulatory role that is not linked to double-stranded RNA degradation. Genome Biol. 2005;6(6):R51.
35. Lu C, et al. Genome-wide analysis for discovery of rice microRNAs reveals natural antisense microRNAs (nat-miRNAs). Proc Natl Acad Sci USA. 2008;105:4951–4956.
36. Osato N, et al. Antisense transcripts with rice full-length cDNAs. Genome Biol. 2003;5(1):R5.
37. Richardson CR, et al. Analysis of antisense expression by whole genome tiling microarrays and siRNAs suggests mis-annotation of Arabidopsis orphan protein-coding genes. PLoS ONE. 2010;5:e10710.
38. Stolc V, et al. Identification of transcribed sequences in Arabidopsis thaliana by using high-resolution genome tiling arrays. Proc Natl Acad Sci USA. 2005;102:4453–4458.
39. Zhou X, Sunkar R, Jin H, Zhu JK, Zhang W. Genome-wide identification and analysis of small RNAs originated from natural antisense transcripts in Oryza sativa. Genome Res. 2009;19:70–78.
40. Faghihi MA, Wahlestedt C. Regulatory roles of natural antisense transcripts. Nat Rev Mol Cell Biol. 2009;10:637–643.
41. Faghihi MA, et al. Evidence for natural antisense transcript-mediated inhibition of microRNA function. Genome Biol. 2010;11(5):R56.
42. Xu Z, et al. Bidirectional promoters generate pervasive transcription in yeast. Nature. 2009;457:1033–1037.
43. Neil H, et al. Widespread bidirectional promoters are the major source of cryptic transcripts in yeast. Nature. 2009;457:1038–1042.
44. Camblong J, Iglesias N, Fickentscher C, Dieppois G, Stutz F. Antisense RNA stabilization induces transcriptional gene silencing via histone deacetylation in S. cerevisiae. Cell. 2007;131:706–717.
45. Swarbreck D, et al. The Arabidopsis Information Resource (TAIR): Gene structure and function annotation. Nucleic Acids Res. 2008;36(Database issue):D1009–D1014.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences