Proc Natl Acad Sci U S A. May 31, 2011; 108(22): 9172–9177.
Published online May 12, 2011. doi:  10.1073/pnas.1100489108
PMCID: PMC3107329

Recurrent chimeric RNAs enriched in human prostate cancer identified by deep sequencing


Transcription-induced chimeric RNAs, possessing sequences from different genes, are expected to increase the proteomic diversity through chimeric proteins or altered regulation. Despite their importance, few studies have focused on chimeric RNAs, especially regarding their presence and roles in human cancers. By deep sequencing the transcriptome of 20 human prostate cancer and 10 matched benign prostate tissues, we obtained 1.3 billion sequence reads, which led to the identification of 2,369 chimeric RNA candidates. Chimeric RNAs occurred at significantly higher frequency in cancer than in matched benign samples. Experimental investigation of a selected set of 46 candidates led to the confirmation of 32 chimeric RNAs, of which 27 were highly recurrent and previously undescribed in prostate cancer. Importantly, a subset of these chimeras was present in prostate cancer cell lines, but not detectable in primary human prostate epithelium cells, implying their association with cancer. These chimeras contain discernable 5′ and 3′ splice sites at the RNA junction, indicating that their formation is mediated by splicing. Their presence is also largely independent of the expression of parental genes, suggesting that other factors are involved in their production and regulation. One chimera, TMEM79-SMG5, is highly differentially expressed in human cancer samples and is therefore a potential biomarker. The prevalence of chimeric RNAs may allow the limited number of human genes to encode a substantially larger number of RNAs and proteins, forming an additional layer of cellular complexity. Together, our results suggest that chimeric RNAs are widespread, and increased chimeric RNA events could represent a unique class of molecular alteration in cancer.

Chimeric RNAs unrelated to chromosomal rearrangements are primarily generated by two transcription-induced mechanisms (1, 2). The first one is “read-through/splicing,” where a chimeric RNA begins at the upstream gene and ends at a termination point of the adjacent downstream gene, with the region in between removed by splicing. The second one is “trans splicing.” Here two separately generated RNAs are spliced together to give rise to a single RNA. Both mechanisms result in fused transcripts that possess sequences from both genes. Although the exact cause and regulation of chimeric RNAs are unknown, chimeric RNAs, like gene fusions that result from chromosomal rearrangements, are expected to increase the proteomic diversity in cells through chimeric proteins or altered regulation of participating mRNAs.

Despite their importance, few studies have reported on chimeric RNAs, especially regarding their presence and roles in human cancers (3, 4). The lack of reports of chimeric RNAs may stem from the fact that current models of fusion transcripts primarily emphasize chromosomal rearrangements. Alternatively, chimeric RNAs may not have been observed previously because of methods of analyses such as microarrays that lack the necessary resolution for chimeric RNA discovery. Recent advances in high-throughput sequencing technology, which provides single-base resolution of transcribed RNAs, have enabled discoveries of chimeric RNAs from chronic myeloid leukemia cell lines and cultured human melanoma (5, 6). In prostate cancer, the same technology led to the identification of SLC45A3-ELK4. This chimeric RNA is significant in several ways. First, it was present at high levels in the urine of patients with prostate cancer (7, 8), pointing to the potential value of chimeric RNAs as noninvasive biomarkers. Second, its expression is androgen regulated and thus a potential target for androgen ablation therapy. Finally and most intriguingly, unlike the fusion gene TMPRSS2-ERG (9, 10) that results from chromosomal rearrangement, the chimera of SLC45A3-ELK4 appears to be generated by RNA processing without DNA-level rearrangement (7, 8). Thus, SLC45A3-ELK4 may represent a unique class of transcriptional events with important implications in cancer that may have been previously overlooked. This observation raises the possibility that chimeric RNAs in cancers are underinvestigated and mostly unknown.

In this study, we took advantage of the analytical power of paired-end high-throughput sequencing (11, 12) to characterize chimeric RNAs enriched in human prostate cancer. We sequenced the transcribed mRNA (transcriptome) from a cohort of patients with human prostate cancer, yielding 1.3 billion raw sequence reads. This sequencing coverage enabled a “deep” survey of chimeric RNAs expressed from the complex human genome, leading to the validation of 32 recurrent chimeric RNAs. Among them, 27 chimeric RNAs have not been described before. Importantly, one of these chimeras appeared to be highly cancer enriched, as it is expressed at significantly higher levels in human prostate cancers but present at very low levels in noncancer prostates. Our results suggest that recurrent chimeric RNAs are more common than previously thought. The fact that there are more chimeric RNAs in cancer than in matched benign samples raises the possibility that increased chimeric RNA events could represent one of the molecular consequences of cancer.


Identification of Novel Recurrent Chimeric RNAs in Prostate Cancer.

To identify chimeric RNAs that are expressed in prostate cancer, we sequenced the transcriptome of 20 cancer samples and 10 matched benign samples from patients with prostate adenocarcinoma who received no preoperative therapy before radical prostatectomy (Table S1). We used Illumina Genome Analyzer II for sequencing these samples to generate output sequences of paired 36-nucleotide reads. In all, 30 lanes of Illumina Genome Analyzer were used to yield ~1.3 billion raw sequence reads. Using stringent filters (SI Materials and Methods), we obtained nearly 500 million reads that were both uniquely mappable to the human genome and met our additional bioinformatic criteria (Table S2).

Our strategy for identifying chimeric RNAs was to search for paired “chimeric” reads with each read mapping to a different gene either in the genome or in the transcriptome (Fig. 1). To minimize the cases of false positives, we required each read to map to only one location in the genome even with the tolerance of two mismatches. In addition, an event was considered a chimeric candidate only if it was supported by paired chimeric reads from at least three different patient samples. We also filtered out cases of overlapping genes and homologous genes with shared sequences. This strategy led to the identification of 2,369 putative chimeric events (Fig. 2A). Analysis of paired chimeric reads against patients' ages indicates that there is no apparent correlation between age and the number of chimeric reads either in cancer or in matched benign samples. However, paired chimeric reads appear to be more abundant in cancer than in matched benign samples (P value = 0.00014) (Fig. 2B). Together, our results suggest that chimeric RNAs are widespread and occur more frequently in cancer.
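
The two filtering rules described above (mates of a pair mapping to different genes, and support from at least three patient samples) can be illustrated with a short Python sketch. This is not the authors' pipeline, which is detailed in SI Materials and Methods; the record layout, the function name `chimeric_candidates`, and the toy gene and sample names are assumptions made for illustration, and the unique-mapping and homology filters are taken as already applied.

```python
from collections import defaultdict

MIN_SAMPLES = 3  # from the text: support from at least three patient samples


def chimeric_candidates(read_pairs, min_samples=MIN_SAMPLES):
    """Return gene pairs supported by chimeric read pairs in >= min_samples samples.

    read_pairs: iterable of (sample_id, gene_of_read1, gene_of_read2) tuples.
    This layout is hypothetical; each mate is assumed to be uniquely mapped.
    """
    support = defaultdict(set)  # gene pair -> samples containing a chimeric pair
    for sample_id, gene1, gene2 in read_pairs:
        if gene1 is None or gene2 is None:
            continue  # a mate did not fall in an annotated gene
        if gene1 == gene2:
            continue  # both mates in the same gene: not a chimeric pair
        pair = tuple(sorted((gene1, gene2)))  # orientation is ignored in this sketch
        support[pair].add(sample_id)
    return {pair: samples for pair, samples in support.items()
            if len(samples) >= min_samples}


# Toy data, invented for illustration only
pairs = [
    ("P1", "GENE_A", "GENE_B"),
    ("P2", "GENE_B", "GENE_A"),
    ("P3", "GENE_A", "GENE_B"),
    ("P1", "GENE_C", "GENE_D"),  # seen in one sample only: rejected
    ("P2", "GENE_C", "GENE_C"),  # same gene on both mates: ignored
]
candidates = chimeric_candidates(pairs)
print(sorted(candidates))
```

Counting distinct samples rather than total reads mirrors the requirement that a candidate be seen in several patients, so a single sample with many chimeric pairs cannot by itself produce a candidate.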

Fig. 1.
Strategy for identification and validation of chimeric RNAs. Paired reads are mapped to both the genome and the transcriptome, and if each of the paired reads maps to different genes, then it is considered to be a paired “chimeric” read ...
Fig. 2.
Global analysis of chimeric RNAs. (A) A total of 2,369 putative chimeric events were identified on the basis of stringent bioinformatic criteria described in SI Materials and Methods. The frequency of distribution based on reads from all samples for all ...

To identify chimeric RNAs that are highly recurrent in human cancer samples, we required that a candidate chimeric RNA must be present in >50% of cancer samples used for sequencing (i.e., >10 of 20 total human cancer samples). The analyses narrowed down from 2,369 candidates to a set of 32 highly recurrent chimeric RNAs, and these are marked in purple in Fig. 2A. Among them, 27 are chimeric RNAs that have never been described before (Table S3). The remaining 5, TMPRSS2-ERG, SLC45A3-ELK4, ANKRD39-ANKRD23, HARS2-ZMAT2, and SMG5-PAQR6, are previously known chimeric RNAs in prostate cancer (7, 8, 13, 14), indicating that our procedure is able to “rediscover” previously verified chimeric RNAs. The detailed frequency of occurrence generated from this analysis for each chimeric event from each patient is shown in Table S4A.
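
The recurrence threshold can be written as a small filter. The sketch below is illustrative only: the function name `highly_recurrent`, the input layout, and the candidate and sample names are assumptions, and only the ">50% of cancer samples" rule comes from the text.

```python
def highly_recurrent(presence, n_cancer_samples, min_fraction=0.5):
    """Keep candidates present in strictly more than min_fraction of cancer samples.

    presence: dict mapping candidate name -> set of cancer sample IDs in which
    the candidate was detected (hypothetical layout).
    """
    return sorted(name for name, samples in presence.items()
                  if len(samples) / n_cancer_samples > min_fraction)


cancer_ids = [f"C{i}" for i in range(1, 21)]  # 20 cancer samples, as in the study

# Toy candidates, invented for illustration only
presence = {
    "CHIMERA_X": set(cancer_ids[:11]),  # 11 of 20: kept
    "CHIMERA_Y": set(cancer_ids[:10]),  # exactly 10 of 20: dropped, not > 50%
    "CHIMERA_Z": set(cancer_ids[:3]),   # 3 of 20: dropped
}
recurrent = highly_recurrent(presence, n_cancer_samples=len(cancer_ids))
print(recurrent)  # ['CHIMERA_X']
```

The strict inequality matters at the boundary: a candidate found in exactly 10 of 20 samples does not pass.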

To confirm that these identified chimeric RNAs are expressed, we designed specific primer pairs with each primer targeting one parental gene. We then used strand-specific RT-PCR to validate the presence of candidate chimeric RNAs in human samples that were used for paired-end sequencing. For all 32 candidates, including the 5 known chimeras that were used as controls, we were able to obtain RT-PCR products, indicating that these chimeric RNAs are indeed expressed at detectable levels. In addition, the strand-specific RT-PCR enabled the determination of 5′ and 3′ chimeric RNA partners. In most cases, we obtained only a single RT-PCR band for each chimera, which was then excised and subjected to traditional Sanger sequencing. This sequencing led to the identification of the exact junction of the chimeric RNAs (column 8 in Table S3). The results from Sanger sequencing also showed no apparent variation in the RNA junction for each excised band (Fig. S1A), suggesting that chimeric RNAs were generated by a precise mechanism. The newly obtained junction sequences enabled us to search among previously unmappable reads for paired “junction” reads that could now be specifically mapped to a junction site with one read and to a parental gene with another read (see strategy in Fig. 1). For each of the 32 chimeric RNAs, we were able to identify the corresponding junction reads, and their frequencies of occurrence are shown in Table S4B. Together, the results identified and confirmed the junction sequences and strongly support the presence of the corresponding chimeric RNAs in patients’ tissues.
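
The search for junction reads can be sketched as a string match against the Sanger-confirmed junction sequence. The following is a simplified illustration, not the procedure used in the study: it uses exact matching, and the function name `spans_junction`, the `min_overhang` requirement, and the toy sequences are assumptions.

```python
def spans_junction(read, junction_seq, junction_pos, min_overhang=5):
    """Return True if `read` matches `junction_seq` exactly and crosses the junction.

    junction_pos is the index of the first base contributed by the downstream
    gene. min_overhang (an assumption of this sketch) is the minimum number of
    bases the read must cover on each side of the junction.
    """
    start = junction_seq.find(read)
    while start != -1:  # check every occurrence, not just the first
        end = start + len(read)
        if start <= junction_pos - min_overhang and end >= junction_pos + min_overhang:
            return True
        start = junction_seq.find(read, start + 1)
    return False


# Toy sequences, invented for illustration only
upstream_exon = "ACGTACGTAC"
downstream_exon = "TTGGCCAATT"
junction_seq = upstream_exon + downstream_exon
junction_pos = len(upstream_exon)

reads = ["TACGTACTTGGC", "ACGTACGT", "GTACTTGG"]
junction_reads = [r for r in reads if spans_junction(r, junction_seq, junction_pos)]
print(junction_reads)  # ['TACGTACTTGGC']
```

Requiring an overhang on both sides avoids counting reads that touch the junction by only a base or two, which could equally well derive from a parental transcript.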

In addition to the 32 validated chimeras, 6 candidates could not be validated by RT-PCR. We also encountered 8 candidates whose RNA sequences extend continuously from one gene to the next without a break or new junction. These RNAs are considered simple read-through transcripts, not chimeric RNAs. None of these 14 RNAs are included in the set of 32 described earlier. In all, of a total of 46 putative chimeric events chosen for validation, 32 were experimentally validated as chimeric RNAs with confirmed junctions. Interestingly, all 32 of these chimeric RNAs are intrachromosomal events, with 4 separated by one or more genes, whereas the remaining 28 involve neighboring genes. It is important to note that the list of 2,369 chimeric RNA candidates includes 1,902 interchromosomal and 467 intrachromosomal chimeric events. However, the interchromosomal chimeric RNAs are not highly recurrent, and therefore they were not included in the set that was chosen for validation.

It is known that high-throughput sequencing of the transcriptome can accurately detect gene expression levels over a wide dynamic range (15, 16). Our analysis of gene expression levels using sequence data showed that the parental genes involved in the 32 chimeric RNAs are not biased toward highly expressed genes (Fig. S2). Therefore, the chimeric RNAs identified are not simply the result of random “transcriptional leakage” or artifacts of cDNA library construction, as both would generate more events in highly expressed genes. Furthermore, most of the parental genes involved in chimeric RNAs do not appear to be overexpressed in cancer vs. matched benign samples. Thus, the higher recurrence of chimeric RNAs in cancer samples cannot simply be attributed to higher expression levels of the parental genes in cancer samples. To estimate the relative expression levels of these chimeric RNAs, we combined all paired chimeric reads and junction reads associated with each chimera and compared them with those of previously known chimeric RNAs in prostate cancer. As shown in column 6 of Table S3, the chimeric RNAs identified in this study averaged 47 paired reads, with SLC16A8-BAIAP2L2 having the highest number, 99. This expression level can be compared with the 9 paired reads supporting SLC45A3-ELK4, the best-characterized chimeric RNA in prostate cancer. Thus, the chimeric RNAs identified in this study are not rare events, but recurrent at relatively high levels. However, their relative expression levels are lower than that of TMPRSS2-ERG (481 paired reads), which can be attributed to TMPRSS2-ERG being the result of a DNA-level rearrangement and being driven by a strong androgen-regulated promoter.

Recurrence of Chimeric RNAs in Cancer vs. Matched Benign Tissue.

To evaluate their differential expression in cancer vs. matched benign tissues, we plotted the number of supporting chimeric reads and junction reads for each chimeric RNA and its occurrence for each patient. As shown in Fig. 3, most of the verified chimeric RNAs appeared to be highly recurrent in cancer samples, although many of them also appeared in matched benign tissues from the radical prostatectomy specimens, albeit with lower frequency. This includes TMPRSS2-ERG used as our control, which is known to be cancer specific (9). The presence of these chimeric RNAs within the matched benign samples may represent a “field effect” within the histologically normal epithelium such that the benign epithelium may have multifocal premalignant lesions that precede histological changes (17). Alternatively, small foci of cancer may be present in some matched benign samples because the tissue used for RNA extraction is not evaluated histologically in its entirety. Nevertheless, the sequencing data from matched benign tissues enabled a valuable comparison of their chimeric RNAs with those of cancer tissues. To determine which chimeric RNAs are differentially expressed in cancer tissue, we used the nonparametric Kolmogorov–Smirnov test and identified seven chimeric RNAs with a P value more significant than that of TMPRSS2-ERG (P = 0.046) (Fig. 3). These include chimeras previously known to be elevated in prostate cancer tissue (SLC45A3-ELK4, P = 0.027) (7, 8) or found in cancer cell lines (ANKRD39-ANKRD23, P = 0.046) (13). The remaining five chimeric RNAs are TMEM79-SMG5, SLC16A8-BAIAP2L2, SLC44A4-EHMT2, BC035340-MCF2L, and TSPAN1-POMGNT1. Among them, TMEM79-SMG5 appeared to be most significant, with a P value of 0.004 that compares favorably to that of TMPRSS2-ERG.
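
For readers unfamiliar with the Kolmogorov–Smirnov test, the sketch below computes the two-sample test statistic, the largest gap between the empirical cumulative distributions of two groups. It is a minimal illustration rather than the analysis performed in the study: the read counts are invented, the function name `ks_statistic` is an assumption, and the P value calculation is omitted.

```python
from bisect import bisect_right


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: max over t of |F_x(t) - F_y(t)|."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for t in xs + ys:  # the maximum gap is attained at an observed value
        fx = bisect_right(xs, t) / len(xs)  # fraction of x values <= t
        fy = bisect_right(ys, t) / len(ys)  # fraction of y values <= t
        d = max(d, abs(fx - fy))
    return d


# Toy supporting-read counts per sample, invented for illustration only
cancer_reads = [5, 8, 12, 9, 7, 15]
benign_reads = [0, 1, 0, 2, 6]

d = ks_statistic(cancer_reads, benign_reads)
print(round(d, 3))  # 0.833
```

A statistic near 1 means the two distributions barely overlap, and a value near 0 means they are almost identical. In practice a library routine such as `scipy.stats.ks_2samp` would return the corresponding P value as well.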

Fig. 3.
Chimeric RNAs are highly recurrent in prostate cancer. The relative frequency of recurrence of chimeric RNAs in each cancer and matched benign sample is shown. Each vertical column represents data from one patient sample and each horizontal row represents ...

To further evaluate the differential expression of unique chimeric RNAs without the confounding issue of tissue impurity, we used RT-PCR to verify the presence of identified chimeric RNAs in an additional 10 human prostate cancer tissues and 5 normal prostate tissues from organ donors without prostate cancer (these samples were not used for transcriptome sequencing). Because this analysis compares prostate tissues from patients with cancer to those from patients without cancer, tissue impurity or field effects are not a consideration. As seen in Table S5, most of the validated chimeric RNAs were highly recurrent in the additional 10 patients with cancer, confirming the results obtained from our transcriptome sequencing. Importantly, two of the chimeric RNAs were found only in patients with cancer, but were undetectable in the donors without cancer, suggesting that their expression is enriched in cancer cells. These include TMEM79-SMG5 (9 of 10 cancer vs. 0 of 5 noncancer) and the known cancer-specific TMPRSS2-ERG (5 of 10 cancer vs. 0 of 5 noncancer). The absence of the TMEM79-SMG5 chimera in donors without cancer raised the question of whether this absence is due to the absence of expression of their participating parental genes. Because chimeric RNAs possess shared sequences with their parental genes, it is difficult to determine whether the reads obtained from transcriptome sequencing were derived from chimeric RNA or parental RNA. Therefore, to answer this question, we designed PCR primers targeting specific segments of participating parental mRNA that are absent in chimeric RNAs. The results of two chimeras (TMEM79-SMG5 and RASL12-OSTbeta), shown in Fig. 4A, indicate that the parental genes were clearly expressed in donors without cancer. The expression of parental TMEM79 and SMG5, however, did not lead to a detectable level of chimera expression in donors without cancer. 
This result is in contrast to that seen in the cancer tissues where the parental mRNAs and chimera TMEM79-SMG5 were both clearly expressed (Fig. 4A). Furthermore, the absence of expression of the 5′-parental gene does not exclude the generation of the chimeric RNA. This result is seen in RASL12-OSTbeta, where the 5′-parental gene RASL12 is not detected in cancer samples by RT-PCR; however, the chimera is expressed in cancer samples (Fig. 4A).

Fig. 4.
Examples of chimeric RNA and parental gene expression in additional patient cohorts and prostate cancer cell lines. (A) Results of RT-PCR analysis for the indicated chimeric RNAs and their parental genes in 10 cancer samples (C41–C50) and 5 noncancer ...

Expression of Chimeric RNAs in Prostate Cancer Cell Lines.

Because the chimeric RNAs we identified are highly recurrent in human prostate cancer samples, we speculated that these chimeric RNAs may also be present in established prostate cell lines. Therefore, we tested the expression of these chimeric RNAs in cancer cell lines of prostate origin, including androgen receptor negative cell lines (PC3 and DU145) and androgen-sensitive cell lines (LNCaP, VCaP, and LAPC4), as well as immortalized prostatic epithelial cells (PNT1a), for comparison with noncancer primary human prostate epithelial cells (PrEC). Our RT-PCR results showed that most of the 32 chimeric RNAs displayed varying expression profiles in cancer cell lines (Table S6). However, nine chimeric RNAs were conspicuously absent in noncancer PrEC (Table S6 and Fig. 4B). These chimeric RNAs include TMEM79-SMG5, the same chimera detectable by RT-PCR in patients with cancer but not in donors without cancer. The rest of the group comprises XPA-NCBP1, RASL12-OSTbeta, ELF3-RNPEP, ASTN2-PAPPA, GOLM1-MAK10, BC035340-MCF2L, NCAPD3-JAM3, and the control TMPRSS2-ERG that is known to be highly expressed in VCaP cells (Fig. 4B). The absence of these chimeric RNAs in PrEC also raised the question of whether this absence is because their parental genes are not expressed. Our RT-PCR results, however, showed that many of the parental genes were expressed in PrEC (Fig. 4C). For example, TMEM79 and SMG5 are both expressed; however, the corresponding chimeric RNA is undetectable. Similarly, both ELF3 and RNPEP are expressed in PrEC, but the chimeric RNA is not. Hence, the absence of these chimeric RNAs in PrEC cannot be attributed simply to the absence of expression of participating parental genes.

Mechanisms Responsible for Generation of Chimeric RNAs.

To validate DNA rearrangements, the standard FISH assay requires a minimum distance between genes in the order of 100–150 kb (8). This assay was possible for TMPRSS2-ERG (distance between genes is 3 Mb) (10). It is, however, not possible to use this assay for the rest of the validated chimeric RNAs on our list, as the distances between the ends of the two genes for these chimeras are ≤30 kb (see column 3 in Table S3). Therefore, to verify whether these unique chimeric RNAs are a result of genomic DNA rearrangement, we performed long-range PCR on genomic DNA from patients who displayed expression of the corresponding chimeric RNAs. The expected PCR product size from genomic DNA for the majority of the chimeras is <30 kb, and this size is within the detection range of long-range PCR. Our results, however, showed that there is no evidence for gross genomic rearrangement for the 27 chimeric RNAs (examples shown in Fig. S3), indicating that these chimeras are likely the result of RNA-level events similar to SLC45A3-ELK4 (7, 8). Intriguingly, further sequence analyses indicated that almost all of the validated chimeras contain a discernable 5′ splice site and a 3′ splice site at the RNA junction (see example in Fig. 5), indicating that the formation of these chimeras is mediated by splicing. To test whether the parental genes involved in chimeras prefer a certain range of intergenic distance, we calculated the distances between the ends of the two participating parental genes on the genome. As shown in column 3 of Table S3, parental genes of the chimeric RNAs tend to reside closer on the genome with a median distance of 2 kb compared with the known median of 48 kb for adjacent genes in the entire genome (18). 
The observation of splice sites at RNA junctions, together with the fact that the chimera formation prefers neighboring genes with short distances, and the lack of evidence of DNA-level rearrangements suggest that they may be generated by either read-through/splicing or trans splicing.
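
The intergenic distance comparison can be illustrated with a short calculation. The sketch below assumes a hypothetical gene record of (chromosome, start, end) and uses invented coordinates; it is not the authors' code and does not reproduce the values in Table S3.

```python
from statistics import median


def intergenic_distance(gene_a, gene_b):
    """Distance in bp between the nearest ends of two genes.

    Each gene is a (chrom, start, end) tuple with start <= end (hypothetical
    layout). Returns 0 for overlapping genes and None for genes on different
    chromosomes, where a distance is not defined.
    """
    chrom_a, start_a, end_a = gene_a
    chrom_b, start_b, end_b = gene_b
    if chrom_a != chrom_b:
        return None
    return max(0, max(start_a, start_b) - min(end_a, end_b))


# Toy parental gene pairs, invented for illustration only
gene_pairs = [
    (("chr1", 1000, 5000), ("chr1", 7000, 9000)),      # 2,000 bp apart
    (("chr2", 100, 900), ("chr2", 1400, 3000)),        # 500 bp apart
    (("chr3", 50000, 60000), ("chr3", 10000, 20000)),  # 30,000 bp apart
]
distances = [intergenic_distance(a, b) for a, b in gene_pairs]
print(median(distances))  # 2000
```

The median is used rather than the mean because a few widely separated pairs (such as TMPRSS2 and ERG, about 3 Mb apart) would otherwise dominate the summary.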

Fig. 5.
Schematic of TMEM79-SMG5 chimeric RNA in prostate cancer. Chimeric RNAs can result in a new 5′-UTR and truncated ORF as in the case of TMEM79-SMG5. In the pictogram, coding exons are represented by blocks connected by horizontal lines representing ...

TMEM79-SMG5 as a Potential Diagnostic Tool.

Our results from three different analyses, (i) transcriptome sequencing of human cancer and matched benign samples, (ii) RT-PCR of human cancer and noncancer donor samples, and (iii) RT-PCR of prostate cancer cell lines and primary prostate cells, all independently indicate that the chimera TMEM79-SMG5 is highly recurrent and appears to be enriched in cancer. To further evaluate the potential of this chimeric RNA as a diagnostic tool, we performed quantitative RT-PCR analysis in an extended cohort of human prostate samples from 54 patients with prostate cancer and 18 donors without cancer. In parallel, we also performed quantitative RT-PCR in PrEC, which represents primary prostate epithelium cells pooled from donors without cancer. As shown in Fig. 6, the chimera is expressed at a distinctly higher level in most samples from patients with cancer than in donors without cancer (P value = 1.048 × 10^-5), and it is undetectable in PrEC. The pronounced difference between expression levels of this chimera in patients with cancer and donors without cancer and its complete absence in PrEC indicate that TMEM79-SMG5 could be a valuable indicator to differentiate between patients with cancer and patients without cancer.

Fig. 6.
The TMEM79-SMG5 chimera as a potential diagnostic marker. (A) The relative expression level of the TMEM79-SMG5 chimera (normalized to GAPDH) in PrEC, 18 donors without cancer, and 54 patients with prostate cancer, determined by quantitative RT-PCR. These ...


This study provides a deep survey of chimeric RNAs expressed in human prostate cancers. We sequenced 20 cancer samples and 10 matched benign samples from patients with prostate adenocarcinoma. The 1.3 billion reads resulting from deep sequencing enabled the identification of 2,369 putative chimeric RNAs using stringent bioinformatic criteria. Altogether, we chose 46 events for experimental validation and of these, 32 events (~70%) could be verified by RT-PCR and their RNA junctions were determined by Sanger sequencing. Although this 70% validation rate is relatively high, the chimeric RNAs chosen for validation are among the highly recurrent ones. It remains to be determined whether the less recurrent chimeric RNAs could also be validated at a similar rate. Because our results were derived from 1.3 billion reads with each chimeric candidate supported by at least 3 different patient samples, increasing the sequencing depth or sample numbers should further increase the number of recurrent chimeric RNAs identified while reducing the incidence of false positives. If a significant portion of these 2,369 chimeric events are true, they would involve up to 3,144 parental genes that correspond to ~14% of the total human protein-coding genes. Because only a subset of the human genome is expressed in any given differentiated tissue such as prostate, the percentage of “transcribed” genome that is involved in chimeric RNAs could be significantly higher. Our deep-sequencing results therefore suggest that chimeric RNAs may be more prevalent in human cells than previously thought. The results thus significantly extend the observation made by a recent in silico analysis of ESTs and cDNAs in the National Center for Biotechnology Information databases, which independently estimated that as much as 5% of the human genome could transcribe chimeric sequences (19).
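
The counts quoted in this study can be cross-checked with a few lines of arithmetic. In the sketch below every input number is taken from the text; the implied total of protein-coding genes is a derived value that the text does not state, and it is shown only to make the "~14%" figure concrete.

```python
# Counts quoted in the text
candidates_total = 2369
interchromosomal, intrachromosomal = 1902, 467
chosen, validated, not_validated, read_through = 46, 32, 6, 8
parental_genes = 3144
fraction_of_coding_genes = 0.14  # "~14%" in the text, an approximate figure

# Internal consistency of the reported counts
assert interchromosomal + intrachromosomal == candidates_total
assert validated + not_validated + read_through == chosen

validation_rate = validated / chosen
# Derived here, not stated in the text: gene total implied by 3,144 genes being ~14%
implied_gene_total = parental_genes / fraction_of_coding_genes

print(f"validation rate: {validation_rate:.1%}")                       # validation rate: 69.6%
print(f"implied protein-coding gene count: {implied_gene_total:.0f}")  # implied protein-coding gene count: 22457
```

The validation rate of 32 of 46 rounds to the ~70% quoted above. Because the 14% is itself rounded, the implied gene total is indicative only.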

The biological consequences of chimeric RNAs would be similar to that of gene fusions resulting from DNA rearrangement, as both mechanisms generate chimeric transcripts that possess sequences from two genes. Chimeric RNAs, like gene fusions, are expected to increase the proteomic diversity in cells through chimeric proteins or altered regulation of participating mRNAs. However, the number of chimeric RNAs revealed by deep sequencing appears far greater than that of a few gene fusion events identified in prostate tissue so far (7, 13, 14). Furthermore, all of the validated chimeras in our study are recurrent in at least 50% of the patient samples. This result is in contrast to gene fusion events in prostate cancer that are rare and predominantly nonrecurrent with the exception of TMPRSS2-ERG (14). The prevalence of chimeric RNAs may allow the limited number of human genes to encode a larger number of RNAs and proteins, forming yet an additional layer of cellular complexity. Importantly, our results indicate that the number of total chimeric RNAs appeared significantly higher in cancer than in matched benign tissues (Fig. 2B; P value = 0.00014), raising the possibility that increased chimeric RNA events could represent one of the molecular consequences of cancer.

Our PCR analysis of patients’ genomic DNA indicates that all these validated chimeric RNAs are likely generated in the absence of DNA-level rearrangements. Furthermore, sequence analyses indicated that almost all of the RNA junctions contain a discernable 5′ splice site and a 3′ splice site at the junction site, indicating that the formation of chimeric RNAs is mediated by splicing. This mediation could result from two known transcription-induced mechanisms: read-through/splicing and trans splicing. However, on the basis of the RNA sequence information, it is impossible to discern between these two mechanisms. For example, a chimeric RNA joining neighboring genes located on the same strand could theoretically result from a single transcript reading through both genes followed by splicing or by splicing of two separate RNA molecules independently generated from neighboring genes.

Intriguingly, with the exception of TMPRSS2-ERG, SLC45A3-ELK4, and ANKRD39-ANKRD23, the set of recurrent chimeric RNAs that we identified in human prostate cancer tissues has virtually no overlap with the set of chimeric RNAs identified in prostate cancer cell lines using similar sequencing methods (7, 13). This result is not because the recurrent chimeric RNAs found in tissues are absent in cancer cell lines. In fact, we experimentally demonstrated that all of the 27 recurrent chimeras identified in our study displayed expression in at least one prostate cancer cell line. However, the reverse is not true, as the chimeric RNAs reported from cell line studies are largely absent from our extended list of 2,369 putative chimeric events. This result is unlikely to be due to the depth of sequencing, as highly recurrent chimeric RNAs will appear frequently in every study. More likely, the difference between these studies reflects the differences between cancer cell lines and tissue samples, and the chimeric RNA species prevailing in human cancer tissues appear to be dramatically different from those present in cancer cell lines. Because the composition of chimeric RNA species and their recurrence in this study are derived from human cancer tissues directly, the identified chimeras could be more relevant to tumor biology and may have greater clinical utility.

Several observations suggest that the generation of recurrent chimeric RNAs enriched in cancer is not the result of stochastic processes. First, we observed no variation in RNA junction sequence from sequenced RT-PCR bands or from junction reads. Each of the validated recurrent chimeras exhibits consistent and precise RNA junction within the same patient and across all patient samples. Second, random processes should generate more chimeras from highly expressed genes. However, the identified chimeric RNAs are not biased toward highly expressed genes (Fig. S2). Third, the expression of participating parental genes does not automatically lead to the generation of the corresponding chimeric RNAs. For example, both the parental genes SMG5 and TMEM79 are expressed in cancer samples as well as in samples from donors without cancer, but the presence of their chimeras is largely restricted to cancer samples (Fig. 4A). Furthermore, the absence of parental gene expression does not exclude the generation of the chimeric RNA. For example, RASL12 is not detected in cancer samples but the chimera RASL12-OSTbeta is expressed in cancer samples (Fig. 4A). Together, they suggest that the generation of chimeric RNAs, whether through trans splicing or read-through/splicing, is largely independent of the expression of parental genes and is tightly regulated. This result raises the possibility that other factors are involved in the production and regulation of chimeric RNAs.

The RNA strand information and junction sequences determined experimentally by RT-PCR enabled us to speculate about the potential biological consequences of these chimeric RNAs. Our analysis indicated that 18 of the 32 identified chimeric RNAs could lead to unique protein sequences due to fused or truncated protein-coding regions, and almost all would have either new 5′- or 3′-UTRs that could lead to altered gene regulation (column 7 in Table S3). For example, the recurrent chimera TMEM79-SMG5 (full-length sequence is provided in Table S3) results in the loss of the first exon of SMG5, with genome sequence from TMEM79 contributing a new 5′-UTR to the truncated SMG5 (Fig. 5 and Table S3). This change may lead to altered function and regulation of SMG5, a protein essential to the nonsense-mediated mRNA decay (NMD) (20). However, it is also possible that chimeric RNAs could function as noncoding RNAs or regulatory RNAs without a protein counterpart. These hypotheses remain to be tested.

Our results indicate that the chimera TMEM79-SMG5 is highly recurrent and enriched in cancer samples. First, RT-PCR analysis showed that the TMEM79-SMG5 chimera is absent in noncancer PrEC and is expressed only in the androgen-sensitive cancer cell lines VCaP and LNCaP, even though all these cell lines are epithelial in origin. Second, quantitative RT-PCR analysis on an extended cohort of patients confirmed that this chimera is highly expressed in patients with cancer, whereas its expression is far lower in donors without cancer and undetectable in PrEC. The most direct utility of the chimera TMEM79-SMG5 is perhaps its use as a diagnostic biomarker. Compared with TMPRSS2-ERG, which can be found in ~50% of the patients with cancer, TMEM79-SMG5 is detected by RT-PCR in ~90% of the cancer samples we tested. Compared with SLC45A3-ELK4, which is expressed at a relatively high level in patients without cancer (Fig. 4A), TMEM79-SMG5 appears to be largely restricted to cancer. Thus, TMEM79-SMG5 could have the potential to serve as a biomarker to separate patients with cancer from patients without cancer.

Materials and Methods

All radical prostatectomy tissue samples were obtained from the Baylor Prostate Specialized Programs of Research Excellence (SPORE) Tissue Core. RNA was extracted from samples using the RiboPure kit (Ambion). The human prostate cancer cell lines were maintained as detailed in SI Materials and Methods. Total RNA samples were processed for transcriptome sequencing using the Illumina mRNA-seq protocol. Bioinformatic identification of chimeric RNAs and their corresponding junction reads is detailed in SI Results. For validation of chimeric RNAs, RT was performed using SuperScript II (Invitrogen), and PCR was performed using primers listed in SI Materials and Methods. Detailed materials and methods are included in SI Materials and Methods.


We thank Dr. Patricia Castro for assisting us with the RNA samples from human cancer and matched benign tissues and Dr. Chad Creighton for helpful discussion. L.Y. is supported by Department of Defense Idea Development Award PC093918, American Cancer Society seed fund IRG-93-034-12, and a Duncan Scholar Award. W.L. is supported by Department of Defense Prostate Cancer Program PC094421 and a Duncan Scholar Award.


The authors declare no conflict of interest.

Data deposition: The RNA-seq data reported in this paper have been deposited in the Gene Expression Omnibus (GEO) database, www.ncbi.nlm.nih.gov/geo (accession no. GSE22260).

This article is a PNAS Direct Submission.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1100489108/-/DCSupplemental.


1. Gingeras TR. Implications of chimaeric non-co-linear transcripts. Nature. 2009;461:206–211.
2. Kaye FJ. Mutation-associated fusion cancer genes in solid tumors. Mol Cancer Ther. 2009;8:1399–1408.
3. Li H, Wang J, Mor G, Sklar J. A neoplastic gene fusion mimics trans-splicing of RNAs in normal human cells. Science. 2008;321:1357–1361.
4. Wang K, Ubriaco G, Sutherland LC. RBM6-RBM5 transcription-induced chimeras are differentially expressed in tumours. BMC Genomics. 2007;8:348.
5. Berger MF, et al. Integrative analysis of the melanoma transcriptome. Genome Res. 2010;20:413–427.
6. Levin JZ, et al. Targeted next-generation sequencing of a cancer transcriptome enhances detection of sequence variants and novel fusion transcripts. Genome Biol. 2009;10:R115.
7. Maher CA, et al. Transcriptome sequencing to detect gene fusions in cancer. Nature. 2009;458:97–101.
8. Rickman DS, et al. SLC45A3-ELK4 is a novel and frequent erythroblast transformation-specific fusion transcript in prostate cancer. Cancer Res. 2009;69:2734–2738.
9. Perner S, et al. TMPRSS2:ERG fusion-associated deletions provide insight into the heterogeneity of prostate cancer. Cancer Res. 2006;66:8337–8341.
10. Tomlins SA, et al. Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science. 2005;310:644–648.
11. Mardis ER. Next-generation DNA sequencing methods. Annu Rev Genomics Hum Genet. 2008;9:387–402.
12. Simon SA, et al. Short-read sequencing technologies for transcriptional analyses. Annu Rev Plant Biol. 2009;60:305–333.
13. Maher CA, et al. Chimeric transcript discovery by paired-end transcriptome sequencing. Proc Natl Acad Sci USA. 2009;106:12353–12358.
14. Pflueger D, et al. Discovery of non-ETS gene fusions in human prostate cancer using next-generation RNA sequencing. Genome Res. 2011;21:56–67.
15. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5:621–628.
16. Nagalakshmi U, et al. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science. 2008;320:1344–1349.
17. Nonn L, Ananthanarayanan V, Gann PH. Evidence for field cancerization of the prostate. Prostate. 2009;69:1470–1479.
18. Akiva P, et al. Transcription-mediated gene fusion in the human genome. Genome Res. 2006;16:30–36.
19. Parra G, et al. Tandem chimerism as a means to increase protein complexity in the human genome. Genome Res. 2006;16:37–44.
20. Ohnishi T, et al. Phosphorylation of hUPF1 induces formation of mRNA surveillance complexes containing hSMG-5 and hSMG-7. Mol Cell. 2003;12:1187–1200.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences