Proc Natl Acad Sci U S A. Jul 28, 2009; 106(30): 12353–12358.
Published online Jul 10, 2009. doi:  10.1073/pnas.0904720106
PMCID: PMC2708976
Cell Biology

Chimeric transcript discovery by paired-end transcriptome sequencing


Recurrent gene fusions are a prevalent class of mutations arising from the juxtaposition of 2 distinct regions, which can generate novel functional transcripts that could serve as valuable therapeutic targets in cancer. Therefore, we aim to establish a sensitive, high-throughput methodology to comprehensively catalog functional gene fusions in cancer by evaluating a paired-end transcriptome sequencing strategy. Not only did a paired-end approach provide a greater dynamic range in comparison with single read based approaches, but it clearly distinguished the high-level “driving” gene fusions, such as BCR-ABL1 and TMPRSS2-ERG, from potential lower level “passenger” gene fusions. Also, the comprehensiveness of a paired-end approach enabled the discovery of 12 previously undescribed gene fusions in 4 commonly used cell lines that eluded previous approaches. Using the paired-end transcriptome sequencing approach, we observed read-through mRNA chimeras, tissue-type restricted chimeras, converging transcripts, diverging transcripts, and overlapping mRNA transcripts. Last, we successfully used paired-end transcriptome sequencing to detect previously undescribed ETS gene fusions in prostate tumors. Together, this study establishes a highly specific and sensitive approach for accurately and comprehensively cataloguing chimeras within a sample using paired-end transcriptome sequencing.

Keywords: bioinformatics, gene fusions, prostate cancer, breast cancer, RNA-Seq

Gene fusions, which result from chromosomal rearrangements, are one of the most common classes of genetic alterations (1). Intriguingly, >80% of all known gene fusions are attributed to leukemias, lymphomas, and bone and soft tissue sarcomas, which account for only 10% of all human cancers. In contrast, common epithelial cancers, which account for 80% of cancer-related deaths, account for only 10% of known recurrent gene fusions (2–4). However, the recent discovery of the recurrent gene fusions TMPRSS2-ERG in a majority of prostate cancers (5, 6) and EML4-ALK in non-small-cell lung cancer (NSCLC) (7) has expanded the realm of gene fusions as an oncogenic mechanism in common solid cancers. Also, the restricted expression of gene fusions to cancer cells makes them desirable therapeutic targets. One successful example is imatinib mesylate (Gleevec), which targets BCR-ABL1 in chronic myeloid leukemia (CML) (8–10). Therefore, the identification of novel gene fusions in a broad range of cancers is of enormous therapeutic significance.

The lack of known gene fusions in epithelial cancers has been attributed to their clonal heterogeneity and to the technical limitations of cytogenetic analysis, spectral karyotyping, FISH, and microarray-based comparative genomic hybridization (aCGH). Not surprisingly, TMPRSS2-ERG was discovered by circumventing these limitations through bioinformatics analysis of gene expression data to nominate genes with marked overexpression, or outliers, a signature of a fusion event (6). Building on this success, more recent strategies have adopted unbiased high-throughput approaches, with increased resolution, for genome-wide detection of chromosomal rearrangements in cancer involving BAC end sequencing (11), fosmid paired-end sequences (12), serial analysis of gene expression (SAGE)-like sequencing (13), and next-generation DNA sequencing (14). Despite unveiling many novel genomic rearrangements, solid tumors accumulate multiple nonspecific aberrations throughout tumor progression; thus, making causal and driver aberrations indistinguishable from secondary and insignificant mutations, respectively.

The deep unbiased view of a cancer cell enabled by massively parallel transcriptome sequencing has greatly facilitated gene fusion discovery. As shown in our previous work, integrating long and short read transcriptome sequencing technologies was an effective approach for enriching “expressed” fusion transcripts (15). However, despite the success of this methodology, it required substantial overhead to leverage 2 sequencing platforms. Therefore, in this study, we adopted a single platform paired-end strategy to comprehensively elucidate novel chimeric events in cancer transcriptomes. Not only was using this single platform more economical, but it allowed us to map chimeric mRNA more comprehensively, home in on driver gene fusion products due to its quantitative nature, and observe rare classes of transcripts that were overlapping, diverging, or converging.


Chimera Discovery via Paired-End Transcriptome Sequencing.

Here, we employ transcriptome sequencing to restrict chimera nominations to “expressed sequences,” thus enriching for potentially functional mutations. To evaluate massively parallel paired-end transcriptome sequencing to identify novel gene fusions, we generated cDNA libraries from the prostate cancer cell line VCaP, CML cell line K562, universal human reference total RNA (UHR; Stratagene), and human brain reference (HBR) total RNA (Ambion). Using the Illumina Genome Analyzer II, we generated 16.9 million VCaP, 20.7 million K562, 25.5 million UHR, and 23.6 million HBR transcriptome mate pairs (2 × 50 nt). The mate pairs were mapped against the transcriptome and categorized as (i) mapping to same gene, (ii) mapping to different genes (chimera candidates), (iii) nonmapping, (iv) mitochondrial, (v) quality control, or (vi) ribosomal (Table S1). Overall, the chimera candidates represent a minor fraction of the mate pairs, comprising <1% of the reads for each sample.

We believe that a paired-end strategy offers multiple advantages over single read based approaches such as alleviating the reliance on sequencing the reads traversing the fusion junction, increased coverage provided by sequencing reads from the ends of a transcribed fragment, and the ability to resolve ambiguous mappings (Fig. S1). Therefore, to nominate chimeras, we leveraged each of these aspects in our bioinformatics analysis. We focused on both mate pairs encompassing and/or spanning the fusion junction by analyzing 2 main categories of sequence reads: chimera candidates and nonmapping (Fig. S2A). The resulting chimera candidates from the nonmapping category that span the fusion boundary were merged with the chimeras found to encompass the fusion boundary revealing 119, 144, 205, and 294 chimeras in VCaP, K562, HBR, and UHR, respectively.

Comparison of a Paired-End Strategy Against Existing Single Read Approaches.

To assess the merit of adopting a paired-end transcriptome approach, we compared the results against existing single read approaches. Although current RNA sequencing (RNA-Seq) studies have been using 36-nt single reads (16, 17), we increased the likelihood of spanning a fusion junction by generating 100-nt long single reads using the Illumina Genome Analyzer II. Also, we chose this length because it would facilitate a more comparable amount of sequencing time as required for sequencing both 50-nt mate pairs. In total, we generated 7.0, 59.4, and 53.0 million 100-nt transcriptome reads for VCaP, UHR, and HBR, respectively, for comparison against paired-end transcriptome reads from matched samples.

Because the UHR is a mixture of cancer cell lines, we expected to find numerous previously identified gene fusions. Therefore, we first assessed the depth of coverage of a paired-end approach against long single reads by directly comparing the normalized frequency of sequence reads supporting 4 previously identified gene fusions [TMPRSS2-ERG (5, 6), BCR-ABL1 (18), BCAS4-BCAS3 (19), and ARFGEF2-SULF2 (20)]. As shown in Fig. 1A, we observed a marked enrichment of paired-end reads compared with long single reads for each of these well characterized gene fusions.

Fig. 1.
Dynamic range and sensitivity of the paired-end transcriptome analysis relative to single read approaches. (A) Comparison of paired-end (blue) and long single transcriptome reads (black) supporting known gene fusions TMPRSS2-ERG, BCR-ABL1, BCAS4-BCAS3 ...

We observed that TMPRSS2-ERG had a >10-fold enrichment between paired-end and single read approaches. The schematic representation in Fig. 1B indicates the distribution of reads confirming the TMPRSS2-ERG gene fusion from both paired-end and single read sequencing. As expected, the longer reads improve the number of reads spanning known gene fusions. For example, had we sequenced a single 36-mer (shown in red text), 11 of the 17 chimeras, shown in the bottom portion of the long single reads, would not have spanned the gene fusion boundary, but instead, would have terminated before the junction and, therefore, only aligned to TMPRSS2. However, despite the improved results only 17 chimeric reads were generated from 7.0 million long single read sequences. In contrast, paired-end sequencing resulted in 552 reads supporting the TMPRSS2-ERG gene fusion from ≈17 million sequences.
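As a rough check on the >10-fold figure, the read counts quoted above can be normalized per million sequenced reads. The sketch below is illustrative only: the helper name `reads_per_million` is hypothetical, and it assumes the 16.9 million VCaP mate pairs reported earlier as the paired-end denominator. The normalization used for Fig. 1A may differ in detail.

```python
def reads_per_million(supporting_reads: int, total_reads_millions: float) -> float:
    """Supporting reads normalized per million sequenced reads (or mate pairs)."""
    return supporting_reads / total_reads_millions


# Counts quoted in the text for TMPRSS2-ERG in VCaP.
single_read_rpm = reads_per_million(17, 7.0)    # 100-nt single reads
paired_end_rpm = reads_per_million(552, 16.9)   # 2 x 50-nt mate pairs

# Ratio of normalized support, paired-end over single read.
fold_enrichment = paired_end_rpm / single_read_rpm

print(f"single read: {single_read_rpm:.2f} per million")
print(f"paired-end:  {paired_end_rpm:.2f} per million")
print(f"fold enrichment: {fold_enrichment:.1f}")
```

Under these assumptions the paired-end support is more than 10 times the single read support, consistent with the enrichment stated above.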

Because we are using sequence based evidence to nominate a chimera, we hypothesized that the approach providing the maximum nucleotide coverage is more likely to capture a fusion junction. We calculated an in silico insert size for each sample using mate pairs aligning to the same gene, and found a mean insert size of ≈200 nt. Then, we compared the total coverage from single reads (the total number of pass filter reads multiplied by the read length) with that of the paired-end approach (for each mate pair, the sum of the insert size and the length of each read) (Fig. S2B). Overall, we observed an average coverage of 848.7 and 757.3 MB using single read technology, compared with 2,553.3 and 2,363 MB from paired-end in UHR and HBR, respectively. This ≈3-fold increase in coverage in the paired-end samples compared with the long read approach, per lane, could explain the increased dynamic range we observed using a paired-end strategy.
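The two coverage definitions can be written out as a small sketch. It assumes that "insert size" refers to the unsequenced stretch between the two mates, so that each mate pair covers the insert plus both reads; the function names and the per-lane count of 1 million are hypothetical and chosen only to make the comparison easy to read.

```python
def single_read_coverage_nt(n_reads: int, read_length: int) -> int:
    """Coverage from single reads: pass filter reads times read length."""
    return n_reads * read_length


def paired_end_coverage_nt(n_pairs: int, read_length: int, insert_size: int) -> int:
    """Coverage from mate pairs: each pair covers the insert plus both mates."""
    return n_pairs * (insert_size + 2 * read_length)


# Hypothetical per-lane yield, identical for both strategies (an assumption).
units_per_lane = 1_000_000

single = single_read_coverage_nt(units_per_lane, read_length=100)
paired = paired_end_coverage_nt(units_per_lane, read_length=50, insert_size=200)

print(f"single read: {single / 1e6:.0f} Mb per lane")
print(f"paired-end:  {paired / 1e6:.0f} Mb per lane")
print(f"ratio: {paired / single:.1f}")
```

With a 200-nt insert and 50-nt mates, each mate pair accounts for 300 nt against 100 nt for a long single read, which is where a ≈3-fold difference would come from when the number of sequenced units per lane is similar.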

Next we wanted to identify chimeras common to both strategies. The long read approach nominated 1,375 and 1,228 chimeras, whereas with a paired-end strategy, we only nominated 225 and 144 chimeras in UHR and HBR, respectively. As shown in the Venn diagram (Fig. 1C), there were 32 and 31 candidates common to both technologies for UHR and HBR, respectively. Within the common UHR chimeric candidates, we observed previously identified gene fusions BCAS4-BCAS3, BCR-ABL1, ARFGEF2-SULF2, and RPS6KB1-TMEM49 (13). The remaining chimeras, nominated by both approaches, represent a high fidelity set. Therefore, to further assess whether a paired-end strategy has an increased dynamic range, we compared the ratio of normalized mate pair reads against single reads for the remaining chimeras common to both technologies. We observed that 93.5 and 93.9% of UHR and HBR candidates, respectively, had a higher ratio of normalized mate pair reads to single reads (Table S2), confirming the increased dynamic range offered by a paired-end strategy. We hypothesize that the greater number of nominated candidates specific to the long read approach represents an enrichment of false positives, as observed when using the 454 long read technology (15, 21).

Paired-End Approach Reveals Novel Gene Fusions.

We were interested in determining whether the paired-end libraries could detect novel gene fusions. Among the top chimeras nominated from VCaP, HBR, UHR, and K562, many were already known, including TMPRSS2-ERG, BCAS4-BCAS3, BCR-ABL1, USP10-ZDHHC7, and ARFGEF2-SULF2. Also ranking among these well known gene fusions in UHR was a fusion on chromosome 13 between GAS6 and RASA3 (Fig. S3A and Table S2). The fact that GAS6-RASA3 ranked higher than BCR-ABL1 suggests that it may be a driving fusion in one of the cancer cell lines in the RNA pool.

Another observation was that there were 2 candidates among the top 10 found in both UHR and K562. This observation was intriguing, because hematological malignancies are not considered to have multiple gene fusion events. In addition to BCR-ABL1, we were able to detect a previously undescribed interchromosomal gene fusion between exon 23 of NUP214 located at chromosome 9q34.13 with exon 2 of XKR3 located at chromosome 22q11.1. NUP214 and XKR3 reside on chromosomes 9 and 22 in close proximity to ABL1 and BCR, respectively (Fig. S3B). We confirmed the presence of NUP214-XKR3 in K562 cells using qRT-PCR, but were unable to detect it across an additional 5 CML cell lines tested (SUP-B15, MEG-01, KU812, GDM-1, and Kasumi-4) (Fig. S3C). These results suggest that NUP214-XKR3 is a “private” fusion that originated from additional complex rearrangements after the translocation that generated BCR-ABL1 and a focal amplification of both gene regions.

Although we were able to detect BCR-ABL1 and NUP214-XKR3 in both UHR and K562, there was a marked reduction in the mate pairs supporting these fusions in UHR. A diluted signal is expected, because UHR is a pool of samples; nevertheless, this result provides evidence that pooling samples can serve as a useful approach for nominating top expressing chimeras, and potentially enrich for “driver” chimeras.

Previously Undescribed Prostate Gene Fusions.

Our previous work using integrative transcriptome sequencing to detect gene fusions in cancer revealed multiple gene fusions, demonstrating the complexity of the prostate transcriptomes of VCaP and LNCaP (15). Here, we exploit the comprehensiveness of a paired-end strategy on the same cell lines to reveal novel chimeras. In the circular plot shown in Fig. S4A, we displayed all experimentally validated paired-end chimeras in the larger red circle. We found that all of the previously discovered chimeras in VCaP and LNCaP comprised a subset of the paired-end candidates, as displayed in the inner black circle.

As expected, TMPRSS2-ERG was the top VCaP candidate. In addition to “rediscovering” the USP10-ZDHHC7, HJURP-INPP4A, and EIF4E2-HJURP gene fusions, a paired-end approach revealed several previously undescribed gene fusions in VCaP. One such example was an interchromosomal gene fusion between ZDHHC7, on chromosome 16, with ABCB9, residing on chromosome 12, that was validated by qRT-PCR (Fig. S3D). Interestingly, the 5′ partner, ZDHHC7, had previously been validated as a complex intrachromosomal gene fusion with USP10 (15). Both fusions have mate pairs aligning to the same exon of ZDHHC7 (15), suggesting that their breakpoints are in adjacent introns (Fig. S3D).

Another previously undescribed VCaP interchromosomal gene fusion that we discovered was between exon 2 of TIA1, residing on chromosome 2, with exon 3 of DIRC2, or disrupted in renal carcinoma 2, located on chromosome 3. TIA1-DIRC2 was validated by qRT-PCR and FISH (Fig. S5). In total, we confirmed an additional 4 VCaP and 2 LNCaP chimeras (Fig. S6). Overall, these fusions demonstrate that paired-end transcriptome sequencing can nominate candidates that have eluded previous techniques, including other massively parallel transcriptome sequencing approaches.

Distinguishing Causal Gene Fusions from Secondary Mutations.

We were next interested in determining whether the dynamic range provided by paired-end sequencing can distinguish known high-level “driving” gene fusions, such as known recurrent gene fusions BCR-ABL1 and TMPRSS2-ERG, from lower level “passenger” fusions. Therefore, we plotted the normalized mate pair coverage at the fusion boundary for all experimentally validated gene fusions for the 2 cell lines that we sequenced harboring recurrent gene fusions, VCaP and K562. As shown in Fig. S4B, we observed that both driver fusions, TMPRSS2-ERG and BCR-ABL1, show the highest expression among the validated chimeras in VCaP and K562, respectively. This observation suggests a paired-end nomination strategy for selecting putative driver gene fusions among private nonspecific gene fusions that lack detectable levels of expression across a panel of samples (15).

Previously Undescribed Breast Cancer Gene Fusions.

Our ability to detect previously undescribed prostate gene fusions in VCaP and LNCaP demonstrated the comprehensiveness of paired-end transcriptome sequencing compared with an integrated approach, using short and long transcriptome reads. Therefore, we extended our paired-end analysis by using breast cancer cell line MCF-7, which has been mined for fusions using numerous approaches such as expressed sequence tags (ESTs) (22), array CGH (23), single nucleotide polymorphism arrays (24), gene expression arrays (25), end sequence profiling (20, 26), and paired-end diTag (PET) (13).

A histogram (Fig. S4C) of the top ranking MCF-7 candidates highlights BCAS4-BCAS3 and ARFGEF2-SULF2 as the top 2 ranking candidates, whereas other previously reported candidates, such as SULF2-PRICKLE, DEPDC1B-ELOVL7, RPS6KB1-TMEM49, and CXorf15-SYAP1, were interspersed among a comprehensive list of previously undescribed putative chimeras. To confirm that these previously undescribed nominations were not false positives, we experimentally validated 2 interchromosomal and 3 intrachromosomal candidates using qRT-PCR (Fig. S6). Overall, not only was a paired-end approach able to detect gene fusions that have eluded numerous existing technologies, it also revealed 5 previously undescribed mutations in breast cancer.

RNA-Based Chimeras.

Although many of the inter and intrachromosomal rearrangements that we nominated were found within a single sample, we observed many chimeric events shared across samples. We identified 11 chimeric events common to UHR, VCaP, K562, and HBR (Table S3). Via heatmap representation (Fig. 2A) of the normalized frequency of mate pairs supporting each chimeric event, we can observe these events are broadly transcribed in contrast to the top restricted chimeric events. Also, we found that 100% of the broadly expressed chimeras resided adjacent to one another on the genome, whereas only 7.7% of the restricted candidates were neighboring genes. This discrepancy can be explained by the enrichment of inter and intrachromosomal rearrangements in the restricted set.

Fig. 2.
RNA based chimeras. (A) Heatmaps showing the normalized number of reads supporting each read-through chimera across samples ranging from 0 (white) to 30 (red). (Upper) The heatmap highlights broadly expressed chimeras in UHR, HBR, VCaP, and K562. (Lower ...

Unlike previously characterized restricted read-throughs, such as SLC45A3-ELK4 (15), which involve adjacent genes in the same orientation, we found that the majority of the broadly expressed chimera candidates resided adjacent to one another in different orientations. Therefore, we have categorized these events as (i) read-throughs, adjacent genes in the same orientation, (ii) diverging genes, adjacent genes in opposite orientation whose 5′ ends are in close proximity, (iii) convergent genes, adjacent genes in opposite orientation whose 3′ ends are in close proximity, and (iv) overlapping genes, adjacent genes that share common exons (Fig. 2B). Based on this classification, we found 1 read-through, 2 convergent genes, 6 diverging genes, and 2 overlapping genes. Also, we found that ≈81.8% of these chimeras had at least 1 supporting EST, providing independent confirmation of the event (Table S3). In contrast to the paired-end approach, single read approaches would likely miss these instances, because each read would align to its respective gene based on the current annotations (Fig. 2C). Also, these instances may represent extensions of a transcriptional unit, which would not be detectable by a single read approach that identifies chimeric reads that span exon boundaries of independent genes. Overall, we believe that many of these broadly expressed RNA chimeras represent instances where mate pairs are revealing previously undescribed annotation for a transcriptional unit.
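The four categories can be expressed as a simple decision rule on strand and genomic order. The sketch below is a minimal illustration, not the classification code used in the study: the `Gene` record, its fields, and the toy coordinates are hypothetical, and the function assumes the two genes are already known to be neighbors on the same chromosome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int                     # leftmost genomic coordinate
    end: int                       # rightmost genomic coordinate
    strand: str                    # "+" or "-"
    exons: frozenset = frozenset() # (start, end) tuples; hypothetical representation


def classify_adjacent_pair(a: Gene, b: Gene) -> str:
    """Classify two neighboring genes into the four categories in the text."""
    if a.exons & b.exons:
        return "overlapping"       # the genes share at least one exon
    if a.strand == b.strand:
        return "read-through"      # same orientation
    left, right = sorted((a, b), key=lambda g: g.start)
    if left.strand == "-" and right.strand == "+":
        return "diverging"         # 5' ends face each other
    return "convergent"            # left "+", right "-": 3' ends face each other


# Toy genes with made-up coordinates.
g1 = Gene("GENE1", 1_000, 5_000, "+")
g2 = Gene("GENE2", 6_000, 9_000, "+")
g3 = Gene("GENE3", 6_000, 9_000, "-")
g4 = Gene("GENE4", 1_000, 5_000, "-")

print(classify_adjacent_pair(g1, g2))  # same strand
print(classify_adjacent_pair(g1, g3))  # "+" then "-"
print(classify_adjacent_pair(g4, g2))  # "-" then "+"
```

The strand test relies on the convention that a gene on the plus strand has its 5′ end at its leftmost coordinate, whereas a gene on the minus strand has its 5′ end at its rightmost coordinate.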

Previously Undescribed ETS Gene Fusions in Clinically Localized Prostate Cancer.

Given the high prevalence of gene fusions involving ETS oncogenic transcription factor family members in prostate tumors, we applied paired-end transcriptome sequencing for gene fusion discovery in prostate tumors lacking previously reported ETS fusions. For 2 prostate tumors, aT52 and aT64, we generated 6.2 and 7.4 million transcriptome mate pairs, respectively. In aT64, we found that HERPUD1, residing on chromosome 16, juxtaposed in front of exon 4 of ERG (Fig. 3A), which was validated by qRT-PCR (Fig. S6) and FISH (Fig. 3B), thus identifying a third 5′ fusion partner for ERG, after TMPRSS2 (6) and SLC45A3 (27), and presumably, HERPUD1 also mediates the overexpression of ERG in a subset of prostate cancer patients. Also, just as TMPRSS2 and SLC45A3 have been shown to be androgen regulated by qRT-PCR (5), we found HERPUD1 expression, via RNA-Seq, to be responsive to androgen treatment (Fig. S7). Also, ChIP-Seq analysis revealed androgen binding at the 5′ end of HERPUD1 (Fig. S7).

Fig. 3.
Discovery of previously undescribed ETS gene fusions in localized prostate cancer. (A) Schematic representation of the interchromosomal gene fusion between exon 1 of HERPUD1 (red), residing on chromosome 16, with exon 4 of ERG (blue), located on chromosome ...

Also, in the second prostate tumor sample (aT52), we discovered an interchromosomal gene fusion between the 5′ end of a prostate cDNA clone, AX747630 (FLJ35294), residing on chromosome 17, with exon 4 of ETV1, located on chromosome 7 (Fig. 3C), which was validated via qRT-PCR (Fig. S6) and FISH (Fig. 3D). Interestingly, this fusion has previously been reported in an independent sample found by a fluorescence in situ hybridization screen (27); thus, demonstrating that it is recurrent in a subset of prostate cancer patients. As previously reported, gene expression via RNA-Seq confirmed that AX747630 is an androgen-inducible gene (Fig. S7). Also, ChIP-Seq revealed androgen occupancy at the 5′ end of AX747630 (Fig. S7).


This study demonstrates the effectiveness of paired-end massively parallel transcriptome sequencing for fusion gene discovery. By using a paired-end approach, we were able to rediscover known gene fusions, comprehensively discover previously undescribed gene fusions, and home in on causal gene fusions. The ability to detect 12 previously undescribed gene fusions in 4 commonly used cell lines that eluded any previous efforts conveys the superior sensitivity of a paired-end RNA-Seq strategy compared with existing approaches. Also, it suggests that we may be able to unveil previously undescribed chimeric events in previously characterized samples believed to be devoid of any known driver gene fusions. This possibility is exemplified by the discovery of previously undescribed ETS gene fusions in 2 clinically localized prostate tumor samples that lacked known driver gene fusions.

By analyzing the transcriptome at unprecedented depth, we have revealed numerous gene fusions, demonstrating the prevalence of a relatively under-represented class of mutations. However, one of the major goals remains to discover recurrent gene fusions and to distinguish them from secondary, nonspecific chimeras. Although quantifying expression levels is not proof of whether a gene fusion is a driver or passenger, because a low-level gene fusion could still be causative, it is still of major significance that a paired-end strategy clearly distinguished known high-level driving gene fusions, such as BCR-ABL1 and TMPRSS2-ERG, from potential lower level passenger chimeras. Overall, these fusions serve as a model for employing a paired-end nomination strategy for prioritizing leads likely to be high-level driving gene fusions, which would subsequently undergo further functional and experimental evaluation.

One of the major advantages of using a transcriptome approach is that it enables us to identify rearrangements that are not detectable at the DNA level. For example, conventional cytogenetic methods would miss gene fusions produced by paracentric inversions, or submicroscopic events, such as GAS6-RASA3. Also, transcriptome sequencing can unveil RNA chimeras, lacking DNA aberrations, as demonstrated by the discovery of a recurrent, prostate specific, read-through of SLC45A3 with ELK4 in prostate cancers. Further classification of RNA based events using paired-end sequencing revealed numerous broadly expressed chimeras between adjacent genes. Although these events were not necessarily read-through events, because they typically had different orientations, we believe they represent extensions of transcriptional units beyond their annotated boundaries. Unlike single read based approaches, which require chimeras to span exon boundaries of independent genes, we were able to detect these events using paired-end sequencing, which could have significant impact for improving how we annotate transcriptional units.

Overall, we have demonstrated the advantages of employing a paired-end transcriptome strategy for chimera discovery, established a methodology for mining chimeras, and extensively catalogued chimeras in prostate and hematological cancer models. We believe that the sensitivity of this approach will be of broad impact and significance for revealing novel causative gene fusions in various cancers while revealing additional private gene fusions that may contribute to tumorigenesis or cooperate with driver gene fusions.


Paired-End Gene Fusion Discovery Pipeline.

Mate pair transcriptome reads were mapped to the human genome (hg18) and Refseq transcripts, allowing up to 2 mismatches, using Efficient Alignment of Nucleotide Databases (ELAND) pair within the Illumina Genome Analyzer Pipeline software. Illumina export output files were parsed to categorize passing filter mate pairs as (i) mapping to the same transcript, (ii) ribosomal, (iii) mitochondrial, (iv) quality control, (v) chimera candidates, and (vi) nonmapping. Chimera candidates and nonmapping categories were used for gene fusion discovery. For the chimera candidates category, the following criteria were used: (i) mate pairs must be of high mapping quality (best unique match across genome), (ii) best unique mate pairs do not have a more logical alternative combination (i.e., best mate pairs suggest an interchromosomal rearrangement, whereas the second best mapping for a mate reveals that the pair has an alignment within the expected insert size), (iii) the sum of the distances between the most 5′ and 3′ mate on both partners of the gene fusion must be <500 nt, and (iv) mate pairs supporting a chimera must be nonredundant.
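Criteria (i), (iii), and (iv) above lend themselves to a short illustration. The sketch below is one possible reading, not the pipeline's actual code: the `MatePair` record and `filter_supporting_pairs` are hypothetical names, positions are assumed to be alignment starts in transcript coordinates, criterion (iii) is interpreted as the spread of mates on partner A plus the spread on partner B, and criterion (ii) is not modeled because it requires second-best alignments.

```python
from typing import NamedTuple


class MatePair(NamedTuple):
    pos_a: int       # alignment start on partner A (transcript coordinates)
    pos_b: int       # alignment start on partner B (transcript coordinates)
    is_unique: bool  # True if both mates are best unique matches genome-wide


def filter_supporting_pairs(pairs, max_total_span=500):
    """Return nonredundant supporting positions, or [] if the candidate fails."""
    # Criteria (i) and (iv): keep unique mappings, collapse identical pairs.
    positions = {(p.pos_a, p.pos_b) for p in pairs if p.is_unique}
    if not positions:
        return []
    # Criterion (iii), as interpreted here: summed spread on both partners.
    span_a = max(a for a, _ in positions) - min(a for a, _ in positions)
    span_b = max(b for _, b in positions) - min(b for _, b in positions)
    if span_a + span_b >= max_total_span:
        return []
    return sorted(positions)


# Toy input: one duplicate pair and one non-unique pair.
example = [
    MatePair(100, 40, True),
    MatePair(100, 40, True),
    MatePair(180, 95, True),
    MatePair(150, 60, False),
]
kept = filter_supporting_pairs(example)
print(kept)
```

In this toy input the duplicate and the non-unique pair are dropped, and the remaining mates spread over 80 nt and 55 nt on the two partners, so the candidate passes the 500-nt limit.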

In addition to mining mate pairs encompassing a fusion boundary, the nonmapping category was mined for mate pairs that had 1 read mapping to a gene, whereas its corresponding read fails to align, because it spans the fusion boundary. First, the annotated transcript that the “mapping” mate pair aligned against was extracted, because this transcript represents one of the potential partners involved in the gene fusion. The “nonmapping” mate pair was then aligned against all of the exon boundaries of the known gene partner to identify a perfect partial alignment. A partial alignment confirms that the nonmapping mate pair maps to our expected gene partner while revealing the portion of the nonmapping mate pair, or overhang, aligning to the unknown partner. The overhang is then aligned against the exon boundaries of all known transcripts to identify the fusion partner. This process is done using a Perl script that extracts all possible University of California Santa Cruz (UCSC) and Refseq exon boundaries looking for a single perfect best hit.
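The overhang search described above was implemented in a Perl script that is not reproduced here. The Python sketch below only illustrates the idea under simplifying assumptions: the known gene is taken to be the 5′ partner, sequences are on the same strand, and `min_anchor` (the shortest accepted match on either side of the junction) is a hypothetical parameter. All sequences and exon names are made up.

```python
def split_at_exon_end(read: str, exon_seq: str, min_anchor: int = 10):
    """Return the overhang if a prefix of `read` perfectly matches the 3' end
    of `exon_seq`; otherwise return None."""
    longest = min(len(read) - min_anchor, len(exon_seq))
    for k in range(longest, min_anchor - 1, -1):
        if exon_seq.endswith(read[:k]):
            return read[k:]  # portion belonging to the unknown partner
    return None


def find_partner(overhang: str, exon_starts: dict):
    """Return the exon whose 5' end matches the overhang, if exactly one does."""
    hits = [name for name, seq in exon_starts.items() if seq.startswith(overhang)]
    return hits[0] if len(hits) == 1 else None


# Toy sequences (hypothetical).
known_exon = "ACGTACGTTTGACCAGTCAG"
candidate_exons = {
    "GENE_B:exon2": "GGATCCTTAGCAATCGGA",
    "GENE_C:exon1": "TTTTCCCCAAAAGGGGTT",
}
# A read spanning the junction: last 12 nt of the known exon plus 12 nt of GENE_B.
nonmapping_read = known_exon[-12:] + candidate_exons["GENE_B:exon2"][:12]

overhang = split_at_exon_end(nonmapping_read, known_exon)
partner = find_partner(overhang, candidate_exons)
print(overhang, partner)
```

Requiring exactly one perfect hit mirrors the "single perfect best hit" condition in the text; ambiguous overhangs yield no nomination.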

Mate pairs spanning the fusion boundary are merged with mate pairs encompassing the fusion boundary. At least 2 independent mate pairs are required to support a chimera nomination, which can be achieved by (i) 2 or more nonredundant mate pairs spanning the fusion boundary, (ii) 2 or more nonredundant mate pairs encompassing a fusion boundary, or (iii) 1 or more mate pairs encompassing a fusion boundary and 1 or more mate pairs spanning the fusion boundary. All chimera nominations were normalized based on the cumulative number of mate pairs encompassing or spanning the fusion junction per million mate pairs passing filter.
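The three ways of reaching 2 independent mate pairs reduce to a single condition on the combined count, which the sketch below makes explicit together with the per-million normalization. The function names are hypothetical, and the counts are assumed to be nonredundant already.

```python
def is_nominated(n_encompassing: int, n_spanning: int) -> bool:
    """True if at least 2 independent mate pairs support the chimera.

    Covers all three cases in the text: 2 or more spanning, 2 or more
    encompassing, or at least 1 of each.
    """
    return n_encompassing + n_spanning >= 2


def normalized_support(n_encompassing: int, n_spanning: int,
                       pass_filter_pairs: int) -> float:
    """Supporting mate pairs per million mate pairs passing filter."""
    return (n_encompassing + n_spanning) * 1_000_000 / pass_filter_pairs


print(is_nominated(1, 1))   # one encompassing plus one spanning
print(is_nominated(1, 0))   # a single supporting pair
print(normalized_support(1, 1, 2_000_000))
```

Normalizing by the number of pass filter mate pairs is what allows support for the same chimera to be compared across samples sequenced to different depths.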

RNA Chimera Analysis.

Chimeras found in UHR, HBR, VCaP, and K562 were grouped based on whether they showed expression in all samples, “broadly expressed,” or a single sample, “restricted expression.” Because the UHR pool includes K562, chimeras found in only these 2 samples were also considered as restricted. Heatmap visualization was conducted by using TIGR's MultiExperiment Viewer (TMeV) version 4.0 (www.tm4.org).
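The grouping rule can be sketched as follows. This is an illustration under stated assumptions: the function name is hypothetical, and the "unclassified" label for chimeras seen in some other subset of samples is our own placeholder, because the text does not say how such cases were handled.

```python
ALL_SAMPLES = frozenset({"UHR", "HBR", "VCaP", "K562"})


def classify_expression(samples_with_chimera) -> str:
    """Group a chimera by the set of samples in which it was detected."""
    present = set(samples_with_chimera) & ALL_SAMPLES
    if present == ALL_SAMPLES:
        return "broadly expressed"
    # A single sample, or UHR together with K562 (UHR contains K562 RNA).
    if len(present) == 1 or present == {"UHR", "K562"}:
        return "restricted expression"
    return "unclassified"  # placeholder; not defined in the text


print(classify_expression(["UHR", "HBR", "VCaP", "K562"]))
print(classify_expression(["VCaP"]))
print(classify_expression(["UHR", "K562"]))
```

For example, BCR-ABL1, detected in both K562 and UHR, would fall in the restricted group under this rule.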

Additional Details.

Additional details can be found in SI Text.



We thank Lu Zhang, Eric Vermaas, Victor Quijano, and Juying Yan for assistance with sequencing, Shawn Baker and Steffen Durinck for helpful discussions, Rohit Mehra and Javed Siddiqui for collecting tissue samples, and Bo Han and Kalpana Ramnarayanan for technical assistance. C.A.M. was supported by a National Institutes of Health (NIH) Ruth L. Kirschstein postdoctoral training grant, and currently derives support from the American Association of Cancer Research Amgen Fellowship in Clinical/Translational Research and the Canary Foundation and American Cancer Society Early Detection Postdoctoral Fellowship. J.Y. was supported by NIH Grant 1K99CA129565-01A1 and Department of Defense (DOD) Grant PC080665. A.M.C. was supported in part by the NIH (Prostate SPORE P50CA69568, R01CA132874), the DOD (BC075023, W81XWH-08-0110), the Early Detection Research Network (U01 CA111275), a Burroughs Wellcome Foundation Award in Clinical Translational Research, a Doris Duke Charitable Foundation Distinguished Clinical Investigator Award, and the Howard Hughes Medical Institute. This work was also supported by National Center for Integrative Biomedical Informatics Grant U54 DA021519.


The authors declare no conflict of interest.

This article contains supporting information online at www.pnas.org/cgi/content/full/0904720106/DCSupplemental.


1. Futreal PA, et al. A census of human cancer genes. Nat Rev Cancer. 2004;4:177–183.
2. Kumar-Sinha C, Tomlins SA, Chinnaiyan AM. Recurrent gene fusions in prostate cancer. Nat Rev Cancer. 2008;8:497–511.
3. Mitelman F, Johansson B, Mertens F. Fusion genes and rearranged genes as a linear function of chromosome aberrations in cancer. Nat Genet. 2004;36:331–334.
4. Mitelman F, Mertens F, Johansson B. Prevalence estimates of recurrent balanced cytogenetic aberrations and gene fusions in unselected patients with neoplastic disorders. Gene Chromosome Canc. 2005;43:350–366.
5. Tomlins SA, et al. Distinct classes of chromosomal rearrangements create oncogenic ETS gene fusions in prostate cancer. Nature. 2007;448:595–599.
6. Tomlins SA, et al. Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science. 2005;310:644–648.
7. Soda M, et al. Identification of the transforming EML4-ALK fusion gene in non-small-cell lung cancer. Nature. 2007;448:561–566.
8. Druker BJ, et al. Five-year follow-up of patients receiving imatinib for chronic myeloid leukemia. New Engl J Med. 2006;355:2408–2417.
9. Druker BJ, et al. Effects of a selective inhibitor of the Abl tyrosine kinase on the growth of Bcr-Abl positive cells. Nat Med. 1996;2:561–566.
10. Kantarjian H, et al. Hematologic and cytogenetic responses to imatinib mesylate in chronic myelogenous leukemia. New Engl J Med. 2002;346:645–652.
11. Volik S, et al. End-sequence profiling: Sequence-based analysis of aberrant genomes. Proc Natl Acad Sci USA. 2003;100:7696–7701.
12. Tuzun E, et al. Fine-scale structural variation of the human genome. Nat Genet. 2005;37:727–732.
13. Ruan Y, et al. Fusion transcripts and transcribed retrotransposed loci discovered through comprehensive transcriptome analysis using Paired-End diTags (PETs). Genome Res. 2007;17:828–838.
14. Campbell PJ, et al. Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat Genet. 2008;40:722–729.
15. Maher CA, et al. Transcriptome sequencing to detect gene fusions in cancer. Nature. 2009;458:97–101.
16. Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y. RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 2008;18:1509–1517.
17. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008;5:621–628.
18. Shtivelman E, Lifshitz B, Gale RP, Canaani E. Fused transcript of abl and bcr genes in chronic myelogenous leukaemia. Nature. 1985;315:550–554.
19. Barlund M, et al. Cloning of BCAS3 (17q23) and BCAS4 (20q13) genes that undergo amplification, overexpression, and fusion in breast cancer. Gene Chromosome Canc. 2002;35:311–317.
20. Hampton OA, et al. A sequence-level map of chromosomal breakpoints in the MCF-7 breast cancer cell line yields insights into the evolution of a cancer genome. Genome Res. 2009;19:167–177.
21. Zhao Q, et al. Transcriptome-guided characterization of genomic rearrangements in a breast cancer cell line. Proc Natl Acad Sci USA. 2009;106:1886–1891.
22. Hahn Y, et al. Finding fusion genes resulting from chromosome rearrangement by analyzing the expressed sequence databases. Proc Natl Acad Sci USA. 2004;101:13257–13261.
23. Shadeo A, Lam WL. Comprehensive copy number profiles of breast cancer cell model genomes. Breast Cancer Res. 2006;8:R9.
24. Huang J, et al. Whole genome DNA copy number changes identified by high density oligonucleotide arrays. Hum Genom. 2004;1:287–299.
25. Neve RM, et al. A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes. Cancer Cell. 2006;10:515–527.
26. Volik S, et al. Decoding the fine-scale structure of a breast cancer genome and transcriptome. Genome Res. 2006;16:394–404.
27. Han B, et al. A fluorescence in situ hybridization screen for E26 transformation-specific aberrations: Identification of DDX5-ETV4 fusion protein in prostate cancer. Cancer Res. 2008;68:7629–7637.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences