Genome Res. 2016 Jan;26(1):108-18. doi: 10.1101/gr.186114.114. Epub 2015 Nov 10.

INTEGRATE: gene fusion discovery using whole genome and transcriptome data.

Author information

1. McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Department of Internal Medicine, Division of Oncology, Washington University School of Medicine, St. Louis, Missouri 63110, USA;
2. Department of Internal Medicine, Division of Oncology, Washington University School of Medicine, St. Louis, Missouri 63110, USA;
3. McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63110, USA;
4. McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA;
5. McDonnell Genome Institute, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Department of Internal Medicine, Division of Oncology, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Alvin J. Siteman Cancer Center, Washington University School of Medicine, St. Louis, Missouri 63110, USA; Department of Biomedical Engineering, Washington University School of Medicine, St. Louis, Missouri 63110, USA.

Abstract

While next-generation sequencing (NGS) has become the primary technology for discovering gene fusions, we are still faced with the challenge of ensuring that causative mutations are not missed while minimizing false positives. Currently, there are many computational tools that predict structural variations (SV) and gene fusions using whole genome sequencing (WGS) and transcriptome sequencing (RNA-seq) data separately. However, as both WGS and RNA-seq have their limitations when used independently, we hypothesize that the orthogonal validation from integrating both data types could generate a sensitive and specific approach for detecting high-confidence gene fusion predictions. Fortunately, decreasing NGS costs have resulted in a growing number of patients with both data types available. Therefore, we developed a gene fusion discovery tool, INTEGRATE, that leverages both RNA-seq and WGS data to reconstruct gene fusion junctions and genomic breakpoints by split-read mapping. To evaluate INTEGRATE, we compared it with eight additional gene fusion discovery tools using the well-characterized breast cell line HCC1395 and peripheral blood lymphocytes derived from the same patient (HCC1395BL). The predictions subsequently underwent a targeted validation leading to the discovery of 131 novel fusions in addition to the seven previously reported fusions. Overall, INTEGRATE missed only six of the 138 validated fusions and had the highest accuracy of the nine tools evaluated. Additionally, we applied INTEGRATE to 62 breast cancer patients from The Cancer Genome Atlas (TCGA) and found multiple recurrent gene fusions, including a subset involving estrogen receptor. Taken together, INTEGRATE is a highly sensitive and accurate tool that is freely available for academic use.
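
The counts reported in the abstract imply a sensitivity figure that can be checked with simple arithmetic. The sketch below is only an illustration of that calculation, not part of INTEGRATE; the function name `sensitivity` is hypothetical, and it assumes the 138 validated fusions serve as the full reference set. It does not reproduce the "accuracy" comparison, which would also require false-positive counts not given here.

```python
def sensitivity(validated: int, missed: int) -> float:
    """Fraction of validated fusions that a tool detected (hypothetical helper)."""
    if validated <= 0 or not 0 <= missed <= validated:
        raise ValueError("need validated > 0 and 0 <= missed <= validated")
    return (validated - missed) / validated


# Counts taken from the abstract.
novel_fusions = 131
previously_reported = 7
validated_total = novel_fusions + previously_reported  # 138 validated fusions
missed_by_integrate = 6

# 132 of 138 validated fusions detected.
print(f"{sensitivity(validated_total, missed_by_integrate):.1%}")  # prints 95.7%
```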

PMID: 26556708
PMCID: PMC4691743
DOI: 10.1101/gr.186114.114
[Indexed for MEDLINE]
Free PMC Article
