Reference-free prediction of rearrangement breakpoint reads

Bioinformatics. 2014 Sep 15;30(18):2559-67. doi: 10.1093/bioinformatics/btu360. Epub 2014 May 29.

Abstract

Motivation: Chromosomal rearrangements, which are observed in many cancer-related diseases, are triggered by atypical breaking and rejoining of DNA molecules. Rearrangements are typically detected by using short reads generated by next-generation sequencing (NGS) and combining the reads with knowledge of a reference genome. Because structural variations and genomes differ from one person to another, intermediate comparison via a reference genome may lead to loss of information.

Results: In this article, we propose a reference-free method for detecting clusters of breakpoints from chromosomal rearrangements. This is done by directly comparing a set of normal NGS reads with another set that may be rearranged. Our method, SlideSort-BPR (breakpoint reads), is based on a fast algorithm for all-against-all comparisons of short reads and theoretical analyses of the number of neighboring reads. When applied to a dataset with a sequencing depth of 100×, it finds ~88% of the breakpoints correctly with no false-positive reads. Moreover, evaluation on a real prostate cancer dataset shows that the proposed method predicts more fusion transcripts correctly than previous approaches, and yet produces fewer false-positive reads. To our knowledge, this is the first method to detect breakpoint reads without using a reference genome.
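The idea of comparing two read sets directly and counting neighboring reads can be illustrated with a toy sketch. This is not the SlideSort-BPR implementation: it uses a quadratic brute-force scan instead of a fast all-against-all algorithm, Hamming distance on equal-length reads as the neighbor criterion, and fixed thresholds in place of the theoretical analysis described in the abstract. The function names and the parameters `d`, `min_case`, and `max_normal` are hypothetical and chosen only for illustration.

```python
from typing import List


def hamming_within(a: str, b: str, d: int) -> bool:
    """Return True if equal-length reads a and b differ in at most d positions."""
    if len(a) != len(b):
        return False
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > d:
                return False
    return True


def count_neighbors(read: str, reads: List[str], d: int) -> int:
    """Count reads in `reads` within Hamming distance d of `read` (brute force)."""
    return sum(1 for r in reads if hamming_within(read, r, d))


def candidate_breakpoint_reads(
    normal: List[str],
    case: List[str],
    d: int = 1,            # assumed mismatch tolerance
    min_case: int = 2,     # assumed minimum support within the case set
    max_normal: int = 0,   # assumed maximum support allowed in the normal set
) -> List[str]:
    """Flag case reads that have neighbors in the case set but not in the normal set."""
    flagged = []
    for read in case:
        # Subtract 1 so the read does not count itself as a neighbor.
        n_case = count_neighbors(read, case, d) - 1
        n_normal = count_neighbors(read, normal, d)
        if n_case >= min_case and n_normal <= max_normal:
            flagged.append(read)
    return flagged
```

Reads that are well supported in the possibly rearranged set yet have no counterpart in the normal set are the candidates here; requiring support within the case set is a simple stand-in for separating real signal from sequencing errors.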

Availability and implementation: The source code of SlideSort-BPR can be freely downloaded from https://code.google.com/p/slidesort-bpr/.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Chromosome Breakpoints*
  • Genomics / methods*
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Male
  • Prostatic Neoplasms / genetics
  • Prostatic Neoplasms / pathology
  • Sequence Analysis, DNA
  • Software
  • Translocation, Genetic / genetics*