Format

Send to

Choose Destination
J Comput Biol. 2014 Mar;21(3):219-33. doi: 10.1089/cmb.2013.0129. Epub 2014 Feb 19.

Detection of structural variants involving repetitive regions in the reference genome.

Author information

1
1 School of Informatics and Computing, Indiana University , Bloomington, Indiana.

Abstract

Next-generation sequencing techniques are now commonly used to characterize structural variations (SVs) in population genomics and elucidate their associations with phenotypes. Many of the computational tools developed for detecting structural variations work by mapping paired-end reads to a reference genome and identifying the discordant read-pairs whose mapped loci in the reference genome deviate from the expected insert size and orientation. However, repetitive regions in the reference genome represent a major challenge in SV detection, because the paired-end reads from these regions may be mapped to multiple loci in the reference genome, resulting in spuriously discordant read-pairs. To address this issue, we have developed an algorithmic approach for read mapping and SV detection based on the framework of A-Bruijn graphs. Instead of mapping reads to a linear sequence of the reference genome, we propose to map reads onto the A-Bruijn graph constructed from the reference genome in which all instances of the same repeat are collapsed into a single edge. As a result, any given read, either from repetitive regions or not, will be mapped to a unique location in the A-Bruijn graph, and each discordant read-pair in the A-Bruijn graph indicates a potentially true SV event. We also developed a simple clustering algorithm to derive valid clusters of these discordant read-pairs, each supporting a different SV event. Finally, we demonstrate the performance of this approach, compared to existing approaches, by identifying transposition events of insertion sequence (IS) elements, a class of simple mobile genetic elements (MGEs), in E. coli by using simulated and real paired-end sequence data acquired from E. coli mutation accumulation lines.

PMID:
24552580
DOI:
10.1089/cmb.2013.0129
[Indexed for MEDLINE]

Supplemental Content

Full text links

Icon for Atypon
Loading ...
Support Center