Complex Variant Discovery Using Discordant Cluster Normalization

J Comput Biol. 2021 Feb;28(2):185-194. doi: 10.1089/cmb.2020.0249. Epub 2020 Aug 12.

Abstract

Complex genomic structural variants (CGSVs) are abnormalities that present with three or more breakpoints, making their discovery a challenge. The majority of existing algorithms for structural variant detection are only designed to find simple structural variants (SSVs) such as deletions and inversions; they fail to find more complex events such as deletion-inversions or deletion-duplications, for example. In this study, we present an algorithm named CleanBreak that employs a clique partitioning graph-based strategy to identify collections of SSV clusters and then subsequently identifies overlapping SSV clusters to examine the search space of possible CGSVs, choosing the one that is most concordant with local read depth. We evaluated CleanBreak's performance on whole genome simulated data and a real data set from the 1000 Genomes Project. We also compared CleanBreak with another algorithm for CGSV discovery. The results demonstrate CleanBreak's utility as an effective method to discover CGSVs.

Keywords: clique partition; complex genomic structural variation; genome rearrangement; structural variation.

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Genome, Human
  • Genomic Structural Variation*
  • Humans