HIVE-hexagon: high-performance, parallelized sequence alignment for next-generation sequencing data analysis

PLoS One. 2014 Jun 11;9(6):e99033. doi: 10.1371/journal.pone.0099033. eCollection 2014.

Abstract

Because Next-Generation Sequencing datasets are so large, sequence alignment poses a major computational challenge. Inexact alignments can take up to 90% of total CPU time in bioinformatics pipelines. The High-performance Integrated Virtual Environment (HIVE), a cloud-based environment optimized for storage and analysis of extra-large data, presents an algorithmic solution: the HIVE-hexagon DNA sequence aligner. HIVE-hexagon implements novel approaches that exploit both the characteristics of sequence space and the architecture of CPU, RAM, and Input/Output (I/O) to compute accurate alignments quickly. Key components of HIVE-hexagon include non-redundification and sorting of sequences; floating diagonals of linearized dynamic programming matrices; and consideration of cross-similarity to minimize computations.
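The first component named above, non-redundification and sorting of sequences, can be illustrated with a minimal sketch. This is not the HIVE-hexagon implementation: the function names `collapse_reads` and `shared_prefix_len` and the toy reads are hypothetical, chosen only to show the general idea that identical reads need to be aligned once, and that sorting places similar reads next to each other so shared work may be reused.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse identical reads into (sequence, count) pairs.

    Hypothetical helper: each unique sequence would be aligned once,
    and the count restores the original multiplicity afterwards.
    The result is sorted lexicographically by sequence.
    """
    counts = Counter(reads)
    return sorted(counts.items())


def shared_prefix_len(a, b):
    """Length of the common prefix of two sequences.

    Hypothetical helper: after sorting, neighbouring reads often share
    a prefix, which hints at how much computation could be reused.
    """
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Toy input (made up for illustration): five reads, three unique.
reads = ["ACGTAC", "ACGTTT", "ACGTAC", "TTGACA", "ACGTAC"]
unique = collapse_reads(reads)

for seq, count in unique:
    print(seq, count)

# Prefix shared by the first two sorted unique reads.
print(shared_prefix_len(unique[0][0], unique[1][0]))
```

In this toy input, five reads reduce to three unique sequences, so an aligner working on the collapsed set would perform three alignments rather than five. How HIVE-hexagon itself stores, sorts, and reuses results across similar reads is described in the full paper, not in this sketch.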

Availability: https://hive.biochemistry.gwu.edu/hive/

Publication types

  • Research Support, N.I.H., Intramural
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Genome
  • Sequence Alignment*
  • Sequence Analysis, DNA / methods*