Bioinformatics. 2013 Nov 15;29(22):2859-68. doi: 10.1093/bioinformatics/btt512. Epub 2013 Aug 31.

PyroHMMvar: a sensitive and accurate method to call short indels and SNPs for Ion Torrent and 454 data.

Author information

Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing 100084, China; and Computational Biology and Bioinformatics Program, University of Southern California, Los Angeles, CA 90089, USA.

Abstract

MOTIVATION:

The identification of short insertions and deletions (indels) and single nucleotide polymorphisms (SNPs) from Ion Torrent and 454 reads is challenging, chiefly because these technologies are prone to sequencing errors at homopolymers, which introduce spurious indels into reads. Most existing mapping programs do not model homopolymer errors when aligning reads against the reference. The resulting alignments therefore contain various kinds of mismatches and indels that confound the accurate determination of variant loci and alleles.
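To make the homopolymer problem concrete, the following minimal Python sketch collapses a sequence into homopolymer runs and reports where a read's run length differs from the reference. It is an illustration only, not part of PyroHMMvar: the function names, the toy sequences, and the assumption that the read and reference differ solely in run lengths are all hypothetical.

```python
from itertools import groupby


def homopolymer_runs(seq):
    """Collapse a sequence into (base, run_length) pairs."""
    return [(base, len(list(group))) for base, group in groupby(seq)]


def run_length_differences(reference, read):
    """List runs whose length differs between reference and read.

    Simplifying assumption: both sequences have the same order of bases
    once collapsed, so they differ only in homopolymer lengths.
    """
    ref_runs = homopolymer_runs(reference)
    read_runs = homopolymer_runs(read)
    if [base for base, _ in ref_runs] != [base for base, _ in read_runs]:
        raise ValueError("sequences differ by more than homopolymer lengths")
    return [
        (index, base, ref_len, read_len)
        for index, ((base, ref_len), (_, read_len))
        in enumerate(zip(ref_runs, read_runs))
        if ref_len != read_len
    ]


# Toy example: the read undercalls a run of four T's as three.
reference = "ACGTTTTAGC"
read = "ACGTTTAGC"
differences = run_length_differences(reference, read)
print(differences)
```

An aligner that does not model this error type would report the undercalled run as a one-base deletion, which is indistinguishable from a true indel at that position without further evidence.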

RESULTS:

To address these challenges, we realign reads against the reference using our previously proposed hidden Markov model, which models homopolymer errors, and then merge these pairwise alignments into a weighted alignment graph. Based on this weighted alignment graph and the hidden Markov model, we develop a method called PyroHMMvar, which can simultaneously detect short indels and SNPs, as demonstrated in human resequencing data. Specifically, by applying our methods to simulated diploid datasets, we demonstrate that PyroHMMvar produces more accurate results than state-of-the-art methods, such as Samtools and GATK, and is less sensitive to mapping parameter settings than the other methods. We also apply PyroHMMvar to analyze one human whole genome resequencing dataset, and the results confirm that PyroHMMvar predicts SNPs and indels accurately.
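The idea of merging pairwise alignments into a weighted graph can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the data structure defined in the paper: here each alignment is a list of hypothetical `(reference_position, read_bases)` columns, `"-"` marks a deletion, and an edge weight is simply the number of reads supporting a transition between two consecutive columns.

```python
from collections import Counter


def build_weighted_graph(alignments):
    """Count how many reads support each transition between
    consecutive alignment columns (nodes)."""
    edge_weights = Counter()
    for columns in alignments:
        for source, target in zip(columns, columns[1:]):
            edge_weights[(source, target)] += 1
    return edge_weights


# Three toy read alignments over reference positions 1..4.
# The second read shows a deletion at position 3.
alignments = [
    [(1, "A"), (2, "T"), (3, "T"), (4, "G")],
    [(1, "A"), (2, "T"), (3, "-"), (4, "G")],
    [(1, "A"), (2, "T"), (3, "T"), (4, "G")],
]
graph = build_weighted_graph(alignments)

for (source, target), weight in sorted(graph.items()):
    print(source, "->", target, "weight", weight)
```

In a graph of this kind, competing paths through a region correspond to candidate alleles, and their edge weights summarize read support. How PyroHMMvar actually weights edges and selects alleles is not described in the abstract.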

AVAILABILITY AND IMPLEMENTATION:

Source code is freely available at https://code.google.com/p/pyrohmmvar/; it is implemented in C++ and supported on Linux.

PMID: 23995392
PMCID: PMC3888126
DOI: 10.1093/bioinformatics/btt512
[Indexed for MEDLINE]
Free PMC Article
