Format

Send to

Choose Destination
Bioinformatics. 2015 Feb 15;31(4):515-22. doi: 10.1093/bioinformatics/btu670. Epub 2014 Oct 9.

Accurate estimation of haplotype frequency from pooled sequencing data and cost-effective identification of rare haplotype carriers by overlapping pool sequencing.

Author information

1
State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China.

Abstract

MOTIVATION:

A variety of hypotheses have been proposed for finding the missing heritability of complex diseases in genome-wide association studies. Studies have focused on the value of haplotype to improve the power of detecting associations with disease. To facilitate haplotype-based association analysis, it is necessary to accurately estimate haplotype frequencies of pooled samples.

RESULTS:

Taking advantage of databases that contain prior haplotypes, we present Ehapp based on the algorithm for solving the system of linear equations to estimate the frequencies of haplotypes from pooled sequencing data. Effects of various factors in sequencing on the performance are evaluated using simulated data. Our method could estimate the frequencies of haplotypes with only about 3% average relative difference for pooled sequencing of the mixture of 10 haplotypes with total coverage of 50×. When unknown haplotypes exist, our method maintains excellent performance for haplotypes with actual frequencies >0.05. Comparisons with present method on simulated data in conjunction with publicly available Illumina sequencing data indicate that our method is state of the art for many sequencing study designs. We also demonstrate the feasibility of applying overlapping pool sequencing to identify rare haplotype carriers cost-effectively.

AVAILABILITY AND IMPLEMENTATION:

Ehapp (in Perl) for the Linux platforms is available online (http://bioinfo.seu.edu.cn/Ehapp/).

CONTACT:

xsun@seu.edu.cn

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.

PMID:
25304780
DOI:
10.1093/bioinformatics/btu670
[Indexed for MEDLINE]

Supplemental Content

Full text links

Icon for Silverchair Information Systems
Loading ...
Support Center