Estimating population haplotype frequencies from pooled DNA samples using PHASE algorithm

Matti Pirinen; Sangita Kulathinal; Dario Gasbarra; Mikko J Sillanpää

doi:10.1017/S0016672308009877

Genet Res (Camb). 2008 Dec;90(6):509-24. doi: 10.1017/S0016672308009877.

Authors

Matti Pirinen¹, Sangita Kulathinal, Dario Gasbarra, Mikko J Sillanpää

Affiliation

¹ Department of Mathematics and Statistics, University of Helsinki, PO Box 68, FIN-00014 University of Helsinki, Finland. matti.pirinen@helsinki.fi

PMID: 19123969
DOI: 10.1017/S0016672308009877

Abstract

Recent studies show that the PHASE algorithm is a state-of-the-art method for population-based haplotyping from individually genotyped data. We present a modified version of PHASE for estimating population haplotype frequencies from pooled DNA data. The algorithm is compared with (i) maximum likelihood estimation under the multinomial model and (ii) a deterministic greedy algorithm, on both simulated and real (HapMap) data sets. Our results suggest that PHASE is also a method of choice for pooled DNA data. The main reason for the improvement over the other approaches is assumed to be the same as with individually genotyped data: the biologically motivated model of PHASE takes into account the correlated genealogical histories of the haplotypes by modelling mutations and recombinations. We also consider the efficiency of DNA pooling and the influence of pool size on the accuracy of the estimates. Our results are in line with earlier findings in that the pool size should be relatively small, only 2-5 individuals in our examples, to provide reliable estimates of population haplotype frequencies.
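
To make the multinomial baseline concrete, the following is a minimal Python sketch of maximum likelihood estimation of haplotype frequencies from pooled allele counts. It is an illustration under stated assumptions, not the authors' implementation: SNPs are taken to be biallelic and coded 0/1, each pool is assumed to report exact per-SNP allele counts without measurement error, and an EM iteration is used as the optimizer (the abstract does not say how the likelihood is maximized). The function names `compositions`, `multinomial_pmf` and `em_pooled_haplotypes` are hypothetical. The sketch enumerates every haplotype configuration in a pool, so it is practical only for a few SNPs and small pools.

```python
from itertools import product
from math import factorial, prod


def compositions(total, parts):
    """Yield all tuples of `parts` non-negative integers that sum to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def multinomial_pmf(counts, freqs):
    """Multinomial probability of haplotype counts given haplotype frequencies."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    return coef * prod(f ** c for f, c in zip(freqs, counts))


def em_pooled_haplotypes(pool_allele_counts, n_snps, pool_size, n_iter=200):
    """EM estimate of haplotype frequencies (assumed model, see the text above).

    pool_allele_counts: one tuple per pool with the number of '1' alleles
    observed at each SNP among the 2 * pool_size chromosomes in that pool.
    """
    haplotypes = list(product((0, 1), repeat=n_snps))
    n_chrom = 2 * pool_size

    # Group every possible haplotype-count configuration of a pool by the
    # allele counts it would produce, so each observation can look up the
    # configurations that are compatible with it.
    by_obs = {}
    for config in compositions(n_chrom, len(haplotypes)):
        obs = tuple(
            sum(c * h[s] for c, h in zip(config, haplotypes))
            for s in range(n_snps)
        )
        by_obs.setdefault(obs, []).append(config)

    freqs = [1.0 / len(haplotypes)] * len(haplotypes)  # uniform start
    for _ in range(n_iter):
        expected = [0.0] * len(haplotypes)
        for obs in pool_allele_counts:
            configs = by_obs[tuple(obs)]
            weights = [multinomial_pmf(c, freqs) for c in configs]
            total = sum(weights)
            # E-step: posterior-weighted haplotype counts for this pool.
            for config, w in zip(configs, weights):
                for k, count in enumerate(config):
                    expected[k] += count * w / total
        # M-step: renormalize expected counts into frequencies.
        norm = sum(expected)
        freqs = [e / norm for e in expected]
    return dict(zip(haplotypes, freqs))


# Example: three pools of two individuals each, typed at two SNPs.
estimates = em_pooled_haplotypes([(0, 0), (4, 4), (2, 2)], n_snps=2, pool_size=2)
```

The number of configurations compatible with an observed allele-count vector grows quickly with pool size, which is one way to see why ambiguity increases in larger pools.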

Publication types

Comparative Study
Evaluation Study
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Alleles
DNA / genetics*
DNA / isolation & purification
Databases, Nucleic Acid
Diploidy
Gene Frequency*
Genetics, Population
Genotype
Haplotypes*
Humans
Likelihood Functions
Markov Chains
Models, Genetic*
Monte Carlo Method

Substances

DNA