Inferring population mutation rate and sequencing error rate using the SNP frequency spectrum in a sample of DNA sequences

Xiaoming Liu; Taylor J Maxwell; Eric Boerwinkle; Yun-Xin Fu

doi:10.1093/molbev/msp059

Mol Biol Evol. 2009 Jul;26(7):1479-90. doi: 10.1093/molbev/msp059. Epub 2009 Mar 24.

Authors

Xiaoming Liu¹, Taylor J Maxwell, Eric Boerwinkle, Yun-Xin Fu

Affiliation

¹ Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, TX, USA.

Abstract

One challenge in analyzing samples of DNA sequences is accounting for the nonnegligible number of polymorphisms produced by sequencing error when the error rate is high or the sample size is large. Specifically, these artificial sequence variations bias the observed single nucleotide polymorphism (SNP) frequency spectrum, which in turn may bias estimators of the population mutation rate θ = 4Nμ for diploids. In this paper, we propose a new approach based on the generalized least squares (GLS) method to estimate θ, given a SNP frequency spectrum in a random sample of DNA sequences from a population. With this approach, the error rate ε can be either known or unknown. In the latter case, ε can be estimated given an estimate of θ. Using coalescent simulation, we compared our estimators with other estimators of θ. The results showed that the GLS estimators are more efficient than other θ estimators in the presence of sequencing error, and that the estimate of ε is usable in practice when θ per bp is small. We demonstrate the application of the estimators with a 10-kb noncoding region sequence sampled from a human population and provide suggestions for choosing θ estimators in the presence of sequencing error.
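
To make the bias described above concrete, here is a minimal Python sketch. It is not the GLS estimator proposed in the paper. It relies on the standard neutral-coalescent expectation that the number of sites with i derived copies is θ/i, and on two simplifying assumptions introduced here for illustration only: the spectrum is unfolded (the ancestral state is known), and every sequencing error creates a singleton at an otherwise monomorphic site. All function names are hypothetical.

```python
from typing import Sequence


def harmonic(n: int) -> float:
    """Return a_n = sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))


def watterson_theta(sfs: Sequence[float]) -> float:
    """Watterson's estimator: total segregating sites divided by a_n.

    sfs[i-1] is the number of sites with i derived copies, i = 1..n-1.
    """
    n = len(sfs) + 1
    return sum(sfs) / harmonic(n)


def theta_without_singletons(sfs: Sequence[float]) -> float:
    """Estimate theta from non-singleton classes only.

    Under neutrality E[xi_i] = theta / i, so the expected count of
    non-singleton sites is theta * (a_n - 1).
    """
    n = len(sfs) + 1
    if n < 3:
        raise ValueError("need at least 3 sequences to drop singletons")
    return sum(sfs[1:]) / (harmonic(n) - 1.0)


def excess_singleton_error_rate(sfs: Sequence[float], seq_length: int) -> float:
    """Rough per-base, per-sequence error rate from excess singletons.

    Assumption (illustrative only): each error adds exactly one singleton,
    so the expected excess is n * seq_length * epsilon.
    """
    n = len(sfs) + 1
    excess = max(sfs[0] - theta_without_singletons(sfs), 0.0)
    return excess / (n * seq_length)


# Toy expected spectrum: n = 5 sequences, L = 10,000 bp, theta = 10 per locus,
# epsilon = 1e-4, so errors add n * L * epsilon = 5 extra singletons.
n, L, theta, eps = 5, 10_000, 10.0, 1e-4
clean_sfs = [theta / i for i in range(1, n)]
noisy_sfs = [clean_sfs[0] + n * L * eps] + clean_sfs[1:]

theta_w = watterson_theta(noisy_sfs)            # inflated by the errors
theta_ns = theta_without_singletons(noisy_sfs)  # unaffected in this toy case
eps_hat = excess_singleton_error_rate(noisy_sfs, L)
```

In this toy spectrum Watterson's estimator is pulled above the true θ because it counts the error-induced singletons, while the singleton-free estimate is not. Discarding singletons throws away information, which is one reason a method that uses the whole spectrum and its covariance structure, as GLS does, can be more efficient.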

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Base Sequence
Computer Simulation
Humans
Mutation*
Polymorphism, Single Nucleotide*
Sequence Analysis, DNA*
