Mol Biol Evol. 2009 Jul;26(7):1479-90. doi: 10.1093/molbev/msp059. Epub 2009 Mar 24.

Inferring population mutation rate and sequencing error rate using the SNP frequency spectrum in a sample of DNA sequences.

Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, TX, USA.


One challenge in analyzing samples of DNA sequences is to account for the nonnegligible polymorphisms produced by error when the sequencing error rate is high or the sample size is large. Specifically, these artificial sequence variations bias the observed single nucleotide polymorphism (SNP) frequency spectrum, which in turn may bias estimators of the population mutation rate theta = 4N mu for diploids. In this paper, we propose a new approach based on the generalized least squares (GLS) method to estimate theta, given a SNP frequency spectrum in a random sample of DNA sequences from a population. With this approach, the error rate epsilon can be either known or unknown. In the latter case, epsilon can be estimated given an estimate of theta. Using coalescent simulation, we compared our estimators with other estimators of theta. The results showed that the GLS estimators are more efficient than other theta estimators with error, and that the estimate of epsilon is usable in practice when theta per bp is small. We demonstrate the application of the estimators with a 10-kb noncoding region sequence sampled from a human population and provide suggestions for choosing theta estimators with error.
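
The bias described in the abstract can be illustrated with two classical estimators of theta, Watterson's estimator and Tajima's pi, computed from an unfolded SNP frequency spectrum. The sketch below is not the GLS method of the paper. It assumes, for illustration only, that every sequencing error creates a new singleton, so that roughly n * L * epsilon spurious singletons are added for n sequences of length L. The function names and the numeric values (n = 10, L = 10,000 bp, epsilon = 5e-5, theta = 10) are hypothetical.

```python
from math import comb


def harmonic(n):
    """Return a_n = sum_{i=1}^{n-1} 1/i for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))


def watterson_theta(sfs):
    """Watterson's estimator: number of segregating sites divided by a_n.

    sfs[i - 1] is the number of sites whose derived allele appears
    i times in the sample, for i = 1 .. n - 1.
    """
    n = len(sfs) + 1
    return sum(sfs) / harmonic(n)


def tajima_pi(sfs):
    """Tajima's estimator: mean number of pairwise differences."""
    n = len(sfs) + 1
    weighted = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1))
    return weighted / comb(n, 2)


def expected_sfs(theta, n):
    """Expected spectrum under the standard neutral model: E[xi_i] = theta / i."""
    return [theta / i for i in range(1, n)]


def add_error_singletons(sfs, n_errors):
    """Assumed error model: each sequencing error adds one singleton."""
    noisy = list(sfs)
    noisy[0] += n_errors
    return noisy


# Hypothetical values chosen only for illustration.
n, seq_len, epsilon, theta = 10, 10_000, 5e-5, 10.0
n_errors = n * seq_len * epsilon  # about 5 spurious singletons

clean = expected_sfs(theta, n)
noisy = add_error_singletons(clean, n_errors)

print("clean:", watterson_theta(clean), tajima_pi(clean))
print("noisy:", watterson_theta(noisy), tajima_pi(noisy))
```

On the error-free expected spectrum both estimators return theta. With the added singletons both are inflated, and Watterson's estimator is inflated more than Tajima's pi because it weights every segregating site equally, whereas pi gives singletons the smallest weight.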

[Indexed for MEDLINE]