Genetics. Sep 2000; 156(1): 439–447.

# Usefulness of single nucleotide polymorphism data for estimating population parameters.

Department of Genetics, University of Washington, Seattle, Washington 98195-7360, USA. mkkuhner@genetics.washington.edu

## Abstract

Single nucleotide polymorphism (SNP) data can be used for parameter estimation via maximum likelihood methods as long as the way in which the SNPs were determined is known, so that an appropriate likelihood formula can be constructed. We present such likelihoods for several sampling methods. As a test of these approaches, we consider use of SNPs to estimate the parameter $\Theta = 4N_e\mu$ (the scaled product of effective population size and per-site mutation rate), which is related to the branch lengths of the reconstructed genealogy. With infinite amounts of data, ML models using SNP data are expected to produce consistent estimates of $\Theta$. With finite amounts of data the estimates are accurate when $\Theta$ is high, but tend to be biased upward when $\Theta$ is low. If recombination is present and not allowed for in the analysis, the results are additionally biased upward, but this effect can be removed by incorporating recombination into the analysis. SNPs defined as sites that are polymorphic in the actual sample under consideration (sample SNPs) are somewhat more accurate for estimation of $\Theta$ than SNPs defined by their polymorphism in a panel chosen from the same population (panel SNPs). Misrepresenting panel SNPs as sample SNPs leads to large errors in the maximum likelihood estimate of $\Theta$. Researchers collecting SNPs should collect and preserve information about the method of ascertainment so that the data can be accurately analyzed.
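
To illustrate why the ascertainment method matters, the following is a minimal sketch of one standard way to condition a likelihood on the fact that only variable sites were recorded. The notation is an assumption introduced here, not taken from the article: $G$ is a genealogy, $D_s$ is the data at SNP site $s$, and "inv" is the event that a site is invariant in the sample. The article's own formulas for each sampling method may differ in detail.

```latex
% Full-sequence coalescent likelihood (all sites observed):
L(\Theta) = \int P(G \mid \Theta) \prod_{s} P(D_s \mid G) \, dG

% Sample SNPs: each site is known to be variable in the sample,
% so its probability is renormalized by the chance of being variable:
P(D_s \mid G, \text{variable}) = \frac{P(D_s \mid G)}{1 - P(\text{inv} \mid G)}

% Resulting SNP likelihood under this conditioning:
L_{\text{SNP}}(\Theta) = \int P(G \mid \Theta)
    \prod_{s} \frac{P(D_s \mid G)}{1 - P(\text{inv} \mid G)} \, dG
```

Under this sketch, treating panel SNPs as sample SNPs would apply the wrong normalizing term, since the event being conditioned on (variable in the panel) is not the event the formula assumes (variable in the sample).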


Articles from Genetics are provided here courtesy of **Genetics Society of America**