Genetics. Sep 2001; 159(1): 401–411.
PMCID: PMC1461805

Mutations as missing data: inferences on the ages and distributions of nonsynonymous and synonymous mutations.

Abstract

This article describes a new Markov chain Monte Carlo (MCMC) method applicable to DNA sequence data, which treats mutations in the genealogy as missing data. The method facilitates inferences regarding the age and identity of specific mutations while taking the full complexities of the mutational process in DNA sequences into account. We demonstrate the utility of the method in three applications. First, we demonstrate how the method can be used to make inferences regarding population genetic parameters such as theta (the effective population size times the mutation rate). Second, we show how the method can be used to estimate the ages of mutations in finite sites models and to make inferences regarding the distribution and ages of nonsynonymous and synonymous mutations. We apply the method to two previously published data sets and show that, in one of them, the average age of nonsynonymous mutations is significantly lower than the average age of synonymous mutations, suggesting the presence of slightly deleterious mutations. Third, we demonstrate how the method can be used more generally to evaluate the posterior distribution of a function of a mapping of mutations on a gene genealogy. This application is useful for evaluating the uncertainty associated with methods that rely on mapping mutations on a phylogeny or a gene genealogy.
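The third application, evaluating the posterior distribution of a function of a mapping of mutations, can be illustrated with a minimal Python sketch. It is not the article's implementation. It assumes that an MCMC sampler has already produced a list of sampled mappings, each a list of (kind, age) pairs with kind "N" for nonsynonymous and "S" for synonymous. The function names, the data layout, and the toy values below are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def age_difference(mapping):
    """Mean nonsynonymous age minus mean synonymous age for one sampled mapping.

    `mapping` is a list of (kind, age) pairs; this layout is an assumption.
    Returns None when either class of mutation is absent from the mapping.
    """
    nonsyn = [age for kind, age in mapping if kind == "N"]
    syn = [age for kind, age in mapping if kind == "S"]
    if not nonsyn or not syn:
        return None
    return mean(nonsyn) - mean(syn)


def summarize(samples):
    """Summarize the posterior of the age difference over sampled mappings."""
    diffs = [d for d in map(age_difference, samples) if d is not None]
    if not diffs:
        raise ValueError("no sampled mapping contains both mutation classes")
    return {
        "n_used": len(diffs),
        "mean_diff": mean(diffs),
        # Fraction of samples in which nonsynonymous mutations are younger.
        "prob_nonsyn_younger": sum(d < 0 for d in diffs) / len(diffs),
    }


# Toy, hand-made "posterior samples" (not real data).
toy_samples = [
    [("N", 0.1), ("S", 0.5), ("S", 0.3)],  # nonsynonymous younger
    [("N", 0.2), ("N", 0.4), ("S", 0.6)],  # nonsynonymous younger
    [("N", 0.5), ("S", 0.25)],             # nonsynonymous older
    [("S", 0.7)],                          # skipped: no nonsynonymous mutation
]
summary = summarize(toy_samples)
print(summary)
```

Any other function of a mapping could be substituted for `age_difference`; the summary step stays the same because each sampled mapping is treated as one draw from the posterior.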



Articles from Genetics are provided here courtesy of Genetics Society of America
