# Estimating the distribution of fitness effects from DNA sequence data: Implications for the molecular clock

^{*}To whom correspondence should be addressed. E-mail: a.c.eyre-walker@sussex.ac.uk.

## Abstract

We present a method for estimating the distribution of fitness effects of new amino acid mutations when those mutations can be assumed to be slightly advantageous, slightly deleterious, or strongly deleterious. We apply the method to mitochondrial data from several different species. In the majority of the data sets, the shape of the distribution is approximately exponential. Our results provide an estimate of the distribution of fitness effects of weakly selected mutations and provide a possible explanation for why the molecular clock is fairly constant across taxa and time.

What proportion of mutations are deleterious, neutral, and advantageous? What is the strength of selection that acts on nonneutral mutations? In short, what is the distribution of fitness effects of new mutations? This is one of the most fundamental problems in evolutionary biology, because it lies at the heart of several important questions. It is the question that has been debated for >30 years in the neutralist-selectionist debate (1, 2), but it is also central to our understanding of the molecular clock (3, 4) and the maintenance of genetic variation, at both the molecular and phenotypic levels (5, 6).

The distribution of fitness effects is central to our understanding of the
molecular clock because certain distributions can stabilize the clock
(7). Although there are
exceptions, the molecular clock is remarkably constant over long periods of
time, particularly for amino acid substitutions
(8,
9). Under the neutral theory of
molecular evolution, the rate of substitution per year is equal to
*uf,* where *u* is the nucleotide mutation rate per year and
*f* is the proportion of mutations that are neutral
(2). However, there is no
reason why the mutation rate per year should be constant across taxa; in fact,
there is some evidence that suggests that the mutation rate is higher in
organisms with short generation times
(10-12).
Ohta and Kimura (13) suggested
a solution to this problem. They suggested that there might be a continuum of
allelic effects, from very deleterious through slightly deleterious to neutral
mutations, rather than the two categories of mutations, deleterious and
neutral, proposed under the original neutral theory. Because deleterious
mutations with effects less than 1/*N*_{e} are
effectively neutral, the proportion of effectively neutral
mutations, *f*, is lower in large populations. Thus, the rate of
molecular evolution might be constant if species with short generation times,
and hence fast mutation rates, tended to have large population sizes, and
therefore low proportions of effectively neutral mutations, i.e., *f* and
*u* might be negatively correlated. Ohta
(4) showed theoretically that
this was indeed the case; she showed that if the distribution of fitness
effects was exponential and the mutation rate was proportional to the
effective population size, then the two factors exactly cancelled each other
out to yield a constant rate of molecular evolution. Kimura
(3) later showed that if the
distribution of fitness effects was gamma distributed with a shape parameter
of 1/2, then the two cancelled each other out if the mutation rate was
proportional to the square root of the effective population size.
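The cancellation argument can be written out in one line. As a sketch, assume that the proportion of effectively neutral mutations scales as a power of the effective population size, $f \propto N_e^{-\beta}$, where $\beta$ is the shape parameter of the gamma distribution. The constants $c_1$ and $c_2$ and the symbol $k$ for the substitution rate are introduced here purely for illustration.

```latex
k = u f, \qquad f = c_1 N_e^{-\beta}, \qquad u = c_2 N_e^{\beta}
\;\Longrightarrow\; k = c_1 c_2 .
```

With $\beta = 1$ (an exponential distribution) the rate is constant when $u \propto N_e$; with $\beta = 1/2$ it is constant when $u \propto \sqrt{N_e}$. These are the two cases described in the text.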

Unfortunately, we know relatively little about the distribution of fitness effects, despite its importance. Analysis of mutation accumulation experiments suggests that the distribution of fitness effects for deleterious mutations is highly leptokurtic, with a few mutations having large fitness effects, and the vast majority having mild effects (14, 15). However, these analyses have very little power to tell us about the precise shape of the distribution of fitness effects; in Keightley's analyses, the mutation rate and the shape of the distribution of fitness effects are confounded with one another. The situation is a little better for advantageous mutations. Theoretical work suggests that the distribution of fitness effects is likely to be exponential (16, 17), and recent work with experimental populations of bacteria has confirmed that the distribution is leptokurtic, with the majority of adaptive mutations having small effects (18).

Analyses of DNA sequence data have also shed some light on the distribution
of fitness effects of new mutations. It is evident from the highly conserved
nature of most protein-coding sequences that most amino acid mutations are
strongly deleterious. It has been estimated that ≈70% of all amino acid
mutations have a deleterious effect of >2 × 10^{-5}
(19). It has also become
apparent that there is a class of slightly deleterious mutations, mutations
that are sufficiently weakly selected that they can contribute to polymorphism
and occasionally become fixed. The evidence for this category of mutations is
threefold. First, the ratio of the nonsynonymous to the synonymous
substitution rate is higher in species with smaller effective population
sizes. This has been shown in mammals
(12,
19,
20), *Drosophila*
(19,
21), and birds
(22); it is thought that a
greater proportion of slightly deleterious amino acid mutations are fixed in
the species with the smaller effective population size. Second, nonsynonymous
polymorphisms segregate at lower frequencies than synonymous polymorphisms in
some species. This has been shown in *Drosophila*
(23,
24) and humans
(25), and is thought to be
caused by the segregation of slightly deleterious nonsynonymous mutations.
Finally, it has been shown in several data sets that the ratio of the number
of nonsynonymous (*P*_{n}) to synonymous
(*P*_{s}) polymorphisms is greater than the ratio of
the number of nonsynonymous (*D*_{n}) to synonymous
(*D*_{s}) substitutions. This pattern is commonly seen
in data sets where recombination is rare, including mitochondrial DNA
(26,
27), the self-fertilizing
plant *Arabidopsis thaliana*
(28), and *Escherichia
coli* (N. Smith and A.E.-W., unpublished results). If all mutations were
either strongly deleterious or neutral,
*P*_{n}/*P*_{s} would equal
*D*_{n}/*D*_{s}; this is the
basis of the McDonald-Kreitman (MK) test of neutral molecular evolution
(29). However, if there is a
class of slightly deleterious mutations, they tend to contribute to
polymorphism, but rarely become fixed; they therefore lead to an excess of
nonsynonymous polymorphism. Fay *et al.*
(30) recently estimated that
at least 20% of nonsynonymous mutations in humans are slightly
deleterious.

Work has also started to elucidate the role of adaptive evolution at the
DNA sequence level. Several studies have recently estimated that a substantial
fraction of the amino acid substitutions in higher primates
(30) and *Drosophila*
(24,
31,
32) are a consequence of
adaptive evolution rather than random genetic drift. However, inferring the
number of advantageous mutations is difficult because the number of
substitutions is a function of both the mutation rate to advantageous
mutations and the strength of selection favoring them. We do not currently
have independent estimates of either of these quantities.

Recently, Nielsen and Yang (33) estimated the distribution of fitness effects from DNA sequence data by considering the variation in the rate of substitution between different sites within a gene. They fit a number of distributions to primate mitochondrial DNA data and found some power to differentiate between models. The best fitting models were a normal and a gamma distribution. Both of these distributions fit the data significantly better than an exponential distribution.

Here we introduce a method, based on the MK test, to estimate the distribution of fitness effects from DNA sequence data. The method is suitable for estimating the distribution of fitness effects when there are no strongly advantageous mutations, i.e., all mutations are weakly selected or strongly deleterious.

## Materials and Methods

**The Method.** The method is based on the MK test. In the MK test, we
typically have a number of sequences of a gene from within a species and a
single sequence from a different species. With data of this form, we can count
the number of synonymous (*P*_{s}) and nonsynonymous
(*P*_{n}) polymorphisms, and estimate the number of
synonymous (*D*_{s}) and nonsynonymous
(*D*_{n}) substitutions that have occurred between the
two species. For our method, we also need estimates of the proportion of sites
that are synonymous (ρ_{s}) and nonsynonymous
(ρ_{n}). We deal with the practical aspects of how we
estimate *D*_{s}, *D*_{n},
ρ_{s}, and ρ_{n} later.

Let us assume that synonymous mutations are neutral and that the
distribution of fitness effects of nonsynonymous mutations follows some
distribution *Z*(*S*). Under this model, assuming a standard
Fisher-Wright model of evolution and free recombination we expect to observe

synonymous polymorphisms in a sample of *n* sequences of length
*L* nucleotides (34),
where θ = *4N*_{e}*u* for a diploid or
*2N*_{e}*u* for a haploid, and *u* is
the nucleotide mutation rate per generation. The number of nonsynonymous
polymorphisms we expect to observe is

where

and

*H*(*S, x*) is the time a semidominant mutation with a selective
advantage of *S* spends between *x* and *x* + *dx*
(35,
36).
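The displayed expressions for these expectations are not reproduced in this copy. As a hedged reconstruction from the definitions in the text and standard diffusion theory (not necessarily the authors' exact notation or normalization), the expected polymorphism counts can be written as follows. The symbols $G$ and $Q$ are names introduced here for illustration, and $H$ is scaled so that $H(0,x) = 1/x$.

```latex
\begin{align}
E[P_s] &= L \rho_s \theta \sum_{i=1}^{n-1} \frac{1}{i}, \\
E[P_n] &= L \rho_n \theta \int_{-\infty}^{\infty} Z(S)\, G(S)\, dS, \\
G(S) &= \int_0^1 H(S,x)\, Q(n,x)\, dx, \qquad Q(n,x) = 1 - x^n - (1-x)^n, \\
H(S,x) &= \frac{1 - e^{-S(1-x)}}{x(1-x)\left(1 - e^{-S}\right)} .
\end{align}
```

Here $Q(n,x)$ is the probability that a mutation at population frequency $x$ is polymorphic in a sample of $n$ sequences. In the neutral limit $S \to 0$, $G(S)$ reduces to $\sum_{i=1}^{n-1} 1/i$, so the nonsynonymous expression takes the same form as the synonymous one.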

The expected number of synonymous substitutions is

where λ = 2*ut* is the time of divergence between the two
species under consideration. The expected number of nonsynonymous
substitutions is

where

*F*(*S*) is 2*N*_{e} (or
*N*_{e} for a haploid) times the fixation probability
of a semidominant mutation with selective advantage *S*
(2). Note that we implicitly
assume here, and in the actual implementation of this method, that the time of
divergence is much greater than the age of polymorphisms being considered, and
that we can therefore ignore any contribution polymorphism makes to the
apparent divergence between the two species.
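The divergence expectations can be sketched in the same spirit. Under the definitions in the text, the expected number of synonymous substitutions is plausibly $L \rho_s \lambda$, and the nonsynonymous count is $L \rho_n \lambda$ times the average of $F(S)$ over $Z(S)$. The form $F(S) = S/(1 - e^{-S})$ used below is the standard diffusion result for a semidominant mutation with scaled selection $S$; it is assumed here rather than taken from the displayed equation, and the function names are illustrative.

```python
import math


def relative_fixation(S: float) -> float:
    """F(S) = S / (1 - exp(-S)): fixation probability relative to a neutral mutation."""
    if abs(S) < 1e-8:
        # First-order expansion around the neutral case, where F(0) = 1.
        return 1.0 + S / 2.0
    if S > 0:
        # -expm1(-S) equals 1 - exp(-S) without loss of precision.
        return S / -math.expm1(-S)
    # Equivalent form S * exp(S) / (exp(S) - 1); avoids overflow for very negative S.
    return S * math.exp(S) / math.expm1(S)


def expected_synonymous_substitutions(L: float, rho_s: float, lam: float) -> float:
    """Expected D_s when synonymous mutations are neutral (F = 1)."""
    return L * rho_s * lam
```

The two branches matter in practice because the integration range used later extends to S = -1,000, where a naive evaluation of exp(-S) would overflow.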

Because we have four observations, we can potentially estimate four parameters; we need to estimate θ and λ, but this then leaves us with two degrees of freedom to estimate two parameters that describe the distribution of fitness effects of nonsynonymous mutations.

**Distribution of Fitness Effects.** We have chosen to model the
distribution of fitness effects in four ways. In the first model (model 1), we
assume that all nonsynonymous mutations are equally deleterious with a
selective disadvantage of *S*. In the second model (model 2), we assume
that all nonsynonymous mutations are deleterious, but that they are gamma
distributed:

The gamma distribution provides us with considerable flexibility; the distribution can take a number of shapes, which allows the relative proportions of mutations that are effectively neutral, slightly deleterious (or advantageous), and strongly deleterious to vary independently of each other. For example, if β << 1, then most mutations are either neutral or strongly deleterious, the relative proportions being dictated by the value of α; if β ≈ 1, then a substantial proportion of mutations are neutral, slightly deleterious, and strongly deleterious; and if β >> 1, then most mutations fit into one particular category. Bimodal distributions cannot be modeled by using the gamma distribution.
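The displayed gamma density is not reproduced in this copy. A parameterization consistent with the estimates reported in the Results treats β as the shape and α as the rate of the distribution of the absolute strength of selection; this is inferred from those estimates, not stated explicitly in the text.

```latex
Z(S) = \frac{\alpha^{\beta}}{\Gamma(\beta)}\, (-S)^{\beta - 1}\, e^{\alpha S},
\quad S \le 0,
\qquad
\bar{S} = -\frac{\beta}{\alpha} .
```

As a check on this reading, the human estimates β = 0.39 and α = 0.00027 give $\bar{S} = -0.39/0.00027 \approx -1{,}400$, which matches the reported mean strength of selection to two significant figures.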

However, both model 1 and 2 are unrealistic because they assume that all
mutations are deleterious. It seems likely, particularly for weakly selected
mutations, that each slightly deleterious mutation is matched by a slightly
advantageous mutation; for example, if a T mutation occurs at a site that is
fixed for C, and is disadvantageous with selection -*S*, then a C mutation at the
same site, when it is fixed for T, will have a selective advantage of
+*S*. This is the model used to describe the evolution of synonymous
codon use (for example, see refs.
37 and
38). If we have a site at
which allele A1 has an advantage of +*S* over allele A2 and the
mutation rate is the same between the two alleles, then the time the site will
be fixed for A1 is

(37,
38). This leads to two new
models. If we assume that all pairs of alleles have the same absolute strength
of selection, the realized distribution of fitness effects will be as follows:
a proportion *X*(*S*) of the mutations will be selectively
disadvantageous with selection -*S*, and a proportion
[1-*X*(*S*)] will be selectively advantageous with selection
+*S* (model 1a). For the gamma distribution, the distribution becomes

(model 2a). We might refer to this as a partially reflected gamma (PRG)
distribution, because part of the distribution is reflected around the
*y* axis. Examples of PRG distributions are given in
Fig. 1.
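The proportion of time a site is fixed for the favoured allele can be derived from a flux balance, sketched below. The assumptions are labeled: mutation is equal in both directions, the relative fixation probability is $F(S) = S/(1 - e^{-S})$ (the standard diffusion result, not copied from the missing display), and the function names are illustrative. At equilibrium the rate of leaving A1, $X \cdot F(-S)$, equals the rate of leaving A2, $(1 - X) \cdot F(S)$, which gives the logistic form $X(S) = e^{S}/(1 + e^{S})$ familiar from models of codon usage.

```python
import math


def relative_fixation(S: float) -> float:
    """F(S) = S / (1 - exp(-S)), evaluated stably for either sign of S."""
    if abs(S) < 1e-8:
        return 1.0 + S / 2.0
    if S > 0:
        return S / -math.expm1(-S)
    return S * math.exp(S) / math.expm1(S)


def fraction_fixed_for_favoured(S: float) -> float:
    """Equilibrium proportion of time the site is fixed for A1 (advantage +S)."""
    into_a1 = relative_fixation(S)      # A2 -> A1: the new mutation is advantageous
    out_of_a1 = relative_fixation(-S)   # A1 -> A2: the new mutation is deleterious
    return into_a1 / (into_a1 + out_of_a1)


def logistic_form(S: float) -> float:
    """Closed form e^S / (1 + e^S), written to avoid overflow for large S."""
    return 1.0 / (1.0 + math.exp(-S))
```

Because a site fixed for A1 can only produce deleterious mutations, this proportion is also the proportion of new mutations that are deleterious under model 1a.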

**Fig. 1.** Examples of partially reflected gamma (PRG) distributions; *S* is the strength of selection. The curves in descending order of leptokurtosis are for shape parameters of 0.5, 1, and 2. Each distribution ...

**Parameter Estimation.** To estimate the parameters of our models, we
assumed that *P*_{n}, *P*_{s},
*D*_{n}, and *D*_{s} are
independent Poisson distributed variables, so the likelihood of the data given
the parameters of the model is

where

In reality, *P*_{n}, *P*_{s},
*D*_{n}, and *D*_{s} are
neither independently nor Poisson distributed because recombination is not
free in the data sets we have considered, and we have corrected for multiple
hits in the divergence data. For models 2 and 2a, there was generally a set of
parameters that gave a perfect fit of the model to the data because there are
four parameters and four observations. Although Eqs. **2** and **4**
should be integrated between -∞ and +∞, this was not necessary; it
turned out to be adequate to integrate the functions between -1,000 and 1,000.
To find the maximum likelihood or point estimates, we followed the slope of
steepest ascent, as implemented in the *Mathematica* routine
FindMinimum. *Mathematica* routines to perform the
analyses are available on request.
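The likelihood described here is a product of four Poisson probabilities, one for each observed count given its expectation under the model. A minimal sketch follows; it is written in Python rather than the authors' *Mathematica*, the function names are illustrative, and the counts used in any call are hypothetical.

```python
import math


def poisson_log_pmf(observed: float, expected: float) -> float:
    """Log of the Poisson probability of `observed` given mean `expected`."""
    # lgamma(k + 1) = log(k!) and also accepts non-integer corrected counts.
    return observed * math.log(expected) - expected - math.lgamma(observed + 1.0)


def log_likelihood(observed: dict, expected: dict) -> float:
    """Sum of independent Poisson log-probabilities for Pn, Ps, Dn and Ds."""
    return sum(
        poisson_log_pmf(observed[key], expected[key])
        for key in ("Pn", "Ps", "Dn", "Ds")
    )
```

With four free parameters and four observations, the maximum is reached when every expectation equals its observation, which is why models 2 and 2a can fit the data perfectly.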

For a number of data sets, we estimated the confidence intervals for our maximum likelihood (ML) estimates by performing a random walk of 4,000 steps around the ML parameter estimates by using the Metropolis-Hastings algorithm (39). The confidence intervals estimated by this method are underestimates because free recombination is assumed. Graphical analysis showed that 4,000 steps was sufficient to estimate the confidence intervals.
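The random walk can be illustrated with a toy version of the Metropolis-Hastings algorithm. This is a sketch under stated assumptions, not the authors' routine: it explores a single Poisson mean instead of the model parameters, uses a symmetric Gaussian proposal with a flat prior on positive values, and the step size, seed, and function names are all hypothetical choices.

```python
import math
import random


def poisson_loglik(mu: float, k: float) -> float:
    """Log-likelihood of a Poisson mean `mu` given an observed count `k`."""
    return k * math.log(mu) - mu - math.lgamma(k + 1.0)


def metropolis_walk(k, start, n_steps=4000, step_sd=3.0, seed=1):
    """Random walk of `n_steps` around `start` using a symmetric proposal."""
    rng = random.Random(seed)
    current, current_ll = start, poisson_loglik(start, k)
    chain = []
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step_sd)
        if proposal > 0:  # proposals outside the parameter space are rejected
            diff = poisson_loglik(proposal, k) - current_ll
            # Accept with probability min(1, L(proposal) / L(current)).
            if rng.random() < math.exp(min(0.0, diff)):
                current, current_ll = proposal, current_ll + diff
        chain.append(current)
    return chain


def central_interval(chain, mass=0.95):
    """Empirical central interval of the visited values."""
    ordered = sorted(chain)
    tail = (1.0 - mass) / 2.0
    lower = ordered[int(tail * len(ordered))]
    upper = ordered[int((1.0 - tail) * len(ordered)) - 1]
    return lower, upper
```

Starting the walk at the maximum likelihood estimate, as described in the text, avoids the need for a burn-in period in this simple setting.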

**Data.** We have applied our method to mitochondrial data from several
species. To compile the data, we considered each of the data sets given in the
compilations of Nachman (26),
Rand and Kann (40), and Gerber
*et al.* (27). If
several data sets shared sequences in common, we randomly selected a data set
so that data sets were independent. To these data sets we added a data set of
complete human mitochondrial sequences with chimpanzee used as the outgroup
(41). For each of these data
sets, we took a single sequence from each of the two species being considered
and calculated the numbers of synonymous and nonsynonymous substitutions by
using the fcodon model of Goldman and Yang
(42) as implemented in
PAML (43). We
excluded any data set in which there were more than two synonymous
substitutions per site (full details of all data sets analyzed can be found in
Table 5, which is published as supporting information on the PNAS web site,
www.pnas.org).
This left us with 18 of a total of 26 data sets. We used the polymorphism
counts given by Nachman (26),
Rand and Kann (40), and Gerber
*et al.* (27), unless
they combined polymorphism counts from different species, in which case we
selected the species with the greatest number of sequences and calculated the
number of polymorphisms by using DnaSP
(44). To estimate the
proportion of sites that are nonsynonymous and synonymous, we used the
estimates from the Goldman-Yang method; this method estimates
the proportion of sites as the proportion of mutations that are nonsynonymous
and synonymous, and its estimates are therefore appropriate for our application
(42). The data are summarized
in Table 1.

## Results

Although our method for estimating the distribution of fitness effects is seemingly quite general, it can in practice only be applied to data sets in which there are few strongly advantageous mutations. This is because advantageous mutations decouple polymorphism and substitution: if the advantageous mutations are under directional selection, they contribute little to polymorphism, and if they are under balancing selection, they contribute little to divergence. We have therefore applied the method to data sets in which the data appear to be dominated by deleterious mutations, namely, data sets in which an MK test shows an excess of amino acid polymorphism (Table 1). Such a pattern is most readily interpreted as being caused by the segregation of slightly deleterious mutations in a gene that has undergone little adaptive substitution.

It has previously been reported that many mitochondrial DNA data sets show
an excess of amino acid polymorphism in an MK test. We compiled data from 18
pairs of species, which are summarized in
Table 1. As in previous
analyses, we find that the vast majority of data sets show an excess of amino
acid polymorphism (16 of 18, *P* < 0.01); this is also true if we
analyze those data sets that were excluded because their level of synonymous
substitution was too high (8 of 8 data sets *P* < 0.01; see Table
5). The proportion of data sets showing an excess of amino acid polymorphism
is somewhat higher than others have found, because we have corrected for
multiple substitutions.
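The text does not name the test behind these *P* values. One simple possibility, assumed here purely for illustration, is a two-sided sign test in which each data set is equally likely to show an excess or a deficit of amino acid polymorphism under the null hypothesis. The function name is hypothetical.

```python
from math import comb


def sign_test_two_sided(successes: int, n: int) -> float:
    """Two-sided binomial sign test against a probability of 0.5."""
    extreme = max(successes, n - successes)
    upper_tail = sum(comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
    return min(1.0, 2.0 * upper_tail)
```

Under this assumption, both 16 of 18 and 8 of 8 give probabilities below 0.01, in line with the values quoted above.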

To begin our analysis, we fit a simple model in which we assumed that all
mutations were equally deleterious (model 1). Estimates of the average
strength of selection are given in Table
2. Our estimates are slightly different to those given by Nachman
(26) because he did not
correct the divergence for multiple hits. Interestingly, the fit of the model
is often poor and can be rejected in a goodness-of-fit test in 12 of the 18
data sets. However, it should be noted that the goodness-of-fit test is only
approximate because *P*_{n},
*P*_{s}, *D*_{n}, and
*D*_{s} are not multinomially distributed as assumed
under the test (see above).

The poor fit of the model could arise because we have assumed that all mutations are deleterious; it seems likely that if some mutations are slightly deleterious, then others will be slightly advantageous. However, model 1a fits no better than model 1 (Table 2).

A more likely reason for the poor fit of model 1 is that there is variation
in the strength of selection between mutations, with some mutations being very
deleterious, whilst others are only mildly deleterious or neutral. We
therefore fit a model in which all mutations were assumed to be deleterious,
but in which the strength of selection was assumed to be gamma distributed
(model 2). The model fits the data perfectly in all but two data sets, but
this is not surprising because we have four parameters for the four
observations (Table 3). The
model does not fit the data from *Pomatostomus* (PT) and *Gila*
(GC) because these data sets do not show an excess of amino acid polymorphism;
the best fitting model appears to be one in which α is infinitely
large and β is infinitely small.

The shape of the gamma distribution, as measured by the parameter β,
is quite consistent across data sets: the value lies between 0 and 4.7, with
the majority of data sets being between 0.2 and 1.0; the mean of β is
0.93 (SE = 0.26). In contrast to the shape, the location of the distribution,
as measured by either α or the mean strength of selection, varies by
several orders of magnitude between data sets: e.g.,
$\bar{S}$ varies from -15 to -920,000. The
values of α and β tend to be very similar under model 2a, when some
of the mutations are allowed to be slightly advantageous
(Table 3). For all data sets
α and β are greater than under model 2, with average strength of
selection being somewhat lower, as we would expect given that some of the
mutations are advantageous.

Unfortunately, because we have corrected the number of substitutions for
multiple hits, and because there is little or no recombination in
mitochondrial DNA, we cannot estimate confidence intervals for the parameters
or test whether there is significant variation in α or β (or
$\bar{S}$) between data sets. However, we can
estimate the minimum confidence interval by assuming that
*D*_{n}, *D*_{s},
*P*_{n}, and *P*_{s} are
Poisson distributed, i.e., we assume that there is free recombination in our
data sets and that we have not corrected for multiple hits. If we do this, we find
the confidence intervals for β to be generally quite small, and those for
α and $\bar{S}$ to be very large. For
example, for model 2 in humans β = 0.39 with confidence intervals of 0.36
and 0.50, α = 0.00027 (0.00015, 0.0012), and
$\bar{S}$ = -1,400 (-420, -2,400), and in
*Drosophila melanogaster* β = 0.58 (0.48, 2.0), α = 0.00035
(0.000097, 0.081), and $\bar{S}$ = -1,700
(-25, -5,000).

## Discussion

We have developed a method to estimate the distribution of fitness effects from a combination of polymorphism and divergence data. The method can be applied to any data set in which one category of mutations is neutral and the other category is weakly advantageous, neutral, or deleterious. The method cannot be applied to data sets in which there are many strongly advantageous mutations. We have applied our method to a range of mitochondrial data sets in which the ratio of nonsynonymous to synonymous changes is greater for polymorphism than substitution. This pattern appears to be remarkably consistent across mitochondrial data sets and is consistent with a low rate of adaptive amino acid substitution and the segregation of slightly deleterious mutations.

We have shown that many data sets do not appear to fit simple models in which all mutations are equally deleterious, or in which the absolute strength of selection is the same for all mutations, but some mutations are advantageous and some are deleterious. This is perhaps not surprising, because there is ample evidence from studies of mutations with measurable phenotypic effects that mutations vary considerably in their effects on fitness (45).

A model in which the strength of selection varies according to a gamma
distribution fits all but two of the data sets perfectly; the data sets that
do not fit the model perfectly are those that do not show an excess of amino
acid polymorphism. The shape of the gamma distribution varies relatively
little between data sets; β varies between 0 and 4.7 for model 2, and
between 0 and 5.3 for model 2a. The mean shape parameters are 0.93 (0.26) and
1.2 (0.3), respectively, where the numbers in parentheses are standard errors.
These results contrast strongly with those of Nielsen and Yang
(33), who estimated the
distribution of fitness effects by considering the variation in the rate of
substitution between sites in primate mitochondrial DNA. They found that a
gamma distribution with a shape parameter of 3.22 (or a normal distribution)
fit the data significantly better than an exponential distribution; the best
fitting gamma distribution was one in which a substantial fraction of the
mutations had *S* values between -5 and -0.1. In contrast, the
distributions we have estimated have relatively few mutations in the range -5
< *S* < -0.1; for example, if we consider the distribution we
have estimated from humans, just 7% of the mutations lie in the range -5 <
*S* < -0.1, whereas this fraction is 98% in the gamma distribution
estimated by Nielsen and Yang
(33) (and about half of this
when they include a fraction of strongly deleterious mutations). The reason
for the difference between their results and ours is not obvious; both methods
make many assumptions and use rather different data. Possibly the most
conspicuous difference between their method and ours, besides the use of
polymorphism data, is their assumption that the strength of selection on new
mutations is constant through time at a particular codon. In contrast, we make
the assumption that just the distribution of *S* across sites is
constant through time. More work will be needed to resolve the differences
between the results from these two methods.

Gamma distributions with the shapes we have estimated from mitochondrial
DNA have the interesting property that the average probability of fixation is
proportional to a function of the inverse of the effective population size
[*f* ≈ (1/*N*_{e})^{β}]
(3,
4,
46). These distributions
therefore have the potential to make the molecular clock more robust if
species with high mutation rates tend to have large effective population
sizes. We might expect population size and mutation rate to be correlated,
because species with large effective population sizes tend to have short
generation times (46), and
there is some evidence that species with short generation times have high
mutation rates
(10-12).
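The scaling of the average fixation probability with effective population size can be checked numerically. A minimal sketch, assuming model 2 (all mutations deleterious), β as the shape and α as the rate of the gamma distribution of the absolute strength of selection, and a relative fixation probability of $|S|/(e^{|S|} - 1)$: under these assumptions the average has the series form $\beta \alpha^{\beta} \sum_{k \ge 1} (\alpha + k)^{-(\beta + 1)}$. Because *S* is proportional to *N*_{e}, a 10-fold larger population corresponds to a 10-fold smaller α. The function names are illustrative.

```python
import math


def mean_relative_fixation(alpha: float, beta: float, terms: int = 10000) -> float:
    """Average of |S| / (exp(|S|) - 1) over a gamma(shape=beta, rate=alpha) density."""
    head = sum((alpha + k) ** -(beta + 1.0) for k in range(1, terms + 1))
    # Remaining terms approximated by the integral from terms + 1/2 to infinity.
    tail = (alpha + terms + 0.5) ** -beta / beta
    return beta * alpha ** beta * (head + tail)


def scaling_exponent(alpha: float, beta: float, fold: float = 10.0) -> float:
    """Log-log slope of the average fixation probability against N_e."""
    f_small_population = mean_relative_fixation(alpha, beta)
    f_large_population = mean_relative_fixation(alpha / fold, beta)
    return math.log(f_large_population / f_small_population) / math.log(fold)
```

For parameter values in the range estimated here, the slope is close to -β, which is the sense in which *f* behaves as (1/*N*_{e})^{β}.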

However, if the distribution of fitness effects is exponential (i.e.,
β = 1), as our data suggest, then the ratio of the nonsynonymous to the
synonymous substitution rate, hereafter the
*d*_{n}/*d*_{s} ratio, is
expected to be proportional to the inverse of the effective population size, and
this is not seen for the limited data we have. We are assuming here, as we do
in our method to infer the distribution of fitness effects, that synonymous
mutations are neutral and that nonsynonymous mutations are deleterious;
*d*_{n}/*d*_{s} therefore
provides an estimate of *f,* the proportion of mutations that are
effectively neutral, or equivalently, the average probability of fixation
relative to that of neutral mutations. The effective population size of
*Mus domesticus* appears to be ≈10-fold greater than that of
hominids both for nuclear (19)
and mitochondrial genes (unpublished results). However, the
*d*_{n}/*d*_{s} ratio is
≈2-fold higher in human-chimpanzee than mouse-rat for nuclear genes
(19), and also 2-fold higher
in human-chimpanzee than *M. domesticus*-*Mus spretus* for
mitochondrial genes (see Table
1). So, given the difference in effective population size and the
mean estimate of β, we would expect a 10-fold difference in the
*d*_{n}/*d*_{s} ratio; this is
not observed; the difference is only 2-fold. However, one might argue that
data sets that do not reject model 1 are uninformative about the distribution
of fitness effects and therefore should be ignored. If we ignore those data
sets, then the mean value of β = 0.52 (0.09); under such a distribution,
we would predict that the
*d*_{n}/*d*_{s} ratio should
be ≈3-fold higher in hominids than rodents, which is more consistent with
what is observed.
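The expected fold differences quoted in this paragraph follow from the power-law scaling of *f* with effective population size. A short worked sketch, assuming *d*_{n}/*d*_{s} is proportional to (1/*N*_{e})^{β}; the function name is illustrative.

```python
def expected_fold_difference(ne_ratio: float, beta: float) -> float:
    """Fold difference in dn/ds expected for a given ratio of effective population sizes."""
    return ne_ratio ** beta


# A 10-fold difference in effective population size under three shape parameters.
exponential_case = expected_fold_difference(10.0, 1.0)   # exactly exponential
mean_shape_case = expected_fold_difference(10.0, 0.93)   # mean estimate over all data sets
restricted_case = expected_fold_difference(10.0, 0.52)   # data sets that reject model 1
```

With β = 1 the expectation is 10-fold and with the mean estimate of 0.93 it is about 8.5-fold, both well above the observed 2-fold difference; with β = 0.52 it is about 3.3-fold, which corresponds to the ≈3-fold figure given in the text.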

We have made a number of assumptions in developing our method. First, we
have assumed that there are few strongly advantageous mutations. Advantageous
mutations potentially have two effects, a direct effect and an indirect,
secondary effect. If there are some strongly advantageous mutations, then a
proportion of the nonsynonymous substitutions or nonsynonymous polymorphisms
are a consequence of adaptive evolution. To investigate the consequences of
adaptive substitution and polymorphism, we reanalyzed the human data set,
assuming that 25, 50, and 75% of the substitutions were a consequence of
adaptive substitution; to do this, we reduced *D*_{n}
by 25, 50, and 75%, respectively. Likewise, to investigate the effect of
balancing selection on some amino acid mutations, we reduced
*P*_{n} by 25, 50, and 75%. The results are presented
in Table 4. Interestingly,
adaptive substitution has relatively little effect on the estimates of β.
The effect of balanced polymorphism is a little more marked, particularly on
α, but even here the basic nature of the distribution is not greatly
affected.

**Table 4.** Parameter estimation under model 2a, assuming that a proportion of substitutions or polymorphisms are adaptive

Advantageous mutations will also have indirect effects either by the process of genetic hitchhiking (6), in the case of adaptive substitutions, or by leading to the effective subdivision of the population, in the case of a balanced polymorphism. We have also assumed that the population size is stationary, that sampling was random, that synonymous mutations are neutral, and that there is free recombination. The direct effect of assuming free recombination, when there is in fact little or no recombination, would be to lead to an underestimate of the variance associated with our estimates. The indirect effect of assuming free recombination is to ignore the effects of genetic hitchhiking (6), background selection (5), and weak Hill-Robertson interference (37). The fact that the shape parameter estimate is fairly constant across data sets, which come from diverse taxa, suggests that this result is robust to these complications.

It is, at first sight, puzzling why the estimate of the shape parameter is
consistent across data sets and robust to assumptions about the level of
advantageous mutation. However, the results are perhaps not surprising given
two facts: (*i*) there is an excess of amino acid polymorphism in
almost all mitochondrial DNA data sets, and (*ii*) there is variation
in the strength of selection on amino acid polymorphisms. Between them, these
two facts constrain the shape parameter: the shape parameter cannot be too
small or there would be very few slightly deleterious mutations; when β
< 0.1, almost all mutations are either strongly deleterious or neutral.
However, the shape parameter cannot be too large, because we know there is
variation in the strength of selection acting on deleterious mutations
(45).

We have assumed that the distribution of fitness effects is gamma distributed because this is a flexible monotonic distribution. However, other distributions would fit the data; for example, a model in which a proportion of mutations are slightly deleterious, and the remaining mutations are strongly deleterious fits the data, as does a model in which the strongly deleterious class is replaced by a neutral class (although, of course, the proportion of slightly deleterious mutations and the strength of selection acting on them would differ substantially between the two models). In fact, many distributions may fit the data; it is therefore best to regard our analysis as demonstrating that a gamma distribution is consistent with the mitochondrial DNA data.

## Acknowledgments

We thank Sebastien Gourbiere, Nicolas Bierne, Peter Keightley, and two anonymous referees for helpful comments. G.P. is supported by a Federation of European Biochemical Societies long-term fellowship. A.E.W. is supported by the Royal Society and the Biotechnology and Biological Sciences Research Council.

## Notes

This paper was submitted directly (Track II) to the PNAS office.

Abbreviation: MK, McDonald-Kreitman.

## References

**,** 3440-3444.

**,** 1289-1303.

**,** 23-35.

**,** 263-286.

**,** 573-639.

**,** 344-350.

**,** 610-621.

**,** 330-342.

**,** 18-25.

**,** 1993-1999.

**,** 3823-3827.

**,** 1519-1526.

**,** 202-215.

**,** 1113-1117.

**,** 2142-2149.

**,** 56-63.

**,** 1297-1307.

**,** 874-881.

**,** 221-238.

**,** 1024-1026.

*et al.* (1999) Nat. Genet. 22**,** 231-238.

**,** 61-69.

**,** 539-566.

**,** 385-399.

**,** 652-654.

**,** 1227-1234.

**,** 531-534.

**,** 1022-1024.

**,** 1231-1239.

**,** 256-276.

**,** 253-259.

**,** 1161-1176.

**,** 337-345.

**,** 897-907.

**,** 1087-1095.

**,** 393-407.

**,** 708-713.

**,** 725-736.

**,** 555-556.

**,** 174-175.

**,** 11-20.

**,** 688-690.

**National Academy of Sciences**

Estimating the distribution of fitness effects from DNA sequence data: Implications for the molecular clock. Proceedings of the National Academy of Sciences of the United States of America. 2003 Sep 2; 100(18): 10335.
