- We are sorry, but NCBI web applications do not support your browser and may not function properly. More information

- Journal List
- Genetics
- v.177(2); 2007 Oct
- PMC2034655

# Unbiased Estimator for Genetic Drift and Effective Population Size

^{*}Department of Biology, Centre for Ecological and Evolutionary Synthesis (CEES), University of Oslo, N-0316 Oslo, Norway and

^{†}Department of Zoology, Division of Population Genetics, Stockholm University, S-106 91 Stockholm, Sweden

^{1}

*Corresponding author:*Institute of Marine Research, Nye Flødevigveien 20, N-4817 His, Norway. E-mail: on.oiu.oib@edroj.e.p

## Abstract

Amounts of genetic drift and the effective size of populations can be estimated from observed temporal shifts in sample allele frequencies. Bias in this so-called temporal method has been noted in cases of small sample sizes and when allele frequencies are highly skewed. We characterize bias in commonly applied estimators under different sampling plans and propose an alternative estimator for genetic drift and effective size that weights alleles differently. Numerical evaluations of exact probability distributions and computer simulations verify that this new estimator yields unbiased estimates also when based on a modest number of alleles and loci. At the cost of a larger standard deviation, it thus eliminates the bias associated with earlier estimators. The new estimator should be particularly useful for microsatellite loci and panels of SNPs, representing a large number of alleles, many of which will occur at low frequencies.

RECENT years have seen an increased interest in using temporal shifts in allele frequencies to estimate the genetically effective size of natural and managed populations. This so-called temporal method relies on sampling individuals from a population at two or more times and estimating the amount of genetic drift in the interim. Typically, two samples, of *n _{x}* and

*n*individuals, respectively, are drawn at random from the population one (or more) generations apart. Screening the samples for a number of polymorphic loci, the most commonly used measures of allele frequency change are those of Nei and Tajima (1981),

_{y}and Pollak (1983),

where *a* is the number of alleles at the locus and *x _{i}* and

*y*are the observed frequencies of the

_{i}*i*th allele in the two samples, respectively, with a (unweighted) mean of

*z*.

_{i}The expected values of the measures defined above depend on how the samples were drawn from the population, and two different sample plans are recognized (Nei and Tajima 1981; Waples 1989). Under plan I, individuals are sampled after they have reproduced, or they are sampled nondestructively and subsequently returned to the population before reproduction occurs. Under plan II, in contrast, individuals are sampled before they reproduce and are not returned to the population. In the latter case, an estimator of genetic drift over the *t* generations separating the samples is obtained from the observed temporal allele frequency change, by subtracting the expected contribution from sampling from the quantities computed in Equations 1 or 2, resulting in

where denotes the harmonic mean sample size, and *F* represents *F*_{c} or *F _{k}*. When sampling follows plan I the actual number of individuals (

*N*) in the population when the first sample was drawn is also a parameter,

In practical applications of the temporal method several polymorphic loci are typically scored and an average estimate of drift over loci is calculated. If the number of generations (*t*) between the two samples is not too large (*cf.* Luikart *et al.* 1999), an estimate of effective population size is obtained from the average *F*′ as

The procedure outlined above should yield approximately correct estimates of genetic drift and effective population size, provided that the various assumptions underlining the method are met and that a sufficiently large number of loci are used. While computer simulations have typically verified this expectation, bias has been noted in specific applications. Waples (1989) noted that expressions (1) and (2) tended to be downward biased (resulting in an upward bias of *N*_{e}) when allele frequencies are close to 0 (zero) or 1 (unity). This problem was addressed in detail by Turner *et al.* (2001), who found that bias is a complex function of population allele frequency, true effective size, sample sizes, and number of generations between the samples. A bias for small sample sizes is also apparent in the computer simulations presented by Nei and Tajima (1981) and Waples and Yakota (2007). A downward bias in *F*, and subsequent overestimate of *N*_{e}, was observed in severely bottlenecked (*N*_{e} ≤ 6) populations by Richards and Leberg (1996) and Luikart *et al.* (1999).

While different sources of bias appear to affect the temporal method, no systematic analysis of this problem has been published and various suggestions to reduce bias have been put forward. Here, we characterize bias in the expressions above and propose an alternative estimator for drift and effective size that is also unbiased for small samples and highly skewed allele frequencies.

## THEORETICAL CONSIDERATIONS

#### Expectation of *F*:

To quantify bias in *F*, as defined in Equations 1 and 2, we calculated exact expected values for those expressions over a range of effective sizes (*N*_{e}), initial population allele frequencies (*q*_{1}), and sample sizes (*n _{x}* and

*n*) and compared these expected values with the true amount of drift (1/2

_{y}*N*

_{e}per generation). The calculations were carried out assuming a diploid organism by summing

*E*(

*F*) over all combinations of sample- and population-allele frequencies, weighing by the probabilities of those combinations. Briefly, sampling of individuals under plan I was considered equivalent to random drawing of 2

*n*genes without replacement from a pool of 2

*N*genes, where

*N*is the actual number of individuals in the population and equals the effective size under the Wright–Fisher model. The first sample of

*n*individuals was taken when the population allele frequency was

_{x}*q*

_{1}. The probability of observing

*X*copies of the allele in this sample is given by the hypergeometric probability density function, Hyper(

*X*, 2

*n*, 2

_{x}*N*,

*q*

_{1}). Under plan II, the sampled individuals are permanently removed from the population and, as shown by Waples (1989), sampling can then be regarded as being carried out on the conceptually infinite gene pool preceding the generation. Sampling is thus independent of population size and the probability of

*X*was calculated from the binomial probability density function, Bin(

*X*, 2

*n*,

_{x}*q*

_{1}). Assuming one generation between sampling events, we considered a Wright–Fisher model and calculated probabilities of new population allele frequencies,

*q*

_{2}, in the next generation when the second sample is to be drawn. Reproduction in this model is equivalent to random sampling of 2

*N*

_{e}genes from an infinite gene pool with allele frequency

*q*

_{1}, and the probability of obtaining

*Q*

_{2}copies of the allele in the second generation was calculated as Bin(

*Q*

_{2}, 2

*N*

_{e},

*q*

_{1}). The second sampling event is analogous to the first, except that the allele frequency is now

*q*

_{2}, and the sample size

*n*may be different. The expected value of

_{y}*F*is thus calculated as the value,

*F*(

*X*,

*Y*), obtained from Equations 1 or 2, when

*X*and

*Y*copies of the allele are observed in the first and second sample, respectively, times the probability of observing those numbers. Thus, under sample plan I (assuming

*N*

_{e}=

*N*),

and, under plan II,

We exclude cases when both samples are fixed for the same allele (*i.e.*, *X* = *Y* = 0 or *X* = 2*n _{x}* and

*Y*= 2

*n*), as these situations lead to undefined

_{y}*F*

_{c}and

*F*. The expectations (6) and (7) are adjusted accordingly, by dividing with the probability of no joint fixation in the two samples.

_{k}Variation in *F*_{c} and *F _{k}* over the range of sample sizes (here,

*n*=

_{x}*n*, ranging from 20 to 100) and initial allele frequencies (

_{y}*q*

_{1}, ranging from 0.005 to 0.995) are apparent from the plots of

*E*(

*F*) − 1/

*n*+ 1/

*N*(sample plan I) and

*E*(

*F*) − 1/

*n*(plan II), depicted in Figure 1. Because

*F*should equal 1/(2

*N*

_{e}) regardless of sample size and population allele frequency, this variation demonstrates that

*F*

_{c}and

*F*are not generally unbiased. When samples are small, as compared to effective size,

_{k}*F*

_{c}tends to be downward biased yielding estimates of

*N*

_{e}that are too large, whereas

*F*is upward biased so that becomes too small when this estimator is used. Because of the inverse relationship between

_{k}*F*and

*N*

_{e}, bias in

*F*can translate to a substantial bias in This source of bias, being in opposite directions for

*F*

_{c}and

*F*, was noted in recent computer simulations by Waples and Yakota (2007). This bias is apparent only when samples are quite small, and

_{k}*F*

_{c}and

*F*yield similar results when samples are larger. Another source of bias is apparent from Figure 1 when the locus is close to fixation,

_{k}*i.e.*, with

*q*

_{1}approaching 0 or 1. For such skewed allele frequencies both

*F*

_{c}and

*F*are downward biased for all sample sizes under sample plan II (Figure 1, B and D), as noted previously by Waples (1989), Turner

_{k}*et al.*(2001), and others. Under sample plan I, on the other hand, the bias is in the opposite direction and declines as a larger fraction of the population is sampled and largely disappears with exhaustive sampling (

*n*=

*N*; Figure 1, A and C). This behavior of

*F*

_{c}and

*F*under sample plan I has not been reported before and sample plan I has generally received little attention in the literature.

_{k}*F*

_{c}(Equation 1) and

*F*(Equation 2), after correction for sampling by subtracting 1/

_{k}*n*(plan II) or 1/

*n*− 1/

*N*(plan I). Expected values were calculated from Equations 6 (plan I) and 7 (plan II) for a population of effective size

**...**

The different types of bias described above demonstrate that the sampling adjustments to *F* made in expressions (3) and (4) are inadequate when samples are small or allele frequencies are skewed. Those expressions were originally derived by approximating the expected value of *F* by the ratio of the expected values of its numerator and denominator separately (Nei and Tajima 1981; Waples 1989). For *F _{k}* (Equation 2), which can be rewritten

*F*= (

_{k}*x*−

*y*)

^{2}/

*z*(1 −

*z*) in the diallelic case, the expected value has thus been approximated as

*E*(

*F*) ≈

_{k}*E*[(

*x*−

*y*)

^{2}]/

*E*[

*z*(1 −

*z*)]. As noted by Turner

*et al.*(2001), however, a better approximation is given by

where *E* denotes the expected value operator and Var and Cov are the variance and covariance, respectively.

Expressions equivalent to (8) were used by Turner *et al.* (2001) to evaluate bias in the temporal method and by Raufaste and Bonhomme (2000) to evaluate bias in an estimator devised by Robertson and Hill (1984) for spatial genetic differentiation, *F*_{ST}. Turner *et al.* (2001) derived an expression for bias in *F*_{c} (their Equation A13), but applying their expression as a means to correct for bias does not generally yield accurate results. As is evident from the numerical examples presented in Figure 2A, this bias correction tends to overcompensate for the negative bias in the unmodified *F*_{c} (*cf.* Figure 1B), resulting in a positive bias that is excessive for small samples. Raufaste and Bonhomme (2000) took a different approach and developed a correction for bias by making a number of simplifying assumptions (one of which was that allele frequencies are not highly skewed). Adopting their bias correction [their expression (24)], which was originally intended for *F*_{ST}, to the temporal estimator *F _{k}* yields only a marginal improvement in bias of the latter when samples are reasonably large (Figure 2B). Both the Turner

*et al.*(2001) and the Raufaste and Bonhomme (2000) corrections apply to sample plan II (explicitly or implicitly) and no expression has so far been put forward to correct for bias under sample plan I.

*F*(

*X*,

*Y*): (A) following Turner

**...**

Numerical evaluation of expression (8) indicates that the approximation inherent in this expression may be poor when allele frequencies are skewed. As seen from Table 1, which refers to sample plan II, the components on the right-hand side (*i.e.*, *A* − *B* + *C*) do not quite add up to the left-hand value of *E*(*F _{k}*) when allele frequencies are close to zero (and unity). This implies that expression (8) is unsatisfactory as a starting point for developing a correction for bias in

*F*(or

_{k}*F*

_{c}) under those conditions. Inspection of Table 1 indicates, however, that the ratio

*A*=

*E*[(

*x*−

*y*)

^{2}]/

*E*[

*z*(1 −

*z*)] is indeed independent of population allele frequency and depends only on sample size

*n*, which is known, and

*N*

_{e}, which is to be estimated.

#### A new measure of *F*:

The above observations suggest that an unbiased estimator for *N*_{e} is also possible for extreme allele frequencies if *E*[(*x* − *y*)^{2}] and *E*[*z*(1 − *z*)] are calculated separately. Thus, we propose the following measure of temporal change:

This measure differs from those considered previously in that the numerator and the denominator are estimated separately by summing over all alleles before carrying out the division. This approach is equivalent to weighing each allele *i* by (Raufaste and Bonhomme 2000, p. 288), as was recommended by Reynolds *et al.* (1983) and Weir and Cockerham (1984) when estimating spatial genetic differentiation (*F*_{ST}). Conversely, the original estimator *F _{k}* (Equation 2) weights each allele by

*w*= (1 −

_{i}*z*)/(

_{i}*a*− 1), which corresponds to the weighing scheme used by Robertson and Hill (1984).

#### Relating *F*_{s} to *N*_{e}:

To find an unbiased estimator for effective population size it remains to find exact expressions for the expectations of the numerator and the denominator of Equation 9. In the derivations we consider a diallelic locus and make the simplifying assumption that generations are discrete (nonoverlapping) and that samples are drawn exactly one generation apart, following either sample plan I or II. The expectation of the numerator of Equation 9 under these assumptions has been found previously (Waples 1989) and is for sample plan II

where, as before, *n _{x}* and

*n*are the number of individuals in the first and in the second sample, respectively, and

_{y}*q*

_{1}is the population allele frequency in the generation when the first sample was drawn. The corresponding expression for sample plan I is similar but includes an additional term, −1/

*N*,

*i.e.*, the inverse of the actual number of individuals in the population (see below).

An exact expression for the expected value of *z*(1 − *z*) under the stipulated conditions can be found as follows: , where Var(*x*) is the sample variance of the allele frequency in the first sample, *q*_{1}(1 − *q*_{1})/2*n _{x}*, and Var(

*y*) is that in the second sample, Cov(

*x*,

*y*) is the covariance between the sample frequencies

*x*and

*y*and equals

*q*

_{1}(1 −

*q*

_{1})/2

*N*under sample plan I and zero under plan II (Waples 1989). Putting this together, we have the expected value of the denominator of

*F*

_{s}under sample plan II:

The expression under sample plan I is similar but includes an additional term, 1/(4*N*), arising from the covariance between *x* and *y* (*cf.* Equation 14 below).

Carrying out the division of expression (10) by (11) eliminates the population allele frequency, *q*_{1} and yields the expected value for Equation 9. Under sample plan II

The expression for plan I is similar, but includes the additional terms −1/*N* and 1/(4*N*) in the numerator and in the denominator, respectively (*cf.* Equation 14).

Substituting *F*_{s} for 1/(2*N*_{e}) in expression (12) and rearranging yields an estimator for drift that should be unbiased under sample plan II,

where is the harmonic mean of the sample sizes *n _{x}* and

*n*. This expression is similar to Equation 14 of Nei and Tajima (1981), but includes the additional factor 1 − 1/(2

_{y}*n*). Under plan I the actual census size (

_{y}*N*) of the population at the time of first sampling is also a factor, so that

## COMPUTER SIMULATIONS

#### Methods:

The expected value of *F*_{s} derived above (Equation 12) is exact under a Wright–Fisher model for the stipulated conditions of a diallelic locus sampled twice, one generation apart. It should be valid also when *N*_{e} ≠ *N*, for multiallelic loci, and at least approximately so for samples collected more than one generation apart. We used computer simulations to check on these conjectures. The simulations started with a population consisting of 2*N* different genes in the first generation, *i.e.*, with *a* = 2*N* different alleles, each with a frequency of *q*_{1} = 1/(2*N*). Both ideal (*N*_{e} = *N*) and nonideal (*N*_{e} < *N*) populations were simulated. For the case of an ideal (Wright–Fisher) population, all 2*N* genes were equally likely to be drawn for the offspring generation (random drawing with replacement), whereas for nonideal populations we first picked (without replacement) 2*N*_{e} out of the 2*N* genes to constitute parents and subsequently drew 2*N* offspring genes (with replacement) from this parental gene pool. Sampling was carried out by random drawing of 2*n _{x}* = 2

*n*genes from the population (2

_{y}*N*genes) with replacement (plan II), or without replacement but with the genes subsequently returned to the population before reproduction (plan I).

*F*

_{s}(Equation 9) and

*F*(Equation 2) were calculated from the sample allele frequencies and corrected for sampling using Equations 3, 4, 13, or 14, as appropriate for the estimator and sample plan and used to estimate

_{k}*N*

_{e}by applying Equation 5.

Computer simulations were further used to check on the accuracy and precision of the proposed estimator when applied to a finite number of loci, segregating for various numbers of alleles and with different allele frequency profiles. Both diallelic (*a* = 2) and multiallelic (*a* = 10) loci were simulated with skewed and even allele frequencies. In the case of skewed frequencies, simulations were initiated with one (for diallelic loci) or nine (for multiallelic loci) rare alleles, each with a frequency of *q*_{1} = 0.01 and with one common allele (with a frequency of 0.99 or 0.91). In the case of even frequencies, all alleles were initiated with equal frequencies (*i.e.*, 0.5 for diallelic and 0.1 for multiallelic loci). Sampling and genetic drift over one generation were simulated as described above, following sample plans II and I (assuming *N*_{e} = *N*). The results of the new estimator *F*_{s} were compared to two alternative estimators, *viz.*, the original estimator *F _{k}* and the maximum-likelihood estimator proposed by Wang (2001). The latter was calculated using the MLNE (ver. 1.0) software, described by Wang and Whitlock (2003) and distributed by J. Wang (http://www.zoo.cam.ac.uk/ioz/software.htm#MLNE), by applying it to a single isolated population with a maximum

*N*

_{e}of 2000 (a larger maximum number resulted in excessive computer time). We report mean estimates of genetic drift and their standard deviations,

*i.e.*, for and the inverse of twice the MLNE estimate of

*N*

_{e}, calculated over 10,000 replicate computer runs (1000 for MLNE). Because the MLNE method is not adapted to sample plan I, we used this method only when simulating sampling according to plan II. (At present, MLNE returns estimates of

*N*

_{e}that are pegged against the maximum value supplied by the user when running the program on simulated data under sample plan I.)

## RESULTS

Simulation results (Table 2) verify the theoretical predictions presented in Figure 1 and demonstrate that the original estimator does yield biased estimates of genetic drift and effective size when applied to loci with skewed allele frequencies or small samples. With one generation between the samples the true amount of genetic drift is per definition 1/(2*N*_{e}). As predicted above, however, the simulated *F*_{k}′'s are typically larger than this value under sample plan I (with *N*_{e} = *N*; top rows), thus yielding estimates of *N*_{e} that are biased downward, approaching the true values only when sample sizes are close to the effective size, *i.e*., when sampling is exhaustive. When *N*_{e} < *N* (middle rows) and under sample plan II (bottom rows), the bias in is in the opposite direction, resulting in estimates of *N*_{e} that are usually too large. This behavior of is in accordance with the numerical results predicted for extreme allele frequencies in Figure 1, C and D.

**Results of computer simulations checking theoretical expectations for old and new estimators of genetic drift and effective size**

With several generations passing between samplings the true amount of drift accumulates according to the multiplicative expression 1 − [1 − 1/(2*N*_{e})]* ^{t}* [Crow and Kimura 1970, expression (7.3.4)], and equals 0.22367, 0.04889, and 0.00996 for effective sizes 20, 100, and 500, respectively, after

*t*= 10 generations. As seen from the right-hand side of Table 2, consistently fails to provide accurate estimates of drift in these simulations, yielding estimates that are too small (and too large) for all sample sizes and sample plans. In summary, there appear to be at least three causes of bias in the original estimator

*viz*., bias emerging with skewed allele frequencies, small samples (

*n*< <

*N*

_{e}), and long sample intervals. In contrast to the original one, the new estimator proposed here appears unbiased or nearly so for all sample sizes under both sample plans, even with long sampling intervals.

The simulation results presented in Table 3 expand on those above by including loci with different numbers of alleles (2 and 10) and allele frequency profiles (skewed and even), but keeping the true effective size fixed at *N*_{e} = 100. When applied to multiple loci, an average *F _{k}* is calculated over all single-locus measures (Equation 2), which are weighted by the number of independent alleles (

*i.e.*,

*a*− 1) at each locus. However, when calculating

*F*

_{s}for these simulations, the weighing scheme for alleles used in Equation 9 is extended over loci, by summing over all alleles and loci in the numerator and denominator separately before carrying out the division. The results (Table 3) validate this procedure and demonstrate that unbiased estimates are obtained with this new estimator also when applied to combinations of loci with different numbers of alleles and with very different allele frequency profiles (right columns). Briefly, the new estimator is seen to provide unbiased estimates of drift, not deviating from the true value of 1/(2

*N*

_{e}) = 0.00500 by >2% when applied to multiallelic loci or to a sufficient number of diallelic loci, also when allele frequencies are highly skewed, as in the top rows of Table 3. The worst-case scenario depicted in Table 3 refers to a small number (10) of diallelic loci with skewed allele frequencies (top left), where the residual bias is ~5%. With increasing numbers of loci (100) the original estimator remains biased (by >20%), whereas the residual bias in disappears. The reduction of bias in this case is entirely due to the extension of the allele-weighing scheme in expression (9) over loci. For loci segregating for multiple alleles, bias is nearly absent from already when calculated for a single locus (10 alleles with skewed frequencies, data not shown).

Table 3 also includes results for the MLNE maximum-likelihood procedure. Estimates of genetic drift, from this method are clearly biased when allele frequencies are skewed (top rows). A slight amount of bias is apparent for multiallelic loci also when allele frequencies are even (middle rows) and bias is not reduced with increasing numbers of loci. There appears to be a direct inverse relationship between accuracy and precision of the three estimators of genetic drift. As judged by their standard deviations (Table 3), the least accurate (*i.e.*, most biased) estimator, MLNE, is also the most precise one, having a lower standard deviation than the other estimators in all simulations involving skewed allele frequencies. Conversely, the new estimator () has the largest standard deviation of the three, except when allele frequencies are even. In that case the three estimators are all largely unbiased and have similar standard deviations.

Under sample plan I with exhaustive sampling (*i.e.*, all individuals sampled nondestructively and returned to the population before reproduction), bias is small for either estimator (data not shown). The mean point estimate of drift based on the original estimator ranges from 0.00502 for loci with even allele frequencies to 0.00543 when the frequencies are highly skewed. The worst case here represents <10% bias from the true value of 0.00500 (inverse of 2*N*_{e}) and may be considered negligible in most practical applications. These observations verify the theoretical expectation (*cf.* Figure 1C) that bias in is reduced under sample plan I when a large fraction of the population is sampled. The new estimator is again virtually unbiased for all allele frequency profiles (mean point estimates range from 0.00499 to 0.00506). Both estimators have similar standard deviations, except for loci with many rare alleles when SD for approaches twice that for . However, for all combinations of numbers of loci, numbers of alleles, and allele frequency profiles, the standard deviations for both estimators are greatly reduced relative to that observed for plan II, typically being one-half (for ) to one-third (for ) of those reported in Table 3.

## DISCUSSION

There is an obvious trade-off between accuracy and precision when selecting estimators for genetic drift and effective size with the temporal method. Unbiased estimates are provided with the new estimator, (Equations 9 and 13 for sample plan II or 14 for plan I), whereas other estimators are seen to yield quite biased estimates in some situations.

The simulation results presented in Table 3 allow for comparing alternative estimators with respect to both accuracy and precision, under relatively realistic scenarios pertaining to sample plan II. Here, the results of simulated mixtures of skewed and even allele frequencies for various number of loci are of practical interest. For example, 10 diallelic loci with mixed allele frequency profiles may exemplify an isozyme study of temporal allele frequency change (*e.g.*, Jorde and Ryman 1996), 8–10 multiallelic loci a typical microsatellite study (Miller and Kapuscinski 1997; Ovenden *et al.* 2006), and 100 diallelic loci a realistic SNP study. Under all three scenarios, the new estimator should yield unbiased estimates of genetic drift and effective size (Table 3, bottom row), whereas the original estimator and the MLNE method do not.

The improved accuracy of the new estimator comes at a cost of reduced precision, however, as is evident from the larger standard deviation of relative to . The reduction in precision is probably caused by a more aggressive allele weighing scheme in the new estimator. Because *F*_{s} weights alleles according to their heterozygosity, or *z*(1 − *z*) (*z* being the allele's average sample frequency), alleles that occur in a low frequency in the samples are given proportionally lower weight and contribute less to the mean estimate. For example, when calculating *F*_{s} an allele that occurs with an average sample frequency of *z* = 0.1 would receive a weight that is more than nine times greater than would a rarer allele with *z* = 0.01, the relative weight being [0.1 × (1 − 0.1)]/[0.01 × (1 − 0.01)] = 9.09. In the measure *F _{k}*, on the other hand, there would be much less difference in weight given to these two alleles; (1 − 0.1)/(1 − 0.01) = 0.91, or nearly equal weights. With intermediate allele frequencies the differences in weight between the two schemes are modest, and and have similar means as well as standard deviations when allele frequencies are even (

*cf.*middle rows of Table 3). The allele weighing scheme used in the new estimator is advantageous because it drastically reduces bias associated with rare alleles (

*cf.*Figure 1), allowing for an unbiased mean. The lower weight given to rare alleles also largely eliminates the bias caused by allele fixation when sample intervals are long (

*e.g.*,

*t*= 10 generations, Table 2). On the other hand, the new weighing scheme is disadvantageous in the sense that the mean estimate is dominated by fewer alleles and therefore becomes more susceptible to random errors raising its standard deviation.

Discussion of the temporal method in the literature has so far focused on the precision that can be expected of the resulting estimates of genetic drift and effective size. This is justified because genetic drift is a stochastic process and estimates of drift are often strongly dominated by random errors when based on a limited number of loci and on limited sample sizes. With increased automation of the genotyping process, larger numbers of individuals and numbers of loci are becoming realistic and systematic errors (bias) are then becoming increasingly important. Thus, the better the data, the more important the issue of bias is. For example, with a large number of multiallelic loci with skewed allele frequencies (*cf.* Table 3, 100 loci), the original estimator yields an estimate of *N*_{e} = 1/(2 × 0.00387) = 129 that is not only biased (by 29.2%), but the 95% confidence interval for the estimate [1/(2 × (0.00387 ± 1.96 × 0.0005)) = 103 − 173] does not even contain the true value of 100! Despite this bias, the mean square error (MSE = bias^{2} + SD^{2}) of the original estimator is only a little higher than the MSE for the new, unbiased This is because has a lower SD. Thus it yields a more precise, but biased estimate of effective size. Clearly, unbiased estimators may sometimes be preferable even if they sacrifice some precision.

When precision is of major concern, the best strategy is to sample a large fraction of the population according to plan I. When such a strategy can be applied, both random errors as well as bias associated with the original estimators ( and ) are greatly reduced. While impractical for many organisms, such exhaustive sampling may nevertheless be realistic for reasonably small populations that can be sampled nondestructively. Nondestructive sampling is trivial for most plants and may be adopted for a number of animals using, *e.g.*, skin or blood biopsy, fin clipping, or noninvasive sampling of feathers, hairs, or feces. The latter has proved effective in sampling otherwise elusive large mammals such as brown bears (Taberlet *et al.* 1997) and wolverines (Flagstad *et al.* 2004).

In recent years, various maximum-likelihood approaches for estimating effective size from temporal allele frequency data have been developed and claimed to be less biased than moment-based estimators such as or (Williamson and Slatkin 1999; Anderson *et al.* 2000; Berthier *et al.* 2002; Wang and Whitlock 2003). Contrary to these claims, we find that bias may also be a problem for maximum-likelihood methods under certain conditions. For example, the MLNE method yields downward-biased estimates of drift, and hence too large when calculated from loci with highly skewed allele frequencies (Table 3, top rows). Further simulation results presented in Table 3 (middle rows), and additional ones not shown here, indicate that the MLNE method performs appropriately when applied to loci with reasonably even allele frequencies. However, making a complete evaluation of the performance of this and other maximum-likelihood estimation procedures is outside the scope of the present article [see Tallmon *et al.* (2004), for some comparisons]. Instead, we focus on moment-based estimators because they are the most widely applied in the empirical literature and because they are more readily adopted to various sampling scenarios (*i.e.*, the two different sample plans referred to herein) and population characteristics, including semelparous (Waples 1990) and iteroparous (Jorde and Ryman 1995; Waples and Yakota 2007) breeding schemes with age structure and overlapping generations and different ploidy (*e.g.*, haploid mtDNA) (Laikre *et al.* 1998; Turner *et al.* 2001). The new estimator should therefore provide an unbiased alternative to other estimators of genetic drift and effective size in a wide range of applications. A computer program implementing the new estimator can be downloaded from http://folk.uio.no/ejorde/ or http://www.zoologi.su.se/~ryman/.

## Acknowledgments

We thank Jinliang Wang and Robin Waples and two anonymous reviewers for valuable comments on a previous version of this manuscript. This work was supported by the Research Council of Norway (P.E.J.) and by the Swedish Research Council (N.R.). Parts of this work were carried out when P.E.J. was on leave at the Division of Population Genetics, Stockholm University, with a Marie Curie postdoctoral stipend from the European Science Foundation.

## References

- Anderson, E. C., E. G. Williamson and E. A. Thompson, 2000. Monte Carlo evaluation of the likelihood for
*N*_{e}from temporally spaced samples. Genetics 156**:**2109–2118. [PMC free article] [PubMed] - Berthier, P., M. A. Beaumont, J.-M. Cornuet and G. Luikart, 2002. Likelihood-based estimation of the effective population size using temporal changes in allele frequencies: a genealogical approach. Genetics 160
**:**741–751. [PMC free article] [PubMed] - Crow, J. F., and M. Kimura, 1970.
*An Introduction to Population Genetics Theory.*Harper & Row, New York. - Flagstad, Ø., E. Hedmark, A. Landa, H. Broseth, J. Persson
*et al.*, 2004. Colonization history and noninvasive monitoring of a reestablished wolverine population. Conserv. Biol. 18**:**676–688. - Jorde, P. E., and N. Ryman, 1995. Temporal allele frequency change and estimation of effective size in populations with overlapping generations. Genetics 139
**:**1077–1090. [PMC free article] [PubMed] - Jorde, P. E., and N. Ryman, 1996. Demographic genetics of brown trout (
*Salmo trutta*) and estimation of effective population size from temporal change of allele frequencies. Genetics 143**:**1369–1381. [PMC free article] [PubMed] - Laikre, L., P. E. Jorde and N. Ryman, 1998. Temporal change of mitochondrial DNA haplotype frequencies and female effective size in a brown trout (
*Salmo trutta*) population. Evolution 52**:**910–915. - Luikart, G., J. M. Cornuet and F. W. Allendorf, 1999. Temporal changes in allele frequencies provide estimates of population bottleneck size. Conserv. Biol. 13
**:**523–530. - Miller, L. M., and A. R. Kapuscinski, 1997. Historical analysis of genetic variation reveals low effective population size in a northern pike (
*Esox lucius*) population. Genetics 147**:**1249–1258. [PMC free article] [PubMed] - Nei, M., and F. Tajima, 1981. Genetic drift and estimation of effective population size. Genetics 98
**:**625–640. [PMC free article] [PubMed] - Ovenden, J. R., D. Peel, R. Street, A. J. Courtney, S. D. Hoyle
*et al.*, 2006. The genetic effective and adult census size of an Australian population of tiger prawns (*Penaeus esculentus*). Mol. Ecol. 16**:**127–138. [PubMed] - Pollak, E., 1983. A new method for estimating the effective population size from allele frequency changes. Genetics 104
**:**531–548. [PMC free article] [PubMed] - Raufaste, N., and F. Bonhomme, 2000. Properties of bias and variance of two multiallelic estimators of F
_{ST}. Theor. Popul. Biol. 57**:**285–296. [PubMed] - Reynolds, J., B. S. Weir and C. C. Cockerham, 1983. Estimation of the coancestry coefficient: basis for a short-term genetic distance. Genetics 105
**:**767–779. [PMC free article] [PubMed] - Richards, C., and P. L. Leberg, 1996. Temporal changes in allele frequencies and a population's history of severe bottlenecks. Conserv. Biol. 10
**:**832–839. - Robertson, A., and W. G. Hill, 1984. Deviations from Hardy–Weinberg proportions: sampling variances and use in estimation of inbreeding coefficients. Genetics 107
**:**703–718. [PMC free article] [PubMed] - Taberlet, P., J.-J. Camarra, S. Griffin, E. Uhrès, O. Hanotte
*et al.*, 1997. Noninvasive genetic tracking of the endangered pyrenean brown bear population. Mol. Ecol. 6**:**869–876. [PubMed] - Tallmon, D. A., G. Luikart and M. A. Beaumont, 2004. Comparative evaluation of a new effective population size estimator based on approximate Bayesian computation. Genetics 167
**:**977–988. [PMC free article] [PubMed] - Turner, T. F., L. A. Salter and J. R. Gold, 2001. Temporal-method estimates of N
_{e}from highly polymorphic loci. Conserv. Genet. 2**:**297–308. - Wang, J., 2001. A pseudo-likelihood method for estimating effective population size from temporally spaced samples. Genet. Res. 78
**:**243–257. [PubMed] - Wang, J., and M. C. Whitlock, 2003. Estimating effective population size and migration rates from genetic samples over space and time. Genetics 163
**:**429–446. [PMC free article] [PubMed] - Waples, R. S., 1989. A generalized approach for estimating effective population size from temporal changes in allele frequency. Genetics 121
**:**379–391. [PMC free article] [PubMed] - Waples, R. S., 1990. Conservation genetics of Pacific salmon. III. Estimating effective population size. J. Hered. 81
**:**277–289. - Waples, R. S., and M. Yakota, 2007. Temporal estimates of effective population size in species with overlapping generations. Genetics 175
**:**219–233. [PMC free article] [PubMed] - Weir, B. S., and C. C. Cockerham, 1984. Estimating
*F*-statistics for the analysis of population structure. Evolution 38**:**1358–1370. - Williamson, E. G., and M. Slatkin, 1999. Using maximum likelihood to estimate population size from temporal changes in allele frequencies. Genetics 152
**:**755–761. [PMC free article] [PubMed]

**Genetics Society of America**

## Formats:

- Article |
- PubReader |
- ePub (beta) |
- PDF (223K)

- An estimator for pairwise relatedness using molecular markers.[Genetics. 2002]
*Wang J.**Genetics. 2002 Mar; 160(3):1203-15.* - Simultaneous estimation of mixing rates and genetic drift under successive sampling of genetic markers with application to the mud crab (Scylla paramamosain) in Japan.[Genetics. 2006]
*Kitakado T, Kitada S, Obata Y, Kishino H.**Genetics. 2006 Aug; 173(4):2063-72.* - Estimating pairwise relatedness from dominant genetic markers.[Mol Ecol. 2004]
*Wang J.**Mol Ecol. 2004 Oct; 13(10):3169-78.* - Maximum-likelihood estimation of relatedness.[Genetics. 2003]
*Milligan BG.**Genetics. 2003 Mar; 163(3):1153-67.* - Estimating F-statistics.[Annu Rev Genet. 2002]
*Weir BS, Hill WG.**Annu Rev Genet. 2002; 36:721-50. Epub 2002 Jun 11.*

- Causes and consequences of fine-scale population structure in a critically endangered freshwater seal[BMC Ecology. ]
*Valtonen M, Palo JU, Aspi J, Ruokonen M, Kunnasranta M, Nyman T.**BMC Ecology. 1422* - Spatial and temporal genetic structure of a river-resident Atlantic salmon (Salmo salar) after millennia of isolation[Ecology and Evolution. 2014]
*Sandlund OT, Karlsson S, Thorstad EB, Berg OK, Kent MP, Norum IC, Hindar K.**Ecology and Evolution. 2014 May; 4(9)1538-1554* - Living on the edge: reconstructing the genetic history of the Finnish wolf population[BMC Evolutionary Biology. ]
*Jansson E, Harmoinen J, Ruokonen M, Aspi J.**BMC Evolutionary Biology. 1464* - Genetic Similarity of Island Populations of Tent Caterpillars during Successive Outbreaks[PLoS ONE. ]
*Franklin MT, Myers JH, Cory JS.**PLoS ONE. 9(5)e96679* - Time-series analysis reveals genetic responses to intensive management of razorback sucker (Xyrauchen texanus)[Evolutionary Applications. 2014]
*Dowling TE, Turner TF, Carson EW, Saltzgiver MJ, Adams D, Kesner B, Marsh PC.**Evolutionary Applications. 2014 Mar; 7(3)339-354*

- PubMedPubMedPubMed citations for these articles

- Unbiased Estimator for Genetic Drift and Effective Population SizeUnbiased Estimator for Genetic Drift and Effective Population SizeGenetics. Oct 2007; 177(2)927PMC

Your browsing activity is empty.

Activity recording is turned off.

See more...