
- Journal List
- Genetics
- v.172(4); Apr 2006
- PMC1456384

# The Hitchhiking Effect on Linkage Disequilibrium Between Linked Neutral Loci

^{*}Section of Evolutionary Biology, Biocenter, University of Munich, 82152 Planegg-Martinsried, Germany and

^{†}Department of Computer Sciences and

^{‡}Section of Evolution and Ecology, University of California, Davis, California 95616

^{1}*Corresponding author:* University of Munich, Grosshaderner Strasse 2, 82152 Planegg, Germany. E-mail: stephan@zi.biologie.uni-muenchen.de

## Abstract

We analyzed a three-locus model of genetic hitchhiking with one locus experiencing positive directional selection and two partially linked neutral loci. Following the original hitchhiking approach by Maynard Smith and Haigh, our analysis is purely deterministic. In the first half of the selected phase after a favored mutation has entered the population, hitchhiking may lead to a strong increase of linkage disequilibrium (LD) between the two neutral sites if both are <0.1*s* away from the selected site (where *s* is the selection coefficient). In the second half of the selected phase, the main effect of hitchhiking is to destroy LD. This occurs very quickly (before the end of the selected phase) when the selected site is between both neutral loci. This pattern cannot be attributed to the well-known variation-reducing effect of hitchhiking but is a consequence of secondary hitchhiking effects on the recombinants created in the selected phase. When the selected site is outside the neutral loci (which are, say, <0.1*s* apart), however, a fast decay of LD is observed only if the selected site is in the immediate neighborhood of one of the neutral sites (*i.e.*, if the recombination rate *r* between the selected site and one of the neutral sites satisfies *r* ≪ *s*). If the selected site is far away from the neutral sites (say, *r* > 0.3*s*), the decay rate of LD approaches that of neutrality. Averaging over a uniform distribution of initial gamete frequencies shows that the expected LD at the end of the hitchhiking phase is driven toward zero, while the variance is increased when the selected site is well outside the two neutral sites. When the direction of LD is polarized with respect to the more common allele at each neutral site, hitchhiking creates more positive than negative linkage disequilibrium. Thus, hitchhiking may have a distinctively patterned LD-reducing effect, in particular near the target of selection.

GENETIC drift is recognized as a fundamental stochastic force shaping the polymorphism within populations and divergence between species of both neutral and selected variants at a locus (Fisher 1930; Wright 1931; Kimura 1983). Remarkably similar patterns to those caused by genetic drift are predicted by theoretical models incorporating stochastically varying selection (Gillespie 1994). In 1974 Maynard Smith and Haigh analyzed a simple and obvious extension of the genetic drift models, incorporating the stochastic coupling due to linkage with another locus undergoing strong directional selection. Addressing the apparent uniformity of allozyme polymorphism across species, Maynard Smith and Haigh (1974) focused on the “hitchhiking effect” on allelic frequencies and heterozygosity.

Since this seminal work the hitchhiking effect associated with positive directional selection has been extended to address observations in the emerging field of molecular population genetics (Aguadé *et al.* 1989; Stephan and Langley 1989; Begun and Aquadro 1992). These studies revealed low levels of DNA sequence polymorphism in Drosophila in genomic regions of low crossing over and led to theoretical analyses of the hitchhiking effect on nucleotide diversity (Kaplan *et al.* 1989) and on the frequency spectrum of polymorphisms (Braverman *et al.* 1995). Whether formulated in terms of gamete frequencies or in the coalescent framework, the genetic models dealt with a pair of loci: a selected and a linked neutral locus with a defined rate of recombination between them. Independent of whether single or recurrent hitchhiking events were analyzed (as a stochastic process or a deterministic approximation), the conclusion has consistently been that hitchhiking should have profound effects on expected heterozygosity and allele frequencies at the neutral locus, if selection is strong and linkage is tight.

In 1977 Thomson considered the impact of hitchhiking on linkage disequilibrium (LD). Most of her analysis focused on the association between the selected alleles and those at a single neutral locus. Thomson also considered the impact of hitchhiking (of a heterotic polymorphism) on the LD between alleles at two linked neutral loci. On the basis of numerical examples she concluded that hitchhiking creates LD between neutral loci within the same genomic domain in which it affects levels of heterozygosity. The impact of simple directional selection of rare variants rapidly going to fixation was not considered by Thomson (1977). Except for this initial study and a few rather targeted applications (Robinson *et al.* 1991; Grote *et al.* 1998), no attempts have been made to extend this simple two-locus hitchhiking model of Maynard Smith and Haigh (1974) to multiple loci. This is curious, as the hitchhiking effect has played a major role in molecular population genetics for >15 years.

On the basis of the analyses of Thomson and her colleagues, it has been believed that hitchhiking affects not only polymorphisms at individual sites (or loci), but also the association between polymorphisms. Using a three-locus model with one selected and two neutral loci, she showed that hitchhiking that has a strong effect on heterozygosity can also generate strong LD between the neutral loci. Without paying close attention to Thomson's writing, the important role of hitchhiking in generating LD has been reiterated in textbooks and publications by many authors. It was not until recently that a few authors reported quite the opposite results (Gillespie 1997; Kim and Stephan 2002; Kim and Nielsen 2004). Analyzing “shift” models that are similar to the hitchhiking model described above, Gillespie (1997, p. 297) concluded that “linked selection can reduce variation without building up high levels of linkage disequilibrium, contrary to our intuition.” These latter studies focused on average effects observable in simulated data. In small-sample coalescent simulations, Kim and Nielsen (2004) found increased LD between alleles at two neutral loci on the same side of the selected locus at the time of fixation and reduced LD across the site of selection. Furthermore, they provide heuristic arguments to explain this pattern. These different and somewhat contradictory views of the relationship between genetic hitchhiking and LD motivated us to pursue an analytic investigation of the question. We followed the deterministic approach of Maynard Smith and Haigh (1974), extending their model to three loci as did Thomson (1977). To analyze this model, we used the framework of Barton and Turelli (1991), which provides a natural setting for the investigation of the directional hitchhiking, yielding transparent mathematical expressions that illuminate the rather surprising dynamics of LD under the hitchhiking effect.

## THE THREE-LOCUS HITCHHIKING MODEL

We consider a three-locus model with two neutral loci and one selected locus. For each locus, we assume that there are only two allele types, denoted by 0 and 1. The selected locus may be between the two neutral ones or on either side of them (see Figure 1). We denote by L and R the left and right neutral loci, respectively, and by S the selected locus. The corresponding recombination fractions between loci are *r*_{LR}, etc. We assume positive directional selection according to the following fitness scheme:

where *s* is the selection coefficient of the selected allele (type 1) and *h* the dominance coefficient. Note that we follow here the notation of Maynard Smith and Haigh (1974), not the definition of Kaplan *et al.* (1989). Effective population size *N*_{e} is assumed to be very large (), such that a deterministic analysis of the model is appropriate.

Figure 1. … *r*_{LR} between L and R is given by adding or subtracting *r*_{LS} and *r*_{RS}, depending on **...**

### The system of full recursions:

To derive the recursions for the marginal (type 1) allele frequencies *p*_{L}, *p*_{R}, and *p*_{S} at the three loci and the LDs (central moments) of the second (*C*_{LR}, *C*_{LS}, *C*_{RS}) and third (*C*_{LRS}) orders measured with respect to type 1 alleles, we follow the approach of Barton and Turelli (1991). In that approach, coefficients that appear in the recursions are recombination rates and generalized selection coefficients, the latter denoted by and , where *X* and *Y* are nonempty subsets of loci. (Generalized selection coefficients are defined as coefficients that appear in expressing the relative fitness in terms of certain quantities related to LDs.) In our case, nonzero selection coefficients that appear in the recursions at time *t* are

For *h* = 1/2, these simplify to

Following Barton and Turelli (1991), we define *r*_{LRS} = *r*_{LS,R} + *r*_{RS,L} + *r*_{LR,S} and use *r*_{*X*,*Y*} to denote the rate of recombination events that partition the loci into two nonempty sets *X* and *Y*. In our work, we ignore double-crossover recombination events, so we define *r*_{*X*,*Y*} = 0 if the partition *X*, *Y* corresponds to a double-crossover event. For example, if the selected locus S is between the neutral loci, then *r*_{LS,R} = *r*_{RS}, *r*_{RS,L} = *r*_{LS}, and *r*_{LR,S} = 0. Further, we assume that recombination rates are additive. The recombination rate *r*_{LR} is therefore given by adding or subtracting *r*_{LS} and *r*_{RS}, depending on the configuration (see Figure 1).
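The bookkeeping for the partition rates can be made concrete with a small Python sketch. It only encodes the two conventions stated above (no double crossovers, additive recombination rates); the function names and the dictionary layout are illustrative assumptions, not part of the original analysis.

```python
def partition_rates(order, r_first, r_second):
    """Rates r_{X,Y} for the three ways of splitting one locus off from the other two.

    order    -- the three loci in chromosomal order, e.g. "LSR" or "LRS"
    r_first  -- recombination fraction between the first and second locus
    r_second -- recombination fraction between the second and third locus

    The returned dict is keyed by the locus that is split off, so the entry
    for "L" is r_{RS,L}, the entry for "R" is r_{LS,R}, and the entry for
    "S" is r_{LR,S}. Splitting off the middle locus needs a double
    crossover, which is ignored here, so its rate is set to zero.
    """
    first, middle, last = order
    return {first: r_first, middle: 0.0, last: r_second}


def total_rate(rates):
    """r_LRS, the sum of the three partition rates."""
    return sum(rates.values())


def outer_rate(r_first, r_second):
    """Recombination fraction between the two outer loci under additivity."""
    return r_first + r_second


if __name__ == "__main__":
    # Selected locus between the neutral loci: r_first = r_LS, r_second = r_RS.
    print(partition_rates("LSR", 1e-4, 2e-4))
    # Selected locus to the right of R: r_first = r_LR, r_second = r_RS.
    print(partition_rates("LRS", 2e-4, 1e-4))
```

With the order L, S, R the sketch returns *r*_{LR,S} = 0 and a total equal to *r*_{LS} + *r*_{RS}, which matches the example given in the text.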

To simplify notation, we hereafter omit writing the dependence on time *t*. We define *q*_{S} = 1 − *p*_{S}, *q*_{L} = 1 − *p*_{L}, and *q*_{R} = 1 − *p*_{R}. Marginal allele frequencies satisfy the following recursions:

The LDs satisfy the recursions

where

with

The above general recursions apply to all three configurations shown in Figure 1. Depending on the particular configuration being considered, recombination rates *r*_{*X*,*Y*} need to be defined appropriately.

### The system of truncated recursions:

We explore the behavior of the recursion Equations 1–7 in the region and (see Maynard Smith and Haigh 1974). Here, *r* may be any recombination parameter appearing in the Equations 1–7. Keeping only the terms linear in *s* or *r* leads to the following set of recursions:

These equations agree to first order in *r* and *s* with those of Thomson (1977) (compare her Equations 30iii, 30iv, and 31). Since the hitchhiking effect can be best observed when (such that simultaneously holds), we may approximate the truncated recursions by the following ordinary differential equations (ODEs; see Maynard Smith and Haigh 1974):

Here we have introduced time *t* (in generations) into the equation for *p*_{S} and parameterized the other quantities by *p*_{S} (which is a monotonically increasing function of *t*).

### Structure of equations:

Several features of the dynamical system become apparent from these two sets of equations. Most importantly, selection acts on the alleles at the neutral loci indirectly and in a strictly hierarchical fashion: on the marginal neutral allele frequencies via the pairwise LDs *C*_{LS} and *C*_{RS}, on the LD between the neutral sites by the third moment, and on the latter by a fourth-order term (*i.e.*, the product of the two pairwise LDs *C*_{LS} and *C*_{RS}).

### Analytical solutions when the selected locus is between the neutral loci:

The ODEs for the LDs are first-order linear differential equations. The ODEs for the pairwise LDs *C*_{LS} and *C*_{RS} are homogeneous. The equations for *C*_{LR} and *C*_{LRS} contain the higher-order moments as inhomogeneous terms that act as “driving forces” of the dynamics. However, except for this impact of the higher-order terms, the equations are decoupled and can be solved successively. We have the following results.

The frequency of the selected allele at locus S is

whereas marginal allele frequencies at the neutral loci are

where

which corresponds to the heterozygosity at the selected locus. The LDs *C*_{LS}(*t*) and *C*_{RS}(*t*) can be written as

Given these solutions for *C*_{LS}(*t*) and *C*_{RS}(*t*), the coupled ODEs (24) and (25) admit simple exact solutions when the selected locus is between the two neutral loci. More specifically, the third-order LD is given by

and the LD between the neutral loci can be written as

where *r*_{LR} = *r*_{LRS} = *r*_{LS} + *r*_{RS}.

### Analytical solutions when the selected locus is outside the neutral loci:

Solutions to the ODEs (19)–(23) do not depend on whether the selected locus is inside or outside the two neutral loci. For example, the allele frequency *p*_{S}(*t*) and the LDs *C*_{LS}(*t*) and *C*_{RS}(*t*) are given by (26), (30), and (31), respectively, in all cases. However, the dynamics of the ODEs (24) and (25) for *C*_{LR}(*t*) and *C*_{LRS}(*t*), respectively, depend crucially on the position of the selected locus S with respect to the neutral loci L and R. As we elaborate presently, the dynamics of *C*_{LR}(*t*) when S is between L and R exhibit radically different behavior than when S is outside.

In what follows, suppose that S is to the right of R, which implies *r*_{LS} = *r*_{LRS} = *r*_{LR} + *r*_{RS}. The case where S is to the left of L can be handled in a similar vein, with *r*_{RS} replaced with *r*_{LS}. Now, the ODE (25) for the third-order LD *C*_{LRS}(*t*) does not admit a closed-form solution; a general solution can be obtained in terms of the incomplete beta function *B*(*z*; *x*, *y*), defined as . However, noting that *B*(*z*; 1 − 2*r*_{RS}/*s*, 1 + 2*r*_{RS}/*s*) ≈ *z* if , we obtain the following simple approximate solution:

Further, using this solution and the approximation *B*[*z*; 2 − 2*r*_{RS}/*s*, 1 + 2*r*_{RS}/*s*] ≈ *z*^{2} for , we obtain the following approximate solution to (24):

Note the striking resemblance of (34) and (35) to (32) and (33), respectively. For *r*_{RS} = 0, the two sets of equations agree exactly, and hence our solutions for different regions form one continuous solution for the entire domain. The only difference between the two sets of equations is that, in (34) and (35), an extra factor appears together with [*p*_{S}(*t*) − *p*_{S}(0)]/*H*_{S}(0). This simple difference leads to important observable differences in the dynamics of the LDs.

### Comparison of approximate analytic solutions with numerical solutions to the full recursions:

We have written a computer program to solve the full recursions (1)–(11) numerically. Comparisons of our analytic solutions (33) and (35) with numerical solutions to the full recursions are shown in Tables 1 and 2, respectively. As these tables show, our analytic solutions are a good approximation to the exact dynamics. In obtaining our analytic solution (35) for the case in which the selected locus is outside the neutral loci, recall that we assumed *r*_{RS} ≪ *s* to approximate the incomplete beta function. As expected, Table 2 shows that (35) becomes less accurate as *r*_{RS} increases, but it is a good approximation as long as *r*_{RS} ≪ *s*.
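The authors' program is not shown in the text. A minimal sketch of a comparable calculation is given below; it iterates the eight gamete frequencies directly rather than the moment recursions (1)–(11). Several things are assumptions of the sketch: genotype fitnesses 1, 1 + *hs*, and 1 + *s* for zero, one, and two copies of the favored allele; the selected locus lying between L and R; a neutral background at linkage equilibrium with allele frequencies of 1/2; and all function names.

```python
from itertools import product

# Gametes are tuples of 0/1 alleles in chromosomal order (L, S, R).
GAMETES = list(product((0, 1), repeat=3))


def next_generation(freq, s, h, r_ls, r_rs):
    """Random mating, viability selection at S, then meiosis without double crossovers."""
    dosage_fitness = (1.0, 1.0 + h * s, 1.0 + s)  # assumed fitness scheme
    new = dict.fromkeys(GAMETES, 0.0)
    mean_w = 0.0
    for g1 in GAMETES:
        for g2 in GAMETES:
            weight = freq[g1] * freq[g2] * dosage_fitness[g1[1] + g2[1]]
            mean_w += weight
            new[g1] += weight * (1.0 - r_ls - r_rs)       # no crossover
            new[(g1[0], g2[1], g2[2])] += weight * r_ls   # crossover between L and S
            new[(g1[0], g1[1], g2[2])] += weight * r_rs   # crossover between S and R
    return {g: f / mean_w for g, f in new.items()}


def allele_freq(freq, locus):
    """Frequency of the type 1 allele at position `locus` of the gamete tuple."""
    return sum(f for g, f in freq.items() if g[locus] == 1)


def ld_lr(freq):
    """Pairwise LD between L and R, measured with respect to type 1 alleles."""
    f11 = sum(f for g, f in freq.items() if g[0] == 1 and g[2] == 1)
    return f11 - allele_freq(freq, 0) * allele_freq(freq, 2)


def simulate_sweep(s=0.01, h=0.5, r_ls=1e-4, r_rs=1e-4, p0=5e-5, generations=4500):
    """Return a list of (p_S, C_LR) pairs, one per generation."""
    freq = dict.fromkeys(GAMETES, 0.0)
    for left, right in product((0, 1), repeat=2):
        freq[(left, 0, right)] = (1.0 - p0) / 4.0  # neutral background
    freq[(1, 1, 1)] = p0  # favored mutation arises on a gamete carrying 1 at L and R
    history = [(allele_freq(freq, 1), ld_lr(freq))]
    for _ in range(generations):
        freq = next_generation(freq, s, h, r_ls, r_rs)
        history.append((allele_freq(freq, 1), ld_lr(freq)))
    return history


if __name__ == "__main__":
    history = simulate_sweep()
    peak = max(abs(c) for _, c in history)
    print("peak |C_LR| =", peak, " final C_LR =", history[-1][1])
```

Iterating gamete frequencies keeps the sketch short and avoids transcribing the moment recursions; the marginal frequencies and LDs can be read off the gamete frequencies at any generation.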

## THE DYNAMICS OF LD

Here we consider the dynamics of the LD between the two neutral loci. We utilize our analytic solutions from the previous section to study several important aspects of the dynamics.

### Vanishing LD:

In the domain , the term −*sC*_{LS}*C*_{RS} dominates the recursion for the third moment [see (18)] and hence influences also the LD between the neutral sites. If the selected site is between the two neutral sites and linkage is sufficiently tight (*r*_{LR} < 0.1*s*), |*C*_{LR}| quickly increases after the favored mutation has entered the population and, after transiently reaching a peak, rapidly decays to zero before the selected phase ends. To show this, define *t*_{f} as the time satisfying *p*_{S}(*t*_{f}) = 1 − *p*_{S}(0). Henceforward, we loosely refer to this time as the *fixation time*. Using (26), one can show that
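For orientation, the fixation time can be written out explicitly under one assumption: that for *h* = 1/2 the selected allele follows the logistic equation d*p*_{S}/d*t* = (*s*/2)*p*_{S}(1 − *p*_{S}). This form is assumed here because it reproduces both the factor (2*N*)^{−4*r*/*s*} and the numerical example for exp(−*r*_{LR}*t*_{f}) quoted in this article; the derivation is a sketch under that assumption.

```latex
\begin{align}
\frac{dp_S}{dt} &= \frac{s}{2}\, p_S (1 - p_S)
  \;\Longrightarrow\;
  p_S(t) = \frac{p_S(0)}{p_S(0) + [1 - p_S(0)]\, e^{-st/2}}, \\
p_S(t_f) = 1 - p_S(0)
  &\;\Longrightarrow\;
  e^{-s t_f/2} = \left[\frac{p_S(0)}{1 - p_S(0)}\right]^{2}
  \;\Longrightarrow\;
  t_f = \frac{4}{s}\, \ln\frac{1 - p_S(0)}{p_S(0)}, \\
e^{-r t_f} &= \left[\frac{p_S(0)}{1 - p_S(0)}\right]^{4r/s}
  \approx (2N)^{-4r/s}
  \quad \text{for } p_S(0) = \frac{1}{2N} \ll 1 .
\end{align}
```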

We wish to show that, if the selected locus is between the two neutral loci, then *C*_{LR}(*t*_{f}) ≈ 0 for all possible initial conditions of interest. Common to all initial conditions is that *p*_{S}(0) = 1/(2*N*), with *N* being the population size. For *N* = ~10^{4}–10^{6}, , and therefore

which implies that (33) at *t*_{f} can be written approximately as follows:

We use {000, 001, 010, 011, 100, 101, 110, 111} to denote gametic types. Their frequencies are denoted by {*f*_{000}, *f*_{001}, *f*_{010}, *f*_{011}, *f*_{100}, *f*_{101}, *f*_{110}, *f*_{111}}, which are related to the marginal frequencies and the LDs as follows:
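Because the LDs are central moments measured with respect to type 1 alleles, the mapping from gamete frequencies to marginal frequencies and LDs can be sketched in a few lines of Python. The function name and the dictionary layout are illustrative assumptions. The sketch also assumes that the three digits of a gametic label refer to loci L, S, and R in that order, which is what the example with gamete 101 in this section suggests.

```python
from itertools import product

LOCI = {"L": 0, "S": 1, "R": 2}  # assumed position of each locus in the gamete label


def moments(freq):
    """Marginal type 1 allele frequencies and LDs from gamete frequencies.

    freq maps tuples (x_L, x_S, x_R) of 0/1 alleles to frequencies that sum
    to one; gametes missing from the dict are treated as having frequency 0.
    """
    p = {name: sum(f * g[i] for g, f in freq.items()) for name, i in LOCI.items()}

    def central(*names):
        # Expectation of the product of (allele indicator - allele frequency).
        total = 0.0
        for g, f in freq.items():
            term = f
            for name in names:
                term *= g[LOCI[name]] - p[name]
            total += term
        return total

    return {
        "p_L": p["L"], "p_R": p["R"], "p_S": p["S"],
        "C_LR": central("L", "R"),
        "C_LS": central("L", "S"),
        "C_RS": central("R", "S"),
        "C_LRS": central("L", "R", "S"),
    }


if __name__ == "__main__":
    # Two complementary gametes at equal frequency: maximal pairwise LD.
    print(moments({(0, 0, 0): 0.5, (1, 1, 1): 0.5}))
```

For gamete frequencies that are products of the marginal frequencies, all four LDs returned by the sketch vanish up to rounding error.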

Using these relations, we obtain

At *t* = 0, a new favored mutation occurs on only one of the following gametic types: 000, 001, 100, or 101. For instance, if the mutation occurs on a gamete of type 101, *f*_{010}(0) = *f*_{011}(0) = *f*_{110}(0) = 0 and *f*_{111}(0) = *p*_{S}(0) = 1/(2*N*). More generally, only one of *f*_{010}(0), *f*_{111}(0), *f*_{011}(0), *f*_{110}(0) is supposed to be nonzero. This implies that the right-hand side of (39) must be zero. Hence, the coefficient of exp(−*r*_{LR}*t*_{f}) in (38) is exactly zero. If the approximation (37) were not used, then the coefficient of exp(−*r*_{LR}*t*_{f}) in *C*_{LR}(*t*_{f}) would not be exactly zero, but would still be very small. Note that exp(−*r*_{LR}*t*_{f}) may not be very small. For example, exp(−*r*_{LR}*t*_{f}) = 0.452813 for *p*_{S}(0) = 0.00005, *r*_{LR} = 0.00002, and *s* = 0.001. This shows that, for small recombination rates (), selection rather than recombination is the dominant force that causes *C*_{LR}(*t*) to vanish before the fixation time. For large recombination rates, the contribution exp(−*r*_{LR}*t*) from recombination should dominate over selection effects.
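The numerical example in this paragraph can be checked with a short Python sketch. It assumes the fixation time *t*_{f} = (4/*s*) ln{[1 − *p*_{S}(0)]/*p*_{S}(0)}, which is what the logistic trajectory with rate *s*/2 gives; the function names are illustrative.

```python
import math


def fixation_time(s, p0):
    """Time for p_S to rise from p0 to 1 - p0 under dp/dt = (s/2) p (1 - p) (assumed)."""
    return (4.0 / s) * math.log((1.0 - p0) / p0)


def recombination_decay(r, s, p0):
    """The factor exp(-r * t_f) by which recombination alone would reduce LD."""
    return math.exp(-r * fixation_time(s, p0))


if __name__ == "__main__":
    # Values quoted in the text: p_S(0) = 0.00005, r_LR = 0.00002, s = 0.001.
    print(recombination_decay(r=2e-5, s=1e-3, p0=5e-5))
```

With the quoted parameter values the sketch gives a decay factor that agrees with the 0.452813 stated in the text, so recombination alone leaves almost half of the initial LD in place at *t*_{f}.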

This behavior may be explained as follows. Under the scenario of tight linkage and strong selection, a low-frequency gamete on which the favored mutation landed is quickly dragged into intermediate to high frequency. If the recombination rates are nonzero, this gamete may undergo recombination, thereby creating the two types of single recombinants that also carry the selected allele and thus increase in frequency. This reduces the LD between L and R created by the hitchhiking effect in the first half of the selected phase. This hitchhiking effect on the recombinants is stronger, the greater the linkage between the selected site and the two neutral sites is, and thus also the product *C*_{LS}*C*_{RS}.

Figure 2a shows another important observation: LD may vanish very quickly in the selected phase, while relative heterozygosity approaches a finite (*i.e.*, nonzero) equilibrium value. Thus, LD does not vanish because of the variation-reducing effect of hitchhiking *per se*, but as a consequence of secondary hitchhiking effects on the recombinants created in the selected phase (described above).

Figure 2. … *C*_{LR}(*n*) are shown on the left-hand side, and those of normalized heterozygosity Het(*n*)/Het(0) are shown on the right-hand side, where solid (resp. dashed) lines correspond to locus L (resp. R). (a) The selected locus is at the midpoint **...**

A selected mutation occurring outside the two neutral sites on a low-frequency gamete may also lead to a transient peak of *C*_{LR}, if both neutral polymorphic sites are <0.1*s* recombination distances away from the selected site (see Figure 2, b and c). Although this peak vanishes faster than under neutral conditions (*i.e.*, with the selected site far away from the neutral sites, as in Figure 2d), the decay rate is not as high as when the favored mutation occurs between the neutral sites (Figure 2a). We analyze this behavior in more detail below.

Shown in Figure 3 is a plot of *C*_{LR}(*n*) for varying position of the selected locus. As in Figure 2, the distance between the two neutral loci is fixed at *r*_{LR} = 0.0002, and the same set of initial conditions is used. Note that this plot is symmetric about the plane *r* = 0; we return to this point later in the article. We stress that our conclusions described above do not depend on the particular values of neutral marginal allele frequencies *p*_{L}(0) and *p*_{R}(0) used for illustration. Even for low values of *p*_{L}(0) and *p*_{R}(0), for example, the same conclusions hold.

Figure 3. … *C*_{LR} between the neutral loci as a function of the position of the selected locus, for *r*_{LR} = 0.0002 and *s* = 0.01. Here, *r* is the position of the selected locus S, and the value *r* = 0 corresponds to the midpoint between the **...**

An alternative illustration of the above discussion is provided in Figure 4, which shows pairwise LD plots for a region containing 100 neutral loci and a single selected locus. The two plots shown correspond to two different time points. The selected locus is located in the middle of the region and LD values below a cutoff value are not plotted.

### The maximum LD at *t*_{f}:

Consider the case in which the selected locus S is to the right of locus R. Viewing *C*_{LR}(*t*_{f}) as a function of *r*_{RS}, whether a local optimum in the domain exists depends on initial conditions. The example shown in Figure 3 has a local maximum at *r*_{RS}/*s* ≈ 0.039. Differentiating the analytic solution (35), it is possible to determine whether there is a critical point *r*_{RS} = *r**_{RS} that satisfies

Suppose that, at *t* = 0, the new gamete carrying a selected allele (of type 1) at locus S is of type *ij*1, with *i* being the allele at locus L and *j* the allele at locus R. Then, given that [δ_{i1} − *p*_{L}(0)][δ_{j1} − *p*_{R}(0)] > *C*_{LR}(0), where δ_{ab} is 1 if *a* = *b* or 0 if *a* ≠ *b*, we obtain

and

The value of *r**_{RS} in (41) may be very large for some initial conditions. In such a case, as our analytic solution (35) is valid only in the domain , all we can say for sure is that *C*_{LR}(*t*_{f}) has no critical point in the domain [*i.e.*, *C*_{LR}(*t*_{f}) is either a monotonically increasing or a monotonically decreasing function of *r*_{RS} in that domain]. Further, if [δ_{i1} − *p*_{L}(0)][δ_{j1} − *p*_{R}(0)] ≤ *C*_{LR}(0), there is no real-valued *r**_{RS} such that our approximate analytic solution (35) satisfies (40). Hence, noting that *C*_{LR}(*t*_{f}) approaches as *r*_{RS}/*s* increases, we conclude that, in the domain ,

where

and . Note that the maximum possible value of *X* is . The critical point *r*_{LS} = *r**_{LS} for the case in which the selected locus is to the left of locus L is also given by (41).

### Invariance of *C*_{LR} and *C*_{LRS} when the selected locus is between the two neutral loci:

When the favored mutation occurs between the two neutral sites, the dynamics of the system of truncated recursions for *C*_{LR} and *C*_{LRS} do not depend on the position of the selected locus. This can be seen immediately from Equations 17 and 18, which depend only on the sum *r*_{LS} + *r*_{RS} and not on any individual recombination parameter. We show in the appendix that this invariance also (nearly) holds for the system of full recursions.

## INITIAL CONDITIONS AND THE PARAMETER SPACE OF THE MODEL

Here we assume that the selected locus is to the right of locus R. For such a case, recall that the LD (measured with respect to type 1 alleles) between the neutral loci is given by (35). We use *ijk* to denote gametic types, with *i* being for locus L, *j* for locus R, and *k* for locus S. By *the gamete of origin*, we mean the new gamete at *t* = 0 carrying a selected allele (of type 1) at locus S. We include *ij* in superscript [*i.e.*, we write ] if the gamete of origin is of type *ij*1. Using (35), we obtain

where, as before, δ_{ab} is 1 if *a* = *b* or 0 if *a* ≠ *b*, and α(*t*, *y*) is defined as

Recall that marginal (type 1) allele frequencies *p*_{L}(*t*) and *p*_{R}(*t*) at the neutral loci are given by (27) and (28), respectively. For and , we can use the approximation

from which it follows that

where α(*t*_{f}, *r*_{RS}/*s*) is defined as in (44). Similar to , we use and to denote *p*_{L}(*t*) and *p*_{R}(*t*), respectively, if the gamete of origin is of type *ij*1. It is straightforward to show that

### Frequency-averaged LD and the range of the hitchhiking effect:

In what follows, we use *x*_{*ijk*} to denote the frequency of the gametic type *ijk* at time *t* = 0 and define *x*_{*ij*·} = *x*_{*ij*0} + *x*_{*ij*1}. The type of the gamete of origin could be any of 001, 011, 101, and 111. Suppose that the probability of the gamete of origin being of type *ij*1 is equal to the frequency of the gametic type *ij*0 just before time *t* = 0 (note that this frequency is equal to *x*_{*ij*·}). Then, the average value of *C*_{LR}(*t*) with respect to this probability is given by , which we call a *frequency-averaged* LD. We show below that, contrary to people's common intuition, the effect of selection on such an averaged LD does not depend on haplotype diversity *x*_{*ij*·} at *t* = 0.

Using (43), we can show that is given by

If *C*_{LR}(0) = 0, then for all *t*. For *C*_{LR}(0) ≠ 0, we define

Note that *r*_{LR} need not be much smaller than *s* for our analytic solution (35) to be valid (recall that only is required). Assuming , we can ignore genetic drift and regard as the behavior of LD under neutrality. Thus, *A*(*t*, *r*_{RS}/*s*) can be viewed as the ratio of the frequency-averaged LD in the presence of selection to that in the absence of selection. At the time *t*_{f} of fixation, *p*_{S}(*t*_{f}) = 1 − *p*_{S}(0) and therefore

For given *r*_{RS}/*s*, *A*(*t*_{f}, *r*_{RS}/*s*) depends only on *p*_{S}(0) = 1/(2*N*); it has no dependence on other initial conditions. A plot of *A*(*t*_{f}, *r*_{RS}/*s*) is shown in Figure 5 for *p*_{S}(0) = 0.00005.

We now compare *A*(*t*_{f}, *r*_{RS}/*s*) with relative frequency-averaged heterozygosity. Let us focus on the right neutral locus R and define . For *H*_{R}(0) = 2*p*_{R}(0)[1 − *p*_{R}(0)] ≠ 0, one can use (46) to show that

For *p*_{S}(0) = 1/(2*N*), this is , which is equivalent to Equation 14d of Stephan *et al.* (1992) (the factor 2 in the exponent of that formula needs to be replaced by 4 because of the different definition of the selection coefficient). In the absence of genetic drift, (*t*_{f}) = *H*_{R}(0) for *s* = 0. Therefore, (48) can be regarded as the ratio of the frequency-averaged heterozygosity in the presence of selection to that in the absence of selection. Surprisingly, this ratio is exactly equal to the analogous ratio for LD shown in (47). The function *A*(*t*_{f}, *r*_{RS}/*s*) plays a special role in the sense that it encodes the effect of selection on two different frequency-averaged quantities.

For site heterozygosity, the hitchhiking effect is generally profound only when, provided that , the recombination distance *r* between the selected and neutral sites satisfies *r* < 0.1*s* (Maynard Smith and Haigh 1974). The term determining this effect is (2*N*)^{−4r/s}, assuming that the initial frequency *p*_{S}(0) of the selected allele is 1/(2*N*). Our above analysis shows that the range of a substantial reduction of LD due to hitchhiking [determined by ] is exactly equal to that for variation [determined by (2*N*)^{−4r/s}] (see Figure 5).
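As a quick numerical illustration of the range statement, the term (2*N*)^{−4*r*/*s*} can be tabulated against *r*/*s*. The sketch below is only an illustration; the function name and the chosen values of *N* and *r*/*s* are assumptions, not taken from the article.

```python
def hitchhiking_term(r_over_s, n):
    """Return (2N)^(-4 r/s) for population size n and scaled distance r/s."""
    return (2.0 * n) ** (-4.0 * r_over_s)


if __name__ == "__main__":
    for n in (1e4, 1e6):
        for ratio in (0.0, 0.01, 0.05, 0.1, 0.3):
            term = hitchhiking_term(ratio, n)
            print(f"N={n:.0e}  r/s={ratio:<4}  (2N)^(-4r/s)={term:.4f}")
```

For *N* = 10^{4} the term equals 1 at *r* = 0 and has already dropped below 0.02 at *r* = 0.1*s*, which is consistent with the statement that the effect is confined to roughly *r* < 0.1*s*.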

### Characterization of equivalent initial conditions:

We now find the set of all initial conditions that lead to the same value of *C*_{LR} at the time of fixation; *i.e.*, *C*_{LR}(*t*_{f}) = *c*, where *c* is some fixed constant.

To be concrete, suppose that the gamete of origin is of type 001, in which case *x*_{101} = *x*_{011} = *x*_{111} = 0 and *x*_{001} = *p*_{S}(0) = 1/(2*N*). First, note that *x*_{000} + *x*_{010} + *x*_{100} + *x*_{110} + *p*_{S}(0) = 1 implies

which defines a tetrahedron Δ as depicted in Figure 6a. Second, using (43), one can show that implies

which defines a surface Ξ in a three-dimensional Euclidean space with (*x*_{110}, *x*_{010}, *x*_{100}) as coordinates. The intersection of surface Ξ with tetrahedron Δ, illustrated in Figure 6b, corresponds to the set of initial conditions such that . A case in which the gamete of origin is of type other than 001 can be handled in a similar vein.

### Probability distributions of *C*_{LR}(*t*_{f}):

Recall that *C*_{LR}(*t*) for *t* > 0 depends on initial conditions. In what follows, we regard initial gametic frequencies as being random and consider the probability distribution of *C*_{LR}(*t*_{f}). The squared correlation coefficient *R*^{2}(*t*_{f}) is addressed later in the discussion. We assume that all initial gametic frequency configurations *x*_{000}, *x*_{010}, *x*_{100}, *x*_{110} are equally likely and satisfy *x*_{000} + *x*_{010} + *x*_{100} + *x*_{110} + *p*_{S}(0) = 1. Under this assumption of uniform distribution, it is possible to compute the probability distribution *P*[*C*_{LR}(*t*_{f}) < *c*] for fixed *r*_{LR}/*s* and *r*_{RS}/*s*. The key idea is to utilize the characterization of equivalent initial conditions described above. More precisely, as *c* changes, the surface defined by *C*_{LR}(*t*_{f}) = *c* changes in a smooth fashion, sweeping out a region in three dimensions. The probability *P*[*C*_{LR}(*t*_{f}) < *c*] is equal to the volume of the region corresponding to *C*_{LR}(*t*_{f}) < *c* inside Δ, normalized by the total volume of Δ.
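The volume-ratio idea can also be approximated by Monte Carlo sampling, as in the following Python sketch. It draws initial gametic frequencies uniformly from the simplex and estimates the fraction for which a statistic falls below *c*. Two simplifications are assumptions of the sketch: the small mass *p*_{S}(0) is neglected, so the four frequencies sum to one, and the initial LD *C*_{LR}(0) is used as a stand-in statistic, because evaluating *C*_{LR}(*t*_{f}) requires the analytic solution (43). All function names are illustrative.

```python
import random


def sample_simplex(k, rng):
    """Draw a point uniformly from the (k-1)-simplex via normalized exponentials."""
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def initial_ld(x00, x01, x10, x11):
    """C_LR(0) for background gametes labelled by their (L, R) alleles."""
    p_l = x10 + x11
    p_r = x01 + x11
    return x11 - p_l * p_r


def prob_below(c, statistic, n=100_000, seed=0):
    """Monte Carlo estimate of P[statistic < c] under uniform initial frequencies."""
    rng = random.Random(seed)
    hits = sum(statistic(*sample_simplex(4, rng)) < c for _ in range(n))
    return hits / n


if __name__ == "__main__":
    for c in (-0.1, -0.05, 0.0, 0.05, 0.1):
        print(c, prob_below(c, initial_ld))
```

Replacing `initial_ld` with a function that evaluates *C*_{LR}(*t*_{f}) from the same four frequencies would give a numerical check on the closed-form distribution.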

Our main result, illustrated in Figure 7, is

where is defined below. Let

Cases with *c* > 0 and *c* < 0 are treated separately below.

Figure 7. … *P*[ < *c*], with *r*_{LR}/*s* = 0.02 and 1/(2*N*) = 0.00005. As *r*_{RS}/*s* increases, **...**

#### For c > 0:

If *c* ≤ *b*/4, , and *c* ≤ *ab*(1 − *a*), then

If *c* ≤ *b*/4, , and *c* > *ab*(1 − *a*), then

If either *c* > *b*/4 or , then P[*C* < *c*] = 1.

#### For c = −|c| < 0:

If |*c*| ≤ (1 − *a*)^{2}*b*/4, then

If |*c*| > (1 − *a*)^{2}*b*/4, then *P*[*C* < −|*c*|] = 0.

### Polarization:

Polarized LDs are measured with respect to major alleles. To determine the polarized LD *C*_{ω}(*t*_{f}) between the neutral loci, we compute

the main point being that *C*_{ω}(*t*_{f}) = *C*_{LR}(*t*_{f}) if σ > 0 and *C*_{ω}(*t*_{f}) = −*C*_{LR}(*t*_{f}) if σ < 0. [Recall that *C*_{LR}(*t*) is measured with respect to type 1 alleles.]
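The sign rule can be sketched in Python as follows. The sketch assumes that σ is the product [*p*_{L}(*t*_{f}) − 1/2][*p*_{R}(*t*_{f}) − 1/2], which is what measuring LD with respect to the major allele at each locus implies; the function name and the handling of ties are illustrative choices.

```python
def polarized_ld(c_lr, p_l, p_r):
    """Polarize C_LR (measured for type 1 alleles) with respect to major alleles.

    p_l and p_r are the type 1 allele frequencies at loci L and R at the
    time of interest. If the type 1 allele is the minor allele at exactly
    one locus, the sign of the LD flips.
    """
    sigma = (p_l - 0.5) * (p_r - 0.5)  # assumed form of the sign indicator
    if sigma < 0:
        return -c_lr
    # sigma > 0 keeps the sign; the tie sigma == 0 is not covered by the
    # rule in the text and is left unpolarized here.
    return c_lr


if __name__ == "__main__":
    print(polarized_ld(0.1, 0.7, 0.8))  # type 1 is the major allele at both loci
    print(polarized_ld(0.1, 0.3, 0.8))  # type 1 is the minor allele at L only
    print(polarized_ld(0.1, 0.3, 0.2))  # type 1 is the minor allele at both loci
```

The sign is kept when the type 1 allele is the major allele at both loci or at neither, because relabeling the alleles at both loci leaves the LD unchanged.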

First, for *r*_{LR} = *r*_{RS} = 0, note that

Second, for fixed initial marginal frequencies, we need to determine for what values of *r*_{LR} and *r*_{RS} the sign of σ changes. Using (45) and (46), we can obtain the following results:

- For *i* = 0, σ ≈ 0 if *p*_{L}(0) > and (53)
- For *i* = 1, σ ≈ 0 if *q*_{L}(0) > and (54)
- For *j* = 0, σ ≈ 0 if *p*_{R}(0) > and (55)
- For *j* = 1, σ ≈ 0 if *q*_{R}(0) > and (56)

Combined with (52), these equations determine completely whether *C*_{ω}(*t*_{f}) = *C*_{LR}(*t*_{f}) or *C*_{ω}(*t*_{f}) = −*C*_{LR}(*t*_{f}) for given parameter values. For example, suppose that (*i*, *j*) = (0, 0). If *p*_{L}(0) < and *p*_{R}(0) < , then there is no real-valued solution to the condition σ = 0, and therefore *C*_{ω}(*t*_{f}) = *C*_{LR}(*t*_{f}) for all values of *r*_{LR} and *r*_{RS}. If *p*_{L}(0) > and *p*_{R}(0) < , or if *p*_{L}(0) < and *p*_{R}(0) > , then σ changes sign as illustrated in Figure 8, a and b, where

and

Note that *u* takes its minimum value at *p*_{L}(0) = 1 and that as . Similarly, *v* takes its minimum value at *p*_{R}(0) = 1 and as . If both *p*_{L}(0) > and *p*_{R}(0) > , then there are three possibilities, depicted in Figure 8, c–e.

### Regions of positive *C*_{ω}(*t*_{f}):

To determine the regions of positive *C*_{ω}(*t*_{f}), we need to know how the sign of depends on *r*_{RS}; (43) implies that the sign of does not depend on *r*_{LR}. For concreteness, suppose that (*i*, *j*) = (0, 0), in which case we can obtain the following results from using (43):

- If *C*_{LR}(0) ≥ 0, then ≥ 0 for all *r*_{RS}, and therefore the sign of *C*_{ω}(*t*_{f}) is completely determined by that of σ.

Note that *w* takes its minimum value of zero at |*C*_{LR}(0)| = *p*_{L}(0)*p*_{R}(0) and that it increases monotonically as |*C*_{LR}(0)|/[*p*_{L}(0)*p*_{R}(0)] decreases. The polarized LD *C*_{ω}(*t*_{f}) is positive if and only if and σ are either both positive or both negative. Examples are shown in Figure 10.

*C*_{ω}(*t*_{f}) when the gamete of origin is of type 001 and *C*_{LR}(0) < 0. Note that *C*_{ω}(*t*_{f}) is positive if and only if and σ are either both positive or both negative. In general, *C*_{ω} **...**

As shown in Figure 8, σ tends to be positive in the neighborhood of (*r*_{LR}, *r*_{RS}) = (0, 0). The size and shape of this neighborhood depend on *u* and *v*. Likewise, as shown in Figure 9, tends to be positive in the neighborhood of (*r*_{LR}, *r*_{RS}) = (0, 0), with the size of the neighborhood depending on *w*. As a consequence, the polarized LD *C*_{ω}(*t*_{f}) also tends to be positive near (*r*_{LR}, *r*_{RS}) = (0, 0).

More generally, if the gamete of origin is of type *ij*1, the sign of (respectively, σ) can be analyzed using (43) [respectively, (52)–(56)]. For all *ij*, the polarized LD *C*_{ω}(*t*_{f}) tends to be positive near (*r*_{LR}, *r*_{RS}) = (0, 0).

### An exact symmetry when the selected locus is outside the two neutral loci:

Suppose that the selected locus is outside the two neutral loci and that geometric configuration and recombination fractions are fixed. Let {*p*_{S}(0), *p*_{L}(0), *p*_{R}(0), *C*_{LS}(0), *C*_{RS}(0), *C*_{LR}(0), *C*_{LRS}(0)} and {*p*′_{S}(0), *p*′_{L}(0), *p*′_{R}(0), *C*′_{LS}(0), *C*′_{RS}(0), *C*′_{LR}(0), *C*′_{LRS}(0)} denote two different sets of initial conditions. At generation *n* > 1, we use “prime” to refer to the marginal allele frequencies and LDs obtained using the second set of initial conditions. In the appendix, we show that if *C*_{LS}(0) = *C*′_{RS}(0) and *C*_{RS}(0) = *C*′_{LS}(0), while *p*_{S}(0) = *p*′_{S}(0), *C*_{LR}(0) = *C*′_{LR}(0), and *C*_{LRS}(0) = *C*′_{LRS}(0), then the system of full recursions (1)–(11) implies

*C*_{LR}(*n*) = *C*′_{LR}(*n*) and *C*_{LRS}(*n*) = *C*′_{LRS}(*n*)

for all *n* ≥ 1. This is an exact symmetry result that holds for an arbitrary dominance coefficient *h*.

An application of this general result is the explanation of the symmetry of Figure 3 with respect to reflection about the *r* = 0 plane, for those regions corresponding to the selected locus being outside the neutral loci. Note that what is depicted in Figure 3 is different from the obviously symmetric case in which initial conditions *C*_{LS}(0) and *C*_{RS}(0) get exchanged when locus S is reflected about *r* = 0. In that figure, initial conditions remain fixed, while the geometric configuration of the loci and recombination fractions change upon reflection. That situation is related to changing initial conditions as described above, while keeping the geometric configuration and recombination fractions fixed.

## DISCUSSION

To understand the forces that shape genomic variation in natural populations and the divergence between species, observed patterns must be compared to predictions of the models that faithfully represent the mechanisms through which such forces may work. While much of the natural selection of organismic phenotypes may be effectively approximated by deterministic single-locus equations, interactive and stochastic forces are thought to play a significant role. Until recently genetic drift has been considered the primary stochastic process determining the temporal, geographic, and genomic distribution of the vast majority of DNA sequence polymorphism and divergence. Gillespie has repeatedly demonstrated and emphasized fundamental differences between constant-fitness models and stochastically varying selection, despite their superficial similarities (Gillespie 1994). Recently emerging results of surveys of genomic regions of low crossing over per physical length indicate that linked selection rather than genetic drift can dominate the levels of polymorphism within populations (Aguadé *et al.* 1989; Stephan and Langley 1989; Begun and Aquadro 1992). The hitchhiking effect not only reduces the average level of heterozygosity in the surrounding genomic regions, but it also leaves a skewed frequency spectrum (Braverman *et al.* 1995). The early study by Thomson (1977) indicated that linked selection can create linkage disequilibrium. Several subsequent articles have addressed specific cases (Robinson *et al.* 1991; Grote *et al.* 1998) or noted some temporal and spatial patterns (Kim and Nielsen 2004). Here we have demonstrated that the hitchhiking effect involves a number of strong and surprisingly distinct dynamics and patterns of linkage disequilibrium. We believe that the approach we have taken to address the impact of selection can be extended further to address more complex selection schemes and genetic interactions.

The technological capacity of molecular population genomics is increasing rapidly. For example, the HapMap Project (International HapMap Consortium 2005) provides extensive genotypic survey results on >1 million SNPs in almost 300 individual humans. At this scale of observation one can anticipate much more powerful inferences about the role of direct selection, linked selection, crossing over, gene conversion, mutation, and geographic demography. Indeed, on the basis of such new data, genomic variation in the rate of crossing over has been proposed as the primary determinant of the patterns of linkage disequilibrium in human populations (McVean *et al.* 2004).

Several representations/notations have been developed to analyze the dynamics of multilocus systems (Bürger 2000). Through a series of articles Barton and Turelli have elaborated and applied their method on the basis of the explicit representations of the moments of allele frequencies (Barton and Turelli 1987, 1991; Turelli and Barton 1990, 1994). Their representation proved surprisingly tractable and transparent in the analysis of the hitchhiking effect on linkage disequilibrium.

Our analysis begins with the full representation of the three-locus dynamics using the notation of Barton and Turelli. These equations suggest the familiar approximation, “truncated equations,” in which and small higher-order terms can be dropped. The truncated equations immediately expose much of the fundamental structure, and their differential analogs, ordinary differential equations (ODEs), allow approximate analytic solutions. Comparisons of these ODE dynamics with those of the Barton and Turelli representation indicate that the approximations remain quite accurate as long as (see Tables 1 and 2). Particularly fortuitous and important is the role of the three-locus LD *C*_{LRS} in driving the dynamics of the LD *C*_{LR} between the two linked neutral loci and the dependence of *C*_{LRS} on the product of the two-locus LDs *C*_{LS} and *C*_{RS}.

This systematic investigation of the dynamics of LD under hitchhiking reveals four important features. First, and quite generally, hitchhiking indeed generates LD during the initial half of the hitchhiking time course. As Figure 3 shows, LD (positive in this instance) reaches a maximum shortly before the originally rare selected allele reaches a frequency of 0.5. This result is consistent with Thomson's analysis of hitchhiking caused by the dynamics of an initially rare allele under balancing selection in that its frequency reaches an equilibrium closer to 0.5 than to 1.0. But what is truly surprising is that from several important perspectives the hitchhiking effect on LD is one of reduction. In Figure 3 it is obvious that in the second half of the hitchhiking period the large peak of LD (positive in this case) decreases rapidly. Figure 2 shows several configurations of initial conditions and demonstrates that the decline in the magnitude of LD is not attributable to a decline in the heterozygosity at the two neutral loci. A second and striking result is that preexisting LD is completely destroyed when the selected locus is situated between the neutral sites. This geometric relationship produces a striking pattern when all pairwise associations are plotted together as in Figure 4. This is probably the mechanism behind the pattern noted by Kim and Nielsen (2004). This LD-reducing effect of hitchhiking is also evident when the selected site is outside the neutral pair, since much of the LD generated during the initial phase is destroyed in the latter phase. A third unexpected property of the hitchhiking effect on LD is that averaging over the frequencies of the gametes with which the rare selected variant can be associated indicates that the net effect of hitchhiking would be to reduce preexisting average LD. This is despite the fact that hitchhiking does tend to increase the variance in LD (see below). Note in Figure 3 that there is considerably increased LD in both regions flanking the two neutral sites (*i.e*., when the selected site is outside and is close to the two neutral sites). When the rare favored allele appears on two of the other three haplotypes (10 or 01), the final LD is strongly negative. Thus the average (weighted by the frequencies of the four gametes) will remain at zero if there is no LD and tend toward zero if initially different from zero. The rate of approach to zero is greater than or equal to that expected in the absence of hitchhiking.
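Readers who want to explore these qualitative features numerically could start from a sketch like the one below. It is not the diploid system of recursions (1)–(11) analyzed in this article; it is a simplified haploid analog with genic selection, independent crossovers in the two intervals, and an arbitrary starting configuration, all of which are assumptions made for illustration. The function names and default parameter values are hypothetical.

```python
from itertools import product

# Haplotypes are tuples (a0, a1, a2) of 0/1 alleles at three loci listed in
# chromosomal order; `sel` is the index of the selected locus.
HAPLOTYPES = list(product((0, 1), repeat=3))


def step(freqs, s, r1, r2, sel):
    """One generation: haploid selection, then random mating with recombination."""
    # Selection: allele 1 at the selected locus has relative fitness 1 + s.
    w = {h: freqs[h] * (1.0 + s * h[sel]) for h in HAPLOTYPES}
    total = sum(w.values())
    x = {h: w[h] / total for h in HAPLOTYPES}
    # Recombination: r1 and r2 are crossover probabilities in the intervals
    # between loci 0-1 and 1-2 (ASSUMPTION: no interference).
    new = dict.fromkeys(HAPLOTYPES, 0.0)
    for a in HAPLOTYPES:
        for b in HAPLOTYPES:
            pair = x[a] * x[b]
            new[a] += pair * (1 - r1) * (1 - r2)
            new[(a[0], b[1], b[2])] += pair * r1 * (1 - r2)
            new[(a[0], a[1], b[2])] += pair * (1 - r1) * r2
            new[(a[0], b[1], a[2])] += pair * r1 * r2
    return new


def allele_freq(freqs, locus):
    """Frequency of the type 1 allele at the given locus."""
    return sum(f for h, f in freqs.items() if h[locus] == 1)


def pairwise_ld(freqs, i, j):
    """Two-locus LD with respect to the type 1 alleles at loci i and j."""
    p_ij = sum(f for h, f in freqs.items() if h[i] == 1 and h[j] == 1)
    return p_ij - allele_freq(freqs, i) * allele_freq(freqs, j)


def run(s=0.1, r1=0.001, r2=0.001, sel=1, eps=5e-5, generations=300):
    """Return a list of (p_S, C_LR) pairs, one per generation."""
    neutral = [k for k in range(3) if k != sel]
    freqs = dict.fromkeys(HAPLOTYPES, 0.0)
    # ASSUMPTION: neutral loci start in linkage equilibrium at frequency 0.5.
    for h in HAPLOTYPES:
        if h[sel] == 0:
            freqs[h] = 0.25 * (1 - eps)
    # The favored allele arises on the gamete with type 1 neutral alleles.
    freqs[(1, 1, 1)] = eps
    trajectory = []
    for _ in range(generations):
        freqs = step(freqs, s, r1, r2, sel)
        trajectory.append((allele_freq(freqs, sel), pairwise_ld(freqs, *neutral)))
    return trajectory
```

Setting `sel=1` places the selected locus between the neutral loci, while `sel=0` or `sel=2` places it outside them; comparing the returned trajectories for the two cases is one way to inspect the contrast described above, although agreement with the diploid model should not be taken for granted.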

The fourth notable LD hitchhiking effect is on the expected LD when this association is polarized by the marginal allele frequencies. Langley and Crow (1974) noted that with molecular polymorphism data the sign of LD is typically arbitrary. They proposed to orient LD such that it reflected the deviation for the expected most frequent gametic type and demonstrated that under quadratic stabilizing selection this measure of LD, denoted *C*_{ω}, is negative. Under hitchhiking the average *C*_{ω} tends to be positive. This can be understood as the consequence of the fact that the neutral alleles at each site on the initially selected haplotype tend to rise to frequencies >0.5 and the LD between those alleles is positive. A bias in the distribution of *C*_{ω} either regionally or across the genome could be interpreted as evidence that hitchhiking is shaping LD. Thus the frequency-averaged hitchhiking effect on LD is to drive it to zero. But as shown in Figure 11a there is a bias with respect to marginal frequencies at the two neutral sites; *C*_{ω}(*t*_{f}) tends to be positive for small *r*/*s*. And, of course, there is a broad range of *r* in which the variance of LD is increased when the selected site is outside the two neutral sites. Figure 11b shows that the projection of the probability distribution of the squared correlation coefficient *R*^{2}(*t*_{f}) also has a peak for small *r*/*s*, near 0.02.

*C*_{ω}(*t*_{f}) and the squared correlation coefficient *R*^{2}(*t*_{f}), obtained from numerical simulations using *r*_{LR}/*s* = 0.02 and 1/(2*N*) = 0.00005. The polarized LD *C*_{ω}(*t*_{f}) tends to be positive **...**

The genomic scale over which hitchhiking has a significant effect on heterozygosity and the frequency spectrum has been considered previously. Beyond the obvious inherent in the approximation, Stephan *et al.* (1992) showed that the reduction in heterozygosity was approximately proportional to 1 − (2*N*)^{−4r/s}. Simulations of Braverman *et al.* (1995) indicated a similar scale and shape to the skewness in the frequency spectrum as measured by Tajima's *D* (also see Durrett and Schweinsberg 2005). In studying the expectation of the linkage disequilibrium caused by hitchhiking we note a striking common function *A*(*t*, *r*/*s*) that relates the averages of both heterozygosity and LD to what they would be in a large population in the absence of hitchhiking. We are tempted to speculate that this simple function may be fundamental to average dynamics of other moments of allele frequencies under the hitchhiking scenario.

While the effect of hitchhiking on the average *C*_{LR} is to drive it toward zero, this is clearly not expected for , or the absolute value of *C*_{LR}. We have not obtained an analytic expression for such expectations, but simulated results such as those shown in Figures 3, 4, and 11 indicate that the hitchhiking effect on the magnitude of *C*_{LR} between neutral sites on the same side of the selected site can be substantial.

Our results were derived using a deterministic three-locus model of hitchhiking. Similar results hold for the pseudo-hitchhiking model (Gillespie 2000 and J. Gillespie, unpublished results). We have compared both models. The recursion equations of the pseudo-hitchhiking model are a good approximation of the dynamics of the three-locus model if the selected locus is outside the two neutral loci and the distance between the selected locus and either one of the neutral loci is much larger than the distance between the two neutral loci. In this parameter region, LD predicted by both models decays more quickly than under neutrality. How might these conclusions about the theoretical hitchhiking dynamics of LD influence the interpretation of population genomic polymorphism and divergence? Certainly it seems to inform any effort to identify regions in the genomes of natural populations in which there has been very recent selected substitution of newly arising mutations or otherwise rare variants. Hitchhiking may not increase LD in the neighborhood of a selected site as it has been widely thought; rather it can decrease it especially when the neutral sites are on opposite sides of the selected locus (see Figure 4). More generally LD that is built up by hitchhiking shortly after the occurrence of a favored mutation is quickly destroyed (even before fixation is reached). As a consequence, genomic regions around targets of recent positive directional selection are expected to exhibit a lack of LD, which is not simply due to the variation-reducing force of hitchhiking. This local dip in the magnitude of LD may be of use in the localization of targets of positive selection in the genome. 
Given the current debate of how various variation-reducing forces can be distinguished (in particular, bottlenecks from selective sweeps; Glinka *et al.* 2003; Haddrill *et al.* 2005), there is merit in attempting to include the specific pattern of LD predicted by these analyses into the methods for identifying targets of selection by selective sweeps (*e.g.*, Kim and Stephan 2002; Kim and Nielsen 2004). Because populations that have undergone population size bottlenecks should show elevated genomewide levels of LD, regions lacking LD around targets of selection may be more easily distinguishable from the rest of the loci than when statistics that are solely based on the reduction of variation are used.

We have not attempted to extend our results to situations in which recurring and genomically randomly distributed hitchhiking events occur. The significant impediment to the analysis of the effect of such recurrent hitchhiking on heterozygosity may be the impact of simultaneous events within the same genomic region. But if selection is strong and events are sufficiently rare, such occurrences may be negligible (Kaplan *et al.* 1989; Durrett and Schweinsberg 2005). While this issue of the dynamic interaction of simultaneous linked hitchhiking events may well remain for the analysis of the impact of hitchhiking on LD, there is clearly a second considerable issue. While in large populations the heterozygosity does not change in between hitchhiking events, that is not true of LD, which, of course, decays in magnitude at rate *r*. If the recurrent and randomly distributed hitchhiking events were sufficiently rare and there were no other force causing LD, the results given above would be applicable, since LD would decay to zero throughout the genomic region before the next event. Given that LD is, in fact, commonly present on some scale in the various studied species, further analysis and/or simulations are warranted to make a general prediction of the genomic pattern.
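The neutral decay of LD between hitchhiking events can be made concrete with the standard two-locus result that, under random mating in a large population, LD shrinks by a factor 1 − *r* each generation. The sketch below, with hypothetical function names, evaluates this decay and the number of generations needed for LD to fall to a given fraction of its starting value.

```python
import math


def neutral_ld(c0, r, t):
    """LD after t generations of neutral decay: C(t) = (1 - r)**t * C(0)."""
    return c0 * (1.0 - r) ** t


def generations_to_fraction(r, fraction):
    """Generations needed for |LD| to fall to `fraction` of its initial value."""
    return math.log(fraction) / math.log(1.0 - r)
```

With *r* = 0.001, for example, `generations_to_fraction(0.001, 0.5)` is about 693 generations, which gives a feel for how widely spaced hitchhiking events would need to be for LD to have largely decayed between them.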

## Acknowledgments

We thank Michael Turelli for teaching us the multilocus formalism and John Gillespie for supplying us with an unpublished manuscript on pseudo-hitchhiking. We acknowledge support from the following sources: the Volkswagen-Foundation grant I/78 815 (W.S.), the National Science Foundation grants ELA-0220154 (Y.S.S. and C.H.L.) and IIS-0513910 (Y.S.S. and C.H.L.), and the National Human Genome Research Institute (National Institutes of Health) grants 5R01HG002107-03 and 5R01HG002942-02 (C.H.L.).

## APPENDIX

#### Quasi-invariance (embedded selected locus case):

Here we examine in more detail the case where the selected locus is between the neutral loci. More exactly, we wish to keep the sum *r*_{LS} + *r*_{RS} fixed to some value, say ρ, and consider varying *r*_{LS} and *r*_{RS} while satisfying that condition. To avoid being long-winded, we call this kind of translation of the selected locus a *constrained S-translation*. We wish to show that the dynamics of certain linkage disequilibria are *quasi-invariant*, as we clarify presently, under the constrained *S*-translation.

First, note that the dynamics of *p*_{S} do not depend at all on the position of the selected locus. Then, as *r*_{LR} = *r*_{LS} + *r*_{RS} for the case under consideration, recursions (6) and (10) do not change under the constrained *S*-translation. Since *r*_{LS,R} = *r*_{RS} and *r*_{RS,L} = *r*_{LS}, we have *r*_{LS,R} + *r*_{RS,L} = *r*_{RS} + *r*_{LS} and *r*_{LRS} = *r*_{LR,S} + *r*_{RS} + *r*_{LS}. If no double crossovers are allowed, then *r*_{LR,S} is identically zero. Note that and that the sum *g*(*r*_{LS}) + *g*(*r*_{RS}) does not change under the constrained *S*-translation. Therefore, recursions (7) and (11) do not change under the constrained *S*-translation, as long as *r*_{LR,S} does not depend on where in between the neutral loci the selected locus is located.

We now turn to the product *C*_{LS}*C*_{RS}. For ease of notation, we define

Then, (4), (5), (8), and (9) imply

The product *C*_{LS}*C*_{RS} satisfies the recursion

The quantity *f*(*r*_{LS}; *k*)*f*(*r*_{RS}; *k*) can be written as

where

Under the restriction that *r*_{LS} + *r*_{RS} = ρ, the maximum value of *r*_{LS}*r*_{RS} is ρ^{2}/4, whereas the minimum is 0. Since γ(*k*) is positive definite, the maximum variation of *C*_{LS}(*n*)*C*_{RS}(*n*), as the selected locus moves between the neutral loci, can be obtained by comparing *C*_{LS}(*n*)*C*_{RS}(*n*) at *r*_{LS}*r*_{RS} = 0 with that at *r*_{LS}*r*_{RS} = ρ^{2}/4. Define *maximal relative variation* (*n*) as

It is straightforward to show that

where “…” represents terms proportional to ρ^{*m*}, *m* ≥ 4. In the case of directional selection, γ(*k*)/(α(*k*) + ρβ(*k*)) is of order 1 for all values of *s*, *h*, *p*_{S}(*k*), and ρ. Therefore, (*n*) = *O*(ρ^{2}*n*), and we conclude that relative variation increases as time passes.
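The bounds on *r*_{LS}*r*_{RS} used in this argument follow from a short calculation, written out here as a worked step. Under the constraint *r*_{LS} + *r*_{RS} = ρ, completing the square gives:

```latex
r_{LS}\, r_{RS} = r_{LS}\,(\rho - r_{LS})
              = \frac{\rho^{2}}{4} - \left(r_{LS} - \frac{\rho}{2}\right)^{2},
\qquad 0 \le r_{LS} \le \rho .
```

The product is therefore largest, at ρ^{2}/4, when the selected locus lies midway between the neutral loci, and it vanishes when the selected locus coincides with either neutral locus.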

The dynamics of *C*_{LR} and *C*_{LRS} are almost (or quasi-) invariant under the constrained *S*-translation in the following sense: the range of ρ in which selection has observable influence on the dynamics of *C*_{LR} and *C*_{LRS} is where . In that case, it is possible to maintain throughout the entire period from the initial generation to the fixation generation. We would then observe almost no variation in *C*_{LR} or *C*_{LRS} as the location of the selected locus is varied between the neutral loci. For large ρ, selection has little influence on *C*_{LR} and *C*_{LRS}, so their dynamics should be approximately invariant under translation of the selected locus.

#### An exact symmetry:

Suppose that the selected locus is outside the two neutral loci and that recombination fractions *r*_{LS}, *r*_{RS}, *r*_{LR}, *r*_{LRS}, *r*_{LR,S}, *r*_{RS,L}, and *r*_{LS,R} appearing in the system of full recursions (1)–(11) are fixed. In what follows, the dominance coefficient *h* is assumed to be arbitrary. Let {*p*_{S}(0), *p*_{L}(0), *p*_{R}(0), *C*_{LS}(0), *C*_{RS}(0), *C*_{LR}(0), *C*_{LRS}(0)} and {*p*′_{S}(0), *p*′_{L}(0), *p*′_{R}(0), *C*′_{LS}(0), *C*′_{RS}(0), *C*′_{LR}(0), *C*′_{LRS}(0)} denote two different sets of initial conditions. At generation *n* > 1, we use “prime” to refer to the allele frequencies and LDs obtained using the second set of initial conditions.

We first consider the second-order LDs involving the selected locus.

Lemma 1. *Suppose that C*_{LS}(0) = *C*′_{RS}(0) *and C*_{RS}(0) = *C*′_{LS}(0). *Then, for all n* ≥ 1,

*C*_{LS}(*n*)*C*_{RS}(*n*) = *C*′_{LS}(*n*)*C*′_{RS}(*n*).

*Proof*. This result follows from induction on *n*. Recall that

where the function *f*(*r*, *k*) is defined as in (A1). Similarly,

If *C*_{LS}(0) = *C*′_{RS}(0) and *C*_{RS}(0) = *C*′_{LS}(0), then

Suppose that the claim is true for all 1 ≤ *n* ≤ *k*. Then, for *n* = *k* + 1,

where the third line follows from the induction hypothesis.
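A small numerical check can make this argument more tangible. The sketch below assumes, as the appendix text suggests, that *C*_{LS} and *C*_{RS} are each multiplied every generation by a factor *f*(*r*; *k*) that depends only on the relevant recombination fraction and the generation. The particular `f` used here is a made-up stand-in, not the function defined in (A1), and all names are hypothetical.

```python
import math


def f(r, k):
    """HYPOTHETICAL per-generation multiplier standing in for (A1)."""
    return (1.0 - r) * (1.0 + 0.05 / (1.0 + k))


def ld_with_selected(c0, r, n):
    """Iterate C(k + 1) = f(r, k) * C(k) for n generations, starting from c0."""
    c = c0
    for k in range(n):
        c *= f(r, k)
    return c


def product_after(c_ls0, c_rs0, r_ls, r_rs, n):
    """Product C_LS(n) * C_RS(n) for fixed recombination fractions."""
    return ld_with_selected(c_ls0, r_ls, n) * ld_with_selected(c_rs0, r_rs, n)


# Swapping the two initial LDs while keeping r_LS and r_RS fixed leaves the
# product unchanged (up to floating-point rounding).
original = product_after(0.02, -0.01, r_ls=0.001, r_rs=0.003, n=50)
swapped = product_after(-0.01, 0.02, r_ls=0.001, r_rs=0.003, n=50)
print(math.isclose(original, swapped, rel_tol=1e-12))
```

The invariance does not depend on the specific form chosen for `f`; it relies only on each LD being updated multiplicatively, so that the product of the two accumulated factors is the same in both cases.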

Using the above lemma, we can obtain the following result regarding the third-order LD and the LD between the neutral loci:

Proposition 1. *Suppose that p*_{S}(0) = *p*′_{S}(0), *C*_{LS}(0) = *C*′_{RS}(0), *C*_{RS}(0) = *C*′_{LS}(0), *C*_{LR}(0) = *C*′_{LR}(0), *and C*_{LRS}(0) = *C*′_{LRS}(0). *Then,*

*C*_{LR}(*n*) = *C*′_{LR}(*n*) *and* *C*_{LRS}(*n*) = *C*′_{LRS}(*n*)

*for all n* ≥ 1.

*Proof.* First, note that *p*_{S}(0) = *p*′_{S}(0) implies *p*_{S}(*n*) = *p*′_{S}(*n*), , and for all *n* ≥ 1. Therefore, since *C*′_{LS}*C*′_{RS} = *C*_{LS}*C*_{RS} by Lemma 1, we obtain

which is equivalent to (10), and

which is equivalent to (6). Similarly,

and

are equivalent to (11) and (7), respectively. Hence, *C*′_{LR} and *C*′_{LRS} satisfy exactly the same set of recursions as do *C*_{LR} and *C*_{LRS}. Since *C*′_{LR}(0) = *C*_{LR}(0) and *C*′_{LRS}(0) = *C*_{LRS}(0), it thus follows that *C*′_{LR}(*n*) = *C*_{LR}(*n*) and *C*′_{LRS}(*n*) = *C*_{LRS}(*n*) for all *n* ≥ 1.

## References

- Aguadé, M., N. Miyashita and C. H. Langley, 1989. Reduced variation in the *yellow-achaete-scute* region in natural populations of *Drosophila melanogaster.* Genetics 122**:** 607–615.
- Barton, N. H., and M. Turelli, 1987. Adaptive landscapes, genetic distance and evolution of quantitative characters. Genet. Res. 49**:** 157–173.
- Barton, N. H., and M. Turelli, 1991. Natural and sexual selection on many loci. Genetics 127**:** 229–255.
- Begun, D. J., and C. F. Aquadro, 1992. Levels of naturally occurring DNA polymorphism correlate with recombination rates in *D. melanogaster.* Nature 356**:** 519–520.
- Braverman, J. M., R. R. Hudson, N. L. Kaplan, C. H. Langley and W. Stephan, 1995. The hitchhiking effect on the site-frequency spectrum of DNA polymorphism. Genetics 140**:** 783–796.
- Bürger, R., 2000. *The Mathematical Theory of Selection, Recombination, and Mutation.* John Wiley & Sons, Chichester, UK.
- Durrett, R., and J. Schweinsberg, 2005. A coalescent model for the effect of advantageous mutations on the genealogy of a population. Stoch. Proc. Appl. 115**:** 1628–1657.
- Fisher, R. A., 1930. *The Genetical Theory of Natural Selection.* Clarendon Press, Oxford.
- Gillespie, J. H., 1994. *The Causes of Molecular Evolution.* Oxford University Press, Oxford.
- Gillespie, J. H., 1997. Junk ain't what junk does: neutral alleles in a selected context. Gene 205**:** 291–299.
- Gillespie, J. H., 2000. Genetic drift in an infinite population: the pseudo-hitchhiking model. Genetics 155**:** 909–919.
- Glinka, S., L. Ometto, S. Mousset, W. Stephan and D. De Lorenzo, 2003. Demography and natural selection have shaped genetic variation in *Drosophila melanogaster*: a multilocus approach. Genetics 165**:** 1269–1278.
- Grote, M., W. Klitz and G. Thomson, 1998. Constrained disequilibrium values and hitchhiking in a three-locus system. Genetics 150**:** 1295–1307.
- Haddrill, P. R., K. R. Thornton, B. Charlesworth and P. Andolfatto, 2005. Multilocus patterns of nucleotide variability and the demographic and selection history of *Drosophila melanogaster* populations. Genome Res. 15**:** 790–799.
- International HapMap Consortium, 2005. A haplotype map of the human genome. Nature 437**:** 1299–1320.
- Kaplan, N. L., R. R. Hudson and C. H. Langley, 1989. The “hitchhiking effect” revisited. Genetics 123**:** 887–899.
- Kim, Y., and R. Nielsen, 2004. Linkage disequilibrium as a signature of selective sweeps. Genetics 167**:** 1513–1524.
- Kim, Y., and W. Stephan, 2002. Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics 160**:** 765–777.
- Kimura, M., 1983. *The Neutral Theory of Molecular Evolution.* Cambridge University Press, Cambridge, UK.
- Langley, C. H., and J. F. Crow, 1974. The direction of linkage disequilibrium. Genetics 78**:** 937–941.
- McVean, G. A. T., S. R. Myers, S. Hunt, P. Deloukas, D. R. Bentley *et al.*, 2004. The fine-scale structure of recombination rate variation in the human genome. Science 304**:** 581–584.
- Robinson, W. P., A. Cambon-Thomsen, N. Borot, W. Klitz and G. Thomson, 1991. Selection, hitchhiking and disequilibrium analysis at three linked loci with application to HLA data. Genetics 129**:** 931–948.
- Maynard Smith, J., and J. Haigh, 1974. The hitch-hiking effect of a favourable gene. Genet. Res. 23**:** 23–35.
- Stephan, W., and C. H. Langley, 1989. Molecular genetic variation in the centromeric region of the X chromosome in three *Drosophila ananassae* populations. I. Contrasts between the *vermilion* and *forked* loci. Genetics 121**:** 89–99.
- Stephan, W., T. H. E. Wiehe and M. W. Lenz, 1992. The effect of strongly selected substitutions on neutral polymorphism: analytical results based on diffusion theory. Theor. Popul. Biol. 41**:** 237–254.
- Thomson, G., 1977. The effect of a selected locus on linked neutral loci. Genetics 85**:** 753–788.
- Turelli, M., and N. H. Barton, 1990. Dynamics of polygenic characters under selection. Theor. Popul. Biol. 38**:** 1–57.
- Turelli, M., and N. H. Barton, 1994. Genetic and statistical analyses of strong selection on polygenic traits. What, me normal? Genetics 138**:** 913–941.
- Wright, S., 1931. Evolution in Mendelian populations. Genetics 16**:** 97–159.

**Genetics Society of America**
