NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Frank SA. Immunology and Evolution of Infectious Disease. Princeton (NJ): Princeton University Press; 2002.

Chapter 15. Measuring Selection with Population Samples

Experimental evolution provides insight into kinetic and mechanistic aspects of parasite escape from host immunity. Such experimental studies clarify selective forces that influence change at certain amino acid sites. But experimental studies provide only a hint of what actually occurs in natural populations, in which selective pressures and evolutionary dynamics differ significantly from those in controlled laboratory studies. It is important to combine experimental insights with analyses of variation in natural populations. In this chapter, I discuss how population samples of nucleotide sequences provide information about natural selection of antigenic variation.

I focus on themes directly related to the goal of this book—the synthesis between different kinds of biological analyses. In particular, I show how analysis of population samples complements studies of molecular structure and experimental evolution. Several books and articles review the methods to analyze population samples and the many different types of applications (Kimura 1983; Nei 1987; Nee et al. 1995; Hillis et al. 1996; Li 1997; Page and Holmes 1998; Crandall 1999; Hughes 1999; Nei and Kumar 2000; Otto 2000; Rodrigo and Learn 2000; Yang and Bielawski 2000; Bush 2001; Overbaugh and Bangham 2001).

The first section describes how different kinds of natural selection cause different patterns of nucleotide substitutions. Thus, the pattern of nucleotide substitutions observed in a population sample of sequences can sometimes be used to infer the kind of selection. The simplest pattern concerns the number of nucleotide changes that cause an amino acid substitution (nonsynonymous) relative to the number of nucleotide changes that do not cause an amino acid substitution (synonymous). If natural selection does not affect the relative success of amino acid variants, then nonsynonymous and synonymous nucleotide substitutions occur at the same rate. An excess of nonsynonymous substitutions suggests that natural selection favored those changes, providing evidence for positive selection of amino acid replacements.

The second section presents two examples of positive selection on parasite antigens. The surface antigen Tams1 of the protozoan Theileria annulata induces a strong antibody response in cattle, its primary host. A sample of nucleotide sequences showed that strong positive selection occurred in a few small regions of the Tams1 antigen, suggesting that those regions have been under strong selection for escape from host immunity. The group A streptococci cause sporadic epidemics of "strep throat." Streptococcal inhibitor of complement (Sic) is the most variable protein of these bacteria. In a sample of 892 nucleotide sequences, 77 of 86 nucleotide changes caused amino acid substitutions, a large excess of nonsynonymous substitutions. Very strong natural selection by host antibodies apparently drives rapid change in Sic.

The third section continues with more examples of positive selection on parasite antigens. These examples improve on earlier studies by estimating the rates of synonymous and nonsynonymous nucleotide changes for each individual amino acid. This is important because an epitope often requires only one or two amino acid changes to escape from binding by a specific antibody or T cell. Identification of particular amino acid sites under strong selection can confirm predictions for the location of epitopes based on structural data and experimental analysis of escape mutants. Positively selected sites can also suggest the location of new epitopes not found by other methods and provide clues about which amino acid variants should be included in multicomponent vaccines.

The fourth section turns to recent studies of influenza A that correlate amino acid changes at positively selected sites with the subsequent success of the lineage. This correlation between substitutions and fitness provides an opportunity to predict future evolution—new variants arising at positively selected sites are predicted to be the progenitors of future lineages. Yearly influenza A isolates from 1983 to 1997 provided sequences on which to test this prediction method retrospectively. In nine of eleven years, the changes at positively selected sites predicted which lineage would give rise to the future influenza population.

The final section highlights some topics for future research.

15.1. Kinds of Natural Selection

Different kinds of natural selection leave different patterns of nucleotide substitutions. These patterns can be observed in a sample of sequences isolated from a population, allowing one to infer the type of selection.

Synonymous and Nonsynonymous Nucleotide Substitutions

The genetic code maps three sequential nucleotides (a codon) to a single amino acid or to a stop signal. The four different nucleotides combine to make 4³ = 64 different codons. The 64 codons specify 20 different amino acids plus a stop signal, leading to an average of 64/21 ≈ 3 different codons for each amino acid or stop signal. This degenerate aspect of the code means that some nucleotide substitutions do not change the encoded amino acid or stop signal.
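The codon arithmetic above can be checked directly. The following is a minimal sketch that assumes the standard genetic code, written as the usual 64-letter string with codons ordered TTT, TTC, TTA, TTG, TCT, and so on; the names `BASES`, `AMINO_ACIDS`, and `CODON_TABLE` are illustrative choices, not taken from the text.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code; '*' marks a stop signal. Codons run TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# How many codons specify each amino acid (or the stop signal)?
codons_per_meaning = Counter(CODON_TABLE.values())

print(len(CODON_TABLE))         # 64 codons
print(len(codons_per_meaning))  # 21 meanings: 20 amino acids plus stop
print(len(CODON_TABLE) / len(codons_per_meaning))  # average codons per meaning
for meaning, n in sorted(codons_per_meaning.items(), key=lambda kv: -kv[1]):
    print(meaning, n)
```

The per-meaning counts show that the degeneracy is uneven: some amino acids have six codons, whereas methionine and tryptophan have one each.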

Nucleotide substitutions that do not cause an amino acid change are called synonymous; those that do change the encoded amino acid are called nonsynonymous. Synonymous substitutions do not affect the amino acid sequence and therefore should not be affected by natural selection of phenotype. By contrast, nonsynonymous substitutions can be affected by selection because they do change the encoded protein. If there is no selection on proteins, then the same forces of mutation and random sampling influence all nucleotide changes, causing the rate of nonsynonymous substitutions, dN, to equal the rate of synonymous substitutions, dS (Nei 1987; Li 1997; Page and Holmes 1998).
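The distinction between synonymous and nonsynonymous changes can be made concrete with a short sketch. It assumes the standard genetic code and treats every single-nucleotide change as equally likely; the function names are hypothetical. The second function counts the expected number of synonymous sites in a codon, in the spirit of the counting methods reviewed by Nei (1987), and is a simplification rather than a full implementation of any published method.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; '*' marks a stop signal.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_substitution(codon_before, codon_after):
    """Label a codon change as 'synonymous' or 'nonsynonymous'."""
    if codon_before == codon_after:
        raise ValueError("codons are identical; there is no substitution to classify")
    same_meaning = CODON_TABLE[codon_before] == CODON_TABLE[codon_after]
    return "synonymous" if same_meaning else "nonsynonymous"


def synonymous_sites(codon):
    """Expected number of synonymous sites in a codon (between 0 and 3).

    For each position, take the fraction of the three possible
    single-nucleotide changes that leave the encoded meaning unchanged.
    Assumption: all changes are equally likely.
    """
    total = 0.0
    for pos in range(3):
        unchanged = 0
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                unchanged += 1
        total += unchanged / 3
    return total


print(classify_substitution("CTT", "CTC"))  # leucine to leucine
print(classify_substitution("GAA", "GTA"))  # glutamate to valine
print(synonymous_sites("CTA"))              # third position plus part of the first
```

Because most codons have far fewer synonymous than nonsynonymous sites, rates such as dS and dN are expressed per site of each kind rather than as raw counts.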

Positive and Negative Selection

When natural selection favors change in amino acids, the nonsynonymous substitution rate dN rises. Thus, dN > dS measured in a sample of sequences implies that natural selection has favored evolutionary change. This contribution of selection to the rate of amino acid change above the background measured by dS is called positive selection. Parasite epitopes often show signs of positive selection as they change to escape recognition by host immunity (Yang and Bielawski 2000).

By contrast, negative selection removes amino acid changes, preserving the amino acid sequence against the spread of mutations. Negative selection reduces the nonsynonymous substitution rate, causing dN < dS.

The great majority of sequences show negative selection, suggesting that most amino acid replacements are deleterious and are removed by natural selection. In cases where positive selection does occur, the nonsynonymous replacements often cluster on protein surfaces involved in some sort of specific recognition. In these positively selected proteins, amino acid sites structurally hidden from external recognition often show the typical signs of negative selection (see references in the introduction to this chapter).
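As a rough illustration of how dN and dS are compared, the sketch below converts observed proportions of differences into rates with the Jukes-Cantor correction and then labels the result. The correction is one common choice and is an assumption here, not something the text prescribes; the input counts are invented for illustration, and the function names are hypothetical.

```python
import math


def jukes_cantor(p):
    """Convert a proportion of differing sites into substitutions per site."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must lie in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (dN, dS) from counts of differences and of sites of each kind."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    return dn, ds


def kind_of_selection(dn, ds):
    """Describe the direction of the point estimates (no significance test)."""
    if dn > ds:
        return "positive (dN > dS)"
    if dn < ds:
        return "negative (dN < dS)"
    return "no difference (dN = dS)"


# Invented counts: 20 nonsynonymous differences over 700 nonsynonymous sites,
# 30 synonymous differences over 200 synonymous sites.
dn, ds = dn_ds(20, 700, 30, 200)
print(round(dn, 4), round(ds, 4), kind_of_selection(dn, ds))
```

Point estimates alone do not establish selection; a real analysis also asks whether the difference between dN and dS is statistically significant.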

Frequency-Dependent Selection

A rare antigenic variant has an advantage because it avoids immune memory in hosts induced by more common variants. This is an example in which the success of an allele depends on its frequency, a kind of frequency-dependent selection (Conway 1997).

Selection favoring rare types can cause two different patterns of evolutionary change. First, transient polymorphisms may arise, in which novel variants increase when rare and eventually dominate the population, driving out the previous variants. This reduces genetic variation at all nucleotide sites linked to the favored substitution.

Second, balanced polymorphisms may occur, in which rare variants increase but then are held in check as they rise in frequency. This protects genetic variants from extinction because they rise when rare but decline when common. Nucleotide sites linked to those sites under selection also enjoy protection against extinction because they receive a selective boost whenever they become rare. This increases genetic variation at all nucleotide sites linked to the site under selection. Thus, transient polymorphisms decrease genetic variation in sequences linked to a favored site, and balanced polymorphisms increase genetic variation in sequences linked to a favored site.
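The balanced case can be illustrated with a toy recursion. This is a minimal sketch under an assumed model that does not come from the text: two variants, each losing fitness in proportion to its own frequency, with strength `s`. It shows only that a variant rises when rare, declines when common, and settles at an intermediate frequency.

```python
def next_frequency(p, s):
    """One generation of selection on variant A, currently at frequency p.

    Assumed fitnesses: w_A = 1 - s*p and w_B = 1 - s*(1 - p), so each
    variant does worse the more common it is.
    """
    w_a = 1 - s * p
    w_b = 1 - s * (1 - p)
    mean_w = p * w_a + (1 - p) * w_b
    return p * w_a / mean_w


def trajectory(p0, s, generations):
    """Frequencies of variant A over time, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs


rare_start = trajectory(0.01, 0.5, 200)
common_start = trajectory(0.99, 0.5, 200)
print(rare_start[:5])     # a rare variant increases
print(common_start[:5])   # a common variant decreases
print(rare_start[-1], common_start[-1])  # both approach the same balance point
```

A transient polymorphism would need a different model, for example one in which immune memory against the old variant persists while the new variant sweeps through the population.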

15.2. Positive Selection to Avoid Host Recognition

Many examples of positive selection come from genes involved in host-parasite recognition (Endo et al. 1996; Hughes 1999; Yang and Bielawski 2000). These sequence analyses provide information about how selection has shaped the structure and function of proteins. For example, one may combine analysis of positive selection with structural data to determine which sites are exposed to antibody pressure. In the absence of structural data, sequences can be used to predict which sites are structurally exposed and can change and which sites are either not exposed or functionally constrained. I briefly summarize a few cases in this section.

Theileria annulata

The tick-borne protozoan Theileria annulata causes disease in cattle (Gubbels et al. 2000). The surface antigen Tams1 induces a strong antibody response and has been considered a candidate for developing a vaccine. However, Tams1 varies antigenically; thus studies have focused on the molecular nature of the variability to gain further insight. The structure and function of Tams1 have not been determined.

Recently, Gubbels et al. (2000) analyzed a population sample of nucleotide sequences to predict which domains of Tams1 change in response to host immunity and which domains do not vary because of structural or functional constraints. They found seven domains with elevated rates of nonsynonymous substitutions compared with synonymous substitutions (fig. 15.1), suggesting that these regions may be exposed to antibody pressure. Some domains had relatively little nonsynonymous change, indicating that structural or functional constraints preserve amino acid sequence. These inferences provide guidance in vaccine design and point to testable hypotheses about antigenicity and structure.

Figure 15.1. The seven peaks identify the major regions of positive selection in the Tams1 protein. The eighteen sequences analyzed in this figure have about 870 nucleotides. The analysis focused on a sliding window (Endo et al. 1996) of 60 nucleotides (20 amino acids).

Group A Streptococci

Group A streptococci (GAS) infect the upper respiratory tract of humans, causing "strep throat." GAS epidemics develop quickly and typically last one to three years (Martin and Single 1993; Muotiala et al. 1997). Streptococcal inhibitor of complement (Sic) is the most variable protein of GAS (Hoe et al. 1999). This extracellular protein interferes with the host's complement system of immunity, a key defense against invading bacteria.

Hoe et al. (1999) sequenced the sic gene in 892 GAS isolates. These sequences had insertions, deletions, and nonsynonymous substitutions that encode 158 variant Sic proteins. Of the single nucleotide changes, 77 of 86 caused amino acid substitutions (nonsynonymous), demonstrating strong positive selection.
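To see why 77 of 86 counts as an excess, one can compare it with a crude neutral baseline. The sketch below is not the analysis of Hoe et al. (1999). It assumes the standard genetic code, equal codon usage, equal rates for all single-nucleotide changes, and independent changes, and it ignores changes that create a stop signal. Under those assumptions it computes the fraction of single-nucleotide changes that alter the amino acid, then the binomial probability of seeing at least 77 nonsynonymous changes among 86.

```python
from itertools import product
from math import comb

BASES = "TCAG"
# Standard genetic code; '*' marks a stop signal.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def one_step_changes():
    """Count synonymous and nonsynonymous single-nucleotide changes between sense codons."""
    syn = nonsyn = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant_aa = CODON_TABLE[codon[:pos] + base + codon[pos + 1:]]
                if mutant_aa == "*":
                    continue  # ignore changes that create a stop signal
                if mutant_aa == aa:
                    syn += 1
                else:
                    nonsyn += 1
    return syn, nonsyn


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


syn, nonsyn = one_step_changes()
baseline = nonsyn / (syn + nonsyn)   # expected nonsynonymous fraction under the assumptions
observed = 77 / 86                   # counts reported in the text
tail = binomial_upper_tail(77, 86, baseline)
print(syn, nonsyn, round(baseline, 3), round(observed, 3), tail)
```

The baseline depends on the actual codon composition and mutation spectrum of the sic gene, so the tail probability should be read as an order-of-magnitude illustration only.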

Figure 15.2 shows the phylogenetic relationship between sequences from Ontario, Canada, in 1996. Most variants could be linked to each other by a small number of changes, as shown in the figure. The starlike shape of the phylogeny suggests that the isolates diverged rapidly from a common ancestor during the course of the local epidemic. This rapid divergence implies very strong selection for change, most likely caused by escape from host antibodies (Hoe et al. 1999).

Figure 15.2. Phylogeny for Group A streptococcal sic alleles from a local epidemic in Ontario, Canada. Each tip corresponds to one isolate. The numbers on each branch indicate the number of molecular differences between each node. From Hoe et al. (1999), with permission ...

15.3. Phylogenetic Analysis of Nucleotide Substitutions

Initial studies of selection often used small numbers of sequences, typically fewer than one hundred. Small sample sizes required aggregating observations across all nucleotide sites to gain sufficient statistical power. Conclusions focused on whether selection was positive, negative, or neutral when averaged over all sites. With slightly larger samples, one could do a sliding window analysis as in figure 15.1 to infer the kind of selection averaged over sets of amino acids that occur contiguously in the linear sequence (Endo et al. 1996).
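A sliding window analysis of this kind can be sketched as follows. The sketch assumes that counts of nonsynonymous and synonymous changes are already available for each codon; the input lists are invented, the function name is hypothetical, and the default window of 20 codons simply mirrors the window described for figure 15.1.

```python
def sliding_window_totals(nonsyn_counts, syn_counts, window=20):
    """Sum per-codon substitution counts over every run of `window` adjacent codons.

    Returns a list of (start_index, nonsynonymous_total, synonymous_total).
    """
    if len(nonsyn_counts) != len(syn_counts):
        raise ValueError("both count lists must cover the same codons")
    if window < 1 or window > len(nonsyn_counts):
        raise ValueError("window must be between 1 and the number of codons")
    n_total = sum(nonsyn_counts[:window])
    s_total = sum(syn_counts[:window])
    results = [(0, n_total, s_total)]
    for start in range(1, len(nonsyn_counts) - window + 1):
        # Slide by one codon: add the codon entering the window, drop the one leaving.
        n_total += nonsyn_counts[start + window - 1] - nonsyn_counts[start - 1]
        s_total += syn_counts[start + window - 1] - syn_counts[start - 1]
        results.append((start, n_total, s_total))
    return results


# Invented counts for six codons, scanned with a window of two codons.
nonsyn = [0, 0, 3, 4, 0, 0]
syn = [1, 1, 0, 0, 1, 1]
for start, n, s in sliding_window_totals(nonsyn, syn, window=2):
    print(start, n, s)
```

These are raw counts. A real analysis would divide each total by the number of nonsynonymous or synonymous sites in the window before comparing them, and the peaks would still average over every codon in the window.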

We have seen throughout this book that major changes in binding and antigenicity often require only one or a few amino acid changes. The analytical methods that aggregate over whole sequences or sliding windows often fail to detect selection at the scale of single-site substitutions, which appears to be the proper scale for understanding antigenic evolution.

Recently, larger samples of sequences have provided the opportunity to study the rates of synonymous and nonsynonymous substitutions at individual nucleotide sites. Each individual substitution occurs within a lineal history of descent, that is, a change occurs between parent and offspring. To study each substitution directly, one must first arrange a sample of sequences into lineal relationships by building a phylogenetic tree. From the tree, one can infer the nucleotide sequence of ancestors, and therefore trace the history of each nucleotide change through time.

Each nucleotide change can be classified as synonymous or nonsynonymous. For each amino acid site, one can sum up the numbers of synonymous and nonsynonymous nucleotide changes across the entire phylogeny and derive the associated rates of change. With appropriate statistics, one determines for each amino acid site whether nonsynonymous changes occur significantly more or less often than synonymous changes (Hasegawa et al. 1993; Wakeley 1993; Bush et al. 1999; Meyer et al. 1999; Suzuki and Gojobori 1999; Yang and Bielawski 2000; Bush 2001).
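The per-site tally can be sketched in a few lines. The sketch assumes that a tree has already been built and ancestral sequences inferred, so that each branch is available as a (parent, child) pair of aligned coding sequences; the toy sequences and the function name are invented. Codons that differ at more than one nucleotide along a branch are skipped, because the order of the changes is ambiguous, which is a simplification relative to the published methods.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; '*' marks a stop signal.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def per_site_counts(branches):
    """Tally [synonymous, nonsynonymous] changes at each codon site over all branches."""
    branches = list(branches)
    if not branches:
        return []
    n_codons = len(branches[0][0]) // 3
    counts = [[0, 0] for _ in range(n_codons)]
    for parent, child in branches:
        if len(parent) != len(child) or len(parent) != 3 * n_codons:
            raise ValueError("sequences must be aligned and of equal length")
        for site in range(n_codons):
            before = parent[3 * site:3 * site + 3]
            after = child[3 * site:3 * site + 3]
            n_diffs = sum(a != b for a, b in zip(before, after))
            if n_diffs != 1:
                continue  # unchanged, or a multi-nucleotide change with an ambiguous path
            if CODON_TABLE[before] == CODON_TABLE[after]:
                counts[site][0] += 1
            else:
                counts[site][1] += 1
    return counts


# Toy tree: a root, two children, and one grandchild.
root = "ATGGAACTT"
child_1 = "ATGGTACTT"     # second codon changes GAA to GTA
child_2 = "ATGGAACTC"     # third codon changes CTT to CTC
grandchild = "ATGGCACTT"  # second codon changes again, GTA to GCA
print(per_site_counts([(root, child_1), (root, child_2), (child_1, grandchild)]))
```

In the toy tree, the second codon accumulates two nonsynonymous changes on successive branches, the kind of repeated change at one site that aggregate or windowed analyses tend to dilute.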

The concepts of measuring positive and negative selection remain the same. However, for the first time, the statistical power has been raised to the point where analysis of population samples provides significant insight into the evolution of antigens. The power derives from studying the relative success of alternate amino acids at a single site. Important selective forces include the amino acids at other sites as well as binding properties to host immune molecules and other host receptors.

HIV


Yamaguchi-Kabata and Gojobori (2000) analyzed selection on individual amino acid sites in gp120, the major exposed glycoprotein on the HIV-1 envelope. gp120 contains the primary host-cell receptor that binds to the host's CD4 molecules on the surfaces of various immune cells. gp120 also has the secondary host-cell receptor that binds either the host's CCR5 or CXCR4 molecules—the viral binding specificity for these second receptors determines the kinds of host immune cells infected by the virus. gp120 carries major antibody epitopes as well as CTL epitopes.

Yamaguchi-Kabata and Gojobori (2000) studied amino acid variations at 422 sites in 186 sequences of HIV-1 subtype B. Significant positive selection occurred at 33 sites, and significant negative selection occurred at 63 sites. As with most proteins, negative selection or no apparent selection dominated over the whole sequence, with positive selection limited to a minority of sites.

Previous work had split the linear amino acid sequence into five variable and five constant domains based on the inferred tendency for genetic variation in each region (Modrow et al. 1987). The variable domains mostly occur in exposed loops, whereas the constant regions mostly occur in a core that may be partly protected.

Yamaguchi-Kabata and Gojobori (2000) found that, when analyzing selection on individual amino acids, sites in the variable domains did have a relatively greater tendency to be positively rather than negatively selected. By contrast, those sites in the constant domains had a relatively greater tendency to be negatively rather than positively selected. However, many positively selected sites occurred in the constant domains.

Yamaguchi-Kabata and Gojobori (2000) focused on individual sites with regard to location on the three-dimensional structure and in relation to potential selective pressures. For example, fifteen of the thirty-three positively selected sites clustered on the face of the gp120 core opposite the CD4 binding site. Seven of these sites occurred in positions 335–347, which form an α-helix that is alternately exposed on the surface and hidden in the core. The positively selected sites occurred at exposed positions, whereas three of the interior sites were highly conserved although they lacked elevated ratios of synonymous to nonsynonymous substitutions.

The other positively selected sites in this region also occurred on the exposed surface near the 335–347 α-helix. These other sites had dispersed sequence locations ranging from positions 291 to 446 that are brought together in the three-dimensional structure. Yamaguchi-Kabata and Gojobori (2000) propose that this cluster of fifteen positively selected sites may form discontinuous epitopes. Previously, this partially recessed region was not considered a key location for antibody binding.

Foot-and-Mouth Disease Virus

Haydon et al. (2001) analyzed selection on individual amino acid sites of foot-and-mouth disease virus. Most sites showed mild to strong negative selection, as usually occurs. At seventeen sites they found evidence of significant positive selection. Twelve of these positively selected sites occurred at positions that had previously been observed to develop escape mutants in experimental evolution studies that imposed pressure by monoclonal antibodies. The other five sites indicate candidates for further experimental analysis.

Haydon et al.'s (2001) study of natural isolates gives further evidence that a small number of amino acid sites determines a large fraction of antigenic evolution to escape antibody recognition. The combination of analyses on structure, experimental evolution, and natural variation provides an opportunity to study how complex evolutionary forces together determine the evolutionary dynamics of particular amino acids.

15.4. Predicting Evolution

The studies on positive selection in the previous section could not correlate amino acid substitutions with the actual success of the viruses. In each case, selection was inferred strictly from the patterns of nucleotide substitutions in a sample of sequences.

Bush et al.'s (1999) study of influenza takes the next step by associating particular amino acid substitutions with the success or failure of descendants that carry the substitutions. Influenza allows such studies because sequences have been collected each year over the past several decades, providing a history of which substitutions have led to success over time.

The influenza data can be used to predict future evolution by two steps. First, previous patterns of substitutions and the successes of associated lineages suggest which amino acid sites contain variants that enhance fitness. Second, new variants arising at those key sites are predicted to be the progenitors of future lineages.

The Shape of Phylogenies

Predicting evolution based on amino acid substitutions requires a correlation between substitutions and the success of lineages. Many parasites do not have such broad-scale correlations. For example, figure 15.2 shows a star-shaped phylogeny for streptococcal divergence. This kind of phylogeny retains multiple, diverging lineages along several branches. Although selection may guide the relative success of different substitutions within a lineage, the lineages along different branches apparently do not compete. Thus, one cannot use particular substitutions to predict which lineages will eventually dominate the future population.

HIV-1 also has multiple diverging lineages that create star-shaped phylogenies (fig. 15.3). This makes sense because HIV-1 currently forms an expanding population with little competition between lineages. Many different lineages continue to spread to naive hosts that have no prior immune memory of infection. Thus, at the population level, immune pressure does not favor one lineage over another by amino acid substitutions that escape widely dispersed immune memory in hosts.

Figure 15.3. The phylogeny of the HIV-1 subtypes based on the env gene, which includes the coding for the gp120 protein. The letters name the different subtypes. The bar shows the length along branches corresponding to 10% divergence in sequence. From McCutchan (1999), ...

In the HIV-1 phylogeny, the different subtypes coalesce to a common ancestor that probably occurred near the origin of the HIV-1 epidemic in humans. Various studies estimate that the ancestor occurred during the first half of the twentieth century (Korber et al. 2000; Yusim et al. 2001).

Comparison of HIV-1 subtypes may not be the appropriate scale at which to study the correlation between amino acid substitutions and fitness. The subtypes are to some extent separated geographically and may not compete directly. Even within regions, HIV-1 continues to spread to naive hosts, so escape from immune memory at a few key antibody epitopes would not dominate the relative success of lineages. It would be interesting to see the shapes of HIV-1 phylogenies based on samples collected over several years from a single region.

Figure 15.4 shows the kinds of phylogenetic shapes that may occur. Only the extreme case in figure 15.4d provides enough differential success (fitness) between lineages to correlate amino acid substitutions with fitness. In the other shapes, the signal of differential success would usually not be strong enough to associate particular substitutions with the survival of a lineage. However, the dominance of a single lineage as in figure 15.4d does not guarantee an association between success and any particular characteristic of the parasite. Powerful epidemics that start from just a few individuals also give rise to skewed phylogenetic trees, but the progenitors of those epidemics may simply have been lucky and may show no tendency to carry particular traits.

Figure 15.4. Differences in success between lineages in a phylogeny influence the shape of the tree. All trees shown with their ancestral node on the left. Time increases from left to right. (a) Shape when all lineages survive. This corresponds to a star phylogeny ...

Influenza


Influenza A phylogenies have just the sort of shape that could allow correlation between particular substitutions and fitness. The trees in figure 15.5 show a single successful lineage continuing through time, with many branches diverging and dying off over short periods of time.

Figure 15.5. Phylogeny of influenza A hemagglutinin HA1 domain. The tree on the left shows evolutionary relationships between isolates from subtype H3 from 1983 to 1994. The horizontal axis measures the number of nucleotide substitutions between isolates, which correlates ...

Bush et al. (1999) used trees such as those in figure 15.5 to analyze amino acid substitutions in the hemagglutinin HA1 domain. They assigned each variable amino acid site to zero or more of four different sets: 18 sites were positively selected with dN significantly greater than dS, 16 sites were associated with the receptor binding site of the HA1 surface, 20 sites evolved relatively faster than the other sites, and 41 sites were in or near the well-known antibody epitope domains A and B.

Suppose amino acid changes in one of the four sets consistently correlated with the ultimate success of a lineage. Then, at any time, one could predict which of the currently circulating isolates would be most closely related to the progenitor of future lineages. In particular, those lineages with the most amino acids that had recently changed at the key sites would be most likely to succeed. In influenza, success probably occurs by escaping the host's immunological antibody memory caused by recent epidemics.

Variant sites near key antibody epitopes would be good candidates to produce antibody escape. However, Bush et al. (1999) found that the variant amino acids at positively selected sites provided the best information about future success. In other words, those sites with amino acid replacements favored by selection in the past also provided the best information about which amino acid changes would lead to success in the future.

Bush et al. (1999) did not truly predict future evolution. Instead, they used data from 1983 to 1997 to form eleven retrospective tests. A retrospective test analyzed data from 1983 to year x and predicted subsequent evolution in the years following x, where x varied between 1986 and 1997.

Figure 15.5 shows the structure of one retrospective comparison. The left tree contains data from 1983 to 1994. The bold line along the left marks the single dominant "trunk" lineage. At the question mark, just before 1994, the data can no longer resolve the trunk lineage because several variants cocirculated at that time and the trunk can be resolved only after one knows which of those lineages succeeded.

The filled circles show four isolates from 1994 that represented the four classifications for variable amino acids. Shd5 (A/Shangdong/5/94) represented the lineage with the greatest number of recent amino acid changes at sites that had been positively selected in the past, as inferred from the 1983–1997 data. The Har3 (A/Harbin/3/94) lineage had variant amino acids near the receptor binding site. The Sant (A/Santiago/7198/94) lineage had variant amino acids at those sites that had evolved rapidly in the past. The NY15 (A/New York/15/94) lineage had variant sites in or near antibody epitopes A and B.

The right tree includes additional data from 1994 to 1997. Those data show which of the 1994 lineages succeeded and which died out. Successful prediction means choosing the isolate closest on the tree (most alike genetically) with the lineage that continues along the trunk and gives rise to the future population. It turned out that Shd5 was closest to the successful trunk lineage among the candidates. In other words, the most changes in previously positively selected sites predicted which lineage succeeded in subsequent years.

Bush et al. (1999) reported a systematic analysis of retrospective tests in eleven years. In nine of those eleven years, the lineage that contained the most changes relative to its ancestor at the eighteen positively selected sites identified the section of the tree from which the future trunk emerged. The sites in the antibody epitopes only identified seven of eleven trunk lineages, and the other amino acid sets did worse. Thus, positive selection provided the best signal for which amino acid changes correlated most closely with fitness.
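The prediction rule can be sketched in simplified form. Bush et al. (1999) counted changes along branches of a phylogenetic tree; the sketch below instead compares each isolate directly with a single ancestral amino acid sequence. The sequences, isolate names, site positions, and function names are all invented for illustration.

```python
def changes_at_sites(ancestor, isolate, sites):
    """Number of the listed amino acid sites at which the isolate differs from the ancestor."""
    return sum(ancestor[i] != isolate[i] for i in sites)


def predict_progenitor(ancestor, isolates, selected_sites):
    """Name of the isolate with the most changes at the selected sites.

    Ties are broken arbitrarily, in favor of the first isolate listed.
    """
    return max(
        isolates,
        key=lambda name: changes_at_sites(ancestor, isolates[name], selected_sites),
    )


ancestor = "MKTIIALSYI"
isolates = {
    "isolate_a": "MKNIIVLSYI",  # changes at positions 2 and 5
    "isolate_b": "LRTIIALSYV",  # changes at positions 0, 1, and 9
    "isolate_c": "MKTIIALSFI",  # change at position 8
}
selected_sites = [2, 5, 8]  # hypothetical positively selected positions (0-based)

print(predict_progenitor(ancestor, isolates, selected_sites))
```

In this toy data the isolate with the most changes overall is not the one chosen, because none of its changes fall at the selected sites. Swapping in a different set of positions, such as sites near an epitope, gives the corresponding rule for the other site classifications.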

Foot-and-Mouth Disease Virus

Feigelstock et al. (1996) sequenced foot-and-mouth disease virus isolates from a 1993–94 epidemic in Argentina. The epidemic strains contained amino acid replacements at a small number of sites that had previously been identified as crucial for escape from monoclonal antibodies. Feigelstock et al. (1996) suggested a prediction method similar to the one used by Bush et al. (1999): identify those few key amino acid sites that correlate with fitness, then predict that lineages with changes at those sites will be likely candidates to spread in the future.

Feigelstock et al. (1996) chose sites by previous reports of escape from monoclonal antibodies in experimental evolution. Bush et al. (1999) chose sites by analysis of positive selection from population samples. It would be interesting to compare these two methods in a single study of the same evolving parasite population.

15.5. Problems for Future Research

1. Episodic selection

Bush et al. (1999) found eighteen amino acid sites under positive selection in subtype H3 of influenza A. Substitutions at these positively selected sites correlated with the future success of lineages during the years of sampling, 1983–1997. In the future, will these eighteen sites continue to be the primary target of selection?

On the one hand, the eighteen sites may indeed be the most important for escape from protective antibodies. If so, future samples will continue to find positive selection focused on these sites. On the other hand, different sites may dominate in the future, with little future selective change in the currently positively selected sites. A changing focus of selection may arise from evolving structural features of the viral surface that expose or hide different sites or from a changed distribution in the immune memory profiles of hosts.

If episodic selection frequently occurs, then the time scale over which one studies substitution patterns plays a critical role in inference. Simply measuring aggregate rates of synonymous and nonsynonymous substitutions may turn out to be a rather crude tool that misses a large proportion of the changes brought about by natural selection. As more data accumulate, it will become important to match statistical methods with explicit hypotheses about the biological processes of selection and the temporal scale over which selection varies.

2. Kinds of selection detectable from standard analyses of population samples

Influenza has certain characters that make it a particularly good model for simple analysis of positive selection. Epidemic strains often have wide distribution; thus, there is relatively less spatial variation in the exposure of hosts to different strains than for many other parasites. The wide and relatively uniform distribution of epidemics creates relatively uniform selective pressure on the virus. In addition, infections do not persist within hosts, so most selective pressure on the surface hemagglutinin glycoprotein arises by escape from antibody recognition during transmission between hosts. The uniformity of selective pressure means that aggregate samples can provide clear signals.

By contrast, other parasites may face multiple selective pressures that vary over relatively small spatial and temporal scales. For example, Rouzine and Coffin (1999) analyzed 213 pro sequences of HIV-1 from eleven infected individuals. This sampling scheme allowed them to analyze the different patterns of selection within hosts and between hosts. This may be particularly important in HIV, which causes long, persistent infections within hosts. HIV probably faces relatively little pressure from immune memory during transmission between hosts, but does experience different MHC genotypes between hosts and different selective pressures on T cell epitopes.

Rouzine and Coffin (1999) found evidence only for negative selection within hosts. They propose various models of selection within and between hosts that could be tested by further sampling and analysis. The point here is that a simple aggregation of sequences over the entire population may not be informative given the different kinds of selection that act over various temporal and spatial scales.

3. Sampling methods to collect sequences

I mentioned in the Problems for Future Research section of chapter 11 that most population samples have been collected for reasons other than phylogenetic analysis. For example, each year epidemic surveillance teams collect thousands of influenza isolates from across the world. Sequencing labs choose only a small fraction of the isolates for analysis. They typically use antigenic screening to pick isolates that differ significantly from the common, recently circulating strains. This biased sampling supports vaccine design but may affect analyses of selection and other population-level processes. Recent calls for wider and better-designed sampling should lead to great opportunities for population studies (Layne et al. 2001).

Nonlinear processes of transmission and stochastic effects of small effective population sizes in epidemics strongly influence the patterns of evolutionary change. Random sampling may not be the best design for studying the population consequences of nonlinear transmission and stochastic fluctuations. New theoretical work on sampling and inference would help to guide the advanced screening and analysis technologies that will be put in place in the coming years.

4. Selection on archival variants

Several parasites such as Trypanosoma brucei and Borrelia hermsii store archival libraries of antigenic variants. They express only one variant at a time. Strong positive selection probably favored diversification of the archival variants during the initial evolution of antigenic switching. However, once a genome contains a large library of diverged variants, negative selection may act primarily to retain the existing antigenic differences between the variants.

I found only one analysis of archival variants. Rich et al. (2001) studied one sequence from each of eleven different loci that contain antigenic variants of the variable small protein (Vsp) of Borrelia hermsii. This sample showed significant recombination between the loci, suggesting that divergence between antigenic variants may arise by intragenomic mixing of protein domains. Their sampling did not provide multiple alleles at individual loci, so they did not report on the selective pressures recently acting on each individual locus. An extended study that analyzed variation within and between loci would be interesting.

5. Inferring selection from the spatial distribution of allele frequencies

Rare antigenic variants often have an advantage because they encounter specific immune memory less often than common antigens. Conway (1997) suggested that this rare-type advantage promotes a balanced distribution of allele frequencies among antigenic variants. By this theory, such balancing selection reduces the fluctuations in allele frequencies when compared with loci experiencing little or no selection. The neutral loci would have allele frequencies drifting over time and space, whereas the balanced antigenic loci would face a continual pressure to raise any allele frequency that temporarily dropped to a low level.
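
The stabilizing effect of a rare-type advantage can be sketched with a deterministic recursion. This is a minimal illustration under strong assumptions that are not taken from Conway (1997): the fitness of each allele declines linearly with its own frequency, the selection coefficient `s` is arbitrary, and host immunity does not change in response to the parasite.

```python
def next_generation(freqs, s):
    """One round of selection in which fitness falls linearly with an allele's own frequency."""
    fitness = [1.0 - s * p for p in freqs]
    mean_fitness = sum(p * w for p, w in zip(freqs, fitness))
    return [p * w / mean_fitness for p, w in zip(freqs, fitness)]


def iterate(freqs, s, generations):
    """Apply the selection step repeatedly and return the final frequencies."""
    for _ in range(generations):
        freqs = next_generation(freqs, s)
    return freqs


# Hypothetical starting point: one common allele and two rare ones.
start = [0.90, 0.09, 0.01]
end = iterate(start, s=0.1, generations=500)
print([round(p, 3) for p in end])
```

Under these assumptions the rare alleles rise and the common allele falls until the three frequencies are nearly equal. The sketch omits drift and any feedback from host memory, both of which could change the outcome.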

Conway (1997) suggested that one could infer which loci experienced strong immune selection by examining the spatial distribution of allele frequencies. Balancing selection may cause immune-selected loci to have a more even, less variable distribution of allele frequencies across space than other loci.

Conway et al. (2000) tested this idea by examining the spatial distribution of allele frequencies for Msp1, a dominant surface protein of Plasmodium falciparum. They divided the long (5 kb) msp1 gene into domains and measured the allele frequencies for each domain over six African and two Southeast Asian populations. Recombination occurs frequently within the gene, causing low linkage disequilibrium between domains. One domain, block 2, had very even distributions of its three allelic types over the different populations within each continent. The other domains all had significant variations in allele frequency over the populations. Conway et al. (2000) also showed that hosts with IgG antibodies against block 2 enjoyed some protection against malaria. From these data, Conway et al. (2000) concluded that block 2 is an important antigenic site.
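
One standard way to summarize how much allele frequencies vary among populations is Wright's F_ST. The sketch below uses it as an illustration; it is not necessarily the exact procedure of Conway et al. (2000), and the allele frequencies are invented rather than taken from their data. Populations are weighted equally, which is a simplifying assumption.

```python
def heterozygosity(freqs):
    """Expected heterozygosity, 1 - sum(p_i^2), for one set of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def fst(populations):
    """Wright's F_ST = (H_T - H_S) / H_T, with equal weight for each population."""
    n_pops = len(populations)
    n_alleles = len(populations[0])
    mean_freqs = [sum(pop[i] for pop in populations) / n_pops for i in range(n_alleles)]
    h_total = heterozygosity(mean_freqs)                             # pooled diversity
    h_within = sum(heterozygosity(pop) for pop in populations) / n_pops  # mean local diversity
    return (h_total - h_within) / h_total


# Hypothetical frequencies of three allelic types in three populations.
even_domain = [[0.50, 0.30, 0.20], [0.50, 0.30, 0.20], [0.48, 0.32, 0.20]]
variable_domain = [[0.80, 0.10, 0.10], [0.20, 0.60, 0.20], [0.10, 0.20, 0.70]]

print("even domain F_ST:", round(fst(even_domain), 4))
print("variable domain F_ST:", round(fst(variable_domain), 4))
```

A domain whose allelic types have similar frequencies everywhere gives a value near zero, whereas a domain whose frequencies differ among populations gives a larger value. Comparing such values across domains of one gene is the kind of contrast described above.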

Conway et al.'s (2000) spatial analysis of allele frequencies provides an interesting approach to identifying key antigenic sites. However, the theoretical prediction of relatively stable allele frequencies over space requires further study.

Frequency dependence with an advantage for rare types commonly occurs in models of host-parasite interactions (Anderson and May 1991). In the typical model, frequency dependence causes strong fluctuations in allele frequencies rather than stable allele frequencies. The fluctuations arise because of feedbacks between host and parasite types. A rare parasite type, x, increases because most hosts do not recognize the rare type. As x increases in frequency, this favors an increase in the frequency of the hosts that recognize x, causing in turn a decline in the frequency of x. The decline in x favors a loss of host recognition for x. Low frequencies of x and of host recognition start the cycle again.
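
The cycle described above can be sketched with a two-type recursion. This is a minimal illustration, not a model from Anderson and May (1991): it assumes two parasite types (x and an alternative), two host types (those that recognize x and those that recognize the alternative), and arbitrary costs `s` and `t`. All names and parameter values are assumptions.

```python
def step(p, h, s, t):
    """p: frequency of parasite type x; h: frequency of hosts that recognize x."""
    # Parasite x fails against hosts that recognize it; the alternative fails against the rest.
    w_x = 1.0 - s * h
    w_alt = 1.0 - s * (1.0 - h)
    # Hosts that recognize x are harmed by the alternative type; other hosts are harmed by x.
    v_rec = 1.0 - t * (1.0 - p)
    v_other = 1.0 - t * p
    p_next = p * w_x / (p * w_x + (1.0 - p) * w_alt)
    h_next = h * v_rec / (h * v_rec + (1.0 - h) * v_other)
    return p_next, h_next


def simulate(p, h, s=0.2, t=0.2, generations=400):
    """Return the list of (p, h) pairs over the run, including the starting point."""
    trajectory = [(p, h)]
    for _ in range(generations):
        p, h = step(p, h, s, t)
        trajectory.append((p, h))
    return trajectory


def count_crossings(values, level=0.5):
    """Number of times a series passes from one side of `level` to the other."""
    return sum(1 for a, b in zip(values, values[1:]) if (a - level) * (b - level) < 0)


trajectory = simulate(p=0.1, h=0.1)
parasite = [p for p, _ in trajectory]
print("times x crossed 50%:", count_crossings(parasite))
print("range of x:", round(min(parasite), 3), round(max(parasite), 3))
```

In this run the frequency of x swings repeatedly between low and high values instead of settling at an intermediate level. The only difference from a static rare-type advantage is that host recognition responds to the parasite frequencies.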

Conway (1997) suggested that frequency dependence stabilizes allele frequencies rather than enhancing their fluctuations. This may be true for the particular dynamics that follow from Plasmodium demography and the time course of host immune memory. However, this should be studied with theoretical models that analyze fluctuations over space in antigenic allele frequencies and host memory profiles.

Copyright © 2002, Steven A Frank.
Bookshelf ID: NBK2379

