NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Frank SA. Immunology and Evolution of Infectious Disease. Princeton (NJ): Princeton University Press; 2002.


Immunology and Evolution of Infectious Disease.


Chapter 10. Genetic Structure of Parasite Populations

Variant alleles may be grouped together to form discrete parasite strains. For example, some parasites may be of type A₁/B₁ or A₂/B₂ at two distinct epitopes, with intermediates A₁/B₂ and A₂/B₁ rare or absent. In this chapter, I consider the processes that group together variants.

The first section reviews different kinds of genetic structure. The example above describes linkage disequilibrium between antigenic loci, a pattern that may arise from host immune selection disfavoring the intermediate forms. Alternatively, allelic variants across the entire genome may be linked into discrete sets because different parasite lineages do not mix. Spatial isolation or lack of sex and recombination can prevent mixing.

The second section asks whether the observed associations between alleles can be used to infer the processes that created the associations. This would be valuable because it is easier to measure patterns of genetic association than to measure processes such as immune selection or the frequency of genetic mixing. However, many different processes can lead to similar patterns of genetic association, making it difficult to infer process from pattern. Detailed data and a careful accounting of alternative hypotheses can allow one to narrow the possible explanations for observed patterns.

The third section describes various processes of genetic mixing between lineages and the consequences for genome-wide linkage disequilibrium. Some parasites have discrete, unmixed lineages, whereas other parasites recombine frequently and have little linkage between different loci. The degree of mixing determines the pace of antigenic recombination. New antigenic combinations have the potential to overcome existing patterns of host immunity.

The fourth section presents one example of antigenic linkage disequilibrium, the case of Neisseria meningitidis. Variants at two antigenic loci group together nonrandomly. Mixed genotypes occur at low frequency, suggesting some recombination. The immune structure of the host population could disfavor recombinant types, explaining the observed linkage between antigenic loci. Alternatively, recent epidemics or linkage with favored alleles at nonantigenic loci could also produce the observed patterns of antigenic linkage.

The fifth section proposes that hosts form isolated islands for parasites (Hastings and Wedgwood-Oppenheim 1997). Island structure confines selection within hosts to the limited genetic variation that enters with initial infection or arises de novo by mutation. Island structure also enhances stochastic fluctuations because each host receives only a very small sample of parasite diversity. As the number of genotypes colonizing a host rises, selection becomes more powerful and stochastic perturbations decline in importance. Rouzine and Coffin (1999) apply the balance between selection and stochastic perturbation to the observed patterns of genetic variability in HIV.

The final section takes up promising lines of study for future research.

10.1. Kinds of Genetic Structure

Genetic structure describes the statistical pattern of associations between alleles (Wright 1969; Crow and Kimura 1970; Li 1976; Hartl and Clark 1997; Hedrick 2000). It is useful to distinguish different kinds of genetic associations.

Linkage Disequilibrium between Antigenic Loci

Statistical association between alleles at different loci is called linkage disequilibrium. Linkage disequilibrium arises when alleles occur together in individuals (or haploid gametes) more or less frequently than expected by chance.
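As a concrete illustration, the standard two-locus measure of linkage disequilibrium is D = x₁₁ - p₁q₁, where x₁₁ is the frequency of the haplotype carrying allele 1 at both loci and p₁ and q₁ are the frequencies of allele 1 at each locus. The minimal sketch below computes D and the normalized r² from haploid isolates. The function name and the input format (a list of allele pairs coded 1 or 2) are illustrative choices, not part of any published tool.

```python
from collections import Counter


def linkage_disequilibrium(haplotypes):
    """Return (D, r2) for two biallelic loci.

    haplotypes: iterable of (allele_at_locus_A, allele_at_locus_B) pairs,
    one per haploid isolate, alleles coded 1 or 2 (illustrative coding).
    """
    counts = Counter(haplotypes)
    n = sum(counts.values())
    x11 = counts[(1, 1)] / n  # frequency of the A1/B1 haplotype
    p1 = sum(c for (a, _), c in counts.items() if a == 1) / n  # freq of A1
    q1 = sum(c for (_, b), c in counts.items() if b == 1) / n  # freq of B1
    d = x11 - p1 * q1  # excess of A1/B1 over the chance expectation p1*q1
    denom = p1 * (1 - p1) * q1 * (1 - q1)
    r2 = d * d / denom if denom > 0 else 0.0  # r2 undefined if a locus is fixed
    return d, r2


# Only the two "parental" strains A1/B1 and A2/B2 present: maximal association.
print(linkage_disequilibrium([(1, 1)] * 50 + [(2, 2)] * 50))  # (0.25, 1.0)
# All four combinations equally common: no association.
print(linkage_disequilibrium([(1, 1), (1, 2), (2, 1), (2, 2)] * 25))  # (0.0, 0.0)
```

D is zero when alleles combine at random and reaches its largest possible value for given allele frequencies when only two complementary haplotypes are present; r² rescales D to lie between 0 and 1.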

Immune pressure by hosts could potentially create linkage disequilibrium between antigenic loci of the parasite (Gupta et al. 1996). Suppose that the parasite genotype A₁/B₁ infects many hosts during an epidemic, leaving most hosts recovered and immune to any parasite genotype with either A₁ or B₁. Then genotypes A₁/B₂ and A₂/B₁ will be selected against, but A₂/B₂ can spread. Thus, host immunity favors strong linkage disequilibrium in the parasites, dominated by the two strains A₁/B₁ and A₂/B₂.
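The logic of this example can be made explicit by counting, for each parasite genotype, the fraction of hosts it can still infect. The sketch below assumes a hypothetical host population after an A₁/B₁ epidemic (90% recovered and immune to A₁ and B₁, 10% naive) and the simplifying rule that immunity to any one allele of a genotype blocks infection completely. The function name and numbers are illustrative only.

```python
def susceptible_fraction(genotype, host_classes):
    """Fraction of hosts that a two-locus parasite genotype can infect.

    genotype: tuple of antigenic alleles, e.g. ("A2", "B2").
    host_classes: list of (frequency, set_of_alleles_the_host_is_immune_to).
    Simplifying assumption: immunity to ANY allele of the genotype blocks
    infection completely (no partial cross-protection).
    """
    return sum(freq for freq, immune_to in host_classes
               if not immune_to.intersection(genotype))


# Hypothetical host population after an A1/B1 epidemic:
# 90% recovered (immune to A1 and B1), 10% still naive.
hosts = [(0.9, {"A1", "B1"}), (0.1, set())]
for g in [("A1", "B1"), ("A1", "B2"), ("A2", "B1"), ("A2", "B2")]:
    print("/".join(g), round(susceptible_fraction(g, hosts), 2))
# Prints, one per line: A1/B1 0.1, A1/B2 0.1, A2/B1 0.1, A2/B2 1.0
```

Any genotype sharing an allele with the epidemic strain is confined to the naive hosts, whereas the complementary genotype A₂/B₂ can infect everyone, so the recombinant intermediates are the losers.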

Genome-wide Linkage Disequilibrium

Linkage disequilibrium over the entire genome arises when there is some barrier to genetic mixing between lineages, such as spatial isolation or lack of sex and recombination. When lineages do not mix, then the particular amino acid substitutions in each lineage become locked together by their common pattern of inheritance. Genome-wide linkage disequilibrium has been observed in some parasites but not in others (Maynard Smith et al. 1993; Tibayrenc 1999).

Immune pressure can create associations between different antigenic loci of the parasite. But if the parasite mixes its genome by recombination, nonantigenic loci will often remain in linkage equilibrium and will not be separated into discrete strains. Consider, for example, a third, nonantigenic locus with the allele C₁ causing severe disease symptoms and the equally frequent allele C₂ causing mild symptoms.

Strong host immune pressure could potentially separate the antigenic loci into discrete strains, A₁/B₁ and A₂/B₂. But if recombination occurs, the nonantigenic locus will be randomly associated with each strain; for example, A₁/B₁/C₁ and A₁/B₁/C₂ will occur equally frequently. The alleles C₁ and C₂ will also be distributed equally within the A₂/B₂ antigenic strain. Immunity by itself does not organize the entire parasite genome into discrete, nonoverlapping strains (Hastings and Wedgwood-Oppenheim 1997).
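A standard two-locus result shows why the nonantigenic locus drifts toward random association: with random mating, recombination fraction r, and no selection, drift, or mutation, linkage disequilibrium decays geometrically as D_t = D_0(1 - r)^t. The sketch below iterates the haplotype-frequency recursion for an antigenic strain (treated as a single locus) and the virulence locus C, starting with C₁ found only in one strain. Ignoring selection on the strains, the value r = 0.1, and the function names are all illustrative assumptions.

```python
def recombine(x, r):
    """One generation of random mating plus recombination for two loci.

    x: dict of haplotype frequencies keyed by (allele_A, allele_B), alleles 1/2.
    r: recombination fraction between the loci (0 <= r <= 0.5).
    Assumes no selection, drift, or mutation, so allele frequencies stay fixed.
    """
    d = disequilibrium(x)
    return {
        (1, 1): x[(1, 1)] - r * d,
        (2, 2): x[(2, 2)] - r * d,
        (1, 2): x[(1, 2)] + r * d,
        (2, 1): x[(2, 1)] + r * d,
    }


def disequilibrium(x):
    """D = x11*x22 - x12*x21, which equals x11 - p1*q1 for biallelic loci."""
    return x[(1, 1)] * x[(2, 2)] - x[(1, 2)] * x[(2, 1)]


# Strain (first allele) vs. virulence locus C (second allele),
# starting from complete association.
x = {(1, 1): 0.5, (2, 2): 0.5, (1, 2): 0.0, (2, 1): 0.0}
r = 0.1
for t in range(1, 31):
    x = recombine(x, r)
    if t % 10 == 0:
        # Simulated D next to the closed form D_0 * (1 - r)**t.
        print(t, round(disequilibrium(x), 4), round(0.25 * (1 - r) ** t, 4))
```

The two printed columns agree: recombination alone steadily dissolves the association between C and each antigenic strain, even while immune selection may keep the antigenic loci themselves tightly linked.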

The distinction between antigenic and genome-wide linkage is important for medical applications. If genome-wide linkage occurs, then each strain defines a separate biological unit with its own immune interactions, virulence characteristics, and response to drugs (Tibayrenc et al. 1990; Tibayrenc 1999). Strains can be typed, followed epidemiologically, and treated based on information from a small number of identifying markers of the genome.

Association between Coinfecting Parasites

Several parasite genotypes may infect a single host. A recent survey of the literature found nonrandom associations between parasite genotypes within hosts (Lord et al. 1999). For sexual parasites, nonrandom associations within hosts often affect mating patterns. Mating typically occurs between the parasites within a host or between parasites in a vector that were recently derived from one or a few hosts. Nonrandom mating alters heterozygosity at individual loci and the opportunities for recombination between loci.

Host immunity may influence the distribution of strains within hosts. Gilbert et al. (1998) found a positive association within hosts between two antigenic variants of Plasmodium falciparum. Their data suggest that the two variants mutually interfere with T cell attack against the parasite, so both variants do better in the host when they are together. In general, the immunological profile of each host constrains the range of parasite variants that may coinfect that host.

Effective Population Size

The number of adult genotypes sampled to produce the progeny generation influences the effective size of the population (Wright 1969; Crow and Kimura 1970; Li 1976; Hartl and Clark 1997; Hedrick 2000). Population size affects many statistical properties of genetic structure. For example, suppose a particular parasite genotype sweeps through a host population, causing a widespread epidemic. This epidemic genotype rises to a high frequency as other genotypes fail to spread or decline in abundance.
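When population size fluctuates, a standard population-genetic approximation is that the long-term effective size equals the harmonic mean of the per-generation sizes, so a few small generations dominate. The sketch below uses made-up sizes to show how a single bottleneck, such as an epidemic founded by only a few genotypes, pulls the effective size far below the arithmetic mean. The function name and numbers are hypothetical.

```python
def harmonic_mean_ne(sizes):
    """Approximate long-term effective size over generations with sizes N_t.

    Uses the standard harmonic-mean approximation Ne = T / sum(1 / N_t).
    """
    return len(sizes) / sum(1.0 / n for n in sizes)


# Hypothetical: 10,000 contributing genotypes per generation, except one
# generation in which an epidemic is founded by only 10 genotypes.
sizes = [10_000] * 9 + [10]
print(sum(sizes) / len(sizes))          # arithmetic mean: 9001.0
print(round(harmonic_mean_ne(sizes)))   # harmonic mean: about 99
```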

Descendants of the population after an epidemic will likely come from the epidemic genotype (Maynard Smith et al. 1993). The effective size of the population is small because of the limited number of ancestral genotypes. The spread of an epidemic genotype carries along in strong association the alleles of that genotype at different loci. Consequently, strong genome-wide linkage disequilibrium may appear when descendants of the epidemic genotypes are sampled among genotypes descended from other lineages (Maynard Smith et al. 1993; Hastings and Wedgwood-Oppenheim 1997).
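The sampling effect described here can be computed directly. Suppose, as an illustrative assumption, that the background population recombines freely (linkage equilibrium, allele 1 at frequency 0.5 at each of two loci) and that the epidemic clone carries allele 1 at both loci. If a fraction f of sampled isolates comes from the clone, the pooled sample shows D = f(1 - f)/4 even though neither component has any internal disequilibrium. The function name is hypothetical.

```python
def pooled_d(f):
    """Two-locus D in a sample mixing an epidemic clone with a recombining background.

    f: fraction of sampled isolates that belong to the epidemic clone.
    Assumptions (illustrative): the clone carries allele 1 at both loci; the
    background is in linkage equilibrium with allele-1 frequency 0.5 at each locus.
    """
    x11 = f * 1.0 + (1 - f) * 0.25  # frequency of the 1/1 haplotype
    p1 = f * 1.0 + (1 - f) * 0.5    # allele-1 frequency at the first locus
    q1 = f * 1.0 + (1 - f) * 0.5    # allele-1 frequency at the second locus
    return x11 - p1 * q1            # simplifies to f * (1 - f) / 4


for f in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(f, round(pooled_d(f), 4))
```

D vanishes when the sample contains only the clone or only the background and peaks at f = 0.5, which is why the apparent strength of linkage depends on how heavily the epidemic strain is represented in a sample.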

Population size also influences the pattern of genomic evolution by natural selection (Kimura 1983). When the effective population size is small, chance events of sampling can favor one allele over another. This stochastic sampling reduces the power of natural selection to shape evolutionary patterns of antigenic variation.
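Kimura's diffusion approximation quantifies this point. For a haploid Wright-Fisher population of effective size N, a single new mutation (initial frequency 1/N) with selective advantage s fixes with probability approximately u = (1 - e^(-2s)) / (1 - e^(-2Ns)), compared with 1/N for a neutral mutation. The sketch below assumes haploid genetics and constant N; the parameter values and function name are illustrative.

```python
import math


def fixation_probability(n, s):
    """Diffusion approximation for fixation of a single new haploid mutant.

    n: effective population size; s: selective advantage.
    Uses -expm1(-x) for 1 - exp(-x) to keep precision when x is small.
    """
    if s == 0:
        return 1.0 / n  # neutral case: fixation chance equals initial frequency
    return -math.expm1(-2 * s) / -math.expm1(-2 * n * s)


s = 0.01
for n in (10, 100, 10_000):
    u = fixation_probability(n, s)
    # u * n compares the selected allele with a neutral one (ratio 1 = drift only).
    print(n, round(u, 4), round(u * n, 1))
```

With s = 0.01, the ratio u·N is close to 1 for N = 10, so the favored allele fares barely better than a neutral one, but it is near 200 for N = 10,000, where selection dominates.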

10.2. Pattern and Process

Recently developed molecular tools, such as polymerase chain reaction (PCR), provide the potential for widespread sampling of parasite populations (Enright and Spratt 1999). Statistical descriptions of the sampled data readily allow calculation of heterozygosity levels at single loci, the linkage disequilibrium between loci within genomes, and the spatial distribution of genotypes.

Can we use the measurable patterns of population structure to infer the underlying process that created the pattern? Yes, but only if we can rule out alternative processes that could lead to the same pattern.

Suppose, for example, that we demonstrate genome-wide linkage disequilibrium. The pattern by itself is interesting, because we have established that the parasites fall into discrete strains. Each strain can be identified by its combination of alleles, allowing the movement of strains to be followed. Each strain can also be studied for its unique antigenic and physiological properties, such as response to drugs.

The pattern of genome-wide linkage does not tell us what process created that pattern. The pattern may be created by frequent epidemics, each epidemic stemming from a limited number of genotypes. The parasite may be asexual, binding together alleles at different loci because no process mixes alleles between genotypes. Or, sex and the physical mixing of genotypes by recombination may occur in every generation, but with all mating confined to the pool of genotypes within each host. If only one parasite genotype typically infects a host, then all mating occurs between members of the same lineage with no opportunity for recombination to break down associations between loci.

One can carefully list all processes that could lead to the observed pattern and then do statistical tests of the data to distinguish between the potential causes. Maynard Smith et al. (1993) and Tibayrenc (1999) present detailed statistical analyses to accomplish such tests.
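One widely used statistic from Maynard Smith et al. (1993) is the index of association, I_A = V_D/V_E - 1, where V_D is the observed variance in K, the number of loci at which two isolates differ, and V_E = Σ h_j(1 - h_j) is its expected value under linkage equilibrium, with h_j the sample-size-corrected gene diversity at locus j. The sketch below follows that formulation as it is commonly described; treat it as an illustration rather than a validated reimplementation. Published analyses also assess significance, for example by permuting alleles among isolates, which is omitted here. The function name and toy data are hypothetical.

```python
from itertools import combinations


def index_of_association(isolates):
    """Index of association I_A for multilocus genotype data.

    isolates: list of equal-length tuples, one allele label per locus.
    Values near 0 suggest free recombination; values well above 0 suggest
    linkage (e.g., clonality or an epidemic-dominated sample).
    Requires at least one polymorphic locus (otherwise V_E is zero).
    """
    n = len(isolates)
    n_loci = len(isolates[0])
    # K for every unordered pair: number of loci at which the two isolates differ.
    ks = [sum(a != b for a, b in zip(x, y)) for x, y in combinations(isolates, 2)]
    n_pairs = len(ks)
    v_d = (sum(k * k for k in ks) - sum(ks) ** 2 / n_pairs) / n_pairs
    v_e = 0.0
    for j in range(n_loci):
        alleles = [iso[j] for iso in isolates]
        freqs = [alleles.count(a) / n for a in set(alleles)]
        h = (1 - sum(p * p for p in freqs)) * n / (n - 1)  # corrected diversity
        v_e += h * (1 - h)
    return v_d / v_e - 1


# Two clones, five isolates each, differing at all four loci: maximal linkage.
clonal = [(0, 0, 0, 0)] * 5 + [(1, 1, 1, 1)] * 5
# Every combination of four biallelic loci exactly once: alleles freely combined.
mixed = [tuple((i >> b) & 1 for b in range(4)) for i in range(16)]
print(round(index_of_association(clonal), 3))  # 3.0
print(round(index_of_association(mixed), 3))
```

The clonal sample reaches I_A = 3, the maximum for four loci, whereas the fully mixed sample gives a value at or below zero.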

Most statistical analyses have not focused on antigenic variation. Instead, those analyses have used data on genetic variability from loci sampled across the genome. In some cases, the analyses use common enzyme (housekeeping) loci (Enright and Spratt 1999). Housekeeping loci are likely to evolve relatively slowly compared with other parts of the genome. The relatively slow rates of change provide a good indicator of common ancestry between genomes that have been separated for long periods of time. Other analyses use rapidly evolving loci, which provide more information about recent divergence from common ancestors (Tibayrenc 1999).

I review some population studies of genetic structure. I emphasize only the background needed for understanding antigenic variation, leaving out much of the analytical detail. I start with linkage of alleles across the entire genome. I then turn to linkage at antigenic loci.

10.3. Genome-wide Linkage Disequilibrium

Barriers to Genetic Mixing

Genome-wide linkage arises when different lineages rarely mix their genes (Tibayrenc 1999). Four different barriers prevent genetic mixing (Maynard Smith et al. 1993; Hastings and Wedgwood-Oppenheim 1997). First, asexual reproduction separates lineages irrespective of geographical or ecological locality. Differentiated strains will occur jointly in the same area. In addition, particular multilocus combinations of genes may disperse widely and be found in different regions without being broken up by recombination with local varieties.

Second, physical separation by geography or habitat prevents genetic mixing. Geographic subdivision is common in many populations. Ecological subdivision may arise if some genotypes occur mainly in one host species, whereas other genotypes are confined to a different host. Sexual species divided by physical barriers will have mixed genomes within local regions and differentiated genomes across barriers. Particular multilocus genotypes are unlikely to be found far from their native region because they will be broken up by recombination with neighboring genotypes.

Third, demography can separate lineages if each host or vector carries only a single parasite genotype. Single-genotype infections prevent physical contact between different parasite genotypes, isolating lineages from each other even when they occur in the same region. Epidemics may cause a single genotype to spread rapidly, limiting most infections to the epidemic strain. This limited variability reduces opportunity for genetic exchange and causes the region to be dominated by the linked set of alleles within the epidemic strain (Maynard Smith et al. 1993). In the absence of epidemics, single-genotype infections can maintain a greater diversity of distinct genotypes within a region. Obligate intracellular pathogens may be able to exchange genetic information only when two distinct genotypes coinfect a cell.

Fourth, mixing may occur occasionally between separated lineages, but mixed genotypes fail. Hybrid incompatibility separates eukaryotes into distinct, reproductively isolated species. In segmented viruses, certain pairs of segments may be incompatible, causing the absence of some genotypic combinations (Frank 2001). Recombining viruses and bacteria present more complex possibilities. Certain genomic regions may be able to pass from one lineage to another, whereas other genomic regions may be incompatible. Thus, some genomic regions may exhibit linkage disequilibrium between lineages, whereas other regions may be well mixed.

Sexual, diploid species will be primarily homozygous when different lineages do not mix because most matings will be between the same genotype. Asexual species may maintain significant heterozygosity even in regions dominated by a single clone.

At the nucleotide level, epidemics tend to reduce genetic variability because extant parasites have descended from a recent ancestral genotype that started the epidemic. By contrast, endemic diseases will often maintain more nucleotide variability within genotypes because those genotypes trace their ancestry back over a longer time to a common progenitor.

Sexual, physical, and demographic barriers to genomic mixing shape patterns of genetic variability. Conversely, those patterns provide information about key aspects of parasite biology.

An Example

The protozoan Trypanosoma cruzi causes Chagas' disease. Linkage disequilibrium between loci has been observed in several sampling studies (e.g., Tibayrenc et al. 1986; Tibayrenc and Ayala 1988; Oliveira et al. 1998, 1999). Recently, Michel Tibayrenc's laboratory expanded a long-term analysis of genetic variability by using multilocus enzyme electrophoresis (MLEE), random amplified polymorphic DNA (RAPD), and polymerase chain reaction (PCR) (Barnabé et al. 2000; Brisse et al. 2000a, 2000b).

Tibayrenc's laboratory has classified T. cruzi into six groups defined as discrete typing units (DTUs) (Tibayrenc 1999). Each individual in a DTU shares a significant proportion of alleles at several polymorphic loci with other members of the DTU and shares relatively few alleles with members of other DTUs. The discrete, nonoverlapping structure of DTUs is simply another way to describe genome-wide linkage disequilibrium—the association of particular sets of alleles within genomes.

The different methods, MLEE, RAPD, and PCR, give similar results for the classification of T. cruzi isolates into DTUs. Because each method measures genetic variability in different parts of the genome, the concordance between methods further supports the overall classification into separate DTUs.

The DTUs provide a taxonomy to identify new samples. To the extent that DTUs truly capture genome-wide linkage, each DTU will likely have unique properties. Several studies have compared traits such as growth rate, virulence in mice, and sensitivity to drugs (e.g., Revollo et al. 1998; de Lana et al. 2000). The trait values for different isolates of a DTU often vary widely, with the average values of the isolates differing between DTUs. Thus, there appears to be considerable genetic diversity both within and between DTUs.

What processes maintain genome-wide linkage? The previous section classified barriers to genetic mixing as sexual, physical, demographic, or genetic. Tibayrenc and his colleagues have argued that sexual reproduction occurs rarely in T. cruzi and that the lack of sex explains the observed patterns of genetic linkage. They review many lines of evidence, but perhaps the most telling observation concerns repeated occurrences of particular multilocus genotypes.

Barnabé et al. (2000) showed that certain multilocus genotypes recur in samples. Some of the repeated multilocus genotypes were found in widely separated geographic locations. The probability of obtaining the same set of alleles across multiple polymorphic loci would be very small if the loci recombined occasionally.
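A rough calculation shows why recurring multilocus genotypes argue against frequent recombination. If loci are statistically independent, the chance that two random isolates share the same multilocus genotype is the product, over loci, of the sum of squared single-locus genotype frequencies. The sketch below uses hypothetical frequencies (ten loci, four equally common genotypes each) and a hypothetical function name; real analyses use observed frequencies.

```python
from math import prod


def match_probability(genotype_freqs_per_locus):
    """Chance that two randomly drawn isolates share a multilocus genotype.

    genotype_freqs_per_locus: one list of single-locus genotype frequencies per
    locus. Assumes loci are statistically independent (free recombination), so
    the match probability is the product over loci of sum(g**2).
    """
    return prod(sum(g * g for g in freqs) for freqs in genotype_freqs_per_locus)


# Hypothetical: 10 loci, each with four equally common genotypes.
loci = [[0.25] * 4 for _ in range(10)]
print(match_probability(loci))  # 0.25 ** 10, about 9.5e-07

# Expected number of identical pairs among 100 independently drawn isolates.
n_isolates = 100
expected_matching_pairs = n_isolates * (n_isolates - 1) / 2 * match_probability(loci)
print(round(expected_matching_pairs, 4))  # about 0.0047
```

Under these assumptions even a sample of one hundred isolates would be unlikely to contain a single repeated genotype, so frequent repeats point toward linkage across loci.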

Epidemics stemming from a single genotype could possibly cause the spread and repeat occurrence of a genotype. But some of the repeated genotypes were found in areas surrounded by other genotypes, far from the geographic foci of their highest frequency. In addition, the high genetic diversity within locations argues against local regions being swept by epidemic strains.

Rare sex seems to be a reasonable explanation given the limited data. Eventually, additional studies will collect more data and develop a clearer picture of genetic structure.

Is a Clonal Population Structure Common?

Tibayrenc et al. (1990, 1991) concluded from the available data that many parasites have genome-wide linkage disequilibrium, suggesting that different lineages rarely mix genes. Tibayrenc et al. use the word "clonal" to describe the observed pattern of genomic linkage, without implying any particular cause such as asexuality or lack of mating between different sexual lineages.

Several recent analyses infer a clonal population structure, including studies of the protozoan Trypanosoma cruzi (citations above), the protozoan Cryptosporidium parvum (Awad-El-Kariem 1999), and the yeast Candida albicans (Xu et al. 1999). However, data from other species present a complex picture, suggesting a wide diversity of genetic structures. I summarize some of the current ideas and observations in the following subsections.

Bacteria and Protozoa

Bacteria reproduce by binary fission, an asexual process. However, bacteria can mix genomes by taking up DNA from neighboring cells (Ochman et al. 2000). Conjugation directly transfers DNA, transduction carries bacterial DNA with infecting viruses, and transformation occurs by uptake of free DNA that has been released into the environment. Foreign DNA fragments can recombine with the host chromosome, inserting a piece of genetic material from a different lineage into the genome.

Maynard Smith et al. (1993) classified bacterial genetic structure as clonal, panmictic, or epidemic. Rare recombination leads to a clonal structure with strong linkage disequilibrium, as observed in Salmonella enterica (Spratt and Maiden 1999).

Frequent recombination leads to a panmictic (widely mixed) genetic structure and relatively little association between alleles within genomes. Helicobacter pylori has a panmictic structure (Spratt and Maiden 1999). Recombination occurs so frequently that even variable nucleotide sites within genes are often in linkage equilibrium (not statistically associated) (Suerbaum et al. 1998; Salaun et al. 1998).

Neisseria gonorrhoeae also has a panmictic structure (Spratt and Maiden 1999). However, a strain that requires arginine, hypoxanthine, and uracil (AHU) to grow has maintained a tightly linked genotype over a thirty-nine-year period (Gutjahr et al. 1997). That strain can take up and recombine with DNA in the laboratory. Perhaps the clonal AHU strain remains within the broader panmictic population because it rarely occurs in mixed infections with non-AHU genotypes.

Neisseria meningitidis has an epidemic population structure (Spratt and Maiden 1999). Recombination occurs frequently, and broad samples of the population typically show highly mixed genomes with little or no linkage disequilibrium. However, it appears that epidemics sometimes arise from single genotypes and spread rapidly within a restricted geographic area. When samples include a large fraction of the epidemic strain, this strain shows a clonal pattern of inheritance and strong linkage disequilibrium when compared against other isolates. The epidemics appear to be sporadic and localized, and the epidemic clone probably mixes its genome with other lineages over the span of several months or a few years. As the epidemic clone mixes with other genotypes, its unique pattern of genetic linkage decays.

Escherichia coli has a particularly interesting population structure (Guttman 1997). The first broad studies found strong linkage disequilibrium and an apparently clonal structure. However, early studies of population structure tend to sample widely and sparsely, obtaining just one or a few isolates from each habitat or geographic locality. Later studies of E. coli provided finer resolution by sampling repeatedly from the same or nearby localities or by using DNA sequences rather than lower-resolution molecular markers. Those later studies found that recombination does occur.

How can E. coli's recombining genetic system maintain widespread linkage disequilibrium? This remains a controversial question with several possible answers. Recombination may be a weak force, introducing changes into genomes at a rate no higher than the mutation rate. Advantageous genes may occasionally sweep through a local population, carrying along linked genes as in epidemics. Frequent sweeps promote linkage and may overwhelm the mixing effect of recombination. Alternatively, different genotypes may be specific for different habitats, so that most recombinational mixing occurs within habitats. This may lead to weaker linkage within habitats but strong linkage when measured between nonmixing lineages that live in different habitats.

I suspect that the relatively complex structure of E. coli reflects the more intensive study of this species at different spatial scales. Many bacteria will likely show different genetic structures when analyzed at different scales. The particular spatial scale over which a species differentiates into nonmixing lineages will vary depending on the relative balance of recombination, genetic drift, selective sweeps, epidemics, and migration.

Recent studies on the protozoan Trypanosoma brucei illustrate the varying genetic structures revealed by careful sampling (MacLeod et al. 2000). The human-infective subspecies T. b. rhodesiense causes African sleeping sickness, whereas the subspecies T. b. brucei cannot infect humans. Both subspecies occur in various domestic animals.

T. brucei can recombine sexually in the laboratory, but the extent of genetic mixing in natural populations has been debated (Tibayrenc et al. 1990; Maynard Smith et al. 1993; Hide et al. 1994). MacLeod et al. (2000) demonstrated that the two subspecies are genetically differentiated and show linkage disequilibrium when compared against each other. Within subspecies, T. b. brucei had an epidemic structure, in which recombination occurs but can be overwhelmed by clonal expansion of a few genotypes during epidemics. By contrast, T. b. rhodesiense appeared clonal within the Ugandan samples obtained. Further sampling may eventually find that the Ugandan isolates are part of a wider population in which some recombination occurs.

The protozoan Plasmodium falciparum has an obligate sexual phase that occurs during transmission in the mosquito vector. In geographic regions where infection is common, the vector frequently picks up multiple genotypes, which then mate and recombine before transmission to a new host. By contrast, regions with sparsely infected hosts have a lower probability of mixed genotypes in the vectors, leading to frequent self-fertilization and limited opportunity for recombination between lineages (Babiker and Walliker 1997; Paul and Day 1998; Conway et al. 1999).
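The link between infection intensity and mixing can be sketched with a simple model. Assume, hypothetically, that a mosquito carrying k distinct genotypes receives equal gamete contributions from each and that gametes unite at random; then a given mating is within one genotype with probability 1/k. Averaging over the distribution of k gives the outcrossing rate. The distributions below are made up for illustration, and the function name is hypothetical.

```python
def outcrossing_rate(meal_distribution):
    """Probability that two uniting gametes in a mosquito come from different genotypes.

    meal_distribution: dict mapping k (number of distinct parasite genotypes
    picked up in one blood meal) to the fraction of infected mosquitoes with k.
    Simplifying assumptions: the k genotypes contribute gametes equally and
    gametes unite at random, so a pairing is within one genotype with prob 1/k.
    """
    return sum(frac * (1 - 1 / k) for k, frac in meal_distribution.items())


# Hypothetical distributions for low- and high-transmission regions.
low = {1: 0.9, 2: 0.1}
high = {1: 0.3, 2: 0.3, 3: 0.2, 4: 0.2}
print(round(outcrossing_rate(low), 3))   # 0.05
print(round(outcrossing_rate(high), 3))  # 0.433
```

Because self-fertilization within a single genotype leaves multilocus associations intact, the rate at which associations between lineages break down is roughly the outcrossing rate times the recombination fraction, which is one way to see why low-transmission regions retain strong linkage disequilibrium.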

Anderson et al. (2000) studied P. falciparum genetic structure with twelve rapidly mutating microsatellite loci in 465 isolates from nine geographic regions. Within areas of low infection intensity, they found strong linkage disequilibrium, low genetic diversity, and high variation between geographic locations. They observed the opposite patterns within areas of high infection intensity. This provides another example in which the genetic structure varies across space.

Reassortment in Segmented Viruses

The segmented RNA viruses provide an excellent model for studying genomic linkage disequilibrium. Genomes are broken up into two or more segments. Each segment replicates independently during cellular infection. The segments act like distinct chromosomes but do not pair and segregate as in eukaryotic cells. Instead, new viral particles form by a sampling process that chooses approximately one segment of each type. When multiple viruses infect a single cell, their replicating segments mix, and the progeny carry reassorted combinations of genomic segments.
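Under an idealized version of this sampling process, in which each progeny particle draws every segment type independently from the cell's pool, the expected share of reassortant progeny follows directly: a progeny particle is parental only if all of its segments come from the same parent. The sketch below assumes each parent contributes the same share of every segment type; the function name is hypothetical.

```python
def reassortant_fraction(parent_shares, n_segments):
    """Expected fraction of progeny that are reassortants in a coinfected cell.

    parent_shares: fraction of the cell's segment pool contributed by each
    parent (assumed equal for every segment type). Idealized model: each
    progeny particle picks one copy of each segment type independently.
    """
    parental = sum(share ** n_segments for share in parent_shares)
    return 1 - parental


# Two parents in equal proportions; eight segments as in influenza A.
print(reassortant_fraction([0.5, 0.5], 8))  # 0.9921875, i.e. 254/256
# A lopsided coinfection yields fewer reassortants.
print(round(reassortant_fraction([0.9, 0.1], 8), 3))
```

In this idealization nearly all progeny of an evenly mixed coinfection are reassortants, so the rarity of reassortment in nature must depend mainly on how often distinct genotypes coinfect the same cell.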

Reassortment has the same effect as recombination. However, reassorting segments are easier to study because the segments mark discretely and clearly the units of recombination.

Occasional reassortment plays a crucial role in creating new strains. Reassortment of influenza A's neuraminidase and hemagglutinin surface antigens provides the most famous example (Lamb and Krug 2001). The genes for these antigens occur on two separate RNA segments of the genome—the genome has a total of eight segments.

It appears that rare reassortments have occasionally introduced hemagglutinin or neuraminidase from bird influenza into the genome of human influenza (Webster et al. 1997). The novel antigens cross-reacted very little with those circulating in humans, allowing the new combination to sweep through human populations and cause pandemics.

Lack of reassortment maintains discrete strains with strong linkage disequilibrium between segments. Rare mixing can be traced back phylogenetically to one or a few events. This is another way of saying that, after reassortment, discrete lineages accumulate new mutations on different segments and keep those new mutations together within the lineage, creating linkage disequilibrium.

Common reassortment reduces linkage disequilibrium between segments by bringing together genetic variants that arose in different individuals. Reassortment causes differences in the phylogenetic history of different segments within a virus.

Reassortment may be common between viruses within a population, but that population may not mix with viruses from another population. Measured within each population, linkage disequilibrium will be low, and there will be a weak correlation between phylogenetic patterns of different segments. But isolated populations do not share the same associations between genetic variants and thus exhibit linkage disequilibrium relative to each other. Equivalently, the segments within each isolated population have a common phylogeny that differs relative to the phylogenetic history of the segments in other populations.

No studies have sampled over different spatial and temporal scales or studied the processes that cause barriers to reassortment. The best studies I found examined the phylogenetic histories of the various segments of influenza. Influenza occurs in three major types: A, B, and C.

Several papers describe reassortment between segments of influenza C (Buonagurio et al. 1985; Peng et al. 1994, 1996; Tada et al. 1997). A phylogenetic tree of the NS (nonstructural protein) segment showed that thirty-four isolates over 1947–1992 split into two distinct lineages. Recent isolates had one NS lineage, whereas older isolates had the other NS lineage. Thus, the newer NS lineage seems to have replaced the older lineage. By contrast, phylogenies of the other six segments identify three or four distinct lineages, in which each lineage contains older isolates as well as recent isolates. Alamgir et al. (2000) suggest that the newer NS type has reassorted with the other segments and replaced the older NS type, perhaps because the newer NS type has a functional advantage that enhances its spread.

The phylogenetic patterns for seven of the eight influenza B segments show clear patterns of reassortment (Lindstrom et al. 1999; Hiromoto et al. 2000). Figure 10.1 illustrates the phylogenetic patterns and putative reassortments for segments based on eighteen isolates obtained over twenty years. Concordant phylogenetic patterns between segments suggest cotransmission of those segments. Such concordance may arise by selection of functionally compatible segments, for example, between the PB1 and PB2 segments that encode components of the polymerase complex (Hiromoto et al. 2000). However, the sample size is small, and the observed concordances may simply be the chance outcome from a small number of reassortment events.

Figure 10.1. The phylogenetic affinities for seven of the eight influenza B segments. Each row shows a particular segment. The columns show the segment type for each of eighteen isolates, with each segment separated into two types and assigned primary affinity for …

Lindstrom et al. (1998) sequenced all eight segments from ten isolates of influenza A. The isolates were collected over the years 1993–1997. The hemagglutinin (HA) and neuraminidase (NA) segments encode the surface glycoproteins known to determine the main components of antigenicity and interaction with human immunity. These two segments accumulated amino acid changes sequentially over the five-year period, the isolates from each year apparently replacing those from the prior year in a single, nonbranching lineage. Thus, these isolates do not show any reassortment between HA and NA.

The six influenza A segments encoding internal proteins reassorted relative to the HA-NA lineage. Those internal genes did not accumulate changes sequentially over time in a single lineage. For example, the basic polymerase-1 protein, the nucleoprotein, and the matrix protein isolated in 1997 were phylogenetically closer to isolates from 1993–1994 than to isolates from 1995.

This study shows linkage of the antigenic determinants but reassortment of other genetic components. Influenza strains are defined by the common procedure of using antigenic determinants, in this case by HA-NA combinations. The reassortments of the internal segments against HA-NA strain definitions mean that the strain definitions do not describe distinct genotypes.

Lindstrom et al. (1998) point out that HA-NA strains often appear about three years before they expand into epidemics (see also Bush et al. 1999). They suggest that new antigenic determinants, arising by mutation primarily in HA, may sometimes require reassortment into a more virulent genetic background before a genotype can initiate an epidemic. However, that leaves open the question of why mutations would often arise in weak genetic backgrounds and require reassortment into strong backgrounds.

Recombination in Viruses

Reassortment is a special case of the more general process of recombination. DNA viruses and many RNA viruses have only a single segment, so genetic exchange typically occurs between similar (homologous) segments. Several cases of recombination have been described (summarized by Worobey and Holmes 1999), for example, between vaccine and wild-type polio strains (Guillot et al. 2000).

Recombinants may strongly affect evolutionary patterns even when the frequency of recombination per generation is very low. Occasional recombinants can create the mosaic progenitors of successful lineages (Worobey and Holmes 1999). In addition, recombination means that a particular virus does not have a single phylogenetic history—instead, each part of the genome may trace back to a different ancestral lineage. This may preclude an unambiguous viral taxonomy based on phylogeny.

Recombination can occur only when host cells are coinfected by different viral genotypes. Preliminary reports suggest that some viruses can recombine frequently when genetic variants coinfect a cell (Martin and Weber 1997; Fujita et al. 1998; Bruyere et al. 2000; Hajós et al. 2000; Jetzt et al. 2000; Zhang et al. 2000). Many viruses may be similar to the Plasmodium example cited above, in which the frequency of multiple infection by different genotypes determines the degree of genetic mixing between lineages.

The frequency of recombination between genetic variants undoubtedly varies among viruses. Worobey (2000) has shown that isolates of the DNA-based TT virus have mosaic genomes generated by recombination. Recombination is sufficiently frequent that a small subset of the genome provides a poor indicator of the phylogenetic history for the entire genome. Thus, strain typing may have little meaning because highly diverged variants merge by recombination into a single gene pool. A similar study of the RNA-based dengue virus found seven genotypes created by recombination events among seventy-one isolates (Worobey et al. 1999).

Distinct strains do not exist under frequent recombination. By contrast, rare recombination leaves most lineages identifiably intact as discrete strains. With discrete strains, occasional recombinant mosaics can be identified as the mixture of known strains.

HIV isolates across the world have been sequenced. Most isolates appear to have a phylogenetic affinity for a particular clade, but multiple recombination events and genomic mosaics also occur frequently (Bobkov et al. 1996; Cornelissen et al. 1996; Robertson et al. 1999). The opposing aspects of discrete strains and widespread recombination probably reflect heterogeneous histories in different locations, the temporal and spatial scales of sampling, and the rapidly changing nature of the viral populations as the infection continues to spread.

I briefly speculate about the history of HIV to illustrate the sort of processes and patterns that may occur in viral evolution. The epicenter of HIV diversity and the probable origin of the pandemic lie in central Africa (Vidal et al. 2000). Based on sequences from the V3–V5 env region of the genome, all known HIV-1 subtypes occurred in a sample of 247 isolates from the Democratic Republic of Congo. Analysis of the gag region and of longer sequences in the env region showed a high frequency of recombination within this population.

Overall, the Democratic Republic of Congo population had all known subtypes, a high degree of diversity within each subtype, and significant mosaicism across different genomic regions. This suggests a relatively old and large population that has accumulated diversity and probably been the source for many lineages that have colonized different parts of the world (Vidal et al. 2000).

Different lineages dominate different geographic regions of the world. For example, subtype B has spread throughout the Americas, Europe, Australia, and parts of eastern Asia. Subtype A is relatively common in the western African countries around the Ivory Coast, and subtype C dominates southern Africa. These broad patterns probably represent the initial spread from central Africa into those regions, each region founded by a narrow slice of the worldwide HIV diversity.

Each subtype divides more finely into variants, which may dominate smaller localities. For example, a distinctive variant of subtype B is particularly common in the heterosexual population of Trinidad and Tobago (Cleghorn et al. 2000), and another subtype B variant (Thai B) is common in Thailand.

Each region may accumulate significant diversity within its dominant subtype, with frequent recombination between subtype variants. However, as HIV spreads, a region initially pure for a subtype will eventually be colonized by other subtypes. Recombination between subtypes then mixes the distinct phylogenetic histories of the subtypes. Such recombinants have probably become increasingly common; examples include the admixtures of subtypes along the routes of intravenous drug user transmission in China (Piyasirisilp et al. 2000). Drug users in Greece and Cyprus also appear to be fertile sources of recombinants between subtypes (Gao et al. 1998).

These studies suggest that recombination may be relatively common. Such recombination between antigenic sites can strongly influence the evolutionary dynamics of antigenic variation because new genotypes can be generated by combinations of existing variants rather than waiting for rare combinations of new mutations.
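A rough calculation makes the point concrete. The sketch below assumes a haploid parasite with two antigenic loci and a freely mixing population, which is an idealization: real parasites mix only within coinfected hosts or cells. Writing pij for the frequency of genotype Ai/Bj, recombination at rate r changes the frequency of A2/B2 by -rD per generation, where D = p11 p22 - p12 p21. The mutational route is modeled, as an assumption, as one further mutation at rate mu on an existing single-variant genome. The function names and the values of r and mu are hypothetical.

```python
def recombination_gain(p11, p12, p21, p22, r):
    """Per-generation increase in A2/B2 caused by recombination at rate r."""
    d = p11 * p22 - p12 * p21  # coefficient of linkage disequilibrium
    return -r * d


def mutation_gain(p12, p21, mu):
    """Per-generation increase in A2/B2 from one new mutation (assumed rate mu)
    on either single-variant genome, A1/B2 or A2/B1."""
    return mu * (p12 + p21)


# Hypothetical population: A2/B2 absent, the two single variants each at 50%.
rec = recombination_gain(0.0, 0.5, 0.5, 0.0, r=0.01)
mut = mutation_gain(0.5, 0.5, mu=1e-5)
print(f"recombination: {rec:.2e}, mutation: {mut:.2e}, ratio: {rec / mut:.0f}")
```

Even a modest recombination rate outpaces mutation here because the component variants already exist. For combinations that need new variants at k loci simultaneously, the purely mutational route scales roughly as mu^k, so the advantage of recombination grows.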

10.4. Antigenic Linkage Disequilibrium

The previous section described studies of distinct strains caused by epidemics or barriers to genetic mixing between lineages. Those studies defined strains mainly by measurement of genetic variability at nonantigenic loci (Enright and Spratt 1999). Methods of measurement include electrophoresis and nucleotide sequencing.

In this section, I focus on genetic variability between lineages when defined by differences at antigenic loci. Immune pressure by hosts can potentially separate the parasite population into discrete, nonoverlapping antigenic types (Gupta et al. 1996, 1999). Suppose that a haploid parasite with alleles A1/B1 at two different loci infects many hosts during an epidemic, leaving most hosts recovered and immune to any parasite genotype with either A1 or B1. Then genotypes A1/B2 and A2/B1 will be selected against, but A2/B2 can spread. Thus, host immunity favors strong linkage disequilibrium in the parasites, dominated by the two nonoverlapping genotypes A1/B1 and A2/B2.
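The pattern favored by host immunity can be quantified with the standard two-locus measures of linkage disequilibrium. The sketch below computes D, Lewontin's normalized D', and r² from haploid genotype frequencies, writing pij for the frequency of genotype Ai/Bj; the function name is illustrative. A population made only of the two nonoverlapping genotypes A1/B1 and A2/B2 gives D' = 1 and r² = 1, whereas independent association of the loci gives zero on all three measures.

```python
def linkage_disequilibrium(p11, p12, p21, p22):
    """Return (D, D', r^2) for haploid genotype frequencies pij = freq(Ai/Bj)."""
    total = p11 + p12 + p21 + p22
    p11, p12, p21, p22 = [x / total for x in (p11, p12, p21, p22)]
    p_a1 = p11 + p12  # frequency of allele A1
    p_b1 = p11 + p21  # frequency of allele B1
    d = p11 * p22 - p12 * p21
    # Lewontin's normalization: divide by the largest |D| the allele frequencies allow.
    if d >= 0:
        d_max = min(p_a1 * (1 - p_b1), (1 - p_a1) * p_b1)
    else:
        d_max = min(p_a1 * p_b1, (1 - p_a1) * (1 - p_b1))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a1 * (1 - p_a1) * p_b1 * (1 - p_b1)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


print(linkage_disequilibrium(0.5, 0.0, 0.0, 0.5))      # only the nonoverlapping types
print(linkage_disequilibrium(0.25, 0.25, 0.25, 0.25))  # loci independent
```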

Few data exist on the degree of antigenic overlap between genotypes (reviewed by Gupta et al. 1996, 1999). The best example comes from Feavers et al. (1996), who analyzed variability in the outer membrane protein PorA of Neisseria meningitidis. This protein has two distinct variable antigenic regions, VR1 and VR2. Strong associations occurred between VR1 and VR2 variants in a sample of 222 isolates from England and Wales obtained in 1989–1991. Table 10.1 shows that three combinations account for 61% of the observed genotypes, much higher than the 18% expected for these combinations if the two antigens occurred independently. The "other combinations" include mixtures of the listed types, for example, VR1 type 5 with VR2 type 4, plus other, rarer types not listed.
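The share expected under independence is, presumably, the sum over the chosen combinations of the product of each variant's marginal frequency. The sketch below shows that calculation for a table of counts keyed by (VR1, VR2) variant. The counts in the usage lines are invented for illustration and are not the data of Table 10.1.

```python
from collections import Counter


def observed_vs_expected(counts, combos):
    """Observed share of `combos` and the share expected if VR1 and VR2 were independent.

    counts: dict mapping (vr1_variant, vr2_variant) -> number of isolates.
    combos: the (vr1, vr2) combinations of interest.
    """
    n = sum(counts.values())
    vr1, vr2 = Counter(), Counter()
    for (a, b), k in counts.items():  # marginal totals for each region
        vr1[a] += k
        vr2[b] += k
    observed = sum(counts.get(c, 0) for c in combos) / n
    expected = sum((vr1[a] / n) * (vr2[b] / n) for a, b in combos)
    return observed, expected


# Hypothetical counts for two VR1 and two VR2 variants.
toy = {("a", "x"): 40, ("b", "y"): 40, ("a", "y"): 10, ("b", "x"): 10}
observed, expected = observed_vs_expected(toy, [("a", "x"), ("b", "y")])
print(f"observed {observed:.0%}, expected under independence {expected:.0%}")
```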

Table 10.1. Linkage disequilibrium between antigens of Neisseria meningitidis.

The existence of uncommon combinations suggests that recombination can occur. Some process apparently opposes recombination to maintain strong linkage disequilibrium between VR1 and VR2. Gupta et al. (1996) favor host immune selection as the force that structures bacterial genotypes into nonoverlapping sets. This is certainly a plausible explanation. But, as with most population genetic patterns, other processes can lead to the same observations. For example, the three common types might just happen to be the strains circulating most widely among the individuals sampled. Those strains might be common because of chance events that led to mild epidemics caused by a few different types. Or those types may have advantageous alleles at other loci, possibly antigenic but not necessarily so. Over time, recombination could break down the associations between advantageous alleles and the VR combinations, but over a few years such associations can be strong.
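How quickly recombination erodes such associations can be sketched with the classical result for a randomly mixing population without selection: D shrinks by a factor (1 - r) each generation, so D_t = D_0 (1 - r)^t. For a parasite the relevant r is an effective rate, which this sketch assumes to combine how often distinct lineages meet in a host and how often they then recombine. The rates in the usage lines are hypothetical.

```python
import math


def ld_after(d0, r, generations):
    """Linkage disequilibrium after `generations` of random mixing at effective rate r."""
    return d0 * (1 - r) ** generations


def ld_half_life(r):
    """Generations needed for D to fall to half its initial value."""
    return math.log(0.5) / math.log(1 - r)


# Hypothetical effective rates: free mixing versus increasingly rare coinfection.
for r in (0.5, 0.01, 0.001):
    print(f"r = {r}: half-life of about {ld_half_life(r):.0f} generations")
```

When the effective rate is small, an association created by an epidemic or a founding event can persist for hundreds of generations without any ongoing selection to maintain it.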

Gupta et al.'s (1996) work on antigenic strain structure calls attention to several interesting questions. Are different antigenic combinations structured into nonoverlapping sets? The pattern by itself is important for the design of vaccines and the study of epidemiological distributions.

If discrete antigenic strains occur, are they associated with other components of the genome that code for attributes such as virulence? Hastings and Wedgwood-Oppenheim (1997) provide a good introduction to the processes that potentially link antigenic type to other characters.

What processes can potentially structure populations into discrete, nonoverlapping antigenic combinations? Immune selection is one possibility, but any process that reduces gene flow relative to the scale of sampling tends to create nonrandom associations between loci.
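A minimal example of the last point: pooling samples from two subpopulations that are each at linkage equilibrium, but differ in allele frequencies, produces linkage disequilibrium in the pooled sample. The sketch below builds the pooled genotype frequencies directly and compares D with the standard admixture expression D = c(1 - c)(p1 - p2)(q1 - q2), where c is the share of the sample drawn from the first subpopulation and p, q are the frequencies of alleles A1 and B1. The parameter values are hypothetical.

```python
def pooled_ld(c, p1, q1, p2, q2):
    """D in a sample that pools two subpopulations in proportions c and 1 - c.

    Each subpopulation is at linkage equilibrium, so its genotype frequencies
    are products of allele frequencies (p for A1, q for B1).
    """
    def genotypes(p, q):
        return (p * q, p * (1 - q), (1 - p) * q, (1 - p) * (1 - q))

    pooled = [c * a + (1 - c) * b for a, b in zip(genotypes(p1, q1), genotypes(p2, q2))]
    x11, x12, x21, x22 = pooled
    return x11 * x22 - x12 * x21


def admixture_ld(c, p1, q1, p2, q2):
    """Closed-form value of the same quantity."""
    return c * (1 - c) * (p1 - p2) * (q1 - q2)


print(pooled_ld(0.5, 0.9, 0.9, 0.1, 0.1), admixture_ld(0.5, 0.9, 0.9, 0.1, 0.1))
```

D vanishes whenever the pooled groups share allele frequencies at either locus, so the association reflects differentiation between the groups rather than any interaction between the loci.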

How can one differentiate between the various processes that lead to similar patterns? A clear understanding of the processes that reduce gene flow and their consequences (Hastings and Wedgwood-Oppenheim 1997) can help. Direct observations of immune selection disfavoring "recombinant" antigenic types would be useful, but perhaps difficult to obtain.

10.5. Population Structure: Hosts as Islands

Parasite populations often subdivide into Wright's (1978) classical "island model" structure from the theory of population genetics (Hastings and Wedgwood-Oppenheim 1997). Each host forms an island colonized by parasites from one or more sources. The population of parasites within the host undergoes selection that depends on the amount of genetic variation between parasites within the host. The host transmits migrant parasites to colonize new hosts (islands). Each population within a host expires when the host dies or clears the infection.

General Aspects of Transmission and Selection

The number of genotypes colonizing a host may often be small. For example, only a few parasites may colonize a host, or all of the parasites may have come from a single donor that itself had little genetic variation among its parasites. If initial genetic variability is low, then selection within the host depends primarily on de novo mutations that arise during the population expansion of the parasites. By contrast, high initial genetic variability within hosts causes intense selection between coinfecting genotypes.
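Why initial variability matters can be seen in the simplest haploid selection model. Assume, for illustration, two genotypes with fitnesses 1 and 1 + s, the fitter one at frequency p. One generation of selection raises mean fitness by exactly Var(w)/w̄ = s²p(1 - p)/(1 + ps), so a host whose parasites are nearly uniform (p close to 0 or 1) gains almost nothing from selection, whereas a host carrying an even mixture of genotypes experiences strong selection. The function names and values are illustrative.

```python
def next_frequency(p, s):
    """Frequency of the fitter genotype (fitness 1 + s) after one round of selection."""
    return p * (1 + s) / (1 + p * s)


def mean_fitness_gain(p, s):
    """Increase in mean fitness w_bar = 1 + p*s over one generation."""
    return s * (next_frequency(p, s) - p)


def variance_over_mean(p, s):
    """Var(w) / w_bar for the two-genotype population."""
    return s * s * p * (1 - p) / (1 + p * s)


# Hypothetical hosts: nearly uniform founders versus an even mixture of two genotypes.
for p in (0.001, 0.5):
    print(p, mean_fitness_gain(p, 0.1), variance_over_mean(p, 0.1))
```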

The island structure of parasite populations resembles the genetic structure of multicellular organisms when taking account of selection within individuals. Each new organism begins as a single cell or, in some clonal organisms, as a small number of progenitor cells. The individual develops as a population of cells, with the potential for selection between cellular lineages that vary genetically. Genetic variation may arise from the small number of progenitor cells or from de novo mutations. The individual transmits some of its cells to form new bodies (islands). Eventually, the individual dies.

There is some general theory on the population genetics of mutation and selection within individuals (Slatkin 1984; Buss 1987; Orive 1995; Michod 1997; Otto and Hastings 1998). Levin and Bull (1994) discussed how selection within and between hosts can shape patterns of parasite life history (reviewed by Frank 1996). But there has been little work on the consequences of island population structure for antigenic variation. Hastings and Wedgwood-Oppenheim (1997) illustrated how a quantitative theory of island-model genetics can be used to understand the buildup or decay of linkage disequilibrium.

I found one study that develops the theory of island population structure for parasites.

Genetic Variation of HIV within Individual Hosts

Rouzine and Coffin (1999) sought to explain the high genetic diversity of HIV within hosts. They developed the theory of island population structure for parasites to compare the relative strengths of natural selection and stochastic processes that can cause genetic variability.

Rouzine and Coffin (1999) focused on the pro gene, which encodes a protease that processes other HIV gene products. Analysis of nucleotide sequences for this particular gene suggested that natural selection acts primarily in a purifying way to remove deleterious mutations. Consequently, their model describes the accumulation of nucleotide diversity shaped by two opposing forces. On the one hand, stochastic effects occur because only a small number of viruses invade each host, the founders of that island. Stochastic drift during colonization allows deleterious mutations to rise in frequency. On the other hand, purifying selection within hosts removes deleterious mutations. How do these opposing forces play out in the island structure of hosts?

If each new host is colonized by viruses from a single donor host, then the founding population tends to have limited genetic diversity. Low diversity causes natural selection to be weak because there is not much opportunity for competition between genetic variants. Only new mutations that arise within the host provide an opportunity to replace deleterious mutations by genetic variants that restore full fitness.

With colonization from a single donor host, the viruses in each host share a lineage of descent that is isolated from the viruses in other hosts. Isolated lineages and bottlenecks in viral numbers that occur during transmission allow the accumulation of deleterious genetic variation by drift.
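The effect of transmission bottlenecks can be sketched with binomial sampling, assuming that each founder is drawn independently from a donor in which a deleterious variant has frequency p and that no selection acts during transmission. The founding frequency keeps expectation p, but its variance is p(1 - p)/N for N founders, and the chance that the new host starts with only the deleterious variant is p^N. With a single founder, a variant at 10% in the donor takes over the new host one time in ten; with ten founders it almost never does. The function names are illustrative.

```python
def prob_all_founders_deleterious(p, n_founders):
    """Chance that every one of n_founders carries the deleterious variant."""
    return p ** n_founders


def founder_frequency_variance(p, n_founders):
    """Variance of the deleterious frequency among n_founders (binomial sampling)."""
    return p * (1 - p) / n_founders


# Hypothetical donor frequency of 10% and different bottleneck sizes.
for n in (1, 2, 10):
    print(n, prob_all_founders_deleterious(0.1, n), founder_frequency_variance(0.1, n))
```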

Coinfection from different donor hosts mixes lineages, increases genetic variation within hosts, and greatly enhances the power of natural selection to remove deleterious variants. Rouzine and Coffin (1999) estimate that a coinfection frequency higher than 1% provides sufficiently strong selection within hosts to reduce the level of genetic variation relative to the amount of variation that accumulates by drift in isolated lineages.

If coinfection occurs more commonly than 1%, as Rouzine and Coffin (1999) believe to be likely, then some other process must explain the high levels of genetic variability observed. Rouzine and Coffin (1999) discuss an interesting type of selection that purifies within hosts but diversifies between hosts. According to their model, purifying selection within hosts removes T cell epitopes to avoid host immunity. MHC type varies between hosts, causing different T cell epitopes to be recognized by different hosts. Thus, selection diversifies the viral population between hosts, because each host favors the loss of the particular epitopes that its own MHC type presents. Purifying selection within hosts and diversifying selection between hosts may account for the apparently paradoxical observations: nucleotide substitutions leave the signature of purifying selection, yet the viral population maintains significant genetic diversity.

Very few studies have considered how the island population structure of parasites influences the distribution of genetic diversity. As more sequences accumulate, there will be greater opportunity to match the observed patterns to the combined stochastic and selective processes that shape parasite diversity.

10.6. Problems for Future Research

1. Statistical inference

Patterns of genetic structure must be interpreted with regard to alternative models. For example, Table 10.1 shows linkage between two antigenic loci of Neisseria meningitidis. I mentioned three hypotheses to explain those data: immune selection against recombinants, epidemics, and linkage with favored alleles at other loci.

Each hypothesis leads to a model dependent on several parameters. For example, the rarity of recombinant genotypes under immune selection depends on the distribution of immune profiles in hosts, the intensity of selection against the recombinant genotypes, and the frequency of recombination.
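To see how two of these parameters interact, a minimal deterministic sketch assumes a haploid parasite, random mixing at an effective recombination rate r, and a constant fitness penalty s against the two recombinant genotypes A1/B2 and A2/B1. The constant penalty is a simplification: immune selection in real host populations is likely to be frequency dependent and to vary with host immune profiles. Starting from only the two nonoverlapping genotypes, the recombinant frequency settles near r/(2s) when r is much smaller than s. The function names and parameter values are hypothetical.

```python
def one_generation(freqs, r, s):
    """Selection against A1/B2 and A2/B1 (fitness 1 - s), then recombination at rate r.

    freqs = (x11, x12, x21, x22) for genotypes A1/B1, A1/B2, A2/B1, A2/B2.
    """
    fitness = (1.0, 1.0 - s, 1.0 - s, 1.0)
    weighted = [x * w for x, w in zip(freqs, fitness)]
    w_bar = sum(weighted)
    x11, x12, x21, x22 = (x / w_bar for x in weighted)
    d = x11 * x22 - x12 * x21  # linkage disequilibrium after selection
    return (x11 - r * d, x12 + r * d, x21 + r * d, x22 - r * d)


def recombinant_frequency(r, s, generations=10_000):
    """Frequency of A1/B2 plus A2/B1 after many generations, starting from A1/B1 and A2/B2 only."""
    freqs = (0.5, 0.0, 0.0, 0.5)
    for _ in range(generations):
        freqs = one_generation(freqs, r, s)
    return freqs[1] + freqs[2]


for s in (0.1, 0.5, 0.9):
    print(f"s = {s}: recombinants at {recombinant_frequency(0.01, s):.4f}")
```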

To determine if an observed pattern favors one model over another, one must understand the range of outcomes likely to follow from each model. This requires mathematical development to calculate the predicted outcomes from the different models. Then one must design sampling schemes to obtain data that can differentiate between the models. Theoretical analysis of sampling schemes can compare the information in different sampling procedures with regard to the alternative processes under study.
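One simple way to compare sampling schemes before collecting data is to simulate them. The sketch below assumes isolates are drawn from a single well-mixed population with given two-locus genotype frequencies, applies a Pearson chi-square test of independence to each simulated 2x2 table, and reports the fraction of samples in which the test detects the association (its power). The frequencies and sample sizes are hypothetical, and a real design would compare the alternative models rather than a single test.

```python
import math
import random


def sample_counts(freqs, n, rng):
    """Draw n haploid isolates; freqs = (x11, x12, x21, x22) for A1/B1, A1/B2, A2/B1, A2/B2."""
    counts = [0, 0, 0, 0]
    for g in rng.choices(range(4), weights=freqs, k=n):
        counts[g] += 1
    return counts


def chi_square_p_value(counts):
    """Pearson chi-square test of independence for the 2x2 table (1 df, no continuity correction)."""
    a, b, c, d = counts
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 1.0  # a locus is monomorphic in the sample, so no test is possible
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def power(freqs, n, alpha=0.05, reps=1000, seed=1):
    """Fraction of simulated samples of size n in which the test rejects independence."""
    rng = random.Random(seed)
    rejections = sum(chi_square_p_value(sample_counts(freqs, n, rng)) < alpha for _ in range(reps))
    return rejections / reps


# Hypothetical moderate association (r^2 = 0.04) sampled at different sizes.
for n in (20, 100, 500):
    print(n, power((0.3, 0.2, 0.2, 0.3), n))
```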

Technical advances will continue to improve the rate at which samples can be processed and analyzed. Improved technical facilities will allow designed sampling procedures and hypothesis testing.

2. Scale-dependent population structure

Sampling over different distances will often reveal a hierarchy of scale-dependent processes that depend on the epidemiology and demography of the parasite. It may be common to find spatial isolation at longer scales, mixing in dense aggregations at local scales, and occasional swaths of genome-wide linkage at varying scales caused by population bottlenecks or the rapid spread of epidemic strains. The relative scaling of these processes will differ greatly among parasites.

3. Different phylogenetic histories of genomic components

Very intense selection on antigenic loci can occur in parasites. This focused selection can cause different components of the genome to have different genetic structures and phylogenetic histories. I briefly mention one example to provide hints about what may happen and to encourage further work.

I described earlier in this chapter the example of influenza. In that case, Lindstrom et al. (1998) found that the two antigenic segments, hemagglutinin and neuraminidase, cotransmitted in an epidemic fashion over five years of samples. By contrast, the other six segments appeared to mix their lineages relative to the single line of cotransmitted antigenic segments. Thus, epidemically bound linkage groups may occur against a mixing genetic background. More data of this sort might show different genomic components changing their population structures relative to each other over different temporal and spatial scales. Such data could provide insight into the scale-dependent effects of demographic, genetic, and selective processes.

4. Population bottlenecks and genomic diversity

Rich et al. (2000) argue that all of the very diverse antigenic variants of Plasmodium falciparum have arisen since a recent population bottleneck that occurred less than fifty thousand years ago. Variant alleles at antigenic loci appear to trace their phylogenetic history back to common ancestors more recent than the putative bottleneck event. This pattern suggests intense natural selection favoring novel diversity at antigenic sites against a background of low genome-wide diversity caused by a recent bottleneck.

Alternatively, the antigenic variants could trace their history back to ancestors that predated the bottleneck (Hughes 1992; Hughes and Hughes 1995; Hughes and Verra 2001). This pattern arises when natural selection strongly favors rare variant antigens, holding diverse antigens in the population through the bottleneck that reduced variation in the rest of the genome. Ancient polymorphisms of this sort suggest that natural selection preserves existing variants rather than favors de novo generation of new variants (Ayala 1995; O'hUigin et al. 2000).

A recent, more detailed study by Volkman et al. (2001) estimates that the most recent common ancestor of P. falciparum lived less than ten thousand years ago. If this estimate applies to the var genes as well as the loci studied by Volkman et al. (2001), then the diverse var family of antigenic variants must have evolved very rapidly. Further studies of different genomic regions will contribute to understanding the speed of diversification in the var archival library.

5. Island structure

Many classical genetic models develop the island structure for populations (Wright 1978). However, those general studies of migration, selection, and stochastic perturbation provide little guidance for the genetic structure of parasites. Studies for parasites must account for the density and variability of host immune memory, the longevity of infections, the genetic diversity of inocula, and the patterns of genetic mixing between parasites.

Much insight can be gained by island models focused specifically on the special biology of parasites (Hastings and Wedgwood-Oppenheim 1997). Rouzine and Coffin's (1999) study shows how a clear model of population genetic process can lead to predictions about the expected patterns in the data. This suggests how one could couple process-oriented theory with the problem of statistical inference.

Copyright © 2002, Steven A Frank.
Bookshelf ID: NBK2398