NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Griffiths AJF, Miller JH, Suzuki DT, et al. An Introduction to Genetic Analysis. 7th edition. New York: W. H. Freeman; 2000.

  • By agreement with the publisher, this book is accessible by the search feature, but cannot be browsed.

An Introduction to Genetic Analysis. 7th edition.


Variation and its modulation

Population genetics is both an experimental and a theoretical science. On the experimental side, it provides descriptions of the actual patterns of genetic variation in populations and estimates the parameters of the processes of mating, mutation, natural selection, and random variation in reproductive rates. On the theoretical side, it makes predictions of what the genetic composition of populations should be and how they can be expected to change as a consequence of the various forces operating on them.


Population genetics is the experimental and theoretical study of the pattern of inherited variation in populations and its modulation in time and space.

Observations of variation

Population genetics necessarily deals with genotypic variation, but, by definition, only phenotypic variation can be observed. The relation between phenotype and genotype varies in simplicity from character to character. At one extreme, the phenotype may be the observed DNA sequence of a stretch of the genome. In this case, the distinction between genotype and phenotype disappears, and we can say that we are, in fact, directly observing the genotype. At the other extreme lie the bulk of characters of interest to plant and animal breeders and to most evolutionists: the variations in yield, growth rate, body shape, metabolic ratio, and behavior that constitute the obvious differences between varieties and species. These characters have a very complex relation to genotype, and we must use the methods introduced in Chapter 25 to say anything at all about the genotypes. But, as we shall see in Chapter 25, it is not possible to make very precise statements about the genotypic variation underlying quantitative characters. For that reason, most of the study of experimental population genetics has concentrated on characters with simple relations to the genotype, much like the characters studied by Mendel. A favorite object of study for human population geneticists, for example, has been the various human blood groups. The qualitatively distinct phenotypes of a given blood group (say, the MN group) are encoded by alternative alleles at a single locus, and the phenotypes are insensitive to environmental variations.

The study of variation, then, consists of two stages. The first is a description of the phenotypic variation. The second is a translation of these phenotypes into genetic terms and the redescription of the variation genetically. If there is a perfect one-to-one correspondence between genotype and phenotype, then these two steps merge into one, as in the MN blood group. If the relation is more complex—for example, as the result of dominance, heterozygotes resemble homozygotes—it may be necessary to carry out experimental crosses or to observe pedigrees to translate phenotypes into genotypes. This is the case for the human ABO blood group (see page 110).

The simplest description of Mendelian variation is the frequency distribution of genotypes in a population. Table 24-1 shows the frequency distribution of the three genotypes at the MN blood group locus in several human populations. Note that there is variation between individuals in each population, because there are different genotypes present, and there is variation in the frequencies of these genotypes from population to population. More typically, instead of the frequencies of the diploid genotypes, the frequencies of the alternative alleles are used. The frequency of an allele is simply the proportion of that allelic form of the gene among all copies of the gene in the population. There are twice as many gene copies in the population as there are individuals, because every individual is diploid: homozygotes for an allele have two copies of that allele, whereas heterozygotes have only one copy. So we calculate the frequency of an allele by counting homozygotes and adding half the heterozygotes. Thus, if the frequency of A/A individuals were, say, 0.36 and the frequency of A/a individuals were 0.48, the allele frequency of A would be 0.36 + 0.48/2 = 0.60. Box 24-1 gives the general form of this calculation. Table 24-1 shows the values of p and q, the gene frequencies or allele frequencies of the two alleles in the different populations.

Table 24-1. Frequencies of Genotypes for Alleles at MN Blood Group Locus in Various Human Populations.



Box 24-1

Calculation of Allele Frequency. If f_{A/A}, f_{A/a}, and f_{a/a} are the proportions of the three genotypes at a locus with two alleles, then the frequency p(A) of the A allele and the frequency q(a) of the a allele are obtained by counting alleles. (more...)
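The allele-counting rule described in the text (homozygotes plus half the heterozygotes) can be sketched in a few lines of Python. This is only an illustration: the function name `allele_frequencies` is hypothetical, and the a/a proportion of 0.16 is an assumption implied by the text's example (1 − 0.36 − 0.48), not a value stated in it.

```python
def allele_frequencies(f_AA, f_Aa, f_aa):
    """Return (p, q), the frequencies of alleles A and a, from the
    proportions of the three genotypes A/A, A/a, and a/a."""
    total = f_AA + f_Aa + f_aa
    if abs(total - 1.0) > 1e-9:
        raise ValueError("genotype proportions must sum to 1")
    # Each homozygote carries two copies of its allele, each heterozygote one,
    # so an allele's frequency is its homozygotes plus half the heterozygotes.
    p = f_AA + f_Aa / 2
    q = f_aa + f_Aa / 2
    return p, q


# Worked example from the text: A/A = 0.36, A/a = 0.48 (a/a = 0.16 is implied).
p, q = allele_frequencies(0.36, 0.48, 0.16)
print(round(p, 2), round(q, 2))
```

Because the three genotype proportions sum to 1, p and q also sum to 1, so in practice only one of the two needs to be computed.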

A measure of genetic variation (in contrast with its description by gene frequencies) is the amount of heterozygosity at a locus in a population, which is given by the total frequency of heterozygotes at a locus. If one allele is in very high frequency and all others are near zero, then there will be very little heterozygosity because, by necessity, most individuals will be homozygous for the common allele. We expect heterozygosity to be greatest when there are many alleles at a locus, all at equal frequency. In Table 24-1, the heterozygosity is simply equal to the frequency of the M/N genotype in each population. When more than one locus is considered, there are two possible ways of calculating heterozygosity. Locus S (the secretor factor, determining whether the M and N proteins are also contained in the saliva) is closely linked to the MN locus in humans. Table 24-2 shows the frequencies, commonly symbolized by g’s, of the four haplotypes (M S, M s, N S, and N s) in various populations. First, we can calculate the frequency of heterozygotes at each locus separately. Alternatively, we can consider each haplotype as a unit, as in Table 24-2, and calculate the proportion of all individuals who carry two different haplotypic or gametic forms. This form of heterozygosity is also referred to as haplotype diversity or gametic diversity. The results of both calculations are given in Table 24-2. Note that the haplotype diversity is always greater than the average heterozygosity of the separate loci, because an individual is a haplotypic heterozygote if either of its loci is heterozygous. (See the discussion of the Hardy-Weinberg equilibrium in Box 24-2 on page 722 for the calculation of heterozygosity.)
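The two ways of calculating heterozygosity can be compared with a short Python sketch. It rests on an assumption the paragraph only points to: under random mating (Hardy-Weinberg proportions), the expected frequency of heterozygotes is 1 minus the sum of the squared allele (or haplotype) frequencies. The haplotype frequencies below are hypothetical values chosen for illustration and are not taken from Table 24-2.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygote frequency under random mating:
    the chance that two randomly drawn copies are different forms."""
    return 1.0 - sum(f * f for f in freqs)


# Hypothetical haplotype (gametic) frequencies g; NOT data from Table 24-2.
g = {"MS": 0.25, "Ms": 0.30, "NS": 0.10, "Ns": 0.35}

# Allele frequencies at each locus are sums over the haplotypes carrying them.
p_M = g["MS"] + g["Ms"]   # frequency of allele M at the MN locus
p_S = g["MS"] + g["NS"]   # frequency of allele S at the S locus

H_MN = expected_heterozygosity([p_M, 1 - p_M])   # MN locus alone
H_S = expected_heterozygosity([p_S, 1 - p_S])    # S locus alone
mean_locus_H = (H_MN + H_S) / 2                  # average over the two loci
H_hap = expected_heterozygosity(g.values())      # haplotype (gametic) diversity

print(H_MN, H_S, mean_locus_H, H_hap)
```

With these illustrative numbers the haplotype diversity exceeds the heterozygosity of either locus taken alone, which is the relation the paragraph describes.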

Table 24-2. Frequencies of Gametic Types for MNS System in Various Human Populations.



Box 24-2

Hardy-Weinberg Equilibrium. If the frequency of allele A is p in both the sperm and the eggs and the frequency of allele a is q = 1 − p, then the consequences of random unions of sperm and eggs are shown in the (more...)

Simple Mendelian variation can be observed within and between populations of any species at various levels of phenotype, from external morphology down to the amino acid sequence of enzymes and other proteins. Indeed, with the new methods of DNA sequencing, variations in DNA sequence (such as third-position variants that are not differentially coded in amino acid sequences and even variations in nontranslated intervening sequences) have been observed. Every species of organism ever examined has revealed considerable genetic variation, or polymorphism, manifested at one or more levels of phenotype, within populations, between populations, or both. A gene or a phenotypic trait is said to be polymorphic if there is more than one form of the gene or trait in a population. Genetic variation that might be the basis for evolutionary change is ubiquitous. The tasks for population geneticists are to describe that ubiquitous variation quantitatively and to build a theory of evolutionary change that can use these observations in prediction.

It is impossible in this text to provide an adequate picture of the immense richness of genetic variation that exists in species. We can consider only a few examples of the different kinds of Mendelian variation to gain a sense of the genetic diversity within species. Each of these examples can be multiplied many times over in other species and with other traits.

Morphologic variation.  

The shell of the land snail Cepaea nemoralis may be pink or yellow, depending on two alleles at a single locus, with pink dominant to yellow. In addition, the shell may be banded or unbanded (Figure 24-1) as a result of segregation at a second linked locus, with unbanded dominant to banded. Table 24-3 shows the variation of these two loci in several European colonies of the snail. The populations also show polymorphism for the number of bands and the height of the shells, but these characters have complex genetic bases.

Figure 24-1. Shell patterns of the snail Cepaea nemoralis: (a) banded yellow; (b) unbanded pink.


Table 24-3. Frequencies of Snails (Cepaea nemoralis) with Different Shell Colors and Banding Patterns in Three Populations in France.


Examples of naturally occurring morphologic variation within plant species are Plectritis (see Figure 1-14), Collinsia (blue-eyed Mary, page 58), and clover (see Figure 4-5).

Chromosomal polymorphism.  

Although the karyotype is often regarded as a distinctive characteristic of a species, in fact, numerous species are polymorphic for chromosome number and morphology. Extra chromosomes (supernumeraries), reciprocal translocations, and inversions segregate in many populations of plants, insects, and even mammals.

Table 24-4 gives the frequencies of supernumerary chromosomes and translocation heterozygotes in a population of the plant Clarkia elegans from California. The “typical” species karyotype would be hard to identify.

Table 24-4. Frequencies of Plants with Supernumerary Chromosomes and of Translocation Heterozygotes in a Population of Clarkia elegans from California.


Immunologic polymorphism.  

A number of loci in vertebrates encode antigenic specificities such as the ABO blood types. More than 40 different specificities on human red cells are known, and several hundred are known in cattle. Another major polymorphism in humans is the HLA system of cellular antigens, which are implicated in tissue graft compatibility. Table 24-5 gives the allelic frequencies for the ABO blood group locus in some very different human populations. The polymorphism for the HLA system is vastly greater. There appear to be two main loci, each with five distinguishable alleles. Thus, there are 5^2 = 25 different possible gametic types, making 25 different homozygous forms and (25)(24)/2 = 300 different heterozygotes. Not all genotypes are phenotypically distinguishable, however; so only 121 phenotypic classes can be seen. L. L. Cavalli-Sforza and W. F. Bodmer report that, in a sample of only 100 Europeans, 53 of the 121 possible phenotypes were actually observed!
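The counting argument for the HLA system generalizes to any number of loci and alleles. The following Python sketch reproduces the arithmetic; the function name `genotype_counts` is hypothetical, and the sketch counts genotypes only, since the number of distinguishable phenotypic classes (121 in the text) depends on which antigens can be told apart and cannot be derived this way.

```python
from math import comb


def genotype_counts(alleles_per_locus):
    """Count gametic types, homozygous genotypes, and heterozygous genotypes
    for a set of loci, given the number of alleles at each locus."""
    haplotypes = 1
    for n in alleles_per_locus:
        haplotypes *= n              # one allele chosen independently per locus
    homozygotes = haplotypes         # each gametic type paired with itself
    heterozygotes = comb(haplotypes, 2)  # unordered pairs of different types
    return haplotypes, homozygotes, heterozygotes


# Two loci with five distinguishable alleles each, as in the text.
print(genotype_counts([5, 5]))
```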

Table 24-5. Frequencies of the Alleles I^A, I^B, and i at the ABO Blood Group Locus in Various Human Populations.

Protein polymorphism.  

Studies of genetic polymorphism have been carried down to the level of the polypeptides encoded by the structural genes themselves. If there is a nonredundant codon change in a structural gene (say, GGU to GAU), the result is an amino acid substitution in the polypeptide produced at translation (in this case, aspartic acid is substituted for glycine). If a specific protein could be purified and sequenced from separate individuals, then it would be possible to detect genetic variation in a population at this level. In practice, such detection is tedious for large organisms and impossible for small ones unless a large mass of protein can be produced from a homozygous line.

There is, however, a practical substitute for sequencing that makes use of the change in the physical properties of a protein when an amino acid is substituted. Five amino acids (glutamic acid, aspartic acid, arginine, lysine, and histidine) have ionizable side chains that give a protein a characteristic net charge, depending on the pH of the surrounding medium. An amino acid substitution may directly replace one of these charged amino acids; a noncharged substitution near one of them in the polypeptide chain may affect the degree of ionization of the charged amino acid; or a substitution at the junction between two α helices may cause a slight shift in the three-dimensional packing of the folded polypeptide. In all these cases, the net charge on the polypeptide will be altered, because the net charge on a protein is not simply the sum of all the individual charges on its amino acids but depends on their exposure to the liquid medium surrounding them.

To detect the change in net charge, protein can be subjected to gel electrophoresis. Figure 24-2 shows the outcome of such an electrophoretic separation of variants of an esterase enzyme in Drosophila pseudoobscura, where each track is the protein from a different individual fly. Figure 24-3 shows a similar gel for different variant human hemoglobins. In this case, most individuals are heterozygous for the variant and normal hemoglobin A. Table 24-6 shows the frequencies of different alleles for three enzyme-encoding loci in D. pseudoobscura in several populations: a nearly monomorphic locus (malic dehydrogenase), a moderately polymorphic locus (α-amylase), and a highly polymorphic locus (xanthine dehydrogenase).

Figure 24-2. Electrophoretic gel showing homozygotes for three different alleles at the esterase-5 locus in Drosophila pseudoobscura. Repeated samples of the same allele are identical, but there are repeatable differences between alleles.

Figure 24-3. Electrophoretic gel showing heterozygotes of normal hemoglobin A and a number of variant hemoglobin alleles. One of the dark-staining bands is marked as hemoglobin A. The second dark-staining band in each track (seen most clearly in tracks 3 and 4) represents (more...)

Table 24-6. Frequencies of Various Alleles at Three Enzyme-Encoding Loci in Four Populations of Drosophila pseudoobscura.


The technique of gel electrophoresis (or sequencing) differs fundamentally from other methods of genetic analysis in allowing the study of loci that are not segregating, because the presence of a polypeptide is prima facie evidence of a structural gene—that is, a DNA sequence encoding a protein. Thus, it has been possible to ask what proportion of all structural genes in the genome of a species is polymorphic and what the average heterozygosity is in a population. Very large numbers of species have been sampled by this method, including bacteria, fungi, higher plants, vertebrates, and invertebrates. The results are remarkably consistent over species. About one-third of structural-gene loci are polymorphic, and the average heterozygosity in a population over all loci sampled is about 10 percent. This means that scanning the genome in virtually any species would show that about 1 in every 10 loci is in heterozygous condition and that about one-third of all loci have two or more alleles segregating in any population. Thus the potential of variation for evolution is immense. The disadvantage of the electrophoretic technique is that it detects variation only in structural genes. If most of the evolution of shape, physiology, and behavior rests on changes in regulatory genetic elements, then the observed variation in structural genes is beside the point.

DNA sequence polymorphism

DNA analysis makes it possible to examine variation among individuals and between species in their DNA sequences. There are two levels at which such studies can be done. Studying variation in the sites recognized by restriction enzymes provides a coarse view of base-pair variation. At a finer level, methods of DNA sequencing allow variation to be observed base pair by base pair.

Restriction-site variation.  

A restriction enzyme that recognizes six-base sequences (a “six cutter”) will recognize an appropriate sequence approximately once every 4^6 = 4096 base pairs along a DNA molecule [determined from the probability that a specific base (of which there are four) will be found at each of the six positions]. If there is polymorphism in the population for one of the six bases at the recognition site, then there will be a restriction fragment length polymorphism (RFLP) in the population, because in one variant the enzyme will recognize and cut the DNA, whereas in the other variant it will not (see pages 398–399). A panel of, say, eight enzymes will then sample every 4096/8 ≈ 500 base pairs for such polymorphisms. However, when one is found, we do not know which of the six base pairs at the recognition site is polymorphic.

If we use enzymes that recognize four-base sequences (“four cutters”), there is a recognition site every 4^4 = 256 base pairs; so a panel of eight different enzymes can sample about once every 32 base pairs along the DNA. In addition to single base-pair changes that destroy restriction-enzyme recognition sites, there are insertions and deletions of stretches of DNA that also cause restriction fragment lengths to vary.
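The sampling densities quoted for six cutters and four cutters follow from one formula, sketched here in Python. The function name `expected_site_spacing` is hypothetical, and the calculation makes simplifying assumptions: all four bases are equally frequent, positions are independent, and the enzymes in a panel recognize different, non-overlapping sites.

```python
def expected_site_spacing(recognition_length, n_enzymes=1):
    """Expected number of base pairs between recognition sites, assuming
    equal base frequencies and enzymes with distinct recognition sequences."""
    # A specific sequence of length n occurs with probability (1/4)^n per
    # position, so one site is expected every 4^n base pairs per enzyme.
    return 4 ** recognition_length / n_enzymes


print(expected_site_spacing(6))      # one six cutter
print(expected_site_spacing(6, 8))   # panel of eight six cutters
print(expected_site_spacing(4))      # one four cutter
print(expected_site_spacing(4, 8))   # panel of eight four cutters
```

The eight-enzyme panel of six cutters gives 512 base pairs, which the text rounds to about 500.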

Extensive samples have been made for several regions of the genome in a number of species of Drosophila with the use of both four-cutting and six-cutting enzymes. The result of one such study of the xanthine dehydrogenase gene in Drosophila pseudoobscura is shown in Figure 24-4. The figure shows, symbolically, the restriction pattern of 53 chromosomes (haplotypes) sampled from nature, polymorphic for 78 restriction sites along a sequence 4.5 kb in length. Among the 53 haplotypes, there are 48 different ones. (Try to find the identical pairs.) Clearly there is an immense amount of nucleotide variation at the xanthine dehydrogenase locus in nature.

Figure 24-4. The result of a four-cutter survey of 53 chromosomes, probed for the xanthine dehydrogenase gene in Drosophila pseudoobscura. Each line is a chromosome (haplotype) sampled from a natural population. Each position along the line is a polymorphic restriction (more...)

Twenty restriction-enzyme studies of different regions of the X chromosome and the two large autosomes of Drosophila melanogaster have found between 0.1 and 1.0 percent heterozygosity per nucleotide site, with an average of 0.4 percent. A study of the very small fourth chromosome, however, found no polymorphism at all.

Tandem repeats

Another form of DNA sequence variation that can be revealed by restriction fragment surveys arises from the occurrence of multiply repeated DNA sequences. In the human genome, there are a variety of different short DNA sequences dispersed throughout the genome, each one of which is multiply repeated in a tandem row. The number of repeats may vary from a dozen to more than 100 in different individual genomes. Such sequences are known as variable number tandem repeats (VNTRs). If the restriction enzymes cut sequences that flank either side of such a tandem array, a fragment will be produced whose size is proportional to the number of repeated elements. The different-sized fragments will migrate at different rates in an electrophoretic gel. Unfortunately, the individual elements are too short to allow distinguishing between, say, 64 and 68 repeats, but size classes (bins) can be established, and a population can be assayed for the frequencies of the different classes. Table 24-7 shows the data for two different VNTRs sampled in two American Indian groups from Brazil. In one case, D14S1, the Karitiana are nearly homozygous, whereas the Surui are very variable; in the other case, D14S13, both populations are variable but with different frequency patterns.

Table 24-7. Size Class Frequencies for Two Different VNTR Sequences, D14S1 and D14S13, in the Karitiana and Surui of Brazil.


Complete sequence variation.  

Studies of variation at the level of single base pairs by DNA sequencing can provide information of two kinds. First, translating the sequences of coding regions obtained from different individuals in a population or from different species allows the exact amino acid sequence differences to be determined. Electrophoretic studies can show that there is variation in amino acid sequences but cannot identify how many or which amino acids differ between individuals. So, when DNA sequences were obtained for the various electrophoretic variants of esterase-5 in Drosophila pseudoobscura (see Figure 24-2), electrophoretic classes were found to differ from each other by an average of 8 amino acids, and the 20 different kinds of amino acids were involved in polymorphisms at about the frequency that they were present in the protein. Such studies also show that different regions of the same protein have different amounts of polymorphism. For the esterase-5 protein, consisting of 545 amino acids, 7 percent of amino acid positions are polymorphic, but the last amino acids at the carboxyl terminus of the protein are totally invariant between individuals, probably because of functional constraints on these amino acids.

Second, DNA base-pair variation can also be studied for those base pairs that do not determine or change the protein sequence. Such base-pair variation can be found in DNA in introns, in 5′-flanking sequences that may be regulatory, in nontranscribed DNA 3′ to the gene, and in those nucleotide positions within codons (usually third positions) whose variation does not result in amino acid substitutions. Within coding sequences, these so-called silent or synonymous base-pair polymorphisms are much more common than are changes that result in amino acid polymorphism, presumably because many amino acid changes interfere with normal function of the protein and are eliminated by natural selection. An examination of the codon translation table (see Figure 10-27) shows that approximately 25 percent of all random base-pair changes would be synonymous, giving an alternative codon for the same amino acid, whereas 75 percent of random changes would change the amino acid coded. For example, a change from AAT to AAC still encodes asparagine, but a change to ATT, ACT, AAA, AAG, AGT, TAT, CAT, or GAT, all single-base-pair changes from AAT, changes the amino acid encoded. So, if mutations of base pairs are at random and if the substitution of an amino acid made no difference to function, we would expect a 3:1 ratio of amino acid replacement to silent polymorphisms. The actual ratios found in Drosophila vary from 2:1 to 1:10. Clearly, there is a great excess of synonymous polymorphism, showing that most amino acid changes are subject to natural selection. It should not be assumed, however, that silent sites in coding sequences are entirely free from constraints. Different alternative triplet codings for the same amino acid may differ in speed and accuracy of transcription, and the mRNA corresponding to different alternative triplets may have different accuracy and speed of translation because of limitations on the pool of tRNAs available. 
Evidence for the latter effect is that alternative synonymous triplets for an amino acid are not used equally, and the inequality of use is much more pronounced for genes that are transcribed at a very high rate.
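The figure of roughly 25 percent synonymous changes can be checked by enumerating every single-base change in every sense codon. The Python sketch below does this under stated assumptions: it uses the standard genetic code (written in DNA letters, in the conventional T, C, A, G order), weights all codons and all base changes equally, and tallies changes to a stop codon separately from amino acid replacements. The helper names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def single_base_neighbors(codon):
    """Yield the nine codons that differ from `codon` at exactly one position."""
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def classify_changes():
    """Tally all single-base changes starting from the 61 sense codons."""
    counts = {"synonymous": 0, "replacement": 0, "stop": 0}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # stop codons are not counted as starting points
        for neighbor in single_base_neighbors(codon):
            new_aa = CODON_TABLE[neighbor]
            if new_aa == aa:
                counts["synonymous"] += 1
            elif new_aa == "*":
                counts["stop"] += 1
            else:
                counts["replacement"] += 1
    return counts


counts = classify_changes()
total = sum(counts.values())
print(counts, round(counts["synonymous"] / total, 3))

# The AAT example from the text: only AAC keeps asparagine (N).
print([n for n in single_base_neighbors("AAT") if CODON_TABLE[n] == "N"])
```

Under these assumptions the tally is 134 synonymous changes out of 549, about 24 percent, in line with the approximate 25 percent quoted in the text. Real mutation spectra and codon usage are not uniform, so this is a baseline expectation rather than a prediction for any particular gene.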

There are also constraints on 5′ and 3′ noncoding sequences and on intron sequences. Both 5′ and 3′ noncoding DNA sequences contain signals for transcription, and introns may contain enhancers of transcription (see Chapter 11).


Within species, there is great genetic variation. This variation is manifest at the morphologic level of chromosome form and number and at the level of DNA segments that may have no observable developmental effects.

Quantitative variation

Not all variation in traits can be described in terms of allelic frequencies, because many characteristics, such as height, vary continuously over a range rather than falling into a few qualitatively distinct classes. There is no allele for being 5′8″ or 5′4″ tall. Such characters, if they are varying as a consequence of genetic variation, will be affected by several or many genes and by environmental variation as well. Special techniques are needed for the study of such quantitative traits, and these techniques are presented in Chapter 25. For the moment, we confine ourselves to the question of whether genetic differences between individuals affect the trait at all. In experimental organisms, a simple way to answer this question is to choose two groups of parents that differ markedly in the trait and to raise offspring from both groups in the same environment. If the offspring of the two groups are different, then the trait is said to be heritable (see Chapter 25 for a more detailed discussion of the concept and estimation of heritability). A simple measure of the degree of heritability of the variation is the ratio of the difference between the offspring groups to the difference between the parental groups. So, if two groups of Drosophila parents differed by, say, 0.1 mg in weight, whereas the offspring groups, raised in identical environments, differed by 0.03 mg, the heritability of weight difference would be estimated as 30 percent. When this technique is applied to morphologic variation in Drosophila, virtually every variable trait is found to have some heritability. It is important to note that this method cannot be applied to organisms for which no rigorous control over developmental environment is possible. In humans, for example, children of different parental groups differ from one another not only because their genetic makeup is different, but also because the environments of different families, social classes, and nations are different. 
Japanese are, on the average, shorter than Europeans, but the difference between children of Japanese ancestry and children of European ancestry, both born in North America, is less and becomes even less in the second generation, presumably because of diet. It is not clear whether all the differences in height would disappear or even be reversed if the family environments were identical.
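The simple heritability measure described above, the ratio of the offspring-group difference to the parental-group difference, is easy to express in code. This Python sketch only restates the text's arithmetic; the function name is hypothetical, and the estimate is meaningful only when both offspring groups are raised in the same controlled environment, as the text stresses.

```python
def heritability_from_group_differences(parent_diff, offspring_diff):
    """Ratio of the difference between offspring groups to the difference
    between the parental groups from which they were bred."""
    if parent_diff == 0:
        raise ValueError("parental groups must differ in the trait")
    return offspring_diff / parent_diff


# Drosophila weight example from the text: parents differ by 0.1 mg,
# offspring raised in identical environments differ by 0.03 mg.
h = heritability_from_group_differences(0.1, 0.03)
print(f"{h:.0%}")
```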



Copyright © 2000, W. H. Freeman and Company.
Bookshelf ID: NBK22048

