NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Griffiths AJF, Miller JH, Suzuki DT, et al. An Introduction to Genetic Analysis. 7th edition. New York: W. H. Freeman; 2000.

  • By agreement with the publisher, this book is accessible by the search feature, but cannot be browsed.
An Introduction to Genetic Analysis. 7th edition.

Accurate calculation of large map distances

In Chapter 5, we learned that the basic genetic method of measuring map distance is based on recombinant frequency (RF). A genetic map unit (m.u.) was defined as a recombinant frequency of 1 percent. This is a useful fundamental unit that has stood the test of time and is still used in genetics. However, the larger the recombinant frequency, the less accurate it is as a measure of map distance. In fact, a map distance calculated directly from a large recombinant frequency is smaller than the distance obtained by summing recombinant frequencies measured over shorter subintervals of the same region. We encountered this effect in examples in Chapter 5. Typically, when measuring recombination between three linked loci, the sum of the two internal recombinant frequencies is greater than the recombinant frequency between the outside loci. With the use of such data, what is the most accurate estimate that can be made of map distance between the two outside loci A and C in the following diagram?

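[Diagram: three linked loci in the order A, B, C, where x is the map distance from A to B and y is the map distance from B to C.]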

The answer is that x + y is the best estimate, and more accurate than the smaller overall AC value. This gives us the following useful mapping principle:


The best estimates of map distance are obtained from the sum of the distances calculated for shorter subintervals.

However, what if we have no intervening marker loci available to measure recombination in shorter intervals? Such a situation would be commonly encountered when beginning to map a new experimental organism or in cases in which the genome is huge, as it is in human beings. For example, in the preceding diagram, what if there were no known B locus? Would we have to make do with the map distance value obtained directly from the AC recombinant frequency? Furthermore, what about the shorter intervals themselves? If there were other loci between A and B and between B and C, then we might obtain even better estimates of the A to C distance. Luckily, there is a way of taking any recombinant frequency and performing a calculation to make it a more accurate measure of map distance, without studying shorter and shorter intervals.

Before we consider the calculation, let’s think about the reason why larger RF values are less accurate measures of map distance. We have already encountered the culprit: multiple crossovers. In Chapter 5, we learned that double crossovers often lead to a parental arrangement of alleles and therefore the resulting meiotic products are not counted when measuring recombinant frequency. The same is true for other types of multiple crossovers: triples, quadruples, and so forth. So it is easy to see that multiple crossovers automatically lead to an underestimate of map distance, and, because the multiples are expected to be relatively more common over longer regions, we can see why the problem is worse for larger recombinant frequencies.
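The effect of double crossovers on a three-locus cross can be illustrated with a short calculation. The sketch below assumes that recombination in the A to B interval and in the B to C interval occur independently (no interference); the function name and the values x = 0.10 and y = 0.20 are illustrative choices, not data from the text. Under that assumption, a product is recombinant for A and C only when it is recombinant in exactly one of the two intervals.

```python
def outer_rf(x: float, y: float) -> float:
    """RF between the outer loci A and C.

    x is the RF for the A-B interval, y the RF for the B-C interval.
    Assumes the two intervals recombine independently (no interference),
    so products recombinant in BOTH intervals look parental for A and C.
    """
    return x * (1 - y) + y * (1 - x)


x, y = 0.10, 0.20  # illustrative values only
rf_ac = outer_rf(x, y)

print(f"x + y                        = {x + y:.3f}")
print(f"RF measured between A and C  = {rf_ac:.3f}")
print(f"hidden by double crossovers  = {x + y - rf_ac:.3f}")
```

The shortfall equals 2xy, so it grows quickly as the intervals get longer, which matches the observation that the problem is worse for larger recombinant frequencies.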

How can we take these multiple crossovers into account when calculating the map distances? What we need is a mathematical function that accurately relates recombination to map distance. In other words, what we need is a mapping function.


A mapping function is a formula for using recombinant frequencies to calculate map distances corrected for multiple crossover products.

Poisson distribution

To derive a mapping function, we need a mathematical tool widely used in genetic analysis because it is useful in describing many different types of genetic processes. This mathematical tool is the Poisson distribution. A distribution is merely a description of the frequencies of the different types of classes that arise from sampling. The Poisson distribution describes the frequency of classes containing 0, 1, 2, 3, 4, . . . , i items when the average number of items per sample is known. The Poisson distribution is particularly useful when the average is small in relation to the total number of items possible. For example, the possible number of tadpoles obtainable in a single dip of a net in a pond is quite large, but most dips yield only one or two or none. The number of dead birds on the side of a highway is potentially very large, but in a sample mile the number is usually small. Such samplings are described well by the Poisson distribution.

Let’s consider a numerical example. Suppose that we randomly distribute 100 one-dollar bills to 100 students in a lecture room, perhaps by scattering them over the class from some point near the ceiling. The average (or mean) number of bills per student is 1.0, but common sense tells us that it is very unlikely that each of the 100 students will capture one bill. We would expect a few lucky students to grab three or four bills each and quite a few students to come up with two bills each. However, we would expect most students to get either one bill or none. The Poisson distribution provides a quantitative prediction of the results.

In this example, the item being considered is the capture of a bill by a student. We want to divide the students into classes according to the number of bills each captures and then find the frequency of each class. Let m represent the mean number of items (here, m = 1.0 bill per student). Let i represent the number for a particular class (say, i = 3 for those students who get three bills each). Let f(i) represent the frequency of the i class—that is, the proportion of the 100 students who each capture i bills. The general expression for the Poisson distribution states that

$$f(i) = \frac{e^{-m}\, m^{i}}{i!}$$

where e is the base of natural logarithms (e is approximately 2.718) and ! is the factorial symbol. As examples, 3! = 3 × 2 × 1 = 6 and 4! = 4 × 3 × 2 × 1 = 24. By definition, 0! = 1. When computing f(0), recall that any nonzero number raised to the power of 0 is defined as 1. Table 6-1 gives values of $e^{-m}$ for m values from 0.000 to 1.000. Values for m greater than 1 can be obtained by calculation.

Table 6-1. Values of $e^{-m}$ for m values of 0 to 1.

In our example, m = 1.0. Using Table 6-1, we compute the frequencies of the classes of students capturing 0, 1, 2, 3, and 4 bills as follows:

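$$
\begin{aligned}
f(0) &= \frac{e^{-1}\,1^{0}}{0!} = \frac{e^{-1}}{1} = 0.368 \\
f(1) &= \frac{e^{-1}\,1^{1}}{1!} = \frac{e^{-1}}{1} = 0.368 \\
f(2) &= \frac{e^{-1}\,1^{2}}{2!} = \frac{e^{-1}}{2} = 0.184 \\
f(3) &= \frac{e^{-1}\,1^{3}}{3!} = \frac{e^{-1}}{6} = 0.061 \\
f(4) &= \frac{e^{-1}\,1^{4}}{4!} = \frac{e^{-1}}{24} = 0.015
\end{aligned}
$$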

Figure 6-1 is a histogram of this distribution. We predict that about 37 students will capture no bills, about 37 will capture one bill, about 18 will capture two bills, about 6 will capture three bills, and about 2 will capture four bills. This accounts for all 100 students; in fact, you can verify that the Poisson distribution yields f(5) = 0.003, which makes it likely that no student in this sample of 100 will capture five bills.
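The class frequencies for the dollar-bill example can be checked numerically. The following is a minimal sketch that evaluates the Poisson expression f(i) = e^(-m) m^i / i! directly; the function name `poisson_frequency` is an illustrative choice.

```python
import math


def poisson_frequency(i: int, m: float) -> float:
    """Frequency of the class containing i items when the mean is m."""
    return math.exp(-m) * m**i / math.factorial(i)


m = 1.0         # mean number of bills per student
students = 100  # size of the class in the example

for i in range(6):
    f = poisson_frequency(i, m)
    print(f"f({i}) = {f:.3f}  (about {f * students:.1f} of {students} students)")
```

Rounding the expected counts to whole students gives the 37, 37, 18, 6, and 2 quoted in the text.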

Figure 6-1. Poisson distribution for a mean of 1.0, illustrated by a random distribution of dollar bills to students.

Similar distributions may be developed for other m values. Some are shown in Figure 6-2 as curves instead of bar histograms.

Figure 6-2. Poisson distributions for five different mean values: m is the mean number of items per sample, and i is the actual number of items per sample. (After R. R. Sokal and F. J. Rohlf, Introduction to Biostatistics. W. H. Freeman and Company, 1973.)

Derivation of a mapping function

The Poisson distribution can also describe the distribution of crossovers along a chromosome in meiosis. In any chromosomal region, the actual number of crossovers is probably small in relation to the total number of possible crossovers in that region. If crossovers are distributed randomly (that is, there is no interference), then, if we knew the mean number of crossovers in the region per meiosis, we could calculate the distribution of meioses with zero, one, two, three, four, and more multiple crossovers. This calculation is unnecessary in the present context because, as we shall see, the only class that is really crucial is the zero class. We want to correlate map distances with observable RF values. Meioses in which there are one, two, three, four, or any finite number of crossovers per meiosis all behave similarly in that they produce an RF of 50 percent among the products of those meioses, whereas the meioses with no crossovers produce an RF of 0 percent. To see how this can be so, consider a series of meioses in which nonsister chromatids do not cross over, cross over once, and cross over twice, as shown in Figure 6-3. We obtain recombinant products only from meioses with at least one crossover in the region, and always precisely half the products of such meioses are recombinant. We see then that the real determinant of the RF value is the size of the zero crossover class in relation to the rest.

Figure 6-3. Demonstration that the average RF is 50 percent for meioses in which the number of crossovers is not zero. Recombinant chromatids are brown. Two-strand double crossovers produce all parental types, so all the chromatids are orange. Note that all crossovers ...

As noted in Figure 6-3, we consider only crossovers between nonsister chromatids; sister-chromatid exchange is thought to be rare at meiosis. If it occurs, it can be shown to have no net effect in most meiotic analyses.
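The claim that every nonzero crossover class yields 50 percent recombinants can be verified with a small calculation. The sketch below rests on two assumptions that go slightly beyond the text: each crossover involves one chromatid from each homolog, chosen at random and independently of the other crossovers (no chromatid interference), so any given chromatid takes part in a particular crossover with probability 1/2; and a chromatid is recombinant for markers flanking the region when it has taken part in an odd number of crossovers. The function name is illustrative.

```python
from math import comb


def recombinant_fraction(n_crossovers: int) -> float:
    """Expected fraction of recombinant chromatids from meioses with
    exactly n_crossovers crossovers in the region.

    Assumes each chromatid joins each crossover with probability 1/2,
    independently, and is recombinant when its exchange count is odd.
    """
    p = 0.5
    return sum(
        (comb(n_crossovers, k) * p**k * (1 - p) ** (n_crossovers - k)
         for k in range(1, n_crossovers + 1, 2)),  # odd k only
        0.0,
    )


for n in range(6):
    print(f"{n} crossover(s): RF = {recombinant_fraction(n):.2f}")
```

Under these assumptions only the zero class gives an RF of 0; every other class gives exactly 0.5, which is why the size of the zero class alone determines the overall RF.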

At last, we can derive the mapping function. Recombinants make up half the products of those meioses having at least one crossover in the region. The proportion of meioses with at least one crossover is 1 minus the fraction with zero crossovers. The zero-class frequency will be:

$$f(0) = \frac{e^{-m}\, m^{0}}{0!}$$

which equals

$$e^{-m}$$

So the mapping function can be stated as

$$\mathrm{RF} = \tfrac{1}{2}\left(1 - e^{-m}\right)$$

This formula relates recombinant frequency to m, the mean number of crossovers. Because the whole concept of genetic mapping is based on the occurrence of crossovers, as well as proportionality between crossover frequency and the physical size of a chromosomal region, you can see that m is probably the most fundamental variable in the whole process. In fact, m could be considered to be the ultimate genetic mapping unit.
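As a check on the derivation, the mapping function RF = 1/2(1 - e^(-m)) can be compared with a direct sum over the Poisson crossover classes, giving each class with one or more crossovers a weight of one-half. This is a sketch under the same no-interference assumption as the text; the function names and the cutoff of 60 classes are illustrative choices.

```python
import math


def rf_from_m(m: float) -> float:
    """Mapping function: RF as a function of the mean crossover number m."""
    return 0.5 * (1.0 - math.exp(-m))


def rf_from_classes(m: float, max_crossovers: int = 60) -> float:
    """Sum 0.5 * f(i) over the Poisson classes i = 1 .. max_crossovers."""
    total = 0.0
    term = math.exp(-m)  # f(0), the zero-crossover class
    for i in range(1, max_crossovers + 1):
        term *= m / i        # f(i) obtained from f(i - 1)
        total += 0.5 * term  # half the products of this class are recombinant
    return total


for m in (0.1, 0.5, 1.0, 2.0):
    print(f"m = {m:.1f}: formula {rf_from_m(m):.4f}, class sum {rf_from_classes(m):.4f}")
```

The two columns agree because the nonzero classes together have frequency 1 - f(0), so only the zero class needs to be computed.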

If we know an RF value, we can calculate m by solving the equation. After obtaining many values of m, we can plot the function as a graph, as in Figure 6-4. Viewing the function plotted as a graph should help us see how it works. First, notice that the function is linear for a certain range corresponding to very small m values. (Remember that m is our best measure of genetic distance.) Therefore, RF is a good measure of distance where the dashed line coincides with the function in Figure 6-4. In this region, the map unit defined as 1 percent RF has real meaning. Therefore, let’s use this region of the curve to define corrected map units by considering some small values of m:

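For m = 0.05, $e^{-m} \approx 0.95$ and

$$\mathrm{RF} = \tfrac{1}{2}(1 - 0.95) = \tfrac{1}{2}(0.05) = \tfrac{1}{2}m$$

For m = 0.10, $e^{-m} \approx 0.90$ and

$$\mathrm{RF} = \tfrac{1}{2}(1 - 0.90) = \tfrac{1}{2}(0.10) = \tfrac{1}{2}m$$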

Figure 6-4. The blue line gives the mapping function in graphic form. Where the blue curve and the dashed line coincide, the function is linear and RF values estimate map distances well.

We see that RF = m/2, and this relation defines the dashed line in Figure 6-4. It allows us to translate m values into corrected map units. Expressing m as a percentage, we see that an m of 100 percent (= 1) is the equivalent of 50 corrected map units, so we can express the horizontal axis of Figure 6-4 in our new map units. Now we can see from the graph that two loci separated by 150 corrected map units show an RF of only about 50 percent. We can use the graph of the function to convert any RF into map distance simply by drawing a horizontal line from the RF value to the curve and dropping a perpendicular to the map unit axis, a process equivalent to using the equation $\mathrm{RF} = \tfrac{1}{2}(1 - e^{-m})$ to solve for m.
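The shape of the curve can be tabulated rather than read from a graph. The sketch below prints, for several values of m, the corrected map distance (50 × m) next to the RF predicted by RF = 1/2(1 - e^(-m)); the function names and the particular m values are illustrative.

```python
import math


def rf_percent(m: float) -> float:
    """Predicted recombinant frequency, in percent, for mean crossover number m."""
    return 50.0 * (1.0 - math.exp(-m))


def corrected_map_units(m: float) -> float:
    """Corrected map distance: an m of 1 corresponds to 50 corrected map units."""
    return 50.0 * m


print(f"{'m':>5} {'corrected m.u.':>15} {'RF (%)':>8}")
for m in (0.02, 0.05, 0.10, 0.50, 1.00, 2.00, 3.00):
    print(f"{m:>5.2f} {corrected_map_units(m):>15.1f} {rf_percent(m):>8.2f}")
```

For the smallest m values the two columns nearly coincide, which is the linear region. At m = 3 (150 corrected map units) the predicted RF is about 47.5 percent, already close to the 50 percent ceiling.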

Let’s consider a numerical example of the use of the mapping function. Suppose that we get an RF of 27.5 percent. How many corrected map units does this represent? From the function,

$$0.275 = \tfrac{1}{2}\left(1 - e^{-m}\right)$$

so

$$e^{-m} = 1 - (2 \times 0.275) = 0.45$$

From $e^{-m}$ tables (or by using a calculator), we find that m ≈ 0.8, which is the equivalent of 40 corrected map units. If we had been happy to accept 27.5 percent RF as representing 27.5 map units, we would have considerably underestimated the distance between the loci.
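The same conversion can be done without tables by solving RF = 1/2(1 - e^(-m)) for m, which gives m = -ln(1 - 2 RF). The following is a minimal sketch; the function name and the input check are illustrative additions.

```python
import math


def rf_to_corrected_map_units(rf: float) -> float:
    """Convert a recombinant frequency (as a fraction) to corrected map units.

    Solves RF = 0.5 * (1 - exp(-m)) for m, then scales by 50 because
    m = 1 corresponds to 50 corrected map units.
    """
    if not 0.0 <= rf < 0.5:
        raise ValueError("RF must be at least 0 and less than 0.5")
    m = -math.log(1.0 - 2.0 * rf)
    return 50.0 * m


rf = 0.275  # the 27.5 percent RF from the worked example
print(f"RF = {rf:.3f} -> {rf_to_corrected_map_units(rf):.1f} corrected map units")
```

An RF of exactly 0.5 is rejected because the logarithm diverges there: at that point the function can no longer distinguish distant linkage from independent assortment.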


To estimate map distances most accurately, put RF values through the mapping function. Alternatively, add distances that are each short enough to be in the region where the mapping function is linear.

A corollary of the second statement of this message is that for organisms for which the chromosomes are already well mapped, such as Drosophila, a geneticist seldom needs to calculate from the mapping function to place newly discovered genes on the map. This is because the map is already divided into small, marked regions by known loci. However, when the process of mapping has just begun in a new organism or when the available genetic markers are sparsely distributed, the corrections provided by the function are needed.

Notice that no matter how far apart two loci are on a chromosome, we never observe an RF value of greater than 50 percent. Consequently, an RF value of 50 percent would leave us in doubt about whether two loci are linked or are on separate chromosomes. Stated another way, as m gets larger, $e^{-m}$ gets smaller and RF approaches 1/2(1 − 0) = 1/2 × 1 = 0.5, or 50 percent. This is an important point: RF values of 100 percent are not observed, no matter how far apart the loci are.

Copyright © 2000, W. H. Freeman and Company.
Bookshelf ID: NBK21819

