In Chapter 5, we learned that the basic genetic method of measuring map distance is based on recombinant frequency (RF). A genetic map unit (m.u.) was defined as a recombinant frequency of 1 percent. This is a useful fundamental unit that has stood the test of time and is still used in genetics. However, the larger the recombinant frequency, the less accurate it is as a measure of map distance. In fact, map units calculated from larger recombinant frequencies are smaller than map units calculated from smaller recombinant frequencies. We encountered this effect in examples in Chapter 5. Typically, when measuring recombination between three linked loci, the sum of the two internal recombinant frequencies is greater than the recombinant frequency between the outside loci. With the use of such data, what is the most accurate estimate that can be made of map distance between the two outside loci A and C in the following diagram?
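[Diagram: loci A, B, and C in that order along the chromosome; x is the recombinant frequency measured between A and B, and y is the recombinant frequency measured between B and C.]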
The answer is that x + y is the best estimate; it is more accurate than the smaller A–C value measured directly. This gives us the following useful mapping principle:
The best estimates of map distance are obtained from the sum of the distances calculated for shorter subintervals.
However, what if we have no intervening marker loci available to measure recombination in shorter intervals? Such a situation would be commonly encountered when beginning to map a new experimental organism or in cases in which the genome is huge, as it is in human beings. For example, in the preceding diagram, what if there were no known B locus? Would we have to make do with the map distance value obtained directly from the A–C recombinant frequency? Furthermore, what about the shorter intervals themselves? If there were other loci between A and B and between B and C, then we might obtain even better estimates of the A to C distance. Luckily, there is a way of taking any recombinant frequency and performing a calculation to make it a more accurate measure of map distance, without studying shorter and shorter intervals.
Before we consider the calculation, let’s think about the reason why larger RF values are less accurate measures of map distance. We have already encountered the culprit: multiple crossovers. In Chapter 5, we learned that double crossovers often lead to a parental arrangement of alleles, and therefore the resulting meiotic products are not counted as recombinants when measuring recombinant frequency. The same is true for other types of multiple crossovers: triples, quadruples, and so forth. So it is easy to see that multiple crossovers automatically lead to an underestimate of map distance, and, because multiple crossovers are expected to be relatively more common over longer regions, we can see why the problem is worse for larger recombinant frequencies.
How can we take these multiple crossovers into account when calculating the map distances? What we need is a mathematical function that accurately relates recombination to map distance. In other words, what we need is a mapping function.
A mapping function is a formula for using recombinant frequencies to calculate map distances corrected for multiple crossover products.
To derive a mapping function, we need a mathematical tool widely used in genetic analysis because it is useful in describing many different types of genetic processes. This mathematical tool is the Poisson distribution. A distribution is merely a description of the frequencies of the different types of classes that arise from sampling. The Poisson distribution describes the frequency of classes containing 0, 1, 2, 3, 4, ..., i items when the average number of items per sample is known. The Poisson distribution is particularly useful when the average is small in relation to the total number of items possible. For example, the possible number of tadpoles obtainable in a single dip of a net in a pond is quite large, but most dips yield only one or two or none. The number of dead birds on the side of a highway is potentially very large, but in a sample mile the number is usually small. Such samplings are described well by the Poisson distribution.
Let’s consider a numerical example. Suppose that we randomly distribute 100 one-dollar bills to 100 students in a lecture room, perhaps by scattering them over the class from some point near the ceiling. The average (or mean) number of bills per student is 1.0, but common sense tells us that it is very unlikely that each of the 100 students will capture one bill. We would expect a few lucky students to grab three or four bills each and quite a few students to come up with two bills each. However, we would expect most students to get either one bill or none. The Poisson distribution provides a quantitative prediction of the results.
In this example, the item being considered is the capture of a bill by a student. We want to divide the students into classes according to the number of bills each captures and then find the frequency of each class. Let m represent the mean number of items (here, m = 1.0 bill per student). Let i represent the number for a particular class (say, i = 3 for those students who get three bills each). Let f(i) represent the frequency of the i class—that is, the proportion of the 100 students who each capture i bills. The general expression for the Poisson distribution states that
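$$f(i) = \frac{e^{-m} m^{i}}{i!}$$

where e is the base of natural logarithms (approximately 2.718) and i! is i factorial (1 × 2 × ... × i, with 0! = 1). Values of $e^{-m}$ for m from 0 to 1 are given in the following table.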
| m | $e^{-m}$ | m | $e^{-m}$ |
|---|---|---|---|
| 0.000 | 1.00000 | 0.550 | 0.57695 |
| 0.050 | 0.95123 | 0.600 | 0.54881 |
| 0.100 | 0.90484 | 0.650 | 0.52205 |
| 0.150 | 0.86071 | 0.700 | 0.49659 |
| 0.200 | 0.81873 | 0.750 | 0.47237 |
| 0.250 | 0.77880 | 0.800 | 0.44933 |
| 0.300 | 0.74082 | 0.850 | 0.42741 |
| 0.350 | 0.70469 | 0.900 | 0.40657 |
| 0.400 | 0.67032 | 0.950 | 0.38674 |
| 0.450 | 0.63763 | 1.000 | 0.36788 |
| 0.500 | 0.60653 | | |
Values for m greater than 1 can be obtained from an electronic calculator or by using logarithms.
Source: F. James Rohlf and Robert R. Sokal, Statistical Tables, 3d ed. W. H. Freeman and Company, 1995.
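As a minimal sketch of how the Poisson formula and the table relate, the Python below evaluates f(i) directly with the standard library. The function name `poisson_pmf` is our own illustration, not something from the source. Because f(0) = $e^{-m}$, the zero class reproduces the tabulated values, and the same call covers values of m greater than 1, where the table stops.

```python
import math

def poisson_pmf(i: int, m: float) -> float:
    """Poisson frequency f(i) = e^(-m) * m^i / i! for a class of i items, given mean m."""
    if i < 0 or m < 0:
        raise ValueError("i and m must be non-negative")
    return math.exp(-m) * m ** i / math.factorial(i)

# The zero class f(0) equals e^(-m), so it reproduces entries of the e^(-m) table.
for m in (0.05, 0.5, 0.8, 1.0):
    print(f"m = {m:.3f}  e^-m = {poisson_pmf(0, m):.5f}")

# A value of m above 1, which the table does not list, comes from the same call.
print(f"m = 2.000  e^-m = {poisson_pmf(0, 2.0):.5f}")
```

The frequencies of all classes for a given m sum to 1, which is a quick sanity check on any hand calculation.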

Poisson distribution for a mean of 1.0, illustrated by a random distribution of dollar bills to students.
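For m = 1.0, for example, the five-bill class has frequency $f(5) = e^{-1}(1^{5})/5! = 0.36788/120 \approx 0.003$, which makes it likely that no student in this sample of 100 will capture five bills.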
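The dollar-bill example can be worked through for every class at once. This sketch assumes, as the text does, that each student's number of bills follows a Poisson distribution with mean m = 1.0; the helper names are hypothetical. It converts each frequency f(i) into an expected number of students out of 100.

```python
import math

def poisson_pmf(i: int, m: float) -> float:
    """Poisson frequency f(i) = e^(-m) * m^i / i!."""
    return math.exp(-m) * m ** i / math.factorial(i)

def expected_class_sizes(n_students: int, mean: float, max_i: int) -> dict[int, float]:
    """Expected number of students in each class i = 0..max_i under a Poisson model."""
    return {i: n_students * poisson_pmf(i, mean) for i in range(max_i + 1)}

# 100 bills scattered over 100 students: mean m = 1.0 bill per student.
for i, expected in expected_class_sizes(100, 1.0, 5).items():
    print(f"{i} bills: f({i}) = {expected / 100:.3f}, about {expected:.1f} students")
```

The zero and one-bill classes come out equal (both $e^{-1}$), and the five-bill class is expected to hold only about 0.3 of a student, matching the intuition that most students get one bill or none.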
Poisson distributions for five different mean values: m is the mean number of items per sample, and i is the actual number of items per sample. (After R. R. Sokal and F. J. Rohlf, Introduction to Biostatistics. W. H. Freeman and Company, 1973.)
Demonstration that the average RF is 50 percent for meioses in which the number of crossovers is not zero. Recombinant chromatids are brown. Two-strand double crossovers produce all parental types, so all the chromatids are orange. Note that all crossovers are between nonsister chromatids. Try the triple crossover class yourself.
At last, we can derive the mapping function. Recombinants make up half the products of those meioses having at least one crossover in the region. The proportion of meioses with at least one crossover is 1 minus the fraction with zero crossovers. The zero-class frequency will be:
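$$f(0) = \frac{e^{-m} m^{0}}{0!}$$

which equals

$$f(0) = e^{-m}$$

because $m^{0} = 1$ and $0! = 1$. So the mapping function can be stated as

$$\text{RF} = \frac{1}{2}\left(1 - e^{-m}\right)$$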
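The derivation of RF = ½(1 − $e^{-m}$) can be checked with a small Monte Carlo sketch. It assumes what the derivation assumes: crossovers occur independently, so their number per meiosis is Poisson with mean m, and each crossover involves one randomly chosen chromatid from each homolog (a nonsister pair). A product is recombinant for the loci flanking the region when an odd number of crossovers involved its chromatid, so two-strand doubles give parental products. All function names are hypothetical.

```python
import math
import random

def poisson_sample(m: float, rng: random.Random) -> int:
    """Draw a Poisson(m) count by Knuth's multiplication method (adequate for small m)."""
    limit = math.exp(-m)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def simulate_rf(m: float, n_meioses: int, seed: int = 1) -> float:
    """Fraction of recombinant products between two loci under the stated assumptions."""
    rng = random.Random(seed)
    recombinant_products = 0
    for _ in range(n_meioses):
        # Chromatids 0 and 1 are sisters from one homolog; 2 and 3 from the other.
        hits = [0, 0, 0, 0]
        for _ in range(poisson_sample(m, rng)):
            # Each crossover involves one chromatid from each homolog (a nonsister pair).
            hits[rng.randrange(2)] += 1
            hits[2 + rng.randrange(2)] += 1
        # A product is recombinant if its chromatid took part in an odd number of crossovers.
        recombinant_products += sum(h % 2 for h in hits)
    return recombinant_products / (4 * n_meioses)

def mapping_function(m: float) -> float:
    """RF = 1/2 (1 - e^(-m))."""
    return 0.5 * (1.0 - math.exp(-m))

for m in (0.2, 0.8, 2.0):
    print(f"m = {m}: simulated RF = {simulate_rf(m, 50_000):.3f}, "
          f"mapping function = {mapping_function(m):.3f}")
```

The simulation includes triple and higher crossovers automatically, which is one way to confirm that every nonzero class averages 50 percent recombinants.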
This formula relates recombinant frequency to m, the mean number of crossovers. Because the whole concept of genetic mapping is based on the occurrence of crossovers, as well as proportionality between crossover frequency and the physical size of a chromosomal region, you can see that m is probably the most fundamental variable in the whole process. In fact, m could be considered to be the ultimate genetic mapping unit.
The blue line gives the mapping function in graphic form. Where the blue curve and the dashed line coincide, the function is linear and RF values estimate map distances well.
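For small values of m, $e^{-m} \approx 1 - m$, so the mapping function reduces to $\text{RF} = m/2$, and this relation defines the dashed line in Figure 6-4. For any observed RF, we can use $\text{RF} = \frac{1}{2}\left(1 - e^{-m}\right)$ to solve for m.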
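Solving the mapping function for m takes two algebraic steps; the worked derivation below uses natural logarithms and is valid only for RF below 0.5:

$$
\begin{aligned}
\text{RF} &= \tfrac{1}{2}\left(1 - e^{-m}\right) \\
e^{-m} &= 1 - 2\,\text{RF} \\
m &= -\ln\left(1 - 2\,\text{RF}\right)
\end{aligned}
$$

Because 1 percent RF is 1 map unit and RF = m/2 where the function is linear, a mean of one crossover (m = 1) corresponds to 50 map units, so the corrected distance is $50m = -50\ln(1 - 2\,\text{RF})$ map units, with RF written as a proportion.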
Let’s consider a numerical example of the use of the mapping function. Suppose that we get an RF of 27.5 percent. How many corrected map units does this represent? From the function,
$$0.275 = \frac{1}{2}\left(1 - e^{-m}\right)$$
Therefore
$$e^{-m} = 1 - 2(0.275) = 0.45$$
From $e^{-m}$ tables (or by using a calculator), we find that m ≈ 0.8. Because 1 percent RF is 1 map unit and RF = m/2 where the function is linear, each unit of m corresponds to 50 map units, so m = 0.8 is the equivalent of 40 corrected map units. If we had been happy to accept 27.5 percent RF as representing 27.5 map units, we would have considerably underestimated the distance between the loci.
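The same calculation can be scripted. This is a minimal sketch, assuming the Poisson mapping function above and the conversion of 50 map units per mean crossover; `corrected_map_units` is a hypothetical helper name. It inverts the function with a logarithm instead of a table lookup.

```python
import math

def corrected_map_units(rf: float) -> float:
    """Corrected distance in map units, 50 * m with m = -ln(1 - 2 RF); RF as a proportion."""
    if not 0.0 <= rf < 0.5:
        raise ValueError("RF must be in [0, 0.5) for the mapping function to be invertible")
    m = -math.log(1.0 - 2.0 * rf)  # mean number of crossovers in the interval
    return 50.0 * m

for rf in (0.05, 0.10, 0.275, 0.40):
    print(f"RF {100 * rf:.1f}% -> {corrected_map_units(rf):.1f} corrected map units")
```

For RF = 27.5 percent the calculator value m ≈ 0.799 gives about 39.9 map units, which rounds to the 40 corrected map units obtained from the table. Small RF values are barely changed, while larger ones are corrected substantially.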
To estimate map distances most accurately, put RF values through the mapping function. Alternatively, add distances that are each short enough to be in the region where the mapping function is linear.
A corollary of the second statement of this message is that for organisms whose chromosomes are already well mapped, such as Drosophila, a geneticist seldom needs to use the mapping function to place newly discovered genes on the map. This is because the map is already divided into small, marked regions by known loci. However, when the process of mapping has just begun in a new organism or when the available genetic markers are sparsely distributed, the corrections provided by the function are needed.
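A short numerical sketch shows why both routes in the message work. It assumes a hypothetical interval whose true length is 40 corrected map units (m = 0.8), divided by hypothetical evenly spaced markers into four 10-map-unit subintervals (m = 0.2 each), with crossovers following the Poisson model above.

```python
import math

def rf_from_m(m: float) -> float:
    """Mapping function: RF = 1/2 (1 - e^(-m))."""
    return 0.5 * (1.0 - math.exp(-m))

true_distance = 40.0            # hypothetical interval, in corrected map units
n_subintervals = 4              # hypothetical evenly spaced markers
m_total = true_distance / 50.0  # 50 map units per mean crossover

# Uncorrected estimates: RF percentages read directly as map units.
direct = 100 * rf_from_m(m_total)
summed = n_subintervals * 100 * rf_from_m(m_total / n_subintervals)

print(f"direct RF estimate:     {direct:.1f} map units")
print(f"sum of subinterval RFs: {summed:.1f} map units")
print(f"true distance:          {true_distance:.1f} map units")
```

The single long-interval RF (about 27.5) falls well short of 40, while summing the shorter subintervals (about 36.3) comes much closer; shrinking the subintervals further, or passing each RF through the mapping function, closes the remaining gap.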
Notice that no matter how far apart two loci are on a chromosome, we never observe an RF value greater than 50 percent. Consequently, an RF value of 50 percent would leave us in doubt about whether two loci are linked or are on separate chromosomes. The mapping function shows why: as m gets larger, $e^{-m}$ gets smaller, and RF approaches $\frac{1}{2}(1 - 0) = \frac{1}{2} \times 1 = 0.5$, or 50 percent. This is an important point: RF values of 100 percent are never observed; RF does not exceed 50 percent, no matter how far apart the loci are.
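The 50 percent ceiling can also be seen numerically. The sketch below, with hypothetical helper names, evaluates the mapping function for increasing m and shows that the inverse, m = −ln(1 − 2RF), grows without bound as RF approaches 0.5. That is why an observed RF near 50 percent cannot distinguish distant linked loci from unlinked ones.

```python
import math

def rf_from_m(m: float) -> float:
    """Mapping function: RF = 1/2 (1 - e^(-m)); approaches 0.5 as m grows."""
    return 0.5 * (1.0 - math.exp(-m))

def m_from_rf(rf: float) -> float:
    """Inverse mapping function; diverges as RF approaches 0.5."""
    return -math.log(1.0 - 2.0 * rf)

for m in (1, 2, 5, 10):
    print(f"m = {m:>2}: RF = {rf_from_m(m):.4f}")

for rf in (0.45, 0.49, 0.499):
    print(f"RF = {rf}: m = {m_from_rf(rf):.2f}")
```

Near the ceiling, a tiny change in observed RF corresponds to a large change in m, so such RF values carry little information about distance.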