PLoS Comput Biol. May 2012; 8(5): e1002462.
Published online May 17, 2012. doi:  10.1371/journal.pcbi.1002462
PMCID: PMC3355070

Crossing Over…Markov Meets Mendel

Fran Lewitter, Editor

Abstract

Chromosomal crossover is a biological mechanism to combine parental traits. It is perhaps the first mechanism ever taught in any introductory biology class. The formulation of crossover, and resulting recombination, came about 100 years after Mendel's famous experiments. To a great extent, this formulation is consistent with the basic genetic findings of Mendel. More importantly, it provides a mathematical insight for his two laws (and corrects them). From a mathematical perspective, and while it retains similarities, genetic recombination guarantees diversity so that we do not rapidly converge to the same being. It is this diversity that made the study of biology possible. In particular, the problem of genetic mapping and linkage—one of the first efforts towards a computational approach to biology—relies heavily on the mathematical foundation of crossover and recombination. Nevertheless, as students we often overlook the mathematics of these phenomena. Emphasizing the mathematical aspect of Mendel's laws through crossover and recombination will prepare the students to make an early realization that biology, in addition to being experimental, IS a computational science. This can serve as a first step towards a broader curricular transformation in teaching biological sciences. I will show that a simple and modern treatment of Mendel's laws using a Markov chain will make this step possible, and it will only require basic college-level probability and calculus. My personal teaching experience confirms that students WANT to know Markov chains because they hear about them from bioinformaticists all the time. This entire exposition is based on three homework problems that I designed for a course in computational biology. A typical reader is, therefore, an instructional staff member or a student in a computational field (e.g., computer science, mathematics, statistics, computational biology, bioinformatics). 
However, other students may easily follow by omitting the mathematically more elaborate parts. I kept those as separate sections in the exposition.

Introduction

Mendel and High School Biology

Sexually reproducing organisms generally combine heritable traits from two parents. The biological process that combines those traits is called meiosis. While mutations could occur during meiosis, most of the variation arises from the combinations of parental traits. How do these parental traits combine? The dominant theory was that some sort of blending or averaging took place. However, such a mode of inheritance would result in an average of all ancestors after only a modest number of generations (imagine repeatedly mixing colors). Instead, by performing experiments on plants, Mendel pointed out the existence of discrete elements that combine but do not mix. Figure 1 shows the simulated number of types of individuals as a function of time. Averaging, with traits taking real values in [formula], is used on one population, and the model described in the section “A Simple Model”, with elements (later called alleles) taking discrete values in [formula], is used on another. Mutations are ignored. In both cases, a population size of 100 is kept constant for the entire duration of the simulation (100 time steps). The simulation is repeated 1,000 times to obtain an average for each time step.

Figure 1
Fast convergence of inheritance by averaging.
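
The blending argument can be checked numerically. The following is a minimal sketch, not the simulation behind Figure 1: it assumes traits start as real values drawn uniformly from [0, 1) and that each child is the plain average of two parents drawn at random with replacement. It tracks the spread (maximum minus minimum) of the trait values rather than a count of types; the function names and the seed are hypothetical choices.

```python
import random

def blend_generation(population, rng):
    """Next generation: every child averages two randomly chosen parents."""
    return [
        (rng.choice(population) + rng.choice(population)) / 2
        for _ in range(len(population))
    ]

def spread_over_time(size=100, steps=100, seed=0):
    """Spread (max - min) of trait values at each time step, step 0 included."""
    rng = random.Random(seed)
    population = [rng.random() for _ in range(size)]  # assumed initial traits
    spreads = [max(population) - min(population)]
    for _ in range(steps):
        population = blend_generation(population, rng)
        spreads.append(max(population) - min(population))
    return spreads

spreads = spread_over_time()
print(spreads[0], spreads[10], spreads[100])
```

Under these assumptions an average can never fall outside the range of its parents, so the spread cannot grow from one generation to the next, and it collapses toward zero well within the 100 time steps.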

Mendel formulated the concept of a gene (unit of inheritance), and hypothesized that inheritance is governed by the following two laws of genetics:

  1. Segregation: Each sexually reproducing organism has two alleles (copies) for each gene, one inherited from each parent; and in turn will contribute, with equal probability ($1/2$), only one of these two alleles.
  2. Independent assortment: Alleles of different genes are inherited independently (later deemed not so accurate).

The state of a gene, the genotype, is determined by the two alleles. The resulting trait, the phenotype, is then a function of this state. When the alleles are the same, the gene, or equivalently the genotype, is homozygous; otherwise, it is heterozygous. For example, if an allele can be either $A$ or $a$, then the possible genotypes are $AA$, $Aa$, $aA$, and $aa$. Table 1 shows the possible segregations of parental genotypes when at least one of them is heterozygous.

Table 1
Genotypes.

In a dominant/recessive mode where $A$ is dominant, the corresponding phenotype is obtained as a function of the genotype as shown in Table 2, leading to a 3:1 ratio, a 1:1 ratio, and a 1:0 ratio of dominant to recessive phenotypes, respectively.

Table 2
Phenotypes.

Students often overlook that these ratios are not simply based on counting the entries, but the result of the segregation law: each allele is contributed with equal probability, i.e., $1/2$, resulting in a probability of $1/2 \times 1/2 = 1/4$ for each entry in the tables. Table 3 shows another example involving two heterozygous dominant/recessive genotypes that lead to a 9:3:3:1 ratio of phenotypes. In addition to the segregation law, students should be reminded that this ratio assumes that the law of independent assortment holds: alleles of different genes are inherited independently, resulting in a probability of $1/4$ for each assortment (refer to the next section for a mathematical definition of independence), thus a probability of $1/4 \times 1/4 = 1/16$ for each entry in the table.

Table 3
Phenotypes for two heterozygous genotypes.
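
The ratios above can be reproduced by enumerating gametes with exact fractions. This is a minimal sketch under stated assumptions: upper case marks the dominant allele, lower case the recessive one, and the helper names (`gametes`, `phenotype`, `cross`) are hypothetical, chosen only for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def gametes(genotype):
    """Gamete distribution for a genotype given as one (allele, allele) pair per gene.

    Segregation: each allele of a pair is contributed with probability 1/2.
    Independent assortment: the choices for different genes are independent.
    """
    weight = Fraction(1, 2) ** len(genotype)
    dist = Counter()
    for choice in product(*genotype):
        dist["".join(choice)] += weight
    return dist

def phenotype(gamete1, gamete2):
    """One dominant (upper-case) copy is enough to show the dominant trait."""
    return "".join(
        a.upper() if (a.isupper() or b.isupper()) else a.lower()
        for a, b in zip(gamete1, gamete2)
    )

def cross(parent1, parent2):
    """Phenotype distribution among the offspring of two parents."""
    dist = Counter()
    for (g1, p1), (g2, p2) in product(gametes(parent1).items(),
                                      gametes(parent2).items()):
        dist[phenotype(g1, g2)] += p1 * p2
    return dict(dist)

Aa = (("A", "a"),)
AaBb = (("A", "a"), ("B", "b"))
print(cross(Aa, Aa))      # one gene, both parents heterozygous
print(cross(AaBb, AaBb))  # two genes, both parents heterozygous
```

The one-gene cross gives probabilities 3/4 and 1/4 (the 3:1 ratio), and the two-gene cross gives 9/16, 3/16, 3/16, and 1/16 (the 9:3:3:1 ratio). The second result depends on the independent-assortment step inside `gametes`.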

Chromosome, Crossover, and Recombination

About 100 years later, it was established that the physical structure underlying Mendel's laws is the chromosome (for simplicity, a long molecule of DNA). This discovery matched Mendel's experiments really well: In diploid organisms like us chromosomes come in pairs (thus the name diploid), one from each parent! With few exceptions, each chromosome of the pair has copies of the same genes (special stretches of DNA) arranged in the same order: the alleles! In an attempt to explain experimental results and confirm Mendel's laws, chromosomal crossover was formulated and described by Thomas Morgan (coincidentally, his student John Northrop was a teacher of botany at Hunter College, the author's institution), but demonstrated only about 20 years later. Crossover is a mechanism that occurs at the early stages of the meiotic prophase, and combines the two chromosomes of the pair into one, a process called genetic recombination. During this process, the chromosome of the pair that is the source of the allele alternates every so often. Exactly when the switch—the crossover—happens is almost arbitrary.

When two alleles come from different chromosomes of the pair, their corresponding genes are said to recombine (can you identify the recombinations in Table 3?). Figure 2 illustrates a genetic recombination with one crossover.

Figure 2
One chromosomal crossover and a genetic recombination.

A Slight Discrepancy and Genetic Linkage

Mendel's laws (segregation and independent assortment) dictate that genetic recombination occurs with a probability of $1/2$. Let's re-examine why this holds true. Let $a_i$ and $b_i$ be the two alleles of gene $i$ on the two chromosomes. Similarly, let $a_j$ and $b_j$ represent the same for gene $j$, respectively. Chromosomal crossover will result in recombination of gene $i$ and gene $j$ if one of the two assortments $a_i b_j$ and $b_i a_j$ occurs. Since each allele is contributed with equal probability (segregation), both $a_i$ and $b_j$ are contributed with probability $1/2$. Since alleles of different genes are inherited independently (independent assortment), the assortment $a_i b_j$ occurs with probability $1/2 \times 1/2 = 1/4$ (refer to the next section for a mathematical definition of independence). The same analysis applies for the assortment $b_i a_j$, leading to an overall recombination probability of $1/4 + 1/4 = 1/2$.

However, it has been observed that some pairs of genes show a correlation in their alleles, e.g., their probability of recombination is less than $1/2$. In this case, there is a linkage between the genes. How can we now incorporate this notion into the mathematics of Mendel's laws, which so far have relied on the fact that genes are not correlated (assorted independently)? Fortunately, a simple probabilistic model based on Figure 2 (1 crossover) will capture the effect of linkage, and as a result, alleles that are near each other on a chromosome will tend to be inherited together. The inaccuracy of Mendel's law of independent assortment lies therein. Nevertheless, one should still expect that genes which are far from each other on a chromosome (or on different chromosomes altogether) will assort independently, as Mendel once observed. It will require a better probabilistic model to reflect those two contradictory behaviors (genetic linkage and independence); the later introduction of the Markov chain will take care of this. But first, I will present a simple probabilistic model for genetic linkage. And before doing so, let's review some basic mathematics.

What Do We Need to Know?

Probability

Let $\Omega$ be a finite set of outcomes. A subset of $\Omega$, $E \subseteq \Omega$, is considered as an event (but not all events are subsets of $\Omega$). Given a variable $X$, define the following probabilities of events:

$$P(X = x) = \frac{1}{|\Omega|} \quad \text{for each } x \in \Omega$$
$$P(X \in E) = \frac{|E|}{|\Omega|}$$

where $|\cdot|$ denotes the size of a set. So $P(X \in \Omega) = 1$. The negation of an event will always satisfy:

$$P(\text{not } E) = 1 - P(E)$$

Given two events $E$ and $F$, $E$ and $F$ are exclusive (cannot occur together) if and only if

$$P(E \text{ or } F) = P(E) + P(F)$$

Given two events $E$ and $F$, $E$ and $F$ are independent if and only if

$$P(E \text{ and } F) = P(E)\,P(F)$$

For instance, if $E$ is an event of probability [formula] and [formula], then [formula]. In general, however, $E$ and $F$ may not be independent. So we define the probability of $E$ conditional on $F$, i.e., the probability of $E$ given that $F$ occurs.

$$P(E \mid F) = \frac{P(E \text{ and } F)}{P(F)}$$

For instance, let [formula] and [formula] with [formula]. Note that [formula]. Then,

equation image
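
The definitions in this section can be exercised on a small example. The sketch below assumes a variable that is uniform on a finite set, so that the probability of an event is its size divided by the size of the set. The die, the event names, and the helper functions are hypothetical and do not come from the text.

```python
from fractions import Fraction

OUTCOMES = frozenset(range(1, 7))  # hypothetical example: faces of a fair die

def prob(event):
    """P(X in event) for X uniform on OUTCOMES: |event| / |OUTCOMES|."""
    return Fraction(len(event & OUTCOMES), len(OUTCOMES))

def exclusive(e, f):
    """Exclusive events cannot occur together."""
    return prob(e & f) == 0

def independent(e, f):
    """Independent events: P(e and f) = P(e) * P(f)."""
    return prob(e & f) == prob(e) * prob(f)

def conditional(e, f):
    """P(e | f) = P(e and f) / P(f); requires P(f) > 0."""
    return prob(e & f) / prob(f)

even = frozenset({2, 4, 6})
odd = OUTCOMES - even                   # the negation of "even"
at_most_two = frozenset({1, 2})
at_most_three = frozenset({1, 2, 3})

print(prob(odd), 1 - prob(even))         # negation rule
print(exclusive(even, odd))              # exclusive, so probabilities add
print(independent(even, at_most_two))    # 1/6 equals 1/2 * 1/3
print(independent(even, at_most_three))  # 1/6 differs from 1/2 * 1/2
print(conditional(even, at_most_three))  # P(even | at most three)
```

Exact fractions avoid rounding issues when testing the independence equality.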

Matrix Multiplication

I will assume some familiarity with matrices. If, however, this notion is unfamiliar, the parts of the exposition that use matrices may be skipped. Only $2 \times 2$ matrices will be considered in this exposition. The multiplication of $2 \times 2$ matrices is defined below.

equation image
equation image
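
For readers who prefer code to index notation, here is a minimal sketch of the standard row-by-column product, assuming the matrices are $2 \times 2$ and stored as nested lists. The function name `matmul2` and the sample matrices are hypothetical.

```python
def matmul2(a, b):
    """Product of two 2x2 matrices: entry (i, j) is row i of a times column j of b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]

m = [[1, 2], [3, 4]]
swap = [[0, 1], [1, 0]]
print(matmul2(m, swap))  # multiplying by swap on the right exchanges the columns of m
print(matmul2(swap, m))  # multiplying by swap on the left exchanges the rows of m
```

The two products differ, a reminder that matrix multiplication is not commutative in general.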

Geometric Series

One of the series that is almost invariably covered in basic calculus is the geometric series.

equation image

Exponential Limit

This is one of the basic expressions covered when studying limits.

$$\lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n = e^x$$

Therefore, $\left(1 + \frac{x}{n}\right)^n \approx e^x$ for large $n$.
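
A quick numerical check, assuming the limit meant here is the standard one in which $(1 + x/n)^n$ approaches $e^x$ as $n$ grows. The function name `compound` is hypothetical.

```python
import math

def compound(x, n):
    """(1 + x/n)^n, which approaches e^x as n grows."""
    return (1 + x / n) ** n

for n in (10, 1000, 10**6):
    print(n, compound(1, n), compound(-1, n))
print(math.e, 1 / math.e)  # the limiting values for x = 1 and x = -1
```

The case $x = -1$ shows how a product of many factors, each slightly below 1, settles near $1/e$.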

Logarithm

Here's the definition of natural logarithm and some of its properties:

equation image
equation image
equation image

Harmonic Series

Another famous encounter is the harmonic series and its approximation.

equation image
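
The approximation can be checked numerically. This sketch assumes the usual statement that the $n$-th harmonic number $H_n = 1 + 1/2 + \cdots + 1/n$ grows like $\ln n$. The refinement with the Euler-Mascheroni constant is a known fact added here for comparison and may not be the form used in the text.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def harmonic(n):
    """n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

for n in (10, 1000, 100000):
    print(n, harmonic(n), math.log(n), math.log(n) + EULER_GAMMA)
```

For every $n \geq 1$ the sum lies between $\ln n$ and $1 + \ln n$, so $\ln n$ captures its growth up to an additive constant.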

Derivatives

A function $f(x)$ reaches a local maximum or minimum when its derivative $f'(x) = 0$. Here are some examples of derivatives:

equation image
equation image
equation image

A Simple Model

Motivated by Figure 2, a uniform 1-crossover model can be constructed as follows: Consider a chromosome with $n$ genes, i.e., $n$ alleles on each chromosome of the pair. A crossover $X$ is equal to $i$ if it separates gene $i$ and gene $i+1$, where gene $n+1$ is hypothetical when $X = n$, i.e., no crossover. Assume that $X$ is uniform in $\{1, \ldots, n\}$ (thus the name of the model).

Linkage

Based on the above setting, $X$ takes any value in $\{1, \ldots, n\}$ with probability $1/n$. Two genes at a distance $d$, say $i$ and $i+d$, will recombine if $X$ is in $\{i, \ldots, i+d-1\}$, i.e., with probability $1/n$ ($d$ times),

$$P(\text{recombination}) = \frac{d}{n}$$

This confirms that genes within a close distance (small $d$) on the chromosome are less likely to be subject to recombination (genetic linkage). Genes that are far apart (large $d$) have a high probability (up to $(n-1)/n$) of recombination, but are they independent (see “What Is Wrong” section)?
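
The linkage probability can be verified by brute-force enumeration. The sketch below assumes a particular indexing of the uniform 1-crossover model: the crossover position takes each of $n$ values with probability $1/n$, position $x$ falls between gene $x$ and gene $x+1$, and position $n$ stands for no crossover. The function name is hypothetical.

```python
from fractions import Fraction

def recombination_probability(i, j, n):
    """Probability that genes i and j recombine on a chromosome with n genes.

    Assumed convention: the crossover X is uniform on 1..n, X = x separates
    gene x from gene x + 1, and X = n means no crossover.
    """
    lo, hi = sorted((i, j))
    separating = sum(1 for x in range(1, n + 1) if lo <= x < hi)
    return Fraction(separating, n)

n = 10
for d in (1, 2, 5, 9):
    print(d, recombination_probability(1, 1 + d, n))  # equals d/n
```

Counting the positions that separate the two genes gives exactly $d$ of the $n$ equally likely values, which is where the proportionality to distance comes from.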

Segregation

To find the probability that a given allele of gene $i$ is inherited, let $S$ with probability $p$ be the event that the recombination process starts on the given chromosome of the pair. This event and the event that genes $1$ and $i$ recombine (an event of probability $(i-1)/n$) are independent. The probability of inheriting the given allele is:

$$P(S \text{ and genes } 1, i \text{ do not recombine}) + P(\text{not } S \text{ and genes } 1, i \text{ recombine})$$

The addition is justified by the exclusivity of the events: a given allele is inherited when the process starts on the given chromosome and genes $1$ and $i$ do not recombine, or when the process starts on the other chromosome and genes $1$ and $i$ recombine. Due to the independence of $S$ and recombination, the above becomes:

$$p\left(1 - \frac{i-1}{n}\right) + (1 - p)\,\frac{i-1}{n}$$

A reasonable assumption is that $p = 1/2$ and, in this case, the above evaluates to $1/2$ for every $i$, as predicted by the segregation law.
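
The segregation computation can be tabulated with exact fractions. The sketch assumes that the process starts on the given chromosome with probability $p$ and that gene 1 and gene $i$ recombine with probability $(i-1)/n$, independently of where the process starts. The symbols $p$, $i$, $n$ and the function name are labels chosen for this illustration.

```python
from fractions import Fraction

def inheritance_probability(i, n, p):
    """Probability that the given allele of gene i is inherited.

    Either the process starts on the given chromosome (probability p) and
    genes 1 and i do not recombine, or it starts on the other chromosome
    and they do recombine.
    """
    r = Fraction(i - 1, n)  # assumed recombination probability of genes 1 and i
    return p * (1 - r) + (1 - p) * r

n = 10
print([inheritance_probability(i, n, Fraction(1, 2)) for i in range(1, n + 1)])
print([inheritance_probability(i, n, Fraction(9, 10)) for i in range(1, n + 1)])
```

With $p = 1/2$ every gene gets probability $1/2$. With a biased start the values drift away from $1/2$, most strongly for genes near the start of the chromosome.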

Genetic Mapping

Genetic mapping is the problem of placing the genes along the chromosome in their correct relative order. The bad news: It is hard! The good news: Genetic linkage can be used to infer genetic mapping. Though obsolete (it has been done), genetic mapping can be considered to be the first effort towards a computational approach to biology. How does it work?

In the uniform 1-crossover model, genetic linkage tells us that the probability of recombination of two genes is proportional to the distance between these genes. Now consider the genotyping depicted in Table 4 where frequency of recombination can be used as a measure of distance. In a way analogous to Table 4, analyzing the frequency of different pairs of the phenotypes $A$, $B$, and $C$ might reveal, for instance, that $A$ and $C$ recombine more often than $A$ and $B$; therefore, we infer that $A$ is closer to $B$ than to $C$. Such arguments help us to derive the gene order on the chromosome (relative order, not exact distances). While it may be hard to set up the experiment and obtain many offspring to estimate probabilities, such arguments were definitely behind the construction of the early genetic maps, e.g., the first map of the human genome (all the chromosomes) in 1987.

Table 4
Frequency and distance.
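
The ordering argument can be sketched for three genes. The example assumes that recombination frequency grows with distance, so the pair that recombines most often sits at the two ends and the remaining gene lies between them. The gene labels and frequencies below are hypothetical illustration data, not values from Table 4.

```python
def infer_order(freq):
    """Relative order of three genes from pairwise recombination frequencies.

    freq maps frozenset({g1, g2}) to the observed recombination frequency.
    The most frequently recombining pair is taken to be the outer pair.
    """
    outer = max(freq, key=freq.get)
    genes = set().union(*freq)
    (middle,) = genes - outer
    first, last = sorted(outer)
    return (first, middle, last)

# Hypothetical frequencies for illustration only.
observed = {
    frozenset({"A", "B"}): 0.10,
    frozenset({"B", "C"}): 0.15,
    frozenset({"A", "C"}): 0.24,
}
print(infer_order(observed))
```

Only the relative order is recovered. Reading the order from either end of the chromosome gives the same map, so the function fixes the direction alphabetically.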

What Is Wrong?

The reader may choose to skip this section and proceed to the next. The uniform 1-crossover model is very insightful in explaining Mendel's law of segregation with independent assortment corrected to reflect genetic linkage. However, it suffers from a few deficiencies.

Linkage: OK But…

Nothing is seriously wrong about this aspect. By assigning lower probabilities of recombination for smaller distances, the distance between two genes justifies their linkage when they do not assort independently. However, the actual probability of recombination may not necessarily be proportional to distance or have a dependence on the chromosome length, as in $d/n$ (but more on this in the Markov section).

Segregation: Too Sensitive

The probability of inheriting a given allele is contingent on the probability that the recombination process starts on the given chromosome of the pair, previously called $p$. If $p = 1/2$, the probability of inheriting a given allele is $1/2$, as it should be by the segregation law. While this is a biologically reasonable assumption on $p$, the segregation law is very sensitive to this particular choice. A slight deviation from $p = 1/2$ could result in a similar deviation in the probability of inheriting the given allele. Let $p = 1/2 + \epsilon$, then this probability for gene $i$ is (from the “Segregation” section):

$$\left(\frac{1}{2} + \epsilon\right)\left(1 - \frac{i-1}{n}\right) + \left(\frac{1}{2} - \epsilon\right)\frac{i-1}{n}$$

When $i \ll n$, i.e., $(i-1)/n \approx 0$, this is approximately $1/2 + \epsilon$. If the starting of the recombination process favors one chromosome, $\epsilon$ can be large, say close to $1/2$ ($p \approx 1$). The above probability becomes arbitrarily close to 1. This means that the given allele will be inherited almost always.
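
The sensitivity can be made explicit with a short derivation. It is a sketch under assumed notation: $p = 1/2 + \epsilon$ is the probability that the process starts on the given chromosome, and $r = (i-1)/n$ is the probability that gene 1 and gene $i$ recombine.

```latex
\begin{align*}
P(\text{allele of gene } i \text{ inherited})
  &= \left(\tfrac{1}{2} + \epsilon\right)(1 - r) + \left(\tfrac{1}{2} - \epsilon\right) r \\
  &= \tfrac{1}{2}(1 - r) + \tfrac{1}{2}\, r + \epsilon (1 - r) - \epsilon\, r \\
  &= \tfrac{1}{2} + \epsilon\,(1 - 2r),
  \qquad r = \frac{i-1}{n}.
\end{align*}
```

In this form the deviation from $1/2$ is $\epsilon$ scaled by $1 - 2r$: it is largest for genes near the start ($r \approx 0$), vanishes for a gene with $r = 1/2$, and changes sign beyond that point.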

Independent Assortment: Breaks

Despite genetic linkage, one should still expect that genes which are far from each other on the chromosome will assort independently. Because each chromosome can be treated separately, this independence is certainly true for genes that are on different chromosomes altogether. But on the same chromosome, the probability of recombination $d/n$ implies, for instance, that recombination of gene 1 and gene $n$ occurs with a probability of $(n-1)/n \approx 1$ for large values of $n$. Therefore, gene 1 and gene $n$ are highly correlated, and thus dependent (they will almost always recombine).

In retrospect, two genes $i$ and $j$ recombine when the alleles of the two genes are inherited from different chromosomes. Since the probability of inheriting a given allele is $1/2$ when the segregation law holds, independence then dictates that the probability of recombination of gene $i$ and gene $j$ must be equal to $1/2$. To see this, let $E_i$ and $E_j$ represent the events of inheriting a given allele for gene $i$ and gene $j$, respectively, then:

$$P(\text{recombination}) = P\big((E_i \text{ and not } E_j) \text{ or } (\text{not } E_i \text{ and } E_j)\big)$$
$$= P(E_i \text{ and not } E_j) + P(\text{not } E_i \text{ and } E_j)$$
$$= P(E_i)\,[1 - P(E_j)] + [1 - P(E_i)]\,P(E_j)$$

where addition is justified by exclusivity of events, and the last equality follows from the independence of gene $i$ and gene $j$ (together with the negation rule). When the segregation law holds, $P(E_i) = P(E_j) = 1/2$ and the above expression evaluates to $1/2$. Assuming $p$ in the previous section is $1/2$, genes are independent if and only if $d/n = 1/2$. Therefore, the law of independent assortment fails when genes are on the same chromosome.
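
The failure of independence can be seen by enumerating the model directly. The sketch assumes one indexing of the uniform 1-crossover model: the crossover position is uniform on $1, \ldots, n$, position $x$ falls right after gene $x$ (so $x = n$ means no crossover), and the process starts on the given chromosome with probability $p$. The function name and the labels `e_i`, `e_j` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def inheritance_distribution(i, j, n, p=Fraction(1, 2)):
    """Return (P(E_i), P(E_j), P(E_i and E_j)) by enumeration.

    E_k is the event that gene k's allele comes from the given chromosome.
    Gene k stays on the starting chromosome exactly when x >= k.
    """
    p_i = p_j = joint = Fraction(0)
    for starts_on_given, x in product((True, False), range(1, n + 1)):
        weight = (p if starts_on_given else 1 - p) * Fraction(1, n)
        e_i = starts_on_given == (x >= i)
        e_j = starts_on_given == (x >= j)
        p_i += weight * e_i
        p_j += weight * e_j
        joint += weight * (e_i and e_j)
    return p_i, p_j, joint

n = 10
for j in (2, 6, 10):
    p_1, p_j, joint = inheritance_distribution(1, j, n)
    print(j, p_1, p_j, joint, joint == p_1 * p_j)
```

With $p = 1/2$ both marginals equal $1/2$ for every gene, yet the joint probability matches their product only when the distance is half the chromosome length ($j = 6$ for $n = 10$ here).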

Now, why do we insist that the model must satisfy, among other properties, the law of independent assortment? Well, first because it is a correct law for distant genes. And second, since the probability of recombination increases with distance due to genetic linkage, the law of independent assortment tells us that the probability of recombination increases up to $1/2$, but cannot exceed $1/2$ (this statement excludes hotspots, which are regions on the chromosome that experience a high probability of recombination even at small distances). It is important for students to make this realization, which will come in handy when solving genetic mapping problems, as illustrated in the section “A Computational Example of Genetic Mapping”.

Generalization: Not Easy

One might consider extending the uniform 1-crossover model as an attempt of generalization to mimic the actual biological process. However, I will show that extending this model in the most natural way (mathematically, that is) will break the linkage property. For this purpose, consider a uniform 2-crossover model. Let An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e269.jpg be the first crossover which is uniform in An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e270.jpg (as before), and An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e271.jpg be the second crossover which, conditional on An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e272.jpg, is uniform in An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e273.jpg. Therefore, An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e274.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e275.jpg are not independent, for An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e276.jpg cannot precede An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e277.jpg. The choice of An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e278.jpg simplifies the math, but making An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e279.jpg does not change the results.

Now, why even bother to show that this model, which is more difficult to analyze than its predecessor, does not work? Well, my experience in teaching has been the following: While it is important to show students what works, it is equally important to show them what does not work.

With this in mind, all we need is a counterexample, so consider gene 1 and gene $d+1$ (these two genes are at a distance $d$ from each other). The probability of a recombination of gene $1$ and gene $d+1$ is:

equation image

Using conditional probability and the harmonic series approximation, the “Uniform 2-crossover Model” section shows that when $n-d$ is large, this probability is approximately

$$\frac{n-d}{n}\,\ln\frac{n}{n-d}$$

We can rewrite the above as:

$$-\left(1-\frac{d}{n}\right)\ln\left(1-\frac{d}{n}\right)$$

This is not an increasing function of $d$. In fact, consider $f(x) = -x\ln x$. This function has a maximum of $1/e$ when $x = 1/e$. Therefore, we have the highest probability of recombination when $1 - d/n = 1/e$, i.e., $d = (1 - 1/e)\,n$. Note that in this case $n - d = n/e$, which is large (as required above) when $n$ is large. This means that gene 1 is most likely to recombine with a gene located at a distance approximately 63% of the chromosome length (see Figure 3). While this is an interesting result, it stands as a pure mathematical endeavor with no biological basis.

Figure 3
The uniform 2-crossover model.
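The 63% figure can be checked numerically. The following is a minimal Python sketch under one concrete indexing of the model, which is an assumption made here for illustration: the first crossover is uniform on positions 1..n, the second is uniform on the positions from the first one up to n, and gene 1 recombines with the gene at distance d exactly when the first crossover is at most d and the second is beyond d. The function name `two_crossover_profile` is hypothetical.

```python
import math

def two_crossover_profile(n):
    """Recombination probability of gene 1 with the gene at distance d, d = 1..n-1.

    ASSUMPTION (indexing chosen for this sketch): X1 is uniform on 1..n,
    X2 given X1 = x is uniform on x..n, and recombination means X1 <= d < X2.
    """
    # harmonic[k] = 1 + 1/2 + ... + 1/k
    harmonic = [0.0] * (n + 1)
    for k in range(1, n + 1):
        harmonic[k] = harmonic[k - 1] + 1.0 / k
    # Sum over x = 1..d of P(X1 = x) * P(X2 > d | X1 = x)
    #   = sum of (1/n) * (n - d) / (n - x + 1)
    #   = (n - d)/n * (H_n - H_{n-d})
    return {d: (n - d) / n * (harmonic[n] - harmonic[n - d]) for d in range(1, n)}

n = 1000
profile = two_crossover_profile(n)
best_d = max(profile, key=profile.get)
# Compare the two printed numbers with 1 - 1/e and 1/e respectively.
print(best_d / n, profile[best_d])
```

Under these assumptions the probability rises, peaks, and then falls as the distance grows, which is exactly the failure of the linkage property described above.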

A Better Model: When Markov Meets Mendel

While the uniform 1-crossover model captures the essentials of segregation and linkage, it is lacking in some important aspects. First, the probability that a given allele is inherited (should be An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e298.jpg) depends on an implicit parameter of the model (An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e299.jpg in the “Segregation” section must be An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e300.jpg). Second, genes exhibit the linkage property but they are almost never independent, as this would require a probability of recombination equal to An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e301.jpg (see “A Slight Discrepancy and Genetic Linkage” section). From the “Linkage” section, this probability is expressed as An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e302.jpg, implying that only genes at a distance equal to half the chromosome length are independent. Moreover, the probability of recombination depends on the chromosome length and, therefore, two chromosomes that are locally similar but have different lengths exhibit different local recombination behavior. This is not biologically justifiable. Finally, a generalization (with uniformity maintained) to mimic the real biological process with multiple crossovers is not conceivable.

A better mathematical model is needed to rectify the above deficiencies. In principle, the model should satisfy the following three laws with multiple crossovers:

  1. Segregation: The probability that a given allele of the gene is inherited is $1/2$.
  2. Linkage (missed by Mendel): The probability of recombination of two genes is an increasing function of the distance between them, so it is higher for distant genes. Nevertheless, it should not depend on the chromosome length.
  3. Independent assortment: This is impossible due to linkage where distance is a determining factor in the recombination. The alternative is to require genes to be asymptotically independent. As a result, the probability of recombination must approach $1/2$ when the distance between the two genes becomes large.

Being a computer scientist by training and not a biologist, when I first suggested to my students a model based on a Markov chain, I called it the jumping model of recombination. I also expressed to them my concern that it may not be real, but as it turned out, it made perfect sense. To be loyal to my first terminology, I will call it here the jumping model.

The Jumping Model

The jumping model is based on a Markov chain. A Markov chain consists of a set of states with probabilities of transition between them (thus the jumping term). For computer scientists, this is often illustrated as a directed weighted graph with vertices representing the states and directed edges representing the transitions between states. The weight of an edge is the probability of the corresponding transition. This is shown in Figure 4 for a Markov chain with two states. Operationally, one would start at a given state and follow transitions in discrete time steps as indicated by their probabilities, thus changing state from one step to another. Let $p_{ij}$ be the probability of transition from state $i$ to state $j$, and $S_t$ be the state at time step $t$. Figure 4 shows a transition probability $p$ between the two states (and $1-p$ to the same state, because the transition probabilities of a given state must sum up to 1). A generalized notion of a transition is captured by a conditional probability with the following property:

Figure 4
A simple Markov chain.

Markov property: For An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e313.jpg,

equation image

When An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e315.jpg, this probability is the transition probability An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e316.jpg. In the event An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e317.jpg only An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e318.jpg is relevant. In other words, the probability of a state at a given time depends only on the most recently known state.

What is the biological significance of the Markov chain in Figure 4? Each state represents a chromosome of the pair, and time in the Markov chain corresponds to genes on the chromosome. A transition between states in one time step signifies a crossover, and the probability of such a crossover is $p$. Therefore, $p_{ij}$ represents a crossover when $i \neq j$. One could then inquire about the probability of being in a given state at a given time. The event of being in a given state at time $t$ parallels the event that the corresponding chromosome is the source of the allele for gene $t$. This is illustrated in Figure 5 by conceptually duplicating the chain for each gene to reflect the change of state over time.

Figure 5
Crossover and recombination as a Markov chain.

A useful representation of a Markov chain is by a matrix $P$ where $P_{ij}$ (the term in the $i$-th row and $j$-th column of $P$) is the probability of transition from state $i$ to state $j$; therefore, every row in $P$ must add up to 1. If we call the states in Figure 4 state 1 and state 2, then our Markov chain can be expressed as:

$$P = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}$$

In this matrix, $P_{ij}$ can be interpreted as $\Pr(S_{t+1} = j \mid S_t = i)$. Why is this matrix representation useful? Let's multiply $P$ by itself:

$$P^2 = \begin{pmatrix} (1-p)^2 + p^2 & 2p(1-p) \\ 2p(1-p) & (1-p)^2 + p^2 \end{pmatrix}$$

Note for instance that $P^2_{12}$ is equal to $2p(1-p)$, because to transition from 1 to 2 in two time steps we can transition from 1 to 1 to 2 with probability $(1-p)p$ or from 1 to 2 to 2 with probability $p(1-p)$. As it turns out, $P^d_{ij} = \Pr(S_{t+d} = j \mid S_t = i)$. The proof of this fact is in the “Markov Transitional Probabilities” section and uses conditional probability and the Markov property. Thus, every row in $P^d$ must also add up to 1.
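The two-step computation is easy to reproduce by hand or by machine. Below is a minimal Python sketch, assuming the symmetric two-state chain of Figure 4 with crossover probability `p`; the value 0.1 and the helper names `transition_matrix`, `mat_mul`, and `mat_pow` are illustrative choices, not part of the original exposition.

```python
def transition_matrix(p):
    """Two-state chain: switch chromosome with probability p, stay with 1 - p."""
    return [[1 - p, p], [p, 1 - p]]

def mat_mul(a, b):
    """Multiply two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_pow(m, d):
    """d-step transition matrix, obtained by repeated multiplication."""
    result = [[1.0, 0.0], [0.0, 1.0]]  # identity: zero steps, no change of state
    for _ in range(d):
        result = mat_mul(result, m)
    return result

p = 0.1
P = transition_matrix(p)
P2 = mat_mul(P, P)
# Entry (1, 2) of the square: 1 -> 1 -> 2 plus 1 -> 2 -> 2
print(P2[0][1], 2 * p * (1 - p))
# Rows of any power still sum to 1
print([sum(row) for row in mat_pow(P, 5)])
```

Repeated multiplication is used instead of a faster exponentiation scheme because it mirrors the step-by-step reading of the chain.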

Because $P$ is a symmetric matrix ($P = P^T$), all powers of $P$ are symmetric matrices as well. Therefore, $P^d_{12} = P^d_{21}$, which now implies that every column in $P^d$ must also add up to 1. We can finally establish that the probability of recombination is

equation image
equation image
equation image
equation image
equation image
equation image

Segregation and Independent Assortment

Following the logic of previous sections, the probability that a given allele of gene An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e356.jpg is inherited is:

equation image

Again, if An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e358.jpg the above probability is An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e359.jpg, which makes the jumping model subject to the same sensitivity to An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e360.jpg as the uniform 1-crossover model. However, this can now be alleviated. The theory of Markov chains tells us that An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e361.jpg will converge for large values of An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e362.jpg and all rows of An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e363.jpg become identical. Therefore, the rows will define a steady state probability for each state. In other words, the effect of An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e364.jpg will be washed out. This theory will not be presented here, but Figure 6 shows a few powers of a given matrix An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e365.jpg.

Figure 6
Convergence to steady state probabilities.

Because An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e367.jpg is symmetric in our case,

equation image
equation image
equation image

Since rows and columns of An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e371.jpg must both add up to 1, An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e372.jpg converges to An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e373.jpg for large enough An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e374.jpg. By exchanging the roles of An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e375.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e376.jpg in the top expression, we also get An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e377.jpg, maintaining the segregation law for large enough distances when An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e378.jpg.

In addition, since both An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e379.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e380.jpg approach An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e381.jpg, we have that An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e382.jpg for large An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e383.jpg. This makes An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e384.jpg when An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e385.jpg is large. Therefore, genes An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e386.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e387.jpg are asymptotically independent, confirming the law of independent assortment for large enough distances.
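The approach to independence can also be observed by simulation. The sketch below is a hedged illustration, not the method of the article: it assumes a crossover probability `p` per step, picks the starting chromosome at random, walks `d` steps, and counts how often the chain ends on the other chromosome. The function name `simulate_recombination`, the trial count, and the seed are arbitrary choices.

```python
import random

def simulate_recombination(p, d, trials=20000, seed=0):
    """Estimate the probability that two genes d steps apart recombine."""
    rng = random.Random(seed)             # fixed seed for reproducibility
    recombined = 0
    for _ in range(trials):
        start = rng.choice((1, 2))        # chromosome supplying the first gene
        state = start
        for _ in range(d):
            if rng.random() < p:          # crossover: jump to the other chromosome
                state = 3 - state
        recombined += (state != start)    # alleles come from different chromosomes
    return recombined / trials

for d in (1, 5, 20, 50):
    print(d, simulate_recombination(0.1, d))
```

For a small `p`, the estimate starts near `p` at distance 1 and drifts toward one half as the distance grows, which is the asymptotic independence discussed above.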

Linkage (and Hotspots!)

The previous sections show that the probability of recombination of two genes at a distance $d$ is $P^d_{12}$ and that $P^d_{12}$ converges to $1/2$ for large values of $d$, thus establishing the laws of segregation and independent assortment. However, we wish to determine $P^d_{12}$ for every value of $d$. This will re-establish the above results. This time, however, instead of using the theory of matrices (e.g., eigen decomposition) to study how $P^d$ evolves, I will revert to elementary mathematics. Two genes at a distance $d$ from each other will recombine if and only if their chromosome experiences an odd number of crossovers along that distance. This is equivalent to the event of making an odd number of transitions between the two states of the Markov chain during $d$ time steps. Let $E_d$ be this event (thus $\Pr(E_d) = P^d_{12}$). It is not hard to see that

$$\Pr(E_d) = \Pr(E_{d-1})\,(1-p) + \Pr(\overline{E_{d-1}})\,p$$

Observe that $\Pr(\overline{E_{d-1}}) = 1 - \Pr(E_{d-1})$. Therefore, we can write:

$$\Pr(E_d) = \Pr(E_{d-1})\,(1-p) + \left(1 - \Pr(E_{d-1})\right)p$$

The Markov property is essential to justify the multiplication by $1-p$ and $p$ in the above equation because it makes the transition at time step $d$ independent of the history $E_{d-1}$. Technically, this transition does depend on the state at time step $d-1$, but given the symmetry in our Markov chain, the probability of changing state is always $p$. By rearranging and taking care of the special case when $d = 0$ we get:

$$\Pr(E_d) = \begin{cases} 0 & d = 0 \\ p + (1-2p)\,\Pr(E_{d-1}) & d > 0 \end{cases}$$

It is easy to verify that the solution

$$\Pr(E_d) = \frac{1 - (1-2p)^d}{2}$$

satisfies the above recurrence with a base case $\Pr(E_0) = 0$ (following the pattern of the recurrence, we can retrieve the above expression if we replace $d$ by $d-1$, multiply by $1-2p$, and add $p$).
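The verification can also be done numerically. The following minimal Python sketch assumes the recurrence reads "new probability equals p plus (1 - 2p) times the previous one, starting from 0 at distance 0" and that the closed form is one half of 1 minus (1 - 2p) to the power d; the function names are hypothetical.

```python
def recomb_by_recurrence(p, d):
    """Probability of an odd number of crossovers in d steps, via the recurrence."""
    r = 0.0                      # base case: distance 0, no recombination
    for _ in range(d):
        # odd stays odd with probability 1 - p, even becomes odd with probability p
        r = r * (1 - p) + (1 - r) * p
    return r

def recomb_closed_form(p, d):
    """Closed-form solution of the same recurrence."""
    return (1 - (1 - 2 * p) ** d) / 2

for p in (0.05, 0.3, 0.9):
    worst = max(abs(recomb_by_recurrence(p, d) - recomb_closed_form(p, d))
                for d in range(30))
    print(p, worst)   # differences at the level of floating-point rounding
```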

While it is easy to verify the solution, obtaining it should not remain a wild guess. By working out a few iterations for An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e417.jpg, the “Recurrence for An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e418.jpg” section shows how to derive the solution using a geometric series.

The mathematically savvy could verify that $1-2p$ is an eigenvalue of $P$, and that the same expression could have been obtained using a technique called eigen decomposition. This expression for $\Pr(E_d)$ reveals interesting properties (all can be verified from Figure 7):

Figure 7
The jumping model, two modes of recombination, for An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e437.jpg and An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e438.jpg.
  • When $d$ is large (and $0 < p < 1$), $(1-2p)^d$ goes to zero, causing $\Pr(E_d)$ to converge to $1/2$. This convergence was discussed in the previous section, and should not be surprising by now.
  • When $p < 1/2$ ($1-2p$ is positive), $(1-2p)^d$ is greater than zero and less than one, causing $\Pr(E_d)$ to increase with $d$ (linkage). This increase, however, is not linear as in the uniform 1-crossover model; therefore, it is biologically more realistic.
  • When $p > 1/2$ ($1-2p$ is negative), the sign of $(1-2p)^d$ alternates, causing $\Pr(E_d)$ to alternate between a typical value for $1-p$ and a high value (hotspots, captured here for the first time).

The jumping model captures the essential biology of crossover and recombination through the laws of segregation, linkage, and independent assortment. In addition, it reveals the non-typical high recombination probabilities of hotspots. Hotspots are regions on the chromosome that experience a high probability of recombination even at small distances. Therefore, depending on the parameter $p$, the jumping model embodies two modes of chromosomal recombination.

While a hotspot does not present a difficult concept, it is usually misinterpreted by students as a region with high probability of recombination. This is true if the region is small enough (a peak in Figure 7), which is biologically typical of hotspots. However, if the region is large enough, there can be a high probability of recombination only if there is a corresponding low probability, as seen by the alternating pattern in Figure 7. What is interesting about the jumping model (which may not be true biologically) is that this low probability is the typical one for the given distance when $p$ is replaced with $1-p$. This is also confirmed by the expression we derived for $\Pr(E_d)$, because when $p > 1/2$ and $\Pr(E_d) < 1/2$, $d$ is even and, therefore, $(1-2p)^d = (1-2(1-p))^d$:

$$\Pr(E_d) = \frac{1-(1-2p)^d}{2} = \frac{1-\left(1-2(1-p)\right)^d}{2}$$

The alternation itself should be intuitive because a high probability of recombination at a small distance must be driven by a high probability of crossover, which in turn means a high probability of crossing over back to the same chromosome. The jumping model captures this fact through the parameter $p$, with $1/2$ as the threshold above which the probability of crossover counts as high.
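The two modes can be tabulated directly from the closed-form expression. This is a minimal Python sketch, assuming the recombination probability at distance d is one half of 1 minus (1 - 2p) to the power d; the values 0.1 and 0.9 are illustrative choices and are not claimed to be those of Figure 7.

```python
def recomb(p, d):
    """Closed-form recombination probability at distance d (assumed form)."""
    return (1 - (1 - 2 * p) ** d) / 2

distances = range(1, 9)
typical = [recomb(0.1, d) for d in distances]   # p < 1/2: steady increase
hotspot = [recomb(0.9, d) for d in distances]   # p > 1/2: alternation

for d, t, h in zip(distances, typical, hotspot):
    # At even d the hotspot value coincides with the typical value for 1 - p;
    # at odd d it sits above 1/2.
    print(d, round(t, 4), round(h, 4))
```

The even-distance columns agree because replacing p with 1 - p only flips the sign of 1 - 2p, and an even power erases that sign.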

Back to the Days of Morgan

Morgan established that the probability of recombination as a function of distance is the following:

$$\frac{1 - e^{-2\delta}}{2}$$

which does not account for hotspots. In addition, the notion of distance $\delta$ in the above expression is not the same as ours. To see this, assume that $p$ is close to zero in the jumping model (no hotspots) and, therefore, $1/p$ is large. Using the exponential limit,

$$(1-2p)^{1/p} \approx e^{-2}$$

By making $d = \delta/p$, and replacing $(1-2p)^{1/p}$ with $e^{-2}$ in the expression obtained for $\Pr(E_d)$, we get

$$\Pr(E_d) = \frac{1 - \left((1-2p)^{1/p}\right)^{\delta}}{2} \approx \frac{1 - e^{-2\delta}}{2}$$

which has the same form as Morgan's expression. So what is $\delta$?

$$\delta = pd = \frac{d}{1/p}$$

where $d$ is the distance and $1/p$ is the average distance until the next crossover (because a crossover occurs with probability $p$). So $\delta$ is the average number of crossovers between the two genes, and this is how Morgan defined his distance.
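How good is the exponential approximation? The following minimal Python sketch compares the two expressions, assuming the jumping-model probability is one half of 1 minus (1 - 2p) to the power d and the exponential form is one half of 1 minus e to the power of minus twice the average number of crossovers (p times d). The function names `jumping` and `exponential_form` and the tested values of `p` are illustrative.

```python
import math

def jumping(p, d):
    """Jumping-model recombination probability at distance d (assumed form)."""
    return (1 - (1 - 2 * p) ** d) / 2

def exponential_form(avg_crossovers):
    """Exponential expression, with distance measured in average crossovers."""
    return (1 - math.exp(-2 * avg_crossovers)) / 2

p = 0.001
max_gap = max(abs(jumping(p, d) - exponential_form(p * d))
              for d in range(0, 5001, 50))
print(max_gap)                                    # small when p is close to zero

coarse_gap = abs(jumping(0.3, 1) - exponential_form(0.3))
print(coarse_gap)                                 # noticeably larger for p = 0.3
```

The agreement degrades as `p` grows, which is consistent with the derivation requiring `p` to be close to zero.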

Why This Way?

I could have simply argued that the probability of recombination $\Pr(E_d)$ is $\frac{1-(1-2p)^d}{2}$, and that this is consistent with the laws of inheritance. Since I chose a longer route, I will list what I believe are the important aspects of this exposition.

  • There is a rapid prototyping with a simple uniform 1-crossover model that reflects the essential biological properties of crossover and recombination (though not perfectly). This allows the student to quickly make a connection between the biology and the mathematics.
  • There is no need for advanced calculus or probability (e.g., no mention of Poisson processes or probability distributions other than uniform).
  • To achieve a better understanding of the biological properties, the exposition proceeds by pointing out the deficiencies of the simple model.
  • The simple model itself is a useful tool that is actually used for simulation, e.g., genetic algorithms.
  • Having a model (whether mathematical or not) provides some operational sense, so the biology is made more concrete.
  • Moving progressively through the models illustrates what it takes to make attempts, including wrong ones, in the modeling of biological systems.
  • Multiple models reinforce the ideas by exposing them in different settings.
  • Markov chains are useful as a tool for modern biological sciences and, therefore, introducing them in this context gives the student an early preparation.
  • The jumping model captures two modes of recombination, normal and hotspots, and puts them in their biological context by means of the parameter $p$.
  • The jumping model also provides the insight that the probability of crossover must be less than $1/2$ to observe the typical behavior of recombination (linkage), hence giving the correct impression that $p$ is rather small.
  • The alternating behavior of the jumping model corrects one major misunderstanding of hotspots.
  • Morgan's first result can be derived as a special case.
  • The jumping model can be described (not necessarily analyzed) very easily and satisfies all the required biological properties of crossover and recombination. Therefore, a student can effectively retain and communicate the recombination process.

A Computational Example of Genetic Mapping

Consider the hypothetical family in Table 5 where alleles take values in An external file that holds a picture, illustration, etc.
Object name is pcbi.1002462.e470.jpg (inspired by a homework assigned by Bonnie Berger at MIT).

Table 5
A hypothetical family and three genes [formula e471], [formula e472], and [formula e473] shown with their alleles.

To map the genes (genetic mapping), we count the number of recombinations, both paternal and maternal, for each pair of genes, [formula e498], [formula e499], and [formula e500]. Then we estimate the probabilities of recombination and relate them to distances.

There are [formula e501] recombinations of [formula e502] and [formula e503], [formula e504] recombinations of [formula e505] and [formula e506], and [formula e507] recombinations of [formula e508] and [formula e509]. Therefore, [formula e510] and [formula e511] recombine with probability [formula e512], [formula e513] and [formula e514] with probability [formula e515], and [formula e516] and [formula e517] with probability [formula e518]. Let's denote these probabilities by [formula e519], [formula e520], and [formula e521], respectively. If [formula e522] is large enough, [formula e523] and [formula e524].
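The counting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names A, B, C and the four gametes are hypothetical placeholders (they are not the family of Table 5), and each gamete is assumed to be already phased, that is, for every gene we know which of the parent's two homologs (labeled 0 or 1) the allele came from. A pair of genes is counted as recombinant in a gamete when the two labels differ.

```python
from itertools import combinations


def recombination_fractions(gametes, genes):
    """Estimate the recombination fraction for every pair of genes.

    gametes: list of dicts mapping a gene name to 0 or 1, the parental
             homolog from which the allele at that gene was inherited
             (assumed known, i.e. the data is phased).
    genes:   list of gene names to compare pairwise.
    """
    fractions = {}
    for g1, g2 in combinations(genes, 2):
        # A gamete is recombinant for (g1, g2) when the two alleles
        # come from different homologs of the same parent.
        recombinants = sum(1 for gam in gametes if gam[g1] != gam[g2])
        fractions[(g1, g2)] = recombinants / len(gametes)
    return fractions


# Hypothetical gametes for illustration only (not the data of Table 5).
example_gametes = [
    {"A": 0, "B": 0, "C": 0},
    {"A": 0, "B": 0, "C": 1},
    {"A": 1, "B": 1, "C": 1},
    {"A": 0, "B": 1, "C": 1},
]

if __name__ == "__main__":
    for pair, frac in recombination_fractions(example_gametes, ["A", "B", "C"]).items():
        print(pair, frac)
```

With these made-up gametes the pair (A, C) has the largest estimated fraction, which would suggest placing B between A and C. With so few gametes the estimates are of course very noisy.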

First Attempt

Since [formula e525] (for [formula e526] and [formula e527]), and it is not generally assumed that genes represent hotspots, we might suspect that our knowledge of the alleles of gene [formula e528] is wrong. It is more plausible that the alleles of gene [formula e529] are 1,0 for the father and mother, as shown in Table 6.

Table 6
The same hypothetical family after the alleles of gene A have been switched.

This will make [formula e544] and will keep [formula e545]. Since the probability of recombination of distant genes is higher, the order of genes is [formula e546], [formula e547], [formula e548] or [formula e549], [formula e550], [formula e551].

This solution puts [formula e552] and [formula e553] at equal distances from [formula e554] and, therefore, makes the distance from [formula e555] to [formula e556] twice the distance from [formula e557] to [formula e558] (and that from [formula e559] to [formula e560]). However, doubling the distance should not double the probability of recombination unless the probability is a linear function of distance, as in the uniform 1-crossover model. We may adopt this model here if we know in advance that only one crossover occurs; this conditioning makes the crossover uniform even when the underlying model is the jumping one (because of the symmetry in the Markov chain). For this argument to work we will also need [formula e561]; otherwise, we observe a double crossover for Offspring [formula e562] in Table 6.

Second Attempt

If we believe that our knowledge of the alleles in Table 5 is correct, then the genes are in a hotspot region. The obtained probabilities [formula e563] and [formula e564] must correspond to the alternating pattern in Figure 7. Therefore, the order is again [formula e565], [formula e566], [formula e567] or [formula e568], [formula e569], [formula e570], with [formula e571] situated at equal distances from [formula e572] and [formula e573]. But are the probabilities consistent? In the jumping model, one could easily show that [formula e574]. Therefore, we must verify that [formula e575], so we will need [formula e576] to be small enough. Note also that if [formula e577] is small enough, the probability that [formula e578] and [formula e579] recombine is [formula e580], which is consistent. Moreover, the probability of a double crossover is [formula e581], which is the proportion of offspring in Table 5 that exhibit the double crossover.

A Possible Delivery Method

Here's a possible method for delivering the content of this exposition to students:

  1. Describe the recombination process and genetic linkage with the uniform 1-crossover model as a hypothetical prototype, and explain how genetic mapping can be done based on observed probabilities. Introduce hotspots as an exception to the normal behavior of recombination.
  2. As part of a homework assignment, ask which biological properties are satisfied by the uniform 1-crossover model and which are not. Assume that [formula e582] in the “Segregation” section is [formula e583]. In addition, ask the students to solve a genetic mapping problem with the biological properties in mind and determine whether hotspots are involved or not.
  3. (optional) As an advanced question, ask to prove that a uniform 2-crossover model breaks the linkage property.
  4. Provide solutions and briefly go over them in class. Introduce Markov chains and the jumping model.
  5. As a programming assignment, ask to simulate the jumping model with various values of the parameter [formula e584] and observe how the probability of recombination changes with distance. Assume that [formula e585] in the “Segregation” section is [formula e586].
  6. Provide solutions and wrap up by explaining some of the properties of a Markov chain through the jumping model, including the ability to model hotspots.
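The programming assignment in step 5 could look roughly like the sketch below. It rests on an assumption about the jumping model that is not spelled out in this part of the text: the chain is taken to be a symmetric two-state Markov chain in which, at each unit step along the chromosome, the copied strand switches to the other homolog with probability `p`. Two genes `distance` steps apart recombine when the number of switches between them is odd. The names `simulate_recombination`, `jumping_closed_form`, `p`, and `distance` are illustrative choices, and the closed form (1 - (1 - 2p)^distance) / 2 is the standard result for this assumed chain.

```python
import random


def simulate_recombination(p, distance, trials=20000, seed=0):
    """Monte Carlo estimate of the recombination probability.

    Assumed model: at each of `distance` unit steps the strand switches
    homolog with probability p; recombination means ending on the
    opposite homolog from the starting one.
    """
    rng = random.Random(seed)
    recombined = 0
    for _ in range(trials):
        state = 0  # 0 = same homolog as at the first gene
        for _ in range(distance):
            if rng.random() < p:
                state ^= 1  # jump to the other homolog
        recombined += state
    return recombined / trials


def jumping_closed_form(p, distance):
    """Probability of an odd number of switches in `distance` steps."""
    return (1 - (1 - 2 * p) ** distance) / 2


if __name__ == "__main__":
    for p in (0.05, 0.3, 0.9):  # 0.9 plays the role of a hotspot-like regime
        for d in (1, 2, 3, 4, 5):
            print(p, d, round(simulate_recombination(p, d), 3),
                  round(jumping_closed_form(p, d), 3))
```

Under this assumption, small values of `p` give a probability that grows steadily with distance toward 1/2, while values of `p` above 1/2 make it alternate above and below 1/2, which is one way students can see the alternating behavior for themselves.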

Uniform 2-Crossover Model

The derivation of the result is as follows:

equation image
equation image
equation image
equation image

By the exclusivity of events, this is

equation image
equation image
equation image
equation image
equation image
equation image

and since [formula e597] means [formula e598] is in [formula e599], this is

equation image
equation image
equation image
equation image
equation image
equation image

when [formula e606] is large.
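For readers who want a numerical feel for why two crossovers break linkage, here is a small sketch. It uses a continuous stand-in for the model, which is an assumption and may differ in detail from the discrete derivation above: the chromosome is the interval [0, 1], the two crossover points are independent and uniform on it, and two genes recombine when exactly one crossover falls between them. Under this assumption, genes at distance `d` recombine with probability 2d(1 - d), which stops increasing once `d` exceeds 1/2. The function names are illustrative.

```python
import random


def two_crossover_recombination(d):
    """Closed form under the continuous assumption: exactly one of two
    independent uniform crossover points lies between the genes."""
    return 2 * d * (1 - d)


def simulate_two_crossover(left, right, trials=50000, seed=1):
    """Monte Carlo check for genes located at positions left < right."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Count how many of the two crossover points land between the genes.
        between = sum(left < rng.random() < right for _ in range(2))
        hits += between % 2  # an odd count means recombination
    return hits / trials


if __name__ == "__main__":
    for d in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(d, round(two_crossover_recombination(d), 3),
              round(simulate_two_crossover(0.05, 0.05 + d), 3))
```

The printed values rise up to `d = 0.5` and then fall, so two very distant genes would look as tightly linked as two very close ones.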

Markov Transitional Probabilities

The proof is by induction where [formula e607] is the base case.

equation image
equation image
equation image

By exclusivity of the two events, this is:

equation image
equation image

Note that

equation image

which can be derived from the definition of conditional probability. Therefore, we can rewrite the above as:

equation image
equation image

By the Markov property this is:

equation image
equation image
equation image

The second-to-last equality represents the inductive step of the proof. The last equality follows immediately from the definition of matrix multiplication.
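The statement proved here, that multi-step transition probabilities are given by powers of the one-step transition matrix, can be checked numerically. The sketch below is only a sanity check, not a replacement for the induction: it compares the matrix power with a direct sum over all intermediate states, obtained by conditioning on one step at a time. The 2 by 2 matrix in the example is a hypothetical symmetric chain chosen for illustration.

```python
def mat_mul(a, b):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(p, n):
    """n-th power of the transition matrix p (n >= 0)."""
    size = len(p)
    result = [[float(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, p)
    return result


def n_step_by_conditioning(p, i, j, n):
    """Probability of going from state i to state j in n steps, computed
    by conditioning on the first step and recursing on the rest."""
    if n == 0:
        return 1.0 if i == j else 0.0
    return sum(p[i][k] * n_step_by_conditioning(p, k, j, n - 1)
               for k in range(len(p)))


if __name__ == "__main__":
    # Hypothetical symmetric two-state chain.
    P = [[0.9, 0.1], [0.1, 0.9]]
    print(mat_pow(P, 4))
    print([[n_step_by_conditioning(P, i, j, 4) for j in range(2)]
           for i in range(2)])
```

Both printouts should agree up to floating-point rounding, which is exactly what the last equality of the proof says.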

Recurrence for [formula e619]

Knowing that [formula e620], we have a recurrence for [formula e621] that we can solve, [formula e622]. To obtain [formula e623] we multiply [formula e624] by [formula e625] and add [formula e626]. Here are a few attempts:

equation image
equation image
equation image
equation image
equation image
equation image
equation image
equation image

We can easily generalize those attempts to obtain a geometric series:

equation image
equation image
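The pattern "multiply by a constant and add a constant" can be explored numerically before generalizing. The sketch below assumes the recurrence has the affine form x_{n+1} = a·x_n + b with a ≠ 1, and uses the symbols `a`, `b`, and `x1` as hypothetical stand-ins for the quantities in the text. It compares direct iteration with the closed form obtained by summing the geometric series 1 + a + ... + a^(n-2).

```python
def iterate_recurrence(a, b, x1, n):
    """Compute x_n by applying x -> a*x + b exactly n - 1 times to x_1."""
    x = x1
    for _ in range(n - 1):
        x = a * x + b
    return x


def affine_closed_form(a, b, x1, n):
    """x_n = a^(n-1) * x_1 + b * (1 - a^(n-1)) / (1 - a), valid for a != 1.

    The second term is b times the geometric series 1 + a + ... + a^(n-2).
    """
    return a ** (n - 1) * x1 + b * (1 - a ** (n - 1)) / (1 - a)


if __name__ == "__main__":
    a, b, x1 = 0.8, 0.1, 0.1  # illustrative values only
    for n in range(1, 8):
        print(n, iterate_recurrence(a, b, x1, n), affine_closed_form(a, b, x1, n))
```

The two columns should match up to rounding, and for |a| < 1 both approach b / (1 - a) as n grows.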

Conclusion

I am not aware of any other exposition of chromosomal crossover, recombination, genetic linkage, hotspots, and genetic mapping that takes the approach outlined herein. The approach represents a simple and modern treatment of an ancient subject, without compromising its scientific and mathematical integrity.

The reader should find an insightful explanation with a focus on reinforcing the ideas by exposing them in different settings. In addition, there is an attempt to introduce the reader to the process of modeling by showing what works and what doesn't. Most importantly, this should provide an early chance to convey to our students that biology is a computational science.

Disclaimer

I ignored some of the biological detail in favor of simplicity and consistency. Keep in mind, however, that in biology there is always an exception to the rule!

Further Readings

There is no explicit referencing in the text. This is intentional. I used what everyone would now consider folklore from biology, probability, and calculus. All can be found in textbooks, even elementary ones. For the interested reader, however, and in addition to any introductory texts on probability and calculus, here is a list (in alphabetical order by author) of book chapters that will provide enough background for further endeavors.

  1. Gallager RG (1996) Finite State Markov Chains. In: Discrete Stochastic Processes (pp. 103–112). Norwell, MA: Kluwer Academic Publishers.
  2. Hunter LE (2009) Evolution. In: The Process of Life: An Introduction to Molecular Biology (pp. 19–47). Cambridge, MA: The MIT Press.
  3. Lovász L, Pelikán J, Vesztergombi K (2003) Combinatorial Probability. In: Discrete Mathematics: Elementary and Beyond (pp. 77–80, Uniform Probability). New York, NY: Springer.
  4. Pevzner PA (2001) Computational Gene Hunting. In: Computational Molecular Biology: An Algorithmic Approach (pp. 1–18). Cambridge, MA: The MIT Press.
  5. Stein C, Drysdale RL, Bogart K (2011) Probability. In: Discrete Mathematics for Computer Scientists (pp. 276–279, Conditional Probability). Boston, MA: Pearson Education Inc. (Addison-Wesley).

Acknowledgments

I would like to thank the QuBi (QUantitative BIology) program committee at Hunter College for their encouragement to write this article, and the reviewers for their valuable suggestions to improve it.

Footnotes

The author has declared that no competing interests exist.

The author received no specific funding for this article.


Articles from PLoS Computational Biology are provided here courtesy of Public Library of Science