Human Molecular Genetics 2
2nd edition
Tom Strachan (1) and Andrew P Read (2)
(1) University of Newcastle, Newcastle-upon-Tyne, UK
(2) University of Manchester, Manchester, UK
BIOS Scientific Publishers Ltd, 1999. ISBN 1-85996-202-5

 Chapter 11:  Genetic mapping of mendelian characters


11.1. Recombinants and nonrecombinants

Figure 11.1. Recombinants and nonrecombinants

Alleles at two loci (locus A, alleles A1 and A2; locus B, alleles B1 and B2) are segregating in this family. Where this can be deduced, the combination of alleles a person received from his or her father is boxed. Persons in generation III who received either A1B1 or A2B2 from their father are the product of nonrecombinant sperm; persons who received A1B2 or A2B1 are recombinant. The information shown does not enable us to classify any of the individuals in generations I and II as recombinant or nonrecombinant, nor does it identify recombinants arising from oogenesis in individual II2.

In principle, genetic mapping in humans is exactly the same as genetic mapping in any other sexually reproducing diploid organism. The aim is to discover how often two loci are separated by meiotic recombination. Consider a person who is heterozygous at two loci, and so types as A1A2 B1B2. Suppose the alleles A1 and B1 in this person came from one parent, and A2 and B2 from the other. Any of that person's children who inherit one of these parental combinations (A1B1 or A2B2) is nonrecombinant, whereas children who inherit A1B2 or A2B1 are recombinant (Figure 11.1). The proportion of children who are recombinant is the recombination fraction between the two loci A and B.
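As a minimal sketch of the counting described above, the following Python fragment classifies the haplotype each child received from a phase-known double heterozygote and reports the proportion that are recombinant. The list of transmitted haplotypes is invented for illustration and is not read from Figure 11.1; the function names are likewise hypothetical.

```python
from fractions import Fraction

def classify(haplotype, parental):
    """Return 'NR' if the transmitted haplotype is one of the two parental
    combinations, otherwise 'R' (recombinant)."""
    return "NR" if haplotype in parental else "R"

def recombination_fraction(transmitted, parental):
    """Proportion of transmitted haplotypes that are recombinant."""
    labels = [classify(h, parental) for h in transmitted]
    return Fraction(labels.count("R"), len(labels))

# Phase of the doubly heterozygous parent: A1 arrived with B1, A2 with B2.
parental = {("A1", "B1"), ("A2", "B2")}

# Hypothetical haplotypes passed to seven children (illustrative data only).
transmitted = [
    ("A1", "B1"), ("A2", "B2"), ("A1", "B2"), ("A2", "B2"),
    ("A1", "B1"), ("A2", "B1"), ("A1", "B1"),
]

theta = recombination_fraction(transmitted, parental)
print(theta, round(float(theta), 3))
```

With these invented data two of the seven haplotypes are recombinant. The calculation only works because the parent's phase is known; without it the set of parental combinations could not be written down.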

11.1.1. The recombination fraction is a measure of genetic distance

If two loci are on different chromosomes, they will segregate independently. Considering spermatogenesis in individual II1 in Figure 11.1, at the end of meiosis I, whichever sperm receives allele A1, there is a 50% chance that it will receive allele B1 and a 50% chance it will receive B2. Thus, on average, 50% of the children will be recombinant and 50% nonrecombinant. The recombination fraction is 0.5. If the loci are syntenic, that is if they lie on the same chromosome, then they might be expected always to segregate together, with no recombinants. However, this simple expectation ignores meiotic crossovers. During prophase of meiosis I, pairs of homologous chromosomes synapse and exchange segments (Figure 2.14). Only two of the four chromatids are involved in any particular crossover. A crossover, if it occurs between the positions of the two loci, will create two recombinant chromatids carrying A1B2 and A2B1, and leave the two noninvolved chromatids nonrecombinant. Thus one crossover generates 50% recombinants between loci flanking it.

Recombination will rarely separate loci that lie very close together on a chromosome, because only a crossover located precisely in the small space between the two loci will create recombinants. Therefore sets of alleles on the same small chromosomal segment tend to be transmitted as a block through a pedigree. Such a block of alleles is known as a haplotype. Haplotypes mark recognizable chromosomal segments which can be tracked through pedigrees and through populations. When not broken up by recombination, haplotypes can be treated for mapping purposes as alleles at a single highly polymorphic locus.

The further apart two loci are on a chromosome, the more likely it is that a crossover will separate them. Thus the recombination fraction is a measure of the distance between two loci. Recombination fractions define genetic distance, which is not the same as physical distance. Two loci that show 1% recombination are defined as being 1 centimorgan (cM) apart on a genetic map.

11.1.2. Recombination fractions do not exceed 0.5 however great the physical distance

Figure 11.2. Single and double recombinants

Each crossover involves two of the four chromatids of the two synapsed homologous chromosomes. The black chromosome carries alleles A1 and B1 at two loci, while the blue chromosome carries alleles A2 and B2. Gametes in which the chromatid is the same color at the two loci are nonrecombinant for these loci, those where the chromatids are different colors are recombinant. (A) A single crossover generates two recombinant and two nonrecombinant chromatids. (B) A two-strand double crossover leaves flanking markers nonrecombinant on all four chromatids. (C) A three-strand double crossover leaves flanking markers recombinant on two of the four strands. (D) A four-strand double crossover generates 100% recombinants. The three types of double crossover occur in random proportions, so the average effect of a double crossover is to give 50% recombinants.

A single recombination event produces two recombinant and two nonrecombinant chromatids. When loci are well separated there may be more than one crossover between them. Double crossovers can involve two, three or four chromatids, but Figure 11.2 shows that the overall effect, averaged over all double crossovers, is to give 50% recombinants. Loci very far apart on the same chromosome might be separated by three, four or more crossovers. Again, the overall effect is to give 50% recombinants. Recombination fractions never exceed 0.5, however far apart the loci are.
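The averaging argument can be checked by brute-force enumeration. The sketch below is a simplified model, not taken from the source: it assumes that every crossover joins one black and one blue chromatid segment chosen with equal probability, and that successive crossovers choose their strands independently (no chromatid interference). Strands are labeled by their color at locus A, and only the color of the segment carrying locus B is tracked.

```python
from collections import Counter
from itertools import product

# Color at locus A for the four chromatids of the bivalent (strands 0-3).
PROXIMAL = ("black", "black", "blue", "blue")

def crossover(distal, i, j):
    """Swap the segments distal to the crossover between strands i and j."""
    new = list(distal)
    new[i], new[j] = new[j], new[i]
    return tuple(new)

def non_sister_pairs(distal):
    """All (black strand, blue strand) pairs at the crossover position."""
    blacks = [k for k, c in enumerate(distal) if c == "black"]
    blues = [k for k, c in enumerate(distal) if c == "blue"]
    return list(product(blacks, blues))

def recombinant_count(distal):
    """Number of chromatids whose color differs between loci A and B."""
    return sum(p != d for p, d in zip(PROXIMAL, distal))

def enumerate_crossovers(n):
    """Tally recombinant-chromatid counts over every equally likely choice
    of strands for n successive crossovers between A and B."""
    states = [PROXIMAL]
    for _ in range(n):
        states = [crossover(s, i, j)
                  for s in states for i, j in non_sister_pairs(s)]
    return Counter(recombinant_count(s) for s in states)

def mean_recombinant_fraction(tally):
    total_chromatids = 4 * sum(tally.values())
    return sum(k * v for k, v in tally.items()) / total_chromatids

for n in (1, 2, 3):
    tally = enumerate_crossovers(n)
    print(n, dict(tally), mean_recombinant_fraction(tally))
```

Under these assumptions a single crossover always leaves two recombinant chromatids, double crossovers fall into the two-strand, three-strand and four-strand classes in the ratio 1 : 2 : 1, and the mean proportion of recombinant chromatids stays at one half however many crossovers are added.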

11.1.3. Mapping functions define the relationship between recombination fraction and genetic distance

Because recombination fractions never exceed 0.5, they are not simply additive across a genetic map. If a series of loci, A, B, C, … are located at 5 cM intervals on a map, locus M may be 60 cM from locus A, but the recombination fraction between A and M will not be 60%. The mathematical relationship between recombination fraction and genetic map distance is described by the mapping function. If crossovers occurred at random along a bivalent and had no influence on one another, the appropriate mapping function would be Haldane's function:

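\[ \Theta = -\tfrac{1}{2}\ln(1 - 2\theta) \quad\text{or equivalently}\quad \theta = \tfrac{1}{2}\left[1 - \exp(-2\Theta)\right] \]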

where Θ is the map distance (in Morgans; 1 Morgan = 100 cM) and θ the recombination fraction; as usual ln means logarithm to the base e, and exp means ‘e to the power of’. However, we know that crossovers do not occur at random. The presence of one chiasma inhibits formation of a second chiasma nearby. This phenomenon is called interference. A variety of mapping functions exist that allow for varying degrees of interference. A widely used function for human mapping is Kosambi's function:

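\[ \Theta = \tfrac{1}{4}\ln\left[\frac{1 + 2\theta}{1 - 2\theta}\right] \quad\text{or equivalently}\quad \theta = \tfrac{1}{2}\,\frac{\exp(4\Theta) - 1}{\exp(4\Theta) + 1} \]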

A mapping function is needed in multipoint mapping (Section 11.4) to convert the raw data on the recombination fraction into a genetic map. The interested reader should consult Ott's book (see Further reading) and Broman and Weber (1998) for a fuller discussion of mapping functions.
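To make the non-additivity concrete, the sketch below codes the standard forms of Haldane's and Kosambi's functions and applies them to the 60 cM example from the start of this section. Map distance is taken in Morgans (60 cM = 0.6 Morgan); the function names are illustrative and do not come from any linkage package.

```python
import math

def haldane_theta(d):
    """Recombination fraction for map distance d (Morgans), no interference."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))

def haldane_distance(theta):
    """Map distance (Morgans) for recombination fraction theta < 0.5."""
    return -0.5 * math.log(1.0 - 2.0 * theta)

def kosambi_theta(d):
    """Recombination fraction for map distance d (Morgans), Kosambi's function.
    0.5 * tanh(2d) is the same as 0.5 * (exp(4d) - 1) / (exp(4d) + 1)."""
    return 0.5 * math.tanh(2.0 * d)

def kosambi_distance(theta):
    """Map distance (Morgans) for recombination fraction theta < 0.5."""
    return 0.25 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))

for cm in (1, 5, 20, 60, 200):
    d = cm / 100.0
    print(f"{cm:>4} cM  Haldane {haldane_theta(d):.3f}  Kosambi {kosambi_theta(d):.3f}")
```

For loci 60 cM apart the two functions give recombination fractions of roughly 0.35 and 0.42 respectively, not 0.60. At short distances both are close to the map distance itself, and at long distances both approach 0.5.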

11.1.4. The relation between physical and genetic distances is not constant across the genome

Chiasma counts in human male meiosis show an average of 49 crossovers per cell (Morton et al., 1982). Since each crossover gives 50% recombinants, the chiasma count implies a total male genetic map length of 2450 cM. The current version of the Location Database (Collins et al., 1996) suggests a total male map length of 2851 cM. Chiasmata are more frequent in female meiosis (exemplifying Haldane's rule that the heterogametic sex has the lower chiasma count), and the total female map length in the Location Database is 4296 cM (excluding the X). Thus over the 3000 Mb autosomal genome, 1 male cM averages 1.05 Mb and 1 female cM averages 0.70 Mb; the sex-averaged figure is 1 cM = 0.88 Mb.
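The arithmetic in this paragraph can be reproduced in a few lines. All inputs are the figures quoted above. The one assumption, flagged in the code, is that the sex-averaged value is the mean of the male and female Mb-per-cM ratios, which is the reading that reproduces 0.88.

```python
AUTOSOMAL_GENOME_MB = 3000         # approximate autosomal genome size used in the text
CROSSOVERS_PER_MALE_MEIOSIS = 49   # Morton et al., 1982
MALE_MAP_CM = 2851                 # Location Database
FEMALE_MAP_CM = 4296               # Location Database, excluding the X

# One crossover makes half of the chromatids recombinant, so it adds 50 cM.
chiasma_based_male_map_cm = CROSSOVERS_PER_MALE_MEIOSIS * 50

male_mb_per_cm = AUTOSOMAL_GENOME_MB / MALE_MAP_CM
female_mb_per_cm = AUTOSOMAL_GENOME_MB / FEMALE_MAP_CM

# Assumption: "sex-averaged" means the mean of the two ratios above.
sex_averaged_mb_per_cm = (male_mb_per_cm + female_mb_per_cm) / 2

print(chiasma_based_male_map_cm)
print(round(male_mb_per_cm, 2), round(female_mb_per_cm, 2),
      round(sex_averaged_mb_per_cm, 2))
```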

Figure 11.3. Relation of physical and genetic maps of chromosome 19

180 markers from chromosome 19 were mapped genetically and physically. The physical map of the 65 Mb chromosome is compared with genetic maps separately computed for male and female meioses. Note the uneven distribution of recombinants along the chromosome, with more recombination towards the telomeres, and the varying male:female recombination ratio. The female map is about 10% longer than the male map; for most chromosomes the difference is more marked. Data from Mohrenweiser et al. (1998).

The approximation 1 cM = 1 Mb is a useful rule of thumb, but the actual correspondence varies widely for different chromosomal regions. In general, there is more recombination towards the telomeres of chromosomes in males, while centromeric regions have recombinants in females but not in males (see Figure 11.3 and Broman et al., 1998). The most extreme deviation is shown by the pseudoautosomal region at the tip of the short arms of the X and Y chromosomes (see Figure 14.7). Males have an obligatory crossover within this 2.6 Mb region, so that it is 50 cM long. Thus, for this region in males 1 Mb = 19 cM, whereas in females 1 Mb = 2.7 cM. Uniquely, the Y chromosome, outside the pseudoautosomal region, has no genetic map because it is not subject to synapsis and crossing over in normal meiosis. The X chromosome of course undergoes normal recombination in females, and can be genetically mapped in female meioses.

11.2. Genetic markers

11.2.1. Mapping human disease genes requires genetic markers

Since most human geneticists are interested in diseases, we would like a map to show the order and distance apart of all disease genes. Scoring the recombination fraction between pairs of diseases would be the obvious way to construct such a map, but disease-disease mapping is not possible in humans. Defining recombinants, as we have seen (Figure 11.1) requires double heterozygotes. People heterozygous for two different diseases are extremely rare. Even if they can be found, they will probably have no children, or be unsuitable for genetic analysis in some other way. For this reason human genetic mapping depends on markers. Any mendelian character can in principle be used as a genetic marker. It helps if the character can be scored easily and cheaply using readily available material (blood cells rather than a brain biopsy), but the crucial thing is that it should be sufficiently polymorphic that a randomly selected person has a good chance of being heterozygous. Box 11.1 summarizes the development of human genetic markers, from blood groups and polymorphisms of serum proteins through to the present generation of DNA microsatellites and single nucleotide polymorphisms.

Gene mappers could not set out to map a disease with a reasonable hope of success until markers were available that were spaced throughout the genome. Disease-marker mapping, if it is not to be a purely blind exercise, requires framework maps of markers. These are generated by marker-marker mapping. Although in theory linkage can be detected between loci 40 cM apart, the amount of data required to do this is prohibitive. Ten meioses are sufficient to give evidence of linkage if there are no recombinants, but 85 meioses would be needed to give equally strong evidence of linkage if the recombination fraction was 0.3 (see Box 11.3 for a guide to these calculations). Obtaining enough family material to test much more than 30 meioses can be seriously difficult for a rare disease. Thus mapping requires markers spaced at intervals no greater than about 20 cM across the genome. Given the genome lengths calculated above, this means that we need a minimum of 150 markers. Allowing for imperfect informativeness (see below), we need at least 300. In fact much denser maps, down to 1 cM or less average spacing of markers, are needed to guide progress from initial mapping of a disease through to cloning the gene. A major achievement of the Human Genome Project has been to generate upwards of 10 000 highly polymorphic markers and place them on framework maps (Collins et al., 1996; Broman et al., 1998).
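One way to see where figures such as 10 and 85 meioses come from is to compute the lod score expected from a single phase-known, fully informative meiosis and ask how many are needed to reach a lod of 3. This is a sketch under that assumption, not the calculation of Box 11.3 itself; the round genome length of 3000 cM used for the marker count is also an assumption.

```python
import math

def expected_lod_per_meiosis(theta):
    """Expected lod contributed by one phase-known informative meiosis when
    the true recombination fraction is theta and the lod is evaluated at theta."""
    if theta == 0:
        return math.log10(2)          # every meiosis is nonrecombinant
    return (theta * math.log10(2 * theta)
            + (1 - theta) * math.log10(2 * (1 - theta)))

def meioses_needed(theta, target_lod=3.0):
    """Smallest number of meioses whose expected total lod reaches target_lod."""
    return math.ceil(target_lod / expected_lod_per_meiosis(theta))

def markers_needed(genome_cm, spacing_cm):
    """Minimum number of evenly spaced markers to cover genome_cm."""
    return math.ceil(genome_cm / spacing_cm)

for theta in (0.0, 0.1, 0.2, 0.3):
    print(theta, meioses_needed(theta))

print(markers_needed(3000, 20))   # assumed genome length of about 3000 cM
```

With no recombination ten meioses suffice; at a recombination fraction of 0.3 the requirement rises into the mid-80s, in line with the figure quoted above. The steep rise explains why widely spaced markers are of little practical use for rare diseases.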

11.2.2. The heterozygosity or polymorphism information content measures how informative a marker is

For linkage analysis we need informative meioses (see Box 11.2). The examples in the box show that a meiosis is not informative with a given marker if the parent is homozygous for the marker, and also in half of the cases where both parents have the same heterozygous genotype. For most purposes the mean heterozygosity of a marker (the chance that a randomly selected person will be heterozygous) is used as the measure of informativeness. If there are marker alleles A1, A2, A3 … with gene frequencies p₁, p₂, p₃ …, then the proportion of people who are heterozygous is 1 - (p₁² + p₂² + p₃² + …) (Section 3.1). A more sophisticated measure, the polymorphism information content (PIC), allows for couples who are both heterozygous A1A2. Half their children will also be A1A2 and therefore uninformative. The PIC of a marker is given by:
\[ \mathrm{PIC} = 1 - \sum_{i=1}^{n} p_i^2 - \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} 2p_i^2 p_j^2 \]
where p_i is the frequency of the ith allele and n is the number of alleles. The third term takes out half the matings of similar heterozygotes. For X-linked markers the PIC and heterozygosity are the same; for autosomal markers the heterozygosity somewhat overstates the informativeness, especially for 2-allele markers. For an autosomal marker with two alleles of equal frequency the heterozygosity is 0.5 but the PIC is only 0.375.
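Both measures are easy to compute from a list of allele frequencies. The sketch below uses the usual definitions: heterozygosity is 1 minus the sum of squared allele frequencies, and the PIC subtracts a further 2(p_i p_j)² for every pair of different alleles. The ten-allele marker in the example is hypothetical.

```python
from itertools import combinations

def heterozygosity(freqs):
    """Chance that a random person is heterozygous: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)

def pic(freqs):
    """Polymorphism information content: heterozygosity minus
    2 * (p_i * p_j)^2 summed over every pair of different alleles."""
    penalty = sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return heterozygosity(freqs) - penalty

two_allele = [0.5, 0.5]        # e.g. the best possible RFLP or SNP
ten_allele = [0.1] * 10        # hypothetical microsatellite, equal frequencies

for name, freqs in (("2 alleles", two_allele), ("10 alleles", ten_allele)):
    print(name, round(heterozygosity(freqs), 3), round(pic(freqs), 3))
```

For two equally frequent alleles this reproduces the values 0.5 and 0.375 given above. With many alleles the penalty term becomes small, so the two measures nearly coincide for highly polymorphic markers.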

11.2.3. DNA polymorphisms are the basis of all current genetic markers

In the early 1980s, DNA polymorphisms provided, for the first time, a set of markers that were sufficiently numerous and spaced across the entire genome. DNA markers have the additional advantage that they can all be typed by the same technique. Moreover their chromosomal location can be determined using FISH or radiation hybrid mapping (Sections 10.1 and 10.2), allowing DNA-based genetic maps to be cross-referenced to physical maps. This avoids the frustrating situation that arose when the long-sought cystic fibrosis gene (CFTR) was first mapped. Linkage was established to a protein polymorphism of the enzyme paraoxonase, but the chromosomal location of the paraoxonase gene was not known. The development of DNA markers allowed human gene mapping to start in earnest.

Restriction fragment length polymorphisms (RFLPs)

The first generation of DNA markers were restriction fragment length polymorphisms (RFLPs). RFLPs were initially typed by preparing Southern blots from restriction digests of the test DNA, and hybridizing with radiolabeled probes (see Figure 5.12). This technology required plenty of time, money and DNA, and made a whole genome search a heroic undertaking. Nowadays this is less of a problem because RFLPs can usually be typed by PCR. A sequence including the variable restriction site is amplified, the product is incubated with the appropriate restriction enzyme and then run out on a gel to see if it has been cut (see Figure 6.6). A more fundamental limitation is their limited informativeness. RFLPs have only two alleles: the site is present or it is absent. The maximum heterozygosity is 0.5. Disease mapping using RFLPs is frustrating because all too often a key meiosis in a family turns out to be uninformative.

Minisatellites

Minisatellite VNTR (variable number tandem repeat) markers were a great improvement. The VNTRs have many alleles and high heterozygosity. Most meioses are informative. However, the technical problems of Southern blotting and radioactive probes were still an obstacle to easy mapping, and VNTRs are not evenly spread across the genome.

Microsatellites

The advent of PCR finally made mapping relatively quick and easy. Minisatellites are too long to amplify well, and so the standard tools for PCR linkage analysis are microsatellites. These are mostly (CA)n repeats. Tri- and tetranucleotide repeats are gradually replacing dinucleotide repeats as the markers of choice because they give cleaner results - dinucleotide repeat sequences are peculiarly prone to replication slippage during PCR amplification. Each allele gives a little ladder of ‘stutter bands’ on a gel, making it hard to read (see Figure 6.8). Much effort has been devoted to producing compatible sets of microsatellite markers that can be amplified together in a multiplex PCR reaction and give nonoverlapping allele sizes, so that they can be run in the same gel lane. With fluorescent labeling in several colors, it is possible to score perhaps ten markers on a sample in a single lane of an automated gel.

Single nucleotide polymorphisms (SNPs)

After 10 years of developing more and more polymorphic markers, it may seem perverse that the newest generation of markers are 2-allele single nucleotide polymorphisms. They include the classic RFLPs, but also polymorphisms that do not happen to create or abolish a restriction site. The advantage of SNPs is that they can be scored on solid-state arrays without recourse to gel electrophoresis (Wang et al., 1998). The gain in throughput more than offsets the lower informativeness of SNPs. Typically the test DNA is PCR amplified in a very large multiplex and hybridized to an array comprising a series of anchored oligonucleotide primers, each terminating with a polymorphic nucleotide. A single primer extension step is carried out on the array, using a mixture of four fluorescently-labeled dideoxynucleotides. Label adds to primers that perfectly match the test DNA, but not to those with a 3′ mismatch (see Figure 17.10). Reading the cells of the array for presence or absence of fluorescence allows the types for every SNP on the array to be read off. Although the technology is still being put together, it is hoped that an array of a few thousand primers can be used to genotype markers spaced closely across the whole genome in a single chip hybridization.

11.3. Two-point mapping

11.3.1. Scoring recombinants in human pedigrees is not always simple

Having collected families where a mendelian disease is segregating, and typed them with an informative marker, how do we know when we have found linkage? There are two aspects to this question:

  1. How can we work out the recombination fraction?

  2. What statistical test should we use to see if the recombination fraction is significantly different from 0.5, the value expected on the null hypothesis of no linkage?

Figure 11.4. Recognizing recombinants: three versions of a family with an autosomal dominant disease, typed for a marker A

(A) All meioses are phase-known. We can identify III1–III5 unambiguously as nonrecombinant and III6 as recombinant. (B) The same family, but phase-unknown. The mother, II1, could have inherited either marker allele A1 or A2 with the disease; thus her phase is unknown. Either III1–III5 are nonrecombinant and III6 is recombinant; or III1–III5 are recombinant and III6 is nonrecombinant. (C) The same family after further tracing of relatives. III7 and III8 have also inherited marker allele A1 along with the disease from their father, but we cannot be sure whether their father's allele A1 is identical by descent to the allele A1 in his sister II1. Maybe there are two copies of allele A1 among the four grandparental marker alleles. The likelihood of this depends on the gene frequency of allele A1. Thus although this pedigree contains linkage information, extracting it is problematic.

In some families the first question can be answered very simply by counting recombinants and nonrecombinants. The family shown in Figure 11.1 is one example. There are two recombinants in seven meioses and the recombination fraction is 2/7 ≈ 0.29. Figure 11.4A shows another example. The double heterozygote who is informative for linkage (individual II1 in both Figure 11.1 and Figure 11.4A), is phase-known: we know which alleles were inherited from which parent, and so we can unambiguously score each meiosis as recombinant or nonrecombinant. In Figure 11.4B, individual II1 is again doubly heterozygous, but this time phase-unknown. Among her children, either there are five nonrecombinants and one recombinant, or else there are five recombinants and one nonrecombinant. We can no longer identify recombinants unambiguously, even if the first alternative seems much more likely than the second. Figure 11.4C adds yet more complications, yet if this is a family with a rare disease no researcher would be willing to discard it. Some method is needed to extract the linkage information from a collection of such imperfect families.

11.3.2. Computerized lod score analysis is the best way to analyze complex pedigrees for linkage between mendelian characters

In the pedigree shown in Figure 11.4B it is not possible to identify recombinants unambiguously and count them. It is possible, however, to calculate the overall likelihood of the pedigree, on the alternative assumptions that the loci are linked (recombination fraction = θ) or not linked (recombination fraction = 0.5). The ratio of these two likelihoods gives the odds of linkage, and the logarithm of the odds is the lod score. Morton (1955) demonstrated that lod scores represent the most efficient statistic for evaluating pedigrees for linkage, and derived formulae to give the lod score (as a function of θ) for various standard pedigree structures. Box 11.3 shows how this is done for simple structures. Being a function of the recombination fraction, lod scores are calculated for a range of θ values. In a set of families, the overall probability of linkage is the product of the probabilities in each individual family, therefore lod scores (being logarithms) can be added up across families.
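For the simplest pedigree structures the lod score can be written down directly. The sketch below uses the standard textbook formulae, which are assumed here because Box 11.3 is not reproduced in this section: for phase-known meioses the likelihood is θ^R (1 - θ)^N, and for a phase-unknown sibship the two possible phases are averaged with equal weight. The counts of one recombinant and five nonrecombinants follow the families of Figure 11.4A and B.

```python
import math

def lod_phase_known(theta, n_rec, n_nonrec):
    """Lod score, log10[L(theta) / L(0.5)], for phase-known meioses."""
    n = n_rec + n_nonrec
    likelihood = theta ** n_rec * (1 - theta) ** n_nonrec
    if likelihood == 0:
        return float("-inf")   # e.g. theta = 0 with an observed recombinant
    return math.log10(likelihood / 0.5 ** n)

def lod_phase_unknown(theta, k, m):
    """Lod score for a sibship that splits k : m when parental phase is
    unknown; both phases are given prior probability 1/2."""
    n = k + m
    likelihood = 0.5 * (theta ** k * (1 - theta) ** m
                        + theta ** m * (1 - theta) ** k)
    if likelihood == 0:
        return float("-inf")
    return math.log10(likelihood / 0.5 ** n)

thetas = [i / 1000 for i in range(0, 501)]   # grid from 0 to 0.5
best_known = max(thetas, key=lambda t: lod_phase_known(t, 1, 5))
best_unknown = max(thetas, key=lambda t: lod_phase_unknown(t, 1, 5))

for t in (0.0, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(t, round(lod_phase_known(t, 1, 5), 3), round(lod_phase_unknown(t, 1, 5), 3))
```

Because lod scores are logarithms, the totals for a set of independent families are obtained by adding the per-family values at each θ. Losing phase information lowers the lod at every value of θ below 0.5, which is why phase-known meioses are so valuable.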

Calculating the full lod score for the family in Figure 11.4C is difficult. To calculate the likelihood that III7 and III8 are recombinant or nonrecombinant, we must take likelihoods calculated for each possible genotype of I1, I2 and II3, weighted by the probability of that genotype. For I1 and I2, the genotype probabilities depend on both the gene frequencies and the observed genotypes of II1, III7 and III8. Genotype probabilities for II3 are then calculated by simple mendelian rules. Human linkage analysis, except in the very simplest cases, is entirely dependent on computer programs that implement algorithms for handling these branching trees of genotype probabilities, given the pedigree data and a table of gene frequencies.

11.3.3. Lod scores of +3 and -2 are the criteria for linkage and exclusion (for a single test)

Figure 11.5. Lod score curves

Graphs of lod score against recombination fraction from a hypothetical set of linkage experiments. Curve 1: evidence of linkage (Z > 3) with no recombinants. Curve 2: evidence of linkage (Z > 3) with the most likely recombination fraction being 0.23. Curve 3: linkage excluded (Z < -2) for recombination fractions below 0.12; inconclusive for larger recombination fractions. Curve 4: inconclusive at all recombination fractions.

The result of linkage analysis is a table of lod scores at various recombination fractions, like the two tables in Box 11.3. Positive lods give evidence in favor of linkage and negative lods give evidence against linkage. Note that only recombination fractions between 0 and 0.5 are meaningful, and that all lod scores are zero at θ = 0.5 (because they are then measuring the ratio of two identical probabilities, and log10(1) = 0). The results can be plotted to give curves like those in Figure 11.5.

Returning to the two questions posed at the start of this section, we now see that the most likely recombination fraction is the one at which the lod score is highest. If there are no recombinants, the lod score will be maximum at θ = 0. If there are recombinants, Z will peak at the most likely recombination fraction (0.167 = 1/6 for the family in Figure 11.4A, but harder to predict for Figure 11.4B).

The second question concerned the threshold of significance. Here the answer is at first sight surprising. Z = 3.0 is the threshold for accepting linkage, with a 5% chance of error. Linkage can be rejected if Z < -2.0. Values of Z between -2 and +3 are inconclusive. For most statistics p < 0.05 is used as the threshold of significance, but Z = 3.0 corresponds to 1000 : 1 odds (log10(1000) = 3.0). The reason why such a stringent threshold is chosen lies in the inherent improbability that two loci, chosen at random, should be linked. With 22 pairs of autosomes to choose from, it is not likely they would be located on the same chromosome (syntenic) and, even if they were, loci well separated on a chromosome are unlinked. Common sense tells us that if something is inherently improbable, we require strong evidence to convince us that it is true. This common sense can be quantified in a Bayesian calculation (see Box 11.4), which shows that 1000 : 1 odds in fact corresponds precisely to the conventional p = 0.05 threshold of significance. The same logic suggests a threshold lod of 2.3 for establishing linkage between an X-linked character and an X-chromosome marker (prior probability of linkage ≈ 1/10).
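The Bayesian argument can be sketched numerically. The prior probability of about 1 in 50 that two randomly chosen autosomal loci are linked is the conventional figure and is an assumption here, since Box 11.4 is not reproduced in this section; the X-linked prior of 1/10 is the one quoted above.

```python
def posterior_probability_of_linkage(lod, prior):
    """Combine the likelihood ratio 10**lod with the prior odds of linkage
    and return the posterior probability that the loci are linked."""
    prior_odds = prior / (1 - prior)
    posterior_odds = (10 ** lod) * prior_odds
    return posterior_odds / (1 + posterior_odds)

# Autosomal loci: assumed prior probability of linkage of 1/50.
autosomal = posterior_probability_of_linkage(3.0, 1 / 50)

# X-linked character and X-chromosome marker: prior of about 1/10.
x_linked = posterior_probability_of_linkage(2.3, 1 / 10)

print(round(autosomal, 3), round(x_linked, 3))
```

Under these priors both thresholds leave a posterior probability of linkage of roughly 95%, that is, about a 5% chance that an apparently significant linkage is spurious.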

Confidence intervals are hard to deduce analytically, but a widely accepted support interval extends to recombination fractions at which the lod score is 1 unit below the peak value (the lod-1 rule). Thus, curve 2 in Figure 11.5 gives acceptable evidence of linkage (Z > 3) with the most likely recombination fraction 0.23 and support interval 0.17–0.32. The curve will be more sharply peaked the greater the amount of data, but in general peaks are quite broad. It is important to remember that distances on human genetic maps are often very imprecise estimates.
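The lod-1 rule is straightforward to apply once the lod curve has been evaluated on a grid of recombination fractions. The sketch below does this for phase-known meioses; the data set of 14 recombinants in 60 meioses is invented for illustration and is not the data behind curve 2 in Figure 11.5.

```python
import math

def lod(theta, n_rec, n_nonrec):
    """Phase-known lod score for 0 < theta <= 0.5."""
    n = n_rec + n_nonrec
    return (n_rec * math.log10(theta)
            + n_nonrec * math.log10(1 - theta)
            + n * math.log10(2))

def support_interval(n_rec, n_nonrec, drop=1.0):
    """Return (theta_hat, z_max, lower, upper), where lower and upper bound
    the recombination fractions whose lod is within `drop` of the peak."""
    grid = [i / 1000 for i in range(1, 501)]          # 0.001 .. 0.500
    scores = [lod(t, n_rec, n_nonrec) for t in grid]
    z_max = max(scores)
    theta_hat = grid[scores.index(z_max)]
    inside = [t for t, z in zip(grid, scores) if z >= z_max - drop]
    return theta_hat, z_max, min(inside), max(inside)

# Hypothetical data: 14 recombinants among 60 phase-known meioses.
theta_hat, z_max, lower, upper = support_interval(14, 46)
print(theta_hat, round(z_max, 2), lower, upper)
```

Even with 60 informative meioses the interval is wide compared with the estimate itself, which illustrates why distances on human genetic maps should be treated as imprecise.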

Negative lod scores exclude linkage for the region where Z < -2. Curve 3 on Figure 11.5 excludes the disease from 12 cM either side of the marker. While gene mappers hope for a positive lod score, exclusions are not without value. They tell us where the disease is not (exclusion mapping). This can exclude a possible candidate gene, and if enough of the genome is excluded, only a few possible locations may remain.

11.3.4. For whole genome searches a genome-wide threshold of significance must be used

In disease studies, families are typed for marker after marker until positive lods are obtained. The appropriate threshold for significance is a lod score such that there is only a 0.05 chance of a false positive result occurring anywhere during a search of the whole genome. As shown in Box 11.4, a lod score of 3.0 corresponds to a significance of 0.05 at a single point. But if 50 markers have been used, the chance of a spurious positive result is greater than if only one marker is used. A stringent procedure would multiply the p value by 50 before testing its significance. The threshold lod score for a study using n markers would be 3 + log(n), that is a lod score of 4 for 10 markers, 5 for 100, etc. However, this is over-stringent. Linkage data are not independent. If a character is mendelian, then it is determined at a single chromosomal location. If it does not map to one location, then the prior probability that it maps to another location is raised. The threshold for a genome-wide significance level of 0.05 has been much argued over, but a widely accepted answer for mendelian characters is 3.3 (Lander and Schork, 1994). For nonmendelian characters see Section 12.5. In practice, lod scores below 5, whether with one marker or many, should be regarded as provisional.
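The stringent correction described above, which treats the n markers as independent tests, amounts to adding log10(n) to the single-test threshold. A minimal sketch, with an illustrative function name:

```python
import math

def stringent_lod_threshold(n_markers, single_test_lod=3.0):
    """Threshold obtained by multiplying the p value by the number of
    markers: single-test threshold plus log10(n_markers)."""
    return single_test_lod + math.log10(n_markers)

for n in (1, 10, 50, 100, 300):
    print(n, round(stringent_lod_threshold(n), 2))
```

For a 300-marker genome screen this rule would demand a lod of about 5.5, well above the widely accepted genome-wide threshold of 3.3 for mendelian characters, which shows how over-stringent the independence assumption is.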

11.4. Multipoint mapping is more efficient than two-point mapping

11.4.1. Multipoint linkage can locate a disease locus on a framework of markers

Table 11.1. Gene ordering by three-point crosses

Class of offspring      Position of recombination (x)   Number
ABC/abc, abc/abc        Nonrecombinant                  853
ABc/abc, abC/abc        (A, B)-x-C                      5
Abc/abc, aBC/abc        A-x-(B, C)                      47
AbC/abc, aBc/abc        B-x-(A, C)                      95

A cross has been set up between mice heterozygous at three linked loci (ABC/abc) and triple homozygotes abc/abc. The offspring are classified as shown. The rarest class of offspring will be those whose production requires two crossovers. Of the 1000 animals, 142 (95 + 47) are recombinant between A and B, 52 (47 + 5) between A and C, and 100 (95 + 5) between B and C. Only five animals are recombinant between A and C, but not between A and B, so these must have double crossovers, A-x-C-x-B. Therefore the map order is A-C-B. The genetic distances are approximately A-(5 cM)-C-(10 cM)-B.

Linkage analysis can be more efficient if data for more than two loci are analyzed simultaneously. Multilocus analysis is particularly useful for establishing the chromosomal order of a set of linked loci. Experimental geneticists have long used three-point crosses for this purpose. The rarest recombinant class is that which requires a double recombination. In Table 11.1, the gene order A-C-B is immediately apparent. This procedure is more efficient than estimating the recombination fractions for intervals A-B, A-C and B-C separately in a series of two-point crosses. Ideally, in any linkage analysis the whole genome would be screened for linkage, and the full dataset would be used to calculate the likelihood at each location across the genome.
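The reasoning applied to Table 11.1 can be written out as a short program. Each pair of reciprocal offspring classes is represented by one gamete of the heterozygous parent, with the pooled count from the table; the variable names are illustrative. The locus pair with the most recombinants must be the outer pair, so the remaining locus lies in the middle.

```python
from itertools import combinations

LOCI = ("A", "B", "C")

# One representative gamete per reciprocal pair of classes in Table 11.1.
# Upper case = allele from the ABC chromosome, lower case = from abc.
counts = {"ABC": 853, "ABc": 5, "Abc": 47, "AbC": 95}
total = sum(counts.values())

def is_recombinant(gamete, i, j):
    """True if the alleles at positions i and j come from different
    parental chromosomes."""
    return gamete[i].isupper() != gamete[j].isupper()

# Number of recombinant offspring for each pair of loci.
pair_counts = {
    (LOCI[i], LOCI[j]): sum(n for g, n in counts.items() if is_recombinant(g, i, j))
    for i, j in combinations(range(3), 2)
}

# The most widely separated pair flanks the third locus.
outer = max(pair_counts, key=pair_counts.get)
middle = (set(LOCI) - set(outer)).pop()
order = f"{outer[0]}-{middle}-{outer[1]}"

# The rarest class is the one that needs a double crossover.
double_recombinant_class = min(counts, key=counts.get)

print(pair_counts)
print(order, double_recombinant_class)
for pair, n in pair_counts.items():
    print(pair, f"{100 * n / total:.1f}% recombinants")
```

The output reproduces the counts given in the table legend (142, 52 and 100) and the order A-C-B. The two short intervals, about 5 cM and 10 cM, sum to more than the 14.2% recombination observed between A and B because double crossovers go undetected when only the outer loci are scored.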

A second advantage of multilocus mapping in humans is that it helps overcome problems caused by the limited informativeness of markers. Some meioses in a family might be informative with marker A, and others uninformative for A but informative with the nearby marker B. Only simultaneous linkage analysis of the disease with markers A and B extracts the full information. This is less important for mapping using highly informative microsatellite markers rather than two-allele RFLPs, but it will resurface if SNPs (Box 11.1) become the main mapping tool.

11.4.2. Multipoint mapping by computer

Figure 11.6. Multipoint mapping in man

The horizontal axis is a map of markers and the vertical axis is the lod score. The linkmap program has moved the unmapped disease locus across the map, calculating the lod score at each position. Lod scores dip to strongly negative values near to the position of markers which show recombinants with the disease. The highest peak shows the most likely location. Odds in favor of this position are measured by the degree to which the highest peak overtops its rivals. Redrawn from Hughes et al. (1994) with permission from the author.

For disease-marker mapping the starting point is usually a two-point lod score showing that the disease maps near one particular marker, plus a map of the framework of markers. The marker map is taken as given, and the aim is to locate the disease gene in one of the intervals of the framework. Programs such as linkmap (part of the Linkage package) or genehunter can notch the disease locus across the marker framework, calculating the overall likelihood of the pedigree data at each position. The result (Figure 11.6) is a curve of likelihood against map location. The y-axis is usually a lod score, the log likelihood ratio for this location versus a location off the end of the map. Occasionally, for reasons based on statistical theory, a location score is used. Location scores are twice the natural logarithm of the likelihood ratio, i.e. 4.6 × the lod score. This method is also useful for exclusion mapping: if the curve stays below a lod score of -2 across the region, then the disease locus is excluded from that region.
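The factor of 4.6 relating location scores to lod scores follows from changing the base of the logarithm, as this short sketch shows (the function name is illustrative):

```python
import math

# A lod score is log10 of the likelihood ratio; a location score is
# 2 * ln of the same ratio, so the conversion factor is 2 * ln(10).
LOD_TO_LOCATION_SCORE = 2 * math.log(10)

def location_score(lod):
    """Convert a lod score to a location score."""
    return LOD_TO_LOCATION_SCORE * lod

print(round(LOD_TO_LOCATION_SCORE, 3))
print(round(location_score(3.0), 1), round(location_score(-2.0), 1))
```

On this scale the usual lod thresholds of +3 and -2 correspond to location scores of about 13.8 and -9.2.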

The apparently quantitative nature of Figure 11.6 is largely spurious. Peak heights depend crucially on the precise distances between markers and on the mapping function (Section 11.1.3). In reality these are seldom accurately known. The distances on marker-marker maps should be regarded as only rough guides, and moreover none of the mapping functions in linkage programs even approximates to the real complexities of chiasma distribution (see Figure 11.3). However, unless the marker map is radically wrong, it remains true that the highest peak marks the most likely location.

11.4.3. Multipoint linkage is essential for constructing marker framework maps

Disease-marker mapping suffers from the necessity of using whatever families can be found where the disease of interest is segregating. Such families will rarely have ideal structures. All too often the number of meioses is undesirably small, and missing persons mean that some meioses are phase-unknown. Marker-marker mapping can avoid these problems. Markers can be studied in any family, so families can be chosen that have plenty of children and ideal structures for linkage, like the family in Figure 11.1. Construction of marker framework maps has benefited greatly from a collection of families (the CEPH families) assembled specifically for the purpose by the Centre pour l'Étude du Polymorphisme Humain (now the Centre Jean Dausset) in Paris. Immortalized cell lines from every individual ensure a permanent supply of DNA, and sample mix-ups and nonpaternity have long since been ruled out by typing with many markers. The first goal of the Human Genome Project was to produce high-density framework maps of highly polymorphic markers. This phase is now complete. As an example, the current map from CHLC (Cooperative Human Linkage Center) is based on the results of scoring eight CEPH families with 8325 microsatellites, resulting in over 1 million genotypes (Broman et al., 1998).

11.4.4. Integrated maps combine genetic and physical data

Ordering the loci in multipoint mapping is not a trivial problem. There are n!/2 possible orders for n markers, and current maps have hundreds of markers per chromosome. Something more intelligent than brute force computing must be used to work out the correct order. Physical mapping information can be immensely helpful here. Markers that can be typed by PCR can be used as sequence-tagged sites (STS) and grouped into physically localized sets using radiation hybrids or YAC clones. Within a set, the number of possible orders should be small enough to test against the multipoint mapping data. As the genome is increasingly covered with clone contigs, physical distance data are becoming available for more markers. The overall goal of mapping is an integrated map that lists features in order of chromosomal location, gives their distances on both genetic (preferably separate male and female cM) and physical scales, and relates all this to the chromosomal bands. The Location Database (Collins et al., 1996) contains such integrated maps, and the latest versions can be consulted at http://cedar.genetics.soton.ac.uk/public_html/ .
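A few lines of code show how quickly n!/2 grows, and why subdividing the markers into small physically localized sets helps. The function name is chosen here for illustration only.

```python
import math

def distinct_orders(n):
    """Number of distinct marker orders, counting an order and its reverse as one."""
    if n < 2:
        return 1
    return math.factorial(n) // 2

for n in (3, 5, 10, 20):
    print(n, distinct_orders(n))
# 3 3
# 5 60
# 10 1814400
# 20 1216451004088320000
```

A set of five markers has only 60 possible orders, which can all be tested; twenty markers taken together have more than 10^18.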

11.5. Standard lod score analysis is not without problems

Standard lod score analysis is a tremendously powerful method for scanning the genome in 20-Mb segments to locate a disease gene, but it can run into difficulties. These include:

11.5.1. Errors in genotyping and misdiagnoses can generate spurious recombinants

Figure 11.7. Apparent double recombinants suggest errors in the data

Because of interference (Section 11.1.3), the probability of a true double recombinant with markers 5 cM apart is small, well below 0.05 × 0.05 = 0.0025. Apparent double recombinants usually signal an error in typing the markers, a clinical misdiagnosis, or locus heterogeneity such that the disease in this case does not map to locus D but elsewhere in the genome. Mutation in one of the genes or germinal mosaicism are rarer causes.

With highly polymorphic markers, common errors such as misread gels, switched samples or nonpaternity will usually result in a child being given a genotype incompatible with the parents. The linkage analysis program will stall until such errors have been corrected. Errors that introduce possible but wrong genotypes are more of a problem. These include misdiagnosis of somebody's disease status. Such errors inflate the length of genetic maps by introducing spurious recombinants, because if a child has been assigned the wrong parental allele, it will appear to be a recombinant. Multilocus analysis can help, because spurious recombinants appear as close double recombinants (Figure 11.7). Error-checking routines test the extent to which the map can be shortened by omitting any single test result (see Broman et al., 1998). Results that significantly lengthen the map (i.e. add recombinants) are suspect.
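The idea that spurious recombinants show up as close double recombinants can be sketched as follows. This is a simplified heuristic written for illustration, not the error-checking routine of Broman et al. (1998): it assumes that the grandparental origin ("P" or "M") of one parental chromosome has already been worked out at each marker, and the 10 cM cut-off is an arbitrary assumption.

```python
def crossover_intervals(phase):
    """Indices i such that the phase changes between marker i and marker i + 1."""
    return [i for i in range(len(phase) - 1) if phase[i] != phase[i + 1]]

def suspect_segments(positions_cm, phase, max_span_cm=10.0):
    """Flag runs of markers bounded by two crossovers that lie close together.

    Returns (first_marker, last_marker) index pairs for each suspect run.
    """
    crossovers = crossover_intervals(phase)
    suspects = []
    for a, b in zip(crossovers, crossovers[1:]):
        # The run of markers a+1 .. b sits between the two apparent crossovers;
        # measure the distance between the markers flanking that run.
        span = positions_cm[b + 1] - positions_cm[a]
        if span <= max_span_cm:
            suspects.append((a + 1, b))
    return suspects

# Hypothetical data: six markers 2 cM apart; marker 2 alone switches phase.
positions = [0, 2, 4, 6, 8, 10]
print(suspect_segments(positions, "PPMPPP"))  # [(2, 2)]
print(suspect_segments(positions, "PPPMMM"))  # []
```

The genotype at a flagged marker would be a candidate for retyping, since a true double crossover within so short a distance is very unlikely.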

11.5.2. Computational difficulties limit the pedigrees that can be analyzed

As we saw in Section 11.3.2, human linkage analysis depends on computer programs that implement algorithms for handling branching trees of genotype probabilities, given the pedigree data and gene frequencies. liped was the first generally useful program, and mlink (part of a package called linkage) used the same basic algorithm, the Elston-Stewart algorithm, but extended it to multipoint data. The Elston-Stewart algorithm can handle arbitrarily large pedigrees, but the computing time increases exponentially with increasing numbers of possible haplotypes (more alleles and/or more loci). This limits the ability of mlink to analyse multipoint data. An alternative algorithm, the Lander-Green algorithm, can cope with any number of loci but the computing time increases exponentially with the size of the pedigree. This algorithm is implemented in the genehunter program (see Section 12.2.4), which is particularly good for analysing whole-genome searches of modest sized pedigrees. The general theory of linkage analysis is excellently covered in the book by Ott (Further reading), while the book by Terwilliger and Ott (Further reading) is full of practical advice indispensable to anybody undertaking human linkage analysis.
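The contrast between the two algorithms can be illustrated by counting the objects each has to enumerate. The sketch below is only a rough guide to scaling, not a model of the actual running time of mlink or genehunter. It assumes that the Elston-Stewart burden tracks the number of multilocus haplotypes and genotypes, and that the Lander-Green burden tracks the number of inheritance vectors, taken here as two binary choices (grandpaternal or grandmaternal allele, from each parent) per nonfounder.

```python
from math import prod

def haplotype_count(alleles_per_locus):
    """Number of possible multilocus haplotypes: product of the allele counts."""
    return prod(alleles_per_locus)

def genotype_count(alleles_per_locus):
    """Number of unordered pairs of haplotypes (multilocus genotypes)."""
    h = haplotype_count(alleles_per_locus)
    return h * (h + 1) // 2

def inheritance_vector_count(nonfounders):
    """Two binary transmission choices per nonfounder: 2 ** (2 * nonfounders)."""
    return 2 ** (2 * nonfounders)

# Five loci with six alleles each (hypothetical numbers).
print(haplotype_count([6] * 5))      # 7776
print(genotype_count([6] * 5))       # 30236976

# A pedigree with ten nonfounders (hypothetical number).
print(inheritance_vector_count(10))  # 1048576
```

Adding a locus multiplies the first two counts but leaves the third unchanged; adding a pedigree member does the reverse.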

11.5.3. Locus heterogeneity is always a pitfall in human gene mapping

As we saw in Section 3.1.4, it is common for mutations in several unlinked genes to produce the same clinical phenotype. Even a dominant condition with large families can be hard to map if there is locus heterogeneity within the collection of families studied. It took years of collaborative work to show that tuberous sclerosis was caused by mutations at either of two loci, TSC1 (MIM 191100) at 9q34 and TSC2 (MIM 191092) at 16p13. With recessive conditions, the difficulty is multiplied by the need to combine many small families. Autozygosity mapping (Section 11.5.5) is the main solution in such cases.

genehunter or homog and related programs (see Terwilliger and Ott, 1994) can compare the likelihood of the data on the alternative assumptions of locus homogeneity (all families map to the location under test) and heterogeneity (only a proportion α of families map there, the rest being unlinked), and give a maximum likelihood estimate of α.
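A minimal sketch of such a heterogeneity test follows. It uses the standard admixture form of the likelihood and takes α as the proportion of linked families (so that 1 - α are unlinked); the family lod scores are invented, and the grid search is a simplification that is not taken from homog or genehunter.

```python
import math

def hlod(alpha, family_lods):
    """Heterogeneity lod at one map position.

    Each family contributes log10(alpha * 10**Z + (1 - alpha)), where Z is
    that family's lod score at the position under test.
    """
    return sum(math.log10(alpha * 10.0 ** z + (1.0 - alpha)) for z in family_lods)

def best_alpha(family_lods, steps=100):
    """Grid-search estimate of the alpha that maximizes the heterogeneity lod."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda a: hlod(a, family_lods))

# Hypothetical per-family lod scores at a single test position.
lods = [1.5, 1.2, -2.0, 0.9, -1.8]

alpha_hat = best_alpha(lods)
print("lod assuming homogeneity:", round(hlod(1.0, lods), 2))
print("estimated alpha:", alpha_hat)
print("maximized heterogeneity lod:", round(hlod(alpha_hat, lods), 2))
```

With α = 1 the expression reduces to the ordinary summed lod score, so the two strongly negative families cancel the evidence from the other three; allowing α to fall below 1 recovers it.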

11.5.4. The limited resolution of human genetic mapping may be overcome by typing single sperm or by using linkage disequilibrium

Once a marker is found for which all meioses are informative and nonrecombinant, linkage analysis comes to a halt. In typical collections of disease families, the target region thus identified is likely to be 1 Mb or more. This is uncomfortably large for positional cloning of an unknown disease gene. One possible way to increase the resolution of marker-marker mapping is to type sperm instead of children. Humans have far too few children for optimal linkage analysis, but men produce untold millions of sperm, and modern PCR technology allows markers to be scored on single separated sperm from a doubly heterozygous man. Yu et al. (1996) show examples. Apart from technical problems, one drawback is that a single sperm cannot be resampled repeatedly to confirm interesting results, in the same way as a child can. Whole genome amplification (Zhang et al., 1992) partially circumvents this problem. Individual spermatozoa are subjected to whole genome amplification followed by multiplex PCR amplification of markers from an aliquot. Further aliquots can be used to check any recombinants. Unfortunately sperm typing could not be used for disease-marker mapping, unless the disease mutations were already characterized.

Linkage disequilibrium provides the best hope of narrowing down the candidate region in disease-marker mapping. Genotypes or haplotypes for markers spread across the candidate region are examined in a series of unrelated affected patients. If the patients all carry independent mutations, as may very well be the case for a dominant or X-linked disease, this exercise will reveal nothing of interest. However, if a proportion of the disease genes in apparently unrelated patients derive from a common ancestor, as often happens with recessive conditions, it may be possible to find a shared ancestral haplotype that defines a small part of the candidate region. This approach is illustrated in Section 12.4.1.

11.5.5. Autozygosity mapping can map recessive conditions efficiently in extended inbred families

Autozygosity is a term used to mean homozygosity for markers identical by descent, inherited from a recent common ancestor. People with rare recessive diseases in consanguineous families are likely to be autozygous for markers linked to the disease locus. Suppose the parents are second cousins: they would be expected to share 1/32 of all their genes because of their common ancestry, and a child would be autozygous at only 1/64 of all loci. If a child is homozygous for a particular marker allele, this could be because of autozygosity, or it could be because a second copy of the same allele has entered the family independently. The rarer the allele is in the population, the greater the likelihood that homozygosity represents autozygosity. For an infinitely rare allele, a single homozygous affected child born to second cousin parents generates a lod score of log10(64) = 1.8. If there are two other affected sibs who are both also homozygous for the same rare allele, the lod score is 3.0 (log10(64 × 4 × 4); the chance that a sib would have inherited the same pair of parental haplotypes even if they are unrelated to the disease is 1 in 4).
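The arithmetic above can be reproduced in a few lines. The function names are chosen here for illustration. The sketch assumes an infinitely rare marker allele, so that homozygosity always reflects autozygosity, and it uses the standard result that a child of k-th cousins has an inbreeding coefficient of (1/4)^(k+1).

```python
import math

def cousin_inbreeding(degree):
    """Inbreeding coefficient of a child of k-th cousins: (1/4) ** (k + 1)."""
    return 0.25 ** (degree + 1)

def autozygosity_lod(inbreeding_coefficient, extra_affected_sibs=0):
    """Lod score in the rare-allele limit.

    The first affected child contributes odds of 1/F; each further affected
    sib sharing the same pair of parental haplotypes contributes odds of 4.
    """
    odds = (1.0 / inbreeding_coefficient) * 4 ** extra_affected_sibs
    return math.log10(odds)

f_second_cousins = cousin_inbreeding(2)
print(f_second_cousins)                                  # 0.015625, i.e. 1/64
print(round(autozygosity_lod(f_second_cousins), 1))      # 1.8
print(round(autozygosity_lod(f_second_cousins, 2), 1))   # 3.0
```

For a marker allele that is not rare the odds are lower, because homozygosity may then arise without autozygosity.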

Figure 11.8. Autozygosity mapping

A large multiply inbred family in which several members suffer from profound congenital deafness (filled symbols). A whole genome screen with 160 polymorphic microsatellite markers showed that all affected family members were homozygous for markers D2S2144 and D2S158, thus mapping the DFNB9 locus to the 2 cM region between the flanking markers, D2S2303 and D2S174. Redrawn from Chaib et al. (1996) Hum. Mol. Genet. 5, 155–158, with permission from Oxford University Press.

Thus quite small inbred families can generate significant lod scores, and autozygosity mapping becomes a powerful tool for linkage analysis if families can be found with multiple affected people in two or more sibships, linked by inbreeding. Suitable families may be found in Middle Eastern countries where inbreeding is common. The method has been applied with great success to locating genes for autosomal recessive hearing loss, which otherwise presents intractable problems because of extensive locus heterogeneity (Guilford et al., 1994). An example is shown in Figure 11.8.

The same principle can be extended to populations where the common ancestry is inferred rather than demonstrated. A bold application of this principle enabled Houwen et al. (1994) to map the rare recessive condition, benign recurrent intrahepatic cholestasis, using only four affected individuals (two sibs and two supposedly unrelated people) from an isolated Dutch village. The more remote the shared ancestor, the smaller is the proportion of the genome that is shared by virtue of that common ancestry, and therefore the greater the significance for linkage if autozygosity can be demonstrated. But at the same time, the remoter the common ancestor, the more chances there are for a second independent allele to enter the family from outside, and so the less likely is it that homozygosity represents autozygosity, either for the disease or for the markers. With remote common ancestry, as in the study of Houwen et al., everything depends on finding people with a very rare recessive condition who are homozygous for a very rare marker allele or (more likely) haplotype. The power of Houwen's study seems almost miraculous, but it is important to remember that this methodology applies only to diseases and populations where most affected people are descended from a common ancestor who was a carrier. The wider use of allelic association is described in the next chapter (Sections 12.3 and 12.4).

11.5.6. Characters whose inheritance is not mendelian are not suitable for mapping by the methods described in this chapter

The methods of lod score analysis described in this chapter require a precise genetic model that specifies the mode of inheritance, gene frequencies and penetrance of each genotype. For mendelian characters, penetrance is the main problem area. If no allowance is made for unaffected people being nonpenetrant gene carriers, or affected people being phenocopies, then these people may be wrongly scored as recombinant. On the other hand, if the penetrance is set too low there is a reduction in the power to detect linkage, because a less precise hypothesis is being tested. Errors in the order of markers on marker framework maps can cause problems, but these are diminishing as genetic maps are cross-checked against physical mapping data. Given sufficient meioses, the main obstacle in linkage analysis of mendelian characters is locus heterogeneity. However, for common complex diseases like diabetes or schizophrenia, the problems are far more intractable. Any genetic model is no more than a hypothesis: we have no real idea of the gene frequencies or penetrance of any susceptibility alleles, or even the mode of inheritance. This makes it near-impossible to apply the methods we have described in this chapter to such diseases. Nevertheless, identifying the genetic components of susceptibility to complex diseases is now a major part of human genetics research. The ways one can attempt to do this are the subject of the next chapter.

Further reading
Ott J (1991) Analysis of Human Genetic Linkage, revised edn. Johns Hopkins University Press, Baltimore, MD.
References
Broman K W, Weber J L. Characterization of human crossover interference. Am. J. Hum. Genet. (1998); 63 (suppl.): A1632.
Broman K W, Murray J C, Sheffield V C, White R L, Weber J L. Comprehensive human genetic maps: individual and sex-specific variation in recombination. Am. J. Hum. Genet. (1998); 63: 861–869.
Chaib H, Place C, Salem N. et al. A gene responsible for a sensorineural nonsyndromic recessive deafness maps to chromosome 2p22-23. Hum. Molec. Genet. (1996); 5: 155–158.
Collins A, Frezal J, Teague J, Morton N E. A metric map of humans: 23,500 loci in 850 bands. Proc. Natl Acad. Sci. USA. (1996); 93: 14771–14775.
Guilford P, Ben Arab S, Blanchard S, Levilliers J, Weissenbach J, Belkahia A, Petit C. A non-syndromic form of neurosensory, recessive deafness maps to the pericentromeric region of chromosome 13q. Nature Genet. (1994); 6: 24–28.
Houwen R H J, Baharloo S, Blankenship K, Raeymaekers P, Juyn J, Sandkuijl L A, Freimer N B. Genome screening by searching for shared segments: mapping a gene for benign recurrent intrahepatic cholestasis. Nature Genet. (1994); 8: 380–386.
Hughes A, Newton V E, Liu X Z, Read A P. A gene for Waardenburg syndrome Type 2 maps close to the human homologue of the microphthalmia gene at chromosome 3p12-p14.1. Nature Genet. (1994); 7: 509–512.
Lander E S, Schork N J. Genetic dissection of complex traits. Science. (1994); 265: 2037–2048.
Mohrenweiser H W, Tsujimoto S, Gordon L, Olsen A S. Regions of sex-specific hypo- and hyper-recombination identified through integration of 180 genetic markers into the metric physical map of human chromosome 19. Genomics. (1998); 47: 153–162.
Morton N E. Sequential tests for the detection of linkage. Am. J. Hum. Genet. (1955); 7: 277–318.
Morton N E, Lindsten J, Iselius L, Yee S. Data and theory for a revised chiasma map of man. Hum. Genet. (1982); 62: 266–270.
Terwilliger J, Ott J (1994) Handbook for Human Genetic Linkage. Johns Hopkins University Press, Baltimore, MD.
Wang D G, Fan J B, Siao C J. et al. Large-scale identification, mapping and genotyping of single nucleotide polymorphisms in the human genome. Science. (1998); 280: 1077–1082.
Yu J, Lazzeroni L, Qin J, Huang M -M, Navidi W, Ehrlich H, Arnheim N. Individual variation in recombination among human males. Am. J. Hum. Genet. (1996); 59: 1186–1192.
Zhang L, Cui X, Schmitt K, Hubert R, Navidi W, Arnheim N. Whole genome amplification from a single cell: implications for genetic analysis. Proc. Natl Acad. Sci. USA. (1992); 89: 5847–5851.