Standard lod score analysis is called parametric because it requires a precise genetic model, detailing the mode of inheritance, gene frequencies and penetrance of each genotype. As long as a valid model is available, parametric linkage provides a wonderfully powerful method for scanning the genome in 20-Mb segments to locate a disease gene. For mendelian characters, specifying an adequate model should be no great problem. Nonmendelian conditions, however, are much less tractable.
A major problem is establishing diagnostic criteria. With mendelian syndromes it is usually fairly obvious which features of a patient form part of the syndrome and which are coincidental. Different features may have different penetrances, but basically the components of the syndrome are those that cosegregate. No such check exists for nonmendelian conditions. Great efforts are made, especially with psychiatric diseases, to establish diagnostic categories that are valid, in the sense that two independent psychiatrists will agree whether or not a certain label applies to a given patient. But a diagnostic label can be valid without being biologically meaningful. Any mendelian pattern must be biologically meaningful. Without a mendelian pattern, sometimes physiology will provide an alternative reality check but, especially for psychiatric and behavioral phenotypes, the diagnostic criteria are often biologically arbitrary. Adhering to them helps make different studies comparable, but does not guarantee that the right genetic question is being asked.
Once diagnostic criteria are agreed, segregation analysis (Section 19.4) can identify the most likely mode of inheritance, gene frequencies and penetrances. However, these estimates are averages over a probably heterogeneous set of families, and over all the loci within a family, and they are rarely much use for gene mapping. In the face of all these difficulties, there are several possible ways to proceed:
Both breast cancer and schizophrenia are, in most cases, nonmendelian, but rare families can be found with many affected people in a pattern consistent with autosomal dominant inheritance, albeit with reduced penetrance and, for breast cancer, sex limitation. In each case, these families have been used for a genome search using standard lod score analysis. There are two justifications for this strategy. First, the disease may be heterogeneous and include one or more mendelian conditions phenotypically indistinguishable from the nonmendelian majority. Second, the near-mendelian families may represent cases where, by chance, many determinants of the disease are already present in most people, so that the balance is tipped by the mendelian segregation of just one of the normal susceptibility factors. In the first case, identifying the mendelian subset does not necessarily cast any light on the causes of the nonmendelian disease. In the second case, the loci mapped are also susceptibility factors for the common nonmendelian disease.
The breast cancer work led to the identification of the BRCA1 and BRCA2 genes, as described in Chapter 19, whereas the first such attempt in schizophrenia produced a lod score of 6 that is now generally agreed to have been spurious. What was the difference? With hindsight, it is clear that whereas a subset of breast cancer patients really do have a mendelian form of the disease, the apparently mendelian schizophrenia families must have been chance aggregations of affected people within one family. Of itself, this should have simply produced negative lod scores across the whole genome in the schizophrenia families. The other problem (apart from bad luck) was multiple testing. Because the diagnostic criteria for schizophrenia are arbitrary, the researchers tried a number of different criteria, and checked which one gave the highest lod score. This is a perfectly valid procedure - any number of variables can be estimated from a given dataset - but each variable adds more degrees of freedom, and the raw p value needs correcting accordingly.
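Trying several diagnostic schemes inflates the chance of a false positive. The sketch below computes the probability that at least one of k tests reaches a pointwise level α by luck alone. It also computes the per-test level that would hold the overall rate at α, which is the Šidák adjustment, a close relative of the Bonferroni correction. It assumes the tests are independent, and that is our simplification. Nested diagnostic criteria applied to the same families are positively correlated, so the real inflation is likely to be smaller. The numbers in the example are hypothetical.

```python
def familywise_rate(alpha: float, k: int) -> float:
    """Chance that at least one of k independent tests reaches level alpha by chance."""
    return 1.0 - (1.0 - alpha) ** k


def sidak_per_test_level(alpha: float, k: int) -> float:
    """Per-test level that keeps the overall false-positive rate at alpha for k independent tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / k)


# Hypothetical example: five diagnostic schemes, each tested at a pointwise level of 0.001
print(familywise_rate(0.001, 5))       # about 0.005
print(sidak_per_test_level(0.001, 5))  # about 0.0002
```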
One solution to the problem of having to specify the penetrance in parametric linkage analysis is to use a parametric method but analyze only the affected family members. The penetrance is irrelevant for affected people, and unaffected members are scored as having an unknown disease phenotype. If the penetrance is low, unaffected people provide relatively little information. We can infer the genotype of affected people (they must have the susceptibility allele), but not of unaffected people, therefore not too much is lost by ignoring unaffected family members. This strategy is useful for testing candidate susceptibility loci for oligogenic diseases. It is often sensible to check for linkage before starting to screen a candidate gene for mutations. Since a parametric analysis is used, it is still necessary to specify a genetic model, and so there is still the danger of getting meaningless results if the model is wrong. The risk of false positives is reduced if the analysis is restricted to checking a few candidate loci. It helps if the disease is rare but distinctive, so that the risk of heterogeneity and of phenocopies is minimized.
If the need to specify a complete genetic model is too daunting, one can use model-free or nonparametric methods of linkage analysis. These methods ignore unaffected people, and look for alleles or chromosomal segments that are shared by affected individuals. Shared segment methods can be used within nuclear families (sib pair analysis, see below), within known extended families, or in whole populations. At the population level they constitute association studies, which are considered in the following section.
(A) By random segregation sib pairs share 0, 1 or 2 parental haplotypes ¼, ½ and ¼ of the time, respectively. (B) Pairs of sibs who are both affected by a dominant condition share one or two parental haplotypes for the relevant chromosomal segment. (C) Pairs of sibs who are both affected by a recessive condition share both parental haplotypes for the relevant chromosomal segment.
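You can check panel (A) by brute-force enumeration. Each sib independently receives one of the father's two haplotypes and one of the mother's, so a sib pair has 16 equally likely combinations. Counting how many parental haplotypes the two sibs share identical by descent reproduces the ¼, ½, ¼ split. This minimal sketch covers only the null expectation and does not model disease.

```python
from fractions import Fraction
from itertools import product


def sib_pair_ibd_distribution() -> dict[int, Fraction]:
    """Null distribution of the number of parental haplotypes two sibs share IBD."""
    counts = {0: 0, 1: 0, 2: 0}
    # f1, m1: paternal and maternal haplotype (0 or 1) received by sib 1; f2, m2 for sib 2
    for f1, m1, f2, m2 in product((0, 1), repeat=4):
        shared = int(f1 == f2) + int(m1 == m2)
        counts[shared] += 1
    total = sum(counts.values())
    return {k: Fraction(v, total) for k, v in counts.items()}


print(sib_pair_ibd_distribution())  # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
```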
Because sib pair analysis is model-free, it can be performed without making any assumptions about the genetics of the disease. Thus it has been used as one of the main tools for seeking genes conferring susceptibility to common nonmendelian diseases like diabetes or schizophrenia. One drawback is that candidate regions defined by sib pair analysis are usually uncomfortably large for positional cloning. Sib pair analysis has no process analogous to the end-game of mendelian mapping, where closer and closer markers are tested until there are no more recombinants. It is not likely that a chromosomal segment can be defined that is shared by all affected sib pairs. If a susceptibility factor is neither necessary nor sufficient for disease, then not all affected sib pairs will share the chromosomal segment that contains the susceptibility locus. Moreover, sib pairs share many segments by chance, including, perhaps, segments that coincidentally lie close to a susceptibility locus. The mathematics of affected sib pair (ASP) analysis has been detailed by Sham and Zhao (1998), and examples of some systematic applications of ASP analysis to complex diseases are given in Section 19.5.
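One simple model-free statistic for affected sib pairs is the mean test. It compares the average proportion of parental haplotypes shared identical by descent (IBD) with its null expectation of 0.5. Under no linkage, a pair shares none, half or all of its parental haplotypes with probabilities ¼, ½ and ¼, so the variance per pair is 1/8. The sketch below assumes IBD status is known exactly for every pair. In practice it must be inferred from marker genotypes. The sketch uses a one-sided normal approximation, and the counts are invented.

```python
import math


def asp_mean_test(n0: int, n1: int, n2: int) -> tuple[float, float, float]:
    """Mean IBD-sharing test for affected sib pairs.

    n0, n1, n2: numbers of pairs sharing 0, 1 or 2 parental haplotypes IBD.
    Returns (mean proportion shared, z score, one-sided p value).
    """
    n = n0 + n1 + n2
    mean_share = (0.5 * n1 + 1.0 * n2) / n
    z = (mean_share - 0.5) / math.sqrt(0.125 / n)  # null mean 0.5, variance 1/8 per pair
    p_value = 0.5 * math.erfc(z / math.sqrt(2))    # upper tail of the standard normal
    return mean_share, z, p_value


print(asp_mean_test(20, 50, 30))  # hypothetical counts for 100 affected sib pairs
```

Excess sharing is the only direction that counts as evidence for linkage, which is why the test is one-sided.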
The affected pedigree member (APM) method of Weeks and Lange (1992) extends the logic of affected sib pair analysis to other relationships. In a complex pedigree with several affected people, the distribution of alleles identical by state (IBS) is observed for each pair of affected pedigree members and compared with the expectation under the null hypothesis of no linkage. APM allows multipoint data to be analysed in large pedigrees. However, because it uses IBS rather than identical-by-descent (IBD) data, it does not necessarily use all the linkage information that could in theory be extracted from a pedigree.
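The difference between IBS and IBD matters because relatives can share an allele by state without having inherited it from the same parental chromosome. The sketch below counts IBS sharing between two genotypes. Separately, it counts IBD sharing when it is told which parental haplotype each sib received. The family is invented for illustration, and the function names are ours.

```python
from collections import Counter


def ibs_count(g1: tuple[int, int], g2: tuple[int, int]) -> int:
    """Alleles (0, 1 or 2) shared identical by state by two unordered genotypes."""
    return sum((Counter(g1) & Counter(g2)).values())


def ibd_count(sib1: tuple[int, int], sib2: tuple[int, int]) -> int:
    """Parental haplotypes shared IBD, given which paternal (index 0 or 1) and
    maternal (index 0 or 1) haplotype each sib received."""
    return int(sib1[0] == sib2[0]) + int(sib1[1] == sib2[1])


father, mother = (1, 2), (1, 3)
# Sib 1 received the father's first and the mother's second haplotype; sib 2 the reverse
sib1_origin, sib2_origin = (0, 1), (1, 0)
sib1 = (father[sib1_origin[0]], mother[sib1_origin[1]])  # genotype (1, 3)
sib2 = (father[sib2_origin[0]], mother[sib2_origin[1]])  # genotype (2, 1)
print(ibs_count(sib1, sib2), ibd_count(sib1_origin, sib2_origin))  # 1 0
```

The sibs share allele 1 by state, but the two copies came from different parents, so no chromosomal segment is shared by descent. This loss of information is what the APM method accepts in exchange for not having to reconstruct descent.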
A more radical approach to nonparametric analysis of complex pedigrees is implemented in the genehunter program of Kruglyak et al. (1996). This is based on a generalization of the mapmaker/sibs program for analysis of multipoint ASP data mentioned above. The basic algorithm in these programs can handle any number of loci (the computing time increases linearly with the number of loci), but is limited to fairly small pedigrees. Pedigrees contain founders (people whose parents are not included in the pedigree) and nonfounders (people whose parents are included). Anybody who has a sib in the pedigree must be a nonfounder, because the only way to tell the computer that two people are sibs is to include their parents. If a pedigree contains f founders and n nonfounders, the genehunter computing time increases exponentially with (2n - f). Current versions fail to cope with pedigrees where 2n - f > 16.
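Before attempting a genehunter analysis, it helps to check where a pedigree falls relative to the 2n - f limit. The sketch below counts founders and nonfounders in a pedigree stored as a mapping from each person to their (father, mother) pair, with None for founders. The data structure, the function name and the example family are hypothetical. They are not genehunter's input format.

```python
from typing import Optional

# Hypothetical pedigree representation: person -> (father, mother), or None for a founder
Pedigree = dict[str, Optional[tuple[str, str]]]


def genehunter_size(pedigree: Pedigree) -> tuple[int, int, int]:
    """Return (founders f, nonfounders n, 2n - f)."""
    founders = sum(1 for parents in pedigree.values() if parents is None)
    nonfounders = len(pedigree) - founders
    return founders, nonfounders, 2 * nonfounders - founders


three_generations: Pedigree = {
    "grandfather": None, "grandmother": None,
    "son": ("grandfather", "grandmother"),
    "daughter": ("grandfather", "grandmother"),
    "son_wife": None,  # married in, so a founder
    "grandchild1": ("son", "son_wife"),
    "grandchild2": ("son", "son_wife"),
    "grandchild3": ("son", "son_wife"),
}

f, n, size = genehunter_size(three_generations)
print(f, n, size, size <= 16)  # 3 5 7 True
```

Each extra nonfounder adds two to the count, while each extra founder subtracts one. For this reason, adding children to an existing sibship pushes a pedigree towards the limit faster than adding married-in spouses.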
Provided a pedigree falls within the size limit, genehunter can include any number of loci in a multipoint analysis. It can in fact compute parametric lod scores if a concrete genetic model is provided. For complex characters where no model can be provided, the result is expressed as a nonparametric lod (NPL) score. NPL scores are based on calculating the extent to which affected relatives share alleles identical by descent. The result across all affected pedigree members is compared with the null hypothesis of simple mendelian segregation, under which markers segregate according to mendelian ratios unless the segregation is distorted by linkage or association. This method appears to extract the linkage information from a pedigree more efficiently than the APM method. However, the threshold of significance for an NPL score is not as obvious as it is for the parametric lod score calculated for a single pair of mendelian characters. The significance is best expressed as a genome-wide p value, as discussed in Section 12.5.2.
In principle, linkage and association are totally different phenomena. Association is simply a statistical statement about the co-occurrence of alleles or phenotypes. Allele A is associated with disease D if people who have D also have A more (or maybe less) often than would be predicted from the individual frequencies of D and A in the population. For example, HLA-DR4 is found in 36% of the general UK population but 78% of people with rheumatoid arthritis. An association can have many possible causes, not all genetic (see below). Linkage, on the other hand, is a specific genetic relationship between loci (not alleles or phenotypes). Linkage does not of itself produce any association in the general population. The STR45 locus is linked to the dystrophin locus. Within a family where a dystrophin mutation is segregating, we would expect affected people to have the same allele of STR45, but over the whole population the distribution of STR45 alleles is just the same in people with and without muscular dystrophy. Thus linkage creates associations within families, but not among unrelated people. However, if two supposedly unrelated people with disease D have actually inherited it from a distant common ancestor, they may well also tend to share particular ancestral alleles at loci closely linked to D. Where the family and the population merge, linkage and association merge.
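One common way to put a number on an association like this is the odds ratio. It is the odds of carrying the allele among patients divided by the odds among controls. The sketch below applies it to the HLA-DR4 figures, treating the general population as the control group. This is an approximation, because the general population includes some people with the disease. The odds ratio summarizes the strength of an association only. It says nothing about the cause.

```python
def odds_ratio_from_frequencies(freq_cases: float, freq_controls: float) -> float:
    """Odds ratio for carrying an allele, from carrier frequencies in cases and controls."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


# HLA-DR4: 78% of people with rheumatoid arthritis, 36% of the general UK population
print(round(odds_ratio_from_frequencies(0.78, 0.36), 1))  # 6.3
print(round(0.78 / 0.36, 2))  # simple ratio of the two frequencies: 2.17
```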
A fully outbred person has $2^n$ ancestors n generations ago. If the UK population were fully outbred, two ‘unrelated’ present-day people would have shared ancestors in 1500, if not more recently. Reprinted from Read (1989) Medical Genetics: An Illustrated Outline, by permission of Mosby.
For a locus showing recombination fraction θ with the susceptibility locus, a proportion θ of ancestral chromosomes will lose the association each generation, and a proportion (1 - θ) will retain it. After n generations, a fraction $(1-\theta)^n$ of chromosomes will retain the association. Consider the 44 meioses that may separate our two patients. Loci showing 1% recombination per meiosis would have a better than 50% chance of remaining in the same combination, since $(0.99)^{44} = 0.64$. Loci 3 cM apart would have a $(0.97)^{44} = 0.26$ chance of remaining together. This argument is grossly simplified because it ignores population substructure and assumes the entire British population has been one freely interbreeding unit over the past 500 years. However, for what it is worth, it suggests that allelic associations reflecting sharing of ancestral chromosomes in the British population might begin to be noticeable for loci within 2–3 cM of each other.
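It is easy to extend this arithmetic to other distances. Following the text, the sketch below assumes 44 meioses separate the two patients. It treats 1 cM as a recombination fraction of 0.01, which is reasonable only for small distances.

```python
def fraction_retaining(theta: float, meioses: int) -> float:
    """Fraction of ancestral chromosomes keeping the marker-disease combination
    after the given number of meioses, each with recombination fraction theta."""
    return (1.0 - theta) ** meioses


for cm in (1, 2, 3, 5):
    theta = cm / 100  # small-distance approximation: 1 cM is about 1% recombination
    print(f"{cm} cM: {fraction_retaining(theta, 44):.2f}")
```

The retained fraction falls off geometrically with distance. That is why population-level association is confined to short chromosomal segments, even though linkage within families extends over tens of centimorgans.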
Although this calculation is crude, it does show how the extent of allelic association depends on the history of the population concerned. There has been considerable debate about the differences to be expected when comparing a population that has expanded rapidly from a relatively recent bottleneck with one that has maintained or only gradually increased its size over many generations. The Finns would typify the former and the British the latter. Diseases, being subject to selection, are likely to be more mutationally homogeneous in a rapidly expanded population; the mutant alleles found will reflect those present in the founder population. For selectively neutral markers the position is less clear. In a study of 20 microsatellites from chromosome 18q21 in 664 British and 430 Finnish subjects, Eaves et al. (1998) observed significant disequilibrium for all 53 pairs of loci less than 1 cM apart in both populations, for 20/75 (UK) and 61/75 (Finland) pairs 1–3 cM apart, and for 0/62 (UK) and 20/66 (Finland) pairs more than 3 cM apart. In other words, disequilibrium is present in both populations, but extends over a longer range in Finns.
Searching for population associations is an attractive option for identifying disease susceptibility genes. Association studies are easier to conduct than linkage analysis, because no multicase families or special family structures are needed. Also, because linkage disequilibrium is a short-range phenomenon, if an association is found, it defines a small candidate region in which to search for the susceptibility gene. Finally, recent work suggests that association is more powerful than linkage for detecting weak susceptibility alleles (Section 12.5.4). However, there are several pitfalls to be avoided if a claimed association is to provide a reliable pointer to a nearby susceptibility locus.
Linkage disequilibrium is not the only possible reason for an association between a disease D and allele A. Possible causes include the following:
Direct causation - having allele A makes you susceptible to disease D. Possession of A is neither necessary nor sufficient for somebody to develop D, but it increases the likelihood. In this case one would expect to see the same allele A associated with the disease in any population studied (unless the causes of the disease vary from one population to another).
Natural selection - people who have disease D might be more likely to survive and have children if they also have allele A.
Population stratification - the population contains several genetically distinct subsets. Both the disease and allele A happen to be particularly frequent in one subset. Lander and Schork (1994) give the example of the association in the San Francisco Bay area between HLA-A1 and ability to eat with chopsticks. HLA-A1 is more frequent among Chinese than among Caucasians.
Statistical artefact - association studies often test a range of loci, each with several alleles, for association with a disease. The raw p values need correcting for the number of questions asked (Section 12.5.1). In the past, researchers often applied inadequate corrections, and associations were reported that could not be replicated in subsequent studies.
Linkage disequilibrium - close linkage can produce allelic association at the population level, provided that most disease-bearing chromosomes in the population are descended from one or a few ancestral chromosomes. If linkage disequilibrium is the cause of the association, there should be a gene near to the A locus that has mutations in people with disease D. The particular allele at the A locus that is associated with disease D may be different in different populations.
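Population stratification is easy to reproduce with invented numbers. In the sketch below, two hypothetical subpopulations differ in both carrier frequency and disease risk. Within each subpopulation, carrying allele A makes no difference to risk. Pooling them nevertheless produces an odds ratio well above 1. All sizes, frequencies and risks are made up for illustration.

```python
def subpopulation(size: float, carrier_freq: float, disease_risk: float):
    """Expected counts (carriers affected, carriers unaffected,
    noncarriers affected, noncarriers unaffected); risk does not depend on A."""
    carriers = size * carrier_freq
    noncarriers = size - carriers
    return (carriers * disease_risk, carriers * (1 - disease_risk),
            noncarriers * disease_risk, noncarriers * (1 - disease_risk))


def odds_ratio(counts) -> float:
    a, b, c, d = counts
    return (a * d) / (b * c)


group1 = subpopulation(10_000, carrier_freq=0.5, disease_risk=0.2)
group2 = subpopulation(10_000, carrier_freq=0.1, disease_risk=0.02)
pooled = tuple(x + y for x, y in zip(group1, group2))

print(odds_ratio(group1), odds_ratio(group2))  # both (essentially) 1: no association within groups
print(round(odds_ratio(pooled), 2))            # about 2.23: spurious association after pooling
```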
Direct causation and selective advantage are unlikely if the associated allele is a variant in the noncoding DNA and not closely associated with any gene, but studies in several ethnically distinct populations are useful to help distinguish these causes of association from linkage disequilibrium. Statistical artefacts are reduced by proper correction of probabilities (see Section 12.5.1).
The choice of the control group in association studies is crucial. Many studies in the past have used published gene frequencies, often without adequate certainty that these frequencies were representative of the population from whom the patients were recruited. Alternatively, students or staff from the investigator's university may be used as a control series. Again, this is undesirable because they may well not be typical of the population from which the patients were drawn. Thus, when an association is found, it may be impossible to know whether it is caused by linkage disequilibrium with a susceptibility locus or by inadequately matched controls.
Recently, a clutch of methods has been developed that largely circumvents the stratification problem. Collectively they can be called association studies with internal controls. They involve 50% more work than standard case-control studies, because three people (proband and parents) are typed in each family instead of two. This seems a small price to pay for the gain in reliability. Parents must be available, which restricts the usefulness of these tests for late-onset diseases. One method, the haplotype relative risk (HRR) method, handles the data like typical case-control data, except that the control is not a real person but is made out of the two alleles that the parents did not transmit to their affected offspring.
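The pseudo-control idea can be sketched directly. The code below assumes we already know which allele each parent transmitted. With real genotypes this must be inferred, and it is ambiguous when both parents and the child share the same heterozygous genotype. For each trio, the code builds the child's genotype and a pseudo-control genotype from the untransmitted alleles. It then counts carriers of a chosen allele and computes an ordinary case-control odds ratio. The trios are invented. The published HRR statistic may be defined somewhat differently from this simple odds ratio.

```python
# Each trio: ((father_transmitted, father_untransmitted),
#             (mother_transmitted, mother_untransmitted))


def case_and_pseudocontrol_counts(trios, allele):
    """2x2 counts: (case carriers, case noncarriers, control carriers, control noncarriers)."""
    case_carriers = control_carriers = 0
    for (f_tr, f_untr), (m_tr, m_untr) in trios:
        case = (f_tr, m_tr)                # the affected child's genotype
        pseudo_control = (f_untr, m_untr)  # made from the two untransmitted alleles
        case_carriers += allele in case
        control_carriers += allele in pseudo_control
    n = len(trios)
    return case_carriers, n - case_carriers, control_carriers, n - control_carriers


def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


trios = [((1, 2), (1, 3)), ((1, 2), (2, 2)), ((2, 1), (1, 2)), ((3, 2), (3, 1))]
counts = case_and_pseudocontrol_counts(trios, allele=1)
print(counts, odds_ratio(*counts))  # (3, 1, 2, 2) 3.0
```

Cases and pseudo-controls come from the same parents, so both are drawn from the same subpopulation. This is what protects the comparison against stratification.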
The most popular method is the transmission disequilibrium test (TDT; Schaid, 1998). The TDT starts with couples who have one or more affected offspring. It is irrelevant whether either parent is affected or not. To test whether marker allele M1 is associated with the disease, we select those parents who are heterozygous for M1. The test simply compares the number of such parents who transmit M1 to their affected offspring with the number who transmit their other allele (Box 12.1). The result is unaffected by population stratification. The TDT can be used when only one parent is available, but this may bias the result (Schaid, 1998). There has been some argument about whether the TDT is a test of linkage or association. Since it asks questions about alleles and not loci, it is fundamentally a test of association. The associated allele may itself be a susceptibility factor, or it may be in linkage disequilibrium with a susceptibility allele at a nearby locus. The TDT cannot detect linkage if there is no disequilibrium - a point to remember when considering schemes to use the TDT for whole-genome scans.
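The TDT statistic is usually computed as a McNemar-type comparison. Suppose b heterozygous parents transmit M1 and c transmit their other allele. Then (b - c)²/(b + c) is referred to a chi-square distribution with one degree of freedom. The text gives the details in Box 12.1, which is not reproduced here. Treat this sketch as the commonly used form rather than a transcription of the box. The counts are invented.

```python
import math


def tdt(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """TDT chi-square statistic and p value.

    transmitted: heterozygous parents who passed M1 to an affected child
    untransmitted: heterozygous parents who passed their other allele
    (parents homozygous at the marker carry no information and are excluded)
    """
    b, c = transmitted, untransmitted
    stat = (b - c) ** 2 / (b + c)
    # upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p = tdt(60, 40)  # hypothetical counts
print(stat, round(p, 4))  # 4.0 0.0455
```

Only heterozygous parents enter the calculation, and each is compared with itself. For this reason, differences in allele frequency between subpopulations cannot create a spurious result.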
| Marker alleles | CF chromosomes | Normal chromosomes |
|---|---|---|
| X1, K1 | 3 | 49 |
| X1, K2 | 147 | 19 |
| X2, K1 | 8 | 70 |
| X2, K2 | 8 | 25 |
Data from typing for the RFLP markers XV2.c (alleles X1 and X2) and KM19 (alleles K1 and K2) in 114 British families with a cystic fibrosis (CF) child. Chromosomes carrying the CF disease mutation tend also to carry allele X1 of XV2.c and allele K2 of KM19. Data derived from Ivinson et al. 1989.
Sixteen markers from 8q21, shown in chromosomal order across the top of the table, were used to generate 74 haplotypes associated with Nijmegen breakage syndrome (NBS) in unrelated patients. Alleles attributed to an ancestral haplotype are marked A. Other alleles are shaded. Where there are no data, cells are left blank. Non-ancestral alleles are marked with the number of nucleotides by which they differ from the ancestral allele. Differences of two nucleotides are often the result of a mutation of the marker, but larger differences are likely to be the result of ancestral recombinations. All 74 haplotypes have the ancestral alleles at markers 11 and 12, which therefore indicate the likely location of the NBS gene. After Varon et al., 1998.
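The logic used to locate the NBS gene can be expressed as a simple scan: find the markers at which every disease haplotype still carries the ancestral allele. The haplotypes below are a small invented example coded in the same way as the table. 'A' marks an ancestral allele, a number gives the size difference of a non-ancestral allele, and None marks missing data. They are not the Varon et al. data.

```python
from typing import Optional, Union

Allele = Optional[Union[str, int]]  # "A", a size difference in nucleotides, or None


def fully_ancestral_markers(haplotypes: list[list[Allele]]) -> list[int]:
    """1-based marker positions where every haplotype with data is ancestral."""
    positions = []
    for m in range(len(haplotypes[0])):
        calls = [h[m] for h in haplotypes if h[m] is not None]
        if calls and all(call == "A" for call in calls):
            positions.append(m + 1)
    return positions


haplotypes = [
    ["A", "A", "A", "A", "A", 4],
    [2, "A", "A", "A", "A", "A"],
    [6, 8, "A", "A", 10, None],
    ["A", "A", "A", "A", "A", "A"],
]
print(fully_ancestral_markers(haplotypes))  # [3, 4]
```

A refinement would treat two-nucleotide differences as possible marker mutations rather than evidence of recombination, as the caption suggests. This sketch treats every non-ancestral allele as a break in the shared segment.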
For positional cloning of a disease where a large number of patients are available, quantitative measures of linkage disequilibrium can be calculated for a series of markers across the target region. Hopefully the disease gene will be located at the peak of disequilibrium. The simplest measures of disequilibrium are affected by the gene frequencies. A better measure is the Yule coefficient (Krawczak and Schmidtke, 1998). For two loci A and B with alleles A1, A2, B1 and B2, this is
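$$\frac{p_{1,1}\,(1 - p_{1,2}) - p_{1,2}\,(1 - p_{1,1})}{p_{1,1}\,(1 - p_{1,2}) + p_{1,2}\,(1 - p_{1,1})}$$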
where $p_{1,1}$ and $p_{1,2}$ are the frequencies of allele A1 on chromosomes carrying alleles B1 and B2, respectively.
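The sketch below assumes the standard form of Yule's coefficient. For a 2 × 2 table of counts a, b, c, d, it reduces to (ad - bc)/(ad + bc). As a worked example, it treats CF status as "locus A" and each RFLP as "locus B", using the chromosome counts from the cystic fibrosis table earlier in this section. The coefficient depends only on the odds ratio. It is therefore not distorted by the ratio of CF to normal chromosomes sampled, which is set by the study design.

```python
def yule_coefficient(p11: float, p12: float) -> float:
    """p11, p12: frequency of allele A1 on chromosomes carrying B1 and B2."""
    x = p11 * (1.0 - p12)
    y = p12 * (1.0 - p11)
    return (x - y) / (x + y)


def yule_from_counts(a1b1: int, a2b1: int, a1b2: int, a2b2: int) -> float:
    """Same coefficient from chromosome counts for each allele combination."""
    return yule_coefficient(a1b1 / (a1b1 + a2b1), a1b2 / (a1b2 + a2b2))


# A1 = CF chromosome, A2 = normal chromosome (counts summed from the CF table)
xv2c = yule_from_counts(a1b1=150, a2b1=68, a1b2=16, a2b2=95)    # B1 = X1, B2 = X2
km19 = yule_from_counts(a1b1=155, a2b1=44, a1b2=11, a2b2=119)   # B1 = K2, B2 = K1
print(round(xv2c, 3), round(km19, 3))
```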
S10, S125, etc. are shorthand for the DNA markers D4S10, D4S125, etc., shown in their map positions relative to the HD locus. The total distance represented is 2500 kb. For some loci, several different RFLPs exist, which sometimes show very different allelic association, for example marker S95 (see text). Linkage disequilibrium is measured by the Yule coefficient. From Krawczak and Schmidtke (1998) DNA Fingerprinting, 2nd edn, BIOS Scientific Publishers.
Whereas most mendelian loci localized by significant lod scores have been successfully cloned, the history of complex disease analysis has been marked by a succession of false dawns and irreproducible results. Innumerable HLA-disease associations have been reported, but few proved reproducible. Positive lod scores in families with schizophrenia turned into a serious embarrassment (reviewed by Byerley, 1989). More recently, there are a number of complex diseases where several different groups have undertaken independent large-scale sib pair analyses. The candidate regions defined in the different studies have seldom coincided. Risch and Botstein (1996) outline a typical history, that of manic-depressive psychosis, and similar results with multiple sclerosis are discussed in Section 19.5.5. Whatever the exact cause of these problems in the various cases, a clear common thread is the difficulty of deciding when to call the results of a linkage or association study significant.
| | D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8 | D9 | D10 |
|---|---|---|---|---|---|---|---|---|---|---|
| M1 | 0.29 | 0.47 | 0.80 | 0.47 | 0.36 | 0.13 | 0.93 | 0.15 | 0.08 | 0.08 |
| M2 | 0.21 | 0.26 | 0.38 | 0.55 | 0.96 | 0.61 | 0.46 | 0.28 | 0.10 | 0.40 |
| M3 | 0.36 | 0.87 | 0.61 | 0.76 | 0.80 | 0.51 | 0.44 | 0.11 | 0.76 | 0.99 |
| M4 | 0.12 | 0.77 | 0.20 | 0.68 | 0.88 | 0.47 | 0.39 | 0.05 | 0.50 | 0.53 |
| M5 | 0.09 | 0.56 | 0.01 | 0.93 | 0.24 | 0.81 | 0.18 | 0.28 | 0.04 | 0.18 |
| M6 | 0.61 | 0.83 | 0.27 | 0.95 | 0.66 | 0.03 | 0.24 | 0.05 | 0.03 | 0.87 |
| M7 | 0.63 | 0.64 | 0.12 | 0.33 | 0.76 | 0.09 | 0.54 | 0.77 | 0.42 | 0.09 |
| M8 | 0.24 | 0.12 | 0.06 | 0.65 | 0.98 | 0.52 | 0.91 | 0.63 | 0.68 | 0.23 |
| M9 | 0.36 | 0.03 | 0.15 | 0.62 | 0.68 | 0.88 | 0.15 | 0.96 | 0.94 | 0.55 |
| M10 | 0.27 | 0.94 | 0.31 | 0.32 | 0.54 | 0.06 | 0.20 | 0.63 | 0.53 | 0.38 |
Panels of patients with diseases (D1-D10) and a panel of controls were typed for markers (M1-M10). For each possible association the p value is tabulated. In reality none of the diseases is associated with any of the markers, but five of the 100 p values are significant at the 5% level, including one at the 1% level. This is of course exactly what is expected of a series of 100 random numbers. If n questions are asked, the appropriate threshold of significance is p = 0.05/n (Bonferroni correction).
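Applying the Bonferroni correction to the 100 p values in the table above takes only a few lines. The sketch counts the nominally significant results and applies the 0.05/n threshold. It also computes the chance of at least one nominal false positive among 100 independent null tests.

```python
# p values from the table: rows are markers M1-M10, columns are diseases D1-D10
P_VALUES = [
    [0.29, 0.47, 0.80, 0.47, 0.36, 0.13, 0.93, 0.15, 0.08, 0.08],
    [0.21, 0.26, 0.38, 0.55, 0.96, 0.61, 0.46, 0.28, 0.10, 0.40],
    [0.36, 0.87, 0.61, 0.76, 0.80, 0.51, 0.44, 0.11, 0.76, 0.99],
    [0.12, 0.77, 0.20, 0.68, 0.88, 0.47, 0.39, 0.05, 0.50, 0.53],
    [0.09, 0.56, 0.01, 0.93, 0.24, 0.81, 0.18, 0.28, 0.04, 0.18],
    [0.61, 0.83, 0.27, 0.95, 0.66, 0.03, 0.24, 0.05, 0.03, 0.87],
    [0.63, 0.64, 0.12, 0.33, 0.76, 0.09, 0.54, 0.77, 0.42, 0.09],
    [0.24, 0.12, 0.06, 0.65, 0.98, 0.52, 0.91, 0.63, 0.68, 0.23],
    [0.36, 0.03, 0.15, 0.62, 0.68, 0.88, 0.15, 0.96, 0.94, 0.55],
    [0.27, 0.94, 0.31, 0.32, 0.54, 0.06, 0.20, 0.63, 0.53, 0.38],
]

n_tests = sum(len(row) for row in P_VALUES)
nominal = [p for row in P_VALUES for p in row if p < 0.05]
bonferroni_threshold = 0.05 / n_tests
corrected = [p for row in P_VALUES for p in row if p < bonferroni_threshold]

print(n_tests, len(nominal), min(nominal))       # 100 5 0.01
print(f"{bonferroni_threshold:g}", len(corrected))  # 0.0005 0
# chance of at least one nominal "hit" among 100 independent null tests
print(round(1 - 0.95 ** n_tests, 3))             # 0.994
```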
The problems of deciding appropriate thresholds of significance are partly technical and partly philosophical. We have already noted the distinction between pointwise (or nominal) and genome-wide significance (Section 11.3.4):
The pointwise p value of a linkage statistic is the probability of exceeding the observed value at a specified position in the genome, assuming the null hypothesis of no linkage.
The genome-wide p value is the probability that the observed value will be exceeded anywhere in the genome, assuming the null hypothesis of no linkage.
For a whole-genome study, the appropriate significance threshold is a value where the probability of finding a false positive anywhere in the genome is 0.05. This will be more stringent than the pointwise threshold for a single test. But suppose an association study finds a significant result (pointwise p < 0.05) with the very first marker tested. Had the result been negative, the researchers would no doubt have gone on to test marker after marker until either they found something or else they had got negative results across the whole genome. Should they apply the genome-wide threshold, even though they did only one test?
According to Lander and Kruglyak (1995), the genome-wide false-positive rate, $\alpha_T^*$, is related to the pointwise false-positive rate, $\alpha_T$, by the equation
$$\alpha_T^* \approx \left[\,C + 2\rho G\,(2\ln 10)\,T\,\right]\alpha_T$$
T is the threshold lod score; C = 23, the number of chromosomes, and G = 33, the total length of the genome in Morgans. The parameter ρ measures the crossover rate, and takes different values depending on the relationship being studied, so that the formula cannot be simply applied to complex pedigrees. For affected sib pairs the formula suggests genome-wide lod score thresholds of 3.6 for IBD testing and 4.0 for IBS testing. Note that the formula applies strictly only to large samples and to stringent thresholds.
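The sketch below evaluates the Lander and Kruglyak approximation numerically. It then solves, by bisection, for the lod score at which the genome-wide rate falls to 0.05. Two points are our assumptions rather than statements from the text. First, the lod threshold is assumed to enter the formula on the chi-square scale, as 2 ln(10) T, which is how we read the original derivation. Second, ρ = 2 is assumed to be the appropriate crossover rate for IBD sharing in affected sib pairs. The pointwise rate α_T is taken as the one-sided asymptotic tail probability. With these assumptions the computed threshold comes out close to the 3.6 quoted above, which is some reassurance but not proof.

```python
import math

LN10 = math.log(10)


def pointwise_alpha(lod: float) -> float:
    """One-sided asymptotic pointwise p for a lod score (2 ln10 * lod ~ chi-square, 1 df)."""
    return 0.5 * math.erfc(math.sqrt(lod * LN10))


def genome_wide_alpha(lod: float, C: int = 23, G: float = 33.0, rho: float = 2.0) -> float:
    """Approximate genome-wide rate (C + 2 rho G h) * alpha_T with h = 2 ln10 * lod (assumed)."""
    h = 2.0 * LN10 * lod
    return (C + 2.0 * rho * G * h) * pointwise_alpha(lod)


def genome_wide_threshold(target: float = 0.05, **kwargs) -> float:
    """Bisection for the lod score at which the genome-wide rate equals `target`."""
    lo, hi = 1.0, 10.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if genome_wide_alpha(mid, **kwargs) > target:
            lo = mid
        else:
            hi = mid
    return hi


print(round(genome_wide_threshold(), 2))  # rho = 2 assumed for sib-pair IBD sharing
```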
Because the associations underlying TDT tests operate over much shorter chromosomal distances than the linkage underlying ASP testing, and because TDT, as an association test, must be performed separately for every allele of each locus, the total number of tests needed for a genome-wide scan by TDT is huge. Risch and Merikangas (1996) considered the ultimate case of testing five diallelic polymorphisms at each of 100 000 gene loci by TDT. Applying a full Bonferroni correction for 1 million independent tests means the threshold significance for a positive result is $p = 5 \times 10^{-8}$.
Most complex disease studies avoid these theoretical approaches by basing the significance threshold on simulation. Typically, 1000 replicates of the family collection are generated by computer with random genotypes, but based on correct allele frequencies, recombination fractions, etc. A whole-genome search is conducted in each simulated dataset and the maximum lod score noted. The genome-wide threshold of significance is taken as a score that is exceeded in less than 5% of replicates.
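The simulation approach can be sketched in a few lines. To keep it self-contained, this toy version replaces the simulated family collection with a hypothetical number of independent test statistics per genome scan (n_tests, an invented value). Each statistic is a standard normal deviate converted to a one-sided lod score. Real simulations regenerate genotypes for the actual pedigrees and marker map. Neighbouring markers are then correlated, and the resulting threshold will differ from this toy value.

```python
import math
import random


def null_max_lod(n_tests: int, rng: random.Random) -> float:
    """Highest lod score in one simulated genome scan under the null hypothesis."""
    best = 0.0
    for _ in range(n_tests):
        z = rng.gauss(0.0, 1.0)
        # one-sided lod: only positive deviations count as evidence for linkage
        lod = max(z, 0.0) ** 2 / (2.0 * math.log(10))
        best = max(best, lod)
    return best


def simulated_threshold(n_tests: int = 300, replicates: int = 1000,
                        level: float = 0.05, seed: int = 1) -> float:
    """Lod score exceeded by the genome-wide maximum in fewer than `level` of replicates."""
    rng = random.Random(seed)
    maxima = sorted(null_max_lod(n_tests, rng) for _ in range(replicates))
    index = min(math.ceil((1.0 - level) * replicates), replicates - 1)
    return maxima[index]


print(round(simulated_threshold(), 2))
```

Because the threshold comes from the same scan design as the real study, simulation automatically accounts for the number of markers, their spacing and the family structure. No analytical formula is needed for that.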
In response to the frequent failure to replicate claimed localizations of disease susceptibility genes, Lander and Kruglyak (1995) proposed a series of thresholds:
Suggestive linkage is a lod score or p value that would be expected to occur once by chance in a whole genome scan.
Significant linkage is a lod score or p value that would be expected to occur by chance 0.05 times in a whole genome scan (i.e. the conventional p = 0.05 threshold of significance).
Highly significant linkage is a lod score or p value that would be expected to occur by chance 0.001 times in a whole genome scan.
Confirmed linkage - linkage is to be regarded as confirmed when a significant linkage observed in one study is confirmed by finding a lod score or p value that would be expected to occur 0.01 times by chance in a specific search of the candidate region.
The pointwise p values for significant linkage work out at between $1 \times 10^{-5}$ and $5 \times 10^{-5}$ for different genome-wide study designs. Note that these values do not imply threshold lod scores of 4.3–5.0. A lod score of 5 means that the data are $10^5$ times more likely on the given linkage hypothesis than on the null hypothesis. A p value of $10^{-5}$ means that, given the null hypothesis, the stated lod score will be exceeded only once in $10^5$ trials. The two measures are not the same. The lod scores for genome-wide significant linkage are in the range 3.3–4.0, again depending on the study design. For some discussion of the Lander and Kruglyak criteria, see the correspondence section of the April 1996 issue of Nature Genetics.
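The sketch below converts a lod score into its asymptotic one-sided pointwise p value, to make the distinction concrete. It assumes, as is standard for large samples, that 2 ln(10) times the lod score behaves like a chi-square variable with one degree of freedom. The tail probability is halved because only one direction of deviation counts. For lod scores of 3.3 to 4.0, -log10(p) comes out well above the lod score itself, so the two scales are not interchangeable.

```python
import math


def lod_to_pointwise_p(lod: float) -> float:
    # Asymptotically, 2 ln(10) * lod ~ chi-square with 1 df; halve for a one-sided test
    return 0.5 * math.erfc(math.sqrt(lod * math.log(10)))


for lod in (3.3, 3.6, 4.0):
    p = lod_to_pointwise_p(lod)
    print(f"lod {lod}: p = {p:.1e}, -log10(p) = {-math.log10(p):.2f}")
```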
| γ | p | Y (ASP) | N-ASP | P(trA) (TDT) | N-TDT |
|---|---|---|---|---|---|
| 5 | 0.01 | 0.534 | 2530 | 0.833 | 747 |
| 5 | 0.1 | 0.634 | 161 | 0.833 | 108 |
| 5 | 0.5 | 0.591 | 355 | 0.833 | 83 |
| 3 | 0.01 | 0.509 | 33797 | 0.750 | 1960 |
| 3 | 0.1 | 0.556 | 953 | 0.750 | 251 |
| 3 | 0.5 | 0.556 | 953 | 0.750 | 150 |
| 2 | 0.1 | 0.518 | 9167 | 0.667 | 696 |
| 2 | 0.5 | 0.526 | 4254 | 0.667 | 340 |
| 1.5 | 0.1 | 0.505 | 115537 | 0.600 | 2219 |
| 1.5 | 0.5 | 0.510 | 30660 | 0.600 | 950 |
| 1.2 | 0.1 | 0.501 | 3951997 | 0.545 | 11868 |
| 1.2 | 0.5 | 0.502 | 696099 | 0.545 | 4606 |
γ is the relative risk for individuals of genotype Aa compared with aa; p is the frequency of the A susceptibility allele. For affected sib pair (ASP) analysis, Y is the expected allele sharing and N-ASP the number of pairs required for significance, based on IBD testing ($\alpha = 3 \times 10^{-5}$). For transmission disequilibrium testing (TDT), P(trA) is the probability that an Aa parent will transmit A to an affected child, and N-TDT is the number of parent-child trios required for significance. After Risch and Merikangas (1996).
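The Y and P(trA) columns can be recomputed from γ and p alone. The sketch below assumes a multiplicative model, in which genotype AA carries risk γ² relative to aa. This is our reading of the Risch and Merikangas setup, and under that assumption the code reproduces the tabulated values. The sample-size columns also depend on the significance level and the power required, so they are not recomputed here.

```python
def expected_asp_sharing(gamma: float, p: float) -> float:
    """Expected IBD sharing Y for affected sib pairs under a multiplicative model."""
    q = 1.0 - p
    w = p * q * (gamma - 1.0) ** 2 / (p * gamma + q) ** 2
    return (1.0 + w) / (2.0 + w)


def tdt_transmission_prob(gamma: float) -> float:
    """P(trA): chance that an Aa parent transmits A to an affected child."""
    return gamma / (1.0 + gamma)


for gamma, p in [(5, 0.01), (5, 0.1), (3, 0.1), (2, 0.5), (1.2, 0.5)]:
    print(gamma, p, round(expected_asp_sharing(gamma, p), 3),
          round(tdt_transmission_prob(gamma), 3))
```

P(trA) does not depend on the allele frequency. In contrast, the excess sharing Y shrinks towards 0.5 for both rare and common alleles. This is one way to see why the TDT keeps its power for weak effects better than ASP analysis.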
Their conclusion is clear: ASP analysis would require unfeasibly large samples to detect susceptibility loci conferring a relative risk of less than about 3, whereas TDT might detect loci giving a relative risk below 2 with manageable sample sizes. Susceptibility genes conferring a relative risk below 1.5 would be very hard to find by either method. Note, however, that their result is obtained with one particular genetic model, and might not apply to all. In particular, linkage disequilibrium is not necessarily present between alleles at tightly-linked loci.
In many ways linkage and association provide complementary data. Linkage operates over a long chromosomal range. Linkage analysis, whether parametric or nonparametric, can scan the entire genome in a few hundred tests. A typical study of 250 sib pairs with 300 markers would require $1.5 \times 10^5$ to $3 \times 10^5$ genotypes to be generated, depending on whether or not the parents were typed. Such a study might be completed in a few months by a well-organized and well-funded laboratory using an automated fluorescence sequencer. However, as noted (Section 12.2.2), candidate regions defined by linkage are usually uncomfortably large for positional cloning.
Association tests like the TDT have the opposite characteristics. Linkage disequilibrium is seldom striking over more than a megabase, so a genome screen by TDT would involve huge numbers of tests; on the other hand, a positive result would localize the susceptibility factor rather accurately. A natural study design is therefore to start with a genome-wide screen by linkage, probably in affected sib pairs, and then, once an initial localization has been achieved, to narrow the candidate region by linkage disequilibrium mapping.
It is important to remember that linkage disequilibrium is not an inevitable result of tight linkage. Association due to disequilibrium will be seen only if a significant proportion of the disease chromosomes derive from one not too distant common ancestor. There is a balance in this. Some serious dominant or X-linked mendelian diseases show no linkage disequilibrium because natural selection ensures a rapid turnover of disease genes, and most affected people are the result of independent mutations. For susceptibility factors in common disease, the problem is more likely to lie at the opposite end of the spectrum. Susceptibility factors may be common variants that have existed in the population at high frequency for a very long time, and that are nonpathogenic except when they get into bad company. A very old variant may have reached linkage equilibrium with adjacent markers. Equally, there may be no linkage disequilibrium if many different changes to a given gene each act as a susceptibility factor, in the same way that many different changes can cause loss of function (see Figure 16.1). Therefore, even if a susceptibility factor can be roughly localized by linkage, it does not necessarily follow that it can be fine-mapped by linkage disequilibrium or by a method such as the TDT that relies on it.