Am J Hum Genet. Oct 2004; 75(4): 687–692.
Published online Aug 13, 2004.
PMCID: PMC1182056

Guidelines for Genotyping in Genomewide Linkage Studies: Single-Nucleotide–Polymorphism Maps Versus Microsatellite Maps

Abstract

Genomewide linkage scans have traditionally employed panels of microsatellite markers spaced at intervals of ~10 cM across the genome. However, there is a growing realization that a map of closely spaced single-nucleotide polymorphisms (SNPs) may offer equal or superior power to detect linkage, compared with low-density microsatellite maps. We performed a series of simulations to calculate the information content associated with microsatellite and SNP maps across a range of different marker densities and heterozygosities for sib pairs (with and without parental genotypes), sib trios, and sib quads. In the case of microsatellite markers, we varied density across 11 levels (1 marker every 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 cM) and marker heterozygosity across 6 levels (2, 3, 4, 5, 10, or 20 equally frequent alleles), whereas, in the case of SNPs, we varied marker density across 4 levels (1 marker every 0.1, 0.2, 0.5, or 1 cM) and minor-allele frequency across 7 levels (0.5, 0.4, 0.3, 0.2, 0.1, 0.05, and 0.01). When parental genotypes were available, a map consisting of microsatellites spaced every 2 cM or a relatively sparse map of SNPs (i.e., at least 1 SNP/cM) was sufficient to extract most of the inheritance information from the map (>95% in most cases). However, when parental genotypes were unavailable, it was important to use as dense a map of markers as possible to extract the greatest amount of inheritance information. It is important to note that the information content associated with a traditional map of microsatellite markers (i.e., 1 marker every ~10 cM) was significantly lower than the information content associated with a dense map of SNPs or microsatellites. These results strongly suggest that previous linkage studies that employed sparse microsatellite maps could benefit substantially from reanalysis by use of a denser map of markers.

The past few years have witnessed an explosion in the number of linkage studies of complex diseases and traits (Korstanje and Paigen 2002). Typically, these studies have involved genomewide scans that use low-density maps of microsatellite markers that are spaced at intervals of ~10 cM across the genome. To maximize the chances of detecting linkage, it is critical that any map of markers extracts the optimum amount of inheritance information (Kruglyak and Lander 1995; Kruglyak 1997).

More recently, the discovery of SNPs and the development of automated high-throughput genotyping methods have enabled investigators to type thousands of markers across the genome quickly and economically. Although SNPs are biallelic and usually have lower heterozygosities than microsatellite markers, they are present at a greater density throughout the genome and are associated with lower genotyping error rates than their microsatellite counterparts (Kennedy et al. 2003). Indeed, there is growing evidence that a map of closely spaced SNPs may offer several advantages over low-density microsatellite maps, including superior power to detect linkage (Kruglyak 1997; Wilson and Sorant 2000; Goddard and Wijsman 2002; Matise et al. 2003; Middleton et al. 2004) and improved localization of the underlying disease/trait locus (John et al. 2004). However, previous simulation studies have only examined sparse maps of SNPs, with a limited range of pedigree structures. For example, the highest density that Kruglyak (1997) examined was 1 SNP/cM, and this was only in the case of cousin pairs for which parental genotypes were available. Very recently, reanalysis of existing microsatellite scans with a denser map of SNPs found significant linkages missed by the initial scans (Middleton et al. 2004). Given that SNP technology has evolved to allow high-throughput, high-density genotyping at densities >1 SNP/cM, it is important to determine whether high-density maps offer significant benefits over lower-density maps, in terms of power to detect linkage. In this study, we examined the information content associated with both microsatellite and high-density SNP maps in the context of the most common types of pedigree designs—nuclear family and sibling studies. 
In particular, we were interested not only in the conditions under which a dense map of SNPs might provide an advantage over traditional microsatellite maps but also in what the optimal density of markers might be for each of these pedigree structures.

For all simulations, we segregated 100-cM chromosomes in 200 pedigrees. Pedigrees consisted of either a sib pair with parental genotypes or a sib pair, trio, or quad without parental genotypes. In the case of the microsatellite maps, we simulated equally spaced markers every 0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 cM, with each marker consisting of 2, 3, 4, 5, 10, or 20 equally frequent alleles. In the case of the SNP maps, we varied the spacing of the markers so that SNPs were equally spaced every 0.1, 0.2, 0.5, or 1 cM. We also varied the minor-allele frequency (MAF) of the SNPs across seven levels: 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, and 0.01. This range of parameters encompasses all currently available SNP and microsatellite linkage panels, including the commercially available 4,600- and 10,000-SNP panels (Oliphant et al. 2002; Matsuzaki et al. 2004), the 300–400-microsatellite sets widely used in the Applied Biosystems and Cooperative Human Linkage Center collections (Dib et al. 1996; Broman et al. 1998), and the recent deCODE set of 1,068 microsatellite markers (Helgadottir et al. 2004). Information content was computed using the program MERLIN (Abecasis et al. 2002) and was averaged over the 100 simulations either at the marker closest to the middle of the chromosome or halfway between the two markers closest to the middle of the chromosome (Kruglyak 1997).
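
The parameter grid described above can be enumerated in a few lines. The following is only an illustrative sketch: the variable and function names are hypothetical and are not part of MERLIN or of the original simulation code. It assumes the standard Hardy-Weinberg expressions for expected heterozygosity, 1 − 1/k for k equally frequent alleles and 2p(1 − p) for a SNP with minor-allele frequency p.

```python
from itertools import product

# Marker spacings (cM) and allele settings as listed in the text.
MICROSAT_SPACING_CM = [0.5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
MICROSAT_ALLELES = [2, 3, 4, 5, 10, 20]
SNP_SPACING_CM = [0.1, 0.2, 0.5, 1]
SNP_MAF = [0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.01]


def microsat_heterozygosity(n_alleles):
    """Expected heterozygosity for n equally frequent alleles (assumes HWE)."""
    return 1.0 - 1.0 / n_alleles


def snp_heterozygosity(maf):
    """Expected heterozygosity for a biallelic marker (assumes HWE)."""
    return 2.0 * maf * (1.0 - maf)


# Every (spacing, allele setting) combination for each marker type.
microsat_grid = list(product(MICROSAT_SPACING_CM, MICROSAT_ALLELES))
snp_grid = list(product(SNP_SPACING_CM, SNP_MAF))

print(len(microsat_grid), "microsatellite settings")
print(len(snp_grid), "SNP settings")
print("20-allele microsatellite heterozygosity:", microsat_heterozygosity(20))
print("SNP heterozygosity at MAF 0.5:", snp_heterozygosity(0.5))
```

Under these expressions, even the most informative SNP (MAF 0.5) has the same heterozygosity as a two-allele microsatellite, which is why SNP maps must compensate with density.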

Figure 1 displays the information content associated with microsatellite maps across the different heterozygosities, marker densities, and pedigree structures. As expected, information content increased with heterozygosity and was greatest at markers and lowest in between markers. Information content also increased with marker density, although this depended strongly on the availability of parental genotypes. When parents were genotyped, a density of 1 microsatellite every 1 or 2 cM was sufficient to extract nearly 100% of the inheritance information from the map (see left-hand panels in fig. 1). Thus, when parental genotypes are available, there is little point in genotyping at a density >1 microsatellite/2 cM. However, most linkage studies have employed a sparse map of 300–400 microsatellite markers, which are spaced every 10 cM, on average, across the genome (Altmüller et al. 2001). With this map, the information content dipped to ~70% in between markers. Even recent scans that have employed a denser panel of ~1,000 microsatellite markers from the deCODE set (i.e., ~1 marker/3 cM) do not quite extract the maximum amount of inheritance information and could benefit from an increase in marker density (e.g., Helgadottir et al. 2004).

Figure 1
Information content for microsatellites, as a function of marker density (X-axis) and marker heterozygosity (colored lines). Information content is calculated at the middle marker of the chromosome (A) and halfway between the two middle markers (B).

In contrast, when parental genotypes were unavailable, the information content associated with the highest density of microsatellites we examined was only ~70%—approximately the same as that associated with a sparse map of microsatellite markers when parents were genotyped. We note that, although it would be theoretically possible to examine higher densities of microsatellite markers in our simulations, this would be unrealistic, since the density of microsatellites in the human genome is not much greater than 1 marker/0.5 cM. Most alarming, the information content associated with a traditional map of microsatellites (i.e., 1 marker/10 cM) was as low as ~30% in between markers. This figure is dramatically lower than the figure of 58% reported by Kruglyak (1997) in the case of cousin pairs with parental genotypes available. Since the expected LOD score, and therefore the power of linkage analysis, is proportional to the amount of information extracted from the map (Kruglyak 1997), this result suggests that the majority of sib-pair linkage studies that have used the typical sets of 300–400 microsatellite markers have been seriously underpowered.
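
The proportionality between expected LOD score and information content can be illustrated with a small sketch. The function name and the fully informative expected LOD of 3.0 are hypothetical values chosen for illustration; only the information-content figures (~30% and ~70%) come from the text.

```python
def expected_lod(full_information_elod, information_content):
    """Expected LOD, assuming it scales linearly with information content.

    information_content runs from 0 (no inheritance information)
    to 1 (inheritance fully resolved).
    """
    if not 0.0 <= information_content <= 1.0:
        raise ValueError("information content must lie in [0, 1]")
    return full_information_elod * information_content


FULL_ELOD = 3.0  # hypothetical expected LOD with complete inheritance information

for label, info in [
    ("sib pairs, no parents, 1 microsatellite/10 cM", 0.30),
    ("sib pairs, no parents, 1 microsatellite/0.5 cM", 0.70),
]:
    print(f"{label}: expected LOD ~ {expected_lod(FULL_ELOD, info):.2f}")
```

Under this assumption, the same sample that would yield an expected LOD of 3.0 with complete information yields less than 1.0 on the traditional 10-cM map.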

A previous study by Kruglyak and Lander (1995) reported trends similar to our results, but with higher values for information content. For example, in the case of sib pairs without parental genotypes, Kruglyak and Lander (1995) reported that the information content associated with a microsatellite map of 1 marker/10 cM was >60% (as compared with our figure of ~30%). The reason for this discrepancy is that Kruglyak and Lander (1995) used an older definition of information content that was based on the variance associated with the distribution of estimated identical-by-descent probabilities. This has since been shown to be inferior to an entropy-based measure (the classical measure of information content) that scales linearly with the expected LOD score (Kruglyak 1997). We used this more-appropriate entropy measure, as implemented in MERLIN (Abecasis et al. 2002).
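
As a rough illustration of an entropy-based measure, the sketch below computes information content as 1 − E/E0, where E is the Shannon entropy of the posterior distribution over inheritance patterns at a map position and E0 is the entropy of a uniform prior over the same patterns. This form is an assumption made for illustration; the function names are hypothetical and this is not MERLIN's implementation or API. The four-state distributions are toy inputs, not simulation output.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability states contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_content(posterior):
    """Entropy-based information content, assuming a uniform prior.

    Returns 0 when the marker data leave the distribution uniform
    and 1 when a single inheritance pattern is certain.
    """
    n_states = len(posterior)
    if n_states < 2:
        raise ValueError("need at least two inheritance patterns")
    if abs(sum(posterior) - 1.0) > 1e-9:
        raise ValueError("posterior probabilities must sum to 1")
    prior_entropy = math.log2(n_states)  # entropy of the uniform prior
    return 1.0 - entropy_bits(posterior) / prior_entropy


print(information_content([0.25, 0.25, 0.25, 0.25]))  # markers uninformative
print(information_content([0.5, 0.5, 0.0, 0.0]))      # half the uncertainty resolved
print(information_content([1.0, 0.0, 0.0, 0.0]))      # inheritance fully resolved
```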

Table 1 displays the information content associated with SNP maps as a function of marker density, allele frequency, and pedigree structure. Increasing the density of SNPs has little effect when parents have been genotyped, since, even at relatively low densities (e.g., 1 SNP/cM [~3,000 SNPs genomewide]), the majority of inheritance information has been extracted from the pedigree. Therefore, if parents can be genotyped, a sparse map of SNPs should suffice. In contrast, when parental genotypes were not available, increasing the marker density significantly increased the amount of inheritance information extracted, in some cases by ≥20%. This effect was most marked for sib pairs and decreased as more siblings were added to the pedigree. This is expected, since typing additional siblings increases certainty about inheritance, even at sparse marker densities. We note that when parents were not typed, increasing the density of markers beyond ~2–5 SNPs/cM (6,000–15,000 SNPs total) produced increasingly diminishing returns. In the end, it will be the investigator who decides whether the additional cost associated with genotyping at a higher density is worth the small gain in information. Finally, we note that, except in the case of rare SNPs (<5% MAF), allele frequency was far less important in determining information content than SNP density (Kruglyak 1997).
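
The genomewide marker counts quoted above follow from simple arithmetic. The sketch below assumes a genome length of 3,000 cM, which is the value implied by equating 1 SNP/cM with ~3,000 SNPs; the function name is hypothetical.

```python
GENOME_LENGTH_CM = 3000.0  # assumed; implied by "1 SNP/cM [~3,000 SNPs genomewide]"


def genomewide_marker_count(spacing_cm, genome_length_cm=GENOME_LENGTH_CM):
    """Approximate number of equally spaced markers needed to cover the genome."""
    if spacing_cm <= 0:
        raise ValueError("spacing must be positive")
    return round(genome_length_cm / spacing_cm)


for spacing in (10, 1, 0.5, 0.2, 0.1):
    print(f"1 marker every {spacing} cM -> ~{genomewide_marker_count(spacing):,} markers")
```

With this assumption, a 10-cM map corresponds to the lower end of the 300–400-microsatellite panels, and spacings of 0.5 and 0.2 cM correspond to the 6,000 and 15,000 SNP totals mentioned above.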

Table 1
Information Content Calculated at SNP Markers as a Function of Marker Density, MAF, and Pedigree Structure

Which is superior in terms of extracting inheritance information—a dense map of SNPs or a map of microsatellites? A comparison of figure 1 and table 1 shows that, in most cases, a very dense map of SNPs (i.e., 1 SNP/0.1 cM) performs similarly to, or only marginally better than, a relatively dense map (i.e., 1 marker/0.5 cM) of microsatellite markers. It is important to note that a dense map of SNPs extracts considerably more information than a sparse map of 300–400 microsatellites (fig. 2). For example, in the case of sib pairs without parental genotypes, reanalysis with a map consisting of 5 SNPs/cM (~15,000 SNPs) across the genome, which is close to the density offered by some current gene chips (Matsuzaki et al. 2004), would approximately double the inheritance information extracted from the sib pair. Such a map of SNPs would offer greater power than that associated with the densest microsatellite panels currently available (fig. 2). SNPs also have some advantages relative to microsatellites, in that they are associated with a lower genotype error (Kennedy et al. 2003) and are amenable to automation and thus may be cheaper in terms of cost and labor.

Figure 2
Information content associated with a single 100-cM simulated region for a variety of SNP and microsatellite panels. Each line is representative of a marker panel currently available. Broken lines represent sparse marker densities (1 SNP/0.5 cM or 1 microsatellite/10 ...

These results point to some clear guidelines for genotyping in linkage studies, but it is important to note some assumptions on which they are based. First, we have assumed that an accurate genetic map of SNP markers is available, when this may not be the case in reality. Misspecification of genetic distances may result in decreased power to detect linkage (Daw et al. 2000). Although locations of SNPs could be interpolated by use of currently available maps, the best option would be the construction of a genetic map of SNPs like the deCODE microsatellite map (Kong et al. 2002). Also, we have assumed the absence of linkage disequilibrium in the calculation of information content at very high marker densities. The software package MERLIN uses a modification of the Lander-Green algorithm, which assumes linkage equilibrium between genetic markers (Abecasis et al. 2002). It has been demonstrated that this algorithm can give misleading results with missing data in the presence of linkage disequilibrium (Schaid et al. 2002). Although several recent studies have shown that the extent and distribution of linkage disequilibrium is extremely variable throughout the genome, in most cases significant linkage disequilibrium does not influence markers separated by >0.1 cM in outbred populations (Dawson et al. 2002; Phillips et al. 2003; Ke et al. 2004). Last, we have not investigated the effect that genotyping error may have on the results of these simulations. It is well known that multipoint linkage analysis is extremely sensitive to genotyping error and that error rates as small as 1% can significantly decrease the power to detect loci (Douglas et al. 2000; Abecasis et al. 2001). Thus, if an increase in marker density also increases the number of genotyping errors present in the data, the net effect may actually be a decrease in the power to detect linkage. 
We suggest that the consequences of genotyping error for high-density linkage scans be thoroughly investigated in future studies.

The present findings suggest a number of guidelines for genotyping in linkage studies. First, genotyping parents is the most effective way to ensure that the maximum amount of inheritance information is extracted from any panel of markers. This holds true regardless of marker density and regardless of whether SNPs or microsatellite markers are typed. Of course, it is not always possible to genotype parents, as is often the case in late-onset diseases and psychiatric disorders. One compromise is to genotype additional siblings. Such a strategy not only increases the amount of inheritance information extracted from the marker panel but also provides a more powerful analysis, since a larger sibship supplies more pairwise comparisons (Dolan et al. 1999; Williams and Blangero 1999).

Second, performing genomewide linkage analysis by use of a dense map of SNPs is preferable to performing linkage analysis by use of a sparse map of microsatellites. When parental genotypes are available, a moderately dense map of SNPs or microsatellites should suffice (i.e., ~1 marker/1 cM in the case of SNPs and 1 marker/2 cM in the case of microsatellites). When parental genotypes are unavailable, the higher the density of markers, the better.

Finally, the very low values of information content associated with sparse panels of microsatellite markers suggest that previous linkage studies that have employed these panels would benefit substantially from reanalysis with a dense map of SNPs. This is particularly true for sib-pair studies in which parents have not been genotyped. Several recent studies that have reanalyzed existing microsatellite scans with a denser map of SNPs have found either suggestive or significant linkages missed by the initial scans (John et al. 2004; Middleton et al. 2004). Our results suggest that reanalysis with a denser map of markers will result in a substantial gain in the power to detect linkage.

Acknowledgments

This work was supported by Affymetrix, the Wellcome Trust, the SNP Consortium, the National Institutes of Health (EY-126562 [to L.R.C.]), and the Medical Research Council (G9801327).

References

Abecasis GR, Cherny SS, Cardon LR (2001) The impact of genotyping error on family-based analysis of quantitative traits. Eur J Hum Genet 9:130–134.
Abecasis GR, Cherny SS, Cookson WO, Cardon LR (2002) Merlin—rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet 30:97–101. doi:10.1038/ng786
Altmüller J, Palmer LJ, Fischer G, Scherb H, Wjst M (2001) Genomewide scans of complex human diseases: true linkage is hard to find. Am J Hum Genet 69:936–950.
Broman KW, Murray JC, Sheffield VC, White RL, Weber JL (1998) Comprehensive human genetic maps: individual and sex-specific variation in recombination. Am J Hum Genet 63:861–869.
Daw EW, Thompson EA, Wijsman EM (2000) Bias in multipoint linkage analysis arising from map misspecification. Genet Epidemiol 19:366–380. doi:10.1002/1098-2272(200012)19:4<366::AID-GEPI8>3.0.CO;2-F
Dawson E, Abecasis G, Bumpstead S, Chen Y, Hunt S, Beare DM, Pabial J, et al (2002) A first-generation linkage disequilibrium map of human chromosome 22. Nature 418:544–548. doi:10.1038/nature00864
Dib C, Faure S, Fizames C, Samson D, Drouot N, Vignal A, Millasseau P, Marc S, Hazan J, Seboun E, Lathrop M, Gyapay G, Morissette J, Weissenbach J (1996) A comprehensive genetic map of the human genome based on 5,264 microsatellites. Nature 380:152–154. doi:10.1038/380152a0
Dolan CV, Boomsma DI, Neale MC (1999) A note on the power provided by sibships of sizes 2, 3, and 4 in genetic covariance modeling of a codominant QTL. Behav Genet 29:163–170. doi:10.1023/A:1021687817609
Douglas JA, Boehnke M, Lange K (2000) A multipoint method for detecting genotype errors and mutations in sibling-pair linkage data. Am J Hum Genet 66:1287–1297.
Goddard KAB, Wijsman EM (2002) Characteristics of genetic markers and maps for cost-effective genome screens using diallelic markers. Genet Epidemiol 22:205–220. doi:10.1002/gepi.0177
Helgadottir A, Manolescu A, Thorleifsson G, Gretasdottir S, Jonsdottir H, Thorsteinsdottir G, Samani NJ, et al (2004) The gene encoding 5-lipoxygenase activating protein confers risk of myocardial infarction and stroke. Nat Genet 36:233–239. doi:10.1038/ng1311
John S, Shephard N, Liu G, Zeggini E, Cao M, Chen W, Vasavda N, Mills T, Barton A, Hinks A, Eyre S, Jones KW, Ollier W, Silman A, Gibson N, Worthington J, Kennedy GC (2004) Whole-genome scan, in a complex disease, using 11,245 single-nucleotide polymorphisms: comparison with microsatellites. Am J Hum Genet 75:54–64.
Ke X, Hunt S, Tapper W, Lawrence R, Stavrides G, Ghori J, Whittaker P, Collins A, Morris AP, Bentley D, Cardon LR, Deloukas P (2004) The impact of SNP density on fine-scale patterns of linkage disequilibrium. Hum Mol Genet 13:577–588. doi:10.1093/hmg/ddh060
Kennedy GC, Matsuzaki H, Dong S, Liu WM, Huang J, Liu G, Su X, Cao M, Chen W, Zhang J, Liu W, Yang G, Di X, Ryder T, He Z, Surti U, Phillips MS, Boyce-Jacino MT, Fodor SP, Jones KW (2003) Large scale genotyping of complex DNA. Nat Biotechnol 21:1233–1237. doi:10.1038/nbt869
Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G, Shlien A, Palsson ST, Frigge ML, Thorgeirsson TE, Gulcher JR, Stefansson K (2002) A high-resolution map of the human genome. Nat Genet 31:241–247.
Korstanje R, Paigen B (2002) From QTL to gene: the harvest begins. Nat Genet 31:235–236. doi:10.1038/ng0702-235
Kruglyak L (1997) The use of a genetic map of biallelic markers in linkage studies. Nat Genet 17:21–24. doi:10.1038/ng0997-21
Kruglyak L, Lander ES (1995) Complete multipoint sib-pair analysis of qualitative and quantitative traits. Am J Hum Genet 57:439–454.
Matise TC, Sachidanandam R, Clark AG, Kruglyak L, Wijsman E, Kakol J, Buyske S, et al (2003) A 3.9-centimorgan-resolution human single-nucleotide polymorphism linkage map and screening set. Am J Hum Genet 73:271–284.
Matsuzaki H, Loi H, Dong S, Tsai YY, Fang J, Law J, Di X, Liu WM, Yang G, Liu G, Huang J, Kennedy GC, Ryder TB, Marcus GA, Walsh PS, Shriver MD, Puck JM, Jones KW, Mei R (2004) Parallel genotyping of over 10,000 SNPs using a one-primer assay on a high-density oligonucleotide array. Genome Res 14:414–425. doi:10.1101/gr.2014904
Middleton FA, Pato MT, Gentile KL, Morley CP, Zhao X, Eisner A, Brown A, Petryshen TL, Kirby AN, Medeiros H, Carvalho C, Macedo A, Dourado A, Coelho I, Valente J, Soares MJ, Ferreira CP, Lei M, Azevedo MH, Kennedy JL, Daly MJ, Sklar P, Pato CN (2004) Genomewide linkage analysis of bipolar disorder by use of a high-density single-nucleotide-polymorphism (SNP) genotyping assay: a comparison with microsatellite marker assays and finding of significant linkage to chromosome 6q22. Am J Hum Genet 74:886–897.
Oliphant A, Barker DL, Stuelpnagel JR, Chee MS (2002) BeadArray technology: enabling an accurate, cost-effective approach to high-throughput genotyping. Biotechniques Suppl: 56–58, 60–61.
Phillips MS, Lawrence R, Sachidanandam R, Morris AP, Balding DJ, Donaldson MA, Studebaker JF, et al (2003) Chromosome-wide distribution of haplotype blocks and the role of recombination hot spots. Nat Genet 33:382–387. doi:10.1038/ng1100
Schaid DJ, McDonnell SK, Wang L, Cunningham JM, Thibodeau SN (2002) Caution on pedigree haplotype inference with software that assumes linkage equilibrium. Am J Hum Genet 71:992–995.
Williams JT, Blangero J (1999) The power of variance component linkage analysis to detect quantitative trait loci. Ann Hum Genet 63:545–563. doi:10.1046/j.1469-1809.1999.6360545.x
Wilson AF, Sorant AJ (2000) Equivalence of single- and multilocus markers: power to detect linkage with composite markers derived from biallelic loci. Am J Hum Genet 66:1610–1615.

Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics
