Am J Hum Genet. Mar 2003; 72(3): 598–610.
Published online Feb 13, 2003. doi: 10.1086/368203
PMCID: PMC1180236

Undetected Genotyping Errors Cause Apparent Overtransmission of Common Alleles in the Transmission/Disequilibrium Test

Abstract

The transmission/disequilibrium test (TDT), a family-based test of linkage and association, is a popular and intuitive statistical test for studies of complex inheritance, as it is nonparametric and robust to population stratification. We carried out a literature search and located 79 significant TDT-derived associations between a microsatellite marker allele and a disease. Among these, there were 31 (39%) in which the most common allele was found to exhibit distorted transmission to affected offspring, implying that the allele may be associated with either susceptibility to or protection from a disease. In 27 of these 31 studies (87%), the most common allele appeared to be overtransmitted to affected offspring (a risk factor), and, in the remaining 4 studies, the most common allele appeared to be undertransmitted (a protective factor). In a second literature search, we identified 92 case-control studies in which a microsatellite marker allele was found to have significantly different frequencies in case and control groups. Of these, there were 37 instances (40%) in which the most common allele was involved. In 12 of these 37 studies (32%), the most common allele was enriched in cases relative to controls (a risk factor), and, in the remaining 25 studies, the most common allele was enriched in controls (a protective factor). Thus, the most common allele appears to be a risk factor when identified through the TDT, and it appears to be protective when identified through case-control analysis. To understand this phenomenon, we incorporated an error model into the calculation of the TDT statistic. We show that undetected genotyping error can cause apparent transmission distortion at markers with alleles of unequal frequency. We demonstrate that this distortion is in the direction of overtransmission for common alleles. 
Therefore, we conclude that undetected genotyping errors may be contributing to an inflated false-positive rate among reported TDT-derived associations and that genotyping fidelity must be increased.

Introduction

Much attention has recently been paid to methods for predicting genotyping error rates, studying types of errors that can be expected with various technologies, detecting the presence of such errors, predicting the effect of undetected errors on genetic analysis, and developing analytical methods that are robust to such errors. Inconsistencies in genotype data can occur for many reasons, including sample mix-up, pedigree errors such as nonpaternity or incorrectly specified relationships, technician error, and technology failure. In the context of a genome scan or in the analysis of many unlinked candidate genes or of haplotypes, misspecified relationships can usually be detected. In this analysis, we focus only on random errors that have occurred because of imperfect genotyping technologies and manual mistakes that result in an incorrect genotype call at a single marker.

Different types of errors are encountered with different types of markers. For example, with single-nucleotide polymorphisms (SNPs), heterozygotes are believed to be more difficult to call than homozygotes, leading to allelic dropout (Cutler et al. 2001). This is because SNP genotyping technology often involves carrying out two reactions per locus. If an individual is heterozygous and one of the two reactions fails, the individual will appear homozygous. Microsatellite genotypes, on the other hand, are prone to both call errors and missed alleles (Ewen et al. 2000; Sobel et al. 2002). Call errors can result from errors in sample loading, low fluorescence, bleedthrough fluorescence from other markers run in the same lane, or inconsistencies in genotype scoring, particularly when alleles are called by eye. In addition, errors can be introduced by repeat expansion during PCR (Clarke et al. 2001), particularly for dinucleotide repeats.

A fraction of genotyping errors can be detected by checking for the presence of deviations from Mendelian inheritance in families. Rates of genotyping error detection through Mendelian inheritance vary with pedigree structure, the number of alleles at a marker, and the frequencies of the alleles; errors can be detected more easily in highly polymorphic microsatellite markers than in SNPs (Douglas et al. 2002). Through simulation, true error rates in SNPs have been shown to be approximately three to four times the Mendelian-detectable error rate (Gordon et al. 1999). Additional errors can be detected by investigating apparent double recombinants within small regions when multipoint data are available and haplotypes can be constructed. However, extended pedigrees are often not available, making haplotype construction difficult and reducing the chances of detecting an error through Mendelian analysis, as well. Even with highly polymorphic microsatellite loci, many errors will be undetectable—particularly errors in parental genotypes in trios and errors in any family member when data are only available from one parent (Douglas et al. 2002). As trends in the study of complex diseases are shifting away from the collection of large family pedigrees and toward the study of smaller family units such as sibling pairs or even toward unrelated individuals (Risch and Merikangas 1996), Mendelian checking is becoming less useful. Consequently, in such analyses, more undetected errors remain in the data. One way of handling such invisible errors is to treat each genotype as uncertain, making use of multipoint sibling pair data to compute the posterior probability of a genotyping error at each marker for each sib pair (Douglas et al. 2000). Another method involves the construction of all possible haplotypes and computation of the posterior probability for each pedigree, depending on user-specified allele frequencies, map distances, and penetrances (Sobel et al. 2002). 
This method is more powerful than the traditional selection of a single, best-fitting haplotype in the identification of potential double recombinants. Finally, an error model can be directly incorporated into a likelihood-based LOD score calculation (Lincoln and Lander 1992) or linkage disequilibrium (LD) analysis (Gordon et al. 2001).

Even if undetected errors are random, their effects are often nonrandom. In parametric linkage analysis, the presence of undetected genotyping errors inflates the apparent recombination fraction, resulting in increased apparent map distance (Buetow 1991; Shields et al. 1991), incorrect marker ordering, loss of power, and increased false-negative rate. Multipoint linkage analysis is affected by undetected errors to a greater degree than two-point linkage analysis. A significant two-point LOD score that becomes nonsignificant under multipoint linkage is often treated as a two-point false-positive result, when it may in fact be a multipoint false negative (Goring and Terwilliger 2000).

Less has been written about the effects of undetected genotyping error on association and LD analyses than about its effects on linkage analysis. In traditional case-control analysis, a loss of power can result from the presence of undetected genotyping errors (Gordon and Ott 2001). Apparent background LD rates are also affected by such errors. The direction and magnitude of the effect depend on allele frequencies, true haplotype frequencies, the measure of LD employed, and the genotyping error rate. Error rates in the range of 3% can have substantial effects on two-point measures of background LD between SNPs (Akey et al. 2001). Most relevant to the current study, some recent attention has been focused on inflated transmission/disequilibrium test (TDT) false-positive rates resulting from undetected genotyping errors (Gordon et al. 2001; Heath 1998).

The TDT is a family-based test for linkage in the presence of LD (Spielman et al. 1993). Its appeal lies in its simplicity and its robustness to population stratification (Ewens and Spielman 1995). In its simplest form, the TDT is carried out by counting the number of times a particular allele is transmitted and the number of times it is not transmitted from heterozygous parents to affected children. The null hypothesis is that there is no association between the allele and the disease under study, in which case transmission and nontransmission should be equally likely. Significant deviation from Mendelian transmission is assessed via the McNemar χ² test statistic:

$$\chi^2 = \frac{(t-u)^2}{t+u}$$

where t and u are the transmitted and untransmitted counts, respectively, for the allele under investigation. We have determined that the presence of undetected genotyping errors can lead to substantial inflation of the false-positive rate of this basic TDT and that the bias leads to a particular phenomenon, namely the apparent overtransmission of common alleles.
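As a quick illustration of the statistic above, here is a minimal Python sketch. The function name `tdt_statistic` is our own, and the P value uses the usual large-sample χ² approximation with 1 degree of freedom, computed through the identity P(χ²₁ > x) = erfc(√(x/2)) so that only the standard library is needed.

```python
import math


def tdt_statistic(t, u):
    """Basic TDT (McNemar chi-square) and its asymptotic 1-df P value.

    t, u: counts of the tested allele transmitted / not transmitted from
    heterozygous parents to affected children.
    """
    if t + u == 0:
        raise ValueError("no heterozygous parents, so the TDT is undefined")
    chi2 = (t - u) ** 2 / (t + u)
    # For 1 degree of freedom, P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, for illustration only.
chi2, p = tdt_statistic(60, 40)
print(f"chi2 = {chi2:.2f}, P = {p:.3f}")
```

With 60 transmissions and 40 nontransmissions the statistic is exactly 4.0, just past the 3.84 threshold for P < .05.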

How could undetected genotyping error lead to inflation of the TDT statistic? If the true genotypes of a father, mother, and child in a trio are A1A1, A1A2, and A1A1, respectively, the father will not be included in the TDT calculation, as he is homozygous for allele A1. However, if his genotype is miscalled as A1A2, the error is Mendelian consistent and he will be included in the analysis, inflating the transmitted count for allele A1 and the untransmitted count for allele A2. If all families are trios, as is common in studies using the TDT, a homozygous parent miscalled as a heterozygote will never be detected through Mendelian checking. Genotype errors in children are easier to detect than errors in parents (Douglas et al. 2002), but complete detection is by no means possible.
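The bookkeeping in this example can be made concrete. The sketch below uses a hypothetical helper, `het_parent_counts`, with genotypes written as pairs of allele labels whose order is ignored; it assumes the trio is Mendelian consistent. Every copy of the tested allele in the child that is not accounted for by a homozygous parent must have come from a heterozygous one.

```python
def het_parent_counts(father, mother, child, allele):
    """(t, u) for `allele` from parents heterozygous for it, in one trio.

    Genotypes are pairs of allele labels, e.g. ("A1", "A2"); allele order is
    ignored. The trio is assumed to be Mendelian consistent.
    """
    parents = (father, mother)
    n_het = sum(1 for g in parents if g.count(allele) == 1)
    n_hom = sum(1 for g in parents if g.count(allele) == 2)
    # Homozygous carriers always transmit `allele`; the remaining copies in
    # the child were transmitted by heterozygous parents.
    t = child.count(allele) - n_hom
    u = n_het - t
    return t, u


true_trio = (("A1", "A1"), ("A1", "A2"), ("A1", "A1"))
miscalled = (("A1", "A2"), ("A1", "A2"), ("A1", "A1"))  # father miscalled as A1A2

for label, trio in [("true", true_trio), ("miscalled", miscalled)]:
    print(label, "A1:", het_parent_counts(*trio, "A1"),
          "A2:", het_parent_counts(*trio, "A2"))
```

The miscalled father adds one transmitted A1 and one untransmitted A2, and the trio still passes a Mendelian check.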

We believe that genotyping error rates are high enough to substantially affect the TDT. Although the overall error rate for a particular data set may be low, the errors may not be distributed evenly across all markers. Because the TDT is performed for each marker individually, the mean error rate across all markers is not the critical statistic. Rather, the variance in error rates between markers must be considered. For example, a global error rate of 1% across 10 markers could be the result of a 1% error rate at each marker or nine markers could be error-free and one marker could have a 10% error rate. We explore this issue and present evidence that, in practice, the latter scenario is more likely. We assert that in any set of markers typed it is possible that there exists a subset of markers with genotyping error rates high enough to produce false-positive TDT results.

Undetected genotyping errors have been shown, via simulation, to lead to inflated TDT false-positive rates for biallelic markers (Gordon et al. 2001). Here, we present a mathematical model for quantifying the effect of the presence of undetected genotyping errors on the false-positive rate of the TDT. The model can be applied to biallelic loci, as well as to highly polymorphic microsatellites, and it allows for the specification of a variety of error models, as described by Sobel et al. (2002). Consistent with the simulations of Gordon et al. (2001), our model predicts several characteristics of the effects of the errors. (1) In the presence of undetected genotyping errors, alleles of unequal frequency appear to be transmitted in a distorted manner; that is, there is an apparent deviation from Mendelian transmission that is not due to linkage to a disease-causing allele. (2) The apparent bias in transmission is in a specific direction: common alleles appear to be overtransmitted to affected children, which seems to increase susceptibility to the disease, and rare or midfrequency alleles appear to be undertransmitted, which seems to be protective. (3) Increasing the number of trios included in the analysis linearly increases the magnitude of the apparent distortion. (4) Only a fraction of genotyping errors are detectable by Mendelian checking, with only 25%–33% detectable in SNPs and 40%–60% detectable in highly polymorphic microsatellites. These characteristics suggest that many reported TDT-derived associations between diseases and marker alleles may be false positives.

Methods

In this section, we present three of many possible genotyping error models and develop a method for estimating the size and direction of the effects of undetected errors on the TDT. This is done by calculating expected transmitted and untransmitted counts of each allele at a given locus from heterozygous parents to affected offspring in the presence of undetected errors. Expected transmitted and untransmitted counts are obtained through several simple manipulations of four matrices. Two of these four matrices delineate the genotyping error model by specifying the conditional probability of observing a particular genotype, given a particular true genotype. This is done for both parents and children. It is in these two matrices that our method provides the flexibility to explore any desired error model. The third matrix specifies the relationship between true parental genotypes and true child genotypes—that is, Mendelian inheritance. The fourth matrix gives the expected frequency of each possible combination of parental genotypes under the assumption of Hardy-Weinberg equilibrium. The four matrices and the error models are described in greater detail below.

Error Models

In our analyses, we maintain distinct allele order within genotypes—that is, genotype A1A2 ≠ genotype A2A1. This simplifies computer generation of matrices, particularly in the multiallelic case, as permutations of genotypes do not need to be considered within a single matrix element.

We consider three simple genotyping error models. The first is a random error model (Gordon et al. 2001; Sobel et al. 2002), under which each allele has an equal probability, e, of being miscalled and the call is equally likely to be any of the other alleles. The second model is a proportional error model (Douglas et al. 2002; Sobel et al. 2002), under which each allele has an equal probability, e, of being miscalled and the call is made in proportion to the frequencies of the other alleles at that marker; that is, allele i is miscalled with probability e and, given that allele i has been miscalled, it will be called as allele j with probability $p_j/(1-p_i)$. The third model is an allelic dropout model, in which the only source of error is miscalling a heterozygote as either homozygote.
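The three error models can be written as per-allele or per-genotype call probabilities. The sketch below indexes alleles 0..s−1 and uses hypothetical function names. For the allelic dropout model the text does not say how the rate e is parameterized, so we assume (our choice) that a heterozygote is miscalled with total probability e, split equally between the two homozygotes.

```python
def miscall_prob(true_allele, called_allele, e, freqs, model="random"):
    """P(allele `true_allele` is called as `called_allele`) for one allele.

    freqs: allele frequencies p_0..p_{s-1}, summing to 1.
    """
    s = len(freqs)
    if called_allele == true_allele:
        return 1.0 - e
    if model == "random":         # wrong call uniform over the other s - 1 alleles
        return e / (s - 1)
    if model == "proportional":   # wrong call j with probability p_j / (1 - p_i)
        return e * freqs[called_allele] / (1.0 - freqs[true_allele])
    raise ValueError(f"unknown model: {model}")


def dropout_prob(true_geno, called_geno, e):
    """Allelic dropout (assumed parameterization): only heterozygotes are miscalled."""
    a, b = true_geno
    if a == b:                    # homozygotes are always called correctly
        return 1.0 if called_geno == true_geno else 0.0
    if called_geno == true_geno:
        return 1.0 - e
    if called_geno in ((a, a), (b, b)):
        return e / 2
    return 0.0


freqs = [0.6, 0.3, 0.1]
for model in ("random", "proportional"):
    row = [miscall_prob(0, j, 0.02, freqs, model) for j in range(3)]
    print(model, [round(x, 4) for x in row])
```

For a biallelic marker the random and proportional models coincide, because $p_j/(1-p_i) = 1$ when only one other allele exists.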

Matrix Descriptions

M, the “true-parent/true-child” or Mendelian inheritance matrix, is of size $s^4 \times s^2$, where s is the number of alleles at the marker under observation. The rows in M represent all possible parental genotypes ($A_iA_j \times A_kA_l$; $1 \le i,j,k,l \le s$), and the columns in M represent all possible child genotypes ($A_mA_n$; $1 \le m,n \le s$). Element $M_{xy}$ contains the conditional probability that a child produced by parents with the true mating type in row x has the true genotype specified in column y. For example, row A1A2 × A2A2 contains the elements 0, 1/2, 0, 1/2, corresponding to the probability of parents A1A2 and A2A2 having a child with genotype A1A1, A1A2, A2A1, or A2A2, respectively.

C, the “true-child/observed-child” genotype matrix, is of size $s^2 \times s^2$. The rows of the matrix represent all possible true child genotypes, and the columns represent all possible observed child genotypes. Element $C_{xy}$ gives the conditional probability of observing a child's genotype as that of column y, given that the child's true genotype is that of row x. Under the random error model, elements of C for a biallelic marker take the form $e^N(1-e)^{2-N}$, where N is the number of alleles that have been miscalled; with s alleles, each miscalled allele contributes a factor $e/(s-1)$ rather than e, because the wrong call is spread evenly over the other s − 1 alleles.

P, the “true-parent/observed-parent” matrix, is of size $s^4 \times s^4$. The rows in P represent all possible true parental genotypes, and the columns in P represent all possible observed parental genotypes. Each element in the matrix, $P_{xy}$, is the conditional probability of observing the parental genotypes in column y, given the true parental genotypes in row x. Under the random error model, elements of P for a biallelic marker take the form $e^N(1-e)^{4-N}$. For example, the element of P that corresponds to row A1A1 × A1A2 and column A1A1 × A2A2 is $e(1-e)^3$, as three alleles have been called correctly and one has been miscalled. The error parameters in P and C can be tailored to describe any desired error model. This allows the specification of a model that is specific to the genotyping technology employed and the type of marker utilized (Sobel et al. 2002).

Let H, the Hardy-Weinberg proportion matrix, be a diagonal $s^4 \times s^4$ matrix. Each diagonal element, $H_{xx}$, is the expected population frequency of the true parental genotypes represented by row x, under Hardy-Weinberg equilibrium. For example, with a biallelic marker, the nonzero element in row A1A1 × A1A2 would be $p^3(1-p)$, where p is the frequency of the A1 allele. Because genotypes A1A2 and A2A1 are considered as separate entities, permutations do not need to be included within a single matrix element. Thus, the frequency of A1A2 in this example is $p(1-p)$, rather than $2p(1-p)$. Note that deviations from Hardy-Weinberg equilibrium due to inbreeding or other causes can be incorporated into H.

Calculations

Taking the product of P, the true-parent/observed-parent genotype matrix, with H, the Hardy-Weinberg proportion matrix, produces an $s^4 \times s^4$ matrix, F, containing elements giving the frequency of each possible true-parent/observed-parent genotype pair under the assumption of Hardy-Weinberg equilibrium:

$$F = HP$$

Transposing F and taking the product with M, the Mendelian inheritance matrix relating true parent and true child genotypes, yields an $s^4 \times s^2$ matrix, T, containing the expected frequency of each possible observed-parent/true-child genotype triple:

$$T = F^{\mathsf{T}} M$$

Taking the product of T, the observed-parent/true-child genotype matrix, with C, the true-child/observed-child genotype matrix, yields D, an $s^4 \times s^2$ matrix containing the expected frequencies of all possible observed-parent/observed-child triples. In other words, D contains the data that would be expected experimentally under the error model specified in P and C, with population allele frequencies specified in H:

$$D = TC$$

Matrix D has the same layout as M, except that the rows in D represent observed parental genotype pairs and the columns represent observed child genotypes, rather than the true parent and true child genotypes found in M. Since D contains the expected frequencies of all possible observed parent-child trio genotypes, it can be used to compute a predicted bias in the TDT statistic under any genotyping error model and to estimate the fraction of all genotyping errors that would be detectable by checking for Mendelian inconsistencies in trios. The cells of D that correspond to zero cells of M hold the frequencies of Mendelian-detectable errors, as they represent parent-child genotype trios that are inconsistent with the rules of Mendelian inheritance. The sum of these elements of D is the expected frequency of Mendelian-detectable genotyping errors, E.

To determine the expected fraction of errors that would be detectable through Mendelian checking, the total expected frequency of trios with genotyping errors can be calculated and compared to E, the expected frequency of Mendelian-detectable errors. Under the random error model, the probability that a trio contains no genotyping errors is $(1-e)^6$. The probability that a trio contains at least one error is then $1-(1-e)^6$. Thus, the expected fraction of errors that could be detected through Mendelian checking is:

$$\frac{E}{1-(1-e)^6}$$
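The matrix pipeline above and the detectable fraction can be sketched in plain Python for small s. This is an illustration under stated assumptions, not the authors' code. It uses the random error model (each specific wrong call has probability e/(s − 1)) and stores H as its diagonal. Child genotypes list the paternal allele first, and the function names are hypothetical. Because genotypes are stored in ordered form, the sketch decides Mendelian consistency on unordered alleles (our assumption), so that an observed child such as (A2, A1) is not flagged merely for its allele order. Dense lists of size $s^4 \times s^4$ make it practical only for a few alleles.

```python
import itertools


def matmul(A, B):
    """Dense matrix product of two lists of lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def build_matrices(freqs, e):
    """M, C, P and diag(H) for the random error model, with ordered genotypes."""
    s = len(freqs)
    genos = list(itertools.product(range(s), repeat=2))  # s^2 genotypes (a, b)
    pairs = list(itertools.product(genos, repeat=2))     # s^4 (father, mother) pairs

    def allele_err(true, obs):   # one allele under the random error model
        return 1.0 - e if true == obs else e / (s - 1)

    def geno_err(true, obs):     # both alleles of one genotype, errors independent
        return allele_err(true[0], obs[0]) * allele_err(true[1], obs[1])

    # M (s^4 x s^2): P(true child | true parents), paternal allele listed first.
    M = [[f.count(c[0]) / 2 * m.count(c[1]) / 2 for c in genos] for f, m in pairs]
    # C (s^2 x s^2): P(observed child | true child).
    C = [[geno_err(g, h) for h in genos] for g in genos]
    # P (s^4 x s^4): P(observed parents | true parents).
    P = [[geno_err(f, of) * geno_err(m, om) for of, om in pairs] for f, m in pairs]
    # Diagonal of H: Hardy-Weinberg frequency of each true parental pair.
    H = [freqs[f[0]] * freqs[f[1]] * freqs[m[0]] * freqs[m[1]] for f, m in pairs]
    return M, C, P, H, genos, pairs


def observed_trios(freqs, e):
    """D = T C with T = F^T M and F = H P."""
    M, C, P, H, genos, pairs = build_matrices(freqs, e)
    F = [[H[x] * p_xy for p_xy in P[x]] for x in range(len(pairs))]  # H is diagonal
    F_T = [list(col) for col in zip(*F)]
    T = matmul(F_T, M)   # observed parents x true child
    D = matmul(T, C)     # observed parents x observed child
    return D, genos, pairs


def consistent(father, mother, child):
    """Unordered Mendelian check: one child allele from each parent."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def detectable_fraction(freqs, e):
    """E / (1 - (1 - e)^6): share of erroneous trios that fail the Mendelian check."""
    D, genos, pairs = observed_trios(freqs, e)
    E = sum(D[x][y]
            for x, (f, m) in enumerate(pairs)
            for y, c in enumerate(genos)
            if not consistent(f, m, c))
    return E / (1.0 - (1.0 - e) ** 6)


print(detectable_fraction([0.5, 0.5], 0.001))
print(detectable_fraction([0.9, 0.1], 0.001))
```

For a biallelic marker and a small error rate, the two calls should return values close to 0.25 and 0.30, inside the 25%–33% range quoted for SNPs.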

The TDT Statistic

Since each element in D gives the probability of observing a particular parent-child genotype combination, D can be multiplied by n, the total number of trios in the data set, to give expected counts for each type of trio. It is then possible to determine the number of trios that would contribute to the TDT statistic and their relative contributions. All Mendelian-inconsistent genotypes typically are removed from a data set prior to any analysis. Under this protocol, the trios represented in the calculation of E (trios with Mendelian errors) would be removed and only the remaining Mendelian-consistent trios would be considered in the calculation of the TDT statistic. Depending upon the genotype of the child, trios with one parent heterozygous for allele A1 would each contribute one count to the transmitted or untransmitted tally of A1, while trios with two parents heterozygous for A1 would contribute either two transmitted counts, two untransmitted counts or one of each. Using these expected transmitted and untransmitted counts for A1, it is possible to compute the expected bias in the TDT statistic under the error model employed and subsequently to calculate the false-positive rate under the specified model.
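The counting step can be sketched as follows. This self-contained sketch is ours, not the authors' code, and its function names are hypothetical. It rebuilds the cells of D for the random error model as explicit sums, which are equivalent to the matrix products F, T, and D defined above. It discards Mendelian-inconsistent trios using an unordered check (our assumption) and credits each remaining trio's transmissions from heterozygous parents to the tested allele.

```python
import itertools


def trio_frequencies(freqs, e):
    """{(obs_father, obs_mother, obs_child): frequency}, i.e. the cells of D.

    Random error model; genotypes are ordered pairs, paternal allele first in the child.
    """
    s = len(freqs)
    genos = list(itertools.product(range(s), repeat=2))

    def allele_err(true, obs):
        return 1 - e if true == obs else e / (s - 1)

    def err(true, obs):  # P(observed genotype | true genotype)
        return allele_err(true[0], obs[0]) * allele_err(true[1], obs[1])

    D = {}
    for f, m in itertools.product(genos, repeat=2):          # true parents (H)
        hw = freqs[f[0]] * freqs[f[1]] * freqs[m[0]] * freqs[m[1]]
        for c in genos:                                       # true child (M)
            mendel = f.count(c[0]) / 2 * m.count(c[1]) / 2
            if mendel == 0:
                continue
            for of, om, oc in itertools.product(genos, repeat=3):  # errors (P, C)
                w = hw * mendel * err(f, of) * err(m, om) * err(c, oc)
                D[of, om, oc] = D.get((of, om, oc), 0.0) + w
    return D


def consistent(f, m, c):
    """Unordered Mendelian check: one child allele from each parent."""
    return (c[0] in f and c[1] in m) or (c[1] in f and c[0] in m)


def expected_tdt_counts(freqs, e, n, allele=0):
    """Expected (t, u) for `allele` in n trios after dropping inconsistent trios."""
    t = u = 0.0
    for (f, m, c), freq in trio_frequencies(freqs, e).items():
        if not consistent(f, m, c):
            continue
        n_het = sum(1 for g in (f, m) if g.count(allele) == 1)
        n_hom = sum(1 for g in (f, m) if g.count(allele) == 2)
        t_trio = c.count(allele) - n_hom   # copies transmitted by heterozygous parents
        t += n * freq * t_trio
        u += n * freq * (n_het - t_trio)
    return t, u


print(expected_tdt_counts([0.8, 0.2], 0.00, 1000))
print(expected_tdt_counts([0.8, 0.2], 0.02, 1000))
```

With no errors the expected counts are equal. With these illustrative settings the common allele gains more transmitted than untransmitted counts, and the counts scale linearly with n.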

Let h be the number of observed heterozygous A1A2 parents included in the computation of the TDT statistic, and let t and u be the observed transmitted and untransmitted counts, respectively, of the A1 allele from these heterozygous parents to affected children, such that $t+u=h$. Under the null hypothesis ($H_0$), there is assumed to be no linkage, no LD, no genotyping error, and, consequently, no bias in transmission. Thus, under $H_0$, the probability of transmission of the A1 allele, $p_0$, is equal to 0.5. If $t \sim \mathrm{Binomial}(h, p_0)$, its expected value is $hp_0 = h/2$, with variance $hp_0(1-p_0) = h/4$. Then,

$$\frac{(t-h/2)^2}{h/4} = \frac{(t-u)^2}{t+u},$$

the familiar McNemar χ² test statistic employed in the TDT.

Under the alternative hypothesis ($H_A$), the assumptions of no linkage and no LD still hold, but genotyping error is allowed. The probability of transmission of the A1 allele under $H_A$ is

$$p_A = \frac{1}{2} + \varepsilon,$$

where $\varepsilon$ is the deviation of $p_A$ from 0.5, or the bias in transmission of allele A1. This leads to $t \sim \mathrm{Binomial}(h, p_A)$, with expected value $E\{t\} = hp_A = h\left(\tfrac{1}{2}+\varepsilon\right)$ and variance $\operatorname{Var}\{t\} = hp_A(1-p_A) = h\left(\tfrac{1}{4}-\varepsilon^2\right)$.

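To obtain the distribution of the TDT statistic under the alternative hypothesis, consider the transformation

$$Z = \frac{t - h/2}{\sqrt{h/4}} = \frac{2t-h}{\sqrt{h}}.$$

Under $H_A$,

$$E\{Z\} = \frac{h\left(\frac{1}{2}+\varepsilon\right) - \frac{h}{2}}{\sqrt{h/4}} = 2\varepsilon\sqrt{h}$$

and

$$\operatorname{Var}\{Z\} = \frac{h\left(\frac{1}{4}-\varepsilon^2\right)}{h/4} = 1 - 4\varepsilon^2.$$

So, approximately, $Z \sim N\left(2\varepsilon\sqrt{h},\, 1\right)$ when $\varepsilon$ is small. Therefore, $Z^2 \sim$ noncentral $\chi^2_1$ with noncentrality parameter $\lambda = [E\{Z\}]^2 = 4\varepsilon^2 h$. The McNemar χ² test statistic, derived above, is equivalent to $Z^2$, as shown below:

$$Z^2 = \frac{(2t-h)^2}{h} = \frac{(t-u)^2}{t+u}.$$

Therefore, under $H_A$, the TDT statistic follows a noncentral $\chi^2_1$ distribution with noncentrality parameter $\lambda = 4\varepsilon^2 h$.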

Note that if $\varepsilon$ is taken to equal its observed value, $t/(t+u) - 1/2$, λ can be expressed in a more intuitive manner. Substituting $t+u$ for h and $t/(t+u) - 1/2$ for $\varepsilon$ in $\lambda = 4\varepsilon^2 h$ yields

$$\lambda = 4\left(\frac{t}{t+u} - \frac{1}{2}\right)^2 (t+u) = \frac{(t-u)^2}{t+u},$$

which is the TDT test statistic computed from the observed values of t and u. When $t = u$, $\lambda = 0$; when $t \neq u$, λ represents the magnitude of the bias introduced into the TDT statistic by the presence of undetected genotyping errors.

We used D to determine expected transmitted and untransmitted counts for varying allele frequencies, numbers of alleles, sample sizes, and error rates. Using these expected counts as t and u, we calculated λ, the noncentrality parameter, and then used a noncentral $\chi^2_1$ distribution to compute the false-positive rates of the TDT statistic under the same parameters. This is analogous to the most common application of the noncentral χ² distribution, as we have used it to compute the power to detect a shift from $\lambda = 0$ (under $H_0$) to a nonzero λ (under $H_A$). For multiallelic markers, each allele was taken individually against all other alleles grouped together.
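These two steps, computing λ from expected counts and converting it to a false-positive rate, need only the normal distribution, because a noncentral χ² variable with one degree of freedom is the square of a unit-variance normal variable with mean √λ. The sketch below uses hypothetical function names and the usual 5% critical value, 1.95996² ≈ 3.841.

```python
import math

# 5% critical value of a central chi-square with 1 df (1.959964^2, about 3.8415).
CRIT_05 = 1.959963984540054 ** 2


def normal_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2))


def noncentrality(t, u):
    """lambda = 4 * eps^2 * h, with h = t + u and eps = t / h - 1/2."""
    h = t + u
    eps = t / h - 0.5
    return 4 * eps ** 2 * h


def false_positive_rate(lam, crit=CRIT_05):
    """P(X > crit) for X ~ noncentral chi-square(1 df, lam).

    X = Z^2 with Z ~ N(sqrt(lam), 1), so the tail splits into two normal tails.
    """
    c, mu = math.sqrt(crit), math.sqrt(lam)
    return normal_cdf(mu - c) + normal_cdf(-mu - c)


for t, u in [(100, 100), (110, 90), (120, 80)]:   # hypothetical expected counts
    lam = noncentrality(t, u)
    print(t, u, round(lam, 2), round(false_positive_rate(lam), 3))
```

For a fixed bias ε, λ grows linearly with the number of informative parents, so larger samples push the false-positive rate further above the nominal 5%.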

Literature Searches

Using PubMed searches for “TDT + association,” “TDT + susceptibility,” “transmission/disequilibrium,” and “transmission disequilibrium,” we compiled a list of studies that used the basic TDT to demonstrate significant (P<.05) association between a marker allele and a disease. Studies that considered haplotypes rather than single markers and studies that applied a global TDT (i.e., those that considered all alleles simultaneously, rather than each allele individually against all others as a group) were excluded. The collection was then narrowed to include only studies available online or in the Johns Hopkins University School of Medicine library. Finally, only studies that provided allele frequencies for the marker in question, or for which marker allele frequencies could be located for an ethnically matched control population, were analyzed. When allele frequencies were available for both the study population and a control population, the control frequencies were used. Case-control studies were compiled by searching PubMed for “association + case + control + genetic” and “microsatellite + case + control.” Only analyses of microsatellite or SNP marker alleles were compiled. Articles not available online were excluded. A complete list of all TDT and case-control studies can be found in appendix A (online only).

Estimated genotyping error rates for collected TDT studies were obtained by using the sample size and significance level to compute the lowest value of λ that would be required to render the reported TDT statistic nonsignificant. Our random error model was then used to determine the magnitude of genotyping error that would generate the calculated λ value for the number of trios included in the analysis.

Results

Impact of Allele Frequency on Transmission Distortion

Table 1 shows λ, the magnitude of the expected bias in the TDT statistic, and α, the expected false-positive rate under the random error model for varying genotyping error rates, numbers of trios and allele frequencies for a biallelic marker. Results are identical for the proportional error model. Table 2 gives α for an eight-allele marker, with one common allele, one midfrequency allele, and six equally frequent rare alleles under the random error model and the proportional error model. For both types of markers and both error models, we found an increasing false-positive rate with increasing genotyping error rate, sample size, and difference in frequency between alleles at the marker. In all cases, the most common allele appears to be overtransmitted. Under the random error model, the rarest alleles at a multiallelic marker appear to be undertransmitted, whereas alleles of midfrequency are unaffected. Under the proportional error model, midfrequency alleles sometimes appear to be undertransmitted, and rare alleles are almost always unaffected.

Table 1
Expected Transmitted and Untransmitted Counts for Major Allele at a Biallelic Marker and Expected TDT False-Positive Rate for P<.05
Table 2
Expected TDT False-Positive Rates for P<.05 at an Eight-Allele Locus, Testing One Allele Against All Others as a Group

Under the allelic dropout model, common alleles appear to be overtransmitted and rare alleles appear to be undertransmitted, but error rates must be approximately twice as high as with the random error model to see the same magnitude of apparent distortion (data not shown).

Table 3 gives the total number of expected errors and the expected rate of error detection in trios for varying allele frequencies and error rates under the random error model for multiallelic markers. The detectable error rate ranged from 25% to 33% for SNPs (data not shown), and from 40% to 60% for multiallelic markers, with increasing rates of detection with increasing major allele frequency and decreasing error rates.

Table 3
Expected Fraction of Mendelian-Detectable Errors at an Eight-Allele Marker with One High-Frequency Allele, One Midfrequency Allele, and Six Equally Frequent Rare Alleles under the Random Error Model

TDT in the Literature

PubMed searches yielded 189 studies that used the TDT to demonstrate significant LD between a single marker allele and a disease. Significance was assessed at the 0.05 level, without adjustment for multiple testing. Of these 189 studies, 23 used a likelihood-based or global TDT, 24 did not provide allele frequencies, 6 analyzed a quantitative trait, 32 were not available online or in the Johns Hopkins University medical library, and 5 were not in English. In the remaining 99 studies, 79 microsatellite, 18 variable number of tandem repeat (VNTR), 8 human leukocyte antigen (HLA), and 29 SNP alleles, as well as 4 microdeletions, were found to exhibit significantly biased transmission to affected offspring. Table A (online only) provides allele frequencies, transmitted and untransmitted counts, and TDT scores for the SNPs and multiallelic markers in the 99 studies.

Table A
Frequency of Distorted Allele and TDT Score for Multiallelic Markers and SNPs, Respectively

Figure 1 shows a histogram of frequencies of overtransmitted microsatellite alleles in controls from TDT studies in the literature, as compared to frequencies of all microsatellite alleles in the Marshfield Clinic Human Diversity Panel. The difference between the distributions is highly significant (P<1.0×10⁻¹¹, Kolmogorov-Smirnov test).

Figure 1
Distribution of frequencies of overtransmitted microsatellite alleles among controls in TDT studies from the literature (striped bars) and of all microsatellite alleles in the Marshfield Clinic Human Diversity Panel (solid bars). The difference between ...

TDT versus Case-Control

Of the 79 identified transmission distortions of microsatellite alleles, 31 involved the most common allele at the locus. Of these 31 transmission distortions of the most common allele, 27 (87%) found it to be overtransmitted to affected offspring, implying that the most common allele may be in LD with a disease-causing mutation. Of the 29 identified associations between SNP alleles and disease, 17 (59%) were between the most common allele and susceptibility to the disease. To assess the significance of the tendency of the TDT to identify the most common allele as associated with susceptibility to disease, rather than protection from disease, we compared these results to those of case-control analyses. Of 92 significant case-control associations involving microsatellite alleles, 37 implicated the most common allele as conferring either protection or risk. Of these, 12 (32%) found the most common allele to be enriched in cases relative to controls (a risk allele). Of 182 case-control studies involving SNPs, 57 (31%) found the common allele to be enriched in cases relative to controls. The difference between the TDT and case-control results, for both microsatellites and SNPs, is highly significant (P=3.6×10⁻⁶, Fisher’s exact test). These results are summarized in table 4 and figure 2.
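The comparison in this paragraph is a test on a 2 × 2 contingency table. As a hedged illustration, the sketch below implements a two-sided Fisher's exact test with the standard library (the function name is ours). It applies the test to the microsatellite counts only: 27 of 31 TDT results versus 12 of 37 case-control results naming the most common allele as a risk factor. It is not intended to reproduce the reported P value, whose exact input table is not given in this section.

```python
import math


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # P(top-left cell = x | fixed margins)
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    p_obs = prob(a)
    support = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
    # Small relative tolerance so tables tied with the observed one are included.
    return sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7))


# Rows: TDT, case-control; columns: most common allele as risk, as protective.
p = fisher_exact_two_sided(27, 4, 12, 25)
print(f"P = {p:.2e}")
```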

Figure 2
Breakdown of compiled microsatellite association studies by study type (TDT or case-control), allele implicated (most common or not), and nature of allele (risk or protective).
Table 4
TDT versus Case-Control Studies in the Literature

Error Rate Estimation

In addition to the 31 identified distortions of the most common allele at a microsatellite marker, we found 9 distortions of the most common allele at an HLA marker or VNTR, for a total of 40 associations between disease status and the most common allele at a multiallelic marker. For 29 of these 40 associations, frequencies were available or could be approximated for the remaining alleles at the marker. Using our random error model, we estimated the genotyping error rates that would be required to render these 29 results nonsignificant. We found that genotyping error, as we have modeled it, could not explain the significance of 10 of the 29 results. All 10 of these studies contained fewer than 200 trios. Of the results that could be explained by a random genotyping error model, four would require an allelewise error rate of 10% or higher, eight would require 4%–8%, and the remaining seven significant results could be explained by genotyping error rates of 3% or less. On average, an allelewise genotyping error rate of 5.5% would create a noncentrality parameter large enough to bring the true P values above .05. Error rate estimation was also performed on the 17 associations between the most common SNP allele and disease susceptibility. Genotyping error, as we have modeled it here, could not account for four of the associations, and the remainder could be explained by an average allelewise error rate of 3.2%. These results are presented in table B (online only).

Table B
Allelewise Error Rates Required to Account for Significant TDT Results, Using the Random Error Model, for Multiallelic Loci and SNPs, Respectively

To understand the magnitude of error rates in practice, we compiled a list of microsatellite genotyping error rate estimates from the literature. The results are summarized in table 5.

Table 5
Estimates of Microsatellite Genotyping Error Rates

Discussion

The apparent overtransmission of common alleles observed in the literature could have several explanations, each accounting for a fraction of the total trend. First, the associations could be real and the distorted allele could be the disease-causing mutation, lending support to the common disease-common allele hypothesis. From a population genetics standpoint, however, this seems unlikely, as there will be selection against any allele that reduces reproductive fitness by even a small amount. For a disease-causing allele to reach a frequency of a few percent in the population, it would also have to confer some selective advantage. For a disease-causing allele to become the most common allele in a population would be highly unusual, and for this to happen as the rule rather than the exception is implausible. If the diseases in the collected studies were uniformly late-onset diseases, it might be easier to imagine that there was historically less selective pressure against them. However, the collected studies cover a wide range of diseases, 55% of which occur primarily in childhood, making this scenario unlikely to explain many of the compiled associations. Alternatively, the distorted marker allele may be in LD with the disease-predisposing allele. In this case, the probability that the disease mutation arose on the background of any given marker allele may be proportional to the frequency of that allele, producing an association between the frequency of an allele and its over- or undertransmitted status. Finally, genotyping error rates may be high enough to contribute significantly to the false-positive rate of the TDT and, consequently, to strengthen the association between allele frequency and over- or undertransmission. Table 4 and figures 1 and 2 can help distinguish between the latter two possibilities.

As figure 1 shows, the frequency distribution of overtransmitted microsatellite alleles among controls in TDT studies is significantly different from the frequency distribution of all microsatellite alleles in the Marshfield Clinic Human Diversity Panel (P<1.0×10⁻¹¹). If the tendency of the most common microsatellite allele to appear associated with increased risk of disease were due entirely to LD between the marker allele and the disease-predisposing allele, we might expect the two frequency distributions in figure 1 to be more similar. The marked difference between them implies that the TDT finds common alleles to be disproportionately overtransmitted.

We also found significant differences between the TDT and case-control studies, as depicted in table 4 and figure 2. Presumably, there is no difference in biology between associations that could be detected using the TDT and those that could be detected using case-control methods: the processes of mutation, recombination, natural selection, and genetic drift that shape patterns of LD should affect the TDT and case-control methods equally. The expectation, therefore, is that the proportion of associations in which the most common allele at a marker appears as a risk factor would be similar for both tests. However, as table 4 shows, this is not what is observed. Given that the most common allele at a microsatellite marker has been associated with disease status, the TDT finds it to be a risk factor 87% of the time, whereas case-control analysis finds it to be a risk factor 32% of the time. The trend for SNPs is similar, with the most common allele implicated as a risk factor 59% of the time with the TDT and 31% of the time with case-control analysis. The difference between the TDT and case-control results is highly significant (P=3.6×10⁻⁶, Fisher's exact test). It is not clear what effect differences in power to detect rare versus common alleles may have on a comparison between TDT and case-control results; further exploration of this issue is warranted. However, as figure 2 shows, there appears to be no difference in the ascertainment rate of the most common allele, as 39% and 40% of significant microsatellite TDT and case-control results, respectively, involve the most common allele. We believe that the tendency of the TDT to identify the most common allele as a risk factor is a direct effect of undetected genotyping errors.

We used our random error model to estimate the magnitude of genotyping error that could account for the apparent overtransmission of common microsatellite alleles found in the literature. We found that approximately one-third of the results could not be due to random undetected genotyping error as we have modeled it (table B). Another third would require error rates in the range of 5%–10% per allele, or 10%–20% per genotype. The final third would require 0.3%–3% error per allele, or 0.6%–6% per genotype.
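The per-allele and per-genotype figures above are related by a simple calculation. Assuming the two allele calls of a genotype err independently, each with probability e, a genotype is wrong whenever at least one allele is, which has probability 1 − (1 − e)² = 2e − e²; doubling the allelewise rate, as in the ranges above, is the first-order approximation. A minimal Python sketch (the function name is illustrative):

```python
def genotype_error_rate(allele_error):
    """Probability that a genotype contains at least one miscalled allele,
    assuming the two allele calls err independently with the same probability."""
    if not 0.0 <= allele_error <= 1.0:
        raise ValueError("allele_error must be a probability")
    return 1.0 - (1.0 - allele_error) ** 2


for e in (0.003, 0.03, 0.05, 0.10):
    print(f"allelewise {e:.1%} -> genotypewise {genotype_error_rate(e):.2%} (2e = {2 * e:.1%})")
```

At these small rates the e² term is negligible, which is why the allelewise and genotypewise ranges differ by a factor of about 2.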

To assess whether these error rates are realistic, we returned to the literature and compiled all available analyses of microsatellite genotyping error rates. As table 5 shows, estimates of global mean error rates range from 0.55% (Hall et al. 1996) to 5.9% (Brzustowicz et al. 1993) per genotype. However, because the TDT is computed on a per-marker basis, one must consider both the mean error rate across all markers and the variance in error rates between markers when estimating true error rates. Several of the studies listed in table 5 address this issue. Brzustowicz et al. (1993) examined four markers on chromosome 5 and found per-marker error rates of 0.7%, 2.7%, 3.4%, and 6.8%. If only the mean error rate across these markers (3.0%) were reported, the high error rate at the last marker would not be evident. Weeks et al. (2002) compared genotype data generated by two high-throughput genotyping centers, the Center for Inherited Disease Research (CIDR) (National Human Genome Research Institute, National Institutes of Health, Bethesda, MD) and the Mammalian Genotyping Service (MGS) (Marshfield Clinic, Marshfield, WI). A total of 321 markers were typed by both centers in 30 samples. Of the 321 markers, 276 showed approximately one-to-one correspondence in allele size between centers (0% discordance). Among the remaining 45 markers, there were 116 discordant pairs of alleles. Interestingly, the discordances were not distributed evenly across the 45 markers: 50 of the 116 occurred at just 3 markers, giving per-marker discordance rates >50% at those markers. Fourteen markers showed discordance rates of 3.3%–17%, and 28 markers showed discordance rates of 3.3%, for a global mean discordance rate of 1.3%. Again, the global mean does not reflect the high variance between markers. Finally, in an analysis of 50 markers on three chromosomes, Sobel et al. (2002) observed a large variance in per-marker error rates: a global mean error rate of 5.6% included one marker with a 29% error rate.

It is evident that genotyping error rates are not uniform across markers; some markers show error rates that are many times higher than those shown by others. The erratic behavior of a subset of markers could be a problem particularly when genotyping is performed by an individual laboratory, rather than a high-throughput center, as the markers included in high-throughput panels have been studied extensively. They have specifically been selected for inclusion because they produce highly replicable genotypes, and their individual PCR reactions have been optimized (CIDR, MGS). In addition, large-scale genotyping centers are highly practiced, calling millions of genotypes per year. These centers report global mean error rates on the order of 0.5% per genotype for blind duplicates run on the same gel. Since few of the studies included in our analyses outsourced their genotyping to such centers, it is not difficult to imagine that they would have mean error rates that are several times higher than those of the large centers, particularly when gel-to-gel and technician-to-technician variation are included in the error rate calculation. More importantly, there is no reason to expect that the studies included in our analyses would have variances in error rates between markers that would be any smaller than those listed in table 5. Therefore, we believe that in practice genotyping error rates are sufficiently high to bias the TDT, inflating the false-positive rate and causing common alleles to appear to be overtransmitted.

We recommend several strategies for coping with the TDT's inflated false-positive rate associated with undetected genotyping error. First, an association identified through the TDT between a common allele and disease susceptibility, or between a rare allele and protection, should be treated with caution. More confidence can be placed in an association in the direction opposite to the predicted bias. For an association in the predicted direction, confidence can be increased by repeat genotyping, analysis of nearby markers, use of a haplotype rather than an individual marker, or, in the case of a SNP, use of a functional assay. Second, we recommend regenotyping all individuals at any marker showing significant transmission distortion. For SNPs, the regenotyping should be carried out using a different technology than was originally employed; if both methods yield the same genotypes, one can be more confident in the association. Third, a likelihood-based TDT method that is robust to undetected genotyping errors (Gordon et al. 2001) should be used in conjunction with the standard TDT. Fourth, a more stringent significance cutoff could be employed, based on the frequency of the allele under study and the sample size. Fifth, investigators should explore the variance in genotyping error rates across the markers in their own data sets. This is particularly important when many markers are considered, as the error rate at an individual marker has a much smaller effect on the mean error rate in a large pool of markers than in a small one. Every effort should be made to ensure that the error rate at each marker is under 0.5% and to identify markers with higher error rates. A significant association identified through any type of analysis at a marker with a higher error rate should raise suspicion.
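The fifth recommendation can be put into practice with a simple per-marker summary. The Python sketch below assumes that each marker has been regenotyped in a subset of samples (for example, blind duplicates) and uses the duplicate discordance rate as a proxy for the error rate; if each run errs independently at a small rate ε, discordance is roughly 2ε, so the proxy errs on the side of flagging. The marker names and counts are hypothetical, and the 0.5% threshold is the per-marker target suggested above.

```python
from dataclasses import dataclass
from statistics import pstdev


@dataclass
class MarkerDuplicates:
    name: str
    compared: int    # genotypes typed twice at this marker
    discordant: int  # duplicate pairs that disagree


def screen_markers(markers, threshold=0.005):
    """Per-marker discordance rates, the pooled (global) rate, the spread
    across markers, and the markers whose rate exceeds the threshold."""
    rates = {m.name: m.discordant / m.compared for m in markers}
    pooled = sum(m.discordant for m in markers) / sum(m.compared for m in markers)
    spread = pstdev(list(rates.values()))
    flagged = sorted(name for name, rate in rates.items() if rate > threshold)
    return rates, pooled, spread, flagged


# Hypothetical data: three well-behaved markers and one problem marker.
data = [
    MarkerDuplicates("marker_1", 400, 1),
    MarkerDuplicates("marker_2", 400, 0),
    MarkerDuplicates("marker_3", 400, 1),
    MarkerDuplicates("marker_4", 400, 20),
]
rates, pooled, spread, flagged = screen_markers(data)
print(f"pooled discordance {pooled:.2%}, SD across markers {spread:.2%}")
print("markers above threshold:", flagged)
```

Here the pooled rate is only about 1.4%, yet a single marker accounts for 20 of the 22 discordances, the same pattern described above for the studies in table 5.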

Our analyses underscore the need to continue to improve data quality through the pursuit of increasingly accurate genotyping technology. Highly accurate genotype data will become increasingly critical as human geneticists attempt genomewide association studies using thousands of SNPs in an effort to identify genes of increasingly smaller effect in the etiology of complex traits.

Acknowledgments

This work was supported by a grant from the National Institutes of Health (5U10HL54466-08). A.A.M. was supported by the Predoctoral Training Program in Human Genetics and Molecular Biology at The Johns Hopkins University School of Medicine (5T32GM07814).

Appendix: Results of Literature Search

Results of TDT Literature Search

Am J Epidemiol 2001 Apr 15;153(8):794–8

Am J Hum Genet 1994 Nov;55(5):932–6

Am J Hum Genet 1993 Mar;52(3):506–16

Am J Hum Genet 1995 Aug;57(2):257–72

Am J Hum Genet 1996 Feb;58(2):363–70

Am J Hum Genet 1996 Sep;59(3):644–52

Am J Hum Genet 1997 Aug;61(2):354–62

Am J Hum Genet 1998 Dec;63(6):1767–76

Am J Hum Genet 1998 May;62(5):1077–83

Am J Hum Genet 1998 Oct;63(4):1139–52

Am J Hum Genet 1999 Apr;64(4):1096–109

Am J Hum Genet 2001 Sep;69(3):544–52

Am J Med Genet 2001 May 8;105(4):375–80

Am J Med Genet 2001 May 8;105(4):381–6

Am J Med Genet 2002 Mar 8;114(2):150–3

Am J Med Genet 1995 Oct 9;60(5):465–7

Am J Med Genet 1997 Dec 19;73(3):337–44

Am J Med Genet 1998 May 8;81(3):228–34

Am J Med Genet 2000 Dec 4;96(6):784–90

Am J Med Genet 2000 Dec 4;96(6):836–8

Am J Med Genet 2000 Jun 12;96(3):273–7

Am J Med Genet 2000 Jun 12;96(3):289–92

Am J Med Genet 2001 Apr 8;105(3):279–82

Am J Med Genet 2001 Dec 8;105(8):783–8

Am J Psychiatry 1999 May;156(5):768–70

Am J Respir Crit Care Med 2000 May;161(5):1655–9

Am J Respir Crit Care Med 2000 Sep;162(3 Pt 1):861–4

Ann Hum Genet 1998 Sep;62(Pt 5):379–96

Ann Hum Genet 2000 Jul;64(Pt 4):307–20

Arthritis Rheum 2001 Jan;44(1):239–40

Arthritis Rheum 2001 Jun;44(6):1261–5

Biochem Biophys Res Commun 1998 Oct 20;251(2):662–5

Biol Psychiatry 1999 May 1;45(9):1178–89

BMC Genet 2001;2(1):6

Br J Dermatol 2002 Apr;146(4):601–8

Cleft Palate Craniofac J 2002 Mar;39(2):149–56

Clin Exp Allergy 1998 Feb;28(2):151–5

Clin Exp Allergy 1998 Apr;28(4):449–53

Clin Exp Allergy 2002 Jan;32(1):93–6

Clin Exp Immunol 1998 Jul;113(1):28–32

Diabetes Care 2001 Jan;24(1):33–8

Diabetes 1997 Oct;46(10):1637–42

Diabetes 2000 Dec;49(12):2190–5

Diabetes 2000 May;49(5):876–8

Genes Immun 2001 Aug;2(5):263–8

Genomics 1998 Jul 15;51(2):177–81

Genomics 1997 Nov 15;46(1):159–62

Hum Genet 1997 Jan;99(1):22–6

Hum Genet 1998 Nov;103(5):540–6

Hum Mol Genet 1996 Jul;5(7):1071–4

Hum Mol Genet 1997 Aug;6(8):1275–82

Hum Mol Genet 1997 Dec;6(13):2233–8

Hum Mol Genet 1997 Jul;6(7):1003–10

Hum Mol Genet 1997 Jul;6(7):1011–6

Hum Mol Genet 2002 May 16;11(11):1281–9

Hypertension 1998 Feb;31(2):627–31

Immunogenetics 2000 Nov;52(1–2):107–11

J Allergy Clin Immunol 1998 Sep;102(3):443–8

J Allergy Clin Immunol 2001 Apr;107(4):654–8

J Am Acad Child Adol Psych 2000 Dec;39(12):1537–42

J Am Soc Nephrol 1999 Oct;10(10):2120–4

J Autoimmun 1996 Feb;9(1):97–103

J Clin Endocrinol Metab 2002 Jan;87(1):404–7

J Clin Invest 1999 Apr;103(8):1135–40

J Gastroenterol Hepatol 2000 Jul;15(7):771–4

J Med Genet 1998 Aug;35(8):632–6

J Neuroimmunol 1999 Aug 3;98(2):208–13

J Neuroimmunol 1999 Jun 1;97(1–2):182–90

J Neuroimmunol 2002 Apr;125(1–2):141–8

J Neurol Sci 1997 Mar 20;147(1):21–5

Kidney Int 2000 Aug;58(2):513–9

Kidney Int 2000 Feb;57(2):405–13

Mol Psychiatry 1999 Mar;4(2):192–6

Mol Psychiatry 2000 Jan;5(1):91–5

Mol Psychiatry 1998 May;3(3):270–3

Mol Psychiatry 2000 Jul;5(4):396–404

Mol Psychiatry 2000 Sep;5(5):531–6

Mol Psychiatry 2000 Sep;5(5):537–41

Mol Psychiatry 2001 Jan;6(1):109–11

Mol Psychiatry 2001 Jul;6(4):425–8

Mol Psychiatry 2001 Jul;6(4):434–9

Mol Psychiatry 2001 Mar;6(2):150–9

Mol Psychiatry 2001 Mar;6(2):243–5

Mol Psychiatry 2002;7(1):72–4

Mol Psychiatry 2002;7(1):82–5

Mol Psychiatry 2002;7(3):278–88

Mol Psychiatry 2002;7(3):302–10

Mol Psychiatry 2002;7(3):311–6

Nat Genet 1995 Jun;10(2):240–2

Nat Genet 1996 Aug;13(4):472–6

Nat Genet 2001 Feb;27(2):218–21

Neurology 2001 Nov 13;57(9):1555–60

Neurology 2002 Feb 26;58(4):658–60

Psychiatr Genet 1996 Fall;6(3):131–3

Psychol Med 1999 Sep;29(5):1249–54

Thorax 2000 Dec;55(12):1023–7

Tissue Antigens 1999 May;53(5):470–5

Tissue Antigens 2000 Jul;56(1):38–44

Tissue Antigens 2000 Oct;56(4):350–5

Results of Case–Control Literature Search, Microsatellites

Am J Hematol 2001 Nov;68(3):164–9

Am J Hum Genet 1998 Aug;63(2):557–68

Am J Hum Genet 1998 Sep;63(3):839–46

Am J Med Genet 1997 May 31;74(3):324–30

Am J Med Genet 1998 Mar 28;81(2):196–205

Am J Med Genet 1999 Apr 16;88(2):140–4

Am J Med Genet 1999 Aug 20;88(4):352–7

Am J Med Genet 2000 Jun 12;96(3):273–7

Am J Med Genet 2001 Mar 8;105(2):152–8

Am J Med Genet 2002 Apr 8;114(3):269–71

Am J Med Genet 2002 Mar 8;114(2):177–85

Am J Respir Cell Mol Biol 2000 Jun;22(6):672–5

Ann Epidemiol 2001 Aug;11(6):434–42

Arch Neurol 1998 Aug;55(8):1122–4

Arthritis Rheum 2000 Aug;43(8):1749–55

Arthritis Rheum 2001 Jun;44(6):1261–5

Biochem Biophys Res Commun 1998 Jun 18;247(2):452–6

Biochem Biophys Res Commun 2000 Jun 7;272(2):391–4

BMJ 2002 Jun 8;324(7350):1369

Br J Dermatol 2000 Mar;142(3):441–5

Chest 1999 Oct;116(4):880–6

Diabet Med 2000 Feb;17(2):111–8

Diabetes 2002 Jul;51(7):2313–6

Diabetes Care 2001 Apr;24(4):753–7

Eur J Epidemiol 2000;16(8):745–50

Eur J Hum Genet 1999 Feb–Mar;7(2):110–6

Genes Immun 1999 Sep;1(1):45–52

Genomics 1995 Aug 10;28(3):566–9

Hum Immunol 2002 Feb;63(2):121–8

Hypertension 1998 Oct;32(4):676–82

Hypertension 1999 Apr;33(4):1052–6

Hypertension 1999 Dec;34(6):1186–92

Hypertension 2000 Jan;35(1 Pt 1):135–43

Int J Cancer 2000 Jul 15;87(2):204–10

Int J Cancer 2001 Jul 20;95(4):271–5

Int J Obes Relat Metab Disord 1997 Nov;21(11):1032–7

J Clin Endocrinol Metab 2002 Jan;87(1):404–7

J Interferon Cytokine Res 1999 Sep;19(9):1037–46

J Mol Med 2001 Apr;79(2–3):109–15

J Neuroimmunol 1998 Jan;81(1–2):158–67

J Neuroimmunol 2000 Aug 1;108(1–2):153–9

J Neuroimmunol 2002 Apr;125(1–2):141–8

J Neurol Neurosurg Psychiatry 2001 Aug;71(2):262–4

J Neurol Sci 1997 Mar 20;147(1):21–5

J Neurol Sci 2002 Aug 15;200(1–2):43–8

Mol Psychiatry 2000 Sep;5(5):523–30

Mol Psychiatry 2001 Mar;6(2):150–9

Mol Psychiatry 2001 May;6(3):268–73

Mol Psychiatry 2001 May;6(3):311–4

Mov Disord 1999 Mar;14(2):219–24

Nephrol Dial Transplant 2000 Nov;15(11):1794–800

Scand J Gastroenterol 2001 Jul;36(7):766–70

Stroke 1999 Dec;30(12):2612–6

Tuber Lung Dis 1998;79(2):83–9

Results of Case-Control Literature Search, SNPs

AIDS 2002 Oct 18;16(15):2013–8

Am J Cardiol 2000 Jan 1;85(1):8–12

Am J Gastroenterol 2001 Jan;96(1):146–9

Am J Hum Genet 1998 Aug;63(2):557–68

Am J Hum Genet 2001 Mar;68(3):674–85

Am J Hum Genet 2002 Mar;70(3):781–6

Am J Hum Genet 2002 Sep;71(3):554–64

Am J Hypertens 2000 Jun;13(6 Pt 1):719–23

Am J Hypertens 2001 Dec;14(12):1201–4

Am J Ind Med 2002 Jul;42(1):29–37

Am J Med 2002 May;112(7):540–4

Am J Med Genet 1999 Oct 15;88(5):458–61

Am J Med Genet 2001 Apr 8;105(3):279–82

Am J Med Genet 2001 Dec 8;105(8):801–4

Am J Med Genet 1999 Sep 17;86(3):232–6

Am J Ophthalmol 2000 Dec;130(6):769–73

Am J Psychiatry 2002 Jan;159(1):23–9

Am J Respir Cell Mol Biol 2001 Sep;25(3):377–84

Am J Respir Crit Care Med 1998 Nov;158(5 Pt 1):1566–70

Am J Respir Crit Care Med 1999 Sep;160(3):1009–14

Am J Respir Crit Care Med 2001 May;163(6):1404–9

Am J Respir Crit Care Med 2002 Mar 1;165(5):690–3

Ann Clin Biochem 2002 Sep;39(Pt 5):478–81

Ann Epidemiol 1997 Jan;7(1):13–21

Ann Rheum Dis 2001 Aug;60(8):791–5

Ann Rheum Dis 2002 Jan;61(1):48–51

Ann Rheum Dis 2002 Mar;61(3):213–8

Ann Surg 2002 Feb;235(2):297–302

Arch Neurol 2000 Nov;57(11):1579–83

Arch Neurol 2001 Feb;58(2):209–13

Arq Neuropsiquiatr 2001 Mar;59(1):11–7

Arterioscler Thromb Vasc Biol 2000 Mar;20(3):892–8

Arthritis Rheum 2001 Mar;44(3):638–46

Arthritis Rheum 2002 Jun;46(6):1629–33

Atherosclerosis 2002 Apr;161(2):463–7

Biochem Biophys Res Commun 1999 Aug 2;261(2):332–9

Biochem Biophys Res Commun 2001 Feb 23;281(2):267–71

Biol Psychiatry 1997 Apr 1;41(7):827–9

Biol Psychiatry 2001 Jul 15;50(2):144–7

Biol Psychiatry 2002 May 1;51(9):762–5

Blood 1999 Jul 1;94(1):46–51

Blood 2001 Jul 1;98(1):36–40

Br J Cancer 2002 Jul 15;87(2):212–7

Br J Haematol 2002 Aug;118(2):477–81

Br J Haematol 2002 Feb;116(2):376–82

Cancer 2002 Mar 1;94(5):1443–8

Cancer Epidemiol Biomarkers Prev 2000 Feb;9(2):147–50

Cancer Epidemiol Biomarkers Prev 2000 Oct;9(10):1037–42

Cancer Epidemiol Biomarkers Prev 2001 Apr;10(4):355–60

Cancer Epidemiol Biomarkers Prev 2001 Mar;10(3):209–16

Cancer Epidemiol Biomarkers Prev 2001 Mar;10(3):217–22

Cancer Epidemiol Biomarkers Prev 2001 Sep;10(9):931–6

Cancer Epidemiol Biomarkers Prev 2002 Aug;11(8):730–8

Cancer Lett 2002 Mar 28;177(2):173–9

Cancer Res 2000 Sep 15;60(18):5017–20

Cancer Res 2001 Jun 1;61(11):4398–404

Cancer Res 2001 Nov 1;61(21):7825–9

Cancer Res 2002 May 15;62(10):2819–23

Cancer Res 2002 Sep 1;62(17):4992–5

Carcinogenesis 1999 Aug;20(8):1465–9

Carcinogenesis 2001 Jun;22(6):913–6

Carcinogenesis 2001 Jun;22(6):923–8

Carcinogenesis 2002 Feb;23(2):257–64

Carcinogenesis 2002 Jul;23(7):1229–34

Circulation 1999 Jun 1;99(21):2717–9

Circulation 2001 Jul 10;104(2):187–90

Clin Endocrinol (Oxf) 2000 May;52(5):575–80

Clin Genet 1999 Dec;56(6):428–33

Crit Care Med 2002 Oct;30(10):2236–41

Diabet Med 2000 Feb;17(2):111–8

Dig Dis Sci 1999 Nov;44(11):2324–9

Eur Heart J 1999 Nov;20(21):1587–91

Eur J Gastroenterol Hepatol 2001 Sep;13(9):1077–81

Eur J Hum Genet 1999 Oct–Nov;7(7):801–6

Eur J Hum Genet 2002 Jan;10(1):82–5

Fertil Steril 1999 Jul;72(1):164–6

Gastroenterology 2001 Jul;121(1):124–30

Genes Immun 1999 Nov;1(2):91–6

Genet Epidemiol 2001 Feb;20(2):271–83

Genomics 2000 Jan 1;63(1):7–12

Gut 2000 Aug;47(2):211–4

Gut 2001 Apr;48(4):461–7

Heart 2001 Jun;85(6):635–8

Hum Biol 2001 Feb;73(1):81–9

Hum Genet 2000 Jun;106(6):639–45

Hum Genet 2002 Jun;110(6):553–60

Hum Immunol 2001 Jul;62(7):714–24

Hum Immunol 2001 Oct;62(10):1153–8

Hum Mol Genet 1996 Jul;5(7):1075–80

Hum Mol Genet 1999 Jun;8(6):1135–40

Hum Mol Genet 2000 Oct 12;9(17):2517–21

Hum Mol Genet 2001 Jun 1;10(12):1265–73

Hum Mol Genet 2002 Jun 1;11(12):1399–407

Hypertension 1997 Sep;30(3 Pt 1):321–5

Hypertension 1999 Dec;34(6):1186–92

Int J Cancer 2001 Jun 1;92(5):683–6

Int J Cancer 2002 Jan 10;97(2):230–6

Int J Cancer 2001 Jul 20;95(4):271–5

Int J Obes Relat Metab Disord 2001 Apr;25(4):462–6

J Am Coll Cardiol 2001 May;37(6):1536–42

J Am Soc Nephrol 1999 Jul;10(7):1530–41

J Am Soc Nephrol 2002 Jul;13(7):1847–54

J Clin Endocrinol Metab 2000 May;85(5):1984–8

J Hum Hypertens 2001 May;15(5):335–9

J Hypertens 2000 Sep;18(9):1197–206

J Hypertens 2001 May;19(5):879–84

J Infect Dis 1999 Jan;179(1):187–91

J Intern Med 1999 Aug;246(2):211–8

J Mol Med 2001 Jun;79(5–6):300–5

J Natl Cancer Inst 2000 Mar 1;92(5):412–7

J Neuroimmunol 2002 Mar;124(1–2):101–5

J Neurol 2002 Jul;249(7):801–4

J Neurol Sci 1999 Jun 15;166(1):47–52

J Neurosurg 1999 May;90(5):935–8

Lancet 1996 Aug 31;348(9027):581–3

Lancet 1997 May 10;349(9062):1353–7

Lancet 2000 Feb 19;355(9204):618–21

Lancet 2001 Feb 10;357(9254):436–9

Lancet 2002 Feb 2;359(9304):397–401

Life Sci 2000 Dec 8;68(3):259–72

Mayo Clin Proc 2002 Jan;77(1):17–22

Mol Pathol 2000 Apr;53(2):88–93

Mol Psychiatry 2001 Jul;6(4):440–4

Mol Psychiatry 2001 Mar;6(2):202–10

Mol Psychiatry 2001 May;6(3):268–73

Mutat Res 2001 Jun;458(1–2):41–7

Nat Genet 2001 Jun;28(2):178–83

Nature 2000 Mar 23;404(6776):398–402

Nature 2002 Jul 25;418(6896):426–30

Neurology 2001 Feb 13;56(3):375–82

Neurology 2001 Oct 9;57(7):1341–2

Neurology 2002 Jul 23;59(2):215–9

Neurology 2002 Sep 10;59(5):724–8

Neurosci Lett 1998 May 8;247(1):33–6

Neurosci Lett 2000 Dec 1;295(1–2):41–4

Neurosci Lett 2002 Aug 16;328(3):325–7

Neurosci Lett 2002 Jul 5;326(3):171–4

Neurosci Lett 2002 Oct 4;331(1):60

Oncogene 2002 Mar 14;21(12):1928–33

Oral Oncol 2001 Oct;37(7):593–8

Prostate 2002 Sep 15;53(1):65–8

Psychopharmacology (Berl) 2000 Nov;152(4):408–13

Schizophr Res 2001 Apr 15;49(1–2):203–12

Spine 2002 Aug 15;27(16):1765–71

Stroke 1999 Sep;30(9):1881–6

Stroke 2000 Apr;31(4):936–9

Stroke 2000 Jul;31(7):1634–9

Thromb Res 2000 Aug 1;99(3):223–30

Tissue Antigens 1999 Jun;53(6):527–33

Electronic-Database Information

Accession numbers and URLs for data presented herein are as follows:

References

Akey JM, Zhang K, Xiong M, Doris P, Jin L (2001) The effect that genotyping errors have on the robustness of common linkage-disequilibrium measures. Am J Hum Genet 68:1447–1456
Brzustowicz LM, Merette C, Xie X, Townsend L, Gilliam TC, Ott J (1993) Molecular and statistical approaches to the detection and correction of errors in genotype databases. Am J Hum Genet 53:1137–1145
Buetow KH (1991) Influence of aberrant observations on high-resolution linkage analysis outcomes. Am J Hum Genet 49:985–994
Clarke LA, Rebelo CS, Goncalves J, Boavida MG, Jordan P (2001) PCR amplification introduces errors into mononucleotide and dinucleotide repeat sequences. Mol Pathol 54:351–353
Cutler DJ, Zwick ME, Carrasquillo MM, Yohn CT, Tobin KP, Kashuk C, Mathews DJ, Shah NA, Eichler EE, Warrington JA, Chakravarti A (2001) High-throughput variation detection and genotyping using microarrays. Genome Res 11:1913–1925
Douglas JA, Boehnke M, Lange K (2000) A multipoint method for detecting genotyping errors and mutations in sibling-pair linkage data. Am J Hum Genet 66:1287–1298
Douglas JA, Skol AD, Boehnke M (2002) Probability of detection of genotyping errors and mutations as inheritance inconsistencies in nuclear-family data. Am J Hum Genet 70:487–495
Ewen KR, Bahlo M, Treloar SA, Levinson DF, Mowry B, Barlow JW, Foote SJ (2000) Identification and analysis of errors in high-throughput genotyping. Am J Hum Genet 67:727–736
Ewens WJ, Spielman RS (1995) The transmission/disequilibrium test: history, subdivision, and admixture. Am J Hum Genet 57:455–464
Ghosh S, Karanjawala ZE, Hauser ER, Ally D, Knapp JI, Rayman JB, Musick A, Tannenbaum J, Te C, Shapiro S, Eldridge W, Musick T, Martin C, Smith JR, Carpten JD, Brownstein MJ, Powell JI, Whiten R, Chines P, Nylund SJ, Magnuson VL, Boehnke M, Collins FS (1997) Methods for precise sizing, automated binning of alleles, and reduction of error rates in large-scale genotyping using fluorescently labeled dinucleotide markers. Genome Res 7:165–178
Gordon D, Heath SC, Ott J (1999) True pedigree errors more frequent than apparent errors for single nucleotide polymorphisms. Hum Hered 49:65–70
Gordon D, Ott J (2001) Assessment and management of single nucleotide polymorphism genotype errors in genetic association analysis. Pac Symp Biocomput 2001:18–29
Gordon D, Heath SC, Liu X, Ott J (2001) A transmission disequilibrium test that allows for genotyping errors in the analysis of single-nucleotide polymorphism data. Am J Hum Genet 69:371–380
Goring HHH, Terwilliger JD (2000) Linkage analysis in the presence of errors II: marker-locus genotyping errors modeled with hypercomplex recombination fractions. Am J Hum Genet 66:1107–1118
Hall JM, LeDuc CA, Watson AR, Roter AH (1996) An approach to high-throughput genotyping. Genome Res 6:781–790
Heath SC (1998) A bias in TDT due to undetected genotyping errors. Am J Hum Genet Suppl 63:A292
Idury RM, Cardon LR (1997) A simple method for automated allele binning in microsatellite markers. Genome Res 7:1104–1109
Lincoln SE, Lander ES (1992) Systematic detection of errors in genetic linkage data. Genomics 14:604–610
Mansfield DC, Brown AF, Green DK, Carothers AD, Morris SW, Evans HJ, Wright AF (1994) Automation of genetic linkage analysis using fluorescent microsatellite markers. Genomics 24:225–233
Risch N, Merikangas K (1996) The future of genetic studies of complex human diseases. Science 273:1516–1517
Shields DC, Collins A, Buetow KH, Morton NE (1991) Error filtration, interference and the human linkage map. Proc Natl Acad Sci USA 88:6501–6505
Smith JR, Carpten JD, Brownstein MJ, Ghosh S, Magnuson VL, Gilbert DA, Trent JM, Collins FS (1995) Approach to genotyping errors caused by nontemplated nucleotide addition by Taq DNA polymerase. Genome Res 5:312–317
Sobel E, Papp JC, Lange K (2002) Detection and integration of genotyping errors in statistical genetics. Am J Hum Genet 70:496–508
Spielman RS, McGinnis RE, Ewens WJ (1993) Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet 52:506–516
Weeks DE, Conley YP, Ferrell RE, Mah TS, Gorin MB (2002) A tale of two genotypes: consistency between two high-throughput genotyping centers. Genome Res 12:430–435

Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics