Eur J Hum Genet. 2010 Sep; 18(9): 1032–1038.
Published online 2010 May 5. doi: 10.1038/ejhg.2010.57
PMCID: PMC2921480
NIHMSID: NIHMS191437
PMID: 20442746

The power of the Transmission Disequilibrium Test in the presence of population stratification

Abstract

The Transmission Disequilibrium Test (TDT) is a family-based test for association based on the rate of transmission of alleles from heterozygous parents to affected offspring, and has gained popularity as this test preserves the Type I error rate. Population stratification results in a decreased number of heterozygous parents compared to that expected assuming Hardy–Weinberg Equilibrium (Wahlund Effect). We show that population stratification changes the relative proportion of the informative mating types. The decrease in the number of heterozygous parents and the change in the relative proportion of the informative mating types result in significant changes to the sample sizes required to achieve the power desired. We show examples of the changes in sample sizes, and provide an easy method for estimating TDT sample sizes in the presence of population stratification. This method potentially aids in reducing the number of false-negative association studies.

Keywords: Transmission Disequilibrium Test, population stratification, power

Introduction

Case–control analysis using singletons has been shown to be generally more powerful than any other design per genotyped individual for detecting genes associated with disease.1 However, this design is vulnerable to excessive false-positive findings in the presence of population stratification.2, 3, 4, 5, 6, 7 If a study population comprises two subpopulations, and the disease prevalence is greater in one subpopulation than in the other, then cases will typically be over-sampled from the subpopulation with higher disease prevalence. Spurious associations will be observed between the disease and any genetic marker whose allele frequencies differ between the two subpopulations.

The Transmission Disequilibrium Test (TDT) proposed by Spielman et al8 compares the rate of transmission of each allele from a heterozygous parent to an affected offspring. The TDT maintains the desired Type I error rate in the presence of population stratification, as non-transmitted parental alleles from heterozygous parents serve, in effect, as the control population. Therefore, the power of the TDT is heavily dependent on the number of heterozygotes (informative parents), who may or may not transmit the allele of interest. It is well appreciated that population stratification results in a loss of heterozygosity compared to that expected assuming Hardy–Weinberg Equilibrium (HWE). This is known as the Wahlund Effect, which results in fewer informative parents for the TDT. However, no guidelines currently exist for sample size calculations for the TDT in the presence of population stratification.

In this paper, we lay out a method of estimating the two parent genotype patterns (mating types) seen in the presence of population stratification using the parental allele frequencies and Wright's coefficient of inbreeding F. The estimates of the mating types are then used for power calculations using the method provided by Knapp.9 We then compare the sample sizes assumed using HWE to those calculated using our new method and show how these differences arise.

Methods

Notation and terminology

Population stratification occurs when a population comprises two or more subpopulations, where there is random mating and HWE within subpopulations, but no mating between subpopulations. Assume that there are G separate subpopulations, where G, as well as the actual members of each subpopulation, is unknown, and let $w_i$ be the proportion of the stratified population represented by subpopulation i. Consider a single biallelic marker or single nucleotide polymorphism (SNP) with two alleles, A and B. If $p_i$ is the frequency of allele A in subpopulation i, then $\bar{p} = \sum_{i=1}^{G} w_i p_i$ is the allele A frequency (averaged over all subpopulations) in the overall population. Let $q_i = 1 - p_i$ be the frequency of the B allele in subpopulation i, so that the average frequency of allele B in the overall population is $\bar{q} = 1 - \bar{p}$. The variance of the A allele frequency between subpopulations is defined as

$$\sigma^2 = \sum_{i=1}^{G} w_i (p_i - \bar{p})^2.$$

As we assume random mating and HWE within subpopulations, the frequencies of the AA, AB and BB genotypes in subpopulation i are $p_i^2$, $2p_iq_i$ and $q_i^2$, respectively. Let the proportion of individuals in the overall stratified population with genotypes AA, AB and BB be μAA, μAB and μBB, respectively, so that

$$\mu_{AA} = \sum_{i=1}^{G} w_i p_i^2, \qquad \mu_{AB} = \sum_{i=1}^{G} 2 w_i p_i q_i, \qquad \mu_{BB} = \sum_{i=1}^{G} w_i q_i^2.$$

The mating type is defined as the two-parent genotype combination. We assume symmetry between the mating types (ie AA × AB=AB × AA, etc.), so that instead of nine unique mating types, we have only six mating types. Let the frequencies of the mating types AA × AA, AA × AB, AA × BB, AB × AB, AB × BB and BB × BB be

$$m_1 = \sum_{i} w_i p_i^4, \quad m_2 = \sum_{i} 4 w_i p_i^3 q_i, \quad m_3 = \sum_{i} 2 w_i p_i^2 q_i^2, \quad m_4 = \sum_{i} 4 w_i p_i^2 q_i^2, \quad m_5 = \sum_{i} 4 w_i p_i q_i^3, \quad m_6 = \sum_{i} w_i q_i^4.$$
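As a hedged illustration of this notation, the following Python sketch computes p̄, the between-subpopulation variance, the overall genotype proportions and the six mating type frequencies by direct summation over subpopulations. The function names, the example subpopulations and the ordering of the mating types (AA × AA, AA × AB, AA × BB, AB × AB, AB × BB, BB × BB, chosen so that the second, fourth and fifth entries are the informative types) are assumptions made for illustration.

```python
def population_summary(w, p):
    """Return (p_bar, variance, (mu_AA, mu_AB, mu_BB)) for a stratified population.

    w: proportions of the subpopulations (should sum to 1)
    p: frequency of allele A in each subpopulation
    """
    p_bar = sum(wi * pi for wi, pi in zip(w, p))
    variance = sum(wi * (pi - p_bar) ** 2 for wi, pi in zip(w, p))
    mu_aa = sum(wi * pi ** 2 for wi, pi in zip(w, p))
    mu_ab = sum(wi * 2 * pi * (1 - pi) for wi, pi in zip(w, p))
    mu_bb = sum(wi * (1 - pi) ** 2 for wi, pi in zip(w, p))
    return p_bar, variance, (mu_aa, mu_ab, mu_bb)


def mating_type_frequencies(w, p):
    """Frequencies of AA x AA, AA x AB, AA x BB, AB x AB, AB x BB, BB x BB.

    Assumes random mating and HWE within each subpopulation and no mating
    between subpopulations, as in the text.
    """
    m = [0.0] * 6
    for wi, pi in zip(w, p):
        qi = 1 - pi
        aa, ab, bb = pi ** 2, 2 * pi * qi, qi ** 2
        m[0] += wi * aa * aa
        m[1] += wi * 2 * aa * ab   # factor 2: AA x AB and AB x AA are pooled
        m[2] += wi * 2 * aa * bb
        m[3] += wi * ab * ab
        m[4] += wi * 2 * ab * bb
        m[5] += wi * bb * bb
    return m


# Hypothetical example: two equal-sized subpopulations
print(population_summary([0.5, 0.5], [0.2, 0.4]))
print(mating_type_frequencies([0.5, 0.5], [0.2, 0.4]))
```

In this example the heterozygote proportion is 0.40, whereas HWE at the pooled frequency p̄=0.3 would give 2 × 0.3 × 0.7 = 0.42.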

We define ψ2 and ψ1 to be the respective genotypic relative risks (GRR) for the risk allele (allele A) homozygotes and heterozygotes compared to the BB homozygote. The GRR, ψi, is defined as the ratio of the probability of disease in subjects with i risk alleles compared to subjects with 0 risk alleles. We assume that there is no segregation distortion and that the offspring genotype determines the offspring's disease risk. We also assume that the GRRs are the same in each subpopulation (absence of risk heterogeneity). Therefore, the difference in disease prevalence between subpopulations is only because of differences in the risk allele frequency between the subpopulations, that is, the risk conferred by a given allele does not vary by subpopulation. For the sample size calculations done in this paper, the following modes of inheritance (MOI) were considered for comparison with Knapp: multiplicative model (ψ1=γ and ψ2=γ²), additive model (ψ1=γ and ψ2=2γ), dominant model (ψ1=ψ2=γ) and the recessive model (ψ1=1 and ψ2=γ), where γ≥1 for all models.9 We note that the multiplicative and additive MOI definitions were chosen to correspond to those defined previously by Camp.10
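The four MOI definitions can be written as a small lookup. This is a minimal sketch; the function name and the string labels are hypothetical, and only the (ψ1, ψ2) pairs come from the definitions in the text.

```python
def genotypic_relative_risks(model, gamma):
    """Return (psi1, psi2) for a mode of inheritance, given gamma >= 1."""
    if gamma < 1:
        raise ValueError("gamma must be >= 1")
    models = {
        "multiplicative": (gamma, gamma ** 2),
        "additive": (gamma, 2 * gamma),
        "dominant": (gamma, gamma),
        "recessive": (1.0, gamma),
    }
    if model not in models:
        raise ValueError("unknown mode of inheritance: " + str(model))
    return models[model]


for name in ("multiplicative", "additive", "dominant", "recessive"):
    print(name, genotypic_relative_risks(name, 1.5))
```

With γ=2 the multiplicative and additive definitions give the same pair (ψ1=2, ψ2=4), so under these definitions the two models can only be distinguished at other values of γ, such as γ=1.5.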

Population stratification results in a decrease in heterozygosity compared to that expected assuming HWE

If a population is divided into distinct subpopulations with random mating within each subpopulation, the decrease in the proportion of heterozygotes compared to that occurring under random mating of the whole population is proportional to the variance of the allele frequency between subpopulations. This phenomenon is known as the Wahlund Effect.11 Wright's coefficient of inbreeding, F, is commonly used to describe population stratification, where

$$F = \frac{\sigma^2}{\bar{p}\,\bar{q}},$$

and the variance of the allele frequency between subpopulations is $\sigma^2 = \sum_{i=1}^{G} w_i (p_i - \bar{p})^2$. Under population stratification, the heterozygosity is exactly 100(1−F)% of that expected assuming HWE, that is, a loss of 100F%.
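The relationship between F and heterozygosity follows in a few lines from the definitions of this section. The derivation below is a sketch in the same notation, writing q̄ = 1 − p̄ and using the identity that the weighted mean of the squared allele frequencies equals p̄² plus the between-subpopulation variance σ².

```latex
\begin{aligned}
\mu_{AB} &= \sum_{i=1}^{G} 2 w_i p_i q_i
          = 2\sum_{i=1}^{G} w_i p_i - 2\sum_{i=1}^{G} w_i p_i^2 \\
         &= 2\bar{p} - 2\left(\bar{p}^2 + \sigma^2\right) \\
         &= 2\bar{p}\bar{q} - 2\sigma^2 \\
         &= 2\bar{p}\bar{q}\,(1 - F), \qquad \text{since } \sigma^2 = F\bar{p}\bar{q}.
\end{aligned}
```

For instance, two equal-sized subpopulations with allele A frequencies 0.2 and 0.4 (a hypothetical example) give p̄=0.3, σ²=0.01 and F=0.01/0.21≈0.048, so the heterozygote proportion is 0.40 rather than the 0.42 expected under HWE.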

F values expected in practice

It is difficult to know a priori how much population stratification may exist in practice. Cavalli-Sforza12 estimated that the worldwide F value averaged over all genes in the genome is approximately 0.139±0.010. A study by Akey et al13 analyzed 26 530 SNPs in three populations (African-American, East Asian and European-American). The F value for each locus was calculated and the distribution of these F values was noted. Of the 25 549 autosomal SNPs genotyped, 29.8% were common in all three populations, 26.8% were common in only two populations, 22.9% were unique to only one population and 5.1% were uncommon. Here, an SNP was defined as common if the minor allele frequency was greater than 20%. The average F values obtained in the coding, intronic and non-coding regions were 0.107, 0.118 and 0.123, respectively. The distribution of F values was skewed to the right, which suggested that F values in the range 0–0.15 are feasible values that should be considered in practice. We note that the F values calculated by Akey et al13 assume that equal proportions of each subpopulation are present in the overall stratified population, whereas we allow for differing proportions of each subpopulation in the overall stratified population. Akey's approach is best suited for calculating the genetic distance between subpopulations, whereas our approach is geared toward detecting the change in the genotype frequencies in the presence of stratification.

The effect of population stratification on mating type frequencies

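Let Δi be the difference between the allele A frequency in subpopulation i and the allele A frequency in the entire population, where $\Delta_i = p_i - \bar{p}$, for i=1, 2,…, G. Furthermore, define the central moments of the allele frequency distribution, $\vartheta_j = \sum_{i=1}^{G} w_i \Delta_i^j$ for j=1, 2, 3, 4. Therefore,

$$\vartheta_1 = \sum_{i=1}^{G} w_i \Delta_i = 0$$

$$\vartheta_2 = \sum_{i=1}^{G} w_i \Delta_i^2 = \sigma^2 = F\bar{p}\bar{q}$$

$$\vartheta_3 = \sum_{i=1}^{G} w_i \Delta_i^3$$

$$\vartheta_4 = \sum_{i=1}^{G} w_i \Delta_i^4$$

Then, for the frequency m1 of the AA × AA mating type,

$$m_1 = \sum_{i=1}^{G} w_i p_i^4 = \sum_{i=1}^{G} w_i (\bar{p} + \Delta_i)^4 = \bar{p}^4 + 4\bar{p}^3\vartheta_1 + 6\bar{p}^2\vartheta_2 + 4\bar{p}\vartheta_3 + \vartheta_4$$

This yields

$$m_1 = \bar{p}^4 + 6F\bar{p}^3\bar{q} + 4\bar{p}\vartheta_3 + \vartheta_4$$

Similarly, for the frequency m2 of the AA × AB mating type, with $\bar{q} = 1 - \bar{p}$ and $q_i = \bar{q} - \Delta_i$,

$$m_2 = 4\sum_{i=1}^{G} w_i p_i^3 q_i = 4\sum_{i=1}^{G} w_i (\bar{p} + \Delta_i)^3 (\bar{q} - \Delta_i)$$

This yields

$$m_2 = 4\left[\bar{p}^3\bar{q} + 3F\bar{p}^2\bar{q}(\bar{q} - \bar{p}) + (\bar{q} - 3\bar{p})\vartheta_3 - \vartheta_4\right]$$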

All of the six mating types can now be re-written in terms of the average risk allele frequency p̄, Wright's coefficient of inbreeding F, ϑ3 and ϑ4 as shown in Table 1.

Table 1

Mating type frequencies in the presence of population stratification parameterized in terms of the central moments of the allele frequency distribution
[Table 1 is provided as an image in the source (ejhg201057t1.jpg).]
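The re-parameterization in terms of p̄, F, ϑ3 and ϑ4 can be sketched in Python as follows. This is an illustrative reconstruction, not a transcription of Table 1: it expands the raw moments of the allele frequency from the central moments and then forms the six mating type frequencies, assuming the ordering AA × AA, AA × AB, AA × BB, AB × AB, AB × BB, BB × BB. The function and argument names are hypothetical.

```python
def mating_types_from_moments(p_bar, F, theta3=0.0, theta4=0.0):
    """Mating type frequencies [m1..m6] from p_bar, F and higher central moments.

    Assumes HWE and random mating within subpopulations and no mating
    between them. With theta3 = theta4 = 0 the result uses p_bar and F only.
    """
    q_bar = 1.0 - p_bar
    theta2 = F * p_bar * q_bar            # between-subpopulation variance

    # Raw moments E[p^k] of the subpopulation allele frequency
    e1 = p_bar
    e2 = p_bar ** 2 + theta2
    e3 = p_bar ** 3 + 3 * p_bar * theta2 + theta3
    e4 = p_bar ** 4 + 6 * p_bar ** 2 * theta2 + 4 * p_bar * theta3 + theta4

    m1 = e4                                # E[p^4]
    m2 = 4 * (e3 - e4)                     # 4 E[p^3 q]
    m3 = 2 * (e2 - 2 * e3 + e4)            # 2 E[p^2 q^2]
    m4 = 4 * (e2 - 2 * e3 + e4)            # 4 E[p^2 q^2]
    m5 = 4 * (e1 - 3 * e2 + 3 * e3 - e4)   # 4 E[p q^3]
    m6 = 1 - 4 * e1 + 6 * e2 - 4 * e3 + e4 # E[q^4]
    return [m1, m2, m3, m4, m5, m6]


# HWE is recovered when F = theta3 = theta4 = 0
print(mating_types_from_moments(0.3, 0.0))
# Estimate using p_bar and F only (higher moments left at zero)
print(mating_types_from_moments(0.3, 0.05))
```

Leaving `theta3` and `theta4` at zero corresponds to working with p̄ and F alone; supplying them gives frequencies that agree with direct summation over the subpopulations.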

Change in the relative proportion of informative families because of population stratification

Three of the six mating types are informative for the TDT because they contain at least one heterozygous parent – AA × AB, AB × AB and AB × BB (m2, m4 and m5). However, as shown above, Wright's coefficient of inbreeding, F, is insufficient to appropriately calculate all mating type frequencies without ϑ3 and ϑ4. Under HWE the relative proportions of the informative mating types m2:m4:m5 are clearly p̄²:p̄(1−p̄):(1−p̄)²; however, these proportions no longer hold in the presence of population stratification. To illustrate the changes in the relative proportions of the mating types, we consider a study population comprising two equal-sized subpopulations, where F=0 and 0.10, ϑ3=0 and ϑ4 is allowed to vary for a multiplicative MOI where γ=2 (only p̄, F and ϑ3 are required to completely describe a two-subpopulation model). The changes in the relative proportions of the informative mating types AA × AB, AB × AB and AB × BB are shown in Figure 1.

Figure 1

Relative proportion of the informative mating types for a multiplicative MOI where γ=2.
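To make the shift in relative proportions concrete, the sketch below computes the population-level proportions of the three informative mating types for two equal-sized subpopulations with a given p̄ and F. It is an illustration under stated assumptions: the proportions are not conditioned on having an affected child, so it is not intended to reproduce Figure 1, and the function name is hypothetical.

```python
import math


def informative_proportions(p_bar, F):
    """Relative proportions of AA x AB, AB x AB and AB x BB (summing to 1).

    Assumes two equal-sized subpopulations with allele A frequencies
    p_bar - delta and p_bar + delta, where delta**2 = F * p_bar * (1 - p_bar).
    """
    delta = math.sqrt(F * p_bar * (1.0 - p_bar))
    freqs = (p_bar - delta, p_bar + delta)
    if min(freqs) < 0.0 or max(freqs) > 1.0:
        raise ValueError("F is too large for this p_bar in a two-subpopulation model")
    m2 = m4 = m5 = 0.0
    for p in freqs:
        q = 1.0 - p
        m2 += 0.5 * 4 * p ** 3 * q        # AA x AB
        m4 += 0.5 * 4 * p ** 2 * q ** 2   # AB x AB
        m5 += 0.5 * 4 * p * q ** 3        # AB x BB
    total = m2 + m4 + m5
    return m2 / total, m4 / total, m5 / total


print(informative_proportions(0.5, 0.0))   # HWE: p^2 : pq : q^2, normalized
print(informative_proportions(0.5, 0.10))  # stratified: AB x AB share shrinks
```

At p̄=0.5 the three types are equally frequent under HWE, while at F=0.10 the AB × AB share falls below one third and the other two types remain equal to each other by symmetry.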

The power of the TDT and sample size calculations in the presence of population stratification

The TDT tests the null hypothesis of no association between a marker and disease in the presence of linkage. Knapp provided a method for reliably calculating the power of the TDT in a homogeneous population assuming HWE for affected child trios (ACTs).9 We retain his symbols for ease of comparison with our method. At the heart of Knapp's method is the characterization of family types (genotypes of both parents as well as the affected offspring) for the TDT and the calculation of each family type probability (s1, s2,…, s10), as shown in Table 2. Seven family types are informative for the TDT (s1, s2,…, s7) and Knapp showed how their multinomial frequencies could be calculated based only on ψ1, ψ2 and p̄.9 However, if population stratification exists, the average frequency of the risk allele A in the stratified population (p̄) is not sufficient to calculate the mating type proportions. The true family type proportions (s*1, s*2,…, s*10) can be calculated from the correct mating type proportions based on the method described earlier (using p̄, F, ϑ3 and ϑ4) and are shown in Table 2. However, ϑ3 and ϑ4 are almost always unknown, and though there are several published papers about F values expected in practice, there are no published reports of ϑ3 and ϑ4 values expected in practice. Yasuda14 showed that the higher central moments (terms in ϑ3 and ϑ4) can be ignored if p̄>F and 1−p̄>F. Therefore, (s1, s2,…, s7) can be estimated by (ŝ*1, ŝ*2,…, ŝ*7) using p̄ and F as shown in Table 2 and these estimates can then be used in power calculations. GRRs are set assuming γ=2.0 and 1.5 for multiplicative, additive, dominant and recessive models. To evaluate our method for estimating sample sizes, we considered a stratified study population comprising two smaller subpopulations. In the first example, 70% of the trios are from subpopulation 1, and 30% of the trios are derived from subpopulation 2. 
In the second example, the study population comprises equal proportions of trios from both subpopulations. We then calculated the true sample size requirements to achieve 80% power, using a Type I error rate of α=1 × 10^−7 at various values of the average allele frequency, p̄ (0.1, 0.3, 0.5 and 0.8), and at various levels of population stratification measured by F (0.01, 0.05 and 0.10). The true sample size estimates were compared to our estimates using only p̄ and F, as well as to the sample size calculations assuming HWE. These sample size calculations are shown in Tables 3 and 4. A Type I error rate of α=1 × 10^−7 is used for comparison with Knapp,9 and for comparison with sample size estimates from genomewide association studies.

Table 2

Family type probabilities given affected child
[Table 2 is provided as an image in the source (ejhg201057t2.jpg).]
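The idea behind the family type probabilities can be sketched in Python: each mating type produces offspring genotypes by Mendelian segregation, each offspring genotype is weighted by its GRR, and the weights are normalized so that they sum to 1 over all families with an affected child. The sketch below aggregates these probabilities into the expected numbers of A and B alleles transmitted by heterozygous parents per affected child trio. It does not reproduce the s1,…, s10 labelling of Table 2 or Knapp's power approximation; all names are hypothetical, and the mating type ordering AA × AA, AA × AB, AA × BB, AB × AB, AB × BB, BB × BB is an assumption.

```python
# Offspring genotype probabilities (AA, AB, BB) for the six mating types
OFFSPRING = [
    (1.0, 0.0, 0.0),    # AA x AA
    (0.5, 0.5, 0.0),    # AA x AB
    (0.0, 1.0, 0.0),    # AA x BB
    (0.25, 0.5, 0.25),  # AB x AB
    (0.0, 0.5, 0.5),    # AB x BB
    (0.0, 0.0, 1.0),    # BB x BB
]

# For informative mating types (by index): per offspring genotype (AA, AB, BB),
# the number of (A, B) alleles transmitted by heterozygous parents
TRANSMISSIONS = {
    1: ((1, 0), (0, 1), (0, 0)),  # AA x AB
    3: ((2, 0), (1, 1), (0, 2)),  # AB x AB
    4: ((0, 0), (1, 0), (0, 1)),  # AB x BB
}


def expected_transmissions(m, psi1, psi2):
    """Expected (A, B) transmissions from heterozygous parents per affected trio.

    m: mating type frequencies [m1..m6] in the parental population
    psi1, psi2: GRRs of AB and AA offspring relative to BB
    """
    psi = (psi2, psi1, 1.0)  # weights for offspring genotypes AA, AB, BB
    # Normalizing factor: mean relative risk over all families
    norm = sum(mk * sum(pr * wt for pr, wt in zip(off, psi))
               for mk, off in zip(m, OFFSPRING))
    e_a = e_b = 0.0
    for k, trans in TRANSMISSIONS.items():
        for pr, wt, (n_a, n_b) in zip(OFFSPRING[k], psi, trans):
            e_a += m[k] * pr * wt * n_a
            e_b += m[k] * pr * wt * n_b
    return e_a / norm, e_b / norm


# Example under HWE with p = 0.1 and a multiplicative model with gamma = 2
p, q = 0.1, 0.9
hwe = [p**4, 4 * p**3 * q, 2 * p**2 * q**2, 4 * p**2 * q**2, 4 * p * q**3, q**4]
print(expected_transmissions(hwe, 2.0, 4.0))
```

Passing mating type frequencies computed for a stratified population instead of the HWE values shows how stratification changes the expected transmission counts that drive the TDT statistic.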

Table 3

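Sample size necessary to gain 80% power in TDT with singletons (α=10^−7, γ=2), comparing Knapp's first approximation assuming HWE to the sample size estimates assuming population stratification

| p̄ | F | ϑ3 | ϑ4 | Multiplicative A | Multiplicative B | Multiplicative C | Additive A | Additive B | Additive C | Recessive A | Recessive B | Recessive C | Dominant A | Dominant B | Dominant C |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.1 | 0.01 | 2.3 × 10^−5 | 1.4 × 10^−6 | 689 | 692 (100.4%) | 692 | 689 | 692 (100.4%) | 692 | 45 071 | 39 168 (86.9%) | 39 354 | 949 | 978 (103.1%) | 977 |
|  |  | 0 | 8.1 × 10^−7 |  |  | 692 |  |  | 692 |  |  | 39 168 |  |  | 978 |
| 0.1 | 0.05 | 2.6 × 10^−4 | 3.5 × 10^−5 | 689 | 702 (101.9%) | 704 | 689 | 702 (101.9%) | 702 | 45 071 | 24 102 (53.5%) | 25 141 | 949 | 1110 (117.0%) | 1101 |
|  |  | 0 | 2.0 × 10^−5 |  |  | 702 |  |  | 732 |  |  | 24 102 |  |  | 1110 |
| 0.1 | 0.10 | 7.5 × 10^−4 | 1.4 × 10^−4 | 689 | 714 (103.6%) | 720 | 689 | 714 (103.6%) | 720 | 45 071 | 14 797 (32.8%) | 16 273 | 949 | 1324 (139.5%) | 1292 |
|  |  | 0 | 8.1 × 10^−5 |  |  | 714 |  |  | 714 |  |  | 14 798 |  |  | 1324 |
| 0.3 | 0.01 | 8.4 × 10^−5 | 7.8 × 10^−6 | 349 | 352 (100.9%) | 352 | 349 | 352 (100.9%) | 352 | 2546 | 2513 (98.7%) | 2519 | 907 | 927 (102.2%) | 925 |
|  |  | 0 | 4.4 × 10^−6 |  |  | 352 |  |  | 352 |  |  | 2513 |  |  | 927 |
| 0.3 | 0.05 | 9.4 × 10^−4 | 1.9 × 10^−4 | 349 | 364 (104.3%) | 365 | 349 | 364 (104.3%) | 365 | 2546 | 2383 (93.6%) | 2449 | 907 | 1015 (111.9%) | 999 |
|  |  | 0 | 1.1 × 10^−4 |  |  | 364 |  |  | 364 |  |  | 2383 |  |  | 1015 |
| 0.3 | 0.10 | 2.7 × 10^−3 | 7.8 × 10^−4 | 349 | 380 (108.9%) | 384 | 349 | 380 (108.9%) | 384 | 2546 | 2227 (87.5%) | 2406 | 907 | 1150 (126.8%) | 1097 |
|  |  | 0 | 4.4 × 10^−4 |  |  | 380 |  |  | 380 |  |  | 2227 |  |  | 1149 |
| 0.5 | 0.01 | 1.1 × 10^−4 | 1.1 × 10^−5 | 338 | 342 (101.2%) | 342 | 338 | 342 (101.2%) | 342 | 957 | 969 (101.3%) | 971 | 1839 | 1855 (100.9%) | 1852 |
|  |  | 0 | 6.3 × 10^−6 |  |  | 342 |  |  | 342 |  |  | 969 |  |  | 1855 |
| 0.5 | 0.05 | 1.2 × 10^−3 | 2.8 × 10^−4 | 338 | 358 (105.9%) | 359 | 338 | 358 (105.9%) | 359 | 957 | 1017 (106.3%) | 1037 | 1839 | 1923 (104.6%) | 1881 |
|  |  | 0 | 1.6 × 10^−4 |  |  | 358 |  |  | 358 |  |  | 1017 |  |  | 1923 |
| 0.5 | 0.10 | 3.5 × 10^−3 | 1.1 × 10^−3 | 338 | 380 (112.4%) | 384 | 338 | 380 (112.4%) | 384 | 957 | 1084 (113.3%) | 1147 | 1839 | 2017 (109.7%) | 1889 |
|  |  | 0 | 6.3 × 10^−4 |  |  | 380 |  |  | 380 |  |  | 1084 |  |  | 2017 |
| 0.8 | 0.01 | 5.6 × 10^−5 | 4.5 × 10^−6 | 634 | 643 (101.4%) | 643 | 634 | 643 (101.4%) | 643 | 851 | 872 (102.5%) | 872 | 21 998 | 20 879 (94.9%) | 20 879 |
|  |  | 0 | 2.6 × 10^−6 |  |  | 643 |  |  | 643 |  |  | 872 |  |  | 20 879 |
| 0.8 | 0.05 | 6.2 × 10^−4 | 1.1 × 10^−4 | 634 | 682 (107.6%) | 683 | 634 | 682 (107.6%) | 683 | 851 | 965 (113.4%) | 975 | 21 998 | 17 047 (77.5%) | 16 435 |
|  |  | 0 | 6.4 × 10^−5 |  |  | 682 |  |  | 682 |  |  | 965 |  |  | 17 047 |
| 0.8 | 0.10 | 1.8 × 10^−3 | 4.5 × 10^−4 | 634 | 736 (116.1%) | 741 | 634 | 736 (116.1%) | 741 | 851 | 1111 (130.6%) | 1145 | 21 998 | 13 379 (60.8%) | 12 178 |
|  |  | 0 | 2.6 × 10^−4 |  |  | 736 |  |  | 736 |  |  | 1111 |  |  | 13 378 |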

A is Knapp's first approximation assuming HWE.

B is estimate in presence of population stratification, based on p̄ and F only.

C is true sample size requirement.
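For a population made of two subpopulations, the proportion of the first subpopulation together with p̄ and F fixes the two allele frequencies, and therefore ϑ3 and ϑ4. The sketch below shows that calculation. It assumes that the first subpopulation carries the lower allele A frequency, which makes ϑ3 positive when that subpopulation is the larger one; the sign convention and the function name are assumptions, not taken from the paper.

```python
import math


def two_subpop_moments(w1, p_bar, F):
    """Return (p1, p2, theta3, theta4) for a two-subpopulation model.

    w1: proportion of subpopulation 1 (assumed to have the lower frequency)
    p_bar: average allele A frequency; F: Wright's coefficient of inbreeding
    """
    w2 = 1.0 - w1
    variance = F * p_bar * (1.0 - p_bar)
    d = math.sqrt(variance / (w1 * w2))   # frequency difference p2 - p1
    p1 = p_bar - w2 * d
    p2 = p_bar + w1 * d
    if not (0.0 <= p1 <= 1.0 and 0.0 <= p2 <= 1.0):
        raise ValueError("no valid two-subpopulation model for these inputs")
    theta3 = w1 * (p1 - p_bar) ** 3 + w2 * (p2 - p_bar) ** 3
    theta4 = w1 * (p1 - p_bar) ** 4 + w2 * (p2 - p_bar) ** 4
    return p1, p2, theta3, theta4


# 70%/30% split and equal split at p_bar = 0.3, F = 0.01
print(two_subpop_moments(0.7, 0.3, 0.01))
print(two_subpop_moments(0.5, 0.3, 0.01))
```

For p̄=0.3 and F=0.01, a 70%/30% split gives ϑ3 of about 8.4 × 10^−5 and ϑ4 of about 7.8 × 10^−6, and an equal split gives ϑ3=0 and ϑ4 of about 4.4 × 10^−6, which agrees with the corresponding table entries to the precision shown.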

Table 4

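Sample size necessary to gain 80% power in TDT with singletons (α=10^−7, γ=1.5), comparing Knapp's first approximation assuming HWE to the sample size estimates assuming population stratification

| p̄ | F | ϑ3 | ϑ4 | Multiplicative A | Multiplicative B | Multiplicative C | Additive A | Additive B | Additive C | Recessive A | Recessive B | Recessive C | Dominant A | Dominant B | Dominant C |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.1 | 0.01 | 2.3 × 10^−5 | 1.4 × 10^−6 | 2210 | 2224 (100.6%) | 2225 | 1755 | 1738 (99.0%) | 1739 | 174 695 | 151 448 (86.7%) | 152 181 | 2897 | 2983 (103.0%) | 2981 |
|  |  | 0 | 8.1 × 10^−7 |  |  | 2224 |  |  | 1738 |  |  | 151 448 |  |  | 2983 |
| 0.1 | 0.05 | 2.6 × 10^−4 | 3.5 × 10^−5 | 2210 | 2283 (103.3%) | 2287 | 1755 | 1671 (95.2%) | 1683 | 174 695 | 92 276 (52.8%) | 96 328 | 2897 | 3375 (116.5%) | 3349 |
|  |  | 0 | 2.0 × 10^−5 |  |  | 2283 |  |  | 1671 |  |  | 92 277 |  |  | 3375 |
| 0.1 | 0.10 | 7.5 × 10^−4 | 1.4 × 10^−4 | 2210 | 2361 (106.8%) | 2371 | 1755 | 1591 (90.7%) | 1623 | 174 695 | 55 922 (32.0%) | 61 637 | 2897 | 4011 (138.5%) | 3916 |
|  |  | 0 | 8.1 × 10^−5 |  |  | 2361 |  |  | 1591 |  |  | 55 924 |  |  | 4011 |
| 0.3 | 0.01 | 8.4 × 10^−5 | 7.8 × 10^−6 | 1037 | 1046 (100.9%) | 1047 | 608 | 611 (100.5%) | 611 | 9097 | 8963 (98.5%) | 8986 | 2440 | 2493 (102.2%) | 2490 |
|  |  | 0 | 4.4 × 10^−6 |  |  | 1046 |  |  | 611 |  |  | 8963 |  |  | 2493 |
| 0.3 | 0.05 | 9.4 × 10^−4 | 1.9 × 10^−4 | 1037 | 1084 (104.5%) | 1086 | 608 | 621 (102.1%) | 626 | 9097 | 8443 (92.8%) | 8686 | 2440 | 2730 (111.9%) | 2690 |
|  |  | 0 | 1.1 × 10^−4 |  |  | 1084 |  |  | 621 |  |  | 8443 |  |  | 2730 |
| 0.3 | 0.10 | 2.7 × 10^−3 | 7.8 × 10^−4 | 1037 | 1135 (109.5%) | 1142 | 608 | 633 (104.1%) | 649 | 9097 | 7821 (86.0%) | 8473 | 2440 | 3092 (126.7%) | 2956 |
|  |  | 0 | 4.4 × 10^−4 |  |  | 1135 |  |  | 633 |  |  | 7822 |  |  | 3092 |
| 0.5 | 0.01 | 1.1 × 10^−4 | 1.1 × 10^−5 | 947 | 957 (101.1%) | 957 | 464 | 469 (101.1%) | 470 | 3099 | 3133 (101.1%) | 3138 | 4568 | 4610 (100.9%) | 4602 |
|  |  | 0 | 6.3 × 10^−6 |  |  | 957 |  |  | 469 |  |  | 3133 |  |  | 4610 |
| 0.5 | 0.05 | 1.2 × 10^−3 | 2.8 × 10^−4 | 947 | 999 (105.5%) | 1001 | 464 | 492 (106.0%) | 495 | 3099 | 3279 (105.8%) | 3343 | 4568 | 4789 (104.8%) | 4687 |
|  |  | 0 | 1.6 × 10^−4 |  |  | 999 |  |  | 492 |  |  | 3279 |  |  | 4788 |
| 0.5 | 0.10 | 3.5 × 10^−3 | 1.1 × 10^−3 | 947 | 1057 (111.6%) | 1063 | 464 | 523 (112.7%) | 535 | 3099 | 3479 (112.3%) | 3690 | 4568 | 5034 (110.2%) | 4724 |
|  |  | 0 | 6.3 × 10^−4 |  |  | 1057 |  |  | 523 |  |  | 3479 |  |  | 5033 |
| 0.8 | 0.01 | 5.6 × 10^−5 | 4.5 × 10^−6 | 1658 | 1678 (101.2%) | 1679 | 698 | 710 (101.7%) | 710 | 2356 | 2415 (102.5%) | 2417 | 50 826 | 48 284 (95.0%) | 48 116 |
|  |  | 0 | 2.6 × 10^−6 |  |  | 1678 |  |  | 710 |  |  | 2415 |  |  | 48 284 |
| 0.8 | 0.05 | 6.2 × 10^−4 | 1.1 × 10^−4 | 1658 | 1767 (106.6%) | 1769 | 698 | 763 (109.3%) | 766 | 2356 | 2680 (113.8%) | 2707 | 50 826 | 39 566 (77.8%) | 38 166 |
|  |  | 0 | 6.4 × 10^−5 |  |  | 1767 |  |  | 763 |  |  | 2680 |  |  | 39 566 |
| 0.8 | 0.10 | 1.8 × 10^−3 | 4.5 × 10^−4 | 1658 | 1891 (114.1%) | 1899 | 698 | 840 (120.3%) | 852 | 2356 | 3095 (131.4%) | 3194 | 50 826 | 31 208 (61.4%) | 28 448 |
|  |  | 0 | 2.6 × 10^−4 |  |  | 1891 |  |  | 840 |  |  | 3095 |  |  | 31 206 |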

A is Knapp's first approximation assuming HWE.

B is estimate in presence of population stratification, based on p̄ and F only.

C is true sample size requirement.

Results

Population stratification alters the frequency of the mating types when compared to those calculated assuming HWE. The change in the distribution of the mating type frequencies directly changes the family type frequencies (mother–father–offspring genotype combinations). When HWE is assumed for TDT power calculations, the expected value and variance of the test statistic under the alternative hypothesis as well as the variance of the test statistic under the null hypothesis differ from that calculated when accounting for population stratification. The variance of the TDT in the presence of stratification under the null hypothesis could be larger or smaller than that calculated assuming HWE depending on the mating type parameters (p̄, F, ϑ3, ϑ4). The difference between the variance of the TDT under the alternative hypothesis accounting for population stratification and the variance of the TDT assuming HWE depends on the model (ψ2, ψ1) and mating type parameters (p̄, F, ϑ3, ϑ4). The current methods existing for calculating TDT sample sizes ignore the fact that parental genotypes used in the TDT statistic are no longer in HWE if there is population stratification. Furthermore, these methods ignore the fact that the parents cannot be considered independent as population stratification induces non-random mating, when considered on a whole-population basis.

Knapp showed that the power of the TDT is a function of the genetic model, genotypic risk parameters and the risk allele frequency.9 Our model for TDT sample size calculations in the presence of population stratification, which is based on estimates of the mating type frequencies using the average allele frequency, p̄, and Wright's coefficient of inbreeding, F, allows sampling from any number of subpopulations with any ascertainment scheme, and is generalizable to any stratified study population. In addition, we show that the change in the power of the TDT in a stratified population is due to three main factors: (1) the loss in heterozygosity, which falls to 100(1−F)% of that predicted assuming HWE; (2) the change in the distribution of mating type frequencies; and (3) the change in the relative proportion of the informative mating types to each other. These factors should not be ignored if there is the possibility of even mild population stratification.

Interestingly, and somewhat contrary to initial expectation, the loss in heterozygosity caused by population stratification does not always decrease the power of the TDT. Tables 3 and 4 show that the power of the TDT can be increased in the presence of population stratification because of the change in the relative proportion of the informative mating types and the disease model. If the disease model is a multiplicative MOI, then sample sizes calculated assuming HWE are generally smaller than those actually required. For a dominant, additive or recessive model, the relative frequencies of the AA × AB, AB × BB and AB × AB mating types dictate the change in sample size requirements as the relative information content for these three mating types may not be in the ratio 1:1:2. For example, if the disease is inherited in a dominant MOI, the mating type AA × AB has no information and the mating type AB × BB has the most information.

In a stratified population, the risk allele frequency and mating type frequencies in a single subpopulation may cause subjects from that subpopulation to be more informative than subjects from other subpopulations. This results in one subpopulation that heavily influences the overall TDT sample size calculations. The subjects from other subpopulations may not be as informative and add very little to the overall TDT statistic. This suggests that it may be useful to partition a study population into its component subpopulations prior to analysis.

Larger values of Wright's F were correlated with larger discrepancies in the sample size requirements compared to those calculated assuming HWE. For example, in Table 3 where F=0.01 and γ=2, estimates of the sample size required varied from 3% larger to approximately 13% smaller than the sample size calculated using Knapp's first approximation. However, when F=0.1 and γ=2, estimates of the sample size required varied from 39% larger to about 67% smaller than the sample size calculated using Knapp's first approximation.

Wright's F cannot capture all the information about population stratification pertaining to the distortion in mating type frequencies; however, reasonable estimates of the mating type frequencies can be made using F and average allele frequency p̄ when F≤min (p̄, 1−p̄).

To estimate sample sizes for the TDT we ignore the third and higher order risk allele frequency moments (ϑ3 and ϑ4). However, small changes in ϑ3 and ϑ4 are important and result in sample size discrepancies when our estimates are compared to the true sample size estimates for a fixed F, as seen in Tables 3 and 4.

One potential limitation of our method is that it does not take into account population admixture, as we assume both parents are sampled from the same subpopulation. Population admixture occurs when the study population comprises multiple subpopulations, but there is mating within and between subpopulations. Population admixture is complex, and the rate of admixture depends on socioeconomic, racial, ethnic, linguistic, migratory factors and several other factors. Additional work is needed to assess how the power of the TDT changes in recently admixed populations.

Knapp uses the disease prevalence as a normalizing factor so that the probabilities of all family types with an affected offspring sum to 1.9 We note that the disease prevalence calculated assuming HWE, denoted by R, is different from the disease prevalence R̄ calculated assuming population stratification. The disease prevalence in the stratified population is $\bar{R} = \sum_{i=1}^{G} w_i R_i$, where $R_i = \psi_2 p_i^2 + 2\psi_1 p_i q_i + q_i^2$ is the disease prevalence in subpopulation i (expressed relative to the risk in BB homozygotes). We also note that the normalization factor in the stratified population, R̄, is greater in magnitude than the normalization factor in the population in HWE, R, if ψ2+1>2ψ1; however, if ψ2+1<2ψ1, R̄ is smaller in magnitude than R. The difference in the normalization factors between the stratified population and the population in HWE, R̄−R=Fp̄q̄(ψ2−2ψ1+1), depends on the genetic model chosen and the GRRs. The difference is due to the altered proportion of risk genotypes because of population stratification.
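The stated difference between the two normalization factors can be checked numerically. The sketch below is an illustration under the assumption that each factor is the mean genotypic relative risk (risk relative to BB homozygotes) under HWE within each subpopulation; the function names and the example subpopulations are hypothetical.

```python
def relative_prevalence(p, psi1, psi2):
    """Mean relative risk under HWE at allele A frequency p."""
    q = 1.0 - p
    return psi2 * p * p + 2.0 * psi1 * p * q + q * q


def stratified_relative_prevalence(w, p, psi1, psi2):
    """Weighted mean of the subpopulation values (the factor R-bar)."""
    return sum(wi * relative_prevalence(pi, psi1, psi2) for wi, pi in zip(w, p))


def predicted_difference(w, p, psi1, psi2):
    """F * p_bar * q_bar * (psi2 - 2*psi1 + 1), with F * p_bar * q_bar = variance."""
    p_bar = sum(wi * pi for wi, pi in zip(w, p))
    variance = sum(wi * (pi - p_bar) ** 2 for wi, pi in zip(w, p))
    return variance * (psi2 - 2.0 * psi1 + 1.0)


w, p = [0.5, 0.5], [0.2, 0.4]   # hypothetical subpopulations, p_bar = 0.3
for label, psi1, psi2 in (("multiplicative", 2.0, 4.0), ("dominant", 2.0, 2.0)):
    r_bar = stratified_relative_prevalence(w, p, psi1, psi2)
    r_hwe = relative_prevalence(0.3, psi1, psi2)
    print(label, r_bar - r_hwe, predicted_difference(w, p, psi1, psi2))
```

In this example the multiplicative model (ψ2+1>2ψ1) gives a positive difference and the dominant model (ψ2+1<2ψ1) gives a negative one, in line with the sign rule in the text.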

Distortion in the mating type frequencies can be seen when there is underlying population stratification. In a randomly mating population in HWE, F=ϑ3=ϑ4=0. Distortion in the mating type frequencies can also be observed in the parents of an affected child. For example, the parents of a child with a rare recessive disease will most likely be both heterozygous at the disease locus (AB × AB). Similarly, for a rare dominant disorder, the parental genotypes of affected children are most likely a heterozygote and a wild type (AB × BB). This distortion in mating type frequencies becomes most apparent when the risk allele is rare (less than 1% frequency in the population) but confers a substantial risk of disease and the allele explains the majority of the variability seen in the disease and the disease can be considered a simple Mendelian disease from a genetic perspective. The method proposed accounts for the genetic MOI, and takes into account the distortions in the mating type frequencies seen because the parents are the parents of an affected child. Our method is suitable for complex diseases (where the GRR conferred by any single polymorphism is generally less than 3), and the risk allele is relatively common (ie the risk allele frequency in the population is greater than 1%).

Discussion

The TDT was proposed as the solution to the challenge of finding a disease susceptibility gene in a stratified study population. This method was proposed primarily because of its robustness to population stratification and admixture, which made it superior to the traditional case–control tests, which are susceptible to false-positive results. Typically, in the design of family-based studies, HWE is assumed when calculating adequate sample sizes required for a pre-specified power level. However, when there is population stratification, there is an increase in homozygosity beyond that expected by HWE. The homozygous parents in the study population do not contribute to the TDT statistic.15 For example, in a sample of 500 ACTs, in which the estimated disease allele frequency in the parents is 0.1, 180 heterozygous parents would be expected assuming HWE. However, if there is a large degree of population substructure where F=0.25, only 135 heterozygous parents are expected under the null hypothesis (the locus of interest is not associated with the disease of interest).
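The arithmetic in this example can be written as a short function. This is a minimal sketch with a hypothetical function name; it counts two parents per affected child trio and scales the HWE heterozygote proportion 2p(1−p) by (1−F).

```python
def expected_heterozygous_parents(n_trios, p, F=0.0):
    """Expected number of heterozygous parents among n_trios affected child trios."""
    n_parents = 2 * n_trios
    return n_parents * 2 * p * (1 - p) * (1 - F)


print(expected_heterozygous_parents(500, 0.1))           # about 180 under HWE
print(expected_heterozygous_parents(500, 0.1, F=0.25))   # about 135 with F = 0.25
```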

Although there are several methods for calculating the power of the TDT,9, 16, 17 none of these methods takes into consideration the power of the TDT in the presence of population stratification. Despite the fact that the TDT maintains the correct Type I error rate, the power of the TDT is significantly affected in the presence of population stratification. This is extremely important, because the TDT is often used when population stratification is possible. Studies with sample sizes calculated ignoring population stratification might be underpowered, and fail to detect putative disease genes. Our method is the first method that proposes incorporating Wright's coefficient of inbreeding, F as a measure of population stratification to approximate the mating type frequencies in the presence of population stratification for TDT sample size calculation.

One of our major points is that in the presence of population stratification, there is considerable variation in the power of the TDT to detect an association between a locus and a disease. The power of the TDT depends on the underlying genetic structure of the study population. We suggest that in future, researchers present estimates of the parameters required to describe the population structure (p̄, F, ϑ3 and ϑ4) as well as the standard errors of these parameter estimates with all TDT findings. Studies using the TDT to replicate previously published findings may be more challenging than previously thought, as two studies with the same number of subjects could have very different powers to detect the same genetic association depending on the underlying population substructure. This finding may shed light on the cornucopia of studies that have failed to replicate previously published positive findings of association at certain loci. However, one must note that the initial finding may have been spurious.

In addition to possibly losing power because of population stratification (as a result of altered proportions and relative ratios of informative family types), the power of the TDT can be reduced if there is genetic risk heterogeneity between the subpopulations. Genetic risk heterogeneity occurs if the relative risk (RR) of disease conferred by the putative deleterious allele varies from one subpopulation to another. Quantitative risk heterogeneity occurs when the effect of the deleterious allele is not homogenous across all subpopulations, but is more profound in some subpopulations. A classical example for quantitative risk heterogeneity exists between ApoE and Alzheimer's disease, where the association exists pan-ethnically but is strongest in Caucasians and Asians, and weaker in Hispanics and African-Americans.18 The results of the TDT remain valid when there is quantitative risk heterogeneity, but the power of the TDT to detect this association with a given sample size will vary depending on the ethnic composition of the study sample.

Qualitative risk heterogeneity occurs when one allele is deleterious in one subpopulation (RR>1), but is protective in another (RR<1). An important point to be emphasized is that the power of the TDT diminishes greatly if there is qualitative risk heterogeneity. In fact, the case–control genomic control method proposed by Devlin and Roeder3 and the population stratification model (STRAT) proposed by Pritchard et al19 are both more powerful than the TDT per genotyped individual in the presence of qualitative risk heterogeneity as illustrated in Table 2 of Pritchard and Donnelly.6

In summary, statistical geneticists and genetic epidemiologists should carefully identify their study population and based on a conservative level of population stratification, follow the guidelines proposed when calculating sample sizes in anticipation of genetic analysis using the TDT and other family-based tests.

Acknowledgments

We thank Dr Nan Laird, Dr Christoph Lange, Dr Neil Risch and the reviewers for their valuable comments in preparing this manuscript. Ronnie Sebro was supported by a Howard Hughes Medical Institute Pre-Doctoral Fellowship.

Notes

The authors declare no conflict of interest.

References

  1. Risch N, Merikangas K. The future of genetic studies of complex human diseases. Science. 1996;273:1516–1518.
  2. Devlin B, Roeder K, Bacanu SA. Unbiased methods for population-based association studies. Genet Epidemiol. 2001;21:273–284.
  3. Devlin B, Roeder K. Genomic control for association studies. Biometrics. 1999;55:997–1004.
  4. Knowler WC, Williams RC, Pettitt DJ, Steinberg AG. Gm3-5,13,14 and type-2 diabetes mellitus – an association in American-Indians with genetic admixture. Am J Hum Genet. 1988;43:520–526.
  5. Lander ES, Schork N. Genetic dissection of complex traits. Science. 1994;265:2037–2048.
  6. Pritchard JK, Donnelly P. Case-control studies of association in structured or admixed populations. Theor Popul Biol. 2001;60:227–237.
  7. Pritchard JK, Rosenberg NA. Use of unlinked genetic markers to detect population stratification in association studies. Am J Hum Genet. 1999;65:220–228.
  8. Spielman RS, McGinnis RE, Ewens WJ. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet. 1993;52:506–516.
  9. Knapp M. A note on power approximations for the transmission/disequilibrium test. Am J Hum Genet. 1999;64:1177–1185.
  10. Camp NJ. Genomewide transmission/disequilibrium testing – consideration of the genotypic relative risks at disease loci. Am J Hum Genet. 1997;61:1424–1430.
  11. Elandt-Johnson RC. Probability Models and Statistical Methods in Genetics. New York: Wiley; 1971.
  12. Cavalli-Sforza LL, Menozzi P, Piazza A. The History and Geography of Human Genes. Princeton, NJ: Princeton University Press; 1994.
  13. Akey JM, Zhang G, Zhang K, Jin L, Shriver MD. Interrogating a high-density SNP map for signatures of natural selection. Genome Res. 2002;12:1805–1814.
  14. Yasuda N. An extension of Wahlund's principle to evaluate mating type frequency. Am J Hum Genet. 1968;20:1–23.
  15. Risch N, Teng J. The relative power of family-based and case-control designs for association studies of complex human diseases. I. DNA pooling. Genome Res. 1998;8:1273–1288.
  16. Iles MM. On calculating the power of a TDT study – comparison of methods. Ann Hum Genet. 2002;66:323–328.
  17. Chen WM, Deng HW. A general and accurate approach for computing the statistical power of the transmission disequilibrium test for complex disease genes. Genet Epidemiol. 2001;21:53–67.
  18. Risch N. Searching for genetic determinants in the new millennium. Nature. 2000;405:847–856.
  19. Pritchard JK, Stephens M, Rosenberg NA, Donnelly P. Association mapping in structured populations. Am J Hum Genet. 2000;67:170–181.

Articles from European Journal of Human Genetics are provided here courtesy of Nature Publishing Group