• We are sorry, but NCBI web applications do not support your browser and may not function properly. More information
Ann Hum Genet. Nov 2008; 72(6): 793–800.
PMCID: PMC2659381

Power, Validity, Bias and Robustness of Family-based Association Analysis Methods in the Presence of Linkage for Late Onset Diseases


This simulation-based report compares the performance of five methods of association analysis in the presence of linkage using extended sibships: the Family-Based Association Test (FBAT), Empirical Variance FBAT (EV-FBAT), Conditional Logistic Regression (CLR), Robust CLR (R-CLR) and Sibship Disequilibrium Test (SDT). The two tests accounting for residual familial correlation (EV-FBAT and R-CLR) and the model-free SDT showed correct test size in all simulated designs, while FBAT and CLR were only valid for small effect sizes. SDT had the lowest power, while CLR had the highest power, generally similar to FBAT and the robust variance analogues. The power of all model-dependent tests dropped when the model was misspecified, although often not substantially. Estimates of genetic effect with CLR and R-CLR were unbiased when the disease locus was analysed but biased when a nearby marker was analysed. This study demonstrates that the genetic effect does not need to be extreme to invalidate tests that ignore familial correlation and confirms that analogous methods using robust variance estimation provide a valid alternative at little cost to power. Overall R-CLR is the best-performing method among these alternatives for the analysis of extended sibship data.

Keywords: Extended sibships, conditional logistic regression, robust variance, simulation


Family-based association tests are tools of genetic association analysis robust to confounding effects of population stratification, a recurrent problem in classical case-control designs. The most popular family-based association test, the Transmission Disequilibrium Test or TDT, was originally designed to test for distortion in genetic transmission between parents and their affected offspring (Spielman et al. 1993). It has since been extended to other family and trait types by different authors (e. g. Allison, 1997; Spielman & Ewens, 1998; Clayton, 1999; Abecasis et al. 2000). Probably the most comprehensive extension is the Family-Based Association Test (FBAT) approach (Rabinowitz & Laird, 2000) implemented in the popular software of the same name (http://www.biostat.harvard.edu/~fbat). This method can be applied to both dichotomous and quantitative phenotypes in a variety of family types regardless of the structure of missing data, and multiple relatives are used while accounting for correlation in allele transmission among them (Lake et al. 2000). The ability to analyse families where parents are missing is of particular interest for late onset diseases where the parents of a proband have generally died. Accounting for correlation in allele transmission among siblings due to genetic linkage is important when multiple siblings are affected. In fact, while this correlation has no impact when the null hypothesis is ‘no linkage and no association’, it does induce spurious association when the null is ‘linkage but no association’ (Martin et al. 2000; Lake et al. 2000). Previously some authors had proposed to resolve the problem of residual familial correlation by selecting only one affected per family (Curtis, 1997), but this approach is clearly inefficient. Lake et al. (2000) proposed to account for residual familial correlation by using an empirical variance-covariance estimator in place of the classical variance-covariance matrix used in standard FBAT, giving a new test that they termed EV-FBAT (option –e in FBAT program).

Conditional logistic regression (CLR, available in most statistical software e.g. SAS, SAS Institute Inc., Cary, NC, USA, and STATA, StataCorp, 2005) is a traditional method of analysis of matched case-control studies in epidemiology. This is also appropriate for testing genetic association using familial data (Siegmund et al., 2000), since the within-family analysis provides protection against population stratification. In a simulation study, Siegmund et al. (2000) showed that, although standard CLR (using the Wald test) applied to families of multiple affected siblings is valid in the case of low genetic effect, it has an inflated type 1 error when the locus has a strong effect. They demonstrated that using the robust variance-covariance estimator (Rogers, 1993) in place of classical variance-covariance removes this effect of residual familial correlation. The new test derived by Siegmund et al. (2000) by applying robust variance-covariance will be referred to here as R-CLR (robust conditional logistic regression).

FBAT and EV-FBAT are dedicated to genetic association analysis while CLR and R-CLR are more general methods available in general software. They have important distinctive features such as the straightforward extension to haplotypes in FBAT which is unavailable for general programs (haplotypes have to be externally reconstructed to be analysed with CLR/R-CLR, see e. g. Nsengimana et al. 2007). On the other hand, additional risk factors and their interactions can be easily incorporated in the CLR/R-CLR approach, while they can only be indirectly considered in FBAT/EV-FBAT (by separate regression analysis and application of FBAT/EV-FBAT to residuals of this regression, see e. g. Lu & Cantor, 2007). In addition, CLR/R-CLR allow estimation of the size of genetic effect in terms of the odds ratio, which is not feasible in FBAT.

To our knowledge there is no published study comparing FBAT and EV-FBAT in a large range of family designs and disease models and showing the effectiveness and cost of empirical variance in removing the effect of residual familial correlation. Likewise, there is no published study comparing the power of FBAT/EV-FBAT and CLR/R-CLR approaches. Here we report on a simulation study comparing type 1 error rate, power, bias and robustness to model misspecification associated with these methods when testing for association in the presence of linkage using extended sibships. We consider a mixture of sibship sizes and structures (numbers of affected and unaffected siblings) in the absence of parental data. Because the true genetic model is generally unknown, it may be preferable to use a method that does not make any assumptions about the mode of inheritance. For this reason, we further compare the power of all these model-based tests (FBAT, EV-FBAT, CLR and R-CLR) under correct and incorrect models with the Sibship Disequilibrium Test (SDT, Horvath & Laird, 1998), a model-free sign test contrasting the number of sibships where a particular allele is more common in affected versus unaffected siblings. SDT has been shown to be less powerful than CLR and R-CLR under a correct genetic model (Siegmund et al. 2000) but we investigate whether SDT is still inferior when the model is misspecified.


Five methods are compared: FBAT, EV-FBAT, CLR, R-CLR and SDT. Firstly the type 1 error rate of all five tests is evaluated using a variety of family designs and disease models and the power is compared wherever the size of the test is correct. Secondly, for tests requiring a genetic model specification (FBAT, EV-FBAT, CLR and R-CLR) robustness to model misspecification is assessed by calculating power assuming a dominant and a recessive model with data simulated using an additive model. Also, since there is generally no guarantee that the polymorphism under investigation is the actual disease locus, situations are considered in power estimation where they coincide and are maximally associated (D’ = 1.0) and where they are at a certain (small) distance from each other and linkage disequilibrium (LD) between them is weaker (D’ = 0.5). As CLR and R-CLR allow estimation of the strength of genetic effect, the bias of this estimate is evaluated when the tested locus is the disease locus and when it is a nearby marker. Nicodemus et al. (2007) have recently compared TDT-derived methods in an attempt to draw some guidelines on how to choose the appropriate test according to the data or how to design the study in order to apply a particular test. Unlike these authors who compared the methods that use different types of data, here the experimental design is fixed a priori as discordant sibships with missing parents, with the common situation of late onset diseases in mind. A mixture of sibship sizes are considered with variable numbers of affected and unaffected siblings. The family structures simulated are based on those found in a cardiovascular disease candidate-gene study (Nsengimana et al. 2007).

Simulated Designs

A dichotomous disease outcome is considered, and for each design (Table 1) 10,000 replicates are simulated. For type 1 error evaluation, the marker and disease locus were linked but not associated, i.e. they were in linkage equilibrium. For most designs the recombination fraction θ was set to the most extreme value of zero, since the tighter the linkage the more inflation of type 1 error is expected. For the more extreme designs (12 and 13), where some inflation of type 1 error was seen (see Results), θ was varied between 0 and 0.5 to examine the effect of weaker linkage. For power estimation, two situations were considered: marker = disease locus and distance from marker to disease locus equals 50 kb (recombination rate θ= 0.0005, assuming 1 Mb ≈ 1 cM) with D’ = 0.5 (r2= 0.25). This level of LD at this distance was chosen because an average D’ of 0.50 has been observed at ~50 kb in 19 randomly selected regions across the human genome (Reich et al. 2001). We fixed the distance between the marker and the gene locus because we defined the LD level in the parental generation, whereas the analysis is done in the offspring generation. The LD decreases between the two generations but the low distance chosen means that the decrease is negligible. In all designs, the marker and disease locus were biallelic and had equal minor allele frequency ranging from 0.10 to 0.50. At the disease locus, the susceptibility allele was the one with lowest frequency. The additive genetic model (on the logistic scale) was simulated, and genotype penetrances were varied from 0.10 to 0.90, giving overall population prevalence of the disease between 17 and 50% with population-attributable fraction (PAF) of the locus ranging between 5 and 80% and genetic odds ratio (GOR) of 1.3 to 9 per copy of variant allele (Table 1). These parameters were chosen to be consistent with a common disease model with small to high GOR from the locus of interest, the highest values being set to assess the behaviour of the tests in extreme situations.

Table 1
Designs simulated to assess type 1 error and power

For power calculations, a total of 1,000 sibships were simulated with fixed proportions of various numbers of affected and unaffected siblings (Table 2) close to the proportions in our cardiovascular study. For type 1 error evaluation, larger sibships were considered (Table 3) to allow for a higher impact of familial correlation. The simulated data were analysed within the FBAT program for tests FBAT, EV-FBAT and SDT, while STATA v.9 (StataCorp, 2005) was used for CLR and R-CLR (testing for association using the Wald test). The simulation program (written in C) is available upon request.

Table 2
Number and structure of simulated sibships for power comparison
Table 3
Number and structure of sibships simulated for type 1 error evaluation


Type 1 Error Rate

Because most genetic association studies involve testing multiple hypotheses, we report test size and power at the 0.001 level. In all the designs with GOR<2, all five tests have correct size as shown in Table 4. In all the designs with GOR>2, FBAT and CLR showed significantly inflated type 1 error, while SDT, EV-FBAT and R-CLR remained valid. For the two most extreme designs (designs 12 and 13), simulations were carried out with different distances between the marker and the susceptibility locus. In both designs, type 1 error inflation in FBAT and CLR was higher with tighter linkage (Table 4), remaining significant at a recombination fraction of 0.10 in design 12 (GOR = 3.5/PAF = 37%) and 0.20 in design 13 (GOR = 9/PAF = 80%).

Table 4
Type 1 error at level 0.001

Power Comparison when the Correct Model is Used

Power is compared between the 5 methods in the designs where they all showed correct test size (those with GOR<2, see Table 4). In order to assess any loss of power due to correcting for familial correlation for a stronger locus effect, the power of all tests is also compared in designs 9 and 10 where the tests using the standard variance-covariance matrix showed inflated type 1 error (designs with GOR = 2.67). As shown in Table 5, FBAT, EV-FBAT, CLR and R-CLR have comparable powers in all designs, while SDT is consistently less powerful. At the 0.001 significance level all tests have a poor power (<20%) in the design of weakest genetic effect (GOR = 1.33/PAF = 5%). Power rises however to over 56% (40% for SDT) in the same design with a more common disease allele (GOR = 1.33/PAF = 13%) when the marker locus coincides with the disease locus. At a GOR of 1.55 and allele frequency 30% (design 4), the power jumps to ~97% (91% for SDT) when the marker is the disease locus (Table 5).

Table 5
Power at level 0.001 under the correct model (%)

The power is lower when disease and marker loci do not coincide and the LD between them is 50% of the maximum. It remains lower than 35% for all designs with GOR<2 and variant frequency = 10% (designs 1, 3, 5 and 7), but it hits 99% (97% for SDT) in the strongest model analysed. In designs where GOR = 2.67, valid tests EV-FBAT and R-RCLR appear to remain as powerful as FBAT and CLR, indicating that there is little cost to power from using robust variance even for strong locus effects. However some drop of power would be expected when robust variance is used, and this is more apparent when more stringent significance levels are applied, moving the power away from 100%. As can be seen in Table 6, the power of standard and robust/empirical variance methods remain of the same magnitude in designs 4 to 8 but robust/empirical variance is associated with substantially lower power in designs 9 and 10. It also appears that at high GORs and low α values the R-CLR method is more powerful than EV-FBAT.

Table 6
Power at more stringent significance levels under the correct model (%)

Estimating the Strength of Genetic Effect

CLR and R-CLR offer the possibility of estimating the genetic effect as a GOR. Along with true simulated and estimated GORs, Table 7 shows the proportion of simulations where the 95% confidence interval (CI) contains the true value. Both methods give the same estimate of GOR but have different CIs, as the only difference between them is the variance they use (i.e. standard vs. robust variance). In all the designs considered, CLR and R-CLR show an unbiased estimate of GOR and good CI coverage as long as the analysed marker is the disease locus (Table 7). However, as expected, when LD is reduced the estimate is biased and the 95% CI coverage is clearly poor. The bias and drop in CI coverage increase with the strength of true genetic effect and the variant allele frequency. The true value is contained in the 95%CI in only 3 simulations in 10,000 when GOR = 2.67, variant frequency = 30% and D’ = 0.5 (design 10).

Table 7
Estimated genetic odds ratio and confidence interval coverage

Power Comparison Under an Incorrect Genetic Model

The data simulated using an additive model were also analysed with four tests (FBAT, EV-FBAT, CLR, R-CLR) assuming dominant and recessive models. Under the dominant model, all four methods remain equally powerful (Table 8). Power is lower than when the correct model was used but remains higher or comparable to that of SDT (Tables 5 and and8).8). The largest drop in power is observed in designs with a common disease variant. For example, when the marker is the disease locus the power of the four tests drops from ~56% to 37% in design 2, and where the marker is in LD with the disease locus (D’ = 0.5), the power drops from ~74% to 54% in design 8. When the analysis model is recessive, the power decreases more sharply for all four tests and it falls below the power of SDT in all the designs. In contrast to dominant models, the most dramatic fall is observed in the designs with smallest disease allele frequency (Table 8) as one might expect. In design 7 (GOR = 1.89), the highest power in the four tests hardly reaches 14% where the power of SDT hits 94%.

Table 8
Power at level 0.001 using an incorrect model


Family-based association methods were originally designed to test for the presence of linkage or association. While testing this composite null hypothesis can be interesting in some instances, many candidate gene fine-mapping studies aim to search for association in a region where linkage has already been found or where it can be reasonably assumed. Therefore, the methods that can test association in families while accounting for linkage are valuable tools. In this study, we compare the performances of five methods of family-based association analysis: FBAT, EV-FBAT, CLR, R-CLR and SDT. Our study shows that in the case of low genetic risk all five tests have a correct type 1 error rate (Table 4). However, for genes with a GOR higher than 2, FBAT and CLR have an inflated type 1 error while their counterparts using robust variance estimation remain valid. For CLR and R-CLR, our observations confirm those made by Siegmund et al. (2000), although the two studies bear some differences. In their study, Siegmund et al. (2000) used 200 and 500 sibships of size 3 and 4, assessed test validity at GOR = 2 and 20 (log additive model) and used a nominal p-value of 0.05. They concluded that ‘the genetic effect needs to be quite extreme before residual familial correlation due to linkage led to false positive inference’ using CLR. In our study, we found that CLR is not valid (level 0.001) for a GOR of 2.67 which is far less extreme than 20. Our study demonstrates that familial correlation is not only influenced by the strength of the genetic effect, but also, maybe more critically, by the number of affected siblings within a family. Robust variance should therefore be recommended in the analysis of large sibships even when a modest genetic effect is suspected.

Our study of power confirms the results of Siegmund et al. (2000) and Hancock et al. (2007) that robust variance resolves the problem of familial correlation at little or no cost to power in CLR for most practical situations, and we make a similar observation with FBAT. However, when GOR is high, robust variance induced a substantial power loss to both FBAT and CLR at stringent significance levels (moving power away from 100%). Also, while EV-FBAT and R-CLR showed comparable power at level 0.001 in all simulated designs, R-CLR appears as the most powerful test at very low alpha values. Combined with the flexibility of this method (for example the ease of adding covariates, testing for interactions and estimating the strength of genetic effects), this result suggests that R-CLR is a better choice than EV-FBAT for the analysis of extended sibship data.

All the tests compared under incorrect genetic model specification have shown similar results: with the additive model simulated, a small drop in power when the dominant model is assumed and a sharper power decrease with the recessive model (Table 8). Unsurprisingly, the rarer the disease allele, the more pronounced the loss of power under the recessive model. In effect, only the homozygous affecteds contribute to the test, and their number is very small even for relatively common alleles. Although SDT was the least powerful method when the model is correctly specified in the other tests (Table 5), it is actually more powerful than any other test in all simulated designs when the analysis assumes (incorrectly) a recessive model. SDT may therefore be preferable to other tests in some instances or, alternatively, it can be used in addition to EV-FBAT or R-CLR. In such a case, a highly significant association found with SDT undetected by other approaches would suggest wrong model specification.

CLR and R-CLR provide precise estimates of the GOR when the marker is the disease locus. However, as expected, when the marker and the disease loci are distinct and are not maximally associated, the estimate of GOR is always biased and the 95% CI coverage can be very poor. The stronger the true genetic effect and the commoner the disease allele frequency, the higher the bias. Cordell (2004) reported a similar observation using a CLR method based on conditioning on parental genotypes (using parents and their affected offspring). The bias increased with the number of affected siblings and with the size of true genetic effect. More recently, Hancock et al. (2007) reported the same bias using R-CLR and generalised estimating equations. They showed that generalised estimating equations are even more biased than R-CLR, particularly for gene x gene interaction effects. In our study, the bias was influenced by LD, the true strength of genetic effect and the allele frequency. Missing parental genotypes do not appear to be a source of bias, since bias was also found when parents were used (Cordell, 2004; Hancock et al. 2007).

Our study has some limitations. First, this is not a comprehensive evaluation of all existing family-based association tests. There are now many methods, most of which were optimised for particular data configurations and are not necessarily perfectly comparable. We chose to focus on one general statistical approach (CLR) and one specialised genetic method (FBAT) both of which are very widely used for sibship data. Second, a comprehensive comparison would necessitate a larger spectrum of family structures, sample sizes and disease models (genetic effect, allele frequency, linkage disequilibrium, etc.), not all of which would be feasible to consider in one study. We chose instead to concentrate on one large mixture of family structures as observed in an actual study of affected and unaffected siblings with cardiovascular disease (Nsengimana et al. 2007). Although this design only represents one type of study, a mixture of family structures (sibship sizes + numbers of affected/unaffected) is common and is therefore in this context more realistic than the range of homogeneous designs often considered (e. g. Siegmund et al. 2000, Martin et al. 2000).

In conclusion, our simulations show that the genetic effect does not need to be extreme to invalidate tests that use standard variance estimators (CLR and FBAT). For more complex pedigrees EV-FBAT has clear advantages, but, based on power, validity and robustness to model misspecification, R-CLR provides a powerful suitable alternative for the analysis of extended sibships.


This research was supported by Cancer Research UK (grant number C588/A4994).


  • Abecasis GR, Cardon LR, Cookson WO. A general test of association for quantitative traits in nuclear families. Am J Hum Genet. 2000;66:279–292. [PMC free article] [PubMed]
  • Allison DB. Transmission-disequilibrium tests for quantitative traits. Am J Hum Genet. 1997;60:676–690. [PMC free article] [PubMed]
  • Clayton D. A generalization of the transmission/disequilibrium test for uncertain haplotype transmission. Am J Hum Genet. 1999;65:1170–1177. [PMC free article] [PubMed]
  • Cordell HJ. Properties of case/pseudocotrol analysis for genetic association studies: effects of recombination, ascertainment and multiple affected offspring. Genet Epidemiol. 2004;26:86–205. [PubMed]
  • Curtis D. Use of siblings as controls in case-control association studies. Ann Hum Genet. 1997;61:319–333. [PubMed]
  • Hancock DB, Martin ER, Li YJ, Scott WK. Methods for interaction analysis using family-based case control data: conditional logistic regression versus generalized estimating equations. Genet Epidemiol. 2007;31:883–893. [PubMed]
  • Horvath S, Laird NM. A discordant-sibship test for disequilibrium and linkage: no need for parental data. Am J Hum Genet. 1998;63:1886–1897. [PMC free article] [PubMed]
  • Lake SL, Blacker D, Laird NM. Family-based tests of association in presence of linkage. Am J Hum Genet. 2000;67:1515–1525. [PMC free article] [PubMed]
  • Lu AT, Cantor RM. Weighted variance FBAT: a powerful method for including covariates in FBAT analyses. Genet Epidemiol. 2007;31:327–337. [PubMed]
  • Martin ER, Monks SA, Warren LL, Kaplan NL. A test for linkage and association in general pedigrees: the pedigree disequilibrium test. Am J Hum Genet. 2000;67:146–154. [PMC free article] [PubMed]
  • Nicodemus KK, Luna A, Shugart YY. An evaluation of power and type 1 error of single-nucleotide polymorphism transmission/disequilibrium-based statistical methods under different family structures, missing parental data, and population stratification. Am J Hum Genet. 2007;80:178–185. [PMC free article] [PubMed]
  • Nsengimana J, Brown BD, Hall AS, Barrett JH. Family-based association analysis of inflammatory genes’ polymorphisms and haplotypes in premature coronary artery disease. Ann Hum Genet. 2007;71:555. abstract.
  • Rabinowitz D, Laird NM. A unified approach to adjusting association tests for population admixture with arbitrary pedigree structure and arbitrary missing marker information. Hum Hered. 2000;50:211–223. [PubMed]
  • Reich DE, Gargill M, Bolk S, Ireland J, Sabeti PC, Richter DJ, Lavery T, Kouyoumjian R, Farhadan SF, Ward R, Lander ES. Linkage disequilibrium in the human genome. Nature. 2001;411:199–204. [PubMed]
  • Rogers WH. Regression standard errors in clustered samples. Stata technical bulletin. 1993;13:19–23. Reprinted in Stata technical bulletin reprints vol. 3, 88–94.
  • Siegmund KD, Langholz B, Kraft P, Thomas DC. Testing linkage disequilibrium in sibships. Am J Hum Genet. 2000;67:244–248. [PMC free article] [PubMed]
  • Spielman RS, McGinnis RE, Ewens WJ. Transmission test of linkage disequilibrium: the insulin gene region and insulin dependent diabetes mellitus (IDDM) Am J Hum Genet. 1993;52:506–516. [PMC free article] [PubMed]
  • Spielman RS, Ewens WJ. A sibship test for linkage in the presence of association: the sib transmission/disequilibrium test. Am J Hum Genet. 1998;62:450–458. [PMC free article] [PubMed]
  • StataCorp. Stata statistical software: Release 9. College station TX USA: StataCorp LP; 2005.

Articles from Wiley-Blackwell Online Open are provided here courtesy of Wiley-Blackwell, John Wiley & Sons


Related citations in PubMed

See reviews...See all...

Cited by other articles in PMC

See all...


  • MedGen
    Related information in MedGen
  • PubMed
    PubMed citations for these articles

Recent Activity

Your browsing activity is empty.

Activity recording is turned off.

Turn recording back on

See more...