
# A General Test of Association for Quantitative Traits in Nuclear Families

## Summary

High-resolution mapping is an important step in the identification of complex disease genes. In outbred populations, linkage disequilibrium is expected to operate over short distances and could provide a powerful fine-mapping tool. Here we build on recently developed methods for linkage-disequilibrium mapping of quantitative traits to construct a general approach that can accommodate nuclear families of any size, with or without parental information. Variance components are used to construct a test that utilizes information from all available offspring but that is not biased in the presence of linkage or familiality. A permutation test is described for situations in which maximum-likelihood estimates of the variance components are biased. Simulation studies are used to investigate power and error rates of this approach and to highlight situations in which violations of multivariate normality assumptions warrant the permutation test. The relationship between power and the level of linkage disequilibrium for this test suggests that the method is well suited to the analysis of dense maps. The relationship between power and family structure is investigated, and these results are applicable to study design in complex disease, especially for late-onset conditions for which parents are usually not available. When parental genotypes are available, power does not depend greatly on the number of offspring in each family. Power decreases when parental genotypes are not available, but the loss in power is negligible when four or more offspring per family are genotyped. Finally, it is shown that, when siblings are available, the total number of genotypes required in order to achieve comparable power is smaller if parents are not genotyped.

## Introduction

Increasingly large numbers of single-nucleotide polymorphisms are available in public and private databases (Collins et al. 1997). The emergence of high-throughput methods for their analysis holds promise for saturation mapping of human complex-disease loci (Risch and Merikangas 1996; Chakravarti 1998; Lander 1999). Whereas allele-sharing methods of linkage analysis can localize disease genes to broad chromosomal regions, in complex diseases their resolution is often poor. Accordingly, the effort spent in generating an increasingly fine map provides rapidly diminishing returns when these conventional methods of linkage analysis are used (Kruglyak 1997).

In outbred samples, allelic association due to linkage disequilibrium is expected to operate over very short distances. Appropriately designed tests of association that use family-based controls to account for population substructure can provide direct tests of linkage disequilibrium and efficient fine-mapping tools. These tests should have much greater power when a fine map is available, and their high resolution should be well suited to identification of candidate genes.

The most popular of these family-based tests of association is the transmission/disequilibrium test (TDT), which was introduced by Spielman et al. (1993) as a test of linkage in the presence of allelic association. When either a single affected child is tested in each family or, with appropriate adjustments (Martin et al. 1997), multiple children are tested, it is often used as a test of linkage disequilibrium (i.e., a test of the joint hypothesis of linkage and association). The TDT was designed for the analysis of dichotomous traits, and a number of refinements have been proposed to allow, for example, the use of siblings as controls (Curtis 1997) and increased power in the presence of imprinting or dominance (Weinberg et al. 1998).

It has been shown that tests of transmission disequilibrium require a larger number of families in order to achieve comparable power when siblings, rather than parents, are used to construct controls (Curtis 1997). However, it is not always practical to collect parents, and attempting to deduce parental genotypes is fraught with pitfalls (Curtis and Sham 1995). Ideally, TDT-like tests should use parental genotypes when available and sibling genotypes otherwise, to consider all available information in the most efficient manner possible.

For many complex diseases, quantitative phenotype scores contain more information than is provided by dichotomous traits. Quantitative traits can provide effective descriptions of conditions as varied as asthma, type II diabetes, learning difficulties, and osteoporosis. The use of quantitative traits is well established in linkage studies, and these traits should be equally effective in family-based tests of association. Allison (1997) and Rabinowitz (1997) introduced family-based linkage tests for quantitative traits; like the TDT, these tests use parental genotypes to construct well-matched controls and are tests of linkage disequilibrium in simplex families. Fulker et al. (1999) described an analogous test for sib-pair data that does not use parental genotypes.

Here we present a general linkage-disequilibrium test that is applicable to the analysis of quantitative traits in nuclear families of any size and that optionally uses parental genotypes. The method builds on the recent approach of Fulker et al. (1999), in that association effects are partitioned into between- and within-family components. The model also makes use of the powerful and flexible variance-components framework, to construct tests of linkage, linkage disequilibrium, and population admixture that use information from all available offspring. In addition to extending this model to accommodate nuclear families of any size, we derive the expectations of the model parameters and show that a test of the within-family component is indeed free of confounding population-substructure effects, regardless of the composition of nuclear families. We also show that admixture impacts the between-family–component estimate when samples from a number of population strata are combined. This general model encompasses the specific test and study design of Fulker et al. (1999), as well as that of Rabinowitz (1997). The properties of the model in terms of power and error rates are explored in a number of situations, including moderate sample sizes and violation of the multivariate normality assumption underlying variance-components methods. Power and optimal study designs in terms of parental information and family size are also examined.

## Maximum-Likelihood Tests of Association

Consider a candidate diallelic marker, *M,* with alleles
arbitrarily designated as “1” (with frequency *p*)
and “2” (with frequency *q*=1-*p*) and an
additive genetic value *a.* Note that our usage of the additive
genetic value refers to the observed marker, not the trait locus, and that
*a*≠0 only when the marker locus is either the
trait locus or in disequilibrium with it. Also, as long as the phase of the
association is the same in all subpopulations,
*a*=0 only when there is no linkage disequilibrium.
Given a set of *i*=1…*K* nuclear families, each
with *n*_{i} children so that the total
number of offspring is
*N*=Σ_{i}*n*_{i}, define the
marker phenotype *m*_{ij} and the
genotype score *g*_{ij} for the
*j*th offspring
(*j*=1…*n*_{i}) in the *i*th
family as *m*_{ij}=*number* of
“1” alleles at locus *M,* and
*g*_{ij}=*m*_{ij}-1. If both parental
genotypes are known, label their analogous genotype scores
*g*_{iM} and
*g*_{iF} for the male and female parent,
respectively.

Following the usual biometric model (Falconer 1989), we assume that the phenotype scores for the trait of interest are defined by a major-gene effect, familial effects (which include the effects of shared environment and half the additive polygenic variance), and a residual environmental component. The expected means of the residual resemblance and unique environmental effects are assumed to be 0, so that

$$E(y_{ij})=\mu+g_{ij}a \tag{1}$$

and, for the offspring in each family, the *n*_{i}×*n*_{i} variance-covariance matrix, Ω_{i}, has elements

$$\Omega_{ijk}=\begin{cases}\sigma^2_a+\sigma^2_s+\sigma^2_e & \text{if } j=k\\ \pi_{ijk}\sigma^2_a+\sigma^2_s & \text{if } j\neq k\end{cases} \tag{2}$$

where
π_{ijk} denotes the proportion of alleles shared
identical by descent (IBD) between siblings *j* and
*k* in family *i,*
σ^{2}_{a} is the additive
genetic variance of the major gene,
σ^{2}_{s} is the residual
sibling resemblance, and
σ^{2}_{e} is the residual
environmental variance component. Note that these expectations do not
include any dominance variance, although the general method can easily
accommodate such effects (Fulker et al. 1999;
also discussed below).

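Variance-components approaches allow simultaneous modeling of the means and variances, so that all the information in a set of related individuals can be used to construct a test of association. For a particular means model, such as

$$E(y_{ij})=\mu+\beta_a g_{ij} \tag{3}$$

and for estimates of all of the variances in Ω_{i}, the likelihood of the data for the complete set of parameters, θ=[μ,β_{a},σ^{2}_{a},σ^{2}_{s},σ^{2}_{e}], is

$$L=\prod_{i=1}^{K}(2\pi)^{-n_i/2}\,|\Omega_i|^{-1/2}\exp\left\{-\tfrac{1}{2}\,[\mathbf{y}_i-E(\mathbf{y}_i)]'\,\Omega_i^{-1}\,[\mathbf{y}_i-E(\mathbf{y}_i)]\right\} \tag{4}$$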

Evidence
for association can be evaluated by maximization of *L* with the
constraint β_{a}=0
(null-hypothesis likelihood, *L*_{0}) and
without constraints on the parameters (alternative-hypothesis
likelihood, *L*_{1}). Asymptotically, the quantity
2[*ln*(*L*_{1})-*ln*(*L*_{0})]
is distributed as χ^{2}, with df equal to the difference in
number of parameters estimated. Similar likelihood-ratio tests have
been proposed as tests of association between marker and phenotype (e.g.,
see George and Elston 1987), and, in the
absence of population admixture, they are valid tests of linkage
disequilibrium, because
*E*(β_{a})=*a*. For families
with exactly two siblings, this model is the same as the general model of
Fulker et al. (1999).
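
As an illustration of the likelihood-ratio calculation, the following is a minimal sketch restricted to sib pairs, so that the 2×2 covariance matrix can be inverted in closed form. The function names `pair_loglik` and `lrt_pvalue` are hypothetical and are not taken from QTDT; a full analysis would also maximize the likelihood over the variance components, which is not attempted here.

```python
import math

def pair_loglik(y, g, pi_hat, mu, beta_a, var_a, var_s, var_e):
    """Bivariate-normal log-likelihood of one sib pair.

    y, g   : trait values and genotype scores (number of "1" alleles minus 1)
    pi_hat : estimated proportion of alleles shared IBD by the two sibs
    """
    # Residuals from the means model E(y) = mu + beta_a * g
    r1 = y[0] - (mu + beta_a * g[0])
    r2 = y[1] - (mu + beta_a * g[1])
    # Elements of the 2x2 variance-covariance matrix
    v = var_a + var_s + var_e        # diagonal: total trait variance
    c = pi_hat * var_a + var_s       # off-diagonal: sib covariance
    det = v * v - c * c
    # Quadratic form r' Omega^{-1} r using the closed-form 2x2 inverse
    quad = (v * r1 * r1 - 2.0 * c * r1 * r2 + v * r2 * r2) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad

def lrt_pvalue(ln_l1, ln_l0):
    """Statistic 2[ln(L1) - ln(L0)] and its asymptotic P value on 1 df."""
    stat = max(0.0, 2.0 * (ln_l1 - ln_l0))
    # Chi-square(1) survival function: P(X > x) = erfc(sqrt(x / 2))
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

Summing `pair_loglik` over families gives ln(*L*) for a fixed parameter set; the two maximized values, with and without the constraint β_{a}=0, are then passed to `lrt_pvalue`.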

## Population Admixture

We allow for the most extreme form of population admixture, in which
each family is drawn from a different stratum. Define
μ_{i},
*p*_{i}*,* and
*q*_{i}*,* the phenotypic mean
and allele frequencies for the stratum from which family *i* was
drawn. Assume that within each subpopulation there is random mating and
random transmission of parental alleles to offspring
and that the total sample of *N* individuals is centered on mean
0; that is,
μ=Σ_{i}*n*_{i}μ_{i}=0.

In this situation, the expectation given in
equation (1) can be expressed as
*E*(*y*_{ij})=*E*(μ_{i}+*g*_{ij}*a*)=μ_{i}+(*p*_{i}-*q*_{i})*a*,
and the null hypothesis of no linkage disequilibrium is no longer
encompassed in a test of β_{a}=0.
As shown in Appendix A, in this case,
*E*(β_{a})=Σ_{i}*n*_{i}(*p*_{i}-*q*_{i})μ_{i}/(*NV*_{g})+*a*,
where *V*_{g} is the variance of the
genotypic scores. The numerator of the first term in this expression
represents “spurious” association, at the population level,
that is independent of linkage.
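
The size of the spurious term can be illustrated numerically. The sketch below evaluates the expression for *E*(β_{a}) given above; it assumes Hardy-Weinberg proportions within each stratum when computing *V*_{g}, and the function name and the stratum means used in the example are hypothetical.

```python
def expected_beta_a(strata, a):
    """Expected beta_a under admixture.

    strata : list of (n_offspring, p, mu) tuples, one per stratum, where p is
             the frequency of allele "1" and mu the phenotypic mean; the mu
             values are assumed to satisfy sum(n * mu) = 0.
    a      : additive genetic value at the marker
    """
    total = sum(n for n, _, _ in strata)
    # Mean genotype score in a stratum is p - q = 2p - 1
    mean_g = sum(n * (2.0 * p - 1.0) for n, p, _ in strata) / total
    # E(g^2) = p^2 + q^2 under random mating (g takes the values -1, 0, 1)
    mean_g2 = sum(n * (p * p + (1.0 - p) ** 2) for n, p, _ in strata) / total
    v_g = mean_g2 - mean_g ** 2
    spurious = sum(n * (2.0 * p - 1.0) * mu for n, p, mu in strata) / (total * v_g)
    return spurious + a
```

For two equally sized strata with *p*=.7 and *p*=.3 and illustrative means of +2 and −2, the function returns a value near 1.38 even when *a*=0; when the allele frequencies are equal across strata, it returns *a*.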

## Orthogonal Decomposition of the Genotype Scores

Fulker et al. (1999) proposed that, for
sib-pair data, the genotype score could be decomposed into
orthogonal between-family (*b*) and within-family
(*w*) components, in which only the former is sensitive to
population structure and in which the latter is significant only in the
presence of linkage disequilibrium. The means model under this
specification
is

$$E(y_{ij})=\mu+\beta_b b_i+\beta_w w_{ij} \tag{5}$$

where
*b*_{i} and
*w*_{ij} are orthogonal between-
and within-family components of
*g*_{ij}*.* We extend
this model to accommodate any number of offspring, with or without parental
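genotypes, as

$$b_i=\begin{cases}(g_{iF}+g_{iM})/2 & \text{if parental genotypes are available}\\ \sum_j g_{ij}/n_i & \text{otherwise}\end{cases}\qquad w_{ij}=g_{ij}-b_i \tag{6}$$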

so
that *b*_{i} is the expectation of each
*g*_{ij} conditional on family data and
*w*_{ij} is the deviation from
this expectation for offspring *j.* Note that, when
*b*_{i}=(Σ_{j}*g*_{ij})/*n*_{i},
all possible siblings are considered and parental data are ignored,
whereas, when
*b*_{i}=(*g*_{iF}+*g*_{iM})/2,
only parental data are used (for an example of how
*b*_{i} and
*w*_{ij} are scored, with and without
parental data, see table 1).
Positive values of *w*_{ij} indicate that
a child inherits more copies of allele “1” than would be
expected, whereas negative values refer to inheritance of an excess of
allele “2.” For sib-pair data for which parental
genotypes are not available, which is the situation described by Fulker et
al. (1999),
*w*_{i1}=-*w*_{i2} for any
*i,* so that different alleles are tested in each member of a
sib pair and the distribution of β_{w} is
unaffected by linkage in the absence of association.
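
The scoring of the two components can be sketched in a few lines. The function below is illustrative only (the name `decompose` is hypothetical); it takes the between-family component to be the mean parental score when parents are supplied, which is the expectation of each offspring score given the parents, and the sibship mean otherwise.

```python
def decompose(offspring_g, parents_g=None):
    """Return (b_i, [w_ij]) for one family.

    offspring_g : genotype scores of the offspring (-1, 0, or 1)
    parents_g   : optional pair of parental genotype scores
    """
    if parents_g is not None:
        # Expected offspring score given both parents
        b = (parents_g[0] + parents_g[1]) / 2.0
    else:
        # Sibship mean when parents are unavailable or unused
        b = sum(offspring_g) / len(offspring_g)
    # Within-family component: deviation of each child from b
    w = [g - b for g in offspring_g]
    return b, w
```

For example, two children who are both 1/1 homozygotes (score 1) with two heterozygous parents (score 0) give `decompose([1, 1], (0, 0))` equal to `(0.0, [1.0, 1.0])`, whereas `decompose([1, 1])` gives `(1.0, [0.0, 0.0])`: without parental genotypes, this family carries no within-family information.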

Fulker et al. (1999) suggested that the
β_{w} regression coefficient, when estimated in
the context of a variance-components model and in the absence of
population admixture, provides a direct estimate of the additive genetic
value *a.* Allowing for population admixture and extending the
model to allow both for sibships of any size and for the inclusion of
parental genotype data, we derive the expectation of
β_{b} and β_{w} for
each of these two alternative definitions and show that all the
“spurious” association between genotype score and phenotype is
accounted for by β_{b} and that the
β_{w} regression coefficient remains a direct
estimate of the additive genetic value *a* (at the marker).

By use of the normal equations, equation
(5) can be expressed as
**y**=**Xb**+**e**,
so that
**X′Xb**=**X′y**, where **X** is the design matrix and
**b** is the vector of regression coefficients.
**X′X** and **X′y** are asymptotically the covariance
matrices among the independent variables and between the independent and dependent variables, respectively. To
solve these equations, only the expectations for the
variances, *V*_{b} and
*V*_{w}, and covariances,
*C*_{b,y} and
*C*_{w,y}, are required, since
*b*_{i} and
*w*_{ij} are orthogonal by design. These
quantities are derived in Appendices B and
C.

## Parental Genotypes Available

As shown in Appendix B,

and

so that

$$E(\beta_b)=\frac{\sum_i n_i(p_i-q_i)\mu_i}{N V_b}+a \quad\text{and}\quad E(\beta_w)=a.$$

Note
that all the population-substructure effects in the means,
μ_{i}, that are apparent in the general
specification of *E*(β_{a}) are
included in the expectation of β_{b}
exclusively. Whereas β_{b} provides a direct
estimate of *a* only if there is no “spurious”
association between *g* and *y* (e.g., when all
μ_{i} are 0 or, since
Σ_{i}*n*_{i}μ_{i}=0,
when *p* and *q* are constant),
β_{w} is independent of any
“spurious” effects and remains a valid estimate of
*a* even when there is admixture.

## Parental Genotypes Unobserved or Unused

As shown in Appendix C,

and

so
that, as *n*_{i}→∞,
the contribution of each family to
*V*_{b},
*V*_{w},
*C*_{b,y}, and
*C*_{w,y} approximates that of the case
in which parents are available (given in the previous section). On solution
of the normal
equations,

$$E(\beta_b)=\frac{\sum_i n_i(p_i-q_i)\mu_i}{N V_b}+a \quad\text{and}\quad E(\beta_w)=a,$$

so
that β_{w} remains a direct estimate of
*a,* independent of population admixture and the number of
offspring observed in each family. In contrast,
β_{b} is again a direct estimate of
*a* only when there is no “spurious” association
between genotype scores, *g,* and phenotype, *y.*
These expectations show that the admixture test of
β_{b}=β_{w} proposed by
Fulker et al. (1999) is valid even when
parental genotypes are included in analysis and when sibships of size >2
are evaluated.
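
These expectations can be checked with a small simulation. The sketch below is not the simulation design used in this paper: it mixes sib pairs from two hypothetical strata (allele frequencies .7 and .3, phenotypic means +4 and −4, residual standard deviation 5), ignores parental genotypes, and fits ordinary least-squares slopes rather than the variance-components model. All names and parameter values are assumptions chosen for illustration.

```python
import random

def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def admixture_demo(n_fam=4000, a=0.0, seed=1):
    """Return (beta_a, beta_b, beta_w) estimated from admixed sib pairs."""
    rng = random.Random(seed)
    # Two hypothetical strata: (frequency of allele "1", phenotypic mean)
    strata = [(0.7, 4.0), (0.3, -4.0)]
    g_all, b_all, w_all, y_all = [], [], [], []
    for fam in range(n_fam):
        p, mu = strata[fam % 2]              # equal sampling proportions
        father = [rng.random() < p, rng.random() < p]
        mother = [rng.random() < p, rng.random() < p]
        # Genotype score of each sib: number of "1" alleles minus 1
        g = [father[int(rng.random() < 0.5)] + mother[int(rng.random() < 0.5)] - 1
             for _ in range(2)]
        b = sum(g) / 2.0                     # sibship mean (parents unused)
        for gj in g:
            g_all.append(gj)
            b_all.append(b)
            w_all.append(gj - b)
            y_all.append(mu + a * gj + rng.gauss(0.0, 5.0))
    # b and w are orthogonal, so separate slopes equal the joint regression
    return slope(g_all, y_all), slope(b_all, y_all), slope(w_all, y_all)
```

With *a*=0, the slopes on *g* and on *b* are expected to be clearly positive because of stratification alone, whereas the slope on *w* is expected to stay near 0.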

## Simulations

A number of simulation studies were conducted to explore the
properties of this orthogonal decomposition of genotype scores as a test of
linkage disequilibrium. Data were simulated in nuclear families each having
one to eight offspring. Trait values were constructed as the sum of a
major-gene effect (with variance
σ^{2}_{a}) generated by an
additive quantitative-trait locus (QTL), *Q,* having two
equifrequent alleles, a residual sibling
correlation (σ^{2}_{s}),
and an environmental effect
(σ^{2}_{e}), each assigned
independently from a normal distribution with mean 0. Except where noted, a
diallelic marker locus *M,* with allele frequencies
*p*=*q*= 1/2,
was simulated with a recombination fraction (θ) of 0. For
convenience, the total trait variance
σ^{2}=σ^{2}_{a}+σ^{2}_{s}+σ^{2}_{e}
was fixed at 100 in all simulations.
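
A minimal sketch of how trait values with this variance structure might be generated is given below. It is not the simulation code used for this paper: the function name is hypothetical, and the additive value is obtained from σ^{2}_{a}=2*pqa*^{2}, which holds for an additive diallelic locus under random mating.

```python
import math
import random

def simulate_family(n_sibs, var_a, var_s, var_e, rng, p=0.5):
    """Trait values for one sibship: major gene + shared effect + environment."""
    # Additive value that yields major-gene variance var_a (var_a = 2pq a^2)
    a = math.sqrt(var_a / (2.0 * p * (1.0 - p)))
    father = [rng.random() < p, rng.random() < p]
    mother = [rng.random() < p, rng.random() < p]
    shared = rng.gauss(0.0, math.sqrt(var_s))   # effect common to all sibs
    traits = []
    for _ in range(n_sibs):
        # QTL genotype score: number of "1" alleles minus 1
        g = father[int(rng.random() < 0.5)] + mother[int(rng.random() < 0.5)] - 1
        traits.append(a * g + shared + rng.gauss(0.0, math.sqrt(var_e)))
    return traits
```

With `var_a=10`, `var_s=30`, and `var_e=60`, the three components sum to the total variance of 100 used throughout the simulations.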

Linkage disequilibrium between the trait and marker locus was
introduced in the parental chromosomes. When appropriate, disequilibrium
was modeled in the usual fashion as
*D*=*p*_{M1Q1}-*p*_{M1}*p*_{Q1} (*p*_{M1Q1}
is the frequency of the haplotype with alleles
*M*_{1} and *Q*_{1}, and
*p*_{M1} and
*p*_{Q1} are the frequencies of
alleles *M*_{1} and *Q*_{1};
Lewontin and Kojima 1960), so that
*D*_{max}=*min*(*p*_{M1},*p*_{Q1})-*p*_{M1}*p*_{Q1},
and the standardized disequilibrium coefficient is
*D*/*D*_{max}.
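
These definitions translate directly into code. The helper names below are hypothetical, and, like the expression for *D*_{max} above, the standardized coefficient is only meaningful for *D*≥0.

```python
def disequilibrium(p_m1q1, p_m1, p_q1):
    """Return (D, D_max, D / D_max) for positive disequilibrium."""
    d = p_m1q1 - p_m1 * p_q1
    d_max = min(p_m1, p_q1) - p_m1 * p_q1
    ratio = d / d_max if d_max > 0 else float("nan")
    return d, d_max, ratio

def haplotype_freqs(p_m1, p_q1, d):
    """Four haplotype frequencies implied by the allele frequencies and D."""
    return {
        "M1Q1": p_m1 * p_q1 + d,
        "M1Q2": p_m1 * (1.0 - p_q1) - d,
        "M2Q1": (1.0 - p_m1) * p_q1 - d,
        "M2Q2": (1.0 - p_m1) * (1.0 - p_q1) + d,
    }
```

For equifrequent alleles at both loci and a haplotype frequency *p*_{M1Q1}=.375, `disequilibrium(0.375, 0.5, 0.5)` returns `(0.125, 0.25, 0.5)`, that is, half of the maximum disequilibrium.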

Where noted, population admixture was generated by the mixing of
families drawn from one of two populations (A and B) with different
phenotypic means (μ_{A} and μ_{B}) and marker
allele frequencies (*p*_{A}=.7 and
*p*_{B}=.3) in equal sampling
proportions. μ_{A} and μ_{B} were selected such
that admixture accounted for 20% of the total phenotypic variance in the
combined population.

Except where noted, parental genotypes were used to estimate π by
use of information available from the single-marker locus (Haseman
and Elston 1972). By use of the
variance-components model (eq. [2]) and
the orthogonal model for the means (eq. [5]),
the likelihood was maximized under the null
(β_{w}=0) and alternative
hypotheses, to calculate *L*_{0} and
*L*_{1}, respectively (eq.
[4]). As suggested by Searle et al.
(1992), the variance-component
estimates were constrained to be positive. To examine the benefits of
parental information, each simulated data set was examined under the two
alternative definitions of *b*_{i} and
*w*_{ij} (see eq.
[6]). Only families having at least one heterozygous parent (when
parental genotypes were used) or at least two different types of offspring
(when parental genotypes were ignored) were included in the likelihood
calculations, since other families do not contribute to estimates of
β_{w}. For the purpose of our simulations,
power and error rates were defined as the proportion of simulations
exceeding nominal significance levels for the χ^{2}
distribution under the likelihood-ratio criteria.

It is well known that segregation of a major locus violates the
multivariate normality assumption and that maximum-likelihood
estimates of the variance components can be biased in small samples (Amos
et al. 1996). To characterize the effect that
this bias has on the present test, error rates of
β_{w} were examined for a range of sample sizes
(i.e., 120–1,920 offspring) and family structures (1–8
children), in the following situations: (1) no sibling correlation or
major-gene effect
(σ^{2}_{s}=σ^{2}_{a}=0)
in homogeneous and admixed populations, (2) a large
major-gene effect
(σ^{2}_{a}=30) or a large
residual sibling correlation
(σ^{2}_{s}=50), and (3)
both residual sibling correlation and major-gene effect
(σ^{2}_{a}=20,
σ^{2}_{s}=30). Five
thousand simulated data sets were examined in each test case. No
disequilibrium was modeled in any of these simulations.

The effects that family structure and linkage disequilibrium have on
power were examined in a sample of 480 total offspring when
σ^{2}_{a}=10 and
σ^{2}_{s}=30, by varying
*D* between 0 and *D*_{max} and varying,
from one to eight, the number of offspring in each family. In these
assessments, the total number of offspring sampled was fixed, so that the
number of families varied according to sibship size. Finally,
the sensitivity of the test to linkage disequilibrium was estimated for a
variety of sample sizes and family structures. Sensitivity was defined as
the most stringent significance level that could be obtained with 80% power
in simulated data sets in which the trait and marker loci were identical.
When power and test sensitivity were examined, 1,000 simulated data sets
were analyzed in each case.

## Permutation Test

For each family, the vector **w**_{i}
denotes the observed pattern of allelic transmission. In the absence of
linkage disequilibrium, the vectors **w**_{i}
and **−w**_{i} are equally likely, as
long as there is no segregation distortion. Construct a random permutation
of any set of *K* families by replacing each
**w**_{i} with either itself or
**−w**_{i}*,* with equal
probability, so that, for any given data set, there are
2^{K} different permutations of the data. The
distribution of the maximum likelihood of the data,
*L*_{1}, under the hypothesis of no linkage
disequilibrium is the distribution of the maximum likelihoods of these
2^{K} permuted data sets. When the distribution of
2[*ln*(*L*_{1})-*ln*(*L*_{0})]
is not well approximated by the χ^{2} distribution, the
distribution of *L*_{1} can be estimated by a sampling
of a large number of permutations and their respective likelihoods. These
empirical *P* values and the likelihood-ratio criterion
were compared in a small sample (120 total offspring) in which the
major-gene effect was introduced by simulation of a dominant QTL
with equally frequent alleles. In this situation, empirical significance
levels should be desirable because both dominance (which both induces
nonnormality into the trait distribution and makes the variances model
[eq. {2}] incomplete) and the small sample
size are expected to reduce the accuracy of asymptotic significance levels.
For dichotomous traits, the increased error rates of
likelihood-ratio tests in small samples, as well as their spurious
effect on power, have been described by Whittaker and Thompson
(1999). The rationale for permutation tests
in linkage studies has been discussed by Wan et al.
(1997).
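
The sign-flipping scheme can be sketched as follows. To keep the example short and self-contained, the sketch substitutes the simple statistic |Σ*w*_{ij}*y*_{ij}| for the maximized likelihood, so it is not the procedure evaluated in this paper, in which the variance-components model is refitted for each permuted data set. The function name, the add-one form of the empirical *P* value, and the default of 1,000 permutations are assumptions.

```python
import random

def permutation_pvalue(w_by_family, y_by_family, n_perm=1000, seed=0):
    """Empirical P value from random sign flips of each family's w vector."""
    rng = random.Random(seed)
    # Family contribution to sum(w * y); replacing w_i by -w_i flips its sign
    contrib = [sum(w * y for w, y in zip(ws, ys))
               for ws, ys in zip(w_by_family, y_by_family)]
    observed = abs(sum(contrib))
    hits = 0
    for _ in range(n_perm):
        # Keep or flip each family independently with probability 1/2
        stat = abs(sum(c if rng.random() < 0.5 else -c for c in contrib))
        if stat >= observed:
            hits += 1
    # Add-one estimate, so that the P value is never exactly 0
    return (hits + 1) / (n_perm + 1)
```

Because whole families are flipped together, the correlation among siblings within a family is preserved in every permuted data set.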

## Type 1–Error Rates

Error rates in estimates of the β_{w}
parameter in various test cases are
presented in tables 2 and 3. Additional family structures
were examined, but the results were intermediate between those tabulated
and are not shown. A large major gene and sibling resemblance were selected
for description of error rates, to make any possible biases obvious. When
one child from each family was considered, error rates were not influenced
by population admixture or by the effects of the linked major locus or
additional sibling resemblance, so that only a summary error rate over all
test cases is reported.

The error rates for the rows labeled “Null” in tables 2 and 3 should be considered as baseline error rates for the other test cases. Not surprisingly, error rates were closer to nominal significance levels in larger samples, where the likelihood-ratio criterion is more accurate. For very large samples, the asymptotic criteria seem to be appropriate regardless of model or sibship size (tables 2 and 3). In smaller samples, error rates are slightly high for the linkage test case and are slightly low for the admixture and sibling-resemblance test cases, an effect that is more pronounced for larger sibships. When only a small number of observations are considered, estimates of the variance components may be biased (see Hopper and Mathews 1982; Amos et al. 1996), so that these error rates are not a specific feature of the present model but may reflect violations of multivariate normality.

As the sample size increases, bias in the maximum-likelihood
estimates of variance components should be reduced, and the error rate of
asymptotic significance tests should approach its nominal level (Amos et
al. 1996). These small-sample biases
decrease for more-realistic values of
σ^{2}_{a} and
σ^{2}_{s}. It is
interesting to note that, although some bias remains when the variance
components are estimated by maximum likelihood, these biases are much
larger if only σ^{2}_{e} is
considered: for example, if the linkage information is ignored and a
traditional least-squares regression framework is adopted, modeling
only σ^{2}_{e} and
the association parameters, the error rates exceed 11% at the nominal .05
significance level, for all sample sizes examined in the eight-sib
linkage-test case in tables 2 and 3.

When parental genotypes are not available for analysis
(table 3), it may appear
counterintuitive that, although the biases follow the general trends
described above, they appear to be greater in the two-sib case. The
reason for this increased bias is that
**w**_{i} is not **0** (and therefore
informative) in only a small proportion of sib pairs when parental
genotypes are not available for analysis, so that the effective sample size
is much less than that actually genotyped.
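
The reduction in effective sample size can be quantified by enumeration of mating types. The sketch below, with a hypothetical function name, assumes random mating and computes the proportion of families with at least one heterozygous parent and the proportion of sib pairs whose two members have different genotypes.

```python
from itertools import product

def informative_fractions(p):
    """Return (fraction informative with parents, fraction of sib pairs
    with different genotypes) for a diallelic marker under random mating."""
    q = 1.0 - p
    # Genotype frequencies indexed by the number of "1" alleles
    geno_freq = {2: p * p, 1: 2.0 * p * q, 0: q * q}
    # Probability that a parent with m "1" alleles transmits allele "1"
    transmit = {2: 1.0, 1: 0.5, 0: 0.0}
    with_parents = 0.0
    sib_pair = 0.0
    for f, m in product(geno_freq, repeat=2):
        mating = geno_freq[f] * geno_freq[m]
        if f == 1 or m == 1:
            with_parents += mating           # at least one heterozygous parent
        tf, tm = transmit[f], transmit[m]
        # Offspring genotype distribution for this mating type
        child = {2: tf * tm,
                 1: tf * (1.0 - tm) + (1.0 - tf) * tm,
                 0: (1.0 - tf) * (1.0 - tm)}
        same = sum(v * v for v in child.values())
        sib_pair += mating * (1.0 - same)    # two sibs differ in genotype
    return with_parents, sib_pair
```

For *p*=*q*=1/2, the function returns .75 and .40625, so that fewer sib pairs than parent-offspring families contribute to the estimate of β_{w}.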

## Power Estimates

Results of the power calculations are presented in
table 4 and
figure 1. When parental genotypes
are available, power depends mostly on the amount of disequilibrium between
the trait and marker loci and is largely independent of the number of
offspring in each family (table
4). In contrast, when parental genotypes are not available,
power depends both on family size and on the level of disequilibrium:
larger sibships allow a greater proportion of segregating alleles to be
identified (and scored to be not 0 in the respective
*w*_{ij}), and, thus, power increases
with the number of siblings in each family. As might be expected, for any
family size, power is always greater when parental genotypes are available
for analysis, since all pairs of segregating alleles are evident in this
situation. However, for larger sibships (four or more offspring), it
appears that less than 5% of power is lost when parental genotypes are
unavailable.

**...**

When
*D*/*D*_{max}>50%,
it is possible to achieve considerable power for loci accounting for 10% of
the phenotypic variance, and estimates of β_{w}
are unbiased and essentially exact when there is complete
disequilibrium (table 4). The
relationship between the apparent additive genetic value of the marker
locus, β_{w}, the disequilibrium
coefficient, *D,* and additive genetic effect at the QTL is very
close to
*a*_{marker}=*a*_{QTL}*D*/(*p*_{QTL}*q*_{QTL})
(Falconer 1989; Fulker et al.
1999). This implies that the proportion of
variance explained by the association parameter is a function of the
squared disequilibrium coefficient *D*^{2}
and the additive genetic variance of the trait locus, so that it is not
surprising that power is very sensitive to *D.* When
*D*=*D*_{max} and the QTL and
marker-allele frequencies are the same, all of the genetic variance
attributable to the QTL will be encompassed in the
allelic-association parameter, and estimates of the residual
additive genetic variance
(σ^{2}_{a}) will equal
0.
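
The dependence on *D*^{2} can be made concrete with a short calculation. The sketch below assumes equal allele frequencies at the marker and the QTL (as in the simulations) and an additive locus with variance 2*pqa*^{2}; the function names are hypothetical.

```python
def apparent_effect(a_qtl, d, p):
    """Apparent additive value at the marker: a_QTL * D / (p * q)."""
    q = 1.0 - p
    return a_qtl * d / (p * q)

def variance_captured(var_a_qtl, d, p):
    """Variance attributed to the marker when marker and QTL share frequency p."""
    q = 1.0 - p
    # Additive value at the QTL implied by its variance, var = 2pq a^2
    a_qtl = (var_a_qtl / (2.0 * p * q)) ** 0.5
    a_marker = apparent_effect(a_qtl, d, p)
    return 2.0 * p * q * a_marker ** 2
```

With *p*=1/2 and a QTL variance of 10, complete disequilibrium (*D*=.25) recovers the full variance of 10, whereas *D*/*D*_{max}=.5 (*D*=.125) recovers only 2.5, a quarter of it.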

Figure 1 shows the most
stringent significance level, α, that can be applied when 80% power
is achieved, as a function of both the total number of offspring sampled
and the additive genetic variance of the QTL, when the trait locus is
observed. As noted previously, when parental genotypes are available, power
does not depend greatly on the number of offspring in each family, so that
only the sib-pair case has been plotted. When parents are
unavailable, it is clear that achievable significance levels rapidly
approach that of the case in which parents are available, as the number of
children in each family increases. When these plots are used, an arbitrary
significance level may be selected for a given sample size and study
design. Alternatively, for a desired number of independent tests and
correspondingly adjusted significance level, an appropriate sample size and
study design may be selected. For example, for the hypothetical genome
screens proposed by Risch and Merikangas
(1996),
α=5×10^{-8} and
,
so that, as can be seen in
figure 1*A*, either
~350 sib pairs with parents (700 total offspring but 1,400 genotypes)
or ~500 sib pairs without parental information (1,000 total
offspring/genotypes) or ~260 sib trios without parents (800 total
offspring/genotypes) are needed for 80% power and a locus that accounts for
10% of phenotypic variance. It is noteworthy that the total number of
genotypes required is smaller when parental information is not used. In
practice, the marker locus and trait locus will often not be identical, so
these represent best-case estimates of power.
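
The genotyping totals quoted above follow from simple counting, as in this small illustrative helper (the function name is hypothetical):

```python
def genotypes_required(n_families, n_sibs, parents_genotyped):
    """Total number of individuals genotyped for a given study design."""
    per_family = n_sibs + (2 if parents_genotyped else 0)
    return n_families * per_family
```

The three designs compared in the text give `genotypes_required(350, 2, True)` = 1,400, `genotypes_required(500, 2, False)` = 1,000, and `genotypes_required(260, 3, False)` = 780, the last of which is rounded to 800 above.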

## Evaluation of the Permutation Test

To assess the performance of the permutation test, a series of
simulations was undertaken in which dominance variance was introduced to
skew the phenotypic distribution and to violate multivariate normality. In
an attempt to introduce further departures from asymptotic expectations, we
omitted the appropriate dominance-variance parameter from the
variance-components portion of the model. The results of these
simulations are presented in table
5. The performance of the empirical permutation test was
examined in 5,000 samples of 120 offspring (in families with 1 to 8
children) when a dominant major gene
(*H*^{2}=.3, *s*^{2}=.3) was
segregating (θ = 0). When *D*=0, the overall
error rate at the .05 significance level was .06 when asymptotic
expectations were used but was .05 when empirical significance levels were
calculated from 1,000 permutations of each data set. When
*D*=*D*_{max}, power at the .05
significance level was 56% for asymptotic significance levels but was 52%
when empirical significance levels were calculated from 1,000 permutations
of each data set. For larger samples in which the model is well specified,
such as those in table 4, we find
no significant advantages in this permutation test.

## Discussion

The orthogonal model that we propose is a generalization of the one
described by Fulker et al. (1999) for
sib-pair data. It allows for the optional inclusion of parental
data, which greatly increases power, and for the analysis of larger
sibships, in which identification of segregating
alleles is more efficient. For large sample sizes or when empirical
significance levels are calculated by the permutation method described, the
test is robust to a variety of biases, including linkage, background
familiality, and population stratification. Also,
when parental data are used, the test of the
within-family–association parameter
β_{w} is asymptotically a test of
*E*(*wy*)=0 and is equivalent to that
described by Rabinowitz (1997) without the
benefit of the variance-components framework. Other linear models,
such as that proposed by Allison (1997), could
be used with the variance-components approach, but the orthogonal
model is attractive because it both provides direct estimates of the
additive genetic value of the marker alleles and can be used when parental
genotypes are unavailable. It is important to emphasize that the present
approach treats linkage and association separately. Consequently, in
contrast to other methods, which provide tests of disequilibrium only in
minimal family configurations (Spielman et al.
1993; Allison
1997; Rabinowitz
1997; Allison et al.
1999), the orthogonal method does not detect
linkage in the absence of disequilibrium in nuclear families of any
configuration.

Although, in our simulations, power depended mostly on the major-gene–effect size and on the total number of offspring available for analysis, in practice it is undesirable to rely on a small number of large families, because they might represent very few alleles. However, moderate-size families (i.e., three to four sibs) might be more attractive than sib pairs, because they provide much greater power when parents are not available and require less genotyping effort when parents are available. The observation that, for multiplex families, the number of individuals that need to be genotyped in order to achieve comparable power is smaller when parental genotypes are not used is important in situations in which genotyping capacity is limited.

Obviously, power is very sensitive to disequilibrium, so that this test and other, related approaches are well suited for the analysis of dense maps. In practice, it may not be practical to use these dense maps for genome screens, but they can be used to follow up suggestive linkages that have been identified by allele-sharing methods on a more sparse map. When a dense marker map becomes available, it can be used to produce multipoint estimates of IBD. In the variance-components side of the model, better IBD estimates allow the fitted variance-covariance matrix to better approximate the true variances and covariances, improving the performance of the model, in terms of both power and error rates.

The model can be easily extended to allow for multiallelic markers
with up to *X* alleles, by inclusion of a separate
between- and within-family component for alleles 1 through
*X*-1, and, in this situation, no changes to the
variance model should be required. In other situations, it might be
appropriate to define either dominance genotype scores or, when imprinting
is suspected, separate paternal and maternal genotype scores. These
alternative genotype scores can be decomposed into orthogonal components by
taking either the sibling average or its asymptotic expectation derived
from the parental genotypes. However, these modifications require either
changes to the variance model or calculation of empirical *P*
values, by analogous permutation tests.
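
The decomposition into orthogonal components can be sketched in a few lines of Python. This is a minimal illustration, not the QTDT implementation: the function name `orthogonal_components` is hypothetical, genotype scores are assumed to be additive (coded -1, 0, 1), and the parental average is assumed to be the expectation of the sibling average when parental genotypes are available.

```python
def orthogonal_components(sib_scores, parent_scores=None):
    """Split the genotype scores of one family into between (b) and within (w) parts.

    sib_scores: genotype scores g_ij for the offspring of the family.
    parent_scores: optional (father, mother) scores. When given, b is the
        parental average (assumed expectation of the sibling average for
        additive scores); otherwise b is the observed sibling average.
    """
    if parent_scores is not None:
        b = sum(parent_scores) / len(parent_scores)
    else:
        b = sum(sib_scores) / len(sib_scores)
    # Within-family component: deviation of each sib from the family value.
    w = [g - b for g in sib_scores]
    return b, w


# Sibs only: b is the sibling average, so the w components sum to zero.
b_sibs, w_sibs = orthogonal_components([1, 0, -1])

# With parents: b is fixed by the parental scores, whatever the sibs carry.
b_par, w_par = orthogonal_components([1, 0], parent_scores=(0, 1))
```

When only siblings are used, the within-family components of a family sum to zero by construction, which is why a family with a single genotyped offspring and no parents contributes no within-family information.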

Hopper and Mathews (1982) have described a number of methods for verifying that multivariate normality assumptions are not grossly violated. The permutation test that we describe should allow this orthogonal model to be applied in situations in which multivariate normality is violated—for example, when the sample size is small or the trait distribution has been skewed by selection (e.g., see Allison 1997) or when the model for variances may be inappropriate. However, asymptotic significance levels are still appropriate in most situations examined here and can be a useful tool in prescreening, to conserve computing resources.
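
The way empirical *P* values of this kind can be computed is illustrated by the following Python sketch. It rests on two assumptions made only for illustration: permuted data sets are generated by flipping the sign of all within-family components in a family with probability 1/2, and a simple cross-product statistic stands in for the variance-components likelihood-ratio test. The function names are hypothetical.

```python
import random


def within_statistic(families):
    """Stand-in statistic: absolute cross-product of w and y over all sibs."""
    return abs(sum(w * y for family in families for w, y in family))


def permutation_p_value(families, n_perm=1000, seed=0):
    """Empirical P value from family-wise sign flips of the w components.

    families: list of families, each a list of (w, y) pairs, where w is the
        within-family genotype component and y the (centered) trait value.
    """
    rng = random.Random(seed)
    observed = within_statistic(families)
    hits = 0
    for _ in range(n_perm):
        permuted = []
        for family in families:
            # Assumed permutation scheme: one random sign per family.
            sign = rng.choice((-1, 1))
            permuted.append([(sign * w, y) for w, y in family])
        # Small tolerance guards against floating-point ties.
        if within_statistic(permuted) >= observed - 1e-12:
            hits += 1
    # Counting the observed data set keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Twelve sib pairs in which the trait tracks the within-family component.
example = [[(0.5, 0.5), (-0.5, -0.5)] for _ in range(12)]
p_value = permutation_p_value(example)
```

Because the number of permutations determines both the resolution of the empirical *P* value and the computing cost, screening with asymptotic significance levels first and permuting only the promising markers is one way to use such a procedure economically.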

## Acknowledgments

We wish to thank Dr. Dan Weeks for helpful comments on an early version of the manuscript. This work has been funded by the Wellcome Trust. G.R.A. is supported by a Wellcome Trust Prize Studentship. W.O.C.C. is a Wellcome Trust Senior Clinical Research Fellow. QTDT, a computer program implementing the orthogonal test of association described herein, is available at the Wellcome Trust Centre for Human Genetics Website.

## Appendix A: *E*(β_{a}) with Allowance for Population Admixture

Consider population admixture by defining μ_{i}, *p*_{i}, and *q*_{i} as the phenotypic mean and the marker-allele frequencies for the population from which family *i* was drawn (allowing for up to *K* different subpopulations). Assume that within each subpopulation there is random mating and random transmission of parental alleles to offspring and that the total sample of *N* individuals is centered on mean 0, so that μ = Σ*n*_{i}μ_{i} = 0. In this situation,

and

so that, for model (3), when the standard expectations *V*_{x} = *E*(*x*^{2}) - *E*(*x*)^{2} and *C*_{x,y} = *E*(*xy*) - *E*(*x*)*E*(*y*) are used, for any *x* and *y*,

where

These expectations extend those of Cardon (in press).
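
The effect of admixture on the total-association parameter can be illustrated numerically with the standard expectations *V*_{x} and *C*_{x,y}. The Python sketch below is an illustration under stated assumptions rather than a transcription of the appendix: genotype scores are assumed to be coded -1, 0, 1, so that within a randomly mating subpopulation *E*(*g*) = *p* - *q* and *E*(*g*^{2}) = 1 - 2*pq*; the marker is assumed to have no effect on the trait; and the function name is hypothetical.

```python
def expected_beta_a(subpops):
    """Expected slope C_{g,y} / V_g when the marker has no effect on the trait.

    subpops: list of (weight, trait_mean, p) tuples, one per subpopulation,
        where weight is the fraction of the sample drawn from it (weights
        sum to 1) and p is the frequency of allele 1.
    Assumes scores -1/0/1 and random mating within each subpopulation.
    """
    e_g = sum(wt * (2 * p - 1) for wt, mu, p in subpops)           # E(g)
    e_y = sum(wt * mu for wt, mu, p in subpops)                    # E(y)
    e_g2 = sum(wt * (1 - 2 * p * (1 - p)) for wt, mu, p in subpops)  # E(g^2)
    # With no marker effect, g and y are independent within a subpopulation.
    e_gy = sum(wt * mu * (2 * p - 1) for wt, mu, p in subpops)     # E(gy)
    v_g = e_g2 - e_g ** 2
    c_gy = e_gy - e_g * e_y
    return c_gy / v_g


# Two equally sized subpopulations that differ in both trait mean and
# allele frequency: the expected slope is nonzero without any true effect.
admixed = expected_beta_a([(0.5, -1.0, 0.2), (0.5, 1.0, 0.8)])

# A single homogeneous population: the expected slope is zero.
homogeneous = expected_beta_a([(1.0, 0.0, 0.3)])
```

In this sketch a spurious slope appears only when subpopulations differ in both trait mean and allele frequency; a difference in one of the two alone leaves the expected covariance at zero.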

## Appendix B: *E*(β_{b}) and *E*(β_{w}) with Use of Parental Genotypes

For the orthogonal model in equation (5), the expectations for *V*_{b}, *V*_{w}, *C*_{b,y}, and *C*_{w,y} are required for solution of the normal equations and to obtain expectations for the regression parameters β_{w} and β_{b}.

Let Σ_{z} denote the sum over all possible mating types *z* and note that *E*(*y*), *E*(*g*), *E*(*g*^{2}), and *E*(*gy*) are as given in Appendix A. Then, when the mating-type frequencies given by Haseman and Elston (1972) are used,

and

These are all the quantities required in order to determine

and

So, on inversion and multiplication,

## Appendix C: *E*(β_{b}) and *E*(β_{w}) with Use of Sibling Genotypes Only

When one parent is heterozygous, consider allelic transmission *t* ~ binomial(*n*_{i}, 1/2), so that *E*(*t*^{2}) = (1/4)(*n*_{i}^{2} + *n*_{i}). When the heterozygous and homozygous parents transmit the same allele *t* times, |*b*| = *t*/*n*_{i}. Thus, for any family *i*,

where *C*_{1} denotes the condition that exactly one parent is heterozygous.

To calculate *E*(*by*), it is convenient to separate *y* into its orthogonal-mean, μ, and major-gene, *ga*, components. Recall that *b*_{i} = (1/*n*_{i})Σ_{j}*g*_{ij}, so that, when one parent is heterozygous, the major-gene component is

When two parents are heterozygous, consider allelic transmission *t*^{′} ~ binomial(2*n*_{i}, 1/2), so that *E*(*t*^{′}) = *n*_{i} and *E*(*t*^{′2}) = (1/4)(4*n*_{i}^{2} + 2*n*_{i}). When allele 1 (or 2) is transmitted *t*^{′} times, |*b*| = |*t*^{′}/*n*_{i} - 1|. So, for any family *i*,

and the major-gene component of *E*(*by*) is

where *C*_{2} denotes the condition that both parents are heterozygous.
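
The binomial moments quoted in this appendix, and what they imply for the second moment of *b*, can be checked by exact enumeration. The Python sketch below is offered as a verification aid only: the function names are hypothetical, and the closed forms for *E*(*b*^{2}) in the comments are derived here from |*b*| = *t*/*n*_{i} and |*b*| = |*t*^{′}/*n*_{i} - 1| rather than copied from the equations of the appendix.

```python
from fractions import Fraction
from math import comb


def binomial_moment(trials, power):
    """Exact E(t**power) for t ~ Binomial(trials, 1/2), by enumeration."""
    total = sum(comb(trials, t) * t ** power for t in range(trials + 1))
    return Fraction(total, 2 ** trials)


def expected_b_squared(n_sibs, het_parents):
    """Exact E(b^2) for a sibship, given 1 or 2 heterozygous parents."""
    if het_parents == 1:
        # |b| = t / n with t ~ Binomial(n, 1/2); works out to (n + 1) / (4n).
        return binomial_moment(n_sibs, 2) / n_sibs ** 2
    if het_parents == 2:
        # b = t'/n - 1 with t' ~ Binomial(2n, 1/2); works out to 1 / (2n).
        first = binomial_moment(2 * n_sibs, 1)
        second = binomial_moment(2 * n_sibs, 2)
        return second / n_sibs ** 2 - 2 * first / n_sibs + 1
    raise ValueError("het_parents must be 1 or 2")


for n in range(1, 7):
    # Moments stated in the text for one and two heterozygous parents.
    assert binomial_moment(n, 2) == Fraction(n * n + n, 4)
    assert binomial_moment(2 * n, 1) == n
    assert binomial_moment(2 * n, 2) == Fraction(4 * n * n + 2 * n, 4)
```

Exact rational arithmetic is used so that the comparison with the closed forms is an identity check rather than a floating-point approximation.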

When *C*_{1} and *C*_{2} are considered, together with *E*(*y*), *E*(*g*), *E*(*g*^{2}), and *E*(*gy*) given in Appendix A, and the mating-type frequencies

and

and, when the component derivations of *E*(*by*) are used,

and

Thus,

and

so that

## Electronic-Database Information

The URL for data in this article is as follows:

## References

**American Society of Human Genetics**

A General Test of Association for Quantitative Traits in Nuclear Families. American Journal of Human Genetics. Jan 2000; 66(1):279.
