
# Multilocus Linkage Tests Based on Affected Relative Pairs

## Abstract

For complex diseases, recent interest has focused on methods that take into account joint effects at interacting loci. Conditioning on effects of disease loci at known locations can lead to increased power to detect effects at other loci. Moreover, use of joint models allows investigation of the etiologic mechanisms that may be involved in the disease. Here we present a method for simultaneous analysis of the joint genetic effects at several loci that uses affected relative pairs. The method is a generalization of the two-locus LOD-score analysis for affected sib pairs proposed by Cordell et al. We derive expressions for the relative risk, λ_{R}, to a relative of an affected individual, in terms of the additive and epistatic components of variance at an arbitrary number of disease loci, and we show how these can be used to fit a likelihood model to the identity-by-descent sharing among pairs of affected relatives in extended pedigrees. We implement the method by use of a stepwise strategy in which, given evidence of linkage to disease at *m*-1 locations on the genome, we calculate the conditional likelihood curve across the genome for an *m*th disease locus, using multipoint methods similar to those proposed by Kruglyak et al. We evaluate the properties of our method by use of simulated data and present an application to real data from families with insulin-dependent diabetes mellitus.

## Introduction

In recent years, several methods have been proposed for the simultaneous detection of multiple loci involved in complex diseases. These include model-based methods, in which a detailed model is specified for the disease mode of inheritance, and nonparametric—or model-free—methods, in which details such as allele frequencies and penetrance functions for the disease are not specified (Elston 1998). Model-based methods include the two-locus LOD-score method described by Lathrop and Ott (1990) and Schork et al. (1993). Model-free methods include the sib-pair methods of Dizier and Clerget-Darpoux (1986) and Knapp et al. (1994), the two-locus marker-association-segregation χ^{2} (MASC) method (Dizier et al. 1994), the two-locus maximum LOD score or maximum-likelihood statistic (MLS) (Cordell et al. 1995; Farrall 1997; Olson 1997) or score statistic (Dupuis et al. 1995), and the two-locus weighted pairwise correlation (WPC) method (Zinn-Justin and Abel 1998). Recently, Cox et al. (1998), Strauch et al. (1998), Xu et al. (1998), and Cordell et al. (1999) have further investigated the use of increasing the power to detect an effect by conditioning on an effect at a previously identified disease-gene location. Although all of these methods are ultimately aimed at detection of effects at multiple interacting loci, in practice they have normally been restricted to the analysis of no more than two loci, because of either theoretical or computational constraints. In addition, the methods proposed by Dizier and Clerget-Darpoux (1986), Knapp et al. (1994), and Cordell et al. (1995) are restricted to sib pairs or affected sib pairs (ASPs), which may be a convenient unit of sampling but which means that we discard information from other affected relatives when they are available. The methods of Lathrop and Ott (1990) and Zinn-Justin and Abel (1998), although not restricted to sib pairs, require specification of the true (two-locus) genetic model or class of models. 
In addition, the model-based approach of Lathrop and Ott (1990) and Schork et al. (1993) involves a large and sometimes infeasible computational burden.

We should distinguish between those methods whose primary focus is the *detection* of disease loci and those whose focus is to model the *interaction* of disease loci. These two goals are clearly interrelated, but which goal is the primary focus is not always clear from the methodology. In the present study, we take, as our primary aim, the detection of disease loci in the presence of epistatic interactions, while noting that the methods we describe may also, under certain circumstances, be used to test specific hypotheses concerning the interactions. This is in the spirit of approaches proposed by Elston (1995) and Tiwari and Elston (1997), for analysis of two-locus quantitative traits. Cox et al. (1998), Xu et al. (1998), and Cox et al. (1999) have recently proposed a method that involves weighting a family's contribution to the test statistic at one locus according to the magnitude of the family’s contribution to the test statistic at another locus. Although this is a potentially appealing way of conditioning on a known locus, the choice of weighting scheme may be problematic and may not necessarily reflect a feasible genetic model, yet some properties of the true genetic model (e.g., heterogeneity or epistasis) must be assumed, to generate the weights. The method is reminiscent of simply selecting one’s data according to either identity-by-descent (IBD) sharing or the genotypes possessed at a known locus. Although this has been a useful means for the detection of some effects—for example, at *IDDM4* conditioned on *IDDM1* in type 1 diabetes (Davies et al. 1994; Mein et al. 1998)—this procedure is rather arbitrary. It is not clear, for example, whether the data should be subdivided into families sharing 2 alleles IBD and families sharing 1 or 0 alleles IBD at the known locus or into families sharing 0 alleles IBD and families sharing 1 or 2 alleles IBD at this locus. 
If tests are performed in several subsamples of families, they may need to be corrected for multiple testing—for example, by use of a Bonferroni correction, which will reduce the significance of any result. Furthermore, even if a second locus does exist, the procedure may in fact result in an observed decrease—rather than an increase—in significance, because of the decrease in the effective sample size in each subsample.

In contrast, the method proposed in the present study is a generalization of the two-locus MLS method (Cordell et al. 1995), which is based on IBD allele sharing at several loci. Linkage tests that are based on allele sharing (Weeks and Lange 1988; Risch 1990*a*, 1990*b*; Whittemore and Halpern 1994) are a popular alternative to traditional model-based linkage analysis when mapping susceptibility genes for complex traits, since they require no explicit prior specification of the inheritance model. In recent years, Kruglyak and Lander (1995) and Kruglyak et al. (1996) have developed algorithms that extract the full multipoint inheritance information from pedigrees of moderate size, allowing IBD sharing among pairs or sets of relatives to be probabilistically inferred across the whole genome. These calculations have been incorporated into the computer programs GENEHUNTER and MAPMAKER/SIBS, for analysis of extended pedigrees and affected sib pairs, respectively. Although the inheritance pattern extracted by GENEHUNTER represents the fullest possible inheritance information available from a pedigree (under the assumption of no interference), there are some problems with the linkage test proposed by Kruglyak et al. (1996). First, the test has been shown to be conservative when the descent information is incomplete (Kruglyak et al. 1996; Kong and Cox 1997). Second, the general shape of the “nonparametric linkage” (NPL) curve obtained tends to decrease between markers, because information is less complete for a location midway between two markers than it is for a location close to a marker. This contrasts with the results from model-based (parametric) LOD-score analysis and with those from MAPMAKER/SIBS. In fact, the results from GENEHUNTER do not necessarily bear any direct relationship to the results from MAPMAKER/SIBS, even if only nuclear families are used, since the statistics used in GENEHUNTER are based on the scoring functions discussed by Whittemore and Halpern (1994), rather than on the likelihood-based statistics proposed by Risch (1990*a*, 1990*b*) and used in MAPMAKER/SIBS. This is somewhat unsatisfactory, since we would prefer our results for extended pedigrees to be a generalization of those for sib pairs.

Kong and Cox (1997) have proposed an alternative one-parameter linkage test that addresses many of the problems with the tests proposed by Kruglyak et al. (1996). In particular, their method is more powerful than that of Kruglyak et al. (1996), and it produces NPL curves that conform in shape to traditional model-based LOD-score curves. However, the method proposed by Kong and Cox (1997), although likelihood based, is not directly equivalent to the ASP likelihood-ratio statistic of Risch (1990*a*, 1990*b*), unless additional parameters are included.

Here, in contrast, we propose a generalization of the MLS statistic for ASPs that was proposed by Risch (1990*a*, 1990*b*), using all pairs of affected relatives in a pedigree. It has been shown (Cordell et al. 1995; Dupuis et al. 1995) that, with an MLS or score approach, the power to detect effects at unlinked loci is increased by use of two-locus methods, provided that the true genetic model is not multiplicative, as defined by Risch (1990*a*). If the true genetic model is multiplicative, then the two-locus MLS for two unlinked loci is equal to the sum of the individual single-locus MLSs at the two loci, and no further significance is achieved by modeling the joint action of the loci. However, the initial significances at the individual loci will not be decreased. The MLS approach allows immediate generalization to models that involve an arbitrary number of disease loci via an extension of the methods described by Cordell et al. (1995) and Farrall (1997). All that is required is to be able to calculate the prior and posterior probabilities that each affected relative pair shares *i* alleles IBD (*i*=0,1,2) at particular locations on the genome. By use of the term “prior probabilities,” we mean probabilities that are based purely on relationship, whereas, by use of the term “posterior probabilities,” we mean probabilities that are conditional on both relationship and marker data.
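As a minimal sketch of the distinction between prior and posterior sharing probabilities, the following Python snippet tabulates the standard prior IBD probabilities for a few non-inbred relative pairs and applies Bayes' theorem to a vector of marker likelihoods. The names `PRIOR_IBD` and `posterior_ibd` and the example likelihood values are illustrative assumptions and are not taken from any linkage software.

```python
# Prior probabilities of sharing 0, 1, or 2 alleles IBD, based on relationship
# alone (standard values for non-inbred pairs).
PRIOR_IBD = {
    "full sibs": (0.25, 0.50, 0.25),
    "half sibs": (0.50, 0.50, 0.00),
    "grandparent-grandchild": (0.50, 0.50, 0.00),
    "avuncular": (0.50, 0.50, 0.00),
    "first cousins": (0.75, 0.25, 0.00),
}


def posterior_ibd(prior, marker_likelihood):
    """Bayes' theorem: posterior_i is proportional to prior_i * P(marker data | IBD = i)."""
    unnormalized = [p * w for p, w in zip(prior, marker_likelihood)]
    total = sum(unnormalized)
    return tuple(u / total for u in unnormalized)


# Hypothetical marker data that rule out sharing 0 alleles IBD but are otherwise
# uninformative (likelihoods 0, 1, 1 are made-up illustration values).
print(posterior_ibd(PRIOR_IBD["full sibs"], (0.0, 1.0, 1.0)))
```

With uninformative marker data (equal likelihoods for all three states), the posterior simply reproduces the prior.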

## Methods

For the *j*th affected pair of relatives in a pedigree, define *w*_{ij} to be the probability of the observed marker data, given that the pair share *i* alleles IBD at a single marker locus. Risch (1990*c*) showed that, for ASPs, the likelihood may be written as $L = \prod_j \sum_{i=0}^{2} z_i w_{ij}$, where *z*_{i} is the population parameter (to be estimated) that corresponds to the probability that an ASP shares *i* alleles IBD at the marker. Under the null hypothesis that the marker is unlinked to disease, the parameters (*z*_{0},*z*_{1},*z*_{2}) should take the values (.25,.5,.25) that correspond to the Mendelian probabilities of a random sib pair sharing 0, 1, or 2 alleles IBD. By defining *f*_{i} to be the prior probabilities (.25,.5,.25) and by defining $\hat{f}_{ij}$ to be the posterior probabilities, given the observed marker data, of the *j*th pair sharing *i* alleles IBD, then, by use of Bayes theorem, we may write the likelihood for pair *j* as

$$L_j \propto \sum_{i=0}^{2} \frac{z_i \hat{f}_{ij}}{f_i}.$$

Similarly, for an *m*-locus disease model, the likelihood for the *j*th ASP may be written as

$$L_j \propto \sum_{i_1=0}^{2} \sum_{i_2=0}^{2} \cdots \sum_{i_m=0}^{2} \frac{z_{i_1 i_2 \ldots i_m} \hat{f}_{i_1 i_2 \ldots i_m j}}{f_{i_1 i_2 \ldots i_m}},$$

where now *z*_{i1i2…im}, $\hat{f}_{i_1 i_2 \ldots i_m j}$, and *f*_{i1i2…im} refer to the same sharing probabilities but at the *m* loci simultaneously; for example, *z*_{00…0} is the probability that an ASP shares 0 alleles IBD at each of the *m* loci. These expressions for the likelihood lead to the following expression for the log-likelihood–ratio test statistic (MLS) for testing of the null hypothesis that the *m* loci are all unlinked to disease:

$$\mathrm{MLS} = \sum_j \log_{10} \left( \sum_{i_1=0}^{2} \cdots \sum_{i_m=0}^{2} \frac{\hat{z}_{i_1 i_2 \ldots i_m} \hat{f}_{i_1 i_2 \ldots i_m j}}{f_{i_1 i_2 \ldots i_m}} \right), \tag{1}$$

where the $\hat{z}_{i_1 i_2 \ldots i_m}$ are maximum-likelihood estimates of the relevant sharing probabilities.

This equation is, in fact, quite general, since, by defining the probabilities *f*_{i1i2…imj} and *z*_{i1i2…imj} as the relevant sharing probabilities for relative pair *j*, which may be of varying type (e.g., sibs, cousins, uncle-nephew, etc.), we can use essentially the same expression,

$$\mathrm{MLS} = \sum_j \log_{10} \left( \sum_{i_1=0}^{2} \cdots \sum_{i_m=0}^{2} \frac{\hat{z}_{i_1 i_2 \ldots i_m j} \hat{f}_{i_1 i_2 \ldots i_m j}}{f_{i_1 i_2 \ldots i_m j}} \right), \tag{2}$$

as a test of the same null hypothesis, using data from a sample of affected relative pairs of varying types. If the sharing between pairs is independent, then this is a valid test statistic, since the likelihood for the whole data set is the product of the likelihoods for each pair. Meunier et al. (1997) and Greenwood and Bull (1999) have shown that, for a single locus and with both parents typed, this is approximately valid (giving a slight increase in type 1 error) when all possible (nonindependent) pairs from a nuclear family are used in an ASP study. However, for extended family data, it is not clear whether this result will still hold, since the correlation between pairs may be much greater. We therefore consider using (2) as a pseudolikelihood, and we estimate significance levels by using simulation rather than by relying on asymptotic results.
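The single-locus form of this statistic can be sketched in a few lines of Python: for each pair, the posterior sharing probabilities are weighted by the ratio of the sharing parameters to the priors, and the base-10 logarithms are summed over pairs. The function names `mls` and `fit_z_em`, the toy data, and the use of a plain EM update (with no "possible triangle" constraint) are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np

SIB_PRIOR = np.array([0.25, 0.5, 0.25])  # Mendelian IBD probabilities for sib pairs


def mls(z, prior, post):
    """Sum over pairs of log10( sum_i z_i * post_ji / prior_i ).

    post has shape (n_pairs, 3); z and prior are either length-3 vectors
    (a single pair type) or (n_pairs, 3) arrays (pair-specific values).
    IBD states with zero prior (e.g., 2 alleles for cousins) are skipped.
    """
    post = np.asarray(post, dtype=float)
    z = np.broadcast_to(np.asarray(z, dtype=float), post.shape)
    prior = np.broadcast_to(np.asarray(prior, dtype=float), post.shape)
    ratio = np.divide(z * post, prior, out=np.zeros_like(post), where=prior > 0)
    return float(np.log10(ratio.sum(axis=1)).sum())


def fit_z_em(prior, post, n_iter=200):
    """Unconstrained EM estimate of (z0, z1, z2) for a single pair type."""
    post = np.asarray(post, dtype=float)
    prior = np.asarray(prior, dtype=float)
    z = prior.copy()                               # start at the null value
    for _ in range(n_iter):
        w = z * post / prior                       # per-pair weight of each IBD state
        z = (w / w.sum(axis=1, keepdims=True)).mean(axis=0)
    return z


# Toy data: ten fully informative sib pairs, of which 1 shares 0 alleles,
# 4 share 1 allele, and 5 share 2 alleles IBD.
post = np.repeat(np.eye(3), [1, 4, 5], axis=0)
z_hat = fit_z_em(SIB_PRIOR, post)
print(z_hat, mls(z_hat, SIB_PRIOR, post))
```

Evaluating `mls` at the prior values returns 0, as expected under the null hypothesis, because the posterior probabilities for each pair sum to 1.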

In the Appendix, we show that the *z*_{i1i2…imj}, estimated by $\hat{z}_{i_1 i_2 \ldots i_m j}$ in (2), may be written in terms of the prior probabilities *f*_{i1i2…imj} and the 3^{m}-1 underlying additive and dominance variances caused by the *m* disease-causing loci (divided by a constant). The MLS may therefore be written in terms of the variance components (divided by a constant) and the prior and posterior probabilities of the IBD sharing at the *m* loci. For all relatives, these probabilities can be calculated at increments across the genome, by use of the inheritance-vector distribution (Kruglyak et al. 1996). By maximizing the likelihood with respect to the variance component parameters, rather than with respect to the *z*_{i1i2…imj}, we may fit specific classes of genetic models to the data (for example, single-locus models, two-locus models, three-locus models, etc.). Note that the “possible triangle” restrictions on the *z*s (Holmans 1993) will automatically be satisfied by restriction of the variance components to nonnegative quantities. When more than one disease locus is considered, we may set the second- or higher-order (epistatic) variance components to 0 (Cordell et al. 1995), which fits an additive model on the penetrance scale to the effects at the *m* unlinked loci. This model is a good approximation of a model of genetic heterogeneity (Risch 1990*a*), which biologically implies that the loci act via separate etiologic pathways to cause the disease. Alternatively, by expressing the epistatic components in terms of the first-order variance components, we can fit the multiplicative epistatic model defined by Risch (1990*a*). Note that, for unlinked loci, the MLS for a multiplicative model will equal the sum of the individual single-locus MLS values and, therefore, will give no increased power to detect an effect.
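For a single locus and sib pairs, the link between nonnegative variance components and the "possible triangle" can be checked numerically. The sketch below does not reproduce the Appendix derivation; it assumes the standard single-locus relations between recurrence risk ratios and variance components (as in Risch 1990*a*), with `a` and `d` denoting the additive and dominance variances divided by the squared population prevalence. The function names are hypothetical.

```python
def asp_sharing(a, d):
    """ASP sharing probabilities (z0, z1, z2) from scaled variance components.

    a = V_A / K^2 and d = V_D / K^2 (assumed scaling), so that
    lambda_O = 1 + a/2, lambda_S = 1 + a/2 + d/4, lambda_M = 1 + a + d.
    """
    lam_o = 1.0 + a / 2.0
    lam_s = 1.0 + a / 2.0 + d / 4.0
    lam_m = 1.0 + a + d
    return (0.25 / lam_s, 0.5 * lam_o / lam_s, 0.25 * lam_m / lam_s)


def in_possible_triangle(z, tol=1e-12):
    """Check the constraints z0 >= 0, 2*z0 <= z1 <= 0.5, and z0 + z1 + z2 = 1."""
    z0, z1, z2 = z
    return (
        abs(z0 + z1 + z2 - 1.0) <= tol
        and z0 >= -tol
        and 2.0 * z0 <= z1 + tol
        and z1 <= 0.5 + tol
    )


# Any nonnegative (a, d) pair should land inside the triangle.
grid = [0.0, 0.5, 2.0, 10.0]
print(all(in_possible_triangle(asp_sharing(a, d)) for a in grid for d in grid))
```

Setting both components to 0 recovers the null values (.25,.5,.25), whereas a negative additive component can push the sharing probabilities outside the triangle.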

The additional evidence for linkage at a locus, conditional on linkage at *m*-1 previous loci, can be assessed by the difference in MLS between the best-fitting *m*-1 locus model and the model with an *m*th locus included (Cordell et al. 1995). We can calculate MLS curves across the genome, for a series of nested models, by looking first for evidence of a single disease locus, then by conditioning on a first locus at a given position and looking for evidence at a second disease locus, and then by conditioning on two loci at given locations and looking for evidence at a third disease locus, and so on. Alternatively, we could start by conditioning on effects at previously identified loci. Although this procedure could theoretically be continued for an arbitrary number of loci, the amount of data available and the increased degrees of freedom for the *m* locus models will limit the number of loci that can be modeled simultaneously; for data sets of the size currently available, it may not be useful for more than ~3 disease loci. However, by fitting a restricted model such as the previously described additive model, the degrees of freedom can be significantly reduced, and larger numbers of disease loci could be considered simultaneously.
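The conditional statistic described above is simply a difference of two log-likelihood ratios. The following sketch illustrates this for two unlinked loci in sib pairs, at fixed (not maximized) sharing parameters, and checks numerically that, under a multiplicative model, the conditional value for locus 2 equals its single-locus value. The function names, the randomly generated posterior probabilities, and the parameter values are all assumptions for illustration; the joint posterior is assumed to factorize across unlinked loci.

```python
import numpy as np

PRIOR = np.array([0.25, 0.5, 0.25])       # sib-pair prior at one locus
JOINT_PRIOR = np.outer(PRIOR, PRIOR)      # prior at two unlinked loci


def lod_one_locus(z, post):
    """log10 likelihood ratio for a single locus, summed over pairs."""
    return float(np.log10((z * post / PRIOR).sum(axis=1)).sum())


def lod_two_locus(z_joint, post1, post2):
    """log10 likelihood ratio for two unlinked loci, summed over pairs.

    The joint posterior for each pair is taken as the outer product of the
    single-locus posteriors (an assumption appropriate only for unlinked loci).
    """
    joint_post = post1[:, :, None] * post2[:, None, :]
    terms = (z_joint * joint_post / JOINT_PRIOR).sum(axis=(1, 2))
    return float(np.log10(terms).sum())


rng = np.random.default_rng(0)
post1 = rng.dirichlet([1.0, 2.0, 3.0], size=50)   # made-up posteriors, locus 1
post2 = rng.dirichlet([1.0, 2.0, 1.5], size=50)   # made-up posteriors, locus 2
z1 = np.array([0.10, 0.45, 0.45])                 # hypothetical sharing, locus 1
z2 = np.array([0.20, 0.50, 0.30])                 # hypothetical sharing, locus 2

# Multiplicative model: joint sharing is the product of the marginal sharing.
conditional = lod_two_locus(np.outer(z1, z2), post1, post2) - lod_one_locus(z1, post1)
print(conditional, lod_one_locus(z2, post2))
```

A non-multiplicative choice of `z_joint` breaks this factorization, which is where the conditional analysis can gain over the single-locus analysis.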

## Results

The methods described above were applied to simulated and to real data. We generated IBD probabilities by use of the program GENIBD (S.A.G.E. 1998), which has the advantages of being significantly faster than the GENEHUNTER program of Kruglyak et al. (1996) and of allowing for a slightly larger maximum family size. We used a yet-to-be released version of GENIBD that also allows the calculation of joint IBD probabilities at linked loci.

### Application to Simulated Data

We simulated data for 25 families with the pedigree structure shown in figure 1. This structure is identical to that simulated by Kruglyak et al. (1996). Marker data were simulated at 11 markers spaced at 10-cM intervals on each of four chromosomes, under the assumption that each marker had five equifrequent alleles. The disease was assumed to be caused by a three-locus model, with three disease loci (A, B, and C) lying in the centers of chromosomes 1, 2, and 3. The population prevalence of each disease allele was .05, and the penetrance of each three-locus genotype was 1 for individuals who possessed two disease alleles at any of the three disease loci and was 0 otherwise. Families were included in the study if they had at least three affected members in generation III, with at least one affected individual in each generation III sibship. Individuals in generation I were assumed to be unavailable for genotyping.
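The penetrance model above can be written out explicitly. The sketch below enumerates the 27 three-locus genotypes and sums penetrance times genotype probability to obtain the implied population prevalence; it assumes Hardy-Weinberg proportions and independence between loci, neither of which is stated explicitly in the text, and the function names are hypothetical.

```python
from itertools import product
from math import comb


def penetrance(genotype):
    """1 if two disease alleles are carried at any locus, 0 otherwise."""
    return 1.0 if any(g == 2 for g in genotype) else 0.0


def genotype_prob(genotype, q=0.05):
    """Probability of a multilocus genotype (disease-allele counts per locus),
    assuming Hardy-Weinberg proportions and independent loci."""
    prob = 1.0
    for g in genotype:
        prob *= comb(2, g) * q**g * (1.0 - q) ** (2 - g)
    return prob


def prevalence(q=0.05, n_loci=3):
    """Population prevalence implied by the penetrance model."""
    return sum(
        genotype_prob(g, q) * penetrance(g)
        for g in product(range(3), repeat=n_loci)
    )


print(prevalence())  # agrees with the closed form 1 - (1 - q**2)**3
```

Under these assumptions the prevalence is a little under 1%, which is consistent with ascertaining families through multiple affected members.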

**...**

Significance levels for MLS values obtained in a data set were calculated by simulation and analysis of sets of marker data across a chromosome for an identical data set (in terms of structure of families and disease status of individuals). The alleles across a chromosome were dropped at random through the given families (allowing for intermarker recombination by use of the Haldane mapping function), and, for testing of a single-locus model, the distribution of the resulting MLS at any given location was calculated. For testing of an *m*th locus unlinked to a set of previous *m*-1 loci, a similar procedure can be followed, but the marker data in the regions of the *m*-1 loci must be fixed at their original observed values. This procedure will not be valid if the *m*th locus is linked to any of the previous *m*-1, although consideration of the distribution of the MLS in this case (Farrall 1997) suggests that this procedure will then give conservative significance levels. By examination of the distribution of the resulting multipoint MLSs at a single location, approximate data-specific pointwise (as opposed to genomewide) *P* values can be found. Note that the significance levels will be data-set specific because of the correlation between the affected relative pairs, and, hence, simulation will be needed to generate the *P* values, unless only independent pairs are used.
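Two ingredients of this simulation procedure can be sketched briefly: converting intermarker distances to recombination fractions with the Haldane mapping function, and turning a set of null MLS values into a pointwise empirical *P* value. The function names, the single-meiosis gene-drop helper, and the add-one form of the empirical *P* value are assumptions for illustration; the text does not specify the exact estimator that was used.

```python
import math
import random


def haldane_theta(d_cm):
    """Recombination fraction for a map distance in cM (Haldane, no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def drop_meiosis(n_markers=11, spacing_cm=10.0, rng=random):
    """Grandparental origin (0 or 1) transmitted at each marker in one meiosis."""
    theta = haldane_theta(spacing_cm)
    strand = rng.randint(0, 1)            # random starting strand
    origins = [strand]
    for _ in range(n_markers - 1):
        if rng.random() < theta:          # recombination in this interval
            strand = 1 - strand
        origins.append(strand)
    return origins


def empirical_p(observed, null_values):
    """Pointwise empirical P value with an add-one correction (an assumption)."""
    exceed = sum(v >= observed for v in null_values)
    return (exceed + 1) / (len(null_values) + 1)


print(haldane_theta(10.0))
print(drop_meiosis(rng=random.Random(1)))
print(empirical_p(2.0, [0.1, 0.5, 2.5, 1.0]))
```

In a full analysis, each null replicate would drop alleles through every meiosis in the observed pedigrees, recompute the IBD probabilities, and recalculate the MLS at the location of interest.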

The MLS results for a single simulated replicate are shown in figure 2. Although analysis of a single replicate does not allow us to make a global inference about power, it is useful in terms of illustrating the kind of results we might expect to find in an analysis of real data. We see that locus A is easily identified, with an MLS of 9.8 (*P*<.001) in the correct location, but that locus B is less significant, with an MLS of 1.8 (*P*=.02) at the correct location or of 3.4 (*P*≈.002) at a distance of 16 cM away. Locus C is also less significant, with an MLS of 2.2 (*P*=.01) at the correct location. Figure 2 also shows, for chromosomes 2 and 3, the improvement in MLS when a two-locus analysis is performed, conditioning on the result for locus A on chromosome 1. We see that the additive and general two-locus analyses give a big improvement, compared with the single-locus analysis, in terms of locating the second disease locus both more accurately and with greater significance, giving an additive MLS of 4.7 (*P*<.001) or a general MLS of 6.3 (*P*<.001) close to locus B and an additive MLS of 4.2 (*P*<.001) or a general MLS of 4.3 (*P*≈.003) at locus C. We also used a three-locus model to analyze the data for chromosome 3, given the results at loci A and B, but no further improvement in significance was found, compared with the two-locus model.

To investigate the power of the multilocus strategy in a larger number of replicates, we simulated 100 replicates of the three-locus model described above. Since calculation of *P* values in the extreme tail of the distribution would be prohibitively time consuming, we simulated only 10 sets of families—rather than 25 families—per replicate, to generate more-modest significance levels. We found that the additive or general MLS for locus B, conditional on locus A, was more significant than the single-locus (or multiplicative) MLS for locus B alone, in 56% of replicates. Specifically, the power to detect locus B (with *P*=.05) was 57% for the additive MLS and 56% for the general MLS, compared with 51% for the single-locus or multiplicative MLS, illustrating a small increase in power when the multilocus method was used. Results were similar for locus C (as expected by symmetry).

We simulated two further three-locus models in which the effect at locus A was large, compared with the effects at loci B and C. This might be expected to more closely resemble the situation in type 1 diabetes (see analysis of real data below), in which there is a single major genetic component (the locus *IDDM1* in the *HLA* region on chromosome 6p21) but in which there is also a large number of smaller effects at other loci. The models were identical to that described previously, except that, in the first, or symmetric (in loci B and C), model, the penetrance of each three-locus genotype was 1 for individuals with two disease alleles at A and .25 for individuals with two disease alleles at locus B or C. In the second, or asymmetric, model the penetrance of each three-locus genotype was 1 for individuals with two disease alleles at locus A, .5 for individuals with two disease alleles at locus B, and .25 for individuals with two disease alleles at locus C. For the symmetric model, we found that the additive or general MLS for locus B, conditional on locus A, was more significant than the single-locus (or multiplicative) MLS for locus B alone, in 69% of replicates, with a power to detect an effect (with *P*=.05) that was 13% for the additive model and 8% for the general model, compared with 7% for the single-locus model. For the asymmetric model, we found that the additive or general MLS for locus B, conditional on locus A, was more significant than the single-locus (or multiplicative) MLS for locus B alone, in 70% of replicates, with a power to detect an effect (with *P*=.05) that was 31% for the additive model and 27% for the general model, compared with 19% for the single-locus model.

### Application to Real Data for Type 1 Diabetes

We also analyzed real data from a second genome screen (Mein et al. 1998) of 356 ASPs (with genotyped parents) affected with type 1 diabetes. Type 1 diabetes, or insulin-dependent diabetes mellitus (IDDM), is a complex trait with a number of genetic and environmental determinants. The major genetic component is the locus *IDDM1* in the *HLA* region on chromosome 6p21, but a large number of smaller effects at other loci have also been identified (Davies et al. 1994). Since *IDDM1* plays such an important role in the disease, it is of interest to examine the effects at other loci after the effect of *IDDM1* has been taken into account. This has been done for *IDDM2* and *IDDM4*, by use of two-locus MLS methods (Cordell et al. 1995), and, more crudely, in other regions of the genome, by subdivision of the data according to *HLA* sharing status, genotype, or alleles present (Davies et al. 1994; Mein et al. 1998; Cucca et al. 1998*a*). Here we use our previously described stepwise strategy.

Table 1 shows all locations in the genome with an MLS > 1.4 (*P*=.01), as obtained by use of a single-locus analysis. Table 1 also includes locations that were not significant in the single-locus analysis but that were interesting in light of subsequent multilocus analyses. The most-significant MLS is at *IDDM1.* For the two-locus analysis, we therefore fix *IDDM1* as the first disease locus and consider the joint IBD sharing at *IDDM1* and at a second locus, which is placed at 1-cM intervals across the genome. MLS curves were fitted under multiplicative, additive, and general genetic models, for the action of the two loci; the MLS for the action of *IDDM1* was subtracted so that the curves represent the additional effect of locus 2 and, for multiplicative and general models, its epistatic interactions (Cordell et al. 1995). We call these MLS values “conditional MLS values,” since the statistics are conditional on the previously identified effect at *IDDM1.*

As expected, at positions unlinked to *IDDM1,* the multiplicative curves were identical to those curves obtained by use of a single-locus model (data not shown) for locus 2. The *P* values for these conditional multiplicative MLS values will therefore be identical to those given by Holmans (1993) for the single-locus “possible triangle” method. Since each family contains exactly two affected sibs, no correction for nonindependence of the pairs is required. The additive model has the same number of free parameters (two for each locus) as does the multiplicative model; therefore, the distribution of the test statistic for the additive model should be similar to that of the single-locus MLS, as has indeed been observed in simulations for a variety of specific additive models (Cordell et al. 1995; Farrall 1997). The distribution of the general two-locus MLS is somewhat different, since there are eight free parameters when the effect of both loci is tested and six free parameters when the effect of locus 2, given locus 1, is tested. Imposition of the “possible triangle” constraints means that the distribution cannot be calculated by use of standard asymptotic theory; we therefore used simulation to calculate the two-locus *P* values via importance sampling (Hammersley and Handscombe 1964; Kong et al. 1992), as described in Cordell et al. (1995).

Figure 3 and the Two Locus Conditional column of table 1 show the most important results. Parameter estimates for the significant two-locus models are given in table 2. Chromosome 6 is particularly interesting, since we see evidence for an additional susceptibility locus in the *D6S294*–*D6S286* region, ~20 cM from *IDDM1*, with an additive maximum MLS of 2.42 (*P*=.001), which is considerably more significant than the multiplicative MLS of 1.39 (*P*=.01) at this location. The position of the peak identified in this analysis lies between *IDDM1* and the putative peak for *IDDM15* (Delépine et al. 1997), ~25 cM from *IDDM15*. The two-locus analysis is particularly useful in this region, since, with use of a single-locus method, any effects tend to be masked by the highly significant effect at *IDDM1* (e.g., the single-locus MLS, not accounting for *IDDM1*, is 19.4 at this location). The parameter estimates in table 2 indicate that, although *IDDM1* makes a greater contribution to the overall genetic variance, there are nonnegligible effects caused by the second locus. On this chromosome, we also find some evidence for a third locus on the other arm, near *D6S271*, which may correspond to *IDDM8* (Luo et al. 1995); however, in this case, the two-locus conditional analysis does not offer any improvement in MLS, compared with single-locus analysis.

[Figure 3: conditional MLS (given *IDDM1*) for IDDM data against chromosomal location in cM, for chromosomes 6, 8, 10, 11, 15, 16, 18, and 21 and for the pseudoautosomal region. Dotted lines indicate additive MLS, solid lines indicate general MLS, and …]

**...**

Chromosome 10 has the most-significant MLS outside the *HLA* region, both in single- and two-locus analysis, but the two-locus conditional result does not give increased significance, compared with the single-locus analysis. This locus has been previously designated as *“IDDM10”* (Reed et al. 1997; Mein et al. 1998). For chromosome 11, at *IDDM2* we see a significant improvement in conditional MLS, an improvement from 2.77 for a single-locus or multiplicative model to 4.14 for a general model. No improvement is found when an additive model is used. These results are similar to those of Cordell et al. (1995), who found that epistatic components of variance were required to model the joint action of *IDDM1* and *IDDM2,* although our results here differ from theirs in that they suggest that epistatic terms which are more general than multiplicative terms are required. From table 2, we see that the data are well modeled by a large dominance effect at locus 1, together with a large additive × additive epistatic effect plus some smaller effects. By splitting according to *HLA* status, only a modest improvement in MLS is found among 0 or 1 sharers, indicating that our approach here may be more powerful for detection of these types of effects. Still considering chromosome 11, at *IDDM4* we see a significant increase in MLS, an increase from 0.54 for a single-locus model to 2.04 for an additive or general two-locus model. This finding is consistent with the results of Cordell et al. (1995), who showed that *IDDM1* and *IDDM4* act additively to cause type 1 diabetes (however, these results are not completely independent, because these two studies have 93 families in common). A similar result is seen by splitting the data and by analyzing only *HLA* 0 or 1 sharers, as might be expected from the estimates of the sharing parameters *z*_{ij} in table 2, which show deviation from expected proportions when *i*=0 or *i*=1 but not when *i*=2.

On chromosome 16, we find another significant effect that is increased when a general two-locus model, rather than a single-locus model, is fitted. No improvement in significance is observed when an additive model is fitted; indeed, a decrease is seen. Splitting according to *HLA* status gives only a modest improvement among 0 or 1 sharers, as might be expected from the estimates of the sharing parameters *z*_{ij}, in which little deviation in expected sharing is seen for *i*=0 or *i*=1. This again indicates the greater power of a multilocus strategy. For chromosome 8, we see a maximum conditional MLS of 1.62 for the general model, which is 0.9 units larger than the single-locus MLS. The additive model gives no improvement in MLS. These results are consistent with those of Mein et al. (1998), who obtained, on chromosome 8, a similar increase in significance among ASPs that share 2 *HLA* alleles IBD but who saw no increase among ASPs that share 0 or 1 *HLA* alleles IBD; see also Cucca et al. (1998*b*). On chromosome 18, we find a significant increase in MLS, an increase from 1.10 for a single-locus model to 1.95 for a two-locus model, whereas splitting by *HLA* status gives no increased significance. On chromosome 21, we find a locus of modest significance when a two-locus model is used, which is similar to the result seen among pairs sharing 2 *HLA* alleles IBD. Finally, in the pseudoautosomal region of the sex chromosomes, we find an increase in MLS, an increase from 1.23 for a single-locus model to 1.65 for an additive and 1.87 for a general two-locus model.

Since the most-significant result for a second locus is seen at *IDDM10,* we used our methods to screen for a third locus, conditioning on the sharing at *IDDM1* and *IDDM10.* The calculated MLS values correspond to the differences in MLS between the three-locus and the two-locus model for *IDDM1* and *IDDM10,* under three different genetic models: multiplicative, additive, and general (which includes all 27 components of variance). The *P* values were calculated by simulation of data under the null hypothesis that only *IDDM1* and *IDDM10* have an impact on the IBD sharing between affected sibs, by use of 10,000 simulated replicates. Results are shown in the Three Locus Conditional column of table 1. The results for chromosomes 3, 8, 15, and 21 are of particular interest, since the three-locus method gives more significant results than either the single-locus or two-locus methods, illustrating again the greater power of the multilocus approach. We also conducted four-locus analyses, assuming an additive model for the action of a hypothetical fourth disease locus, given the effects at *IDDM1,* *IDDM10,* and *IDDM2,* but we found no increase in significance, compared with single-, two-, or three-locus models.

## Discussion

A variety of model-free methods for testing linkage for complex diseases currently exists. The common feature of most of these methods is that they measure, at different locations on the genome, the observed IBD sharing between pairs or groups of relatives and that they compare this with the expected sharing under the null hypothesis of no linkage to disease. Within this broad class of methods, the likelihood-based statistic proposed in the present study has the advantage of providing a natural statistic for pairwise IBD sharing that may easily be extended to account for multilocus disease models. This approach has been shown, in the present study and elsewhere (Cordell et al. 1995, 1999), to give increased power for detection of any one of the disease loci in certain circumstances, which will depend heavily on the true underlying multilocus model. For the data on type 1 diabetes analyzed here, conditioning on the large effect at *IDDM1,* via a two-locus model, allowed the detection of several effects that had not been apparent from single-locus analyses. In table 2, examination of the estimates of the sharing parameters *z*_{ij} shows that, for some loci, the deviation from expected sharing occurred in such a way that subdividing the data according to *HLA* sharing status and performing a single-locus analysis would be expected to produce the same results as would the two-locus analysis. For other loci, this procedure would be unlikely to be useful. Although examination of the maximum-likelihood estimates of the sharing parameters can be informative, we must beware of overinterpretation of the estimates of the variance component parameters, since they merely provide a means of modeling the sharing parameters, under the assumption of an *m* locus model of disease. 
For a complex disease in which there are likely to be many loci involved, it is not clear to what extent the parameter estimates generated under the assumption of a two-locus—or even a three-locus—disease model will resemble their true population quantities.

A possible disadvantage of the statistic presented in (2) is that it considers only pairs of affected relatives, as opposed to considering IBD sharing among a larger set. Kruglyak et al. (1996) suggest that this may result in a loss in power, since, in their simulations with GENEHUNTER, the statistic NPL_{all} performed better than did the statistic NPL_{pairs} in most cases. However, Kong and Cox (1997) found that, for their modified NPL statistic, the choice of scoring function (with the use of affected pairs or all affected individuals) did not make much difference in their final results. Although allele sharing among large sets of affected relatives may be more informative than sharing among pairs only, the decrease in power for our statistic is likely to be outweighed by the increase in power gained by being able to simultaneously consider and model the action of several disease loci.

The likelihood-based statistic (2) implicitly assumes that all affected pairs are independent, which will not be the case for data from extended pedigrees or, indeed, from sibships of size >2. In our own simulations of large sibships, we found a negligible increase in type 1 errors when we assumed independence and used equal weights, which is consistent with the results of Meunier et al. (1997) and Greenwood and Bull (1999). For extended family data, however, the effect of the nonindependence may be much greater, depending on the family size and the relationships involved. It is therefore important, when using data from nonindependent pairs, that the significance levels be calculated by use of simulation, as previously described.

In addition to testing the hypothesis of linkage at a given location, conditional on linkages at previous locations, the multilocus models described in the present study may be used to test the fit of specific biological models (Cordell et al. 1995), by consideration of the difference in MLS between a restricted (e.g., additive or multiplicative) model and the general model. Again, the significance of the difference in MLS must be evaluated by simulation; an intuitive method for doing this would be to simulate from the best-fitting additive or multiplicative model, to see whether the increase in MLS for a general model is significantly large. This means that simulation of IBD sharing among pairs of affected relatives must be done using the maximum-likelihood IBD-sharing probabilities for the additive or multiplicative model as the “true” values. This will only be valid if the pairs of affected relatives are in fact independent. For the previously described IDDM data, we have exactly one ASP per family, and, therefore, the independence assumption is justified. We estimated *P* values by simulation, under the additional assumption that all markers were fully informative, which greatly simplifies the simulations (which then need only be performed at a single marker). The results of Holmans (1993) suggest that this should not make too much difference to the significance criteria obtained. Using this approach, we found that, for *IDDM1* and *IDDM2,* there is some evidence against the null hypothesis that these two loci act multiplicatively (difference in *MLS*=1.38, *P*=.04) and against the null hypothesis that the two loci act additively (difference in *MLS*=1.77, *P*=.01). For *IDDM1* and *IDDM4,* there is evidence against multiplicativity (difference in *MLS*=1.49, *P*=.008), but there is no evidence against additivity (difference in *MLS*=0). 
For *IDDM1* and chromosome 16, there is evidence against multiplicativity (difference in *MLS*=1.68, *P*=.02), and there is strong evidence against additivity (difference in *MLS*=4.16, *P*<.0001). For *IDDM1* and *IDDM10,* there is no evidence against multiplicativity (difference in *MLS*=0.36, *P*=.58), but there is some evidence against additivity (difference in *MLS*=0.87, *P*=.05).

Use of simulation to determine significance levels, rather than relying on some asymptotic distribution, may seem rather cumbersome. However, even methods for which a normal approximation is available may require simulation to evaluate significance levels, unless the number of families is large (Kong and Cox 1997). An alternative randomization method for calculation of significance thresholds, which is especially useful when parental marker data are not available, has been proposed by Zhao et al. (1999). Although it is important to set some criteria for significance, it is perhaps more important, particularly for complex diseases, simply to gain an idea of the general pattern of results. This can provide a starting point in terms of determining which regions of the genome are more or less promising for further investigation. On this note, it is interesting to observe that, in most cases, we find the general pattern for the single-locus MLS results to be similar to that of the NPL statistic of Kong and Cox (1997) but with the MLS approach having the advantage of being easily extended to the multilocus case.

Software for calculation of single-, two-, and three-locus MLS values, given the prior and posterior IBD-sharing probabilities, is available from the corresponding author of the present article. Software for calculation of the prior and posterior IBD-sharing probabilities is available in a number of statistical genetics packages, including GENIBD of S.A.G.E., which was used for the analyses presented here.

## Acknowledgment

This work was supported in part by U.S. Public Health Service grant GM28356 from the National Institute of General Medical Sciences. Some of the results of this paper were obtained by use of S.A.G.E., which is supported by U.S. Public Health Service Resource grant 1 P41 RR03655 from the National Center for Research Resources. We thank Charles Mein and John Todd’s group for sharing their family data for type 1 diabetes.

## Appendix

Here we show that the *z*_{i1i2…imj} may be written in terms of the prior probabilities *f*_{i1i2…imj} and 3^{m}-1 underlying additive and dominance variances (divided by a constant) caused by the *m* disease-causing loci.

For a relative pair *j* of type *R,* the *z*_{i1i2…imj} may be written (Cordell et al. 1995) as

where *K* is the population prevalence of the disease, *K*_{R} is the risk to relatives of type *R* of an affected individual, λ_{R} is the risk ratio or relative risk *K*_{R}/*K*, and *K*^{*}_{i1i2…im} and λ^{*}_{i1i2…im} are the prevalence and risk ratio for a relative who shares (*i*_{1},*i*_{2},…,*i*_{m}) alleles IBD with an affected individual at loci 1,2,…,*m*. Now λ_{R} may be written (James 1971)
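The display equation referred to later as (A1) is not reproduced in this text. A form consistent with the definitions just given, obtained by applying Bayes' rule to the probability of an IBD configuration given that both relatives are affected, would be the following (a reconstruction, not a verbatim copy of the original equation):

```latex
\[
z_{i_1 i_2 \ldots i_m j}
  = \frac{f_{i_1 i_2 \ldots i_m j}\, K^{*}_{i_1 i_2 \ldots i_m}}{K_R}
  = \frac{f_{i_1 i_2 \ldots i_m j}\, \lambda^{*}_{i_1 i_2 \ldots i_m}}{\lambda_R},
\qquad
\lambda^{*}_{i_1 i_2 \ldots i_m} = \frac{K^{*}_{i_1 i_2 \ldots i_m}}{K}
\tag{A1}
\]
```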

where Cov denotes covariance, *X*_{i} is the phenotype of person *i* (*i*=1,2)—defined to be 0 or 1, according to whether the person is unaffected or affected—and person 2 is a type *R* relative of person 1.
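The display equation referred to later as (A2) is likewise missing from this text. Because *X*_{1} and *X*_{2} are 0/1 indicators with mean *K*, the relation between λ_{R} and the covariance follows in one step; the block below is a reconstruction under that reasoning:

```latex
\[
K_R = P(X_2 = 1 \mid X_1 = 1)
    = \frac{E[X_1 X_2]}{K}
    = \frac{\mathrm{Cov}(X_1, X_2) + K^2}{K}
\quad\Longrightarrow\quad
\lambda_R = \frac{K_R}{K} = 1 + \frac{\mathrm{Cov}(X_1, X_2)}{K^2}
\tag{A2}
\]
```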

If the disease is caused by a single disease locus, we may write (Kempthorne 1957; James 1971; Risch 1990*a*) *Cov*(*X*_{1},*X*_{2})=2*r*_{R}*V*_{A}+*u*_{R}*V*_{D}, where *V*_{A} and *V*_{D} are the additive and dominance variances caused by the disease locus, where *r*_{R} is the kinship coefficient or coefficient of coancestry (the probability that a random allele from individual 1 is IBD with a random allele from the same locus in individual 2) and where *u*_{R} is the coefficient of fraternity (Trustrum 1961) or the probability that two alleles are shared IBD, at a locus, by the individuals. We may express *r*_{R} and *u*_{R} in terms of the (prior) probabilities of the relative pair sharing 0, 1, and 2 alleles IBD as *r*_{R}=0.5*f*_{2}+0.25*f*_{1} and *u*_{R}=*f*_{2}. Recalling that λ^{*}_{0}, λ^{*}_{1}, and λ^{*}_{2} will be equivalent to the relative risks for an unrelated individual, an offspring, and a monozygotic twin of an affected individual, respectively, we can therefore express λ^{*}_{i1i2…im} and λ_{R} and, thus, equation (A1), in terms of the parameters *V*_{A}/(*K*^{2}) and *V*_{D}/(*K*^{2}).
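As a worked illustration of the single-locus case, the expressions below spell out the substitution described above. They use only the stated covariance formula, the stated forms of *r*_{R} and *u*_{R}, and the standard kinship and fraternity coefficients for offspring and monozygotic twins; the final identity is a consistency check rather than a new result.

```latex
\begin{align*}
\lambda^{*}_{0} &= 1, &
\lambda^{*}_{1} &= 1 + \frac{1}{2}\,\frac{V_A}{K^2}, &
\lambda^{*}_{2} &= 1 + \frac{V_A + V_D}{K^2}, \\[4pt]
\lambda_R &= 1 + \frac{2 r_R V_A + u_R V_D}{K^2}
  = 1 + \left(f_2 + \tfrac{1}{2} f_1\right)\frac{V_A}{K^2} + f_2\,\frac{V_D}{K^2}
  &&\hspace{-6em} = f_0 \lambda^{*}_{0} + f_1 \lambda^{*}_{1} + f_2 \lambda^{*}_{2}.
\end{align*}
```

For sib pairs, for example, *f*_{0}=0.25, *f*_{1}=0.5, and *f*_{2}=0.25, so that *r*_{R}=0.25 and *u*_{R}=0.25.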

If the disease is caused by effects at two disease loci, we follow a similar procedure to express λ^{*}_{i1i2…im} and λ_{R} in terms of *V*_{Ak} and *V*_{Dk} (the additive and dominance variances caused by locus *k*) and *V*_{A1A2}, *V*_{A1D2}, *V*_{A2D1}, and *V*_{D1D2}, the additive × additive, additive × dominance, dominance × additive, and dominance × dominance variances caused by loci 1 and 2, respectively (Kempthorne 1957). Cordell et al. (1995) give formulae for the λ^{*}_{i1i2} (*i*_{1},*i*_{2}=0,1,2) in terms of these eight variance components and *K*, the population prevalence. These formulae apply, regardless of whether the two disease loci are linked, because the λ^{*}_{i1i2} are conditional on sharing *i*_{1} alleles IBD at locus 1 and *i*_{2} at locus 2, so that any linkage information is irrelevant. All that remains is to derive expressions for λ_{R} in terms of these same components. These have previously been derived for siblings by Cordell et al. (1995), under the assumption of there being two unlinked disease loci, and they have been extended to the case in which the loci are linked by Farrall (1997). More generally, for any relationship, we may write

Here, *r*_{Rk} and *u*_{Rk} are, respectively, the kinship coefficient and coefficient of fraternity for locus *k.* The terms *r*_{R12}, ω_{R12}, ω_{R21}, and *u*_{R12} are more difficult to define but come from the generalization of the formula for unlinked loci (Kempthorne 1957; Cordell et al. 1995; Lynch and Walsh 1998). For unlinked loci, we can express the coefficients as the product of the coefficients for the single-locus terms—that is, *r*_{R12}=*r*_{R1}*r*_{R2}, ω_{R12}=*r*_{R1}*u*_{R2}, ω_{R21}=*r*_{R2}*u*_{R1}, and *u*_{R12}=*u*_{R1}*u*_{R2}. For arbitrary linkage between the loci, we must define *r*_{R12} as the simultaneous probability that a randomly chosen allele at locus 1 in individual 1 is IBD with a randomly chosen allele at locus 1 in individual 2 *and* that a randomly chosen allele at locus 2 in individual 1 is IBD with a randomly chosen allele at locus 2 in individual 2. Similarly, ω_{R12} is the probability that a randomly chosen allele at locus 1 in individual 1 is IBD with a randomly chosen allele at locus 1 in individual 2 *and* that 2 alleles are shared IBD by the individuals at locus 2, and so on for ω_{R21} and *u*_{R12}. The coefficients in equation (A3) may therefore be written as

By inserting the above expressions into equations (A3), (A2), and (A1), we may therefore parameterize *z*_{i1i2…imj} in terms of the eight variance components divided by *K*^{2}.
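Equation (A3) and the coefficient expressions that follow it are not reproduced in this text. The block below is a reconstruction from the verbal definitions above: it reduces to the product form quoted for unlinked loci, and it uses the fact that a randomly chosen pair of alleles is IBD with probability *i*/4 when the relatives share *i* alleles IBD at that locus. The notation *f*_{ij} for the joint prior probability of sharing *i* alleles at locus 1 and *j* alleles at locus 2 is assumed here.

```latex
\begin{align*}
\mathrm{Cov}(X_1, X_2) ={}& 2 r_{R1} V_{A1} + 2 r_{R2} V_{A2}
   + u_{R1} V_{D1} + u_{R2} V_{D2} \\
 &+ 4 r_{R12} V_{A1A2} + 2 \omega_{R12} V_{A1D2}
   + 2 \omega_{R21} V_{A2D1} + u_{R12} V_{D1D2}
   \tag{A3} \\[6pt]
r_{Rk} ={}& \tfrac{1}{2} f_{2(k)} + \tfrac{1}{4} f_{1(k)}, \qquad
u_{Rk} = f_{2(k)}, \\
r_{R12} ={}& \tfrac{1}{4} f_{22} + \tfrac{1}{8}\,(f_{21} + f_{12}) + \tfrac{1}{16} f_{11}, \\
\omega_{R12} ={}& \tfrac{1}{2} f_{22} + \tfrac{1}{4} f_{12}, \qquad
\omega_{R21} = \tfrac{1}{2} f_{22} + \tfrac{1}{4} f_{21}, \qquad
u_{R12} = f_{22}.
\end{align*}
```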

For an arbitrary number *m* of disease loci, we proceed in a similar fashion. Let *V*_{ASDT} be the covariance term for an effect that involves additive effects in a specific set *S* of *s* loci and dominance effects in a specific set *T* of *t* loci (i.e., set *S* is of size *s;* set *T* is of size *t*). We have the general formula (Kempthorne 1957; Cordell et al. 1995)
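The general formula, referred to below as equation (A4), does not appear in this text. A form consistent with the two-locus case and with the coefficients ω_{ST} defined in the next paragraph would be the following, where *V*_{A_S D_T} is written for the term denoted *V*_{ASDT} above. This is a reconstruction; the exact layout of the original equation is not known from this text.

```latex
\[
\mathrm{Cov}(X_1, X_2)
  = \sum_{\substack{S,\,T \subseteq \{1,\ldots,m\} \\ S \cap T = \emptyset,\; S \cup T \neq \emptyset}}
    2^{s}\, \omega_{ST}\, V_{A_S D_T}
\tag{A4}
\]
```

Each locus is either in *S*, in *T*, or in neither, and the case in which both sets are empty is excluded, which gives the 3^{m}-1 variance components mentioned at the start of this appendix.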

If the loci are all unlinked, the coefficients ω_{ST} can be written as ∏_{*a*∈*S*}*r*_{Ra} ∏_{*b*∈*T*}*u*_{Rb}=(*r*_{R})^{s}(*u*_{R})^{t}, where *r*_{R} and *u*_{R} are the kinship coefficient and coefficient of fraternity for the relative pair and where *r*_{Ra} and *u*_{Ra} are the locus-specific kinship coefficient and coefficient of fraternity at a locus *a*, defined, respectively, as the probability that a randomly chosen allele at locus *a* in individual 1 is IBD with a randomly chosen allele at locus *a* in individual 2 and the probability that the two alleles at locus *a* in individual 1 are IBD with the two alleles at locus *a* in individual 2. More generally, if any among the loci are linked, the coefficients ω_{ST} can be written as ω_{ST}=ω_{a1a2…as,b1b2…bt}, which is defined as the simultaneous probability that, at all of the loci *a*_{1},*a*_{2}…*a*_{s}, a randomly chosen allele in individual 1 is IBD with a randomly chosen allele in individual 2, and that, at all of loci *b*_{1},*b*_{2}…*b*_{t}, the individuals share two alleles IBD. As for two disease loci, these coefficients can be written in terms of the prior probabilities *f*_{i1i2…imj}, which will depend on the recombination fractions between any of the loci. It is helpful to illustrate this for the case of three disease loci. In that case, we have

where

and where *f*_{i(a)} is the (prior) probability of the pair sharing *i* alleles IBD at locus *a,* where *f*_{ij(ab)} is the (prior) probability of the pair sharing *i* alleles IBD at locus *a* and *j* alleles IBD at locus *b,* where *f*_{ijk(abc)} is the (prior) probability of the pair sharing *i* alleles IBD at locus *a*, *j* alleles IBD at locus *b* and *k* alleles IBD at locus *c,* and so on.
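The three-locus expressions themselves are not reproduced in this text, but the verbal definition of ω_{ST} translates directly into a sum over joint IBD configurations. The sketch below is one way to compute it for any number of loci, assuming the joint prior probabilities are supplied as an array with one axis of length 3 per locus. The function names `omega` and `unlinked_prior`, and the 0-based locus indexing, are choices made for this illustration.

```python
import itertools
import numpy as np

def omega(f, S, T):
    """Coefficient omega_{ST} from joint prior IBD probabilities.

    f : array of shape (3,)*m; f[i1, ..., im] is the prior probability that
        the pair shares (i1, ..., im) alleles IBD at loci 1..m.
    S : 0-based indices of loci with additive effects.
    T : 0-based indices of loci with dominance effects (disjoint from S).
    """
    f = np.asarray(f, dtype=float)
    m = f.ndim
    total = 0.0
    for ibd in itertools.product(range(3), repeat=m):
        w = f[ibd]
        for a in S:
            # A randomly chosen allele from each relative is IBD with
            # probability i/4 when the pair shares i alleles at the locus.
            w *= ibd[a] / 4.0
        for b in T:
            # Dominance terms require both alleles to be shared IBD.
            w *= 1.0 if ibd[b] == 2 else 0.0
        total += w
    return total

def unlinked_prior(f_single, m):
    """Joint prior for m unlinked loci: product of single-locus priors."""
    f_single = np.asarray(f_single, dtype=float)
    joint = f_single
    for _ in range(m - 1):
        joint = np.multiply.outer(joint, f_single)
    return joint

if __name__ == "__main__":
    sib_prior = unlinked_prior([0.25, 0.5, 0.25], 3)
    # Additive at loci 1 and 2, dominance at locus 3.
    print(omega(sib_prior, S=(0, 1), T=(2,)))
```

For unlinked loci this reproduces the product form (*r*_{R})^{s}(*u*_{R})^{t}; for linked loci, only the joint prior array changes.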

We have now expressed λ_{R} in terms of 3^{m}-1 variance components divided by *K*^{2}. We can use the reasoning of Cordell et al. (1995) and can apply it to equation (A4) to argue that λ^{*}_{i1i2…im} can be expressed in the same way as λ_{R} but with coefficients that correspond to the product of the locus-specific coefficients for the sharing at each locus. In the case of three disease loci, this gives the following expression for λ^{*}_{i1i2i3}:

where *a*_{k} takes the value 0 if *i*_{k}=0, 0.5 if *i*_{k}=1, and 1 if *i*_{k}=2 and where *d*_{k} takes the value 0 if *i*_{k}=0, 0 if *i*_{k}=1, and 1 if *i*_{k}=2.
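The product rule just described, together with the definitions of *a*_{k} and *d*_{k}, can be sketched in code for an arbitrary number of loci. This is an illustration under stated assumptions rather than the authors' software: variance components are passed already divided by *K*^{2}, keyed by hypothetical `(S, T)` tuples of 0-based locus indices, and λ_{R} is obtained as the prior-weighted average of the λ^{*} values, which is the value that makes the sharing probabilities of equation (A1) sum to 1. The function name `sharing_probabilities` is also hypothetical.

```python
import itertools
import numpy as np

A_COEF = (0.0, 0.5, 1.0)  # a_k for i_k = 0, 1, 2
D_COEF = (0.0, 0.0, 1.0)  # d_k for i_k = 0, 1, 2

def sharing_probabilities(f, components):
    """Return (lambda_star, lambda_R, z) for an m-locus model.

    f          : array of shape (3,)*m of joint prior IBD probabilities.
    components : dict mapping (S, T) -> variance component divided by K^2,
                 where S (additive loci) and T (dominance loci) are disjoint
                 tuples of 0-based locus indices.
    """
    f = np.asarray(f, dtype=float)
    m = f.ndim
    lambda_star = np.ones_like(f)
    for ibd in itertools.product(range(3), repeat=m):
        for (S, T), v in components.items():
            coef = 1.0
            for k in S:
                coef *= A_COEF[ibd[k]]
            for k in T:
                coef *= D_COEF[ibd[k]]
            lambda_star[ibd] += coef * v
    # Prior-weighted average of lambda*, so that z sums to 1.
    lambda_R = float(np.sum(f * lambda_star))
    z = f * lambda_star / lambda_R
    return lambda_star, lambda_R, z

if __name__ == "__main__":
    # Single locus, sib pairs, V_A / K^2 = 0.4 and no dominance variance.
    lam_star, lam_R, z = sharing_probabilities(
        [0.25, 0.5, 0.25], {((0,), ()): 0.4}
    )
    print(lam_star, lam_R, z)
```

Setting every component to zero returns the prior probabilities unchanged, which corresponds to the null hypothesis of no linkage.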

## References

*IDDM2* and *IDDM4* with *IDDM1* in type 1 diabetes. Am J Hum Genet 57:920–934

*NIDDM1*) and 15 interact to increase susceptibility to diabetes in Mexican Americans. Nat Genet 21:213–215

*a*) Investigation of linkage of chromosome 8 to type 1 diabetes: multipoint analysis and exclusion mapping of human chromosome 8 in 593 affected sib-pair families from the U.K. and U.S. Diabetes 47:1525–1527

*b*) A male-female bias in type 1 diabetes and linkage to chromosome Xp in MHC HLA-DR3-positive patients. Nat Genet 19:301–302

^{2} method. Am J Hum Genet 55:1042–1049

*P* values in linkage analysis. Am J Hum Genet 51:1413–1429

*IDDM8*) on chromosome 6q25-q27. Am J Hum Genet 57:911–919

*IDDM10*) on human chromosome 10p11-q11. Hum Mol Genet 6:1011–1016

*a*) Linkage strategies for genetically complex traits. I. Multilocus models. Am J Hum Genet 46:222–228

*b*) Linkage strategies for genetically complex traits. II. The power of affected relative pairs. Am J Hum Genet 46:229–241

*c*) Linkage strategies for genetically complex traits. III. The effect of marker polymorphism on analysis of affected relative pairs. Am J Hum Genet 46:242–253

**American Society of Human Genetics**

## Formats:

- Article |
- PubReader |
- ePub (beta) |
- PDF (277K)

- Linkage analysis in the presence of errors IV: joint pseudomarker analysis of linkage and/or linkage disequilibrium on a mixture of pedigrees and singletons when the mode of inheritance cannot be accurately specified.[Am J Hum Genet. 2000]
*Göring HH, Terwilliger JD.**Am J Hum Genet. 2000 Apr; 66(4):1310-27. Epub 2000 Mar 23.* - Multipoint linkage analysis of the pseudoautosomal regions, using affected sibling pairs.[Am J Hum Genet. 2000]
*Dupuis J, Van Eerdewegh P.**Am J Hum Genet. 2000 Aug; 67(2):462-75. Epub 2000 Jun 26.* - Robust multipoint simultaneous identical-by-descent mapping for two linked loci.[Hum Hered. 2007]
*Lin WY, Schaid DJ.**Hum Hered. 2007; 63(1):35-46. Epub 2007 Jan 11.* - Linkage and association to genetic markers.[Exp Clin Immunogenet. 1995]
*Elston RC.**Exp Clin Immunogenet. 1995; 12(3):129-40.* - Affected sibpair linkage tests for multiple linked susceptibility genes.[Genet Epidemiol. 1997]
*Farrall M.**Genet Epidemiol. 1997; 14(2):103-15.*

- Linkage Analysis in the Next-Generation Sequencing Era[Human Heredity. 2011]
*Bailey-Wilson JE, Wilson AF.**Human Heredity. 2011 Dec; 72(4)228-236* - Incorporation of covariates in simultaneous localization of two linked loci using affected relative pairs[BMC Genetics. ]
*Chiu YF, Chiou JM, Liang KY, Lee CY.**BMC Genetics. 1167* - Recent advances in the genetics of schizophrenia[Cellular and molecular life sciences : CMLS...]
*Waterworth DM, Bassett AS, Brzustowicz LM.**Cellular and molecular life sciences : CMLS. 2002 Feb; 59(2)331-348* - Detecting gene-gene interactions that underlie human diseases[Nature reviews. Genetics. 2009]
*Cordell HJ.**Nature reviews. Genetics. 2009 Jun; 10(6)392-404* - Conditional Tests for Localizing Trait Genes[Human Heredity. 2009]
*Di Y, Thompson EA.**Human Heredity. 2009 May; 68(2)139-150*

Multilocus Linkage Tests Based on Affected Relative Pairs. American Journal of Human Genetics. Apr 2000; 66(4):1273.
