- We are sorry, but NCBI web applications do not support your browser and may not function properly. More information

- Journal List
- Genetics
- v.168(3); 2004 Nov
- PMC1448809

# Mapping Quantitative Trait Loci With Censored Observations

^{1}

*Corresponding author:*Department of Biostatistics, University of North Carolina, 3101E McGavran-Greenberg Hall, CB 7420, Chapel Hill, NC 27599-7420. E-mail: ude.cnu.soib@nil

## Abstract

The existing statistical methods for mapping quantitative trait loci (QTL) assume that the phenotype follows a normal distribution and is fully observed. These assumptions may not be satisfied when the phenotype pertains to the survival time or failure time, which has a skewed distribution and is usually subject to censoring due to random loss of follow-up or limited duration of the experiment. In this article, we propose an interval-mapping approach for censored failure time phenotypes. We formulate the effects of QTL on the failure time through parametric proportional hazards models and develop efficient likelihood-based inference procedures. In addition, we show how to assess genome-wide statistical significance. The performance of the proposed methods is evaluated through extensive simulation studies. An application to a mouse cross is provided.

QUANTITATIVE trait analysis plays an important role in the understanding of genetic variations in plants and animals. Mapping quantitative trait loci (QTL) can lead to improvements in economic traits, such as yield and quality in crop plants and milk production in cows. QTL mapping in animals can also provide valuable insights into the genetic etiologies of complex human diseases (Hilbert *et al.* 1991; Jacob *et al.* 1991; Shepel *et al.* 1998).

Much of the modern statistical methodology for QTL mapping in experimental crosses originates from the seminal work of Lander and Botstein (1989). The Lander-Botstein interval-mapping method postulates that QTL occur at a series of positions within a set of adjacent marker intervals and that the trait value depends on the QTL genotype through a linear regression model. The distance between each pair of genetic markers is assumed known. The method steps through the genome in specified increments, say every 1 or 2 cM, and calculates the likelihood-ratio statistic for testing no QTL present at each position. The position with the largest value of the likelihood-ratio statistic is declared to be the candidate QTL location provided that the value exceeds a certain threshold level. It is widely recognized that the interval-mapping method has higher power and requires fewer progenies than the single-marker analysis (Lander and Botstein 1989; Haley and Knott 1992; Zeng 1994). Doerge *et al.* (1997) described in greater detail this method and various extensions.

Most of the existing QTL-mapping methods require that the phenotype be normally distributed and fully observed. These assumptions are likely to be false when the phenotype pertains to the survival time or failure time. The Weibull distribution and other skewed distributions with long right tails are more appropriate than the normal distribution. Furthermore, the failure time is often subject to censoring so that the trait value is known only to be beyond the censoring time. An example of failure time in plant experiments is the flowering time, which may be censored due to limited duration of the experiment; see Ferreira *et al.* (1995). In animal studies, the failure times of interest include time to tumor and time to death (*i.e.*, survival time), which may be subject to censoring because of limited study duration or death due to unrelated causes. One particular example is a mice cross presented by Broman (2003), in which the trait of interest is time to death after a bacterial infection and in which 30% of the mice are still alive at the end of the study period. Symons *et al.* (2002) presented another interesting study, in which the phenotype is the time until terminal illness due to tumor for Eμ-v-abl transgenic mice.

The incompleteness of the trait values presents major challenges in the application of the interval-mapping approach. Broman (2003) considered a cure model in which the mice that are alive at the end of the study are regarded as cured and in which the survival times among the deaths follow a log-normal distribution. This is a specialized model, which can deal only with the situations in which the potential censoring times are equal among all study subjects. Symons *et al.* (2002) utilized a variant of the expectation-maximization (EM) algorithm (Lipsitz and Ibrahim 1998) to map QTL with censored observations. This method is computationally intensive and its properties have not been investigated.

In this article, we provide a simple and rigorous extension of the interval-mapping approach of Lander and Botstein (1989) to censored quantitative traits. Specifically, we formulate the effects of QTL on the failure time through proportional hazards models (Kalbfleish and Prentice 2002, Sect. 2.3). We then develop efficient likelihood-based methods for locating QTL and estimating the effects of QTL. In addition, we show how to assess genome-wide statistical significance by extending the analytical results of Lander and Botstein (1989) and Dupuis and Siegmund (1999) and by developing an accurate and efficient Monte Carlo procedure. We conduct extensive simulation studies to evaluate the performance of the proposed methods. Finally, we provide an application to the mice data of Broman (2003).

## METHODS

### Interval mapping:

In this section, we develop an interval-mapping method for potentially censored failure-time traits in an F_{2} intercross population by modeling a single QTL. Expanding the results to other crosses is straightforward. Consider *n* progenies from an intercross between two inbred strains. Let *T _{i}* denote the quantitative trait for the

*i*th subject, which pertains to a failure time that can potentially be censored and thus incompletely observed. Let

*C*be the censoring time for the

_{i}*i*th subject. The observation on the trait value of the

*i*th subject consists of two components:

*Y*= min(

_{i}*T*,

_{i}*C*) and Δ

_{i}*=*

_{i}*I*(

*T*≤

_{i}*C*), where

_{i}*I*() is the indicator function for event . The failure time

*T*is fully observed only when it is uncensored,

_{i}*i.e*., Δ

*= 1.*

_{i}Suppose that we have data on a set of genetic markers with a known genetic map. Let **M*** _{i}* denote the multipoint marker genotype data for the

*i*th subject. We consider a putative QTL locus

*d*in the genome with two possible alleles

*q*and

*Q*from the two inbred parents and define

*G*= −1, 0, or 1 according to whether the

_{i}*i*th subject has genotype

*G*, the hazard function of

_{i}*T*takes the form

_{i}where β_{1} and β_{2} pertain to the additive and dominant effects of the QTL, and λ_{0}(·) is an unknown baseline hazard function (Kalbfleish and Prentice 2002, Sect. 2.3). In this article, we assume a parametric model for λ_{0}. In particular, we consider a Weibull hazard function , γ_{1} > 0, γ_{2} > 0 (Kalbfleish and Prentice 2002, p. 33).

Write **θ** = (**β**, **γ**), where **β** = (β_{1}, β_{2}) and **γ** = (γ_{1}, γ_{2}). At each locus, we may calculate π_{i}_{,}* _{g}* = Pr(

*G*=

_{i}*g*|

**M**

*) (*

_{i}*g*= −1, 0, 1;

*i*= 1, … ,

*n*), which are the conditional probabilities of the QTL genotypes given the observed marker data. Under the assumptions of no crossover interference and no genotyping errors, these probabilities are determined by the genotypes of the two flanking markers and the location of the QTL; see Equation 15.2 of Lynch and Walsh (1998).

We assume that censoring is noninformative (Kalbfleish and Prentice 2002, p. 195) and that the censoring time is independent of the failure time and QTL genotype. The likelihood for the vector of parameters **θ** based on the complete data (*Y _{i}*, Δ

*,*

_{i}*G*,

_{i}**M**

*) (*

_{i}*i*= 1, … ,

*n*) is proportional to

while the likelihood based on the observed data (*Y _{i}*, Δ

*,*

_{i}**M**

*)(*

_{i}*i*= 1, … ,

*n*) is proportional to

To obtain the maximum-likelihood estimator (MLE) of **θ**, we may maximize the observed-data likelihood (3) directly. An alternative approach is to apply the EM algorithm (Dempster *et al.* 1977) to (2). The expected value of the complete-data log-likelihood given the observed data can be shown to be, up to a constant,

where

and

In the E-step, we evaluate *p _{i}*

_{,}

*(*

_{g}**θ**) at the current estimate of

**θ**. The M-step can proceed in a similar manner to the case of complete data since expression (4), with

**θ**in

*p*

_{i}_{,}

*(*

_{g}**θ**) fixed, takes the same form as the complete-data log-likelihood. We begin the EM algorithm by assigning an initial value to

**θ**and iterate until convergence. The initial value for

**β**is set to

**0**and that of

**γ**to some value in the parameter space of

**γ**. The resulting MLE is denoted by . See appendix A for further detail.

We test the null hypothesis of no QTL effects, *i.e.*, H_{0}: **β** = **0**, by the likelihood-ratio statistic

where *L*(·) is the observed-data likelihood, and = (**0**, ) with being the restricted MLE of **γ** under H_{0}. The LOD score is LR/(2 ln 10). Under H_{0}, LR is asymptotically χ^{2}-distributed with 2 d.f. (appendix B). Note that *p _{i}*

_{,}

*(*

_{g}**θ**), ,

*L*(), LR, and LOD all depend on the locus

*d*through the dependence of π

_{i}_{,}

*on*

_{g}*d*. In the sequel, we include

*d*in the expressions to emphasize their dependence on

*d*if ambiguity arises. Note also that and

*L*() do not depend on

*d*. Thus, as in the case of standard interval mapping, the likelihood under H

_{0}is calculated once while the likelihood under the alternative is evaluated at each location in the genome to produce a LOD curve for each chromosome. The position with the largest value of the LOD score is declared to be the QTL location provided that the value exceeds a certain threshold level. We show how to determine the threshold level in the following section.

### Thresholds:

When searching the entire chromosome or whole genome for QTL, one should select a threshold level for the LOD score such that the probability (under the null hypothesis) that LOD or some other test statistic exceeds this level anywhere in the genome equals the desired false-positive rate. The pointwise significance level based on the χ^{2}-approximation is inadequate because of the multiple tests while the Bonferroni correction is too conservative because of the dependence of the test statistics at different points in the genome. In appendix C, we show that the likelihood-ratio statistic LR(*d*) can be partitioned into the sum of the squares of two asymptotically independent Ornstein-Uhlenbeck processes. This result is analogous to those of Lander and Botstein (1989) and Dupuis and Siegmund (1999) and implies that the analytical approximations of thresholds for normal traits can be applied to the case of censored failure time observations. These analytical results assume that the markers are dense or equally spaced with no missing data and thus may not work well in practice. Using results of Davies (1977)(1987), Rebai *et al.* (1994) provided approximate thresholds in backcross (BC) and F_{2}, which are applicable in the intermediate map density case. The calculations are formidable, even for F_{2}, and do not accommodate missing marker data.

To overcome the limitations of the analytical approximations, we propose a novel resampling approach to determining the thresholds for genome-wide statistical significance. This approach allows arbitrary distributions of the markers as well as arbitrary test positions. It also accommodates missing marker data and dominant markers.

For evaluating the correlations of the test statistics among different locations, it is more convenient to work with the score statistic than the likelihood-ratio statistic. Let **U**_{β}(; *d*) be the score function (based on the observed-data likelihood) for **β** at location *d*, which can be approximated by the sum of *n* independent zero-mean random vectors , where **θ**_{0} = (**0**, **γ**_{0}) is the true parameter value under H_{0}; see appendix B. Thus, *n*^{−1/2}**U**_{β}; *d* is asymptotically zero-mean normal with covariance matrix **V**(*d*) that is the limit of . We replace the unknown quantities in **Ũ*** _{i}*(

**θ**

_{0};

*d*) by their sample estimators to yield

**Û**

*(*

_{i}*d*) shown in appendix B. The score statistic for testing H

_{0}:

**β**=

**0**against H

_{1}:

**β**≠

**0**takes the form

where is a consistent estimator of the limiting covariance matrix **V**(*d*). It can be shown that *W*(*d*) is asymptotically equivalent to LR(*d*) (Cox and Hinkley 1974, Sect. 9.3).

In general, the limiting distribution of sup* _{d}W*(

*d*) is not analytically tractable. We use a resampling approach similar to that of Lin

*et al*. (1993) to approximate the null distributions of

*n*

^{−1/2}

**U**

_{β}(;

*d*) and sup

*(*

_{d}W*d*). From appendix B, it is easy to see that

*n*

^{−1/2}

**U**

_{β}(;

*d*) converges to a zero-mean Gaussian process with covariance function

**ψ**(

*d*

_{1},

*d*

_{2}) that is the limit of at (

*d*

_{1},

*d*

_{2}) and that

**ψ**(

*d*

_{1},

*d*

_{2}) can be consistently estimated by . Define , where

*Z*(

_{i}*i*= 1, … ,

*n*) are independent standard normal random variables that are independent of the observed data. Conditional on the observed data,

*n*

^{−1/2}

**Û**(

*d*) is a Gaussian process with mean

**0**and covariance function at (

*d*

_{1},

*d*

_{2}), which converges to

**ψ**(

*d*

_{1},

*d*

_{2}). Thus, the conditional distribution of

*n*

^{−1/2}

**Û**(

*d*) given the observed data converges to the limiting distribution of

*n*

^{−1/2}

**U**

_{β}(;

*d*). As a result, the distribution of

*W*(

*d*) can be approximated by that of

To approximate the distribution of sup* _{d}W*(

*d*), we generate the normal random sample (

*Z*

_{1}, … ,

*Z*) a large number of times while holding the observed data fixed; for each sample, we calculate

_{n}*Ŵ*(

*d*) and sup

*(*

_{d}Ŵ*d*). The 100(1 − α)th percentile of the simulated sup

*(*

_{d}Ŵ*d*) is the threshold value for the genome-wide significance level of α. This resampling approach is computationally much more efficient than the use of permutation (Churchill and Doerge 1994) and other simulation methods because it involves only simulation of normal random variables and does not entail repeated analysis of simulated data sets.

## SIMULATIONS

To investigate the operating characteristics of the proposed methods in practical situations, we performed extensive simulation studies. We generated the failure times from the Weibull distributions with baseline hazard function , where γ_{1} = 0.01 and γ_{2} = 2. We reparameterized γ_{1} and γ_{2} according to so as to ensure that the estimates of γ_{1} and γ_{2} are positive. The censoring times were generated from the uniform (0, τ) distribution, where τ was chosen to yield ~30% censored observations. Assuming no crossover interference, we generated the marker data from the Markov chain. The interval-mapping step size was set at 1 cM.

We considered a chromosome with a total length of 100 cM. Genetic maps with different numbers of equally spaced markers were simulated. Under H_{1}, one QTL located at 35 cM was simulated with β_{1} = 0.35 and β_{2} = 0.30. We generated 10,000 replicates of 300 observations from an F_{2} population. We evaluated the finite-sample properties of the MLEs of the QTL effects at the true QTL location. The results for the estimator of the additive QTL effect are summarized in Table 1. The proposed estimator appears to be virtually unbiased. The standard error estimator reflects accurately the true variation. The confidence intervals have proper coverage probabilities. We obtained similar results for the estimator of the dominant QTL effect (data not shown). We also examined the performance of the proposed interval-mapping methods for locating the QTL and estimating the QTL effects at the location with the maximum LOD. The results are summarized in Table 2. There is little bias for the estimator of the QTL location or the estimators of the QTL effects. The mapping is more precise for denser marker maps.

**Sampling means of the estimators for the QTL location and for the QTL effects at the estimated QTL location in the simulation studies**

We conducted additional simulation studies to evaluate the performance of the analytical and resampling methods for determining genome-wide statistical significance. We generated both equally and unequally spaced markers. We also simulated data with missing marker genotypes and dominant markers, which are more comparable with real data. We considered one chromosome with a total length of 100 cM. For the cases of unevenly spaced markers, we placed *m* markers at the following locations,

where LOC* _{j}* is the

*j*th marker location and [

*m*/2] is the largest integer that is less than or equal to

*m*/2. In these settings, the first half of the markers is denser than the second half of the markers. We generated 10,000 replicates of 300 observations from an F

_{2}population. The dense-map and sparse-map approximations were obtained from Equation C1 in appendix C . The thresholds for the resampling method were based on 10,000 normal samples. The results are summarized in Tables 3 and and44.

The thresholds based on the resampling method are close to the empirical values, whether the data are generated under H_{0} or H_{1}; consequently, the LR tests based on these thresholds have proper type I error and power. This is true of any genetic map, with or without missing marker genotypes and dominant markers. The dense-map approximations are too conservative and thus result in power loss, while the sparse-map approximations tend to be too liberal. We also assessed the approximations by Rebai *et al.* (1994), which turn out to be conservative when the genetic map is dense. For example, in the case of 51 markers with α = 0.05, the sizes are 2.66 and 2.26% for marker patterns 1 and 2, respectively.

## APPLICATION

To illustrate our methods, we consider the mice data previously analyzed by Broman (2003). A total of 116 female mice from an intercross between the BALB/cByJ and C57BL/6ByJ strains were genotyped at 133 markers, including 2 on the X chromosome. The phenotype of interest is the time to death following infection with *Listeria monocytogenes*. Approximately 30% of the survival times are censored.

Broman (2003) proposed a nonparametric (NP) approach and a two-part model. The NP approach is an extension of the Kruskal-Wallis statistic (Lehmann 1975, Sect. 5.2) by assigning a prior weight (π_{i}_{,}* _{g}*) to the rank of the

*i*th observation for each QTL genotype group

*g*. In this approach, the censored observations are treated as the true failure times and an average rank is assigned to those observations. In the two-part approach, Broman considered a cure model in which the mice that are alive at the end of the study are regarded as cured while the survival times among the deaths follow a log-normal distribution.

We applied the proposed methods to these data, assuming a Weibull baseline hazard, and the results are shown in Table 5 and Figures 1 and and2.2. The threshold for the LOD score at the 5% genome-wide significance level based on the resampling approach is 3.43, which is close to 3.27, the threshold obtained by permutation for the NP approach of Broman (2003). Our results are fairly consistent with those of Broman (2003). We detect almost the same QTL on chromosomes 5, 13, and 15 except that we detect an additional QTL on chromosome 6 rather than on chromosome 1. The QTL on chromosome 5 appears to have a strong additive effect and the hazard ratio of the survival time with genotype *QQ vs. qq* is ~9.95. Genotypes *qq* and *Qq* at the QTL on chromosome 6 seem to have similar effects. The QTL on chromosome 13 appears to have both additive and dominant effects. The QTL on chromosome 15 appears to have a strong dominant effect. At most detected QTL locations, our LOD scores are larger than those of Broman's NP approach. This suggests that our approach may be more efficient in detecting QTL.

**Estimates of the QTL positions and QTL effects along with the maximum LOD scores for the data on survival time following infection with**

**Listeria monocytogenes****in 116 intercross mice**

*Listeria monocytogenes*in 116 intercross mice. The threshold pertains to the 5% genome-wide significance level under the resampling method.

_{10}

*P*-values for three QTL mapping methods for the data on survival time following infection with

*Listeria monocytogenes*in 116 intercross mice. The

*P*-values for the proposed method are based on 100,000 normal samples. In the region

**...**

Figure 1 shows the LOD curves from the three methods: proposed method, nonparametric method, and two-part model. The LOD scores from the two-part model are larger than those of the other two methods at some QTL locations because there are two more free parameters in the two-part model than in the other two methods. This will decrease the power to detect QTL since a larger threshold (*i.e.*, 4.91) is required. To evaluate different methods on a common scale, we converted the LOD curves to the estimated pointwise *P*-values. Figure 2 displays the values of −log_{10}*P* for chromosomes 1, 5, 6, 13, and 15. Comparisons with the nonparametric method and two-part model reveal that the proposed method yields more significant results on the above chromosomes except chromosome 1. Incidentally, the resampling method is ~100 times faster than the permutation method in this application.

To get some ideas about the adequacy of the Weibull distribution in describing the Listeria data, we fitted both the semiparametric model and the Weibull model at marker D5M357, which is close to the peak of the LOD score on chromosome 5. The estimated QTL effects from the two models are very similar. It would be worthwhile to develop formal goodness-of-fit methods for assessing the adequacy of the parametric survival model at the true QTL location.

## EXTENSIONS

In this section, we extend the single-QTL model to multiple QTL. The approach of interval mapping (IM) considers one putative QTL at a time. The QTL located elsewhere on the genome can have interfering effects, so that the estimators for the locations and effects of QTL may be biased and the power of detecting QTL may be compromised (Lander and Botstein 1989; Haley and Knott 1992; Zeng 1994). Boer *et al.* (2002) showed that the IM method fails to detect three interacting QTL with no main effects through simulation studies. A variety of approaches have been proposed for mapping multiple QTL. These methods can increase the power to detect QTL and reduce biases in the estimators of the QTL effects and locations. In this section, we consider mainly composite-interval mapping (CIM; Jansen 1993; Zeng 1993, 1994) and multiple-interval mapping (MIM; Kao *et al.* 1999) for censored traits.

### Composite-interval mapping:

The idea of CIM is to combine IM with multiple regression analysis in mapping QTL by conditioning on markers outside a region of interest to account for the effects of other QTL. To extend the original CIM model to censored traits, we consider the following proportional hazards model,

where *j* and *j* + 1 conform to two flanking markers bordering the putative QTL, and *M _{ik}* is the indicator variable for the marker genotype, which takes values −1, 0, and 1 for genotypes

*aa*,

*Aa*, and

*AA*, respectively. We may further enhance the model by considering the interaction effects between the putative QTL and controlling markers. Replacing λ(

*t*|

*G*) in (2) and (3) with (5), we obtain the complete-data and observed-data likelihood functions, respectively. As in the case of standard interval mapping, we can maximize the observed-data likelihood directly or apply the EM algorithm to obtain the MLEs. We can test H

_{i}_{0}:

**β**=

**0**at any position in the genome.

The CIM approach requires that the sample size be large relative to the number of markers included in the model. In practice, the sample size is generally not very large. Thus, Zeng (1994) suggested including in the model only those markers that are more or less evenly spaced in the genome or those preidentified markers that explain most of the genetic variation in the genome. This suggestion also applies to our setting.

### Multiple-interval mapping:

The MIM approach proposed by Kao *et al.* (1999) uses multiple marker intervals simultaneously to fit multiple putative QTL directly in the model. Consider *K* QTL, *Q*_{1}, … , *Q _{K}*, located at

*d*

_{1}, … ,

*d*in the genome. There are 3

_{K}*possible QTL genotypes. Some of the*

^{K}*K*QTL may exhibit epistasis. We formulate the effects of the

*K*QTL on the failure time through a proportional hazards model, such that, conditional on the joint genotype

**G**

*= (*

_{i}*G*

_{1}

*, … ,*

_{i}*G*), the hazard function of

_{Ki}*T*takes the form

_{i}where **x*** _{ij}* = (

*G*, 1 − |

_{ij}*G*|)

_{ij}*, δ*

^{T}*is an indicator variable for epistasis between*

_{jk}*Q*and

_{j}*Q*, and

_{k}**β**

*and*

_{j}**B**

*pertain to the main effects and epistatic effects, respectively. The variable δ*

_{jk}*indicates, by the values 1*

_{jk}*vs.*0, whether or not

*Q*and

_{j}*Q*interact. Given the marker data

_{k}**M**

*for the*

_{i}*i*th subject and assuming no crossover interference, we may calculate π

_{i}_{,}

*(*

_{g}*g*= 1, … , 3

*), the conditional probabilities of the 3*

^{K}*possible genotypes of the*

^{K}*K*QTL. The complete-data likelihood takes the same form as Equation 2 except that the summation of

*g*is now over (1, … , 3

*). To obtain the MLEs and LOD scores, we can again apply the EM algorithm.*

^{K}Since the true number and locations of the QTL are unknown, model selection is a critical issue in the MIM approach. Kao *et al.* (1999) suggested stepwise and chunkwise selection with the likelihood-ratio test statistic as a selection criterion to identify QTL, to separate linked QTL, and to analyze epistasis between QTL. Broman and Speed (2002) developed a modified Bayesian information criterion (Schwarz 1978) for model selection. When many QTL are included in the MIM model, the computation can be very intensive. Sen and Churchill (2001) reduced the computation burden of MIM by replacing the EM algorithm with a Monte Carlo algorithm. All these strategies can be applied to our setting.

## DISCUSSION

We have described our methods in the context of an F_{2} population. All the proposed methods can be easily generalized to other crosses such as BC, F_{3}, or even more complicated crosses such as combined crosses (Zou *et al.* 2001). We may also extend our methods to accommodate covariates, such as block factors in an agriculture field trial or cage number in a mouse cross.

Symons *et al.* (2002) formulated the effects of the QTL on the failure time through the semiparametric proportional hazards model. They utilized a variant of the EM algorithm developed by Lipsitz and Ibrahim (1998), in which Monte Carlo simulation is used to approximate the conditional expectation of the complete-data partial-likelihood score function given the observed data. The solution to this conditional expectation is not a maximum partial-likelihood estimator, so that the method is not statistically efficient. The Monte Carlo simulation is time-consuming. There is no formal justification for the method of Lipsitz and Ibrahim (1998) or that of Symons *et al.* (2002). Our method is computationally much simpler than Monte Carlo simulation. It is based on the maximum-likelihood estimator and is thus statistically efficient. Furthermore, we have established the theoretical properties of our method and assessed its empirical performance through simulation studies. The method of Symons *et al.* (2002) has the advantage that the baseline hazard function is unspecified.

The proposed resampling approach to determining genome-wide significance is applicable to arbitrary genetic maps and accommodates missing marker data and dominant markers. For the CIM and MIM models, no analytical thresholds are available and the permutation method is extremely time-consuming. The proposed resampling approach can be applied to the CIM and MIM models since all the relevant formulas have been presented for an arbitrary likelihood. For the MIM method, the resampling approach produces appropriate thresholds for testing each putative QTL given the others, with or without adjustment for the fact that multiple QTL are tested simultaneously.

We have written a computer program in C to implement the proposed method. This program is available from the authors upon request.

## APPENDIX A:

#### COMPUTATIONS OF MLE AND COVARIANCE MATRIX

In this section, we present the formulas for the M-step of the EM algorithm under the Weibull model. In addition, we provide the observed information matrix as well as a consistent estimator of the covariance matrix of MLE . In the M-step of the (*k* + 1)th iteration, the first and second derivatives of the expected value of the log-likelihood given the observed data and current estimate ^{(}^{k}^{)} are given by

where

**g** = (*g*, 1 − |*g*|)* ^{T}*, and , which is the cumulative hazard function conditional on the QTL genotype

*g*. Then, we can apply the Newton-Raphson algorithm to update the current estimate with the new maximizer

^{(}

^{k}^{+1)}.

The observed information matrix is given by

where for a column vector **a**, **a**^{2} denotes the matrix **aa*** ^{T}*. Thus, a consistent estimator of the covariance matrix of is given by the inverse of the observed information matrix evaluated at ,

*i.e.*, .

## APPENDIX B:

#### ASYMPTOTIC PROPERTIES OF SCORE AND LIKELIHOOD-RATIO STATISTICS

In this section, we show that LR(*d*) is asymptotically χ^{2}_{2}-distributed under H_{0} and provide the necessary ingredients for deriving the thresholds. Let **U**(**θ**; *d*) and **I**(**θ**; *d*) be the observed-data score function and information matrix at location *d* with the following partitions to conform with the partition (**β**, **γ**) of **θ**,

and

Under H_{0}, *n*^{−1}**I**() converges to **∑**(*d*). Denote

It can be shown that asymptotically (Cox and Hinkley 1974, Sect. 9.3). Through Taylor series expansions, asymptotically, where and **θ**_{0} = (**0**, **γ**_{0}). The replacements of the unknown quantities in **Ũ*** _{i}*(

**θ**

_{0};

*d*) with their sample estimators yield .

Let . Then **z**(*d*) converges to a normal distribution with mean **0** and an identity 2 × 2 covariance matrix. Thus, LR(*d*) is asymptotically distributed as χ^{2}_{2} under H_{0}.

## APPENDIX C:

#### ANALYTICAL APPROXIMATIONS OF THRESHOLDS

We show in this appendix that, for infinitely dense markers, the null distribution of LR(*d*) can be approximated by an Ornstein-Uhlenbeck process. Under H_{0}, **∑**(*d*) does not depend on *d*. Let *d*_{1} and *d*_{2} denote two points on the chromosome, and *r* be the recombination fraction corresponding to the genetic distance |*d*_{1} − *d*_{2}|. Under the assumption of no crossover interference, it is easy to show that the correlation between **g**(*d*_{1}) and **g**(*d*_{2}) is given by

provided that *r* is small. Since **U**_{β,}* _{i}*(

**θ**;

*d*) = (Δ

*− Λ*

_{i}_{0}(

*Y*))

_{i}**g**

*(*

_{i}*d*) under H

_{0}, we have

The above result implies that *z*_{1}(*d*) and *z*_{2}(*d*), the first and second components of **z**(*d*), are approximately independent Ornstein-Uhlenbeck processes with means zero and variances 1 − 2*r* and 1 − 4*r*, respectively. By the arguments of Dupuis and Siegmund (1999), the tail distribution of sup* _{d}*LR(

*d*) under H

_{0}satisfies

where Δ is the average marker distance (in morgans), *C* is the number of chromosomes, *L* is the total length of the genome (in morgans), and *v* = *v*(*a*(6Δ)^{1/2}), the definition of which can be found in Siegmund (1985). When Δ = 0, the above formula reduces to that of Lander and Botstein (1989). These results imply that all the analytical thresholds for the normal trait can be applied to our case.

## References

- Boer, M. P., C. J. Braak and R. C. Jansen, 2002. A penalized likelihood method for mapping epistatic quantitative trait loci with one-dimensional genome searches. Genetics 162 951–960. [PMC free article] [PubMed]
- Broman, K. W., 2003. Mapping quantitative trait loci in the case of a spike in the phenotype distribution. Genetics 163 1169–1175. [PMC free article] [PubMed]
- Broman, K. W., and T. P. Speed, 2002. A model selection approach for the identification of quantitative trait loci in experimental crosses (with discussion). J. R. Stat. Soc. B 64 731–775.
- Churchill, G. A., and R. W. Doerge, 1994. Empirical threshold values for quantitative trait mapping. Genetics 138 963–971. [PMC free article] [PubMed]
- Cox, D. R., and D. V. Hinkley, 1974
*Theoretical Statistics.*Chapman & Hall, London. - Davies, R. B., 1977. Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 64 247–254. [PubMed]
- Davies, R. B., 1987. Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 74 33–43. [PubMed]
- Dempster, A. P., N. M. Laird and D. B. Rubin, 1977. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 39 1–38.
- Doerge, R. W., Z-B. Zeng and B. S. Weir, 1997. Statistical issues in the search for genes affecting quantitative traits in experimental populations. Stat. Sci. 12 195–219.
- Dupuis, J., and D. Siegmund, 1999. Statistical methods for mapping quantitative trait loci from a dense set of markers. Genetics 151 373–386. [PMC free article] [PubMed]
- Ferreira, M. E., J. Satagopan, B. S. Yandell, P. H. Williams and T. C. Osborn, 1995. Mapping loci controlling vernalization requirement and flower time in
*Brassica napus.*Theor. Appl. Genet. 90 727–732. [PubMed] - Haley, C. S., and S. A. Knott, 1992. A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69 315–324. [PubMed]
- Hilbert, P., K. Lindpaintner, J. S. Beckmann, T. Serikawa, F. Soubrier
*et al.*, 1991. Chromosomal mapping of two genetic loci associated with blood-pressure regulation in hereditary hypertensive rats. Nature 353 521–529. [PubMed] - Jacob, H. J., K. Lindpaintner, S. E. Lincoln, K. Kusumi, R. K. Bunker
*et al.*, 1991. Genetic mapping of a gene causing hypertension in the stroke-prone spontaneously hypertensive rat. Cell 67 213–224. [PubMed] - Jansen, R. C., 1993. Interval mapping of multiple quantitative trait loci. Genetics 135 205–211. [PMC free article] [PubMed]
- Kalbfleish, J. D., and R. L. Prentice, 2002
*The Statistical Analysis of Failure Time Data*, Ed. 2. Wiley, Hoboken, NJ. - Kao, C. H., Z-B. Zeng and R. D. Teasdale, 1999. Multiple interval mapping for quantitative trait loci. Genetics 152 1203–1216. [PMC free article] [PubMed]
- Lander, E. S., and D. Botstein, 1989. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121 185–199. [PMC free article] [PubMed]
- Lehmann, E. L., 1975
*Nonparametrics: Statistical Methods Based on Ranks*. Holden-Day, San Francisco. - Lin, D. Y., L. J. Wei and Z. Ying, 1993. Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika 80 557–572.
- Lipsitz, S. R., and J. G. Ibrahim, 1998. Estimating equations with incomplete categorical covariates in the Cox model. Biometrics 54 1002–1013. [PubMed]
- Lynch, M., and B. Walsh, 1998
*Genetics and Analysis of Quantitative Traits.*Sinauer, Sunderland, MA. - Rebai, A., B. Gofinet and B. Mangin, 1994. Approximate thresholds of interval mapping test for QTL detection. Genetics 138 235–240. [PMC free article] [PubMed]
- Schwarz, G., 1978. Estimating the dimension of a model. Ann. Stat. 6 461–464.
- Sen, S., and G. A. Churchill, 2001. A statistical framework of quantitative trait mapping. Genetics 159 371–387. [PMC free article] [PubMed]
- Shepel, L. A., H. Lan, J. D. Haag, G. M. Brasic, M. E. Gheen
*et al.*, 1998. Genetic identification of multiple loci that control breast cancer susceptibility. Genetics 149 289–299. [PMC free article] [PubMed] - Siegmund, D., 1985
*Sequential Analysis: Tests and Confidence Intervals*. Springer-Verlag, New York. - Symons, R. C., M. J. Daly, J. Fridlyand, T. P. Speed, W. D. Cook
*et al.*, 2002. Multiple genetic loci modify susceptibility to plasmacytoma-related morbidity in E*μ*-v-abl transgenic mice. Proc. Natl. Acad. Sci. USA 99 11299–11304. [PMC free article] [PubMed] - Zeng, Z-B., 1993. Theoretical basis of precision mapping of quantitative trait loci. Proc. Natl. Acad. Sci. USA 90 10972–10976. [PMC free article] [PubMed]
- Zeng, Z-B., 1994. Precision mapping of quantitative traits loci. Genetics 136 1457–1468. [PMC free article] [PubMed]
- Zou, F., B. S. Yandell and J. P. Fine, 2001. Statistical issues in the analysis of quantitative traits in combined crosses. Genetics 158 1339–1346. [PMC free article] [PubMed]

**Genetics Society of America**

## Formats:

- Article |
- PubReader |
- ePub (beta) |
- PDF (128K)

- Semiparametric methods for mapping quantitative trait loci with censored data.[Biometrics. 2005]
*Diao G, Lin DY.**Biometrics. 2005 Sep; 61(3):789-98.* - Interval mapping methods for detecting QTL affecting survival and time-to-event phenotypes.[Genet Res. 2005]
*Moreno CR, Elsen JM, Le Roy P, Ducrocq V.**Genet Res. 2005 Apr; 85(2):139-49.* - A note on QTL detecting for censored traits.[Genet Sel Evol. 2006]
*Fang Y.**Genet Sel Evol. 2006 Mar-Apr; 38(2):221-9. Epub 2006 Feb 23.* - Linkage mapping of quantitative trait loci in humans: an overview.[Ann Hum Genet. 2002]
*Ghosh S, Reich T, Majumder PP.**Ann Hum Genet. 2002 Nov; 66(Pt 5-6):431-8.* - QTL mapping and the genetic basis of adaptation: recent developments.[Genetica. 2005]
*Zeng ZB.**Genetica. 2005 Feb; 123(1-2):25-37.*

- Evaluating markers in selected genes for association with functional longevity of dairy cattle[BMC Genetics. ]
*Szyda J, Morek-Kopeć M, Komisarek J, Żarnecki A.**BMC Genetics. 1230* - Bayesian model selection for characterizing genomic imprinting effects and patterns[Bioinformatics. 2010]
*Yang R, Wang X, Wu Z, Prows DR, Lin M.**Bioinformatics. 2010 Jan 15; 26(2)235-241* - Bayesian robust analysis for genetic architecture of quantitative traits[Bioinformatics. 2009]
*Yang R, Wang X, Li J, Deng H.**Bioinformatics. 2009 Apr 15; 25(8)1033-1039* - Hierarchical Generalized Linear Models for Multiple Quantitative Trait Locus Mapping[Genetics. 2009]
*Yi N, Banerjee S.**Genetics. 2009 Mar; 181(3)1101-1113* - Mapping Quantitative Trait Loci From a Single-Tail Sample of the Phenotype Distribution Including Survival Data[Genetics. 2007]
*Sillanpää MJ, Hoti F.**Genetics. 2007 Dec; 177(4)2361-2377*

- PubMedPubMedPubMed citations for these articles
- TaxonomyTaxonomyRelated taxonomy entry
- Taxonomy TreeTaxonomy Tree

- Mapping Quantitative Trait Loci With Censored ObservationsMapping Quantitative Trait Loci With Censored ObservationsGenetics. Nov 2004; 168(3)1689PMC

Your browsing activity is empty.

Activity recording is turned off.

See more...