• We are sorry, but NCBI web applications do not support your browser and may not function properly. More information
Logo of geneticsGeneticsCurrent IssueInformation for AuthorsEditorial BoardSubscribeSubmit a Manuscript
Genetics. Aug 2007; 176(4): 2529–2540.
PMCID: PMC1950652

Bayesian Multiple Quantitative Trait Loci Mapping for Complex Traits Using Markers of the Entire Genome

Abstract

A Bayesian methodology has been developed for multiple quantitative trait loci (QTL) mapping of complex binary traits that follow liability threshold models. Unlike most QTL mapping methods where only one or a few markers are used at a time, the proposed method utilizes all markers across the genome simultaneously. The outperformance of our Bayesian method over the traditional single-marker analysis and interval mapping has been illustrated via simulations and real data analysis to identify candidate loci associated with colorectal cancer.

TREMENDOUS advances have been achieved over the last decade in the identification of genes underlying many heritable traits with the greatest progress limited almost entirely to those with Mendelian inheritance patterns and well-defined quantitative traits that have relatively large and consistent effects. However, many common pathologies afflicting the greatest number of individuals are not due to simple Mendelian traits. Recent emphasis has been shifted to map complex traits, which are caused by the sum of a complex interaction between gene products and environmental stimuli. Complicating the analysis of these types of traits is the prediction that many are also controlled by genes that have small effects individually, but whose cumulative action is the cause of significant interindividual variation. Due to the complex and often subtle nature of phenotypic variation, traits with complex etiologies have proven far more resistant to genetic analysis. Most of the available quantitative trait loci (QTL) mapping methods map only one or a few QTL at a time and therefore are not efficient for mapping such complex traits. Forward and stepwise selection procedures have been proposed in searching for multiple QTL. Though simple, these methods have their limitations, such as the uncertainty of number of QTL, the sequential model building that makes it unclear how to assess the significance of the associated tests, etc.

To overcome this problem, Bayesian QTL mapping (Satagopan et al. 1996; Sillanpaa and Arjas 1998; Stephens and Fisch 1998; Yi and Xu 2000, 2001; Hoeschele 2001) has been developed, in particular, for detection of multiple QTL by treating the number of QTL as a random variable and specifically modeling it using reversible-jump Markov chain Monte Carlo (MCMC) (Green 1995). Due to the change of dimensionality, care must be taken in determining the acceptance probability for such a dimension change, which in practice may not be handled correctly (Ven 2004). To avoid such a problem by the uncertain dimensionality of parameter space, Yi (2004) and Xu (2003) proposed a unified Bayesian framework to identify multiple QTL using all markers across the genome. The method of Xu (2003) is based on a shrinkage idea to simultaneously evaluate marker effects of the entire genome under the random regression model by assigning each marker a normal prior with mean 0 and an effect-specific variance. The effect-specific prior variance was further assigned a vague prior such that the variance was estimated from the data. Those markers that have no effect on the trait will be essentially shrunk down to 0. Similarly, Yi (2004) adapted the stochastic search variable selection (SSVS) approach of George and McCulloch (1993) to the QTL mapping framework. SSVS is a variable selection method that keeps all possible variables in the model and limits the posterior distribution of nonsignificant variables in a small neighborhood of 0 and therefore eliminates the need to remove nonsignificant variables from the model. In principle, Xu (2003) and Yi (2004)'s methods are similar and both have the ability to control the genetic variances of a large number of QTL where each has small effect (Wang et al. 2005). Due to the simplicity of Xu (2003), we decide to go with the unified shrinkage method in this article.

Some quantitative traits do not have continuous measurements, but rather are qualitative traits with, for example, binary measurements. This research is mainly motivated by a colorectal cancer susceptibility study. One hundred thirty-five backcross mice of A/J × SPRET/EiJ F1 (ASP F1) hybrids to A/J were given intraperitoneal injections of the alkylating carcinogen azoxymethane where ~40% of the mice developed tumors. The goal of the study was to identify susceptibility genes for colorectal cancer. Mapping genes for such binary traits is more complicated than that for continuous traits as current QTL mapping methods are mainly limited to test association between a marker and a binary trait with simple chi-square tests. As an alternative, Hackett and Weller (1995), Xu and Atchley (1996), and Visscher et al. (1996) proposed interval mapping procedures for complex binary disease traits assuming that the binary traits are controlled by an underlying normally distributed liability. The quantitative liability is then modeled by the usual quantitative genetics model. Due to the apparent success of the unified Bayesian framework to identify multiple QTL using all markers across the genome for normally distributed quantitative traits, the systematic investigation of the unified Bayesian mapping for complex binary traits would be interesting and useful. In this article, we propose the Bayesian methodology in mapping complex binary traits and investigate its performance via extensive simulations. Detailed convergence diagnostics are also presented.

STATISTICAL METHODS

The liability and threshold model:

In the liability model (Wright 1934a,b; Falconer 1965; Falconer and Mackay 1996), a binary trait is assumed to be controlled by a latent liability variable, which is normally distributed. That is, suppose di and yi (i = 1,…, n) are the binary phenotype and the underlying liability, respectively, of the ith individual; then the threshold model assumes that there is a fixed threshold in the scale of liability, t, which determines di. Specifically, di = 1, if yi > t; otherwise di = 0.

Marker analysis:

Here we allow only QTL to be located on markers. An extension to allow QTL to be located between markers is discussed in the following section.

For backcross QTL data, we can describe the liability yi by the following linear model,

equation M1
(1)

where K is the total number of markers; μ is the overall population mean, xij is a dummy variable indicating the genotype of the jth marker of individual i with xij = 1 or 0 if the marker genotype is homozygote or heterozygote, respectively, aj is the partial regression coefficient, and ei is the residual error with a distribution of equation M2. Note, aj describes the genetic effect of the jth marker that partly absorbs the effects of all QTL located between markers j − 1 and j + 1, as shown by Zeng (1993).

Similarly, for an F2 population, the liability yi can be related to QTL as

equation M3
(2)

where xij and wij are defined as equation M4 and wij = −1 for genotype AA, xij = 0 and wij = 1 for Aa, and equation M5 and wij = −1 for aa with AA, Aa, and aa referring to the three marker genotypes at each locus; aj and bj are the additive and dominance effects of the jth marker; and μ and ei are defined the same as in the backcross model.

Since the latent liability y is unobserved, the mean μ and residual variance equation M6 can be set arbitrarily. For model identifiability, we set t = 0 and σe = 1 throughout the presentation.

To employ the frequentist method via a likelihood function requires calculation of the conditional probability, pi, of di = 1 given {xij, j = 1, 2,…, K}, which can be approximated by the logistic model

equation M7
(3)

(Xu and Atchley 1996), with equation M8 for the backcross, equation M9 for the F2, and equation M10. Therefore the log likelihood can be approximated as

equation M11
(4)

where pi is defined above. The maximum-likelihood estimators (MLE) can be obtained by directly maximizing L. However, in practice, the number of markers is often comparable to or even larger than the number of observations. Under such circumstances, the maximum-likelihood method will have very low efficiency or even fail.

Below we describe the Bayesian method which overcomes such a problem. We describe here only the basis for the backcross population. With a minor modification, it can be easily extended to an F2 design.

Bayesian modeling:

In the Bayesian framework, both the data and the parameters are treated as random, with random variables being classified as observed and unobserved. The goal of Bayesian analysis is to combine the prior distribution of the unobserved variables with the observed data to obtain a posterior distribution of the unknown variables. The observed data in our QTL setup are the binary responses equation M12, and the marker genotypes X = {xij}, for i = 1, 2,…, n and j = 1, 2,…, K. Our unobserved variables are the liability equation M13, the regression coefficients, equation M14, and the variances associated with equation M15.

By Bayes' theorem, the joint posterior density of the parameters {Y, B, V}, given the observed data {S, X}, is

equation M16
(5)

Inference is performed conditional on X and we suppress this conditioning notation for the remainder of the article.

We assume that the joint prior distribution of p(B, V) is

equation M17
(6)

Specifically, we choose equation M18, and equation M19, where [var phi](x, σ2) is the density function of normal distribution with mean zero and variance σ2.

The first term in (5) is the conditional distribution of the data given all the unknowns, which equals

equation M20
(7)

where I(A) is an indicator function, taking the value of 1 if condition A is true and 0 otherwise. Note that p(S | Y, B, V) = p(S | Y) because S depends solely on Y. The second term in (5) is the conditional distribution of the liability. Because the liability is normally distributed and independent of each other given other variables, we have

equation M21
(8)

A MCMC method is used to generate the joint posterior distribution of all unknowns given in (5). Let ψ = {Y, B, V} be the collection of unknown variables. For a given j, if the conditional distribution of ψj given the rest of variables has a known standard density, new ψj are drawn via Gibbs samples. Otherwise, we use the Metropolis–Hastings algorithm to draw a new sample equation M22 according to a proposed density equation M23. equation M24 will be accepted with probability min(1, r), where

equation M25
(9)

Here a negative subscript (− j) denotes a vector with the jth element removed.

We first initialize ψ as follows. The overall mean μ and the genetic effects of all QTL equation M26 are initialized with zero while the variances equation M27 are initialized with 1. Given the initial values of (B, V), we generate yi from the corresponding truncated normal distributions,

equation M28
(10)
equation M29
(11)

where Φ(x) is the standardized normal distribution function. We describe below the steps for a single MCMC iterative sweep t + 1. Superscripts (t) signify the current variables, and we begin with the initialized values for t = 0:

  • Step 1. Updating μ: μ(t+1) is drawn from the posterior normal distribution with mean equation M30 and variance 1/n.
  • Step 2. Updating aj for j = 1,…, K: equation M31 is drawn from the posterior normal distribution with mean
    equation M32
    (12)
    and variance
    equation M33
    (13)
  • Step 3. Updating equation M34 for j = 1,…, K: equation M35 is sampled from a scale-inverted χ2-distribution, equation M36, where equation M37 is a χ2-distribution with 1 d.f.
  • Step 4. Updating yi for i = 1,…, n: equation M38 is drawn from a truncated normal distribution (10) if di = 1 or (11) if di = 0.

After this round of sampling, we have completed one sweep of the MCMC and are ready to continue our sampling for the next round by repeating steps 1–4 with the new ψ. When the chain converges, the sampled parameters approximately follow the joint posterior distribution. From the joint posterior sample, one can easily obtain the desired Bayesian estimates, such as the posterior means and variances.

Simulations and real data analysis:

The performance of the proposed Bayesian method is evaluated by analyzing a set of simulated backcross data.

A single chromosome with a length of 15 morgans is simulated. On this chromosome, 301 evenly spaced markers (300 intervals, each 5 cM long) are located and two sets of QTL are simulated. In the first setup, four QTL are put along the genome with positions and effects listed in Table 1. Here the QTL are only loosely linked and each QTL explains 20% of the total liability variance. To investigate the ability of the method to identify small QTL effects and to separate closely linked QTL, 11 QTL are evenly placed on the first half of the chromosome, i.e., from 0 to 750 cM in the second setup. The QTL are numbered from 1 to 11 with effects in descending order: equation M39. The first QTL explains 43.54% of the total liability variance and the second QTL explains half the size of the first one and so on. The proportion of the total liability variance explained by the 11 QTL is H = 89.12%. To demonstrate the advantages of the proposed method over the traditional regression method, we simulate 300 individuals, which is smaller than the number of markers; traditional regression analysis fails if all markers are included as covariates. Further, we have simulated 500 individuals to see how sample size affects QTL mapping. For each simulated datum, the sampled parameter values from the first 50,000 sweeps of the chain (burn-in period) were discarded from the analysis. Then we performed an additional 500,000 MCMC sweeps. After the burn in, the final sample of observations was selected every 50 sweeps to reduce serial correlation, resulting in 10,000 samples from the posterior.

TABLE 1
QTL parameters and their estimates obtained from the simulated data

Table 1 shows some summary statistics and Figure 1 shows the estimated QTL-effect profiles (the posterior means of the marker effects) against marker positions along the genome for normal data (Figure 1a) and binary data (Figure 1b) from 10,000 sample states under setup one with sample size 300.

Figure 1.
Bayesian estimates of QTL effects in the simulated backcross family under setup one. (a) Results from (underlying) normal liability data and (b) results from binary data. “x” refers to the simulated QTL position (x-axis) and effect (y ...

Both profiles show clear signals of QTL at the simulated positions. In most cases the estimated QTL effects are close to the true values (Table 1). However, the analysis based on the binary data has reduced efficiency relative to the analysis on the normal liability data directly. This is expected because of the reduced information in binary data. The histograms of the posterior distribution are also presented in Figure 2. The posterior distributions are nearly normal shaped.

Figure 2.
Histograms for the posterior QTL effects at the four simulated markers for binary data (left) and normal data (right).

The results for setup two where some QTL are closely linked (sample size = 300) are presented in Figure 3. For comparison, we also perform a simple chi-square test at each marker. From Figure 3a, where −log10(P-value) of the single-marker analysis is plotted, we see many markers that are significantly associated with the simulated trait. Because the single-marker analysis fit only one marker at a time, those markers that are correlated with the simulated QTL will also be highly associated with the trait. On the other hand, in the Bayesian analysis result (Figure 3b), five large QTL are quite clear and four of them are located at the simulated positions with effects also close to the true values. To improve the estimate power, we simulated 500 backcross individuals on the basis of the same setup as above and applied both single-marker and Bayesian analyses on them. The results are depicted in Figure 4. Clearly, the mapping efficiency has been improved quite a lot when sample size increases from 300 to 500 (Figure 4b), where six clear QTL are given at the true positions with effects very close to true values. The smallest QTL that the Bayesian method can pick up in this example explains 2.72% of the liability variance.

Figure 3.
Estimates of QTL effects in the simulated backcross family (n = 300) with 11 QTL of different genetic effects. (a) Single-marker analysis based on a chi-square test; (b) Bayesian analysis. “x” refers the simulated QTL position ...
Figure 4.
Estimates of QTL effects in the simulated backcross family (n = 500) with 11 QTL of different genetic effects. (a) Single-marker analysis based on chi-square test; (b) Bayesian analysis. “x” refers the simulated QTL position ( ...

Real data analysis:

Azoxymethane (AOM) is an alkylating carcinogen that causes strain-specific susceptibility to the development of colorectal tumors in mice. Susceptible mice strains treated with AOM exhibit genetic and pathologic changes similar to those in nonfamilial or sporadic human colorectal cancer. We previously characterized 32 inbred mouse strains for their susceptibility to AOM-induced colorectal tumors and identified the A/J strain as having one of the highest sensitivities, with nearly 100% of mice developing colorectal tumors (our unpublished results). In contrast, the genetically distinct strain from the related Mus spretus species, SPRET/EiJ, was found to be completely resistant.

For the current study, we set up a backcross of ASP F1 hybrids to A/J to generate backcross progeny. Two- to 3-month-old mice were given intraperitoneal injections of AOM at 10 mg/kg of body weight once a week for 4 weeks. Subsequently, mice were killed by carbon dioxide asphyxiation 20 weeks after the last AOM dose. A tail clip, a liver sample, and the colon were dissected from each mouse. Each colon was gently flushed with phosphate buffer saline solution and cut open along its longitudinal axis. The position and size of tumors were recorded before excising any colon tumors for histological verification. Tails, livers, and colons were then placed into labeled Eppendorf tubes and stored at −80°. DNA was isolated from liver or tail samples from each mouse by phenol-chloroform extraction. One hundred thirty-five backcross mice were genotyped using the Illumina SNP genotyping platform. Two hundred fifty-four informative markers were used that distinguished the A/J strain from the SPRET/EiJ strain. ASP F1 mice were found to be nearly tumor free with ~5% of mice developing a single colorectal tumor. Analysis of the 135 mice in the backcross population revealed that 40% of mice developed AOM-induced colorectal tumors (Figure 5).

Figure 5.
Number of tumors in two parental strains A/J and SPRET/EiJ, plus ASP F1 and backcross strains.

Since ~5% of ASP F1 mice still developed a single colorectal tumor, we recategorized the backcross animals into two groups, with one group being tumor free or having a single tumor while the second group was backcross mice developing more than one tumor. The analysis was based on this recategorized binary trait and is summarized in Figure 6. Due to the small sample size, no significant results were identified. However, one region on chromosome 6 is promising. This region encompasses a previously detected susceptibility locus for AOM-induced colorectal tumors (Ruivenkamp et al. 2003). We are currently working on collecting another 140 backcross mice to confirm the findings on chromosome 6. For comparison, the analysis based on a single chi-square test was also performed. Again, the smallest P-value is on chromosome 6, which is consistent with the Bayesian analysis, but it does not survive the genomewide 5% significance level either. Another interesting region indicated by our analysis is located on chromosome 11, which deserves further investigation as well.

Figure 6.
(a) Bayesian analysis; (b) single-marker analysis based on a chi-square test. Vertical dotted lines are used to separate the chromosomes. The 95% confidence intervals are bracketed by x's and shown at the two positions with the highest posterior means. ...

Interval mapping:

In this section, we extend the QTL mapping on markers to a more general situation where QTL are allowed to be located at any position within marker intervals.

Bayesian modeling:

Wang et al. (2005) developed a method for normal phenotypes that allows a QTL to take a position varying within a marker interval rather than fixed at a marker. They assumed that each marker interval is associated with a QTL, and thus the number of putative QTL is identical to the number of intervals. In their model, the conditional distribution of the QTL genotype given the marker genotypes is derived from a Markov model under the assumption of no segregation inference. The main difference between interval mapping and marker analysis is that for interval mapping, the QTL positions and QTL genotypes are unobserved, while for marker analysis, the QTL positions are fixed at markers and also QTL genotypes are observed. A major advantage of the Bayesian analysis is that the unobserved QTL positions and QTL genotypes can be treated as random variables and sampled via the MCMC procedure. Therefore, for interval mapping, we need only to add additional steps to the MCMC procedure described above for sampling the QTL positions and QTL genotypes.

For interval mapping, the observed variables are the same as those of the marker analysis, which include S and the marker genotypes M = {mij} (i = 1, 2,…, n; j = 1, 2,…, K). (Here we change the notation for the marker genotypes from X to M since we reserve X for QTL genotypes. In the marker analysis, the QTL are markers themselves and therefore we refer to marker genotypes as X.) For unobserved variables, besides those in the marker analysis {Y; B; V}, there are additional variables, which include the QTL positions, λ = {λj} for j = 1, 2,…, p; and the QTL genotypes X = {xij} for i = 1, 2,…, n and j = 1, 2,…, p, where p is the total number of intervals, which in general ≤K − 1, and the equality holds only if one chromosome is considered in the analysis.

Now the joint posterior density of the parameters {Y, λ, X, B, V}, given the observed data {S, M}, is

equation M40
(14)

Again, inference is performed conditional on M and we suppress this conditioning notation for the remainder of the article.

The last term in (14) is the joint prior, and we assume

equation M41
(15)

The prior distributions for μ, aj, and equation M42 are the same as in the marker analysis while for the QTL positions and genotypes, we choose the following prior distributions, pj) = 1/dj and xij = −1 or 1 with probability equation M43 each for i = 1, 2,…, n, j = 1, 2,…, p, where dj is the length of the jth marker interval.

To sample unobserved variables, first we sample Y, B, V the same way as described in steps 1–4 for the marker analysis except that all xij, which are known in the marker analysis, should be replaced by equation M44, samples from the tth iteration. Then the following extra steps are needed to sample λ, X:

  • Step 5. Update QTL genotypes: The QTL genotype equation M45 (i = 1, 2,…, n; j = 1, 2,…, p) is updated one individual and one locus at a time, on the basis of their flanking marker information. Specifically, equation M46 is sampled from the conditional probability distribution p(x), which equals
    equation M47
    where equation M48 and equation M49 are the two flanking markers of the jth QTL in the individual i; equation M50 is the conditional probability of the QTL (at equation M51) given the two flanking markers; equation M52 and the probability
    equation M53
    (16)
  • Step 6. Update QTL positions: equation M54 is sampled via a Metropolis–Hastings approach since there is no closed form for the conditional posterior probability density of a QTL position. We first sample a new position equation M55 uniformly from the neighborhood of equation M56 with δ being a tuning parameter, which we take a value of 2 cM in subsequent simulations. equation M57 will then be accepted or rejected according to the probability min(1, α), where
    equation M58
    (17)
    If neither equation M59 nor equation M60 is within δ-distance away from the ends of the interval, equation M61. However, if equation M62 or/and equation M63 are within δ-distance from one end of the interval, then
    equation M64
    (18)
    where equation M65 is the distance of equation M66 from the nearest end of the interval; similarly, equation M67 is the distance of equation M68 from the nearest end of the interval.

After these two extra steps for sampling λ, X, one MCMC iteration is done and we then continue our sampling for the next round by repeating steps 1–6 with the new ψ.

Simulation studies:

For interval mapping, a backcross population of 500 individuals was simulated. We investigated a single large chromosome of 24 M that was covered by 121 evenly spaced markers (120 intervals, each 20 cM long). Two sets of QTL are simulated. In the first setup, four QTL are placed at 50, 450, 850, and 1250 cM with effects 1, −1, 1, and −1, respectively. The second simulation setups are described in Table 2. Figure 7 shows the estimated QTL effect against QTL positions along the genome for continuous and binary data for setup one. The QTL signals are clearly shown at the simulated positions for both cases. The QTL effects are almost equal to the true values in the normal case but a little larger than the true values in the binary case. Figure 8 presents the QTL positions and effects along with the real positions and effects for continuous and binary data, respectively, for setup two. It is shown that the Bayesian shrinkage method performs better for normal data than for binary data, especially for those QTL that locate in the region beyond 100 cM and have small effects. Parameters of closely linked QTL are hard to estimate. For example, the following QTL pairs, (5, 6), (10, 11), and (16, 17), are so tightly linked that they are inseparable in both normal and binary data cases. QTL 7 and 8, which are closely linked and have the same size but with opposite directions, are not detected in either data set. Nevertheless, for binary data, our method can clearly localize separable QTL and QTL with large effects. A much denser marker set along with a larger population size is required to resolve QTL with tight linkage and small effect.

Figure 7.
Bayesian estimates of QTL effects in the simulated backcross family with four QTL for continuous and binary data. “x” refers the simulated QTL position (x-axis) and effect (y-axis). The 95% confidence interval is bracketed by two horizontal ...
Figure 8.
Bayesian estimates of QTL effects in the simulated backcross family x with 20 QTL for continuous and binary data. The heights of the solid lines correspond to the posterior means. “x” refers to the simulated QTL position (x-axis) and effect ...
TABLE 2
QTL parameters and their positions

Convergence diagnostics tests:

Two convergence diagnostics tests are performed using convergence diagnosis and output analysis (CODA) software (Best et al. 1995) to make sure that the sample is representative of the underlying stationary distribution. The parameters we consider here include μ and the effect of four markers at which the true QTL are located. Figure 9 shows the autocorrelation function against the lag for the setup one simulation described in the Marker analysis section, where the number of markers is larger than the number of samples. The autocorrelation appears to be very small after lag 20 for all five parameters. The small autocorrelation shows no indication of slow convergence for the chain. For the same data set, we run five parallel Markov chains with the same length 10,000 using very different initial values to test Gelman and Rubin statistics. Figure 10 shows plots for the 50 and 97.5% quantiles of the sampling distribution for the shrink factor for above five parameters. The plots in Figure 10 show that both the median and the 97.5% quantiles stabilize around the expected value 1.0, indicating that the Markov chains converge to their limiting distribution.

Figure 9.
Autocorrelation function for the study of setup one. Here b0 represents μ and b1–b4 represent the effects of four markers at which the true QTL are located.
Figure 10.
Plot of Gelman and Rubin's shrink factor for the study of setup one. Here b0 represents μ and b1–b4 represent the effects of four markers at which the true QTL are located.

DISCUSSION

The Bayesian method developed in Xu (2003) has been extended to complex binary traits assuming the threshold model. The results outperform the single-marker chi-square test. The major advantage of using the threshold model is that once the underlying liability is generated, all other unknowns have conditional posterior distributions identical to those already given in the Bayesian analysis of normal data. This methodology can be easily generalized to multiple-ordered categorical traits. As an alternative to the approach of the liability augmentation, one can directly use the relationship (3) between the binary phenotype and the model effects. The problem is that normal is not a conjugate prior to logistic distribution and therefore Gibbs sampling will not work in this case. The Metropolis–Hastings algorithm or some complicated sampling scheme like ARMS (Gilks and Wild 1992) can be used but it is less efficient than the Gibbs sampling. The method has been implemented in C and the source codes are available from http://www.bios.unc.edu/~hhuang/QTLBinary.

The most striking feature of the Bayesian analysis is that the estimated profiles show clear signal only at a few positions with all remaining markers having effects that are close to zero. This shrinkage effect has been explained in detail in Xu (2003). The key point is that the priors of the variances of QTL effects are allowed to vary across markers instead of being fixed. This is demonstrated clearly in the special forms of (12) and (13). A large aj will have large variance σj and thus a negligible equation M69, which will lead to a small shrinkage effect. However, if a QTL has a small effect, a small equation M70 will be expected, which will generate a large ratio equation M71 and dominate over equation M72; thus aj will be very likely shrunk toward zero.

In this article, we have considered the cases that QTL are fixed at the observed markers and that one QTL can be located within each marker interval. For densely distributed markers, the former method is a good approximation to the latter one since the two methods make almost no difference. However, the first method may lead to biased results when the marker density is not high enough. One assumption made in the second method is that each interval contains at most one QTL, which may not be true if two markers are far apart. Yi (2004) has developed a new method that assumes that, on average, there is at most one QTL in every d cM. The pseudomarkers are introduced to make every interval the same length. The genotypes at pseudomarkers are treated as missing and can be easily handled in a Bayesian framework.

Several frequency-based model selection methods have been developed for mapping multiple QTL. To reduce modeling space, Carlborg and Andersson (2002) included only those markers that are significantly associated with the trait from the single-locus model in the multiple-marker model. Coffman et al. (2005) first selected one marker per linkage group, regardless of whether that marker is significantly associated with the trait or not, and then fitted their model on the basis of the chosen markers. Both approaches may identify multiple linked QTL in a biased way as mentioned in Kao et al. (1999). Our method keeps all possible models under consideration and therefore avoids potential selection bias of the above approaches. Although the variable dimension of our model is very high, our Bayesian model is simple and straightforward since standard Gibbs are mainly used in each MCMC iteration. We have tested the method on 300 individuals with 2000 markers and it took less than only 1 hr to generate 100,000 sweeps on a 3-GH linux machine. Convergence diagnostic tests also show that our method converges quickly and well.

Our method is particularly designed for the situation where no any prior information for the number of QTL and their positions is available. However, it should be noted that the method can be modified to include such prior information. For example, instead of using a noninformative prior for equation M73, we may use different priors for equation M74 on the basis of the available information from other studies. For example, if there is strong evidence of a QTL in a region, we may let σ2's in the region follow a uniform prior on [a, b] with a being large. On the other hand, if there is strong evidence that no such QTL exists in another region, we may then let σ2's in this region follow a uniform prior on [0, c] with c being small. Of course, introducing such region-specific priors complicates the MCMC procedure but would be beneficial especially when the sample size is small.

Further, our model currently can handle only main QTL effects. Another important extension is to include epistatic effects in the model. Several articles (Yi and Xu 2002; Narita and Sasaki 2004) have discussed this issue for normal traits. To include epistatic effects, an upper bound of the number of QTL in the model (Yi et al. 2005) has to be placed, which is extremely useful and necessary since the number of variables dramatically increases when epistatic effects are considered.

Finally, it worth mention that we have run marker analysis only on the real data. We have found that a significantly amount of markers are in segregation distortion, which will affect the genetic map that the interval mapping heavily relies on. However, marker analysis is unrelated to the genetic map and therefore is more robust to such segregation distortion. We are currently carefully investigating the causes of such segregation distortion, which itself may provide us insights on the etiology of colorectal cancer.

Acknowledgments

The authors thank the associate editor and two referees for helpful comments and suggestions, which have led to an improvement of this article. This work has been partially supported by National Institutes of Health grants MH070504, GM074175, CA105417, and CA079869.

References

  • Best, N. C., M. K. Cowles and S. K. Vines, 1995. CODA Manual Version 0.30. MRC Biostatistics Unit, Cambridge, UK.
  • Carlborg, O., and L. Andersson, 2002. Use of randomization testing to detect multiple epistatic QTL. Genet. Res. 79: 175–184. [PubMed]
  • Coffman, C. J., R. W. Doerge, K. L. Simonsen, K. M. Nichols, C. K. Duarte et al., 2005. Model selection in binary trait locus mapping. Genetics 170: 1281–1297. [PMC free article] [PubMed]
  • Falconer, D. S., 1965. The inheritance of liability to certain diseases, estimated from the incidence among relatives. Ann. Hum. Genet. 29: 51–71.
  • Falconer, D. S., and T. F. C. Mackay, 1996. Introduction to Quantitative Genetics, Ed. 4. Addison-Wesley, Boston.
  • George, E. I., and R. E. McCulloch, 1993. Variable selection via Gibbs sampling. J. Am. Stat. Assoc. 88: 881–889.
  • Gilks, W. R., and P. Wild, 1992. Adaptive rejection sampling for Gibbs sampling. Appl. Stat. 41: 337–348.
  • Green, P. J., 1995. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82: 711–732.
  • Hackett, C. A., and J. I. Weller, 1995. Genetic mapping of quantitative trait loci for traits with ordinal distributions. Biometrics 51: 1252–1263. [PubMed]
  • Hoeschele, I., 2001. Mapping quantitative trait loci in outbred pedigrees, pp. 599–644 in Handbook of Statistical Genetics, edited by D. J. Balding, M. Bishop and C. Cannings. Wiley, New York.
  • Kao, C. H., Z. B. Zeng and R. D. Teasdale, 1999. Multiple interval mapping for quantitative trait loci. Genetics 152: 1203–1216. [PMC free article] [PubMed]
  • Narita, A., and Y. Sasaki, 2004. Detection of multiple QTL with epistatic effects under a mixed inheritance model in an outbred population. Genet. Sel. Evol. 36: 415–433. [PMC free article] [PubMed]
  • Ruivenkamp, C. A., T. Csikos, A. M. Klous, T. V. Wezel and P. Demant, 2003. Five new mouse susceptibility to colon cancer loci, Scc11Scc15. Oncogene 22: 7258–7260. [PubMed]
  • Satagopan, J. M., B. S. Yandell, M. A. Newton and T. C. Osborn, 1996. A Bayesian approach to detect quantitative trait loci using Markov chain Monte Carlo. Genetics 144: 805–816. [PMC free article] [PubMed]
  • Sillanpaa, M. J., and E. Arjas, 1998. Bayesian mapping of multiple quantitative trait loci from incomplete inbred line cross data. Genetics 148: 1373–1388. [PMC free article] [PubMed]
  • Stephens, D. A., and R. D. Fisch, 1998. Bayesian analysis of quantitative trait locus data using reversible jump Markov chain Monte Carlo. Biometrics 54: 1334–1347.
  • Ven, R. V., 2004. Reversible-jump Markov chain Monte Carlo for quantitative trait loci mapping. Genetics 167: 1033–1035. [PMC free article] [PubMed]
  • Visscher, P. M., C. S. Haley and S. A. Knott, 1996. Mapping QTLs for binary traits in backcross and F2 populations. Genet. Res. 68: 55–63.
  • Wang, H., Y.-M. Zhang, X. Li, G. L. Masinde, S. Mohan et al., 2005. Bayesian shrinkage estimation of quantitative trait loci parameters. Genetics 170: 465–480. [PMC free article] [PubMed]
  • Wright, S., 1934. a An analysis of variability in number of digits in an inbred strain of guinea pigs. Genetics 19: 506–536. [PMC free article] [PubMed]
  • Wright, S., 1934. b The results of crosses between inbred strains of guinea pigs, differing in number of digits. Genetics 19: 537–551. [PMC free article] [PubMed]
  • Xu, S., 2003. Estimating polygenic effects using markers of the entire genome. Genetics 163: 789–810. [PMC free article] [PubMed]
  • Xu, S., and W. R. Atchley, 1996. Mapping quantitative trait loci for complex binary diseases using line crosses. Genetics 143: 1417–1424. [PMC free article] [PubMed]
  • Yi, N., 2004. A unified Markov chain Monte Carlo framework for mapping multiple quantitative trait loci. Genetics 167: 967–975. [PMC free article] [PubMed]
  • Yi, N., and S. Xu, 2000. Bayesian mapping of quantitative trait loci for complex binary traits. Genetics 155: 1391–1403. [PMC free article] [PubMed]
  • Yi, N., and S. Xu, 2001. Bayesian mapping of quantitative trait loci under complicated mating designs. Genetics 157: 1759–1771. [PMC free article] [PubMed]
  • Yi, N., and S. Xu, 2002. Mapping quantitative trait loci with epistatic effects. Genet. Res. 79: 185–198. [PubMed]
  • Yi, N., B. S. Yandell, G. A. Churchill, D. B. Allison, E. J. Eisen et al., 2005. Bayesian model selection for genome-wide epistatic quantitative trait loci analysis. Genetics 170: 1333–1344. [PMC free article] [PubMed]
  • Zeng, Z. B., 1993. Theoretical basis for separation of multiple linked gene effects in mapping quantitative trait loci. Proc. Natl. Acad. Sci. USA 90: 10972–10976. [PMC free article] [PubMed]

Articles from Genetics are provided here courtesy of Genetics Society of America
PubReader format: click here to try

Formats:

Related citations in PubMed

See reviews...See all...

Cited by other articles in PMC

See all...

Links

Recent Activity

Your browsing activity is empty.

Activity recording is turned off.

Turn recording back on

See more...