
# Large Upward Bias in Estimation of Locus-Specific Effects from Genomewide Scans

^{1}Department of Genetics, Southwest Foundation for Biomedical Research, San Antonio;

^{2}Department of Psychiatry and

^{3}Columbia Genome Center, Columbia University, and

^{4}New York State Psychiatric Institute, New York

## Abstract

The primary goal of a genomewide scan is to estimate the genomic locations of genes influencing a trait of interest. It is sometimes said that a secondary goal is to estimate the phenotypic effects of each identified locus. Here, it is shown that these two objectives cannot be met reliably by use of a single data set of a currently realistic size. Simulation and analytical results, based on variance-components linkage analysis as an example, demonstrate that estimates of locus-specific effect size at genomewide LOD score peaks tend to be grossly inflated and can even be virtually independent of the true effect size, even for studies on large samples when the true effect size is small. However, the bias diminishes asymptotically. The explanation for the bias is that the LOD score is a function of the locus-specific effect-size estimate, such that there is a high correlation between the observed statistical significance and the effect-size estimate. When the LOD score is maximized over the many pointwise tests being conducted throughout the genome, the locus-specific effect-size estimate is therefore effectively maximized as well. We argue that attempts at bias correction give unsatisfactory results, and that pointwise estimation in an independent data set may be the only way of obtaining reliable estimates of locus-specific effect—and then only if one does not condition on statistical significance being obtained. We further show that the same factors causing this bias are responsible for frequent failures to replicate initial claims of linkage or association for complex traits, even when the initial localization is, in fact, correct. The findings of this study have wide-ranging implications, as they apply to *all* statistical methods of gene localization. It is hoped that, by keeping this bias in mind, we will more realistically interpret and extrapolate from the results of genomewide scans.

## Introduction

The primary goal of a genomewide scan, by linkage and/or by association analysis, is to localize genes whose variation influences the observed phenotypic variation. A secondary goal, at least in the opinion of some investigators, is to estimate the effects of each identified locus on the phenotype (e.g., Blangero et al. 2000). This can be measured by estimating parameters that characterize the relationship between genotype and phenotype—for example, locus-specific heritability, locus-specific recurrence risk to relatives, and the risk attributable to a specific allele or genotype. Clinically and epidemiologically oriented researchers, in particular, often tend to view the estimation of locus-specific effects, especially if estimated as an attributable risk, as an integral and crucial component of a genome scan.

In this study, we demonstrate that the chromosomal position and the genotype-phenotype relationship of a locus cannot both be estimated reliably by use of a single data set of currently realistic size, at least for loci of small effect size. The fundamental problem of joint estimation is that the statistic providing evidence for the presence of a locus at a given chromosomal location, typically given in the form of a LOD score, is itself a function of the parameter(s) characterizing the genotype-phenotype relationship. Statistical significance and the estimated parameter(s) therefore are not independent but are highly correlated. When the test statistic is maximized over the many pointwise tests in the genome, the estimates of the parameter(s) characterizing the locus-specific effects thus are effectively maximized as well. The obvious sampling bias in the reported test statistics results in a less-obvious sampling bias in the reported parameter estimates, which tend to be grossly inflated. This is borne out empirically by the observation that most follow-up studies give lower estimates of effect size than do the initial studies.

As an example, we focus on the additive phenotypic variance attributable to the genetic variation at a quantitative trait locus (QTL), which is a free parameter estimated in variance-components (VC) linkage analysis. The bias in QTL heritabilities has been noted elsewhere in agricultural plant and animal species (Lande and Thompson 1990; Beavis 1994; Utz and Melchinger 1994; Georges et al. 1995; Kearsey and Farquhar 1997; Beavis 1998; Melchinger et al. 1998; Utz et al. 2000), but this literature is largely unknown to “gene hunters” on human data. Here, we provide a systematic overview of the problem, using both genomewide simulation of human pedigree data and approximate analytical solutions. It is shown that the bias is a function of the true effect size, the sample size, and other factors, which together also determine the power of a study. On small samples—small in the sense of having low power to detect QTLs of realistic effect size, but potentially much larger than most current human-pedigree data sets—the estimates of QTL heritability at LOD-score peaks can be virtually independent of the true state of nature.

It is important to keep in mind that VC linkage analysis and estimates of QTL effect size derived from it are only an example used here for demonstration purposes. The same bias applies to *all* other methods of gene localization and estimates of *any* underlying parameter relating marker-locus genotypes and trait phenotypes, if these parameters are estimated at peaks of the test statistic. The main goal of this study is to make investigators aware of the existence and severity of this problem in general.

### Brief Overview of VC Linkage Analysis

Let us first give a brief overview of those aspects of VC linkage analysis that are required for the understanding of this study. For further details, the reader is referred to the original articles by (for example) Lange et al. (1976), Hopper and Mathews (1982), Goldgar (1990), Amos (1994), and Almasy and Blangero (1998), as well as the recent reviews by Blangero et al. (2000, 2001).

VC pedigree analysis assumes that both genetic and environmental factors have effects on the phenotype of interest. The phenotypic covariance among study participants is modeled as a function of independent, additive variance components, which are estimated by maximum likelihood. In the most basic model of VC linkage analysis, under the alternative hypothesis of linkage (i.e., the presence of a QTL at the chromosomal location being tested), the overall phenotypic variance (σ^{2}) is modeled as the sum of the phenotypic variances attributable to the additive effects of a QTL at the given chromosomal position (σ^{2}_{q}), the aggregate additive effects of genes elsewhere in the genome (σ^{2}_{p}), and individual-specific environmental effects and/or measurement errors (σ^{2}_{e}). Under the null hypothesis of no linkage (i.e., the absence of a QTL at or near the chromosomal location being tested), σ^{2}_{q} is set to 0. The statistical evidence for linkage is evaluated by a likelihood-ratio test, which is typically presented as a LOD score:

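$$Z = \log_{10} \frac{L(\hat{\sigma}^2_q, \hat{\sigma}^2_p, \hat{\sigma}^2_e)}{L(\sigma^2_q = 0, \hat{\sigma}^2_p, \hat{\sigma}^2_e)},$$

where *L* denotes the likelihood. Asymptotically, the likelihood-ratio statistic, Λ = 2 ln(10) *Z*, is assumed to be distributed as an equal mixture of a χ^{2} random variable with 1 df and a point mass at 0 (Self and Liang 1987).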
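
The link between a LOD score, the likelihood-ratio statistic, and the asymptotic pointwise *P* value can be checked numerically. The following Python sketch assumes the conversion Λ = 2 ln(10) *Z* and the equal mixture of a χ^{2} variable with 1 df and a point mass at 0; the function names are illustrative and do not belong to SOLAR or any other linkage package.

```python
import math


def lod_to_lrt(lod: float) -> float:
    """Convert a LOD score (log10 likelihood ratio) to the statistic 2*ln(LR)."""
    return 2.0 * math.log(10.0) * lod


def chi2_1df_sf(x: float) -> float:
    """Upper-tail probability of a central chi-square variable with 1 df."""
    # For X ~ N(0, 1), P(X^2 > x) = P(|X| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


def pointwise_p_value(lod: float) -> float:
    """Asymptotic pointwise P value under the 50:50 mixture distribution."""
    if lod <= 0.0:
        return 1.0  # the statistic is always >= 0
    # Half of the null distribution is a point mass at 0, hence the factor 0.5.
    return 0.5 * chi2_1df_sf(lod_to_lrt(lod))


if __name__ == "__main__":
    for lod in (1.0, 2.0, 3.0):
        print(f"LOD {lod:.0f}: LRT = {lod_to_lrt(lod):.2f}, P = {pointwise_p_value(lod):.1e}")
```

Under these assumptions, a LOD score of 3 corresponds to a pointwise *P* value on the order of .0001.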

As a function of these variance components, the (additive) heritability of the trait can be written as

$$h^2 = \frac{\sigma^2_q + \sigma^2_p}{\sigma^2},$$

and the (additive) heritability attributable to the QTL can be written as

$$h^2_q = \frac{\sigma^2_q}{\sigma^2}.$$

For brevity, we refer to σ^{2}_{q} as “QTL effect size” and to *h*^{2}_{q} as “QTL heritability.”
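
As a small worked example, the two heritabilities can be computed directly from the three variance components, taking each heritability as the ratio of the relevant additive genetic variance to the total phenotypic variance. The numerical values below are illustrative only (they mirror a trait with overall heritability 0.5 and one QTL contributing 0.1), and the function name is hypothetical.

```python
def heritabilities(var_q: float, var_p: float, var_e: float) -> tuple[float, float]:
    """Return (overall additive heritability, QTL heritability).

    var_q: additive variance due to the QTL at the tested position
    var_p: aggregate additive variance due to genes elsewhere in the genome
    var_e: individual-specific environmental variance and measurement error
    """
    total = var_q + var_p + var_e
    if total <= 0.0:
        raise ValueError("total phenotypic variance must be positive")
    return (var_q + var_p) / total, var_q / total


if __name__ == "__main__":
    # Illustrative values, not estimates from any data set.
    h2, h2_q = heritabilities(var_q=0.1, var_p=0.4, var_e=0.5)
    print(f"h2 = {h2:.2f}, h2_q = {h2_q:.2f}")
```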

### Sources of Bias

There are a variety of sources of bias in QTL effect-size estimates. Here, we will distinguish between two types of bias. By “pointwise sources of bias,” we mean the factors leading to a bias in the QTL-heritability estimate when linkage analysis is conducted at a single position, *q,* in the genome, and the QTL heritability is estimated independently of the magnitude of the LOD score, that is,

$$\hat{h}^2_q = \underset{h^2_q}{\arg\max}\; Z(h^2_q \mid q). \tag{1}$$

By “genomewide sources of bias,” we mean the additional bias resulting from joint estimation of locus position and effect size in scans of the whole genome or parts thereof, that is,

$$(\hat{q}, \hat{h}^2_q) = \underset{q,\,h^2_q}{\arg\max}\; Z(h^2_q \mid q). \tag{2}$$

Of course, a genome scan may provide statistically significant evidence not just for one locus but for multiple loci or for none at all. Equation (2) is meant to illustrate the multiple-testing problem that results in genomewide bias, independent of whether the genomewide maximum LOD score is significant. This genomewide bias may also be viewed as a type of pointwise bias, resulting when the QTL heritability is estimated only when the LOD score exceeds some threshold.

In addition to these intrastudy sources of bias, there are interstudy sources of bias, because results from successful scans are presumably more likely to be reported. This last source of bias is beyond the scope of this study, but it is expected to have the general effect of magnifying the bias described here.

#### Pointwise sources of bias

There are many reasons why QTL-heritability estimates may be biased if computed at a single point in the genome. In general, estimates of bounded parameters are often biased, whether obtained by maximum likelihood or any other method. *h*^{2}_{q} is defined on the interval [0,1], as it is a proportion. For *h*^{2}_{q}≠0.5, the boundaries of its domain (0 and 1) are asymmetrical, which necessarily leads to an asymmetrical distribution of *ĥ*^{2}_{q} and is expected to result in bias. In most cases, the closer to a boundary the true value of the parameter, the larger the bias from this source. As QTL heritabilities for complex traits are expected to be small in general, on the basis of both empirical evidence and theoretical considerations (e.g., Blangero et al. 2000; Terwilliger and Göring 2000; Weiss and Terwilliger 2000), the pointwise bias from this source will typically be upward. However, under quite general “standard” conditions, maximum-likelihood estimates are consistent; that is, the estimate gets arbitrarily close to the true value with increasing sample size. This type of bias, therefore, goes away asymptotically. Under suitable circumstances, pointwise estimates of QTL heritabilities can be “effectively unbiased” (Williams et al. 1997).

There are many other reasons why pointwise QTL-heritability estimates are often biased in more-serious ways, where the bias does not disappear asymptotically and often goes hand in hand with an inflated false-positive rate. Sometimes, VC-based pedigree analysis is conducted on pedigrees that are ascertained independently of the phenotypes being studied. Such so-called “random ascertainment” minimizes the risk of ascertainment bias and has been assumed in the simulations and analytical expressions presented in this study. Often, however, other ascertainment protocols are used to increase the power of gene mapping. In general, this is expected to result in a potentially large bias in the estimation of genotype-phenotype relationships, because samples ascertained on a given phenotype are not representative of the population as a whole, with respect to that phenotype. Real-world ascertainment schemes rarely conform to simple mathematical models, and correction for such ascertainment bias may be complicated or impossible, not least because directed ascertainment typically enriches for certain etiological factors at the expense of others (see, e.g., Fisher 1934; Boehnke and Greenberg 1984; Boehnke and Lange 1984; Beaty et al. 1987; Rao et al. 1988; Comuzzie and Williams 1999). Furthermore, the phenotypic distribution is rarely perfectly multivariate normal and sometimes deviates a great deal from that distribution, violating the inherent assumptions of standard VC linkage analysis. This generally leads to biased parameter estimates and, potentially, to an increased false-positive rate (see, e.g., Allison et al. 1999; Blangero et al. 2001). It should be mentioned, however, that variants of VC linkage analysis that are more robust to violations of this assumption now exist (see, e.g., Lange et al. 1989; Blangero et al. 2001).
In addition, gene × gene interactions (i.e., epistasis) and gene × environment interactions also typically inflate locus-specific heritability estimates, since interactions, if not modeled, can inflate the apparent marginal effects. Environmental factors shared among relatives, including prenatal environment, mimic genetic similarity between relatives and thus inflate the estimated trait heritability (Terwilliger et al., in press) and, under certain circumstances, estimates of QTL heritability as well. Assortative mating generally has similar consequences (see, e.g., Crow and Kimura 1970; Falconer and Mackay 1996; Mukhopadhyay et al. 2000). By contrast, many types of errors—for example, genotyping errors—are expected to lead to deflated estimates of parameters describing the genotype-phenotype relationship, resulting in, for example, an inflated estimate of the recombination fraction (see, e.g., Göring and Terwilliger 2000*a,* 2000*b,* 2000*c*). Although such pointwise sources of bias clearly are very important, we assume their absence in this study, to highlight the magnitude of the bias due to genomewide multiple testing alone.

#### Genomewide sources of bias

The main focus of this article is on genomewide sources of bias, namely, the bias that results when QTL effects are reported only at the chromosomal position(s) where the test statistic peaks and exceeds some chosen threshold. When maximizing the LOD score over the genome or over parts thereof, as shown in equation (2) above, one is really engaged in a multiple-testing experiment (see Jones and Rushton [1982] for a general overview of simultaneous statistical inference). The simple explanation for why the obvious sampling bias in the reported LOD score extends to a bias in the reported QTL-heritability estimate is that the LOD score is a function of the QTL heritability parameter, as shown in equation (1) above. *Z* and *ĥ*^{2}_{q} are therefore not independent. In fact, for a given data set, assuming constant information on meiotic transmissions throughout the genome, there is essentially a one-to-one correspondence between LOD scores and QTL-heritability estimates, which then provide redundant information (see fig. 2). Typically, however, the available information on chromosomal segregation varies from point to point throughout the genome, because of differences in marker density, marker informativeness, and which individuals are genotyped at a given marker, among other reasons, including genotyping, map, and other errors. Although the one-to-one correspondence between LOD scores and QTL-heritability estimates then no longer holds, *Z* and *ĥ*^{2}_{q} remain positively correlated. By maximization of the LOD score over the genome or over parts thereof, the estimate of QTL heritability is thus effectively maximized as well, resulting in an upward bias. The bias from genomewide testing may also be viewed as a type of pointwise bias that results when the QTL heritability is estimated only when the LOD score is significant. 
Even if pointwise estimates of locus-specific effect size were unbiased when estimated irrespective of the LOD score, they cannot be so conditional on the linkage test being significant, given the correlation of the LOD score and the QTL-heritability estimate.
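
The effect of maximizing over many tests can be illustrated with a deliberately simplified Monte Carlo sketch in Python. It is not the simulation used in this study. It assumes that no QTL exists anywhere, that the genome behaves like `n_tests` independent tests (real scans are autocorrelated), that the likelihood-ratio statistic at each test is the square of a one-sided standard normal deviate, and that the QTL-heritability estimate equals the square root of that statistic divided by a data-set constant `c`. The value `c = 153.5` is a hypothetical calibration under which a LOD score of 3 corresponds to an estimate of about 0.30; `n_tests = 400` is likewise hypothetical.

```python
import math
import random


def simulate_null_scan(n_tests: int, c: float, rng: random.Random) -> tuple[float, float]:
    """Return (estimate at one fixed test, estimate at the scan-wide peak) under the null."""
    # Negative normal deviates map to an estimate on the lower boundary, 0.
    estimates = [max(0.0, rng.gauss(0.0, 1.0)) / math.sqrt(c) for _ in range(n_tests)]
    # The first test stands in for a position chosen without looking at the data.
    return estimates[0], max(estimates)


def mean_estimates(n_replicates: int = 500, n_tests: int = 400,
                   c: float = 153.5, seed: int = 1) -> tuple[float, float]:
    """Average the pointwise and peak estimates over many simulated null scans."""
    rng = random.Random(seed)
    pointwise_sum = 0.0
    peak_sum = 0.0
    for _ in range(n_replicates):
        pointwise, peak = simulate_null_scan(n_tests, c, rng)
        pointwise_sum += pointwise
        peak_sum += peak
    return pointwise_sum / n_replicates, peak_sum / n_replicates


if __name__ == "__main__":
    pointwise_mean, peak_mean = mean_estimates()
    print(f"mean estimate at a fixed position: {pointwise_mean:.3f}")
    print(f"mean estimate at the scan-wide peak: {peak_mean:.3f}")
```

Even though the true QTL heritability is 0 at every test in this sketch, the estimate reported at the peak is several times larger than the estimate at a position fixed in advance, and it grows as `n_tests` increases.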

*Z* = observed LOD score; *ĥ*^{2}_{q} = sample estimate of additive trait heritability attributable to QTL. See text for details of ...

In many respects, performing a genomewide scan is analogous to statistical model selection, where the chromosomal positions tested correspond to the model parameters considered, and the estimated locus-specific effects at chromosomal positions of the peaks of the test statistic correspond to the estimated values of the parameters chosen for the model. A difference, however, is that chromosomal positions are generally tested individually, one at a time. It is well known that estimates of parameters that are selected for the model on statistical grounds are biased if the same data set is used for model selection and parameter estimation (e.g., Miller 1990; Faraway 1992). The stricter the criteria are for including a parameter in the model, the larger the bias is expected to be. In genomewide linkage scans, the problem is severe, because the large number of tests requires a high threshold for significance. Consequently, the customary LOD score of 3 for the declaration that a linkage finding is significant asymptotically corresponds to a pointwise *P* value of .0001 (Morton 1955), which is much smaller than the *P* value of .05 commonly used as a standard in pointwise statistical analysis. The autocorrelation of the test statistic and, equivalently, of the locus effect-size estimate along the chromosomes depends on many factors, including the nature of the data and analysis method, and the appropriate genomewide significance threshold could be chosen accordingly (see also Lander and Kruglyak 1995). For genomewide scans for allelic association, the problem is even more severe, as the autocorrelation of the test statistic is lower. In other words, the number of equivalent independent tests is greater, requiring a significance threshold far in excess of a LOD score of 3, even though this is generally not practiced currently (see also Risch and Merikangas 1996). The same problem applies, to a still greater degree, to joint tests of linkage and association.

The bias caused by genomewide testing disappears asymptotically. However, as shown below, most current data sets for mapping of complex traits in humans have nowhere near the required size to make the genomewide bias negligible.

## Results

### Simulation Results

We have investigated the genomewide bias in QTL-heritability estimates by simulation. The simulated data set consisted of 1,000 randomly ascertained nuclear families with two offspring each. The genome comprised 22 autosomal chromosomes with a total genome size of 3,300 cM. Marker genotypes of both parents and offspring were simulated for fully informative markers spaced at 2-cM intervals throughout the genome. Quantitative-trait phenotypes were simulated for all individuals. The overall (additive) trait heritability, *h*^{2}, was set to 0.5, attributable to 0–5 unlinked QTLs with *h*^{2}_{q}=0.1 each (or 0–10 QTLs with *h*^{2}_{q}=0.05 each) and nonlocalized polygenic effects. Each QTL was diallelic, with equally frequent alleles, and was located in the middle of a chromosome. The remaining phenotypic variance was due to individual-specific effects. Phenotypic effects of dominance, covariates, shared environment, and any other complicating factors were assumed to be absent. Marker maps, marker genotypes, and phenotypic data were assumed to be accurate. Two-point VC-based linkage analysis was conducted on each marker using SOLAR (Almasy and Blangero 1998). All LOD score peaks ≥3, their chromosomal locations, and their associated QTL-heritability estimates were recorded. For genome scans without a LOD score ≥3, the highest LOD score in the genome, its position, and its associated QTL-heritability estimate were recorded instead, as investigators generally still publish the findings of such studies and discuss at least the highest peak. For each different simulation setting, 500 replicates (i.e., full genome scans) were analyzed. By design, the size of the data set, although large compared with that of most real-world mapping studies, is too small to detect the simulated QTLs with high power, since this appears to be the reality for most genome scans of complex traits to date (e.g., Terwilliger and Göring 2000; Weiss and Terwilliger 2000). 
Alternative study designs, such as large, multigenerational pedigrees, involving an equal number of individuals may be more powerful (see Williams and Blangero 1999; Blangero et al. 2000) and may lead to a somewhat smaller bias. In all other respects, the simulation conditions represent a best-case scenario, designed to minimize the resulting bias.

We first examined estimates of overall trait heritability, as well as estimates of QTL heritability obtained at the exact chromosomal position of a QTL, to verify the correctness of our simulation and analysis procedure. The mean value of *ĥ*^{2} across replicates was 0.498, essentially identical to the generating value. Figure 1 shows the pointwise distribution of *ĥ*^{2}_{q} at the true position of a QTL across replicates, for generating value *h*^{2}_{q}=0.05. The distribution is clearly skewed, with a long upper tail and a large point mass at the lower boundary of 0, and the expected value of *ĥ*^{2}_{q}, 0.062, is somewhat biased upwards. For *h*^{2}_{q}=0.1, the mean estimate was 0.104 (data not shown). The decrease in bias is simply due to the fact that the larger generating value is further away from the lower boundary, thus reducing the skewness resulting from the point mass at 0. Note that the observed pointwise biases are not large under the ideal circumstances simulated.
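
The role of the lower boundary in this pointwise bias can be sketched with a simple normal approximation in Python. This is an illustration under stated assumptions, not the procedure used in the simulations: the signed square root of the likelihood-ratio statistic is taken to be normal with unit variance and mean equal to the square root of *c* times the true QTL heritability, the estimate is that deviate divided by the square root of *c* and truncated at 0, and `c = 153.5` is a hypothetical constant under which a LOD score of 3 corresponds to an estimate of about 0.30. All function names are illustrative.

```python
import math


def normal_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def pointwise_mean_estimate(h2q_true: float, c: float) -> float:
    """Expected estimate when negative deviates are truncated to the boundary 0."""
    mu = math.sqrt(c) * h2q_true
    # For X ~ N(mu, 1): E[max(0, X)] = mu * Phi(mu) + phi(mu).
    return (mu * normal_cdf(mu) + normal_pdf(mu)) / math.sqrt(c)


def point_mass_at_zero(h2q_true: float, c: float) -> float:
    """Probability that the estimate sits exactly on the lower boundary."""
    return normal_cdf(-math.sqrt(c) * h2q_true)


if __name__ == "__main__":
    c = 153.5  # hypothetical calibration, see the text above
    for h2q in (0.05, 0.10):
        print(h2q, round(pointwise_mean_estimate(h2q, c), 3),
              round(point_mass_at_zero(h2q, c), 3))
```

Under this approximation, the expected estimate lies slightly above the generating value, and the excess shrinks as the generating value moves away from 0, in the same direction as the simulated means reported above.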

*h*^{2}_{q} = the generating value for additive trait heritability attributable to QTL; *ĥ*^{2}_{q} = its sample estimate. See text for details of the simulation.

We next examined the properties of these estimates in genomewide scans. Figure 2 shows that, in a given data set, under the assumption of complete information on chromosomal segregation throughout the genome, there is essentially a one-to-one correspondence between the observed LOD score and its associated QTL-heritability estimate. In this situation, in fact, the LOD score is essentially a linear function of (*ĥ*^{2}_{q})^{2} (Williams and Blangero 1999). If the information on meiotic transmission varies throughout the genome (in other words, if the effective sample size varies in terms of “equivalent number of meioses” [Edwards 1976]), the one-to-one correspondence breaks down, but statistical significance and the locus-specific effect-size estimate remain positively correlated. We therefore expected that the estimates of QTL heritabilities at genomewide LOD score peaks would be biased upwards.

Table 1 shows that this is, indeed, the case. It gives the average QTL-heritability estimate at the chromosomal position of the genomewide maximum LOD score (columns labeled *Z*_{max}) and at the chromosomal positions of LOD score peaks of at least 3 (columns labeled *Z*_{3}), as a function of the number of QTLs with *h*^{2}_{q}=0.1 in the genome. Let us first focus on the estimates for both true and false peaks. In the baseline case, when there is no mappable QTL in the genome (i.e., *h*^{2}_{q}=0.0), the mean estimate of *ĥ*^{2}_{q} associated with *Z*_{max} is ~0.24, demonstrating the magnitude of the bias due to maximization of the LOD score over the genome. When there are mappable QTLs in the genome, essentially identical estimates are obtained. The same also holds for QTLs with *h*^{2}_{q}=0.05 (data not shown). The QTL-heritability estimates thus are of similar magnitude, no matter what the true QTL heritabilities are or whether any mappable QTLs exist at all. Under the simulation settings, the estimates are therefore essentially independent of the true state of nature. Table 1 also gives the QTL-heritability estimates associated with only those LOD score peaks meeting or exceeding the customary LOD score threshold of 3 (Morton 1955), in which case the bias is even greater. For the simulated data set of 1,000 fully informative nuclear families, the QTL-heritability estimate simply must be nearly 30% for the LOD score to be significant, no matter what the true state of nature is.

Some investigators may hesitate to speak of a bias in QTL-heritability estimates when the true QTL heritability is 0 (i.e., when there is no QTL at the given chromosomal location). However, this surely is the situation for most of the genome and therefore is a relevant situation to consider. Besides, in real life, one generally cannot determine whether a LOD score peak is a true or false positive, just as one often does not know whether there really are any major QTLs to be mapped in the first place. In any case, the two rightmost columns of table 1 focus only on those LOD score peaks that are true positives, defined here arbitrarily as a peak occurring within 25 cM of a QTL. As before, the QTL-heritability estimates are greatly inflated. The estimates are very similar when both true and false peaks are considered or only true peaks. For a study with low power, like that simulated here, one can essentially predict from the outset what the estimated QTL heritabilities are likely to be.

These simulations assumed fully informative markers throughout the genome and complete, error-free information in general. Of course, this is not the case in real studies. As a result, in real data the autocorrelation of the test statistic along the chromosome is decreased (i.e., the LOD score fluctuates more), the genomewide false-positive rate is increased (Terwilliger et al. 1997), and the bias is exacerbated further (data not shown).

### Analytical Results

Let us also derive an analytical expression for the genomewide bias in QTL-heritability estimates. For mathematical tractability, a number of simplifying assumptions were made. The bias resulting from genomewide testing is approximated by the pointwise bias occurring at the true position of a QTL when one conditions on the LOD score being significant. As in the simulations, we assume complete and accurate information on chromosomal segregation and phenotype. Furthermore, the analytical expressions do not allow for differences in overall trait heritability between the total population and samples obtained from it. Despite these simplifications, the analytically derived results correspond closely to those from the simulations.

Let λ be the cutoff for declaring the VC likelihood-ratio statistic of linkage, Λ, to be significant, equivalent to a LOD score of 3. Under the alternative hypothesis of linkage, the statistic is expected to be distributed as a noncentral χ^{2} random variable with 1 df and noncentrality parameter equal to the expected value of the statistic on the data, that is, ξ = E[Λ] (Stuart and Ord 1991). Let us denote this density function as χ^{′2}[Λ,1,ξ]. The expected QTL-heritability estimate at the true QTL location, given that the test statistic is significant (that is, Λ≥λ), is given by

$$E\left[\hat{h}^2_q \mid \Lambda \ge \lambda\right] = \frac{\int_{\lambda}^{\infty} \hat{h}^2_q \, \chi'^2[\Lambda, 1, \xi] \, d\Lambda}{\int_{\lambda}^{\infty} \chi'^2[\Lambda, 1, \xi] \, d\Lambda}. \tag{3}$$

The denominator represents the pointwise power to detect linkage in a data set. The division by this quantity is required to ensure that the integration is done over a proper density function, integrating to 1, because the expectation is computed conditional on the test statistic being significant. Williams and Blangero (1999) have derived approximate analytical expressions for the power of VC linkage analysis for a variety of pedigree structures, in the presence of complete, error-free information on chromosomal segregation and phenotypes. They showed that the expected value of the likelihood-ratio statistic for a given data set is a function of both the QTL heritability and the overall heritability of the trait. Specifically, ξ = *c*(*h*^{2}_{q})^{2}, where *c* is a constant for a given data set and total heritability. Furthermore, when full and accurate information on chromosomal transmissions and phenotypes is available,

$$\hat{h}^2_q = \sqrt{\Lambda / c},$$

because of the one-to-one correspondence of LOD score and QTL-heritability estimate for a given data set in that situation (see fig. 2). By substituting these two expressions into equation (3), one obtains

$$E\left[\hat{h}^2_q \mid \Lambda \ge \lambda\right] = \frac{\int_{\lambda}^{\infty} \sqrt{\Lambda / c} \; \chi'^2\left[\Lambda, 1, c\,(h^2_q)^2\right] d\Lambda}{\int_{\lambda}^{\infty} \chi'^2\left[\Lambda, 1, c\,(h^2_q)^2\right] d\Lambda}. \tag{4}$$

As an example, for a data set comprising *n* nuclear pedigrees consisting of two parents with two offspring, the constant is given by

(Williams and Blangero 1999), under the assumption of complete and accurate information on chromosomal transmissions and phenotypes.

Figure 3 shows the expected bias in the QTL-heritability estimate as a function of the sample size (i.e., the number of two-offspring nuclear pedigrees) and the true QTL heritability, for fixed overall heritability of 0.5 and LOD scores of at least 3 at the true position of the QTL. The bias is shown to be a function of the sample size and the true QTL heritability, such that for a smaller sample size and/or a smaller true QTL heritability, the bias is larger. Notice that the curves are quite linear and parallel, with slopes of ~−1, when power is low. In other words, the QTL-heritability estimates are quite constant for different true underlying QTL heritabilities (a slope of −1 means that the sum of the true value and the bias, which is the expected estimate, stays roughly the same) and thus are virtually independent of the true QTL heritability, just as observed by simulation.
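
The conditional expectation behind such bias curves can be evaluated in closed form once a few assumptions are fixed. The Python sketch below is illustrative only. It assumes that the noncentrality parameter is *c* times the squared true QTL heritability, that the estimate equals the square root of Λ/*c*, and that *c* grows in proportion to the number of families. The per-family constant `C_PER_FAMILY = 0.1535` is hypothetical; it is calibrated so that, with 1,000 families, a LOD score of 3 corresponds to an estimate of about 0.30, and it is not the expression of Williams and Blangero (1999). The code uses the fact that a noncentral χ^{2} variable with 1 df is the square of a unit-variance normal deviate.

```python
import math

LN10 = math.log(10.0)
C_PER_FAMILY = 0.1535  # hypothetical calibration constant, see the text above


def normal_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_sf(x: float) -> float:
    """Standard normal upper-tail probability."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def conditional_estimate(h2q_true: float, c: float, lod_threshold: float = 3.0):
    """Return (power, E[estimate | LRT >= critical value])."""
    mu = math.sqrt(c) * h2q_true                  # square root of the noncentrality
    t = math.sqrt(2.0 * LN10 * lod_threshold)     # square root of the critical value
    upper = normal_sf(t - mu)                     # P(X >= t) for X ~ N(mu, 1)
    lower = normal_sf(t + mu)                     # P(X <= -t)
    power = upper + lower
    # E[|X| ; |X| >= t], summed over both tails of the normal deviate.
    partial = mu * upper + normal_pdf(t - mu) - mu * lower + normal_pdf(t + mu)
    return power, partial / power / math.sqrt(c)


def expected_bias(n_families: int, h2q_true: float) -> float:
    """Expected upward bias of the estimate, given a significant LOD score."""
    _, estimate = conditional_estimate(h2q_true, C_PER_FAMILY * n_families)
    return estimate - h2q_true


if __name__ == "__main__":
    for n in (500, 1000, 5000, 10000):
        print(n, [round(expected_bias(n, h), 3) for h in (0.05, 0.10, 0.20)])
```

With these assumptions, the expected estimate at a significant peak barely changes with the true QTL heritability when power is low, and the bias shrinks as the number of families grows.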

*h*^{2} = overall additive trait heritability (*h*^{2}=0.5, unless otherwise indicated); *h*^{2}_{q} = additive trait heritability attributable to QTL. The indicated sample sizes refer ...

For completeness, figure 3 also shows the bias, for a fixed sample size of 500 pedigrees, as a function of the overall trait heritability, to show that the bias also depends on this trait-specific quantity. This is also apparent from equations (4) and (5), which show that the noncentrality parameter is a function of the overall trait heritability, in addition to the QTL heritability and the sample size, among other factors. The higher the residual genetic correlation among relatives, the greater the power (Williams and Blangero 1999) and the smaller the bias for a given data set.

Figure 4 is an attempt to explain graphically why the bias decreases with an increased sample size, for a given locus-specific effect size in the study population. There are two components to the explanation. First, everything else being equal, the larger the sample size, the smaller the variance of the sample estimate of the locus-specific effect size. In the figure, this is indicated by the narrower density function of *ĥ*^{2}_{q} for the larger sample. Second, everything else being equal, the larger the sample size, the greater the power to map a locus of a given effect size. The bias, which results because the locus-specific effect size is estimated only when the test statistic is significant, is thus reduced, because a larger sample need not be as extreme as a smaller sample, with respect to its locus-specific effect-size estimate, to yield a statistically significant finding. In the figure, the shaded area under each curve corresponds to the power of a sample of that size and, equivalently, to the proportion of samples of that size from which the locus-specific effect size is estimated and reported. A corollary of this is that when a LOD score of, say, 10 is reported, its associated genotype-phenotype parameter estimate(s) is more believable (i.e., is expected to be less biased) than at a less convincing LOD score of, say, 4. However, studies of such high power to detect genes influencing truly complex traits appear unrealistic, at present, for most complex-trait loci, or, at least, may be achievable only by nonrandom ascertainment schemes, in which case the ascertainment bias is expected to be large and potentially uncorrectable.

### Replication

The literature is rife with publications reporting seemingly convincing evidence for the presence of a locus at a given chromosomal position for some complex trait—findings that have never been replicated. The difficulty of replicating a significant linkage and/or association finding for a complex trait in an independent sample can be disheartening. However, it should not be surprising at all, even if the initial localization is, in fact, correct (Suarez et al. 1994). This can be seen if one views the issue of replication in light of the bias discussed here. The two issues are really one and the same: when power is low—as has been shown empirically to be the case in gene mapping for complex traits—considerable luck is required to obtain a significant true finding, and the resulting bias in locus-specific effect size estimate is correspondingly large. If one assumes that a published locus-specific effect-size estimate of some initial study is accurate when performing power calculations for a replication study, one most likely overestimates the power to replicate, perhaps greatly so, ultimately resulting in likely replication failure. A corollary is that failure of replication does not imply that a reported finding is false, even though it does point out that the locus-specific effect-size estimate from the initial study is likely an overestimate.

More formally, as shown above, the bias in locus-specific effect-size estimation is inversely related to the power of a study. The pointwise probability of failure to replicate, or 1−power, is given by

where ξ_{rep} is the expected likelihood-ratio statistic for the replication study and λ_{rep} is the critical value for declaring replication significant, here assumed to correspond to a LOD score of 3, as before. In figure 5, “probability-of-replication-failure” curves (i.e., “1−power” curves) are superimposed on the bias curves of figure 3, as a function of the true underlying QTL heritability and the sample size, using the same data structures (two-offspring nuclear pedigrees) and conditions (overall heritability of 0.5 and complete and accurate information on phenotypes and chromosomal transmissions) as before. Note that the bias does not disappear until the sample size is large and/or the true QTL heritability is sizeable; in either of these cases, power would be high. Most current genetic studies of complex traits are probably underpowered and are subject to a significant upward bias in locus-specific effect-size estimates.
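A rough numerical illustration of such a "1−power" calculation follows. It is a sketch under stated assumptions rather than the expression used for figure 5: the likelihood-ratio statistic is taken to follow a noncentral χ^{2} distribution with 1 df, ξ_{rep} is treated as its noncentrality parameter, and the boundary mixture that arises in variance-components tests is ignored. The function name `replication_failure` and the two example values of ξ_{rep} are hypothetical.

```python
import math

# lambda_rep: likelihood-ratio statistic equivalent to a LOD score of 3.
LR_CRIT = 2.0 * math.log(10.0) * 3.0


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def replication_failure(xi_rep, lam_rep=LR_CRIT):
    """P(LR < lam_rep) for LR ~ noncentral chi-square, 1 df, noncentrality xi_rep.

    Assumption: LR can be written as (Z + sqrt(xi_rep))**2 with Z standard
    normal, which is the 1-df noncentral chi-square representation.
    """
    root_lam = math.sqrt(lam_rep)
    root_xi = math.sqrt(xi_rep)
    return normal_cdf(root_lam - root_xi) - normal_cdf(-root_lam - root_xi)


# Hypothetical planning scenario: the replication study is sized from an
# inflated published effect, while the true expected statistic is far smaller.
xi_assumed, xi_true = 25.0, 6.0
print(f"failure probability assumed in planning: {replication_failure(xi_assumed):.3f}")
print(f"failure probability at the true effect:  {replication_failure(xi_true):.3f}")
```

The gap between the two printed probabilities is the sense in which an inflated initial estimate leads one to overestimate the power to replicate.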

Figure 5. *h*^{2}_{q} = additive trait heritability attributable to QTL. The indicated sample sizes refer to numbers of two-offspring nuclear families. See text for details on analytical ...

It should be mentioned that the prospects of replication are likely even poorer when the validation sample comes from a population that differs genetically and/or environmentally from the population used in the original study. Obviously, there are genetic differences among human populations, as a result of our evolutionary history, even though interpopulation genetic variation appears to be smaller than intrapopulation genetic variation (see, e.g., Weiss and Terwilliger 2000), and there are environmental and cultural differences as well (see, e.g., Terwilliger et al., in press). The mere fact that a locus was successfully mapped in one population makes it likely that the locus can more easily be mapped in that population than in another.

## Discussion

### Bias Elimination?

A key question is whether it is possible to avoid or reduce this bias. Given that estimates of locus-specific effect size can be virtually independent of the true effect size, as shown above, any hopes of satisfactory avoidance of bias appear to be misguided, even in the absence of pointwise biases. A single data set of limited size simply cannot be used to reliably estimate both locus position and effect size, at least for a locus of small effect.

Let us nonetheless discuss one attempt at bias reduction, because it is quite intuitive and appears promising at first glance. One could ask the following question: given a data set and a significance criterion chosen a priori, what unknown true QTL heritability is most likely to have produced the observed results? One could try to address this question by, for example, equating the observed QTL-heritability estimate with its expected value given an assumed true QTL heritability, *h*^{2}_{q}, and then solving the following equation, based on equation (4) above, for *h*^{2}_{q}:

(If the equation has no solution, the estimate of the underlying QTL heritability in the population would be 0.) In principle, this approach would also allow computation of confidence intervals for the unknown true QTL heritability. The fundamental problem with such an approach is that the corrected QTL-heritability estimate would be very crude, with an extremely wide confidence interval. The reason is that for low-power investigations, such as most complex-trait–mapping studies, the expected value of the likelihood-ratio statistic is quite small; as a consequence, only the upper tails of different noncentral χ^{2} distributions would be compared with one another, and these tails overlap substantially. This is demonstrated in figure 6, which shows LOD score density functions for a data set of 1,000 two-offspring nuclear families (as before) for different underlying values of *h*^{2}_{q}, conditional on the LOD score being significant (≥3). Note the wide overlap of the various distributions.
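The instability of this kind of correction can be seen in a toy version of the inversion. The sketch below does not use equation (4). It assumes a normally distributed estimate with a known standard error, reported only when its z score exceeds the threshold equivalent to a LOD score of 3, and it solves for the true effect by bisection. The names `conditional_mean` and `corrected_estimate`, the standard error of 0.05, and the observed values are hypothetical.

```python
import math

# z-scale threshold equivalent to a LOD score of 3.
Z_CRIT = math.sqrt(2.0 * math.log(10.0) * 3.0)


def conditional_mean(true_effect, se):
    """E[estimate | estimate / se > Z_CRIT] for estimate ~ Normal(true_effect, se**2)."""
    a = Z_CRIT - true_effect / se
    pdf = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    sf = 0.5 * math.erfc(a / math.sqrt(2.0))
    return true_effect + se * pdf / sf


def corrected_estimate(observed, se, upper=1.0, tol=1e-9):
    """Solve conditional_mean(h, se) = observed for h in [0, upper] by bisection.

    Returns 0.0 when no solution exists, i.e. when even a true effect of zero
    is expected to give a significant estimate at least as large as observed.
    """
    if conditional_mean(0.0, se) >= observed:
        return 0.0
    lo, hi = 0.0, upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if conditional_mean(mid, se) < observed:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


se = 0.05  # hypothetical standard error of the effect-size estimate
for observed in (0.19, 0.20, 0.21, 0.22, 0.25):
    print(f"observed={observed:.2f}  corrected={corrected_estimate(observed, se):.3f}")
```

In this toy setting the conditional expectation is nearly flat in the true effect when power is low, so small differences in the observed estimate translate into large differences in the corrected value, which is consistent with the wide confidence intervals described above.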

Figure 6. *Z* = LOD score; *h*^{2}_{q} = additive trait heritability attributable to QTL. The given power numbers refer to a sample of 1,000 two-offspring nuclear families. See text for details ...

The crudeness of such bias correction is not surprising. We all know that a LOD score of, say, 3, may be a false positive (i.e., *h*^{2}_{q}=0). Of course, the LOD score may also be a true positive, giving evidence of a true locus with *h*^{2}_{q}>0. If there were a statistical technique by which the bias could be accurately corrected, this would mean, in essence, that there is information allowing us to distinguish true and false LOD score peaks from each other, on the basis of their magnitude alone. Of course, this is not possible.

### Pointwise Replication

The only realistic option for bias elimination appears to be the use of independent data sets for locus mapping and for estimation of the locus-effect size (Utz et al. 2000). One or several data sets are used for model selection and others for parameter estimation. Optimally, different sampling schemes are used for these two purposes, the former being designed to maximize mapping power and the latter to be representative of the target population as a whole (see, e.g., Terwilliger et al., in press). If the locus-specific parameters are estimated in a new data set at the exact chromosomal position where a LOD score peak occurred in an earlier study, the bias from genomewide LOD score maximization is completely avoided, if we assume that the estimates are reported independent of the significance of the localization signal. In fact, because the peak in the initial study likely occurred, by chance, some distance away from the true locus position, pointwise replication in a different sample is expected to furnish a QTL-heritability estimate that is somewhat lower than the true value at the actual location of the gene, if we assume the absence of pointwise sources of bias.
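The role of conditioning on significance in the second data set can be illustrated with a small Monte Carlo sketch. It again uses a toy normal model rather than variance-components analysis, considers a single fixed position (so it ignores the displacement of the peak from the true locus), and uses hypothetical values for the true effect, the standard error, and the number of simulated studies. The function name `simulate` is likewise an assumption.

```python
import random
import statistics

Z_CRIT = 3.717  # approximate z-scale threshold equivalent to a LOD score of 3


def simulate(true_effect=0.1, se=0.05, n_studies=200_000, seed=1):
    """Toy discovery/replication experiment at one fixed position.

    Each discovery estimate is Normal(true_effect, se**2) and is kept only if
    significant. Every kept discovery is followed by an independent
    replication estimate drawn from the same distribution.
    Returns mean estimates for: significant discoveries, all replications,
    and only those replications that were themselves significant.
    """
    rng = random.Random(seed)
    discovery, replication_all, replication_sig = [], [], []
    for _ in range(n_studies):
        first = rng.gauss(true_effect, se)
        if first / se <= Z_CRIT:
            continue  # non-significant discovery scans are never followed up
        second = rng.gauss(true_effect, se)
        discovery.append(first)
        replication_all.append(second)
        if second / se > Z_CRIT:
            replication_sig.append(second)
    return (
        statistics.fmean(discovery),
        statistics.fmean(replication_all),
        statistics.fmean(replication_sig),
    )


disc, rep_all, rep_sig = simulate()
print(f"true effect                          : 0.100")
print(f"mean estimate, significant discovery : {disc:.3f}")
print(f"mean estimate, all replications      : {rep_all:.3f}")
print(f"mean estimate, significant repl. only: {rep_sig:.3f}")
```

In this sketch the replication estimates are centered on the true effect only when all of them are reported. Restricting attention to the significant replications reintroduces essentially the same inflation seen in the discovery sample.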

There are many difficulties with replication as currently practiced in the field of gene mapping for complex human traits, however. Different data sets often come from different populations. In general, no two human pedigree samples can be genetically identical, in contrast to inbred animal species, and environmental and cultural variation is likewise unavoidable, contrary to experimental laboratory conditions (for a general discussion, see Lindsay and Ehrenberg [1993]). In addition, it is not clear, for a wide variety of reasons, whether peaks at similar chromosomal locations in separate samples necessarily point to the same underlying genetic factor. This formulation alone already points to the next problem, namely that estimates of locus effect size at a putative QTL position identified by an initial study are often only reported for those studies that replicate the earlier finding—that is, those that provide at least some evidence for a locus in that region of the genome. Furthermore, most studies use nonrandom ascertainment, leading to potentially uncorrectable pointwise biases.

### Generality of Results

The most important aspect of this paper is that our findings have ramifications for joint estimation of locus position and effect in general, no matter what method is used for mapping or what locus-specific parameters are estimated. Of course, the findings are in no way limited to genetic studies but apply equally to other multiple-testing problems in epidemiology (Thomas et al. 1985) and elsewhere. QTL heritabilities estimated in VC linkage analysis were merely an example used here. In a linkage and/or association study in general, only the trait phenotypes and the marker genotypes are observed, which are correlated with each other only indirectly, via the generally unobservable trait locus genotypes (Göring 2000; Terwilliger and Göring 2000; Weiss and Terwilliger 2000). Estimation of any parameters relating marker genotypes and trait phenotypes is thus subject to genomewide sampling bias, since all tests of linkage and/or association can be conceptualized mathematically in a form equivalent to equation (1), and all genomewide scans are equivalent to equation (2). The same basic fallacy therefore also applies to estimation of locus-specific dominance effect sizes, locus-specific recurrence risk to relatives, and locus allele-/genotype-specific attributable risk. The same holds for estimation of gene × environment interactions (Melchinger et al. 1998; Utz et al. 2000) and gene × gene interactions (i.e., epistasis). In fact, because more stringent significance thresholds should arguably be employed for testing for such interactions, since there are more potential tests, the bias is expected to be even greater. The same is true for genomewide association studies and joint linkage and association studies. Estimates of trait-locus allele/genotype frequencies and penetrances, recombination fractions, etc., are also subject to bias.

Obviously, the problem cannot be avoided by using one method for mapping and another for locus-specific effect-size estimation, given that all methods developed for gene mapping had better be correlated, as they strive to model the same underlying biological realities. Joint estimation of locus position and effect simply does not work on the same data set, at least when power is as low as it is for complex traits, on which most of us are currently focusing our efforts.

### Differential Information Content throughout the Genome

Occasionally, one might be faced with the following situation: two LOD score peaks of nearly identical magnitude are observed, but the associated locus effect-size estimates are quite different. In the absence of any other relevant information, such as the existence of candidate genes, is it reasonable to give follow-up priority to the chromosomal region with the larger QTL-heritability estimate? We argue that it is not. This hypothetical situation is only possible when the effective sample size, in terms of informative “equivalent meioses” (Edwards 1976), differs between those two chromosomal regions. We would suggest that additional markers be genotyped near the peak with the larger locus-specific effect size estimate, to bring the local information content up to that of the chromosomal region of the other peak. One would then be in a better position to decide which peak to pursue. If additional genotyping is not possible, one might even argue that the putative locus with the lower heritability estimate might be more believable, because the evidence for this locus is based on a larger number of informative meioses (i.e., a larger effective sample size).
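The arithmetic behind this argument can be sketched with a simple scaling assumption, which is ours and not a result derived in this paper: if the test statistic behaves like (estimate/se)^{2} and the standard error is proportional to 1/√(effective sample size), then two peaks with equal statistics must have effective sample sizes inversely proportional to their squared estimates. The function name `relative_effective_sample_size` and the example estimates are hypothetical.

```python
def relative_effective_sample_size(estimate_a, estimate_b):
    """Ratio n_eff(A) / n_eff(B) implied by equal test statistics at peaks A and B.

    Toy assumption: LR = (estimate / se)**2 with se proportional to
    1 / sqrt(n_eff), so equal LR values imply n_eff proportional to
    1 / estimate**2.
    """
    if estimate_a <= 0 or estimate_b <= 0:
        raise ValueError("estimates must be positive")
    return (estimate_b / estimate_a) ** 2


# Hypothetical peaks with the same LOD score but QTL-heritability
# estimates of 0.40 (peak A) and 0.20 (peak B).
ratio = relative_effective_sample_size(0.40, 0.20)
print(f"peak A carries about {ratio:.2f} times the effective sample size of peak B")
```

Under this assumption the peak with the larger estimate rests on the smaller number of informative meioses, which is why additional genotyping near that peak is the suggested next step.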

## Conclusions

We have demonstrated, by simulations and approximate analytical expressions, that QTL-heritability estimates at LOD score peaks in genomewide VC linkage analysis are grossly biased upwards, even though such estimates can have fairly small pointwise biases under certain circumstances. The bias is shown to depend on the true parameter value, the sample size, and the study design (among other factors), which together also determine the power of a study. In realistic situations, estimates of QTL effect size can be virtually independent of the true effect size. We have further shown that there appears to be no satisfactory way of correcting for the genomewide bias within a study, suggesting that pointwise estimation on an independent data set may be the only meaningful approach to improved parameter estimates. Most importantly, it is argued that the results presented here for VC linkage analysis apply to all statistical methods of gene localization and their estimates of parameters characterizing the genotype-phenotype relationship.

Given the magnitude of the bias in most real-world studies, as judged by the results presented here, statements such as our own “Also, unlike most penetrance model-free linkage analysis methods, the variance-component method can be used both for localization of QTLs and for obtaining good estimates of the relative importance of the QTL in determining phenotypic variance in the population” (Almasy and Blangero 1998) are wishful rather than realistic, at least when viewed in the context of a genomewide scan. We simply should not expect to be able to map loci and estimate their effects from a single genome scan, even when the data set is large compared with those used in most current studies. When estimates of locus-specific effect size are published after a genomewide scan is conducted, we should keep in mind that these estimates are almost certainly biased upwards, probably to a large degree. By always keeping this bias in mind, we can hope to avoid overestimation of the importance of the identified loci, as most of us are prone to do. This would also allow us to make more realistic assumptions about the likely true locus-specific effects on which to base our own power calculations for mapping of complex traits. Moreover, we would be less likely to overestimate the likely effectiveness of potential future drugs targeting the identified genes. Lastly, given the equivalence of the factors leading to bias and to replication failure, we also would no longer be surprised by how difficult it is to successfully replicate an earlier gene localization for a complex trait in an independent sample.

## Acknowledgments

This work has been supported in part by National Institutes of Health grants HL45522, HL28972, GM31575, and MH59490 (to J.B.), and by National Institutes of Health grant MH63749 and a Hitchings-Elion fellowship (to J.D.T.).

## References

*a*) Linkage analysis in the presence of errors. I: Complex-valued recombination fractions and complex phenotypes. Am J Hum Genet 66:1095–1106

*b*) Linkage analysis in the presence of errors. II: Marker-locus genotyping errors modeled with hypercomplex recombination fractions. Am J Hum Genet 66:1107–1118

*c*) Linkage analysis in the presence of errors. IV: Joint pseudomarker analysis of linkage and/or linkage disequilibrium on a mixture of pedigrees and singletons when the mode of inheritance cannot be accurately specified. Am J Hum Genet 66:1310–1327

*t* distribution. J Am Stat Assoc 84:881–896

**American Society of Human Genetics**

Large Upward Bias in Estimation of Locus-Specific Effects from Genomewide Scans. American Journal of Human Genetics. Dec 2001; 69(6):1357.
