Genet Sel Evol. 2009; 41(1): 55.
Published online Dec 31, 2009. doi:  10.1186/1297-9686-41-55
PMCID: PMC2817680

Deregressing estimated breeding values and weighting information for genomic regression analyses

Abstract

Background

Genomic prediction of breeding values involves a so-called training analysis that predicts the influence of small genomic regions by regression of observed information on marker genotypes for a given population of individuals. Available observations may take the form of individual phenotypes, repeated observations, records on close family members such as progeny, estimated breeding values (EBV) or their deregressed counterparts from genetic evaluations. The literature indicates that researchers are inconsistent in their approach to using EBV or deregressed data, and as to using the appropriate methods for weighting some data sources to account for heterogeneous variance.

Methods

A logical approach to using information for genomic prediction is introduced, which demonstrates the appropriate weights for analyzing observations with heterogeneous variance and explains the need for and the manner in which EBV should have parent average effects removed, be deregressed and weighted.

Results

An appropriate deregression for genomic regression analyses is $\mathrm{EBV}/r^2$, where EBV excludes parent information and $r^2$ is the reliability of that EBV. The appropriate weights for deregressed breeding values are neither the reliability nor the prediction error variance, two alternatives that have been used in published studies, but the ratio $(1 - h^2)/[(c + (1 - r^2)/r^2)h^2]$, where $c > 0$ is the fraction of genetic variance not explained by markers.

Conclusions

Phenotypic information on some individuals and deregressed data on others can be combined in genomic analyses using appropriate weighting.

Background

Genomic prediction [1] involves the use of marker genotypes to predict the genetic merit of animals in a target population based on estimates of regression of performance on high-density marker genotypes in a training population. Training populations might involve genotyped animals with alternative types of information including single or repeated measures of individual phenotypic performance, information on progeny, estimated breeding values (EBV) from genetic evaluations, or a pooled mixture of more than one of these information sources. In pooling information of different types, it is desirable to avoid any bias introduced by pooling and to account for heterogeneous variance so that the best use is made of available information.

Uncertainty as to whether EBV should be used directly, deregressed, or replaced by measures such as daughter yield deviation (DYD) [2], and as to how information should be weighted, if at all, has been apparent for some time in the literature on discovering and fine-mapping quantitative trait loci (QTL). Typically, in fixed effects models with uncorrelated residuals, observations would be weighted by the inverse of their variances. Morsci et al. [3] pointed out the counterintuitive behavior of using the reciprocal of the variance of breeding values as weights in the characterization of QTL and followed the arguments of Rodriguez-Zas et al. [4] in using reliability as weights. Rodriguez-Zas et al. [4] performed analyses that were limited by features of the chosen software, so EBV/2 (i.e. predicted transmitting ability, PTA) were multiplied by the square root of reliability and analyzed unweighted. Georges et al. [5] deregressed PTA to construct DYD and weighted these using the inverse of the variance of the DYD. Spelman et al. [6] had direct access to DYD and similarly weighted these by the inverse of their scaled variance, equivalent to using the inverse of reliability as weights. Other researchers have reported the use of PTA [7], standardized PTA [7,8] or DYD weighted by respective reliabilities [8]. The uncertainty associated with using information for QTL discovery has recently been extended to genomic prediction. An Interbull survey [9] of methods being used in various countries for genomic prediction of dairy cattle reported that some researchers used deregressed proofs weighted with corresponding reliabilities, others used DYD weighted by effective daughter contributions, while yet others used EBV without any weighting. The objective of this paper is to present a logical argument for using deregressed information, appropriately weighted for analysis.
For simplicity, we consider the residual variance from the perspective of an additive model but the deregression and weighting concepts extend to analyses that include dominance and epistasis.

Methods

An ideal model

Genomic prediction involves the use of genotypes or haplotypes to predict genetic merit. Conceptually, it involves two phases: a training phase, where the genotypic or haplotypic effects are estimated, typically as random effects in a mixed model, followed by an application phase, where the genomic merit of selection candidates is predicted from knowledge of their genotypes and the effects estimated in the training phase. The ideal data for training would be true genetic merit observed on unrelated animals in the absence of selection. In that case, the model equation would be:

$$\mathbf{g} = \mathbf{1}\mu + \mathbf{M}\mathbf{a} + \boldsymbol{\varepsilon}$$
(1)

where $\mathbf{g}$ is a vector of true genetic merit (i.e. breeding value, BV) with $\mathrm{var}(\mathbf{g}) = \mathbf{T}\sigma_g^2$, the scalar $\sigma_g^2$ is the genetic variance and $\mathbf{T}$ can be constructed using the theory from combined linkage disequilibrium and linkage analyses [10], $\mu$ is an intercept, $\mathbf{M}$ is an incidence matrix whose columns are covariates for substitution, genotypic or haplotypic effects, $\mathbf{a}$ are effects to be estimated, $\mathrm{var}(\mathbf{M}\mathbf{a}) = \mathbf{G}\sigma_M^2$, $\mathbf{G}$ is a genomic relationship matrix [11-13], $\boldsymbol{\varepsilon}$ is the lack of fit, and $\mathrm{var}(\boldsymbol{\varepsilon}) = \mathbf{E}\sigma_\varepsilon^2$, which is hopefully small and will be 0 if BV could be perfectly estimated as a linear function of observed marker genotypes. In different settings, $\mathbf{a}$ might be defined as a vector of fixed effects [14] or a vector of random effects [1]. Even when $\mathbf{a}$ is fixed, $\mathbf{M}\mathbf{a}$ is random because $\mathbf{M}$, which contains genotypes, is random. However, in genomic analyses $\mathbf{M}$ is treated as fixed because the analysis is conditional on the observed genotypes. The philosophical issues related to the randomness of $\mathbf{M}$ and $\mathbf{a}$ are discussed in detail by Gianola [15], but for our context it is sufficient to define $\mathrm{var}(\mathbf{M}\mathbf{a}) = \mathbf{G}\sigma_M^2$ without explicitly specifying distributional properties of $\mathbf{M}$ or $\mathbf{a}$.

Genotypes used as covariates in $\mathbf{M}\mathbf{a}$ are unlikely to capture all the variation in true genetic merit, either because they do not comprehensively cover the entire genome, or because linkage disequilibrium between markers and causal genes is not perfect. Knowledge of $\mathbf{E}$ is required in the analysis whether $\mathbf{a}$ is treated as a fixed (e.g. GLS) or random effect (e.g. BLUP). In practice, with experiments that involve related animals, it is unreasonable to assume $\mathbf{E}$ has a simple form such as a diagonal matrix, since that implies a zero covariance between lack of fit effects for different animals; however, it can be approximated from the pedigree using the additive relationship matrix, $\mathbf{A}$ [16]. These lack of fit covariances can be accommodated by fitting a polygenic effect for each animal, in addition to the marker genotypes [17], or accounted for by explicitly modeling correlated residuals. For a non-inbred animal, $\mathrm{var}(g_i) = \sigma_g^2$, therefore $\sigma_g^2 = \sigma_M^2 + \sigma_\varepsilon^2$ and the proportion of the genetic variance not accounted for by the markers can be defined to be $c = \sigma_\varepsilon^2/\sigma_g^2$. The scalar $c$ will be close to 0 if markers account for most of the genetic variation and close to 1 if markers perform poorly.

A model using individual phenotypic records

In practice we do not have the luxury of using true BV as data in genomic prediction. A more common circumstance might involve training based on phenotypic observations that include fixed effects on phenotype denoted Xb where X is an incidence matrix for fixed non-genetic effects in b. An appropriate model equation for phenotypes is

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{g} + \mathbf{e}$$
(2)

where e is a vector of random non-genetic or residual effects. In comparison to (1), the use of y for training involves the addition of the vectors Xb and e to the left- and right-hand side, inflating the variance and giving

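$$\mathbf{y} = \mathbf{1}\mu + \mathbf{X}\mathbf{b} + \mathbf{M}\mathbf{a} + (\boldsymbol{\varepsilon} + \mathbf{e})$$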
(3)

with $\mathrm{var}(\boldsymbol{\varepsilon} + \mathbf{e}) = \mathbf{E}\sigma_\varepsilon^2 + \mathbf{I}\sigma_e^2$ since $\mathrm{cov}(\boldsymbol{\varepsilon}, \mathbf{e}') = \mathbf{0}$. This model can be fitted by explicitly including a random polygenic effect for $\boldsymbol{\varepsilon}$, or by accounting for the non-diagonal variance-covariance structure of the residuals defined as $\mathrm{var}(\boldsymbol{\varepsilon} + \mathbf{e})$. Including a polygenic term is not typically done in genomic prediction analyses [12,18], and when undertaken does not seem to markedly alter the accuracy of genomic predictions [Habier D. Personal communication]. Assuming $\mathrm{var}(\boldsymbol{\varepsilon} + \mathbf{e})$ is a scaled identity matrix facilitates the computing involved in fitting this model, as the relevant mixed model equations can be modified by multiplying the left- and right-hand sides by the unknown scale parameter, as is typically done in single trait analyses. However, this is not an option if residuals are heterogeneous, for example, because they involve varying numbers of repeated observations.

A model using repeated records on the individual

Consider the circumstance where the training observations are a vector $\bar{\mathbf{y}}_n$ representing observations that are the mean of $n$ observations on the individual, with $n$ potentially varying. In that case, equation (3) becomes

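$$\bar{\mathbf{y}}_n = \mathbf{1}\mu + \mathbf{X}\mathbf{b} + \mathbf{M}\mathbf{a} + (\boldsymbol{\varepsilon} + \bar{\mathbf{e}}_n)$$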
(4)

with $\mathrm{var}(\bar{\mathbf{e}}_n) = \mathbf{D}$, a diagonal matrix with elements $\left[\frac{1 + (n-1)t}{n} - h^2\right]\sigma_p^2$, where $\sigma_p^2$ is the phenotypic variance, $h^2$ the heritability and $t$ the repeatability. Ignoring off-diagonal elements of $\mathbf{E}$, the elements of the inverse of $\mathbf{R}$, with $\mathbf{R} = \mathrm{var}(\boldsymbol{\varepsilon}) + \mathbf{D}$, would for non-inbred animals be $\left\{\left[ch^2 + \frac{1 + (n-1)t}{n} - h^2\right]\sigma_p^2\right\}^{-1}$. In fixed effects models, this matrix can be arbitrarily scaled for convenience. In univariate random effects models, a common practice is to formulate mixed model equations using the ratios of residual variance to variances of the random effects. Here, it makes sense to factor out the residual variance of one phenotypic observation, i.e. $\sigma_e^2 = (1 - h^2)\sigma_p^2$, from the expression for the residual variance of the mean of $n$ observations. In this circumstance, the scaled inverse of the residual variance is $w_n = \sigma_e^2 \big/ \left\{\left[ch^2 + \frac{1 + (n-1)t}{n} - h^2\right]\sigma_p^2\right\}$ or equivalently

$$w_n = \frac{1 - h^2}{ch^2 + \dfrac{1 + (n-1)t}{n} - h^2}$$
(5)

which can be used for weighted regression analyses treating marker effects as fixed or random. When c = 0, the genetic effects can be perfectly explained by the model, and for n = 1, a single observation on the individual, the weight is 1 for any heritability. Scaling the weights is convenient because records with high information exceed 1 and the weights are trait independent which is useful when analysing multiple traits with identical heritability and information content.
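
As a minimal sketch, the scaled weight in (5) can be evaluated directly. The function name `weight_repeated_records` and its argument names are illustrative assumptions rather than anything defined in the text; the formula coded is $(1 - h^2)/[ch^2 + (1 + (n-1)t)/n - h^2]$, which is the form implied by the definitions of $h^2$, $t$, $c$ and $n$ above.

```python
def weight_repeated_records(n, h2, t, c):
    """Scaled weight for the mean of n repeated records on one individual.

    n  : number of records averaged
    h2 : heritability
    t  : repeatability
    c  : fraction of genetic variance not explained by markers
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Variance of the mean of n records, as a fraction of phenotypic variance.
    var_mean = (1.0 + (n - 1) * t) / n
    # Residual variance (lack of fit plus non-genetic part), scaled by the
    # residual variance of a single record, then inverted.
    return (1.0 - h2) / (c * h2 + var_mean - h2)


# Illustrative values only: heritability 0.25, repeatability 0.6.
for n in (1, 2, 5, 10):
    print(n, weight_repeated_records(n, h2=0.25, t=0.6, c=0.0))
```

With `c=0` and `n=1` the function returns 1 for any heritability, matching the property stated above; larger `c` pulls all weights towards each other because the lack-of-fit variance is common to every record.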

Offspring averages as data

In some cases the training data may represent the mean of $p$ individual measurements on several offspring, rather than the mean phenotype of the genotyped animal. In that circumstance, the residual variance includes a genetic component for the mate and Mendelian sampling. For half-sib progeny means with unrelated mates and no common environmental variance, $\mathrm{var}(\bar{e}_p) = \left[\frac{1 + (p-1)h^2/4}{p} - \frac{h^2}{4}\right]\sigma_p^2 = \frac{4 - h^2}{4p}\sigma_p^2$. However, the half-sib progeny mean contains only half the genetic merit of the parent, therefore the genotypic covariates need to be halved, or the mean doubled, in order to analyse data that includes records on genotyped individuals and records on offspring of genotyped individuals. The residual variance for twice the progeny mean is $\frac{4 - h^2}{p}\sigma_p^2$, and adding $\mathrm{var}(\varepsilon_i) = ch^2\sigma_p^2$, factoring out $\sigma_e^2 = (1 - h^2)\sigma_p^2$ and inverting gives

$$w_p = \frac{1 - h^2}{ch^2 + (4 - h^2)/p}$$
(6)
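
The half-sib weight can be sketched in the same way. The function name `weight_halfsib_progeny_mean` is a hypothetical label, and the formula coded, $(1 - h^2)/[ch^2 + (4 - h^2)/p]$, is the form that follows from doubling the progeny mean, adding the lack-of-fit variance $ch^2\sigma_p^2$ and scaling by the residual variance of a single record.

```python
def weight_halfsib_progeny_mean(p, h2, c):
    """Scaled weight for twice the mean of p half-sib progeny records.

    p  : number of progeny, one record each, unrelated mates
    h2 : heritability
    c  : fraction of genetic variance not explained by markers
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    return (1.0 - h2) / (c * h2 + (4.0 - h2) / p)


# Illustrative values only: heritability 0.25, markers explain everything.
for p in (1, 5, 20, 100):
    print(p, weight_halfsib_progeny_mean(p, h2=0.25, c=0.0))
```

Under these assumptions, with $h^2 = 0.25$ and $c = 0$ the weight equals 1 at $p = (4 - h^2)/(1 - h^2) = 5$, so five half-sib progeny carry the same weight as one record on the genotyped individual.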

For full-sib progeny means the intraclass correlation of residuals will include a genetic component and perhaps a common environmental component (e.g. litter, with variance $\sigma_l^2$ and $l^2 = \sigma_l^2/\sigma_p^2$), giving $\mathrm{var}(\bar{e}_p) = \left[\frac{1 + (p-1)(h^2/2 + l^2)}{p} - \frac{h^2}{2}\right]\sigma_p^2$ for unrelated parents. Adding variation due to the lack of fit for the parent average, $\frac{1}{2}ch^2\sigma_p^2$, factoring out $\sigma_e^2 = (1 - h^2)\sigma_p^2$ and inverting gives

$$w_p = \frac{1 - h^2}{\frac{1}{2}ch^2 + \dfrac{1 + (p-1)(h^2/2 + l^2)}{p} - \dfrac{h^2}{2}}$$
(7)

This expression can be used as weights in the fixed or random regression of full-sib progeny means on parent average marker genotypes.

Estimated breeding values as training data

An estimated breeding value, typically derived using BLUP, can be recognised as the true BV plus a prediction error. That is, $\hat{\mathbf{g}} = \mathbf{g} + (\hat{\mathbf{g}} - \mathbf{g})$. Accordingly, training on EBV might be viewed as extending the model equation in (1) by the addition of the prediction error, in the same way that (3) was derived by the addition of a residual nongenetic component. The model equation would therefore be

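$$\hat{\mathbf{g}} = \mathbf{1}\mu + \mathbf{M}\mathbf{a} + (\boldsymbol{\varepsilon} + (\hat{\mathbf{g}} - \mathbf{g}))$$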
(8)

There are at least two issues with this formulation of the problem, which may not be immediately apparent, and which both result from properties of BLUP. The first issue is that the addition of the prediction error term to the left- and right-hand side of (8) actually reduces rather than increases the variance, despite the fact that diagonal elements of $\mathrm{var}(\hat{\mathbf{g}} - \mathbf{g})$ must exceed 0, in contrast to the addition of non-genetic random residual effects in (3). That is, due to shrinkage properties of BLUP estimators [19], $\mathrm{var}(\hat{g}_i) \le \mathrm{var}(g_i)$, whereas $\mathrm{var}(g_i) < \mathrm{var}(y_i)$. Generally, $\mathrm{var}(\hat{g}_i - g_i) = \mathrm{var}(\hat{g}_i) + \mathrm{var}(g_i) - 2\,\mathrm{cov}(\hat{g}_i, g_i)$, but for BLUP $\mathrm{cov}(\hat{g}_i, g_i) = \mathrm{var}(\hat{g}_i)$, so that $\mathrm{var}(\hat{g}_i - g_i) = \mathrm{var}(g_i) - \mathrm{var}(\hat{g}_i)$, implying $\mathrm{var}(\hat{g}_i) = \mathrm{var}(g_i) - \mathrm{var}(\hat{g}_i - g_i)$. The reduction in variance of the training data comes about because prediction errors are negatively correlated with BV, as can be readily shown since $\mathrm{cov}(g_i, \hat{g}_i - g_i) = \mathrm{cov}(g_i, \hat{g}_i) - \mathrm{var}(g_i) = -\mathrm{var}(\hat{g}_i - g_i)$. This means that superior animals tend to be underevaluated (i.e. have negative prediction errors) whereas inferior animals tend to be overevaluated. This is a consequence of shrinkage estimation and of prediction errors being uncorrelated with EBV, i.e. $\mathrm{cov}(\hat{g}_i, \hat{g}_i - g_i) = 0$. To account for the covariance between the prediction errors and the BV, a model that accommodates such covariance would need to be fitted. Such models are computationally more demanding than models in which the fitted effects and residuals are uncorrelated. The second issue resulting from the properties of BLUP is that it is a shrinkage estimator that shrinks observations towards the mean, the extent of shrinkage depending upon the amount of information. This is apparent if one considers the regression of phenotype on true genotype (i.e. BV), which is 1, whereas the regression of EBV on BV is equal to $r_i^2 \le 1$, where $r_i^2$ is the reliability of the EBV (for animal $i$), or squared correlation between BV and EBV. In the context of any marker locus, the contrast in EBV between genotypes at a particular locus is shrunk relative to the contrast that would be obtained if BV or phenotypes were used as data, with the shrinkage varying according to $r_i^2$. We are, however, interested in estimating the effect of a marker on phenotype, but we get a lower value for the contrast if EBV with $r_i^2 \le 1$ are used as data, rather than using phenotypes. A further complication is that training data based on EBV typically comprise individuals with varying $r_i^2$. This problem can be avoided by deregressing or unshrinking the EBV.

Deregressing estimated breeding values

The model fitting problems associated with the reduced variance of EBV and the inconsistent regression of EBV on genotype according to reliability can both be addressed by inflating the EBV. Rather than fitting (8), we will fit the linearly inflated data represented as $\mathbf{K}\hat{\mathbf{g}}$ for some diagonal matrix $\mathbf{K}$. That is, we will fit:

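$$\mathbf{K}\hat{\mathbf{g}} = \mathbf{1}\mu + \mathbf{M}\mathbf{a} + (\boldsymbol{\varepsilon} + (\mathbf{K}\hat{\mathbf{g}} - \mathbf{g}))$$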
(9)

for some matrix $\mathbf{K}$ chosen so that $\mathrm{cov}(g_i, k_i\hat{g}_i - g_i) = 0$ and $\mathrm{cov}(k_i\hat{g}_i, g_i)$ is a constant. Since $\mathrm{cov}(g_i, k_i\hat{g}_i - g_i) = k_i\,\mathrm{cov}(g_i, \hat{g}_i) - \mathrm{var}(g_i) = (k_i r_i^2 - 1)\sigma_g^2$, this expression will be 0 when $k_i = 1/r_i^2$. For this value of $k_i$, $\mathrm{cov}(k_i\hat{g}_i, g_i) = \sigma_g^2$, a constant for all animals regardless of their reliability. Accordingly, the deregression matrix is $\mathbf{K} = \mathrm{diag}\{1/r_i^2\}$ and the deregressed observations are $\hat{g}_i/r_i^2$. Note in passing that the nature of the deregression will depend upon the EBV base. Genetic evaluations are typically adjusted to a common base before publication, by addition or subtraction of some constant. The EBV should be deregressed after removing the post-analysis base adjustment or by explicitly accounting for the base in the deregression procedure [20]. To show the dependence of the deregression on the post-analysis base, suppose that EBV are adjusted to a base, $b$. Then a linear contrast in deregressed EBV without removing the base effect is $\frac{\hat{g}_i + b}{r_i^2} - \frac{\hat{g}_j + b}{r_j^2} \ne \frac{\hat{g}_i}{r_i^2} - \frac{\hat{g}_j}{r_j^2}$ unless $r_i^2 = r_j^2$. Marker effects are typically estimated as linear combinations of data, and will therefore be sensitive to the base adjustment.
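
A small numerical sketch may make the base sensitivity concrete. It assumes the deregression is simply EBV divided by reliability, as described in this section; the function names and all numbers are hypothetical.

```python
def deregress(ebv, r2):
    """Deregress an EBV by dividing by its reliability."""
    if not 0.0 < r2 <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    return ebv / r2


def deregressed_contrast(ebv_i, r2_i, ebv_j, r2_j, base=0.0):
    """Contrast between two deregressed EBV when a base constant is left in."""
    return deregress(ebv_i + base, r2_i) - deregress(ebv_j + base, r2_j)


# Hypothetical EBV of 4 and 2 with unequal reliabilities.
print(deregressed_contrast(4.0, 0.8, 2.0, 0.5))             # base removed
print(deregressed_contrast(4.0, 0.8, 2.0, 0.5, base=10.0))  # base left in
# With equal reliabilities the base cancels from the contrast.
print(deregressed_contrast(4.0, 0.5, 2.0, 0.5))
print(deregressed_contrast(4.0, 0.5, 2.0, 0.5, base=10.0))
```

When the two reliabilities differ, leaving the base in changes the contrast and can even reverse its sign, which is why the base adjustment should be removed before deregression.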

A deregressed observation represents a single value that encapsulates all the information available on the individual and its relatives, as if it were a single observation with $h^2 = r^2$. This can be shown by recognising that $h^2$ is the regression of genotype on phenotype. Taking the deregressed observation to be the phenotype, $\mathrm{cov}(g_i, \hat{g}_i/r_i^2)/\mathrm{var}(\hat{g}_i/r_i^2) = \sigma_g^2/(\sigma_g^2/r_i^2) = r_i^2$. Training on deregressed EBV is therefore like training on phenotypes with varying $h^2$. Provided $r_i^2 > h^2$, training on deregressed EBV is equivalent to having a trait with higher heritability. However, as explained later, we recommend removing ancestral information from the deregressed EBV.

Weighting deregressed information

Deregressed observations have heterogeneous variance when $r^2$ varies among individuals. The residual variance of a particular deregressed observation is $\mathrm{var}(\varepsilon_i) + \mathrm{var}(\hat{g}_i/r_i^2 - g_i)$, but $\mathrm{var}(\varepsilon_i) = c\sigma_g^2$ and $\mathrm{var}(\hat{g}_i/r_i^2 - g_i) = \frac{1 - r_i^2}{r_i^2}\sigma_g^2$, so the residual variance expression simplifies to $\left[c + \frac{1 - r_i^2}{r_i^2}\right]\sigma_g^2$. Ignoring the off-diagonal elements of $\mathrm{var}(\boldsymbol{\varepsilon})$ as before, the diagonals of the inverse of the residual variance after factoring out $\sigma_e^2$ are $\sigma_e^2 \big/ \left\{\left[c + \frac{1 - r_i^2}{r_i^2}\right]\sigma_g^2\right\}$, which simplifies to give

$$w_i = \frac{1 - h^2}{\left[c + (1 - r_i^2)/r_i^2\right]h^2}$$
(10)

an expression analogous to (5) with $n = 1$ and $h^2 = r_i^2$. Note that the weight in (10) approaches $(1 - h^2)/(ch^2)$ as $r_i^2 \to 1$, in which case the weight tends to infinity as $c \to 0$. This is the same as would occur when the number of offspring $p \to \infty$ and $p$ is used as a weight.
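
The following sketch shows one way the weight $(1 - h^2)/[(c + (1 - r^2)/r^2)h^2]$ could enter a weighted regression. Everything beyond that formula is an assumption made for illustration: the function names, the toy data, and the reduction to an intercept plus a single marker covariate, which keeps the weighted equations to a 2 × 2 system that can be solved without external libraries.

```python
def weight_deregressed(r2, h2, c):
    """Scaled weight for a deregressed EBV with reliability r2."""
    if not 0.0 < r2 <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    scaled_residual = (c + (1.0 - r2) / r2) * h2
    if scaled_residual <= 0.0:
        raise ValueError("weight is unbounded when c = 0 and r2 = 1")
    return (1.0 - h2) / scaled_residual


def fit_single_marker(genotypes, observations, weights, variance_ratio=0.0):
    """Weighted fit of an intercept and one marker effect.

    variance_ratio is added to the marker diagonal: 0 treats the marker
    effect as fixed, a positive value shrinks it as a random effect.
    """
    sw = sum(weights)
    swm = sum(w * m for w, m in zip(weights, genotypes))
    swmm = sum(w * m * m for w, m in zip(weights, genotypes)) + variance_ratio
    swy = sum(w * y for w, y in zip(weights, observations))
    swmy = sum(w * m * y for w, m, y in zip(weights, genotypes, observations))
    # Solve [[sw, swm], [swm, swmm]] [intercept, effect]' = [swy, swmy]'.
    det = sw * swmm - swm * swm
    intercept = (swmm * swy - swm * swmy) / det
    effect = (sw * swmy - swm * swy) / det
    return intercept, effect


# Hypothetical genotype codes, deregressed EBV and reliabilities.
genotypes = [0, 1, 2, 1, 0]
deregressed = [2.0, 5.0, 8.0, 5.0, 2.0]
reliabilities = [0.9, 0.5, 0.7, 0.3, 0.6]
weights = [weight_deregressed(r2, h2=0.25, c=0.4) for r2 in reliabilities]
print(fit_single_marker(genotypes, deregressed, weights))
```

A genome-wide analysis would replace the 2 × 2 system with the full set of weighted mixed model equations, but the weights enter in the same way, as the diagonal of the scaled inverse residual matrix.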

Removing parent average effects

Animal model evaluations by BLUP using the inverse relationship matrix shrink individual and progeny information towards parent average (PA) EBV [21]. It makes sense to remove the PA effect as part of the deregression process for two reasons. First, some animals may have EBV with no individual or progeny information. These animals cannot usefully contribute to genomic prediction. This is apparent if one imagines a number of halfsibs with individual marker genotypes and deregressed PA EBV. These animals cannot add any information beyond what would be available from the common parent's genotype and EBV. Second, if any parents are segregating a major effect, about half the offspring will inherit the favourable allele and the others will inherit the unfavourable allele. However, the EBV of both kinds of offspring will be shrunk towards the parent average.

Parent average effects can be eliminated by directly storing the individual and offspring deregressed information and corresponding $r^2$ during the iterative solution of equations carried out for the purposes of genetic evaluation [2]. In some cases researchers do not have access to the evaluation system used to create the EBV on their training populations. In those circumstances, it is necessary to approximate the evaluation equations and backsolve for deregressed information free of the effects of parent average. This can be done for one training animal at a time, given $h^2$ and knowledge of only the EBV (unadjusted for the base) and $r^2$ on the animal, its sire and its dam. First, compute parent average (PA) EBV and reliability for animal $i$ with sire and dam as parents: $\hat{g}_{PA} = (\hat{g}_{sire} + \hat{g}_{dam})/2$, and $r^2_{PA} = (r^2_{sire} + r^2_{dam})/4$. Assuming sire and dam are unrelated and not inbred, the additive genetic covariance matrix for PA and offspring is $\begin{bmatrix} 0.5 & 0.5 \\ 0.5 & 1 \end{bmatrix}\sigma_g^2$ with inverse $\begin{bmatrix} 4 & -2 \\ -2 & 2 \end{bmatrix}\sigma_g^{-2}$. Using this result, recognise that the equations to be solved are:

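$$\begin{bmatrix} \mathbf{Z}_{PA}'\mathbf{Z}_{PA} + 4\lambda & -2\lambda \\ -2\lambda & \mathbf{Z}_i'\mathbf{Z}_i + 2\lambda \end{bmatrix} \begin{bmatrix} \hat{g}_{PA} \\ \hat{g}_i \end{bmatrix} = \begin{bmatrix} y^*_{PA} \\ y^*_i \end{bmatrix}$$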
(11)

where $y^*_i$ is information equivalent to a right-hand-side element pertaining to the individual, $\mathbf{Z}_{PA}'\mathbf{Z}_{PA}$ and $\mathbf{Z}_i'\mathbf{Z}_i$ reflect the unknown information content of the parent average and individual (plus information from any of its offspring and/or subsequent generations), and $\lambda = (1 - h^2)/h^2$ is assumed known. Define

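$$\begin{bmatrix} \mathbf{Z}_{PA}'\mathbf{Z}_{PA} + 4\lambda & -2\lambda \\ -2\lambda & \mathbf{Z}_i'\mathbf{Z}_i + 2\lambda \end{bmatrix}^{-1} = \begin{bmatrix} c^{PA,PA} & c^{PA,i} \\ c^{i,PA} & c^{i,i} \end{bmatrix}$$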

then using the facts [19] that $\mathrm{var}(\hat{g}_{PA} - g_{PA}) = c^{PA,PA}\sigma_e^2$ and $\mathrm{var}(\hat{g}_i - g_i) = c^{i,i}\sigma_e^2$ leads to $r^2_{PA} = 0.5 - \lambda c^{PA,PA}$, and $r_i^2 = 1.0 - \lambda c^{i,i}$. Rearranging these equations, $c^{PA,PA} = (0.5 - r^2_{PA})/\lambda$, and $c^{i,i} = (1.0 - r_i^2)/\lambda$. The formula to derive the inverse of a 2 × 2 matrix applied to the coefficient matrix from (11) gives $c^{PA,PA} = (\mathbf{Z}_i'\mathbf{Z}_i + 2\lambda)/\det$, and $c^{i,i} = (\mathbf{Z}_{PA}'\mathbf{Z}_{PA} + 4\lambda)/\det$ for $\det = (\mathbf{Z}_{PA}'\mathbf{Z}_{PA} + 4\lambda)(\mathbf{Z}_i'\mathbf{Z}_i + 2\lambda) - 4\lambda^2$.

Equating these alternative expressions for $c^{PA,PA}$ leads to

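$$\frac{0.5 - r^2_{PA}}{\lambda} = \frac{\mathbf{Z}_i'\mathbf{Z}_i + 2\lambda}{(\mathbf{Z}_{PA}'\mathbf{Z}_{PA} + 4\lambda)(\mathbf{Z}_i'\mathbf{Z}_i + 2\lambda) - 4\lambda^2}$$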
(12)

and equating the expressions for $c^{i,i}$ leads to

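$$\frac{1.0 - r_i^2}{\lambda} = \frac{\mathbf{Z}_{PA}'\mathbf{Z}_{PA} + 4\lambda}{(\mathbf{Z}_{PA}'\mathbf{Z}_{PA} + 4\lambda)(\mathbf{Z}_i'\mathbf{Z}_i + 2\lambda) - 4\lambda^2}$$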
(13)

Second, solve these nonlinear equations for $\mathbf{Z}_{PA}'\mathbf{Z}_{PA}$ and $\mathbf{Z}_i'\mathbf{Z}_i$. Although not obvious, there is a direct solution for $\mathbf{Z}_{PA}'\mathbf{Z}_{PA}$ and $\mathbf{Z}_i'\mathbf{Z}_i$. It can be derived by dividing (12) by (13), defining $\delta = (0.5 - r^2_{PA})/(1.0 - r_i^2)$, and rearranging to get

$$\mathbf{Z}_i'\mathbf{Z}_i = \delta\,\mathbf{Z}_{PA}'\mathbf{Z}_{PA} + 2\lambda(2\delta - 1)$$
(14)

Substituting the expression for $\mathbf{Z}_i'\mathbf{Z}_i$ in (14) into the denominator of (13), defining $\alpha = 1/(0.5 - r^2_{PA})$, and rearranging leads to a quadratic expression in $\mathbf{Z}_{PA}'\mathbf{Z}_{PA}$, namely $0.5\delta(\mathbf{Z}_{PA}'\mathbf{Z}_{PA})^2 + (4 - 0.5\alpha)\delta\lambda\,\mathbf{Z}_{PA}'\mathbf{Z}_{PA} + 2\lambda^2(4\delta - \alpha\delta - 1) = 0$, which has a positive root that can be rearranged to

$$\mathbf{Z}_{PA}'\mathbf{Z}_{PA} = \lambda(0.5\alpha - 4) + 0.5\lambda\sqrt{\alpha^2 + 16/\delta}$$
(15)

Application of (15) provides the solution for $\mathbf{Z}_{PA}'\mathbf{Z}_{PA}$ that can be substituted in (14) to solve for $\mathbf{Z}_i'\mathbf{Z}_i$, together enabling reconstruction of the coefficient matrix of (11).

Third, the right-hand side of (11) can be formed by multiplying the now known coefficient matrix by the known vector of EBV for PA and individual. The right-hand side on the individual, free of PA effects, is $y^*_i$. The equation to obtain an estimate of EBV for animal $i$, free of its parent average, $\hat{g}_{i-PA}$, based only on $y^*_i$, is $(\mathbf{Z}_i'\mathbf{Z}_i + \lambda)\hat{g}_{i-PA} = y^*_i$, and the corresponding $r^2_{i-PA}$ for use in constructing the weights in (10) is given by $r^2_{i-PA} = 1.0 - \lambda/(\mathbf{Z}_i'\mathbf{Z}_i + \lambda)$. The deregressed information is $\hat{g}_{i-PA}/r^2_{i-PA}$, which simplifies to $y^*_i/\mathbf{Z}_i'\mathbf{Z}_i$ and is analogous to an average. An iterative procedure using mixed model equations to simultaneously deregress all the sires in a pedigree, while jointly estimating the base adjustment and accounting for group effects, was given by Jairath et al. [20]. However, that method requires knowledge of the numbers of offspring of each sire.
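
The three steps can be sketched for a single animal as follows. This is an illustration under stated assumptions, not the authors' code: the function and field names are hypothetical, the sire and dam are taken to be unrelated and non-inbred, and the formulas coded are $\alpha = 1/(0.5 - r^2_{PA})$, $\delta = (0.5 - r^2_{PA})/(1 - r^2_i)$, the closed-form root for the PA information content, and the linear relation between the two information contents, as implied by the 2 × 2 system for PA and individual.

```python
import math
from typing import NamedTuple


class DeregressedRecord(NamedTuple):
    info_pa: float      # information content of the parent average
    info_i: float       # information content of the individual
    ebv_free: float     # EBV of the individual free of parent average
    r2_free: float      # reliability of that EBV
    deregressed: float  # deregressed information, ebv_free / r2_free


def deregress_without_pa(ebv_sire, r2_sire, ebv_dam, r2_dam, ebv_i, r2_i, h2):
    """Remove parent average effects and deregress the EBV of one animal."""
    lam = (1.0 - h2) / h2
    ebv_pa = (ebv_sire + ebv_dam) / 2.0
    r2_pa = (r2_sire + r2_dam) / 4.0
    if r2_pa >= 0.5 or r2_i >= 1.0:
        raise ValueError("reliabilities leave no prediction error to work with")
    alpha = 1.0 / (0.5 - r2_pa)
    delta = (0.5 - r2_pa) / (1.0 - r2_i)
    # Positive root of the quadratic in the PA information content.
    info_pa = lam * (0.5 * alpha - 4.0) + 0.5 * lam * math.sqrt(alpha ** 2 + 16.0 / delta)
    info_i = delta * info_pa + 2.0 * lam * (2.0 * delta - 1.0)
    if info_i <= 0.0:
        raise ValueError("animal has no information beyond its parent average")
    # Second row of the coefficient matrix times the EBV vector.
    rhs_i = -2.0 * lam * ebv_pa + (info_i + 2.0 * lam) * ebv_i
    ebv_free = rhs_i / (info_i + lam)
    r2_free = 1.0 - lam / (info_i + lam)
    return DeregressedRecord(info_pa, info_i, ebv_free, r2_free, rhs_i / info_i)


# Hypothetical trio: sire EBV 10 (r2 0.97), dam EBV 2 (r2 0.36), animal EBV 15 (r2 0.68).
print(deregress_without_pa(10.0, 0.97, 2.0, 0.36, 15.0, 0.68, h2=0.25))
```

The guard on `info_i` reflects the point made earlier that animals whose EBV carry nothing beyond parent average cannot contribute; in this sketch they raise an error rather than produce a deregressed value.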

Double counting of information from descendants

Genetic evaluation of animal populations results in EBV that are a weighted function of the parent average EBV, any information on the individual, adjusted for fixed effects, and a weighted function of the EBV of offspring, adjusted for the merit of the mates [2]. The previous section has argued for the removal of parent average effects in constructing information for genomic analyses. It could be argued that information from genotyped descendants should also be removed to avoid double counting. This can be achieved during the evaluation process, and is desirable in the absence of selection. If the genotyped descendants are a selected subset, the removal of their information will lead to biased information on the individual. Simulation suggests that the double counting of descendants' performance has negligible impact on genomic predictions (results not shown).

Results

Weights for different information sources

Comparative weights for individual and average of n individual observations using (5), for progeny means of p halfsibs using (6), and for deregressed EBV of varying reliability using (10) are in Table 1.

Table 1
Relative weightsa for n phenotypic observations on the individual, p observations in twice the halfsib progeny mean with heritability 0.25 and repeatability 0.6, or deregressed EBV with reliability r2 for varying values of c, the proportion of genetic ...

Removing parent average effects

Suppose genomic training is to be undertaken for a trait using EBV available from national evaluations that have yet to be deregressed. Widely-used bulls have been genotyped and the EBV and $r^2$ of those bulls are available, along with corresponding information on the sire and dam of each bull. Such a trio might have values of $\hat{g}_{sire} = 10$, $r^2_{sire} = 0.97$; $\hat{g}_{dam} = 2$, $r^2_{dam} = 0.36$; and $\hat{g}_i = 15$, $r_i^2 = 0.68$. Given $h^2 = 0.25$, $\lambda = 0.75/0.25 = 3$, the PA information is $\hat{g}_{PA} = (10 + 2)/2 = 6$, and $r^2_{PA} = (0.97 + 0.36)/4 = 0.33$. Using (15), with $\alpha = 5.97$, $\delta = 0.523$, then $\mathbf{Z}_{PA}'\mathbf{Z}_{PA} = 9.16$, which substituted in (14) gives $\mathbf{Z}_i'\mathbf{Z}_i = 5.08$.

Substituting these information contents into the coefficient matrix, or left-hand side, of (11) gives $\begin{bmatrix} 21.16 & -6 \\ -6 & 11.08 \end{bmatrix}$ with inverse $\begin{bmatrix} 0.0558 & 0.0302 \\ 0.0302 & 0.1066 \end{bmatrix}$. These values correspond to $r^2_{PA}$ = 0.5 - 3 × 0.0558 = 0.33 and $r^2_i$ = 1.0 - 3 × 0.1066 = 0.68, the reported $r^2_{PA}$ and $r^2_i$, confirming the equations used to determine the information content. The right-hand side of (11) can then be reconstructed by multiplying the coefficient matrix by the vector of EBV as $\begin{bmatrix} 21.16 & -6 \\ -6 & 11.08 \end{bmatrix} \begin{bmatrix} 6 \\ 15 \end{bmatrix} \approx \begin{bmatrix} 37 \\ 130 \end{bmatrix}$. The element of interest is the right-hand side element corresponding to the individual, obtained as $y^*_i$ = -6 × 6 + 11.08 × 15 = 130. The deregressed information for use in subsequent analysis is obtained as $y^*_i / Z'_iZ_i = 130/5.08 = 25.6$ and the corresponding reliability of this information free of PA effects is $r^{2*}_i$ = 1.0 - 3/(5.08 + 3) = 0.63. The relevant scaled weight for use with the deregressed information on this individual assuming c = 0.5 can be found using (10) as $w_i = 0.75/[(0.5 + (1 - 0.63)/0.63) \times 0.25] = 2.76$. This implies that the deregressed information is 2.76 times more valuable than a single record on the individual.
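The arithmetic of this worked example can be checked with a short script. The following is only a sketch: it takes the information contents 9.16 and 5.08 as given rather than recomputing them from (14) and (15), and it assumes the 2 × 2 coefficient matrix of (11) has diagonal elements equal to the information content plus 4λ (parent average) and plus 2λ (individual), with off-diagonal -2λ, which is the structure implied by the numbers quoted in the text. All variable names are illustrative.

```python
# Inputs quoted in the worked example (h2 = 0.25, c = 0.5).
h2, c = 0.25, 0.5
lam = (1.0 - h2) / h2             # lambda = 3
zz_pa, zz_i = 9.16, 5.08          # information contents, taken as given
ebv_pa, ebv_i = 6.0, 15.0         # parent-average EBV and the bull's EBV

# Coefficient matrix of (11); its structure is an assumption inferred
# from the numbers in the text (21.16, -6, 11.08).
a = zz_pa + 4.0 * lam             # parent-average diagonal
b = -2.0 * lam                    # off-diagonal
d = zz_i + 2.0 * lam              # individual diagonal

# Diagonal elements of the inverse of a symmetric 2 x 2 matrix.
det = a * d - b * b
inv_pa, inv_i = d / det, a / det

# Reliabilities implied by the inverse; they should match the reported ones.
r2_pa = 0.5 - lam * inv_pa
r2_i = 1.0 - lam * inv_i

# Right-hand side element for the individual, then the deregressed value.
y_i = b * ebv_pa + d * ebv_i
dereg = y_i / zz_i
r2_dereg = 1.0 - lam / (zz_i + lam)

# Weight from expression (10).
weight = (1.0 - h2) / ((c + (1.0 - r2_dereg) / r2_dereg) * h2)

print(f"r2_PA={r2_pa:.2f}  r2_i={r2_i:.2f}  y_i={y_i:.1f}")
print(f"deregressed={dereg:.1f}  r2*={r2_dereg:.2f}  weight={weight:.2f}")
```

Carrying the unrounded reliability through gives a weight of about 2.75, slightly below the 2.76 in the text, which is obtained from the rounded reliability 0.63.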

Discussion

The relative value of alternative information sources varies according to c, the parameter that reflects the ability of the genotypic covariates to predict genetic merit. Genomic prediction models that fit well have small values for c and result in greater relative emphasis of reliable information than is the case when the genomic prediction model fits poorly and the residual variation is dominated by contributions from lack-of-fit. For example, the mean of 20 halfsib progeny has about 3.6 times the value of the mean of 5 progeny when c is 0.1, and 2.5 times the value when c is 0.8. Deregressed EBV with reliability 1.0 are 11 times as valuable as those with reliability 0.5 when c is 0.1 but only 3 times as valuable when c is 0.5. These results indicate that collecting genotypes and phenotypes on training animals with low to moderate reliability will be of more relative value to genomic predictions that account for only 50% of genetic variation (i.e. correlation 0.7 between genomic prediction and real merit) than they will for genomic predictions that account for a high proportion of variance.
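The two ratios quoted for deregressed EBV follow directly from the weight expression (1 - h2)/[(c + (1 - r2)/r2)h2] given in (10). A minimal sketch, with an illustrative function name and h2 = 0.25 as used elsewhere in the paper (the ratio itself does not depend on h2):

```python
def dereg_weight(r2, c, h2=0.25):
    """Weight for a deregressed EBV of reliability r2, following (10).

    c is the fraction of genetic variance not explained by markers (c > 0).
    """
    return (1.0 - h2) / ((c + (1.0 - r2) / r2) * h2)

# Value of reliability 1.0 relative to reliability 0.5 for two values of c.
ratio_c01 = dereg_weight(1.0, 0.1) / dereg_weight(0.5, 0.1)
ratio_c05 = dereg_weight(1.0, 0.5) / dereg_weight(0.5, 0.5)

print(f"c = 0.1: ratio = {ratio_c01:.1f}")
print(f"c = 0.5: ratio = {ratio_c05:.1f}")
```

The ratio reduces to (c + 1)/c for these two reliabilities, which gives 11 at c = 0.1 and 3 at c = 0.5.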

The impact of the assumed c is to influence the relative value of individuals with reliable information, such as progeny test results, in comparison to individuals with information from less reliable sources, such as individual records. The use of too large a value of c will result in overemphasis of less accurate information in relation to more accurate information. The use of too small a value of c will result in too little emphasis on less accurate records. The correct value of c will not be known prior to training analyses but can be estimated from validation analyses. Training analyses could then be repeated using the estimated value of c. Alternatively, sensitivity to c could be assessed by training using a range of values. The sensitivity to c varies according to the heterogeneity of information content in the training data.
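One way to gauge this sensitivity before any training run is to look at how the spread of weights in a data set changes over a grid of c. The sketch below is an illustration only: the reliabilities are hypothetical, the function names are not from the paper, and the spread (largest weight divided by smallest) is just one convenient summary. The weight itself is the expression in (10).

```python
def dereg_weight(r2, c, h2=0.25):
    """Weight from (10): (1 - h2) / [(c + (1 - r2)/r2) * h2]."""
    return (1.0 - h2) / ((c + (1.0 - r2) / r2) * h2)

def weight_spread(reliabilities, c, h2=0.25):
    """Largest weight divided by smallest weight in a training set."""
    weights = [dereg_weight(r2, c, h2) for r2 in reliabilities]
    return max(weights) / min(weights)

# Hypothetical training sets: one heterogeneous, one homogeneous.
mixed = [0.30, 0.60, 0.95]
uniform = [0.60, 0.60, 0.60]

for c in [0.1, 0.3, 0.5, 0.8]:
    print(f"c={c:.1f}  mixed spread={weight_spread(mixed, c):.2f}  "
          f"uniform spread={weight_spread(uniform, c):.2f}")
```

With homogeneous reliabilities the spread is 1 for every c, so the choice of c cannot change the relative emphasis among animals; with heterogeneous reliabilities the spread shrinks as c grows, which is the shift in emphasis toward less accurate information described above.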

In practice, information sources of phenotypic data on training individuals can vary more widely than the examples derived in this paper. For example, training individuals might have their own and a mix of half- and fullsib progeny observed. In such cases, a practical approach is to first set up the mixed model equations that would be appropriate to estimate breeding values on the training individuals and use these to solve for the deregressed information [2]. This approach could also be useful in circumstances where training individuals do not all have the appropriate phenotypes. Consider a situation where some individuals have carcass measurements while others have correlated observations such as live animal ultrasound measures. A bivariate analysis of these two traits could be used to produce a single deregressed value for the carcass trait for each animal that accounted for appropriately weighted ultrasound information.

Conclusions

The arguments put forward in this manuscript support the use of deregressed information, in agreement with practices adopted by many researchers [22]. The weighting factors proposed in this paper differ from any reported in the literature except when the parameter c = 0, in which case the weights are effectively the same as those used by Georges et al. [5] and Spelman et al. [6]. In practice, the benefit of deregression and the subsequent weighting of alternative information sources will depend on the extent to which the number of repeat records, number of progeny and/or r2 varies among individuals in the training population.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

DJG derived the formulae following debate with JFT and RLF as to appropriate weights for training analyses with disparate data. JFT derived the direct solution for removing parent average effects. DJG drafted the manuscript and RLF and JFT helped to revise and finalize it. All authors read and approved the final manuscript.

Acknowledgements

DJG and RLF are supported by the United States Department of Agriculture, National Research Initiative grant USDA-NRI-2009-03924 and by Hatch and State of Iowa funds through the Iowa Agricultural and Home Economic Experiment Station, Ames, IA.

References

  1. Meuwissen THE, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157:1819–1829.
  2. VanRaden PM, Wiggans GR. Derivation, calculation, and use of national animal model information. J Dairy Sci. 1991;74(8):2737–2746. http://www.hubmed.org/display.cgi?uids=1918547
  3. Morsci NMTJ, Schnabel RD. Association analysis of adiponectin and somatostatin polymorphisms on BTA1 with growth and carcass traits in Angus cattle. Anim Genet. 2006;37:554–562. doi: 10.1111/j.1365-2052.2006.01528.x.
  4. Rodriguez-Zas SL, Southey BR, Heyen DW, Lewin HA. Interval and composite interval mapping of somatic cell score, yield, and components of milk in dairy cattle. J Dairy Sci. 2002;85(11):3081–3091.
  5. Georges M, Nielsen D, Mackinnon M, Mishra A, Okimoto R, Pasquino AT, Sargeant LS, Sorensen A, Steele MR, Zhao X. Mapping quantitative trait loci controlling milk production in dairy cattle by exploiting progeny testing. Genetics. 1995;139(2):907–920.
  6. Spelman RJ, Coppieters W, Karim L, van Arendonk JA, Bovenhuis H. Quantitative trait loci analysis for five milk production traits on chromosome six in the Dutch Holstein-Friesian population. Genetics. 1996;144(4):1799–1808.
  7. Ashwell MS, Da Y, VanRaden PM, Rexroad CE, Miller RH. Detection of putative loci affecting conformational type traits in an elite population of United States Holsteins using microsatellite markers. J Dairy Sci. 1998;81(4):1120–1125.
  8. Van Tassell CP, Sonstegard TS, Ashwell MS. Mapping quantitative trait loci affecting dairy conformation to chromosome 27 in two Holstein grandsire families. J Dairy Sci. 2004;87(2):450–457.
  9. Loberg A, Durr JW. Interbull survey on the use of genomic information. Proc Interbull Intl Workshop. 2009.
  10. Meuwissen THE, Goddard ME. Prediction of identity by descent probabilities from marker-haplotypes. Genet Sel Evol. 2001;33:605–634. doi: 10.1186/1297-9686-33-6-605.
  11. Nejati-Javaremi A, Smith C, Gibson JP. Effect of total allelic relationship on accuracy of evaluation and response to selection. J Anim Sci. 1997;75:1738–1745.
  12. VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91(11):4414–4423. doi: 10.3168/jds.2007-0980.
  13. Strandén I, Garrick DJ. Technical note: Derivation of equivalent computing algorithms for genomic predictions and reliabilities of animal merit. J Dairy Sci. 2009;92(6):2971–2975. doi: 10.3168/jds.2008-1929. http://www.hubmed.org/display.cgi?uids=19448030
  14. Falconer DS, Mackay TFC. Introduction to Quantitative Genetics. 4th ed. New York: Longman, Inc; 1996.
  15. Gianola D, de los Campos G, Hill WG, Manfredi E, Fernando R. Additive genetic variability and the Bayesian alphabet. Genetics. 2009;183:347–363. doi: 10.1534/genetics.109.103952.
  16. Van Vleck LD. Selection index and introduction to mixed model methods. Boca Raton: CRC; 1993. Chap. Genes identical by descent - the basis of genetic likeness; p. 49.
  17. Calus MPL, Meuwissen THE, de Roos APW, Veerkamp RF. Accuracy of genomic selection using different methods to define haplotypes. Genetics. 2008;178:553–561. doi: 10.1534/genetics.107.080838.
  18. Weigel KA, de los Campos G, González-Recio O, Naya H, Wu XL, Long N, Rosa GJ, Gianola D. Predictive ability of direct genomic values for lifetime net merit of Holstein sires using selected subsets of single nucleotide polymorphism markers. J Dairy Sci. 2009;92(10):5248–5257. doi: 10.3168/jds.2009-2092.
  19. Henderson CR. Best linear unbiased estimation and prediction under a selection model. Biometrics. 1975;31:423–449. doi: 10.2307/2529430.
  20. Jairath L, Dekkers JC, Schaeffer LR, Liu Z, Burnside EB, Kolstad B. Genetic evaluation for herd life in Canada. J Dairy Sci. 1998;81(2):550–562.
  21. Mrode R. BLUP univariate models with one random effect. In Linear Models for the Prediction of Animal Breeding Values. Cambridge: CABI; 2005.
  22. Thomsen H, Reinsch N, Xu N, Looft C, Grupe S, Kuhn C, Brockmann GA, Schwerin M, Leyhe-Horn B, Hiendleder S, Erhardt G, Medjugorac I, Russ I, Forster M, Brenig B, Reinhardt F, Reents R, Blumel J, Averdunk G, Kalm E. Comparison of estimated breeding values, daughter yield deviations and de-regressed proofs within a whole genome scan for QTL. J Anim Breed Genet. 2001;118:357–370. doi: 10.1046/j.1439-0388.2001.00302.x.

Articles from Genetics, Selection, Evolution : GSE are provided here courtesy of BioMed Central