BMC Proc. 2010; 4(Suppl 1): S9.

Published online 2010 Mar 31.

# Simultaneous QTL detection and genomic breeding value estimation using high density SNP chips

^{1}Animal Breeding and Genomics Centre, ASG Wageningen UR, PO Box 65, 8200 AB Lelystad, The Netherlands

^{2}Biosciences Research Division, Department of Primary Industries Victoria, 1 Park Drive, Bundoora 3083, Australia


#### Conference

13th European workshop on QTL mapping and marker assisted selection,

20–21 April 2009

Wageningen, The Netherlands

Copyright ©2010 Veerkamp et al; licensee BioMed Central Ltd.

This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


## Abstract

### Background

The simulated dataset of the 13^{th} QTL-MAS workshop was analysed to i) detect QTL and ii) predict breeding values for animals without phenotypic information. Several parameterisations considering all SNP simultaneously were applied using Gibbs sampling.

### Results

Fourteen QTL were detected at the different time points. Correlations between estimated breeding values from the different models were high, except for the model that assumed that all SNP effects came from one distribution. The model that used only the 14 selected SNP found to be associated with QTL gave correlations close to unity with the full parameterisations.

### Conclusions

Nine out of 18 QTL were detected; however, the six QTL for inflection point were missed. Models for genomic selection appeared to be fairly robust, e.g. with respect to accuracy of estimated breeding values. Still, it is worthwhile to investigate the number of QTL underlying the quantitative traits before choosing the model used for genomic selection.

## Background

High density SNP chips with ~50,000 SNPs have become available for most livestock species. Breeding value estimation using all these SNPs simultaneously is expected to yield the highest accuracy [1]. Several parameterisations of the SNP effects in the statistical model have been suggested [2-5]. The objectives of this study were to accurately identify QTL and predict breeding values in the simulated data of the 13^{th} QTL-MAS workshop, using different parameterisations for the SNP effects.

## Methods

The simulated data of the 13^{th} QTL-MAS workshop are described by Coster et al. [6]. Simulated data were analysed per time point, and for QTL detection the change in the trait between subsequent time points was also used. A pedigree-based model was fitted using ASREML [7]. The Gibbs sampler described initially by Meuwissen and Goddard [1] and Calus et al. [4,5] was used for models including the SNP parameterisations. The general model used was:

$$y_{i} = \mu + animal_{i} + \sum_{j=1}^{nloc} \sum_{k=1}^{2} haplotype_{ijk} + e_{i}$$

where y_{i} is the phenotypic record of animal i, µ is the average phenotypic performance, animal_{i} is the random polygenic effect for animal i, haplotype_{ijk} is a random effect for a paternal (k = 1) or maternal (k = 2) haplotype at locus j (of nloc loci) of animal i, and e_{i} is a random residual for animal i. The first parameterisation was a simple BLUP model using only the additive relationship matrix between the animals. Other parameterisations assumed that the SNP effects came from one distribution (SNP1, i.e. BayesA), from two distributions (SNP2, i.e. BayesC), or from three distributions allowing for small, medium and large SNP effects (SNP3). A further parameterisation (IBD) assumed that a QTL was located between two SNP, and 453 IBD matrices were calculated for all the haplotypes at a bracket using linkage disequilibrium and linkage analysis information [2]. Finally, a parameterisation used the phased genotypes to construct identical-by-state haplotypes from either 2 or 5 SNP (IBS2 and IBS5, respectively), as presented before by Villumsen et al. [3], but with the addition that the same SNP were used at the border of two neighbouring brackets. The final reduced model included the 14 selected SNP that had a posterior probability >0.1 of being associated with a QTL in the SNP2 analysis.
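The structure of the general model described above (a mean, a polygenic effect, paternal and maternal haplotype effects summed over all loci, and a residual) can be illustrated with a minimal Python sketch. The function name `phenotype` and all numeric values are hypothetical and serve only to show how the terms add up; this is not the Gibbs sampler used in the study.

```python
def phenotype(mu, animal_effect, haplotype_effects, residual):
    """Compose one phenotypic record from the terms of the general model.

    haplotype_effects is a list with one (paternal, maternal) pair per locus,
    corresponding to k = 1 and k = 2 in the model description.
    All inputs are illustrative values, not estimates from the workshop data.
    """
    # Double sum over loci j and over the two haplotypes k at each locus.
    haplotype_sum = sum(paternal + maternal
                        for paternal, maternal in haplotype_effects)
    return mu + animal_effect + haplotype_sum + residual


# Hypothetical animal with two loci.
y = phenotype(mu=10.0,
              animal_effect=0.5,
              haplotype_effects=[(0.1, -0.2), (0.3, 0.0)],
              residual=0.25)
print(round(y, 2))
```

In the actual analyses the haplotype effects are unknown random effects sampled by the Gibbs sampler; the parameterisations differ in the prior distribution assumed for them, not in this additive structure.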

## Results

### Pre-analysis

An important question is how to model the time series data and extrapolate the breeding values to the required time point 600. The means of the traits indicated that points 265, 397 and 530 are in the linear part of the growth curve, which was confirmed by high phenotypic and genetic correlations between those points (> 0.95). Graphical inspection confirmed that little information was available to estimate the inflection point or asymptotic values at population, individual or genetic level. Therefore all five time points were analysed separately, and a linear regression fitted through the breeding values at points 265, 397 and 530 was used to extrapolate breeding values to the required point 600.
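The extrapolation step can be sketched as an ordinary least-squares line through the three breeding values of one animal, evaluated at time point 600. The function name `extrapolate_ebv` and the example breeding values are hypothetical; only the time points 265, 397, 530 and 600 come from the text.

```python
def extrapolate_ebv(time_points, ebvs, target):
    """Fit a least-squares line through (time, EBV) pairs and evaluate it at target."""
    n = len(time_points)
    mean_t = sum(time_points) / n
    mean_y = sum(ebvs) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((t - mean_t) ** 2 for t in time_points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(time_points, ebvs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return intercept + slope * target


# Hypothetical breeding values of one animal at the three linear time points.
ebv_600 = extrapolate_ebv([265, 397, 530], [3.65, 4.97, 6.30], target=600)
print(round(ebv_600, 2))
```

Because only three points enter each fit, the extrapolated value is sensitive to noise in any single breeding value, which is one reason the linearity check on those points matters.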

### QTL detection

In total, 14 SNP had a posterior QTL probability above 0.10 for at least one of the time points (see the figure of posterior QTL probabilities for the SNP2 model). For example, on chromosome one at position 0.4447 a strong QTL was found affecting the trait at each time point and the change in the trait between time points, independent of the model used for analysis. The SNP2 model also gave QTL at locations 0.4029 and 0.9137 on chromosome one. The latter clearly affected the trait at time point 0, had little effect at point 132, and had no effect thereafter or on the change of the trait between the time points. The IBD model distributed this QTL effect across a few more SNP (see the figure for the IBD model), leading to a lower maximum posterior probability around location 0.9137. This pattern of a lower posterior probability spread across more brackets was generally observed for the IBD model compared to the SNP2 model.

**Posterior QTL probabilities using SNP2 model. Columns from left to right are time points 0, 132, 265, 397 and 530 respectively, and rows from top to bottom are chromosomes one to five.** Y-axis is posterior probability (scale 0 to 1) and X-axis is location **...**

**Posterior QTL probabilities using IBD model. Columns from left to right are time points 0, 132, 265, 397 and 530 respectively, and rows from top to bottom are chromosomes one to five.** Y-axis is posterior probability (scale 0 to 1) and X-axis is location **...**
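The selection of SNP by posterior QTL probability can be sketched as follows. The sketch assumes that the sampler stores a 0/1 indicator per SNP in every Gibbs iteration and that the posterior probability is the fraction of iterations in which the indicator equals 1; this is a common convention and an assumption here, not a description of the authors' software. The function names and the toy samples are hypothetical; the 0.10 threshold is the one used in the text.

```python
def posterior_qtl_probability(indicator_samples):
    """Average 0/1 QTL indicators over Gibbs iterations, per SNP.

    indicator_samples is a list of iterations; each iteration is a list
    holding one 0/1 indicator per SNP (assumed storage layout).
    """
    n_iter = len(indicator_samples)
    n_snp = len(indicator_samples[0])
    return [sum(iteration[j] for iteration in indicator_samples) / n_iter
            for j in range(n_snp)]


def select_snps(probabilities, threshold=0.10):
    """Return indices of SNP whose posterior probability exceeds the threshold."""
    return [j for j, p in enumerate(probabilities) if p > threshold]


# Toy chain: 4 iterations, 3 SNP.
samples = [[1, 0, 0],
           [1, 0, 1],
           [1, 0, 0],
           [0, 0, 0]]
probabilities = posterior_qtl_probability(samples)
print(probabilities, select_snps(probabilities))
```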

### Breeding values

Table gives the correlations between the breeding values (for animals without phenotypic information) predicted with the different parameterisations. Correlations were high between most models that included the SNP information, although the breeding values from the model assuming that all SNP effects came from one distribution (SNP1) differed. Even the analysis including only the 14 SNP selected on the basis of a posterior probability >0.10 gave correlations close to unity with the more extensive models. Similarly, correlations with true breeding values were 0.93 or 0.92 for the SNP models other than SNP1, and 0.91 for the SNP1 model (Table ). Overall, predicted breeding values appeared insensitive to the model used.

Evaluation of predicted breeding values (EBV) at point 600 for the animals without phenotypic data
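The evaluation statistics discussed in this paper (correlation, regression coefficient and MSE) can be computed as in the sketch below. It assumes that the regression coefficient is the slope of true on predicted breeding values, which is the usual convention but is not stated in the text; the function name `evaluate_ebv` and the example values are hypothetical.

```python
import math


def evaluate_ebv(true_bv, predicted_bv):
    """Return (correlation, regression slope of true on predicted, MSE)."""
    n = len(true_bv)
    mean_t = sum(true_bv) / n
    mean_p = sum(predicted_bv) / n
    # Centred sums of squares and cross-products.
    stt = sum((t - mean_t) ** 2 for t in true_bv)
    spp = sum((p - mean_p) ** 2 for p in predicted_bv)
    stp = sum((t - mean_t) * (p - mean_p)
              for t, p in zip(true_bv, predicted_bv))
    correlation = stp / math.sqrt(stt * spp)
    slope = stp / spp  # assumed direction: true regressed on predicted
    mse = sum((t - p) ** 2 for t, p in zip(true_bv, predicted_bv)) / n
    return correlation, slope, mse


# Hypothetical example: predictions perfectly ranked but twice too variable.
print(evaluate_ebv([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
```

Under this convention a slope below one means the predicted breeding values are too variable, and a slope above one means they are too compressed, even when the correlation is high.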

## Discussion

Using all SNP simultaneously, 14 QTL were identified with relatively sharp peaks in posterior probability; 9 of these were within 5 cM of one of the 18 simulated QTL, and all 14 were within 10 cM. Surprisingly few false positive QTL were found, especially since the cut-off of 10% for the posterior probability was set arbitrarily. In the context of the simulated growth curve model, five QTL were found for the asymptote, and four were close to the simulated QTL for relative growth rate. In our analysis these QTL for relative growth rate were found at the first time points only, as expected, since this is where their effect on the variance is largest. As suggested by the pre-analysis, no QTL was found within 5 cM of the QTL affecting the inflection point, although on chromosome 2 one QTL was close. It would be interesting to see whether using the growth model in the analysis would be more successful in picking up the QTL for the inflection point, since such a model resembles the underlying simulated model more closely and requires two fewer parameters to be estimated than the model used here. The disadvantage of fitting the growth curve model might be that sampling covariance between the three parameters, together with the inability to separate these parameters in the current data, might lead to more spurious QTL estimates.
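Counting how many detected QTL fall within a given distance of a simulated QTL can be sketched as below. The function name `count_matches` and all positions are hypothetical (they are not the workshop's simulated or detected positions); only the 5 cM and 10 cM windows come from the text.

```python
def count_matches(detected, simulated, max_distance_cm=5.0):
    """Count detected QTL lying within max_distance_cm of any simulated QTL.

    Both arguments are lists of (chromosome, position_in_cM) tuples;
    a match requires the same chromosome.
    """
    matched = 0
    for chrom, pos in detected:
        if any(sim_chrom == chrom and abs(pos - sim_pos) <= max_distance_cm
               for sim_chrom, sim_pos in simulated):
            matched += 1
    return matched


# Hypothetical positions for illustration only.
detected = [(1, 44.5), (1, 91.4), (2, 30.0)]
simulated = [(1, 42.0), (2, 38.0), (3, 44.5)]
print(count_matches(detected, simulated, 5.0),
      count_matches(detected, simulated, 10.0))
```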

Little difference was found between the IBD and SNP methods, although some of the peaks were distributed across more SNP when using IBD. This might be linked to the genetic history of the QTL or to the parameterisation. For example, when the QTL is located exactly at a SNP, using brackets of two SNP will split the effect across the two brackets that share that SNP.

From the correlations and the MSE, the breeding values appear fairly robust across the different models, with the exception of the model assuming that all SNP effects can be captured with one distribution. Model SNP1 is the exception because its assumption about the distribution of the SNP effects was violated: some large QTL were present and most SNP had no effect in the simulated data. It is interesting to observe that, apart from the BLUP analysis, all regression coefficients deviated from one (Table ): the coefficient was below one for SNP1 and above one for the other models. We have no explanation for this difference. The analysis including a subset of 14 SNP gave high correlations with the fully parameterised methods, suggesting there was considerable scope for reducing the number of SNP required once the QTL had been estimated in this dataset. This is in agreement with findings in real data [8].

## Conclusions

Nine out of 18 QTL were detected; however, the six QTL for inflection point were missed. Models for genomic selection appeared to be fairly robust. Still, it is worthwhile to investigate the number of QTL underlying the quantitative traits before choosing the model used for genomic selection.

## Competing interests

The authors declare that they have no competing interests.

## Authors' contributions

RFV carried out the analyses and drafted the manuscript. MPLC developed the software and together with HAM and KV helped to interpret the results and present them in the manuscript. All authors read and approved the final manuscript.

## Acknowledgements

Hendrix Genetics, CRV B.V. and NWO-Casimir (The Netherlands Organization for Scientific Research) are acknowledged for financial support for MPLC. KLV was supported by the Erasmus Mundus Sabretrain project and HM by the EU SABRE project. RFV was funded by the EU RobustMilk project.

This article has been published as part of BMC Proceedings Volume 4 Supplement 1, 2009: Proceedings of 13th European workshop on QTL mapping and marker assisted selection.

The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/4?issue=S1.

## References

1. Meuwissen THE, Goddard ME. Mapping multiple QTL using linkage disequilibrium and linkage analysis information and multitrait data. Genet Sel Evol. 2004;36(3):261–279. doi: 10.1186/1297-9686-36-3-261.
2. Meuwissen THE, Goddard ME. Prediction of identity by descent probabilities from marker-haplotypes. Genet Sel Evol. 2001;33:605–634. doi: 10.1186/1297-9686-33-6-605.
3. Villumsen TM, Janss LLG, Lund MS. The importance of haplotype length and heritability using genomic selection in dairy cattle. Journal of Animal Breeding and Genetics. 2009;126:3–13. doi: 10.1111/j.1439-0388.2008.00747.x.
4. Calus MPL, Meuwissen THE, de Roos APW, Veerkamp RF. Accuracy of Genomic Selection Using Different Methods to Define Haplotypes. Genetics. 2008;178:553–561. doi: 10.1534/genetics.107.080838.
5. Calus M, Meuwissen T, Windig J, Knol E, Schrooten C, Vereijken A, Veerkamp R. Effects of the number of markers per haplotype and clustering of haplotypes on the accuracy of QTL mapping and prediction of genomic breeding values. Genet Sel Evol. 2009;41:11. doi: 10.1186/1297-9686-41-11.
6. Coster A, Bastiaansen J, Calus M, Maliepaard C, Bink M. QTLMAS 2009: Simulated Dataset. BMC Proceedings. 2010;4(Suppl 1):S3. doi: 10.1186/1753-6561-4-S1-S3.
7. Gilmour AR, Cullis BR, Welham SJ, Thompson R. ASREML. Program user manual. NSW Agriculture, Orange Agricultural Institute, Forest Road, Orange, NSW, 2800, Australia. 2000.
8. Hayes BJ, Bowman PJ, Chamberlain AJ, Goddard ME. Invited review: Genomic selection in dairy cattle: Progress and challenges. J Dairy Sci. 2009;92:433–443. doi: 10.3168/jds.2008-1646.