Genetics. 2017 Oct;207(2):489-501. doi: 10.1534/genetics.117.300198. Epub 2017 Aug 24.

Incorporating Gene Annotation into Genomic Prediction of Complex Phenotypes.

Author information

1 National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China.
2 Animal Breeding and Genetics Group, University of Goettingen, 37075, Germany.
3 Animal Breeding and Genetics Group, University of Goettingen, 37075, Germany. hsimian@gwdg.de, jqli@scau.edu.cn
4 National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Laboratory of Agro-animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou 510642, China. hsimian@gwdg.de, jqli@scau.edu.cn

Abstract

Today, genomic prediction (GP) is an established technology in plant and animal breeding programs. Current standard methods are based purely on statistical considerations and do not make use of the abundant biological knowledge that is readily available from public databases. Major questions that have to be answered before biological prior information can be used routinely in GP approaches are which types of information can be used, and at which points they can be incorporated into prediction methods. In this study, we propose a novel strategy to incorporate gene annotation into GP of complex phenotypes by defining haploblocks according to gene positions. Haplotype effects are then modeled as categorical or as numerical allele dosage variables. The underlying concept of this approach is to build the statistical model on variables representing the biologically functional units. We evaluate the new methods with data from a heterogeneous stock mouse population, the Drosophila Genetic Reference Panel (DGRP), and a rice breeding population from the Rice Diversity Panel. Our results show that using gene annotation to define haploblocks often leads to comparable, and for some traits higher, predictive ability compared to SNP-based models or to haplotype models that do not use gene annotation information. Modeling gene interaction effects can further improve predictive ability. We also illustrate that adding the markers that have not been mapped to any gene in a second, separate relatedness matrix in many cases does not lead to a relevant additional increase in predictive ability when the first matrix is based on haploblocks defined with gene annotation data, suggesting that intergenic markers provide only redundant information in the data sets considered. Gene annotation information therefore seems to be appropriate for assessing the importance of DNA segments. Finally, we discuss the effects of gene annotation quality, marker density, and linkage disequilibrium on the performance of the new methods. To our knowledge, this is the first work that incorporates epistatic interaction or gene annotation into haplotype-based prediction approaches.
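The abstract describes the approach only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes phased 0/1 haplotypes and a gene annotation given as start and end positions. All names and data here are hypothetical (`snps_in_gene`, `haplotype_codings`, the toy positions and haplotypes). It groups SNPs into one haploblock per gene and then codes each individual's haplotypes both as a categorical label and as numerical allele dosages.

```python
def snps_in_gene(positions, start, end):
    """Indices of SNPs whose position lies inside the gene interval [start, end]."""
    return [i for i, p in enumerate(positions) if start <= p <= end]


def haplotype_codings(hap_a, hap_b, snp_idx):
    """Code one gene-defined haploblock for every individual.

    hap_a, hap_b: phased haplotypes (lists of 0/1 alleles), one pair per individual.
    Returns (alleles, categorical, dosage):
      alleles      distinct haplotype alleles observed in the block, sorted
      categorical  per individual, the unordered pair of allele indices (a label)
      dosage       per individual, copies (0, 1 or 2) of each haplotype allele
    """
    blocks = [
        (tuple(a[j] for j in snp_idx), tuple(b[j] for j in snp_idx))
        for a, b in zip(hap_a, hap_b)
    ]
    alleles = sorted({h for pair in blocks for h in pair})
    index = {h: k for k, h in enumerate(alleles)}
    categorical = [tuple(sorted((index[x], index[y]))) for x, y in blocks]
    dosage = [[pair.count(h) for h in alleles] for pair in blocks]
    return alleles, categorical, dosage


# Toy data (hypothetical): five SNPs, two annotated genes, three individuals.
positions = [100, 150, 220, 400, 450]
genes = {"geneA": (90, 250), "geneB": (390, 460)}
hap_a = [[0, 1, 0, 1, 1], [0, 1, 0, 0, 0], [1, 1, 0, 1, 1]]
hap_b = [[0, 1, 0, 0, 0], [1, 1, 0, 0, 0], [1, 1, 0, 1, 1]]

# One haploblock per gene: map each gene name to its codings.
codings = {
    name: haplotype_codings(hap_a, hap_b, snps_in_gene(positions, start, end))
    for name, (start, end) in genes.items()
}
alleles_a, categorical_a, dosage_a = codings["geneA"]
```

In this sketch, SNPs that fall outside every gene interval are simply left out of the gene-based blocks. The abstract indicates that such intergenic markers were handled through a second, separate relatedness matrix, which is not reproduced here.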

KEYWORDS:

GenPred; Shared Data Resources; categorical model; gene annotation; genomic selection; haplotype

PMID: 28839043
PMCID: PMC5629318
DOI: 10.1534/genetics.117.300198
[Indexed for MEDLINE]
Free PMC Article
