BioData Min. 2018 Nov 3;11:23. doi: 10.1186/s13040-018-0186-4. eCollection 2018.

Transition-transversion encoding and genetic relationship metric in ReliefF feature selection improves pathway enrichment in GWAS.

Author information

1. Tandy School of Computer Science, The University of Tulsa, 800 S. Tucker Dr, Tulsa, OK 74104 USA.
2. Department of Mathematics, The University of Tulsa, Tulsa, OK 74104 USA.
3. Institute for Computational Biology, Case Western Reserve University, 2103 Cornell Road, Cleveland, OH 44106 USA.
4. Department of Psychology, The University of Tulsa, Tulsa, OK 74104 USA.

Abstract

Background:

ReliefF is a nearest-neighbor based feature selection algorithm that efficiently detects variants that are important due to statistical interactions or epistasis. For categorical predictors, like genotypes, the standard metric used in ReliefF has been a simple (binary) mismatch difference. In this study, we develop new metrics of varying complexity that incorporate allele sharing, adjustment for allele frequency heterogeneity via the genetic relationship matrix (GRM), and physicochemical differences of variants via a new transition/transversion encoding.

Methods:

We introduce a new two-dimensional transition/transversion genotype encoding for ReliefF, and we implement three ReliefF attribute metrics: (1) genotype mismatch (GM), which is the ReliefF standard; (2) allele mismatch (AM), which accounts for heterozygous differences and has not been used previously in ReliefF; and (3) the new transition/transversion metric. We incorporate these attribute metrics into the ReliefF nearest-neighbor calculation with a Manhattan metric, and we introduce GRM as a new ReliefF nearest-neighbor metric to adjust for allele frequency heterogeneity.
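The GM and AM attribute metrics, the Manhattan sum over attributes, and a genetic relationship matrix can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes genotypes coded as minor-allele counts (0, 1, 2), an AM difference scaled by 1/2 so that it lies in [0, 1], and the commonly used standardized form of the GRM. All function names are hypothetical. The transition/transversion encoding is omitted because its details are not given in this abstract.

```python
import numpy as np

def genotype_mismatch(a, b):
    """GM (assumed form): 1 where two genotypes differ, 0 where they match."""
    return (np.asarray(a) != np.asarray(b)).astype(float)

def allele_mismatch(a, b):
    """AM (assumed form): difference in minor-allele counts, scaled to [0, 1].

    Unlike GM, a heterozygote (1) is only half as far from a homozygote
    (0 or 2) as the two homozygotes are from each other.
    """
    return np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) / 2.0

def manhattan_distance(a, b, diff):
    """Manhattan distance between two samples: sum of per-variant differences."""
    return float(diff(a, b).sum())

def grm(X):
    """Standardized GRM for X (samples x variants, coded 0/1/2).

    Each variant is centered by 2p and scaled by sqrt(2p(1-p)), where p is
    the sample allele frequency; monomorphic variants are dropped.
    """
    X = np.asarray(X, dtype=float)
    p = X.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)
    Z = (X[:, keep] - 2 * p[keep]) / np.sqrt(2 * p[keep] * (1 - p[keep]))
    return Z @ Z.T / keep.sum()

# Two hypothetical samples genotyped at four variants.
s1 = [0, 1, 2, 0]
s2 = [0, 2, 0, 2]
d_gm = manhattan_distance(s1, s2, genotype_mismatch)
d_am = manhattan_distance(s1, s2, allele_mismatch)

# A hypothetical 3-sample, 3-variant genotype matrix.
G = grm([[0, 1, 2], [1, 1, 0], [2, 0, 1]])
```

Under these assumptions, larger GRM entries indicate more closely related samples, so a nearest-neighbor search based on the GRM would rank neighbors by decreasing relatedness rather than by increasing distance.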

Results:

We apply ReliefF with each metric to a GWAS of major depressive disorder and compare the detection of genes in pathways implicated in depression, including Axon Guidance, Neuronal System, and G Protein-Coupled Receptor Signaling. We also compare with detection by Random Forest and Lasso as well as random/null selection to assess pathway size bias.

Conclusions:

Our results suggest that using more genetically motivated encodings, such as transition/transversion, and metrics that adjust for allele frequency heterogeneity, such as GRM, leads to ReliefF attribute scores with improved pathway enrichment.

KEYWORDS:

Feature selection; Genetic relationship matrix (GRM); Genome-wide association study (GWAS); Machine learning; Transition and transversion

Conflict of interest statement

The authors declare that they have no competing interest. Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
