Identification of contributing genes of Huntington's disease by machine learning

BMC Med Genomics. 2020 Nov 23;13(1):176. doi: 10.1186/s12920-020-00822-w.

Abstract

Background: Huntington's disease (HD) is an inherited disorder caused by polyglutamine (poly-Q) mutations of the HTT gene, resulting in neurodegeneration characterized by chorea, loss of coordination, and cognitive decline. However, HD pathogenesis remains elusive. Despite the availability of a wide range of biological data, machine learning has not yet yielded a comprehensive understanding of HD's mechanism, largely because the available data lack the needed density.

Methods: To extract knowledge of HD pathogenesis from the expression profiles of postmortem prefrontal cortex samples from 157 HD subjects and 157 controls, we used gene profiling ranking as the criterion to reduce the dimensionality to the order of magnitude of the sample size, followed by machine learning using decision tree, rule induction, random forest, and generalized linear models.
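The ranking step can be illustrated with a minimal sketch. The abstract does not state which ranking statistic was used, so the absolute Welch t-statistic below is an assumption, as are the function names (`welch_t`, `rank_genes`), the `top_k` parameter, and the synthetic data; none of it is taken from the paper.

```python
import random
import statistics


def welch_t(a, b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    se = (var_a / len(a) + var_b / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def rank_genes(hd, ctrl, top_k):
    """Return indices of the top_k genes by |t|, most separated first.

    hd and ctrl are lists of samples; each sample is a list of
    expression values, one per gene. (Hypothetical helper, assumed
    ranking criterion.)
    """
    n_genes = len(hd[0])
    scores = []
    for g in range(n_genes):
        t = welch_t([s[g] for s in hd], [s[g] for s in ctrl])
        scores.append((abs(t), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:top_k]]


# Synthetic data for illustration only: 50 genes, two of them shifted in "HD".
rng = random.Random(0)
n_genes = 50
shifted = {3, 17}
ctrl = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(30)]
hd = [
    [rng.gauss(3.0 if g in shifted else 0.0, 1.0) for g in range(n_genes)]
    for _ in range(30)
]

# Keep only a handful of genes so the feature count stays well below
# the number of samples before any classifier is trained.
selected = rank_genes(hd, ctrl, top_k=5)
print(selected)
```

Reducing the feature count to roughly the sample size before training limits overfitting when there are far more genes than samples. In practice the ranking would be recomputed inside each training fold so that held-out samples do not influence which genes are selected.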

Results: These four machine learning models identified 66 potential HD-contributing genes, with cross-validated accuracies of 90.79 ± 4.57%, 89.49 ± 5.20%, 90.45 ± 4.24%, and 97.46 ± 3.26%, respectively. The identified genes were enriched for gene ontology terms related to transcriptional regulation, inflammatory response, neuron projection, and the cytoskeleton. Moreover, three genes in the cognitive, sensory, and perceptual systems were also identified.
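A figure such as "accuracy ± spread" is typically obtained by averaging per-fold accuracies from k-fold cross-validation. The sketch below shows that computation under several assumptions: the abstract does not give the number of folds (k = 10 here is a guess), it does not say whether the ± values are standard deviations (assumed here), and a nearest-centroid classifier stands in for the four models actually used. All names and data are hypothetical.

```python
import random
import statistics


def centroid(rows):
    """Per-feature mean of a list of samples."""
    return [statistics.mean(col) for col in zip(*rows)]


def sq_dist(x, c):
    """Squared Euclidean distance between a sample and a centroid."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def cross_validate(X, y, k=10, seed=0):
    """Return (mean, standard deviation) of accuracy over k folds.

    Uses a nearest-centroid classifier as a stand-in model; this is
    an illustrative choice, not one of the models named in the abstract.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Fit: one centroid per class, computed from training samples only.
        cents = {
            label: centroid([X[i] for i in train if y[i] == label])
            for label in {y[i] for i in train}
        }
        # Predict: assign each held-out sample to the nearest centroid.
        correct = sum(
            min(cents, key=lambda c: sq_dist(X[i], cents[c])) == y[i]
            for i in fold
        )
        accuracies.append(correct / len(fold))
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Synthetic two-class data for illustration only.
rng = random.Random(1)
X = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(40)]
X += [[rng.gauss(2.0, 1.0) for _ in range(4)] for _ in range(40)]
y = [0] * 40 + [1] * 40

mean_acc, sd_acc = cross_validate(X, y, k=10)
print(f"{100 * mean_acc:.2f} ± {100 * sd_acc:.2f}%")
```

The spread across folds indicates how sensitive the accuracy estimate is to the particular train/test split, which matters for a sample of a few hundred subjects.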

Conclusions: Mutant HTT may interfere with both the expression and the transport of the identified genes, thereby promoting HD pathogenesis.

Keywords: Enrichment analysis; Huntington’s disease; Machine learning; Transcriptional regulation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Cognition
  • Cytoskeleton / ultrastructure
  • Datasets as Topic
  • Decision Trees
  • Gene Expression Profiling*
  • Gene Expression Regulation / genetics
  • Gene Ontology
  • Humans
  • Huntington Disease / etiology
  • Huntington Disease / genetics*
  • Inflammation / genetics
  • Linear Models
  • Machine Learning*
  • Nerve Tissue Proteins / biosynthesis
  • Nerve Tissue Proteins / genetics
  • Perception
  • Prefrontal Cortex / metabolism
  • Sensation / genetics

Substances

  • Nerve Tissue Proteins