Evaluation of computational techniques for predicting non-synonymous single nucleotide variants pathogenicity

Genomics. 2019 Jul;111(4):869-882. doi: 10.1016/j.ygeno.2018.05.013. Epub 2018 May 26.

Abstract

Human genetic diseases are associated with many factors. One of these is non-synonymous single nucleotide variants (nsSNVs), which replace a single amino acid with another and can alter protein function, leading to disease. Many computational techniques have been released to predict the impact of amino acid alterations on protein function and to classify mutations as pathogenic or neutral. In this article, we assessed the performance of eight techniques: FATHMM, SIFT, Provean, iFish, Mutation Assessor, PANTHER, SNAP2, and PON-P2, using the VaribenchSelectedPure dataset of 2144 pathogenic variants and 3777 neutral variants extracted from the free standard database "Varibench." The first five techniques achieve 45.60-83.75% specificity, 52.64-94.13% sensitivity, 51.00-88.90% AUC, and 49.76-88.24% accuracy (ACC) on the whole dataset, while all eight techniques achieve 36.54-77.88% specificity, 50.00-75.00% sensitivity, 51.00-76.40% AUC, and 25.00-77.78% ACC on a random sample dataset. We also created a Meta classifier (CSTJ48) that combines FATHMM, iFish, and Mutation Assessor. It achieves 96.33% specificity, 86.07% sensitivity, 91.20% AUC, and 91.89% ACC. Comparing the results shows that FATHMM performs best among the eight individual techniques, achieving 83.75% and 77.88% specificity, 94.13% and 75.00% sensitivity, 88.90% and 76.40% AUC, and 88.24% and 77.78% ACC on the whole and random sample datasets, respectively. The Meta classifier (CSTJ48) also outperforms all eight individual tools compared here.
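For readers unfamiliar with the four reported measures, the following is a minimal Python sketch of their standard textbook definitions. It is not the authors' evaluation code. It assumes pathogenic variants are the positive class (label 1) and neutral variants the negative class (label 0); the function names, the 0.5 decision threshold, and the toy labels and scores are hypothetical and are not taken from the study.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives (1 = pathogenic, 0 = neutral)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy as fractions in [0, 1]."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),  # pathogenic variants correctly flagged
        "specificity": tn / (tn + fp),  # neutral variants correctly passed
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def auc(y_true, scores):
    """AUC as the probability that a random pathogenic variant scores higher
    than a random neutral one (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not from VaribenchSelectedPure).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
print(metrics, auc_value)
```

Multiplying each value by 100 gives percentages in the form reported in the abstract. Real tools differ in score direction and cutoff (for some, lower scores indicate a damaging variant), so the threshold step would need to be adapted for each predictor.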

Keywords: Computational techniques; FATHMM; Meta classifier (CSTJ48); Mutation Assessor; Neutral variants; Non-synonymous SNVs (nsSNVs); PANTHER; PON P2; Pathogenic variants; Provean; SIFT; SNAP2; iFish.

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genetic Predisposition to Disease*
  • Genome-Wide Association Study / methods*
  • Genome-Wide Association Study / standards
  • Humans
  • Machine Learning / standards*
  • Polymorphism, Single Nucleotide*
  • Software / standards*