Predicting Ovarian/Breast Cancer Pathogenic Risks of Human BRCA1 Gene Variants of Unknown Significance

Biomed Res Int. 2021 Apr 14;2021:6667201. doi: 10.1155/2021/6667201. eCollection 2021.

Abstract

High-throughput sequencing is increasingly used in clinical diagnosis, but it also uncovers a growing number of novel gene variants of unknown clinical significance. These variants complicate the interpretation of individuals' genetic data, precise disease diagnosis, and therapeutic decision-making, so methods to analyze and interpret them are critically important. In this work, we identified BRCA1 variants of unknown clinical significance from clinical sequencing data and developed machine learning models to predict their pathogenicity. In performance benchmarking, the optimized random forest model achieved an area under the receiver operating characteristic curve (AUC) of 0.85, outperforming the other models. Finally, we applied the best random forest model to predict the pathogenicity of 6321 BRCA1 variants drawn from both the sequencing data and the ClinVar database, obtaining predicted pathogenic risks for BRCA1 variants of unknown significance.
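
The abstract does not say how the benchmarking metric was computed, so the following is only a minimal sketch of how an area under the receiver operating characteristic curve can be obtained from predicted scores and used to rank candidate models. The function name `roc_auc`, the model names, and all labels and scores are hypothetical toy values for illustration; none of them come from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive (pathogenic, 1)
    example receives a higher score than a randomly chosen negative
    (benign, 0) example, with ties counted as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = pathogenic, 0 = benign (not from the paper).
labels = [1, 1, 1, 0, 0, 0]

# Hypothetical predicted pathogenicity scores from two candidate models.
candidate_scores = {
    "model_a": [0.9, 0.8, 0.4, 0.5, 0.2, 0.1],
    "model_b": [0.6, 0.3, 0.4, 0.5, 0.7, 0.1],
}

# Benchmark each candidate and keep the one with the highest AUC.
aucs = {name: roc_auc(labels, s) for name, s in candidate_scores.items()}
best = max(aucs, key=aucs.get)
print(aucs, best)
```

An area of 0.5 corresponds to random ranking and 1.0 to perfect separation of pathogenic from benign variants, so the reported 0.85 indicates that the model ranks a pathogenic variant above a benign one in roughly 85% of such pairs.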

MeSH terms

  • Algorithms
  • BRCA1 Protein / genetics*
  • Breast Neoplasms / diagnosis*
  • Breast Neoplasms / genetics*
  • Female
  • Genetic Predisposition to Disease*
  • Genetic Variation*
  • Humans
  • Models, Genetic
  • Ovarian Neoplasms / diagnosis*
  • Ovarian Neoplasms / genetics*
  • ROC Curve
  • Risk Factors
  • Support Vector Machine

Substances

  • BRCA1 Protein
  • BRCA1 protein, human