Cervical Cancer Identification with Synthetic Minority Oversampling Technique and PCA Analysis using Random Forest Classifier

J Med Syst. 2019 Jul 17;43(9):286. doi: 10.1007/s10916-019-1402-6.

Abstract

Cervical cancer is the fourth most common malignant disease among women worldwide. In most cases, the symptoms of cervical cancer are not noticeable in its early stages. Several factors increase the risk of developing cervical cancer, such as human papillomavirus, sexually transmitted diseases, and smoking. Identifying those factors and building a classification model that determines whether or not a case is cervical cancer remains a challenging research problem. This study aims to use cervical cancer risk factors to build a classification model using the Random Forest (RF) classification technique with the synthetic minority oversampling technique (SMOTE) and two feature reduction techniques: recursive feature elimination (RFE) and principal component analysis (PCA). Most medical data sets are imbalanced because the number of patients is considerably smaller than the number of non-patients. SMOTE is used to address the imbalance of the data set used here. The data set comprises 32 risk factors and four target variables: Hinselmann, Schiller, Cytology and Biopsy. Accuracy, sensitivity, specificity, PPA and NPA for the four target variables remain accurate after SMOTE when compared with the values obtained before SMOTE. An RSOnto ontology has been created to visualize the progress in classification performance.
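The oversampling step named in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority case is placed at a random point on the line segment between an existing minority case and one of its k nearest minority neighbours. This is not the authors' implementation, which the abstract does not describe. The function name `smote`, its parameters, and the toy data are hypothetical, and the RFE, PCA and Random Forest steps are not shown.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority-class neighbours.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Pick a random position on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (assumed, not from the study): two numeric features per case.
majority = [[float(x), float(y)] for x in range(5) for y in range(4)]  # 20 cases
minority = [[10.0, 10.0], [11.0, 10.5], [10.5, 11.5], [12.0, 11.0]]    # 4 cases

# Generate enough synthetic minority cases to match the majority class size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic case is a convex combination of two real minority cases, it stays inside the region already occupied by the minority class. In practice, oversampling is usually applied only to the training split so that the evaluation data contain no synthetic cases; whether the study did so is not stated in the abstract.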

Keywords: Cervical cancer; PCA; RFE; RSOnto; Random Forest; SMOTE.

MeSH terms

  • Age Factors
  • Algorithms
  • Female
  • Hormonal Contraception / statistics & numerical data
  • Humans
  • Machine Learning
  • Principal Component Analysis
  • Risk Factors
  • Sensitivity and Specificity
  • Sexual Behavior
  • Sexually Transmitted Diseases / epidemiology
  • Smoking / epidemiology
  • Support Vector Machine*
  • Uterine Cervical Neoplasms / epidemiology*