A 21‑gene Support Vector Machine classifier and a 10‑gene risk score system constructed for patients with gastric cancer

Mol Med Rep. 2020 Jan;21(1):347-359. doi: 10.3892/mmr.2019.10841. Epub 2019 Nov 21.

Abstract

Gastric cancer (GC) ranks fifth in incidence and third in cancer mortality worldwide. The present study aimed to construct a Support Vector Machine (SVM) classifier and a risk score system for GC. The GSE62254 (training set) and GSE26253 (validation set 2) datasets were downloaded from the Gene Expression Omnibus database, and a GC gene expression profile (validation set 1) was obtained from The Cancer Genome Atlas database. Differentially expressed genes (DEGs) between recurrent and non‑recurrent samples were identified using the limma package. Feature genes were selected using the caret package, and an SVM classifier was built using the e1071 package. The optimal predictive genes for constructing a risk score system were screened using the penalized package. Finally, stratification analysis of clinical factors was performed, and pathway enrichment analysis was conducted using Gene Set Enrichment Analysis (GSEA). A total of 239 DEGs were identified in GSE62254, of which 114 were significantly associated with both recurrence‑free survival and overall survival. Subsequently, 21 feature genes were screened from these 114 DEGs and used to build the SVM classifier. A risk score system for survival prediction was then constructed from 10 optimal genes, namely A‑kinase anchoring protein 12; angiopoietin‑like protein 1; cysteine‑rich sequence 1; myeloid/lymphoid or mixed‑lineage leukemia, translocated to chromosome 11; neuron navigator 3; neurobeachin; nephroblastoma overexpressed; pleiotrophin; tumor suppressor candidate 3; and zinc finger and SCAN domain containing 18. The stratification analysis revealed that pathological stage was an independent prognostic clinical factor in the high‑risk group. Additionally, eight pathways were significantly associated with the 10‑gene signature. The SVM classifier and the risk score system may be applied to classify patients with GC and to predict their prognosis, respectively.
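The abstract does not state the exact risk score formula or the cutoff used to define high- and low-risk groups. Gene signatures derived from penalized Cox regression are commonly applied by scoring each patient as a coefficient-weighted sum of the signature genes' expression values, splitting patients at the median score, and comparing Kaplan-Meier survival curves between the two groups. The Python sketch below illustrates that general scheme under those assumptions only. The function names, the gene labels (GENE_A, GENE_B, GENE_C), the coefficients and the patient values are hypothetical placeholders, not the study's fitted 10-gene model.

```python
"""Hypothetical sketch of a linear prognostic risk score with a median split.

Assumptions (not taken from the study): score = sum(beta_g * x_g) over the
signature genes, patients with a score above the median are labelled
'high' risk, and survival is summarized with a Kaplan-Meier estimator.
All gene names, coefficients and patient data are made-up placeholders.
"""
from statistics import median


def risk_score(expression, coefficients):
    """Return the weighted sum beta_g * x_g over the genes in `coefficients`."""
    return sum(beta * expression[gene] for gene, beta in coefficients.items())


def split_by_median(scores):
    """Label each patient 'high' if the score exceeds the median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. death or recurrence) was observed, 0 if censored
    Returns (time, survival probability) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after time t.
        at_risk -= removed
    return curve


if __name__ == "__main__":
    # Hypothetical coefficients, e.g. as might come from a penalized Cox fit.
    coefs = {"GENE_A": 0.8, "GENE_B": -0.5, "GENE_C": 0.3}
    patients = {
        "p1": {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 0.5},
        "p2": {"GENE_A": 0.5, "GENE_B": 2.0, "GENE_C": 1.0},
        "p3": {"GENE_A": 1.5, "GENE_B": 0.5, "GENE_C": 2.0},
        "p4": {"GENE_A": 0.2, "GENE_B": 1.5, "GENE_C": 0.1},
    }
    scores = {pid: risk_score(x, coefs) for pid, x in patients.items()}
    print(split_by_median(scores))
    print(kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0]))
```

In a real analysis, the Kaplan-Meier curves of the high- and low-risk groups would typically be compared with a log-rank test. Ties at the median are assigned to the low-risk group here, which is an arbitrary choice.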

Keywords: GC; DEGs; SVM; risk score system; stratification analysis.

MeSH terms

  • A Kinase Anchor Proteins / genetics
  • A Kinase Anchor Proteins / metabolism
  • Aged
  • Angiopoietin-Like Protein 1
  • Angiopoietin-like Proteins / genetics
  • Angiopoietin-like Proteins / metabolism
  • Carrier Proteins / genetics
  • Carrier Proteins / metabolism
  • Cytokines / genetics
  • Cytokines / metabolism
  • Databases, Genetic
  • Female
  • Gene Expression Profiling
  • Gene Expression Regulation, Neoplastic / genetics*
  • Gene Regulatory Networks
  • Humans
  • Kaplan-Meier Estimate
  • Male
  • Membrane Proteins / genetics
  • Membrane Proteins / metabolism
  • Middle Aged
  • Neoplasm Recurrence, Local / genetics*
  • Neoplasm Recurrence, Local / metabolism
  • Nerve Tissue Proteins / genetics
  • Nerve Tissue Proteins / metabolism
  • Prognosis
  • Risk Factors
  • Stomach Neoplasms / genetics*
  • Stomach Neoplasms / metabolism
  • Stomach Neoplasms / mortality
  • Stomach Neoplasms / pathology
  • Support Vector Machine*
  • Tumor Suppressor Proteins / genetics
  • Tumor Suppressor Proteins / metabolism

Substances

  • A Kinase Anchor Proteins
  • ANGPTL1 protein, human
  • Angiopoietin-Like Protein 1
  • Angiopoietin-like Proteins
  • Carrier Proteins
  • Cytokines
  • Membrane Proteins
  • NAV3 protein, human
  • NBEA protein, human
  • Nerve Tissue Proteins
  • TUSC3 protein, human
  • Tumor Suppressor Proteins
  • pleiotrophin