BMC Syst Biol. 2016 Aug 26;10 Suppl 3:62. doi: 10.1186/s12918-016-0306-z.

Classification of breast cancer patients using somatic mutation profiles and machine learning approaches.

Vural S [1], Wang X [2], Guda C [3,4,5,6].

Author information

1. Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, 68198, USA.
2. School of Basic Medicine and Clinic Pharmacy, China Pharmaceutical University, Nanjing, 211198, China.
3. Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, 68198, USA. babu.guda@unmc.edu.
4. Bioinformatics and Systems Biology Core, University of Nebraska Medical Center, Omaha, NE, 68198, USA. babu.guda@unmc.edu.
5. Department of Biochemistry and Molecular Biology, University of Nebraska Medical Center, Omaha, NE, 68198, USA. babu.guda@unmc.edu.
6. Fred and Pamela Buffett Cancer Center, University of Nebraska Medical Center, Omaha, NE, 68198, USA. babu.guda@unmc.edu.

Abstract

BACKGROUND:

The high degree of heterogeneity observed in breast cancers makes it very difficult to classify patients into distinct clinical subgroups, and this in turn limits the ability to devise effective therapeutic strategies. Several classification strategies have helped, including those based on ER/PR/HER2 expression or on the expression profiles of a panel of genes. However, such methods often produce misleading results because gene expression is dynamic. In contrast, somatic DNA mutations are relatively stable and drive the initiation and progression of many sporadic cancers. Hence, in this study, we explore the use of gene mutation profiles to classify, characterize and predict breast cancer subgroups.

RESULTS:

We analyzed whole-exome sequencing data from 358 ethnically similar breast cancer patients in The Cancer Genome Atlas (TCGA) project. Each somatic, non-synonymous single nucleotide variant identified in a patient was assigned a quantitative score (C-score) that represents the extent of its negative impact on gene function. Using these scores with the non-negative matrix factorization (NMF) method, we clustered the patients into three subgroups. By comparing the clinical stages of the patients, we identified an early-stage-enriched subgroup and a late-stage-enriched subgroup. A comparison of the mutation scores of these two subgroups identified 358 genes that carry significantly higher mutation rates in the late-stage-enriched subgroup. Functional characterization of these genes revealed important gene families that carry a heavy mutational load in the late-stage-enriched subgroup. Finally, using the identified subgroups, we also developed a supervised classification model to predict the stage of the patients.
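The NMF clustering step described in the Results can be sketched as follows. This is a minimal, standard-library-only Python illustration under stated assumptions, not the authors' pipeline. It makes the following assumptions:

- Variant impact scores are taken as given, so the C-score computation itself is not shown.
- The scores are summed per gene for each patient.
- NMF is solved with Lee-Seung multiplicative updates.
- Each patient is assigned to the metagene with the largest coefficient in H.

All function names (`build_score_matrix`, `nmf`, `cluster_patients`, `reconstruction_error`) are hypothetical.

```python
import random


def build_score_matrix(variants, patients, genes):
    """Aggregate per-variant impact scores into a genes x patients matrix.

    variants: iterable of (patient_id, gene, score) tuples, one per somatic
    non-synonymous SNV. Summing scores per gene is an ASSUMPTION of this
    sketch; the abstract does not state how variants are aggregated.
    """
    gene_idx = {g: i for i, g in enumerate(genes)}
    patient_idx = {p: j for j, p in enumerate(patients)}
    V = [[0.0] * len(patients) for _ in genes]
    for patient, gene, score in variants:
        V[gene_idx[gene]][patient_idx[patient]] += score
    return V


def transpose(A):
    return [list(row) for row in zip(*A)]


def matmul(A, B):
    cols = list(zip(*B))  # columns of B
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def reconstruction_error(V, W, H):
    """Squared Frobenius norm ||V - WH||^2."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) as W (n x k) times H (k x m).

    Uses Lee-Seung multiplicative updates for the Frobenius objective; this
    solver is an ASSUMPTION, not necessarily the one used in the study.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H


def cluster_patients(H):
    """Label each patient (column of H) by its largest metagene coefficient."""
    k, m = len(H), len(H[0])
    return [max(range(k), key=lambda i: H[i][j]) for j in range(m)]
```

In the study's setting, calling `cluster_patients(nmf(V, 3)[1])` on a genes x patients matrix would yield one of three subgroup labels per patient. Multiplicative-update NMF depends on its random initialization, so NMF-based clustering is commonly repeated across several seeds and summarized by consensus. For an exome-scale matrix, a library solver such as scikit-learn's `NMF` would be the practical choice over this pure-Python loop.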

CONCLUSIONS:

This study demonstrates that gene mutation profiles can be used effectively with unsupervised machine-learning methods to identify clinically distinguishable breast cancer subgroups. The classification model developed in this study could provide a reasonable prediction of a cancer patient's stage based solely on their mutation profile. This study represents the first use of somatic mutation profile data alone to identify and predict breast cancer subgroups, and this generic methodology can also be applied to other cancer datasets.

KEYWORDS:

Breast cancer classification; Breast cancer subtypes; Cancer stage prediction; Gene mutation profiles; TCGA; Unsupervised and supervised machine learning; Whole exome sequencing data analysis

PMID:
27587275
PMCID:
PMC5009820
DOI:
10.1186/s12918-016-0306-z
[Indexed for MEDLINE]
Free PMC Article
