Mutation Clusters from Cancer Exome

Genes (Basel). 2017 Aug 15;8(8):201. doi: 10.3390/genes8080201.

Abstract

We apply our statistically deterministic machine learning/clustering algorithm *K-means (recently developed in https://ssrn.com/abstract=2908286) to 10,656 published exome samples for 32 cancer types. A majority of cancer types exhibit a mutation clustering structure. Our results are in-sample stable. They are also out-of-sample stable when applied to 1,389 published genome samples across 14 cancer types. In contrast, we find in- and out-of-sample instabilities in cancer signatures extracted from exome samples via nonnegative matrix factorization (NMF), a computationally costly and non-deterministic method. Extracting stable mutation structures from exome data could have important implications for speed and cost, which are critical for early-stage cancer diagnostics, such as novel blood-test methods currently in development.
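To illustrate why determinism matters here, the sketch below runs plain (vanilla) k-means from several random initializations on synthetic data. It is not the authors' *K-means algorithm, whose details are given in the cited SSRN paper. The toy data, the function names `kmeans` and `make_toy_data`, and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np


def kmeans(X, k, seed, n_iter=100):
    """Plain Lloyd's k-means with random initial centers (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Pick k distinct data points as the starting centers.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center: shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's center where it is
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Recompute the assignment and objective for the final centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = float(d2[np.arange(len(X)), labels].sum())
    return labels, centers, inertia


def make_toy_data(seed=0):
    """Hypothetical toy data: four overlapping 2-D Gaussian blobs, 50 points each."""
    rng = np.random.default_rng(seed)
    blob_centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [5.0, 5.0]])
    return np.vstack([c + rng.normal(scale=1.0, size=(50, 2)) for c in blob_centers])


X = make_toy_data()

# Each seed gives a different initialization, so the runs may converge to
# different local optima with different objective values.
runs = [kmeans(X, k=4, seed=s) for s in range(20)]
inertias = [r[2] for r in runs]

# A naive stabilization (an assumption here, not the paper's method):
# keep the run with the lowest objective value.
best = min(runs, key=lambda r: r[2])

print("distinct objective values across seeds:", len(set(np.round(inertias, 6))))
print("lowest objective value:", best[2])
```

Each run is reproducible for a fixed seed, but the clustering reported can depend on which seed is chosen. Removing that dependence is the kind of property the abstract refers to as statistically deterministic.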

Keywords: DNA; K-means; cancer signatures; clustering; correlation; covariance; eRank; exome; genome; industry classification; machine learning; matrix; nonnegative matrix factorization; quantitative finance; sample; somatic mutation; source code; statistical risk model.