Bioinformatics. 2016 Sep 1;32(17):i445-i454. doi: 10.1093/bioinformatics/btw434.

Simultaneous discovery of cancer subtypes and subtype features by molecular data integration.

Author information

1
Department of Computer Science, KULeuven, Leuven, Belgium.
2
Leiden Institute for Advanced Computer Science, Universiteit Leiden, Leiden, The Netherlands.
3
Department of Information Technology, iMinds, Ghent University, Gent, Belgium, Bioinformatics Institute Ghent, 9052 Gent, Belgium, Department of Plant Biotechnology and Bioinformatics, Ghent University, Gent, Belgium.
4
Department of Medical Biochemistry and Cell Biology, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden.
5
Department of Information Technology, iMinds, Ghent University, Gent, Belgium, Bioinformatics Institute Ghent, 9052 Gent, Belgium, Department of Plant Biotechnology and Bioinformatics, Ghent University, Gent, Belgium, Department of Genetics, University of Pretoria, Hatfield Campus, Pretoria 0028, South Africa.
6
Department of Computer Science, KULeuven, Leuven, Belgium, Leiden Institute for Advanced Computer Science, Universiteit Leiden, Leiden, The Netherlands.

Abstract

MOTIVATION:

Subtyping cancer is key to improved and more personalized prognosis and treatment. The increasing availability of tumor-related molecular data provides the opportunity to identify molecular subtypes in a data-driven way. Molecular subtypes are defined as groups of samples that share a similar molecular mechanism at the origin of carcinogenesis. These molecular mechanisms are reflected by subtype-specific mutational and expression features. Data-driven subtyping is a complex problem because subtyping and identifying the molecular mechanisms that drive carcinogenesis are confounded problems. Many current integrative subtyping methods use global mutational and/or expression tumor profiles to group tumor samples into subtypes but do not explicitly extract the subtype-specific features. We therefore present a method that solves both tasks, subtyping and identification of subtype-specific features, simultaneously. To this end, our method integrates mutational and expression data while taking into account the clonal properties of carcinogenesis. Key to our method is a formalization of the problem as a rank matrix factorization of ranked data that approaches the subtyping problem as multi-view bi-clustering.

RESULTS:

We introduce a novel integrative framework to identify subtypes by combining mutational and expression features. The incomparable measurement data are integrated by transformation into ranked data, and subtypes are defined as multi-view bi-clusters. We formalize the model using rank matrix factorization, resulting in the SRF algorithm. Experiments on simulated data and the TCGA breast cancer data demonstrate that SRF is able to capture subtle differences that existing methods may miss.
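
The rank transformation mentioned above can be illustrated with a minimal sketch. It is not the SRF implementation: the function names, the toy data, the choice to rank features within each sample, and the average-rank handling of ties are all assumptions made for illustration only. The point it shows is that two views measured on different scales become comparable once each is replaced by ranks.

```python
from typing import Dict, List


def rank_row(values: List[float]) -> List[float]:
    """Return 1-based ranks of values; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks


def rank_view(view: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Rank each sample's features independently (an assumed convention)."""
    return {sample: rank_row(row) for sample, row in view.items()}


# Hypothetical toy data: two tumor samples, three genes, two views.
expression = {"tumor_A": [2.5, 0.1, 7.3], "tumor_B": [120.0, 300.0, 300.0]}
mutation = {"tumor_A": [0, 1, 0], "tumor_B": [1, 0, 0]}

ranked_expression = rank_view(expression)
ranked_mutation = rank_view(mutation)
```

After this step both views hold values on the same rank scale, which is the precondition for treating them jointly as a multi-view ranked matrix; the factorization itself is not sketched here.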

AVAILABILITY AND IMPLEMENTATION:

The implementation is available at: https://github.com/rankmatrixfactorisation/SRF

CONTACT:

kathleen.marchal@intec.ugent.be, siegfried.nijssen@cs.kuleuven.be

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.

PMID:
27587661
DOI:
10.1093/bioinformatics/btw434
[Indexed for MEDLINE]
