Regularization and grouping -omics data by GCA method: A transcriptomic case

Monika Piwowar; Kinga A Kocemba-Pilarczyk; Piotr Piwowar

doi:10.1371/journal.pone.0206608

PLoS One. 2018 Nov 1;13(11):e0206608. doi: 10.1371/journal.pone.0206608. eCollection 2018.

Authors

Monika Piwowar¹, Kinga A Kocemba-Pilarczyk², Piotr Piwowar³

Affiliations

¹ Department of Bioinformatics and Telemedicine, Jagiellonian University-Medical College, Krakow, Poland.
² Chair of Medical Biochemistry, Jagiellonian University Medical College, Krakow, Poland.
³ AGH University of Science and Technology, Faculty of Electrical Engineering, Automatics, Computer Science and Biomedical Engineering, Department of Measurements and Electronic, Krakow, Poland.

Abstract

The paper presents the application of Grade Correspondence Analysis (GCA) and Grade Correspondence Cluster Analysis (GCCA) to ordering and grouping -omics datasets, using transcriptomic data as an example. Using gene expression data from 256 patients with multiple myeloma, it is shown that the GCA method can find regularities in the analyzed data and produce characteristic gene expression profiles for individual groups of patients. GCA iteratively permutes the rows and columns of the data matrix to maximize Kendall's tau or Spearman's rho, so that the most similar rows, and the most similar columns, end up next to each other. In this way, the GCA algorithm highlights regularities in the data matrix. The ordered data can then be grouped using the GCCA method and aggregated within clusters, giving a representation that is easier to analyze, especially for large sets of gene expression profiles. The regularization of transcriptomic data presented in this manuscript enabled division of the data set into column clusters (representing genes) and row clusters (representing patients). Subsequently, rows were aggregated (based on medians) to visualise the gene expression profiles for patients with multiple myeloma in each group. The presented analysis became the starting point for characterisation of the differentiated genes and the biochemical processes in which they are involved. GCA may provide an alternative analytical method to support the differentiation and analysis of gene expression profiles characterising individual groups of patients.
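The ordering and aggregation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a strictly positive data table, uses the grade form of Spearman's rho (here called `rho_star`) as the criterion, and reorders rows and columns by their average grades until the order stops changing. The function names `grade_midpoints`, `rho_star`, `gca_sketch` and `median_profiles` are hypothetical, and the equal-size contiguous row groups in `median_profiles` are only a stand-in for the clusters that GCCA would produce.

```python
import numpy as np

def grade_midpoints(marginal):
    # Midpoint of each category's interval on the [0, 1] grade scale.
    return np.cumsum(marginal) - marginal / 2.0

def rho_star(table):
    # Grade version of Spearman's rho for a non-negative table.
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    r = grade_midpoints(p.sum(axis=1))
    c = grade_midpoints(p.sum(axis=0))
    return 12.0 * np.sum(p * np.outer(r - 0.5, c - 0.5))

def gca_sketch(table, max_iter=100):
    # Assumes every row and column has a positive sum.
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    rows = np.arange(p.shape[0])
    cols = np.arange(p.shape[1])
    for _ in range(max_iter):
        # Sort rows by their mean column grade under the current order.
        q = p[np.ix_(rows, cols)]
        c = grade_midpoints(q.sum(axis=0))
        row_scores = (q @ c) / q.sum(axis=1)
        new_rows = rows[np.argsort(row_scores, kind="stable")]
        # Sort columns by their mean row grade under the updated row order.
        q = p[np.ix_(new_rows, cols)]
        r = grade_midpoints(q.sum(axis=1))
        col_scores = (r @ q) / q.sum(axis=0)
        new_cols = cols[np.argsort(col_scores, kind="stable")]
        if np.array_equal(new_rows, rows) and np.array_equal(new_cols, cols):
            break  # order is stable: neither sort changed anything
        rows, cols = new_rows, new_cols
    return rows, cols

def median_profiles(table, rows, cols, n_groups):
    # Stand-in for GCCA: split the ordered rows into contiguous,
    # roughly equal groups and take column-wise medians per group.
    t = np.asarray(table, dtype=float)
    groups = np.array_split(rows, n_groups)
    return np.vstack([np.median(t[np.ix_(g, cols)], axis=0) for g in groups])
```

For a patients-by-genes matrix `x` with positive entries, `rows, cols = gca_sketch(x)` gives the permutations, `x[np.ix_(rows, cols)]` is the reordered matrix, and `median_profiles(x, rows, cols, 3)` returns one median expression profile per row group. Each sorting pass cannot lower `rho_star`, so the value for the reordered matrix is at least that of the input.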

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Cluster Analysis
Computational Biology / methods*
Gene Expression
Gene Expression Profiling
Humans
Multiple Myeloma / genetics
Multiple Myeloma / metabolism
Transcription, Genetic
Transcriptome*

Grants and funding

The Jagiellonian University Medical College sponsored the preparation of the manuscript under grant number K/ZDS/006364 to MP. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.