PLoS Comput Biol. 2018 May 14;14(5):e1006105. doi: 10.1371/journal.pcbi.1006105. eCollection 2018 May.

A loop-counting method for covariate-corrected low-rank biclustering of gene-expression and genome-wide association study data.

Author information

1. Mathematics, New York University, New York, New York, United States of America.
2. Center for Computational Biology, Flatiron Institute, New York, New York, United States of America.
3. Psychiatry, University of California, San Diego, California, United States of America.
4. Human Biology, J. Craig Venter Institute, La Jolla, California, United States of America.
5. Genetics and Genomic Sciences, Mount Sinai Medical School, New York, New York, United States of America.
6. Computer Science, Princeton University, Princeton, New Jersey, United States of America.
7. Computational Mathematics Science and Engineering, Michigan State University, East Lansing, Michigan, United States of America.
8. Department of Rehabilitation Medicine, New York University Medical School, New York, New York, United States of America.
9. Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.
10. Physiology and Biophysics, University of Gothenburg, Gothenburg, Sweden.

Abstract

A common goal in data analysis is to sift through a large data matrix and detect any significant submatrices (i.e., biclusters) that have a low numerical rank. We present a simple algorithm for tackling this biclustering problem. Our algorithm accumulates information about 2-by-2 submatrices (i.e., 'loops') within the data matrix, and focuses on the rows and columns of the data matrix that participate in an abundance of low-rank loops. We demonstrate, through analysis and numerical experiments, that this loop-counting method performs well in a variety of scenarios, outperforming simple spectral methods in many situations of interest. Another important feature of our method is that it can easily be modified to account for aspects of experimental design that commonly arise in practice. For example, our algorithm can be modified to correct for controls, categorical and continuous covariates, and sparsity within the data. We demonstrate these practical features with two examples: the first drawn from gene-expression analysis and the second drawn from a much larger genome-wide association study (GWAS).
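
The loop-counting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the data are binarized to +1/-1 by sign (a hypothetical preprocessing choice), in which case a 2-by-2 loop has rank 1 exactly when the product of its four entries is +1. The function names `binarize`, `row_loop_scores`, and `col_loop_scores` are illustrative only, and the covariate and sparsity corrections mentioned in the abstract are omitted.

```python
def binarize(matrix):
    """Assumed preprocessing: map each entry to +1 if positive, else -1."""
    return [[1 if x > 0 else -1 for x in row] for row in matrix]


def row_loop_scores(signs):
    """For each row j, return (#rank-1 loops) - (#rank-2 loops) through j.

    A loop is the 2-by-2 submatrix on rows (j, j') and columns (k, k'),
    with j' != j and k' != k; ordered column pairs are counted, so each
    unordered loop contributes twice. For +1/-1 entries the loop has
    rank 1 exactly when the product of its four entries is +1.
    """
    n_cols = len(signs[0])
    scores = []
    for j, row_j in enumerate(signs):
        total = 0
        for jp, row_jp in enumerate(signs):
            if jp == j:
                continue
            dot = sum(a * b for a, b in zip(row_j, row_jp))
            # dot*dot sums the four-entry product over all (k, k') pairs;
            # subtracting n_cols removes the degenerate k == k' terms.
            total += dot * dot - n_cols
        scores.append(total)
    return scores


def col_loop_scores(signs):
    """Column scores: the same count applied to the transposed matrix."""
    transposed = [list(col) for col in zip(*signs)]
    return row_loop_scores(transposed)


# Toy example: the first three rows are mutually rank-1 (identical up to
# sign), while the last row is uncorrelated with them.
demo = binarize([
    [0.9, 1.2, 0.4, 2.0],
    [1.1, 0.3, 0.8, 0.5],
    [-0.7, -1.5, -0.2, -0.9],
    [0.6, -0.4, 1.3, -1.1],
])
print(row_loop_scores(demo))
print(col_loop_scores(demo))
```

In this toy example the last row receives a much lower score than the others. A biclustering procedure built on these scores would presumably discard low-scoring rows and columns and then rescore what remains, although that loop is not shown here.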

PMID: 29758032
PMCID: PMC5997363
DOI: 10.1371/journal.pcbi.1006105
[Indexed for MEDLINE]
Free PMC Article

Conflict of interest statement

I have read the journal’s policy and the authors of this manuscript have the following competing interests: 1. John I Nurnberger Jr, who is one of the members of the Bipolar Disorders Working Group of the Psychiatric Genomics Consortium, is also an investigator for Assurex and a consultant for Janssen. 2. The remaining authors have no conflicts of interest to declare.
