Format

Send to

Choose Destination
See comment in PubMed Commons below
PLoS One. 2011;6(12):e28072. doi: 10.1371/journal.pone.0028072. Epub 2011 Dec 22.

A higher-order generalized singular value decomposition for comparison of global mRNA expression from multiple organisms.

Author information

  • 1Department of Electrical and Computer Engineering, University of Texas at Austin, Texas, USA.

Abstract

The number of high-dimensional datasets recording multiple aspects of a single phenomenon is increasing in many areas of science, accompanied by a need for mathematical frameworks that can compare multiple large-scale matrices with different row dimensions. The only such framework to date, the generalized singular value decomposition (GSVD), is limited to two matrices. We mathematically define a higher-order GSVD (HO GSVD) for N≥2 matrices D(i)∈R(m(i) × n), each with full column rank. Each matrix is exactly factored as D(i)=U(i)Σ(i)V(T), where V, identical in all factorizations, is obtained from the eigensystem SV=VΛ of the arithmetic mean S of all pairwise quotients A(i)A(j)(-1) of the matrices A(i)=D(i)(T)D(i), i≠j. We prove that this decomposition extends to higher orders almost all of the mathematical properties of the GSVD. The matrix S is nondefective with V and Λ real. Its eigenvalues satisfy λ(k)≥1. Equality holds if and only if the corresponding eigenvector v(k) is a right basis vector of equal significance in all matrices D(i) and D(j), that is σ(i,k)/σ(j,k)=1 for all i and j, and the corresponding left basis vector u(i,k) is orthogonal to all other vectors in U(i) for all i. The eigenvalues λ(k)=1, therefore, define the "common HO GSVD subspace." We illustrate the HO GSVD with a comparison of genome-scale cell-cycle mRNA expression from S. pombe, S. cerevisiae and human. Unlike existing algorithms, a mapping among the genes of these disparate organisms is not required. We find that the approximately common HO GSVD subspace represents the cell-cycle mRNA expression oscillations, which are similar among the datasets. Simultaneous reconstruction in the common subspace, therefore, removes the experimental artifacts, which are dissimilar, from the datasets. In the simultaneous sequence-independent classification of the genes of the three organisms in this common subspace, genes of highly conserved sequences but significantly different cell-cycle peak times are correctly classified.

PMID:
22216090
PMCID:
PMC3245232
DOI:
10.1371/journal.pone.0028072
[PubMed - indexed for MEDLINE]
Free PMC Article
PubMed Commons home

PubMed Commons

0 comments
How to join PubMed Commons

    Supplemental Content

    Full text links

    Icon for Public Library of Science Icon for PubMed Central
    Loading ...
    Support Center