PLoS One. 2011; 6(12): e28072.
Published online Dec 22, 2011. doi:  10.1371/journal.pone.0028072
PMCID: PMC3245232

A Higher-Order Generalized Singular Value Decomposition for Comparison of Global mRNA Expression from Multiple Organisms

Dongxiao Zhu, Editor

Abstract

The number of high-dimensional datasets recording multiple aspects of a single phenomenon is increasing in many areas of science, accompanied by a need for mathematical frameworks that can compare multiple large-scale matrices with different row dimensions. The only such framework to date, the generalized singular value decomposition (GSVD), is limited to two matrices. We mathematically define a higher-order GSVD (HO GSVD) for $N \geq 2$ matrices $D_i \in \mathbb{R}^{m_i \times n}$, each with full column rank. Each matrix is exactly factored as $D_i = U_i \Sigma_i V^T$, where $V$, identical in all factorizations, is obtained from the eigensystem $SV = V\Lambda$ of the arithmetic mean $S$ of all pairwise quotients $A_i A_j^{-1}$ of the matrices $A_i = D_i^T D_i$, $i \neq j$. We prove that this decomposition extends to higher orders almost all of the mathematical properties of the GSVD. The matrix $S$ is nondefective with $V$ and $\Lambda$ real. Its eigenvalues satisfy $\lambda_k \geq 1$. Equality holds if and only if the corresponding eigenvector $v_k$ is a right basis vector of equal significance in all matrices $D_i$ and $D_j$, that is $\sigma_{i,k}/\sigma_{j,k} = 1$ for all $i$ and $j$, and the corresponding left basis vector $u_{i,k}$ is orthogonal to all other vectors in $U_i$ for all $i$. The eigenvalues $\lambda_k = 1$, therefore, define the “common HO GSVD subspace.” We illustrate the HO GSVD with a comparison of genome-scale cell-cycle mRNA expression from S. pombe, S. cerevisiae and human. Unlike existing algorithms, a mapping among the genes of these disparate organisms is not required. We find that the approximately common HO GSVD subspace represents the cell-cycle mRNA expression oscillations, which are similar among the datasets. Simultaneous reconstruction in the common subspace, therefore, removes the experimental artifacts, which are dissimilar, from the datasets. In the simultaneous sequence-independent classification of the genes of the three organisms in this common subspace, genes of highly conserved sequences but significantly different cell-cycle peak times are correctly classified.

Introduction

In many areas of science, especially in biotechnology, the number of high-dimensional datasets recording multiple aspects of a single phenomenon is increasing. This is accompanied by a fundamental need for mathematical frameworks that can compare multiple large-scale matrices with different row dimensions. For example, comparative analyses of global mRNA expression from multiple model organisms promise to enhance fundamental understanding of the universality and specialization of molecular biological mechanisms, and may prove useful in medical diagnosis, treatment and drug design [1]. Existing algorithms limit analyses to subsets of homologous genes among the different organisms, effectively introducing into the analysis the assumption that sequence and functional similarities are equivalent (e.g., [2]). However, it is well known that this assumption does not always hold, for example, in cases of nonorthologous gene displacement, when nonorthologous proteins in different organisms fulfill the same function [3]. For sequence-independent comparisons, mathematical frameworks are required that can distinguish and separate the similar from the dissimilar among multiple large-scale datasets tabulated as matrices with different row dimensions, corresponding to the different sets of genes of the different organisms. The only such framework to date, the generalized singular value decomposition (GSVD) [4]–[7], is limited to two matrices.

It was shown that the GSVD provides a mathematical framework for sequence-independent comparative modeling of DNA microarray data from two organisms, where the mathematical variables and operations represent biological reality [7], [8]. The variables, significant subspaces that are common to both or exclusive to either one of the datasets, correlate with cellular programs that are conserved in both or unique to either one of the organisms, respectively. The operation of reconstruction in the subspaces common to both datasets outlines the biological similarity in the regulation of the cellular programs that are conserved across the species. Reconstruction in the common and exclusive subspaces of either dataset outlines the differential regulation of the conserved relative to the unique programs in the corresponding organism. Recent experimental results [9] verify a computationally predicted genome-wide mode of regulation that correlates DNA replication origin activity with mRNA expression [10], [11], demonstrating that GSVD modeling of DNA microarray data can be used to correctly predict previously unknown cellular mechanisms.

We now define a higher-order GSVD (HO GSVD) for the comparison of $N \geq 2$ datasets. The datasets are tabulated as $N$ real matrices $D_i \in \mathbb{R}^{m_i \times n}$, each with full column rank, with different row dimensions and the same column dimension, where there exists a one-to-one mapping among the columns of the matrices. Like the GSVD, the HO GSVD is an exact decomposition, i.e., each matrix is exactly factored as $D_i = U_i \Sigma_i V^T$, where the columns of $U_i$ and $V$ have unit length and are the left and right basis vectors respectively, and each $\Sigma_i$ is diagonal and positive definite. Like the GSVD, the matrix $V$ is identical in all factorizations. In our HO GSVD, the matrix $V$ is obtained from the eigensystem $SV = V\Lambda$ of the arithmetic mean $S$ of all pairwise quotients $A_i A_j^{-1}$ of the matrices $A_i = D_i^T D_i$, or equivalently of all $S_{ij} = \frac{1}{2}(A_i A_j^{-1} + A_j A_i^{-1})$, $i \neq j$.

To clarify our choice of $S$, we note that in the GSVD, defined by Van Loan [5], the matrix $V$ can be formed from the eigenvectors of the unbalanced quotient $A_1 A_2^{-1}$ (Section 1 in Appendix S1). We observe that this $V$ can also be formed from the eigenvectors of the balanced arithmetic mean $S_{12} = \frac{1}{2}(A_1 A_2^{-1} + A_2 A_1^{-1})$. We prove that in the case of $N = 2$, our definition of $V$ by using the eigensystem of $S_{12}$ leads algebraically to the GSVD (Theorems S1–S5 in Appendix S1), and therefore, as Paige and Saunders showed [6], can be computed in a stable way. We also note that in the GSVD, the matrix $V$ does not depend upon the ordering of the matrices $D_1$ and $D_2$. Therefore, we define our HO GSVD for $N \geq 2$ matrices by using the balanced arithmetic mean $S$ of all pairwise arithmetic means $S_{ij}$, each of which defines the GSVD of the corresponding pair of matrices $D_i$ and $D_j$, noting that $S$ does not depend upon the ordering of the matrices $D_i$ and $D_j$.
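
The two-matrix observation above can be checked numerically. The following is a minimal sketch, assuming NumPy and two randomly generated stand-in datasets `D1` and `D2` (hypothetical data, not from the paper), with $A_i = D_i^T D_i$ as in the abstract. It shows that every eigenvector of the unbalanced quotient, with eigenvalue $q$, is also an eigenvector of the balanced arithmetic mean, with eigenvalue $(q + 1/q)/2$.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical datasets: different row counts, same column count.
D1 = rng.standard_normal((7, 3))
D2 = rng.standard_normal((5, 3))
A1, A2 = D1.T @ D1, D2.T @ D2

Q = A1 @ np.linalg.inv(A2)                  # unbalanced quotient
S12 = 0.5 * (Q + A2 @ np.linalg.inv(A1))    # balanced arithmetic mean

# Q is similar to a symmetric positive definite matrix, so its
# eigensystem is real; .real only discards round-off, if any.
q, V = np.linalg.eig(Q)
q, V = q.real, V.real

# Expected eigenvalues of S12 on the same eigenvectors.
lam12 = 0.5 * (q + 1.0 / q)
residual = np.abs(S12 @ V - V * lam12).max()
print("max residual:", residual)
print("eigenvalues of S12:", lam12)
```

Because $q > 0$, each value $(q + 1/q)/2$ is at least 1, which anticipates the eigenvalue bound proved later for general $N$.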

We prove that $S$ is nondefective (it has $n$ independent eigenvectors), and that its eigensystem is real (Theorem 1). We prove that the eigenvalues of $S$ satisfy $\lambda_k \geq 1$ (Theorem 2). As in our GSVD comparison of two matrices [7], we interpret the $k$th diagonal of $\Sigma_i = \mathrm{diag}(\sigma_{i,k})$ in the factorization of the $i$th matrix $D_i$ as indicating the significance of the $k$th right basis vector $v_k$ in $D_i$ in terms of the overall information that $v_k$ captures in $D_i$. The ratio $\sigma_{i,k}/\sigma_{j,k}$ indicates the significance of $v_k$ in $D_i$ relative to its significance in $D_j$. We prove that an eigenvalue of $S$ satisfies $\lambda_k = 1$ if and only if the corresponding eigenvector $v_k$ is a right basis vector of equal significance in all $D_i$ and $D_j$, that is, $\sigma_{i,k}/\sigma_{j,k} = 1$ for all $i$ and $j$, and the corresponding left basis vector $u_{i,k}$ is orthonormal to all other vectors in $U_i$ for all $i$. We therefore mathematically define, in analogy with the GSVD, the “common HO GSVD subspace” of the $N \geq 2$ matrices to be the subspace spanned by the right basis vectors $v_k$ that correspond to the $\lambda_k = 1$ eigenvalues of $S$ (Theorem 3). We also show that each of the right basis vectors $v_k$ that span the common HO GSVD subspace is a generalized singular vector of all pairwise GSVD factorizations of the matrices $D_i$ and $D_j$ with equal corresponding generalized singular values for all $i$ and $j$ (Corollary 1).

Recent research showed that several higher-order generalizations are possible for a given matrix decomposition, each preserving some but not all of the properties of the matrix decomposition [12]–[14] (see also Theorem S6 and Conjecture S1 in Appendix S1). Our new HO GSVD extends to higher orders all of the mathematical properties of the GSVD except for complete column-wise orthogonality of the left basis vectors that form the matrix $U_i$ for all $i$, i.e., in each factorization.

We illustrate the HO GSVD with a comparison of cell-cycle mRNA expression from S. pombe [15], [16], S. cerevisiae [17] and human [18]. Unlike existing algorithms, a mapping among the genes of these disparate organisms is not required (Section 2 in Appendix S1). We find that the common HO GSVD subspace represents the cell-cycle mRNA expression oscillations, which are similar among the datasets. Simultaneous reconstruction in this common subspace, therefore, removes the experimental artifacts, which are dissimilar, from the datasets. Simultaneous sequence-independent classification of the genes of the three organisms in the common subspace is in agreement with previous classifications into cell-cycle phases [19]. Notably, genes of highly conserved sequences across the three organisms [20], [21] but significantly different cell-cycle peak times, such as genes from the ABC transporter superfamily [22]–[28], phospholipase B-encoding genes [29], [30] and even the B cyclin-encoding genes [31], [32], are correctly classified.

Methods

HO GSVD Construction

Suppose we have a set of $N$ real matrices $D_i \in \mathbb{R}^{m_i \times n}$ each with full column rank. We define a HO GSVD of these $N$ matrices as

$$D_i = U_i \Sigma_i V^T, \qquad i = 1, \ldots, N, \tag{1}$$

where each $U_i \in \mathbb{R}^{m_i \times n}$ is composed of normalized left basis vectors, each $\Sigma_i = \mathrm{diag}(\sigma_{i,k}) \in \mathbb{R}^{n \times n}$ is diagonal with $\sigma_{i,k} > 0$, and $V \in \mathbb{R}^{n \times n}$, identical in all matrix factorizations, is composed of normalized right basis vectors. As in the GSVD comparison of global mRNA expression from two organisms [7], in the HO GSVD comparison of global mRNA expression from $N \geq 2$ organisms, the shared right basis vectors $v_k$ of Equation (1) are the “genelets” and the $N$ sets of left basis vectors $u_{i,k}$ are the $N$ sets of “arraylets” (Figure 1 and Section 2 in Appendix S1). We obtain $V$ from the eigensystem of $S$, the arithmetic mean of all pairwise quotients $A_i A_j^{-1}$ of the matrices $A_i = D_i^T D_i$, or equivalently of all $S_{ij} = \frac{1}{2}(A_i A_j^{-1} + A_j A_i^{-1})$, $i \neq j$:

$$S = \frac{1}{N(N-1)} \sum_{i=1}^{N} \sum_{j>i}^{N} \left(A_i A_j^{-1} + A_j A_i^{-1}\right), \qquad SV = V\Lambda, \qquad V = (v_1 \, \ldots \, v_n), \qquad \Lambda = \mathrm{diag}(\lambda_k), \tag{2}$$

with $\|v_k\| = 1$. We prove that $S$ is nondefective, i.e., $S$ has $n$ independent eigenvectors, and that its eigenvectors $V$ and eigenvalues $\Lambda$ are real (Theorem 1). We prove that the eigenvalues of $S$ satisfy $\lambda_k \geq 1$ (Theorem 2).

Figure 1
Higher-order generalized singular value decomposition (HO GSVD).

Given $V$, we compute matrices $B_i$ by solving $N$ linear systems:

$$V B_i^T = D_i^T, \qquad B_i = (b_{i,1} \, \ldots \, b_{i,n}), \qquad i = 1, \ldots, N, \tag{3}$$

and we construct $\Sigma_i$ and $U_i$ by normalizing the columns of $B_i$:

$$\sigma_{i,k} = \|b_{i,k}\|, \qquad \Sigma_i = \mathrm{diag}(\sigma_{i,k}), \qquad B_i = U_i \Sigma_i. \tag{4}$$
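
The construction above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the function name `ho_gsvd`, the random test matrices, and the use of explicit inverses and a general eigensolver are assumptions made for brevity. For ill-conditioned data, a numerically stable formulation would be preferable.

```python
import numpy as np

def ho_gsvd(Ds):
    """Sketch of the HO GSVD for a list of full-column-rank matrices
    that share the same number of columns (hypothetical helper)."""
    N, n = len(Ds), Ds[0].shape[1]
    A = [D.T @ D for D in Ds]
    A_inv = [np.linalg.inv(Ai) for Ai in A]
    # Arithmetic mean of all pairwise quotients A_i A_j^{-1}, i != j.
    S = np.zeros((n, n))
    for i in range(N):
        for j in range(N):
            if i != j:
                S += A[i] @ A_inv[j]
    S /= N * (N - 1)
    # The eigensystem of S is real in exact arithmetic; .real only
    # discards round-off. Columns of V are rescaled to unit length.
    lam, V = np.linalg.eig(S)
    lam, V = lam.real, V.real
    V /= np.linalg.norm(V, axis=0)
    Us, sigmas = [], []
    for D in Ds:
        B = np.linalg.solve(V, D.T).T       # solves V B^T = D^T
        sigma = np.linalg.norm(B, axis=0)   # column norms of B
        Us.append(B / sigma)                # normalized left basis vectors
        sigmas.append(sigma)
    return Us, sigmas, V, lam

# Usage on three hypothetical datasets with different row counts.
rng = np.random.default_rng(0)
Ds = [rng.standard_normal((m, 4)) for m in (8, 6, 10)]
Us, sigmas, V, lam = ho_gsvd(Ds)
errors = [np.abs(D - (U * s) @ V.T).max() for D, U, s in zip(Ds, Us, sigmas)]
print("max reconstruction errors:", errors)
print("eigenvalues of S:", np.sort(lam))
```

Each matrix is reproduced exactly (up to round-off) from its own left basis vectors and significances together with the shared right basis vectors, which is the sense in which the decomposition is exact.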

HO GSVD Interpretation

In this construction, the rows of each of the $N$ matrices $D_i$ are superpositions of the same right basis vectors, the columns of $V$ (Figures S1 and S2 and Section 1 in Appendix S1). As in our GSVD comparison of two matrices, we interpret the $k$th diagonals of $\Sigma_i$, the “higher-order generalized singular value set” $\{\sigma_{i,k}\}$, as indicating the significance of the $k$th right basis vector $v_k$ in the matrices $D_i$, and reflecting the overall information that $v_k$ captures in each $D_i$ respectively. The ratio $\sigma_{i,k}/\sigma_{j,k}$ indicates the significance of $v_k$ in $D_i$ relative to its significance in $D_j$. A ratio of $\sigma_{i,k}/\sigma_{j,k} = 1$ for all $i$ and $j$ corresponds to a right basis vector $v_k$ of equal significance in all $N$ matrices $D_i$. GSVD comparisons of two matrices showed that right basis vectors of approximately equal significance in the two matrices reflect themes that are common to both matrices under comparison [7]. A ratio of $\sigma_{i,k}/\sigma_{j,k} \ll 1$ indicates a basis vector $v_k$ of almost negligible significance in $D_i$ relative to its significance in $D_j$. GSVD comparisons of two matrices showed that right basis vectors of negligible significance in one matrix reflect themes that are exclusive to the other matrix.

We prove that an eigenvalue of $S$ satisfies $\lambda_k = 1$ if and only if the corresponding eigenvector $v_k$ is a right basis vector of equal significance in all $D_i$ and $D_j$, that is, $\sigma_{i,k}/\sigma_{j,k} = 1$ for all $i$ and $j$, and the corresponding left basis vector $u_{i,k}$ is orthonormal to all other vectors in $U_i$ for all $i$. We therefore mathematically define, in analogy with the GSVD, the “common HO GSVD subspace” of the $N \geq 2$ matrices to be the subspace spanned by the right basis vectors $v_k$ corresponding to the eigenvalues of $S$ that satisfy $\lambda_k = 1$ (Theorem 3).
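
A small synthetic example may make this definition concrete. The sketch below assumes NumPy and builds three hypothetical matrices from a hand-picked non-orthogonal set of shared right basis vectors, left basis vectors with orthonormal columns, and significances chosen so that only the first right basis vector is equally significant in all three matrices. None of these values come from the paper. Under these assumptions, the smallest eigenvalue of the mean of pairwise quotients is 1, and its eigenvector is the first right basis vector.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hand-picked, non-orthogonal right basis vectors with unit-length columns.
V_true = np.array([[1.0, 0.5, 0.0],
                   [0.0, 1.0, 0.5],
                   [0.5, 0.0, 1.0]])
V_true /= np.linalg.norm(V_true, axis=0)

# Significances: the first basis vector has ratio 1 across all matrices.
sigmas = [np.array([1.0, 3.0, 0.5]),
          np.array([1.0, 0.7, 2.0]),
          np.array([1.0, 1.5, 4.0])]

Ds = []
for m, s in zip((6, 5, 7), sigmas):
    # Left basis vectors with orthonormal columns (reduced QR factor).
    U, _ = np.linalg.qr(rng.standard_normal((m, 3)))
    Ds.append((U * s) @ V_true.T)

A = [D.T @ D for D in Ds]
N = len(A)
S = sum(A[i] @ np.linalg.inv(A[j])
        for i in range(N) for j in range(N) if i != j) / (N * (N - 1))

lam, W = np.linalg.eig(S)
lam, W = lam.real, W.real
k = int(np.argmin(lam))
# Cosine of the angle between the recovered and the planted vector.
alignment = abs(W[:, k] @ V_true[:, 0])
print("eigenvalues of S:", np.sort(lam))
print("alignment with planted common vector:", alignment)
```

The remaining two eigenvalues exceed 1 because the corresponding significances differ among the matrices.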

It follows that each of the right basis vectors $v_k$ that span the common HO GSVD subspace is a generalized singular vector of all pairwise GSVD factorizations of the matrices $D_i$ and $D_j$ with equal corresponding generalized singular values for all $i$ and $j$ (Corollary 1). Since the GSVD can be computed in a stable way [6], we note that the common HO GSVD subspace can also be computed in a stable way by computing all pairwise GSVD factorizations of the matrices $D_i$ and $D_j$. This also suggests that it may be possible to formulate the HO GSVD as a solution to an optimization problem, in analogy with existing variational formulations of the GSVD [33]. Such a formulation may lead to a stable numerical algorithm for computing the HO GSVD, and possibly also to a higher-order general Gauss-Markov linear statistical model [34]–[36].

We show, in a comparison of $N = 3$ matrices, that the approximately common HO GSVD subspace of these three matrices reflects a theme that is common to the three matrices under comparison (Section 2).

HO GSVD Mathematical Properties

Theorem 1

$S$ is nondefective (it has $n$ independent eigenvectors) and its eigensystem is real.

Proof. From Equation (2) it follows that

$$S = \frac{1}{N(N-1)} \left(H - N I\right), \qquad H = \left(\sum_{i=1}^{N} A_i\right)\left(\sum_{j=1}^{N} A_j^{-1}\right), \tag{5}$$

and the eigenvectors of $S$ equal the eigenvectors of $H$.

Let the SVD of the matrices $D_i$ appended along the $n$-columns axis be

$$\hat{D} = \begin{pmatrix} D_1 \\ \vdots \\ D_N \end{pmatrix} = \begin{pmatrix} \hat{U}_1 \\ \vdots \\ \hat{U}_N \end{pmatrix} \hat{\Sigma} \hat{V}^T = \hat{U} \hat{\Sigma} \hat{V}^T. \tag{6}$$

Since the matrices $D_i$ are real and with full column rank, it follows from the SVD of $\hat{D}$ that the symmetric matrices $\hat{U}_i^T \hat{U}_i$ are real and positive definite, and their inverses exist. It then follows from Equations (5) and (6) that $H$ is similar to $\tilde{H}$,

$$H = \hat{V} \hat{\Sigma} \, \tilde{H} \, \hat{\Sigma}^{-1} \hat{V}^T, \qquad \tilde{H} = \sum_{j=1}^{N} \left(\hat{U}_j^T \hat{U}_j\right)^{-1}, \tag{7}$$

and the eigenvalues of $H$ equal the eigenvalues of $\tilde{H}$.

A sum of real, symmetric and positive definite matrices, $\tilde{H}$ is also real, symmetric and positive definite; therefore, its eigensystem

$$\tilde{H} Y = Y M, \qquad Y = (y_1 \, \ldots \, y_n), \qquad M = \mathrm{diag}(\mu_k), \tag{8}$$

is real with $Y$ orthogonal and $\mu_k > 0$. Without loss of generality let $Y$ be orthonormal, such that $Y^T Y = I$. It follows from the similarity of $H$ with $\tilde{H}$ that the eigensystem of $H$ can be written as $HV = VM$, with the real and nonsingular $V = \hat{V} \hat{\Sigma} Y C$, where $C = \mathrm{diag}(c_k)$ and $c_k = \|\hat{V} \hat{\Sigma} y_k\|^{-1}$ such that $\|v_k\| = 1$ for all $k$.

Thus, from Equation (5), $S$ is nondefective with real eigenvectors $V$. Also, the eigenvalues of $S$ satisfy

$$\lambda_k = \frac{\mu_k - N}{N(N-1)}$$
(9)

where $\mu_k$ are the eigenvalues of $H$ and $\hat{H}$. Thus, the eigenvalues of $S$ are real. □
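
The relation between $S$ and $H$ used in this proof can be checked numerically. The following is a minimal sketch, assuming NumPy and small random full-column-rank matrices as stand-ins for the $D_i$; the sizes and variable names are illustrative and are not taken from the paper's notebooks.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 4, 3                      # columns and number of matrices (illustrative sizes)
row_counts = [6, 7, 9]           # different row dimensions, each at least n
D = [rng.standard_normal((m, n)) for m in row_counts]

A = [Di.T @ Di for Di in D]      # A_i = D_i^T D_i, symmetric positive definite
A_inv = [np.linalg.inv(Ai) for Ai in A]

# S: arithmetic mean of all pairwise quotients A_i A_j^{-1}, i != j
S = sum(A[i] @ A_inv[j] for i in range(N) for j in range(N) if i != j) / (N * (N - 1))

# H = (sum_i A_i)(sum_j A_j^{-1}), so that S = (H - N I) / (N (N - 1))
H = sum(A) @ sum(A_inv)
S_from_H = (H - N * np.eye(n)) / (N * (N - 1))

lam = np.linalg.eigvals(S)       # eigenvalues of S
mu = np.linalg.eigvals(H)        # eigenvalues of H

print("max |S - S_from_H| =", np.abs(S - S_from_H).max())
print("largest imaginary part of an eigenvalue of S =", np.abs(np.imag(lam)).max())
```

Up to rounding error, the two constructions of $S$ agree and the computed eigenvalues have no imaginary part, as the theorem states for exact arithmetic.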

Theorem 2

The eigenvalues of $S$ satisfy $\lambda_k \geq 1$.

Proof. Following Equation (9), asserting that the eigenvalues of $S$ satisfy $\lambda_k \geq 1$ is equivalent to asserting that the eigenvalues of $H$ satisfy $\mu_k \geq N^2$.

From Equations (6) and (7), the eigenvalues of $H$ satisfy

$$\mu_k \geq \min_x \sum_{j=1}^{N} x^T (\hat{U}_j^T \hat{U}_j)^{-1} x \geq \min_x \sum_{j=1}^{N} (x^T \hat{U}_j^T \hat{U}_j x)^{-1}$$
(10)

under the constraint that

$$\sum_{j=1}^{N} x^T \hat{U}_j^T \hat{U}_j x = x^T x = 1$$
(11)

where $x$ is a real unit vector, and where it follows from the Cauchy-Schwarz inequality [37] (see also [4], [34], [38]) for the real nonzero vectors $(\hat{U}_j^T \hat{U}_j)^{1/2} x$ and $(\hat{U}_j^T \hat{U}_j)^{-1/2} x$ that for all $j$

$$x^T (\hat{U}_j^T \hat{U}_j)^{-1} x \geq (x^T \hat{U}_j^T \hat{U}_j x)^{-1}$$
(12)

With the constraint of Equation (11), which requires the sum of the $N$ positive numbers $x^T \hat{U}_j^T \hat{U}_j x$ to equal one, the lower bound on the eigenvalues of $H$ in Equation (10) is at its minimum when the sum of the inverses of these numbers is at its minimum, that is, when the numbers equal

$$x^T \hat{U}_i^T \hat{U}_i x = x^T \hat{U}_j^T \hat{U}_j x = N^{-1}$$
(13)

for all $i$ and $j$. Thus, the eigenvalues of $H$ satisfy $\mu_k \geq N^2$. □
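
The bound can be illustrated numerically. The sketch below, assuming NumPy and random stand-in matrices, forms the blocks $\hat{U}_j$ from the thin SVD of the row-stacked matrices, builds $\hat{H}$ as the sum of the inverses of $\hat{U}_j^T \hat{U}_j$, and compares its eigenvalues with $N^2$; the names `P`, `H_hat`, `lhs` and `rhs` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 5, 4
D = [rng.standard_normal((m, n)) for m in (8, 6, 10, 7)]   # random stand-ins, full column rank

# Thin SVD of the row-stacked matrix; U_hat splits into one block per matrix
U_hat, s_hat, Vt_hat = np.linalg.svd(np.vstack(D), full_matrices=False)
splits = np.cumsum([Di.shape[0] for Di in D])[:-1]
P = [Uj.T @ Uj for Uj in np.split(U_hat, splits, axis=0)]  # U_hat_j^T U_hat_j

# H_hat = sum_j (U_hat_j^T U_hat_j)^{-1}, symmetric positive definite
H_hat = sum(np.linalg.inv(Pj) for Pj in P)
mu = np.linalg.eigvalsh(H_hat)
lam = (mu - N) / (N * (N - 1))                             # Equation (9)

# Term-wise Cauchy-Schwarz bound of Equation (12) for a random unit vector x
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
lhs = [x @ np.linalg.inv(Pj) @ x for Pj in P]
rhs = [1.0 / (x @ Pj @ x) for Pj in P]

print("smallest mu:", mu.min(), "(bound N^2 =", N**2, ")")
print("smallest lambda:", lam.min())
```

The blocks sum to the identity, which is the constraint of Equation (11), and every eigenvalue of $S$ recovered through Equation (9) is at least one.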

Theorem 3

The common HO GSVD subspace. An eigenvalue of $S$ satisfies $\lambda_k = 1$ if and only if the corresponding eigenvector $v_k$ is a right basis vector of equal significance in all $D_i$ and $D_j$, that is, $\sigma_{i,k}/\sigma_{j,k} = 1$ for all $i$ and $j$, and the corresponding left basis vector $u_{i,k}$ is orthonormal to all other vectors in $U_i$ for all $i$. The “common HO GSVD subspace” of the $N \geq 2$ matrices is, therefore, the subspace spanned by the right basis vectors $v_k$ corresponding to the eigenvalues of $S$ that satisfy $\lambda_k = 1$.

Proof. Without loss of generality, let $k = n$. From Equation (12) and the Cauchy-Schwarz inequality, an eigenvalue of $\hat{H}$ equals its minimum lower bound $\mu_n = N^2$ if and only if the corresponding eigenvector $y_n$ is also an eigenvector of $\hat{U}_i^T \hat{U}_i$ for all $i$ [37], where, from Equation (13), the corresponding eigenvalue equals $N^{-1}$,

$$\hat{U}_i^T \hat{U}_i y_n = N^{-1} y_n$$
(14)

Given the eigenvectors $V$ of $S$, we solve Equation (3) for each $D_i$ of Equation (6), and obtain

$$B_i^T = V^{-1} D_i^T = Y^T \hat{U}_i^T$$
(15)

Following Equations (14) and (15), where $v_n$ corresponds to a minimum eigenvalue $\lambda_n = 1$, and since $Y$ is orthonormal, we obtain

$$B_i^T B_i = Y^T \hat{U}_i^T \hat{U}_i Y = \begin{pmatrix} \ast & 0 \\ 0^T & N^{-1} \end{pmatrix}$$
(16)

with zeroes in the $n$th row and the $n$th column of the matrix above everywhere except for the diagonal element. Thus, an eigenvalue of $S$ satisfies $\lambda_n = 1$ if and only if the corresponding left basis vectors $u_{i,n}$ are orthonormal to all other vectors in $U_i$.

The corresponding higher-order generalized singular values are $\sigma_{i,n} = N^{-1/2} > 0$. Thus $\sigma_{i,n}/\sigma_{j,n} = 1$ for all $i$ and $j$, and the corresponding right basis vector $v_n$ is of equal significance in all matrices $D_i$ and $D_j$. □
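
A toy construction may help make the theorem concrete. The sketch below, assuming NumPy, builds three matrices $D_i = U_i \Sigma_i V^T$ that share one right basis vector of equal significance, and then checks that $S$ has an eigenvalue of one with that vector as its eigenvector. Choosing every $U_i$ with orthonormal columns is a simplifying assumption of this example only, and the value 1.3 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 4, 3
V = rng.standard_normal((n, n)) + 3 * np.eye(n)   # shared nonsingular right basis, columns v_k
sigmas = rng.uniform(0.5, 2.0, size=(N, n))       # sigma_{i,k}
sigmas[:, 0] = 1.3                                # first right basis vector equally significant in all matrices

D = []
for i, m in enumerate((6, 8, 5)):
    U_i, _ = np.linalg.qr(rng.standard_normal((m, n)))   # orthonormal columns (toy assumption)
    D.append(U_i @ np.diag(sigmas[i]) @ V.T)

A = [Di.T @ Di for Di in D]
S = sum(A[i] @ np.linalg.inv(A[j]) for i in range(N) for j in range(N) if i != j) / (N * (N - 1))

lam, W = np.linalg.eig(S)
lam, W = np.real(lam), np.real(W)
k = int(np.argmin(lam))                           # index of the smallest eigenvalue
w = W[:, k]                                       # its unit-norm eigenvector
v0 = V[:, 0] / np.linalg.norm(V[:, 0])            # normalized shared right basis vector

print("smallest eigenvalue of S:", lam[k])
print("|cosine| between its eigenvector and v_0:", abs(w @ v0))
```

In this construction the remaining eigenvalues are strictly greater than one, because the other columns of `sigmas` differ among the three matrices.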

Corollary 1

An eigenvalue of $S$ satisfies $\lambda_k = 1$ if and only if the corresponding right basis vector $v_k$ is a generalized singular vector of all pairwise GSVD factorizations of the matrices $D_i$ and $D_j$ with equal corresponding generalized singular values for all $i$ and $j$.

Proof. From Equations (12) and (13), and since the pairwise quotients $A_i A_j^{-1}$ are similar to $(\hat{U}_i^T \hat{U}_i)(\hat{U}_j^T \hat{U}_j)^{-1}$ with the similarity transformation of $\hat{V} \hat{\Sigma}$ for all $i$ and $j$, it follows that an eigenvalue of $S$ satisfies $\lambda_k = 1$ if and only if the corresponding right basis vector $v_k$ is also an eigenvector of each of the pairwise quotients $A_i A_j^{-1}$ of the matrices $A_i = D_i^T D_i$ with equal corresponding eigenvalues, or equivalently of all $S_{ij}$ with all eigenvalues at their minimum of one,

$$S_{ij} v_k \equiv \frac{1}{2} (A_i A_j^{-1} + A_j A_i^{-1}) v_k = v_k$$
(17)

We prove (Theorems S1–S5 in Appendix S1) that in the case of $N = 2$ matrices our definition of $V$ by using the eigensystem of $S_{ij}$ leads algebraically to the GSVD, where an eigenvalue of $S_{ij}$ equals its minimum of one if and only if the two corresponding generalized singular values are equal, such that the corresponding generalized singular vector $v_k$ is of equal significance in both matrices $D_i$ and $D_j$. Thus, it follows that each of the right basis vectors $v_k$ that span the common HO GSVD subspace is a generalized singular vector of all pairwise GSVD factorizations of the matrices $D_i$ and $D_j$ with equal corresponding generalized singular values for all $i$ and $j$. □

Note that since the GSVD can be computed in a stable way [6], the common HO GSVD subspace we define (Theorem 3) can also be computed in a stable way by computing all pairwise GSVD factorizations of the matrices $D_i$ and $D_j$ (Corollary 1). It may also be possible to formulate the HO GSVD as a solution to an optimization problem, in analogy with existing variational formulations of the GSVD [33]. Such a formulation may lead to a stable numerical algorithm for computing the HO GSVD, and possibly also to a higher-order general Gauss-Markov linear statistical model [34]–[36].

Results

HO GSVD Comparison of Global mRNA Expression from Three Organisms

Consider now the HO GSVD comparative analysis of global mRNA expression datasets from the $N = 3$ organisms S. pombe, S. cerevisiae and human (Section 2.1 in Appendix S1, Mathematica Notebooks S1 and S2, and Datasets S1, S2 and S3). The datasets are tabulated as matrices of $n = 17$ columns each, corresponding to DNA microarray-measured mRNA expression from each organism at 17 time points equally spaced during approximately two cell-cycle periods. The underlying assumption is that there exists a one-to-one mapping among the 17 columns of the three matrices but not necessarily among their rows, which correspond to either 3167-S. pombe genes, 4772-S. cerevisiae genes or 13,068-human genes. The HO GSVD of Equation (1) transforms the datasets from the organism-specific genes $\times$ 17-arrays spaces to the reduced spaces of the 17-“arraylets,” i.e., left basis vectors $\times$ 17-“genelets,” i.e., right basis vectors, where the datasets $D_i$ are represented by the diagonal nonnegative matrices $\Sigma_i$, by using the organism-specific genes $\times$ 17-arraylets transformation matrices $U_i$ and the one shared 17-genelets $\times$ 17-arrays transformation matrix $V^T$ (Figure 1).
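
The transformation described above can be sketched in a few lines. The following is a minimal illustration, assuming NumPy, not the authors' Mathematica implementation: the function name `ho_gsvd` is hypothetical, the input matrices are random stand-ins with 17 columns rather than the expression datasets, and the unit-norm scaling of the genelets is an assumed convention.

```python
import numpy as np

def ho_gsvd(D):
    """Sketch of an HO GSVD for a list of full-column-rank matrices with equal column counts."""
    N, n = len(D), D[0].shape[1]
    A = [Di.T @ Di for Di in D]
    A_inv = [np.linalg.inv(Ai) for Ai in A]
    # Mean of all pairwise quotients, written through H = (sum A_i)(sum A_j^{-1})
    S = (sum(A) @ sum(A_inv) - N * np.eye(n)) / (N * (N - 1))
    lam, V = np.linalg.eig(S)
    lam, V = np.real(lam), np.real(V)          # real in exact arithmetic (Theorem 1)
    V = V / np.linalg.norm(V, axis=0)          # unit-norm genelets (assumed convention)
    U, sigma = [], []
    for Di in D:
        B = np.linalg.solve(V, Di.T).T         # solves V B^T = D_i^T
        s = np.linalg.norm(B, axis=0)          # sigma_{i,k}: norm of the k-th column of B
        U.append(B / s)                        # arraylets with unit-norm columns
        sigma.append(s)
    return U, sigma, V, lam

rng = np.random.default_rng(3)
D = [rng.standard_normal((m, 17)) for m in (40, 60, 90)]   # random stand-ins, not the paper's data
U, sigma, V, lam = ho_gsvd(D)
for Di, Ui, si in zip(D, U, sigma):
    print(Di.shape, "reconstruction error:", np.abs((Ui * si) @ V.T - Di).max())
```

Each matrix is recovered exactly, up to rounding, from its own arraylets and singular values together with the one shared matrix of genelets. Forming and inverting $D_i^T D_i$ explicitly is done here only for brevity and is not a numerically stable procedure.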

Following Theorem 3, the approximately common HO GSVD subspace of the three datasets is spanned by the five genelets [inline equation e341] that correspond to [inline equation e342]. We find that these five genelets are approximately equally significant with [inline equation e343] in the S. pombe, S. cerevisiae and human datasets, respectively (Figure 2 a and b). The five corresponding arraylets in each dataset are [inline equation e344]-orthonormal to all other arraylets (Figure S3 in Appendix S1).

Figure 2
Genelets or right basis vectors.

Common HO GSVD Subspace Represents Similar Cell-Cycle Oscillations

The expression variations across time of the five genelets that span the approximately common HO GSVD subspace fit normalized cosine functions of two periods, superimposed on time-invariant expression (Figure 2 c and d). Consistently, the corresponding organism-specific arraylets are enriched [39] in overexpressed or underexpressed organism-specific cell cycle-regulated genes, with 24 of the 30 P-values [inline equation e347] (Table 1 and Section 2.2 in Appendix S1). For example, the three 17th arraylets, which correspond to the 0-phase 17th genelet, are enriched in overexpressed G2 S. pombe genes, G2/M and M/G1 S. cerevisiae genes and S and G2 human genes, respectively, representing the cell-cycle checkpoints in which the three cultures are initially synchronized.
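
A fit of the kind described above can be sketched as a linear least-squares problem. The example below, assuming NumPy, fits a constant plus a two-period cosine to a synthetic 17-point profile; the function name, the synthetic profile and the choice of exactly two periods over the 17 samples are assumptions of this illustration, not the paper's fitting procedure.

```python
import numpy as np

def fit_two_period_cosine(g):
    """Least-squares fit of g[t] ~ c + r*cos(w*t - phi), with two periods across len(g) samples."""
    T = len(g)
    t = np.arange(T)
    w = 2 * np.pi * 2 / T                        # assumed: exactly two periods over the sampled span
    X = np.column_stack([np.ones(T), np.cos(w * t), np.sin(w * t)])
    (c, a, b), *_ = np.linalg.lstsq(X, g, rcond=None)
    return c, np.hypot(a, b), np.arctan2(b, a)   # offset, amplitude, phase

t = np.arange(17)
g = 0.2 + 0.5 * np.cos(2 * np.pi * 2 * t / 17 - np.pi / 3)   # synthetic profile, not data from the paper
offset, amplitude, phase = fit_two_period_cosine(g)
print(offset, amplitude, phase)
```

The constant term plays the role of the time-invariant expression, and the recovered phase is the quantity that distinguishes one oscillating pattern from another.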

Table 1
Arraylets or left basis vectors.

Simultaneous sequence-independent reconstruction and classification of the three datasets in the common subspace outline cell-cycle progression in time and across the genes in the three organisms (Sections 2.3 and 2.4 in Appendix S1). Projecting the expression of the 17 arrays of either organism from the corresponding five-dimensional arraylets subspace onto the two-dimensional subspace that approximates it (Figure S4 in Appendix S1), [inline equation e379] of the contributions of the arraylets add up, rather than cancel out (Figure 3 a–c). In these two-dimensional subspaces, the angular order of the arrays of either organism describes cell-cycle progression in time through approximately two cell-cycle periods, from the initial cell-cycle phase and back to that initial phase twice. Projecting the expression of the genes, [inline equation e380] of the contributions of the five genelets add up in the overall expression of 343 of the 380 S. pombe genes classified as cell cycle-regulated, 554 of the 641 S. cerevisiae cell-cycle genes, and 632 of the 787 human cell-cycle genes (Figure 3 d–f). Simultaneous classification of the genes of either organism into cell-cycle phases according to their angular order in these two-dimensional subspaces is consistent with the classification of the arrays, and is in good agreement with the previous classifications of the genes (Figure 3 g–i). With all 3167 S. pombe, 4772 S. cerevisiae and 13,068 human genes sorted, the expression variations of the five arraylets from each organism approximately fit one-period cosines, with the initial phase of each arraylet (Figures S5, S6, S7 in Appendix S1) similar to that of its corresponding genelet (Figure 2). The global mRNA expression of each organism, reconstructed in the common HO GSVD subspace, approximately fits a traveling wave, oscillating across time and across the genes.
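
Classification by angular order can be sketched as follows. This is a minimal illustration, assuming NumPy and synthetic gene profiles: the two orthonormal time patterns stand in for the two-dimensional subspace, whose actual construction is given in Appendix S1 and is not reproduced here.

```python
import numpy as np

T = 17
t = np.arange(T)
w = 2 * np.pi * 2 / T                                   # assumed: two periods over the 17 samples

# Two orthonormal time patterns spanning the plane (hypothetical stand-ins for the 2D subspace)
x_axis = np.cos(w * t)
x_axis /= np.linalg.norm(x_axis)
y_axis = np.sin(w * t)
y_axis /= np.linalg.norm(y_axis)

true_phase = np.array([2.5, 0.3, -1.0, 1.4])            # synthetic peak phases of four genes
genes = np.cos(w * t[None, :] - true_phase[:, None])    # rows: genes, columns: time points

coords = np.column_stack([genes @ x_axis, genes @ y_axis])
angle = np.arctan2(coords[:, 1], coords[:, 0])          # angular position of each gene in the plane
order = np.argsort(angle)                               # genes sorted by angular position
print(angle, order)
```

For these noise-free profiles the angle in the plane equals the phase used to generate each gene, so sorting by angle orders the genes by peak time.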

Figure 3
Common HO GSVD subspace represents similar cell-cycle oscillations.

Note also that simultaneous reconstruction in the common HO GSVD subspace removes the experimental artifacts and batch effects, which are dissimilar, from the three datasets. Consider, for example, the second genelet. With [inline equation e383] in the S. pombe, S. cerevisiae and human datasets, respectively, this genelet is almost exclusive to the S. cerevisiae dataset. This genelet is anticorrelated with a time decaying pattern of expression (Figure 2a). Consistently, the corresponding S. cerevisiae-specific arraylet is enriched in underexpressed S. cerevisiae genes that were classified as up-regulated by the S. cerevisiae synchronizing agent, the $\alpha$-factor pheromone, with the P-value [inline equation e385]. Reconstruction in the common subspace effectively removes this S. cerevisiae-approximately exclusive pattern of expression variation from the three datasets.
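
Reconstruction in a subspace amounts to keeping only the selected terms of the factorization $D_i = U_i \Sigma_i V^T$. The sketch below, assuming NumPy, uses a synthetic factorization; the helper `reconstruct` and the index sets `common` and `exclusive` are hypothetical and do not correspond to the genelets selected in the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
V = rng.standard_normal((n, n))            # synthetic right basis vectors (columns)
U = rng.standard_normal((30, n))           # synthetic left basis vectors (columns)
sigma = rng.uniform(0.5, 2.0, n)           # synthetic singular values
D = (U * sigma) @ V.T                      # D = U diag(sigma) V^T

def reconstruct(U, sigma, V, keep):
    """Sum of the rank-one terms sigma_k u_k v_k^T over the kept indices."""
    keep = np.asarray(keep)
    return (U[:, keep] * sigma[keep]) @ V[:, keep].T

common = [2, 3, 4]                         # hypothetical indices of the common components
exclusive = [0, 1]                         # hypothetical indices of the components to remove
D_common = reconstruct(U, sigma, V, common)
print(np.abs(D - D_common - reconstruct(U, sigma, V, exclusive)).max())
```

Because the factorization is exact, the part removed from the dataset is exactly the sum of the excluded terms.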

Simultaneous HO GSVD Classification of Homologous Genes of Different Cell-Cycle Peak Times

Notably, in the simultaneous sequence-independent classification of the genes of the three organisms in the common subspace, genes of significantly different cell-cycle peak times [19] but highly conserved sequences [20], [21] are correctly classified (Section 2.5 in Appendix S1).

For example, consider the G2 S. pombe gene BFR1 (Figure 4a), which belongs to the evolutionarily highly conserved ATP-binding cassette (ABC) transporter superfamily [22]. The closest homologs of BFR1 in our S. pombe, S. cerevisiae and human datasets are the S. cerevisiae genes SNQ2, PDR5, PDR15 and PDR10 (Table S1a in Appendix S1). The expression of SNQ2 and PDR5 is known to peak at the S/G2 and G2/M cell-cycle phases, respectively [17]. However, sequence similarity does not imply similar cell-cycle peak times, and PDR15 and PDR10, the closest homologs of PDR5, are induced during stationary phase [23], which has been hypothesized to occur in G1, before the Cdc28-defined cell-cycle arrest [24]. Consistently, we find PDR15 and PDR10 at the M/G1 to G1 transition, antipodal to (i.e., half a cell-cycle period apart from) SNQ2 and PDR5, which are projected onto S/G2 and G2/M, respectively (Figure 4b). We also find the transcription factor PDR1 at S/G2, its known cell-cycle peak time, adjacent to SNQ2 and PDR5, which it positively regulates and might be regulated by, and antipodal to PDR15, which it negatively regulates [25]–[28].

Figure 4
Simultaneous HO GSVD classification of homologous genes of different cell-cycle peak times.

Another example is the S. cerevisiae phospholipase B-encoding gene PLB1 [29], which peaks at the cell-cycle phase M/G1 [30]. Its closest homolog in our S. cerevisiae dataset, PLB3, also peaks at M/G1 [17] (Figure 4d). However, among the closest S. pombe and human homologs of PLB1 (Table S1b in Appendix S1), we find the S. pombe genes SPAC977.09c and SPAC1786.02, whose expression peaks at the almost antipodal S. pombe cell-cycle phases S and G2, respectively [19] (Figure 4c).

As a third example, consider the S. pombe G1 B-type cyclin-encoding gene CIG2 [31], [32] (Table S1c in Appendix S1). Its closest S. pombe homolog, CDC13, peaks at M [19] (Figure 4e). The closest human homologs of CIG2, the cyclins CCNA2 and CCNB2, peak at G2 and G2/M, respectively (Figure 4g). However, while periodicity in mRNA abundance levels through the cell cycle is highly conserved among members of the cyclin family, the cell-cycle peak times are not necessarily conserved [1]: The closest homologs of CIG2 in our S. cerevisiae dataset are the G2/M promoter-encoding genes CLB1,2 and CLB3,4, whose expression peaks at G2/M and S, respectively, and CLB5, which encodes a DNA synthesis promoter and peaks at G1 (Figure 4f).

Discussion

We mathematically defined a higher-order GSVD (HO GSVD) for two or more large-scale matrices with different row dimensions and the same column dimension. We proved that our new HO GSVD extends to higher orders almost all of the mathematical properties of the GSVD: The eigenvalues of $S$ are always greater than or equal to one, and an eigenvalue of one corresponds to a right basis vector of equal significance in all matrices, and to a left basis vector in each matrix factorization that is orthogonal to all other left basis vectors in that factorization. We therefore mathematically defined, in analogy with the GSVD, the common HO GSVD subspace of the $N \geq 2$ matrices to be the subspace spanned by the right basis vectors that correspond to the eigenvalues of $S$ that equal one.

The only property that does not extend to higher orders in general is the complete column-wise orthogonality of the normalized left basis vectors in each factorization. Recent research showed that several higher-order generalizations are possible for a given matrix decomposition, each preserving some but not all of the properties of the matrix decomposition [12]–[14]. The HO GSVD has the interesting property of preserving the exactness and diagonality of the matrix GSVD and, in special cases, also partial or even complete column-wise orthogonality. That is, all $N$ matrix factorizations in Equation (1) are exact, all $N$ matrices $\Sigma_i$ are diagonal, and when one or more of the eigenvalues of $S$ equal one, the corresponding left basis vectors in each factorization are orthogonal to all other left basis vectors in that factorization.

The complete column-wise orthogonality of the matrix GSVD [5] enables its stable computation [6]. We showed that each of the right basis vectors that span the common HO GSVD subspace is a generalized singular vector of all pairwise GSVD factorizations of the matrices $D_i$ and $D_j$ with equal corresponding generalized singular values for all $i$ and $j$. Since the GSVD can be computed in a stable way, the common HO GSVD subspace can also be computed in a stable way by computing all pairwise GSVD factorizations of the matrices $D_i$ and $D_j$. That is, the common HO GSVD subspace exists also for $N$ matrices $D_i$ that are not all of full column rank. This also means that the common HO GSVD subspace can be formulated as a solution to an optimization problem, in analogy with existing variational formulations of the GSVD [33].

It would be ideal if our procedure reduced to the stable computation of the matrix GSVD when $N = 2$. To achieve this ideal, we would need to find a procedure that allows a computation of the HO GSVD, not just the common HO GSVD subspace, for $N$ matrices $D_i$ that are not all of full column rank. A formulation of the HO GSVD, not just the common HO GSVD subspace, as a solution to an optimization problem may lead to a stable numerical algorithm for computing the HO GSVD. Such a formulation may also lead to a higher-order general Gauss-Markov linear statistical model [34]–[36].

It was shown that the GSVD provides a mathematical framework for sequence-independent comparative modeling of DNA microarray data from two organisms, where the mathematical variables and operations represent experimental or biological reality [7], [8]. The variables, subspaces of significant patterns that are common to both or exclusive to either one of the datasets, correlate with cellular programs that are conserved in both or unique to either one of the organisms, respectively. The operation of reconstruction in the subspaces common to both datasets outlines the biological similarity in the regulation of the cellular programs that are conserved across the species. Reconstruction in the common and exclusive subspaces of either dataset outlines the differential regulation of the conserved relative to the unique programs in the corresponding organism. Recent experimental results [9] verify a computationally predicted genome-wide mode of regulation [10], [11], and demonstrate that GSVD modeling of DNA microarray data can be used to correctly predict previously unknown cellular mechanisms.

Here we showed, comparing global cell-cycle mRNA expression from the three disparate organisms S. pombe, S. cerevisiae and human, that the HO GSVD provides a sequence-independent comparative framework for two or more genomic datasets, where the variables and operations represent biological reality. The approximately common HO GSVD subspace represents the cell-cycle mRNA expression oscillations, which are similar among the datasets. Simultaneous reconstruction in the common subspace removes the experimental artifacts, which are dissimilar, from the datasets. In the simultaneous sequence-independent classification of the genes of the three organisms in this common subspace, genes of highly conserved sequences but significantly different cell-cycle peak times are correctly classified.

Additional possible applications of our HO GSVD in biotechnology include comparison of multiple genomic datasets, each corresponding to (i) the same experiment repeated multiple times using different experimental protocols, to separate the biological signal that is similar in all datasets from the dissimilar experimental artifacts; (ii) one of multiple types of genomic information, such as DNA copy number, DNA methylation and mRNA expression, collected from the same set of samples, e.g., tumor samples, to elucidate the molecular composition of the overall biological signal in these samples; (iii) one of multiple chromosomes of the same organism, to illustrate the relation, if any, between these chromosomes in terms of their, e.g., mRNA expression in a given set of samples; and (iv) one of multiple interacting organisms, e.g., in an ecosystem, to illuminate the exchange of biological information in these interactions.

Supporting Information

Appendix S1

A PDF format file, readable by Adobe Acrobat Reader.

(PDF)

Mathematica Notebook S1

Higher-order generalized singular value decomposition (HO GSVD) of global mRNA expression datasets from three different organisms. A Mathematica 5.2 code file, executable by Mathematica 5.2 and readable by Mathematica Player, freely available at http://www.wolfram.com/products/player/.

(NB)

Mathematica Notebook S2

HO GSVD of global mRNA expression datasets from three different organisms. A PDF format file, readable by Adobe Acrobat Reader.

(PDF)

Dataset S1

S. pombe global mRNA expression. A tab-delimited text format file, readable by both Mathematica and Microsoft Excel, reproducing the relative mRNA expression levels of $m_1$ = 3167 S. pombe gene clones at $n$ = 17 time points during about two cell-cycle periods from Rustici et al. [15] with the cell-cycle classifications of Rustici et al. or Oliva et al. [16].

(TXT)

Dataset S2

S. cerevisiae global mRNA expression. A tab-delimited text format file, readable by both Mathematica and Microsoft Excel, reproducing the relative mRNA expression levels of 4772 S. cerevisiae open reading frames (ORFs), or genes, at 17 time points during about two cell-cycle periods, including cell-cycle classifications, from Spellman et al. [17].

(TXT)

Dataset S3

Human global mRNA expression. A tab-delimited text format file, readable by both Mathematica and Microsoft Excel, reproducing the relative mRNA expression levels of 13,068 human genes at 17 time points during about two cell-cycle periods, including cell-cycle classifications, from Whitfield et al. [18].

(TXT)

Acknowledgments

We thank G. H. Golub for introducing us to matrix and tensor computations, and the American Institute of Mathematics in Palo Alto and Stanford University for hosting the 2004 Workshop on Tensor Decompositions and the 2006 Workshop on Algorithms for Modern Massive Data Sets, respectively, where some of this work was done. We also thank C. H. Lee for technical assistance, R. A. Horn for helpful discussions of matrix analysis and careful reading of the manuscript, and L. De Lathauwer and A. Goffeau for helpful comments.

Footnotes

Competing Interests: The authors have declared that no competing interests exist.

Funding: This research was supported by Office of Naval Research Grant N00014-02-1-0076 (to MAS), National Science Foundation Grant DMS-1016284 (to CFVL), as well as the Utah Science Technology and Research (USTAR) Initiative, National Human Genome Research Institute R01 Grant HG-004302 and National Science Foundation CAREER Award DMS-0847173 (to OA). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Jensen LJ, Jensen TS, de Lichtenberg U, Brunak S, Bork P. Co-evolution of transcriptional and post-translational cell-cycle regulation. Nature. 2006;443:594–597.
2. Lu Y, Huggins P, Bar-Joseph Z. Cross species analysis of microarray expression data. Bioinformatics. 2009;25:1476–1483.
3. Mushegian AR, Koonin EV. A minimal gene set for cellular life derived by comparison of complete bacterial genomes. Proc Natl Acad Sci USA. 1996;93:10268–10273.
4. Golub GH, Van Loan CF. Matrix Computations. Baltimore: Johns Hopkins University Press, third edition; 1996.
5. Van Loan CF. Generalizing the singular value decomposition. SIAM J Numer Anal. 1976;13:76–83.
6. Paige CC, Saunders MA. Towards a generalized singular value decomposition. SIAM J Numer Anal. 1981;18:398–405.
7. Alter O, Brown PO, Botstein D. Generalized singular value decomposition for comparative analysis of genome-scale expression data sets of two different organisms. Proc Natl Acad Sci USA. 2003;100:3351–3356.
8. Alter O. Discovery of principles of nature from mathematical modeling of DNA microarray data. Proc Natl Acad Sci USA. 2006;103:16063–16064.
9. Omberg L, Meyerson JR, Kobayashi K, Drury LS, Diffley JF, et al. Global effects of DNA replication and DNA replication origin activity on eukaryotic gene expression. Mol Syst Biol. 2009;5:312.
10. Alter O, Golub GH. Integrative analysis of genome-scale data by using pseudoinverse projection predicts novel correlation between DNA replication and RNA transcription. Proc Natl Acad Sci USA. 2004;101:16577–16582.
11. Omberg L, Golub GH, Alter O. A tensor higher-order singular value decomposition for integrative analysis of DNA microarray data from different studies. Proc Natl Acad Sci USA. 2007;104:18371–18376.
12. De Lathauwer L, De Moor B, Vandewalle J. A multilinear singular value decomposition. SIAM J Matrix Anal Appl. 2000;21:1253–1278.
13. Vandewalle J, De Lathauwer L, Comon P. The generalized higher order singular value decomposition and the oriented signal-to-signal ratios of pairs of signal tensors and their use in signal processing. Proc ECCTD'03 - European Conf on Circuit Theory and Design. 2003. pp. I-389–I-392.
14. Alter O, Golub GH. Reconstructing the pathways of a cellular system from genome-scale signals using matrix and tensor computations. Proc Natl Acad Sci USA. 2005;102:17559–17564.
15. Rustici G, Mata J, Kivinen K, Lió P, Penkett CJ, et al. Periodic gene expression program of the fission yeast cell cycle. Nat Genet. 2004;36:809–817.
16. Oliva A, Rosebrock A, Ferrezuelo F, Pyne S, Chen H, et al. The cell cycle-regulated genes of Schizosaccharomyces pombe. PLoS Biol. 2005;3:e225.
17. Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell. 1998;9:3273–3297.
18. Whitfield ML, Sherlock G, Saldanha A, Murray JI, Ball CA, et al. Identification of genes periodically expressed in the human cell cycle and their expression in tumors. Mol Biol Cell. 2002;13:1977–2000.
19. Gauthier NP, Larsen ME, Wernersson R, Brunak S, Jensen TS. Cyclebase.org: version 2.0, an updated comprehensive, multi-species repository of cell cycle experiments and derived analysis results. Nucleic Acids Res. 2010;38:D699–D702.
20. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–410.
21. Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): a curated nonredundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2007;35:D61–D65.
22. Decottignies A, Goffeau A. Complete inventory of the yeast ABC proteins. Nat Genet. 1997;15:137–145.
23. Mamnun YM, Schüller C, Kuchler K. Expression regulation of the yeast PDR5 ATP-binding cassette (ABC) transporter suggests a role in cellular detoxification during the exponential growth phase. FEBS Lett. 2004;559:111–117.
24. Werner-Washburne M, Braun E, Johnston GC, Singer RA. Stationary phase in the yeast Saccharomyces cerevisiae. Microbiol Rev. 1993;57:383–401.
25. Meyers S, Schauer W, Balzi E, Wagner M, Goffeau A, et al. Interaction of the yeast pleiotropic drug resistance genes PDR1 and PDR5. Curr Genet. 1992;21:431–436.
26. Mahé Y, Parle-McDermott A, Nourani A, Delahodde A, Lamprecht A, et al. The ATP-binding cassette multidrug transporter Snq2 of Saccharomyces cerevisiae: a novel target for the transcription factors Pdr1 and Pdr3. Mol Microbiol. 1996;20:109–117.
27. Wolfger H, Mahé Y, Parle-McDermott A, Delahodde A, Kuchler K. The yeast ATP binding cassette (ABC) protein genes PDR10 and PDR15 are novel targets for the Pdr1 and Pdr3 transcriptional regulators. FEBS Lett. 1997;418:269–274.
28. Hlaváček O, Kučerová H, Harant K, Palková Z, Váchová L. Putative role for ABC multidrug exporters in yeast quorum sensing. FEBS Lett. 2009;583:1107–1113.
29. Lee KS, Patton JL, Fido M, Hines LK, Kohlwein SD, et al. The Saccharomyces cerevisiae PLB1 gene encodes a protein required for lysophospholipase and phospholipase B activity. J Biol Chem. 1994;269:19725–19730.
30. Cho RJ, Campbell MJ, Winzeler EA, Steinmetz L, Conway A, et al. A genome-wide transcriptional analysis of the mitotic cell cycle. Mol Cell. 1998;2:65–73.
31. Martin-Castellanos C, Labib K, Moreno S. B-type cyclins regulate G1 progression in fission yeast in opposition to the p25rum1 cdk inhibitor. EMBO J. 1996;15:839–849.
32. Fisher DL, Nurse P. A single fission yeast mitotic cyclin B p34cdc2 kinase promotes both S-phase and mitosis in the absence of G1 cyclins. EMBO J. 1996;15:850–860.
33. Chu MT, Funderlic RE, Golub GH. On a variational formulation of the generalized singular value decomposition. SIAM J Matrix Anal Appl. 1997;18:1082–1092.
34. Rao CR. Linear Statistical Inference and Its Applications. New York, NY: John Wiley & Sons, second edition; 1973.
35. Rao CR. Optimization of functions of matrices with applications to statistical problems. In: Rao PSRS, Sedransk J, editors. W.G. Cochran's Impact on Statistics. New York, NY: John Wiley & Sons; 1984. pp. 191–202.
36. Paige CC. The general linear model and the generalized singular value decomposition. Linear Algebra Appl. 1985;70:269–284.
37. Marshall AW, Olkin I. Matrix versions of the Cauchy and Kantorovich inequalities. Aequationes Mathematicae. 1990;40:89–93.
38. Horn RA, Johnson CR. Matrix Analysis. Cambridge, UK: Cambridge University Press; 1985.
39. Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM. Systematic determination of genetic network architecture. Nat Genet. 1999;22:281–285.

Articles from PLoS ONE are provided here courtesy of Public Library of Science
