Proc Natl Acad Sci U S A. Mar 18, 2003; 100(6): 3351–3356.
Published online Mar 11, 2003. doi:  10.1073/pnas.0530258100
PMCID: PMC152296
Genetics

Generalized singular value decomposition for comparative analysis of genome-scale expression data sets of two different organisms

Abstract

We describe a comparative mathematical framework for two genome-scale expression data sets. This framework formulates expression as superposition of the effects of regulatory programs, biological processes, and experimental artifacts common to both data sets, as well as those that are exclusive to one data set or the other, by using generalized singular value decomposition. This framework enables comparative reconstruction and classification of the genes and arrays of both data sets. We illustrate this framework with a comparison of yeast and human cell-cycle expression data sets.

Keywords: DNA microarrays, cell cycle, yeast Saccharomyces cerevisiae, human HeLa cell line

Recent advances in high-throughput genomic technologies enable acquisition of different types of molecular biological data, e.g., DNA-sequence and mRNA-expression data, on a genomic scale. Comparative analysis of these data among two or more model organisms promises to enhance fundamental understanding of the universality as well as the specialization of molecular biological mechanisms. It also may prove useful in medical diagnosis, treatment, and drug design. Comparisons of the DNA sequence of entire genomes already give insights into evolutionary, biochemical, and genetic pathways.

Comparative analysis of mRNA-expression data requires mathematical tools that are able to distinguish the similar from the dissimilar among two or more large-scale data sets. These tools should provide mathematical frameworks for the description of the data, where the variables and operations may represent some biological reality. Recently we showed that singular value decomposition (SVD) provides such a framework for genome-wide expression data (refs. 1–3; see also refs. 4–7).

Now we show that generalized SVD (GSVD) (8) provides a comparative mathematical framework for two genome-scale expression data sets. GSVD is a linear transformation of the two data sets from the two genes × arrays spaces to two reduced and diagonalized “genelets” × “arraylets” spaces. The genelets are shared by both data sets. Each genelet is expressed only in the two corresponding arraylets, with a corresponding “angular distance” indicating the relative significance of this genelet, i.e., its significance in one data set relative to that in the other.

We show that a genelet of equal significance in both data sets may represent a process common to both data sets. The two corresponding arraylets may represent the cellular states in each data set that correspond to this common process. A genelet of no significance in one data set relative to the other may represent a process exclusive to the latter data set. The corresponding arraylet of this data set may represent the cellular state that corresponds to this exclusive process.

We also show that mathematical reconstruction of gene expression in a subset of genelets may simulate experimental observation of only the process that these genelets are inferred to represent. Similarly, reconstruction of array expression in the subset of corresponding arraylets may simulate observation of only the corresponding cellular state. Reconstruction of each data set in two or more subspaces may simulate observation of genome-scale differential expression in the processes, which these subspaces are inferred to span. We demonstrate comparative classification of both sets of genes and arrays based on similarity in their reconstructed rather than overall expression.

We illustrate this framework with a comparison of yeast (11) and human (12) cell-cycle expression data sets.

Mathematical Methods: GSVD

A single microarray probes the relative expression levels of $N_1$ genes in a single sample. A series of $M_1$ arrays probes the genome-scale expression levels in $M_1$ different samples, i.e., under $M_1$ different experimental conditions. Let the matrix $\hat{e}_1$, of size $N_1$-genes × $M_1$-arrays, tabulate the full expression data. The vector in the $n$th row of the matrix $\hat{e}_1$, $\langle g_{1,n}| \equiv \langle n|\hat{e}_1$, lists the expression of the $n$th gene across the different samples that correspond to the different arrays.§ The vector in the $m$th column of the matrix $\hat{e}_1$, $|a_{1,m}\rangle \equiv \hat{e}_1|m\rangle$, lists the genome-scale expression measured by the $m$th array. Let the matrix $\hat{e}_2$, of size $N_2$-genes × $M_2$-arrays, tabulate the relative expression levels of $N_2$ genes under $M_2 = M_1 \equiv M < \max\{N_1, N_2\}$ experimental conditions that correspond one to one to the $M_1$ conditions underlying $\hat{e}_1$. This one-to-one correspondence between the two sets of conditions is at the foundation of the GSVD comparative analysis of the two data sets and should be mapped out carefully.

GSVD then is simultaneous linear transformation of the two expression data sets ê1 and ê2 from the two N1-genes × M-arrays and N2-genes × M-arrays spaces to the two reduced M-genelets × M-arraylets spaces (see Fig. 5, which is published as supporting information on the PNAS web site, www.pnas.org, and also at http://genome-www.stanford.edu/GSVD/),

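$$\hat{e}_1 = \hat{u}_1\hat{\epsilon}_1\hat{x}^{-1}, \qquad \hat{e}_2 = \hat{u}_2\hat{\epsilon}_2\hat{x}^{-1}. \tag{1}$$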

In these spaces the data are represented by the diagonal nonnegative matrices $\hat{\epsilon}_1$ and $\hat{\epsilon}_2$, which satisfy $\langle k|\hat{\epsilon}_1|m\rangle \equiv \epsilon_{1,m}\delta_{km} \geq 0$ and $\langle k|\hat{\epsilon}_2|m\rangle \equiv \epsilon_{2,m}\delta_{km} \geq 0$ for all $1 \leq k, m \leq M$. The $m$th genelet is expressed only in the two $m$th arraylets, each of which corresponds to one of the two data sets. Therefore, each genelet is decoupled from all other genelets in both data sets simultaneously.

The antisymmetric angular distance between the data sets,

$$\theta_m = \arctan(\epsilon_{1,m}/\epsilon_{2,m}) - \pi/4, \tag{2}$$

indicates the relative significance of the $m$th genelet, i.e., its significance in the first data set relative to that in the second in terms of the ratio of the expression information captured by this genelet in the first data set to that in the second. An angular distance of 0 indicates a genelet of equal significance in both data sets, with $\epsilon_{1,m} = \epsilon_{2,m}$; $\pm\pi/4$ indicates no significance in the second data set relative to the first, with $\epsilon_{1,m} \gg \epsilon_{2,m}$, or in the first relative to the second, with $\epsilon_{1,m} \ll \epsilon_{2,m}$, respectively. The angular distances are arranged in decreasing order of significance in the first data set relative to the second such that $\pi/4 \geq \theta_1 \geq \cdots \geq \theta_M \geq -\pi/4$. The “generalized fractions of eigenexpression” of each data set separately indicate the significance of each genelet and its corresponding arraylet in this data set in terms of the fraction of the overall expression information that they capture in this data set alone (see Appendix, Eqs. 4 and 5, and Fig. 6, which are published as supporting information on the PNAS web site).
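As a small numerical illustration of the limiting cases described above, the following sketch assumes that the angular distance takes the form $\theta_m = \arctan(\epsilon_{1,m}/\epsilon_{2,m}) - \pi/4$, which reproduces the stated values of 0 and $\pm\pi/4$. The function name `angular_distances` and the toy values are hypothetical and are not part of the original analysis.

```python
import numpy as np

def angular_distances(eps1, eps2):
    """Angular distance of each genelet, assumed here to be
    arctan(eps1 / eps2) - pi/4, which lies in [-pi/4, pi/4]."""
    eps1 = np.asarray(eps1, dtype=float)
    eps2 = np.asarray(eps2, dtype=float)
    # arctan2 equals arctan(eps1 / eps2) for nonnegative inputs and
    # also handles eps2 == 0 without a division by zero.
    return np.arctan2(eps1, eps2) - np.pi / 4

# Toy values (hypothetical): exclusive to set 1, common, exclusive to set 2.
theta = angular_distances([1.0, 1.0, 0.0], [0.0, 1.0, 1.0])
```

Here the three toy genelets evaluate to $\pi/4$, 0, and $-\pi/4$, respectively.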

The transformation matrix $\hat{x}^{-1}$ defines the M-genelets × M-arrays basis set that is shared by both data sets. The transformation matrices $\hat{u}_1$ and $\hat{u}_2$ define the $N_1$-genes × M-arraylets and $N_2$-genes × M-arraylets basis sets that correspond to the first and second data sets, respectively. The vector in the $m$th row of $\hat{x}^{-1}$, $\langle\gamma_m| \equiv \langle m|\hat{x}^{-1}$, lists the expression of the $m$th genelet across the different arrays in both data sets simultaneously. The vectors in the $m$th columns of $\hat{u}_1$ and $\hat{u}_2$, $|\alpha_{1,m}\rangle \equiv \hat{u}_1|m\rangle$ and $|\alpha_{2,m}\rangle \equiv \hat{u}_2|m\rangle$, list the genome-scale expression in the $m$th arraylets of the first and second data sets, respectively. The genelets are normalized, such that $\langle\gamma_m|\gamma_m\rangle = 1$ for all $1 \leq m \leq M$, but not necessarily orthogonal superpositions of the genes of the first and, at the same time, the second data set. The arraylets of either data set are orthonormal superpositions of the arrays of this data set such that, in general, $\hat{x}^{-1}$ is nonorthogonal, whereas $\hat{u}_1$ and $\hat{u}_2$ are both orthogonal,

$$\hat{u}_1^T\hat{u}_1 = \hat{u}_2^T\hat{u}_2 = \hat{I}, \tag{3}$$

where Î is the identity matrix. Therefore, each arraylet of either data set is decoupled and decorrelated from all other arraylets of this data set. The genelets and arraylets are unique, and therefore also data-driven, up to a phase factor of ±1, because each genelet and arraylet capture both parallel and antiparallel gene- or array-expression patterns, respectively, except in degenerate subspaces, defined by subsets of equal angular distances.

GSVD Calculation.

From Eqs. 1 and 3, the M-arrays × M-arrays symmetric correlation matrices $\hat{a}_1 = \hat{e}_1^T\hat{e}_1 = (\hat{x}^{-1})^T\hat{\epsilon}_1^2\hat{x}^{-1}$ and $\hat{a}_2 = \hat{e}_2^T\hat{e}_2$ are represented in the M-genelets × M-genelets space by the simultaneously diagonal matrices $\hat{\epsilon}_1^2$ and $\hat{\epsilon}_2^2$, respectively. In theory, it is possible to calculate the GSVD of the two data sets $\hat{e}_1$ and $\hat{e}_2$ by (i) diagonalizing $\hat{a}_2^{-1}\hat{a}_1 = \hat{x}(\hat{\epsilon}_2^{-1}\hat{\epsilon}_1)^2\hat{x}^{-1}$ to obtain $\hat{x}$; (ii) projecting $\hat{x}$ onto $\hat{e}_1$ and $\hat{e}_2$ to obtain $\hat{\epsilon}_1^2 = (\hat{u}_1\hat{\epsilon}_1)^T(\hat{u}_1\hat{\epsilon}_1) = (\hat{e}_1\hat{x})^T(\hat{e}_1\hat{x})$ and $\hat{\epsilon}_2^2$; and (iii) projecting $\hat{x}$, $\hat{\epsilon}_1$, and $\hat{\epsilon}_2$ onto $\hat{e}_1$ and $\hat{e}_2$ to obtain $\hat{u}_1 = \hat{e}_1\hat{x}\hat{\epsilon}_1^{-1}$ and $\hat{u}_2$. In practice, we avoid computing the quotient of the correlation matrices, $\hat{a}_2^{-1}\hat{a}_1$, and use the numerically robust GSVD algorithm (8, 9) to obtain $\hat{x}$.
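The three "in theory" steps can be sketched in NumPy as follows. This is only an illustration of the quotient route, which the text explicitly avoids in practice because it is numerically fragile; it is not the robust algorithm of refs. 8 and 9. The function name `gsvd_via_quotient`, the random toy matrices, and the assumption that both data sets have full column rank are introduced here for the example.

```python
import numpy as np

def gsvd_via_quotient(e1, e2):
    """Illustrative GSVD of e1 (N1 x M) and e2 (N2 x M) through the
    quotient of their correlation matrices (assumes full column rank).

    Returns u1, u2, eps1, eps2, x_inv such that
    e_i is approximately u_i @ diag(eps_i) @ x_inv.
    """
    a1 = e1.T @ e1
    a2 = e2.T @ e2
    # (i) The eigenvectors of a2^{-1} a1 are the columns of x.
    evals, x = np.linalg.eig(np.linalg.solve(a2, a1))
    evals, x = evals.real, x.real
    # Order by decreasing significance in set 1 relative to set 2.
    x = x[:, np.argsort(evals)[::-1]]
    # (ii) eps_i^2 is the diagonal of (e_i x)^T (e_i x).
    p1, p2 = e1 @ x, e2 @ x
    eps1 = np.linalg.norm(p1, axis=0)
    eps2 = np.linalg.norm(p2, axis=0)
    # (iii) u_i = e_i x eps_i^{-1}.
    u1, u2 = p1 / eps1, p2 / eps2
    # Normalize the genelets (rows of x^{-1}) to unit length and absorb
    # the scale into eps so that the factorization is unchanged.
    x_inv = np.linalg.inv(x)
    norms = np.linalg.norm(x_inv, axis=1)
    x_inv = x_inv / norms[:, None]
    return u1, u2, eps1 * norms, eps2 * norms, x_inv

# Toy data (hypothetical): 30 and 40 "genes" measured on 4 matched "arrays".
rng = np.random.default_rng(0)
e1 = rng.normal(size=(30, 4))
e2 = rng.normal(size=(40, 4))
u1, u2, eps1, eps2, x_inv = gsvd_via_quotient(e1, e2)
```

On well-conditioned toy data the factors reproduce both inputs and the arraylets come out orthonormal. For real expression data a dedicated GSVD routine is the safer choice, as the text notes.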

Comparative Pattern Inference.

The decorrelation of the arraylets suggests that some of the significant arraylets of each data set, i.e., those with the largest generalized fractions of eigenexpression (see Appendix, Eqs. 4 and 5, and Fig. 6), may represent independent cellular states, where the corresponding genelets represent the corresponding regulatory programs, biological processes, or experimental artifacts that contribute to the overall expression signal in each data set. The one-to-one correspondence between the two sets of experimental conditions that underlie the two data sets suggests that among these genelets, a genelet of equal significance in both data sets with angular distance of ≈0 may represent a process common to both data sets; a genelet of no significance in one data set relative to the other with angular distance of $\approx \pm\pi/4$ may represent a process exclusive to the latter data set. We infer that a genelet represents a process exclusive to one or common to both data sets when its expression pattern across the corresponding one or both sets of arrays is biologically or experimentally interpretable. We associate this genelet with a biological process when this inference is supported by one or two coherent biological themes, reflected in the functions of the genes of the corresponding one or both data sets, whose coefficients of this genelet in the GSVD expansion, as listed in the corresponding one or both arraylets, are largest in magnitude compared to those coefficients of all other genes. With this we assume that the corresponding one or both arraylets represent the cellular states of this exclusive or common process, respectively. We estimate the probabilistic significance of these associations by annotations using combinatorics (ref. 10; see Appendix, Fig. 7, and Table 1, which are published as supporting information on the PNAS web site).
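One common way to turn such annotation counts into a P value is a hypergeometric tail probability: the chance of seeing at least the observed number of annotated genes in a selection drawn at random. The sketch below assumes this form of the combinatorial estimate. The function name `enrichment_p_value` and the toy counts are hypothetical; the exact calculation used in the Appendix may differ.

```python
from math import comb

def enrichment_p_value(total, annotated, selected, hits):
    """P(at least `hits` annotated genes among `selected` genes drawn
    without replacement from `total` genes, `annotated` of which carry
    the annotation), i.e., a hypergeometric upper tail."""
    upper = min(annotated, selected)
    favorable = sum(
        comb(annotated, j) * comb(total - annotated, selected - j)
        for j in range(hits, upper + 1)
    )
    return favorable / comb(total, selected)

# Toy example (hypothetical counts): 10 genes, 5 annotated,
# 2 selected, both of which are annotated.
p_toy = enrichment_p_value(10, 5, 2, 2)
```

For the toy counts the probability is $\binom{5}{2}/\binom{10}{2} = 10/45$.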

Comparative Data Reconstruction.

The decoupling of the genelets and both sets of arraylets allows reconstructing either data set in a given subspace of K-genelets and corresponding arraylets without eliminating genes or arrays, $\hat{e}_i \rightarrow \sum_{k=1}^{K}\epsilon_{i,k}|\alpha_{i,k}\rangle\langle\gamma_k|$, where $i = 1, 2$. For visualization and classification, we set the arithmetic mean of each genelet across the arrays and that of each arraylet across the genes to 0, such that the expression of each gene and array in the reconstructed data set is centered at its array- or gene-invariant level, respectively.
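A minimal sketch of this reconstruction, assuming the GSVD factors of one data set are already available as NumPy arrays, sums the outer products of the selected arraylets and genelets weighted by their $\epsilon_{i,k}$. The function name `reconstruct`, the `center` flag, and the synthetic factors are assumptions made for illustration only.

```python
import numpy as np

def reconstruct(u, eps, x_inv, keep, center=False):
    """Rebuild a data set from the genelets (rows of x_inv) and the
    arraylets (columns of u) whose indices are listed in `keep`."""
    keep = np.asarray(keep)
    arraylets = u[:, keep]
    genelets = x_inv[keep, :]
    if center:
        # Zero the mean of each arraylet across genes and of each
        # genelet across arrays, as described for visualization.
        arraylets = arraylets - arraylets.mean(axis=0, keepdims=True)
        genelets = genelets - genelets.mean(axis=1, keepdims=True)
    return (arraylets * eps[keep]) @ genelets

# Synthetic factors (hypothetical): 50 genes, 6 arrays, 6 genelets.
rng = np.random.default_rng(1)
u, _ = np.linalg.qr(rng.normal(size=(50, 6)))
eps = rng.uniform(0.5, 2.0, size=6)
x_inv = rng.normal(size=(6, 6))
x_inv /= np.linalg.norm(x_inv, axis=1, keepdims=True)
e = (u * eps) @ x_inv

subspace = reconstruct(u, eps, x_inv, [2, 3, 4])
remainder = reconstruct(u, eps, x_inv, [0, 1, 5])
centered = reconstruct(u, eps, x_inv, [2, 3, 4], center=True)
```

Because the genelets are decoupled, the reconstructions in complementary subspaces add up to the full data set, and no genes or arrays are dropped along the way.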

Comparative Data Classification.

Inferring that subsets of genelets and arraylets represent independent processes or states, exclusive to one or common to both data sets, allows classifying the genes and arrays of one or simultaneously both data sets by similarity in their expression of these genelets or arraylets, respectively, rather than their overall expression. We least-squares-approximate a subspace spanned by K > 2 genelets with that spanned by the two orthonormal vectors $|x\rangle$ and $|y\rangle$, which maximize $\sum_{k=1}^{K}\langle\gamma_k|(|x\rangle\langle x| + |y\rangle\langle y|)|\gamma_k\rangle$. We plot the projection of each gene of either data set $\langle g_{i,n}|$, where $i = 1, 2$, from the K-genelets subspace onto $|y\rangle$, $\sum_{k=1}^{K}\epsilon_{i,k}\langle n|\alpha_{i,k}\rangle\langle\gamma_k|y\rangle/N_{i,n}$, along the y axis vs. that onto $|x\rangle$ along the x axis, normalized by its ideal amplitude, where the contribution of each genelet to the overall projected expression of the gene adds up rather than cancels out, $N_{i,n}^2 = \sum_{k=1}^{K}\sum_{l=1}^{K}\epsilon_{i,k}\epsilon_{i,l}|\langle n|\alpha_{i,k}\rangle\langle\alpha_{i,l}|n\rangle\langle\gamma_k|(|x\rangle\langle x| + |y\rangle\langle y|)|\gamma_l\rangle|$. In this plot, the distance of each gene from the origin, $r_{i,n}$, is the amplitude of its normalized projection. An amplitude of 1 indicates that the genelets add up; 0 indicates that they cancel out. The phase difference of each gene from the x axis, $\varphi_{i,n}$, is its phase in the progression of expression across the genes from $|x\rangle$ to $|y\rangle$ and back to $|x\rangle$, going through the projections of all K-genelets in this subspace $(|x\rangle\langle x| + |y\rangle\langle y|)|\gamma_k\rangle$. We sort the genes according to $\varphi_{i,n}$. Similarly, we plot the projection of each array, $|a_{i,m}\rangle$, from the K-arraylets subspace onto $\sum_{k=1}^{K}|\alpha_{i,k}\rangle\langle\gamma_k|y\rangle$, $\sum_{k=1}^{K}\epsilon_{i,k}\langle y|\gamma_k\rangle\langle\gamma_k|m\rangle/N_{i,m}$, along the y axis vs. that onto $\sum_{k=1}^{K}|\alpha_{i,k}\rangle\langle\gamma_k|x\rangle$ along the x axis, normalized by its ideal amplitude, $N_{i,m}^2 = \sum_{k=1}^{K}\sum_{l=1}^{K}\epsilon_{i,k}\epsilon_{i,l}|\langle m|\gamma_k\rangle\langle\gamma_l|m\rangle\langle\gamma_k|(|x\rangle\langle x| + |y\rangle\langle y|)|\gamma_l\rangle|$. We sort the arrays according to their phase differences from the x axis, $\varphi_{i,m}$.
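The gene classification can be sketched as below. The sketch relies on a standard fact: the two orthonormal vectors that capture the most of a set of row vectors are the two leading right singular vectors of the matrix that stacks those rows. The function name `classify_genes` and the synthetic inputs are hypothetical, and the original analysis may compute the same quantities differently.

```python
import numpy as np

def classify_genes(u, eps, genelets):
    """Project genes from a K-genelet subspace onto its best
    two-dimensional approximation.

    u: N x K arraylets, eps: K values, genelets: K x M rows of x^{-1}.
    Returns amplitudes r, phases phi, the phase-sorted gene order,
    and the two orthonormal axes x and y.
    """
    # The leading right singular vectors maximize the captured projection.
    _, _, vt = np.linalg.svd(genelets, full_matrices=False)
    x, y = vt[0], vt[1]
    coeff = u * eps                          # eps_k <n|alpha_k>, N x K
    gx, gy = genelets @ x, genelets @ y      # <gamma_k|x>, <gamma_k|y>
    px, py = coeff @ gx, coeff @ gy          # unnormalized projections
    overlap = np.outer(gx, gx) + np.outer(gy, gy)
    # Ideal amplitude: every term counted with its absolute value.
    ideal_sq = np.einsum("nk,nl,kl->n",
                         np.abs(coeff), np.abs(coeff), np.abs(overlap))
    r = np.hypot(px, py) / np.sqrt(ideal_sq)
    phi = np.arctan2(py, px)
    return r, phi, np.argsort(phi), x, y

# Synthetic inputs (hypothetical): 20 genes, 3 genelets, 18 arrays.
rng = np.random.default_rng(2)
u, _ = np.linalg.qr(rng.normal(size=(20, 3)))
eps = rng.uniform(0.5, 2.0, size=3)
genelets = rng.normal(size=(3, 18))
genelets /= np.linalg.norm(genelets, axis=1, keepdims=True)
r, phi, order, x, y = classify_genes(u, eps, genelets)
```

By construction the amplitude lies between 0 and 1, reaching 1 only when all genelet contributions to a gene add up in the two-dimensional plane.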

Biological Results: Comparison of Yeast and Human Cell-Cycle Expression Data Sets

Spellman et al. (11) monitored mRNA levels for 6,113 putative ORFs of the yeast Saccharomyces cerevisiae over two cell-cycle periods in a yeast culture synchronized initially in the cell-cycle stage M/G1 by the pheromone α factor, relative to reference mRNA from an asynchronous culture, at 7-min intervals for 119 min. The data set for the yeast experiments we analyze (see Data Sets 1–4, which are published as supporting information on the PNAS web site, and the Mathematica notebook at http://genome-www.stanford.edu/GSVD/) tabulates the ratios of gene-expression levels for the $N_1$ = 4,523 genes with no missing data in at least 15 of the $M_1$ = 18 arrays. Of these genes, 604 were classified as cell cycle-regulated by Spellman et al., and 77 were classified by traditional methods. Whitfield et al. (12) monitored mRNA levels for 43,198 human gene clones over two and a half cell-cycle periods in a HeLa cell-line culture synchronized initially in S by a double-thymidine block, relative to reference mRNA from an asynchronous HeLa culture, at 2-h intervals for 34 h. The data set for the human experiments we analyze (see Data Sets 5–8, which are published as supporting information on the PNAS web site) tabulates the ratios of gene-expression levels for the $N_2$ = 12,056 clones with no missing data in at least 15 of the $M_2$ = 18 arrays. Of these clones, 750 were classified as cell cycle-regulated by Whitfield et al., and 73 were classified by traditional methods. We estimate the missing data in each data set using SVD (ref. 2; see Appendix and Figs. 8–11, which are published as supporting information on the PNAS web site) and calculate the GSVD of both data sets.
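The selection criterion for genes and clones can be illustrated with a short filter, assuming missing measurements are stored as NaN in a genes × arrays NumPy array. The function name `keep_mostly_complete` and the toy matrix are hypothetical; the SVD-based estimation of the remaining missing values (ref. 2) is not sketched here.

```python
import numpy as np

def keep_mostly_complete(data, min_present=15):
    """Keep the rows (genes) that have valid measurements in at least
    `min_present` of the arrays; missing values are encoded as NaN."""
    present = np.sum(~np.isnan(data), axis=1)
    mask = present >= min_present
    return data[mask], mask

# Toy matrix (hypothetical): 4 genes measured on 18 arrays.
toy = np.ones((4, 18))
toy[1, :3] = np.nan   # 15 valid values, kept
toy[2, :4] = np.nan   # 14 valid values, dropped
filtered, mask = keep_mostly_complete(toy)
```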

Common Yeast and Human Cell-Cycle Subspace.

The time (i.e., array) variations of the third, fourth, and fifth genelets, $\langle\gamma_3|$, $\langle\gamma_4|$, and $\langle\gamma_5|$, that are almost equally significant in both data sets (slightly more in the yeast data), with $0 < \theta_3, \theta_4, \theta_5 < \pi/16$ (Fig. 1), fit normalized cosine functions of two periods and initial phases of $\pi/3$, 0, and $-\pi/3$, respectively, superimposed on time-invariant expression (Fig. 2). The genelets $\langle\gamma_{14}|$, $\langle\gamma_{15}|$, and $\langle\gamma_{16}|$, which are also almost equally significant in both data sets (slightly more in the human data), with $-\pi/6 < \theta_{14}, \theta_{15}, \theta_{16} < 0$, fit normalized cosines of two and a half periods and initial phases of $-\pi/3$, $\pi/3$, and 0, respectively. Coherent themes of yeast and human cell-cycle programs emerge from the annotations of the 100 yeast and 100 human genes (13, 14), with largest parallel and separately also antiparallel contributions from each one of these six genelets as listed in the corresponding yeast and human arraylets (see Data Sets 9 and 10, which are published as supporting information on the PNAS web site). We associate all these six genelets with the cell-cycle gene-expression oscillations common to both the yeast and human genomes and manifested in both data sets. We assume that the corresponding six yeast and six human arraylets represent the yeast and human cell-cycle cellular states, respectively. The probabilistic significance of these associations by annotations, estimated using combinatorics, is high: Most of the P values, calculated assuming hypergeometric probability distribution of the annotations among the genes, are orders of magnitude <0.01 (ref. 10; see Appendix, Fig. 7, and Table 1). Following the traditional classifications, the 0-phase genelet $\langle\gamma_4|$ is associated in parallel with the yeast cell-cycle stage M/G1, in which the yeast culture is initially synchronized, and both 0-phase genelets $\langle\gamma_4|$ and $-\langle\gamma_{16}|$ are associated in parallel with the human cell-cycle stage S, in which the human culture is initially synchronized.
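One way to check how well a genelet fits a cosine of a given number of periods is a linear least-squares fit to a cosine, a sine, and a constant, from which the amplitude, initial phase, and time-invariant offset follow. The sketch below makes an assumption about the time axis: the $k$th of $M$ arrays is taken to sit at the fraction $k/M$ of the time course. The function name `fit_cosine` and the synthetic genelet are hypothetical.

```python
import numpy as np

def fit_cosine(values, periods):
    """Least-squares fit of values[k] to
    amplitude * cos(2*pi*periods*k/M + phase) + offset."""
    values = np.asarray(values, dtype=float)
    m = values.size
    angle = 2 * np.pi * periods * np.arange(m) / m
    design = np.column_stack([np.cos(angle), np.sin(angle), np.ones(m)])
    (a, b, c), *_ = np.linalg.lstsq(design, values, rcond=None)
    # a*cos(t) + b*sin(t) = A*cos(t + phase), with a = A*cos(phase)
    # and b = -A*sin(phase).
    return np.hypot(a, b), np.arctan2(-b, a), c

# Synthetic genelet (hypothetical): two periods over 18 arrays,
# initial phase pi/3, superimposed on a constant level.
k = np.arange(18)
synthetic = 0.3 * np.cos(2 * np.pi * 2 * k / 18 + np.pi / 3) + 0.1
amplitude, phase, offset = fit_cosine(synthetic, periods=2)
```

For this noise-free synthetic input the fit recovers an amplitude of 0.3, a phase of $\pi/3$, and an offset of 0.1.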

Fig 1.
Yeast and human genelets. (a) Raster display of $\hat{x}^{-1}$, the expression of 18 genelets in 18 yeast and human arrays simultaneously, centered at their array-invariant levels. (b) Bar chart of the angular distances showing $\langle\gamma$ ...
Fig 2.
Line-joined graphs of the expression levels of the genelets. (a) $\langle\gamma_3|$ (red), $\langle\gamma_4|$ (blue), and $\langle\gamma_5|$ (green), which are associated with the common yeast and human cell-cycle gene-expression ...

When the expression of the 18 yeast arrays is projected from this six-dimensional yeast arraylets subspace onto the two-dimensional subspace that approximates it, ≥50% of the contributions of the six arraylets add up (rather than cancel out) in the overall expression of 16 arrays, the normalized amplitudes of which satisfy $0.5 \leq r_{1,m} < 1$ (Fig. 3). Sorting the arrays according to their phases, $\{\varphi_{1,m}\}$, gives an array order similar to that of the cell-cycle time points measured by the arrays that describes the yeast cell-cycle progression from the M/G1 stage through G1, S, S/G2, and G2/M back to M/G1 twice. Because the projection of the 0-phase arraylets $|\alpha_{1,4}\rangle$ and $-|\alpha_{1,16}\rangle$, which correspond to the 0-phase genelets, $\langle\gamma_4|$ and $-\langle\gamma_{16}|$, is correlated with the arrays $|a_{1,1}\rangle$, $|a_{1,2}\rangle$, and $|a_{1,10}\rangle$ and also $|a_{1,9}\rangle$ and $|a_{1,18}\rangle$, we associate both yeast 0-phase arraylets with the cell-cycle cellular state of transition from G2/M to M/G1, in which the yeast culture is synchronized initially. When the expression of the 18 human arrays is projected from the six-dimensional human arraylets subspace onto the two-dimensional subspace that approximates it, ≥50% of the contributions of the six arraylets add up in the expression of 16 arrays. Sorting the arrays describes the human cell-cycle progression from S through G2, G2/M, M/G1, and G1/S back to S two and a half times. Because the projection of the 0-phase arraylets, $|\alpha_{2,4}\rangle$ and $-|\alpha_{2,16}\rangle$, is correlated with the arrays $|a_{2,2}\rangle$ and $|a_{2,9}\rangle$, we associate both human 0-phase arraylets with the cell-cycle stage S, in which the human culture is synchronized.

Fig 3.
Yeast (a–c) and human (d–f) expression reconstructed in the six-dimensional cell-cycle subspaces approximated by two-dimensional subspaces. (a) Yeast array expression, projected onto π/2-phase along the y axis vs. that onto 0-phase ...

When the expression of the yeast and human genes is projected from the six-dimensional genelets subspace onto the two-dimensional subspace that approximates it, ≥50% of the contributions of the six genelets add up in the overall expression of 547 of the 604 yeast genes that were classified as cell cycle-regulated by Spellman et al. (11), 709 of the 750 human genes classified by Whitfield et al. (12), and 71 of the 77 yeast and 71 of the 73 human genes classified by traditional methods (including, e.g., 14 of 16 human histones that were not classified by Whitfield et al. as cell cycle-regulated based on their overall expression). Simultaneous classification of the yeast and human genes into the five cell-cycle stages describes the yeast and human cell cycles' progression along the yeast and human genes, respectively, and is in good agreement with the classifications by Spellman et al. and Whitfield et al. and also the traditional ones. Because the projection of the 0-phase genelets, $\langle\gamma_4|$ and $-\langle\gamma_{16}|$, is correlated with yeast genes that peak late in G2/M and early in M/G1 and human genes that peak in S, we associate $\langle\gamma_4|$ and $-\langle\gamma_{16}|$ with cell-cycle expression oscillations of yeast at the transition from G2/M to M/G1 and human at S. This simultaneous classification therefore outlines a correspondence between the groups of yeast genes and those of human genes, e.g., yeast genes that peak at M/G1 correspond to human genes that peak at S, the cell-cycle stages in which the yeast and human cultures are synchronized initially, respectively.

With all 4,523 yeast and 12,056 human genes sorted, the gene variations of the six yeast and six human arraylets approximately fit one-period cosines of $\pi/3$, 0, and $-\pi/3$ initial phases (Fig. 4) such that the initial phase of each arraylet is similar to that of its corresponding genelet. Both sorted and reconstructed yeast and human expressions approximately fit traveling waves of one-period cosinusoidal variation across the genes and of two or two and a half periods across the arrays, respectively.

Fig 4.
Yeast (a–d) and human (e–h) expression reconstructed in the six-dimensional cell-cycle subspaces with genes sorted according to their phases in the two-dimensional subspaces that approximate them. (a) Yeast expression of the sorted 4,523 ...

Exclusive Yeast Pheromone-Response Subspace.

The genelets $\langle\gamma_1|$ and $\langle\gamma_2|$, insignificant in the human data set relative to that of the yeast, with $\theta_1, \theta_2 > \pi/7$ (Fig. 1), describe initial transient increase and decrease in expression, respectively (Fig. 2). A theme of yeast response to pheromone synchronization emerges from the annotations of those yeast genes with contributions from $\langle\gamma_1|$ and $\langle\gamma_2|$ that are largest in magnitude. The genelet $\langle\gamma_6|$, equally significant in both data sets with $\theta_6 \approx 0$, describes an initial transient increase in expression superimposed on cosinusoidal variation. A theme of transition from pheromone response to cell-cycle progression emerges from the annotations of those yeast genes with contributions from $\langle\gamma_6|$, as listed in the corresponding yeast arraylet $|\alpha_{1,6}\rangle$, that are largest in magnitude (see Data Set 9). We associate these three genelets and corresponding three yeast arraylets with the pheromone response, which is exclusive to the yeast genome. Classification of the yeast genes and arrays into pheromone-response stages in the subspaces spanned by these genelets and arraylets, respectively, is in good agreement with the traditional understanding of this program (ref. 13; Figs. 12–14, which are published as supporting information on the PNAS web site).

Exclusive Human Stress-Response Subspace.

The genelets $\langle\gamma_{17}|$ and $\langle\gamma_{18}|$ are insignificant in the yeast data set relative to that of the human, with $\theta_{17}, \theta_{18} < -\pi/6$. A theme of human synchronization stress response emerges from the annotations of those human genes with contributions from $\langle\gamma_{17}|$ and $\langle\gamma_{18}|$ that are largest in magnitude. Also, a theme of transition from stress response to cell-cycle progression emerges from the annotations of those human genes with contributions from $\langle\gamma_6|$, as listed in the corresponding human arraylet $|\alpha_{2,6}\rangle$, that are largest in magnitude (see Data Set 10). We associate these three genelets and corresponding three human arraylets with this human-exclusive stress response. Classification of the human genes and arrays into stress-response stages in the subspaces spanned by these genelets and arraylets, respectively, is in agreement with current understanding of this program (ref. 12; Figs. 15–17, which are published as supporting information on the PNAS web site).

Differential Expression of Yeast Genes in the Exclusive Pheromone-Response and the Common Cell-Cycle Subspaces.

According to their expression in the yeast-exclusive pheromone-response subspace, mRNA expression of both yeast genes KAR4 and CIK1 peaks early in the time course (together with that of other genes known to be involved in the α-factor response) (Fig. 3). In the common cell-cycle subspace, KAR4 peaks at the G1 cell-cycle stage, whereas CIK1 peaks almost half a cell-cycle period later (and also earlier) at S/G2 (Fig. 12). This differential expression of CIK1 and KAR4 in the response to pheromone program vs. that of the cell cycle is in agreement with the experimental observation of Kurihara et al. (15), who showed that induction of CIK1 depends on that of KAR4 during mating, and is independent of KAR4 during mitosis.

Differential Expression of Human Genes in the Exclusive Stress-Response and the Common Cell-Cycle Subspaces.

In the human-exclusive stress-response subspace, most human histones reach their expression minima early (Fig. 3). In the common cell-cycle subspace, most histones peak early, together with other genes known to peak in the cell-cycle stage S (Fig. 14). This differential expression of most histones may explain why these histones do not appear to be cell cycle-regulated based on their overall expression.

Conclusions

We have shown that GSVD provides a comparative mathematical framework for two genome-scale expression data sets, in which the variables and operations may represent some biological reality. Using GSVD in a comparison of yeast and human cell-cycle expression data sets, we were able to find (i) biological similarity in these two disparate organisms in terms of their mRNA expression during their cell-cycle programs; (ii) experimental dissimilarity in terms of yeast and human mRNA expression during their different synchronization-response programs; and (iii) differential gene expression in the yeast and human cell-cycle programs vs. their synchronization-response programs, respectively.

Possible additional applications of GSVD include comparison of two genomic data sets, each corresponding to (i) the same experiment repeated, e.g., using different experimental protocols, to separate the biological signal that is similar in both data sets from the dissimilar experimental artifacts; (ii) one of two different types of genomic information (e.g., DNA copy number, mRNA expression, or protein abundance) collected from the same set of samples (e.g., tumor samples) to elucidate the molecular composition of the overall biological signal in these samples; (iii) one of two chromosomes of the same organism to illustrate the relation, if any, between these chromosomes in terms of their, e.g., mRNA expression in a given set of samples; and (iv) one of two interacting organisms, e.g., during infection, to illuminate the exchange of biological information in these interactions.

Acknowledgments

We thank G. H. Golub for insightful discussions of matrix computation, M. L. Whitfield for discussions of the human cell-cycle data and careful reading, and G. M. Church, S. R. Eddy, and E. Rivas for thoughtful reviews of this manuscript. This work was supported by National Cancer Institute Grants CA77097 (to D.B.) and CA85129 (to P.O.B.) and National Institute of General Medical Sciences Grant GM46406 (to D.B.). O.A. is a Sloan Foundation/Department of Energy Postdoctoral Fellow in Computational Molecular Biology (DE-FG03-99ER62836) and a National Human Genome Research Institute Individual Mentored Research Scientist Development Awardee in Genomic Research and Analysis (5 K01 HG00038-01). P.O.B. is a Howard Hughes Medical Institute Investigator.

Abbreviations

  • SVD, singular value decomposition
  • GSVD, generalized SVD

Notes

§In this article $\hat{m}$ denotes a matrix, $|v\rangle$ denotes a column vector, and $\langle u|$ denotes a row vector such that $\hat{m}|v\rangle$, $\langle u|\hat{m}$, and $\langle u|v\rangle$ all denote inner products, and $|v\rangle\langle u|$ denotes an outer product.

References

1. Alter, O., Brown, P. O. & Botstein, D. (2000) Proc. Natl. Acad. Sci. USA 97, 10101–10106.
2. Alter, O., Brown, P. O. & Botstein, D. (2001) in Microarrays: Optical Technologies and Informatics, eds. Bittner, M. L., Chen, Y., Dorsel, A. N. & Dougherty, E. R. (Int. Soc. Optical Eng., Bellingham, WA), Vol. 4266, pp. 186.
3. Nielsen, T. O., West, R. B., Linn, S. C., Alter, O., Knowling, M. A., O'Connell, J. X., Ferro, M., Sherlock, G., Pollack, J. R., Brown, P. O., et al. (2002) Lancet 359, 1301–1307.
4. Wen, X., Fuhrman, S., Michaels, G. S., Carr, D. B., Smith, S., Barker, J. L. & Somogyi, R. (1998) Proc. Natl. Acad. Sci. USA 95, 334–339.
5. Hilsenbeck, S. G., Friedrichs, W. E., Schiff, R., O'Connell, P., Hansen, R. K., Osborne, C. K. & Fuqua, S. A. (1999) J. Natl. Cancer Inst. 91, 453–459.
6. Raychaudhuri, S., Stuart, J. M. & Altman, R. B. (2000) in Proceedings of the Pacific Symposium on Biocomputing, eds. Altman, R. B., Lauderdale, K., Dunker, A. K., Hunter, L. & Klein, T. E. (World Scientific, Singapore), pp. 455.
7. Holter, N. S., Mitra, M., Maritan, A., Cieplak, M., Banavar, J. R. & Fedoroff, N. V. (2000) Proc. Natl. Acad. Sci. USA 97, 8409–8414.
8. Golub, G. H. & Van Loan, C. F. (1996) Matrix Computations (Johns Hopkins Univ. Press, Baltimore).
9. Paige, C. C. & Saunders, M. A. (1981) SIAM J. Numer. Anal. 18, 398–405.
10. Tavazoie, S., Hughes, J. D., Campbell, M. J., Cho, R. J. & Church, G. M. (1999) Nat. Genet. 22, 281–285.
11. Spellman, P. T., Sherlock, G., Zhang, M. Q., Iyer, V. R., Anders, K., Eisen, M. B., Brown, P. O., Botstein, D. & Futcher, B. (1998) Mol. Biol. Cell 9, 3273–3297.
12. Whitfield, M. L., Sherlock, G., Saldanha, A., Murray, J. I., Ball, C. A., Alexander, K. E., Matese, J. C., Perou, C. M., Hurt, M. M., Brown, P. O. & Botstein, D. (2002) Mol. Biol. Cell 13, 1977–2000.
13. Dwight, S. S., Harris, M. A., Dolinski, K., Ball, C. A., Binkley, G., Christie, K. R., Fisk, D. G., Issel-Tarver, L., Schroeder, M., Sherlock, G., et al. (2002) Nucleic Acids Res. 30, 69–72.
14. Sherlock, G., Hernandez-Boussard, T., Kasarskis, A., Binkley, G., Matese, J. C., Dwight, S. S., Kaloper, M., Weng, S., Jin, H., Ball, C. A., et al. (2001) Nucleic Acids Res. 29, 152–155.
15. Kurihara, L. J., Stewart, B. G., Gammie, A. E. & Rose, M. D. (1996) Mol. Cell. Biol. 16, 3990–4002.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences