NIH Public Access Author Manuscript. Accepted for publication in a peer-reviewed journal.
Bioinformatics. Author manuscript; available in PMC Apr 1, 2009.
Published in final edited form as:
PMCID: PMC2600603
NIHMSID: NIHMS57130

Integrative analysis reveals the direct and indirect interactions between DNA copy number aberrations and gene expression changes

Hyunju Lee,1,4,* Sek Won Kong,2,3,* and Peter J Park1,2,

Abstract

Motivation

DNA copy number aberrations (CNAs) and gene expression (GE) changes provide valuable information for studying chromosomal instability and its consequences in cancer. While it is clear that the structural aberrations and the transcript levels are intertwined, their relationship is more complex and subtle than initially suspected. Most studies so far have focused on how a CNA affects the expression levels of those genes contained within that CNA.

Results

To better understand the impact of CNAs on expression, we investigated the correlation of each CNA to all other genes in the genome. The correlations are computed over multiple patients that have both expression and copy number measurements in brain, bladder, and breast cancer data sets. We find that a CNA has a direct impact on the gene amplified or deleted, but it also has a broad, indirect impact elsewhere. To identify a set of CNAs that is coordinately associated with the expression changes of a set of genes, we used a biclustering algorithm on the correlation matrix. For each of the three cancer types examined, the aberrations in several loci are associated with cancer-type specific biological pathways that have been described in the literature: CNAs of chromosome (chr) 7p13 were significantly correlated with the epidermal growth factor receptor signaling pathway in glioblastoma multiforme, chr 13q with NF-kappaB cascades in bladder cancer, and chr 11p with the RECK pathway in breast cancer. In all three data sets, gene sets related to cell cycle/division such as M phase, DNA replication, and cell division were also associated with CNAs. Our results suggest that CNAs are both directly and indirectly correlated with changes in expression and that it is beneficial to examine the indirect effects of CNAs.

1. INTRODUCTION

Nearly all cancers are caused by abnormalities in the DNA (Vogelstein and Kinzler, 2004). Structural changes of chromosomal regions such as aneuploidies, translocations, copy number aberrations (CNAs), and point mutations have been observed in various tumors (Lengauer et al., 1998). Among these, CNAs represent both amplifications and deletions of chromosomal regions, often ranging from 0.5 to 10 Mb in size. CNAs of oncogenes and tumor suppressor genes have been reported as causatively related to the initiation, development and progression of cancer (Pinkel and Albertson, 2005; Albertson et al., 2000). With the maturation of microarray technology, CNA studies using high-resolution array comparative genomic hybridization (aCGH) have been performed in many types of cancer, including brain, prostate, colon, pancreatic and lung cancers (Liu et al., 2006; Chaudhary and Schmidt, 2006; Tonon et al., 2005; Pole et al., 2006; Phillips et al., 2006). These genome-wide chromosome copy number data have accelerated cancer research by allowing identification of new candidate cancer loci, classification of cancer subtypes, and discovery of molecular mechanisms of cancers. In addition, meta-analyses of published aCGH data sets have revealed a relationship between the CNA pattern and cancer cell lineages (Myllykangas et al., 2007; Jong et al., 2007).

While CNAs are structural changes, measuring the level of transcripts provides additional information on whether those changes have functional consequences. Genome-wide profiling of gene expression (GE) has already shown promising possibilities in classification of cancer, prediction of treatment responses, and discovery of correlated events in the clinical data such as metastasis (Bild et al., 2006). So far, several groups have performed systematic studies to check whether CNAs are directly associated with transcriptional changes of the genes contained in those CNAs (Pollack et al., 2002; Jarvinen et al., 2006; Chaudhary and Schmidt, 2006; Phillips et al., 2006; Stranger et al., 2007). Hyman et al. (2002) analyzed a set of aCGH and GE profiles from the same 14 breast cancer cell lines hybridized on cDNA microarrays. For each gene, they calculated the mean difference in gene expression between samples with and without amplifications, divided by standard deviations, and compared it with the values obtained from random permutations to estimate statistical significance. They reported that 44% of the highly amplified genes (>2.5 in copy number ratio) were up-regulated and that the percentage decreased with a lower level of amplification. Using the same statistical method, Jarvinen et al. (2006) analyzed CNAs and GEs from laryngeal squamous cell carcinoma cell lines and found that 39% of amplified regions were up-regulated and 14% of deleted regions were down-regulated. These percentages decreased in the primary tumors: only 18% of amplified regions were up-regulated and there were no changes in the deleted regions. Chaudhary and Schmidt (2006) stimulated the prostate cancer cell line DU145 with serum and found that a large proportion of genes in deleted regions were down-regulated, but most genes in amplified regions did not show any change in GE.
Although different tumor types and quantification methods can give varied estimates, these results clearly demonstrate the high impact of copy number in the transcription of those genes contained in the aberration. This direct relationship between structural changes in the DNA and gene expression has been used to identify or verify candidate cancer genes and pathways (Soroceanu et al., 2007; Ruano et al., 2006; Yao et al., 2006; Chin et al., 2006; Wolf et al., 2004; Sweet-Cordero et al., 2006; Hyman et al., 2002).
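The permutation-based statistic attributed to Hyman et al. (2002) earlier in this paragraph can be illustrated with a minimal Python sketch. The text says only that the mean difference is "divided by standard deviations"; the sketch assumes the denominator is the sum of the two group standard deviations, and the function names and toy values are hypothetical rather than taken from that study.

```python
import random
import statistics


def amplification_statistic(expr, amplified):
    """Mean difference between amplified and non-amplified samples,
    divided by the sum of the two group standard deviations (assumed form)."""
    amp = [e for e, a in zip(expr, amplified) if a]
    non = [e for e, a in zip(expr, amplified) if not a]
    spread = statistics.stdev(amp) + statistics.stdev(non)
    return (statistics.mean(amp) - statistics.mean(non)) / spread


def permutation_p_value(expr, amplified, n_perm=1000, seed=0):
    """One-sided p-value: how often shuffled labels give a statistic
    at least as large as the observed one."""
    rng = random.Random(seed)
    observed = amplification_statistic(expr, amplified)
    labels = list(amplified)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # group sizes are preserved by shuffling labels
        if amplification_statistic(expr, labels) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy expression values for one gene in 14 samples (illustrative only);
# the first four samples are treated as carrying the amplification.
expr = [5.1, 5.3, 5.0, 5.2, 2.0, 2.2, 1.9, 2.1, 2.3, 1.8, 2.0, 2.4, 2.1, 1.7]
amplified = [True] * 4 + [False] * 10
observed, p_value = permutation_p_value(expr, amplified)
```

Shuffling the amplification labels keeps the group sizes fixed, so each permuted statistic is comparable to the observed one.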

These studies suggest that the relationship between CNAs and GEs is not simple and that positive correlations are often but not always observed. The interaction between the two is further complicated by distant interactions in which a CNA can impact the expression of genes located elsewhere. For instance, Soroceanu et al. (2007) observed in glioblastoma that the DNA loss in PTEN, a known tumor suppressor gene located in chr 10, is highly correlated with over-expression of IGFR or EGFR, both of which are located away from chr 10. In the following, we refer to a relationship between a CNA and GE at the same location as a direct interaction and to one between different locations as an indirect interaction.

In the current study, we investigate both the direct and indirect relationships between structural changes measured by aCGH and functional changes measured by expression arrays, by analyzing three data sets in which both the copy number and expression were available. For this type of integration, there are several difficulties to overcome. The first is that the choice of data sets is limited. While both aCGH and expression data sets are plentiful, paired data sets with both DNA and RNA data on the same set of patients are scarce. It is possible to infer relationships from unpaired data sets, but that process is prone to false positives. The second issue is that the probes in the two platforms generally vary greatly, both in array type and in resolution. The newer aCGH arrays have oligonucleotide probes with much higher resolution, but the arrays in the data sets we use are two-channel arrays using Bacterial Artificial Chromosomes (BACs) and thus have a low resolution, on the order of 1 Mb. The platforms for expression data, on the other hand, are generally oligonucleotide arrays with higher resolution. Reconciling the two requires resolving the many-to-one or one-to-many mappings in each chromosomal segment and may require judicious averaging of the probe values in the higher resolution platform. The third difficulty is that many genes are co-expressed and that CNAs occur simultaneously in multiple locations (Chin et al., 2006). This limits the precision in locating the interacting partners. In the proposed approach, we thus deduce a set of modules, each module containing a group of co-expressed genes and a group of co-occurring CNAs. These two groups are highly correlated and provide sufficient information for pathway analysis. The relationships inferred by these modules involve distant loci and are thus fundamentally different from those derived in previous studies.
Below, the proposed approach is described in detail and is applied to three data sets containing glioblastoma multiforme (GBM), bladder, and breast cancer samples. In all cases, we observe that cell-cycle related pathways are enriched. More importantly, we identify several statistically significant CNAs that are associated with disease-specific pathways in each case.

2. METHODS

Data sets

The method is illustrated in Figure 1. We collected and reanalyzed three paired data sets: 34 GBM samples (Nigro et al., 2005), 57 bladder tumor samples (Stransky et al., 2006), and 89 breast tumor samples (Chin et al., 2006). Each sample was profiled on a BAC array for measuring copy number and on an Affymetrix GeneChip for measuring expression. Each of the three data sets contained about 2,400 BAC probes at approximately megabase intervals. For copy number changes, log ratios relative to normal samples were used as described in the original publications. The gene expression index was recalculated from the raw data (CEL files) using the GCRMA algorithm (Hubbell et al., 2002). When multiple probe sets were mapped to the same RefSeq ID, we calculated the geometric mean after excluding the probe sets (with the _x_at suffix) that do not map uniquely to the genome. Log-transformed values were used for further evaluation and statistical procedures.
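The probe-set summarization step can be sketched as follows. This is a minimal illustration, assuming that the geometric mean is taken over positive, linear-scale values; the function name, probe-set IDs, and RefSeq IDs are hypothetical placeholders and not part of the original analysis.

```python
import math
from collections import defaultdict


def summarize_by_refseq(probe_values, probe_to_refseq):
    """Collapse probe-set expression values to one value per RefSeq ID.

    probe_values: dict of probe-set ID -> expression value (linear scale, > 0)
    probe_to_refseq: dict of probe-set ID -> RefSeq ID
    """
    grouped = defaultdict(list)
    for probe_id, value in probe_values.items():
        # Skip probe sets that do not map uniquely to the genome.
        if probe_id.endswith("_x_at"):
            continue
        refseq_id = probe_to_refseq.get(probe_id)
        if refseq_id is not None:
            grouped[refseq_id].append(value)

    summary = {}
    for refseq_id, values in grouped.items():
        # Geometric mean computed as exp(mean(log(values))).
        mean_log = sum(math.log(v) for v in values) / len(values)
        summary[refseq_id] = math.exp(mean_log)
    return summary


# Toy input; all identifiers and values are made up for illustration.
probe_values = {"1000_at": 4.0, "1001_s_at": 16.0, "1002_x_at": 1000.0, "1003_at": 9.0}
probe_to_refseq = {"1000_at": "NM_A", "1001_s_at": "NM_A",
                   "1002_x_at": "NM_A", "1003_at": "NM_B"}
summary = summarize_by_refseq(probe_values, probe_to_refseq)
```

In this sketch, a gene whose probe sets all carry the _x_at suffix is dropped from the summary; whether the original analysis handled such genes the same way is not stated.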

Fig. 1
A schematic of our approach. (a) A gene expression data set and (b) its paired CNA data set are collected. For CNAs, we choose BAC probes that show amplification or deletion in a given fraction of patients. (c) For every pair of genes and the selected ...

Measuring association between CNA and expression

To investigate the association between CNAs and gene expression changes, we used the Pearson Correlation Coefficient (PCC). We first selected BAC array probes. Since many probes did not show any aberrations and thus are no longer of interest, we selected a subset of BAC probes for further analysis using the following criterion: CNAs with probes among the top 12.5% of the amplifications or the bottom 12.5% of the deletions for at least twenty percent of samples. Using PCC, we computed the association between all pairs of selected BACs from aCGH and RefSeq ID from gene expression data. The results were stored in the Correlation Matrix, as illustrated in Figure 1c. We defined the association as direct when the BAC probes and RefSeq genes were located on the same cytoband, and all other significant associations were defined as indirect. We note that while a segmentation algorithm is generally used to process aCGH data (Lai et al., 2005), it results in a loss of sensitivity in this analysis, as the spatial averaging fails to take advantage of the full range of the observed log-ratios for a given probe. This is particularly true for the BAC data sets we consider here.
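The probe selection and correlation steps can be sketched in plain Python. The selection criterion admits more than one reading; the sketch assumes that the 12.5% tails are taken over all log-ratios pooled across probes and samples, and that a probe is kept when at least 20% of its samples fall in either tail. All function names and toy values are hypothetical.

```python
import math


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def select_probes(cna, tail=0.125, min_fraction=0.20):
    """Keep probes whose log-ratio lies in an extreme tail in enough samples."""
    pooled = sorted(v for values in cna.values() for v in values)
    low_cut = quantile(pooled, tail)
    high_cut = quantile(pooled, 1.0 - tail)
    selected = []
    for probe, values in cna.items():
        amplified = sum(v >= high_cut for v in values) / len(values)
        deleted = sum(v <= low_cut for v in values) / len(values)
        if amplified >= min_fraction or deleted >= min_fraction:
            selected.append(probe)
    return selected


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for constant profiles
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(cna, expression, probes):
    """Rows are selected BAC probes, columns are genes."""
    return {p: {g: pearson(cna[p], e) for g, e in expression.items()} for p in probes}


# Toy data for five samples (illustrative values only).
cna = {
    "BAC_1": [0.9, 0.8, 0.0, 0.1, -0.1],
    "BAC_2": [0.0, 0.05, -0.05, 0.02, 0.0],
    "BAC_3": [-0.7, 0.0, 0.1, -0.8, 0.0],
}
expression = {
    "GENE_A": [2.0, 1.8, 0.1, 0.3, 0.0],
    "GENE_B": [1.0, 0.0, 0.0, 1.2, 0.1],
}
selected = select_probes(cna)
corr = correlation_matrix(cna, expression, selected)
```

The probe without aberrations (BAC_2) is filtered out before any correlation is computed, which keeps the matrix restricted to probes that vary across patients.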

Biclustering for identification of modules

Because the occurrences of many CNAs are highly correlated, it is difficult to accurately distinguish among them. The same is true for expression profiles. Thus, rather than trying to relate a particular CNA with the expression of a particular gene, we search for a set of CNAs and a set of expression profiles that are highly correlated, using a biclustering approach.

Biclustering has been popular in expression profile studies as it attempts to find a subset of genes having similar expression patterns under a group of conditions. Such an entity is often called a module. For a comparison of various biclustering algorithms, see Prelic et al. (2006). In this study, a biclustering algorithm called SAMBA (Tanay et al., 2004) was used to identify associated CNAs and gene expression changes (Figure 1d). The statistical significance of the generated modules was calculated by a method based on the framework developed by Tanay et al. (2004), and those modules with p-values smaller than 0.0001 were selected for further analysis (see Supplementary Material). This approach allows genes and conditions to appear in several modules, reflecting the biological principle that genes can have multiple functions (Cheng and Church, 2000; Dudley et al., 2005). The biclustering approach is appropriate for the present study, as both CNAs and genes with expression changes may participate in multiple pathways, and loci distributed across different chromosomes may be related to the same biological pathway.

Enriched pathways in CNA-GE modules

To determine functional relevance of the modules identified, we tested whether the genes from expression data contained in a module were enriched for specific biological functions or signaling pathways (Figure 1d). We collected the gene sets of biologically related functions from Gene Ontology (GO) using the annotation package in Bioconductor (http://www.bioconductor.org). Biological process GO terms with sizes between 5 and 250 were used to exclude too specific or too general ones. Additional gene sets were downloaded from the Molecular Signature Database (MSigDB) at the Broad Institute (http://www.broad.mit.edu/gsea/msigdb/msigdbindex.html). We used three categories of gene sets from MSigDB: (C1) Cytobands, (C2) Manually curated pathways including BioCarta, and (C3) Motif gene sets.

For each module, we calculated the hypergeometric statistics and the associated p-values to find enriched gene sets. To address the multiple comparison issue arising from the large number of gene sets tested at the same time, we estimated the false discovery rate using q-values (Storey and Tibshirani, 2003) and used q < 0.01 as the enrichment threshold.
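The enrichment test can be sketched as an upper-tail hypergeometric probability followed by a multiple-testing adjustment. Note the substitution: the sketch uses Benjamini-Hochberg adjusted p-values as a simple stand-in for the Storey-Tibshirani q-values, which corresponds to fixing the estimated proportion of true nulls at 1 and is therefore more conservative. The function names and the toy counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, M, n, N):
    """P(X >= k) when N genes are drawn from a background of M genes,
    n of which belong to the gene set being tested."""
    total = comb(M, N)
    upper = min(n, N)
    tail = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return tail / total


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values (stand-in for q-values)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example: a module of 50 genes, a background of 10,000 genes, and three
# gene sets of size 100 that overlap the module in 5, 1, and 0 genes.
overlaps = [5, 1, 0]
p_values = [hypergeom_upper_tail(k, M=10_000, n=100, N=50) for k in overlaps]
adjusted = bh_adjust(p_values)
enriched = [q < 0.01 for q in adjusted]
```

With a zero overlap the upper tail covers every possible outcome, so that gene set receives a p-value of 1 and can never pass the threshold.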

To identify statistically significant structural components, we calculated a hypergeometric statistic for the enrichment of cytobands of BACs in a given module. To map BAC probes to cytobands, we used cytoband information downloaded from the UCSC golden path database (ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/cytoBand.txt.gz).
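Mapping a probe to a cytoband can be sketched as an interval lookup. The sketch assumes the tab-separated layout of the UCSC cytoBand table (chromosome, start, end, band name, stain) and assumes that each probe is represented by a single genomic position such as its midpoint. The function names are hypothetical and the coordinates below are made up, not real hg18 band boundaries.

```python
import bisect
from collections import defaultdict


def load_cytobands(lines):
    """Parse cytoBand-style lines into per-chromosome sorted intervals."""
    bands = defaultdict(list)
    for line in lines:
        chrom, start, end, name, _stain = line.rstrip("\n").split("\t")
        bands[chrom].append((int(start), int(end), name))
    for chrom in bands:
        bands[chrom].sort()
    return bands


def cytoband_of(bands, chrom, position):
    """Return a label such as '7p13' for a position, or None if unmapped."""
    chrom_bands = bands.get(chrom, [])
    starts = [band[0] for band in chrom_bands]
    idx = bisect.bisect_right(starts, position) - 1
    if idx >= 0:
        start, end, name = chrom_bands[idx]
        if start <= position < end:  # half-open interval [start, end)
            return chrom.replace("chr", "") + name
    return None


# Toy band table with invented coordinates (illustrative only).
toy_lines = [
    "chr7\t0\t1000\tp14\tgneg",
    "chr7\t1000\t2500\tp13\tgpos50",
    "chr7\t2500\t4000\tp12\tgneg",
]
bands = load_cytobands(toy_lines)
probe_band = cytoband_of(bands, "chr7", 1800)
```

Once every BAC probe in a module carries a cytoband label, the same hypergeometric test used for gene sets can be applied to the counts of probes per cytoband.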

3. RESULTS

3.1 Associations between CNAs and GEs

The first question was whether there are in fact many cases of strong association between distant loci. Isolated cases of such relationships have been observed, but the extent of this effect has not previously been quantified.

PCC between all pairs of selected BACs from aCGH and RefSeq IDs from gene expression data showed that a large proportion of significant associations involved different cytobands. When we defined the significant associations as the top 1% of the total number of associations, 2% (10 out of 515) of pairs in the same cytobands and 1% (4,386 out of 439,151) of pairs in different cytobands were significantly associated. The numbers were similarly high in the bladder and breast cancer data sets.
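The counts above can be checked with a few lines of arithmetic. The sketch uses only the four numbers reported in the text; the variable names are hypothetical.

```python
# Counts reported in the text for the top 1% of associations.
same_sig, same_total = 10, 515          # pairs in the same cytoband
diff_sig, diff_total = 4386, 439151     # pairs in different cytobands

all_sig = same_sig + diff_sig           # 4,396 significant pairs in total
all_total = same_total + diff_total     # 439,666 pairs examined

frac_same = same_sig / same_total       # about 0.019, reported as 2%
frac_diff = diff_sig / diff_total       # about 0.010, reported as 1%
overall = all_sig / all_total           # about 0.010, consistent with a top-1% cutoff
share_direct = same_sig / all_sig       # about 0.0023 of the significant pairs
```

The last ratio shows that same-cytoband pairs make up well under 1% of the significant associations, even though such pairs are individually about twice as likely to be significant.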

These numbers clearly suggest that there are highly correlated distant loci and that studying the impact of CNAs only on the expression of those genes contained in the CNAs is not sufficient.

3.2 Modules from the biclustering method

Given our threshold for statistical significance, biclustering of the CNA-GE correlation matrix generated 247, 339, and 506 modules for the GBM, bladder, and breast cancer data sets, respectively. This was based on a less strict overlapping criterion between modules (an overlap factor of 0.1 was used on a [0,1] scale, where 1 indicates non-overlap). Each module consisted of selected BAC probes and RefSeq genes that were highly correlated with each other. To select modules containing both structural and functional changes, we performed hypergeometric tests of the BAC probes for possible enrichment in a cytoband, and of the genes for enrichment in gene sets, as described in Methods.

Signaling pathway gene sets

When we applied the BioCarta and the manually curated gene sets from the MSigDB C2 category, 20, 18, and 18 modules were significant in GBM, bladder, and breast cancer, respectively. These modules identified signaling pathways that were highly correlated with the CNAs of distant locations (Figure 2, Supplementary Figures S1 and S2).

Fig. 2
Structural and functional changes observed in the GBM data set. The number on the y-axis is a module identifier and the names on the x-axis represent the enriched cytobands and pathways in the modules. The shading of the red color corresponds to the −log10(p-value) ...

In the GBM data set (Nigro et al., 2005), the authors examined the paired data from 34 patients. They found that patient survival was significantly correlated with both CNAs and GE, and noticed that the aberration in a locus could be associated with the changes in expression on a different locus. For instance, they observed that CHI3L1/YKL-40, a gene located on chr 1, had a strong correlation with CNAs of chr 10. Here we systematically investigated both direct and indirect associations between CNAs and GE. Figure 2 shows 20 enriched categories from the MSigDB C2 gene sets for the GBM data. These categories were enriched in one or more modules deemed significant. The PGC1A pathway and the proteasome pathway, for instance, were found multiple times in different modules.

We queried all the genes from the enriched pathways in the PubMed database to check whether our findings have been reported previously. The official gene symbol and the name of the specific cancer were used as keywords. We found supporting evidence for a number of genes from the enriched modules. The results and the supporting references are summarized in Table 1. Here we describe two examples from Table 1. Module 193 of the GBM data set was significantly enriched for the epidermal growth factor receptor (EGFR) related pathway (uncorrected Fisher’s exact p-value = 4.3E-05 and corresponding q-value < 0.01), shown in Figure 3. Among the genes in this pathway, EGF, GAB3, and GRB7 were highly correlated with the CNAs of chr 7p13. This result is notable because EGF itself is located on chr 4q25. It has been reported that EGFR in 7p12 is amplified in 30%–50% of human GBM (Ruano et al., 2006). Gefitinib, the EGFR kinase inhibitor, has been tried for the treatment of recurrent malignant glioma in selected cases (Mellinghoff et al., 2005). The effectiveness of this treatment is still debated; however, multiple lines of evidence have shown that this pathway is altered in several types of cancer including GBM.

Fig. 3
(a) An example of a module from the GBM data set. This module contains a highly correlated set of 43 BAC probes and 50 genes. (b) Testing for enrichment of cytobands and pathways results in a module with four BAC probes located in chr7p13 and three genes ...
Table 1
Analysis of pathways for the GBM data set. Enriched pathways from the C2 category in MSigDB are listed. Each pathway can be enriched in more than one module. Among the genes in a given pathway, those in modules are listed.

In another example, the Calcium/Calmodulin related pathway that includes CAMK2A, CAMK2B, CAMK2G, CAMKK2, and CALM3 was enriched in module 91. Calcineurin, a calmodulin binding protein, has been known as a brain tumor specific neuronal marker (Goto et al., 1986). The potential implication of calmodulin-dependent phosphodiesterase in GBM is reviewed in Das and Sharma (2005).

In the second dataset, Stransky et al. (2006) identified the chromosomal region in bladder cancer samples where CNAs are partly responsible for the changes in gene expression. They discovered that several genes in a selected amplified region were regulated under the epigenetic control of H3K9 trimethylation and DNA methylation. Such regions were identified as copy number-independent regions of correlations using their Transcriptome Correlation Map, in which correlations among the expression profiles of adjacent probes are computed and stretches of probes with high correlations are selected. Copy number-dependent regions where levels of gene expression can be explained by CNAs in the same region were also identified.

In this study, we found at least two signaling pathways in which the expression levels of genes annotated with GO terms are correlated with CNAs in different regions (Supplementary Figure S4). GO:0043123 (Positive regulation of I-kappaB kinase/NF-kappaB cascade) was enriched in three modules, where the BAC probes in chr 13q were significantly correlated with BCL10 (chr 1p22), TRAF3IP2 (chr 6q21), EDG2 (chr 9q31.3), TNFRSF1A (chr 12p13.2), LITAF (chr 16p13.13), TNFRSF10B (chr 8p21.3), and others. GO:0007249 (I-kappaB kinase/NF-kappaB cascade) was also enriched in a module. Interestingly, these two GO terms were closely related in terms of the number of genes they share, but the associated CNA loci were not the same. It has been reported that NF-κB activates anti-apoptotic proteins and plays an important role in tumorigenesis and anticancer treatment (Dutta et al., 2006). The complete results for the pathways are given in Supplementary Table S1.

In Chin et al. (2006), the authors investigated the correlations between copy number, expression, and treatment responses in breast cancer. They found four regions of recurrent amplification associated with poor outcome and identified 66 genes cis-regulated by CNAs, with many genes known to be important for cancer progression. We applied our proposed method, and the result is summarized in Supplementary Table S2. The RECK pathway (Inhibition of Matrix Metalloproteinase) was significant among the MSigDB C2 gene sets. Four genes, RECK (chr 9p13.3), HRAS (chr 11p15.5), MMP2 (chr 17q12-21), and MMP14 (chr 14q11-12), were significantly correlated with chr 11p15.4 and chr 11p15.5. Down-regulation of RECK has been implicated in tumor angiogenesis and progression (Span et al., 2003), but its role in breast cancer has not been reported yet. Our result suggests that RECK-regulated MMPs in breast cancer should be investigated further. Epidermis development (GO:0008544) was also significantly enriched. Six genes, EMP1 (chr 12p12.3), PPARD (chr 6p21.31), PLOD1 (chr 1p36), LAMC2 (chr 1q25.3), LAMB3 (chr 1q32.2), and BNC1 (chr 15q25.2), were significantly correlated with chr 3q25.33. EMP1 was reported as a novel marker of lobular breast carcinomas (Turashvili et al., 2007), and the loss of expression of the LN5-encoding genes LAMC2 and LAMB3 in breast cancer cell lines has also been observed (Sathyanarayana et al., 2003). Finally, we also found several modules that were highly enriched in immune responses, consistent with the role of the immune system in the development and metastasis of breast cancer, as reviewed in de Visser et al. (2006).

Gene Ontology gene sets

When we applied the Gene Ontology gene sets, 33, 29, and 52 modules were significant in GBM, bladder, and breast cancer, respectively (Supplementary Figures S3, S4, and S5). In these modules, 65, 59, and 43 GO terms were enriched. The overlap among them is shown in Figure 4. Five significant GO terms were observed in common among the three cancer types: M phase (GO:0000279), DNA replication (GO:0006260), locomotive behavior (GO:0007626), ATP synthesis coupled proton transport (GO:0015986), and Cell division (GO:0051301). Three of the GO terms (Cell division, M phase, and DNA replication) are tightly related to cell cycle, cell division, and proliferation, which are a signature of all types of cancer.

Fig. 4
Overlap among the enrichment Gene Ontology categories in the three cancer types. Among the five in the center are M phase (GO:0000279), DNA replication (GO:0006260), and cell division (GO:0051301). Cancer type-specific gene sets are described in the text. ...

Advantage over analysis of a single data type

To show the strength of combining two data types, we also carried out pathway enrichment analysis for each type separately with the GBM data (Nigro et al., 2005) as an example. For GE, we used gene set enrichment analysis (GSEA) (Subramanian et al., 2005) to find differentially enriched gene sets between the two classes, 24 short-term survivors (STS) and 10 long-term survivors (LTS), using the MSigDB C2 category of manually curated pathways, including those from BioCarta. We found that no gene sets were significant at FDR < 0.25. When we relaxed the significance criterion to a nominal p-value < 0.01, 27 gene sets were enriched among genes up-regulated in STS and 1 gene set was enriched among genes up-regulated in LTS (data not shown). For aCGH, we first applied ISACGH (Conde et al., 2007) to identify the amplified and deleted regions in the chromosome by segmenting each sample. Then, we used FATIGO (Al-Shahrour et al., 2007) to test for enrichment of BioCarta pathways between the genes in CNAs and the rest of the genes in the chromosome. When multiple-testing adjusted p-values were calculated for Fisher’s exact test, there were no functionally enriched regions.

While it is not possible to definitively conclude that the pathways identified in the joint analysis are more functionally relevant than those from the separate analyses, we have found that not many pathways are significant in single-data set analysis and that the lists of pathways are substantially different. Because the relationship between the two data sets is exploited in the joint analysis, it is more likely to result in a more biologically meaningful set of pathways. We also note that the joint analysis can be carried out even when the phenotypic data (patient survival times in this case) are not available.

Higher resolution aCGH platforms

Our results above are based on aCGH data with BAC probes, but the method can also be applied to platforms of higher resolution. To illustrate this, we analyzed copy number data obtained from Affymetrix 100K SNP arrays and expression data from Affymetrix U133 Plus 2.0 on 65 paired data sets of GBM patients (Kotliarov et al., 2006). Multiple probe sets were mapped to RefSeq identifiers, and copy number estimates based on SNP probes were binned into 100 kb regions along the chromosome. A binned region was selected as amplified if the averaged log-ratio (base 2) was greater than 2 in more than 30% of samples, and as deleted if the log-ratio was less than −1.5 in more than 30% of the samples or less than −2.0 in 10% of samples. Supplementary Figure S6 summarizes the enriched modules. Interestingly, one of the modules was significantly enriched for the cardiac epidermal growth factor pathway and, among the genes in this pathway, ADAM12, EGFR, JUN, EDN1, and PLCG1 were highly correlated with the CNAs of chr 7p11.2. This shows that, while the overlap is not as strong as one would like, the two data sets from different platforms, especially for copy number estimation, commonly identify the important feature that structural changes of chr 7p are correlated with functional changes of epidermal growth factor related pathways.
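The binning and selection rule for the SNP array data can be sketched as follows. The sketch assumes that each sample's SNP-level log2 ratios are averaged within fixed 100 kb bins, reads the 10% criterion as "more than 10% of samples", and gives amplification precedence when both rules fire; these choices and all names are hypothetical.

```python
def bin_log_ratios(positions, log_ratios, bin_size=100_000):
    """Average the SNP-level log2 ratios of one sample within fixed-width bins.

    Returns a dict of bin index -> mean log2 ratio (one chromosome)."""
    sums, counts = {}, {}
    for pos, value in zip(positions, log_ratios):
        b = pos // bin_size
        sums[b] = sums.get(b, 0.0) + value
        counts[b] = counts.get(b, 0) + 1
    return {b: sums[b] / counts[b] for b in sums}


def classify_bin(values, amp_cut=2.0, del_cut=-1.5, deep_del_cut=-2.0,
                 frac=0.30, deep_frac=0.10):
    """Label one bin from its per-sample averaged log2 ratios."""
    n = len(values)
    amplified = sum(v > amp_cut for v in values) / n
    deleted = sum(v < del_cut for v in values) / n
    deep_deleted = sum(v < deep_del_cut for v in values) / n
    if amplified > frac:
        return "amplified"
    if deleted > frac or deep_deleted > deep_frac:
        return "deleted"
    return "unchanged"


# Toy values for one bin across ten samples (illustrative only).
label_amp = classify_bin([2.5, 2.1, 2.2, 2.4, 0, 0, 0, 0, 0, 0])
label_del = classify_bin([-2.5, -2.2, 0, 0, 0, 0, 0, 0, 0, 0])
binned = bin_log_ratios([10, 50_000, 150_000], [1.0, 3.0, -1.0])
```

The second deletion rule lets a bin qualify through a smaller number of samples with strong losses, which the 30% rule alone would miss.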

4. DISCUSSION

As the arrays for comparative genomic hybridization have increased in resolution, it has become possible to relate DNA copy numbers with changes in expression. Integrating these two data sets effectively is a challenging task but is bound to result in new insights into the interplay between chromosomal instability and gene expression. A primary difficulty in this integrative analysis has been the lack of appropriate data sets. In a given cancer type, both expression data sets and copy number data sets abound, but paired data sets with expression and copy number from the same patients are sorely lacking. It is possible to carry out analysis even with unpaired data. For instance, Liu et al. (2006) identified physical clusters of genes with differential expression and then prioritized the clusters based on whether a CNA is present at the same location in a different set of GBM patients. However, the analysis becomes much more powerful when both types of data are derived from the same patient because the relationship can be inferred not just on averaged quantities but in each sample. Fortunately, there has been increased recognition of such a design recently. In the ambitious Cancer Genome Atlas project from the National Institutes of Health (http://cancergenome.nih.gov), multiple data types including gene expression, copy number, microRNA, SNPs, and DNA methylation are being generated on the same set of patients from three tumor types.

In the present work, we conducted a systematic study of how a copy number change at each location may be correlated with expression at every other location. Whereas previous studies have focused on their interaction at the same locus, we have extended this to long-range interactions. It is perhaps not surprising that there are highly correlated pairs on different loci, but the fraction of high correlations occurring at the same locus was extremely small (less than 1% among the most significant pairs).

One of the difficulties in an integrative study such as this is the mapping of the probes between different array platforms. Optimal probe design for expression is concentration of probes near the 3′ ends of known transcripts, whereas that for aCGH is more uniform spacing of the probes, often with higher density near oncogenes. Because of this difference in design, mapping between the two platforms involves averaging across probes in one platform to match a corresponding probe in another platform. This can result in loss of information from the higher resolution platform.

Because every pair of loci is examined, a large number of correlations is computed. Given the noise in the data, the ordering of all such pairs is not stable and interpreting each pair on the list is impractical. For a more robust analysis and clearer interpretation, we have used correlation analysis followed by biclustering to identify modules. Interpreting the modules is also not simple, however, as many of the modules have significant overlaps. Most biclustering methods have the advantage of allowing a row or a column to belong to multiple clusters, but a disadvantage is that many similar clusters can appear. Moreover, when pathway analysis is performed, each module can give multiple significant pathways. We have dealt with this problem by focusing on the pathways that appear multiple times among the clusters and examining the cluster in which the pathway has the most significant score. Our analysis of the pathways appears effective, with each data set giving tumor type-specific pathways and all of them giving the common cell-cycle/proliferation signatures.

Acknowledgments

This work was funded by the National Institutes of Health through R01 GM082798 and U54 LM008748 to PJP.

References

  • Al-Shahrour F, Minguez P, Tarraga J, Medina I, Alloza E, Montaner D, Dopazo J. FatiGO +: a functional profiling tool for genomic data. Integration of functional annotation, regulatory motifs and interaction data with microarray experiments. Nucleic Acids Res. 2007;35:W91–96. [PMC free article] [PubMed]
  • Albertson DG, Ylstra B, Segraves R, Collins C, Dairkee SH, Kowbel D, Kuo WL, Gray JW, Pinkel D. Quantitative mapping of amplicon structure by array CGH identifies CYP24 as a candidate oncogene. Nat Genet. 2000;25:144–146. [PubMed]
  • Bild AH, Yao G, Chang JT, Wang Q, Potti A, Chasse D, Joshi MB, Harpole D, Lancaster JM, Berchuck A, Olson JAJ, Marks JR, Dressman HK, West M, Nevins JR. Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature. 2006;439:353–357. [PubMed]
  • Chaudhary J, Schmidt M. The impact of genomic alterations on the transcriptome: a prostate cancer cell line case study. Chromosome Res. 2006;14:567–586. [PubMed]
  • Cheng Y, Church GM. Biclustering of expression data. Proc Int Conf Intell Syst Mol Biol. 2000;8:93–103. [PubMed]
  • Chin K, DeVries S, Fridlyand J, Spellman PT, Roydasgupta R, Kuo WL, Lapuk A, Neve RM, Qian Z, Ryder T, Chen F, Feiler H, Tokuyasu T, Kingsley C, Dairkee S, Meng Z, Chew K, Pinkel D, Jain A, Ljung BM, Esserman L, Albertson DG, Waldman FM, Gray JW. Genomic and transcriptional aberrations linked to breast cancer pathophysiologies. Cancer Cell. 2006;10:529–541. [PubMed]
  • Conde L, Montaner D, Burguet-Castell J, Tarraga J, Medina I, Al-Shahrour F, Dopazo J. ISACGH: a web-based environment for the analysis of Array CGH and gene expression which includes functional profiling. Nucleic Acids Res. 2007;35:W81–85. [PMC free article] [PubMed]
  • Das SB, Sharma RK. Potential role of calmodulin-dependent phosphodiesterase in human brain tumor (review) Oncol Rep. 2005;14:1059–1063. [PubMed]
  • de Visser KE, Eichten A, Coussens LM. Paradoxical roles of the immune system during cancer development. Nat Rev Cancer. 2006;6:24–37. [PubMed]
  • Dudley AM, Janse DM, Tanay A, Shamir R, Church GM. A global view of pleiotropy and phenotypically derived gene function in yeast. Mol Syst Biol. 2005;1:2005.0001. [PMC free article] [PubMed]
  • Dutta J, Fan Y, Gupta N, Fan G, Gelinas C. Current insights into the regulation of programmed cell death by NF-kB. Oncogene. 2006;25:6800–6816. [PubMed]
  • Goto S, Matsukado Y, Mihara Y, Inoue N, Miyamoto E. Calcineurin as a neuronal marker of human brain tumors. Brain Res. 1986;371:237–243. [PubMed]
  • Hubbell E, Liu WM, Mei R. Robust estimators for expression analysis. Bioinformatics. 2002;18:1585–1592. [PubMed]
  • Hyman E, Kauraniemi P, Hautaniemi S, Wolf M, Mousses S, Rozenblum E, Ringner M, Sauter G, Monni O, Elkahloun A, Kallioniemi OP, Kallioniemi A. Impact of DNA amplification on gene expression patterns in breast cancer. Cancer Res. 2002;62:6240–6245. [PubMed]
  • Jarvinen AK, Autio R, Haapa-Paananen S, Wolf M, Saarela M, Grenman R, Leivo I, Kallioniemi O, Makitie AA, Monni O. Identification of target genes in laryngeal squamous cell carcinoma by high-resolution copy number and gene expression microarray analyses. Oncogene. 2006;25:6997–7008. [PubMed]
  • Jong K, Marchiori E, van der Vaart A, Chin SF, Carvalho B, Tijssen M, Eijk PP, van den Ijssel P, Grabsch H, Quirke P, Oudejans JJ, Meijer GA, Caldas C, Ylstra B. Cross-platform array comparative genomic hybridization meta-analysis separates hematopoietic and mesenchymal from epithelial tumors. Oncogene. 2007;26:1499–1506. [PubMed]
  • Kapoor GS, Zhan Y, Johnson GR, O’Rourke DM. Distinct domains in the SHP-2 phosphatase differentially regulate epidermal growth factor receptor/NF-kappaB activation through Gab1 in glioblastoma cells. Mol Cell Biol. 2004;24:823–836. [PMC free article] [PubMed]
  • Knobbe CB, Reifenberger J, Reifenberger G. Mutation analysis of the Ras pathway genes NRAS, HRAS, KRAS and BRAF in glioblastomas. Acta Neuropathol (Berl) 2004;108:467–470. [PubMed]
  • Kotliarov Y, Steed M, Christopher N, Walling J, Su Q, Center A, Heiss J, Rosenblum M, Mikkelsen T, Zenklusen J, Fine H. High-resolution global genomic survey of 178 gliomas reveals novel regions of copy number alteration and allelic imbalances. Cancer Res. 2006;66:9428–9436. [PubMed]
  • Lai WR, Johnson MD, Kucherlapati R, Park PJ. Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data. Bioinformatics. 2005;21:3763–3770. [PMC free article] [PubMed]
  • Lam-Himlin D, Espey MG, Perry G, Smith MA, Castellani RJ. Malignant glioma progression and nitric oxide. Neurochem Int. 2006;49:764–768. [PubMed]
  • Lengauer C, Kinzler KW, Vogelstein B. Genetic instabilities in human Cancers. Nature. 1998;396:643–649. [PubMed]
  • Lipinski CA, Tran NL, Menashi E, Rohl C, Kloss J, Bay RC, Berens ME, Loftus JC. The tyrosine kinase pyk2 promotes migration and invasion of glioma cells. Neoplasia. 2005;7:435–445. [PMC free article] [PubMed]
  • Liu F, Park PJ, Lai W, Maher E, Chakravarti A, Durso L, Jiang X, Yu Y, Brosius A, Thomas M, Chin L, Brennan C, DePinho RA, Kohane I, Carroll RS, Black PM, Johnson MD. A genome-wide screen reveals functional gene clusters in the cancer genome and identifies EphA2 as a mitogen in glioblastoma. Cancer Res. 2006;66:10815–10823. [PubMed]
  • Mellinghoff IK, Wang MY, Vivanco I, Haas-Kogan DA, Zhu S, Dia EQ, Lu KV, Yoshimoto K, Huang JHY, Chute DJ, Riggs BL, Horvath S, Liau LM, Cavenee WK, Rao PN, Beroukhim R, Peck TC, Lee JC, Sellers WR, Stokoe D, Prados M, Cloughesy TF, Sawyers CL, Mischel PS. Molecular determinants of the response of glioblastomas to EGFR kinase inhibitors. N Engl J Med. 2005;353:2012–2024. [PubMed]
  • Myllykangas S, Bohling T, Knuutila S. Specificity, selection and significance of gene amplifications in cancer. Semin Cancer Biol. 2007;17:42–55. [PubMed]
  • Nigro JM, Misra A, Zhang L, Smirnov I, Colman H, Griffin C, Ozburn N, Chen M, Pan E, Koul D, Yung WKA, Feuerstein BG, Aldape KD. Integrated array-comparative genomic hybridization and expression array profiles identify clinically relevant molecular subtypes of glioblastoma. Cancer Res. 2005;65:1678–1686. [PubMed]
  • Panner A, James CD, Berger MS, Pieper RO. mTOR controls FLIPS translation and TRAIL sensitivity in glioblastoma multiforme cells. Mol Cell Biol. 2005;25:8809–8823. [PMC free article] [PubMed]
  • Perry C, Sklan EH, Soreq H. CREB regulates AChE-R-induced proliferation of human glioblastoma cells. Neoplasia. 2004;6:279–286. [PMC free article] [PubMed]
  • Phillips HS, Kharbanda S, Chen R, Forrest WF, Soriano RH, Wu TD, Misra A, Nigro JM, Colman H, Soroceanu L, Williams PM, Modrusan Z, Feuerstein BG, Aldape K. Molecular subclasses of high-grade glioma predict prognosis, delineate a pattern of disease progression, and resemble stages in neurogenesis. Cancer Cell. 2006;9:157–173. [PubMed]
  • Pinkel D, Albertson DG. Array comparative genomic hybridization and its applications in cancer. Nat Genet. 2005;37(Suppl):11–17. [PubMed]
  • Pole JCM, Courtay-Cahen C, Garcia MJ, Blood KA, Cooke SL, Alsop AE, Tse DML, Caldas C, Edwards PAW. High-resolution analysis of chromosome rearrangements on 8p in breast, colon and pancreatic cancer reveals a complex pattern of loss, gain and translocation. Oncogene. 2006;25:5693–5706. [PubMed]
  • Pollack JR, Sorlie T, Perou CM, Rees CA, Jeffrey SS, Lonning PE, Tibshirani R, Botstein D, Borresen-Dale AL, Brown PO. Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors. Proc Natl Acad Sci U S A. 2002;99:12963–12968. [PMC free article] [PubMed]
  • Prelic A, Bleuler S, Zimmermann P, Wille A, Buhlmann P, Gruissem W, Hennig L, Thiele L, Zitzler E. A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics. 2006;22:1122–1129. [PubMed]
  • Puli S, Lai JCK, Edgley KL, Daniels CK, Bhushan A. Signaling pathways mediating manganese-induced toxicity in human glioblastoma cells (u87) Neurochem Res. 2006;31:1211–1218. [PubMed]
  • Ruano Y, Mollejo M, Ribalta T, Fiano C, Camacho FI, Gomez E, de Lope AR, Hernandez-Moneo JL, Martinez P, Melendez B. Identification of novel candidate target genes in amplicons of Glioblastoma multiforme tumors detected by expression and CGH microarray profiling. Mol Cancer. 2006;5:39. [PMC free article] [PubMed]
  • Sathyanarayana UG, Padar A, Huang CX, Suzuki M, Shigematsu H, Bekele BN, Gazdar AF. Aberrant promoter methylation and silencing of laminin-5-encoding genes in breast carcinoma. Clin Cancer Res. 2003;9:6389–6394. [PubMed]
  • Schultze K, Bock B, Eckert A, Oevermann L, Ramacher D, Wiestler O, Roth W. Troglitazone sensitizes tumor cells to TRAIL-induced apoptosis via down-regulation of FLIP and Survivin. Apoptosis. 2006;11:1503–1512. [PubMed]
  • Soroceanu L, Kharbanda S, Chen R, Soriano RH, Aldape K, Misra A, Zha J, Forrest WF, Nigro JM, Modrusan Z, Feuerstein BG, Phillips HS. Identification of IGF2 signaling through phosphoinositide-3-kinase regulatory subunit 3 as a growth-promoting axis in glioblastoma. Proc Natl Acad Sci U S A. 2007;104:3466–3471. [PMC free article] [PubMed]
  • Span PN, Sweep CGJ, Manders P, Beex LVAM, Leppert D, Lindberg RLP. Matrix metalloproteinase inhibitor reversion-inducing cysteine-rich protein with Kazal motifs: a prognostic marker for good clinical outcome in human breast carcinoma. Cancer. 2003;97:2710–2715. [PubMed]
  • Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc Natl Acad Sci U S A. 2003;100:9440–9445. [PMC free article] [PubMed]
  • Stranger BE, Forrest MS, Dunning M, Ingle CE, Beazley C, Thorne N, Redon R, Bird CP, de Grassi A, Lee C, Tyler-Smith C, Carter N, Scherer SW, Tavare S, Deloukas P, Hurles ME, Dermitzakis ET. Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science. 2007;315:848–853. [PMC free article] [PubMed]
  • Stransky N, Vallot C, Reyal F, Bernard-Pierrot I, de Medina SGD, Segraves R, de Rycke Y, Elvin P, Cassidy A, Spraggon C, Graham A, Southgate J, Asselain B, Allory Y, Abbou CC, Albertson DG, Thiery JP, Chopin DK, Pinkel D, Radvanyi F. Regional copy number-independent deregulation of transcription in cancer. Nat Genet. 2006;38:1386–1396. [PubMed]
  • Subramanian A, Tamayo P, Mootha V, Mukherjee S, Ebert B, Gillette M, Paulovich A, Pomeroy S, Golub T, Lander E, Mesirov J. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA. 2005;102:15545–15550. [PMC free article] [PubMed]
  • Sunahara M, Nakagawara A. Turcot syndrome. Nippon Rinsho. 2000;58:1484–1489. [PubMed]
  • Sweet-Cordero A, Tseng GC, You H, Douglass M, Huey B, Albertson D, Jacks T. Comparison of gene expression and DNA copy number changes in a murine model of lung cancer. Genes Chromosomes Cancer. 2006;45:338–348. [PubMed]
  • Tanay A, Sharan R, Kupiec M, Shamir R. Revealing modularity and organization in the yeast molecular network by integrated analysis of highly heterogeneous genomewide data. Proc Natl Acad Sci U S A. 2004;101:2981–2986. [PMC free article] [PubMed]
  • Tonon G, Wong KK, Maulik G, Brennan C, Feng B, Zhang Y, Khatry DB, Protopopov A, You MJ, Aguirre AJ, Martin ES, Yang Z, Ji H, Chin L, Depinho RA. High-resolution genomic profiles of human lung cancer. Proc Natl Acad Sci U S A. 2005;102:9625–9630. [PMC free article] [PubMed]
  • Turashvili G, Bouchal J, Baumforth K, Wei W, Dziechciarkova M, Ehrmann J, Klein J, Fridman E, Skarda J, Srovnal J, Hajduch M, Murray P, Kolar Z. Novel markers for differentiation of lobular and ductal invasive breast carcinomas by laser microdissection and microarray analysis. BMC Cancer. 2007;7:55. [PMC free article] [PubMed]
  • Vlodavsky E, Soustiel JF. Immunohistochemical expression of peripheral benzodiazepine receptors in human astrocytomas and its correlation with grade of malignancy, proliferation, apoptosis and survival. J Neurooncol. 2007;81:1–7. [PubMed]
  • Vogelstein B, Kinzler KW. Cancer genes and the pathways they control. Nat Med. 2004;10:789–799. [PubMed]
  • Wolf M, Mousses S, Hautaniemi S, Karhu R, Huusko P, Allinen M, Elkahloun A, Monni O, Chen Y, Kallioniemi A, Kallioniemi OP. High-resolution analysis of gene copy number alterations in human prostate cancer using CGH on cDNA microarrays: impact of copy number on gene expression. Neoplasia. 2004;6:240–247. [PMC free article] [PubMed]
  • Wu M, Huang C, Gan K, Huang H, Chen Q, Ouyang J, Tang Y, Li X, Yang Y, Zhou H, Zhou Y, Zeng Z, Xiao L, Li D, Tang K, Shen S, Li G. LRRC4, a putative tumor suppressor gene, requires a functional leucine-rich repeat cassette domain to inhibit proliferation of glioma cells in vitro by modulating the extracellular signal-regulated kinase/protein kinase B/nuclear factor-kappaB pathway. Mol Biol Cell. 2006;17:3534–3542. [PMC free article] [PubMed]
  • Yao J, Weremowicz S, Feng B, Gentleman RC, Marks JR, Gelman R, Brennan C, Polyak K. Combined cDNA array comparative genomic hybridization and serial analysis of gene expression analysis of breast tumor progression. Cancer Res. 2006;66:4065–4078. [PubMed]