Copyright © 2006 Yi et al; licensee BioMed Central Ltd.

WholePathwayScope: a comprehensive pathway-based analysis tool for high-throughput data

1 Advanced Biomedical Computing Center, National Cancer Institute-Frederick/SAIC-Frederick Inc., Frederick, MD 21702, USA
2 McDermott Center for Human Growth and Development, University of Texas Southwestern Medical Center at Dallas, TX 75390-9046, USA
3 Departments of Internal Medicine and Molecular Genetics, University of Texas Southwestern Medical Center at Dallas, TX 75390-9046, USA
4 The Howard Hughes Medical Institute, University of Texas Southwestern Medical Center at Dallas, TX 75390-9046, USA

Corresponding author. Ming Yi: myi/at/ncifcrf.gov; Jay D Horton: Jay.Horton/at/utsouthwestern.edu; Jonathan C Cohen: Jonathan.Cohen/at/utsouthwestern.edu; Helen H Hobbs: Helen.Hobbs/at/utsouthwestern.edu; Robert M Stephens: bobs/at/ncifcrf.gov

Received May 9, 2005; Accepted January 19, 2006.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background
Analysis of high-throughput (HTP) data, such as microarray and proteomics data, has provided a powerful methodology for studying patterns of gene regulation at genome scale. A major unresolved problem in the post-genomic era is assembling the large amounts of data generated into a meaningful biological context. We have developed a comprehensive software tool, WholePathwayScope (WPS), for deriving biological insights from the analysis of HTP data.

Result
WPS extracts gene lists with shared biological themes through color cue templates. WPS statistically evaluates global functional category enrichment of gene lists and pathway-level pattern enrichment of data.
WPS incorporates well-known biological pathways from KEGG (Kyoto Encyclopedia of Genes and Genomes) and Biocarta, GO (Gene Ontology) terms, and user-defined pathways or relevant gene clusters or groups, and explores gene-term relationships within the derived gene-term association networks (GTANs). WPS simultaneously compares multiple datasets within biological contexts, either as pathways or as association networks. WPS also integrates the Genetic Association Database and a partial MedGene Database for disease-association information. We have used this program to analyze and compare microarray and proteomics datasets derived from a variety of biological systems. Application examples demonstrate the capacity of WPS to significantly facilitate the analysis of HTP data for integrative discovery.

Conclusion
This tool represents a pathway-based platform for discovery integration to maximize analysis power. The tool is freely available at http://www.abcc.ncifcrf.gov/wps/wps_index.php.

Background
In today's post-genomic era, sequencing projects and the development of high-throughput (HTP) technologies such as microarrays and proteomics provide great opportunities to uncover and explore the complexity of biological problems using systems biology. HTP technologies provide a powerful approach to a diverse array of biological questions by allowing analysis of the complete transcriptional and translational repertoire of cells or tissues. Pathologically identical tumors can be differentiated into clinically meaningful subgroups by microarray analysis [1-4], and new pathways perturbed in disease states have been identified using microarray analysis [5-7]. Expression arrays also reveal new participants in biological pathways [8-11], new gene targets for pharmaceutical agents, and new functions of genes [12]. Today, DNA microarrays are increasingly widespread and affordable, and great expectations have been placed on technological advances in proteomics.
However, analyzing the enormous quantity of data generated by such HTP experiments remains a major challenge. A variety of software tools are available to extract and analyze HTP data, most of them focused on microarray data. Two major strategies are used: 1) unsupervised clustering, in which genes are grouped according to changes in expression pattern without regard to biological context, and 2) supervised classification, in which genes are classified according to an underlying or previously known biology. Many existing microarray analysis tools, such as GeneCluster [13], TreeView [14], TM4 [15], and GeneSpring [16], primarily use clustering algorithms, which require significant user effort to connect the results with biological information. Current HTP data analysis methods, which are primarily based on computing data values for each individual gene, such as clustering and classification (hierarchical clustering, Principal Component Analysis [17], and Significance Analysis of Microarrays [18]), do provide great insights into many aspects of experimental analysis. However, both academia and industry now need a more comprehensive way to integrate and analyze HTP data in the context of biological pathways and networks. As the amount of HTP data has grown and more insightful analysis approaches have been identified, exploring the underlying gene regulatory and biochemical networks of pathways to analyze data from a variety of HTP technologies has become one of the major challenges in bioinformatics and computational biology. Many software tools capable of analyzing HTP data within the context of biological pathways have been developed [19-22]. Recently released commercial packages, including PathwayAssist™ [23], PathArt [24], the Ingenuity Pathways Analysis tool [25], and MetaCore [26], also compete in the field of pathway-based HTP analysis.
These tools provide an assortment of interfaces for visualizing gene networks, draw on biological pathway/association network databases that are either extracted by natural language processing (NLP) or hand-curated, and accept gene-list-based data input. Each of these tools has one or more features that distinguish it from the others. Some open-source or publicly accessible software, such as GenMAPP [27], Cytoscape [28], Pathway Processor [29] and ViMac [30], displays microarray data within the context of pathways annotated in the Kyoto Encyclopedia of Genes and Genomes (KEGG) [19,22] and provides statistical assessment of the reliability of each differentially expressed gene [29]. However, one limitation of these tools is their inability to handle multiple datasets simultaneously in an intuitive way. There is a need for more flexible and comprehensive HTP data analysis tools in the public domain that are accessible to the academic community and provide a suite of utilities for analyzing HTP data in biological contexts such as pathways. To facilitate the simultaneous analysis and comparison of multiple HTP experiments in the context of biological pathways and association networks, and to allow pattern extraction of gene lists with biological themes, we developed a stand-alone, Windows-based software tool called WholePathwayScope, or WPS. This program not only provides many unique ways to analyze and visualize HTP data, but also combines the advantages of clustering methodology with a more intuitive pathway- or association network-based analysis, along with many other features that allow more comprehensive data analysis.

Implementation

WPS provides a pathway-based platform for integrative data analysis
WholePathwayScope, or WPS, is a software tool that displays HTP data in user-defined or stored gene groups or pathways.
The program incorporates a suite of pre-defined biological pathways and allows the construction of additional user-defined pathways or gene collections. It also generates biological association networks composed of gene-pathway/term relationships, which can be further manipulated and converted into subnetworks, gene-gene networks, or pathway/term-pathway/term networks. Results from multiple HTP experiments can be visualized simultaneously, both as summary data across multiple pathways (WSCP files) and as detailed data for individual pathways (PSCP files). Results can be displayed numerically and color-coded according to user-defined criteria to facilitate visual analysis. The program also offers statistical evaluation of global functional category (GO term, pathway, etc.) enrichment in a user's gene list, or of user-defined pattern enrichment among chosen genes that have been color-coded directly with HTP data. The program is written in Microsoft Visual Basic 6 and runs in the Microsoft Windows environment. It uses Microsoft Access databases, including the internal databases for gene annotations, pathways, gene ontology and disease association information, as well as designated criteria (CRI) files for HTP data. Pathways and association networks are created and presented in windows or graphical user interfaces (GUIs), and are stored and accessed either as individual files or dynamically within Microsoft Access databases. Users control the program through a series of GUI panels, menus and windows. There is also an extensive context-sensitive help system.

Internal database for gene, pathway, and disease annotation
WPS includes an internal database that integrates gene annotation information from both the mouse and human genomes.
Annotation coverage includes GenBank accession numbers (GenBank IDs), UniGene IDs, LocusLink IDs (now Entrez Gene), gene symbols, aliases [31], SwissProt IDs (protein IDs) [32], and disease information from both the Genetic Association Database [33] and a partial MedGene Database [34]. It also carries pathway/term information, including KEGG [19,22,35], Biocarta [36], CGAP [37] and Gene Ontology [38-40], for generating gene-term association networks and performing Fisher's exact test.

Designed Files in WPS
The overall program layout is described in Figure 1.
The second file type is the WholeScope File (WSCP file), which is composed of a series of pathway tags. Each pathway tag in the WSCP file either links to a PSCP file saved on the user's desktop or represents a term (a pathway or GO term in the internal database), and, like gene tags, it can be colored. Clicking a pathway tag in the WSCP file opens the linked PSCP file or, if the tag represents a database term, dynamically creates new PSCP file(s). Global changes in the expression levels of genes in each pathway or term can be indicated by setting criteria in CRI files (see below) to color-code the pathway tags (see the WSCP file example in Figure 3).
The third file type is the criteria (CRI) file, which is used to enter the user-defined color criteria and HTP data (see Additional files 35–50 for examples of raw data files used in this manuscript). Each CRI file is a Microsoft Access file that contains an HTP (e.g., microarray) dataset, the mapped gene identifiers (BaseGenBankID) for each microarray element, and the user-defined color criteria for the PSCP and WSCP files (see Additional files 19–34 for examples of CRI files used in this manuscript). HTP datasets are converted to CRI files within the program from Excel files (Microsoft, Inc.) containing the HTP data, using a Data Conversion Window (Fig. 2).
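To make the relationships among the three file types concrete, the sketch below models them as plain Python objects. It is a minimal sketch under stated assumptions, not the WPS implementation: real PSCP, WSCP and CRI files are Visual Basic/Microsoft Access artifacts, and the class names, fields, first-match ordering of criteria and the per-pathway color-count summary are all hypothetical illustrations of how a CRI file could color gene tags and how a pathway tag could summarize them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical in-memory stand-ins for the three WPS file types.

@dataclass
class CriteriaFile:
    """Stand-in for a CRI file: one dataset plus ordered color criteria."""
    name: str
    values: Dict[str, float]  # BaseGenBankID -> data value (e.g. fold change)
    # Ordered (color, predicate) pairs; the first predicate that matches wins.
    criteria: List[Tuple[str, Callable[[float], bool]]] = field(default_factory=list)

    def color_of(self, gene_id: str) -> Optional[str]:
        value = self.values.get(gene_id)
        if value is None:
            return None  # gene not measured in this dataset
        for color, predicate in self.criteria:
            if predicate(value):
                return color
        return None  # measured, but matches no criterion


@dataclass
class PathwayScopeFile:
    """Stand-in for a PSCP file: a named collection of gene tags."""
    name: str
    gene_ids: List[str]


@dataclass
class WholeScopeFile:
    """Stand-in for a WSCP file: a series of pathway tags."""
    pathways: List[PathwayScopeFile]

    def summarize(self, cri: CriteriaFile) -> Dict[str, Dict[str, int]]:
        """Count colored genes per pathway for one dataset (hypothetical rule)."""
        summary: Dict[str, Dict[str, int]] = {}
        for pathway in self.pathways:
            counts: Dict[str, int] = {}
            for gene_id in pathway.gene_ids:
                color = cri.color_of(gene_id)
                if color is not None:
                    counts[color] = counts.get(color, 0) + 1
            summary[pathway.name] = counts
        return summary


# Toy usage with made-up identifiers and values.
cri = CriteriaFile("dataset_A", {"g1": 2.5, "g2": 0.4, "g3": 1.1},
                   criteria=[("red", lambda v: v >= 2), ("green", lambda v: v <= 0.5)])
wscp = WholeScopeFile([PathwayScopeFile("Toy pathway", ["g1", "g2", "g3", "g4"])])
print(wscp.summarize(cri))
```

Keeping criteria as ordered predicates reflects the idea that a CRI file stores user-defined rules rather than fixed thresholds; how WPS actually resolves a gene that satisfies several criteria is not stated in the text.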
Data analysis using WPS
Once a series of PSCP, WSCP and CRI files has been loaded into the program, the user can proceed along several analysis courses. Some of the features available for this analysis are described briefly below. The Results section of this manuscript uses real data examples to illustrate scenarios for applying the program to data analysis. In addition, tutorial movies and illustration image files covering many major features and the general usage of the program may be obtained from our program demo web page [41] (see Additional files 15, 16, 17 and 18 for examples of demo movies).

Pattern extraction of a list of genes using color cue templates for biological themes
In CRI files, users can define criteria to color-code specific categories of gene behavior in datasets (e.g., red for genes with at least a 2-fold change, or green for genes flagged as down-regulated). Such criteria can be used to extract a list of genes matching a specific pattern of colors across one or more CRI dataset files. Pattern extraction works in two ways: global pattern extraction across datasets, or local pattern extraction scoped to a pathway (PSCP) file. The extracted gene list can be immediately copied and pasted into other utility windows for further analysis (see Fig. 6).
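Pattern extraction can be sketched as matching a color template against per-dataset color assignments. This is a hypothetical illustration rather than WPS code: the `extract_pattern` function, the toy dataset names, and the convention that `None` in the template means "any color, or uncolored" are assumptions. The optional `scope` argument stands in for local, PSCP-scoped extraction; omitting it corresponds to global extraction across datasets.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical color assignments: dataset name -> {gene ID -> color}.
ColorMap = Dict[str, Dict[str, str]]


def extract_pattern(colors: ColorMap,
                    template: Dict[str, Optional[str]],
                    scope: Optional[Iterable[str]] = None) -> List[str]:
    """Return genes whose colors match the template in every listed dataset.

    template maps dataset name -> required color; None accepts any color
    (or no color). scope restricts the search to one gene collection
    (local extraction); None searches every gene seen in any dataset
    (global extraction).
    """
    if scope is None:
        candidates = set()
        for gene_colors in colors.values():
            candidates.update(gene_colors)
    else:
        candidates = set(scope)

    matches = []
    for gene in sorted(candidates):
        if all(required is None or colors.get(ds, {}).get(gene) == required
               for ds, required in template.items()):
            matches.append(gene)
    return matches


# Toy example: look for genes that switch from green (t1) to red (t2).
colors = {
    "t1": {"a": "green", "b": "green", "c": "red"},
    "t2": {"a": "red", "b": "green", "c": "red"},
}
print(extract_pattern(colors, {"t1": "green", "t2": "red"}))
print(extract_pattern(colors, {"t1": "green", "t2": "red"}, scope=["b", "c"]))
```

Because the template compares colors rather than raw values, any gene that satisfies the user's criteria falls into the pattern regardless of its exact magnitude.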
Generation and manipulation of Gene-Term Association Networks (GTANs) to explore gene-pathway or gene-term relations
Using an input or filtered gene list, such as a list of genes derived from clustering analysis in other programs or from pattern extraction in WPS, the associated pathways or GO terms can be identified from the internal database or from user-defined PSCP files. These results are listed in the result table in a pairwise gene-term format (Fig. 8).
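A gene-term association network is essentially a bipartite graph built from the pairwise gene-term table. The sketch below is a minimal, hypothetical illustration of building such a network, pruning genes with few term associations, finding terms shared by several genes, and deriving a gene-gene network through shared terms. The function names, placeholder gene and term identifiers, and the "link two genes when they share a term" projection rule are assumptions, not the documented WPS algorithm.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def build_gtan(pairs: Iterable[Tuple[str, str]]
               ) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Build a bipartite gene-term network from (gene, term) pairs."""
    gene_to_terms: Dict[str, Set[str]] = defaultdict(set)
    term_to_genes: Dict[str, Set[str]] = defaultdict(set)
    for gene, term in pairs:
        gene_to_terms[gene].add(term)
        term_to_genes[term].add(gene)
    return dict(gene_to_terms), dict(term_to_genes)


def prune_genes(gene_to_terms: Dict[str, Set[str]],
                min_terms: int = 2) -> Dict[str, Set[str]]:
    """Drop genes associated with fewer than min_terms terms."""
    return {g: t for g, t in gene_to_terms.items() if len(t) >= min_terms}


def shared_terms(term_to_genes: Dict[str, Set[str]],
                 min_genes: int = 2) -> List[str]:
    """Terms associated with at least min_genes genes."""
    return sorted(t for t, genes in term_to_genes.items() if len(genes) >= min_genes)


def gene_gene_network(term_to_genes: Dict[str, Set[str]]
                      ) -> Dict[Tuple[str, str], Set[str]]:
    """Link two genes if they share a term; each edge records the shared terms."""
    edges: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for term, genes in term_to_genes.items():
        for g1, g2 in combinations(sorted(genes), 2):
            edges[(g1, g2)].add(term)
    return dict(edges)


# Placeholder gene-term table in the pairwise format described above.
pairs = [("geneA", "term1"), ("geneB", "term1"), ("geneB", "term2"),
         ("geneC", "term2"), ("geneC", "term3")]
gene_to_terms, term_to_genes = build_gtan(pairs)
print(gene_gene_network(term_to_genes))
```

Keeping terms on the edges of the derived gene-gene network preserves the reason two genes are linked, which matches the text's emphasis on keeping gene-term relationships visible.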
Fisher's exact test for biological significance of gene lists and pathway-level pattern enrichment of high-throughput data
Fisher's exact test is performed on 2 × 2 contingency tables (whether or not a gene is in the given list vs. whether or not the gene is associated with a pathway/term; see Additional file 7 for an example of a 2 × 2 contingency table). Similar to EASE [44], Fisher's exact test p-values are computed for each term in a chosen system and then ranked from smallest to largest to estimate the statistical significance and enrichment of global functional categories (GO terms, pathways, etc.) within that system for a list of genes of interest to the user or genes that match a pattern. The biological themes of the gene list can be rapidly retrieved from the GO system and the Biocarta and KEGG pathway collections as the top-ranked terms or pathways based on Fisher's exact test p-values (see Figure 7). In contrast to this global functional estimation for a gene list, within a PSCP file being analyzed and colored by CRI file(s), the statistical significance and enrichment of genes meeting certain user-defined criteria can also be estimated by Fisher's exact test for the corresponding CRI file(s) (see Additional file 9 for a screenshot illustrating this feature).

Other utilities: information search and dataset manipulation
Within the information search window, one can type in a keyword (e.g., a gene name or GenBank ID) to search for relevant information, including annotation and associated disease information from the internal database. Two dataset manipulation utilities are available for conveniently adjusting the size of data files, so that one or more subsets of a dataset, or the sum of multiple datasets, can be used for further analysis: 1.
Sorting a dataset into pathway/term-scoped "sub-datasets" based on PSCP files or on pathways/terms in the internal database (see Additional file 10 for a screenshot); 2. Merging data files.

Results
Some of the features of WPS are illustrated using experimental data in the following section. Although only microarray data are used, any source of HTP data would be equally suited to the analysis.

Comparison of multiple datasets within multiple pathways
WholePathwayScope displays HTP data within a biological context (Figure 1). By way of example, the WPS program was used to compare gene expression profiles between wild-type mice and two strains of genetically modified mice that either express high levels of ABCG5 and ABCG8 (G5G8 Tg mice) or express no ABCG5 or ABCG8 (G5G8 -/- mice) [45,46] (see also Additional file 11 for a description of materials and data preparation). ABCG5 and ABCG8 encode ABC half-transporters that heterodimerize to limit the intestinal absorption of dietary sterols and to promote the secretion of sterols from the liver into bile [47]. To compare gene expression patterns in the G5G8 Tg and G5G8 -/- mice and to assess the reproducibility of the microarray results, datasets from five expression array experiments (two from G5G8 Tg mice and three from G5G8 -/- mice) were analyzed simultaneously in a WSCP file that includes a subset of the biochemical pathways and gene families involved in lipid metabolism (Fig. 3).
When the WSCP file window is displayed in the program, it is interactive. Each pathway tag in the WSCP file links to a PSCP file or a term in the internal database. For example, clicking on one of the divided boxes of the pathway tag "Cholesterol Synthesis" (Fig. 3)

To assess the ontogeny of expression of genes in the cholesterol biosynthetic pathway, we analyzed microarray datasets from livers of wild-type mice or embryos sacrificed at different time points during development (Fig. 5).

Pattern extraction and statistical evaluation of gene lists for biological themes
One strategy for analyzing microarray or other HTP data is to look for genes with certain user-defined expression patterns across one or more datasets that carry biological implications and themes. The expression level-switch phenomenon at birth and weaning across the time-course experiment within the cholesterol synthesis pathway described earlier (Fig. 5)

Gene-Term Association Network (GTAN) for gene-specific functional subnetwork domains or function-oriented gene clusters
To further study the underlying relationships between genes and the involved/enriched pathways or GO terms, we used WPS to search Biocarta, KEGG and GO/Biological Process terms for pathways involving the extracted gene list from Fig. 6
To investigate the biological themes in the resulting GTAN, we used a variety of manipulation methods available in a utility window to filter and simplify the network within WPS (Fig. 10).

Shared term extraction and disease gene annotation
We then filtered out genes with minimal term associations and looked for shared terms within the network (Fig. 12). Interestingly, when we used the disease-association highlighting feature in WPS (see Additional file 6 for a screenshot), we found that Rab4a, Ldlr, and Srebf1 are all associated, to varying degrees, with obesity as annotated in the Genetic Association Database [33] and the MedGene database [34] (data not shown).

Gene-gene networks and neighbor identification
To study how genes are related to each other directly within the network, we merged the gene-term association network in Figure 12
To learn more about the relationships among these "distinct genes", we highlighted them in red with WPS (see Additional file 14, Fig. 13).

Discussion

WPS has unique features not found in a single similar application
WPS is a new software program that facilitates and enhances the analysis of HTP data. Its unique features include the ability to simultaneously display HTP data from multiple experiments within the context of known biological pathways; to visualize and analyze gene-pathway/term and gene-gene relationships and their biological implications within created gene-term association networks; to extract a gene list that may reflect certain biological themes by means of a user-defined pattern template with color cues; and to statistically estimate the enrichment of biological pathways or GO terms within a distinguished gene list or a PSCP file under analysis (see Table 2 for a comparison with several free and commercial pathway analysis tools).
WPS also interfaces easily with clustering programs by accepting gene lists from clustering analysis. This will aid in identifying interactions among biological pathways and in relating the expression profiles of genes of unknown function to those of established pathways. The program accepts data from any microarray platform (including oligonucleotide arrays and cDNA arrays) and accommodates data generated by SAGE (serial analysis of gene expression) [51] as well as proteomics data. In summary, WPS was developed to provide the following important features, which were not previously available in a single application (Table 2):

Analyze multiple datasets simultaneously
First, many current programs display data from just a single HTP experiment. This limitation hampers direct visual comparison of results from different HTP experiments. When the number of datasets is large, as in a typical time-course experiment, it becomes much harder for investigators to keep track of what is happening at each time point. To our knowledge, the way WPS displays multiple datasets simultaneously is absent from most, if not all, free and commercial pathway-based HTP data analysis tools. Even a very large number of datasets, if pre-processed and combined into the same or similar categories, can still be effectively displayed and visualized in the program. In our example, shown in Figure 5

Analyze multiple pathways simultaneously and generate GTANs to explore gene-term relations in an intuitive graphical manner
The second feature that distinguishes WPS from current microarray programs is the ability to display multiple pathways simultaneously, either in their entirety or in summary form. A given collection of pathway files can be grouped into a single WSCP file by means of pathway tags, and changes in the behavior of genes in each pathway can be flagged according to criteria specified by the user.
Thus, the pathway(s) significantly affected by the experimental conditions are easily identified without having to visualize each pathway individually. Furthermore, generating and manipulating gene-term association networks (GTANs) greatly expands the capacity to study gene-term and gene-gene relationships genome-wide and provides a new way to look at genes and their associated pathways or functional GO terms. WPS has the statistical capacity, specifically through Fisher's exact test, to identify over-represented biological themes (pathways/processes/GO terms) in a given list of genes. More importantly, filtering GTANs based on Fisher's exact test results gives rise to a subnetwork enriched in statistically significant genes and terms/pathways. This helps narrow down the "core" genes and their associated terms/pathways of higher biological priority. An example of network filtering guided by Fisher's exact test results is described in Fig. 11. Currently, the prediction and creation of genome-wide pathways [55], as well as the use and exploration of biological networks (genetic, regulatory, and biochemical) as a means of data analysis, is becoming a major trend in systems biology and computational biology [56,57]. Many new tools and algorithms are being developed in this direction [58-61]. Many use complex approaches such as Bayesian networks, Petri nets, probabilistic graphical models or newly defined rules to simulate regulatory networks and the dynamic trajectories of genes. In addition, most commercial tools use gene-gene relationships as the main components of their networks, so users may easily lose track of the role of genes.
The GTAN approach in WPS is not only simple to use but also distinctive and effective, because gene-term association relationships are the main components of the network; users can therefore easily keep track of genes and their functional terms or pathways, and predict gene-gene relationships through shared terms.

Color cue template-based pattern extraction of gene lists for biological themes
A third unique feature of WPS is pattern extraction, which differs from other pattern- or profile-based approaches (e.g., typical clustering and classification methods) and from some statistics-based methods (e.g., SAM [18]). Instead of relying solely on data values, pattern extraction in WPS takes advantage of user-defined color criteria in the CRI files representing HTP datasets. Because users define criteria in CRI files with logical expressions rather than purely on the data values of genes, genes with quite different data values can be assigned to the same behavioral class. For example, a gene with a 2-fold change may be placed in the same category as a gene with an 8-fold change if the user defines "at least 2-fold" as up-regulated. If the user assigns red to this criterion, red then represents the class of up-regulated genes with at least a 2-fold change, whether the actual change is 2-fold or 8-fold, as long as the user is confident in the definition based on experience. In fact, one can strengthen this definition by adding other quality-control factors, such as a p-value, to the logical expression, which is another advantage of CRI color criteria. Thus, this kind of definition removes numerical differences while retaining the biological meaning embedded in the data values, since biological processes are more qualitative than quantitative in most cases.
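The color criteria described above are, in effect, ordered logical rules over several per-gene values. The sketch below assumes fold change expressed as a ratio (so down-regulation means at most 0.5) and an illustrative p-value cutoff of 0.05; the `classify` function and `CRITERIA` list are hypothetical and do not reproduce the actual CRI file format.

```python
from typing import Callable, List, Optional, Tuple

# A criterion is (color, rule over fold change and p-value); first match wins.
Criterion = Tuple[str, Callable[[float, float], bool]]

# Hypothetical criteria mirroring the text: "at least 2-fold" as up-regulated,
# strengthened with a p-value cutoff (0.05 is an illustrative choice).
CRITERIA: List[Criterion] = [
    ("red", lambda fc, p: fc >= 2.0 and p < 0.05),    # up-regulated
    ("green", lambda fc, p: fc <= 0.5 and p < 0.05),  # down-regulated
]


def classify(fold_change: float, p_value: float,
             criteria: List[Criterion] = CRITERIA) -> Optional[str]:
    """Return the color of the first criterion the gene satisfies, or None."""
    for color, rule in criteria:
        if rule(fold_change, p_value):
            return color
    return None


print(classify(2.0, 0.01))   # 2-fold change, significant
print(classify(8.0, 0.001))  # 8-fold change, significant
print(classify(8.0, 0.2))    # 8-fold change, weak p-value
```

Both the 2-fold and the 8-fold change fall into the same red class, while an 8-fold change with a weak p-value stays uncolored, which is the qualitative grouping the text describes.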
Limitations and future direction
WPS facilitates comprehensive analysis and visualization of HTP data within the context of known biological pathways and gene-term association networks. The program will continue to be improved as the characterization of biological pathways and networks becomes increasingly comprehensive and challenging. The ultimate goal of WPS is to integrate all available information and databases, as well as an individual user's data in different forms and formats, in the context of biological pathways and networks. The current version of WPS is a Windows-based program and serves as a proof of concept for pathway/gene analysis of HTP data. A future version will move to a three-tier architecture on a production-scale platform, allowing WPS to communicate through a middle layer, such as Java Servlets, with server resources that may offer substantial data storage and computation capacity. Its front-end interface will also evolve into a platform-independent client, such as a web browser, depending on resource performance and other factors. Integration with other data sources and additional pathways is also planned, so that the scale of HTP data analysis can be greatly extended with the power of an expandable server. WPS provides Fisher's exact test for statistical evaluation both globally and locally within the current PSCP file under analysis, whether that file is derived from the internal database or customized by the user. It could be improved by adding more sophisticated statistical utilities, such as false discovery rate (FDR) estimation [62] or other enhancements (e.g., bootstrap methods), to analyze HTP data and determine the significance of functional enrichment in individual pathways or GO terms more rigorously. Computational requirements limit the full integration of these statistical methods, but even without them, our software contributes significantly to improving integrative data analysis.
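The per-term Fisher's exact test described under Implementation, together with FDR estimation as one of the enhancements suggested here, can be sketched in standard-library Python. Assumptions, since the text does not specify them: the test is one-sided (right tail, testing for over-representation), p-values are adjusted with the Benjamini-Hochberg procedure as one common FDR method, and the function names and toy gene sets are hypothetical.

```python
from math import comb
from typing import Dict, Set


def fisher_enrichment_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided (right-tail) Fisher's exact test p-value for a 2 x 2 table.

    a: in list and in term     b: in list, not in term
    c: not in list, in term    d: in neither
    """
    n_list, n_term, total = a + b, a + c, a + b + c + d
    # Hypergeometric probability of observing a or more list genes in the term.
    tail = sum(comb(n_term, x) * comb(total - n_term, n_list - x)
               for x in range(a, min(n_list, n_term) + 1))
    return tail / comb(total, n_list)


def term_enrichment(gene_list: Set[str], universe: Set[str],
                    term_to_genes: Dict[str, Set[str]]) -> Dict[str, float]:
    """Compute one p-value per term by building its 2 x 2 table."""
    hits = gene_list & universe
    results = {}
    for term, genes in term_to_genes.items():
        members = genes & universe
        a = len(hits & members)
        b = len(hits - members)
        c = len(members - hits)
        d = len(universe) - a - b - c
        results[term] = fisher_enrichment_p(a, b, c, d)
    return results


def benjamini_hochberg(pvalues: Dict[str, float]) -> Dict[str, float]:
    """Benjamini-Hochberg adjusted p-values (step-up, capped at 1)."""
    ordered = sorted(pvalues.items(), key=lambda kv: kv[1])
    m = len(ordered)
    adjusted: Dict[str, float] = {}
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        term, p = ordered[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[term] = running_min
    return adjusted


# Toy example: 10 genes, a 3-gene list, two hypothetical terms.
universe = {f"g{i}" for i in range(10)}
gene_list = {"g0", "g1", "g2"}
terms = {"T1": {"g0", "g1", "g2", "g3"}, "T2": {"g5", "g6"}}
p = term_enrichment(gene_list, universe, terms)
q = benjamini_hochberg(p)
for term in sorted(p, key=p.get):  # ranked from smallest to largest p-value
    print(term, p[term], q[term])
```

Ranking terms by raw p-value reproduces the ordering the text describes; the adjusted values only add a correction for testing many terms at once.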
Conclusion
We have described WPS, a new pathway-based analysis tool that facilitates and enhances the analysis of HTP data in the context of biological pathways and networks. WPS has many unique features not found in any single existing application. WPS implements a clustering-like approach, but on a more biologically relevant basis, through its color cue-templated pattern extraction method. In addition, WPS uses Fisher's exact test to evaluate the statistical significance of identified genes. Finally, WPS uses pathway- and association network-based biological contexts as a platform, together with a unique coloring scheme across multiple datasets and multiple pathways, as an intuitive way to visualize and analyze data from different sources. This is likely to be important for comparing HTP data from diverse sources such as microarrays and proteomics. Within WPS, the new approach to pattern extraction may provide another dimension for uncovering genes with quality-based, not just quantity-based, expression patterns, likely with implications and themes more closely related to ongoing biological processes. The new way of visualizing and analyzing the biological relations among genes, pathways, and terms in GTANs provides a new platform for integrated discovery. This tool represents a pathway-based platform for discovery integration to maximize analysis power.

Availability and requirements
Project name: Pathway analysis tool WPS for high-throughput data
Project home page: http://www.abcc.ncifcrf.gov/wps/wps_index.php [63]
Operating system: Microsoft Windows 2000 or XP
Programming language: Microsoft Visual Basic 6
Other requirements: Internal databases for different species and a collection of over 1900 PathwayScope files (PSCP files for mouse), available on the web site; additional user-provided PSCP files and those from other sources will be made available as they are collected.
License: Free to academics; distributed through license agreement
Any restrictions to use by non-academics: commercial license needed

List of abbreviations used
WPS – WholePathwayScope
PSCP – PathwayScope File
WSCP – WholeScope File
CRI – Criteria File
GTAN – Gene-Term Association Network
HTP – High Throughput
G5G8 – ABCG5 and ABCG8
ABC – ATP-binding cassette
Tg – Transgenic
KO – Knockout
KEGG – Kyoto Encyclopedia of Genes and Genomes
CGAP – The Cancer Genome Anatomy Project
SVG – Scalable Vector Graphics technology
GO – Gene Ontology
SAGE – Serial analysis of gene expression

Authors' contributions
M Y – Programmer Analyst III and Scientific Application Specialist at ABCC, responsible for the design and implementation of WPS; a former member of UTSW.
J D H – Assistant Professor at UTSW; a collaborator who provided the microarray data in the application examples.
J C C – Associate Professor at UTSW; a scientific partner of H H H, responsible for the initial design of WPS at UTSW.
H H H – Director of the McDermott Center and Investigator of HHMI, responsible for the initial funding and design of WPS at UTSW.
R M S – Senior author and supervisor of M Y at ABCC, responsible for the design and improvement of WPS and for funding WPS at ABCC.

Additional File 1
A Microsoft PowerPoint file with screenshots describing the features for displaying a Biocarta pathway graph and highlighting selected genes to display their data in the graph. Slide 1: The window for creating a PSCP file for a Biocarta pathway from the internal database. Slide 2: The PSCP file including all the genes in the created Biocarta pathway "FXR and LXR Regulation of Cholesterol Metabolism". Slide 3: The created PSCP file colored with loaded CRI files (the time-course data used in Fig. 5).
(521K, ppt)

Additional File 2
A Microsoft PowerPoint file with a screenshot of a raw microarray dataset in an Excel worksheet, illustrating the format and the three requirements of a data file to be converted into a CRI file in WPS.
(122K, ppt)

Additional File 3
A Microsoft Word file with a detailed description of the three types of files in WPS.
(35K, doc)

Additional File 4
A Microsoft PowerPoint file with screenshots describing pattern extraction of genes from a colored PSCP file. Slide 1: A colored PSCP file (previously loaded with CRI files) subjected to pattern extraction. Slide 2: The pattern extraction window for extracting genes from the colored PSCP file in slide 1 that match the color pattern defined in the color template panel. Slide 3: The PSCP file created from the genes extracted in slide 2, colored with the same set of CRI files to verify their pattern.
(285K, ppt)

Additional File 5
A Microsoft Word file describing the feature for manipulating and filtering GTANs.
(25K, doc)

Additional File 6
A Microsoft PowerPoint file with a screenshot of the window for searching a network for specific genes, terms, or disease-associated genes. A disease selected from the database is used to search for and highlight the associated genes in the current GTAN/PSCP file for further analysis.
(130K, ppt)

Additional File 7
A Microsoft PowerPoint file with a slide illustrating a 2 × 2 contingency table used as the basis for Fisher's exact test.
(17K, ppt)

Additional File 8
A Microsoft Excel file containing an example result of Fisher's exact test exported from WPS, corresponding to Figure 7.
(23K, xls)

Additional File 9
A Microsoft PowerPoint file with screenshots describing the pathway- or PSCP-scoped "local Fisher's exact test", which evaluates the enrichment of user-defined patterns among chosen genes colored with CRI file(s) in the PSCP file being analyzed. Slide 1: a colored PSCP file (previously loaded with CRI files) subjected to the "local Fisher's exact test". Slide 2: the "local Fisher's exact test" window, which statistically measures the enrichment of genes meeting user-defined criteria; in this example, the degree of enrichment of differentially expressed genes (red and green in the color template panel) for each dataset within this pathway.
(264K, ppt)

Additional File 10
A Microsoft PowerPoint file with a screenshot of the WPS window for sorting a dataset into pathway- or term-scoped sub-datasets for further processing.
(148K, ppt)

Additional File 11
A Microsoft Word file describing the materials and methods used to prepare the microarray data that illustrate the program features.
(25K, doc)

Additional File 12
A Microsoft Excel file containing the complete pairwise gene-term relations in the GTAN in Fig. 9.
(33K, xls)

Additional File 13
A TIFF image illustrating a filtered GTAN derived from the GTAN in Fig. 9.
(1.2M, tiff)

Additional File 14
A TIFF image illustrating a GTAN derived from the GTAN in Fig. 12.
(749K, tiff)

Additional File 15
A Shockwave Flash movie clip demonstrating how to convert a dataset file into a CRI file for use in WPS.
(Note: the movie files can be viewed directly in an internet browser with a Flash Animation plug-in.)
(2.5M, swf)

Additional File 16
A Shockwave Flash movie clip demonstrating how to load CRI file(s) to color a PSCP or WSCP file.
(2.7M, swf)

Additional File 17
A Shockwave Flash movie clip demonstrating how to create a PSCP or WSCP file from the internal database.
(1.1M, swf)

Additional File 18
A Shockwave Flash movie clip demonstrating how to perform pattern extraction from selected CRI file(s), how to run the global Fisher's exact test on a given list (e.g. a gene list from pattern extraction), and how to create a GTAN from a given list.
(2.7M, swf)

Additional Files 19–34
Zip files, each containing one of the 16 microarray CRI files used in the application examples in the manuscript (Day-5.zip, Day-3.zip, Day1.zip, Day5.zip, Day10.zip, Day14.zip, Day18.zip, Day21.zip, Day30.zip, Day60.zip, Day90.zip, G5G8KO1.zip, G5G8KO2.zip, G5G8KO3.zip, G5G8Tg1.zip, G5G8Tg2.zip). Unzip them with WinZip or another suitable program before use.
(2.7M, zip)

Additional Files 35–50
Zip files, each containing one of the 16 raw microarray files (Excel files) used in the application examples in the manuscript (rDay-5.zip, rDay-3.zip, rDay1.zip, rDay5.zip, rDay10.zip, rDay14.zip, rDay18.zip, rDay21.zip, rDay30.zip, rDay60.zip, rDay90.zip, rG5G8KO1.zip, rG5G8KO2.zip, rG5G8KO3.zip, rG5G8Tg1.zip, rG5G8Tg2.zip). Unzip them with WinZip or another suitable program before use.
(2.3M, zip)

Acknowledgements
We thank Linda Giang, Jigui Shan, Tammy Qiu, and Gary Smythers for technical assistance. We also thank Yanhui Hu and Joshua Labaer from Harvard Medical School for kindly providing part of the MedGene database, and Richard A. Lempicki and Wei Gao from NIAID, NIH for technical assistance and valuable discussion.
We sincerely thank Carl Schaefer from the National Cancer Institute Center for Bioinformatics (NCICB) for providing CGAP Biocarta pathway data and information, as well as other technical assistance. We especially thank Robert Guzman, Norma Anderson and Esther Nie from UT Southwestern Medical Center for excellent technical assistance. We also thank David W. Russell, Alexander Pertsemlidis and Jeff Schageman from UT Southwestern Medical Center for helpful discussions. This work has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, under Contract No. N01-CO-12400. Initial funding came from the Howard Hughes Medical Institute and the National Institutes of Health (R01 HL72304 and NHLBI Program for Genomic Applications U01 HL66880).