The Gene Expression Omnibus (GEO) database, the first public repository for gene expression data, premiered at NCBI in July 2000. GEO holds a wide assortment of high-throughput experimental data, including single- and dual-channel microarray experiments that measure the abundance of mRNA, genomic DNA, and protein molecules. It also archives data from non-array high-throughput functional genomics and proteomics technologies, such as serial analysis of gene expression (SAGE) and protein identification technology. To date, GEO contains data from almost 10,000 hybridization experiments and SAGE libraries covering 30 different organisms.
Several new tools and features have been developed to make it easier to explore, visualize, and analyze GEO data. To support these tools, GEO data are first assembled into comparable sets called GEO DataSets (GDS). A GDS is a collection of biologically and statistically comparable GEO samples. Two new Entrez databases query these datasets: Entrez GEO and Entrez GEO DataSets (Entrez GDS). Entrez GDS searches dataset definitions and the original experimental annotation to help users identify experiments of interest. Entrez GEO displays individual gene expression/molecular abundance profiles from each dataset.
Searching and browsing GEO
Several methods are available for searching, browsing, and retrieving GEO data. A specific GEO record can be retrieved by entering a valid GEO accession number in the Accession Display toolbar on the GEO home page. A listing of GEO's current holdings is available from the 'Repository Browser' link on the home page. The 'DataSet Browser' link displays the full collection of GDSs, which can be sorted by title, platform type, GEO platform (GPL) identifier, organism, and GDS accession. More sophisticated searches of GEO data, and links to other Entrez databases, are available through Entrez GEO and Entrez GDS.
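Entrez GDS queries can also be issued programmatically. The sketch below is an assumption rather than something this article describes: it targets NCBI's E-utilities ESearch endpoint with `db=gds` (the Entrez database name for GEO DataSets) and only builds the request URL, so it runs offline. The helper name `build_gds_search_url` and the example query are hypothetical.

```python
from urllib.parse import urlencode

# ESearch endpoint of NCBI E-utilities (assumed here as the programmatic
# route into Entrez GDS; the article itself only describes the web pages).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_gds_search_url(term: str, retmax: int = 20) -> str:
    """Return an ESearch URL that searches the GEO DataSets ("gds") database.

    Hypothetical helper: it only builds the URL and does not fetch it.
    """
    params = {
        "db": "gds",       # Entrez database name for GEO DataSets
        "term": term,      # ordinary Entrez query syntax, field tags allowed
        "retmax": retmax,  # maximum number of record IDs to return
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Example query: human datasets mentioning muscular dystrophy.
    print(build_gds_search_url('muscular dystrophy AND "Homo sapiens"[Organism]'))
```

Fetching the resulting URL (for example with `urllib.request.urlopen`) should return an XML list of matching GDS record IDs; NCBI's E-utilities usage guidelines apply to any real requests.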
The Quick Query Builder on the GEO home page simplifies the construction of common Entrez GEO and Entrez GDS queries. To search for an experiment of interest, submit a query under the 'Datasets' tab or from the GEO DataSets database in Entrez. This searches all dataset annotation, including the GDS description, reference series and sample descriptions, titles, keywords, source material, contributors, authors, and organisms, as well as general technical information such as experiment type, probe type, and value measurement type. The results list all datasets that match the search criteria.
To search for individual gene expression/molecular abundance profiles, submit a query under the 'Gene Profiles' tab or from the GEO database in Entrez. A gene or molecule of interest can be searched for by gene name, symbol, or alias, or by sequence identifiers such as GenBank accession numbers, clone IDs, or ORF names. Several parameters are available to refine an Entrez GEO search and help identify interesting or significant molecular abundance profiles. GEO datasets are partitioned into subsets that reflect the experimental design, so queries can target differences related to a specific experimental variable such as age, developmental stage, or disease state. The relative abundance of a molecule and the degree of measurement variability can also be used as search parameters. Results are returned as a set of pre-computed molecular abundance profiles, such as the profile for human synaptopodin 2 shown in Figure 1. Profiles are listed most-interesting-first, according to a scoring scheme that considers statistically significant differences, expression level, outliers, and variability.
Figure 1: Expression profile for human synaptopodin 2 from GDS265, showing decreased expression in muscle biopsy tissue from Duchenne muscular dystrophy patients relative to controls.
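The article does not say exactly how GEO's scoring scheme works. As a minimal illustration of the "statistically significant differences" component only, the sketch below ranks probes by the absolute value of Welch's t-statistic between two subsets of a dataset, such as patient and control samples. This is a swapped-in textbook technique, not GEO's actual method: it ignores expression level, outliers, and variability, and the function names, probe names, and values are hypothetical toy data.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(subset_a: list[float], subset_b: list[float]) -> float:
    """Welch's t-statistic for the difference in mean values of two subsets.

    Each subset needs at least two values (statistics.variance requires it).
    """
    se_a = variance(subset_a) / len(subset_a)  # squared standard error, subset A
    se_b = variance(subset_b) / len(subset_b)  # squared standard error, subset B
    return (mean(subset_a) - mean(subset_b)) / sqrt(se_a + se_b)


def rank_profiles(
    profiles: dict[str, tuple[list[float], list[float]]],
) -> list[tuple[str, float]]:
    """Return (probe, t) pairs sorted by |t|, largest difference first."""
    scored = [(probe, welch_t(a, b)) for probe, (a, b) in profiles.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical toy values (not GEO data): (patient subset, control subset)
    toy = {
        "probe_A": ([5.1, 5.3, 4.9], [8.0, 8.4, 7.9]),
        "probe_B": ([6.0, 6.2, 5.8], [6.1, 5.9, 6.3]),
    }
    for probe, t in rank_profiles(toy):
        print(f"{probe}\t{t:.2f}")  # probe_A is listed first
```

Because GDS subsets reflect the experimental design, the same comparison could in principle be applied to any two-level variable, such as disease state or developmental stage.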
Entrez GEO can also be queried by nucleotide sequence similarity, which helps identify sequence homologs of interest, such as related gene family members, and supports cross-species comparisons across all GEO datasets. The Sequence BLAST search function on the GEO home page accepts a FASTA sequence, a GI number, or an accession number as input and runs a BLAST search against all sequences represented on the microarray platforms and SAGE libraries in GEO. Within Entrez GEO results, following the Profile Neighbors link from a selected expression profile displays the probes within the same dataset whose expression profiles are similar to the selected one. Following the Sequence Neighbors link returns sequences that are similar or identical to the query probe across all GEO datasets. Entrez GEO and Entrez GDS retrieval results are fully integrated with each other and with other Entrez databases, including Nucleotide, UniGene, MapViewer, and PubMed. All original GEO records, as well as GDS data, are available for download at:
Questions regarding the submission of data to GEO may be sent to:
General inquiries about GEO may be sent to the geo alias or to the NCBI Help Desk: