
Entrez GEO Profiles and Entrez GEO DataSets query tutorial

Background

GEO stores a wide assortment of high-throughput experimental data, processed in a variety of ways. To enable statistical analyses, GEO data are assembled into comparable sets, or GEO DataSets (GDS). A GDS represents a curated collection of biologically and statistically comparable GEO samples. Samples within a dataset refer to the same platform, that is, they interrogate a common set of elements. Calculations are performed on the VALUE column provided by the submitter in GEO sample data tables. These value measurements are assumed to be calculated in an equivalent manner for each sample within a dataset, that is, considerations such as background processing and normalization are consistent across the dataset. GDSs form the basis of GEO's query, data display, and analysis tools.

GDS experiment descriptions and value measurements are stored in two NCBI Entrez databases:

Entrez GEO DataSets:
Entrez GEO DataSets contains curated GEO dataset definitions to facilitate identification of experiments of interest. Entrez GEO DataSets can be searched with any text found in either the curated DataSet or the original submitter-supplied GEO records that make up the DataSet.

Entrez GEO Profiles:
Entrez GEO Profiles stores individual, dataset-specific gene expression profiles. Entrez GEO Profiles may be used to query for specific genes of interest, profiles of interest based on flagged significant effects or similar expression profile patterns, and related profiles based on sequence similarity.

Query construction

Data of interest may be located by entering text in the Entrez GEO DataSets or Entrez GEO Profiles search boxes (or, equivalently, in the "query DataSets" and "query Gene profiles" boxes on the GEO home page). As with other Entrez databases, searches may be refined using a Boolean phrase restricted to any number of supported attribute fields. The Preview/Limits link on the Entrez GEO DataSets and Entrez GEO Profiles pages helps in constructing complex queries. Alternatively, complex search statements can be written and executed directly in the search boxes. To perform such a search, specify the search terms, their fields, and the Boolean operations to perform on the terms using the following syntax:

    term [field] OPERATOR term [field]

where term(s) are the search terms, the field(s) are the search fields and qualifiers, and the OPERATOR(s) are the Boolean operators (AND, OR, NOT). The indexes (available on the Preview/Limits page) may be used to browse and/or select the terms by which data are described.
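
The syntax above can also be assembled programmatically. The following Python sketch is illustrative only: the helper names field_term and combine are hypothetical and are not part of any NCBI tool. It quotes multi-word terms and attaches the field in square brackets, following the form used in the examples in this tutorial.

```python
def field_term(term, field=None):
    """Format one search term; quote multi-word terms and add an optional [field]."""
    text = f'"{term}"' if " " in term else term
    return f"{text}[{field}]" if field else text


def combine(operator, *clauses):
    """Join already-formatted clauses with one Boolean operator (AND, OR, NOT)."""
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {operator}")
    return f" {operator} ".join(clauses)


# Hypothetical usage: three clauses joined with AND.
query = combine(
    "AND",
    field_term("dual channel", "Experiment Type"),
    field_term("metastasis"),
    field_term("human", "Organism"),
)
print(query)
```

The resulting string can be pasted directly into either search box. Field names must match those listed for the database being searched.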

Fields for Entrez GEO DataSets include:
[All Fields] [Author] [Experiment Publication Date] [Experiment Type] [Filter] [GDS Creation/Update Date] [GDS Text] [GEO Accession] [GEO Description/Title Text] [Gene Name or Description] [Number of Samples] [Number of Platform Probes] [Organism] [Platform Reporter Type] [Reporter Identifier] [Sample Source] [Sample Title] [Sample Value Type] [Submitter Institute] [Subset Description] [Subset Variable Type]

Fields for Entrez GEO Profiles include:
[All Fields] [Experiment Type] [Filter] [Flag Information] [Flag Type] [GDS Text] [GEO Accession] [GEO Description/Title Text] [GI] [Gene Description] [ID_REF] [Max Value Rank] [Min Value Rank] [Number of Samples] [Organism] [Platform Reporter Type] [Ranked Standard Deviation] [Reporter Identifier] [Sample Source] [Sample Value Type]

The following query examples further demonstrate how to effectively mine GEO data.

Search for an experiment of interest

Search for datasets of interest using Entrez GEO DataSets. Entrez GEO DataSets retrievals display the dataset title, a brief experiment description, organism, experimental variables, and links to the complete GDS record, parent platform, and reference series records.

Example:
    To identify all dual channel nucleotide microarray experimental datasets exploring metastasis in humans, use the "query DataSets" (or Entrez GEO DataSets) box, and enter:

    "dual channel"[Experiment Type] AND metastasis AND human[Organism]
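
The same query can in principle be submitted outside the web page. The sketch below only builds a request URL and does not contact any server. It rests on assumptions that are not stated in this tutorial: that the NCBI E-utilities ESearch endpoint is used, that "gds" is the database name for Entrez GEO DataSets, and that the field names shown on this page are accepted by that service. The function name build_esearch_url is hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: public E-utilities ESearch endpoint (not described in this tutorial).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(query, db="gds", retmax=20):
    """Return an ESearch URL for `query`; db="gds" is assumed to mean GEO DataSets."""
    params = {"db": db, "term": query, "retmax": retmax}
    # urlencode percent-escapes quotes, brackets, and spaces in the query.
    return f"{ESEARCH_URL}?{urlencode(params)}"


query = '"dual channel"[Experiment Type] AND metastasis AND human[Organism]'
url = build_esearch_url(query)
print(url)

# Decoding the URL again recovers the original query text unchanged.
decoded = parse_qs(urlsplit(url).query)
print(decoded["term"][0])
```

Building the URL with urlencode avoids hand-escaping the quotation marks and square brackets that Entrez queries rely on.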

Search for a gene of interest

Search for a gene of interest using Entrez GEO Profiles. Entrez GEO Profiles retrievals display individual, precomputed, dataset-specific gene expression/molecular abundance profile charts. Click the chart thumbnails to view a breakdown of the dataset experimental design. Entrez GEO Profiles retrievals also display whatever gene identifier information is available (gene name, GenBank accession, clone ID, ORF), mapping information, the dataset title, and additional flags regarding outliers and detection calls. Retrievals are listed most interesting first, based on a scoring scheme that considers flagged effects, expression level, outliers, and variability.
Profiles for a gene of interest may be located by entering all or part of the gene name, gene symbol, alias, GenBank accession number, clone ID, ORF name, or the reference identifier (ID_REF) from the parent platform.

Example:
    To view profiles of kallikrein family genes across all datasets, use the "query Gene profiles" (or Entrez GEO Profiles) box, and enter:

    kallikrein

    To limit these kallikrein retrievals to datasets investigating progesterone, enter:

    kallikrein AND progesterone[GDS Text]

Search for interesting/significant/specific gene expression profiles

Several fields are available to refine an Entrez GEO Profiles search to help identify interesting or significant molecular abundance profiles.

GEO datasets are partitioned into subsets that reflect experimental design. Genes are flagged as having significant effects in relation to subset types if the values or ranks pass a threshold of statistical difference between any subset containing more than one sample and another subset. Thus, queries can be made for genes that show interesting expression profiles with regard to experimental subsets.

Example:
    To view profiles showing interesting value subset effects in either dataset GDS186 or GDS187, use the "query Gene profiles" (or Entrez GEO Profiles) box, and enter:

    (GDS186 OR GDS187) AND "value subset effect"[Flag Type]

The value measurements of each sample in a dataset are rank-ordered. It is possible to refine searches to view genes with profiles that fall within a specified abundance bracket.

Example:
    To view profiles that fall into the top 1% abundance rank bracket in at least one sample in dataset GDS186, use the "query Gene profiles" (or Entrez GEO Profiles) box, and enter:

    GDS186 AND 100[Max Value Rank]

    A range can also be specified. To view the top 5%, enter:

    GDS186 AND 96:100[Max Value Rank]
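
The two rank examples above follow a simple pattern, sketched below in Python. The sketch assumes that ranks are whole-number percentile brackets from 1 to 100, which is inferred from the examples rather than stated in this tutorial. The function name top_percent_rank is hypothetical.

```python
def top_percent_rank(percent, field="Max Value Rank"):
    """Return a rank term covering the top `percent` brackets (assumed range 1-100)."""
    if not isinstance(percent, int) or not 1 <= percent <= 100:
        raise ValueError("percent must be an integer from 1 to 100")
    low = 101 - percent  # top 5% -> brackets 96 through 100
    rank = "100" if low == 100 else f"{low}:100"
    return f"{rank}[{field}]"


print("GDS186 AND " + top_percent_rank(1))  # GDS186 AND 100[Max Value Rank]
print("GDS186 AND " + top_percent_rank(5))  # GDS186 AND 96:100[Max Value Rank]
```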

Variability for each gene is calculated using the standard deviation of rank across the dataset. It is possible to refine searches to view highly variable gene expression profiles across a dataset.

Example:
    To view profiles that fall into the top 1% most variable molecular abundance profiles in dataset GDS186, use the "query Gene profiles" (or Entrez GEO Profiles) box, and enter:

    GDS186 AND 100[Ranked Standard Deviation]
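
To make the idea of rank-based variability concrete, the following Python sketch ranks the values within each sample and then takes the standard deviation of each gene's ranks across samples. It is a toy illustration only: the bracket formula, the handling of ties, the use of the population standard deviation, and the function names are all assumptions, and GEO's actual calculation is not described in this tutorial.

```python
from statistics import pstdev


def percentile_ranks(values):
    """Map each value in one sample to a bracket from 1 to 100 (ties not handled)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = position * 100 // len(values) + 1
    return ranks


def rank_variability(samples):
    """samples: one list of values per sample, all listing genes in the same order."""
    ranked = [percentile_ranks(sample) for sample in samples]
    # zip(*ranked) regroups the ranks so each tuple holds one gene across samples.
    return [pstdev(gene_ranks) for gene_ranks in zip(*ranked)]


# Made-up values for four genes measured in three samples.
samples = [
    [10.0, 200.0, 55.0, 90.0],
    [12.0, 30.0, 60.0, 95.0],
    [11.0, 400.0, 58.0, 85.0],
]
variability = rank_variability(samples)
most_variable = max(range(len(variability)), key=lambda i: variability[i])
print(variability, most_variable)
```

In this made-up data the first gene always has the lowest rank, so its variability is zero, while the second gene changes rank the most between samples.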

Search for a sequence of interest

The GEO BLAST tool queries Entrez GEO Profiles for molecular abundance profiles of interest based on nucleotide sequence similarity. The GEO BLAST database contains all GenBank identifiers represented on microarray platforms or SAGE libraries in GEO. This interface is helpful in identifying sequence homologs of interest, e.g., related gene family members or for cross-species comparisons.


