GEO Overview
General overview
GEO is an international public repository that archives and freely distributes microarray, next-generation sequencing, and other forms of high-throughput functional genomics data submitted by the research community.
The three main goals of GEO are to:
- Provide a robust, versatile database in which to efficiently store high-throughput functional genomic data (see Data organization)
- Offer simple submission procedures and formats that support complete and well-annotated data deposits from the research community (see Submission guide)
- Provide user-friendly mechanisms that allow users to query, locate, review and download studies and gene expression profiles of interest (see Query and analysis)
Please see the GEO Documentation listings to find more information about various aspects of GEO.
Data organization
GEO records are organized as follows:
Selected primary records undergo a further, upper level of processing by GEO into DataSet and gene Profile records:
- DataSet: DataSet records are assembled by GEO curators. A GEO Series record is an original submitter-supplied record that summarizes an experiment. These data are reassembled by GEO staff into GEO DataSet records (GDSxxx). A DataSet represents a curated collection of biologically and statistically comparable GEO Samples and forms the basis of GEO's suite of data display and analysis tools. Samples within a DataSet refer to the same Platform, that is, they share a common set of array elements. Value measurements for each Sample within a DataSet are assumed to be calculated in an equivalent manner, that is, considerations such as background processing and normalization are consistent across the DataSet. Information reflecting experimental factors is provided through DataSet subsets. Both Series and DataSets are searchable using the GEO DataSets interface, but only DataSets form the basis of GEO's advanced data display and analysis tools, including gene expression profile charts and DataSet clusters. Not all submitted data are suitable for DataSet assembly, and we are experiencing a backlog in DataSet creation, so not all Series have corresponding DataSet record(s). For more information, see the About GEO DataSets page.
- Profile: Profiles are derived from DataSets. A Profile consists of the expression measurements for an individual gene across all Samples in a DataSet. Profiles can be searched using the GEO Profiles interface. For more information, see the About GEO Profiles page.
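The relationship between a DataSet, its subsets, and a Profile can be sketched as a small data model. This is an illustrative sketch only: the `DataSet` class, its field names, and the sample names and values below are hypothetical and do not reflect GEO's internal schema or file formats.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataSet:
    """Hypothetical in-memory model of a curated DataSet (not GEO's actual schema)."""

    accession: str                    # a GDSxxx-style identifier
    platform: str                     # every Sample in a DataSet shares one Platform
    samples: List[str]                # Sample identifiers, in column order
    subsets: Dict[str, List[str]]     # experimental factor level -> member Samples
    values: Dict[str, List[float]] = field(default_factory=dict)  # gene -> one value per Sample

    def profile(self, gene: str) -> Dict[str, float]:
        """Return one gene's measurements across all Samples in the DataSet."""
        return dict(zip(self.samples, self.values[gene]))

    def profile_by_subset(self, gene: str) -> Dict[str, List[float]]:
        """Group the gene's measurements by DataSet subset."""
        prof = self.profile(gene)
        return {name: [prof[s] for s in members] for name, members in self.subsets.items()}


# Made-up example data, for illustration only.
ds = DataSet(
    accession="GDSxxx",
    platform="example-platform",
    samples=["s1", "s2", "s3", "s4"],
    subsets={"control": ["s1", "s2"], "treated": ["s3", "s4"]},
    values={"geneA": [1.0, 1.2, 3.1, 2.9]},
)
print(ds.profile("geneA"))
print(ds.profile_by_subset("geneA"))
```

Keeping values as one list per gene, aligned to the Sample order, mirrors the idea that a Profile is simply one gene's row read across every Sample in the DataSet.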
Query and analysis
GEO data can be retrieved and analyzed in several ways:
- To look at a particular GEO record for which you have the accession number, use the GEO accession box located on the GEO homepage or at the top of each GEO record.
- To download data, see the various options described on the Download GEO data page.
- To quickly locate data relevant to your interests, search GEO DataSets and GEO Profiles:
  - GEO DataSets is a study-level database that users can search for studies relevant to their interests. The database stores descriptions of all original submitter-supplied records, as well as curated DataSets. More information about GEO DataSets and how to interpret GEO DataSets results pages can be found on the About GEO DataSets page.
  - GEO Profiles is a gene-level database that users can search for gene expression profiles relevant to their interests. More information about GEO Profiles and how to interpret GEO Profiles results pages can be found on the About GEO Profiles page.

  GEO DataSets and GEO Profiles searches can be performed by simply entering appropriate keywords and phrases into the search box. However, given the large volumes of data stored in these databases, it is often useful to perform more refined queries in order to filter down to the most relevant data. Examples and full details about how to perform sophisticated queries are provided in the Querying GEO DataSets and GEO Profiles page. Additionally, the Advanced Search tool, linked at the head of the GEO DataSets and GEO Profiles pages, assists greatly in the construction of complex queries.
- Once you have identified a DataSet of interest, several features on the DataSet record help identify interesting gene expression profiles within that study, including a t-test tool and clusters. Full information about these features is provided on the About GEO DataSets page.
- Once you have identified gene expression profiles of interest, several links on the Profile records help identify additional genes of interest, including similarly expressed genes or genes within close proximity on the chromosome. Full information about these links is provided on the About GEO Profiles page.
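For scripted access, the accession lookup and keyword searches described above can be approximated by building request URLs. The following is a minimal sketch, not an official client. It assumes the public GEO accession viewer at `acc.cgi` and the NCBI E-utilities `esearch` endpoint, with `gds` and `geoprofiles` as the Entrez database names for GEO DataSets and GEO Profiles. None of these names come from this page, so check them against current NCBI documentation before relying on them. The helper names, the accession `GDS1234`, and the query term are placeholders. The code only constructs URLs and makes no network requests.

```python
from urllib.parse import urlencode

# Assumed public endpoints (not stated on this page).
GEO_ACC_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Assumed Entrez database names for the two search interfaces.
ENTREZ_DB = {"datasets": "gds", "profiles": "geoprofiles"}


def accession_url(accession: str) -> str:
    """Build a URL for viewing a single GEO record by accession number."""
    return f"{GEO_ACC_URL}?{urlencode({'acc': accession.strip().upper()})}"


def search_url(interface: str, term: str, retmax: int = 20) -> str:
    """Build an esearch URL for a keyword query against GEO DataSets or GEO Profiles."""
    params = {"db": ENTREZ_DB[interface], "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Placeholder accession and query term, for illustration only.
print(accession_url("gds1234"))
print(search_url("datasets", "kinase AND Homo sapiens[Organism]"))
```

Field-qualified terms such as `[Organism]` are one way to narrow a query. The Querying GEO DataSets and GEO Profiles page remains the authoritative list of supported fields.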