Frequently Asked Questions

  • Submission
  • Query and search


    Submission FAQ

    What is GEO?
    The Gene Expression Omnibus (GEO) is a public repository that archives and freely distributes microarray and other forms of high-throughput data submitted by the scientific community. In addition to data storage, a collection of web-based interfaces and applications is available to help users query and download the experiments and gene expression patterns stored in GEO. For more information, see publications.

    Why should I submit my data to GEO?
    There are several good reasons for submitting your data to us. The most likely reason is that the journal in which you are publishing your research requires deposit of microarray data to a MIAME-compliant public repository like GEO. We endeavor to make data deposit procedures as straightforward as possible and will provide as much assistance as you require to get your data submitted to GEO. If you have problems or questions about the submission procedures, just e-mail us at geo@ncbi.nlm.nih.gov and one of our curators will quickly get back to you. In addition to satisfying possible journal requirements for publication, there are other significant benefits to depositing data with GEO. Your data receive long-term archiving at a centralized repository, and are integrated with other NCBI resources, which affords greatly increased usability and visibility. You may also include links back to your own project websites within your submission, again increasing visibility of your research. Journal publication is not a requirement for data submission to GEO.

    How do I submit my data to GEO?
    To submit data, you first need to establish your identity with us by setting up your own GEO account, with a private username and password. The contact information you supply will be displayed on your GEO records.
    As explained on the Submitting data page, there are several deposit formats you can use to submit your data to GEO. These include simple Web forms, spreadsheets, plain text, and XML formats. Regardless of the submission method you choose, the final GEO records will look the same and contain equivalent information. If you have any problems with the submission process, please do not hesitate to e-mail us at geo@ncbi.nlm.nih.gov and we will be happy to provide assistance.

    When do I submit my data to GEO?
    Many journals require accession numbers for microarray data before acceptance of a paper for publication. Also, reviewers and editors may need access to your data during the review process. Thus, ideally, data should be deposited in GEO before a manuscript describing the data is sent to a journal for review. Your records may remain private until your data are published. Once your submissions have been approved, you can cite the GEO accession number(s) in your manuscript and you can generate an access link by which editors and reviewers can access your private submissions.

    Does GEO support MIAME?
    Yes. GEO encourages submitters to supply MIAME-compliant data. To assist submitters in providing data that comply with MIAME, GEO submission procedures are designed to closely follow the MIAME checklist. If you provide all requested information, your submission will be MIAME-compliant. Processing delays may occur if your submission lacks critical MIAME components. Submitters and reviewers are encouraged to refer to the MIAME checklist and to use it as a guide to determine what information should be included when describing a microarray experiment. Note that MIAME compliance is determined by the content provided, not by the submission format or route. If you have any comments or concerns regarding this issue, please e-mail us at geo@ncbi.nlm.nih.gov.

    What kinds of data will GEO accept?
    GEO was designed around the common features of most of the high-throughput and parallel molecular abundance-measuring technologies in use today. These include data generated from microarray technology as well as other forms of high-throughput 'omic technologies.

    The GEO database has a flexible and open design that is responsive to developing trends. If you have questions about whether GEO can accept your data type, please do not hesitate to contact us (e-mail geo@ncbi.nlm.nih.gov).

    Does GEO store raw data?
    Yes. All submitters are now required to provide raw data with their submissions. Raw data facilitate the unambiguous interpretation of the data and potential verification of the conclusions, as described in the MIAME guidelines. Raw data may be supplied either within the Sample data tables or as external files, e.g., Affymetrix CEL or GenePix GPR scan files. Supplementary data files for public records are made available from the GEO FTP site.

    When will my data receive GEO accession numbers?
    Processing normally takes approximately 2-5 business days after completion of submission. If you need approval of your GEO accession numbers to be expedited, please e-mail us at geo@ncbi.nlm.nih.gov. If format or content problems are identified, a curator will contact you by e-mail explaining how to address the issue. Please address the issues raised by curators; failure to do so may result in processing delays or removal of the records. Once your records pass review, the curator will send you an e-mail confirming your GEO accession numbers and their release dates. If you do not receive an e-mail from us within 5 days of your submission, please check your spam or junk e-mail folders because some systems recognize GEO e-mail correspondence as spam. Do not quote GEO accession numbers in manuscripts until you have received an approval e-mail notice from a GEO curator.

    How are submitters authenticated?
    In their first submission to GEO, submitters are asked to create a GEO account, with a confidential username and password. This account can be used to submit additional data in the future without re-entering contact information, as well as to authenticate the submitter when updating or editing an existing GEO accession number. We will send all e-mail correspondence, approvals, and reminders to the contact e-mail addresses provided in the GEO account – please be sure to inform us if your contact e-mail address changes (please see How can I make edits to my contact information?).

    Can I keep my data private while my manuscript is being prepared or under review?
    Yes. GEO records may remain private until a manuscript describing the data is published (journal publication is not a requirement for data submission to GEO). During the submission process you are prompted to specify a release date for your records. Although the maximum allowable period is one year, this date may be brought forward or pushed back at any time (please see How can I make corrections to data that I already submitted?), or you can e-mail us to request a change of release date. This feature allows a submitter to deposit data and receive a GEO accession number to quote in a manuscript before the data become public. We will send you an e-mail reminder 10 days before the scheduled release date, inviting you to postpone the release date as necessary. It is important to inform us as soon as your manuscript is published so that we can release your records and link them with PubMed. Submitters also have the opportunity to create a private access link that allows collaborators or reviewers confidential, read-only access to private data before manuscript publication.

    Can I keep my data private after my manuscript is published?
    No. If GEO accession numbers are quoted in a publication, the records must be released so that the data are accessible to the scientific community. If GEO accession numbers are found to be quoted in a publication before the scheduled release date, GEO staff are obliged to release those records, even if a second manuscript describing the same data is pending.

    How can I allow reviewers access to my private records?
    After your records have been approved, you can create an access link to your private submissions using the 'Click here to create a reviewer access link' option near the top of your Series (GSExxx) record. The generated link can be sent to the journal editor, who will circulate it to reviewers requiring access to your private data.

    How can I make corrections to data that I already submitted?
    You may perform updates and edits at any time to any of your submissions by logging in to your account and using the 'UPDATE' button on the Web deposit/update page or the 'UPDATE' button at the top of each of your GEO records. If you have a lot of edits to make, you may prefer to perform a batch update in SOFT format. Alternatively, please feel free to e-mail batch edit details to us at geo@ncbi.nlm.nih.gov and we will process them for you. Updates will be reflected immediately on your GEO records.

    How can I delete my records?
    Only GEO staff can remove records from the database; it is necessary to e-mail us at geo@ncbi.nlm.nih.gov to request deletion of specific accession numbers. Please keep in mind that updating records is preferable to deleting records (see "How can I make corrections to data that I already submitted?" section above). Note that a 'validation-only' tool is available on the Direct Deposit page - this feature allows you to validate and test your SOFT or MINiML files without actually submitting (and then deleting) the records. If the accessions in question have been published in a manuscript, we cannot delete the records. Rather, a comment will be added to the record indicating the reason the submitter requested withdrawal of the data, and the record content adjusted/deleted accordingly.

    How can I make edits to my contact information?
    After logging in with your username and password, follow the View your account link on the home page where you will find an 'Edit' button. Edits to contact information will be applied immediately to all existing records submitted under that account. If you need the contact information to remain unedited on existing records, but different contact details to appear on new records, it is necessary to open a separate account and submit new data under that account.

    Can I submit an extracted or summary subset of data?
    Unfortunately no. Sample records should be supplied as complete hybridization tables and Platform records should contain meaningful, trackable sequence identifier information. The principal reason we maintain this archive and the rationale behind many journals’ requirement for data deposit into GEO is so that the community can access and comprehensively re-examine data that form the basis of scientific reporting. Therefore, we do not accept partial datasets. We do understand the various reasons and difficulties some researchers have with sharing data. However, the demand from users and journal editors together with our need to maintain a useful and transparent database has led to our policy of only accepting complete datasets. If you have any questions or concerns regarding this issue, please e-mail us at geo@ncbi.nlm.nih.gov.




    Query and search FAQ

    Who can use GEO data?
    Anybody can access and download public GEO data. There are no login requirements. For more information, please read our data disclaimer.

    What kinds of retrievals are possible in GEO?
    There are several ways to retrieve GEO data. One way is by entering a valid GEO accession number in the Accession Display bar. Another is to browse the list of current GEO repository contents. All data are available for download from the GEO FTP site. Both simple and sophisticated searches of GEO data and linking to other Entrez databases can be accomplished using Entrez GEO Profiles and Entrez GEO DataSets. As with other NCBI Entrez databases (e.g., PubMed) a simple Boolean phrase may be entered and restricted to any number of supported attribute fields, enabling effective query and mining. Entrez GEO Profiles retrieves individual gene expression profiles, and Entrez GEO DataSets retrieves complete experiments. Please see the overview or recent publications for more information.

    Can I get notified when new data are available?
    Yes. This can be accomplished using a 'My NCBI' account; register here. Then construct a search for data relevant to your interests in Entrez GEO DataSets. For example, if you are only interested in experiments performed on Platform GPL96, search with GPL96[GEO Accession]; to see any apoptosis experiments, search with apoptosis; or if you want to see all new experiments, search with all[filter]. Next to the search box, you should see a 'Save Search' option. You will be presented with the option to receive e-mail alerts when new data matching your search criteria have been added to the database. This database is updated on a weekly basis.
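
    The example searches in this answer are plain strings, so they can also be composed in a script before being pasted into the search box. The following is a minimal sketch in Python; the helper names field_term and all_of are hypothetical, and joining terms with AND is an assumption based on the Boolean phrases described for Entrez searches, not something this FAQ spells out.

```python
def field_term(value, field=None):
    """Return a search term, optionally restricted to a field tag in brackets."""
    return f"{value}[{field}]" if field else value


def all_of(*terms):
    """Join terms with the Boolean AND operator (assumed Entrez-style syntax)."""
    return " AND ".join(terms)


# The three searches quoted in the answer above.
platform_only = field_term("GPL96", "GEO Accession")
apoptosis_only = field_term("apoptosis")
everything = field_term("all", "filter")

# A combined search: apoptosis experiments on one Platform.
apoptosis_on_platform = all_of(apoptosis_only, platform_only)

print(platform_only)
print(apoptosis_on_platform)
print(everything)
```

    Keeping the field tag separate from the value makes it harder to misplace a bracket when several restrictions are combined.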

    How can I query and analyze GEO data?
    Several features are provided to assist with the exploration, visualization, and analysis of GEO data. These include individual gene expression profile charts, DataSet hierarchical and K-means/median clusters, DataSet value distribution charts, a 'Query mean group A vs B' tool, and profile and sequence neighbor searches. Alternatively, full text, tab-delimited value data tables provided with DataSet downloads (available on the DataSet record, or via FTP) may prove suitable for upload into your favorite microarray analysis software package. Please see the overview or recent publications for more information.
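
    For users who prefer to script their own analysis, a tab-delimited value table can be read with standard tools. The sketch below uses Python's csv module on a small made-up table; the column names, row identifiers, and values are illustrative assumptions and do not come from any GEO record, and the function read_value_table is a hypothetical helper.

```python
import csv
import io

# Made-up stand-in for a downloaded tab-delimited value table.
EXAMPLE_TABLE = (
    "ID_REF\tsample_1\tsample_2\n"
    "probe_a\t10.5\t12.0\n"
    "probe_b\t3.2\t\n"
)


def read_value_table(handle):
    """Map each row identifier to {column name: float, or None if empty}."""
    reader = csv.DictReader(handle, delimiter="\t")
    id_column = reader.fieldnames[0]  # first column holds the row identifier
    table = {}
    for row in reader:
        row_id = row.pop(id_column)
        table[row_id] = {
            name: float(value) if value else None
            for name, value in row.items()
        }
    return table


values = read_value_table(io.StringIO(EXAMPLE_TABLE))
print(values["probe_a"]["sample_2"])
print(values["probe_b"]["sample_2"])
```

    With a real download, the StringIO object would be replaced by an opened file; header lines that precede the table, if any, would need to be skipped first.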

    What is the difference between a Series and a DataSet?
    A GEO Series (GSExxx) is an original submitter-supplied record that summarizes an experiment. These data are reassembled by GEO staff into GEO DataSets (GDSxxx). A DataSet represents a collection of biologically and statistically comparable Samples processed using the same Platform. Information reflecting experimental variables is provided through DataSet subsets. Both Series and DataSets are searchable using the Entrez GEO DataSets interface, but only DataSets form the basis of GEO's advanced data display and analysis tools, including gene expression profile charts and DataSet clusters. Not all submitted data are suitable for DataSet assembly and we are experiencing a backlog in DataSet creation, so not all Series have corresponding DataSet record(s).

    Why can't I find gene profile charts or DataSet clusters for my experiment of interest?
    As explained in the 'What is the difference between a Series and a DataSet?' FAQ above, suitable submitter-supplied GEO records are reassembled by GEO staff into comparable DataSets. At periodic intervals, these DataSets are then indexed and loaded into Entrez GEO Profiles and Entrez GEO DataSets, which allows users to query gene names, visualize charts and clusters, and more. If your GEO records of interest have not yet been assembled into a DataSet, these features will not be available. Not all submitted data are suitable for DataSet assembly and we are experiencing a backlog in DataSet creation, so not all Series have corresponding DataSet(s). However, the entire experiment is still available for download from our FTP site and from links at the bottom of Series records.

    Why can't I find supplementary/raw data for my experiment of interest?
    Supplementary data are made available for download from GEO's public FTP site and from links at the bottom of Series records and Entrez GEO DataSets retrievals. All GEO submitters have been asked to provide supplementary data (for example, Affymetrix .cel files) to accompany their GEO records. If supplementary data links are not provided for your experiment of interest, we suggest that you contact the submitter directly to encourage them to supply supplementary data files to GEO so that we may make them available to the scientific community.

    What do the red bars and blue squares represent in GEO profile charts?
    In GEO Profile charts, the red bars represent values extracted from original GEO Sample records as supplied by submitters. For single channel data, values are assumed to be submitted as normalized signal count data, reflecting the relative measure of abundance of each transcript. For Affymetrix data, the "detection call" (A=absent, P=present, M=marginal) data are taken into consideration, if supplied (absent calls are faded out). For dual channel experiments, values are normalized log ratios, and SAGE values reflect "tags per million" counts.
    The blue squares represent the percentile ranked value of a spot compared to all other spots within that Sample. That is, all values within each Sample are rank ordered and placed into rank percentile 'bins'. This gives an indication of the relative expression level of that gene compared to all other genes on the array.
    Value profiles are plotted on a scale that fits each individual gene, whereas rank data are always plotted on a scale of 0-100%.
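
    The percentile ranking described for the blue squares can be illustrated with a short Python sketch. This is not GEO's implementation: the treatment of ties and the width of the rank bins are not stated here, so the sketch assumes a value's rank is the share of values in the Sample that are less than or equal to it, and it leaves the result unbinned. The function name percentile_ranks and the sample values are made up.

```python
from bisect import bisect_right


def percentile_ranks(values):
    """Return the percentile rank (0-100) of each value within one Sample.

    Assumption: rank = percentage of values less than or equal to the value.
    """
    ordered = sorted(values)
    total = len(ordered)
    return [100.0 * bisect_right(ordered, value) / total for value in values]


# Four made-up spot values from a single Sample.
sample = [250.0, 12.0, 980.0, 47.0]
ranks = percentile_ranks(sample)
print(ranks)  # [75.0, 25.0, 100.0, 50.0]
```

    Because the ranks are computed within each Sample, they always fall on the same 0-100% scale regardless of the units of the submitted values.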

    Can GEO data be accessed programmatically?
    Yes. Users can take advantage of NCBI's Entrez programming utilities to access data stored in Entrez GEO DataSets and Entrez GEO Profiles; see more information and examples. Additionally, Bioconductor users may be interested in the GEOquery package, which parses GEO SOFT files for integration with Bioconductor 'R' analysis resources; see publication.
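
    As a rough illustration of programmatic access, the Python sketch below builds a search request URL for the Entrez programming utilities without sending it. The base address, the esearch.fcgi endpoint, the database name "gds" for Entrez GEO DataSets, and the db, term, and retmax parameters are assumptions drawn from general E-utilities usage rather than from this FAQ, so please check them against the current NCBI documentation. The helper esearch_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumed base address of the Entrez programming utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term, db="gds", retmax=20):
    """Build (but do not send) a search URL for the given Entrez database."""
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}esearch.fcgi?{query}"


# Reuses the Platform search term quoted in the notification FAQ.
url = esearch_url("GPL96[GEO Accession]")
print(url)
```

    The URL can then be fetched with any HTTP client. Building it with urlencode keeps brackets and spaces in the search term correctly escaped.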

    What is GEO BLAST?
    The GEO BLAST tool queries Entrez GEO Profiles for molecular abundance profiles of interest based on nucleotide sequence similarity. The GEO BLAST database contains all GenBank sequences represented on microarray Platforms or SAGE libraries in GEO. This interface is helpful in identifying sequence homologs of interest, e.g., related gene family members or for cross-species comparisons.



