GEOarchive submission instructions

WARNING: If you are submitting human data, it is your responsibility to comply with Human Subject Guidelines.

GEOarchive format

GEOarchive is a flexible spreadsheet-based submission format useful for batch deposit of experiments. GEOarchive submissions can be created in any spreadsheet software, though Microsoft Excel is the most commonly used.

A GEOarchive submission consists of several parts as follows:

Metadata spreadsheet: 'Metadata' refers to descriptive information and protocols for the overall experiment and for individual Samples. This information is supplied by completing all fields of the appropriate metadata spreadsheet template, which can be downloaded from the GEOarchive templates and examples section below.
Matrix table: The matrix table is a spreadsheet containing the final, normalized values that are comparable across rows and Samples, preferably processed as described in any accompanying manuscript. A complete data matrix should be supplied, not a summary subset. Additional data columns may be included in the table, for example, Affymetrix Detection calls and P-values, or background or flag columns. See the Affymetrix template for an example.
Raw data files: In addition to the normalized data provided in the matrix table, submitters are required to provide raw data, usually in the form of supplementary raw data files. Raw data allow unambiguous interpretation of the data and potential verification of the conclusions, as described in the MIAME and MINSEQE standards.
Affymetrix submissions must include CEL files. Non-Affymetrix GEOarchive submissions should include the original software-generated scan quantification files, for example, GenePix GPR files. Next-generation sequence submissions must include files containing reads and quality scores.
Platform: If your experiments were performed using a commercial array (e.g., Affymetrix GeneChip) or another array already deposited in GEO, please use the FIND PLATFORM tool to find its GEO accession number (GPLxxxx) and enter it in the 'platform' column in the SAMPLES section of the metadata spreadsheet. If your array does not already exist in GEO, please include a PLATFORM section in your metadata spreadsheet and Platform annotation columns in your matrix table.
The Platform data must include meaningful, trackable sequence identifiers (e.g., GenBank/RefSeq accessions, locus tags, clone IDs, oligo sequences, chromosome locations; see the Platform content guidelines for the full list). References to in-house databases or top BLAST hits are not sufficient. Platform submission is not necessary for SAGE or next-generation sequence submissions.
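Before bundling, it can help to check the parts described above programmatically. The Python below is a minimal, hypothetical pre-submission check, not a GEO tool: it assumes the matrix table has been loaded into a dict keyed by row ID, that each raw file is named after its Sample, and that the extension map (RAW_EXTENSIONS, an illustrative name) matches your technology. It flags incomplete matrix rows, placeholder Platform accessions, and Samples without a raw file.

```python
import re
from pathlib import PurePath

# Hypothetical mapping from technology to raw-file extensions, based on the
# examples above (CEL files, GenePix GPR files, reads with quality scores).
RAW_EXTENSIONS = {
    "affymetrix": {".cel"},
    "genepix": {".gpr"},
    "sequencing": {".fastq", ".fq"},
}

# Platform accessions have the form GPL followed by digits (e.g., GPL570).
GPL_PATTERN = re.compile(r"GPL\d+")


def incomplete_rows(matrix, samples):
    """Return IDs of matrix rows that lack a value for at least one Sample."""
    return [
        row_id
        for row_id, values in matrix.items()
        if any(str(values.get(s, "")).strip() == "" for s in samples)
    ]


def is_gpl_accession(value):
    """True if value looks like a Platform accession, not a placeholder."""
    return GPL_PATTERN.fullmatch(value.strip()) is not None


def _matches(sample, filename, allowed):
    """True if filename looks like a raw file for sample (assumed naming)."""
    lowered = filename.lower()
    if lowered.endswith(".gz"):
        lowered = lowered[:-3]  # treat reads.fastq.gz like reads.fastq
    path = PurePath(lowered)
    prefix = sample.lower()
    return path.suffix in allowed and (
        path.stem == prefix or path.stem.startswith(prefix + "_")
    )


def samples_missing_raw(samples, filenames, technology):
    """Return Samples with no raw file of the expected type."""
    allowed = RAW_EXTENSIONS[technology]
    return [
        s for s in samples
        if not any(_matches(s, f, allowed) for f in filenames)
    ]
```

With toy inputs, a matrix row that is blank for one Sample is reported, "GPLxxxx" is rejected as a placeholder, and a Sample whose only file is a .txt is listed as missing a CEL file. The prefix-plus-underscore rule avoids treating S10.CEL as a file for S1.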

Bundle all parts (the Excel file containing the metadata and matrix spreadsheets, plus the raw data files) together into a .zip, .rar, or .tar archive using a program such as WinZip, and transfer the archive to GEO using the 'Transfer files to GEO with web form' option on the Submit to GEO page. Incomplete submissions will result in processing delays.
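As a minimal sketch of the bundling step, the Python function below writes the workbook and raw data files into a single .zip with the standard library's zipfile module. The function and file names are hypothetical. It refuses to build an archive if any listed file is missing or if two files share a base name, since an incomplete archive delays processing. It only produces .zip; Python's tarfile module can write .tar, and the standard library cannot write .rar.

```python
import zipfile
from pathlib import Path


def bundle_submission(archive_path, workbook, raw_files):
    """Write the metadata/matrix workbook and raw data files into one .zip."""
    paths = [Path(workbook), *(Path(f) for f in raw_files)]

    # Stop early rather than upload an incomplete submission.
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"Missing files, archive not created: {missing}")

    # Files are stored by base name, so duplicate names would collide.
    names = [p.name for p in paths]
    if len(set(names)) != len(names):
        raise ValueError("Two files share the same base name")

    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            # Drop local directory structure inside the archive.
            zf.write(p, arcname=p.name)
    return Path(archive_path)
```

For example, bundle_submission("submission.zip", "geo_submission.xlsx", ["S1.CEL", "S2.CEL"]) would produce an archive containing those three files at its top level.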

GEOarchive templates and examples

The first step in creating your GEOarchive submission is to download the appropriate template (Excel spreadsheet) from the list below. Each Excel file consists of several worksheets, including a metadata template, and examples of metadata and matrix tables. Click the tabs at the bottom of the worksheet window to switch between worksheets. Mouse over field names in the templates to view content guidelines.

Microarray

For the following microarray vendors, please download templates from the vendor-specific instructions pages:

For microarrays not from the vendors above, please use a 'Generic' template. For generic microarray submissions where the Platform is already deposited in GEO, please download the most appropriate template:

For generic microarray submissions where the Platform is not deposited in GEO, please download the most appropriate template:

To submit only a Platform, please download the following template (this option is appropriate only if you have no hybridization or sequence data to deposit):

High-throughput sequencing

For high-throughput sequence submissions, please refer to full instructions at:

Other data types

For NanoString submissions, please use one of the 'Generic single channel' templates as appropriate:

For high-throughput RT-PCR submissions, please refer to full instructions at:

For traditional SAGE submissions, please refer to full instructions at:

Notes for Microsoft Excel users

The following notes draw attention to common Excel-related problems.

  • Please be aware that Excel may automatically apply irreversible formatting to your data. According to Microsoft support:
    - If a number contains a slash mark (/) or hyphen (-), it may be converted to a date format.
    - If a number contains a colon (:), or is followed by a space and the letter A or P, it may be converted to a time format.
    - If a number contains the letter E (in uppercase or lowercase letters; for example, 10e5), or the number contains more characters than can be displayed based on the column width and font, the number may be converted to scientific notation, or exponential, format.
    - If a number contains leading zeros, the leading zeros are dropped.
    Certain clone identifiers, gene names, and plate coordinates are particularly susceptible to these issues. To avoid the problem, first select the whole spreadsheet and choose Format -> Cells -> Number -> Text before pasting data into Excel (the default is "General"). For more information, see Zeeberg et al., 2004.
  • If you apply Format -> Cells -> Number -> Text as described above, very long data strings (e.g., sequence data) may appear as hash (#) characters. If this occurs, switch these cells back to "General" format.
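Because these conversions are hard to spot after the fact, it may help to screen identifiers before pasting them into Excel. The Python below is a heuristic sketch, not an exhaustive model of Excel's behavior: the regular expressions are assumptions modeled on the conversions listed above and on date-like gene symbols such as SEPT2 and MARCH1 or RIKEN-style clone IDs such as 2310009E13. Column-width-dependent conversions are not covered.

```python
import re

# Heuristic patterns (assumptions) for values Excel may reformat.
RISK_PATTERNS = [
    ("date", re.compile(r"^\d+[/-]\d+([/-]\d+)?$")),
    ("date (gene-name style)", re.compile(
        r"^(jan|feb|mar|march|apr|may|jun|jul|aug|sep|sept|oct|nov|dec)-?\d{1,2}$",
        re.IGNORECASE)),
    ("time", re.compile(r"^\d+(:\d+)+$|^\d+\s+[ap]$", re.IGNORECASE)),
    ("scientific notation", re.compile(r"^\d+e\d+$", re.IGNORECASE)),
    ("leading zeros dropped", re.compile(r"^0\d+$")),
]


def excel_risks(value):
    """Return labels for every risky pattern that value matches."""
    text = value.strip()
    return [label for label, pattern in RISK_PATTERNS if pattern.match(text)]


def flag_identifiers(values):
    """Map each risky identifier to its reasons; safe values are omitted."""
    flagged = {}
    for v in values:
        reasons = excel_risks(v)
        if reasons:
            flagged[v] = reasons
    return flagged
```

Running flag_identifiers(["SEPT2", "TP53", "2310009E13", "007"]) would flag every value except TP53. Flagged columns are candidates for the Text formatting step described above.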
Last modified: June 13, 2018