Submitting high-throughput sequence data to GEO
Introduction
GEO accepts various categories of sequence data generated by next-generation sequencing methodologies (e.g., by Illumina/Solexa, 454 Life Sciences and Applied Biosystems).
Data provision and standards:
The GEO database supports and encourages provision of all elements of a study, with a view to facilitating comprehensive interpretation of an experiment (see MIAME information).
We apply the same principles to provision of sequence data. GEO sequence submission procedures are designed to encourage provision of all the following elements:
thorough descriptions of the biological samples under investigation, and procedures to which they were subjected
thorough descriptions of the technical protocols used to generate and process the data
processed data files (e.g., filtered sequence reads, detection counts, ChIP-seq log ratio data)
original short read format sequence files
Administration:
All standard GEO administration and processing procedures apply to sequence submissions. These include:
Unique and stable GEO accession numbers are issued to experiments; these accessions can be cited in manuscripts
GEO accession numbers are typically approved within 2-5 business days after completion of submission
Data can be held private until publication
Reviewers can have password-controlled access to private records
Submitters can update their records at any time
More information on these aspects is provided in our FAQ.
Categories of sequence submissions
We accept:
mRNA expression profiling (including MPSS)
small RNA discovery and profiling
SAGE (see Web submission instructions)
ChIP-seq
All accepted data should have a quantitative component, e.g., a sequence abundance count. If you have questions about whether GEO can accept your data type, please do not hesitate to contact us at geo@ncbi.nlm.nih.gov.
We do not accept:
genome sequencing projects
metagenomics sequencing projects
For information on how to submit these types of data to NCBI, contact the Short Read Archive database at sra@ncbi.nlm.nih.gov.
Deposit instructions
Sequence data should be submitted in a modified GEOarchive format, which is composed of the following components:
Metadata spreadsheet:
'Metadata' refers to descriptive information and protocols for the overall experiment and individual Samples.
Information is supplied by completing all fields of this metadata spreadsheet template.
Processed data file(s):
The processed data file should be a plain text, tab-delimited table.
It should contain filtered, unique sequence reads and detection counts, preferably processed as described in any accompanying manuscript.
The file name should be referenced as appropriate in the Metadata spreadsheet.
If you have sequence mapping or identity information, please include that in this table.
It is possible to include as many columns as necessary in the table to thoroughly describe your data.
Processed data may be supplied either as an individual file per sample or as a multi-sample matrix file (see example). The example is shown in a spreadsheet for clarity; please provide your data as plain text, tab-delimited table(s).
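As a rough illustration of the processed data layout described above, the following Python sketch writes a small multi-sample matrix as a plain text, tab-delimited table. The column names, sequences and counts are hypothetical placeholders chosen for the example; they are not a layout prescribed by GEO.

```python
import csv
import io

# Hypothetical example rows: each unique filtered read with its detection
# count per sample. Column names and values are illustrative only.
HEADER = ["sequence", "sample1_count", "sample2_count"]
ROWS = [
    ("ACGTACGTACGTACGTACGT", 152, 98),
    ("TTGACCATGGCATTAGCAAT", 7, 0),
]


def write_processed_table(handle, header, rows):
    """Write a tab-delimited table: one header line, then one line per read."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


# Write to an in-memory buffer here; a real file handle works the same way.
buffer = io.StringIO()
write_processed_table(buffer, HEADER, ROWS)
table_text = buffer.getvalue()
```

To write to disk instead, the same function can be given a file opened with `open(path, "w", newline="")`. Additional columns, such as mapping or identity information, would simply extend the header and each row.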
Raw data files:
The raw data files should be the original short read format sequence files, for example:
454: .sff
Solexa: _seq.txt _prb.txt _sig2.txt, or _seq.txt _prb.txt _sig2.txt _qhg.txt
The names of these files should be referenced as appropriate in the Metadata spreadsheet.
These files will be linked to and made available through NCBI's Short Read Archive database.
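Before packaging, it may help to confirm that each Solexa run has its full set of raw files. The Python sketch below checks a file listing against the three suffixes named in the Solexa example above. It assumes that all files from one run share a common prefix, which is an assumption of this example rather than a stated GEO requirement; the prefix `s_1` and the listing are hypothetical.

```python
# Suffixes taken from the Solexa example in the raw data file description.
SOLEXA_REQUIRED = ("_seq.txt", "_prb.txt", "_sig2.txt")


def missing_solexa_files(prefix, filenames):
    """Return the expected Solexa file names for `prefix` that are absent."""
    present = set(filenames)
    return [
        prefix + suffix
        for suffix in SOLEXA_REQUIRED
        if prefix + suffix not in present
    ]


# Hypothetical listing for one run; 's_1' is an illustrative prefix only.
listing = ["s_1_seq.txt", "s_1_prb.txt"]
missing = missing_solexa_files("s_1", listing)
```

An empty result means all three files for that prefix were found in the listing.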
The Metadata spreadsheet, Processed data file(s) and Raw data files should be zipped or tarred
together and transferred directly to GEO by selecting the 'GEOarchive' option on the
Direct Deposit page.
If you find that your file archive is too large to transfer using this option,
please contact us for details on where to FTP your data.
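The packaging step described above can be sketched in Python with the standard `tarfile` module. This is a minimal sketch, not a GEO-provided tool: the choice of gzip compression and the decision to store files under their bare names are assumptions of the example, and any file names passed in are the submitter's own.

```python
import tarfile
from pathlib import Path


def build_archive(archive_path, files):
    """Bundle the given files into one gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in files:
            path = Path(path)
            # Store each file under its bare name so the archived names
            # match those referenced in the metadata spreadsheet.
            archive.add(str(path), arcname=path.name)
    return archive_path


def list_archive(archive_path):
    """Return the sorted member names stored in the archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(archive.getnames())
```

Calling `list_archive` on the result gives a quick way to compare the archive contents with the file names entered in the metadata spreadsheet before transfer.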
These submission procedures and requirements will be refined in the coming months.
However, the accession numbers we assign to your data are stable and will not change.
If you have any suggestions or concerns regarding any of these issues, please
email us at geo@ncbi.nlm.nih.gov.