NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

SRA Handbook [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2010-.

Submission Quick Start Guide

Last Update: April 12, 2014.

Steps for SRA Submission

(Note: If you have already created BioSample(s) and BioProject(s) as part of a WGS, Genome, or TSA submission, please use those for your SRA submission as well.)

1. Create a BioProject for this research
2. Create a BioSample submission for your biological sample(s)
3. Gather Sequence Data Files
4. Enter Metadata on SRA website
   a. Create SRA submission
   b. Create Experiment(s) and link to BioProject and BioSample
   c. Create Run(s)
5. Transfer Data files to SRA
6. Update Submission with PubMed links, Release Date, or Metadata Changes

(See Figure 1)

Figure 1. Example of a finished SRA Submission viewed from the Interactive Tool. Each Experiment references only one Sample, but many Experiments can reference the same Sample. An Experiment will have 1 or more Runs.
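The relationships shown in Figure 1 can be sketched as a small data model. This is only an illustration in Python, not the SRA schema: the class names, field names, and accessions below are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    alias: str
    file_names: list[str]


@dataclass
class Experiment:
    alias: str
    bioproject: str  # an Experiment references exactly one BioProject
    biosample: str   # ... and exactly one BioSample
    runs: list[Run] = field(default_factory=list)

    def is_complete(self) -> bool:
        # An Experiment is expected to have 1 or more Runs.
        return len(self.runs) >= 1


# Many Experiments can reference the same Sample (placeholder accessions).
exp_a = Experiment("lib-A", "PRJNA000001", "SAMN00000001",
                   [Run("run-A1", ["a1.fastq"])])
exp_b = Experiment("lib-B", "PRJNA000001", "SAMN00000001")
```

Here `exp_a` and `exp_b` share one BioSample, and `exp_b` is not yet complete because no Run has been added to it.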

BioProject

The SRA Study has merged with the NCBI BioProject resource. If no BioProject yet exists for this research, one can be created by following the link to ‘Submit New BioProject’ on the Experiment entry page. If submitting data for an existing BioProject, the accession for the project can be entered in the provided text field. Please note that BioProjects bear an accession like PRJNA#. Incomplete projects bearing a temporary submission ID like SUB# will need to be completed before linking to the SRA Experiment. For more context, see Describing an Experiment. More information on submitting to BioProject is available here.

BioSample

Submitters will create new Samples through the BioSample Submission Portal. Samples created through BioSample submission will be linked by a reference in the Experiment portion of an SRA submission. Registered BioSamples have accessions like SAMN#. As with BioProject, incomplete submissions that have only SUB# IDs must be completed prior to creating an SRA Experiment. If the submission contains more than 25 BioSamples, please contact SRA so that we can help you submit your data in batch.
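Before entering accessions in an Experiment, it can help to check that each one is a registered accession rather than a temporary submission ID. The sketch below assumes that the "#" in PRJNA#, SAMN#, and SUB# stands for one or more digits; the function name is hypothetical, and other accession prefixes are not covered.

```python
import re

# Assumption: "#" in PRJNA#, SAMN#, and SUB# means one or more digits.
REGISTERED = {
    "BioProject": re.compile(r"PRJNA\d+"),
    "BioSample": re.compile(r"SAMN\d+"),
}
TEMPORARY = re.compile(r"SUB\d+")


def classify_accession(value: str) -> str:
    """Return 'BioProject', 'BioSample', 'temporary', or 'unknown'."""
    value = value.strip()
    for kind, pattern in REGISTERED.items():
        if pattern.fullmatch(value):
            return kind
    if TEMPORARY.fullmatch(value):
        # The submission must be completed before it can be linked.
        return "temporary"
    return "unknown"
```

A value classified as `temporary` or `unknown` should not be entered in an Experiment until the corresponding BioProject or BioSample submission has been completed.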

Creating Samples

Each biological sample used in a study will be described by a BioSample record. A submission may contain many BioSamples. If samples were irreversibly pooled, a single BioSample record may describe the pooled components. Barcoded data files, on the other hand, should be demultiplexed prior to submission and a unique BioSample should be created for each barcoded sample; in other words, each BioSample must be linked to one or more unique data files. If more than 24 hours have passed since completion of a BioSample submission and the sample has not received a SAMN# accession, check the BioSample submission for errors. If none are found, please contact SRA at sra@ncbi.nlm.nih.gov.

(See Figure 2)

Figure 2. Once logged in to BioSample, click the ‘New submission’ button to begin creating a BioSample record.
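The rule under Creating Samples that each BioSample must be linked to its own data files can be checked against a submission plan before any metadata is entered. The following is a minimal sketch; the mapping of sample names to file names is a hypothetical structure kept by the submitter, not something SRA provides.

```python
from collections import defaultdict


def shared_files(sample_files: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each file name claimed by more than one sample to those samples."""
    owners = defaultdict(set)
    for sample, files in sample_files.items():
        for name in files:
            owners[name].add(sample)
    # Keep only files that appear under two or more samples.
    return {name: sorted(s) for name, s in owners.items() if len(s) > 1}
```

A non-empty result usually means a barcoded file still needs to be demultiplexed, or that the samples were pooled and may be better described by a single BioSample.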

Login to the Sequence Read Archive

From the SRA Homepage:

Click the Submit tab.

Then log in. (PDA and myNCBI have merged. You may log in with either by clicking on ‘NCBI PDA’. If you do not have a PDA or myNCBI account already, one can be created. Alternatively, you may sign in with one of the third-party login options presented after clicking on ‘NCBI PDA’. If you have used a PDA account in the past but no longer see your previous SRA submissions, please contact SRA at sra@ncbi.nlm.nih.gov for assistance with your account view.)

(See Figure 3a)

Figure 3a. From the 'Submit' tab, click ‘NCBI PDA’ to log in for Submission.

(See Figure 3b)

Figure 3b. Alternative third-party login options are available after clicking on ‘NCBI PDA’.

Creating a New Submission

(See Figure 4)

Figure 4. To start a Submission, click the ‘Create new submission’ button.

Submission Alias and Comment

Alias – An ID used by submitters to track the submission of a set of Experiments and Runs. This field should be something that is used internally to refer to the project and makes sense to the submitter. Once saved, the Submission Alias cannot be changed. Like all Alias fields in SRA, this is not an indexed field and will not be visible to the public during normal usage of the database.

Example: C. elegans resequencing project (this field is NOT indexed in Entrez).

Submission Comment – An area for the submitter to enter a comment about the submission.

Example: prepared with assistance by John Smith (this field is NOT indexed in Entrez).

(See Figure 5)

Figure 5. The Submission is not created until the ‘Save’ button is clicked.

Setting a Submission Release Date

A release date is required for all submissions. It is advisable to enter a release date before loading any data into a Submission. This will prevent accidental early release of data. Dates may be set for up to one year in the future in anticipation of a publication release date. They can be changed at any time by accessing your submission and changing the release date at the bottom of the page. This action does not require you to contact SRA.

(See Figure 6)

Figure 6. To save the date on which the submission is scheduled to be published/released to the public, enter a date in the box using a YYYY-MM-DD format, then click ‘Set release date’. The release date can be changed as long as the submission has (more...)
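A release date can be sanity-checked locally before it is typed into the form. The sketch below enforces the YYYY-MM-DD format and the one-year limit described above. It rests on two assumptions that SRA does not state: "one year" is treated as 365 days, and dates in the past are rejected. The function name is hypothetical.

```python
import re
from datetime import date, timedelta
from typing import Optional

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_release_date(text: str, today: Optional[date] = None) -> date:
    """Parse a YYYY-MM-DD release date and check it is at most a year ahead."""
    today = today or date.today()
    if not DATE_RE.fullmatch(text):
        raise ValueError(f"{text!r} is not in YYYY-MM-DD format")
    release = date.fromisoformat(text)  # also rejects impossible dates
    if release < today:
        # Assumption: a past date is treated as a mistake.
        raise ValueError(f"{text} is in the past")
    if release > today + timedelta(days=365):
        # Assumption: "one year" is taken as 365 days.
        raise ValueError(f"{text} is more than one year ahead")
    return release
```

The optional `today` argument exists only so that the check can be reproduced for a fixed date.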

Status

(See Figure 7)

Figure 7. The Status Bar

The status bar provides a visual representation of the current state of the submission.

  • Done (Dark Green): indicates the number of completed objects.
  • Wait (Gray): further information or file uploads are needed.
  • Processing (Light Blue, not shown): an object is currently being processed. If an object/file is processing for more than 48 hours, contact SRA at sra@ncbi.nlm.nih.gov.
  • Queue (Dark Blue): the object will be processed when the pipeline is available.
  • Replaced (Bright Green): an object/file was replaced by another.
  • Error (Red): intervention is required; please contact SRA.
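The guidance on when to contact SRA can be written as a simple rule. This sketch assumes the submitter reads each state off the status bar by hand; it does not imply that SRA offers a programmatic status interface, and the names used are hypothetical.

```python
from enum import Enum


class Status(Enum):
    DONE = "Done"
    WAIT = "Wait"
    PROCESSING = "Processing"
    QUEUE = "Queue"
    REPLACED = "Replaced"
    ERROR = "Error"


def should_contact_sra(status: Status, hours_in_state: float = 0.0) -> bool:
    """True when the status bar guidance says to contact SRA."""
    if status is Status.ERROR:
        return True
    # Processing for more than 48 hours also warrants contacting SRA.
    return status is Status.PROCESSING and hours_in_state > 48
```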

Experiment

Creating Experiments

An Experiment describes a sequencing library and instrument. An Experiment references 1 BioProject and 1 BioSample. (See Figure 8)

Figure 8. Click the ‘New Experiment’ button to begin creating an Experiment.

Describing an Experiment

Meta Information

Platform- This describes the sequencing platform used in the experiment.

Alias- An ID used as a reference for the user and archive. (Like all Alias fields in SRA, this is not an indexed field and will not be visible to the public during normal usage of the database.)

Title- A publicly viewable and formal title used to describe the Experiment.

BioProject Accession– Links this Experiment to a BioProject. If no BioProject yet exists for this research, one can be created by following the link to ‘Submit New BioProject’ on the Experiment entry page. If the submitter is submitting data for an existing BioProject, the accession for the project can be entered in the provided text field. Please note that BioProjects bear an accession like PRJNA#. Projects with temporary accessions like SUB# will need to be completed before linking the SRA Experiment.

BioSample Accession- Links this Experiment to a BioSample. As with BioProject, links to SUB# are not accepted. Note that only 1 BioSample can be referenced in each Experiment. Thus, your submission will have at least 1 Experiment for each BioSample that you have registered for the project. Please also note that it is possible to reference the same BioSample in multiple Experiments. (See Figure 1)

Design Description- Describes the setup, experimental design, and goals of this Experiment.

Library

Additional descriptions of library terms can be found in Table 1 or the Glossary.

Table 1. List of available Experiment library descriptors.

Name- Name of the Library that was sequenced

Strategy- Sequencing strategy used in the experiment

Source- Type of genetic source material sequenced

Selection- Method of selection or enrichment used in the Experiment

Layout- Configuration of the read layout (Paired, Fragment, etc.).

Nominal Size (paired)- Size of the insert for Paired reads.

Nominal Standard Deviation (paired)- Standard deviation of insert size (typically ~10% of Nominal Size)

Processing

This section varies with the sequencer selected. Please pay close attention to the answers provided in this section, as they may affect proper loading of data.

Pipeline

This section describes the processing pipeline used to generate the data. The Program and Version should be entered for each step in the processing pipeline. The sequencer platform software and version are expected for each experiment. Users can describe additional processing steps in the pipeline by adding lines with the ‘Add’ button.

Saving the Experiment

To save all the metadata, please click the “Save” button at the bottom of the page.

(See Figure 9)

Figure 9. Click the 'Save' button to store the Experiment information. Saved Experiments can be updated as necessary.

For the public view of a completed Experiment, see Figure 10.

Figure 10. The public view of a completed Experiment; red boxes and text denote the source of displayed information.

Run

Creating Runs

Runs describe the files that belong to the previously created Experiments. They specify the data files for a specific sample to be processed by SRA. Experiments may contain many Runs depending on how many sequencer runs were involved in data acquisition.

(See Figure 11)

Figure 11. Click the 'New Run' button to the right of the Experiment for which a Run is needed. Each Experiment will have its own ‘New Run’ button.

Describing a Run

Alias- An ID used as a reference for the user and archive. (Like all Alias fields in SRA, this is not an indexed field and will not be visible to the public during normal usage of the database.)

Run data file type- The storage format (srf, sff, fastq, etc.) of the sequence data being submitted. The SRA cannot accept FASTA format alone (FASTA/qual file pairs may be processed as FASTQ). More information about the file types currently accepted by the SRA can be found in the File Format Guide.

File Name- Name of the file transferred to the SRA including any file extensions. The SRA does not accept files compressed as .zip or .rar; it is NOT necessary to compress files transmitted to NCBI but files compatible with either gzip or bzip2 can be processed. Data files contained in a .tar archive need to be individually enumerated in a run. Note that original file names are not maintained after data is loaded to SRA. Each SRA Run produces a single .sra archive file that is amalgamated from all files listed in the Run.

MD5 checksum- A checksum or hash sum generated for the file listed in ‘File Name’ that is used to detect errors introduced through storage or transfer. SRA uses the file name and md5 checksum to track and link files to their proper Runs.

Unix- md5sum <file>
OS X- md5 <file>

Windows- Application required. Fsum Frontend (Please use Base16 for md5sum calculations) and WinMD5Sum are two possible options.
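As a platform-independent alternative to the commands above, the checksum can also be computed with Python's standard `hashlib` module. This is a minimal sketch; the function name is hypothetical, and the file is read in chunks so that large sequence files need not fit in memory.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hexadecimal (Base16) MD5 digest of a file."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string is the same 32-character hexadecimal value that `md5sum` prints, so it can be pasted into the MD5 checksum field as is.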

(See Figure 12)

Figure 12. Click the 'Save' button to store the Run information. Runs can only be updated until data has been loaded for the Run. Once there is data in a Run, it will be locked from further updates. Contact SRA for changes to be made to locked Runs.
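The File Name rules described above (no .zip or .rar, gzip or bzip2 allowed, tar contents listed individually) can be applied to a set of files before Runs are created. The sketch below judges files by extension only, which is an assumption: it treats `.gz` and `.bz2` as gzip and bzip2, and treats `.tar.gz`, `.tar.bz2`, and `.tgz` as tar archives. The function names are hypothetical.

```python
import tarfile

REJECTED = (".zip", ".rar")      # not accepted by SRA
COMPRESSED_OK = (".gz", ".bz2")  # assumed extensions for gzip and bzip2
TAR_LIKE = (".tar", ".tar.gz", ".tar.bz2", ".tgz")  # assumed tar extensions


def check_file_name(name: str) -> str:
    """Return 'rejected', 'tar', 'compressed', or 'plain' for a file name."""
    lower = name.lower()
    if lower.endswith(REJECTED):
        return "rejected"
    if lower.endswith(TAR_LIKE):
        return "tar"  # members must be enumerated individually in the Run
    if lower.endswith(COMPRESSED_OK):
        return "compressed"
    return "plain"


def tar_member_names(path: str) -> list[str]:
    """List the regular files in a tar archive, in archive order."""
    with tarfile.open(path) as archive:
        return [m.name for m in archive.getmembers() if m.isfile()]
```

For a file classified as `tar`, the names returned by `tar_member_names` are the ones to list one by one in the Run.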

Submission Checklist

  • Does each biological sample have a BioSample record?
  • Do you have at least 1 data file for each sample?
  • Does each Experiment have at least 1 Run?
  • Are file names entered exactly as they will be uploaded, including file extension?
  • Is there enough information in titles and descriptions for other users to interpret the data? (Users cannot search based on “Alias” and will not see the “Alias” field during normal use)
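Most of the checklist can be run mechanically against a locally kept description of the submission. The sketch below is one way to do that. The dictionary layout and function name are hypothetical, and the first item (whether every biological sample has a BioSample record) and the quality of titles still need a human check; the code only flags empty titles.

```python
def checklist_problems(biosamples, experiments, local_files):
    """Return a list of problems; an empty list means all checks passed.

    biosamples:  set of BioSample accessions registered for the project
    experiments: list of dicts with keys "alias", "title", "biosample",
                 and "runs" (a list of Runs, each a list of file names)
    local_files: set of file names exactly as they will be uploaded
    """
    problems = []
    with_data = set()
    for exp in experiments:
        if not exp["runs"]:
            problems.append(f"Experiment {exp['alias']} has no Run")
        if not exp.get("title", "").strip():
            problems.append(f"Experiment {exp['alias']} has no public title")
        for run_files in exp["runs"]:
            for name in run_files:
                if name in local_files:
                    with_data.add(exp["biosample"])
                else:
                    problems.append(f"File {name} not found under that exact name")
    # Any sample never reached through a Run with a real file has no data.
    for sample in sorted(set(biosamples) - with_data):
        problems.append(f"BioSample {sample} has no data file")
    return problems
```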

Data Transfer

After the metadata is entered, data may be uploaded to the SRA.

Upload via FTP:

ftp://sra:password@ftp-private.ncbi.nih.gov/

(Windows Explorer may be used in Windows or an FTP client may be used in either Windows or OS X)

FileZilla is one of many free FTP clients that can be used on PC or Mac.

Or from the Unix/Linux/OS X command line:

Address: ftp-private.ncbi.nlm.nih.gov

Login: sra

The password is provided in the browser once at least one Run is entered. If everything is correct, files will be linked and loaded automatically.
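The same transfer can be scripted with Python's standard `ftplib` module. The sketch below uses the address and login given above and takes the password shown in the browser as an argument. It assumes that files are uploaded into the directory reached at login, which this guide does not confirm, and the function names are hypothetical.

```python
import os
from ftplib import FTP

HOST = "ftp-private.ncbi.nlm.nih.gov"
USER = "sra"


def upload_files(ftp, paths):
    """Upload each path in binary mode over an already open connection."""
    ftp.set_pasv(True)  # passive mode
    sent = []
    for path in paths:
        name = os.path.basename(path)
        with open(path, "rb") as handle:
            # Keep the name identical to the File Name entered in the Run.
            ftp.storbinary(f"STOR {name}", handle)
        sent.append(name)
    return sent


def upload_to_sra(password, paths):
    """Connect with the password shown in the browser, then upload."""
    with FTP(HOST) as ftp:
        ftp.login(USER, password)
        return upload_files(ftp, paths)
```

Splitting the connection step from the upload step keeps `upload_files` usable with any object that offers `set_pasv` and `storbinary`, which makes a dry run possible without contacting NCBI.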

Troubleshooting FTP

If you are having trouble with your FTP connection to NCBI, try the following:

1. Set passive mode rather than active mode
2. Ask your sysadmin to increase the FTP buffer size to 32 MB
3. Try another host, or another platform (Windows instead of Unix)
4. Try another FTP client:
   Unix: ncftp (http://www.NcFTP.com)
   Windows: filezilla (http://filezilla.sourceforge.net/)

If you still have trouble, please write us with the following details:

1. time of transfer (GMT or local time)
2. IP address of FTP client (the system you are transmitting from)
3. version of operating system software (Unix - uname -a, or cat /proc/version)
4. FTP account used
5. specific error messages (connection closed, etc)
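The five details requested above can be gathered into a short text report to paste into an email. This is a minimal sketch with a hypothetical function name. The client IP address is passed in by the user rather than detected, since the address seen by NCBI may differ from the local one, and the operating system string comes from Python's `platform` module rather than from `uname -a`.

```python
import platform
from datetime import datetime, timezone


def ftp_problem_report(client_ip, account, error_message, when=None):
    """Collect the five requested details into a plain-text report."""
    # Normalise to UTC; a naive datetime is interpreted as local time.
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    lines = [
        f"Time of transfer (UTC): {when.strftime('%Y-%m-%d %H:%M:%S')}",
        f"Client IP address: {client_ip}",
        f"Operating system: {platform.platform()}",
        f"FTP account used: {account}",
        f"Error message: {error_message}",
    ]
    return "\n".join(lines)
```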

Establishing a Center Account with SRA

A center account only needs to be established if you are going to be submitting data year-round on a regular basis and you are prepared to develop a programmatic method of generating XML. The pipeline will need to be kept up to date with our schema updates so that your XML stays valid.

To create a new Center, please provide the following information:

1. suggested center abbreviation (16 char max)
2. center name (full)
3. center URL
4. center mailing address (including country and postcode)
5. phone number (main phone for center or lab)
6. contact person (someone likely to remain at the location for an extended time)
7. contact email (ideally a service account monitored by several people)

Please see the Aspera Transfer guide and scroll down to the “Initiating an Account for Aspera Bulk Transfer for Centers Accounts” section.

Please write to sra@ncbi.nlm.nih.gov for answers to submission questions.
