NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

SRA Handbook [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2010-.

SRA Submission Telemetry

Created: ; Last Update: March 12, 2012.

Overview

This document describes the methods by which submissions can be tracked, both interactively and programmatically.

The submitter is responsible for making sure that all components of a submission have been delivered to NCBI. Although NCBI works to keep the submission process running smoothly and corresponds with submitters about problems, the submitter is responsible for repairing errors and resubmitting replacement components when necessary. The submission telemetry channels give submitters the information and tools they need to complete their submissions correctly and on time.

Submission Model

SRA submissions take place at two levels: metadata, described interactively through web forms or submitted in bulk as XML; and content data, delivered in recognized file formats, either as individual files or packaged in tar archives. Incoming data may be compressed or uncompressed. Metadata and content may arrive at the SRA at different times (asynchronous delivery). The purpose of submission telemetry is to let submitters monitor the progress of their submissions at several points in the submission workflow.

NCBI operates a dual SRA submission interface: interactive and batch.

Interactive Telemetry

Interactive metadata tracking using the submission web tool

The interactive submission tool shows you the status of each component in your submission. For individual account submitters or those who use ftp exclusively, this is the only method for monitoring progress of your metadata submission. Completed submissions are listed under the “Submissions-Published” tab. The submission components will be displayed in green if they have been loaded.

Submissions that need attention are listed under the “Submissions-Attention” tab. The status of each component is color coded as follows:

  • Green – metadata component has been loaded
  • Grey – metadata component has been received but is not linked to any data
  • Red – an error was detected in the metadata component, so it cannot be loaded. Part of the error stream from the load attempt is displayed in the “Comments” column.

Interactive files tracking using the submission web tool

The status of SRA content files can be viewed in the “Tracking” tab of the interactive submission tool, which lists the files that are not yet loaded. The default reporting period is the most recent week; to see another period, set the search date options.

The components will be displayed in green if they have been loaded. Status is color coded as follows:

  • Green – content component has been linked and loaded into the SRA
  • Grey – content component has not been linked to SRA metadata, so it has not been loaded
  • Red – an error was detected in the content component, so it has not been loaded

Limits to interactive telemetry

The submission telemetry displayed in the interactive submission tool may be limited in the following ways:

  • A submission containing more than 1000 components cannot be completely displayed.
  • A large number of submissions cannot be effectively tracked this way because of the large number of web pages that need to be examined.
  • Submissions of high granularity (one component per submission) cannot be effectively tracked this way.

Batch Telemetry

The batch submissions telemetry stream consists of the following objects updated daily:

  • Accessions tab file that shows current status of submitted SRA metadata components
  • Metadata xml annotated with accessions assigned during the submission process
  • Files tab file showing the current state of content data files sent to the SRA
  • Submission area space usage report for both open and protected SRA submission channels

The telemetry stream can provide input to the “roundtrip” processing module of a submitter's laboratory information management system (LIMS). Newly submitted documents can be downloaded to obtain the accessions assigned by NCBI. Documents can be downloaded and compared with internal state to make sure that the version at NCBI is current, or inspected to confirm that specific modification operations succeeded. A LIMS can also track its own submissions to NCBI and generate reports to compare against the submission telemetry stream from NCBI.

Batch accessions status tracking with tab files

The Accessions Report is a list of SRA metadata objects and their status. It is a tab-delimited file called SRA_Accessions, found within the metadata dump files:

upload@ncbi.nlm.nih.gov://asp-<center>/outgoing/Files/NCBI_SRA_Metadata_Full_*_20110101.tar.gz

upload@ncbi.nlm.nih.gov://asp-<center>/outgoing/Files/NCBI_SRA_Metadata_*_20120312.tar.gz

The fields are defined as follows:

| Tag | Definition | Values | Units or meaning |
| --- | --- | --- | --- |
| accession | Accession of the metadata object as assigned by NCBI | SRP, SRS, SRX, SRR, SRZ | |
| submission | Submission containing the metadata object | | |
| status | Status of the metadata object in the archive | live | The object is indexed and available for retrieval. |
| | | suppressed | The object has been removed from indexing but can still be retrieved. This state usually reflects objects that have been superseded by successor objects. |
| | | unpublished | The object has not been published, or it was returned to an unpublished state after being published. |
| | | withdrawn | The object has been redacted from the Archive. This state reflects rare situations where data were inappropriately released and copies in the Archive must be completely removed. |
| updated | Date of the last update of the object | | ISO date |
| published | Date of the initial publication (release) of the object, or of its re-publication | | ISO date |
| received | Date on which the Archive received the data from the submitter | | ISO date |
| type | The object's document type | STUDY, SAMPLE, EXPERIMENT, RUN, ANALYSIS | |
| center | Short name of the submitting center | | |
| visibility | Whether the object has been archived in the open SRA with no usage restrictions (public) or in the controlled-access SRA (usage restrictions in place; users must apply for access to the data). Visibility is orthogonal to the publication or embargo status of the data. | | |
| alias | The submitter's name for the object | | |
| md5sum | MD5 checksum of the metadata object | | Computed in a canonical way; see “How to compare metadata versions” below. |
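For the LIMS “roundtrip” described earlier, a script can read this report and map each submitter alias back to the accession NCBI assigned. The Python sketch below is a minimal example, assuming the file begins with a header row whose column names match the tags above (compared case-insensitively) and that no field contains an embedded tab. The function names and the sample rows are hypothetical, not NCBI data.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, TextIO, Tuple


def read_accessions(stream: TextIO) -> List[Dict[str, str]]:
    """Read a tab-delimited SRA_Accessions report into one dict per row.

    Assumption: the first line is a header naming the columns; names are
    lower-cased so 'Accession' and 'accession' are treated the same.
    """
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = [name.strip().lower() for name in next(reader)]
    return [dict(zip(header, row)) for row in reader if row]


def summarize(
    rows: List[Dict[str, str]],
) -> Tuple[Counter, Dict[str, str], List[Dict[str, str]]]:
    """Return (count per status, alias -> accession, rows that are not live)."""
    by_status = Counter(row.get("status", "") for row in rows)
    alias_to_accession = {
        row["alias"]: row["accession"] for row in rows if row.get("alias")
    }
    not_live = [row for row in rows if row.get("status") != "live"]
    return by_status, alias_to_accession, not_live


# Hypothetical two-row report, for illustration only.
SAMPLE = (
    "Accession\tSubmission\tStatus\tType\tAlias\n"
    "SRX000001\tSRA000001\tlive\tEXPERIMENT\tmy_experiment_1\n"
    "SRR000002\tSRA000001\tunpublished\tRUN\tmy_run_1\n"
)
counts, aliases, pending = summarize(read_accessions(io.StringIO(SAMPLE)))
print(counts, aliases, [row["accession"] for row in pending])
```

On a real download, open the extracted file with open(path, newline="") and pass the handle to read_accessions.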

Batch metadata tracking with xml files

Files of current metadata, annotated with the assigned accessions and any submission-time transformations, are generated monthly, with a daily incremental version. They are deposited into the submitter's open Aspera account:

upload@ncbi.nlm.nih.gov://asp-<center>/outgoing/Files/NCBI_SRA_Metadata_Full_*_20110101.tar.gz

upload@ncbi.nlm.nih.gov://asp-<center>/outgoing/Files/NCBI_SRA_Metadata_*_20120312.tar.gz
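Because both dump files are gzip-compressed tar archives, their contents can be listed without extracting them. This hypothetical Python sketch lists the XML members and locates the SRA_Accessions member. The member names and directory layout in the demo archive are invented for illustration, since the actual layout inside the dumps is not described here.

```python
import io
import tarfile
from typing import List, Optional, Tuple


def scan_metadata_dump(
    path: Optional[str] = None, fileobj=None
) -> Tuple[List[str], Optional[str]]:
    """List XML members and locate the SRA_Accessions member of a .tar.gz dump."""
    xml_members: List[str] = []
    accessions_member: Optional[str] = None
    with tarfile.open(name=path, fileobj=fileobj, mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links, etc.
            base = member.name.rsplit("/", 1)[-1]
            if base.endswith(".xml"):
                xml_members.append(member.name)
            elif base.startswith("SRA_Accessions"):
                accessions_member = member.name
    return sorted(xml_members), accessions_member


def _demo_dump() -> io.BytesIO:
    """Build a tiny in-memory dump with a hypothetical layout, for testing."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in [
            ("SRA000001/SRA000001.experiment.xml", b"<EXPERIMENT_SET/>"),
            ("SRA_Accessions.tab", b"Accession\tStatus\n"),
        ]:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


xml_files, accessions_file = scan_metadata_dump(fileobj=_demo_dump())
print(xml_files, accessions_file)
```

For a downloaded dump, call scan_metadata_dump("path/to/dump.tar.gz") instead of passing an in-memory file object.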

Batch files tracking with tab files

A file describing the current state of content data file transfers and loading is generated monthly, with a daily incremental version. It is deposited into the submitter's open Aspera account. No public version of this file is provided because it is focused on prerelease data.

upload@ncbi.nlm.nih.gov://asp-<center>/outgoing/Files/NCBI_SRA_Files_*_20120312.gz

upload@ncbi.nlm.nih.gov://asp-<center>/outgoing/Files/NCBI_SRA_Files_Full_*_20120229.gz

The tab file has the following format:

| Tag | Definition | Values | Units or meaning |
| --- | --- | --- | --- |
| realm | Whether the data were submitted to the open SRA for eventual unrestricted use, or to the protected SRA for authorized access | Open | After release, data are publicly accessible and may be used without restriction. |
| | | Protected | Data are deposited inside the inner firewall; after release, they are accessible only with authorized access credentials. |
| upload_id | NCBI upload tracking ID, assigned sequentially in order of upload | | |
| upload_date | NCBI upload tracking date. This date is sometimes curated to the original NCBI receipt date rather than the date on which the file entered the system. | | |
| file_name | File name of the extracted file | | |
| file_size | File size of the extracted file | | bytes |
| file_md5sum | MD5 checksum of the extracted file | | |
| upload_name | File name of the upload package in which this file was found (= if same as file_name) | | |
| upload_size | Size of the upload file (= if same as file_size) | | bytes |
| upload_md5sum | MD5 checksum of the upload file (= if same as file_md5sum) | | |
| file_status | Final status of the extracted file | Done | Processing of the file is complete |
| | | Error | An error was encountered |
| | | Failed | Processing of the file failed |
| | | Loaded | Content was loaded into the SRA |
| | | Obsolete | File has been marked as not needed |
| | | Received | File has been received only |
| | | replaced_by_<upload_id> | File has been replaced by another |
| file_type | Computed file type of the extracted file | BAM, BZIP2, DATA, EMPTY, FASTQ, FLI, GZ, HDF5, HTML, MSOFFICE, RAR, SFF, SHELL, SHORTCUT, SRF, SYSTEM, TAR, TEXT, UNKNOWN, XSL, ZIP | |
| load_date | Date of content load | | ISO date |
| file_error | Error message from the file tracking system | bad_chunk_at_offset_<file_offset> | SRF file integrity problem |
| | | bad_read_header_length | SFF file integrity problem |
| | | bunzip2_error | bzip2 decompression error |
| | | copy_error | Internal error |
| | | corrupt_at_offset_<file_offset> | SRF file integrity problem |
| | | corrupt_file | File could not be processed |
| | | Duplicate | Duplicate file |
| | | duplicate_to_upload_<upload_id> | Duplicate file |
| | | empty_file | Zero-length file |
| | | failure_during_copy | Internal error |
| | | file_changed_during_copy | File was written to or removed by the submitter |
| | | gunzip_error | gzip decompression error |
| | | missing_read_data | File does not contain read data |
| | | repeated_file_in_archive | Internal error |
| | | size_changed_during_copy | File was written to or removed by the submitter |
| | | tar_error | untar error |
| | | tar_missing_terminating_blocks | tar file is truncated |
| | | truncated_file | File is truncated |
| | | unzip_error | Decompression error |
| | | upload_file_not_found | Internal error |
| submissions | Submission(s) containing this file | | comma-separated list (CSV) |
| loaded_runs | Loaded run(s) linked to this file | | comma-separated list (CSV) |
| unloaded_runs | Unloaded run(s) linked to this file | | comma-separated list (CSV) |
| suppressed_runs | Suppressed run(s) linked to this file | | comma-separated list (CSV) |
| loaded_analyses | Loaded analysis(es) linked to this file | | comma-separated list (CSV) |
| unloaded_analyses | Unloaded analysis(es) linked to this file | | comma-separated list (CSV) |
| suppressed_analyses | Suppressed analysis(es) linked to this file | | comma-separated list (CSV) |
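A small parser can turn this report into records that a tracking script can act on: it resolves the “=” shorthand in the upload_* columns, splits the comma-separated run and analysis lists, and flags rows that need attention. This Python sketch assumes the decompressed file starts with a header row naming the tags above and that an empty file_error field means no error; both assumptions, the function names, and the sample rows are hypothetical.

```python
import csv
import gzip
import io
from typing import Dict, List, TextIO

# Tags whose value may be "=", meaning "same as" the corresponding file_* tag.
SAME_AS = {"upload_name": "file_name", "upload_size": "file_size",
           "upload_md5sum": "file_md5sum"}
# Tags documented as comma-separated lists (CSV).
LIST_TAGS = ("submissions", "loaded_runs", "unloaded_runs", "suppressed_runs",
             "loaded_analyses", "unloaded_analyses", "suppressed_analyses")


def read_files_report(stream: TextIO) -> List[Dict[str, object]]:
    """Parse a Files report, resolving '=' values and splitting list tags."""
    rows: List[Dict[str, object]] = []
    for raw in csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE):
        row: Dict[str, object] = dict(raw)
        for upload_tag, file_tag in SAME_AS.items():
            if row.get(upload_tag) == "=":
                row[upload_tag] = row.get(file_tag)
        for tag in LIST_TAGS:
            value = row.get(tag) or ""  # missing column or empty cell -> []
            row[tag] = [item for item in str(value).split(",") if item]
        rows.append(row)
    return rows


def needs_attention(row: Dict[str, object]) -> bool:
    """Flag rows whose status is Error/Failed or that carry a file_error value."""
    return row.get("file_status") in ("Error", "Failed") or bool(row.get("file_error"))


def load_files_report(path: str) -> List[Dict[str, object]]:
    """Read a gzip-compressed report such as NCBI_SRA_Files_*_20120312.gz."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as fh:
        return read_files_report(fh)


# Hypothetical rows, for illustration only.
SAMPLE = (
    "realm\tupload_id\tfile_name\tfile_size\tfile_md5sum\tupload_name\t"
    "upload_size\tupload_md5sum\tfile_status\tfile_error\tloaded_runs\n"
    "Open\t101\trun1.bam\t1000\taaa\t=\t=\t=\tLoaded\t\tSRR000001,SRR000002\n"
    "Open\t102\tbad.sff\t10\tbbb\tbatch.tar\t5000\tccc\tError\tbad_read_header_length\t\n"
)
rows = read_files_report(io.StringIO(SAMPLE))
print([row["file_name"] for row in rows if needs_attention(row)])
```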

Batch account space tracking with tab files

Each day a report is compiled showing the quota usage of the submitter's Aspera account and a list of the files currently still located in the account. The report is produced in both the open realm and the protected realm, so retrieve both instances for a complete view of space utilization. If the quota is reached (or about to be reached), you may write to NCBI to request an increase. You may also hold back new submissions until the space “drains out”.

To retrieve the files, use ascp against these addresses:

upload@ncbi.nlm.nih.gov://asp-<center>/outgoing/Files/usage_report.txt

gap-upload@ncbi.nlm.nih.gov://asp-<center>/outgoing/Files/usage_report.txt

Here is a portion of example output for one submitter:

Usage report at Sun Mar 11 23:58:18 EDT 2012 created on cfengine1:
Filesystem            Size  Used Avail Use% Mounted on
panfs://pan1:global   9.1T  191G  9.0T   3% .
******************
List of all files:
.:
total 632
drwxrwsr-x  2 shumwaym trace 151552 Mar 11 23:09 incoming
drwxrwsr-x  4 shumwaym trace   4096 Mar  1 08:12 outgoing
drwxrwxr-x  4 shumwaym trace  90112 Mar  7 09:00 test


./incoming:
total 128110132
-rw-rw-r--  1 asp-bi trace 1789376627 Mar 11 22:55 D0E3KACXX111201.1.tagged_190.v2.bam
-rw-rw-r--  1 asp-bi trace 1693468095 Mar 11 22:43 D0E3KACXX111201.1.tagged_236.v2.bam
-rw-rw-r--  1 asp-bi trace  141840099 Mar 11 22:51 D0E3KACXX111201.1.tagged_289.v2.bam
-rw-rw-r--  1 asp-bi trace 1654940006 Mar 11 22:43 D0E3KACXX111201.1.tagged_332.v2.bam
-rw-rw-r--  1 asp-bi trace 1455232677 Mar 11 23:01 D0E3KACXX111201.1.tagged_34.v2.bam
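After retrieving usage_report.txt from each realm, a script can extract the Use% figure and list the files still waiting in incoming. The Python sketch below assumes the report keeps the layout of the example above (a df-style block followed by ls -l listings per directory, with the df data on a single line); the 80% warning threshold and the function names are hypothetical choices.

```python
from typing import List, Tuple

QUOTA_WARNING_PERCENT = 80  # hypothetical threshold; choose your own


def disk_usage_percent(report: str) -> int:
    """Return the Use% value from the df-style block of a usage_report.txt."""
    lines = report.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("Filesystem") and i + 1 < len(lines):
            fields = lines[i + 1].split()  # Filesystem Size Used Avail Use% Mounted
            return int(fields[4].rstrip("%"))
    raise ValueError("no 'Filesystem' header found in usage report")


def incoming_files(report: str) -> List[Tuple[str, int]]:
    """Return (name, size in bytes) for regular files listed under ./incoming."""
    section = None
    files: List[Tuple[str, int]] = []
    for line in report.splitlines():
        if line.startswith(".") and line.endswith(":"):
            section = line[:-1]  # e.g. "./incoming"
        elif section == "./incoming" and line.startswith("-"):
            fields = line.split(maxsplit=8)  # name may contain spaces
            files.append((fields[8], int(fields[4])))
    return files


SAMPLE = """Usage report at Sun Mar 11 23:58:18 EDT 2012 created on cfengine1:
Filesystem            Size  Used Avail Use% Mounted on
panfs://pan1:global   9.1T  191G  9.0T   3% .
******************
List of all files:
.:
total 632
drwxrwsr-x  2 shumwaym trace 151552 Mar 11 23:09 incoming

./incoming:
total 128110132
-rw-rw-r--  1 asp-bi trace 1789376627 Mar 11 22:55 D0E3KACXX111201.1.tagged_190.v2.bam
-rw-rw-r--  1 asp-bi trace 1693468095 Mar 11 22:43 D0E3KACXX111201.1.tagged_236.v2.bam
"""
used = disk_usage_percent(SAMPLE)
waiting = incoming_files(SAMPLE)
if used >= QUOTA_WARNING_PERCENT:
    print(f"{used}% of quota used; consider asking NCBI for more space")
print(used, len(waiting), sum(size for _, size in waiting))
```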

Tools and Methods

Release check in Entrez

You can use Entrez SRA to check for the appearance of your submission. This works only if your submission has been released (published). To run this check, enter your submission accession or a submission component accession as follows (this example uses SRA025969):

http://www.ncbi.nlm.nih.gov/sra?term=SRA025969

This Entrez SRA search may be limited in the following ways:

  • It can take 1-2 business days before released objects are fully indexed in Entrez.
  • Components that have been released but do not have any data loaded (and also released) will not appear in Entrez.
  • Submissions to the protected SRA for distribution through dbGaP are released by dbGaP and may not appear in Entrez until the data reach the next periodic study release.
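The same release check can be scripted. This Python sketch swaps in NCBI's E-utilities esearch service (db=sra) in place of the web search URL above; the limitations listed above still apply, as do NCBI's E-utilities usage policies such as request rate limits. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# NCBI E-utilities esearch endpoint (swapped in here for the web search URL).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str) -> str:
    """Build an esearch query against the SRA database with JSON output."""
    query = urllib.parse.urlencode({"db": "sra", "term": term, "retmode": "json"})
    return f"{ESEARCH}?{query}"


def hit_count(payload: str) -> int:
    """Extract the number of matching records from an esearch JSON reply."""
    return int(json.loads(payload)["esearchresult"]["count"])


def appears_in_entrez(term: str, timeout: float = 30.0) -> bool:
    """Return True if Entrez SRA finds at least one record for the term."""
    with urllib.request.urlopen(esearch_url(term), timeout=timeout) as reply:
        return hit_count(reply.read().decode("utf-8")) > 0
```

With network access, appears_in_entrez("SRA025969") performs the same check as the URL above.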

Public archive mirror reports

The annotated metadata XML and the Accessions status tab file are available in a public, released form at this address:

http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=show&f=faspftp_metadata&m=downloads&s=download_reports

Note that submissions that have not yet been released are redacted from this report. The information in this report is equivalent to that in Entrez, except that accessions that have been suppressed, or that returned from a published to an unpublished state, are still listed (although their XML is not included in the dump).

How to compare metadata versions

SRA metadata are not versioned in an explicit, public way. Rather, metadata are tagged when they change in a substantive way. A checksum is used to record the current content. One can tell whether the content has changed by comparing the checksum to a previously computed value.

Accession Submission  Status    Md5sum
SRA000001    SRA000001    public        d703b0b98a686a84ff232b9967e3d55c
SRP000057    SRA000001    public        bdac682ff9dca87f158b3d327832ec66
SRR000289    SRA000001    public        2fc12909aff893cdc20c48c0aa875bdf
SRS000246    SRA000001    public        7bf577c49d282529bc5f35f63137e6c0
SRX000068    SRA000001    public        a474043e97911936fe49f06f7a301aa5
SRA000002    SRA000002    public        7810982f118198eaf207a351e1550aa9
SRP000058    SRA000002    public        4c5d2a1c8a7fca885a09d690e49a5d06
SRR000290    SRA000002    public        aed8942276489b55ab98e282725ee920
SRS000247    SRA000002    public        999486234ce2e7420e16f169c2a86578
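Comparing two snapshots of such a report shows which accessions were added, removed, or changed since the last check. This Python sketch keys each row by accession and compares the md5sum column; it assumes a header row containing Accession and Md5sum columns (case-insensitive). The first snapshot copies three rows from the example above, while the altered checksum and added row in the second snapshot are invented for illustration; the function names are hypothetical.

```python
from typing import Dict, List, Optional


def parse_checksums(report: str, sep: Optional[str] = None) -> Dict[str, str]:
    """Map accession -> md5sum; sep=None splits on any whitespace."""
    rows = [line.split(sep) for line in report.splitlines() if line.strip()]
    header = [name.strip().lower() for name in rows[0]]
    acc, md5 = header.index("accession"), header.index("md5sum")
    return {fields[acc]: fields[md5] for fields in rows[1:]}


def diff_checksums(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify accessions as added, removed, or changed between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(a for a in old.keys() & new.keys() if old[a] != new[a]),
    }


OLD = """Accession Submission  Status    Md5sum
SRA000001    SRA000001    public        d703b0b98a686a84ff232b9967e3d55c
SRP000057    SRA000001    public        bdac682ff9dca87f158b3d327832ec66
SRR000289    SRA000001    public        2fc12909aff893cdc20c48c0aa875bdf
"""
# Hypothetical later snapshot: SRR000289's checksum altered, SRS000246 added.
NEW = OLD.replace("2fc12909aff893cdc20c48c0aa875bdf", "0" * 32) + (
    "SRS000246    SRA000001    public        7bf577c49d282529bc5f35f63137e6c0\n"
)
changes = diff_checksums(parse_checksums(OLD), parse_checksums(NEW))
print(changes)
```

For the tab-delimited SRA_Accessions file, pass sep="\t" so that fields are split on tabs rather than on any whitespace.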

The md5sum value is equivalent to writing the xmllint 'noblanks' version of the XML associated with an accession to a file by itself (without a trailing line feed) and running md5sum -b on that file. If there is no metadata difference, no increment is generated for that center on that day.
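The noblanks-then-hash recipe can be approximated in Python for quick local comparisons. This sketch drops whitespace-only text nodes with xml.etree.ElementTree and hashes the compact serialization without a trailing line feed. It is not guaranteed to reproduce NCBI's values byte for byte, because ElementTree and xmllint may serialize empty elements, quoting, and character escapes differently; use getMetaMd5.pl when an exact match matters. The function names are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def drop_blank_text(root: ET.Element) -> None:
    """Remove whitespace-only text and tails, roughly like xmllint --noblanks."""
    for node in root.iter():
        if node.text is not None and not node.text.strip():
            node.text = None
        if node.tail is not None and not node.tail.strip():
            node.tail = None


def approx_metadata_md5(xml_text: str) -> str:
    """MD5 of the compact serialization, with no trailing line feed."""
    root = ET.fromstring(xml_text)
    drop_blank_text(root)
    compact = ET.tostring(root, encoding="unicode")
    return hashlib.md5(compact.encode("utf-8")).hexdigest()


PRETTY = """<EXPERIMENT alias="demo">
  <TITLE>demo title</TITLE>
</EXPERIMENT>
"""
COMPACT = '<EXPERIMENT alias="demo"><TITLE>demo title</TITLE></EXPERIMENT>'
print(approx_metadata_md5(PRETTY) == approx_metadata_md5(COMPACT))  # True
```

Indentation differences therefore do not change the digest, which is the property the canonical checksum relies on.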

On the 1st of every month, a complete metadata dump is created in addition to the incremental dump. The first metadata dump for a new center is both an incremental and a complete dump.
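Given this schedule and the file names shown earlier (full dumps contain Full_, and every name ends in a YYYYMMDD date stamp), a mirroring script only needs the newest full dump plus the incremental dumps dated after it. This Python sketch assumes that naming pattern holds and that a full dump already includes the changes of its own day, so it returns only incrementals dated strictly later; verify both assumptions against your own files. The function name and the sample names are hypothetical.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Assumed naming: NCBI_SRA_Metadata_[Full_]<anything>_<YYYYMMDD>.tar.gz
DUMP_NAME = re.compile(r"^NCBI_SRA_Metadata_(Full_)?.*_(\d{8})\.tar\.gz$")


def plan_downloads(names: Iterable[str]) -> Tuple[Optional[str], List[str]]:
    """Pick the newest full dump plus the incremental dumps dated after it."""
    fulls: List[Tuple[str, str]] = []
    incrementals: List[Tuple[str, str]] = []
    for name in names:
        match = DUMP_NAME.match(name)
        if match is None:
            continue  # not a metadata dump
        bucket = fulls if match.group(1) else incrementals
        bucket.append((match.group(2), name))  # YYYYMMDD sorts correctly as text
    if not fulls:
        return None, [name for _, name in sorted(incrementals)]
    full_date, full_name = max(fulls)
    return full_name, [name for day, name in sorted(incrementals) if day > full_date]


listing = [
    "NCBI_SRA_Metadata_Full_BI_20120101.tar.gz",
    "NCBI_SRA_Metadata_BI_20120110.tar.gz",
    "NCBI_SRA_Metadata_Full_BI_20120201.tar.gz",
    "NCBI_SRA_Metadata_BI_20120201.tar.gz",
    "NCBI_SRA_Metadata_BI_20120215.tar.gz",
    "NCBI_SRA_Metadata_BI_20120312.tar.gz",
]
full, increments = plan_downloads(listing)
print(full, increments)
```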

Obtain a copy of the script used to compute MD5 values for each accession chunk in a metadata XML file:

wget ftp://ftp-trace.ncbi.nlm.nih.gov/sra/utilities/getMetaMd5.pl

The usage is:

getMetaMd5.pl <metadata XML file path (recognized by its .xml extension)>

       OR

getMetaMd5.pl <file containing a list of metadata XML file paths>

       OR

<list of metadata XML file paths> | getMetaMd5.pl

You can therefore provide the path to a single .xml file, the path to a file containing a list of XML file paths, or pipe a list of XML file paths to the script.
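The three input modes can be mirrored in a hypothetical Python helper, for example when building a wrapper that feeds paths to the script. This sketch follows the rule stated above: an argument ending in .xml is a single file, any other argument is read as a file listing paths, and no argument means paths arrive on standard input. It illustrates that rule; it is not the script's actual code.

```python
import io
import sys
from pathlib import Path
from typing import List, Optional, TextIO


def resolve_xml_paths(arg: Optional[str], stdin: TextIO = sys.stdin) -> List[str]:
    """Turn the single argument (or stdin) into a list of XML paths."""
    if arg is None:
        lines = stdin.read().splitlines()  # paths piped in
    elif arg.endswith(".xml"):
        return [arg]  # a single metadata XML file
    else:
        lines = Path(arg).read_text().splitlines()  # a file listing paths
    return [line.strip() for line in lines if line.strip()]


single = resolve_xml_paths("SRA000001.experiment.xml")
piped = resolve_xml_paths(None, stdin=io.StringIO("a.xml\n\n b.xml \n"))
print(single, piped)
```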

The getMetaMd5.pl script runs on Linux and requires

  • perl to be at /usr/local/bin/perl (version 5.8.3 or higher),
  • xmllint in your executable path (libxml 20630 or higher),
  • xsltproc in your executable path (version 10102 or higher),
  • Digest::MD5, a perl library for calculating md5 sums, and,
  • parseMeta.xsl, to be in the same directory as the getMetaMd5.pl executable.
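Before running the script, a quick check can confirm that its prerequisites are present. This Python sketch tests only for presence (it does not check the version numbers listed above) and tries to load Digest::MD5 with perl -MDigest::MD5 -e 1. The function name and the check labels are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Dict


def preflight(script_dir: str = ".") -> Dict[str, bool]:
    """Check getMetaMd5.pl prerequisites; presence only, versions are not checked."""
    checks = {
        "perl at /usr/local/bin/perl": Path("/usr/local/bin/perl").is_file(),
        "xmllint on PATH": shutil.which("xmllint") is not None,
        "xsltproc on PATH": shutil.which("xsltproc") is not None,
        "parseMeta.xsl beside script": (Path(script_dir) / "parseMeta.xsl").is_file(),
    }
    # Prefer the perl the script expects; fall back to whatever is on PATH.
    local = "/usr/local/bin/perl"
    perl = local if checks["perl at /usr/local/bin/perl"] else shutil.which("perl")
    # 'perl -MDigest::MD5 -e 1' exits 0 only if the module can be loaded.
    checks["Digest::MD5 loadable"] = perl is not None and subprocess.run(
        [perl, "-MDigest::MD5", "-e", "1"], capture_output=True
    ).returncode == 0
    return checks


for name, ok in preflight().items():
    print("ok" if ok else "MISSING", name)
```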

How to view files you have uploaded to your aspera account

To log in to NCBI servers with limited shell access, a submitter must use their secret key (the key usually used with ascp for file transfer).

For access from Unix, Linux, or macOS, the secret key must be in OpenSSH format. Use the ssh command as follows (where zzz is your center name):

For open SRA account:

 ssh -i secretkey.openssh asp-zzz@upload.ncbi.nlm.nih.gov

For protected SRA account:

 ssh -i secretkey.openssh asp-zzz@gap-upload.ncbi.nlm.nih.gov

For similar access from Windows, the key must be in PuTTY format and the putty.exe command should be used, as follows (where zzz is your center name):

For open SRA account:

 putty.exe -i secretkey.ppk asp-zzz@upload.ncbi.nlm.nih.gov

For protected SRA account:

 putty.exe -i secretkey.ppk asp-zzz@gap-upload.ncbi.nlm.nih.gov

This limited shell has aspsh> as its prompt and allows only a few commands, such as ls, cp, mv, and rm. The cd command is not allowed, so you must pass the directory name to ls as an argument.

Examples:

aspsh> ls -l
total 240
drwxrwsr-x  2 5608 trace 65536 May  7 10:08 incoming
drwxrwsr-x  3 5608 trace  4096 Apr 15  2009 outgoing
drwxrwsr-x  2 5608 trace  8192 Apr 27 20:04 test


aspsh> ls -l analysis
total 0


aspsh> ls -l incoming
total 15663023352
-rw-rw-r--  1 asp-zzz trace   16504539868 May  6 10:06 0083_20090930_2_SP_ANG_HSAP_NG_005sA_01003244491_4.srf
…
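When many files are involved, pasting the ls -l output from an aspsh session into a script is easier than reading it by eye. This Python sketch parses lines in the format shown above, skipping prompts and total lines; it assumes the standard nine-column ls -l layout, and the names are hypothetical.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    name: str
    size: int
    is_dir: bool


def parse_ls_l(listing: str) -> List[Entry]:
    """Parse 'ls -l' lines copied from an aspsh session; skips prompts and totals."""
    entries: List[Entry] = []
    for line in listing.splitlines():
        fields = line.split(maxsplit=8)  # the 9th field keeps names with spaces
        if len(fields) < 9 or fields[0][0] not in "-d":
            continue  # prompt, 'total N', blank, or non-file line
        entries.append(Entry(fields[8], int(fields[4]), fields[0].startswith("d")))
    return entries


SESSION = """aspsh> ls -l incoming
total 15663023352
-rw-rw-r--  1 asp-zzz trace   16504539868 May  6 10:06 0083_20090930_2_SP_ANG_HSAP_NG_005sA_01003244491_4.srf
"""
files = [e for e in parse_ls_l(SESSION) if not e.is_dir]
print(len(files), sum(e.size for e in files))
```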