SRA XML Schema 1.3 Release Notes

Draft C – 14 Jul 2011

Last Update: August 11, 2011.

Active Date: 2011-08-11
Inactive Date

1. Overview

This document summarizes the proposed changes for Release 1.3 of the Sequence Read Archive (SRA) schemas governing XML metadata. These schemas will be used by the SRA instances and have been developed under the auspices of the International Nucleotide Sequence Database Collaboration (INSDC).

Release 1.3 is an update to Release 1.2, which was introduced in October 2010. While the schemas are incompatible, all data have been migrated so that documents submitted or modified before the release remain valid. The goal of this release is to update choices, introduce new features, and specify an Analysis object usable for BAM file submissions. These changes are being introduced with the objective of not invalidating any currently valid XML documents.

Major new features in this release are:

  • Add new instrument models
  • Require certain fields that have already been migrated
  • Allow modification of already-loaded analysis objects

1.1. Notice

The features and modalities described in the XML schema DO NOT constitute a statement of features and mechanisms available in the SRA. Schema changes must frequently precede the actual implementation. New feature rollouts and functionality changes are made asynchronously with XML schema changes. Each SRA implementation by INSDC partners may impose additional business rules not reflected in the schema.

1.2. Related Documents

The SRA schema for this release can be obtained from this site:

1.3. Revision History

Draft C - 2011-07-14: for approval by INSDC partners

2. Explanation of Changes

2.1. Changes to All Documents

2.1.1. Adjustment to import statements

All documents importing SRA.common.xsd now point to a resolvable URL:

Note that this will be implemented on the final distribution copy of the schema files.

2.2. Changes to SRA.common.xsd

2.2.1. Add new instrument models

New instrument values have been added to the Platform block:

  • "Illumina HiSeq 1000" [Illumina]
  • "Illumina MiSeq" [Illumina]
  • "AB SOLiD 5500xl SOLiD System"
  • "AB SOLiD 5500 SOLiD System"
  • "PacBio RS" [Pacific Biosciences]
  • "Complete Genomics" [Complete Genomics] (platform already exists)
  • ION_TORRENT (new platform), with instrument model:
    • "Ion Torrent PGM"

2.2.2. Remove deprecated instrument models

  • Solexa 1G Genome Analyzer (use Illumina choices)
  • GS 20 (use 454 GS 20)
  • GS FLX (use 454 GS FLX)
  • 454 Titanium (use 454 GS FLX Titanium)

2.2.3. Add GapDescriptor

A new structure called the GapDescriptor is introduced to encode the placement of spot subsequences (tags) against a reference or assembly substrate. This structure encodes mate pair gaps and tandem read gaps. Gap distances can be expressed in three ways: as a mean and standard deviation, as a min-max range, or as a histogram. Orientation of the tag pairs can be described as “innie”, “outie”, “normal”, and strand-opposite “anti-normal”, following the nomenclature of the Celera Assembler.

Introduction of the GapDescriptor element was motivated by the need to describe Complete Genomics platform sequencing. It is also intended that the GapDescriptor replace the LIBRARY_LAYOUT element in the LibraryType. The GapDescriptor can be specified at the level of Run in order to override any general settings at the level of Experiment.
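The three gap-distance encodings and four orientations described above can be pictured as a small tagged union. The following Python sketch is illustrative only: the class and field names, the choice of bases as the distance unit, and the typical_distance helper are all assumptions made for this example and are not taken from the schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Union


class Orientation(Enum):
    # The four orientations named in these release notes.
    INNIE = "innie"
    OUTIE = "outie"
    NORMAL = "normal"
    ANTI_NORMAL = "anti-normal"


@dataclass
class MeanStdDev:
    mean: float
    stddev: float


@dataclass
class MinMaxRange:
    minimum: int
    maximum: int


@dataclass
class Histogram:
    # Maps a gap distance (assumed to be in bases) to an observation count.
    counts: Dict[int, int]


@dataclass
class Gap:
    # Hypothetical container: one orientation plus exactly one distance encoding.
    orientation: Orientation
    distance: Union[MeanStdDev, MinMaxRange, Histogram]


def typical_distance(gap: Gap) -> float:
    """Return one representative distance for any of the three encodings."""
    d = gap.distance
    if isinstance(d, MeanStdDev):
        return d.mean
    if isinstance(d, MinMaxRange):
        return (d.minimum + d.maximum) / 2
    # Histogram: weighted mean of the observed distances.
    total = sum(d.counts.values())
    return sum(dist * n for dist, n in d.counts.items()) / total
```

Modeling the distance as a union keeps the "exactly one of three encodings" constraint explicit, which is presumably what a choice group in the schema would express.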

2.2. Changes to SRA Experiment

2.2.2. New Library Source choice METATRANSCRIPTOMIC

This was requested by EBI.

2.2.3. Removed deprecated library strategy choice BARCODE

This change was requested by EBI. No records have this designation.

2.3. Changes to Study

2.3.1. The RELATED_STUDIES/STUDY block removed

In preparation for migration to BioProjects, this deprecated block has been removed.

2.4. Changes to Sample

2.4.1. TAXON_ID now required

The TAXON_ID field in the SAMPLE_NAME block is now required. All records already have this.

2.5. Changes to Submission

2.5.1. Submission handle removed

The deprecated SUBMISSION/@handle attribute has been removed.

2.5.2. Submission submission_id removed

The deprecated SUBMISSION/@submission_id attribute has been removed.

2.5.3. PROTECT action is now a complex type

This is a technical improvement requested by a major submitter.

2.6. Changes to Run

2.6.1. Replicated descriptors at Run level

  • Replicated GAP_DESCRIPTOR at the level of Run. If specified at Run, it will override the setting at the level of Experiment.
  • Replicated SPOT_DESCRIPTOR at the level of Run. If specified at Run, it will override the setting at the level of Experiment.
  • Replicated PLATFORM descriptor at the level of Run. If specified at Run, it will override the setting at the level of Experiment.
  • Replicated PROCESSING descriptor at the level of Run. If specified at Run, it will override the setting at the level of Experiment.
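The override rule shared by these four descriptors can be sketched as a simple fallback lookup. This is a minimal illustration using Python's standard xml.etree.ElementTree module. The XML fragments are hypothetical and heavily simplified (they are not schema-valid), and the effective_descriptor helper is an assumption made for this example, not part of any SRA tool.

```python
import xml.etree.ElementTree as ET

# Descriptor names taken from the list above.
OVERRIDABLE = ("GAP_DESCRIPTOR", "SPOT_DESCRIPTOR", "PLATFORM", "PROCESSING")


def effective_descriptor(run, experiment, name):
    """Return the Run-level descriptor if present, else the Experiment-level one."""
    if name not in OVERRIDABLE:
        raise ValueError(f"{name} is not replicated at Run level")
    found = run.find(name)
    if found is None:  # explicit None check: childless elements are falsy
        found = experiment.find(name)
    return found


# Hypothetical, simplified documents for illustration only.
experiment = ET.fromstring(
    "<EXPERIMENT><PLATFORM><ILLUMINA/></PLATFORM>"
    "<PROCESSING>experiment-level</PROCESSING></EXPERIMENT>"
)
run = ET.fromstring("<RUN><PLATFORM><ION_TORRENT/></PLATFORM></RUN>")

platform = effective_descriptor(run, experiment, "PLATFORM")
processing = effective_descriptor(run, experiment, "PROCESSING")
print(platform[0].tag)   # ION_TORRENT: the Run-level setting wins
print(processing.text)   # experiment-level: no Run-level override given
```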

2.6.2. Require checksum and checksum method

The DATABLOCK/FILES/FILE/@checksum and @checksum_method are now required attributes.
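A submitter could check for these attributes before submission along the following lines. This is a sketch under stated assumptions: the element path follows the spelling used in this section, while the filename attribute, the checksum values, and the "MD5" method value are made up for illustration.

```python
import xml.etree.ElementTree as ET

REQUIRED_FILE_ATTRS = ("checksum", "checksum_method")


def missing_checksum_attrs(run):
    """List (filename, attribute) pairs for FILE elements lacking a required attribute."""
    problems = []
    for file_el in run.findall("./DATABLOCK/FILES/FILE"):
        for attr in REQUIRED_FILE_ATTRS:
            if not file_el.get(attr):  # absent or empty both count as missing
                problems.append((file_el.get("filename"), attr))
    return problems


# Hypothetical Run fragment; filenames and checksum values are invented.
run = ET.fromstring("""
<RUN>
  <DATABLOCK>
    <FILES>
      <FILE filename="a.bam" checksum="0123abcd" checksum_method="MD5"/>
      <FILE filename="b.bam" checksum="4567ef01"/>
    </FILES>
  </DATABLOCK>
</RUN>
""")

print(missing_checksum_attrs(run))  # [('b.bam', 'checksum_method')]
```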

2.6.3. New Filetype Choice

The filetype option "PacBio_HDF5" has been created to support the native loader for PacBio.

2.6.4. Old Filetype removed

The filetype option "_seq.txt, _prb.txt, _sig2.txt, _qhg.txt" has been eliminated in favor of "Illumina_native".

2.7. Changes to Analysis

2.7.1. DATA_BLOCK not required for modification

The DATA_BLOCK is now required for add submissions, but no longer for modify submissions.
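The add/modify distinction can be expressed as a small conditional check. The sketch below is illustrative only: the lowercase action strings and the data_block_ok helper are assumptions made for this example, and the ANALYSIS fragment is a hypothetical, simplified document.

```python
import xml.etree.ElementTree as ET


def data_block_ok(analysis, action):
    """DATA_BLOCK is required when adding an Analysis, optional when modifying one."""
    has_block = analysis.find("DATA_BLOCK") is not None
    if action == "add":
        return has_block
    if action == "modify":
        return True
    raise ValueError(f"unsupported action: {action}")


# Hypothetical metadata-only update that carries no DATA_BLOCK.
analysis = ET.fromstring("<ANALYSIS><TITLE>Updated title</TITLE></ANALYSIS>")

print(data_block_ok(analysis, "modify"))  # True
print(data_block_ok(analysis, "add"))     # False
```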

3. Deprecated Fields

SRA 1.3 contains the following fields, branches, and options that should no longer be used in current submissions.

Document            Field                                                              Use instead
SRA.common.xsd      '454 Titanium'                                                     use '454 GS FLX Titanium'
SRA.common.xsd      'GS 20'                                                            use '454 GS 20'
SRA.common.xsd      'GS FLX'                                                           use '454 GS FLX'
SRA.common.xsd      'Solexa 1G Genome Analyzer'                                        use 'Illumina Genome Analyzer'
SRA.experiment.xsd  LIBRARY_STRATEGY/BARCODE                                           use another library strategy
SRA.experiment.xsd  @expected_number_reads; '_seq.txt, _prb.txt, _sig2.txt, _qhg.txt'  use 'Illumina_native' instead; PLATFORM/INSTRUMENT_MODEL instead
SRA.submission.xsd  @submission_id                                                     use alias instead
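
The instrument-model replacements listed above lend themselves to a simple lookup when updating older documents. The mapping in this sketch is copied from the table. The current_instrument_model helper is a hypothetical name introduced for illustration.

```python
# Replacements copied from the deprecation table in these release notes.
DEPRECATED_INSTRUMENT_MODELS = {
    "454 Titanium": "454 GS FLX Titanium",
    "GS 20": "454 GS 20",
    "GS FLX": "454 GS FLX",
    "Solexa 1G Genome Analyzer": "Illumina Genome Analyzer",
}


def current_instrument_model(name):
    """Return the replacement for a deprecated model name, or the name unchanged."""
    return DEPRECATED_INSTRUMENT_MODELS.get(name, name)


print(current_instrument_model("GS FLX"))          # 454 GS FLX
print(current_instrument_model("Illumina MiSeq"))  # Illumina MiSeq
```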

4. Future Planned Revisions

The next revision is anticipated to be a contracting revision (one that potentially invalidates current documents). The main changes will be to remove deprecated fields. This will involve migration of data in anticipation of future schema changes.

