Download SRA sequence data using Amazon Web Services (AWS)

SRA Data in the AWS Registry of Open Data

Amazon Web Services publicly hosts SRA data through the Registry of Open Data. SRA has several datasets in the AWS Registry of Open Data, all of which can be accessed free of charge through either an HTTPS or S3 URL. One dataset contains public SRA data in the originally submitted format from select high-value and newly released studies. The second dataset acts as a centralized repository of SARS-CoV-2-related sequences submitted to NCBI. It includes both the original files submitted by the principal investigator and SRA-processed sequences (normalized sequence files and SRA aligned read format files) that require the SRA Toolkit for analysis. This dataset also includes metadata searchable in AWS Athena by BLAST result, taxonomic analysis, and more, allowing rapid discovery of the data most relevant to your research.

Coronaviridae Datasets

  • Runs directory contains normalized sequence data, accessible in multiple formats (fastq, sam, fasta) via the SRA Toolkit and organized by Run accession.
  • sra-src directory contains the submitted sequence files in their original format, organized by Run accession.
  • RAO directory contains the SRA aligned reads file for each Run, organized by Run accession.
  • VCF directory contains SRA generated VCF files, organized by Run accession.

Public user-submitted files

  • Contains submitted sequence files in their original format, organized by Run accession.

Accessing SRA Data in AWS

If you know the Run accessions you are interested in, you can access the data in several ways. To download files from the AWS Console using a browser, visit the HTTPS URL for the Coronaviridae dataset and the public SRA files, respectively:

  • https://s3.console.aws.amazon.com/s3/buckets/sra-pub-sars-cov2
  • https://s3.console.aws.amazon.com/s3/buckets/sra-pub-src-2

From there, you can navigate the directory structure using the graphical interface and search a given directory for your accession of interest using the search box near the top of the page. Once you have navigated to a specific file, you can click the Object URL link, or use the Object actions button to copy the file to your own S3 bucket or download a copy to local storage.

To access files from within AWS, e.g. from an EC2 instance, you can use the AWS CLI to perform an S3 copy or sync, using a command like this:

aws s3 cp s3://sra-pub-sars-cov2/README.txt .
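
The copy and sync operations mentioned above can be sketched as a small shell script. This is only an illustrative sketch: the helper names (run, copy_object, sync_prefix) and the DRY_RUN variable are hypothetical, and only the bucket name and README.txt come from this page. By default the script prints the aws commands instead of running them, so they can be reviewed first.

```shell
# Sketch: copy one object, or mirror a whole prefix, from the public SRA bucket.
# DRY_RUN=1 (the default here) only prints the aws commands.
set -eu

BUCKET="s3://sra-pub-sars-cov2"   # bucket name taken from this page
DRY_RUN="${DRY_RUN:-1}"

run() {
  # Print the command in dry-run mode, otherwise execute it.
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

copy_object() {
  # $1 = key inside the bucket, $2 = local destination (defaults to the current directory)
  run aws s3 cp "$BUCKET/$1" "${2:-.}"
}

sync_prefix() {
  # $1 = prefix inside the bucket, $2 = local directory to mirror into
  run aws s3 sync "$BUCKET/$1" "$2"
}

copy_object README.txt
```

Setting DRY_RUN=0 runs the commands for real. If no AWS credentials are configured on the machine, adding the AWS CLI's --no-sign-request option to the aws s3 commands is one way to request public objects anonymously.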

These data can also be accessed using various other tools and libraries. Access to files in the AWS Registry of Open Data is free. This is true whether you use the HTTPS or S3 URL. For S3 URLs, the transfer is free even if it crosses an AWS region boundary; there is no inter-regional data transfer fee.

If you don't know the Run accessions you are interested in, you can start by searching in the SRA Run Selector, AWS Athena, or SRA Entrez.
A full list of Coronaviridae-containing SRA runs, as detected with NCBI's k-mer analysis tool, is available here: ftp://ftp.ncbi.nlm.nih.gov/sra/reports/AccList/ .

Introduction for First Time Users

Amazon Elastic Compute Cloud (EC2) is the Amazon Web Service you use to create and run virtual machines in the cloud. AWS calls these virtual machines 'instances'. You will need to install your bioinformatics tools for data analysis and the SRA Toolkit for accessing the SRA data.

Creating an AWS Instance

Note: Users are responsible for setting up and managing their own AWS accounts.
Please work with your organization for credential and billing questions. If you are using a personal account, this guide attempts to stay within the AWS Free Tier for users who are still eligible.

Note: Users of this guide are expected to have experience using a Unix command-line interface.

Sign In and Enter the Amazon EC2 Console

Sign in using your AWS account: Amazon AWS Console.

Create an AWS Instance

Please follow this Amazon step-by-step guide, which will help you launch a Linux virtual machine on Amazon EC2 within the AWS Free Tier.
Please make sure to create your EC2 instance in the US East (N. Virginia) us-east-1 region.

Connect to the Instance

Use either a Unix/macOS terminal or your preferred SSH application to connect, in the same way as in the Amazon tutorial linked above. The username for this AMI is ec2-user.
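
As a rough illustration of the connection step, the sketch below builds the ssh command line for the ec2-user account. The key file name and host name are hypothetical placeholders, not values from this guide; substitute the key pair you created and the public DNS name shown for your instance in the EC2 console.

```shell
# Sketch: assemble the ssh command used to log in to the instance.
set -eu

# Hypothetical placeholders: replace with your own key file and instance address.
KEY_FILE="${KEY_FILE:-my-ec2-key.pem}"
EC2_HOST="${EC2_HOST:-ec2-203-0-113-10.compute-1.amazonaws.com}"
EC2_USER="ec2-user"   # username stated in this guide for the AMI

ssh_command() {
  # Print the command so it can be checked before being run.
  printf 'ssh -i %s %s@%s\n' "$KEY_FILE" "$EC2_USER" "$EC2_HOST"
}

ssh_command
```

ssh generally refuses private key files that other users can read, so restricting the key first (for example with chmod 400 on the key file) is usually needed.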

Terminate the Instance

  • Remember to terminate the EC2 instance from the AWS console when you have finished using it. If you do not terminate the instance, charges can be generated on your account even when no users are connected.
  • Data stored on the EC2 instance will be deleted when the instance is terminated. Users will likely want persistent S3 storage for the results of their work.
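
One possible way to combine the two points above is to copy results to S3 first and only then terminate the instance. The sketch below assumes a results directory, a destination bucket that you own, and an instance ID, all of which are hypothetical placeholders; the helper names are also illustrative. It prints the commands by default rather than running them.

```shell
# Sketch: save results to your own S3 bucket, then terminate the instance.
# DRY_RUN=1 (the default here) only prints the aws commands.
set -eu

DRY_RUN="${DRY_RUN:-1}"

run() {
  # Print the command in dry-run mode, otherwise execute it.
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

save_and_terminate() {
  # $1 = local results directory
  # $2 = destination S3 URI in a bucket you own (hypothetical)
  # $3 = EC2 instance ID (hypothetical)
  run aws s3 sync "$1" "$2"
  # Because of 'set -e', a failed sync stops the script before termination.
  run aws ec2 terminate-instances --instance-ids "$3"
}

# Example (placeholders):
# save_and_terminate ./results s3://my-results-bucket/run1/ i-0123456789abcdef0
```

Running the sync before the terminate call means a failed upload leaves the instance, and the data on it, in place.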

The SRA Toolkit in AWS

Installing the SRA Toolkit on Your Instance

Once you are connected, you will be working in a Unix-like command-line environment where you can install and configure the SRA Toolkit.
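
A minimal installation sketch follows. It assumes the toolkit is distributed as a pre-built Linux tarball at the URL below and that the archive unpacks into a single versioned directory containing a bin/ folder; both are assumptions to verify against the SRA Toolkit download page. The function name and install location are hypothetical.

```shell
# Sketch: download a pre-built SRA Toolkit tarball and put its bin/ on PATH.
set -eu

# Assumption: URL of the current 64-bit Linux build; confirm before use.
TOOLKIT_URL="${TOOLKIT_URL:-https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-centos_linux64.tar.gz}"
INSTALL_DIR="${INSTALL_DIR:-sratoolkit}"   # hypothetical install location

install_sra_toolkit() {
  mkdir -p "$INSTALL_DIR"
  # --strip-components=1 drops the versioned top-level directory of the archive.
  curl -fsSL "$TOOLKIT_URL" | tar -xz -C "$INSTALL_DIR" --strip-components=1
  # Make the tools available in the current shell session only.
  PATH="$INSTALL_DIR/bin:$PATH"
  export PATH
}

# Example: install_sra_toolkit
```

The PATH change lasts only for the current shell session; adding an equivalent export line to your shell profile makes it persistent. Any further toolkit configuration should follow the toolkit's own documentation.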

Using the SRA Toolkit in AWS

  • To download public SRA data from our cloud buckets to your cloud storage, you can use the SRA Toolkit utilities as described in the SRA Download Guide.
  • To download dbGaP data from our cloud buckets to your cloud storage, you need to use a jwt.cart file as described in Downloading dbGaP data with JWT.

Don't forget to STOP your instance after you have finished your work!
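
For public data, a typical toolkit workflow might look like the sketch below, which walks a list of Run accessions and calls prefetch followed by fasterq-dump for each one. The accession list file, output directory, and helper names are hypothetical; the SRA Download Guide remains the reference for the toolkit's actual options. The commands are printed rather than run by default, and dbGaP downloads are not covered here.

```shell
# Sketch: fetch each Run in an accession list and convert it to FASTQ.
# DRY_RUN=1 (the default here) only prints the toolkit commands.
set -eu

DRY_RUN="${DRY_RUN:-1}"

run() {
  # Print the command in dry-run mode, otherwise execute it.
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

fetch_runs() {
  # $1 = text file with one Run accession per line (hypothetical)
  # $2 = directory for the FASTQ output
  while IFS= read -r acc; do
    [ -n "$acc" ] || continue          # skip blank lines
    run prefetch "$acc"                # download the Run
    run fasterq-dump "$acc" --outdir "$2"   # convert it to FASTQ
  done < "$1"
}

# Example (placeholders): fetch_runs accessions.txt ./fastq
```

Printing the commands first makes it easy to check the accession list before committing to potentially large downloads.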

YouTube Video Tutorial - Setting up AWS - demo

Engage

NCBI wants your feedback on SRA in the Cloud. Contact sra@ncbi.nlm.nih.gov with questions or if you would like to provide input on new functionality.

Last updated: 2021-02-19T16:50:31Z