Get Started in Athena

Setup

To get started with Athena, you will need an Amazon Web Services (AWS) account. Please follow the AWS-provided tutorial to become familiar with:

Make sure to create the S3 bucket that will store your query results in the us-east-1 region.
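
As a minimal sketch, assuming the boto3 SDK is installed and AWS credentials are configured, the results bucket can also be created from Python. The bucket name is a hypothetical placeholder and must be globally unique. Because us-east-1 is the default S3 region, no location constraint is passed.

```python
# Sketch only: create the S3 bucket for Athena query results in us-east-1.
# Assumes boto3 and AWS credentials; bucket names here are hypothetical.

REGION = "us-east-1"


def create_results_bucket(s3_client, bucket_name: str) -> dict:
    """Create `bucket_name` and return the CreateBucket response.

    us-east-1 is the default S3 region, so no CreateBucketConfiguration
    (LocationConstraint) is passed for it.
    """
    return s3_client.create_bucket(Bucket=bucket_name)


def results_location(bucket_name: str, prefix: str = "athena-results/") -> str:
    """Build the s3:// URI to configure as Athena's query result location."""
    return f"s3://{bucket_name}/{prefix}"


# Usage (requires AWS access, so not run here):
#   import boto3
#   s3 = boto3.client("s3", region_name=REGION)
#   create_results_bucket(s3, "my-sra-athena-results")  # hypothetical name
#   results_location("my-sra-athena-results")
#   # -> "s3://my-sra-athena-results/athena-results/"
```

The returned URI is what you would enter as the query result location in Athena.
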
We recommend using AWS Glue to create the tables from the bucket. To create the tables, you need to provide the S3 location of the metadata. SRA provides data in two locations:

  1. Coronaviridae dataset in the AWS Public Dataset Program: s3://sra-pub-sars-cov2-metadata-us-east-1/v2/
  2. Entire SRA metadata: s3://sra-pub-metadata-us-east-1
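
A minimal sketch of the crawler setup with boto3, assuming you already have an IAM role that AWS Glue can assume. The crawler name, role ARN, and database name are hypothetical placeholders; only the two S3 paths come from the list above.

```python
# Sketch only: create and start an AWS Glue crawler over an SRA metadata location.
# Assumes boto3; crawler name, role ARN, and database name are hypothetical.

SRA_METADATA_ROOTS = {
    "coronaviridae": "s3://sra-pub-sars-cov2-metadata-us-east-1/v2/",
    "all_sra": "s3://sra-pub-metadata-us-east-1",
}


def crawler_params(name: str, role_arn: str, database: str, s3_path: str) -> dict:
    """Keyword arguments for the Glue CreateCrawler API call."""
    return {
        "Name": name,
        "Role": role_arn,          # IAM role the crawler runs as
        "DatabaseName": database,  # Glue database that receives the discovered tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(glue_client, params: dict) -> None:
    """Register the crawler, then trigger its first run."""
    glue_client.create_crawler(**params)
    glue_client.start_crawler(Name=params["Name"])


# Usage (requires AWS access, so not run here):
#   import boto3
#   glue = boto3.client("glue", region_name="us-east-1")
#   params = crawler_params(
#       "sra-cov2-crawler",                                  # hypothetical
#       "arn:aws:iam::123456789012:role/MyGlueCrawlerRole",  # hypothetical
#       "sra_cov2",                                          # hypothetical
#       SRA_METADATA_ROOTS["coronaviridae"],
#   )
#   create_and_start_crawler(glue, params)
```

Separating parameter construction from the API calls makes it easy to inspect the crawler definition before anything is created in your account.
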

AWS Glue charges a small fee based on the number of tables in the catalog and on how long the crawler takes to find all the datasets. The crawler charge is generally less than $1.
With the AWS Glue Data Catalog, you can store up to a million objects for free. An object in the AWS Glue Data Catalog is a table, table version, partition, or database. The first million access requests to the AWS Glue Data Catalog per month are free.
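
The free-tier arithmetic above can be sketched as follows. The one-million allowances come from this page; the charges beyond them are parameters you must take from the current AWS Glue pricing page, and the billing units used here (per 100,000 objects, per million requests) are our assumption.

```python
# Sketch only: monthly AWS Glue Data Catalog charge above the free tier.
# The free allowances come from the text; the rates are caller-supplied
# parameters, and the per-100,000-object / per-million-request units are assumptions.

FREE_OBJECTS = 1_000_000             # objects stored free
FREE_REQUESTS_PER_MONTH = 1_000_000  # access requests free each month


def billable(units: int, free_units: int) -> int:
    """Units above the free allowance (never negative)."""
    return max(0, units - free_units)


def catalog_monthly_cost(objects: int, requests: int,
                         rate_per_100k_objects: float,
                         rate_per_million_requests: float) -> float:
    """Estimated monthly Data Catalog charge beyond the free tier."""
    storage = billable(objects, FREE_OBJECTS) / 100_000 * rate_per_100k_objects
    access = billable(requests, FREE_REQUESTS_PER_MONTH) / 1_000_000 * rate_per_million_requests
    return storage + access


# A small catalog (a few dozen tables) stays inside both allowances,
# whatever placeholder rates are used:
#   catalog_monthly_cost(50, 10_000, rate_per_100k_objects=1.0,
#                        rate_per_million_requests=1.0)  # -> 0.0
```
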
Alternatively, you can create the database and add the tables manually.
The table definitions are available in SRA Aligned Read Format Table Definitions.
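
If you create the database and tables manually, the Athena DDL statements can be generated as in the sketch below. The database name, the Parquet storage format, and the example columns are assumptions; take the real column list and types from the table definitions.

```python
# Sketch only: Athena DDL for a manually created database and external table.
# Assumptions: Parquet files; the database name and example columns are
# hypothetical, so take real columns/types from the SRA table definitions.
from typing import List, Tuple


def create_database_sql(database: str) -> str:
    """DDL that creates the database if it does not already exist."""
    return f"CREATE DATABASE IF NOT EXISTS {database}"


def create_table_sql(database: str, table: str,
                     columns: List[Tuple[str, str]], location: str) -> str:
    """DDL for an external table over the files under `location`."""
    # Backticks quote column names, which Athena DDL statements accept.
    cols = ",\n  ".join(f"`{name}` {col_type}" for name, col_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {cols}\n"
        ")\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


# Example columns are illustrative only:
print(create_database_sql("sra"))
print(create_table_sql(
    "sra", "metadata",
    [("acc", "string"), ("organism", "string")],
    "s3://sra-pub-metadata-us-east-1/sra/metadata/",
))
```

Paste the printed statements into the Athena query editor, database first, then each table.
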

Table S3 locations

For all SRA metadata

  • metadata: s3://sra-pub-metadata-us-east-1/sra/metadata/
  • taxonomy analysis: s3://sra-pub-metadata-us-east-1/sra_tax_analysis_tool/tax_analysis/
  • tax analysis info: s3://sra-pub-metadata-us-east-1/sra_tax_analysis_tool/tax_analysis_info/
  • taxonomy: s3://sra-pub-metadata-us-east-1/sra_tax_analysis_tool/taxonomy/
  • kmer: s3://sra-pub-metadata-us-east-1/sra_tax_analysis_tool/kmer/

For the Coronaviridae-specific dataset

  • contig taxonomy analysis: s3://sra-pub-sars-cov2-metadata-us-east-1/v2/tax_analysis/
  • run taxonomy analysis: s3://sra-pub-sars-cov2-metadata-us-east-1/v2/run_tax_analysis/
  • contigs: s3://sra-pub-sars-cov2-metadata-us-east-1/v2/contigs/
  • blastn: s3://sra-pub-sars-cov2-metadata-us-east-1/v2/blastn/
  • hmmsearch: s3://sra-pub-sars-cov2-metadata-us-east-1/v2/hmmsearch/
  • peptides: s3://sra-pub-sars-cov2-metadata-us-east-1/v2/peptides/
  • metadata: s3://sra-pub-sars-cov2-metadata-us-east-1/v2/metadata/
  • variations: s3://sra-pub-sars-cov2-metadata-us-east-1/v2/variations/
  • annotated variations: s3://sra-pub-sars-cov2-metadata-us-east-1/v2/annotated_variations/
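
To keep the locations above in one place, a small helper can map labels to URIs and split each URI into the bucket name and key prefix that S3 API calls expect. The dictionary keys are labels chosen for this sketch, not official table names, and the listing example assumes the buckets permit anonymous listing.

```python
# Sketch only: the documented table locations in one place, plus a helper that
# splits an s3:// URI into the bucket and key prefix S3 API calls expect.
# The dictionary keys are labels for this example, not official table names.
from typing import List, Tuple
from urllib.parse import urlparse

_SRA = "s3://sra-pub-metadata-us-east-1/"
_COV2 = "s3://sra-pub-sars-cov2-metadata-us-east-1/v2/"

TABLE_LOCATIONS = {
    # All SRA metadata
    "metadata": _SRA + "sra/metadata/",
    "taxonomy_analysis": _SRA + "sra_tax_analysis_tool/tax_analysis/",
    "tax_analysis_info": _SRA + "sra_tax_analysis_tool/tax_analysis_info/",
    "taxonomy": _SRA + "sra_tax_analysis_tool/taxonomy/",
    "kmer": _SRA + "sra_tax_analysis_tool/kmer/",
    # Coronaviridae-specific dataset
    "cov2_contig_tax_analysis": _COV2 + "tax_analysis/",
    "cov2_run_tax_analysis": _COV2 + "run_tax_analysis/",
    "cov2_contigs": _COV2 + "contigs/",
    "cov2_blastn": _COV2 + "blastn/",
    "cov2_hmmsearch": _COV2 + "hmmsearch/",
    "cov2_peptides": _COV2 + "peptides/",
    "cov2_metadata": _COV2 + "metadata/",
    "cov2_variations": _COV2 + "variations/",
    "cov2_annotated_variations": _COV2 + "annotated_variations/",
}


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Return (bucket, key_prefix) for an s3:// URI."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def list_sample_keys(s3_client, uri: str, max_keys: int = 5) -> List[str]:
    """Return up to `max_keys` object keys stored under `uri`."""
    bucket, prefix = split_s3_uri(uri)
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=max_keys)
    return [obj["Key"] for obj in response.get("Contents", [])]


# Usage (requires network access; assumes anonymous listing is permitted,
# otherwise drop the UNSIGNED config and use your own credentials):
#   import boto3
#   from botocore import UNSIGNED
#   from botocore.config import Config
#   s3 = boto3.client("s3", region_name="us-east-1",
#                     config=Config(signature_version=UNSIGNED))
#   print(list_sample_keys(s3, TABLE_LOCATIONS["metadata"]))
```
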

Access methods

We recommend using the Athena query editor to become familiar with writing queries before moving on to the command line tools or client libraries.

Athena can be accessed through the query editor in a web browser:
https://console.aws.amazon.com/athena/.

If you plan to access Athena through the AWS CLI or one of the supported SDKs, the Athena client reference documentation is available here:
https://docs.aws.amazon.com/cli/latest/reference/athena/.

Instructions for downloading and setting up the AWS Command Line Interface (AWS CLI) are available here:
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html.

Please see the AWS documentation for more information on these options.
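
A minimal sketch of running a query through the AWS SDK for Python (boto3): start the query, poll until it reaches a terminal state, then read the first page of results. The database name, the column names in the example query, and the results bucket are hypothetical placeholders.

```python
# Sketch only: run an Athena query with boto3 and return the first page of rows.
# Assumes boto3 and credentials; database, column, and bucket names are hypothetical.
import time
from typing import Callable, List, Optional

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_query(athena_client, sql: str, database: str, output_location: str,
              poll_seconds: float = 1.0,
              sleep: Callable[[float], None] = time.sleep) -> List[List[Optional[str]]]:
    """Start `sql`, wait for it to finish, and return rows as lists of strings."""
    started = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},  # your us-east-1 bucket
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = athena_client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]
        state = execution["Status"]["State"]
        if state in TERMINAL_STATES:
            break
        sleep(poll_seconds)

    if state != "SUCCEEDED":
        reason = execution["Status"].get("StateChangeReason", "")
        raise RuntimeError(f"query {query_id} ended in state {state}: {reason}")

    # Only the first page; larger results need NextToken or a paginator.
    result = athena_client.get_query_results(QueryExecutionId=query_id)
    # For SELECT queries the first row holds the column names.
    return [[cell.get("VarCharValue") for cell in row["Data"]]
            for row in result["ResultSet"]["Rows"]]


# Usage (requires AWS access, so not run here):
#   import boto3
#   athena = boto3.client("athena", region_name="us-east-1")
#   rows = run_query(athena, "SELECT acc, organism FROM metadata LIMIT 10",
#                    database="sra",                                 # hypothetical
#                    output_location="s3://my-sra-athena-results/")  # hypothetical
```

Passing the client and the sleep function in as arguments keeps the polling logic independent of AWS, so it can be exercised without network access.
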

Payment

Users pay to run queries against the public datasets and to store query results in S3. We recommend that all users review the Athena pricing for on-demand queries.
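
To estimate what a query costs, the bytes it scanned (reported as DataScannedInBytes in the query statistics) can be converted to billed bytes and multiplied by the per-terabyte rate. Rounding up to the nearest megabyte, a 10 MB minimum per query, and binary MB/TB units are our assumptions about Athena's billing; verify them and the rate against the Athena pricing page. S3 storage for results is billed separately and is not included.

```python
# Sketch only: estimate one Athena query's charge from the bytes it scanned.
# Assumptions to verify on the Athena pricing page: rounding up to the nearest
# megabyte, a 10 MB minimum per query, binary MB/TB, and the per-TB rate you pass in.

MB = 1024 ** 2
TB = 1024 ** 4
MIN_BILLED_BYTES = 10 * MB


def billed_bytes(data_scanned_bytes: int) -> int:
    """Round up to a whole megabyte, then apply the per-query minimum."""
    rounded = -(-data_scanned_bytes // MB) * MB  # integer ceiling division
    return max(rounded, MIN_BILLED_BYTES)


def query_cost(data_scanned_bytes: int, price_per_tb: float) -> float:
    """Estimated charge for a single query."""
    return billed_bytes(data_scanned_bytes) / TB * price_per_tb


# DataScannedInBytes comes from the query statistics, e.g. with boto3:
#   stats = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Statistics"]
#   query_cost(stats["DataScannedInBytes"], price_per_tb=...)  # rate from the pricing page
```

Selecting only the columns you need, and filtering early, reduces the bytes scanned and therefore the charge.
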

Engage

NCBI wants your feedback on SRA in the Cloud. Contact sra@ncbi.nlm.nih.gov with questions or if you would like to provide input on new functionality.

Last updated: 2021-05-21T13:19:32Z