NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

SRA Knowledge Base [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2011-.

Downloading SRA data using command line utilities

Overview

When to use a command line utility rather than the SRA website.

For multiple simultaneous downloads of SRA data, or for high-volume downloads, we recommend using command line utilities such as wget, FTP, or Aspera’s ‘ascp’ utility. As with web-based downloads, the best speed is achieved with Aspera’s FASP implementation. ascp is bundled with the Aspera Connect plugin.

Downloading SRA data using the SRA Toolkit.

When properly configured, the SRA Toolkit can download data files directly: call a Toolkit command and specify the accession of interest. For example:

$ fastq-dump -X 5 -Z SRR390728

This example retrieves data for SRR390728 (a small dataset: 193 MB) and prints the first five spots (-X 5) to standard output (-Z). The operation does not require a prior download of SRR390728; fastq-dump (and all other Toolkit ‘dump’ utilities) will identify the accession and contact NCBI to download it.

Alternatively, the Toolkit utility ‘prefetch’ can be used in conjunction with HTTP transfer (default) or ascp. This can be a simpler way for many users to utilize ascp to download SRA data.

Accessing the ‘ascp’ utility.

As of version 3.3x of Aspera Connect, the default install location for ascp is:

Microsoft Windows: ‘C:\Program Files\Aspera\Aspera Connect\bin\ascp.exe’

Mac OS X: ‘/Applications/Aspera Connect.app/Contents/Resources/ascp’ (Administrator-installed Aspera Connect) or ‘/Users/[username]/Applications/Aspera\ Connect.app/Contents/Resources/ascp’ (Non-administrator install)

Linux: ‘/opt/aspera/bin/ascp’ or ‘/home/[username]/.aspera/connect/bin/ascp’
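In a script, it can help to probe these default locations rather than hard-coding one of them. The following is a minimal sketch, not part of Aspera Connect or the SRA Toolkit: the function names first_executable and find_ascp are hypothetical, and the use of $HOME for the per-user install locations is an assumption.

```shell
# Print the first argument that is an existing, executable regular file.
# Returns non-zero if none of the candidates qualifies.
first_executable() {
    for candidate in "$@"; do
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: check the default Mac OS X and Linux locations
# listed above. $HOME is assumed to match /Users/[username] or
# /home/[username].
find_ascp() {
    first_executable \
        "/Applications/Aspera Connect.app/Contents/Resources/ascp" \
        "$HOME/Applications/Aspera Connect.app/Contents/Resources/ascp" \
        "/opt/aspera/bin/ascp" \
        "$HOME/.aspera/connect/bin/ascp"
}
```

A script could then call find_ascp once, store the result in a variable, and stop with an error message if the function returns a non-zero status.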

What key file should be used to download SRA data by ascp?

Downloading SRA data does not require an NCBI-generated private key file. Aspera Connect / ascp is packaged with Asperasoft-generated key files:

asperaweb_id_dsa.openssh (openssh formatted key for newer ascp installations)

asperaweb_id_dsa.putty (putty formatted key for older ascp installations)

You may need to browse the directories produced during the Aspera Connect / ascp installation to find the above key files. They are stored in slightly different locations depending on your operating system.
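Rather than browsing by hand, the installation directories can be searched for the key file by name. This is only a sketch under stated assumptions: find_aspera_key is a hypothetical helper, and the directories you pass to it (for example /opt/aspera or $HOME/.aspera/connect on Linux) depend on where Aspera Connect was installed on your system.

```shell
# Search each given directory for the openssh-formatted Aspera key
# and print the first match. Returns non-zero if no key is found.
find_aspera_key() {
    for root in "$@"; do
        # Skip directories that do not exist on this system.
        [ -d "$root" ] || continue
        key=$(find "$root" -type f -name 'asperaweb_id_dsa.openssh' 2>/dev/null | head -n 1)
        if [ -n "$key" ]; then
            printf '%s\n' "$key"
            return 0
        fi
    done
    return 1
}

# Example (paths are assumptions about a Linux install):
#   find_aspera_key /opt/aspera "$HOME/.aspera/connect"
```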

Determining the location of SRA data files for automated or scripted downloads.

Beginning with a list of desired SRA data sets (e.g., a list of SRA Run accessions, “SRRs”), the exact download location for each data file can be determined as follows:

wget/FTP root: ftp://ftp-trace.ncbi.nih.gov

ascp root: anonftp@ftp.ncbi.nlm.nih.gov:

Remainder of path:

/sra/sra-instant/reads/ByRun/sra/{SRR|ERR|DRR}/<first 6 characters of accession>/<accession>/<accession>.sra

Where

{SRR|ERR|DRR} should be either ‘SRR’, ‘ERR’, or ‘DRR’ and should match the prefix of the target .sra file
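For scripted downloads, the path rule above can be applied to each accession in a list. The sketch below assumes only the layout described in this section; the function names (sra_remote_path, sra_ftp_url, sra_ascp_source) and the file name accessions.txt are hypothetical.

```shell
# Build the remote path for a run accession:
# /sra/sra-instant/reads/ByRun/sra/<prefix>/<first 6 chars>/<accession>/<accession>.sra
sra_remote_path() {
    accession=$1
    case "$accession" in
        SRR*|ERR*|DRR*) ;;
        *) echo "unsupported accession prefix: $accession" >&2; return 1 ;;
    esac
    prefix=$(printf '%s' "$accession" | cut -c1-3)
    first6=$(printf '%s' "$accession" | cut -c1-6)
    printf '/sra/sra-instant/reads/ByRun/sra/%s/%s/%s/%s.sra\n' \
        "$prefix" "$first6" "$accession" "$accession"
}

# Full location for wget/FTP.
sra_ftp_url() {
    remote_path=$(sra_remote_path "$1") || return 1
    printf 'ftp://ftp-trace.ncbi.nih.gov%s\n' "$remote_path"
}

# Full source argument for ascp.
sra_ascp_source() {
    remote_path=$(sra_remote_path "$1") || return 1
    printf 'anonftp@ftp.ncbi.nlm.nih.gov:%s\n' "$remote_path"
}

# Example: print one FTP location per accession listed in a
# (hypothetical) file with one accession per line.
#   while read -r acc; do sra_ftp_url "$acc"; done < accessions.txt
```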

Examples:

Downloading SRR304976 by wget or FTP:

wget ftp://ftp-trace.ncbi.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR304/SRR304976/SRR304976.sra

Downloading the same file by ascp:

[path_to_ascp_binary]/ascp -i [path_to_Aspera_key]/asperaweb_id_dsa.openssh -k 1 -T -l200m anonftp@ftp.ncbi.nlm.nih.gov:/sra/sra-instant/reads/ByRun/sra/SRR/SRR304/SRR304976/SRR304976.sra [local_target_directory]

It is recommended that you always use the “-l” (target transfer rate) option when configuring ascp. A more detailed discussion of ascp options and configuration can be found in our Aspera Transfer Guide. Aspera’s documentation on ascp also provides usage and examples.

Performance comparison of FTP and ascp downloads

Consider using ascp if you plan to download more than 1 gigabyte of data, or if your location is distant from NCBI (located on the east coast of North America), because:

  • The amount of data for a given SRA project can exceed 10 gigabytes, and traditional FTP may be too slow to download your data efficiently.
  • FTP performance degrades in proportion to the number of hops or switches the data must traverse to reach you. Aspera performance does not degrade with distance.
  • Aspera is typically 10 times faster than FTP and reduces the chance of drops or time-outs in the middle of a transfer. Best-case transfer rates for ascp are ~ 600 Mbps, while typical rates are closer to 100-200 Mbps.
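To put these rates in perspective, a rough transfer-time estimate can be computed from the data size and a sustained rate. This is a back-of-the-envelope sketch only: transfer_seconds is a hypothetical helper, it assumes decimal gigabytes (1 GB = 8000 megabits), it ignores protocol overhead, and the 10 Mbps figure used for FTP is an assumption derived from the "10 times faster" statement above rather than a measured value.

```shell
# Estimated transfer time in whole seconds (integer division).
# $1: data size in gigabytes (decimal, 10^9 bytes)
# $2: sustained transfer rate in megabits per second
transfer_seconds() {
    size_gb=$1
    rate_mbps=$2
    # 1 gigabyte = 8000 megabits
    echo $(( size_gb * 8000 / rate_mbps ))
}

transfer_seconds 10 100   # 10 GB at a typical ascp rate: prints 800
transfer_seconds 10 600   # 10 GB at the best-case ascp rate: prints 133
transfer_seconds 10 10    # 10 GB at an assumed FTP rate: prints 8000
```

Under these assumptions, a 10 gigabyte project takes roughly 13 minutes at a typical ascp rate, compared with more than two hours at one tenth of that rate.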

If you are located in Europe or Asia and wish to download via FTP, you are more likely to get a successful transfer from one of our INSDC partners: either the European Nucleotide Archive (EMBL-EBI) or the DDBJ Sequence Read Archive (DNA Database of Japan).
