NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

GaP FAQ Archive [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2009-.

Downloading Data

Aspera Connect

What is the Aspera software, and where can I get it?

The dbGaP Authorized Access System uses Aspera, a high-speed file transfer system, for client downloads. Aspera Connect must be installed on the machine used for the download. Aspera Connect is an install-on-demand browser plug-in, available for free from the Aspera website. Please make sure to download and install Aspera Connect rather than other Aspera client products. It is available for Linux, Mac, and Windows platforms. In addition to the web user interface, it includes a command-line utility, ascp.

Please download the latest version of Aspera Connect for your operating system directly from the Aspera website. On the Aspera Connect download page, after you specify the operating system of your download machine, you will find the Aspera user guide at the bottom of the page in addition to the download itself. It is shown as links named “HTML Guide” and “PDF Guide”. (Note: These links may be subject to change.) (06/21/2011)

Download Procedure

My data access request has been approved. How do I download the data?

The principal investigator (PI) of the project, or downloaders designated by the PI, can download the data as soon as the data access request is approved. The download steps are as follows.

1. Log in to the dbGaP Authorized Access System using your eRA account login credentials. (Intramural NIH scientists and staff need their NIH email username and password.)

2. Click on the “My Requests” tab. The list of approved requests is under the “Approved” sub-tab.

3. Under the “Approved” sub-tab, click on the link named “Request files” in the “Actions” column, which leads to the “Access Request” page.

4. The different types of data files available for download are shown under separate sub-tabs. Under each sub-tab, the files are displayed either in an expandable tree or in a filter-based file selector.

5. If it is a file tree, expand the tree by clicking on the “+” sign to the left of each tree node. Check the checkboxes beside the directories or file names of interest, and finally click on the “Create download request” button to assemble a download package.

6. The SRA data of some studies are displayed in a file selector available under the link named “SRA RUN selector”. The file selector allows you to select a subset of data files to download based on various attributes (sex, body site, assay type, platform, etc.). Once suitable filters are set, select files by clicking on the “Add to Selected” button, and finally click on the “Get Data” button to assemble a download package.

7. Unprocessed SRA data from the original submission are available under the “SRA submitted files” tab. Please note that these data files are 2 to 4 times larger than the processed SRA data under the “SRA data (reads and reference alignments)” tab, and they are served from robotic tape, which often runs slower than a normal file server.

The processed SRA data under the “SRA data (reads and reference alignments)” tab are therefore the recommended format for download.

8. If it is the first download ever created for the project, a new “Downloads” tab will be created. All submitted download packages are listed under the tab. (Skip this step if the download package has just been created.) Click on the “Downloads” tab and launch the Aspera download page through the link named “Download” in the “Actions” column of the table.

Note: A few more options related to download package management are available in the “Actions” column of the “Downloads” table. Please see here for more details.

9. In the browser window labeled “Data-request #”, click on the link named “new directory” to start the download into a custom-defined directory, or click on the link “default location” to start the download into the Aspera default directory.

(01/17/2013)

Download Using ASCP Command-line Utility

Is there a way to download using a command-line interface?

The Aspera Connect software comes with ascp, a command-line fasp transfer program. For general usage information, please refer to the Aspera user guide, which can be found on the Aspera website. On the Aspera Connect download page, after you specify the operating system of your download machine, you will find the Aspera user guide at the bottom of the page. It is shown as links named “HTML Guide” and “PDF Guide”. (Note: These links may be subject to change.)

Information on using the ascp command-line utility to download dbGaP data can be found on the Aspera download page. This is a browser window labeled “Data-request #”, which is shown right after a download package is submitted, or opens as a new window after you click on the “Download” option in the “Actions” column of the “Available downloads” table (see the “Downloads” tab step described here for more details).

To use the utility, you can either download a Perl script written specifically for this purpose or directly run a pre-formulated command line provided in a text box under the section named “Use ascp command line utility”.

The “%ASPERA_CONNECT_DIR%” in the pre-formulated command line should be replaced by the real path to the Aspera installation directory. The last part of the command is a dot (“.”), which represents the current directory and can be replaced by the path to a specific download destination directory. Please note that you may have to adjust some ascp-specific command-line parameters for your environment and network connectivity.



(09/09/2011)

Example of ASCP Command-line Download

Could you give me a real-world example of exactly how an ASCP download should be carried out?

The following are detailed notes created while assisting a dbGaP user with a download. They are posted here as an example of downloading dbGaP data using the ASCP command-line utility in a Unix/Linux environment. The ASCP download on Windows and Mac OS should be very similar.

Download and install Aspera Connect

Download the file named “aspera-connect-2.7.9.58060-linux-64.sh” from the Aspera Connect download page on the Aspera website (see below) and copy it into any directory, such as “/home/foo/test/apsera_connect”.

http://www.asperasoft.com/en/products/client_software_2/aspera_connect_8

(Note: The URL is subject to change. Please always use the most recent version.)

The following is the terminal display of the installation steps:


$ cd /home/foo/test/apsera_connect

$ ./aspera-connect-2.7.9.58060-linux-64.sh

Installing Aspera Connect

Deploying Aspera Connect (/home/foo/.aspera/connect) for the current user only.

Restart firefox manually to load the Aspera Connect plug-in

Install complete.

The default installation location is the “.aspera” directory under the user's home directory. To confirm:


$ cd /home/foo/.aspera/connect

$ ls

bin etc lib var

When the web browser is launched, the installed Aspera Connect is automatically picked up by the browser as a plug-in.


$ firefox&


Assemble download package

Follow these instructions to assemble a download package. Submit the assembled package by clicking on the "Send Request" button.

Get pre-formulated ASCP command-line command

A pre-formulated ASCP command is provided on the “FASP downloads” page. If your Linux machine has no web browser installed, you can get the pre-formulated command from any other machine (Windows, Mac, or Linux) that has a web browser. A command obtained on another machine can be used in the same way as described below.

(Note: In the following examples, if the Aspera Connect plug-in version is lower than 3.3.3, you need to change “asperaweb_id_dsa.openssh” to “asperaweb_id_dsa.putty”. The version can be found by running the command ascp -A.)
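The version rule in the note above can be written as a small shell check. This is only a sketch: the function name key_file_for_version is hypothetical, the version string is typed in by hand (read it from the output of ascp -A), and a numeric major.minor.patch version is assumed.

```shell
# Hypothetical helper: print the key file name that matches an Aspera Connect
# version, following the "lower than 3.3.3" rule from the note above.
# The body runs in a subshell so its variables do not leak to the caller.
key_file_for_version() (
  # Split "major.minor.patch[.build]" on dots.
  IFS=. read -r major minor patch rest <<EOF
$1
EOF
  minor=${minor:-0}
  patch=${patch:-0}
  if [ "$major" -lt 3 ] ||
     { [ "$major" -eq 3 ] && [ "$minor" -lt 3 ]; } ||
     { [ "$major" -eq 3 ] && [ "$minor" -eq 3 ] && [ "$patch" -lt 3 ]; }; then
    echo asperaweb_id_dsa.putty
  else
    echo asperaweb_id_dsa.openssh
  fi
)

# Example with the version of the installer used in this walkthrough.
key_file_for_version 2.7.9.58060
```

The printed name is the file to look for under the “etc” directory of the Aspera installation.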

The “FASP downloads” page is displayed right after the download package is submitted. Alternatively, to download a previously created download package, go to the “Download” tab, find the row of the download package of interest in the “Available downloads” table, and click on the “Download” link in the “Actions” column. The “FASP downloads” page will pop up.

In the middle of the “FASP downloads” page, there is a text box under the section named "Use ascp command line utility". The text box contains the pre-formulated ASCP command, which looks like the following (in a single line):

"%ASPERA_CONNECT_DIR%\bin\ascp" -QTr -l 300M -k 1 -i "%ASPERA_CONNECT_DIR%\etc\asperaweb_id_dsa.openssh" -W A4E3796BxxxxxxFE4B7D dbtest@gap-upload.ncbi.nlm.nih.gov:data/instant/dbgap_test1/26008 .

Please note that the string “A4E3796BxxxxxxFE4B7D” is a security token specific to the download package. The dot (.) at the very end of the command means that the current directory is used as the output directory. It can be replaced by the path of the desired output directory.

Copy and paste the pre-formulated command into a text editor and replace the string “%ASPERA_CONNECT_DIR%” with the Aspera installation path

/home/foo/.aspera/connect

On Unix/Linux, also change the backslashes to forward slashes and remove the quotation marks, so that the final command becomes (in a single line):

/home/foo/.aspera/connect/bin/ascp -QTr -l 300M -k 1 -i /home/foo/.aspera/connect/etc/asperaweb_id_dsa.openssh -W A4E3796BxxxxxxFE4B7D dbtest@gap-upload.ncbi.nlm.nih.gov:data/instant/dbgap_test1/26008 .


(note: the dot “.” at the end represents the current directory)
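If you prefer not to edit the command by hand, the same substitution can be scripted. The following is a minimal sketch, not part of the dbGaP tooling: the function name fill_aspera_dir is hypothetical, the template mirrors the masked example above, and the installation path is assumed not to contain the characters “|” or “&”. It only prints the resulting command and does not run ascp.

```shell
# Hypothetical helper: convert the pre-formulated (Windows-style) command into
# a Unix command line.
#   $1: pre-formulated command copied from the download page
#   $2: Aspera installation path
fill_aspera_dir() {
  printf '%s\n' "$1" | sed \
    -e "s|%ASPERA_CONNECT_DIR%|$2|g" \
    -e 's|\\|/|g' \
    -e 's|"||g'
}

# Masked example command; the token is not a real one.
template='"%ASPERA_CONNECT_DIR%\bin\ascp" -QTr -l 300M -k 1 -i "%ASPERA_CONNECT_DIR%\etc\asperaweb_id_dsa.openssh" -W A4E3796BxxxxxxFE4B7D dbtest@gap-upload.ncbi.nlm.nih.gov:data/instant/dbgap_test1/26008 .'

fill_aspera_dir "$template" /home/foo/.aspera/connect
```

The three sed expressions fill in the placeholder, turn backslashes into forward slashes, and drop the quotation marks, in that order.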


Execute the command

Go to the intended directory for the downloaded files and execute the above command. The terminal display is shown below:

$ cd /home/foo/download_file

$ /home/foo/.aspera/connect/bin/ascp -QTr -l 300M -k 1 -i /home/foo/.aspera/connect/etc/asperaweb_id_dsa.openssh -W A4E3796BxxxxxxFE4B7D dbtest@gap-upload.ncbi.nlm.nih.gov:data/instant/dbgap_test1/26008 .

The server's host key is not cached. You have no guarantee

that the server is the computer you think it is.

The server's rsa2 key fingerprint is:

ssh-rsa 1024 02:e9:ed:40:c3:86:5d:e2:92:6c:5a:65:53:e1:6b:74

If you trust this host, enter "y" to add the key to

PuTTY's cache and carry on connecting.

If you want to carry on connecting just once, without

adding the key to the cache, enter "n".

If you do not trust this host, press Return to abandon the

connection.

Store key in cache? (y/n) y

GS00360-DNA_A01_chr21.bam.bai.ncbi_enc

........................ 100% 122KB ........ --:--

GS00360-DNA_A01_chr21.bam.header.ncbi_enc

...

...

...

GS00360-DNA_A01_chr22.bam.ncbi_enc

........................ 100% 8270MB 291Mb/s..... 07:48

Completed: 16563657K bytes transferred in 469 seconds

(289291K bits/sec), in 8 files, 8 directories.


Confirm the download

Under the download directory /home/foo/download_file


$ cd 26008/reads/AML/TARGET-20-PAEAKL/tumor_sample/TARGET-20-PAEAKL-01-01D/SRZ022786/provisional/

$ ls -l

-rw-r--r-- 1 foo gwas 124944 Jul 2 21:36 GS00360-DNA_A01_chr21.bam.bai.ncbi_enc

-rw-r--r-- 1 foo gwas 6176 Jul 2 21:36 GS00360-DNA_A01_chr21.bam.header.ncbi_enc

-rw-r--r-- 1 foo gwas 192 Jul 2 21:36 GS00360-DNA_A01_chr21.bam.md5.ncbi_enc

-rw-r--r-- 1 foo gwas 8289064656 Jul 2 21:40 GS00360-DNA_A01_chr21.bam.ncbi_enc

-rw-r--r-- 1 foo gwas 126544 Jul 2 21:40 GS00360-DNA_A01_chr22.bam.bai.ncbi_enc

-rw-r--r-- 1 foo gwas 6176 Jul 2 21:40 GS00360-DNA_A01_chr22.bam.header.ncbi_enc

-rw-r--r-- 1 foo gwas 192 Jul 2 21:40 GS00360-DNA_A01_chr22.bam.md5.ncbi_enc

-rw-r--r-- 1 foo gwas 8671856560 Jul 2 21:44 GS00360-DNA_A01_chr22.bam.ncbi_enc


Please note that the downloaded files are all encrypted (their names end with .ncbi_enc). Please see here for instructions on how to decrypt dbGaP encrypted files.
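As a quick sanity check after a transfer, you can list any file under the download directory whose name does not end with .ncbi_enc. This is a sketch only: the function name list_non_enc is hypothetical, and the demonstration creates a throwaway directory rather than touching real data.

```shell
# Hypothetical helper: print every regular file under directory $1 whose name
# does not end with .ncbi_enc. No output means every file has the suffix.
list_non_enc() {
  find "$1" -type f ! -name '*.ncbi_enc'
}

# Demonstration on a temporary directory with made-up file names.
demo=$(mktemp -d)
mkdir -p "$demo/26008/reads"
: > "$demo/26008/reads/sample.bam.ncbi_enc"
: > "$demo/26008/reads/stray.txt"
list_non_enc "$demo"    # prints only the path of stray.txt
rm -rf "$demo"
```

For the example above, the directory to check would be /home/foo/download_file.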

(01/30/2014)

Download timeout

My Aspera download aborted prematurely. Do you have any idea why?

If Aspera Connect is installed correctly, the most common problem is a download timeout. The following are a few tips.

1. A timeout often suggests that the Aspera connection speed is set too high relative to your network bandwidth. You can estimate the raw bandwidth of your internet connection using a website such as http://speedtest.net/. The actual download speed is limited by that bandwidth. The Aspera speed setting should match the maximum download speed, neither much lower nor much higher. For the networks of most small institutions, the connection speed is expected to be in the 45-200 Mbps range. To change the Aspera connection speed, right-click the Aspera icon in the system tray, select 'networks', and update the connection speed. As a starting point, try 45 Mbps and then go higher. Please refer to the following for more details:

https://dbGaP.NCBI.nlm.nih.gov/aa/aspera_transfer_guide.pdf

For an ASCP command-line download, the speed can be changed by altering the speed option from "-l 300M" to an appropriate speed, such as "-l 150M".

2. Disk writing speed is often another bottleneck. Check that disk writing is fast enough, especially when writing to a remote disk.

3. Most importantly, very large downloads are prone to timeout problems. When the internet bandwidth is relatively small, it is strongly suggested that you break the download into multiple smaller packages. Please see here for more about how to assemble a download package.
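For the command-line case mentioned in tip 1, the speed option can be changed without retyping the whole command. The following is a minimal sketch: the function name set_rate is hypothetical, and it assumes the command contains a single "-l" option surrounded by spaces, as in the pre-formulated command. It prints the adjusted command and does not run ascp.

```shell
# Hypothetical helper: replace the value of the -l (speed) option.
#   $1: full ascp command line
#   $2: new speed, written the same way as in the FAQ, e.g. 150M
set_rate() {
  printf '%s\n' "$1" | sed -E "s/ -l [0-9]+[A-Za-z]? / -l $2 /"
}

# Shortened placeholder command; key, token and source are not real values.
set_rate 'ascp -QTr -l 300M -k 1 -i key -W token user@host:path .' 150M
```

If the download still times out, repeat with a lower value.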

(05/16/2014)

Download SRA data by SRR accession using SRA toolkit

I am trying to automate the process of dbGaP SRA data download. Is there any way to download SRA data through a command line without assembling a download package?

The NCBI SRA Toolkit can dump SRA data to different formats directly over the internet by SRA Run (SRR) accession. The SRA Toolkit needs to be configured with a dbGaP repository key before it is run. Please see here for details of how to configure the SRA Toolkit.

Once the toolkit is configured correctly, its commands can be run on any SRR accession available for download, and they dump the data remotely in the respective format.

For instance, to use sam-dump, run the following from within your dbGaP project's workspace directory:

sam-dump <accession>

To dump in BAM format instead, the SAM output needs to be piped to samtools installed on your local machine (please note that the trailing - is necessary). As an example,

sam-dump SRR390728 | samtools view -bS - > SRR390728.bam

Please see here for some information on the options available to customize your SAM/BAM output, and here for more information about the utilities of the SRA Toolkit.
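To automate this for several runs, the same pipeline can be generated for each accession in a list. The following is only a sketch: the function name bam_commands and the file name accessions.txt are hypothetical, and the function just prints the commands so they can be reviewed before anything is downloaded.

```shell
# Hypothetical helper: read SRR accessions from standard input (one per line)
# and print one sam-dump | samtools pipeline per accession.
# Blank lines and lines starting with '#' are skipped.
bam_commands() {
  while IFS= read -r acc; do
    case $acc in
      ''|'#'*) continue ;;
    esac
    printf 'sam-dump %s | samtools view -bS - > %s.bam\n' "$acc" "$acc"
  done
}

# Example with the accession used above.
printf 'SRR390728\n' | bam_commands
```

With a list in a file, the call would be “bam_commands < accessions.txt”. Once the printed commands look right, they can be run from within the dbGaP project's workspace directory on a machine where the configured SRA Toolkit and samtools are installed.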

(05/16/2014)

Manage Download Packages

I have assembled multiple download packages. What can I do with them?

Once a data access request is approved, the PI or downloaders can assemble download packages through the dbGaP system. The download packages are displayed and managed through the “Downloads” page under the “Download” tab (see here for more details). The following functions are available for managing download packages. They are provided as links in the “Actions” column of the “Available downloads” table.

1. Download: Open the Aspera download page.

2. Remind password: Recover the existing decryption password. This link is available only to the PI.

3. Delete: Delete the download package.

4. Request again: Reassemble an expired download package. This link is available only for expired download packages.

How to Add Downloaders to Projects?

I am a principal investigator (PI). Is it possible to allow my lab staff or collaborator to download data without sharing my eRA login credentials?

Here is a video related to this topic. The recently improved user interface of the dbGaP Authorized Access System allows a principal investigator (PI) to designate one or more downloaders within the PI’s institution. A downloader is an individual assigned by the PI to perform the time-consuming task of retrieving large data files. Downloaders can log in to the dbGaP system through their own accounts, select files of interest, assemble download packages, and download them. The files they can select are limited to the approved data access requests of the projects specified by the PI. A downloader has to obtain the decryption password from the PI in order to decrypt downloaded files. The decryption password is created, maintained, and distributed exclusively by the PI.

The following is how to assign downloaders to approved datasets within all or specific projects:

1. Log in to the dbGaP Authorized Access System as a PI using your eRA login credentials. If the respective project hasn’t yet been created, create the project and follow the steps to complete and submit the online application.

2. Navigate to the “Downloader” page through the “Downloaders” tab. Search for the intended downloader by first name and last name using the search boxes.

Note: A downloader needs to have a valid NIH eRA Commons account or an NIH email account, and must have successfully logged into the dbGaP Authorized Access System at least once. The downloader’s eRA account does not need to have a PI role, but it does need to be affiliated with the PI’s institution.

3. Confirm that the resulting user name is correct. Click on the name, select all or a specific project from the pull-down menu, and finally click on the “Set downloader” button to make the assignment. The downloader’s name and the projects accessible to the downloader will be displayed on the page.

4. The PI can use the “X” buttons in the “Remove Role” column of the downloader table to remove any downloaders or a downloader’s projects.

(07/13/2011)

How to Become a Downloader?

I am a data analyst working for a principal investigator (PI) who has multiple approved data access requests. How can I download the PI’s datasets without logging into his account?

Here is a video related to this topic. A downloader has to be designated by the PI through the dbGaP system. Please see here for more details. Before being chosen as a downloader, the individual must:

1. Have a valid NIH eRA Commons account affiliated with the same organization as the PI, or have an NIH email account. The eRA account does not need to have a PI role.

2. Have already completed at least one successful login to the dbGaP Authorized Access System.

(07/12/2011)

Download Procedure for Downloader

I am a downloader designated by the principal investigator (PI). How do I download the data?

The download procedure is nearly the same for PI and for downloaders. Please see here for more details. (06/30/2011)

Expired Download Package

My download package has expired. What can I do with it?

In most cases, the expiration interval of a download package is set to two months. You can always delete an expired package and order a new one if you need to download the same data again. The new download package can include some or all of the previously downloaded files. Please see here for more details. (06/30/2011)

FTP Site Availability for Downloads

Can I use FTP instead of Aspera to download dbGaP data? I don’t have large files to download.

No, the FTP interface is no longer available for downloading dbGaP data. Aspera Connect is the only option. (06/21/2011)