Bioinformatics. 2016 Aug 15;32(16):2551-3. doi: 10.1093/bioinformatics/btw177. Epub 2016 Apr 21.

Rail-dbGaP: analyzing dbGaP-protected data in the cloud with Amazon Elastic MapReduce.

Author information

1 Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA; Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
2 Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.
3 Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA; Center for Computational Biology, Johns Hopkins University, Baltimore, MD, USA.

Abstract

MOTIVATION:

Public archives contain thousands of trillions of bases of valuable sequencing data. More than 40% of the Sequence Read Archive is human data protected by provisions such as dbGaP. To analyse dbGaP-protected data, researchers must typically work with IT administrators and signing officials to ensure all levels of security are implemented at their institution. This is a major obstacle, impeding reproducibility and reducing the utility of archived data.

RESULTS:

We present a protocol and software tool for analyzing protected data in a commercial cloud. The protocol, Rail-dbGaP, is applicable to any tool running on Amazon Web Services Elastic MapReduce. The tool, Rail-RNA v0.2, is a spliced aligner for RNA-seq data, which we demonstrate by running on 9662 samples from the dbGaP-protected GTEx consortium dataset. The Rail-dbGaP protocol makes explicit for the first time the steps an investigator must take to develop Elastic MapReduce pipelines that analyse dbGaP-protected data in a manner compliant with NIH guidelines. Rail-RNA automates implementation of the protocol, making it easy for typical biomedical investigators to study protected RNA-seq data, regardless of their local IT resources or expertise.

AVAILABILITY AND IMPLEMENTATION:

Rail-RNA is available from http://rail.bio. Technical details on the Rail-dbGaP protocol as well as an implementation walkthrough are available at https://github.com/nellore/rail-dbgap. Detailed instructions on running Rail-RNA on dbGaP-protected data using Amazon Web Services are available at http://docs.rail.bio/dbgap/.

CONTACTS:

anellore@gmail.com or langmea@cs.jhu.edu

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.

PMID: 27153614
PMCID: PMC4978928
DOI: 10.1093/bioinformatics/btw177
[Indexed for MEDLINE]
Free PMC Article
