J Biomed Inform. 2014 Jun;49:119-33. doi: 10.1016/j.jbi.2014.01.005. Epub 2014 Jan 22.

Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses.

Author information

1. NEC Labs China, Beijing 100084, China. Electronic address: liu_bo@nec.cn.
2. Computation Institute, University of Chicago, Chicago, IL, USA; Mathematics and Computer Science Division, Argonne National Lab, IL, USA.
3. Computation Institute, University of Chicago, Chicago, IL, USA.
4. School of Software Engineering, Beijing University of Technology, Beijing 100022, China.
5. NEC Labs China, Beijing 100084, China.

Abstract

The coming deluge of genome data presents significant challenges: storing and processing large-scale genome data, providing easy access to biomedical analysis tools, and sharing and retrieving data efficiently. Because data volumes vary, so do computing and storage requirements; biomedical researchers are therefore pursuing more reliable, dynamic, and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses that enables reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. The platform extends the existing Galaxy workflow system by adding: data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer); domain-specific analysis tools preconfigured for immediate use by researchers (via user-specific tools integration); automatic deployment on the Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision); a Cloud provisioning tool for auto-scaling (via the HTCondor scheduler); and support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases and a performance evaluation are presented to validate the feasibility of the proposed approach.

KEYWORDS: Bioinformatics; Cloud computing; Galaxy; Scientific workflow; Sequencing analyses

PMID: 24462600
PMCID: PMC4203338
DOI: 10.1016/j.jbi.2014.01.005
[Indexed for MEDLINE]
Free PMC Article
