Sci Rep. 2015 Sep 18;5:14283. doi: 10.1038/srep14283.

SeqMule: automated pipeline for analysis of human exome/genome sequencing data.

Guo Y1,2, Ding X3, Shen Y4, Lyon GJ5,6, Wang K1,2,6,7.

Author information

1 Zilkha Neurogenetic Institute, University of Southern California, Los Angeles, CA 90033, USA.
2 Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA 90032, USA.
3 School of Forestry and Environment, Nanjing Forestry University, Nanjing, Jiangsu 210037, China.
4 Departments of Systems Biology and Biomedical Informatics, Columbia University, New York, NY 10032, USA.
5 Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, New York, NY 11797, USA.
6 Utah Foundation for Biomedical Research, 150 S 100 W, Provo, UT, 84601, USA.
7 Department of Psychiatry & Behavioral Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA.

Abstract

Next-generation sequencing (NGS) technology has greatly helped us identify disease-contributory variants for Mendelian diseases. However, users often face issues such as software incompatibility, complicated configuration, and lack of access to high-performance computing facilities. Discrepancies also exist among aligners and variant callers. We developed a computational pipeline, SeqMule, to perform automated variant calling from NGS data on human genomes and exomes. SeqMule integrates computational-cluster-free parallelization capability built on top of the variant callers, and facilitates normalization/intersection of variant calls to generate a consensus set with high confidence. SeqMule integrates 5 alignment tools and 5 variant calling algorithms, and accepts various combinations of them in a one-line command, thereby allowing highly flexible yet fully automated variant calling. On a modern machine (2 Intel Xeon X5650 CPUs, 48 GB memory), when fast turn-around is needed, SeqMule generates annotated VCF files in a day from a 30X whole-genome sequencing data set; when more accurate calling is needed, SeqMule generates a consensus call set that improves over single callers, as measured by both Mendelian error rate and consistency. SeqMule supports Sun Grid Engine for parallel processing, offers a turn-key solution for deployment on Amazon Web Services, and provides quality checks, Mendelian error checks, consistency evaluation, and HTML-based reports. SeqMule is available at http://seqmule.openbioinformatics.org.
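
The normalization/intersection step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not SeqMule's implementation: the function names `normalize` and `consensus`, the caller labels, and the `min_callers` threshold are all hypothetical, and the normalization shown is only allele trimming. Full normalization would also left-align indels against the reference genome, which this sketch does not attempt.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# A variant is keyed by (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def normalize(chrom: str, pos: int, ref: str, alt: str) -> Variant:
    """Trim shared trailing, then leading, bases so equivalent calls compare equal.

    Simplified assumption: no reference lookup, so indels are not left-aligned.
    """
    ref, alt = ref.upper(), alt.upper()
    # Trim a common suffix, keeping at least one base in each allele.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Trim a common prefix, advancing the position for each base removed.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return (chrom, pos, ref, alt)


def consensus(call_sets: Dict[str, Iterable[Variant]], min_callers: int) -> Set[Variant]:
    """Return variants reported by at least `min_callers` of the call sets."""
    counts: Counter = Counter()
    for variants in call_sets.values():
        # Use a set per caller so one caller cannot vote twice for a variant.
        counts.update({normalize(*v) for v in variants})
    return {v for v, n in counts.items() if n >= min_callers}


if __name__ == "__main__":
    # Hypothetical calls from three callers; labels are placeholders.
    calls = {
        "caller_a": [("1", 100, "A", "G"), ("1", 200, "CTT", "CT"), ("2", 50, "G", "T")],
        "caller_b": [("1", 100, "A", "G"), ("1", 200, "CT", "C")],
        "caller_c": [("1", 99, "GA", "GG"), ("2", 50, "G", "T")],
    }
    print(sorted(consensus(calls, min_callers=3)))  # strict intersection
    print(sorted(consensus(calls, min_callers=2)))  # majority vote
```

Setting `min_callers` to the number of callers gives a strict intersection, while a lower value gives a majority-style vote; which trade-off between sensitivity and confidence is appropriate depends on the study.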

PMID: 26381817
PMCID: PMC4585643
DOI: 10.1038/srep14283
[Indexed for MEDLINE]
Free PMC Article