Genome Biol. 2015 Sep 17;16:197. doi: 10.1186/s13059-015-0758-2.

An ensemble approach to accurately detect somatic mutations using SomaticSeq.

Author information

1. Bina Technologies, Roche Sequencing, Redwood City, 94065, CA, USA. li_tai.fang@bina.roche.com.
2. Department of Electrical Engineering, Stanford University, Stanford, 94305, CA, USA. pegahta@stanford.edu.
3. Bina Technologies, Roche Sequencing, Redwood City, 94065, CA, USA. aparna.chhibber@bina.roche.com.
4. Bina Technologies, Roche Sequencing, Redwood City, 94065, CA, USA. marghoob.mohiyuddin@bina.roche.com.
5. Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, 77030, TX, USA. YFan1@mdanderson.org.
6. Bina Technologies, Roche Sequencing, Redwood City, 94065, CA, USA. john.mu@bina.roche.com.
7. Bina Technologies, Roche Sequencing, Redwood City, 94065, CA, USA. greg.gibeling@bina.roche.com.
8. Bina Technologies, Roche Sequencing, Redwood City, 94065, CA, USA. sharon.barr@bina.roche.com.
9. Bina Technologies, Roche Sequencing, Redwood City, 94065, CA, USA. narges.baniasadi@bina.roche.com.
10. Program in Computational Biology and Bioinformatics, Yale University, New Haven, 06520, CT, USA. mark.gerstein@yale.edu.
11. The Genome Institute, Washington University in St Louis, St Louis, 63108, MO, USA. dkoboldt@genome.wustl.edu.
12. Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, 77030, TX, USA. wwang7@mdanderson.org.
13. Department of Statistics, Stanford University, Stanford, 94305, CA, USA. whwong@stanford.edu.
14. Department of Health Research and Policy, Stanford University, Stanford, 94305, CA, USA. whwong@stanford.edu.
15. Bina Technologies, Roche Sequencing, Redwood City, 94065, CA, USA. hugo.lam@bina.roche.com.

Abstract

SomaticSeq is a somatic mutation detection pipeline that implements a stochastic boosting algorithm to produce highly accurate calls for both single nucleotide variants and small insertions and deletions. The workflow currently incorporates five state-of-the-art somatic mutation callers and extracts over 70 individual genomic and sequencing features for each candidate site. A training set is provided to an adaptively boosted decision tree learner, which builds a classifier that predicts the mutation status of each candidate. We validate our results with both synthetic and real data, and report that SomaticSeq achieves better overall accuracy than any individual tool it incorporates.
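
The idea of feeding per-site features to an adaptively boosted decision tree learner can be illustrated with a minimal sketch. This is not the SomaticSeq implementation: it is plain discrete AdaBoost over one-level decision trees (stumps), with no stochastic subsampling, written in pure Python. The toy data are entirely hypothetical: three binary "caller vote" features stand in for the five callers and 70+ features described above, and the labels are constructed so that a site is a true mutation when at least two of the three hypothetical callers report it.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best stump.

    A stump predicts `sign` when row[feature] >= threshold, else `-sign`.
    """
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                preds = [sign if row[j] >= thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] >= thr else -sign

def adaboost_fit(X, y, rounds=10):
    """Discrete AdaBoost: reweight samples so later stumps focus on mistakes."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clamp the error to keep the logarithm finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        if stump[0] < 1e-10:
            break  # a single stump already separates the weighted data
        # Increase the weight of misclassified samples, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, row):
    """Weighted vote of all stumps; +1 = somatic mutation, -1 = not."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1

# Hypothetical toy data: every combination of three binary caller votes.
X = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
# Hypothetical ground truth: true mutation when at least two callers agree.
y = [1 if sum(row) >= 2 else -1 for row in X]

model = adaboost_fit(X, y, rounds=3)
predictions = [adaboost_predict(model, row) for row in X]
accuracy = sum(p == yi for p, yi in zip(predictions, y)) / len(y)
print(f"stumps: {len(model)}, training accuracy: {accuracy:.2f}")
```

No single stump can reproduce the two-of-three rule on this toy set, but the weighted combination of three stumps can, which is the sense in which an ensemble may outperform each of its inputs. A realistic classifier would also use continuous features such as read depth or mapping quality and would be evaluated on held-out data rather than on the training set.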

PMID: 26381235
PMCID: PMC4574535
DOI: 10.1186/s13059-015-0758-2
[Indexed for MEDLINE]
Free PMC Article
