iScience. 2019 Aug 30;18:1-10. doi: 10.1016/j.isci.2019.05.037. Epub 2019 May 29.

Samovar: Single-Sample Mosaic Single-Nucleotide Variant Calling with Linked Reads.

Author information

1. Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
2. The Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA.
3. The Institute for Genomic Medicine, Nationwide Children's Hospital, Columbus, OH, USA; Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA.
4. Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA; Department of Neurosurgery, Nationwide Children's Hospital, Columbus, OH, USA.
5. Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA. Electronic address: langmea@cs.jhu.edu.
6. Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Department of Biology, Johns Hopkins University, Baltimore, MD, USA; Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA. Electronic address: mschatz@cs.jhu.edu.

Abstract

Linked-read sequencing greatly improves haplotype assembly over standard paired-end analysis. The detection of mosaic single-nucleotide variants benefits from haplotype assembly when the model is informed by the mapping between constituent reads and linked reads. Samovar evaluates haplotype-discordant reads identified through linked-read sequencing, thus enabling phasing and mosaic variant detection across the entire genome. Samovar trains a random forest model to score candidate sites using a dataset that considers read quality, phasing, and linked-read characteristics. Samovar calls mosaic single-nucleotide variants (SNVs) within a single sample with accuracy comparable to that of methods that previously required trios or matched tumor/normal pairs, and it outperforms single-sample mosaic variant callers at minor allele frequencies of 5%-50% with at least 30X coverage. Samovar finds somatic variants in both tumor and normal whole-genome sequencing from 13 pediatric cancer cases that can be corroborated with high recall by whole-exome sequencing. Samovar is available as open-source software at https://github.com/cdarby/samovar under the MIT license.
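
To illustrate what a "haplotype-discordant read" means at a candidate site, the following is a minimal sketch and not Samovar's actual implementation. It assumes each read has already been assigned to a haplotype and that the base it shows at the site is known; the names PhasedRead and discordance_summary are hypothetical. A read is counted as discordant here when its base differs from the majority base of its own haplotype, which is a simplification of the features the abstract describes.

```python
from collections import Counter, defaultdict
from typing import Iterable, NamedTuple


class PhasedRead(NamedTuple):
    """Hypothetical record: one read overlapping a candidate site."""
    haplotype: int  # phase assignment of the read (e.g. 1 or 2), assumed given
    base: str       # base the read shows at the candidate site


def discordance_summary(reads: Iterable[PhasedRead]) -> dict:
    """Count reads whose base disagrees with their haplotype's majority base.

    Assumption for illustration: the majority base on each haplotype is
    treated as that haplotype's germline allele, and every other read on
    that haplotype is counted as haplotype-discordant.
    """
    per_haplotype = defaultdict(Counter)
    depth = 0
    for read in reads:
        per_haplotype[read.haplotype][read.base] += 1
        depth += 1

    discordant = 0
    for base_counts in per_haplotype.values():
        majority_count = base_counts.most_common(1)[0][1]
        discordant += sum(base_counts.values()) - majority_count

    fraction = discordant / depth if depth else 0.0
    return {"depth": depth, "discordant": discordant, "fraction": fraction}


# Toy data (invented for this example): haplotype 1 carries mostly A with
# three G reads, haplotype 2 carries only A.
example_reads = (
    [PhasedRead(1, "A")] * 10
    + [PhasedRead(1, "G")] * 3
    + [PhasedRead(2, "A")] * 12
)
summary = discordance_summary(example_reads)
print(summary)
```

In this toy input, the three G reads on haplotype 1 are the discordant ones. A per-site summary of this kind is the sort of quantity that could serve as one input feature to a classifier such as the random forest mentioned above.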

KEYWORDS:

Bioinformatics; Biological Sciences; Genomics
