Nat Genet. 2018 Dec;50(12):1735-1743. doi: 10.1038/s41588-018-0257-y. Epub 2018 Nov 5.

A deep learning approach to automate refinement of somatic variant calling from cancer sequencing data.

Author information

1. McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA.
2. Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO, USA.
3. Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA.
4. Department of Neurological Surgery, Center for Human Immunology and Immunotherapy Programs, Washington University School of Medicine, St. Louis, MO, USA.
5. Department of Surgery/Otolaryngology, Brigham and Women's Hospital and Dana-Farber Cancer Institute, Boston, MA, USA.
6. Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA.
7. Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA.
8. Institute for Genomic Medicine, The Research Institute at Nationwide Children's Hospital, Columbus, OH, USA.
9. Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA.
10. Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, MO, USA. swamidass@wustl.edu.
11. Institute for Informatics, Washington University School of Medicine, St. Louis, MO, USA. swamidass@wustl.edu.
12. McDonnell Genome Institute, Washington University School of Medicine, St. Louis, MO, USA. obigriffith@wustl.edu.
13. Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO, USA. obigriffith@wustl.edu.
14. Division of Oncology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA. obigriffith@wustl.edu.
15. Department of Genetics, Washington University School of Medicine, St. Louis, MO, USA. obigriffith@wustl.edu.

Abstract

Cancer genomic analysis requires accurate identification of somatic variants in sequencing data. Manual review to refine somatic variant calls is required as a final step after automated processing. However, manual variant refinement is time-consuming, costly, poorly standardized, and non-reproducible. Here, we systematized and standardized somatic variant refinement using a machine learning approach. The final model incorporates 41,000 variants from 440 sequencing cases. This model accurately recapitulated manual refinement labels for three independent testing sets (13,579 variants) and accurately predicted somatic variants confirmed by orthogonal validation sequencing data (212,158 variants). The model improves on manual somatic refinement by reducing bias on calls otherwise subject to high inter-reviewer variability.

PMID: 30397337
DOI: 10.1038/s41588-018-0257-y
