VarWalker: personalized mutation network analysis of putative cancer genes from next-generation sequencing data

PLoS Comput Biol. 2014 Feb 6;10(2):e1003460. doi: 10.1371/journal.pcbi.1003460. eCollection 2014 Feb.

Abstract

A major challenge in interpreting the large volume of mutation data identified by next-generation sequencing (NGS) is distinguishing driver mutations from neutral passenger mutations, which would facilitate the identification of targetable genes and new drugs. Current approaches are primarily based on the mutation frequencies of single genes; they lack the power to detect infrequently mutated driver genes and ignore functional interconnection and regulation among cancer genes. We propose a novel mutation network method, VarWalker, to prioritize driver genes in large-scale cancer mutation data. VarWalker fits a generalized additive model for each sample based on that sample's mutation profile and builds on the joint frequency of both mutated genes and their close interactors. These interactors are selected and optimized using the Random Walk with Restart algorithm in a protein-protein interaction network. We applied the method to more than 300 tumor genomes in two large-scale NGS benchmark datasets: 183 lung adenocarcinoma samples and 121 melanoma samples. In each cancer, we derived a consensus mutation subnetwork containing significantly enriched consensus cancer genes and cancer-related functional pathways. These cancer-specific mutation networks were then validated using independent datasets for each cancer. Importantly, VarWalker prioritizes well-known, infrequently mutated genes, which are shown to interact with highly recurrently mutated genes yet have been ignored by conventional single-gene-based approaches. Using VarWalker, we demonstrated that network-assisted approaches can be effectively adapted to facilitate the detection of cancer driver genes in NGS data.
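
The Random Walk with Restart step named in the abstract can be illustrated with a minimal sketch. It is not the VarWalker implementation. It shows only the generic algorithm, iterating p(t+1) = (1 - r) W p(t) + r p(0) on an undirected network until the scores stop changing. The function name, the restart probability r = 0.5, the convergence tolerance, and the toy network with placeholder node names are all assumptions made for illustration, and the generalized additive model fitting and interactor optimization described above are not covered.

```python
def random_walk_with_restart(edges, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Generic Random Walk with Restart on an undirected, unweighted network.

    edges   : iterable of (node, node) pairs (hypothetical interaction list)
    seeds   : nodes where the walk starts and restarts (e.g. mutated genes)
    restart : restart probability r (0.5 is an assumed value, not from the paper)
    Returns a dict mapping each node to its steady-state visiting probability.
    """
    # Build an adjacency list; every node that appears has at least one neighbor.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    nodes = sorted(neighbors)

    # Keep only seeds that are present in the network, without duplicates.
    seeds = sorted(set(seeds) & set(neighbors))
    if not seeds:
        raise ValueError("no seed node is present in the network")

    # Initial distribution p(0): uniform over the seed nodes.
    p0 = {n: 0.0 for n in nodes}
    for s in seeds:
        p0[s] = 1.0 / len(seeds)

    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: r * p(0).
        nxt = {n: restart * p0[n] for n in nodes}
        # Walk term: each node passes (1 - r) of its mass equally to its neighbors.
        for n in nodes:
            share = (1.0 - restart) * p[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        # Stop when the L1 change between iterations falls below the tolerance.
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if diff < tol:
            break
    return p


# Toy network with placeholder node names: a path A-B-C-D plus a separate pair E-F.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
scores = random_walk_with_restart(toy_edges, seeds=["A"])
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 4))
```

In this sketch, nodes closer to the seed receive higher scores and nodes in a disconnected component receive none, which is the property that lets a method rank the "close interactors" of a mutated gene. How VarWalker sets the restart probability and selects interactors from the scores is not stated in the abstract.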

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Adenocarcinoma / genetics
  • Adenocarcinoma of Lung
  • Algorithms*
  • Computational Biology
  • Consensus Sequence
  • DNA Mutational Analysis / statistics & numerical data
  • DNA, Neoplasm / genetics
  • Databases, Genetic / statistics & numerical data
  • Gene Frequency
  • Gene Regulatory Networks
  • High-Throughput Nucleotide Sequencing / statistics & numerical data
  • Humans
  • Lung Neoplasms / genetics
  • Melanoma / genetics
  • Models, Genetic
  • Molecular Sequence Annotation / statistics & numerical data
  • Mutation*
  • Neoplasms / genetics*
  • Oncogenes*

Substances

  • DNA, Neoplasm