methCancer-gen: a DNA methylome dataset generator for user-specified cancer type based on conditional variational autoencoder

BMC Bioinformatics. 2020 May 11;21(1):181. doi: 10.1186/s12859-020-3516-8.

Abstract

Background: DNA methylation has recently drawn considerable attention because it correlates strongly with abnormal gene activity and provides an informative representation of cancer status. As a growing number of studies focus on DNA methylation signatures in cancer, demand for publicly available methylome datasets has increased. To meet this demand, large-scale projects have been launched to uncover biological insights into cancer, and they provide collections of such datasets. However, public cancer data, especially for certain cancer types, remain too limited for research use. Several simulation tools for producing epigenetic datasets have been introduced to alleviate this issue, but to date no tool has been proposed for generating a dataset for a user-specified cancer type.

Results: In this paper, we present methCancer-gen, a tool for generating DNA methylome datasets for a specified cancer type. Employing a conditional variational autoencoder, a neural network-based generative model, it estimates the conditional distribution relating latent variables and data, and generates samples for the specified cancer type.
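The generation step of a conditional variational autoencoder can be illustrated with a minimal sketch. It is not the methCancer-gen implementation: the decoder below is a toy linear layer with random weights standing in for a trained network, and the names (one_hot, make_decoder, decode, generate), the dimensions, the standard normal prior, and the sigmoid output (chosen on the assumption that methylation levels lie between 0 and 1) are all assumptions made for illustration.

```python
import math
import random


def one_hot(index, size):
    """Encode a cancer-type index as a one-hot condition vector."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def make_decoder(latent_dim, n_types, n_sites, rng):
    """Build a toy linear decoder with random weights.

    Hypothetical stand-in for a trained decoder network.
    """
    in_dim = latent_dim + n_types
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(n_sites)]
    biases = [0.0] * n_sites
    return weights, biases


def decode(z, condition, decoder):
    """Map the concatenation [z ; condition] to values in (0, 1) via a sigmoid."""
    weights, biases = decoder
    inputs = z + condition  # list concatenation: latent vector followed by condition
    output = []
    for row, bias in zip(weights, biases):
        activation = sum(w * v for w, v in zip(row, inputs)) + bias
        output.append(1.0 / (1.0 + math.exp(-activation)))
    return output


def generate(cancer_type, n_samples, decoder, latent_dim, n_types, rng):
    """Draw z from a standard normal prior (assumed) and decode it with the condition."""
    condition = one_hot(cancer_type, n_types)
    samples = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        samples.append(decode(z, condition, decoder))
    return samples


rng = random.Random(0)
LATENT_DIM, N_TYPES, N_SITES = 4, 3, 10  # illustrative sizes only
decoder = make_decoder(LATENT_DIM, N_TYPES, N_SITES, rng)
synthetic = generate(cancer_type=1, n_samples=5, decoder=decoder,
                     latent_dim=LATENT_DIM, n_types=N_TYPES, rng=rng)
```

In this sketch the cancer type enters only through the one-hot vector appended to the latent sample, so changing cancer_type while reusing the same decoder yields samples conditioned on a different type.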

Conclusions: To evaluate the simulation performance of methCancer-gen for a user-specified cancer type, we compared the proposed model with a benchmark method. methCancer-gen successfully reproduced cancer type-wise data with high accuracy, helping to alleviate the shortage of condition-specific data. methCancer-gen is publicly available at https://github.com/cbi-bioinfo/methCancer-gen.

Keywords: Cancer; Conditional variational autoencoder; DNA methylation; Generator; Simulator.

MeSH terms

  • Algorithms*
  • Computer Simulation
  • DNA Methylation / genetics*
  • Databases, Genetic*
  • Humans
  • Neoplasms / genetics*
  • Neural Networks, Computer
  • Support Vector Machine