BMC Bioinformatics. 2016 Oct 6;17(Suppl 13):368.

Application of dynamic topic models to toxicogenomics data.

Author information

1 NIH/National Center for Advancing Translational Sciences, Rockville, MD, USA.
2 Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, Jefferson, AR, USA.
3 Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, Jefferson, AR, USA. weida.tong@fda.hhs.gov.

Abstract

BACKGROUND:

All biological processes are inherently dynamic. After perturbation by environmental insults, drugs, or chemicals, biological systems change, transiently or in a sustained way, across sequential time points. Investigating the temporal behavior of molecular events is therefore important for understanding the mechanisms that govern a biological system's response to perturbations such as drug treatment. The intrinsic complexity of time-series data requires appropriate computational algorithms for its interpretation. In this study, we propose, for the first time, the application of dynamic topic models (DTM) to the analysis of time-series gene expression data.
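
The abstract does not spell out the model, so the following is only a minimal sketch of the generative prior used in the standard DTM formulation, not the authors' implementation. Each topic k has a vector of natural parameters beta[t, k] over the vocabulary that drifts between time slices by a Gaussian random walk, and the topic's distribution over terms at slice t is softmax(beta[t, k]). Treating genes as the vocabulary and each compound-time condition as a document is an assumption here; the abstract says only that topics consist of sets of genes. The names `simulate_topic_drift`, `sample_document`, `sigma`, and `alpha` are hypothetical, and for brevity the per-document topic proportions are drawn from a Dirichlet, whereas the original DTM uses a logistic-normal whose mean also drifts over time.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis (the gene "vocabulary").
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def simulate_topic_drift(n_topics, n_genes, n_slices, sigma=0.1, seed=0):
    """Hypothetical illustration of the DTM state-space prior:
    beta[t, k] ~ Normal(beta[t-1, k], sigma^2 I); the per-slice
    topic-over-genes distribution is softmax(beta[t, k])."""
    rng = np.random.default_rng(seed)
    beta = np.zeros((n_slices, n_topics, n_genes))
    beta[0] = rng.normal(0.0, 1.0, size=(n_topics, n_genes))
    for t in range(1, n_slices):
        # Each slice's parameters are a small Gaussian step from the previous slice.
        beta[t] = beta[t - 1] + rng.normal(0.0, sigma, size=(n_topics, n_genes))
    return softmax(beta)

def sample_document(phi_t, alpha, n_words, rng):
    """Draw one hypothetical document (e.g. a compound-time condition) at slice t.
    Simplification: theta ~ Dirichlet(alpha) instead of a drifting logistic-normal."""
    theta = rng.dirichlet(alpha)
    z = rng.choice(len(theta), size=n_words, p=theta)          # topic per token
    genes = np.array([rng.choice(phi_t.shape[1], p=phi_t[k]) for k in z])
    return theta, np.bincount(genes, minlength=phi_t.shape[1])  # gene "word" counts

# Four slices, mirroring the 4-, 8-, 15- and 29-day time points.
phi = simulate_topic_drift(n_topics=3, n_genes=10, n_slices=4)
print(phi.shape)                            # (4, 3, 10)
print(np.allclose(phi.sum(axis=-1), 1.0))   # True
```

The random walk is what lets a topic keep its identity across time points while its gene composition shifts gradually; a larger `sigma` allows faster drift between slices.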

RESULTS:

We studied a large time-series toxicogenomics dataset. It contains more than 3,144 microarrays of gene expression data from the livers of rats treated with 131 compounds (most of them drugs) at two dose levels (control and high dose) on a repeated-dose schedule with four separate time points (4, 8, 15 and 29 days). Using DTM, we analyzed the topics (each consisting of a set of genes) and their biological interpretations across these four time points, and identified hidden patterns embedded in these time-series gene expression profiles. Based on the topic distribution of each compound-time condition, a number of drugs were successfully clustered by their shared mode of action, such as PPARα agonists and COX inhibitors. The biological meaning underlying each topic was interpreted using diverse sources of information, such as functional analysis of pathways and the therapeutic uses of the drugs. Additionally, we found that sample clusters produced by DTM are much more coherent in terms of functional categories than those produced by traditional clustering algorithms.
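
As a hedged illustration of clustering conditions from their topic distributions, the sketch below groups each compound-time condition by its dominant topic and scores the grouping with cluster purity against reference labels such as mode-of-action classes. The abstract does not state the exact clustering rule or coherence measure used, so dominant-topic assignment and purity are stand-ins chosen for simplicity. All condition names, proportions, and labels are made up for illustration; none are data from the study.

```python
from collections import Counter, defaultdict

import numpy as np

def cluster_by_dominant_topic(theta, names):
    """Group conditions by the topic with the highest proportion in theta."""
    clusters = defaultdict(list)
    for name, k in zip(names, np.argmax(theta, axis=1)):
        clusters[int(k)].append(name)
    return dict(clusters)

def purity(clusters, labels):
    """Fraction of items that share the majority reference label of their cluster."""
    total = sum(len(members) for members in clusters.values())
    hits = sum(Counter(labels[m] for m in members).most_common(1)[0][1]
               for members in clusters.values())
    return hits / total

# Toy, made-up topic proportions for four hypothetical compound-time conditions.
names = ["cmpd_A_day4", "cmpd_B_day4", "cmpd_C_day29", "cmpd_D_day29"]
theta = np.array([
    [0.80, 0.15, 0.05],
    [0.70, 0.20, 0.10],
    [0.10, 0.85, 0.05],
    [0.05, 0.25, 0.70],
])
# Hypothetical mode-of-action labels used as the reference categories.
labels = {"cmpd_A_day4": "moa_1", "cmpd_B_day4": "moa_1",
          "cmpd_C_day29": "moa_2", "cmpd_D_day29": "moa_2"}

clusters = cluster_by_dominant_topic(theta, names)
print(clusters)                   # {0: ['cmpd_A_day4', 'cmpd_B_day4'], 1: ['cmpd_C_day29'], 2: ['cmpd_D_day29']}
print(purity(clusters, labels))   # 1.0
```

Purity rewards many small clusters (singletons are always pure), so a comparison against traditional clustering algorithms would normally hold the number of clusters fixed or use a measure that penalizes fragmentation.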

CONCLUSIONS:

We demonstrated that DTM, a text mining technique, can be a powerful computational approach for clustering time-series gene expression profiles, because it provides a probabilistic representation of their dynamic features across sequential time frames. The method offers an alternative way to uncover hidden patterns embedded in time-series gene expression profiles and to gain a better understanding of the dynamic behavior of gene regulation in biological systems.

KEYWORDS:

Clustering; Dynamic topic model (DTM); Latent Dirichlet model; TG-GATEs; Time-series gene expression; Topic modeling; Toxicogenomics

PMID: 27766956
PMCID: PMC5073961
DOI: 10.1186/s12859-016-1225-0
[Indexed for MEDLINE]