Front Bioeng Biotechnol. 2017 Aug 28;5:48. doi: 10.3389/fbioe.2017.00048. eCollection 2017.

Navigating the Functional Landscape of Transcription Factors via Non-Negative Tensor Factorization Analysis of MEDLINE Abstracts.

Author information

1. Bioinformatics Program, University of Memphis, Memphis, TN, United States.
2. Center for Translational Informatics, University of Memphis, Memphis, TN, United States.
3. Computer and Information Sciences Program, Harrisburg University of Science and Technology, Harrisburg, PA, United States.
4. Department of Mathematical Sciences, University of Memphis, Memphis, TN, United States.
5. Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, United States.
6. Center for Molecular Medicine and Therapeutics, University of British Columbia, Vancouver, BC, Canada.
7. Department of Biological Sciences, University of Memphis, Memphis, TN, United States.

Abstract

In this study, we developed and evaluated a novel text-mining approach, using non-negative tensor factorization (NTF), to simultaneously extract and functionally annotate transcriptional modules consisting of sets of genes, transcription factors (TFs), and terms from MEDLINE abstracts. A sparse 3-mode term × gene × TF tensor was constructed that contained weighted frequencies of 106,895 terms in 26,781 abstracts shared among 7,695 genes and 994 TFs. The tensor was decomposed into sub-tensors using NTF across 16 different approximation ranks. Dominant entries of each of the 2,861 resulting sub-tensors were extracted to form term-gene-TF annotated transcriptional modules (ATMs). More than 94% of the ATMs were found to be enriched in at least one KEGG pathway or GO category, suggesting that the ATMs are functionally relevant. One advantage of this method is that it can discover potentially new gene-TF associations from the literature. Using a set of microarray and ChIP-Seq datasets as a gold standard, we show that the precision of our method for predicting gene-TF associations is significantly higher than chance. In addition, we demonstrate that the terms in each ATM can be used to suggest new GO classifications for genes and TFs. Taken together, our results indicate that NTF is useful for simultaneous extraction and functional annotation of transcriptional regulatory networks from unstructured text, as well as for literature-based discovery. A web tool called Transcriptional Regulatory Modules Extracted from Literature (TREMEL), available at http://binf1.memphis.edu/tremel, was built to enable browsing and searching of ATMs.
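
As an illustration of the decomposition step, the following is a minimal sketch of a non-negative factorization of a small term × gene × TF tensor, followed by extraction of the dominant entries of each rank-one sub-tensor. The abstract does not state which NTF algorithm, term weighting, or cutoff was used, so several things here are assumptions made for illustration only: the CP (PARAFAC) model with multiplicative updates, the random restarts, the function names `ntf_cp`, `rel_error`, and `dominant_entries`, the `frac` threshold, and the toy tensor with two planted modules.

```python
import numpy as np

def ntf_cp(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Non-negative CP factorization of a 3-mode tensor using
    multiplicative updates (an assumed algorithm, not the paper's)."""
    rng = np.random.default_rng(seed)
    n_terms, n_genes, n_tfs = X.shape
    A = rng.random((n_terms, rank))   # term factors
    B = rng.random((n_genes, rank))   # gene factors
    C = rng.random((n_tfs, rank))     # TF factors
    for _ in range(n_iter):
        # Each update multiplies a factor by a non-negative ratio,
        # so the factors stay non-negative throughout.
        A *= np.einsum('ijk,jr,kr->ir', X, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum('ijk,ir,kr->jr', X, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum('ijk,ir,jr->kr', X, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C

def rel_error(X, A, B, C):
    """Relative Frobenius error of the rank-R reconstruction."""
    X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)
    return np.linalg.norm(X - X_hat) / np.linalg.norm(X)

def dominant_entries(factor, r, frac=0.5):
    """Indices whose loading in component r is at least `frac` of the
    component's maximum (`frac` is a hypothetical cutoff)."""
    col = factor[:, r]
    return np.flatnonzero(col >= frac * col.max())

# Toy tensor (invented data): 6 terms x 6 genes x 4 TFs, two planted modules.
X = np.zeros((6, 6, 4))
X[:3, :3, :2] = 1.0
X[3:, 3:, 2:] = 1.0

# Keep the best of a few random restarts, since the updates can stall.
runs = [ntf_cp(X, rank=2, seed=s) for s in range(5)]
A, B, C = min(runs, key=lambda f: rel_error(X, *f))

# One (terms, genes, TFs) index triple per sub-tensor, analogous to an ATM.
modules = [
    (dominant_entries(A, r), dominant_entries(B, r), dominant_entries(C, r))
    for r in range(A.shape[1])
]
for terms, genes, tfs in modules:
    print(terms, genes, tfs)
```

In this sketch each rank-one component plays the role of a sub-tensor, and thresholding its three factor columns yields the term, gene, and TF sets of one module. Repeating the factorization over several values of `rank` would correspond to the multiple approximation ranks mentioned in the abstract.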

KEYWORDS:

applied multilinear algebra; biomedical text mining; multiway analysis; tensor decomposition; tensor factorization; transcription factors
