Assessing key decisions for transcriptomic data integration in biochemical networks

PLoS Comput Biol. 2019 Jul 19;15(7):e1007185. doi: 10.1371/journal.pcbi.1007185. eCollection 2019 Jul.

Abstract

To gain insights into complex biological processes, genome-scale data (e.g., RNA-Seq) are often overlaid on biochemical networks. However, many networks do not have a one-to-one relationship between genes and network edges, due to the existence of isozymes and protein complexes. Therefore, decisions must be made on how to overlay data onto networks. For example, for metabolic networks, these decisions include (1) how to integrate gene expression levels using gene-protein-reaction rules, (2) how to select the thresholds on expression data above which the associated gene is considered "active", and (3) the order in which these steps are imposed. However, the influence of these decisions has not been systematically tested. We compared 20 decision combinations using a transcriptomic dataset across 32 tissues and showed that the definition of which reactions may be considered active (i.e., reactions of the genome-scale metabolic network with a non-zero expression level after overlaying the data) is mainly influenced by the thresholding approach used. To determine the most appropriate decisions, we evaluated how these decisions impact the acquisition of tissue-specific active reaction lists that recapitulate organ-system tissue groups. These results will provide guidelines to improve data analyses with biochemical networks and facilitate the construction of context-specific metabolic models.
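The three decisions listed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes one common convention for gene-protein-reaction (GPR) rules (AND, a protein complex, takes the minimum; OR, isozymes, takes the maximum) and one possible thresholding scheme (a "local" cutoff equal to the mean across tissues). The function names, the nested-tuple rule format, and the toy expression values are all hypothetical.

```python
from statistics import mean

def eval_gpr(rule, expr):
    """Collapse gene values into one reaction value.

    `rule` is a gene ID (str) or a tuple ("and" | "or", sub_rule, ...).
    Assumed convention: AND -> min (complex), OR -> max (isozymes).
    """
    if isinstance(rule, str):
        return expr[rule]
    op, *parts = rule
    values = [eval_gpr(part, expr) for part in parts]
    if op == "and":
        return min(values)
    if op == "or":
        return max(values)
    raise ValueError(f"unknown operator: {op!r}")

def threshold_then_gpr(rule, expr_by_tissue):
    """Order A: threshold each gene first, then apply the GPR rule."""
    genes = next(iter(expr_by_tissue.values())).keys()
    # Assumed local cutoff: mean expression of each gene across tissues.
    cutoffs = {g: mean([e[g] for e in expr_by_tissue.values()]) for g in genes}
    active = {}
    for tissue, expr in expr_by_tissue.items():
        kept = {g: (v if v >= cutoffs[g] else 0.0) for g, v in expr.items()}
        active[tissue] = eval_gpr(rule, kept) > 0
    return active

def gpr_then_threshold(rule, expr_by_tissue):
    """Order B: apply the GPR rule first, then threshold the reaction value."""
    scores = {t: eval_gpr(rule, e) for t, e in expr_by_tissue.items()}
    cutoff = mean(list(scores.values()))  # same mean-based cutoff, per reaction
    return {t: s >= cutoff for t, s in scores.items()}

# Toy data (made up for illustration): a two-subunit complex in three tissues.
rule = ("and", "g1", "g2")
expr_by_tissue = {
    "t1": {"g1": 10, "g2": 3},
    "t2": {"g1": 2, "g2": 6},
    "t3": {"g1": 6, "g2": 3},
}

print(threshold_then_gpr(rule, expr_by_tissue))
print(gpr_then_threshold(rule, expr_by_tissue))
```

On this toy input the two orders disagree: thresholding genes first marks the reaction inactive in every tissue, whereas scoring the reaction first marks it active in t1 and t3. With a single global cutoff and min/max rules, the two orders would coincide, so in this sketch the order only matters for gene-specific or reaction-specific thresholds.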

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biochemical Phenomena
  • Computational Biology
  • Data Interpretation, Statistical
  • Decision Support Techniques
  • Gene Expression Profiling / methods*
  • Gene Expression Profiling / statistics & numerical data
  • Gene Regulatory Networks
  • Humans
  • Metabolic Networks and Pathways / genetics*
  • Systems Biology