Optimizing network propagation for multi-omics data integration

PLoS Comput Biol. 2021 Nov 11;17(11):e1009161. doi: 10.1371/journal.pcbi.1009161. eCollection 2021 Nov.

Abstract

Network propagation refers to a class of algorithms that integrate information from input data across connected nodes in a given network. These algorithms have wide applications in systems biology, including protein function prediction, the inference of sub-networks altered under specific conditions, and the prioritization of disease genes. Despite the popularity of network propagation, comparative analyses of different algorithms on real data are lacking, and there is little guidance on how to select and parameterize the various algorithms. Here, we address this problem by analyzing different combinations of network normalization and propagation methods and by demonstrating schemes for identifying optimal parameter settings on real proteome and transcriptome data. Our work highlights the risk of a 'topology bias' caused by the incorrect use of network normalization approaches. Capitalizing on the fact that network propagation is a regularization approach, we show that optimizing the bias-variance tradeoff can be used to select optimal parameters. The application to real multi-omics data demonstrated that optimal parameters can also be obtained either by maximizing the agreement between different omics layers (e.g. proteome and transcriptome) or by maximizing the consistency between biological replicates. Furthermore, we illustrate the utility and robustness of network propagation on multi-omics datasets by identifying ageing-associated genes in brain and liver tissues of rats and by elucidating molecular mechanisms underlying prostate cancer progression. Overall, this work compares different network propagation approaches and presents strategies for using network propagation algorithms to optimally address a specific research question.
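
To make the link between normalization and 'topology bias' concrete, the following is a minimal sketch and not the authors' implementation. It assumes one common propagation scheme, random walk with restart in closed form, p = (1 - alpha)(I - alpha W)^-1 p0, and three common ways of normalizing a symmetric adjacency matrix A with degree matrix D into W (row, column, symmetric). The function names `normalize` and `propagate` and the toy star graph are hypothetical illustrations.

```python
import numpy as np


def normalize(adj: np.ndarray, method: str) -> np.ndarray:
    """Return a normalized propagation matrix W for a symmetric adjacency matrix.

    Assumption: these three variants are common choices, chosen for illustration.
    """
    deg = adj.sum(axis=1)
    if np.any(deg == 0):
        raise ValueError("isolated nodes are not supported in this sketch")
    if method == "row":        # W = D^-1 A: each node averages its neighbours
        return adj / deg[:, None]
    if method == "column":     # W = A D^-1: each node splits its value among its neighbours
        return adj / deg[None, :]
    if method == "symmetric":  # W = D^-1/2 A D^-1/2
        inv_sqrt = 1.0 / np.sqrt(deg)
        return adj * inv_sqrt[:, None] * inv_sqrt[None, :]
    raise ValueError(f"unknown normalization: {method}")


def propagate(adj: np.ndarray, p0: np.ndarray, alpha: float, method: str) -> np.ndarray:
    """Closed-form random walk with restart: p = (1 - alpha) (I - alpha W)^-1 p0."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    w = normalize(adj, method)
    n = adj.shape[0]
    # I - alpha W is invertible because all eigenvalues of W lie in [-1, 1].
    return (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * w, p0)


# Toy star graph (hypothetical): node 0 is a hub connected to four leaves.
star = np.zeros((5, 5))
star[0, 1:] = star[1:, 0] = 1.0
flat = np.ones(5)  # identical input score on every node, i.e. no real signal

for method in ("row", "column", "symmetric"):
    print(method, np.round(propagate(star, flat, alpha=0.5, method=method), 3))
```

With identical inputs on every node and alpha = 0.5, the row-normalized variant returns 1.0 everywhere, whereas the column-normalized variant assigns the hub 2.0 and each leaf 0.75 (the symmetric variant lies in between, about 1.333 versus 0.833). The hub's higher score reflects its degree rather than the data, which is one way a topology bias can arise when the normalization does not match the intended interpretation of the propagated scores.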

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aging / genetics
  • Aging / metabolism
  • Algorithms*
  • Animals
  • Bias
  • Brain / metabolism
  • Computational Biology / methods*
  • Computational Biology / statistics & numerical data
  • Data Interpretation, Statistical
  • Disease Progression
  • Gene Expression Profiling / statistics & numerical data
  • Gene Regulatory Networks
  • Genomics / statistics & numerical data
  • Humans
  • Liver / metabolism
  • Male
  • Prostatic Neoplasms / etiology
  • Prostatic Neoplasms / genetics
  • Prostatic Neoplasms / metabolism
  • Protein Interaction Maps
  • Proteomics / statistics & numerical data
  • RNA, Messenger / genetics
  • RNA, Messenger / metabolism
  • Rats
  • Systems Biology

Substances

  • RNA, Messenger

Grants and funding

KC received funding by the German Federal Ministry for Education and Research (BMBF, PhosphoNetPPM 031A259). MC received funding by the German Federal Ministry for Education and Research (BMBF, SyBACol 0315893). RJ received funding by the German Research Foundation (DFG, CRC 1310). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.