Bioinformatics. 2014 Sep 1;30(17):i461-7. doi: 10.1093/bioinformatics/btu455.

Stronger findings for metabolomics through Bayesian modeling of multiple peaks and compound correlations.

Author information

1
Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, FI-00076 Espoo, Finland, School of Computing Science, University of Glasgow, Glasgow G12 8QQ, UK and Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland.
2
Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, FI-00076 Espoo, Finland, School of Computing Science, University of Glasgow, Glasgow G12 8QQ, UK and Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland.

Abstract

MOTIVATION:

Data analysis for metabolomics suffers from uncertainty because of noisy measurement technology and the small sample size of experiments, both of which lead to a high probability of false findings. Further, individual compounds vary naturally between samples, which in many cases renders them unreliable as biomarkers. However, the levels of similar compounds are typically highly correlated, and we model this phenomenon in this work.

RESULTS:

We propose a hierarchical Bayesian model for inferring differences between groups of samples more accurately in metabolomic studies, where the observed compounds are collinear. We discover that the method decreases the error of weak and non-existent covariate effects, and thereby reduces false-positive findings. To achieve this, the method makes use of the mass spectral peak data by clustering similar peaks into latent compounds, and by further clustering latent compounds into groups that respond in a coherent way to the experimental covariates. We demonstrate the method with three simulated studies and validate it with a metabolomic benchmark dataset.
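
The pooling idea behind the method can be illustrated with a much simpler, non-Bayesian stand-in. The sketch below is not the authors' hierarchical model and not the peakANOVA implementation: it groups peaks by a plain correlation threshold and then estimates a case-versus-control difference on the average of each group. All function names (pearson, group_peaks, group_effect, simulate), the 0.8 threshold, and the simulated effect sizes are assumptions made for illustration only.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = (sxx * syy) ** 0.5
    return sxy / denom if denom > 0 else 0.0


def group_peaks(peaks, threshold=0.8):
    """Greedy grouping: a peak joins the first group whose first member
    it correlates with at or above `threshold`; otherwise it starts a new group.
    This is a crude substitute for the latent-compound clustering in the abstract."""
    groups = []
    for idx, peak in enumerate(peaks):
        for group in groups:
            if pearson(peaks[group[0]], peak) >= threshold:
                group.append(idx)
                break
        else:
            groups.append([idx])
    return groups


def group_effect(peaks, group, labels):
    """Mean difference (label 1 minus label 0) after averaging the peaks in `group`."""
    pooled = [statistics.fmean(peaks[i][s] for i in group) for s in range(len(labels))]
    case = [v for v, lab in zip(pooled, labels) if lab == 1]
    ctrl = [v for v, lab in zip(pooled, labels) if lab == 0]
    return statistics.fmean(case) - statistics.fmean(ctrl)


def simulate(seed=0, n_per_group=20, peaks_per_compound=3):
    """Two hypothetical compounds, each observed through several noisy peaks.
    The first compound responds to the covariate (effect 2.0), the second does not."""
    rng = random.Random(seed)
    labels = [0] * n_per_group + [1] * n_per_group
    peaks = []
    for effect in (2.0, 0.0):
        latent = [rng.gauss(0, 1) + effect * lab for lab in labels]
        for _ in range(peaks_per_compound):
            peaks.append([v + rng.gauss(0, 0.2) for v in latent])
    return peaks, labels


if __name__ == "__main__":
    peaks, labels = simulate()
    for group in group_peaks(peaks):
        print(group, round(group_effect(peaks, group, labels), 3))
```

Averaging over correlated peaks reduces the influence of peak-level noise on the group difference, which is the intuition the abstract appeals to. The proposed model goes further: it infers the clustering and the covariate effects jointly and with uncertainty estimates, which a fixed threshold cannot do.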

AVAILABILITY AND IMPLEMENTATION:

An implementation in R is available at http://research.ics.aalto.fi/mi/software/peakANOVA/.

PMID:
25161234
PMCID:
PMC4147908
DOI:
10.1093/bioinformatics/btu455
[Indexed for MEDLINE]