BMC Bioinformatics. 2017 Sep 29;18(1):429. doi: 10.1186/s12859-017-1838-y.

pulver: an R package for parallel ultra-rapid p-value computation for linear regression interaction terms.

Author information

1. Research Unit of Molecular Epidemiology, Helmholtz Zentrum München, Neuherberg, Germany. Sophie.molnos@helmholtz-muenchen.de.
2. Institute of Epidemiology II, Helmholtz Zentrum München, Neuherberg, Germany. Sophie.molnos@helmholtz-muenchen.de.
3. German Center for Diabetes Research (DZD), Neuherberg, Germany. Sophie.molnos@helmholtz-muenchen.de.
4. Research Unit of Molecular Epidemiology, Helmholtz Zentrum München, Neuherberg, Germany.
5. Institute of Epidemiology II, Helmholtz Zentrum München, Neuherberg, Germany.
6. German Center for Diabetes Research (DZD), Neuherberg, Germany.
7. Department of Medicine I, University Hospital Grosshadern, Ludwig-Maximilians-Universität, Munich, Germany.
8. Institute of Genetic Epidemiology, Helmholtz Zentrum München, Neuherberg, Germany.
9. Chair of Genetic Epidemiology, IBE, Faculty of Medicine, LMU Munich, Munich, Germany.
10. DZHK (German Centre for Cardiovascular Research), Partner Site Munich Heart Alliance, Munich, Germany.
11. Institute of Human Genetics, Helmholtz Zentrum München, Neuherberg, Germany.
12. Institute of Human Genetics, Technische Universität München, Munich, Germany.
13. Genome Analysis Center, Helmholtz Zentrum München, Neuherberg, Germany.
14. Institute of Experimental Genetics, Technical University of Munich, Freising-Weihenstephan, Germany.
15. Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Neuherberg, Germany.
16. Department of Twin Research and Genetic Epidemiology, King's College London, London, UK.
17. Department of Biophysics and Physiology, Weill Cornell Medical College in Qatar, Doha, Qatar.
18. Institute of Computational Biology, Helmholtz Zentrum München, Neuherberg, Germany.
19. Department of Mathematics, Technische Universität München, Garching, Germany.

Abstract

BACKGROUND:

Genome-wide association studies help us understand the genetics of complex diseases. Because human metabolism provides information about disease-causing mechanisms, associations between genetic variants and metabolite levels are commonly investigated. However, considering only genetic variants and their effects on one trait ignores the possible interplay between different "omics" layers. Existing tools consider only single-nucleotide polymorphism (SNP)-SNP interactions, and no practical tool is available for large-scale investigation of interactions between pairs of arbitrary quantitative variables.

RESULTS:

We developed an R package called pulver to compute p-values for the interaction term in a very large number of linear regression models. Comparisons on simulated data showed that pulver is much faster than existing tools. The speed-up comes from testing the null hypothesis with the correlation coefficient, which avoids costly matrix inversions. Two further optimizations are a rearranged iteration order across the different "omics" layers and an implementation in C++. We also applied the algorithm to data from the German KORA study to investigate a real-world problem involving the interplay among DNA methylation, genetic variants, and metabolite levels.
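
The idea of testing the interaction term through a correlation coefficient can be illustrated with a small sketch. It is not the package's code or API: the function name `interaction_t_statistic` and the helpers are hypothetical, and the sketch assumes the model y = b0 + b1*x + b2*z + b3*x*z + e. It relies on a standard regression identity: the t-statistic for b3 equals r * sqrt(df / (1 - r^2)), where r is the correlation between y and x*z after both have been made orthogonal to the intercept, x, and z, and df = n - 4. No matrix is inverted.

```python
import math
import random


def _dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def _residualize(v, basis):
    """Remove from v its projection onto each vector of an orthogonal basis."""
    out = list(v)
    for b in basis:
        coef = _dot(out, b) / _dot(b, b)
        out = [oi - coef * bi for oi, bi in zip(out, b)]
    return out


def interaction_t_statistic(y, x, z):
    """t-statistic for the x*z term in y ~ 1 + x + z + x*z (hypothetical helper).

    Assumes x and z are non-constant and not collinear; otherwise a
    basis vector is zero and the projection step divides by zero.
    """
    n = len(y)
    # Orthogonal basis for span{1, x, z}, built by Gram-Schmidt.
    basis = []
    for col in ([1.0] * n, x, z):
        basis.append(_residualize(col, basis))
    xz = [xi * zi for xi, zi in zip(x, z)]
    # Partial out the intercept and both main effects from y and from x*z.
    res_y = _residualize(y, basis)
    res_xz = _residualize(xz, basis)
    # Correlation of the residuals (both already have mean zero).
    r = _dot(res_y, res_xz) / math.sqrt(_dot(res_y, res_y) * _dot(res_xz, res_xz))
    df = n - 4  # four fitted coefficients
    return r * math.sqrt(df / (1.0 - r * r))


# Simulated example with a true interaction effect of 0.4.
random.seed(1)
n = 200
x = [random.gauss(0, 1) for _ in range(n)]
z = [random.gauss(0, 1) for _ in range(n)]
y = [0.5 * xi + 0.3 * zi + 0.4 * xi * zi + random.gauss(0, 1)
     for xi, zi in zip(x, z)]
t = interaction_t_statistic(y, x, z)
```

A two-sided p-value follows from comparing `t` with a Student t distribution with n - 4 degrees of freedom. When many outcome and predictor combinations are screened, the orthogonalized vectors can be reused across models, which is one plausible reason the iteration order matters.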

CONCLUSIONS:

The pulver package is a convenient and rapid tool for screening huge numbers of linear regression models for significant interaction terms in arbitrary pairs of quantitative variables. pulver is written in R and C++, and can be downloaded freely from CRAN at https://cran.r-project.org/web/packages/pulver/ .

KEYWORDS:

Algorithm; Linear regression interaction term; SNP–CpG interaction; Software

PMID:
28962546
PMCID:
PMC5622569
DOI:
10.1186/s12859-017-1838-y
[Indexed for MEDLINE]
Free PMC Article
