Front Cell Dev Biol. 2014 Oct 27;2:62. doi: 10.3389/fcell.2014.00062. eCollection 2014.

Integrative analysis of multiple diverse omics datasets by sparse group multitask regression.

Author information

1. Biomedical Engineering Department, Tulane University, New Orleans, LA, USA; Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA.
2. Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA; Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, USA.
3. Biomedical Engineering Department, Tulane University, New Orleans, LA, USA; Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA; Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, USA.

Abstract

A variety of high-throughput genome-wide assays enable the exploration of genetic risk factors underlying complex traits. Although these studies have had a remarkable impact on identifying susceptibility biomarkers, they suffer from issues such as limited sample size and low reproducibility. Combining individual studies of different genetic levels/platforms promises to improve the power and consistency of biomarker identification. In this paper, we propose a novel integrative method, namely sparse group multitask regression, for integrating diverse omics datasets, platforms, and populations to identify risk genes/factors of complex diseases. This method combines multitask learning with sparse group regularization, which will: (1) treat the biomarker identification in each single study as a task and then combine the tasks by multitask learning; (2) group variables from all studies for identifying significant genes; (3) enforce a sparsity constraint on groups of variables to overcome the "small sample, but large variables" problem. We introduce two sparse group penalties, sparse group lasso and sparse group ridge, in our multitask model, and provide an effective algorithm for each model. In addition, we propose a significance test for the identification of potential risk genes. Two simulation studies are performed to evaluate the performance of our integrative method by comparing it with a conventional meta-analysis method. The results show that our sparse group multitask method significantly outperforms the meta-analysis method. In an application to our osteoporosis studies, 7 genes are identified as significant by our method and are found to have significant effects in three other independent studies used for validation. The most significant gene, SOD2, was identified in our previous osteoporosis study involving the same expression dataset. Several other genes, such as TREML2, HTR1E, and GLO1, are shown to be novel susceptibility genes for osteoporosis, as confirmed by other studies.
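
The abstract does not state the penalty's formula, so the following is only a minimal sketch of how a sparse group lasso penalty can act on a multitask coefficient matrix. It assumes the textbook form of the penalty (an elementwise L1 term plus an L2 norm per group), with one row per gene and one column per study; the function names, the toy matrix, and the penalty weights `lam_l1` and `lam_group` are hypothetical and are not taken from the paper.

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise soft-thresholding: the proximal operator of t * |x|."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_sparse_group_lasso(B, lam_l1, lam_group):
    """Proximal step for lam_l1 * sum_ij |B_ij| + lam_group * sum_i ||B_i||_2.

    Assumed layout (not from the paper): B has one row per gene (the group)
    and one column per study (the task).
    """
    # Step 1: within-group sparsity via elementwise soft-thresholding.
    S = soft_threshold(B, lam_l1)
    # Step 2: group-level shrinkage; rows whose norm is <= lam_group vanish.
    norms = np.linalg.norm(S, axis=1, keepdims=True)
    scale = np.maximum(1.0 - lam_group / np.maximum(norms, 1e-12), 0.0)
    return S * scale

# Toy coefficients: 3 genes (rows) across 3 studies (columns).
B = np.array([[3.0, -4.0, 0.5],
              [0.2, -0.1, 0.3],
              [0.0,  2.0, 0.0]])
B_new = prox_sparse_group_lasso(B, lam_l1=0.5, lam_group=1.0)
print(B_new)
```

In this toy case the second gene's row is zeroed out entirely, which is the group-level selection the abstract describes, while the first gene keeps only its two larger coefficients. A full solver would alternate such a step with gradient updates on each study's regression loss; that part is omitted here.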

KEYWORDS:

group lasso; multitask learning; osteoporosis; significant test; sparse regression
