Structured sparse canonical correlation analysis for brain imaging genetics: an improved GraphNet method

Lei Du; Heng Huang; Jingwen Yan; Sungeun Kim; Shannon L Risacher; Mark Inlow; Jason H Moore; Andrew J Saykin; Li Shen; Alzheimer’s Disease Neuroimaging Initiative

doi:10.1093/bioinformatics/btw033

Bioinformatics. 2016 May 15;32(10):1544-51. doi: 10.1093/bioinformatics/btw033. Epub 2016 Jan 21.

Authors

Lei Du¹, Heng Huang², Jingwen Yan¹, Sungeun Kim¹, Shannon L Risacher¹, Mark Inlow³, Jason H Moore⁴, Andrew J Saykin¹, Li Shen¹; Alzheimer’s Disease Neuroimaging Initiative

Affiliations

¹ Department of Radiology and Imaging Sciences, Indiana University, Indianapolis, IN, USA.
² Department of Computer Science & Engineering, The University of Texas at Arlington, Arlington, TX, USA.
³ Department of Mathematics, Rose-Hulman Institute of Technology, Terre Haute, IN, USA.
⁴ Institute for Biomedical Informatics, School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.

Abstract

Motivation: Structured sparse canonical correlation analysis (SCCA) models have been used to identify imaging genetic associations. These models use either group lasso or graph-guided fused lasso to conduct feature selection and feature grouping simultaneously. The group lasso based methods require prior knowledge to define the groups, which limits their capability when prior knowledge is incomplete or unavailable. The graph-guided methods overcome this drawback by using the sample correlation to define the constraint. However, they are sensitive to the sign of the sample correlation, which could introduce undesirable bias if the sign is wrongly estimated.
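The sign sensitivity described above can be illustrated with a minimal sketch. It assumes a common textbook form of the graph-guided fused lasso term for one pair of features, |r_ij| * |u_i - sign(r_ij) * u_j|, and contrasts it with a sign-free term built on absolute values. Both function names and the sign-free form are illustrative assumptions, not the penalty or code from the paper.

```python
import numpy as np


def fused_penalty(u, i, j, r_ij):
    # Assumed graph-guided fused lasso term for one feature pair:
    # |r_ij| * |u_i - sign(r_ij) * u_j|. It depends on the sign of r_ij.
    return abs(r_ij) * abs(u[i] - np.sign(r_ij) * u[j])


def abs_difference_penalty(u, i, j, r_ij):
    # Hypothetical sign-free term for comparison:
    # |r_ij| * (|u_i| - |u_j|)**2. Only the magnitude of r_ij is used.
    return abs(r_ij) * (abs(u[i]) - abs(u[j])) ** 2


# Two loadings with equal magnitude and opposite sign, as expected for
# two negatively correlated features.
u = np.array([0.5, -0.5])

true_r = -0.8   # true correlation between the two features
wrong_r = 0.8   # sample correlation estimated with the wrong sign

fused_correct = fused_penalty(u, 0, 1, true_r)
fused_flipped = fused_penalty(u, 0, 1, wrong_r)
abs_correct = abs_difference_penalty(u, 0, 1, true_r)
abs_flipped = abs_difference_penalty(u, 0, 1, wrong_r)

print("fused, correct sign:", fused_correct)
print("fused, flipped sign:", fused_flipped)
print("sign-free, correct sign:", abs_correct)
print("sign-free, flipped sign:", abs_flipped)
```

With the correct sign the fused term does not penalize this loading pattern, while the flipped sign penalizes it and would push the two loadings toward the same sign. The sign-free term gives the same value in both cases.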

Results: We introduce a novel SCCA model with a new penalty, and develop an efficient optimization algorithm. Our method has a strong upper bound for the grouping effect for both positively and negatively correlated features. We show that our method performs as well as or better than three competing SCCA models on both synthetic and real data. In particular, our method identifies stronger canonical correlations and better canonical loading patterns, showing its promise for revealing interesting imaging genetic associations.

Availability and implementation: The Matlab code and sample data are freely available at http://www.iu.edu/~shenlab/tools/angscca/

Contact: shenli@iu.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

Algorithms*
Brain*
Humans
Neuroimaging / methods*
