Genetics. 2017 Mar;205(3):1049-1062. doi: 10.1534/genetics.116.192377. Epub 2016 Dec 30.

A Robust and Powerful Set-Valued Approach to Rare Variant Association Analyses of Secondary Traits in Case-Control Sequencing Studies.

Author information

Department of Biostatistics, St. Jude Children's Research Hospital, Memphis, Tennessee 38105.
Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, People's Republic of China.
School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China.
Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, Texas 77030.
Department of Biostatistics, The University of North Carolina at Chapel Hill, North Carolina 27599.
Institute of Mental Health, Key Laboratory of Mental Health, Ministry of Health & National Clinical Research Center for Mental Disorders, Sixth Hospital, Peking University, Beijing 100191, People's Republic of China.


In many case-control genome-wide association studies (GWAS) and next-generation sequencing (NGS) studies, extensive data are available on secondary traits that may correlate with the primary disease and share common genetic variants with it. Investigating these secondary traits can provide critical insights into disease etiology or pathology and enhance the GWAS or NGS results. Methods based on logistic regression (LG) were developed for this purpose. However, for the identification of rare variants (RVs), certain inadequacies in the LG models and algorithmic instability can cause severely inflated type I error and significant loss of power when the two traits are correlated and the RV is associated with the disease, especially at stringent significance levels. To address this issue, we propose a novel set-valued (SV) method that models a binary trait as the dichotomization of an underlying continuous variable and incorporates this model into the genetic association model as a critical component. Extensive simulations and an analysis of seven secondary traits in a GWAS of benign ethnic neutropenia show that the SV method consistently controls type I error well at stringent significance levels, has greater power than the LG-based methods, and performs robustly across effect patterns of the genetic variant (risk or protective), rare or common variants, rare or common diseases, and trait distributions. Because of these advantages, we strongly recommend that the SV method be used instead of the LG-based methods for secondary trait analyses in case-control sequencing studies.
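The idea of modeling a binary trait as the dichotomization of an underlying continuous variable can be illustrated with a small simulation. The sketch below is not the authors' SV estimation procedure. It only shows the data-generating assumption, under choices that are ours rather than the paper's: a standard normal noise term, an additive genotype effect `beta`, and a threshold set so that non-carriers have a given prevalence. The function name `simulate_binary_trait` and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist


def simulate_binary_trait(genotypes, beta, prevalence, rng):
    """Return 0/1 traits obtained by thresholding a latent continuous variable.

    Assumptions (illustrative, not taken from the paper):
    latent = beta * genotype + standard normal noise, and the threshold
    is chosen so that non-carriers (genotype 0) have the given prevalence.
    """
    threshold = NormalDist().inv_cdf(1.0 - prevalence)
    traits = []
    for g in genotypes:
        latent = beta * g + rng.gauss(0.0, 1.0)
        # Only the dichotomized value is observed, never the latent variable.
        traits.append(1 if latent > threshold else 0)
    return traits


if __name__ == "__main__":
    rng = random.Random(2017)
    n, maf = 20000, 0.01  # hypothetical sample size and minor allele frequency
    # Each genotype counts minor alleles (0, 1, or 2) at a rare variant.
    genotypes = [int(rng.random() < maf) + int(rng.random() < maf) for _ in range(n)]
    traits = simulate_binary_trait(genotypes, beta=0.8, prevalence=0.1, rng=rng)

    carriers = [t for g, t in zip(genotypes, traits) if g > 0]
    noncarriers = [t for g, t in zip(genotypes, traits) if g == 0]
    print("case fraction among carriers:", sum(carriers) / max(len(carriers), 1))
    print("case fraction among non-carriers:", sum(noncarriers) / len(noncarriers))
```

With a positive `beta`, carriers of the rare allele cross the threshold more often than non-carriers, so a risk or protective effect on the latent scale shows up as a shift in the observed binary trait.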


KEYWORDS: case-control sequencing study; rare variants association analyses; secondary traits; set-valued model

[Indexed for MEDLINE]
Free PMC Article
