Bayesian analysis of case control polygenic etiology studies with missing data

Biostatistics. 2001 Sep;2(3):309-22. doi: 10.1093/biostatistics/2.3.309.

Abstract

Many genetic studies are based on analysing multiple DNA regions of cases and controls. Usually each region is tested separately for association with disease. However, some diseases may require interacting polymorphisms at several regions, and most disease susceptibility is polygenic. In this paper, we develop new methods for determining combinations of polymorphisms that affect the risk of disease. For example, two different genes might produce normal proteins, but these proteins function improperly when they occur together. We consider a Bayesian approach to analyse studies where DNA data from cases and controls have been analysed for polymorphisms at multiple regions and a polygenic etiology is suspected. The method of Gibbs sampling is used to incorporate data from individuals who have not had every region analysed at the DNA sequence or amino acid level. The Gibbs sampling algorithm alternately generates a sample from the posterior distribution of the sequence of combinations of polymorphisms in cases and controls and then uses this sample to impute the data that are missing. After convergence, the algorithm is used to generate a sample from the posterior distribution for the probability of each combination in order to identify groups of polymorphisms that best discriminate cases from controls. We apply the methods to a genetic study of type I diabetes. The protein encoded by the TAP2 gene is important in T cell function, and thus may affect the development of autoimmune diseases such as insulin-dependent diabetes mellitus (IDDM). We determine pairs of polymorphisms of genetic fragments in the coding regions of linked HLA genes that may impact the risk of IDDM.
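
The alternation between sampling combination probabilities and imputing missing regions can be illustrated with a minimal sketch. It is not the model from the paper: it assumes two biallelic regions (four possible combinations), a symmetric Dirichlet prior with a hypothetical parameter `alpha`, and data that are missing at random, with missing entries coded as -1. The function names `gibbs_combination_probs` and `prob_higher_in_cases` are illustrative only.

```python
import numpy as np

# The four possible combinations at two biallelic regions (an assumption
# of this sketch; the paper's data may have more regions and alleles).
CELLS = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])


def gibbs_combination_probs(data, n_iter=2000, burn_in=500, alpha=1.0, seed=0):
    """Sample the posterior of the combination probabilities for one group.

    data: integer array of shape (n_individuals, 2); 0/1 are observed
    polymorphisms, -1 marks a region that was not analysed.
    Returns an array of shape (n_iter - burn_in, 4) of posterior draws.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    # compat[i, k] is True when combination k agrees with every observed
    # entry of individual i (a missing entry agrees with anything).
    observed_match = data[:, None, :] == CELLS[None, :, :]
    compat = np.all((data[:, None, :] == -1) | observed_match, axis=2)

    theta = np.full(len(CELLS), 1.0 / len(CELLS))  # starting value
    draws = []
    for it in range(n_iter):
        # Imputation step: draw each individual's full combination from
        # theta restricted to the combinations consistent with their data.
        counts = np.zeros(len(CELLS))
        for row in compat:
            w = row * theta
            k = rng.choice(len(CELLS), p=w / w.sum())
            counts[k] += 1
        # Posterior step: the Dirichlet prior is conjugate to the
        # multinomial counts of the completed data.
        theta = rng.dirichlet(alpha + counts)
        if it >= burn_in:
            draws.append(theta)
    return np.array(draws)


def prob_higher_in_cases(case_draws, control_draws):
    """Per combination, the fraction of paired draws in which the case
    probability exceeds the control probability."""
    return (case_draws > control_draws).mean(axis=0)
```

Running `gibbs_combination_probs` once on the cases and once on the controls (with different seeds) and passing both sets of draws to `prob_higher_in_cases` gives a rough indication of which combinations discriminate the two groups. Convergence should be checked before the draws are interpreted; the fixed `burn_in` here is a placeholder, not a recommendation.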