Analysis Name and Accession
Name: Genotype-Phenotype Associations in Alzheimer’s disease
Accession: pha002879.1

Analysis Description
GenADA is a multi-site collaborative study, involving GlaxoSmithKline Inc. and nine medical centers in Canada, to develop a dataset containing 1000 Alzheimer’s disease patients and 1000 ethnically-matched controls in order to associate DNA sequence (allelic) variations in candidate genes with Alzheimer’s disease phenotypes. The analysis presented here was performed with the specific goal of re-calling genotypes from the original Affymetrix CEL files using the CRLMM algorithm, and performing a general QC on the resulting data. Following QC filtering of samples and markers, Fisher’s exact tests for association with nominal case-control AD status were performed using a genotypic model on all samples that passed QC, first using all markers and then using only passing markers. Odds ratios were calculated for a single-copy change in the tested allele. Two SNPs in the APOE gene were included with the GWAS data for these tests. QQ plots of the test results showed little evidence of batch effects. The second set of tests resulted in two markers reaching genome-wide significance. No further association testing was performed. All of the analyzed data are provided, as well as data from several samples excluded from analysis for QC reasons.
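
As an illustration of the genotypic-model test described above, the following is a minimal sketch of an exact test on a 2x3 table of genotype counts (cases and controls by the three genotypes). It is not the software used for this analysis; the function name and the choice of the Freeman-Halton two-sided definition (summing the probabilities of all tables no more likely than the observed one) are assumptions made for illustration. Full enumeration is slow for large sample sizes and is shown only to make the calculation explicit.

```python
from math import comb


def fisher_exact_2x3(cases, controls):
    """Two-sided exact p-value for a 2x3 genotype table.

    cases, controls: genotype counts in the order (AA, AB, BB).
    Hypothetical helper; margins are held fixed as in Fisher's exact test.
    """
    col = [x + y for x, y in zip(cases, controls)]  # genotype totals
    n_cases = sum(cases)
    total = sum(col)

    def weight(a, b, c):
        # Numerator of the multivariate hypergeometric probability
        # (kept as an exact integer to avoid floating-point ties).
        return comb(col[0], a) * comb(col[1], b) * comb(col[2], c)

    observed = weight(*cases)
    extreme = 0
    # Enumerate every case row (a, b, c) compatible with the fixed margins.
    for a in range(min(col[0], n_cases) + 1):
        for b in range(min(col[1], n_cases - a) + 1):
            c = n_cases - a - b
            if c > col[2]:
                continue
            w = weight(a, b, c)
            if w <= observed:
                extreme += w
    return extreme / comb(total, n_cases)


# Example with made-up counts (not study data):
p_value = fisher_exact_2x3(cases=[30, 50, 20], controls=[45, 45, 10])
```
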
3290 individual CEL files (1646 NSP and 1644 STY) from an Affymetrix 500K scan were paired for 1628 subjects, with one STY file removed in favor of a replicate file with a higher call rate for the same sample. The CRLMM algorithm was used to call genotypes on complete batches of NSP and STY files, so as not to introduce computational batch effects in the calling. Genotypes with a CRLMM confidence score less than 0.95 were set to "missing". 11 samples with a combined call rate less than 0.94 were excluded from further analysis. Five samples were removed due to apparent NSP/STY mismatches. Gender tests, based on both X heterozygosity and log-ratio intensities, removed 20 samples. Tests of cryptic relatedness resulted in removal of 20 samples. Analysis of population structure identified one sample of possible mixed ancestry, but the evidence was not sufficient to warrant exclusion. After all sample-related tests, 51 samples of the original 1628 were excluded, resulting in 1577 samples used for analysis.
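
The confidence masking and call-rate filter described above can be sketched as follows. This is an illustrative outline only, assuming genotype calls and their confidence scores are already available as parallel lists per sample; the function names and data layout are hypothetical and do not reflect the CRLMM package's own interface.

```python
MISSING = None  # placeholder for a genotype set to "missing"


def mask_low_confidence(calls, confidences, min_conf=0.95):
    """Set genotypes with confidence below min_conf to missing."""
    return [g if c >= min_conf else MISSING
            for g, c in zip(calls, confidences)]


def call_rate(calls):
    """Fraction of non-missing genotypes for one sample."""
    if not calls:
        return 0.0
    return sum(g is not MISSING for g in calls) / len(calls)


def passing_samples(samples, min_call_rate=0.94):
    """Return IDs of samples whose combined call rate is at least min_call_rate.

    samples: dict mapping sample ID -> (calls, confidences), where the
    lists cover the NSP and STY arrays combined (assumed layout).
    """
    keep = []
    for sample_id, (calls, confidences) in samples.items():
        masked = mask_low_confidence(calls, confidences)
        if call_rate(masked) >= min_call_rate:
            keep.append(sample_id)
    return keep
```
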

Autosomal SNPs with HWE p-value < 5e-07 in controls were excluded, as were SNPs with both a MAF < 0.05 and a call rate < 0.99. Remaining SNPs with a call rate < 0.95 were excluded, along with any SNP without a valid map position. Two SNPs in the APOE gene were added to the analyzed set because of this gene’s reported association with Alzheimer’s disease. Following all marker QC, 74,539 markers were dropped from the original set of 500,570.
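
The marker filters above can be expressed as a single pass/fail rule per SNP. The sketch below assumes the per-SNP summary statistics (HWE p-value in controls, MAF, call rate, map position) have already been computed; the `SnpStats` record and `passes_marker_qc` function are hypothetical names introduced for illustration, not part of the original pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnpStats:
    snp_id: str
    autosomal: bool
    hwe_p_controls: float    # HWE p-value computed in controls only
    maf: float               # minor allele frequency
    call_rate: float         # fraction of samples with a non-missing call
    position: Optional[int]  # None when no valid map position is known


def passes_marker_qc(s: SnpStats) -> bool:
    """Apply the marker exclusion criteria described in the text."""
    if s.position is None:
        return False                      # no valid map position
    if s.autosomal and s.hwe_p_controls < 5e-7:
        return False                      # HWE failure in controls
    if s.maf < 0.05 and s.call_rate < 0.99:
        return False                      # rare SNP with imperfect calling
    if s.call_rate < 0.95:
        return False                      # low call rate overall
    return True
```

Note that the low-MAF rule is stricter on call rate (0.99) than the general rule (0.95), so a rare SNP with a call rate of 0.97 is dropped while a common SNP with the same call rate is kept.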

