Detection, imputation, and association analysis of small deletions and null alleles on oligonucleotide arrays

Am J Hum Genet. 2008 Jun;82(6):1316-33. doi: 10.1016/j.ajhg.2008.05.008.

Abstract

Copy-number variation (CNV) is a major contributor to human genetic variation, and CNV associations with human disease have recently been reported. Many genome-wide association (GWA) studies in complex diseases have been performed with sets of biallelic single-nucleotide polymorphisms (SNPs), but the available CNV methods are still limited. We present a new method (TriTyper) that can infer genotypes in case-control data sets for deletion CNVs, or for SNPs with an extra, untyped allele, at high, single-SNP resolution. Calling accuracy is improved by accounting for linkage disequilibrium (LD) as well as intensity data. Analysis of 3102 unrelated individuals of European descent, genotyped with Illumina Infinium BeadChips, identified 1880 SNPs with a common untyped allele; these SNPs are in strong LD with neighboring biallelic SNPs. Simulations indicate that our method has superior power to detect associations compared to biallelic SNPs that are in LD with these SNPs, without increasing type I errors, as shown in a GWA analysis in celiac disease. Genotypes for 1204 triallelic SNPs could be fully imputed with only biallelic-genotype calls, permitting association analysis of these SNPs in many published data sets. We estimate that 682 of the 1655 unique loci reflect deletions; this is on average 99 deletions per individual, four times more than detected by other methods. Whereas the identified loci are strongly enriched for known deletions, 61% have not been reported before. Genes overlapping with these loci more often have paralogs (p = 0.006) and biologically interact with fewer genes than expected (p = 0.004).
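
As a rough illustration of why a SNP with an untyped allele is hard to call with a biallelic genotyper, the sketch below enumerates the six genotypes of a triallelic locus and the call a standard biallelic caller would be expected to return for each. This is not TriTyper's implementation: the allele labels ("A", "B", and "0" for the untyped or deleted allele), the function names, and the simple copy-count proxy for probe intensity are all assumptions made for illustration.

```python
from itertools import combinations_with_replacement

# Hypothetical labels: "A" and "B" are the typed alleles, "0" is the untyped
# (null or deleted) allele that gives no probe signal.
ALLELES = ("A", "B", "0")


def apparent_call(genotype):
    """Call that a purely biallelic genotyper would be expected to make."""
    typed = [allele for allele in genotype if allele != "0"]
    if not typed:
        return "NoCall"        # 00: neither probe produces signal
    if len(set(typed)) == 1:
        return typed[0] * 2    # AA and A0 both look like "AA" (same for B)
    return "AB"


def typed_copies(genotype):
    """Number of typed allele copies, a crude stand-in for total intensity."""
    return sum(1 for allele in genotype if allele != "0")


# Unordered genotypes of a triallelic locus: AA, AB, A0, BB, B0, 00.
genotypes = list(combinations_with_replacement(ALLELES, 2))
table = {"".join(g): (apparent_call(g), typed_copies(g)) for g in genotypes}

for name, (call, copies) in table.items():
    print(f"{name}: biallelic call = {call}, typed copies = {copies}")
```

In this toy model, A0 and AA receive the same biallelic call and differ only in the number of typed copies, which is consistent with the abstract's point that combining intensity data with LD information helps separate such genotypes.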

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Alleles
  • Case-Control Studies
  • Celiac Disease / genetics
  • Databases, Genetic
  • Gene Deletion*
  • Gene Dosage*
  • Genetic Variation
  • Humans
  • Linkage Disequilibrium
  • Oligonucleotide Array Sequence Analysis / methods*
  • Polymorphism, Single Nucleotide