Bioinformatics. 2015 Aug 1;31(15):2434-42. doi: 10.1093/bioinformatics/btv168. Epub 2015 Mar 24.

DISSCO: direct imputation of summary statistics allowing covariates.

Author information

1. Department of Biostatistics, Department of Genetics, Department of Computer Science.
2. Department of Genetics, Curriculum in Bioinformatics and Computational Biology, Department of Statistics, University of North Carolina, Chapel Hill, NC 27599, USA.
3. Division of Pediatric Pulmonary Medicine, Allergy and Immunology, Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, University of Pittsburgh School of Medicine, Department of Biostatistics, Department of Human Genetics, University of Pittsburgh School of Public Health, Pittsburgh, PA 15224, USA.
4. Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, PA, USA.
5. Department of Biostatistics, Department of Genetics.

Abstract

BACKGROUND:

Imputing individual-level genotypes at untyped markers from an external reference panel of genotyped or sequenced individuals has become standard practice in genetic association studies. Direct imputation of summary statistics can also be valuable, for example in meta-analyses where individual-level genotype data are not available. Two methods, DIST and ImpG-Summary/LD, have been proposed for imputing association summary statistics; both assume a multivariate Gaussian distribution for those statistics. However, both methods also assume that the correlations between association summary statistics are the same as the correlations between the corresponding genotypes. This assumption can be violated in the presence of confounding covariates.
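
To make the multivariate Gaussian idea concrete, the following is a minimal sketch of conditional-mean imputation, not the implementation of DIST, ImpG-Summary/LD or DISSCO. It assumes the summary statistics are z-scores and that a correlation matrix among markers has already been estimated from a reference panel. The function name `impute_z`, its parameters and the `ridge` regularization term are hypothetical and introduced only for illustration.

```python
import numpy as np

def impute_z(z_typed, r_tt, r_ut, ridge=0.1):
    """Conditional mean of untyped z-scores given typed z-scores.

    z_typed : (t,)   z-scores at typed markers
    r_tt    : (t, t) correlation among typed markers
    r_ut    : (u, t) correlation between untyped and typed markers
    ridge   : assumed regularization added to the diagonal of r_tt
              so the linear system stays well conditioned
    """
    z_typed = np.asarray(z_typed, dtype=float)
    r_tt = np.asarray(r_tt, dtype=float)
    r_ut = np.asarray(r_ut, dtype=float)

    a = r_tt + ridge * np.eye(r_tt.shape[0])
    # a is symmetric, so solve(a, r_ut.T).T equals r_ut @ inv(a)
    weights = np.linalg.solve(a, r_ut.T).T
    return weights @ z_typed
```

Under the Gaussian assumption the imputed statistic is a weighted sum of the typed statistics, so the result depends entirely on which correlation matrix is supplied.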

METHODS:

We show analytically that, in the absence of covariates, the correlation among association summary statistics is indeed the same as that among the corresponding genotypes, which provides a theoretical justification for the recently proposed methods. We then prove that, in the presence of covariates, the correlation among association summary statistics becomes the partial correlation of the corresponding genotypes, controlling for covariates. We therefore develop direct imputation of summary statistics allowing covariates (DISSCO).
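
The distinction between correlation and partial correlation can be illustrated with a short sketch. It uses the standard residual-based definition of partial correlation (regress each marker on the covariates, then correlate the residuals) and is not taken from the DISSCO software; the function name `partial_correlation` and its arguments are hypothetical.

```python
import numpy as np

def partial_correlation(genotypes, covariates):
    """Partial correlation among markers, controlling for covariates.

    genotypes  : (n, m) genotype matrix, one column per marker
    covariates : (n, k) covariate matrix, or a length-n vector
    """
    g = np.asarray(genotypes, dtype=float)
    c = np.asarray(covariates, dtype=float)

    # Design matrix with an intercept column plus the covariates
    design = np.column_stack([np.ones(g.shape[0]), c])

    # Least-squares fit of every marker on the covariates at once
    beta, *_ = np.linalg.lstsq(design, g, rcond=None)
    residuals = g - design @ beta

    # Correlating the residuals gives the partial correlation
    return np.corrcoef(residuals, rowvar=False)
```

When two markers are correlated only because both depend on a shared covariate, `np.corrcoef(genotypes, rowvar=False)` reports a clear correlation while `partial_correlation` returns a value near zero, which is the kind of discrepancy the abstract describes.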

RESULTS:

We consider two real-life scenarios in which the difference between correlation and partial correlation is likely to matter in practice: (i) association studies in admixed populations; and (ii) association studies in the presence of other confounding covariate(s). Applied to real datasets under both scenarios, DISSCO performs at least comparably to, if not better than, existing correlation-based methods, particularly for lower frequency variants. For example, DISSCO can reduce the absolute deviation from the truth by 3.9-15.2% for variants with minor allele frequency <5%.

PMID: 25810429
PMCID: PMC4514926
DOI: 10.1093/bioinformatics/btv168
[Indexed for MEDLINE]
Free PMC Article
