Bioinformatics. 2015 Nov 15;31(22):3577-83. doi: 10.1093/bioinformatics/btv457. Epub 2015 Aug 6.

Strategies to improve the performance of rare variant association studies by optimizing the selection of controls.

Author information

1. Institute of Medical Genetics and Human Genetics, Charité Universitätsmedizin Berlin, 13353 Berlin, Germany.
2. Institute for Statistics, University of Bremen, 28344 Bremen, Germany.
3. Berlin-Brandenburg Center for Regenerative Therapies (BCRT), 13353 Berlin, Germany.
4. Institute of Medical Genetics and Human Genetics, Charité Universitätsmedizin Berlin, 13353 Berlin, Germany; Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany.
5. GeneTalk, 13189 Berlin, Germany.

Abstract

MOTIVATION:

In case groups of patients with ultra-rare disorders, ethnicities are often diverse and data quality may vary. If not properly adjusted for, population substructure in the case group and heterogeneous data quality can substantially inflate test statistics and produce spurious associations in case-control studies. Existing techniques to correct for confounding effects were developed specifically for common variants and are not applicable to rare variants.

RESULTS:

We analyzed strategies for selecting suitable controls for cases, based on similarity metrics that differ in their weighting schemes. We simulated different disease entities on real exome data and show that a similarity-based selection scheme can help reduce false positive associations and optimize the performance of the statistical tests. Especially when both data quality and ethnicity vary widely within the case group, a matching approach that puts more weight on rare variants performs best. We reanalyzed collections of unrelated patients with Kabuki make-up syndrome, Hyperphosphatasia with Mental Retardation syndrome and Catel-Manzke syndrome, for which the disease genes were recently described. We show that rare variant association tests are more sensitive and specific in identifying the disease gene than intersection filters and should thus be considered a favorable approach for analyzing even small patient cohorts.
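
The abstract does not state the exact similarity metrics or weighting schemes, so the following is only a minimal sketch of what a similarity-based control selection that up-weights rare variants might look like. Everything in it is an assumption made for illustration: the 0/1 carrier encoding, the negative-log-frequency weights, the weighted Jaccard similarity, and the function names `variant_weights`, `weighted_similarity` and `select_controls`.

```python
import math


def variant_weights(genotypes):
    """Weight each variant by -log(carrier frequency) in the pooled sample.

    Assumed weighting scheme (not taken from the paper): the rarer a
    variant is among all individuals, the larger its weight.
    """
    n = len(genotypes)
    weights = []
    for j in range(len(genotypes[0])):
        carriers = sum(row[j] for row in genotypes)
        # Add-one smoothing keeps the frequency strictly between 0 and 1.
        freq = (carriers + 1) / (n + 2)
        weights.append(-math.log(freq))
    return weights


def weighted_similarity(a, b, weights):
    """Weighted Jaccard similarity between two 0/1 carrier vectors."""
    shared = sum(w for x, y, w in zip(a, b, weights) if x and y)
    union = sum(w for x, y, w in zip(a, b, weights) if x or y)
    return shared / union if union > 0 else 0.0


def select_controls(case, candidates, weights, k):
    """Return the indices of the k candidates most similar to the case."""
    ranked = sorted(
        range(len(candidates)),
        key=lambda i: weighted_similarity(case, candidates[i], weights),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Toy data: rows are individuals, columns are variants (1 = carrier).
    candidates = [[1, 1, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]]
    case = [1, 0, 0, 1]
    weights = variant_weights(candidates + [case])
    print(select_controls(case, candidates, weights, k=2))
```

In this sketch, sharing a rare variant raises the similarity more than sharing a common one, so the selected controls tend to resemble the case in their rare-variant background. Whether that corresponds to any of the schemes compared in the paper cannot be determined from the abstract alone.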

AVAILABILITY AND IMPLEMENTATION:

Datasets used in our analysis are available at ftp://ftp.1000genomes.ebi.ac.uk./vol1/ftp/

CONTACT:

peter.krawitz@charite.de

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.

PMID: 26249812
DOI: 10.1093/bioinformatics/btv457
[Indexed for MEDLINE]
