Am J Hum Genet. 2015 Jun 4;96(6):926-37. doi: 10.1016/j.ajhg.2015.04.018. Epub 2015 May 28.

Improved ancestry estimation for both genotyping and sequencing data using projection procrustes analysis and genotype imputation.

Author information

1. Department of Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore. Electronic address: wangcl@gis.a-star.edu.sg.
2. Quantitative Biomedical Research Center, Department of Clinical Sciences, Center for the Genetics of Host Defense, UT Southwestern Medical Center, Dallas, TX 75235, USA.
3. Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA; Department of Epidemiology, T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA.
4. Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA.
5. Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA.

Abstract

Accurate estimation of individual ancestry is important in genetic association studies, especially when a large number of samples are collected from multiple sources. However, existing approaches developed for genome-wide SNP data do not work well with modest amounts of genetic data, such as in targeted sequencing or exome chip genotyping experiments. We propose a statistical framework to estimate individual ancestry in a principal component ancestry map generated by a reference set of individuals. This framework extends and improves upon our previous method for estimating ancestry using low-coverage sequence reads (LASER 1.0) to analyze either genotyping or sequencing data. In particular, we introduce a projection Procrustes analysis approach that uses high-dimensional principal components to estimate ancestry in a low-dimensional reference space. Using extensive simulations and empirical data examples, we show that our new method (LASER 2.0), combined with genotype imputation on the reference individuals, can substantially outperform LASER 1.0 in estimating fine-scale genetic ancestry. Specifically, LASER 2.0 can accurately estimate fine-scale ancestry within Europe using either exome chip genotypes or targeted sequencing data with off-target coverage as low as 0.05×. Under the framework of LASER 2.0, we can estimate individual ancestry in a shared reference space for samples assayed at different loci or by different techniques. Therefore, our ancestry estimation method will accelerate discovery in disease association studies not only by helping model ancestry within individual studies but also by facilitating combined analysis of genetic data from multiple sources.
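
To illustrate the idea of mapping high-dimensional principal components into a low-dimensional reference space, here is a minimal sketch in Python with NumPy. It is not the LASER 2.0 implementation. It assumes a simplified, single-pass variant in which the reference coordinates are padded with zero columns, an ordinary Procrustes fit (translation, scaling, orthogonal rotation) is solved by SVD, and only the leading columns are kept. The function name `procrustes_to_reference` and the toy data are hypothetical and chosen for illustration only.

```python
import numpy as np

def procrustes_to_reference(X, Y):
    """Fit a similarity transform from X (n x K_hi) onto Y (n x K_lo), K_lo <= K_hi.

    X: coordinates of the reference individuals in a sample-specific PCA space.
    Y: coordinates of the same individuals in the low-dimensional reference map.
    Returns a function that maps points from the X space into the K_lo-dim map.
    Assumption: a one-step zero-padding simplification, not an iterative
    projection Procrustes solver.
    """
    n, k_hi = X.shape
    k_lo = Y.shape[1]
    # Pad the reference coordinates with zero columns so both spaces match in dimension.
    Z = np.hstack([Y, np.zeros((n, k_hi - k_lo))])
    x_mean, z_mean = X.mean(axis=0), Z.mean(axis=0)
    Xc, Zc = X - x_mean, Z - z_mean
    # Orthogonal Procrustes: the rotation maximizing trace(A^T Xc^T Zc) is U V^T.
    U, S, Vt = np.linalg.svd(Xc.T @ Zc)
    A = U @ Vt
    # Least-squares optimal scaling factor for the fitted rotation.
    rho = S.sum() / (Xc ** 2).sum()

    def transform(P):
        # Apply translation, scaling and rotation, then keep the first K_lo axes.
        return (rho * (P - x_mean) @ A + z_mean)[:, :k_lo]

    return transform

# Toy check: X is a rotated, scaled and shifted copy of the padded reference map.
rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 2))                   # 2-D reference ancestry map
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # random 4 x 4 orthogonal matrix
X = 3.0 * np.hstack([Y, np.zeros((50, 2))]) @ Q + 1.5
aligned = procrustes_to_reference(X, Y)(X)
```

In this toy setup the fitted transform recovers the reference coordinates up to floating-point error. With real data, the study sample's coordinates in the sample-specific PCA space would be passed to the returned function to place that sample in the shared reference map.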

PMID: 26027497
PMCID: PMC4457959
DOI: 10.1016/j.ajhg.2015.04.018
[Indexed for MEDLINE]
Free PMC Article