Am J Hum Genet. 2015 Jun 4;96(6):926-37. doi: 10.1016/j.ajhg.2015.04.018. Epub 2015 May 28.

Improved ancestry estimation for both genotyping and sequencing data using projection procrustes analysis and genotype imputation.

Author information

Department of Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore. Electronic address:
Quantitative Biomedical Research Center, Department of Clinical Sciences, Center for the Genetics of Host Defense, UT Southwestern Medical Center, Dallas, TX 75235, USA.
Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA; Department of Epidemiology, T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA.
Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA.
Department of Biostatistics, T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA.


Accurate estimation of individual ancestry is important in genetic association studies, especially when a large number of samples are collected from multiple sources. However, existing approaches developed for genome-wide SNP data do not work well with modest amounts of genetic data, such as in targeted sequencing or exome chip genotyping experiments. We propose a statistical framework to estimate individual ancestry in a principal component ancestry map generated by a reference set of individuals. This framework extends and improves upon our previous method for estimating ancestry using low-coverage sequence reads (LASER 1.0) to analyze either genotyping or sequencing data. In particular, we introduce a projection Procrustes analysis approach that uses high-dimensional principal components to estimate ancestry in a low-dimensional reference space. Using extensive simulations and empirical data examples, we show that our new method (LASER 2.0), combined with genotype imputation on the reference individuals, can substantially outperform LASER 1.0 in estimating fine-scale genetic ancestry. Specifically, LASER 2.0 can accurately estimate fine-scale ancestry within Europe using either exome chip genotypes or targeted sequencing data with off-target coverage as low as 0.05×. Under the framework of LASER 2.0, we can estimate individual ancestry in a shared reference space for samples assayed at different loci or by different techniques. Therefore, our ancestry estimation method will accelerate discovery in disease association studies not only by helping model ancestry within individual studies but also by facilitating combined analysis of genetic data from multiple sources.
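To make the Procrustes step more concrete, the following is a minimal sketch of ordinary (same-dimension) Procrustes analysis, a simpler relative of the projection Procrustes approach named in the abstract. It is not the LASER 2.0 implementation. It assumes that reference individuals have coordinates both in a sample-specific PCA (X) and in the reference ancestry map (Y), fits a scale, an orthogonal transform, and a shift on those shared individuals, and then applies the fitted transform to a study sample. The function names, variable names, and toy data are hypothetical and chosen for illustration only.

```python
import numpy as np


def fit_procrustes(X, Y):
    """Fit rho (scale), A (orthogonal matrix), and b (shift) that minimize
    ||rho * X @ A + b - Y||_F, where X and Y have the same shape (n x k).

    Assumption for this sketch: X holds reference individuals' coordinates in
    a sample-specific PCA, and Y holds the same individuals' coordinates in
    the reference ancestry map.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = Y.mean(axis=0)
    Xc = X - x_mean  # center both configurations
    Yc = Y - y_mean
    # Orthogonal Procrustes solution from the SVD of the cross-product matrix.
    U, s, Vt = np.linalg.svd(Xc.T @ Yc)
    A = U @ Vt  # may include a reflection, since PCA axis signs are arbitrary
    rho = s.sum() / np.sum(Xc ** 2)  # least-squares optimal scale
    b = y_mean - rho * x_mean @ A  # shift that aligns the two centroids
    return rho, A, b


def apply_procrustes(X_new, rho, A, b):
    """Map new coordinates into the reference space with a fitted transform."""
    return rho * np.asarray(X_new, dtype=float) @ A + b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: the last row plays the role of a study sample.
    Y_all = rng.normal(size=(40, 2))
    theta = 0.7
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta), np.cos(theta)]])
    X_all = (Y_all - np.array([1.0, -2.0])) @ R / 3.0

    # Fit on the "reference" rows only, then place the held-out row.
    rho, A, b = fit_procrustes(X_all[:-1], Y_all[:-1])
    placed = apply_procrustes(X_all[-1:], rho, A, b)
    print("placed:", placed, "expected:", Y_all[-1:])
```

In the projection variant described in the abstract, X would carry more principal components than Y, so A would be a rectangular matrix with orthonormal columns rather than a square orthogonal matrix. That problem has no closed-form solution of this kind and is usually solved iteratively, which this sketch does not attempt.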

[Indexed for MEDLINE]
Free PMC Article
