BMC Bioinformatics. 2015 Jan 16;16:4. doi: 10.1186/s12859-014-0418-7.

Fast individual ancestry inference from DNA sequence data leveraging allele frequencies for multiple populations.

Author information

1 Department of Pediatrics, University of California San Diego, 9500 Gilman Drive, La Jolla, 92093, CA, USA. vibansal@ucsd.edu.
2 Scripps Translational Science Institute, 3344 N Torrey Pines Court, La Jolla, 92037, CA, USA. vibansal@ucsd.edu.
3 Scripps Translational Science Institute, 3344 N Torrey Pines Court, La Jolla, 92037, CA, USA. ondrej@mdrevolution.com.
4 Current address: MD Revolution, San Diego, CA, USA. ondrej@mdrevolution.com.

Abstract

BACKGROUND:

Estimation of individual ancestry from genetic data is useful for the analysis of disease association studies, understanding human population history and interpreting personal genomic variation. New, computationally efficient methods are needed for ancestry inference that can effectively utilize existing information about allele frequencies associated with different human populations and can work directly with DNA sequence reads.

RESULTS:

We describe a fast method for estimating the relative contribution of known reference populations to an individual's genetic ancestry. Our method uses allele frequencies from the reference populations, together with individual genotype or sequence data, to obtain a maximum likelihood estimate of the global admixture proportions using the BFGS optimization algorithm. It accounts for the genotype uncertainty present in sequence data by using genotype likelihoods, and it does not require individual genotype data from external reference panels. Simulation studies and application of the method to real datasets demonstrate that our method is significantly faster than previous methods and has comparable accuracy. Using data from the 1000 Genomes Project, we show that estimates of the genome-wide average ancestry for admixed individuals are consistent between exome sequence data and whole-genome low-coverage sequence data. Finally, we demonstrate that our method can be used to estimate admixture proportions from pooled sequence data, making it a valuable tool for controlling for population stratification in sequencing-based association studies that utilize DNA pooling.
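To illustrate the kind of likelihood being maximized, the following is a minimal sketch and not the authors' implementation. It assumes hard genotype calls (alternate-allele counts 0, 1 or 2) rather than genotype likelihoods, assumes independent SNPs, and swaps the BFGS optimizer for simple EM updates so that it needs only the Python standard library. The function names `estimate_admixture` and `simulate_genotypes` and all parameter names are hypothetical. Under these assumptions, the expected alternate-allele frequency at SNP j is f_j = sum_k q_k p_kj, and the log-likelihood is sum_j [g_j log f_j + (2 - g_j) log(1 - f_j)].

```python
import math
import random


def estimate_admixture(genotypes, freqs, n_iter=500, tol=1e-9):
    """EM estimate of admixture proportions q for one individual.

    genotypes: alt-allele counts (0, 1 or 2), one per SNP (assumed hard calls).
    freqs: freqs[k][j] is the alt-allele frequency of SNP j in population k.
    """
    n_pops = len(freqs)
    n_snps = len(genotypes)
    eps = 1e-6
    # Clamp frequencies away from 0 and 1 so logs and divisions stay defined.
    p = [[min(max(f, eps), 1.0 - eps) for f in row] for row in freqs]
    q = [1.0 / n_pops] * n_pops  # start from equal contributions
    prev_ll = -math.inf
    for _ in range(n_iter):
        new_q = [0.0] * n_pops
        ll = 0.0
        for j, g in enumerate(genotypes):
            # Expected alt-allele frequency under the current mixture.
            f = sum(q[k] * p[k][j] for k in range(n_pops))
            ll += g * math.log(f) + (2 - g) * math.log(1.0 - f)
            for k in range(n_pops):
                # Posterior share of population k for alt and ref alleles.
                new_q[k] += g * q[k] * p[k][j] / f
                new_q[k] += (2 - g) * q[k] * (1.0 - p[k][j]) / (1.0 - f)
        q = [x / (2.0 * n_snps) for x in new_q]
        if ll - prev_ll < tol:  # stop once the log-likelihood plateaus
            break
        prev_ll = ll
    return q


def simulate_genotypes(q, freqs, rng):
    """Draw both alleles at each SNP from a population chosen with weights q."""
    genotypes = []
    for j in range(len(freqs[0])):
        g = 0
        for _ in range(2):
            k = rng.choices(range(len(q)), weights=q)[0]
            g += 1 if rng.random() < freqs[k][j] else 0
        genotypes.append(g)
    return genotypes


# Toy data: two hypothetical reference populations, 2000 SNPs.
rng = random.Random(0)
freqs = [[rng.uniform(0.05, 0.95) for _ in range(2000)] for _ in range(2)]
genotypes = simulate_genotypes([0.7, 0.3], freqs, rng)
print(estimate_admixture(genotypes, freqs))  # compare with the simulated [0.7, 0.3]
```

Handling sequence reads directly, as the abstract describes, would replace the hard genotype term with a sum over the three possible genotypes weighted by their genotype likelihoods. A quasi-Newton optimizer such as BFGS would then typically need far fewer iterations than EM on the same objective.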

CONCLUSIONS:

Our method is an efficient and versatile tool for estimating ancestry from DNA sequence data and is available from https://sites.google.com/site/vibansal/software/iAdmix .

PMID: 25592880
PMCID: PMC4301802
DOI: 10.1186/s12859-014-0418-7
[Indexed for MEDLINE]
Free PMC Article
