Format

Send to

Choose Destination
Annu Rev Genomics Hum Genet. 2012;13:337-61. doi: 10.1146/annurev-genom-082410-101510. Epub 2012 Jun 11.

Population identification using genetic data.

Author information

1
Heilbronn Institute for Mathematical Research, School of Mathematics, University of Bristol, Bristol BS8 1TW, UK. dan.lawson@bristol.ac.uk

Abstract

A large number of algorithms have been developed to classify individuals into discrete populations using genetic data. Recent results show that the information used by both model-based clustering methods and principal components analysis can be summarized by a matrix of pairwise similarity measures between individuals. Similarity matrices have been constructed in a number of ways, usually treating markers as independent but differing in the weighting given to polymorphisms of different frequencies. Additionally, methods are now being developed that take linkage into account. We review several such matrices and evaluate their information content. A two-stage approach for population identification is to first construct a similarity matrix and then perform clustering. We review a range of common clustering algorithms and evaluate their performance through a simulation study. The clustering step can be performed either on the matrix or by first using a dimension-reduction technique; we find that the latter approach substantially improves the performance of most algorithms. Based on these results, we describe the population structure signal contained in each similarity matrix and find that accounting for linkage leads to significant improvements for sequence data. We also perform a comparison on real data, where we find that population genetics models outperform generic clustering approaches, particularly with regard to robustness for features such as relatedness between individuals.

[Indexed for MEDLINE]

Supplemental Content

Full text links

Icon for Atypon
Loading ...
Support Center