G3 (Bethesda). 2017 Oct 5;7(10):3359-3377. doi: 10.1534/g3.117.300131.

CONE: Community Oriented Network Estimation Is a Versatile Framework for Inferring Population Structure in Large-Scale Sequencing Data.

Author information

1. Department of Mathematical Sciences, University of Oulu, FI-90014, Finland.
2. Swedish Defense Research Agency, SE-906 21 Umeå, Sweden.
3. Department of Mathematical Sciences, University of Oulu, FI-90014, Finland. Email: mikko.sillanpaa@oulu.fi
4. Biocenter Oulu, FI-90014, Finland.

Abstract

Estimation of genetic population structure based on molecular markers is a common task in population genetics and ecology. We apply a generalized linear model with LASSO regularization to infer relationships between individuals and populations from molecular marker data. Specifically, we apply a neighborhood selection algorithm to infer population genetic structure and gene flow between populations. The resulting relationships are used to construct an individual-level population graph. Network substructures known as communities are then separated from each other using a community detection algorithm. Inference of population structure using networks combines the strengths of: (i) network theory (a broad collection of tools, including aesthetically pleasing visualization), (ii) principal component analysis (dimension reduction together with simple visual inspection), and (iii) model-based methods (e.g., ancestry coefficient estimates). We have named our process CONE (for community oriented network estimation). CONE has fewer restrictions than conventional assignment methods: properties such as the number of subpopulations need not be fixed before the analysis, and the sample may include close relatives or involve uneven sampling. Applying CONE to simulated data sets resulted in more accurate estimates of the true number of subpopulations than model-based methods, and provided comparable ancestry coefficient estimates. Analyses of empirical data sets (teosinte single nucleotide polymorphisms, a bacterial disease outbreak, and the Human Genome Diversity Panel) illustrate that population structures estimated with CONE are consistent with earlier findings.
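
The pipeline described in the abstract (LASSO-based neighborhood selection, then a population graph, then community detection) can be illustrated with a minimal sketch. This is not the CONE implementation. It assumes a plain Gaussian LASSO fitted by coordinate descent in place of the generalized linear model, an "OR" rule for combining neighborhoods, and connected components as a crude stand-in for a real community detection algorithm. All function names, the `penalty` value, and the toy genotype matrix are hypothetical.

```python
import numpy as np

def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty (the LASSO proximal step)."""
    return np.sign(value) * max(abs(value) - penalty, 0.0)

def lasso_coordinate_descent(X, y, penalty, n_sweeps=100):
    """Minimise (1/2n)||y - X b||^2 + penalty * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    residual = y.copy()
    col_scale = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            if col_scale[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (column j added back).
            rho = X[:, j] @ residual / n + col_scale[j] * beta[j]
            new_beta = soft_threshold(rho, penalty) / col_scale[j]
            residual -= X[:, j] * (new_beta - beta[j])
            beta[j] = new_beta
    return beta

def neighborhood_selection(genotypes, penalty, tol=1e-8):
    """Regress each individual (column) on all others across markers (rows).

    An edge i-j is kept if either regression selects the other individual
    (the "OR" rule; an assumption of this sketch).
    """
    Z = genotypes - genotypes.mean(axis=0)
    sd = Z.std(axis=0)
    sd[sd == 0.0] = 1.0  # guard against monomorphic columns
    Z = Z / sd
    n_ind = Z.shape[1]
    adjacency = np.zeros((n_ind, n_ind), dtype=bool)
    for i in range(n_ind):
        others = [j for j in range(n_ind) if j != i]
        beta = lasso_coordinate_descent(Z[:, others], Z[:, i], penalty)
        for j, coef in zip(others, beta):
            if abs(coef) > tol:
                adjacency[i, j] = adjacency[j, i] = True
    return adjacency

def connected_components(adjacency):
    """Label graph components by depth-first search (stand-in for community detection)."""
    n = adjacency.shape[0]
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in np.flatnonzero(adjacency[node]):
                if labels[nb] == -1:
                    labels[nb] = current
                    stack.append(int(nb))
        current += 1
    return labels

# Toy data: 8 markers (rows) x 6 individuals (columns), two idealised groups
# whose marker patterns are orthogonal to each other.
a = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
b = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
toy = np.column_stack([a, a, a, b, b, b])

adjacency = neighborhood_selection(toy, penalty=0.1)
labels = connected_components(adjacency)
print(labels)
```

Note that the number of groups is not supplied anywhere: it falls out of the estimated graph, which mirrors the property the abstract highlights. On real data the penalty would need tuning, and a modularity-based or similar community detection method would replace the connected-components step.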

KEYWORDS:

community detection; graphical models; neighborhood selection; population genetic structure; population graph

PMID: 28830924
PMCID: PMC5633386
DOI: 10.1534/g3.117.300131
[Indexed for MEDLINE]
Free PMC Article
