Send to

Choose Destination
Gigascience. 2015 Jan 14;4:1. doi: 10.1186/2047-217X-4-1. eCollection 2015.

SmileFinder: a resampling-based approach to evaluate signatures of selection from genome-wide sets of matching allele frequency data in two or more diploid populations.

Author information

Biology Department, University of Puerto Rico at Mayagüez, Mayagüez, 00680 Puerto Rico.
Department of Animal Sciences, University of Illinois at Urbana-Champaign, Urbana, 61801 Illinois USA.
Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg University, St. Petersburg, 199034 Russia ; Oceanographic Center, Nova Southeastern University, Ft. Lauderdale, 33004 Florida USA.
Biology Department, University of Puerto Rico at Rio Piedras, Rio Piedras, 00931 Puerto Rico.



Adaptive alleles may rise in frequency as a consequence of positive selection, creating a pattern of decreased variation in the neighboring loci, known as a selective sweep. When the region containing this pattern is compared to another population with no history of selection, a rise in variance of allele frequencies between populations is observed. One challenge presented by large genome-wide datasets is the ability to differentiate between patterns that are remnants of natural selection from those expected to arise at random and/or as a consequence of selectively neutral demographic forces acting in the population.


SmileFinder is a simple program that looks for diversity and divergence patterns consistent with selection sweeps by evaluating allele frequencies in windows, including neighboring loci from two or more populations of a diploid species against the genome-wide neutral expectation. The program calculates the mean of heterozygosity and FST in a set of sliding windows of incrementally increasing sizes, and then builds a resampled distribution (the baseline) of random multi-locus sets matched to the sizes of sliding windows, using an unrestricted sampling. Percentiles of the values in the sliding windows are derived from the superimposed resampled distribution. The resampling can easily be scaled from 1 K to 100 M; the higher the number, the more precise the percentiles ascribed to the extreme observed values.


The output from SmileFinder can be used to plot percentile values to look for population diversity and divergence patterns that may suggest past actions of positive selection along chromosome maps, and to compare lists of suspected candidate genes under random gene sets to test for the overrepresentation of these patterns among gene categories. Both applications of the algorithm have already been used in published studies. Here we present a publicly available, open source program that will serve as a useful tool for preliminary scans of selection using worldwide databases of human genetic variation, as well as population datasets for many non-human species, from which such data is rapidly emerging with the advent of new genotyping and sequencing technologies.


Evolution; Galaxy; Genome; Population; Resampling; Selection

[Indexed for MEDLINE]
Free PMC Article

Supplemental Content

Full text links

Icon for Silverchair Information Systems Icon for PubMed Central
Loading ...
Support Center