Genome Biol Evol. 2018 Nov 1;10(11):2899-2905. doi: 10.1093/gbe/evy223.

Effect of Collapsed Duplications on Diversity Estimates: What to Expect.

Author information

Institute of Evolutionary Biology (Universitat Pompeu Fabra - CSIC), PRBB, Barcelona, Catalonia, Spain.
Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain.
Laboratoire de Biométrie et Biologie Évolutive UMR 5558, Université de Lyon, Université Lyon 1, CNRS, Villeurbanne, France.
National Institute for Bioinformatics (INB), Barcelona, Catalonia, Spain.
Centre for Genomic Regulation (CRG), Barcelona, Catalonia, Spain.


The study of segmental duplications (SDs) and copy-number variants (CNVs) is of great importance in the fields of genomics and evolution. However, SDs and CNVs are usually excluded from genome-wide scans for natural selection. Because of the high identity between copies, SDs and CNVs that are not included in reference genomes are prone to being collapsed (that is, mistakenly aligned to the same region) when sequence data from single individuals are aligned to the reference. Such collapsed duplications are additionally challenging because concerted evolution between duplications alters their site frequency spectrum and linkage disequilibrium patterns. To investigate the potential effect of collapsed duplications on natural selection scans, we obtained expectations for four summary statistics from simulations of duplications evolving under a range of interlocus gene conversion and crossover rates. We confirm that summary statistics traditionally used to detect the action of natural selection on DNA sequences cannot be applied to SDs and CNVs, since in some cases values for known duplications mimic selective signatures. As a proof of concept of the pervasiveness of collapsed duplications, we analyzed data from the 1000 Genomes Project. We find that, within regions identified as variable in copy number, diversity between individuals with the duplication is consistently higher than between individuals without the duplication. Furthermore, the frequency of single nucleotide variants (SNVs) deviating from Hardy-Weinberg equilibrium is higher in individuals with the duplication, which strongly suggests that the higher diversity is a consequence of collapsed duplications and incorrect evaluation of SNVs within these CNV regions.
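
The Hardy-Weinberg argument above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not say which test was used, so the one-degree-of-freedom chi-square test, the function name `hwe_chi_square`, and the genotype counts below are all assumptions made for illustration. The idea is that a fixed difference between two collapsed copies is called as heterozygous in every individual carrying the duplication, producing a heterozygote excess that no single-copy locus in equilibrium would show.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 d.f.) for deviation from Hardy-Weinberg proportions.

    Takes observed genotype counts at a biallelic site and returns
    (chi2, p_value). Hypothetical helper, not taken from the paper.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")

    # Allele frequencies estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 0.0, 1.0  # monomorphic site: nothing to test

    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Survival function of a chi-square variable with 1 d.f.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Assumed counts: a single-copy SNV in equilibrium vs. a paralogous
# difference that a collapsed alignment reports as heterozygous in everyone.
single_copy = hwe_chi_square(25, 50, 25)
collapsed = hwe_chi_square(0, 50, 0)
print("single-copy site:", single_copy)
print("collapsed site:  ", collapsed)
```

With these assumed counts, the single-copy site matches its expected proportions exactly, whereas the all-heterozygous site is rejected with a very small p-value. For real data with low counts, an exact test would usually be preferable to the chi-square approximation.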
