Nucleic Acids Res. 2015 Sep 3;43(15):7217-28. doi: 10.1093/nar/gkv677. Epub 2015 Jun 30.

The missing indels: an estimate of indel variation in a human genome and analysis of factors that impede detection.

Author information

1. Centre for Computational Medicine, Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada; Center for Biomedical Informatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001, China.
2. Centre for Computational Medicine, Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada.
3. Centre for Computational Medicine, Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada; Program in Genetics and Genome Biology, Hospital for Sick Children, Toronto, ON, M5G 0A4, Canada; Department of Computer Science, University of Toronto, Toronto, ON, M5S 3G4, Canada. brudno@cs.toronto.edu

Abstract

With the development of High-Throughput Sequencing (HTS), thousands of human genomes have now been sequenced. When different studies analyze the same genome, they usually agree on the number of single-nucleotide polymorphisms but differ dramatically on the number of insertion and deletion variants (indels). Furthermore, there is evidence that indels are often severely under-reported. In this manuscript we derive the total number of indel variants in a human genome by combining data from different sequencing technologies, while assessing the indel detection accuracy. Our estimate of approximately 1 million indels in a Yoruban genome is much higher than the results reported in several recent HTS studies. We identify two key sources of difficulty in indel detection: insufficient coverage, read length or alignment quality; and the presence of repeats, including short interspersed elements and homopolymers/dimers. We quantify the effect of these factors on indel detection. The quality of sequencing data plays a major role in improving indel detection by HTS methods. However, many indels lie in long homopolymers and repeats, where their detection is severely impeded. The true number of indel events is likely even higher than our current estimates, and new techniques and technologies will be required to detect them.

PMID: 26130710
PMCID: PMC4551921
DOI: 10.1093/nar/gkv677
[Indexed for MEDLINE]
Free PMC Article
