Nucleic Acids Res. 2015 May 26;43(10):e68. doi: 10.1093/nar/gkv178. Epub 2015 Mar 27.

A thesaurus of genetic variation for interrogation of repetitive genomic regions.

Author information

1. Research Center for Molecular Medicine of the Austrian Academy of Sciences (CeMM), Vienna, Austria.
2. Research Center for Molecular Medicine of the Austrian Academy of Sciences (CeMM), Vienna, Austria. tomasz.konopka@ludwig.ox.ac.uk
3. Research Center for Molecular Medicine of the Austrian Academy of Sciences (CeMM), Vienna, Austria. sebastian.nijman@ludwig.ox.ac.uk

Abstract

Detecting genetic variation is one of the main applications of high-throughput sequencing, but is still challenging wherever aligning short reads poses ambiguities. Current state-of-the-art variant calling approaches avoid such regions, arguing that it is necessary to sacrifice detection sensitivity to limit false discovery. We developed a method that links candidate variant positions within repetitive genomic regions into clusters. The technique relies on a resource, a thesaurus of genetic variation, that enumerates genomic regions with similar sequence. The resource is computationally intensive to generate, but once compiled can be applied efficiently to annotate and prioritize variants in repetitive regions. We show that thesaurus annotation can reduce the rate of false variant calls due to mappability by up to three orders of magnitude. We apply the technique to whole genome datasets and establish that called variants in low mappability regions annotated using the thesaurus can be experimentally validated. We then extend the analysis to a large panel of exomes to show that the annotation technique opens possibilities to study variation in hitherto hidden and under-studied parts of the genome.
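
The clustering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `cluster_variants`, the tuple layout of a thesaurus entry, and the use of a union-find structure are all assumptions made for illustration. Each hypothetical thesaurus entry says that a region on one chromosome has similar sequence to a region starting at a given position on another, and a candidate variant inside the first region is linked to the corresponding offset in the second.

```python
from collections import defaultdict


def cluster_variants(candidates, thesaurus):
    """Group candidate loci that thesaurus entries link to one another.

    candidates: iterable of (chrom, pos) tuples.
    thesaurus:  iterable of (chrom_a, start_a, end_a, chrom_b, start_b)
                tuples (hypothetical layout): region a, inclusive, has
                similar sequence to the region beginning at chrom_b:start_b.
    """
    parent = {}

    def find(locus):
        # Union-find lookup with path halving.
        parent.setdefault(locus, locus)
        while parent[locus] != locus:
            parent[locus] = parent[parent[locus]]
            locus = parent[locus]
        return locus

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for chrom, pos in candidates:
        find((chrom, pos))  # register the locus even if nothing links to it
        for chrom_a, start_a, end_a, chrom_b, start_b in thesaurus:
            if chrom == chrom_a and start_a <= pos <= end_a:
                # Same offset within the similar region (assumes no indels).
                partner = (chrom_b, start_b + (pos - start_a))
                union((chrom, pos), partner)

    clusters = defaultdict(list)
    for locus in list(parent):
        clusters[find(locus)].append(locus)
    return sorted(sorted(members) for members in clusters.values())


candidates = [("chr1", 150), ("chr5", 1050), ("chr2", 500)]
thesaurus = [("chr1", 100, 199, "chr5", 1000)]
print(cluster_variants(candidates, thesaurus))
```

In this toy input the candidates on chr1 and chr5 fall into one cluster because the single thesaurus entry maps chr1:150 onto chr5:1050, while the chr2 candidate stays on its own. A linear scan over thesaurus entries is used only for brevity; a genome-scale resource would presumably need an indexed lookup.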

PMID: 25820428
PMCID: PMC4446415
DOI: 10.1093/nar/gkv178
[Indexed for MEDLINE]
Free PMC Article
