BMC Genomics. 2015 Feb 28;16:143. doi: 10.1186/s12864-015-1333-7.

The distribution and mutagenesis of short coding INDELs from 1,128 whole exomes.

Challis D(1,2,3), Antunes L(4,5,6), Garrison E(7), Banks E(8), Evani US(9,10,11), Muzny D(12,13), Poplin R(14), Gibbs RA(15,16), Marth G(17,18), Yu F(19,20,21).

Author information

1. Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA. dannychallis@gmail.com.
2. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA. dannychallis@gmail.com.
3. Present address: Monsanto Company, Ankeny, IA, 50021, USA. dannychallis@gmail.com.
4. Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA. lilian.antunes@gmail.com.
5. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA. lilian.antunes@gmail.com.
6. Present address: Washington University School of Medicine, Saint Louis, MO, 63110, USA. lilian.antunes@gmail.com.
7. Department of Biology, Boston College, Wellcome Trust Sanger Institute, Chestnut Hill, MA, 02467, USA. erik.garrison@bc.edu.
8. Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA. ebanks@broadinstitute.org.
9. Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA. udayevani@gmail.com.
10. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA. udayevani@gmail.com.
11. Present address: New York Genome Center, New York, NY, 10013, USA. udayevani@gmail.com.
12. Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA. donnam@bcm.edu.
13. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA. donnam@bcm.edu.
14. Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, 02142, USA. rpoplin@broadinstitute.org.
15. Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA. agibbs@bcm.edu.
16. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA. agibbs@bcm.edu.
17. Department of Biology, Boston College, Wellcome Trust Sanger Institute, Chestnut Hill, MA, 02467, USA. marth@bc.edu.
18. Present address: Department of Human Genetics and Utah Center for Genetic Discovery, University of Utah School of Medicine, Salt Lake City, UT, 84112, USA. marth@bc.edu.
19. Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA. fyu@bcm.edu.
20. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA. fyu@bcm.edu.
21. Institute of Neurology, Tianjin Medical University General Hospital, Tianjin, 300052, China. fyu@bcm.edu.

Abstract

BACKGROUND:

Identifying insertion/deletion polymorphisms (INDELs) with high confidence has been intrinsically challenging in short-read sequencing data. Here we report our approach for improving INDEL calling accuracy by using a machine learning algorithm to combine call sets generated with three independent methods, and by leveraging the strengths of each individual pipeline. Utilizing this approach, we generated a consensus exome INDEL call set from a large dataset generated by the 1000 Genomes Project (1000G), maximizing both the sensitivity and the specificity of the calls.
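
As an illustration only, the idea of combining three call sets can be sketched as building one per-caller support vector for each candidate INDEL, which a classifier could then take as input. The abstract does not name the three pipelines, the features, or the learning algorithm, so the caller names, the (chrom, pos, ref, alt) key, and the 0/1 support features below are all assumptions, and the classifier itself is not shown.

```python
from collections import defaultdict

# Hypothetical caller names; the abstract does not identify the three pipelines.
CALLERS = ("caller_a", "caller_b", "caller_c")


def build_support_features(call_sets):
    """Map each INDEL key (chrom, pos, ref, alt) to a 0/1 vector of caller support.

    This is an assumed feature encoding, not the published method.
    """
    support = defaultdict(lambda: [0] * len(CALLERS))
    for idx, name in enumerate(CALLERS):
        for key in call_sets.get(name, ()):
            support[key][idx] = 1
    return dict(support)


# Toy input: each caller reports a set of INDEL keys (made-up coordinates).
call_sets = {
    "caller_a": {("1", 1000, "AT", "A"), ("2", 500, "G", "GCC")},
    "caller_b": {("1", 1000, "AT", "A")},
    "caller_c": {("1", 1000, "AT", "A"), ("3", 42, "CAG", "C")},
}
features = build_support_features(call_sets)
```

In a real pipeline such vectors would likely be extended with quality annotations before training; this sketch only shows the merge step over the union of candidate sites.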

RESULTS:

This consensus exome INDEL call set features 7,210 INDELs, from 1,128 individuals across 13 populations included in the 1000 Genomes Phase 1 dataset, with a false discovery rate (FDR) of about 7.0%.
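
To put the reported FDR in concrete terms, the following back-of-the-envelope sketch applies the standard definition FDR = FP / (TP + FP) to the figures quoted above. The function name is hypothetical, and the result is only an approximation because the abstract gives the FDR as "about 7.0%".

```python
def expected_false_positives(total_calls, fdr):
    """Approximate number of false calls implied by FDR = FP / (TP + FP)."""
    return round(total_calls * fdr)


total_calls = 7210   # INDELs in the consensus call set
fdr = 0.07           # reported false discovery rate, approximate

false_calls = expected_false_positives(total_calls, fdr)
true_calls = total_calls - false_calls
print(false_calls, true_calls)
```

Under these assumptions, roughly 500 of the 7,210 calls would be expected to be false discoveries.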

CONCLUSIONS:

In our study we further characterize the patterns and distributions of these exonic INDELs with respect to density, allele length, and site frequency spectrum, as well as the potential mutagenic mechanisms of coding INDELs in humans.

PMID: 25765891
PMCID: PMC4352271
DOI: 10.1186/s12864-015-1333-7
[Indexed for MEDLINE]
Free PMC Article
