Finding the lost treasures in exome sequencing data

David C Samuels; Leng Han; Jiang Li; Sheng Quanghu; Travis A Clark; Yu Shyr; Yan Guo

doi:10.1016/j.tig.2013.07.006

Trends Genet. 2013 Oct;29(10):593-9. doi: 10.1016/j.tig.2013.07.006. Epub 2013 Aug 22.

Authors

David C Samuels¹, Leng Han, Jiang Li, Sheng Quanghu, Travis A Clark, Yu Shyr, Yan Guo

Affiliation

¹ Center for Human Genetics Research, Vanderbilt University, Nashville, TN, 37232, USA.

Abstract

Exome sequencing is one of the most cost-efficient sequencing approaches for conducting genome research on coding regions. However, significant portions of the reads obtained in exome sequencing come from outside of the designed target regions. These additional reads are generally ignored, potentially wasting an important source of genomic data. There are three major types of unintentionally sequenced reads that can be found in exome sequencing data: reads in introns and intergenic regions, reads in the mitochondrial genome, and reads originating in viral genomes. All of these can be used for reliable data mining, extending the utility of exome sequencing. Large-scale exome sequencing data repositories, such as The Cancer Genome Atlas (TCGA), the 1000 Genomes Project, the National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project, and the Sequence Read Archive (SRA), provide researchers with excellent secondary data-mining opportunities to study genomic data beyond the intended target regions.

Keywords: exome capture; mitochondria; mtDNA copy number; unmapped read; virus; virus integration.
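
As an illustration of the read categories named in the abstract, the following is a minimal sketch, not the authors' method, of how aligned reads might be binned into on-target, off-target nuclear (intronic or intergenic), mitochondrial, and unmapped reads (the pool in which viral reads would be sought). It assumes reads are already reduced to hypothetical `(chrom, pos)` pairs with 0-based positions, that capture targets are half-open `(chrom, start, end)` intervals as in BED files, and that the mitochondrial contig is named `chrM` or `MT`; all function names are illustrative.

```python
from bisect import bisect_right
from collections import Counter

# Assumed mitochondrial contig names; check the reference genome actually used.
MITO_NAMES = {"chrM", "MT"}


def build_index(targets):
    """Sort and merge half-open target intervals, grouped by chromosome."""
    by_chrom = {}
    for chrom, start, end in targets:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = []
        for start, end in intervals:
            if merged and start <= merged[-1][1]:
                # Overlapping or adjacent interval: extend the previous one.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[chrom] = ([s for s, _ in merged], merged)
    return index


def classify_read(chrom, pos, index):
    """Assign one read to a category based on its alignment start."""
    if chrom is None:
        return "unmapped"  # candidates for a later viral-genome search
    if chrom in MITO_NAMES:
        return "mitochondrial"
    starts, merged = index.get(chrom, ([], []))
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    if i >= 0 and pos < merged[i][1]:
        return "on_target"
    return "off_target_nuclear"


def summarize(reads, targets):
    """Count reads per category."""
    index = build_index(targets)
    return Counter(classify_read(chrom, pos, index) for chrom, pos in reads)
```

In practice the `(chrom, pos)` pairs would come from an alignment file and the targets from the capture kit's interval file. Classifying by alignment start alone is a simplification, since a read that starts just outside a target may still overlap it.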

Publication types

Review

MeSH terms

DNA / genetics
Databases, Nucleic Acid*
Exome / genetics*
Genome, Mitochondrial
Humans
Polymorphism, Single Nucleotide
Sequence Analysis, DNA*

Substances

DNA
