Sci Rep. 2016 Sep 16;6:33256. doi: 10.1038/srep33256.

Investigation of rare and low-frequency variants using high-throughput sequencing with pooled DNA samples.

Author information

1. Department of Biosciences and Nutrition, Karolinska Institutet, SE-14183 Huddinge, Sweden.
2. Science for Life Laboratory, Stockholm, Sweden.
3. Molecular Neurology Research Program, University of Helsinki and Folkhälsan Institute of Genetics, Helsinki, Finland.
4. Medical and Clinical Genetics, University of Helsinki and Helsinki University Hospital, Helsinki, Finland.
5. Obstetrics and Gynecology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland.
6. Institute for Molecular Medicine Finland, University of Helsinki, Helsinki, Finland.
7. Department of Orthopedics, Karolinska University Hospital and Department of Clinical Sciences, Intervention and Technology (CLINTEC) Karolinska Institutet, Stockholm, Sweden.
8. Department of Orthopaedics, Sundsvall and Harnosand County Hospital, Sundsvall, Sweden.
9. Department of Veterinary Biosciences, and Research Programs Unit, Molecular Neurology, University of Helsinki and Folkhälsan Research Center, Helsinki, Finland.

Abstract

High-throughput sequencing using pooled DNA samples can facilitate genome-wide studies on rare and low-frequency variants in a large population. Some major questions concerning the pooled sequencing strategy are whether rare and low-frequency variants can be detected reliably, and whether estimated minor allele frequencies (MAFs) can represent the actual values obtained from individually genotyped samples. In this study, we evaluated MAF estimates using three variant detection tools with two sets of pooled whole exome sequencing (WES) and one set of pooled whole genome sequencing (WGS) data. Both GATK and Freebayes displayed high sensitivity, specificity and accuracy when detecting rare or low-frequency variants. For the WGS study, 56% of the low-frequency variants on the Illumina array had identical MAFs in the sequencing and individual genotyping data, and 26% differed by one allele. The MAF estimates from WGS correlated well (r = 0.94) with those from Illumina arrays. The MAFs from the pooled WES data also showed high concordance (r = 0.88) with those from the individual genotyping data. In conclusion, the MAFs estimated from pooled DNA sequencing data reflect the MAFs in individually genotyped samples well. The pooling strategy can thus be a rapid and cost-effective approach for the initial screening in large-scale association studies.
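The comparison metrics mentioned above (identical MAFs, a one-allele difference, and Pearson's r between pooled and individually genotyped MAFs) can be illustrated with a minimal sketch. It assumes a pool of N diploid individuals (2N chromosomes) and uses a deliberately naive estimator: the pooled MAF is the fraction of reads carrying the minor allele, which is then converted to an allele count out of 2N and compared with the count from individual genotypes. This is not the GATK or Freebayes model used in the study; the `Site` class, the function names and the toy numbers are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import sqrt


@dataclass
class Site:
    """One variant site (hypothetical toy data, not from the study)."""
    minor_reads: int       # reads supporting the minor allele in the pooled library
    depth: int             # total reads covering the site in the pooled library
    genotypes: list[int]   # per-individual minor-allele dosage (0, 1 or 2) from array genotyping


def pooled_maf(site: Site) -> float:
    """Naive pooled estimate: fraction of pooled reads carrying the minor allele."""
    return site.minor_reads / site.depth


def genotyped_maf(site: Site) -> float:
    """MAF from individual genotypes: minor-allele count over 2N chromosomes."""
    return sum(site.genotypes) / (2 * len(site.genotypes))


def allele_count_difference(site: Site) -> int:
    """Absolute difference, in alleles, between pooled and genotyped minor-allele counts."""
    n_chrom = 2 * len(site.genotypes)
    # Convert the pooled frequency to the nearest whole allele count
    # (Python's round() rounds exact halves to the nearest even integer).
    pooled_count = round(pooled_maf(site) * n_chrom)
    return abs(pooled_count - sum(site.genotypes))


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(sites: list[Site]) -> dict[str, float]:
    """Fraction of sites with identical / one-allele-off counts, plus Pearson r of MAFs."""
    diffs = [allele_count_difference(s) for s in sites]
    n = len(sites)
    return {
        "identical": diffs.count(0) / n,
        "one_allele_off": diffs.count(1) / n,
        "r": pearson_r([pooled_maf(s) for s in sites],
                       [genotyped_maf(s) for s in sites]),
    }


# Hypothetical pool of 10 individuals (20 chromosomes), 100x depth per site.
example_sites = [
    Site(minor_reads=6, depth=100, genotypes=[1] + [0] * 9),
    Site(minor_reads=16, depth=100, genotypes=[1, 1] + [0] * 8),
    Site(minor_reads=14, depth=100, genotypes=[2, 1] + [0] * 8),
    Site(minor_reads=30, depth=100, genotypes=[1, 1, 1, 1] + [0] * 6),
]

result = summarize(example_sites)
print(result)
```

Counting agreement in whole alleles rather than raw frequency differences mirrors how a pooled estimate is judged against a finite number of genotyped chromosomes: with 2N chromosomes, one allele corresponds to a frequency step of 1/(2N).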

PMID: 27633116
PMCID: PMC5025741
DOI: 10.1038/srep33256
[Indexed for MEDLINE]
Free PMC Article