Linkage Disequilibrium Estimation in Low Coverage High-Throughput Sequencing Data

Genetics. 2018 Jun;209(2):389-400. doi: 10.1534/genetics.118.300831. Epub 2018 Mar 27.

Abstract

High-throughput sequencing methods that multiplex a large number of individuals have provided a cost-effective approach for discovering genome-wide genetic variation in large populations. These sequencing methods are increasingly used in population genetic studies across a diverse range of species. Two side effects of these methods, however, are (1) sequencing errors and (2) heterozygous genotypes called as homozygous because only one allele at a particular locus was sequenced, which occurs when the sequencing depth is insufficient. Both of these errors have a profound effect on the estimation of linkage disequilibrium (LD) and, if not taken into account, lead to inaccurate estimates. We developed a new likelihood method, GUS-LD, to estimate pairwise LD from low coverage sequencing data while accounting for undercalled heterozygous genotypes and sequencing errors. Our findings show that GUS-LD gives accurate estimates, whereas LD is underestimated if no adjustment is made for these errors.
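To make the undercalling problem concrete, the short Python sketch below computes how often a true heterozygote would show reads from only one allele at a given read depth, along with the textbook pairwise LD quantities D and r^2. It is an illustration under simplifying assumptions, not the GUS-LD likelihood: each read is assumed to sample either allele with probability 1/2, sequencing error is ignored, and the function names are hypothetical.

```python
from typing import Tuple


def prob_het_seen_as_hom(depth: int) -> float:
    """Probability that every read at a heterozygous site carries the same allele.

    Assumes each read independently samples one of the two alleles with
    probability 1/2 and that there is no sequencing error (an assumption
    made for this sketch only).
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # All reads show allele A, or all reads show allele B.
    return 2 * 0.5 ** depth


def ld_coefficients(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """Return (D, r^2) from the AB haplotype frequency and the two allele frequencies.

    Uses the standard definitions D = p_AB - p_A * p_B and
    r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)).
    """
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b
    return d, d * d / denom


if __name__ == "__main__":
    for depth in (1, 2, 4, 8):
        print(depth, prob_het_seen_as_hom(depth))
    print(ld_coefficients(0.5, 0.5, 0.5))
```

Under these assumptions, a heterozygote covered by a single read always appears homozygous, and half of those covered by two reads still do, which is why LD estimates computed directly from called genotypes can be distorted at low depth.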

Keywords: allelic dropout; genotyping-by-sequencing; linkage disequilibrium; low coverage; maximum likelihood.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Animals
  • Data Accuracy
  • Deer / genetics
  • Genome-Wide Association Study / methods*
  • Genome-Wide Association Study / standards
  • Genotype
  • Heterozygote
  • Linkage Disequilibrium*
  • Sequence Analysis, DNA / standards*