MicroHapDB: A Portable and Extensible Database of All Published Microhaplotype Marker and Frequency Data

Front Genet. 2020 Aug 7:11:781. doi: 10.3389/fgene.2020.00781. eCollection 2020.

Abstract

Microhaplotypes are the subject of significant interest in the forensics community as a promising multi-purpose forensic DNA marker for human identification. Microhaplotype markers are composed of multiple SNPs in close proximity, such that a single NGS read can simultaneously genotype the individual SNPs and phase them in aggregate to determine the associated donor haplotype. Abundant throughout the human genome, numerous recent studies have sought to discover and rank microhaplotype markers according to allelic diversity within and among populations. Microhaplotypes provide an appealing alternative to STR markers for human identification and mixture deconvolution, but can also be optimized for ancestry inference or combined with phenotype SNPs for prediction of externally visible characteristics in a multiplex NGS assay. Designing and evaluating panels of microhaplotypes is complicated by the lack of a convenient database of all published data, as well as the lack of population allele frequency data spanning disparate marker collections. We present MicroHapDB, a comprehensive database of published microhaplotype marker and frequency data, as a tool to advance the development of microhaplotype-based human forensics capabilities. We also present population allele frequencies derived from 26 global population samples for all microhaplotype markers published to date, facilitating the design and interpretation of custom multi-source panels. We submit MicroHapDB as a resource for community members engaged in marker discovery, population studies, assay development, and panel and kit design.

Keywords: Python; bioinformatics; database; forensics; human identification; microhaplotype; next generation sequencing.