Proc Natl Acad Sci U S A. 2017 May 30;114(22):5671-5676. doi: 10.1073/pnas.1619944114. Epub 2017 May 15.

Linkage disequilibrium matches forensic genetic records to disjoint genomic marker sets.

Author information

1 Department of Biology, Stanford University, Stanford, CA 94305.
2 Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, Canada R3E0J9.
3 Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109.
4 Department of Biology, Stanford University, Stanford, CA 94305; noahr@stanford.edu.

Abstract

Combining genotypes across datasets is central in facilitating advances in genetics. Data aggregation efforts often face the challenge of record matching: the identification of dataset entries that represent the same individual. We show that records can be matched across genotype datasets that have no shared markers based on linkage disequilibrium between loci appearing in different datasets. Using two datasets for the same 872 people, one with 642,563 genome-wide SNPs and the other with 13 short tandem repeats (STRs) used in forensic applications, we find that 90-98% of forensic STR records can be connected to corresponding SNP records and vice versa. Accuracy increases to 99-100% when ∼30 STRs are used. Our method expands the potential of data aggregation, but it also suggests privacy risks intrinsic in maintenance of databases containing even small numbers of markers, including databases of forensic significance.
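
The record-matching idea can be illustrated with a toy sketch. It is not the authors' implementation. It assumes that linkage disequilibrium has already been used to turn each SNP record into a probability distribution over STR genotypes at each forensic locus, and it scores each candidate pair with a log-likelihood ratio against population genotype frequencies. The locus names, record IDs, probabilities, and the function names `match_score` and `match_records` are all hypothetical.

```python
import math

# Floor probability for genotypes missing from a table (an assumption of this sketch).
EPSILON = 1e-6


def match_score(str_record, imputed_probs, pop_freqs):
    """Sum over loci of log(P(genotype | SNP record) / P(genotype in population))."""
    score = 0.0
    for locus, genotype in str_record.items():
        p_same = imputed_probs[locus].get(genotype, EPSILON)
        p_random = pop_freqs[locus].get(genotype, EPSILON)
        score += math.log(p_same / p_random)
    return score


def match_records(str_records, snp_imputations, pop_freqs):
    """Assign each STR record to the SNP record with the highest match score."""
    result = {}
    for str_id, str_record in str_records.items():
        result[str_id] = max(
            snp_imputations,
            key=lambda snp_id: match_score(
                str_record, snp_imputations[snp_id], pop_freqs
            ),
        )
    return result


# Hypothetical toy data: two STR loci, genotypes as pairs of repeat counts.
pop_freqs = {
    "L1": {(10, 11): 0.5, (12, 12): 0.5},
    "L2": {(7, 8): 0.5, (9, 9): 0.5},
}
str_records = {
    "str_A": {"L1": (10, 11), "L2": (7, 8)},
    "str_B": {"L1": (12, 12), "L2": (9, 9)},
}
# STR genotype probabilities assumed to have been imputed from each SNP record.
snp_imputations = {
    "snp_X": {
        "L1": {(10, 11): 0.8, (12, 12): 0.2},
        "L2": {(7, 8): 0.7, (9, 9): 0.3},
    },
    "snp_Y": {
        "L1": {(10, 11): 0.1, (12, 12): 0.9},
        "L2": {(7, 8): 0.4, (9, 9): 0.6},
    },
}

matches = match_records(str_records, snp_imputations, pop_freqs)
print(matches)
```

Picking the best-scoring SNP record independently for each STR record is the simplest choice. A one-to-one assignment over the full score matrix would be a natural alternative when both datasets are known to contain the same people.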

KEYWORDS: forensic DNA; genomic privacy; imputation; population genetics; record matching

PMID: 28507140
PMCID: PMC5465933
DOI: 10.1073/pnas.1619944114
[Indexed for MEDLINE]
Free PMC Article