Science. 2018 Nov 9;362(6415):690-694. doi: 10.1126/science.aau4832. Epub 2018 Oct 11.

Identity inference of genomic data using long-range familial searches.

Erlich Y1,2,3,4, Shor T5, Pe'er I2,3, Carmi S6.

Author information

1. MyHeritage, Or Yehuda 6037606, Israel.
2. Department of Computer Science, Fu Foundation School of Engineering, Columbia University, New York, NY, USA.
3. Center for Computational Biology and Bioinformatics (C2B2), Department of Systems Biology, Columbia University, New York, NY, USA.
4. New York Genome Center, New York, NY, USA.
5. MyHeritage, Or Yehuda 6037606, Israel.
6. Braun School of Public Health and Community Medicine, The Hebrew University of Jerusalem, Jerusalem, Israel.


Consumer genomics databases have reached the scale of millions of individuals. Recently, law enforcement authorities have exploited some of these databases to identify suspects via distant familial relatives. Using genomic data of 1.28 million individuals tested with consumer genomics, we investigated the power of this technique. We project that about 60% of the searches for individuals of European descent will result in a third-cousin or closer match, which theoretically allows their identification using demographic identifiers. Moreover, the technique could implicate nearly any U.S. individual of European descent in the near future. We demonstrate that the technique can also identify research participants of a public sequencing project. On the basis of these results, we propose a potential mitigation strategy and policy implications for human subject research.

[Indexed for MEDLINE]
