Science. 2018 Apr 13;360(6385):171-175. doi: 10.1126/science.aam9309. Epub 2018 Mar 1.

Quantitative analysis of population-scale family trees with millions of relatives.

Kaplanis J1,2, Gordon A1,2, Shor T3,4, Weissbrod O5, Geiger D4, Wahl M1,2,6, Gershovits M2, Markus B2, Sheikh M2, Gymrek M1,2,7,8,9, Bhatia G10,11, MacArthur DG7,9,10, Price AL10,11,12, Erlich Y13,2,3,14,15.

Author information

1. New York Genome Center, New York, NY 10013, USA.
2. Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA.
3. MyHeritage, Or Yehuda 6037606, Israel.
4. Computer Science Department, Technion-Israel Institute of Technology, Haifa 3200003, Israel.
5. Computer Science Department, Weizmann Institute of Science, Rehovot 7610001, Israel.
6. Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, USA.
7. Harvard Medical School, Boston, MA 02115, USA.
8. Harvard-MIT Program in Health Sciences and Technology, Cambridge, MA 02142, USA.
9. Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA.
10. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
11. Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA.
12. Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA.
13. New York Genome Center, New York, NY 10013, USA. erlichya@gmail.com.
14. Department of Computer Science, Fu Foundation School of Engineering, Columbia University, New York, NY, USA.
15. Center for Computational Biology and Bioinformatics, Department of Systems Biology, Columbia University, New York, NY, USA.

Abstract

Family trees have vast applications in fields as diverse as genetics, anthropology, and economics. However, the collection of extended family trees is tedious and usually relies on resources with limited geographical scope and complex data usage restrictions. We collected 86 million profiles from publicly available online data shared by genealogy enthusiasts. After extensive cleaning and validation, we obtained population-scale family trees, including a single pedigree of 13 million individuals. We leveraged the data to partition the genetic architecture of human longevity and to provide insights into the geographical dispersion of families. We also report a simple digital procedure to overlay other data sets with our resource.

PMID: 29496957
DOI: 10.1126/science.aam9309
[Indexed for MEDLINE]
