Nat Genet. 2019 Jan;51(1):30-35. doi: 10.1038/s41588-018-0273-y. Epub 2018 Nov 19.

Assembly of a pan-genome from deep sequencing of 910 humans of African descent.

Author information

1. Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA. rsherman@jhu.edu.
2. Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA. rsherman@jhu.edu.
3. Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA.
4. Departments of Computer Science, Biology, and Mathematics, Harvey Mudd College, Claremont, CA, USA.
5. Department of Medicine, University of Colorado Denver, Aurora, CO, USA.
6. Department of Medicine, Johns Hopkins University, Baltimore, MD, USA.
7. Department of Internal Medicine, Section on Pulmonary, Critical Care, Allergy and Immunologic Diseases, Center for Precision Medicine, Wake Forest School of Medicine, Winston-Salem, NC, USA.
8. Department of Public Health Sciences, Henry Ford Health System, Detroit, MI, USA.
9. Department of Medicine, University of California, San Francisco, San Francisco, CA, USA.
10. Department of Parasitology, Leiden University Medical Center, Leiden, The Netherlands.
11. Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, MS, USA.
12. Institute for Immunological Research, Universidad de Cartagena, Cartagena, Colombia.
13. Department of Internal Medicine, Henry Ford Health System, Detroit, MI, USA.
14. Faculty of Medical Sciences Cave Hill Campus, The University of the West Indies, Bridgetown, Barbados.
15. Department of Medicine, Vanderbilt University, Nashville, TN, USA.
16. Department of Medicine and Center for Global Health, University of Chicago, Chicago, IL, USA.
17. Department of Medicine, University of Chicago, Chicago, IL, USA.
18. Laboratório de Patologia Experimental, Centro de Pesquisas Gonçalo Moniz, Salvador, Brazil.
19. Department of Human Genetics, University of Chicago, Chicago, IL, USA.
20. Department of Medicine, University of Arizona College of Medicine, Tucson, AZ, USA.
21. Centro de Neumologia y Alergias, San Pedro Sula, Honduras.
22. Caribbean Institute for Health Research, The University of the West Indies, Kingston, Jamaica.
23. Pulmonary and Critical Care Medicine, Morehouse School of Medicine, Atlanta, GA, USA.
24. Department of Medicine, Einstein Medical Center, Philadelphia, PA, USA.
25. National Human Genome Center, Howard University College of Medicine, Washington, DC, USA.
26. Department of Microbiology, Howard University College of Medicine, Washington, DC, USA.
27. Departments of Bioengineering & Therapeutic Sciences and Medicine, University of California, San Francisco, San Francisco, CA, USA.
28. Immunology Service, Universidade Federal da Bahia, Salvador, Brazil.
29. Facultad de Ciencias Médicas, Universidad Tecnológica Centroamericana (UNITEC), Tegucigalpa, Honduras.
30. Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA.
31. Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA.
32. Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.
33. Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA. salzberg@jhu.edu.
34. Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA. salzberg@jhu.edu.
35. Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, USA. salzberg@jhu.edu.
36. Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA. salzberg@jhu.edu.

Abstract

We used a deeply sequenced dataset of 910 individuals, all of African descent, to construct a set of DNA sequences that is present in these individuals but missing from the reference human genome. We aligned 1.19 trillion reads from the 910 individuals to the reference genome (GRCh38), collected all reads that failed to align, and assembled these reads into contiguous sequences (contigs). We then compared all contigs to one another to identify a set of unique sequences representing regions of the African pan-genome missing from the reference genome. Our analysis revealed 296,485,284 bp in 125,715 distinct contigs present in the populations of African descent, demonstrating that the African pan-genome contains ~10% more DNA than the current human reference genome. Although the functional significance of nearly all of this sequence is unknown, 387 of the novel contigs fall within 315 distinct protein-coding genes, and the rest appear to be intergenic.
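
The "~10% more DNA" figure can be checked with a little arithmetic. The sketch below is illustrative only: the novel-sequence totals come from the abstract, while the reference length (roughly 3.1 billion bp for GRCh38) is an assumption that is not stated here, and the mean contig length is a derived quantity rather than a reported result.

```python
# Figures reported in the abstract.
NOVEL_BP = 296_485_284      # total novel sequence, in base pairs
NOVEL_CONTIGS = 125_715     # number of distinct novel contigs

# Assumption (not from the abstract): approximate total length of GRCh38, in bp.
ASSUMED_REFERENCE_BP = 3_100_000_000


def novel_fraction(novel_bp: int, reference_bp: int) -> float:
    """Novel sequence as a fraction of the reference length."""
    return novel_bp / reference_bp


def mean_contig_length(total_bp: int, n_contigs: int) -> float:
    """Average contig length in base pairs."""
    return total_bp / n_contigs


fraction = novel_fraction(NOVEL_BP, ASSUMED_REFERENCE_BP)
mean_len = mean_contig_length(NOVEL_BP, NOVEL_CONTIGS)

print(f"Novel sequence relative to assumed reference: {fraction:.1%}")
print(f"Mean novel contig length: {mean_len:.0f} bp")
```

Under the assumed reference length, the ratio lands just under one tenth, consistent with the rounded "~10%" in the abstract.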

PMID: 30455414
PMCID: PMC6309586
DOI: 10.1038/s41588-018-0273-y
[Indexed for MEDLINE]
Free PMC Article
