Nat Commun. 2019 Mar 4;10(1):1025. doi: 10.1038/s41467-019-08992-7.

Genome maps across 26 human populations reveal population-specific patterns of structural variation.

Author information

1. Cardiovascular Research Institute, University of California-San Francisco, San Francisco, CA, 94143, USA.
2. School of Biomedical Engineering, Drexel University, Philadelphia, PA, 19104, USA.
3. Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China.
4. School of Life Sciences and State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Hong Kong SAR, China.
5. Bionano Genomics, San Diego, CA, 92121, USA.
6. Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Hong Kong SAR, China.
7. Institute of Molecular Medicine and Infectious Disease in the School of Medicine, Drexel University, Philadelphia, PA, 19104, USA.
8. Cardiovascular Research Institute, University of California-San Francisco, San Francisco, CA, 94143, USA. pui.kwok@ucsf.edu.
9. Department of Dermatology, University of California-San Francisco, San Francisco, CA, 94143, USA. pui.kwok@ucsf.edu.
10. Institute for Human Genetics, University of California-San Francisco, San Francisco, CA, 94143, USA. pui.kwok@ucsf.edu.

Abstract

Large structural variants (SVs) in the human genome are difficult to detect and study by conventional sequencing technologies. With long-range genome analysis platforms, such as optical mapping, one can identify large SVs (>2 kb) across the genome in one experiment. Analyzing optical genome maps of 154 individuals from the 26 populations sequenced in the 1000 Genomes Project, we find that phylogenetic population patterns of large SVs are similar to those of single nucleotide variations in 86% of the human genome, while ~2% of the genome has high structural complexity. We are able to characterize SVs in many intractable regions of the genome, including segmental duplications and subtelomeric, pericentromeric, and acrocentric areas. In addition, we discover ~60 Mb of non-redundant genome content missing in the reference genome sequence assembly. Our results highlight the need for a comprehensive set of alternate haplotypes from different populations to represent SV patterns in the genome.

PMID: 30833565
PMCID: PMC6399254
DOI: 10.1038/s41467-019-08992-7
[Indexed for MEDLINE]
Free PMC Article
