Nat Commun. 2016 Jun 30;7:12065. doi: 10.1038/ncomms12065.

Long-read sequencing and de novo assembly of a Chinese genome.

Author information

1. Guangdong-Hongkong-Macau Institute of CNS Regeneration, Jinan University, Guangzhou 510632, China.
2. Ministry of Education Joint International Research Laboratory of CNS Regeneration, Jinan University, Guangzhou 510632, China.
3. Co-innovation Center of Neuroregeneration, Nantong University, Nantong 226001, China.
4. Zilkha Neurogenetic Institute, University of Southern California, Los Angeles, California 90089, USA.
5. Department of Genome Sciences, Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA.
6. Genetic, Molecular, and Cellular Biology Program, Keck School of Medicine, University of Southern California, Los Angeles, California 90089, USA.
7. Wuhan Institute of Biotechnology, Wuhan 430000, China.
8. Department of Pediatrics, The Ohio State University, and The Research Institute at Nationwide Children's Hospital, Columbus, Ohio 43205, USA.
9. Nextomics Biosciences, Wuhan 430000, China.
10. School of Chemical Engineering and Pharmacy, Wuhan Institute of Technology, Wuhan 430000, China.
11. Center for Tissue Engineering and Regenerative Medicine, Union Hospital, Huazhong University of Science and Technology, Wuhan 430022, China.
12. Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, New York, New York 11797, USA.
13. USDA/ARS Children's Nutrition Research Center, Department of Pediatrics, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA.
14. Departments of Systems Biology and Biomedical Informatics, Columbia University, New York, New York 10032, USA.
15. Department of Psychiatry & Behavioral Sciences, Keck School of Medicine, University of Southern California, Los Angeles, California 90033, USA.
16. National Center for Biotechnology Information, U.S. National Library of Medicine, Bethesda, Maryland 20894, USA.
17. Department of Ophthalmology, The University of Hong Kong, Hong Kong, China.
18. State Key Laboratory of Brain and Cognitive Sciences, The University of Hong Kong, Hong Kong, China.

Abstract

Short-read sequencing has enabled the de novo assembly of several individual human genomes, but with inherent limitations in characterizing repeat elements. Here we sequence a Chinese individual HX1 by single-molecule real-time (SMRT) long-read sequencing, construct a physical map by NanoChannel arrays and generate a de novo assembly of 2.93 Gb (contig N50: 8.3 Mb, scaffold N50: 22.0 Mb, including 39.3 Mb N-bases), together with 206 Mb of alternative haplotypes. The assembly fully or partially fills 274 (28.4%) N-gaps in the reference genome GRCh38. Comparison to GRCh38 reveals 12.8 Mb of HX1-specific sequences, including 4.1 Mb that are not present in previously reported Asian genomes. Furthermore, long-read sequencing of the transcriptome reveals novel spliced genes that are not annotated in GENCODE and are missed by short-read RNA-Seq. Our results imply that improved characterization of genome functional variation may require the use of a range of genomic technologies on diverse human populations.
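
The contig and scaffold N50 values quoted in the abstract are summary statistics of assembly contiguity: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled length. The following is a minimal sketch of that standard definition, not the authors' pipeline; the function name `n50` and the toy lengths are illustrative assumptions and are not HX1 data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Standard definition: sort lengths from longest to shortest and
    return the length at which the running total first reaches at
    least half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # total is 300; 80 + 70 reaches 150, so this prints 70
```

Applied to real contig or scaffold lengths in base pairs, the same calculation would yield figures of the kind reported above (8.3 Mb and 22.0 Mb), although published values can differ slightly depending on how gaps and alternative haplotypes are counted.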

PMID: 27356984
PMCID: PMC4931320
DOI: 10.1038/ncomms12065
[Indexed for MEDLINE]
Free PMC Article
