Nat Biotechnol. 2018 Apr;36(4):338-345. doi: 10.1038/nbt.4060. Epub 2018 Jan 29.

Nanopore sequencing and assembly of a human genome with ultra-long reads.

Author information

1. UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California, USA.
2. Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, Maryland, USA.
3. Institute of Microbiology and Infection, University of Birmingham, Birmingham, UK.
4. Department of Human Genetics, University of Utah, Salt Lake City, Utah, USA.
5. USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, Utah, USA.
6. Michael Smith Laboratories and Djavad Mowafaghian Centre for Brain Health, University of British Columbia, Vancouver, Canada.
7. Surgical Research Laboratory, Institute of Cancer & Genomic Science, University of Birmingham, UK.
8. DeepSeq, School of Life Sciences, University of Nottingham, UK.
9. Norwich Medical School, University of East Anglia, Norwich, UK.
10. Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, USA.
11. Ontario Institute for Cancer Research, Toronto, Canada.
12. Department of Computer Science, University of Toronto, Toronto, Canada.

Abstract

We report the sequencing and assembly of a reference genome for the human GM12878 Utah/Ceph cell line using the MinION (Oxford Nanopore Technologies) nanopore sequencer. 91.2 Gb of sequence data, representing ∼30× theoretical coverage, were produced. Reference-based alignment enabled detection of large structural variants and epigenetic modifications. De novo assembly of nanopore reads alone yielded a contiguous assembly (NG50 ∼3 Mb). We developed a protocol to generate ultra-long reads (N50 > 100 kb, read lengths up to 882 kb). Incorporating an additional 5× coverage of these ultra-long reads more than doubled the assembly contiguity (NG50 ∼6.4 Mb). The final assembled genome was 2,867 million bases in size, covering 85.8% of the reference. Assembly accuracy, after incorporating complementary short-read sequencing data, exceeded 99.8%. Ultra-long reads enabled assembly and phasing of the 4-Mb major histocompatibility complex (MHC) locus in its entirety, measurement of telomere repeat length, and closure of gaps in the reference human genome assembly GRCh38.
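The contiguity and coverage figures quoted in the abstract (read N50, assembly NG50, theoretical coverage) follow standard definitions, which can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' pipeline: the function names are hypothetical, and the 3.1 Gb genome size used in the usage note is an assumed round figure for a haploid human genome, not a value taken from the abstract.

```python
from typing import Sequence


def nx50(lengths: Sequence[int], target_total: float) -> int:
    """Return the length L such that sequences of length >= L together
    cover at least half of target_total. Returns 0 if the sequences
    never reach that threshold."""
    half = target_total / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


def n50(lengths: Sequence[int]) -> int:
    """N50: the half-way point is measured against the total of the lengths themselves."""
    return nx50(lengths, sum(lengths))


def ng50(lengths: Sequence[int], genome_size: float) -> int:
    """NG50: the half-way point is measured against an (assumed) genome size."""
    return nx50(lengths, genome_size)


def theoretical_coverage(total_bases: float, genome_size: float) -> float:
    """Theoretical coverage: total sequenced bases divided by genome size."""
    return total_bases / genome_size
```

With an assumed genome size of 3.1 Gb, `theoretical_coverage(91.2e9, 3.1e9)` gives roughly 29.4, in line with the ∼30× stated in the abstract. NG50 differs from N50 only in the denominator, which makes it comparable across assemblies of different total size.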

PMID: 29431738
PMCID: PMC5889714
DOI: 10.1038/nbt.4060
[Indexed for MEDLINE]
Free PMC Article
