BMC Genomics. 2018 Sep 4;19(1):651. doi: 10.1186/s12864-018-5040-z.

Linked read technology for assembling large complex and polyploid genomes.

Ott A1,2, Schnable JC3,4,5, Yeh CT1,4, Wu L6, Liu C6,7, Hu HC8,9,10, Dalgard CL8,9,11, Sarkar S6, Schnable PS12,13,14.

Author information

1. Department of Agronomy, Iowa State University, Ames, IA, 50011, USA.
2. Present address: Roche Sequencing Solutions, 500 S Rosa Road, Madison, WI, 53719, USA.
3. Department of Agriculture and Horticulture, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA.
4. Data2Bio LLC, 2079 Roy J Carver Co-Laboratory, 1111 WOI Rd, Ames, IA, 50011, USA.
5. Dryland Genetics LLC, 2073 Roy J Carver Co-Laboratory, 1111 WOI Rd, Ames, IA, 50011, USA.
6. Department of Mechanical Engineering, Iowa State University, Ames, IA, 50011, USA.
7. Present address: Department of Thermal Engineering, Tsinghua University, Beijing, 100084, China.
8. The American Genome Center, Uniformed Services University of the Health Sciences, Bethesda, MD, 20814, USA.
9. Collaborative Health Initiative Research Program (CHIRP), Uniformed Services University School of Medicine, Uniformed Services University of the Health Sciences, Bethesda, MD, 20814, USA.
10. Present address: Qiagen Sciences Inc, 6951 Executive Way, Frederick, MD, 21703, USA.
11. Department of Anatomy, Physiology and Genetics, Uniformed Services University School of Medicine, Uniformed Services University of the Health Sciences, Bethesda, MD, 20814, USA.
12. Department of Agronomy, Iowa State University, Ames, IA, 50011, USA. schnable@iastate.edu.
13. Data2Bio LLC, 2079 Roy J Carver Co-Laboratory, 1111 WOI Rd, Ames, IA, 50011, USA. schnable@iastate.edu.
14. Dryland Genetics LLC, 2073 Roy J Carver Co-Laboratory, 1111 WOI Rd, Ames, IA, 50011, USA. schnable@iastate.edu.

Abstract

BACKGROUND:

Short read DNA sequencing technologies have revolutionized genome assembly by providing high-accuracy, high-throughput data at low cost. However, assembling short read data remains challenging, particularly for large, complex and polyploid genomes. The linked read strategy has the potential to enhance the value of short reads for genome assembly because all reads originating from a single long molecule of DNA share a common barcode. To date, however, most studies that have employed linked reads have focused on human haplotype phasing and genome assembly.
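The barcode idea can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes reads have already been parsed into hypothetical (barcode, sequence) pairs, and the function name and toy data are invented for illustration. Grouping by barcode yields candidate sets of short reads that may derive from the same long molecule.

```python
from collections import defaultdict


def group_reads_by_barcode(reads):
    """Group short reads by barcode.

    `reads` is an iterable of (barcode, sequence) pairs. This input
    format is an assumption made for illustration, not a real file format.
    """
    groups = defaultdict(list)
    for barcode, sequence in reads:
        groups[barcode].append(sequence)
    return dict(groups)


# Toy, made-up reads: two barcodes, three reads.
toy_reads = [
    ("BC01", "ACGTAC"),
    ("BC02", "TTGACA"),
    ("BC01", "GTACGG"),
]
groups = group_reads_by_barcode(toy_reads)
```

An assembler could then use each group as long-range evidence that its reads lie near one another in the genome.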

RESULTS:

Here we describe a de novo maize B73 genome assembly generated via linked read technology that contains ~172,000 scaffolds with an N50 of 89 kb that cover 50% of the genome. Based on comparisons to the B73 reference genome, 91% of linked read contigs are accurately assembled. Because it was possible to identify errors with >76% accuracy using machine learning, it may be possible to identify and potentially correct systematic errors. Complex polyploids represent one of the last grand challenges in genome assembly. Linked read technology successfully resolved the two subgenomes of the recent allopolyploid proso millet (Panicum miliaceum). Our assembly covers ~83% of the 1 Gb genome and consists of 30,819 scaffolds with an N50 of 912 kb.
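For readers unfamiliar with the N50 statistic reported above, the following Python sketch shows the conventional definition: the length L such that scaffolds of length L or longer together contain at least half of the total assembled bases. The function name and the toy lengths are hypothetical, and the sketch computes N50 relative to the assembly size, which is an assumption about how the figures above were derived.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down, accumulating bases until
    # at least half of the total assembled length is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy, made-up scaffold lengths (arbitrary units); total is 54.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_lengths)  # 10 + 9 + 8 = 27, half of 54, so N50 is 8
```

Because N50 depends only on the length distribution, it describes contiguity and says nothing about correctness, which is why the abstract reports accuracy against the reference separately.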

CONCLUSIONS:

Our analysis provides a framework for future de novo genome assemblies using linked reads, and we suggest computational strategies that, if implemented, have the potential to further improve linked read assemblies, particularly for repetitive genomes.

KEYWORDS:

Genome assembly; Long molecule sequencing; Polyploid assembly
