Gigascience. 2017 Jan 1;6(1):1-4. doi: 10.1093/gigascience/giw016.

An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing.

Author information

1 Institute for Physical Sciences and Technology, University of Maryland, College Park, MD.
2 Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD.
3 Department of Evolution and Ecology, University of California, Davis, CA.
4 Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT.
5 Department of Plant Sciences, University of California, Davis, CA.
6 Departments of Biomedical Engineering, Computer Science, and Biostatistics, Johns Hopkins University, Baltimore, MD.

Abstract

The 22-gigabase genome of loblolly pine (Pinus taeda) is one of the largest ever sequenced. The draft assembly published in 2014 was built entirely from short Illumina reads, with lengths ranging from 100 to 250 base pairs (bp). That assembly was highly fragmented, containing over 11 million contigs whose weighted average (N50) size was 8206 bp. To improve this result, we generated approximately 12-fold coverage in long reads using the Single Molecule Real Time sequencing technology developed at Pacific Biosciences. We assembled the long and short reads together using the MaSuRCA mega-reads assembly algorithm, which produced a substantially better assembly, P. taeda version 2.0. The new assembly has an N50 contig size of 25 361 bp, more than three times that of the original assembly, and an N50 scaffold size of 107 821 bp, 61% larger than that of the previous assembly.
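
For readers unfamiliar with the N50 statistic quoted above, the following is a minimal Python sketch of one common definition: the length L such that contigs of length L or longer contain at least half of all assembled bases. The function name `n50` and the toy contig lengths are illustrative assumptions; they are not taken from the paper or from MaSuRCA, and assemblers may differ in details such as whether total assembly size or estimated genome size is used as the denominator.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Assumed definition: the largest length L such that contigs of
    length >= L together cover at least half of the total bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer form of running >= total / 2
            return length


# Toy example with made-up contig lengths (not data from the paper).
toy_contigs = [2, 3, 4, 5, 6]
print(n50(toy_contigs))
```

Because long contigs are weighted by the number of bases they contain, a handful of long contigs can raise N50 substantially even when millions of short contigs remain, which is why the statistic is often described as a weighted average of contig size.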

KEYWORDS:

Conifers; Genome assembly; Genomics; Next-gen sequencing; Pine genomes

PMID: 28369353
PMCID: PMC5437942
DOI: 10.1093/gigascience/giw016
[Indexed for MEDLINE]
Free PMC Article