G3 (Bethesda). 2016 Nov 8;6(11):3485-3495. doi: 10.1534/g3.116.030411.

First Draft Assembly and Annotation of the Genome of a California Endemic Oak Quercus lobata Née (Fagaceae).

Author information

1. Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095. vlsork@ucla.edu
2. Institute of the Environment and Sustainability, University of California, Los Angeles, California 90095.
3. Institute of Genomics and Proteomics, University of California, Los Angeles, California 90095.
4. Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205.
5. Department of Evolution and Ecology, University of California, Davis, California 95616.
6. Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California 90095.
7. University of Maryland Center for Environmental Science, Appalachian Laboratory, Frostburg, Maryland 21532.
8. Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218.
9. Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, California 90095.
10. Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218.
11. Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland 21218.

Abstract

Oak represents a valuable natural resource across Northern Hemisphere ecosystems, attracting a large research community studying its genetics, ecology, conservation, and management. Here we introduce a draft genome assembly of valley oak (Quercus lobata) using Illumina sequencing of adult leaf tissue of a tree found in an accessible, well-studied, natural southern California population. Our assembly includes a nuclear genome and a complete chloroplast genome, along with annotation of encoded genes. The assembly contains 94,394 scaffolds totaling 1.17 Gb, of which 18,512 scaffolds are 2 kb or longer, with a total length of 1.15 Gb and an N50 scaffold size of 278,077 kb. The k-mer histograms indicate a diploid genome size of ∼720-730 Mb, which is smaller than the total assembly length because of high heterozygosity, estimated at 1.25%. A comparison with a recently published European oak (Q. robur) nuclear sequence indicates 93% similarity. The Q. lobata chloroplast genome has 99% identity with that of another North American oak, Q. rubra. Preliminary annotation yielded an estimate of 61,773 predicted protein-coding genes, of which 71% had similarity to known protein domains. We searched for 956 Benchmarking Universal Single-Copy Orthologs and found 863 complete orthologs, of which 450 were present in >1 copy. We also examined an earlier version (v0.5) in which duplicate haplotypes were removed to discover variants. These additional sources indicate that the predicted gene count in Version 1.0 is overestimated by 37-52%. Nonetheless, this first draft valley oak genome assembly represents a high-quality, well-annotated genome that provides a tool for forest restoration and management practices.

KEYWORDS:

GenPred; Genomic Selection; Quercus; Shared Data Resources; adaptation; annotation; chloroplast; nuclear genome assembly
