BMC Genomics. 2017 Jul 19;18(1):541. doi: 10.1186/s12864-017-3927-8.

Hybrid assembly with long and short reads improves discovery of gene family expansions.

Author information

1. J. Craig Venter Institute, 9714 Medical Center Drive, Rockville, MD, 20850, USA. jmiller@jcvi.org.
2. Department of Plant Biology, University of Minnesota, Saint Paul, MN, USA.
3. National Center for Genome Resources, Santa Fe, NM, USA.
4. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA.
5. Stanford School of Medicine, Stanford, CA, USA.
6. National Human Genome Research Institute, Bethesda, MD, USA.
7. Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, USA.
8. Department of Plant Pathology, University of Minnesota, St. Paul, MN, USA.
9. Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.
10. School of Integrative Plant Sciences, Plant Breeding and Genetics section, Cornell University, Ithaca, NY, 14850, USA.
11. Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD, USA.
12. Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN, USA.

Abstract

BACKGROUND:

Long-read and short-read sequencing technologies offer competing advantages for eukaryotic genome sequencing projects. Combinations of both may be appropriate for surveys of within-species genomic variation.

METHODS:

We developed a hybrid assembly pipeline called "Alpaca" that can operate on 20X long-read coverage plus about 50X short-insert and 50X long-insert short-read coverage. To preclude collapse of tandem repeats, Alpaca relies on base-call-corrected long reads for contig formation.
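
As a rough illustration of what these coverage figures mean in sequencing terms, mean coverage is total sequenced bases divided by genome size. The sketch below is not part of Alpaca; the function names, the 500 Mb genome size, and the read lengths are hypothetical values chosen only to show the arithmetic.

```python
import math


def bases_needed(genome_size_bp: int, coverage: float) -> int:
    """Total sequenced bases required to reach a mean coverage depth."""
    return round(genome_size_bp * coverage)


def reads_needed(genome_size_bp: int, coverage: float, mean_read_len_bp: int) -> int:
    """Approximate read count for a mean coverage depth at a given mean read length."""
    return math.ceil(genome_size_bp * coverage / mean_read_len_bp)


if __name__ == "__main__":
    genome = 500_000_000  # hypothetical 500 Mb genome, not a figure from the paper

    # 20X long-read coverage, assuming a hypothetical 10 kb mean read length
    print(bases_needed(genome, 20), reads_needed(genome, 20, 10_000))

    # 50X short-read coverage per library, assuming hypothetical 100 bp reads
    print(bases_needed(genome, 50), reads_needed(genome, 50, 100))
```

Under these assumed values, the long-read requirement is a small fraction of the total bases sequenced, which is the economy that a hybrid design aims for.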

RESULTS:

Compared with two other assembly protocols, Alpaca showed the highest agreement with the reference and the most repeat capture on the rice genome. On three accessions of the model legume Medicago truncatula, Alpaca produced the highest agreement with a conspecific reference and predicted tandemly repeated genes that were absent from the other assemblies.

CONCLUSION:

Our results suggest Alpaca is a useful tool for investigating structural and copy number variation within de novo assemblies of sampled populations.

KEYWORDS:

Genome assembly; Hybrid assembly pipeline; Medicago truncatula; Tandem repeats

PMID:
28724409
PMCID:
PMC5518131
DOI:
10.1186/s12864-017-3927-8
[Indexed for MEDLINE]
Free PMC Article
