BMC Genomics. 2017 Mar 27;18(1):261. doi: 10.1186/s12864-017-3654-1.

Exploring structural variation and gene family architecture with De Novo assemblies of 15 Medicago genomes.

Author information

1
Department of Plant Pathology, University of Minnesota, St. Paul, MN, USA.
2
Supercomputing Institute for Advanced Computational Research, University of Minnesota, Minneapolis, MN, USA.
3
National Center for Genome Resources, Santa Fe, NM, USA.
4
Department of Plant Biology, University of Minnesota, St. Paul, MN, USA.
5
Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, USA.
6
Science and Mathematics Faculty, Arizona State University, Mesa, AZ, USA.
7
J. Craig Venter Institute, Rockville, MD, USA.
8
Department of Plant Pathology, University of Minnesota, St. Paul, MN, USA. neviny@umn.edu.
9
Department of Plant Biology, University of Minnesota, St. Paul, MN, USA. neviny@umn.edu.

Abstract

BACKGROUND:

Previous studies exploring sequence variation in the model legume, Medicago truncatula, relied on mapping short reads to a single reference. However, read-mapping approaches are inadequate to examine large, diverse gene families or to probe variation in repeat-rich or highly divergent genome regions. De novo sequencing and assembly of M. truncatula genomes enables near-comprehensive discovery of structural variants (SVs), analysis of rapidly evolving gene families, and ultimately, construction of a pan-genome.

RESULTS:

Genome-wide synteny based on 15 de novo M. truncatula assemblies effectively detected different types of SVs, indicating that as much as 22% of the genome is involved in large structural changes, altogether affecting 28% of gene models. A total of 63 million base pairs (Mbp) of novel sequence was discovered, expanding the reference genome space for Medicago by 16%. Pan-genome analysis revealed that 42% (180 Mbp) of genomic sequence is missing in one or more accessions, while examination of de novo annotated genes identified 67% (50,700) of all ortholog groups as dispensable, estimates comparable to recent studies in rice, maize and soybean. Rapidly evolving gene families typically associated with biotic interactions and stress response were found to be enriched in the accession-specific gene pool. The nucleotide-binding site leucine-rich repeat (NBS-LRR) family, in particular, harbors the highest levels of nucleotide diversity, large-effect single-nucleotide changes, protein diversity, and presence/absence variation. However, the leucine-rich repeat (LRR) and heat shock gene families are disproportionately affected by large-effect single-nucleotide changes and even higher levels of copy number variation.

CONCLUSIONS:

Analysis of multiple M. truncatula genomes illustrates the value of de novo assemblies for discovering and describing structural variation, which is often underestimated when using read-mapping approaches. Comparisons among the de novo assemblies also indicate that large gene families differ in the architecture of their structural variation.

PMID:
28347275
PMCID:
PMC5369179
DOI:
10.1186/s12864-017-3654-1
[Indexed for MEDLINE]
Free PMC Article
