Front Plant Sci. 2017 Feb 14;8:184. doi: 10.3389/fpls.2017.00184. eCollection 2017.

Analysis of Plant Pan-Genomes and Transcriptomes with GET_HOMOLOGUES-EST, a Clustering Solution for Sequences of the Same Species.

Author information

1. Estación Experimental de Aula Dei - Consejo Superior de Investigaciones Científicas, Zaragoza, Spain; Fundación ARAID, Zaragoza, Spain.
2. Estación Experimental de Aula Dei - Consejo Superior de Investigaciones Científicas, Zaragoza, Spain.
3. DOE Joint Genome Institute, Walnut Creek, CA, USA.
4. Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Cuernavaca, Mexico.

Abstract

The pan-genome of a species is defined as the union of all the genes and non-coding sequences found in all its individuals. However, constructing a pan-genome for plants with large genomes is daunting both in sequencing cost and the scale of the required computational analysis. A more affordable alternative is to focus on the genic repertoire by using transcriptomic data. Here, the software GET_HOMOLOGUES-EST was benchmarked with genomic and RNA-seq data of 19 Arabidopsis thaliana ecotypes and then applied to the analysis of transcripts from 16 Hordeum vulgare genotypes. The goal was to sample their pan-genomes and classify sequences as core, if detected in all accessions, or accessory, when absent in some of them. The resulting sequence clusters were used to simulate pan-genome growth, and to compile Average Nucleotide Identity matrices that summarize intra-species variation. Although transcripts were found to under-estimate pan-genome size by at least 10%, we concluded that clusters of expressed sequences can recapitulate phylogeny and reproduce two properties observed in A. thaliana gene models: accessory loci show lower expression and higher non-synonymous substitution rates than core genes. Finally, accessory sequences were observed to preferentially encode transposon components in both species, plus disease resistance genes in cultivated barleys, and a variety of protein domains from other families that appear frequently associated with presence/absence variation in the literature. These results demonstrate that pan-genome analyses are useful to explore germplasm diversity.

KEYWORDS:

Arabidopsis thaliana; RNA-seq; accessory genome; barley; comparative genomics; core-genome; pan-genome
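
The core/accessory classification and the pan-genome growth simulation described in the abstract can be illustrated with a minimal Python sketch. It is not the GET_HOMOLOGUES-EST implementation. The function names, the toy cluster table, and the accession labels are hypothetical, and the sketch assumes each sequence cluster is already available as the set of accessions in which it was detected.

```python
import random


def classify_clusters(clusters, accessions):
    """Split clusters into core (detected in every accession) and accessory.

    `clusters` maps a cluster name to the set of accessions it occurs in.
    This input layout is an assumption made for illustration only.
    """
    all_accessions = set(accessions)
    core, accessory = [], []
    for name, members in clusters.items():
        if set(members) >= all_accessions:
            core.append(name)
        else:
            accessory.append(name)
    return core, accessory


def pan_genome_growth(clusters, accessions, n_permutations=10, seed=0):
    """Count cumulative clusters as accessions are added in random order.

    Returns one growth curve per permutation; each curve lists the number
    of distinct clusters seen after adding 1, 2, ... accessions.
    """
    rng = random.Random(seed)
    curves = []
    for _ in range(n_permutations):
        order = list(accessions)
        rng.shuffle(order)
        seen = set()
        curve = []
        for accession in order:
            seen.update(
                name for name, members in clusters.items() if accession in members
            )
            curve.append(len(seen))
        curves.append(curve)
    return curves


# Hypothetical toy data: four clusters across three accessions.
toy_accessions = ["A", "B", "C"]
toy_clusters = {
    "cluster_1": {"A", "B", "C"},
    "cluster_2": {"A", "B"},
    "cluster_3": {"C"},
    "cluster_4": {"A", "B", "C"},
}

core, accessory = classify_clusters(toy_clusters, toy_accessions)
curves = pan_genome_growth(toy_clusters, toy_accessions, n_permutations=5)
print("core:", core)
print("accessory:", accessory)
print("growth curves:", curves)
```

In this toy table, `cluster_1` and `cluster_4` are core and the other two are accessory. Averaging the curves over many permutations gives a growth estimate that does not depend on the order in which accessions are sampled.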
