BMC Genomics. 2019 Jan 17;20(1):56. doi: 10.1186/s12864-019-5431-9.

Peptimapper: proteogenomics workflow for the expert annotation of eukaryotic genomes.

Author information

1. Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35042, Rennes cedex, France.
2. Protim, Univ Rennes, F-35042, Rennes cedex, France.
3. Sorbonne Université, UPMC, CNRS, UMR 8227, Integrative Biology of Marine Models, Biological Station, CS 90074, F-29688, Roscoff, France.
4. INRIA Grenoble-Rhône-Alpes, F-38330, Montbonnot-Saint-Martin, France.
5. University Grenoble Alpes, CEA, Inserm, BIG-BGE, 38000, Grenoble, France.
6. Present address: Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, F-75005, Paris, France.
7. University Rennes, Inria, CNRS, IRISA, F-35042, Rennes, France.
8. Univ Rennes, Inserm, EHESP, Irset (Institut de recherche en santé, environnement et travail) - UMR_S 1085, F-35042, Rennes cedex, France. charles.pineau@inserm.fr.
9. Protim, Univ Rennes, F-35042, Rennes cedex, France. charles.pineau@inserm.fr.

Abstract

BACKGROUND:

Accurate structural annotation of genomes remains a challenge, despite the progress made over the past decade. Gene structure prediction is still difficult, especially for eukaryotic species, and the resulting predictions are often erroneous or incomplete. We used a proteogenomics strategy, combining proteomics datasets with bioinformatics tools, to identify novel protein-coding genes and splice isoforms, assign correct start sites, and validate predicted exons and genes.

RESULTS:

Our proteogenomics workflow, Peptimapper, was applied to the genome annotation of Ectocarpus sp., a key reference genome for both the brown algal lineage and stramenopiles. We generated proteomics data from various life cycle stages of Ectocarpus sp. strains and from sub-cellular fractions using a shotgun approach. First, we generated peptide sequence tags (PSTs) directly from the proteomics data. Second, we mapped the PSTs onto the translated genomic sequence. Closely located hits (i.e., PST locations on the genome) were then clustered to detect potential coding regions, using parameters optimized for the organism. Third, we evaluated each cluster and compared it to gene predictions from existing conventional genome annotation approaches. Finally, we integrated the cluster locations into GFF files for display in a genome viewer. We identified two potential novel genes, a ribosomal protein L22 and an aryl sulfotransferase, and corrected the gene structure of a dihydrolipoamide acetyltransferase. We validated the results experimentally by RT-PCR and with transcriptomics data.
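The clustering and GFF export steps described above can be illustrated with a minimal Python sketch. It is not Peptimapper's implementation: the `Hit` and `Cluster` types, the `max_gap` and `min_hits` parameters and their default values, and the `pst_sketch` source label are all hypothetical names chosen for illustration. The sketch assumes hits are given as 1-based inclusive genomic coordinates and groups hits on the same sequence and strand whenever the gap to the previous hit does not exceed `max_gap`.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hit:
    """One PST location on the genome (hypothetical type, 1-based inclusive)."""
    seqid: str
    strand: str
    start: int
    end: int


@dataclass
class Cluster:
    """A group of closely located hits (hypothetical type)."""
    seqid: str
    strand: str
    start: int
    end: int
    n_hits: int


def cluster_hits(hits: List[Hit], max_gap: int = 1000, min_hits: int = 2) -> List[Cluster]:
    """Group hits on the same sequence and strand separated by at most max_gap bases.

    max_gap and min_hits are illustrative stand-ins for the
    organism-specific parameters mentioned in the abstract.
    """
    clusters: List[Cluster] = []
    for hit in sorted(hits, key=lambda h: (h.seqid, h.strand, h.start, h.end)):
        last = clusters[-1] if clusters else None
        if (
            last is not None
            and last.seqid == hit.seqid
            and last.strand == hit.strand
            and hit.start - last.end <= max_gap
        ):
            # Extend the current cluster to cover this hit.
            last.end = max(last.end, hit.end)
            last.n_hits += 1
        else:
            # Start a new cluster with this hit.
            clusters.append(Cluster(hit.seqid, hit.strand, hit.start, hit.end, 1))
    # Keep only clusters supported by enough hits.
    return [c for c in clusters if c.n_hits >= min_hits]


def to_gff3(cluster: Cluster, idx: int, source: str = "pst_sketch") -> str:
    """Format a cluster as one tab-separated, nine-column GFF3 feature line."""
    attrs = f"ID=cluster{idx};hits={cluster.n_hits}"
    return "\t".join([
        cluster.seqid, source, "region",
        str(cluster.start), str(cluster.end),
        ".", cluster.strand, ".", attrs,
    ])


if __name__ == "__main__":
    demo = [
        Hit("chr1", "+", 100, 130),
        Hit("chr1", "+", 400, 430),
        Hit("chr1", "+", 5000, 5030),  # isolated hit, filtered out by min_hits
        Hit("chr1", "-", 450, 480),
        Hit("chr1", "-", 600, 640),
    ]
    print("##gff-version 3")
    for i, c in enumerate(cluster_hits(demo), start=1):
        print(to_gff3(c, i))
```

A single-linkage gap threshold is only one plausible way to define "closely located"; the abstract does not state which clustering criterion or parameter values the workflow uses.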

CONCLUSIONS:

Peptimapper is a complementary tool for the expert annotation of genomes. It is suitable for any organism and is distributed as a Docker image available from two public bioinformatics Docker repositories: Docker Hub and BioShaDock. The workflow is also accessible to non-computer scientists through the Galaxy framework at https://galaxy.protim.eu. Data are available via ProteomeXchange under identifier PXD010618.

KEYWORDS:

Bioinformatics; Genome annotation; Peptide sequence tag; Proteogenomics; Proteomics; Tandem mass spectrometry

PMID: 30654742
PMCID: PMC6337836
DOI: 10.1186/s12864-019-5431-9
[Indexed for MEDLINE]
Free PMC Article
