Mass Spectrom Rev. 2017 Sep;36(5):584-599. doi: 10.1002/mas.21483. Epub 2015 Dec 15.

Proteogenomics from a bioinformatics angle: A growing field.

Author information

1 Department of Mathematical Modeling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Lab of Bioinformatics and Computational Genomics, Ghent University, Ghent, Belgium.
2 Department of Biochemistry and Molecular Pharmacology, Center for Health Informatics and Bioinformatics, New York University School of Medicine, New York, NY.

Abstract

Proteogenomics is a research area that combines proteomics and genomics in a multi-omics setup, using both mass spectrometry and high-throughput sequencing technologies. Currently, the main goals of the field are to aid genome annotation and to unravel proteome complexity. Mass spectrometry-based identification of matching or homologous peptides can further refine gene models. The identification of novel proteoforms is also made possible by analyzing high-throughput sequencing data from RNA-seq or ribosome profiling experiments to detect novel translation initiation sites (cognate or near-cognate), novel transcript isoforms, sequence variation, or novel (small) open reading frames in intergenic or untranslated genic regions. Other proteogenomics studies that combine proteomics and genomics techniques focus on antibody sequencing or on the identification of immunogenic peptides or venom peptides. Over the years, a growing number of bioinformatics tools and databases have become available to help streamline these cross-omics studies. Some of these solutions help only in specific steps of a proteogenomics study, e.g., building custom sequence databases (based on next-generation sequencing output) for mass spectrometry fragmentation spectrum matching. Over the last few years, a handful of integrative tools that can execute complete proteogenomics analyses have also become available. Some of these are presented as stand-alone solutions, whereas others are implemented in a web-based framework such as Galaxy. In this review, we aim to give a comprehensive overview of the bioinformatics solutions available for this growing research area.

KEYWORDS:

bioinformatics; gene annotation; mass spectrometry; next-generation sequencing; proteoform; proteogenomics; ribosome profiling
