Proteomics. 2015 Oct;15(20):3553-65. doi: 10.1002/pmic.201500074. Epub 2015 Jul 24.

Metaproteomic analysis using the Galaxy framework.

Author information

Center for Mass Spectrometry and Proteomics, University of Minnesota, Minneapolis, MN, USA.
Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA.
Hamline University, St. Paul, MN, USA.
Carleton College, Northfield, MN, USA.
Minnesota Supercomputing Institute, Minneapolis, MN, USA.
School of Dentistry, University of Minnesota, Minneapolis, MN, USA.


Metaproteomics characterizes proteins expressed by microorganism communities (microbiome) present in environmental samples or a host organism (e.g. human), revealing insights into the molecular functions conferred by these communities. Compared to conventional proteomics, metaproteomics presents unique data analysis challenges, including the use of large protein databases derived from hundreds or thousands of organisms, as well as numerous processing steps to ensure high data quality. These challenges limit the use of metaproteomics for many researchers. In response, we have developed an accessible and flexible metaproteomics workflow within the Galaxy bioinformatics framework. Via analysis of human oral tissue exudate samples, we have established a modular Galaxy-based workflow that automates a reduction method for searching large sequence databases, enabling comprehensive identification of host proteins (human) as well as "meta-proteins" from the nonhost organisms. Downstream, automated processing steps enable basic local alignment search tool analysis and evaluation/visualization of peptide sequence match quality, maximizing confidence in results. Outputted results are compatible with tools for taxonomic and functional characterization (e.g. Unipept, MEGAN5). Galaxy also allows for the sharing of complete workflows with others, promoting reproducibility and also providing a template for further modification and enhancement. Our results provide a blueprint for establishing Galaxy as a solution for metaproteomic data analysis. All MS data have been deposited in the ProteomeXchange with identifier PXD001655 (
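The abstract mentions a reduction method for searching large sequence databases but does not spell out its steps. The sketch below illustrates one common form of database reduction, as an assumption rather than a description of the authors' workflow: a first-pass search is run against the full database, and only the proteins that received at least one match are kept for a second, more stringent search. The function names (`parse_fasta`, `reduce_database`, `to_fasta`) and the idea of passing first-pass hits as a set of accessions are hypothetical and chosen for illustration; they are not Galaxy tools or part of any search engine's API.

```python
from typing import Dict, Set


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {accession: sequence}.

    The accession is taken as the first whitespace-delimited token
    after '>' (an assumption; real headers vary by database).
    """
    records: Dict[str, str] = {}
    accession = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            accession = line[1:].split()[0]
            records[accession] = ""
        elif accession is not None:
            # Sequence lines may be wrapped; join them.
            records[accession] += line
    return records


def reduce_database(database: Dict[str, str],
                    first_pass_hits: Set[str]) -> Dict[str, str]:
    """Keep only proteins matched in the first-pass search.

    `first_pass_hits` is a hypothetical set of accessions reported by
    whatever search engine ran the first pass. Accessions that are not
    in the database are ignored.
    """
    return {acc: seq for acc, seq in database.items()
            if acc in first_pass_hits}


def to_fasta(records: Dict[str, str]) -> str:
    """Serialize {accession: sequence} back to FASTA text."""
    return "".join(f">{acc}\n{seq}\n" for acc, seq in records.items())
```

In a setup like this, the reduced FASTA would be handed to the second search, typically together with the host (human) sequences, so that host proteins and non-host "meta-proteins" are identified against a much smaller search space.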


Keywords: Bioinformatics; Customized database generation; Mass spectrometry; Metaproteomics; Peptide sequence match; Sequence database search

[Indexed for MEDLINE]
