MzJava: An open source library for mass spectrometry data processing

J Proteomics. 2015 Nov 3:129:63-70. doi: 10.1016/j.jprot.2015.06.013. Epub 2015 Jun 30.

Abstract

Mass spectrometry (MS) is a widely used and evolving technique for the high-throughput identification of molecules in biological samples. The need for sharing and reuse of code among bioinformaticians working with MS data prompted the design and implementation of MzJava, an open-source Java Application Programming Interface (API) for MS-related data processing. MzJava provides data structures and algorithms for representing and processing mass spectra and their associated biological molecules, such as metabolites, glycans and peptides. MzJava includes functionality to perform mass calculation, peak processing (e.g. centroiding, filtering, transforming), spectrum alignment and clustering, protein digestion, and fragmentation of peptides and glycans, as well as scoring functions for spectrum-spectrum and peptide/glycan-spectrum matches. For data import and export, MzJava implements readers and writers for commonly used data formats. Many classes also support the Hadoop MapReduce (hadoop.apache.org) and Apache Spark (spark.apache.org) frameworks for cluster computing. The library has been developed applying best practices of software engineering. To ensure that MzJava contains code that is correct and easy to use, the library's API was carefully designed and thoroughly tested. MzJava is an open-source project distributed under the AGPL v3.0 licence. MzJava requires Java 1.7 or higher. Binaries, source code and documentation can be downloaded from http://mzjava.expasy.org and https://bitbucket.org/sib-pig/mzjava. This article is part of a Special Issue entitled: Computational Proteomics.
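
To make two of the operations named in the abstract concrete (protein digestion and mass calculation), the following is a minimal, self-contained Java sketch. It does not use or reproduce the MzJava API: the class name PeptideMassSketch and its methods are hypothetical and exist only for this illustration. It assumes unmodified peptides made of the 20 standard amino acids, a simplified trypsin rule (cleave after K or R unless the next residue is P) and no missed cleavages. The residue masses are standard monoisotopic values rounded to five decimal places.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: NOT part of MzJava. All names here are hypothetical.
final class PeptideMassSketch {
    static final double WATER = 18.010565;  // monoisotopic mass of H2O (Da)
    static final double PROTON = 1.007276;  // mass of a proton (Da)

    // Monoisotopic residue masses (Da) of the 20 standard amino acids.
    static final Map<Character, Double> RESIDUE = new HashMap<>();
    static {
        String codes = "GASPVTCLINDQKEMHFRYW";
        double[] masses = {
            57.02146, 71.03711, 87.03203, 97.05276, 99.06841,
            101.04768, 103.00919, 113.08406, 113.08406, 114.04293,
            115.02694, 128.05858, 128.09496, 129.04259, 131.04049,
            137.05891, 147.06841, 156.10111, 163.06333, 186.07931
        };
        for (int i = 0; i < codes.length(); i++) {
            RESIDUE.put(codes.charAt(i), masses[i]);
        }
    }

    // Neutral monoisotopic mass: sum of residue masses plus one water.
    static double monoisotopicMass(String peptide) {
        double sum = WATER;
        for (char c : peptide.toCharArray()) {
            Double residue = RESIDUE.get(c);
            if (residue == null) {
                throw new IllegalArgumentException("Unknown residue: " + c);
            }
            sum += residue;
        }
        return sum;
    }

    // m/z of the peptide carrying `charge` protons (charge must be >= 1).
    static double mz(String peptide, int charge) {
        if (charge < 1) {
            throw new IllegalArgumentException("charge must be >= 1");
        }
        return (monoisotopicMass(peptide) + charge * PROTON) / charge;
    }

    // Simplified trypsin: cleave after K or R, except when followed by P.
    static List<String> digestTrypsin(String protein) {
        List<String> peptides = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < protein.length(); i++) {
            char c = protein.charAt(i);
            boolean site = c == 'K' || c == 'R';
            boolean blocked = i + 1 < protein.length() && protein.charAt(i + 1) == 'P';
            if (site && !blocked) {
                peptides.add(protein.substring(start, i + 1));
                start = i + 1;
            }
        }
        if (start < protein.length()) {
            peptides.add(protein.substring(start)); // C-terminal remainder
        }
        return peptides;
    }

    public static void main(String[] args) {
        // The input sequence is an arbitrary example, not taken from the article.
        for (String p : digestTrypsin("MKWVTFISLLRPAGK")) {
            System.out.printf("%s  M=%.4f  [M+2H]2+=%.4f%n",
                    p, monoisotopicMass(p), mz(p, 2));
        }
    }
}
```

Running main digests the example sequence into two peptides, because the arginine followed by proline is not cleaved, and prints each peptide with its neutral mass and doubly charged m/z. A library such as the one described would additionally need to handle modifications, missed cleavages and other proteases, which this sketch leaves out.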

Keywords: Glycomics; Hadoop; Java; Mass spectrometry; Proteomics; Spark.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Database Management Systems
  • Databases, Protein*
  • Information Storage and Retrieval / methods*
  • Mass Spectrometry / methods*
  • Molecular Sequence Data
  • Peptide Mapping / methods
  • Programming Languages*
  • Proteins / chemistry*
  • Sequence Analysis, Protein / methods
  • User-Computer Interface*

Substances

  • Proteins