Logo of bioinfoLink to Publisher's site
Bioinformatics. Oct 15, 2010; 26(20): 2617–2619.
Published online Aug 25, 2010. doi:  10.1093/bioinformatics/btq475
PMCID: PMC2951089

BioRuby: bioinformatics software for the Ruby programming language

Abstract

Summary: The BioRuby software toolkit contains a comprehensive set of free development tools and libraries for bioinformatics and molecular biology, written in the Ruby programming language. BioRuby has components for sequence analysis, pathway analysis, protein modelling and phylogenetic analysis; it supports many widely used data formats and provides easy access to databases, external programs and public web services, including BLAST, KEGG, GenBank, MEDLINE and GO. BioRuby comes with a tutorial, documentation and an interactive environment, which can be used in the shell, and in the web browser.

Availability: BioRuby is free and open source software, made available under the Ruby license. BioRuby runs on all platforms that support Ruby, including Linux, Mac OS X and Windows. And, with JRuby, BioRuby runs on the Java Virtual Machine. The source code is available from http://www.bioruby.org/.

Contact: gro.yburoib@amayatak

1 INTRODUCTION

Research in molecular biology depends critically on access to databases and web services. The BioRuby project was conceived in 2000 to provide easy access to bioinformatics resources through free and open source tools and libraries for Ruby, a dynamic open source programming language with a focus on simplicity and productivity (www.ruby-lang.org).

The BioRuby software components cover a wide range of functionality that is comparable to that offered by other Bio* projects, each targeting a different computer programming language (Stajich and Lapp, 2006), such as BioPerl (Stajich et al., 2002), Biopython (Cock et al., 2009) and BioJava (Holland et al., 2008). BioRuby software components are written in standard Ruby, so they run on all operating systems that support Ruby itself, including Linux, OS X, FreeBSD, Solaris and Windows. With JRuby, BioRuby also can run inside a Java Virtual Machine (JVM), allowing interaction with Java applications and libraries, like Cytoscape for visualization (Shannon et al., 2003).

Both BioRuby and Ruby are used in bioinformatics for scripting (Aerts and Law, 2009), scripting against applications (Katoh et al., 2005), modelling (Lee and Blundell, 2009; Metlagel et al., 2007), analysis (Prince and Marcotte, 2008), visualization and service integration (Philippi, 2004). The web development framework ‘Ruby on Rails’ is used to create web applications and web services (Biegert et al., 2006; Jacobsen et al., 2010). BioRuby provides connection functionality for major web services, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG; see example in Fig. 1) (Kanehisa et al., 2008), and the TogoWS service, which provides a uniform web service front-end for the major bioinformatics databases (Katayama et al., 2010).

Fig. 1.
BioRuby shell example of fetching a KEGG graph using BioRuby's KEGG API (Kanehisa et al., 2008). After installing BioRuby, the ‘bioruby’ command starts the interactive shell. With the bfind command, the KEGG MODULE database is queried ...

The BioRuby source tree contains over 580 documented classes, 2800 public methods and 20 000 unit test assertions. Source code is kept under Git version control, which allows anyone to clone the source tree and start submitting. We have found that Git substantially lowers the barrier for new people to start contributing to the project. In the last 2 years, the source tree has gained 100 people tracking changes and 32 people cloned the repository.

The BioRuby project is part of the Open Bioinformatics Foundation, which hosts the project website and mailing list, and organizes the annual Bioinformatics Open Source Conference together with the other Bio* projects. A number of BioRuby features support Bio* cross-project standards, such as the BioSQL relational model for interoperable storage of certain data objects or their implementation is coordinated across the Bio* projects, including support for the FASTQ (Cock et al., 2010) and phyloXML (Han and Zmasek, 2009) data exchange formats.

2 FEATURES

BioRuby covers a wide range of functional areas which have been logically divided into separate modules (Table 1).

Table 1.
BioRuby modules

BioRuby allows accessing a comprehensive range of public bioinformatics resources. For example, BioRuby supports the Open Biological Database Access (ODBA) as a generic and standardized way of accessing biological data sources. In addition, BioRuby can directly process local database files in a variety of different flat file formats, including FASTA, FASTQ (Fig. 2), GenBank and PDB. BioRuby also allows querying and accessing remote online resources through their interfaces for programmatic access, such as those provided by KEGG, the DNA Databank of Japan (DDBJ), the National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI).

Fig. 2.
BioRuby example of masking sequences from next generation sequencing data in FASTQ format using a defined quality_threshold, and writing the results in FASTA format.

BioRuby has online documentation, tutorials and code examples. It is straightforward to get started with BioRuby and use it to replace, or glue together, legacy shell scripts or to mix Ruby on Rails into an existing web application.

BioRuby comes with an interactive environment, both for the command-line shell and in the browser. Ideas can be quickly prototyped in the interactive environment and can be saved as ‘scripts’ for later use. Such an interactive environment has shown to be especially useful for bioinformatics training and teaching (Fig. 1).

New features, and refinements of existing ones, are constantly being added to the BioRuby code base. Current development activity focuses on adding support for the semantic web, and on designing a plugin system that allows adding entirely new components in a loosely coupled manner, such that experimental new code can be developed without having an impact on BioRuby's core stability and portability.

3 CONCLUSION

The BioRuby software toolkit provides a broad range of functionality for molecular biology and easy access to bioinformatics resources. BioRuby is written in Ruby, a dynamic programming language with a focus on simplicity and productivity, which targets all popular operating systems and the JVM. The BioRuby project is an international and vibrant collaborative software initiative that delivers life science programming resources for those researchers who want to benefit from the productivity features of the Ruby language, as well as from the larger Ruby ecosystem of reusable open source components.

ACKNOWLEDGEMENTS

All individual contributors are credited on the BioRuby website. We thank Hilmar Lapp and Chris Fields for comments and review of the manuscript.

Funding: Information-technology Promotion Agency Japan (IPA); Database Center for Life Science (DBCLS) Japan; Human Genome Center, Institute of Medical Science, University of Tokyo; Open Bioinformatics Foundation (OBF); the National Evolutionary Synthesis Center (NESCent); Google Summer of Code for ongoing support.

Conflict of Interest: none declared.

REFERENCES

  • Aerts J, Law A. An introduction to scripting in Ruby for biologists. BMC Bioinformatics. 2009;10:221. [PMC free article] [PubMed]
  • Biegert A, et al. The MPI Bioinformatics Toolkit for protein sequence analysis. Nucleic Acids Res. 2006;34:W335–W339. [PMC free article] [PubMed]
  • Cock PJ, et al. Biopython: freely available python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25:1422–1423. [PMC free article] [PubMed]
  • Cock PJ, et al. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res. 2010;38:1767–1771. [PMC free article] [PubMed]
  • Han MV, Zmasek CM. phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics. 2009;10:356. [PMC free article] [PubMed]
  • Holland RC, et al. BioJava: an open-source framework for bioinformatics. Bioinformatics. 2008;24:2096–2097. [PMC free article] [PubMed]
  • Jacobsen A, et al. miRMaid: a unified programming interface for microRNA data resources. BMC Bioinformatics. 2010;11:29. [PMC free article] [PubMed]
  • Kanehisa M, et al. KEGG for linking genomes to life and the environment. Nucleic Acids Res. 2008;36:D480–D484. [PMC free article] [PubMed]
  • Katayama T, et al. TogoWS: integrated SOAP and REST APIs for interoperable bioinformatics Web services. Nucleic Acids Res. 2010;38:W706–W711. [PMC free article] [PubMed]
  • Katoh K, et al. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 2005;33:511–518. [PMC free article] [PubMed]
  • Lee S, Blundell TL. Ulla: a program for calculating environment-specific amino acid substitution tables. Bioinformatics. 2009;25:1976–1977. [PMC free article] [PubMed]
  • Metlagel Z, et al. Ruby-Helix: an implementation of helical image processing based on object-oriented scripting language. J. Struct. Biol. 2007;157:95–105. [PubMed]
  • Philippi S. Light-weight integration of molecular biological databases. Bioinformatics. 2004;20:51–57. [PubMed]
  • Prince JT, Marcotte EM. mspire: mass spectrometry proteomics in Ruby. Bioinformatics. 2008;24:2796–2797. [PMC free article] [PubMed]
  • Shannon P, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498–2504. [PMC free article] [PubMed]
  • Stajich JE, Lapp H. Open source tools and toolkits for bioinformatics: significance, and where are we? Brief Bioinform. 2006;7:287–296. [PubMed]
  • Stajich JE, et al. The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 2002;12:1611–1618. [PMC free article] [PubMed]

Articles from Bioinformatics are provided here courtesy of Oxford University Press
PubReader format: click here to try

Formats:

Related citations in PubMed

See reviews...See all...

Cited by other articles in PMC

See all...

Links

  • PubMed
    PubMed
    PubMed citations for these articles

Recent Activity

Your browsing activity is empty.

Activity recording is turned off.

Turn recording back on

See more...