Nucleic Acids Res. Jul 1, 2010; 38(Web Server issue): W689–W694.
Published online May 19, 2010. doi:  10.1093/nar/gkq394
PMCID: PMC2896129

BioCatalogue: a universal catalogue of web services for the life sciences

Abstract

The use of Web Services to enable programmatic access to on-line bioinformatics is becoming increasingly important in the Life Sciences. However, their number, distribution and the variable quality of their documentation can make their discovery and subsequent use difficult. A Web Services registry with information on available services will help to bring together service providers and their users. The BioCatalogue (http://www.biocatalogue.org/) provides a common interface for registering, browsing and annotating Web Services to the Life Science community. Services in the BioCatalogue can be described and searched in multiple ways based upon their technical types, bioinformatics categories, user tags, service providers or data inputs and outputs. They are also subject to constant monitoring, allowing the identification of service problems and changes and the filtering-out of unavailable or unreliable resources. The system is accessible via a human-readable ‘Web 2.0’-style interface and a programmatic Web Service interface. The BioCatalogue follows a community approach in which all services can be registered, browsed and incrementally documented with annotations by any member of the scientific community.

INTRODUCTION

As of 2010, there are more than 1400 publicly available bioinformatics tools and databases on the Web (1,2), with over 100 new Web servers providing interactive analysis tools reported in 2009 alone (3). These published resources are just the tip of a very large iceberg, and many others exist in relative obscurity, advertised only via project or laboratory Web Pages.

Though interactive access to these resources via Web Pages has been of enormous benefit to the community over the years, there is a growing demand for programmatic interfaces that allow these tools and databases to be linked together in automated analysis pipelines (4). Web Services are becoming an increasingly popular way of providing robust remote access (5), and this approach has been adopted by major service providers including the EMBL-EBI (6), KEGG (7), NCBI (8) and the DDBJ (9). Web Services can easily be accessed from most programming languages, or chained together as workflows using free tools [e.g. Taverna (10) or Kepler (11)], or their commercial equivalents [e.g. PipeLine Pilot (http://accelrys.com/products/pipeline-pilot/)].

The resources to which Web Services provide access are distributed across centres, projects, countries and disciplines and, for the most part, are currently likely to be discovered by word-of-mouth, Google searches, or from simple on-line lists such as http://www.xmethods.net/ or http://www.webservicelist.com/. As the number of Web Services has grown, so has the need for gathering information about them into one place. Table 1 gives a short summary of prominent public service registries that are relevant to the Life Sciences. These broadly fall into two categories: those that represent collections of services based on a specific schema and/or technology [e.g. BioMOBY Central (12,13) and the DAS registry (14)] and those that do not [e.g. seekda (http://www.seekda.com/) and the European Model for Bioinformatics Research and Community Education (EMBRACE) registry (15)]. Although some have major commercial or institutional backing, others have grown out of fixed-term projects and hence their long-term future is unclear. Alongside registry building, there have also been ongoing efforts to describe Web Services with rich semantic annotations using ontologies and modern ontology languages. Examples include SSWAP (16), Feta (17), SADI (http://sadiframework.org/) and BioMOBY.

Table 1.
A summary of existing on-line collections of Web Services

Drawing together the experience of these existing initiatives, the BioCatalogue provides a universal catalogue of Web Services for the Life Sciences. Launched in June 2009 and hosted at the EMBL-EBI, it allows registration of services that are specific to the Life Sciences (such as those for protein sequence or molecular structures) as well as more generic services that are of direct utility in this domain (e.g. text mining and image analysis). The catalogue does not host these services itself; instead it provides a mechanism to discover services and annotate them. The BioCatalogue has five key properties:

  1. It provides a single up-to-date port-of-call for finding Life Science Web Services, regardless of their technology or provenance. As well as allowing new registration of services manually and via its own Web Service interface, it aggregates contributions from other registries. For example, the catalogue carries service registrations from the EMBRACE Registry and domain-specific services from seekda.
  2. It offers a long-term sustained resource for service descriptions that is also a safe haven for securing the contents of registries beyond their originating projects (e.g. EMBRACE and BioSapiens services).
  3. It adds uniform and rich annotations to the services that harmonize their descriptions regardless of source or type. The annotations explain what the service does and how to use it. The descriptions draw upon existing and emerging work in the Semantic Web Services [e.g. Semantic Annotation for WSDL (SAWSDL) (18) and Semantic Annotation for REpresentational State Transfer (SA-REST) (19)]. Annotations from the EMBRACE and Feta registries have been contributed to the catalogue. Content is monitored by a full-time curator assisted by the registered members.
  4. It provides a rich range of facilities, adopting the best components of other registries where available, e.g. the EMBRACE service monitoring framework and endpoint validation software, and the use of seekda for service scavenging.
  5. It addresses the combined needs of service providers, users, annotators and developers alike, enabling the catalogue's content to be readily extended, curated and used by the community.

Currently, the BioCatalogue has over 300 registered members. It describes 1627 Web Services (1585 SOAP services, and 42 REST services) from over 158 different providers from 25 countries. All the services of the major data centres (EMBL-EBI, DDBJ and NCBI) are present.

USING THE BIOCATALOGUE

The BioCatalogue can be accessed via two mechanisms: a human-readable ‘Web 2.0’-style interface which supports browsing, searching and the manual creation and annotation of service entries; and a Web Service API for programmatic access.

The ‘Web 2.0’ interface

The BioCatalogue's Web interface provides faceted browsing, extensive link-based navigation and filtering on multiple criteria including service categories, keywords, providers, location and service type. Displayed information such as service popularity based on view statistics, comments from other users and the number and quality of annotations, helps to identify suitable services and find alternative or similar services.

All available information held on a service, including its annotations, tags and provider documentation is included in the search. Searching is facilitated by term suggestion based on tags, previous user searches and terms from the myGrid ontology (20). The ‘Search by Data’ feature matches a sample of a user's input data against example input data provided in the service annotations, allowing the user to discover services that provide methods for analysing their data. The BioCatalogue is configured so as to be indexable by generic Web search engines (e.g. Google) as well as being explicitly indexed in the specialist EB-eye (21) search engine.
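The paper does not describe how ‘Search by Data’ compares a sample with the stored examples. The following is a minimal sketch of one plausible approach, assuming that inputs are reduced to a coarse character-class ‘shape’ and compared with a standard sequence-similarity ratio. The service names, example inputs, the `shape` and `search_by_data` helpers and the threshold are all hypothetical and are not taken from the BioCatalogue code base.

```python
from difflib import SequenceMatcher

# Hypothetical catalogue entries: service name -> example input taken from
# its annotations. These are illustrative, not real BioCatalogue records.
EXAMPLE_INPUTS = {
    "protein_blast": ">sp|P12345\nMKTAYIAKQRQISFVKSHFSRQ",
    "pubmed_fetch": "19417072",
    "smiles_to_image": "CC(=O)OC1=CC=CC=C1C(=O)O",
}


def shape(text):
    """Reduce text to a character-class signature (A=letter, 9=digit)."""
    out = []
    for ch in text:
        if ch.isalpha():
            out.append("A")
        elif ch.isdigit():
            out.append("9")
        elif ch.isspace():
            out.append(" ")
        else:
            out.append(ch)  # keep punctuation such as '>' or '|' as-is
    return "".join(out)


def search_by_data(sample, examples=EXAMPLE_INPUTS, threshold=0.5):
    """Return (service, score) pairs whose example input resembles the sample.

    ASSUMPTION: similarity of character-class shapes stands in for whatever
    matching the real feature performs.
    """
    scored = []
    for name, example in examples.items():
        score = SequenceMatcher(None, shape(sample), shape(example)).ratio()
        if score >= threshold:
            scored.append((name, round(score, 2)))
    # Best match first
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Under these assumptions, an eight-digit identifier would be matched to the service whose example input is also a run of digits, and a FASTA-formatted sequence to the service whose example begins with a FASTA header.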

Announcements and release notes are posted on Twitter and syndicated on RSS feeds. Registry entries may be bookmarked using social bookmarking systems such as Delicious (http://delicious.com/) or Digg (http://digg.com/). Users may log in using OpenID, Google, Facebook, Twitter, Yahoo! or Verisign accounts, simplifying registration, and limiting username and password proliferation.

The BioCatalogue Web Service interface

The BioCatalogue provides a REST Web Service API, enabling tools such as Taverna and registry aggregation sites such as ONIX (http://www.ncri-onix.org.uk/) to access its contents. The main exchange format is XML, with JSON (http://www.json.org/) output available for the annotations. The API broadly reflects the same functionality that can be accessed via the interactive Web interface. Table 2 outlines the main XML endpoints and their functions. Full documentation, along with code examples, is available from http://apidocs.biocatalogue.org/.

Table 2.
A representative sample of BioCatalogue REST API methods, accessible via http://www.biocatalogue.org/
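As an illustration of how a client might use an XML-over-REST interface of this kind, the sketch below builds a search URL and extracts service names from a response. Only the base URL comes from the text. The `/search.xml` path, the `q` and `page` parameters and the shape of the sample response are assumptions made for the example; the real endpoints and schema should be taken from http://apidocs.biocatalogue.org/.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE_URL = "http://www.biocatalogue.org"  # stated in the paper


def build_search_url(query, page=1):
    """Build a search URL.

    ASSUMPTION: the "/search.xml" path and the "q"/"page" parameter names
    are illustrative and are not confirmed by the text.
    """
    return f"{BASE_URL}/search.xml?{urlencode({'q': query, 'page': page})}"


# ASSUMPTION: a hypothetical response body; the real XML schema may differ.
SAMPLE_RESPONSE = """<search>
  <results>
    <service name="ExampleBlast" type="SOAP"/>
    <service name="ExampleFetch" type="REST"/>
  </results>
</search>"""


def service_names(xml_text, service_type=None):
    """Return service names from a response, optionally filtered by type."""
    root = ET.fromstring(xml_text)
    return [
        svc.get("name")
        for svc in root.iter("service")
        if service_type is None or svc.get("type") == service_type
    ]
```

A client would fetch the URL returned by `build_search_url` with any HTTP library and pass the response body to `service_names`; the network call is omitted here so that the sketch stays self-contained.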

SERVICE ANNOTATION

The descriptions of the services registered in the BioCatalogue are drawn from service providers, the user community and monitoring and usage analysis. Each annotation is associated with a source (automatic analysis, other registries, the providers or named curators) and can take the form of structured data, free text, tags or ontology terms. Annotations are divided into four main categories:

  • Functional: outlines the task of a service, the type(s) of analyses possible, information relating to underlying data resources used, its various operations, the function and format of any inputs and outputs, and whether parameters are mandatory or optional. Examples of input data or service usage are provided where available. Services are classified into multiple categories based on their biological category (e.g. proteomics) and their technology (e.g. text mining).
  • Operational: describes the mechanisms and any conditions and assumptions necessary to execute a service (e.g. restrictions by the service provider placed on the number of invocations allowed in a given interval). We previously observed (22) that many service providers structure their services to work in idiomatic ways: (i) combining the numerous useful functions beneath a single service interface [e.g. SoapLab (23), GenePattern (24) and RapidMiner (http://rapid-i.com)]; (ii) requiring operations to be combined to deliver a task [e.g. the EMBL-EBI Web Services (6)]; or (iii) prescribing that the services' interface be mapped to a semantic signature [e.g. BioMOBY (12), SSWAP (16) and SADI].
  • Profile: records objective analyses drawn from monitoring and from metrics automatically mined from other resources (for example, workflow management systems), together with subjective comments from the user community about the use and usability of services.
  • Provenance: includes details of where the service is hosted, who submitted the service to the registry, and who has provided annotation. Changes to the service description (e.g. its WSDL document) or its associated annotations are also recorded in order to provide a history of the service as well as an audit trail.
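The annotation model described above (a source, a form and one of four categories per annotation) can be sketched as a small data structure. This is an illustrative model only: the class name, field names and string constants are assumptions, and the BioCatalogue's actual schema is not given in the text.

```python
from collections import defaultdict
from dataclasses import dataclass

# Vocabulary taken from the text; the identifiers themselves are assumptions.
CATEGORIES = {"functional", "operational", "profile", "provenance"}
SOURCES = {"automatic", "registry", "provider", "curator"}
FORMS = {"structured", "free_text", "tag", "ontology_term"}


@dataclass(frozen=True)
class Annotation:
    """One attributed annotation on a service (hypothetical model)."""
    service: str
    category: str
    source: str
    form: str
    value: str

    def __post_init__(self):
        # Reject values outside the vocabulary described in the paper.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")
        if self.source not in SOURCES:
            raise ValueError(f"unknown source: {self.source}")
        if self.form not in FORMS:
            raise ValueError(f"unknown form: {self.form}")


def group_by_category(annotations):
    """Collect annotation values under their category, preserving order."""
    grouped = defaultdict(list)
    for annotation in annotations:
        grouped[annotation.category].append(annotation.value)
    return dict(grouped)
```

Keeping the source on every annotation is what allows each description to be attributed and audited, as the Provenance category requires.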

The BioCatalogue currently holds more than 33 000 annotations. Approximately a third of services have all operations described. As much documentation as possible is automatically extracted from the published service interfaces, and additional annotations may be added during or after initial submission by the contributor. These semantic service descriptions can be imported and exported in formats compliant with SAWSDL (18) and SA-REST (19) standards.

MONITORING WEB SERVICES

The status, reliability and stability of a Web Service are often the deciding factors for choosing a service. The BioCatalogue has adopted the EMBRACE Registry's system for monitoring service availability, service interface changes and service functionality (15). Availability is indicated using a simple ‘traffic light’ mechanism, whereby green means the service is active, yellow means it has one or more unresolved issues, and red means it is currently unavailable. Service interface changes are managed by periodically re-parsing interface documents and comparing them with the existing entry. Functionality is checked by the submission of scripts that exercise specific aspects of the services, managed by a separate server. By automatically monitoring changes, a history of service versions and performance can be provided and users relying on specific services can be notified of these changes by RSS subscription or Twitter.
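The ‘traffic light’ rule and the interface comparison can be sketched as follows. This is a simplified illustration and not the EMBRACE monitoring code: the function names are hypothetical, and reducing an interface document to its set of operation names is an assumed simplification of the re-parsing step.

```python
import xml.etree.ElementTree as ET


def traffic_light(reachable, unresolved_issues):
    """Map monitoring results to the colours described in the text."""
    if not reachable:
        return "red"      # currently unavailable
    if unresolved_issues > 0:
        return "yellow"   # active, but with one or more unresolved issues
    return "green"        # active


def operation_names(wsdl_text):
    """Return the names of all <operation> elements, ignoring namespaces."""
    root = ET.fromstring(wsdl_text)
    return {
        element.get("name")
        for element in root.iter()
        if element.tag.split("}")[-1] == "operation"
    }


def interface_diff(stored_doc, fetched_doc):
    """Compare a stored interface document with a freshly fetched one.

    ASSUMPTION: only operation names are compared; a real monitor would
    also need to compare messages, types and endpoints.
    """
    old, new = operation_names(stored_doc), operation_names(fetched_doc)
    return {"added": sorted(new - old), "removed": sorted(old - new)}
```

A non-empty diff would then be recorded in the service history and pushed to subscribers.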

Usage of the BioCatalogue is monitored to build up a profile of searches and access. This reveals relationships between services, including usage patterns; for example, services that are commonly used together, and/or services that provide similar functionality, which may be used as substitutes if one of these services becomes unavailable.
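The text does not say how these relationships are derived from the usage data. One simple possibility, shown here as a sketch, is to count how often pairs of services occur in the same session. The session format and the `co_usage` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_usage(sessions):
    """Count how often each pair of services appears in the same session.

    ASSUMPTION: each session is a list of service names viewed or invoked
    together. Pairs are stored in sorted order so (a, b) == (b, a).
    """
    pairs = Counter()
    for services in sessions:
        for a, b in combinations(sorted(set(services)), 2):
            pairs[(a, b)] += 1
    return pairs
```

Pairs with high counts are candidates for ‘commonly used together’. Identifying substitutes would need a further signal, such as shared categories or matching inputs and outputs, which this sketch does not attempt.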

COMMUNITY CONTRIBUTION TO CONTENT

Members can register a Web Service, share their views, make comments or annotations on any service and provide examples of service usage with relevant input and output data. Automatic harvesting of service annotations provides the foundation on which user-provided annotations rest. Submission of services and annotations contributes to the reputation of a member, encouraging further contributions. Content is overseen by a full-time curator, who also coordinates a small pool of curators to help members improve annotations and adopt best practices.

The BioCatalogue team includes several service providers, including the EMBL-EBI. Other providers are encouraged to contribute. As well as an active ‘friends’ mailing list, online news feeds and a wiki (http://www.biocatalogue.org/wiki/), ‘annotation jamborees’—virtual or face to face group efforts to annotate a large set of Web Services and to discuss best practices, new features, directions and general issues—are organized periodically. These jamborees serve as a resource review and a team-building forum as well as a source of new annotations.

All descriptions are attributed and open to scrutiny and all monitoring results are available. Documentation is provided at various levels of detail covering guidelines and best practices for service creation and execution. Help pages provide instructions or links on how to test and run services with different tools: GUI tools, such as soapUI (http://www.soapui.org/), SOAP Client (http://ditchnet.org/soapclient/) or workflow execution engines, such as Taverna and Kepler. Pointers to commonly used software libraries that can be used to incorporate Web Services into new programs in different programming and scripting languages, and links for creating new Web Services or writing a Web Service API to an existing tool, are also provided.

CONCLUSION

The first phase of the BioCatalogue has focused on the design and development of its Web interface and API, on establishing its core content and on the building of a contributing community. Since its launch in 2009, it has had over 14 000 visits and is successfully growing a community of contributors and users. The majority of visitors use the search and browsing features to discover services. Of the 300 or so registered members, a subgroup of around 20 actively contribute high quality manual annotations.

In cooperation with their respective developers, services generated during the EMBRACE and BioSapiens projects and relevant services found by the seekda search engine have already been included, and content from BioMOBY Central and the DAS registry will be added shortly. Thus the bulk of current services have been accumulated from registries, by scavenging, and by the major service providers. We now observe a growing number of more specialist service providers each adding a small number of domain-specific services to the catalogue.

The next phase of development concentrates on extending functionality and content, improving the quality and coverage of service curation, and integration with other systems. Support for tagging with community-curated ontologies will be extended. The myGrid ontology is already used and the EMBRACE project's EDAM ontology (http://sourceforge.net/projects/edamontology/) is under review.

Contributions will be made easier by the release of a write-API, providing members with the ability to register and update services programmatically. Consequently, profiles derived from other service-using and monitoring software, like the Taverna workflow system and its Web Service workflow library myExperiment (http://www.myexperiment.org/), and the service monitoring systems of BioMOBY, DAS and QBIOS (http://qbios.gforge.inria.fr/) will be integrated to form aggregated profiles.

The BioCatalogue aims to satisfy the needs of service providers, users and experts in the field, bringing them together in a common effort to make Web Services for biology more visible, better documented and easier to use. It is an important ‘one stop shop’ where users can locate Web Services that implement the analysis relevant for their experiments, learn how these services work and, most importantly, learn how to make the most of these valuable resources.

FUNDING

Funding for open access charge: Biotechnology and Biological Sciences Research Council (BB/F01046X/1, BB/F010540/1 to BioCatalogue project); the European Commission via the EMBRACE project (LHSG-CT-2004-512092); EMBO (ASTF 338.00-2009 to Development on Search By Data).

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

The authors would like to thank all members of the BioCatalogue focus group, the Strategy Advisory Board and the other people who help us to improve the registry: Duncan Hull, Benjamin Good, Chrysanthi Ainali, Olivier Sallou, Chris Rawlings, Anil Wipat, Jo Dicks, Robert Gill, Steve Kemp, Antoine H.C. van Kampen, Holger Lausen, Terry Payne, Mark Wilkinson, Janusz Bujnicki, Paul Gordon, Khalid Belhajjame, Philip McDermott, Dave De Roure and all participants of the annotation jamborees. Special acknowledgements are given to our partners and to all projects that cooperate with the BioCatalogue to ease and popularize the use of Web Services in the Life Sciences, especially the EU EMBRACE network, OMII-UK, BioMOBY Central, seekda, myExperiment, myGrid, the EU BioSapiens network and NBIC.

REFERENCES

1. Brazas MD, Yamada JT, Ouellette BF. Evolution in bioinformatic resources: 2009 update on the bioinformatics links directory. Nucleic Acids Res. 2009;37:W3–W5.
2. Cochrane GR, Galperin MY. The 2010 Nucleic Acids Research database issue and online database collection: a community of data resources. Nucleic Acids Res. 2010;38:D1–D4.
3. Benson G. Nucleic acids research annual web server issue in 2009. Nucleic Acids Res. 2009;37:W1–W2.
4. Goble C, Stevens R, Hull D, Wolstencroft K, Lopez R. Data curation + process curation = data integration + science. Brief. Bioinform. 2008;9:506–517.
5. Romano P, Marra D, Milanesi L. Web services and workflow management for biological resources. BMC Bioinformatics. 2005;6(Suppl 4):S24.
6. McWilliam H, Valentin F, Goujon M, Li W, Narayanasamy M, Martin J, Miyar T, Lopez R. Web services at the European Bioinformatics Institute-2009. Nucleic Acids Res. 2009;37:W6–W10.
7. Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. 2010;38:D355–D360.
8. Sayers EW, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, Federhen S, et al. Database resources of the national center for biotechnology information. Nucleic Acids Res. 2009;37:D5–D15.
9. Kaminuma E, Mashima J, Kodama Y, Gojobori T, Ogasawara O, Okubo K, Takagi T, Nakamura Y. DDBJ launches a new archive database with analytical tools for next-generation sequence data. Nucleic Acids Res. 2010;38:D33–D38.
10. Hull D, Wolstencroft K, Stevens R, Goble C, Pocock MR, Li P, Oinn T. Taverna: a tool for building and running workflows of services. Nucleic Acids Res. 2006;34:W729–W732.
11. Altintas I, Berkley C, Jaeger E, Jones M, Ludascher B, Mock S. In 16th International Conference on Scientific and Statistical Database Management, Proceedings. 2004. Kepler: an extensible system for design and execution of scientific workflows; pp. 423–424.
12. Wilkinson MD, Links M. BioMOBY: an open source biological web services proposal. Brief. Bioinform. 2002;3:331–341.
13. Wilkinson M, Schoof H, Ernst R, Haase D. BioMOBY successfully integrates distributed heterogeneous bioinformatics Web Services. The PlaNet exemplar case. Plant Physiol. 2005;138:5–17.
14. Prlic A, Down TA, Kulesha E, Finn RD, Kahari A, Hubbard TJ. Integrating sequence and structural biology with DAS. BMC Bioinformatics. 2007;8:333.
15. Pettifer S, Thorne D, McDermott P, Attwood T, Baran J, Bryne JC, Hupponen T, Mowbray D, Vriend G. An active registry for bioinformatics web services. Bioinformatics. 2009;25:2090–2091.
16. Gessler DDG, Schiltz GS, May GD, Avraham S, Town CD, Grant D, Nelson RT. SSWAP: a simple semantic web architecture and protocol for semantic web services. BMC Bioinformatics. 2009;10:309.
17. Lord P, Alper P, Wroe C, Goble C. In The Semantic Web: Research and Applications, Vol 3532/2005 of Lect. Notes Comput. Sci. Berlin/Heidelberg: Springer; 2005. Feta: a light-weight architecture for user oriented semantic service discovery; pp. 17–31.
18. Vitvar T, Bournez C, Farrell J, Kopecký J. SAWSDL: semantic annotations for WSDL and XML schema. IEEE Internet Comput. 2007;11:60–67.
19. Sheth AP, Gomadam K, Lathem J. SA-REST: semantically interoperable and easier-to-use services and mashups. IEEE Internet Comput. 2007;11:91–94.
20. Wolstencroft K, Alper P, Hull D, Wroe C, Lord PW, Stevens RD, Goble CA. The myGrid ontology: bioinformatics service discovery. Int. J. Bioinform. Res. Appl. 2007;3:303–325.
21. Valentin F, Squizzato S, Goujon M, McWilliam H, Paern J, Lopez R. Fast and efficient searching of biological data resources–using EB-eye. Brief. Bioinform. 2010 [Epub ahead of print, February 11, 2010; doi:10.1093/bib/bbp065]
22. Lord P, Bechhofer S, Wilkinson MD, Schiltz G, Gessler D, Hull D, Goble C, Lincoln S. In International Semantic Web Conference, Vol. 3298/2004 of Lecture Notes in Computer Science. Berlin/Heidelberg: Springer; 2004. Applying semantic Web Services to bioinformatics: experiences gained, lessons learnt; pp. 350–364.
23. Senger M, Rice P, Oinn T. Soaplab-A Unified Sesame Door to Analysis Tools. In: Cox SJ, editor. Proceedings, UK e-Science, All Hands Meeting, 2–4 September. Nottingham, UK: 2003. pp. 509–513.
24. Reich M, Liefeld T, Gould J, Lerner J, Tamayo P, Mesirov JP. GenePattern 2.0. Nat. Genet. 2006;38:500–501.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press