Nucleic Acids Res. Jul 2007; 35(Web Server issue): W21–W26.
Published online May 7, 2007. doi:  10.1093/nar/gkm298
PMCID: PMC1933131

iHOP web services

Abstract

iHOP provides fast, accurate, comprehensive, and up-to-date summary information on more than 80 000 biological molecules by automatically extracting key sentences from millions of PubMed documents. Its intuitive user interface and navigation scheme have made iHOP extremely successful among biologists, with more than 500 000 visits per month (iHOP access statistics: http://www.ihop-net.org/UniPub/iHOP/info/logs/). Here we describe a public programmatic API that enables the integration of the main iHOP functionalities into bioinformatics programs and workflows.

INTRODUCTION

iHOP (1) (iHOP literature server, http://www.ihop-net.org) allows researchers to explore a network of gene and protein interactions by directly navigating the pool of published scientific literature. Rather than returning long lists of entire abstracts in response to keyword searches, iHOP selectively retrieves information that is specific to genes and proteins and summarizes their interactions and functions. The system adds value by filtering and ranking extracted sentences according to significance, impact factor, date of publication and syntax.

iHOP web content is pre-compiled and generated in a multi-step process to annotate biomedical texts with gene and protein names, chemical compounds and MeSH terms. This annotation task is computationally expensive because of the sheer number of entities, but more importantly, hindered by a high semantic overloading of abbreviations and synonyms in biomedicine. The continuous development and optimization of heuristics and machine learning algorithms to improve entity detection and synonym disambiguation is therefore a central effort in the maintenance of iHOP.

Given the complexity and effort that go into the development and maintenance of a text-mining pipeline, it makes sense to build upon the existing infrastructure of iHOP rather than reinventing the wheel. Numerous online resources already link to iHOP, and novel tools based on the iHOP resource are emerging, e.g. iHOPerator (2). The iHOP web service API has been tested in selected projects over the last 2 years and is now made publicly available. Although the APIs of a biocomputing facility are less visible to the end user, they are very important for the different omics disciplines, which usually depend on powerful analysis of data sets. Such analyses run as distributed workflows, which have to semantically integrate the results from diverse biocomputing facilities and data sources. Other large-scale biocomputing facilities provide programmatic environments such as NCBI Entrez (3) (Entrez CGI services, http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html; Entrez SOAP services, http://www.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html) or EBI WS (4) (EBI SOAP services, http://www.ebi.ac.uk/Tools/webservices/).

METHODS

To make the iHOP programmatic interface remotely accessible and easy to integrate into workflows in a way that is neutral to programming languages and vendor independent, we decided to implement the public API in the form of web services (5). Three popular web service API models have been implemented for iHOP: the REST model (6) (Wikipedia description of REST, http://en.wikipedia.org/wiki/Representational_State_Transfer), which the DAS (7) protocol follows; SOAP + WSDL, in which services are described by a WSDL document and exchange SOAP messages encoded in XML; and BioMOBY (8), which is focused on bioinformatics workflow building (9). Table 1 contains a brief description of these API models. All three API implementations are based on a common internal library and common XML schemas to facilitate maintenance and future development. The MOBY implementation required additional effort to integrate iHOP web services into the MOBY ontology.

Table 1. Brief description of the web service API models used in iHOP web services

The schema design was driven by the iHOP functionalities that are directly useful for bioinformatics workflows (Figures 1 and 2). Table 2 contains a brief description of these functionalities, with their inputs and outputs.

Table 2. Logical iHOP web services functionalities. These functionalities and the web services which implement them are focused on automation, so almost all functionalities have more than one input type. So, depending on the API model, some of these functionalities ...

Figure 1. Schema of operations of iHOP web services. Each box is a web service, and double boxes are the recommended starting points for workflows. Links show some suggested flows between services that are useful for workflow building. Green links represent bi-directional ...

Figure 2. This is a Taverna workflow diagram which takes as input some free text (e.g. ‘breast cancer’). The workflow fetches the gene and protein symbols related to the input free text, and it returns those symbols, their synonyms and all the abstracts ...

A key issue in the development was the design of an XML schema rich enough to describe and integrate the valuable information that is already accessible through the iHOP user interface. For instance, annotated sentences are generated by the getSymbolDefinitions, getSymbolInteractions and getPubMed functionalities. Each sentence also provides information about the abstract, the journal and the journal impact factor. Basic symbol information is provided by getSymbolInfo, and it is also included in the results of getSymbolDefinitions and getSymbolInteractions. The resulting XML Schema, along with its documentation, is available at the iHOP web services site.

Gene symbol disambiguation is usually a hard task that is ultimately left to the user, and automating it is an essential part of a useful workflow. Using heuristics specific to these web services, we have created an additional functionality called guessSymbolIdFromSymbolText, which guesses the nearest unambiguous iHOP gene symbol id from a free text input and an optional target organism. The concept is very similar to Google's ‘I'm feeling lucky’ feature, and it speeds up workflow building. Workflow writers are not tied to this service and its heuristics, because anyone can implement their own symbol selection heuristics on top of the getRelatedSymbols output.
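
A custom selection heuristic over getRelatedSymbols output might look like the following minimal sketch. It is not the heuristic used by guessSymbolIdFromSymbolText, which is not described here. The Candidate record, its fields and the scoring weights are assumptions made for illustration; real getRelatedSymbols results follow the iHOP XML schema and would have to be parsed into such records first.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical, simplified view of one symbol returned by getRelatedSymbols."""
    symbol_id: str          # placeholder id, not a real iHOP identifier
    name: str
    organism: str
    synonyms: List[str] = field(default_factory=list)


def pick_symbol(query: str, candidates: List[Candidate],
                organism: Optional[str] = None) -> Optional[Candidate]:
    """Return the best-scoring candidate, or None when the choice is ambiguous."""
    q = query.strip().lower()

    def score(c: Candidate) -> int:
        s = 0
        if c.name.lower() == q:
            s += 2          # exact match on the main name
        elif q in (syn.lower() for syn in c.synonyms):
            s += 1          # match on a synonym only
        if organism and c.organism.lower() == organism.lower():
            s += 2          # candidate belongs to the requested organism
        return s

    ranked = sorted(candidates, key=score, reverse=True)
    if not ranked or score(ranked[0]) == 0:
        return None         # nothing matched at all
    if len(ranked) > 1 and score(ranked[0]) == score(ranked[1]):
        return None         # top two candidates tie: refuse to guess
    return ranked[0]
```

Returning None on a tie keeps the ambiguity visible to the workflow instead of silently choosing one organism's gene over another's.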

Under the REST (Representational State Transfer) paradigm there is a CGI-XML service available for all functionalities described earlier. Special return cases have been modelled using standard HTTP codes: when there is no answer for a query in a CGI-XML service, a 404 Not Found error is returned; if an internal error happens, a 500 Internal Server Error is used; if no input parameter is specified, a 400 Bad Request error is returned.
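
A client can turn the status-code convention described above into explicit outcomes. The sketch below is an illustration only: the Outcome names and the build_query_url helper are assumptions, and the base URL and parameter names in any call are placeholders, since the real invocation URLs are documented on the iHOP web services site.

```python
from enum import Enum
from urllib.parse import urlencode


class Outcome(Enum):
    OK = "ok"
    NO_RESULT = "no result"        # 404: the query has no answer
    BAD_REQUEST = "bad request"    # 400: no input parameter was given
    SERVER_ERROR = "server error"  # 500: internal error on the server
    UNEXPECTED = "unexpected"


# Status codes named in the text, plus 200 for a normal answer.
_STATUS_TO_OUTCOME = {
    200: Outcome.OK,
    404: Outcome.NO_RESULT,
    400: Outcome.BAD_REQUEST,
    500: Outcome.SERVER_ERROR,
}


def classify_status(status: int) -> Outcome:
    """Map an HTTP status code from a CGI-XML service to an outcome."""
    return _STATUS_TO_OUTCOME.get(status, Outcome.UNEXPECTED)


def build_query_url(base_url: str, **params: str) -> str:
    """Build a GET URL; base_url and parameter names are placeholders."""
    return base_url + "?" + urlencode(params)
```

Treating 404 as an empty result rather than as a failure lets a workflow continue with the next input when a symbol simply has no matching sentences.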

For SOAP (Simple Object Access Protocol), we created several variations of the same web service for each functionality, to simplify workflow building. SOAP services use the RPC/encoded WSDL style, so they can be used from Perl programs with any SOAP::Lite version. Critical errors (no input parameter, internal server error) are reported by the iHOP SOAP services using the standard SOAP fault mechanism. When there is no answer to return, the services return a specific XML structure (iHOPSOAPNotFound) designed for these SOAP services, instead of using the SOAP fault mechanism. This is important because some workflow enactment tools (like Taverna) stop the whole workflow when a SOAP service returns a SOAP fault, which is undesirable when the service invocation itself has not failed.
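
The distinction between a SOAP fault and an empty answer can be sketched on the client side as follows. This is an illustration under assumptions: it treats iHOPSOAPNotFound as an element name somewhere inside the response body, which the text does not specify, and it assumes SOAP 1.1 envelopes. The function and label names are invented for the example.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace (assumed; SOAP 1.2 uses a different one).
SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def classify_soap_response(xml_text: str) -> str:
    """Return 'fault', 'not_found' or 'result' for a raw SOAP response."""
    root = ET.fromstring(xml_text)
    elements = list(root.iter())
    # A SOAP fault signals a critical error (missing input, server error).
    if any(el.tag == f"{{{SOAP_ENV_NS}}}Fault" for el in elements):
        return "fault"
    # Assumption: the 'no answer' structure appears as an element with
    # this local name; check the published schema for the real layout.
    if any(local_name(el.tag) == "iHOPSOAPNotFound" for el in elements):
        return "not_found"
    return "result"
```

A workflow step can then skip inputs classified as 'not_found' and abort only on 'fault'.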

In the design of the BioMOBY services it was necessary to comply with the common object ontology on MOBY Central and the portfolio of services that use this ontology. Although the main iHOP services take the same parameters as input and use the same XML schema as CGI-XML and SOAP for their outputs, the true power of the iHOP MOBY services lies in the additional translation services. These services take as input iHOP XML structures generated by the iHOP services, and translate the content into a collection of usable MOBY objects. This way, other MOBY services which use the same ontology can be chained to this output.

CGI-XML services were tested using both web browsers and command-line HTTP retrieval tools (like wget). We tested and cross-validated the functionality of the iHOP SOAP web services with unit tests based on the Perl SOAP::Lite library and in the context of Taverna (10,11), a workflow enactment tool extensively used by the bioinformatics community. We found that SOAP::Lite 0.60 behaved better than earlier versions and than some of the newer intermediate ones (the latest version is 0.69). Taverna 1.4 and 1.5 are discouraged, because they prune the results of SOAP services. Taverna 1.5.1 solves these and other issues and is recommended. Older versions, like Taverna 1.3.1, also work, but they have many limitations related to BioMOBY services.

RESULTS AND DISCUSSION

The functionality, completeness and usefulness of the iHOP web service APIs are demonstrated by a number of collaborative projects that make programmatic use of iHOP content. Table 3 contains a brief list of the projects where iHOP web services have been used, and Figure 3 shows a CARGO (Cases et al., submitted to NAR-WEB 2007) widget using information provided by iHOP CGI-XML services.

Figure 3. This is a snapshot of the CARGO framework, showing information about P53. The iHOP widget shows sentences with evidence of some relationship between P53 and other genes.

Table 3. Projects where iHOP web services have been (or are being) used

When using iHOP as a web service, it is necessary to be aware of the current limitations of biological text mining. BioCreAtIvE (12) and other blind community assessments (13) have clearly shown that name identification, and in particular matching gene names in the literature with the corresponding database entries, is a hard problem and that the best systems are still far from perfect (14). Our own evaluation of iHOP in 2005 (15) shows that in model organisms the average precision is around 94% and the recall around 87%. Even though additional refinements and dictionaries are producing continuous progress, the poor adherence of the community to naming standards (16) will continue to create problems in this area.
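
To put the two reported figures on a single scale, one can combine them into the balanced F-measure, the harmonic mean of precision and recall. The evaluation cited above reports only precision and recall; the combined value below is computed here for illustration and is not a figure from that evaluation.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Approximate values reported for model organisms: 94% precision, 87% recall.
print(round(f_measure(0.94, 0.87), 3))  # prints 0.904
```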

Other obvious limitations of iHOP and all other current text mining systems are imposed by the limited availability of full text sources [the main reason for the common use of abstract collections (17)] and the still limited possibilities to incorporate effective Natural Language Processing techniques for the extraction of additional features from biomedical text. A more detailed description of the status of this fast developing field can be found in (18–20).

Despite these general limitations in the field, the iHOP web interface has become popular among biologists searching for information about the function and relation of the genes and proteins of their interest. To our knowledge, iHOP is the only large-scale text mining resource in biology that is offered as an open web service. We therefore expect that the novel possibilities described in this work will contribute to the use of iHOP as part of numerous high-throughput analysis environments.

AVAILABILITY

Information relevant to developers, such as detailed documentation of the iHOP web service XML file format, the URLs required to invoke the REST API, the WSDL document describing the SOAP services, and usage examples in Perl and Taverna, is available at http://www.ihop-net.org/UniPub/iHOP/webservices/.

ACKNOWLEDGEMENTS

Funding to pay the Open Access publication charges for this article was provided by ENFIN Network of Excellence (LSHG-CT-2005-518254).

Conflict of interest statement. None declared.

REFERENCES

1. Hoffmann R, Valencia A. A gene network for navigating the literature. Nat. Genet. 2004;36:664–664.
2. Good BM, Kawas EA, Kuo BY, Wilkinson MD. iHOPerator: User-scripting a personalized bioinformatics Web, starting with the iHOP website. BMC Bioinformatics. 2006;7:534.
3. Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2007;35:D5–D12.
4. Labarga A, Pilai S, Valentin F, Anderson M, Lopez R. Web services at EBI. EMBnet.news. 2005;11:18–23.
5. Kreger H. Web Services Conceptual Architecture (WSCA) 1.0. IBM Software Group; 2001.
6. Fielding RT. Architectural Styles and the Design of Network-based Software Architectures. University of California, Irvine: Doctoral dissertation; 2000.
7. Dowell RD, Jokerst RM, Day A, Eddy SR, Stein L. The distributed annotation system. BMC Bioinformatics. 2001;2:7.
8. Wilkinson MD, Links M. BioMOBY: an open source biological web services proposal. Brief. Bioinform. 2002;3:331–341.
9. Wilkinson M, Schoof H, Ernst R, Haase D. BioMOBY successfully integrates distributed heterogeneous bioinformatics Web Services. The PlaNet exemplar case. Plant Physiol. 2005;138:5–17.
10. Oinn T, Addis M, Ferris J, Marvin D, Senger M, Greenwood M, Carver T, Glover K, Pocock MR. Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics. 2004;20:3045–3054.
11. Hull D, Wolstencroft K, Stevens R, Goble C, Pocock MR, Li P, Oinn T. Taverna: a tool for building and running workflows of services. Nucleic Acids Res. 2006;34:W729–W732.
12. Hirschman L, Yeh A, Blaschke C, Valencia A. Overview of BioCreAtIvE: critical assessment of information extraction for biology. BMC Bioinformatics. 2005;6(Suppl. 1):S1.
13. Jin-Dong K, Ohta T, Tsuruoka Y, Tateisi Y, Collier N. Introduction to the bio-entity recognition task at JNLPBA. In: Proceedings of the International Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA-04); 2004. pp. 70–75.
14. Hirschman L, Colosimo M, Morgan A, Yeh A. Overview of BioCreAtIvE task 1B: normalized gene lists. BMC Bioinformatics. 2005;6(Suppl. 1):S11.
15. Hoffmann R, Valencia A. Implementing the iHOP concept for navigation of biomedical literature. Bioinformatics. 2005;21(Suppl. 2):ii252–ii258.
16. Tamames J, Valencia A. The success (or not) of HUGO nomenclature. Genome Biol. 2006;7:402.
17. Schuemie MJ, Weeber M, Schijvenaars BJ, van Mulligen EM, van der Eijk CC, Jelier R, Mons B, Kors JA. Distribution of information in biomedical abstracts and full-text publications. Bioinformatics. 2004;20:2597–2604.
18. Krallinger M, Erhardt RA, Valencia A. Text-mining approaches in molecular biology and biomedicine. Drug Discov. Today. 2005;10:439–445.
19. Hoffmann R, Krallinger M, Andres E, Tamames J, Blaschke C, Valencia A. Text mining for metabolic pathways, signaling cascades, and protein networks. Sci. STKE. 2005;2005:21.
20. Krallinger M, Valencia A. Text-mining and information-retrieval services for molecular biology. Genome Biol. 2005;6:224.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press