AMIA Annu Symp Proc. 2006; 2006: 579–583.
PMCID: PMC1839506

A Web Portal that Enables Collaborative Use of Advanced Medical Image Processing and Informatics Tools through the Biomedical Informatics Research Network (BIRN)

Shawn N. Murphy, MD Ph.D.,1 Michael E. Mendis,1 Jeffrey S. Grethe, Ph.D.,3 Randy L. Gollub, MD Ph.D.,2 David Kennedy, Ph.D.,2 and Bruce R. Rosen, MD Ph.D.2

Abstract

Launched in 2001, the Biomedical Informatics Research Network (BIRN; http://www.nbirn.net) is an NIH-NCRR initiative that enables researchers to collaborate in an environment for biomedical research and clinical information management, focused particularly upon medical imaging. Although it supports a vast array of programs that transform and compute upon medical images, three fundamental problems emerged that inhibited collaborations. First, the complexity of the programs, and at times legal restrictions, prevented these programs from being accessible to all members of the teams, and indeed to the general researcher, although such access was a fundamental mission of the BIRN. Second, the calculations that needed to be performed were very complex and required many steps that often had to be performed by different groups. Third, many of the analysis programs were not interoperable. Together, these problems created tremendous logistical difficulties. The solution was to create a portal-based workflow application that allowed the complex, collaborative tasks to take place and enabled new kinds of calculations that had not previously been practical.

INTRODUCTION

The ability to send data through a succession of software programs is critical for the successful analysis of complex images. Over the years, various groups have developed “data pipelines” to handle these processes, many of which are simple scripts and some of which are entire applications. Although these pipelines are effective in their local environments, they tend to fail under circumstances where a high degree of collaboration is required in a calculation. Being local, they are also not easily transferable to other institutions where calculations are being tested for reproducibility or extended for further experimentation. They are not very effective at keeping data organized by research subject and content. Finally, the pipelines are not available to the clinical researcher as the scope of collaboration expands into domains such as genomics and epidemiology.

Nonetheless, the current state of the art for image processing exists in these data pipeline applications. Perhaps the most sophisticated is the LONI pipeline1 from the Laboratory of Neuro Imaging at the University of California at Los Angeles. Others in use include the Kepler pipeline2 from the University of California at Berkeley and at San Diego, as well as the jBPM workflow engine from JBoss (http://jboss.com).

A system was envisioned that could consume the existing pipeline applications and achieve the following goals: 1) allow software produced by the BIRN to be made available to people inside and outside of the BIRN group, 2) allow a consistent computing platform of BIRN software to be maintained with special attention to metadata and data provenance, and 3) allow study metadata to be tightly organized across groups to allow for collaboration and comparison of results.

METHODS

The potential for a portal-based solution was appreciated by the Morphometry BIRN test bed. In a portal-based solution, a website accepts uploaded images and hosts the computing machinery (both hardware and software) so that image processing can be initiated through the website. The resulting transformed images are then returned to the users, along with summarized numerical results of any calculations upon the images, such as the volumes of segmented structures.

To understand what would be required from the portal-based solution, we surveyed the needs across all of the Morphometry BIRN partner sites, which included two groups at Harvard University, one at Johns Hopkins University, one at Washington University, one at the University of California at San Diego, one at the University of California at Irvine, and one at the University of California at Los Angeles. The requirements that emerged were as follows:

  1. The system must be able to incorporate the pipeline tools currently available for image processing, including LONI, Kepler, and jBPM.
  2. The system must allow human review of intermediate results of calculations. The most common use case supporting this requirement is the review of images to ensure that calculations have not resulted in a gross error or converged to an irrelevant local minimum.
  3. The system must allow human handoffs. Several projects exist within the BIRN in which different groups perform different portions of a calculation; therefore, a process by which one group automatically indicates which calculations are ready for the next group is necessary.
  4. The system must manage data provenance so that calculations can be reproduced accurately.
  5. The system must be available both for direct human interaction through a set of web pages and to software processes through a set of services, so that other computerized systems can call and interact with it directly.
  6. The system must provide a clear plan for representing the results of calculations and allow those results to be accessed by direct viewing or through software processes.
  7. The system must provide the security, scalability, and reliability expected of a multi-user production system.

[Figure: amia2006_0579f1.jpg]

An example process is the Semi-Automated Shape Analysis pipeline (SASHA), as shown in the figure above. First, 3D structural MRI data of the brain with good gray-white matter contrast-to-noise ratio is acquired at a participating site. In order to be shared, the image data has to be de-identified within the site’s firewall: patient information is removed from the image headers and face information is stripped from the images while leaving the brain intact. The de-identified data then needs to be uploaded to a common site where it can be accessed by other participating sites. Second, the de-identified structural brain MRI data is automatically segmented using MGH’s Freesurfer morphometry tools. The derived segmented data (e.g., the hippocampal surfaces) is consumed by the JHU site and used for shape analysis with their Large Deformation Diffeomorphic Metric Mapping (LDDMM) tool3. The combined morphometric results (surfaces, volumes, labels, deformation fields) can be viewed from the database using 3D Slicer as the common visualization platform.
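The header-scrubbing portion of de-identification can be pictured as removing a fixed set of identifying attributes before upload. The following is a minimal, hypothetical Java sketch of that step; the field names and the header map are illustrative assumptions, not part of any BIRN tool, and real de-identification also defaces the image voxels themselves.

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of header scrubbing during de-identification.
public class HeaderScrubber {

    // Illustrative list of identifying header fields to remove.
    private static final List<String> IDENTIFYING_FIELDS = Arrays.asList(
            "PatientName", "PatientID", "PatientBirthDate", "InstitutionName");

    /** Returns a copy of the header with identifying fields removed
        and the subject re-keyed to a site-assigned study code. */
    public static Map<String, String> scrub(Map<String, String> header, String studyCode) {
        Map<String, String> clean = new HashMap<String, String>(header);
        for (String field : IDENTIFYING_FIELDS) {
            clean.remove(field);
        }
        clean.put("SubjectCode", studyCode); // coded identifier replaces patient identity
        return clean;
    }
}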

Current pipeline tools are used to work with this data at the various local sites. It was imperative that the portal not require the functionality of these tools to be reinvented, because this would not be an efficient use of BIRN resources. For example, the MGH Freesurfer calculation consists of over 40 steps, and we did not wish to redo that workflow in a new portal-based tool. Therefore, the web portal needed to assimilate the following pipeline applications:

  1. Kepler2 (http://www.kepler-project.org). Kepler is a visual modeling tool written in Java, begun in 1997 at UC Berkeley. Several recent efforts have extended the Ptolemy-II platform (http://ptolemy.eecs.berkeley.edu/) to allow drag-and-drop creation of scientific workflows from libraries of actors. A Ptolemy actor is often a wrapper around a call to a web service or grid service. Ptolemy uses an XML meta-language called the Modeling Markup Language (MoML) to produce a workflow document describing the relationships of the entities, properties, and ports in a workflow. Creating a workflow with the Ptolemy software centers on writing Java classes that extend a built-in Actor class (a minimal actor sketch appears after this list).
  2. LONI pipeline1 (http://www.loni.ucla.edu/twiki/bin/view/Pipeline). The LONI Pipeline is a visual environment for constructing complex scientific analyses of data. It is written in Java and uses an OWL-based XML representation of the workflow. It also takes advantage of supercomputing environments by automatically parallelizing data-independent programs in a given analysis whenever possible.
  3. jBPM (http://www.jboss.com/products/jbpm). The primary focus of JBoss jBPM development has been the BPM (business process management) core engine. Besides further development of the engine, the JBoss roadmap for jBPM focuses on three areas: a) native BPEL support, b) a visual designer for modeling workflows, and c) enhanced process management capabilities. jBPM can run standalone in a Java VM, inside any Java application, inside any J2EE application server, or as part of an enterprise service bus.
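To give a feel for the actor model mentioned in item 1, the following is a minimal sketch of an actor class, assuming the standard Ptolemy II TypedAtomicActor and TypedIOPort classes. The actor itself (a trivial pass-through of the kind that could wrap a service call) is hypothetical and not part of the BIRN distribution.

import ptolemy.actor.TypedAtomicActor;
import ptolemy.actor.TypedIOPort;
import ptolemy.data.StringToken;
import ptolemy.data.type.BaseType;
import ptolemy.kernel.CompositeEntity;
import ptolemy.kernel.util.IllegalActionException;
import ptolemy.kernel.util.NameDuplicationException;

// Minimal sketch of a Ptolemy II actor; a real BIRN actor would typically
// wrap a call to a web or grid service in fire().
public class EchoActor extends TypedAtomicActor {

    public TypedIOPort input;
    public TypedIOPort output;

    public EchoActor(CompositeEntity container, String name)
            throws IllegalActionException, NameDuplicationException {
        super(container, name);
        input = new TypedIOPort(this, "input", true, false);   // input port
        output = new TypedIOPort(this, "output", false, true); // output port
        input.setTypeEquals(BaseType.STRING);
        output.setTypeEquals(BaseType.STRING);
    }

    /** Called by the Ptolemy director on each firing. */
    public void fire() throws IllegalActionException {
        super.fire();
        if (input.hasToken(0)) {
            StringToken token = (StringToken) input.get(0);
            output.send(0, token); // pass the value downstream unchanged
        }
    }
}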

The ability to use web services provides a way to perform distributed computing and, in a grander scheme, a way to allow rapid deployment of new computational algorithms. This is achieved by having the web service owned and maintained by the group that is actually developing the specific computational algorithm at its local site.

RESULTS

The goal of the Portal was not to produce new software, but rather to link together and support existing BIRN software so that it could be used more effectively by various groups of collaborating users. To achieve this goal, we architected the system as shown in the diagram below. Because the BIRN is dedicated to open source solutions, all parts of the infrastructure, including the Kepler pipeline engine, are freely available to the public as open source projects. In the diagram, the parts built by the authors of this paper are shown in dark gray, while pre-existing software that was integrated into the solution is shown in light gray.

The system relies on uploads and downloads of images and other accompanying data to and from an open-source file management system named the Storage Resource Broker (SRB, available at http://www.sdsc.edu/srb/). The SRB provides a way to access data sets and resources based on their attributes and/or logical names rather than their names or physical locations and allows file security to be managed on a network shared resource in conjunction with the Grid Account Management Architecture (GAMA, available at http://grid-devel.sdsc.edu/gama). The GAMA system is used for authorization and authentication and consists of two components, a backend security service that provides secure management of credentials, and a front-end set of portlets and clients that provide tight integration into web/grid portals4.

The main system software is divided between a Web Server and an Execution Server to comply with the general architecture of the BIRN portal. The Execution Server has access to a Condor grid (http://www.cs.wisc.edu/condor/). We chose jBPM as the principal engine for scheduling and executing other applications because it is a reliable, open source workflow engine that is particularly geared toward human handoffs in a workflow. Its out-of-the-box functionality includes a set of services that allows breakpoints to be defined where the workflow enters a “wait” state until human intervention occurs. This provides the opportunity for handoffs between groups and for intermediate calculations to be checked. Additional required software includes the open source web portal framework GridSphere (http://www.gridsphere.org/gridsphere/gridsphere) and the open source Apache Tomcat project (http://www.apache.org/).
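As an illustration of this wait-state behavior, the following minimal sketch uses the jBPM 3 API with a toy process definition standing in for a real BIRN workflow; the process and state names are illustrative assumptions only.

import org.jbpm.graph.def.ProcessDefinition;
import org.jbpm.graph.exe.ProcessInstance;
import org.jbpm.graph.exe.Token;

// Minimal sketch of a jBPM 3 process with a wait state where the workflow
// pauses until a human signals that intermediate results have been reviewed.
public class WaitStateSketch {
    public static void main(String[] args) {
        ProcessDefinition definition = ProcessDefinition.parseXmlString(
            "<process-definition name='review-sketch'>" +
            "  <start-state><transition to='review' /></start-state>" +
            "  <state name='review'>" +      // wait state: human checks intermediate images
            "    <transition to='end' />" +
            "  </state>" +
            "  <end-state name='end' />" +
            "</process-definition>");

        ProcessInstance instance = new ProcessInstance(definition);
        Token token = instance.getRootToken();

        token.signal(); // leaves the start state and pauses in 'review'
        // ... a reviewer inspects intermediate results through the portal ...
        token.signal(); // handoff complete: the workflow proceeds to 'end'
    }
}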

[Figure: amia2006_0579f2.jpg]

We combined the above pieces with a custom-designed workflow portlet that drives web access to the infrastructure, a J2EE-based (http://java.sun.com/javaee/index.jsp) interface to some of the deeply embedded functionality of jBPM, custom interfaces to the Kepler and LONI pipeline engines, and a versatile database that tracks workflows and stores results. All of the custom-designed BIRN pieces of this workflow solution will be made available through the BIRN website in the spring of 2007.

Controlling the versions of the software used to perform calculations is important for guaranteeing reproducible image processing. An important design principle of the custom workflow portlet architecture is that it defines calculation “zones” that use consistent versions of the Java Virtual Machine, the pipeline engine (jBPM, Kepler, or LONI), and all associated programs used in the calculation. To this end, users may not upload new programs and must restrict themselves to software available in a predefined calculation zone. These zones are defined by BIRN administrators.
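The zone concept can be thought of as an immutable descriptor that pins every version used in a calculation. The following is a hypothetical sketch of such a descriptor; the class and field names are ours, not part of the BIRN codebase.

import java.util.Collections;
import java.util.Map;

// Hypothetical descriptor of a calculation "zone": a frozen combination of
// JVM, pipeline engine, and analysis tool versions defined by administrators.
public final class CalculationZone {

    private final String zoneId;
    private final String jvmVersion;
    private final String engineName;     // e.g. "jBPM", "Kepler", or "LONI"
    private final String engineVersion;
    private final Map<String, String> toolVersions; // tool name -> pinned version

    public CalculationZone(String zoneId, String jvmVersion, String engineName,
                           String engineVersion, Map<String, String> toolVersions) {
        this.zoneId = zoneId;
        this.jvmVersion = jvmVersion;
        this.engineName = engineName;
        this.engineVersion = engineVersion;
        this.toolVersions = Collections.unmodifiableMap(toolVersions);
    }

    /** Users may only reference tools already pinned in the zone. */
    public String requireTool(String toolName) {
        String version = toolVersions.get(toolName);
        if (version == null) {
            throw new IllegalArgumentException(
                "Tool " + toolName + " is not available in zone " + zoneId);
        }
        return version;
    }
}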

The functionality works as follows; the creation of a request to run a workflow is illustrated in the figure above. Workflows are stored as objects and are instantiated when a request is created through the “Request” form of the User Interface (UI). The Request form starts the workflow, which in the figure is a predefined LONI workflow overseen by jBPM (all child workflow applications are ultimately overseen by the jBPM workflow engine). Data that had previously been uploaded to the SRB is retrieved. As the workflow starts, runs, and finishes, updates are written to the custom database (DB), from which they are displayed in the “Confirm Request” and “Check on Request” forms of the UI. Upon completion, the “Check on Request” form shows confirmation of the run, and the resulting image data is then downloaded from the SRB; numerical results are downloaded from the custom DB. The “Check on Request” form also allows the intermediate states of the workflow to be checked and acted upon. The UIs in the diagram are also designed to be available as web service calls so that other client applications can communicate the state of the workflows to the user.
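Because the same Request, Confirm Request, and Check on Request operations are meant to be reachable as web service calls, one can imagine a simple service interface along the following lines. This is a hypothetical sketch only; the operation names and parameters are ours, not the actual BIRN service contract.

// Hypothetical service interface mirroring the Request, Confirm Request, and
// Check on Request forms so that other clients can drive workflows programmatically.
public interface WorkflowRequestService {

    /** Starts a predefined workflow on data previously uploaded to the SRB;
        returns an identifier for the new workflow instance. */
    String requestWorkflow(String workflowName, String calculationZoneId, String srbDataPath);

    /** Confirms that a request was accepted and records the confirmation in the custom DB. */
    boolean confirmRequest(String workflowInstanceId);

    /** Reports the current state of the instance (waiting, running, finished, failed). */
    String checkOnRequest(String workflowInstanceId);

    /** Signals a workflow paused at a human-review breakpoint to continue. */
    void approveIntermediateResults(String workflowInstanceId);
}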

The Custom DB stores Entity-Attribute-Value combinations5 in a star schema6. The database schema does not change as new data sources are added. New data will result in additional rows added to the fact and dimension tables, but new columns and tables do not need to be added for each new data source. This is very useful in large projects such as the BIRN where there are many tools depending upon a specific database schema. A strategy where the database grows by adding rows for new data rather than adding new tables and columns allows tools developed to work with one kind of data to also work with a new type of data.

Attribute definitions are managed through a concept dimension table that ensures the integrity of the ontology and provides easily managed ad-hoc query capabilities. This strategy is also used for maintaining the data provenance.
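As a concrete illustration of this row-oriented growth, the following sketch stores a new kind of derived result as a fact row keyed to a concept code, using plain JDBC. The table names, column names, connection string, and all inserted values are hypothetical placeholders, not the actual custom DB schema or real data.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Hypothetical EAV-style insert: a new kind of measurement becomes a new row
// in the fact table (and a new concept row), never a new column or table.
public class EavInsertSketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/birn_portal", "portal", "secret"); // illustrative

        // Register the attribute once in the concept dimension.
        PreparedStatement concept = conn.prepareStatement(
                "INSERT INTO concept_dim (concept_code, concept_name) VALUES (?, ?)");
        concept.setString(1, "MRI:LEFT_HIPPO_VOL");
        concept.setString(2, "Left hippocampal volume (mm^3)");
        concept.executeUpdate();

        // Store one derived result as an entity-attribute-value fact row.
        PreparedStatement fact = conn.prepareStatement(
                "INSERT INTO result_fact (subject_code, concept_code, numeric_value, workflow_id) "
              + "VALUES (?, ?, ?, ?)");
        fact.setString(1, "SUBJ0042");           // de-identified subject (entity)
        fact.setString(2, "MRI:LEFT_HIPPO_VOL"); // attribute, via the concept dimension
        fact.setDouble(3, 3512.7);               // value (placeholder number)
        fact.setString(4, "wf-2006-001");        // provenance link to the workflow run
        fact.executeUpdate();

        conn.close();
    }
}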

DISCUSSION

The development of this portal-based workflow application framework allows BIRN applications that previously were not generally available to become accessible to the general clinical researcher. This expands the impact that the BIRN can make on clinical research and allows efficient sharing of available hardware resources. Additionally, the workflow application stabilizes the BIRN calculation process, enables more efficient collaborations, and supports an informatics-oriented revision of the BIRN platform in which ontology systems are effectively used for the storage and retrieval of results. Finally, an emergent property of the system is that both the raw and derived medical image data are stored in a format that is compatible with advanced medical informatics systems of analysis.

Beyond collaboration, a well-functioning portal enables not only the initial calculation of experimental results, but also their recalculation for verification and the exploration of their parameter space. The amount of change in a result per change in an initial parameter can be graphed across the parameter space, and such graphs help show where care must be taken with the initial estimates of the parameters.
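One simple way to picture this kind of parameter exploration is as a finite-difference sensitivity estimate around the chosen initial values. The sketch below assumes a hypothetical runPipeline function mapping one parameter setting to one summary result; both the function and its values are illustrative stand-ins, not a BIRN calculation.

// Hypothetical finite-difference sensitivity of a pipeline result with respect
// to one initial parameter; large values flag parameters needing careful estimates.
public class SensitivitySketch {

    /** Placeholder for a full portal workflow run returning one summary number. */
    static double runPipeline(double parameterValue) {
        return 3500.0 + 40.0 * Math.sin(parameterValue); // illustrative stand-in
    }

    /** Central-difference estimate of d(result)/d(parameter) at p. */
    static double sensitivity(double p, double delta) {
        return (runPipeline(p + delta) - runPipeline(p - delta)) / (2.0 * delta);
    }

    public static void main(String[] args) {
        for (double p = 0.0; p <= 2.0; p += 0.5) {
            System.out.printf("parameter=%.1f sensitivity=%.2f%n", p, sensitivity(p, 0.01));
        }
    }
}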

Disadvantages of the portal exist; some have potential solutions, while others are inherent to the architecture. Because of the careful ontology mapping and data provenance tracking requirements, more time must be spent setting up a calculation. This discourages quick, ad-hoc calculations from being performed. If one is in the initial stages of using a new application to perform calculations, the portal will be cumbersome. The de-identification of data prior to its use in calculations is also cumbersome in the initial phases of a project. We are currently optimizing this process, and a software solution appears likely to alleviate the problem. Finally, the architecture requires that hardware be available to perform the portal calculations. Grid-enabling the architecture is part of the solution and may allow very effective distribution of the calculations over available national resources.

Setting up the BIRN analysis portal allows general use of BIRN resources and enables effective collaborations between sites. It allows greater exploration of recalculated experiments and the ability to routinely explore complex parameter spaces. The BIRN analysis portal is built as a completely open source solution and is based upon existing workflow expression standards and architecture. The requirements of the BIRN analysis portal are common to those of other large projects, offering the opportunity for code and design reuse.

Acknowledgments

This work was supported by the Morphometry BIRN (U24-RR021382) and the BIRN Coordinating Center (U24-RR019701) (Biomedical Informatics Research Network, http://www.nbirn.net), a National Center for Research Resources Project, U.S.A.

References

1. Rex DE, Ma JQ, Toga AW. The LONI Pipeline Processing Environment. NeuroImage. 2003;19:1033–1048.
2. Ludäscher B, Altintas I, Berkley C, Higgins D, Jaeger-Frank E, Jones M, Lee E, Tao J, Zhao Y. Scientific Workflow Management and the Kepler System. Concurrency and Computation: Practice & Experience. 2005; published online 13 Dec 2005.
3. Beg MF, Buckner R, Fischl B, Park Y, Ceyhan E, Priebe C, Ceritoglu C, Kolasny A, Brown T, Quinn B, Yu P, Gold B, Ratnanather JT, Miller M; BIRN Brain Morphometry. Pattern classification of hippocampal shape analysis in a study of Alzheimer's disease. Human Brain Mapping Conference; 2005.
4. Bhatia K, Mueller K, Chandra S. GAMA: Grid Account Management Architecture. IEEE International Conference on e-Science and Grid Computing; Dec 2005.
5. Kimball R. The Data Warehousing Toolkit. New York: John Wiley; 1997.
6. Nadkarni PM, Brandt C. Data Extraction and Ad Hoc Query of an Entity-Attribute-Value Database. J Am Med Inform Assoc. 1998;5:511–517.
7. Murphy SN, Gainer VS, Chueh H. A Visual Interface Designed for Novice Users to Find Research Patient Cohorts in a Large Biomedical Database. AMIA Annu Symp Proc. 2003:489–493.
