AMIA Annu Symp Proc. 2010; 2010: 617–621.
Published online Nov 13, 2010.
PMCID: PMC3041280

The TRITON Project: Design and Implementation of an Integrative Translational Research Information Management Platform

Abstract

Multi-site consortia have become the preferred setting for team-based translational research programs. Such consortia are able to facilitate increased breadth and depth of basic science and clinical research activities, but also present numerous challenges related to data collection, analysis, storage, and exchange. The Chronic Lymphocytic Leukemia (CLL) Research Consortium (CRC), as a prototypical instance of such a consortium, uses numerous loosely coupled web applications to address its informatics needs. Over a decade of operations has allowed the CRC to identify usability and computational limitations relative to the preceding information management architecture. In response, the CRC has launched the TRITON project, with the ultimate objective of developing an open-source, extensible, and fully integrative translational research information management platform. In this manuscript, we describe the architecture, design processes, and initial implementation of that platform.

Introduction and Background

The Chronic Lymphocytic Leukemia Research Consortium (CRC, http://cll.ucsd.edu) is an NCI-funded program/project (P01CA081534) consisting of eight sites. Initially funded in 1999, the CRC coordinates and facilitates an integrated translational research program, with specific emphasis on basic and clinical research targeting the genetic, biochemical and immunologic bases of Chronic Lymphocytic Leukemia (CLL). A critical facility supporting the ability of the CRC to engage in such research is the use of shared data repositories, associated data collection instruments, and data mining and analysis tools. The CRC Integrated Information Management System (CIMS) is the data management system currently used by the consortium, incorporating: 1) multiple task-specific web portal interfaces supporting clinical trial, basic science and tissue bank data management; and 2) a set of shared data repositories. CIMS facilitates the collection and storage of numerous heterogeneous bio-molecular data sources generated by instrumentation and methodological approaches including: quantitative and qualitative immunophenotyping, multiple modalities of gene expression analysis, and Fluorescent In Situ Hybridization (FISH) analyses of cytogenetic abnormalities. CIMS was initially deployed for use by the CRC in 2000 and, at the time of this submission, is being used to collect, manage and analyze data for well over 5000 patients involved in multiple clinical trial modalities, as well as hundreds of thousands of CLL-specific tissue samples. Although CIMS has satisfied the informatics requirements of the CRC over the past ten years, CRC participants have identified numerous usability and computational limitations of CIMS, including:

  • A reliance upon proprietary software architectures and standards, thus limiting the extensibility of CIMS to other, analogous clinical and translational research programs, as well as the exchange of data with external systems that utilize modern electronic data interchange mechanisms;
  • A loose coupling of constituent data entry, management and query tools, which introduces significant complexity to the design and execution of data integration and analysis workflows that span multiple levels of granularity from bio-molecules to clinical phenotypes; and
  • A complex human-computer interface model that requires significant end-user training and acculturation in order to ensure optimal system utilization and high quality data.

Motivated by these limitations, the CRC has launched the TRITON (Translational Research Information Technology Omnibus) project, in order to re-engineer the current CIMS platform and develop a highly usable, extensible, standards-based, open-source, and integrative translational research information management platform. A primary goal of these efforts is to enable integration between TRITON and the basic science, clinical research and translational science focused data management tools and interchange mediums associated with the NCI’s Cancer Biomedical Informatics Grid (caBIG) initiative, including the caGrid service-oriented middleware (1, 2). In doing so, our objective is to increase the translational capacity of the CRC by enabling consortium investigators to discover, integrate, analyze and disseminate heterogeneous, multi-dimensional data sets. It is anticipated that many of these data sets will be generated by high-throughput bio-molecular technologies or instrumentation, as well as electronic health record (EHR) systems that are currently in use at the majority of CRC sites.

Methods

In the following sub-sections, we will describe three complementary and concurrent axes consisting of both technologies and methodologies, which collectively are being used to design and implement the TRITON platform.

Axis 1: Technology Migration

The first objective of the TRITON project is to migrate the existing CIMS database management systems and web-based interface applications to a standards-based, open-source software platform. This migration is necessary because many CIMS components currently rely on an operational data repository that is implemented using proprietary relational database management systems, web application platforms, and programming languages. Such dependence significantly reduces the extensibility and adoptability of CIMS to other, analogous research programs. The specific open-source components we are utilizing for the TRITON project include: 1) the MySQL relational database management system; 2) the LifeRay web portal platform; 3) J2EE (JSR 168) compliant portlets; 4) the caGrid electronic data interchange (EDI) middleware (1, 2) and GAARDS grid-based user authentication and authorization system (3); and 5) the caTissue bio-repository management suite (4). In order to mitigate potential workflow disruptions associated with the migration of an actively utilized information management system to a new software “stack”, we are implementing the above technologies in a phased manner, as summarized in Table 1. Of note, at the time of submission, the TRITON project has completed Phases 1-2 of this technology migration, and is actively involved in Phase 3.

Table 1:
Technology migration phases.

Axis 2: Service-oriented Data Interchange

The second objective of the TRITON project is to develop a foundational domain model, derived from CIMS-specific workflows, that maps constituent objects and attributes to NCI EVS-compliant concept definitions, and to utilize that model to build caGrid “wrappers” capable of supporting the interoperability of TRITON with external caBIG electronic data interchange standards and scientific analysis applications. The first part of this objective has been accomplished via a multi-step process consisting of:

  1. Deploying an instance of the open-source LexEVS terminology server (8) and openMDR metadata repository and toolkit (9);
  2. Employing model-driven architecture techniques to formally represent, using the Unified Modeling Language (UML), generalizable domain models derived from CIMS-specific workflows and functional components; and
  3. Semantically annotating such domain models using both standard (e.g., caDSR) and locally relevant LexEVS-annotated metadata via the openMDR toolkit.

The rationale for this approach is that the extensibility afforded by using a terminology management platform, such as LexEVS, and an ISO11179-compliant metadata management system, such as openMDR, will enable: 1) external semantic interoperability with the NCI-EVS and caDSR; and 2) the incorporation of additional terminology or ontology standards or services that already exist, that may evolve during the course of this project, or that are required by other adopters in the future. This is particularly desirable to ensure that TRITON can be generalized beyond the immediate oncology domain. Borlawsky et al. (5) provide a more detailed description of the model-driven architecture and knowledge engineering techniques being utilized by our team.
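
As a rough illustration of what annotating a domain model with terminology concepts can look like, the following Python sketch attaches concept references to the attributes of a simplified class and reports which attributes still lack an annotation. The class name, attribute names, terminology source labels, and concept codes are all hypothetical placeholders; they are not drawn from caDSR, the NCI EVS, or the TRITON models, and the code does not call LexEVS or openMDR.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConceptRef:
    """Pointer to a concept in some terminology (values here are placeholders)."""
    source: str  # label for a standard or locally managed terminology
    code: str    # concept identifier within that terminology


@dataclass
class AnnotatedAttribute:
    name: str
    datatype: str
    concepts: list = field(default_factory=list)


@dataclass
class AnnotatedClass:
    name: str
    attributes: dict = field(default_factory=dict)

    def annotate(self, attr_name, datatype, *concepts):
        """Register an attribute together with the concepts that define its meaning."""
        self.attributes[attr_name] = AnnotatedAttribute(attr_name, datatype, list(concepts))

    def unannotated(self):
        """Return the names of attributes that have no concept annotation yet."""
        return sorted(a.name for a in self.attributes.values() if not a.concepts)


# Hypothetical domain class with placeholder concept codes.
participant = AnnotatedClass("Participant")
participant.annotate("diagnosis", "string", ConceptRef("STANDARD", "PLACEHOLDER-001"))
participant.annotate("enrollment_site", "string", ConceptRef("LOCAL", "PLACEHOLDER-002"))
participant.annotate("free_text_note", "string")

print(participant.unannotated())
```

Keeping the terminology source alongside each code is one way to let standard and locally defined concepts coexist on the same attribute, which is the kind of flexibility the approach above is aiming for.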

The second component of this objective, the implementation of caGrid-compliant wrappers that leverage the preceding ontology-anchored domain models, will be accomplished using the caGrid Data Service Framework and Introduce toolkit (1, 6). This will ensure that where appropriate, TRITON data sets will be caBIG-compliant, and therefore interoperable with other nationwide efforts. Our initial objective in this regard is to implement a bidirectional wrapper that will support the execution of queries against the TRITON participant registry, study calendar and protocol metadata, by enabling the mapping between SQL and CQL (Common Query Language), an axiomatic logical query syntax that is supported by the caGrid middleware. The wrapper will utilize an instance-specific rule base to define the semantics and logic of such mappings. The rules will be defined in terms of both local data type definitions and ontology-anchored concept definitions, maintained in the project’s LexEVS and openMDR instances.
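
The rule-driven mapping described above can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions, not the TRITON wrapper: it handles only one direction (a simplified object-style query to SQL), it represents the query as a plain dictionary rather than in the actual CQL syntax, and it supports only flat attribute constraints joined by AND. The rule base contents, table and column names, and the predicate names (modeled loosely on CQL) are hypothetical.

```python
# Hypothetical rule base: maps domain-model names to local tables and columns.
RULES = {
    "Participant": {
        "table": "participant_registry",
        "attributes": {"site": "site_code", "status": "enrollment_status"},
    }
}

# Predicates supported by this sketch and the SQL operators they map to.
PREDICATES = {"EQUAL_TO": "=", "NOT_EQUAL_TO": "<>", "LIKE": "LIKE"}


def to_sql(query, rules=RULES):
    """Translate a simplified object query into parameterized SQL.

    `query` is a dict with a target class name and a list of attribute
    constraints, all of which are combined with AND.
    """
    rule = rules[query["target"]]
    clauses, params = [], []
    for constraint in query.get("attributes", []):
        column = rule["attributes"][constraint["name"]]
        operator = PREDICATES[constraint["predicate"]]
        clauses.append(f"{column} {operator} ?")
        params.append(constraint["value"])
    sql = f"SELECT * FROM {rule['table']}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


example = {
    "target": "Participant",
    "attributes": [
        {"name": "site", "predicate": "EQUAL_TO", "value": "SITE-A"},
        {"name": "status", "predicate": "LIKE", "value": "ACTIVE%"},
    ],
}
sql, params = to_sql(example)
print(sql)
print(params)
```

Because table and column names come only from the rule base and values are passed as parameters, a query that names an unmapped class or attribute fails with a `KeyError` instead of reaching the database.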

Axis 3: Integration of Novel Data and Tools

The third objective of this project is to extend the existing CIMS operational data repository and web-based interface applications to support the collection, storage and analysis of novel bio-molecular and phenotypic data sets. This objective will be primarily satisfied through the adoption and integration of existent or emergent caBIG-developed data storage and analysis platforms. A primary goal in the context of this objective is to support tissue sample and correlative phenotypic data capture in the context of longitudinal studies. This goal will be accomplished using a two-part approach: 1) an instance of the caGrid-compatible caTissue Suite bio-specimen management system will be deployed to manage CRC tissue core logistics, and integrated with the previously described LifeRay portal interface; and 2) an instance of the open-source Jess production rules system, with an accompanying ontology-anchored rule base, will be deployed and linked to the LifeRay portal interface in order to generate data-driven messages via both the web-portal interface and e-mail, based upon the axiomatic rules defined in the rule base. A primary use of this decision support mechanism will be to employ rule-based alerting and prompts in order to increase compliance with bio-specimen and correlative data collection protocols.
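
To make the idea of rule-based alerting concrete, the following Python sketch evaluates two simple rules against scheduled collection events and produces reminder and overdue messages. It is a plain-Python stand-in written for illustration only; it does not use Jess or its rule language, and the event fields, rule logic, seven-day window, participant identifiers, and dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ScheduledCollection:
    participant_id: str
    item: str            # what must be collected, e.g. a sample or a form
    due: date
    collected: bool = False


def overdue_rule(event, today):
    """Fire when an uncollected item is past its due date."""
    if not event.collected and event.due < today:
        return f"OVERDUE: {event.item} for {event.participant_id} was due {event.due.isoformat()}"
    return None


def upcoming_rule(event, today, window_days=7):
    """Fire when an uncollected item falls due within the next few days."""
    if not event.collected and today <= event.due <= today + timedelta(days=window_days):
        return f"REMINDER: {event.item} for {event.participant_id} is due {event.due.isoformat()}"
    return None


RULES = [overdue_rule, upcoming_rule]


def evaluate(events, today):
    """Apply every rule to every event and collect the resulting messages."""
    messages = []
    for event in events:
        for rule in RULES:
            message = rule(event, today)
            if message:
                messages.append(message)
    return messages


# Made-up schedule for illustration.
events = [
    ScheduledCollection("P001", "blood sample", date(2010, 1, 10)),
    ScheduledCollection("P002", "follow-up form", date(2010, 1, 18)),
    ScheduledCollection("P003", "blood sample", date(2010, 1, 5), collected=True),
]
for line in evaluate(events, today=date(2010, 1, 15)):
    print(line)
```

In a deployment along the lines described above, the returned messages would be routed to the portal interface or to e-mail; here they are simply printed.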

The second goal in the context of the preceding objective is to facilitate on-demand, integrative query and analysis of tissue sample availability and corresponding phenotype and bio-molecular data sets, using a combination of the following components: 1) ontology-anchored data definition and integration schemas; 2) caGrid-based electronic data interchange platforms; and 3) a web portal application intended to support the discovery, integration and interchange of heterogeneous biomedical data sets using the two preceding components. Specifically, we will deploy an instance of the TOKEn conceptual knowledge discovery platform and web portal (10). This portal will be integrated with the overall TRITON LifeRay portal in order to provide an integrative and federated query mechanism that spans the re-factored CIMS data repository, caTissue data repository and associated tissue annotations, and any pertinent and appropriately grid-enabled correlative or external data sources. The ultimate rationale behind this approach is that by leveraging caGrid-compatible portal technologies in conjunction with local data repositories, caGrid-compliant data repositories and potentially data analysis services, CRC investigators and staff will be able to realize the efficiencies associated with the creation of truly translational informatics pipelines, as have been previously described by Kickinger and colleagues (11).
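
The federated query pattern referred to above can be illustrated with a small Python sketch in which each data source exposes the same fetch interface and the results are merged per participant. This is only a conceptual outline: the in-memory dictionaries stand in for the clinical and bio-specimen repositories, the source names, field names, and identifiers are hypothetical, and no caGrid, caTissue, or TOKEn interfaces are involved.

```python
from collections import defaultdict


def federated_query(sources, participant_ids):
    """Ask every source about the given participants and merge the answers.

    `sources` maps a source name to a function that takes a list of
    participant identifiers and returns {participant_id: record}.
    """
    merged = defaultdict(dict)
    for name, fetch in sources.items():
        for pid, record in fetch(participant_ids).items():
            merged[pid][name] = record
    return dict(merged)


def make_fetcher(store):
    """Wrap an in-memory dict so it behaves like a queryable source."""
    def fetch(ids):
        return {pid: store[pid] for pid in ids if pid in store}
    return fetch


# Hypothetical in-memory stand-ins for two repositories.
CLINICAL = {"P001": {"trial": "T-1"}, "P002": {"trial": "T-2"}}
SPECIMENS = {"P001": {"available_aliquots": 3}}

sources = {"clinical": make_fetcher(CLINICAL), "specimens": make_fetcher(SPECIMENS)}
result = federated_query(sources, ["P001", "P002"])
print(result)
```

Participants missing from a source simply have no entry for it in the merged result, so a caller can tell which repositories contributed data for each participant.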

Results

Table 2 summarizes the current state of the TRITON project, relative to the three axes described in the preceding section. Relative to the user-centered design methodologies introduced in our description of Phase 3 of the project, an iterative series of focus-group sessions has been conducted with members of the CRC in order to identify critical human-factors concerns relative to current CIMS functionality and future TRITON functionality. Thematic analyses of such sessions have identified a number of recurring and high-priority areas, as summarized below:

  • Reduction of redundant data entry, especially between clinical trial and bio-specimen management systems, is a significant enabler of efficient and timely entry of quality research data sets;
  • Tight integration of research administrative functionality (e.g., document management, calendaring) with existing productivity software, such as Microsoft Office, is highly desirable;
  • The ability to configure user-specific portal interfaces to constituent functional TRITON components is likely to support rapid adoption and adaptation to the system;
  • The use of targeted decision support, including both passive and active alerting, is needed to assist in mitigating potential protocol deviations (e.g., missing critical event deadlines, such as those associated with data and tissue collection during active clinical trials); and
  • The ability to rapidly exchange data with external partners and systems for both collaboration and reporting purposes is exceptionally critical and very difficult in the current CIMS environment as well as other prevailing research management systems.
Table 2:
TRITON project axes and their status

Discussion

The TRITON platform, as a successor to the existing CIMS tools and data repositories, will adhere to the overall architectural model illustrated in Figure 1. Fundamental to this architecture are:

  • The use of a component-based software engineering approach, in which functional units are designed, implemented, and deployed using a model-driven modular approach, and linked by a common service-oriented electronic data interchange medium (caGrid);
  • The shallow integration of components using the LifeRay portal environment, thus enabling end-user configuration of the presentation model as well as a single conceptual interface to constituent TRITON components;
  • Disposition and management of clinical, basic science, and bio-specimen data sets in both a shared relational data repository and a longitudinally oriented data warehouse, the latter utilizing an extensible entity-attribute-value data modeling approach;
  • The implementation of a fine-grained, policy-based end-user authentication and authorization platform, spanning all of the aforementioned components, and enabling user- and role-specific controls relative to data access and system functionality;
  • The incorporation of best-of-breed bio-specimen management components and rules-engine technologies, in order to ensure maximal extensibility and reusability of the system; and
  • The application of prevailing knowledge engineering frameworks, platforms, and best practices in order to support syntactic and semantic interoperability of the system with external customers or data exchange mechanisms, while still maintaining a localized control of such semantic annotations and data models.
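
The entity-attribute-value (EAV) approach mentioned in the list above can be illustrated with a short Python sketch using an in-memory SQLite database. It is a generic example of the technique, not the TRITON warehouse schema: the table name, column names, attribute names, and all values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One narrow table holds every observation as an (entity, attribute, value) row,
# with a date so that repeated measurements form a longitudinal record.
conn.execute(
    """
    CREATE TABLE observation (
        entity_id   TEXT NOT NULL,
        attribute   TEXT NOT NULL,
        value       TEXT,
        observed_on TEXT NOT NULL
    )
    """
)

# Made-up rows; adding a new kind of attribute requires no schema change.
rows = [
    ("P001", "wbc_count", "45.2", "2010-01-10"),
    ("P001", "wbc_count", "38.7", "2010-04-12"),
    ("P001", "fish_del13q", "positive", "2010-01-10"),
    ("P002", "wbc_count", "12.1", "2010-02-01"),
]
conn.executemany("INSERT INTO observation VALUES (?, ?, ?, ?)", rows)


def history(conn, entity_id, attribute):
    """Return the dated values of one attribute for one entity, oldest first."""
    cur = conn.execute(
        "SELECT observed_on, value FROM observation "
        "WHERE entity_id = ? AND attribute = ? ORDER BY observed_on",
        (entity_id, attribute),
    )
    return cur.fetchall()


print(history(conn, "P001", "wbc_count"))
```

The trade-off of this layout is that values are stored as text and typed interpretation moves into the application or metadata layer, which is one reason EAV designs are typically paired with an attribute dictionary.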

We are actively providing open-source access to all TRITON software components via a gForge collaboration and software distribution site (https://project.bmi.ohio-state.edu).

Conclusions

The TRITON project represents a prototypical instance of the use of prevailing open-source and standards-based technologies in order to develop, deploy, and disseminate an extensible and integrative translational research information management platform. The design and implementation approach used in this project should be informative to analogous efforts and programs. Of note, the availability of such informatics platforms is a critical prerequisite to the advancement and success of a wide variety of research registry and dissemination portals, such as those associated with a number of large-scale NIH programs (TCGA, dbGaP, etc.). Furthermore, given the open-source and multi-institutional nature of the TRITON project, there is significant opportunity for the development of a community-based effort to extend and adopt this platform, with demonstrable benefits in terms of clinical and translational research efficiencies and capacity.

Figure 1:
Overview of TRITON architecture

Acknowledgments

This work was supported in part by NCI grants R01CA134232 and P01CA081534.

References

1. Oster S, Langella S, Hastings S, Ervin D, Madduri R, Phillips J, et al. caGrid 1.0: An Enterprise Grid Infrastructure for Biomedical Research. J Am Med Inform Assoc. 2008 Jan 1;15(2):138–49.
2. Saltz J, Oster S, Hastings S, Langella S, Kurc T, Sanchez W, et al. caGrid: design and implementation of the core architecture of the cancer biomedical informatics grid. Bioinformatics. 2006 Aug 1;22(15):1910–6.
3. Langella S, Oster S, Hastings S, Siebenlist F, Phillips J, Ervin D, et al. The Cancer Biomedical Informatics Grid (caBIG) Security Infrastructure. AMIA Annual Symposium. Chicago, IL: American Medical Informatics Association; 2007. pp. 433–7.
4. Dergunov AD, Ponthieux A, Mel’kin MV, Lambert D, Visvikis-Siest S, Siest G. Capillary isotachophoresis study of lipoprotein network sensitive to apolipoprotein E phenotype. 1. ApoE distribution between lipoproteins. Mol Cell Biochem. 2009 May;325(1–2):41–51.
5. Borlawsky T, Dhaval R, Hastings S, Payne PR. Development of an Agile Knowledge Engineering Framework in Support of Multi-Disciplinary Translational Research. 2009 AMIA Translational Bioinformatics Summit. San Francisco: American Medical Informatics Association; 2009.
6. Hastings S, Oster S, Langella S, Ervin D, Kurc T, Saltz J. Introduce: An Open Source Toolkit for Rapid Development of Strongly Typed Grid Services. Journal of Grid Computing. 2007 Dec;5(4):407–27.
7. Payne PR, Kwok A, Greaves A. Integrating Web Portlet Technologies with caGrid to Enable Rapid Application Development: the CRC Patient Study Calendar. AMIA Annu Symp Proc. 2008:1087.
8. Pathak J, Solbrig H, Buntrock J, Johnson TM, Chute CM. LexGrid: A Framework for Representing, Storing, and Querying Biomedical Terminologies from Simple to Sublime. J Am Med Inform Assoc. 2009;16(3):305–15.
9. Hastings S. openMDR. Columbus, OH: The Ohio State University; 2009. [cited 2009 October 6]; http://www.cagrid.org/display/MDR/Overview.
10. Payne PR, Borlawsky T, Kwok A, Greaves A. Supporting the Design of Translational Clinical Studies Through the Generation and Verification of Conceptual Knowledge-anchored Hypotheses. AMIA Annu Symp Proc. 2008:566–70.
11. Kickinger G, Hofer J, Brezany P, Min Tjoa A. Grid Knowledge Discovery Processes and an Architecture for Their Composition. IASTED: International Association for Science and Technology Development; 2004.

Articles from AMIA Annual Symposium Proceedings are provided here courtesy of American Medical Informatics Association