BMC Med Inform Decis Mak. 2017 Sep 13;17(1):134. doi: 10.1186/s12911-017-0532-3.

Dynamic-ETL: a hybrid approach for health data extraction, transformation and loading.

Author information

1. Departments of Pediatrics, University of Colorado Anschutz Medical Campus, School of Medicine, Building AO1 Room L15-1414, 12631 East 17th Avenue, Mail Stop F563, Aurora, CO, 80045, USA. Toan.Ong@ucdenver.edu.
2. Departments of Pediatrics, University of Colorado Anschutz Medical Campus, School of Medicine, Building AO1 Room L15-1414, 12631 East 17th Avenue, Mail Stop F563, Aurora, CO, 80045, USA.
3. Colorado Clinical and Translational Sciences Institute, University of Colorado Anschutz Medical Campus, School of Medicine, Aurora, CO, USA.
4. Departments of Family Medicine, University of Colorado Anschutz Medical Campus, School of Medicine, Aurora, CO, USA.
5. Departments of Medicine, University of Colorado Anschutz Medical Campus, School of Medicine, Aurora, CO, USA.
6. DARTNet Institute, Aurora, CO, USA.
7. OSR Data Corporation, Lincoln, MA, USA.

Abstract

BACKGROUND:

Electronic health records (EHRs) contain detailed clinical data stored in proprietary formats with non-standard codes and structures. Participating in multi-site clinical research networks requires EHR data to be restructured into a common format, mapped to standard terminologies, and optimally linked to other data sources. The expertise and scalable solutions needed to transform data to conform to network requirements are beyond the scope of many health care organizations, and there is a need for practical tools that lower the barriers to data contribution to clinical research networks.

METHODS:

We designed and implemented a health data transformation and loading approach, which we refer to as Dynamic ETL (Extraction, Transformation and Loading) (D-ETL), that automates part of the process through use of scalable, reusable and customizable code, while retaining manual aspects of the process that require knowledge of complex coding syntax. This approach provides the flexibility required for the ETL of heterogeneous data, variations in semantic expertise, and transparency of transformation logic that are essential to implement ETL conventions across clinical research sharing networks. Processing workflows are directed by the ETL specifications guideline, developed by ETL designers with extensive knowledge of the structure and semantics of health data (i.e., "health data domain experts") and of the target common data model.

RESULTS:

D-ETL was implemented to perform ETL operations that load data from various sources with different database schema structures into the Observational Medical Outcomes Partnership (OMOP) common data model. The results showed that ETL rule composition methods and the D-ETL engine offer a scalable solution for health data transformation via automatic query generation to harmonize source datasets.
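
To make the idea of automatic query generation from ETL rules concrete, here is a minimal Python sketch. It is not the D-ETL engine or its rule format, which the abstract does not describe. The `MappingRule` class, the `build_insert_query` function, and the source table and column names (`source_patients`, `patient_id`, `birth_date`) are all hypothetical. Only `person`, `person_id` and `year_of_birth` are taken from the OMOP common data model.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class MappingRule:
    """Hypothetical ETL rule: one target column and the SQL expression that fills it."""
    target_column: str
    source_expression: str


def build_insert_query(
    target_table: str,
    source_table: str,
    rules: Sequence[MappingRule],
    where: Optional[str] = None,
) -> str:
    """Assemble an INSERT ... SELECT statement from a list of mapping rules."""
    if not rules:
        raise ValueError("at least one mapping rule is required")
    # Target columns and source expressions are emitted in the same order.
    columns = ", ".join(rule.target_column for rule in rules)
    expressions = ",\n       ".join(
        f"{rule.source_expression} AS {rule.target_column}" for rule in rules
    )
    query = (
        f"INSERT INTO {target_table} ({columns})\n"
        f"SELECT {expressions}\n"
        f"FROM {source_table}"
    )
    if where:
        query += f"\nWHERE {where}"
    return query + ";"


# Hypothetical source schema; the target names follow the OMOP person table.
rules = [
    MappingRule("person_id", "src.patient_id"),
    MappingRule("year_of_birth", "EXTRACT(YEAR FROM src.birth_date)"),
]
sql = build_insert_query(
    "person",
    "source_patients src",
    rules,
    where="src.birth_date IS NOT NULL",
)
print(sql)
```

In a sketch like this, the rules stay human-readable and can be reviewed by domain experts, while the repetitive SQL scaffolding is generated. A production engine would also need to validate or quote identifiers rather than interpolate them directly.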

CONCLUSIONS:

D-ETL supports a flexible and transparent process to transform and load health data into a target data model. This approach lowers the technical barriers that prevent data partners from participating in research data networks and therefore promotes the advancement of comparative effectiveness research using secondary electronic health data.

KEYWORDS:

Data harmonization; Distributed research networks; Electronic health records; Extraction; Rule-based ETL; Transformation and loading

PMID: 28903729
PMCID: PMC5598056
DOI: 10.1186/s12911-017-0532-3
[Indexed for MEDLINE]
Free PMC Article
