J Am Med Inform Assoc. 2017 Apr 8. doi: 10.1093/jamia/ocx033. [Epub ahead of print]

A longitudinal analysis of data quality in a large pediatric data research network.

Author information

1. Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA, USA.
2. Department of Pediatrics, Children's Hospital of Philadelphia.
3. Department of Pediatrics, University of Colorado Denver Anschutz Medical Campus, Aurora, CO, USA.
4. University of Cincinnati Department of Pediatrics, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA.
5. Information Services Department, Children's Hospital Boston, Boston, MA, USA.
6. Department of Pediatrics, Washington University in St. Louis, St. Louis, MO, USA.
7. Research Informatics, Seattle Children's Research Institute, Seattle, WA, USA.
8. Research Information Solutions and Innovation, Nationwide Children's Hospital, Columbus, OH, USA.
9. Center for Pediatric Auditory and Speech Sciences, Nemours Biomedical Research, Wilmington, DE, USA.
10. Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.

Abstract

Objective:

PEDSnet is a clinical data research network (CDRN) that aggregates electronic health record data from multiple children's hospitals to enable large-scale research. Assessing data quality to ensure suitability for conducting research is a key requirement in PEDSnet. This study presents a range of data quality issues identified over a period of 18 months and interprets them to evaluate the research capacity of PEDSnet.

Materials and Methods:

Results were generated by a semiautomated data quality assessment workflow. Two investigators reviewed the programmatically identified data quality issues and consulted the data partners' extract-transform-load analysts to determine the cause of each issue.

Results:

The results include a longitudinal summary of 2182 data quality issues identified across 9 data submission cycles. The metadata from the most recent cycle includes annotations for 850 issues, covering the most frequent issue types, including missing data (>300) and outliers (>100); the most complex domains, including medications (>160) and lab measurements (>140); and the primary causes, including source data characteristics (83%) and extract-transform-load errors (9%).

Discussion:

The longitudinal findings demonstrate the network's evolution from identifying difficulties with aligning the data to a common data model to learning norms in clinical pediatrics and determining research capability.

Conclusion:

While data quality is recognized as a critical aspect in establishing and utilizing a CDRN, the findings from data quality assessments are largely unpublished. This paper presents a real-world account of studying and interpreting data quality findings in a pediatric CDRN, and the lessons learned could be used by other CDRNs.

KEYWORDS:

CDRN; data quality; electronic health record; extract-transform-load; secondary use

PMID: 28398525
DOI: 10.1093/jamia/ocx033