PLoS One. 2015 Nov 25;10(11):e0143480. doi: 10.1371/journal.pone.0143480. eCollection 2015.

Merging Children's Oncology Group Data with an External Administrative Database Using Indirect Patient Identifiers: A Report from the Children's Oncology Group.

Author information

Department of Biostatistics and Epidemiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, United States of America.
Division of Oncology, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America.
Children's Hospital Association, Overland Park, Kansas, United States of America.
Division of Infectious Disease, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America.
Center for Pediatric Clinical Effectiveness, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America.
Department of Preventive Medicine, University of Southern California, Los Angeles, California, United States of America.
Children's Oncology Group, Monrovia, California, United States of America.
Department of Hematology/Oncology, The Hospital for Sick Children, University of Toronto, Toronto, Canada.
Division of Hematology/Oncology/Bone Marrow Transplantation, Children's Mercy Hospital and Clinics, Kansas City, Missouri, United States of America.



Clinical trials data from National Cancer Institute (NCI)-funded cooperative oncology group trials could be enhanced by merging with external data sources. Merging without direct patient identifiers would provide additional patient privacy protections. We sought to develop and validate a matching algorithm that uses only indirect patient identifiers.


We merged data from two Phase III Children's Oncology Group (COG) trials for de novo acute myeloid leukemia (AML), AAML0531 and AAML1031, with the Pediatric Health Information Systems (PHIS) administrative database. We developed a stepwise matching algorithm that used only indirect identifiers: treatment site, gender, birth year, birth month, enrollment year, and enrollment month. Results from the stepwise algorithm were compared against a direct merge that used date of birth, treatment site, and gender. The indirect merge algorithm was developed on AAML0531 and validated on AAML1031.
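As an illustration only, a stepwise match on indirect identifiers might be sketched as follows. The abstract does not describe the individual steps, so the field names, the two-step order (all six identifiers first, then dropping enrollment month), and the rule of accepting only unambiguous one-to-one matches are all assumptions made for this sketch, not the authors' published algorithm.

```python
from collections import defaultdict

# Hypothetical field names and step order: the abstract lists the identifiers
# but does not give a schema or say which identifiers each step relaxes.
STEPS = [
    ("site", "gender", "birth_year", "birth_month", "enroll_year", "enroll_month"),
    ("site", "gender", "birth_year", "birth_month", "enroll_year"),
]


def stepwise_match(trial, admin, steps=STEPS):
    """Return {trial_id: admin_id}, accepting only unambiguous one-to-one matches."""
    matches = {}
    used_admin = set()
    for keys in steps:
        # Index the records still unmatched on each side by the current key set.
        trial_idx = defaultdict(list)
        admin_idx = defaultdict(list)
        for rec in trial:
            if rec["id"] not in matches:
                trial_idx[tuple(rec[k] for k in keys)].append(rec["id"])
        for rec in admin:
            if rec["id"] not in used_admin:
                admin_idx[tuple(rec[k] for k in keys)].append(rec["id"])
        # Accept a key only when exactly one record on each side carries it.
        for key, trial_ids in trial_idx.items():
            admin_ids = admin_idx.get(key, [])
            if len(trial_ids) == 1 and len(admin_ids) == 1:
                matches[trial_ids[0]] = admin_ids[0]
                used_admin.add(admin_ids[0])
    return matches
```

Records matched in an earlier, stricter step are removed from later steps, so relaxing an identifier can only add matches among the patients that remain unmatched.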


Of 415 patients enrolled on the AAML0531 trial at PHIS centers, the indirect stepwise algorithm matched 378 (91.1%). Of these 378 matches, 362 (95.7%) were concordant with the direct merge result. In validation on the AAML1031 trial, the algorithm matched 157 of 165 patients (95.2%), and 150 (95.5%) of these indirect matches were concordant with the direct merge.
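The comparison between the two merges can be sketched as below. This assumes, since the abstract does not define it, that an indirect match counts as concordant when the direct merge links the same trial patient to the same PHIS record; the function name and the dictionary layout are hypothetical.

```python
def concordance(indirect, direct):
    """Count indirect matches that agree with the direct merge.

    Both arguments map a trial patient ID to the linked administrative
    record ID. Returns (number concordant, fraction of indirect matches).
    """
    agree = sum(
        1 for trial_id, admin_id in indirect.items()
        if direct.get(trial_id) == admin_id
    )
    rate = agree / len(indirect) if indirect else 0.0
    return agree, rate
```

Under this definition the denominator is the number of indirect matches, which is how the figures above are reported (for example, 150 concordant out of 157 indirect matches).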


These data demonstrate that patients enrolled on COG clinical trials can be successfully merged with PHIS administrative data using a stepwise algorithm based on indirect patient identifiers. The merged data sets can serve as a platform for comparative effectiveness and cost-effectiveness studies.

[Indexed for MEDLINE]