Pharmacoepidemiol Drug Saf. 2018 Jul;27(7):781-788. doi: 10.1002/pds.4440. Epub 2018 Apr 17.

Assumptions made when preparing drug exposure data for analysis have an impact on results: An unreported step in pharmacoepidemiology studies.

Author information

Arthritis Research UK Centre for Epidemiology, Centre for Musculoskeletal Research, School of Biological Sciences, Manchester Academic Health Science Centre, The University of Manchester, Manchester, UK.
Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada.
Clinical and Health Informatics Research Group, McGill University, Montreal, Quebec, Canada.
Brigham and Women's Hospital, Boston, MA, USA.
Health eResearch Centre, Farr Institute for Health Informatics Research, The University of Manchester, Manchester, UK.
Faculty of Science, Division of Pharmacoepidemiology and Clinical Pharmacology, Utrecht University, Utrecht, The Netherlands.
Department of Medicine, McGill University, Montreal, Quebec, Canada.
NIHR Manchester Biomedical Research Centre, Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, UK.
Rheumatology Department, Salford Royal NHS Foundation Trust, Salford, UK.



Real-world data for observational research commonly require formatting and cleaning prior to analysis. These data preparation steps are rarely reported adequately and are likely to vary between research groups, and such variation in methodology could affect study results. This study aimed to develop a framework to define and document drug data preparation and to examine the impact of different assumptions on results.


An algorithm for processing prescription data was developed and tested using data from the Clinical Practice Research Datalink (CPRD). The impact of varying assumptions was examined by estimating the association between 2 exemplar medications (oral hypoglycaemic drugs and glucocorticoids) and cardiovascular events after preparing multiple datasets derived from the same source prescription data. Each dataset was analysed using Cox proportional hazards modelling.
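As a rough illustration of what "preparing multiple datasets derived from the same source prescription data" can look like, the sketch below enumerates every pathway through a small set of decision nodes. The node names and options are hypothetical and are not the nodes of the published algorithm; the sketch also assumes the nodes are independent, so that the number of pathways is simply the product of the option counts, which may not hold for the actual framework.

```python
from itertools import product
from math import prod

# Hypothetical decision nodes and options, for illustration only.
# These are NOT the decision nodes or assumptions used in the study.
DECISION_NODES = {
    "missing_quantity": ["drop", "impute_median", "impute_mode"],
    "end_date_source": ["quantity_over_daily_dose", "recorded_duration"],
    "overlapping_prescriptions": ["ignore_overlap", "shift_forward"],
    "permitted_gap_days": [0, 15, 30],
}


def enumerate_pathways(nodes):
    """Yield one dict per pathway: a single option chosen at every node."""
    names = list(nodes)
    for combo in product(*(nodes[name] for name in names)):
        yield dict(zip(names, combo))


pathways = list(enumerate_pathways(DECISION_NODES))

# With independent nodes, the pathway count is the product of option counts.
n_expected = prod(len(options) for options in DECISION_NODES.values())
```

Each pathway dict could then be passed to a data preparation routine to produce one analysis dataset, so that the chosen assumptions are recorded alongside the resulting estimate.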


The algorithm included 10 decision nodes and 54 possible unique assumptions. Over 11 000 possible pathways through the algorithm were identified. In both exemplar studies, similar hazard ratios and standard errors were found for the majority of pathways; however, certain assumptions had a greater influence on results. For example, in the hypoglycaemic analysis, choosing a different variable to define prescription end date changed the hazard ratio (95% confidence interval) from 1.77 (1.56-2.00) to 2.83 (1.59-5.04).
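To make the end-date example concrete, the following minimal sketch shows how two ways of defining a prescription end date can give different exposure windows for the same record. The field names (`quantity`, `daily_dose`, `recorded_duration_days`), the option labels, and the toy record are all assumptions made for illustration; they are not CPRD variable names and not data from the study.

```python
from datetime import date, timedelta


def prescription_end(rx, end_date_source):
    """Return the end date of one prescription under a chosen assumption.

    `rx` is a dict with hypothetical fields; `end_date_source` selects
    which variable defines the prescription length.
    """
    if end_date_source == "quantity_over_daily_dose":
        # Length derived from tablets supplied and tablets taken per day.
        days = rx["quantity"] / rx["daily_dose"]
    elif end_date_source == "recorded_duration":
        # Length taken directly from a recorded duration field.
        days = rx["recorded_duration_days"]
    else:
        raise ValueError(f"unknown end_date_source: {end_date_source!r}")
    return rx["start"] + timedelta(days=days)


# Made-up record in which the two sources disagree.
toy_rx = {
    "start": date(2018, 1, 1),
    "quantity": 56,
    "daily_dose": 2,
    "recorded_duration_days": 14,
}

end_by_quantity = prescription_end(toy_rx, "quantity_over_daily_dose")
end_by_duration = prescription_end(toy_rx, "recorded_duration")
```

In this toy record the two definitions differ by 14 days of exposure time, which is the kind of discrepancy that could reclassify an event as occurring on or off treatment.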


The framework offers a transparent and efficient way to perform and report drug data preparation steps. Assumptions made during data preparation can impact the results of analyses. Improving transparency regarding drug data preparation would increase the repeatability, reproducibility, and comparability of published results.


data preparation; pharmacoepidemiology; reproducibility; transparency
