Send to

Choose Destination
Int J Med Inform. 2019 Jan;121:10-18. doi: 10.1016/j.ijmedinf.2018.10.009. Epub 2018 Nov 3.

ProvCaRe: Characterizing scientific reproducibility of biomedical research studies using semantic provenance metadata.

Author information

Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, USA. Electronic address:
Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, USA.
Department of Medicine, Brigham and Women's Hospital and Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA.



Reproducibility of research studies is key to advancing biomedical science by building on sound results and reducing inconsistencies between published results and study data. We propose that the available data from research studies combined with provenance metadata provide a framework for evaluating scientific reproducibility. We developed the ProvCaRe platform to model, extract, and query semantic provenance information from 435, 248 published articles.


The ProvCaRe platform consists of: (1) the S3 model and a formal ontology; (2) a provenance-focused text processing workflow to generate provenance triples consisting of subject, predicate, and object using metadata extracted from articles; and (3) the ProvCaRe knowledge repository that supports "provenance-aware" hypothesis-driven search queries. A new provenance-based ranking algorithm is used to rank the articles in the search query results.


The ProvCaRe knowledge repository contains 48.9 million provenance triples. Seven research hypotheses were used as search queries for evaluation and the resulting provenance triples were analyzed using five categories of provenance terms. The highest number of terms (34%) described provenance related to population cohort followed by 29% of terms describing statistical data analysis methods, and only 5% of the terms described the measurement instruments used in a study. In addition, the analysis showed that some articles included a higher number of provenance terms across multiple provenance categories suggesting a higher potential for reproducibility of these research studies.


The ProvCaRe knowledge repository (https://provcare.


edu/) is one of the largest provenance resources for biomedical research studies that combines intuitive search functionality with a new provenance-based ranking feature to list articles related to a search query.


ProvCaRe knowledge repository; ProvCaRe ontology; Provenance metadata; Provenance-based ranking; S3 model; Scientific reproducibility; W3C PROV specifications

[Indexed for MEDLINE]
Free PMC Article

Supplemental Content

Full text links

Icon for Elsevier Science Icon for PubMed Central
Loading ...
Support Center