Health Policy. 2016 Oct;120(10):1141-1150. doi: 10.1016/j.healthpol.2016.09.002. Epub 2016 Sep 5.

The effectiveness of payment for performance in health care: A meta-analysis and exploration of variation in outcomes.

Author information

Department of Health Sciences, University of York, York, YO10 5DD, UK; Health Strategy and Delivery Foundation (HSDF), 1980 Wikki Spring Street, Maitama, Abuja, Nigeria.
Department of Health Sciences, University of York, York, YO10 5DD, UK.
Hull York Medical School, University of York, York, YO10 5DD, UK.



Pay for performance (P4P) incentive schemes are increasingly used worldwide to improve health system performance, but the results of evaluations vary considerably. A systematic analysis of this variation in the effects of P4P schemes is needed.


Evaluations of P4P schemes from any country were identified by searching for and updating systematic reviews of P4P schemes in health care in four bibliographic databases. Outcomes reported using different measures of effect were converted into standardized effect sizes (standardized mean difference, SMD), and each study was categorized according to whether it found a positive effect. Subgroup analysis, meta-regression and multilevel logistic regression were used to investigate factors explaining heterogeneity. Random-effects models were used because they take into account heterogeneity that is likely to be due to differences between studies rather than to chance alone. Sensitivity analysis was used to test the effect of different assumptions.
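As an illustration of the random-effects pooling described above, the following minimal sketch applies the DerSimonian-Laird estimator to a set of SMDs and reports I². The abstract does not state which estimator the authors used, so the choice of DerSimonian-Laird is an assumption, the function name `random_effects_pool` is hypothetical, and the input values are made up for demonstration rather than taken from the study.

```python
import math


def random_effects_pool(effects, variances):
    """Pool study effect sizes with a DerSimonian-Laird random-effects model.

    Assumption: this estimator is chosen for illustration only; the source
    abstract does not specify which random-effects estimator was used.
    """
    k = len(effects)
    # Inverse-variance (fixed-effect) weights
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1

    # Between-study variance (tau^2), truncated at zero
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    se = math.sqrt(1.0 / sum_w_star)

    # I^2: share of observed variation attributed to heterogeneity, in percent
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    return {
        "pooled": pooled,
        "se": se,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "tau2": tau2,
        "q": q,
        "i2": i2,
    }


if __name__ == "__main__":
    # Made-up SMDs and variances, for illustration only (not study data)
    result = random_effects_pool([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])
    print(result)
```

When the between-study variance is large relative to the within-study variances, the random-effects weights become nearly equal, so small studies count almost as much as large ones.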


Ninety-six primary studies were identified; 37 were included in the meta-analysis and meta-regression and all 96 in the logistic regression. The proportion of observed variation in study results attributable to true heterogeneity rather than chance (I²) was 99.9%. Estimates of the effect of P4P schemes were lower in evaluations using randomized controlled trials (SMD=0.08; 95% CI: 0.01-0.15) than in those with no controls (SMD=0.15; 95% CI: 0.09-0.21), and lower for those measuring outcomes (e.g., smoking cessation) (SMD=0.0; 95% CI: -0.01 to 0.01) than for those using process measures (e.g., giving cessation advice) (SMD=0.18; 95% CI: 0.06-0.31). After adjusting for other design features and the evaluation method, the odds of showing a positive effect were more than three times higher for schemes with larger incentives (>5% of salary/usual budget) (OR=3.38; 95% CI: 1.07-10.64). There were non-statistically significant increases in the odds of success if the incentive was paid to individuals (as opposed to groups) (OR=2.0; 95% CI: 0.62-6.56) and if there was a lower perceived risk of not earning the incentive (OR=2.9; 95% CI: 0.78-10.83). Schemes evaluated using less rigorous designs had 24 times the odds of showing a positive estimate of effect compared with those evaluated using randomized controlled trials (OR=24; 95% CI: 6.3-92.8).
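The odds ratios above can be sanity-checked by working on the log scale. The sketch below assumes each interval is a Wald-type 95% CI that is symmetric around ln(OR); the abstract does not state how the intervals were computed, and the helper name `log_or_summary` is hypothetical. Under that assumption, the geometric mean of the CI bounds should land close to the reported OR, and the implied z statistic indicates whether the interval excludes 1.

```python
import math


def log_or_summary(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Recover the standard error and z statistic implied by an OR and its CI.

    Assumption: the CI is a Wald interval, symmetric on the log-odds scale.
    """
    log_low, log_high = math.log(ci_low), math.log(ci_high)
    # Half the CI width on the log scale equals z_crit * SE
    se = (log_high - log_low) / (2.0 * z_crit)
    # Geometric mean of the bounds is the point estimate the CI implies
    implied_or = math.exp((log_low + log_high) / 2.0)
    z = math.log(odds_ratio) / se
    return {"se": se, "implied_or": implied_or, "z": z}


if __name__ == "__main__":
    # Values as reported in the results above
    reported = {
        "larger incentives": (3.38, 1.07, 10.64),
        "paid to individuals": (2.0, 0.62, 6.56),
        "lower perceived risk": (2.9, 0.78, 10.83),
        "less rigorous design": (24.0, 6.3, 92.8),
    }
    for label, (or_, lo, hi) in reported.items():
        s = log_or_summary(or_, lo, hi)
        significant = abs(s["z"]) > 1.96
        print(f"{label}: implied OR={s['implied_or']:.2f}, "
              f"SE(ln OR)={s['se']:.2f}, z={s['z']:.2f}, "
              f"CI excludes 1: {significant}")
```

The wide intervals reflect large standard errors on the log-odds scale, which is expected when a logistic regression is fitted to 96 studies with several covariates.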


Estimates of the effectiveness of incentive schemes on health outcomes are probably inflated due to poorly designed evaluations and a focus on process measures rather than health outcomes. Larger incentives and reducing the perceived risk of non-payment may increase the effect of these schemes on provider behavior.


Design; Evaluation; Heterogeneity; Meta-regression; Pay-for-performance (P4P)

[Indexed for MEDLINE]
