Imbalance p values for baseline covariates in randomized controlled trials: a last resort for the use of p values? A pro and contra debate

Andreas Stang; Christopher Baethge

doi:10.2147/CLEP.S161508

Clin Epidemiol. 2018 May 8;10:531-535. doi: 10.2147/CLEP.S161508. eCollection 2018.

Authors

Andreas Stang¹ ², Christopher Baethge³ ⁴

Affiliations

¹ Center of Clinical Epidemiology, Institute of Medical Informatics, Biometry and Epidemiology, Medical Faculty, University Hospital of Essen, Hufelandstr, Essen, Germany.
² Department of Epidemiology, School of Public Health, Boston University, Boston, MA, USA.
³ Department of Psychiatry and Psychotherapy, University of Cologne Medical School, Cologne, Germany.
⁴ Editorial Offices, Deutsches Ärzteblatt and Deutsches Ärzteblatt International, Deutscher Ärzte-Verlag, Cologne, Germany.

Abstract

Background: Results of randomized controlled trials (RCTs) are usually accompanied by a table that compares covariates between the study groups at baseline. Sometimes, the investigators report p values for imbalanced covariates. The aim of this debate is to illustrate the pro and contra of the use of these p values in RCTs.

Pro: Low p values can be a sign of biased or fraudulent randomization and can be used as a warning sign. They can be considered as a screening tool with low positive-predictive value. Low p values should prompt us to ask for the reasons and for potential consequences, especially in combination with hints of methodological problems.

Contra: Under fair randomization, p values are expected to follow a flat (uniform) distribution. Fair randomization does not produce an expectation about any single p value. The distribution of p values in RCTs can be influenced by correlation among covariates and by differential misclassification or differential mismeasurement of baseline covariates. Because reports of RCTs contain only a small number of p values, it is difficult to judge whether the realized p value distribution is indeed flat. If p values ≤0.005 or ≥0.995 were used as a sign of alarm, the false-positive rate would be 5.0% if randomization was done correctly and five p values per RCT were reported.
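The false-positive figure in the Contra argument can be checked with a short calculation. The sketch below is illustrative only and is not taken from the article: it assumes the five baseline p values are independent and uniformly distributed under correct randomization (an assumption the Contra argument itself questions, since covariates may be correlated), and the function name `family_false_positive_rate` is hypothetical.

```python
def family_false_positive_rate(n_tests: int,
                               lower: float = 0.005,
                               upper: float = 0.995) -> float:
    """Probability that at least one of n_tests p values triggers the alarm.

    Assumes (for illustration) that p values are independent and uniform
    on [0, 1], as expected under correct randomization.
    """
    # Chance that a single uniform p value falls at or below `lower`
    # or at or above `upper`.
    per_test = lower + (1.0 - upper)
    # Complement of "no p value triggers the alarm".
    return 1.0 - (1.0 - per_test) ** n_tests


# Five reported p values per RCT, alarm thresholds 0.005 and 0.995.
exact_rate = family_false_positive_rate(5)

# Simple upper bound that ignores overlap between alarms (Bonferroni-style).
upper_bound = 5 * (0.005 + (1.0 - 0.995))

print(f"exact under independence: {exact_rate:.4f}")
print(f"upper bound:              {upper_bound:.4f}")
```

Under these assumptions the exact rate is just under 5% and the simple bound of five times 0.01 is 5%, which is consistent with the 5.0% stated above. With correlated covariates the rate would differ, so the number should be read as an approximation.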

Conclusion: Using a low p value as a warning sign that randomization is potentially biased can be considered a vague heuristic. The authors of this debate differ in how enthusiastic they are about this heuristic and in the consequences they propose.

Keywords: distribution; random allocation; randomized controlled trial; statistical.