NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

North of England Dyspepsia Guideline Development Group (UK). Dyspepsia: Managing Dyspepsia in Adults in Primary Care. Newcastle upon Tyne (UK): University of Newcastle upon Tyne; 2004 Aug 1. (NICE Clinical Guidelines, No. 17.)

• This publication is provided for historical reference only and the information may be out of date.

# Appendix 1: Describing the results of trials

## Binary outcomes

A binary outcome provides two possibilities, for example: alive or dead; still on treatment or withdrawn from treatment. Binary data may be expressed in several ways in clinical studies. These are primarily odds ratios, risk ratios (also known as relative risks) and risk differences. Binary data from a comparative trial can be shown in a two by two table:

| | Event (e.g. death) | No event (e.g. survival) |
|---|---|---|
| Intervention Group | A | B |
| Control | C | D |

Odds ratios are defined as: $\frac{A/B}{C/D}$

In other words, the odds ratio is the odds of death in the intervention group (number of deaths divided by the number of survivors) divided by the odds of death in the control group.

Risk Ratios are defined as: $\frac{A/(A+B)}{C/(C+D)}$

The risk ratio is the proportion of deaths in the intervention group (number of deaths in the intervention group divided by the total number allocated to the intervention) divided by the proportion of deaths in the control group. Trials sometimes refer to relative risk reductions (RRRs) which are calculated as one minus the Risk Ratio.

Risk Differences are defined as: $\frac{A}{A+B} - \frac{C}{C+D}$

The Risk Difference is the proportion of deaths in the intervention group (number of deaths in the intervention group divided by the total number allocated to the intervention) minus the proportion of deaths in the control group.

Worked Example:

In a trial of an ACE inhibitor in patients with heart failure there were 452 deaths among 1,285 patients randomised to receive enalapril, and 510 deaths among 1,284 allocated to control after an average follow-up of 4.5 years [a]. Shown in a two by two table this is:

| | Deaths | Survivors |
|---|---|---|
| Intervention Group | 452 | 833 |
| Control | 510 | 774 |

Using the formulae provides an odds ratio of 0.82, a risk ratio of 0.89, and a risk difference of −0.045 (or a 4.5% absolute reduction in the risk of death).
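The arithmetic of the worked example can be checked with a short script. This is a minimal sketch: the function and variable names are illustrative choices rather than anything taken from the guideline, and the cell labels follow the two by two table (a and b for deaths and survivors in the intervention group, c and d for control).

```python
# Two by two table from the worked example.
# a, b = deaths and survivors in the intervention group
# c, d = deaths and survivors in the control group
a, b = 452, 833
c, d = 510, 774


def odds_ratio(a, b, c, d):
    """Odds of death in the intervention group divided by odds in control."""
    return (a / b) / (c / d)


def risk_ratio(a, b, c, d):
    """Proportion dying in the intervention group divided by that in control."""
    return (a / (a + b)) / (c / (c + d))


def risk_difference(a, b, c, d):
    """Proportion dying in the intervention group minus that in control."""
    return a / (a + b) - c / (c + d)


or_value = odds_ratio(a, b, c, d)
rr_value = risk_ratio(a, b, c, d)
rd_value = risk_difference(a, b, c, d)
rrr_value = 1 - rr_value  # relative risk reduction

print(f"odds ratio              = {or_value:.2f}")   # 0.82
print(f"risk ratio              = {rr_value:.2f}")   # 0.89
print(f"risk difference         = {rd_value:.3f}")   # -0.045
print(f"relative risk reduction = {rrr_value:.2f}")  # 0.11
```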

Each measure has advantages and disadvantages. The Odds Ratio is a statistically robust measure, but is hard to interpret clinically. The Risk Ratio is superficially easier to interpret, and both odds ratios and risk ratios may be particularly useful when attempting to combine studies which are estimating the same common underlying effect, but in which both severity of condition and length of follow up may vary. Neither measure is sufficient for clinical decision making alone: an odds ratio or risk ratio apparently showing a large effect from an intervention will not lead to large benefits in practice where the events are rare, and an apparently small relative effect may have a substantial impact where events are very common.

Risk Differences are not very helpful for exploring common underlying effects, but are very useful for describing the practical importance of the effects of treatment. Similarly, Number Needed to Treat (NNT) is used to describe absolute benefits (NNT is the inverse of the risk difference: 1/0.045 or 22 in our example). It expresses the number of patients that would have to receive the intervention for one patient to experience (or avoid) the outcome described in a trial. A main advantage of the risk difference is that it expresses the practical value of interventions and allows comparisons between alternative treatments. However, a standard problem for risk differences and numbers needed to treat is that they are often derived from trials that have different lengths of follow-up. The risk difference tends to become bigger as follow-up increases. Thus the incidence risk difference is used to estimate treatment effects using a common time frame, for example the number of deaths avoided as a result of treating 1,000 patients for a year [b].
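As a rough illustration, the NNT and a deaths-avoided figure on a common time frame can be derived from the worked example. The per-year figure below is a crude approximation that assumes the benefit accrues evenly over the average follow-up; that simplification is an assumption made here for illustration and is not necessarily the method described in [b]. Variable names are illustrative.

```python
# Figures from the worked example.
deaths_treat, n_treat = 452, 1285
deaths_ctrl, n_ctrl = 510, 1284
mean_follow_up_years = 4.5

# Absolute risk reduction: control risk minus intervention risk.
absolute_risk_reduction = deaths_ctrl / n_ctrl - deaths_treat / n_treat

# Number needed to treat is the inverse of the absolute risk reduction.
nnt = 1 / absolute_risk_reduction

# ASSUMPTION: benefit spread evenly over follow-up (a crude simplification).
deaths_avoided_per_1000_per_year = (
    1000 * absolute_risk_reduction / mean_follow_up_years
)

print(f"NNT over {mean_follow_up_years} years: {nnt:.0f}")  # 22
print(f"Deaths avoided per 1,000 patients per year: "
      f"{deaths_avoided_per_1000_per_year:.1f}")            # 10.1
```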

Trials enrol a sample from the population of all patients and estimate the effect of treatments. These estimates have a degree of uncertainty which becomes less the bigger the sample size. A Confidence Interval (CI) for a treatment effect estimated in a trial is the range in which the actual population treatment effect is assumed to lie, with a specified probability. The specified probability is arbitrary: 95% is the most commonly chosen value, meaning that the true underlying treatment effect is assumed to lie within the range 19 times out of 20. The narrower the confidence interval, the greater the precision of measurement in the study. Narrower confidence intervals are achieved, all things being equal, by studies which enrol more patients. The best and most likely estimate of effect is the point estimate at the centre of the confidence interval range. For our example the best estimate was that after nearly 5 years of treatment, an ACE inhibitor achieves a 4.5% absolute reduction in the risk of death with a 95% confidence interval of 0.8% to 8.3%.
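The confidence interval quoted for the worked example can be reproduced, approximately, with the usual normal-approximation (Wald) interval for a difference of two proportions. The guideline does not state which formula was used, so the choice of method here is an assumption, as are the function and variable names.

```python
import math


def risk_difference_ci(a, b, c, d, z=1.96):
    """Risk difference with a Wald confidence interval (z=1.96 gives 95%).

    a, b = events and non-events in the intervention group
    c, d = events and non-events in the control group
    """
    n1, n2 = a + b, c + d
    p1, p2 = a / n1, c / n2
    rd = p1 - p2
    # Standard error of a difference between two independent proportions.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return rd, rd - z * se, rd + z * se


rd, lower, upper = risk_difference_ci(452, 833, 510, 774)

# Report as a reduction in risk (control minus intervention), in percent.
print(f"reduction: {-rd * 100:.1f}% "
      f"(95% CI {-upper * 100:.1f}% to {-lower * 100:.1f}%)")
# reduction: 4.5% (95% CI 0.8% to 8.3%)
```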

## Meta-analysis of binary data

Commonly more than one trial exists to inform the value of a particular treatment. Where studies feature similar designs and use adequately similar outcomes it is possible to combine these to obtain an overall estimate of effect. This statistical process, called meta-analysis, involves taking a weighted average of the results of trials, where the most informative trials (biggest and with most events) contribute most to the overall result. Figures called forest plots are often used to display the findings of meta-analyses. The example below shows a meta-analysis of the results of trials of statin therapy following a myocardial infarction to reduce the risk of subsequent mortality. The finding from each trial is shown as a mark on a graph with a line showing its confidence interval. In this instance, the mark used is a box, the size of which indicates how important the trial is to the combined, or pooled, result. The pooled finding is shown after the individual studies (in the example as a lozenge) and indicates a risk ratio for death of 0.79 or 79% for patients receiving a statin when compared to those receiving placebo. Alternatively this may be expressed as a 21% relative reduction in the risk of death. The 95% confidence interval indicates, 19 times out of 20, that the true risk ratio will lie between 0.72 and 0.86 (a relative reduction of between 14% and 28%): the range excludes the line of no effect or no change (one). The advantage of meta-analysis is that it provides the most precise estimate of the effect of treatment, reflecting all available studies. However, if the studies themselves have limitations or differ in important ways, then meta-analysis can be misleading.
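The idea of a weighted average can be sketched with one common approach, fixed-effect inverse-variance pooling of log risk ratios. The guideline does not say which pooling method was used, so this choice is an assumption, and the trial counts below are hypothetical numbers invented purely for illustration (they are not the statin trials).

```python
import math

# HYPOTHETICAL trials: (events_treat, n_treat, events_control, n_control)
trials = [
    (30, 200, 40, 200),
    (100, 1000, 125, 1000),
    (12, 150, 15, 150),
]


def log_rr_and_variance(e1, n1, e0, n0):
    """Log risk ratio and its approximate variance for one trial."""
    log_rr = math.log((e1 / n1) / (e0 / n0))
    variance = 1 / e1 - 1 / n1 + 1 / e0 - 1 / n0
    return log_rr, variance


def pool_fixed_effect(trials, z=1.96):
    """Inverse-variance weighted average of log risk ratios."""
    estimates, weights = [], []
    for trial in trials:
        log_rr, variance = log_rr_and_variance(*trial)
        estimates.append(log_rr)
        weights.append(1 / variance)  # more events -> larger weight
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    se = math.sqrt(1 / total_weight)
    # Back-transform from the log scale to the risk ratio scale.
    return (math.exp(pooled),
            math.exp(pooled - z * se),
            math.exp(pooled + z * se))


pooled_rr, ci_low, ci_high = pool_fixed_effect(trials)
print(f"pooled risk ratio {pooled_rr:.2f} "
      f"(95% CI {ci_low:.2f} to {ci_high:.2f})")
```

The largest hypothetical trial has the smallest variance and therefore dominates the pooled estimate, which corresponds to the larger box on a forest plot.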

## Meta-analysis of continuous data

Many outcomes are not binary but continuous (or nearly so), such as blood pressure readings and pain or symptom scores. With continuous data, the mean scores for the treatment and control groups in each trial are subtracted to calculate a mean difference (for example a reduction in blood pressure), and confidence intervals for this difference are calculated using standard formulae that reflect the spread of the data (referred to as the standard deviation). Where studies use a common continuous outcome measure, meta-analysis can combine these to calculate a summary weighted mean difference comparing treatment and control groups.
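A weighted mean difference can be sketched in the same way as for binary data. The example assumes inverse-variance (fixed-effect) weighting, which the guideline does not specify, and uses hypothetical blood pressure summaries invented for illustration.

```python
import math

# HYPOTHETICAL trials reporting systolic blood pressure (mmHg):
# (mean_treat, sd_treat, n_treat, mean_control, sd_control, n_control)
trials = [
    (140.0, 12.0, 50, 146.0, 12.0, 50),
    (138.0, 10.0, 100, 142.0, 10.0, 100),
]


def mean_difference(m1, s1, n1, m0, s0, n0):
    """Mean difference (treatment minus control) and its variance."""
    return m1 - m0, s1 ** 2 / n1 + s0 ** 2 / n0


def weighted_mean_difference(trials, z=1.96):
    """Inverse-variance weighted average of the trial mean differences."""
    results = [mean_difference(*t) for t in trials]
    weights = [1 / variance for _, variance in results]
    total_weight = sum(weights)
    pooled = sum(w * md for w, (md, _) in zip(weights, results)) / total_weight
    se = math.sqrt(1 / total_weight)
    return pooled, pooled - z * se, pooled + z * se


wmd, wmd_low, wmd_high = weighted_mean_difference(trials)
print(f"weighted mean difference {wmd:.1f} mmHg "
      f"(95% CI {wmd_low:.1f} to {wmd_high:.1f})")
```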

Dichotomising data that are naturally continuous (for example into treatment failures and successes) is not generally advisable. It is often arbitrary, and may result in pooling scores based on different cut-offs in different studies, or on cut-offs that were identified with knowledge of the data and thus show the data in a favourable light. Dichotomisation may exaggerate small differences in effect, and more fundamentally the approach removes much information from the original data.

### Standardisation

When there are concerns that measurement between studies is not undertaken using a common metric, standardised mean differences can be calculated for each trial. Examples might be where different but related measures are used to estimate the same outcome in patients, or where it is likely that measures are used inconsistently by different investigators. Standardisation is achieved by dividing the mean difference from each study by its standard deviation [c, d]. Standardised weighted mean differences lack physical interpretation but can be worked back to a value on an original physical scale.
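The effect of standardisation can be seen with two hypothetical trials that measure the same symptom on different scales. The sketch divides each mean difference by the pooled within-group standard deviation; using the pooled standard deviation and omitting any small-sample correction are simplifying assumptions made here, and all numbers are invented for illustration.

```python
import math


def standardised_mean_difference(m1, s1, n1, m0, s0, n0):
    """Mean difference divided by the pooled within-group standard deviation."""
    pooled_sd = math.sqrt(
        ((n1 - 1) * s1 ** 2 + (n0 - 1) * s0 ** 2) / (n1 + n0 - 2)
    )
    return (m1 - m0) / pooled_sd


# HYPOTHETICAL trial A: symptom score on a 0 to 10 scale.
smd_a = standardised_mean_difference(4.0, 2.0, 50, 5.0, 2.0, 50)

# HYPOTHETICAL trial B: the same symptom on a 0 to 100 scale.
smd_b = standardised_mean_difference(40.0, 20.0, 50, 50.0, 20.0, 50)

# Raw differences are -1 and -10 points; standardised they are the same.
print(f"trial A: {smd_a:.2f} standard deviations")  # -0.50
print(f"trial B: {smd_b:.2f} standard deviations")  # -0.50
```

Multiplying a standardised difference by the standard deviation typical of one of the original scales works it back to a value on that scale.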

## Studies examining different doses

Sometimes trials examine multiple dose regimens compared with a single control group. These trials are often conducted early during product development, are designed to examine the most appropriate dosage of a drug and may include groups receiving doses both within and outside the range ultimately licensed. It is important that such comparisons are not considered separately in the analyses, since they share a single control group and the resulting confidence intervals will be inappropriately narrow. In order to include all relevant information without undue statistical precision, an average effect is estimated for the range of therapeutic doses available.
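One simple way to avoid counting the shared control group more than once is to merge the therapeutic dose arms into a single intervention group before comparing with control. This is an illustrative sketch of that idea only: the guideline does not describe how its average effect was estimated, and the doses, counts and the choice of which doses count as therapeutic are all hypothetical.

```python
# HYPOTHETICAL dose-ranging trial with a single control group.
control = {"events": 40, "n": 200}
dose_arms = {
    "5 mg": {"events": 30, "n": 200},   # assumed below the licensed range
    "10 mg": {"events": 28, "n": 200},
    "20 mg": {"events": 26, "n": 200},
}
therapeutic_doses = ["10 mg", "20 mg"]  # ASSUMPTION for illustration

# Merge the therapeutic arms so the control group is used only once.
combined_events = sum(dose_arms[dose]["events"] for dose in therapeutic_doses)
combined_n = sum(dose_arms[dose]["n"] for dose in therapeutic_doses)

combined_risk = combined_events / combined_n
control_risk = control["events"] / control["n"]
combined_risk_ratio = combined_risk / control_risk

print(f"combined therapeutic arms: {combined_events}/{combined_n}")  # 54/400
print(f"risk ratio versus control: {combined_risk_ratio:.3f}")       # 0.675
```

Entering each dose arm as a separate comparison would instead reuse the same 200 control patients several times, which is what makes the resulting confidence intervals too narrow.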

## Naturalistic studies

Double-blind randomised trials are occasionally criticised for inadequately representing treatment in the real world. In other words, trials that use a well defined population without co-morbidity, limit treatment options and make both the doctor and patient blind to the treatment received may provide different results from those realised in practice. The evaluation of pharmaceuticals is best undertaken using a series of experimental studies. This is reflected in phase II and III studies (small-scale dose ranging through to larger trials, often for licensing). Studies in phase IV may relax some of the requirements of the earlier trials in order to better reflect the real world: these may include relaxing blinding, restrictions on clinical strategies (such as the choice of drug after initial randomisation) and the exclusion of patients with co-morbidity. Such studies have been described as ‘contaminated with the real world’ [e] and it may be difficult to work out what is being estimated (particularly with, say, strong patient or doctor preferences for one treatment). However, when examined with the earlier phase III trials, they may add useful information.

## Meta-regression Analysis

Where a number of trials examine the same underlying question, more complex techniques may be used to understand trial evidence. Regression models can explore whether the size of benefit from treatments varies with certain factors such as age or the presence of other diseases [f].
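A minimal sketch of the idea is a weighted least squares regression of each trial's log risk ratio on a trial-level factor such as mean age, with weights equal to the inverse of each trial's variance. This fixed-effect form is an assumption chosen for brevity, and the data are hypothetical values constructed to lie on a straight line.

```python
import math

# HYPOTHETICAL trials: (mean_age, log_risk_ratio, variance_of_log_rr)
trials = [
    (50.0, -0.30, 0.010),
    (60.0, -0.20, 0.020),
    (70.0, -0.10, 0.015),
]


def weighted_regression(points):
    """Weighted least squares fit of y = intercept + slope * x."""
    weights = [1 / variance for _, _, variance in points]
    total = sum(weights)
    x_bar = sum(w * x for w, (x, _, _) in zip(weights, points)) / total
    y_bar = sum(w * y for w, (_, y, _) in zip(weights, points)) / total
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for w, (x, y, _) in zip(weights, points))
    sxx = sum(w * (x - x_bar) ** 2 for w, (x, _, _) in zip(weights, points))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


intercept, slope = weighted_regression(trials)

# A positive slope here means the relative benefit shrinks with age.
print(f"change in log risk ratio per year of age: {slope:.3f}")  # 0.010
print(f"predicted risk ratio at age 65: "
      f"{math.exp(intercept + slope * 65):.2f}")
```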

## References (Appendix 1)

a. SOLVD Investigators. Effect of enalapril on survival in patients with reduced left ventricular ejection fractions and congestive heart failure. N Engl J Med. 1991;325:293–302. [PubMed: 2057034]

b. Freemantle N, Mason JM, Eccles M. Deriving Treatment Recommendations From Evidence Within Randomised Trials: The Role and Limitation of Meta Analysis. Int J Technol Assess Health Care. 2000;15:304–315. [PubMed: 10507190]

c. Hedges LV, Olkin I. Statistical methods for meta-analysis. London: Academic Press; 1985.

d. Hedges LV. Meta-analysis. J Educ Stat. 1992;17:279–96.

e. Freemantle N, Drummond MF. Should Clinical Trials With Concurrent Economic Analyses Be Blinded? JAMA. 1997;277:63–4. [PubMed: 8980212]

f. Smith J, Channer KS. Increasing prescription of drugs for secondary prevention after MI. BMJ. 1995;311:917–8. [PMC free article: PMC2550918] [PubMed: 7580549]

Bookshelf ID: NBK53742