BMJ. 2002 Aug 10; 325(7359): 327–330.
PMCID: PMC1123834

Removal of radiation dose response effects: an example of over-matching

J L Marsh, PhD student; J L Hutton, senior lecturer; and Keith Binks, research group manager

Over-matching can be a design fault in case-control studies and may lead to bias. J L Marsh and colleagues describe a case-control study of workers at a nuclear reprocessing plant in which over-matching obscured the relation between radiation exposure and mortality from leukaemia.

This paper describes the rationale of a case-control study, explains the process of stratification through matching, and summarises the mechanism of confounding. We carried out a matched case-control study of workers at BNFL (British Nuclear Fuels) to clarify the results of past cohort studies, which had found a significant association between risk of leukaemia and cumulative external radiation dose. Fitted models from the current study contradicted these results. After examining the relation between the matching factors and the dose, we suggest that over-matching is a cause of the contradiction.

Summary points

  • Case-control studies are useful in examining the epidemiology of rare diseases
  • They can be biased through a design fault called over-matching
  • The current case-control study of Sellafield workers illustrates over-matching and shows the dependence of conclusions on the correct design


Matching in case-control studies

In a case-control study, individuals with the disease of interest (cases) are taken from the population, together with a random sample of the remaining, healthy individuals (controls). Exposure of the two groups to the risk factors in question is examined. If a significant difference in exposure history is found, it can be inferred that these risk factors have some association with the probability of disease.
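With a binary exposure, the comparison between cases and controls is usually summarised by the odds ratio from a 2 × 2 table. This odds ratio estimates the relative risk when the disease is rare. The following is a minimal Python sketch for an unmatched design. The function name and the counts are hypothetical and are not taken from this study:

```python
def odds_ratio(exposed_cases: int, unexposed_cases: int,
               exposed_controls: int, unexposed_controls: int) -> float:
    """Cross-product odds ratio (a*d)/(b*c) from an unmatched 2 x 2 table."""
    if unexposed_cases == 0 or exposed_controls == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Hypothetical counts: 30 of 100 cases and 20 of 100 controls were exposed.
print(odds_ratio(30, 70, 20, 80))  # 12/7, about 1.71
```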

If other factors influence both the probability of disease and the exposure to the risk factors, they can disguise the influence of the risk factors in question. This confounding can be removed by stratification, which must be done carefully: stratifying too finely will cause information to be lost, but stratifying too coarsely will not remove enough confounding. The ratio of cases to controls must remain the same in each stratum, or further confounding will occur as a result of stratification. One solution is to match the controls to the cases so that all individuals in a matched set have the same level of the confounding factor. This simultaneously stratifies the data and equalises the ratio of cases to controls at each level.

However, the relation of confounder to exposure must be carefully examined. True confounding will occur only if the confounder is associated with both the exposure and the disease. If the exposure itself leads to the confounder, or has equal status with it, then stratifying by the confounder will also stratify by the exposure, and the relation of the exposure to the disease will be obscured. This is called over-matching and leads to biased estimates of risk (fig 1).1
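The Mantel-Haenszel summary odds ratio can illustrate how stratification removes confounding. It is a standard stratified estimator, used here only for illustration; this study itself used conditional logistic regression. In the hypothetical unmatched strata below, the stratifying factor is associated with both exposure and case status. The odds ratio is 1 within each stratum, but collapsing the strata into one table gives a spurious 2.25:

```python
from typing import Iterable, Tuple

# Each stratum is (a, b, c, d):
#   a = exposed cases,    b = unexposed cases,
#   c = exposed controls, d = unexposed controls
Stratum = Tuple[int, int, int, int]


def crude_odds_ratio(strata: Iterable[Stratum]) -> float:
    """Odds ratio after collapsing all strata into a single 2 x 2 table."""
    a = b = c = d = 0
    for sa, sb, sc, sd in strata:
        a, b, c, d = a + sa, b + sb, c + sc, d + sd
    return (a * d) / (b * c)


def mantel_haenszel_or(strata: Iterable[Stratum]) -> float:
    """Mantel-Haenszel summary odds ratio: sum(a*d/n) / sum(b*c/n) over strata."""
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den


# Hypothetical strata of a confounder; within each stratum the odds ratio is 1.
strata = [(40, 10, 20, 5), (5, 20, 10, 40)]
print(crude_odds_ratio(strata))    # 2.25, distorted by the confounder
print(mantel_haenszel_or(strata))  # 1.0 after stratification
```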

Figure 1
Two ways that over-matching might occur. In the upper part of the diagram, the confounding factor is part of the overall link between exposure and disease; in the lower part, the exposure and confounder have equal status in their relation to disease. ...

Radiation and leukaemia

The disease of interest is leukaemia, and the risk factor in question is cumulative occupational external radiation dose.

Smith and Douglas analysed the mortality of the cohort of workers at Sellafield who had first been employed before 1976, to examine the effects of occupational exposure to radiation.2 They followed up this cohort to the end of 1983 using the NHS Central Registers held by the Office for National Statistics (formerly the Office of Population Censuses and Surveys). They examined the relation between dose and cause-specific mortality to investigate the hypothesis that dose was positively associated with risk. They found a non-significant positive association between cumulative radiation dose and mortality from leukaemia when mortality was related to the dose accumulated up to 15 years before death, to allow for a latency period.

Douglas et al3 extended the follow-up period to the end of 1988, using a slightly different definition of leukaemia from that in the previous paper. This study also found a positive association between external dose and leukaemia mortality, which was now significant for all the latency periods considered.

Omar et al then extended the follow-up period to the end of 1992. They also examined mortality and morbidity in plutonium workers at Sellafield, comparing them with other workers at the plant.4 Among radiation workers, they found significant positive associations between leukaemia mortality and dose for latency periods of zero and two years. Both trends were strengthened when deaths from chronic lymphatic leukaemia (a leukaemia thought not to be induced by radiation) were excluded.5

The individual measurements of dose recorded from the film badges worn by workers are now computerised for the cases and their matched controls. Dose records were kept if there was reason to believe that exposure to ionising radiation in the course of work would occur. The purpose of the dose record was to ensure compliance with health and safety regulations, rather than for epidemiological purposes. We aimed in our case-control study to examine the influence of variation in the annual dose summaries on the calculation of the risk estimates.

Method: analysis of a matched case-control study

A 1:J matched case-control study involves I matched sets (i = 1, ..., I), each containing 1 case and J controls. We considered a scalar exposure x. We generalised the logistic model to include a set effect in the intercept term, so the fitted model is as follows:

$$\operatorname{logit} P(Y_{ij} = 1 \mid x_{ij}) = \alpha_i + \beta x_{ij},$$

where $Y_{ij} = 1$ for the case in the $i$th set ($j = 0$) and $Y_{ij} = 0$ for control $j$ in the $i$th set ($j = 1, \ldots, J$).

The $\alpha_i$ are essentially nuisance parameters. They may be eliminated by considering the conditional likelihood

$$L(\beta) = \prod_{i=1}^{I} \frac{\exp(\beta x_{i0})}{\sum_{j=0}^{J} \exp(\beta x_{ij})},$$

and point estimates of the relative risk associated with exposure can be obtained by exponentiating the estimate of $\beta$.1
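The conditional likelihood can be maximised directly. The following minimal Python sketch is an illustration only, not the SAS PROC PHREG fit used in the study. Each matched set is a hypothetical list of exposures, with the case first (j = 0). β is found by Newton-Raphson using the following quantities:

* the score, Σ_i (x_i0 − m_i(β));
* the information, Σ_i v_i(β).

Here m_i and v_i are the mean and variance of exposure within set i, weighted by exp(βx_ij). If the case has the highest (or lowest) exposure in every set, no finite maximum exists and the iterations will not converge.

```python
import math
from typing import List, Sequence, Tuple


def _set_terms(beta: float, xs: Sequence[float]) -> Tuple[float, float, float]:
    """Log-sum-exp of beta*x over one set, and the exp(beta*x)-weighted mean and variance of x."""
    shift = max(beta * x for x in xs)  # subtract the maximum for numerical stability
    weights = [math.exp(beta * x - shift) for x in xs]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, xs)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, xs)) / total
    return shift + math.log(total), mean, var


def cond_loglik(beta: float, sets: List[Sequence[float]]) -> float:
    """Conditional log-likelihood; each set lists the case's exposure first (j = 0)."""
    return sum(beta * xs[0] - _set_terms(beta, xs)[0] for xs in sets)


def _score_info(beta: float, sets: List[Sequence[float]]) -> Tuple[float, float]:
    score = info = 0.0
    for xs in sets:
        _, mean, var = _set_terms(beta, xs)
        score += xs[0] - mean  # case's exposure minus its weighted mean within the set
        info += var            # weighted within-set variance of exposure
    return score, info


def fit_beta(sets: List[Sequence[float]], beta: float = 0.0,
             tol: float = 1e-10, max_iter: int = 100) -> Tuple[float, float]:
    """Newton-Raphson estimate of beta and its standard error 1/sqrt(information)."""
    for _ in range(max_iter):
        score, info = _score_info(beta, sets)
        if info < 1e-12:
            raise ValueError("information is (near) zero: beta is not estimable from these sets")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = _score_info(beta, sets)
    return beta, 1.0 / math.sqrt(info)


# Hypothetical 1:1 pairs with a binary exposure: the case is the exposed member
# in two discordant pairs and the unexposed member in one.
sets = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
beta_hat, se = fit_beta(sets)
print(beta_hat, math.exp(beta_hat))  # close to log(2) and 2
```

For these three hypothetical pairs the estimate is log 2. This is the familiar matched-pair odds ratio of 2, given by the ratio of discordant pairs (2:1).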

Models fitted

We discuss here only a subset (namely, from the Sellafield site) of the total case-control study population. This subset contains 37 cases who had died from leukaemia by the start of the study. It also contains two sets of 148 controls (one fully matched and one partially matched), each matched to the cases at a ratio of 1:4. This gives 37 + 2 × 148 = 333 workers in total.

The matching factors were site, sex, work status (office workers v workers handling radioactive material), date of birth (within two years), date of entry (within two years), and case's date of death (control alive at the time).

Six datasets arise from two methods of matching and three time periods. The first three sets were matched on all factors ("fully matched"), and the second three on all factors except date of entry ("partially matched"). We collected the partially matched data because of concern about possible over-matching on date of entry. The controls in the first three sets were therefore not the same as those in the second three sets. For each matching method, the full set contained all 37 cases plus their associated controls. The other two sets contained the cases used in the 1999 (Omar et al4) and 1994 (Douglas et al3) papers, plus their associated controls selected from the relevant available controls.

We fitted conditional logistic models to each of the six sets using SAS PROC PHREG.6 The only covariate used was cumulative lifetime external radiation dose (in sieverts). For controls, this dose was censored at the date of the case's death to ensure that the correct exposure period was used. Table 1 shows the results for the fully matched data, and table 2 those for the partially matched data. Figure 2 shows the fitted lines for each estimate of β in these tables over the range of radiation doses, for the fully and partially matched datasets.
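The following sketch shows how each control's dose can be censored at the matched case's date of death. The reading format (a list of dated film badge doses in sieverts) and the function names are hypothetical. The optional lag mirrors the latency periods used in the cohort analyses, which related mortality to dose accumulated up to, for example, 15 years before death:

```python
from datetime import date
from typing import Iterable, Tuple


def shift_years(d: date, years: int) -> date:
    """Move a date back by whole years; 29 February falls back to 28 February."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def cumulative_dose(readings: Iterable[Tuple[date, float]],
                    index_date: date, lag_years: int = 0) -> float:
    """Total recorded dose (Sv) from readings dated on or before the cut-off.

    The cut-off is the matched case's date of death (index_date) moved back by
    lag_years, so the case and its controls share the same exposure window.
    """
    cutoff = shift_years(index_date, lag_years)
    return sum(dose for when, dose in readings if when <= cutoff)


# Hypothetical film badge history for one control
readings = [(date(1970, 6, 30), 0.004), (date(1980, 6, 30), 0.002), (date(1990, 6, 30), 0.001)]
death_of_case = date(1991, 3, 1)
print(cumulative_dose(readings, death_of_case))      # all three readings count
print(cumulative_dose(readings, death_of_case, 15))  # cut-off 1 March 1976: only the 1970 reading
```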

Figure 2
Fitted lines for fully matched and partially matched datasets
Table 1
Conditional logistic regression results for fully matched datasets
Table 2
Conditional logistic regression results for partially matched datasets


For the fully matched data (table 1), the parameter corresponding to cumulative lifetime dose is not significant in any of the three fitted models, even at the 10% level (likelihood ratio P value >0.1). This is not consistent with the results of the two earlier papers, which both found a significant positive association between leukaemia mortality and external dose.3,4

For the partially matched data (table 2), the results from the 1994 dataset show a significant positive association between leukaemia mortality and external dose, at the 0.1% level. The results from the 1999 dataset also show a positive trend that is not quite significant at the 5% level; the results from the full set show a positive trend that is not significant at the 10% level. The effect seems to decrease in magnitude and significance with accumulating data.
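A likelihood ratio P value compares the maximised conditional log-likelihood with its value at β = 0. Twice the difference is referred to a χ² distribution with one degree of freedom. At β = 0, each matched set of J + 1 members contributes −log(J + 1), so for 37 sets matched 1:4 the null value is −37 log 5. The following is a minimal Python sketch; the function names are illustrative:

```python
import math
from typing import Iterable


def null_cond_loglik(set_sizes: Iterable[int]) -> float:
    """Conditional log-likelihood at beta = 0: a set of size J + 1 contributes -log(J + 1)."""
    return -sum(math.log(n) for n in set_sizes)


def lr_test_p_value(loglik_fitted: float, loglik_null: float) -> float:
    """Likelihood ratio test of beta = 0 against a chi-squared distribution with 1 df."""
    statistic = max(0.0, 2.0 * (loglik_fitted - loglik_null))
    # Upper tail of chi-squared(1): P(X > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2))
    return math.erfc(math.sqrt(statistic / 2.0))


null = null_cond_loglik([5] * 37)  # 37 matched sets of 1 case + 4 controls
# A statistic of about 3.84 corresponds to P of about 0.05
print(lr_test_p_value(null + 3.8415 / 2, null))
```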


The results for the fully matched datasets contradict those of previous studies of the cohort of workers at Sellafield. Omar et al4 and Douglas et al3 both found a significant positive association between mortality from leukaemia and external radiation dose. By contrast, the fully matched case-control analyses found no significant association. Gross measurement error in the recorded doses could have distorted the results. This is unlikely, however, because so many studies (both occupational and non-occupational) have shown a positive relation, and examination of the errors has shown them to be reasonably small.

This prompted an examination of the study design, in terms of the matching factors used. Matching on sex and date of birth is necessary because the underlying risk of leukaemia differs between the sexes and changes with age. Matching on date of entry was thought necessary because the risk of leukaemia changes with calendar time, and this matching was intended to remove that effect. However, dose also changes with calendar time, hence the collection of the partially matched data. Figure 3 shows box plots of radiation doses for each "approved dosimetry service year" for the full set of fully matched data. This is a period corresponding approximately to a calendar year, used for operational convenience in issuing radiation film badges.

Figure 3
Box plots of radiation doses for each approved dosimetry service (ADS) year. The annualised dose for an ADS year is the sum of the subject's film badge readings for that ADS year

The general decline in median dose shows that dose and time are associated. The situation seems to be one where dose is partially “explained” by date of entry, both being related to time.
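The following sketch shows how the summaries in figure 3 can be computed. Each worker's film badge readings are summed within an ADS year to give an annualised dose, and the median is then taken across workers for each year. The tuple layout (worker identifier, ADS year, dose in sieverts) and the readings are hypothetical. Workers with no reading in a given year do not contribute to that year:

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Tuple

Reading = Tuple[str, int, float]  # (worker_id, ads_year, dose_sv), hypothetical layout


def annualised_doses(readings: Iterable[Reading]) -> Dict[int, Dict[str, float]]:
    """Sum each worker's film badge readings within each ADS year."""
    totals: Dict[int, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for worker, year, dose in readings:
        totals[year][worker] += dose
    return totals


def median_by_year(readings: Iterable[Reading]) -> Dict[int, float]:
    """Median annualised dose across workers for each ADS year, in year order."""
    per_year = annualised_doses(readings)
    return {year: median(list(per_year[year].values())) for year in sorted(per_year)}


# Hypothetical readings for two workers over two ADS years
readings = [("A", 1960, 0.01), ("A", 1960, 0.02), ("B", 1960, 0.01),
            ("A", 1961, 0.005), ("B", 1961, 0.003)]
print(median_by_year(readings))
```

A falling sequence of yearly medians is the pattern that indicates an association between dose and calendar time.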


The matching on date of birth and date of entry meant that workers of the same age were working at the same times. As well as eliminating the effect of calendar time, this seems to have ensured that workers in the same matched set had broadly similar recorded doses. The apparent over-matching on date of entry has distorted the estimate of the parameter relating leukaemia risk to cumulative dose. It did so by introducing matching, at least partially, on dose.
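The following Python simulation illustrates the loss of information. The data-generating model is entirely hypothetical and is not a reconstruction of the Sellafield data. In this model, recorded dose falls with later date of entry, as figure 3 suggests. Matched sets of five are formed either within two years of a reference date of entry or without regard to entry. The conditional likelihood gains information about β only from differences in dose within sets. At β = 0, that information equals the sum over sets of the within-set (population) variance of dose. The final ratio printed is therefore the fraction of information retained under matching on entry. The sketch shows this loss of information; it does not attempt to reproduce the bias itself.

```python
import random
from statistics import mean, pvariance


def simulated_dose(entry_year: float, rng: random.Random) -> float:
    """Hypothetical recorded dose (Sv) that falls by 0.01 Sv per later year of entry."""
    return max(0.0, 0.5 - 0.01 * (entry_year - 1950) + rng.gauss(0.0, 0.05))


def mean_within_set_variance(match_on_entry: bool, n_sets: int = 200,
                             set_size: int = 5, seed: int = 1) -> float:
    """Average population variance of dose within simulated 1:4 matched sets."""
    rng = random.Random(seed)
    variances = []
    for _ in range(n_sets):
        reference_entry = rng.uniform(1950, 1980)  # set's reference date of entry (hypothetical range)
        doses = []
        for _ in range(set_size):
            if match_on_entry:
                entry = reference_entry + rng.uniform(-2, 2)  # matched within two years
            else:
                entry = rng.uniform(1950, 1980)  # entry unrelated to the reference date
            doses.append(simulated_dose(entry, rng))
        variances.append(pvariance(doses))
    return mean(variances)


unmatched = mean_within_set_variance(match_on_entry=False)
matched = mean_within_set_variance(match_on_entry=True)
print(unmatched, matched, matched / unmatched)
```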

Breslow and Day have said: “Occasions arise in which the association of one factor with disease appears at least partially to be explained by a second factor (associated both with disease and with the first factor), but where the two factors are essentially measuring the same thing, or where the second factor is a consequence of the first.”1

Although over-matching is mentioned in textbooks, substantive examples are not common. We have presented an example in which the fully matched analysis gave results that differed from those of earlier cohort studies of the same workforce. This difference indicated that over-matching was likely to be a problem.


We thank BNFL, which allowed its data to be used in the study, and the Office for National Statistics for information on mortality. We also thank David McGeoghegan of Westlakes Scientific Consulting for his important contribution to the design, the data collection, and the examination of the matching factors; and Steve Whaley, also of Westlakes Scientific Consulting, whose work on the nested case-control analysis identified date of entry as the likely factor causing the over-matching.


Funding: The Engineering and Physical Sciences Research Council (EPSRC), Westlakes Scientific Consulting (CASE studentship).

Competing interests: JLH was an investigator on the EPSRC CASE award, which funded JLM; the CASE partner was Westlakes Scientific Consulting. KB was also an investigator on the EPSRC CASE award and is employed by Westlakes Scientific Consulting, which receives research funding from British Nuclear Fuels to support KB, DMcG, and SW in their work on the case-control study.


1. Breslow NE, Day NE. Statistical methods in cancer research. Volume 1: The analysis of case-control studies. Lyon: IARC Scientific Publications; 1980. p. 94.
2. Smith PG, Douglas AJ. Mortality of workers at the Sellafield plant of British Nuclear Fuels. BMJ. 1986;293:845–854.
3. Douglas AJ, Omar RZ, Smith PG. Cancer mortality and morbidity among workers at the Sellafield plant of British Nuclear Fuels. Br J Cancer. 1994;70:1232–1243.
4. Omar RZ, Barber JA, Smith PG. Cancer mortality and morbidity among plutonium workers at the Sellafield plant of British Nuclear Fuels. Br J Cancer. 1999;79:1288–1301.
5. Committee on the Biological Effects of Ionizing Radiations. Health effects of exposure to low levels of ionizing radiation. Washington, DC: National Academy Press; 1990.
6. SAS/STAT user's guide. Version 8. Cary, NC: SAS Institute; 2000. pp. 2569–2659.
