BMJ. 2003 Apr 12; 326(7393): 816–819.
PMCID: PMC1125721
Improving the quality of health care

Research methods used in developing and applying quality indicators in primary care

S M Campbell, research fellow,a J Braspenning, senior researcher,b A Hutchinson, professor in public health,c and M N Marshall, professor of general practicea

Before we can take steps to improve the quality of health care, we need to define what quality care means. This article describes how to make best use of available evidence and reach a consensus on quality indicators

Quality improvement is part of the daily routine for healthcare professionals and a statutory obligation in many countries. Quality can be improved without measuring it—for example, by guiding care prospectively in the consultation using clinical guidelines.1 It is also possible to assess quality without quantitative measures, by using approaches such as peer review, videoing consultations, and patient interviews. Measurement, however, plays an important part in improvement.2 We discuss the methods available for developing and applying quality indicators in primary care.

Summary points

  • Most quality indicators are used in hospital practice but they are increasingly being developed for primary care
  • The information required to develop quality indicators can be derived by systematic or non-systematic methods
  • Non-systematic methods are quick and simple but the resulting indicators may be less credible than those developed by using systematic methods
  • Systematic methods can be based directly on scientific evidence or clinical guidelines or combine evidence and professional opinion
  • All measures should be tested for acceptability, feasibility, reliability, sensitivity to change, and validity

What are quality indicators?

Indicators are explicitly defined and measurable items referring to the structures, processes, or outcomes of care.3 Indicators are operationalised through review criteria and standards, but the three are not synonymous; indicators also differ from guidelines (box B1). Care rarely meets absolute standards,5 and standards have to be set according to local context and patient circumstances.6,7

Box B1: Definitions and examples of guidelines, indicators, review criteria, and standards

Activity indicators measure how frequently an event happens, such as the rate of influenza immunisation. In contrast, quality indicators infer a judgment about the quality of care provided,6 and performance indicators8 are statistical devices for monitoring performance (such as use of resources) without any necessary inference about quality. Indicators do not provide definitive answers but indicate potential problems or good quality of care. Most indicators have been developed for use in hospitals but they are increasingly being developed for use in primary care.

Principles of development

Three preliminary issues require consideration when developing indicators. The first is which aspects of care to assessw1 w2: structures (staff, equipment, appointment systems, etc),w3 processes (such as prescribing, investigations, interactions between professionals and patients),9 or outcomes (such as mortality, morbidity, or patient satisfaction).w4 Our focus is on process indicators, which have been the primary object of quality assessment and improvement.2,10 The second issue is that stakeholders have different perspectives about quality of care.2 w5 For example, patients often emphasise good communication skills, whereas managers' views are often influenced by data on efficiency. It is important to be clear which stakeholder views are being represented when developing indicators. Finally, development of indicators requires supporting information or evidence. This can be derived by systematic or non-systematic methods.

Non-systematic research methods

Non-systematic approaches are not evidence based, but indicators developed in this way can still be useful, not least because they are quick and easy to create. One example is a quality improvement project based on a single case study, such as a termination of pregnancy in a 13 year old girl.11,12 Examination of her medical records showed two occasions when contraception could have been discussed, and this led to the development of a quality indicator relating to contraceptive counselling.

Systematic, evidence based methods

Whenever possible, indicators should be based solely on scientific evidence such as rigorously conducted (trial based) empirical studies.13,14 The better the evidence, the stronger the benefits of applying the indicators in terms of reduced morbidity and mortality. An example of an evidence based indicator is that patients with confirmed coronary artery disease should receive low dose (75 mg) aspirin unless contraindicated, as aspirin is associated with health benefits in such patients.

Systematic methods combining evidence and expert opinion

Many areas of health care have a limited or methodologically weak evidence base,2,6,15 especially within primary care. Quality indicators therefore have to be developed using other evidence alongside expert opinion. However, because experts often disagree on the interpretation of evidence, rigorous methods are needed to incorporate their opinion.

Consensus methods are structured facilitation techniques that explore consensus among a group of experts by synthesising opinions. Group judgments are preferable to individual judgments, which are prone to personal bias. Several consensus techniques exist,16–19 including consensus development conferences,17 w6 the Delphi technique,w7 w8 the nominal group technique,w9 the RAND appropriateness method,20 w10 and iterated consensus rating procedures (table).21

Consensus development conferences

In this technique, a selected group of about 10 people are presented with evidence by interested individuals or organisations that are not part of the decision making group. The selected group discusses this evidence and produces a consensus statement.w11 However, unlike the other techniques, these conferences use implicit methods for aggregating the judgments of individuals (such as majority voting). Explicit techniques use aggregation methods in which panellists' judgments are combined using predetermined mathematical rules, such as the median of individual judgments.17 Moreover, although these conferences provide a public forum for debate, they are expensive16 and there is little evidence of their effect on clinical practice or patient outcomes.w12
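The distinction between implicit and explicit aggregation can be sketched in code. This is a minimal illustration, not a method from the article: the 1–9 rating scale and the threshold of 7 are assumptions chosen for the example.

```python
# Minimal sketch contrasting the two aggregation approaches described above.
# The 1-9 rating scale and threshold are illustrative assumptions.
from statistics import median

def implicit_majority_vote(votes):
    """Implicit aggregation: an indicator is accepted if most panellists endorse it."""
    return sum(votes) > len(votes) / 2  # votes are True/False endorsements

def explicit_median_rating(ratings, threshold=7):
    """Explicit aggregation: combine 1-9 ratings by a predetermined mathematical
    rule -- here, the panel median must reach a threshold."""
    return median(ratings) >= threshold

print(implicit_majority_vote([True, True, False]))  # True
print(explicit_median_rating([8, 9, 7, 6, 8]))      # median 8 -> True
```

The point of the explicit rule is that the outcome is fully determined by the panellists' numbers, not by the dynamics of a show-of-hands vote.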

Indicators derived from guidelines by iterated consensus rating procedure

Indicators can be based on clinical guidelines.w13 w14 Review criteria derived directly from clinical guidelines are now part of NHS policy in England and Wales through the work of the National Institute for Clinical Excellence. One example is the management of type 2 diabetes.w15 Iterated consensus rating is the most commonly used method in the Netherlands,w13 w16 where indicators are based on the effect of guidelines on outcomes of care rated by expert panels and lay professionals.w17

Delphi technique

The Delphi technique is a postal method involving two or more rounds of questionnaires. Researchers clarify a problem, develop questionnaire statements to rate, select panellists to rate them, conduct anonymous postal questionnaires, and feed back results (statistical, qualitative, or both) between rounds. It has been used to develop prescribing indicators.w18 A large group can be consulted from a geographically dispersed population, although different viewpoints cannot be debated face to face. Delphi procedures have also been used to develop quality indicators with users or patients.w19
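The statistical feedback step between Delphi rounds can be sketched as follows. The statements, ratings, and 1–9 scale are invented for illustration; real studies vary in which summary statistics they feed back.

```python
# Hypothetical sketch of between-round Delphi feedback: for each statement,
# summarise the panel's ratings so each panellist can compare their own score
# with the group's before re-rating. All data here are invented.
from statistics import quantiles

round1_ratings = {
    "Record smoking status for new patients": [7, 8, 9, 6, 8, 9],
    "Review repeat prescriptions annually": [4, 9, 2, 8, 5, 7],
}

for statement, ratings in round1_ratings.items():
    q1, q2, q3 = quantiles(ratings, n=4)  # quartiles of the panel's ratings
    print(f"{statement}: median={q2}, IQR={q1}-{q3}")
```

A wide interquartile range flags a statement on which the panel is divided, which is exactly the information a second round is meant to resolve.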

Nominal group technique

The nominal group technique aims to structure interaction within a group of experts.16,17 w9 The group members meet and are asked to suggest, rate, or prioritise a series of questions, discuss the questions, and then re-rate and prioritise them. The technique has been used to assess the appropriateness of clinical interventionsw20 and to develop clinical guidelines.w21 This technique has not been used to develop quality indicators with patients, although it has been used to determine patients' views of, for example, diabetes.w22

RAND appropriateness method

The RAND method requires a systematic literature review for the condition to be assessed, generation of indicators based on this literature review, and the selection of expert panels. This is followed by a postal survey, in which panellists are asked to read the evidence and rate the preliminary indicators, and a face to face panel meeting, in which panellists discuss and re-rate each indicator.w10 The method therefore combines characteristics of both the Delphi and nominal group techniques. It has been described as the only systematic method of combining expert opinion and evidence.w23 It also incorporates a rating of the feasibility of collecting data.

The method has been used mostly to develop review criteria for clinical interventions in the United Statesw24 and the United Kingdom.7 w25 As with the nominal group technique, panellists meet and discuss the criteria, but because panellists have access to a systematic literature review, they can ground their ratings in the scientific evidence. Agreement between similar panels rating the same indicators has been found to have greater reliability than the reading of mammograms.w10 However, users or patients are rarely included, and the cost implications are not considered.
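A RAND-style rating rule can be sketched in code: each panellist scores an indicator 1–9, the panel median places it in a band, and ratings spread across both extremes signal disagreement. The exact disagreement definition varies between studies, so the version below (at least two ratings in each extreme third) is an illustrative assumption, not the published rule.

```python
# Simplified sketch of a RAND-style classification of panel ratings (1-9).
# The disagreement rule used here is an assumption for illustration.
from statistics import median

def classify(ratings):
    low = sum(1 for r in ratings if r <= 3)   # ratings in the bottom third
    high = sum(1 for r in ratings if r >= 7)  # ratings in the top third
    if low >= 2 and high >= 2:
        return "uncertain (disagreement)"
    m = median(ratings)
    if m >= 7:
        return "appropriate"
    if m <= 3:
        return "inappropriate"
    return "uncertain"

print(classify([8, 9, 7, 8, 7, 9, 8, 7, 8]))  # appropriate
```

Because the rule is explicit, two panels given the same evidence can be compared directly, which is what underlies the reliability findings mentioned above.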


Maximising effectiveness

Several factors affect the outputs derived using consensus techniques.19 These include:

  • Selection of participants (number, level of homogeneity, etc)
  • How the information is presented (for example, level of evidence)
  • How the interaction is structured (for example, number of rounds)
  • Method of synthesising individual judgments (for example, definition of agreement)
  • Task set (for example, questions to be rated)

The composition of the group is particularly important. For example, group members who are familiar with a procedure are more likely to rate it higher.w26 The feedback provided to panellists is also important.w27

Group meetings rely on skilled moderators and on the willingness of the group to work together in a structured meeting. Unlike postal surveys, group meetings can inhibit some members if they feel uncomfortable sharing their ideas, although panellists' ratings carry equal weight, however much they have contributed to the debate. Panels for group meetings are smaller than Delphi panels for practical reasons.

Research methods for applying indicators

Measures developed by consensus techniques have face validity, and those based on rigorous evidence possess content validity. This is a minimum prerequisite for any quality measure. All measures have to be tested for acceptability, feasibility, reliability, sensitivity to change, and validity.3,22 This can be done by assessing measures' psychometric properties (including factor analyses), by surveys of patients or practitioners (or both), by clinical or organisational audits, and by interviews or focus groups. Box B2 gives an example of the development and testing of review criteria for angina, asthma, and diabetes.9,23

Box B2: Developing and applying review criteria for angina, asthma, and type 2 diabetes


Acceptability

The acceptability of the data collected depends on whether the findings are acceptable to both those being assessed and their assessors. For example, doctors and nurses can be asked about the acceptability of review criteria being used to assess their quality of care.


Feasibility

Information about quality of care is often driven by availability of data.w28 Quality is difficult to measure without accurate and consistent information,w1 which is often unavailable at both the macro (health organisations) and micro (individual medical records) level.w29 Quality indicators must also relate to enough patients to make comparing data feasible—for example, by excluding those aspects of care that occur in less than 1% of clinical audit samples.


Reliability

Reliability refers to the extent to which a measurement with an indicator is reproducible. This depends on several factors relating to both the indicator itself and how it is used. For example, indicators should be used to compare organisations or practitioners with similar organisations or practitioners. Inter-rater reliability refers to the extent to which two independent raters agree on their measurement of an item of care.22 In one study, five diabetes criteria out of 31 developed using an expert panel9 were found to have poor agreement between raters when used in an audit.23
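Inter-rater agreement of the kind described above is commonly quantified with Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below uses invented audit data (two raters judging whether each record meets a review criterion); it is an illustration of the statistic, not of the cited study.

```python
# Cohen's kappa for two raters making binary judgments ("criterion met" = 1).
# The audit data below are invented for illustration.
def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal proportion of "yes" judgments
    pa = sum(rater_a) / n
    pb = sum(rater_b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    return (observed - expected) / (1 - expected)

a = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # rater A: criterion met per record
b = [1, 1, 0, 1, 1, 1, 1, 0, 1, 0]  # rater B: same records, independent audit
print(round(cohens_kappa(a, b), 2))  # prints 0.52
```

A kappa near zero despite high raw agreement is exactly the "poor agreement between raters" problem: when most records meet a criterion, two raters can agree often purely by chance.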

Sensitivity to change

Quality measures need to detect changes in quality of care in order to discriminate between and within subjects.22 This is an important and often forgotten dimension of a quality indicator.6 Little research is available on sensitivity to change of quality indicators using time series or longitudinal analyses.


Validity

Content validity in this context refers to whether any criteria were rated valid by panels contrary to known results from randomised controlled trials.w30 The validity of indicators has received more attention recently.3 w2 w31 Although little evidence exists of the content validity of the Delphi and nominal group techniques in developing quality indicators,16 there is some evidence of validity for indicators developed with the RAND method.w30 There is also evidence of the predictive validity of indicators developed with the RAND method.w32


Conclusion

Although it may never be possible to produce an error-free measure of quality, measures should be tested during their development and application for acceptability, feasibility, reliability, sensitivity to change, and validity. This will optimise their effectiveness in quality improvement strategies. Indicators are more likely to be effective if they are derived from rigorous scientific evidence. Because evidence in health care is often unavailable, consensus techniques facilitate quality improvement by allowing a broader range of aspects of care to be assessed and improved.7 However, simply measuring something will not automatically improve it, and indicators must be used within quality improvement approaches that focus on whole healthcare systems.24

Table: Characteristics of informal and formal methods for developing consensus



This is the second of three articles on research to improve the quality of health care

Competing interests: None declared.

Further references are available on bmj.com. These are denoted in the text by the prefix w


1. Grimshaw JM, Russell IT. Effect of clinical guidelines on medical practice: a systematic review of rigorous evaluations. Lancet. 1993;342:1317–1322.
2. Donabedian A. Explorations in quality assessment and monitoring. 1. The definition of quality and approaches to its assessment. Ann Arbor, MI: Health Administration Press; 1980.
3. McGlynn EA, Asch SM. Developing a clinical performance measure. Am J Prev Med. 1998;14:14–21.
4. Donabedian A. Explorations in quality assessment and monitoring. 2. The criteria and standards of quality. Ann Arbor, MI: Health Administration Press; 1982.
5. Seddon ME, Marshall MN, Campbell SM, Roland MO. Systematic review of studies of clinical care in general practice in the United Kingdom, Australia and New Zealand. Quality in Health Care. 2001;10:152–158.
6. Lawrence M, Olesen F. Indicators of quality health care. Eur J Gen Pract. 1997;3:103–108.
7. Marshall M, Campbell SM, Hacker J, Roland MO, editors. Quality indicators for general practice: a practical guide for health professionals and managers. London: Royal Society of Medicine; 2002.
8. Buck D, Godfrey C, Morgan A. Performance indicators and health promotion targets. York: Centre for Health Economics, University of York; 1996. (Discussion paper 150.)
9. Campbell SM, Roland MO, Shekelle PG, Cantrill JA, Buetow SA, Cragg DK. Development of review criteria for assessing the quality of management of stable angina, adult asthma and non-insulin dependent diabetes in general practice. Quality in Health Care. 1999;8:6–15.
10. Brook RH, McGlynn EA, Shekelle PG. Defining and measuring quality of care: a perspective from US researchers. Int J Qual Health Care. 2000;12:281–295.
11. Pringle M. Preventing ischaemic heart disease in one general practice: from one patient, through clinical audit, needs assessment, and commissioning into quality improvement. BMJ. 1998;317:1120–1124.
12. Pringle M. Clinical governance in primary care. Participating in clinical governance. BMJ. 2000;321:737–740.
13. Hearnshaw HM, Harker RM, Cheater FM, Baker RH, Grimshaw GM. Expert consensus on the desirable characteristics of review criteria for improvement of health quality. Quality in Health Care. 2001;10:173–178.
14. McCall A, Roderick P, Gabbay J, Smith H, Moore M. Performance indicators for primary care groups: an evidence-based approach. BMJ. 1998;317:1354–1360.
15. Naylor CD. Grey zones in clinical practice: some limits to evidence based medicine. Lancet. 1995;345:840–842.
16. Jones JJ, Hunter D. Consensus methods for medical and health services research. BMJ. 1995;311:376–380.
17. Murphy MK, Black NA, Lamping DL, McKee CM, Sanderson CFB, Ashkam J, et al. Consensus development methods, and their use in clinical guideline development. Health Technol Assess. 1998;2(3).
18. Fink A, Kosecoff J, Chassin M, Brook RH. Consensus methods: characteristics and guidelines for use. Am J Pub Health. 1984;74:979–983.
19. Black N, Murphy M, Lamping D, McKee M, Sanderson C, Ashkam J, et al. Consensus development methods: a review of best practice in creating clinical guidelines. Journal of Health Services Research and Policy. 1999;4:236–248.
20. Brook RH, Chassin MR, Fink A, Solomon DH, Kosecoff J, Park RE. A method for the detailed assessment of the appropriateness of medical technologies. International Journal of Technology Assessment in Health Care. 1986;2:53–63.
21. Braspenning J, Drijver R, Schiere AM. Kwaliteits- en doelmatigheidsindicatoren voor het handelen in de huisartspraktijk [Quality and efficiency indicators for general practice]. Nijmegen, Utrecht: Centre for Quality of Care Research, Dutch College of General Practitioners; 2001.
22. Streiner DL, Norman GR. Health measurement scales: a practical guide to their development and use. Oxford: Oxford Medical Publications; 1995.
23. Campbell SM, Hann M, Hacker J, Roland MO. Quality assessment for three common conditions in primary care: validity and reliability of review criteria developed by expert panels for angina, asthma and type 2 diabetes. Quality and Safety in Health Care. 2002;11:125–130.
24. Ferlie EB, Shortell SM. Improving the quality of health care in the United Kingdom and the United States: a framework for change. Milbank Q. 2001;79:281–315.
