NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Turner J, Siriwardena AN, Coster J, et al. Developing new ways of measuring the quality and impact of ambulance service care: the PhOEBE mixed-methods research programme. Southampton (UK): NIHR Journals Library; 2019 Apr. (Programme Grants for Applied Research, No. 7.3.)

Developing new ways of measuring the quality and impact of ambulance service care: the PhOEBE mixed-methods research programme.

Identifying potential measures to assess ambulance service performance and quality of care

The overall aim of this first workstream (see Figure 2) was to explore, as broadly as possible, the range of potential measures that might be used to assess ambulance service performance and quality of care and then, through a consensus process, reduce this down to those suitable for further development as risk-adjusted measures. This was achieved using a stepwise process of five different activities:

  1. Two systematic searches and syntheses of the relevant literature to identify candidate measures.
  2. A qualitative study with recent users of the ambulance service to identify which aspects of ambulance service care were important to patients and carers.
  3. A consensus event at which we presented the outputs from steps 1 and 2 to a group of people representing different interests in ambulance service care and asked participants to rate the importance of the potential measures.
  4. The highest-scoring measures from the consensus event were then developed into more detailed measures and a Delphi survey and a patient and public involvement (PPI) event were conducted to further rate and prioritise them.
  5. A review and assessment of the results from step 4 by the programme steering group to identify the final small set of measures for development in workstream 3.

Systematic searches and review of related research evidence

We conducted two systematic searches to review, assess and synthesise the research literature for existing and potential process and patient outcome measures for prehospital care. These were not conventional systematic reviews in that we were not appraising evidence of the effects of prehospital care. The aim was to identify all measures that had been used to assess the impact, quality and safety of prehospital care as well as potential and as yet untested measures, using systematic searching and evidence synthesis strategies. We conducted two reviews so that we could examine both policy literature and primary research evidence.

Review 1: documentary analysis of policy documents

The first review was designed to identify actual and aspirational quality and performance measures of ambulance and prehospital care. We used a comprehensive search strategy to search four electronic databases: MEDLINE, Scirus, Scopus and Google Scholar (see Appendix 1, Table 15). We also searched relevant websites such as the Department of Health and Social Care,19 National Association of Emergency Medical Services Physicians20 and NHS Confederation.21 We supplemented the searches with our own extensive archive from previous related research studies. Any policy documents produced by national, regional or professional organisations or agencies were included, but these were limited to those in the English language published between 2000 and 2011 to ensure relevance. Searches were conducted in August 2011. The results of the searches are given in Figure 3.

FIGURE 3. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow chart for review 1. Reproduced from Moher et al. © 2009 Moher et al. This is an open-access article distributed under the terms of the Creative Commons Attribution ...

References were screened by six members of the research team. We screened 319 potential references, assessed 72 full-text papers and included 36 documents.

Double data extraction of included references was carried out by the same six researchers, and the measures identified were classified using two established frameworks of health-care quality: structure, process and outcome;23 and timeliness, efficiency, effectiveness, safety, patient centredness and equity.24

Of the included references, the majority were discussion documents. Some were specific to ambulance services (also known as EMS), setting out the case for the inadequacies of using response times as a performance measure and the need to find alternatives, but stopping short of providing specific alternatives. Others were strategy documents for the management of specific conditions (primarily stroke, coronary heart disease or major trauma) that contained a section on prehospital management with suggestions for potential quality measures, for example time from a call to arriving at a specialist unit for stroke patients.

The documents describing performance measures in use, or suggesting measures, were unsurprisingly dominated by time measures. After time measures, the most common measures were also process related, mainly recording what ambulance clinicians did to either assess patients or provide treatments. Service measures included types of response (e.g. if a paramedic was sent), how accurately a clinical problem was identified at the time of the emergency call and how calls were managed (e.g. the proportions managed at home and taken to hospital). There were few examples of patient outcome measures and this category was dominated by survival from cardiac arrest. Several documents supported the need to measure patient experience and satisfaction, and some identified relief of symptoms, such as pain, as important, but there were no examples of methods to do this routinely through quality measures.

Review 2: systematic literature search and synthesis of primary research studies

For the second review, we conducted a systematic literature search and synthesis of longitudinal studies, audits and evaluations of ambulance services (or EMS) performed at a local, regional or national level. The aim was to identify potential performance and quality measures that may have been used to assess differences between, or change in, the delivery of ambulance service care in primary research projects. Some of these may not have been considered as routine measures but could potentially be developed and adapted for this purpose.

We conducted a systematic search of five electronic databases: MEDLINE, EMBASE, CINAHL (Cumulative Index to Nursing and Allied Health Literature), ISI Web of Science and The Cochrane Library (see Appendix 1, Table 16). This was supplemented with references identified in review 1, hand-searching of included studies and articles from our own relevant archive. Any relevant research study that had investigated ambulance service delivery and care from a service perspective and incorporated some measurement of change was included.

We were aware that there was an enormous amount of related research literature for specific patient groups, particularly cardiac arrest and trauma, which could potentially overwhelm our search. Much of this research is about the clinical management of patients and its effectiveness. We therefore excluded studies for which the primary aim was to assess a specific clinical intervention or if it was a descriptive study. The key focus here was comparative research and the measurement of change. Included studies were limited to those in the English language and published between 2000 and 2011. Searches were conducted in October 2011.

As in review 1, references were screened by six members of the research team. For included studies, data extraction was completed using a two-stage process. As review 2 yielded a larger number of studies than review 1, double data extraction was carried out for a 10% random sample of references. For the first stage, six reviewers extracted descriptive information on study aim, population, setting and the main process or outcome measures used. For the second stage, three reviewers carried out a more detailed data extraction on each measure identified including the type of measure, how it was measured and how it was reported.
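The 10% random sample used for double data extraction can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the reference identifiers, the fixed seed and the choice to round the sample size up are all hypothetical, and the report does not describe how the sample was actually drawn. The figure of 139 references is the number of included references reported for review 2.

```python
import math
import random


def double_extraction_sample(reference_ids, fraction=0.10, seed=2011):
    """Draw a reproducible random sample of references for double extraction.

    The seed and the rounding-up rule are assumptions made for this sketch.
    """
    rng = random.Random(seed)  # local generator, so global state is untouched
    sample_size = math.ceil(len(reference_ids) * fraction)
    return sorted(rng.sample(reference_ids, sample_size))


# Hypothetical identifiers standing in for the 139 included references.
included_references = [f"ref-{i:03d}" for i in range(1, 140)]

sample = double_extraction_sample(included_references)
print(f"{len(sample)} of {len(included_references)} references sampled")
```

Fixing the seed makes the selection repeatable, which would allow a second reviewer to confirm which references fell into the sample.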

We screened 5088 references by title and abstract, reviewed 257 full-text references and included 139 references. We identified 136 different measures that were recorded 483 times in the included studies. The results of the searches are given in Figure 4.

FIGURE 4. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow chart for review 2. Reproduced from Moher et al. © 2009 Moher et al. This is an open-access article distributed under the terms of the Creative Commons Attribution ...

We classified measures into three broad groups: service (operational) measures, patient management measures and patient outcomes (see Appendix 1, Boxes 5–7). The largest group was service measures (41%), which mainly included a large number of time-interval measurements. This group also included call handling, skill level of response and type of response (e.g. transported or not transported). The patient management group accounted for 29% of measures and these were mainly concerned with clinical procedures and interventions, such as assessing symptoms and condition and treatment provided (e.g. drugs given, splinting, defibrillation and oxygen therapy).

It also included decisions about where to take patients (i.e. to the nearest ED or to a specialist hospital) and subsequent hospital measures, such as length of stay or where a patient was discharged to. The third group of patient outcomes included 30% of the identified measures but this was dominated by the single measure of survival (or mortality), which accounted for more than half of the measures. The reason for this is that there were many different end points for measuring survival ranging from < 1 day up to 5 years. A small number of functional measures were recorded, which included quality-of-life measures, physical disability and cognitive (brain) function. Some examples of the types of measures included in each category are provided in Table 1.

TABLE 1. Example measures identified from systematic evidence review

Example search strategies for each review and a table of the categorisation of the 136 included measures are provided in Appendix 1. Further details on the findings and individual measures identified in the two reviews, and the final list of combined measures identified from both reviews, are provided in a supplementary file available at www.sheffield.ac.uk/scharr/sections/hsr/mcru/phoebe/reports.25

An update of the searches in 2016 did not reveal any new measures that need to be considered. In our original proposal we said that we would conduct a third systematic review of any tools or instruments that we identified that had been specifically constructed to measure performance for prehospital care. However, we found no relevant instruments and the only composite measures were already validated general tools to assess, for example, quality of life [EuroQol-5 Dimensions (EQ-5D)] or functional outcome (Glasgow Outcome Score); therefore, this review was not conducted. The results of both reviews were pooled and a list of potential measures was constructed using the three broad groups described above. This list provided the focus for the next stage of workstream 1 – the consensus event to begin identifying and prioritising measures for further development.

Consensus event

The next step in workstream 1 was to begin to reduce the number of potential measures identified in the evidence reviews by making an assessment of their importance and beginning the process of prioritising which of these might be suitable for further development.

Performance and quality measures can serve different purposes for different groups of people. For services, this can be how well they are providing a timely and appropriate response to people who request their help and whether or not they are providing the best clinical care determined by current best practice. For commissioners, these measures may help to judge whether or not a local service is performing well and where improvement is needed, which may require additional support and investment. For patients, these measures should provide some indication of what type of service they are likely to receive and whether or not this meets their expectations. For policy-makers and government, these measures should provide an overview of the current level of service provision, whether or not this is consistent with expected standards and to what extent there is variation in different parts of the country. A good set of performance and quality measures should then, as far as possible, be relevant to different groups that have a legitimate interest in how well ambulance services are being delivered. This means that we had to try to find a set of measures that were agreed as important and relevant by the different groups that would potentially be the end users of these measures.

Our first step was to hold a consensus event that brought together different groups of people (i.e. ambulance operational and clinical staff, commissioners, patients and the public, emergency care clinicians, policy-makers and academics) to discuss, assess and rate the potential measures identified by the reviews. We also wanted to include measures that were important to patients but that may not have been identified by the reviews. We conducted a separate interview study with recent users of the ambulance service (see Patient and carer views of ambulance service care) at around the same time and, although this was still in progress, there were some emerging themes that we were able to include in the consensus event. To supplement this, we also conducted a small focus group with 10 patient and public participants immediately before the consensus event. Participants in this focus group described their experiences and expectations of ambulance service care and identified a small number of important factors that were added to the list of measures for discussion.

We held the consensus event over 1 day in July 2012. The event was attended by 42 people (excluding the research team) who represented ambulance services, emergency medicine clinicians, patient and public representatives, commissioners, policy-makers and academics. Participants were mainly from the UK, but there were three international attendees.

The reviews identified a large number of potential measures. To make the process of prioritising more manageable for a 1-day event we did two things:

  1. We reduced some measures to a single principle rather than all of the possible options. An example was ‘survival’, which we kept as a single measure rather than providing all of the different time cut-off points identified in the review. Overall, 42 measures (excluding time measures) were presented.
  2. There were a large number of time measures and we did not want these to dominate the discussions to the detriment of other potential measures. We did recognise that time is an important factor, not just in terms of outcome for a small group of patients but more generally as a patient expectation. We therefore conducted a separate exercise in which all 28 time measures were listed in a spreadsheet and participants were asked, by e-mail, to rate how important they thought each was. The results of this exercise were subsequently combined with the ratings of the other measures discussed at the consensus event.

On the day we assessed the potential measures in two ways:

  1. Participants were randomly allocated to small groups. Using a nominal group method each group was provided with a list of potential measures with explanatory notes and allowed to discuss and share their opinions about the measures presented. There was also the opportunity for participants to add their own ideas. Each group was facilitated by a member of the research team and participants could add notes to the list.
  2. After the discussions, each measure was presented to the whole group as a Microsoft PowerPoint® (Microsoft Corporation, Redmond, WA, USA) slide (including any new measures identified) and participants were asked to vote on whether they thought the measure was essential, desirable or irrelevant using a live electronic voting system. This meant that all participants could contribute to discussions but also cast their votes independently and anonymously.

This two-step process was repeated three times, once for each of the main categories of measures (service, patient management and patient outcome). After the event the results of the voting were analysed and the measures ranked according to the proportion voting essential, desirable or irrelevant. This was carried out for each of the three groups and for all measures combined. A full description of the results of the voting for all of the measures considered is provided on the PhOEBE programme’s website.25
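The ranking step described above can be sketched in Python. This is a minimal illustration rather than the analysis actually performed: the measure names and ballots are invented, and the tie-breaking rule (using the proportion voted desirable) is an assumption that the report does not state.

```python
from collections import Counter

CATEGORIES = ("essential", "desirable", "irrelevant")

# Hypothetical anonymous ballots, one list per measure (not real study data).
votes = {
    "Measure A": ["essential", "essential", "desirable", "irrelevant"],
    "Measure B": ["essential", "desirable", "desirable", "desirable"],
    "Measure C": ["essential", "essential", "essential", "desirable"],
}


def vote_proportions(ballots):
    """Return the share of ballots cast for each voting category."""
    counts = Counter(ballots)
    total = len(ballots)
    if total == 0:
        return {category: 0.0 for category in CATEGORIES}
    return {category: counts[category] / total for category in CATEGORIES}


def rank_by_essential(all_votes):
    """Rank measures by proportion voted essential, highest first.

    Ties are broken by the proportion voted desirable (an assumption).
    """
    table = {name: vote_proportions(b) for name, b in all_votes.items()}
    return sorted(
        table.items(),
        key=lambda item: (item[1]["essential"], item[1]["desirable"]),
        reverse=True,
    )


ranking = rank_by_essential(votes)
for name, shares in ranking:
    print(name, {category: round(p, 2) for category, p in shares.items()})
```

The same function could be applied to each of the three categories of measures separately and then to all measures combined, mirroring the two levels of ranking described in the text.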

Table 2 shows an illustration of the voting using the top-10 measures ranked according to the proportion voted essential.

TABLE 2. Top-10 measures by highest percentage voted as essential

Delphi survey

The consensus event allowed us to begin to prioritise the large number of candidate measures and reject some measures that were agreed as not important. For the next stage, we used another consensus method – a Delphi survey to further rate and prioritise measures. At this point we considered not just what could be measured but also how this might be done.

A total of 67 measures were included and these were categorised into the same three groups: patient outcomes (n = 25), whole service measures (n = 32) and clinical management measures (n = 10). The number of items was larger than those considered in the consensus event because, at this stage, we included time measures and began to develop more explicit, discrete descriptions of potential indicators. For example, where a broad principle such as accuracy of dispatch decisions was used for the consensus work, this was refined into multiple descriptions for specific conditions or call types. Potential measures were presented in a survey that enabled responses to be completed and returned electronically. Participants were asked to consider each measure and score their level of agreement on a scale of 1 to 9 (strongly disagree to strongly agree) using the statement:

This measure (either on its own or within a set of measures) is a good reflection of the quality of care provided by ambulance services and is likely to be a good indicator of the quality of the 999 ambulance service care pathway.

Participants were able to suggest additional indicators for inclusion. Responses to round 1 were recorded and the median score was calculated for each measure. This was followed by a second round during which revisions to measure descriptions were made following suggestions from the first round. Participants were provided with their own and the group median score and asked to score the measures again on the same 1–9 scale.

Round 1 was completed by 23 participants and round 2 by 20, giving an overall response rate of 74%. As in the consensus event, the participants represented a wide range of service provider and professional viewpoints, and most UK ambulance trusts. Some participants had also participated in the consensus event. Scores from round 2 were recorded and median scores calculated. A large number of measures scored highly, so a median score of 8 was used to discriminate between measures, with 20 of the 67 measures ranked as high scoring.
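The median-based scoring used in the Delphi rounds can be sketched as follows. This is an illustrative example only: the measure names and scores are invented, and treating 'high scoring' as a median of 8 or more is an assumption, as the text states only that a median score of 8 was used to discriminate between measures.

```python
from statistics import median

HIGH_SCORE_THRESHOLD = 8  # median cut-off reported in the text

# Hypothetical round 2 scores on the 1-9 agreement scale (not real study data).
round2_scores = {
    "Measure A": [9, 8, 8, 7, 9],
    "Measure B": [5, 6, 7, 4, 6],
    "Measure C": [8, 8, 9, 9, 7],
}


def median_scores(scores_by_measure):
    """Return the group median score for each measure."""
    for name, scores in scores_by_measure.items():
        if any(score < 1 or score > 9 for score in scores):
            raise ValueError(f"{name}: scores must lie on the 1-9 scale")
    return {name: median(s) for name, s in scores_by_measure.items()}


def high_scoring(medians, threshold=HIGH_SCORE_THRESHOLD):
    """List measures whose median meets the threshold (assumed to be >=)."""
    return sorted(name for name, value in medians.items() if value >= threshold)


medians = median_scores(round2_scores)
shortlist = high_scoring(medians)
print(medians)
print(shortlist)
```

The median is a natural summary for ordinal rating scales of this kind because it is not pulled towards a small number of extreme scores.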

We intended to include patient and public participants in the Delphi survey but our PPI reference group thought that the level of technical detail would make meaningful participation difficult. Instead, we held a separate event for PPI participants so that the concepts and measures could be explained and discussed in a face-to-face format.

Using a similar format to the previous consensus event, measures were presented for each of the three categories and small group discussions held. Participants then used an electronic voting system to rank each measure. Eighteen PPI representatives attended the PPI workshop and represented a range of people, including young people and vulnerable groups. The results of the PPI event were added to the results of the Delphi survey for the final stage of this workstream.

Two published papers26,27 are freely available and describe in more detail the methods and results of the consensus event and Delphi survey,26 and the co-produced event created with our PPI reference group to complement the Delphi study.27

Patient and carer views of ambulance service care

At the outset of the PhOEBE programme we were aware that little research had been done to investigate the aspects of emergency ambulance service care that are valued by people who use the service. This includes patients but also their carers, who may be the person who makes a 999 call asking for help. To address this we conducted a qualitative study in which we interviewed people who had recently used the ambulance service in one of our study services. Ethics approval for the study was sought and gained from the National Research Ethics Service Committee East Midlands – Northampton (Research Ethics Committee reference 12/EM/0022) on 23 February 2012. During 2012, we talked to 22 patients and eight of their spouses (n = 30) using a semistructured face-to-face (n = 18) or telephone (n = 14) interview. We felt that it was important to explore the processes and outcomes of care that were important to ambulance users and we wanted to ensure that we captured issues that were relevant to the range of ambulance users, not just those with a life-threatening condition. We therefore included patients and carers who had called for serious problems requiring transport to hospital, those who had an ambulance crew attending but who were managed at home, and those managed by telephone advice. In the first part of the interview, we explored positive and negative aspects of their ambulance service experience and this was followed with questions about what they valued about the service and how performance might be measured. Interviews were recorded, transcribed and then analysed using framework analysis. An initial thematic framework was developed and then interviews coded to these themes, adding new ones as they emerged. A thematic map was constructed related to issues participants valued.

Participants in our study, regardless of clinical condition or level of ambulance service response received, valued similar aspects of their prehospital care experience. Users were often extremely anxious about their health and the outcome they valued was reassurance provided by ambulance service staff to alleviate the anxiety, fear or panic that they experienced at the time of calling an ambulance. They also valued reassurance that they were receiving appropriate advice, treatment and care. This was enhanced by the professional behaviour of staff (which instilled confidence in their care), good communication, short waiting times for help and continuity during transfers. These features are themselves a consequence of the ability of call-takers and ambulance clinicians to competently recognise what the problem is and deliver appropriate advice and care, and so implicitly reflect good-quality care. A timely response was valued in terms of allaying anxiety quickly. Participants valued the experience that they had, not just with the ambulance crews who attended them but also with the call-takers when they made their 999 call.

The interviews with users highlighted very clearly that, regardless of the actual clinical problem, the ability of the emergency ambulance service to allay the high levels of fear and anxiety felt by patients and their carers was crucial to the delivery of a high-quality service. Measures developed to assess and monitor the performance of emergency ambulance services have predominantly focused on actions such as response times or treatments provided. However, it was the more human interactions with the service that users recalled and described, and which could be included in the development of ambulance service patient experience measures. We used the findings from this study to add context to the description of ‘patient experience’ as a potential measure within the consensus work. Although it was recognised as important, it was acknowledged that measurement of patient experience is a longer-term objective outside the scope of the programme.

The qualitative study has been published as an open-access peer-reviewed journal article.28

Final selection of measures for further development

The reviews and consensus work allowed us to consider a large number of potential ambulance service performance and quality measures, and to determine which were considered important to a range of end users. The final stage was to select from this list a small set of measures that could reflect the range of perspectives (service measures, patient management and patient outcomes) and take account of the broad population of people calling 999, not just a few with specific conditions.

The final set was selected using an expert panel drawn from our programme management and steering groups. The panel comprised 13 members and included representatives of the research team (reflecting research, statistics, ambulance service clinicians, PPI, emergency medicine) and external expertise from a further emergency medicine consultant, consultant paramedic and commissioner. We assessed all measures considered in the consensus work26 to avoid missing potentially important measures that did not feature highly in the rating exercises. Each measure was rated using a set of criteria that considered, for example, how highly it ranked in the consensus meetings and Delphi survey, the population it applied to, feasibility and availability of data, relevance to ambulance care, importance, meaningfulness and whether or not an item was already being measured. A score was derived for each potential measure using these criteria and the final set selected using these scores and expert judgements so that the set as a whole provided a balanced assessment of the different aspects of ambulance care considered to be important. The full set of criteria used and 56 measures assessed is available in Appendix 2 (see Table 17).

For two measures, survival from an emergency condition and accuracy of call identification, we had to identify a set of relevant conditions, as not all 999 calls were appropriate for these measures. We had previously conducted some consensus work as part of a study to develop emergency care system indicators and, in that work, identified a set of 16 emergency conditions [with relevant International Classification of Diseases, Tenth Revision (ICD-10),29 codes] that were considered appropriate to include in the indicators. We therefore used the same set of validated conditions for this work, including only patients with one of these diagnoses at discharge from hospital or as a cause of death.30 The 16 emergency conditions are listed in Box 1.

BOX 1. Emergency conditions used for call accuracy and survival indicators

The final set of six measures selected for further development, included in workstream 3, is shown in Table 3. We initially included two further measures in this list. First, the compliance of ambulance clinicians with protocols and guidelines for specific conditions. The current ambulance service Clinical Quality Indicators31 for England already include a measure of compliance with expected care bundles for a small number of conditions. The purpose of this measure was to explore whether or not the availability of linked data and better information on patient outcome could be used to improve this indicator. However, the problems in obtaining the linked data and reduced time available to develop the performance measures meant that we had to exclude at least one intended measure and, as this measure already exists at least in part, we decided to concentrate on new measures. Second, we included a measure of mortality in patients with urgent problems, that is, those who have a low risk of dying. However, the lack of information on final diagnosis for patients not admitted to hospital made it impossible to identify all relevant patients. Instead, we took a different approach with this measure and explored the use of a structured judgement review process to identify potentially avoidable deaths.

TABLE 3. The final set of measures that could be developed as potential performance or quality indicators

Summary

Workstream 1 encompassed a number of related activities. The evidence reviewed revealed a large number of potential measures although many were variations on a single theme, such as time. The consensus work allowed us to consider this broad range of measures from a number of different perspectives. In particular, there was strong patient and public input including use of a novel approach to meaningful participation in the consensus process. The final set of measures for further development represented the potential to provide a broader and more balanced view of ambulance service care. These were relevant to all people who used the service rather than the current focus on single processes, such as response time or smaller populations with important but more specific conditions (e.g. cardiac arrest). The qualitative study produced new and important primary research evidence in an area that has not been well studied and revealed important insights into patient perceptions that were poorly understood. We found that:

  1. Previous quality measures and performance indicators were dominated by time measures, which accounted for over one-third of identified measures.
  2. Outcome measures were dominated by varying durations of survival or mortality, spanning the range from admission to hospital to up to 5 years post admission, in a small number of longitudinal primary research studies.
  3. Measures of accuracy were most frequently voted as essential, followed by measures (including pain) that reflected patient experience.
  4. Patients felt that addressing anxiety and providing reassurance were important. This applied to the call process as well as face-to-face interaction with ambulance clinicians.

Workstream 1 produced a set of candidate measures potentially suitable for further development as indicators of ambulance service quality and performance. Development required an information source that brought together details of what happened to patients at the time of the incident and after their ambulance service contact. This was the focus of the next workstream.

Copyright © Queen’s Printer and Controller of HMSO 2019. This work was produced by Turner et al. under the terms of a commissioning contract issued by the Secretary of State for Health and Social Care. This issue may be freely reproduced for the purposes of private research and study and extracts (or indeed, the full report) may be included in professional journals provided that suitable acknowledgement is made and the reproduction is not associated with any form of advertising. Applications for commercial reproduction should be addressed to: NIHR Journals Library, National Institute for Health Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK.
Bookshelf ID: NBK540545
