The diagnostic accuracy of pre-hospital assessment of acute respiratory failure

Introduction: Acute respiratory failure (ARF) is a common medical emergency. Pre-hospital management includes controlled oxygen therapy, supplemented by specific management options directed at the underlying disease. The aim of the current study was to characterise the accuracy of paramedic diagnostic assessment in acute respiratory failure. Methods: A nested diagnostic accuracy and agreement study comparing pre-hospital clinical impression to the final hospital discharge diagnosis was conducted as part of the ACUTE (Ambulance CPAP: Use, Treatment effect and Economics) trial. Adults with suspected ARF were recruited from the UK West Midlands Ambulance Service. The pre-hospital clinical impression of the recruiting ambulance service clinician was prospectively recorded and compared to the final hospital diagnosis at 30 days. Agreement between pre-hospital and hospital diagnostic assessments was evaluated using raw agreement and Gwet's AC1 coefficient. Results: 77 participants were included. Chronic obstructive pulmonary disease (32.9%) and lower respiratory tract infection (32.9%) were the most frequently suspected primary pre-hospital diagnoses for ARF, with secondary contributory conditions recorded in 36 patients (46.8%). There was moderate agreement between the primary pre-hospital and hospital diagnoses, with raw agreement of 58.5% and a Gwet's AC1 coefficient of 0.56 (95% CI 0.43 to 0.69). In five cases, a non-respiratory final diagnosis was present, including: myocardial infarction, ruptured abdominal aortic aneurysm, liver failure and sepsis. Conclusions: Pre-hospital assessment of ARF is challenging, with limited accuracy compared to the final hospital diagnosis. A syndromic approach, providing general supportive care, rather than a specifically disease-orientated treatment strategy, is likely to be most appropriate for the pre-hospital environment.

The aim of the current study was to characterise the performance of paramedic clinical assessment in ARF. Specific objectives were to calculate diagnostic accuracy and agreement between pre-hospital and hospital diagnoses.

Methods
A nested, pre-planned, diagnostic accuracy and agreement study, comparing pre-hospital clinical impression to the final hospital discharge diagnosis, was conducted as part of the ACUTE (Ambulance CPAP: Use, Treatment effect and Economics) pilot trial. Study conduct and reporting were performed in accordance with STARD and GRRAS guidelines for diagnostic accuracy and reliability studies (Bossuyt et al., 2015).

Study population
The ACUTE trial was an individual patient randomised controlled external pilot trial to determine whether a definitive pragmatic randomised controlled trial (RCT) comparing pre-hospital CPAP to standard oxygen therapy for acute respiratory failure was feasible, acceptable and cost effective. The trial was pre-registered (ISRCTN12048261), and the protocol has been reported in detail previously (Fuller et al., 2018). Briefly, patients with suspected ARF were recruited from four ambulance hubs in the United Kingdom West Midlands Ambulance Service (WMAS) between August 2017 and July 2018. ARF was defined as respiratory distress with peripheral oxygen saturation below British Thoracic Society (BTS) target levels (88% for patients with COPD, or 94% for other conditions), despite supplemental oxygen (titrated low flow oxygen for COPD, or titrated high flow oxygen in other conditions; O'Driscoll et al., 2017). Eligibility criteria are presented in Table 1. Patients were allocated to either pre-hospital CPAP (O_two system) with supplemental oxygen or standard oxygen therapy using identical equipment boxes (O_two CPAP unit, 2018). Feasibility outcomes were: incidence of recruited eligible patients (target 120); proportion recruited in error; adherence to the allocation schedule and treatment; and retention at 30 days. Effectiveness outcomes comprised: survival at 30 days; proportion undergoing endotracheal intubation; admission to critical care; and length of hospital stay.

Introduction
Acute respiratory failure (ARF) is a common medical emergency which occurs when heart or lung disease results in inadequate blood oxygen levels and/or increased blood carbon dioxide levels (Greene & Peters, 1994). It is caused by a number of common cardiac or respiratory diseases, including heart failure, pneumonia and exacerbations of chronic obstructive pulmonary disease (COPD) and asthma (Chapman, 1984). There are approximately 9000 ARF cases in England per year, with a high 14% risk of death within 30 days (Pandor et al., 2015). ARF has substantial health services costs, with patients often requiring prolonged hospital stays, ventilatory support and critical care admissions (Ray et al., 2006). ARF was responsible for over 3 million National Health Service (NHS) bed days in England in 2014 (Department of Health, 2014). Accurate diagnosis and optimised clinical management of ARF therefore have the potential to improve both health outcomes and cost effectiveness.
Current United Kingdom (UK) pre-hospital clinical practice guidelines recommend a standard management approach of oxygen therapy for the treatment of ARF, supplemented by specific management options directed at the underlying disease (AACE, 2019; NICE, 2010; Ponikowski et al., 2016). Pre-hospital administration of continuous positive airways pressure (CPAP) has been promoted as an additional potentially beneficial treatment strategy in some cases of ARF (Goodacre et al., 2014). An accurate pre-hospital diagnosis may help paramedics tailor therapy to the underlying cause of ARF and improve outcomes. Misdiagnosis could lead to inappropriate treatment, and even harm, for example instigating CPAP in patients with a pneumothorax (BTS, 2012; Davidson et al., 2016).
However, clinical assessment in the pre-hospital environment is often challenging. Details of previous medical history are often unavailable, dyspnoeic patients may not be able to provide a history, the uncontrolled environment can hamper examination, resuscitation of unstable patients may need to be prioritised and limited diagnostic tools are available. Furthermore, patients with ARF frequently suffer from multiple cardiorespiratory co-morbidities, or could have concurrent disease processes. There are limited data available investigating pre-hospital diagnosis of the dyspnoeic patient.
Keywords: acute respiratory failure; diagnostic accuracy; emergency medical services; sensitivity; specificity

the hospital case notes or discharge summary, and recorded using the same nominal categories by two ACUTE co-investigators. Hospital clinicians had access to routine pre-hospital patient records, but not the trial case report form containing the index test classification.

Statistical analysis
The statistical analysis proceeded in three stages. Firstly, sample characteristics were described using summary statistics, cross tabulation and a mosaic plot. Secondly, agreement between pre-hospital and hospital diagnostic assessments was evaluated (Gwet, 2001). Raw agreement was initially calculated as the proportion of cases with an identical pre-hospital and hospital diagnosis (Gwet, 2008). To account for the possibility that some agreement might be expected due to chance, Gwet's AC1 coefficient was also determined (Gwet, 2008). This statistic was chosen in preference to Cohen's Kappa statistic as it does not depend upon an assumption of independence between different ratings, is robust to marginal probabilities and is less affected by rating prevalence (Wongpakaran et al., 2013). Landis and Koch's benchmark values were chosen as the most established thresholds to interpret the magnitude of agreement coefficients, with 0-0.20 indicating slight, 0.21-0.40 fair, 0.41-0.60 moderate, 0.61-0.80 substantial and 0.81-1 almost perfect agreement (Landis & Koch, 1977). Agreement was calculated for the primary diagnoses alone, and for combined primary and secondary diagnoses, ignoring the precedence placed on each condition and counting any match. Thirdly, the pre-hospital primary
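As an illustration of the agreement statistics described above, the following is a minimal sketch of raw agreement, Gwet's AC1 for two raters and the Landis and Koch benchmark bands. It assumes the standard two-rater form of AC1, in which chance agreement is the sum of pi_k(1 - pi_k) over categories divided by (q - 1), with pi_k the average of the two raters' marginal proportions for category k and q the number of categories. The function names, the toy ratings and the 'poor' label for negative coefficients are assumptions for illustration and are not taken from the study, whose analysis software is not stated here.

```python
from collections import Counter


def raw_agreement(rater1, rater2):
    """Proportion of cases where both raters chose the same category."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)


def gwet_ac1(rater1, rater2, categories):
    """Gwet's AC1 for two raters on a nominal scale (illustrative sketch)."""
    n = len(rater1)
    q = len(categories)
    if q < 2:
        raise ValueError("at least two categories are required")
    p_a = raw_agreement(rater1, rater2)
    counts1, counts2 = Counter(rater1), Counter(rater2)
    # pi_k: average of the two raters' marginal proportions for category k
    pi = [(counts1[c] / n + counts2[c] / n) / 2 for c in categories]
    # chance agreement under Gwet's model
    p_e = sum(p * (1 - p) for p in pi) / (q - 1)
    return (p_a - p_e) / (1 - p_e)


def landis_koch(coefficient):
    """Map an agreement coefficient to the Landis and Koch bands."""
    if coefficient < 0:
        return "poor"  # assumption: label for negative values, not given in the text
    if coefficient <= 0.20:
        return "slight"
    if coefficient <= 0.40:
        return "fair"
    if coefficient <= 0.60:
        return "moderate"
    if coefficient <= 0.80:
        return "substantial"
    return "almost perfect"


# Toy ratings, invented for illustration only (not study data)
prehospital = ["COPD", "COPD", "LRTI", "LRTI"]
hospital = ["COPD", "COPD", "LRTI", "COPD"]
ac1 = gwet_ac1(prehospital, hospital, categories=["COPD", "LRTI"])
```

Applied to the reported coefficients, `landis_koch(0.56)` falls in the moderate band and `landis_koch(0.75)` in the substantial band, matching the interpretation given in the text.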

Data collection
A patient recruitment data collection form, contained within each equipment box, was completed by recruiting ambulance service clinicians every time a patient was enrolled in the trial. This recorded trial-specific information, including the pre-hospital clinical impression. At 30 days, research paramedics reviewed hospital records (including case notes, information systems and discharge letters), with patient consent, to collect details of clinical progress including the final medical diagnosis.

Index test and reference standard
The index test under consideration was the trial paramedic's clinical impression recorded at the scene of incident. Both the most likely clinical diagnosis and the presence of any contributing conditions were recorded prospectively by attending paramedics as a pre-specified six-category nominal variable, comprising: 'heart failure', 'asthma', 'lower respiratory tract infection', 'chronic obstructive pulmonary disease', 'pulmonary embolism' and 'other'. These categories were chosen based on the most common causes of ARF, and conditions benefiting from specific treatment strategies. Conditions specified in the free text 'other' option were coded post hoc by two ACUTE co-investigators, with any disagreements resolved by discussion to achieve a consensus decision. The reference standard was the final hospital diagnosis accounting for the presenting respiratory distress provided by the hospital clinical team. Similarly to the index test, both the primary diagnosis and any contributory conditions were collected. These were determined retrospectively from

Study sample
Over the trial recruitment period, 77 participants were enrolled from 364 potentially eligible patients by 41 individual ambulance service clinicians. Slightly more participants were allocated to the CPAP intervention arm (42 cases) than to the standard oxygen control arm (35 cases). Included patients were predominantly older (median 71 years), male (62%) and with marked respiratory distress (median oxygen saturations 78.5%, respiratory rate 34 breaths/minute and breathlessness score of 9/10). Patient characteristics of enrolled patients are summarised in Table 2. A valid pre-hospital primary diagnosis was available for 76/77 patients. In one case, the primary clinical impression was recorded as 'other', but lacked interpretable information to assign an underlying aetiology for ARF. A final hospital primary diagnosis was available for 65 patients who were included in the complete case agreement and diagnostic accuracy analyses (  cases, two contributory diseases given for three cases and three further supporting diagnoses for one case. The commonest secondary diagnoses were COPD (n = 7/65, 10.8%) and heart failure (n = 8/65, 12.3%). Notably, two patients were diagnosed with a pneumothorax in hospital (one primary diagnosis, one secondary diagnosis, both requiring intercostal drains). Pre-hospital and final hospital diagnoses are summarised in Table 3.

Agreement
There was limited reproducibility between the primary pre-hospital and hospital diagnoses, with raw agreement of 58.5% (n = 38/65). However, if both primary and secondary diagnoses were considered together, counting any match and ignoring the precedence placed on each condition, there was higher raw agreement of 76.9% on at least one causative disease for ARF (n = 50/65). Chance-corrected agreement between pre-hospital and hospital primary diagnosis was moderate, as demonstrated by a Gwet's AC1 coefficient of 0.56 (95% CI 0.43 to 0.69).

In six cases (n = 76, 7.9%), a non-respiratory primary diagnosis was recorded, comprising: ruptured abdominal aortic aneurysm (n = 1), liver failure (n = 1), sepsis (not specified further, n = 2) and urinary tract infection (n = 2). A secondary diagnosis was recorded for 36 patients (n = 77, 46.8%), with a single contributory condition suspected in 29 patients (n = 77, 37.7%) and two supplementary diagnoses made for seven patients (n = 77, 9.1%). LRTI (n = 9/77, 11.7%) and heart failure (n = 10/77, 13.0%) were the most common concomitantly diagnosed conditions.

Discussion
COPD (n = 25/76, 32.9%) and LRTI (n = 25/76, 32.9%) were the most frequently suspected primary pre-hospital diagnoses for ARF, with secondary contributory conditions recorded in 36 patients (n = 77, 46.8%). There was moderate agreement between the primary pre-hospital and hospital diagnoses, with raw agreement of 58.5% (n = 38/65) and a Gwet's AC1 coefficient of 0.56 (95% CI 0.43 to 0.69). In seven cases, a final diagnosis was present where CPAP would not be expected to be effective, or could be harmful, including: myocardial infarction, ruptured abdominal aortic aneurysm, liver failure, sepsis and pneumothorax (n = 7/65, 10.8%). Respiratory distress with low oxygen saturations is common to many conditions, with symptoms and clinical signs shared between differential diagnoses, often making assessment challenging (Chapman, 1984; Delerme & Ray, 2008). It is therefore unsurprising that accuracy of the pre-hospital clinical impression was limited, non-specific working diagnoses such as 'sepsis' were used, some non-cardiorespiratory conditions were diagnosed and concurrent disease processes were suspected in the majority of cases. COPD and an LRTI were the most commonly diagnosed conditions, and distinction between

When both primary and secondary diagnoses were assessed together, there was substantial chance-corrected agreement on at least one condition, with a Gwet's AC1 coefficient of 0.75 (95% CI 0.64 to 0.87). Agreement between pre-hospital and hospital diagnoses is presented in a mosaic plot in Figure 1 and is tabulated in the web appendix.

Diagnostic accuracy
The performance of ambulance service clinicians' assessment was then investigated by calculating diagnostic accuracy metrics for the most prevalent conditions (COPD, LRTI and heart failure). Other conditions were not evaluated due to low sample size, with consequent imprecision and intractability. While each condition was identified more correctly than not, all three were commonly missed as the primary diagnosis: the sensitivities for COPD, LRTI and heart failure were 71% (95% CI 48% to 89%), 54% (34% to 73%) and 67% (22% to 96%) respectively. The specificity was higher (COPD 84.1% (69.9% to 93.4%), LRTI 86.5% (71.2% to 95.5%) and heart failure 86.4% (75.0% to 94.0%)). When both primary and secondary diagnoses were assessed together, diagnostic accuracy was improved. Considering the index test and reference standard to be positive if the condition was recorded in either the primary or secondary diagnosis gave sensitivities of 95.2% for COPD, 69.2% for LRTI and 85.7% for heart failure, meaning all three conditions were typically identified, even if

although comparing favourably to other published reproducibility studies, the sample size is relatively low, resulting in imprecise results consistent with either poor or moderate agreement. The sample size constraint also meant that we did not attempt to model any clustering for differential effects of paramedics. These findings should therefore be considered as exploratory, requiring confirmation in larger studies. Thirdly, mainly due to lack of consent, some reference standard data were missing. Although this represented a relatively small number (<10%) of patients, with similar characteristics to included cases, selection bias is possible if excluded patients differed systematically from the study population. Finally, we pre-specified the relatively liberal Landis and Koch scale for benchmarking agreement coefficients. Although well established and widely used, this may overstate agreement compared to other benchmarks, e.g. Fleiss' or McHugh's proposed scales (Fleiss et al., 2003; McHugh, 2012).
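The sensitivity and specificity estimates reported under Diagnostic accuracy, and the width of their confidence intervals at this sample size, can be illustrated with a short sketch. It assumes an exact (Clopper-Pearson) binomial interval, which is one common choice; the interval method actually used in the study is not stated in this section. The counts of 15 true positives among 21 hospital-diagnosed COPD cases are back-calculated from the reported 71% sensitivity and are an assumption, not published data. All function names are illustrative.

```python
from math import comb


def sensitivity(true_pos, false_neg):
    """Proportion of reference-standard positives detected by the index test."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of reference-standard negatives correctly ruled out."""
    return true_neg / (true_neg + false_pos)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target):
    """Bisection for f(p) = target on [0, 1], where f is decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes in n trials (assumed method)."""
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# Assumed counts, back-calculated from the reported COPD sensitivity of 71%
copd_sens = sensitivity(true_pos=15, false_neg=6)
copd_lower, copd_upper = exact_ci(15, 21)
```

With only around 20 reference-standard positives per condition, an interval of this kind spans roughly 40 percentage points, which is consistent with the imprecision acknowledged in the limitations.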
In conclusion, pre-hospital assessment of ARF is challenging, with limited diagnostic accuracy compared to the final hospital diagnosis. A syndromic approach, providing general supportive care, rather than a specifically disease-orientated treatment strategy, is likely to be most appropriate for ARF in the pre-hospital environment.

Author contributions
GF and SG conceived and designed the study. AR, MW, IG and JM collected data. All authors were involved in the analysis and the interpretation of data. GF drafted the report. All authors revised the work critically for important intellectual content and were involved in the final approval of the version to be published. All authors agree to be accountable for all aspects of the work. SG acts as the guarantor for this article.

Ethics
Ethical approval was confirmed with NHS Leeds East Research Ethics Committee. The University of Sheffield provided sponsorship and monitoring oversight of the project.

Funding
Funding was provided by the National Institute for Health Research's HTA Programme (HTA Project: 15/08/40).

these two entities is known to be difficult, even in hospital with the benefit of time, access to testing and specialist review (Finney et al., 2019).
Given that the most important treatment for ARF is provision of oxygen, and other treatment modalities currently available to UK paramedics (e.g. nebulisers) have few side effects, it could be argued that an exact pre-hospital diagnosis is unnecessary prior to definitive hospital care (Chapman, 1984). However, if CPAP or non-invasive ventilation is available, then it is important to recognise conditions representing relative or absolute contraindications (Hess, 2013). Although low numbers of patients were studied, it is reassuring that all cases with a final diagnosis of asthma were recognised by paramedics, but potentially concerning that there were two patients with undetected pneumothorax.
This is the first study to investigate the diagnostic assessment of patients with ARF presenting to EMS. Previous literature has either focused on less unwell dyspnoeic patients or examined specific diseases including COPD, asthma or heart failure (Christie et al., 2016; Williams et al., 2013, 2015). Although limited by retrospective chart review designs, this body of research has demonstrated similar findings to the current study. Christie et al. reported only moderate agreement between paramedic and hospital diagnosis in a New Zealand cohort, with many cases having no clearly documented working diagnosis (Christie et al., 2016). The sensitivity for pre-hospital heart failure, asthma and COPD diagnoses was only 29%, 66% and 39% respectively, in Australian EMS studies by Williams and colleagues (Williams et al., 2013, 2015). More widely, a recent systematic review reported a pooled sensitivity of 0.74 (0.62 to 0.82) and a pooled specificity of 0.94 (0.87 to 0.97) for paramedic diagnosis of myocardial infarction, sepsis, stroke or all diagnoses (Wilson et al., 2018).
Participating WMAS ambulance hubs serve a mixed rural, semi-rural and urban population, and the sample of recruited ARF patients should ensure good external validity to similar EMS settings. However, clinical trial populations may not be fully representative of undifferentiated pre-hospital ARF patients after application of eligibility criteria and consent procedures. The ACUTE trial specifically excluded patients with pre-existing lack of capacity or inability to communicate with trial paramedics, groups for which pre-hospital diagnosis was likely to be even more challenging. Generalisability to areas in which disease prevalence differs from the UK, or which have alternative EMS models (e.g. physician rather than paramedic assessment), is less certain.
The prospective, preplanned data collection, using a defined nominal categorisation for ARF, is a strength of this study. However, there are a number of limitations that could adversely affect the internal validity of the results. Firstly, there is the potential for reference standard misclassification, as the final diagnosis was recorded from the hospital record or discharge letter, rather than determination through formal expert case review. Secondly,