The Use of Wearable Pulse Oximeters in the Prompt Detection of Hypoxemia and During Movement: Diagnostic Accuracy Study

Background
Commercially available wearable (ambulatory) pulse oximeters have been recommended as a method for managing patients at risk of physiological deterioration, such as active patients with COVID-19 disease receiving care in hospital isolation rooms; however, their reliability in usual hospital settings is not known.

Objective
We report the performance of wearable pulse oximeters in a simulated clinical setting when challenged by motion and low levels of arterial blood oxygen saturation (SaO2).

Methods
The performance of 1 wrist-worn (Wavelet) and 3 finger-worn (CheckMe O2+, AP-20, and WristOx2 3150) wearable, wireless transmission–mode pulse oximeters was evaluated. For this, 7 motion tasks were performed: at rest, sit-to-stand, tapping, rubbing, drinking, turning pages, and using a tablet. Hypoxia exposure followed, in which inspired gases were adjusted to achieve decreasing SaO2 levels at 100%, 95%, 90%, 87%, 85%, 83%, and 80%. Peripheral oxygen saturation (SpO2) estimates were compared with simultaneous SaO2 samples to calculate the root-mean-square error (RMSE). The area under the receiver operating characteristic curve was used to analyze the detection of hypoxemia (ie, SaO2<90%).

Results
SpO2 estimates matching 215 SaO2 samples in both study phases, from 33 participants, were analyzed. Tapping, rubbing, turning pages, and using a tablet degraded SpO2 estimation (RMSE>4% for at least 1 device). All finger-worn pulse oximeters detected hypoxemia, with an overall sensitivity of ≥0.87 and specificity of ≥0.80, comparable to that of the Philips MX450 pulse oximeter.

Conclusions
The SpO2 accuracy of wearable finger-worn pulse oximeters was within that required by the International Organization for Standardization guidelines. Performance was degraded by motion, but all pulse oximeters could detect hypoxemia. Our findings support the use of wearable, wireless transmission–mode pulse oximeters to detect the onset of clinical deterioration in hospital settings.

Trial Registration
ISRCTN Registry 61535692; http://www.isrctn.com/ISRCTN61535692

International Registered Report Identifier (IRRID)
RR2-10.1136/bmjopen-2019-034404
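
The hypoxemia detection analysis summarised in the abstract can be illustrated with a minimal sketch. It is not the study's code. It assumes paired lists of SpO2 estimates and reference SaO2 samples, takes SaO2<90% as true hypoxemia (as stated above), and, as an assumption made only for this example, applies the same 90% cut-off to SpO2 when computing sensitivity and specificity. The function names are hypothetical.

```python
def sensitivity_specificity(spo2, sao2, threshold=90.0):
    """Sensitivity and specificity of 'SpO2 < threshold' for detecting
    reference hypoxemia (SaO2 < threshold). The SpO2 cut-off is an
    assumption of this sketch, not a value taken from the study."""
    pairs = list(zip(spo2, sao2))
    tp = sum(1 for p, a in pairs if a < threshold and p < threshold)
    fn = sum(1 for p, a in pairs if a < threshold and p >= threshold)
    tn = sum(1 for p, a in pairs if a >= threshold and p >= threshold)
    fp = sum(1 for p, a in pairs if a >= threshold and p < threshold)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(spo2, sao2, threshold=90.0):
    """Area under the ROC curve computed pairwise (Mann-Whitney form):
    the probability that a hypoxemic sample has a lower SpO2 than a
    non-hypoxemic one, counting ties as one half."""
    pos = [p for p, a in zip(spo2, sao2) if a < threshold]   # hypoxemic
    neg = [p for p, a in zip(spo2, sao2) if a >= threshold]  # non-hypoxemic
    wins = sum(
        1.0 if p < n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up illustrative values, not study data.
sao2 = [98, 95, 91, 89, 86, 82]
spo2 = [97, 94, 89, 91, 85, 83]
sens, spec = sensitivity_specificity(spo2, sao2)
area = auroc(spo2, sao2)
```

The pairwise form is quadratic in the number of samples, which is acceptable for a few hundred paired samples; a rank-based implementation would be preferable for larger datasets.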


Materials and Methods
Pulse rate (PR) data were collected from all the pulse oximeters (at 1 Hz), alongside the SpO2 data collection described in the methods section of the main manuscript. The reference heart rate (HR) was obtained from the 3-lead ECG of a standard Philips MX450 monitor (at 1 Hz). Both the test PR and the reference HR estimates were computed as medians over the 40-second windows considered in the main manuscript: the 40 seconds before the end of a motion task (movement phase), and the 35 seconds before and 5 seconds after an arterial blood gas sample was taken (hypoxia exposure phase). The root-mean-square error (RMSE, with CI), mean bias, mean absolute bias, precision, and Bland-Altman plots (all defined in the main manuscript) were used to assess PR estimation accuracy. One-way ANOVA followed by the Tukey-Kramer test was used to evaluate differences in the mean bias and mean absolute bias between groups. Levene's test was used to evaluate differences in precision between groups. Statistical significance was set at P < 0.05. ISO 80601-2-61:2019 does not define an acceptable limit for the RMSE of PR estimation. We therefore used the limit defined in the ANSI/AAMI EC13:2002 guideline, i.e. an RMSE ≤ 5%.
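
The windowing and agreement metrics described above can be sketched as follows. This is an illustrative example under stated assumptions, not the analysis code used in the study: the exact metric definitions are given in the main manuscript, and here precision is assumed to be the standard deviation of the PR minus HR differences, the Bland-Altman limits of agreement are assumed to be bias ± 1.96 SD, and all quantities are expressed in bpm. The function names and the data layout (lists of `(time_s, value)` pairs, with `None` for dropouts) are hypothetical.

```python
from math import sqrt
from statistics import mean, median, stdev


def window_median(samples, t_end, length_s=40.0):
    """Median of the 1 Hz samples in the window (t_end - length_s, t_end].

    `samples` is a list of (time_s, value) pairs; value is None for a dropout.
    Returns None when the window holds no valid value.
    For the movement phase, t_end is the end of the motion task; for the
    hypoxia exposure phase, t_end is 5 s after the blood gas sample.
    """
    values = [
        v for t, v in samples
        if t_end - length_s < t <= t_end and v is not None
    ]
    return median(values) if values else None


def agreement_metrics(test_pr, ref_hr):
    """Agreement between windowed PR and HR estimates (assumed definitions)."""
    diffs = [
        p - h for p, h in zip(test_pr, ref_hr)
        if p is not None and h is not None  # skip windows with dropouts
    ]
    bias = mean(diffs)
    sd = stdev(diffs) if len(diffs) > 1 else 0.0
    return {
        "n": len(diffs),
        "rmse": sqrt(mean([d * d for d in diffs])),
        "mean_bias": bias,
        "mean_abs_bias": mean([abs(d) for d in diffs]),
        "precision": sd,  # assumed: SD of the differences
        "limits_of_agreement": (bias - 1.96 * sd, bias + 1.96 * sd),
    }


# Made-up illustrative values, not study data.
ramp = [(t, float(t)) for t in range(1, 101)]
m = window_median(ramp, t_end=100)
stats = agreement_metrics([62, 58, None, 71], [60, 60, 65, 70])
```

Windows in which a device reports no value are dropped from the paired comparison, so `n` also serves as a simple count of the estimates available for each device.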

Participants
The participant demographics matched those analysed in the main manuscript (results section). A total of 227 and 215 target HR windows were available in the movement and hypoxia exposure phases, respectively.

PR estimation in the movement phase
All devices showed an RMSE above 5% in at least one motion task, with the sit-to-stand (STS) and rubbing tasks showing the highest errors (Table B.1). For example, the AP-20 showed a significantly higher mean bias in these two tasks (-13.38 ± 10.94 bpm and 24.58 ± 14.51 bpm, respectively) than the Philips MX450 and the WristOx2® 3150. Although the Wavelet showed a comparable bias (Figure A.1), it provided few PR estimates in most motion tasks (ranging from 2 to 8 data points), and only 18 PR estimates out of 33 possible target HR windows at rest.
Title: Wearable pulse oximeters in the prompt detection of hypoxaemia and during movement: a diagnostic accuracy study Authors: Santos, M.*, Sarah, V.* et al.

PR estimation in the hypoxia exposure phase
Table A.2 compares the agreement between the PR and HR estimates for each device during the hypoxia exposure phase. Both the Philips MX450 and the WristOx2® 3150 presented a significantly lower RMSE than the remaining devices. On the other hand, the Wavelet presented a higher number of dropouts, higher mean bias and higher variance (measured via the precision metric). All devices were within the acceptable accuracy limit (i.e. RMSE ≤ 5%).

Discussion
In this sub-study, the performance of PR estimation in wearable pulse oximeters was analysed. In the movement phase, the Wavelet device presented PR estimates comparable to the other devices when at rest (18 estimates out of 33 targets), but it presented far fewer estimates in the remaining tasks (between 2 and 8), and therefore its results may not be comparable with those of the remaining devices. Amongst the latter, the STS and tapping tasks were the most challenging for PR estimation, especially for the AP-20 and the CheckMe™ O2+, both of which showed a significantly higher mean bias in these tasks than the remaining finger-worn devices.
In the hypoxia exposure phase the Wavelet also presented a higher number of dropouts, mean absolute bias and variance (measured via the precision metric), but its accuracy was still within the acceptable limit. We note that the participants were at rest during this phase and that the Wavelet was able to compute PR for 157 of the total of 215 target HR windows, in contrast to the 32 SpO2 values that it was able to compute for the same data windows, as discussed for the SpO2 dataset (see main manuscript). That is, although Wavelet data were available during the target SpO2 windows of the hypoxia exposure phase, the device's algorithm was only able to compute PR (usually from the infrared light), but not SpO2, which also requires a red-light signal with a high signal-to-noise ratio. As discussed, this device does not acquire waveform data continuously, and it is possible that for some of the remaining 62 target HR windows there were no data from which to compute the PR, which makes its comparison with the estimates of the continuous finger-worn pulse oximeters challenging. Our findings show that the differences between the algorithms of the finger-worn pulse oximeters also result in different biases in PR estimation, especially when challenged by motion. The AP-20 in particular, which showed good performance in SpO2 estimation, performed worse in PR estimation than the other selected finger-worn devices. When considering the results from both phases, the WristOx2® 3150 showed the PR estimation performance most comparable to that of the pulse oximeter of the standard care Philips MX450 monitor.

Limitations
The sample size for this study was calculated based on the ISO 80601-2-61:2019 guidelines for evaluating oximeter performance in detecting changes in SpO2. Therefore, the dataset might not have sufficient statistical power to identify differences in PR estimation between oximeters, or between different motion tasks. We also note that in this dataset the HR ranged between 50 and 110 bpm, a subset of the range used in the ANSI/AAMI EC13:2002 guideline to validate heart rate monitors (between 20 and 200 bpm). Finally, the results are based on limited demographics and therefore may not generalise to the wider population (e.g., darker skin types, or patients with comorbidities).

Conclusions
Our sub-study showed that wearable, wireless transmission-mode pulse oximeters differed in PR estimation accuracy, especially when challenged by motion. PR estimation performance should also be considered when selecting wearable devices for use in a clinical setting. All devices estimated PR within the acceptable clinical range (at rest). The WristOx2® 3150 PR estimation was the most comparable to that of the standard monitor.