Accuracy of digital chest x-ray analysis with artificial intelligence software as a triage and screening tool in hospitalized patients being evaluated for tuberculosis in Lima, Peru

Introduction: Tuberculosis (TB) transmission in healthcare facilities is common in high-incidence countries. Yet, the optimal approach for identifying inpatients who may have TB is unclear. We evaluated the diagnostic accuracy of qXR (Qure.ai, India) computer-aided detection (CAD) software versions 3.0 and 4.0 (v3 and v4) as a triage and screening tool within the FAST (Find cases Actively, Separate safely, and Treat effectively) transmission control strategy. Methods: We prospectively enrolled two cohorts of patients admitted to a tertiary hospital in Lima, Peru: one group had cough or TB risk factors (triage) and the other did not report cough or TB risk factors (screening). We evaluated the sensitivity and specificity of qXR for the diagnosis of pulmonary TB using culture and Xpert as the primary and secondary reference standards, respectively, including stratified analyses based on risk factors. Results: In the triage cohort (n=387), qXR v4 sensitivity was 0.91 (59/65, 95% CI 0.81-0.97) and specificity was 0.32 (103/322, 95% CI 0.27-0.37) using culture as the reference standard. There was no difference in the area under the receiver operating characteristic curve (AUC) between qXR v3 and qXR v4 with either a culture or Xpert reference standard. In the screening cohort (n=191), only one patient had a positive Xpert result, but specificity in this cohort was high (>90%). A high prevalence of radiographic lung abnormalities, most notably opacities (81%), consolidation (62%), or nodules (58%), was detected by qXR on digital CXR images from the triage cohort. Conclusions: qXR had high sensitivity but low specificity as a triage tool in hospitalized patients with cough or TB risk factors. Screening patients without cough or risk factors in this setting had a low diagnostic yield. These findings further support the need for population- and setting-specific thresholds for CAD programs.


Introduction
Diagnosis remains the largest gap in the tuberculosis (TB) cascade of care. In 2021, of the 10.6 million people estimated to have become sick with TB, only 6.4 million were diagnosed and notified to national notification systems (1). Efforts to increase and accelerate diagnosis are critical to prevent severe disease, avert TB deaths, and halt ongoing transmission (2). Healthcare facilities are known hotspots for TB transmission in high-incidence settings (3)(4)(5)(6)(7). Globally, the rate of TB disease among healthcare workers is estimated to be at least double that of the general adult population, suggesting substantial transmission in health facilities (8,9). The FAST (Find cases Actively, Separate safely, and Treat effectively) strategy was developed to reduce TB transmission in healthcare settings, based on the principle that most transmission occurs from patients with unsuspected and thus undiagnosed TB, including drug-resistant strains (10). FAST relies on identifying potentially infectious patients, typically with cough screening, followed by rapid sputum-based molecular tests that include first-line resistance testing to enable prompt initiation of effective treatment (7,10). FAST has been implemented in a variety of settings, including Peru, Bangladesh, Russia, and Vietnam (11)(12)(13)(14). Given the slow scale-up of rapid molecular tests (1), due in part to barriers such as cost, optimizing screening approaches is critical to the successful implementation of the FAST strategy.
Triage is the process of making clinical decisions based on symptoms, signs, risk factors, or test results (15). Rapid and accurate triage tests play an important role in identifying, among people with symptoms or risk factors for disease, those who require further diagnostic evaluation (16). Screening similarly involves non-diagnostic testing to distinguish people who likely have the disease from those who are unlikely to have it, typically in a population without symptoms (15). Chest radiography (CXR) has long been used to screen for pulmonary TB, but its utility in high TB incidence settings has been limited by the scarcity of skilled radiologists to interpret images (17). Digital radiography coupled with computer-aided detection (CAD) software can overcome this barrier, making it more feasible to implement CXR for triage or screening in resource-limited settings. CAD uses artificial intelligence algorithms to analyze radiographs for abnormalities consistent with TB and is now recommended by the World Health Organization (WHO) as an alternative to human readers (17). Nonetheless, while CAD sensitivity for both triage and screening is typically >90%, CAD specificity compared with a microbiological reference standard varies widely, from 23% to 66% for screening (15,18,19) and from 25% to 79% for triage (18,20).
Questions remain regarding the optimal approach for using CAD to identify potentially infectious people with TB, particularly in hospital settings. A retrospective case-control study evaluating CAD in patients presenting with respiratory symptoms to a tertiary care hospital in India demonstrated moderate sensitivity and specificity (71% and 80%, respectively) for the detection of pulmonary TB (21). However, TB prevalence surveys reveal that a high proportion of people diagnosed with pulmonary TB do not report symptoms (22), highlighting the poor implementation and low yield of symptom screening (23). Moreover, many CAD studies have focused on triage of outpatients presenting with symptoms (24)(25)(26)(27). Although there are some examples of CAD screening programs that are not contingent on symptom screening, these have been community-based (28)(29)(30)(31).
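The aim of this study was to evaluate the diagnostic accuracy of digital CXR with CAD software as a tool for 1) triage, among patients with cough or TB risk factors, and 2) screening, among patients without cough or TB risk factors, to identify admitted patients who should undergo molecular TB testing in a tertiary care hospital in Lima, Peru.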

Study design and participants
We conducted a cross-sectional diagnostic accuracy study that was embedded in a larger prospective study evaluating FAST implementation at Hospital Nacional Hipolito Unanue (HNHU), a 700-bed public, tertiary-care referral hospital in Lima, Peru (https://clinicaltrials.gov/ct2/show/NCT02355223).
Patients admitted to HNHU from January 18, 2018, to December 31, 2019, were consecutively screened by the FAST implementation study staff using a standardized questionnaire upon facility admission, as previously described (11). This diagnostic accuracy sub-study consisted of two cohorts: triage and screening. Individuals eligible for the parent FAST study were eligible for the triage cohort: adults (≥18 years old) who, upon questioning by the study team, reported cough of any duration and/or any of the following risk factors for TB: contact with someone diagnosed with pulmonary TB, a current active TB diagnosis (although patients already on TB treatment were subsequently excluded from this diagnostic accuracy sub-study), or a history of prior active TB. The screening cohort consisted of individuals who were assessed for eligibility for the parent FAST study but were ineligible because they did not have cough or TB risk factors. We added a screening cohort to the diagnostic accuracy sub-study to estimate how many patients admitted in our setting without identified TB risk might have undiagnosed TB, as suggested by prevalence survey data from other higher TB incidence settings (22). One in every five patients with a negative symptom or TB risk screen (undertaken by our FAST implementation study team) was randomly approached for enrollment into the screening cohort for this diagnostic sub-study.
medRxiv preprint doi: https://doi.org/10.1101/2023.05.17.23290110; this version posted December 7, 2023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Ethics statement
The study was approved by the Institutional Review Boards of HNHU and Brigham and Women's Hospital. Written informed consent was obtained from all patients. Participants were assigned a unique study ID number, recorded on data collection forms and clinical specimens to facilitate data linkage. Names and other obvious identifiers were not used on data collection forms; thus, the authors did not have access to information that could identify individual participants during or after data collection.

Study procedures, data collection, and outcome classification
On the day of admission, patients in both cohorts who were admitted through the emergency room underwent posteroanterior digital CXR, and study staff collected at least two sputum samples for TB testing using smear microscopy, mycobacterial culture, Xpert MTB/RIF (Xpert; Cepheid, Sunnyvale, CA), and/or GenoType MTBDRplus line probe assay (Hain, Germany). De-identified CXR images were electronically transferred to the developers of qXR (Qure.ai, Mumbai, India), who ran versions 3.0 (v3) and 4.0 (v4) on all images while blinded to demographic and clinical data, including the results of other TB testing. CXRs were obtained prospectively, but qXR results were not used to guide clinical management. Information on sociodemographic and clinical variables, including current and prior TB history, comorbidities, and microbiological test results, was collected at the time of enrollment or retrieved from the medical records using standardized case report forms. Culture and Xpert results were each classified as binary variables (positive or negative for Mycobacterium tuberculosis). If a patient had more than one culture result and at least one was positive, the binary result was classified as positive; the same rule was applied to Xpert results.
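The per-patient classification rule described above can be written as a short function. This is a minimal sketch, not the study's code: the result labels ("positive", "negative"), the decision to ignore any other value (for example, a contaminated or indeterminate culture), and the handling of missing results in the combined culture-or-Xpert reference standard reported in the Results are illustrative assumptions.

```python
from typing import Iterable, Optional

# Hypothetical result labels; any other value (e.g., "contaminated") is ignored.
VALID = {"positive", "negative"}


def collapse_results(results: Iterable[str]) -> Optional[bool]:
    """Collapse one patient's repeated results for a single test into one call.

    Returns True if at least one valid result is positive, False if all valid
    results are negative, and None if the patient has no valid result.
    """
    valid = [r for r in results if r in VALID]
    if not valid:
        return None
    return any(r == "positive" for r in valid)


def combined_reference(culture: Optional[bool], xpert: Optional[bool]) -> Optional[bool]:
    """Combined standard: positive if either culture or Xpert is positive.

    Assumption: negative when no available test is positive, and
    unclassifiable (None) when neither test has a result.
    """
    if culture is True or xpert is True:
        return True
    if culture is None and xpert is None:
        return None
    return False


# Example patient: two cultures (one positive) and one negative Xpert.
culture = collapse_results(["negative", "positive"])
xpert = collapse_results(["negative"])
print(culture, xpert, combined_reference(culture, xpert))  # True False True
```

Returning `None` rather than `False` for patients without a valid result keeps them out of the denominator, mirroring the exclusion of participants without microbiological testing.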

Analyses
For our primary diagnostic accuracy analyses, the diagnosis of pulmonary TB in both the triage and screening cohorts was established by the presence of a sputum culture that grew Mycobacterium tuberculosis.For our secondary diagnostic accuracy analyses, the diagnosis of pulmonary TB in both the triage and screening cohorts was established by the presence of a positive sputum Xpert result.
Analyses using qXR v4 are presented in the main manuscript, and those using qXR v3 are presented in the supplementary data. qXR sensitivity and specificity for pulmonary TB, with exact 95% confidence intervals (CIs), were calculated using the manufacturer's prespecified threshold (0.5 for both v3 and v4), and results are reported according to STARD guidelines (see Appendix for STARD checklist) (32). DeLong's non-parametric method was used to compare the areas under the receiver operating characteristic curve (AUCs) of the two qXR software versions. We also estimated specificity at the threshold score at which sensitivity was closest to 90%, the minimum sensitivity for a triage test in the WHO target product profile (TPP) (33). Prespecified sensitivity analyses examined qXR accuracy after excluding groups known to have increased risk for TB: people with HIV, people with prior TB, and people with other respiratory diseases (asthma or bronchiectasis). Using Fisher's exact test, we assessed differences in performance between prespecified groups defined by characteristics or risk factors that may affect diagnostic test performance: male sex, older age, prior TB, HIV co-infection, other respiratory comorbidities, presence of TB symptoms in the WHO symptom screen (cough, fever, night sweats, weight loss), and higher-grade sputum smear result.
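DeLong's method compares two correlated AUCs measured on the same patients by using each patient's "structural components" (placement values). The pure-Python sketch below illustrates the technique and is not the software used in the study (the analyses were run in Stata). The function names are hypothetical, tied scores receive the usual half credit, each reference-standard class needs at least two patients for the sample covariances to be defined, and the z statistic is undefined if the variance is zero (for example, identical score vectors). The pairwise comparison is O(m·n), which is adequate for cohorts of a few hundred patients.

```python
import math
from typing import List, Sequence, Tuple


def _placements(pos: List[float], neg: List[float]) -> Tuple[List[float], List[float]]:
    """DeLong structural components for one test.

    v10[i]: fraction of negatives scored below positive i (ties count 1/2).
    v01[j]: fraction of positives scored above negative j (ties count 1/2).
    """
    def psi(x: float, y: float) -> float:
        return 1.0 if x > y else 0.5 if x == y else 0.0

    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a: List[float], b: List[float]) -> float:
    """Sample covariance with an n - 1 denominator."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def delong_test(scores_a: Sequence[float], scores_b: Sequence[float],
                labels: Sequence[int]) -> Tuple[float, float, float, float]:
    """Two-sided DeLong test of AUC(a) - AUC(b) on the same patients.

    labels: 1 = reference-standard positive, 0 = negative.
    Returns (auc_a, auc_b, z, p_value).
    """
    # Both score vectors are filtered in the same patient order, so the
    # i-th positive (or j-th negative) refers to the same person in a and b.
    pos_a = [s for s, y in zip(scores_a, labels) if y == 1]
    neg_a = [s for s, y in zip(scores_a, labels) if y == 0]
    pos_b = [s for s, y in zip(scores_b, labels) if y == 1]
    neg_b = [s for s, y in zip(scores_b, labels) if y == 0]
    v10a, v01a = _placements(pos_a, neg_a)
    v10b, v01b = _placements(pos_b, neg_b)
    auc_a = sum(v10a) / len(v10a)
    auc_b = sum(v10b) / len(v10b)
    m, n = len(v10a), len(v01a)
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / m
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / n)
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return auc_a, auc_b, z, p


# Toy scores for six patients (not study data).
labels = [1, 1, 1, 0, 0, 0]
v4 = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
v3 = [0.9, 0.4, 0.6, 0.5, 0.2, 0.1]
auc4, auc3, z, p = delong_test(v4, v3, labels)
print(round(auc4, 3), round(auc3, 3), round(p, 3))  # 1.0 0.889 0.48
```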

Results
During the study period, we enrolled 1006 patients admitted to HNHU who had cough or TB risk factors, of whom 489 underwent digital CXR and entered the triage cohort (Figure 1). Participants who were taking TB treatment or had been on TB treatment within one year of enrollment (n=50; 10%) were excluded, as were those who had no microbiological testing (n=20; 4%). We enrolled 220 individuals without cough or TB risk factors in the screening cohort. Screening participants who were household contacts of people with TB were excluded (n=27; 13%), as were those who had no microbiological testing (n=9; 4%).

Demographics
Of the 419 participants in the triage cohort, 387 (93%) had a mycobacterial culture result, which was positive in 65 (17%), of whom 41 (63%) also had positive sputum smear microscopy. In this cohort, 398 (95%) had an Xpert MTB/RIF result, which was positive in 69 (17%), of whom 39 (57%) had positive smear microscopy. Culture and Xpert results were largely concordant, with high Xpert sensitivity for both smear-positive and smear-negative culture-confirmed TB (95% and 86%, respectively), although Xpert was positive in some people who did not have a culture result or who had a negative culture (Table S1a and b). Compared to participants without TB (based on sputum culture results), participants with culture-confirmed TB were more likely to be younger, male, and to have a history of incarceration; to report cough longer than 2 weeks, fever, or weight loss; and to have no history of respiratory disease or prior TB (Table 1). The primary reason for excluding patients from the triage cohort was that they were not admitted through the emergency department (n=397/517), which was required for us to be able to obtain digital CXR. Differences between included and excluded patients are described in Table S2.
Using a combined reference standard that was positive if either culture or Xpert was positive, sensitivity and specificity for qXR v4 were similar (0.93 and 0.33, respectively) (Table S3). When the threshold was set such that sensitivity was 90%, to match the WHO triage test accuracy performance criterion, specificity was 0.44 (142/322, 95% CI 0.39-0.50) and 0.38 (126/329, 95% CI 0.33-0.44) using the culture and Xpert reference standards, respectively (Table 2). Diagnostic accuracy results for qXR v3 are in Table S5.
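The exact 95% CIs quoted above can be recomputed from the counts alone. The sketch below assumes that "exact" refers to Clopper-Pearson binomial intervals (the text does not name the method) and inverts the binomial tail probabilities by bisection, using only the Python standard library. The function names are hypothetical.

```python
import math
from typing import Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _bisect_increasing(f, iters: int = 50) -> float:
    """Root on (0, 1) of a function that increases with p, by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact (Clopper-Pearson) two-sided CI for a binomial proportion x/n."""
    # Lower limit solves P(X >= x | p) = alpha/2; P(X >= x) increases with p.
    lower = 0.0 if x == 0 else _bisect_increasing(
        lambda p: (1.0 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper limit solves P(X <= x | p) = alpha/2; P(X <= x) decreases with p.
    upper = 1.0 if x == n else _bisect_increasing(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


# Counts reported for qXR v4 in the triage cohort.
for label, x, n in [("sensitivity, culture, threshold 0.5", 59, 65),
                    ("specificity, culture, ~90% sensitivity", 142, 322),
                    ("specificity, Xpert, ~90% sensitivity", 126, 329)]:
    lo, hi = clopper_pearson(x, n)
    print(f"{label}: {x / n:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The printed intervals can be compared with those reported in the text. Small differences in the second decimal would point to a different interval method.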

Stratified analyses
We found no significant difference in qXR v4 sensitivity when stratified by sex, age, prior TB, HIV, or symptoms (Figure 3). qXR v4 specificity was higher in people without prior TB than in those with prior TB, in people with cough for less than 2 weeks than in those with cough for more than 2 weeks, and in people who did not report weight loss than in those who did (Figure 4). qXR v4 sensitivity appeared higher in smear-positive than in smear-negative disease, but the difference did not reach statistical significance, and the number of participants with smear-negative disease was low.
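The stratified comparisons above use Fisher's exact test on 2×2 tables, for example stratum (prior TB yes/no) by qXR result (positive/negative) among culture-confirmed cases. The sketch below computes the two-sided p value by summing the probabilities of all tables with the same margins that are no more likely than the observed table. This is one common definition of the two-sided test, and it is assumed, not stated in the text, to match the one used. Integer weights from `math.comb` keep the comparison exact. The counts in the usage line are hypothetical, not study data.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows might be two strata (e.g., prior TB yes/no) and columns qXR
    positive/negative among culture-confirmed patients.
    """
    r1, r2, c1 = a + b, c + d, a + c
    k_min, k_max = max(0, c1 - r2), min(r1, c1)
    # Hypergeometric numerators for every table sharing the observed margins.
    weights = {k: comb(r1, k) * comb(r2, c1 - k) for k in range(k_min, k_max + 1)}
    observed = weights[a]
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / comb(r1 + r2, c1)


# Hypothetical counts: (qXR+, qXR-) in each of two strata.
print(round(fisher_exact_two_sided(20, 2, 39, 4), 3))
```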

Sensitivity analyses
We examined qXR accuracy after excluding prespecified groups in whom TB diagnostic tests are often less sensitive: people with HIV (PWH), people with prior TB, and people with other respiratory diseases.
Sensitivity for qXR v4 was slightly higher in people without HIV (0.…; Table S4).

Screening cohort
Compared to participants in the triage cohort, participants in the screening cohort were more likely to be younger and female; less likely to have a history of HIV, respiratory disease, prior TB, or incarceration; more likely to report current alcohol use; and less likely to report fever, night sweats, or weight loss (Table 1). No participants in the screening cohort had a positive culture, and only one participant had a positive Xpert result. Because only one person in the screening cohort had confirmed TB (and that person had a positive qXR result), we report only specificity.

Discussion
In our study population of hospitalized patients at a tertiary referral hospital in Lima, Peru, qXR artificial intelligence software (versions 3 and 4) demonstrated high sensitivity (>90%) but low specificity (~30%) in a triage cohort of patients with cough or TB risk factors, thereby meeting the WHO triage test criterion for sensitivity but not for specificity. In our screening cohort of patients without cough or risk factors, specificity was high (>90%), but sensitivity could not be evaluated because the diagnostic yield of screening this group in this setting was low (only one patient was diagnosed with Xpert-positive TB).
We previously reported that the FAST strategy using Xpert for molecular diagnosis increased the yield of TB diagnosis and decreased time to treatment initiation (11). Yet, despite WHO guidance that molecular WHO-recommended rapid diagnostic tests (mWRDs) such as Xpert should be the initial test for people being evaluated for TB, implementation in Peru and other high-incidence settings has lagged (1). While barriers to mWRD implementation are multifactorial (34), cost and limited laboratory capacity were challenges to implementing Xpert as a triage or screening test in routine practice in our setting. A triage tool such as digital CXR with CAD can help identify which patients should undergo testing with an mWRD (16) as part of transmission prevention strategies such as FAST. In our hospitalized study population, qXR was highly sensitive in correctly triaging people with cough or TB risk factors who had culture-confirmed disease. Low qXR specificity would lead to a large number of false-positive results requiring confirmatory testing, and widespread use of digital CXR with CAD poses implementation challenges. Nonetheless, qXR as a triage tool could be of clinical and public health value because of its impact on diagnostic yield, and it may still save enough mWRDs to be cost-effective depending on the setting (cost-effectiveness analyses from our study are forthcoming). When we adjusted the qXR v4 threshold to maintain sensitivity at 90%, specificity rose to 38-44%. Our data thus add further weight to the need for population-specific thresholds (35) to optimize implementation of CAD tools in different settings.
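Choosing a setting-specific threshold, as described above, amounts to scanning candidate cut-offs and keeping the one whose sensitivity is closest to the target. This is a minimal sketch under stated assumptions: a score at or above the threshold is called qXR-positive, candidate thresholds are the observed scores, and ties in distance to the target are broken in favour of higher specificity. The function name and the toy scores are hypothetical, not study data.

```python
from typing import Sequence, Tuple


def spec_at_target_sensitivity(scores: Sequence[float], labels: Sequence[int],
                               target: float = 0.90) -> Tuple[float, float, float]:
    """Pick the score threshold whose sensitivity is closest to `target`.

    A score >= threshold is called qXR-positive (assumption). Ties in
    |sensitivity - target| are broken in favour of higher specificity.
    Returns (threshold, sensitivity, specificity).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        sens, spec = tp / n_pos, tn / n_neg
        key = (abs(sens - target), -spec)
        if best is None or key < best[0]:
            best = (key, t, sens, spec)
    _, threshold, sens, spec = best
    return threshold, sens, spec


# Toy scores: 10 reference-positive and 10 reference-negative patients.
pos = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.20]
neg = [0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.15, 0.10, 0.05, 0.58]
scores = pos + neg
labels = [1] * len(pos) + [0] * len(neg)
print(spec_at_target_sensitivity(scores, labels))  # (0.55, 0.9, 0.9)
```

In this toy example, every threshold from 0.25 to 0.55 gives 90% sensitivity. The tie-break selects the highest of these cut-offs because it leaves the fewest false positives.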
The low specificity of qXR in inpatients with TB symptoms or risk factors contrasts with cross-sectional studies in symptomatic outpatients in Bangladesh and Pakistan, in which qXR met the WHO triage test criteria for both sensitivity (≥90%) and specificity (≥70%) (24,36).
Our triage cohort had a high prevalence of radiographic lung abnormalities, which likely contributed substantially to the lower-than-expected specificity in this cohort. Abnormal chest imaging findings in our study population may reflect the greater likelihood that inpatients at a tertiary referral hospital have acute illnesses such as pneumonia. They may also reflect a higher proportion of people with chronic lung disease in Lima, a city known for high levels of air pollution, which has also been associated with a higher risk of tuberculosis (37). We also note that this diagnostic accuracy assessment in the triage cohort reflects use of the test in a pre-screened population with a high pre-test probability of TB or other lung disease, in whom microbiological testing revealed a high prevalence of TB. Thus, negative predictive value would be lower in this cohort than if qXR were applied to the full population of people initially screened (rather than those enrolled) for FAST.
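The dependence of negative predictive value on pre-test probability follows from Bayes' theorem. The sketch below plugs in the reported qXR v4 culture-standard counts (sensitivity 59/65, specificity 103/322) and the triage cohort's culture-positive prevalence (65/387). At that prevalence, it reproduces the cohort's own PPV (59/278) and NPV (103/109). The 5% prevalence is a hypothetical value chosen only to illustrate a lower-risk screened population.

```python
def predictive_values(sens: float, spec: float, prevalence: float) -> tuple:
    """Return (PPV, NPV) for a test with the given sensitivity and specificity
    applied to a population with the given pre-test prevalence (Bayes' theorem)."""
    tp = sens * prevalence              # true positives per person tested
    fn = (1 - sens) * prevalence        # false negatives
    tn = spec * (1 - prevalence)        # true negatives
    fp = (1 - spec) * (1 - prevalence)  # false positives
    return tp / (tp + fp), tn / (tn + fn)


sens, spec = 59 / 65, 103 / 322         # qXR v4, culture reference, triage cohort
for prev in (65 / 387, 0.05):           # observed prevalence vs. hypothetical 5%
    ppv, npv = predictive_values(sens, spec, prev)
    print(f"prevalence {prev:.2f}: PPV {ppv:.2f}, NPV {npv:.3f}")
```

With sensitivity and specificity held fixed, lowering the prevalence raises NPV and lowers PPV. This is why NPV in the enrolled, high-prevalence triage cohort understates NPV in the broader screened population.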
Increasing data demonstrate that symptom screening is insensitive (38) and often poorly implemented (39), and a high proportion of people with TB do not report symptoms (22). We included individuals without cough or risk factors in our screening cohort to understand the potential diagnostic yield of using qXR as a screening tool to identify unsuspected TB in hospitalized patients presenting for various other reasons. In this setting, the diagnostic yield of screening people without symptoms or risk factors was lower than expected based on outpatient studies. The specificity of qXR was high, suggesting it could be a valuable rule-out test in this setting. The low prevalence of TB in the screening cohort may be an artifact of the sample size. Alternatively, it may reflect that people with TB who present to hospital are sicker because of TB, and thus more likely to present with cough (resulting in exclusion from the screening cohort), than the outpatient populations in prevalence surveys. The exclusion of TB contacts and people with prior TB from the screening cohort may also have made it a lower-risk group. Implementation of strategies such as FAST should consider local epidemiology, including the pre-test probability of TB in people who do not report symptoms, when deciding who should undergo mWRD testing. Other strategies to increase the sensitivity of screening could also be evaluated.
Strengths of our study include generating CAD diagnostic accuracy data from inpatient populations, including those who were symptomatic and/or high-risk and those without identified cough or TB risk factors. These data also contribute to a body of literature seeking to optimize the FAST facility-based transmission prevention strategy in a medium-incidence setting. We provide the first head-to-head evaluation of qXR version 4 (soon to be commercially available) against qXR version 3 and characterize other lung abnormalities detected. We acknowledge the challenges posed by imperfect reference standards for TB diagnostic accuracy studies (16), although we suspect that paucibacillary disease (which could cause culture, Xpert, and CXR to be negative) is less likely in a hospitalized cohort in a low-HIV-prevalence setting. Moreover, the inclusion of reference standard data from both mycobacterial culture and Xpert is a strength, since many diagnostic studies use only Xpert as the reference standard. Our study has two main limitations. First, digital CXR could only be performed on inpatients admitted through the emergency room, which may bias the study towards sicker hospitalized patients. Second, with only 65 patients with culture-confirmed TB, the study only had sufficient power to report that the lower limit of the 95% CI for sensitivity is 0.885 with 95% precision. Low numbers in certain subgroups, including people with HIV (owing to the low HIV incidence in Peru) and people with smear-negative disease, also limit the power to detect differences in our stratified analyses.
In conclusion, qXR had high sensitivity but low specificity as a triage tool within the FAST strategy among hospitalized adults admitted to a tertiary referral hospital in Peru, who had a high prevalence of other radiographic lung abnormalities. While specificity was high in patients without cough or risk factors, the diagnostic yield of screening these patients was low in this setting. These findings further support the need for population- and setting-specific thresholds for CAD programs. They also provide additional insights into the role of triage testing in hospitalized patients, which remains critical to detect and treat individual patients earlier and to curb hospital TB transmission.

Figure 2: Receiver operating characteristic (ROC) curves and estimates of area under the ROC curves (AUC) for qXR versions 3 and 4 to identify abnormalities consistent with TB in the triage cohort using the culture (left) and Xpert (right) reference standards.

Figure 3: Sensitivity of qXR version 4 for culture-confirmed pulmonary tuberculosis, overall and in prespecified stratified groups. P values are from Fisher's exact tests.

Figure 4: Specificity of qXR version 4 for culture-confirmed pulmonary tuberculosis, overall and in prespecified stratified groups. P values are from Fisher's exact tests.


Analyses were completed using STATA/IC version 16 (StataCorp. 2019. Stata Statistical Software: Release 16. College Station, TX: StataCorp LLC).

Table 1: Demographic and clinical characteristics of enrolled participants.
* Fisher's exact test for binary variables, chi-square test for categorical variables, Wilcoxon rank-sum test for continuous variables, and Jonckheere-Terpstra test for ordered categorical variables. The first p value compares participants with and without pulmonary TB in the triage cohort; the second compares the overall triage and screening cohorts.
^ TB was diagnosed based on a positive sputum culture (i.e., pulmonary TB); clinical diagnoses and evaluation for extrapulmonary TB were not included.
*Screening cohort consists of patients who did not report cough or TB risk factors.

Table 2: Summary of diagnostic accuracy for qXR version 4 using the culture (primary) and Xpert (secondary) reference standards in the triage and screening cohorts.