Flu@home: the Comparative Accuracy of an At-Home Influenza Rapid Diagnostic Test Using a Prepositioned Test Kit, Mobile App, Mail-in Reference Sample, and Symptom-Based Testing Trigger

ABSTRACT At-home testing with rapid diagnostic tests (RDTs) for respiratory viruses could facilitate early diagnosis, guide patient care, and prevent transmission. Such RDTs are best used near the onset of illness, when viral load is highest and clinical action is most impactful, which may be achieved by at-home testing. We evaluated the diagnostic accuracy of the QuickVue Influenza A+B RDT in an at-home setting. A convenience sample of 5,229 individuals engaged with an online health research platform was prospectively recruited throughout the United States. “Flu@home” test kits containing a QuickVue RDT and reference sample collection and shipping materials were prepositioned with participants at the beginning of the study. Participants responded to daily symptom surveys. If they reported experiencing cough along with aches, fever, chills, and/or sweats, they used their flu@home kit following instructions in a mobile app and indicated which lines they saw on the RDT. Of the 976 participants who met criteria to use their self-collection kit and completed study procedures, 202 (20.7%) were positive for influenza by qPCR. The RDT had a sensitivity of 28% (95% CI = 21 to 36) and specificity of 99% (95% CI = 98 to 99) for influenza A, and a sensitivity of 32% (95% CI = 20 to 46) and specificity of 99% (95% CI = 98 to 99) for influenza B. Our results support the concept of app-supported, prepositioned at-home RDT kits using symptom-based triggers, although the specific RDT used in this study cannot be recommended. Further research is needed to determine ways to improve the accuracy and utility of home-based testing for influenza.

At-home RDTs for single or combinations of respiratory pathogens such as influenza, respiratory syncytial virus (RSV), and SARS-CoV-2 will likely become prominent public health tools by empowering individuals with accessible patient-led testing after the pandemic has ended.
The most cost-effective and simple RDTs are lateral flow immunochromatographic assays that detect viral antigen in patient samples. Several lateral flow tests (LFTs) exist for influenza detection, and several have recently been authorized for detecting SARS-CoV-2 antigens in multiple countries (11-16). Studies describing the accuracy of LFTs for influenza have shown variable sensitivities between 19% and 96% but high specificities in the range of 97% to 99% when used in ambulatory care settings (17-20). Lower sensitivity of LFTs used in ambulatory or home settings is not limited to influenza detection; several studies now demonstrate variable and, in some cases, suboptimal sensitivity of antigen-based tests for SARS-CoV-2, RSV, and human metapneumovirus (21-25).
One of the major methodological challenges in evaluating the accuracy of home-use tests for influenza (and other RVIs) is the often-rapid decrease in viral material present in the upper respiratory tract (26, 27). Low viral load may be responsible for the poor sensitivity of antigen-based tests when they are conducted several days after illness onset. To evaluate the impact of minimizing this time interval on the sensitivity of influenza home tests, we employed a novel study design in which lateral flow influenza tests and reference swabs were prepositioned with participants before influenza season. Over a period of 4 months, individuals self-reported symptoms in a daily survey, which prompted the use of an influenza RDT and self-collection of a sample for reference testing. We present the successes and weaknesses of this novel study design in relation to the accuracy of the influenza RDT compared with the reference test.

MATERIALS AND METHODS
Study design. This prospective, observational cohort study used a non-probability-based (convenience) sample of participants who were recruited from existing users of the Achievement platform, an online health research community that uses a points-based system to reward participation in studies and has more than 4 million active users (myachievement.com, Evidation Health Inc., San Mateo, CA). This study was part of a broader study conducted by Evidation Health, known as the Home Testing of Respiratory Illness Study, or the "Homekit2020" study, which had a broader goal of developing a methodology to better classify and detect influenza cases and other RVIs through the use of behavioral data collected via wearable devices and patient-reported outcomes (https://www.synapse.org/#!Synapse:syn22803188/wiki/606343). The study was approved by the Western Institutional Review Board (WIRB, Puyallup, WA, USA) and the University of Washington IRB (Study # 1271380). Reporting of this study adheres to the Standards for Reporting of Diagnostic Accuracy Studies (STARD) guidance (28).
Sample size. Due to the exploratory nature of this study, sample size was not based on a formal calculation. We aimed to enroll 4,900 participants, of whom an estimated 175 would have confirmed influenza infection, assuming a 75% compliance rate and a national influenza incidence rate of 7% to 10%, consistent with previous similar studies (29, 30).
Participant recruitment and enrollment. Email invitations were sent to approximately 50,000 users of the Achievement platform between December 13, 2019 and January 31, 2020. Eligible participants were 18 years or older, lived in the United States, could read and understand English, and owned a Fitbit they were willing to wear day and night for the study duration; analyses of Fitbit data are not presented in this manuscript. Participants also agreed to respond to short daily health surveys and were required to have an iOS or Android smartphone or tablet capable of supporting the flu@home app (Audere, Seattle, WA, USA). After confirming eligibility and consenting, participants completed a baseline survey of demographics, history of respiratory illnesses, medication usage, influenza vaccination, and quality of life. Participants who completed all these steps were considered enrolled.
Flu@home test kit. Enrolled participants were mailed a flu@home kit after their enrollment was confirmed (Appendix S1 in supplemental materials). Kits contained a QuickVue Influenza A+B test strip (henceforth referred to as the index test), reference sample collection materials, and mailing materials. In addition to the test strip, index test materials consisted of a QuickVue foam-tipped swab, QuickVue reagent tube, and saline ampule (Quidel Corporation, San Diego, CA, USA). Reference materials included an additional QuickVue swab and a tube containing Universal Viral Transport System (UVT) (BD Co., Franklin Lakes, NJ, USA).
Influenza-like illness monitoring and flu@home kit trigger. Daily email surveys asked participants if they were currently experiencing any influenza-like illness (ILI) symptoms (defined as cough, fever, chills, sweats, or aches). Those who responded affirmatively were asked additional questions about the presence and severity of their symptoms, medication usage, and quality of life. Participants who reported cough (of any severity) along with aches, fever, chills, and/or sweats (of any severity) met the study trigger criteria and were prompted to download the flu@home app and complete testing procedures. Participants who did not report ILI instead completed a short survey about their physical, emotional, and symptom status over the last 24 h (Appendix S2). These data were collected as part of the broader Homekit2020 study and are not presented in this comparative accuracy paper. Participants who finished testing procedures were sent a short recovery survey a few weeks later (Appendix S2) asking about actions taken to seek care, medications prescribed by health professionals, illness recovery time, and experience using the flu@home app.
Index test and reference swab. The flu@home app guided participants to self-collect an anterior nares (nasal) swab, first for the index test procedures and then for reference sample collection (Appendix S3). Nasal swabs were used because it would not be safe or feasible for a participant to self-collect a nasopharyngeal swab (NPS), and because there is evidence supporting the use of self-collected nasal swabs for detection of influenza (31, 32). Instructions in the mobile app were the same for both the index test and the reference swab and were similar to those in the QuickVue package insert. For both samples, participants were instructed by the app to insert a foam-tipped swab approximately one half inch into each nostril and rotate four times (per nostril) while maintaining contact with the inner wall of the nostril, as demonstrated by an animation in the app. A 10-min in-app timer tracked index test processing time, during which participants completed an illness survey (Appendix S4). After index test processing was complete, participants responded to an in-app survey denoting the presence/absence of test and control lines on their index test and followed instructions to take a photo of their index test (Appendix S3). Test strip images were uploaded to a database managed by the study team and were reviewed by a trained "expert" interpreter who determined whether test and control lines were present or absent using the same prompts as the participant, with the addition of an "uninterpretable" option (for instance, due to poor image quality). Neither the expert nor participants knew the outcome of the reference test when evaluating index test results. Index and reference test results were not communicated to participants.
Reference testing. Reference samples were shipped to Molecular Testing Labs (MTL, Vancouver, WA). Samples with sufficient material were either tested immediately or frozen at −80°C for later testing. Temperature of the sample during transit was monitored using a TransTracker CF card (Temptime Corp., 116 The American Road, Morris Plains, NJ) capable of recording a single temperature excursion below 0°C and/or a single excursion above 25°C. All samples were tested, regardless of temperature excursions, on the Panther Fusion system according to the manufacturer's instructions (Hologic Inc., San Diego, CA, Cat. No. 303095). In the lab, 0.5-mL aliquots of each sample were extracted and eluted using Panther Fusion buffers (Hologic #PRD-04331, Hologic #PRD-04334), and purified total nucleic acids were tested using multiplex reverse-transcription PCR (RT-PCR) for influenza A, influenza B, and RSV (Hologic #PRD-04328). RT-PCR cycle thresholds (Ct) were used as an indicator of relative viral load. Ct is the number of PCR cycles counted before the signal exceeds the detection threshold; it is inversely related to the viral load in the sample. Generally, genetic material doubles during each cycle of PCR, and thus one cycle corresponds to an approximate 2-fold change in nucleic acid concentration.
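The Ct-to-viral-load relationship described above can be sketched in code. This is an illustrative example only, not code from the study (the study's analyses were conducted in R, and the function name is ours), assuming ideal doubling of genetic material per PCR cycle:

```python
def ct_to_fold_change(delta_ct: float) -> float:
    """Approximate fold-change in target concentration for a Ct difference.

    A sample that crosses the detection threshold delta_ct cycles *earlier*
    contains roughly 2**delta_ct times more starting nucleic acid, under the
    idealized assumption that material doubles every cycle.
    """
    return 2.0 ** delta_ct

# A sample with Ct = 22 versus one with Ct = 25 crossed the threshold
# 3 cycles earlier, so it holds roughly 8 times more viral RNA.
print(ct_to_fold_change(25 - 22))  # -> 8.0
```

In practice PCR efficiency is below 100%, so the true fold-change per cycle is somewhat less than 2; the study uses the 2-fold figure as an approximation.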
Reference test results were linked to participant records using a unique barcode. Lab personnel did not have access to index test results or to any participant data. Reference test results were considered positive if the Panther Fusion detected a signal for the specific pathogen and the internal control results were valid for that run.
Analysis. All analyses were conducted using R v4.0.2 and RStudio v1.3.1056 (30,33). Descriptive statistics of the study population and univariate analysis were conducted for demographics, symptom presence and severity, comorbidities, and risk factors. Results were stratified by reference test result and compared using Pearson's chi-square with Yates' continuity correction, or Fisher's exact test where appropriate. Illness duration was measured by two methods: (i) trigger-test interval, calculated as the time interval between submitting the survey that triggered testing and the date/time when the index test was performed; and (ii) symptom onset-test interval, calculated as the time interval between the self-reported date of illness onset and the date/time the index test was performed.
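The two illness-duration measures can be illustrated with a small sketch. The timestamps and variable names below are invented for illustration (the study's actual analyses were performed in R):

```python
from datetime import datetime

def interval_hours(start: datetime, end: datetime) -> float:
    """Elapsed time between two events, in hours."""
    return (end - start).total_seconds() / 3600.0

# Hypothetical participant timeline
trigger_survey = datetime(2020, 1, 10, 8, 0)   # survey that triggered testing
symptom_onset = datetime(2020, 1, 7, 18, 0)    # self-reported illness onset
index_test = datetime(2020, 1, 10, 15, 0)      # time the RDT was performed

# (i) trigger-test interval: trigger survey -> index test
print(interval_hours(trigger_survey, index_test))  # -> 7.0
# (ii) symptom onset-test interval: reported onset -> index test
print(interval_hours(symptom_onset, index_test))   # -> 69.0
```

Note that the onset-test interval depends on a retrospectively self-reported onset date, so it is inherently coarser than the trigger-test interval, which is computed from logged timestamps.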
Inter-rater reliability between user and expert interpretation of the index test was measured with Cohen's kappa. Sensitivity, specificity, and 95% confidence intervals (CI) were calculated using the TableOne package (34). Performance of the index test was determined for participant interpretation of the index test and for expert interpretation of index test photos, for detection of influenza A and B independently, and for the overall ability to detect influenza regardless of strain. For the latter, participant and expert results were considered "influenza positive" if the index test was interpreted as either influenza A or B positive, "influenza negative" if negative for both, and "invalid" if they indicated no control line. The QuickVue instructions recommend retesting if the RDT is positive for both influenza A and B. Because only one RDT was provided to each participant, retesting was not feasible. Therefore, we considered the RDT invalid if the participant indicated a dual positive and either the expert read or the PCR result disagreed with the dual positive result (i.e., if the expert saw one or zero test lines in the photo or if the PCR result was not a dual positive). All eight index tests marked positive for both influenza A and B were determined to be invalid under these criteria. Tests marked invalid by the participant or by the expert were excluded from their respective analyses.
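The accuracy and agreement statistics named above follow standard definitions, sketched here with invented counts (the study computed them in R with the TableOne package; this Python version shows only the formulas):

```python
def sens_spec(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    return tp / (tp + fn), tn / (tn + fp)

def cohens_kappa(a: int, b: int, c: int, d: int) -> float:
    """Cohen's kappa for two raters on a 2x2 agreement table:
    a = both positive, b = rater1 pos / rater2 neg,
    c = rater1 neg / rater2 pos, d = both negative."""
    n = a + b + c + d
    p_obs = (a + d) / n                              # observed agreement
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2  # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical counts chosen to mirror the reported influenza A figures
sens, spec = sens_spec(tp=45, fn=115, tn=790, fp=8)
print(round(sens, 2), round(spec, 2))  # -> 0.28 0.99
```

Kappa corrects raw percent agreement for the agreement expected by chance, which is why 95.5% raw agreement between participants and the expert corresponds to a kappa of only 0.72.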
Comparisons between mean PCR Ct values were performed using a two-tailed, paired Student's t test. Correlation between Ct and number of symptoms was calculated using Pearson's r, and between Ct and reported impact on activities using Spearman's rho (rs). Linear models of Ct as a function of trigger-test interval, onset-test interval, and number of symptoms were fit using the "lm" function of the base stats package. Models fit to trigger-test intervals, onset-test intervals, and number of symptoms were controlled for age and for each other. Smoothed fits for Ct values as a function of trigger-test and onset-test intervals were generated using the "geom_smooth" function in the ggplot2 package with a local regression (LOESS) method. All visualizations were created using ggplot2 (35).
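For illustration, the slope such a linear model recovers can be computed with ordinary least squares. This is a minimal pure-Python sketch with invented data, not the study's R `lm()` call:

```python
def ols_slope_intercept(x: list[float], y: list[float]) -> tuple[float, float]:
    """Simple linear regression y = a + b*x fit by least squares.

    Returns (intercept a, slope b); slope is interpreted here as the
    change in Ct per unit of the predictor (e.g., cycles per day).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

# Hypothetical data: Ct rising with days since the testing trigger
days_from_trigger = [0, 1, 2, 3, 4]
ct_values = [24.0, 25.2, 26.1, 27.4, 28.5]

intercept, slope = ols_slope_intercept(days_from_trigger, ct_values)
print(round(slope, 2))  # cycles of Ct increase per day for this toy data
```

A positive slope in Ct over time corresponds to declining viral load, since Ct is inversely related to target concentration.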

RESULTS
Study completion and participant demographics. Between mid-December 2019 and early February 2020, a total of 5,229 individuals met inclusion criteria and were enrolled (Fig. 1). A total of 1,060 participants experienced the symptoms required to trigger their flu@home kit, of whom 1,027 (97%) completed their index test and uploaded an interpretable index test image. Of those, 976 (95%) also returned a reference swab, of which 247 (25.3%) experienced a freezing temperature excursion during transportation and 12 (1.2%) a heating temperature excursion. All samples were tested regardless of temperature excursion; the 12 that experienced a heat excursion were negative by PCR, and their associated index tests were also negative. All 976 participants who returned a reference swab also completed the illness survey in the flu@home app; analyses are based on this population.
Participants from all 50 states and the District of Columbia enrolled (median 13 participants per state, IQR: 7 to 25) (Appendix S5). Median age was 36, and participants were predominantly white, college-educated, female, and in the middle- and upper-income tiers (Table 1). Most (94%) reported having health insurance, half (50.7%) received the 2019-2020 flu vaccine, and 694 (71%) reported having at least one comorbidity. More than 99% (968/976) of participants who finished testing procedures also completed the follow-up survey 2 to 3 weeks later. Overall, participants strongly agreed that the instructions in the flu@home app were clear and helpful (796/968, 82%) and that the app was easy to use (749/968, 77%), while a slightly smaller proportion strongly agreed that the two flu@home nasal swabs were easy (639/968, 66%) (Appendix S6). These ratings largely held across age groups, education levels, and whether the participant's reference test was positive or negative.
Reference testing and symptoms. A total of 202 (20.7%) individuals were positive for influenza per the reference test. An additional 21 (2.2%) tested positive for only RSV. Race/ethnicity (P = 0.014), physician-diagnosed influenza in the household (P = 0.035), and suspected influenza exposure (P < 0.001) were associated with a positive reference test for influenza (Table 1), while primary state of residence (P = 0.229) (Appendix S5) and vaccination status (P = 0.754) (Table 1) were not. The most common symptoms among all participants were cough (80.8%), sore throat (74.3%), fatigue (80.3%), and headache (68.9%) (Table 2). Participants with a positive reference test were more likely to have fever (P < 0.001), chills (P < 0.001), sweats (P < 0.001), aches (P < 0.001), cough (P = 0.002), difficulty breathing (P = 0.031), and vomiting (P = 0.033) than those with a negative reference test. Individuals with a positive reference test were also more likely to report that their illness felt worse than a typical cold (P < 0.001), that they thought their illness was the flu rather than a cold or another illness (P < 0.001), that their symptoms started within the last 1 or 1.5 days (P = 0.044), and that their symptoms had a greater versus lesser impact on their daily activities (P < 0.001) than those with a negative test. With a few exceptions, more severe symptoms were associated with a positive influenza PCR test result (Appendix S7).
Performance of the RDT. Participants and experts agreed on 932 RDT results (95.5%, kappa = 0.72). Of the 44 tests with conflicting interpretations, 36 were errors made by the participant, including 15 instances where a participant incorrectly reported the presence of a control line and one test line (positive for either influenza A or B) when both the PCR results and expert interpretation of the RDT photo were negative (Appendix S8), and eight instances where the participant reported two test lines (positive for both influenza A and B) when the expert did not see any test lines (influenza negative) or failed to see a control line (invalid). Additionally, the expert identified four positive tests that a participant had marked as negative. When interpreted by participants, the RDT had a sensitivity of 28% (95% CI = 21 to 36) for influenza A and 32% (95% CI = 20 to 46) for influenza B, with specificities for both A and B types of 99% (95% CI = 98 to 99). Participants' interpretation of the RDT for the overall presence of influenza (either A or B) had a sensitivity of 32% (95% CI = 26 to 39) and a specificity of 98% (95% CI = 97 to 99). There were no significant differences in RDT performance when interpreted by the expert or based on vaccination status (Table 3). Most participants (714, 73%) completed their index test within 24 h of the trigger, with a median trigger-test interval of 7 h (IQR 2 h to 24 h) (Fig. 2a). In comparison, symptom onset-test intervals were longer and more widely distributed (median = 72 h, IQR = 49 h to 110 h) (Fig. 2b). As both intervals increased, influenza-positive cases declined, and longer intervals were associated with linearly increasing Ct values, suggesting an exponential decay of viral load (Fig. 2a and b). When fit to linear models, Ct values increased by 1.13 cycles (an approximately 2-fold decrease in viral load) per day from trigger (P < 0.001) and 0.65 cycles (an approximately 1.5-fold decrease in viral load) per day from onset (P < 0.001).
However, accuracy of the RDT was not significantly different in sub-analyses that explored performance at longer trigger-test and onset-test intervals (Table 4). In addition to time intervals, Ct values were negatively correlated with symptom count (r = −0.27, P < 0.001), and a linear model of this relationship showed a decrease of 0.36 cycles per each additional symptom (P = 0.006). This model also showed an interaction with days from onset (P = 0.015). When grouped by 0 to 3, 4 to 7, and 8 to 12 symptoms, mean Ct values were significantly lower for the 8 to 12 group than for the 0 to 3 group (Tukey's test, P = 0.028) but not the 4 to 7 group, and there was a higher proportion of flu positives in the subgroup with the most (8 to 12) symptoms (χ² = 28.15, P < 0.001) (Fig. 3a). Sensitivity of the index test increased from 21% (95% CI = 13 to 33) to 41% (95% CI = 32 to 50) for ≥8 symptoms (Table 4). Ct values were negatively correlated with the impact of the illness on participants' ability to conduct their daily activities (rs = −0.22, P = 0.002), and there was a higher proportion of flu-positive participants reporting a higher impact on activities (χ² = 89.8, P < 0.001) (Fig. 3b). RDT sensitivity increased with greater impact on daily activities, though confidence intervals were overlapping (Table 4).

DISCUSSION
Main findings. Early identification of individuals with respiratory symptoms and testing to confirm infection are key steps in the clinical management of RVIs such as influenza and SARS-CoV-2 (5, 6, 36, 37). By prepositioning test kits and identifying ILI through daily symptom surveys, we were able to minimize the time interval from onset of specific symptoms to RDT use. Most participants (714, 73%) used their test kit within 24 h of triggering it, and half within 7 h (496, 51%). The low sensitivity of the QuickVue antigen-based RDT was similar to that found in other studies (17, 18, 20), suggesting that testing shortly after onset of the influenza-specific symptoms selected as trigger criteria did not overcome the poor limit of detection of this assay. Our findings do not support using the QuickVue A+B RDT for at-home diagnosis of influenza (note that QuickVue is a CLIA-waived test and was not approved for at-home use).
We hypothesized that earlier testing may improve sensitivity of the QuickVue RDT based on the premise that peak influenza viral shedding occurs 48 h after infection and 24 h before the most severe symptoms appear, then declines rapidly (38). One possible explanation for the poor sensitivity even at the earliest trigger-test and onset-test intervals is that our selected trigger criteria still missed the window when viral load was highest. While the study protocol successfully prompted testing within a few hours of the trigger (short trigger-test intervals), there was a considerably longer time between the date the participant retrospectively reported first feeling sick and when they tested (long symptom onset-test intervals), typically about 72 h. The trigger in our study was contingent upon the presence of cough, and thus participants likely experienced symptoms both specific and nonspecific to influenza before triggering their kit.
For an antigen-based RDT such as the QuickVue, it is possible that trigger criteria prompting even earlier testing, possibly followed by serial testing, may increase the likelihood that the test is conducted within the window of peak viral shedding, when the sensitivity of the RDT will be maximized. This could be accomplished by relaxing the requirement for cough to be present and by including more general RVI symptoms in the criteria. However, this type of approach would have to be weighed against the potential costs of a greater number of tests needed at a population level and a potentially higher false-positive rate if influenza prevalence is low. To further understand what participant-reported factors could be related to periods of high viral load, we examined test performance, influenza infection rate, and Ct value for two additional measures: number of symptoms reported and perceived impact of illness on daily activities. Participants who reported the most symptoms and the greatest impact on their daily activities had lower Ct values and higher RDT sensitivity; these measures could be valuable additions (or alternatives) to the trigger criteria we used. Furthermore, participants who reported feeling worse than they do with a typical cold were more likely to have influenza, as were participants who thought they were sick with influenza rather than a cold or another illness. Together, this evidence suggests that objective presentation (such as number of symptoms) and the participant's subjective perception of illness might serve as effective guides for when to test or provide additional credence to test results.
Indeed, several clinical prediction rules (CPRs) have been shown to be moderately effective at predicting influenza on their own, with high-risk likelihood ratios (LRs) in the range of 4 to 7.8 and low-risk LRs of 0.06 to 0.72 (39, 40), comparable with the positive and negative LRs found for the user-interpreted QuickVue test in this study (15.3 and 0.69, respectively). CPRs, however, face the challenge of distinguishing influenza from other ILI such as RSV and COVID-19 (41-43), preserving the crucial role of diagnostics in confirming the causative pathogen. An additional promising source of trigger criteria that is actively being researched is person-generated health data gathered by consumer wearable sensors. The physiological measures that these sensors capture, such as resting heart rate, change in activity level, or peripheral oxygen saturation (SpO2), could provide indicators that a person is unwell before they are aware of any symptoms and may provide a method to differentiate between RVIs with symptom overlap, such as influenza and COVID-19 (44, 46, 47).
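Likelihood ratios relate directly to sensitivity and specificity; the sketch below shows the standard formulas (this is an illustrative Python example, not the study's R code, and the exact reported LR+ of 15.3 was presumably computed from raw counts rather than the rounded percentages used here):

```python
def likelihood_ratios(sens: float, spec: float) -> tuple[float, float]:
    """Positive and negative likelihood ratios from sensitivity/specificity.

    LR+ = sensitivity / (1 - specificity): how much a positive result
          raises the odds of disease.
    LR- = (1 - sensitivity) / specificity: how much a negative result
          lowers the odds of disease.
    """
    return sens / (1 - spec), (1 - sens) / spec

# Overall user-interpreted results reported above: sensitivity 32%,
# specificity 98% (rounded), giving LR+ near the reported 15.3 and
# LR- matching the reported 0.69.
lr_pos, lr_neg = likelihood_ratios(0.32, 0.98)
print(round(lr_pos, 1), round(lr_neg, 2))  # -> 16.0 0.69
```

A high LR+ with a near-unity LR- is the signature of a test like this RDT: a positive result is strongly informative, while a negative result barely changes the pre-test odds.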
Comparisons to other studies. Previous studies have reported uniformly high specificity of the QuickVue A+B test and highly variable sensitivities ranging from 19% to 96% (17-20, 48-50). Studies conducted in inpatient or hospital settings tend to find higher sensitivity of the RDT, which may be due to more severe infection and higher viral load (38, 51), or to the use of pooled nasal and throat swabs or nasopharyngeal aspirate (50), compared with studies in community settings that used nasal swabs (18), as ours did. Other studies have found higher viral load and higher sensitivity of the QuickVue RDT in children (48, 50), raising the possibility that the low sensitivity found in our study may be due in part to an overall lower viral load in our adult, less severely ill, at-home population.
To our knowledge, only three peer-reviewed studies (52-54), including a pilot of the flu@home study conducted in 2018, report the performance of an at-home influenza RDT. The pilot flu@home study authors attribute the 14% sensitivity they found for the QuickVue in part to delays in testing after illness onset: nearly 80% of RDTs were used four or more days after illness onset. The other two studies differ in that they utilized an antigen-based RDT with fluorescent detection that was designed specifically for home use. These studies report moderate sensitivity (61% and 72.7%) and specificity (95% and 96.2%) of the RDT. While one of them used a regular symptom survey to prompt testing, the survey was administered weekly, not daily.
Strengths, limitations, and future directions. Our study design had several strengths. Using prospective recruitment, we were able to preposition test kits in a large national sample prior to the onset of influenza season and used daily symptom surveys over a 4-month period to trigger use of the at-home RDT and reference swabs. This design facilitated testing within just hours of when trigger symptoms were reported, as well as collection of precise time points throughout the study protocol. While our study was conducted at the outset of the COVID-19 pandemic and mitigation efforts could have impacted the prevalence of influenza, the overall influenza burden for the 2019-2020 season was similar to other years (55), and there was a high proportion of individuals with influenza among the participants we tested.
We are aware of several limitations. First, our study population was younger and less diverse in terms of education, income, and race/ethnicity than the U.S. population overall, likely due to the inclusion criterion that participants own and use a Fitbit and possibly due to the targeted recruitment of individuals who engage with an online health platform. While this potentially limits the generalizability of the results, the population was composed of active users of digital health tools, who are likely representative of early adopters of home-health technology such as an at-home RDT. Future studies should use broader recruitment criteria that do not require use of Fitbits and include more generalizable populations. While we did not conduct a cost-effectiveness analysis, only one-fifth of the tests sent to participants were used, which amounts to a non-negligible sunk cost in materials and distribution fees. The approach may still be feasible for test developers seeking to validate an at-home diagnostic, and future researchers may be able to devise methods for participants to return unused tests to recover material costs. Left as-is, the expense of the study may pose a barrier to widespread implementation of this model.
Second, we achieved a high study completion rate, but there were a number of protocol deviations that could have been avoided. A small number of samples experienced temperatures in excess of 25°C, outside the recommended range of the UVT used in this study. While samples are likely to tolerate freezing excursions, overheating events may impact sample stability. In our case, all "overheated" samples were negative by PCR and by index test, and the number (12) was small enough that at most one false-negative result was likely (given an overall FN rate of 7%); however, a higher proportion of heat excursions could have negatively impacted results. Future studies should take into consideration the climate and season during which samples will be shipped or consider using dry swab collection instead of UVT. Dry swabs have been shown to be stable through high and low temperature excursions for SARS-CoV-2 (56, 57), and future studies may show similar stability for influenza. In addition, 51 participants did not return a reference swab or returned a swab that was unable to be tested. Future studies could address this drop-off by conducting reference sample collection before the index test, and by linking compensation to the confirmed scanning of a shipping label or reception by the post office or courier service.
Third, this study opted to use the QuickVue test, a "subjective read" RDT (srRDT) that requires visual interpretation of the test line. Despite the app-based step-by-step instructions for test interpretation, including photo examples of negative, positive, and invalid test results, participants still made errors when interpreting the RDT at a higher rate than experts, confirming the importance of photos to validate index test results when using a srRDT. Furthermore, some participants failed to upload a picture or uploaded one that was too blurry or dark to be used, resulting in a loss of data. To help mitigate the loss of photos, future iterations of an app could consider automatic flashes, better auto-focusing, local storage of the photo file in the absence of a cellular or Wi-Fi connection, and study kits that include a stand to hold the phone steady and at the appropriate distance. The failure of some participants to correctly interpret this srRDT suggests that other types of digital aids, such as computer vision (CV) assisted interpretation of the RDT, ought to be developed, in addition to the step-by-step instructions for conducting the RDT.
Alternatively, some at-home diagnostics do not require subjective interpretation of the results. Digital RDTs (dRDTs) use electronics in the test cassette or an external instrument to calculate a qualitative positive/negative/invalid result from a chromatographic or fluorescent signal and tend to be more sensitive than srRDTs (58,59). Home-based molecular RDTs, such as those that have been approved by the FDA for detection of SARS-CoV-2 (11,12), appear to be highly sensitive and also do not require subjective interpretation of a test result. Both dRDTs designed for at-home use, such as those used by Geyer et al. and Heimonen et al. (53,54), and molecular RDTs thus present a good option to improve the diagnostic utility of at-home testing. However, subjective read RDTs are considerably less expensive than either of these alternatives and have been widely used during the COVID-19 pandemic for at-home diagnosis. It is crucial to understand the strengths and limitations of this type of assay, particularly when it comes to interpretation of test lines and improving sensitivity. A human-centered assessment of subjective read RDT design could effectively pinpoint weaknesses when the test is operated by a lay user and could inform future design strategies for at-home RDTs (60,61).
Lastly, we selected only one set of trigger criteria, comprising symptoms widely cited as defining influenza-like illness (ILI). Our results suggest that using earlier signals of illness onset may enable testing when viral load is highest, improving test sensitivity. Future studies should therefore employ other individual test triggers or combinations of test triggers to better understand their performance as trigger criteria and their relation to periods of high viral load. Such criteria could include less-specific symptoms, input from wearable devices, and participant-reported illness experience, such as impact on activities and rapid onset of illness.
Implications for patients, policymakers, clinicians, and researchers. Our findings demonstrate that individuals are capable of collecting a nasal sample and performing and interpreting an RDT without clinical oversight when guided by instructions and process controls in a companion app. Enabling easy access to RDTs for RVIs in at-home environments could boost public health efforts by providing individuals with information they can use early in their illness to make the best decisions for their care, including infection control efforts to reduce transmission. Despite the younger and potentially more technologically aware study population and the use of app-guided interpretation of the RDT, complete with photo examples of positive, negative, and invalid results, our study did identify a minority of participants (n = 36) who misinterpreted the test result, which could lead to incorrect decision making. The impact of these errors, especially "wrong line" errors, could be compounded by the development of at-home RDTs multiplexed to detect influenza and SARS-CoV-2 (62), which could have more lines and may be more complex to interpret. In these cases, digital aids for test interpretation, such as automated CV and artificial intelligence (AI)-driven interpretation of test strips, could provide important adjuncts to support safe use and appropriate follow-up care.
While our study deployed an RDT not approved or authorized for at-home settings, our results suggest that tests of this level of complexity can be completed successfully by individuals when accompanied by digital guidance. Device regulators and clinicians should determine whether additional criteria are needed before recommending use of a test based on its known sensitivity, and whether to impose more stringent criteria or serial testing for less sensitive tests (63). Furthermore, given the possible variation in symptoms associated with different strains of influenza, other RVIs, and the presence of comorbidities (64-66), a symptom-based approach to triggering self-tests may be inherently limited. Researchers may leverage study designs similar to ours to generate more evidence for the use of physiological markers from wearable sensors or participant-reported factors in guiding at-home test use (47).
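The potential benefit of serial testing for a low-sensitivity test can be sketched with a simple probability calculation. This is an illustrative upper bound only: it assumes repeat tests in an infected person fail independently, which real-world repeats (same user, same viral load, same test design) are likely to violate.

```python
def serial_sensitivity(per_test_sensitivity, n_tests):
    """Probability that at least one of n_tests repeated tests is positive
    in a truly infected person, assuming (optimistically) that each test
    fails independently with probability 1 - per_test_sensitivity."""
    return 1 - (1 - per_test_sensitivity) ** n_tests

# With the 28% influenza A sensitivity observed in this study:
print(round(serial_sensitivity(0.28, 2), 3))  # 0.482
print(round(serial_sensitivity(0.28, 3), 3))  # 0.627
```

Even under this optimistic independence assumption, two or three repeats of a 28%-sensitive test would still miss a substantial fraction of infections, underscoring why serial-testing recommendations must be weighed against the underlying test performance.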
By prepositioning app-supported influenza at-home tests at the start of the study and thus empowering individuals to test shortly after the onset of target symptoms, our study provides support for a form of self-testing that could potentially apply not only to influenza but also to other RVIs. While our findings do not support the use of the QuickVue Influenza A+B RDT for at-home influenza testing, antigen-based immunochromatographic assays remain the most effective means to get diagnostics into at-home settings and are playing a crucial role in the COVID-19 pandemic (67, 68); thus, all potential avenues to improve their performance should continue to be explored. Our results should prompt the exploration of more sophisticated triggers indicating the onset of infection, such as resting heart rate or SpO2, which the participant may not be aware of but which can be detected by wearable sensors. Our results should also encourage research and innovation in digital aids that help participants execute testing procedures, and our study design and lessons learned may serve as a template for future research on the performance of at-home test use, both for individual care and for public health surveillance related to influenza and other RVIs.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. SUPPLEMENTAL FILE 1, PDF file, 1.1 MB.

ACKNOWLEDGMENTS
Audere's participation in this work was supported, in whole or in part, by the Bill & Melinda Gates Foundation (#1204286) (Seattle, WA, USA, http://www.gatesfoundation.org/). The University of Washington received grants from Audere for this study. Evidation's participation was supported, in whole or in part, with Federal funds from the Department