Diagnostic performance of COVID‐19 serological assays during early infection: A systematic review and meta‐analysis of 11 516 samples

Abstract
Objective: The use of coronavirus disease 2019 (COVID‐19) serological testing to diagnose acute infection or determine population seroprevalence relies on understanding assay accuracy during early infection. We aimed to evaluate the diagnostic performance of serological testing in COVID‐19 by providing summary sensitivity and specificity estimates with time from symptom onset.
Methods: A systematic search of Ovid MEDLINE, Embase, Cochrane Central Register of Controlled Trials (CENTRAL) and PubMed was performed up to May 13, 2020. All English language, original peer‐reviewed publications reporting the diagnostic performance of serological testing vis‐à‐vis virologically confirmed SARS‐CoV‐2 infection were included.
Results: Our search yielded 599 unique publications. A total of 39 publications reporting 11 516 samples from 8872 human participants met eligibility criteria for inclusion in our study. Pooled percentages of IgM and IgG seroconversion by Day 7, 14, 21, 28 and after Day 28 were 37.5%, 73.3%, 81.3%, 72.3% and 73.3%, and 35.4%, 80.6%, 93.3%, 84.4% and 98.9%, respectively. By Day 21, summary estimate of IgM sensitivity was 0.872 (95% CI: 0.784‐0.928) and specificity 0.973 (95% CI: 0.938‐0.988), while IgG sensitivity was 0.913 (95% CI: 0.823‐0.959) and specificity 0.960 (95% CI: 0.919‐0.980). On meta‐regression, IgM and IgG test accuracy was significantly higher at Day 14 using enzyme‐linked immunosorbent assay (ELISA) compared to other methods.
Conclusions: Serological assays offer imperfect sensitivity for the diagnosis of acute SARS‐CoV‐2 infection. Estimates of population seroprevalence during or shortly after an outbreak will need to adjust for the delay between infection, symptom onset and seroconversion.


| INTRODUCTION
On March 12, 2020, the World Health Organisation (WHO) declared the outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and associated coronavirus disease 2019 (COVID-19) as a pandemic. Unprecedented control measures have been put in place in an effort to reduce transmission, which at their peak are estimated to have covered a third of the global population. [1][2][3] Despite these measures, by July 15, 2020, 13.5 million confirmed cases had been reported worldwide with 581 221 deaths, a crude case fatality rate of 4.3%. 4 However, the true number of infections (and deaths) is likely to be substantially higher due to the large proportion of infections that are undiagnosed because of atypical, mild or absent symptoms, or unconfirmed because testing was not available. [5][6][7][8]
Serological assays have the potential to play an important role in the surveillance of COVID-19. The results of early seroprevalence studies have indicated that during the first wave of the COVID-19 pandemic, up to 10% of the population of Wuhan, China may have been infected, and up to 33% in cities of other countries that have experienced large outbreaks. 9,10 Serology can also be used for the diagnosis of acute infection and can form an important tool in containment strategies by identifying and linking clusters of infection. 11 If a serological immune correlate of protection can be found, these assays may also form part of exit strategies from control measures. 12 While serological assays have been reported to have high sensitivity and specificity for SARS-CoV-2 infection, this reflects diagnostic performance during convalescence. 13 Understanding antibody kinetics during early SARS-CoV-2 infection is critical for assessing the accuracy of diagnostic serological results and for interpreting the results of seroprevalence studies.
To address this issue, we conducted a systematic review and meta-analysis to evaluate the diagnostic performance of serological assays in early COVID-19, when compared to polymerase chain reaction (PCR) as the gold standard.

| MATERIALS AND METHODS
This review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. 14

| Search strategy and study selection
A search string comprising synonyms of "COVID-19" and "serological assays" was applied to the following databases: Ovid MEDLINE, Embase, Cochrane Central Register of Controlled Trials (CENTRAL) and PubMed from November 1, 2019 to May 13, 2020 (Table S1). Studies were screened independently by two reviewers (JJYZ and KSL) with disagreements resolved by consensus or appeal to a third senior reviewer (BEY). Agreement between the reviewers on study inclusion was evaluated using Cohen's κ. 15 All English language, original peer-reviewed publications reporting the diagnostic performance of serological testing in comparison with virologically confirmed SARS-CoV-2 infection were included.
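Cohen's κ compares the observed agreement between two reviewers with the agreement expected by chance from each reviewer's marginal decision rates. The following Python sketch illustrates the calculation; the function name and the screening decisions are hypothetical and are not data from this review.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical include/exclude decisions for 10 abstracts (illustration only).
reviewer_1 = ["include", "include", "exclude", "exclude", "exclude",
              "include", "exclude", "exclude", "include", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "exclude",
              "include", "exclude", "exclude", "include", "exclude"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 3))
```

In this made-up example the reviewers agree on 9 of 10 abstracts, while chance agreement from the marginals is 0.54, so κ is (0.90 - 0.54) / (1 - 0.54).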
Specific inclusion and exclusion criteria are outlined in Table S2.

| Risk of bias assessment
The quality of included studies was assessed using QUADAS-2 (Table S3 and Figure S1). 16 In summary, the QUADAS-2 tool consists of four key domains covering patient selection, index test, reference standard, and flow and timing (the flow of patients through the study and the timing of the index tests and reference standard). Two researchers (KSL and CWO) assessed the quality of all included studies and discussed discrepancies until consensus was reached.

| Data extraction and outcome measures
Data were extracted on the following variables: study, sample and patient details, method of diagnosis, type of blood sample and immunoassay, commercial name of test kit, cut-off values

| Statistical analysis
Random effects models were used for meta-analyses of variables and end points. 17 Pooled proportions were computed with the inverse variance method using the variance-stabilizing Freeman-Tukey double arcsine transformation. 18 Confidence intervals (CI) for individual studies were calculated using the Wilson Score confidence interval method with continuity correction. The I² statistic was used to present between-study heterogeneity, where I² ≤ 30%, between 30% and 50%, between 50% and 75%, and ≥75% were considered to indicate low, moderate, substantial and considerable heterogeneity, respectively. 19 P values for the I² statistic were derived from the chi-square distribution of Cochran Q test. For pooling of means of numerical variables, we computed missing means and standard deviations (SDs) from medians, ranges and interquartile ranges using the methods proposed by Hozo with additional likelihood-ratio tests was performed to evaluate for any significant effects of covariates. In addition, univariate meta-analysis was also done using random effect estimation with the DerSimonian-Laird method to produce pooled diagnostic odds ratios. 24 Publication bias was assessed using funnel plots and Egger's regression test, based on a weighted linear regression of the treatment effect on its standard error. 25,26 All statistical analyses were performed using R software version 3.4.3, with the packages meta and mada. 27,28 P values less than .05 were considered statistically significant. Table S4.
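The pooling of proportions and the heterogeneity bands described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' R code: the function names and study counts are hypothetical, the DerSimonian-Laird estimator is used as one common choice of between-study variance (the text names it only for the pooled diagnostic odds ratios), the back-transformation is the simple sin² approximation rather than an exact inverse, and an I² of exactly 50% is placed in the "substantial" band by choice.

```python
import math


def freeman_tukey(events, total):
    """Freeman-Tukey double arcsine transform of events/total and its variance."""
    t = 0.5 * (math.asin(math.sqrt(events / (total + 1)))
               + math.asin(math.sqrt((events + 1) / (total + 1))))
    return t, 1.0 / (4 * total + 2)


def pool_proportions(studies):
    """Random-effects pooling of (events, total) pairs on the transformed scale."""
    transformed = [freeman_tukey(x, n) for x, n in studies]
    y = [t for t, _ in transformed]
    v = [var for _, var in transformed]
    w = [1.0 / var for var in v]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(studies) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # DerSimonian-Laird between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (var + tau2) for var in v]  # random-effects weights
    pooled_t = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Approximate back-transformation to the proportion scale.
    return {"pooled": math.sin(pooled_t) ** 2, "tau2": tau2, "Q": q, "I2": i2}


def heterogeneity_label(i2):
    """Map I² (in percent) to the bands used in the text."""
    if i2 <= 30:
        return "low"
    if i2 < 50:
        return "moderate"
    if i2 < 75:
        return "substantial"
    return "considerable"


# Hypothetical (seroconverted, tested) counts for four studies.
example_studies = [(30, 80), (45, 100), (12, 50), (60, 90)]
result = pool_proportions(example_studies)
print(result, heterogeneity_label(result["I2"]))
```

Pooling is carried out on the transformed scale because the variance of the transformed proportion depends only on the sample size, which keeps the inverse-variance weights stable for proportions near 0 or 1.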

| Study characteristics
The type of immunoassay used was specified in 47 studies.

| Quality assessment with QUADAS-2
Among the 39 publications, the proportions of studies with low, high and unclear risk of bias and concerns regarding applicability are summarized in Table S3 and Figure S1. More than half of the studies were found to be at high risk of bias under the domains "Patient Selection" and "Index Test." The main causes of high risk of bias were non-cohort study designs and failure to report cut-off values for the respective serological assays.

| Characteristics of patients and controls
Of the 11 516 samples analysed in total, 5743 were taken from laboratory-diagnosed COVID-19 patients, 5265 were from healthy controls and 508 were from patients infected with human coronaviruses (229E, HKU1, NL63, OC43 and SARS-CoV (using convalescent samples)), influenza A and B, and other respiratory pathogens.   (Table 1 and Figure 1). No evidence of publication bias for IgM seroconversion was identified on Egger's regression test (P = .69; Figure S4).

| IgM Seroconversion
Subgroup meta-analysis identified no significant difference in IgM seroconversion rates between the various types of immunoassay (P = .44; Figure S5).  Figure 1). No evidence of publication bias for IgG seroconversion was identified on Egger's regression test (P = .70; Figure S4). Subgroup meta-analysis identified no significant difference in IgG seroconversion rates between the various types of immunoassay (P = .56; Figure S7).  Table S5. On meta-regression, ELISA was found to have higher specificity values than CLIA (P = .011), with similar sensitivity values (P = .67).

| Diagnostic accuracy of IgM testing
The likelihood-ratio test further suggested a significant difference in test accuracy values with CLIA versus ELISA (χ² = 6.62, P = .036).
ELISA also demonstrated higher specificity values than ICA (P = .021), with comparable sensitivity values (P = .51). On further comparison using the likelihood-ratio test, however, the difference in diagnostic accuracy between the two tests was only close to statistical significance (P = .072).
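As a rough check on how a likelihood-ratio statistic maps to a P value, the sketch below evaluates the upper tail of a chi-square distribution for the reported CLIA versus ELISA statistic of 6.62. The degrees of freedom are not stated in the text; 2 is assumed here, on the reasoning that a single assay-type covariate in a bivariate model adds one effect on sensitivity and one on specificity. The function name is hypothetical.

```python
import math


def chi2_sf_2df(statistic):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom.

    For 2 degrees of freedom the survival function has the closed form exp(-x/2).
    """
    if statistic < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.exp(-statistic / 2.0)


# Reported likelihood-ratio statistic; 2 degrees of freedom is an assumption.
p_value = chi2_sf_2df(6.62)
print(f"{p_value:.4f}")  # prints 0.0365
```

Under that assumption the result agrees with the reported P = .036 to within rounding of the test statistic.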

| DISCUSSION
Our meta-analysis indicated that the overall sensitivity and specific-  while cross-reactivity with other human coronaviruses was not evident from the diagnostic performance of assays included in our review, this may become an issue with less specific but more sensitive testing.
An additional question is how these tests will perform in the

| CONCLUSIONS
Our study demonstrated that the utility of serological assays is limited to IgG more than 14-21 days after symptom onset, when most infected individuals have seroconverted and high sensitivity and specificity values are attained. Diagnostic testing is hence likely to continue to require virological confirmation such as PCR, while seroprevalence studies will significantly underestimate the proportion of a population infected if conducted too early in the epidemic curve.
Longer-term studies would be beneficial to improve our understanding of the diagnostic performance of serological testing for COVID-19 in other epidemiological contexts such as endemic infection or future epidemics.
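One standard way to correct an observed seropositive fraction for imperfect test accuracy is the Rogan-Gladen estimator, which is not a method used in this review and is shown here only to illustrate the size of the adjustment. The sketch below uses the Day 21 IgG summary estimates from this study (sensitivity 0.913, specificity 0.960); the observed seropositive fraction of 10% and the lower sensitivity of 0.50, standing in for sampling early after infection, are hypothetical.

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Adjust an observed seropositive fraction for test sensitivity and specificity."""
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (apparent_prevalence + specificity - 1.0) / denominator
    # Clip to the valid range, since sampling noise can push the estimate outside it.
    return min(1.0, max(0.0, estimate))


observed = 0.10  # hypothetical seropositive fraction in a survey

# Day 21 IgG summary estimates reported in this review.
adjusted_day21 = rogan_gladen(observed, sensitivity=0.913, specificity=0.960)

# Hypothetical lower sensitivity, as might apply when sampling soon after infection.
adjusted_early = rogan_gladen(observed, sensitivity=0.50, specificity=0.960)

print(round(adjusted_day21, 3), round(adjusted_early, 3))
```

With the same observed fraction, the lower assumed sensitivity yields a larger adjusted estimate, which is the sense in which surveys conducted before most infected individuals have seroconverted understate the proportion infected.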

CONFLICT OF INTEREST
The authors declare no competing interests.

PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1111/irv.12841.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available in the