On the True Number of COVID-19 Infections: Effect of Sensitivity, Specificity and Number of Tests on Prevalence Ratio Estimation

In this paper, a formula for estimating the prevalence ratio of a disease in a population that is tested with imperfect tests is given. The formula is in terms of the fraction of positive test results and test parameters, i.e., probability of true positives (sensitivity) and the probability of true negatives (specificity). The motivation of this work arises in the context of the COVID-19 pandemic in which estimating the number of infected individuals depends on the sensitivity and specificity of the tests. In this context, it is shown that approximating the prevalence ratio by the ratio between the number of positive tests and the total number of tested individuals leads to dramatically high estimation errors, and thus, unadapted public health policies. The relevance of estimating the prevalence ratio using the formula presented in this work is that precision increases with the number of tests. Two conclusions are drawn from this work. First, in order to ensure that a reliable estimation is achieved with a finite number of tests, testing campaigns must be implemented with tests for which the sum of the sensitivity and the specificity is sufficiently different than one. Second, the key parameter for reducing the estimation error is the number of tests. For a large number of tests, as long as the sum of the sensitivity and specificity is different than one, the exact values of these parameters have very little impact on the estimation error.


Introduction
In the absence of a vaccine or effective medical treatment against SARS-CoV-2, the global population must coexist with the virus. To succeed in this task, different strategies to slow down the outbreak can be implemented, for example, encouraging social distancing, isolation of infected individuals, mobility restrictions, lockdowns, and contact tracing. The main objective is to guarantee that the number of infected individuals that develop critical forms of symptoms does not exceed the capacity of local health care systems. Nonetheless, most of the strategies to slow down the outbreak induce dramatic economic consequences, and thus, public health policies must be designed based on reliable predictions of the evolution of the pandemic to minimize undesired effects on the global economy. To do so, estimating the values of variables such as the proportion of susceptible, infected and recovered individuals in the population, among other variables, is of paramount importance. This is because such variables are the inputs of mathematical models that help to predict the evolution of the pandemic [1,2], and thus, impact public health policy-making. Reliable estimations of these variables can be achieved in part by testing the population. Nonetheless, diagnosing SARS-CoV-2 is a challenging task given that designing highly reliable tests for massive testing is still an open research problem, c.f., [3][4][5].
In the general realm of epidemiology, the reliability of tests is measured in terms of two parameters: sensitivity and specificity. The former is the probability with which a test is able to correctly identify the presence of a condition, for example, a SARS-Cov-2 infection. Alternatively, the latter is the probability with which a test is able to correctly identify the absence of such condition. Within this context, the main contribution of this work is a mathematical formula for estimating the fraction of individuals that exhibit the condition in a population in which every individual has been tested once with identical unreliable tests. In the following, this fraction is referred to as the prevalence ratio [2]. In these terms, the main result is Theorem 1 in Section 4, which presents an estimator of the prevalence ratio in terms of the sensitivity, specificity and the fraction of positive test results. More importantly, the estimation error induced by such estimator is proved to decrease with the number of tests.
The novelty of this work with respect to existing methods for estimating the prevalence ratio, such as the method of multipliers, capture and recapture methods, among others [2,6], is that it takes into account the effects of both false positive and false negative probabilities. This consideration has already been discussed by several authors, c.f., [7][8][9][10]. Nonetheless, a simple general formula for estimating prevalence ratios in terms of the sensitivity, specificity, and the fraction of positive test results is not available in current literature. This said, the prevalence ratio estimation presented in this work is based exclusively on data obtained through testing campaigns with unreliable binary tests. The main hypotheses adopted in this work are: (a) Individuals are tested once and test results are independent of each other; and (b) the prevalence ratio is assumed constant for the duration of the testing campaign. This breaks away from the studies based on mathematical regressions in which some assumptions on the probability distribution of the random variables are adopted and whose correctness is often the subject of lively discussion, c.f., [11][12][13][14].
The main conclusions of this work are: (i) The number of positive tests might be drastically different than the number of infected individuals in a population depending on the sensitivity and specificity of the tests. Hence, the ratio between the number of positive tests and the total number of tested individuals is not a reliable estimation of the prevalence ratio; (ii) Testing campaigns using tests for which the sum of the sensitivity and specificity is different than one, always allow a reliable estimation of the number of infected individuals when a sufficiently large number of individuals is tested in the population (Lemma 1 in Section 4); (iii) Testing campaigns using a test for which the sum of the sensitivity and the specificity is equal to one, lead to data from which it is impossible to estimate the prevalence ratio independently of the number of tested individuals (Lemma 7 in Section 4); and (iv) When the objective is to estimate the prevalence ratio in a population, the key parameter for reducing the estimation error is the number of tests (Lemma 5 in Section 4). That is, as long as the sum of the sensitivity and specificity is different than one, and a large number of test results is available, the exact values of both sensitivity and specificity have very little impact on the estimation error.
The remaining sections of this paper are organized as follows: Section 2 presents a brief overview of the tests for diagnosing SARS-CoV-2 and the reliability of the existing tests; Section 3 formulates the problem of estimating the prevalence ratio taking into account the sensitivity and specificity of the tests; Section 4 presents an estimator of the prevalence ratio using data obtained from unreliable tests, and the proofs of the main results; Section 5 introduces some examples in which the impact of the sensitivity, specificity and number of tests on the estimation error is numerically analyzed; Section 6 concludes this work.

Case Study: SARS-CoV-2
Tests for SARS-CoV-2 can be broadly divided into three groups: virological tests, serological tests, and tests based on medical imaging. Each of these groups provides information about different aspects of the infection and exhibits different reliability parameters.

Virological Tests
Virological tests inform about the presence of the SARS-CoV-2 virus genome in nasopharyngeal (nasal swab) or oropharyngeal swabs (oral swab), blood, anal swab, urine, stool, and sputum samples [15]. Individuals with positive virological tests are declared capable of contaminating others, and thus, virological tests are central in decision-making and policy-making, c.f. [3,5].
The reliability of virological tests in terms of sensitivity and specificity depends on a variety of parameters. These parameters include the type of clinical specimen, the materials and methods used for obtaining the specimens, specimen transportation, viral density of patients, and human errors in data processing in laboratories. In the case of respiratory specimens, viral density appears to play a central role in the sensitivity and specificity of virological tests, c.f., [16,17]. This stems from the fact that during the first week after infection, the virus can be detected by nasopharyngeal or oropharyngeal swabs. During the second week and later, the virus might disappear from the upper parts of the respiratory system and migrate to the bronchial tubes and the lungs. From the studies in [16,17], it appears that specimens from the lower respiratory tract increase the sensitivity and specificity of virological tests.

Serological Tests
Serological tests determine whether an individual has developed antibodies or antigens against the SARS-CoV-2 virus. Nonetheless, an individual produces antibodies against SARS-CoV-2 only several days after contracting the infection. Typically, the time between infection and the production of antibodies ranges from seven to fourteen days, c.f., [24][25][26]. Serological tests are based on the enzyme-linked immunosorbent assay (ELISA) and exhibit high specificity and sensitivity from fourteen days after infection [24]. This drastically limits the use of serological tests in the early detection of the infection and policy-making, c.f., [3,4]. In a nutshell, on the one hand, a serological test answers the question of whether an individual is or has been infected. On the other hand, serological tests do not allow determining whether an individual has immunity to the SARS-CoV-2 virus or whether the individual is currently spreading the virus. Up to the day of publication of this paper, serological tests are not considered for massive testing in France, c.f., [4].

Medical Imaging
Medical imaging for detection of SARS-CoV-2 includes chest X-ray and chest computed tomography (CT) scans, which reveal ground-glass opacities and consolidations in the periphery of the lungs of infected individuals [27]. Nonetheless, the sensitivity and specificity of CT depend on the experience of radiologists in distinguishing SARS-CoV-2 pneumonia from non-SARS-CoV-2 pneumonia [28]. In [29], it is reported that the sensitivity of CT is better than the one achieved by RT-PCR tests.

Prevalence Ratio and Unreliable Tests
Consider a population subset of $n$ individuals whose state is either susceptible (S) or infected (I), and assume that all individuals of this population subset are tested with the same type of test. Let the actual state of these $n$ individuals be represented by the vector $x = (x_1, x_2, \ldots, x_n)$. That is, for all $t \in \{1, 2, \ldots, n\}$, $x_t \in \{I, S\}$ is the true state of individual $t$. The result of testing individual $t$ is denoted by $y_t \in \{I, S\}$. Hence, the outcome of a testing campaign over such a population is a vector $y = (y_1, y_2, \ldots, y_n) \in \{I, S\}^n$. Because tests have strictly positive probabilities of false negatives and false positives, the vectors $x$ and $y$ might differ. That is, some infected individuals could have been declared susceptible and vice versa.
A central observation in this analysis is that a test for determining whether an individual is infected with SARS-CoV-2 can be modeled by a random transformation $P_{Y|X}$ for which the input and output sets are $\{I, S\}$. More specifically, if an individual whose state is $x \in \{I, S\}$ is tested, the result $y \in \{I, S\}$ is observed with probability $P_{Y|X}(y|x)$. Figure 1 shows this binary-input binary-output model. Using this notation, the sensitivity of the test is $P_{Y|X}(I|I)$, and the specificity of the test is $P_{Y|X}(S|S)$. The probability of a false positive is $P_{Y|X}(I|S) = 1 - P_{Y|X}(S|S)$, and the probability of a false negative is $P_{Y|X}(S|I) = 1 - P_{Y|X}(I|I)$. This said, a test is fully described by any of the following pairs of parameters:
• The sensitivity and the specificity;
• The sensitivity and the probability of a false positive;
• The probability of a false negative and the specificity; or
• The probability of a false negative and the probability of a false positive.
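As a minimal illustration of this binary-input binary-output model, the following Python sketch represents a test by its sensitivity and specificity and draws a single test result. The function names `make_test` and `apply_test`, and the numerical values, are illustrative assumptions and are not taken from the original work.

```python
import random


def make_test(sensitivity, specificity):
    """Return P_{Y|X} as a nested dict: test[x][y] = P_{Y|X}(y|x)."""
    return {
        "I": {"I": sensitivity, "S": 1.0 - sensitivity},  # true positive, false negative
        "S": {"I": 1.0 - specificity, "S": specificity},  # false positive, true negative
    }


def apply_test(test, true_state, rng=random):
    """Draw one test result y in {'I', 'S'} for an individual whose state is true_state."""
    return "I" if rng.random() < test[true_state]["I"] else "S"


# Hypothetical test with 70% sensitivity and 90% specificity.
test = make_test(sensitivity=0.7, specificity=0.9)
print(apply_test(test, "I"), apply_test(test, "S"))
```

Any of the four parameter pairs listed above could be used to build the same table, since each row sums to one.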
Let $X$ be a random variable taking values in $\{I, S\}$ and denote by $P_X : \{I, S\} \to [0, 1]$ its probability distribution such that $P_X(I)$ is the actual fraction of infected individuals among the $n$ individuals. That is, $P_X(I)$ is the prevalence ratio of SARS-CoV-2 in this population subset. For this reason, the probability distribution $P_X$ is referred to as the ground-truth input probability distribution. Let $Y$ be a second random variable taking values in $\{I, S\}$ such that its joint probability distribution with $X$ is $P_{XY}$ and for all $(x, y) \in \{I, S\}^2$,

$$P_{XY}(x, y) = P_X(x) \, P_{Y|X}(y|x), \tag{1}$$

where the conditional distribution $P_{Y|X}$ is the test. See, for instance, Figure 1. Often, the probability distribution $P_Y$ is referred to as the ground-truth output probability distribution and it is obtained as the marginal of $P_{XY}$. That is, for all $y \in \{I, S\}$,

$$P_Y(y) = \sum_{x \in \{I, S\}} P_{XY}(x, y). \tag{2}$$

The problem consists in using the data $y$ obtained through a testing campaign with tests whose parameters are modeled by $P_{Y|X}$ to determine the fraction $P_X(I)$ of infected individuals in the population, i.e., the prevalence ratio. More formally, the problem can be stated as follows: Consider two random variables $X$ and $Y$ with the joint probability distribution $P_{XY}$ in (1). The problem consists in estimating the probability distribution $P_X$ based only on $n$ realizations $y_1, y_2, \ldots, y_n$ of the random variable $Y$, with $n$ a finite integer. This problem is reminiscent of the problem of population recovery introduced in [30] and further studied in [31,32].
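The marginalization that yields the ground-truth probability of a positive result can be sketched numerically as follows. The function name `positive_probability` and the numerical values are illustrative assumptions.

```python
def positive_probability(prevalence, sensitivity, specificity):
    """P_Y(I) = P_X(I) P_{Y|X}(I|I) + P_X(S) P_{Y|X}(I|S)."""
    false_positive = 1.0 - specificity
    return prevalence * sensitivity + (1.0 - prevalence) * false_positive


# Hypothetical values: prevalence 0.4, sensitivity 0.7, specificity 0.9.
p_y = positive_probability(prevalence=0.4, sensitivity=0.7, specificity=0.9)
print(round(p_y, 6))  # 0.34
```

In this hypothetical case, the probability of a positive result (0.34) differs from the prevalence ratio (0.4) even before any sampling noise is considered.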

Estimation of the Prevalence Ratio Using Unreliable Tests
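Given the data $y \in \{I, S\}^n$ collected during a test campaign, the fractions of the population reporting positive and negative tests form an empirical distribution denoted by $\bar{P}^{(n)}_Y$. That is, $\bar{P}^{(n)}_Y$ is a counting probability measure for which the values $\bar{P}^{(n)}_Y(I)$ and $\bar{P}^{(n)}_Y(S)$ are

$$\bar{P}^{(n)}_Y(I) = \frac{1}{n} \sum_{t=1}^{n} \mathbb{1}_{\{y_t = I\}} \quad \text{and} \quad \bar{P}^{(n)}_Y(S) = \frac{1}{n} \sum_{t=1}^{n} \mathbb{1}_{\{y_t = S\}}. \tag{3}$$

In the following, such probability measure is often referred to as the output empirical distribution obtained from the data $y$.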

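Let $\hat{P}^{(n)}_X : \{I, S\} \to \mathbb{R}$ be a function representing the estimation of $P_X$ based on the data $y$.
The error induced by estimating $P_X$ using $\hat{P}^{(n)}_X$ can be measured by the total variation, which is denoted by $\| P_X - \hat{P}^{(n)}_X \|_{\mathrm{TV}}$ and satisfies,

$$\| P_X - \hat{P}^{(n)}_X \|_{\mathrm{TV}} = \frac{1}{2} \sum_{x \in \{I, S\}} \left| P_X(x) - \hat{P}^{(n)}_X(x) \right|. \tag{4}$$

Note that in the case of binary tests, the total variation is simply the absolute difference between the actual prevalence ratio $P_X(I)$ and the estimate $\hat{P}^{(n)}_X(I)$.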
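The output empirical distribution and the total variation can be computed directly from a list of test results. The following is a minimal sketch in which the function names and the toy data are assumptions made for illustration.

```python
def empirical_distribution(results):
    """Fractions of positive ('I') and negative ('S') results in the data y."""
    n = len(results)
    positives = sum(1 for y in results if y == "I")
    return {"I": positives / n, "S": (n - positives) / n}


def total_variation(p, q):
    """Total variation between two distributions on {'I', 'S'}."""
    return 0.5 * sum(abs(p[s] - q[s]) for s in ("I", "S"))


# Toy data: five test results, two of them positive.
y = ["I", "S", "S", "I", "S"]
p_bar = empirical_distribution(y)
print(p_bar["I"])  # 0.4
```

For two distributions on a binary set, `total_variation` reduces to the absolute difference of the probabilities assigned to `'I'`.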

Main Result
The following theorem presents the main result of this work.
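Theorem 1. Consider a population of $n$ individuals whose true ratios of infected (I) and susceptible (S) individuals are $P_X(I)$ and $P_X(S) = 1 - P_X(I)$, respectively, with $P_X(I) \in [0, 1]$. Assume that all individuals of such a population are tested with a test $P_{Y|X}$ that satisfies

$$P_{Y|X}(I|I) + P_{Y|X}(S|S) \neq 1. \tag{6}$$

Let $\bar{P}^{(n)}_Y$ be the resulting output empirical probability distribution in (3) and assume that $\bar{P}^{(n)}_Y(I)$ satisfies the following condition,

$$\min\{P_{Y|X}(I|S), P_{Y|X}(I|I)\} \leq \bar{P}^{(n)}_Y(I) \leq \max\{P_{Y|X}(I|S), P_{Y|X}(I|I)\}. \tag{7}$$

Then, the estimator

$$\hat{P}^{(n)}_X(I) = \frac{\bar{P}^{(n)}_Y(I) + P_{Y|X}(S|S) - 1}{P_{Y|X}(I|I) + P_{Y|X}(S|S) - 1}, \text{ and} \tag{8a}$$

$$\hat{P}^{(n)}_X(S) = \frac{P_{Y|X}(I|I) - \bar{P}^{(n)}_Y(I)}{P_{Y|X}(I|I) + P_{Y|X}(S|S) - 1} \tag{8b}$$

forms a probability measure that satisfies

$$\lim_{n \to \infty} \| P_X - \hat{P}^{(n)}_X \|_{\mathrm{TV}} = 0. \tag{9}$$

In a nutshell, Theorem 1 states that approximating the prevalence ratio $P_X$ by $\hat{P}^{(n)}_X$ induces an error that vanishes when the number of tests $n$ increases. Nonetheless, the fraction of positive results $\bar{P}^{(n)}_Y(I)$ need not satisfy (7) when few test results are available. From Lemma 4, it follows that with a large number of test results, the fraction of positive results $\bar{P}^{(n)}_Y(I)$ satisfies the inequalities in (7). Note also that the condition in (7) is necessary and sufficient to observe that $0 \leq \hat{P}^{(n)}_X(I) \leq 1$ in Theorem 1. This highlights the need for a sufficiently large number of tests in order to obtain a valid estimation of $P_X(I)$ using Theorem 1.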
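The estimator in (8a) can be sketched in a few lines of Python. The function name `estimate_prevalence`, the numerical tolerance, and the choice to raise an error when condition (6) or (7) fails are assumptions made for this illustration.

```python
def estimate_prevalence(positive_fraction, sensitivity, specificity):
    """Estimate P_X(I) from the fraction of positive tests, as in (8a)."""
    denominator = sensitivity + specificity - 1.0
    # Condition (6): the sum of sensitivity and specificity must differ from one.
    if abs(denominator) < 1e-12:
        raise ValueError("sensitivity + specificity must differ from one")
    # Condition (7): the positive fraction must lie between the false-positive
    # probability and the sensitivity, otherwise the estimate leaves [0, 1].
    low, high = sorted((1.0 - specificity, sensitivity))
    if not low <= positive_fraction <= high:
        raise ValueError("fraction of positive tests violates condition (7)")
    return (positive_fraction + specificity - 1.0) / denominator


# Hypothetical campaign: 34% positive results, sensitivity 0.7, specificity 0.9.
print(round(estimate_prevalence(0.34, 0.7, 0.9), 6))  # 0.4
```

In this hypothetical case, using the raw fraction of positive tests (0.34) as the prevalence ratio would underestimate the value recovered by the estimator (0.4).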
Finally, note that the formulas in (8) are given in terms of the sensitivity $P_{Y|X}(I|I)$ and specificity $P_{Y|X}(S|S)$ of the test. Nonetheless, they can be expressed in terms of the probabilities of a false positive and a false negative, or any combination of the parameters describing the test. The following corollary shows the formulas in (8) in terms of the probabilities of a false positive $P_{Y|X}(I|S)$ and a false negative $P_{Y|X}(S|I)$.
Corollary 1. Consider a population of $n$ individuals whose true ratios of infected (I) and susceptible (S) individuals are $P_X(I)$ and $P_X(S) = 1 - P_X(I)$, respectively, with $P_X(I) \in [0, 1]$. Assume that all individuals of such a population are tested with a test $P_{Y|X}$ that satisfies (6). Let $\bar{P}^{(n)}_Y$ be the resulting output empirical probability distribution in (3) and assume that $\bar{P}^{(n)}_Y(I)$ satisfies condition (7). Then, the estimator

$$\hat{P}^{(n)}_X(I) = \frac{\bar{P}^{(n)}_Y(I) - P_{Y|X}(I|S)}{1 - P_{Y|X}(I|S) - P_{Y|X}(S|I)}, \text{ and} \tag{10a}$$

$$\hat{P}^{(n)}_X(S) = \frac{1 - P_{Y|X}(S|I) - \bar{P}^{(n)}_Y(I)}{1 - P_{Y|X}(I|S) - P_{Y|X}(S|I)} \tag{10b}$$

forms a probability measure that satisfies (9).

Proof of Theorem 1
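The proof of Theorem 1 leverages the following intuition: Under the assumption that $\bar{P}^{(n)}_Y$, which is obtained from the data $y$ as in (3), is a valid estimation of the ground-truth output probability distribution $P_Y$, i.e., it satisfies (7), then a distribution $\hat{P}^{(n)}_X$ that satisfies

$$\begin{pmatrix} \bar{P}^{(n)}_Y(I) \\ \bar{P}^{(n)}_Y(S) \end{pmatrix} = \begin{pmatrix} P_{Y|X}(I|I) & P_{Y|X}(I|S) \\ P_{Y|X}(S|I) & P_{Y|X}(S|S) \end{pmatrix} \begin{pmatrix} \hat{P}^{(n)}_X(I) \\ \hat{P}^{(n)}_X(S) \end{pmatrix} \tag{11}$$

is a good estimation of the input probability distribution $P_X$. This intuition builds upon the observation that the output distribution $\bar{P}^{(n)}_Y$ induced by the data must be the marginal of a joint distribution consisting of the product of the conditional $P_{Y|X}$ and the input distribution. That is, for all $y \in \{I, S\}$,

$$\bar{P}^{(n)}_Y(y) = \sum_{x \in \{I, S\}} P_{Y|X}(y|x) \, \hat{P}^{(n)}_X(x),$$

which is equivalent to the system in (11).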
With this intuition in mind, the proof proceeds as follows. First, it is shown that under the condition in (6), there exists a unique pair $(\hat{P}^{(n)}_X(I), \hat{P}^{(n)}_X(S))$ that satisfies the equality in (11). This is essentially due to the fact that the equality in (11) forms a linear system of two equations with two variables, and thus, if it is consistent, it has either a unique solution or infinitely many solutions.

Lemma 1. Consider the empirical output distribution $\bar{P}^{(n)}_Y$ in (3) obtained by a test described by the conditional probability distribution $P_{Y|X}$. Then, the following five statements are equivalent:
• The system of equations in (11) has a unique solution;
• The sensitivity $P_{Y|X}(I|I)$ and specificity $P_{Y|X}(S|S)$ satisfy
$$P_{Y|X}(I|I) + P_{Y|X}(S|S) \neq 1; \tag{12a}$$
• The sensitivity $P_{Y|X}(I|I)$ and the probability of a false positive $P_{Y|X}(I|S)$ satisfy
$$P_{Y|X}(I|I) \neq P_{Y|X}(I|S); \tag{12b}$$
• The probability of a false negative $P_{Y|X}(S|I)$ and the specificity $P_{Y|X}(S|S)$ satisfy
$$P_{Y|X}(S|S) \neq P_{Y|X}(S|I); \text{ and} \tag{12c}$$
• The probability of a false positive $P_{Y|X}(I|S)$ and the probability of a false negative $P_{Y|X}(S|I)$ satisfy
$$P_{Y|X}(I|S) + P_{Y|X}(S|I) \neq 1. \tag{12d}$$
Proof. The proof of Lemma 1 follows from the fact that a unique solution to (11) is observed if and only if the determinant of the matrix in (11) is different from zero (Rouché-Fontené theorem [33]). That is,

$$P_{Y|X}(I|I) \, P_{Y|X}(S|S) - P_{Y|X}(I|S) \, P_{Y|X}(S|I) \neq 0. \tag{13}$$

The proof is completed by verifying that the expression in (13) is equivalent to those in (12).
Note that all conditions in (12) are equivalent to each other, and thus, they are equivalent to the condition in (6).
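The verification mentioned in the proof of Lemma 1 can be written out explicitly. The following sketch assumes only the identities $P_{Y|X}(I|S) = 1 - P_{Y|X}(S|S)$ and $P_{Y|X}(S|I) = 1 - P_{Y|X}(I|I)$ stated in Section 3.

```latex
\begin{align*}
& P_{Y|X}(I|I) \, P_{Y|X}(S|S) - P_{Y|X}(I|S) \, P_{Y|X}(S|I) \\
&\quad = P_{Y|X}(I|I) \, P_{Y|X}(S|S) - \bigl(1 - P_{Y|X}(S|S)\bigr)\bigl(1 - P_{Y|X}(I|I)\bigr) \\
&\quad = P_{Y|X}(I|I) + P_{Y|X}(S|S) - 1 .
\end{align*}
```

Hence, the determinant is nonzero exactly when the sum of the sensitivity and the specificity differs from one.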
The proof of Theorem 1 continues by showing that when such a unique solution exists, it is identical to the one shown in (8).

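Lemma 2. Consider a test $P_{Y|X}$ that satisfies at least one of the conditions in (12). Then, under the assumption that the empirical output distribution $\bar{P}^{(n)}_Y$ in (3) satisfies (7), the unique probability distribution $\hat{P}^{(n)}_X$ that satisfies (11) is:

$$\hat{P}^{(n)}_X(I) = \frac{\bar{P}^{(n)}_Y(I) + P_{Y|X}(S|S) - 1}{P_{Y|X}(I|I) + P_{Y|X}(S|S) - 1}, \text{ and} \tag{14a}$$

$$\hat{P}^{(n)}_X(S) = \frac{P_{Y|X}(I|I) - \bar{P}^{(n)}_Y(I)}{P_{Y|X}(I|I) + P_{Y|X}(S|S) - 1}. \tag{14b}$$

Proof. The proof of Lemma 2 follows from solving the system of equations in (11) and observing that $\hat{P}^{(n)}_X$ is a probability measure if and only if condition (7) holds.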
The rest of the proof of Theorem 1 consists of showing that the error vanishes with the number of test results. This is shown in three steps. The first step consists of showing that the total variation between $P_X$ and $\hat{P}^{(n)}_X$, denoted by $\| P_X - \hat{P}^{(n)}_X \|_{\mathrm{TV}}$, is equivalent to the total variation between $P_Y$ and $\bar{P}^{(n)}_Y$, up to a scaling factor.

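Lemma 3. Consider a test $P_{Y|X}$ that satisfies at least one of the conditions in (12). Then, under the assumption that the empirical output distribution $\bar{P}^{(n)}_Y$ in (3) satisfies (7), it holds that

$$\| P_X - \hat{P}^{(n)}_X \|_{\mathrm{TV}} = \frac{1}{\left| 1 - P_{Y|X}(I|I) - P_{Y|X}(S|S) \right|} \, \| P_Y - \bar{P}^{(n)}_Y \|_{\mathrm{TV}},$$

where $P_X$ and $P_Y$ are the input and output probability distributions in (1) and (2), respectively.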
Proof. The proof of Lemma 3 follows from the definition of total variation in (4) and from equalities in (14).
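The step behind Lemma 3 can be sketched as follows, assuming that the ground-truth distributions satisfy $P_Y(I) = P_{Y|X}(I|I) P_X(I) + P_{Y|X}(I|S) P_X(S)$, as implied by (1) and (2), and using the same accent conventions for the empirical distribution and the estimate as elsewhere in this reconstruction.

```latex
\begin{align*}
P_X(I) &= \frac{P_Y(I) + P_{Y|X}(S|S) - 1}{P_{Y|X}(I|I) + P_{Y|X}(S|S) - 1}, \\
P_X(I) - \hat{P}^{(n)}_X(I) &= \frac{P_Y(I) - \bar{P}^{(n)}_Y(I)}{P_{Y|X}(I|I) + P_{Y|X}(S|S) - 1}, \\
\left| P_X(I) - \hat{P}^{(n)}_X(I) \right| &= \frac{\left| P_Y(I) - \bar{P}^{(n)}_Y(I) \right|}{\left| 1 - P_{Y|X}(I|I) - P_{Y|X}(S|S) \right|}.
\end{align*}
```

Since both distributions are binary, each absolute difference equals the corresponding total variation.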
Note that Lemma 3 proves the intuition on which the proof of Theorem 1 is based. That is, if $\bar{P}^{(n)}_Y$ is sufficiently close to $P_Y$, then $\hat{P}^{(n)}_X$ must be sufficiently close to $P_X$. The following lemma shows that the more test results are available, the closer $\bar{P}^{(n)}_Y$ and $P_Y$ are in total variation.

Lemma 4. Consider a test $P_{Y|X}$ that satisfies at least one of the conditions in (12). Then, the empirical output distribution $\bar{P}^{(n)}_Y$ in (3) satisfies

$$\lim_{n \to \infty} \| P_Y - \bar{P}^{(n)}_Y \|_{\mathrm{TV}} = 0,$$

where $P_Y$ is the ground-truth output probability distribution in (2).
Proof. The proof of Lemma 4 is a consequence of the Theorem of Glivenko and Cantelli [34].
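Finally, from Lemma 3 and Lemma 4, it holds that by increasing the number of tests, the error of approximating $P_X$ by $\hat{P}^{(n)}_X$ in (14) can be made arbitrarily small. The following lemma leverages this observation.

Lemma 5. Consider a test $P_{Y|X}$ that satisfies at least one of the conditions in (12). Then, under the assumption that the empirical output distribution $\bar{P}^{(n)}_Y$ in (3) satisfies (7), the input distribution $P_X$ and the estimation $\hat{P}^{(n)}_X$ in (14) satisfy

$$\lim_{n \to \infty} \| P_X - \hat{P}^{(n)}_X \|_{\mathrm{TV}} = 0.$$

This completes the proof of Theorem 1.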

Connections to Maximum Likelihood Estimation
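In this section, it is shown that the estimator presented in Theorem 1 is also the maximum likelihood estimator. To do so, note that under the assumption that the prevalence ratio is $\hat{P}^{(n)}_X(I) \in [0, 1]$, the probability of observing $y \in \{I, S\}$ as the result of testing any of the individuals of the population with a test described by the conditional probability distribution $P_{Y|X}$ is:

$$\hat{P}^{(n)}_Y(y) = \sum_{x \in \{I, S\}} P_{Y|X}(y|x) \, \hat{P}^{(n)}_X(x). \tag{18}$$

From this perspective, the probability of observing the vector $y = (y_1, y_2, \ldots, y_n)$ as the result of a testing campaign over a population of $n$ individuals is

$$\prod_{t=1}^{n} \hat{P}^{(n)}_Y(y_t) = \exp\left( -n \left( H\left(\bar{P}^{(n)}_Y\right) + D\left(\bar{P}^{(n)}_Y \,\middle\|\, \hat{P}^{(n)}_Y\right) \right) \right),$$

where $\bar{P}^{(n)}_Y$ and $\hat{P}^{(n)}_Y$ are in (3) and (18), $H(\bar{P}^{(n)}_Y)$ denotes the entropy of the probability distribution $\bar{P}^{(n)}_Y$, and $D(\bar{P}^{(n)}_Y \| \hat{P}^{(n)}_Y)$ denotes the relative entropy of $\bar{P}^{(n)}_Y$ with respect to $\hat{P}^{(n)}_Y$. Hence, the log-likelihood function satisfies

$$\sum_{t=1}^{n} \log \hat{P}^{(n)}_Y(y_t) \leq -n \, H\left(\bar{P}^{(n)}_Y\right),$$

where the equality holds if and only if $D(\bar{P}^{(n)}_Y \| \hat{P}^{(n)}_Y) = 0$, that is, if and only if $\bar{P}^{(n)}_Y$ and $\hat{P}^{(n)}_Y$ are identical. This observation leads to the conclusion that the log-likelihood function is maximized when the assumed prevalence ratio $\hat{P}^{(n)}_X(I)$ is such that the distributions in (3) and (18) are identical, which induces the system of equations in (11), whose unique solution is formed by the equalities in (8). This proves that the estimator in Theorem 1 is the unique maximum likelihood estimator.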
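The maximum likelihood property can be checked numerically with a brute-force search. The following sketch assumes a hypothetical campaign of 1000 tests with 340 positive results, sensitivity 0.7 and specificity 0.9, and compares the maximizer of the log-likelihood over a grid of candidate prevalence ratios with the closed form in (8a). The grid search is an illustration and not a method from the original work.

```python
import math


def log_likelihood(prevalence, positives, n, sensitivity, specificity):
    """Log-probability of the observed counts under an assumed prevalence ratio."""
    # Probability of a positive result induced by the assumed prevalence ratio.
    q = prevalence * sensitivity + (1.0 - prevalence) * (1.0 - specificity)
    return positives * math.log(q) + (n - positives) * math.log(1.0 - q)


n, positives = 1000, 340            # hypothetical data
sensitivity, specificity = 0.7, 0.9  # hypothetical test parameters

# Candidate prevalence ratios 0.000, 0.001, ..., 1.000.
grid = [k / 1000 for k in range(1001)]
best = max(grid, key=lambda p: log_likelihood(p, positives, n, sensitivity, specificity))

closed_form = (positives / n + specificity - 1.0) / (sensitivity + specificity - 1.0)
print(best, round(closed_form, 6))
```

Under these assumptions, the grid maximizer and the closed-form estimate coincide up to the grid resolution.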

Final Remarks
This section highlights some of the conclusions drawn from Lemmas 1-5 using a numerical analysis of particular examples. In the following examples, the data is artificially generated. That is, for a given prevalence ratio $P_X(I)$, an $n$-dimensional vector $x = (x_1, x_2, \ldots, x_n) \in \{I, S\}^n$ is generated such that for all $t \in \{1, 2, \ldots, n\}$, $x_t$ is a realization of a random variable $X \sim P_X$ and represents the state of individual $t$. Given a test $P_{Y|X}$, an $n$-dimensional vector $y = (y_1, y_2, \ldots, y_n) \in \{I, S\}^n$ is generated such that for all $t \in \{1, 2, \ldots, n\}$, $y_t$ is the realization of a random variable $Y_t \sim P_{Y|X = x_t}$ and represents the result of the test of individual $t$. Using the vector $y$, the fraction of positive tests $\bar{P}^{(n)}_Y(I)$ is calculated using (3), and the estimation $\hat{P}^{(n)}_X(I)$ of the prevalence ratio $P_X(I)$ is calculated using (8a). Figure 2 shows this procedure. From this perspective, the analysis is based on simulated testing campaigns. Note that the use of simulated data allows knowing the actual prevalence ratio, which enables analyzing the estimation error. This is rarely possible with data from actual testing campaigns.

Example 1. Consider a population of $n = 10,000$ individuals with prevalence $P_X(I) = 0.4$. Assume that all individuals are tested with identical tests $P_{Y|X}$.

Example 2. Consider a population of $n = 100,000$ individuals with prevalence $P_X(I) = 0.4$. Assume that all individuals are tested with identical tests $P_{Y|X}$.

Example 3. Consider a population of $n = 100,000,000$ individuals with prevalence $P_X(I) = 0.4$. Assume that all individuals are tested with identical tests $P_{Y|X}$.
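The simulated testing campaigns described above can be sketched as follows. The function name `simulate_campaign`, the seed, and the test parameters (sensitivity 0.7, specificity 0.9) are assumptions chosen for illustration. The sketch uses smaller values of $n$ than Example 3 so that it runs quickly.

```python
import random


def simulate_campaign(n, prevalence, sensitivity, specificity, seed=0):
    """Simulate n tests; return (fraction of positive tests, prevalence estimate)."""
    rng = random.Random(seed)
    positives = 0
    for _ in range(n):
        infected = rng.random() < prevalence  # x_t drawn from P_X
        p_positive = sensitivity if infected else 1.0 - specificity
        if rng.random() < p_positive:         # y_t drawn from P_{Y|X = x_t}
            positives += 1
    fraction = positives / n
    # Estimator (8a); valid when condition (7) holds for the observed fraction.
    estimate = (fraction + specificity - 1.0) / (sensitivity + specificity - 1.0)
    return fraction, estimate


for n in (10_000, 100_000):
    fraction, estimate = simulate_campaign(n, 0.4, 0.7, 0.9)
    print(n, round(fraction, 4), round(estimate, 4))
```

With these hypothetical parameters, the fraction of positive tests concentrates around 0.34, whereas the estimate concentrates around the true prevalence ratio 0.4.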
In Figures 3-8, the actual prevalence ratio $P_X(I)$ is plotted with a straight black line; the estimation $\hat{P}^{(n)}_X$ of $P_X$ is plotted with red circles; and the fraction of positive tests $\bar{P}^{(n)}_Y(I)$ is also plotted. Each estimation $\hat{P}^{(n)}_X(I)$ (red circles) is calculated using a single vector $y$ generated by the same vector $x$, according to the corresponding values of sensitivity $P_{Y|X}(I|I)$ and specificity $P_{Y|X}(S|S)$, as described above. In the following sections, some remarks based on these examples are presented.

Relevance of the Sensitivity and Specificity
One of the main observations to be highlighted from this numerical analysis is that there exists an important difference between the fraction of positive tests $\bar{P}^{(n)}_Y(I)$ and the actual prevalence ratio $P_X(I)$ due to the sensitivity and specificity of the tests. This difference is clearly depicted in Figures 3-8, which, together with the mathematical analysis presented before, highlights the conclusion that the fraction of positive tests should not be used as an estimation of the prevalence ratio in public health policy-making.
The following lemma determines the influence of the sensitivity and specificity on $\bar{P}^{(n)}_Y(I)$. To do so, note that from Lemma 2, it holds that the fraction of individuals reporting positive tests satisfies

$$\bar{P}^{(n)}_Y(I) = \hat{P}^{(n)}_X(I) \left( P_{Y|X}(I|I) + P_{Y|X}(S|S) - 1 \right) + 1 - P_{Y|X}(S|S).$$

Lemma 6. Consider a test $P_{Y|X}$ that satisfies at least one of the conditions in (12). Then, given the empirical output distribution $\bar{P}^{(n)}_Y$ in (3) and assuming that it satisfies (7), the following statements hold:
• The fraction $\bar{P}^{(n)}_Y(I)$

Tests whose Results are Useless
In Figures 3-8, the values of the sensitivity $P_{Y|X}(I|I)$ and specificity $P_{Y|X}(S|S)$ that satisfy $P_{Y|X}(I|I) + P_{Y|X}(S|S) = 1$ are plotted with a blue dash-dot vertical line. Note that for these specific values of sensitivity and specificity, the estimation $\hat{P}^{(n)}_X(I)$ of $P_X(I)$ is not plotted. The following lemmas shed some light on this singularity.

Lemma 7. Consider the empirical output distribution $\bar{P}^{(n)}_Y$ in (3) obtained by a test described by the conditional probability distribution $P_{Y|X}$. Then, the following five statements are equivalent:
• The system of equations in (11) has infinitely many solutions;
• The sensitivity $P_{Y|X}(I|I)$ and specificity $P_{Y|X}(S|S)$ satisfy
$$P_{Y|X}(I|I) + P_{Y|X}(S|S) = 1; \tag{27a}$$
• The sensitivity $P_{Y|X}(I|I)$ and the probability of a false positive $P_{Y|X}(I|S)$ satisfy
$$P_{Y|X}(I|I) = P_{Y|X}(I|S); \tag{27b}$$
• The probability of a false negative $P_{Y|X}(S|I)$ and the specificity $P_{Y|X}(S|S)$ satisfy
$$P_{Y|X}(S|S) = P_{Y|X}(S|I); \text{ and} \tag{27c}$$
• The probability of a false positive $P_{Y|X}(I|S)$ and the probability of a false negative $P_{Y|X}(S|I)$ satisfy
$$P_{Y|X}(I|S) + P_{Y|X}(S|I) = 1. \tag{27d}$$
Proof. The proof of Lemma 7 follows from the theorem of Rouché and Fontené [33], which states that when the system in (11) is consistent, it has infinitely many solutions if the matrix in (11) is not full rank. When such a matrix is not full rank, its determinant is zero. That is,

$$P_{Y|X}(I|I) \, P_{Y|X}(S|S) - P_{Y|X}(I|S) \, P_{Y|X}(S|I) = 0. \tag{28}$$

The proof is completed by verifying that the expression in (28) is equivalent to those in (27).
When at least one of the equalities in (27) is satisfied, nothing meaningful can be said about $P_X$ based on the data. This is essentially because any probability distribution $\hat{P}^{(n)}_X$ satisfies the equality in (11). The following lemma reinforces this statement in terms of information measures.

Lemma 8. Consider a test $P_{Y|X}$ that satisfies at least one of the conditions in (27). Then, the following statements are equivalent:
• Given the output empirical distribution $\bar{P}^{(n)}_Y$ obtained from the data $y$ as in (3), any probability distribution $\hat{P}^{(n)}_X$ on $\{I, S\}$ satisfies the equality in (11);
• Two random variables $X$ and $Y$ whose joint probability distribution $P_{XY}$ satisfies (1) have zero mutual information; and
• Two random variables $X$ and $Y$ whose joint probability distribution $P_{XY}$ satisfies (1) are independent.
Proof. The first statement is a consequence of Lemma 7; the second statement follows from the fact that under any of the assumptions in (27), the mutual information satisfies

$$I(X; Y) = \sum_{x \in \{I, S\}} \sum_{y \in \{I, S\}} P_X(x) \, P_{Y|X}(y|x) \log \frac{P_{Y|X}(y|x)}{P_Y(y)} = 0,$$

where $P_X$, $P_{Y|X}$, and $P_Y$ satisfy the equality in (1). The third statement follows from the fact that two random variables are independent if and only if their mutual information is zero.
Lemma 8 shows that when at least one of the conditions in (27) holds, the output probability distribution $P_Y$ does not provide any information about the input probability distribution $P_X$. That is, nothing can be said about $P_X$ based on the data $y$.
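The singularity can be illustrated numerically by computing the mutual information between the true state and the test result. In the following sketch, the function name `mutual_information` and the numerical values are assumptions made for illustration. The mutual information is expressed in nats.

```python
import math


def mutual_information(prevalence, sensitivity, specificity):
    """Mutual information I(X;Y) in nats for a binary test P_{Y|X}."""
    p_x = {"I": prevalence, "S": 1.0 - prevalence}
    p_y_given_x = {
        "I": {"I": sensitivity, "S": 1.0 - sensitivity},
        "S": {"I": 1.0 - specificity, "S": specificity},
    }
    # Marginal distribution of the test result.
    p_y = {y: sum(p_x[x] * p_y_given_x[x][y] for x in p_x) for y in ("I", "S")}
    total = 0.0
    for x in p_x:
        for y in p_y:
            joint = p_x[x] * p_y_given_x[x][y]
            if joint > 0.0:  # terms with zero probability contribute nothing
                total += joint * math.log(p_y_given_x[x][y] / p_y[y])
    return total


# Sensitivity + specificity = 1: the test result carries no information.
print(mutual_information(0.4, 0.3, 0.7))
# Sensitivity + specificity > 1: the test result is informative.
print(mutual_information(0.4, 0.7, 0.9))
```

In the first case, the probability of a positive result is the same for infected and susceptible individuals, so the fraction of positive tests does not depend on the prevalence ratio.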
Despite the singularity, the values of specificity and sensitivity whose sum is close to one, i.e., around the singularity, are also worthy of discussion. Note that for some $1 > \epsilon > 0$, the absolute difference $| P_X(I) - \hat{P}^{(n)}_X(I) |$ is bigger when the sensitivity and specificity satisfy $| 1 - P_{Y|X}(S|S) - P_{Y|X}(I|I) | < \epsilon$ than when these parameters satisfy $| 1 - P_{Y|X}(S|S) - P_{Y|X}(I|I) | > \epsilon$. These observations are justified by the fact that the total variation $\| P_X - \hat{P}^{(n)}_X \|_{\mathrm{TV}}$ is equal to the total variation $\| P_Y - \bar{P}^{(n)}_Y \|_{\mathrm{TV}}$ up to a constant factor, as shown in Lemma 3. Such a factor is indeed $\frac{1}{| 1 - P_{Y|X}(I|I) - P_{Y|X}(S|S) |}$, and thus, larger errors are expected around the singularity for the same finite number of tests $n$. This is evident in the numerical analysis. In Example 1, i.e., Figures 3 and 4, around the singularity, the estimations $\hat{P}^{(n)}_X$ of $P_X$ appear more dispersed than the estimations in Example 3, i.e., Figures 7 and 8.

With a large number of test results, Theorem 1 yields a reliable estimation $\hat{P}^{(n)}_X(I)$ of the prevalence ratio $P_X(I)$. This is independent of the exact values of the specificity and sensitivity as long as (12) holds. More importantly, the reliability of such estimation increases with the number of test results. For instance, compare the estimations in Examples 1 and 3. The implications of this observation are very important in practical terms. This shows that if the objective of a testing campaign against SARS-CoV-2 is to determine the prevalence ratio, the quality of the tests is of secondary importance once a large number of tests is performed. This is essentially because testing with low quality tests (low sensitivity and low specificity) or high quality tests (high sensitivity and high specificity) leads to nearly identical results in terms of the estimation error when a large number of tests is performed. Nonetheless, when a low number of tests is available, it is worth noting that when the sensitivity $P_{Y|X}(I|I)$ and specificity $P_{Y|X}(S|S)$ satisfy $| 1 - P_{Y|X}(S|S) - P_{Y|X}(I|I) | > 1 - \epsilon$, for some $0 < \epsilon < 1$, the smaller $\epsilon$, the smaller the estimation error of the prevalence ratio, c.f., Lemma 3.
This observation is of paramount importance as it implies that smaller estimation errors are observed when the sum of the sensitivity and specificity is bounded away from one. This said, the key parameter for reducing the estimation error is the number of tests.
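The scaling factor in Lemma 3 makes this trade-off easy to quantify. The following sketch evaluates the factor for three hypothetical tests. The function name `error_amplification`, the tolerance, and the parameter values are assumptions made for illustration.

```python
def error_amplification(sensitivity, specificity):
    """Factor 1 / |1 - sensitivity - specificity| from Lemma 3."""
    gap = abs(1.0 - sensitivity - specificity)
    if gap < 1e-12:
        raise ValueError("sensitivity + specificity must differ from one")
    return 1.0 / gap


# Hypothetical tests, from high quality to nearly uninformative.
for sens, spec in [(0.95, 0.95), (0.7, 0.9), (0.55, 0.5)]:
    print(sens, spec, round(error_amplification(sens, spec), 2))
```

For these values, the factor is roughly 1.1, 1.7 and 20, respectively. A deviation of the fraction of positive tests from its ground-truth value is thus amplified about twenty times by the third test, which is why more tests are needed near the singularity to reach the same estimation error.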

Conclusions
In this work, it has been shown that estimating the prevalence ratio of a condition, for example, a SARS-CoV-2 infection, by the ratio between the number of positive test results and the total number of tests leads to excessive estimation errors when tests are unreliable. This is simply because unreliable tests, i.e., tests whose probabilities of false positives and false negatives are nonzero, cause some individuals who exhibit the condition to receive negative test results (false negatives) and some individuals who do not exhibit the condition to receive positive results (false positives). From this perspective, an estimation of the prevalence ratio using data obtained from tests must take into account both the sensitivity and the specificity of the tests. Theorem 1 provides an estimation of the prevalence ratio with an estimation error that decreases with the number of tests.
Another important conclusion of this work is that testing campaigns using tests for which the sum of the sensitivity and specificity is different than one, always allow a reliable estimation of the prevalence ratio (Lemma 1 in Section 4) subject to a sufficiently large number of individuals being tested. Alternatively, testing campaigns using tests for which the sum of the sensitivity and the specificity is equal to one, lead to data from which it is impossible to estimate the prevalence ratio even with infinitely many tests (Lemma 7 in Section 4).
A final conclusion is that for estimating the prevalence ratio of a given condition, e.g., a SARS-CoV-2 infection, the key parameter for reducing the estimation error is the number of tests. Surprisingly, as long as the sum of the sensitivity and specificity of the tests is different from one, the exact values of both sensitivity and specificity have very little impact on the estimation when the number of tests is sufficiently large.