NIH Public Access Author Manuscript; accepted for publication in a peer-reviewed journal.
Stat Med. Author manuscript; available in PMC May 20, 2011.
Published in final edited form as:
PMCID: PMC2879599
NIHMSID: NIHMS202362

On the Estimation of Disease Prevalence by Latent Class Models for Screening Studies Using Two Screening Tests with Categorical Disease Status Verified in Test Positives Only

Summary

To evaluate the probabilities of a disease state, ideally all subjects in a study should be diagnosed by a definitive diagnostic or gold standard test. However, since definitive diagnostic tests are often invasive and expensive, it is generally unethical to apply them to subjects whose screening tests are negative. In this article, we consider latent class models for screening studies with two imperfect binary diagnostic tests and a definitive categorical disease status measured only for those with at least one positive screening test. Specifically, we discuss a conditional independence model and three homogeneous conditional dependence latent class models, and assess the impact of misspecification of the dependence structure on the estimation of disease category probabilities using frequentist and Bayesian approaches. Interestingly, the three homogeneous dependence models can provide identical goodness-of-fit but substantively different estimates for a given study. However, the parametric form of the assumed dependence structure itself is not “testable” from the data, and thus the dependence structure modeling considered here can only be viewed as a sensitivity analysis with respect to a more complicated non-identifiable model potentially involving a heterogeneous dependence structure. Furthermore, we discuss Bayesian model averaging, together with its limitations, as an alternative way to partially address this particularly challenging problem. The methods are applied to two cancer screening studies, and simulations are conducted to evaluate the performance of these methods. In summary, further research is needed to reduce the impact of model misspecification on the estimation of disease prevalence in such settings.

Keywords: maximum likelihood, Bayesian inference, diagnostic test, dependence, screening, latent class models

1. Introduction

Screening for a specific disease or condition is a fundamental component of human disease control and prevention. The objective of screening is to classify asymptomatic people as likely or unlikely to have the disease or condition of interest. People who appear likely to have the disease or condition are examined further for a diagnosis, and those people who are diagnosed with the disease are treated. Therefore, screening can reduce the morbidity and mortality of the disease among people screened and can enable early treatment for diagnosed cases. Screening programs for cancer and heart diseases are well established in many countries. In many screening programs, a population with known size n is screened by two imperfect binary diagnostic tests. If the results of both diagnostic tests are negative, no further screening is undertaken. If either of the two diagnostic tests is positive, then a full evaluation of the disease using a gold standard classification is undertaken [1].

For estimating diagnostic accuracy without a gold standard, it is well known that if the conditional independence assumption is incorrectly assumed, parameter estimates may be biased [2–4]. When the disease status D is a binary random variable, Albert and Dodd [5] showed that the estimation of diagnostic accuracy and prevalence is sensitive to the choice of dependence structure for studies with multiple diagnostic tests. The dependence structure was specified using a Gaussian random effects model [6,7] and a finite mixture model [8]. They showed that it is difficult to distinguish between different dependence structures in the absence of a gold standard test in most practical situations (i.e., unless there are more than 10 tests). Albert [9] proposed methods for estimating the diagnostic accuracy of multiple binary tests with an imperfect reference standard when information about the diagnostic accuracy of the imperfect test is available from external data sources. Furthermore, using the same dependence structure, Albert and Dodd [10] examined the effect of model misspecification on the estimation of test accuracy and prevalence when a binary gold standard is partially verified. They showed that for extremely biased sampling the estimation is sensitive to the choice of dependence structure. Other latent class models with a focus on diagnostic accuracy have also been considered in a single study [11,12] as well as in a meta-analysis [13]. In addition, Black and Craig [14] discussed the estimation of disease prevalence in a scenario involving two imperfect tests in the absence of a gold standard and proposed Bayesian model averaging for inference over the conditional independence and dependence models. However, those dependence models are not directly applicable in the setting that we are considering because there are only two diagnostic tests, and more importantly, if both diagnostic tests are negative, no further gold standard classification will be applied.

Let $T_1$, $T_2$, and $D$ be the random variables denoting the two screening tests and the disease status, respectively. In this article, we consider $T_1$ and $T_2$ to be binary variables with value 1 indicating test positive and 0 indicating test negative, and $D$ to be a categorical variable with value $d = 1, 2, \ldots, K$ indicating the classes of disease. Let $x_{ijd}$ be the observed frequency with $D = d$, $T_1 = i$ and $T_2 = j$ ($i = 0, 1$ and $j = 0, 1$), $x_{ij} = \sum_d x_{ijd}$ be the observed frequency with $T_1 = i$ and $T_2 = j$, and $n = \sum_{ij} x_{ij}$ be the total number of observations. Furthermore, let $\pi_{ij} = P(T_1 = i, T_2 = j)$, $\pi_{i+} = \sum_j \pi_{ij}$, $\pi_{+j} = \sum_i \pi_{ij}$, $P_d = P(D = d)$, $\pi_{ij}^d = P(D = d \mid T_1 = i, T_2 = j)$, $P_{ij}^d = P(T_1 = i, T_2 = j \mid D = d)$, $P_{i+}^d = \sum_j P_{ij}^d$, and $P_{+j}^d = \sum_i P_{ij}^d$ denote the corresponding joint, marginal and conditional probabilities. In most studies, we only observe the frequencies $x_{11d}$, $x_{10d}$ and $x_{01d}$ due to the nature of screening. The frequencies $x_{00d}$ are usually not observed, although the margin $x_{00} = \sum_d x_{00d}$ is observed. The data structure contains $3K + 1$ observed frequencies, which in general allows for the estimation of a maximum of $3K$ free parameters. In this paper we will not consider special cases in which fewer than $3K$ free parameters are identifiable, e.g., when there are zeros among the observed frequencies.

One way to write the likelihood function (ignoring constant terms) in this setting is in terms of $P_d$ and $P_{ij}^d$ ($d = 1, 2, \ldots, K$; $i = 0, 1$; $j = 0, 1$), with the constraints $\sum_d P_d = 1$ and $\sum_{ij} P_{ij}^d = 1$, as follows,

$$x_{00}\log\Big(\sum_d P_{00}^d P_d\Big) + \sum_d x_{11d}\log\big(P_{11}^d P_d\big) + \sum_d x_{10d}\log\big(P_{10}^d P_d\big) + \sum_d x_{01d}\log\big(P_{01}^d P_d\big).$$
(1)

This parameterization involves a mixture likelihood in the first term and prevents a closed-form solution for the maximum likelihood estimators (MLEs). It contains $4K - 1$ free parameters. Without further assumptions, the parameters in equation (1) are not identifiable. However, this parameterization allows for direct specification of commonly used assumptions, usually specified through constraints on $P_{ij}^d$. For example, the frequently used conditional independence assumption [15,16] assumes that the two tests $T_1$ and $T_2$ are independent conditional on the disease status $D$, i.e., $T_1 \perp T_2 \mid D$, and the number of free parameters in equation (1) is reduced to $3K - 1$, giving model identification, since $P_{11}^d = P_{1+}^d P_{+1}^d$, $P_{10}^d = P_{1+}^d(1 - P_{+1}^d)$, $P_{01}^d = (1 - P_{1+}^d)P_{+1}^d$, and $P_{00}^d = (1 - P_{1+}^d)(1 - P_{+1}^d)$. For convenience, we denote the conditional independence model as the $\perp$ model, with $\hat{P}_d^{\perp}$ as the corresponding MLEs. Under the homogeneous dependence assumptions (i.e., the α, θ, and ρ models that will be discussed in Sections 2 and 3), the number of free parameters in equation (1) is reduced to $3K$, and the models become saturated and equivalent to the alternative parameterization below [17].
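As a small illustration, under conditional independence the four cell probabilities $P_{ij}^d$ for a class $d$ are determined by the two marginal positivity rates alone. The Python sketch below is not from the paper, and the rates used are hypothetical:

```python
# Sketch (not the authors' code): cell probabilities P_ij^d under the
# conditional independence assumption T1 ⊥ T2 | D, built from the
# marginal positivity rates p1 = P_{1+}^d and p2 = P_{+1}^d.
def independence_cells(p1, p2):
    """Return {(i, j): P(T1=i, T2=j | D=d)} assuming independence."""
    return {
        (1, 1): p1 * p2,
        (1, 0): p1 * (1 - p2),
        (0, 1): (1 - p1) * p2,
        (0, 0): (1 - p1) * (1 - p2),
    }

cells = independence_cells(0.8, 0.7)   # hypothetical rates for one class d
```

The four cells sum to one by construction, which is how the independence assumption removes one free parameter per disease class.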

An alternative parameterization of the log-likelihood function can be written in terms of $\pi_{ij}$ ($i = 0, 1$; $j = 0, 1$) and $\pi_{ij}^d$ ($i + j > 0$ and $d = 1, 2, \ldots, K$), with the constraint $\sum_{ij} \pi_{ij} = 1$, as follows,

$$\sum_{ij} x_{ij}\log(\pi_{ij}) + \sum_d x_{11d}\log\big(\pi_{11}^d\big) + \sum_d x_{10d}\log\big(\pi_{10}^d\big) + \sum_d x_{01d}\log\big(\pi_{01}^d\big).$$
(2)

This representation relates to previous work in other settings [18–20]. This model is a saturated model with $3K$ parameters. The maximum likelihood equations are tractable and yield MLEs in closed form. Omitting the algebra, we obtain $\hat{\pi}_{ij} = x_{ij}/n$ ($i, j = 0, 1$) and $\hat{\pi}_{ij}^d = x_{ijd}/x_{ij}$ if $i + j > 0$. The existence of closed-form solutions for this alternative parameterization allows for closed-form solutions for the α and θ homogeneous conditional dependence models [17], which will be discussed in Sections 2.1 and 2.2. Furthermore, this saturated alternative parameterization also shows that the probability of having disease class $d$ for those with both tests negative (i.e., $\pi_{00}^d$), and hence the overall probability of having disease class $d$ (i.e., $P_d = \sum_{ij} \pi_{ij}\pi_{ij}^d$), are not identifiable without some additional “non-testable” assumptions. Thus, the dependence structure modeling considered in this paper is itself not “testable”, and can only be viewed as a sensitivity analysis for the estimation of disease prevalence. As in generalized linear models with a non-ignorable missing data mechanism [21], this type of sensitivity analysis plays an important role in estimation and inference for this problem.
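The plug-in MLEs above are simple proportions. The following Python sketch (hypothetical counts with K = 2 classes; not from the paper) computes them and marks where identifiability fails:

```python
# Sketch of the closed-form MLEs for parameterization (2):
# pi_ij = x_ij / n, and pi_ij^d = x_ijd / x_ij for cells with i + j > 0.
# The counts below are hypothetical, with K = 2 disease classes.
x = {(1, 1): [20, 10], (1, 0): [40, 5], (0, 1): [30, 5]}  # x_ijd for d = 1, 2
x00 = 890                                                  # only the margin x_00
n = x00 + sum(sum(v) for v in x.values())

pi = {ij: sum(v) / n for ij, v in x.items()}               # pi_ij for verified cells
pi[(0, 0)] = x00 / n
pi_d = {ij: [c / sum(v) for c in v] for ij, v in x.items()}  # pi_ij^d, i + j > 0
# pi_00^d (and hence P_d) is NOT identifiable: any probability vector
# over d is consistent with the observed margin x00.
```

This makes the non-identifiability concrete: every observed count is used, yet nothing constrains how $x_{00}$ splits across the disease classes.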

In similar settings where the gold standard was measured only on those who screened positive, Cheng et al. [22] and Pepe and Alonzo [23] have examined the potentially overwhelming impact of the correlation between the two screening tests on the estimation of absolute test accuracy parameters. Both suggest using relative test accuracy for comparing disease screening tests. However, to our knowledge, no one has assessed the impact of misspecification of the conditional dependence structure, which can be specified by a homogeneous dependence parameter for two diagnostic tests, on the estimation of disease class probabilities in such screen-positive ascertained studies.

In this article, we empirically assess the impact of misspecification of the conditional dependence structure on the estimation of disease class probabilities through two case studies and simulations. Specifically, in Sections 2.1 and 2.2, we define the MLEs for the homogeneous dependent α and θ models, and in Section 2.3 we propose the homogeneous correlation coefficient conditional dependent ρ model. Bayesian approaches, which incorporate prior beliefs about dependence, are developed for the three models in Section 3 as an alternative to the maximum likelihood methods. Furthermore, we discuss Bayesian model averaging in Section 3.4 as an alternative way to address the challenging estimation problem since the three homogeneous dependent models can provide the same goodness-of-fit for the data but substantively different estimates [17] and the dependence structure itself is not “testable”. In Section 4, we compare the results for the two case studies using both the maximum likelihood methods and the Bayesian approaches. The two case studies were reanalyzed recently by Böhning and Patilea [1] using a capture-recapture approach under the α and θ model assumptions. Our focus here is to compare the estimates under the α, θ and ρ model assumptions using both the maximum likelihood methods and Bayesian approaches. A simulation study is conducted in Section 5 and a brief discussion is presented in Section 6.

2. Maximum Likelihood Estimators for Models with Homogeneous Dependence

In Sections 2.1 and 2.2, we briefly introduce the homogeneous conditional dependence α and θ models, recently proposed by Böhning and Patilea [1] using a capture-recapture approach. Using the alternative parameterization of the likelihood as presented in equation (2), Chu and Nie [17] presented closed-form maximum likelihood solutions under the α and θ model assumptions.

2.1 Homogeneous conditional dependence: the α model

Under this model, the association of the two tests $T_1$ and $T_2$ conditional on the disease status $D$, as measured by the odds ratio, is assumed to be homogeneous over all disease categories, i.e., $\alpha_d = \frac{P_{11}^d P_{00}^d}{P_{01}^d P_{10}^d}$ ($d = 1, 2, \ldots, K$) is assumed to be homogeneous. By Bayes’ theorem, we obtain $\alpha_d = \frac{\pi_{11}\pi_{00}}{\pi_{01}\pi_{10}} \times \frac{\pi_{11}^d \pi_{00}^d}{\pi_{01}^d \pi_{10}^d}$. With $\sum_d \pi_{00}^d = 1$ and simple algebra, we obtain the solution $\alpha = \frac{\pi_{11}\pi_{00}}{\pi_{01}\pi_{10}} \times \Big[\sum_d \frac{\pi_{01}^d \pi_{10}^d}{\pi_{11}^d}\cdot\frac{\pi_{11}}{\pi_{01}\pi_{10}}\Big]^{-1}$ under this homogeneity assumption. Thus, by plugging in the closed-form MLEs of $\pi_{ij}$ ($i, j = 0, 1$) and $\pi_{ij}^d$ ($i + j > 0$) from equation (2), the closed-form MLEs of α and $P_d^{\alpha}$ are

$$\hat{\alpha} = x_{00}\left(\sum_d \frac{x_{01d}\,x_{10d}}{x_{11d}}\right)^{-1}, \qquad \hat{P}_d^{\alpha} = \frac{1}{n}\left[x_{11d} + x_{10d} + x_{01d} + x_{00}\,\frac{x_{01d}\,x_{10d}}{x_{11d}}\left(\sum_{d'} \frac{x_{01d'}\,x_{10d'}}{x_{11d'}}\right)^{-1}\right],$$
(3)

where the superscript α indicates the homogeneous odds ratio assumption.
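Equation (3) can be evaluated directly from the observed counts. The Python sketch below uses hypothetical counts with K = 2 classes; it is an illustration of the closed-form estimators, not the authors' implementation:

```python
# Sketch of the closed-form MLEs under the homogeneous odds ratio (alpha)
# model; all counts are hypothetical, with K = 2 disease classes.
x11 = [20, 10]; x10 = [40, 5]; x01 = [30, 5]; x00 = 890
n = x00 + sum(x11) + sum(x10) + sum(x01)
K = len(x11)

s = sum(x01[d] * x10[d] / x11[d] for d in range(K))   # sum_d x01d x10d / x11d
alpha_hat = x00 / s                                   # homogeneous odds ratio
P_hat = [(x11[d] + x10[d] + x01[d]
          + x00 * (x01[d] * x10[d] / x11[d]) / s) / n for d in range(K)]
```

The last term distributes the unverified margin $x_{00}$ across the classes in proportion to the capture-recapture fill-in $\hat{\alpha}\,x_{01d}x_{10d}/x_{11d}$, so the class probabilities sum to one.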

2.2 Homogeneous conditional dependence: the θ model

Under this model, the ratio of the conditional (conditional on test $T_2$ being positive) and unconditional probabilities of test $T_1$ being positive is assumed to be homogeneous over all disease categories, i.e., $\theta_d = \frac{P_{11}^d}{P_{1+}^d P_{+1}^d}$ ($d = 1, 2, \ldots, K$) is assumed to be homogeneous. By Bayes’ theorem, we obtain $\theta_d = \frac{\pi_{11}\pi_{11}^d\,P_d}{(\pi_{01}\pi_{01}^d + \pi_{11}\pi_{11}^d)(\pi_{10}\pi_{10}^d + \pi_{11}\pi_{11}^d)}$. With $\sum_d P_d = 1$ and simple algebra, we obtain the solution $\theta = \Big[\sum_d \frac{(\pi_{10}\pi_{10}^d + \pi_{11}\pi_{11}^d)(\pi_{01}\pi_{01}^d + \pi_{11}\pi_{11}^d)}{\pi_{11}\pi_{11}^d}\Big]^{-1}$. Thus, the closed-form MLEs of θ and $P_d^{\theta}$ are

$$\hat{\theta} = n\left[\sum_d \frac{(x_{10d} + x_{11d})(x_{01d} + x_{11d})}{x_{11d}}\right]^{-1}, \qquad \hat{P}_d^{\theta} = \frac{\hat{\theta}}{n}\,\frac{x_{1+d}\,x_{+1d}}{x_{11d}} = \frac{x_{1+d}\,x_{+1d}}{x_{11d}}\left(\sum_{d'} \frac{x_{1+d'}\,x_{+1d'}}{x_{11d'}}\right)^{-1},$$
(4)

where $x_{1+d} = x_{10d} + x_{11d}$, $x_{+1d} = x_{01d} + x_{11d}$, and the superscript θ indicates the homogeneous relative risk assumption.
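Analogously, the closed-form estimators in equation (4) can be computed as below; the counts are hypothetical (K = 2), and the code is only an illustrative sketch:

```python
# Sketch of the closed-form MLEs under the homogeneous theta model;
# all counts are hypothetical, with K = 2 disease classes.
x11 = [20, 10]; x10 = [40, 5]; x01 = [30, 5]; x00 = 890
n = x00 + sum(x11) + sum(x10) + sum(x01)
K = len(x11)

x1p = [x10[d] + x11[d] for d in range(K)]   # x_{1+d}
xp1 = [x01[d] + x11[d] for d in range(K)]   # x_{+1d}
s = sum(x1p[d] * xp1[d] / x11[d] for d in range(K))
theta_hat = n / s
P_hat = [x1p[d] * xp1[d] / x11[d] / s for d in range(K)]
```

As with the α model, the normalization by `s` guarantees that the estimated class probabilities sum to one.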

2.3 Homogeneous conditional dependence: the ρ model

In this section, we propose an alternative homogeneous conditional dependence model, the ρ model. Under this model, the correlation of the two tests $T_1$ and $T_2$ is assumed to be homogeneous over all disease categories, i.e., $\rho_d$ ($d = 1, 2, \ldots, K$) is assumed to be a common ρ. Let $\delta_d = \rho_d\sqrt{P_{1+}^d P_{+1}^d (1 - P_{1+}^d)(1 - P_{+1}^d)}$ be the covariance between the two tests in the $d$th disease group; then we have $P_{11}^d = P_{1+}^d P_{+1}^d + \delta_d$, $P_{10}^d = P_{1+}^d(1 - P_{+1}^d) - \delta_d$, $P_{01}^d = P_{+1}^d(1 - P_{1+}^d) - \delta_d$, and $P_{00}^d = (1 - P_{1+}^d)(1 - P_{+1}^d) + \delta_d$. The bounded range of the correlation is determined by the marginal probabilities of testing positive $P_{1+}^d$ and $P_{+1}^d$. Specifically, the correlation coefficient ρ satisfies

$$\max_d\left\{-\sqrt{\frac{P_{1+}^d P_{+1}^d}{(1 - P_{1+}^d)(1 - P_{+1}^d)}},\; -\sqrt{\frac{(1 - P_{1+}^d)(1 - P_{+1}^d)}{P_{1+}^d P_{+1}^d}}\right\} \le \rho \le \min_d\left\{\sqrt{\frac{(1 - P_{+1}^d)P_{1+}^d}{P_{+1}^d(1 - P_{1+}^d)}},\; \sqrt{\frac{P_{+1}^d(1 - P_{1+}^d)}{(1 - P_{+1}^d)P_{1+}^d}}\right\}.$$
(5)

Let the MLEs of ρ and $P_d$ be denoted by $\hat{\rho}$ and $\hat{P}_d^{\rho}$, where the superscript ρ indicates the homogeneous correlation coefficient assumption. They do not have closed-form solutions under this model.
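Although $\hat{\rho}$ itself requires numerical optimization, the feasible range in equation (5) is easy to compute. A Python sketch follows (the marginal rates and the function name are ours, chosen for illustration):

```python
import math

# Sketch of the feasible range for a common correlation rho as in
# equation (5), given marginal positivity rates (P_{1+}^d, P_{+1}^d)
# for each disease class d; the rates below are hypothetical.
def rho_bounds(margins):
    lo, hi = -1.0, 1.0
    for p, q in margins:                     # p = P_{1+}^d, q = P_{+1}^d
        odds = math.sqrt(p * q / ((1 - p) * (1 - q)))
        lo = max(lo, -odds, -1 / odds)       # lower bound for class d
        r = math.sqrt(p * (1 - q) / (q * (1 - p)))
        hi = min(hi, r, 1 / r)               # upper bound for class d
    return lo, hi

lo, hi = rho_bounds([(0.10, 0.05), (0.25, 0.30)])
```

The intersection over classes can be quite narrow; with rare diseases the lower bound in particular sits close to zero, which is one reason the ρ model behaves differently from the α and θ models.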

All three models assume a homogeneous dependence structure. This is a rather strong assumption. However, because all three homogeneous dependence models are already saturated, heterogeneous dependence models are not identifiable without additional constraints on the test accuracy parameters (e.g., assuming the test accuracy parameters are the same for the two diagnostic tests, which is in general a much stronger assumption). Furthermore, because the three homogeneous dependence models can provide the same goodness-of-fit for the data but substantively different estimates [17], a natural way of addressing this problem might be through frequentist model average estimators [24]. Let $w_{\alpha}$, $w_{\theta}$ and $w_{\rho}$, with the constraint $w_{\alpha} + w_{\theta} + w_{\rho} = 1$, be the corresponding weights for the α, θ, and ρ models; the weighted model average estimator can then be defined as $P_d^w = w_{\alpha}P_d^{\alpha} + w_{\theta}P_d^{\theta} + w_{\rho}P_d^{\rho}$. However, the MLEs $\hat{P}_d^{\alpha}$, $\hat{P}_d^{\theta}$ and $\hat{P}_d^{\rho}$ are usually correlated since they are based on the same data. Due to the technical difficulty of computing the variance-covariance matrix between ($\hat{P}_d^{\alpha}$, $\hat{P}_d^{\theta}$) and $\hat{P}_d^{\rho}$ for the computation of the standard error of $\hat{P}_d^w$, we do not consider the frequentist model average estimator in this article. We will consider the Bayesian model averaging counterpart in Section 3.4.

In practice, it is often of interest to test the difference between the estimated probabilities of disease states under different dependence assumptions (i.e., the α, θ or ρ model). Since the closed-form maximum likelihood solutions for the α and θ models are based on the likelihood function as presented in equation (2), a Wald-type test comparing $\hat{P}_d^{\alpha} - \hat{P}_d^{\theta}$ is directly available, with the standard error $se(\hat{P}_d^{\alpha} - \hat{P}_d^{\theta})$ obtained by the delta method. Due to the technical difficulty of computing the variance-covariance matrix between ($\hat{P}_d^{\alpha}$, $\hat{P}_d^{\theta}$) and $\hat{P}_d^{\rho}$, comparing the MLEs ($\hat{P}_d^{\alpha}$, $\hat{P}_d^{\theta}$) with $\hat{P}_d^{\rho}$ is not straightforward. In practice, bootstrap methods can be used as an alternative way to compute the corresponding p-values and 95% confidence intervals [25].

We developed a SAS macro (SAS Institute, Cary, NC) to implement the models discussed above, parameterized both in terms of $P_d$ and $P_{ij}^d$ as in equation (1) for the homogeneous ρ model, and in terms of $\pi_{ij}$ and $\pi_{ij}^d$ as in equation (2) for the homogeneous α and θ models. To describe disease class prevalence and to implement the constraints $0 < P_d < 1$ and $\sum_d P_d = 1$, we used the linear generalized logit model [26], which is widely applied in categorical data analysis. This model has an inverse link function defined as $P_d = \exp(\beta_d)/\sum_{d'}\exp(\beta_{d'})$ ($d = 1, 2, \ldots, K$) with $\beta_K = 0$. We used the delta method to compute the standard errors of functions of the MLEs and their confidence intervals based on the normal approximation. The two parameterizations (i.e., in terms of $P_d$, $P_{ij}^d$ and in terms of $\pi_{ij}$, $\pi_{ij}^d$) provide exactly the same results.
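For illustration, the generalized logit (inverse) link can be sketched in a few lines of Python; this mirrors the constraint handling described above but is not the SAS macro itself:

```python
import math

# Sketch of the generalized (multinomial) logit inverse link used to
# enforce 0 < P_d < 1 and sum_d P_d = 1:
#   P_d = exp(beta_d) / sum_d' exp(beta_d'),  with beta_K = 0 fixed
# for identifiability.
def generalized_logit(beta):                 # beta has length K - 1
    b = list(beta) + [0.0]                   # append beta_K = 0
    m = max(b)                               # stabilize the exponentials
    e = [math.exp(v - m) for v in b]
    return [v / sum(e) for v in e]

P = generalized_logit([1.0, -0.5])           # K = 3 disease classes
```

Fixing $\beta_K = 0$ leaves $K - 1$ unconstrained real parameters, which is what makes unconstrained optimization (or MCMC sampling) over the prevalence vector straightforward.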

3. Bayesian Estimation for Models with Homogeneous Dependence

In this section, we discuss Bayesian approaches [27,28]. Because the Bayesian approach and the frequentist approach use different frameworks, they can be considered complementary. When relatively large studies are combined with weak prior distributions, inferences obtained by Bayesian and frequentist methods generally agree. However, the Bayesian framework is particularly attractive when suitable prior distributions can be constructed to incorporate known constraints and subject-matter knowledge on model parameters [29]. The Bayesian framework allows direct construction of 100(1−α)% equal-tail and highest probability density (HPD) credible intervals of general functions of the estimated parameters without having to rely on asymptotic approximations. Furthermore, the Bayesian framework provides a direct implementation of model averaging [30], which is a natural way to address the problem of selecting among several competing models that give equal goodness-of-fit but potentially different inferences for a particular study.

3.1 Homogeneous conditional dependence: the α model

To implement the constraints $\sum_{ij} P_{ij}^d = 1$ and $\alpha_d = \frac{P_{11}^d P_{00}^d}{P_{01}^d P_{10}^d} = \alpha$ under the α model, we re-parameterize $P_{11}^d$, $P_{01}^d$, $P_{10}^d$ and $P_{00}^d$ as follows,

$$P_{11}^d = \frac{\alpha\exp(a_d + b_d)}{1 + \exp(a_d) + \exp(b_d) + \alpha\exp(a_d + b_d)}, \quad P_{10}^d = \frac{\exp(a_d)}{1 + \exp(a_d) + \exp(b_d) + \alpha\exp(a_d + b_d)},$$
$$P_{01}^d = \frac{\exp(b_d)}{1 + \exp(a_d) + \exp(b_d) + \alpha\exp(a_d + b_d)}, \quad P_{00}^d = \frac{1}{1 + \exp(a_d) + \exp(b_d) + \alpha\exp(a_d + b_d)}.$$
(6)

Let $f(\alpha, a_d, b_d, P_d)$ be the joint prior distribution of $(\alpha, a_d, b_d, P_d)$ ($d = 1, 2, \ldots, K$); the joint posterior distribution given the observed frequencies is proportional to

$$\prod_d (P_d)^{x_{11d}+x_{10d}+x_{01d}}(P_{11}^d)^{x_{11d}}(P_{10}^d)^{x_{10d}}(P_{01}^d)^{x_{01d}}\Big(\sum_d P_d P_{00}^d\Big)^{x_{00}} f(\alpha, a_d, b_d, P_d)$$
$$= \prod_d \frac{(P_d)^{x_{11d}+x_{10d}+x_{01d}}\,\alpha^{x_{11d}}\exp\big[a_d(x_{11d}+x_{10d}) + b_d(x_{11d}+x_{01d})\big]}{\big[1 + \exp(a_d) + \exp(b_d) + \alpha\exp(a_d + b_d)\big]^{x_{11d}+x_{10d}+x_{01d}}} \times \Big(\sum_d \frac{P_d}{1 + \exp(a_d) + \exp(b_d) + \alpha\exp(a_d + b_d)}\Big)^{x_{00}} \times f(\alpha, a_d, b_d, P_d) \times I(\alpha > 0).$$
(7)
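For readers who prefer code to formulas, the log of the unnormalized posterior in equation (7) can be sketched as follows (flat priors omitted; the function name and all inputs are hypothetical illustrations, not the WinBUGS implementation):

```python
import math

# Sketch of the log of the unnormalized posterior in equation (7) for
# the alpha model, with flat priors omitted; counts are hypothetical.
def log_post_alpha(log_alpha, a, b, beta, x11, x10, x01, x00):
    K = len(a)
    alpha = math.exp(log_alpha)                  # enforces alpha > 0
    eb = [math.exp(v) for v in beta] + [1.0]     # generalized logit, beta_K = 0
    P = [v / sum(eb) for v in eb]                # disease class prevalences
    C = [1 + math.exp(a[d]) + math.exp(b[d])
         + alpha * math.exp(a[d] + b[d]) for d in range(K)]
    lp = x00 * math.log(sum(P[d] / C[d] for d in range(K)))  # mixture term
    for d in range(K):
        lp += (x11[d] + x10[d] + x01[d]) * (math.log(P[d]) - math.log(C[d]))
        lp += x11[d] * log_alpha + a[d] * (x11[d] + x10[d]) \
              + b[d] * (x11[d] + x01[d])
    return lp

lp = log_post_alpha(0.0, [0.0, 0.0], [0.0, 0.0], [1.0],
                    [20, 10], [40, 5], [30, 5], 890)
```

Sampling on the log scale for α (and adding log-prior terms) is one simple way to respect the positivity constraint without indicator functions.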

3.2 Homogeneous conditional dependence: the θ model

To implement the constraints $\sum_{ij} P_{ij}^d = 1$ and $\theta_d = \frac{P_{11}^d}{P_{1+}^d P_{+1}^d} = \theta$ under the θ model, we re-parameterize $P_{11}^d$, $P_{01}^d$, $P_{10}^d$ and $P_{00}^d$ as follows,

$$P_{11}^d = \theta P_{1+}^d P_{+1}^d, \quad P_{10}^d = P_{1+}^d(1 - \theta P_{+1}^d), \quad P_{01}^d = P_{+1}^d(1 - \theta P_{1+}^d), \quad P_{00}^d = 1 - P_{1+}^d - P_{+1}^d + \theta P_{1+}^d P_{+1}^d.$$

Let $f(\theta, P_{1+}^d, P_{+1}^d, P_d)$ be the joint prior distribution of $(\theta, P_{1+}^d, P_{+1}^d, P_d)$ ($d = 1, 2, \ldots, K$); the joint posterior distribution given the observed frequencies is proportional to

$$\prod_d (P_d)^{x_{11d}+x_{10d}+x_{01d}}(P_{11}^d)^{x_{11d}}(P_{10}^d)^{x_{10d}}(P_{01}^d)^{x_{01d}}\Big(\sum_d P_d P_{00}^d\Big)^{x_{00}} f(\theta, P_{1+}^d, P_{+1}^d, P_d)$$
$$= \prod_d (P_d)^{x_{11d}+x_{10d}+x_{01d}}(P_{1+}^d)^{x_{11d}+x_{10d}}(P_{+1}^d)^{x_{11d}+x_{01d}}\,\theta^{x_{11d}}(1 - \theta P_{+1}^d)^{x_{10d}}(1 - \theta P_{1+}^d)^{x_{01d}} \times \Big[\sum_d P_d\big(1 - P_{1+}^d - P_{+1}^d + \theta P_{1+}^d P_{+1}^d\big)\Big]^{x_{00}} \times f(\theta, P_{1+}^d, P_{+1}^d, P_d) \times I(\theta > 0) \times I(1 - \theta P_{+1}^d > 0) \times I(1 - \theta P_{1+}^d > 0) \times I(1 - P_{1+}^d - P_{+1}^d + \theta P_{1+}^d P_{+1}^d > 0).$$
(8)

The feasible range of θ is determined by the marginal probabilities of testing positive $P_{1+}^d$ and $P_{+1}^d$ and is implemented through the addition of the four indicator functions I(·) in equation (8).

3.3 Homogeneous conditional dependence: the ρ model

To implement the constraints $\sum_{ij} P_{ij}^d = 1$ and $\rho_d = \rho$ ($d = 1, 2, \ldots, K$), we re-parameterize $P_{11}^d$, $P_{01}^d$, $P_{10}^d$ and $P_{00}^d$ as in Section 2.3. Let $f(\rho, P_{1+}^d, P_{+1}^d, P_d)$ be the joint prior distribution of $(\rho, P_{1+}^d, P_{+1}^d, P_d)$ ($d = 1, 2, \ldots, K$) and $\delta_d = \rho\sqrt{P_{1+}^d P_{+1}^d(1 - P_{1+}^d)(1 - P_{+1}^d)}$ be the covariance; the joint posterior distribution given the observed frequencies is proportional to

$$\prod_d (P_d)^{x_{11d}+x_{10d}+x_{01d}}(P_{11}^d)^{x_{11d}}(P_{10}^d)^{x_{10d}}(P_{01}^d)^{x_{01d}}\Big(\sum_d P_d P_{00}^d\Big)^{x_{00}} f(\rho, P_{1+}^d, P_{+1}^d, P_d)$$
$$= \prod_d (P_d)^{x_{11d}+x_{10d}+x_{01d}}\big(P_{1+}^d P_{+1}^d + \delta_d\big)^{x_{11d}}\big(P_{1+}^d - P_{1+}^d P_{+1}^d - \delta_d\big)^{x_{10d}}\big(P_{+1}^d - P_{1+}^d P_{+1}^d - \delta_d\big)^{x_{01d}} \times \Big\{\sum_d P_d\big[(1 - P_{1+}^d)(1 - P_{+1}^d) + \delta_d\big]\Big\}^{x_{00}} \times f(\rho, P_{1+}^d, P_{+1}^d, P_d) \times I(P_{1+}^d P_{+1}^d + \delta_d > 0) \times I(P_{1+}^d - P_{1+}^d P_{+1}^d - \delta_d > 0) \times I(P_{+1}^d - P_{1+}^d P_{+1}^d - \delta_d > 0) \times I(1 - P_{1+}^d - P_{+1}^d + P_{1+}^d P_{+1}^d + \delta_d > 0).$$
(9)

The feasible range of the correlation, determined by the marginal probabilities of testing positive $P_{1+}^d$ and $P_{+1}^d$ as in equation (5), is implemented through the addition of the four indicator functions I(·) in equation (9).

3.4 Homogeneous conditional dependence: Bayesian model averaging (BMA)

The homogeneous dependence models are saturated. Therefore, they provide the same goodness-of-fit for the data, but can provide substantively different estimates. Bayesian model averaging (BMA) provides a natural way to address this problem [30]. The posterior distribution of the quantity of interest Pd given data is

$$pr(P_d \mid \text{Data}) = \sum_{k=1}^{K} pr(P_d \mid M_k, \text{Data})\,pr(M_k \mid \text{Data}),$$
(10)

where $M_1, \ldots, M_K$ are the models considered, and the posterior probability for model $M_k$ is given by $pr(M_k \mid \text{Data}) = \frac{pr(\text{Data} \mid M_k)\,pr(M_k)}{\sum_{i=1}^{K} pr(\text{Data} \mid M_i)\,pr(M_i)}$, where $pr(\text{Data} \mid M_k) = \int pr(\text{Data} \mid \theta_k, M_k)\,pr(\theta_k \mid M_k)\,d\theta_k$ is the integrated likelihood of model $M_k$, $\theta_k$ is the vector of parameters of model $M_k$, $pr(\theta_k \mid M_k)$ is the prior density of $\theta_k$ under model $M_k$, $pr(\text{Data} \mid \theta_k, M_k)$ is the likelihood, and $pr(M_k)$ is the prior probability that $M_k$ is the true model (assuming one of the models considered is true). In this paper, we assume equal prior probabilities for the α, θ and ρ models, i.e., $pr(M_k) = 1/3$ for $k = 1, 2, 3$.
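A minimal sketch of how equation (10) combines the three models, assuming the integrated likelihoods have already been computed (all numbers below are hypothetical stand-ins, not values from the case studies):

```python
# Sketch of equation (10): combine per-model posterior summaries with
# posterior model probabilities. The integrated likelihoods are
# hypothetical stand-ins for pr(Data | M_k); equal priors pr(M_k) = 1/3.
marginal_lik = {"alpha": 2.0e-5, "theta": 1.5e-5, "rho": 0.5e-5}
prior = {m: 1 / 3 for m in marginal_lik}

z = sum(marginal_lik[m] * prior[m] for m in marginal_lik)
post_model = {m: marginal_lik[m] * prior[m] / z for m in marginal_lik}

# Hypothetical per-model posterior means of a prevalence P_d:
P_d = {"alpha": 0.05, "theta": 0.07, "rho": 0.01}
P_bma = sum(post_model[m] * P_d[m] for m in marginal_lik)
```

Note that when the three models are saturated and fit equally well, the integrated likelihoods differ only through the priors, so the BMA weights are driven largely by prior choices rather than by the data.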

In the Bayesian models discussed above, computation was done using Markov chain Monte Carlo (MCMC) [31] in WinBUGS [32] and BRugs in R (http://www.r-project.org). Burn-in consisted of 50,000 iterations; 50,000 subsequent iterations were used for posterior summaries. Convergence of the Markov chains was assessed using the Gelman and Rubin convergence statistic [33,34]. To describe disease class prevalence and to implement the constraints $0 < P_d < 1$ and $\sum_d P_d = 1$, we use the linear generalized logit model, which has inverse link function $P_d = \exp(\beta_d)/\sum_{d'}\exp(\beta_{d'})$ ($d = 1, 2, \ldots, K$) with $\beta_K = 0$ [26]. We selected proper but diffuse prior distributions for the hyperparameters [35]. Specifically, the hyper-priors for the parameters were assumed to be as follows: 1) vague priors of N(0, 10³) were assumed for the $\beta_d$s ($d = 1, 2, \ldots, K-1$) in the generalized logit transformed probabilities of disease classes $P_d$; 2) a uniform prior on [−1.0, 1.0] was assumed for the correlation coefficient ρ; 3) vague priors of N(0, 10³) were assumed for α and θ on the log scale to ensure α > 0 and θ > 0; and 4) vague priors of N(0, 10³) were assumed for the $a_d$s and $b_d$s in the α model to directly implement the homogeneous odds ratio assumption, and for the $P_{1+}^d$s and $P_{+1}^d$s on the logit scale for the θ and ρ models.

4. Two Case Studies

For the purpose of comparing the performance of the different models, we reanalyzed the data from two screening studies in which the disease status was evaluated only for those who tested positive on at least one of the two tests. The first study consists of data from the Health Insurance Plan Study for screening breast cancer in New York [36]. The study was carried out by the Health Insurance Plan, a prepaid comprehensive medical care plan with 750,000 subscribers enrolled in 31 medical groups. Periodic screening for breast cancer using mammography as well as clinical physical examination was performed for women aged 40 to 64 years who were chosen at random. In this study, 307 out of 20,211 women, who tested positive by either physical examination or mammography, underwent biopsy for the classification of two disease states: no cancer (d = 1) or cancer (d = 2). The second study is a multicenter study comparing cervicography with the standard Pap smear cytology test for detecting cervical cancer between November 1991 and December 1992 [37]. In this study, 228 out of 5,192 women, who tested positive by either cervicography or the standard Pap smear cytology test, underwent biopsy for the classification of three disease states: not present (d = 1), low grade (condyloma) (d = 2) and high grade (invasive cancer) (d = 3). Table 1 presents the observed frequencies in the two screening studies.

Table 1
Observed frequencies in two screening studies

Table 2 presents the estimates of the conditional dependence parameters (i.e., α, θ and ρ) using both the maximum likelihood method and the Bayesian method. We use the triple of percentiles, 2.5 / 50 / 97.5, to display a parameter estimate (or posterior median) with its 95% confidence (or credible) interval, as suggested by Louis and Zeger [38]. In summary, both approaches suggest statistically significant dependence under all three models for the two studies. Tables 3 and 4 present the estimated probabilities of the disease classes under the three homogeneous dependence models as well as under the independence model, using the maximum likelihood method and the Bayesian method, respectively. Twice the negative log-likelihood is presented in Table 3 for comparing the goodness-of-fit of the independence $\perp$ model and the homogeneous dependence α, θ, and ρ models, which demonstrates that the α, θ, and ρ models give exactly the same goodness-of-fit for both studies. In addition, the BMA estimates across the three conditional dependence models are presented in Table 4. In summary, the estimates were consistent between the maximum likelihood and Bayesian approaches except for the probabilities of not present and low grade cervical cancer in the multicenter study detecting cervical cancer using the ρ model, potentially due to the constraints implemented in the Markov chain Monte Carlo sampling. Specifically, the probability of low grade cervical cancer is estimated to be 255 (95% credible interval 54 to 883) per 1000 women using the Bayesian approach, but only 115 (95% confidence interval 0 to 308) per 1000 women using the maximum likelihood method.

Table 2
Summary of parameter estimates for conditional dependence using the maximum likelihood method and the Bayesian model. The triple notation $_L P^U$ denotes the point estimate P with 95% Wald-type confidence limits (L, U).
Table 3
Summary of parameter estimates using the maximum likelihood method under the assumption of homogeneous dependence. The triple notation $_L P^U$ denotes the point estimate P with 95% Wald-type confidence limits (L, U). The estimates of the probabilities of ...
Table 4
Summary of posterior estimates using the Bayesian approach under the assumption of homogeneous dependence. The triple notation $_L P^U$ denotes the posterior median P with 95% equal-tailed credible limits (L, U). The posterior estimates of the probabilities ...

As an interesting observation, we found that the difference between the estimated probabilities of disease states under different dependence assumptions (i.e., the α, θ or ρ model) can be statistically significant and practically meaningful. For example, in the Health Insurance Plan Study for breast cancer screening in New York, the estimated probability of having breast cancer using the maximum likelihood method is 48 (95% confidence interval 3 to 93) per one thousand women assuming the α model, while the estimate is 75 (28 to 122) per one thousand women assuming the θ model, and 7 (2 to 11) per one thousand women assuming the ρ model. The difference between the estimated probabilities of having breast cancer under the α and θ models is 27 (14 to 40) per one thousand women, with a p-value less than 0.001 by a Wald-type test. The non-overlapping 95% confidence intervals between the estimated probabilities under the ρ model and the α (or θ) model suggest a statistically significant difference at least at the 5% significance level. In addition, using the maximum likelihood approach, the estimated probability of having invasive high grade cervical cancer is 61 (28 to 94) per one thousand women in the multicenter study for detecting cervical cancer assuming the θ model, which is about eight times higher than the estimate of 8 (5 to 11) per one thousand women assuming the ρ model, and the 95% confidence intervals do not overlap. The Bayesian approaches gave similar inferences to the frequentist approaches. This substantial difference in the estimated probability of high grade cervical cancer can have an impact on cancer surveillance and prevention. Unfortunately, the data do not contain any information to differentiate these dependence models, since they all give the same goodness-of-fit.
Thus, without some sensible assumptions, the disease prevalence may not be estimable from the data, even with Bayesian model averaging, particularly if the models proposed in BMA do not contain the correct model (which is arguably the case in practice, given that infinitely many models exist and potentially many of them give the same goodness-of-fit).

5. Simulation Studies

To further study how the disease status probability estimates vary with the assumed dependence model and to evaluate the impact of misspecification of the dependence model on the estimation of the probabilities of disease classes, we performed four sets of simulations assuming the independence model and the α, θ, and ρ dependence models, respectively. For ease of presentation and interpretation, we considered two disease strata. The simulation parameters were: the probabilities of disease classes $P_d = (0.8, 0.2)$, the marginal conditional probabilities of test $T_1$ being positive $P_{1+}^d = (0.10, 0.25)$, and the marginal conditional probabilities of test $T_2$ being positive $P_{+1}^d = (0.05, 0.30)$. In the α and θ models, we used two values of α (or θ): 1.25 and 3.0. In the ρ model, we used two values of ρ: 0.2 and 0.6. The sample sizes considered were n = 5,000 and 25,000. For each combination of α (or θ, or ρ) and n, we generated 2,000 replicates. For each replicate, we computed the estimators under the independence model and the dependence α, θ, and ρ models, using both the maximum likelihood and Bayesian approaches. In addition, the BMA estimators across the three dependence models were computed. We used the true values of $P_d$, $P_{1+}^d$ and $P_{+1}^d$, with α = 1, θ = 1, and ρ = 0, as the starting values in the maximum likelihood optimization procedures and the Bayesian Markov chain Monte Carlo sampling procedures.
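One replicate of the ρ-model simulation can be sketched as follows (Python; the seed and helper names are our own choices for illustration, not the authors' code):

```python
import random

# Sketch of one simulation replicate under the rho model: draw a disease
# class, form the four cell probabilities from the class-specific margins
# and the common correlation, then record the test results per class.
def simulate(n, Pd, p1, p2, rho, seed=1):
    rng = random.Random(seed)
    counts = {(i, j): [0] * len(Pd) for i in (0, 1) for j in (0, 1)}
    for _ in range(n):
        d = rng.choices(range(len(Pd)), weights=Pd)[0]
        delta = rho * (p1[d] * p2[d] * (1 - p1[d]) * (1 - p2[d])) ** 0.5
        cells = {(1, 1): p1[d] * p2[d] + delta,
                 (1, 0): p1[d] * (1 - p2[d]) - delta,
                 (0, 1): (1 - p1[d]) * p2[d] - delta,
                 (0, 0): (1 - p1[d]) * (1 - p2[d]) + delta}
        ij = rng.choices(list(cells), weights=cells.values())[0]
        counts[ij][d] += 1
    x00 = sum(counts[(0, 0)])   # in practice only this margin is observed
    return counts, x00

counts, x00 = simulate(5000, [0.8, 0.2], [0.10, 0.25], [0.05, 0.30], 0.2)
```

With ρ = 0.2 and the stated margins, all four cell probabilities stay positive in both strata, so the draw is well defined; larger ρ values must respect the bounds in equation (5).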

Table 5 presents the means of the estimated disease prevalence across the 2,000 replicates, using both the maximum likelihood and Bayesian approaches. For the Bayesian models, posterior medians were used as estimates of disease prevalence for a single replicate. If the true underlying model is the conditional independence model, fitting the α, θ and ρ dependence models provides unbiased estimates of the disease prevalence. However, if the underlying model is one of the three dependence models, assuming independence provides biased estimates of disease prevalence. In addition, if the underlying model is a dependence model, assuming an incorrect dependence structure leads to biased estimates of disease prevalence. One interesting observation is that Bayesian model averaging (BMA) estimates tend to be less biased than the estimates under a misspecified dependence model. Furthermore, when the underlying model is the α dependence model, BMA leads to nearly unbiased estimates. For all scenarios, the maximum likelihood and Bayesian approaches provide similar estimates.

Table 5
The means of estimated disease prevalence (true value = 0.2) based on simulation studies with 2000 replicates. The bolded cells represent the correctly chosen model. For the Bayesian models, posterior medians were used as estimates for disease prevalence ...

Table 6 presents the average length of the 95% confidence/credible intervals, a measure of the precision of the disease prevalence estimates, across 2,000 replicates using the maximum likelihood and Bayesian approaches. We found that if the true underlying model is the conditional independence model, assuming the α and θ dependence models leads to intervals that are too wide. For example, the 95% confidence/credible interval length under the θ dependence model is about twice that under the true independence model. This suggests a substantive efficiency loss when conservatively assuming the α and θ dependence models. However, if the ρ dependence model is assumed, the average interval lengths are only slightly inflated. On the other hand, if the underlying structure is one of the three dependence models, assuming independence leads to intervals that are too narrow (and biased). In addition, if the underlying model is the θ model, incorrectly assuming the α and ρ dependence models also leads to underestimation of the interval length. Furthermore, the results are highly concordant between the maximum likelihood and Bayesian approaches. Note that the average interval lengths of the BMA estimates are generally larger than those under any single dependence model, regardless of whether the dependence model is correctly or incorrectly specified, because the BMA estimates incorporate the additional uncertainty from model specification.

Table 6
The 95% confidence/credible interval length of disease prevalence based on simulation studies with 2000 replicates. The bolded cells represent the correctly chosen model.

Table 7 presents the coverage of the 95% confidence/credible intervals of the disease prevalence across 2,000 replicates using both the maximum likelihood and Bayesian approaches. When the true underlying model is the conditional independence model, the coverage under a misspecified dependence model remains around 95%, likely because of the negligible bias and wider confidence/credible intervals under such misspecification, as suggested in Tables 5 and 6. However, if the underlying structure is one of the three dependence models, the coverage under misspecification decreases as the degree of dependence increases and as the sample size increases. In addition, the results suggest that if the underlying model is the ρ model, adequate coverage is difficult to attain when the model is misspecified. In general, the Bayesian 95% credible intervals show slightly better coverage than the maximum likelihood 95% confidence intervals. More importantly, the coverage of the BMA intervals generally exceeds 90%, a much better performance than that of the intervals from any single misspecified model. One reason for the better coverage of BMA is that its intervals are generally wider than those under any single model, and the true underlying model is included in the model averaging.

Table 7
The 95% confidence/credible interval coverage performance of disease prevalence based on simulation studies with 2000 replicates. The bolded cells represent the correctly chosen model.

6. Discussion

For screening studies where a categorical disease status is verified only if at least one of the two binary screening tests is positive, we investigated three homogeneous dependence models (i.e., the α, θ, and ρ models, of which the ρ model is proposed in this paper) with two case studies and four sets of simulation studies. If the true underlying model is the conditional independence model, assuming the α and θ dependence models leads to intervals that are too wide (the 95% confidence/credible interval length under the θ model can be twice that under the independence model), while the ρ dependence model inflates the average interval lengths only slightly. Through two real data analyses and simulation studies, we demonstrated that the three homogeneous dependence models can provide substantively different estimates for a given study despite identical goodness-of-fit. We discussed both the frequentist and Bayesian approaches, and evaluated the impact of model misspecification on the estimation of disease class probabilities. Furthermore, we discussed Bayesian model averaging as an alternative way to partially address this particularly challenging estimation problem. Although we focused on the inference of disease class probabilities in this article, the same conclusions apply to the inference of the cell probabilities for two negative tests, i.e., π00d, and the unknown cell frequency x00d. We did not discuss the impact of misspecification of dependence structures on the estimation of test accuracies because it has been well studied from a frequentist perspective [5,10,39]. It might be of interest to compare the performance of the frequentist and Bayesian approaches in estimating test accuracy parameters under different settings, such as low, moderate, and high sensitivities and specificities.

The results imply that large differences in the estimated disease class probabilities may occur when different dependence models are assumed, which can have a substantial impact on disease surveillance and prevention. Other, more robust statistical methods, e.g., generalized estimating equations [40,41], may be used to reduce the impact of misspecifying the dependence structure in this setting. We do not intend to suggest that these homogeneous dependence models are useless in practice simply because we cannot statistically differentiate between them based on the data alone; caution about possible misspecification should be balanced against the need to estimate disease status probabilities. Furthermore, we realize that there are many more potential dependence structures than those we have considered; e.g., one could argue that the tests are dependent only for the cases but independent for the controls [42]. Depending on the problem at hand, some assumptions may be justifiable and preferable. In addition, note that the indistinguishability of these models is based on goodness-of-fit statistics alone. One can always use additional information, such as expert opinion, historical information on the sensitivities and specificities of the two binary diagnostic tests, and/or the plausible range of the dependence parameters, to guide the selection of a homogeneous dependence model. For the Bayesian approach, such additional information can be formulated as informative priors to improve the posterior inference. However, how to elicit and formulate informative priors in this case deserves thorough investigation and is beyond our current scope.

A potential strategy to justify the homogeneity assumptions of the α, θ, and ρ models is to incorporate a design element into the screening study that allows selection among the homogeneous models, for example, randomly selecting a subset of subjects who are negative on both tests for ascertainment by the gold standard. However, when the gold standard test is invasive and/or expensive, it is generally considered unethical to apply it to subjects whose screening tests are negative. In this case, if historical data or an additional sample of confirmed cases and controls from a similar population is available for determining the test accuracy parameters, one can use these data to guide the selection of a homogeneous dependence model.

When there is no scientific justification to prefer a particular dependence model over the others, we suggest treating these dependence models (including the three homogeneous dependence models that we have considered) as sensitivity analyses and investigating how the dependence structure affects the estimation of the probabilities of the disease classes. If there is a clinically significant difference, caution should be taken with any statistical inference. As a last resort, if the dependence structure cannot be reasonably determined, Bayesian model averaging (BMA) may be preferable to any single model, but there is a heavy price to pay: 1) the computations become more complex, and 2) the credible intervals become much wider in some cases (to reflect the added uncertainty).

Assuming that the models used in the Bayesian averaging include the correctly specified model, the simulation results show that BMA inference generally performs better than any misspecified model alone, especially with respect to interval coverage. In practice, all candidate models can be misspecified, and thus one can argue that Bayesian model averaging may not be effective in reducing bias. Intuitively, if some models tend to overestimate and others tend to underestimate the parameters of interest, then Bayesian model averaging will be effective in reducing bias relative to a specific misspecified model. However, if all models tend to overestimate (or underestimate) the parameters of interest, or if the estimates from the incorrect models are far from the correct model's estimates, then Bayesian model averaging may not be effective in reducing bias. In addition, because the data contain no information to distinguish between the conditional dependence models, one should not expect the posterior model probabilities to be accurately estimated in practice, casting some doubt on the utility of the BMA estimate in this case.
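Conceptually, a BMA posterior for the prevalence is a mixture of the model-specific posteriors weighted by the posterior model probabilities. A minimal sketch, assuming the posterior draws and model weights are already available (all names and numerical values here are hypothetical):

```python
import numpy as np

def bma_mixture(draws_by_model, weights, size=10000, seed=0):
    """Mix model-specific posterior draws in proportion to the posterior
    model probabilities (weights), yielding draws from the BMA posterior."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(draws_by_model), size=size, p=weights)
    return np.array([rng.choice(draws_by_model[m]) for m in picks])

# hypothetical prevalence draws from the alpha, theta, and rho models
draws = [np.random.default_rng(1).normal(0.20, 0.02, 5000),
         np.random.default_rng(2).normal(0.23, 0.03, 5000),
         np.random.default_rng(3).normal(0.18, 0.02, 5000)]
mixed = bma_mixture(draws, [1 / 3, 1 / 3, 1 / 3])   # equal model weights
lo, hi = np.percentile(mixed, [2.5, 97.5])          # BMA credible interval
```

With the models centered at different values, the mixed interval (lo, hi) is wider than any single model's interval, mirroring the simulation finding that BMA trades interval length for coverage.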

In this article we considered only homogeneous dependence models, which are identifiable in our data setting. Some researchers [43,44] have argued that one can do better using a non-identifiable model with informative prior information than using a less realistic but identifiable model with strong assumptions. Further research on an expanded model, potentially with a heterogeneous dependence structure, may shed more light on the impact of prior misspecification versus model misspecification, and on the trade-off, for estimating disease prevalence in the setting we discussed, between an expanded non-identifiable model with weaker model assumptions but stronger prior assumptions and an identifiable model with stronger model assumptions but weaker prior assumptions.

We assumed that a perfect gold standard (or definitive) test exists, which may limit the applicability of the proposed methods, because arguably all diagnostic tests are imperfect and even those with theoretically perfect properties can be rendered imperfect by laboratory or human errors. It may be fruitful for further methodological research to incorporate measurement errors of the third-stage gold standard test, e.g., through a sensitivity analysis [45] or multiple imputation [46]. However, this is beyond our present scope.

Another important potential source of bias in the estimation of disease prevalence is selection bias with respect to who participates in the screening program. We acknowledge that the estimates from our method can be biased if those who participate in the screening program are not representative of the target population whose prevalence is being estimated. If information on who tends to participate in the screening program is available, further adjustment for the selection bias can be made by, e.g., multiple imputation or inverse probability weighting (i.e., weighting each participant by the inverse of his or her estimated probability of participating in the screening program).
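A minimal sketch of the inverse probability weighting idea, assuming the participation probabilities are known or already estimated (the synthetic data and all names are hypothetical):

```python
import numpy as np

def ipw_prevalence(disease, participated, p_participate):
    """Weight each participant by 1 / Pr(participation) so that the weighted
    prevalence targets the full population; non-participants get weight 0.
    In practice p_participate would come from a model of participation."""
    w = participated / p_participate
    return np.sum(w * disease) / np.sum(w)

rng = np.random.default_rng(0)
disease = rng.binomial(1, 0.2, 100_000)        # true prevalence 0.2
p_part = np.where(disease == 1, 0.4, 0.8)      # diseased participate less often
participated = rng.binomial(1, p_part)
naive = disease[participated == 1].mean()      # biased downward
adjusted = ipw_prevalence(disease, participated, p_part)  # close to 0.2
```

Here the naive participant-only prevalence understates the truth because diseased subjects participate less often, and the weighting recovers an approximately unbiased estimate.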

Acknowledgments

Haitao Chu was supported in part by the Lineberger Cancer Center Core Grant CA16086 from the U.S. National Cancer Institute. The authors are grateful to the editor and two anonymous referees for their constructive comments and suggestions which have greatly improved this manuscript.

Reference List

1. Böhning D, Patilea V. A capture-recapture approach for screening using two diagnostic tests with availability of disease status for the test positives only. Journal of the American Statistical Association. 2008;103:212–21.
2. Vacek PM. The effect of conditional dependence on the evaluation of diagnostic tests. Biometrics. 1985;41(4):959–68.
3. Torrance-Rynard VL, Walter SD. Effects of dependent errors in the assessment of diagnostic test performance. Statistics in Medicine. 1997;16(19):2157–75.
4. Dendukuri N, Joseph L. Bayesian approaches to modeling the conditional dependence between multiple diagnostic tests. Biometrics. 2001;57(1):158–67.
5. Albert PS, Dodd LE. A cautionary note on the robustness of latent class models for estimating diagnostic error without a gold standard. Biometrics. 2004;60(2):427–35.
6. Qu YS, Tan M, Kutner MH. Random effects models in latent class analysis for evaluating accuracy of diagnostic tests. Biometrics. 1996;52(3):797–810.
7. Qu YS, Hadgu A. A model for evaluating sensitivity and specificity for correlated diagnostic tests in efficacy studies with an imperfect reference test. Journal of the American Statistical Association. 1998;93(443):920–8.
8. Albert PS, McShane LM, Shih JH. Latent class modeling approaches for assessing diagnostic error without a gold standard: with applications to p53 immunohistochemical assays in bladder tumors. Biometrics. 2001;57(2):610–9.
9. Albert PS. Estimating diagnostic accuracy of multiple binary tests with an imperfect reference standard. Statistics in Medicine. 2009;28:780–97.
10. Albert PS, Dodd LE. On estimating diagnostic accuracy from studies with multiple raters and partial gold standard evaluation. Journal of the American Statistical Association. 2008;103(481):61–73.
11. Yang I, Becker MP. Latent variable modeling of diagnostic accuracy. Biometrics. 1997;53(3):948–58.
12. Espeland MA, Handelman SL. Using latent class models to characterize and assess relative error in discrete measurements. Biometrics. 1989;45(2):587–99.
13. Chu H, Chen S, Louis TA. Random effects models in a meta-analysis of the accuracy of two diagnostic tests without a gold standard. Journal of the American Statistical Association. 2009;104:512–23.
14. Black MA, Craig BA. Estimating disease prevalence in the absence of a gold standard. Statistics in Medicine. 2002;21(18):2653–69.
15. Hui SL, Walter SD. Estimating the error rates of diagnostic tests. Biometrics. 1980;36(1):167–71.
16. Walter SD. Estimation of test sensitivity and specificity when disease confirmation is limited to positive results. Epidemiology. 1999;10(1):67–72.
17. Chu H, Nie L. A few remarks on "A capture-recapture approach for screening using two diagnostic tests with availability of disease status for the test positives only" by Böhning and Patilea. Journal of the American Statistical Association. 2008;103:1518–9.
18. Satten GA, Kupper LL. Inferences about exposure-disease associations using probability-of-exposure information. Journal of the American Statistical Association. 1993;88(421):200–8.
19. Lyles RH. A note on estimating crude odds ratios in case-control studies with differentially misclassified exposure. Biometrics. 2002;58(4):1034–6.
20. Pepe MS, Janes H. Insights into latent class analysis of diagnostic test performance. Biostatistics. 2007;8(2):474–84.
21. Ibrahim JG, Lipsitz SR, Chen MH. Missing covariates in generalized linear models when the missing data mechanism is non-ignorable. Journal of the Royal Statistical Society, Series B. 1999;61(1):173–90.
22. Cheng H, Macaluso M, Waterbor J. Estimation of relative and absolute test accuracy. Epidemiology. 1999;10(5):566–7.
23. Pepe MS, Alonzo TA. Comparing disease screening tests when true disease status is ascertained only for screen positives. Biostatistics. 2001;2(3):249–60.
24. Hjort NL, Claeskens G. Frequentist model average estimators. Journal of the American Statistical Association. 2003;98(464):879–99.
25. Efron B, Tibshirani RJ. An Introduction to the Bootstrap. New York: Chapman & Hall/CRC; 1993.
26. Agresti A. Categorical Data Analysis. 2nd ed. John Wiley & Sons; 2002.
27. Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. Chapman & Hall/CRC; 1995.
28. Carlin BP, Louis TA. Bayes and Empirical Bayes Methods for Data Analysis. 2nd ed. Chapman & Hall/CRC; 2000.
29. Davidian M, Giltinan DM. Nonlinear models for repeated measurement data: an overview and update. Journal of Agricultural, Biological, and Environmental Statistics. 2003;8(4):387–419.
30. Hoeting JA, Madigan D, Raftery AE, Volinsky CT. Bayesian model averaging: a tutorial. Statistical Science. 1999;14(4):382–401.
31. Gelfand AE, Smith AFM. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association. 1990;85(410):398–409.
32. Spiegelhalter DJ, Best NG, Carlin BP, van der Linde A. Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B. 2002;64(4):583–639.
33. Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences. Statistical Science. 1992;7(4):457–72.
34. Brooks SP, Gelman A. General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics. 1998;7:434–55.
35. Natarajan R, McCulloch CE. Gibbs sampling with diffuse proper priors: a valid approach to data-driven inference? Journal of Computational and Graphical Statistics. 1998;7(3):267–77.
36. Strax P, Venet L, Shapiro S, Gross S. Mammography and clinical examination in mass screening for cancer of the breast. Cancer. 1967;20(12):2184.
37. De Sutter P, Coibion M, Vosse M, Hertens D, Huet F, Wesling F, et al. A multicentre study comparing cervicography and cytology in the detection of cervical intraepithelial neoplasia. British Journal of Obstetrics and Gynaecology. 1998;105(6):613–20.
38. Louis TA, Zeger S. Effective communication of standard errors and confidence intervals. Biostatistics. 2009; in press.
39. Pepe MS. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford: Oxford University Press; 2003.
40. Zeger SL, Liang KY. Longitudinal data analysis for discrete and continuous outcomes. Biometrics. 1986;42(1):121–30.
41. Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73(1):13–22.
42. van der Merwe L, Maritz JS. Estimating the conditional false-positive rate for semi-latent data. Epidemiology. 2002;13(4):424–30.
43. Gustafson P. On model expansion, model contraction, identifiability and prior information: two illustrative scenarios involving mismeasured variables. Statistical Science. 2005;20(2):111–29.
44. Gustafson P. The utility of prior information and stratification for parameter estimation with two screening tests but no gold standard. Statistics in Medicine. 2005;24(8):1203–17.
45. Chu H, Wang Z, Cole SR, Greenland S. Sensitivity analysis of misclassification: a graphical and a Bayesian approach. Annals of Epidemiology. 2006;16(11):834–41.
46. Cole SR, Chu H, Greenland S. Multiple-imputation for measurement-error correction. International Journal of Epidemiology. 2006;35(4):1074–81.
