NIH Public Access Author Manuscript; accepted for publication in a peer-reviewed journal.
J Off Stat. Author manuscript; available in PMC Dec 26, 2012.
Published in final edited form as:
PMCID: PMC3530169
NIHMSID: NIHMS381255

Model Averaging Methods for Weight Trimming in Generalized Linear Regression Models

Abstract

In sample surveys where units have unequal probabilities of inclusion, associations between the inclusion probability and the statistic of interest can induce bias in unweighted estimates. This is true even in regression models, where the estimates of the population slope may be biased if the underlying mean model is misspecified or the sampling is nonignorable. Weights equal to the inverse of the probability of inclusion are often used to counteract this bias. Highly disproportional sample designs have highly variable weights; weight trimming reduces large weights to a maximum value, reducing variability but introducing bias. Most standard approaches are ad hoc in that they do not use the data to optimize bias-variance trade-offs. This article uses Bayesian model averaging to create “data driven” weight trimming estimators. We extend previous results for linear regression models (Elliott 2008) to generalized linear regression models, developing robust models that approximate fully-weighted estimators when bias correction is of greatest importance, and approximate unweighted estimators when variance reduction is critical.

Keywords: Sample survey, sampling weights, weight winsorization, Bayesian population inference, weight pooling, variable selection, fractional Bayes Factors

1. Introduction

Population-based samples with differential probabilities of inclusion typically use case weights equal to the inverse of the probability of inclusion to reduce bias in the estimators of population quantities of interest (Horvitz and Thompson 1952). By replacing unweighted sums in statistics with their weighted equivalents, bias can be removed from linear estimators and reduced in nonlinear estimators (Binder 1983).

This bias reduction typically comes at the cost of increased variance. This increase can overwhelm the reduction in bias, so that the mean squared error (MSE) actually increases under a weighted analysis. This is particularly likely if (a) the sample size is small, (b) the difference in the probability of inclusion is large, or (c) the association between the probability of inclusion and the data (which drives the bias) is weak. Consider a population generated from

$$Y_i \mid X_i \sim \text{BERNOULLI}(p_i) \quad \text{for} \quad p_i = \frac{e^{A + BX_i + CX_i^2}}{1 + e^{A + BX_i + CX_i^2}}$$

while the superpopulation model of interest is the conditional distribution of Yi given Xi modeled by

$$Y_i \mid X_i \sim \text{BERNOULLI}(p_i) \quad \text{for} \quad p_i = \frac{e^{\alpha + \beta X_i}}{1 + e^{\alpha + \beta X_i}}$$

The superpopulation model is correctly specified when C = 0 and misspecified when C ≠ 0. We consider two sampling schemes: an ignorable sampling scheme that oversamples large values of Xi, and a nonignorable scheme that oversamples large values of Zi that are correlated with Yi. (In the regression setting, an ignorable sample scheme is one in which the inclusion indicator Ii is independent of Yi|Xi). We assume that the goal of the modeler is to describe the association between Y and X using the regression slope β from the superpopulation model. If the superpopulation model is correctly specified, the target quantity of interest could be either the superpopulation slope or the population slope defined by A, B such that

$$U\begin{pmatrix} A \\ B \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

for

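$$U\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{N} \frac{\partial}{\partial \alpha} \log f(y_i \mid \alpha, \beta) \\ \sum_{i=1}^{N} \frac{\partial}{\partial \beta} \log f(y_i \mid \alpha, \beta) \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{N} \left( y_i - \frac{e^{\alpha + \beta X_i}}{1 + e^{\alpha + \beta X_i}} \right) \\ \sum_{i=1}^{N} X_i \left( y_i - \frac{e^{\alpha + \beta X_i}}{1 + e^{\alpha + \beta X_i}} \right) \end{pmatrix}$$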

(the “corresponding descriptive population quantity” in Pfeffermann (1993)). If the superpopulation model is misspecified, then only the population slope makes sense as a target quantity. The unweighted maximum likelihood estimator and the case-weighted pseudo-maximum likelihood estimator of (A, B) are given by solving

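$$U\begin{pmatrix} \hat{\alpha} \\ \hat{\beta} \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{N} \frac{\partial}{\partial \alpha} S_i \log f(y_i \mid \alpha, \beta) \\ \sum_{i=1}^{N} \frac{\partial}{\partial \beta} S_i \log f(y_i \mid \alpha, \beta) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

$$U\begin{pmatrix} \hat{\alpha}_w \\ \hat{\beta}_w \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{N} \frac{\partial}{\partial \alpha} S_i w_i \log f(y_i \mid \alpha, \beta) \\ \sum_{i=1}^{N} \frac{\partial}{\partial \beta} S_i w_i \log f(y_i \mid \alpha, \beta) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$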

respectively, where Si is an indicator for inclusion in the sample, and wi = 1/πi.
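As an illustrative sketch (not part of the original analysis), the weighted score equations can be solved by Newton-Raphson for a single covariate. The function names, the toy data, and the inclusion probabilities `0.02 + 0.01 * x` are assumptions made for this example only; setting every weight to 1 gives the unweighted estimator.

```python
import math
import random

def expit(t):
    """Numerically safe logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def score(alpha, beta, x, y, w):
    """Weighted logistic score U(alpha, beta) summed over the sampled units."""
    u0 = sum(wi * (yi - expit(alpha + beta * xi)) for xi, yi, wi in zip(x, y, w))
    u1 = sum(wi * xi * (yi - expit(alpha + beta * xi)) for xi, yi, wi in zip(x, y, w))
    return u0, u1

def solve_score(x, y, w, iters=50, tol=1e-10):
    """Newton-Raphson root of the weighted score equations."""
    alpha, beta = 0.0, 0.0
    for _ in range(iters):
        u0, u1 = score(alpha, beta, x, y, w)
        # Weighted information matrix (negative Jacobian of the score).
        h00 = h01 = h11 = 0.0
        for xi, wi in zip(x, w):
            p = expit(alpha + beta * xi)
            v = wi * p * (1.0 - p)
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        da = (h11 * u0 - h01 * u1) / det
        db = (h00 * u1 - h01 * u0) / det
        alpha, beta = alpha + da, beta + db
        if abs(da) + abs(db) < tol:
            break
    return alpha, beta

# Hypothetical sample: large x is oversampled, so w = 1 / pi is largest for small x.
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [1 if rng.random() < expit(2 - 0.4 * xi) else 0 for xi in x]
w = [1.0 / (0.02 + 0.01 * xi) for xi in x]

a_u, b_u = solve_score(x, y, [1.0] * len(x))  # unweighted estimator
a_w, b_w = solve_score(x, y, w)               # case-weighted estimator
```

Only the relative sizes of the weights matter here: multiplying every weight by the same constant leaves the root of the score equations unchanged.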

Table 1 shows the results of an evaluation from 500 simulations for equivalent populations of N = 10,000, under correctly specified and misspecified models and ignorable and nonignorable sample designs, for sample sizes of both n = 50 and n = 500. The target quantity of interest is a logistic regression slope linearly relating a covariate X to the log-odds that Y = 1 for a dichotomous outcome Y in a population of N = 10,000; bias and mean squared error (MSE) are computed relative to this target quantity of interest. The correctly specified model has a linear association between log[P(Y = 1)/(1 − P(Y = 1))] and X, while the misspecified model has a quadratic association. The probability of selection is a function of X only for the ignorable sampling design, and a function of both X and Z, where Z ~ N(Y, 1), for the nonignorable sample design.

($\mathrm{Corr}(Y, Z) = \sqrt{\mathrm{Var}(Y) / (\mathrm{Var}(Y) + 1)}$,

which equals .41 for the correctly specified population model and .43 for the misspecified population model.) We assume that both X and Z are available for the entire population, but Y is only observable for the sampled elements. When the sample design is nonignorable for the population slope, the weighted population slope estimator βw accounts for the underrepresentation of smaller values of Y when X is small, reducing the negative bias in the slope (though the bias remains in small sample settings, as Table 1 shows). Nonetheless, when the sample is small, the mean squared error of the weighted estimator is larger than that of the unweighted estimator for both the correctly specified and misspecified superpopulation models. When the sample design is ignorable for the population slope (probability of selection depends only on X), use of the weighted estimator provides protection against model misspecification, but can introduce so much variability into the estimator that the MSE is larger than for the unweighted estimator under all of the conditions considered.

Table 1
% Bias (MSE in parentheses) for population slope for population generated under $Y_i \mid X_i \sim \text{BERNOULLI}(p_i)$ for $p_i = e^{A + BX_i + CX_i^2}/(1 + e^{A + BX_i + CX_i^2})$, i = 1, …, 10,000, and superpopulation model is given by Yi|Xi ~ BERNOULLI(pi) for pi = eα+βX ...

This article develops an alternative approach to weight trimming that considers the case weights as stratifying variables within strata defined by the probability of inclusion. These “inclusion strata” may correspond to formal strata from a disproportional stratified sample design, or may be “pseudo-strata” based on collapsed or pooled weights derived from selection, poststratification, and/or nonresponse adjustments. Ordering these weight strata by the inverse of the probability of selection and collapsing together the largest valued strata mimics weight trimming by assuming that the underlying data from these combined strata are exchangeable (conditional on any covariates of interest). In a regression setting, this model can be posed as a variable selection problem, where dummy variables for the inclusion strata interact with the regression parameters; subtracting from or adding to the inclusion strata design matrix allows for a greater or lesser degree of weight trimming. By averaging over all of these possible “weight pooling” models, we can compute an estimator of the population parameter of interest whose bias-variance trade-off is data-driven. By allowing for all contiguous inclusion strata to be considered for pooling, we induce a high degree of robustness into our model, protecting against the “overpooling” from which models that crudely mimicked weight trimming suffered (Elliott and Little 2000).

We embed this model in a Bayesian framework, as we believe it provides a natural setting for model averaging, as well as a proper framework for population inference. In particular, we consider an alternative Bayesian modeling approach that treats Y as a random variable and focuses on population quantities of interest Q(Y), such as population means $Q(Y) = \bar{Y}$ or population least-squares regression slopes $Q(Y_1, Y_2) = \min_{B_0, B_1} \sum_{i=1}^{N} (Y_{i1} - B_0 - B_1 Y_{i2})^2$. Inference is made about Q(Y) by considering the marginal posterior predictive distribution (Ericson 1969; Holt and Smith 1979; Skinner et al. 1989; Little 1993):

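$$p(Q(Y) \mid y) = \int f(Q(Y) \mid \theta)\, p(\theta \mid y)\, d\theta = \frac{\int f(Q(Y) \mid \theta)\, f(y \mid \theta)\, p(\theta)\, d\theta}{\int f(y \mid \theta)\, p(\theta)\, d\theta} \tag{1.1}$$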

If the sampling indicator I is independent of Y, as is the case in the probability sampling design, then the sampling mechanism is said to be unconfounded or noninformative (Little 2004), and inference about Q(Y) can be made using p(Q(Y)|y) alone. However, in order to make the assumption of unconfoundedness in (1.1) reasonable, the sample design needs to be accounted for in both the likelihood and prior model structures. For more detail about Bayesian survey inference in the context of regression models, see Elliott (2008).

Section 2 briefly reviews standard weight trimming methods. Section 3 develops our weight pooling models for generalized linear regression models. Section 4 provides simulation results to consider the repeated sampling properties of the weight pooling estimators of logistic regression parameters in a disproportional stratified sample design and compares them with standard design-based estimators. Section 5 illustrates the use of the weight pooling estimators in an analysis of risk of injury to children in passenger truck crashes. Section 6 summarizes the results of the simulations and considers extensions to more complex sample designs.

2. Standard Weight Trimming Procedures

Standard weight trimming approaches pick a single cutpoint w0 at which all weights larger than this value are to be fixed, with the remaining weights usually adjusted upward by a constant so that the trimmed and untrimmed weighted sample sizes are equal. Typically w0 is chosen in an ad hoc manner – say 3 times or 6 times the mean weight – without regard to whether the chosen cutpoint is optimal with respect to mean squared error. Other design-based methods have been considered in the literature. Potter (1990) discusses systematic methods for choosing w0, including the weight distribution and MSE trimming procedures. The weight distribution technique assumes that the weights follow an inverted and scaled beta distribution; the parameters of the inverse-beta distribution are estimated by method-of-moments estimators, and weights from the upper tail of the distribution, say where 1 − F(wi) < .01, are trimmed to w0 such that 1 − F(w0) = .01. The MSE trimming procedure (Cox and McGrath 1981) determines the empirical MSE at a variety of trimming levels t = 1, …, T under the assumption that the true population mean is given by the fully weighted estimate: $\widehat{MSE}_t = (\hat{\theta}_t - \hat{\theta}_T)^2 + \hat{V}(\hat{\theta}_t)$, where t = 1 corresponds to the unweighted data and t = T to the fully-weighted data, $\hat{\theta}_t$ is the value of the statistic using the trimmed weights at level t, and $\hat{V}(\hat{\theta}_t)$ is its estimated variance. The trimming level is then given by the level l that minimizes $\widehat{MSE}_t$ over t. More recently, the calibration literature has developed methods for adjusting design weights so that the adjusted weighted sample totals equal known population totals under a variety of minimizing distance constraints between the unadjusted and adjusted weights, thus generalizing poststratification and raking procedures (Deville and Särndal 1992). Techniques have been developed that allow these adjustments to be bounded to prevent the construction of extreme weights (Deville and Särndal 1992; Folsom and Singh 2000), but these bounds involve the winsorizing of extreme weights to a fixed cutpoint value, with the choice of this cutpoint remaining arbitrary.
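The ad hoc procedure described above can be sketched in a few lines. This is an illustration only, assuming a cutpoint of 3 times the mean weight and a single rescaling pass; the function name and the toy weights are hypothetical.

```python
def trim_weights(weights, cut_multiple=3.0):
    """Cap weights at w0 = cut_multiple * mean weight and rescale the rest.

    The untrimmed weights are multiplied by a constant gamma so that the
    trimmed weights sum to the same total as the original weights.
    """
    total = sum(weights)
    w0 = cut_multiple * total / len(weights)
    capped = [w >= w0 for w in weights]
    n_capped = sum(capped)
    rest = sum(w for w, c in zip(weights, capped) if not c)
    gamma = (total - n_capped * w0) / rest
    trimmed = [w0 if c else gamma * w for w, c in zip(weights, capped)]
    return trimmed, w0, gamma

# Hypothetical weights: one unit carries more than half of the total weight.
weights = [1.0] * 6 + [2.0] * 3 + [15.0]
trimmed, w0, gamma = trim_weights(weights)
```

In this toy case the single large weight is cut from 15 to 8.1 and the others are inflated by 1.575, so the weighted sample size is unchanged. A single pass can in principle push a rescaled weight above the cutpoint, in which case the step would need to be repeated.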

3. Weight Pooling Models

Weight trimming effectively pools units with large weights by assigning them a common, trimmed weight. Suppose the population can be divided into H weight strata by the set of ordered distinct values of the weights wh. Let nh be the number of included units and Nh the population size in weight stratum h, so that wh = Nh/nh for h = 1, …, H. We assume here that Nh is known, as when the weight strata come from a stratified or post-stratified random sample. The untrimmed (design-based) weighted mean estimator is then

$$\bar{y}_w = \frac{\sum_h \sum_i w_h y_{hi}}{\sum_h \sum_i w_h} = \sum_h \frac{N_h}{N} \bar{y}_h$$

Weight trimming typically proceeds by establishing an a priori cutpoint, say 3 for the normalized weights, and multiplying the remaining weights by a normalizing constant

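$$\gamma = \frac{N - \sum_i \kappa_i w_0}{\sum_i (1 - \kappa_i) w_i}$$

where $\kappa_i$ is an indicator variable for whether or not $w_i \ge w_0$. Writing l for the first weight stratum with $w_h \ge w_0$, the trimmed mean estimator is thus given by

$$\bar{y}_{wt} = \sum_{h=1}^{l-1} \gamma \frac{N_h}{N} \bar{y}_h + \sum_{h=l}^{H} \frac{w_0 n_h}{N} \bar{y}_h = \gamma \sum_{h=1}^{l-1} \frac{N_h}{N} \bar{y}_h + \frac{w_0 \sum_{h=l}^{H} n_h}{N} \bar{y}^{(l)}$$

where

$$\gamma = \frac{N - w_0 \sum_{h=l}^{H} n_h}{\sum_{h=1}^{l-1} N_h}$$

and

$$\bar{y}^{(l)} = \left( \frac{1}{\sum_{h=l}^{H} n_h} \right) \sum_{h=l}^{H} n_h \bar{y}_h$$

Choosing

$$w_0 = \frac{\sum_{h=l}^{H} N_h}{\sum_{h=l}^{H} n_h}$$

yields γ = 1 and

$$\bar{y}_{wt} = \sum_{h=1}^{l-1} \frac{N_h}{N} \bar{y}_h + \frac{\sum_{h=l}^{H} N_h}{N} \bar{y}^{(l)}$$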

which corresponds to the estimate for a model that assumes distinct stratum means for the smaller weight strata and a common mean for the larger weight strata, that is:

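$$y_{hi} \mid \mu_h \sim N(\mu_h, \sigma^2), \quad h < l$$

$$y_{hi} \mid \mu_l \sim N(\mu_l, \sigma^2), \quad h \ge l$$

$$\mu_h, \mu_l, \log \sigma \propto \text{const}$$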
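A small numerical check, with hypothetical stratum sizes and means, illustrates the correspondence: when the cutpoint is the ratio of pooled population size to pooled sample size, the trimmed-weight mean and the pooled-stratum mean coincide. The function names are assumptions of this sketch, and the cutpoint stratum l is 1-indexed with 2 ≤ l ≤ H.

```python
def trimmed_mean(N_h, n_h, ybar_h, l):
    """Trimmed weighted mean with cutpoint stratum l (1-indexed, l >= 2)."""
    N = sum(N_h)
    lo, hi = range(0, l - 1), range(l - 1, len(N_h))
    w0 = sum(N_h[h] for h in hi) / sum(n_h[h] for h in hi)
    gamma = (N - w0 * sum(n_h[h] for h in hi)) / sum(N_h[h] for h in lo)
    # Untrimmed strata keep weight gamma * N_h / n_h; trimmed strata get w0.
    num = sum(gamma * N_h[h] * ybar_h[h] for h in lo)
    num += sum(w0 * n_h[h] * ybar_h[h] for h in hi)
    return num / N, gamma

def pooled_mean(N_h, n_h, ybar_h, l):
    """Distinct means for strata below l, one common mean for strata l..H."""
    N = sum(N_h)
    lo, hi = range(0, l - 1), range(l - 1, len(N_h))
    ybar_pooled = sum(n_h[h] * ybar_h[h] for h in hi) / sum(n_h[h] for h in hi)
    return (sum(N_h[h] / N * ybar_h[h] for h in lo)
            + sum(N_h[h] for h in hi) / N * ybar_pooled)

# Hypothetical strata ordered by weight N_h / n_h = 10, 20, 40, 100.
N_h = [4000, 3000, 2000, 1000]
n_h = [400, 150, 50, 10]
ybar_h = [1.0, 2.0, 4.0, 7.0]

t_mean, gamma = trimmed_mean(N_h, n_h, ybar_h, l=3)
p_mean = pooled_mean(N_h, n_h, ybar_h, l=3)
```

With these numbers both estimators equal 2.35, compared with 2.5 for the fully weighted mean, and the normalizing constant γ equals 1.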

Elliott and Little (2000) considered an extension of this model where we no longer assume the cutpoint l is known:

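$$y_{hi} \mid \mu_h \sim N(\mu_h, \sigma^2), \quad h < l$$

$$y_{hi} \mid \mu_l \sim N(\mu_l, \sigma^2), \quad h \ge l$$

$$p(L = l) = 1/H$$

$$p(\sigma^2 \mid L = l) = \sigma^{-(l + 1/2)}$$

$$p(\beta \mid \sigma^2, L = l) = (2\pi)^{-l}$$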

where μ1 = β0 + β1, …, μl = β0 + βl−1. This “weight pooling” model averages the estimators obtained from all possible weight trimming cutpoints, where each estimator contributes to the final average based on the probability that the cutpoint is “correct.” This posterior probability is determined via Bayesian variable selection models that determine the posterior probability of each cutpoint model conditional on the observed data. Elliott (2008) extended this model to consider the conditional distribution of y given covariates x (linear regression), and to allow for the pooling of all conterminous inclusion strata. The latter extension greatly increased the robustness of the model, preventing “overpooling” and increased MSE relative to the fully-weighted estimator due to bias. Elliott (2008) also found that use of fractional Bayes factor priors (O’Hagan 1995) could substantially increase the efficiency of the weight pooling models with little effect on robustness.

3.1. Weight Pooling Models for Generalized Linear Regression

Generalized linear regression models postulate a likelihood for yi of the form

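$$f(y_i; \theta_i, \phi) = \exp\left[ \frac{y_i \theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi) \right]$$

where $a_i(\phi)$ involves a known constant and a (nuisance) scale parameter $\phi$, and the mean of $y_i$ is related to a linear combination of fixed covariates $x_i$ through a link function $g(\cdot)$: $E(y_i \mid \theta_i) = \mu_i$, where $g(\mu_i) = g(b'(\theta_i)) = \eta_i = x_i^T \beta$ (McCullagh and Nelder 1989, p. 30). We also have $\mathrm{Var}(y_i \mid \theta_i) = a_i(\phi) V(\mu_i)$, where $V(\mu_i) = b''(\theta_i)$. The link is canonical if $\theta_i = \eta_i$, in which case $g'(\mu_i) = V^{-1}(\mu_i)$.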

Indexing the inclusion stratum by h and allowing for the pooling of all conterminous inclusion strata, we have

$$g(E[y_{hi} \mid \beta_l, \sigma^2, L = l]) = Z_{li}^T \beta_l$$

where $Z_{li} = D_{hl} \otimes x_{hi}$ and where $D_{hl}$ is a vector of dummy variables that pool the appropriate conterminous inclusion strata based on the lth pooling pattern.

Figure 1 shows the set of pooling patterns when H = 4. (Note that patterns 2 and 5 correspond to a crude weight trimming procedure that simply cuts the maximum weight to $\sum_{h=l}^{H} N_h / \sum_{h=l}^{H} n_h$.)

Fig. 1
The set of $\{D_{hl}\}$ when four weight strata are present: all patterns of pooling coterminous strata

We assume priors of the form

$$\beta_l \mid L = l \sim N(\beta, \Sigma_0)$$

$$p(L = l) = 2^{-(H-1)}$$
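The prior mass $2^{-(H-1)}$ reflects the number of ways to pool adjacent strata: each of the H − 1 boundaries between neighboring strata is either kept or removed. The following sketch enumerates the patterns; the function name is hypothetical and the ordering produced here is arbitrary, so it need not match the pattern numbering in Figure 1.

```python
from itertools import product

def pooling_patterns(H):
    """All ways to pool adjacent strata 1..H into contiguous groups."""
    patterns = []
    for cuts in product([False, True], repeat=H - 1):
        groups, current = [], [1]
        # cuts[h - 2] is True when a boundary is kept between strata h - 1 and h.
        for h, cut in enumerate(cuts, start=2):
            if cut:
                groups.append(current)
                current = [h]
            else:
                current.append(h)
        groups.append(current)
        patterns.append(groups)
    return patterns

patterns = pooling_patterns(4)
prior_prob = 1.0 / len(patterns)  # uniform prior over pooling patterns
```

For H = 4 this yields 8 patterns, ranging from `[[1, 2, 3, 4]]` (all strata pooled, which corresponds to the unweighted analysis) to `[[1], [2], [3], [4]]` (no pooling, which corresponds to the fully weighted analysis).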

Our population quantity of interest B is the slope that solves the population score equation UN(B) = 0 where

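$$U_N(\beta) = \sum_{i=1}^{N} \frac{\partial}{\partial \beta} \log f(y_i; \beta) = \sum_{h=1}^{H} \sum_{i=1}^{N_h} \frac{\left( y_{hi} - g^{-1}(\mu_{hi}(\beta)) \right) x_{hi}}{V(\mu_{hi}(\beta))\, g'(\mu_{hi}(\beta))}$$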

The posterior predictive distribution of B is given by

$$p(B \mid y, X) = \sum_l \int p(B \mid y, X, \theta_l)\, p(\theta_l \mid y, X)\, d\theta_l$$

for θl = (βl, ϕ, L = l). Simulations from p(B|y, X) can be obtained by first obtaining a draw from p(θl|y, X) and then computing

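$$\sum_{h=1}^{H} W_h \sum_{i=1}^{n_h} \frac{\left( \hat{y}_{hi} - g^{-1}(\mu_{hi}(B)) \right) x_{hi}}{V(\mu_{hi}(B))\, g'(\mu_{hi}(B))} = 0$$

where $W_h = N_h/n_h$ and $\hat{y}_{hi} = g^{-1}(Z_{li}^T \beta_l)$. Thus, in the example of logistic regression, where $V(\mu_i) = \mu_i(1 - \mu_i)$ and $g'(\mu_i) = \mu_i^{-1}(1 - \mu_i)^{-1}$, a posterior draw of B can be computed by solving for $B_j$, j = 1, …, p

$$\sum_{h=1}^{H} W_h \sum_{i=1}^{n_h} x_{hij} \frac{\exp(x_{hij} B_j)}{1 + \exp(x_{hij} B_j)} = \sum_{h=1}^{H} W_h \sum_{i=1}^{n_h} x_{hij} \frac{\exp(x_{hij} \beta_{hj})}{1 + \exp(x_{hij} \beta_{hj})}$$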

where βhj corresponds to the jth value of the βl parameter for the hth inclusion stratum as a function of the lth pooling pattern. Thus βhj = βj for all h when l = 1 (i.e., βj = Bj for the unweighted estimator); βhj = β1j for h = 1, βhj = β2j for h > 1 when l = 2; and so forth. This can be accomplished via simple root-finding numerical methods such as Newton’s Method.
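The root-finding step can be sketched for a single coefficient as follows. This is an illustration under stated assumptions, not the author's implementation: the covariate values, stratum weights, and stratum-specific coefficients are hypothetical, the covariate is taken to be positive, and `solve_B` is a name introduced here.

```python
import math

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def solve_B(x_h, W_h, beta_h, iters=100, tol=1e-12):
    """Newton's method for the scalar B that matches the weighted predicted totals."""
    # Right-hand side: weighted totals predicted with stratum-specific slopes.
    target = sum(W * sum(x * expit(x * b) for x in xs)
                 for xs, W, b in zip(x_h, W_h, beta_h))
    B = 0.0
    for _ in range(iters):
        f = sum(W * sum(x * expit(x * B) for x in xs)
                for xs, W in zip(x_h, W_h)) - target
        fprime = sum(W * sum(x * x * expit(x * B) * (1.0 - expit(x * B)) for x in xs)
                     for xs, W in zip(x_h, W_h))
        step = f / fprime
        B -= step
        if abs(step) < tol:
            break
    return B

# Hypothetical draw: three inclusion strata with their own slopes.
x_h = [[0.5, 1.0, 1.5], [1.0, 2.0], [0.5, 1.5, 2.0]]
W_h = [10.0, 20.0, 40.0]
beta_h = [0.2, 0.5, 1.0]

B_draw = solve_B(x_h, W_h, beta_h)
B_same = solve_B(x_h, W_h, [0.7, 0.7, 0.7])
```

When every stratum shares the same coefficient, as under the fully pooled pattern, the solution is that coefficient; otherwise B lies between the smallest and largest stratum-specific values, pulled toward the strata with the largest weighted contribution.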

A direct draw of p(θl|y, X) is not generally possible outside of the Gaussian setting. We approximate a direct draw by using a Laplace approximation to obtain a draw from p(L = l|y, X) and a Metropolis step to obtain a draw from p(βl|L = l, y, X); alternatively, a Metropolis step (Gelman et al. 2004, pp. 289–290) may be used to obtain draws from p(L = l|β, y, X), with a Markov Chain Monte Carlo algorithm implemented instead. See the Appendix for details.

3.2. Fractional Bayes Factors

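In the absence of strong prior information to define p(θl), the Bayes Factors comparing weight pooling model l with weight pooling model l′

$$BF(y, X) = \frac{p(L = l \mid y, X)}{p(L = l' \mid y, X)} = \frac{p(y \mid L = l, X)\, p(L = l)}{p(y \mid L = l', X)\, p(L = l')} = \frac{\int\!\!\int p(y \mid \beta_l, \sigma^2, L = l, X)\, d\beta_l\, d\sigma^2\; p(L = l)}{\int\!\!\int p(y \mid \beta_{l'}, \sigma^2, L = l', X)\, d\beta_{l'}\, d\sigma^2\; p(L = l')}$$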

can be quite sensitive to the choice of p(θl) (Kass and Raftery 1995). We have a similar issue in our weight pooling model, since our marginal pooling probabilities are simply Bayes Factors converted from the odds to the probability scale. To counter this, we consider the “fractional Bayes factor” approach proposed in O’Hagan (1995). A fraction b of the sample is set aside so as to provide a data-based proper prior for θl. O’Hagan (1995) shows that the resulting Bayes factor for comparing model l with model l′ using the data-based prior, which he terms a fractional Bayes factor (FBF), is of the form $BF_b(y, X) = q_l(b, y, X)/q_{l'}(b, y, X)$, where

$$q_l(b, y, X) = \frac{\int p(\theta_l)\, f(y \mid \theta_l)\, d\theta_l}{\int p(\theta_l)\, f(y \mid \theta_l)^b\, d\theta_l}$$

Small values of b should be most efficient when choosing correct models, while larger values of b are protective against outliers (data generated under a model not in the classes considered). O’Hagan proposed $n^{-1} \log n$ and $n^{-1/2}$ as increasingly “robust” choices of b. O’Hagan assumes a noninformative prior h(θl) in contrast to our proper prior, but very weakly informative priors, such as we use in simulations and examples below, can be used as well. The Appendix provides details describing the use of FBF in the weight pooling application.
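To make the ratio concrete, consider a deliberately simple stand-in for the weight pooling comparison: two strata of binary outcomes with no covariates, a pooled model with one success probability, and a separate model with one probability per stratum. Under the assumption of independent Uniform(0, 1) priors, both integrals in $q_l$ are Beta functions, so the FBF has a closed form. The function names and data are hypothetical, and this toy model is not the logistic regression model used in the article.

```python
import math

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_q(groups, b):
    """log q_l(b): full marginal likelihood over fractional marginal likelihood.

    Each group is (successes, trials) with its own Bernoulli probability and an
    independent Uniform(0, 1) prior, so both integrals are Beta functions.
    """
    total = 0.0
    for y, n in groups:
        total += log_beta(y + 1, n - y + 1)            # integral of f(y | theta)
        total -= log_beta(b * y + 1, b * (n - y) + 1)  # integral of f(y | theta)^b
    return total

def log_fbf_pooled_vs_separate(strata, b):
    """Log fractional Bayes factor of the pooled model against the separate model."""
    y = sum(s[0] for s in strata)
    n = sum(s[1] for s in strata)
    return log_q([(y, n)], b) - log_q(strata, b)

n_total = 200
b = n_total ** -0.5  # training fraction n^(-1/2)

similar = log_fbf_pooled_vs_separate([(50, 100), (50, 100)], b)
different = log_fbf_pooled_vs_separate([(20, 100), (80, 100)], b)
```

When the two strata have the same observed proportion the log FBF is positive, favoring pooling; when the proportions differ sharply it is strongly negative, favoring separate parameters.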

4. Simulation Results

Because we desire models that are simultaneously more efficient than design-based estimators yet reasonably robust to model misspecification – and in general we feel that even Bayesian models should have good frequentist properties – we evaluate our proposed models in a repeated sampling context. We consider two settings where weights can be utilized to reduce bias in generalized linear regression models: model misspecification and informative sampling.

4.1. Model Misspecification

We consider logistic regression under a correctly specified and then under an increasingly misspecified model. We generate population data as follows:

$$Y_i \mid X_i \sim \text{BERNOULLI}(\operatorname{expit}(2 - .4 X_i + C X_i^2))$$

$$X_i \sim \text{UNIFORM}(0, 10), \quad i = 1, \ldots, N = 20{,}000$$

where expit(·) = exp(·)/(1 + exp(·)). The object of the analysis is to obtain an estimate of the logistic population regression slope, defined as the value B1 in the equation

$$\sum_{i=1}^{N} \left( y_i - \operatorname{expit}(B_0 + B_1 x_i) \right) \begin{pmatrix} 1 \\ x_i \end{pmatrix} = 0$$

A disproportional sampling scheme is implemented as described in the linear regression simulations. We consider values of C = 0, .0158, .0273, .0368, .0454, corresponding to curvature measures of K = 0, .02, .04, .06, .08 at the midpoint 5 of the support for X, where

$$K(X; C) = \left| \frac{2C}{\left[ 1 + (2CX - .75)^2 \right]^{3/2}} \right|$$
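The correspondence between C and K can be checked directly from this formula. The sketch below evaluates the curvature at the midpoint X = 5; the constant −.75 is taken from the formula as printed, and the function name is an assumption of this example.

```python
def curvature(x, c, slope=-0.75):
    """Curvature of a + slope * x + c * x**2 at x; the intercept a does not matter."""
    first = 2.0 * c * x + slope
    return abs(2.0 * c) / (1.0 + first ** 2) ** 1.5

C_values = [0.0, 0.0158, 0.0273, 0.0368, 0.0454]
K_values = [curvature(5.0, c) for c in C_values]
K_rounded = [round(k, 2) for k in K_values]
```

Rounded to two decimals, the five values of C give K = 0, .02, .04, .06, and .08.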

200 simulations are generated for each value of C. A noninformative, disproportionally stratified sampling scheme sampled elements as a function of Xi (Ii equals 1 if sampled and 0 otherwise):

Hi=Xi

P(Ii=1|Hi)=πh(1+Hi/2.5)Hi

A total of n = 1,000 elements were sampled for each simulation (maximum normalized weight ≈ 7.5).

For priors, we consider a nearly noninformative prior of the form $\beta_l \mid L = l \sim N_2(0, 225I)$, which assumes that the logistic regression parameters lie between −30 and +30 with approximately 95% probability. We term the estimator of B1 obtained under this model PWT. We also consider the fractional Bayes factor data-based prior: PWTF1, which uses a training fraction of $n^{-1/2}$, and PWTF2, which uses a larger training fraction of 0.1. O’Hagan suggests that PWTF1 will be more efficient when choosing the correct model when the true model is among the models considered, whereas PWTF2 will be more robust (have better repeated sampling properties when the true model is not among the models considered).

In addition to these three weight pooling models, we consider the standard design-based (fully weighted) estimator (FWT), as well as trimmed weight (TWT) and unweighted (UNWT) estimators. The TWT estimator is obtained by replacing the weights $w_{hi}$ with trimmed values $w_{hi}^t$ that set the maximum normalized value to 3

$$w_{hi}^t = \frac{N \tilde{w}_{hi}^t}{\sum_{h=1}^{H} n_h \tilde{w}_h^t}$$

where $\tilde{w}_{hi}^t = \min(w_{hi}, 3N/n)$, and the UNWT estimator is obtained by fixing $w_{hi} = N/n$ for all h, i. We estimate their variance using the Taylor Series (linearization) approximation (Binder 1983) that accounts for weighting and stratification.

Table 2 shows the relative bias, RMSE relative to the RMSE of the fully-weighted estimator, and true coverage of the nominal 95% CIs or PPIs associated with each of the six estimators of the population slope (B) for different values of curvature K, corresponding to increased degrees of misspecification.

Table 2
Relative bias (%), square root of mean squared error (RMSE) relative to RMSE of fully-weighted estimator, and true coverage of the 95% CI or PPI of the population logistic regression slope estimator under model misspecification

The undersampling of small values of X means that the maximum likelihood estimator of B in the model misspecification setting will be unbiased for K = 0 and biased downward for K = .02, .04, .06 unless the sample design is accounted for. The trimmed estimator’s bias is intermediate between the unweighted and fully weighted estimator. The weight pooling estimator with a noninformative prior, similar to the fully weighted estimator, showed little bias. The weight pooling FBF estimator with the smaller training fraction (PWTF1) had bias similar to the trimmed weight estimator, while the weight pooling FBF estimator with the larger training fraction (PWTF2) had bias similar to the fully weighted estimator.

The unweighted estimator had substantially improved MSE (40% reduction) when the linear slope model was approximately correctly specified, but was highly biased with a moderate to large degree of misspecification. The weight pooling estimator with a noninformative prior had MSE very similar to the fully weighted estimator. The trimmed weight estimator dominated the standard fully-weighted estimator over the range of simulations considered, with MSE savings of 10–35%. The weight pooling estimators with the fractional Bayes factors had MSE reductions of more than 40% when the linear slope model was approximately correctly specified, and both were robust against model misspecification.

The unweighted estimator had poor coverage except when the linear slope model was correctly specified, or nearly so. Because of the modest sample size, the fully weighted estimator had somewhat below nominal coverage both when the model was correctly specified and when the model was badly misspecified; the trimmed weight estimator had below nominal coverage only when the linear model was misspecified. The weight pooling estimator with noninformative prior had below nominal coverage regardless of model misspecification. The fractional Bayes factor estimators generally had correct to somewhat conservative coverage for all ranges of model specification.

4.2. Informative Sampling

For the informative sampling setting, we consider a correctly specified mean model, but allow the probability of selection to be related to a known covariate that is correlated with the outcome Y. The stronger the association, the more informative the sample design will be. We generated population data as follows:

$$Y_i \mid X_i \sim \text{BERNOULLI}(\operatorname{expit}(2 - .4 X_i))$$

$$X_i \sim \text{UNIFORM}(0, 10), \quad i = 1, \ldots, N = 20{,}000$$

As in the previous simulations, the object of the analysis is to obtain the logistic population regression slope, defined as the value B1 in the equation

$$\sum_{i=1}^{N} \left( y_i - \operatorname{expit}(B_0 + B_1 x_i) \right) \begin{pmatrix} 1 \\ x_i \end{pmatrix} = 0$$

An unequal probability of selection scheme is utilized:

(P(Ii|Yi,Xi)exp(.5Zi/(Xi/10+.25)1)), where $Z_i \sim N(Y_i, 10^l)$

where l = −0.5, −0.25, 0, 0.25, 0.5. The resulting asymptotic correlations between Y and Z are .664, .555, .447, .351, and .271. The maximum normalized weight ranges from approximately 7 for l = −.5 to 50 for l = 0.5. A total of n = 1,000 elements were sampled for each simulation. The priors, including the fractional Bayes factors, are identical to those used in the model misspecification simulations. The unweighted, fully weighted, and trimmed weighted estimators are also considered along with the weight pooling estimators. The FBF estimators again used training fractions of $n^{-1/2}$ for PWTF1 and .1 for PWTF2.
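The reported correlations can be reproduced from the noise variance alone. The sketch below assumes that Z equals Y plus independent normal noise with variance $10^l$ and that Var(Y) = .25, which holds when P(Y = 1) = .5 in the population; the function name is hypothetical.

```python
import math

def corr_yz(var_y, l):
    """Corr(Y, Z) when Z = Y + noise and the noise variance is 10**l."""
    return math.sqrt(var_y / (var_y + 10.0 ** l))

l_values = (-0.5, -0.25, 0.0, 0.25, 0.5)
correlations = [round(corr_yz(0.25, l), 3) for l in l_values]
```

Larger values of l add more noise to Z, so selection depends less on Y and the design moves closer to ignorable.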

Table 3 shows the relative bias, RMSE relative to the RMSE of the fully-weighted estimator, and true coverage of the nominal 95% CIs or PPIs associated with each of the six estimators of the population slope. The results were overall similar to what was observed in the model misspecification setting. The unweighted estimator had substantial bias that was only relieved when sampling was more nearly ignorable (modest correlation between Y and Z). The trimmed weight and weight pooling estimators both had very modest degrees of bias, and had equal or improved MSEs as compared with the fully weighted estimator with little reduction in coverage. Both the trimmed weighted and the fractional Bayes factor weight pooling estimators had substantial reductions in MSE as compared with the fully-weighted estimator when the sampling was more nearly ignorable.

Table 3
Relative bias (%), square root of mean squared error (RMSE) relative to RMSE of fully-weighted estimator, and true coverage of the 95% CI or PPI of population logistic regression slope estimator under informative sampling. Correlation between dichotomous ...

5. Application: Estimation of Injuries to Children in Compact Extended-Cab Pickup Trucks

The Partners for Child Passenger Safety dataset consists of a disproportionate, known-probability sample from all State Farm Insurance claims since December 1998 involving at least one child occupant ≤15 years of age riding in a model year 1990 or newer State Farm-insured vehicle (Durbin et al. 2001). Because injuries, and especially “consequential” injuries defined as facial lacerations or other injuries rated 2 or more on the Abbreviated Injury Scale (AIS) (Association for the Advancement of Automotive Medicine 1990), are relatively rare even among children in the population of crash-related vehicle damage claims, a disproportional probability-of-inclusion sample design is utilized. The weights for this dataset are quite variable: 1 ≤ wi ≤ 50, where 9% of the weights have normalized values larger than 3. Because the treatment stratification is imperfectly associated with risk of injury (more than 15% of the population with consequential injuries are estimated to be in the lowest probability-of-selection category and nearly 20% of those without consequential injuries are in the highest probability-of-selection category), the sampling design is informative, with unweighted odds ratios biased toward the null (Korn and Graubard 1995).

Winston et al. (2002) determined that children rear-seated in compact extended-cab pickups are at greater risk of consequential injuries than children rear-seated in other vehicles. However, quantifying the degree of excess risk, and thus the size of the public health problem, was somewhat problematic. The unweighted odds ratio (OR) of consequential injury for children riding in compact extended-cab pickups versus other vehicles was 3.54 (95% CI 2.01,6.23), versus the fully weighted estimator of 11.32 (95% CI 2.67,48.02). Because both injury risk and compact extended-cab pickup use were associated with child age, crash severity (passenger compartment intrusion and drivability), direction of impact, and vehicle weight, a multivariate logistic regression model that adjusted for these factors was also considered. The unweighted and fully weighted adjusted ORs for injury risk in rear-seated children in compact extended-cab pickups versus other vehicles are 3.50 (95% CI 1.88,6.53) and 14.56 (95% CI 3.45,61.40), respectively. Utilizing the unweighted estimator was problematic because of bias toward the null induced by the informative sample design; however, the fully weighted estimator appeared to be highly unstable. In Winston et al. (2002), injured children with high weights were deleted, yielding an estimated OR for injury risk of 3.87 (95% CI 1.42,10.57), close to the unweighted estimator.

Table 4 shows the unadjusted and the adjusted odds ratios of consequential injury risk using the unweighted, fully weighted and pooled weighted estimators of OR of injury for children in compact pickup trucks versus other vehicles. Pooled weighted estimators are obtained using both the noninformative and fractional Bayes factor priors. The noninformative pooled estimator generally falls between the unweighted and the fully weighted estimator, and reduces the unrealistically large 95% upper limit of the OR from 50–60 down to 19–20. The FBF pooled estimator with fraction $n^{-1/2}$ reduced the unadjusted OR to 6.75 (95% CI 2.51,17.99), intermediate between the unweighted and fully weighted estimator, but provided an adjusted OR estimator similar to that of the unweighted estimator. The FBF pooled estimator with fraction 0.1 reduced the unadjusted OR to 7.6 (95% CI 2.6,21.8) and the adjusted OR to 7.0 (95% CI 2.8,18.2).

Table 4
Estimated odds ratio of injury for children rear-seated in compact extended-cab pickups (n = 60) versus rear-seated in other vehicles (n = 8,060), using unweighted (UNWT), fully weighted (FWT), and pooled weight (PWTF2) estimators; unadjusted and adjusted ...

An additional two years of data, which included an additional 4,091 rear-seated children in passenger vehicles (44 in compact extended-cab pickup trucks), provided a fully weighted unadjusted odds ratio for injury for children in compact extended-cab pickups of 6.3, and an adjusted OR of 7.0 – very close to the FBF pooled estimator results with fraction 0.1.

6. Discussion

The model discussed in this article generalizes the work of Elliott and Little (2000) and Elliott (2008). In the former, population inference was restricted to population means using a weight pooling model that mimicked weight trimming; in the latter, population inference was restricted to the estimation of population slopes under the Gaussian error model. As in Elliott (2008), we consider a model that permits the pooling of all conterminous inclusion strata, as well as utilizing the data-based “fractional Bayes Factors” of O’Hagan (1995). Here we extend the weight pooling method to consider population regression slopes under the generalized linear model, allowing for regression models for binary, count, or other outcomes with nonnormal error terms. We obtained robust estimators that can still gain considerable efficiencies as compared with standard fully weighted estimators. The crude trimming estimators also performed well among the simulations considered, although previous work (Elliott and Little 2000; Elliott 2007) shows that these estimators are generally not robust.

We also applied the methods developed in this article to the Partners for Child Passenger Safety data to determine the excess risk of injury in a crash to rear-seated children in compact extended-cab pickups relative to rear-seated children in other passenger vehicles. It appears that the decision in Winston et al. (2002) to eliminate a low probability-of-selection child from the analysis to stabilize the estimates led to an underestimation of the effect of exposure. The pooled estimates suggested adjusted risks of 7.2, versus the 14.6 obtained from the fully weighted sample – very close to the estimate of 7.0 that was obtained using an additional two years of data.

More generally, the methods discussed in this article show the promise of adapting model-based methods to attack problems in survey data analysis. However, because these models rely on stratifying the data by probability of selection as a prelude to using pooling or shrinkage techniques to induce data-driven weight trimming, there is a natural correspondence between this methodology and (post)stratified sample designs in which strata correspond to disproportional probabilities of inclusion. Developing methods that accommodate a more general class of complex sample designs that include single- or multi-stage cluster samples and/or strata that “cross” the weight strata remains an area for future work.

Acknowledgments

This research was supported by National Institute of Heart, Lung, and Blood grant R01-HL-068987-01. The author acknowledges Jack Chen for his assistance with programming. The author also thanks Drs. Dennis Durbin and Flaura Winston of the Partners for Child Passenger Safety project for their assistance, as well as State Farm Insurance Companies for their support of the Partners for Child Passenger Safety project. The author thanks the Associate Editor and three anonymous reviewers for their assistance in improving the quality of the article.

7. Appendix: Obtaining the Posterior Distribution of Parameters from Weight Pooling Models

To simplify notation, we assume that the generalized linear model of interest does not contain an unknown scaling parameter ϕ (i.e., logistic or Poisson regression models where ϕ = 1 or is fixed at an overdispersion value treated as known). Straightforward extensions of all algorithmic steps can accommodate ϕ.

7.1. Simulations from the Generalized Weight Pooling Model Using Direct Draws

Draws from p(βl, L = l|y, X) = p(βl|L = l, y, X)p(L = l|y, X) can be made by drawing first from p(L = l|y, X) using a Laplace approximation (Tierney and Kadane 1986) to obtain f (y|L = l, X) and then a Metropolis step for p(βl|L = l, y, X).

Note that

$$p(L = l \mid y, X) = \frac{f(y \mid L = l, X)}{\sum_{l'} f(y \mid L = l', X)} \tag{7.1}$$

where

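$$f(y \mid X, L = l) = \int f(y \mid X, \beta_l, L = l)\, p(\beta_l \mid L = l)\, d\beta_l \approx \int f(y \mid X, \beta_l, L = l)\, d\beta_l \approx (2\pi)^{(pH^*)/2}\, |\hat{\Sigma}_{\hat{\beta}_l}|^{1/2}\, f(y \mid X, \hat{\beta}_l, L = l)$$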

where $\hat{\beta}_l$ is the MLE of a GLM regressing y on $Z_l$, where $Z_l$ consists of the stacked row vectors $Z_{li}^T$, and $\hat{\Sigma}_{\hat{\beta}_l}$ is the associated covariance matrix estimate for $\hat{\beta}_l$ given by the inverse of the expected information matrix. The first approximation follows from assuming a noninformative or nearly noninformative prior on βl|L = l, and the second from the Laplace approximation to the true marginal distribution of y.
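As a rough illustration of how (7.1) and the Laplace approximation combine, the sketch below computes approximate log marginal likelihoods and normalizes them into model probabilities. The function names and the fitted summaries are hypothetical, chosen only for illustration; equal prior model probabilities are assumed, as (7.1) implies, and the work is done on the log scale to avoid underflow.

```python
import math

def log_laplace_marginal(loglik_at_mle, logdet_cov, dim):
    """Laplace approximation to log f(y | L = l, X) under a flat prior.

    loglik_at_mle: log f(y | X, beta_hat_l, L = l)
    logdet_cov:    log-determinant of the estimated covariance of beta_hat_l
    dim:           number of regression parameters under model l
    """
    return 0.5 * dim * math.log(2.0 * math.pi) + 0.5 * logdet_cov + loglik_at_mle

def posterior_model_probs(log_marginals):
    """Normalize as in (7.1), assuming equal prior model probabilities."""
    top = max(log_marginals)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in log_marginals]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical summaries for three candidate pooling models:
# (maximized log-likelihood, log-determinant of covariance, dimension)
fits = [(-104.0, -6.0, 6), (-105.5, -3.5, 4), (-109.0, -1.5, 2)]
log_marginals = [log_laplace_marginal(*fit) for fit in fits]
probs = posterior_model_probs(log_marginals)
```

A model index can then be drawn from `probs` before the coefficients for that model are sampled.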

Draws from p(βl|L = l, y, X) are made by running a Metropolis algorithm (Metropolis et al. 1953; Gelman et al. 2004, pp. 289–290). Briefly, a Metropolis step is a single draw from the Metropolis algorithm, which generates a draw from a general distribution $p(\theta \mid y)$ by sampling a proposed $\theta^*$ from a “jumping” distribution $J_t(\theta^* \mid \theta^{(t-1)})$, where $J_t$ is symmetric ($J_t(\theta_b \mid \theta_a) = J_t(\theta_a \mid \theta_b)$). The proposal draw is then accepted as a draw of $\theta^{(t)}$ with probability $\min(1, p(\theta^* \mid y)/p(\theta^{(t-1)} \mid y))$; otherwise $\theta^{(t)} = \theta^{(t-1)}$. In this setting we use a $N(\beta^{(t-1)}, k\hat{\Sigma}_{\hat{\beta}_l})$ jumping distribution, where k is a tuning factor chosen to obtain an acceptance rate of 20–30%. The algorithm starts at $\beta_l^{(0)} = \hat{\beta}_l$, and a proposal draw $\beta_l^{prop} = \hat{\beta}_l + e$, $e \sim N(0, k\hat{\Sigma}_{\hat{\beta}_l})$, is made; $\beta_l^{(1)} = (1 - u)\beta_l^{(0)} + u\beta_l^{prop}$, where u is a Bernoulli random variable with probability

$$\min\left(1, \frac{f(y \mid X, \beta_l^{prop}, L = l)}{f(y \mid X, \beta_l^{(0)}, L = l)}\right)$$

The algorithm proceeds until a sufficient number of draws T have been made to approximate the posterior distribution. In general, k = 0.1 and T = 200 provided reasonable acceptance rates and sufficient coverage of the posterior interval.
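The Metropolis scheme described above can be sketched as follows. This is a minimal illustration and not the author's code: it assumes a flat prior, so the acceptance ratio reduces to a likelihood ratio, and it uses a small pure-Python Cholesky helper to draw the Gaussian jumps. The names `cholesky` and `metropolis`, the toy counts, and the intercept-only Poisson model are hypothetical choices made only to keep the example self-contained.

```python
import math
import random

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def metropolis(loglik, beta_hat, cov_hat, k=0.1, n_draws=200, seed=0):
    """Random-walk Metropolis with N(current, k * cov_hat) jumps, started at beta_hat."""
    rng = random.Random(seed)
    dim = len(beta_hat)
    chol = cholesky([[k * c for c in row] for row in cov_hat])
    current = list(beta_hat)
    current_ll = loglik(current)
    draws, accepted = [], 0
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        proposal = [current[i] + sum(chol[i][j] * z[j] for j in range(i + 1))
                    for i in range(dim)]
        prop_ll = loglik(proposal)
        # Accept with probability min(1, likelihood ratio), computed on the log scale.
        if rng.random() < math.exp(min(0.0, prop_ll - current_ll)):
            current, current_ll = proposal, prop_ll
            accepted += 1
        draws.append(list(current))
    return draws, accepted / n_draws

# Hypothetical toy data: intercept-only Poisson regression with log link.
y = [2, 0, 3, 1, 4, 2, 1, 3]

def poisson_loglik(beta):
    # Log-likelihood up to an additive constant.
    return sum(yi * beta[0] - math.exp(beta[0]) for yi in y)

beta_hat = [math.log(sum(y) / len(y))]  # MLE of the intercept
cov_hat = [[1.0 / sum(y)]]              # inverse expected information, n * exp(beta_hat)
draws, rate = metropolis(poisson_loglik, beta_hat, cov_hat, k=0.1, n_draws=200)
```

In a one-parameter example such as this, k = 0.1 will generally give a much higher acceptance rate than the 20–30% target, so k would need to be retuned for the dimension of the model at hand.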

7.1.1. Fractional Bayes Factors

When using the FBF prior, we replace f (y|L = l, X) in (7.1) with

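$$f^*(y \mid L = l, X) = \frac{\int f(y \mid X, \beta_l, L = l)\, p(\beta_l \mid L = l)\, d\beta_l}{\int f(y \mid X, \beta_l, L = l)^b\, p(\beta_l \mid L = l)\, d\beta_l}$$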

for 0 < b < 1. Under a nearly noninformative prior, we have, using the Laplace approximation,

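$$\int f(y \mid X, \beta_l, L = l)^b\, d\beta_l \approx (2\pi)^{(pH^*)/2}\, |b^{-1}\hat{\Sigma}_{\hat{\beta}_l}|^{1/2}\, f(y \mid X, \hat{\beta}_l, L = l)^b$$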

so that

$$f^*(y \mid L = l, X) \approx b^{(pH^*)/2}\, f(y \mid X, \hat{\beta}_l, L = l)^{(1-b)}$$
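The intermediate step behind this result can be written out as below. The shorthand $d = pH^*$, $\hat{f} = f(y \mid X, \hat{\beta}_l, L = l)$, and $\hat{\Sigma} = \hat{\Sigma}_{\hat{\beta}_l}$ is introduced here only for compactness; the derivation relies on the identity $|c\,A| = c^{d}|A|$ for a $d \times d$ matrix $A$ and assumes the nearly noninformative prior drops out of both integrals.

```latex
\begin{align*}
f^*(y \mid L = l, X)
  &= \frac{\int f(y \mid X, \beta_l, L = l)\, p(\beta_l \mid L = l)\, d\beta_l}
          {\int f(y \mid X, \beta_l, L = l)^b\, p(\beta_l \mid L = l)\, d\beta_l} \\
  &\approx \frac{(2\pi)^{d/2}\, |\hat{\Sigma}|^{1/2}\, \hat{f}}
                {(2\pi)^{d/2}\, |b^{-1}\hat{\Sigma}|^{1/2}\, \hat{f}^{\,b}} \\
  &= \frac{|\hat{\Sigma}|^{1/2}}{b^{-d/2}\, |\hat{\Sigma}|^{1/2}}\, \hat{f}^{\,1-b}
   = b^{d/2}\, \hat{f}^{\,1-b}
\end{align*}
```

On the log scale this is $(d/2)\log b + (1 - b)\log \hat{f}$, so a smaller fraction b penalizes higher-dimensional models more heavily while retaining most of the maximized log-likelihood.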

7.2. Simulations from the Generalized Linear Weight Pooling Model Using an MCMC Algorithm

Draws from the posterior distribution of (βl, L = l) are obtained via the product space search method of Carlin and Chib (1995). This approach assumes that y is independent of $\{\beta_k : k \neq l\}$ given that L = l. Assuming also that the $\{\beta_l\}$ are independent for l = 1, …, L, we have that

$$p(y \mid X, L = l) = \int f(y \mid X, \beta, L = l)\, p(\beta \mid L = l)\, d\beta = \int f(y \mid X, \beta_l, L = l)\, p(\beta_l \mid L = l)\, d\beta_l$$

The form given to the “pseudoprior” $p(\beta_{k \neq l} \mid L = l)$ is irrelevant, as it is chosen only to completely define the joint model specification:

$$p(y, \beta, L = l \mid X) = f(y \mid X, \beta_l, L = l) \left\{\prod_{j=1}^{2^{H-1}} p(\beta_j \mid L = l)\right\} P(L = l)$$

We can then develop a Gibbs sampler that draws from $p(\beta_l \mid L = l, \beta_{k \neq l}, y, X)$ and then from $p(L = l \mid \beta, y, X)$.

With the model fixed at L = l, we obtain a draw of

$$p(\beta_l \mid L = l, \beta_{k \neq l}, y, X) = p(\beta_l \mid L = l, y, X)$$

using the Metropolis step described in 7.1.

The full conditional p(L|β, y, X) is given by

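$$p(L = l \mid \beta, y, X) = \frac{f(y \mid X, \beta_l, L = l) \left\{\prod_{j=1}^{2^{H-1}} p(\beta_j \mid L = l)\right\} P(L = l)}{\sum_{j=1}^{2^{H-1}} f(y \mid X, \beta_j, L = j) \left\{\prod_{i=1}^{2^{H-1}} p(\beta_i \mid L = j)\right\} P(L = j)}$$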

Because computing $\prod_{j=1}^{2^{H-1}} p(\beta_j \mid L = l)$ is prohibitive except when H is small, we instead used a Metropolis step suggested by Dellaportas, Forster, and Ntzoufras (2002) to obtain a draw from L|β, y, X.

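  1. Propose a new model l′ with probability h(l, l′).
  2. Generate $\beta_{l'}$ from the pseudoprior $p(\beta_{l'} \mid L \neq l')$.
  3. Accept the new model l′ with probability
    $$\min\left\{1, \frac{f(y \mid X, \beta_{l'}, L = l')\, p(\beta_{l'} \mid L = l')\, p(\beta_l \mid L = l')\, P(L = l')\, h(l', l)}{f(y \mid X, \beta_l, L = l)\, p(\beta_l \mid L = l)\, p(\beta_{l'} \mid L = l)\, P(L = l)\, h(l, l')}\right\}$$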
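The acceptance probability in step 3 is a product of five terms for the proposed model divided by the corresponding five terms for the current model, which is most safely evaluated on the log scale. The sketch below shows one way to organize that calculation; the dictionary keys, function names, and numerical values are hypothetical and chosen only for illustration.

```python
import math
import random

TERMS = ("loglik", "logprior", "logpseudo", "logp_model", "log_h")

def acceptance_probability(proposed, current):
    """Probability of accepting a jump from the current to the proposed model.

    Each argument is a dict of log-scale terms for one model m:
      loglik:     log f(y | X, beta_m, L = m)
      logprior:   log p(beta_m | L = m)
      logpseudo:  log pseudoprior of the other model's coefficients given L = m
      logp_model: log P(L = m)
      log_h:      log probability of proposing the other model when at m
    """
    log_ratio = sum(proposed[t] for t in TERMS) - sum(current[t] for t in TERMS)
    return math.exp(min(0.0, log_ratio))

def accept_jump(proposed, current, rng):
    """Return True if the proposed model is accepted."""
    return rng.random() < acceptance_probability(proposed, current)

# Hypothetical log-scale terms for the current model l and proposed model l'.
current = {"loglik": -50.0, "logprior": -2.0, "logpseudo": -3.0,
           "logp_model": math.log(0.25), "log_h": math.log(1.0 / 3.0)}
proposed = dict(current, loglik=-50.0 + math.log(0.5))

prob = acceptance_probability(proposed, current)
moved = accept_jump(proposed, current, random.Random(1))
```

With a uniform prior over models and uniform jumping probabilities, the `logp_model` and `log_h` terms cancel, leaving only the likelihood, prior, and pseudoprior contributions.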

Carlin and Chib (1995) note that poor choices for the pseudopriors $p(\beta_{k \neq l} \mid L = l)$ can yield slow convergence, and suggest matching them as closely as possible to the true model-specific posteriors. Because of the large number of models to be considered, we simply set the pseudoprior for $\beta_k$ to be multivariate normal with mean $\hat{\beta}_k$ given by the MLE of a GLM regressing y on $Z_k$, and covariance $\hat{\Sigma}_{\hat{\beta}_k}$ given by the inverse of the expected information matrix. The jumping probabilities h(l, l′) to models l′ other than the current model l were always given by the discrete uniform distribution with probability $(2^{H-1} - 1)^{-1}$.

References

  • Association for the Advancement of Automotive Medicine. The Abbreviated Injury Scale, 1990 Revision. Des Plaines, IL, U.S.A.: Association for the Advancement of Automotive Medicine; 1990.
  • Binder DA. On the Variances of Asymptotically Normal Estimators from Complex Surveys. International Statistical Review. 1983;51:279–292.
  • Carlin BP, Chib S. Bayesian Model Choice via Markov Chain Monte Carlo Methods. Journal of the Royal Statistical Society, Series B. 1995;57:473–484.
  • Cox BG, McGrath DS. An Examination of the Effect of Sample Weight Truncation on the Mean Square Error of Survey Estimates. Paper Presented at the 1981 Biometric Society ENAR Meeting; Richmond, VA, U.S.A. 1981.
  • Dellaportas P, Forster J, Ntzoufras I. On Bayesian Model and Variable Selection using MCMC. Statistics and Computing. 2002;12:27–36.
  • Deville JC, Särndal C-E. Calibration Estimators in Survey Sampling. Journal of the American Statistical Association. 1992;87:376–382.
  • Durbin DR, Bhatia E, Holmes JH, Shaw KN, Werner JV, Sorenson W, Winston FK. Partners for Child Passenger Safety: A Unique Child-Specific Crash Surveillance System. Accident Analysis and Prevention. 2001;33:407–412.
  • Elliott MR. Bayesian Weight Trimming for Generalized Linear Regression Models. Survey Methodology. 2007;33:23–34.
  • Elliott MR. Model Averaging Methods for Weight Trimming. Journal of Official Statistics. 2008;24:517–540.
  • Elliott MR, Little RJA. Model-based Approaches to Weight Trimming. Journal of Official Statistics. 2000;16:191–210.
  • Ericson WA. Subjective Bayesian Modeling in Sampling Finite Populations. Journal of the Royal Statistical Society, Series B. 1969;31:195–234.
  • Folsom RE, Singh AC. The Generalized Exponential Model for Sampling Weight Calibration for Extreme Values, Nonresponse, and Poststratification. Proceedings of the American Statistical Association, Survey Research Methods Section. 2000:598–603.
  • Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. 2nd Edition. Boca Raton, FL: Chapman and Hall/CRC; 2004.
  • Holt D, Smith TMF. Poststratification. Journal of the Royal Statistical Society, Series A. 1979;142:33–46.
  • Horvitz DG, Thompson DJ. A Generalization of Sampling Without Replacement from a Finite Universe. Journal of the American Statistical Association. 1952;47:663–685.
  • Kass RE, Raftery AE. Bayes Factors. Journal of the American Statistical Association. 1995;90:773–795.
  • Korn EL, Graubard BI. Analysis of Large Health Surveys: Accounting for the Sampling Design. Journal of the Royal Statistical Society, Series A. 1995;158:263–295.
  • Little RJA. Poststratification: A Modeler’s Perspective. Journal of the American Statistical Association. 1993;88:1001–1012.
  • Little RJA. To Model or Not Model? Competing Modes of Inference for Finite Population Sampling. Journal of the American Statistical Association. 2004;99:546–556.
  • McCullagh P, Nelder JA. Generalized Linear Models. 2nd Edition. London: Chapman and Hall; 1989.
  • Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E. Equation of State Calculations by Fast Computing Machines. Journal of Chemical Physics. 1953;21:1087–1092.
  • O’Hagan A. Fractional Bayes Factors for Model Comparison. Journal of the Royal Statistical Society, Series B. 1995;57:99–138.
  • Potter F. A Study of Procedures to Identify and Trim Extreme Sample Weights. Proceedings of the American Statistical Association, Survey Research Methods Section. 1990:225–230.
  • Pfeffermann D. The Role of Sampling Weights When Modeling Survey Data. International Statistical Review. 1993;61:317–337.
  • Skinner CJ, Holt D, Smith TMF. Analysis of Complex Surveys. New York: Wiley; 1989.
  • Tierney L, Kadane J. Accurate Approximations for Posterior Moments and Marginal Densities. Journal of the American Statistical Association. 1986;81:82–86.
  • Winston FK, Kallan MK, Elliott MR, Menon RA, Durbin DR. Risk of Injury to Child Passengers in Compact Extended Pick-up Trucks. Journal of the American Medical Association. 2002;287:1147–1152.