Bidirectional case-crossover studies of air pollution: bias from skewed and incomplete waves.

The case-crossover design compares exposures during the period of time of failure with one or more periods when failure did not occur and evaluates the potential excess risk using conditional logistic regression. In this simulation study, we applied several control sampling approaches to control for confounding by various temporal patterns of an exposure variable and evaluated the usefulness of symmetric bidirectional control strategies. We simulated true relative risks (RRs; true ss = 0.001) of deaths of 1.051 per 50-ppb increase of sulfur dioxide and included confounding by right- or left-skewed seasonal waves, linear long-term time trends, or a combination of both. The range of the estimated RRs from symmetric bidirectional control sampling approaches was 1.044 approximately 1. 056 at either a long-term trend or any skewed seasonal wave of SO(2) levels, which indicated the bidirectional control sampling methods would successfully control confounding by design. The simulations with bidirectional sampling, however, show that biases may occur if waves are incomplete (20-43% underestimated RRs). In conclusion, our simulations show that the symmetric bidirectional case-crossover design can substantially control for confounding by linear long-term trends and/or seasonality of an exposure variable by design as well. However, unidirectional control sampling would fail to control confounding by those variations of air pollution. Simulation results also show that even the bidirectional case-crossover design can be biased in a situation where the exposure variable shows incomplete cyclic waves, and therefore it cannot completely control for temporal confounding.

Two recently published case-crossover studies on air pollution epidemiology by Neas et al. (1) and Lee and Schwartz (2) present an attractive alternative approach to analyzing the mortality effects of short-term exposure to ambient air pollution. Instead of using timeseries analysis, the case-crossover study design developed by Maclure (3) was applied to evaluate the association between daily death counts and air pollution. This approach compares exposures during the period of time of death (case period) with one or more periods when death did not occur (control periods) and evaluates the potential excess risk using conditional logistic regression.
The study design requires no additional control subjects to be sampled and can control individual susceptibility by making comparisons within a subject. Therefore, this design can control for all measured and unmeasured time-invariant potential confounders. As Pope (4) described, this approach controls for potential confounding variables such as day of the week, seasonality, long-term time trends, and changes in population size and composition by design rather than by statistical modeling.
Various case-crossover designs have been proposed, involving different strategies for selecting control information. In a casecrossover study, there are two main reasons for applying different control sampling schemes. One is to improve the relative efficiency described by Mittleman et al. (5), who showed that the empirical relative efficiency increased as the number of control periods sampled increased. The other reason is to control for exposures with temporal patterns such as long-term time trends and seasonal waves in exposure, outcome, or both (1,2,6,7). The latter problem is related to the validity of the case-crossover estimator. If the estimator were not valid, then the estimation of efficiency would not be meaningful. Navidi (6) demonstrated that the existence of systematic seasonal patterns in exposure could introduce bias into the estimates if air pollution exposure is the only source of the seasonal pattern in the outcome. He proposed a bidirectional control sampling approach, with controls randomly selected from all days before or after the event day, which eliminated the bias. In the presence of seasonal patterns in outcome, Lee and Schwartz (2) and Bateson and Schwartz (7) showed unidirectional control approaches may yield confounded risk estimates depending on the seasonal pattern or long-term time trends of air pollution levels. Bateson and Schwartz (7) also showed that control periods needed to be close to event periods to avoid this confounding.
We begin by clearly defining "long-term time trends" or "seasonal waves." "Long-term time trends" of exposure indicate whether the levels of exposure are tending to decrease (or increase) with calendar time. In most developed countries, the annual average levels of air pollution have decreased since air quality standards have become stricter.
The results from case-crossover studies can be quite sensitive to the selection of control periods, especially when clear time trends or cyclic waves such as seasonal waves exist. Navidi (6) illustrated how time trends in exposure can cause a bias in case-crossover studies with hypothetical examples. Lee and Schwartz (2) also showed empirically that the validity of estimates in case-crossover studies varied greatly depending on the strategy used in control sampling. Several studies suggested choosing symmetric control periods both before and after the date of death (1,2,6,7). This symmetric control sampling approach can be used under circumstances where subsequent levels of exposure are not influenced by the failure/event of interest, which is quite true in studies of air pollution and daily mortality. This new design approach was also examined thoroughly to determine whether the case-crossover design was able to control for temporal confounding patterns by design rather than through modeling (7). Symmetric bidirectional control sampling approaches with short intervals between event and control could control for temporal confounding by design (7).
The case-crossover design compares exposures during the period of time of failure with one or more periods when failure did not occur and evaluates the potential excess risk using conditional logistic regression. In this simulation study, we applied several control sampling approaches to control for confounding by various temporal patterns of an exposure variable and evaluated the usefulness of symmetric bidirectional control strategies. We simulated true relative risks (RRs; true β = 0.001) of deaths of 1.051 per 50-ppb increase of sulfur dioxide and included confounding by right-or left-skewed seasonal waves, linear long-term time trends, or a combination of both. The range of the estimated RRs from symmetric bidirectional control sampling approaches was 1.044~1.056 at either a long-term trend or any skewed seasonal wave of SO 2 levels, which indicated the bidirectional control sampling methods would successfully control confounding by design. The simulations with bidirectional sampling, however, show that biases may occur if waves are incomplete (20-43% underestimated RRs). In conclusion, our simulations show that the symmetric bidirectional case-crossover design can substantially control for confounding by linear long-term trends and/or seasonality of an exposure variable by design as well. However, unidirectional control sampling would fail to control confounding by those variations of air pollution. Simulation results also show that even the bidirectional case-crossover design can be biased in a situation where the exposure variable shows incomplete cyclic waves, and therefore it cannot completely control for temporal confounding.

Articles
In the present study, we applied several control-sampling schemes to control for time trends and seasonal waves in exposure of interest (ambient air levels of sulfur dioxide). As shown in Figure 1, air pollution levels show strong seasonal waves as well as long-term time trends. To compare the methods for different control sampling approaches, we performed a simulation study. We examined two scenarios not examined by Bateson and Schwartz (7). In one scenario the seasonal pattern in SO 2 exposure was asymmetric; in the other scenario, data were accrued over an interval not equal to an integer number of cycles.

Materials and Methods
Our simulation study used death counts collected in Seoul, Korea. The National Statistics Office of Korea supplied the number of deaths occurring between 1 October 1991 and 30 September 1993, according to the day on which death occurred. In our previous study, SO 2 levels were significantly associated with increased all-cause mortality and showed a clear fluctuation by calendar time (2,8). The annual levels of SO 2 had decreased (long-term time trends). The seasonal pattern was also shown for SO 2 (whereas the total suspended particulates and O 3 seasonal waves were more scattered). It seemed clear that the periods of increasing SO 2 were short (steep increase in the wave), whereas the subsequent declines were gradual, over longer time periods.
Based on this pattern, we created several scenarios depending on the pattern of SO 2 levels over time. The fluctuation in levels could show long-term time trends, seasonal waves, or a combination of the two patterns. If there were cyclic waves (seasonal waves), the periods of increasing SO 2 were short (long), whereas the downward wave pattern held over longer (shorter) time periods. In some cases, it may have been true that those periods of increasing exposure levels were equal to decreasing levels. We called the shorter periods of increasing exposure levels "left heavier" or "right skewed" (Figure 1). In contrast, "right heavier" or "left skewed" was used to refer to shorter periods of decreasing exposure levels ( Figure 1). Levels of exposure, on average, may have also been continuously decreasing/increasing, and we referred to this situation as long-term time trends. Of course, there could also be no long-term time trends in exposure levels. All these situations included nine scenarios, depending on the pattern of exposure levels. They can be seen in Figure  1, except for the case of no trends at all.
To compare the methods described earlier in this paper, we performed a simulation study. First, we generated sets of SO 2 data from 1 October 1991 to 30 September 1993. Data were artificially created so that they had the nine patterns described earlier. The details of a simulation procedure are given in Appendix 1. The error terms with variance proportional to the SO 2 values were added to the SO 2 values so that the data looked naturally distributed. Figure 1 shows the generated daily SO 2 values for nine patterns. After the SO 2 value was simulated, the death count was simulated from a Poisson distribution using SAS RANPOI function (SAS Institute, Cary, NC) with a mean parameter λ = 82 × exp(0.001 × SO 2 ), where 82 was the approximate mean death count in Seoul. If we assume there were 10,000,000 people at risk in Seoul, the baseline hazard was 82/10,000,000 = 8.2 × 10 -6 for all people in Seoul, and the true value of β was assumed to be 0.001. This corresponds to the true relative risk of approximately 1.051 per 50 ppb, which is in close agreement with the relative risk of mortality due to SO 2 as estimated by Lee and Schwartz (2). For each data set of death counts, the estimated value, β i , was calculated by the conditional logistic regression method using SAS PROC PHREG (9), and this process was repeated 1,000 times. Then the following were calculated: a) the mean of estimated values (β i , i = 1, 2, …, 1,000), b) the sample standard deviation calculated from the estimated values (β i, i = 1, 2, …, 1,000), and c) the root mean-squared error (RMSE) of β, which is the square root of the sample mean of squared errors [(β i -β) 2 , i = 1,2,…,1000].    The mean-squared error (MSE) of β is defined by Equation 1.
where Bias = E[(β) -β] 2 and As we can see in Equation 1, the MSE is the sum of the squared bias and the squared standard error (variance) of β. For illustrational purpose, we use RMSE rather than MSE because RMSE has the same dimension of bias and standard error. In Tables 1-3, each row represents the sampling design results retrospective with one control, retrospective with two controls, bidirectional with two controls, bidirectional with four controls, prospective with one control, and prospective with two controls, respectively. In our study, controls were selected 1 week and 2 weeks before and/or after the death date for the corresponding sampling designs in order to be limited to a similar part of the same season with the case.

Results
We can gain some insight into the nature of the bias for unidirectional sampling approaches by examining the time trends in simulated data. Under one scenario, air pollution levels were created to have time trends only, without any cyclic wave such as seasonal pattern. The results can be seen in Table 1. The first three columns of Table 1 present, for each estimator, summaries from the 1,000 replications across the simulated data sets in which the exposure levels decrease with calendar time. The next three columns present the same information in which the exposure levels increase with calendar time. By the nature of simulated data sets, on average, the days when death occurred had lower air pollution levels than their corresponding unidirectional retrospective control days in the first situation and vice versa in the second situation. The results of unidirectional control are biased in the presence of increasing or decreasing time trends in exposure, similar to the results of Navidi (6) and Bateson and Schwartz (7). The bias is greater with 2 control days. In this example, we can conclude that bidirectional sampling approaches are nearly unbiased and their RMSEs, combined measure of bias and efficiency, are smaller, while others are considerably biased and RMSEs are larger if time trends exist.
We also simulated the data sets to have seasonal waves. Interpretation of this analysis is more complicated than the analysis of data sets for time trends only. However, it should be mentioned that we obtained the least biased estimates with smallest RMSE from bidirectional sampling approaches. There were two scenarios where data sets included seasonal waves. One included both seasonal waves and long-term time trends in exposure.

Articles • Bidirectional case-crossover studies of air pollution
Environmental Health Perspectives • VOLUME 108 | NUMBER 12 | December 2000   The other included only seasonal waves in exposure, without linear time trends. The former is more realistic in air pollution studies. Air pollution levels in most developed or developing countries have exhibited longterm time trends, as well as seasonal patterns. In most countries, the ambient levels of air pollutants have been steadily decreasing since ambient air quality standards were strengthened. It is also a well-known phenomenon that certain kinds of air pollution show cyclic fluctuation by seasons. Tables 2 and 3 show that, for the case of symmetric seasonal patterns in exposure, the bidirectional method gives an essentially unbiased estimate with smallest RMSE. This replicates the results of Bateson and Schwartz (7). However, for the case of left-or right-heavy patterns in exposure, the bidirectional approach shows more bias and more RMSE, although in other settings it gives less biased estimates than unidirectional sampling. In comparing RMSE, it is interesting to note that Table 1 shows similar values of RMSEs for both bidirectional methods. However, Tables 2 and 3 show that the bidirectional method with four controls gives smaller RMSEs than the bidirectional method with two controls. This suggests that in a real situation, the bidirectional method with many controls will give the less biased and more efficient estimator if there is seasonal variation or long-term trend. Our simulation analysis also showed that symmetric control periods may not work well if the exposure has cyclic waves. We observed that the estimates with symmetric control periods could be biased if the data sets with a cyclic wave were incomplete (Table 4). Here, a complete cyclic wave means that a data set whose number of concave vertex points (peaks) and convex vertex points (valleys) are the same. A complete wave does not necessarily mean the multiples of a complete year's data. As shown in Figure 2, data from summer to summer or from winter to winter have different numbers of concave vertex points and convex vertex points, so they are incomplete waves. For the incomplete cyclic wave, the symmetric control sampling approaches could also be biased. For entire exposure data sets, the estimator could be biased if the number of convex vertex points was greater (or less) than the number of concave vertex points ( Table 4). The RRs estimated with bidirectional case-crossover design were underestimated by 20-43% compared with the true RR (Table 4). Figure  2 depicts the situation described above. This analysis suggests that we need to look at the entire exposure pattern thoroughly if the exposure has a cyclic wave such as seasonal patterns. If the wave is not complete, then the estimates with symmetric bidirectional sampling approaches could be biased as well.

Discussion
In environmental epidemiologic studies, the interpretation of the results has been criticized because the data may be subject to an unmeasured confounding bias. The size of association between air pollution and daily mortality is also small enough that the difference may be considered due to bias. To overcome all those difficulties in conducting environmental research, researchers need to focus on the use of improved environmental monitoring techniques, the amount and precision of health and exposure data, the application of advanced statistical techniques, and the study design.
Formal time-series analysis using Poisson regression has been the primary statistical approach to such studies. Recently, advanced time-series modeling techniques have been developed to provide better control for potential confounders. For example, generalized additive models (GAM) that use nonparametric smoothing have allowed for highly flexible fitting of seasonal wave and long-term time trends, as well as nonlinear associations with weather variables such as air temperature and humidity. Most previous mortality studies have dealt with death counts rather than individual deaths.
As Pope (4) pointed out, case-crossover studies have some drawbacks. Due largely to the loss of information from control periods that cannot be included in the analysis, the case-crossover approach has been believed to have lower statistical efficiency. If a bidirectional sampling approach with a few control days (which are selected within the limited periods, for example within 2 weeks) had been applied for 1 year of data, then there would not be more than 7.7% (4 weeks divided by 1 year) information loss. This indicated that deaths could not be analyzed for a 4-week period (2 weeks from the beginning and 2 weeks from the end) each year. The relative efficiency will not be substantially lowered if the study periods are extended with a few control days. However, Bateson and Schwartz (7) found 66% efficiency in their simulations when controls were randomly selected. The major drawback of a case-crossover design was that the results could be sensitive to the selection of control periods, especially when clear time trends exist. Previous studies have shown that bidirectional approaches are needed when there are seasonal patterns in exposure (6) or in both exposure and outcome (7).

Appendix
A sine curve should be used to simulate air pollution data to capture the cyclic seasonal pattern. Because the real shapes of SO 2 plots are sharper at the edges, we decided to use several lines to simulate SO 2 values rather than one sine curve. Another reason for using combinations of lines is that it is easier to create left-or right-skewed data. We controlled the ranges of the date, slopes, and intercepts of the lines so that we had the shapes we wanted. For long-term trend only (no seasonal waves) simulation, only one mean line was used to generate SO 2 values. The formula for the simulation was where day is the number of days after 1 October 1991 (1 ≤ day ≤ 731) and N(0,1) are the random numbers generated from standard normal distribution using NORMAL function of SAS version 6.12 (9). Values of a, b, c, d are given in Table A-1. For the simulations for seasonal waves, five mean lines were used to generate SO 2 values. The whole range of the day (1 ≤ day ≤ 731) was divided into five pieces, depending on the patterns of the skewness, and a mean line was used for each piece of the range. The cut points for five pieces of day ranges are given in Table A To give the linear trend, 0.7 was multiplied at the second cycle (see Table A-1). For all simulations, if the simulated value is negative, then the value was deleted and a new random number was generated again until a positive value was generated.