Using Disease-Cluster and Small-Area Analyses to Study Environmental Justice

Wartenberg D.

This appendix explains, from a methodological point of view, why there is a paucity of epidemiologic or health-effects studies that assess environmental equity on a local, residential scale. It examines why the data requirements for traditional epidemiologic studies are rarely met and explains the strengths and limitations of screening for communities at greatest risk by using preepidemiologic studies (that is, studies with more limited data), such as disease-cluster and small-area analyses. Finally, it concludes with a set of research needs and a strategy for implementing health-effects studies to assess issues of environmental justice.

Health-Effects Studies

Barriers to Epidemiologic Studies

To understand why few studies have examined the health status of minority and economically disadvantaged populations living in contaminated environments, it is helpful to define what types of health-effects studies are possible and how scientists undertake such studies. There are two principal barriers to applying traditional epidemiologic methods to issues of environmental justice: data availability and sample size.

Data Availability

To conduct an epidemiologic study of the effects of a local exposure source on a residential population, one needs to know the demographics of the population at risk, the extent of the exposure of concern, and information about other risk factors for disease, such as occupation, diet, and socioeconomic status. If the population is large enough and if a sufficient number of people are exposed to the disease agent, as defined by statistical criteria, then these data can be used in a traditional epidemiologic design (e.g., a cohort, case-control, or cross-sectional study). These epidemiologic studies help determine whether those people with higher exposures are or were more likely to develop disease when the effects of other risk factors for disease are removed or adjusted for. That is, one can compare residents with disease to those without disease in terms of exposure while making adjustments (either statistically or in the design) for other risk or lifestyle factors.

However, in most minority and low-income communities, such data are not readily available (Environmental Protection Agency, 1992; Sexton et al., 1993). As noted in an Environmental Protection Agency report on this issue, “Environmental and health data are not routinely collected and analyzed by income and race” (Environmental Protection Agency, 1992, p. 1). When such data are collected, they are not always available to researchers at the community or neighborhood level. For example, for some cancer incidence studies, states have been willing to provide municipal-level data and, in special circumstances, individual-level data. For studies in other states, data have been limited to the county level. Furthermore, data on lifestyle and behavior are not generally available except for regional data based on statistical sample surveys (e.g., the Centers for Disease Control and Prevention's Behavioral Risk Factor Surveillance System). Therefore, to undertake epidemiologically reliable studies of these communities, substantial data collection efforts would be required, but data collection is an extremely costly and resource-intensive enterprise.

Sample Size

In addition, many of the minority and low-income communities with environmental justice concerns are extremely small, in epidemiologic terms. Studies of small populations may not have adequate statistical power to detect a significant effect even if one exists. For example, Zimmerman reports that communities with at least 2,500 residents and at least one inactive hazardous-waste site on the National Priorities List (NPL) have a 1990 median population of less than 18,000 people (Zimmerman, 1993). To appreciate the limitations of studying a population this small or smaller, consider the following hypothetical calculations. For a cancer with an incidence rate of 1 in 30,000 people per year, the typical rate of childhood leukemia, one would expect to see fewer than one case per year in over half of the communities with NPL sites. Even if 5 years of health outcome data were available, one would expect to see fewer than three cases in each of these communities and about six cases in 10 years, a scant number of cases for reliable epidemiologic analysis.
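These expected case counts follow directly from the incidence rate and population figures just cited, as a quick sketch shows:

```python
# Expected childhood leukemia cases in a community the size of the median
# NPL-site community (18,000 residents; Zimmerman, 1993), assuming an
# incidence of 1 case per 30,000 people per year, as cited above.
incidence = 1 / 30_000   # cases per person per year
population = 18_000

for years in (1, 5, 10):
    expected = incidence * population * years
    print(f"{years:2d} year(s): {expected:.1f} expected cases")
```

Because 18,000 is the median, half of the NPL-site communities would expect even fewer cases than these figures suggest.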

However, it is not enough to know only about the people who developed adverse health effects. For most epidemiologic analyses, one also must have information about the people who did not develop disease so that explanations for the disease other than the one of interest, the environmental contamination, can be ruled out. To appreciate the concerns about statistical power (i.e., the ability to detect an effect statistically when one exists), consider the following hypothetical study. Assume that one can identify all childhood leukemia cases in one of these communities over a 5-year observation period and obtain data on other risk factors for the entire population, a rather big assumption. Next, assume (on the basis of typical U.S. data) that approximately 40 percent of a typical community is under 20 years of age. Finally, assume that those exposed to the environmental contaminants were five times more likely to develop leukemia than those who were not exposed, a risk larger than that typically seen in environmental epidemiologic studies. With these data, an epidemiologic study would have a less than 50 percent chance of detecting a statistically significant effect. Studies with such a low likelihood of success are not attractive to researchers, are expensive, and are extremely difficult to fund. Small sample size is a major impediment to conducting meaningful studies.
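A power calculation of this kind can be approximated by simulation. The sketch below is illustrative only: the fraction of children exposed (20 percent), the Poisson approximation to the case counts, and the use of an exact conditional binomial test are all assumptions not specified in the text.

```python
import math
import random

def poisson(rng, lam):
    """Draw from a Poisson distribution (Knuth's method; fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while p > limit:
        k += 1
        p *= rng.random()
    return k - 1

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): exact one-sided p-value."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def leukemia_study_power(exposed_frac=0.20, rr=5.0, years=5,
                         n_trials=10_000, seed=1):
    children = int(18_000 * 0.40)         # ~40 percent of the community under age 20
    n_exp = int(children * exposed_frac)  # exposed_frac is a hypothetical value
    n_unexp = children - n_exp
    rate = 1 / 30_000                     # annual childhood leukemia incidence
    lam_exp = n_exp * rr * rate * years   # expected exposed cases over the study
    lam_unexp = n_unexp * rate * years    # expected unexposed cases
    f0 = n_exp / children                 # null P(a given case is exposed)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        x, y = poisson(rng, lam_exp), poisson(rng, lam_unexp)
        # Conditional on the total, test whether too many cases are exposed.
        if x + y > 0 and binom_tail(x, x + y, f0) <= 0.05:
            hits += 1
    return hits / n_trials

print(f"estimated power: {leukemia_study_power():.0%}")  # well under 50 percent
```

With only two or three expected cases over 5 years, even a fivefold relative risk is detected in well under half of the simulated studies, consistent with the less-than-50-percent figure quoted above.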


As an alternative, before embarking on large, costly data collection efforts, it is sometimes possible to gain insight into local health risk by conducting preliminary studies with existing, albeit limited, data. This is called preepidemiology (Wartenberg and Greenberg, 1993). The methods used in these preepidemiologic assessments are typically called disease-cluster and small-area analyses. Some methods primarily use information about cases, without taking account of the characteristics of the population from which they were drawn. For example, if the locations or dates of incidence, or both, of a series of cases are known, it is possible to investigate whether these cases are closer together (or closer to a known exposure source) than would be expected. Other methods use area rates: for example, one might investigate whether the populations of three towns with borders within a mile of the local incinerator have higher rates of lung cancer than the lung cancer rate in the entire county in which they reside. Such studies can be used to screen regions with a high incidence of a particular disease for further study and to aid in the design of more rigorous studies. Because the data requirements for preepidemiologic studies are far more limited than those for traditional epidemiologic methods and because the data may be more subject to underreporting or inaccurate reporting, depending on the source, the results of the analyses are less reliable. Nonetheless, these data are still useful for screening and for designing additional studies.

Types of Preepidemiologic Methods

Cluster Data In some preepidemiologic studies, data are analyzed without knowledge of the size of the population from which they came or of whether there is a local source of risk. Such analyses simply summarize the characteristics of the patterns of cases. For other analyses, one must determine the size of the population from which the cases have been identified. Sometimes, when a resident notices one or two cases of a rare disease in an area, he or she asks all neighbors and friends if they know of any other cases. When the observation is reported to health officials, the context (e.g., two cases on the resident's block) may overstate the rate (i.e., these two cases may be the only ones in the entire town), making the disease rate look artificially high. There also are issues of the statistical stability of rate estimates when the population at risk is very small. Some more sophisticated disease-cluster methods use data about some of the people who did not develop disease to adjust for other differences between the cases and the noncases (e.g., behavior and lifestyle). Some analyses compare average distances among those with disease to average distances among those without disease, whereas others use data on the proximity of people to a known exposure source. The latter are so-called focused studies (see below), which determine whether those with disease live closer to the source than those without disease.

Area Data Another type of data used in preepidemiology is a set of disease rates reported for a set of geographic units, such as counties. These are called area data. Occasionally, such data are available at the geographic scale of the minor civil division (MCD), but most often the locational information is not sufficiently reliable at that level. This type of data can be analyzed in a variety of ways, such as to see if geographic units with higher rates of disease are nearer one another than expected or whether those closer to facilities emitting pollution have higher rates of disease than those farther away.

Limitations of Preepidemiology

One limitation of these summary data is that for exposures of very limited geographic or temporal extent, the majority of the MCD or county may be unexposed, diluting any possible excess cases of disease when data are reported for the unit as a whole. In addition, an exposure source is sometimes near a boundary of MCDs, so that several MCDs must be combined to capture the entire exposed population, even though only those living closest to the source are exposed, compounding the dilution problem even further. Another limitation of these data is that data on other risk factors most often are not available for individuals. Therefore, group summaries must be used, precluding adjustment for possible covariance of risk factors within a geographic unit, which leads to possible confounding (i.e., misattribution of cause [Greenland, 1989, 1992; Richardson et al., 1987; Rothman, 1990; Susser, 1994a,b]).

Although many of the preepidemiologic approaches make clever use of scant data, some scientists question the reliability and utility of such analyses (Neutra, 1990; Rothman, 1990). Because the limitations of each specific data set define its interpretability and generalizability, these limitations must be made explicit both to researchers and to the community under study. This includes discussions with the community about the limitations of the methods before any data collection or data analyses are undertaken, lest false expectations be raised.

Appropriate Uses of Preepidemiology

On the other hand, preepidemiologic studies of case reports can be extremely useful as screening tools and for guidance in epidemiologic design. For example, in the 1850s, John Snow, a British physician, noticed higher death rates from cholera in one area of London than in another. He hypothesized that this might be due to the source of the drinking water and its proximity to sewage disposal areas. By removing the handle of the water pump and thereby preventing access to the suspected contaminated supply, Snow was able to confirm his hypothesis (Snow, 1965).

Similarly, in Woburn, Massachusetts, in the early 1980s, residents reported excess cases of childhood leukemia that were confirmed by preepidemiologic analyses. A rigorous epidemiologic study further validated the residents' concerns and implicated chemically contaminated drinking water as the cause (Lagakos et al., 1986). The results of the latter study remain controversial, although a recent report supports the initial finding of cases of excess disease and reports a subsequent decline after appropriate allowance for latency.

In 1971, Herbst and coworkers reported on eight cases of adenocarcinoma of the vagina in women aged 15 to 22 in Massachusetts in which diethylstilbestrol was implicated as the disease-causing agent (Herbst et al., 1971). A year later, the observation of three cases of angiosarcoma of the liver among polyvinyl chloride production workers was used to implicate the vinyl chloride monomer as a cause of disease in a worker population (Creech and Johnson, 1974). Another workplace cluster of disease, male infertility in the pesticide industry, was used to identify the manufacture of dibromochloropropane as dangerous (Whorton et al., 1977). Several cases of phocomelia alerted experts to the problems of thalidomide. The identification of risk factors for human immunodeficiency virus transmission and AIDS also arose out of case reports from preepidemiologic studies (Centers for Disease Control, 1981). Finally, studies of asthma attacks have identified soybean dust as a new etiologic factor for the disease, and studies of Hodgkin's disease in young adults and mesothelioma in the small village of Karain, Turkey, have helped focus further epidemiologic studies that eventually led to a reduction in the number of cases of disease (Alexander, 1992). In short, although many question the utility of these preepidemiologic studies, there are numerous examples of successes, that is, preliminary results confirmed by further rigorous epidemiologic studies.

Statistical Approaches to Preepidemiology

Over 70 methods are used in disease-cluster and small-area analysis studies, and it is important to understand which type of method should be used in which situation, as well as the expected results and limitations of each method. I begin with a summary of the current practice of disease-cluster analysis, provide a set of questions to help investigators determine which analytic method is most appropriate for their study, briefly summarize the characteristics of the available methods, and conclude with a discussion of the statistical power of the methods.

Disease-Cluster Investigation Practices

To characterize the state of disease-cluster analyses, my colleagues and I have undertaken a review of published studies from 1960 to 1990. The goal was to characterize the problems studied, the approaches used, and the results obtained. The work is still under way, and the results presented here are preliminary.

A fairly wide variety of diseases were studied, although a few diseases were studied far more often than all the others. Of the 352 reported diseases, 207 (59 percent) were some type of cancer, 30 (8 percent) were some type of birth defect, and 14 (4 percent) were multiple sclerosis. (Infectious diseases have largely been excluded from the preliminary review.) Of the studies of cancers, over one third were studies of leukemia and over one fifth were studies of Hodgkin's disease.

Approximately 74 different statistical analysis methods were used in the studies described in the set of 287 papers reporting statistical results. The two most commonly used methods, each used in over 20 percent of the reported studies, were the chi-square test (which compares the observed number of cases to that expected under an assumed Poisson distribution) and the Knox test for time-space interaction (Knox, 1963, 1964a,b). Both of these methods were used more than twice as often as the next most frequently used method, the Ederer-Myers-Mantel (EMM) test for space-time clustering (Ederer et al., 1964). Only 12 of the 74 methods were each used in at least five separate studies. Few of the methods allow adjustment for confounding variables such as population density, ethnicity, age of residents, and so forth.
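The chi-square comparison of observed with expected counts reduces to a few lines of arithmetic. In this sketch the counts for five areas are invented, the expectations are taken as equal for simplicity, and 9.488 is the 0.05 critical value for four degrees of freedom.

```python
# Chi-square comparison of observed case counts in five areas with the counts
# expected under a Poisson model (equal expectations here, for simplicity).
# The counts are hypothetical; 9.488 is the 0.05 critical value for 4 df.
observed = [12, 8, 15, 5, 10]
expected = [10, 10, 10, 10, 10]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(f"chi-square = {chi_sq:.2f}")             # 5.80
print("significant at 0.05:", chi_sq > 9.488)   # False: no detectable excess
```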

Of the categories of study types investigated, space-time clusters were assessed most often, followed by spatial clusters, seasonal periodicities, temporal clusters, and occupational clusters. A number of the reports were case series in which no statistical analyses were performed.

Overall, 71 percent of the reported results were statistically significant. This preponderance of positive findings has a few possible explanations. First, it may be that the reporting process overreports studies with positive findings. That is, although many neighborhoods evaluate their own disease frequencies (either explicitly or implicitly), those with excess disease frequencies are predominantly reported on and receive more rigorous consideration. Few communities, if any, report a deficit of cancers. So, even if the variation in cancer rates is random, with some neighborhoods showing higher than expected rates (on the basis of state or national averages) and some showing lower than expected rates, reporting of only the higher rates would produce a bias in the reports.

Second, it may be that studies with positive findings are more likely to be published. Many preliminary cluster investigations are conducted by health department officials and are never published. In addition, it may be harder to publish a study with negative results than one with positive results. This, too, could create an overrepresentation of positive reports. However, a substantial number (nearly one third) of the published studies reported negative findings, suggesting that publication bias is unlikely to fully explain the excess of studies with positive findings.

Choosing the Right Statistical Analysis Method

The statistical analysis methods most often used in preepidemiologic studies are the simplest to apply: the chi-square test and the Knox test. Neither method allows adjustment for confounding factors. Given the limited data available for most cluster studies in terms of both sample size and number of variables assessed, it seems reasonable that most investigators use methods that are not designed to emphasize subtle aspects of the data. Nonetheless, it is important that investigators choose the method that will be most sensitive for the detection of the pattern or process that they think underlies the concern, while appreciating the characteristics and limitations of the specific method chosen. Unfortunately, choosing among the more than 70 different methods that have been used in the published literature (Jacquez et al., 1996; Marshall, 1991) poses a challenge for researchers seeking to conduct an investigation. Below are six questions that may help researchers choose the most appropriate method (Waller and Jacquez, 1995; Wartenberg and Greenberg, 1993).

What Type of Clustering is Being Investigated?

Typical methods address time-, space-, space-time-, and exposure-based hypotheses. Assessment of patterns over time is the simplest, because the data are distributed along one axis: time (Bailar et al., 1970; Ederer et al., 1964; Larsen et al., 1973; Naus, 1965, 1970; Tango, 1984). Patterns over time can be thought of as clusters (many events bunched together), trends (a gradual increase or decrease in the rate of events over time), or cycles (repeating patterns, such as high rates in certain seasons). For example, if one is concerned about a sudden increase in the incidence of asthma following the opening of a new industrial facility, one might want to look for a trend in asthma incidence over time, both before and after the opening. If one is concerned that events such as copycat suicides come in bunches, one might look for clusters of reported suicides using death certificates. If one believes that events come in cycles, such as asthma attacks during periods with high ozone levels in the summer, one might use a method that detects annual cycles.

Assessment over space is two-dimensional and has increased complexity compared with the assessment of patterns over time (Cuzick and Edwards, 1990; Diggle, 1991; Geary, 1954; Grimson et al., 1981; Moran, 1948, 1950; Ohno et al., 1979; Openshaw et al., 1988; Schulman et al., 1988; Tango, 1984; Whittemore et al., 1987). Spatial patterns may be clusters (regions with more events than other regions) or trends (gradual increases in the number of events across the study area). For example, one may suspect that there was a large spill of toxic material somewhere nearby but may not be sure where. Then, one might want to look for clusters of adverse health events that might signal the location (and effect) of the spill. On the other hand, one might be concerned about the confluence of emissions from a variety of pollution emitters, although one might not know the specific dispersion patterns of the mixed releases. For that situation, one might postulate both clusters of adverse health outcomes near the facilities, signifying hot spots, and a decreasing trend of adverse health effects as one moves away from the general area of the facilities, indicating the atmospheric processes of dilution, dispersion, and transport. In such cases, one would use spatial methods.

A specialized group of spatial methods assesses clustering in proximity to a particular source of hazard. They are called focused methods (Besag and Newell, 1991; Bithell and Stone, 1989; Lawson and Williams, 1993; Stone, 1988; Waller et al., 1994). Most often, these methods use distance as a surrogate for exposure and assess whether cases are closer to the source than expected. The advantage of these methods over other spatial methods is that they address a specific hypothesis of concern and, because of their specificity, have increased sensitivity. The disadvantage of these methods is that their specificity limits their ability to detect more general patterns of clustering.
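A minimal focused test along these lines compares the cases' distances to the source against the controls' distances, judging significance by permuting the case/control labels. All coordinates below are invented for illustration.

```python
import math
import random

source = (0.0, 0.0)  # hypothetical facility location
cases = [(0.5, 0.3), (0.8, 0.1), (0.4, 0.6), (1.0, 0.2)]     # invented
controls = [(2.1, 1.5), (0.9, 2.4), (3.0, 0.8), (1.8, 1.9)]  # invented

def mean_dist(points):
    """Mean distance from a set of residences to the source."""
    return sum(math.dist(p, source) for p in points) / len(points)

observed = mean_dist(cases)

# Permutation test: under the null, case/control labels are interchangeable,
# so relabel at random and count how often the "cases" land this close.
rng = random.Random(0)
pool = cases + controls
n_perm = 9_999
more_extreme = sum(
    mean_dist(rng.sample(pool, len(cases))) <= observed for _ in range(n_perm)
)
p_value = (more_extreme + 1) / (n_perm + 1)
print(f"mean case distance {observed:.2f}, p = {p_value:.3f}")
```

Here the cases sit much closer to the source than the controls, so the permutation p-value comes out small, which is exactly the pattern a focused method is built to detect.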

Space-time assessments are three-dimensional (Abe, 1973; Barton et al., 1965; David and Barton, 1966; Klauber, 1975; Knox, 1964a,b; Mantel, 1967; Pike and Smith, 1968, 1974; Pinkel and Nefzger, 1959; Pinkel et al., 1963; Symons et al., 1983; Williams, 1984). However, distances in space are not commensurate with distances in time, which further increases the complexity of the assessment problem. In general, space-time methods look for corresponding patterns of events that occur in both space and time. For example, finding that asthma cases occur when there are releases from an industrial facility and that those asthma cases tend to occur in specific geographic regions (say, mainly downwind of the facility) would be an example of space-time clustering. Space-time cluster methods are particularly popular because space-time clusters are thought to be more unusual than time-only or space-only clusters: they require simultaneous, correlated effects in two independent domains, space and time. For the same reason, the appearance of space-time clusters is unlikely to be attributable to other factors such as age, ethnicity, socioeconomic status, or even random fluctuations.
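A bare-bones version of the Knox statistic simply counts the case pairs that are close in both space and time. The case coordinates, onset times, and closeness thresholds below are invented for illustration.

```python
import itertools
import math

# Each case: (x, y, onset_time). The first three cases form a tight space-time
# cluster; the rest are scattered background. All values are hypothetical.
cases = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.2), (0.0, 0.1, 0.9),
         (5.0, 5.0, 10.0), (8.0, 2.0, 20.0), (2.0, 9.0, 30.0)]

def knox_statistic(cases, d_space, d_time):
    """Count pairs of cases closer than d_space in space AND d_time in time."""
    count = 0
    for a, b in itertools.combinations(cases, 2):
        close_in_space = math.dist(a[:2], b[:2]) < d_space
        close_in_time = abs(a[2] - b[2]) < d_time
        count += close_in_space and close_in_time
    return count

print(knox_statistic(cases, d_space=1.0, d_time=1.0))  # 3: the clustered pairs
```

In practice, the significance of the count is judged by repeatedly permuting the onset times over the case locations, which breaks any space-time linkage while preserving the marginal spatial and temporal patterns.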

What is the Null Hypothesis?

The null hypothesis defines the pattern that one would expect to observe in a data set if there were no clustering. One might want to assume that cases are equally likely to occur at all locations. This is termed a random, uniform risk distribution. However, this assumption of uniform risk is overly simplistic. The total number of expected cases for any small spatial or temporal region is the sum of each individual's chance of being a case, or the individual's risk.

Since population density varies by block, by town, by county, and so forth, risk also varies. To the degree that one can model this variation in risk, the accuracy of the assessment can be improved. Beyond population density, it is known that demographic characteristics such as ethnicity, gender, and age also vary. Beyond demographics, it is known that behaviors and lifestyles vary. These factors—population density, demographics, behaviors, and lifestyles—are called confounding variables because they might be alternative explanations for the observed cases of disease. To the degree that one can capture these sources of variation, one can further improve the accuracy.

For temporal cluster methods, many methods assume the random, uniform risk distribution as the null hypothesis (Grimson and Rose, 1991; Grimson et al., 1992; Larsen et al., 1973; Naus, 1965; Tango, 1984; Wallenstein, 1980). That is, in each equal interval of time one expects to observe the same number of cases, with that number being the total number of observed cases divided by the number of equal time intervals. A modification of this method that allows investigators to accommodate confounding variables uses unequal time intervals in which the size of each time interval is set according to the population size (or some other function of confounding variables), still keeping the number of expected cases equal for each (unequal) time interval (Weinstock, 1981).
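The idea of cutting the time axis into unequal intervals with equal expected counts can be sketched as follows. The per-period populations are invented, and the greedy discrete cut is a simplification of the actual method (Weinstock, 1981).

```python
def equal_expectation_cuts(populations, k):
    """Greedily split consecutive time periods into k spans whose cumulative
    population (and hence expected case count, under a constant rate) is
    approximately equal. A simplified, discrete sketch of the idea."""
    target = sum(populations) / k
    spans, current, total = [], [], 0
    for i, pop in enumerate(populations):
        current.append(i)
        total += pop
        if total >= target * (len(spans) + 1) and len(spans) < k - 1:
            spans.append(current)
            current = []
    spans.append(current)
    return spans

# A growing population: later intervals must cover fewer periods to keep the
# expected number of cases per interval the same.
populations = [100, 100, 200, 200, 400]  # hypothetical per-period populations
print(equal_expectation_cuts(populations, 2))  # [[0, 1, 2, 3], [4]]
```

The single final period carries as much population, and thus as many expected cases, as the first four periods combined.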

Most spatial methods similarly assume the random, uniform risk distribution as the null hypothesis (Cliff and Ord, 1981; Geary, 1954; Grimson and Rose, 1991; Moran, 1948, 1950). Some methods enable researchers to modify the size of the spatial interval to accommodate confounding variables such as population size (Hjalmars et al., 1996; Kulldorff, 1997; Openshaw et al., 1988; Turnbull et al., 1990). Some methods modify the distance between geographic units to reflect the underlying population density by a statistical (Whittemore et al., 1987) or graphical (Schulman et al., 1988) means. One method weights comparisons between nearby units by their relative population densities (Oden, 1995). Others enable investigators to choose a set of subjects without disease (i.e., controls) and then compare the distribution of the cases to that of the controls (Cuzick and Edwards, 1990; Diggle, 1991). One method characterizes the demographics of an entire neighborhood population, which is then used to define the risk of disease for each individual (Day et al., 1988).

Some space-time clustering methods determine whether cases are close in space and time simultaneously, close only in space, close only in time, or close in neither, assuming as the null hypothesis that closeness in space and closeness in time are unrelated (Barton et al., 1965; Klauber, 1975; Knox, 1963, 1964; Mantel, 1967; Marshall, 1991; Pinkel and Nefzger, 1959; Symons et al., 1983; Williams, 1984). Pinkel and coworkers (1963) refined this approach by taking as the null distribution the time differences observed among spatially distant cases and comparing to it the distribution of time differences among spatially close cases. Mantel (1967) extended this approach by using linear regression to compare the time differences between all pairs of cases to their spatial distances, and Jacquez (1995) modified it to compare nearest neighbors in space to those in time. Finally, some methods sample a set of individuals from the same study area but without disease (i.e., controls) and use their distribution as the null hypothesis against which the distribution of cases is compared (Lyon et al., 1981; Pike and Smith, 1974). The latter approach is most able to capture unknown confounding variables.

What is the Alternative Hypothesis?

The alternative hypothesis defines the pattern that one wishes to infer if one rejects the null hypothesis. Statistical methods are not equally powerful against all alternative hypotheses. Typically, certain methods are more sensitive at detecting a particular type of pattern, such as a single, large cluster of cases, whereas others are more sensitive at detecting another type of pattern, such as many small clusters of cases distributed throughout the study area. Other possible alternative hypotheses are that the cases follow the shape of an exposure source, such as high-voltage electric power transmission lines, polluted rivers, or the air dispersion pattern downwind of a single stack. One also could imagine a trend in which the risk of disease falls off gradually with increasing distance from a certain location or the source of exposure. In choosing a method, it is useful to review the results of tests of statistical power (see below) to determine which method is most sensitive for the particular type of pattern being investigated.

What Type of Summary Statistic is Used?

Two main statistical approaches are used by cluster methods. In one, distances between all pairs of events (case-case, case-control, case-exposure, etc.) are calculated and summarized. The summary may be the number of case pairs closer than a certain separation distance, the mean distance between cases, or the mean distance from each case to the nearest other case. Alternatively, the whole distribution of distances may be used. If controls are used, then a summary of the case distances may be compared to a summary of the control distances or even to the case-control distances.
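These pairwise summaries amount to a few lines of arithmetic over the same distance distribution; the case locations and the 1.5-unit closeness threshold below are invented.

```python
import itertools
import math

cases = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (4.0, 4.0)]  # hypothetical

pair_dists = [math.dist(a, b) for a, b in itertools.combinations(cases, 2)]

# Three common one-number summaries of the same distance distribution:
n_close_pairs = sum(d < 1.5 for d in pair_dists)    # pairs within 1.5 units
mean_pair_dist = sum(pair_dists) / len(pair_dists)  # mean case-case distance
mean_nn_dist = sum(                                 # mean nearest-neighbor distance
    min(math.dist(a, b) for b in cases if b is not a) for a in cases
) / len(cases)

print(n_close_pairs, round(mean_pair_dist, 2), round(mean_nn_dist, 2))
```

The one outlying case at (4, 4) inflates the mean pair distance far more than it inflates the nearest-neighbor summary, which is one reason the choice of summary statistic matters.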

In the other approach, called the cell occupancy approach, the study area (i.e., time, space, or space-time) is divided into a set of cells, and each is assigned an expected number of cases on the basis of the overall disease rate, possibly adjusting for population density or other risk factors. Then, the observed number of cases in each cell is compared to the expected number in each cell.
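A cell occupancy comparison, with expectations scaled to each cell's population, might look like the sketch below; the populations and observed counts are invented.

```python
# Cell occupancy: each cell's expected count is its population share of the
# overall disease rate. Populations and observed counts are hypothetical.
populations = [2_000, 8_000, 5_000, 5_000]   # people per cell
observed =    [4,     7,     6,     3]       # cases per cell

rate = sum(observed) / sum(populations)      # overall cases per person
expected = [rate * pop for pop in populations]

for cell, (o, e) in enumerate(zip(observed, expected)):
    print(f"cell {cell}: observed {o}, expected {e:.1f}, O/E {o / e:.2f}")
```

Cell 0 shows twice its expected count; a chi-square statistic computed over these observed and expected values would then judge whether deviations of that size exceed chance.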

Which Method is Most Appropriate for the Cluster Under Investigation?

Two main characteristics are used to evaluate preepidemiologic methods: bias and sensitivity. When one is planning to undertake a preepidemiologic investigation, it is important to evaluate these characteristics so that one appreciates the strengths and limitations of the method. When working with communities, it is at least as important to discuss these characteristics with the community representatives, so that they have an understanding of what the method can and cannot do. If the residents do not appreciate the limitations at the outset and the study gives a negative result, the community representatives may believe that the scientists are hiding something and that they purposefully used a method that could not find a cluster. However, if they understand the limitations at the outset, they can either accept or reject the methodological approach independent of the results, a far more objective and satisfying evaluation. On the other hand, researchers could work with residents to select a method that both is appropriate scientifically and accommodates residents' preferences. These characteristics are now considered.

Bias, the finding of a false effect or the obscuring of a real effect, is a critical factor of concern. For example, if investigators conducted a study of lung cancer incidence around an industrial facility, they might find a cluster of disease near the facility. However, it might be that data on smoking were not available and that a greater proportion of those living close to the facility than of those living farther away were smokers (because of the clustering of people of similar ethnicity or socioeconomic status). Had the investigators been able to adjust for smoking, the cluster might have been fully explained, exonerating the industrial facility. Therefore, by not adjusting the data for smoking patterns, the investigators drew the wrong conclusion. Unfortunately, there is no easy way to test for this type of bias unless data on all possible factors that could cause the disease are available. However, when reliable data on some possible risk factors are available, it is better to use methods that adjust for these data than methods that do not.

Sensitivity (i.e., statistical power) is the ability of a method to detect an effect when it really exists. Sometimes a study is conducted with very few cases; the method may not show statistically significant clustering, even though a real cluster exists, because random variation could produce a similar pattern. Ideally, methods should be extremely sensitive. To evaluate the sensitivity of a method, investigators conduct computer simulations. In a set of simulations, investigators construct a large number of hypothetical data sets, each containing a number of cases that cluster; they then add cases that do not cluster but are distributed randomly. Typically, one constructs several thousand of these hypothetical data sets. Next, the method being evaluated is applied to all of these hypothetical data sets, and the percentage of times that the method detects the cluster is reported. The percentages are then subdivided so that one can determine how the sensitivity of the method decreases as the proportion of random cases increases. Studies that have compared the sensitivities of different methods are described below.
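That simulation procedure can be sketched with a toy method (the maximum cell count over a grid) and a toy alternative hypothesis (a single hot-spot cell). The grid size, case counts, and detection threshold are all invented for illustration.

```python
import random

def max_cell_count(case_cells, n_cells):
    """Toy cluster statistic: the largest number of cases in any one cell."""
    counts = [0] * n_cells
    for c in case_cells:
        counts[c] += 1
    return max(counts)

def estimate_sensitivity(n_cells=25, n_background=20, n_clustered=10,
                         n_datasets=1000, seed=0):
    rng = random.Random(seed)
    total = n_background + n_clustered
    # Null distribution of the statistic: all cases placed at random.
    null_stats = sorted(
        max_cell_count([rng.randrange(n_cells) for _ in range(total)], n_cells)
        for _ in range(n_datasets)
    )
    threshold = null_stats[int(0.95 * n_datasets)]   # ~95th percentile
    # Alternative: random background plus a hot spot of cases in one cell.
    detected = 0
    for _ in range(n_datasets):
        cells = [rng.randrange(n_cells) for _ in range(n_background)]
        cells += [0] * n_clustered                   # the hot-spot cell
        if max_cell_count(cells, n_cells) > threshold:
            detected += 1
    return detected / n_datasets                     # fraction of clusters found

print(f"estimated sensitivity: {estimate_sensitivity():.0%}")
```

Shrinking `n_clustered` while growing `n_background` shows the erosion of sensitivity described above as the proportion of random cases increases.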

In conducting studies to evaluate the sensitivity or statistical power of a method, the investigator sets three important parameters: the number of events, the relative risk, and the alternative hypothesis. Most often, in preepidemiologic investigations, there are a few to a few dozen cases. These are small numbers for any epidemiologic study. Most simulation studies evaluate 25 to 250 cases.

The relative risk is the typical measure of the severity of the hazard. It tells how much more likely a person is to get the disease if the person is exposed to the hazard. Risks of known environmental hazards usually range from about 1 to 5, with unusually strong hazards being as large as 10. For preepidemiologic methods, in which exposure most often is not well characterized, most investigators would like to have sensitivity to detect risks of at least 3 or higher. Some investigators postulate more complex models of epidemicity, particularly for infectious diseases, which preclude translation into simple relative risks. In short, the relative risks used in simulation studies vary widely.
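As a concrete illustration of the relative risk scale discussed above, the ratio can be computed directly from hypothetical case counts (the numbers below are invented for the example):

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk ratio: disease incidence among the exposed divided by
    incidence among the unexposed."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)

# hypothetical counts: 15 cases among 1,000 exposed residents versus
# 5 cases among 1,000 unexposed residents gives a relative risk of 3,
# roughly the smallest risk most preepidemiologic methods hope to detect
rr = relative_risk(15, 1000, 5, 1000)
```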

The alternative hypothesis, as noted above, is the specific pattern that the investigator creates in the hypothetical data sets for the simulations (e.g., a single cluster, multiple clusters, trend, results of models of pollutant dispersion, and results of models of specific disease spread). Some investigators use simplistic models (e.g., single cluster hot spots or trends). These are easy to explain but are unrealistic. Other investigators use complicated models of contagions or the disease process. Still others use data sets that are modified from actual studies. These may capture subtle characteristics of data that may be hard to identify and that are difficult to explain or describe.

The results of statistical power studies provide two types of information. First, for a specific data pattern (e.g., a single cluster or a trend), they allow investigators to compare the sensitivities of methods. Typically, one method is more sensitive for one type of pattern and another method is more sensitive for another type of pattern. Second, they allow investigators to estimate the minimum number of cases needed to detect an effect, given a rather strict set of assumptions. Unfortunately, different investigators often use different sets of assumptions, which result in the need for different minimum numbers of cases.

For example, in studies of time patterns, Naus (1966) found that a method that assesses the interval with the greatest number of events (the scan method [Naus, 1965]) was more sensitive than a method that compares the number of observed cases to the number expected for each time interval in the study (the EMM method [Ederer et al., 1964]) for data sets with very few cases, but the reverse was true for data sets with many cases. Sometimes, the results of these studies are highly dependent on a set of assumptions. Wallenstein and colleagues (1993), in a complementary investigation of the scan method (Naus, 1965), reported that for a hypothetical data set with a relative risk of 4, one needed only 10 cases for a sensitivity of 80 percent, whereas Sahu and colleagues (1993), using the same method with slightly different assumptions about what the pattern looked like, reported that with a relative risk of 4 one needed 50 cases for a sensitivity of 80 percent. Apparently, the scan method is quite sensitive to the definition of the cluster.
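The idea behind the scan method, finding the fixed-width interval containing the most events and judging that maximum against chance, can be sketched as follows. This is a simplified Monte Carlo version under an assumed uniform baseline, not Naus's exact distributional calculation:

```python
import random

def scan_statistic(times, window):
    """Largest number of events falling in any interval of the given width."""
    times = sorted(times)
    best, j = 0, 0
    for i in range(len(times)):
        while times[i] - times[j] > window:
            j += 1
        best = max(best, i - j + 1)
    return best

def scan_p_value(times, window, period=1.0, n_sims=999, seed=2):
    """Monte Carlo p value: how often uniformly scattered events match or
    beat the observed scan statistic."""
    rng = random.Random(seed)
    observed = scan_statistic(times, window)
    hits = sum(scan_statistic([rng.uniform(0, period) for _ in times],
                              window) >= observed
               for _ in range(n_sims))
    return (hits + 1) / (n_sims + 1)
```

A tight run of event times produces a small p value, while the same number of events spread evenly over the study period does not.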

For space-time methods, several investigators have compared two or three methods using specific models of epidemicity (typically infectious diseases) or specific data sets (typically chronic diseases) (Alexander, 1991; Bithell, 1992; Cartwright et al., 1990; Chen et al., 1984; Cuzick and Edwards, 1990; McAuliffe and Afifi, 1984; Raubertas, 1988; Shaw et al., 1988). Although these results highlight certain strengths of the most powerful methods, it is difficult to describe the results so that they are understandable or apply them to situations involving noncontagious diseases.

To investigate both spatial and space-time clustering, Wartenberg and Greenberg (1990a,b, 1992) compared a method using the distance between all cases (Mantel's space-time regression method [Mantel, 1967]), a method comparing the number of observed and expected cases in each space-time interval (the EMM method [Ederer et al., 1964]), and two indices of spatial autocorrelation (Moran's I [Moran, 1948, 1950] and Geary's c [Geary, 1954]). They found that the method comparing observed and expected cases was more sensitive to trends in data than the other methods and that the distance-based method was more sensitive to single clusters. However, even with a relative risk of 5, to achieve adequate power (>80 percent) one needs sample sizes of at least 25 to 40 cases. In short, all methods were relatively insensitive.
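Moran's I, one of the spatial autocorrelation indices compared in these studies, has a compact definition: a cross-product of deviations from the mean rate, taken over pairs of areas the analyst declares to be neighbors. A minimal sketch, with an illustrative adjacency structure:

```python
def morans_i(rates, weights):
    """Moran's I: spatial autocorrelation of area rates, where
    weights[i][j] is 1 if areas i and j are adjacent and 0 otherwise."""
    n = len(rates)
    mean = sum(rates) / n
    dev = [r - mean for r in rates]
    w_sum = sum(weights[i][j] for i in range(n) for j in range(n))
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)

# four areas in a row, each adjacent only to its immediate neighbors
adjacency = [[0, 1, 0, 0],
             [1, 0, 1, 0],
             [0, 1, 0, 1],
             [0, 0, 1, 0]]
# a smooth trend in rates yields positive I; an alternating pattern, negative I
```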

Waller and Lawson (1995) studied the statistical power of focused cluster tests using hypothetical data sets with 51, 150, and 300 people with disease. A type of trend test (the local score tests [Lawson and Williams, 1993; Waller et al., 1994]) did best, achieving 80 percent power with risks of over 2 for the 300-case scenario for more concentrated clusters. However, the methods did not do as well with a risk of 4 for the 51-case situation. For a more diffuse cluster with the same number of total cases, the sensitivity is much greater.

Walter (1992a,b) investigated the possible bias arising from differences in population density (and, hence, the statistical stability of rate estimates) in area data and found a substantial effect. Oden and colleagues (1996), investigating Oden's population adjustment method for spatial autocorrelation (Ipop [Oden, 1995]), found the method to be more powerful than both unadjusted methods (Moran, 1948, 1950) and methods that use individual case data rather than area summaries (Cuzick and Edwards, 1990; Grimson and Rose, 1991; Grimson et al., 1981). The latter result is somewhat surprising and warrants further investigation. Even so, for simulations by this method with 30 cases, the power was moderate at best, generally under 50 percent.

In summary, cluster detection methods work best if there are at least 30 cases, and preferably at least 50 cases. Even so, the ability to detect an excess incidence of disease is dependent both on the choice of method and on the specific nature of the excess disease incidence. Some methods are powerful for one type of pattern (e.g., a single local excess disease incidence or cluster) and not powerful for another type of pattern (e.g., a more general increasing trend in risk or disease incidence with proximity to the source), and vice versa. As with most statistical tests, as one increases the specificity of the alternative hypothesis under investigation, such as with the focused tests, one increases the power for the detection of patterns consistent with that hypothesis while decreasing the power for the detection of patterns that are not consistent with that alternative hypothesis but that are still aberrant.

Two limitations of this whole body of literature are the lack of consistency and comparability of the simulation methods and the narrow focus of most simulation studies. Investigators have used simulated disease patterns, models of disease spread, and examples based on real data sets as alternative hypotheses for testing. Within each of these approaches, small differences in methods make comparisons difficult, and comparisons between these approaches are not direct. Furthermore, investigators usually evaluate only a few methods, and when they do so, most often they use methods designed to be sensitive to the same aspect of the pattern. Typically, the studies do not compare spatial methods to space-time methods to focused methods and do not use a wide array of data that simulate patterns for which each is likely to be the most powerful. By so doing, one could begin to evaluate the limitations of applying the wrong method to a given situation (i.e., compare the costs of specificity across methods). Although such an exercise may not be that interesting theoretically, it would be of substantial value to the practitioner.

How Should the Statistical Significance of an Analysis Be Interpreted?

Statistical interpretations of disease-cluster and small-area analyses are extremely controversial. As demonstrated by participants at the 1989 conference on disease clusters sponsored by the Centers for Disease Control, many investigators are skeptical of the utility and statistical reliability of cluster studies (Neutra, 1990; Rothman, 1990). Some investigators are concerned that cluster studies most often result from community awareness of high disease rates that occur as a result of temporary, random fluctuations. These observations are based on informal evaluations conducted in each community as neighbors talk to one another about their lives. Therefore, those that garner scientists' attention have been screened out of thousands and thousands of potential study areas. From a statistical inference point of view, one must make an adjustment in the calculation of the p value to adjust for the screening of thousands of communities to limit the multiple-comparisons problem and the likely false-positive results (Armon et al., 1991; Neutra, 1990). Others point out that the specificities of the methods may impede detection of clusters not exactly fitting the assumptions of the specific method being used. They recommend against ruling out the existence of a cluster even if the p value exceeds the nominal level (Grimson et al., 1992). In short, the problems of false-positive and false-negative results are far from resolved or even well understood in the community setting (Wartenberg, 1994).

In developing an interpretation of the statistical importance of a disease-cluster or small-area study, one must consider the goal of the evaluation and the use to which the result will be put. If such a study is used for screening (Wartenberg and Greenberg, 1993) or even for the identification of new etiologies (Rothman, 1990), the p values can be thought of as a way of ranking the severity of the clustering rather than as a tool for traditional statistical inference. Then, by using this ranking in conjunction with other information about the observation (e.g., sample size and plausibility in terms of either known exposures or biologic models), situations can be targeted for more rigorous investigation. It is hoped that this further investigation, although expensive, would resolve the concern.

Another instance that raises the issue of p values is the follow-up of an assessment of the presence of an excess incidence of disease. Often, once an excess incidence is detected, prospective observation is instituted, and after several years of additional data collection, the new data are assessed. This is a response to community concerns that provides useful data for assessment. In fact, this is what happened in a study of childhood leukemia in Woburn, Massachusetts (Lagakos et al., 1986), in which an excess incidence of childhood leukemia appears to have persisted, even though the putative exposure, a contaminated drinking water well, was closed. (More epidemiologic research to understand this situation is under way.) In such situations, should one use a more liberal statistical criterion to evaluate the newly collected, prospective data since this community had already shown an excess incidence of childhood leukemia? Certainly, it should not be held to the same statistical standard as a new investigation in a situation with no history of an excess incidence. Should the tests be one tailed, since the a priori hypothesis is that there is an excess? Should the statistical criteria for assessments of disease causation hypotheses differ from those designed to guide public health policy? Should the confirmation of occurrence of a statistically significant excess incidence of disease be required in situations of known exposures before preventive action is undertaken? These questions pose difficult challenges.
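One concrete form the one-tailed question can take is a Poisson test of newly observed cases against the number expected from reference rates. The sketch below is a generic calculation with hypothetical counts, not the Woburn analysis itself:

```python
import math

def poisson_sf(k, mu):
    """One-tailed p value: P(X >= k) for X ~ Poisson(mu), the chance of
    seeing k or more cases when mu cases are expected."""
    return 1 - sum(math.exp(-mu) * mu**i / math.factorial(i)
                   for i in range(k))

# hypothetical follow-up period: 8 new cases where reference rates
# predict 3.0 expected cases, tested one tailed
p = poisson_sf(8, 3.0)  # about 0.012
```

Whether such a follow-up test should use the conventional 0.05 level, a more liberal one, or a one-tailed form is exactly the policy question the text raises; the code only computes the quantity under discussion.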

In the context of environmental justice issues, the interpretations are also complex. These are situations in which there are a priori concerns that known exposures may have resulted in a high incidence of disease. However, there are thousands of such situations throughout the United States, and it is not clear how one should address the same issues of statistical significance testing. On the one hand, since one has an a priori concern, one might argue for one-tailed significance testing. If one is considering public health policy and disease prevention rather than etiologic research, one also might consider increasing the acceptable p value to diminish false-negative results. On the other hand, one might be concerned about false-positive results because of the large number of communities from which data for the study have been drawn. Then one would lower the p value to adjust for the multiple tests that have been carried out, albeit informally.

In short, one should not preclude the use of preepidemiologic studies of communities with environmental justice concerns on the basis of the statistical sensitivity of the methods, as described above. The issues are too complex and too poorly understood to dismiss this potentially useful methodology. The identification of a new etiology (Neutra, 1990), or even of an excess incidence of disease, should not be the only criterion for a successful investigation. Rather, helping to target more in-depth investigations, or even simply bringing community concerns into focus, can be an extremely useful outcome. Such exercises sometimes identify particular exposure pathways that can be addressed whether or not an excess incidence of disease is confirmed. Despite their limitations, preepidemiologic methods are useful for screening and for focusing hypotheses. The examples of preepidemiologic studies described above provide further justification for their successful use with more traditional standards, although many have argued that, in epidemiology, one should not be overly concerned with p values in the context of traditional inference (Rothman, 1990b; Savitz and Olshan, 1995; Thomas et al., 1985).

Research Needs

This summary of methods for the investigation of environmental justice issues has highlighted a number of limitations of these methods and needs for improvement. First and foremost is the need for better access to existing data. As noted above, states have different rules regarding access to individual data records with personal identifiers. Such data, when available, provide the best resource for researchers investigating local problems. These data need to be made available regardless of the researchers' professional affiliation or other political and bureaucratic issues, provided that the request is filed by a bona fide researcher and that adequate steps will be taken by the researcher to protect confidentiality.

Second, there is a need to develop better and more comprehensive data resources. Typically, small communities have data only on births and deaths, and often, these data are available only at the scale of the municipality or the county. Researchers need data for each individual, with residential addresses or specific residence locations, for analysis of disease incidence patterns and assessment of proximity to possible sources of exposure. With the advent of geographic information systems, such analyses are becoming increasingly easy to conduct (Guthe et al., 1992; Rushton et al., 1995, 1996; Wartenberg, 1992, 1994; Wartenberg et al., 1993), although in most places such data are not available or are of insufficient accuracy for meaningful analysis. In addition, it would be useful to have other data, such as cancer incidence and birth defects data, and data on less severe outcomes that might be affected by air pollution, such as asthma incidence and the results of pulmonary function tests. Finally, data on confounding variables, such as behavioral and lifestyle characteristics, would facilitate more rigorous evaluations. Several preepidemiologic methods allow adjustment for confounders if such data are available. These adjustments may reveal hidden associations or explain associations that had erroneously been attributed to exposure to environmental contaminants. If such data are not made available, substantial resources will be needed to undertake investigations.
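The kind of proximity assessment a geographic information system performs can be approximated with geocoded residence coordinates alone. A minimal sketch (the coordinates, source location, and radius are hypothetical):

```python
import math

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two points, in kilometers."""
    earth_radius = 6371.0
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlon / 2) ** 2)
    return 2 * earth_radius * math.asin(math.sqrt(a))

def residences_within(residences, source, radius_km):
    """Geocoded (lat, lon) residences falling within radius_km of a source."""
    return [r for r in residences
            if distance_km(r[0], r[1], source[0], source[1]) <= radius_km]
```

Linking each case or residence to its distance from a suspected source is the basic input both to focused cluster tests and to exposure-by-proximity analyses.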

Third, there is a need for a more systematic evaluation of the statistical properties of the preepidemiologic methods. Investigators need to know what method to use when and what they can and cannot detect. The statistical power studies conducted to date do not provide that type of systematic evaluation. This evaluation could be facilitated by developing a protocol for such comparisons, including the specific data sets, hypothetical and real, that should be used and the test results that should be reported. Then, data or computer programs should be made available to the scientific community for more comprehensive testing of existing methods and for development and testing of new methods. Finally, by compiling the results, one may begin to understand how to use these tools and what interpretations to draw from the results.

Fourth, one must consider the trade-offs involved in the interpretation of statistical significance in preepidemiologic studies. Researchers have argued in different contexts that traditional significance testing is both too liberal and too conservative. There is a need to look at the types of problems for which these methods are used (e.g., identification of new etiologies, identification of specific exposures or excess incidences of disease, replication of observed historical excess incidences of disease, and development of public policy) and to develop guidelines for interpretation for each use. Simply applying the rule that statistical significance is a p value of <0.05, as is currently practiced, does not adequately address the disparity of needs and the variation in the severity of the consequences of false-positive and false-negative results. In epidemiology, results identifying a new etiology become credible only after substantial replication, regardless of the p value.

Failure to identify an exposure that is causing an excess incidence of disease (a false-negative result) will likely lead to additional cases of disease, possibly a high cost to society, while false confirmation of a hazard (a false-positive result) will lead to overprotection, possibly a relatively low cost. Public health policy typically dictates a greater willingness to accept false-positive alarms than false-negative missed diagnoses. However, cluster investigators do not consider these trade-offs.

Fifth, there is a need for the development and testing of methods for combining the results from studies performed at disparate locations (Neutra, 1990). In the past two decades, there has been an explosion in the use of meta-analysis, a method for combining the results of published studies. Although many criticize this methodology by pointing out its limitations, in appropriate contexts the approach has been very powerful. For environmental justice issues, at the current time, meta-analysis is not the appropriate tool. Too few studies have been undertaken, let alone published. However, if one could identify exposures with similar characteristics in several communities, each of which is too small individually to have adequate statistical power, the joint analyses of these communities' data might prove insightful. Such approaches have been tried in other situations in which the exposures are similar (Cardis et al., 1995; Geschwind et al., 1992; Marshall et al., 1995), but the methods and their limitations need to be examined in the context of using preepidemiologic methods and studying issues related to environmental justice. These include small sample sizes, comparability of communities, possible confounding, similarity of exposures, and so forth.
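One standard way to combine results from several small communities, assuming each yields a relative risk estimate with a standard error on the log scale, is inverse-variance (fixed-effect) pooling. The community estimates below are fabricated for illustration, and real joint analyses would also have to address the comparability and confounding issues just noted:

```python
import math

def pooled_log_rr(estimates):
    """Fixed-effect (inverse-variance) pooling of log relative risks.
    estimates: list of (log_relative_risk, standard_error) pairs,
    one per community."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * lrr for (lrr, _), w in zip(estimates, weights)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))

# three hypothetical communities, each too small to be conclusive alone
studies = [(math.log(2.1), 0.6), (math.log(1.8), 0.5), (math.log(2.4), 0.7)]
log_rr, se = pooled_log_rr(studies)
# the pooled standard error is smaller than any single community's,
# so the joint analysis has greater statistical power
```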

In conclusion, when conducting studies of adverse health effects in minority and economically disadvantaged communities, the following considerations apply:

  • When concerns are raised, involve the community in all discussions and explain the limitations of the available methods and the possible outcomes of study.
  • Conduct a preepidemiologic assessment.
  • Use the right method for the job.
  • Be careful in evaluating the results.
  • Conduct an epidemiologic study.

The results of preepidemiologic studies should be evaluated in the context of the answers to the set of questions listed earlier. For example, if a method used is sensitive to trends but not clusters and negative results are obtained, one should not rule out the existence of clusters.

Similarly, results should be evaluated in the context of additional information. For example, if the observed health effect has not previously been reported for the hypothesized exposure, one should be far less confident of an association. If an excess incidence of disease is not observed where and when it was expected, one must question both the hypothesized association between this exposure and the excess incidence of disease and the adequacy of the exposure information. The association may be real, but it may be that researchers misunderstood how people were exposed.

The results also should be evaluated in the context of future risk. For example, if it is likely that people in the community are still being exposed to the suspected agent, one should consider taking remedial action more aggressively than one would if exposure were no longer occurring.

On the basis of these and related issues, the community, scientists, and policymakers must decide whether sufficient information has been gathered, whether exposure can and should be reduced, and whether further study is needed. Further study can mean working with officials to get access to existing databases that were not made available earlier or collecting new data and performing a new analysis.

If further study is needed, a traditional epidemiologic study should be conducted. The design of that study should be determined by the specifics of the situation. For example, if only data at the county level were available previously, the most useful study might try to get data on a more local level. This may require researchers to interview all current and former residents to get disease incidence and risk factor information. On the other hand, it may require working with local officials to break down regional data to the individual level. Alternatively, if data had previously been made available on a local level and there were enough cases for sufficient statistical power, effort might best be spent measuring or quantifying exposures.

An additional option is to implement a surveillance or sentinel investigation program. A surveillance program could be used routinely to develop the data necessary for investigation of health concerns, including ones not currently reported, and then to perform preepidemiologic assessments on a regular basis. This would provide a baseline from which one or several communities could monitor their health status. Appropriate caveats would have to be provided, so that yearly or local fluctuations were not interpreted as meaningful unless they were based on a sufficient number of cases.

In addition, one could develop a sentinel reporting system. This system would record occurrences of easily identifiable health events that may be indicators of environmental exposures but not contained in current reporting systems (Rothwell et al., 1991). However, to be useful, one would have to do more research to determine whether appropriate and useful sentinels can be identified.


  • Abe O. A note on the methodology of Knox's test of “time and space interaction” Biometrics. 1973;29:68–77. [PubMed: 4691051]

  • Alexander FE. Investigations of localised spatial clustering and extra-poison variation. In: Draper G, editor. The Geographical Epidemiology of Childhood Leukemia and Non-Hodgkin's Lymphomas in Great Britain 1966–1983. London: HerMajesty's Stationery Office; 1991.

  • Alexander FE. Space-time clustering of childhood acute lymphoblastic leukaemia: Indirect evidence for a transmissible agent. British Journal of Cancer. 1992;65:589–592. [PMC free article: PMC1977570] [PubMed: 1562468]

  • Armon C, Daube J, O'Brien P, Kurland L, Mulder D. When is an apparent excess of neurologic cases epidemiologically significant? Neurology. 1991;41:1713–1718. [PubMed: 1944897]

  • Bailar JC III, Eisenberg H, Mantel N. Time between pairs of leukemia cases. Cancer. 1970;25:1301–1303. [PubMed: 5422907]

  • Barton DE, David FN, Herrington M. A criterion for testing contagion in time and space. Annals of Human Genetics. 1965;29:97–103.

  • Besag J, Newell J. The detection of clusters in rare diseases. (Series A).Journal of the Royal Statistical Society. 1991;154(part 1):143–155.

  • Bithell JF. Statistical methods for analysing point-source exposures. In: Elliott P, Cuzick J, English D, Stern R, editors. Geographical and Environmental Epidemiology: Methods for Small-Area Studies. New York: Oxford University Press; 1992.

  • Bithell JF, Stone RA. On statistical methods for analysing the geographical distribution of cancer cases near nuclear installations. Journal of Epidemiology and Community Health. 1989;43:79–85. [PMC free article: PMC1052795] [PubMed: 2592896]

  • Cardis E, Gilbert ES, Carpenter L, Howe G, Kato I, Armstrong BK, et al. Effects of low doses and low dose rates of external ionizing radiation: Cancer mortality among nuclear industry workers in three countries. Radiation Research. 1995;142:117–132. [PubMed: 7724726]

  • Cartwright RA, Alexander FE, McKinney PA, Ricketts TJ. Leukemia and Lymphoma: An Atlas of Distribution Within Areas of England and Wales 1984–1988. London: Leukemia Research Fund; 1990.

  • Centers for Disease Control. Pneumocystis Pneumonia—Los Angeles. Morbidity and Mortality Weekly Report. 1981;30:250–252. [PubMed: 6265753]

  • Chen R, Mantel N, Klingberg MA. A study of three techniques for time-space clustering in Hodgkin's disease. Statistics in Medicine. 1984;3:173–184. [PubMed: 6463454]

  • Cliff AD, Ord JK. Spatial Processes: Models and Applications. London: Pion; 1981.

  • Creech JL Jr., Johnson MN. Angiosarcoma of the liver in the manufacture of polyvinyl chloride. Journal of Occupational Medicine. 1974;16:150–151. [PubMed: 4856325]

  • Cuzick J, Edwards R. Spatial clustering for inhomogeneous populations. (Series B).Journal of the Royal Statistical Society. 1990;52:73–104.

  • David FN, Barton DE. Two space-time interaction tests for epidemicity. British Journal of Social Medicine. 1966;20:44–48.

  • Day R, Ware JH, Wartenberg D, Zelen M. An investigation of a reported cancer cluster in Randolph, Massachusetts. Journal of Clinical Epidemiology. 1988;42:137–150. [PubMed: 2918323]

  • Diggle PJ. A point process modeling approach to raised incidence of a rare phenomenon in the vicinity of a pre-specified point. (Series A).Journal of the Royal Statistical Society. 1991;153:349–362.

  • Ederer F, Myers MH, Mantel N. A statistical problem in space and time: Do leukemia cases come in clusters? Biometrics. 1964;20:626–638.

  • Environmental Protection Agency. Environmental equity: Reducing risk for all communities. In: Wolcott RM, Banks WA, editors. Workgroup Report to the Administrator. Vol. 1. Washington, DC: Environmental Protection Agency; 1992. Report EPA230-R-92-008.

  • Geary RC. The contiguity ratio and statistical mapping. The Incorporated Statistician. 1954;5:115–145.

  • Geschwind SA, Stolwijk JAJ, Bracken M, Fitzgerald E, Stark A, Olsen C, et al. Risk of congenital malformations associated with proximity to hazardous waste sites. American Journal of Epidemiology. 1992;135:1197–1207. [PubMed: 1626538]

  • Greenland S. Modeling and variable selection in epidemiologic analysis. American Journal of Public Health. 1989;79:340–349. [PMC free article: PMC1349563] [PubMed: 2916724]

  • Greenland S. Divergent biases in ecologic and individual-level studies. Statistics in Medicine. 1992;11:1209–1223. [PubMed: 1509221]

  • Grimson RC, Rose RD. A versatile test for clustering and a proximity analysis of neurons. Methods of Information in Medicine. 1991;30:299–303. [PubMed: 1762584]

  • Grimson RC, Wang KC, Johnson PWC. Searching for hierarchical clusters of disease: Spatial patterns of sudden infant death syndrome. Social Science and Medicine. 1981;15D:287–293.

  • Grimson RC, Aldrich TE, Drane JW. Clustering in sparse data and an analysis of rhabdomyosarcoma incidence. Statistics in Medicine. 1992;11:761–768. [PubMed: 1594815]

  • Guthe WG, Tucker RK, Murphy EA, England R, Stevenson E, Luckhardt JC. Reassessment of lead exposure in New Jersey using GIS technology. Environmental Research. 1992;59:318–325. [PubMed: 1464285]

  • Herbst A, Ulfelder H, Poskanzer D. Adenocarcinoma of the vagina. Association with maternal stilbestrol therapy with tumor appearance in young women. New England Journal of Medicine. 1971;284:878–881. [PubMed: 5549830]

  • Hjalmars U, Kulldorff M, Gustafsson G, Nagarwalla N. Childhood leukemia in Sweden: Using GIS and a spatial scan statistic for cluster detection. Statistics in Medicine. 1996;15:707–716. [PubMed: 9132898]

  • Jacquez GM. The map comparison problem: Tests for the overlap of geographic boundaries. Statistics in Medicine. 1995;14:2343. [PubMed: 8711274]

  • Jacquez GM, Waller LA, Grimson R, Wartenberg D. The analysis of disease clusters. Part I. State of the art. Infection Control and Hospital Epidemiology. 1996;17:317–327. [PubMed: 8727621]

  • Klauber MR. Space-time clustering analysis: A prospectus. Philadelphia: Society of Industrial and Applied Mathematics; 1975.

  • Knox G. Detection of low intensity epidemicity: Application to cleft lip and palate. British Journal of Preventive and Social Medicine. 1963;17:121–127. [PMC free article: PMC1058905] [PubMed: 14044846]

  • Knox G. The detection of space-time interaction. Applied Statistics. 1964;13:25–29.

  • Knox G. Epidemiology of childhood leukemia in Northumberland and Durham. British Journal of Preventive and Social Medicine. 1964;18:17–24. [PMC free article: PMC1058931] [PubMed: 14117764]

  • Kulldorff M. A spatial scan statistic. Communications in Statistics: Theory and Methods. 1997;26:1481–1496.

  • Lagakos S, Wessen B, Zelen M. An analysis of contaminated well water and health effects in Woburn, Massachusetts. Journal of the American Statistical Society. 1986;81:583–596.

  • Larsen RJ, Holmes CL, Heath CW. A statistical test for measuring unimodal clustering: A description of the test and of its application to cases of acute leukemia in metropolitan Atlanta, Georgia. Biometrics. 1973;29:301–309. [PubMed: 4513628]

  • Lawson AB, Williams F. Applications of extraction mapping in environmental epidemiology. Statistics in Medicine. 1993;12:1249–1258. [PubMed: 8210824]

  • Lyon JL, Klauber MR, Graff W, Chiu G. Cancer clustering around point sources of pollution: Assessment by case-control methodology. Environmental Research. 1981;25:29–34. [PubMed: 7238466]

  • Mantel N. The detection of disease clustering and a generalized regression approach. Cancer Research. 1967;27:209–220. [PubMed: 6018555]

  • Marshall EG, Gensburg LJ, Geary NS, Deres DA, Cayo MR. Analytic Study to Evaluate Associations Between Hazardous Waste Sites and Birth Defects. Atlanta: Agency for Toxic Substances and Disease Registry; 1995.

  • Marshall RJ. A review of methods for the statistical analysis of spatial patterns of disease. (Series A).Journal of the Royal Statistical Society. 1991;154:421–441.

  • McAuliffe TL, Afifi AA. Comparison of nearest neighbor and other approaches to the detection of space-time clustering. Computational Statistics and Data Analysis. 1984;2:125–142.

  • Moran PAP. The interpretation of statistical maps. Journal of the Royal Statistical Society, Series B. 1948;10:243–251.

  • Moran PAP. Notes on continuous stochastic phenomena. Biometrika. 1950;37:17–23. [PubMed: 15420245]

  • Naus JI. The distribution of the size of the maximum cluster of points on a line. Journal of the American Statistical Association. 1965;60:532–538.

  • Naus JI. A power comparison of two tests of non-random clustering. Technometrics. 1966;8:493–517.

  • Neutra R. Counterpoint from a cluster buster. American Journal of Epidemiology. 1990;132:1–8. [PubMed: 2356803]

  • Oden NL. Adjusting Moran's I for population density. Statistics in Medicine. 1995;14:17–26. [PubMed: 7701154]

  • Oden N, Jacquez GM, Grimson R. Realistic power simulations compare point- and area-based disease cluster tests. Statistics in Medicine. 1996;15:783–806. [PubMed: 9132905]

  • Ohno Y, Aoki K, Aoki N. A test of significance for geographic clusters of disease. International Journal of Epidemiology. 1979;8:273–281. [PubMed: 536098]

  • Openshaw S, Craft AW, Charlton M, Birch JM. Investigation of leukaemia clusters by use of a geographical analysis machine. Lancet. 1988;1:272–273. [PubMed: 2893085]

  • Pike MC, Smith PG. Disease clustering: A generalization of Knox's approach to the detection of space-time interactions. Biometrics. 1968;24:541–556. [PubMed: 5686305]

  • Pike MC, Smith PG. A case-control approach to examine diseases for evidence of contagion, including diseases with long latent periods. Biometrics. 1974;30:263–279. [PubMed: 4833139]

  • Pinkel D, Nefzger D. Some epidemiologic features of childhood leukemia in the Buffalo, New York area. Cancer. 1959;12:351–357. [PubMed: 13638954]

  • Pinkel D, Dowd JE, Bross IDJ. Some epidemiological features of malignant solid tumors of children in Buffalo, N.Y. area. Cancer. 1963;16:28–33.

  • Raubertas RF. Spatial and temporal analysis of disease occurrence for detection of clustering. Biometrics. 1988;44:1121–1129. [PubMed: 3069139]

  • Richardson S, Stucker I, Hemon D. Comparison of relative risks obtained in ecological and individual studies: Some methodological considerations. International Journal of Epidemiology. 1987;16:111–120. [PubMed: 3570609]

  • Rothman KJ. A sobering start for the Cluster Buster's Conference. American Journal of Epidemiology. 1990;132(Suppl.):6–13. [PubMed: 2356837]

  • Rothman KJ. No adjustments are needed for multiple comparisons. Epidemiology. 1990;1:43–46. [PubMed: 2081237]

  • Rothwell CJ, Hamilton CB, Leaverton PE. Identification of sentinel health events as indicators of environmental contamination. Environmental Health Perspectives. 1991;94:261–263. [PMC free article: PMC1567956] [PubMed: 1683284]

  • Rushton G, Krishnamurti D, Krishnamurty R, Song H. A geographic information system analysis of urban infant mortality rates. Geographical Information Systems. 1995;5:52–56.

  • Rushton G, Krishnamurty R, Krishnamurti D, Lolonis P, Song H. The spatial relationship between infant mortality and birth defect rates in a U.S. city. Statistics in Medicine. 1996;15:1907–1919. [PubMed: 8888483]

  • Sahu SK, Bendel RB, Sison CP. Effect of relative risk and cluster configuration on the power of the one-dimensional scan statistic. Statistics in Medicine. 1993;12:1853–1865. [PubMed: 8272666]

  • Savitz DA, Olshan AF. Multiple comparisons and related issues in the interpretation of epidemiologic data. American Journal of Epidemiology. 1995;142:904–908. [PubMed: 7572970]

  • Schulman J, Selvin S, Merrill DW. Density equalized map projections: A method for analysing clustering around a fixed point. Statistics in Medicine. 1988;7:491–505. [PubMed: 3368676]

  • Sexton K, Olden K, Johnson BL. Environmental justice: The central role of research in establishing a credible scientific foundation for informed decisionmaking. Toxicology and Industrial Health. 1993;9:685–727. [PubMed: 8184441]

  • Shaw GM, Selvin S, Swan SH, Merrill DW, Schulman J. An examination of three spatial disease clustering methodologies. International Journal of Epidemiology. 1988;17:913–919. [PubMed: 3225103]

  • Snow J. Snow on Cholera. New York: Hafner; 1965.

  • Stone RA. Investigations of excess environmental risks around putative sources: Statistical problems and a proposed test. Statistics in Medicine. 1988;7:649–660. [PubMed: 3406597]

  • Susser M. The logic in ecological studies: I. The logic of analysis. American Journal of Public Health. 1994;84:825–829. [PMC free article: PMC1615050] [PubMed: 8179056]

  • Susser M. The logic in ecological studies: II. The logic of design. American Journal of Public Health. 1994;84:830–835. [PMC free article: PMC1615022] [PubMed: 8179057]

  • Symons MJ, Grimson RC, Yuan YC. Clustering of rare events. Biometrics. 1983;39:193–205. [PubMed: 6871348]

  • Tango T. The detection of disease clusters in time. Biometrics. 1984;40:15–26. [PubMed: 6733224]

  • Thomas D, Siemiatycki J, Dewar R, Robins J, Goldberg M, Armstrong B. The problem of multiple inference in studies designed to generate hypotheses. American Journal of Epidemiology. 1985;122:1080–1095. [PubMed: 4061442]

  • Turnbull BW, Iwano EJ, Burnett WJ, Howe HL, Clark LC. Monitoring for clusters of disease: Application to leukemia incidence in upstate New York. American Journal of Epidemiology. 1990;132:S14–S22. [PubMed: 2356825]

  • Wallenstein S. A test for detection of clustering over time. American Journal of Epidemiology. 1980;104:576–584. [PubMed: 7361759]

  • Wallenstein S, Naus J, Glaz J. Power of the scan statistic for detection of clustering. Statistics in Medicine. 1993;12:1829–1843. [PubMed: 8272664]

  • Waller LA, Jacquez GM. Disease models implicit in statistical tests of disease clustering. Epidemiology. 1995;6:584–590. [PubMed: 8589088]

  • Waller LA, Lawson AB. The power of focused tests to detect disease clustering. Statistics in Medicine. 1995;14:2291–2308. [PubMed: 8711270]

  • Waller LA, Turnbull BW, Clark LC, Nasca P. Spatial pattern analyses to detect rare disease clusters. In: Lange N, Ryan L, editors. Case Studies in Biometry. New York: Wiley; 1994.

  • Walter SD. The analysis of regional patterns in health data. I. Distributional considerations. American Journal of Epidemiology. 1992;136:730–741. [PubMed: 1442739]

  • Walter SD. The analysis of regional patterns in health data. II. The power to detect environmental effects. American Journal of Epidemiology. 1992;136:742–759. [PubMed: 1442740]

  • Wartenberg D. Screening for lead exposure using a geographic information system. Environmental Research. 1992;59:310–317. [PubMed: 1464284]

  • Wartenberg D. When Is a Cluster Really a Cluster? The Competing Agendas of Science, Society and Social Programs. Environmental Epidemiology: Science for Society or Science in Society? Hamilton, Ontario, Canada: McMaster University; 1994.

  • Wartenberg D. Use of geographic information systems for risk screening and epidemiology. In: Andrews JS, Frumkin H Jr., Johnson BL, Mehlman MA, Xintaras C, Bucsela JA, editors. Princeton, NJ: Princeton Scientific Publishing Co.; 1994.

  • Wartenberg D, Greenberg M. Methodological problems in investigating disease clusters. The Science of the Total Environment. 1992;127:173–185. [PubMed: 1480954]

  • Wartenberg D, Greenberg M. Spatial models for detecting clusters of disease. In: Thomas R, editor. Spatial Epidemiology. London: Pion; 1990.

  • Wartenberg D, Greenberg M. Solving the cluster puzzle: Clues to follow and pitfalls to avoid. Statistics in Medicine. 1993;12:1763–1770. [PubMed: 8272659]

  • Wartenberg D, Greenberg M, Lathrop R. Identification and characterization of populations living near high voltage transmission lines: A pilot study. Environmental Health Perspectives. 1993;101:626–631. [PMC free article: PMC1519884] [PubMed: 8143596]

  • Wartenberg D, Kipen HM, Scully PF, Greenberg M. Racial oversight in occupational cancer epidemiology: A review of published studies. In: Johnson BL, Williams RC, Harris CM, editors. National Minority Health Conference: Focus on Environmental Contamination. Princeton, NJ: Princeton Scientific Publishing Co., Inc.; 1992. pp. 137–147.

  • Weinstock MA. A generalized scan statistic test for the detection of clusters. International Journal of Epidemiology. 1981;10:289–293. [PubMed: 7287289]

  • Whittemore AS, Friend N, Brown BW, Holly EA. A test to detect clusters of disease. Biometrika. 1987;74:631–635.

  • Whorton D, Krauss R, Marshal S, Milby T. Infertility in male pesticide workers. Lancet. 1977;2:1259–1261. [PubMed: 73955]

  • Williams GW. Time-space clustering of disease. In: Cornell RG, editor. Statistical Methods for Cancer Studies. New York: Dekker; 1984.

  • Zimmerman R. Social equity and environmental risk. Risk Analysis. 1993;13:649–666.