Copyright © 2007 Wheeler; licensee BioMed Central Ltd.

A comparison of spatial clustering and cluster detection techniques for childhood leukemia incidence in Ohio, 1996 – 2003

Department of Biostatistics, Emory University, Atlanta, GA, USA
Corresponding author. David C Wheeler: dcwheel/at/sph.emory.edu

Received January 16, 2007; Accepted March 27, 2007.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

Spatial cluster detection is an important tool in cancer surveillance to identify areas of elevated risk and to generate hypotheses about cancer etiology. There are many cluster detection methods used in spatial epidemiology to investigate suspicious groupings of cancer occurrences in regional count data and case-control data, where controls are sampled from the at-risk population. Numerous studies in the literature have focused on childhood leukemia because of its relatively large incidence among children compared with other malignant diseases and substantial public concern over elevated leukemia incidence. The main focus of this paper is an analysis of the spatial distribution of leukemia incidence among children from 0 to 14 years of age in Ohio from 1996–2003 using individual case data from the Ohio Cancer Incidence Surveillance System (OCISS). Specifically, we explore whether there is statistically significant global clustering and if there are statistically significant local clusters of individual leukemia cases in Ohio using numerous published methods of spatial cluster detection, including spatial point process summary methods, a nearest neighbor method, and a local rate scanning method.
We use the K function, Cuzick and Edwards' method, and the kernel intensity function to test for significant global clustering and the kernel intensity function and Kulldorff's spatial scan statistic in SaTScan to test for significant local clusters.

Results

We found some evidence, although inconclusive, of significant local clusters in childhood leukemia in Ohio, but no significant overall clustering. The findings from the local cluster detection analyses are not consistent across the cluster detection techniques: the spatial scan method in SaTScan does not find statistically significant local clusters, while the kernel intensity function method suggests statistically significant clusters in areas of central, southern, and eastern Ohio. The findings are consistent across the tests of global clustering, where no significant clustering is demonstrated with any of the techniques when cases of all ages are considered together.

Conclusion

This comparative study for childhood leukemia clustering and clusters in Ohio revealed several research issues in practical spatial cluster detection. Among them, flexibility in cluster shape detection should be an issue for consideration.

Background

Spatial cluster detection is an important tool in cancer surveillance to identify areas of elevated risk and to generate subsequent hypotheses about cancer etiology. A spatial disease cluster may be defined as an area with an unusually elevated disease incidence rate [1,2]. There are several cluster detection methods used in spatial epidemiology to investigate apparently suspicious groupings of cancer occurrences in both regional count data and case-control data, where the controls are often sampled from the at-risk population and are used to estimate local relative risk or local rates, depending on the method utilized.
Numerous studies [3,4] in the literature have focused on childhood leukemia because of its relatively large incidence among children compared with other malignant diseases, its apparent tendency to cluster, and the substantial public concern over locally elevated leukemia incidence. Many cluster-inducing factors have been considered in the literature on leukemia, including infectious agents [5] and population mixing[6,7], environmental pollution [8], such as benzene [9], pesticides [10], and radiation [11], and geographic variation in other risk factors, such as inherited genetic risk [12], maternal alcohol consumption and cigarette smoking [13], and socioeconomic status [14]. There are many studies of potential cancer clusters in the literature, and the reader is referred to two useful reviews [15,16]. In this paper, we present an empirical analysis of the spatial distribution of leukemia incidence among children from 0 to 14 years of age in Ohio from 1996–2003 using individual case data from the Ohio Cancer Incidence Surveillance System (OCISS) in response to public concern of potentially elevated cancer risk among children in areas of Ohio. There has been no previous comprehensive and systematic spatial analysis of potential clustering of childhood leukemia in Ohio. Other studies [7,17] of potential clusters of childhood leukemia in Ohio do not include spatial analysis methods or individual case data, and instead typically use chi-square tests of differences in expected and observed case counts in census or political units. This approach is not expressly a test for clustering or clusters, but a test of elevated counts inside an often heterogeneously populated area, for example, a county, and the test for one area is considered independently of other areas. This approach does not consider if areas with significantly more cases than expected are spatially juxtaposed [18,19]. 
We choose not to use aggregated case data at the census level because we have access to individual case and control data, want to avoid unstable regional rates caused by small observed case counts and small population counts [20,21], and want to avoid the modifiable areal unit problem (MAUP) [19] arising from using political boundaries that are arbitrarily related to public health. More specifically, we explore whether there is statistically significant global clustering and whether there are statistically significant local clusters of individual leukemia cases using numerous published methods of spatial cluster detection. We, therefore, address the questions of whether childhood leukemia cases have a significant tendency to cluster in Ohio and where the most unusual groupings of cases, if any, are located. The evaluation of the null hypothesis of no significant global spatial clustering of childhood leukemia uses three different methods: the K function, the kernel intensity function, and Cuzick and Edwards' method. See Waller and Jacquez [22] for a discussion of hypotheses in tests for disease clustering. We evaluate the null hypothesis of no local areas of elevated childhood leukemia risk using the kernel intensity function and Kulldorff's scan statistic. The distinction between clustering and cluster detection tests has been made in the literature [1,19,23-25], and we follow that distinction in this paper. Clustering and cluster detection tests are viewed as complementary, as they test different hypotheses. A simulation study by Waller et al. [1] indicated that it is possible to have a significant cluster, but no overall significant clustering. In spatial point processes, the first-order property (intensity function) of the process is used for a test of clusters and the second-order property (K function) is used as a test for global clustering [19].
Our comparison of cluster detection methods is similar in spirit to Griffith's comparison of disease mapping techniques for West Nile Virus [26], and is motivated by the numerous and diverse analytical options currently available to cancer prevention researchers investigating potential clusters with case-control data. There have been methodological comparison papers in the literature for spatial cluster detection [27-31], but none exclusively for individual level data. Our set of methods to compare in this paper includes the leading published methods designed for individual level case data that are currently implemented in publicly available software. We use R software [32] to implement the K function and kernel intensity function, ClusterSeer software [33] for Cuzick and Edwards' method, and SaTScan [34] for Kulldorff's scan statistic. The reader interested in a comparison of general functionality of free software that may be used for cluster analysis is referred to a review by Anselin [35], although not all features compared in the review are expressly for individual case data. We next briefly review each of the clustering and cluster detection techniques and then present and compare the findings from them.

Methods

Data

In the subsequent analysis, we use 738 individual OCISS cases diagnosed between 1996 and 2003, geocoded to the street level using geographic information system (GIS) software from ESRI [36]. The use of the cancer data in this study was approved by the Ohio Department of Health Institutional Review Board. The childhood (0–14) leukemia rate for Ohio for the years 1996–2003 was 4.2 per 100,000 persons, compared to the SEER rate of 4.8 per 100,000 persons [37]. The completeness of incidence data in OCISS varies by year; for example, completeness was 85% in 1996, 92% in 1998, and 95% in 1999 [38].
We excluded cases from the analysis that were not address matched to the street level and were matched only to the ZIP Code centroid level. There were 86 cases that were matched to the centroid level and omitted to avoid inducing spurious clustering. A map of these cases showed an essentially random pattern across Ohio, neither occurring in exclusively urban or rural areas, and the lack of pattern or concentration in the cases helped to justify removing them from the study. As stated earlier, this paper focuses on a spatial case-control study, which requires controls sampled from the at-risk population for leukemia that did not develop leukemia during the same time period of births as the reported cases. We used as controls births sampled from the Ohio Vital Statistics (OVS) records where there were digital files available, from 1989–2003, which contains most of the possible birth years of cases (1982–2003). More specifically, we began with 21,906 randomly sampled birth records from OVS that were geocoded to the street level and then systematically sampled 7,302 records as controls, selecting every third record where the birth records were ordered by longitude and latitude. Presumably, any rural bias in the failure to locate addresses in the geocoding process would affect both cases and controls, so any impact in the analysis presented here is likely slight. The systematic sampling scheme was employed to provide a geographically representative sample of the at-risk population and resulted in a control-case ratio of approximately 10 to 1. Visual comparison of the controls and the larger set of birth records suggested the controls were a spatially representative sample. The control-case ratio used was a compromise between using as many controls as possible and computation considerations for certain methods. 
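The systematic sampling step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the coordinates are synthetic, the function name `systematic_sample` is hypothetical, and the source does not say which record within each group of three was kept, so the starting offset here is arbitrary.

```python
def systematic_sample(records, step=3, start=0):
    """Order (longitude, latitude) records and keep every `step`-th one.

    `start` is an assumption: the text only says every third record was taken.
    """
    ordered = sorted(records, key=lambda r: (r[0], r[1]))
    return ordered[start::step]


# Synthetic stand-ins for the 21,906 geocoded birth records (not real data).
births = [(-84.5 + 0.0001 * i, 39.1 + 0.005 * (i % 7)) for i in range(21906)]
controls = systematic_sample(births)

n_cases = 738  # number of cases reported in the text
print(len(controls), round(len(controls) / n_cases, 1))
```

With 21,906 records and a step of three, the sample has 7,302 controls, which gives the control-case ratio of roughly 10 to 1 quoted in the text.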
The idea of using as many controls as possible draws from Peter Diggle's comments in his written discussion of Cuzick and Edwards' paper [39] introducing their nearest neighbor test for clustering. In fact, in a preliminary analysis with the Cuzick and Edwards method we used a control-case ratio of 3 to 1 to align with traditional case-control studies in epidemiology, but found significant clustering at small distances that appeared to be due to a lack of an adequate number of controls in some rural areas. A visual display of the controls using this ratio suggested that controls underrepresented the at-risk population in some rural areas. The ideal number of controls to use relative to the number of observed cases and the underlying population structure is an important issue left for future research. A map of the sampled controls from a 10 to 1 ratio of controls to cases shows a pattern that appears to better approximate the general distribution of population in Ohio.

Figure 1
Results

K function

The K function is a method introduced by Ripley [41] for testing for general clustering in a point pattern. It measures how many events occur within a certain distance of other events. A simple formula for the K function is K(h) = (average number of events within distance h of a randomly chosen event)/(average number of events per unit area). Also see Diggle [42] and Waller and Gotway [19] for a detailed discussion of the K function. The K function uses a vector of distances h to calculate the function many times at a range of distances in the study area. One can calculate a transformation,
Fortunately, when using the K function, one can calculate a difference of K functions for cases and controls to detect differences in patterns in the two point processes. The simple formula for doing so is KD(h) = Kcases(h) - Kcontrols(h). For this difference in cases and controls, one can calculate confidence bands using Monte Carlo randomization to evaluate significance of any differences in patterning. To do so, one first conditions on the locations of cases and controls, randomizes the case labels among the locations, and then calculates the test statistic KD(h) at a range of distances. This procedure is performed a set number of times and the test statistic from the original data is compared to the upper 97.5% limit of the test statistic values from the Monte Carlo randomizations to assess significance.

Figure 3
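The random labeling procedure for KD(h) can be sketched as follows. This is a minimal illustration, not the R code used in the study: it assumes a naive K estimator with no edge correction, a single distance h, and hypothetical function names (`k_function`, `kd`, `kd_upper_limit`).

```python
import math
import random


def k_function(points, h, area):
    """Naive K estimate (no edge correction): ordered pairs within h, scaled by area."""
    n = len(points)
    pairs = 0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= h:
                pairs += 1
    return area * pairs / (n * (n - 1))


def kd(points, is_case, h, area):
    """KD(h) = K_cases(h) - K_controls(h) for one labeling of the locations."""
    cases = [p for p, c in zip(points, is_case) if c]
    controls = [p for p, c in zip(points, is_case) if not c]
    return k_function(cases, h, area) - k_function(controls, h, area)


def kd_upper_limit(points, is_case, h, area, n_sim=199, q=0.975, seed=0):
    """Upper q limit of KD(h) when case labels are shuffled over fixed locations."""
    rng = random.Random(seed)
    labels = list(is_case)
    sims = []
    for _ in range(n_sim):
        rng.shuffle(labels)
        sims.append(kd(points, labels, h, area))
    sims.sort()
    return sims[min(n_sim, math.ceil(q * n_sim)) - 1]


# Synthetic example on the unit square (not the Ohio data).
rng = random.Random(42)
pts = [(rng.random(), rng.random()) for _ in range(80)]
labels = [i < 20 for i in range(80)]
observed = kd(pts, labels, h=0.1, area=1.0)
limit = kd_upper_limit(pts, labels, h=0.1, area=1.0)
print(observed > limit)  # True would suggest excess case clustering at this h
```

In practice the comparison is repeated over a vector of distances, and an edge-corrected estimator would normally replace the naive one used here.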
Kernel intensity function

While the K function is designed to test for clustering, the kernel intensity function introduced by Kelsall and Diggle [45] can be used to test for clustering and the presence and location of local clusters. In fact, it is the only test in this comparison that can explicitly evaluate both conditions. The kernel intensity function calculates the number of events expected in an area at location s (intensity) or the probability of an event occurring at location s (density) using a kernel function. The intensity and density functions are proportional and are often used interchangeably in practice [19]. The kernel function requires a bandwidth that determines the size of the kernel and the overall smoothness of the resulting estimate. In a Gaussian kernel, which we make use of in this study, the bandwidth corresponds to the standard deviation and larger bandwidths result in smoother kernel intensity functions. We use Scott's [46] rule for optimal bandwidth selection in a Gaussian kernel, where Scott's rule considers the number of events and spatial variance of events in a point pattern when calculating the bandwidth. The two-dimensional Gaussian kernel we use has a bandwidth in both the u and v directions, where the map coordinates are in the form of (u, v). Applying Scott's rule to the Ohio data results in bandwidths of 34,627 meters in the u direction and 30,882 meters in the v direction for cases and bandwidths of 23,753 meters in the u direction and 21,541 meters in the v direction for controls. The kernel function uses distance between a location s and all other points as input to calculate an intensity function at s. We evaluate the kernel function at each point on a 40 × 40 grid that completely contains the study area, where the distance between adjacent grid points is approximately 11,619 meters.

Figure 4
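As a rough illustration of the bandwidth and intensity calculations described above, the sketch below uses one common form of Scott's rule for two dimensions, h = sd × n^(-1/6), applied separately to the u and v coordinates, and a product Gaussian kernel with no edge correction. The exact variant used in the study is not stated, so the formula, the use of the sample standard deviation, and the function names are assumptions.

```python
import math
import statistics


def scott_bandwidth(coords):
    """Scott's rule in two dimensions for one coordinate axis: sd * n^(-1/6)."""
    n = len(coords)
    return statistics.stdev(coords) * n ** (-1.0 / 6.0)


def kernel_intensity(s, points, hu, hv):
    """Gaussian kernel intensity at location s = (u, v), without edge correction."""
    u0, v0 = s
    total = 0.0
    for u, v in points:
        z = ((u0 - u) / hu) ** 2 + ((v0 - v) / hv) ** 2
        total += math.exp(-0.5 * z)
    return total / (2.0 * math.pi * hu * hv)


def make_grid(u_min, u_max, v_min, v_max, size=40):
    """size x size grid of evaluation points covering a bounding box."""
    du = (u_max - u_min) / (size - 1)
    dv = (v_max - v_min) / (size - 1)
    return [(u_min + i * du, v_min + j * dv) for i in range(size) for j in range(size)]


# Synthetic points in meters (not the Ohio data).
events = [(1000.0 * (i % 10), 1500.0 * (i // 10)) for i in range(50)]
hu = scott_bandwidth([u for u, _ in events])
hv = scott_bandwidth([v for _, v in events])
grid = make_grid(0.0, 9000.0, 0.0, 6000.0)
surface = [kernel_intensity(s, events, hu, hv) for s in grid]
print(len(surface), hu > 0.0, hv > 0.0)
```

Summing the kernel over events gives an intensity; dividing by the number of events would give the corresponding density, which is why the two are proportional.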
Conveniently, one can calculate a log ratio of kernel intensity functions for cases and controls to get a log relative risk at a location on the grid. When considering all grid points that cover the study area, this yields a log relative risk surface. To calculate this log relative risk surface, we first redefine the kernel bandwidth for the kernel intensity function ratio because it is beneficial to have the same kernel bandwidth in both cases and controls in order to have an equal spatial extent covered in the numerator and denominator of the ratio. We initially choose for a kernel bandwidth in both dimensions the mean of the control optimal bandwidths calculated previously, which is 22,647 meters. We favor the controls in this bandwidth selection because there are many more of them than cases and they should in theory reflect the underlying population distribution. This bandwidth yields a smaller kernel than with the cases, and will reveal more detail in the estimated kernel intensity function but will also be more variable. With the kernel intensity function ratio, one can again use Monte Carlo randomization of the case labels to detect significant local differences in case and control intensities.

Figure 5
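The log relative risk at a single location, with a pointwise random labeling test, might be sketched as below. The common bandwidth `h`, the function names, and the one-sided pointwise p-value are illustrative assumptions; the study's R implementation and its handling of multiple grid points are not described in enough detail to reproduce here.

```python
import math
import random


def intensity(s, points, h):
    """Isotropic Gaussian kernel intensity at s with a common bandwidth h."""
    total = 0.0
    for u, v in points:
        d2 = (s[0] - u) ** 2 + (s[1] - v) ** 2
        total += math.exp(-0.5 * d2 / (h * h))
    return total / (2.0 * math.pi * h * h)


def log_relative_risk(s, cases, controls, h):
    """Log ratio of case to control kernel intensities at location s."""
    return math.log(intensity(s, cases, h) / intensity(s, controls, h))


def split(points, labels):
    cases = [p for p, c in zip(points, labels) if c]
    controls = [p for p, c in zip(points, labels) if not c]
    return cases, controls


def local_p_value(s, points, is_case, h, n_sim=99, seed=0):
    """One-sided Monte Carlo p-value for an elevated log relative risk at s."""
    rng = random.Random(seed)
    cases, controls = split(points, is_case)
    observed = log_relative_risk(s, cases, controls, h)
    labels = list(is_case)
    at_least = 0
    for _ in range(n_sim):
        rng.shuffle(labels)  # random labeling: locations fixed, case labels permuted
        sim_cases, sim_controls = split(points, labels)
        if log_relative_risk(s, sim_cases, sim_controls, h) >= observed:
            at_least += 1
    return (at_least + 1) / (n_sim + 1)


# Synthetic example: uniform controls plus a small knot of extra cases near (0.5, 0.5).
rng = random.Random(1)
controls_xy = [(rng.random(), rng.random()) for _ in range(200)]
cases_xy = [(rng.random(), rng.random()) for _ in range(20)]
cases_xy += [(0.5 + 0.02 * rng.random(), 0.5 + 0.02 * rng.random()) for _ in range(10)]
all_xy = cases_xy + controls_xy
flags = [True] * len(cases_xy) + [False] * len(controls_xy)
p = local_p_value((0.5, 0.5), all_xy, flags, h=0.1)
print(0.0 < p <= 1.0)
```

Because the intensities are not divided by the number of cases and controls, the surface is shifted by a constant relative to a log density ratio; the ranking under random labeling is unaffected.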
Cuzick and Edwards' method

Similar to the K function, Cuzick and Edwards' method [39] tests for clustering in a point pattern. It measures the tendency of a point process to cluster at certain specified numbers of nearest neighbors and asks if there are more cases than expected under random labeling in the k locations nearest each case. Cuzick and Edwards' method counts the number of cases within k nearest case and control neighbors of each case and sums these counts to make one test statistic T(k) for each k. In practice, this method requires specification of the k nearest neighbors in advance, and, typically, one would specify a range of k nearest neighbors to use. In this case, there is an adjustment of the overall p-value, using both the Bonferroni and Simes adjustments, to reflect the multiple nearest neighbor tests. The Bonferroni adjustment is pB = n·min[pi] and the Simes adjustment is pS = min[(n-i+1)·pi], where n is the number of tests, pi is the p-value for the ith test, and i is the test index, which is sequential from lowest to highest p-value for the Simes adjustment [33]. ClusterSeer software uses Monte Carlo randomization of the case labels among the given locations and also a normal approximation to evaluate significance of each nearest neighbor test statistic [33]. Some of the results of applying Cuzick and Edwards' method to all leukemia cases in the dataset are listed in Table 1. We specified 10 nearest neighbor tests, using k from 1 to 10 and used 4999 Monte Carlo randomizations to evaluate the overall p-value. As listed in the table, neither the normal approximation nor the Monte Carlo p-values indicate significant tests for the ten levels of nearest neighbors. The tests for k = 6 and k = 7 are somewhat close to significant (at the 0.05 level) with the Monte Carlo randomization assessment and k = 7 and k = 8 are also close to significant with the normal approximation.
The overall Bonferroni and Simes p-values for the normal approximation are 0.73 and 0.22, respectively, where the Bonferroni is overly conservative and Simes is less conservative. The overall Bonferroni and Simes p-values for the Monte Carlo randomizations are 0.70 and 0.17, respectively. These values indicate that there is no clustering of cases among nearest neighbors in all of the leukemia cases.

Figure 8
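The nearest neighbor statistic T(k) and the two adjustments quoted above can be sketched as follows. This is not ClusterSeer's implementation: tie handling among equidistant neighbors, the cap of the Bonferroni value at 1, and all function names are assumptions, and the Simes function simply encodes the formula as it is written in the text.

```python
import math
import random


def cuzick_edwards_t(points, is_case, k):
    """T(k): for each case, count cases among its k nearest neighbors, then sum."""
    total = 0
    for i, p in enumerate(points):
        if not is_case[i]:
            continue
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))  # ties keep index order
        total += sum(1 for j in others[:k] if is_case[j])
    return total


def monte_carlo_p(points, is_case, k, n_sim=99, seed=0):
    """Upper-tail p-value for T(k) under random labeling of fixed locations."""
    rng = random.Random(seed)
    observed = cuzick_edwards_t(points, is_case, k)
    labels = list(is_case)
    at_least = 0
    for _ in range(n_sim):
        rng.shuffle(labels)
        if cuzick_edwards_t(points, labels, k) >= observed:
            at_least += 1
    return (at_least + 1) / (n_sim + 1)


def bonferroni(p_values):
    """pB = n * min(p_i), capped at 1 (the cap is an assumption)."""
    return min(1.0, len(p_values) * min(p_values))


def simes_as_stated(p_values):
    """pS = min[(n - i + 1) * p_i], with i indexing p-values from lowest to highest."""
    n = len(p_values)
    ordered = sorted(p_values)
    return min((n - i + 1) * p for i, p in enumerate(ordered, start=1))


# Tiny synthetic example: three cases close together, two distant controls.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
labels = [True, True, True, False, False]
print(cuzick_edwards_t(pts, labels, 1), cuzick_edwards_t(pts, labels, 2))
```

A full analysis would compute one p-value per k over the chosen range and pass that list to the two adjustment functions to obtain the overall values.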
We next applied the Cuzick and Edwards method to subsets of the case data, using three sets for ages 0–4, 5–9, and 10–14 and one for ALL type cases. In the interest of space, we report only the summary of each subset analysis. There was no overall significant clustering or significant clustering at any level of k for cases age 0–4. There was significant clustering for cases age 5–9 with k = 7 (p-value = 0.04), but no overall significant clustering. There was no overall significant clustering or significant clustering for cases age 10–14. There was significant clustering for cases of type ALL with all ages with k = 6 (p-value = 0.048), but no overall significant clustering. The results suggest some clustering at six or seven nearest neighbors, depending on the subset of cases, but no overall clustering, regardless of the set of cases. The relevance of nearest neighborhood structures of size six or seven for some leukemia cases is unknown at this point in time, but could be a subject of future inquiry with a credible hypothesis. However, there may not be a factor that can be quantified to explain the significance of this apparent structure.

SaTScan

Kulldorff's scan statistic [47] as implemented in SaTScan software is explicitly a test for clusters, as noted in [1,33,34,48]. For case-control data, it calculates local rates inside scanning circles of various sizes using the Bernoulli model, where cases are designated as ones and controls are designated by zeros. SaTScan places circles at each case and control, ranging in radius from the smallest inter-event distance to typically the distance that contains half the population in the study area, and calculates a likelihood ratio test of each potential cluster, where the likelihood ratio test compares the alternative hypothesis that there is an increased risk of disease inside the circle with the null hypothesis that the disease risk is the same inside and outside the circle.
The circle with the maximum likelihood is the most likely cluster. SaTScan calculates the p-value of the most likely cluster using the likelihood ratio test and repeated Monte Carlo randomizations of the case labels. The rank of the most likely likelihood ratio test among all randomization tests determines the p-value. As output, SaTScan reports the most likely cluster and secondary clusters, along with the corresponding significance values. The scan statistic in SaTScan has been applied to Poisson distributed count data [1,49], in addition to Bernoulli case-control data [19]. We applied Kulldorff's scan statistic in SaTScan to all of the cases and then the same four case subsets described in the Cuzick and Edwards' method section. The most likely cluster found by SaTScan using all of the cases is displayed in Figure 9.
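The circular scan under the Bernoulli model can be sketched as below. This is an illustrative reimplementation under stated assumptions, not SaTScan itself: circles are grown one point at a time around every location, ties in distance are ignored, only the single most likely cluster is tracked, and the function names are hypothetical.

```python
import math
import random


def xlogy(x, y):
    """x * log(y), treating 0 * log(0) as 0."""
    return 0.0 if x == 0 else x * math.log(y)


def bernoulli_llr(c, n, C, N):
    """Log likelihood ratio for a circle with c cases among n points (C of N overall)."""
    if n == 0 or n == N or c / n <= (C - c) / (N - n):
        return 0.0  # only elevated risk inside the circle counts
    inside = xlogy(c, c / n) + xlogy(n - c, (n - c) / n)
    rest = N - n
    outside = xlogy(C - c, (C - c) / rest) + xlogy(rest - (C - c), (rest - (C - c)) / rest)
    null = xlogy(C, C / N) + xlogy(N - C, (N - C) / N)
    return inside + outside - null


def most_likely_cluster(points, is_case, max_fraction=0.5):
    """Return (llr, center, radius) of the circle with the largest likelihood ratio."""
    N, C = len(points), sum(is_case)
    best = (0.0, None, 0.0)
    for center in points:
        order = sorted(range(N), key=lambda j: math.dist(center, points[j]))
        c = 0
        for n, j in enumerate(order, start=1):
            if n > max_fraction * N:
                break
            c += is_case[j]
            llr = bernoulli_llr(c, n, C, N)
            if llr > best[0]:
                best = (llr, center, math.dist(center, points[j]))
    return best


def scan_p_value(points, is_case, n_sim=99, seed=0):
    """Rank of the observed maximum LLR among maxima from shuffled case labels."""
    rng = random.Random(seed)
    observed = most_likely_cluster(points, is_case)[0]
    labels = list(is_case)
    at_least = 0
    for _ in range(n_sim):
        rng.shuffle(labels)
        if most_likely_cluster(points, labels)[0] >= observed:
            at_least += 1
    return (at_least + 1) / (n_sim + 1)


# Synthetic example: four cases packed together, twelve controls spread out.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)] + [(5.0 + i, 5.0) for i in range(12)]
flags = [True] * 4 + [False] * 12
llr, center, radius = most_likely_cluster(pts, flags)
print(center, round(radius, 3), scan_p_value(pts, flags, n_sim=19) <= 1.0)
```

The p-value comes from comparing the observed maximum with the maximum over all circles in each randomization, which is what adjusts for the many overlapping circles examined.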
Typically, when public health professionals investigate a potential cluster, they use a much smaller study area than a state, perhaps using the spatial extent of a county or area surrounding a town. To better mimic this type of investigation, and to evaluate the sensitivity of the spatial scan statistic's test for significance to the size of the study area, we next report results from a cluster detection analysis in a spatial subset of the study area. We selected a contiguous set of five counties, Union, Franklin, Delaware, Madison, and Champaign, which contained the most likely SaTScan cluster for cases age 0–14. In practice, a public health analyst would not refine the study area around a previously detected cluster. The most likely cluster found by SaTScan with this subset of data is the same 43 cases in the most likely cluster with all of the Ohio data, but the p-value is now 0.71, instead of the value of 0.81 found with the complete dataset. The highlighted subset of counties and most likely cluster are visualized in Figure 9.

Discussion

The three methods used to detect global clustering, the K function, the kernel intensity function ratio summary, and Cuzick and Edwards' method, all found no statistically significant clustering of childhood (age 0–14) leukemia in Ohio from 1996–2003. Cuzick and Edwards' method also found no significant clustering in three separate age groups of cases and ALL type cases. These findings are not entirely surprising given the large and diverse study area of Ohio, in which it is doubtful that one particular risk factor would have a consistent or sustained effect across space that would result in clustering demonstrated at the state scale. It is more likely that factors which could explain clustering of cases would have local or regional influence, and one factor could be associated with clustering in one area while another factor could be related to clustering in a different area.
Given the scale of the study area in this analysis, the search for local cancer clusters is the more useful investigation, and also the one with more public interest. In investigation of potential clusters, there were inconsistent findings from the two methods used to detect clusters. The kernel intensity function ratio suggested some significant local clusters in cases age 0–14 in portions of central and eastern Ohio, while the spatial scan statistic in SaTScan found no significant clusters. SaTScan also found no significant clusters for three different age groups and ALL type cases. Some reassurance comes from the fact that some of the most likely SaTScan clusters are in the same areas as the significant elevated log relative risk areas from the kernel intensity function ratios. Still, the cancer cluster investigator is left to wonder which results are more trustworthy in this circumstance. Unfortunately, without a well-designed simulation study that reflects the current study situation and where the true clusters are known, one cannot definitively reach a conclusion on this matter. A simulation study that tests for different types of clusters is left for future research. One practical reason to favor the kernel intensity function method is that it tests for local clusters and explicitly uses a summary measure of the local results to test for global clustering; it is unique in this regard. Another advantage of the kernel intensity function method is that it provides the log relative risk surface over the entire study area, so one can visualize the local peaks and valleys in the risk of disease. In addition, the kernel is more flexible in its shape than SaTScan's circular scanning window. There have been advancements in the literature, however, with scan statistics designed to detect elliptical clusters [51] as well as more flexibly shaped clusters [52]. An arbitrary shaped non-scanning method based on minimum spanning trees has also been recently introduced [53]. 
A disadvantage with the kernel intensity ratio is that one must select the bandwidth in advance of calculating the log relative risk, and results can certainly vary depending on the selected bandwidth. One possibility to overcome this may be to use a Bayesian framework for kernel intensity estimation [54], where the kernel bandwidth would be estimated from the data while simultaneously calculating the log relative risks. Numerous practical issues with spatial case-control cluster detection were encountered in this study. First, the selection of controls is crucial in these case-control spatial clustering studies. We found a traditional epidemiology ratio of 3 to 1 to be inadequate with our systematic sampling scheme, and believe that would be true with a purely random sampling scheme as well. We tentatively recommend using as many controls as possible, taking into consideration the cost in acquiring them and in computing, as some methods such as the K function and SaTScan can take substantial run time with a large number of points in the study. More research is needed to determine, if possible, an optimal number of controls and sampling scheme. In this study, we also realized the importance of avoiding unnecessary spatial error when possible, in terms of geocoding and map units. Of course, there is inherent locational uncertainty in these data [55]. Invariably, in the address matching process of individual records there will be observations for which an exact address match is not possible. These records can be geocoded to census boundary or ZIP Code centroids or omitted from the study, where the decision on the handling of these records could depend on the study area scale. For a large study area, using census tract or ZIP Code centroid matches may be deemed acceptable in searching for an approximate cluster location, whereas county centroids may be viewed as providing spatial locations that are too inaccurate.
We omitted centroid-matched points after checking visually that they were not spatially influential, i.e. occurring in one area only or exclusively in rural areas, to avoid inducing artificial clustering in cases or controls. We also used UTM map coordinates to prevent adding spatial error to our Euclidean distance calculations. An alternative would be to use great circle distance calculation for records in latitude and longitude coordinates.

Conclusion

This comparative study for childhood leukemia clustering and clusters in Ohio is the first one with individual level case and control data. The study produced results that lead to different conclusions based on the method utilized regarding the significance of clusters and also revealed several open research issues in practical spatial cluster detection. In summary, we found some evidence, although inconclusive, of significant local clusters in childhood (age 0–14) leukemia in Ohio during years 1996–2003, but no significant overall clustering when considering all case ages simultaneously. The spatial scan statistic in SaTScan found no significant clusters, while the kernel intensity function ratio found clusters, some of irregular shape, in areas of central, southern, and eastern Ohio. It should be pointed out that different methods used to test for clustering look for different types of clusters, and one method may not find a cluster while another method does, and both may be correct depending on the underlying true cluster. Consideration of the potential shape of clusters in the study area appears to be an important issue. In considering future work with these data, a subsequent study should test for spatial clusters in ALL type cases by age groups based on the finding of Dockerty and his coauthors [3] of significant clustering using Cuzick and Edwards' method in age subgroups of ALL cases, but not in ALL cases age 0–14.
Additional future work could systematically investigate the sensitivity of the results from the methods selected to the ratio of controls to cases, to different sizes of the study area, and to different control sampling schemes, such as simple random, stratified, or probability proportional to size cluster sampling. A potentially interesting and relevant future comparison would be between the results presented here and those from methods for regional count data at the county level. There is additional effort involved in spatial case-control cluster studies compared to regional count cluster studies, and it would be worthwhile to see if the additional data needs and computational cost result in substantially increased power to detect clusters.

Competing interests

The author(s) declare that they have no competing interests.

Authors' contributions

DW designed the study, performed the analysis, and drafted the manuscript.

Acknowledgements

Cancer incidence data used in this study were obtained from the Ohio Cancer Incidence Surveillance System, Ohio Department of Health (ODH), a registry participating in the National Program of Cancer Registries of the Centers for Disease Control and Prevention (CDC). Use of these data does not imply ODH or CDC either agrees or disagrees with any presentations, analyses, interpretations or conclusions. Information about the OCISS can be obtained at [56]. The author thanks Holly Engelhardt and Robert Indian of the Ohio Department of Health for providing case data and John Paulson from the Ohio Vital Statistics Department for providing control data. The author acknowledges assistance from James Fisher and Mario Davidson of the Arthur G. James Cancer Hospital at The Ohio State University with data processing of the cases. The author also thanks Lance Waller for sharing R code for the K function and kernel intensity estimation and for helpful comments on an earlier draft that led to improvement of this paper.