Space–time visualization and analysis in the Cancer Atlas Viewer

BioMedware, Inc., Ann Arbor, MI 48104, USA
Author for correspondence. Email: Dunrie/at/biomedware.com, voice 734.913.1098, fax 734.913.2201

Abstract
This article describes the Cancer Atlas Viewer: free, downloadable software for the exploration of United States cancer mortality data. We demonstrate the software by exploring spatio-temporal patterns in colon cancer mortality rates for African-American and white females and males in the southeastern United States over the period 1970-1995. We compare the results of two cluster statistics, the local Moran and the local G*, through time. Overall, the two statistics reach similar conclusions for most locations, although the locations where they disagree reveal some interesting patterns in the data. There are only two persistent clusters of colon cancer mortality, and these are clusters of low values.

Keywords: spatio-temporal, temporal GIS, clustering, animation, cancer

1 Introduction
This article describes the Cancer Atlas Viewer: free, downloadable software for the exploration of data in the National Cancer Institute’s Atlas of Cancer Mortality in the United States 1950-1994 (Devesa et al. 1999). This software helps users avoid the bandwidth constraints and delays inherent in web-based GIS, goes beyond current GIS technology to enable true spatio-temporal visualization, and provides cluster analysis statistics. For chronic diseases such as cancer, which have long latency and can display significant spatial pattern, atlases of health data are an important resource. Atlases allow researchers and the public alike to evaluate hypotheses about geographic variation, such as clustering, and to formulate new hypotheses (Jacquez 1998, Moore and Carpenter 1999, Rushton et al. 2000, Jacquez and Greiling 2003).
The identification of spatial pattern in mortality has stimulated research to elucidate causative relationships such as the association between snuff dipping and oral cancer (Winn, Blot et al. 1981); the association between shipyard asbestos exposure and lung cancer (Blot, Morris et al. 1980) and others. Mortality atlases are available in print form, such as the Atlas of United States Mortality (Pickle et al. 1996) and the Atlas of Cancer Mortality of the United States 1950-94 (Devesa et al. 1999) and in web format (see Table 1). Web atlases are increasingly available at the state and national levels as the technology for online mapping has matured. Both print and web atlases provide a considerable amount of data and statistics in an easy to understand visual format, but online atlases offer a level of interactivity not available in printed books, as the user can change the colors, zoom and pan, click through to data tables, and customize the maps to address a question or purpose not envisioned by the print map’s creators.
While online atlases provide greater flexibility and customization than print atlases, their use is hindered by performance limitations that result from Internet communication between the user’s computer and the mapping engine. For example, if someone using the Cancer Mortality Maps and Graphs website wants to change the cancer site displayed in the map, the time period, or the color scheme, the user must send a request for the change to the map server, which then sends a new image of the map to the user’s computer (Figure 1).
An alternative to web mapping is to download a local version of the data for mapping and interaction on a desktop computer (similar to Figure 2).
In this article, we demonstrate the STIS approach by exploring colon cancer patterns for African-American and white females and males using NCI data. Among cancers, the highest mortality for men is from lung, prostate, and colon cancers respectively; for women it is lung, breast, and colon cancers, all of which demonstrate spatial pattern (Devesa et al. 1999). One challenge in exploring patterns for multiple groups is that there are low populations of African-Americans in rural areas of the midwest and western states. Because of low population numbers, the counts used to calculate the mortality rates are based on small samples and are therefore unstable—subject to fluctuations that may be due to chance. The NCI print and online atlas masks data based on few counts (< 6 deaths in the 5 year time period). We focus on the southeastern United States and Gulf Coast, including part of eastern Texas, Mississippi, Louisiana, Alabama, Georgia, Florida, South Carolina, and North Carolina. This region has high enough populations of African-Americans to avoid most rural areas becoming masked out, as geographies with a lot of missing (masked) data are unsuitable for spatial analysis. The southeastern US has been identified as a region of persistently high mortality (Cossman et al. 2003), though it is not the highest mortality region for colon cancer in the U.S. For colon cancer mortality rates, the southeast is exceeded by the northeastern states (Devesa et al. 1999). Specifically, we assess the spatial pattern of mortality from colon cancer in the Southeast, using descriptive data visualization of the cancer data and cluster analysis, Moran’s I, local Moran (Ii), and local G* analyses, for state economic areas (SEAs). We assess the changes in spatial pattern by examining trends in Ii statistics and the persistence of clusters over time as well as the concordance of the two cluster detection methods (Ii and G*). 
While the Ii and G* statistics are related, we expected that their differences in form could lead to different findings, specifically that the G* statistic would be less sensitive to the value at the cluster center (“ego”) than the Ii and more susceptible to a few strong neighbor values. 2 Methods 2.1 Data description The National Cancer Institute has released age-adjusted cancer mortality rates for U.S. counties, state economic areas (SEAs), and states, for 40 site-specific cancers, 4 groups (African-American females, African-American males, white females, white males), and for several time periods from 1950-1994. The rates are the number of cancers per 100,000 person-years, age-adjusted to the 1970 U.S. population standard age distribution. We focus here on the SEA datasets. SEAs are aggregations of counties within state boundaries that were similar according to 1960 socioeconomic data (U.S. Bureau of the Census 1966). The SEA data has better temporal resolution than the counties (counties have 20 or 25 year times only) and finer spatial resolution than the state datasets. Data for African-American males and females starts in 1970, while data for white males and females begins in 1950. More information on this data is available from the National Cancer Institute cancer mortality maps and graphs website (http://www3.cancer.gov/atlasplus/) and in the printed atlas (Devesa et al. 1999). The National Atlas has compiled metadata for this dataset, available at http://www.nationalatlas.gov/cancerm.html. We focus on colorectal cancer rates for African-American and white females and males for SEAs in 5-year time intervals from 1970 through 1994. We use the age-adjusted rates produced by the NCI: these rates are for 100,000 person-years and are adjusted to the 1970 age-classes (Devesa et al. 1999). We repeated this analysis for the county-level rates data, but did not include it in the write-up because of space constraints. 
The comparison between Ii and G* conclusions for the county geography was similar to the SEA results we present.

2.2 Software Description
The Cancer Atlas Viewer is the first implementation of the more general Space-Time Information System described in Jacquez et al. (this issue). Although it has a unified graphical user interface, the software is built of several subcomponents, as shown in Figure 3.
The Cancer Atlas Viewer and STIS software contain several statistical methods, from data transformations such as the Z-score standardization, to the creation of difference datasets, to the calculation of Moran’s I, local Moran, and local G* clustering statistics. The cluster statistics are evaluated with Monte Carlo randomization-based hypothesis testing. The Cancer Atlas Viewer (and its STIS counterpart) has time as a dimension of the data. The spatial relationships among the observed objects (whether point objects or polygons) and the attribute data can be brought in as separate pieces. For instance, in the case of the NCI Atlas data, there is only one geography of the polygons (at the county, SEA, or state level) for the entire analysis. Although the outline of some US counties has changed over time, the NCI standardized it as one static geography for representation in GIS. Morphing polygons are easily represented in the STIS (Meliker et al. this issue), with counties appearing, disappearing, splitting and merging. Once the geography (e.g. county shapefile) is imported, the user then can import attribute data that is joined into a complete space-time dataset. The attribute data can be imported as a time series (where the variables are valid for times specified as fields in the database file) or as a time slice (where the data is stored in a series of files or database records, each valid for a time interval specified on import). The latter is especially useful for importing static layers from a conventional GIS. 2.3 Z-score The Cancer Atlas Viewer uses Z-score standardization to prepare the data for the Moran analysis. The Z-score standardizes the mortality rates by taking the observed rate, subtracting the mean rate for the entire region, and then dividing by the standard deviation. 
The Z-score is only one of several possible epidemiologically relevant standardizations of mortality data, including the standardized mortality rate or ratio (observed cases/expected). It is a required step for Moran’s I and Ii analyses. A Z-score standardizes the mortality rate for area i at time t, $m_{i,t}$, by its mean and standard deviation, creating a new variable $\hat{m}_{i,t}$:

$$\hat{m}_{i,t} = \frac{m_{i,t} - \bar{m}_t}{s_{m,t}} \tag{1}$$

where $\bar{m}_t$ is the mean mortality at time t, and $s_{m,t}$ is the standard deviation of mortality at time t. After Z-score transformation, all variables in a larger dataset have equal means (transformed mean = 0) and standard deviations (transformed s = 1), but different ranges. Negative Z-scores indicate the location is below the mean of the data, positive that it is above the mean. The magnitude of the Z-score is the distance in standard deviation units away from the mean.

2.4 Difference datasets
The Cancer Atlas Viewer also calculates difference datasets, to allow the user to view change maps. Absolute change in cancer mortality, $\Delta m_i$, for area i between times t and t+1 is calculated as:

$$\Delta m_i = m_{i,t+1} - m_{i,t} \tag{2}$$
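As an illustration of Sections 2.3 and 2.4, the following minimal Python sketch standardizes one time slice of rates and computes an absolute change between two time slices. It is not the Cancer Atlas Viewer's code. The function names, the made-up rates, the use of the sample standard deviation (`ddof=1`), and the sign convention (later minus earlier) are all assumptions made for the example.

```python
import numpy as np

def z_scores(rates):
    """Standardize one time slice: (rate - regional mean) / standard deviation."""
    rates = np.asarray(rates, dtype=float)
    # ASSUMPTION: sample standard deviation (ddof=1); the article does not
    # state which form of the standard deviation the software uses.
    return (rates - rates.mean()) / rates.std(ddof=1)

def absolute_change(rates_t, rates_t1):
    """Absolute change per area: rate at time t+1 minus rate at time t (assumed sign)."""
    return np.asarray(rates_t1, dtype=float) - np.asarray(rates_t, dtype=float)

# Hypothetical rates per 100,000 person-years for four areas (not NCI data).
rates_1970 = [18.0, 20.0, 22.0, 24.0]
rates_1975 = [19.0, 20.5, 21.0, 26.0]

z_1970 = z_scores(rates_1970)                       # mean 0, standard deviation 1
change = absolute_change(rates_1970, rates_1975)    # positive where the rate rose
```

Areas whose rates are masked for low counts would need to be dropped or flagged before standardizing; this sketch assumes complete data.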
2.5 Moran’s I
Moran’s I (Moran 1950) is a spatially weighted correlation coefficient used to detect spatial pattern such as clustering (Equation 4):

$$I_t = \frac{N}{W}\,\frac{\sum_{i=1}^{N}\sum_{j=1}^{N} w_{ij}\,\hat{m}_{i,t}\,\hat{m}_{j,t}}{\sum_{i=1}^{N}\hat{m}_{i,t}^{2}} \tag{4}$$

Here N is the number of regions, $w_{ij}$ is a weight denoting the strength of the connection between areas i and j, and $\hat{m}_{i,t}$ and $\hat{m}_{j,t}$ are the Z-scores calculated as in Equation 1. W is the sum of the weights (Equation 5):

$$W = \sum_{i=1}^{N}\sum_{j=1}^{N} w_{ij} \tag{5}$$

We used first-order queen neighbors with row-standardization. Weights are $w_{ij}$ = 1/(# neighbors) for first-order queen neighbors, 0 for all other locations. Hence the weights for each location sum to 1 and the sum of the weights, W, is equal to the number of locations, N. Moran’s I is a global statistic, in that there is one value for an entire geography. The statistic usually falls between -1 and 1, but its range depends on the characteristics of the spatial weights set used (Cliff and Ord 1981). Positive spatial autocorrelation means that surrounding areas have similar mortality rates; negative values indicate that surrounding areas have different rates. Because we repeated the Moran’s I calculation for 5 SEA time intervals for each group, we used a Bonferroni adjustment for multiple comparisons, lowering the alpha level for significance to 0.01 (α = 0.05/5).

2.6 Local Moran
The local Moran test (Anselin 1995) detects local spatial autocorrelation in group-level data. The local Moran decomposes Moran’s I into contributions for each location (each i), termed Local Indicators of Spatial Association (LISA). These indicators detect clusters of high and low values as well as anomalies, also called spatial outliers. The sum of the Ii values for all observations is proportional to Moran’s I. Anselin (1995) defined a local Moran statistic for an observation i:
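$$I_{i,t} = \hat{m}_{i,t}\sum_{j \neq i} w_{ij}\,\hat{m}_{j,t} \tag{6}$$

In Equation 6, $\hat{m}_{i,t}$ is the Z-score for the cancer mortality rate at location i at time t, and $\hat{m}_{j,t}$ is the Z-score at location j at the same time. The Z-scores are calculated as in Equation 1. $w_{ij}$ is a weight denoting the strength of connection between areas i and j, defined as for Moran’s I.

2.7 Local G*
The local G* test (Getis and Ord 1992; Ord and Getis 1995) is also a LISA statistic. Like the local Moran, it detects clusters of extreme values (high and low). Unlike the local Moran, it is not designed to detect anomalies or outliers. Ord and Getis (1995) defined a local G* statistic for an observation i:

$$G_{i,t}^{*} = \frac{\sum_{j} w_{ij}\,m_{j,t} - W_i^{*}\,\bar{m}_t}{s_{m,t}\sqrt{\left(N S_{1i}^{*} - W_i^{*2}\right)/(N-1)}} \tag{7}$$

One difference between the Ii and the G* is that the G* treats the centering location the same as its neighbors: its value enters the calculation when j = i. So, the weights sets for the two statistics are different. For the G*, the sum of the weights is

$$W_i^{*} = \sum_{j} w_{ij} \tag{8}$$

and the sum of the squared weights is

$$S_{1i}^{*} = \sum_{j} w_{ij}^{2} \tag{9}$$

where both sums include the term j = i.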
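The three statistics described in Sections 2.5 through 2.7 can be sketched in a few lines of NumPy. This is a hedged illustration, not the Cancer Atlas Viewer's implementation: the function names, the five-area chain of neighbors, and the rates are hypothetical, and the G* function assumes binary weights with the center included and the population standard deviation, following the usual Ord and Getis (1995) formulation.

```python
import numpy as np

def row_standardize(adjacency):
    """Convert a 0/1 neighbor matrix into weights w_ij = 1 / (# neighbors of i)."""
    a = np.asarray(adjacency, dtype=float)
    return a / a.sum(axis=1, keepdims=True)

def morans_i(z, w):
    """Global Moran's I from Z-scores z and a weights matrix w."""
    z = np.asarray(z, dtype=float)
    return (z.size / w.sum()) * (z @ w @ z) / (z @ z)

def local_moran(z, w):
    """Local Moran I_i = z_i * sum_j w_ij z_j (w must have a zero diagonal)."""
    z = np.asarray(z, dtype=float)
    return z * (w @ z)

def local_g_star(x, adjacency):
    """Local G_i^*: the center's own value is included via w_ii = 1 (assumption)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    w = np.asarray(adjacency, dtype=float) + np.eye(n)
    w_star = w.sum(axis=1)            # sum of weights, center included
    s1_star = (w ** 2).sum(axis=1)    # sum of squared weights, center included
    numerator = w @ x - w_star * x.mean()
    # ASSUMPTION: population standard deviation (ddof=0).
    denominator = x.std() * np.sqrt((n * s1_star - w_star ** 2) / (n - 1))
    return numerator / denominator

# Hypothetical geography: five areas in a chain, each adjacent to the next.
adjacency = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])
rates = np.array([1.0, 2.0, 3.0, 4.0, 10.0])   # made-up rates
w = row_standardize(adjacency)
z = (rates - rates.mean()) / rates.std()

global_i = morans_i(z, w)
local_i = local_moran(z, w)
g_star = local_g_star(rates, adjacency)
```

With row-standardized weights the sum of the weights equals the number of areas, so the local values sum to the global statistic times the sum of squared Z-scores, which is the proportionality noted in Section 2.6.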
In the context of interpreting the G* by evaluating its Monte Carlo randomization p-value, the denominator of the G* is unimportant, as it is constant for a location and all of its conditional randomizations. 2.8 Significance and Multiple Testing We calculate p-values for Ii and G* using 999 conditional Monte Carlo permutations of the data values. Ii and G* statistics calculated for a given study area are not independent of one another, and hence their p-values (depending on one’s philosophical inclinations) should be corrected for multiple testing. We now consider sources of this lack of independence, two approaches for accomplishing the adjustment, and a proposed solution. Lack of independence arises in two places: Monte Carlo distributions under the null hypothesis, and in the test statistics. A conditional randomization approach is used when calculating the reference distribution of the test statistic within a Monte Carlo framework. Assume an area i for which we wish to evaluate the significance of the LISA statistic Ii, and that area i has neighbors j and k. Further assume N is the number of areas on the map. Conditional randomization repeatedly assigns new values to these 2 neighbors by drawing 2 new values at random from the N-1 areas surrounding location i, and calculates and records a new value of Ii to construct the reference distribution. The reference distributions for two different areas i and j thus are not independent since they will be constructed from repeated drawings from the same population. Lack of independence also arises in the test statistics for two areas that are neighbors of one another. Because they are neighbors, the Ii statistics for i, j and k will each use the values associated with one another when calculating Ii, Ij, and Ik. The test statistics therefore are correlated, and their p-values should be adjusted accordingly. Two approaches may be used to accomplish this adjustment. 
We can correct the significance or alpha level of the test, or we can adjust the p-values themselves. If we correct the alpha level we then compare the unadjusted p-values to the corrected alpha level. If we adjust the p-value we compare the adjusted p-value to the unadjusted alpha level (α = 0.05 in this case). We propose to use the second approach: adjusting the p-value and using the same alpha level for evaluating the N Ii statistics on the map. We use the Simes (1986) adjustment, which is not as conservative as the Bonferroni correction. The Simes adjustment is calculated as in Equation 10. Assume 3 p-values $p_i$, $p_j$, and $p_k$; suppose they are (0.002, 0.001, 0.036). Rank the p-values from lowest to highest, obtaining the vector (0.001, 0.002, 0.036). We wish to calculate the “Simed” p-value for $p_i$ = 0.002, the second element in this vector. This is

$$p_i' = (n + 1 - a)\,p_i \tag{10}$$
Here n is the number of p-values being considered (3), and a is the index (starting at 1) indicating the location of pi in the sorted vector (2). The Simed p-value is then 2 × 0.002 = 0.004.

2.9 Classification
After the software calculates the G* and Ii statistics for each location, it classifies all of the SEAs in the geography. For the Ii analysis, it classifies each SEA as the center of a low-low cluster, the center of a high-high cluster, a significant high outlier (high-low), a significant low outlier (low-high), or nonsignificant. It compares the Simed p-values to a prespecified alpha level, in this case α = 0.05, and then assigns the classes based on the sign of the Ii (positive indicates cluster, negative indicates outlier) and the location’s Z-score (high or low). This treatment is parallel to the treatment of the local Moran in other software products, such as ClusterSeer, the SpaceStat extension for ArcView, and GeoDa. For the G*, the software compares Simed p-values to a prespecified alpha level, in this case α = 0.05, and then assigns the classes based on the sign of the G* (positive indicates high cluster, negative indicates low cluster).

We then scored each set of maps for similarity of classifications and for cluster persistence. For each race-gender subgroup, we examined cluster classifications resulting from the Ii and the G* analyses at a particular time interval (for example, African-American female mortality rate 1990-1994). We considered an Ii high-high cluster equivalent to a high G* cluster, similarly matched an Ii low-low cluster with a low G* cluster, and treated a finding of nonsignificant in both analyses as a match. All matching classifications were considered concordant. Non-concordant situations occurred when the outcomes differed from this matched pairing. For cluster persistence, we compared sets of maps over time within one classification, for example white male G* mortality rate classes in 1985-1989 and 1990-1994.
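The significance and classification steps of Sections 2.8 and 2.9 can be sketched as follows. This is a minimal illustration under stated assumptions, not the software's code: the function names are hypothetical, the permutation test compares absolute values (the article does not say whether the test is one-sided or two-sided), adjusted p-values are capped at 1, and a location is called significant when its adjusted p-value does not exceed alpha.

```python
import numpy as np

def conditional_p_value(z, w, i, n_perm=999, rng=None):
    """Pseudo p-value for I_i: keep z[i] fixed and redraw its neighbors' values
    at random from the other N-1 areas, n_perm times."""
    rng = np.random.default_rng() if rng is None else rng
    z = np.asarray(z, dtype=float)
    w = np.asarray(w, dtype=float)
    neighbors = np.flatnonzero(w[i])
    others = np.delete(z, i)
    observed = z[i] * (w[i, neighbors] @ z[neighbors])
    extreme = 0
    for _ in range(n_perm):
        draw = rng.choice(others, size=neighbors.size, replace=False)
        simulated = z[i] * (w[i, neighbors] @ draw)
        # ASSUMPTION: compare magnitudes (two-sided style test).
        if abs(simulated) >= abs(observed):
            extreme += 1
    return (extreme + 1) / (n_perm + 1)

def simes_adjust(p_values):
    """Multiply each p-value by (n + 1 - a), where a is its 1-based rank."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    rank = np.empty(n, dtype=int)
    rank[np.argsort(p, kind="stable")] = np.arange(1, n + 1)
    return np.minimum((n + 1 - rank) * p, 1.0)   # cap at 1 (assumption)

def classify_local_moran(i_stat, z_value, p_adjusted, alpha=0.05):
    """Positive I_i means cluster, negative means outlier; Z-score gives high or low."""
    if p_adjusted > alpha:
        return "nonsignificant"
    if i_stat > 0:
        return "high-high" if z_value > 0 else "low-low"
    return "high-low" if z_value > 0 else "low-high"

def classify_g_star(g_stat, p_adjusted, alpha=0.05):
    """Positive G* means high cluster, negative means low cluster."""
    if p_adjusted > alpha:
        return "nonsignificant"
    return "high" if g_stat > 0 else "low"

# The article's worked example: the entry for 0.002 becomes 2 x 0.002 = 0.004.
adjusted = simes_adjust([0.002, 0.001, 0.036])
```

Raising `n_perm` (for example to 9,999) gives finer-grained p-values at the cost of run time, since the smallest attainable p-value is 1/(n_perm + 1).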
Clusters identified in the last time period could not be scored for persistence as we had no information about clustering after 1994. As most of the locations were classed as non-significant and stayed that way, we did not count transitions from the non-significant class, just transitions from a cluster class to another class (such as outlier or non-significant). 3 Results In this section, we describe the Atlas software and a comparison of the two clustering statistics’ conclusions about the patterns in colon cancer mortality from 1970 through 1994 rates in 5-year time intervals for SEAs. The data for SEAs are efficiently represented as time slices in 5-year time intervals (1970-4, 1975-79, etc) with a static geography. Thus, what changes when data are animated in a map, graph, or table are only the attributes, rather than the shapes and positions of the geographic units. Figure 4
From early 2004 to July 2004, over 200 people downloaded the Cancer Atlas Viewer. BioMedware staff have also watched individuals use the software at our project planning meetings (described http://www.biomedware.com/innovations/atlas.html) and in exploratory spatial data analysis courses. The response from users has been positive: they appreciate the ease of visualizing the time series data as animated maps and the interactivity that brushing linked windows provides. Our user observation sessions fed back into software design, helping us to improve parts of the interface that were difficult to use and to uncover new, required features. For all but a few SEAs, the count of deaths from colon cancer has increased since 1970 for all gender-race combinations. Similarly, the mortality rates from colon cancer increased for African-American females and males and white males. The differences in rates are largest for African-American males, who experienced the greatest increase in mortality rates from 1970 to 1990. White females, however, experienced decreasing rates in most of the study area. The differences are illustrated for all four groups in Figure 5
The global spatial pattern in these rates has been somewhat variable over time, but typically increasing. Figure 7
Moran’s I is a global test; it does not detect localized pattern. Yet, because there are 4 race-gender combinations and five time intervals by two cluster tests (that is forty cluster maps to compare), we will not detail the results of any single cluster test at any particular time. Instead, we present the pattern of results across all race-gender subgroups over all time intervals. Table 3 compares the results from both tests. Overall the two local statistics were in concordance. Over 97% of the time, the two statistics agreed on the status of a location. Both agreed that there were twenty significant clusters of high values, sixteen significant clusters of low values, and 1,901 non-significant areas. There is no case where the Local Moran finds a cluster of high values and the G* finds a cluster of low values, or the reverse. They appear to be drawing similar conclusions about these data.
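The concordance scoring can be expressed compactly. The sketch below is illustrative only: the function name and the toy label lists are hypothetical, while the counts in the final lines are the ones reported in this section (twenty high, sixteen low, and 1,901 non-significant agreements, against forty-three nonconcordant results).

```python
# Pairs of (local Moran class, G* class) that count as agreement.
MATCHES = {
    ("high-high", "high"),
    ("low-low", "low"),
    ("nonsignificant", "nonsignificant"),
}

def concordance(moran_classes, g_classes):
    """Share of locations where the two classifications match."""
    pairs = list(zip(moran_classes, g_classes))
    agree = sum(pair in MATCHES for pair in pairs)
    return agree / len(pairs)

# Toy labels for four locations (not results from the study).
toy_moran = ["high-high", "low-high", "nonsignificant", "low-low"]
toy_g = ["high", "nonsignificant", "nonsignificant", "low"]
toy_share = concordance(toy_moran, toy_g)

# Arithmetic check on the counts reported in the text.
concordant = 20 + 16 + 1901
nonconcordant = 43
reported_share = concordant / (concordant + nonconcordant)
```

The reported counts give 1,937 concordant classifications out of 1,980, which is consistent with the statement that the statistics agreed over 97% of the time.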
Yet, there are some differences between the results. These differences could be caused by differences in the search pattern of each statistic (the geographic alternative hypothesis to which the statistic is sensitive) or because of the random nature of Monte Carlo probability assessment. We classed the forty-three nonconcordant results into four categories for convenience: no comment on outlier, outlier disagreement, significance disagreement, and marginal significance disagreement. These categories are summarized in Table 4. No comment on outlier was a category we expected in the beginning—that the G* may have “no comment” on locations identified as significant spatial outliers by the Ii. Since the G* is not designed to detect outliers, and the Local Moran is, we expected outliers by the Moran analysis not to show up as clusters in the G* analysis. This occurred thirteen times (where the Local Moran was Low-High or High-Low and the G* was not significant). What was unexpected, however, was the five times that the local G* called something a cluster and the local Moran called it an outlier. We will discuss two examples, Columbus, GA and Greenville, SC. The Columbus, GA SEA is considered the center of a cluster of low mortality rates for white males in 1970-1974 by the G* but a high outlier among low values by Ii. As shown in the left side of Figure 8
The Greenville, SC SEA is considered the center of a cluster of high mortality rates for African-American females in 1980-1984 by the G* but a low outlier among high neighbors by the Ii. Yet, as shown in Figure 9
The twenty-five other cases of difference between the G* and local Moran results occurred when only one of the two tests called a location the center of a significant cluster. In all cases, the statistics agreed about the pattern, both the Ii and the G* showed clustering of high or low values for each location, but their results disagreed about the significance of the pattern. The G* called 4 locations clusters that the local Moran called not significant, while the local Moran called 21 locations clusters that the G* called not significant. Overall, the local Moran finds clusters more often than the G* does. Whether either is more accurately reporting the “true” number of clusters in the region cannot be determined with this dataset, but we can examine those cases where the two tests differ to see what triggers each statistic. Of the twenty-five disagreements about the significance of the clustering, twelve occur when there is a marginal difference in the p-values of the two statistics. For all items in this category, the p-value for both the Ii and the G* were < 0.10. For example, for white males in 1970, Auburn, AL was the center of a cluster of low values according to the local Moran (Ii = 0.67, p = 0.049); its G* was marginally significant (G* = -2.72, p = 0.056). Both statistics are in agreement about the pattern and its strength, but the Moran statistic happens to be just below the decision criterion (alpha = 0.05) and the G* above, so they provide different answers. The mean p-value for each statistic in this class was low (mean Ii p value = 0.042, mean G* p value = 0.059, Table 4). Hence this lack of concordance reflects the arbitrariness of the alpha = 0.05 decision threshold. 
Because of the random nature of the conditional Monte Carlo randomization used to assess significance for both statistics, it is entirely possible that the significance for Auburn, AL would be the same for both statistics (either a significant low cluster or nonsignificant) or the pattern of significance reversed (with the G* being below the threshold and the Ii above) if the analysis was re-run. Also, we could have chosen a higher number of Monte Carlo randomizations (such as 9,999) to get a more precise p-value from the software. More p-value precision could alleviate these minor p-value disagreements. The other thirteen cases of disagreement about the significance of a cluster are more interesting. In these cases, the difference in the p-value is large. For example for African-American females in 1970, Sumter, SC was the center of a significant low cluster according to Ii (Ii = 0.291, p = 0.040) but not close to significant by the G* (G* = -1.68, p = 0.500). Similarly, Prattville, AL was the center of a local Moran cluster of low mortality for African-American females in 1970 (Ii = 0.05361, p = 0.024) but not for the G* (G* = -1.83, p = 0.500). In these and other cases where the Moran p-value is much lower than the G* p-value it is often the case that the Ii statistic is low (mean of this class is 0.364, Table 4) though significant. There is a significant but weak correlation between the values in the local neighborhood. Overall, the range of Ii values was from about -2 to 7. Positive Ii values less than 0.5, while significant, do not indicate strong clusters of extreme values, and correspond instead to clusters of values slightly lower or higher than the mean. The findings for G* and Ii differ because G* is considering divergence from the mean rather than correlation between the values. 
In many of these cases, G* provides a more reasonable interpretation of the pattern in terms of what we are looking for in the study of cancer mortality rates: researchers are understandably more concerned about clusters of extremely high or low mortality rates than clusters of rates within 1 standard deviation of the mean.

The differences in the results of the two cluster detection statistics stem from two factors. The marginal significance disagreement arises from the use of the 0.05 decision criterion. The outlier disagreement, no comment on outlier, and significance disagreement arise from differences in the search pattern of each statistic. In some cases, such as Columbus, GA, G* provides a better description of the local pattern but not the ego location, while in others, such as Greenville, SC, neither explanation fits.

For cluster persistence, the results were clear: most of the significant clusters do not persist to the next time period, as detailed in Table 5. Of the sixty-two clusters or outliers identified by the Ii from 1970 through 1989, sixty were no longer significant in the next time period. Of the thirty-six clusters identified by the G*, thirty-five were no longer significant in the next time period. Only one Ii cluster persisted into the next time period, a cluster of low mortality around Auburn, AL in 1970-74 and 1975-79. The area around Auburn in both time intervals is illustrated in Figure 10.
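The persistence scoring described in Section 2.9 could be sketched as below. The function name, the class labels, and the toy inputs are hypothetical; the rule that a cluster persists only if the location keeps the same class in the next interval is an assumption about how the scoring was done.

```python
# Classes that count as a starting cluster or outlier; transitions that begin
# in the nonsignificant class are not scored.
CLUSTER_CLASSES = {"high-high", "low-low", "high-low", "low-high", "high", "low"}

def persistence_counts(classes_t, classes_next):
    """Return (number of clusters at time t, number keeping their class at t+1)."""
    started = 0
    persisted = 0
    for before, after in zip(classes_t, classes_next):
        if before in CLUSTER_CLASSES:
            started += 1
            # ASSUMPTION: persistence means the identical class in the next interval.
            if after == before:
                persisted += 1
    return started, persisted

# Toy classifications for five locations in two consecutive intervals.
interval_1 = ["low-low", "nonsignificant", "high-high", "low-high", "nonsignificant"]
interval_2 = ["low-low", "high-high", "nonsignificant", "nonsignificant", "nonsignificant"]
started, persisted = persistence_counts(interval_1, interval_2)
```

Because the last interval has no successor, it can only appear as `classes_next`, which mirrors the restriction that clusters found in the final time period could not be scored.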
4 Conclusion
The lack of persistent clustering in these data suggests that the clusters detected may be ephemeral. There is no persistent clustering of high values that would indicate a stable environmental exposure or a stable social or genetic contributing factor to colon cancer. Ephemeral clusters can be explained in several ways: they might result from unstable mortality rates generated by small populations-at-risk, population migration, or short-term factors such as geographic differences in treatment or screening that do not persist. This seems to be a positive conclusion: there is no area at high risk for colon cancer that persists through time. There are some disparities in colon cancer mortality, as shown in Figure 6.

This comparison finds that the results of the two statistics are similar, with agreement on their classifications over 97% of the time. Because of the large amount of concordance between the two cluster statistics, there seems to be little additional value gained from applying both cluster statistics to a dataset. Because of the differences in reporting of significance, with the Ii reporting some clusters quite near the mean, the G* may be more sensitive to clusters of extreme values.

The cluster classifications produced by the statistics needed further examination to be interpreted well. Although several software products, including the Cancer Atlas Viewer, produce crisp maps classifying locations into clusters, outliers, and non-significant areas, the simplicity of these maps can obscure the complexity of the observed data. The differences we found in the classifications of the G* and the Ii pointed out a few locations where the interpretation for either classification required careful examination of the mapped data. These were instances in which the local map pattern did not correspond to the search image of either cluster statistic.
The few times they disagree on the status of an outlier seem to come from limitations on the shape of the cluster to be detected imposed by the first-order neighbor relationships considered. Disagreement emerged from a situation where there was variability among the neighbor set, so the “true cluster”, if it existed at all, was likely a subset of the neighbor set, rather than the whole group of first-order neighbors. This is shown in Figures 8 This analysis was performed using the free Cancer Atlas Viewer software, providing researchers with sophisticated visualization and statistics for the exploration of patterns in mortality from 40 site-specific cancers. It can act as a quicker and more interactive way to explore the data made available by the National Cancer Institute, cutting out the delays inherent in web mapping that may hinder exploration. The statistical analysis presented here may be beyond the interest and commitment of a casual user, but the software provides a means for researchers to assess cancer mortality patterns, examine these patterns over time in animated maps and graphics, and to assess the persistence of clustering. Acknowledgments This project was funded by grant CA92669 from the National Cancer Institute to BioMedware, Inc. The positions espoused in this article are those of the authors and do not necessarily represent the official views of the National Cancer Institute. Constructive criticism from Heidi Durbeck of BioMedware, Peter Rogerson of SUNY-Buffalo, and three anonymous reviewers helped us improve the interpretation and presentation of these results. References