Estimation of the time component in the movement of chemicals in contaminated groundwater.

For a proper analysis of the potentially causal relationship between exposure to volatile organic chemicals (VOCs) in drinking water and health events, it is essential to know T1, the time when exposure started, and C = f(T), which is the change of the VOC concentration C as a function of time T and the total accumulated exposure (TAE) to VOCs to which an individual was exposed. In the typical situation of incidentally detected pollution of groundwater, no such information is available. This paper describes the development of a method for estimating T1, C = f(T), and TAE as part of an epidemiologic study of the health effects of VOC contamination of an aquifer serving public and private wells. Pooled test results of city wells, tested periodically since 1981, provided the data base for developing a statistical model for estimating C = f(T). This model was then applied to private wells, for which the data of only one water sample were available, to retrospectively estimate their T1. The best-fitting model was a multiple linear regression equation consisting of the natural logarithm of the VOC concentration as the response variable, with the time of sampling, the distance of the wells from the source (expressed as coordinates), the well depth, and the well capacity as determinants. The TAE was calculated by integrating the area under the time-concentration curve.


Introduction
Except in prospective cohort studies, epidemiologists usually have the problem of individuals failing to recall exposure data or the problem of poor quality of existing exposure records. This explains, in part, why most studies suffice with dichotomous or ordinal exposure data, or with assuming that exposure was constant over time during the period of residence or employment in an area with contaminated air, water, or soil. In the search for better exposure data, little attention has been given to the time factor in exposure assessment. In risk assessment and epidemiologic studies, the time factor can be important in several ways: * The starting date of exposure determines whether a disease antedates exposure and should therefore be excluded from analysis. * The starting date of exposure determines whether the observation time was long enough for a disease to be attributable to the exposure of interest. * The end date of exposure marks the beginning of the exposure-free observation period, in which the doseresponse curve is expected to be different from that when exposure was ongoing (1).
* The dates when exposure started and ended and the changes in exposure level between these two points in time (exposure as a function of time) allow estimation of the duration and total amount of exposure. Both are important for a proper dose-response analysis and for judging the biological plausibility of an observed statistical relationship between exposure and a health event. * The age at which exposure started and ceased is necessary information in estimating the risk of cancer through mathematical models for a less-than-lifetime exposure (2).
Clearly, in epidemiology, neglecting the time factor is likely to result in misclassification with regard to exposure and eligibility of an observed disease and can lead to decreased statistical power and to erroneous inferences from study results. In the case of contaminated groundwater, once a chemical spill has occurred, a certain period of time (dTl) is needed for a chemical to reach a distant well. The magnitude of dTl is determined by many factors, such as depth of the water table, solid structure, direction and velocity of the groundwater flow, the distance of the well from the axis of the groundwater flow and from the spill site, withdrawal rate and amount, soil affinity and other properties of the chemicals, etc. Once the compound reaches a well at a time T1 (the time that a compound reaches the detection limit), a period of time dT2 is needed to reach a sample concentration C8 at sampling time Ts (Fig. 1). Contrary to a predicted dTl or dT2 as a future event, estimation of T1, dTl, and dT2 is retrospective when one is confronted with a situation of polluted groundwater. Estimating these values is important in tracing the generator of the pollution to seek recovery for cleanup costs or material damage, and to determine possible adverse health effects. Retrospective estimation of T1 and C = f(P, which is a mathematical expression of the changes in concentration C as a function of time T, is necessary when the potential health effect of chemicals in water is being studied. In the typical scenario of an incidentally detected pollution of residential wells, however, existing information is limited to the results of testing a single water sample; routine monitoring of public water supplies for a wide variety of toxic chemicals was begun only a few years ago. Naturally, prospective (predictive) models might be used retrospectively. Several models have been advanced for mathematically describing the transport of chemicals in groundwater, but all of these require extensive data on hydrogeologic parameters, which were not available for the scenario described in this paper, as they do not exist in virtually all sites with incidentally detected contamination of groundwater. Moreover, as described in "Discussion," such models are inherently incapable in discriminating between individual wells in a crowded well field. This paper describes the development and use of a method for retrospectively estimating T1 and C = f(T) as part of an ongoing epidemiologic study of the potential health effects of human exposure to volatile organic chemicals (VOCs) in drinking water. The method is based on the hypothesis that dTo -Imonitoring data for city wells, if available, can be used for estimating C = f(T) for city wells, which in turn can then be applied to neighboring residential wells for the retrospective estimation of T1, as depicted in Figure 1.

Materials and Methods
In September 1981, an unscheduled survey led to the discovery of VOCs in the municipal drinking water of Battle Creek, MI. A routine monitoring program for city wells was set up, and neighboring private wells were also tested, but not more than once in the period 1981 to 1983. Contamination of a number of wells was confirmed. There were no historic data on the presence of VOCs in city or private wells. The source was found to be a distributor of industrial solvents located upgradient of the affected wells (3). The major cause of the spill appeared to be leaking underground storage tanks. Soil samples taken at the site showed extremely high concentrations, and VOCs were found freely floating on top of the water table. Thus, the contamination of the aquifer was not from a one-time spill but from continuous spilling. However, this may not necessarily be true for all VOCs, as not all VOCs were stored at the same time. Figure 2 depicts a map view, constructed from an aerial photograph and from maps supplied by the city engineer.
Groundwater flow in the Verona area is in three aquifers. From land surface down the aquifers are sand and gravel units within unconsolidated glacial deposits and upper and lower sandstones of the Marshall Formation. Horizontal hydraulic conductivities of the aquifers are about 100, 150, and 550 feet/day for the sand and gravel,  upper sandstone, and lower sandstone, respectively. Secondary permeability of the sandstones, due to fracturing, is the cause of the relatively high hydraulic conductivities. Most private well water is pumped from the sand and gravel or the upper sandstone aquifers. Municipal water is drawn from the upper and lower sandstones. Although the vicinity of the river west of the wells would lead one to expect groundwater to flow from the point source in a westerly direction, the actual flow has been found to be in a north to northwesterly direction. This can be explained by the heavy water withdrawal in the city well field. The velocity of the flow has been estimated at 1 to 4 feet a day. This hydrogeologic information is abstracted from a study by Grannemann and Twenter, conducted when the contamination of the city well field was detected (4). Estimation of T1 requires C = f(T), and thus, repeated testing of water from the same well over time.
Such time series, covering a period from September 1981 to the present, were available for 30 city wells, but not for the 116 private wells tested. Data on well descriptors and water quality were provided by the Michigan Department of Public Health and the Department of Natural Resources. The VOCs found most frequently, their abbreviations as used in the text, and the ranges of concentrations are listed in Table 1. The following sections describe the steps in determining C -fiT) for city wells to be applied to individual private wells for retrospectively estimating T1.

Selection of Wells
Minor remedial changes in the management of the city well field started in 1982, but reached major proportions in 1984 when newly constructed city wells north of the well field came into operation. The new well field caused, as intended, a sharp drop in VOC concentrations in the pre-existing wells, and the flow pattern in the aquifer was drastically altered (5). As declines and leveling of concentrations were the results of the change in the management of the city well field, it was presumed that residential wells continued to be affected by increasing contamination levels at least until the end of 1983. Accordingly, the selection of city wells for developing VOC-specific C = fiT) was limited to wells showing increasing VOC concentrations over one year or more during the interval 1981 to 1983. Not all wells were polluted, and if they were, contamination did not necessarily involve all VOCs. In particular, ethylene dichloride (12-DCA) and 1,1,2-trichloroethylene (TCE) occurred rarely in city wells, although these compounds were common in private wells (Table 1).
Of the 30 city wells, 11 were either not polluted, tested positive a few times Qnly, or were contaminated as late as 1984. VOC levels in these 11 wells ranged from 1 to 11 ppb, and in nonie of these wells was more than one or two VOCs present. Of the remaining 19 wells, 8 showed low VOC levels with single sample peaks, or the concentrati9n versus time curve (CT curve) did not show an increasing trend. Eventually, 11 city wells provided a total of 27 time series suitable for modeling. These wells are depicted in Figure 2 (numbered wells). In total, these 11 wells provided 27 chemical-and well-specific concentration-time series (Table  2). If increasing trends were followed by a decline in 1984 (attributable to remedial actions), the descending part of the time series was deleted prior to statistical processing. Thus, no time series extended beyond April 1984.
Variables for Developing a Time-Concentration Model A grid pattern with a central X-Y cross-axes system was laid over the map of the study area with a northsouth Y-axis and the intersection point of the X and Y axes located just northwest of the spill site (Fig. 2). The X and Y coordinates for each well were expressed as the number of grid intervals from the X and Y intersection point. The rationale underlying the X-Y coordinates was the hypothesis that, although the movement of the plume would follow the main direction of the groundwater flow (Y-axis), microscale deviations were likely to occur as a result of diffusion or local differences in the soil structure or flow characteristics. Such deviations are projected as components of X and Y. In addition, the depth of the well was considered important because a downward component of the groundwater flow and a vertical gradient in VOC levels were found in hydrologic studies (5).
The data for developing C = f(T) were arranged as a line listing in which each single water sample represented one observation with the following information: an identification number unique for a single water sample; the well number (for city wells) or street address (for private wells); the X and Y coordinates; a number indicating the quadrant of the X-Y grid system; month and year of sampling, from which the variable T8 was calculated as the number of months since January 1, 1970 (an arbitrarily chosen date); the sample concentration C, of each chemical in ppb (parts per billion); the well depth D in feet (midpoint of the screened or uncased portion); and the well pump capacity G in gallons per minute. a In addition to the wells cited, 3 city wells remained free of VOCs during the study period, and 48 private wells were free of VOCs in the one sample taken in the same period. bDCA = 1,1-dichloroethane; 12-DCA = ethylene dichloride or 1,2-dichloroethane; 11-DCE = 1,1-dichloroethylene or vinylidene chloride; CIS = 1,2-cis-dichloroethylene; PCE = perchloroethylene or tetradtloroethylene; TCA = 1,1,1-trichloroethane; TCE = 1,1,2-trichloroethylene.  Figure   2.

Development of the Time-Concentration Model
The statistical models tested are based on linear regression of C on T as of a common starting date, arbitrarily set at January 1, 1970. This date is 1.5 years after the solvent company started its operations. This time lapse, composed of dTo (from when the company started its operations to the date of spilling) plus dTl ( Fig. 1), was considered a minimum length of time for spilling to occur and for the spilled VOCs to reach the wells at a level above the detection limit. Most city wells demonstrated VOC-positive water samples later than September 1981, implying that in the respective time series, one or more leading samples were free of VOCs (CS = 0). In such cases, all but the last leading zeros were censored. The last leading zero, and zeros occurring after at least one positive sample, were converted to an arbitrary Cs = 0.5 ppb for two reasons: it allows the transformation of C into logC (the natural logarithm of C); and 0.5 is closer to the detection limit (between 0.2 and 1 ppb). Univariable regression (the only regressor variable being 7) was applied to determine the model that best fits the individual time series. Multivariable models were employed for data pooled from all selected wells, using stepwise backward elimination procedures with a significance threshold of p = 0.1 (6).
The following basic models were tested: C = a + bT log C = a + bT Multivariable Models C = a + bjT + b2X + b3Y + b4D + b5G log C = a + bjT + b2X + b3Y + b4D + b5G In the above models, "a" is the intercept, and "b" is the regression coefficient. In this paper,' b1 (the coefficient of 7) will also be referred to as the slope of the regression. Model 1 will be referred to as a C = T model, and Model 2 as a logC = T model. Application of Models for Estimating T1 (Starting Date of Contamination) Let C8 and T7 denote the concentration and time of a water sample from a well, and let Tl. denote the time in the past when a chemical reached the lower concentration Clow of interest. Tl1 can then be estimated as follows. Select the best final model from the stepwise regression for the chemical of interest, say, a C = T model comprising T, X, and Y for: perchloroethylene (PCE) and apply it to the target private well for both C8 and C1. subtract, and solve for T1ow The same derivation can be used for a logC = T model to show that: Equations 1 and 2 show that all variables but T cancel out because of their constancy over time. It is therefore not necessary to know a, b2, and b3 for the well and compound of interest. Besides, if these values were known for individual residential wells, there would have been no need for developing a model from city wells.
As Cl,w represents any concentration for which the associated Tlo,, has to be estimated, and thus also C1W = 1 ppb, T, can be estimated by replacing Clo with 1, and deleting logC10,,, as log (1) = 0. Thus, for estimating Tl, Equation (2) can be condensed into the following simpler equations: Evaluation of the Validity of the Models The following procedures were used to select the best model for each VOC: (a) Judge for each single city well and VOC how well univariable C = T and log C = T models fit the data from which they were derived, using the r2 value and the distribution of residuals. (b) Pool the well data and judge how multiple regression models fit the pooled city well data from which they were derived, using the model r2 and the distribution of residuals. Let an observed Tl1w and C1lw denote the time and concentration of the first positive sample in a time series. Using Equations (1) and (2) (Table 3). (d) Compute T1 for the other private wells if C. is more than 1 ppb. As these wells lack the reference value of a second sample, a relatively narrow (a few years) range of T1 estimates for a group of neighboring wells should serve as an alternative evaluation criterion for the validity of the model. (e) Repeat steps 2 through 4, now using weighted regression models, in which the data of an individual well are weighted by 1/SE2 of that well (SE is the standard error of the slope of the univariable regression).

Results
The following paragraphs refer to the results of applying the five evaluation procedures for model fitting, described above.
Of the 27 VOC-and well-specific time series, most showed curved scatter plots of C versus T, resulting in 17 series with higher values for logC = T models, 2 series with equal r2 for C = T and logC = T models, and 8 series with higher r2 for the C = T model ( Table  2). Figure 1 depicts a typical logC = T fitting scatter plot, representing the time series of city well no. 28 for 1,1,1 trichloroethane. C = T models showed a tendency toward underestimating C at both ends of the curve, especially at the upper end, and toward overestimating in the middle part of the curve. LogC = T models tended to slightly overestimate C at the lower end of the curve. Except for the model r2, the results of unweighted multivariable regression models were again in favor of logC = T models. LogC = T models with a higher r2 were found for TCA, PCE, and 11-DCA (1,1-dichloroethane), whereas DCE (1,1-dichloroethylene), CIS (1,2-cis-dichloroethylene), and TCE had a higher r2 associated with C = T models. No multivariable regression model could be developed for 12-DCA, a chemical found only occasionally and at very low levels in city wells (maximum concentration of 10 ppb). In the case of C = T models, the difference between observed and estimated Tl0w increased with inclining C8, resulting in a mean difference of + 2.6 months with large variations (SD 24.5). In contrast, logC = T models showed differences scattered within a narrow range around a mean of zero months (SD 6.5). In 56% of the cases, the estimated Tlo, was closer to the observed Tl,,,, if a logC = T model was used, compared with 32% if the basis was a C = T model. No difference between the two models was seen in 12% of the estimates.
As another test of accuracy, the models were applied to the two private wells that, coincidentally, were sampled twice before 1984. Table 3 shows that in 10 of the 14 pairs of observed and estimated Tl,,, estimates derived from the logC = T model were the closest to the observed value. The range of differences between estimated and observed Tl0w is by far the widest for C = T model estimates. In one case the C = T model yielded a Tl0,, estimate, pointing to a time still to come (1997). Table 4 depicts T1 estimates for private wells derived from unweighted models. C = T models yielded the widest range of estimates (134 years) and many large negative values, that is, estimates far earlier than 1970. In extreme cases, T1 went back in the past many decades to a century, before the VOCs came in commercial production. T1 estimates for neighboring wells differed by many years to decades with no consistent pattern. T1 estimates from logC = T models ranged from 1977 to 1983. Figure 3 is a map view of the location of wells with T1, in time lapses of 1 year, estimated (or observed if the sampling result was 1 ppb) for CIS and 11-DCA using logC = T models. The close correlation of each well's T1 with its topographical location is reflected in the clumping of the earliest T1 estimates in an area closest to the spill site. From here, protrusions with later T1 estimates spread out, pointing mostly in a westerly to northwesterly direction, in agreement with the main direction of the groundwater flow.
When analyzing pooled data, weighted regression models are conceptually preferred over unweighted models. Indeed, r2 values and the accuracy in estimating Tl0, improved for logC = T models, although not for a Because the first sample of well 260 has a Clow of 0 ppb TCA, a T, was estimated rather than T1ow. The observed reference for this estimate is between' tLow and Thigh-  C = T models. However, weighted models resulted in larger differences between observed and predicted Tl1u for both city wells and the two private wells listed in Table 3. Apparently, giving more weight in the pooled data to wells with smaller standard errors of the slope does not guarantee better accuracy.
We concluded that unweighted logC = T models yielded the best results when retrospectively estimat-ing T1 and C = f(T). The widest confidence range for T1 estimates was for chemicals with the smallest number of time series suitable for modeling, which is a logical consequence of using well constants such as distance and depth. Confidence limits around a regression estimate would set limits around Cs and not T1, rendering conventional methods for estimating these limits not applicable. Since calculation of both Cs and T1 are based on b1, the estimated regression coefficient for T, a sort of 95% confidence range was established by using b1 plus or minus 1.96 times its standard error. Ignoring 12-DCA, the standard error (SE) of b1 for logC = T models was not more than 6 to 15% of b1, resulting in a confidence range of a few months to a year, with an occasional outlier of 2 years. This is quite satisfactory, in view of the period covered (1970)(1971)(1972)(1973)(1974)(1975)(1976)(1977)(1978)(1979)(1980)(1981)(1982)(1983)(1984) and the limited data available. The models for 12-DCA were derived from only one well, with a very low level of contamination and a monitoring period of only one year (well no. 35), which resulted in an SE of b1 for 12-DCA that is 50% of b1. This does not invalidate the T1 estimates for 12-DCA. The large SE means that the T1 estimate has a wide confidence range, but it is still the best estimate. C = f(T) can be used for estimating the total accumulated exposure (TAE) in the period T1 to T7, indicated by the area under the CT curve (Fig. 1). For Clow Using the confidence range of b, for logC = T models described above, the lower limit of TAE was 11 to 27% lower than the point estimate, with an upper limit (for VOCs other than 12-DCA) of 14 to 61% higher than the point estimate. The widest confidence range was observed for DCE and TCE, of which the models were derived from 3 and 2 city wells only. This limited the number of explanatory variables (other than T) allowed in the multivariable regression models to 2 and 1, respectively. The upper limit for 12-DCA was high, consistent with the large standard error of b1 for this chemical.

Discussion
A great many factors govern groundwater movement. Most of these factors cannot be measured directly, but some can readily be inferred from associated measurable features such as porosity, hydraulic conductivity and gradient, and the amount of recharge and discharge from the aquifer. These features are used to describe the flow in aquifers and to understand more basic characteristics of the soil or rock matrix and the water that flows through the aquifer. For example, hydraulic conductivity reflects the friction and surface ten-sion between soil or rock particles and water molecules, which in turn depends on the size, shape, and distribution of particles, as well as properties of the water (7). An accurate prediction of groundwater movement requires a large number of measurements in the study area. Flow models are often applied to manipulate the large number of variables. Flow models predict the movement of water, not of chemicals, but nevertheless have been used to predict movement of chemicals (8).
Other, more complex models are available to simulate the transport of chemicals in a flow system. Some of these complex models have been used successfully (9) or are still in an experimental stage (10,11), and might theoretically be applied to the retrospective estimation of C = fl(T) and T1, but require extensive data on hydrogeologic parameters. In a typical setting of groundwater contamination found incidentally, such as in the Battle Creek situation, these data simply do not exist. Assuming that current parameters also prevailed in the past decade would certainly lead to erroneous results, as some parameters did change over time, e.g., the chemical gradient and the flow direction.
Because of their general concept, these prospective models are applicable to a wide variety of scenarios. They cannot discriminate, however, between individual wells as close to each other as in a well field or in a residential area. Accordingly, estimates of T1 or C = f(T) inevitably refer to relatively large areas or to groups of wells rather than to individual wells. This is not because of a deficiency in the concept of the models, but because today's technology for estimating geohydrologic parameters is essentially a macroscale technology. Dealing with an individual well amidst a whole field of wells in the same aquifer would require microscale parameters. Estimating C = f(T) and T1 for individual wells would require not only hydrogeologic tests for each single well, but also the assessment of chemical-specific aquifer and soil properties still unknown. For instance, Gold and Roberts (10) suggested the presence of immobile water compartments in an ;quifer. If immobile water compartments exist, there is no method to assess their location or size. Neither are the causes of the immobility known, nor whether such compartments may become mobile again.
Questions may arise about how aquifer properties relate to T1 and C = f(T) as expressed in Equations 1 through 4, but our method does not require that these questions be answered. The retrospective C = fiT) method described in this paper is not based on hydrogeologic principles, neither is it to be interpreted as a mathematical hydrological flow model. It is a pragmatic and empirical approach, using statistical tools to evaluate observed chemicaland well-specific changes in concentration over time. The estimates are based ont and can often be validated by, observed data. Analyzing a concentration-time curve is merely accepting that whatever factors affect the movement of a chemical will eventually be reflected in T1 and in C = fiT) for that compound. The regression coefficient of T reflects the speed with which chemicals move through the aquifer after being slowed down, accelerated, or detoured by the various characteristics of the aquifer, wells, and chemical. The time T in C = f(T) ultimately represents all geohydrologic, chemical, and physical factors that influence the movement of chemicals.
In the presence of a concentration gradient caused by a chemical spill, the chemical inevitably travels to places lower on the gradient, which yields increasing C8 in wells. It is equally inevitable that transportation is governed by time and characteristics of the wells and the aquifer, some of which may be known. A combination of flow, diffusion, and sorption factors, as described by Cameron and Klute (11), may explain the monotonic nature of C = f(T), but does not explain why neighboring wells have different slopes for the same chemical (see Table 2). Acceptance of the existence of immobile compartments and sorption factors in the aquifer, as proposed by Goltz and Roberts (10), would provide a reasonable explanation for differences in the slope of C = f(T). Immobile compartments would force the water flow to circumvent the area, resulting in increased values for dTl and dT2 (see Fig. 1), and thus result in a later T1 and lower slope. Immobile compartments also explain why some wells were hardly contaminated and others not at all, although they were in the main pathway of the plume of contamination. The adsorption/desorption ratio is likely to influence not only dTl, but also the shape of the concentration versus time curve. VOCs were stored and distributed at different times and volumes, depending on supply and demand, and not all tanks were corroded and leaked at the same time and at the same rate. This explains why different chemicals started to show up in the samples of the same well at different times and reached different peak concentrations.
Our approach of developing C = f(T) from city wells selected for showing increasing C on T, however, circumvents these issues, but some essential assumptions are made: the private wells experienced increasing C over T at the time of sampling; city wells selected for an increasing C over T better represent contaminated residential wells than do a hypothetical average of clean and polluted city wells; and private wells, once contaminated, will behave the same as city wells, that is, show the same dependency of C over T. There are some arguments in support of these assumptions.
First, Table 3 shows that the two private wells tested more than once did show increased levels in the second (later) sample, with CIS as the only exception. However, the decline of CIS most likely reflects a fluctuation rather than a true decline, as large fluctuations in the CIS levels of city wells were commonly observed. Second, the assumption that private wells, once contaminated, behave as city wells is supported by the close agreement between estimated (modeled) and observed concentrations in these private wells. Third, since the withdrawal by private wells is minimal relative to that by city wells, the groundwater flow in the residential area is determined by the activities in the city well field (4,5). Hence, if the city wells showed increasing concentrations under prevailing flow conditions, it is reasonable to assume that private wells in the same period with the same flow conditions were also affected by increasing VOC levels. Fourth, an increasing trend in VOC concentration was also apparent in most of the other contaminated city wells not selected for modeling. Finally, nearly all private wells were tested in 1981 and 1982 (three wells were tested mid-1983), while none of the city wells showed a definitive decline in VOC levels prior to 1984.
That not all wells displayed the expected increase in C over time may also be explained by other factors than distance, immobile aquifer compartments, and differences in sorption properties. Pumping a well causes a cone of depression in the aquifer through a pressure gradient, resulting in water from all sides being drawn to the pump. At the side of the chemical spill, the VOC concentration will b-l igher than on the other sides of the cone. The 11 city wells not polluted, or barely so, were either in the vicinity of the river, or in the northern rim of the well field (Fig. 2). In other words, large supplies of untainted water dominated in the cones of depression of these wells. Wells may also be affected by pumping of neighboring wells if they are within the cone of depression of those wells.
Still other assumptions are common to all regression techniques. Since our database consisted of time series, autocorrelation may pose a source of errors (12). However, plots ofresiduals versus time showed random scatter around zero over the time gradient, evidence that our models were not complicated by autocorrelation. Homogeneity of slopes of individual wells was assumed when the time series were pooled to derive a multivariable model. Testing for heterogeneity of slopes (13) showed that only one well (no. 38, Table 2) appeared to have a significantly different (steeper) slope, and only for PCE and 11-DCA. However, deleting that well from the data set would have resulted in T1 estimates earlier by not more than 0 to 2 months, too small an improvement to justify exclusion of the well from the pooled data.
We found that logC = T models had a better overall fit to the observed data than do C = T models. LogC = T models proved better in another aspect as well: errors in estimating Tlou} were hardly dependent on the sample concentration C8, an important feature in view of the much wider range of Cs in private wells than in city wells (Table 1). Further, using logC = T models, the range of observed deviations of estimated from observed Tl0,,, in city wells of 1 month to 2 years is considerably narrower than the range of errors in assessing the date disease was diagnosed. Our study showed differences in the reported date of disease onset of up to one decade, even for well-known diseases such as diabetes (14).
Evidence of the reasonable degree of accuracy in estimating T1 may also be construed from the velocity of the groundwater. A velocity of 1 to 4 feet/day or about 1000 feet/year was measured (4). Well no. 38 in the northern rim of the well field (Fig. 2) had an observed T1 varying from June 1982 to May 1983 for TCA, 11-DCA, and DCE. Given a straight distance of 1 mile from the VOC source, it can be concluded that water from this source would require 5 years to reach the well, and thus, that VOCs, if traveling at the same speed as groundwater, could have entered the aquifer at the source approximately mid-1977 to mid-1978 to reach well no. 38 by the above dates. Based on similar data on observed T1 for various VOCs in well nos. 32 and 33 in the southern rim of the field, VOCs could have entered the aquifer between mid-1977 and the end of 1979. Thus, if VOCs started their journey through the aquifer between mid-1977 and end of 1979, they would have reached the closest private wells about 1200 feet northwest of the source sometime in 1978. This closely agrees with the T1 estimates from the logC = T models for the private wells of 1977 to 1978 (see Fig. 3). Admittedly, this way of estimating when wells became contaminated is crude and ignores sorption and other issues causing VOCs to travel slower than the watery medium. Yet, since this way of reasoning is independent of the C = f(T) approach, the results add to the credibility of the outcomes of the logC = T models. In contrast, C = T models resulted in T1 estimates many decades earlier than reasonably possible, especially if the sample concentration was over 50 ppb.
We have investigated the significance of a great number of modifications of our method for retrospectively estimating C = f(T), many of which are of a statistical nature. Examples of such variations include expanding the number of wells (or even pooling all wells without selection) to provide the data for slope estimation, regressing T on C rather than C on T, replacing the X-Y coordinates with the direct distance and the angle of the well in relation to the point source, rotating the X and Y axes 300 counterclockwise, deleting influential outliers in the time series, pooling sampling results weighted by the inverse of the standard error of the slope of the wells, and using regression models with interaction between T and the coordinates (to account for possible secular trends). None of these variations provided better results with regard to the magnitude of the difference between estimated and observed Tl0,, the ultimate parameter of accuracy, although in some instances a significant increase in the model r2 (the square of the multiple correlation coefficient) was obtained.
The conclusions that can be drawn from these failures are that pooling wells without applying proper selection criteria is doomed to result in a heterogeneous population of wells rendering any modeling meaningless, and that well characteristics alone are incomplete predictors of chemical movement in an aquifer. One modification, however, did yield improved results, that is, the addition of water withdrawal parameters to the regressor variables in the statistical model. The number of pumping hours (H per month and per well, the pump capacity G of a well, and the product of these two variables (HG), which is the monthly amount of water withdrawn from the aquifer, were used as additional regressors. In addition to improved r2 values, the regression slopes increased slightly, resulting in earlier T1 estimates by 2 months or less. Since no information on H was available for private wells, this expanded model could not be applied to residential wells.
One possible explanation for the only limited success of this modification is that the compound effect of the interaction of cones of depression of neighboring wells, and the relation of these cones to the proximity of the river and the main axis of the groundwater flow, are complex matters requiring a procedure more sophisticated than simply adding the parameters to the model. We intend to further investigate the possible use of withdrawal parameters interacting with data on neighboring wells and on the proximity to the river.
Clearly, our method cannot, and should not, be used to predict whether, when, and to what degree water from a given well may become contaminated with a chemical of interest. On the other hand, our results do show that when no significant alterations in the local management of an aquifer have been made in the past, the results of a current simple monitoring program for comparable wells can be applied to unmonitored wells to yield meaningful information on the estimated time that the plume of contamination reached those unmonitored wells and on how the chemical concentrations changed over time. Evidence has been provided of a reasonable accuracy on the results of our approach, results that, given the lack of required data, could not have been obtained by any of the existing hydrologic flow models. Our approach to estimating C = f(T) and T1 is not presented as the ultimate solution for retrospectively estimating the movement of chemicals in groundwater. It is a pragmatic tool for quantifying exposure for epidemiologic purposes superior to the conventional, but erroneous, approach of assuming a constant exposure level throughout the period of residence in the area. We also feel that there is a good possibility that our approach is applicable to other sites with a similar scenario.
Finally, we deliberately refrained from reporting actual estimates of the slopes obtained from the pooled well data in Table 2 to avoid their use for other sites. These values, ranging from 0.08 to 0.17 for the various VOCs, are strictly specific to the Battle Creek site. Obviously, slope values should be estimated for each site separately, as they are determined by site-specific well properties, amount of spilled chemicals, the length of the period during which spilling occurred, and concentration-time series.

Conclusions
Under the conditions of the study area, well-monitoring data can be used to estimate the changes in the concentration of a chemical in well water as a function of time, that is, C = f(T).
A chemical-specific C = f(T) developed from monitoring data can be applied to nearby contaminated wells lacking such data to retrospectively estimate when the contamination started ( T1).
Under the conditions of the study site, the best of the models studied is logC = a + b1T1 + b2X + b3Y+ b4D + b5G. Specificity to the chemical of interest can be achieved through a backward version of stepwise multivariable regression.
Inclusion of variables that reflect some characteristic of the cones of depression of individual wells and of the interaction of the cones of neighboring wells is expected to further improve the accuracy.
More data and more studies under comparable conditions are needed to determine whether a logC = T model is universal for all wells that are not subject to changes in pumping regime and to determine the dependency of C = f(T) of a well on the operational status of neighboring wells.