Quantitative modeling of soil sorption for xenobiotic chemicals.

by Aleksandar Sabljic*

Experimentally determining soil sorption behavior of xenobiotic chemicals during the last 10 years has been costly, time-consuming, and very tedious. Since an estimated 100,000 chemicals are currently in common use and new chemicals are registered at a rate of 1000 per year, it is obvious that our human and material resources are insufficient to experimentally obtain their soil sorption data. Much work is being done to find alternative methods that will enable us to accurately and rapidly estimate the soil sorption coefficients of pesticides and other classes of organic pollutants.
Empirical models, based on water solubility and n-octanol/water partition coefficients, have been proposed as alternative, accurate methods to estimate soil sorption coefficients. An analysis of the models has shown (a) low precision of water solubility and n-octanol/water partition data, (b) varieties of quantitative models describing the relationship between the soil sorption and above-mentioned properties, and (c) violations of some basic statistical laws when these quantitative models were developed.
During the last 5 years considerable efforts were made to develop nonempirical models that are free of the errors inherent in all models based on empirical variables. Thus far molecular topology has been shown to be the most successful structural property for describing and predicting soil sorption coefficients. The first-order molecular connectivity index was demonstrated to correlate extremely well with the soil sorption coefficients of polycyclic aromatic hydrocarbons (PAHs), alkylbenzenes, chlorobenzenes, chlorinated alkanes and alkenes, heterocyclic and heterosubstituted PAHs, and halogenated phenols. The average difference between predicted and observed soil sorption coefficients is only 0.2 on the logarithmic scale (corresponding to a factor of 1.5). A comparison of the molecular connectivity model with the empirical models described earlier shows that the former is superior in accuracy, performance, and range of applicability.
It is possible to extend this model, with the addition of a single, semiempirical variable, to take care of polar and ionic compounds and to accurately predict the soil sorption coefficients for almost 95% of all organic chemicals whose coefficients have been reported. No empirical or nonempirical models have ever predicted the soil sorption coefficients to such a high degree of accuracy on such a broad selection of structurally diverse compounds. An additional advantage of the molecular connectivity model is that it is sufficient to know the structural formulas to make predictions about soil sorption coefficients. The structural analysis of our quantitative model has shown that two factors are responsible for the majority of the variations in the soil sorption data: the molecular surface areas and the polarity of compounds.

Distribution of Xenobiotic Chemicals in the Environment
The widespread use of organic pesticides has given rise to extensive interest (1-3) in the adsorption of such solutes by soils from aqueous solutions because of the influence of this process on pesticide performance, mobility in the soil, and residue problems. Contamination of groundwater by pesticides and other agricultural chemicals, by hazardous chemicals from waste disposal sites, and by gasoline and chemicals from underground storage tanks is becoming a major environmental problem. Although universal, this problem is particularly pronounced in the United States and other industrialized countries. Reports prepared by the U.S. Environmental Protection Agency (4) and the Council on Environmental Quality (5) indicate that up to 50,000 waste disposal sites in the United States may contain hazardous chemicals. Groundwater systems close to many of these sites are being slowly degraded, and the contamination often involves the presence of synthetic organic materials (6).

*Theoretical Chemistry Group, Department of Physical Chemistry, Institute Rudjer Boskovic, HPOB 1016, 41001 Zagreb, Croatia, Yugoslavia.
To minimize the impact of man's activity on groundwater quality, mechanisms by which pollutants enter groundwater need to be better understood, and reliable techniques either to measure or predict the transport of contaminants within aquifers need to be developed. Because the large majority of synthetic organic chemicals are hydrophobic, their adsorption on soil, sediments, or other materials plays a very important role in their transport in surface or subsurface systems.
Present laws and regulations about control of toxic substances (e.g., Toxic Substances Control Act, U.S. Federal Pesticides Law, OECD-Guideline for Testing Chemicals) require that all compounds that are prepared commercially or identified as potentially useful chemicals must be assessed for their environmental behavior and hazards. Many compounds (approximately 100,000) (7) are currently in common use, and new chemicals are registered at a rate of 1000 per year. Our human and material resources are not sufficient to obtain experimentally even basic information about the environmental fate of all these compounds. Thus, it is necessary to develop quantitative models that will reliably and rapidly predict the environmental behavior of such large sets of compounds.
The environmental fate of organic pollutants depends strongly on their distribution between different environmental compartments. The soil sorption coefficients are currently used as a quantitative measure of the adsorption of xenobiotic chemicals by soil from aqueous solutions (2). They are defined as the ratios between the concentrations of a given chemical sorbed by the soil and dissolved in soil water. In order to compare the soil sorption coefficients measured for different soils, they have to be normalized either to the total organic carbon content of the soil (Koc) or to the organic matter content of the soil (Kom). These two normalizing schemes are simply related by the factor 1.724; thus, it is easy to convert coefficients reported on either basis.
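The normalization and conversion described above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the direction of the conversion (Koc = 1.724 × Kom, which follows from the common convention that soil organic matter is about 58% carbon) is an assumption, since the text gives only the factor itself.

```python
import math

# Factor quoted in the text. Assumed direction: organic matter = 1.724 x organic carbon,
# so a coefficient normalized to organic carbon is 1.724 times the one normalized
# to organic matter.
OM_TO_OC_FACTOR = 1.724


def normalize_sorption_coefficient(kd, organic_fraction):
    """Divide a measured soil sorption coefficient by the soil's organic fraction (0-1)."""
    if not 0.0 < organic_fraction <= 1.0:
        raise ValueError("organic_fraction must be in the interval (0, 1]")
    return kd / organic_fraction


def kom_to_koc(kom):
    """Convert an organic-matter-normalized coefficient to an organic-carbon basis."""
    return kom * OM_TO_OC_FACTOR


def koc_to_kom(koc):
    """Convert an organic-carbon-normalized coefficient to an organic-matter basis."""
    return koc / OM_TO_OC_FACTOR


def log_kom_to_log_koc(log_kom):
    """Same conversion on the logarithmic scale (a constant shift of about 0.24)."""
    return log_kom + math.log10(OM_TO_OC_FACTOR)
```

On the logarithmic scale the two bases differ only by a constant offset, so the slope of any log-linear model is unaffected by the choice of basis; only the intercept shifts.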
In this paper, quantitative models for predicting soil sorption coefficients of xenobiotic chemicals will be described and evaluated. A small section will be devoted to general principles of statistical modeling and to QSAR models in particular. Another purpose of this paper is to shed more light on the mechanism(s) of the soil sorption process, since it is currently described either as an adsorption or as a partitioning process. The details of this controversy will be discussed in the section on the mechanism of the soil sorption process. In the closing section we shall describe the applications of quantitative models of soil sorption and give our views about needs and future developments in this field of environmental science.

Principles and Rules of Statistical Modeling

General Considerations on Statistical Modeling
Before proceeding any further, we shall discuss some elementary principles and pitfalls of modeling processes in general and statistical modeling in particular. While the following points should be self-evident, they have generally been ignored and/or overlooked. On the whole, models are used in the sciences to represent, explain, predict, or estimate phenomena of interest. They are simply approximations of real systems primarily developed for the purpose of prediction, and we cannot expect them to be the true descriptions of reality. Thus, the real value of a model is related neither to the type and size of the model nor to its results for the training data set, but rather to its ability to handle new situations correctly. Yet, as Ptolemy demonstrated, bad models are as readily available as the good ones. Moreover, microcomputers have made this era the golden age of modeling, and easy access to desktop computers and modern software allows people with either modest or no statistical background to construct and test elaborate formal models. In addition, microcomputers enable us to abuse models at superhuman speed and to produce enormous volumes of questionable numerical results. Thus, the principles of logic, statistics, and measurement must be satisfied in order to develop a meaningful model and to successfully predict phenomena of interest. The data sets must conform with the basic statistical requirements underlying the statistical procedure used to develop a particular model. Also, the model must make logical, mechanistic, and statistical sense. Particular caution should be exercised not to overfit the data. As demonstrated later, many quantitative environmental models are published that do not comply with the above basic requirements.

Statistical Modeling Process
There are five major stages in the process of statistical modeling: identification, fitting and estimation, validation, application, and iteration. Figure 1 gives the schematic description of this process.
Identification is the process of finding or choosing an appropriate model for a particular situation. There are no rigorous procedures that guarantee success. Systematically, there are two extreme approaches in the identification process: one that seeks a model on the basis of a rational argument from some knowledge of the real life situation without reference to any actual data (conceptual identification), and the other that considers only the data and their properties (empirical identification). In practice, both approaches should be used and combined to create the best models possible.
Model fitting is the stage when we move from the general form to the specific numerical form, whereas the estimation stage represents the process of assigning numerical values to parameters in that model. The most frequently used and the best known method for the purpose of statistical modeling is least squares fit.
The process of comparing the model with the observed world is called "validation." What is valid at one stage of a study need not be valid at a more developed stage. A model can be valid for one purpose, but not for another. The object of validation is to examine whether the model is a good (not true) description of reality in terms of its behavior and of its intended application(s).
For logical reasons and convenience of presentation, the application is described as the last stage of the modeling process. However, in practice, the application must be a part of the very first consideration and taken into account when carrying out the processes of identification, estimation, and validation, since models are developed to help solve problems. If we ignore application during the modeling process, we might end up with an excellent model that will not solve our problem. Figure 1 gives a realistic view of statistical modeling being an iterative process. It is a process of continuous development, going back a stage or two to use additional information. The model is never "the model," final and unaltered. It is always a tentative model, which we shall use until we can improve it.
We should always remember that in statistical modeling we are dealing with probabilities, distributions, populations, and uncertainties. One of the major sources of uncertainty is our data. When seeking to fit models to our data, we often find that an accurate fit requires more data than we can either practically or financially obtain. Even when we have enough data, we find that their quality is often far from perfect. We may have accurate data on some variables, whereas those for other variables may be inadequate. Very often, in order to have enough data, we are forced to collect data from various sources. The reliability and quality of such data will vary greatly. The limitations on the amount and quality of the data available reduce the precision with which we can fit and use the models.
Quantitative Structure-Activity Relationships (QSAR) Models
Mathematical models that relate some chemical, biological, or environmental activity of interest to some quantitative structural descriptor or physico-chemical property are collectively known as quantitative structure-activity relationships (QSAR) models. QSAR models are usually developed for a group(s) of structural congeners. The primary objective in creating them is to predict the activities of untested congeners. The investigators also hope to understand better the mechanisms of action of structures under study.
The statistical procedure used to derive QSAR models is linear regression analysis, and it can be either single or multivariable, depending on the number of structural descriptors used in the particular analysis. The usual procedure in deriving QSAR models is stepwise and begins with single variable regressions, going from the simplest to the more complex structural descriptors. The next step is to screen multivariable models of increasing complexity until the simplest model predicting the activity of interest within the experimental error is found. Naturally, this stepwise procedure can be discontinued with the single variable models. To test the quality and accuracy of derived models, the following statistical parameters should be used: the single (r) and multiple (R) correlation coefficients, the standard error of the estimate (s), a test of the null hypothesis (F-test), and the amount of explained variance (EV).
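For the single variable case, the statistical parameters listed above can be computed directly from the data. The sketch below is a minimal illustration and not the procedure used in the studies reviewed here; the function name is hypothetical, and EV is taken as the unadjusted fraction of explained variance, which is an assumption because the text does not say whether an adjusted value is meant.

```python
import math


def single_variable_regression(x, y):
    """Least squares fit y = intercept + slope * x with basic quality statistics."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equally long sequences with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)              # correlation coefficient
    s = math.sqrt(ss_res / (n - 2))             # standard error of the estimate
    ev = 1.0 - ss_res / syy                     # explained variance (unadjusted, assumed)
    # F statistic for the null hypothesis "slope = 0" with (1, n - 2) degrees of freedom.
    f = (syy - ss_res) / (ss_res / (n - 2)) if ss_res > 0 else math.inf
    return {"slope": slope, "intercept": intercept, "r": r, "s": s, "EV": ev, "F": f}
```

A useful habit, in line with the mistakes discussed in this section, is to compare s with the experimental error of the dependent variable: a value of s well below that error suggests the model is overfitted.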
Several major mistakes are commonly made in today's environmental QSAR modeling: a) the principle of simplicity is ignored; b) experimental error in the dependent variable is neglected and models are overfitted; c) too many independent variables are used with small sets of compounds and with a high probability of chance correlation; d) models are often made for a small congeneric series, checked and validated only with basic statistical tests (correlation coefficient and standard deviation), and residuals are rarely analyzed for possible patterns; and e) models are rarely validated and never against the entire knowledge base and data base.

Modeling the Soil Sorption Coefficients: Past Results with Empirical Variables, Their Limitations and Present Needs
Considerable work has already been done to find alternative methods that will enable us to accurately and rapidly estimate and/or predict soil sorption coefficients of xenobiotic chemicals (2,8-13). Water solubility (WS) and the n-octanol/water partition coefficient (Kow) are generally accepted as accurate estimators of the soil sorption coefficients. The quantitative models describing relationships between the soil sorption coefficients and Kow or WS are shown in Tables 1 and 2, respectively.
Our detailed analysis of these models (14) showed surprisingly high variability in the reported WS and Kow data and in the quantitative linear models describing their relationship to the soil sorption coefficients. For the sake of clarity, the results and models with each of the above empirical variables will be discussed separately.
The low precision of the Kow data is illustrated in Table 3. A brief examination of Table 3 clearly shows a surprisingly high variability in experimental Kow data: the ranges of reported experimental Kow data vary between 0.5 and 3.3 on the logarithmic scale. The accuracy of predicted soil sorption coefficients cannot be better than the accuracy of the experimental Kow data. Table 3 contains only experimental Kow data, and all reports cited have been published within the last several years.

Table 1. Published quantitative models describing correlations between observed soil sorption coefficients and n-octanol/water partition coefficients.
The second difficulty in using the n-octanol/water partition coefficient to predict the soil sorption coefficients is the wide variety of reported quantitative linear models describing their relationship (2,3,8-13). This large diversity in published models is illustrated in Table 1. The slopes of the models range from 0.52 to 1.00 and their intercepts from -0.78 to 1.14. The large diversity in log Kom versus log Kow models has been recognized by other investigators in the field (15-17), but only one study (17) gave this problem serious and detailed consideration.
Notwithstanding these problems, the soil sorption coefficients were calculated for a test sample of 31 compounds (alkyl- and chlorobenzenes, heterocyclic and substituted polycyclic aromatic hydrocarbons, chlorinated alkanes and alkenes, and chlorinated phenols) with the quantitative models a, b, e, f, and g shown in Table 1 (14). (The identical test sample will be used to test all models discussed in this review.) These five quantitative models were selected because they cover at least three orders of magnitude in the soil sorption data. When several Kow coefficients were reported for the same compound, they were all used in the calculation. The predicted value for the soil sorption coefficient of the particular compound was expressed as the range of calculated values. The average range of soil sorption coefficients predicted from the Kow coefficients was over 1.5 on the logarithmic scale (corresponding to a factor of 35). This result is consistent with the low accuracy of the experimental Kow data.
Such variability in the predicted soil sorption coefficients shows that quantitative empirical models are highly dependent on the empirical data used to create them. Consequently, the soil sorption coefficients calculated by such models will depend on the empirical input data and the particular quantitative model used to calculate them. Currently, the empirical models based on n-octanol/water partition coefficients appear insufficiently reliable in predicting the environmental distribution of chemicals.
In addition to the coefficients' low accuracy, some basic statistical requirements were overlooked when quantitative models based on Kow coefficients were developed. All quantitative models discussed above were derived through the statistical procedure known as the linear regression model or the method of least squares. The most important assumption underlying the linear regression model is that the dependent variable (commonly denoted as Y) contains all the errors in each data pair (X, Y). In other words, the independent variable (commonly denoted as X) is free of error and has zero variance (18,19). It is obvious from an examination of Table 3 that the data for n-octanol/water partition coefficients do not conform with this basic assumption. It was demonstrated some time ago (20) that if this basic assumption is violated, the fitted slopes can deviate by as much as 40% from the correct value. Thus, the validity and applicability of published quantitative models describing the relationship between the Kow coefficients and soil sorption coefficients are highly questionable. The n-octanol/water partition coefficient data calculated by the Hansch (21) or Rekker (22) methods will have the same level of accuracy as the experimental Kow coefficients. They will contain almost all their experimental errors and uncertainties because both these methods are based on measured Kow data. For example, the use of these methods for polycyclic aromatic hydrocarbons with four or more rings results in calculated Kow values (16,23) that are up to one order of magnitude higher than those measured by Means et al. (24).
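The effect of errors in the independent variable on a least squares slope can be illustrated with a small simulation. The sketch below uses purely synthetic data; the value range, error magnitudes, and function names are assumptions chosen for illustration and are not taken from the studies cited. It fits the same dependent variable once against error-free X values and once against X values contaminated with random error.

```python
import random


def least_squares_slope(x, y):
    """Slope of the ordinary least squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def simulate_slope_bias(n=5000, true_slope=1.0, x_error_sd=1.0, y_error_sd=0.2, seed=0):
    """Return (slope with error-free X, slope with noisy X) for synthetic data."""
    rng = random.Random(seed)
    # Hypothetical "true" X values spread over six log units.
    true_x = [rng.uniform(1.0, 7.0) for _ in range(n)]
    # Y carries its own random error, as the regression model assumes.
    y = [true_slope * xi + rng.gauss(0.0, y_error_sd) for xi in true_x]
    # Measured X values: true values plus error, violating the assumption.
    noisy_x = [xi + rng.gauss(0.0, x_error_sd) for xi in true_x]
    return least_squares_slope(true_x, y), least_squares_slope(noisy_x, y)


if __name__ == "__main__":
    clean, noisy = simulate_slope_bias()
    print(f"slope with error-free X: {clean:.3f}")
    print(f"slope with noisy X:      {noisy:.3f}")
```

With these settings the slope fitted to the noisy X values is systematically smaller than the true slope, roughly by the ratio of the variance of the true X values to the variance of the measured X values. The size of the bias therefore depends on how large the measurement error is relative to the spread of the data.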
All that has been said for empirical models based on the n-octanol/water partition coefficients is also true for models based on water solubilities: a) the low precision of the measured water solubility data (Table 4); b) the wide variety of quantitative linear models correlating soil sorption coefficients and water solubility (Table 2) (3,8,9,12,13,25); and c) the violation of statistical requirements for linear regression models by large errors in the experimental water solubility data. Because some quantitative models (13,25,26) use nonmolar units (mg/mL) for water solubility data, their use is further impeded.
From the discussion in this section, it is obvious that an accurate, simple, and fast model for predicting soil sorption coefficients of xenobiotic chemicals is still badly needed. During the last several years, considerable work has been done to find alternative, structurally based methods that will enable us to accurately and rapidly estimate and/or predict the soil sorption coefficients of xenobiotic chemicals (14,27-29). A detailed discussion of these methods and the resulting quantitative models is given in the following sections.

Molecular Connectivity Models for Predicting Soil Sorption Coefficients
Our long-term interest (14,27-29) in environmental QSAR modeling is in developing general nonempirical model(s), based on information from structural formulas only, for predicting soil sorption coefficients of xenobiotic chemicals. The ultimate goal is to develop quantitative models with the following characteristics: a) high accuracy for reliable use; b) sufficient simplicity to be easily applicable as daily routine, either in the laboratory or in the field, by various scientists and other people (even laymen) involved in environmental problems; and c) sufficient speed to be applied to large samples in a reasonable amount of time.
The nonempirical approach was selected because it has two major advantages over empirical approaches. First, the determination of nonempirical variables is faster and less expensive than the measurement of empirical variables and can be performed almost anywhere (e.g., office, laboratory, home, field, etc.), whereas measurements commonly require laboratory facilities with specialized equipment and qualified, experienced personnel. The second advantage is the higher accuracy of nonempirical variables which, by definition, have zero intrinsic error. This makes them ideally suitable within the rules of the standard linear regression model.
On the other hand, empirical variables (i.e., water solubility and the n-octanol/water partition coefficient) contain large experimental errors and uncertainties. Thus, it is necessary to use complex statistical procedures and experienced people to create a valid and meaningful quantitative relationship between experimental variables and environmental characteristics of hazardous chemicals.
The simplest way to represent a molecular structure is to assign it a number or a set of numbers, termed "indices." Indices generated by applying the chemical graph theory (30) are called "topological indices" (30,31). The topological indices numerically express the topology of a chemical species and usually reflect, in varying degrees, their shape and size (30,31). The particular advantage of topological indices is that they are direct and simple numerical descriptors of molecular structures and are used in the quantitative correlations with physical, chemical, biological, or environmental properties of molecules. The majority of topological indices are related either to the adjacency relationship (atom-to-atom connectivity) or to the topological distance (the number of bonds between a pair of atoms) in the molecular structure (chemical graph). Thus, they can be calculated either from the adjacency or from the distance matrix of a chemical graph; the means by which this is accomplished varies from index to index.
The concept of molecular connectivity indices was introduced by Randic (32) and further developed and extensively exploited by Kier and Hall (33) and many others (14,27-31,34-51). Several extensive reviews of the theory and method of calculation of molecular connectivity indices have been published recently (30,31,33,52,53). Thus, only a brief description of the calculation of the first-order molecular connectivity index used in the nonempirical models in this study is given here. The first-order molecular connectivity index is calculated from the nonhydrogen part of the molecule as

$$ {}^{1}\chi = \sum (\delta_i \delta_j)^{-1/2} \qquad (1) $$

where $\delta_i$ and $\delta_j$ are the numbers of nonhydrogen atoms bonded to atoms i and j, i and j correspond to the pairs of adjacent nonhydrogen atoms, and the summation runs over all bonds between nonhydrogen atoms. Molecular connectivity indices for models discussed in this section were calculated by the GRAPH III computer program on an IBM PC/XT personal computer (31). Minimum hardware and software requirements for this program are an IBM PC or compatible computer, 256 KB of memory, 1 double-sided/double-density disk drive, and a PC-DOS or MS-DOS operating system, version 2.1 or higher. The use of a mathematical coprocessor is highly recommended. Its present version, GRAPH III, can calculate the molecular connectivity indices up to the tenth order for molecules with 35 nonhydrogen atoms or less. The program can be extended to handle larger molecules if sufficient memory is available.

During the last 5 years, much work has been done to develop nonempirical models that will be free from the errors inherent in all models based on empirical variables. Thus far, molecular topology has been shown (14,27-29,40,41) to be the most successful structural variable for describing and predicting the soil sorption coefficients.

Several years ago we pioneered the idea that molecular topology could be successfully applied in correlations between molecular structure and the environmental distribution of xenobiotic chemicals (27,34). Subsequently, molecular connectivity indices were found to be very important structural variables for describing and predicting the soil sorption coefficients (27). In this preliminary study our attention was focused on polycyclic aromatic hydrocarbons (PAHs). They constitute the major group of environmental hazards to all living species because of their a) large production and widespread use, b) resistance to chemical and biological degradation, c) ability to accumulate in food chains and affect the growth and reproduction of organisms at all levels of the food chain, and d) ability to cross the blood/brain barrier. Moreover, the soil sorption data of PAHs are from a single source (12), measured under highly controlled and uniform conditions, and therefore they have the highest degree of internal consistency and comparability. Thus, the single-source data were an excellent starting point for our project, since they gave us optimal control over the modeling process, a good feeling for modeling environmental properties, and an excellent chance to create a sound model for predicting soil sorption coefficients.

The first-order molecular connectivity index was found to correlate extremely well with the soil sorption coefficients of eight PAHs as shown by Eq. (2) and its statistics. This result was confirmed by two other laboratories (40,41). Such gratifying results were a driving force for our continuing investigation in the same direction.
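The calculation of the first-order molecular connectivity index described in this section (a sum over all bonds between nonhydrogen atoms) can be sketched as follows. This is not the GRAPH III program; it is a minimal illustration that assumes the simple (non-valence) form of the index, in which each nonhydrogen atom is characterized only by its number of nonhydrogen neighbors. The function name and the bond-list input format are hypothetical.

```python
import math


def first_order_connectivity_index(bonds):
    """Sum of (d_i * d_j) ** -0.5 over all bonds of the hydrogen-suppressed graph.

    bonds: iterable of (i, j) pairs, one per bond between nonhydrogen atoms.
    d_i is the number of nonhydrogen neighbors of atom i (assumed simple index).
    """
    bonds = list(bonds)
    degree = {}
    for i, j in bonds:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    return sum(1.0 / math.sqrt(degree[i] * degree[j]) for i, j in bonds)


# Hydrogen-suppressed graphs of three hydrocarbons (atom labels are arbitrary).
n_butane = [(0, 1), (1, 2), (2, 3)]
benzene = [(k, (k + 1) % 6) for k in range(6)]
# Naphthalene: ten-membered perimeter plus the central bond between atoms 0 and 5.
naphthalene = [(k, (k + 1) % 10) for k in range(10)] + [(0, 5)]

if __name__ == "__main__":
    for name, graph in [("n-butane", n_butane), ("benzene", benzene),
                        ("naphthalene", naphthalene)]:
        print(f"{name}: {first_order_connectivity_index(graph):.3f}")
```

Only the structural formula is needed as input, which is the practical advantage emphasized in the text: no measurement enters the calculation, so the index itself carries no experimental error.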
The soil sorption coefficients were collected for 29 additional compounds, mainly halogenated hydrocarbons: chlorobenzenes, polychlorinated biphenyls, and chlorinated or brominated alkanes and alkenes. Adding halogen atoms to the hydrocarbon skeleton seemed to be a very small perturbation, and it is reasonable to expect that resulting compounds will have similar environmental distribution patterns as parent compounds. We were fortunate to learn that our working hypothesis is correct. A quantitative model describing the soil sorption of hydrocarbons and their halogenated derivatives is given by Eq. Statistically, Eq. (3) results account for 95% of the variation in the log Kom data. This variation is as good as can be expected, taking into account the accuracy of measured data. The alternative models log Kom versus log WS and log Kom versus log K&, were also examined, and both were found to be inferior to Eq. (3). The fit for these empirical variables was 85 and 77% of the variation in the log Kom data, respectively. Our next task was to expand our molecular connectivity model [Eq. (3)], to define the whole range of its applicability, and test the level of accuracy for predicting soil sorption coefficients. For this test we used the identical set of 31 compounds (alkyland chloro-benzenes, heterocyclic and substituted polycyclic aromatic hydrocarbons, chlorinated alkanes and alkenes, and chlorinated phenols) as previously used for testing models based on empirical variables (the previous section). Their first-order molecular connectivity indices were calculated, and their soil sorption predicted from the molecular connecti (3).
A comparison of the observed and pr coefficients clearly demonstrates that ti nectivity model is very accurate in pr sorption coefficients. The average difi predicted and observed soil sorption cc 0.22 on the logarithmic scale (correspo: 1.66), and more than 90% of the coef dicted within the two standard devia 1,2,3,4-and 1,2,4,5-tetrachlorobenzene trachlorophenol soil sorption coefficien outside the two standard deviation rar is far superior to the empirical modeli erage difference between predicted ax sorption coefficients is over 1.5 on the] (corresponding to a factor of 35).
The highly satisfactory performance connectivity model Eq. The statistical parameters show that are statistically significant above the 99 have similar levels of accuracy. Thus, plicability of the molecular connectivil extended to alkylbenzenes, heterocycli tuted PAHs, and chlorophenols withoul accuracy. In addition to extending the cability of the molecular connectivity m results show that the initial set of ex (28) (3,(8)(9)(10)(11)(12)(13)24) where onl were considered of the published expe on those compounds. In summary, the cl of the molecular connectivity model v commonly used empirical models shows ular connectivity model is clearly supei performance, and range of applicabili models based on n-octanol/water parti or water solubility. coefficients were Our molecular connectivity model for predicting soil ivity model, Eq. sorption coefficients was based on nonpolar and nonionic compounds. Thus, it was reasonable to assume that it redicted sorption may not be valid for the highly polar and ionic comie molecular conpounds. To check this assumption our model [Eq. (4)] 'edicting the soil was tested on the following classes of compounds: aniference between lines, acetanilides, nitrobenzenes, carbamates, substiiefficients is only tuted benzenes and pyridines, phenylureas, 3-phenyl-1nding to a factor methylureas, 3-phenyl-1,1-dimethylureas, 3-phenyl-1ficients are pre-cycloalkylureas, alkyl-N-phenylcarbamates, triazines, ,tions. (Only the uracils, organic acids, and organic phosphates. The list s and 2,3,4,5-teof compounds sorted by functional groups and their soil its are predicted sorption coefficients are given by Sabljic (14). (In adnge.) This result dition to the 143 compounds listed by Sabljic, 16 coms, where the avpounds were used in the initial analysis that could not nd observed soil be simply classified by their functional groups.) 
The first-order molecular connectivity indices were calculated for all those compounds and their soil sorption coefficients predicted from the molecular connectivity model [Eq. (4)]. We expected the soil sorption coefficients of such a large number of compounds to be distributed randomly around the regression line defined by Eq. (4).
To our surprise, the observed soil sorption coefficients of all compounds used in this analysis fall below the regression line; that is, Eq. (4) overestimates sorption for every one of them. Such an unusual result reveals valuable information about the relation between molecular structure and soil sorption properties of organic compounds. First, hydrocarbons must have optimal geometric and electronic features for strong sorption on soils. Second, the introduction of any polar atom and/or substituent will always decrease the soil sorption capability of the resulting compound. Although the second relation was previously known to exist, this is the first systematic and general proof for it and, as shown below, it can be described quantitatively.
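To see why a one-sided result is so striking, consider a simple sign-test argument. This is our illustration, not a calculation from the original work: assume that each compound independently has an equal chance of lying above or below the regression line. The chance that all n compounds fall on the same chosen side is then 0.5^n. The value n = 143 is the number of classified polar compounds mentioned in the text.

```python
from math import comb


def prob_all_below(n: int) -> float:
    """Probability that all n points fall below the line under random scatter.

    Assumption (ours): each point is independently above or below the
    regression line with probability 0.5.
    """
    return 0.5 ** n


def prob_at_least_k_below(n: int, k: int) -> float:
    """Probability that at least k of n points fall below the line (binomial tail)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


if __name__ == "__main__":
    n_compounds = 143  # classified polar compounds mentioned in the text
    print(f"P(all {n_compounds} below) = {prob_all_below(n_compounds):.2e}")
    # Even a milder outcome, say 120 of 143 below, would be very unlikely.
    print(f"P(at least 120 below) = {prob_at_least_k_below(n_compounds, 120):.2e}")
```

Under these assumptions the probability is vanishingly small, so the offset must be systematic rather than random scatter, which is the conclusion drawn in the text.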
The detailed analysis of the calculated soil sorption coefficients of polar compounds shows that the absolute difference between the calculated and observed coefficients depends strongly on the polar functional group and that variations within groups are small. This result is shown in Table 5, where the 143 polar compounds are arranged into 17 classes of compounds in descending order of their soil sorption coefficients. The average error within each functional group shown in Table 5 is very small and is always less than the standard deviation of the molecular connectivity model. Such systematic behavior of the polar organic compounds can be used to indicate the presence or absence of a polar functional group. In addition, the numerical differences between calculated and observed soil sorption coefficients (Table 5, column 3) were used to determine the magnitude of a set of semiempirical variables (the polarity correction factors, Pf) that can be employed to predict accurately the soil sorption coefficients of polar organic compounds. (The only exceptions to this rule are organic phosphates, which fall into two distinctly different groups. Currently, we are unable to find a structural property that can differentiate between these two groups.) [...]
Adsorption can be viewed as a two-dimensional process in which the adsorbed molecules are assumed to be in the plane of the solid's surface.
A second model for adsorption is that of a three-dimensional interfacial region bordering on the solid surface, where the solute has different thermodynamic properties from those in the bulk phase because of the effect of the solid surface. [...] is interpretative, speculative, and may be disputed and/or reinterpreted. This is exactly what happened during the past several years (17,57). We will discuss the validity of the above evidence and the feasibility of soil sorption being a partition process.
The first argument put forward in favor of partition is the linearity of soil-water equilibrium isotherms for nonionic chemicals, even at high relative concentrations. However, Karickhoff's group (12) measured the soil sorption of 10 small polycyclic aromatic hydrocarbons (including benzene) and chlorinated hydrocarbons and found that "as the sorbate water concentration approaches 60 to 70% of the sorbate aqueous solubility, the isotherms typically bend upward, indicative of increased sorption." Similar nonlinear behavior was found for dibromoethylene, parathion, hexachlorocyclohexanes, napropamide, and terbufos. These results are collected in the study by Mingelgrin and Gerstl (17). In addition, linear adsorption isotherms of parathion, monuron, methyl-parathion, oxydipropionitrile, trietazine, hexanol, hexachlorocyclohexanes, and glycine and its peptides on clays were reported (17,58). Such adsorbents are very different from the soil organic matter; they do not possess the presumed solvent action of the organic matter fraction that is invoked as the phase where partition from the aqueous solution occurs in soil. Thus, it is fair to conclude that this argument is not generally valid for soil sorption of nonionic chemicals, and that isotherm linearity, where it is observed, is consistent with both the adsorption and partition processes.
The second argument favoring partition is the very good correlation between the soil sorption coefficients and the water solubility of nonionic organic compounds. We demonstrated in the third section that a very good correlation between the soil sorption coefficients and water solubility of nonionic organic compounds looks more like wishful thinking than reality itself. The truth is that the correlation holds only within a congeneric series of compounds. Thus, serious caution must be exercised not to assign general applicability to such correlations. For example, a correlation based on polycyclic aromatic compounds gives reasonable estimates for soil sorption coefficients of small chlorinated hydrocarbons, but tends to overestimate the sorption of highly chlorinated hydrocarbons by as much as an order of magnitude (16).
Finally, a weak correlation (r² = 0.635) is obtained (17) between Kom and WS for a large number of nonionic, structurally diverse pesticides. In any case, the lack of good correlation between these two quantities does not prove or disprove the adsorption or partition models. Note here that better correlations are usually found between the soil sorption coefficients and water solubility than between the soil sorption coefficients and n-octanol/water partition coefficients. If the soil sorption process is mimicking a partition process, we would expect the opposite to be true.
The third argument put forward in favor of partition as the process of soil sorption concerns thermodynamic effects. Chiou, Peters, and Freed argued that the observed enthalpies support the partition hypothesis (8). This argument is based on the assumption that the adsorption process implies a decrease in entropy and therefore requires a large exothermic enthalpy change to make adsorption feasible. By the same reasoning, partition need not be as exothermic as adsorption.
In reality, these assumptions are not valid. Adsorption from solutions is usually a competitive process. The entropy change can therefore be either positive or negative, depending on the balance between the entropy changes of the solvent and of the solute. Consequently, the change in enthalpy of adsorption may have any magnitude and sign. Since the changes in enthalpy and entropy for both partition and adsorption may vary significantly in magnitude and sign, the values of these parameters do not prove or disprove the adsorption or partition models.
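The reasoning in the last two paragraphs can be written out with the standard Gibbs relation. This is a textbook sketch added for clarity, not a derivation from the original work. The split of the entropy change into solute and solvent contributions is our notation for the competitive process described above.

```latex
\begin{align}
  \Delta G &= \Delta H - T\,\Delta S < 0
      && \text{(condition for spontaneous sorption)} \\
  \Delta S &= \Delta S_{\text{solute}} + \Delta S_{\text{solvent}}
      && \text{(solute is immobilized, displaced solvent is released)} \\
  \Delta S < 0 &\;\Rightarrow\; \Delta H < T\,\Delta S < 0
      && \text{(sorption must be exothermic)} \\
  \Delta S > 0 &\;\Rightarrow\; \Delta H < T\,\Delta S
      && \text{($\Delta H$ may be negative, zero, or positive)}
\end{align}
```

The third line is the assumption behind the partition argument. The fourth line is the case made possible by solvent displacement, and it is why the sign and size of the measured enthalpy alone cannot discriminate between adsorption and partition.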
Below we examine our results on modeling soil sorption of nonionic organic chemicals with regard to the nature of this process, namely, surface adsorption or solute partition. We proposed (28) that the first-order molecular connectivity index be viewed as a quantitative measure of the area occupied by the optimal projection of the molecular skeleton. Since the large majority of the 72 studied nonpolar compounds are "quasi" two dimensional, their index should correlate very well with their surface areas. This prediction was found (14,28) to be true for all nonpolar compounds discussed in this study. The molecular surface areas ($S_M$) of those compounds were calculated by the method recently suggested by Gavezzotti (59). The following regression equation is obtained for all 72 compounds:

$$S_M = 24.6\,{}^{1}\chi + 57.7 \qquad (8)$$

with the correlation coefficient equal to 0.956. This result supports the proposed physical meaning of the first-order molecular connectivity index. Since the two-dimensional representation of the molecular structure is sufficient to explain all quantitative variations in the soil sorption data of nonpolar compounds, the process of soil sorption may be viewed as an attractive interaction between two planes, with the magnitude directly proportional to the main plane area of the solute molecule. Thus, it can be concluded that our results on modeling the soil sorption process favor an adsorption mechanism.
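To make Eq. (8) concrete, the sketch below computes a first-order connectivity index from a hydrogen-suppressed molecular graph and then applies the regression. It assumes the usual simple (non-valence) definition of the index, a sum over bonds of 1/sqrt(d_i d_j), where d is the number of non-hydrogen neighbors of each atom. The original work may use a refined variant. The bond lists and function names are ours.

```python
from math import sqrt


def first_order_chi(bonds):
    """Simple first-order connectivity index of a hydrogen-suppressed graph.

    bonds: list of (i, j) pairs of non-hydrogen atom labels.
    Each bond contributes 1 / sqrt(degree_i * degree_j).
    """
    degree = {}
    for i, j in bonds:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    return sum(1.0 / sqrt(degree[i] * degree[j]) for i, j in bonds)


def surface_area_from_chi(chi1):
    """Molecular surface area estimated from Eq. (8): S_M = 24.6 * chi1 + 57.7."""
    return 24.6 * chi1 + 57.7


# Hydrogen-suppressed skeletons (atom numbering is arbitrary).
BENZENE = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
CHLOROBENZENE = BENZENE + [(0, 6)]  # atom 6 is the chlorine
NAPHTHALENE = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
               (6, 7), (7, 8), (8, 9), (9, 0), (4, 9)]  # (4, 9) is the fused bond

if __name__ == "__main__":
    for name, bonds in [("benzene", BENZENE),
                        ("chlorobenzene", CHLOROBENZENE),
                        ("naphthalene", NAPHTHALENE)]:
        chi1 = first_order_chi(bonds)
        print(f"{name}: chi1 = {chi1:.3f}, S_M = {surface_area_from_chi(chi1):.1f}")
```

Only the structural formula is needed as input, which is the practical advantage the text emphasizes.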
The available results and the previous discussion clearly show that the arguments presented (8) in favor of the partition process being the physical basis of soil sorption of nonionic organic molecules are either unjustified or insufficient. It is unfortunate that such a partial study (8) on the mechanism of soil sorption was published in the highly regarded scientific journal Science, giving it, in our opinion, undeserved respectability. Only a few studies (17,57,60) have made a critical assessment of these arguments, while the majority of published articles uncritically accept partition as a proven, universal physical basis for the soil sorption of nonionic organic molecules (2,3,9-13,16,24-26,47,61). At any rate, the available results cannot be used as unambiguous support of an adsorption mechanism. We speculate that the complexity of the soil and of the soil sorption process is so great that in reality many interactions may control the process, and it is impossible to assign a single mechanism to it. Consequently, the mechanism of soil sorption will be the balance of several possible interactions between the soil and solute molecule, the magnitude of which will be decided by the structural characteristics of the solute molecule and the composition of the soil.
Obviously, much work is needed before specific mechanisms of soil sorption can be clearly identified. One possible direction for increasing the fundamental understanding of the mechanisms involved in soil sorption of nonionic chemicals is molecular modeling and computer simulation of this process. The main obstacle to this approach is the lack of an accurate molecular model of soil organic matter. A significant fraction of soil organic matter is humic; thus a solid starting point will be creating an accurate molecular picture of humic acids. The methodology of such an approach should include highly sophisticated computer graphics routines, coupled with molecular mechanics and/or quantum mechanical methods. Of course, the experience in modeling the interactions with or within macromolecules (polypeptides, DNA, RNA, etc.) will be of enormous help.

Areas of Application for Molecular Connectivity Model
In this final section we briefly review the areas of application and planned improvements for the molecular connectivity model. This topological model outperforms the models based on the n-octanol/water partition coefficients or water solubility in accuracy, speed, and range of applicability. In addition, the direct correspondence between molecular structure and this particular topological index makes it possible to locate structural features responsible for the environmental behavior of organic pollutants in the soil and to learn more about the underlying mechanism(s) of soil sorption processes on a molecular level.
With the addition of a single semiempirical variable, the presented topological model is also very useful for predicting soil sorption coefficients of polar and even ionic compounds. In fact, we were able to predict accurately the soil sorption coefficients for nearly 95% of all organic chemicals whose soil sorption coefficients have been measured. This result gives us confidence that this topological model will also be accurate in predictions for the compounds whose soil sorption coefficients have not yet been measured. Such an accurate, simple, and fast nonempirical model is almost the ideal predictive tool for ranking potentially hazardous chemicals and for creating priority lists for testing them. The model will enable government agencies, industry, and public groups to make fast and accurate assessments of the environmental fate of proven or potentially hazardous chemicals. The practical need for such a reliable, fast nonempirical model is particularly emphasized by the new U.S. Federal Pesticides Law, which will require the review of nearly 600 active pesticide ingredients for their environmental safety within 2 years after the law is passed by Congress. Finally, the nonempirical nature of this model will enable the manufacturers of pesticides and other classes of organic pollutants to predict accurately the soil sorption capacity and, consequently, the potential environmental hazards of their future products, even before such compounds are synthesized.
The second area of application for the molecular connectivity model is exposure assessment for groundwater quality (62,63). It should be possible to couple our model for estimating sorption coefficients with the models for calculating concentration profiles and the transport of chemicals in groundwater aquifers. This will result in better assessment methods for predicting exposure concentrations of xenobiotic chemicals in groundwater resources. Predictions of chemical concentration with a high degree of accuracy are essential for retrospective and prospective epidemiological studies, for modification of agricultural practices to reduce the quantity of these chemicals entering groundwater resources, and for the reclamation of contaminated aquifers.
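As an illustration of such coupling, the sketch below feeds an estimated sorption coefficient into the retardation factor used in many one-dimensional advective transport models. The formulas are the standard linear-equilibrium relations K_d = K_om · f_om and R = 1 + (rho_b / theta) · K_d. The transport models in (62,63) may differ, and all numerical values are hypothetical.

```python
def distribution_coefficient(k_om: float, f_om: float) -> float:
    """Soil-water distribution coefficient K_d (L/kg).

    k_om: organic-matter-normalized sorption coefficient (L/kg), for example
          a value estimated from a structure-based model.
    f_om: mass fraction of organic matter in the soil (dimensionless).
    """
    return k_om * f_om


def retardation_factor(k_d: float, bulk_density: float, porosity: float) -> float:
    """Retardation factor R = 1 + (rho_b / theta) * K_d.

    bulk_density: dry bulk density of the porous medium (kg/L).
    porosity: water-filled porosity (dimensionless).
    Assumes linear, instantaneous, reversible sorption.
    """
    return 1.0 + bulk_density * k_d / porosity


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the calculation.
    k_om = 10.0 ** 2.5          # log K_om = 2.5
    k_d = distribution_coefficient(k_om, f_om=0.01)
    r = retardation_factor(k_d, bulk_density=1.6, porosity=0.4)
    print(f"K_d = {k_d:.2f} L/kg, R = {r:.1f}")
    # A chemical with retardation factor R moves at 1/R of the water velocity.
```

Because R depends on K_d linearly, an error of a given factor in the estimated sorption coefficient carries over almost directly to predicted travel times for strongly sorbed chemicals, which is why the accuracy of the sorption model matters for exposure assessment.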
In the future, the molecular topology approach may also be helpful in developing new global ecological models that will give more insight into the expected distribution patterns and the behavior of chemicals in the environment. The relative simplicity of the molecular topology approach and the possibility of substantially reducing the empirical component in ecological modeling studies make it far more attractive than the correlations between empirical variables, which are of limited usefulness because they lack a theoretical basis. In ecological models, different distribution coefficients are combined to describe a type of ecosystem. It would be extremely convenient to have quantitative models describing various environmental distribution coefficients of commercial chemicals based on the same or similar concepts. This will facilitate the development of global, complex environmental models since they can be easily built up as combinations of modules, with each module representing an individual environmental property.

Future Developments for Molecular Connectivity Model
It is unfortunate that, in order to extend the model to polar and ionic chemicals, we had to introduce a semiempirical variable and lose the beauty of a purely nonempirical model. Thus, our future efforts will be focused on finding the structural variable(s) that can explain and quantify the soil sorption behavior of polar and ionic chemicals. Structural analysis of our results on polar and ionic compounds shows that their soil sorption capacity depends strongly on the presence or absence of particular polar functional group(s), and other factors have only a minor influence on the resulting soil sorption coefficient.
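The class-wise pattern described above can be sketched as a grouping of residuals by functional class: if the offset between calculated and observed log coefficients is nearly constant within a class, its mean can serve as a class-specific correction. The code is a hypothetical illustration of that bookkeeping only. The class names and numbers are invented, and the actual polarity correction factors and their defining equations are not reproduced here.

```python
from collections import defaultdict
from statistics import mean, pstdev


def class_offsets(records):
    """Mean and spread of (calculated - observed) log coefficients per class.

    records: iterable of (class_name, log_k_calculated, log_k_observed).
    Returns {class_name: (mean_offset, population_std_of_offset)}.
    """
    by_class = defaultdict(list)
    for cls, calc, obs in records:
        by_class[cls].append(calc - obs)
    return {cls: (mean(res), pstdev(res)) for cls, res in by_class.items()}


if __name__ == "__main__":
    # Invented values: each class shows a large, nearly constant offset.
    demo = [
        ("class_A", 3.0, 2.0),
        ("class_A", 3.5, 2.6),
        ("class_B", 2.0, 1.7),
        ("class_B", 2.4, 2.0),
    ]
    for cls, (offset, spread) in class_offsets(demo).items():
        print(f"{cls}: mean offset = {offset:.2f}, spread = {spread:.2f}")
```

A small spread relative to the mean offset within each class is the signature of the behavior reported in the text. A purely structural descriptor would need to reproduce these class means without the lookup.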
The next developmental stage for the molecular connectivity model is to test its ability to predict the soil sorption coefficients of new, commercial chemicals that are registered at a very high rate. Their main structural characteristics are unusually large size and simultaneous presence of a large number and variety of functional groups. Such a trend will be emphasized even more in the future. Thus, it is important to learn whether or not our model will work correctly for these chemicals.
Our model, like all other published models, was developed for relatively small and functionally simple chemicals and may be invalid for large and highly complex molecular structures. Such situations will require either introducing new structural descriptors to account for the unique behavior of the new commercial chemicals, or developing a completely new way of reasoning about these chemicals. The latter solution, if needed, will be an exceptionally difficult task in terms of effort and time.
Another area of interest in the field of environmental modeling is to develop molecular connectivity models that will predict sorption properties of commercial chemicals on other soil or sediment components (clay, sand, silt, swelling clay, etc.) and the wide variety of surfaces frequently encountered in the subsurface. When developed, these models will be extremely valuable, since the experimental data for such materials are exceptionally scarce. The calculated sorption coefficients will be incorporated into the transport models (62,63) to calculate concentration profiles for a wide variety of field conditions. These examples will also be used to identify sources of error that are introduced by the numerical methods used in transport models and the limitations of models used to calculate sorption coefficients from structural characteristics. Consequently, steps will be taken to eliminate or minimize their impact and develop better models.