Pattern recognition analysis of a set of mutagenic aliphatic N-nitrosamines.

A set of 21 mutagenic aliphatic N-nitrosamines was subjected to a pattern recognition analysis using ADAPT software. Four descriptors based on molecular connectivity, geometry, and sigma charge on nitrogen were capable of achieving 100% classification using the linear learning machine or iterative least squares algorithms. Three descriptors were capable of a 90.5% and two descriptors of an 85.7% overall correct classification. Three of the four descriptors were each capable of classifying 15 of the 16 active chemicals, while it required three of the four descriptors to classify correctly two of the five inactive chemicals. These results are in concert with previous observations that molecular connectivity, geometry, and sigma charge on nitrogen are powerful descriptors for separating active from inactive mutagenic and carcinogenic N-nitrosamines.


Introduction
Structure-activity analysis is a powerful tool to study the relationship between chemicals and their biological effects. The elucidation of specific chemical features that directly or indirectly result in the expression of a biochemical or biological change can provide new insights into the mechanisms of action of the chemicals as well as the structure and function of the biological system.
We have previously reported on the development of an innovative genetic toxicology system that determines the mutagenic activity of chemicals in mammalian cells using a specific gene mutation at the sodium/potassium-adenosine triphosphatase (ATPase) gene (1). The system utilizes mutable Chinese hamster lung cells (V79) that are cocultivated with primary hamster hepatocytes serving as the metabolic activation component. The use of intact liver cells rather than subcellular liver enzyme fractions allows the simulation of the metabolic competency of liver in vivo, and as a result quantitative relationships observed in vivo can be reproduced in vitro (2,3). Intact liver cells also provide a means to study liver-specific chemical toxicants (4,5). In this regard, aliphatic N-nitrosamines have been recognized as hepatotoxic and hepatocarcinogenic in a variety of rodent species including mouse, rat, hamster, and guinea pig (6). The mechanisms by which some of these N-nitrosamines (e.g., dimethylnitrosamine, diethylnitrosamine) are metabolically activated to genotoxic and carcinogenic forms are well known and involve α-carbon oxidation and the formation of a methyl or ethyl carbonium ion which alkylates DNA. However, the mechanism by which longer chain aliphatic N-nitrosamines are metabolically activated and the species that interact with DNA are still a matter of speculation. It has been postulated that propyl N-nitrosamines are metabolized to both a methylating and a propylating species, as methyl residues in DNA (7) and propyl residues in RNA (8) have been isolated. Dipropylnitrosamine is also metabolized to bis-2-oxypropylnitrosamine (7-9), which can methylate DNA in vivo after further metabolism (10,11).
Previous structure-activity studies on groups of carcinogenic N-nitrosamines by Rose and Jurs (12) using pattern recognition and by Dunn and Wold (13,14) using SIMCA pattern recognition have indicated a variety of functions which participate in the separation of positive and negative chemicals. In this study a set of related propyl and butyl N-nitrosamines previously examined for their ability to induce specific gene mutation in the V79 cell-hamster hepatocyte cocultivation system (11) was incorporated into a structure-activity analysis using pattern recognition techniques. It was of interest to relate specific physicochemical, electronic, and topographic parameters to mutagenic activity in order to study the mechanisms of action of these N-nitrosamines.

Computational Methods
The methods used were implemented through an interactive computer software system known as ADAPT (Automated Data Analysis by Pattern-recognition Techniques), which has been reported elsewhere (15). ADAPT is maintained on a UNIVAC 1100 mainframe and interactively accessed via a Tektronix 4054 terminal using PLOT 10 graphics.
The chemical structures of the compounds comprising the data set were entered into computer disc files by sketching the structures on a graphics display terminal using a joystick. Each structure was modeled using a force-field molecular mechanics model builder.
Descriptors were calculated using various algorithms incorporated into the ADAPT system. Molecular connectivity was calculated according to Kier and Hall (16), sigma charge according to del-Re (17), and geometric descriptors from atomic coordinates.
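As a rough illustration of the first of these descriptors, the path 1 (first-order) molecular connectivity index of Kier and Hall sums 1/sqrt(d_i * d_j) over all bonds of the hydrogen-suppressed molecular graph, where d is the number of non-hydrogen neighbors of each atom. The sketch below assumes the simple (non-valence) form of the index; the function name and the hand-written bond list for dimethylnitrosamine are illustrative assumptions and are not taken from ADAPT.

```python
from collections import Counter
from math import sqrt

def path1_connectivity(bonds):
    """First-order simple connectivity index: sum over bonds of 1/sqrt(d_i * d_j)."""
    degree = Counter()
    for a, b in bonds:
        degree[a] += 1
        degree[b] += 1
    return sum(1.0 / sqrt(degree[a] * degree[b]) for a, b in bonds)

# Hydrogen-suppressed graph of dimethylnitrosamine, (CH3)2N-N=O.
# Atom labels are arbitrary; a double bond counts as a single edge here.
DMN_BONDS = [("C1", "N3"), ("C2", "N3"), ("N3", "N4"), ("N4", "O5")]

chi1_dmn = path1_connectivity(DMN_BONDS)
print(f"path 1 connectivity of dimethylnitrosamine: {chi1_dmn:.3f}")
```

In this simple form, longer and less branched chains give larger values, which is why the index is usually read as a combined measure of size and branching.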

Data Set
The 21 N-nitrosamines (Fig. 1) were evaluated for their ability to mutate Chinese hamster lung cells (V79) (17) in the presence of primary hamster hepatocytes from Syrian golden hamsters according to previously published procedures (1,2,11). The mutation locus selected was ouabain resistance. The data for 14 of the 21 N-nitrosamines have already been reported (11). MTBN, 2BOB, 2BHB, DEN, 3POP, 3MHP, and 3MOP were evaluated under the same conditions reported by Langenbach et al. (1), and their activity (active or inactive) was assigned from the mutation frequency [number of ouabain-resistant mutants per 10^6 survivors] at 0.7 mM (< 10, inactive; > 10, active).

Results and Discussion
The data set comprised 21 N-nitrosamines (14 propylnitrosamines, 5 butylnitrosamines, dimethylnitrosamine, and diethylnitrosamine) evaluated for mutagenic activity in a hepatocyte-V79 cell mutagenesis assay (Table 1, Fig. 1).
Of the 21 N-nitrosamines, 16 had mutagenic activity when tested at a concentration of 0.7 mM. Mutagenic activity was defined as the induction of more than 10 ouabain-resistant mutants per 10^6 surviving V79 cells.
Five N-nitrosamines (3HPP, BAP, BHP, MTBN, and 2BHB) produced fewer than this number of mutants and were considered inactive.
[Table 1 footnotes: (a) Mutagenicity determined in the hamster hepatocyte-V79 cell cocultivation system using ouabain resistance as the genetic marker; values are mean ± SD. (b) MOLC1 is path 1 molecular connectivity for all bonds. (c) GEOM3 is the smallest principal moment (Z). (d) GEOM6 is the ratio of the intermediate principal moment (Y) and the smallest principal moment (Z). SCSA1 is the sigma charge on the terminal nitrogen to which the nitroso moiety is attached (*N-N=O).]
The molecular structures of the N-nitrosamine dataset were entered into the ADAPT pattern recognition system and descriptors were generated. Initially a large number of descriptors were calculated, including molecular fragment descriptors (number and kinds of atoms and bond types), branching index descriptors, molecular size and volume descriptors, and charge distribution descriptors. To ensure that each descriptor contributed unique information about the molecules, multiple correlation analyses of the descriptors were performed. Those descriptors whose values were highly correlated with other descriptors were discarded.
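A minimal sketch of such a screening step follows, assuming a pairwise Pearson correlation and a greedy keep-or-drop rule. The 0.9 cutoff, the descriptor names, and the toy values are all assumptions made for illustration; the text does not state the threshold or the exact procedure used in ADAPT, and none of the numbers below are data from this study.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined for a constant descriptor

def screen_descriptors(table, threshold=0.9):
    """Keep descriptors in order, dropping any whose |r| with a kept one exceeds threshold."""
    kept = []
    for name, values in table.items():
        if all(abs(pearson(values, table[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Toy values for five hypothetical molecules (NOT data from this study).
toy = {
    "desc_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "desc_b": [2.1, 3.9, 6.2, 8.0, 9.9],  # nearly proportional to desc_a
    "desc_c": [3.0, 1.0, 4.0, 1.0, 5.0],
}
kept = screen_descriptors(toy)
print(kept)
```

A greedy rule like this depends on the order in which descriptors are visited, so in practice the more interpretable descriptor of a correlated pair would be listed first.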
Four descriptors (Table 2) were found to be free of collinearity, with the most correlated pair, GEOM3 and GEOM6, exhibiting a correlation coefficient of 0.576 (Table 3). No multicollinearity was detected. MOLC1 is the path 1 molecular connectivity, which is an index of branching and a measure of the complexity of the molecule; the geometric descriptor GEOM3 is the smallest of the three principal moments of a molecule; GEOM6 is the ratio between the intermediate and smallest principal moments; SCSA1 was calculated by the del-Re method and is the sigma charge on the terminal nitrogen to which the nitroso moiety is attached (*N-N=O). The simple statistics of the distributions of the four descriptors partitioned into the active and inactive classes are given in Table 2.
[Table footnote (c): Weighted average of percent correctly identified active and inactive chemicals.]
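To make the two geometric descriptors concrete, the sketch below computes the three principal moments of a set of point masses and derives GEOM3-like and GEOM6-like quantities from them. It assumes mass-weighted moments of inertia about the center of mass; the text does not state how ADAPT weights or scales the moments, and the three-atom geometry is a toy example, not a structure from this study.

```python
from math import acos, cos, pi, sqrt

def symmetric_eigenvalues_3x3(a):
    """Closed-form (trigonometric) eigenvalues of a real symmetric 3x3 matrix, largest first."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:  # matrix is already diagonal
        return sorted((a[0][0], a[1][1], a[2][2]), reverse=True)
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # guard against rounding outside [-1, 1]
    phi = acos(r) / 3.0
    e1 = q + 2.0 * p * cos(phi)
    e3 = q + 2.0 * p * cos(phi + 2.0 * pi / 3.0)
    e2 = 3.0 * q - e1 - e3
    return [e1, e2, e3]

def principal_moments(masses, coords):
    """Principal moments of inertia about the center of mass, largest first."""
    m_tot = sum(masses)
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / m_tot for k in range(3)]
    inertia = [[0.0] * 3 for _ in range(3)]
    for m, c in zip(masses, coords):
        x, y, z = (c[k] - com[k] for k in range(3))
        inertia[0][0] += m * (y * y + z * z)
        inertia[1][1] += m * (x * x + z * z)
        inertia[2][2] += m * (x * x + y * y)
        inertia[0][1] -= m * x * y
        inertia[0][2] -= m * x * z
        inertia[1][2] -= m * y * z
    inertia[1][0], inertia[2][0], inertia[2][1] = inertia[0][1], inertia[0][2], inertia[1][2]
    return symmetric_eigenvalues_3x3(inertia)

# Toy geometry: three unit masses in a plane (NOT a structure from this study).
masses = [1.0, 1.0, 1.0]
coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
largest, intermediate, smallest = principal_moments(masses, coords)
geom3_like = smallest                 # smallest principal moment
geom6_like = intermediate / smallest  # ratio of intermediate to smallest moment
print(f"GEOM3-like = {geom3_like:.4f}, GEOM6-like = {geom6_like:.4f}")
```

The ratio is undefined for a perfectly linear arrangement, where the smallest moment is zero, so a real implementation would need to handle that case explicitly.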
Using the linear learning machine algorithm, the entire dataset and four descriptors were first used to train weight vectors for construction of a separable pattern space. Complete convergence was attained rapidly, usually with fewer than 50 feedbacks using a deadzone value of 100. Using 100 randomly generated training sets, from each of which three members of the active class and one member of the inactive class were removed, the linear learning machine algorithm achieved 97% overall correct prediction for the 17-member datasets. When the prediction datasets consisted of the four members left out of the training dataset, the overall prediction was 80%.
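The training loop described above can be sketched as an error-correction procedure on a linear discriminant. The version below is a simplified stand-in that uses a fixed-increment perceptron update with a deadzone (margin); it is not ADAPT's exact feedback rule, which the text does not specify. The function names, the toy two-descriptor patterns, and the deadzone of 0.5 are assumptions, and a suitable deadzone value depends on how the descriptors are scaled.

```python
def train_llm(patterns, labels, deadzone=0.0, max_feedbacks=1000):
    """Error-correction training of a linear discriminant.

    patterns: descriptor vectors; labels: +1 (active) or -1 (inactive).
    A feedback (weight correction) is applied whenever label * (w . x) <= deadzone.
    Returns (weights, feedbacks_used, converged).
    """
    augmented = [list(x) + [1.0] for x in patterns]  # constant term acts as the threshold
    w = [0.0] * len(augmented[0])
    feedbacks = 0
    while feedbacks < max_feedbacks:
        corrected = False
        for x, y in zip(augmented, labels):
            s = sum(wi * xi for wi, xi in zip(w, x))
            if y * s <= deadzone:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                feedbacks += 1
                corrected = True
        if not corrected:  # a full pass with no corrections: training set is separated
            return w, feedbacks, True
    return w, feedbacks, False

def predict(w, x):
    """Classify one pattern by the sign of the discriminant."""
    s = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if s > 0 else -1

# Toy, linearly separable patterns (NOT descriptor values from this study).
patterns = [(2.0, 1.0), (3.0, 2.5), (2.5, 0.5), (0.5, 2.0), (1.0, 3.0)]
labels = [1, 1, 1, -1, -1]
w, feedbacks, converged = train_llm(patterns, labels, deadzone=0.5)
print(converged, feedbacks, [predict(w, x) for x in patterns])
```

The leave-out experiments correspond to calling the training function on a subset of the patterns and applying the prediction function to the members that were held back.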
The database of 21 N-nitrosamines did not contain an equal number of active and inactive chemicals, and the number of descriptors used to achieve separation was only one less than the number of inactive chemicals. It was therefore necessary to determine whether the four descriptors chosen specifically classified only the inactive compounds or whether they recognized both the active and inactive compounds. This possibility was assessed by excluding each of the descriptors separately to determine the effectiveness of the linear learning machine. Exclusion of any of the four descriptors resulted in nonconvergence of the training routine of the linear learning machine; the weight vectors generated could not separate the active and inactive compounds during training without misclassification. However, no single exclusion resulted in misclassification of inactive compounds alone (Table 4). Omission of a descriptor always resulted in an error in classifying at least one active and one inactive N-nitrosamine. This suggests that although the number of descriptors used was nearly equal to the number of inactive N-nitrosamines, each descriptor did not select for a particular member of the inactive class. Indeed, to correctly classify the inactive N-nitrosamines MTBN and BHP, a minimum of three and two descriptors were necessary, respectively. The molecular connectivity and geometric descriptors were each capable of correctly classifying 15 of the 16 active chemicals, indicating that each contributes strongly to the separation. HPOP and BOP were the most difficult to identify, as three descriptors were required to classify them correctly (Table 4).
Utilizing four descriptors and the complete 21-chemical dataset, the linear learning machine achieved an overall classification of 100% (Table 5). The iterative least-squares algorithm was also able to attain 100% correct overall classification in 71 iterations. A linear discriminant function analysis program was able to generate a discriminant function that could separate the active class correctly but misclassified one chemical in the inactive class. Bayes' quadratic classifier achieved a 93.8% correct classification of the active class and incorrectly classified an inactive class member. Bayes' linear classifier was least effective at separating either the inactive or active class and classified 13/16 (81.2%) of the active class correctly but missed 1/5 of the inactive class.
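The overall percentages can be related to the per-class counts by a short calculation, assuming that "overall" means the class-size-weighted average of the per-class percent correct, as the table footnote on weighted averages suggests. The per-class counts below are read from the paragraph above; the overall figures for the discriminant function and Bayes classifiers are derived here by arithmetic and are not quoted from Table 5, so they may differ from the published table in rounding.

```python
N_ACTIVE, N_INACTIVE = 16, 5  # class sizes stated in the text

def overall_percent(active_correct, inactive_correct):
    """Class-size-weighted average of the per-class percent correct."""
    pct_active = 100.0 * active_correct / N_ACTIVE
    pct_inactive = 100.0 * inactive_correct / N_INACTIVE
    total = N_ACTIVE + N_INACTIVE
    return (N_ACTIVE * pct_active + N_INACTIVE * pct_inactive) / total

# (active correct, inactive correct) as read from the paragraph above
results = {
    "linear learning machine": (16, 5),
    "iterative least squares": (16, 5),
    "linear discriminant function": (16, 4),
    "Bayes quadratic": (15, 4),
    "Bayes linear": (13, 4),
}
for name, (n_act, n_inact) in results.items():
    print(f"{name}: {overall_percent(n_act, n_inact):.1f}% overall")
```

With these weights the average reduces to the total number correct divided by 21, so the 90.5% and 85.7% figures quoted for the three- and two-descriptor sets correspond to 19 and 18 of the 21 chemicals, respectively.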
The power of the descriptors chosen is evident when only two are used in the linear learning machine classification of the 21 N-nitrosamine dataset. The most accurate two-descriptor set was MOLC1/GEOM3, which gave an overall classification of 85.7%. MOLC1/GEOM6 and GEOM3/GEOM6 gave identical classifications of 80.9%. All other combinations produced classifications of < 71%.
Earlier structure-activity analyses using 14 of these 21 N-nitrosamines indicated that molecular connectivity path 1 calculations correlated well with the absolute mutation frequencies of the 14 N-nitrosamines (11). Rose and Jurs (12) also found molecular connectivity to play an important role in the classification of a 150 N-nitrosamine carcinogen dataset as its deletion from the descriptor list reduced the overall classification from 97.33% to 74.67%. Rose and Jurs (12) also found GEOM3 (smallest principal moment) and sigma charge on the N-nitrosamine moiety to be important in their pattern recognition analyses. Therefore, the molecular connectivity, sigma charge, and geometric descriptors seem to be a powerful combination for separating active from inactive mutagenic and carcinogenic N-nitrosamines.