The Genetic Activity Profile database.

A graphic approach termed a Genetic Activity Profile (GAP) has been developed to display a matrix of data on the genetic and related effects of selected chemical agents. The profiles provide a visual overview of the quantitative (doses) and qualitative (test results) data for each chemical. Either the lowest effective dose (LED) or highest ineffective dose (HID) is recorded for each agent and bioassay. Up to 200 different test systems are represented across the GAP. Bioassay systems are organized according to the phylogeny of the test organisms and the end points of genetic activity. The methodology for the production and evaluation of GAPs has been developed in collaboration with the International Agency for Research on Cancer. Data on individual chemicals have been compiled by IARC and by the U.S. Environmental Protection Agency. Data are available on 299 compounds selected from volumes 1-50 of the IARC Monographs and on 115 compounds identified as Superfund Priority Substances. Software to display the GAPs on an IBM-compatible personal computer is available from the authors. Structurally similar compounds frequently display qualitatively and quantitatively similar GAPs. By examining the patterns of GAPs of pairs and groups of chemicals, it is possible to make more informed decisions regarding the selection of test batteries to be used in evaluating chemical analogs. GAPs have provided useful data for the development of weight-of-evidence hazard ranking schemes. Also, some knowledge of the potential genetic activity of complex environmental mixtures may be gained from assessing the GAPs of component chemicals. The fundamental techniques and computer programs devised for the GAP database may be used to develop similar databases in other disciplines.


Introduction
Data derived from short-term tests are usually interpreted according to the phylogenetic category of the test and the end point detected. Commonly studied end points include DNA damage, gene mutation, sister chromatid exchange, micronuclei, chromosomal aberrations, aneuploidy, and cell transformation. Few short-term bioassays monitor more than one or two of these end points. Therefore, data from a variety of short-term tests are required to properly define the response profile of a given chemical agent.
Garrett et al. (1) developed a technique for presenting the quantitative genetic toxicology data for a chemical compound as a bar graph (genetic activity profile) in which test systems (identified by three-letter code words) are displayed along the X-axis, and values corresponding to the doses employed in the tests are shown on the Y-axis. The total data available from up to 200 different short-term bioassays for a compound are thus presented in a standardized format that allows rapid visualization of the genetic (or related) effects induced. The technique facilitates qualitative as well as quantitative assessments of genetic toxicity. Current procedures for preparing and evaluating Genetic Activity Profiles (GAPs) are described by Waters et al. (2) in the context of their use by the International Agency for Research on Cancer.

Methodology
The data set for a given chemical, consisting of a discrete set of tests and the doses required to induce responses in those tests, is presented as a bar graph, as illustrated in Figure 1. The bars (profile lines) originating on the X-axis represent the tests plotted in either a phylogenetic or end point sequence. A three-letter code is used to identify the test system represented by each bar. Values on the Y-axis are the logarithmically transformed lowest effective doses (LED) and highest ineffective doses (HID) tested. The term "dose," as used in this report, does not take into consideration length of treatment or exposure and may therefore be considered synonymous with concentration. The doses or concentrations used for all in vitro tests were converted to micrograms per milliliter and those for in vivo tests to milligrams per kilogram body weight per day. Because dose units are plotted on a log scale, differences in molecular weights of compounds do not greatly influence comparisons of their GAPs.
Profile-line height (the magnitude of each bar) is a function of the LED or HID, which is associated with the characteristics of each individual test system, such as population size, cell-cycle kinetics, and metabolic competence. Thus, the detection limit of each test system is different, and across a given GAP, responses will vary substantially. No attempt is made to adjust or relate responses in one test system to those of another. Line heights are derived as follows: For negative test results, the highest dose tested without excessive toxicity is defined as the HID. If there is evidence of extreme toxicity, the next lower dose is used. A single dose tested yielding a negative result is considered equivalent to the HID. For positive results, the LED is recorded. If the original data have been analyzed statistically by the author, the dose recorded is that at which the response was significant (p < 0.05). If the data were not analyzed statistically, the dose required to produce an effect is estimated as follows: When a dose-related positive response is observed with two or more doses, the lower of the doses is taken as the LED; a single dose resulting in a positive response is considered equivalent to the LED.
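The selection rules above can be sketched in a few lines of Python. This is an illustrative reading of the rules, not the GAP software itself (which the text says was written in Turbo Pascal). The `DosePoint` record and both function names are hypothetical, and taking the lowest significant dose when several doses reach p < 0.05 is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DosePoint:
    """One tested dose (hypothetical record, for illustration only)."""
    dose: float                      # µg/mL (in vitro) or mg/kg bw/day (in vivo)
    positive: bool = False           # response judged positive at this dose
    p_value: Optional[float] = None  # author's statistic, if one was reported
    too_toxic: bool = False          # excessive toxicity observed at this dose


def highest_ineffective_dose(points: List[DosePoint]) -> Optional[float]:
    """HID for a negative test: highest dose without excessive toxicity."""
    usable = [p.dose for p in points if not p.too_toxic]
    return max(usable) if usable else None


def lowest_effective_dose(points: List[DosePoint]) -> Optional[float]:
    """LED for a positive test.

    If the author analyzed the data statistically, use the lowest dose with
    p < 0.05 (assumption: lowest of the significant doses); otherwise use the
    lowest dose reported as positive.
    """
    analyzed = [p for p in points if p.p_value is not None]
    if analyzed:
        significant = [p.dose for p in analyzed if p.p_value < 0.05]
        return min(significant) if significant else None
    positives = [p.dose for p in points if p.positive]
    return min(positives) if positives else None
```

A single tested dose falls out of the same functions: a list with one negative, non-toxic point returns that dose as the HID, and a list with one positive point returns it as the LED.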
To accommodate both positive and negative responses on a continuous scale, doses are transformed logarithmically so that effective (LED) and ineffective (HID) doses are represented by positive and negative numbers, respectively. The logarithmic dose unit ($\mathrm{LDU}_{ij}$) for a given test system $i$ and chemical $j$ is represented by the expressions:
$\mathrm{LDU}_{ij} = -\log_{10}(\mathrm{dose})$, for HID values; LDU < 0
$\mathrm{LDU}_{ij} = 5 - \log_{10}(\mathrm{dose})$, for LED values; LDU > 0
These simple relationships define a dose range of 0 to -5 logarithmic units for ineffective doses (1-100,000 µg/mL or mg/kg body weight) and 0 to +9 logarithmic units for effective doses (100,000-0.0001 µg/mL or mg/kg body weight). A scale illustrating the LDU values is shown in Figure 1. Negative responses at doses less than 1 µg/mL (mg/kg body weight) are set equal to 1. Effectively, an LED value > 100,000 or an HID value < 1 produces an LDU = 0; no quantitative information is gained from such extreme values. Levels of log dose units between 1 and -1 define a "zone of uncertainty" in which positive results are reported at very high doses (10,000-100,000 µg/mL or mg/kg body weight), and negative results are reported at relatively low dose levels (1-10 µg/mL or mg/kg body weight).
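As a minimal sketch of this transformation, the following Python function maps a dose to its logarithmic dose unit, including the clamping to zero described above for LED values above 100,000 and HID values below 1. The function name and signature are assumptions made for illustration; doses are taken to be already converted to µg/mL or mg/kg body weight.

```python
import math


def log_dose_unit(dose: float, effective: bool) -> float:
    """Return the LDU for a dose in µg/mL or mg/kg body weight.

    effective=True  -> dose is an LED: LDU = 5 - log10(dose), never below 0
    effective=False -> dose is an HID: LDU = -log10(dose), never above 0
    """
    if dose <= 0:
        raise ValueError("dose must be positive")
    if effective:
        # An LED above 100,000 carries no quantitative information: LDU = 0.
        return max(0.0, 5.0 - math.log10(dose))
    # An HID below 1 is set equal to 1, which also gives LDU = 0.
    return min(0.0, -math.log10(dose))
```

For example, an LED of 10 µg/mL gives an LDU of 4, and an HID of 1000 µg/mL gives an LDU of -3.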
All dose values are plotted for each assay using either a bar (-) for results obtained in the absence of an exogenous metabolic system or a caret (^) for those obtained in the presence of an exogenous metabolic system. When all results for a given assay are either positive or negative, the geometric mean of the responses is plotted as a solid line; when conflicting data are reported for the same assay (i.e., both positive and negative results), the majority data are shown with a solid line and the minority data with a dashed line (drawn to the extreme response). In the few cases where the numbers of positive and negative results are equal, the solid line is drawn in the positive direction, and the negative response is indicated with a dashed line, drawn from the origin to the extreme negative LDU.
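The plotting rule for a single assay can be sketched as follows. Two points are assumptions rather than statements from the text: the "geometric mean of the responses" is read as the geometric mean of the doses, which equals the arithmetic mean of the LDU values, and the solid line for majority data is placed at that same mean. The function name is hypothetical.

```python
from statistics import mean
from typing import List, Optional, Tuple


def summarize_assay(results: List[Tuple[float, bool]]) -> Tuple[float, Optional[float]]:
    """Return (solid_line, dashed_line) heights in LDU for one assay.

    results: (ldu, is_positive) pairs, one per reported study.
    dashed_line is None when all results agree.
    """
    if not results:
        raise ValueError("no results for this assay")
    pos = [ldu for ldu, is_positive in results if is_positive]
    neg = [ldu for ldu, is_positive in results if not is_positive]
    if not neg:
        return mean(pos), None   # all positive: solid line only
    if not pos:
        return mean(neg), None   # all negative: solid line only
    if len(pos) >= len(neg):
        # Positive majority, or a tie (ties are drawn in the positive direction);
        # the dashed line runs to the extreme negative LDU.
        return mean(pos), min(neg)
    # Negative majority; the dashed line runs to the extreme positive LDU.
    return mean(neg), max(pos)
```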
The three-letter code words representing the commonly used tests were originally defined by the Gene-Tox Program of the U.S. Environmental Protection Agency (EPA) (3,4). These codes have been systematically redefined and expanded in a manner that should facilitate inclusion of additional tests in the future (2).

Evaluation of Genetic and Related Effects
The International Agency for Research on Cancer (IARC) has employed the GAP methodology (2) in evaluating genetic and related effects of suspected human carcinogens in IARC Monographs, supplement 6 (5) and in volumes 36, 39, 41, and 44 (13-16) and 46-50 (6-10). Table 1 illustrates the procedure currently employed by the IARC as it relates to the GAP database.
Nesnow (personal communication) has recently completed a PC D-Base version of the data on carcinogenicity contained in IARC Monographs, supplement 7 (17) and IARC Monographs, volumes 43 (18), 44 (16), 45 (19), and 46-49 (6-9). Most of these agents are included in the GAP database [derived from IARC Monographs, supplement 6 (5) and volumes 46-50 (6-10)]. Therefore, these two databases can be used to examine retrospectively the usefulness of short-term tests for the prediction of carcinogenicity and the relationship between specific genetic end points or assays and carcinogenicity.

Personal Computer Version of the GAP Database, Version 3.0
Copies of software for IBM-compatible personal computers to display and search GAPs are available from the authors. Computer programs require the following minimum configuration: PC using Intel 8086 chip (PC XT) with 640 KB memory, a hard disk drive, an enhanced graphics card (EGA or VGA), a high-resolution color monitor, and DOS version 3.2 or higher. Generally, an Intel 80286 (PC AT) computer is preferred because data processing and graphics display are faster than with the 8086 computer. Optional devices used for data and graphic output include a line printer and plotter. Alternatively, a laser printer or equivalent can be used to print the HPGL plotter files using additional software.
The GAP software is distributed on three double-sided, double-density, 5.25-inch floppy disks: the program disk, the data disk, and the GAP bibliography. Executable programs are archived on the program disk and are compiled from programs written in Turbo Pascal. During installation of the programs, the necessary directory and subdirectories are created. The installed programs and data use approximately 1.2 MB of disk space.
The bibliography of the GAP data is also in an archived file. The file requires 0.7 MB of disk space; however, the bibliography is not necessary to operate the GAP programs, and it may be deleted and reinstalled as needed.
The data disk consists of two data sets, IARC and EPA. The IARC data set contains data on 299 agents published in supplement 6 (5) and in volumes 46-50 of the IARC Monographs (6-10). The EPA data set contains data on 115 agents assembled for the Genetic Toxicology Division of the U.S. Environmental Protection Agency (11). A list of the individual projects included in each data set may be viewed using the GAP computer program. A data subdirectory is provided for users to enter their own data.

Main Program Menu
The main program menu of GAP version 3.0 offers the following selections: agents, profiles, data listings, modify data, short citations, and additional information. "Agents" provides options to list the available projects, CAS numbers, and agent names. Another menu allows ordering of the list by any of the three options. "Profiles" provides graphic display of the short-term test data on selected agents. A menu is used to select the sequence of test codes, either in phylogenetic order of organisms (i.e., prokaryotes, lower eukaryotes, etc.) or in test end point order (i.e., DNA damage, gene mutation, etc.). Individual test codes may be examined to determine the source citations by using the GAP program zoom-in features.
"Listings" produces a listing of the data in either phylogenetic or end point order and may be directed to the PC screen, to a printer, or to a data file. "Modify data" is used to add, change, or delete agents or test results (test codes, results, doses, and reference numbers).
"Short Citations" permits searching the literature citation information for approximately 6000 short citations contained in the GAP database. The citation information includes the citation number (LITNR), the Environmental Mutagen Information Center (EMIC) accession number, and a short citation (consisting of the last names of up to three authors, the first page number and the year of the publication). The citation information may be searched by author or by EMIC number to determine if a citation is present. Short citations also may be added to the file and are automatically assigned citation numbers.
"Additional information" includes three-letter test code definitions, the scale of log-dose units used in the profiles, information on the dose conversions, and tables listing projects for both the EPA and IARC data sets.

Some Applications of the GAP Database
Comparative Evaluation of Genetic Activity Profiles Using Computer-Based Profile-Matching Techniques
Where an adequate number of the same tests have been used to evaluate two or more chemicals, it is possible to use the mainframe computer to select matching pairs of GAPs. This computer-based pairwise matching process may be extended to all chemicals in the database. The pilot applications of this procedure to EPA databases on known or suspected human carcinogens (1) and on pesticide chemicals (12) have demonstrated that structurally similar compounds frequently display qualitatively and quantitatively similar profiles of genetic activity. This implies that the GAP database should be of considerable utility in structure-activity relationship investigations and in test battery selection (20).
By examining the patterns of GAPs of pairs and groups of chemicals, it is possible to make more informed decisions regarding the selection of test batteries to be used in the subsequent evaluation of structurally similar chemicals. The approach draws on all information within the database and may be linked to computer systems that model the molecular properties of the chemicals under evaluation (21). This comparative information can enhance our understanding of the relationships between genetic and related activity in short-term tests and molecular properties of structurally related chemicals and thus contribute to our knowledge of the mechanisms of complex processes such as carcinogenesis.
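The text does not specify the matching criterion used on the mainframe, so the following Python sketch is only one plausible way to compare two profiles over their shared tests. The function name, the `min_shared` threshold, the two similarity measures (fraction of tests agreeing in sign, and mean absolute LDU difference), and the placeholder test codes in the usage note are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple


def compare_profiles(
    a: Dict[str, float],
    b: Dict[str, float],
    min_shared: int = 3,
) -> Optional[Tuple[float, float]]:
    """Compare two GAPs given as {test_code: LDU} mappings.

    Returns (qualitative_agreement, mean_abs_ldu_difference) computed over the
    tests present in both profiles, or None if fewer than min_shared tests
    are shared (hypothetical threshold for "an adequate number").
    """
    shared = sorted(set(a) & set(b))
    if len(shared) < min_shared:
        return None
    # Qualitative match: both results positive (LDU > 0) or both not positive.
    concordant = sum(1 for t in shared if (a[t] > 0) == (b[t] > 0))
    # Quantitative match: how far apart the line heights are, on average.
    mean_abs_diff = sum(abs(a[t] - b[t]) for t in shared) / len(shared)
    return concordant / len(shared), mean_abs_diff
```

Applied to every pair of chemicals in a data set, a function of this kind yields a ranked list of candidate matches; for instance, profiles `{"AAA": 4.0, "BBB": -2.0, "CCC": 3.0}` and `{"AAA": 3.0, "BBB": -1.0, "CCC": -2.0}` (placeholder codes) agree qualitatively in two of three shared tests.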

Testing and Evaluating Complex Mixtures
A recent application of GAPs is in testing and evaluating complex mixtures (22). Some knowledge of the potential genetic activity of a complex environmental mixture may be gained from assessing the genetic activity of its component chemicals. This requires information on the chemical components and composition of the mixture. For example, the Atmospheric Chemical Compound database developed by Graedel et al. (23) contains information on chemical structures, properties, detection methods, and sources of chemicals found in ambient air. The GAP database provides a computer-generated graphic representation of genetic bioassay data as a function of dose. Using the two databases, information on the quantity of an individual chemical present within a mixture may be related to the quantity (LED) of the chemical required to demonstrate a positive response in one or more genetic bioassays. Quantitative information on the carcinogenic potency of each individual compound (TD50 value) may also be related to the quantity present in the mixture or mixture fraction. In turn, the quantity of the chemical in the complex mixture to which humans are exposed may be estimated and used to calculate the percent human exposure dose/rodent potency dose (HERP) for the chemical (24-26). Using an additivity assumption, for example, an estimate of potential carcinogenic hazard for the mixture may be calculated based on the HERP indices for the known chemical components. This conceptual approach is limited by the relatively small number of chemicals identified in complex mixtures for which genetic toxicology and animal cancer data exist.
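The additivity calculation described above can be illustrated with a short Python sketch. It assumes that HERP is computed as the human exposure dose expressed as a percentage of the rodent TD50, with both quantities in mg/kg body weight per day, and that the mixture estimate is the simple sum of the component indices. The function names and the numbers in the usage note are hypothetical.

```python
from typing import Iterable, Tuple


def herp_percent(human_exposure: float, td50: float) -> float:
    """HERP index: human exposure as a percentage of the rodent TD50.

    Both arguments are assumed to be in mg/kg body weight per day.
    """
    if td50 <= 0:
        raise ValueError("TD50 must be positive")
    return 100.0 * human_exposure / td50


def mixture_herp(components: Iterable[Tuple[float, float]]) -> float:
    """Sum of component HERP indices under a simple additivity assumption.

    components: (human_exposure, td50) pairs for the known components only;
    unidentified or untested components contribute nothing to the estimate.
    """
    return sum(herp_percent(exposure, td50) for exposure, td50 in components)
```

For a hypothetical two-component mixture with exposures of 0.01 and 0.5 mg/kg/day and TD50 values of 10 and 100 mg/kg/day, the component indices are 0.1% and 0.5%. Because only components with animal cancer data can enter the sum, the result covers the characterized fraction of the mixture, which is the limitation noted in the text.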

Weight-of-Evidence Ranking Schemes
Committee 1 of the International Commission for Protection Against Environmental Mutagens and Carcinogens (ICPEMC) has for several years been involved in the development of a computer-based methodology to assess the evidence from short-term genetic tests that a chemical is a mutagen (27). The evaluative approach selected by ICPEMC Committee 1 is based on a "weighted test" scoring system that provides a relative ranking of genotoxic potential. Input data for this ranking methodology have been obtained from the GAP database described above (28). The results of the application of the Committee 1 ranking scheme are to be compared by ICPEMC to results obtained by applying the carcinogenicity ranking scheme of Nesnow (29,30).

Development of Other Databases
The fundamental techniques and computer programs devised for the GAP database may be used to develop similar databases in genetic toxicology and in other disciplines. Dearfield et al. (31) have described the application of the GAP methodology to the database being constructed by the EPA Office of Pesticide Programs. Kavlock et al. (32) have successfully used the approach and modified computer programs to assemble graphic activity profiles and corresponding data listings for several developmental toxicants.

Future Directions
A useful application of the GAP database in the future will involve computer-based profile-matching techniques with weight-of-evidence ranking schemes to create a subset of chemicals that act similarly, i.e., have similar GAPs. Correlative structure-activity approaches can then be used more effectively to identify the substructural elements of chemicals that are responsible for particular biological responses so as to suggest biologically plausible mechanisms of action (33).