Use of selected toxicology information resources in assessing relationships between chemical structure and biological activity.

This paper addresses the use of selected toxicology information resources in assessing relationships between chemical structure and specific biological end points. To help researchers access the primary literature of genetic toxicology, teratogenesis, and carcinogenesis, three specialized information centers are discussed: the Environmental Mutagen Information Center, the Environmental Teratology Information Center, and the Environmental Carcinogenesis Information Center. Also included are descriptions of information resources that contain evaluated (peer-reviewed) biological research results. The U.S. Environmental Protection Agency Genetic Toxicology Program, the International Agency for Research on Cancer Monographs, and the Toxicology Data Bank are the best sources currently available for peer-reviewed results on compounds tested for genotoxicity, carcinogenicity, and other toxicological end points. The value of published information lies in its use. It has become evident, however, that when stringent quality evaluation criteria are applied, most published information cannot be accepted at face value for interpretation and analysis. This deficit can be corrected by stricter editorial policies and greater awareness on the part of authors. Increased interest in alternatives to in vivo animal testing will be reflected in greater use of short-term bioassays and of structure-activity relationship studies. With respect to the latter, it must be remembered that mechanically derived (computer-generated) data cannot substitute, at least at this stage, for data obtained from actual animal testing. The future of structure-activity relationship studies lies only in their use as a predictive tool.

Attempts to correlate chemical structure with the induction of specific toxicological end points must include a strong alliance with specialized information centers that collect, store, and distribute information relevant to the various disciplines of toxicology. This alliance is necessary if researchers interested in exploring structure-activity relationships (SAR) are to keep current with the steadily expanding volume of literature being published. The success of an investigator's endeavors will be proportional to his or her ability to access and to use the literature pertinent to his or her scientific area of interest.
An array of toxicology and chemical information resources is available to SAR practitioners (1,2). Akland and Waters (2) have recently catalogued the major chemical and toxicological data bases and information files that can provide information for assessing SAR. The volume of information represented in the resources described in this publication has increased significantly during the last 16 years. No doubt, if a researcher within any area of toxicology succumbed to the Rip van Winkle syndrome, nodded off to sleep in 1968, and awakened in 1984, he or she would be astounded at the rate at which papers were being published. One would also be overwhelmed at the amount and complexity of the information accumulated. Two areas of toxicology (genetic toxicology and teratology) have been selected to demonstrate this rapid literature growth (Table 1). These research areas have come into prominence within the last several years as efforts have increased to determine the effects of environmental agents, especially chemicals, on man's health and well-being.
The intent of this paper is to focus on the information resources and data bases in general toxicology, genetic toxicology, teratology, and carcinogenicity that persons conducting SAR studies can use.
Table 1, note c: Numbers for these years are incomplete.
difficult the scientist's task of acquiring, using, and communicating information. Before 1960, access to information in a particular discipline could be obtained through communication with colleagues, attendance at select scientific meetings, and regular reading of a few key journals. Scientific advancement, however, has rendered these methods inadequate. Today, scientists in all disciplines of toxicology face the problem of accessing and using a rapidly growing volume of literature. Researchers attempting to assess the relationships between chemical structure and toxicological end points are not exempt from this problem. In fact, they face an additional handicap: the lack of data bases containing evaluated (peer-reviewed) biological research results. To overcome this deficiency, SAR practitioners will have to undertake the painstaking work of collecting, reviewing, and evaluating papers from the primary literature themselves (2). This task can be made much easier, however, through the use of the specialized information centers described in this paper. Furthermore, by taking advantage of the evaluated biological effects data bases that are already available, researchers can begin to apply SAR in some areas.

Specialized Toxicology Information Centers Provide Access to the Primary Literature
Three specialized information centers at the Oak Ridge National Laboratory (ORNL) can be used to access the primary literature relevant to genetic toxicology, teratogenicity, in vivo carcinogenicity, and in vitro cell transformation. Detailed descriptions of these centers (the Environmental Mutagen Information Center, the Environmental Teratology Information Center, and the Environmental Carcinogenesis Information Center) have been published previously (3,4); therefore, only brief summaries are presented here.

Environmental Mutagen Information Center
In an effort to catalog systematically and to make available all published data in the area of genetic toxicology, the Environmental Mutagen Information Center (EMIC) was organized in 1969 at ORNL.
EMIC's work focuses on collecting, organizing, and indexing papers relevant to the testing and/or evaluation of chemical, biological, and physical agents for one or more of the following biological end points: cytological effects, effects on chromosomes, effects on nucleic acids, effects on fertility and/or sterility (work coordinated with the Environmental Teratology Information Center), gene mutation induction, mitotic or meiotic effects, ancillary effects (e.g., sperm-head abnormalities, comutagenesis, multigenerational studies, and activation studies), and plant pigment mutation induction.
The EMIC data base consists of representative coverage of the literature prior to 1969 and comprehensive coverage of the literature for the period 1969 to date.
Each record in the EMIC data base contains the usual bibliographic data common to all information services. In addition, EMIC indexes a wide range of technical data, not available in other sources, that allows investigators to search according to their specific interests. For example, each citation is indexed by the following points: 1.
As use of the EMIC data base has grown, and as the EMIC staff has interacted with its users through personal communication, search requests, special projects, and other media, new needs and uses of EMIC services have become apparent. EMIC has responded by adding new indexing fields, such as explicit descriptions of the assay method and/or genetic end point measured, making the data base more versatile in answering its users' questions.
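The value of fielded indexing, as opposed to free-text abstracts, is that a query can be restricted to specific elements such as agent, assay, or end point. The following Python sketch illustrates the idea under stated assumptions: the `Citation` fields and the `search` helper are hypothetical and do not reproduce EMIC's actual record layout or retrieval software, and the records and registry-style numbers are placeholders rather than real citations.

```python
from dataclasses import dataclass

# Hypothetical fielded record; field names are illustrative only and are
# not EMIC's actual indexing elements.
@dataclass(frozen=True)
class Citation:
    citation_id: str
    agent: str
    cas_rn: str        # placeholder registry-style identifier
    assay: str
    end_point: str
    organism: str

def search(records, **criteria):
    """Return records whose indexed fields exactly match every criterion."""
    return [
        r for r in records
        if all(getattr(r, field) == value for field, value in criteria.items())
    ]

# Placeholder records (not real citations or test results).
records = [
    Citation("C-001", "agent A", "000-00-1", "Ames/Salmonella",
             "gene mutation", "Salmonella typhimurium"),
    Citation("C-002", "agent A", "000-00-1", "CHO/HGPRT",
             "gene mutation", "Chinese hamster"),
    Citation("C-003", "agent B", "000-00-2", "sister chromatid exchange",
             "effects on chromosomes", "human"),
]

hits = search(records, cas_rn="000-00-1", end_point="gene mutation")
print([r.citation_id for r in hits])  # ['C-001', 'C-002']
```

Because each criterion is matched against a separate indexed field, a query for gene mutation data on one agent does not also return papers that merely mention mutation in passing, as a free-text search of abstracts might.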
The indexing scheme used by EMIC provides more complete and more readily accessible information than is possible with the conventional text abstracts used by most secondary information resources. The EMIC file now contains over 50,000 citations selected from over 3200 sources, with information on more than 15,000 unique chemicals. This file is accessible online through the National Library of Medicine's TOXLINE system and the U.S. Department of Energy's RECON system. Investigators without online access to the EMIC file may send their search requests directly to the center at the following address: Environmental Mutagen Information Center, Oak Ridge National Laboratory, P.O. Box Y, Bldg. 9224, Oak Ridge, TN (U.S.A.) 37831.

Environmental Teratology Information Center
The Environmental Teratology Information Center (ETIC) was organized in 1975 at ORNL by the National Institute of Environmental Health Sciences. The information on file in this center is available to individual researchers and physicians, as well as to institutions and government research and regulatory agencies. Easy access to this literature facilitates health assessment and research planning and prevents duplication of effort in the field of environmental teratology.
ETIC's work focuses on collecting, organizing, and indexing papers that contain information relevant to the testing and/or evaluation of chemical, biological, and physical agents for teratogenic activity and reproductive effects in warm- and cold-blooded animals. Publications dealing with other factors, such as dietary deficiencies and maternal stress, are also included. In addition to this primary emphasis on information collection, ETIC selects and processes papers that deal with in vitro short-term teratology testing and with the evaluation of reproductive and/or fertility effects of agents. The programs and information processing techniques ETIC uses to control and manipulate collected information were adapted from techniques developed and successfully used by EMIC. They have met the needs of the center's user community and support the dissemination of information relevant to the fields of teratology and reproductive toxicology.
There are currently 33,000 citations in the ETIC file, covering the literature published from 1950 to the present. These citations have been selected from over 3100 sources, with information on over 6000 unique chemicals. Each citation is indexed in basically the same manner used by EMIC; the information elements listed in the previous section for EMIC are also used by ETIC. In addition, ETIC indexes the following subject categories: sex of treated animal; experimental conditions; developmental stage of treated animal (amphibians, fish, reptiles, and sea urchins only); author abstract (selected papers only); biological effect keywords; cell type(s), tissue(s), organ(s), or whole embryo(s) analyzed (in vitro studies only); inducer(s) [agent(s)

Environmental Carcinogenesis Information Center
Because of the difficulty of accessing the literature of carcinogenesis, the Environmental Carcinogenesis Information Center (ECIC) was organized in 1980 to provide specialized information services to investigators and other interested individuals in the areas of in vitro cell transformation in mammalian cells and in vivo animal carcinogenicity. Surveys of researchers and information specialists confirmed the need for a subject-oriented carcinogenicity information resource. This need was illustrated most clearly by attempts to gather and access information in this area as part of the U.S. Environmental Protection Agency (EPA) Genetic Toxicology (Gene-Tox) Program (5).
ECIC is a computerized information facility structured within the same organization as EMIC and ETIC. The mission of ECIC is to collect, organize, and disseminate information relevant to studies of in vivo animal carcinogenicity and in vitro oncogenic cell transformation in mammalian cells.
Papers selected for the ECIC file are primarily concerned with the testing of chemicals or other agents for in vivo animal carcinogenicity or in vitro cell transformation. Particular emphasis is given to those papers discussing agents for which short-term genetic toxicology test data are available in EMIC files. Peripheral subjects that may be useful in understanding the known or suspected carcinogenic activity of environmental agents are tagged in ECIC's companion centers, EMIC and ETIC. As is the practice in EMIC and ETIC, copies of all ECIC cited papers are available on file at the center.
Various methods are used to locate relevant publications. The most productive is manual searching of key journals that regularly publish data on carcinogenicity.* These journals are scanned as soon as they are available and yield 40 to 50% of the papers selected for the ECIC information file. The remaining 50 to 60% are obtained by searching large secondary abstracting services such as Chemical Abstracts, Biological Abstracts, and BioResearch Index. Other secondary resources, such as CANCERLIT, are also used. Descriptions of these secondary sources are given by Akland and Waters (2). Researchers assist ECIC by sending reprints of their work and copies of material from journals and books published in their respective countries. Such cooperation is frequently the only means of obtaining information from non-English-language sources.
*Readers who would like a copy of the ECIC key journal list may obtain it by writing to the center.
Only after a copy of a publication is obtained is information prepared for two-phase computer input. Initial preparation consists of recording bibliographic details and other information similar to that indexed by EMIC (see EMIC indexing element numbers 1-11).
Following initial processing, papers are assigned for technical indexing. The criteria used in processing articles for technical indexing were established for ECIC by the Carcinogenesis Panel of the Gene-Tox Program. The subject categories for in vitro cell transformation include: experimental description, abstract, activation system used, chemical inducer of the activation system, assay (colony morphology, growth in soft agar, viral enhancement, etc.), animal source of the target cell, target cell used in the experiment, virus name (viral enhancement studies only), agent(s) tested, and Chemical Abstracts Service Registry Number(s) [chemical agent(s) only]. The topics for in vivo animal carcinogenicity include: sex of treated animal, agent(s) tested, Chemical Abstracts Service Registry Number(s) [chemical agent(s) only], solvent and/or vehicle used for administration of agent, route of administration, experimental design (serial biopsy, interval and terminal autopsy, etc.), type of analysis (gross pathology, histopathology, etc.), organ(s) examined, assay, promoter, and control(s) (concurrent, historical, etc.).
The primary focus of ECIC's current work is to collect information on compounds for which EMIC has short-term genetic toxicology test data. This information will be used in the continuing update of the EPA Gene-Tox Program. At present, ECIC has over 3000 papers in its file, with information on over 800 compounds.
Readers may obtain additional information on ECIC by writing to the Environmental Carcinogenesis Information Center at the address given in the previous section.
In summary, the information collection and processing performed at EMIC, ETIC, and ECIC make these centers a valuable resource for SAR researchers seeking access to the literature of genetic toxicology, teratology, and carcinogenicity.

Quality vs. Quantity in the Review and Analysis of Information from Primary Literature
Once the problem of access (quantity) has been resolved, the next question (quality) is, "How good are the reported data?" Here, users of the information must rely on their own knowledge and on the editorial policy of the journal or other publication source.
The Gene-Tox review and evaluation of the literature, through mid-1979, on selected short-term genetic toxicology bioassays, in vitro cell transformation bioassays, and in vivo carcinogenicity studies revealed some provocative insights into the quality of the literature. It was shown that most journals failed to maintain a strict editorial policy with respect to format, data presentation, and inclusion or referencing of essential information, such as specific details of the agent(s) tested, control data, experimental design, and/or protocol used. Because of these deficiencies, only 52% of the papers reviewed were used; in other words, almost half (48%) of the literature was, for one reason or another, not used. Some papers in this latter category were excluded because they were written in a foreign language, were not published in a refereed source, or did not contain original data; the majority, however, did not meet the rigid criteria established by the various Gene-Tox review panels. The criteria used for each bioassay reviewed by Gene-Tox may be found in the published panel report for that bioassay. The number of papers used varied with each panel and bioassay; the panel with the lowest use percentage (8%) was the Chinese hamster ovary (CHO) cell gene mutation panel. Although their experience is perhaps less well documented than the Gene-Tox observations, the work groups convened by the International Agency for Research on Cancer (IARC) to review the literature and produce monographs on the carcinogenicity of selected chemical substances can corroborate the need for stricter editorial policies. DeMarini and Shelby (6) have recently commented on the state of the scientific literature and the lack of published data adequate for properly evaluating test results.
Even with these deficiencies, the indexing methods that are used by EMIC, ECIC, and ETIC facilitate the answering of specific queries and help users differentiate which papers meet their personal criteria or standards.
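The kind of screening described above, in which a user applies personal criteria to indexed papers, can be sketched as a simple filter. In the Python example below, the quality flags and the acceptance rule are hypothetical illustrations loosely modeled on the deficiencies noted in the Gene-Tox review (non-refereed source, no original data, missing control data or agent details); they are not the criteria of any Gene-Tox panel, and the papers are placeholders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Paper:
    # Hypothetical quality flags a user might derive from indexed fields.
    paper_id: str
    refereed: bool
    original_data: bool
    control_data_reported: bool
    agent_details_reported: bool

def meets_criteria(paper: Paper) -> bool:
    """Hypothetical acceptance rule: every flag must be satisfied."""
    return (paper.refereed and paper.original_data
            and paper.control_data_reported and paper.agent_details_reported)

def screen(papers):
    """Return the accepted papers and the fraction of the input they represent."""
    accepted = [p for p in papers if meets_criteria(p)]
    fraction = len(accepted) / len(papers) if papers else 0.0
    return accepted, fraction

# Placeholder papers, not real literature.
papers = [
    Paper("P-1", True, True, True, True),
    Paper("P-2", True, True, False, True),   # no control data reported
    Paper("P-3", False, True, True, True),   # not from a refereed source
    Paper("P-4", True, True, True, True),
]

accepted, fraction = screen(papers)
print([p.paper_id for p in accepted], fraction)  # ['P-1', 'P-4'] 0.5
```

Applied to a real review, the same calculation gives the kind of use percentage reported above (52% overall, 8% for the CHO gene mutation panel); each user or panel would substitute its own rule for `meets_criteria`.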

Peer-Reviewed Biological Research Results Data Bases
At present there are only a small number of evaluated data bases that can be put to use by SAR practitioners interested in general toxicology, genetic toxicology, teratogenicity, or carcinogenicity. Those that are available are shown in Table 2 and are described in the following sections.

U.S. EPA Gene-Tox Program Agent Registry File
The Gene-Tox Program (5) is a two-phase evaluation of selected short-term bioassays for detecting mutagenicity and presumptive carcinogenicity. Sponsored and directed by the Office of Testing and Evaluation within EPA's Office of Pesticides and Toxic Substances, Gene-Tox is used by EPA as a resource in establishing standard genetic testing and evaluation procedures for the regulation of toxic substances. Gene-Tox also helps to determine the direction of research and development in the field of genetic toxicology.
In the first phase of the Gene-Tox Program, 23 panels (each consisting of 5-10 scientists) reviewed the existing literature from the EMIC information file through mid-1979 and prepared reports on the applicability and performance of each selected bioassay (Appendix). The data were edited and placed in a computer file. To date, information on over 2600 different chemicals has been entered in this file. The distribution of these compounds among the 73 short-term bioassays (64 genetic toxicology and 9 cell transformation) reviewed by Gene-Tox is uneven. Some bioassays have results on fewer than 10 compounds (e.g., the in vivo sister chromatid exchange test with human lymphocytes has results on only one), while others, such as the Ames/Salmonella test, have results on as many as 1079. Evaluated in vivo carcinogenicity results on 392 compounds are also available in the Gene-Tox file. When the compounds evaluated within a given bioassay are classified by structure, they show an erratic clustering pattern with respect to their common structural characteristics, and this clustering, of course, varies with the number of compounds tested in each bioassay. The Gene-Tox data base will be available online through the NIH/EPA Chemical Information System (7), the Toxicology Data Bank (8,9), and the Registry of Toxic Effects of Chemical Substances (10).
In the current second phase of the Gene-Tox Program, panels of scientists are critically evaluating the reports for each bioassay and comparing them on a chemical-by-chemical and class-by-class basis (11). From this evaluation, attempts will be made to determine the sensitivity of each bioassay to specific classes of chemicals and to identify the major strengths and weaknesses of each bioassay. The data base resulting from the Gene-Tox Program is the most comprehensive collection of evaluated genetic toxicology data available. Work is under way to update this data base with material screened from papers in the EMIC file published from mid-1979 through March 1984.

International Agency for Research on Cancer (IARC) Monographs
In 1971, IARC initiated a program to evaluate the carcinogenic risk of chemicals to humans (12). The objective of the program was to provide government authorities with expert, independent scientific opinions on environmental carcinogenesis through the publication of critical reviews of carcinogenicity and related data. IARC evaluates possible human carcinogenic risk on the basis of detailed reviews and analyses of the pertinent literature. IARC's work is partially funded by the National Cancer Institute.
The IARC monographs summarize the evidence for the carcinogenicity of individual chemicals and other relevant information on the basis of data compiled, reviewed, and evaluated by a working panel of experts. Priority is given to chemicals, groups of chemicals, or industrial processes for which there is at least some suggestion of carcinogenicity either from evidence of human exposure and/or observations in animals. It should be emphasized that the inclusion of a particular compound in an IARC volume does not necessarily mean that it should be considered to be carcinogenic nor does the fact that a chemical is absent from an IARC review imply that it is not a carcinogenic hazard. As new data become available on chemicals for which monographs have already been prepared and/or new principles for evaluation become available, reevaluations may be made at subsequent IARC meetings, and if the new evidence warrants, revised monographs will be published.
The IARC monographs are distributed internationally to governmental agencies, industries, and scientists. They are also offered to any interested person through the World Health Organization publication outlets.
Through December 1983, 32 volumes of the monographs and four supplements have been published. These volumes contain indexes both for chemical name and molecular formula, as well as Chemical Abstracts Service Registry Number(s). A total of 800 chemicals have been reviewed by IARC through Monograph Volume 32. Carcinogenicity evaluations have not been made on all the chemicals reviewed; either no data were available or the data available to IARC were judged inadequate for evaluation.

Toxicology Data Bank
The Toxicology Data Bank (TDB) (8,9,13,14) is a factual data base of over 4150 comprehensive chemical records. Each record contains up to 60 different data elements grouped into eight categories. These categories include pharmacological and toxicological data (e.g., LD50 values), environmental and occupational information, manufacturing and use data, and the chemical and physical properties of the substance. Substances selected for TDB include chemicals with high production volume or exposure, drugs and pesticides exhibiting potential toxicity or adverse effects, and other substances of interest.
The information in a TDB record is selected from sources such as TOXLINE and from secondary sources such as standard reference books, handbooks, criteria documents, and monographs. The data extracted from secondary sources are reviewed quarterly by a panel of experts convened by the Toxicology Study Section, National Institutes of Health. The panel or Peer-Review Committee members may incorporate additional information from other sources, such as TOXLINE, to ensure that each TDB record contains the most relevant and accurate information.
Readers may obtain further information on TDB by contacting either Toxicology Data Bank, Oak Ridge National Laboratory, P.O. Box

Chemical Structure as a Tool for Predicting Biological Activity
A considerable amount of work is under way on the use of chemical structure to predict the carcinogenicity, mutagenicity, or teratogenicity of chemical compounds. Several promising models for predicting specific biological responses have been reviewed in a recent book edited by Golberg (15).
The correlation of biological properties with chemical structure is rooted in the early history of pharmacology, when compounds structurally similar to known pharmaceuticals were selected and tested for their efficacy against human disease. The principles of drug action based on chemical structure have been carried over to other research areas, where the structural characteristics of compounds with known biological responses invite comparison with compounds whose activities are unknown. SAR has intrigued, to some degree, investigators in carcinogenesis, mutagenesis, and teratogenesis as they have selected compounds for testing.
Because the volume of literature on carcinogenesis, mutagenesis, and teratogenesis has grown significantly, especially in the last decade, a larger information base drawn from the primary literature is now available to SAR practitioners than ever before. The three specialized information centers described in this paper can serve as a valuable source of primary information for SAR studies. Given this much-needed information base, the current level of effort in this area, together with the development of peer-reviewed data bases and the new computer hardware and software for more efficient SAR studies being crafted by high-technology groups, will make significant accomplishments possible within the next five years. Interest in SAR must, however, be tempered by recognition of the weakness inherent in most existing information collections used for drawing correlations between chemical structure and biological activity (15): the lack of evaluated, or peer-reviewed, quantitative data. The information activities now available that contain peer-reviewed evaluation data were discussed in the previous section. Many more programs and/or projects like Gene-Tox, the IARC monographs, and TDB, which generate evaluated data bases from the primary or secondary literature, will be required before SAR technology can be applied effectively. An extensive review of the published literature, similar to the Gene-Tox Program, is needed to fill the void that now exists in teratology research with respect to a comprehensive, peer-reviewed data base.
The reservoir of peer-reviewed toxicological data will be greatly enhanced by the research efforts of the National Toxicology Program (NTP) (16,17). Compendiums, such as the recently published one containing NTP's Salmonella test results on 250 chemicals (18), will provide a substantial amount of new and immediately usable data. Access to these and other NTP toxicity testing data will be crucial to the success of all future SAR studies. Augmenting the NTP testing data with the data already assembled in the peer-reviewed biological effects data bases discussed in this paper will make it possible to initiate effective SAR studies on several specific toxicological end points (e.g., mutation induction in Salmonella). However, even with the marriage of the NTP data and those gleaned from the primary or secondary literature, it will take at least three to five more years of continuing effort before enough peer-reviewed data are available for effective SAR studies of most toxicological end points.
The enthusiasm that will follow accomplishments in this area within the next five years, through the growing accumulation of evaluated data bases and more sophisticated computer technology, must be tempered by the fact that mechanically derived data (computer-generated results from SAR studies) can be used only as a "predictive tool" and not as a substitute for or alternative to actual biological testing. The greatest future value of predictive SAR studies will be in selecting chemicals for testing so as to make the most efficient use of resources such as personnel, equipment, money, and time.
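The selection role described above can be sketched as a ranking step that sits in front of actual testing. In the Python example below, the predicted probabilities stand in for the output of some SAR model (hypothetical here), `select_for_testing` is an illustrative helper, and `budget` is an assumed limit on how many chemicals can be bioassayed; the predictions only decide the order of testing and are never treated as results.

```python
def select_for_testing(predictions, budget):
    """Pick the `budget` chemicals with the highest predicted probability of activity.

    predictions: dict mapping a chemical identifier to a predicted probability
    (0-1) from some SAR model (hypothetical). Ties are broken alphabetically so
    that the selection is reproducible.
    """
    ranked = sorted(predictions.items(), key=lambda item: (-item[1], item[0]))
    return [chem for chem, _ in ranked[:budget]]

# Placeholder predictions, not results for real compounds.
preds = {"chem-1": 0.15, "chem-2": 0.82, "chem-3": 0.47, "chem-4": 0.82}
print(select_for_testing(preds, 2))  # ['chem-2', 'chem-4']
```

Chemicals that are not selected are simply deferred, not declared inactive; only the subsequent bioassay results would belong in an evaluated data base.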
Golberg, in the introduction to his book (15), draws an apt analogy between our current pursuit of SAR and Shakespeare's Macbeth: " . . . we should learn from Macbeth's experience and not take predictions too literally, even when they come from a computer rather than a witch."

Appendix: U.S. Environmental Protection Agency Genetic Toxicology Program Publication List
Manuscripts Published