NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

National Research Council (US) Committee on Applications of Toxicogenomic Technologies to Predictive Toxicology. Applications of Toxicogenomic Technologies to Predictive Toxicology and Risk Assessment. Washington (DC): National Academies Press (US); 2007.

3 Experimental Design and Data Analysis

The greatest challenge of toxicogenomics is no longer data generation but effective collection, management, analysis, and interpretation of data. Although genome sequencing projects have managed large quantities of data, genome sequencing deals with producing a reference sequence that is relatively static in the sense that it is largely independent of the tissue type analyzed or a particular stimulation. In contrast, transcriptomes, proteomes, and metabolomes are dynamic and their analysis must be linked to the state of the biologic samples under analysis. Further, genetic variation influences the response of an organism to a stimulus. Although the various toxicogenomic technologies (genomics, transcriptomics, proteomics, and metabolomics) survey different aspects of cellular responses, the approaches to experimental design and high-level data analysis are universal.

This chapter describes the essential elements of experimental design and data analysis for toxicogenomic experiments (see Figure 3-1) and reviews issues associated with experimental design and data analysis. The discussion focuses on transcriptome profiling using DNA microarrays. However, the approaches and issues discussed here apply to various toxicogenomic technologies and their applications. This chapter also describes the term biomarker.

FIGURE 3-1. Overview of the workflow in a toxicogenomic experiment. Regardless of the goal of the analysis, all share some common elements. However, the underlying experimental hypothesis, reflected in the ultimate goal of the analysis, should dictate the details (more...)


The types of biologic inferences that can be drawn from toxicogenomic experiments are fundamentally dependent on experimental design. The design must reflect the question that is being asked, the limitations of the experimental system, and the methods that will be used to analyze the data. Many experiments using global profiling approaches have been compromised by inadequate consideration of experimental design issues. Although experimental design for toxicogenomics remains an area of active research, a number of universal principles have emerged. First and foremost is the value of broad sampling of biologic variation (Churchill 2002; Simon et al. 2002; Dobbin and Simon 2005). Many early experiments used far too few samples to draw firm conclusions, possibly because of the cost of individual microarrays. As the cost of using microarrays and other toxicogenomic technologies has declined, experiments have begun to include sampling protocols that provide better estimates of biologic and systematic variation within the data. Still, high costs remain an obstacle to large, population-based studies. It would be desirable to introduce power calculations into the design of toxicogenomic experiments (Simon et al. 2002). However, uncertainties about the variability inherent in the assays and in the study populations, as well as interdependencies among the genes and their levels of expression, limit the utility of power calculations.
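To make the idea of a power calculation concrete, the following is a minimal sketch for a single gene compared between two groups, using the standard normal approximation. The function name, the effect size, and the assumed standard deviation are illustrative assumptions, not values taken from this chapter; as noted above, uncertainty about the true variability limits how far such a number can be trusted.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate samples needed per group for a two-sided, two-sample z-test.

    delta: smallest difference in mean log2 expression worth detecting (assumed)
    sigma: assumed within-group standard deviation of log2 expression
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = z.inv_cdf(power)           # quantile corresponding to the desired power
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical inputs: detect a 2-fold change (1 log2 unit) with SD 0.5.
n_single_gene = samples_per_group(delta=1.0, sigma=0.5)
# A stricter per-gene significance level, as one might use when many genes are tested.
n_strict = samples_per_group(delta=1.0, sigma=0.5, alpha=0.001)
```

Under these assumed inputs, tightening the per-gene significance level raises the required number of samples, which is one reason small studies struggle when thousands of genes are assayed.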

A second lesson that has emerged is the need for carefully matched controls and randomization in any experiment. Because microarrays and other toxicogenomic technologies are extremely sensitive, they can pick up subtle variations in gene, protein, or metabolite expression that are induced by differences in how samples are collected and handled. The use of matched controls and randomization can minimize potential sources of systematic bias and improve the quality of inferences drawn from toxicogenomic datasets.
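One way to put randomization into practice is to assign samples to processing batches (for example, hybridization days) at random while keeping treatment groups balanced across batches. The sketch below is an assumption-laden illustration: the function name, the sample identifiers, and the idea of three batches are hypothetical.

```python
import random


def randomize_to_batches(samples, n_batches, seed=0):
    """Shuffle samples within each treatment group, then deal them round-robin
    so every batch receives a near-equal share of each group."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    groups = {}
    for sample_id, group in samples:
        groups.setdefault(group, []).append(sample_id)

    batches = [[] for _ in range(n_batches)]
    slot = 0
    for group in sorted(groups):
        ids = groups[group]
        rng.shuffle(ids)
        for sample_id in ids:
            batches[slot % n_batches].append((sample_id, group))
            slot += 1
    return batches


# Hypothetical study: six treated and six control animals, three processing days.
samples = [(f"T{i}", "treated") for i in range(6)] + [(f"C{i}", "control") for i in range(6)]
batches = randomize_to_batches(samples, n_batches=3, seed=1)
```

With this layout, a batch effect cannot be mistaken for a treatment effect, because every batch contains both treated and control samples.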

A related question in designing toxicogenomic experiments is whether samples should be pooled to improve population sampling without increasing the number of assays (Dobbin and Simon 2005; Jolly et al. 2005; Kendziorski et al. 2005). Pooling averages variations but may also disguise biologically relevant outliers—for example, individuals sensitive to a particular toxicant. Although individual assays are valuable for gaining a more robust estimate of gene expression in the population under study, pooling can be helpful if experimental conditions limit the number of assays that can be performed. However, the relative costs and benefits of pooling should be analyzed carefully, particularly with respect to the goals of the experiment and plans for follow-up validation of results. Generally, the greatest power in any experiment is gained when as many biologically independent samples are analyzed as is feasible.
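The trade-off in pooling can be illustrated with a small numerical sketch. The values below are invented for illustration, and pooling is approximated here as averaging log2 fold changes, which is a simplification of physically mixing RNA.

```python
from statistics import mean

# Hypothetical log2 fold changes for one gene in six treated animals;
# the fifth animal is an unusually sensitive responder.
individual = [0.1, -0.2, 0.0, 0.2, 3.1, -0.1]


def pool(values, pool_size):
    """Average consecutive groups of `pool_size` animals (a stand-in for pooling)."""
    return [mean(values[i:i + pool_size]) for i in range(0, len(values), pool_size)]


pooled = pool(individual, pool_size=3)

largest_individual = max(individual)  # the sensitive animal stands out
largest_pooled = max(pooled)          # the same signal, diluted by its pool mates
```

The overall mean is preserved, but the sensitive individual is no longer identifiable in the pooled measurements.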

Universal guidelines cannot be specified for all toxicogenomic experiments, but careful design focused on the goals of the experiment and adequate sampling are needed to assess both the effect and the biologic variation in a system. These lessons are not unique to toxicogenomics. Inadequate experimental designs driven by cost cutting have forced many studies to sample small populations, which ultimately compromises the quality of inferences that can be drawn.


DNA microarray experiments can be categorized into four types: class discovery, class comparison, class prediction, and mechanistic studies. Each type addresses a different goal and uses a different experimental design and analysis. Table 3-1 summarizes the broad classes of experiments and representative examples of the data analysis tools that are useful for such analyses. These data analysis approaches are discussed in more detail below.

TABLE 3-1. Data Analysis Approaches.

Class Discovery

Class discovery analysis is generally the first step in any toxicogenomic experiment because it takes an unbiased approach to looking for new group classes in the data. A class discovery experiment asks “Are there unexpected, but biologically interesting, patterns that exist in the data?” For example, one might consider an experiment in which all nephrotoxic compounds are used individually to treat rats, and gene expression data are collected from the kidneys of these rats after they begin to experience renal failure (Amin et al. 2004). Evaluation of the gene expression data may indicate the nephrotoxic compounds can be grouped based on the cell type affected, the mechanism responsible for renal failure, or other common factors. This analysis may also suggest a new subgroup of nephrotoxic compounds that either affects a different tissue type or represents a new toxicity mechanism.

Class discovery analyses rely on unsupervised data analysis methods (algorithms) to explore expression patterns in the data (see Box 3-1). Unsupervised data analysis methods are often among the first techniques used to analyze a microarray dataset. Unsupervised methods do not use the sample classification as input; for example, they do not consider the treatment groups to which the samples belong. They simply group samples together based on some measure of similarity. Two of the most widely used unsupervised approaches are hierarchical clustering (Weinstein et al. 1997; Eisen et al. 1998; Wen et al. 1998) and k-means clustering (Soukas et al. 2000). Other approaches have been applied to unsupervised analysis, including self-organizing maps (Tamayo et al. 1999; Toronen et al. 1999; Wang et al. 2002), self-organizing trees (Herrero et al. 2001), relevance networks (Butte and Kohane 1999), force-directed layouts (Kim et al. 2001), and principal component analysis (Raychaudhuri et al. 2000). Fundamentally, each of these methods uses some feature of the data and a rule for determining relationships to group genes (or samples) that share similar patterns of expression. In the context of disease analysis, all the methods can be extremely useful for identifying new subclasses of disease, provided that the subclasses are reproducible and can be related to other clinical data. All these methods will divide data into “clusters,” but determining whether the clusters are meaningful requires expert input and analysis. Critical assessment of the results is essential. There are anecdotal reports of clusters being found that separate data based on the hospital where the sample was collected, the technician who ran the microarray assay, or the day of the week the array was run. Clearly, microarrays can be very sensitive. However, recent reports suggest that adhering to standard laboratory practices and carefully analyzing data can lead to high-quality, reproducible results that reflect the biology of the system (Bammler et al. 2005; Dobbin et al. 2005; Irizarry et al. 2005; Larkin et al. 2005).
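As an illustration of how an unsupervised method groups samples without using their labels, the following sketch implements a small agglomerative (hierarchical) clustering. It assumes a distance of 1 minus the Pearson correlation and average linkage, which is one common choice rather than the method of any study cited above; the compound names and expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def hierarchical_clusters(profiles, n_clusters):
    """Merge the two closest clusters (average linkage) until n_clusters remain."""
    names = list(profiles)
    dist = {(a, b): 1 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        return sum(dist[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] += clusters.pop(j)  # j > i, so index i is unaffected by the pop
    return [sorted(c) for c in clusters]


# Hypothetical log2 ratios for five genes after treatment with four compounds.
profiles = {
    "cmpA": [2.0, 1.8, 0.1, -0.2, 0.0],
    "cmpB": [1.7, 2.1, 0.0, 0.1, -0.1],
    "cmpC": [0.1, -0.1, 1.9, 2.2, 0.2],
    "cmpD": [0.0, 0.2, 2.1, 1.8, -0.1],
}
groups = hierarchical_clusters(profiles, n_clusters=2)
```

The algorithm will always return the requested number of clusters; whether those clusters reflect biology or an artifact such as processing day still has to be judged separately.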

BOX 3-1

Supervised and Unsupervised Analysis. Analysis methods for toxicogenomic data can be divided into two broad classes depending on how much prior knowledge, such as information about the samples, is used. Unsupervised methods examine (more...)

In the context of toxicogenomics, class discovery methods can be applied to understand the cellular processes involved in responding to specific agents. For example, in animals exposed to a range of compounds, analysis of gene expression profiles with unsupervised “clustering” methods can be used to discover groups of genes that may be involved in cellular responses and suggest hypotheses about the modes of action of the compounds. Subsequent experiments can confirm the gene expression effects, confirm or refute the hypotheses, and identify the cell types and mode of action associated with the response. A goal of this type of research would be to build a database of gene expression profiles of sufficiently high quality to enable a gene expression profile to be used to classify compounds based on their mode of action.

Class Comparison

Class comparison experiments compare gene expression profiles of different phenotypic groups (such as treated and control groups) to discover genes and gene expression patterns that best distinguish the groups. The starting point in such an experiment is the assumption that one knows the classes represented in the data. A logical approach to data analysis is to use information about the various classes in a supervised fashion to identify those genes that distinguish the groups. One starts by assigning samples to particular biological classes based on objective criteria. For example, the data may represent samples from treatment with neurotoxic and hepatotoxic compounds. The first question would be, “Which genes best distinguish the two classes in the data?” At this stage, the goal is to find the genes that are most informative for distinguishing the samples based on class.
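One simple way to score genes for this question is a two-sample t statistic computed gene by gene. The sketch below uses Welch's form of the statistic (an assumption made here so that unequal variances are tolerated); the gene names, class labels, and expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic; requires nonzero variance in at least one group."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def rank_genes(expression, classes):
    """Return gene names sorted by |t|, most discriminating first.

    expression: {gene: [value per sample]}
    classes:    class label for each sample (exactly two distinct labels)
    """
    first, second = sorted(set(classes))
    scores = {}
    for gene, values in expression.items():
        a = [v for v, c in zip(values, classes) if c == first]
        b = [v for v, c in zip(values, classes) if c == second]
        scores[gene] = welch_t(a, b)
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)


# Hypothetical data: three hepatotoxic and three neurotoxic treatments.
classes = ["hepatotoxic"] * 3 + ["neurotoxic"] * 3
expression = {
    "geneX": [5.1, 4.9, 5.0, 2.0, 2.2, 1.9],
    "geneY": [3.0, 3.4, 2.7, 3.1, 2.9, 3.3],
    "geneZ": [1.0, 1.5, 0.8, 2.0, 2.6, 1.7],
}
ranking = rank_genes(expression, classes)
```

The ranking is only a prioritization; with so few samples per class, the statistic for any single gene is unstable.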

A wide variety of statistical tools can be brought to bear on this question, including t-tests (for two classes) and analysis of variance (for three or more classes) that assign p values to genes based on their ability to distinguish among groups. One concern with these statistical approaches is the problem of multiple testing. Simply put, in a microarray with 10,000 genes, applying a 95% confidence limit on gene selection (p ≤ 0.05) means that, by chance alone, one would expect about 500 genes (10,000 × 0.05) to be called significant even if no gene truly differed between the groups. Stringent gene selection can minimize but not eliminate this problem; consequently, one must keep in mind that the greatest value of statistical methods is that they provide a way to prioritize genes for further analysis. Other approaches are widely used, such as significance analysis of microarrays (Tusher et al. 2001), which uses an adjusted t statistic (or F statistic) modified to correct for overestimates arising from small values in the denominator, along with permutation testing to estimate the false discovery rate in any selected significant gene set. Other methods attempt to correct for multiple testing, such as the well-known Bonferroni correction, but these methods are best suited to independent measurements and become overly conservative when measurements are correlated, as they are in gene analysis, where many genes and gene products operate together in pathways and networks and so are co-regulated. Further confounding robust statistical analysis of toxicogenomic studies is the “n < p problem,” which means the number of samples analyzed is typically much smaller than the number of genes, proteins, or metabolites assayed. For these reasons, statistical analysis of higher-dimensional datasets produced by toxicogenomic technologies remains an area of active research.
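The arithmetic of multiple testing, and two ways of adjusting for it, can be sketched as follows. The Bonferroni threshold is the correction named above. The Benjamini-Hochberg step-up procedure is included as an assumption of this sketch: it is a widely used way to control the false discovery rate but is not the permutation-based estimate used by significance analysis of microarrays. The p values are invented for illustration.

```python
def expected_false_positives(n_tests, alpha):
    """Number of tests expected to pass by chance if no true effects exist."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, alpha):
    """Per-test p-value cutoff that bounds the chance of any false positive by alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values, fdr=0.05):
    """Indices of tests declared significant by the Benjamini-Hochberg procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * fdr / m:
            cutoff_rank = rank  # keep the largest rank that satisfies the bound
    return sorted(order[:cutoff_rank])


chance_hits = expected_false_positives(10_000, 0.05)  # the 10,000-gene example
strict_cutoff = bonferroni_threshold(10_000, 0.05)

# Hypothetical p values for ten genes.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
naive = [i for i, p in enumerate(p_values) if p <= 0.05]
bonferroni = [i for i, p in enumerate(p_values) if p <= bonferroni_threshold(len(p_values), 0.05)]
bh = benjamini_hochberg(p_values, fdr=0.05)
```

In this invented example the unadjusted cutoff selects five genes, the Bonferroni cutoff one, and the false-discovery-rate procedure two, which illustrates how the choice of adjustment changes the gene list passed on for follow-up.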

As described above, class comparison analyses provide collections of genes that the data indicate are useful in distinguishing among the various experimental groups being studied. These collections of genes can be used either as a starting point for mechanistic studies or in an attempt to classify new compounds as to their mode of action.

Class Prediction

Class prediction experiments attempt to predict biologic effects based on the gene expression profile associated with exposure to a compound. Such an experiment asks “Can a particular pattern of gene expression be combined with a mathematical rule to predict the effects of a new compound?” The underlying assumption is that compounds eliciting similar effects will elicit similar effects on gene expression. Typically, one starts with a well-characterized set of compounds and associated phenotypes (a database of phenotype and gene expression data) and through a careful comparison of the expression profiles finds genes whose patterns of expression can be used to distinguish the various phenotypic groups under analysis. Class prediction approaches then attempt to use sets of informative genes (generally selected using statistical approaches in class comparison) to develop mathematical rules (or computational algorithms) that use gene expression profiling data to assign a compound to its phenotype group (class). The goal is not merely to separate the samples but to create a rule (or algorithm) that can predict phenotypic effects for new compounds based solely on gene expression profiling data.

For example, to test a new compound for possible neurotoxicity, gene expression data for that compound would be compared with gene expression data for other neurotoxic compounds in a database and a prediction would be made about the new compound’s toxicity. (The accuracy of the prediction depends on the quality of the databases and datasets.)

When developing a classification approach, the mathematical rules for analyzing new samples are encoded in a classification algorithm. A wide range of algorithms have been used for this purpose, including weighted voting (Golub et al. 1999), artificial neural networks (Ellis et al. 2002; Bloom et al. 2004), discriminant analysis (Nguyen and Rocke 2002; Orr and Scherf 2002; Antoniadis et al. 2003; Le et al. 2003), classification and regression trees (Boulesteix et al. 2003), support vector machines (Brown et al. 2000; Ramaswamy et al. 2001), and k-nearest neighbors (Theilhaber et al. 2002). Each of these uses an original set of samples, or training set, to develop a rule that uses the gene expression data (trimmed to a previously identified set of informative genes) for a new compound to place this new compound into the context of the original sample set, thus identifying its class.
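Of the algorithms listed, k-nearest neighbors is perhaps the simplest to sketch. The example below assumes Euclidean distance over a set of informative genes that has already been selected, and it estimates accuracy by leave-one-out cross-validation; the training profiles, labels, and the new compound are hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(training, profile, k=3):
    """Assign `profile` to the majority class among its k nearest training samples.

    training: list of (expression_profile, class_label) pairs
    """
    nearest = sorted(training, key=lambda item: dist(item[0], profile))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(training, k=3):
    """Fraction of training samples classified correctly when each is held out in turn."""
    hits = 0
    for i, (profile, label) in enumerate(training):
        rest = training[:i] + training[i + 1:]
        hits += knn_predict(rest, profile, k) == label
    return hits / len(training)


# Hypothetical training set: log2 ratios for two informative genes.
training = [
    ((2.1, 0.2), "hepatotoxic"), ((1.8, -0.1), "hepatotoxic"), ((2.4, 0.3), "hepatotoxic"),
    ((0.1, 1.9), "neurotoxic"), ((-0.2, 2.2), "neurotoxic"), ((0.3, 1.7), "neurotoxic"),
]
predicted = knn_predict(training, (1.9, 0.4))
accuracy = leave_one_out_accuracy(training)
```

Holding samples out matters because a rule evaluated only on the samples used to build it will tend to look better than it performs on genuinely new compounds.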

Functional and Network Inference for Mechanistic Analysis

Although class prediction analysis may tell us what response a particular compound is likely to produce, it does not necessarily shed light on the underlying mechanism of action. Moving from class prediction to mechanistic understanding often relies on additional work to translate toxicogenomic-based hypotheses to validated findings. Bioinformatic tools play a key role in developing those hypotheses by integrating information that can facilitate interpretation—including gene ontology terms, which describe gene products (proteins), functions, processes, and cellular locations; pathway database information; genetic mapping data; structure-activity relationships; dose-response curves; phenotypic or clinical information; genome sequence and annotation; and other published literature. Software developed to facilitate this analysis includes MAPPFinder (Doniger et al. 2003), GOMiner (Zeeberg et al. 2003), and EASE (Hosack et al. 2003), although they may only provide hints about possible mechanisms. There is no universally accepted way to connect the expression of genes, proteins, or metabolites to functionally relevant pathways leading to particular phenotypic end points, so a good deal of user interaction and creativity is currently required.
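A common question when integrating gene ontology terms is whether an annotation appears in a selected gene list more often than chance would predict. The sketch below computes a hypergeometric upper-tail probability for that question. It is an illustrative assumption of how such a test can be framed, not a description of how MAPPFinder, GOMiner, or EASE compute their scores, and the gene counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_genome, n_category, n_selected, n_overlap):
    """Probability of seeing at least n_overlap annotated genes when n_selected
    genes are drawn at random from n_genome genes, n_category of which carry
    the annotation (hypergeometric upper tail)."""
    total = comb(n_genome, n_selected)
    upper = min(n_category, n_selected)
    tail = sum(comb(n_category, k) * comb(n_genome - n_category, n_selected - k)
               for k in range(n_overlap, upper + 1))
    return tail / total


# Hypothetical counts: 10,000 genes on the array, 100 annotated to a process,
# 200 genes selected as differentially expressed, 10 of them annotated.
p_enriched = enrichment_p_value(10_000, 100, 200, 10)
expected_overlap = 200 * 100 / 10_000  # overlap expected by chance alone
```

A small probability here flags a category for closer inspection; it does not by itself establish a mechanism, and testing many categories raises the same multiple-testing concern discussed for genes.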

New approaches to predict networks of interacting genes based on gene expression profiles use several modeling techniques, including Boolean networks (Akutsu et al. 2000; Savoie et al. 2003; Soinov 2003), probabilistic Boolean networks (Shmulevich et al. 2002a,b; Datta et al. 2004; Hashimoto et al. 2004), and Bayesian networks (Friedman et al. 2000; Imoto et al. 2003; Savoie et al. 2003; Tamada et al. 2003; Zou and Conzen 2005). These models treat individual objects, such as genes and proteins, as “nodes” in a graph, with “edges” connecting the nodes representing their interactions. A set of rules for each edge determines the strength of the interaction and whether a particular response will be induced. These approaches have met with some success, but additional work is necessary to convert the models from descriptive to predictive. In metabolic profiling, techniques that monitor metabolic flux and its modeling (Wiback et al. 2004; Famili et al. 2005) also may provide predictive models.
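The node-and-rule idea can be made concrete with a toy Boolean network. In the sketch below each node is either on or off and is updated synchronously from the current state of its inputs. The three-node network and its rules are entirely hypothetical and are chosen only to show the mechanics, not to represent a known pathway.

```python
def step(state, rules):
    """Synchronously update every node from the current network state."""
    return {gene: rule(state) for gene, rule in rules.items()}


def simulate(state, rules, max_steps=20):
    """Iterate until a state repeats; return the states visited before the repeat."""
    seen = []
    while state not in seen and len(seen) < max_steps:
        seen.append(state)
        state = step(state, rules)
    return seen


# Hypothetical rules: exposure persists; the sensor is on when exposure is present
# and the detoxification gene is off; the detoxification gene follows the sensor.
rules = {
    "exposure": lambda s: s["exposure"],
    "sensor": lambda s: s["exposure"] and not s["detox"],
    "detox": lambda s: s["sensor"],
}

exposed = simulate({"exposure": True, "sensor": False, "detox": False}, rules)
unexposed = simulate({"exposure": False, "sensor": False, "detox": False}, rules)
```

Under these assumed rules the exposed network cycles through four states while the unexposed network stays at rest, which illustrates how even simple rules produce dynamic behavior that a static gene list cannot capture.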

The advent of global toxicogenomic technologies, and the data they provide, offers the possibility of developing quantitative, predictive models of biologic systems. This approach, dubbed “systems biology,” attempts to bring together data from many different domains, such as gene expression data and metabolic flux analysis, and to synthesize them to produce a more complete understanding of the biologic response of a cell, organ, or individual to a particular stimulus and create predictive biomathematical models. Whereas toxicogenomic data are valuable even when not used in a systems biology mode, achieving this systems-level understanding of organismal response and its relationship to the development of a particular phenotype is a long-term goal of toxicogenomics and other fields. The best efforts to date have allowed the prediction of networks of potentially interacting genes. However, these network models, while possibly predictive, lack the complexity of the biochemical or signal transduction pathways that mediate cellular responses. Attempts to model metabolic flux, even in simpler organisms like yeast and bacteria, provide only rough approximations of system function responses and then only under carefully controlled conditions. However, significant progress in the ability to model complex systems is likely and additional toxicogenomic research will continue to benefit from and help advance systems biology approaches and their applications.


An opinion paper by Bailey and Ulrich (2004) outlined the use of microarrays and related technologies for identifying new biomarkers; see Box 3-2 for definitions. Within the drug industry, there is an acute need for effective biomarkers that predict adverse events earlier than otherwise could be done in every phase of drug development from discovery through clinical trials, including a need for noninvasive biomarkers for clinical monitoring. There is a widespread expectation that, with toxicogenomics, biomarker discovery for assessing toxicity will advance at an accelerated rate. Each transcriptional “fingerprint” reflects a cumulative response representing complex interactions within the organism that include pharmacologic and toxicologic effects. If these interactions can be significantly correlated to an end point, and shown to be reproducible, the molecular fingerprint potentially can be qualified as a predictive biomarker. Several review articles explore issues related to biomarker assay development and provide examples of the biomarker development process (Wagner 2002; Colburn 2003; Frank and Hargreaves 2003).

BOX 3-2

Defining Biomarkers. Throughout this chapter, a wide range of applications of gene expression microarray and other toxicogenomic technologies has been discussed. Many of the most promising applications involve using gene, protein, or metabolic expression (more...)

The utility of gene expression-based biomarkers was clearly illustrated by van Leeuwen and colleagues’ 1986 identification of putative transcriptional biomarkers for early effects of smoking using peripheral blood cell profiling (van Leeuwen et al. 1986). Kim and coworkers also demonstrated a putative transcriptional biomarker that can identify genotoxic effects but not carcinogenesis using lymphoma cells but noted that the single marker presented no clear advantage over existing in vitro or in vivo assays (Kim et al. 2005). Sawada et al. discovered a putative transcriptional biomarker predicting phospholipidosis in the HepG2 cell line, but they too saw no clear advantage over existing assays (Sawada et al. 2005). In 2004, a consortium effort based at the International Life Sciences Institute’s Health and Environmental Sciences Institute identified putative gene-based markers of renal injury and toxicity (Amin et al. 2004). As has been the case for transcriptional markers, protein-based expression assays have also shown their value as predictive biomarkers. For example, Searfoss and coworkers used a toxicogenomic approach to identify a protein biomarker for intestinal toxicity (Searfoss et al. 2003).

Exposure biomarker examples also exist. Koskinen and coworkers developed an interesting model system in rainbow trout, using trout gene expression microarrays to develop biomarkers for assessing the presence of environmental contaminants (Koskinen et al. 2004). Gray and colleagues used gene expression in a mouse hepatocyte cell line to identify the presence of aromatic hydrocarbon receptor ligands in an environmental sample (Gray et al. 2003).

Ultimately, toxic response is likely to be mediated by changes at various levels of biologic organization: gene expression, protein expression, and altered metabolic profiles. Whereas most work to date has focused on developing biomarkers based on the output of single toxicogenomic technologies (for example, transcriptomics, proteomics, metabolomics), an integrated approach using multiple technologies provides the opportunity to develop multidomain biomarkers that are more highly predictive than those derived from any single technology. Further, existing predictive phenotypic (and genotypic) measures should not be ignored in deriving biomarkers.

Finally, particular attention must be paid to developing toxicogenomic-based biomarkers, especially those that are not tied mechanistically to a particular end point. In 2001, Pepe and colleagues outlined the stages of cancer biomarker development (Pepe et al. 2001) (see Table 3-2), suggesting that a substantial effort involving large populations would be required to fully validate a new biomarker for widespread clinical application. The Netherlands breast cancer study discussed in Chapter 9 (validation) is an example of a biomarker that has reached Phase 4, with a prospective analysis in which 6,000 women will be recruited and screened at an estimated cost of $54 million (Bogaerts et al. 2006; Buyse et al. 2006). Most toxicogenomic studies have reached only Phase 1 or Phase 2 and significant additional work and funding are necessary if toxicogenomic biomarkers are to achieve the same level of validation.

TABLE 3-2. Phases of Cancer Biomarker Development As Defined by Pepe et al. (2001).


This chapter focused largely on questions of experimental design and the associated analytical approaches that can be used to draw biologic inferences. The published examples have largely drawn on individual studies in which datasets have been analyzed in isolation. However, any of these methods can be applied more generally to larger collections of data than those in individual studies, provided that the primary data and the information needed to interpret them are available.

Clearly, a carefully designed database containing toxicogenomic data along with other information (such as structure-activity relationships and information about dose-response and phenotypic outcome for exposure) would allow many of the unanswered questions about the applicability of genomic technologies to toxicology to be addressed. In fact, a more extensive analysis would allow scientists to more fully address questions about reproducibility, reliability, generalizability, population effects, and potential experimental biases that might exist and that would drive the development of standards and new analytical methods.

A distinction must be drawn between datasets and a database. A database compiles individual datasets and provides a structure for storing the data that captures various relationships between elements, and it facilitates our ability to investigate associations among various elements. Such a database must go beyond individual measurements and provide information about, for example, how the individual experiments are designed, the chemical properties of the individual compound tested, the phenotypes that result, and the genetic background of the animals profiled. Many considerations must go into designing such a database and populating it with relevant data; a more detailed discussion is provided in Chapter 10. However, creating such a database that captures relevant information would allow more extensive data mining and exploration and would provide opportunities currently not available. Making full use of such a database would also require a commitment to develop new analytical methods and to develop software tools to make these analytical methods available to the research and regulatory communities.
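The difference between a collection of datasets and a database can be sketched with a small relational layout in which measurements are linked to the experimental design, the compound, and the observed phenotype. Every table name, column name, and record below is a hypothetical illustration and not a proposed standard; the design considerations a real resource would face are far broader.

```python
import sqlite3

# Hypothetical schema: measurements are tied to experiments, and experiments to compounds.
SCHEMA = """
CREATE TABLE compound (
    compound_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    structure TEXT
);
CREATE TABLE experiment (
    experiment_id INTEGER PRIMARY KEY,
    compound_id INTEGER NOT NULL REFERENCES compound(compound_id),
    species TEXT,
    strain TEXT,
    tissue TEXT,
    dose_mg_per_kg REAL,
    phenotype TEXT
);
CREATE TABLE measurement (
    experiment_id INTEGER NOT NULL REFERENCES experiment(experiment_id),
    gene TEXT NOT NULL,
    log2_ratio REAL NOT NULL
);
"""


def build_database():
    """Create an empty in-memory database with the sketch schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def genes_for_phenotype(db, phenotype):
    """Average log2 ratio per gene across all experiments showing a phenotype."""
    rows = db.execute(
        "SELECT m.gene, AVG(m.log2_ratio) FROM measurement m "
        "JOIN experiment e USING (experiment_id) "
        "WHERE e.phenotype = ? GROUP BY m.gene ORDER BY m.gene",
        (phenotype,),
    )
    return dict(rows.fetchall())


db = build_database()
db.execute("INSERT INTO compound VALUES (1, 'compound_A', NULL)")
db.executemany("INSERT INTO experiment VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, 1, "rat", "strain_X", "kidney", 10.0, "renal injury"),
    (2, 1, "rat", "strain_X", "kidney", 1.0, "none"),
])
db.executemany("INSERT INTO measurement VALUES (?, ?, ?)", [
    (1, "geneX", 2.0), (1, "geneY", -1.0), (2, "geneX", 0.1),
])
injury_profile = genes_for_phenotype(db, "renal injury")
```

Because the relationships are stored explicitly, a question that spans experiments (here, which genes change in animals showing a given phenotype) becomes a single query rather than a manual comparison of separate files.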

Although assembling a central toxicogenomic database would be a massive undertaking, creating such a resource, with a focus not only on data production but also on delivery of protocols, databases, software, and other tools to the community, should serve as a catalyst to encourage others to contribute to building a more comprehensive database. Mechanisms should be investigated that would facilitate the growth of such a database by using data from academic and industrial partners. When possible and feasible, attention should be paid to integrating development of such a database and related standards with the work of parallel efforts such as caBIG (NCI 2006d) at the National Cancer Institute. The success of any toxicogenomic enterprise depends on data and information and the National Institute of Environmental Health Sciences and other federal agencies must make an investment to produce and provide those to the research community.


  1. Develop specialized bioinformatics, statistical, and computational tools to analyze toxicogenomic data. This will require a significant body of carefully collected controlled data, suggesting the creation of a national data resource open to the research community. Specific tools that are needed include the following:
    1. Algorithms that facilitate accurate identification of orthologous genes and proteins in species used in toxicologic research,
    2. Tools to integrate data across multiple analytical platforms (for example, gene sequences, transcriptomics, proteomics, and metabolomics), and
    3. Computational models to enable the study of network responses and systems-level analyses of toxic responses.
  2. Continue to improve genome annotation for all relevant species and elucidation of orthologous genes and pathways.
  3. Emphasize the development of standards to ensure data quality and to assist in validation.
Copyright © 2007, National Academy of Sciences.
Bookshelf ID: NBK10221

