![]() | ![]() |
Formats:
|
||||||||||||||||||||||||||
Copyright © 2009 Chou and Bushel; licensee BioMed Central Ltd. Discernment of possible mechanisms of hepatotoxicity via biological processes over-represented by co-expressed genes 1Biostatistics Branch, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA 2Department of Biostatistical Sciences, Wake Forest University School of Medicine, Medical Center Blvd., Winston-Salem, NC 27157, USA Corresponding author.Jeff W Chou: jchou/at/wfubmc.edu; Pierre R Bushel: bushel/at/niehs.nih.gov Received November 13, 2008; Accepted June 18, 2009. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Abstract Background Hepatotoxicity is a form of liver injury caused by exposure to stressors. Genomic-based approaches have been used to detect changes in transcription in response to hepatotoxicants. However, there are no straightforward ways of using co-expressed genes anchored to a phenotype or constrained by the experimental design for discerning mechanisms of a biological response. Results Through the analysis of a gene expression dataset containing 318 liver samples from rats exposed to hepatotoxicants and leveraging alanine aminotransferase (ALT), a serum enzyme indicative of liver injury as the phenotypic marker, we identified biological processes and molecular pathways that may be associated with mechanisms of hepatotoxicity. Our analysis used an approach called Coherent Co-expression Biclustering (cc-Biclustering) for clustering a subset of genes through a coherent (consistency) measure within each group of samples representing a subset of experimental conditions. Supervised biclustering identified 87 genes co-expressed and correlated with ALT in all the samples exposed to the chemicals. None of the over-represented pathways related to liver injury. However, biclusters with subsets of samples exposed to one of the 7 hepatotoxicants, but not to a non-toxic isomer, contained co-expressed genes that represented pathways related to a stress response. Unsupervised biclustering of the data resulted in 1) four to five times more genes within the bicluster containing all the samples exposed to the chemicals, 2) biclusters with co-expression of genes that discerned 1,4 dichlorobenzene (a non-toxic isomer at low and mid doses) from the other chemicals, pathways and biological processes that underlie liver injury and 3) a bicluster with genes up-regulated in an early response to toxic exposure. Conclusion We obtained clusters of co-expressed genes that over-represented biological processes and molecular pathways related to hepatotoxicity in the rat. The mechanisms involved in the response of the liver to the exposure to 1,4-dichlorobenzene suggest non-genotoxicity whereas the exposure to the hepatotoxicants could be DNA damaging leading to overall genomic instability and activation of cell cycle check point signaling. In addition, key pathways and biological processes representative of an inflammatory response, energy production and apoptosis were impacted by the hepatotoxicant exposures that manifested liver injury in the rat. Background The liver is considered one of the vital organs in the body. Its has several major functions including the production of bile to break down fat, glycogen storage, decomposition of red blood cells, production of cholesterol, plasma protein synthesis and drug metabolism just to name a few. The latter takes place by a host of specialized detoxification enzymes and pathways that biochemically modify or metabolize xenobiotics to harmless metabolites and other byproducts for clearance from the body [1]. However, the metabolism of some drugs and compounds leads to toxic intermediates that can harm the liver and severely disrupt its function [2]. Drug-induced liver injury (DILI) is the leading cause of liver failure in the United States (US) and is quickly becoming a major concern world-wide [3]. In fact, DILI accounts for more than 50 percent of the cases of acute liver failure in the US and more than 75 percent of cases of adverse drug reactions result in liver transplantation or death [4]. Drug-induced hepatotoxicity is the most frequent cause for a drug to be withdrawn from the market, restricted in its use or have a warning associated with it due to adverse drug reactions [5]. A better understanding of the pathophysiology of DILI and the mechanisms involved in the manifestation of hepatotoxicity are critical to improving human health and public awareness of potentially harmful toxicants [6]. Microarray gene expression analysis has been used to study the effects of toxicants and other environmental stressors on biological systems [7-11]. Recently, Lobenhofer et al. [12] used gene expression data from rats exposed to a compendium of hepatotoxicants to show that blood gene expression patterns could be used not only to classify animals based on the compound they were exposed to, but also to provide a reasonable indication of the severity level of liver injury. In addition, Bushel et al. [13] demonstrated that gene expression profiles from rat blood samples could accurately predict exposure levels of acetaminophen to the rat liver better than traditional clinical panels. Furthermore, human subjects treated with acetaminophen could be classified based on blood gene expression levels. Although these efforts led to informative conclusions about the genes expression changes related to drug-induced hepatotoxicity, they did not capture the breath of the biological mechanisms altered by the toxic insult. While hierarchical clustering groups genes and samples by the similarity of expression across all the elements in a data matrix, biclustering [14-17] aims to identify sub-matrices (subsets of rows [genes] and subsets of columns [samples] from an original data matrix) of gene expression that are coherent and possess homogeneity of co-expression. Hence, the biclusters contain co-expressed genes that represent distinct biological responses related to mechanistic changes. In this paper we used a biclustering method to analyze a compendium gene expression dataset containing 318 liver samples from rats exposed to hepatotoxicants (in a dose response and time series manner) and leveraged alanine aminotransferase (ALT), a serum enzyme indicative of liver injury, as the phenotypic marker in order to identify biclusters of co-expressed genes that over-represent biological processes and pathways related to hepatotoxicity. Results Experimental design Male 12 week old F344 Fischer rats were exposed to one of the following chemicals: 1,2-dichlorobenzene, 1,4-dichlorobenzene, bromobenzene, monocrotaline, N-nitrosomorpholine, thioacetamide, galactosamine or diquat dibromide [12]. For each chemical, four to six rats (two in one case), were exposed to three or four different dose levels (from subtoxic to toxic) of the compound and sacrificed at 6, 24 and 48 hrs later to extract liver RNA for microarray hybridization. The 318 samples and exposure conditions are listed in Table 1. The number of samples in each group varied from 32 to 72. Liver necrosis was observed in all the rats exposed to one of the eight chemicals at high doses (Table 2). However, 1,4-dichlorobenzene is an isomer of 1,2-dichlorobenzene and is non-toxic at low and mid doses. Since each compound has its unique chemical structure and properties, the activated biological processes, molecular pathways, extent of toxicity in the liver and injury could be highly similar or dissimilar. Considering these variables, we partitioned the columns of the gene expression data matrix into eight groups (one for each chemical) for analysis using Coherent Co-expression (cc)-Biclustering.
Identification of co-expressed genes based on ALT enzyme levels Each group of chemicals included a set of samples with different exposure conditions (different doses and time points). We used ALT as a profile to supervise the extraction of the genes that are expressed similarly and correlated with the phenotypic marker. ALT values in each of the chemical exposures were generally elevated from 6, 24, to 48 hrs and from low doses to high doses. For a probability threshold (pt) value of 0.001, supervised cc-Biclustering extracted 84 biclusters in which the genes were correlated (r-value >= 0) to ALT and 76 biclusters in which the genes were anti-correlated (r-value < 0) to ALT. The number of genes in each bicluster varied. The largest number of genes in a bicluster was 713. The number of groups (chemicals) in the biclusters had a range from a minimum of 1 to a maximum of 8. The biclusters which contained only one chemical could represent the uniqueness in gene expression for the agent and the biclusters which contained all eight chemicals could provide some clues to the common responses to the exposures across the agents. Figure Figure11
The Gene Ontology analysis of three biclusters from two chemicals, galactosamine and thioacetamide, revealed interesting categorization of the genes (see Additional file 1 Figure S1 and Table S1). The 401 genes in a bicluster subset by galactosamine over-represented a KEGG pathway related to extracellular matrix (ECM) receptor interaction (Figure S1A). Another bicluster of 364 genes subset by thioacetamide had a different set of significantly over-represented categories representing the Wnt signalling pathway and glycerophospholipid metabolism (Figure S1B). The third bicluster of 289 genes subset by both chemicals had biological processes and KEGG pathways related to neuron development and cell adhesion as over-represented categories (Figure S1C). Identification of co-expressed genes that discern isomers and an early response Although the blood serum level of ALT is a good indicator of liver injury, it is not considered a prognosticator of toxic insult as the true nature and extent of the liver damage is not proportional to the elevation in the serum enzyme activity [18]. For instance, a normal ALT level does not necessarily mean that the liver is definitely normal and high levels of ALT in the blood doesn't necessarily indicate the extent to which the liver is inflamed or damaged. In addition, it is likely that ALT levels elevate well after the mechanistic changes have occurred that led to the liver injury from toxic exposure. Therefore, we set out to use cc-Biclustering in an unsupervised fashion (without ALT) so we could find biclusters of genes that may be unrelated to ALT or respond before ALT elevation and yet are very informative in terms of the manifestation of hepatotoxicity. We obtained more biclusters (~ three times more) using unsupervised cc-Biclustering when the same threshold pt was used as in the supervised case. Similar to the biclusters in Figure Figure1,1
Heat maps of the gene expression from three biclusters subset by seven chemicals (excluding 1,4-dichlorobenzene), are shown in Figure Figure4.4
Discussion and Conclusion Over the recent years several ways of investigating drug-induced hepatotoxicity has been explored. Using gene expression analysis of samples exposed to toxicants offers a genome-wide assessment of the transcriptional changes that occur from the insult. However, current methodologies for analyzing the data are limited in that they typically do not "anchor" the changes in gene expression to the phenotype of toxicity nor do they constrain them by the experimental design. Case in point is hierarchical clustering of genes and samples based on gene expression data. Although the method provides an overall view of the clusters of genes that are co-expressed within a group of highly similar samples, it does not extract subclusters of co-expressed genes that are related to a given phenotype, end-point measure of the samples or subset of experiments. Yoon et al. [17] proposed a method for discovering coherent biclusters from gene expression data using decision diagrams constituted from binary representations of a set of samples in which the expression of a subset of genes are highly similar (coherent). However, the method does not integrate phenotypic data nor is it guided by the experimental design (i.e. constraints imposed by a time series or dose response study). Linking co-expressed genes to a phenotype of interest or set of experimental conditions can potentially enhance the interpretation of the biological systems that are impacted from the manifestation of an outcome [22]. We analyzed a compendium gene expression dataset containing 318 liver samples from rats exposed to hepatotoxicants and leveraged alanine aminotransferase (ALT), a serum enzyme indicative of liver injury as the phenotypic marker, to identify several biological processes and molecular pathways that may be associated with mechanisms of hepatotoxicity. Our analysis used an approach to biclustering called Coherent Co-expression Biclustering (cc-Biclustering) for clustering of a subset of genes through a coherent (consistency) measure within each group of samples representing a subset of experimental conditions. Existing biclustering methods use some measure of merit to determine whether a row (gene) or column (sample) should be included or excluded from a bicluster [23]. cc-Biclustering uses a given coherent measure (CM) to determine whether a gene or a group of samples is included or excluded from a bicluster. The CM used between pairs of gene vectors (or gene vectors with a phenotypic profile) is flexible. Depending on a research interest, CM can be chosen to be Pearson correlation, Euclidean distance or some other measure of (dis)similarity. Unsupervised cc-Biclustering uses a pairwise comparison of the gene expression profiles to extract biclusters. In the case of supervised cc-Biclustering, we correlated the co-expression of the subset of genes within a bicluster with ALT. The overlap between the up-regulated genes from the supervised and unsupervised cc-Biclustering methods is 61 genes (Additional file 1; Figure S2) where as the overlap between the down-regulated genes from the two methods is 78 genes (Additional file 1: Figure S3). The sharp contrast between biclusters in Figure Figure11 Comparison of the over-represented categories between the genes in the biclusters shown in Figures Figures33 Although the data from each array is independently collected, in a vast number of cases, many of the related biological samples are highly correlated. For example, some samples are from the same tissue, or treated by the same chemical, or a time series. Same samples are merely biological replicates. In our study we analyzed rat liver samples exposed to one of eight chemicals. Each of the hepatotoxicants has its unique chemical structure and causes different levels and extent of liver injury (Table 2). The activated biological processes and molecular pathways can be highly similar or dissimilar. The gene sets involved in responding to each of the hepatotoxicants can overlap to a large extent or be somewhat unique. Using our biclustering approach we were able to capitalize on the experimental design and/or phenotypic measure to discern possible mechanisms of hepatotoxicity through the over-representation of pathways and biological processes determined by co-expressed genes in cc-Biclusters. More work is underway to efficiently process the wealth of biclusters extracted for a more informed interpretation of the mechanistic changes that take place during the formation of hepatotoxicity. Methods Gene expression data Male 12 week old F344 Fischer rats were individually exposed to one of the following chemicals: 1,2-dichlorobenzene, 1,4-dichlorobenzene, bromobenzene, monocrotaline, N-nitrosomorpholine, thioacetamide, galactosamine or diquat dibromide. All eight chemicals were studied using standardized procedures, i.e. a common array platform, experimental procedures and data retrieving and analysis processes. For details of the experimental design see Lobenhofer et al[12]. Briefly, for each chemical, four to six male rats were exposed to a low dose, mid dose(s) or a high dose of the chemical and sacrificed at 6, 24 or 48 hrs later. At necropsy, the liver of the rats were harvested for RNA extraction. A time-matched vehicle control pool was made for each chemical by pooling equal amounts of RNA from each of the control animals. Each treated animal was hybridized against a time-matched control pool to the Agilent Rat Oligonucleotide Microarray with a dye-swap technical replicate. Fluorescence intensities were measured with an Agilent DNA Microarray Scanner and processed with the Agilent Feature Extraction software version A.7.5.1 for bromobenzene and 1,2-dichlorobenzene arrays and version 8.1.1.1 for the others. See the header of the Agilent data files for more details of the scanning and data acquisition hardware and software parameters. The 318 samples and exposure conditions are listed in Table 1. Liver necrosis was observed in all the rats exposed to one of the eight chemicals at high doses. However, 1,4-dichlorobenzene is an isomer of 1,2-dichlorobenzene and is non-toxic at low and mid doses. Since each compound has its unique chemical structure and properties, the activated biological processes, molecular pathways, type of toxicity in the liver and injury could be highly similar or dissimilar. Considering these variables, we partitioned the columns of the gene expression data matrix (Additional file 2) into eight groups (one for each chemical) for analysis using cc-Biclustering (Additional file 3). The number of samples in each group varied from 32 to 72. The data is publicly available at the Gene Expression Omnibus (GEO) database http://www.ncbi.nlm.nih.gov/geo/ under series GSE15785 and at the Chemical Effects in Biological Systems (CEBS) database http://cebs.niehs.nih.gov under accession number 001-00001-0020-000-4. Clinical chemistry At sacrifice, blood was collected into serum separation tubes (BD Microtainer® Tubes, BD, Franklin Lakes, NJ) and serum was separated. Clinical chemistry analyses of alanine aminotransferase (ALT) was performed on all rats at study termination. Serum levels of the established liver injury marker ALT (Additional file 4) increase when the liver shows inflammation, injury or hepatotoxicity. cc-Biclustering Rather than obtaining coherent measures across the whole vector of gene expression, cc-Biclustering uses the sample information in the study design of the experiment to partition the M columns into J groups according to a given phenotypic response or biological study of interest. For instance, as shown in the next section where an expression matrix has eight different chemicals, each group corresponds to a chemical with which samples (biological replicates included) are treated at different doses and time points. A gene expression value of an element in matrix A now has four indices, i.e.
where row index i is from 1 to N number of genes; the column index breaks up into three sub-indices, j, k, and l. The index j is from 1 to J number of groups (i.e., each of the groups corresponds to a chemical). The index k is from 1 to K number of treatments within a group (i.e., corresponding to a given chemical, a treatment is exposure to a given dose at a given time point). The index l is from 1 to L, where L is the number of biological replicates. The general idea of cc-Biclustering is to map matrix A to a binary coherent matrix H(hi, j) according to an inclusion\exclusion criterion function. The H matrix has N rows and J columns. The advantages of bracketing samples or conditions into a set of groups are four-fold. (1) A coherent measure of expression is computed within a group to determine if a gene aij is included in a bicluster, where aij is a vector of expression values of ith row and jth group. (2) Gene expression aij can be time and/or dose series derived. If there were several replicates at a given time and dose, aij is evaluated to determine if it is differentially expressed and should be included in a bicluster. (3) The coherent measure (CM) used between pairs of gene vectors ai1,j and ai2,j (or phenotypic profile Sj) is flexible. Depending on a research interest, CM can be chosen to be Pearson correlation, Euclidean distance, or some other measure of biologically relevant similarity. (4) The cc-Biclustering algorithm is simple. Like many coherent value based models (such as an additive model, multiplicative model or based on a given CM) cc-Biclustering converts a gene expression matrix A to a binary matrix H with a given p-value threshold. The extracted biclusters then have constant rows and columns [15]. We used the DAVID database (April 2008 version) for Gene Ontology (January 2008 download) biological processes and KEGG pathway (January 2008 download) analyses of the genes within the biclusters and used p-values for over-representation of these biological categories based on a one-tailed Fisher exact probability or EASE score [32]. The supervised and unsupervised cc-Biclustering algorithms are described in more detail and depicted in pseudo code (Additional file 1). A comparison of the analysis of a microarray gene expression data set from Arabidopsis thaliana samples treated with various conditions and times of exposure using unsupervised cc-Biclustering and the Cheng and Church biclustering algorithm [33] with parameters δ = 0.5, α = 1.2 and output = 10 biclusters revealed that our method produced the top 3 biclusters of genes (by size) which are highly correlated within the subset of samples that shared similar exposure conditions with a Pearson correlation in the range of +0.73 to +0.84 whereas the other method produced the top 3 biclusters of genes with correlation in the range of +0.01 to +0.55. Authors' contributions JWC and PRB designed the strategy of cc-Biclustering. JWC implemented the cc-Biclustering algorithm, analyzed the data and wrote part of the manuscript. PRB provided suggestions, advice and guidance for the concept of the research, interpreted the results and also wrote part of the manuscript. The authors have read and approved the final manuscript. Additional file 1 Supervised and unsupervised cc-Biclustering details, pseudo code and supplemental figures and table. Click here for file(177K, pdf) Additional file 2 Two-dimensional matrix of the gene expression ratio data (data preprocessed and fluor-flips per biological sample merged [averaged]). Click here for file(26M, zip) Additional file 3 Sample information denoting biological replicates for cc-Biclustering. Click here for file(23K, txt) Acknowledgements We thank the National Center for Toxicogenomics at the National Institute of Environmental Health Sciences (NIEHS) for the hepatotoxicant compendium data. We also thank Jennifer Fostel and David Fargo for their critical review of the manuscript. This research was supported, in part by, the Intramural Research Program of the NIH and NIEHS. References
|
PubMed related articles
Your browsing activity is empty. Activity recording is turned off. |
|||||||||||||||||||||||||
Ann Intern Med. 2002 Dec 17; 137(12):947-54.
[Ann Intern Med. 2002]N Engl J Med. 2006 Feb 16; 354(7):731-9.
[N Engl J Med. 2006]Nat Rev Drug Discov. 2005 Jun; 4(6):489-99.
[Nat Rev Drug Discov. 2005]Toxicol Sci. 2005 Aug; 86(2):453-69.
[Toxicol Sci. 2005]Environ Health Perspect. 2007 Apr; 115(4):572-8.
[Environ Health Perspect. 2007]Nucleic Acids Res. 2008 Jan; 36(Database issue):D892-900.
[Nucleic Acids Res. 2008]Genome Biol. 2008; 9(6):R100.
[Genome Biol. 2008]Proc Natl Acad Sci U S A. 2007 Nov 13; 104(46):18211-6.
[Proc Natl Acad Sci U S A. 2007]BMC Bioinformatics. 2008 Apr 23; 9():210.
[BMC Bioinformatics. 2008]IEEE/ACM Trans Comput Biol Bioinform. 2005 Oct-Dec; 2(4):339-54.
[IEEE/ACM Trans Comput Biol Bioinform. 2005]Genome Biol. 2008; 9(6):R100.
[Genome Biol. 2008]Comp Hepatol. 2003 Jan 31; 2(1):3.
[Comp Hepatol. 2003]Hepatology. 2006 Feb; 43(2 Suppl 1):S45-53.
[Hepatology. 2006]IEEE/ACM Trans Comput Biol Bioinform. 2005 Oct-Dec; 2(4):339-54.
[IEEE/ACM Trans Comput Biol Bioinform. 2005]BMC Genomics. 2008 Sep 16; 9():419.
[BMC Genomics. 2008]Bioinformatics. 2007 Jan 1; 23(1):50-6.
[Bioinformatics. 2007]BMC Genomics. 2008 Jun 16; 9():288.
[BMC Genomics. 2008]Toxicol Sci. 2004 Jul; 80(1):193-202.
[Toxicol Sci. 2004]Environ Health Perspect. 2006 Apr; 114(4):553-9.
[Environ Health Perspect. 2006]Mol Cell Biol. 2008 Apr; 28(7):2201-12.
[Mol Cell Biol. 2008]BMC Genomics. 2008 Jun 16; 9():288.
[BMC Genomics. 2008]Genome Biol. 2008; 9(6):R100.
[Genome Biol. 2008]IEEE/ACM Trans Comput Biol Bioinform. 2004 Jan-Mar; 1(1):24-45.
[IEEE/ACM Trans Comput Biol Bioinform. 2004]Genome Biol. 2003; 4(5):P3.
[Genome Biol. 2003]Proc Int Conf Intell Syst Mol Biol. 2000; 8():93-103.
[Proc Int Conf Intell Syst Mol Biol. 2000]