Proc Natl Acad Sci U S A. Jun 14, 2005; 102(24): 8621–8626.
Published online Jun 6, 2005. doi:  10.1073/pnas.0407672102
PMCID: PMC1143582
Genetics

Identification of coexpressed gene clusters in a comparative analysis of transcriptome and proteome in mouse tissues

Abstract

A major advantage of the mouse model lies in the increasing information on its genome, transcriptome, and proteome, as well as in the availability of a fast growing number of targeted and induced mutant alleles. However, data from comparative transcriptome and proteome analyses in this model organism are very limited. We use DNA chip-based RNA expression profiling and 2D gel electrophoresis, combined with peptide mass fingerprinting of liver and kidney, to explore the feasibility of such comprehensive gene expression analyses. Although protein analyses mostly identify known metabolic enzymes and structural proteins, transcriptome analyses reveal the differential expression of functionally diverse and not yet described genes. The comparative analysis suggests correlation between transcriptional and translational expression for the majority of genes. Significant exceptions from this correlation confirm the complementarities of both approaches. Based on RNA expression data from the 200 most differentially expressed genes, we identify chromosomal colocalization of known, as well as not yet described, gene clusters. The determination of 29 such clusters may suggest that coexpression of colocalizing genes is probably rather common.

Keywords: coexpression and colocalization, comparative expression profiles

Most biochemical processes within and between cells are put into effect by the interaction between proteins, or between proteins and their substrates (1-3). The proteome of a cell is the result of controlled biosynthesis and, therefore, is largely (but not exclusively) regulated by gene expression (4). Vice versa, the transcriptome can be regarded as a sensitive read-out of the proteome or the biochemical state of the cell. Thus, transcriptome and proteome feed back to each other in a highly complex way. The understanding of this functional regulation is generally limited to distinct signaling or metabolic pathways. To begin to understand the mutual regulatory interactions between transcriptome and proteome, a comparative approach including the simultaneous monitoring of expression at the RNA and protein levels will be required.

The basic technologies for genome-wide expression analyses at the mRNA (5-7) and protein levels (8-10) are available. Transcript profiling was used to assess normal variability in gene expression levels of mouse liver, kidney, and testis (11) and to analyze changes in expression patterns during embryonic and fetal liver development (12). So far, comparative transcriptome and proteome analyses in complex organisms are very limited and have been performed in human platelets (13) and heart tissue (14), and in the Anopheles and Culex salivary glands (15, 16). In rodents, the proteome of mouse primary islet cells was correlated with RNA expression data of purified primary rat beta cells, suggesting a close correlation between mRNA and protein expression (17). A parallel analysis of transcripts and proteins at a genomic scale in identical mouse tissue samples has not been performed.

We use DNA chip-based expression profiling, 2D gel electrophoresis, and subsequent peptide mass fingerprinting (PMF) to explore the general feasibility of such a comparative gene expression analysis. A comparison of RNA and protein expression profiles from adult male mouse liver and kidney was made. The choice of different tissues provided a large set of differentially expressed proteins and genes. We used this set of differential expression profiles as a tool to address three major questions. (i) Does protein expression correlate with transcriptional regulation for the most differential proteins? (ii) Do transcriptomics and proteomics approaches detect functional categories with different preferences? (iii) Does coregulated gene expression correlate with colocalization in the genome?

Materials and Methods

Mouse Tissues. Wild-type C3HeB/FeJ mice were bred under specified pathogen-free conditions. Left kidney and dorsal lobe of the liver were collected at the age of 105 days (±5 days) from male mice, killed between 9:00 a.m. and 12:00 noon by CO2 asphyxiation. Organs were immediately frozen in liquid N2.

Protein Isolation. For pH gradient 4-7, 50 mg of tissue was ground in liquid N2. Ten milligrams was dissolved in 200 μl of lysis buffer {7 M urea/2 M thiourea/2% DTT/4% CHAPS (3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate)/0.8% Pharmalyte 3-10} and sonicated for 10 cycles for 1 s (60 W). The sample was kept shaking for 30 min at 25°C, and centrifuged for 5 min at 20,000 × g. Protein concentration was determined by a modified Bradford method. Two hundred fifty micrograms of protein was loaded onto each 4-7 immobilized pH gradient strip.

For pH gradient 6-11, 15 mg of the tissue powder was suspended in 200 μl of 4°C trichloroacetic acid (TCA)/acetone 20% (vol/vol)/50% (vol/vol). After sonication for 15 min (30 W) in a 4°C water bath, the suspension was diluted with 1.2 ml of TCA/acetone 20% (vol/vol)/50% (vol/vol), vortexed for 2 min, kept at 4°C for 16 h and centrifuged for 30 min at 20,000 × g (4°C). The pellet was washed twice in 200 μl of acetone, sonicated in a 4°C water bath for 20 min, centrifuged for 30 min at 20,000 × g (4°C), resuspended in 200 μl of lysis buffer, and sonicated 10 times for 1 s (60 W) on ice. Samples were kept shaking at room temperature for 45 min and spun down for 5 min at 20,000 × g. A modified Bradford protein determination was done. Three hundred micrograms of protein was loaded onto 6-11 immobilized pH gradient strips.

2D Gel Electrophoresis (2-DE). Isoelectric focusing (IEF) was done with 18-cm immobilized pH gradient strips from Amersham Pharmacia Bioscience. For each sample, five gels with gradients pH 4-7 and 6-11 were made. After focusing to the steady state, the strips were loaded with SDS and equilibrated in DTT and iodoacetamide (18).

The second dimension was performed as SDS/PAGE. T12% and C2.8% SDS gels were run vertically in a Höfer ISO-Dalt chamber by using the Laemmli buffer system. The SDS/PAGE was stopped when the bromophenol blue front ran off the gel; 1,800-2,000 Vh were applied. Gels were stained with Sypro Ruby and scanned with a Fuji fluorescence scanner. Mastergels were made from five replicas of one tissue in each gradient and matched (proteomweaver, Definiens AG, Munich). Statistical calculations were performed allowing a standard deviation of <30% and a confidence level of 0.05 in the t test. Coomassie-stained micropreparative gels were run with 500 μg of protein per gel.

PMF MALDI-TOF. Proteins were identified by PMF MALDI-TOF. Spots were picked from SDS gels, washed three times with 10 mM NH4HCO3 and 30% acetonitrile (ACN), incubated overnight in 5 μl 25 ng/μl trypsin (Roche Diagnostics)/10 mM NH4HCO3 (pH 8) at 37°C, and sonicated for 20 min at 25°C, and the supernatant was concentrated in a SpeedVac. The solution was processed through a C18 reversed phase ZipTip column (Millipore) by using 0.1% trifluoroacetic acid (TFA) and 80% ACN for elution. Eluted peptides were put on MALDI target and cocrystallized with 1 μl of dihydroxybenzoic acid. MALDI-TOF analysis (Voyager STR, Applied Biosystems) was done in reflector mode in the mass range of 700 to 4,000 daltons. Spectra were matched with the National Center for Biotechnology Information database to identify the corresponding protein.

DNA Chip Expression Profiling. Total RNA was isolated according to manufacturer's protocols by using RNeasy kits (Qiagen, Hilden, Germany). The concentration of total RNA was measured by OD260/280 reading. Per DNA chip, 20 μg of total RNA were used for reverse transcription and indirectly labeled with Cy3 or Cy5 fluorescent dyes according to The Institute for Genomic Research (TIGR) protocol as described (19). Probes were PCR amplified from the 20,000 (20k) mouse arrayTAG clone set as described (20). Amplified probes were dissolved in 3× SSC and spotted on aldehyde-coated slides (CEL Associates, Pearland, TX) by using the Microgrid TAS II spotter (Biorobotics Genomic Solutions, Huntingdon, U.K.) with Stealth SMP3 pins (Telechem, Sunnyvale, CA). Spotted slides were rehydrated, blocked, denatured, and dried as described (19, 21). The hybridization mixture was placed on prehybridized microarrays and hybridized at 42°C for 18-20 h. Microarrays were immersed in 40 ml of 3× SSC and then successively washed in 40 ml of 1× SSC, 40 ml of 1× SSC/0.1% SDS, and 40 ml of 0.1× SSC at room temperature as described (19). Dried slides were scanned with a GenePix 4000A scanner and analyzed by using the genepix pro 3.0 image processing software (Axon Instruments, Union City, CA). Gene expression data have been submitted to the Gene Expression Omnibus database.

Simulation of Gene Distribution. The probability to obtain clusters was first simulated by generating random distributions of spots along the genome. The second simulation is based on the random selection of genes from the published mouse genome sequence. To reduce redundancy in this list [University of California, Santa Cruz (UCSC) Genome Browser, known genes track], we filtered out those genes that start before the middle of a previous gene. The upper confidence bounds were derived under the standard binomial model from the number of successful simulations. The procedure was written in the statistical language R and is available under http://ibb.gsf.de/homepage/volkmar.liebscher/genom/mousesim.html.
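The first simulation described above can be illustrated with a minimal sketch. It is written in Python rather than the authors' R procedure, and it rests on several assumptions that are not taken from the text: the genome is treated as a single linear interval of 2.5 Gb with no chromosome boundaries, genes are reduced to points, and clusters are counted greedily as nonoverlapping windows anchored at the leftmost gene of each window. The function names `count_windows` and `simulate` are hypothetical.

```python
import random

GENOME_SIZE = 2_500_000_000  # 2.5 Gb; one linear interval is a simplifying assumption


def count_windows(positions, window, min_genes):
    """Greedily count nonoverlapping windows of length `window` (in bp)
    that contain at least `min_genes` points."""
    pos = sorted(positions)
    n = len(pos)
    count = 0
    i = 0
    while i < n:
        # Advance j past every point lying within `window` bp of pos[i].
        j = i
        while j < n and pos[j] - pos[i] < window:
            j += 1
        if j - i >= min_genes:
            count += 1
            i = j  # jump past this window so that counted windows never overlap
        else:
            i += 1
    return count


def simulate(n_runs, n_genes, window, min_genes, threshold, seed=0):
    """Return the number of random runs with at least `threshold` windows."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_runs):
        positions = [rng.randrange(GENOME_SIZE) for _ in range(n_genes)]
        if count_windows(positions, window, min_genes) >= threshold:
            hits += 1
    return hits
```

A call such as `simulate(10_000, 200, 1_000_000, 2, 29)` would mimic the setting of 200 points and 1-Mb windows; the number of hits it returns depends on the seed and on the counting rule assumed here, so it should not be expected to reproduce the published figures exactly.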

Results

Differential Transcriptome of Mouse Liver and Kidney. To analyze the differential transcriptome of mouse liver and kidney, RNA expression profiling was performed with cDNA microarrays (22) containing a sequence-verified 20,200 mouse clone set (19, 21). Sixteen dual-color DNA chip hybridizations of cDNAs from age-matched C3HeB/FeJ male mice were made (Table 1, chips 1 a-f, 2 a-f, and 3 a-d, which is published as supporting information on the PNAS web site). For each individual mouse, six or four replicate hybridizations were done. Between 16,092 and 19,592 probes had detectable signals in individual chips, and 9,042 probes had signals in all microarrays (Table 1).

We first analyzed the significance of genes with signals on all 16 microarrays. Based on expectations from random permutations of genes and expression ratios, the selection of the top 1,802 differentially expressed genes would include one or more reproducibly regulated (false positive) genes by chance with P < 0.01 (Table 1). Of the 1,802 differentially expressed genes, 821 were more abundant in liver than in kidney, and 981 genes were more abundant in kidney as compared with liver (fully listed in Table 2, which is published as supporting information on the PNAS web site). In addition, genes were ranked based on the lowest absolute signal intensity ratio in 16-chip hybridizations regardless of reproducibility. Although this criterion does not a priori select constant gene expression patterns, no nonreproducible gene expression, in terms of inconsistencies in up- or down-regulation in 16 repetitions, was found within the 470 most strongly differentially expressed genes (Table 3, which is published as supporting information on the PNAS web site). Based on statistics, we expect this selection to contain one or more nondifferentially expressed (NDE) genes with a significance level of P < 0.01. The numbers of actual nonreproducible genes and expected NDE (false positive) genes for P < 0.01 are given for different gene selections in Table 3, confirming the reliability of the data. Additional confidence in the gene expression data is gained from the fact that independent probes for the same gene result in similar expression ratios (see, for example, probes for Cai, Scp2, Mup, Car3, Arg-1, and Akr1a4 in Table 4, which is published as supporting information on the PNAS web site). Also, the specificity of the probes used on our DNA chip was recently assessed experimentally (20).
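The ranking by lowest absolute ratio and the check for nonreproducible regulation can be sketched as follows. This is one possible reading, not the authors' procedure: "lowest absolute signal intensity ratio" is interpreted here as the smallest absolute log2 liver/kidney ratio of a gene across all chips, and a gene is called reproducible when every chip shows regulation in the same direction. Dye-swap handling and normalization are ignored, and the gene names and ratios are invented for illustration.

```python
import math


def min_abs_log_ratio(ratios):
    """Smallest absolute log2 ratio over all chips (assumed ranking score)."""
    return min(abs(math.log2(r)) for r in ratios)


def is_reproducible(ratios):
    """True if all chips agree on the direction of regulation."""
    return all(r > 1 for r in ratios) or all(r < 1 for r in ratios)


def rank_genes(ratio_table):
    """Order genes from strongest to weakest worst-case differential expression.

    ratio_table maps a gene name to its liver/kidney ratios, one per chip.
    """
    return sorted(ratio_table,
                  key=lambda gene: min_abs_log_ratio(ratio_table[gene]),
                  reverse=True)


# Hypothetical ratios on three chips (the study used 16).
example = {
    "geneA": [4.0, 8.0, 4.0],      # consistently higher in liver
    "geneB": [2.0, 0.5, 2.0],      # direction flips on one chip
    "geneC": [0.1, 0.2, 0.125],    # consistently higher in kidney
}
ranking = rank_genes(example)
flagged = [g for g in ranking if not is_reproducible(example[g])]
```

Ranking by the worst chip rather than the mean favors genes whose regulation is strong on every replicate, which is consistent with the observation in the text that the top-ranked genes showed no inconsistencies in direction.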

Colocalization of Differentially Expressed Genes. Analyzing the chromosomal localization we found that the orthologues of the human proximal SERPIN subcluster were reproducibly coexpressed (23). The genes Serpina10 (rank no. 60 in Table 2), Serpina6 (rank no. 110), Serpina1b (rank no. 31), Serpina1d (rank no. 61), Serpina1a (rank no. 75), and Serpina1e (rank no. 53) were strongly expressed in liver but not in kidney.

To make out other potential clusters of coregulated genes, we systematically analyzed the chromosomal localization of the top 200 differentially expressed genes (Fig. 1). The localization was determined by blasting these probe sequences over the October 2003 assembly of the mouse genome by using mouseblat on the UCSC Genome Browser (24).

Fig. 1.
Chromosomal localization of the top 200 differentially expressed genes based on DNA chip expression data. The top 100 genes relatively more abundant in liver are shown in red, and the top 100 genes with higher expression in kidney than in liver are shown ...

We identified 25 genomic regions containing two or three coexpressed genes within <1 Mb (numbered 1 to 25 in Fig. 1) and four regions with at least four coregulated genes within <2 Mb (labeled A to D in Fig. 1). Using in silico simulations (n = 10,000) of the random distribution of 200 points (“genes”) in 2.5 Gb, the size of the mouse genome (25), we derived the upper 95% confidence bound for the probability to obtain at least 29 regions of 1 Mb with at least two genes by chance of P < 0.0005 and to obtain four or more regions with at least four genes in 2 Mb of P < 0.0005. This simulation includes some simplifications, such as neglecting the size of a gene in relation to the genome and assuming an equal distribution of genes along the genome. Thus, we ran a second simulation that is based on the published mouse genome sequence and annotation. For this simulation, we analyzed 10,000 distributions of 200 randomly selected genes from the list of all known genes in the Mouse May 2004 Assembly (26, 27). For each run, we recorded the frequency with which we find at least 29 (respectively four) nonoverlapping windows of 1 Mb (respectively 2 Mb) containing at least two (respectively four) genes. The 95% confidence bounds show that the colocalization is significant because the probability to find 29 small clusters (1 Mb, at least two genes) or to find four larger clusters (2 Mb, at least four genes) by chance is P < 0.03 or P < 0.007, respectively.
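How an upper confidence bound follows from the number of successful simulations can be shown with a short sketch. It assumes a one-sided exact (Clopper-Pearson) bound under the binomial model, which is a common choice but is not confirmed by the text as the authors' exact method. The text also does not report how many of the 10,000 runs were successful; the value of zero used in the example is purely illustrative. The function names are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)


def upper_bound(k, n, alpha=0.05):
    """One-sided upper (1 - alpha) bound for the success probability,
    given k successes in n runs, found by bisection on the binomial CDF."""
    if k >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > alpha:
            lo = mid  # k or fewer successes still plausible: bound lies higher
        else:
            hi = mid
    return hi


# Illustrative case only: no successful run among 10,000 simulations.
bound_zero_hits = upper_bound(0, 10_000)
```

For zero successes the bound has the closed form 1 - 0.05^(1/n), roughly 3/n, which for n = 10,000 is about 0.0003. That is compatible with a statement of the form P < 0.0005, although the actual counts behind the reported bounds are not given in the text.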

Genes of at least 10 clusters are paralogous genes of eight gene families: Carbonic anhydrases (Car, cluster 2), Fibrinogens (Fg, cluster 3), Apolipoproteins (Apo, clusters 6 and 13), Cytochrome P450 family 2 (Cyp2, clusters 7 and 20), Kallikreins (Klk, cluster 8), Serine protease inhibitors (Serpin, cluster D), Interalpha trypsin inhibitors (Itih, cluster 18), and Solute carriers (Slc, cluster 24).

To characterize some clusters of coexpressed genes in more detail, we included in our analysis genes in the intergenic regions as well as genes flanking clusters. For example, the Apolipoprotein cluster on mouse chromosome 7 has an evolutionary-conserved arrangement in mouse and man (28, 29). In the mouse, Apoe, Apoc1, Apoc4, and Apoc2 are localized within an interval of ≈20 kb with the same transcriptional orientation. Genes of this Apolipoprotein cluster are more strongly expressed in liver than in kidney: Apoe is represented by two probes on our DNA chip (rank nos. 72 and 25, Table 2), and Apoc1, Apoc4, and Apoc2 are each represented by one probe (rank nos. 2, 225, and 104, Table 2). The downstream and upstream flanking genes of this Apolipoprotein cluster, Tomm40 and Clptm1, are also represented each by one probe (data not shown). However, they are not differentially regulated between liver and kidney, suggesting that the regulation is confined exclusively to genes of the Apolipoprotein cluster.

Differential Proteome of Mouse Liver and Kidney. Samples from mouse 1 (Tables 1 and 3) were divided such that RNA and protein data were obtained from the identical sample. A total of 2,445 spots were detected in the liver proteome compared with 2,261 spots in the kidney proteome (Fig. 2A). To detect differential protein expression, the quantification of each spot was compared between both organs. With a factor at least 1.5-fold, 366 spots were more abundant in kidney as compared with liver, and 439 spots were more abundant in liver than in kidney.

Fig. 2.
Mastergels of mouse liver and kidney protein extracts (A) and example for the identification of a differential protein signal (B). (A) Each image is a digital mastergel of five 2D gels of identical protein extracts. Proteins more abundant in either liver ...

For subsequent PMF, spots were selected based on stringent criteria allowing a standard deviation of <30% in five replicates and a confidence level of P < 0.05 in the t test (Fig. 2). Based on these criteria, 47 differential spots were selected for protein identification. Seven spots consisted each of two distinct proteins (Fig. 2A). Mass fingerprinting of three spots did not lead to the identification of known proteins, resulting in 51 independent protein identifications (33 in liver and 18 in kidney).
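The selection criteria for differential spots can be expressed as a small filter. The sketch below makes assumptions that go beyond the text: the "standard deviation of <30%" is read as a relative standard deviation (coefficient of variation) within each tissue, the t test is taken to be a two-sided Student's t test with pooled variance, and the 1.5-fold threshold from the preceding spot comparison is applied as well. With five replicates per tissue the test has 8 degrees of freedom, for which the two-sided 5% critical value is 2.306. The function name and the intensities are hypothetical.

```python
import math
import statistics

T_CRIT_DF8 = 2.306  # two-sided 5% critical value of Student's t, 8 degrees of freedom


def select_spot(liver, kidney, min_fold=1.5, max_cv=0.30):
    """Return True if a spot passes the fold-change, variability, and t-test filters.

    liver, kidney: five positive spot intensities each (one per replicate gel).
    """
    if len(liver) != 5 or len(kidney) != 5:
        raise ValueError("this sketch is fixed to five replicates per tissue")
    mean_l, mean_k = statistics.mean(liver), statistics.mean(kidney)
    sd_l, sd_k = statistics.stdev(liver), statistics.stdev(kidney)

    # Variability filter: relative standard deviation below 30% in each tissue.
    if sd_l / mean_l >= max_cv or sd_k / mean_k >= max_cv:
        return False

    # Fold-change filter, independent of the direction of regulation.
    if max(mean_l, mean_k) / min(mean_l, mean_k) < min_fold:
        return False

    # Pooled-variance t statistic for two groups of equal size n = 5.
    pooled_var = (sd_l ** 2 + sd_k ** 2) / 2
    std_err = math.sqrt(pooled_var * 2 / 5)
    if std_err == 0:
        return mean_l != mean_k
    return abs(mean_l - mean_k) / std_err > T_CRIT_DF8


# Hypothetical intensities for one spot on five liver and five kidney gels.
passes = select_spot([100, 110, 95, 105, 90], [200, 210, 190, 205, 195])
```

Comparing the t statistic with a tabulated critical value avoids the need for a statistics library; a full analysis would normally compute the P value directly.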

Six proteins (Krt1-18, Cps1, Mup1, Car3, Vil, and Akr1a4) were identified within either two or three individual spots, suggesting that these major differential proteins are present in different isoforms or with different posttranslational modifications. The 51 identified proteins thus represent 43 distinct proteins (fully listed in Table 4).

Many of the major differentially expressed proteins are characteristic markers for the tissues analyzed. Villin (Vil) is a structural protein localized in the microvilli of brush borders of proximal kidney tubules (30, 31), and aldehyde reductase (Akr1a4), with previously reported strongest transcription in kidney, is functionally involved in the detoxification of reactive aldehyde intermediates (32, 33). In liver, expression of the intermediate filament Keratin 18 (Krt1-18, keratin-type I-cytoskeletal) was previously described in epithelia, and mutations in Keratin 18 have been identified as risk factors for developing liver disease of multiple etiologies (34, 35). The hepatocyte-restricted expression of carbamoyl phosphate synthetase I (Cps1) restricts the urea cycle to liver (36, 37). Carbonic anhydrase 3 (Car3) and major urinary protein 1 (Mup1) are known for their expression and physiological function in liver (38-40). The finding that a considerable number of the identified proteins are characteristic markers for the examined tissues gives confidence in the differential protein data.

Assessment of Transcript and Protein Functions. To compare the functions of differential transcripts and proteins, we collected the functional annotations (biological process and molecular function) of all identified proteins and the top 100 differentially expressed genes in the Mouse Genome Informatics database (Fig. 3 and Table 4). More than 70% of the identified proteins were annotated as metabolic enzymes or associated with biosynthesis. The majority of the remaining functionally annotated proteins were either transport (11% in liver and 6% in kidney) or structural (7% in liver and 6% in kidney) proteins (Fig. 3). The functional categories among transcripts were less dominated by metabolic enzymes (12% in liver and 14% in kidney) and comprised more diverse functional annotations. The latter included genes coding for proteases and protease inhibitors and proteins associated with apoptosis. In particular, 14% of the transcripts relatively abundant in liver had various unique functions. Among differential transcripts, 22% in liver and 36% in kidney were from genes without a functional annotation in the Mouse Genome Informatics database. In contrast, among the identified proteins, 8% in liver and 0% in kidney had no functional annotation (Fig. 3). Because the functional annotations for both proteins and transcripts were derived from the same database and because the distribution of categories of functions is different for proteins and transcripts, the distribution of functional categories does not merely reflect the makeup of the database. Instead it is conceivable that the differences in the relative abundance of functional categories are due to different preferences of the proteomics and transcriptomics methods (see also Discussion).

Fig. 3.
Pie charts of functional categories for genes more abundant in liver or kidney at the protein or transcript level. Data are based on the top 100 differentially expressed transcripts and the 43 identified proteins. Standardized Gene Ontology (GO) classifications ...

Transcriptional Versus Translational Regulation. We assessed the differential expression at the RNA level of those proteins that were identified as relatively abundant in either organ. Of the 43 identified proteins, 37 were represented by at least one probe on our microarray (Table 4). Nine genes were represented by two (Cai, Scp2, Actr3, Car3, Arg-1, Hmgcs2, and Akr1a4), three (Acox1), or four (Mup gene family) probes (Table 4).

In liver, 18 of the 24 proteins (75%) for which DNA probes were present were also significantly more abundant at the transcript level in liver as compared with kidney. In addition, for three genes (Rad23b, Krt1-18, and Gpt1), DNA chip experiments suggested reproducible up-regulation on all slides on which spots could be identified. For two genes (Rnf20 and Actr3) that were highly expressed at the protein level, RNA expression profiling did not indicate differential regulation (Table 4). For the unknown gene, gb|BC026366, only 2 of 16 chips resulted in hybridization signals, not allowing assessment of the transcriptional regulation.

Thirteen of the 18 proteins relatively abundant in kidney were represented by at least one probe on the DNA chip. Five of these genes (38%) were also significantly more abundant at the RNA level in kidney as compared with liver (Atp6v1b2, Arbp, Akr1a4, Oxct, and Tpi). In addition, transcripts for two genes (Vil and Ldh1) were more abundant in kidney as compared with liver but were either not detected on all DNA chips or were reversely regulated on one of 16 chips (Table 4). Acox1 had a tendency to be more strongly expressed in kidney based on DNA chip data. Fumarate hydratase 1 (Fh1) and fumarylacetoacetate hydrolase (Fah), both major proteins detected in kidney, were strongly transcribed in liver but not in kidney, indicating reverse regulation on the transcript and protein levels (Table 4). DNA chip expression profiling did not suggest differential regulation of the remaining three genes (Mtx2, Ak1, and Dnahc11).

Taken together, of the 37 proteins that were also represented by a probe on the microarray, 29 genes (79%) were either clearly regulated with same tendency on all chips (18 in liver and 5 in kidney) or on most chips (3 in liver and 3 in kidney). There was evidence for no transcriptional regulation of five genes and for reverse regulation of two differentially expressed proteins (Fig. 4). DNA chip data did not allow assessment of transcript regulation for one differentially expressed protein, possibly due to low gene expression levels. Although transcriptional and translational regulation correlate positively for the majority of genes, the comparative approach also demonstrates that some proteins are either transcriptionally not differentially regulated or show a reverse transcriptional regulation.

Fig. 4.
Transcript expression of differential proteins. Twenty-nine genes had the same tendency at the transcript level (orange), five were not differentially regulated at the transcript level (pink), two were reversely regulated (light blue), and one was weakly ...

Discussion

Using DNA chip-based expression profiling with >20,200 probes, we identified >1,800 transcripts differentially regulated with high statistical significance between mouse liver and kidney. 2D gel electrophoresis detected around 2,300 spots in each organ. About 800 spots were regulated with a factor of at least 1.5-fold. PMF of 47 isolated spots resulted in the identification of 43 distinct differential proteins. We used this rather comprehensive gene expression data set as a tool to (i) evaluate functions of differential transcripts and proteins, (ii) relate transcriptional and posttranscriptional regulation, and (iii) map differential transcripts to the mouse genome.

The comparison of the functional annotation of the major differential proteins and transcripts suggests that protein and transcript detection methods reveal functional categories with different preference. Metabolic enzymes constitute the largest fraction of identified proteins. A minor fraction is associated with other functions such as transport or structure. These observations corroborate similar findings made, for example, in the analysis of the mouse brain proteome (9, 10). In contrast, differential transcripts have more diverse functions (Fig. 3). On one hand, the relatively low number of diverse functional groups at the protein level may be due to current limitations of the proteome analysis method. We estimate the detection limit of the proteomics approach to at least 1,000 copies of a protein per cell. The proteins detected by 2D gel electrophoresis represent the most abundant proteins. In addition, we selected the most differential spots for protein identification. This experimental limitation is probably one important reason why the detected proteins mostly have metabolic functions. Thus, regarding differences in protein expression, a major distinction between liver and kidney cells seems to be the set of metabolic enzymes activated in the respective tissue. The better sensitivity of DNA chip expression profiling may be one reason why the differential transcripts have more diverse functions. The latter included 22% (liver) and 36% (kidney) novel genes and genes without functional annotation. Thus, DNA chip-based transcriptome analysis may also be an efficient method for the identification of novel disease-associated genes (41, 42).

The comparative approach opened the possibility to relate regulation at the transcript and posttranscriptional levels. In our experimental set-up, we can easily analyze the expression at the transcript level of differentially expressed proteins because all probes on our DNA chip have been sequenced. The reverse, finding the corresponding protein for a differential transcript on the 2D gel, would require specific antibodies or a systematic PMF analysis of all spots on 2D gels. The majority of the differential proteins was also regulated with the same tendency in DNA chip analyses (Fig. 4). This observation suggests that, at least for the most differential proteins, gene expression at the transcript level correlates well with protein expression. Similarly, a close correlation between mRNA and protein expression was suggested in rodent pancreatic islet cells (17) and mitochondria from distinct mouse tissues (43). Five differential proteins (Rnf20, Actr3, Mtx2, Ak1, and Dnahc11) were not regulated at the RNA level, suggesting that the differential expression of these proteins could be due to the stability or differences in secretion or accumulation of these proteins. Moreover, Fh1 and Fah, both abundant proteins in kidney, were strongly transcribed in liver but not in kidney, possibly suggesting different turnover rates or efficiencies of translation in the two tissues. The comparison of gene regulation at the transcript and protein levels thus provides a proof-of-principle for the usefulness of the comparative approach.

Our transcriptome analysis of two functionally diverse tissues led to the identification of >1,000 differentially expressed genes. This high number of regulated genes allowed the assessment of chromosomal colocalizations, resulting in the description of 29 clusters of coexpressed genes. Chromosomal regions of coexpressed genes have also been identified based on expression profiling data in yeast, Caenorhabditis elegans, Drosophila, man, and mouse (44-48). The coregulation of closely linked genes through shared sequence elements in cis (such as enhancers, repressors, insulators, locus control and matrix attachment regions, etc.) has been described for gene families such as apoE, α-globin, β-globin, Hox genes, and others (49-52). Similarly, our expression data identified the proximal Serpin subcluster as linked and differentially regulated genes. The arrangement of these genes is conserved between mouse and man, except that the human SERPINA1 gene has five isoforms in mouse (53). Recently, a control region was identified in the human locus that is required for SERPIN gene activation and for chromatin remodeling of the proximal subcluster (54).

The coregulation of linked genes may be imposed either by sharing cis-regulatory interactions or, alternatively, may be associated with a more general or long-range property of genomic sequences (49, 55, 56). Additional data suggest that at least some of the genes, identified here as coexpressed, may indeed be coregulated through the same regulatory factors. For example, the expression of the Serpin and the Fetuin clusters in liver may at least in part require the same transcription factors. Human HNF3 (Foxa3 in mouse; rank no. 167 in our liver expression data; Table 2) is an essential factor for the transcriptional regulation of many hepatic genes that can affect chromatin structure by displacing linker histones at least in the serum albumin enhancer. It was also suggested to be one of the potential factors regulating expression of SERPIN genes (54, 57, 58). HNF3 binding sites were also identified in the liver-specific FETUIN (AHSG) gene. In the mouse, Ahsg and Fetub are direct neighboring genes within ≈50 kb on chromosome 16 (Chr. 3 in man) (59). Both genes Ahsg (two probes, rank nos. 9 and 11), and Fetub (rank no. 636) were strongly expressed in liver and weakly expressed in kidney (Table 2). Based on these observations, we hypothesize that the colocalization of coexpressed genes in our study may at least in part be of functional relevance.

The clusters of coexpressed genes identified here provide a basis for the identification of common regulatory sequences. They are currently being analyzed systematically in a combination of in silico gene- and region-wise, intra- and interspecies comparative approaches. Predictions on regulatory sequences must be followed by functional mutagenesis studies in vivo.

Acknowledgments

We thank K. Seidel and S. Schädler for excellent technical assistance. This work was supported by funds from Nationales Genomforschungsnetz and Deutsches Humangenomprojekt (to J.B. and M.H.d.A.) and from Sonderforschungsbereich (SFB) 386 (to J.B. and H.V.L.).

Notes

Author contributions: T.M., A.H., T.H., M.K., and M.H. performed research; T.M., A.H., T.H., M.K., T.M.S., H.V.L., and J.B. analyzed data; F.L., M.H.d.A., and J.B. designed research; and T.M. and J.B. wrote the paper.

This paper was submitted directly (Track II) to the PNAS office.

Abbreviation: PMF, peptide mass fingerprinting.

Data deposition: The sequences reported in this paper have been deposited in the Gene Expression Omnibus database (accession nos. GSE1696 and GPL1413).

References

1. Li, S., Armstrong, C. M., Bertin, N., Ge, H., Milstein, S., Boxem, M., Vidalain, P. O., Han, J. D., Chesneau, A., Hao, T., et al. (2004) Science 303, 540-543.
2. Ge, H., Liu, Z., Church, G. M. & Vidal, M. (2001) Nat. Genet. 29, 482-486.
3. Walhout, A. J. & Vidal, M. (2001) Nat. Rev. Mol. Cell Biol. 2, 55-62.
4. Kanapin, A., Batalov, S., Davis, M. J., Gough, J., Grimmond, S., Kawaji, H., Magrane, M., Matsuda, H., Schonbach, C., Teasdale, R. D., et al. (2003) Genome Res. 13, 1335-1344.
5. Lockhart, D. J., Dong, H., Byrne, M. C., Follettie, M. T., Gallo, M. V., Chee, M. S., Mittmann, M., Wang, C., Kobayashi, M., Horton, H., et al. (1996) Nat. Biotechnol. 14, 1675-1680.
6. Pease, A. C., Solas, D., Sullivan, E. J., Cronin, M. T., Holmes, C. P. & Fodor, S. P. (1994) Proc. Natl. Acad. Sci. USA 91, 5022-5026.
7. Schena, M., Shalon, D., Davis, R. W. & Brown, P. O. (1995) Science 270, 467-470.
8. Klose, J., Nock, C., Herrmann, M., Stuhler, K., Marcus, K., Bluggel, M., Krause, E., Schalkwyk, L. C., Rastan, S., Brown, S. D., et al. (2002) Nat. Genet. 30, 385-393.
9. Gauss, C., Kalkum, M., Lowe, M., Lehrach, H. & Klose, J. (1999) Electrophoresis 20, 575-600.
10. Tsugita, A., Kawakami, T., Uchida, T., Sakai, T., Kamo, M., Matsui, T., Watanabe, Y., Morimasa, T., Hosokawa, K. & Toda, T. (2000) Electrophoresis 21, 1853-1871.
11. Pritchard, C. C., Hsu, L., Delrow, J. & Nelson, P. S. (2001) Proc. Natl. Acad. Sci. USA 98, 13266-13271.
12. Jochheim, A., Cieslak, A., Hillemann, T., Cantz, T., Scharf, J., Manns, M. P. & Ott, M. (2003) Differentiation (Berlin) 71, 62-72.
13. McRedmond, J. P., Park, S. D., Reilly, D. F., Coppinger, J. A., Maguire, P. B., Shields, D. C. & Fitzgerald, D. J. (2004) Mol. Cell. Proteomics 3, 133-144.
14. Ruse, C. I., Tan, F. L., Kinter, M. & Bond, M. (2004) Proteomics 4, 1505-1516.
15. Valenzuela, J. G., Francischetti, I. M., Pham, V. M., Garfield, M. K. & Ribeiro, J. M. (2003) Insect Biochem. Mol. Biol. 33, 717-732.
16. Ribeiro, J. M., Charlab, R., Pham, V. M., Garfield, M. & Valenzuela, J. G. (2004) Insect Biochem. Mol. Biol. 34, 543-563.
17. Cardozo, A. K., Berthou, L., Kruhoffer, M., Orntoft, T., Nicolls, M. R. & Eizirik, D. L. (2003) J. Proteome Res. 2, 553-555.
18. Gorg, A. (1993) Biochem. Soc. Trans. 21, 130-132.
19. Seltmann, M., Horsch, M., Drobyshev, A., Chen, Y., Hrabě de Angelis, M. & Beckers, J. (2005) Mamm. Genome 16, 1-10.
20. Drobyshev, A. L., Machka, C., Horsch, M., Seltmann, M., Liebscher, V., Hrabě de Angelis, M. & Beckers, J. (2003) Nucleic Acids Res. 31, E1-1.
21. Beckers, J., Herrmann, F., Rieger, S., Drobyshev, A. L., Horsch, M., Hrabě de Angelis, M. & Seliger, B. (2005) Int. J. Cancer 114, 590-597.
22. Schena, M., Shalon, D., Heller, R., Chai, A., Brown, P. O. & Davis, R. W. (1996) Proc. Natl. Acad. Sci. USA 93, 10614-10619.
23. Forsyth, S., Horvath, A. & Coughlin, P. (2003) Genomics 81, 336-345.
24. Kent, W. J. (2002) Genome Res. 12, 656-664.
25. Waterston, R. H., Lindblad-Toh, K., Birney, E., Rogers, J., Abril, J. F., Agarwal, P., Agarwala, R., Ainscough, R., Alexandersson, M., An, P., et al. (2002) Nature 420, 520-562.
26. Karolchik, D., Baertsch, R., Diekhans, M., Furey, T. S., Hinrichs, A., Lu, Y. T., Roskin, K. M., Schwartz, M., Sugnet, C. W., Thomas, D. J., et al. (2003) Nucleic Acids Res. 31, 51-54.
27. Karolchik, D., Hinrichs, A. S., Furey, T. S., Roskin, K. M., Sugnet, C. W., Haussler, D. & Kent, W. J. (2004) Nucleic Acids Res. 32, D493-D496.
28. Hoffer, M. J., Hofker, M. H., van Eck, M. M., Havekes, L. M. & Frants, R. R. (1993) Genomics 15, 62-67.
29. van Eck, M. M., Hoffer, M. J., Havekes, L. M., Frants, R. R. & Hofker, M. H. (1994) Genomics 21, 110-115.
30. Pinson, K. I., Dunbar, L., Samuelson, L. & Gumucio, D. L. (1998) Dev. Dyn. 211, 109-121.
31. Athman, R., Louvard, D. & Robine, S. (2003) Mol. Biol. Cell 14, 4641-4653.
32. Takahashi, M., Fujii, J., Teshima, T., Suzuki, K., Shiba, T. & Taniguchi, N. (1993) Gene 127, 249-253.
33. Allan, D. & Lohnes, D. (2000) Mech. Dev. 94, 271-275.
34. Ku, N. O., Darling, J. M., Krams, S. M., Esquivel, C. O., Keeffe, E. B., Sibley, R. K., Lee, Y. M., Wright, T. L. & Omary, M. B. (2003) Proc. Natl. Acad. Sci. USA 100, 6063-6068.
35. Moll, R., Franke, W. W., Schiller, D. L., Geiger, B. & Krepler, R. (1982) Cell 31, 11-24.
36. Schofield, J. P., Cox, T. M., Caskey, C. T. & Wakamiya, M. (1999) Hepatology 29, 181-185.
37. Su, A. I., Cooke, M. P., Ching, K. A., Hakak, Y., Walker, J. R., Wiltshire, T., Orth, A. P., Vega, R. G., Sapinoso, L. M., Moqrich, A., et al. (2002) Proc. Natl. Acad. Sci. USA 99, 4465-4470.
38. Parkkila, S., Kivela, A. J., Kaunisto, K., Parkkila, A. K., Hakkola, J., Rajaniemi, H., Waheed, A. & Sly, W. S. (2002) BMC Gastroenterol. 2, 13.
39. Shaw, P. H., Held, W. A. & Hastie, N. D. (1983) Cell 32, 755-761.
40. Clark, A. J., Ghazal, P., Bingham, R. W., Barrett, D. & Bishop, J. O. (1985) EMBO J. 4, 3159-3165.
41. Dhanasekaran, S. M., Barrette, T. R., Ghosh, D., Shah, R., Varambally, S., Kurachi, K., Pienta, K. J., Rubin, M. A. & Chinnaiyan, A. M. (2001) Nature 412, 822-826.
42. Su, A. I., Welsh, J. B., Sapinoso, L. M., Kern, S. G., Dimitrov, P., Lapp, H., Schultz, P. G., Powell, S. M., Moskaluk, C. A., Frierson, H. F., Jr., et al. (2001) Cancer Res. 61, 7388-7393.
43. Mootha, V. K., Bunkenborg, J., Olsen, J. V., Hjerrild, M., Wisniewski, J. R., Stahl, E., Bolouri, M. S., Ray, H. N., Sihag, S., Kamal, M., et al. (2003) Cell 115, 629-640.
44. Cohen, B. A., Mitra, R. D., Hughes, J. D. & Church, G. M. (2000) Nat. Genet. 26, 183-186.
45. Roy, P. J., Stuart, J. M., Lund, J. & Kim, S. K. (2002) Nature 418, 975-979.
46. Caron, H., van Schaik, B., van der Mee, M., Baas, F., Riggins, G., van Sluis, P., Hermus, M. C., van Asperen, R., Boon, K., Voute, P. A., et al. (2001) Science 291, 1289-1292.
47. Spellman, P. T. & Rubin, G. M. (2002) J. Biol. 1, 5.
48. Su, A. I., Wiltshire, T., Batalov, S., Lapp, H., Ching, K. A., Block, D., Zhang, J., Soden, R., Hayakawa, M., Kreiman, G., et al. (2004) Proc. Natl. Acad. Sci. USA 101, 6062-6067.
49. Spitz, F., Gonzalez, F. & Duboule, D. (2003) Cell 113, 405-417.
50. Allan, C. M., Taylor, S. & Taylor, J. M. (1997) J. Biol. Chem. 272, 29113-29119.
51. Engel, J. D. & Tanimoto, K. (2000) Cell 100, 499-502.
52. Li, Q., Harju, S. & Peterson, K. R. (1999) Trends Genet. 15, 403-408.
53. Goodwin, R. L., Barbour, K. W. & Berger, F. G. (1997) Mol. Biol. Evol. 14, 420-427.
54. Marsden, M. D. & Fournier, R. E. (2003) Mol. Cell. Biol. 23, 3516-3526.
55. Li, Q., Peterson, K. R., Fang, X. & Stamatoyannopoulos, G. (2002) Blood 100, 3077-3086.
56. Zakany, J., Kmita, M. & Duboule, D. (2004) Science 304, 1669-1672.
57. Cirillo, L. A. & Zaret, K. S. (1999) Mol. Cell 4, 961-969.
58. Costa, R. H., Grayson, D. R. & Darnell, J. E., Jr. (1989) Mol. Cell. Biol. 9, 1415-1425.
59. Denecke, B., Graber, S., Schafer, C., Heiss, A., Woltje, M. & Jahnen-Dechent, W. (2003) Biochem. J. 376, 135-145.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences