Nucleic Acids Res. Dec 1, 2002; 30(23): 5036–5055.
PMCID: PMC137973

Analysis of histone acetyltransferase and histone deacetylase families of Arabidopsis thaliana suggests functional diversification of chromatin modification among multicellular eukaryotes

Abstract

Sequence similarity and profile searching tools were used to analyze the genome sequences of Arabidopsis thaliana, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Caenorhabditis elegans and Drosophila melanogaster for genes encoding three families of histone deacetylase (HDAC) proteins and three families of histone acetyltransferase (HAT) proteins. Plants, animals and fungi were found to have a single member of each of three subfamilies of the GNAT family of HATs, suggesting conservation of these functions. However, major differences were found with respect to sizes of gene families and multi-domain protein structures within other families of HATs and HDACs, indicating substantial evolutionary diversification. Phylogenetic analysis identified a new class of HDACs within the RPD3/HDA1 family that is represented only in plants and animals. A similar analysis of the plant-specific HD2 family of HDACs suggests a duplication event early in dicot evolution, followed by further diversification in the lineage leading to Arabidopsis. Of three major classes of SIR2-type HDACs that are found in animals, fungi have representatives only in one class, whereas plants have representatives only in the other two. Plants possess five CREB-binding protein (CBP)-type HATs compared with one to two in animals and none in fungi. Domain and phylogenetic analyses of the CBP family proteins showed that this family has evolved three distinct types of CBPs in plants. The domain architectures of the CBP and TAFII250 families of HATs show significant differences between plants and animals, most notably with respect to the occurrence and number of bromodomains. Bromodomain-containing proteins in Arabidopsis differ strikingly from animal bromodomain proteins with respect to the numbers of bromodomains and the other types of domains that are present.
The substantial diversification of HATs and HDACs that has occurred since the divergence of plants, animals and fungi suggests a surprising degree of evolutionary plasticity and functional diversification in these core chromatin components.

INTRODUCTION

Gene expression in eukaryotes involves a complex interplay among transcription factors and chromatin proteins that pack chromosomal DNA into the confined space of the nucleus while poising genes for activation or repression (1). The basic unit of chromatin is the nucleosome core particle, a structure in which ~146 bp of DNA is wrapped around a protein octamer made up of two subunits each of the core histones H2A, H2B, H3 and H4 (2). Core histones can exist in multiple alternative states of acetylation, methylation, phosphorylation, ubiquitination or ADP-ribosylation (3). The regulatory significance of these modifications for processes including gene repression, gene activation and replication is increasingly clear (4–6).

Lysines at the N-terminal ends of the core histones are the predominant sites of acetylation and methylation and a regulatory role for these modifications was proposed as early as 1964 (7). However, decades passed before it was demonstrated that active genes are preferentially associated with highly acetylated histones whereas inactive genes are associated with hypoacetylated histones (8). The N-termini of histones H3 and H4 were subsequently shown to be essential for repression of the silent mating type loci in Saccharomyces cerevisiae (9,10). Enhancer-dependent activation of other S.cerevisiae genes also required these N-terminal sequences (11–13). Collectively, these studies suggested that histones are integral to both gene activation and gene repression mechanisms. A breakthrough was the finding that a Tetrahymena thermophila protein with histone acetyltransferase (HAT) activity shared substantial similarity with S.cerevisiae Gcn5p (14), the catalytic subunit of several multi-protein complexes required to activate a diverse set of genes. A complementary breakthrough was the finding that a purified mammalian histone deacetylase (HDAC) was similar to Rpd3p (15), a protein which helps repress numerous genes in S.cerevisiae (16), also as part of a larger protein complex (17–19). Histone acetylation and deacetylation are thought to exert their regulatory effects on gene expression by altering the accessibility of nucleosomal DNA to DNA-binding transcriptional activators, other chromatin-modifying enzymes or multi-subunit chromatin remodeling complexes capable of displacing nucleosomes (20,21).

Sequence characterization reveals at least four distinct families of HATs and three families of HDACs (3,22,23). HATs include: (i) the GNAT (GCN5-related N-terminal acetyltransferases)-MYST family (24,25) whose members have sequence motifs shared with enzymes that acetylate non-histone proteins and small molecules; (ii) the p300/CREB-binding protein (CBP) co-activator family in animals implicated in regulating genes required for cell cycle control, differentiation and apoptosis (26,27); and (iii) the family related to mammalian TAFII250, the largest of the TATA binding protein-associated factors (TAFs) within the transcription factor complex TFIID (28). These three families are widespread in eukaryotic genomes, and homologous proteins are also involved in non-HAT reactions in prokaryotes and Archaea. Mammals have a fourth HAT family that includes nuclear receptor coactivators such as steroid receptor coactivator (SRC-1) and ACTR, a thyroid hormone and retinoic acid coactivator that is not represented in plants, fungi or lower animals (22,29,30).

Major groups of HDACs include the RPD3/HDA1 superfamily, the Silent Information Regulator 2 (SIR2) family (31) and the HD2 family. RPD3/HDA1-like HDACs are found in all eukaryotic genomes. Interestingly, homologous proteins that have acetate utilization and acetylpolyamine aminohydrolase activities are also present in bacteria and Archaea, organisms that lack histones (17,32). The SIR2 family of HDACs is distinctive in that it has no structural similarity to other HDACs and requires NAD as a cofactor (33). In S.cerevisiae, SIR2 is known to play roles in repression of silent mating type loci (34), repression of rRNA gene recombination (35), and repression of protein-coding genes inserted near telomeres (34) or within rRNA gene arrays (36). Mutations in SIR2 also affect aging and longevity in S.cerevisiae (37,38). SIR2-related proteins form a large family with members present in all kingdoms of life, including bacteria (39). The third family, the HD2-type HDACs, was first identified in maize and appears to be present only in plants (40,41). HD2-type HDACs are homologous to a class of cis–trans prolyl isomerases present in other eukaryotes (42).

Limited information is available concerning the roles of most proteins in the four HAT homology groups and the three HDAC homology groups in control of gene expression in multicellular eukaryotes, especially in plants (43,44). Here, we present phylogenetic and domain analyses of HAT- and HDAC-related proteins identified in searches of the essentially complete Arabidopsis thaliana genome sequence. To test and correct open-reading frames (ORFs) predicted by exon-modeling algorithms, cDNA sequences were determined for most of these proteins. Alternative splicing was demonstrated for 3 of 16 genes encoding HDACs. Together, these data provide a foundation for the functional analysis of these important chromatin-modifying activities in Arabidopsis, as well as in other plants and model organisms.

MATERIALS AND METHODS

Database similarity searches of the Arabidopsis genome and other plant sequences

Known HDAC and HAT protein sequences available from a variety of eukaryotic organisms (Table S1) were used as queries to search the complete Arabidopsis genome sequence (45) using the TBLASTN and TFASTX programs (46,47). To ensure that all homologous genes in these families had been identified, three additional searches were performed. First, all Arabidopsis protein sequences in GenBank, including those predicted by genome annotation, were searched with the query sequences using BLASTP, FASTA and SSEARCH. Secondly, these protein sequences were searched for protein family (Pfam) domains known to be present in previously characterized HDAC and HAT proteins using the program HMMER (http://hmmer.wustl.edu/). Thirdly, predicted Arabidopsis HDAC and HAT proteins were used as queries to search for additional paralogous genes in the Arabidopsis genome sequence using TBLASTN and TFASTX. Sequences having an E-value of 0.01 or less were investigated further. However, this third approach did not find any proteins in addition to those that had already been identified by the initial TBLASTN or TFASTX searches.
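The E-value cutoff described above can be illustrated with a minimal Python sketch. It assumes the search results are available as 12-column tab-separated BLAST tabular output, in which the eleventh column holds the E-value; the original searches may have used a different output format, and the function name and sample identifiers are hypothetical.

```python
import csv
import io

E_VALUE_CUTOFF = 0.01  # threshold stated in the text


def filter_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return (query, subject, evalue) tuples for hits with E-value <= cutoff.

    Assumption: 12-column tab-separated BLAST tabular output, where
    column 11 (index 10) is the E-value. Comment lines start with '#'.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])
        if evalue <= cutoff:
            hits.append((row[0], row[1], evalue))
    return hits


# Hypothetical example: one strong hit and one hit above the cutoff.
sample = (
    "query1\tsubjA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t200\n"
    "query1\tsubjB\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t20\n"
)
kept = filter_hits(sample)
```

Only hits that pass the cutoff would be carried forward for further inspection, mirroring the "investigated further" step in the text.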

Gene nomenclature

The genes identified in this study are listed in Table 2. To designate newly identified genes, we used three-letter symbols that specify the homology group to which a gene belongs as follows: HAG for HATs of the GNAT/MYST superfamily, HAC for HATs of the CBP family, HAF for HATs of the TAFII250 family, HDA for HDACs of the RPD3/HDA1 superfamily, SRT for HDACs of the SIR2 family (sirtuins), and HDT for HDACs of the HD2 family (‘HD-tuins’). To designate individual genes within a homology group, the three-letter symbol is followed by a numeral that does not imply orthology because in many cases it was not possible to determine orthology. To ensure that orthology is not inferred from numerals, a different series of numerals was assigned to different species: A.thaliana genes are indicated by the numerals 1–99, Zea mays by 101–199, S.cerevisiae by 201–299, Caenorhabditis elegans by 301–399, Drosophila melanogaster by 401–499 and Schizosaccharomyces pombe by 601–699 (for other organisms, see Table 2). Names of genes previously assigned in the literature or in GenBank were retained, except in Arabidopsis, for which we propose that the designations defined here should be used. To avoid possible confusion with HDA1 of S.cerevisiae, the Arabidopsis HDA series begins with HDA2.
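The naming scheme can be expressed as a small lookup. The following Python sketch is illustrative only: the function name `describe` and the table layout are hypothetical, it covers only the numeral ranges listed in the paragraph above (other organisms are given in Table 2), and it does not apply to names retained from the literature, which do not follow this scheme.

```python
import re

# Three-letter symbols and the homology groups they denote (from the text).
FAMILY = {
    "HAG": "HAT, GNAT/MYST superfamily",
    "HAC": "HAT, CBP family",
    "HAF": "HAT, TAFII250 family",
    "HDA": "HDAC, RPD3/HDA1 superfamily",
    "SRT": "HDAC, SIR2 family",
    "HDT": "HDAC, HD2 family",
}

# Numeral series assigned to each species (from the text).
SPECIES_RANGES = [
    (1, 99, "Arabidopsis thaliana"),
    (101, 199, "Zea mays"),
    (201, 299, "Saccharomyces cerevisiae"),
    (301, 399, "Caenorhabditis elegans"),
    (401, 499, "Drosophila melanogaster"),
    (601, 699, "Schizosaccharomyces pombe"),
]


def describe(symbol):
    """Split a newly designated gene symbol into (family, species).

    Species is None when the numeral falls outside the ranges listed above.
    """
    match = re.fullmatch(r"([A-Z]{3})(\d+)", symbol)
    if match is None or match.group(1) not in FAMILY:
        raise ValueError(f"not a recognized symbol: {symbol!r}")
    number = int(match.group(2))
    species = next(
        (name for low, high, name in SPECIES_RANGES if low <= number <= high),
        None,
    )
    return FAMILY[match.group(1)], species
```

For example, `describe("HDA403")` yields the RPD3/HDA1 superfamily and Drosophila melanogaster, and `describe("HDA19")` yields the same family in Arabidopsis thaliana.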

Table 2.
Sequence accession numbers for HAT and HDAC genes analyzed

Gene annotation

High quality plant protein sequences for phylogenetic analysis were obtained in several steps. First, the gene prediction programs GeneMark (48), GenScan (49) and NetPlantGene (50) were used to produce gene models for these sequences. From these separate models, a single consensus model was derived. To verify the gene models, RNA gel blots were used to determine the length of the mRNA from each expressed gene. The positions of exons in the consensus model were then tested by analysis of available Arabidopsis EST sequences using the gene prediction tool, GeneSeqer (51). For genes that were not completely represented by EST sequences, EST clones were obtained from the ABRC and Kazusa stock centers and sequenced. Remaining gaps between known cDNA sequences were filled by sequencing RT–PCR amplification products obtained using total RNA as the template and using primers that annealed to predicted coding sequences. Although the actual start codon for each protein has not been identified with certainty, none of the predicted proteins lack known conserved N-terminal or C-terminal domains, suggesting that the modified gene models are reasonably accurate.

cDNA sequence was not determined for HDA10 and HDA17 because these genes are truncated in their HDAC domains, HAC2 because it could not be amplified by RT–PCR, and HAC12 and HAF2 because they are highly similar to HAC1 and HAF1, respectively. HAC12 and HAF2 were annotated according to the splicing models of HAC1 and HAF1. In the case of HAC4, the only transcript we detected carries a premature nonsense codon that would eliminate conserved regions of the protein, although the transcript extends beyond this and contains these conserved regions. Thus, for purposes of the phylogenetic and domain analyses presented here, we have used an algorithm-derived splicing model that predicts the conserved CBP-type HAT domain and cDNA sequence-derived splice junctions in the remainder of HAC4. Alternative splicing products were observed for three genes: HDA2, HDA15 and SRT2. For purposes of the phylogenetic analyses presented here, we used the predicted protein sequence that possessed intact conserved domains (HDA2alt1, HDA15alt1 and SRT2alt1).

The cDNA sequence data for the HAT and HDAC genes have been submitted to the GenBank data library under the following accession numbers: HAC4 (AF512559, AF512560, AH011643), HAC5 (AF512557, AF512558, AH011642), HAG2 (AF512724), HAF1 (AF510669), HDA2alt1 (AF510671), HDA2alt2 (AF510165), HDA7 (AF510166), HDA9 (AF512725), HDA15alt2 (AF510169), HDA15alt3 (AF510170), HDA18 (AF510670), SRT2alt2 (AF510171), SRT2alt3 (AF510172), SRT2alt4 (AF510173), SRT2alt5 (AF510174), SRT2alt6 (AF510175). For the rest of the genes, cDNA sequences submitted by other groups were found in GenBank and were identical to the sequence data generated by the Plant Chromatin Consortium. Their accession numbers are as follows: HAC1 (AF323954), HAG1 (AF338768), HAG3 (AY056323), HAG4 (AY099684), HAG5 (NM_121011), HDA5 (AY090936), HDA6 (AF195548), HDA8 (AY097371), HDA14 (AY052234), HDA15alt1 (NM_112737), HDA19 (AY093153), HDT1 (AF195545), HDT2 (AF044914), HDT3 (AF372889), HDT4 (AF255713), SRT1 (AF283757), SRT2alt1 (AY045873).

Similarity searches of non-plant genomes for HDAC and HAT genes

The genomes of a diverse group of organisms were searched with the query sequences (Table S1), as well as with any Arabidopsis HDAC and HAT sequences showing similarity to these query sequences. First, BLASTP searches of the individual proteomes of baker’s yeast (S.cerevisiae), nematodes (C.elegans), fruit flies (D.melanogaster), and several species of bacteria and Archaea were conducted. Secondly, genomic sequences of humans (Homo sapiens), fission yeast (S.pombe; http://www.sanger.ac.uk/Projects/S_pombe/), and leishmania (Leishmania major; http://www.sanger.ac.uk/Projects/L_major/) were individually searched for homologous sequences using TBLASTN. Thirdly, the public GenBank nr (non-redundant) databases were searched to identify homologs in additional species using BLAST and PSI-BLAST. Fourthly, a large number of plant EST collections (including Z.mays, Oryza sativa, Lycopersicon esculentum, Medicago truncatula, Glycine max, Triticum aestivum, Sorghum bicolor, Gossypium arboreum, Solanum tuberosum, Hordeum vulgare, Lotus japonicus and Mesembryanthemum crystallinum) were searched using TBLASTN. Plant ESTs were assembled into contigs using the FAKtory DNA sequence assembly system (http://bcf.arl.arizona.edu/faktory/) and the contigs were translated into amino acid sequences for further analysis.

Analysis of protein families

Phylogenetic analysis. Protein sequences and domains were aligned using Clustal W (52), edited with Genedoc (http://www.psc.edu/biomed/genedoc/), and an unrooted phylogenetic tree was constructed by the distance method using the neighbor-joining algorithm implemented in the program Neighbor in the PHYLIP (3.5) package (53). The Dayhoff PAM model of protein evolution was used to compute the distances between the sequences (54) using the PROTDIST program. This analysis allowed the identification of the most similar protein sequences in the same or different organisms based upon protein sequence similarity in the multiple sequence alignment. These alignments are available in Figures S1–S3. A paralogous family of sequences was identified by the presence of a cluster of similar sequences from one organism or group of organisms that appeared to have arisen by gene duplication. Assignments of likely orthology were based upon the observation of a high level of sequence similarity among unique sets of sequences present in diverse organisms. In order to assess how well the multiple sequence alignment supported the branch patterns in the predicted phylogenetic tree of the sequences, a bootstrap analysis was performed using PHYLIP. This method resampled columns in the multiple sequence alignment to generate 500–1000 new alignments, each of which was used to produce a new tree. The number of alignments that support each branch pattern in the tree was then assessed and is reported in the appropriate figure. When a clear majority of bootstrap trees (>70%) were in agreement, support was considered to be good. In many cases, bootstrap support was excellent, in the 95–100% range.
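The column-resampling step of the bootstrap can be sketched in a few lines of Python. This is an illustration of the general technique, not the PHYLIP implementation: the function names are hypothetical, the toy alignment is invented, and tree building on each replicate is omitted.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: dict mapping sequence name -> aligned sequence; all
    sequences must have the same length. Returns a new alignment of the
    same length in which every column is drawn at random from the original.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    length = lengths.pop()
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


def support_percent(n_supporting, n_replicates):
    """Percentage of bootstrap trees that contain a given branch pattern."""
    return 100.0 * n_supporting / n_replicates


def is_good_support(percent):
    """Apply the >70% criterion for good support stated in the text."""
    return percent > 70.0


# Toy example (invented sequences): one replicate from a 2-sequence alignment.
rng = random.Random(0)
toy = {"seqA": "ACDE", "seqB": "ACDF"}
replicate = bootstrap_alignment(toy, rng)
```

In a full analysis, each of the 500–1000 replicates would be passed to the tree-building step, and the support for a branch is the fraction of replicate trees in which that branch appears.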

For phylogenetic analysis of the HD2 family (Fig. 5), mRNA and EST sequences encoding the HD2-type HDAC domains were aligned by CLUSTALW. These alignments are available in Figure S4. Following some minor editing to match the codons to the protein sequence alignment, an unrooted tree was produced using the maximum likelihood method as implemented in the DNAML program using the default transition/transversion ratio of 2:1 in the PHYLIP suite.
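A codon-by-codon nucleotide alignment that follows a protein alignment can be produced by threading codons onto the gapped protein sequence. The Python sketch below shows this common approach for a single sequence; it is not necessarily the editing procedure used here, and the function name and example sequence are hypothetical.

```python
def thread_codons(aligned_protein, cdna, gap="-"):
    """Place codons from an ungapped coding sequence onto a gapped protein.

    Each residue in aligned_protein consumes one codon (3 nt) from cdna;
    each gap becomes three gap characters. The coding sequence must
    contain exactly one codon per residue (no stop codon).
    """
    codons = [cdna[i:i + 3] for i in range(0, len(cdna), 3)]
    n_residues = sum(1 for ch in aligned_protein if ch != gap)
    if len(cdna) % 3 != 0 or len(codons) != n_residues:
        raise ValueError("coding sequence does not match the protein length")
    remaining = iter(codons)
    return "".join(
        gap * 3 if ch == gap else next(remaining) for ch in aligned_protein
    )


# Invented example: Met, gap, Lys.
codon_aligned = thread_codons("M-K", "ATGAAA")
```

Applying the same function to every sequence in the protein alignment yields nucleotide rows of equal length in which each protein column corresponds to three nucleotide columns.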

Figure 5
Maximum likelihood analysis of the plant HD2 family nucleic acid sequences. This analysis is based upon a codon-by-codon alignment of the first 273 positions of the maize HD2 cDNA sequence, corresponding to the HDAC domain, with other plant cDNA and ...

Motif analysis of the RPD3/HDA1-related HDACs. To identify common motifs in the HDAC domain, a multiple sequence alignment of representative proteins in each of the three HDAC classes was generated by CLUSTALW. Each multiple sequence alignment was then searched for regions with strongly conserved patterns having high information content (55). Information content was determined by producing a sequence logo using WebLogo (http://www.bio.cam.ac.uk/cgi-bin/seqlogo/logo.cgi). A logo is a graph that displays the amount of information at each column in the alignment and is measured in bits (reduction in uncertainty above background amino acid frequencies). The logo also shows the contribution of each amino acid to this information.
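The information content plotted in a sequence logo can be computed per alignment column. The Python sketch below is a simplified illustration: it assumes a uniform background over the 20 amino acids, ignores gap characters, and applies no small-sample correction, so its values may differ from those reported by WebLogo. The function name is hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=20):
    """Information content (bits) of one alignment column.

    Computed as log2(alphabet_size) minus the Shannon entropy of the
    observed residue frequencies. Assumptions: uniform background,
    gaps ('-') ignored, no small-sample correction.
    """
    residues = [ch for ch in column if ch != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    entropy = -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(residues).values()
    )
    return math.log2(alphabet_size) - entropy


# A fully conserved column carries the maximum, log2(20) bits (about 4.32).
conserved = column_information("HHHH")
# A column split evenly between two residues carries one bit less.
split = column_information("HHDD")
```

Columns with high values correspond to the strongly conserved positions that were sought as candidate motifs.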

Domain analysis of HAT proteins. HAT protein sequences identified in Arabidopsis and in the proteomes of other organisms were analyzed for the presence of any domain present in the Pfam (Protein Family) database. The collection of Pfam hidden Markov model (HMM) profiles for domain families (version 6.5) was downloaded from the Pfam web site. Sequence profile searches were performed using the software HMMER (http://hmmer.wustl.edu/). For certain domains such as the CBP-type HAT domain, for which a Pfam model is not available, a multiple sequence alignment generated using CLUSTALW was examined for the presence of the biochemically-defined HAT domain in human CBP protein (26) and a profile HMM was constructed using programs in the HMMER package. The Predict Protein resource based on neural networks (PHD at http://maple.bioc.columbia.edu/predictprotein/) and the Discrimination of protein secondary structure server (DSC at http://bioweb.pasteur.fr/seqanal/interfaces/dsc-simple.html) were used for predicting the secondary structure of proteins. The KIX domains in CBP-type HAT proteins were further searched against a database of position-specific-scoring-matrices representing conserved structural domains (3D-pssm at http://www.sbg.bio.ic.ac.uk/~3dpssm/) to find similarity with the known KIX domain structure.

RESULTS

Identification of HDAC and HAT proteins and alternative transcripts encoded by the Arabidopsis genome

The Arabidopsis genome sequence was searched for homologs of known HDAC and HAT proteins as described in the Materials and Methods. A total of 16 Arabidopsis HDAC genes and 12 HAT genes were identified (Table 1). Of the 16 HDACs, 10 belong to the RPD3/HDA1 superfamily and were named with the symbol HDA, four belong to the HD2 family and were given the name HDT (‘HD-tuins’), and two belong to the SIR2 family and were named with the symbol SRT. Two additional members of the HDA family were found that have partial HDAC domains. Of the 12 HATs, five belong to the GNAT/MYST superfamily and were named with the symbol HAG, five belong to the CBP family and were named with the symbol HAC, and two belong to the TAFII250 family and were named with the symbol HAF.

Table 1.
Genes encoding HAT and HDAC homologs in Arabidopsis

Consensus gene splicing models were first developed by comparison of several computationally determined models. Because computational methods do not predict all splice sites correctly, cDNA sequences were generated from EST clones and RT–PCR products (see Materials and Methods). In addition to revising splicing models, the cDNA sequence analysis detected multiple splicing products for three HDAC genes (HDA2, HDA15 and SRT2) (Fig. 1). Revised coding sequences, predicted proteins and alternative splicing products are available at the Plant Chromatin Database, ChromDB (http://www.chromdb.org).

Figure 1
Alternative splicing of HDA2, HDA15 and SRT2. Sequence coordinates indicate the position of exons within the unspliced transcripts relative to the start of the ‘alt1’ RT–PCR product sequences. The approximate location of predicted ...

The phylogenetic and domain analyses presented here are based on alternative products designated ‘alt1’ (Fig. 1), each of which is predicted to encode intact, conserved HDAC domains. The HDAC domain is disrupted in alternative transcripts produced by HDA2 and HDA15. SRT2 produced six alternative transcripts via different combinations of the same splice sites, affecting a putative nuclear localization signal and the SIR2 domain. The alternative splice site in the SIR2 domain appears to be evolutionarily conserved because it also occurs in a putative ortholog in tomato. Alternative splicing in the 5′-untranslated region (5′-UTR) of SRT2alt2 and alt5 could affect translation efficiency or mRNA stability. Details are presented in Figure 1.

The RPD3/HDA1 superfamily of HDACs

A total of 10 representatives possessing the complete HDAC domain (Pfam designation PF00850) that defines the RPD3/HDA1 superfamily were identified in Arabidopsis (Table 1). Two additional predicted proteins, HDA10 and HDA17, were found that possess only the 30 and 40 C-terminal amino acids, respectively, of the HDAC domain.

Sequence similarity searches of a variety of eukaryotic and prokaryotic genomes, as well as other sequences available in public databases (including ESTs), led to the identification of a total of 72 RPD3/HDA1 superfamily protein sequences (including 10 in Arabidopsis) that possess an intact HDAC domain. For 80% of these sequences, the 300 amino acid HDAC domain constitutes more than half of the protein. For the remaining 20% of the sequences, additional sequences were present. Searching these larger proteins using Pfam (version 6.5) did not reveal any additional domains, although there is a possibility of the presence of additional domains that have not yet been identified.

Figure 2 shows an unrooted phylogenetic tree illustrating the relationships among the 72 RPD3/HDA1 superfamily proteins (listed in Table 2), produced by aligning their HDAC domains (for double-domain proteins each domain was analyzed separately). The analysis in Figure 2 is based on a mixture of both predicted and experimentally determined protein sequences. In order to confirm these results, the analysis was also performed using only experimentally derived sequences (i.e. those confirmed by cDNA sequences). The clustering patterns and the bootstrap support for these patterns were similar to those shown in Figure 2 (data not shown). The RPD3/HDA1 superfamily, represented by these 76 domain sequences (the four double-domain proteins each contribute two domains), is divided into two major clades with strong bootstrap support (85%). These clades, shown in Figure 2 as two lightly shaded ovals, include three classes of eukaryotic proteins: Classes I and II, both of which have been reported previously based on a smaller number of sequences (56,57), and a new class of proteins, Class III. These Class III proteins include a recently cloned and characterized human HDAC11 (58).

Figure 2
Phylogenetic analysis of the RPD3/HDA1 HDAC superfamily. Unrooted neighbor-joining tree of 76 RPD3/HDA1 superfamily sequences includes four double-domain sequences with each domain being analyzed separately. Confidence levels of the branching ...

The two major clades include proteins from both prokaryotes and eukaryotes. The rightmost clade includes acetylpolyamine aminohydrolase proteins from multiple species of Archaea and bacteria, suggesting that HDAC proteins in this clade could be derived from these prokaryotic proteins. The leftmost clade includes acetoin-utilizing proteins from bacteria (but no Archaea sequences), suggesting that the HDAC proteins in this clade could have originated from these bacterial proteins. Proteins from other lower eukaryotic organisms, including Plasmodium falciparum, T.thermophila and L.major, were present in only the leftmost clade of Figure 2. This evolutionary link between the prokaryotic proteins and the HDACs is also evident at the level of enzymatic activity. HDACs and acetylpolyamine aminohydrolases catalyze the removal of an acetyl group from acetylated aminoalkyls by cleaving an amide bond and reconstituting the positive charge on the substrate; acetoin utilization proteins catalyze deacetylation of acetoin (32).

Class I proteins. The total number of Class I proteins found in the Arabidopsis genome is similar to the numbers found in other sequenced genomes (Table 4). The four Arabidopsis proteins lie within a cluster comprising S.cerevisiae RPD3p and several animal Class I proteins with good bootstrap support (70%) (Fig. 2). Three of the Arabidopsis proteins, along with other plant proteins, group into two branches forming clusters A and B, each with excellent bootstrap support (100%). The proteins in cluster A (including Arabidopsis HDA19) are 73–80% identical at the amino acid level and may comprise an orthologous group. The proteins in cluster B (which includes Arabidopsis HDA6 and HDA7) are somewhat more divergent than the proteins in cluster A (58–74% identical at the amino acid level). The strongly supported separation of clusters A and B suggests the possibility of functional diversification. Because both clusters contain dicot and monocot proteins, they would seem to have originated by gene duplication predating divergence of the monocot and dicot lineages. Immunological data indicate that zmRPD3 (cluster A) and zmHD1b-II (cluster B) are associated with human Rbap46/48-like proteins (59) found in the NuRD and SIN3 HDAC complexes (60).
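Percent identity values such as those quoted for clusters A and B are computed from pairwise alignments. The Python sketch below shows one common convention, counting identical residues over the columns in which both sequences have a residue; the text does not state which denominator was used, so this choice is an assumption, and the function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two aligned sequences.

    Assumption: only columns where neither sequence has a gap are
    counted. Other conventions (e.g. dividing by the full alignment
    length) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap
    ]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# Invented example: three of four aligned residues are identical.
identity = percent_identity("ACDE", "ACDF")
```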

Table 4.
Summary of HDAC and HAT homologs found in plants, fungi and animals

One of the Arabidopsis class I proteins, HDA9, is highly similar at the nucleotide level to HDA10 and HDA17, both of which possess an incomplete HDAC domain. HDA10 lies ~11 kb from HDA9, and the interval between these two genes contains an ORF annotated in GenBank as encoding a ‘disease-resistance-like’ gene. HDA17 lies on a neighboring BAC clone, adjacent to a second copy of this ‘disease-resistance-like’ gene, suggesting that HDA10 and HDA17 were derived from HDA9 by sequence rearrangements that duplicated part of HDA9 and its flanking sequences. These events appear to be relatively recent in evolution, considering that the homologous regions of these three genes are 97% identical at the nucleotide level. Genetic and biochemical analyses will be required to determine whether HDA10 and HDA17 possess some function, perhaps related to that of HDA9, or are non-functional pseudogenes.

Class II proteins. The Arabidopsis genome possesses three Class II proteins (designated HDA5, HDA15 and HDA18), a total similar to that found in other sequenced genomes (Table 4). A subset of Class II proteins found in humans, mice, C.elegans and D.melanogaster are ‘double-domain’ proteins, i.e. they possess two tandem HDAC domains separated by a small, but variable, spacer region. In human and mouse proteins, each domain has been found to be an independently functional catalytic domain (57). Double-domain proteins have not been found in either S.cerevisiae or S.pombe, each of which has a single Class II protein with a single domain. Likewise, the Arabidopsis genome does not contain any double-domain Class II proteins. Recently, HDAC6, a human double-domain protein, has been shown to be a cytoplasmic tubulin deacetylase, not an HDAC (61).

Class II proteins are more divergent in sequence than are Class I proteins, resulting in longer, more poorly supported branches (Fig. 2), and making it impossible to definitively classify orthologous and paralogous groups. Two clusters of plant Class II proteins (indicated by brackets in Fig. 2) can be identified by phylogenetic analysis. HDA5 and HDA18 appear to be more closely related to the double-domain proteins from animals than to HDA15, and so may act on proteins other than histones. Sequence analysis revealed the presence of putative nuclear export signals in HDA5 and HDA18. Similar nuclear export signals in human and mouse class II proteins are known to be involved in shuttling these proteins between an active state in the nucleus and an inactive, phosphorylated state in the cytoplasm (62,63). Interestingly, HDA15 contains a RanBP zinc-finger domain. Such domains have been implicated in nucleocytoplasmic transport and nuclear envelope localization (64).

HDA5 and HDA18 occur immediately adjacent to each other on chromosome V, consistent with a gene duplication event. Their encoded proteins share 84% identity, mostly in the HDAC domain. The coding sequences of these genes share the same splice site positions throughout the HDAC domain which lies toward the 5′ end of the transcript, whereas their C-terminal regions are unrelated to each other. The C-terminal region of HDA5 does not possess any known protein domains, whereas that of HDA18 is predicted to encode a predominantly α-helical domain. This putative domain carries a leucine zipper motif and is similar to structural domains found in filamentous proteins, including coiled-coil dimers and two S.pombe proteins (cut3 and cut14) that are required for chromosome condensation and segregation (65).

A third gene (At5g61050), with partial homology to HDA5 and HDA18 outside the HDAC domain, was also found immediately downstream of HDA5 (Fig. 3). HDA5, HDA18 and At5g61050 are located within a 10 kb segment on chromosome V. The five exons of At5g61050 share similarity with some exons of HDA5 and HDA18; however, the region encoding the HDAC domain is missing in At5g61050, so it is not classified as an HDAC protein. The high degree of sequence identity in homologous regions of the three genes suggests two recent duplications of HDA5 to produce the progenitors of HDA18 and At5g61050. The duplication was apparently followed (or accompanied) by an internal deletion in one gene copy to form At5g61050 and acquisition of repeated sequence elements encoding an α-helical domain in the other gene copy to form HDA18. This gene duplication event is not shared by all the angiosperms, and appears to be unique to a lineage within the dicots including Arabidopsis. Whether this event resulted in diversification of function of Class II proteins remains to be determined.

Figure 3
Schematic representation of the exon–intron and domain organization of the HDA18-HDA5-At5g61050 gene cluster on chromosome V. Coordinates indicate the position of the start and stop codons of the three genes in the P1 clone MAF19 (accession no. ...

Class III: a new class of proteins in the RPD3/HDA1 superfamily. A major finding of our analysis is a new class of HDAC proteins, which we designate Class III, represented in Arabidopsis by HDA2. Class III includes predicted proteins HDA403 from D.melanogaster, HDA308 from C.elegans and HDAC11, an EST contig from humans (Fig. 2) that has been recently identified (58). These proteins are conserved at the amino acid level, being 45% or more identical in pairwise sequence alignments. Additional members of this class were found in the EST database, but were not included in our analysis because their HDAC domains were incomplete. Class III proteins are a part of a cluster that includes three bacterial sequences encoding acetoin utilization proteins (vcAcu and drAcu) and a cyanobacterial glutamine synthetase protein (synGln) (Fig. 2), with good bootstrap support (99%). The presence of a well supported cluster of diverse proteins is consistent with a novel function for class III HDAC proteins in higher eukaryotes, possibly of bacterial origin. No class III proteins were detected in fungal genomes.

Multiple sequence alignments of Classes I, II and III proteins identified conserved motifs within the HDAC domain, with some amino acids common to all HDAC classes and others unique to a particular HDAC class (Fig. 4). A conserved but distinct pattern of amino acids for Class III proteins is evident, providing additional support for a novel biological function for these proteins.

Figure 4
Class III proteins in the RPD3/HDA1 protein superfamily have distinct motifs in the HDAC domain. Alignment of the HDAC domain of Arabidopsis HDA2 protein with human HDAC11, D.melanogaster HDA403 and C.elegans HDA308. These proteins and their ...

Unclassified proteins. The Arabidopsis genome encodes two additional HDAC proteins in the RPD3/HDA1 superfamily, HDA8 and HDA14. Although these proteins fall into the same major clade as Class II proteins, they do not cluster with them (Fig. 2). Instead, they are present in a poorly supported group of highly diverse proteins that includes acetylpolyamine aminohydrolases from the Archaea, as well as S.cerevisiae HDAC protein Hos3p. The low sequence similarity between S.cerevisiae Hos3p and Arabidopsis HDA8 and HDA14 and the poor bootstrap support for this grouping indicate that these proteins are not closely related. Searches of existing genome and EST databases, including plant sequences, using Hos3p, HDA8 and HDA14 as query sequences did not identify any additional proteins in this group.

To determine whether the sequences from Archaea and bacteria influence the classification of these eukaryotic proteins, the tree was regenerated without these sequences. In the resulting tree, S.cerevisiae Hos3p and Arabidopsis HDA14 protein moved into the Class II cluster, but HDA8 did not. This test revealed that S.cerevisiae Hos3p and Arabidopsis HDA14 cannot be assigned to any definitive cluster, but appear to be relatives of Class II proteins. Arabidopsis HDA8 seems to be more closely related to prokaryotic acetylpolyamine aminohydrolase proteins than to Class II; it is possible that this protein might have acetylpolyamine deacetylating activity or other deacetylating activity rather than histone deacetylation activity. In the motif analysis of all three HDAC classes shown in Figure 4, Hos3p, HDA8 and HDA14 share the conserved amino acid positions of Class II proteins, corresponding with their location in the same major clade as Class II proteins.

The HD2 family: unique to plants

Plants possess a family of HDAC proteins, the HD2 family, which is not found in animals or fungi (40) and is distantly related to cis–trans isomerases found in insects, S.cerevisiae and parasitic apicomplexans (42). Using maize HD2 as a query, four candidate proteins, HDT1, HDT2, HDT3 and HDT4, were identified in the Arabidopsis proteome (Table 1). The conserved N-terminus of these proteins contains the HD2-type HDAC domain of approximately 100 amino acids. The proteins consist of a conserved N-terminal domain, a central acidic domain and a variable C-terminal domain. Two of these proteins, HDT1 and HDT2, have been analyzed in a recent paper showing that antisense silencing of HDT1 results in aborted seed development (41). A sequence comparison of Arabidopsis and maize HD2-type proteins has been made by Dangl et al. (66).

Plant EST sequence databases were searched to find HD2-type HDAC proteins in other plant species (listed in Table 2 and Fig. 5). Comparison of the HDAC domains of these proteins revealed a series of highly conserved motifs within the HDAC domain. A phylogenetic analysis of the nucleotide sequences encoding these conserved motifs in the HDAC domains was performed, producing the tree shown in Figure 5. A similar analysis using protein sequences produced a tree with similar topology and the same major features although with varying but somewhat lower bootstrap support than the DNA tree. This analysis permits two general observations to be made concerning the evolution of the HD2 gene family in plants. First, dicot and monocot sequences are separated into two distinct clades strongly supported by bootstrap analysis (98%), indicating that a single HD2 gene in the ancestor of monocots and dicots gave rise to all HD2 proteins in these groups. Secondly, the clustering pattern in dicots is consistent with a gene duplication event occurring before the diversification in dicot evolution that produced the families Solanaceae (tomato and potato), Malvaceae (cotton) and Aizoaceae (ice plant), although this conclusion is only weakly supported by bootstrap analysis (<50%). More recent duplications that are strongly supported by bootstrap analysis are also evident in several species [e.g. Arabidopsis HDT1 and HDT2 (100%), barrel medic HDT1301 and HDT1302 (90%), and maize HD2a, HD2b and HD2c (100%)]. It will be interesting to determine whether the considerable amount of genetic diversification of the HD2 family has been accompanied by functional diversification.
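The bootstrap percentages quoted above measure how often a grouping is recovered when alignment columns are resampled with replacement. The sketch below illustrates that idea only; it is not the procedure used for Figure 5. As a stated simplification, it uses "mutual nearest neighbours by p-distance" as a stand-in for "sister sequences in a tree", and the alignment, sequence names and function names are all hypothetical.

```python
import random


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nearest_neighbor(name: str, aln: dict) -> str:
    """Name of the sequence with the smallest p-distance to `name`."""
    others = [o for o in aln if o != name]
    return min(others, key=lambda o: p_distance(aln[name], aln[o]))


def bootstrap_pair_support(aln: dict, pair: tuple,
                           replicates: int = 200, seed: int = 0) -> float:
    """Percent of column-resampled replicates in which `pair` are
    mutual nearest neighbours (a proxy for a sister-pair grouping)."""
    rng = random.Random(seed)
    names = list(aln)
    length = len(aln[names[0]])
    a, b = pair
    hits = 0
    for _ in range(replicates):
        # Resample alignment columns with replacement.
        cols = [rng.randrange(length) for _ in range(length)]
        rep = {n: "".join(aln[n][c] for c in cols) for n in names}
        if nearest_neighbor(a, rep) == b and nearest_neighbor(b, rep) == a:
            hits += 1
    return 100.0 * hits / replicates


# Hypothetical toy alignment: x1/x2 are identical; y1/y2 differ at one site.
toy_alignment = {
    "x1": "ACGTACGTAC",
    "x2": "ACGTACGTAC",
    "y1": "TGCATGCATG",
    "y2": "TGCATGCATT",
}

support_x = bootstrap_pair_support(toy_alignment, ("x1", "x2"))
support_y = bootstrap_pair_support(toy_alignment, ("y1", "y2"))
print(f"x1+x2 support: {support_x:.0f}%")
print(f"y1+y2 support: {support_y:.0f}%")
```

A real analysis would rebuild a full tree for every replicate and count how often each clade appears; the proxy here only keeps the example short.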

The SIR2 family of HDACs

Plants possess representatives of the SIR2 family of NAD-dependent HDAC proteins, known as sirtuins. Sirtuins occur across a wide range of organisms, including prokaryotes, fungi, plants and animals and are defined by a 175 amino acid domain (Pfam designation PF02146) comprised of a series of conserved motifs. Based on variation in this domain, the eukaryotic proteins fall into four main classes (31). A fifth class is present in some prokaryotes, but most prokaryotic sirtuins fall into Classes II and III (31). A search of the Arabidopsis genome identified two SIR2 family proteins, SRT1 and SRT2, fewer than are found in fungi and animals (Table 4).

In order to identify additional plant sequences for use in a phylogenetic analysis, Arabidopsis SRT1 and SRT2 proteins were used as queries of plant EST collections, revealing six related proteins (Table 2 and Fig. 6). Phylogenetic analysis of all plant SIR2 homologs and homologs from representative species in the Frye (31) classification of SIR2-like proteins is shown in Figure 6. Of the four classes of SIR2 proteins, plant proteins are only found within divergent plant lineages in Classes II or IV. Both classes contain plant and animal proteins but no fungal proteins (Table 4). Class IV includes two divergent animal lineages represented in flies and humans. All plant Class IV proteins cluster in a single, less divergent lineage associated with one of these animal lineages. Both plants and animals have a single lineage of Class II proteins. No plant proteins cluster with proteins of Class I, which includes all five S.cerevisiae sirtuins, as well as homologs in animals and S.pombe.

Figure 6
Phylogenetic analysis of plant SIR2 proteins. Unrooted neighbor-joining tree of 31 SIR2-related proteins shows the four previously identified classes of SIR2 proteins. The two plant protein clusters are highlighted in bold. Confidence levels of the branching ...
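For readers unfamiliar with the neighbor-joining method named in the Figure 6 legend, the following minimal sketch shows a single joining step: compute the Q criterion for every pair of taxa from a distance matrix, select the pair with the smallest Q, and derive the two branch lengths to the new internal node. The taxa names and distances are a hypothetical toy example and have no connection to the SIR2 data; a complete implementation would repeat this step on a reduced matrix until the tree is resolved.

```python
def nj_first_join(names, d):
    """One neighbor-joining step on a symmetric distance matrix `d`.

    Returns the pair to join, the branch lengths from each member of the
    pair to the new internal node, and the minimum Q value.
    Requires at least three taxa.
    """
    n = len(names)
    if n < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    totals = [sum(row) for row in d]  # row sums of the distance matrix
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q criterion: small values mark good candidates for joining.
            q = (n - 2) * d[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    q, i, j = best
    # Branch lengths from taxa i and j to the new internal node.
    len_i = 0.5 * d[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
    len_j = d[i][j] - len_i
    return (names[i], names[j]), (len_i, len_j), q


# Hypothetical toy distances between five taxa (illustration only).
taxa = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

pair, lengths, q_min = nj_first_join(taxa, dist)
print(pair, lengths, q_min)
```

Note that the pair with the smallest raw distance (d and e, distance 3) is not the first pair joined here; the Q criterion corrects each distance for how far the two taxa are from everything else.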

Representation of HATs in the GNAT/MYST superfamily in the Arabidopsis genome

In the GNAT/MYST superfamily of HAT proteins, GNAT proteins are defined by the presence of a HAT domain (Pfam designation PF00583) which is comprised of four motifs, A–D, whereas MYST proteins possess only the A motif of the HAT domain (22).

The GNAT family is generally considered to be comprised of four subfamilies designated GCN5, ELP3, HAT1 and HPA2. The HPA2 subfamily has in vitro histone acetylation activity (67), but it is not yet known whether these proteins play any role in the control of gene expression. In the Arabidopsis genome, we identified a single homolog of each of the GCN5, ELP3 and HAT1 subfamilies (HAG1, HAG3 and HAG2, respectively) and no homolog of the HPA2 subfamily. HAG1 (atGCN5) and its associated adaptor proteins [similar to yeast SAGA complex (22)] in Arabidopsis have been known for their involvement in cold regulated gene expression (68). Searches of the S.cerevisiae, S.pombe, D.melanogaster and C.elegans genomes, as well as the nearly complete human genome, also identified a single representative of the GCN5, ELP3 and HAT1 subfamilies in each; only fungi were found to possess the HPA2 subfamily (Table 4). Thus, Arabidopsis appears to have the same representation of GNAT family HATs as do animals, suggesting that the plant proteins may form complexes similar to those formed in yeast and animals (69).

The Arabidopsis genome was found to encode two MYST family proteins, HAG4 and HAG5. Fungal genomes were found to have two to three, and animal genomes four to six, MYST family proteins. Thus, the number of plant MYST family representatives is within the range found in other eukaryotic organisms, though at the lower end of this range, and below the numbers found in animals (Table 4).

The CREB-binding protein (CBP) family of HATs

The CBP family of HAT proteins is comprised of large, multi-domain proteins (Fig. 7A) which, until recently, had been reported only in animals. The histone acetylation domain of the CBP family is unrelated to that of the GNAT/MYST superfamily; we refer to this as the CBP-type HAT domain. The Arabidopsis genome encodes five CBP-type HAT domain proteins (HAC1, HAC2, HAC4, HAC5 and HAC12), whereas the number of CBP proteins predicted in animals is only one to two (Table 4). The absence of the CBP family in fungi suggests that this type of protein was lost during the evolution of fungi.

Figure 7
Domain architecture of the CBP-type HAT family and phylogenetic analysis of their HAT domains. (A) Schematic representation of the domain organization of Arabidopsis and animal CBP proteins. Different domains are identified by different symbols and colors, ...

Phylogenetic analysis of the plant and animal CBP-type HAT domains indicates an early divergence of HAC2 from the lineage leading to the other four Arabidopsis HAC proteins (Fig. 7B). Consistent with this divergence, in vitro assays of HAC2 did not detect any HAT activity, whereas it was readily detected for HAC1 (70). Similarly, HAC4 has diverged significantly from HAC1, HAC12 and HAC5. Interestingly, the HAT domains of human and mouse CBP proteins are 96% identical, whereas the two closest Arabidopsis CBP paralogs (HAC1 and HAC12) are only 90% identical in the HAT domain.

The domain architecture of CBP-type HAT proteins differs between plants and animals (Fig. 7A) in four major respects. (i) Bromodomains. As was noted also by Bordoli et al. (70), plant CBP-type HATs lack a bromodomain. The role of the bromodomain in the animal proteins is to bind acetylated histones (71). The lack of a bromodomain in the plant proteins suggests that these proteins utilize a different domain to perform this function or that another bromodomain protein acts as a bridge between acetylated histones and CBP-type HATs. (ii) KIX domains. All animal CBP-type HAT proteins possess a KIX domain by which they bind the nuclear factor CREB (72). Bordoli et al. (70) reported that the Arabidopsis proteins lack KIX domains. However, we found a weakly defined KIX-like domain in four of the five Arabidopsis proteins (Fig. 7A). The KIX domain is known to be comprised of three α-helices joined by connecting loops (73). The plant KIX-like domains from HAC1, HAC5 and HAC12 have three α-helices with about the same spacing as in the animal KIX domain, whereas HAC4 has two α-helices. A search of all four plant KIX-like sequences against a database of position-specific-scoring-matrices representing conserved structural domains (3D-pssm) produced a match with the matrix representing the KIX domain. Interestingly, the location of the KIX domain relative to the TAZ-type zinc finger domain in the animal proteins differs from the location of the KIX-like domain relative to this domain in the plant proteins (Fig. 7A). (iii) Zinc finger domains. ZZ and TAZ types of zinc finger domains are found only in CBP-type proteins and are known to mediate protein–protein interactions with transcription factors (74). Animal CBP-type proteins have one ZZ-type zinc-finger domain located near the C-terminal end of the CBP-type HAT domain, whereas all the plant proteins have two such domains, one of which lies within the HAT domain. Both plant and animal proteins possess two TAZ-type zinc fingers, one on each side of the HAT domain. The N-terminal TAZ-type domain is located at a greater distance from the HAT domain in the animal proteins than in the plant proteins. (iv) Glutamine-rich regions. Animal CBP-type HATs possess an extensive glutamine-rich region near the C-terminus, which harbors the binding site for the unrelated mammal-specific HATs, SRC-1 and ACTR (75,76). Plant proteins lack such a C-terminus (70) (Fig. 7A), which is not particularly surprising given that plants lack this family of HATs (22), which we have confirmed by searching the Arabidopsis genome.

The TAFII250 family of HAT proteins

The human TAFII250 protein is a subunit of transcription factor IID (TFIID) (77) and has a HAT domain unrelated to the GNAT/MYST and CBP-type HAT domains. Using animal protein sequences as queries, two Arabidopsis TAFII250 homologs were identified and designated HAF1 and HAF2 (Table 1). These long predicted proteins are 72% identical to each other at the amino acid level. A similar search against the complete C.elegans, D.melanogaster, S.pombe and S.cerevisiae genomes, and the nearly complete human genome, identified only one homolog in each organism. Hence, Arabidopsis is unusual in encoding two predicted TAFII250 HAT proteins.

The human and D.melanogaster proteins have a 260 amino acid long TAFII250-type HAT domain (28). A multiple sequence alignment revealed the presence of a domain in the Arabidopsis and C.elegans proteins that is similar in length to the human and D.melanogaster TAFII250 HAT domains. This domain is 45–75% identical among this group of organisms. A similar type of HAT domain in S.cerevisiae is shorter in length, lacking amino acids at the C-terminus of the domain, but still has HAT activity (28). Thus, the plant proteins are more similar to the animal proteins in this respect than to the fungal proteins.

The overall domain architecture of TAFII250-type proteins in plants, animals and fungi is presented in Figure 8 and shows three interesting features. (i) In addition to the TAFII250-type HAT domain, the human and D.melanogaster proteins have two bromodomain copies on the C-terminal side of the HAT domain, whereas the Arabidopsis proteins possess only a single bromodomain in this region. (ii) A zinc-finger-type C2HC domain is located at an approximately equal distance downstream of the HAT domain in each of the seven sequences, presumably with a role in DNA binding or protein–protein interactions. (iii) A conserved ubiquitin signature at the N-terminal side of the HAT domain was found in each Arabidopsis protein, but not in the animal or fungal proteins. No other Pfam ubiquitin-associated domain was found in the animal or fungal proteins. In D.melanogaster, the region of TAFII230 responsible for ubiquitin-conjugating activity for histone H1 overlaps the TAFII250-type HAT domain (78), and these regions are presumably present in the highly conserved TAFII250-type HAT domains in the Arabidopsis proteins.

Figure 8
Domain architecture of the TAFII250 proteins. A schematic representation is shown of the domain organization of Arabidopsis and animal TAFII250 proteins aligned by the N-terminus of the HAT domain. Different domains are identified by different symbols ...

Arabidopsis bromodomain proteins

Because of the disparity in the number and occurrence of bromodomains between plant and animal HAT proteins, we performed a preliminary search for all bromodomain-containing proteins in Arabidopsis using the bromodomain HMM profile from Pfam. Twenty-nine Arabidopsis bromodomain proteins were found (Table 3), all of which had only a single bromodomain. Although the majority of bromodomain proteins in fungi and animals also possess a single bromodomain, many have from two to five bromodomains (79). Thus, plants lack multi-bromodomain proteins.
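The single-versus-multiple bromodomain tally described above amounts to counting domain hits per protein once a profile search has been run. The sketch below shows only that counting step, under the assumption that search results have already been reduced to (protein, domain) pairs. The protein names, domain labels and hit list are hypothetical and are not taken from Table 3, and no real search tool or output format is modeled.

```python
from collections import Counter


def count_domain(hits, domain):
    """Count how many times `domain` was found in each protein."""
    return Counter(protein for protein, dom in hits if dom == domain)


def split_by_copy_number(counts):
    """Separate proteins with exactly one copy from those with several."""
    single = sorted(p for p, n in counts.items() if n == 1)
    multi = sorted(p for p, n in counts.items() if n > 1)
    return single, multi


# Hypothetical (protein, domain) hits, for illustration only.
toy_hits = [
    ("protA", "Bromodomain"),
    ("protA", "zf-C2HC"),
    ("protB", "Bromodomain"),
    ("protB", "Bromodomain"),
    ("protC", "SET"),
]

bromo_counts = count_domain(toy_hits, "Bromodomain")
single_copy, multi_copy = split_by_copy_number(bromo_counts)
print("single bromodomain:", single_copy)
print("multiple bromodomains:", multi_copy)
```

In this framing, the statement that all 29 Arabidopsis proteins carry a single bromodomain corresponds to an empty `multi_copy` list.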

Table 3.
Arabidopsis bromodomain proteins and associated domains within these proteins

Bromodomain proteins exist in diverse classes defined according to the presence of other domains in those proteins (80). We performed a domain analysis of the 29 Arabidopsis proteins for other Pfam domains. Unlike fungal or animal bromodomain proteins that commonly possess zinc fingers (81), none of the Arabidopsis bromodomain proteins possess any type of zinc finger, such as a PHD domain, with the exception of the C2HC zinc knuckle observed in HAF1 and HAF2. As noted previously, bromodomains are often associated with certain other domain classes in other organisms, whereas the same associations are not observed in Arabidopsis proteins. In the case of CBP-type HATs, animal proteins contain both a bromodomain and multiple zinc fingers, whereas Arabidopsis CBP-type HATs contain only zinc fingers (Fig. 7A). Another interesting difference is that an animal homolog of fly Trithorax-related proteins (mouse protein AAK26242) has a bromodomain associated with a SET domain, whereas no bromodomain protein in Arabidopsis contains a SET domain. Thus, the utilization of bromodomains differs not only in HATs, but also in other types of chromatin protein, in plants as compared to animals and fungi.

DISCUSSION

The Arabidopsis genome is predicted to encode 16 HDAC and 12 HAT proteins, which is somewhat more than the number of such genes found in other sequenced eukaryotic genomes (Table 4). The distribution among different homology groups of HDACs and HATs in Arabidopsis differs from that in fungi and animals in several respects, as summarized in Table 4. Phylogenetic and domain analyses of these proteins predict that some have functionally diversified during plant evolution, whereas others appear to have conserved the functions of their ancestral homologs. In addition, the observed alternative mRNA splicing of three HDAC genes suggests the possibility of further functional diversification of these protein families and a complex relationship between gene number and the actual number of gene products encoded within plant genomes, as also appears to be the case for the human genome (82).
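As a quick consistency check, the total of 12 HAT proteins can be recovered from the family members named in the Results. The short tally below uses only gene names given in the text. The RPD3/HDA1 count is not listed member by member in this part of the paper, so the value computed for it is merely what the stated HDAC total of 16 implies by subtraction, not an independently reported number.

```python
# Arabidopsis HAT family members as named in the Results text.
hat_families = {
    "GNAT": ["HAG1", "HAG2", "HAG3"],
    "MYST": ["HAG4", "HAG5"],
    "CBP": ["HAC1", "HAC2", "HAC4", "HAC5", "HAC12"],
    "TAFII250": ["HAF1", "HAF2"],
}
hat_total = sum(len(members) for members in hat_families.values())

# HDAC families whose members are named individually in the text.
hdac_named = {
    "HD2": ["HDT1", "HDT2", "HDT3", "HDT4"],
    "SIR2": ["SRT1", "SRT2"],
}
HDAC_TOTAL_REPORTED = 16  # total stated in the Discussion

# Implied by subtraction only (assumption: no other HDAC families).
rpd3_hda1_implied = HDAC_TOTAL_REPORTED - sum(
    len(members) for members in hdac_named.values()
)

print("HAT total:", hat_total)
print("RPD3/HDA1 members implied:", rpd3_hda1_implied)
```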

The most obvious indication of diversification of histone acetylation/deacetylation functions in plants as compared to animals and fungi is that plants possess a unique family of HDACs, the HD2 gene family (66). Because no homologs of HD2 are found in any animal or fungal genome, these proteins could serve a novel plant function or could provide a function similar to one carried out by a different type of HDAC in animals and fungi. Our phylogenetic analysis is consistent with a greater degree of functional diversification in the HD2 family in dicots than monocots. This analysis suggests that a gene duplication event may have occurred early in dicot evolution and that further diversification has occurred in the lineage leading to Arabidopsis, suggesting functional diversification of the HD2 subfamily.

We found that the SIR2 family is under-represented in plants as compared to fungi and animals. It is possible that the HD2 family has taken over some of the function(s) of sirtuins. Another possibility is that alternative splicing has provided added diversity of sirtuin functions. Plants possess two classes of sirtuins that are also represented in animals, but not in fungi. The SIR2 family has major biological significance including determining the life span of S.cerevisiae cells and aging in animals (37,38), but its function in plants remains unknown.

Phylogenetic analysis of the RPD3/HDA1 superfamily revealed another similarity between plants and animals, but not fungi, in that both possess representatives of Class III proteins, whereas fungi have none. It is possible that these unclassified proteins have an activity other than histone deacetylation.

The degree of evolutionary change differs significantly among HAT gene families (Table 4). At one extreme, gene number in three subfamilies of the GNAT family is completely conserved. The fourth GNAT subfamily (HPA2) is specific to fungi. At the other extreme, the CBP family has been amplified in plants to five genes as compared to a single representative in most animals, and none in fungi. There are two TAFII250-type proteins in plants as compared to one in fungi and animals. The size of the MYST family ranges from two in Arabidopsis and S.pombe to five in D.melanogaster. Domain and phylogenetic analyses of the CBP-type proteins revealed three classes of these proteins in plants, as compared to a single class in animals, as well as major differences in domain architecture between plant and animal proteins. In addition, HAC2 appears to have diverged early in plant evolution. Its HAT domain appears to have evolved more rapidly than the lineage from which it diverged, and its N-terminal region lacks domains present in other plant CBPs, consistent with in vitro experiments that suggest it does not have HAT activity (70). HAC4 also appears to have evolved more rapidly than the lineage from which it diverged and has distinct features in its N-terminal region.

The Arabidopsis genome encodes proteins homologous to factors in yeast and mammals that associate with HAT complexes SAGA and ADA (GCN5, ADA2 homologs) and HDAC complexes NuRD and SIN3 (RPD3-like, Mi-2, MBD, RbAP46/48 homologs) (see http://www.chromdb.org), suggesting that the Arabidopsis GNAT family HATs and RPD3 family HDACs form complexes similar to those in other organisms (68). In contrast, an analysis of the domain structure of Arabidopsis CBP and TAFII250 proteins suggests that these proteins may form complexes different from their animal relatives. Plant CBP proteins lack a bromodomain, whereas animal CBPs have one, and plant TAFII250 proteins have a single bromodomain, compared to the two bromodomains found in their animal homologs. The plant proteins may utilize a different domain that serves the function of the second animal bromodomain in these proteins or may interact with a different bromodomain protein. A precedent for this possibility can be seen in TAFII145 proteins in S.cerevisiae which do not have a bromodomain, but that interact with Bdf1p. Bdf1p contains two bromodomains and may substitute for the missing C-terminal sequences in the S.cerevisiae TafII145p protein (83). Although we identified a number of bromodomain-containing proteins in Arabidopsis, none of these have enough sequence similarity to Bdf1p to suggest a homologous function. However, the Arabidopsis genome encodes two proteins (SGA1 and SGA2; www.chromdb.org) that are similar to yeast Asf1p. Asf1p interacts with Bdf1p, and its counterpart in humans, CIA/ASF1, interacts with the two bromodomains of human TAFII250 (84). Thus, the possibility exists that one of the many bromodomain proteins in Arabidopsis plays the role of Bdf1p and interacts with an Asf1p homolog. Interestingly, the Arabidopsis genome encodes two TAFII250 proteins and two ASF1 homologs, whereas yeast and animals encode only one of each.

In addition, our analysis of the Arabidopsis genome sequence revealed that all Arabidopsis bromodomain-containing proteins have only a single bromodomain, in contrast to some animal and S.cerevisiae bromodomain proteins that have multiple copies, ranging from two to five bromodomains. Many bromodomain-containing transcription factors also possess a conserved PHD finger (85–87). Our finding of the absence of such a conserved feature in Arabidopsis bromodomain proteins suggests that the manner in which bromodomains are deployed and utilized differs between plants and animals.

Alternative splicing of two RPD3/HDA1 family genes and one SIR2 family gene could indicate alternative regulatory functions of the RNAs or the predicted protein products, different enzymatic or structural functions for the proteins, or no function at all. Alternative splicing that is conserved in Arabidopsis and tomato SIR2 homologs is suggestive evidence for function of an alternative splicing product, but it is also possible that this is a non-functional splicing product, merely an incidental consequence of a conserved RNA sequence.

These evolutionary differences in fundamental chromatin components among plants, animals and fungi suggest that there may be more evolutionary plasticity and more functional diversification in core chromatin components than might have been anticipated just a few years ago. This diversity is likely to reflect important differences in the manner in which chromatin controls gene expression in these three major kingdoms of eukaryotes, and supports the suggestion that plants have developed mechanisms of global gene regulation related to their unique developmental pathways and environmental responses (88).

SUPPLEMENTARY MATERIAL

Supplementary Material is available at NAR Online.


ACKNOWLEDGEMENTS

Expert technical assistance was provided by Rayeann Archibald and Todd Smith for DNA sequencing and Sharon E. Wilensky for RNA gel blot data. We thank Raghavendra K. Guru for assistance in verifying splicing models. We thank our colleagues of the Chromatin Functional Genomics Consortium for their comments, suggestions and support. This publication is based upon work supported by the National Science Foundation under Grant No. 9975930.

Notes

DDBJ/EMBL/GenBank accession nos+

REFERENCES

1. Kadonaga J.T. (1998) Eukaryotic transcription: an interlaced network of transcription factors and chromatin-modifying machines. Cell, 92, 307–313.
2. Kornberg R.D. and Lorch,Y. (1999) Twenty-five years of the nucleosome, fundamental particle of the eukaryote chromosome. Cell, 98, 285–294.
3. Strahl B.D. and Allis,C.D. (2000) The language of covalent histone modifications. Nature, 403, 41–45.
4. Grunstein M. (1997) Histone acetylation in chromatin structure and transcription. Nature, 389, 349–352.
5. Ng H.H. and Bird,A. (2000) Histone deacetylases: silencers for hire. Trends Biochem. Sci., 25, 121–126.
6. Struhl K., Kadosh,D., Keaveney,M., Kuras,L. and Moqtaderi,Z. (1998) Activation and repression mechanisms in yeast. Cold Spring Harb. Symp. Quant. Biol., 63, 413–421.
7. Allfrey V.G., Faulkner,R. and Mirsky,A.E. (1964) Acetylation and methylation of histones and their possible role in regulation of RNA synthesis. Proc. Natl Acad. Sci. USA, 51, 786.
8. Hebbes T.R., Thorne,A.W. and Crane-Robinson,C. (1988) A direct link between core histone acetylation and transcriptionally active chromatin. EMBO J., 7, 1395–1402.
9. Kayne P.S., Kim,U.J., Han,M., Mullen,J.R., Yoshizaki,F. and Grunstein,M. (1988) Extremely conserved histone H4 N terminus is dispensable for growth but essential for repressing the silent mating loci in yeast. Cell, 55, 27–39.
10. Thompson J.S., Ling,X. and Grunstein,M. (1994) Histone H3 amino terminus is required for telomeric and silent mating locus repression in yeast. Nature, 369, 245–247.
11. Durrin L.K., Mann,R.K., Kayne,P.S. and Grunstein,M. (1991) Yeast histone H4 N-terminal sequence is required for promoter activation in vivo. Cell, 65, 1023–1031.
12. Mann R.K. and Grunstein,M. (1992) Histone H3 N-terminal mutations allow hyperactivation of the yeast GAL1 gene in vivo. EMBO J., 11, 3297–3306.
13. Grunstein M. (1992) Histones as regulators of genes. Sci. Am., 267, 68B–74B.
14. Brownell J.E., Zhou,J., Ranalli,T., Kobayashi,R., Edmondson,D.G., Roth,S.Y. and Allis,C.D. (1996) Tetrahymena histone acetyltransferase A: a homolog to yeast Gcn5p linking histone acetylation to gene activation. Cell, 84, 843–851.
15. Taunton J., Hassig,C.A. and Schreiber,S.L. (1996) A mammalian histone deacetylase related to the yeast transcriptional regulator Rpd3p. Science, 272, 408–411.
16. Suka N., Carmen,A.A., Rundlett,S.E. and Grunstein,M. (1998) The regulation of gene activity by histones and the histone deacetylase RPD3. Cold Spring Harb. Symp. Quant. Biol., 63, 391–399.
17. Kadosh D. and Struhl,K. (1998) Histone deacetylase activity of Rpd3 is important for transcriptional repression in vivo. Genes Dev., 12, 797–805.
18. Kadosh D. and Struhl,K. (1998) Targeted recruitment of the Sin3–Rpd3 histone deacetylase complex generates a highly localized domain of repressed chromatin in vivo. Mol. Cell. Biol., 18, 5121–5127.
19. Kadosh D. and Struhl,K. (1997) Repression by Ume6 involves recruitment of a complex containing Sin3 corepressor and Rpd3 histone deacetylase to target promoters. Cell, 89, 365–371.
20. Guschin D., Wade,P.A., Kikyo,N. and Wolffe,A.P. (2000) ATP-dependent histone octamer mobilization and histone deacetylation mediated by the Mi-2 chromatin remodeling complex. Biochemistry, 39, 5238–5245.
21. Fuks F., Burgers,W.A., Brehm,A., Hughes-Davies,L. and Kouzarides,T. (2000) DNA methyltransferase Dnmt1 associates with histone deacetylase activity. Nature Genet., 24, 88–91.
22. Sterner D.E. and Berger,S.L. (2000) Acetylation of histones and transcription-related factors. Microbiol. Mol. Biol. Rev., 64, 435–459.
23. Imhof A., Yang,X.J., Ogryzko,V.V., Nakatani,Y., Wolffe,A.P. and Ge,H. (1997) Acetylation of general transcription factors by histone acetyltransferases. Curr. Biol., 7, 689–692.
24. Neuwald A.F. and Landsman,D. (1997) GCN5-related histone N-acetyltransferases belong to a diverse superfamily that includes the yeast SPT10 protein. Trends Biochem. Sci., 22, 154–155.
25. Candau R., Moore,P.A., Wang,L., Barlev,N., Ying,C.Y., Rosen,C.A. and Berger,S.L. (1996) Identification of human proteins functionally conserved with the yeast putative adaptors ADA2 and GCN5. Mol. Cell. Biol., 16, 593–602.
26. Bannister A.J. and Kouzarides,T. (1996) The CBP co-activator is a histone acetyltransferase. Nature, 384, 641–643.
27. Giles R.H., Peters,D.J. and Breuning,M.H. (1998) Conjunction dysfunction: CBP/p300 in human disease. Trends Genet., 14, 178–183.
28. Mizzen C.A., Yang,X.J., Kokubo,T., Brownell,J.E., Bannister,A.J., Owen-Hughes,T., Workman,J., Wang,L., Berger,S.L., Kouzarides,T. et al. (1996) The TAF(II)250 subunit of TFIID has histone acetyltransferase activity. Cell, 87, 1261–1270.
29. Leo C. and Chen,J.D. (2000) The SRC family of nuclear receptor coactivators. Gene, 245, 1–11.
30. Xu L., Glass,C.K. and Rosenfeld,M.G. (1999) Coactivator and corepressor complexes in nuclear receptor function. Curr. Opin. Genet. Dev., 9, 140–147.
31. Frye R.A. (2000) Phylogenetic classification of prokaryotic and eukaryotic Sir2-like proteins. Biochem. Biophys. Res. Commun., 273, 793–798.
32. Leipe D.D. and Landsman,D. (1997) Histone deacetylases, acetoin utilization proteins and acetylpolyamine amidohydrolases are members of an ancient protein superfamily. Nucleic Acids Res., 25, 3693–3697.
33. Imai S., Armstrong,C.M., Kaeberlein,M. and Guarente,L. (2000) Transcriptional silencing and longevity protein Sir2 is an NAD-dependent histone deacetylase. Nature, 403, 795–800.
34. Aparicio O.M., Billington,B.L. and Gottschling,D.E. (1991) Modifiers of position effect are shared between telomeric and silent mating-type loci in S. cerevisiae. Cell, 66, 1279–1287.
35. Gottlieb S. and Esposito,R.E. (1989) A new role for a yeast transcriptional silencer gene, SIR2, in regulation of recombination in ribosomal DNA. Cell, 56, 771–776.
36. Smith J.S. and Boeke,J.D. (1997) An unusual form of transcriptional silencing in yeast ribosomal DNA. Genes Dev., 11, 241–254.
37. Guarente L. (2000) Sir2 links chromatin silencing, metabolism and aging. Genes Dev., 14, 1021–1026.
38. Guarente L. and Kenyon,C. (2000) Genetic pathways that regulate ageing in model organisms. Nature, 408, 255–262.
39. Brachmann C.B., Sherman,J.M., Devine,S.E., Cameron,E.E., Pillus,L. and Boeke,J.D. (1995) The SIR2 gene family, conserved from bacteria to humans, functions in silencing, cell cycle progression and chromosome stability. Genes Dev., 9, 2888–2902.
40. Lusser A., Brosch,G., Loidl,A., Haas,H. and Loidl,P. (1997) Identification of maize histone deacetylase HD2 as an acidic nucleolar phosphoprotein. Science, 277, 88–91.
41. Wu K., Tian,L., Malik,K., Brown,D. and Miki,B. (2000) Functional analysis of HD2 histone deacetylase homologues in Arabidopsis thaliana. Plant J., 22, 19–27.
42. Aravind L., Koonin,E.V., Dangl,M., Lusser,A., Brosch,G., Loidl,A., Haas,H. and Loidl,P. (1998) Second family of histone deacetylases. Science, 280, 1167a.
43. Lusser A., Kolle,D. and Loidl,P. (2001) Histone acetylation: lessons from the plant kingdom. Trends Plant Sci., 6, 59–65.
44. Graessle S., Loidl,P. and Brosch,G. (2001) Histone acetylation: plants and fungi as model systems for the investigation of histone deacetylases. Cell. Mol. Life Sci., 58, 704–720.
45. The Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature, 408, 796–815.
46. Altschul S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
47. Pearson W.R., Wood,T., Zhang,Z. and Miller,W. (1997) Comparison of DNA sequences with protein sequences. Genomics, 46, 24–36.
48. Borodovsky M. and McIninch,J. (1993) Recognition of genes in DNA sequence with ambiguities. Biosystems, 30, 161–171.
49. Burge C. and Karlin,S. (1997) Prediction of complete gene structures in human genomic DNA. J. Mol. Biol., 268, 78–94. [PubMed]
50. Hebsgaard S.M., Korning,P.G., Tolstrup,N., Engelbrecht,J., Rouze,P. and Brunak,S. (1996) Splice site prediction in Arabidopsis thaliana pre-mRNA by combining local and global sequence information. Nucleic Acids Res., 24, 3439–3452. [PMC free article] [PubMed]
51. Usuka J., Zhu,W. and Brendel,V. (2000) Optimal spliced alignment of homologous cDNA to a genomic DNA template. Bioinformatics, 16, 203–211. [PubMed]
52. Thompson J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680. [PMC free article] [PubMed]
53. Felsenstein J. (1989) PHYLIP—Phylogeny inference package version (3.2). Cladistics, 5, 164–166.
54. Dayhoff M.O., Schwartz,R.M. and Orcutt,B.C. (1978) A model of evolutionary change in proteins. In Dayhoff,M.O. (ed.), Atlas of Protein Sequence and Structure. National Biomedical Research Foundation, Washington, DC, Vol. 5, Suppl 3, 345–352.
55. Schneider T.D. and Stephens,R.M. (1990) Sequence logos—a new way to display consensus sequences. Nucleic Acids Res., 18, 6097–6100. [PMC free article] [PubMed]
56. Rundlett S.E., Carmen,A.A., Kobayashi,R., Bavykin,S., Turner,B.M. and Grunstein,M. (1996) HDA1 and RPD3 are members of distinct yeast histone deacetylase complexes that regulate silencing and transcription. Proc. Natl Acad. Sci. USA, 93, 14503–14508. [PMC free article] [PubMed]
57. Grozinger C.M., Hassig,C.A. and Schreiber,S.L. (1999) Three proteins define a class of human histone deacetylases related to yeast Hda1p. Proc. Natl Acad. Sci. USA, 96, 4868–4873. [PMC free article] [PubMed]
58. Gao L., Cueto,M.A., Asselbergs,F. and Atadja,P. (2002) Cloning and functional characterization of HDAC11, a novel member of the human histone deacetylase family. J. Biol. Chem., 277, 25748–25755. [PubMed]
59. Lechner T., Lusser,A., Pipal,A., Brosch,G., Loidl,A., Goralik-Schramel,M., Sendra,R., Wegener,S., Walton,J.D. and Loidl,P. (2000) RPD3-type histone deacetylases in maize embryos. Biochemistry, 39, 1683–1692. [PubMed]
60. Ahringer J. (2000) NuRD and SIN3 histone deacetylase complexes in development. Trends Genet., 16, 351–356. [PubMed]
61. Hubbert C., Guardiola,A., Shao,R., Kawaguchi,Y., Ito,A., Nixon,A., Yoshida,M., Wang,X.F. and Yao,T.P. (2002) HDAC6 is a microtubule-associated deacetylase. Nature, 417, 455–458. [PubMed]
62. Grozinger C.M. and Schreiber,S.L. (2000) Regulation of histone deacetylase 4 and 5 and transcriptional activity by 14-3-3-dependent cellular localization. Proc. Natl Acad. Sci. USA, 97, 7835–7840. [PMC free article] [PubMed]
63. Verdel A., Curtet,S., Brocard,M.P., Rousseaux,S., Lemercier,C., Yoshida,M. and Khochbin,S. (2000) Active maintenance of mHDA2/mHDAC6 histone-deacetylase in the cytoplasm. Curr. Biol., 10, 747–749. [PubMed]
64. Vetter I.R., Nowak,C., Nishimoto,T., Kuhlmann,J. and Wittinghofer,A. (1999) Structure of a Ran-binding domain complexed with Ran bound to a GTP analogue: implications for nuclear transport. Nature, 398, 39–46. [PubMed]
65. Saka Y., Sutani,T., Yamashita,Y., Saitoh,S., Takeuchi,M., Nakaseko,Y. and Yanagida,M. (1994) Fission yeast cut3 and cut14, members of a ubiquitous protein family, are required for chromosome condensation and segregation in mitosis. EMBO J., 13, 4938–4952. [PMC free article] [PubMed]
66. Dangl M., Brosch,G., Haas,H., Loidl,P. and Lusser,A. (2001) Comparative analysis of HD2 type histone deacetylases in higher plants. Planta, 213, 280–285. [PubMed]
67. Angus-Hill M.L., Dutnall,R.N., Tafrov,S.T., Sternglanz,R. and Ramakrishnan,V. (1999) Crystal structure of the histone acetyltransferase Hpa2: a tetrameric member of the Gcn5-related N-acetyltransferase superfamily. J. Mol. Biol., 294, 1311–1325. [PubMed]
68. Stockinger E.J., Mao,Y., Regier,M.K., Triezenberg,S.J. and Thomashow,M.F. (2001) Transcriptional adaptor and histone acetyltransferase proteins in Arabidopsis and their interactions with CBF1, a transcriptional activator involved in cold-regulated gene expression. Nucleic Acids Res., 29, 1524–1533. [PMC free article] [PubMed]
69. Ogryzko V.V. (2001) Mammalian histone acetyltransferases and their complexes. Cell. Mol. Life Sci., 58, 683–692. [PubMed]
70. Bordoli L., Netsch,M., Luthi,U., Lutz,W. and Eckner,R. (2001) Plant orthologs of p300/CBP: conservation of a core domain in metazoan p300/CBP acetyltransferase-related proteins. Nucleic Acids Res., 29, 589–597. [PMC free article] [PubMed]
71. Dhalluin C., Carlson,J.E., Zeng,L., He,C., Aggarwal,A.K. and Zhou,M.M. (1999) Structure and ligand of a histone acetyltransferase bromodomain. Nature, 399, 491–496. [PubMed]
72. Parker D., Ferreri,K., Nakajima,T., LaMorte,V.J., Evans,R., Koerber,S.C., Hoeger,C. and Montminy,M.R. (1996) Phosphorylation of CREB at Ser-133 induces complex formation with CREB-binding protein via a direct mechanism. Mol. Cell. Biol., 16, 694–703. [PMC free article] [PubMed]
73. Radhakrishnan I., Perez-Alvarado,G.C., Parker,D., Dyson,H.J., Montminy,M.R. and Wright,P.E. (1997) Solution structure of the KIX domain of CBP bound to the transactivation domain of CREB: a model for activator:coactivator interactions. Cell, 91, 741–752. [PubMed]
74. Ponting C.P., Blake,D.J., Davies,K.E., Kendrick-Jones,J. and Winder,S.J. (1996) ZZ and TAZ: new putative zinc fingers in dystrophin and other proteins. Trends Biochem. Sci., 21, 11–13. [PubMed]
75. Yao T.P., Ku,G., Zhou,N., Scully,R. and Livingston,D.M. (1996) The nuclear hormone receptor coactivator SRC-1 is a specific target of p300. Proc. Natl Acad. Sci. USA, 93, 10626–10631. [PMC free article] [PubMed]
76. Kamei Y., Xu,L., Heinzel,T., Torchia,J., Kurokawa,R., Gloss,B., Lin,S.C., Heyman,R.A., Rose,D.W., Glass,C.K. et al. (1996) A CBP integrator complex mediates transcriptional activation and AP-1 inhibition by nuclear receptors. Cell, 85, 403–414. [PubMed]
77. Ruppert S., Wang,E.H. and Tjian,R. (1993) Cloning and expression of human TAFII250: a TBP-associated factor implicated in cell-cycle regulation. Nature, 362, 175–179. [PubMed]
78. Pham A.D. and Sauer,F. (2000) Ubiquitin-activating/conjugating activity of TAFII250, a mediator of activation of gene expression in Drosophila. Science, 289, 2357–2360. [PubMed]
79. Dyson M.H., Rose,S. and Mahadevan,L.C. (2001) Acetyllysine-binding and function of bromodomain-containing proteins in chromatin. Front. Biosci., 6, D853–D865. [PubMed]
80. Jeanmougin F., Wurtz,J.M., Le Douarin,B., Chambon,P. and Losson,R. (1997) The bromodomain revisited. Trends Biochem. Sci., 22, 151–153. [PubMed]
81. Jones M.H., Hamana,N., Nezu,J. and Shimane,M. (2000) A novel family of bromodomain genes. Genomics, 63, 40–45. [PubMed]
82. Black D.L. (2000) Protein diversity from alternative splicing: a challenge for bioinformatics and post-genome biology. Cell, 103, 367–370. [PubMed]
83. Matangkasombut O., Buratowski,R.M., Swilling,N.W. and Buratowski,S. (2000) Bromodomain factor 1 corresponds to a missing piece of yeast TFIID. Genes Dev., 14, 951–962. [PMC free article] [PubMed]
84. Chimura T., Kuzuhara,T. and Horikoshi,M. (2002) Identification and characterization of CIA/ASF1 as an interactor of bromodomains associated with TFIID. Proc. Natl Acad. Sci. USA, 99, 9334–9339. [PMC free article] [PubMed]
85. Venturini L., You,J., Stadler,M., Galien,R., Lallemand,V., Koken,M.H., Mattei,M.G., Ganser,A., Chambon,P., Losson,R. et al. (1999) TIF1gamma, a novel member of the transcriptional intermediary factor 1 family. Oncogene, 18, 1209–1217. [PubMed]
86. Schultz D.C., Friedman,J.R. and Rauscher,F.J.,3rd (2001) Targeting histone deacetylase complexes via KRAB-zinc finger proteins: the PHD and bromodomains of KAP-1 form a cooperative unit that recruits a novel isoform of the Mi-2alpha subunit of NuRD. Genes Dev., 15, 428–443. [PMC free article] [PubMed]
87. Bochar D.A., Savard,J., Wang,W., Lafleur,D.W., Moore,P., Cote,J. and Shiekhattar,R. (2000) A family of chromatin remodeling factors related to Williams syndrome transcription factor. Proc. Natl Acad. Sci. USA, 97, 1038–1043. [PMC free article] [PubMed]
88. Meyerowitz E.M. (2002) Plants compared to animals: the broadest comparative study of development. Science, 295, 1482–1485. [PubMed]
89. Sherman J.M., Stone,E.M., Freeman-Cook,L.L., Brachmann,C.B., Boeke,J.D. and Pillus,L. (1999) The conserved core of a human SIR2 homologue functions in yeast silencing. Mol. Biol. Cell, 10, 3045–3059. [PMC free article] [PubMed]

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press