Archaea. 2006 August; 2(1): 11–30.
Published online 2006 January 23.
PMCID: PMC2685588
Distribution, structure and diversity of “bacterial” genes encoding two-component proteins in the Euryarchaeota
Mark K. Ashby*1,2
1 Department of Basic Medical Sciences, Biochemistry Section, University of the West Indies, Mona Campus, Kingston 7, Jamaica
2 School of Biological and Chemical Sciences, Queen Mary University of London, Mile End Road, London E1 4NS, U.K.
* Corresponding author (Email: m.ashby/at/qmul.ac.uk)
Received June 13, 2005; Accepted January 12, 2006.
The publicly available annotated archaeal genome sequences (23 complete and three partial annotations, October 2005) were searched for the presence of potential two-component open reading frames (ORFs) using gene category lists and BLASTP. A total of 489 potential two-component genes were identified. Two-component genes were found in 14 of the 21 Euryarchaeal sequences (October 2005) and in neither the Crenarchaeota nor the Nanoarchaeota. A total of 20 predicted protein domains were identified in the putative two-component ORFs; in addition to the histidine kinase and receiver domains, these include sensor and signalling domains. The detailed structure of these putative proteins is shown, as is the distribution of each class of two-component genes in each species. Potential members of orthologous groups have been identified, as have any potential operons containing two or more two-component genes. The number of two-component genes in those Euryarchaeal species which have them seems to be linked more to lifestyle and habitat than to genome complexity, with most examples being found in Methanospirillum hungatei, Haloarcula marismortui, Methanococcoides burtonii and the mesophilic Methanosarcinales group. The large numbers of two-component genes in these species may reflect a greater requirement for internal regulation. Phylogenetic analysis of orthologous groups of five different protein classes, three probably involved in regulating taxis, suggests that most of these ORFs have been inherited vertically from an ancestral Euryarchaeal species and points to a limited number of key horizontal gene transfer events.
Keywords: histidine kinase, hybrid kinase, response regulator
Two-component systems are one of the key means by which bacteria respond to environmental changes (Hoch 2000, Stock et al. 2000, Alves and Savageau 2003, Hellingwerf 2005). They are assumed to be of bacterial origin, having radiated into archaea and some eukaryotes by horizontal gene transfer (HGT) (Koretke et al. 2000). Two-component systems consist of a sensor and a response protein. The sensor protein is characterized by a histidine kinase (HK) made up of two main domains, a phosphoacceptor (HisKA) and a histidine kinase ATPase (HATPase) and, in many cases, other sensory domains are present (Galperin et al. 2001, Zhulin et al. 2003). The response protein (response-regulator, RR) is characterized by a response regulator domain that has a conserved aspartate residue. The histidine kinase autophosphorylates a conserved histidine residue in response to a signal and the phosphate group is then transferred to the conserved aspartate residue of the response-regulator. The transfer of the phosphate group to the response-regulator elicits a response causing a change in taxis, development or gene expression. Histidine kinases and response-regulators are sometimes found together in a single polypeptide known as a hybrid kinase (Hoch 2000, Stock et al. 2000).
The recognition of the Archaea as a distinct division of life has been strengthened by the availability of a number of complete genome sequences, representing three phyla (Euryarchaeota, Crenarchaeota and Nanoarchaeota). This has, in turn, enabled a more rigorous phylogenetic analysis based on the fusion of ribosomal protein sequences (Matte-Tailliez et al. 2002, Brochier et al. 2004, Bapteste et al. 2005, Makarova and Koonin 2005) and clusters of conserved orthologous genes (COGs) (Makarova and Koonin 2003). Analysis of genome sequences has revealed genes of bacterial origin in the genomes of archaea and vice versa (Nelson et al. 1999). The importance of HGT in the evolution of prokaryotes and the implications for phylogeny and definition of species is still being discussed (Ochman et al. 2000, Forterre et al. 2002, Boucher et al. 2003, Koonin 2003, Kurland et al. 2003, Lawrence and Hendrickson 2003).
For bacteria, it has been shown that the number of two-component genes possessed by an organism is related to the complexity of its genome, its physiology and the changeability of its habitat (Ashby 2004, Galperin 2005). The greater the value of any of those parameters, the greater the need for regulation of cellular activities.
The aim of this study was to analyze the complement of genes in archaeal genomes that could encode two-component proteins. The putative two-component proteins were classified by their domain structure, and the number of each class was determined for each species. Potential orthologous groups and those that may be part of operons, with two or more two-component genes, are indicated. Phylogenetic analysis of possible orthologous groups representing five classes of protein, three associated with taxis, is shown.
Genome sequence data
The list of publicly available Euryarchaeal genome sequences is shown in Table 1, along with brief details of the habitat, physiology, genome size, putative number of open reading frames (ORFs) and the abbreviation used with gene sequences (Makarova and Koonin 2003). The annotated sequences and their annotations (up to October 2005) were accessed at the Integrated Microbial Genomes (IMG) server (http://img.jgi.doe.gov/pub/main.cgi) and HaloLex (http://www.halolex.mpg.de/). The identity of potential two-component genes was determined by reference to the published assignments located at http://www.tigr.org/tdb/ (Bult et al. 1996, Smith et al. 1997, Kawarabayasi et al. 1998, Klenk et al. 1997, Ng et al. 2000, Deppenmeier et al. 2002, Slesarev et al. 2002, Galagan et al. 2002, Cohen et al. 2003, Baliga et al. 2004, Falb et al. 2005). This was supplemented by BLASTP (Altschul et al. 1997) searches of each genome with a battery of two-component domains (domains used include receivers from CheY and OmpR, HisKA/HATPase and Hpt; see Table A1) from Methanosarcina acetivorans and E. coli K12 at IMG (http://img.jgi.doe.gov/pub/main.cgi), The Integrated Genome Resource (TIGR, http://www.tigr.org/tdb/) or the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov/).
Table 1.
Strain description of members of the Euryarchaea for which there are publicly available genome sequences as of October 2005. The gene name prefix is used with the gene designations to aid in (more ...)
Bioinformatic analysis
Putative two-component domains were initially assigned in ORFs by Pfam batch analysis (http://www.sanger.ac.uk/Software/Pfam/; Bateman et al. 2004). Domains were recorded for each two-component gene only if they met Pfam’s trusted match thresholds. Domain assignments were checked and modified using the more extensive domain assignments at InterPro (http://www.ebi.ac.uk/interpro/). The results were used to classify the putative two-component proteins by domain organization using a nomenclature adapted from Ohmori et al. (2001), with each group subdivided by the organization of the identified signalling domains. The cartoon-style diagrams that present the domain organization of the deduced sequences were constructed from these data, with the sizes of the domains roughly in proportion to each other. For clarity, each gene name begins with a four-character acronym (except for Haloarcula marismortui, Pyrococcus abyssi GE5 and Pyrococcus horikoshii OT3, see Table 1) followed by either the locus tag that can be found at IMG or HaloLex (http://img.jgi.doe.gov/pub/main.cgi and http://www.halolex.mpg.de/) or the gene object identifier if the sequencing or annotation is incomplete.
Orthologous groups were determined from the orthology information accessible at IMG (http://img.jgi.doe.gov/pub/main.cgi), which is based on the bidirectional best hits from BLASTP searches of the polypeptides of each organism against those of each other organism. This definition is not completely accurate, but it provides a useful approximation, as it is not always possible to know whether the polypeptides arose from a single gene present in the last common ancestor (orthologues) or from a gene duplication within a genome (paralogues). Alignments for phylogenetic analysis were performed with TCoffee (Notredame et al. 2000), accessed at the Centre National de la Recherche Scientifique website (http://igs-server.cnrs-mrs.fr/Tcoffee/tcoffee_cgi/index.cgi), and ClustalW alignments (Thompson et al. 1994) were performed at the European Bioinformatics Institute (http://www.ebi.ac.uk/clustalw/). Representatives from three bacterial phyla were included in the alignments (chosen by having the best match to one of the archaeal ORFs, either as an orthologue or by BLASTP at IMG). Phylogenetic analysis by neighbor-joining (Bootstrap 250) was performed using MEGA version 3.0 (Kumar et al. 2004) and by maximum-likelihood (Felsenstein 1996) using Molphy, accessed at the Institut Pasteur, biological software website (http://bioweb.pasteur.fr/intro-uk.html#phylo).
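The bidirectional-best-hit criterion used for the orthology assignments can be illustrated with a minimal Python sketch. It is not the IMG implementation; the input format (query, subject, bit score), the function names and the gene identifiers A1–A3 and B1–B2 are all hypothetical placeholders chosen for illustration.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bit_score) tuples, e.g. parsed
    from a BLASTP result table (the format is an assumption here).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical hits between two genomes, A and B (placeholder names and scores).
a_vs_b = [("A1", "B1", 250.0), ("A1", "B2", 90.0),
          ("A2", "B2", 120.0), ("A3", "B1", 200.0)]
b_vs_a = [("B1", "A1", 245.0), ("B2", "A2", 118.0), ("B2", "A1", 95.0)]

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
```

In this toy input A3 finds B1 as its best hit, but B1's best hit is A1, so A3 is left out of the reciprocal pairs. This is the kind of case in which a paralogue may be excluded from, or wrongly placed in, a putative orthologous group.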
Closely linked two-component genes and probable operons that contain two or more two-component genes were identified from the chromosome map images at IMG (http://img.jgi.doe.gov/pub/main.cgi), TIGR (http://www.tigr.org/tdb/) and the biology of extremophiles website (http://www-archbac.u-psud.fr/homepage.html).
The structural classification of potential two-component proteins is shown in Table 2. No two-component encoding gene could be found in the Crenarchaeota or Nanoarchaeota (data not shown). Sensor domains are drawn as ellipses, and two-component (HisKA, HATPase_c and response regulator) and output domains are drawn as rectangles. Parentheses followed by figures indicate the number of similar domains that may be found in the proteins listed in each subclass.
Table 2.
Compilation and cartoon diagrams of all putative archaeal open reading frames encoding two-component proteins. Abbreviations; Cach = Cache; ConA = ConA-like glucanase; Gly = Glycos_transf_2; HKA (more ...)
Figure T2a.
Compilation and cartoon diagrams of all putative archaeal open reading frames encoding two-component proteins. (more ...)
Figure T2b.
Continued from Table 2. Compilation and cartoon diagrams of all putative archaeal open reading frames encoding two-component proteins. Abbreviations; Cach = Cache; ConA = ConA-like glucanase; Gly (more ...)
Figure T2c.
Continued from Table 2. Compilation and cartoon diagrams of all putative archaeal open reading frames encoding two-component proteins. Abbreviations; Cach = Cache; ConA = ConA-like glucanase; Gly (more ...)
Histidine kinases
Different types of histidine kinases (HK) are listed in Table 2A. Histidine kinases contain two domains: a dimerization and phosphoacceptor domain (HisKA or HisKA_2) and a HATPase_c domain (Grebe and Stock 1999). HisKA and HisKA_2 are part of a His kinase A phosphoacceptor domain superfamily that also includes HWE_HK and HisKA_3 (Karniol and Vierstra 2004; Pfam accession CL0025).
HKI
Histidine kinase Is are HKs containing HisKA and HATPase domains. There may be other domains in some of these examples which are not currently recognized. The HKI ORFs vary greatly in size, ranging from 175 to 592 amino acids in length.
HKII
Histidine kinase IIs are HKs containing sensor GAF and PAS/PAC domains. The GAF domains (cGMP phosphodiesterase, adenylyl cyclases, bacterial transcription factors FhlA) are associated with small molecule binding, in particular cAMP and cGMP (Aravind and Ponting 1997, Ho et al. 2000, Anantharaman et al. 2001). The GAF domain is usually found in combination with a PAS (Drosophila period clock protein, vertebrate aryl hydrocarbon receptor nuclear translocator and Drosophila single-minded protein) or PAC (PAS-associated C-terminal motif) domain, or both. One class of PAS domains is known to bind cofactors such as heme and FAD (Bibikov et al. 2000, Sardiwal et al. 2005). Sensing of light, oxygen or redox potential by PAS domains requires cofactors, whereas sensing signals such as voltage, xenobiotics and nitrogen availability does not (Ponting and Aravind 1997, Gilles-Gonzalez and Gonzalez 2004). The PAC domains are proposed to contribute to the PAS domain fold. The shared feature of GAF and PAS/PAC domains is the binding of a diverse set of regulatory small molecules that often remain unidentified; all three domains are common signal transduction system components (Anantharaman and Aravind 2001, Zhulin et al. 2003). There is one example containing Cache and one containing SBP_bac_3 (bacterial extracellular solute-binding proteins, family 3). The Cache domain is a signalling domain found in animal calcium channel subunits and it is thought to form an extracellular or periplasmic ligand sensor (Anantharaman and Aravind 2001). SBP_bac_3 is involved in active transport of solutes across the cytoplasmic membrane and in the initiation of signal transduction pathways (Tam and Saier 1994). This is by far the largest subgroup of ORFs, containing 161 out of the total of 489 (33% of the total).
HKIII
Histidine kinase IIIs are HKs that possess a HAMP “linker” (histidine kinase, adenylyl cyclase, methyl-accepting chemotaxis protein and phosphatase) domain. The HAMP domain is usually associated with the transmission of a signal across a membrane from periplasmic ligand-binding domains (Aravind and Ponting 1999, Appleman and Stewart 2003, Zhu and Inouye 2004). Eight examples of HKIIIs have an N-terminal putative periplasmic signalling CHASE4 domain and four have an N-terminal periplasmic signalling Cache domain (Anantharaman and Aravind 2001, Zhulin et al. 2003). These domains are positioned next to the HAMP domain, presumably for efficient transfer of the signal.
HKVI
Histidine kinase VIs are the CheA-like chemotaxis signalling proteins that contain an N-terminal Hpt (histidine phosphotransfer) and Hkd (histidine kinase dimerization) domain and a C-terminal CheW domain. Some contain one or two P2 domains between the Hpt and Hkd. The Hpt domain is involved in mediating phosphotransfer from one receiver domain to another (Hoch 2000). Hkd (H-kinase-dim) is the dimerization domain of CheA and CheW that interacts with methyl-accepting chemotaxis proteins (MCPs), relaying signals to CheY, and thereby affecting flagellar rotation (West et al. 1995). The P2 domain is involved in enhancing the interaction of CheY with the HK (Jahreis et al. 2004, Stewart and van Bruggen 2004). Thermococcus kodakaraensis has two open reading frames, separated by a frame shift mutation, that together probably encode a CheA-like protein. All of the HKVI genes discussed are located close to other genes that could be involved with signal transduction and are probably transcribed as single operons (see Table A2).
HATPase_c
These contain no dimerization or phosphoacceptor domains currently recognized at InterPro.
His_KA
There are five groupings that contain His_KA without a discernible HATPase_c domain.
Response regulators
Response regulators (RR) are listed in Table 2B. These contain a characteristic receiver (RR/T_reg) domain, which is about 120 amino acids long and contains a conserved aspartate residue about halfway along the molecule that accepts a phosphate group from an HK.
RRI
Response regulator Is are simple orphan (no other domain detected) RRs, representing the second largest group of two-component ORFs (24% of the total).
RRIII
Response regulator IIIs contain an RR fused to a potential DNA-binding domain. Such regulators are found only in H. marismortui. Of these, there are only three examples, which contain either the HTH_10 or DUF24 domain (PF04967 and PF01638). These are the only RRs that are possibly transcriptional regulators, but there may be other currently unidentified DNA-binding domains in other RRs or hybrid kinases.
RRIV
Response regulator IVs contain an N-terminal RR fused to output or signal domains. There are 16 examples of CheB fused to the RR. The CheB domain is related to methylesterase and is likely to be concerned with chemotaxis (West et al. 1995). There is one example of two RRs fused to a glycosyl transferase domain in M. thermoautotrophicus (Pfam Accession number: PF00353). The glycosyl transferase domain is involved in transferring sugar moieties from a donor to recipient molecules. There are many Methanospirillum hungatei ORFs fused with PAS/PAC or GAF domains, or both; however, as the annotation is incomplete, some of these ORFs may turn out to be part of hybrid kinases.
Hybrid kinases
Hybrid kinases (HY) are shown in Table 2C. They are defined as containing both HK and RR domains. The nomenclature is based on the position and number of RRs with respect to the HK. There is an incomplete HYI in A. fulgidus that has a PAC/PAS and GAF sensor domain, but no discernible HATPase domain.
HYI
Hybrid kinase Is have a single RR N-terminal to the HK.
HYII
Hybrid kinase IIs have a single RR C-terminal to the HK. There is only one example in M. acetivorans.
HYIII
Hybrid kinase IIIs have two RRs, either N- or C-terminal to the HK.
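The hybrid kinase nomenclature described above can be sketched as a small classification rule over an ordered domain list. This is an illustrative reading of the three definitions, not the procedure used to build Table 2; the domain labels, the function name and the choice to treat any ORF with exactly two RR domains as HYIII are assumptions.

```python
# Labels taken to mark the histidine kinase core (an assumption for this sketch).
HK_DOMAINS = {"HisKA", "HisKA_2", "HATPase_c"}


def classify_hybrid(domains):
    """Classify a hybrid kinase from its domains listed N- to C-terminus.

    Returns "HYI", "HYII", "HYIII", or None when the ORF is not a hybrid
    kinase under these simplified rules.
    """
    rr_positions = [i for i, d in enumerate(domains) if d == "RR"]
    hk_positions = [i for i, d in enumerate(domains) if d in HK_DOMAINS]
    if not rr_positions or not hk_positions:
        return None  # needs both a receiver and a kinase core
    if len(rr_positions) == 1:
        # One RR: HYI if it lies N-terminal to the kinase core, else HYII.
        return "HYI" if rr_positions[0] < min(hk_positions) else "HYII"
    if len(rr_positions) == 2:
        return "HYIII"  # two RRs on either side of the kinase core
    return None


# Hypothetical domain architectures, written N- to C-terminus.
examples = {
    "n_terminal_rr": ["RR", "PAS", "HisKA", "HATPase_c"],
    "c_terminal_rr": ["PAS", "HisKA", "HATPase_c", "RR"],
    "two_rr": ["HisKA", "HATPase_c", "RR", "RR"],
    "orphan_rr": ["RR"],
}
classes = {name: classify_hybrid(arch) for name, arch in examples.items()}
```

Sensor domains such as PAS or GAF are ignored by the rule, which matches the statement that the class depends only on the number and position of RRs relative to the HK.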
Distribution of putative two-component ORFs
The total number of ORFs within each class of two-component proteins, for each species of Euryarchaeota, is shown in Table 3. No two-component ORFs were found in the four Crenarchaeota species or N. equitans (data not shown). No two-component ORF was found in M. jannaschii, M. kandleri, P. furiosus or in any of the members of the Thermoplasmales. The three other Pyrococcus species each have only three two-component ORFs, representing 0.17% of the protein-coding capacity of the genome (the Thermococcus kodakaraensis HKVI, which has a frame shift, Tkod_61070420/30, that could be a sequencing error, has been counted as one). Methanococcus maripaludis and Halobacterium also have a small number, six (0.34%) and 16 (0.64%), respectively. Archaeoglobus fulgidus, M. thermoautotrophicus and the four Methanosarcinales groups have a comparatively large number of two-component ORFs, from 23 to 67. This represents from 1.03 to 1.48% of the coding capacity of the four complete genome assignments. Of the complete annotations, Haloarcula marismortui has the largest number of two-component encoding genes, which represents 1.93% of the total protein coding capacity of the genome. Methanospirillum hungatei appears to have the largest number of two-component genes at 87, though the annotation of the genome is incomplete (so no percentage is given in Table 3). The HKs form half to two-thirds of the two-component ORFs for each species (except Pyrococcus sp. and Methanospirillum hungatei). Putative DNA-binding domains were detected only as part of the RRs in H. marismortui.
Table 3.
Table 3.
Distribution of euryarchaeal two-component open reading frames. The total number of identified two-component genes are shown (Total 2-C) and are given as a percentage of the total protein (more ...)
The PAS/PAC and GAF sensory domains are found in 293 of the 489 putative proteins surveyed. These sensory domains are absent in the Pyrococcus sp. A total of 18 ORFs were found that contain the HAMP domain that would in most cases be involved in transferring signals from sensor domains detecting information outside the cell.
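The proportions quoted in the text can be checked with a few lines of arithmetic. The sketch below uses only counts stated in this article (161 HKII ORFs, 293 ORFs with PAS/PAC or GAF domains and 18 with HAMP, each out of 489); the helper name and dictionary labels are illustrative.

```python
def percent_of_total(count, total, digits=1):
    """Return count as a percentage of total, rounded to `digits` places."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * count / total, digits)


TOTAL_ORFS = 489  # putative two-component ORFs surveyed

shares = {
    "HKII subgroup": percent_of_total(161, TOTAL_ORFS),
    "PAS/PAC or GAF": percent_of_total(293, TOTAL_ORFS),
    "HAMP": percent_of_total(18, TOTAL_ORFS),
}
```

The same helper would give the percentage of coding capacity for a species if the total ORF count from Table 1 were supplied as `total`.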
Orthologous groups
Potential orthologous groups are shown in Table 4. These results are based on the bidirectional best hits from BLASTPs at IMG. The identification of orthologous groups at IMG may not be correct in all cases as some groupings may include ORFs that are due to gene duplication, hence a paralogue (in a different organism) rather than an orthologue. It is, nevertheless, a useful tool for assigning putative orthologous groups when no functional information is available. The groups have been named with a three letter acronym for ease of reference (Table 4). In arr18/19, RRIV-CheB, the grouping was modified from the information at IMG based on the phylogenetic analysis presented in Figure 4 (see below). There are many orthologous groups that contain two or three members, particularly within the Methanosarcinales. The more interesting groups are those that have more members or that have members in different genera. These will be discussed below, particularly those that are part of taxis operons (ahk40–43, arr7, arr10, arr18 and arr19).
Table 4.
Potential orthologous two-component open reading frames.
Figure 4.
Phylogenetic analysis by neighbor-joining of putative two-component open reading frames (ORFs) from response regulator IV group arr18 and arr19. Group arr18 = MaceMA0015, MbarA_0985 and (more ...)
Phylogenetic analysis
Figures 1 to 5 show the neighbor-joining analyses, and the supplementary Figures A1 to A5 the maximum likelihood phylogenetic analyses, of alignments from TCoffee analysis. Phylogenetic analysis of the same ORFs was also performed on alignments made by ClustalW (data not shown), but the results were not found to differ significantly. Figure 1 contains the ORFs from the two HKII groups, ahk5 and ahk22, with the three closest bacterial ORFs (to MaceMA2890) from Cyanobacteria, Firmicutes and Proteobacteria. Figure 2 is composed of ORFs from the four HKVI ‘CheA like’ groups, ahk40–43, and the three closest bacterial ORFs (to MaceMA0014) from Thermotogae, Firmicutes and Proteobacteria. Figure 3 is the analysis of a number of RRI orphans, arr7 and arr10 (from putative taxis operons), arr12 and arr14 (> 200 amino acids) with bacterial ORFs from Thermotogae, Firmicutes and Proteobacteria (closest to MaceMA3068). Figure 4 shows the results for the two RRIV CheB orthologous groups from taxis operons, arr18 and arr19, with bacterial representatives from Thermotogae, Proteobacteria and Actinobacteria (closest to MaceMA0015). Figure 5 shows results for ahy2 and bacterial representatives from Cyanobacteria, Actinobacteria and Proteobacteria.
Figure 1.
Phylogenetic analysis by neighbor-joining of putative two-component open reading frames (ORFs) from histidine kinase II groups ahk5 and ahk22. Group ahk5 = MaceMA2890, MbarA_ 2935, (more ...)
Figure 3.
Phylogenetic analysis by neighbor-joining of putative two-component open reading frames (ORFs) from the response regulator I orphans arr7, arr10, arr12 and arr14. Group arr7 = (more ...)
Figure 5.
Phylogenetic analysis by neighbor-joining of putative two-component open reading frames (ORFs) from hybrid kinase 1 group ahy2. The bacterial ORFs are Gviolglr3434, RrubRruA1653 and SaverSAV3017. (more ...)
Figure 2.
Phylogenetic analysis by neighbor-joining of putative two-component open reading frames (ORFs) from the histidine kinase VI groups ahk40–43. Group ahk40 = MaceMA3066, MmazMM0328, (more ...)
Linked genes
Genes that are located close to each other on the genome and transcribed in the same orientation are shown in Table A2. Most of these are likely to be part of operons. This provides clues to some cognate pairs of HKs, HYs and RRs. All putative HKVI encoding genes are located with other “chemotaxis” genes in “chemotaxis operons,” including two such operons for M. acetivorans and M. mazei. Included in these “chemotaxis operons” are the orthologous groups, ahk40–43, arr7, arr10 and arr18 and arr19 (Table 4).
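The criterion for linked genes (adjacent on the genome and transcribed in the same orientation) can be sketched as a simple grouping pass over gene coordinates. The linked genes in Table A2 were taken from chromosome map images rather than computed this way; the `max_gap` threshold of 200 bp, the `Gene` record and the coordinates below are all hypothetical.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # first base of the gene on the chromosome
    end: int     # last base of the gene on the chromosome
    strand: str  # "+" or "-"


def linked_clusters(genes, max_gap=200):
    """Group neighbouring genes on the same strand into candidate operons.

    Consecutive genes are linked when they share a strand and the
    intergenic gap is at most max_gap bases (an assumed cut-off).
    Only clusters of two or more genes are returned.
    """
    clusters, current = [], []
    for gene in sorted(genes, key=lambda g: g.start):
        if (current and gene.strand == current[-1].strand
                and gene.start - current[-1].end <= max_gap):
            current.append(gene)
        else:
            if len(current) >= 2:
                clusters.append([g.name for g in current])
            current = [gene]
    if len(current) >= 2:
        clusters.append([g.name for g in current])
    return clusters


# Hypothetical genes on a single chromosome (placeholder names and coordinates).
genes = [
    Gene("g1", 100, 900, "+"), Gene("g2", 950, 1800, "+"),
    Gene("g3", 1850, 2500, "-"),
    Gene("g4", 5000, 5600, "-"), Gene("g5", 5650, 6400, "-"),
]
clusters = linked_clusters(genes)
```

Here g3 is close to g2 but lies on the opposite strand, and it is far from g4, so it joins neither cluster. Proximity and orientation alone only suggest co-transcription; they do not demonstrate an operon.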
Distribution of two-component ORFs
Ten species, representing four genera, have at least 17 putative two-component ORFs. Some of these two-component ORFs are quite sophisticated in structure, including the multiple sensor HKIIs, CheA-like HKVIs and the hybrid kinases. The results presented here show that a number of euryarchaeal species have an extensive array of two-component sensory ORFs. These proteins may sense a number of different internal signals by means of PAS/PAC domains and their associated cofactors (Ponting and Aravind 1997, Gilles-Gonzalez and Gonzalez 2004). In addition, the potential to sense other small molecules (particularly cNMPs) via the GAF domains (Aravind and Ponting 1997, Ho et al. 2000, Anantharaman et al. 2001) and extracellular signals, by the CHASE4 and Cache putative sensory domains, via HAMP domains (Anantharaman and Aravind 2001, Zhulin et al. 2003) shows that these organisms (in particular H. marismortui, Natronomonas pharaonis, Methanospirillum hungatei and the Methanosarcinales) possess sophisticated and complex sensory networks. As yet, none of these putative two-component genes has a functional name, so functions can be assigned only by similarity. DNA-binding RRs that regulate gene expression are common in bacteria (Ashby 2004, Galperin 2005). However, only three RRs have been identified with putative DNA-binding domains, all in H. marismortui. If regular indiscriminate HGT were taking place, one would expect to see more DNA-binding RRs in archaeal sequences. Presumably the large number of orphan RRs are involved in regulation of cellular activity by interacting directly with other proteins. Transcriptional control is probably maintained by the many DNA-binding domains that have been identified as part of one-component systems in archaea (Ulrich et al. 2005). In these systems the DNA-binding output domain is linked directly to a sensor domain without any phosphotransfer.
Of the species that have the most two-component genes, H. marismortui and Natronomonas pharaonis are halophilic and the Methanosarcinales and Methanospirillum hungatei are mesophiles. The mesophiles coexist with a large and diverse population of bacteria, giving ample opportunity for HGT, whereas the opportunity for HGT in the halophilic organisms would be more restricted. This raises the question of how the distribution of two-component genes that can be seen in the Euryarchaeota arose. Was it through HGT exclusively or by vertical transfer from a common ancestral euryarchaeal organism coupled with gene duplications?
Phylogeny and inheritance of two-component ORFs
The phylogenetic analysis of five different sets of orthologous ORFs, chosen because they are found in most of the species that contain two-component ORFs (Figures 1–5), was found to closely match the published phylogenies for these organisms (Matte-Tailliez et al. 2002, Brochier et al. 2004, Bapteste et al. 2005).
For ahk5 and ahk22, shown in Figure 1, the phylogeny of each group agrees with the current phylogeny of these organisms, and the position of the three bacterial examples indicates that the two groups may each have arisen through a separate HGT event: into an ancestral euryarchaeal species for ahk5 and, possibly, into an ancestral methanogen for ahk22.
The results for the CheA-like HKVI ORFs are shown in Figure 2. Ahk40, ahk42 and ahk43 (except Mhun_401793120) cluster together and probably represent vertical inheritance from a single HGT event into an ancestral Euryarchaeota species (one bacterial ORF from T. maritima giving the best match). Ahk41 appears to be a separate group, found in the Methanosarcinales, that clusters on its own and seems to be more closely associated with the Firmicutes and Proteobacterial examples, presumably representing a separate HGT event. The three Methanospirillum hungatei ORFs seem to be due to separate HGT events (Mhun_401793120 probably should not be in ahk43), and Mhun_401784470 and Mhun_401776240 are probably true paralogues.
Figure 3 shows the results for four orphan RR groups. The two groups associated with putative taxis operons, arr7 and arr10, group closely together; however, arr10, which is found only in the Methanosarcinales, is probably due to an HGT event into a direct ancestor of this group. The other two orthologous groups, arr12 and arr14, are quite separate from the first two mentioned groups (arr7 and arr10) and probably arose from separate HGT events into the ancestors of methanogens (AfulAF2419 appears to be a distant member of arr12).
Figure 4 shows the results for the two RRIV-CheB orthologous groups associated with taxis. The arr18 orthologous group found in Methanosarcinales groups separately from arr19, being closer to two of the bacterial ORFs. Therefore arr18 appears to be the result of a separate HGT event in an ancestor of the Methanosarcinales, whereas arr19 appears to be the result of an HGT event into an ancestor of Euryarchaeota.
The phylogeny for ahy2 (the biggest hybrid kinase orthologous group), shows that these members probably arose from more than one HGT event. The combined results for the orthologous groups found in potential taxis operons are shown in Table A2.
The operon that contains HKVI (ahk40/42/43), RRI (arr7) and RRIV-CheB (arr19) appears to have arisen as an HGT event that transferred the whole operon into an ancestor of the Euryarchaeota. In contrast, the taxis operon containing HKVI (ahk41), RRI (arr10) and RRIV-CheB (arr18) appears to have arisen from a separate HGT event of the whole operon into a direct ancestor of the Methanosarcinales.
The results presented here suggest that HGT has taken place from bacterial species both into ancestral Euryarchaeota and more recently into the methanogens. However, the large numbers of two-component genes in the mesophilic methanogens and the Halobacteriales probably reflect their well known metabolic flexibility (Bapteste et al. 2005, Falb et al. 2005). This, in turn, necessitates an increased requirement for regulation of cellular activity in a changing environment rather than the increased potential for HGT from bacteria. Most of the two-component ORFs that can be observed in these groups of organisms are probably derived from paralogous gene duplication events; the number of two-component ORFs observed would be driven by the requirement to control cellular activity as the organisms evolve. A limited number of HGT events could be sufficient to account for the diversity of phosphotransfer and sensory domains.
Any function of two-component ORFs is inferred by homology to known bacterial genes (e.g. HKVI and chemotaxis) and awaits in situ or in vitro studies, or both. This highlights the importance of interfacing between bioinformaticians and biochemists to plan experiments in an informed way, particularly where orthologues are identified and found in more than one genus and hence may play central roles in cellular regulation.
Acknowledgments
Mark Ashby was supported by New Initiative Funding from the University of the West Indies. The author wishes to thank John Allen, Elke Dittmann, Conrad Mullineaux and Ruth-Sarah Rose for critical reading of this manuscript.
Appendix
Figure A1.
Phylogenetic analysis by neighbor-joining of putative two-component open reading frames (ORFs) from histidine kinase II groups ahk5 and ahk22. Group ahk5 = MaceMA2890, MbarA_2935, (more ...)
Figure A2.
Phylogenetic analysis by neighbor-joining of putative two-component open reading frames (ORFs) from histidine kinase VI groups ahk40–43. Group ahk40 = MaceMA3066, MmazMM0328, AfulAF1040 (more ...)
Figure A3.
Phylogenetic analysis by neighbor-joining of putative two-component open reading frames (ORFs) from response regulator I groups arr7, arr10, arr12 and arr14. Group arr7 (more ...)
Figure A4.
Phylogenetic analysis by neighbor-joining of putative two-component open reading frames (ORFs) from response regulator IV groups arr18 and arr19. Group arr18 = MaceMA0015, MbarA_0985 and (more ...)
Figure A5.
Phylogenetic analysis by neighbor-joining of putative two-component open reading frames (ORFs) from hybrid kinase 1 group ahy2. The bacterial ORFs are Gviolglr3434, RrubRruA1653 and SaverSAV3017. (more ...)
Table A1.
Two-component protein domains from M. acetivorans (M. ace) and Escherichia coli K12 (E. coli) used for BLASTP searches, showing the online accession numbers and the amino acid range that was used. Abbreviation: aa = amino acids.
Domain | Species | Gene | Amino acid region
CheY Receiver | E. coli | NP_416396.1 |
 | M. ace | MA0016 |
 | M. ace | MA3068 |
OmpR Receiver | E. coli | NP_417864.1 | aa 6–124
Histidine kinase | E. coli | NP_417863.1 | aa 234–439
 | M. ace | MA0490 (HisKA_2) | aa 639–847
Hpt | E. coli | NP_415513.1 | aa 815–896
 | M. ace | NP_614988 | aa 5–106
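As an illustration of how the amino acid regions in Table A1 could be turned into BLASTP query sequences, the following is a minimal Python sketch. It assumes the ranges are 1-based and inclusive, which is the usual convention for protein coordinates but is not stated in the table. The helper names `parse_aa_range` and `extract_domain` and the toy sequence are hypothetical and are not taken from the original analysis.

```python
import re


def parse_aa_range(text):
    """Parse a range such as 'aa 6–124' into a 1-based inclusive (start, end) pair.

    Both the en dash used in the table and a plain hyphen are accepted.
    """
    match = re.fullmatch(r"\s*aa\s+(\d+)\s*[–-]\s*(\d+)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised amino acid range: {text!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if start < 1 or end < start:
        raise ValueError(f"invalid amino acid range: {text!r}")
    return start, end


def extract_domain(sequence, aa_range):
    """Return the subsequence covered by aa_range (assumed 1-based, inclusive)."""
    start, end = parse_aa_range(aa_range)
    if end > len(sequence):
        raise ValueError("range extends past the end of the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return sequence[start - 1:end]


if __name__ == "__main__":
    # Toy sequence for illustration only; not a real database entry.
    toy_protein = "MKTAYIAKQRQISFVKSHFSRQ"
    print(extract_domain(toy_protein, "aa 6–10"))
```

In practice the full-length protein would first be retrieved from the database under the accession number listed in the table, and the extracted region would then be submitted as the BLASTP query.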
Table A2.
Closely linked genes that may be part of operons.
Assigned gene no. | Classification | Orthologue number | Additional information

Archaeoglobus fulgidus
AF448 | His_KA PACPASGAF | |
AF449 | RRI-CheY | |
AF450 | HKIII + 1 PASPAC | |
AF1042 | RRI-CheY | arr7 | AF1055 mcp
AF1041 | RRIV-CheB | arr18 | AF1044 CheW
AF1040 | HKVI | ahk40 | AF1039 CheC
AF1472 | HYI-HisKA | ahy2 |
AF1473 | RRI-CheY | arr9 |
AF2419 | RRI-CheY | arr12 |
AF2420 | His_KA PACPASGAF | |

Halobacterium
VNG0974G | RRI-CheY | arr7 | VNG0976G CheW
VNG0973G | RRIV-CheB | arr19 | VNG0970G CheC1
VNG0971G | HKVI | ahk43 | VNG0967G CheD
 | | | VNG0966G CheR
 | HKIII | ahk22 |
VNG1374G | HATPase HAMP | |
VNG1375G | | |
VNG2036C | RRI-CheY | arr1 |
VNG2037C | HKII | |

Haloarcula marismortui
rrnAC0410 | HKII + 2PASPAC | arr9 |
rrnAC0411 | RRI-CheY | |
rrnAC0412 | HYI | |
rrnAC0413 | HKII + 1PASPAC | |
rrnAC0456 | HATPase_c + gyra | | Top6B
rrnAC0457 | HATPase_c | | Top6A
rrnAC2204 | RRIV-CheB | arr19 | rrnAC2206 CheR
rrnAC2205 | HKVI CheW-CheA | ahk43 |
rrnAC2692 | HYI + PASPACGAF | |
rrnAC2694 | HATPase_c + PAS | |

Methanothermobacter thermautotrophicus
MTH444 | HKI | ahy9 |
MTH445 | RRI-CheY | arr9 |
MTH446 | HYI + (PASPAC)2 | ahy2 |
MTH447 | RRI-CheY | arr6 |
MTH457 | RRIV-PACPAS | |
MTH459 | HKI | |
MTH548 | RRIV-G_transf | |
MTH549 | RRI-CheY | |
MTH901 | HYI | |
MTH902 | HYI-PASPAC | |

Methanococcus maripaludis
MMP0926 | RRI-CheB | arr19 | MMP0925 CheW
MMP0927 | HKVI | ahk42 | MMP0928 CheD
MMP0933 | RRI-CheY | arr7 | MMP0929 mcp
MMP1303 | HKIV | ahk32 |
MMP1304 | RRI-CheY | arr2 |

Methanosarcina acetivorans
MA0014 | HKVI | ahk41 |
MA0015 | RRI-CheB | arr19 |
MA0016 | RRI-CheY | arr10 |
MA0018 | RRI-CheY | arr16 | MA0019 mcp
 | | | MA0020 CheW
MA0551 | HKII + PASPAC | |
MA0552 | HKIII | ahk9 |
MA0619 | HKIII | ahk12 |
MA0620 | HKIII | ahk14 |
MA0758 | HKIII | |
MA0759 | HKIII | ahk30 |
MA1267 | HYI + PASPAC | ahy2 |
MA1268 | RRI-CheY | arr9 |
MA1269 | RRI-CheY | |
MA1270 | HKII + PASPAC | |
MA1468 | RRI-CheY | |
MA1469 | RRI-CheY | arr8 |
MA1470 | HKIII | ahk8 |
MA1627 | HKIII | ahk25 |
MA1628 | HKII + PASPAC | ahk4 |
MA1645 | HKIII | ahk17 |
MA1646 | HKII + PASPAC | ahk6 |
MA2012 | RRI-CheY | |
MA2013 | HYII + Hpt | |
MA3066 | HKVI | ahk40 | MA3063 CheR
MA3068 | RRI-CheY | arr7 | MA3064 CheD
MA3067 | RRI-CheB | arr18 | MA3065 CheC
 | | | MA3070 CheW
MA3368 | HKIII | ahk24 |
MA3370 | HKII + PASPAC | |
MA4376 | RRI-CheY | arr12 |
MA4377 | HYIII CHSE4HP | ahy7 |

Methanosarcina barkeri
MbarA_0984 | HKVI | ahk41 | MbarA_0983 CheR
MbarA_0985 | RRIV-CheB | arr19 |
MbarA_0986 | RRI-CheY | arr10 |
MbarA_0988 (opposite orientation to above) | RRI-CheY | | MbarA_0989 mcp
 | | | MbarA_0990 CheW
MbarA_1051 | RRI-CheY | arr12 |
MbarA_1052 | HYIII | ahy7 |
MbarA_3036 | HKII | |
MbarA_3037 | HKII | |
MbarA_3247 | HYI PASPAC | |
MbarA_3248 | RRI-CheY | |
MbarA_3250 | HKII | |
MbarA_3447 | HKII | |
MbarA_3448 | | |

Methanosarcina mazei
MM0168 | HKII | ahk9 |
MM0169 | HKIII | |
MM0328 | HKVI | ahk40 | MM3025 CheR
MM0329 | RRI-CheB | arr18 | MM0326 CheD
MM0330 | RRI-CheY | arr7 | MM0327 CheC
 | | | MM0332 CheW
 | | | MM0333 mcp
MM1325 | HKVI | ahk41 | MM1323 CheC
MM1326 | RRI-CheY | arr19 | MM1324
MM1327 | | arr10 | CheB
MM1328 (opposite orientation to above) | RRI-CheY | arr16 | MM1329 mcp
 | | | MM1330 CheW
MM2275 | HKIII | ahk28 |
MM2276 | HKIII | |
MM2277 | HKIII | |
MM2515 | HKIII | ahk8 |
MM2516 | RRI-CheY | arr8 |
MM2880 | RRI-CheY | arr3 |
MM2881 | HYIII | ahy6 |
MM2953 | RRI-CheY | arr4 |
MM2954 | RRI-CheY | arr5 |
MM2955 | HKVI PASPACCach | |
MM3205 | HYIII | |
MM3206 | RRI-CheY | |

Pyrococcus abyssi
PAB1330 | RRI-CheY | arr7 | PAB1329 CheR
PAB1331 | RRVI-CheB | arr19 | PAB1333 CheC
PAB1332 | HKVI | ahk42 | PAB1334 CheC
 | | | PAB1335 CheW
 | | | PAB1336 mcp

Pyrococcus horikoshii
PH0482 | RRI-CheY | arr7 | PH0481 CheB
PH0483 | RRIV-CheB | arr19 |
PH0484 | HKVI | ahk42 |
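The grouping of closely linked genes in Table A2 can be approximated computationally. The sketch below is a simplified illustration, not the procedure used in this study: it assumes that consecutive locus-tag numbers indicate neighbouring genes, and it ignores strand orientation and intergenic distance, both of which matter when judging whether genes really form an operon. The function names `split_tag` and `candidate_clusters` and the `max_gap` parameter are hypothetical.

```python
import re


def split_tag(tag):
    """Split a locus tag such as 'MA1267' or 'VNG0974G' into (prefix, number)."""
    match = re.fullmatch(r"([A-Za-z_]+)(\d+)([A-Za-z]?)", tag)
    if match is None:
        raise ValueError(f"unrecognised locus tag: {tag!r}")
    return match.group(1), int(match.group(2))


def candidate_clusters(tags, max_gap=1):
    """Group tags whose numbers differ by at most max_gap from the previous tag.

    Only groups of two or more genes are returned. Adjacency by tag number
    is an assumption; it does not prove co-transcription.
    """
    parsed = sorted((split_tag(tag), tag) for tag in tags)
    clusters, current, previous = [], [], None
    for (prefix, number), tag in parsed:
        linked = (
            previous is not None
            and prefix == previous[0]
            and number - previous[1] <= max_gap
        )
        if linked:
            current.append(tag)
        else:
            if len(current) >= 2:
                clusters.append(current)
            current = [tag]
        previous = (prefix, number)
    if len(current) >= 2:
        clusters.append(current)
    return clusters


if __name__ == "__main__":
    genes = ["MA1267", "MA1268", "MA1269", "MA1270", "MA1468", "MA1469", "MA1470", "MA2012"]
    for cluster in candidate_clusters(genes):
        print(cluster)
```

Several pairs in Table A2, such as MA3368 and MA3370, are separated by one intervening gene, so a looser setting such as `max_gap=2` would be needed to recover them with this approach.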
R1. Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., Lipman D.J. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. [PubMed]
R2. Alves R., Savageau M.A. Comparative analysis of prototype two-component systems with either bifunctional or monofunctional sensors: differences in molecular structure and physiological function. Mol. Microbiol. 2003;48:25–51. [PubMed]
R3. Anantharaman V., Aravind L. The CHASE domain: a predicted ligand-binding module in plant cytokinin receptors and other eukaryotic and bacterial receptors. Trends Biochem. Sci. 2001;26:579–582. [PubMed]
R4. Anantharaman V., Koonin E.V., Aravind L. Regulatory potential, phyletic distribution and evolution of ancient, intracellular small-molecule-binding domains. J. Mol. Biol. 2001;307:1271–1292. [PubMed]
R5. Appleman J.A., Stewart V. Mutational analysis of a conserved signal-transducing element: the HAMP linker of the Escherichia coli nitrate sensor NarX. J. Bacteriol. 2003;185:89–97. [PubMed]
R6. Aravind L., Ponting C.P. The GAF domain: an evolutionary link between diverse phototransducing proteins. Trends Biochem. Sci. 1997;22:458–459. [PubMed]
R7. Aravind L., Ponting C.P. The cytoplasmic helical linker domain of receptor histidine kinase and methyl-accepting proteins is common to many prokaryotic signalling proteins. FEMS Microbiol. Lett. 1999;176:111–116. [PubMed]
R8. Ashby M.K. Survey of the number of two-component response regulator genes in the complete and annotated genome sequences of prokaryotes. FEMS Microbiol. Lett. 2004;231:277–281. [PubMed]
R9. Baliga N.S., Bonneau R., Facciotti M.T., et al. Genome sequence of Haloarcula marismortui: a halophilic archaeon from the Dead Sea. Genome Res. 2004;14:2221–2234. [PubMed]
R10. Bapteste E., Brochier C., Boucher Y. Higher-level classification of the Archaea: evolution of methanogenesis and methanogens. Archaea. 2005;1:353–363. [PubMed]
R11. Bateman A., Coin L., Durbin R., et al. The Pfam protein families database. Nucleic Acids Res. 2004;32:D138–141. [PubMed]
R12. Bibikov S.I., Barnes L.A., Gitin Y., Parkinson J.S. Domain organisation and flavin adenine dinucleotide-binding determinants in the aerotaxis signal transducer Aer of Escherichia coli. Proc. Natl. Acad. Sci. USA. 2000;97:5830–5835. [PubMed]
R13. Boucher Y., Douady C.J., Papke R.T., Walsh D.A., Boudreau M.E.R., Nesbø C.L., Case R.J., Doolittle W.F. Lateral gene transfer and the origins of prokaryotic groups. Annu. Rev. Genet. 2003;37:283–328. [PubMed]
R14. Brochier C., Forterre P., Gribaldo S. Archaeal phylogeny based on proteins of the transcription and translation machineries: tackling the Methanopyrus kandleri paradox. Genome Biol. 2004;5:R17. [PubMed]
R15. Bult C.J., White O., Olsen G.J., Zhou L., Fleischmann R.D., Sutton G.G., Blake J.A., Fitzgerald L.M., Clayton R.A., Gocayne J.D. Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii. Science. 1996;273:1058–1073. [PubMed]
R16. Cohen G.N., Barbe V., Flament D., et al. An integrated analysis of the genome of the hyperthermophilic archaeon Pyrococcus abyssi. Mol. Microbiol. 2003;47:1495–1512. [PubMed]
R17. Deppenmeier U., Johann A., Hartsch T., et al. The genome of Methanosarcina mazei: evidence for lateral gene transfer between bacteria and archaea. J. Mol. Microbiol. Biotechnol. 2002;4:453–461. [PubMed]
R18. Falb M., Pfeiffer F., Palm P., Rodewald K., Hickmann V., Tittor J., Oesterhelt D. Living with two extremes: conclusions from the genome sequence of Natronomonas pharaonis. Genome Res. 2005;15:1336–1343. [PubMed]
R19. Felsenstein J. Inferring phylogenies from protein sequences by parsimony, distance, and likelihood methods. Methods Enzymol. 1996;266:418–427. [PubMed]
R20. Forterre P., Brochier C., Philippe H. Evolution of the Archaea. Theor. Popul. Biol. 2002;61:409–422. [PubMed]
R21. Galagan J.E., Nusbaum C., Roy A., et al. The genome of M. acetivorans reveals extensive metabolic and physiological diversity. Genome Res. 2002;12:532–542. [PubMed]
R22. Galperin M.Y. A census of membrane-bound and intracellular signal transduction proteins in bacteria: bacterial IQ, extroverts and introverts. BMC Microbiol. 2005;5:35. [PubMed]
R23. Galperin M.Y., Nikolskaya A.N., Koonin E.V. Novel domains of the prokaryotic two-component signal transduction systems. FEMS Microbiol. Lett. 2001;203:11–21. [PubMed]
R24. Gilles-Gonzalez M.-A., Gonzalez G. Signal transduction by heme-containing PAS-domain proteins. J. Appl. Physiol. 2004;96:774–783. [PubMed]
R25. Grebe T.W., Stock J.B. The histidine protein kinase superfamily. Adv. Microb. Physiol. 1999;41:139–227. [PubMed]
R26. Hellingwerf K.J. Bacterial observations: a rudimentary form of intelligence? Trends Microbiol. 2005;13:152–158. [PubMed]
R27. Ho Y.-S., Burden L., Hurley J.H. Structure of the GAF domain, a ubiquitous signaling motif and a new class of cyclic GMP receptor. EMBO J. 2000;19:5288–5299. [PubMed]
R28. Hoch J.A. Two-component and phosphorelay signal transduction. Curr. Opin. Microbiol. 2000;3:165–170. [PubMed]
R29. Jahreis K., Morrison T.B., Garzón A., Parkinson J.S. Chemotactic signaling by an Escherichia coli CheA mutant that lacks the binding domain for phosphoacceptor partners. J. Bacteriol. 2004;186:2662–2672.
R30. Karniol B., Vierstra R.D. The HWE histidine kinases, a new family of bacterial two-component sensor kinases with potentially diverse roles in environmental signaling. J. Bacteriol. 2004;186:445–453. [PubMed]
R31. Kawarabayasi Y., Sawada M., Horikawa H., et al. Complete sequence and gene organisation of the genome of a hyper-thermophilic archaebacterium, Pyrococcus horikoshii OT3. DNA Res. 1998;5:55–76. [PubMed]
R32. Klenk H.-P., Clayton R.A., Tomb J.-F., et al. The complete sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus. Nature. 1997;390:364–370. [PubMed]
R33. Koonin E.V. Horizontal gene transfer: the path to maturity. Mol. Microbiol. 2003;50:725–727. [PubMed]
R34. Koretke K.K., Lupas A.N., Warren P.V., Rosenberg M., Brown J.R. Evolution of two-component signal transduction. Mol. Biol. Evol. 2000;17:1956–1970. [PubMed]
R35. Kumar S., Tamura K., Nei M. MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Brief. Bioinform. 2004;5:150–163. [PubMed]
R36. Kurland C.G., Canback B., Berg O.G. Horizontal gene transfer: A critical view. Proc. Natl. Acad. Sci. USA. 2003;100:9658–9662. [PubMed]
R37. Lawrence J.G., Hendrickson H. Lateral gene transfer: when will adolescence end? Mol. Microbiol. 2003;50:739–749. [PubMed]
R38. Makarova K.S., Koonin E.V. Comparative genomics of archaea: how much have we learned in six years, and what’s next? Genome Biol. 2003;4:115. [PubMed]
R39. Makarova K.S., Koonin E.V. Evolutionary and functional genomics of the Archaea. Curr. Opin. Microbiol. 2005;8:586–594. [PubMed]
R40. Matte-Tailliez O., Brochier C., Forterre P., Philippe H. Archaeal phylogeny based on ribosomal proteins. Mol. Biol. Evol. 2002;19:631–639. [PubMed]
R41. Nelson K.E., Clayton R.A., Gill S.R., et al. Evidence for lateral gene transfer between archaea and bacteria from genome sequence of Thermotoga maritima. Nature. 1999;399:323–329. [PubMed]
R42. Ng W.V., Kennedy S.P., Mahairas G.G., et al. Genome sequence of Halobacterium species NRC-1. Proc. Natl. Acad. Sci. USA. 2000;97:12176–12181. [PubMed]
R43. Notredame C., Higgins D.G., Heringa J. T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 2000;302:205–217. [PubMed]
R44. Ochman H., Lawrence J.G., Groisman E.A. Lateral gene transfer and the nature of bacterial innovation. Nature. 2000;405:299–304.
R45. Ohmori M., Ikeuchi M., Sato N., et al. Characterization of genes encoding multi-domain proteins in the genome of the filamentous nitrogen-fixing cyanobacterium Anabaena sp. strain PCC 7120. DNA Res. 2001;8:271–284. [PubMed]
R46. Ponting C.P., Aravind L. PAS: a multifunctional domain family comes to light. Curr. Biol. 1997;7:R674–R677. [PubMed]
R47. Sardiwal S., Kendall S.L., Movahedzadeh F., Rison S.C., Stoker N.G., Djordjevic S. A GAF domain in the hypoxia/ NO-inducible Mycobacterium tuberculosis DosS protein binds haem. J. Mol. Biol. 2005;353:929–936. [PubMed]
R48. Slesarev A.I., Mezhevaya K.V., Makarova K.S., et al. The complete genome of hyperthermophile Methanopyrus kandleri AV19 and monophyly of archaeal methanogens. Proc. Natl. Acad. Sci. USA. 2002;99:4644–4649. [PubMed]
R49. Smith D.R., Doucette-Stamm L.A., Deloughery C., et al. Complete genome sequence of Methanobacterium thermoautotrophicum ΔH: Functional analysis and comparative genomics. J. Bacteriol. 1997;179:7135–7155. [PubMed]
R50. Stewart R.C., van Bruggen R. Association and dissociation kinetics for CheY interacting with the P2 domain of CheA. J. Mol. Biol. 2004;336:287–301. [PubMed]
R51. Stock A.M., Robinson V.L., Goudreau P.N. Two-component signal transduction. Annu. Rev. Biochem. 2000;69:183–215. [PubMed]
R52. Tam R., Saier M.H. Structural, functional, and evolutionary relationships among extracellular solute-binding receptors of bacteria. Microbiol. Rev. 1994;57:320–346. [PubMed]
R53. Thompson J.D., Higgins D.G., Gibson T.J. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22:4673–4680. [PubMed]
R54. Ulrich L.E., Koonin E.V., Zhulin I.B. One-component systems dominate signal transduction in prokaryotes. Trends Microbiol. 2005;13:52–56. [PubMed]
R55. West A.H., Martinez-Hackert E., Stock A.M. Crystal structure of the catalytic domain of the chemotaxis receptor methylesterase, CheB. J. Mol. Biol. 1995;250:276–290. [PubMed]
R56. Zhu Y., Inouye M. The HAMP linker in histidine kinase dimeric receptors is critical for symmetric transmembrane signal transduction. J. Biol. Chem. 2004;279:48152–48158. [PubMed]
R57. Zhulin I.B., Nikolskaya A.N., Galperin M.Y. Common extracellular sensory domains in transmembrane receptors for diverse signal transduction pathways in bacteria and archaea. J. Bacteriol. 2003;185:285–294. [PubMed]
