NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Madame Curie Bioscience Database [Internet]. Austin (TX): Landes Bioscience; 2000-.

The “PACE” Concept Pointed at New Key Proteins Involved in RNA Metabolism

Jean Armengaud*.

* Corresponding Author: Jean Armengaud, CEA, DSV, IBEB, Lab Biochim System Perturb, Bagnols-sur-Cèze, F-30207, France. Email: jean.armengaud@cea.fr

DNA and RNA Modification Enzymes: Structure, Mechanism, Function and Evolution, edited by Henri Grosjean.
©2009 Landes Bioscience.

Several hundred genomes from the three Domains of life have been sequenced so far, leading to the identification of an impressive list of new genes. When structural genomics programs were launched at the beginning of this decade, several research groups proposed studying specific targets selected with innovative criteria.1,2 Genomic studies conducted by Olsen and Woese3 revealed a profound evolutionary distinction between the informational and the metabolic facets of the cell: most informational aspects of Archaea are closely related to those seen in Eukarya, whereas most metabolic aspects of Archaea resemble those seen in Bacteria. The “PACE” concept proposed by Forterre and his coworkers2 arose from this observation. They proposed that conserved hypothetical proteins present in Archaea and Eukarya but absent from Bacteria should be information-processing proteins, i.e., related to protein synthesis or to DNA or RNA metabolism. PACE stands for Protein from Archaea without assigned function Conserved in Eukarya. They listed 32 PACE targets in 2000. At that stage, almost nothing could be inferred by data mining for most of them. Here I survey the knowledge collected by the scientific community over the last eight years on the whole set of PACE proteins. The main conclusion of this analysis is somewhat surprising: most of these proteins are not related to DNA metabolism, but rather to RNA metabolism and, more specifically, to RNA maturation. Possible explanations for such a strong link between the “PACE” concept and RNA maturation are discussed.

Introduction

Since the first complete cellular genome was sequenced, that of Haemophilus influenzae in 1995, more than 800 complete genomes have been sequenced: 708 bacteria, 53 archaea and 56 eukaryotes. This has produced an impressive catalog of new coding sequences: 6.2 million proteins are currently referenced in databases, as many as one fourth of them still without assigned function. The sequencing of about a thousand more genomes is currently in progress. With these data in hand, phylogenetic relationships between proteins have been systematically analyzed, leading to their classification into families such as those of the Clusters of Orthologous Groups of proteins (COG) database.4

Several research groups proposed prioritizing targets for experimental study, noting that some proteins remained uncharacterized although they are universally present.1,2,5 The main criteria for target prioritization were: (i) phyletic distribution (ubiquitous or nearly ubiquitous proteins are attractive, but not only …), (ii) essentiality or the phenotype of the corresponding knock-out mutant (although this parameter is complex to assess), and (iii) clues from polypeptide sequence motifs, genome context analysis, protein structure, expression data, interaction partners and cellular localization, when available. Genomic studies conducted by Olsen and Woese3 revealed a profound evolutionary distinction between the informational and the metabolic facets of the cell: most informational aspects of Archaea are closely related to those seen in Eukarya, whereas most metabolic aspects of Archaea resemble those seen in Bacteria (Fig. 1). The “PACE” concept proposed by Forterre and his coworkers2 arose from this observation. They proposed that conserved hypothetical proteins present in Archaea and Eukarya but absent from Bacteria should be information-processing proteins, i.e., related to protein synthesis or to DNA or RNA metabolism (Fig. 1). In 2000, they listed 32 PACE targets and highlighted their biomedical interest (see: http://www-archbac.u-psud.fr/projects/pace/). At that stage, almost nothing could be inferred by data mining for most of these protein families.2 Over the last eight years, two-thirds of these targets have been characterized at both the structural and functional levels (Table 1). The archaeal and yeast orthologs were used in most experiments because of their lower complexity or greater stability.
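The PACE selection rule described above is essentially a filter on phyletic patterns. The Python sketch below illustrates it under simplifying assumptions: each protein family is reduced to the set of Domains in which members were detected plus a flag stating whether a function has been assigned. The family identifiers and values are hypothetical and are not taken from the COG database or from the original PACE list.

```python
"""Minimal sketch of the PACE selection criterion (hypothetical data)."""

from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinFamily:
    cog: str                  # family identifier (hypothetical labels below)
    domains: frozenset[str]   # Domains of life in which members were found
    characterized: bool       # True if a function has already been assigned


def is_pace(family: ProteinFamily) -> bool:
    """Present in Archaea and Eukarya, absent from Bacteria, no assigned function."""
    return (
        {"Archaea", "Eukarya"} <= family.domains
        and "Bacteria" not in family.domains
        and not family.characterized
    )


# Hypothetical toy catalog: values are illustrative only.
catalog = [
    ProteinFamily("COG_A", frozenset({"Archaea", "Eukarya"}), False),
    ProteinFamily("COG_B", frozenset({"Archaea", "Bacteria", "Eukarya"}), False),
    ProteinFamily("COG_C", frozenset({"Archaea", "Eukarya"}), True),
    ProteinFamily("COG_D", frozenset({"Archaea"}), False),
]

pace_targets = [f.cog for f in catalog if is_pace(f)]
print(pace_targets)  # ['COG_A']
```

Actual target selection also depends on sequence-similarity thresholds used to define families and on the additional criteria listed above (essentiality, genome context, structure), none of which this toy filter models.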

Figure 1. Phyletic patterns of conserved proteins from the three cellular Domains of life. Bacteria, Archaea and Eukarya are shown as large circles in a Venn diagram. Homologous protein families are shown as either crosses (unknown function) or circles (characterized …)

Table 1. Current knowledge on PACEs.

Central Metabolism

Although PACE proteins were proposed to be information-processing proteins, two have been shown to act in central energy metabolism. PACE11 (COG1019) proteins were found to be involved in coenzyme A synthesis in Archaea.6 These proteins catalyze the synthesis of dephospho-coenzyme A from 4'-phosphopantetheine and ATP. Although the archaeal and eukaryal homologs are only distantly related to the bacterial ones (COG0669), structural modeling revealed that they all belong to the same nucleotidyltransferase superfamily.6 In eukaryotes, the last two steps of coenzyme A biosynthesis are carried out by a bifunctional enzyme resulting from the fusion of the two enzymes that are separate in archaea. PACE20 (COG4809) is an ADP-specific phosphofructokinase involved in a glycolytic pathway that deviates from the classical Embden-Meyerhof pathway. This enzymatic activity was first discovered in the hyperthermophilic archaeon Pyrococcus furiosus in 1994, and the sequence of this tetrameric enzyme was reported five years later.7 The eukaryotic homologs are ADP-dependent glucokinases. In humans, this enzyme may play a role in glycolysis, possibly under ischemic conditions.

Protein Synthesis, Folding and Posttranslational Modifications

PACE17 (COG1779) are C4-type zinc-finger proteins, called ZPR1 in eukaryotes. In response to growth stimuli, eukaryotic ZPR1 assembles into complexes with eukaryotic translation Elongation Factor 1A (eEF1A) and the SMN protein. The role of the latter protein, whose deficiency causes Werdnig-Hoffmann disease (spinal muscular atrophy) in humans, is not known. Although the structure of ZPR1 has been determined, its role in translation remains uncharacterized.8 PACE13 (COG1730) proteins are one of the subunits of the prefoldin molecular chaperone. This chaperone comprises six different subunits in eukaryotes but only two in archaea. The heterohexameric prefoldin captures a nonnative protein and subsequently delivers it to a group II chaperonin for proper folding.9 The structure of archaeal prefoldin is a double beta-barrel assembly, with six long coiled coils protruding from it like a jellyfish with six tentacles. PACE09 (COG1736) is one of the five enzymes necessary for diphthamide biosynthesis. Diphthamide is a unique posttranslational modification of histidine, known to occur only at a specific residue of translation Elongation Factor 2. The yeast PACE09 enzyme, named DPH1/OVCA1, was shown to transfer the 3-amino-3-carboxypropyl group of S-adenosylmethionine (SAM) to the imidazole C-2 of the precursor histidine residue.10 This protein is known to be a tumor suppressor. The physiological role of diphthamide is still an open question.

Maintenance of Genomic Stability

PACE16 comprises the Dna2 helicases (COG1112) as well as a specific bifunctional helicase (COG2251/COG1112) in Pyrococci. Dna2 plays an important role in Okazaki fragment maturation on the lagging strand: during replication, it assists the flap endonuclease Fen1 in removing RNA/DNA primers. In eukaryotes, it also plays a role in telomere maintenance.11

mRNA Synthesis and Maturation

PACE14 (COG1243) are multidomain proteins with histone acetyltransferase and radical-SAM domains. In eukaryotes they are called Elp3 and were shown to be a subunit of the Elongator complex, which is involved in transcriptional activation and transcript elongation.12 PACE01 (COG1341) is a conserved family of P-loop NTPases named Clp1/Grc3. Human Clp1 is a component of cleavage factor IIm, which is involved in pre-mRNA cleavage during the cleavage/polyadenylation reaction.13 This protein was also found associated with the human tRNA splicing endonuclease (see chapter by Mitchell and Li in this volume). It was identified as an RNA kinase that phosphorylates the 5' end of the 3' exon during tRNA splicing.14 Two homologs, Clp1 and Grc3, exist in Saccharomyces cerevisiae, but the exact role of the latter is unknown.

rRNA Maturation

Proteins from the PACE04 group fall into two paralogous subgroups, COG1718 and COG0478, the latter being distinguished by an additional putative RNA-binding domain (HTH domain). They are Ser/Thr protein kinases called RIO kinases,15 whose function is related to cell cycle control and rRNA maturation. Interestingly, in all archaeal genomes, genes encoding COG1718 orthologs lie immediately adjacent to genes encoding COG1094 KH-domain proteins, which are also conserved in Eukarya and Archaea (PACE05 group). In yeast, this nucleolar KH-domain protein is named Pno1/Dim2 and is required for pre-18S rRNA processing. It interacts with the 18S rRNA dimethyltransferase Dim1 and also with Nob1, which is involved in proteasome biogenesis. PACE06 (COG1500) is a 30 kDa protein known in humans as the Shwachman-Bodian-Diamond syndrome protein. The yeast homolog is critical for the release and recycling of the nucleolar shuttling factor Tif6 from pre-60S ribosomes, a key step in 60S maturation and in the translational activation of ribosomes.16 PACE19 (COG1756) proteins are SPOUT-class methyltransferases (see chapter by Czerwoniec, Kasprzak, Kaminska et al in this volume) involved in ribosome biogenesis. Yeast Emg1/Nep1 binds to a 6-nt motif found in 18S rRNA and facilitates the incorporation of ribosomal protein Rps19 during the formation of pre-ribosomes.17 PSI-BLAST analysis of PACE28 (COG2042) members reveals no link even to remotely related proteins of known function or structure; these proteins are true orphans.18 In yeast, a null mutant of the Yor006/TSR3 gene accumulates 20S pre-rRNA, so its function is linked to rRNA processing.

tRNA Maturation

PACE08 (COG1369) polypeptides are a subunit of ribonuclease P, a ribonucleoprotein complex required for tRNA maturation whose protein moiety comprises four subunits in archaea and nine in eukaryotes (see chapter by Mitchell and Li in this volume). PACE18 (COG1041) proteins were shown to (di)methylate guanosine at position 10 of tRNAs.19 They are named Trm-G10 in archaea and Trm11 in eukaryotes. Paralogs were identified in Pyrococci, but their functions have not yet been investigated. Three PACE groups are involved in the biosynthesis of wyosine derivatives, a complex modification of tRNA(Phe) at position 37 (see chapter by Urbonavicius, Armengaud, Droogmans et al in this volume). The yeast proteins of this pathway, TYW1 (PACE22, COG0731), TYW2 (PACE24, COG1590) and TYW3 (PACE25, COG2520), were recently characterized. The latter PACE group also comprises the Trm5 methyltransferase, responsible for m1G methylation at position 37 of tRNAs in Archaea and Eukarya.

RNA Recycling and Degradation

PACE07 (COG1990) proteins are peptidyl-tRNA hydrolases involved in recycling peptidyl-tRNAs that may dissociate prematurely from the mRNA template during the elongation step of translation.20 These proteins release tRNA from peptidyl-tRNA by cleaving the ester bond between the peptide and the tRNA. In many eukaryotes, archaeal-type (COG1990) and bacterial-type (COG0193) peptidyl-tRNA hydrolases coexist. PACE26 (COG1096) is a subunit of the RNA exosome, which comprises 9 to 11 subunits in eukaryotes but only 4 in archaea.21 The exosome is a 3'-to-5' exoribonuclease complex that participates in the degradation and processing of cellular RNA (mRNA decay, processing of sn(o)RNAs and rRNAs). PACE27 (COG1650) homologs are D-aminoacyl-tRNA deacylases found specifically in archaea and plants, with no sequence homology to known bacterial deacylases.22 By converting D-aminoacyl-tRNAs back into free tRNA molecules, these deacylases counteract the toxicity of D-amino acids, which arises from the formation of D-aminoacyl-tRNAs in vivo.

Eleven PACEs Are Still Poorly Characterized

Year after year, the list of uncharacterized proteins shrinks a little more. The function of eleven groups of PACE proteins (Table 1) remains to be deciphered. PACE12 proteins were shown to be atypical GTPases with an unusually large dimerization interface and a peculiar Gly-Pro-Asn (GPN) tripeptide loop inserted into the GTPase core fold.23 Based on these structural peculiarities, the whole family was named “GPN-loop GTPases”. Their function remains unknown, although their genomic context in archaea links these proteins with DNA replication.24 PACE23 (COG2106) proteins, also of unknown function, were shown to be encoded in a ribosomal protein operon in archaea and to share significant structural similarity with CspA, an RNA chaperone that binds RNA and prevents hairpin formation, allowing transcription antitermination.25 PACE21 (COG0037) proteins are not PACE proteins sensu stricto because homologs are found in bacteria. Their functions are not known, but they share sequence similarity with TilS/MesJ, the tRNA(Ile)-lysidine synthase. Lysidine, a lysine-containing modified cytidine, is found exclusively at the anticodon wobble position (position 34) of eubacterial tRNA(Ile) (see chapter by Suzuki et al in this volume). PACE32 (COG2016) proteins have a PUA domain (see chapter by Czerwoniec, Kasprzak, Kaminska et al in this volume) similar to those of pseudouridine synthases and archaeosine transglycosylase. For this reason, they were predicted to be RNA-interacting proteins. The yeast homolog, whose function is still unknown, was shown to be associated with ribosomes. The human ortholog MCT-1 is oncogenic.2

Conclusive Remarks and Future Prospects

This survey shows that the functions of two-thirds of the 32 PACEs have been characterized. Structural genomics achieved similar coverage, with three-dimensional structures available for 22 of them (Table 1). Determining the function of the remaining targets may still be a long shot because no clues are available yet. At least eighteen PACEs are related to RNA metabolism, eleven of them more specifically to RNA maturation. Only one was found to be directly related to DNA metabolism. Thus, the link between the “PACE” concept and RNA maturation is quite strong. This leads back to the question of the evolution of cellular life (see the chapters by Forterre and Grosjean and by Jeltsch and Jurkowski in this volume). Of the three main information-processing systems (genome replication, transcription and translation), genome replication is the only one whose central components are not universally conserved. Molecular evolutionists currently attribute this to the transition from RNA- to DNA-based genomes. This transition, probably originating from ancient viruses, cannot have been straightforward or unique, and it generated diversity among genome replication systems and DNA-related proteins.27 For this reason, the “PACE” concept, i.e., proteins conserved in Archaea and Eukarya but absent from Bacteria, was expected in 2000 to point mainly to components involved in genome replication or DNA repair. This survey shows instead that Archaea and Eukarya share more specific RNA-related mechanisms than previously thought.2

A first explanation for such a strong link between the “PACE” concept and RNA maturation is that the transition from RNA- to DNA-based genomes left considerable room for concomitant adaptations in RNA metabolism. PACEs are probably relics of the ancient RNA world that were no longer useful to Bacteria (at least during their emergence), rather than genuine innovations of the archaeal/eukaryotic branch. This reductive evolution could have been stronger for RNA maturation-related mechanisms because most RNA maturation steps serve to fine-tune translation. Why have the functions of these conserved RNA-related proteins been described only recently? In my opinion, over the last three decades most biochemists and geneticists have been so focused on, and so successful with, DNA metabolism, the gold mine, that only a few conserved DNA-related mechanisms remain to be discovered. Indeed, only a few novel DNA-related complexes were identified in the archaeal/eukaryotic branch over the last eight years; the accessory DNA replication GINS complex is one example. Moreover, the functions of some RNA-related proteins remained elusive for many years because of (i) the difficulty of identifying their targets (RNA molecules are diverse) and (ii) the existence of multiple homologs that take much more time to study and compare.

In conclusion, more elusive mechanisms related to RNA metabolism probably exist and deserve further attention. The eleven PACEs that remain poorly characterized are good leads to follow. The best way to determine their function is probably to combine the efforts of geneticists, biochemists, phylogenomicists and structural biologists on each of these proteins. In addition, new PACEs probably remain to be identified. Analyzing recent data from large-scale sequencing of new complete genomes and metagenomes will probably yield interesting new targets. Moreover, new strategies for better genome annotation are currently being proposed (Armengaud J, manuscript in preparation for Current Opinion in Microbiology). Proteogenomics consists of using protein sequence information obtained by massively parallel shotgun proteomics to annotate or re-annotate sequenced genomes.28,29 A new generation of high-accuracy mass spectrometers makes it possible to identify thousands of peptides corresponding to hundreds of proteins. Genome-scale sequencing of the N-terminal peptides of proteins is particularly valuable for accurately delineating the corresponding initiation codons.30 My own research team applied such an approach to improve the annotation of the bacterium Deinococcus deserti (de Groot A et al., submitted for publication). In this study, we identified 11,129 unique peptides corresponding to 1348 proteins. Interestingly, a large number of conserved hypothetical proteins and orphans were uncovered, and novel genes were detected. With such data in hand for various proteomes, we will soon work on establishing a new catalog of PACEs.
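Why an observed N-terminal peptide can fix the initiation codon may be illustrated with a deliberately simplified Python sketch. It assumes the standard genetic code, scans the forward strand only, treats ATG, GTG and TTG as candidate initiation codons (read as Met when they initiate translation), and accepts a match whether or not the initiator Met was removed. The sequence, peptides and function names are hypothetical; this is not the pipeline used in the cited studies or by the author's team.

```python
"""Toy sketch: find the initiation codon supported by an observed N-terminal peptide."""

from typing import Optional

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Assumed set of candidate initiation codons (common in prokaryotes).
START_CODONS = {"ATG", "GTG", "TTG"}


def translate_from(dna: str, start: int) -> str:
    """Translate from `start` up to the first stop codon.

    The initiation codon is read as Met, whatever its sequence.
    """
    protein = ["M"]
    for pos in range(start + 3, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def find_start(dna: str, n_term_peptide: str) -> Optional[int]:
    """Return the offset of the first candidate start codon whose product begins
    with the observed N-terminal peptide (with or without the initiator Met)."""
    for start in range(len(dna) - 2):
        if dna[start:start + 3] not in START_CODONS:
            continue
        product = translate_from(dna, start)
        if product.startswith(n_term_peptide) or product[1:].startswith(n_term_peptide):
            return start
    return None


# Hypothetical forward-strand region: GTG at offset 6 opens a short ORF (MKALR).
region = "TTAGCCGTGAAAGCTTTACGCTAA"
print(find_start(region, "KALR"))  # 6 (peptide observed after Met removal)
print(find_start(region, "MKAL"))  # 6 (initiator Met retained)
```

A real proteogenomic workflow would also search the reverse strand, score peptide-spectrum matches and resolve overlapping or ambiguous ORFs; the sketch only shows how one observed N-terminal peptide singles out one candidate start codon among several.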

Acknowledgements

I thank Henri Grosjean for his unsurpassed enthusiasm regarding the multifaceted RNA world, Patrick Forterre for his pioneering work on Pyrococcus abyssi and his kind invitation aboard the PACE ferry, and the Commissariat à l'Energie Atomique and the Agence Nationale de la Recherche (JCJC06_152439 ANR project) for their support.

References

1. Galperin MY, Koonin EV. ‘Conserved hypothetical’ proteins: prioritization of targets for experimental study. Nucleic Acids Res. 2004;32:5452–63. [PMC free article: PMC524295] [PubMed: 15479782]
2. Matte-Tailliez O, Zivanovic Y, Forterre P. Mining archaeal proteomes for eukaryotic proteins with novel functions: the PACE case. Trends Genet. 2000;16:533–6. [PubMed: 11102699]
3. Olsen GJ, Woese CR. Archaeal genomics: an overview. Cell. 1997;89:991–4. [PubMed: 9215619]
4. Tatusov RL, Natale DA, Garkavtsev IV, et al. The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res. 2001;29:22–8. [PMC free article: PMC29819] [PubMed: 11125040]
5. Brenner SE. Target selection for structural genomics. Nat Struct Biol. 2000;7:967–9. [PubMed: 11104002]
6. Armengaud J, Fernandez B, Chaumont V, et al. Identification, purification and characterization of an eukaryotic-like phosphopantetheine adenylyltransferase (coenzyme A biosynthetic pathway) in the hyperthermophilic archaeon Pyrococcus abyssi. J Biol Chem. 2003;278:31078–87. [PubMed: 12756245]
7. Tuininga JE, Verhees CH, van der Oost J, et al. Molecular and biochemical characterization of the ADP-dependent phosphofructokinase from the hyperthermophilic archaeon Pyrococcus furiosus. J Biol Chem. 1999;274:21023–8. [PubMed: 10409652]
8. Mishra AK, Gangwani L, Davis RJ, et al. Structural insights into the interaction of the evolutionarily conserved ZPR1 domain tandem with eukaryotic EF1A, receptors and SMN complexes. Proc Natl Acad Sci USA. 2007;104:13930–5. [PMC free article: PMC1955815] [PubMed: 17704259]
9. Iizuka R, Sugano Y, Ide N, et al. Functional characterization of recombinant prefoldin complexes from a hyperthermophilic archaeon, Thermococcus sp. strain KS-1. J Mol Biol. 2008;377:972–83. [PubMed: 18295793]
10. Liu S, Milne GT, Kuremsky JG, et al. Identification of the proteins required for biosynthesis of diphthamide, the target of bacterial ADP-ribosylating toxins on translation elongation factor 2. Mol Cell Biol. 2004;24:9487–97. [PMC free article: PMC522255] [PubMed: 15485916]
11. Masuda-Sasa T, Polaczek P, Peng XP, et al. Processing of G4 DNA by Dna2 helicase/nuclease and RPA provides insights into the mechanism of Dna2/RPA substrate recognition. J Biol Chem. 2008;283:24359–73. [PMC free article: PMC2528986] [PubMed: 18593712]
12. Winkler GS, Kristjuhan A, Erdjument-Bromage H, et al. Elongator is a histone H3 and H4 acetyltransferase important for normal histone acetylation levels in vivo. Proc Natl Acad Sci USA. 2002;99:3517–22. [PMC free article: PMC122555] [PubMed: 11904415]
13. Paushkin SV, Patel M, Furia BS, et al. Identification of a human endonuclease complex reveals a link between tRNA splicing and pre-mRNA 3' end formation. Cell. 2004;117:311–21. [PubMed: 15109492]
14. Weitzer S, Martinez J. The human RNA kinase hClp1 is active on 3' transfer RNA exons and short interfering RNAs. Nature. 2007;447:222–6. [PubMed: 17495927]
15. LaRonde-LeBlanc N, Wlodawer A. A family portrait of the RIO kinases. J Biol Chem. 2005;280:37297–300. [PubMed: 16183636]
16. Menne TF, Goyenechea B, Sanchez-Puig N, et al. The Shwachman-Bodian-Diamond syndrome protein mediates translational activation of ribosomes in yeast. Nat Genet. 2007;39:486–95. [PubMed: 17353896]
17. Leulliot N, Bohnsack MT, Graille M, et al. The yeast ribosome synthesis factor Emg1 is a novel member of the superfamily of alpha/beta knot fold methyltransferases. Nucleic Acids Res. 2008;36:629–39. [PMC free article: PMC2241868] [PubMed: 18063569]
18. Armengaud J, Dedieu A, Solques O, et al. Deciphering structure and topology of conserved COG2042 orphan proteins. BMC Struct Biol. 2005;5:3. [PMC free article: PMC549553] [PubMed: 15701177]
19. Armengaud J, Urbonavicius J, Fernandez B, et al. N2-methylation of guanosine at position 10 in tRNA is catalyzed by a THUMP domain-containing, S-adenosylmethionine-dependent methyltransferase, conserved in archaea and eukaryota. J Biol Chem. 2004;279:37142–52. [PubMed: 15210688]
20. Fromant M, Ferri-Fioni ML, Plateau P, et al. Peptidyl-tRNA hydrolase from Sulfolobus solfataricus. Nucleic Acids Res. 2003;31:3227–35. [PMC free article: PMC162332] [PubMed: 12799450]
21. Liu Q, Greimann JC, Lima CD. Reconstitution, activities and structure of the eukaryotic RNA exosome. Cell. 2006;127:1223–37. [PubMed: 17174896]
22. Ferri-Fioni ML, Fromant M, Bouin AP, et al. Identification in archaea of a novel D-Tyr-tRNATyr deacylase. J Biol Chem. 2006;281:27575–85. [PubMed: 16844682]
23. Gras S, Chaumont V, Fernandez B, et al. Structural insights into a new homodimeric self-activated GTPase family. EMBO Rep. 2007;8:569–75. [PMC free article: PMC2002535] [PubMed: 17468740]
24. Berthon J, Cortez D, Forterre P. Genomic context analysis in archaea suggests previously unrecognized links between DNA replication and translation. Genome Biol. 2008;9:R71. [PMC free article: PMC2643942] [PubMed: 18400081]
25. Zarembinski TI, Kim Y, Peterson K, et al. Deep trefoil knot implicated in RNA binding found in an archaebacterial protein. Proteins. 2003;50:177–83. [PMC free article: PMC2792022] [PubMed: 12486711]
26. Glansdorff N, Xu Y, Labedan B. The last universal common ancestor: emergence, constitution and genetic legacy of an elusive forerunner. Biol Direct. 2008;3:29. [PMC free article: PMC2478661] [PubMed: 18613974]
27. Forterre P. The origin of viruses and their possible roles in major evolutionary transitions. Virus Res. 2006;117:5–16. [PubMed: 16476498]
28. Jaffe JD, Stange-Thomann N, Smith C, et al. The complete genome and proteome of Mycoplasma mobile. Genome Res. 2004;14:1447–61. [PMC free article: PMC509254] [PubMed: 15289470]
29. Gupta N, Benhamida J, Bhargava V, et al. Comparative proteogenomics: combining mass spectrometry and comparative genomics to analyze multiple genomes. Genome Res. 2008;18:1133–42. [PMC free article: PMC2493402] [PubMed: 18426904]
30. Aivaliotis M, Gevaert K, Falb M, et al. Large-scale identification of N-terminal peptides in the halophilic archaea Halobacterium salinarum and Natronomonas pharaonis. J Proteome Res. 2007;6:2195–204. [PubMed: 17444671]
Copyright © 2000-2013, Landes Bioscience.
Bookshelf ID: NBK6599