Large-Scale Molecular Evolutionary Analysis Uncovers a Variety of Polynucleotide Kinase Clp1 Family Proteins in the Three Domains of Life

Genome Biol Evol. 2019 Oct 1;11(10):2713-2726. doi: 10.1093/gbe/evz195.

Abstract

Clp1, a polyribonucleotide 5'-hydroxyl kinase in eukaryotes, is involved in pre-tRNA splicing and mRNA 3'-end formation. Two enzymes similar in amino acid sequence to Clp1, namely Nol9 and Grc3, are present in some eukaryotes and are involved in pre-rRNA processing. However, our knowledge of how these Clp1 family proteins evolved and diversified is limited. We conducted a large-scale molecular evolutionary analysis of the Clp1 family proteins in all living organisms for which protein sequences are available in public databases. The phylogenetic distribution and frequencies of the Clp1 family proteins were investigated in complete genomes of Bacteria, Archaea, and Eukarya. In total, 3,557 Clp1 family proteins were detected across these three domains of life. Many were from Archaea and Eukarya, but a few were found in restricted, phylogenetically diverse bacterial species. The domain structures of the Clp1 family proteins also differed among the three domains of life. Although the proteins were, on average, 555 amino acids long (range, 196-2,728), 122 large proteins with >1,000 amino acids were detected in eukaryotes. These novel proteins contain the conserved Clp1 polynucleotide kinase domain and various other functional domains. Of these proteins, >80% were from Fungi or Protostomia. The polyribonucleotide kinase activity of Thermus scotoductus Clp1 (Ts-Clp1) was characterized experimentally. Ts-Clp1 preferentially phosphorylates single-stranded RNA oligonucleotides (Km value for ATP, 2.5 µM), or single-stranded DNA at higher enzyme concentrations. We propose a comprehensive assessment of the diversification of the Clp1 family proteins and the molecular evolution of their functional domains.

Keywords: comprehensive identification; experimental verification; large protein; molecular evolution; multidomain protein; protein family.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Motifs
  • Animals
  • Archaeal Proteins / chemistry
  • Archaeal Proteins / genetics
  • Bacterial Proteins / chemistry
  • Bacterial Proteins / genetics
  • Bacterial Proteins / metabolism
  • Eukaryota / enzymology
  • Eukaryota / genetics
  • Evolution, Molecular*
  • Humans
  • Multigene Family
  • Polynucleotide 5'-Hydroxyl-Kinase / chemistry*
  • Polynucleotide 5'-Hydroxyl-Kinase / genetics*
  • Polynucleotide 5'-Hydroxyl-Kinase / metabolism
  • Protein Domains

Substances

  • Archaeal Proteins
  • Bacterial Proteins
  • Polynucleotide 5'-Hydroxyl-Kinase