N-terminal catalytic domain of GIY-YIG intron endonuclease I-TevI, I-BmoI, I-BanI, I-BthII and similar proteins
I-TevI is a site-specific GIY-YIG homing endonuclease encoded within the group I intron of the thymidylate synthase gene (td) from Escherichia coli phage T4. It catalyzes the first step in intron homing by generating a double-strand break in the intronless td allele within a sequence designated the homing site. I-TevI recognizes its extensive 37 base pair DNA target in a site-specific but sequence-tolerant manner. The cleavage site is located 23 (upper strand) and 25 (lower strand) nucleotides upstream of the intron insertion site. A divalent cation, such as Mg2+, is required for catalysis. I-TevI also acts as a repressor of its own transcription. It binds an operator that is located upstream of the I-TevI coding sequence and overlaps the T4 late promoter, which drives I-TevI expression from within the td intron. I-TevI binds the homing site and the operator with the same affinity, but cleaves the homing site more efficiently than the operator. I-TevI consists of an N-terminal catalytic domain, containing the GIY-YIG motif, and a C-terminal DNA-binding domain that binds DNA as a monomer, joined by a flexible linker. The C-terminal domain includes three subdomains: a zinc finger, a minor-groove binding alpha-helix (NUMOD3, nuclease-associated modular domain 3), and a helix-turn-helix (HTH) domain. The last two are responsible for DNA binding. The zinc finger is part of the linker and is not required for DNA binding. Instead, it is implicated as a distance sensor that constrains the catalytic domain to cleave the homing site at a fixed position. No other GIY-YIG endonuclease has been found to contain a zinc finger motif. This family also includes I-BmoI, a reduced-activity isoschizomer of I-TevI, which is encoded within the group I intron of the thymidylate synthase (TS) gene (thyA) from Bacillus mojavensis.
I-BmoI catalyzes the first step in intron homing by generating a double-strand break in the intronless thyA allele within a sequence designated the homing site, in the presence of a divalent cation cofactor such as Mg2+. In the absence of Mg2+, I-BmoI nicks only one of the strands. Both I-BmoI and I-TevI bind a homologous stretch of TS-encoding DNA as monomers, but they use different strategies to distinguish intronless from intron-containing substrates. I-TevI discriminates between substrates at the level of DNA binding, whereas I-BmoI binds both intron-containing and intronless TS-encoding substrates but efficiently cleaves only the intronless substrate. Both enzymes cleave their respective intronless substrates at the same positions, and both require a critical G-C base pair adjacent to the top-strand cleavage site for efficient cleavage. The C-terminal domain of I-BmoI has nuclease-associated modular DNA-binding domains (NUMODs) but, unlike that of I-TevI, lacks the zinc finger. Although the zinc finger implicated as a distance determinant in I-TevI is absent, I-BmoI still retains some ability to discriminate cleavage distance. Besides I-TevI and I-BmoI, this family contains a putative GIY-YIG homing endonuclease, I-BanI, encoded within the self-splicing group I intron of the nrdE gene from Bacillus anthracis. It contains two major domains: the N-terminal GIY-YIG domain and the C-terminal DNA-binding domain, which consists of a minor-groove DNA-binding alpha-helix motif and a helix-turn-helix (HTH) motif. I-BanI generates a double-strand break (DSB) in the intronless nrdE gene. The cleavage site is located 5 and 7 nucleotides upstream of the intron insertion site, leaving 2-nucleotide 3' extensions. The recognition site spans 35 to 40 base pairs and covers the cleavage site, with a bias toward the downstream region that includes the intervening sequence (IVS) insertion site.
Moreover, this family contains another putative GIY-YIG homing endonuclease, I-BthII, encoded within the self-splicing group I intron of the nrdF gene from Bacillus thuringiensis ssp. pakistani. It contains a GIY-YIG motif and generates a double-strand break (DSB) in the intronless nrdF gene. The cleavage site is located 7 and 9 nucleotides upstream of the intron insertion site, leaving 2-nucleotide 3' extensions. The recognition site spans 27 to 29 base pairs, with the DSB cleavage site at the 5'-end of the top strand and the intervening sequence (IVS) insertion site approximately in the middle of the recognition site.