Toxic anti-phage defense proteins inhibited by intragenic antitoxin proteins

Recombination-promoting nuclease (Rpn) proteins are broadly distributed across bacterial phyla, yet their functions remain unclear. Here we report these proteins are new toxin-antitoxin systems, comprised of genes-within-genes, that combat phage infection. We show the small, highly variable Rpn C-terminal domains (RpnS), which are translated separately from the full-length proteins (RpnL), directly block the activities of the toxic full-length proteins. The crystal structure of RpnAS revealed a dimerization interface encompassing a helix that can have four amino acid repeats whose number varies widely among strains of the same species. Consistent with strong selection for the variation, we document plasmid-encoded RpnP2L protects Escherichia coli against certain phages. We propose many more intragenic-encoded proteins that serve regulatory roles remain to be discovered in all organisms.

with rhamnose induction in the ER2170 background, overnight cultures were diluted to OD600 = 0.05. At 60 min, rhamnose was added to a final concentration of 0.2%. After growing for another 4 h at 30˚C, the cultures were diluted to an OD600 of 0.05 in LB with 100 μg/mL Amp and 0.2% rhamnose, and samples were collected at around 6 h.
The rpn gene (protein id: WP_009902002.1) was amplified from C. difficile strain 630 genomic DNA and cloned into the pTXB1 vector. Two mutants that disrupt the predicted RpnS RBS and iTIS, RpnL*-M1 and RpnL*-M2, were generated with NEBuilder® HiFi DNA Assembly Master Mix using the oligonucleotides listed in Dataset S2. The resulting plasmids were transformed into E. coli T7 Express lysY/Iq (New England Biolabs, C3013I). Six colonies were inoculated into 5 mL of LB with 100 μg/mL Amp. The culture was incubated at 30˚C for 4.5 h and was further diluted 12-fold in 30 mL of LB with 100 μg/mL Amp. Upon reaching OD600 ~ 0.6, the culture was cooled at 4˚C for 1 h. Isopropyl β-D-thiogalactoside (IPTG) was added to the culture to a final concentration of 0.4 mM. The culture was then incubated at 15˚C with shaking at 100 rpm for another 20 h. Cells were collected by centrifugation at 4,000 x g for 10 min at 4˚C. After discarding the supernatant, the cells were resuspended in 2 mL of buffer containing 20 mM Tris-HCl, 500 mM NaCl, pH 8.0. The cells were lysed by sonication (Branson 450 Digital Sonifier). The resulting mixture was centrifuged at 11,000 x g for 10 min at 4˚C. The samples were diluted 20-fold before loading and analyzed by SDS-PAGE (4-20%). TCEP (45 mM) was added to the sample buffer as the reducing agent instead of β-mercaptoethanol to minimize cleavage of the intein tag during sample preparation. The gel was transferred to nitrocellulose membranes and probed with anti-CBD monoclonal antibody (New England Biolabs, E8034S) as the primary antibody (10 μL of antibody in 10 mL of 5% non-fat milk at 4˚C overnight) and goat anti-mouse IgG HRP-conjugated antibody (Pierce, 1858413) as the secondary antibody (3 μL of antibody in 15 mL of 5% non-fat milk at 4˚C overnight, according to the manufacturer's protocol).

RpnCS+REPEAT.
For single-plasmid experiments with RpnA and RpnB, a single colony was used to inoculate fresh RB medium (10 g/L tryptone, 5 g/L yeast extract, 5 g/L NaCl, pH 7.2) containing 100 μg/mL Amp. After overnight incubation at 30˚C, the culture was diluted into 10 mL of fresh RB with 100 μg/mL Amp to an OD600 of 0.05, and the subculture was grown at 30˚C. At 1.0 h, rhamnose was added to a final concentration of 0.2% (to induce RpnL expression). For each sample, 100 μL of culture was taken at 7.5 h for the β-galactosidase activity assay. OD600 was recorded at the same time.
For two-plasmid experiments with RpnA and RpnB, a single colony was used to inoculate fresh RB medium containing 100 μg/mL Amp and 25 μg/mL Cm. After overnight incubation at 30˚C, the culture was diluted into 10 mL of fresh RB with 100 μg/mL Amp and 25 μg/mL Cm to an OD600 of 0.05, and the subculture was grown at 30˚C. At 1.0 h, arabinose (Sigma-Aldrich, W325501-100G) was added to a final concentration of 0.2% (to induce RpnS expression). At 2.0 h, rhamnose was added to a final concentration of 0.2% (to induce RpnL expression). Aliquots (100 μL) were taken at 9.0 h for the RpnA samples and at 7.5 h for the RpnB samples, based on the growth curves in (3). OD600 was recorded at the same time.
For two-plasmid experiments with RpnC, after overnight incubation in LB at 30˚C, the culture was diluted into 10 mL of fresh RB with 100 μg/mL Amp and 25 μg/mL Cm to an OD600 of 0.05, and the subculture was grown at 23˚C. At 2.0 h, arabinose was added to a final concentration of 0.2% (to induce RpnCS expression) and rhamnose was added to a final concentration of 0.2% (to induce RpnCL expression). Aliquots (100 μL) were taken at 16 h.
OD600 was recorded at the same time. Three or four independent biological replicates were performed for all the assays.
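The dilutions to a starting OD600 of 0.05 described above follow the usual C1V1 = C2V2 relationship. The following is a minimal sketch of that arithmetic, not part of the original protocol. The function name and the overnight OD600 of 2.5 are hypothetical, and the calculation assumes OD600 scales linearly with cell density, which typically holds only when the dense overnight culture is measured after dilution.

```python
def dilution_volumes(stock_od, target_od, final_volume_ml):
    """Return (culture_ml, medium_ml) that give target_od in final_volume_ml.

    Assumes OD600 is proportional to cell density (C1 * V1 = C2 * V2).
    """
    if stock_od <= 0 or target_od <= 0 or final_volume_ml <= 0:
        raise ValueError("OD values and volume must be positive")
    if target_od > stock_od:
        raise ValueError("cannot dilute to an OD higher than the stock")
    culture_ml = target_od * final_volume_ml / stock_od
    return culture_ml, final_volume_ml - culture_ml


# Illustrative values only: overnight culture at OD600 2.5 (assumed),
# diluted to OD600 0.05 in a 10 mL subculture as in the text.
culture_ml, medium_ml = dilution_volumes(2.5, 0.05, 10.0)
print(f"{culture_ml:.2f} mL culture + {medium_ml:.2f} mL fresh medium")
```

In practice the measured OD600 of each overnight culture would replace the assumed value.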

Rpn Protein Characterization
Protein Overexpression and Purification. Protein purifications were performed with a hand-packed gravity column or a prepacked column on an ÄKTA Purifier 10 FPLC system (Cytiva). pTXB1 derivatives carrying rpnALS and rpnAS were transformed into T7 Express competent E. coli (New England Biolabs, C2566H). The pTXB1 derivative carrying rpnAL* was transformed into E. coli T7 Express lysY/Iq (with the rpnA gene deleted). Six colonies were inoculated into 10 mL of LB with 100 μg/mL Amp. The overnight culture was diluted 10-fold into fresh LB with 100 μg/mL Amp. After growing for 1 h, the culture was further diluted 25-fold in 1.0 L of LB with 100 μg/mL Amp. Upon reaching OD600 ~ 0.6, the culture was cooled at 4˚C for 2 h. IPTG was added to the culture to a final concentration of 0.4 mM. The culture was then incubated at 15˚C with shaking at 100 rpm for another 20 h. Cells were collected by centrifugation at 5,000 x g for 15 min at 4˚C. After discarding the supernatant, the cells were resuspended in buffer A (20 mM Tris-HCl, 500 mM NaCl, 0.5 mM TCEP, pH 8.0) at 50 mL of buffer per 10 g of cells. The cells were lysed by sonication. The resulting mixture was centrifuged at 15,000 x g for 30 min at 4˚C. A small fraction of the supernatant was subjected to immunoblot analysis. The samples corresponding to RpnALS and RpnAL* were diluted 10-fold before loading, and the RpnAS sample was diluted 40-fold. The diluted samples were analyzed by SDS-PAGE (4-20%). The gel was transferred to nitrocellulose membranes and probed with anti-CBD monoclonal antibody as the primary antibody (10 μL of antibody in 10 mL of 5% non-fat milk at 4˚C overnight) and goat anti-mouse IgG HRP-conjugated antibody (3 μL of antibody in 15 mL of 5% non-fat milk at 4˚C overnight) as the secondary antibody, according to the manufacturer's protocol. The rest of the supernatant was applied to a gravity column with 10 mL of pre-washed chitin resin (New England Biolabs, S6651L).
After washing with 500 mL of buffer A, 60 mL of cleavage buffer (buffer A with 50 mM β-mercaptoethanol) was applied to the column rapidly. The column was maintained at 4˚C for 40 h to maximize the cleavage of intein tag. The desired protein was eluted with buffer A and concentrated with Amicon® Ultra-15 centrifugal filter (Sigma-Aldrich).
RpnALS and RpnAL* were further purified by anion-exchange chromatography. RpnALS and RpnAL* were buffer-exchanged into buffer B (20 mM Tris-HCl, 100 mM NaCl, 0.5 mM TCEP, pH 8.0) and loaded onto a HiTrap Q HP 5-mL column (Cytiva). The protein was eluted with a 150-mL linear gradient (from 100 mM to 600 mM NaCl). The desired fractions were pooled and concentrated. RpnAS was further purified by size-exclusion chromatography. Concentrated

The structure was solved with experimental phasing using a Se-Met substituted protein.
A highly redundant (7.4-fold, Friedel's law false) data set was collected on a single crystal in two 200° rotation sweeps at different goniometer arc settings. There was no appreciable radiation decay. Data were collected at 95 K using 1.5418 Å radiation from a Rigaku FR-X X-ray source and an Eiger2 4M pixel array detector. Data were integrated and scaled using XDS and XSCALE (8). Single-wavelength anomalous scattering (SAS or SAD) phasing and density modification were carried out using the AutoSol procedure in Phenix (9), which resulted in a clearly interpretable electron density map. Phenix's autobuilt model was completed manually using O (10) and was refined using energy minimization in Phenix. The final model contains residues between M244 and A289; the three C-terminal residues are disordered. All of the residues (100%) are in the most favored region of the Ramachandran plot. Interfaces were analyzed using the 'Protein interfaces, surfaces and assemblies' service PISA (11) at the European Bioinformatics Institute (http://www.ebi.ac.uk/pdbe/prot_int/pistart.html). Information about the structure is given in SI Appendix, Table S1.
Structure Prediction. All Rpn protein structures were predicted with AlphaFold 2.2.0 (12,13) using NIH's Biowulf cluster. The predictions with the highest confidence are shown. pLDDT scores are shown in different colors to represent the model confidence.
Clusters of Orthologous Groups (COG) annotation for genes was determined using eggNOG-mapper (version 2.1.6) with eggNOG orthology data (version 5.0.2) (18). Genes annotated with COG ID "COG5464" were identified as rpn orthologs. The identified rpn genes, along with their sequences, taxonomic distribution, domain annotation and neighboring genes, are available in Dataset S1 and in a GitHub repository (https://github.com/nlm-irp-jianglab/rpn_data). The taxonomic distribution of rpn orthologs was plotted with iTOL (15). The results were plotted at the taxonomic level of order and above. If all children of a clade have the same rpn presence/absence status, only the upper level is displayed.
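The ortholog selection step above amounts to filtering an annotation table for one COG ID. The sketch below illustrates that filter and is not the authors' script. It assumes the tab-separated annotations layout written by eggNOG-mapper 2.1.x, with `##` comment lines, a `#query` header line, and an `eggNOG_OGs` column holding comma-separated `ID@taxid|level` entries; the function name and the toy rows are hypothetical.

```python
def rpn_orthologs(annotation_lines, cog_id="COG5464"):
    """Return query IDs whose eggNOG_OGs field lists cog_id.

    Assumed layout: '##' comment lines, one header line starting with
    '#query', then tab-separated rows.
    """
    header = None
    hits = []
    for raw in annotation_lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and run metadata
        fields = line.split("\t")
        if line.startswith("#"):
            header = [name.lstrip("#") for name in fields]
            continue
        if header is None:
            raise ValueError("no header line found before data rows")
        row = dict(zip(header, fields))
        # Each entry looks like 'COG5464@2|Bacteria'; keep the ID part only.
        og_ids = {og.split("@", 1)[0] for og in row.get("eggNOG_OGs", "").split(",")}
        if cog_id in og_ids:
            hits.append(row["query"])
    return hits


# Toy input for illustration only (not real annotation output).
toy = [
    "## toy example\n",
    "#query\tseed_ortholog\tevalue\tscore\teggNOG_OGs\n",
    "geneA\tx.1\t1e-50\t200\tCOG5464@1|root,COG5464@2|Bacteria\n",
    "geneB\tx.2\t1e-30\t120\tCOG0001@1|root,COG0001@2|Bacteria\n",
]
print(rpn_orthologs(toy))
```

Matching on the exact ID before the `@` avoids accidental hits on IDs that merely contain the same digits.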
Phylogenetic Analysis. To construct the phylogenetic tree of ECOR genomes (19), we used Roary (20) (version 3.13.0) with the parameters "-i 90 -cd 90 -s -e -mafft" to extract the core gene alignments. These alignments were then used to build the tree using FastTree (version 2.1.10) with the parameters "-nt -gtr". For Rpn protein sequences, alignments were performed with Clustal-Omega (version 1.2.4) with default parameters. The phylogenetic tree was built from the alignments using FastTree (version 2.1.10) with default parameters (21). iTOL (15) was used to visualize the phylogenetic trees. The ECOR Rpn alignments and their phylogenetic trees are provided in a GitHub repository (https://github.com/nlm-irp-jianglab/rpn_data).

Gene Context Analysis.
We conducted genomic context analysis using a method similar to the one described previously (22). We selected equal numbers of randomly chosen genes and rpn genes and identified the flanking genes for both sets. The local genomic context of a gene was defined as all coding sequences within a region of ±10 kb of the gene, or up to the end of the contig if less than 10 kb was available. We used HMMER3 hmmscan (version 3.3.2) with an E-value cutoff of 10⁻⁵ to search for defense-related domains in all coding sequences within this local context (23).
Specifically, we searched for defense-related Pfam and COG domains that were previously identified in studies by Makarova et al. (24), Doron et al. (25), and Gao et al. (26).
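The ±10 kb local-context definition above can be sketched as a simple coordinate filter. This is an illustration under stated assumptions, not the authors' code: the `Gene` class and `local_context` function are hypothetical, the window is measured from the start and end of the target gene, and a coding sequence counts only if it lies entirely inside the window. The source does not specify either convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    contig: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def local_context(target, genes, contig_length, flank=10_000):
    """Return genes on the target's contig that fall within +/- flank bp.

    The window is clipped at the contig ends, mirroring the
    'to the end of a contig if less than 10 kb' rule.
    """
    lo = max(1, target.start - flank)
    hi = min(contig_length, target.end + flank)
    return [
        g for g in genes
        if g.contig == target.contig
        and g.gene_id != target.gene_id
        and g.start >= lo
        and g.end <= hi
    ]


# Toy coordinates for illustration only.
target = Gene("rpn", "contig1", 12_000, 13_000)
genes = [
    Gene("a", "contig1", 1_000, 1_900),    # upstream of the window
    Gene("b", "contig1", 2_500, 3_500),    # inside
    target,
    Gene("c", "contig1", 22_000, 22_900),  # inside
    Gene("d", "contig1", 22_500, 23_500),  # extends past the window
    Gene("e", "contig2", 12_500, 12_900),  # different contig
]
neighbors = local_context(target, genes, contig_length=25_000)
print([g.gene_id for g in neighbors])
```

The protein sequences of the returned genes would then be the input for the domain search described in the text.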
coli MG1655 according to published protocols (30). Phage titers were determined by mixing serial dilutions with top agar (31).
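Titers from serial-dilution plating are conventionally computed as plaques divided by the product of the dilution and the plated volume. The sketch below shows that arithmetic; the function name, plaque count, dilution, and plated volume are hypothetical values chosen for illustration and do not come from the source.

```python
def titer_pfu_per_ml(plaques, dilution, plated_volume_ml):
    """Return the stock titer in PFU/mL.

    dilution is the fraction of undiluted stock in the plated sample
    (e.g. 1e-6 for a one-in-a-million dilution).
    """
    if plaques < 0 or dilution <= 0 or plated_volume_ml <= 0:
        raise ValueError("inputs must be positive (plaques may be zero)")
    return plaques / (dilution * plated_volume_ml)


# Illustrative numbers: 42 plaques from 0.1 mL of a 1e-6 dilution.
titer = titer_pfu_per_ml(42, 1e-6, 0.1)
print(f"{titer:.2e} PFU/mL")
```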

Growth Curves Following Phage Infection in Liquid Media. E. coli MG1655 harboring pBR322* or pBR322*-RpnP2LS were grown in LB pH 8.5. Overnight cultures were diluted to an OD600 of around 0.2, and 180 μL of the diluted cultures were dispensed into a 96-well plate. Then, 20 μL of phage at different concentrations were added to reach different MOIs. The plate was incubated in a CLARIOstar plate reader at 30˚C with shaking at 300 rpm. OD600 was measured every 15 min.
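Setting a target MOI in the plate format above requires an estimate of the number of cells per well. The sketch below computes the phage stock titer needed for a chosen MOI. It is an illustration only: the function name is hypothetical, and the conversion of 8e8 cells/mL per OD600 unit is a common rule-of-thumb assumption for E. coli that is not stated in the source and varies with strain, medium, and instrument.

```python
def required_phage_titer(moi, od600, culture_ul=180.0, phage_ul=20.0,
                         cells_per_ml_per_od=8e8):
    """Return the phage titer (PFU/mL) needed in phage_ul to reach moi.

    cells_per_ml_per_od is an ASSUMED conversion factor; calibrate it
    by plating for the actual strain and spectrophotometer.
    """
    if moi <= 0 or od600 <= 0:
        raise ValueError("moi and od600 must be positive")
    cells_in_well = od600 * cells_per_ml_per_od * culture_ul / 1000.0
    pfu_needed = moi * cells_in_well
    return pfu_needed / (phage_ul / 1000.0)


# Volumes and OD600 from the text; the MOI values are illustrative.
for moi in (0.01, 0.1, 1.0):
    print(f"MOI {moi}: {required_phage_titer(moi, 0.2):.2e} PFU/mL")
```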

One-Step Growth Curves to Measure Burst Size. One-step growth curves were performed following published procedures (29,32). E. coli MG1655 harboring pBR322* or pBR322*-RpnP2LS were grown in LB pH 8.5. Overnight cultures were diluted to an OD600 of around 0.1. After reaching an OD600 of around 0.4, 450 μL of cells were infected with phage at an MOI of 0.005. Phages were allowed to adsorb to the cells for 10 min before diluting 10,000-fold. Cells were continually

repeats were represented as a series of parallel diagonals whose frequency and spacing reflect their abundance and unit lengths. The value should be between 0 and 1.