Selective Isolation of Eggerthella lenta from Human Faeces and Characterisation of the Species Prophage Diversity

Eggerthella lenta is an anaerobic, high GC, Gram-positive bacillus commonly found in the human digestive tract that belongs to the class Coriobacteriia of the phylum Actinobacteria. This species has been of increasing interest as an important player in the metabolism of xenobiotics and dietary compounds. However, little is known regarding its susceptibility to bacteriophage predation and how this may influence its fitness. Here, we report the isolation of seven novel E. lenta strains using cefotaxime and ceftriaxone as selective agents. We conducted comparative and pangenome analyses of these strains and those publicly available to investigate the diversity of prophages associated with this species. Prophage gene products represent a minimum of 5.8% of the E. lenta pangenome, comprising at least ten distantly related prophage clades that display limited homology to currently known bacteriophages. All clades possess genes implicated in virion structure, lysis, lysogeny and, to a limited extent, DNA replication. Some prophages utilise tyrosine recombinases and diversity-generating retroelements to generate phase variation among targeted genes. The prophages have differing levels of sensitivity to the CRISPR/Cas systems of their hosts, with spacers from 44 E. lenta isolates found to target only five out of the ten identified prophage clades. Furthermore, using a PCR-based approach targeting the prophage attP site, we were able to determine that several of these elements can excise from the host chromosome, thus supporting the notion that these are active prophages. The findings of this study provide further insights into the diversity of prophages infecting species of the phylum Actinobacteria.


Introduction
The human gastrointestinal tract hosts a wide variety of microorganisms (bacteria, archaea, yeasts, protists and viruses), with bacterial numbers in the human gut estimated at 10^11 CFU/g of faeces [1,2]. These organisms extend the metabolic potential of the human superorganism, playing an essential role in the digestion of polysaccharides, the synthesis of vitamins and amino acids, as well as the modification of endogenous compounds [3]. Additionally, they play an important role in the metabolism of xenobiotics (including drugs, dietary compounds and environmental toxins), impacting the bioavailability, activity and toxicity of these compounds [4,5]. More than fifty different pharmaceutical compounds have been identified as susceptible to such metabolic alterations [6]. One human gut commensal bacterium implicated in these bioconversion processes is Eggerthella lenta (previously known as Eubacterium lentum) [7]. This is a Gram-positive, high GC, anaerobic bacillus.
Human faecal samples were collected from ten adult volunteers enrolled by APC Microbiome Ireland as part of a study examining the human gut virome, according to study protocol APC055 approved by the Cork Research Ethics Committee (CREC). Faecal samples were transported to the laboratory, aliquoted and frozen at −80 °C within 2-3 h of voiding. See Supplementary Information S1, Table S2 for source details of the E. lenta isolates. E. lenta isolation from faeces was performed using BHI++ agar (1% w/v agar) and BHI++ overlay (0.25% w/v agarose) supplemented with ceftriaxone (20 µg/mL) and cefotaxime (2 µg/mL). Faecal samples were resuspended in PBS buffer at 0.5 g/mL, serially diluted, added to the overlay and poured onto the agar, with subsequent incubation under anaerobic conditions at 37 °C for 72 h. Presumptive E. lenta colonies (those with brown/yellow pigmentation) were examined by light microscopy following Gram staining and by species-specific colony PCR using MyTaq Red Mix (Bioline, London, UK) with previously described primers (Supplementary Information S1, Table S3) [10]. A total of eight confirmed E. lenta strains and seven additional unidentified colonies were selected for further analysis by genome sequencing.

Demonstration of DGR Functionality of Prophage DSM2243phi4
BHI++ broth was used for the growth of DSM2243. All fermentations were performed in a continuous format for four days under controlled conditions in a final volume of 200 mL using myControl Minibio-500 mL systems (Applikon Biotechnology, JG Delft, The Netherlands). The dissolved oxygen level was maintained below 0.1% by sparging with an anaerobic gas mix (80% (v/v) N2, 10% (v/v) CO2, 10% (v/v) H2). The medium was kept at 37 °C with constant stirring at 50 rpm and a controlled pH of 6.8. The vessel was inoculated with 2 mL of DSM2243 and run for 24 h in batch mode. After that, fresh broth was pumped into the vessels at a rate of 400 mL/day. The culture was then plated in a manner that allowed the isolation of randomly selected colonies, which were subsequently prepared for genome sequencing in the same manner as previously mentioned.
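The stated working volume (200 mL) and feed rate (400 mL/day) imply a dilution rate and mean residence time for the continuous phase. The following is a minimal sketch of that arithmetic, assuming an ideal, well-mixed chemostat at constant volume. The function name is hypothetical, and the 72 h continuous phase is taken from the four-day run minus the 24 h batch phase.

```python
def chemostat_parameters(volume_ml, feed_ml_per_day, continuous_hours):
    """Return (dilution rate per hour, mean residence time in hours,
    number of vessel volume turnovers) for an ideal chemostat."""
    if volume_ml <= 0 or feed_ml_per_day <= 0:
        raise ValueError("volume and feed rate must be positive")
    # Dilution rate D = feed flow / working volume, converted from per day to per hour
    dilution_rate_per_h = feed_ml_per_day / volume_ml / 24.0
    # Mean residence time is the reciprocal of the dilution rate
    residence_time_h = 1.0 / dilution_rate_per_h
    # Number of times the vessel volume is replaced during the continuous phase
    turnovers = continuous_hours * dilution_rate_per_h
    return dilution_rate_per_h, residence_time_h, turnovers


# Values stated in the text: 200 mL vessel, 400 mL/day feed, 72 h continuous phase
d, tau, n = chemostat_parameters(volume_ml=200, feed_ml_per_day=400, continuous_hours=72)
print(f"D = {d:.3f} per h, residence time = {tau:.1f} h, turnovers = {n:.1f}")
```

Under these assumptions the vessel contents are replaced roughly twice per day, so the continuous phase corresponds to about six volume turnovers.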

Detection of Circularised Prophage Genomes among Host Strain Cells
PCR was used to detect circularised prophage genomes. To determine the presence of predicted prophage attP sites, primers specific to the boundaries of these genome loci were used. MyTaq Red Mix (Bioline, London, UK) was used for PCR performed on E. lenta cells resuspended in phosphate-buffered saline (PBS). Details of primers and annealing temperatures for these PCRs are provided in Supplementary Information S1, Table S2. The resulting PCR products were then sent for Sanger sequencing and aligned to the respective prophage genome.

Detection of Virions in the Supernatant of Strain DSM2243
The supernatant obtained from a pelleted culture of DSM2243 was treated with DNase I (6 µg/mL) and RNase A (3.34 µg/mL) at 37 °C for 30 min, with subsequent treatment with 10% SDS and proteinase K (60 µg/mL) at 55 °C for 30 min. DNA extraction was then performed with phenol:chloroform:isoamyl alcohol (25:24:1 v/v) and chloroform:isoamyl alcohol (24:1 v/v). DNA precipitation was conducted with 0.3 M sodium acetate (pH 5.2) and isopropanol (50% v/v), followed by DNA clean-up with Zymogen clean/concentration columns (Zymo Research, CA, USA). Library preparation using the Nextera XT kit and genome sequencing were conducted in the same manner as previously mentioned, with sequence coverage determined with Bowtie2 and Samtools using both forward and reverse sequence reads.
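One way to summarise coverage after read mapping is to average per-position depth for each reference sequence. The sketch below assumes the three-column, tab-separated text produced by `samtools depth` for a single alignment file (reference name, 1-based position, depth). The function name and the example lines are illustrative and are not taken from the study.

```python
from collections import defaultdict


def mean_depth_per_contig(depth_lines):
    """Average the depth column per reference sequence.

    Each input line is expected to hold: reference name, position, depth,
    separated by tabs. Positions absent from the input are not counted, so
    the mean is taken over reported positions only.
    """
    totals = defaultdict(int)
    positions = defaultdict(int)
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        contig, _position, depth = line.split("\t")[:3]
        totals[contig] += int(depth)
        positions[contig] += 1
    return {contig: totals[contig] / positions[contig] for contig in totals}


# Hypothetical depth records for a host chromosome and a prophage region
example = [
    "chromosome\t1\t10",
    "chromosome\t2\t20",
    "prophage\t1\t300",
    "prophage\t2\t500",
]
print(mean_depth_per_contig(example))
```

Comparing the mean depth of a prophage region with that of the rest of the chromosome is one simple indicator of whether encapsidated prophage DNA is enriched in a DNase-treated supernatant.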

Data Processing and Visualisation
Data manipulation and graphic illustration were performed using the R environment (https://www.r-project.org, accessed on 2 December 2021) with the following packages: reshape, ggplot2, tidyverse, gplots (heatmap.2) and ComplexHeatmap.

DDBJ/ENA/GenBank Submission Details
Accession numbers for the genomes and plasmids of E. lenta isolated in this study are listed in Table 1. Please contact the authors to obtain the prophage genome sequences discussed in the study.

Isolation of E. lenta from Human Faecal Samples
For E. lenta, as well as most species of the class Coriobacteriia, there is limited information regarding growth media that can be used for their selective isolation [66]. Most well-described isolates of this bacterium have been obtained using non-selective media or procedures based only on inherent metabolic features (e.g., oxidation of digoxin or conversion of lignans) [67][68][69]. In this study, we devised a selective agar composition based on BHI++ medium as described by Bisanz et al. [27], supplemented with the broad-spectrum cephalosporin antibiotics ceftriaxone and cefotaxime, as the bacterium has been reported to be highly resistant to both agents (hereafter termed Elen-BHI++) [18,70]. We first verified that this antibiotic resistance is present among several strains in our possession (n = 7), with a median MIC of >156 µg/mL for both ceftriaxone and cefotaxime (Supplementary S1, Table S4).
For E. lenta strain isolation, human faeces from ten different individuals were diluted and plated onto Elen-BHI++ agar. When serially diluted human faeces were plated onto this selective agar and incubated under optimal conditions, only a limited number of colony morphologies formed, enabling the identification of presumptive E. lenta colonies. Colonies with a distinct dark yellow/brown pigmentation were identified as this bacterium (Supplementary S2, Figure S1). Strains of this species have been reported to produce this pigmentation on BHI++ media, a phenomenon that is suspected to be related to its predicted ability to produce carotenoid compounds [27]. This colony pigmentation was also observed more consistently when the bacterium was grown using double overlays with soft agarose Elen-BHI++ medium. Eight of the ten faecal samples yielded presumptive E. lenta colonies, and these were subsequently confirmed by species-specific colony PCR. Enumeration of these colonies ranged from 1.3 × 10^5 to 4.2 × 10^6 CFU/g of faeces, with a median of 4.2 × 10^5 CFU/g (n = 7). Selected E. lenta isolates were confirmed as Gram-positive rods under light microscopy, with cells appearing either singly or in long chains (Supplementary S2, Figure S2). Other dominant colony morphologies growing on Elen-BHI++ agar (white colonies of varying size) were identified as Bacteroides fragilis or Bacteroides uniformis.
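Colony counts on a dilution plate can be converted back to CFU per gram of faeces using the resuspension concentration given in the methods (0.5 g/mL). The sketch below illustrates this back-calculation. The function name, the plated volume and the colony counts are hypothetical, since the text reports neither the plated volume nor the raw counts.

```python
from statistics import median


def cfu_per_gram(colonies, dilution, plated_ml, grams_per_ml=0.5):
    """Back-calculate CFU per gram of faeces from one countable plate.

    colonies      number of colonies counted on the plate
    dilution      dilution of the plated suspension as a fraction (e.g. 1e-3)
    plated_ml     volume of diluted suspension plated, in mL (assumed value)
    grams_per_ml  faeces concentration of the undiluted suspension
    """
    if dilution <= 0 or plated_ml <= 0 or grams_per_ml <= 0:
        raise ValueError("dilution, volume and concentration must be positive")
    grams_on_plate = dilution * plated_ml * grams_per_ml
    return colonies / grams_on_plate


# Hypothetical plates: (colonies, dilution); 0.1 mL plated in each case
plates = [(21, 1e-3), (13, 1e-3), (42, 1e-4)]
counts = [cfu_per_gram(c, d, plated_ml=0.1) for c, d in plates]
for value in counts:
    print(f"{value:.1e} CFU/g")
print(f"median: {median(counts):.1e} CFU/g")
```

Reporting the median, as the text does, keeps a single very dense or very sparse sample from dominating the summary.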

Genomes of E. lenta Isolates
Genome assembly was performed using short and long reads to obtain high-quality complete genomes for seven E. lenta isolates obtained in this study (Supplementary Information S1, Tables S1 and S2). Average sequence coverage for short and long reads was 1129-fold and 31-fold, respectively. Sequenced isolates were found to have an average genome size of 3.34 ± 0.246 Mbp with a high GC content of 63.54 ± 0.95% (mean ± SD); these values are within ranges typically reported for this species [71]. Dot plot alignments were performed with the seven fully sequenced genomes of E. lenta obtained in this study (APC055-539-5C, APC055-529-1D, APC055-949-4, APC055-928-H3-3, APC055-924-7B, APC055-920-1E and APC-F2-3) and those publicly available (C592 and DSM2243). Based on the obtained dot plots, variability is observed across members of the E. lenta species, with general conservation of genome synteny. Occasional break points are observed at multiple chromosomal locations across the nine compared genomes, indicative of insertion/deletion events that occurred within the species (Figure 1). Furthermore, cryptic plasmids were obtained with the genomes of APC055-529-1D (single plasmid) and APC-F2-3 (two plasmids). These plasmids ranged from 3086 to 3844 bp in size, with a GC content of 58 ± 1%, and contained between 5 and 6 CDS, with each plasmid possessing a gene that could be identified as a plasmid replication protein (possessing a Rep_2 domain [PF01719] or a distantly related structural homolog). Similar elements have previously been described among this species [72]. The sequence depth of these plasmids was found to be 5- to 33-fold greater than that of the host genome.
These genomes were compared to 50 non-redundant genomes of E. lenta obtained from public repositories, comprising two complete and 48 high-quality draft sequences with a low number of contigs (median of 58 (range 13-465)) (Supplementary Information S1, Table S1). Based on our comparison, E. lenta isolates were found to possess an average nucleotide identity (ANI) of ≥97%, with genomes of this species sharing an ANI of ≈88% with the type strain of Eggerthella sinensis, the closest related species to E. lenta currently defined in the literature (Supplementary Information S2, Figure S3).
Microorganisms 2022, 10, 195
Figure 1. Dot plot illustrating whole genome alignment (MUMmer) of seven E. lenta strains isolated in this study (denoted with the APC prefix) and those publicly available (C592 and DSM2243) with the genome of the type strain DSM2243.

Comparative Analysis of E. lenta Isolates
To facilitate the identification of prophage sequences associated with this species, a pangenome analysis was performed with the E. lenta genomes (n = 57) based on the clustering of their predicted proteins into orthologous groups (OGs). These genomes were found to contain an average of 2933 ± 115 proteins, which could be placed into a total pangenome of 7235 OGs (identity = 30%, coverage = 70%) with a core genome of 1547 OGs (Figure 2A). This analysis allowed the categorisation of the genes of E. lenta into three categories: core (genes shared among all isolates), accessory (genes shared among some isolates) and unique (genes unique to a particular isolate). Our genome comparison established that the core genome represents approximately 53% of the number of genes found on an average-sized E. lenta genome (Figure 2B). This is within the range reported for other bacterial species associated with the human gastrointestinal tract, with core genomes ranging from 44% to 61% for Escherichia coli and Bifidobacterium longum, respectively [73,74]. Based on our comparative analysis, the E. lenta pangenome was found to be open (when interpreted using Heaps' law), implying that the size of the pangenome will tend to increase with the analysis of additional genomes [41]. Accordingly, each genome within this dataset was found to provide a median of 23 unique genes (range 5 to 373) not shared among other genomes (Figure 2B).
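The Heaps' law interpretation can be illustrated with a simple power-law fit. The sketch below assumes the common formulation in which pangenome size grows as P(N) = κ·N^γ with the number of genomes N, and estimates κ and γ by least squares on log-transformed values; an exponent clearly above zero is conventionally read as an open pangenome. The function name and the input numbers are synthetic placeholders, and the fitting procedure actually used in the study [41] may differ.

```python
import math


def fit_heaps_law(genome_counts, pangenome_sizes):
    """Fit P(N) = kappa * N**gamma by linear regression in log-log space.

    Returns (kappa, gamma).
    """
    if len(genome_counts) != len(pangenome_sizes) or len(genome_counts) < 2:
        raise ValueError("need at least two paired observations")
    xs = [math.log(n) for n in genome_counts]
    ys = [math.log(p) for p in pangenome_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx                            # slope = growth exponent
    kappa = math.exp(mean_y - gamma * mean_x)    # intercept back-transformed
    return kappa, gamma


# Synthetic accumulation curve generated from a known power law (not study data)
counts = [1, 2, 5, 10, 20, 57]
sizes = [3000 * n ** 0.22 for n in counts]
kappa, gamma = fit_heaps_law(counts, sizes)
print(f"kappa = {kappa:.0f}, gamma = {gamma:.2f}")
print("open pangenome" if gamma > 0 else "closed pangenome")
```

In practice the accumulation curve would be averaged over many random genome orderings before fitting, since a single ordering can be noisy.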

The COG (Clusters of Orthologous Groups) database consists of proteins whose functions are assumed to derive from ancestral proteins with similar or identical functions, and it is a popular tool for the functional classification of proteins [42]. RPS-BLAST alignments against the COG database allowed COG assignment of 58% of the OGs of E. lenta. OGs involved in the metabolism of carbohydrates, amino acids, lipids and secondary metabolites were found in the core genome, and also throughout the accessory and unique genes (Figure 2C). Our analysis highlights that there is a core set of metabolic capabilities shared by all strains of the species, but that there are also strain-specific differences across isolates. It has already been shown that certain E. lenta strains vary in their capacity to act on digoxin and lignans [12]. In accordance with previous observations, our analysis identified strain differences in genes implicated in the inactivation of drugs such as digoxin (present in 28/57 genomes) and dopamine (present in 55/57 genomes), as well as the activation of phytochemicals such as lignans (present in 45/57 genomes). Furthermore, the examination of genes implicated in antibiotic resistance showed that beta-lactamases are a highly prevalent feature of this species, with at least two varieties of the enzyme (PF13354, PF00144) identified among 56/57 and 55/57 genomes, respectively (Supplementary Information S2, Figure S4). Notably, this result contrasts with previous reports suggesting that this bacterium does not produce beta-lactamases [70,75]. The possession of these antibiotic resistance genes likely explains the high resistance of this species to ceftriaxone and cefotaxime found in this study and others [18,70].
Our analysis also showed that gene products associated with mobile genomic elements such as transposons and prophages are found entirely among accessory and unique genes, making up 2.2% (77 OGs) and 1.4% (33 OGs), respectively (Figure 2C). This result indicates that these gene products are not shared among all isolates, with some constituting unique features present in particular isolates.

Identification and Diversity of Prophages
Screening of the pangenome of 57 E. lenta genomes for OGs encoding hallmark phage proteins (terminase, major head and portal protein) resulted in the identification of prophage-like elements in the genomes of 26 strains. In total, 33 prophages were identified, with some strains found to possess up to two distinct prophages per genome (Table 2). Using progressiveMauve to align these E. lenta genomes with isolates DSM2243 or C592 (strains with publicly available complete genome sequences) allowed determination of the approximate location of prophage termini. The genome sizes of complete prophages (those with approximate genome termini determined) were found to vary from 32 to 42 kb, with GC contents ranging from 58 to 67%, typically lower than that of their host. In five cases, we could not identify prophage termini due to the incompleteness of host draft sequences. Gegenees-derived BLASTn analysis allowed the clustering of these prophages into ten distinct clades, each sharing an identity of >60% across the whole prophage sequence at the nucleotide level, with a higher level of inter-clade relatedness indicated by a homology of >40% at the nucleotide level between clades 6 and 9 (Figure 3). These prophage clades were further confirmed by sequence analysis of their proteomes and phylogenetic analysis using VICTOR (Supplementary Information S2, Figure S5). Of note, the most populated clades were 1 and 7, collectively representing more than half of the prophages identified.
The prediction of potential integration sites (attB) was performed using genome alignments, and we were able to predict an attB site for 26 of the 33 prophages (Table 3). Prophages of four clades (1, 2, 3, 7) integrate at transfer RNA (tRNA) genes for arginine, alanine, serine and leucine. Similar attachment sites have been described for prophages infecting two other Actinobacteria genera, Bifidobacterium and Mycobacterium [76][77][78]. Additionally, prophages of clade 5 appear to integrate into the coding sequence for a hypothetical protein with predicted DNA binding activity (PF01381). Unfortunately, no putative attachment site could be assigned using this methodology for the remaining five clades. Additional confirmation for the attachment site of clade 1 was obtained by detecting direct terminal repeats in six of the seven representatives for which approximate prophage sequence boundaries were known. The terminal repeats appear to represent the attP sites for these prophages and possess sequence homology to tRNA genes when aligned to the genome of the type strain (Table 3).
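Direct terminal repeats of the kind described above can be searched for by looking for the longest exact sequence shared between the two boundary regions of a predicted prophage. The following is a minimal sketch using a standard longest-common-substring dynamic programme. The function name and the example sequences are hypothetical, and the study does not state which tool was used for this step.

```python
def longest_shared_repeat(left_flank, right_flank):
    """Return the longest exact substring present in both flanking sequences.

    Both sequences are read in the same orientation, so a hit corresponds to
    a direct (not inverted) repeat.
    """
    left = left_flank.upper()
    right = right_flank.upper()
    best_len = 0
    best_end = 0  # end index (exclusive) of the best match in `left`
    previous = [0] * (len(right) + 1)
    for i in range(1, len(left) + 1):
        current = [0] * (len(right) + 1)
        for j in range(1, len(right) + 1):
            if left[i - 1] == right[j - 1]:
                # Extend the match that ended at the previous base of both flanks
                current[j] = previous[j - 1] + 1
                if current[j] > best_len:
                    best_len = current[j]
                    best_end = i
        previous = current
    return left[best_end - best_len:best_end]


# Hypothetical boundary regions that share an 11 bp direct repeat
left_boundary = "ATATAG" + "GGACTTGAACC" + "CACACA"
right_boundary = "CTCTCT" + "GGACTTGAACC" + "TGTGTG"
print(longest_shared_repeat(left_boundary, right_boundary))
```

The search is quadratic in flank length, which is acceptable for boundary windows of a few hundred bases; a candidate repeat would still need to be checked against the tRNA gene at the predicted attB site.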

Table 3 (surviving fragment). TTCAGATGGTGCGGGCGAGAGGACTTGAACCTCCA TGGGGTT; tRNA-Leu (ELEN_RS15020)

Gene Content of Prophages and Possible Impact on Host Infection
The number of ORFs per prophage ranges from 35 to 64, and these can be assigned to 418 OGs (identity = 30%, coverage = 70%), representing approximately 5.8% of the total pangenome of the host species. Prophage proteins are highly diverse, with limited homology between prophage clades (Figure 4A). Of the 418 OGs, only 98 were found to be shared across clades (such OGs being shared among a median of two clades (range 2-5)). Notably, only 37% of these OGs could be given a functional assignment. At the level of protein function, overlap could be observed among different OGs, so that multiple OGs among the prophages were assigned a similar function (Supplementary Information S2, Figure S6). For this reason, our annotation efforts classified prophage OGs into five major functional categories of virion assembly, lysogeny, host lysis, DNA replication and maintenance related, transcription and accessory (Figure 4B).
Among proteins implicated in virion structure, those involved in DNA packaging (major capsid, large terminase, portal protein) were present in all representatives, as would be expected for genes encoding core phage features. Phylogenetic analysis of the large terminase indicates that prophages of clades 1 to 6 use a headful DNA packaging system related to Bacillus phage SPP1, while those of clades 7 to 10 use a cos-type system related to Bacillus phage phi105 and Lactococcus phage phiLC3 (Supplementary Information S2, Figure S7) [79][80][81]. Virion tail-related proteins were also identified among the most frequently found functions, encoding the tail tape measure (6/10 clades), tail completion (5/10 clades) and baseplate upper protein (4/10 clades), in accord with a tailed virion morphology. In fact, the structural proteins (major capsid, adaptor, head completion and neck protein) are all consistent with a Siphoviridae-like morphology (Supplementary Information S1, Table S5). Of note, predicted structural proteins with Ig-like domains were found among prophages of clades 2, 3 and 4. It has been previously shown that these domains are implicated in phage adherence to glycan residues present in mucin glycoprotein, often associated with mucosal surfaces such as the intestinal gut wall [82]. The presence of such domains on the virions of phage infecting E. lenta would likely be advantageous, as the bacterium has been shown to possess high adherence to gut epithelial cells in cell culture, indicating that the bacterium strongly associates with the intestinal gut wall [83].
A recent study conducted on prophages of another Actinobacteria genus identified that phages infecting Bifidobacterium possess the so-called Rin system, with an RBP-locus tyrosine-family DNA invertase, implicated in conferring diversity in host range specificity [76]. Similarly to what was observed for Bifidobacterium, alignment of E. lenta prophages of clade 1 showed high gene synteny except for one location directly downstream of a tyrosine recombinase (Rin) (PS51898), where several small tandemly oriented genes (Rv) are also located, often encoding small proteins with an H-type lectin domain (Figure 3). These domains are typically involved in carbohydrate binding and can be found associated with phage RBP proteins, suggesting that genes of this prophage clade are implicated in host specificity [62]. These Rv genes can have homology with the C-terminus of a much larger gene (Rc) located directly downstream, thus indicating that gene recombination and shuffling occur in these loci. Furthermore, we identified a short asymmetric 8 bp repeat (5′-ttccgtat-3′) upstream and downstream of each Rv gene. This repeat sequence can be found inverted downstream of the Rc gene just after a stop codon, as well as located within the gene itself. These are expected to be the crossover sites (rix) that allow inversion to occur. This repeat sequence does not commonly appear in other regions of this prophage clade. Like the system found in Bifidobacterium prophages, Rv genes possess limited homology to each other (Figure 5A), and the tyrosine recombinase is distinct from the tyrosine integrase, which possesses a similar domain architecture and is located at the opposite end of the genome of E. lenta clade 1 prophages (Figure 5B).
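Because the 8 bp repeat is asymmetric, its direct and inverted copies can be told apart by scanning a sequence for both the motif and its reverse complement. The sketch below does this for the repeat reported above; the function names and the test sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(sequence, motif="TTCCGTAT"):
    """List (0-based start, orientation) for each copy of the motif.

    "+" marks the motif as written, "-" marks its reverse complement,
    i.e. an inverted copy of the site.
    """
    seq = sequence.upper()
    forward = motif.upper()
    inverted = reverse_complement(forward)
    hits = []
    for start in range(len(seq) - len(forward) + 1):
        window = seq[start:start + len(forward)]
        if window == forward:
            hits.append((start, "+"))
        elif window == inverted:
            hits.append((start, "-"))
    return hits


# Hypothetical locus with one direct and one inverted copy of the repeat
locus = "GG" + "TTCCGTAT" + "CCCC" + "ATACGGAA" + "GG"
print(find_sites(locus))
```

A pair of sites in opposite orientation flanking a gene segment is the arrangement expected for recombinase-mediated inversion, whereas sites in the same orientation would instead favour deletion of the intervening DNA.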
As would be expected for prophages, genes typically associated with lysogeny could be found among most prophage clades, such as an integrase (7/10 clades) responsible for the insertion of a prophage into the host genome, all identified as tyrosine integrases (IPR002104). We also found genes containing the Cro/C1-type repressor family domain (9/10 clades), a domain associated with the Cro and C1 proteins of coliphage Lambda, which act as gene repressors in the regulation of lysogeny. Lysis-related genes were also found, with proteins implicated in peptidoglycan degradation (8/10 clades). These possess varying enzymatic activities, with amidase being the most common (5/10 clades), followed by glycoside hydrolase (2/10 clades) and CHAP domains (1/10 clades). Furthermore, these proteins could be found to possess cell wall binding domains of varying types (choline-binding, SH3-like, PGBDSf, CW_7 and LysM). Most of these identified peptidoglycan-degrading proteins are expected to play a role as endolysins due to their proximity to holin-encoding genes. However, unlike genes implicated in virion structure, lysogeny and lysis, less success was achieved in the identification of genes implicated in DNA replication. Of note, this difficulty in identifying genes associated with DNA replication has also been observed among Bifidobacterium prophages [77], most probably indicating a current lack of reference genes in public databases.
However, one interesting finding among this category of gene products was the detection of ORFs predicted to encode an RNA-dependent DNA polymerase (IPR000477), which we term ert (Eggerthella reverse transcriptase), among prophage clades 2, 3, 4, 7 and 10; these appear to form part of a diversity-generating retroelement (DGR). Among Bordetella temperate phages (BPP-1, BIP-1 and BMP-1), an RNA-dependent DNA polymerase has been demonstrated to act as part of a system that introduces nucleotide substitutions into genes for virion proteins involved in host specificity, resulting in phase variation [84]. It is likely that a similar system exists among E. lenta prophages. Members of clade 7 prophages were found to be highly related at the nucleotide level (Figure 3) apart from the 3′ terminus of the mtd (major tropism determinant) gene, which contains the variable region (VR) and is located downstream of genes encoding virion structural proteins. The VR is a near-identical copy (BLASTn identity >90%) of the template repeat (TR), a 130 bp region located upstream of the 5′ terminus of ert. The architecture of this system in clade 7 Eggerthella prophages resembles that of the previously discussed Bordetella phages, even including a possible equivalent of the avd (accessory variability determinant) gene situated between the TR region and the mtd gene, the product of which complexes with Ert (the equivalent of Brt among Bordetella phages) and facilitates the DGR process [85]. The brt gene product uses the TR region to drive site-specific mutagenesis of the VR region [86]. This DGR architecture, with ert, mtd and predicted avd genes as well as the TR and VR regions, is similar among prophages in clades 3, 4 and 10. However, a slightly different configuration is observed among prophages of clade 2 (Figure 6A).
Among these prophages, the TR region lies within the ert gene and is 123 bp long, and once again the VR region (BLASTn identity >90% to TR) is positioned towards the 5′ end of the mtd gene. Alignment of the VR regions between members of prophage clade 2 shows a high density of nucleotide substitutions. We could not assign a role to the mtd gene among these prophages, but it is suspected to play a role in virion structure due to its proximity to ORFs implicated in such a function. For Bordetella phage BPP-1, mtd has been identified to encode a protein that forms part of the tail fibre of the virion [87]. We were able to obtain evidence for the functionality of the DGR element of a clade 4 prophage associated with E. lenta strain DSM2243 (type strain). After fermentation in a chemostat (24 h as a batch, then 72 h as continuous), we isolated randomly selected colonies and subjected them to genome sequencing (average coverage >300-fold). Analysis of the VR region of the mtd gene among such isolates shows nucleotide substitutions at 10 different sites, enabling DSM2243phi4 to explore a sequence space of potentially up to 10^6 unique variants at this locus (Figure 6B). The finding also indicates that the system is active while the prophage remains in its lysogenic state.
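The figure of up to 10^6 variants is consistent with a simple combinatorial estimate, sketched below. It assumes that each of the 10 variable sites can independently take any of the four nucleotides, which is our reading rather than a calculation stated in the text; the function name is hypothetical.

```python
def dgr_sequence_space(variable_sites, alphabet_size=4):
    """Number of distinct nucleotide sequences if every variable site can
    independently hold any of `alphabet_size` bases."""
    if variable_sites < 0 or alphabet_size < 1:
        raise ValueError("sites must be non-negative and alphabet size positive")
    return alphabet_size ** variable_sites


variants = dgr_sequence_space(10)
print(variants)            # 1048576
print(f"{variants:.1e}")   # 1.0e+06
```

This is an upper bound at the nucleotide level; the number of distinct protein variants would be lower because of synonymous codons.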
in cell culture, indicating the bacterium strongly associates with the intestinal gut wall [83]. A recent study conducted on prophages of another Actinobacteria genus identified that phages infecting Bifidobacterium possess the so-called Rin system, with an RBP-locus tyrosine-family DNA invertase, implicated in conferring diversity in host range specificity [76]. Similarly to what was observed for Bifidobacterium, alignment of E. lenta prophages of clade 1 showed high gene synteny except for one location directly downstream of a tyrosine recombinase (Rin) (PS51898), where several small tandemly oriented genes (Rv) are also located, often encoding small proteins with an H-type lectin domain (Figure 3). These domains are typically involved in carbohydrate binding and can be found associated with phage receptor-binding proteins (RBPs), suggesting that these genes of this prophage clade are implicated in host specificity [62]. These Rv genes can have homology with the C-terminus of a much larger gene (Rc) located directly downstream, indicating that gene recombination and shuffling occur at these loci. Furthermore, we identified a short asymmetric 8 bp repeat (5′-ttccgtat-3′) upstream and downstream of each Rv gene. This repeat sequence can be found inverted downstream of the Rc gene just after a stop codon, as well as within the gene itself. These are expected to be the crossover sites (rix) that allow inversion to occur. This repeat sequence does not commonly appear in other regions of this prophage clade. Like the system found in Bifidobacterium prophages, Rv genes possess limited homology to each other (Figure 5A), and the tyrosine recombinase is distinct from the tyrosine integrase, which possesses a similar domain architecture and is located at the opposite end of the genome of E. lenta clade 1 prophages (Figure 5B).
Prophages among several bacterial species can also possess genes that impact host fitness. This also appears to be the case for E. lenta, in which prophages of clade 5 possess an operon of up to six genes implicated in exopolysaccharide biosynthesis (or possibly capsular polysaccharides). Three gene products can be broadly classed as polysaccharide transferases, one of which possesses the polysaccharide pyruvyl transferase domain (IPR007345) associated with WcaK of Escherichia coli, implicated in the formation of colanic acid. Another of the proteins possesses the UDP-N-acetylglucosamine 2-epimerase WecB-like domain (IPR029767), which is related to WecB in Enterobacteriaceae, implicated in the formation of a surface antigen polysaccharide.
Other interesting accessory genes include a toxin-antitoxin system in prophage clade 6 that could help ensure the retention of the prophage in daughter cells, as seen with phage N15 of Escherichia coli [88]. Furthermore, an abortive infection (ABI) system protein (IPR011664) is present in members of prophage clades 5 and 7. It is tempting to speculate that such proteins may play a role in preventing infection of the host by competing phages.

Taxonomic Placement of Prophages
To better understand the taxonomic position of E. lenta prophages relative to currently available phage genomes, we conducted a network-based analysis of shared protein clusters using vConTACT2 and a database of 12,892 phage genomes (Figure 7). The analysis placed the 33 prophages into six viral clusters (Supplementary Information S2, Table S6), which approximate to genus-level ranking in relation to ICTV classifications [54]. However, this placement has likely resulted from under-sampling of the genomes available to represent each prophage clade, as the diversity of each prophage clade appears sufficient for genus designation (shared nucleotide sequence similarity >50%) [89]. These clusters were positioned within a complex network comprising phages belonging to Siphoviridae and containing several defined phage genera, with bacterial hosts mostly situated among the phylum Firmicutes but also some representatives of Actinobacteria. Ten phage genera could be identified within this cluster, all infecting members of the Firmicutes. The closest defined genus that could be associated with the E. lenta prophages was Cequinduevirus (average edge weight 5.20), which was linked to prophage clade 9. Moreover, analysis with VIPtree further indicates a distant relationship between E. lenta prophages and Cequinduevirus (Supplementary Information S2, Figure S8). This genus comprises phages infecting the genus Lactobacillus (type phage: Lactobacillus phage c5). These phages possess a Siphoviridae morphology and are suspected of having evolved from a lineage of phage that was once temperate, as they possess proteins similar to Cro-like repressors but now lack genes encoding other proteins necessary for this lifestyle [90].
Genome alignment of clade 9 prophages to those of Cequinduevirus shows that there is homology between several proteins implicated in virion capsid formation and DNA packaging (Supplementary Information S2, Figure S9). This suggests that these phages likely utilise a DNA packaging strategy similar to that of Lactobacillus phage c5, which utilises a cos-type system [90].

Prophages and the CRISPR/cas System
E. lenta possesses a type I-C CRISPR/cas system that has been demonstrated to be functional and is understood to target mobile genetic elements such as plasmids and prophages [26]. However, the impact of this CRISPR/cas system on the prophages of E. lenta has yet to be described. Among the 58 genomes of E. lenta examined in this study, 44 were identified to possess a spacer array. These arrays possess between 13 and 104 spacers (median 52) with a median size of 34 nucleotides. In total, 2283 spacers (555 unique) were identified among their genomes, with 283 spacers (46 unique) found to target prophages of this species.
Only two of the 44 E. lenta genomes with identified CRISPR spacers did not target the prophages identified in this study. The remaining genomes were found to have arrays that harbour at least one spacer, with a maximum of three nucleotide mismatches, that could target a representative of a single prophage clade (7/44) (Figure 7), with others targeting two (7/44), three (27/44) or even four different prophage clades (1/44). Our analysis also shows that a single E. lenta genome can harbour up to six spacers targeting a single prophage clade, while it is also possible for the same spacer to be present in up to 10 different strains (Supplementary Information S1, Table S6). The majority of unique spacers (43 out of 46) were identified to target protein-coding regions, often predicted to encode core phage functions (terminase, portal protein, integrase), with the targeting of such regions likely to provide efficient immunity (Supplementary Information S1, Table S7).
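The matching criterion used above (a spacer hit tolerating at most three nucleotide mismatches) can be illustrated with a minimal sketch. This is not the pipeline used in this study; it is a naive sliding-window comparison on both strands, and the function names and toy sequences are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def spacer_hits(spacer: str, target: str, max_mismatches: int = 3):
    """Return (position, strand, mismatches) for each window of the target
    that matches the spacer within the mismatch limit. Positions on the
    '-' strand are given in reverse-complement coordinates."""
    spacer, target = spacer.upper(), target.upper()
    n = len(spacer)
    hits = []
    for strand, seq in (("+", target), ("-", revcomp(target))):
        for i in range(len(seq) - n + 1):
            d = hamming(spacer, seq[i:i + n])
            if d <= max_mismatches:
                hits.append((i, strand, d))
    return hits


# Toy example (made-up sequences): the protospacer differs from the
# spacer at a single position and starts at index 5 of the target.
toy_spacer = "ACGTACGTAC"
toy_target = "TTTTT" + "ACGTACCTAC" + "GGGGG"
print(spacer_hits(toy_spacer, toy_target))
```

With real spacers of around 34 nucleotides, an alignment tool would be the practical choice for whole-genome searches; the sketch only makes the mismatch threshold explicit.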
The different prophage clades are not evenly targeted by the CRISPR/cas system among E. lenta genomes. We identified spacers targeting prophage clades 1, 4, 6, 8 and 9, while the other clades seem to be unaffected. This observation is unlikely to result from the uneven number of prophage genomes representing each clade in this study. For example, prophage clade 1 and clade 7 are the most populated, containing 8 and 10 genomes, respectively. However, no spacers were discovered to target the latter clade (Figure 8). This likely indicates that some prophage clades, such as clade 7, possess a mechanism of resistance to CRISPR spacer targeting. Furthermore, it is also possible for a host to harbour a spacer targeting a prophage infecting it. This was observed for isolates BSD2780120875_150330_C12, AB8#2, APC055-529-1D and MR1#12. However, only in the case of isolate AB8#2 did we identify the canonical 5′-TCC PAM sequence flanking the target, suggesting that this case may represent the only example where a prophage is being actively targeted by the host CRISPR/cas system [26].
Figure 8. Heatmap illustrating spacer sequence hits among the CRISPR array of different E. lenta isolates (y-axis) and E. lenta prophage clades (x-axis) with the number of representatives for each clade identified in brackets. The analysis utilised 283 spacers from CRISPR arrays of 44 E. lenta strains identified to target prophages of the species.

Evidence That Prophages Are Functional
We generated in silico and experimental evidence indicating that the prophage elements identified in this study represent active prophages rather than domesticated elements.
Phylogenetic analysis of E. lenta strains shows that their relatedness cannot fully explain the presence of these elements among host genomes. Our phylogenetic analysis has highlighted that distantly related isolates can share prophages while those more closely related may not, showing that the gain and loss of these elements represent independent events from the evolutionary history of their hosts (Figure 9).
Figure 9. Phylogram constructed from concatenated core genes as predicted by ROARY with protein alignment using PRANK and tree constructed by FASTTree; bootstrap branch support indicated with values from 0 to 1. The table indicates the presence or absence of representatives of a particular prophage clade integrated into the genome of the bacterial isolate described.

Analysis of read depth across prophage regions in comparison to that of the host genome shows that in 16 of the 22 examined cases (isolates obtained in this study and others), prophage sequence coverage was 1.1 to 5.2 times greater than that of the host (Supplementary Information S2, Figure S10). This observation was made among prophages belonging to clades 1, 2, 3, 4 and 7, implying that prophage genomes are present in higher copy numbers than those of their hosts.
Furthermore, we could demonstrate the likely presence of virions of DSM2243phi4 when the supernatant of a culture of strain DSM2243 was subjected to genome sequencing following treatment with DNase. Sequence coverage was 26-fold greater for the genome of DSM2243phi4 than for that of the host: 44,215 reads mapped to the prophage with an average coverage of 474, compared to 442,145 reads that aligned to the host with an average coverage of 18.
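The fold difference quoted above follows directly from the two average coverage values. The short sketch below simply reproduces that ratio; the variable names are illustrative and the calculation is not part of the original analysis workflow.

```python
# Average coverage values reported for the DNase-treated supernatant
prophage_coverage = 474  # DSM2243phi4
host_coverage = 18       # DSM2243 chromosome

fold_enrichment = prophage_coverage / host_coverage
print(f"Prophage coverage is {fold_enrichment:.1f}-fold that of the host")
```

The ratio (about 26.3) is consistent with the 26-fold enrichment reported in the text.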
The process of prophage excision results in the circularisation of its genome and restoration of the attP locus. We assessed the presence of the attP locus by PCR amplification specific for this region among prophages 14Aphi1, Valeniciaphi2, DSM2243phi4, 1-1-60FAAphi6 and 14Aphi7, representing clades 1, 2, 4, 6 and 7, respectively. This analysis verified that prophage genome circularisation could be observed in all tested cases (Supplementary Information, Figure S11), with sequenced amplicons aligning to their predicted loci for these prophages. Moreover, the predicted attP of clade 1 prophages was further confirmed by alignment of the attP PCR amplicon of 14Aphi1 to the targeted tRNA gene (using strain DSM2243 as reference), which acts as the attB site of this prophage (Table 2). However, alignment of the attP site amplicons of 14Aphi7, 1-1-60FAAphi6, Valeniciaphi2 and DSM2243phi4 in a similar manner did not give any insight into their respective attB sites.

Discussion
E. lenta is a bacterium of the human gastrointestinal tract implicated in the metabolism of medical and dietary compounds. To improve understanding of this species' role in relation to human health, routine strategies must be developed to enable its isolation and cultivation. To date, there are few descriptions of methodology for isolating E. lenta from the human gut microbiome using selective growth. In this study, we devised a simple strategy utilising BHI++ medium supplemented with β-lactam antibiotics (ceftriaxone and cefotaxime). The use of this medium allowed the selective isolation of E. lenta strains directly from human faeces (from eight of ten inspected individuals), based on their growth and characteristic colony morphology (colonies with dark yellow/brown pigmentation). Other bacteria identified to grow on this selective medium include B. fragilis and B. uniformis; both species are normally found in the gut microbiota of the human colon and have been documented to possess resistance to β-lactam antibiotics due to β-lactamase production [91,92].
To further understand the parameters that potentially influence the diversity and colonisation of E. lenta in the human gut, we investigated the prophages that infect this species. Genome sequencing and comparative genomics of seven newly sequenced and 50 publicly available E. lenta isolates allowed us to establish that 5.6% of the orthologous groups (OGs) that form the pangenome of E. lenta can be associated with prophages. This value is within the range reported for prophages infecting other Actinobacteria, estimated at between 2% and 6.7% of the OGs forming the Bifidobacterium pangenome [77,93].
These prophages could be placed into ten distantly related clades. Based on criteria set down by the International Committee on Taxonomy of Viruses (ICTV), these clades achieve genus designation due to their shared nucleotide homology, gene synteny and a similar number of CDSs and tRNA genes [89]. The novelty of these prophages is striking, as indicated by their position among 12,892 phage genomes using vConTACT2. This analysis placed our newly identified prophages among a complex cluster of phages infecting members of the phylum Firmicutes, with their closest related genus being Cequinduevirus, whose members infect the genus Lactobacillus. This analysis highlights that the phage genomes that currently reside in public databases are skewed towards phages that infect other bacteria and possess limited homology to those of E. lenta. As of 2021, NCBI virus (https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/ accessed on 2 May 2020) shows that phage host entries that can be placed in the phylum Actinobacteria represent approximately 9% of total entries. However, there is only one phage entry under the class Coriobacteriia, which contains E. lenta. The identification of these prophage genomes should enrich public databases, enabling better identification of such phages in metagenomic studies and improving bioinformatic tools that identify prophage sequences among bacterial genomes.
Our comparative analysis allowed the identification of a rin shufflon among one of these prophage clades (clade 1), while a DGR element was found among five others (clades 2, 3, 4, 7 and 10). These systems appear to act on genes implicated in phage host range (likely receptor binding proteins), likely causing their diversification and influencing host range. The rin shufflon has been described among prophages of Bifidobacterium [76], while DGR elements have been reported to be highly prevalent in the human microbiome [94], where they are found among phages infecting species of the phyla Proteobacteria, Firmicutes and Bacteroidetes [84,95,96]. Their presence has also been indicated among phages of Actinobacteria of the human microbiome, with prophage elements of E. lenta having been previously flagged [96]. This study confirms their widespread prevalence among prophages infecting this species. We also obtained evidence that prophages may impact host fitness, with prophages of clade 5 possessing genes implicated in exopolysaccharide biosynthesis (or possibly capsular polysaccharides).
The CRISPR/cas system of E. lenta has previously been demonstrated to be functional and reported to target prophages. We confirm this finding, determining that 8.3% of the total unique spacers identified among CRISPR arrays of the 44 E. lenta genomes examined in this study target prophages of five of the ten E. lenta prophage clades found in this study. These prophages are not uniformly targeted by this defence system, suggesting that some of them may utilise a defence mechanism against it. One such mechanism is the utilisation of anti-CRISPR proteins, which have been documented among phages infecting species of several different bacterial families and have been found to occur among lysogenic phages of Pseudomonas [97,98]. These observations may explain the lack of an obvious correlation between the presence or absence of a CRISPR/cas system and the number of prophages associated with a host genome. As has been determined for other Actinobacteria genera such as Bifidobacterium, the presence of CRISPR/cas does not mean that a bacterial genome will possess fewer prophage elements [93].
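The proportion of prophage-targeting spacers can be recomputed from the spacer counts reported in the Results. The sketch below is a simple arithmetic check with illustrative variable names; it is not part of the original analysis.

```python
# Spacer counts reported for the 44 E. lenta genomes with CRISPR arrays
total_spacers = 2283
unique_spacers = 555
prophage_spacers = 283         # spacers targeting prophages
unique_prophage_spacers = 46   # unique spacers targeting prophages

pct_unique = 100 * unique_prophage_spacers / unique_spacers
pct_all = 100 * prophage_spacers / total_spacers

print(f"Unique spacers targeting prophages: {pct_unique:.1f}%")
print(f"All spacers targeting prophages:    {pct_all:.1f}%")
```

The first value reproduces the 8.3% quoted above; the second shows the same proportion when redundant spacers are counted.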
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/microorganisms10010195/s1. Figure S1. Colony morphologies observed for presumptive E. lenta colonies. Figure S2. E. lenta isolates APC-FCC8 and APC055-920-1E Gram stained and examined under light microscopy. Figure S3. ANI % of 57 E. lenta genomes and E. sinensis DSM16107. Figure S4. Presence/absence heatmap of genes among E. lenta isolates involved in the activation/inactivation of drugs and dietary compounds. Figure S5. VICTOR-generated phylogenomic Genome-BLAST Distance Phylogeny (GBDP) trees of E. lenta prophages. Figure S6. Number of OGs annotated with a particular function found in the pangenome of E. lenta prophages. Figure S7. A maximum likelihood phylogenetic tree of the large terminase proteins of prophages belonging to different prophage clades of E. lenta. Figure S8. Analysis of E. lenta prophages using VIPtree, which indicates a distant relationship with phages of the genus Cequinduevirus. Figure S9. Comparison of the genomes of E. lenta clade 9 prophages with Lactobacillus phage c5 employing TBLASTX and visualised with Easyfig. Figure S10. Ratio of median coverage of different prophages vs median coverage of their entire bacterial host's genome. Figure S11. PCR targeting the attP of prophages among different strains of E. lenta. Table S1. Details of strains utilised in this study for experimental and genomic analysis. Table S2. Details for E. lenta isolates obtained in this study.