Small and Equipped: the Rich Repertoire of Antibiotic Resistance Genes in Candidate Phyla Radiation Genomes

ABSTRACT Microbes belonging to Candidate Phyla Radiation (CPR) have joined the tree of life as a new branch, thanks to the intensive application of metagenomics and sequencing technologies. CPR have since been identified by 16S rRNA analysis, and they represent more than 26% of microbial diversity. Despite their ultrasmall size, reduced genome, and metabolic pathways which mainly depend on exosymbiotic or exoparasitic relationships with the bacterial host, CPR microbes were found to be abundant in almost all environments. They can be considered survivors in highly competitive circumstances within microbial communities. However, their defense mechanisms and phenotypic characteristics remain poorly explored. Here, we conducted a thorough in silico analysis on 4,062 CPR genomes to search for antibiotic resistance (AR)-like enzymes using BLASTp and functional domain predictions against an exhaustive consensus AR database and conserved domain database (CDD), respectively. Our findings showed that a rich reservoir of divergent AR-like genes (n = 30,545 hits, mean = 7.5 hits/genome [0 to 41]) was distributed across the 13 CPR superphyla. These AR-like genes encode 89 different enzymes that are associated with 14 different chemical classes of antimicrobials. Most hits found (93.6%) were linked to glycopeptide, beta-lactam, macrolide-lincosamide-streptogramin (MLS), tetracycline, and aminoglycoside resistance. Moreover, two AR profiles were discerned for the Microgenomates group and “Candidatus Parcubacteria,” which were distinct from each other and from all other CPR superphyla. CPR cells seem to be active players during microbial competitive interactions; they are well equipped for microbial combat in different habitats, which ensures their natural survival and continued existence.
IMPORTANCE To our knowledge, this is one of the few studies to characterize the defense systems in the CPR group, and it describes the first repertoire of antibiotic resistance (AR) genes in these microbes. A BLAST approach with lenient criteria, followed by a careful examination of the functional domains, yielded a variety of enzymes that confer resistance mainly through three different mechanisms of action. Our genome analysis showed the existence of a rich reservoir of CPR resistome, which is associated with different antibiotic families. Moreover, this analysis revealed the hidden face of the reduced-genome CPR, particularly their weaponry of AR genes. These data suggest that CPR are competitive players in the microbial war and that they can be distinguished by specific AR profiles.

new divisions (2). Representatives of these divisions have been moved out of the group of undiscovered living organisms (microbial dark matter) (3). Among these discoveries, many questions have been raised about a new group of microbes which is close to bacteria but quite unique and known as Candidate Phyla Radiation (CPR) (4,5).
CPR are a group of highly distinct and abundant ultrasmall microbes (6,7), which represent more than 26% of known bacterial diversity (3). These microbes are characterized by their reduced-size genomes (8) and the occurrence of a high percentage of unknown-function proteins (9). Recently, a comparative study of protein families between CPR and bacteria showed that CPR have a prevalence of proteins involved in a symbiotic lifestyle and interaction with other microbes (9,10). This interaction is required, since they have minimal metabolic capabilities (11). Therefore, they are highly auxotrophic with a lack of essential encoding genes for some pathways which are critical to the autonomous lifestyle (12).
Paradoxically, the lack of these genes can sometimes help them to survive in their habitat (13). For example, despite the absence of a viral CRISPR defense system in Patescibacteria (the superphylum that contains most CPR genomes), members of this superphylum can escape bacteriophage attacks (attachment) by the natural suppression of common phage membrane receptors (14).
In addition, these newly described microbes are considered uncultured bacteria to date. They have been detected thanks to metagenomic or metabarcoding analyses of their rRNA sequences (4). However, a few members of the Saccharibacteria superphylum have been isolated in coculture with an obligatory bacterial host (15). This association is essential for their viability and growth (15). The first cocultured Saccharibacteria strain (TM7x, "Candidatus Nanosynbacter lyticus") was obtained in 2015 using selection based on its streptomycin resistance (15). So far, a few members of this superphylum have been successfully cocultured with different bacterial hosts by different protocols (15-17).
Moreover, according to metagenomic analyses of ancient DNA, CPR microbes have been reported in ancient samples of Neanderthal calcified dental plaque (calculus) dated thousands of years ago (29). In parallel, survival strategies, including antibiotic resistance (AR) gene components, have long been reported in the microbial world (30,31). Various studies have shown the natural existence of AR genes in microorganisms even before the discovery and introduction of antibiotics by humans in the mid-20th century (32). These AR genes have also been detected from ancient samples dating back millions of years in diverse environments (32). The mechanisms of AR are due to the absence of antibiotic targets, their modification following a mutation on preexisting genes, or the presence of protein-coding genes (33). Some genes can inactivate the antibiotic by enzymatic activity, while other genes confer AR by target protection or alteration (33).
Given that CPR members (i) are widely spread in different ecological niches and microbiomes, (ii) have never been isolated and grown in pure culture, and (iii) have a high number of unknown biosynthetic activities within their genomes, few if any studies have investigated the defense mechanisms and competing behavior of CPR cells. In fact, the survival strategies of CPR members, namely, the AR gene components that they may express against other microbes in different hostile/competitive environments, have not yet been explored. For this purpose, we describe the first repertoire of AR genes in CPR genomes, obtained by in silico analysis using BLAST and a search for functional domains. We found that CPR members are also players in this microbial "infinity war."

RESULTS
CPR microbes contain vastly divergent AR-like genes according to reference bacterial protein databases. In this study, we developed an adaptive strategy for the specific detection of AR-like genes in the 4,062 CPR genomes tested (Fig. 1; see also Table S1 in the supplemental material) and assigned them according to the available taxonomy of NCBI into 10 CPR superphyla. Only the superphylum Patescibacteria was divided into 4 groups or phyla: Parcubacteria, Gracilibacteria, Microgenomates, and unclassified Patescibacteria. The simple BLASTp search of the 3,654,820 CPR protein sequences predicted using the RAST server, against a total of 12,033 AR protein sequences, resulted in 320,121 matches (Fig. 1). Reciprocal BLASTp followed by the search for conserved protein functional domains against the conserved domain database (CDD) led to 175,238 potential AR genes (Fig. 1; see also Materials and Methods). We then focused only on enzyme-encoding genes that confer resistance to a given antibiotic family. After eliminating all BLASTp matches (hits) that need further examination of mutations (144,693 hits), we retained a total of 30,545 hits, corresponding to 89 AR-like genes (Fig. 2; Tables S2 and S3). These genes constituted the target data set in our analysis and were considered the CPR resistome (Fig. 1). This data set is used for deciphering the high potential of proto-resistance genes as a deep reservoir of AR in these microorganisms.
Even though these potential resistance genes in CPR microbes had the same functional domains as bacteria, most AR gene sequences found in CPR had a similarity percentage ranging from 30% to 40% with bacterial AR gene sequences (Fig. 3). This highlights the divergence of the CPR sequences from those of bacteria. High divergence of CPR sequences was observed specifically in beta-lactam, aminoglycoside, fosfomycin, and phenicol antibiotic resistance families with low percentages of sequence similarity (20% to 30%) with those of the bacteria (Fig. 3). It is noteworthy that among the total of 30,545 detected AR-like genes, 10,443 were annotated as hypothetical proteins by the RAST server despite having the functional domain responsible for the resistance to a proper antibiotic family.
FIG 1 Study design. The first step consists of annotating the CPR genomes available on the NCBI website, using the RAST server. The CPR protein sequences are considered queries for BLASTp against consensus databases of bacterial antibiotic resistance (AR) genes. The analysis was performed with a minimum identity and coverage percentage of 20% and 40%, respectively, and a maximum E value of 0.0001. The AR preliminary hits resulting from the simple BLASTp are queried against the multiple databases of AR genes as performing a reciprocal BLASTp. Further analyses were undertaken to detect the protein functional domain for hits with enzymatic activity using the conserved domain database (CDD). Finally, bibliographical research was conducted to select enzymes conferring resistance with specific mechanisms of actions as CPR resistome.
In addition, we found AR hits in almost all studied CPR genomes across different superphyla; thus, 4,052 genomes were positive through our analysis out of 4,062 genomes tested (99.75%). The prevalence of the AR content was fairly diversified between the CPR superphyla, as the number of their available genomes was not homogeneous (Fig. 2 and 4; see also Tables S2 and S3). Furthermore, each CPR superphylum holds AR genes to at least six different classes of antibiotics, and they have nearly the same distribution of AR hits (Fig. 5). The different CPR superphyla had in common AR genes to five antibiotic families, namely, glycopeptide, beta-lactam, MLS, tetracycline, and aminoglycoside, which highlight the importance of the function of these AR hits in CPR genomes.
FIG 2 Multi-informative heat map of antibiotic resistance (AR)-like genes in CPR genomes. Detection of 30,545 AR-like genes in 4,062 CPR genomes using an adapted AR screening strategy. The abundance of each AR-like gene on each CPR phylum is relative to the total number of AR-like genes found in all CPR phyla (number of AR-like genes found in CPR phylum divided by the total hit number of this AR family). MLS indicates the merging of the three antibiotic families macrolide, lincosamide, and streptogramin. "Others*" indicates the merging of five antibiotic families with fewer AR-like genes: pyrazinamide, nitroimidazole, bacitracin, colistin, and fusidic acid. A cross indicates an antibiotic that acts on the cell wall, a filled triangle indicates an antibiotic that acts on the ribosome, and a filled square indicates an antibiotic that acts on the nucleic acid. A star indicates AR-like genes that confer resistance by antibiotic-inactivating enzymes, an open circle indicates AR-like genes that confer resistance by antibiotic target alteration, and an open square indicates AR-like genes that confer resistance by antibiotic target protection. "Other CPR phyla*" indicates the merging of all "Candidatus" CPR phyla with fewer than 100 genomes: "Candidatus Berkelbacteria," "Candidatus Doudnabacteria," "Candidatus Wirthbacteria," Candidate division Kazan, "Candidatus Dojkabacteria," "Candidatus Absconditabacteria," and "Candidatus Gracilibacteria." The cladogram of CPR superphyla was based on the clustering of the heat map.
The prevalence of detected enzymes according to each chemical antibiotic class. In this part, we examined all chemical classes of antibiotics for which we have detected hits in all CPR superphyla. Glycopeptide resistance hits were found to be the most abundant AR-like genes. Glycopeptide antibiotics destabilize the cell wall by interfering with peptidoglycan synthesis (34). Resistance to glycopeptides (particularly vancomycin) involves the modification of the antibiotic target D-alanine:D-alanine into D-alanine:D-lactate or D-alanine:D-serine (34). Since vancomycin resistance is mediated by a cluster of genes including essential, regulatory, and accessory genes, we searched within CPR genomes for the presence of at least the three essential genes in the cluster (35). The clusters can be classified into nine types based on their genetic sequences and structures: vanA, vanB, vanC, vanD, vanE, vanG, vanL, vanM, and vanN (35).
Forty-eight of the CPR genomes have a potential for vancomycin resistance as they carry the three essential genes for the functioning of a given cluster. We looked for the gene that gives the cluster name, plus vanH and vanX for D-Ala:D-Lac clusters and vanX and vanT for D-Ala:D-Ser clusters. We found a total of 18 D-Ala:D-Lac vancomycin clusters, including nine vanA clusters, three vanB clusters, five vanD clusters, and one vanM cluster. For D-Ala:D-Ser ligase gene clusters, we found 25 vanC clusters, one vanE cluster, three vanG clusters, three vanL clusters, and two vanN clusters (a total of 34 D-Ala:D-Ser vancomycin clusters) (Fig. S1). Of these 48 genomes, four had two different types of vancomycin clusters: one genome presented the essential genes of the vanB and vanD clusters, one genome had vanA and vanC clusters, and two genomes had vanL and vanN clusters (Fig. S1). More analysis is needed to search for the presence of other components (such as regulatory and accessory genes) and to verify the synteny of these genes as they participate together in the correct functioning of the vancomycin cluster.
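The essential-gene rule and the cluster tally described above can be written out as a short Python sketch. This is only an illustration of the stated rule, not the code used in the study; the function names and the representation of a genome as a plain list of gene names are assumptions made for the example.

```python
# Cluster types as grouped in the text above.
D_LAC_TYPES = {"vanA", "vanB", "vanD", "vanM"}          # D-Ala:D-Lac clusters
D_SER_TYPES = {"vanC", "vanE", "vanG", "vanL", "vanN"}  # D-Ala:D-Ser clusters


def essential_genes(cluster_type):
    """Return the three essential genes required for one cluster type."""
    if cluster_type in D_LAC_TYPES:
        return {cluster_type, "vanH", "vanX"}
    if cluster_type in D_SER_TYPES:
        return {cluster_type, "vanX", "vanT"}
    raise ValueError(f"unknown cluster type: {cluster_type}")


def complete_clusters(genome_genes):
    """List the cluster types whose three essential genes are all present."""
    present = set(genome_genes)
    return sorted(
        c for c in D_LAC_TYPES | D_SER_TYPES
        if essential_genes(c) <= present
    )


# Cluster counts reported in the text.
d_lac_counts = {"vanA": 9, "vanB": 3, "vanD": 5, "vanM": 1}
d_ser_counts = {"vanC": 25, "vanE": 1, "vanG": 3, "vanL": 3, "vanN": 2}
total_clusters = sum(d_lac_counts.values()) + sum(d_ser_counts.values())

# 48 genomes, 4 of which carry two cluster types, account for the same total.
genomes_with_clusters = 48
genomes_with_two_types = 4
```

Under this rule, a hypothetical genome carrying vanB, vanD, vanH, and vanX would be counted for both the vanB and vanD clusters, which is how 48 genomes can yield 52 clusters (18 D-Ala:D-Lac plus 34 D-Ala:D-Ser).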
Given that CPR members have very small genomes in comparison with other microorganisms (2, 4), it is advantageous for these microbes to have multifunctional genes, such as beta-lactamases (36). The beta-lactam resistance enzymes (namely, beta-lactamases) hydrolyze the beta-lactam ring in the molecular structure of the antibiotic (37). The 5,759 beta-lactam resistance hits belong to four different classes (A, B, C, and D) (Fig. 2; see also Tables S2 and S3). Class B metallo-beta-lactamases are the most frequent, representing 58.3% of those detected (3,359 of 5,759 hits) (Tables S2 and S3). These hits were assigned to three subclasses of metallo-beta-lactamases according to the annotation of the CDD results: 17 hits belong to subclass B1, 385 hits to subclass B2, and 2,957 hits to subclass B3. The remaining 2,400 hits are serine-beta-lactamases, which account for 27.9% (class A), 0.5% (class C), and 13.3% (class D) of all beta-lactam resistance hits (Tables S2 and S3).
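The beta-lactamase figures above can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the counts and percentages quoted in the text; the variable names are ours, and the check assumes that the class A, C, and D percentages are expressed relative to all 5,759 beta-lactam hits.

```python
total_hits = 5759

# Class B (metallo-beta-lactamase) subclasses from the CDD annotation.
class_b = {"B1": 17, "B2": 385, "B3": 2957}
class_b_total = sum(class_b.values())

# Everything that is not class B is a serine-beta-lactamase (classes A, C, D).
serine_total = total_hits - class_b_total

class_b_pct = round(100 * class_b_total / total_hits, 1)
serine_pct = round(100 * serine_total / total_hits, 1)

# Percentages reported in the text for the serine classes.
reported_serine_pct = {"A": 27.9, "C": 0.5, "D": 13.3}
reported_serine_sum = round(sum(reported_serine_pct.values()), 1)
```

The subclass counts sum to 3,359, the class B share rounds to 58.3%, and the three serine-class percentages sum to the remaining 41.7%, so the reported numbers are mutually consistent under that assumption.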
Macrolide-lincosamide-streptogramin (MLS) antibiotics are chemically distinct, but because of their similar mechanism of action, they are classified in the same group (38). MLS antibiotics act on the 23S rRNA of the 50S ribosomal subunit (39,40). The most common genes (erm [n = 3,077 hits] and cfr [n = 648 hits]) (Fig. 2; see also Tables S2 and S3) detected in the CPR genomes are involved in MLS resistance by altering the MLS target through methylation of the 23S rRNA, followed by streptogramin acetyltransferase (vat; n = 338 hits) with MLS-inactivating enzyme activity (Tables S2 and S3). In addition, our study showed aminoglycoside resistance hits in all CPR superphyla with different transferase activities including adenylyltransferase, phosphotransferase, and acetyltransferase. Aminoglycosides are a family of molecules containing an aminocyclitol ring; they bind to the A site of the ribosome and disrupt protein synthesis (41). The majority of aminoglycoside resistance-like genes code for acetyltransferase activity, of which the most abundant genes are aac (aminoglycoside acetyltransferase, n = 1,831) and gna (gentamicin acetyltransferase) (Tables S2 and S3). Finally, almost all tetracycline resistance hits confer resistance through ribosomal protection. The genes encoding tetracycline resistance ribosomal protection proteins in the CPR were tetT (n = 2,243), tetBP (n = 778), and tetW (n = 564) (Fig. 2; Tables S2 and S3).
Antibiotic resistance profile according to CPR superphyla. Based on our AR screening strategy (Fig. 1), only 10 genomes out of the 4,062 analyzed genomes were found to not contain AR genes. The other genomes were found to be positive with a notable average of 7.5 AR-like genes per genome. The general distribution of hits classed according to antibiotic family was almost consistent in the various CPR superphyla, with some exceptions (Fig. 5), despite the high difference in the number of AR hits found between the Parcubacteria phylum including most CPR genomes and "Candidatus Wirthbacteria" with only two genomes (15,645 AR hits in 2,222 tested genomes compared to 21 AR hits, respectively) (Fig. 2, 4, and 5; see also Tables S2 and S3). Interestingly, CPR superphyla were clustered into three major groups according to their AR content and the abundance of the detected genes (Fig. 2; see also Fig. S2). The first group includes Parcubacteria genomes, the second includes Microgenomates genomes, and the last group includes the remaining CPR superphyla. Three different AR profiles were therefore identified for CPR superphyla. In the Microgenomates group, we observed a significant number of genes with adenylyltransferase (aad) and acetyltransferase (gna) activity against aminoglycosides, phosphorylation of fosfomycin (fomA), and a remarkable number of class A and D beta-lactamases (Tables S2 and S3). Moreover, this group of Microgenomates possesses the greatest number of cat (chloramphenicol acetyltransferase) enzyme-encoding genes, streptogramin acetyltransferase (vat) genes, and rifampin phosphotransferase (rph) genes detected among all CPR genomes. In contrast, taking Saccharibacteria as an example of the group of other CPR superphyla, members of this superphylum have a high number of streptogramin lyases (Tables S2 and S3).
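To make the comparison between raw hit counts and per-genome rates concrete, the figures quoted above can be recomputed as follows. This Python sketch uses only numbers given in the text; the variable names are illustrative.

```python
total_genomes = 4062
positive_genomes = 4052   # 4,062 genomes minus the 10 without AR hits
total_hits = 30545

# Share of genomes with at least one AR-like gene, and mean hits per genome.
prevalence_pct = round(100 * positive_genomes / total_genomes, 2)  # 99.75
mean_hits_per_genome = round(total_hits / total_genomes, 1)        # 7.5

# Per-genome rates for the largest and the smallest groups named in the text.
parcubacteria_rate = round(15645 / 2222, 1)   # 7.0 hits per genome
wirthbacteria_rate = round(21 / 2, 1)         # 10.5 hits per genome
```

Although Parcubacteria contribute several hundred times more hits than "Candidatus Wirthbacteria", the per-genome rates are of the same order, which matches the observation that the number of hits per group largely follows the number of available genomes.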
It should be noted that the more available genomes we analyzed, the more likely we were to detect additional antibiotic resistance families. This is the case for the Parcubacteria group, where bah (the amidohydrolase enzyme that inactivates bacitracin), icr (intrinsic colistin resistance enzyme), and fus (fusidic acid resistance enzyme) were found only in this CPR group (Fig. 2; see also Tables S2 and S3). Finally, considering the isolation source (i.e., human-associated genomes versus environmental genomes) for a given CPR lineage, each of the three CPR superphyla found regularly in humans, namely, Saccharibacteria, Gracilibacteria, and Absconditabacteria, showed quantitative differences in AR-like gene content compared to their environmental counterparts (Fig. 6; Fig. S3). In the Saccharibacteria superphylum, for example, the erm gene is more abundant in human-associated CPR genomes (258 hits over 71 genomes) than in those found in the environment (80 hits over 187 genomes) (Fig. 6; Table S4). This is also the case for the vat gene in the Gracilibacteria phylum (5 hits for human-associated CPR genomes and 1 hit for others recovered from environmental sources). Interestingly, genomes from the three human-associated CPR superphyla showed a higher average of AR-like genes per genome than others detected from the environment. This was most pronounced in the Absconditabacteria superphylum, with 9.8 AR-like genes/genome for those detected in humans and 2.75 for others found in the environment (Fig. 6; see also Table S4).
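Because the human-associated and environmental groups differ in size, the erm counts above are easier to compare once normalized per genome. The Python sketch below does this with the numbers quoted in the text; the variable names are ours, and no statistical test is implied.

```python
# erm hits and genome counts for Saccharibacteria, as quoted in the text.
erm_human_hits, human_genomes = 258, 71
erm_env_hits, env_genomes = 80, 187

erm_per_human_genome = round(erm_human_hits / human_genomes, 2)
erm_per_env_genome = round(erm_env_hits / env_genomes, 2)
erm_fold = round(
    (erm_human_hits / human_genomes) / (erm_env_hits / env_genomes), 1
)

# Mean AR-like genes per genome for Absconditabacteria, as quoted in the text.
abscondita_fold = round(9.8 / 2.75, 1)
```

On these figures, erm is found at roughly 3.6 copies per human-associated Saccharibacteria genome against roughly 0.4 per environmental genome, and human-associated Absconditabacteria carry about 3.6 times as many AR-like genes per genome as their environmental counterparts.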
To sum up, these results are suggestive of the influence of the CPR environment on its phenotypic characteristics and suggest a link between CPR members, other microbes, and their environment. Together, the high presence of AR-like genes in all CPR genomes may suggest that they have other vital functions for CPR cells and that therefore they are not strictly related to antibiotic resistance. They are likely to be functionally linked to other metabolic pathways and, subsequently, to participate in the survival of these microorganisms.

DISCUSSION
There are significant knowledge gaps in our understanding of the physiological and biological processes of CPR, as well as of their interactions with host bacteria and their potential associations with human pathologies. Thus, it is essential to expand our research on these living microorganisms, which represent a new branch in the tree of life (2). This study aimed to report the existence of AR in these microbes and to determine the AR profile of each CPR superphylum. These analyses may contribute toward a better elucidation of CPR phenotypic characteristics and defense mechanisms.
Here, we conducted a thorough in silico screening for AR in all CPR genomes available on NCBI, regardless of their quality and assembling methods, to detect the maximum of AR genes. The superphylum assignments were taken from NCBI and were not independently confirmed in our study, since our main aim is not reclassifying CPR but reporting AR genes in all their available genomes. Our analysis was based on a thoughtful strategy for these new microorganisms, using computational methods with adapted criteria. We revealed a rich repertoire of AR genes contained in almost all tested CPR genomes. We allocated the AR-like genes into families/groups to visualize the prevalence of AR genes in different CPR superphyla and, potentially, to find a correlation between genes encoding resistance to a particular antibiotic family and the superphylum of interest.
Since resistance has never been searched for in CPR before and given that CPR microbes have not yet been grown in pure culture without their bacterial host, their resistance can be explored only by in silico analysis for the moment. AR screening of CPR genomes by analyzing nucleotide sequences against a database of bacterial resistance genes (the classical method of AR profiling in the bacterial domain) (42) resulted in a negligible number of hits compared with our optimized strategy. Only five CPR genomes were positive with a total of 9 AR genes. These genes fall within the AR genes found using our optimized strategy with protein sequences of the CPR genomes tested (see Table S5 in the supplemental material).
It was critical to establish an adapted strategy for AR screening in CPR genomes, as their nucleotide and protein sequences differ from bacterial ones (1). Because of functional constraints, protein sequences evolve more slowly than DNA sequences (43). Accordingly, we used the protein sequences that were determined by RAST annotation, which gave low percentages of unannotated proteins (44). A high percentage of hypothetical proteins is nonetheless characteristic of CPR, because many of their metabolic pathways and biosynthetic capacities have not been determined yet (10).
Attempting to study a new branch of the tree of life when there is a huge lack of data is challenging. For this reason, less stringent BLAST parameters were used to achieve a more comprehensive exploration of the AR contents (9). Multiple AR gene databases were used to detect the maximum number of hits, since there is currently no specific AR database for CPR members. In addition, a reciprocal BLASTp search was performed to reduce the number of false-positive results. Then, we identified the functional protein domains for the detected hits to retain only the protein sequences with patterns related directly to AR and, more precisely, to enzymatic activities. These enzymes confer resistance by directly inactivating the corresponding antibiotic or by protecting or altering its target. The AR genes that require single nucleotide polymorphisms to generate resistance were discarded from the analyses since CPR sequences are neither comparable nor similar to those of bacteria. To benchmark our AR screening strategy, we ran the same analysis on two well-characterized conventional bacteria, Streptomyces coelicolor A3(2) and Streptomyces hygroscopicus (Fig. S4). Interestingly, after keeping only the enzyme-encoding genes containing AR domains (33 and 37 AR-like genes, respectively), we obtained results very similar to the RAST annotation (according to PATRIC) of the 2 tested genomes (35 and 34 AR genes, respectively) (Fig. S4). In summary, our strategy, and specifically the screening of AR domains, gives reliable results and can help to characterize new AR genes even in bacteria (45). Altogether, our multistep study design is intended to balance the intended function (specificity) against permissive stringency (sensitivity).
Nevertheless, despite our precautions to be as exhaustive as possible, this strategy might have missed some resistance genes, leading to false-negative results. It could be expected that CPR members have AR sequences that are significantly different from those of bacteria, with new patterns and undescribed resistance mechanisms. Even though they may have evolved from within bacteria or have emerged from an unknown protogenote with bacteria (9), they definitely have sequences divergent from those of bacteria due to rapid evolutionary phenomena (1). Indeed, the AR genes of CPR were not found to be very similar to the AR genes in the AR databases from bacteria. In addition to the resistance profiling found in this study, the possible presence of efflux pumps in CPR cells, as in all living microorganisms, which participate in the detoxification process by expelling various harmful and xenobiotic compounds, should not be overlooked. In particular, these include the multidrug efflux mechanisms which are normally encoded by the chromosome (46).
The surprising and somewhat paradoxical presence of AR-like genes in the reduced genomes of CPR raises questions about their origin and their real function in these microorganisms. These genes can give CPR advantages to resist antibiotics released by other microorganisms sharing a common ecological niche with them. Thanks to our strategy, 34.2% of the results (10,443 out of 30,545 AR-like genes) can now be reannotated as potential AR genes instead of being simple hypothetical proteins. The divergence observed in these sequences suggests that they may have other functions involved in different metabolic pathways rather than resistance to antibiotics. Further studies are needed to confirm their functions.
Concerning the resistance to the glycopeptide family, we could detect all nine types of vancomycin resistance cluster in the 4,062 CPR genomes, which covers the high diversity that has previously been described for these genes (35). However, because the function of these genes depends on their presence in an operon (33), the lack of synteny compared to the better-characterized vancomycin clusters seems to bring into question the effectiveness of this system. At first glance, it seems important to search for the additional components, including accessory and regulatory genes, to have a complete vancomycin cluster (33).
However, as described previously, the membrane of CPR cells is very similar to that of Gram-positive bacteria (9), which develop resistance to vancomycin by modifying the D-alanine:D-alanine peptidoglycan precursor (33). CPR microbes seem to have a natural presence of regulatory genes in their genomes; thus, regulators like vanR are present in all CPR genomes (100%) (data not shown). These genomes may naturally produce the D-Ala:D-Lac or D-Ala:D-Ser peptidoglycan precursors rather than the D-Ala:D-Ala precursor that is typical of bacteria. This would be consistent with these microorganisms surviving on a limited number of genes, that is, with an incomplete but functional cluster (i.e., no need for accessory genes, as their name indicates), and it supports the idea that the CPR genome is simple but efficient. Further analysis should be carried out to verify the AR conferred by the absence or modification of the antibiotic's targets, in addition to that conferred by the presence of active enzymes determined as part of this study.
Our results also show that there is almost one beta-lactam resistance gene per CPR genome; 77% of the tested genomes have at least one gene which codes for beta-lactamases (classes A, B, C, and D). These genes may play a role in the degradation of substances used in metabolic pathways, including beta-lactams. Several studies have shown that beta-lactamase genes are multifunctional genes which play several roles including, but not limited to, endonuclease, exonuclease, RNase, and hydrolase (47).
Furthermore, beta-lactamases have been detected in other life domains including bacteria (48), eukaryotes (49), and archaea (50) and therefore may also be present in CPR. It is very likely that the presence of multifunctional genes is necessary and indispensable in CPR members, due to their small genomes and the very reduced number of genes per genome compared to other microorganisms.
Interestingly, aminoglycoside resistance has been mentioned and used for the coculture of TM7x (a phylotype of "Candidatus Saccharibacteria") with its bacterial host, Schaalia odontolytica strain XH001 (15). The authors enriched TM7x through streptomycin selection, as its host is also highly resistant to streptomycin. It is likely that CPR members are resistant to aminoglycosides and other antibiotics targeting RNA. Besides having an uncommon ribosome composition/sequence, some CPR have introns in their 16S and 23S rRNA and tRNA (15). Given their tiny genomes, this feature may allow them to carry multifunctional genes, depending on intron splicing.
It is worth noting that the number of AR-like genes was not found to be correlated with genome size. Thus, the CPR that are found in the environment have a larger genome size but a smaller average of AR-like genes than the CPR detected in human microbiomes. This reversed tendency in the three CPR superphyla (Saccharibacteria, Gracilibacteria, and Absconditabacteria) detected in humans and the environment can be linked to the source of isolation of CPR genomes tested and/or to antibiotic consumption in humans.
The significant prevalence of AR genes in this new branch of the tree of life sheds light on the problem of choosing the appropriate treatment in the clinical field. Overlooking AR screening in CPR might contribute to observed failures to provide adequate antibiotic treatment. It is important to investigate whether such failures are due to the presence of hidden resistance genes or of resistance genes that have not been searched for. Indeed, our results raise the possibility that CPR act as resistance vectors that extend their AR profile to bacteria even without gene transfer events. Consequently, these AR genes may also give advantages to the attached bacteria (host bacteria), helping them to survive in an environment that contains antibiotics for which they lack the appropriate resistance gene or against other microbes that produce antibiotics. Alternatively, CPR could use their AR profile to protect their bacterial host against a given antibiotic, since they cannot survive without a viable one.
Finally, the AR-like genes detected in CPR genomes in our in silico screening are expected to be confirmed in upcoming in vitro experiments. A specific database for AR gene screening in CPR genomes needs to be created to collect these new results for further studies.
To conclude, this work contributes toward a new way of deciphering this new branch of the tree of life. We explicitly explored the CPR resistome by establishing an adapted AR screening strategy for these fastidious microorganisms. We found a gigantic reservoir of provisional AR, representing the first report of resistance genes in CPR genomes. These highly abundant microbes could be an interesting paradigm which constitutes an endless natural source of emerging resistances. Our findings represent a substantial opportunity for future scientific discoveries. If, as expected, the AR-like genes detected in CPR are involved in different metabolic pathways, further studies may lead to the successful growth of CPR cells in pure culture.
Genome annotation was generated using the Rapid Annotation using Subsystem Technology tool kit (RASTtk) as implemented in the PATRIC v3.6.8 annotation web service (51) (Fig. 1).
Detection of antibiotic-resistant genes in CPR genomes. For antimicrobial resistance profiling, we carried out an in-house BLAST search against the protein databases from ARG-ANNOT (Antibiotic Resistance Gene-ANNOTation) (42), BLDB (Beta-lactamase database) (52), and NDARO (National Database of Antibiotic Resistant Organisms) (53) containing 2,038, 4,260, and 5,735 sequences, respectively. To get a comprehensive view of the CPR resistome, we used relaxed parameters including a minimum percent identity and coverage length equal to 20% and 40%, respectively, and a maximum E value of 0.0001. All results were checked manually to remove duplications (Fig. 1).
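As a rough illustration of the relaxed thresholds described above, the following Python sketch filters BLASTp hits by percent identity, coverage, and E value. It is not the in-house pipeline used in this study: the `Hit` record, its field names, and the definition of coverage as alignment length divided by query length are assumptions made for the example.

```python
from dataclasses import dataclass

# Thresholds quoted in the text.
MIN_IDENTITY = 20.0   # minimum percent identity
MIN_COVERAGE = 40.0   # minimum percent coverage
MAX_EVALUE = 1e-4     # maximum E value (0.0001)


@dataclass
class Hit:
    """One BLASTp alignment (hypothetical record layout)."""
    query: str        # CPR protein identifier
    subject: str      # AR database entry
    identity: float   # percent identity
    align_len: int    # alignment length in residues
    query_len: int    # query protein length in residues
    evalue: float


def coverage(hit):
    """Percent of the query covered by the alignment (assumed definition)."""
    return 100.0 * hit.align_len / hit.query_len


def passes(hit):
    """True if the hit satisfies all three relaxed thresholds."""
    return (
        hit.identity >= MIN_IDENTITY
        and coverage(hit) >= MIN_COVERAGE
        and hit.evalue <= MAX_EVALUE
    )


def filter_hits(hits):
    """Keep only hits that satisfy the thresholds, preserving input order."""
    return [h for h in hits if passes(h)]
```

For example, a hit with 32% identity over 150 of 300 query residues and an E value of 1e-10 would be kept, whereas a hit with 45% identity that covers only 50 of 400 residues would be dropped on coverage alone.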
Predicted AR genes in each CPR genome were individually compared to proteins in each AR database by reciprocal BLASTp (54). The number of reciprocal best hits was counted using an expectation value (E) of 0.0001 as the stringency threshold for determining a valid best hit (Fig. 1). Only the CPR protein sequences that resulted from the reciprocal BLASTp and matched the same AR gene as in the first BLASTp were retained for the next step as the preliminary results of AR genes (Fig. 1).
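The reciprocal step can be sketched as a classical reciprocal-best-hit check. The Python example below is an assumption about how such a check might be organized, not the authors' implementation: it takes two lists of `(query, subject, evalue)` tuples, one per search direction, and the function names are hypothetical.

```python
def best_hits(rows):
    """Map each query to its lowest-E-value subject.

    rows: iterable of (query, subject, evalue) tuples.
    """
    best = {}
    for query, subject, evalue in rows:
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_pairs(forward_rows, reverse_rows, max_evalue=1e-4):
    """Return (cpr_protein, ar_gene) pairs that are each other's best hit.

    forward_rows: CPR protein queried against the AR database.
    reverse_rows: AR gene queried against the CPR proteins.
    Hits above max_evalue are ignored in both directions.
    """
    forward = best_hits(r for r in forward_rows if r[2] <= max_evalue)
    reverse = best_hits(r for r in reverse_rows if r[2] <= max_evalue)
    return sorted(
        (cpr, ar) for cpr, ar in forward.items() if reverse.get(ar) == cpr
    )
```

A CPR protein is kept only when the AR gene it matches best in the first search also points back to that same protein in the second search; a protein whose best match prefers a different CPR protein is discarded.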
To eliminate false-positive hits, a BLASTp search of the preliminary AR genes as a query data set was performed against the conserved domain database (CDD) (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) (Fig. 1). The predicted AR genes with a protein domain necessary for the AR mechanism were subsequently selected. A literature review was conducted for each family of antibiotics detected in the CPR genomes to determine the mechanism of AR. We were interested only in the genes in which the AR mechanism depends on enzymatic activity and did not consider the mechanisms that require a further search for site mutations (Fig. 1).
AR-like genes detected in CPR tested genomes are represented using Cytoscape v.3.8.2 to highlight the link between different antibiotic families and distinct CPR superphyla. These genes are also represented in a multi-informative heat map created by the Displayr online tool (www.displayr.com), to show the distribution of different AR-like genes on CPR superphyla and their mechanisms of AR.
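The Fig. 2 legend defines the heat map values as the number of AR-like genes found in a CPR phylum divided by the total number of hits for that AR family. A minimal Python sketch of that normalization is given below; the nested-dictionary input and the function name are assumptions for illustration, and this is not how the Displayr tool is driven.

```python
def relative_abundance(counts):
    """Normalize hit counts within each AR family.

    counts: {ar_family: {phylum: number_of_hits}}
    Returns the same structure with each value divided by the family total,
    so that the values for one AR family sum to 1 across phyla.
    """
    normalized = {}
    for family, per_phylum in counts.items():
        family_total = sum(per_phylum.values())
        normalized[family] = {
            phylum: (hits / family_total if family_total else 0.0)
            for phylum, hits in per_phylum.items()
        }
    return normalized
```

With made-up counts of 3 hits in one phylum and 1 hit in another for the same AR family, the function returns 0.75 and 0.25, respectively. Note that this normalization reflects how a family's hits are shared among phyla, so phyla with many available genomes will tend to show higher values.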

SUPPLEMENTAL MATERIAL
Supplemental material is available online only.