Deciphering Symbiotic Interactions of “Candidatus Aenigmarchaeota” with Inferred Horizontal Gene Transfers and Co-occurrence Networks

ABSTRACT “Candidatus Aenigmarchaeota” (“Ca. Aenigmarchaeota”) represents one of the earliest proposed evolutionary branches within the Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota, and Nanohaloarchaeota (DPANN) superphylum. However, their ecological roles and potential host-symbiont interactions are still poorly understood. Here, eight metagenome-assembled genomes (MAGs) were reconstructed from hot spring ecosystems, and further in-depth comparative and evolutionary genomic analyses were conducted on these MAGs and other genomes downloaded from public databases. Although with limited metabolic capacities, we reported that “Ca. Aenigmarchaeota” in thermal environments harbor more genes related to carbohydrate metabolism than “Ca. Aenigmarchaeota” in nonthermal environments. Evolutionary analyses suggested that members from the Thaumarchaeota, Aigarchaeota, Crenarchaeota, and Korarchaeota (TACK) superphylum and Euryarchaeota contribute substantially to the niche expansion of “Ca. Aenigmarchaeota” via horizontal gene transfer (HGT), especially genes related to virus defense and stress responses. Based on co-occurrence network results and recent genetic exchanges among community members, we conjectured that “Ca. Aenigmarchaeota” may be symbionts associated with one MAG affiliated with the genus Pyrobaculum, though host specificity might be wide and variable across different “Ca. Aenigmarchaeota” organisms. This study provides significant insight into possible DPANN-host interactions and ecological roles of “Ca. Aenigmarchaeota.” IMPORTANCE Recent advances in sequencing technology promoted the blowout discovery of super tiny microbes in the Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota, and Nanohaloarchaeota (DPANN) superphylum. However, the unculturable properties of the majority of microbes impeded our investigation of their behavior and symbiotic lifestyle in the corresponding community. By integrating horizontal gene transfer (HGT) detection and co-occurrence network analysis on “Candidatus Aenigmarchaeota” (“Ca. Aenigmarchaeota”), we made one of the first attempts to infer their putative interaction partners and further decipher the potential functional and genetic interactions between the symbionts. We revealed that HGTs contributed by members from the Thaumarchaeota, Aigarchaeota, Crenarchaeota, and Korarchaeota (TACK) superphylum and Euryarchaeota conferred “Ca. Aenigmarchaeota” with the ability to survive under different environmental stresses, such as virus infection, high temperature, and oxidative stress. This study demonstrates that the interaction partners might be inferable by applying informatics analyses on metagenomic sequencing data.

Metabolic features of "Ca. Aenigmarchaeota." Based on 8 MAGs from this study and 3 SAGs and 12 MAGs from previous studies, we constructed the metabolic pathways of "Ca. Aenigmarchaeota" (Fig. 2). Consistent with previous studies on the DPANN superphylum, all 23 "Ca. Aenigmarchaeota" MAGs have limited metabolic capacities. Pathways including the tricarboxylic acid cycle (TCA), fatty acid metabolism, and dissimilatory/assimilatory sulfur and nitrogen metabolism were missing (8,17,18). "Ca. Aenigmarchaeota" MAGs from hot springs and hydrothermal vents possess an incomplete glycolytic pathway (Fig. 2). All MAGs except DRTY-6_2 bin 201 lack the key enzyme phosphofructokinase (PFK), which impedes the formation of fructose-1,6-bisphosphate (fructose-1,6P) from fructose-6-phosphate (fructose-6P) (Data Set S1). DRTY-6_2 bin 201 seems to have a rather complete glycolysis pathway. However, the lack of glucokinase indicates the incapacity in the production of glucose-6-phosphate (glucose-6P) from glucose. Interestingly, the solely detected glycogen phosphorylase in this MAG and widely distributed phosphoglucomutase suggest that DRTY-6_2 bin 201 can phosphorylate glycogen into glucose-1-phosphate (glucose-1P) and further enter the glycolysis pathway by converting glucose-1P into glucose-6P, which could   Genome completeness was calculated as the percentage of detected marker genes among 54 conserved single-copy genes as listed in Data Set S1 in the supplementary material.
b Genome quality, including contamination and heterogeneity, were estimated by CheckM (16).
c Functional annotation was conducted by uploading genomes to the IMG database. Li et al. subsequently enter the rest of glycolysis pathway. The absence of pyruvate kinase (PK) and pyruvate kinase isozymes R/L (PKLR) prohibits the conversion of phosphoenolpyruvate (PEP) to pyruvate during the last step of glycolysis in DRTY-6_2 bin 201. However, phosphoenolpyruvate synthase (pps), which might perform the same function as PK in thermophiles (19,20), was detected in most MAGs in this study, including DRTY-6_2 bin 201. As a result, these genes may provide an alternative glycolysis pathway to DRTY-6_2 bin 201. Subsequently, genes encoding 2-oxoglutarate/2-oxoacid ferredoxin oxidoreductase (korAB) in most of the MAGs suggest that "Ca. Aenigmarchaeota" is able to catalyze the reaction from pyruvate to acetyl-CoA. The reverse reaction could be performed by pyruvate ferredoxin oxidoreductase (por), which is widely detected in MAGs of hydrothermal vents and groundwater. Three MAGs can generate membrane proton motive force (PMF) via a hydrolysis process encoded by the membrane-bound H 1 -phosphatase (H 1 -PPase) (21). Alternatively, PMF could be generated via reactions involved in the degradation of amino acids or Na 1 /H 1 antiporters (22). However, the absence of the electron transport chain (ETC), especially V/A-type ATPase, may suggest that "Ca. Aenigmarchaeota" could not produce ATP via PMF. Each "Ca. Aenigmarchaeota" contains at least one type of fermentation pathway. 201, QQ_2 bin 128, and most MAGs from groundwater could utilize adhE to produce acetaldehyde and utilize aldehyde dehydrogenase to produce acetate. In addition, QQ_2 bin 128 is predicted to produce lactate and ethanol, based on the presence of L-lactate dehydrogenase, and acetaldehyde dehydrogenase/alcohol dehydrogenase. Moreover, MAGs from hydrothermal vents and DRTY-6_2 bin 201 could utilize acetyl-CoA synthetase (ACSS) to produce acetate as previously described for other DPANN genomes (3). The production of acetate is predicted to support the growth of aerobic/anaerobic respiratory organisms, indicating the role of "Ca. Aenigmarchaeota" in the energy cycle of the microbial community (8). These results show that fermentation pathways could be the main sources of ATP generation for "Ca. Aenigmarchaeota" (23). Two genes relevant to polysaccharide degradation, including a-amylase (for starch and glycogen) and a-1,6-glucosidase (for starch and disaccharides), were identified in many of the hot spring MAGs (Data Set S1). DRTY-6_1 bin 65 might also degrade and utilize pullulan (GH13_20) and xyloglucan (GH16) (24,25). Aside from a-amylase, MAGs from hydrothermal vents harbored different and more glycoside hydrolases, including b-glucosidases (for disaccharides), glucoamylases (for starch), b-1,2-mannosidases (for b-1,2-mannotriose and b-1,2-mannobiose), and endo-1,4-b-mannanase (for b-1,4mannans, b-1,4-galactomannans, and b-1,4-glucomannans). However, MAGs from groundwater only had a-1,4-galactosaminogalactan hydrolase (for galactosaminogalactan). This might reflect a more active carbohydrate metabolism in thermal environments. None of the known carbon fixation pathways were detected in these MAGs, though three MAGs contain archaeal ribulose-bisphosphate carboxylase (RuBisCO). As previously described in other archaea, RuBisCO genes may function in the CO 2 -incorporating AMP pathway, together with genes encoding AMP phosphorylase and ribose-1,5-biphosphate isomerase (26,27). This pathway could produce glycerate-3-phosphate that enters the glycolysis pathway. Phylogenetic analysis suggests that the six RuBisCO genes recovered from "Ca. Aenigmarchaeota" belong to the form III group, of which five are from thermal environments (Fig. 3a). Four of them belong to form III-b, and the remaining two could be classified as a novel lineage that clustered with RuBisCO genes from candidate phyla radiation (CPR) genomes, which were previously suggested to have been passed by HGT from "Ca. Aenigmarchaeota" to CPR (26). Additionally, five MAGs distributed in both thermal and nonthermal ecosystems were identified to encompass all three genes involved in the AMP pathway (Data Set S1). We also identified different types of hydrogenases in "Ca. Aenigmarchaeota." Eight MAGs from thermal environments harbor NiFe 3b-type hydrogenases, which are clustered into one clade (Fig. 3b). This type of hydrogenase is functionally reversible and is capable of catalyzing the oxidation for anabolic metabolism or evolution of H 2 during fermentation (28). Unlike the MAGs from thermal environments, NiFe 3b-type of hydrogenases are absent in nonthermal environments. Instead, membrane-bound hydrogenases (group 4) are identified, illustrating that different strategies are used by nonthermophiles in producing PMF and H 2 .
Despite the possession of genes involved in glycolysis and the fermentation pathway, the absence of many pivotal pathways strongly suggests a symbiotic lifestyle for "Ca. Aenigmarchaeota." First, this archaeal phylum is devoid of de novo amino acid biosynthetic pathways. Although we found a variety of extracellular peptidases, membrane peptidases, and cytoplasmic peptidases that can degrade extracellular and intracellular proteins and peptides (Data Set S1) (3), only a few amino acid transporters were detected. The only identified one is an uncharacterized amino acid transporter (arCOG00009), suggesting a poor ability in the transport of peptides/amino acids extracellularly. Therefore, "Ca. Aenigmarchaeota" presumably obtain amino acids from hosts by physical contact, similar to Nanohaloarchaeota and Nanoarchaeota (29)(30)(31), and rely on a great number of peptidases to recycle their amino acids. Second, de novo nucleotide biosynthetic pathways are absent in most of the genomes of this phylum (32). Moreover, genes for purine and pyrimidine salvage pathways are rarely detected in most MAGs from hot springs (Data Set S1), illustrating the further reliance on a host to provide requisite nutrients. Third, "Ca. Aenigmarchaeota" genomes are unable to synthesize cell membranes de novo due to the lack of genes for synthesis of sterol isoprenoids involved in the mevalonate pathway (MVA), although genes encoding mevalonate kinase, glycerol-1-phosphate dehydrogenase, and associated enzymes for phospholipid biosynthesis have been detected (33,34).
Cell-surface structures might enable the interactions between DPANN archaea with their hosts (30). Genes encoding S-layers, a subset of confirmed archaellum proteins (FlaG, FlaI, and FlaJ), and several adjacent archaellum homologs are identified in most "Ca. Aenigmarchaeota" MAGs. Notably, type-IV pili in "Ca. Aenigmarchaeota" are solely found from MAGs inhabiting thermal environments (35)(36)(37). To a certain extent, these genes endow "Ca. Aenigmarchaeota" with protection, motility, and cell-to-cell attachment abilities, which might consequently facilitate host-symbiont interactions.
Stress responses used by "Ca. Aenigmarchaeota." Comparative genomics showed that "Ca. Aenigmarchaeota" inhabiting thermal environments harbor higher abundances of genes involved in genetic information processing, including "transcription," "translation," "replication and repair," and "folding, sorting, and degradation" (Fig. S3a). In addition, genes involved in "cell motility" and "posttranslational modification, protein turnover, and chaperones" predominantly were enriched in thermophiles (Fig. S3b). This reflects the fact that cells at high temperatures have to combat constant thermal denaturation of both macromolecules and monomers (38)(39)(40). "Ca. Aenigmarchaeota" MAGs from thermal environments have evolved multiple strategies to overcome this stress. Chaperonin GroEL, associated with the repair of DNA and protein damage caused by high temperature, was present in all "Ca. Aenigmarchaeota" genomes ( Fig. 4) (8,41). The prevalence of DNA repair protein RadA could be used for homologous recombination and as an alternative strategy for DNA repair (42), indicating the pervasiveness of genome reduction among these genomes. Type I glucokinase glucose-6-phosphate isomerase, archaeal glucose/mannose-6-phosphate isomerase Genomes Genes The core metabolic pathways with the presence/absence of genes of all "Ca. Aenigmarchaeota" genomes. The metabolic potential (columns) is mainly generated based on KEGG orthologs (KOs), clusters of orthologous groups (COGs), and archaeal clusters of orthologous groups (arCOGs). Genomes (rows) of "Ca. Aenigmarchaeota" were clustered by ecosystems. Eight genomes obtained from groundwater in Crestal Geyser were shown as one row because of the high metabolic similarity of the eight genomes. Small gray dots represent genes or metabolic pathways that were absent, and big dots represent genes or metabolic pathways that were present. Distinct metabolic pathways were distinguished by different colored dots. See Data Set S1 in the supplemental material for details.

G ly c o ly s is G lu c o n e o g e n e s is P y r u v a t e m e t a b o li s m F e r m e n t a t io n A m in o a c id m e t a b o li s m G a la c t o s e s m e t a b o li s m P e n t o s e p h o s p h a t e p a t h w a y A M P m e t a b o li s m A T P a s e N u c le o t id e b io s y n t h e s is p a t h w a
(IA-and IB-type) topoisomerases and reverse gyrases, the latter considered a hallmark of hyperthermophily, were solely detected in MAGs from thermal environments, which could stabilize DNA and modulate DNA topology to maintain the structure and integrity of chromosomes (40,43,44). HSP70 (DnaK), DnaJ, and GrpE were absent in MAGs from thermal environments but were commonly detected in groundwater MAGs (Fig. 4), which is consistent with the potential role of this system in the adaptation to mesophily (45). The presence of cell defense systems protects prokaryotic cells from virus infection. Based on the results of functional annotation, we investigated cell defense systems in "Ca. Aenigmarchaeota." DRTY-7_1 bin 34 was detected to contain a CRISPR-Cas system (Class III-A; Fig. S4a). All three types of restriction modification (RM) systems were found in "Ca. Aenigmarchaeota" (46) (Fig. S4b). The most frequently detected type III RM system was found in 15 (65%) of the total MAGs. Five of the eight MAGs from this study harbor a type II RM system, indicating an alternative common strategy for terrestrial thermal microbes to resist virus infection. Additionally, the recently discovered Hachiman system was detected in DRTY-6 bin 65 and the Gabija system was detected in DRTY-6 bin 215, providing broad protection against viruses (47) (Fig. S4c). The wide distribution of defense systems in thermophilic "Ca. Aenigmarchaeota" suggests that viruses could be an important threat to the survival of microbes in hot spring ecosystems (48)(49)(50). Viruses with different morphologies have been detected in hot springs and are highly active in situ (14,47). Hence, it seems plausible that "Ca. Aenigmarchaeota" may confer their hosts with immunity to viruses by serving as "viral decoys" (51). The attracted virus could be degraded, and the released DNA could be recycled as a nucleotide source (51). Therefore, the host-symbiont interaction between "Ca. Aenigmarchaeota" and its potential hosts appear to be mutually beneficial.
To reveal potential factors that facilitate the adaptation of "Ca. Aenigmarchaeota" across groundwater, hot springs, and hydrothermal vents, we compared the acquired genes between these groups. Results indicated very little overlap between HGTderived genes among these three ecosystems. Only 54 KEGG orthologs (KOs) and 67 archaeal clusters of orthologous groups (arCOGs) (11.04% and 12.27% among all KO and arCOG assignable HGTs) are shared by all the three groups (Data Set S3), suggesting that they have distinct adaptation strategies.
Through the BLAST-based analysis, as well as phylogenetic analysis, we can verify that HGT plays a crucial role during the adaptation to different stresses. For instance, the detected reverse gyrases in "Ca. Aenigmarchaeota" genomes seem to derive from "Ca. Bathyarchaeota," which may have improved the fitness of these organisms to inhabit high-temperature environments (Fig. S5a). Also, several genes encoding superoxide dismutase (SOD2) (Fig. S5b) and 8-oxo-dGTP diphosphatase (mutT) (Fig. S5c) were identified as HGTs, which could be used to resist oxidative damage and to generate PMF (22). Two genes, including transcription initiation factor IIB (TFIIB) and phage integrase, in these two MAGs DRTY-6_1 bin 65 and DRTY-7_1 bin 34 were identified as belonging to phyla outside DPANN, both of which show high identity to Euryarchaeota. However, neither of them was identified as HGTs by HGTector. From the constructed phylogeny, the TFIIB gene in Theionarchaea archaeon DG-70 is surrounded by genes from "Ca. Aenigmarchaeota," suggesting that members of "Ca. Aenigmarchaeota" are possible gene donors rather than recipients (Fig. S5d). In DRTY-6 bin 65, both genes are in the same scaffold. The taxonomic information of genes close to TFIIB is mostly affiliated with "Ca. Aenigmarchaeota," consolidating the inference of vertical inheritance of this gene. The phage integrase may be horizontally transferred from Euryarchaeota, since 6 of the 10 downstream genes are close relatives to Theionarchaea (51.3 to 77.4%) and three are close relatives to Thermoplasmata (48.3 to 61.1%). Among them, three genes exhibit homologies to methyltransferase, DNAbinding protein, and restriction endonucleases associated with type II restriction modification (RM) systems. This suggests that integrase-mediated HGT may confer "Ca. Aenigmarchaeota" the special niche to resist virus infection. Overall, the above observations further support that HGT plays a substantial role in shaping the genetic diversity of "Ca. Aenigmarchaeota" for stress response.
Putative functional and genetic interaction partners of "Ca. Aenigmarchaeota" inferred from in situ communities. Cell-to-cell contact possibly leads to an opportunity for extensive HGTs (57)(58)(59). The inferred HGTs of "Ca. Aenigmarchaeota" may facilitate us to infer their potential hosts. However, only xenologous sequences with high identities, the so-called "recent HGTs," could be used for the inference of current symbiotic relationships between the associated donor and recipient (59). Therefore, we ruled out the possibilities of Bacteria as potential hosts since most of the transferred genes represent ancient events with high divergence, though some Bacteria may contribute a lot to the genomic innovation of "Ca. Aenigmarchaeota." For instance, those genes transferred from Firmicutes only shared ;42% of the identities to the recipient genes in "Ca. Aenigmarchaeota." Additionally, most DPANN, such as Nanoarchaeota, Nanohaloarchaeota, and "Ca. Micrarchaeota," are incapable of synthesizing membrane spontaneously and must rely on assistance from their hosts (8,30,60,61). Therefore, it is more likely that the putative hosts are restricted to Archaea due to the similar membrane structure shared by DPANN and most other archaea (8). Additionally, all previous studies support our conjecture that DPANN archaea were exclusively associated with archaea to form the symbiotic relationship (e.g., Nanoarchaeum equitans and Ignicoccus hospitalis, "Ca. Micrarchaeota acidiphilum" Mia14 and Cuniculiplasma divulgatum PM4, "Ca. Nanohaloarchaeota antarcticus" and Halorubrum lacusprofundi, "Ca. Huberiarchaeum crystalense," and "Ca. Altiarchaeum hamiconexum") (17, 29-31, 62, 63).
Notably, we identified a recent HGT event between "Ca. Aenigmarchaeota" and one of the Crenarchaeota OTUs, which strengthened the putative symbiont-host relationship between them. The SOD2 gene in GMQ_1 bin 18-1 shows 89.1% identity and 100% of coverage to the gene in Pyrobaculum sp. WP30. By looking into the belonging sample, we identified a SOD2 gene from the GMQ_1 bin 7 that shares a high identity (89%) to the one in GMQ_1 bin 18-1. Remarkably, the GMQ_1 bin 7 MAG could be assigned to the genus Pyrobaculum as well. BLAST searches suggest that all SOD2 genes except the one in GMQ_1 bin 18-1 show identities of ,30%, indicating the recent HGT event specifically occurs in GMQ_1 bin 18-1 rather than in all "Ca. Aenigmarchaeota" members. In addition, the OTUs that GMQ_1 bin 7 and GMQ_1 bin 18-1 belonged to have a statistically significant positive correlation (Pearson's R 2 = 0.85, P , 1.68E214; Fig. 6c). These two bins are the first and fourth most abundant organisms in one sample by taking up greater than 50% of the cells in total. Based on the metabolic features of "Ca. Aenigmarchaeota" and Pyrobaculum, we proposed the potential interaction scenario between them (Fig. 7) (66). Specifically, for the MAGs of this study, Pyrobaculum GMQ_1 bin 7 could provide amino acids, nucleotides, membrane lipids, ATP, and active sugars to support growth of GMQ_1 bin 18-1, and GMQ_1 bin 18-1 possessed RM systems to protect the host for cell defense, which were absent in GMQ_1 bin 7. Additionally, the recent HGT of the SOD2 gene from GMQ_1 bin 7 provides GMQ_1 bin 18-1 with the capacity of oxidative stress resistance. However, the Pyrobaculum OTU is completely absent in the five communities where other "Ca. Aenigmarchaeota" MAGs in this study came from (Data Set S4). These results indicate that "Ca. Aenigmarchaeota" microbes in this study are likely to be associated with different interaction partners.
Conclusions. The enigmatic "Ca. Aenigmarchaeota" is still underexplored due to the insufficient cultured representatives or assembled genomes available. Here, we expanded the phylogenetic diversity of "Ca. Aenigmarchaeota" in hot spring environments and showed that "Ca. Aenigmarchaeota" can be found in diverse ecosystems on earth. They harbor limited metabolic capacities by missing several pivotal biosynthetic pathways, such as nucleotide, amino acid, and cell membrane biosynthesis, suggesting that such molecules need to be obtained from the environment or from the host as symbionts. Comparative genomics analysis reveals that genomes from thermal environments possess smaller genome sizes but stronger capacities in metabolizing carbohydrates. HGT The co-occurrence network between "Ca. Aenigmarchaeota" and other community members.
(a) The co-occurrence network was constructed based on the depth of the representative sequences of all OTUs in all samples using SparCC (95) with the default parameters, and 100 bootstrap samples were used to infer pseudo-P values. The nodes represent OTUs at a 95% cutoff, and these edges denote significant (P , 0.05, two-sided) and robust correlations (rho . 0.6) between pairwise OTUs.
(Continued on next page) identifies a salient number of gene flows from TACK and Euryarchaeota to "Ca. Aenigmarchaeota," especially the genes related to stress responses. By conducting cooccurrence network and recent HGT detection analyses, we highlight the power of informatics analysis in identifying putative interaction partners. However, even though significant correlations and genetic interactions are observed, further analyses such as fluorescence in situ hybridization, should be integrated to confirm the inference of hostsymbiont relationship. Overall, this study enables us to better understand the metabolic potentials and possible interactions between "Ca. Aenigmarchaeota" and their putative hosts, shedding light on the understanding of coevolution between hosts and symbionts.

MATERIALS AND METHODS
Site description, sampling, DNA extraction, and sequencing. All five hot spring sediment samples were collected from Tengchong County, Yunnan, China (24.95, 98.44)  The pH and temperature of DRTY-7 were 6.0 and 56°C, respectively, in January 2016. GuMingQuan (GMQ) is a pool with a hot spring fall above and with a length, width, and depth of around 98, 79, and 9.5 cm, respectively. Leaf litter and other debris on the bottom could be seen clearly. The sampling site is located upstream of the GMQ pool, which was named GMQS. The pH and temperature of GMQ were 9.0 and 89°C, respectively, in January 2016. QiaoQuan (QQ) is a hot spring stream flowing out from a soil slope with a rust-color trace. This spring is surrounded by bush and grass. The pH and temperature of QQ were 7.2 and 77°C, respectively, in May 2017. The JinZe-2 (JZ-2) pool is an artificial concrete cubic hot spring water reservoir with a ceiling covering the top in JinZe Hot Spring Resort. The JZ-2 pool contains turbid water and sediments on the bottom of the pool. The pH and temperature of JZ-2 were 7.6 and 63°C, respectively, in May 2017. The top 1 cm of sediment of each site was collected with a sterile iron spoon and transferred to a 50ml centrifuge tube. All sediment samples were then stored in liquid nitrogen before transporting to the lab and were stored at 220°C in the lab until DNA extraction.
Community DNA was extracted from approximately 20 g of sediment for each sample using the PowerSoil DNA isolation kit (MoBio). Libraries with an insert size of 350 bp were constructed by using an M220 Focused-ultrasonicator NEBNext and an Ultra II DNA library prep kit. The concentration of genomic DNA was measured with a Qubit fluorometer. The total genomic DNA was sequenced using an Illumina Hiseq 4000 instrument at Beijing Novogene Bioinformatics Technology Co., Ltd. (Beijing, China). On average, 30 giga base pairs (Gbp) (2 Â 150 bp) of raw sequencing data for each sample were generated.
Metagenomic assembly and genome binning. Raw sequencing data were preprocessed to eliminate replicated reads and trim bases with low qualities, following workflows that were described previously (67). All quality reads of each data set were de novo assembled using SPAdes v3.9.0 (68) with the following parameters: -k 33,55,77,99,111 -meta. Scaffolds with a length of ,2,500 bp in each assembly were removed. BBMap v38.85 (http://sourceforge.net/projects/bbmap/) with the parameters k = 15 minid = 0.97 build = 1 was used to compute the coverage information by mapping clean reads to the corresponding assembled scaffolds without cross-mapping. Genome binning was performed based on the calculated sequence depth and tetranucleotide frequency (TNF) of each scaffold using MetaBAT v2.12.1 (69). Marker genes that occurred more than once in each bin were treated as contaminations, and associated contigs were removed manually. Specifically, genome bins were visualized by fragmenting each scaffold into pieces with the length ranging from 5 to 10 kb using ESOM v1.1 (70) for further curation, in which the discordant points were removed manually from the clusters. Scaffolds with similar TNFs but abnormal sequence depth (the abnormal sequences depth were examined manually, and the difference was mostly over 10-fold) compared to other scaffolds in the corresponding bins were also discarded. Subsequently, quality reads of the associated samples of each optimized genome bins were recruited by mapping onto all optimized genome bins using BBMap and were reassembled using SPAdes under the "-careful" mode with the parameter "-k 21,33,55,77,99,127." Contaminations and strain heteroge- OTUs are colored by modularity classes. Nodes have the same size, and edges have the same thickness. (b) The subnetworks that contain "Ca. Aenigmarchaeota" rpS3 genes. Only nodes and edges that have connections with "Ca. Aenigmarchaeota" in the corresponding modules are shown. Modules 1 in (a) were detected to contain "Ca. Aenigmarchaeota" rpS3 genes. The size of each node is proportional to the number of connections (i.e., degree). OTUs are colored by the phylum-level taxonomy. The thickness of edges denotes the Spearman rank correlation coefficients (rho values). Edges in purple show the connections between "Ca. Aenigmarchaeota" (green circles deviated from core networks) and other members. (c) Correlation between Pyrobaculum GMQ_1 bin 7-and GMQ_1 bin 18-1-associated OTUs. The two OTUs are observed in 33 out of 88 metagenomic samples. Results show a statistically significant positive correlation (Pearson's R 2 = 0.85, P , 1.68E214) between the sequence depths of these two OTUs.
Symbiotic Interactions of "Candidatus Aenigmarchaeota" July/August 2021 Volume 6 Issue 4 e00606-21 msystems.asm.org 13 neity were estimated by CheckM v1.0.12 (71); genome completeness was estimated by calculating the proportion of detected marker genes among 54 conserved archaeal single-copy genes (SCGs) (Data Set S1) (3). Finally, eight MAGs identified as "Ca. Aenigmarchaeota" were kept for the later analysis. Functional annotation of "Ca. Aenigmarchaeota" genomes. The eight MAGs were submitted to the Integrated Microbial Genomes & Microbiomes (IMG-M) (https://img.jgi.doe.gov/cgi-bin/m/main.cgi) database for gene prediction and functional annotation. For comparative genomics analysis, the annotation pipeline was also conducted locally. In brief, putative protein-coding sequences (CDS) of all MAGs, including eight MAGs from the present study and 15 genomes downloaded from public databases, were determined using Prodigal v2.6.3 (72) under the "-p single" model. Functional annotations were performed by comparing predicted CDSs against KEGG (73), evolutionary genealogy of genes: nonsupervised orthologous groups (eggNOG) (74), Pfam (75), and arCOG (76) databases using DIAMOND v0.7.9 (77) with a cutoff E value of ,1E25. rRNA-coding regions were identified using RNAmmer v1.2 (78). All MAGs were uploaded to the web server of tRNAscan-SE v2.0 (79) to identify the tRNA. The dbCAN2 webserver (80) was used to identify carbohydrate-active enzymes based on the carbohydrate-active enzymes (CAZy) database. The localization signals of genes annotated as peptidases were determined using the online tool PSORTb v3.0 (81). To detect the putative CRISPR-Cas systems in "Ca. Aenigmarchaeota" MAGs, tandem repeats, and spacers were identified using the online tool CRISPRFinder (82). Then, the genes nearby the region (both the upstream and downstream) were investigated manually to decide the type. The restriction modification (RM) systems and other cell defense systems, including Hachiman and Gabija reported in a previous study (44), were identified according to the KEGG and arCOG annotation results.

FIG 7
The potential interaction between "Ca. Aenigmarchaeota" and Pyrobaculum. Based on the annotation results of eight MAGs of "Ca. Aenigmarchaeota" in this study and the metabolic features of Pyrobaculum GMQ_1 bin 7 and other Pyrobaculum genomes, the schematic metabolic capacities and possible interaction scenario of "Ca. Aenigmarchaeota" and Pyrobaculum were shown. Metabolic pathways that presented in GMQ_1 bin 18-1 and GMQ_1 bin 7 were marked with blue and red circles, respectively. Abbreviations: TCA cycle, tricarboxylic acid cycle; ETC, electron transport chain; PPP, pentose phosphate pathway.
parameters set as -gt 0.95 -cons 50. Then, multiple alignments were concatenated and applied to reconstruct a maximal likelihood phylogenetic tree using IQ-TREE v1.6.11 with the following parameters: iqtree -s a -alrt 1000 -bb 1000 -nt AUTO (87,88). 16S rRNA genes were predicted for each MAG using RNAmmer. MAGs without 16S rRNA genes were further searched against the Ribosomal Database Project (RDP) database (downloaded on 18 October 2018) (89) using the BLASTn v2.8.11 program, and sequences with a length of .300 bp were selected and combined with those retrieved by RNAmmer. All 16S rRNA sequences were aligned using the online tool SINA v1.2.11 (90). Columns containing more than 95% gaps were removed, after which a maximum likelihood phylogenetic tree was constructed using IQ-TREE with the parameters iqtree -s a -alrt 1000 -bb 1000 -nt AUTO.
Reference sequences of hydrogenases were selected from a previous study (28). Alignments were generated using MUSCLE v3.8.31 (85), and divergent regions were filtered using TrimAl (86). The same model and detailed parameters of RAxML as RuBisCo large subunit phylogeny were used to construct the phylogeny of hydrogenase sequences.
Recruiting 16S rRNA from NCBI. Seventeen 16S rRNA gene sequences recovered from currently available "Ca. Aenigmarchaeota" genomes were used as input to search against the NCBI-nt (https://www.ncbi .nlm.nih.gov/) database via the BLASTn program with the default parameters. BLAST hits with coverage of .95% and identity of .85% were kept for the phylogenetic tree construction. Sequences were aligned, while poor alignment regions were eliminated. A maximum likelihood-based phylogeny was generated using RAxML v7.2.7 (86). Finally, we identified 236 16S rRNA gene sequences of "Ca. Aenigmarchaeota" by the result of phylogenetic analysis.
Detection of horizontally transferred genes. Twenty genomes with completeness of .75% and contamination of ,5% evaluated by CheckM were taken into consideration for the inferences of horizontal gene transfers (HGTs). As a result, three SAGs were removed because of the low quality. Putative HGTs were identified for each genome using HGTector v2 (92). To determine the interaction partners or possible hosts of "Ca. Aenigmarchaeota," recent horizontal gene transfers were identified by applying BLAST searches against the genome bins reconstructed from the corresponding communities. Only genes from outside DPANN are considered the so-called interphylum HGTs. Due to a lack of representative archaeal genomes in the prebuilt default database, especially the genomes from DPANN and TACK superphyla, 3,358 genomes were downloaded from the RefSeq database on 14 May 2019. Genome quality was evaluated using CheckM except microbes from the DPANN superphylum. Genome quality of the DPANN superphylum was performed using the same procedure as mentioned above. Genomes with completeness of ,80% and contamination of .10% were discarded. The remaining high-quality genomes were dereplicated at the phylum level using dRep v2.3.2 (93). Finally, 1,133 genomes were picked out, and 689 of those genomes complementary to the default database were appended. The combined sequences were compiled into a database using DIAMOND (77), and the relevant taxonomy files were changed correspondingly. Then, the "search" step was performed for the 20 high-quality "Ca. Aenigmarchaeota" genomes with default parameters. During the "analyze" step, genomes belonging to the DPANN superphylum (taxonomy ID, 1783276) were treated as "closeTax," but different species were used as "selfTax." Specifically, Aenigmarchaeum subterraneum SCGC AAA011-O16 (taxonomy ID, 743730) was set as the "selfTax" for all MAGs for DRTY-6_2 bin 215 and JZ-2_2 bin 245; Aenigmarchaeota archaeon ex4484_224 (taxonomy ID, 2012503) was used as "selfTax" for GMQ_1 bin 18-1 and QQ_2 bin 128; Aenigmarchaeota archaeon ex4484_14 (taxonomy ID, 2012502) was used as "selfTax" for DRTY-6_2 bin 201; and Aenigmarchaeota archaeon JGI 0000106-F11 (taxonomy ID, 1130284) was used for DRTY-6_1 bin 65, DRTY-7_1 bin 34, and DRTY-6_2 bin 202. The identified interphylum HGTs for each genome were visualized using SankeyMATIC (http://sankeymatic.com/).
In more detail, preliminary genome binning was conducted as described above for each "Ca. Aenigmarchaeota"-containing microbial community. The taxonomic information was determined using GTDB-tk v.0.2.2 (94). Then, gene calling was performed for all genome bins. The predicted putative coding sequences were formatted into a BLAST database. A BLAST search against this database was conducted with genes identified from "Ca. Aenigmarchaeota" as input. Only BLAST hits from outside DPANN with a sequence identity of $70% and aligned region of $100 amino acids were retained. The target genes and BLAST hits that represented the first or last genes of the belonging scaffolds ($5,000 bp) were discarded. The remaining genes were used for inferring the potential functional interactions between putative hosts and symbionts.
Phylogenetic analysis of four horizontally transferred genes. (i) Reverse gyrase (rgy). Scaffolds identified as rgy in "Ca. Aenigmarchaeota" were collected and used as input to blast against the NCBI-nr database (E value of ,1E25). The top 50 hits for each query were kept. The obtained rgy genes for all queries were combined and clustered using CD-HIT v4.6 (95), with a sequence identity cutoff of 90%. The identified representative sequences were used to build the phylogenetic tree. Sequences were aligned using MUSCLE, and poorly aligned regions were eliminated using TrimAl. Phylogeny was generated using IQ-TREE v1.6.11 (87) by integrating 1,000 times with the best model "LG1F1R9." The generated phylogenetic tree was rooted using MAD v2.2 (96) with the default parameters.
Superoxide dismutase (SOD2). The same procedures as the rgy phylogeny construction were applied, including sequence recruitment, clustering, alignment, and poorly aligned region elimination, except the sequence identity cutoff was set to 0.8 during the clustering using CD-HIT. The phylogenetic tree was generated using IQ-TREE with the best model "WAG1R10." (ii) 8-oxo-dGTP diphosphatase (mutT). The same procedures as the rgy phylogeny construction were applied, including sequence recruitment, clustering, alignment, and poorly aligned region elimination, except the sequence identity cutoff was set to 0.65 during the clustering using CD-HIT. The phylogenetic tree was generated using IQ-TREE with the best model "VT1F1R10." (iii) Transcription initiation factor IIB (TFIIB). The same procedures as the rgy phylogeny construction were applied, including sequence recruitment, clustering, alignment, and poorly aligned region elimination. The phylogenetic tree was generated using IQ-TREE with the best model "LG1R10." Network-based co-occurrence analysis. A total of 88 hot spring samples across the time and spatial scale were used to reveal the co-occurrence pattern of "Ca. Aenigmarchaeota" with other community members (Data Set S4). Metagenomic sequencing was conducted for all samples. Detailed quality control and assembly steps were done as described above. Genome binning was conducted for each community (as described above), and only genome bins with completeness of .50%, contamination of ,5%, and rpS3 gene called by AMPHORA2 (84) occurring exactly once in one genome were taken into consideration. The corresponding nucleotide sequences were extracted for the later analysis. Taxonomic information of each bin was obtained using GTDBtk v.0.2.2 (94). All predicted rpS3 gene sequences from different data sets were combined and clustered into OTUs using USEARCH 9.2.84 with the following parameters: -cluster_smallmem -id 0.95. Taxonomy of the representative sequences was assigned according to the taxonomic information of belonging bins. The sequence depth of each rpS3 gene sequence was used to build the OTU table and was calculated by mapping clean reads in each sample to the dereplicated rpS3 gene sequences. Specifically, clean reads from each sample were mapped to rpS3 gene sequences using BBmap with the same parameter settings as described above. The generated .bam files were sorted using SAMtools v.1.3.1 (97). Sequence depth was subsequently calculated using the "jgi_summari-ze_bam_contig_depths" program in MetaBAT. OTUs that occurred in less than six samples were filtered out to reduce the complexity, and 844 OTUs were kept for the subsequent network construction. The co-occurrence network was constructed using SparCC (98) with the default parameters, and 100 bootstrap samples were used to infer pseudo-P values. Those significant (P , 0.05, two-sided) and robust correlations (rho . 0.6) between pairwise OTUs were used to infer a reliable network. Network visualization and relevant parameter calculations regarding modularity, betweenness, closeness, average clustering coefficient, average weighted degree, and average shortest path length were conducted in Gephi v.0.9.2 (99).

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. DATA SET S1, XLSX file, 0.

ACKNOWLEDGMENTS
We thank Guangdong Magigene Biotechnology Co., Ltd., China, for the assistance in data analysis and the entire staff from Yunnan Tengchong Volcano and Spa Tourist Attraction Development Corporation for strong support.
We declare no competing interests. This work was financially supported by the National Natural Science Foundation of China (no. 91951205 and 91751206), Natural Science Foundation of Guangdong