Conservation and Expansion of Transcriptional Factor Repertoire in the Fusarium oxysporum Species Complex

The Fusarium oxysporum species complex (FOSC) includes both plant and human pathogens that cause devastating vascular wilt diseases in plants and threaten public health. Each F. oxysporum genome comprises core chromosomes (CCs), which carry housekeeping functions, and accessory chromosomes (ACs), which contribute to host-specific adaptation. This study examined genome-wide transcription factor profiles (TFomes) and their potential roles in coordinating CC and AC functions to achieve host-specific pathogenicity. We found a clear positive correlation between the TFome size and the proteome size of an organism, and FOSC TFomes are larger owing to the acquisition of ACs. Of the 48 classified TF families, 14 families involved in transcription/translation regulation and cell cycle control are highly conserved. Among the 30 families expanded in FOSC, Zn2-C6 and Znf_C2H2 are the most significantly expanded, reaching 671 and 167 genes, respectively, and include well-characterized homologs of Ftf1 (Zn2-C6) and PacC (Znf_C2H2) involved in host-specific interactions. Manual curation of characterized TFs increased the TFome repertoires by 3%, adding, for example, the disordered protein Ren1. Expression profiles revealed steady expression of conserved TF families and specific activation of AC TFs. Functional characterization of these TFs could enhance our understanding of transcriptional regulation in FOSC cross-kingdom interactions, disentangle species-specific adaptation, and identify targets to combat the diverse diseases caused by this group of fungal pathogens.

The RNA-seq datasets were previously described (Guo et al. …).

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. This version posted February 10, 2023.

… (Table S1). Because most TF-related terms were initially defined in mammalian systems, it was not surprising that our fungal genomes are associated with only 71 of the 234 TF-related InterPro (IPR) terms (Table S1, Materials and Methods, and Figure S1A-B for the annotation pipeline). After filtering out 13 and 10 terms for redundancy (two terms describing the identical …

Comparing the total number of genes in a genome (x) with the total number of TFs within that genome (y), we observed a strong positive correlation (y = 0.07264x - 190.9, r² = 0.9361) (Figure 2A).
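The reported relationship is a straight line of TF count on gene count. A minimal sketch of how such a line and its r² can be obtained is shown below, assuming ordinary least squares (the exact fitting procedure behind Figure 2A is not restated in this section). The function names are hypothetical, and the points used to exercise the code are generated from the published equation rather than taken from the study's genomes; `predicted_tfome_size` simply evaluates that equation.

```python
"""Sketch: ordinary least-squares fit of TFome size (y) on proteome size (x)."""

from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Return (slope, intercept, r_squared) for the OLS line y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination: 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


def predicted_tfome_size(n_genes: float) -> float:
    """Evaluate the regression reported for Figure 2A: y = 0.07264x - 190.9."""
    return 0.07264 * n_genes - 190.9


if __name__ == "__main__":
    # Hypothetical proteome sizes; TF counts generated from the reported line.
    genes = [10000.0, 15000.0, 20000.0]
    tfs = [predicted_tfome_size(x) for x in genes]
    print(fit_line(genes, tfs))
    print(round(predicted_tfome_size(20000), 1))  # 1261.9
```

Under this fit, a genome encoding 20,000 proteins would be expected to carry roughly 1,262 TFs; deviations from the line are what distinguish genomes with unusually large or small TFomes.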

Asterisks indicate families absent from yeasts.
Based on this index value, we classified TF families into three major groups (Table 2, Table S1). Group 1 contains 14 TF families with an expansion index of 1, indicating high conservation. Group 2 includes four families with an expansion index of less than 1, reflecting some level of gene family contraction. …

About 30% of the TF families (14 of 48) show strong orthologous conservation across all genomes included in this study (Figure 2B; Table 2; Table S1). Because most of these conserved families are single-copy, they account for less than 2% of the total TFome. Based on a detailed study on S. cerevisiae and other model organisms, …

… (Table S1). HSF is a family of transcription factors that activate the production of many heat shock proteins, which prevent or mitigate protein misfolding under abiotic and biotic stresses (Feder and Hofmann 1999). All non-FOSC filamentous fungi have three copies, whereas FOSC members show expansion (e.g., Fo47: 4, Fol4287: 5, II5: 4, HDV274: 4, and Fo5176: 4) (Figure 3A-B). Interestingly, all expanded HSFs are phylogenetically close to Hsf1 and cluster together with the Hsf1 paralog of …
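The grouping above amounts to a simple rule on the expansion index: a value of 1 marks a conserved family (Group 1), a value below 1 a contracted family (Group 2), and a value above 1 an expanded family (Group 3), with values above 2 singled out as the exceedingly expanded subset. The sketch below assumes the index values are already computed (the index formula itself is not reproduced here); the function names, family names, and values are hypothetical, and a small tolerance is used because a ratio of floats may not equal 1 exactly.

```python
"""Sketch: assign TF families to groups from precomputed expansion index values."""

from collections import Counter
from typing import Dict


def classify_family(ei: float, tol: float = 1e-9) -> str:
    """Return a group label for one expansion index (thresholds taken from the text)."""
    if abs(ei - 1.0) <= tol:
        return "conserved"              # Group 1
    if ei < 1.0:
        return "contracted"             # Group 2
    if ei > 2.0:
        return "exceedingly expanded"   # Group 3, EI > 2
    return "expanded"                   # Group 3, 1 < EI <= 2


def summarize(indices: Dict[str, float]) -> Counter:
    """Count how many families fall under each label."""
    return Counter(classify_family(ei) for ei in indices.values())


if __name__ == "__main__":
    # Hypothetical families and index values, for illustration only.
    example = {"family_A": 1.0, "family_B": 0.75, "family_C": 1.6, "family_D": 3.2}
    for name, ei in example.items():
        print(name, ei, classify_family(ei))
    print(summarize(example))
```

Keeping the thresholds in one function makes it easy to check that group counts (here, 14 conserved and 4 contracted families out of 48) are reproduced from the underlying table.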

Seven exceedingly expanded TF families
Seven TF families have expansion indices greater than 2 (Table 2 and Figure 2B). Because of their drastic expansion, these seven families together account for more than 75% of the total TFome. These families include …

Other families
The other 20 TF families (expanded, but with EIy ≤ 2) account for 20% of the TFome; on average, each of these 20 families contains 9.6 copies per genome examined (Table S1). … (Table S3 and examples described in the previous section). Using OrthoFinder with this list of curated TFs, we defined 80 orthologous groups among Fusarium genomes (Table S4). Of these 80 orthogroups, 62 had also been identified by the IPR-based annotation pipeline described above. This approach enables the dissection of vastly expanded, high-copy-number TF families such as Zn2-C6 and Znf_C2H2, which were further mapped to 27 orthologous groups: 17 in Zn2-C6, 9 in Znf_C2H2, and 1 containing both Znf_C2H2 and Zn2-C6 domains (Table S4).

This effort also resulted in additional annotation of 18 TF families (Table S4) … domains such as Ankyrin_rpt and WD40_repeat.

bioRxiv preprint doi: https://doi.org/10.1101/2023.02.09.527873

We then directly compared F. oxysporum with its Fusarium relatives to calculate the Fusarium expansion index (EIf) as follows: …

EIf values ranged from 0.5 (Fox1, Fork_head) to 3.54 (Fug1, AreA_GATA) (Table S4). Among the 80 orthogroups, 36 show high conservation (EIf = 1) as single-copy orthologs across Fusarium, ten of which have been functionally validated in F. oxysporum (Table S4). Twenty-four groups show gene contraction in F. oxysporum (EIf < 1). The remaining 20 groups are expanded in F. oxysporum (EIf > 1; Table 3, Table S4), including five groups with EIf ≥ 2: Fug1 (AreA_GATA, EIf = 3.54), Cos1 (Znf_C2H2, EIf = 2.8), Ftf1/Ftf2 (Zn2-C6, EIf = 2.7), Ebr1/Ebr2 (Zn2-C6, EIf = 2.5), and Ren1 (disordered, EIf = 2).
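Using only the EIf values quoted in this section, the sketch below reproduces the selection of orthogroups with EIf ≥ 2 and ranks the Znf_C2H2 groups, which is how PacC comes out second to Cos1 within that family. The dictionary layout and function names are illustrative choices, not the format of Table 3 or Table S4, and the dictionary covers only the quoted orthogroups rather than all 80.

```python
"""Sketch: select and rank orthogroups by EIf, using the values quoted in the text."""

from typing import Dict, List, Tuple

# Orthogroup -> (TF family, EIf); only the values quoted in this section.
EIF_QUOTED: Dict[str, Tuple[str, float]] = {
    "Fug1": ("AreA_GATA", 3.54),
    "Cos1": ("Znf_C2H2", 2.8),
    "Ftf1/Ftf2": ("Zn2-C6", 2.7),
    "Ebr1/Ebr2": ("Zn2-C6", 2.5),
    "Ren1": ("disordered", 2.0),
    "PacC": ("Znf_C2H2", 1.57),
    "Fox1": ("Fork_head", 0.5),
}


def highly_expanded(table: Dict[str, Tuple[str, float]], cutoff: float = 2.0) -> List[str]:
    """Orthogroups with EIf >= cutoff, sorted from highest to lowest EIf."""
    hits = [(name, eif) for name, (_family, eif) in table.items() if eif >= cutoff]
    hits.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _eif in hits]


def rank_within_family(table: Dict[str, Tuple[str, float]], family: str) -> List[str]:
    """Orthogroups belonging to one TF family, sorted from highest to lowest EIf."""
    rows = [(name, eif) for name, (fam, eif) in table.items() if fam == family]
    rows.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _eif in rows]


if __name__ == "__main__":
    print(highly_expanded(EIF_QUOTED))  # ['Fug1', 'Cos1', 'Ftf1/Ftf2', 'Ebr1/Ebr2', 'Ren1']
    print(rank_within_family(EIF_QUOTED, "Znf_C2H2"))  # ['Cos1', 'PacC']
```

Note that the cutoff is inclusive (EIf ≥ 2), so Ren1, with EIf exactly 2, is retained alongside the four groups above it.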
We also identified PacC (EIf = 1.57) as the second most expanded group within the highly expanded Znf_C2H2 family. We discuss these six groups (highlighted in bold in Table 3) further below.

Table 3. Ortholog copy number and expansion index (EIf) of characterized and expanded TFs in F. oxysporum.

… are responsible for virulence and general metabolism. In F. oxysporum, Ebr1 is found as multiple homologs, whereas in F. graminearum it occurs as a single copy (Jonkers et al. 2014). In F. oxysporum, three paralogous copies, Ebr2, Ebr3, and Ebr4, are encoded on ACs and regulated by the core copy Ebr1. The importance of the core paralog is shown by the reduced pathogenicity and growth defects of its knockout mutant (Jonkers et al. 2014). Notably, the Ebr2 coding sequence driven by the Ebr1 promoter rescued the Ebr1 knockout mutation, indicating some functional redundancy within this family. …

Interestingly, induction of the AC-encoded PacC genes depended on the CC-encoded PacC gene, as the induction disappeared in the CC PacC knockout mutant, further supporting cross-talk between core and accessory TFs (Yang 2020). As with Ebr1, the AC PacC genes are expressed at much lower levels than the CC PacC gene, and knocking out one AC PacC gene affected only a small subset of genes, whereas the CC PacC knockout had broader effects on cellular processes (Yang 2020).

Fug1 has a role in pathogenicity (maize kernel colonization) and fumonisin biosynthesis in F. verticillioides (Ridenour and Bluhm 2017). In addition, deletion of Fug1 increased sensitivity to the antimicrobial compound 2-benzoxazolinone and to hydrogen peroxide, indicating that Fug1 helps mitigate stresses associated with host defense (Ridenour and Bluhm 2017). Neither the core nor the accessory copies of these two genes have been experimentally examined in FOSC. Ren1 is a disordered protein with no IPR functional domain. Its expansion index (EIf = 2) suggests an expansion unique to FOSC. However, the only reported study of its function is in F. …

We first asked what proportion of genes in the conserved and expanded categories was expressed (Table S5). Almost all genes (58 of 60) in the conserved category (Group 1) were consistently expressed (TPM > 1 across all conditions), supporting their general roles in controlling basic life processes. Within the expanded category (Group 3), the proportion of consistently expressed genes ranged from 41% to 59% for core TFs and from 5% to 16% for accessory TFs. With a less strict filter (TPM > 1 in at least one condition), all genes in the conserved category were expressed. Within the expanded category, 93% of core TFs across all strains and 49% to 67% of accessory TFs were expressed. Comparing consistently expressed genes with genes expressed in at least one condition, the much larger increase for the expanded category (especially its accessory TFs) indicates that expanded TFs, particularly accessory ones, are more likely to be conditionally expressed, further supporting their role in niche adaptation.

To examine expression and identify important core and accessory TFs, we developed filtering parameters. Because most validated TFs were reported in the reference strain Fol4287, we first reviewed the expression patterns of the reported core and accessory TFs during Fol4287 infection of tomato (Table S6). … (Table S6). We then applied these criteria to the transcriptomes of the Fol4287, Fo5176, and Fo47 TFomes to identify two types of TFs: 1) conserved core TFs related to plant colonization; and 2) expanded accessory TFs related to host-specific pathogenicity.

Fol4287, Fo5176, and Fo47 upregulated 95, 62, and 44 core TFs, respectively, during plant colonization. Among them, ten are highly conserved (Table S7), being single-copy orthologs across all 15 F. oxysporum strains. Two of these ten were previously reported: Fow2 and Sfl1. Fow2, a Zn2-C6 TF, is required for full virulence but not for hyphal growth or conidiation in F. oxysporum f. sp. melonis (Imazaki et al. 2007). The downstream targets of Fow2 remain unknown in F. oxysporum and merit further analysis. Sfl1, described in the previous section, is essential for vegetative growth, conidiation, sexual reproduction, and pathogenesis in M. oryzae (…). The functions of FoSfl1 remain to be validated.

Fol4287, Fo5176, and Fo47 upregulated 29, 34, and 9 accessory TFs, respectively.
Ftf1 and Ren1 are particularly interesting (Figure 4 and Table S8). Although Ftfs have been shown to play an essential role in pathogenicity in Fol, whether this mechanism is restricted to that strain remains an open question. Fol4287 contains ten accessory Ftfs, eight of which were upregulated during plant colonization, whereas Fo5176 contains six accessory Ftfs, only one of which was upregulated. Interestingly, the eight upregulated Fol4287 Ftfs and the one upregulated Fo5176 Ftf cluster together (Figure 4). This unique expansion with regulatory adaptation (i.e., fine-tuned expression regulation) appears to be restricted to Fol4287 and is not shared by another pathogenic strain, Fo5176, during host infection. Among the Fo5176 expanded TFs, we identified Ren1. Compared to Fol4287, which encodes only one accessory Ren1 that was not upregulated …

… a complex, and can act as an activator that promotes transcription or as a repressor that blocks the recruitment of RNA polymerase. Therefore, defining the specific functions of binding sites identified through DAP-seq can be difficult. Gene regulatory networks based on gene co-expression and other phenotypic and multi-omics data, as reported in Fusarium (Guo et al. 2016), can add resolution to these complex regulatory processes. However, a full understanding of the regulatory role of each TF will ultimately require careful molecular and biochemical characterization.
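The two expression filters used earlier in this section, TPM > 1 in every condition ("consistently expressed") and TPM > 1 in at least one condition ("expressed"), can be sketched as follows. This is a minimal illustration assuming a gene-by-condition TPM table held in a plain dictionary; the function names, gene IDs, and TPM values are hypothetical and not taken from Table S5.

```python
"""Sketch: strict vs. relaxed TPM filters for calling a TF expressed."""

from typing import Dict, Sequence

TPM_CUTOFF = 1.0  # threshold used in the text (TPM > 1, strictly greater)


def consistently_expressed(tpms: Sequence[float], cutoff: float = TPM_CUTOFF) -> bool:
    """Strict filter: TPM above the cutoff in every condition."""
    return all(v > cutoff for v in tpms)


def expressed_in_any(tpms: Sequence[float], cutoff: float = TPM_CUTOFF) -> bool:
    """Relaxed filter: TPM above the cutoff in at least one condition."""
    return any(v > cutoff for v in tpms)


def expressed_fractions(table: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Fraction of genes passing each filter."""
    n = len(table)
    if n == 0:
        raise ValueError("empty expression table")
    strict = sum(consistently_expressed(v) for v in table.values())
    relaxed = sum(expressed_in_any(v) for v in table.values())
    return {"consistent": strict / n, "any_condition": relaxed / n}


if __name__ == "__main__":
    # Hypothetical accessory TFs measured in three conditions.
    accessory = {
        "tf_a": [0.2, 0.5, 12.0],  # induced in one condition only
        "tf_b": [3.1, 4.0, 2.2],   # expressed throughout
        "tf_c": [0.0, 0.1, 0.3],   # silent in all conditions
        "tf_d": [0.4, 6.5, 9.8],   # induced in two conditions
    }
    print(expressed_fractions(accessory))  # {'consistent': 0.25, 'any_condition': 0.75}
```

A large gap between the two fractions, as in this toy table, is the pattern the text interprets as conditional expression; for the conserved category the two fractions are nearly identical.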