Genome-Wide Binding Analyses of HOXB1 Revealed a Novel DNA Binding Motif Associated with Gene Repression

Knowledge of the diverse DNA binding specificities of transcription factors is important for understanding their specific regulatory functions in animal development and evolution. We have examined the genome-wide binding properties of the mouse HOXB1 protein in embryonic stem cells differentiated into neural fates. Unexpectedly, only a small number of HOXB1 bound regions (7%) correlate with binding of the known HOX cofactors PBX and MEIS. In contrast, 22% of the HOXB1 binding peaks display co-occupancy with the transcriptional repressor REST. Analyses revealed that co-binding of HOXB1 with PBX correlates with active histone marks and high levels of expression, while co-occupancy with REST correlates with repressive histone marks and repression of the target genes. Analysis of HOXB1 bound regions uncovered enrichment of a novel 15 base pair HOXB1 binding motif HB1RE (HOXB1 response element). In vitro template binding assays showed that HOXB1, PBX1, and MEIS can bind to this motif. In vivo, this motif is sufficient for direct expression of a reporter gene and over-expression of HOXB1 selectively represses this activity. Our analyses suggest that HOXB1 has evolved an association with REST in gene regulation and the novel HB1RE motif contributes to HOXB1 function in part through a repressive role in gene expression.


Introduction
The binding of transcription factors (TFs) at specific regulatory sequences in the genome underlies the control of gene expression and regulation of transcriptional networks essential for programming animal development and organogenesis and for modulating physiological and pathological processes [1,2]. In the evolution of vertebrates, there is evidence for multiple rounds of duplication of genes and entire genomes [3][4][5][6], which has expanded the number of members in many TF families. The process of gene duplication and divergence in animal evolution provides a mechanism for the origin of novel gene functions. Paralogous genes in highly conserved TF families may undergo changes resulting in altered functional activities and modified interactions between the TFs and their regulatory regions. This diversification of protein function and the adoption of novel roles have been implicated as a major driver in the evolution of phenotypic complexity, diversity and innovation [7][8][9][10][11]. Changes in cis-regulatory sequences over evolutionary time are also a major contributing factor to morphological diversity. Because DNA-binding motifs are in general relatively short sequence motifs (~8-15 bp), small changes in specific binding motifs can have a progressive and significant impact on the efficiency of binding by TFs, resulting in modifications of organismal phenotypes [12,13]. 2 of 20 Much less is known about how sequence changes in TF proteins impact their functional activities. Amino acid variations in TF sequences may alter their DNA-binding specificity, leading to broad alterations in patterns of gene expression mediated by changes in their interactions with downstream target genes. This may result in the rewiring of circuits in fundamental transcriptional regulatory networks. Studies of conserved families of TFs have shown that they maintain a surprisingly high degree of similarity in their in vitro DNAbinding properties over 600 million years of evolution [14][15][16][17]. This has been used as a basis to suggest that changes in TFs, which significantly alter their DNA-binding specificities, are generally considered as a less common mechanism underlying the evolution of phenotypic diversity. However, gene duplication in evolution generates multiple copies of TFs that may serve as substrates for preserving ancestral activities and creating new opportunities for functional divergence [17]. Collectively, this process leads to the emergence of families of paralogous TFs that may contain diversified DNA-binding specificities and new functional activities [16,18]. Thus, identifying and characterizing novel DNA binding motifs of functionally diverged TFs can provide insight into the mechanisms of TF evolution and its contributions to the diversification of regulatory networks in animal evolution.
The HOX family of homeodomain-containing transcription factors is a useful model to investigate the evolution of novel DNA binding specificities. Hox genes have conserved roles in the specification of the anteroposterior (AP) body axis in a wide range of animals, from invertebrates to vertebrates, and changes in their expression and function are associated with the diversification of animal body plans [10,19,20]. The regulation, expression, and function of Hox genes have been extensively examined [6,15,[21][22][23][24][25]. However, how different HOX proteins, with similar in vitro DNA binding properties, regulate different developmental programs along the AP axis is poorly understood.
In our recent study, we examined this question using ChIP-seq (chromatin immunoprecipitation followed by deep sequencing) approaches to characterize the genome-wide binding properties of HOXA1 and HOXB1 proteins in mouse in embryonic stem (ES) cells differentiated into neural fates [26]. These proteins bind to distinct and largely nonoverlapping sets of target genes. Analyses of the binding profiles demonstrated that a large number of the HOXA1 bound regions are co-bound with Pre-B cell leukaemia transcription factor 1 (PBX1) and myeloid ectopic viral integration site (MEIS) proteins [26][27][28], which are known cofactors for many HOX proteins and other TFs [29][30][31]. In contrast, only a small subset of HOXB1 bound regions display co-occupancy with PBX and MEIS. Analysis of the underlying sequences in the binding peaks revealed that HOXA1 bound regions are enriched for known consensus motifs for PBX and MEIS, while motifs enriched in HOXB1 bound regions have diversified and represent a different set of DNA motifs. The presence of unknown binding motifs in HOXB1 bound peaks is interesting because cross-species functional analyses have demonstrated that mouse HOXA1 has retained the ancestral functions of Drosophila Labial, while HOXB1 has diverged and evolved new functions [26]. Therefore, investigating the properties of novel HOXB1 binding motifs can provide insight into the properties associated with its neofunctionalization.
In this study, we have examined a novel enriched motif associated with genome-wide binding profiles of the mouse HOXB1 protein and investigated its genomic properties and gene regulatory potential. We focused on the characterization of one of the novel motifs enriched in the HOXB1 bound regions and named this 15-base pair motif HB1RE (HOXB1 response element). The center of this motif has a known in vitro HOX binding sequence (5 -ATTA-3 ) [32]. However, the flanking sequences are much longer and are distinct from the previously characterized Hox-PBX bipartite motif [33,34]. This is widely distributed across the genome. Using an in vitro template binding assay and a multidimensional protein identification technology (MudPit)-based mass-spectrometry approach, we found that PBX1 can bind to the HB1RE motif in nuclear extracts. Genomic analyses revealed that different subsets of genomic regions enriched for this HB1RE motif are also bound by PBX1 or restrictive element-1 silencing transcription factor (REST) proteins. Functional analysis of this motif using transgenic constructs in electroporated chicken embryos showed that multiple copies of this motif are sufficient to drive expression of a reporter gene in the neural tube, suggesting it has in vivo regulatory potential. Over-expression of HOXB1 in this transgenic assay selectively represses reporter activity mediated by this motif. Together, our analyses suggest that HOXB1 binds the HB1RE motif and modulates in vivo function in part through a repressive role in gene expression.

ES Cell Culture and Induction of KH2 Cells with Retinoic Acid
KH2 mouse ES cells [35] at Passage 12 were cultured on gamma-irradiated feeder cells with Dulbecco's Modified Eagle Medium (DMEM) containing 15% fetal bovine serum, Non-Essential Amino Acids (NEAA), and β-mercaptoethanol. The media was changed 3 h before passaging. In transfer to feeder-free conditions, the media was aspirated and washed twice with Phosphate-Buffered Saline (PBS), then 2 mL of pre-warmed Trypsin/Ethylenediaminetetraacetic acid (EDTA) solution was added and placed in an incubator at 37 • C for 1 m. During this period, colonies float off when tapping the plate. Trypsin activity was stopped by adding 5 mL of Fetal Calf Serum -ES cell medium to the flask. Colonies were dissociated into single cells by pipetting up and down several times and pelleting the cells by centrifugation at 1000 rpm for 5 m. Media was aspirated and the cells were resuspended in an appropriate volume of fresh ES cell medium. Gelatinized plates were used for the feeder-free culture of the ES cells. Culture plates were treated 30 min before seeding with 0.1% gelatin. Gelatin was later aspirated just before seeding of the cells. In feeder-free culture, the KH2 cell lines were grown using N2B27+2i media supplemented with 2000 U/mL of ESGRO (Millipore) [36]. Cells were differentiated on a gelatinized plate using differentiation media (DMEM + 10% (vol/vol) serum + NEAA + 3.3 µM retinoic acid (RA). ES cells were harvested at 80-90% confluency for the experiments. In genomic experiments, mouse ES cells were differentiated using RA for 24 h [37]. In the case of the HOXB1 epitope tagged KH2 line, doxycycline was added with RA to simultaneously differentiate the cells and induce expression of HOXB1.

Genomic Analysis
Sequencing reads were aligned to the mouse genome (mm10) with bowtie2 using default parameters [41]. Peaks were called with MACS2 [42] with parameters "-p 0.25 -m 5 50" to ensure sufficient peaks for irreproducibility discovery rate (IDR) analysis. MACS2 peak coordinates were trimmed back to meet the actual IP signal. To compare the peaks from replicates, we used irreproducibility discovery rate (IDR version 2.0.7) and analyzed top 100,000 peaks by p-value, and only paired peaks with MACS2 fold-change ≥ 4, q-value ≤ 0.05, and IDR global p ≤ 0.1 were collected for future analyses.
For analysis of gene expression levels for near adjacent genes in HOXB1 binding regions, we used our previously published RNA-seq data detailing a time-course of changes in transcriptional profiles during differentiation of ES cells [39]. In brief, RNA was isolated using TRIzol reagent. RNA sample with an RIN (RNA-integrity number) of more than 9 was used for library preparation. PolyA-selected RNA libraries were prepared using mRNA-seq Library Prep Kit (Illumina, RS-100-0801) according to the manufacturer's protocol (15018460 Rev AOct 10). Here, 1 µg of total RNA was enriched for poly(A)+ RNA by oligo(dT) selection. The purified libraries were quantified with the High Sensitivity DNA assay on an Agilent 2100 Bioanalyzer. Libraries were sequenced single read with 36 nt sequencings on a GAIIx, and fastq files were returned. For each sample, reads were aligned to mm10 using Hisat2, allowing uniquely aligning reads only. Differential gene expression analysis was done using DeSeq2. Different time points were compared against the gene expression profile of uninduced ES cells.
Heatmaps were generated using CoverageView package from Bioconductor using 5 kb windows centered on peak summit, averaged into 100 bins. Log fold change values between IP and input samples were calculated using reads per million (RPM) and only positive values are shown. For the HOXB1 peak heatmap, the data shown are the highestlog fold change (LFC) data from any constituent HOXB1, thus the heatmap column shows the "best-case scenario" for the HOXB1 peak list. The nearest protein-coding neighbors were identified using Ensembl 91 and then KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analyses was performed on these sets of genes [47].

Western Blot
A standard Western blotting protocol was used to detect HOXB1 in template binding assay. Anti-FLAG antibody (Sigma F-1804, St. Louis, MO, USA) was used to detect the 3xFLAG tagged HOXB1 protein in the eluted samples.

Multidimensional Protein Identification Technology (MudPIT)
Trichloroacetic Acidprecipitated proteins were urea-denatured, reduced, alkylated, and digested with endoproteinase Lys-C (Roche, Basel, Swizerland), followed by modified trypsin (Promega, Madison, WI, USA) [50,51]. Peptide mixtures were loaded onto 250 µm fused silica microcapillary columns packed with strong cation exchange resin (Luna, Phenomenex, Aschaffenburg, Germany) and 5 µm C18 reverse-phase (Aqua, Phenomenex), and then connected to a 100 µm fused silica microcapillary column packed with 5 µm C18 reverse-phase (Aqua, Phenomenex) (Florens and Washburn, 2006). Loaded microcapillary columns were placed in-line with a Quaternary Agilent 1100 series HPLC pump and a linear trap quadrupole (LTQ) mass spectrometer (MS) equipped with a nano-LC electrospray ionization source (ThermoScientific, San Jose, CA, USA). Fully automated 10-step MudPIT runs were carried out on the electrosprayed peptides, as described in [50]. Tandem mass (MS/MS) spectra were interpreted using ProluCID (Xu et al., 2015) against a database consisting of 61,441 non-redundant M. musculus proteins (NCBI, 2020-11-03 release), with 426 usual contaminants (human keratins, IgGs, and proteolytic enzymes). To estimate false discovery rates (FDRs), the amino acid sequence of each non-redundant protein entry was randomized, resulting in a total search space of 123,728 non-redundant sequences. All cysteines were considered as fully carboxamidomethylated (+57 Da statically added), while methionine oxidation was searched as a differential modification. The DTASelect [52] and swallow, an in-house developed software, were used to filter ProLuCID search results at given FDRs at the spectrum, peptide, and protein levels. Here, all controlled FDRs were less than 1%. All five data sets were contrasted against their merged data set, respectively, using Contrast v 1.9 and in-house developed sandmartin v0.0.1. Our in-house developed software, NSAF7 v0.0.1, was used to generate spectral count-based label-free quantitation results. The MS datasets may be obtained from the MassIVE repository with accessions MSV000086596 and from the ProteomeXChange with accessions PXD023129.

Characterizing the Genome-Wide Binding Patterns of HOXB1 in Mouse ES Cells
Generating specific antibodies against specific HOX proteins is challenging because of the presence of highly conserved regions in the proteins. To overcome this limitation, we generated an epitope-tagged (3XFLAG) of HOXB1 in the mouse KH2 ES cell line by inserting the coding region fused with the epitope at a defined endogenous locus designed for tight doxycycline inducible expression [27,35,36,39]. These modified KH2 ES cells were then differentiated into neuroectodermal cell fates using previously established retinoid treatments, along with doxycycline to induce expression of the epitope-tagged version of HOXB1 [37,39]. The expression of HOXB1 from inducible locus was conducted under conditions that optimized the amount of doxycycline to generate levels comparable to the endogenous HOXB1 gene [27,39]. Gene expression analyses of the differentiated ES cells over a time-course revealed that the transcription profile at 24 h of differentiation is similar to that of the mouse embryonic hindbrain and spinal cord. Therefore, we used 24 h differentiated ES cells to map the genome-wide binding targets of HOXB1 using chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) with an antibody against the epitope tag [26,27]. In previous work, we compared the genome-wide binding properties of HOXB1 with HOXA1 and Labial to explore similarities and differences between HOX proteins and understand the basis of their evolutionary divergence [26]. Here, we have extensively analysed the HOXB1 data set and conducted additional experiments to gain more detailed insight into the properties of HOXB1.
Using replicate samples in a series of ChIP-seq experiments, we previously reproducibly mapped 2058 binding peaks for HOXB1 in the mouse genome ( Figure 1A) [26]. Gene ontology (GO) and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway analysis of nearest neighbor genes in HOXB1 bound peaks suggest that HOXB1 downstream targets are enriched for genes and processes required for neurogenesis, neuronal processes, and behavior ( Figure 2A) [26]. This is consistent with functional and genetic studies in zebrafish, mouse, and humans, which demonstrated key functional roles for HOXB1 in the patterning of the hindbrain and facial nerve (VIIth cranial nerve) during craniofacial development [58][59][60][61].
Cis-regulatory studies have demonstrated that HOX1 proteins partner with the HOX co-factors PBX and MEIS to auto-regulate expression and cross-regulate several other Hox genes during hindbrain segmentation [56,57,[62][63][64][65]. In previous work, we observed that occupancy of PBX and MEIS is enriched on HOXA1 peaks compared with HOXB1, while REST binding is more enriched near HOXB1 peaks compared with HOXA1 [26]. Here, we undertook a detailed analysis of the relative distributions of REST, PBX, and MEIS on HOXB1 bound regions. We directly compared the genome-wide binding patterns of HOXB1 with those of PBX1 and MEIS and found that only a small number of the total HOXB1 binding peaks (152/2058: group 1) are co-bound with PBX1 and/or MEIS ( Figure 1A). This pattern is distinctly different from its paralog HOXA1, which displays significant overlaps in co-occupancy with PBX and MEIS [27], and previous observations on the interactions of PBX and MEIS with other HOX proteins in modulating their DNA binding specificity and function [33,66]. This difference in binding patterns is consistent with the idea that the HOXB1 protein has diverged from HOXA1 and Labial, in part through altered interactions with PBX proteins [26].
Analysis for the enrichment of putative transcription factor binding motifs in the regions occupied by HOXB1 uncovered motifs known to bind REST, DUX, ZNF41, CTCF, ZNF263, and NR5A2 proteins ( Figure 1B and [26]). We have also expanded this analysis and identified several novel/unknown motifs enriched near HOXB1 bound regions ( Figure 1B). In agreement with the lack of co-occupancy of PBX and MEIS in the ChIP-seq data, we find no evidence for the enrichment of PBX and MEIS binding motifs. The significant enrichment of consensus motifs for the REST protein complex is interesting, as REST functions as a transcriptional repressor to restrict the expression of neuronal genes in non-neural cells [67][68][69]. We recently demonstrated that REST displays occupancy on these putative motifs by performing ChIP-seq experiments with antibodies against the REST protein in differentiated ES cells [26]. The heatmap in Figure 1A shows that 22% (462/2058, group 2) of HOXB1 peaks are co-bound by the REST transcriptional repressor complex, validating the motif analyses. This raises the possibility of functional interactions between HOXB1 and the REST co-repressor complex in the regulation of downstream target genes.    Most of the HOXB1 bound regions that display co-occupancy with PBX1 (group 1) do not overlap with those co-bound by REST (group 2), and vice versa, suggesting a mutually exclusive relationship between these classes of binding regions ( Figure 1A). The remaining peaks bound by HOXB1 form group 3 (1444/2058) and they represent the majority (~70%) of bound regions. Peaks in group 3 do not correlate with binding by PBX, MEIS, or REST, implying that they have very different features.

Analysis of the HOXB1 Binding Peaks and Co-Association with PBX and REST
Assays for histone marks indicative of epigenetic states and chromatin accessibility (ATAC-seq) were conducted to investigate the features of chromatin-associated with occupancy of HOXB1 and co-binding with PBX1 and REST ( Figure 1A). HOXB1 peaks that correlate with PBX1 binding (group 1) display open chromatin in ATAC-seq analysis and enrichment of H3K27Ac, H3K4me1, and H3K4me3 histone marks, known to be associated with active loci. While HOXB1 and REST co-bound regions (group 2) have open chromatin, they lack histone marks indicative of active loci and display a modest enrichment for the H3K27me3 histone mark, known to be associated with Polycomb mediated repression. This suggests that HOXB1 co-occupancy with PBX1 is correlated with active regions of the genome, while co-binding with REST is correlated with repressed regions of the genome. Interestingly, the regions that do not display co-occupancy with either PBX1 and REST proteins (group 3) generally have a closed chromatin state, and most of the regions are devoid of any histone marks. Although, a few of the peaks in cluster 3 display H3K4me1 marks and also have open chromatin, suggesting that they may be poised for activity ( Figure 1A).
We further investigated the gene regulatory potential of PBX and REST protein cobinding with HOXB1. We analyzed the expression of differentiating ES cell at various time points and correlated their expression with HOXB1 co-binding with PBX1 and REST ( Figure 3). We found that HOXB1 and PBX1 co-binding at promoter and gene body is associated with higher expression of the genes as compared with genes bound only with HOXB1. In contrast, HOXB1 co-bound regions with REST at promoters and gene bodies show lower expression of the genes as compared with the genes bound only with HOXB1. This suggests that PBX1 binding on HOXB1 bound regions may lead to activation of the locus for gene expression, while REST binding suppresses this activity. This also correlates with the known regulatory activities of PBX1 in gene activation and REST in gene suppression and suggests that HOXB1 may use these proteins as a cofactor for the regulation of target genes.
To explore the potential function of genes regulated by HOXB1 in groups 1 and 2 associated with co-occupancy of PBX1 or REST, we extended KEGG analysis of the neighboring genes to these groups ( Figure 2B,C). While the KEGG analysis of all HOXB1 peaks shows enrichment for neurogenesis, neuronal processes, and behavior, there is some representation for genes associated with metabolism, biosynthesis, and cancer ( Figure 2A). In contrast, the group 1 peaks, where HOXB1 and PBX1 display co-occupancy, associate with genes required in cancer and alcoholism, while the group 2 regions with co-binding of HOXB1 and REST associate with genes required in neurogenesis, neuronal processes, and behavior ( Figure 2B,C). This suggests that HOXB1 binding may be associated with different co-factors and co-regulators to modulate the expression of the different sets of genes involved in these distinct processes. Analysis of the group 3 HOXB1 binding peaks, which do not overlap with either PBX1 or REST occupancy, indicates they appear to lie in gene poor regions and are associated with smaller subset of genes involved in axon guidance. HOXB1 peaks co-bound with PBX (group 1) overlap with a gene (−10 kb and +5 KB from gene body) in 93% of cases, while HOXB1 peaks co-bound with REST (group 2) overlap with a gene in 68% of cases. In contrast, only 57% of the group 3 HOXB1 bound peaks, which do not overlap with occupancy of PBX and REST, are associated with near adjacent genes. To explore the potential function of genes regulated by HOXB1 in groups 1 and 2 associated with co-occupancy of PBX1 or REST, we extended KEGG analysis of the neighboring genes to these groups ( Figure 2B,C). While the KEGG analysis of all HOXB1 peaks

Characterization of an Unknown HOXB1 Binding Target Motif, HB1RE
To begin to explore properties associated with the neofunctionalization and diversification of HOXB1, we further examined the novel DNA motifs enriched in the HOXB1 bound regions ( Figure 1B). In the top seven enriched motifs, we found a high level of enrichment for three unknown motifs. The top enriched motif is a low complexity motif of 5 -TTTCC-3 nucleotide repeat. The second unknown motif, ranked fourth overall, is a 15 nucleotide sequence (5 -T/AGCTGGGATTACAGG-3 ). This has a 5 -ATTA-3 sequence positioned in the middle of the motif, which is a well-characterized common sequence associated with in vitro analyses of HOX protein binding [32]. Hence, we hypothesized that this motif might be an interesting consensus sequence for exploring HOXB1 protein binding in vivo. We named this motif HOXB1 response element (HB1RE), and it is detected in 188 of the 2058 HOXB1 binding peaks ( Figure 4A). Enrichment of HB1RE sequences was examined in HOXB1, PBX, and REST bound regions to understand the specific association with HOXB1 binding. HOXB1 peaks were significantly enriched with HB1RE (p-value = 0.0001418) as compared with REST (p-value = 0.01725) and PBX (p-value = 1). To further characterize the HB1RE motif, we used genomic, biochemical, and functional approaches to examine its potential role in HOXB1-mediated gene regulation. promoter associates with lower expression of the genes as compared with HOXB1 alone. Similar results were also observed with HOXB1 co-binding with PBX and REST at gene body ( Figure 5B). These observations suggest that PBX1 co-binding enhances the transcriptional activity of HOXB1, while REST suppresses this activity.  HOXB1 peaks having the HB1RE motif were extracted from the total population of HOXB1 bound regions and compared with the PBX and REST occupancy, open chromatin, and histone marks ( Figure 4A). Interestingly, we observed the distribution of this motif in all three groups of HOXB1 bound peaks. Of the 188 regions containing the HB1RE motif, 18 falls in group 1 associated with co-occupancy by PBX1 and MEIS while 60 reside in group 2, correlated with co-binding by REST. It is interesting that the group 1 peaks are enriched for co-occupancy of PBX and MEIS, despite not possessing the typical consensus bipartite Hox-PBX motifs associated with mediating their physical interactions with HOX proteins and role in modulating DNA binding properties [30,31]. These HB1RE containing peaks in group 1 are also enriched for open chromatin and active histone marks, suggesting that HOXB1 and PBX1 may work together in positively regulating gene expression of the adjacent target genes. A browser shot of the Pecor locus is presented as an example of loci in this group ( Figure 4B). In contrast, the HOXB1 HB1RE containing peaks that are co-bound with REST proteins generally do not associate with active histone marks, but they do display open chromatin states. The Chgb locus is shown as an example of binding properties in this group 2 class of peaks ( Figure 4C). HB1RE containing peaks in group 3 show closed chromatin and the absence of histone modifications ( Figure 4D).
To further explore the consequence of PBX and REST co-binding with HOXB1 on HB1RE, we analyzed the expression of neighboring genes. We observed that HOXB1 cobinding PBX1 at gene promoter associates with higher expression of the genes as compared with HOXB1 alone ( Figure 5A). In contrast, HOXB1 co-binding with REST at gene promoter associates with lower expression of the genes as compared with HOXB1 alone. Similar results were also observed with HOXB1 co-binding with PBX and REST at gene body ( Figure 5B). These observations suggest that PBX1 co-binding enhances the transcriptional activity of HOXB1, while REST suppresses this activity.

Genomic and Biochemical Characterization of HB1RE Motif
The genomic analyses above imply an association or co-occupancy of HOXB1, PBX1, and REST with the HB1RE motif. It is possible that the co-occupancy is direct or indirect. Hence, we used an unbiased approach based on an in vitro template binding assay to identify proteins capable of binding to this motif in vitro [48,49]. The template binding assay utilized oligomerized versions of the HB1RE motif in combination with nuclear extracts from 24 h differentiated ES cell expressing 3X-FLAG-HOXB1 followed by Western blotting or MudPIT (Multidimensional Protein Identification Technology) analysis to identify proteins interacting with the HB1RE motif ( Figure 6). A Western blot using an anti-Flag antibody confirms that HOXB1 binds to this motif ( Figure 6B). The MudPIT analysis reveals that PBX1 and MEIS1, 2, and 3 are among the most highly enriched interaction proteins compared with control samples ( Figure 6C). As the HB1RE lacks canonical sequences for PBX and MEIS binding, these proteins are likely to bind to non-consensus sequences or are indirectly recruited via interactions with other factors, perhaps HOXB1 itself. These observations along with the genomic analysis ( Figure 1A) indicate that the HB1RE motif is capable of interacting with HOXB1, PBX1, and MEIS proteins in vitro and in vivo to modulate the expression of neighboring genes and represents a new type of Hox target motif. The MudPIT data indicate that several Polycomb complex proteins are also enriched for binding on the HB1RE motif, but we do not detect interactions with components of the REST complex. This suggests that the co-occupancy between HOXB1 and REST in peaks containing HB1RE motifs is likely to be mediated via the close associate of independent REST binding sites.

Genomic and Biochemical Characterization of HB1RE Motif
The genomic analyses above imply an association or co-occupancy of HOXB1, PBX1, and REST with the HB1RE motif. It is possible that the co-occupancy is direct or indirect. Hence, we used an unbiased approach based on an in vitro template binding assay to identify proteins capable of binding to this motif in vitro [48,49]. The template binding assay utilized oligomerized versions of the HB1RE motif in combination with nuclear extracts from 24 h differentiated ES cell expressing 3X-FLAG-HOXB1 followed by Western blotting or MudPIT (Multidimensional Protein Identification Technology) analysis to identify proteins interacting with the HB1RE motif ( Figure 6). A Western blot using an anti-Flag antibody confirms that HOXB1 binds to this motif ( Figure 6B). The MudPIT analysis reveals that PBX1 and MEIS1, 2, and 3 are among the most highly enriched interaction proteins compared with control samples ( Figure 6C). As the HB1RE lacks canonical sequences for PBX and MEIS binding, these proteins are likely to bind to non-consensus in vivo to modulate the expression of neighboring genes and represents a new type of Hox target motif. The MudPIT data indicate that several Polycomb complex proteins are also enriched for binding on the HB1RE motif, but we do not detect interactions with components of the REST complex. This suggests that the co-occupancy between HOXB1 and REST in peaks containing HB1RE motifs is likely to be mediated via the close associate of independent REST binding sites.

Reporter Assay for Regulatory Activity of the HB1RE Motif in Chicken Embryos
To further investigate the functional properties of the enriched HB1RE motif, we employed a transgenic reporter assay in chicken embryos ( Figure 7A). Six tandem copies of the HB1RE motif were placed upstream of a lacZ reporter gene to assess the transcriptional potential of this motif. We used the previously characterized autoregulatory element (ARE) region of the Hoxb1 gene in this assay as a control for regulatory activity [56,64]. Electroporation of the ARE reporter unilaterally into one side of the neural tube shows robust reporter staining in rhombomere 4 and posterior regions ( Figure 7B). Ectopic expression of Hoxb1 in combination with the ARE reporter shows that HOXB1 protein can

Reporter Assay for Regulatory Activity of the HB1RE Motif in Chicken Embryos
To further investigate the functional properties of the enriched HB1RE motif, we employed a transgenic reporter assay in chicken embryos ( Figure 7A). Six tandem copies of the HB1RE motif were placed upstream of a lacZ reporter gene to assess the transcriptional potential of this motif. We used the previously characterized autoregulatory element (ARE) region of the Hoxb1 gene in this assay as a control for regulatory activity [56,64]. Electroporation of the ARE reporter unilaterally into one side of the neural tube shows robust reporter staining in rhombomere 4 and posterior regions ( Figure 7B). Ectopic expression of Hoxb1 in combination with the ARE reporter shows that HOXB1 protein can bind and enhance reporter expression, while over-expression of Hoxb3 has no obvious effect on reporter activity. A similar analysis with the HB1RE reporter construct shows it is also able to direct reporter expression in the chicken neural tube, suggesting it has regulatory activity ( Figure 7C). However, in contrast to the ARE vector, over-expression of Hoxb1 in combination with the HB1RE reporter construct completely suppressed reporter activity. Conversely, overexpressing Hoxb3 with the HB1RE reporter induced an expansion of the reporter expression domain. These results suggest that the HB1RE motif may be responsive to multiple HOX proteins in regulating downstream target genes in the genome.
However, there appears to be a context-specific impact on activity, as the reporter data indicate HOXB1 has a repressive effect on the HB1RE motif, while HOXB3 has an activating role. These in vivo experiments clearly imply that the HB1RE motif is a novel binding site capable of receiving functional inputs from HOX proteins.
sion of the reporter expression domain. These results suggest that the HB1RE motif may be responsive to multiple HOX proteins in regulating downstream target genes in the genome. However, there appears to be a context-specific impact on activity, as the reporter data indicate HOXB1 has a repressive effect on the HB1RE motif, while HOXB3 has an activating role. These in vivo experiments clearly imply that the HB1RE motif is a novel binding site capable of receiving functional inputs from HOX proteins.

Discussion
In this study, we have investigated the genome-wide binding properties of HOXB1 in mouse ES cells differentiated into neural fates. Using genomic, biochemical, and in vivo regulatory assays, we characterized the properties and regulatory potential of a novel HOXB1 binding motif, HB1RE. We find that the regions bound by HOXB1 are distinctly different from those of its paralog HOXA1. The characterization of the distinct features of

Discussion
In this study, we have investigated the genome-wide binding properties of HOXB1 in mouse ES cells differentiated into neural fates. Using genomic, biochemical, and in vivo regulatory assays, we characterized the properties and regulatory potential of a novel HOXB1 binding motif, HB1RE. We find that the regions bound by HOXB1 are distinctly different from those of its paralog HOXA1. The characterization of the distinct features of the HOXB1 bound peaks provides mechanistic insights into the properties associated with its functional diversification during evolution. Our findings raise several interesting questions and have implications for understanding how HOX proteins modulate the activity of downstream genes and pathways in development, disease, and evolution.

HOXB1 and the PBX and MEIS Co-Factors
Previous studies using genetic, genomic, and biochemical approaches have demonstrated the importance of interactions between HOX proteins and the co-factors PBX and MEIS, which impact their DNA binding specificities and in vivo functions [33,66]. These co-factor interactions appear to be highly conserved and broadly employed in a wide range of animal species [66]. Consistent with this mode for regulating the specificity of DNA binding, we have shown that HOXA1 displays significant overlaps in genome-wide co-occupancy and interactions with PBX and MEIS on its downstream target genes [26][27][28].
We expected that the same paradigm would generally apply to its paralog HOXB1, in light of detailed work revealing that HOXB1 partners with PBX and MEIS to directly auto-and cross-regulate segmental expression of several other Hox genes in the developing hindbrain [56,57,[62][63][64][65]. However, the results in this study have shown that the genome-wide binding patterns of HOXB1 do not correlate well with that of PBX and MEIS. Only a small number (~7%) of the total HOXB1 binding peaks display co-occupancy with PBX and/or MEIS ( Figure 1A). Furthermore, there is very little overlap between the regions bound by HOXA1 and HOXB1. The difference in binding patterns and downstream target genes is consistent with the recent report suggesting that HOXB1 has functionally diverged from HOXA1 in part through changes in a short amino motif that alters interactions with PBX proteins [26].
There are other major differences in comparing the properties of HOXB1 and HOXA1 bound regions in mouse ES cells and their differentiated derivatives. Analysis of chromatin accessibility and histone marks has shown that HOXA1 binds primarily to regions that are open based on ATAC-seq and exhibit marks characteristic of active genes [28,36]. In contrast, we find that~70% of HOXB1 binding peaks are not located in accessible chromatin and lack active histone marks ( Figure 1A). The small number of HOXB1 bound peaks that do display open chromatin and active marks (group 1) are co-bound by PBX and MEIS. REST occupancy defines another class (group 2) that is in open chromatin, but they contain repressive marks. The majority of HOXA1 binding peaks are located adjacent to genes [28], while we observe the majority of HOXB1 peaks positioned in gene desserts. Hence, it is not surprising that HOXA1 and HOXB1 are associated with regulating downstream target genes linked to very different cellular and developmental processes.
The difference in preference for binding to accessible chromatin domains between HOXA1 and HOXB1 is interesting in light of recent findings demonstrating that several posterior HOX proteins display differences in their abilities to bind inaccessible chromatin sites [70]. This raises the possibility that HOXB1 may bind to inaccessible regions of chromatin and, in some cases, partner with PBX and MEIS to enhance chromatin accessibility and gene activity. MEIS has previously been shown to have the ability to control the access of histone deacetylases to a regulatory element bound by HOX proteins [71].
The HB1RE motif discovered in our analysis illustrates complexities in mediating interactions between HOX proteins and PBX and MEIS. There are no obvious consensus PBX or MEIS binding sequences in the HB1RE motif. However, the template affinity assay demonstrated that this motif can bind PBX and MEIS proteins in vitro ( Figure 6C). We favor the idea that the recruitment of PBX and MEIS to this template is indirect, mediated by other proteins bound to this site. Studies have shown that bipartite HOX-PBX motifs and closely positioned PBX-MEIS binding sites are effective in generating ternary complexes containing HOX, PBX, and MEIS proteins on Hox-response elements [64,65,72]. This shows that HOX co-factors can be recruited to HOX bound regulatory regions from near adjacent sites. We also find evidence for clusters of PBX and MEIS binding sites and occupancy in HOXA1 binding peaks [27]. Hence, it is possible that indirect recruitment of PBX and MEIS to complexes with HOX proteins may be an important mechanism for bringing these proteins together to potentiate gene expression and modulate DNA binding specificity.

The Relationship between REST, HOXB1, and Repression
Transcription factors do not work alone in modulating gene expression. They frequently work in combination through the integration of synergistic, antagonist, or parallel inputs directed by the architecture and organization of DNA binding motifs in cisregulatory elements. We found that a large proportion of the total HOXB1 bind peaks and 22% of the binding peaks that contain HB1RE motifs display enriched motifs for REST and co-occupancy of REST proteins (Figures 1A and 3). These peaks have enrichment for a repressive histone mark and KEGG analysis shows the near adjacent genes are enriched for association with neurogenesis, neuronal processes, and behavior ( Figure 2). The prevalence in close association of REST and HOXB1 binding motifs suggests the possibility of regu-latory interactions. The REST protein complex functions as a transcriptional repressor to restrict the expression of neuronal genes in non-neural cells [67][68][69]. In neural development, there is evidence that REST functions control the progressive timing of differentiation of neuronal progenitors. Hence, it would be interesting to examine whether HOXB1 impacts REST activity in neural or non-neuronal cells, by enhancing or inhibiting its inputs on shared target genes. HOXB1 can function as a positive input to stimulate gene expression on Hox-response elements in hindbrain segmentation [56,57,[62][63][64][65]. This is illustrated by its ability to activate reporter expression from the ARE when electroporated into chicken embryos ( Figure 7B). However, using this same assay, we found that HOXB1 can also inhibit reporter expression directed by the HB1RE, while HOXB3 activates expression. In this context, HOXB1 displays the selective ability to repress gene expression.
How might HOXB1 mediate repression of HB1RE reporter expression in the chick hindbrain? Our in vitro binding data ( Figure 6) show that PBX and MEIS are able to interact with HB1RE, presumably through mechanisms involving indirect recruitment, because it lacks consensus binding motifs for these factors. The HB1RE reporter expression observed in the chick neural tube ( Figure 7C) corresponds with domains that overlap with high levels of PBX and MEIS expression. Hence, we favor the idea that its regulatory potential in the neural tube is potentiated by inputs from PBX and MEIS. While we did not detect components of the REST complex in the in vitro binding assay. this may not capture all in vivo features of this element. Hence, it is possible that ectopic levels of HOXB1 recruit REST or other co-repressors to this element and inhibit its activity. To explore these possibilities, it would be valuable to further characterize in vivo interactions between HOXB1 and families of co-activators and co-repressors.

HB1RE and Novel Enriched Binding Motifs
The HB1RE motif was one of many unknown enriched motifs detected in HOXB1 binding peaks. Our experiments provide evidence suggesting that it is a valid binding motif with regulatory potential associated with in vivo occupancy of HOXB1. This illustrates the challenges in trying to identify or define binding sites for TFs in the genome based on in vitro DNA binding specificities. There are likely to be many more non-canonical binding sites that HOX proteins and their known co-factors act upon in vivo and others that involve interactions with co-factors other than PBX and MEIS. While HOXA1 displayed a significant overlap with PBX and MEIS binding and enrichments for these motifs, there were many other highly enriched motifs not listed in TF DNA binding databases [27,28,36]. Hence, it is worth the effort to investigate these unknown enriched binding motifs to begin to build a better understanding of how HOX proteins are recruited to DNA and exert their regulatory activity in different developmental contexts. The emergence of improved ChIP-seq technologies that generate base pair resolution of regions bound by TFs [73], and computational pipelines to uncover and define co-associated motifs like those for REST and HB1RE, will help unravel the hidden grammar and context of the cis-regulatory code.

Data Availability Statement:
The raw data files for ChIP-seq, ATAC-seq, and RNA-seq data were submitted to the NCBI BioProject database: https://www.ncbi.nlm.nih.gov/sra/?term=PRJNA5 03882 (accessed on 28 December 2020), PRJNA341679, PRJNA335616, GSE61590, Sequence Read Archive under accession no: SRX4980243 to SRX4980246 and Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/ (accessed on 28 December 2020)) under series accession number GSE61590. The mass spectrometer datasets can be obtained from the MassIVE repository with accessions MSV000086596 and from the ProteomeXChange with accessions PXD023129. The original data used in this manuscript were also deposited to Stowers Institute Original Data Repository and available online at http://www.stowers.org/research/publications/libpb-1596 (accessed on 28 December 2020).