Tet-dependent 5-hydroxymethyl-Cytosine modification of mRNA regulates the axon guidance genes robo2 and slit in Drosophila

Modifications of mRNA, especially methylation of adenosine, have recently drawn much attention. The much rarer modification, 5-hydroxymethylation of cytosine (5hmC), is not well understood and is the subject of this study. Vertebrate Tet proteins are 5-methylcytosine (5mC) hydroxylases, enzymes that catalyze the transition of 5mC to 5hmC in DNA, and have recently been shown to have the same function in messenger RNAs in both vertebrates and Drosophila. The Tet gene is essential in Drosophila, as Tet knock-out animals do not reach adulthood. We describe the identification of Tet-target genes in the embryo and larval brain by determining Tet DNA-binding sites throughout the genome and by mapping the Tet-dependent 5hmrC modifications transcriptome-wide. 5hmrC-modified sites can be found along the entire transcript and are preferentially located at the promoter where they overlap with histone H3K4me3 peaks. The identified mRNAs are frequently involved in neuron and axon development, and Tet knock-out led to a reduction of 5hmrC marks on specific mRNAs. Among the Tet-target genes were the robo2 receptor and its slit ligand, which function in axon guidance in Drosophila and in vertebrates. Tet knock-out embryos show overlapping phenotypes with robo2 and are sensitized to reduced levels of slit. Both Robo2 and Slit protein levels were markedly reduced in Tet KO larval brains. Our results establish a role for Tet-dependent 5hmrC in facilitating the translation of modified mRNAs, primarily in developing nerve cells.


Introduction
The regulatory function of epigenetic mechanisms such as modifications of specific DNA bases or amino acids in histone tails has been investigated for many years. These processes are overlaid upon the genetic code and have profound effects on transcription and overall gene expression. The importance of similar modifications of RNA bases has become apparent, and their pervasiveness has engendered the nascent field of epitranscriptomics 1 .

In Drosophila DNA, 5mC is present at low levels and, so far, no function has been documented for it. Also, the methyltransferases that catalyze C-methylation in vertebrates are not present in Drosophila, with the exception of DNMT2, which primarily modifies several tRNAs and viral transcripts 6 . However, both 5mrC and 5hmrC are present in Drosophila RNA. The 5hmrC modification appears to be specific to mRNA and is controlled, at least in part, by the Drosophila Tet (Ten-Eleven-Translocation) protein 5 . Tet proteins were first identified as DNA-modifying enzymes that function as 5-methylcytosine (5mC) hydroxylases, catalyzing the transition of 5mC to 5hmC in vertebrate DNA 7 .

The three vertebrate TET genes (TET1, 2 and 3) function as epigenetic regulators of gene expression. The transition of 5mC to 5hmC leads to the elimination of the methyl mark on DNA and [...] and throughout embryogenesis and larval development the protein is found primarily in nerve cells 10,12 .

While the role of Tet in vertebrate DNA modification and its consequences have been reported in much detail, little is known about the function of Tet and 5hmrC in mRNA. Tet2 regulates pathogen-induced myelopoiesis as well as endogenous retroviruses by controlling the 5hmrC mark on mRNAs 13 .

The 5hmrC mark is present in mRNA of mouse embryonic stem cells (ESCs), where Tet proteins control the 5-hydroxymethylation of key pluripotency transcripts 14 .

Whereas Tet function in RNA modification has previously been analyzed in Drosophila and mouse tissue culture cells, here we report the identification of genes that are regulated by Tet in Drosophila embryos and nerve tissue. These Tet-target genes were identified through genome- and transcriptome-wide experiments, namely ChIP-seq, hMeRIP-seq, and RNA-seq. Two of these target genes, robo2 and slit,

Previously we have shown by dot blot analysis in S2 Drosophila cells and larval brains that the 5hmrC modification was primarily found on polyA+ RNA and was strongly reduced in Tet knock-down (KD) cells as well as in larval brains from complete loss-of-function animals (Tet null) 5 . We have confirmed and quantified these results using ultra-performance liquid chromatography tandem mass spectrometry (UHPLC-MS/MS). Measurements of 5mrC and 5hmrC abundance in S2 cells indicate that 5hmrC was strongly enriched in polyA+ RNA, whereas 5mrC was underrepresented in that fraction as compared to total RNA (Fig. 1A and B). Thus, our results are consistent with the observation that 5mrC [...] vertebrates Tet proteins have been shown to bind DNA at promoter regions to regulate gene expression through active DNA demethylation 14, 15 . We sought to identify the genes that are regulated by Drosophila Tet. We began by determining whether Drosophila Tet also binds DNA and by mapping its binding sites. We performed ChIP-seq experiments and mapped Tet-binding peaks genome-wide using a Tet-GFP fusion protein in two samples from different stages of development: 3rd instar larval brain and imaginal discs (larval brain fraction, LBF) and 0-12 h embryos. Samples were normalized to input chromatin. As a negative control, we used chromatin from LBF and 0-12 h embryos lacking GFP; however, it did not produce enough material for library preparation and sequencing (see methods).

Bioinformatic analysis of the LBF ChIP-seq results identified 3413 Tet-binding peaks distributed over 2240 genes. An example of a Tet-binding peak profile is shown in Fig. 2A. Tet preferentially occupies promoter regions (Fig. 2B), where it also shows its strongest binding (Fig. 2C). In murine ESCs, a GC-rich DNA motif has been shown to be enriched in Tet1-bound loci 15 . In LBF, we identified a highly conserved CG-rich sequence as one of the highest-ranking motifs within Tet-bound regions using MEME-ChIP motif analysis (Fig. 2D and Fig. S2A).

A Tet-binding profile in a composite model across the protein-coding regions illustrates that Tet binding is highest near the promoter and gradually decreases until it undergoes a notable drop at the transcription termination sites (TTS). This closely mirrors the profile observed for H3K4me3, an epigenetic mark associated with actively transcribed regions and frequently found at transcription start sites 16 (Fig. 2E). While 36% of all Tet peaks co-localize with this chromatin modification (H3K4me3, Fig. 2F), 40% of the Tet-binding sites that are localized to the promoter region co-localize with the H3K4me3 mark (Fig. 2G).

In embryo samples, we detected 5180 Tet-binding peaks associated with 2578 genes. An example of a Tet-binding peak profile is shown in Fig. 3A. Tet is enriched throughout the gene body and intronic regions (Fig. 3B); however, binding is strongest at promoters (Fig. 3C). The Tet-binding profile across the protein-coding regions is similar to that observed in LBF (Fig. 3E). Analysis of the DNA sequences bound by Tet protein in embryos uncovered a highest-ranking binding motif that shows significant similarity to the larval Tet consensus sequence (Fig. 3D and S2). As with the larval ChIP samples, we observe Tet occupancy to be correlated with H3K4me3 binding sites, primarily associated with promoters (Fig. 3E): 42% of all embryonic Tet peaks co-localized with H3K4me3 chromatin modification marks (Fig. 3F) and 51% of the promoter binding sites overlapped with the H3K4me3 mark (Fig. 3G). In both embryos and LBF, Tet binds to approximately the same number of target genes, and 30% of Tet's targets are identical in both tissues (Fig. 3H).
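
The co-localization percentages reported here are fractions of Tet peaks that overlap at least one H3K4me3 peak. The text does not state how overlap was scored, so the following is only a minimal sketch that assumes any shared base pair counts as overlap and that peaks are half-open (chromosome, start, end) intervals. The function name and the toy coordinates are hypothetical and are not data from the study.

```python
def fraction_overlapping(peaks, marks):
    """Fraction of `peaks` that overlap at least one interval in `marks`.

    Both arguments are lists of (chrom, start, end) tuples with half-open
    coordinates. Overlap by a single base pair is enough (an assumption).
    """
    marks_by_chrom = {}
    for chrom, start, end in marks:
        marks_by_chrom.setdefault(chrom, []).append((start, end))

    hits = 0
    for chrom, start, end in peaks:
        candidates = marks_by_chrom.get(chrom, [])
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < m_end and m_start < end for m_start, m_end in candidates):
            hits += 1
    return hits / len(peaks) if peaks else 0.0


# Toy intervals for illustration only.
tet_peaks = [("2L", 100, 200), ("2L", 500, 600), ("3R", 100, 200)]
h3k4me3_peaks = [("2L", 150, 250), ("3R", 300, 400)]
overlap_fraction = fraction_overlapping(tet_peaks, h3k4me3_peaks)
```

Restricting `peaks` to promoter-localized sites before calling the function would give the promoter-specific percentage; the denominator is always the number of Tet peaks passed in.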

Our results indicate that Tet binding sites are distributed throughout the physical map of the [...] (CxxC) under the control of the heat shock promoter (hsp70-GAL4::UAS-TetCxxCRFPmyc). We expressed the Tet DNA-binding domain by exposing larvae to heat shock and stained salivary glands with anti-Myc and anti-H3K4me3 antibodies. Tet showed many bands distributed on all arms of the chromosomes, but virtually no staining of the chromocenter, which contains very few genes, nor of ribosomal RNA in the nucleolus. H3K4me3 is also present in a distinct binding pattern on all chromosomes but, in contrast to Tet, is abundant in the chromocenter and the nucleolus. As indicated by [...] transcriptome-wide in the same tissues we used for our ChIP-seq analysis. We first performed hMeRIP-seq on total RNA using essentially the same approach we used previously in S2 cells 5 . RNA isolated from wt 0-12 h embryos and from wt and Tet null larval brain fraction (LBF) was treated with anti-5hmC antibody, or with immunoglobulin as a negative control, followed by next-generation sequencing (NGS, see methods).

In the embryo, we identified 1815 peaks on 1402 mRNAs. A representative 5hmrC peak profile is shown in Fig. 4A. The 5hmrC modification is preferentially associated with gene bodies, and a comparison to the expected distribution of peaks shows that the modification is not random (Fig. 4B).

Moreover, as the presence of the 5hmrC modification is not proportional to the abundance of the mRNA, the modification appears to function broadly within the transcriptome and is not a regulatory modality restricted to either rare or hyperabundant transcripts (Fig. 4C). The 5hmrC-associated sequences identified from these experiments revealed a specific UC-rich motif present within these mRNAs that

In mRNA from the wild type LBF, we detected 3711 peaks on 1775 transcripts. A representative profile of 5hmrC-enriched peaks in wt and Tet null is shown in Fig. 5A. In wt the peaks were distributed [...] associated with a UC-rich motif highly related to that identified in embryonic samples (Fig. 5F). In

In addition, 37% of the modified mRNAs in embryos were also identified in the LBF, while 30% of the larval modified mRNAs were also present in the embryonic fraction (Fig. S4C). Taken together, these results suggest that Tet targets a distinct cohort of mRNAs in embryos and larval brains and

It is striking that in our two very different experimental approaches, ChIP-seq and hMeRIP-seq, we

These analyses confirm that Tet-dependent 5hmrC is often found on mRNAs derived from genes that show Tet binding. Notably, close to 50% of transcripts that show a reduction in the 5hmrC mark in Tet null tissues are derived from Tet-target genes. However, the levels of these mRNAs are generally unaffected by the loss of Tet, suggesting that the 5hmrC modification does not affect the steady-state level of mRNAs but rather other aspects of mRNA function, such as translation or localization.

We used the results above to identify Tet-target genes and sought to determine whether the phenotypic effects of the loss of Tet's activity were derived from its inability to regulate target mRNAs 5,10 . We looked for genes (1) that are active in the nervous system, where Tet is enriched, (2) that showed Tet protein binding to DNA, and (3) whose mRNA showed a reduction in 5hmrC in Tet null animals. Axon guidance genes as a group frequently showed Tet-DNA binding and 5hmrC mRNA modification by Tet (Figure 6D). Among the genes that fulfilled the three criteria were two well-studied genes that function [...] Table S1). Additionally, the most lateral of the Fas2+ longitudinal tracks are often incomplete or absent (Fig. 7B, 46%; arrowheads). A second subpopulation of neurons expressing Connectin also appears to be altered in Tet null VNCs and fails to populate one of the longitudinal tracks compared to wild type (Fig. S7B; arrows). These phenotypes are strikingly similar to the axonal pathfinding defects seen in robo2 embryos, with Tet's effects being slightly more severe (Fig. 7B and C and Table S1) 18 . We sought to determine whether the reduction of Tet-mediated 5hmrC deposition on the robo2 or slit mRNAs resulted in mRNA species with reduced activity or potential for expression. Thus, we examined genetic interactions between Tet and the Slit/Robo signaling pathway in Tet null embryos lacking one copy of robo2 or slit. We additionally examined Robo1, a gene that is also involved in midline repulsion but is not 5hmrC modified. Decreasing the dose of Robo2 or Robo1 in a Tet null background has little effect on Fas2+ axonal pathfinding in comparison to Tet null alone (Table S1).

The failure to see an effect with Robo2 may stem from the observation that the level of midline crossing in Tet null embryos exceeds that seen for robo2 null embryos (Table S1 and

Given that robo2 and slit encode mRNAs that carry the 5hmrC mark, and that this mark is reduced in the Tet null background, we expected that Tet might control their protein levels (Fig. S7).

Indeed, both proteins were clearly reduced in brain extracts from Tet null larvae relative to wt (

Based on all our results, we suggest the model shown in Figure 7G: we propose that Tet binds,

It will be interesting to investigate whether 5hmC RNA modification is deficient in the affected patients 20 .

In our previous study we investigated whether Tet proteins, which are well known as 5-methylcytosine (5mC) hydroxylases catalyzing the change from 5mC to 5hmC in DNA, can have a similar function in RNA 5 . For these molecular studies we mainly used Drosophila S2 cells as source material. In the present study we used animal sources, namely embryos and larval brain tissues, to investigate the function of Tet in modifying mRNA in vivo. We also wanted to delineate the molecular and cellular processes for which the modification is required, and to identify in vivo targets of the Tet protein.

Our results demonstrate that Tet protein binds to distinct genes, functions in modifying mRNAs, and that this modification modulates the translational output of the mRNAs. We used our molecular results to identify Tet target genes. We selected genes (1) that contain promoter-proximal Tet-binding site(s) that overlap with H3K4me3 modifications, (2) whose mRNA showed 5hmrC modifications that were reduced in Tet null neuronal tissues, and (3) whose mRNA levels displayed negligible changes in Tet null neuronal tissues.
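
The three selection criteria amount to intersecting two gene lists and excluding a third. The sketch below illustrates that logic under the assumption that each criterion has already been reduced to a set of gene identifiers. The function name, argument names, and the placeholder identifiers are hypothetical; they are not taken from the study's pipeline or its results.

```python
def select_tet_targets(promoter_bound, reduced_5hmrc, expression_changed):
    """Return genes that satisfy all three criteria described in the text.

    promoter_bound:     genes with a promoter-proximal Tet peak overlapping H3K4me3
    reduced_5hmrc:      genes whose mRNA shows reduced 5hmrC in Tet null tissue
    expression_changed: genes whose mRNA level changes in Tet null tissue
                        (these are excluded, per criterion 3)
    """
    return (set(promoter_bound) & set(reduced_5hmrc)) - set(expression_changed)


# Placeholder identifiers for illustration only.
promoter_bound = {"geneA", "geneB", "geneC"}
reduced_5hmrc = {"geneB", "geneC", "geneD"}
expression_changed = {"geneC"}

targets = select_tet_targets(promoter_bound, reduced_5hmrc, expression_changed)
print(sorted(targets))
```

Expressing the third criterion as a set difference keeps the definition of "negligible change" outside this function, so whatever expression threshold is chosen can be applied upstream.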

We found that these target genes were most often associated with axonal growth and pathfinding. Two such genes, robo2 and slit, were selected because they fulfill the conditions outlined [...] In mass spectrometry experiments we determined that 5hmrC is highly enriched in polyA+ RNA, confirming our previous dot blot results. This modification is significantly rarer than other well-studied mRNA modifications, such as 5mrC or 6mA (Fig. 1) 5,22 . Coupled with our transcriptomic analyses, these observations show that the modification resides on a subset of mRNAs. Because Tet is expressed in Drosophila almost exclusively in nerve cells, we determined the levels of 5mrC and [...] 5mrC levels are about two orders of magnitude higher than 5hmrC levels (~2x10^-5 5mrC and ~2x10^-7 5hmrC in larval brains), and therefore detecting 5hmrC is not trivial.

The presence of 5hmrC is notably reduced (~5-fold) in Tet null samples. Our results are consistent with the Drosophila Tet enzyme being responsible for this 5hmrC modification (Fig. 1 and S1). However, since we detect ~20% of the wild type 5hmrC levels in mutant tissues that lack Tet completely, we assume that additional hydroxymethyltransferase(s) that can modify 5mrC are encoded in the Drosophila genome. The existence of additional enzyme(s) contributing to mRNA hydroxymethylation has also been postulated in mouse ESCs 14 .

Our mass spectrometry findings and the results from our hMeRIP-seq experiments on larval brain fractions (LBF) and embryos are consistent with what has been previously reported for Drosophila tissue culture cells and for ESCs (Fig. 1, 4, 5 and S1, S3) 14 . We identified ~3000 5hmrC peaks in ~1500

Of the genes with reduced 5hmrC marks, 44% showed Tet-DNA binding. Notably, the mRNAs in which the reduction of the 5hmrC mark was seen were mostly associated with genes that function in different aspects of nerve cell development. First among them are axon outgrowth genes, which were also identified in the GO-term analysis as abundant gene categories associated with Tet binding sites and mRNAs carrying the 5hmrC mark (Fig. 6D, S5).

Our initial examination of the developing embryonic ventral nerve cord (VNC) in Tet mutants identified subtle defects in CNS patterning. We then examined subsets of VNC neurons using antibodies to Fas2 and Connectin (Fig. 7B, B' and S7B [...] (Fig. 7, Table 1).

The overlapping phenotypes of Tet, robo2 and slit, together with the molecular data that identified Robo2 and Slit as Tet targets, prompted us to investigate whether Robo2 and Slit protein expression was affected by the loss of Tet. Indeed, in Western blots from Tet null larval brain extracts, both Robo2 and Slit protein levels were strongly reduced (Fig. 7F, F'), indicating that Tet's profound effects on VNC patterning occur, at least in part, through the control of expression of the Robo2 and Slit proteins. As robo2 and slit mRNA levels are not changed in Tet null LBF (Fig. S9), we suggest that the [...] levels through the 5hmC modification of many target mRNAs.

Which step in RNA processing leading to mRNA translation is affected in Tet null animals remains to be determined. Based on our previous results, which showed that 5hmrC-modified RNAs are found on polysomes, one possibility is that the 5hmrC modification facilitates the loading of the mRNAs on ribosomes 5 .

Tet and the 5hmrC modification function in mRNA processing of specific neuronal mRNAs [...] The processed reads were mapped to the reference genome Drosophila melanogaster BDGP6 (dm6) from Ensembl using Hisat2 (version 2.1.0) for RNA-seq and hMeRIP-seq 45 . To analyze gene expression, the HTSeq framework, version 0.5.3p9, was used to count the aligned reads in genes 46 . Mode "union" and a mapping quality cut-off of 20 were used for our analysis. The count table was normalized so that all samples have the same level of total mapped reads. DESeq2 was used to identify differentially expressed genes 47 . Cufflinks v2.2.1 was applied to calculate the RPKM values 48,49 . A gene was considered as significantly changed when fold change >=2 or <= -2 and adjusted p value < 0.05.
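
The count-table normalization and the significance cut-off described above can be illustrated with a small stand-alone sketch. It does not reproduce HTSeq, DESeq2, or Cufflinks. It assumes that the common depth is the mean library size and that a negative fold change means the negative reciprocal of a ratio below 1; neither convention is stated in the text, and all function names are hypothetical.

```python
def normalize_to_common_depth(counts):
    """Scale per-gene counts so every sample has the same total mapped reads.

    counts: dict of sample name -> dict of gene -> raw count.
    The target depth is the mean library size (an assumption); every sample
    must contain at least one mapped read.
    """
    totals = {sample: sum(genes.values()) for sample, genes in counts.items()}
    target = sum(totals.values()) / len(totals)
    return {
        sample: {gene: n * target / totals[sample] for gene, n in genes.items()}
        for sample, genes in counts.items()
    }


def signed_fold_change(treated, control):
    """Ratio treated/control, reported as -1/ratio when the ratio is below 1."""
    if treated <= 0 or control <= 0:
        raise ValueError("both values must be positive")
    ratio = treated / control
    return ratio if ratio >= 1 else -1 / ratio


def is_significant(signed_fc, adjusted_p, min_fold=2.0, alpha=0.05):
    """Stated cut-off: fold change >= 2 or <= -2, and adjusted p < 0.05."""
    return (signed_fc >= min_fold or signed_fc <= -min_fold) and adjusted_p < alpha


# Toy counts for illustration only.
raw = {"wt": {"a": 10, "b": 30}, "ko": {"a": 40, "b": 40}}
normalized = normalize_to_common_depth(raw)
```

In the toy table both libraries are rescaled to 60 reads, after which gene "a" shows a two-fold difference between samples while the raw counts suggested a four-fold one.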

The "SplitNCigarReads" function in GATK (version 3.3-0) (https://gatk.broadinstitute.org/) was used to split reads that contain Ns in their CIGAR string (e.g., spanning splicing events in hMeRIP-seq data). The "rmdup" function of samtools (version 1.3.1) was used to remove duplicate read mappings. Then the same peak calling procedure as for the ChIP-seq data analysis was performed to call peaks in the hMeRIP-seq data.

The peaks of hMeRIP-seq were selected using P-value < 10e-5. Peaks of hMeRIP-seq were considered as reduced when the normalized hMeRIP-seq signal in control samples was at least 1.4-fold higher than the signal in Tet-depleted samples. The fold change and P-value were calculated using the "limma" package in R 50 .
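
The two thresholds in this paragraph can be written as simple predicates. This is only a sketch of the stated cut-offs, not of the limma analysis itself, and the function names are hypothetical. Note that the literal `10e-5` evaluates to 1e-4 in Python; the default below follows the text as written, so if 1e-5 was intended the parameter should be changed.

```python
def passes_peak_cutoff(p_value, p_cutoff=10e-5):
    """Peak selection as stated in the text: P-value < 10e-5 (equal to 1e-4)."""
    return p_value < p_cutoff


def is_reduced_peak(control_signal, depleted_signal, min_ratio=1.4):
    """True when the control signal is at least 1.4-fold the Tet-depleted signal.

    Both signals are assumed to be normalized, non-negative hMeRIP-seq values.
    Multiplying instead of dividing avoids a division by zero when the
    depleted signal is 0.
    """
    return control_signal >= min_ratio * depleted_signal
```

For example, `is_reduced_peak(15.0, 10.0)` is true, whereas `is_reduced_peak(13.0, 10.0)` is false because the control signal is only 1.3-fold higher.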

The data that support the findings of this study are available from the corresponding author upon reasonable request during peer review and will be publicly available online at GEO at publication.