Profiling genome‐wide methylation in two maples: Fine‐scale approaches to detection with nanopore technology

Abstract DNA methylation is critical to the regulation of transposable elements and gene expression and can play an important role in the adaptation of stress response mechanisms in plants. Traditional methods of methylation quantification rely on bisulfite conversion that can compromise accuracy. Recent advances in long‐read sequencing technologies allow for methylation detection in real time. The associated algorithms that interpret these modifications have evolved from strictly statistical approaches to Hidden Markov Models and, recently, deep learning approaches. Much of the existing software focuses on methylation in the CG context, but methylation in other contexts is important to quantify, as it is extensively leveraged in plants. Here, we present methylation profiles for two maple species across the full range of 5mC sequence contexts using Oxford Nanopore Technologies (ONT) long‐reads. Hybrid and reference‐guided assemblies were generated for two new Acer accessions: Acer negundo (box elder; 65x ONT and 111X Illumina) and Acer saccharum (sugar maple; 93x ONT and 148X Illumina). The ONT reads generated for these assemblies were re‐basecalled, and methylation detection was conducted in a custom pipeline with the published Acer references (PacBio assemblies) and hybrid assemblies reported herein to generate four epigenomes. Examination of the transposable element landscape revealed the dominance of LTR Copia elements and patterns of methylation associated with different classes of TEs. Methylation distributions were examined at high resolution across gene and repeat density and described within the broader angiosperm context, and more narrowly in the context of gene family dynamics and candidate nutrient stress genes.


| INTRODUC TI ON
The processes shaping plant development and growth are regulated by epigenetic modifications that impact gene expression, genomic stability, and plasticity (Kumar & Mohapatra, 2021).Plant genomes are methylated in different ways at sequence contexts categorized as CG, CHG and CHH (where H = C, A, or T).These modifications are often initiated by transposable element (TE) activity, which impacts a significant portion of most plant genomes.Epigenetic modifications introduced through mobile elements contribute to genetic variation that is associated with biotic and abiotic stress adaptations (Ritter & Niederhuth, 2021).
Methylation in model plants has been studied from its initiation in embryos in Arabidopsis thaliana (Jullien et al., 2012), to the accumulation of methylation variation in the independent branches of a single Populus trichocarpa individual (Hofmeister et al., 2020).In Arabidopsis, the genome-wide regulatory effects of methylation were demonstrated by the knockout of all five methyltransferases, impacting cell fate throughout the plant (He et al., 2022).Also in Arabidopsis, a heritable epiallele associated with climate was found to control leaf senescence, providing an example of local climate adaptation (He et al., 2018).In P. trichocarpa, epigenetic modifications were associated with changes in the circadian cycle (Liang et al., 2019).Increasingly, non-model plant systems have been investigated, including for methylation responsible for leaf shape and photosynthetic traits in Populus simonii (Ci et al., 2016) and salinity-induced methylation in mangroves (Miryeganeh et al., 2022).
Until recently, methylation was primarily investigated through treatments such as whole-genome bisulfite conversion (WGBS).This technique is prone to degradation of DNA, incomplete conversion, and amplification bias (Gouil & Keniry, 2019).Since WGBS libraries are primarily short-read sequenced, interpretation also suffers from poor resolution in repetitive regions.Long-read technologies, including Pacific Biosciences' (PacBio's) single-molecule real-time (SMRT) sequencing, and nanopore sequencing by Oxford Nanopore Technologies (ONT), can detect methylation.SMRT sequencing can detect 5mC modifications based on polymerase dynamics at very high coverage, as well as methods that rely on bisulfite conversion for standard coverage.In comparison, nanopore sequencing can directly detect DNA or RNA modifications through a voltagemeasured pore, enabling real-time, single-molecule sensitivity (Liu et al., 2021).
Being sessile, plants rely on methylation as an evolutionary strategy in the form of short-and long-term (heritable) stress memory (Sun et al., 2022), and this is particularly important in long-lived tree species with slower changes in allele frequency (Bräutigam et al., 2013).
DNA methylation is known to have a critical role in the silencing of TEs, but environmental stress can reduce this activity and result in TE bursts (Cavrak et al., 2014).Epigenetic mechanisms related to transposition have been associated with responses to drought, temperature, and nutrient stress (Fan et al., 2022).In the context of maples, sugar maple (Acer saccharum) is susceptible to calcium deficiency, and this has led to a significant decline in natural populations (Bishop et al., 2015).The first comparative genomics study on North American maples identified candidate genes from the analysis of expression in the aluminum-and calcium-amended plots at the Hubbard Brook Experimental Forest (McEvoy et al., 2021).The interplay between TEs and gene expression is complex, but many forest tree species would benefit from a deeper examination to fully understand their adaptive potential for new and ongoing threats.
This study extends the previous work on Acer negundo and A. saccharum (McEvoy et al., 2021), as well as that of Sork et al. (2022) and Niederhuth et al. (2016) in comparative plant methylomics.
Employing ONT sequencing from two previously unstudied Acer individuals, we completed genome assembly and annotation and detected methylation sites genome-wide.We customized methylation calling methods, and in doing so, generated comparative methylation profiles focusing on transposable elements and nutrient stress candidate genes.

| Sequencing
Leaves from two maple individuals, A. negundo (Accession 253-2013*B) and A. saccharum (1353-84*A), were shipped on dry ice from the Arnold Arboretum of Harvard University.High molecular weight (HMW) DNA was extracted from both samples using the ONT protocol for Arabidopsis leaves (Vaillancourt & Buell, 2019)

| Methylation detection
The METEORE pipeline was selected as the general approach to methylation detection based on its ability to generate reliable consensus results from multiple tools (Yuen et al., 2021).Two tools were selected: Nanopolish v0.13.2 (Loman et al., 2015) and DeepSignal-Plant v0.1.4(Ni et al., 2021).This pairing was chosen as it had favorable results in the Yuen et al. (2021) benchmarking study and the ability to inform beyond the CG-context.Nanopolish is a well-supported tool that detects methylation in the CG context.DeepSignal-Plant is the top-performing, most accessible tool trained with plant-based models to detect methylation in CHG and CHH contexts from nanopore sequencing (Ni et al., 2021).
ONT reads from the new individuals were aligned to both the new and original genomes for methylation calling.Methods for Nanopolish and DeepSignal-Plant proceeded according to the documentation for each tool and the Snakemake workflows provided by METEORE (Figure 1).Each tool was run to create two output formats: (1) the input format necessary for integration in the consensus, and (2) the standardized form of independent tool output, calculating per-site and per-strand frequencies, allowing for greater ease of interpretation across tools.This provided both tool-specific frequencies and a consensus for comparison.
These alignments were run with minimap2 version 2.22-r1101 outside of the METEORE pipeline because the within-METEORE version provided by Snakemake was older.The next step was Nanopolish 'call_mods', followed by METEORE scripts to convert the output of log-likelihood ratio values into a standardized format.
DeepSignal requires single-fast5 formatted files, so the multi-fast5 files were converted using ont_fast5_api 'multi_to_single_ fast5'.DeepSignal-Plant began with annotation of the FAST5s with FASTQ using Tombo 'preprocess annotate_raw_with_fastqs' v1.5.1 (Oxford Nanopore Technologies, 2020b).This was followed by Tombo 'resquiggle' with the parameter -signal-align-parameters 4.2 4.22000 2500 20.0407502500250 used for the original genomes only.It is possible to detect methylation on either reads or extracted features; reads were recommended, so this method was implemented.At this stage, results were split into separate files for CG, CHG, and CHH subcontexts by modifying METEORE split_freq_file_ by_5mC_motif_sm.py.METEORE scripts were then used to standardize per-site and per-strand formats as described for Nanopolish.
The CG file alone was formatted as input for the consensus script.
Consensus predictions were generated using a Random Forest with provided models optimized at n-estimator = 3 and max_dep = 10.

| Statistical analysis and visualization
BEDTools v2.29.0 (Quinlan & Hall, 2010) was used to create 1 Mb windows and 100 Kb windows with 50 Kb overlaps across the genomes.It was then used to map methylation to these windows and calculate the frequency mean.BEDTools was also used with gene annotation files to intersect the gene regions and count methylated sites within the region.Plotting of chromosomal distributions was conducted with karyoplotR v1.21.3 (Gel & Serra, 2017).Pearson correlation was calculated with R Core Stats Package v4.2.0 (R Core Team, 2013).Statistics and summaries were used to compare across reference genomes within Acer (McEvoy et al., 2021), as well as Populus (Hofmeister et al., 2020), Quercus (Sork et al., 2022), and the 34 angiosperms surveyed by Niederhuth et al. (2016).Rapidly expanding and contracting families within A. negundo and A. saccharum were plotted along the distributions of methylation frequency and TE coverage.A total of 245 candidate genes associated with calcium response in A. saccharum were also investigated.
Scripts for all methods are available at doi: 10.5281/ zenodo.10659005.

| Sequencing
The ONT sequencing of A. negundo resulted in 48.5 Gb (115x coverage) in 4.29 M reads with a read N50 of 16.3 Kb.Re-basecalling resulted in an expected loss, resulting in 29.6 Gb (70x) in 2.55 M reads with an N50 of 16.2 Kb.After contaminant (56 K reads) and length (150 bp PE) produced 148x coverage of 52.9 Gb bases in 563 M read pairs and trimming reduced this to 139x (526 M total read pairs).

| Assembly and annotation
After the initial draft assembly, A. negundo had a total length of 421 Mbp in 421 contigs and an N50 of 2.16 Mb.BUSCO embryophyta genes were 96.0%complete, with 2.9% of these in duplicate.Polishing only minimally reduced the length and N50 (File S1).
Scaffolding with the original genome increased the length slightly, but it remained within 421 Mbp, and the N50 grew to 33 Mb.The assembly was in 32 scaffolds, with 13 chromosomes representing 99.7% of the assembled length.The genome size remained constant, though slightly smaller, than the published reference (442 Mbp) and closer to the kmer-based estimate of 319 Mbp (McEvoy et al., 2021).
At 96%, the final complete assembly BUSCO scores remained similar to the draft, with duplicates dropping slightly to 2.8%, 0.7% fragmented, and 3.3% missing (Table 1, File S1).The lower duplicate value was an improvement from the original reference duplication of 3%, although the completed single copy dropped slightly.Structural annotation of the new genome identified 27,541 genes, of which 23,408 were functionally annotated by either similarity search or gene family assignment.The BUSCO score for annotated proteins was 92.1% complete, with 3% in duplicates (Table 1).
The A. saccharum draft assembly had a total length of 571 Mbp in 1194 contigs and an N50 of 805 Kb.BUSCO embryophyta genes were 96.0%complete, with 6.6% of those in duplicate.Similar to A. negundo, polishing slightly reduced the total length and N50 in this genome.The BUSCO duplicate score increased by 0.1%.The BUSCO of the scaffolded genome was 96.2% complete, with 6.5% in duplicate, 0.7% fragmented, and 3.1% missing.The assembly had a total length of 571 Mbp in 160 contigs and an N50 of 41 Mb (Table 1,   File S1).Scaffolding resulted in 13 chromosomes that represented 96.8% of the assembled length.The genome size dropped from the published reference of 626 Mbp, a more substantial decrease than seen in A. negundo, and below the original kmer-based estimate of 636 Mbp.A total of 35,834 genes were identified, and 29,858 were associated with functional information.The resulting annotation BUSCO score was 91.8% complete, containing 8.5% duplicates The first versions of the Acer genomes were produced exclusively with deep-coverage PacBio reads (PacBio Sequel II, >100X), and resulted in fairly contiguous references that assembled to chromosome-scale with the addition of HiC libraries (McEvoy et al., 2021).The two new accessions, assembled in a hybrid manner and scaffolded with the published PacBio references as described above, were smaller than the original references (Table 1).The difference in size could be structural variation between the different genotypes, but more likely reflects some of the differences in read inputs and methodology (i.e.assembler). A. negundo was most similar to its original genome, with an improved duplication rate, and the missing genes identified (~3 K) were not specific to the new genome.
TA B L E 1 Assembly and annotation statistics for new and original genomes.A. saccharum had an increase in BUSCO-estimated duplication and 4 K original genes were not observed in the new assembly.Similar to the assembly, differences in final gene number may partially reflect variation in the BRAKER software used for prediction.Wholegenome alignment between the new and original versions revealed that, of the existing assembled sequence, no major discrepancies were present in either species (Figure 2a).Links between syntenic regions of the original genomes in both species showed similarities in spite of the larger genome size of A. saccharum (Figure 2b).
Studies on transgenerational inheritance of these forms of 5mC modifications in Arabidopsis determined that asymmetric CHH must be re-established de novo, while symmetric CG and CHG methylation is maintained (Hsieh, 2016).Using the consensus-based approach from METEORE, we combined DeepSignal-Plant with one of the best implementations of an HMM-based approach, Nanopolish, leveraging the strengths of both to detect methylation across all contexts in ONT sequencing (Yuen et al., 2021).were quite similar (~70%), and the new A. saccharum genome was higher, at 75%.Nanopolish results followed a similar trend, but  3).

| Comparative methylomes
Independent estimates from DeepSignal-Plant and Nanopolish were on the higher end of the spectrum of global methylation levels reported for 34 angiosperms in Niederhuth et al. (2016).Consensus results were closer, but still higher than many angiosperms, and both WGBS and ONT-based approaches (Nanopolish).The direct CG methylation profiling using the ONT reads was strongly correlated (>93%) with their WGBS data for the same accession.In addition, ONT long-reads identified centromeric repeats and accompanying hypermethylated signal (Perumal et al., 2020).
The distribution of methylation, genes, and TEs is of interest due to the implications for evolutionary strategies involving methylation's regulation of genes, TEs, and the boundary regions between.
Many plant species exhibit a pattern of gene density along the arms of the chromosome and TE density in centromeric regions (Comai et al., 2017).As an exception, Sork et al. (2022)  The Pearson correlation (R) of genes to methylation for each sequence context was plotted along with the previous analyses and a rudimentary phylogeny (Niederhuth et al., 2016;Sork et al., 2022) (Figure 4a).In Acer, which is evolutionarily diverged from Quercus, there is a moderate negative correlation between CHH methylation and genes-almost 50%-similar to about half of angiosperms surveyed, while the remaining angiosperm species have stronger negative correlation (~75%     (Li et al., 2022;Ma et al., 2020;Yang et al., 2019;Yu et al., 2021).Exceptions to this include two maples, A. palmatum and the polyploid A. rubrum, that host more Gypsy elements (Chen et al., 2023;Lu et al., 2022).Notable exceptions to the Gypsy dominance have also been observed in Theobroma cacao, Vitis vinifera, Musa, Rhizophora apiculata, and Cucumis sativus (Argout et al., 2011;Castanera et al., 2019;Moisy et al., 2008;Pratama et al., 2021;Wang et al., 2018).As more genomes become available, more variation in LTR ratios has been noted within and across genera (Zagorski et al., 2020).It is possible that some of this new variation also reflects improvements in long-read sequencing to resolve and quantify these elements, as seen with Cucumis melo (Castanera et al., 2019;Ruggieri et al., 2018) Copia elements were slightly less methylated compared to Gypsy, particularly in the CG and CHG contexts, with the largest difference observed in the lower CHG methylation of Copia elements in A. saccharum.Copias were also less methylated in the CHH context of the Q. lobata bud tissue (Figure 6b).In both Acer, the reduction of CHH methylation in Copia is primarily observed in the up and downstream regions.Interestingly, the greatest frequency of Copia CHH methylation for each species is highest in A. negundo (~14%), followed by A. saccharum (~13%), and only ~7% in Q. lobata.Greater methylation is commonly observed in Gypsy elements and could be attributed to their tendency to reside in heterochromatic regions (Wang et al., 2018).

Methylation across the characterized and dominant Copia
subfamilies was also examined (Figure 6a).Of the shared Copia lineages, copy number ranged from Tork which, similar to A.
truncatum (Ma et al., 2020), had the highest representation in A.  et al., 2019;Wang & Baulcombe, 2020;Wang et al., 2018).CHH methylation among Bianca, represented in far fewer copies, was the highest compared to other Copia subfamilies in both species and demonstrated the traditional pattern in A. saccharum and one less characteristic (more repeat body methylation) in A. negundo.
In both Acer, Sire had less methylation than other Copia subfamilies in all sequence contexts except A. saccharum CHH.In A. saccharum, the CHH methylation frequencies of Oryco/Ivana and Ale/ Retrofit were even lower than Sire, and while the copy number of Oryco/Ivana was also lower, Ale/Retrofit was interesting in its overall prevalence in the genome, with 4035 and 3488 divergent copies for A. negundo and A. saccharum, respectively.This pattern of high Ale sequence diversity and more even distribution through the genome was observed in a population level assessment of Setaria italica (Suguiyama et al., 2019).The low CHH methylation observed for Ale/Retrofit elements could be a reflection of   sufficient suppression by CG and CHG elements, or it could be ascribed to the inherent structure of these TEs which may influence methylation levels.In Brachypodium distachyon, GC content was low in Sire and Bianca, which was reflected in lower CG and CHG levels, but high in Ale/Retrofit with GC content that was decreasing with age due to deamination of cytosines (Stritt et al., 2020).
Studies on CHH methylation in Arabidopsis have characterized the unique pathways responsible, and their TE targets (Bouyer et al., 2017).The CMT2 pathway targets the LTRs (primarily Gypsy and Copia) (Sasaki et al., 2019) and the RdDM pathway targets Class I TEs, specifically RC/Helitron and MULE-MuDR.In Acer, CHH methylation in DNA transposons was strongly increased (Figure 6a), pointing to a functional role of asymmetric methylation in DNA transposon silencing (Zakrzewski et al., 2017).Among DNA transposons, DNA/MULE-MuDR elements were most highly methylated, except for the CHH context in A. saccharum, where it was similar in frequency to other TE families.MULE-MuDR elements contribute to genome crossovers as they are prone to meiotic double-strand breaks (Underwood & Choi, 2019).The insertion of MuDRs in maize increases meiotic crossover rates and these primarily occur in regions of open chromatin (Liu et al., 2009).The loss of CG and non-CG methylation over MuDr elements in Arabidopsis mutants led to an increase in crossovers that were enriched in proximity to immunity genes of large and diverse families with extensive variation (Choi et al., 2018).DNA/PiF-Harbinger elements were also highly methylated across the gene body with the exception of CHH methylation in A. negundo.CHH methylation of PiF-Harbinger has a bimodal distribution, with peaks flanking the element in both Acers, similar to Q. lobata (Figure 6b), but the overall frequency of CHH methylation is much higher in oak (~30%) which may be a tissue-specific pattern.PiF-Harbinger elements only cover a very small portion of the Acer genomes at ~0.18%, but they are widespread throughout plant lineages, at times in great abundance (~1%) relative to most DNA transposons as observed in Arabidopsis thaliana, Oryza sativa, and Brassica oleracea (Zhang et al., 2004).They have been associated with important adaptive roles, including intronic Harbinger elements that contain MYB elements that regulate stress response to light in Solanum lycopersicum (Deneweth et al., 2022), and a Harbingerderived gene that coordinates histone modifications that alter panicle number and grain size in O. sativa (Mao et al., 2022).
Among non-LTR retroelements, SINEs were heavily methylated (>40% for CHH) in Quercus bud tissue (Figure 6b) but not frequently identified in the Acer for comparison (File S1).SINEs are derived from tRNA and demonstrate low sequence similarity which may be responsible for their poor annotation (Wenke et al., 2011).Recent analysis across eight grass genomes examined patterns in mCHH islands and found significant enrichment of these islands in DNA transposons (DNA/MULE-MuDR, PiF-Harbinger) as well as non-LTR retroelements (SINEs) (Martin et al., 2021).
Methylation frequencies across gene regions in Acer were somewhat different than that observed in Q. lobata, but there is a spectrum of variation observed across angiosperms, as seen in Niederhuth et al. (2016) (Figures 4a and 7a,b).The two Acer species were very similar, but A. negundo, which has the smaller genome, had a slightly higher frequency of CG methylation.The higher frequencies across genes in the original genomes compared to the new genomes may be due to differences in average gene length, as seen in Q. lobata (Figures 7a and S4).Both Acer species also shared the same sharp increase in CHH just upstream of the transcription start site, which was greater in A. negundo.
CHH levels were highest in the 5 Kb region upstream of the transcription start site (TSS) (Figure 7a,b).The next largest fraction was in the first intron, where both new Acer genomes had a significant drop in the number of methylated sites, especially for those at 90% methylation (Figure 7b).This pattern was seen in other introns, but at lower levels, while the levels in the 5 Kb region downstream of the transcription end site (TES) were consistent across frequencies.In certain plant genes-often highly expressed genes-the first intron contains regulatory elements, though it is observed much less frequently than upstream promoters (Rose, 2018).

| Methylome and gene regulation
By combining distributions of methylation, select TEs, and gene density, trends amongst the elements can be observed.In Figure 8, each row contains chromosomes that are largely syntenic between A. negundo (ACNE) and A. saccharum (ACSA), as seen in Figure 2b, with all chromosomes from each genome available in Figure S5 for more detailed viewing.Regions of low gene density are co-located with peaks of LTRs and methylation in all contexts, including CHH.
It is likely these contain centromeres and pericentromeric regions.
Included in the gene density track (Figure 8) are genes from families previously identified as significantly expanding (26 in A. negundo, 99 in A. saccharum) or significantly contracting (52 in A. negundo, 18 in A. saccharum) when evaluated in terms of gene family evolution across 22 land plant species (McEvoy et al., 2021).Genes from rapidly evolving families are found in every chromosome except acsa2_011, though some contain many more than others (Figure S5).
The particular subset of chromosomes shown in Figure 8  The A. negundo chromosome (acne2_11) had the most genes from expanded families in that genome, but also contains some genes from contracted families.There is a notable drop in methylation over a narrow peak of gene density in both species that is greater in A. negundo.A. negundo also has spikes of high frequency rnd-4_family-434 that correspond to the greatest methylation frequencies and lowest gene densities.Rnd-4_family-434, the Copia subfamily with the burst of recent activity, does not appear to be proximal to rapidly evolving genes (Figure S5), but there are instances where the two overlap, as shown in acne2_11, where two genes from rapidly expanding families fall between the two spikes of rnd-4_family-434.
The second set of chromosomes, acsa2_017 and acne2_04, are also syntenic, but inverted, and gene density is concentrated in one arm of each species.In this distribution, rnd-4_family-434 is concentrated at the opposite end of the A. negundo chromosome where gene density is the lowest and methylation the greatest.This distribution pattern, weighted towards one chromosome arm, tends to be more common genome-wide in both species (Figure S5).The A.
negundo chromosome has several genes from contracted families and one from expanded, while A. saccharum is the inverse, which fits the predominant direction of rapidly evolving gene families for each species.expanding, and here, expanded gene families show an increase in CHH methylation in the gene body (coding regions) relative to other categories of genes in both species (Figure 7b), but it is greater in A.
negundo. A. saccharum primarily had expanding families, and here, genes from contracted families have distinct increases in CHH methylation upstream, downstream, and in the intronic space, regardless of position (Figure 7b).
Several studies have examined the relationship between methylation changes, metal toxicity, and the associated nutrient stress, in both model and non-model plant systems.One such study examined the aquatic plant Hydrilla verticillata and noted demethylation in response to copper, as well as blocked ROS interactions that could cause remethylation (Shi et al., 2017).In Hibiscus cannabinus, increased methylation was associated with increasing chromium levels (Ding et al., 2016).In Arabidopsis halleri, cadmium treatments increased CG methylation of genes responsible for symmetric methylation (Galati et al., 2021)  Gene expression results focused on candidates associated with nutrient stress (or heavy metal toxicity) were mapped to the new A. saccharum genome (240 in total; Figure 8).A total of 115 genes were downregulated in trees grown in aluminum-amended plots while 130 genes were up-regulated (in both cases, compared with trees grown in calcium-amended or control plots).These genes are distributed across all chromosomes at different densities and it is difficult to detect trends from this perspective.To further examine methylation of the 240 differentially expressed genes, they were compared with the patterns observed for the mean of all genes in Figure 7b and File S4.The subset of differentially expressed genes appeared enriched for CG and CHG methylation in the upstream regions when compared with the full gene space.The most noticeable difference, however, was the higher peak of CHH methylation frequency immediately preceding the transcription start site in the promoter region of differentially expressed genes (Figures S4 and 7b) which has been observed in grasses and other angiosperms (Martin et al., 2021).Values here peak at around 16% compared to 12.5% for the whole-genome mean.This value is comparable to the CHH methylation observed in Q. lobata bud tissue relative to catkins or young leaves, which have a greater portion of meristematic tissue, as mentioned in Sork et al. (2022), and are presumably undergoing rapid development.Increased methylation could be a sign of gene networks requiring more flexibility in expression to meet the challenges of development or biotic and abiotic stress (Lang et al., 2017).

| CON CLUS ION
Plants are dependent on methylation to regulate both short and long-term adaptive strategies (Sun et al., 2022).Best practices to de- . The resulting gDNA was checked for quality control via Thermo Scientific Nanodrop and then Agilent TapeStation (Agilent Technologies, Santa Clara, CA).Libraries were prepared with the ONT Genomic DNA by Ligation protocol with additional Covaris shearing to improve coverage (Oxford Nanopore Technologies, Oxford, UK).A single ONT Flongle sequencing run was conducted to evaluate the library quality prior to the PromethION run (A.negundo: 15 K reads, N50 16 Kb in 23 hours; A. saccharum: 39 K reads, N50 14 Kb after 21 hours).Two PromethION runs (one per individual) followed, using flow cell type FLO-PRO002, kit SQK-LSK110, and Guppy v4.0.11(Oxford Nanopore Technologies, 2020a) with the high-accuracy basecalling model and read filtering minimum qscore of 7. To reduce the error rate, particularly for methylation calling, the resulting FAST5s were re-basecalled with Guppy v5.0.16 (GPU) and ONT super accuracy model r9.4.1_450bps_sup.The same samples were short-read sequenced in a single Illumina NovaSeq 6000 SP v1.5 300 cycle run.TruSeq DNA Nano with Covaris shearing was used for Illumina library preparation in advance (Illumina Inc., San Diego, CA, USA).
filtering, 65x coverage with an N50 of 16.6 Kb remained.Illumina raw reads had 111x coverage as 46.6 Gb in 310 M reads.After trimming for adaptors, length, and quality, 104x coverage (146 M total read pairs) remained (File S1).Nanopore sequencing of A. saccharum resulted in 104.5 Gb in 9.8 M reads with an N50 of 15.1 Kb.Re-basecalling resulted in 56.7 Gb (99x coverage) in 5.2 M reads with an N50 of 15.2 Kb.Filtering (135 K contaminant reads) generated coverage of 93x with a read N50 of 15.5 Kb.Illumina sequencing F I G U R E 1 Stepwise method of methylation detection using the METEORE pipeline to create a consensus of DeepSignal-Plant and Nanopolish.
quantification has not yet been widely adopted in plants.Brassica nigra was among the first plants on which such analyses were conducted, with ONT reads enabling the demarcation of the centromeres(Perumal et al., 2020); an improved nanopore-sequenced radish genome provided the same centromere-level resolution(Cho et al., 2022).Both centromeres and stress responses were studied in Gossypium thurberi and Gossypium davidsonii through 5mC and 6 mA sites(Yang et al., 2021).The first release of DeepSignal was used with ONT reads to detect novel repeats and characterize the subtelomeres in algae(Chaux-Jukic et al., 2021).The DeepSignal-Plant tool utilized ONT data from A. thaliana, Oryza sativa, and B. nigra for development and benchmarking(Ni et al., 2021).This tool represents the first machine-learning approach for plants that can achieve accuracy for all three important 5mC states: CG, CHG, and CHH.The potential roles of methylation in each context are currently being studied in model and non-model systems, but it appears that non-CG methylation can have a stronger effect in silencing TEs, and CHH is responsible for regulating Independently assessed methylation levels for CG sites reported by Nanopolish and DeepSignal-Plant were both high relative to the consensus (Figure3).DeepSignal-Plant detected the most CG-methylation across all four genomes, as was seen in benchmarking studies comparing DeepSignal (v1) and Nanopolish on E. coli and H. sapiens data(Ni et al., 2019).The CG levels for both the new and original A. negundo and original A. saccharum F I G U R E 2 (a) Alignment of reference genomes comparing original (x-axis) and new (y-axis) assemblies for Acer negundo (ACNE) and Acer saccharum (ACSA).Blue indicates forward alignments, green indicates reverse, and orange is repeated alignments.(b) Gray lines show blocks of syntenic genes between A. negundo (orange) and Acer saccharum (blue) using the original reference genomes (reprinted from McEvoy et al. (2021)).The green area depicts gene density and purple is repeat frequency.| 7of 19 Given the reduced CG methylation levels of the consensus derived from the original A. saccharum genome, the methylation datasets based on the new genomes were selected as the best representation.As such, these genomes were used for downstream analyses conducted with the METEORE consensus results for CG methylation and DeepSignal-Plant results for CHG and CHH where no Nanopolish results exist.Figures displaying results for both new and original genomes are available in the supplementary material (Figures S1-S5).
in total to Fragaria vesca, Manihot esculenta, Vitis vinifera, Brachypodium distachyon, and Setaria viridis.WGBS in the form of MethylC-Seq was used in the Niederhuth et al. (2016) study, so increased signal detection could result from the long-reads, which are able to span low-complexity regions.A recent study compared short-read and long-read assemblies of the Brassica genome with demonstrated that Quercus lobata has a more uniform distribution of genes and CHH F I G U R E 3 Comparison of global methylation levels by context for DeepSignal-Plant, Nanopolish, and the METEORE-generated random forest consensus.methylation along the chromosome arm, similar to several Poaceae.
). A. saccharum does not have the same observable enrichment of CHH adjacent to centromeres as P. trichocarpa, nor is CHH as generally distributed as in Q. lobata.In both Acer species, there is a clear pattern for a preference of CTA, followed by CAA, along the lengths of the chromosome arms.Cytosine methyltransferases (CMT) are known to have sequence context preferences.In Arabidopsis, CMT2 prefers CHH sites, specifically CAA and CTA.However, little is known about these preferences outside of a handful of model species(Raju et al., 2019).3.5 | RepeatsLTR superfamilies Ty1-copia and Ty3-gypsy typically contribute the greatest proportion of TEs in land plants.Insertions of LTRs in and around genes can be responsible for alternative splicing, duplication, recombination, and epigenetic control(Galindo-González et al., 2017).Biotic and abiotic stress, as well as other external stimuli, can lead to TE movement and the rapid proliferation of several subfamilies and may result in polyploidization(Mhiri et al., 2022).The exact mechanisms of activation and repression are not fully understood, but generally involve methylation in any of three sequence contexts.Activation of TEs leads to an initial response of post-transcriptional gene silencing involving siRNA, which is followed by establishment and maintenance of silencing with DNA and histone methylation in RNA-directed DNA methylation (RdDM) pathways.RdDM methylation leads to chromatin modification making the TE inaccessible to transcriptional machinery(Erdmann & Picard, 2020).The intersection of methylation and TEs can be informative for understanding control of expression and architectural changes to the genome that may play a role in evolution.Repeat detection and classification with the LTR database, in addition to de novo libraries generated by RepeatModeler, resulted in a decrease of 9%-12% in the previously reported repeat coverages for the original A. negundo and A. saccharum genomes(McEvoy et al., 2021).The primary reduction was in unknown TEs -those not falling into LTR, DNA, LINE, SINE, or RC classes.LTR coverage decreased slightly (~0.6%), while the resolution on classification improved to the subfamily level.New and original genomes had small differences in counts and coverage of similar elements, while larger trends remained the same.
a) Modified from Sork et al. (2022) to add the Acer genomes.Acer has a negative correlation between methylation and gene content, similar to other species.In contrast, Quercus or Poaceae are less negatively correlated, or even positively correlated in some species.(b) Distribution of CHH methylation by subcontext across the longest chromosomes for Acer negundo and Acer saccharum.Populus and Quercus lobata, reprinted from Sork et al. (2022), are shown for comparison of distribution patterns, where Populus is localized and Q. lobata is generally distributed.Distributions for all chromosomes are in Figure S1.CHG distributions are in Figure S2 and gene densities plotted with all contexts in 100 Kb windows are in Figure S3.including A. pseudosieboldianum, A. catalpifolium, A. yangbiense, and A. truncatum, demonstrated the same trend, with greater representation of Copia elements . In the Acers, despite the fact that the original PacBio reads and the new ONT reads shared similar read N50, the inclusion of the longer ONT reads likely improved the resolution of longer elements.The average length of Copia and Gyspy elements did in fact increase by several hundred base pairs in A. negundo, and while the length of Copia elements in A. saccharum increased similarly, Gypsy elements decreased by 57 bp (File S4).Methylation across repeat superfamilies was plotted along with results fromSork et al. (2022) for comparison (Figure6a,b), but it should be noted that the Q. lobata tissue presented was bud, which contains undifferentiated meristem and had three times the CHH methylation compared to catkin and young leaf tissues.Among the LTRs in Acer, Copias were more frequent than Gypsy, with 4.8% more coverage across the A. negundo genome and 8.9% in A. saccharum.
saccharum (5330) and A. negundo (5086), to Sire which was most abundant in A. yangbiense (6624), but was present in far fewer copies in A. negundo (770) and A. saccharum (1008).In A. negundo, a significant contributor to the recent burst of Copia activity and the most abundant, rnd-4_family-434, had considerably higher CG and CHG methylation in up and downstream regions, and an unusual spike in CHH methylation across the repeat body.In most land plants assessed to date, CHH methylation is observed as a bimodal distribution with most activity in the LTRs themselves(Noshay

F I G U R E 5
Transposable element abundance for a) predominant repeat families and b) Copia subfamilies.The abundance of each repeat class or family, in terms of base pairs, is binned by divergence, with the least divergence representing putative recent activity.In Acer negundo, a novel Copia repeat identified here as rnd-4_family-434 contributes to the strong recent burst of activity.The other Copia subfamilies are provided as an inset for viewability.F I G U R E 6 (a) Genome-wide methylation distribution by sequence context and TE family.(b) Genome-wide CHH distribution (bud tissue) over TE regions, each line representing one of the largest superfamilies of each class.Reprinted from Sork et al. (2022).
provides a few different examples of distribution patterns of methylation and TEs and rapid gene family evolution in chromosomes.The top set of syntenic chromosomes, acsa2_004 (A.saccharum) and acne2_11 (A.negundo), shows a more even distribution of gene density rather than a concentration of genes in the chromosome arms.

F
Distribution of gene density, repeat families, methylation by context, CHH methylation subcontexts, gene family dynamics, and gene expression results for select chromosomes (all chromosomes available in Figure S5).Gene expression results are from aluminum and calcium treatments at Hubbard Brook Experimental Forest as detailed in McEvoy et al. (2021).X-axis indicates log2 fold change, while dot size represents the p-adjusted value.Letters mark examples as follows: (a) even distribution of gene density, (b) concentrated gene density, (c) Acer negundo chromosome with the most genes from expanded families (d) drops in methylation over spikes in gene density, (e) peaks of Copia rnd-r_family-434, (f) A. negundo chromosome with genes from contracted families.
tect this methylation leading to an understanding of integrated epigenomic data are important for a full understanding of the genomic mechanics in effect.To extend previous work on DNA methylation detection and integration, this study developed two improved reference genomes for two new accessions of A. saccharum and A. negundo.These hybrid assemblies benefitted from the inclusion of deep ONT coverage to both resolve long repeat elements and detect methylation.Methylation calling was conducted with ONT sequencing data and a custom pipeline.This pipeline, which leverages super accuracy base calling and detection of false positives, has broader application for other non-model plant genomes.Methylation frequencies and distributions were compared with other recent plant methylomes and revealed differences among the species where A. saccharum and A. negundo have a lower correlation of methylation and gene content than many angiosperms and fall in the middle of the spectrum of gene-dense arms with low methylation of P. trichocarpa and the generalized distribution of methylation in Q. lobata and Poaceae.In both Acer species, the repeat landscape was dominated by LTR/Copia elements.A recent burst of a novel Copia in A. negundo showed higher than average CG and CHG methylation up and downstream and an unusual pattern of strong CHH methylation in the repeat body.Gene family dynamics integrated with methylation data noted that individual genes in contracted families were associated with strong CHH methylation upstream, downstream, and in intronic regions.Expanded gene families had more CHH methylation in coding regions than contracted families.Among candidate genes previously associated with nutrient stress, patterns of methylation were variable, but had increased upstream methylation observed in all three contexts.Further investigations require parallel expression and tissue-specific studies, as well as pan-genome (population scale) analysis to understand how the methylome is contributing to genome evolution.
(Gallo-Franco et al., 2022) least negative correlation of genes to methylation for Acer.Less correlation of methylation with genes means more intersection in TE filled intergenic space, lending support to the hypothesis that in some species, the balance of CHH methylation, TE suppression, TE proximity to genes, and gene body methylation could indicate complex strategies playing a role regulating key genes at TE boundary regions, as observed in grasses(Martin et al., 2021)and Oryza(Gallo-Franco et al., 2022).
(Sork et al., 2022)f CHH subcontexts across chromosomes detected differences between Q. lobata, in which CHH is broadly distributed, and Populus trichocarpa, in which CHH is localized around the centromere and decreases across the gene-dense chromosome arms(Sork et al., 2022).Acer presents an intermediate pattern, particularly A. saccharum (Figures4b and S1 Percent of Acer negundo and Acer saccharum genomes masked by TE class and Copia families.
TA B L E 2 McEvoy et al. (2021)y distribution across protein coding regions, shown by genome, tissue type, and sequence context, (b) Methylation frequencies in different genic regions (1 Kb upstream of TSS, CDS, first intron, other introns, and, 1 Kb downstream of the TES).ACNE/ACSA: genome-wide means, Con: rapidly contracting genes (ACNE: 52, ACSA: 18), Exp: rapidly expanding genes (ACNE: 26, ACSA: 99), DE: differentially expressed genes.Exp and Con genes are fromMcEvoy et al. (2021), as well as DE, where genes were differentially expressed in response to calcium and aluminum treatment.Of the original 245 DE genes identified from stem tissue samples, 240 are presented here with methylation calculated from leaf tissue.
In the previous Acer study, A. negundo was shown to have more families associated with rapid contraction and very few F I G U R E 7 (a)