Cerebellar Granule Cells Develop Non-neuronal 3D Genome Architecture over the Lifespan

The cerebellum contains most of the neurons in the human brain, and exhibits unique modes of development, malformation, and aging. For example, granule cells—the most abundant neuron type—develop unusually late and exhibit unique nuclear morphology. Here, by developing our high-resolution single-cell 3D genome assay Dip-C into population-scale (Pop-C) and virus-enriched (vDip-C) modes, we were able to resolve the first 3D genome structures of single cerebellar cells, create life-spanning 3D genome atlases for both human and mouse, and jointly measure transcriptome and chromatin accessibility during development. We found that while the transcriptome and chromatin accessibility of human granule cells exhibit a characteristic maturation pattern within the first year of postnatal life, 3D genome architecture gradually remodels throughout life into a non-neuronal state with ultra-long-range intra-chromosomal contacts and specific inter-chromosomal contacts. This 3D genome remodeling is conserved in mice, and robust to heterozygous deletion of chromatin remodeling disease-associated genes (Chd8 or Arid1b). Together these results reveal unexpected and evolutionarily-conserved molecular processes underlying the unique development and aging of the mammalian cerebellum.

Global (whole-body) knock-out mouse lines of Arid1b or Chd8 were generated by first crossing conditional (floxed) mice from JAX-Arid1b (JAX 032061) (38) or Chd8 (JAX 031555)-with female E2a-Cre mice (JAX 003724), and then back-crossing their progeny with wild-type mice (C57BL/6J; JAX 000664) to remove Cre. Genotypes of the final animals were rigorously validated by examining sequencing reads in our bulk Dip-C data-a form of wholegenome sequencing-and confirming heterozygous deletions of the floxed exons (i.e., read depths reduced to ~50% at the deleted exons, and reads spanning the deletion junctions).
10x Genomics Multiome Cell nuclei were isolated from fresh frozen, postmortem human cerebellum or fresh mouse cerebellum following 10x Genomics Demonstrated Protocol "Nuclei Isolation from Complex Tissues for Single Cell Multiome ATAC + Gene Expression Sequencing" (CG000375 Rev B). Lysis time was optimized to 15 min (rather than 5 min) to prevent cell clumping at a later step (i.e., resuspension in PBS + 1% BSA + 1 U/uL RNase inhibitor for the second time).
For each donor/sample, ~10 k nuclei were loaded into a 10x Genomics Chromium Next GEM Single Cell Multiome ATAC + Gene Expression Kit following its user guide (CG000338 Rev E) with minor modifications: We used 10 and 8 PCR cycles (rather than 6 for ≥6 k nuclei) for cDNA amplification of human and mouse, respectively, and 20 uL (rather than 10 uL) of cDNA for GEX library preparation in Step 7.1 (Table S2).

Dip-C
Dip-C (2) was performed as in our previous study of the mouse forebrain (5), except that the 3C/Hi-C step was performed with the Arima 3C kit (which used a cocktail of 2 restriction enzymes for chromatin digestion). Detailed procedure can be found in our recent protocol paper (4). For some human experiments, protease inhibitor and RNase inhibitor were added during the nuclei isolation step to better preserve genome structure (Table S2); however, they had little effect on scA/B analysis (Fig. S9).

Pop-C
For each Pop-C reaction, a pool of cerebellar samples from 3-13 human donors or 2-8 mice were dropped into a single Dounce homogenizer (2-50 mL in size) and homogenized together (Table S2). We then proceeded with regular Dip-C (see section above). The only exception was the vDip-C sample, which was pooled after flow-sorting for mGreenLantern+ cells (i.e., 2 samples were sorted into the same FACS collection tube) and before the Arima 3C reaction.

vDip-C
To construct the vDip-C vector, mGreenLantern (13) and KASH (29) sequences were synthesized directly; the Pcp2 promoter was obtained by PCR with previously published primers (33) from genomic DNA of C57BL/6J mice, and therefore differed by a few strain-specific nucleotides from the original version. A full plasmid map can be found in Fig. S8.
Cell nuclei were isolated as in regular Dip-C (4). Nuclei were diluted (to avoid high FACS event rates) in ~10 mL PBS + 1% BSA (+ 300 nM DAPI for one of the 2 samples) and flow-sorted on a BD FACSAria to isolate the 0.2-0.5% mGreenLantern+ cells (i.e., Purkinje cells) (Fig. S8). FACS data was plotted with FlowCal (version 1.3.0) (44). A total of ~70 k cells were obtained from the 4 mice. Sorted cells were processed with the Arima 3C kit (and flow-sorted again) as in regular Dip-C.
Bulk Dip-C For bulk Dip-C of Arid1b and Chd8 mutant and control mice, half of the Arima 3C product was used for bulk DNA extraction, following the same step as extracting DNA from digestion and ligation controls in our recent protocol paper (4). Sequencing libraries were prepared with Illumina DNA Prep (formerly known as Nextera DNA Flex), following its user guide (1000000025416 v10). We used a DNA input of 200 ng and the suggested 5 PCR cycles (Table S2).

Sequencing
Libraries were sequenced (paired-end 150 bp) on an Illumina NovaSeq (or in some cases, HiSeq) by Novogene. Multi-ome libraries were trimmed to read lengths specified by 10x Genomics.

Cross-sample Integration and Analysis of the Transcriptome Portion with LIGER
The transcriptome portion of the 7 human donors were integrated with LIGER (version 1.0.0) (21) following the tutorial "Joint definition of cell types from multiple scRNA-seq datasets" (version: March 31, 2020; http://htmlpreview.github.io/?https://github.com/welchlab/liger/blob/master/vignettes/Integrating_multi_scRNA_data.html) with a minor modification: Direct input from Cell Ranger ARC did not work in our hands; instead, we first merged the 7 Seurat objects with "merge," which merged raw RNA count matrices, and then converted the merged Seurat object into a LIGER object with "seuratToLiger" (with parameter "combined.seurat = TRUE"), which re-normalized the raw data by default.
Correlated gene modules in immature granule cells ( Fig. 2A) were identified by visually examining the heatmap produced by "plotClusterFactors" and finding factors that were highly active at the 4 immature granule cell stages. For each factor, genes were sorted by LIGER weights ("liger_object@W").

Joint Transcriptome and Chromatin Accessibility Analysis with ArchR
For each sample/donor, RNA and ATAC data were jointly analyzed with ArchR (version 1.0.2) (22) following the tutorial "ArchR: 10x Multiome PBMCs" (https://greenleaflab.github.io/ArchR_2020/Ex-Analyze-Multiome.html). In particular, for maximum versatility, Arrow files were first generated from the ATAC output of Cell Ranger ARC ("atac_fragments.tsv.gz") using "createArrowFiles" without any initial filtering (Chapter 1.6 "Creating Arrow Files" of the manual). The RNA output of Cell Ranger ARC ("filtered_feature_bc_matrix.h5") was then added. Only cells with >500 UMIs (rather than 0; 500 is the value used by (18)), >2.5 k ATAC fragments (default in the tutorial), and TSS enrichment score >2 (rather than 6) were kept. We used reference genomes "hg38" and "mm10" for human and mouse, respectively. Doublet detection could not be performed because of a predominance of granule cells. We created an individual ArchR project for each sample/donor for subsequent analysis.
Cell type-specific marker genes were identified with "getMarkerFeatures" from the RNA data ("GeneExpressionMatrix") following Chapter 7.3 "Identifying Marker Genes" of the manual.
To color LIGER-annotated transcriptional cell types on ArchR plots (Fig. 2B), we manually stored the correspondence between cell IDs with LIGER cluster IDs, added this information to each ArchR object as "archr_project$liger_clusters," and plotted with "plotEmbedding" (with parameter "name = liger_clusters").
ATAC peaks were called with "addReproduciblePeakSet" following Chapter 10.2 "Calling Peaks w/ Macs2" of the manual. The ATAC count matrix ("PeakMatrix") were generated with "addPeakMatrix" following Chapter 10.4 "Add Peak Matrix" of the manul.
Enrichment of TFBS motifs (Fig. 2C) was analyzed following Chapter 13.1 "Motif Deviations" of the manual.

Analysis of Dip-C Data
Dip-C data were analyzed as in our previous study (5), with a minor modification (cells with <50 k contacts were filtered out as empty wells, and >2 m contacts as cell aggregates; previously: cells with <20 k contacts as empty wells, no filters for cell aggregates) and the following additions:

Demultiplexing of Pop-C Data
Sexes were demultiplexed by calculating the ratio between reads mapping to the X chromosome and reads mapping to autosomes, normalized by X chromosome and total autosome lengths ("X:A") with "samtools idxstats"-which exhibited a strong bimodal distribution and were separated by setting a threshold between the 2 modes (high X:A = female, low X:A = male). This could be alternatively achieved by calculating X:A for contacts rather than reads.
Human donors were demultiplexed with souporcell (24) (version 2.4)-which we adapted from transcriptome data to 3D genome (or whole-genome sequencing in general) data-following the manual with minor modifications. In particular, we did not re-map the BAM files, but rather directly used our BWA MEM-mapped BAM files from Dip-C, where cell IDs were already recorded as the read group ("RG") BAM tag. PCR duplicates were removed from each BAM file with "sambamba markdup" (version 0.7.0). For each Pop-C experiment (note that we computationally pooled human experiments 2 (cerebellum) and 3 (cerebral cortex) because they contained nearly the same set of donors, and human experiments 8 (cerebellum) and 9 (cerebral cortex)) ( Table S2), BAM files of all cells merged with "sambamba merge." Read count matrix at common SNPs in human populations ("common_variants_hg19.vcf.gz") was calculated with "vartrix" (version 1.1.20) (with parameter "--bam-tag RG" in addition to the default parameters in the manual "--mapq 30 --scoring-method coverage"). Finally, donors were demultiplexed with "souporcell" (with parameter "k" set to the number of pooled donors) and "troublet." We removed 2 cells ("dipc-human-experiment08-cerebellum_0843" and "dipc-human-experiment09-cortex_0426") that were classified at doublets after this step.
When demultiplexing the 13 donors from human experiments 8 and 9, we first demultiplexed by sex (see above) and then ran souporcell separately on the 2 sexes (4 female donors and 9 male donors), because running souporcell directly did not demultiplex 2 of the 13 donors.
In general, inclusion of empty wells (<50 k contacts) and cell aggregates (>2 m contacts) had no impact on souporcell except in one case, where a cell ("dipc-human-experiment06-cerebellum_0059") was classified as a singlet only when empty wells and cell aggregates were filtered out before running souporcell.

scA/B Dynamics across Maturation Stages
To identify dynamic genomic regions during granule cell maturation, we calculated the mean scA/B of each 1-Mb autosomal regions at each of the 5 maturation stages, and selected the top 10% regions with the highest between-stage variance in MATLAB (version R2022b). We then performed hierarchical clustering of these dynamic regions based on their mean scA/B at the 5 maturation stages, with "linkage_data = linkage(scab_matrix(dynamic_regions_ids, :), 'average', 'correlation')." To plot their centered scA/B heatmap (Fig. 5A), we ordered dynamic regions based on their hierarchical clustering, with "outperm" from "[H, ~, outperm] = dendrogram(linkage_data, 0)." To analyze scA/B of individual genes (Fig. 5B), mid-points of genes were calculated from the GENCODE annotations "gencode.v42lift37.annotation.gtf.gz" and "gencode.vM25.annotation.gtf.gz" for human and mouse, respectively.  20,40,50,70,80,90, 100 (same as the main text), and 150. Cells were colored by transcriptional cell types from the main text (i.e., identified with k = 100). Regardless of the value of k, transcriptionally immature cerebellar granule cells (stage T1-T4: dashed outlines; lighter shades of purple) were consistently visualized as relatively distinct clusters on the t-SNE plot, although boundaries were variable because of the continuous nature of their maturation.

Fig. S2. Correlated gene modules in transcriptionally immature cerebellar granule cells.
We identified 6 correlated gene modules A-F from LIGER factors that were specifically expressed at each transcriptional (T) stage of granule cell maturation. The top row shows the expression pattern of each module, quantified by LIGER metagene expression (weighted average of genes in each module) on the t-SNE plot. The middle rows show the top 10 genes (ranked by LIGER weights) and enriched pathways (summarized GO terms for the top 50 genes) for each module. The bottom row connects each module to its corresponding T stages (solid line: most expressed; dashed line: also expressed). Module A was enriched for the pathway of neuron projection guidance (FDR = 3 × 10 -4 ), included genes such as BOC, FOXP2, ADAMTS6, and was expressed at stage T1. Modules B (various ribosome subunits) and C were expressed at both stages T1-T2. Module D was enriched for cytoskeleton (FDR = 2 × 10 -9 ) and nervous system development (FDR = 9 × 10 -8 ), included genes such as TUBB3/2B/1A, SOX4/11 (transcription factors implicated in Coffin-Siris syndrome), TSMB10/4X, and was expressed at stage T2. Module E was enriched for telencephalon development (FDR = 7 × 10 -5 ) and cell projections (FDR = 3 × 10 -4 ), included genes such as ERBB4, PRDM8 (a histone H3K9 methyltransferase implicated in epilepsy), GRIA2, and was expressed at stage T3. Finally, module F was enriched for cell adhesion (FDR = 8 × 10 -4 ) and synapses (FDR = 4 × 10 -7 ), included genes such as CHRM3 (a muscarinic acetylcholine receptor), HS3ST4, CNTN5 (a contactin implicated in Coffin-Siris), CNTNAP2 (a neurexin implicated in multiple neurodevelopmental disorders including autism), and was expressed at stage T4-the most abundant immature type. Note that LIGER did not identify gene modules for the mature stage T5.            Fig. 5D but with Chd4 knock-out rather than Arid1b or Chd8. Bulk Hi-C data after conditional (driven by Gabra6-Cre), homozygous deletion of Chd4 was from (15). Although Chd4 deletion moderately changed 3D genome (bottom right), these changes had little overlap with and had a much smaller magnitude than our observed architectural maturation (top right). Bin sizes was 250 kb (top) or 500 kb (because published data did not contain 250 kb resolution; bottom).  (5), but plotting cerebellar granule cells (first row: neonatal; fourth row: adult) and forebrain neurons (second row: neonatal; fifth row: adult) side by side. Despite a lack of neuron-specific non-CpG methylation (16)-which we found to occur simultaneously with and anti-correlate with inward genome movement in forebrain neurons-cerebellar granule cells still exhibited inward genome movement (e.g., Chr 7). Table S1. Information about each human donor. Table S2. Information about each multi-ome and 3D genome experiment. Table S3. Number of cells from each human donor in each 3D genome experiment. Table S4. Information about each Dip-C cell. Table S5. Lists of cell type-specific marker genes.