Measuring the long arm of childhood in real-time: Epigenetic predictors of BMI and social determinants of health across childhood and adolescence

Children who are socioeconomically disadvantaged are at increased risk for high body mass index (BMI) and multiple diseases in adulthood. The developmental origins of health and disease hypothesis proposes that early life conditions affect later-life health in a manner that is only partially modifiable by later-life experiences. Epigenetic mechanisms may regulate the influence of early life conditions on later life health. Recent epigenetic studies of adult blood samples have identified DNA-methylation sites associated with higher BMI and worse health (epigenetic-BMI). Here, we used longitudinal and twin study designs to examine whether epigenetic predictors of BMI developed in adults are valid biomarkers of child BMI and are sensitive to early life social determinants of health. Salivary epigenetic-BMI was calculated from two samples: (1) N=1,183 8-to-19-year-olds (609 female, mean age=13.4) from the Texas Twin Project (TTP), and (2) N=2,020 children (1,011 female) measured at 9 and 15 years from the Future of Families and Child Well-Being Study (FFCWS). We found that salivary epigenetic-BMI is robustly associated with children’s BMI (r=0.36 to r=0.50). Longitudinal analysis suggested that epigenetic-BMI is highly stable across adolescence, but remains both a leading and lagging indicator of BMI change. Twin analyses showed that epigenetic-BMI captures differences in BMI between monozygotic twins. Moreover, children from more disadvantaged socioeconomic status (SES) and marginalized race/ethnic groups had higher epigenetic-BMI, even when controlling for concurrent BMI, pubertal development, and tobacco exposure. SES at birth relative to concurrent SES best predicted epigenetic-BMI in childhood and adolescence. We show for the first time that epigenetic predictors of BMI calculated from pediatric saliva samples are valid biomarkers of childhood BMI that are sensitive to social inequalities. Our findings are in line with the hypothesis that early life conditions are especially important factors in epigenetic regulation of later life health. Research showing that health later in life is linked to early life conditions have important implications for the development of early-life interventions that could significantly extend healthy life span.


DNA-methylation
DNA-methylation preprocessing Texas Twins. Saliva samples were collected during a laboratory visit using Oragene kits (DNA Genotek, Ottawa, ON, Canada). DNA extraction and methylation profiling was conducted by Edinburgh Clinical Research Facility (UK). The Infinium MethylationEPIC BeadChip kit (Illumina, Inc., San Diego, CA) was used to assess methylation levels at 850,000 methylation sites.
DNA-methylation preprocessing was primarily conducted with the 'minfi' package in R (Aryee et al., 2014). Within-array normalization was performed to address array background correction, red/green dye bias, and probe type I/II correction, and it has been noted that at least part of the probe type bias is a combination of the first two factors (Dedeurwaerder et al., 2014). Noob preprocessing as implemented by minfi's "preprocessNoob" (Triche et al., 2013) is a background correction and dye-bias equalization method that has similar within-array normalization effects on the data as probe type correction methods such as BMIQ (Teschendorff et al., 2013).
CpG probes with detection p > 0.01 and fewer than 3 beads in more than 1% of the samples and probes in cross-reactive regions were excluded (Pidsley et al., 2016). None of these failed probes overlapped with the probes used for epigenetic predictors. 44 samples were excluded because (1) they showed low intensity probes as indicated by the log of average methylation <9 and their detection p was > 0.01 in >10% of their probes, (2) their self-reported and methylation-estimated sex mismatch, and/or (3) their self-reported and DNA-estimated sex mismatch. Cell composition of immune and epithelial cell types (i.e., CD4+ T-cell, natural killer cells, neutrophilseosinophils, B cells, monocytes, CD8+ T-cell, and granulocytes) were estimated using a newly developed child saliva reference panel implemented in the R package "BeadSorted.Saliva.EPIC" within "ewastools" (Middleton et al., 2021). Surrogate variable analysis was used to correct methylation values for batch effects using the "combat" function in the SVA package (Johnson et al., 2007).
DNA-methylation preprocessing FFCW. DNA extraction and methylation profiling for FFCW was conducted by the Notterman Lab of Princeton University and the Pennsylvania State University College of Medicine Genome Sciences Center. Due to the timing of assay completion 40% of the FFCW saliva samples were completed using the Illumina 450K chip and the remaining 60% used the Illumina EPIC chip. Methods for the two chips were standardized as much as possible, but all analyses were run separately for 450 and EPIC and then meta-analyzed. 450K DNA-methylation image data were processed in R statistical software (4.1) using the ENmix package (Xu et al., 2016). The red and green image pairs (nsamples =1811) were read into R and the ENmix preprocessENmix and rcp functions were used to normalize dye bias, apply background correction, and adjust for probe-type bias. The majority of sample filtering was applied using the ewastools packages (Just & Heiss, 2018). Samples were excluded using the following criteria: if >10% of DNA-methylation sites had detection p-value >0.01 (nsamples =34), if there was sex discordance between DNA-methylation predicted sex and recorded sex (nsamples =11), or if two sequential samples from the same individual exhibited genetic discordance between visits (nsamples =27).
ENmix QCinfo function identified samples with outlier methylation values which were cut (nsamples =6). Technical replicates were removed nsamples =49). This gave us our final analytic sample (n=1684). DNA-methylation sites were removed if they had detection p-value >0.01 in 5% of samples (n=33,376). Relative proportions of immune and epithelial cell types were estimated from DNA-methylation measures using a childhood saliva reference panel (Middleton et al., 2021).
EPIC DNA-methylation image data were processed in R statistical software (4.1) using the ENmix package (Xu et al., 2016).
The red and green image pairs (nsamples =2558) were read into R and the ENmix preprocessENmix and rcp functions were used to normalize dye bias, apply background correction, and adjust for probe-type bias. The majority of sample filtering was applied using the ewastools packages (Just & Heiss, 2018). We dropped samples using the following criteria: if >10% of DNA-methylation sites had detection p-value >0.05 (nsamples =63), if there was sex discordance between DNAmethylation predicted sex and recorded sex (n=12), or if two sequential samples from the same individual exhibited genetic discordance between visits (n=30). ENmix QCinfo function identified samples with outlier methylation values which were cut (n=1) or samples that failed bisulfite conversion (nsamples =7). Technical replicates were removed (n=168). This gave us our final analytic sample (nsamples =2277). DNA-methylation sites were removed if they had detection pvalue >0.05 in 5% of samples (n=127,275). Relative proportions of immune and epithelial cell types were estimated from DNA-methylation measures using a childhood saliva reference panel (Middleton et al., 2021).

Genetics
Texas Twins genotyping, imputation, and preprocessing. DNA samples were genotyped at the University of Edinburgh using the Illumina Infinium PsychArray, which assays ~590,000 single nucleotide polymorphisms (SNPs), insertions-deletions (indels), copy number variants (CNVs), structural variants, and germline variants across the genome. Genetic data was subjected to quality control procedures recommended for chip-based genomic data (Anderson et al., 2010;Turner et al., 2011). Briefly, samples were excluded on the basis of poor call rate (< 98%) or inconsistent self-reported and biological sex, while variants were excluded if missingness exceeded 2%. As further variant-level filtering has been shown to have a detrimental effect on imputation quality (Roshyara et al., 2014), quality control thresholds for minor allele frequency (MAF) and Hardy-Weinberg equilibrium (HWE) were applied after phasing and imputation. Untyped markers were imputed on the Michigan Imputation Server (https://imputationserver.sph.umich.edu).
Specifically, genotypes were phased and imputed with Eagle v2.4 and Minimac4 (v1.5.7), respectively, while using the 1000 Genomes Phase 3 v5 reference panel (Auton et al., 2015). To ensure that only high-quality typed and imputed markers were used for analysis, variants were excluded if they had a MAF < 1e-3, a HWE p-value < 1e-6, or an imputation quality score < .90.
These procedures produced a final set of 4,703,309 genetic markers to be used in analyses. to identify analytic genomic group outliers and to provide sample eigenvectors as covariates in the statistical model used for association testing to adjust for possible population stratification. SNPs used for PC analysis were selected by linkage disequilibrium (LD) pruning from an initial pool consisting of all autosomal SNPs with a missing call rate < 2% and minor allele frequency (MAF) > 5%, and excluding any SNPs with a discordance between HapMap controls genotyped along with the study samples and those in the external HapMap data set. In addition, we excluded the HLA, 8p23, and 17q21.31 regions from the initial pool. We categorized participants through PC analysis using the aforementioned filtering criteria into analytic genomic groups based on genome-wide SNP similarity to genomic reference groups (commonly referred to using geographical labels as European, n=475; Admixed African, n=1640; Admixed Hispanic, n=959 groups). Then, PC analysis was run again within each group to create sample eigenvectors for covariates in the statistical model used for association testing to adjust for possible population stratification within each analytic group (i.e., within-analytic-group PCs).

Polygenic scores of BMI.
Genetic data was used to calculate a polygenic score of BMI (PGS-BMI). The PGS-BMI is an approximate indicator of an individual's genetic liability for developing high levels of BMI. We computed polygenic scores of BMI on the basis of a meta-analysis of genome-wide association studies for body fat distribution in 694 649 individuals of high genetic similarity to European reference groups (Pulit et al., 2019). Polygenic scores were residualized for the top 10 principal components of genetic similarity to reference groups as in FFCW. PGS-BMI analyses were restricted to individuals of high genomic similarity to European and "Ad Mixed American" reference groups in order to reduce the risk of spurious findings due to population stratification (see https://useast.ensembl.org/Help/Faq?id=532).

Cross-sectional analyses show that salivary epigenetic-BMI provides complementary information to measured genetic variants
Previous genome-wide association studies of adult BMI have estimated correlations with measured DNA sequence variants. This information can be applied in new samples to compute polygenic scores, which represent an individual's aggregate genetic liability toward developing high BMI (PGS-BMI). PGSs have limitations: They also capture environmental processes such as uncorrected population stratification, "genetic nurture," and gene-environment correlations, and they do not capture the effects of all genetic variation, including rare variants (Raffington et al., 2020;Young et al., 2019). Nonetheless, in previous work in adults, epigenetic-BMI and PGS-BMI provided complementary prediction with respect to BMI (Shah et al., 2015;Trejo Banos et al., 2020). One previous pediatric study of epigenetic-BMI in blood also found that epigenetic-BMI and PGS-BMI measures captured largely independent variation in child and adolescent BMI (Reed et al., 2020).
Due to different patterns of linkage disequilibrium across ancestries, and possible gene × environment interactions, it is expected that PGS will have imperfect portability to groups with disparate ancestries relative to the discovery sample (Martin et al., 2019). Consequently, we report analyses separately by DNA-based ancestry estimates (Supplemental Table S6). We regressed BMI z-scores on PGS-BMI residualized for principal components of genetic similarity to reference groups (see Methods). In both cohorts and all ancestral groups, children with higher PGS-BMI had higher BMI (see Supplementary Table S6, Supplementary Figure S1).
Next, we tested whether epigenetic-BMI remained associated with childhood BMI after accounting for PGS-BMI and it did (see Supplementary Table S6). Consistent with results from previous studies (Reed et al., 2020;Shah et al., 2015;Trejo Banos et al., 2020), the variation in BMI explained by epigenetic-BMI and PGS-BMI was largely additive (see Supplementary Figure S1). Figure S1. Measures of epigenetic-BMI and polygenic scores (PGS-BMI) uniquely contribute to prediction of BMI. Participants were categorized into genetic analytic groups based on similarity to genomic reference groups (EA, HA, AA) due to risk of social stratification confounding in PGS analyses (see Supplemental Methods). Results presented for 8-to 18-yearold children from the Texas Twin Project (TTP), and in 9-year-old children and 15-year-old children from Future of Families and Child Wellbeing Study (FFCWS).

Figure S2. Cross Lagged Model of epigenetic-BMI and BMI in FFCWS.
BMI is the CDC z-score for BMI for age. Epigenetic-BMI was residualized for cell distributions and technical artifacts and normalized with a mean=0, SD=1. Coefficients not shown are controls that influence age 9 BMI and age 9 epigenetic-BMI, including: self-reported race, sex assigned at birth, SES at birth, if mom smoked while pregnant (see Supplement Table S6). Standard deviations of the Residual error variances of BMI and epigenetic-BMI are in parenthesis after their variance estimate. *p<0.05, **p<0.01,***p<0.001 Figure S3. Cross Lagged Model of SES and epigenetic-BMI Profiles in FFCWS. Epigenetic-BMI was residualized for cell distributions and technical artifacts. SES and epigenetic-BMI are normalized with a mean=0, SD=1. Coefficients not shown are controls that influence age 9 SES and Age 9 epigenetic-BMI, including: self-reported race, sex assigned at birth, and if mom smoked while pregnant (see Supplemental  Table 10). Standard deviations of the residual error variances of SES and epigenetic-BMI are in parenthesis after their variance estimate. *p<0.05, **p<0.01,***p<0.001