Genome-Wide Analysis of Disordered Eating Behavior in the Mexican Population

Alterations in eating behavior characterized eating disorders (ED). The genetic factors shared between ED diagnoses have been underexplored. The present study performed a genome-wide association study in individuals with disordered eating behaviors in the Mexican population, blood methylation quantitative trait loci (blood-meQTL), summary data-based Mendelian randomization (SMR) analysis, and in silico function prediction by different algorithms. The analysis included a total of 1803 individuals. We performed a genome-wide association study and blood-meQTL analysis by logistic and linear regression. In addition, we analyzed in silico functional variant prediction, phenome-wide, and multi-tissue expression quantitative trait loci. The genome-wide association study identified 44 single-nucleotide polymorphisms (SNP) associated at a nominal value and seven blood-meQTL at a genome-wide threshold. The SNPs show enrichment in genome-wide associations of the metabolic and immunologic domains. In the in silico analysis, the SNP rs10419198 (p-value = 4.85 × 10−5) located on an enhancer mark could change the expression of PRR12 in blood, adipocytes, and brain areas that regulate food intake. Additionally, we found an association of DNA methylation levels of SETBP1 (p-value = 6.76 × 10−4) and SEMG1 (p-value = 5.73 × 10−4) by SMR analysis. The present study supports the previous associations of genetic variation in the metabolic domain with ED.


Introduction
Eating disorders (ED) significantly impact individuals' physical and mental health [1][2][3]. The manifestation of disordered eating behaviors characterizes ED. Disordered eating is the presence of aversive food-related behaviors, including fasting, vomiting, binge eating, and food intake restriction [4,5]. The three primary ED diagnoses are anorexia nervosa, bulimia nervosa, and binge-eating disorder [6]. Individuals diagnosed with ED have different disordered eating patterns, which could change over time [7][8][9]. Even when the diagnosis criteria for ED are well established [6], affected individuals may have a diagnosis crossover [10][11][12]. The diagnosis crossover could result from the close relationship between ED diagnosis criteria and the overlapping of symptoms [8]. Others suggest that this crossover could be an escalation mechanism, mainly in individuals diagnosed with the binge-eating disorder [13].
The etiology of ED is complex, with the interaction between environmental and biological risk factors [14]. Some identified risk factors are genetics, gender, dieting, or weight concerns, occurring early in life or adolescence [15]. In the last decade, different studies have explored the genetic risk factors that underlie many psychiatric disorders, but the analysis of the genetics of ED had been underexplored [16,17]. Furthermore, genomewide association studies until now had been performed only in individuals diagnosed with AN [18][19][20][21][22]. These studies suggest that AN has a genetic correlation with other psychiatric phenotypes or metabolic traits [18,22]. Nevertheless, the genetic factors shared between ED diagnoses are less explored.
The environment had a role in the etiology of ED. The translation of the environmental risk factors into molecular changes is the focus of epigenetics. DNA methylation is the most studied epigenetic molecular modification, and several studies have reported alterations in DNA methylation in individuals diagnosed with ED [23]. DNA methylation levels depend on the environment and genetic variation, and integrative analysis of genetic variants and DNA methylation levels could identify new biomolecules involved in the etiology of ED. Some authors had developed statistical frameworks to integrate these two sources of genomic information; an example is summary-based Mendelian randomization (SMR) [24]. However, the integration of these two sources of genomic information in individuals diagnosed with ED had not been performed in the Mexican population. The present work aimed to perform a genome-wide association analysis, blood methylation quantitative trait loci (blood-meQTL), summary data-based Mendelian randomization (SMR) analysis, and in silico function prediction in individuals diagnosed with disordered eating behaviors in the Mexican population.

Sample Population
The study included a total of 1803 individuals of Mexican descent, from three different samples: an adolescent clinical subsample from the Mexican Genomic Database for Cross-Disorder Research (n = 168, MeDaCrosR) [25,26], a clinical sample recruited from the ED Unit of the Instituto Nacional de Psiquiatría Ramón de la Fuente Muñiz (n = 166, INPRFM), and an epidemiological subsample from the Mexican Genomic Database for Addiction Research (n = 1469; MxGDAR) [27,28] (Table 1).  [29,30]. The cases of disordered eating were the groups from MeDaCrosR and INPRFM (n = 333). The population-based controls (n = 1802) were the individuals from the MxGDAR. The Diagnostic Interview for Psychosis and Affective Disorders (DIPAD) screening section was used to evaluate the individuals from MxGDAR [28,[33][34][35][36]. The study was performed based in the Helsinki declaration, and was revised by the ethic and investigation committees of the Instituto Nacional de Medicina Genómica (Approval Number CEI/2018/60), Instituto Nacional de Psiquiatría Ramón de la Fuente Muñiz (Approval Number: DGC-279-2008), and the Hospital Psiquiátrico Infantil Juan N Navarro (Approval number II3/01/0913). Every individual fulfilled and signed an informed assent (adolescents) and/or consent (parents/adults).

Genome-Wide Typing
We extracted DNA from blood and buccal epithelial samples with a modified saltingout method, implemented in the commercial kit Gentra Puregene (Qiagen, Readwood City, CA, USA). According to the manufacturer's protocol, genotyping was performed in the Instituto Nacional de Medicina Genomica with the commercial Infinium PsychArray Beadchip (Illumina, San Diego, CA, USA). Fluorescent intensities were measured with the iScan (Illumina, San Diego, CA, USA), transformed to genotypes with the GenomeStudio (Illumina, San Diego, CA, USA), and converted to Plink format files. Quality control of genotypes was performed in the Plink software [37,38], based on previously published protocols, briefly, we considered the following criteria: variant calling greater than 95%, a minor allele frequency (MAF) greater than 5%, a Hardy-Weinberg equilibrium Chi-square test p-value greater than 1 × 10 −5 , and remotion of variants with A/T or G/C alleles (to avoid the flip strand effect). Additionally, we excluded individuals with a genotype call rate of less than 95%. In addition, all individual pairs with an identity-by-state value greater than 1.6 were marked to correct for cryptic relationships, and the individual with the lowest genotype call rate was excluded.

Population Stratification
We evaluated population stratification using previously reported algorithms and the PC-AiR package [39]. The reference panel was the Human Genome Diversity Project (HGDP) [40]. We included only independent SNPs by filtering with linkage disequilibrium pruning (LD pruning), using the following parameters: a window size of 50 Kb, a step of 2, and a variance inflation factor of 5 implemented in Plink.

Genome-Wide Associations Analysis
We performed genetic associations through multiple logistic regressions, adjusted for age, sex, and ten components of global ancestry as covariables in Plink. We considered a p-value of 5.00 × 10 −5 as nominally associated, and a p-value of 5.00 × 10 −8 was considered statistically significant on the genome-wide level. After statistical contrasts, we removed all the variants with a MAF lower than 5.0% in cases or controls. The final results included a total of 223,189 SNPs.

Functional Prediction
An in silico functional annotation was also carried out of the associated SNPs using a variant effect predictor (VEP) [41]. A phenome-wide association study was performed in the GWAS atlas of the associated SNPs, and we considered a statistical significance threshold for a p-value < 5 × 10 −8 [42]. Pathway analysis was carried out with the online ComPath tool [43]. Finally, we searched the GTEx portal for in silico multi-tissue expression quantitative trait loci analysis (eQTL) results of the associated variants [44,45].

Methylation Quantitative Trait Loci Analysis
We performed blood methylation quantitative trait loci (blood-meQTL) analysis to estimate the possible impact of the associated SNPs in DNA methylation levels. We analyzed a subset of individuals diagnosed with eating disorders of Mexican ascendence from a previously published database [46]. The DNA methylation levels were determined using Illumina MethylationEPIC BeadChips (Illumina, USA). We calculated transand cis-meQTL with the -mtscore function of the KING software [47], and a p-value < 5.00 × 10 −8 was considered genome-wide statistically significant. We searched the associated SNPs with eQTL and blood-meQTL effect in the regulomeDB for enhancer or promoter marks [48].

Summary Data-Based Mendelian Randomization (SMR)
We applied summary data-based Mendelian randomization analysis (SMR) to identify DNA methylation levels associated with disordered eating by pleiotropy [49]. SMR was developed for quantitative expression loci and uses the principles of Mendelian Randomization (MR) to estimate the pleiotropic association between the level of gene expression and a phenotype. Zhu et al. established that if we denote z as a genetic variant (for example, SNP), x as the expression level of a gene, and y as the trait, then the two-step least-squares estimate of the effect of x on y from an MR analysis is: where b zy and b zx are the least-squares estimates of y and x on z, respectively, and b xy is interpreted as the effect size of x on y free of confounding from non-genetic factors. After calculating the sampling variance by the Delta method and replacing b zy and b zx by the estimates, we should have: where z zy and z zx are the z statistics from the GWAS and eQTL study, respectively. The previous method was developed by Zhu et al., and we used the SMR to integrate the GWAS and blood-meQTL summary statistics.
In the PheWAS, we identified the immunological (rs4626924 and rs10419198) and metabolic (rs3205060, rs8041059 and rs10419198) domain with SNPs associated. The SNPs associated with the immunological domains were both intronic, one located in the coding region of LOC107985364 and the other in RCN3. In the metabolic domain, we identified the rs8041059 located in the intron of LIPC and with previous associations with 33 different lipidic traits.
The blood-meQTLs analysis showed seven SNPs associated with DNA methylation levels of five CpG sites ( Table 3). The CpG sites map to four genes, including the rs10419198 related to DNA methylation levels in a CpG site annotated to the body of the PRR12. The blood-mQTLs correlated with five CpG sites, annotated to the gene body of SETBP1 (cg12522870) and PRR12 (cg06378142), 200 base pairs upstream of the transcription starting site (TSS200) of the non-coding LOC440704 (cg12412036 and cg09420738) and 1500 upstream of the SEMG1 (cg15921833). In the case of LOC440704, two different CpGs correlated with two SNPs; meanwhile, for the CpGs on SETBP1 and SEMG1, two SNPs connect with the same site. Note. SNP = single-nucleotide polymorphism, CpG = cytosine to guanine nucleotides, SE = standard error.
The multi-tissue eQTL analysis in the GTEx database revealed that the rs10419198 had an eQTL effect on different tissues (frontal cortex, anterior cingulate cortex, cerebellum, putamen, thyroid, visceral adipose tissue, etc.) (Figure 1). In the regulomeDB database, the rs10419198 showed a mark of enhancers in 51 different cell lines. The blood-meQTLs analysis showed seven SNPs associated with DNA methylation levels of five CpG sites ( Table 3). The CpG sites map to four genes, including the rs10419198 related to DNA methylation levels in a CpG site annotated to the body of the PRR12. The blood-mQTLs correlated with five CpG sites, annotated to the gene body of SETBP1 (cg12522870) and PRR12 (cg06378142), 200 base pairs upstream of the transcription starting site (TSS200) of the non-coding LOC440704 (cg12412036 and cg09420738) and 1500 upstream of the SEMG1 (cg15921833). In the case of LOC440704, two different CpGs correlated with two SNPs; meanwhile, for the CpGs on SETBP1 and SEMG1, two SNPs connect with the same site. The multi-tissue eQTL analysis in the GTEx database revealed that the rs10419198 had an eQTL effect on different tissues (frontal cortex, anterior cingulate cortex, cerebellum, putamen, thyroid, visceral adipose tissue, etc.) (Figure 1). In the regulomeDB database, the rs10419198 showed a mark of enhancers in 51 different cell lines. Of the previous seven SNPs that showed blood-meQTL significant results, we found that two had a pleiotropic effect with DNA methylation levels and disordered eating by SMR analysis ( Table 4). The CpG sites were the cg12522870 (SETBP1) and cg15921833 (SEMG1). Note. SNP = single-nucleotide polymorphism, CpG = cytosine to guanine nucleotides SE = standard error.

Discussion
Previous GWAS on ED has shown, mainly in AN, that genetic variants associated with this disorder could significantly impact metabolic pathways [31][32][33][34]. The genetic association found in our study also supports this finding on individuals with disordered eating behavior. In our study, three variants (rs3205060, rs8041059 and rs10419198) associated with disordered eating are previously associated at the genome-wide level with at least one metabolic trait. One associated variant was the rs8041059 which is found in the coding region of the hepatic triglyceride lipase (LIPC) and is associated with differences in high-density lipoproteins (HDL), cholesterol, and triglyceride plasmatic levels [35,36]. LIPC catalyzes the hydrolysis of triglycerides and phospholipids found in circulating lipoproteins [37][38][39]. Some authors hypothesized that this variant regulates triglyceride concentration, affecting the transcription factors' binding site in the promoter of LIPC. Still, the exact mechanism remains to be explored [40][41][42]. In addition, the association of this SNP could be important for further explorations in the mechanism behind the reported increased HDL levels in individuals diagnosed with AN [43,44].
The other associated variant, rs10419198, is annotated to the intron of RCN3, and could function as an enhancer for PPR12 [36]. Our blood-meQTL and in silico analysis of multi-tissue eQTL showed that this variant modulates the expression of the PRR12. PRR12 is the proline-rich 12 protein, also known as KIAA1205. The exact cellular function of these proteins remains unknown, but it is proposed that it has an essential role in brain development. The role of this gene on the brain is supported by some studies, where individuals with loss-of-function or structural variation on this gene had clinical syndromes characterized by neurodevelopmental and eye abnormalities [45][46][47]. Individuals carriers of high impact variants on PRR12 had neurodevelopmental disorders, for example, autism, intellectual disability, and attention-deficit/hyperactivity disorder (ADHD). ADHD is one of the most frequent comorbidities found in individuals with ED, and the presence of this clinical entity could increase the symptoms of disordered eating [48][49][50][51]. Further analysis of the function of PRR12 could help us understand this high comorbidity.
Environmental factors interact with genetics to the development of ED [48]. A known molecular mark that captures these environmental differences is DNA methylation [50]. DNA methylation is influenced by the environment and genetic, having differences based on the genotype of the individuals [51,52]. Then if the DNA methylation level affected the trait, we could see an effect of differences in the phenotype, known as pleiotropy [49,53]. We performed SMR to capture this in our analysis and found an association of disordered eating with CpG sites on SETBP1 and SEMG1. Mutations on SETBP1 cause the Schinzel Giedion syndrome, individuals affected had a severe intellectual disability. A recent study identified SETBP1 as a key epigenetic regulator of the gene network responsible for visceral organ and brain morphogenesis [54]. On the other hand, the SEMG1 could be a key modulator of energy metabolism on ED. SEMG1 is the semenogelin 1 gene, expressed mainly on the prostate and is suggested to be responsible for asthenozoospermia [55]. In addition, protein pull-down studies suggest that SEMG1 increases energy expenditure by interacting with enzymes responsible for energy regulation, such as lactate dehydrogenase A [56]. Nevertheless, further studies are required to clarify the molecular mechanism of these proteins in ED.
Our study is one of the first to explore genetic associations with ED in the Mexican population at the genome-wide level; nevertheless, we could state some limitations. The main is the small sample size, and also, we do not have a replication cohort. This reduced sample size reduces our statistical power. Another limitation could be the lack of metabolic parameters measured in the sample and that the population-based sample was not screened for disordered eating behavior. On the other hand, integrating different sources of information, the in silico eQTL and blood-meQTL strengthens it.

Conclusions
The present study suggests an association of the rs10419198 enhancer variant of the PRR12 supporting previous reports that have found that disordered eating could imply genetic associations with metabolism.  Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author, omitted due to privacy and ethical issues. The complete summary statistics would be made available as Supplementary Material.