Genetic effects on educational attainment in Hungary

Abstract
Introduction: Educational attainment is a substantially heritable trait, and it has recently been linked to specific genetic variants by genome-wide association studies (GWASs). However, the effects of such genetic variants are expected to vary across environments, including countries and historical eras. Methods: We used polygenic scores (PGSs) to assess molecular genetic effects on educational attainment in Hungary, a country in the Central Eastern European region where behavioral genetic studies are in general scarce and molecular genetic studies of educational attainment have not been previously published. Results: We found that the PGS is significantly associated with the attainment of a college degree as well as the number of years in education in a sample of Hungarian study participants (N = 829). PGS effect sizes were not significantly different when compared to an English (N = 976) comparison sample with identical measurement protocols. In line with previous Estonian findings, we found higher PGS effect sizes in Hungarian, but not in English, participants who attended higher education after the fall of Communism, although we lacked statistical power for this effect to reach significance. Discussion: Our results provide evidence that polygenic scores for educational attainment have predictive value in culturally diverse European populations.


INTRODUCTION
Educational attainment is a key psychological and sociological variable, which comprises an important part of socioeconomic status and which is positively correlated with income and health, but negatively with crime and welfare dependency (Behrman et al., 1997). Educational attainment is moderately heritable, with a substantial shared environmental component (Branigan et al., 2013) and it shares substantial, but not all genetic variance with cognitive abilities (Krapohl et al., 2014).
Early reports on the heritability of educational attainment were derived from family pedigree studies, most notably twin studies (Cesarini & Visscher, 2017). Recently, however, the heritability of educational attainment was confirmed with molecular genetic methods. Single nucleotide polymorphism (SNP) heritability studies (Hill et al., 2016) confirmed that genetic similarity between unrelated individuals is positively associated with the phenotypic similarity of their educational attainment, with common genotyped SNPs accounting for up to 20% of the total variance. Over the past 10 years, a series of genome-wide association (GWA) studies using a constantly expanding international study sample have been performed within the framework of the Social Science Genetic Association Consortium (SSGAC), linking specific genetic variants to educational attainment (Lee et al., 2018; Okbay et al., 2016; Rietveld et al., 2013). Polygenic scores (PGSs) based on GWAS results (referred to as EA1-3 PGSs depending on which of the SSGAC GWAS results were used to construct them) confirmed the predictive value of these genetic variants (also termed PGS heritability), which typically account for up to 10% of the phenotypic variance in educational attainment itself (Allegrini et al., 2019; Domingue et al., 2015), cognitive abilities (Allegrini et al., 2019; de Zeeuw et al., 2014; Selzam et al., 2016), social mobility (Ayorech et al., 2017), and overall socioeconomic success (Belsky et al., 2016) in independent samples. The predictive performance of educational attainment PGSs has been demonstrated, among others, in samples of Icelanders, Estonians (Rimfeld et al., 2018), and African Americans (Lee et al., 2018; Rabinowitz et al., 2019).
However, neither the pedigree-based or SNP heritability of educational attainment nor the correlation of polygenic scores with socioeconomic phenotypes is a biological constant. There is evidence that between-country differences (Lee et al., 2018; Silventoinen et al., 2020) and within-country changes in education policy (Heath et al., 1985), as well as the attendance of different types of schools (Trejo et al., 2018), may affect the heritability of educational attainment (gene-environment interaction). In other words, the relative importance of genetic and environmental effects on individual differences in educational attainment is affected by the characteristics of the environment.
It has been argued (Hauser, 2002; Nielsen, 2008) that a high heritability of educational attainment is a sign of a meritocratic educational system, because attainment is determined by innate abilities and preferences instead of shared environmental effects such as social class or parental income. The social changes due to the Fall of Communism (FoC) in the former Eastern Bloc may have had a particular effect on educational meritocracy. In line with this hypothesis, a recent Estonian study (Rimfeld et al., 2018) found that the SNP and PGS heritability of educational attainment was higher in Estonians who attended school after FoC, suggesting that the educational system in Estonia has become more meritocratic. In line with this observation, pedigree-based studies conducted in countries with higher social mobility generally also show higher heritability (Engzell & Tropf, 2019).
Because of the moderating and mediating effects of environmental variables, the strength of genetic effects on educational attainment may be different across countries. In the present study, we investigated molecular genetic effects on educational attainment in Hungary, a country where no similar study has previously been published. The main purpose of the current study was to establish the presence and magnitude of the predictive performance of the latest educational attainment PGS in a Hungarian subsample. While the Hungarian population is not substantially genetically different from that of other European countries (Heath et al., 2008), the country is characterized by lower GDP, income, and according to some indicators, lower social mobility (Eurofound, 2017) compared to Western European countries where PGSs have been extensively used in research. Notably, the country transitioned from a planned economy to a market economy only about 15 years before our data was collected. These characteristics of Hungarian society and economy render it an interesting question to what extent the molecular genetic indicators discovered in other countries predict educational attainment, a key element of social and economic success, in Hungary.
As auxiliary analyses, we also estimated cohort differences in PGS heritability as well as overall SNP heritability. These are of interest because no data on these metrics is available from Hungary and we are unaware of any ongoing research to calculate these estimates from larger samples. We caution, however, that our study, while well powered for its main purpose, has limited statistical power to provide precise estimates of these latter effects.
Throughout the paper, we use the term "genetic effects" because the route of causation in this case can only go in one direction, from the genotype to the phenotype. However, we note that nominally genetic effects can be indirect and environmentally moderated (Young, 2019), in practice indexing environmental effects (see also Section 4).

MATERIAL AND METHODS
We used genetic data and self-reported level of education collected in the NewMood study (New Molecules in Mood Disorders, Sixth Framework Program of the European Union, LSHM-CT-2004-503474) to validate the EA3 polygenic score (Lee et al., 2018) in Hungarian participants (Budapest sample, N = 829). We used data from English participants from NewMood (Manchester sample, N = 976) to provide a comparison group with an identical phenotypic and genotypic data collection regimen. Participants of 18-60 years of age were recruited through advertisements, general practices, and a website. Full details of the recruitment strategy and criteria have been published previously (Juhasz et al., 2009, 2011; Lazary et al., 2008).

Educational attainment
Participants filled out close-ended questions about whether they attained certain educational levels. These levels were "No qualification," "O-levels," "A-levels," "Degree," "Professional qualification," and "Other (please specify)." In the Hungarian version of the questionnaire, British educational levels were translated as their Hungarian counterparts (O-levels as "szakmunkásképző," vocational education; A-levels as "érettségi," high school diploma; professional qualification as "szakvizsga," a vocational or specialist qualification). If a participant gave a response about an "Other" qualification, the participant was prompted to provide further detail about his/her qualification and an educational level was assigned based on this information.
We used these self-reported educational attainment levels to create two educational attainment phenotypes. First, we coded whether each participant attained a tertiary degree (college completion). The choice of a simple binary phenotype was justified by the fact that most of the educational attainment variance was between tertiary degrees or the absence of them (Table 1). Second, we converted the self-reported educational attainment levels to years of completed education as an interval variable (years in education). Years in education was imputed as the number of years in education typically necessary to obtain the individual's qualification in the Hungarian system: 8 years for "no qualification," 11 years for a professional vocational education ("szakvizsga"), 12 years for both a vocational and a standard high

Age groups
We aimed to investigate whether the strength of genetic effects on educational attainment varied as a function of graduation cohort. First, cohorts were separated based on whether participants graduated from high school before or after FoC, a possible moderator of the relative strength of genetic effects at least in the Budapest sample (Rimfeld et al., 2018). A third category for very young participants (age <24 years) was split off from the PostC cohort because these participants were likely not to have completed their tertiary education regardless of their genetic endowment (see low educational attainment variability in Table 1) which would exert a downward bias on PGS heritability.
For ethical reasons, we did not store data about the birth year of our participants or exactly when they were interviewed. However, given that data collection was performed in 2004 and 2005, we can estimate the birth year of each participant within a 1-year margin based on self-reported age at data collection. As described above, we divided our participants into three age groups based on their age at FoC and at data collection. Alternative cohort cut-off points were also explored (see Section 3).
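The age-group assignment described above can be sketched in Python as follows. This is a minimal illustration rather than the study's actual procedure. It assumes 1989 as the year of FoC in Hungary and 2004 as the collection year (the text states only that data were collected in 2004 and 2005), and it uses the cut-offs given in the Figure 2 caption (at least 24 years old at data collection, 16 years old at FoC). Assigning participants who were exactly 16 at FoC to PostC is also an assumption, and the function name and the "Young" label are hypothetical.

```python
FOC_YEAR = 1989          # assumption: year taken as FoC in Hungary
COLLECTION_YEAR = 2004   # assumption: data were collected in 2004 and 2005

def assign_age_group(age_at_collection,
                     collection_year=COLLECTION_YEAR,
                     foc_year=FOC_YEAR):
    """Return 'Young', 'PostC', 'PreC', or None when age is missing."""
    if age_at_collection is None:
        return None  # participants with no age data stay unassigned
    if age_at_collection < 24:
        return "Young"  # likely still in tertiary education
    # Birth year is only known to within about one year.
    birth_year = collection_year - age_at_collection
    age_at_foc = foc_year - birth_year
    # Assumption: exactly 16 at FoC counts as PostC.
    return "PostC" if age_at_foc <= 16 else "PreC"

for age in (20, 30, 31, 32, 55, None):
    print(age, assign_age_group(age))
```

Because birth year is uncertain by about one year, participants near the cut-off could fall into either cohort, which is one reason to explore alternative cut-off points.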
We provide detailed statistics about the sample sizes, ages, and educational attainments of these groups in Table 1. We hypothesized that the predictive performance of the EA3 polygenic score would differ between these age groups in Hungary, but not in England, due to a gene-environment interaction induced by the historical political changes in Hungary and their effects on the educational system (Rimfeld et al., 2018).
Note that the "All participants" columns under "Education" also contain participants with no age data, who were consequently not assigned to any age group (N Budapest = 54, N Manchester = 2). For the same reason, counts in these columns are not equal to the sum of the age group columns, and the total count of "All participants" is different for the "Education" and "Age" panels.

Genotyping
Genomic DNA was extracted from buccal swabs collected by a cytol- "info" less than 0.5 or "certainty" less than 0.7 were excluded. After that, variants and participants were filtered separately for each of the

Statistical analysis
We  Figure S1. All other calculated alternative PGSs were also significantly associated with educational level (p_max = 2.2 × 10^-6). In total, we calculated 1340 PGSs with various p-value inclusion thresholds; the mean correlation between all possible pairs of these was r = 0.902 (SD = 0.08).
We estimated PGS effect sizes as the point-biserial (college completion) or Pearson product-moment (years in education) correlation between the educational attainment phenotype and the PGS. We ran additional multivariate models controlling for the effects of age, sex, the first 10 genomic principal components, and self-reported psychiatric or pain-related diagnoses. These were operationalized as generalized linear models with the fitglm() function in MATLAB 2018a, specifying a binomial distribution and logit link (logistic regression for college completion) or a normal distribution and an identity link (linear regression for years in education). For logistic regressions, the PGS effect size was expressed as the Nagelkerke R² statistic, while for linear regressions it was expressed as multiple regression coefficients.
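The effect size measures named above can be illustrated with a short Python sketch. It is not the MATLAB code used in the study. The point-biserial correlation is computed as a Pearson correlation with a 0/1 phenotype, and Nagelkerke's R² is computed from the log-likelihoods of a null and a fitted logistic model, which are assumed to come from separately fitted models. All function names and the toy data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation; with a 0/1 variable it equals the point-biserial r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def null_log_likelihood(y):
    """Log-likelihood of an intercept-only logistic model for 0/1 outcomes."""
    n, k = len(y), sum(y)
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def nagelkerke_r2(ll_null, ll_model, n):
    """Cox-Snell R2 rescaled by its maximum attainable value."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell

# Toy data (hypothetical): standardized PGS and college completion.
pgs = [-1.2, -0.5, -0.1, 0.3, 0.8, 1.4]
college = [0, 0, 1, 0, 1, 1]
print("point-biserial r:", round(pearson_r(college, pgs), 3))
print("null log-likelihood:", round(null_log_likelihood(college), 3))
```

When covariates are included, a PGS-specific effect size is often taken as the difference in R² between models with and without the PGS; whether that convention was applied here is not stated in the text above.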
Because the variance of years in education was not equal in all subsamples, we corrected correlation coefficients for restriction of range using the formula by Schmidt and Hunter (2014) using the total Manchester sample as the reference.
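As an illustration of such a correction, the sketch below applies the standard formula for direct range restriction (often called Thorndike Case II), which is among the corrections covered by Schmidt and Hunter. That this exact variant was used in the analysis is an assumption, and the function name and example values are hypothetical.

```python
import math

def correct_range_restriction(r, sd_reference, sd_subsample):
    """Correct a correlation for direct range restriction (Thorndike Case II).

    r            -- correlation observed in the (restricted) subsample
    sd_reference -- phenotype SD in the reference sample
    sd_subsample -- phenotype SD in the subsample
    """
    u = sd_reference / sd_subsample  # ratio of unrestricted to restricted SD
    return (u * r) / math.sqrt(1.0 + (u * u - 1.0) * r * r)

# Hypothetical example: a subsample with a narrower range of years in education.
print(correct_range_restriction(0.20, sd_reference=2.0, sd_subsample=1.5))
```

When the subsample has the same standard deviation as the reference, the correlation is returned unchanged; a narrower subsample range increases the corrected value.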
We ran GCTA and PGS analyses both with and without controlling for the first 10 genomic principal components.

PGS effects
The EA3  In multivariate models, we explored the effect of age, sex, the first 10 genomic PCs, and self-reported illness (depression, suicide attempt, manic disorder, anxiety disorder, obsessive-compulsive disorder, schizophrenia, eating disorder, drug- or alcohol-related disorder, and/or pain-related problems) on the PGS-phenotype association (Tables S1 and S2). The inclusion of these covariates (especially age, sex, and genomic PCs) caused little change in effect sizes. Using ISCED-derived years in education instead of the original records also did not substantially affect effect sizes (across all models  Individuals with professional educations had somewhat higher means. We note, however, that the low number of participants with low educational attainments led to less precision in estimating mean PGSs. We next estimated whether the PGS-phenotype associations were different by age group. We excluded the youngest participants (age <24) from these analyses because of the low variability of educational attainment in this subgroup (see Table 1). With the continuous phenotype (years in education) as the dependent variable, the PGS*age group interaction was not statistically significant in either the

FIGURE 1
PGSs (shown as z-scores) by country and educational levels. PostC and PreC indicate age groups; see Table 1 for details and definitions of educational levels. "All participants" includes participants with no age data. Whiskers indicate 95% confidence intervals (CIs) of the mean, overplotted with raw data. Note that some Budapest groups were represented by a single participant, which did not permit the estimation of CIs; for these groups only the value is shown.

FIGURE 2
Associations between the best-fit EA3 polygenic score and educational attainment by sample and age group. For college completion, the effect size is a point-biserial correlation and for years in education, the effect size is a Pearson correlation coefficient. Error bars show 95% CIs. "Restriction corrected" refers to a PGS-phenotype correlation corrected for restriction of range. PostC: participants at most 16 years old at FoC and at least 24 years old during data collection. PreC: participants at least 16 years old at FoC. "All over 24 years" refers to pooled PostC and PreC subsamples. "All" also includes participants younger than 24 years old at data collection and those with no age data.

SNP heritability
Genomic-relationship-matrix restricted maximum likelihood (GREML-GCTA) SNP heritabilities indicated that in the Budapest sample, all common SNPs accounted for 34.4% (SE = 24%, p = .06) of the variance of years in education. In the Manchester sample, the same SNP heritability was 20.5% (SE = 20%, p = .13). Controlling for the first 10 genomic PCs, the values were h²_SNP = 42.6% (Budapest, SE = 24.6%, p = .03) and h²_SNP = 20.2% (Manchester, p = .15). In the case of the college completion binary outcome, all common SNPs accounted for 52% (SE = 24%, p = .01) of the variance in the Budapest sample (53%, SE = 24.3%, p = .01 controlling for genomic PCs) and 40% (SE = 20.3%, p = .02) in the Manchester sample (36%, SE = 20.8%, p = .03 controlling for genomic PCs). Note that these estimates had wide confidence intervals due to the limited sample size. Because our sample sizes were below the several thousand individuals usually recommended for this type of analysis (Knopik et al., 2016), and because GCTA models failed to properly converge when we further restricted samples to single age groups due to very low sample sizes, we did not perform SNP heritability analyses within age groups separately.
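To make the width of these intervals concrete, the following Python sketch computes approximate 95% confidence intervals from the reported point estimates and standard errors, assuming a normal (Wald) approximation. This is an illustrative assumption: the reported p-values need not derive from the same approximation, and the helper name is hypothetical.

```python
def wald_ci(estimate, se, z=1.96):
    """Approximate 95% CI under a normal approximation (an assumption here)."""
    return estimate - z * se, estimate + z * se

# Years in education, without genomic PCs (values taken from the text).
for label, h2, se in (("Budapest", 0.344, 0.24), ("Manchester", 0.205, 0.20)):
    low, high = wald_ci(h2, se)
    print(f"{label}: h2_SNP = {h2:.3f}, approx. 95% CI [{low:.3f}, {high:.3f}]")
```

Under this approximation, both intervals span zero and extend over most of the possible range of heritability, which is why the point estimates should be read with caution.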

DISCUSSION
Ours is the first study to estimate molecular genetic effects on educational attainment in Hungary, and the second to do so in a former Warsaw Pact country. We are also unaware of any other behavior genetic study about educational attainment or cognitive functions in Hungarians, except for some data from the Hungarian Twin Registry published in a recent meta-analysis (Silventoinen et al., 2020). The main goal of our study was to demonstrate the predictive performance of the EA3 PGS in a novel country. Despite its limited size, our sample was well-powered for this purpose.
In our main analysis, we found that, in line with international results, the genetic variants discovered by a recent GWAS to predict educational attainment in Western European and American validation samples also do so in Hungary.
We compared findings in Hungarians to analogous results from an English comparison sample with identical recruitment protocols and phenotypic measurements. Once restriction of range was corrected for either statistically or by excluding very young participants presumably still in education, PGS effect sizes were not significantly lower in the Budapest sample.
An exact comparison of our PGS effect sizes with other studies is not feasible due to between-study differences in genotyping, polygenic score construction (such as differences in the source GWAS and the selection of p-value and MAF thresholds), and phenotype quality (including the specific phenotype used and its variance). However, we note that the effect sizes in the Manchester sample were in line with those reported by independent studies using PGSs based on the same GWAS with more representative British and American datasets with higher quality phenotypes (Allegrini et al., 2019; Lee et al., 2018), including educational attainment and cognitive performance. The effect sizes in the Budapest sample were generally not substantially weaker than this. In sum, the relative strength of genetic effects in our Budapest sample was in line with that reported from other countries.
Our first auxiliary analysis aimed to replicate previous Estonian findings about the larger relative role of genetic effects after FoC. Because this previous finding established a clear prior hypothesis about the presence and the direction of this effect, we attempted these auxiliary analyses despite limitations in statistical power. Our replication was only partially successful. While we found substantially higher effect sizes for educational attainment phenotypes in the post-FoC (PostC) subsample in the Budapest, but not the Manchester, sample, these differences did not reach statistical significance. Changes in educational policy surrounding FoC were similar in Hungary to Estonia (Hrubos et al., 2016; Ladányi, 1995). The previous Estonian study on the same effects (Rimfeld et al., 2018) invokes increases in educational meritocracy (first of all, the abandonment of political considerations in university admissions) as the chief driver of increased PGS effect sizes after FoC. Our results do not exclude the possibility of a similar change taking place in Hungary, but better powered genetic databases will be required for a conclusive replication.
Age, cohort, or country differences in the heritability of social traits can reflect mechanisms other than genuine historical societal differences. Differences in sampling bias are an especially strong candidate mechanism for creating spurious heritability differences. We were able to account for two sources of sampling bias: educational attainment variance and psychiatric illness. If study participants are recruited from a narrower educational attainment range in certain subgroups, heritability in those subgroups is biased downward. We eliminated this bias by correcting for restriction of range. If psychiatric disorders affect heritability and individuals with psychiatric disorders are oversampled in certain subgroups, estimated heritability in those subgroups will also be affected. We demonstrated that self-reported psychiatric illness does not affect SNP heritability in the NewMood sample; therefore, subgroup differences in psychiatric illness are unlikely explanations of SNP heritability differences. We emphasize that we were unable to account for all possible sources of sampling bias (and other biases), warranting further caution about the results. Because of limitations in statistical power, we limited our auxiliary analyses to replication instead of discovery and were thus mainly interested in age group differences in the Budapest sample. We note, however, that the PreC-PostC difference in the Manchester sample was even larger (with an opposite sign), even though no major historical change took place in England at the time of FoC in Central Europe.
Our second auxiliary analysis estimating SNP heritability in the Budapest subsample suggests that, in line with recently published pedigree-based results (Silventoinen et al., 2020), a substantial proportion of educational attainment variance is accounted for by common genetic variants in Hungarians, but once again our estimates are imprecise due to limited power and require replication in a larger sample.
Our work suffers from a number of limitations. The largest of these is the modest size of our sample, which allowed us to conclusively demonstrate the association of the polygenic score with actual educational attainment in Hungarians, but limited statistical power to detect age and country effects on SNP heritability (Table S3). Third, a general limitation of between-family molecular genetic studies is that they may reveal shared environmental instead of true, biological genetic effects through gene-environment correlation (Young, 2019). On the one hand, SNPs used to construct PGSs or the relatedness matrix for GCTA may be associated with educational attainment because they index membership in families which influence educational attainment through cultural rather than genetic effects (residual stratification or "dynastic" effects; Morris et al., 2019). On the other hand, SNPs may have a causal effect on parental phenotypes, which in turn influence offspring educational attainment (genetic nurture). While both effects are known to operate and inflate PGS effect sizes in studies of unrelated individuals (Young, 2019; Young et al., 2018), predictive performance in within-family studies (Domingue et al., 2015; Selzam et al., 2019) demonstrates that a substantial portion of PGS

CONCLUSION
In sum, our work demonstrates that genetic variants discovered in international GWAS samples also predict educational attainment in Hungary with equal or only slightly reduced strength relative to an English sample. In line with Estonian data, individual genetic differences played a somewhat larger role in shaping educational attainment in those graduating after the fall of Communism, but due to limitations in statistical power, a more conclusive replication of this effect is needed.
Similar findings from Hungary had not been previously available, and the results are likely of interest to those studying the society of Hungary and may serve as a model for other countries of the region without their own genetic studies.

CONFLICT OF INTEREST
Bill Deakin has share options in P1vital. He has also performed speaking engagements, research, and consultancy for AstraZeneca, Autifony, Bristol-Myers Squibb, Eli Lilly, Janssen-Cilag, P1vital, Schering Plough, and Servier (all fees paid to the University of Manchester to reimburse them for the time taken). All the other authors declare no conflict of interest.

DATA AVAILABILITY STATEMENT
An anonymized dataset containing PGSs, city, age, sex, and educational attainment data is available at: https://osf.io/dfg38/.