High-Throughput Sequencing Haplotype Analysis Indicates in LRRK2 Gene a Potential Risk Factor for Endemic Parkinsonism in Southeastern Moravia, Czech Republic

Parkinson’s disease and parkinsonism are relatively common neurodegenerative disorders. This study aimed to assess potential genetic risk factors of haplotypes in genes associated with parkinsonism in a population in which endemic parkinsonism and atypical parkinsonism have recently been found. The genes ADH1C, EIF4G1, FBXO7, GBA, GIGYF2, HTRA2, LRRK2, MAPT, PARK2, PARK7, PINK1 PLA2G6, SNCA, UCHL1, and VPS35 were analyzed in 62 patients (P) and 69 age-matched controls from the researched area (C1). Variants were acquired by high-throughput sequencing using Ion Torrent workflow. As another set of controls, the whole genome sequencing data from 100 healthy non-related individuals from the Czech population were used (C2); the results were also compared with the Genome Project data (C3). We observed shared findings of four intron (rs11564187, rs36220738, rs200829235, and rs3789329) and one exon variant (rs33995883) in the LRRK2 gene in six patients. A comparison of the C1–C3 groups revealed significant differences in haplotype frequencies between ratio of 2.09 for C1, 1.65 for C2, and 6.3 for C3, and odds ratios of 13.15 for C1, 2.58 for C2, and 7.6 for C3 were estimated. The co-occurrence of five variants in the LRRK2 gene (very probably in haplotype) could be an important potential risk factor for the development of parkinsonism, even outside the recently described pedigrees in the researched area where endemic parkinsonism is present.


Introduction
Parkinson disease (PD) is a very frequent neurodegenerative disease which affects 1-2% of individuals in the population over 65 years of age and about 4% in the population over the age of 85 years [1,2].
The disorder is characteristic by the depletion of dopaminergic neurons in substantia nigra pars compacta in the midbrain and it is associated with the following most common Life 2022, 12, 121 2 of 10 symptoms: bradykinesia, muscle rigidity, and postural instability [3]. The major risk factors which contribute to PD development are age, environmental factors [4,5], and genetic background. The familial form of PD (about 5%) is caused by causal variant with high penetrance. The variants with minor effect and environmental factors could contribute to sporadic form [6].
The high prevalence of neurodegenerative parkinsonism has been described in some particular small, isolated, and endemic populations [30][31][32][33][34]. In such populations, higher contribution of genetic factors can be hypothesized.
In the Czech Republic, Mensikova et al. found a higher prevalence of parkinsonism in the southeastern Moravian region known as Hornacko [Upper Lands]; the disease showed a clear inheritance pattern, expressed in three large pedigrees [35,36]. Hornacko is predominantly hilly in the White Carpathian Mountains and borders Slovakia (formerly part of Hungary) and the Moravian Dolnacko (Lower Lands). The original inhabitants were attacked by Hungarian and Turkish raids in the 1600s and 1700s; in the mid-1800s, the region was colonized by newcomers from Silesia, western Slovakia, and Burgenland. These people formed a small isolated population with distinctive local traditions and religion.
In our epidemiological study, we managed to obtain DNA samples from patients and healthy controls from the region [37]. Thus, the opportunity of molecular-genetic analyses has arisen.
The aim of this study was to assess the co-occurrence of rare variants, possibly in haplotype, in the set of genes associated with parkinsonism (ADH1C, EIF4G1, FBXO7, GBA, GIGYF2, HTRA2, LRRK2, MAPT, PARK2, PARK7, PINK1, PLA2G6, SNCA, UCHL1, and VPS35) in the parkinsonian patients and controls from the Hornacko region [36][37][38]. To compare the population frequency of potential haplotype in patients, three different types of controls were used: (i) controls from the "Hornacko", (ii) healthy non-related individuals, and (iii) sequencing data from The 1000 Genome Project data.

Materials and Methods
The study was written approved by the ethics committee of the Palacky University and University Hospital Olomouc, Czech Republic (IRB number 22-16985S;78_21). All experiments were performed in accordance with guidelines and regulations released by Palacky University and University Hospital Olomouc, Czech Republic. The patients were informed in detail about the study and they all signed informed consent.
The study involved 62 parkinsonian patients and 69 age-matched healthy controls from the same region (C1). The controls cohort includes 42 women and 27 men and their average age was over 70 years old. Demographic data are shown in Table 1. The diagnosis of neurodegenerative parkinsonism was made at the Department of Neurology in University Hospital Olomouc, Czech Republic; the diagnostic process has been described in detail elsewhere [37,38]. The 69 age-matched healthy controls originating from and living in the Hornacko region were recruited through the general practitioner's offices. The DNA was isolated in all patients and controls from peripheral blood with an isolation kit (QIAamp DNA Mini Kit from Qiagen). High-throughput sequencing was performed using the Personal Genome Machine (PGM) from Ion Torrent technology in the genes ADH1C, ATP13A2, EIF4G1, FBXO7, GBA + GBAP1, GIGYF2, HTRA2, LRRK2, MAPT, PARK2, PARK7, PINK1, PLA2G6, SNCA, UCHL1, and VPS35; performed target sequencing was described in detail in our previous study [37].
All variants found were filtered out by a minor allele frequency (MAF) of 0.05. A matrix was created in which the reference allele is marked as 0, heterozygote as 1, and homozygote as 2.
Using Microsoft Excel software, the data were filtered with respect to two criteria: (1) Co-occurrence of two or more variants in a particular gene; (2) Variants occurring mostly in the parkinsonian patients.
The detected variants were then verified and confirmed by Sanger sequencing. SIFT, MutationTaster, DANN, and PolyPhen-2 prediction tools were used to evaluate missense variants. The PhyloP algorithm was used to assess the phylogenetic conservation. Alternative splicing analysis was performed using NetGene2 and Human Splicing Finder 3.1 for the intron variants.
To evaluate (and exclude) possible consanguinity, the HVR1 mitochondrial region in six random patients and 15 Y-STR markers in five random male patients were analyzed using the AmpFLSTR™ Yfiler™ PCR Amplification Kit from Thermo Fisher Scientific.
To get a larger set of controls (C2), we used whole genome sequencing (WGS) data in 100 healthy nonrelated individuals from the Czech population (100CZG), originating from the Institute of Biology and Medical Genetics, Charles University, Prague, Czech Republic. The healthy individuals ranged in age from 23 to 60 years old. Individual DNA was used for WGS on NovaSeq 6000 (Illumina). The FASTQ data were processed by the generic data pre-processing pipeline published by the Broad Institute (https://github.com/ gatk-workflows/gatk4-data-processing, acceesed on 21 December 2021). This pipeline aligns FASTQ data to hg38 genome build, performs base recalibration, and produces analysis-ready BAM files.
As another representative set of controls (C3), allelic frequency data from 2516 samples from the 1000 Genome Project (1000GP) were used for population-genetic comparison.

Statistical Assessment
The demographic and clinical characteristics of the patients were presented as mean values ± standard deviation Fisher's exact test or chi-square test was used to assess statistical differences in haplotype frequency, where appropriate. Relative risk (RR) and odds ratio (OR) were calculated. Differences were considered significant at a p < 0.05. All p-values were obtained from two-tailed tests, and all the analyses were performed in the MATLAB environment, MATLAB Version 7.5.0.342 (R2007b), Statistics Toolbox, Version 6.1.
Patient clinical data are described in Table 3 in more detail. Possible patient consanguinity was excluded using HVR1 and Y-STR markers (Supplementary Tables S1 and S2)   The exon variant rs33995883 was benign according to SIFT; Polyphen-2 evaluated it as pathogenic, as did MutationTaster and DANN. Three intron variants (rs11564187, rs36220738, and rs3789329) do not affect alternative splicing according to NetGene2; deleting rs200829235 resulted in a possible break of the splice site. Substitution rs33995883 creates the acceptor of the splice site (according to NetGene2).
The intron variant rs11564187 led to the creation of a new donor site (based on HSF Matrices), according to Human Splicing Finder 3.1. A difference between mutant and reference sequences was not predicted for variant rs36220738. The variant rs200829235 resulted in the activation of an exonic cryptic donor site and the potential alteration of mRNA splicing (based on HSF Matrices). The variant rs33995883 led to the activation of an exonic cryptic donor site and the potential alteration of splicing (based on HSF Matrices). No significant splicing motif alteration was detected in rs3789329.
A difference between haplotype frequency between the patients and controls were found (C1, p = 0.0031, RR 2.09, OR 13.15 Table S3; C2, p = 0.11, RR 1.65, OR 2.58, Table S4; C3, p < 0.0001, RR 6.3, OR 7.6, Table S5); however, the difference in haplotype frequency between the patients and C2 reached just a borderline statistical significance. Relative risks and odd ratios are summarized in Table S6.

Discussion
Many key factors have been identified in the past, which are putatively important in the development of neurodegenerative parkinsonism, including genetic predisposition, environmental risk factors, and lifestyle. In the term of genetics, rare variants in the LRRK2 gene are the most common cause of hereditary parkinsonism. The LRRK2 gene is associated with autosomal-dominant, late-onset familial "Parkinson's disease" (PD). However, this hereditary disease may lack a seminal pathological hallmark of PD-the Lewy-body pathology-so the term "Parkinson's disease" in the disease designation should probably be avoided [12,[39][40][41][42][43].
The LRRK2 gene is located in the 12q12 chromosome. The LRRK2 gene contains 56 exons (https://www.ncbi.nlm.nih.gov/gene/120892 (accessed on 21 December 2021)). LRRK2 is a member of the leucine-rich repeat kinase family with kinase and GTPase activities, and it encodes multiple domain proteins. The gene influences mitochondrial functions, autophagy [44], and signaling transmission [45]. LRRK2 is expressed in neuronal cells as well as in immune system cells [46,47]. Among all the variants in the LRRK2 gene, substitution p.G2019S is the most common variant in PD patients (1% of sporadic PD and 4% of familial PD) [11,48] and its frequency is more common in North African and Ashkenazi Jewish patients [49,50]. Another variant in the LRRK2 gene, p.N2081D, was described as a risk factor for Crohn's disease [51]. The presence of multiple minor variants (p.S16478, p.R1628P, and p.G2385R) in the LRRK2 gene is usually correlated with a lower age at the Parkinson s disease onset-52.6 (±12.3) years-as compared to patients without these risk variants-62.5 years (±10.5) [52]. A protective effect of the LRRK2 variants p.N551K, p.R1398H, and p.K1423K was proposed [53].
Paisán-Ruiz, et al. (2009) performed a genome-wide association analysis focused on low frequency alleles that were found together in the LRRK2 gene; the study identified 84 single nucleotide polymorphisms (SNP) for 49 linkage disequilibrium (LD) blocks. These loci were in strong LD [54]. In our study, we have found some of the same LRRK2 variants as Paisán-Ruiz et al. (2009b) (rs11564187, rs33995883, and rs3789329) [55], but they all were assigned to another block.
Haplotypes associated with PD were studied in the MAPT gene. The study by Setó-Salvia (2011) reported a higher occurrence of the H1 haplotype (G3010A and A2706G or C7028T) in patients than in controls (p = 0.001) [56]. The rare sub-haplotype of the H1 haplotype, H1p, was more common in patients with another type of Lewy body disease, Parkinson's disease dementia-PDD [11]. Our previous studies did not find any pathogenic "founder" mutation in the researched population of Hornacko; however, we have found variants with possible smaller effects (MAPT, HTRA2, and LRRK2) [36,37]. When assessing the degree of effect of a given genotype, it is also very important to keep in mind the common occurrence of variants in a particular haplotype.
In this study, in the LRRK2 gene, the occurrence of intron variants almost exclusively together with the rarest variant (p.Asn2018Asp)-cis binding was detected (based on 1000GP data and 100CZG data). We analyzed the presence of common 20 SNPs in the patients and three randomly selected control groups. In the patients, it was possible to assign identical haplotypes in all 20 SNPs; in the controls, the haplotype differentiated the controls from the patients and even from each other (Table S7). Therefore, we strongly suspect that the variants found are in the haplotype. The exon variant p.Asn2018Asp is located in the kinase domain, which is essential for LRRK2-induced neuronal toxicity [13,57,58].
Furthermore, we also looked for the co-occurrence of the haplotype in another population of 1000GP (The 1000 Genomes Project Consortium, 2015) [59]. The frequencies of the haplotype in different populations are compared in Table 4. Most of the populations with the haplotype come from Europe or have European ancestors: Tuscans in Italy, Iberians in Spain, Finnish British in England and Scotland, Utah residents with Northern and Western European ancestry, Colombians in Medellin, and Gujarati from India. It could be explained by historical positive selection pressure (physical fitness at a younger age, resistance to climate change and infectious diseases, etc.). Because of the later onset of the disease, the risk factor for PD did not have to be manifested in everybody and the haplotype could, thus, remain conserved.
According to Greggio and Cookson (2009), missense variant p. Gly2019Ser, located in the kinase domain, leads to the increase of kinase activity and contributes to PD pathogenesis [60]. Inhibition of kinase activity may be considered as a potential therapeutic target [57,61]. Abnormal kinase activity of LRRK2 may result in abnormal phosphorylation of proteins (SNCA and MAPT) and it leads to aggregation and cell death [12]. We can assume that the variant has "some", undoubtedly rather weak, impact on the development of neurodegenerative parkinsonism. Whether this is also true in the Hornacko population remains to be assessed in further studies involving all inhabitants of the researched area, i.e., almost 9000 people.

Conclusions
The co-occurrence of five variants in LRRK2 gene was more frequent in patients with parkinsonism in Hornacko region. Intron variants occur almost exclusively with the rarest exon variant. LRRK2 gene rate variants in haplotype might be a potential risk factor for endemic parkinsonism.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/life12010121/s1. Table S1. Verification of non-consanguinity: variants in HVR1 region in six random patients; Table S2. Verification of non-consanguinity: Y-STR marker region of six random patients; Table S3. Fisher's exact factorial test (C1); Table S4. Fisher's exact factorial test (C2); Table S5. Contingency table of chi-square test (C3); Table S6. Risk ratio (RR) and odds ratio (OR) calculation; Table S7. Presence of common LRRK2 SNPs found in the patients and 3 randomly selected controls. Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The datasets presented in this study can be found in the online repository https://www.ncbi.nlm.nih.gov/bioproject/PRJNA682520/ (accessed on 21 December 2021).

Conflicts of Interest:
The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.