Genome-wide Association Study of Biochemical Traits in Korčula Island, Croatia
Abstract
Aim
To identify genetic variants underlying 7 biochemical traits, namely total cholesterol, low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, triglycerides, uric acid, albumin, and fibrinogen, through a genome-wide association study in an isolated population, where rare variants of larger effect may be more easily identified.
Methods
The study included 944 adult inhabitants of the island of Korčula as part of a larger DNA-based genetic epidemiology study conducted in 2007. Biochemical measurements were performed in a single laboratory with stringent internal and external quality control procedures. Examinees were genotyped using the Illumina HumanHap370CNV chip, with a genome-wide scan containing 346 027 single nucleotide polymorphisms (SNPs).
Results
A total of 31 SNPs were associated with the 7 investigated traits at the level of P < 1.00 × 10−5. Nine of these SNPs implicated SLC2A9 in uric acid regulation (P from 4.10 × 10−6 to 2.58 × 10−12), as previously found in other populations. The 22 remaining associations had P values between 1.00 × 10−5 and 5.5 × 10−7. One of them replicated the association between the cholesteryl ester transfer protein gene (CETP) and HDL, and 7 associations were more than 100 kilobases away from the closest known gene. Two nearby SNPs in the kinase suppressor of ras 2 gene (KSR2) on chromosome 12, rs4767631 and rs10444502, were associated with LDL cholesterol levels, and rs10444502 was also associated with total cholesterol levels. Similarly, rs2839619 in the PBX/knotted 1 homeobox 1 gene (PKNOX1) on chromosome 21 was associated with both total and LDL cholesterol levels. The remaining 9 findings implied possible associations between the phosphatidylethanolamine N-methyltransferase gene (PEMT) and total cholesterol; the USP46, RAP1GDS1, and ZCCHC16 genes and triglycerides; the BCAT1 and SLC14A2 genes and albumin; and the NR3C2, GRIK2, and PCSK2 genes and fibrinogen.
Conclusion
Although this study was underpowered for most of the reported associations to reach the formal threshold of genome-wide significance under the assumption of independent multiple testing, the replication of previous findings and the consistent association of some identified variants with more than one studied trait make these findings interesting for functional follow-up studies. Altered allele frequencies in isolated populations may help identify variants that would not be easily identified even in much larger samples from outbred populations.
Circulating lipid levels in humans show considerable heritability (1-3). Recently, genome-wide association studies have proved their value as a reliable tool for identifying genetic variants underlying complex human diseases and traits (4,5). The first genome-wide association studies of human lipid levels yielded a total of 19 loci controlling serum high-density lipoprotein (HDL) cholesterol, low-density lipoprotein (LDL) cholesterol, and triglycerides (6-10), implicating a number of potential genes and 3 genetic regions with multiple associated genes. Many of these genes were already known as very likely candidates for lipid level regulation (11,12). The most recent study also found that the ATP-binding cassette gene ABCG5 predicted triglycerides and LDL; the transmembrane protein 57 gene (TMEM57) predicted triglycerides; the region spanning CCCTC-binding factor (a regulatory protein with 11 highly conserved zinc fingers) and protein arginine methyltransferase 8 (CTCF-PRMT8) predicted HDL; dynein, axonemal, heavy chain 11 (DNAH11) predicted LDL; the fatty acid desaturase 3 and fatty acid desaturase 2 region (FADS3-FADS2) predicted triglycerides and LDL; and the MAP-kinase activating death domain and folate hydrolase 1 region (MADD-FOLH1) predicted HDL (13). However, all these loci combined still cannot explain a substantial share of the observed heritability, leaving room for the discovery of further genetic and epigenetic factors that control serum lipid levels in humans (14). These are likely to include structural variants, such as copy number variants, and very rare genetic variants that may only be identifiable in special populations (4,5).
Apart from lipid levels, other clinically relevant biochemical measurements in human serum and/or plasma that are commonly studied include uric acid, albumin, and fibrinogen. Recently, several studies have identified and replicated a novel major uric acid transporter, solute carrier family 2 (facilitated glucose transporter), member 9 (SLC2A9, also known as GLUT9), which is several times more functionally important than urate transporter 1 (URAT1), the gene previously thought to play the major role in the regulation of uric acid levels (15-18). A recent study with greater power identified 2 further genes that contribute to uric acid regulation: ATP-binding cassette, sub-family G (WHITE), member 2 (ABCG2) and solute carrier family 17, member 3 (SLC17A3) (19). Unlike for uric acid, we could not find any genome-wide association studies that searched specifically for loci influencing human plasma albumin levels. Some genome-wide association studies of biochemical traits also searched for fibrinogen loci, but none of the associations reached the threshold for genome-wide significance (20).
We performed a comprehensive genome-wide association study of total cholesterol, LDL and HDL cholesterol, triglycerides, uric acid, albumin, and fibrinogen in the isolated population of the island of Korčula in Croatia. The study is part of the genetic epidemiology research program in Croatian island isolates, "10,001 Dalmatians." The program began in 1999 (21), expanded to study human genetic variation and the effects of isolation and inbreeding (22-34), and then broadened its focus to include diseases and gene mapping studies (18,35-41). To date, the project has included more than 3000 examinees from isolated populations, and it aims to eventually reach 10 001 examinees.
Materials and methods
This study was performed in the eastern part of the island of Korčula, Croatia, between March and December 2007. Healthy volunteers aged 18 and over from the town of Korčula and the villages of Lumbarda, Žrnovo, and Račišće (Figure 1) were invited to participate through mail, posters, radio, and personal contacts.
The study was approved by the Ethical Committee of the Medical School, University of Zagreb, and all examinees signed informed consent. A total of 969 examinees were included, all of whom had a range of quantitative phenotypic traits measured.
Biochemical analyses of the 7 traits were performed on blood samples taken after overnight fasting (collected between 8:30 and 9:30 am). We collected a 9 mL clotted serum-gel sample for serum biochemistry, 2 × 10 mL EDTA samples for DNA extraction or liquid nitrogen storage of aliquots for future transformation, and a 10 mL citrate sample for clotting factor analysis (S-Monovette, Sarstedt, Leicester, UK). Plasma and serum were rapidly frozen and stored at −80°C in 500 μL aliquots using standardized sample handling procedures. They were then transported frozen, within a maximum of 3 days, to a single biochemical laboratory (Salzer, Zagreb, Croatia). The laboratory was chosen because it was internationally accredited for this type of analysis and took part in internal quality assessment by Roche and Olympus, as well as in the external monitoring programs of the Croatian reference center for biochemical measurements and the RIQAS international agency for quality control (42). The methods used to measure total cholesterol, LDL and HDL cholesterol, triglycerides, uric acid, albumin, and fibrinogen followed standard, internationally agreed procedures and have been described in detail previously (43). DNA extraction was performed using Nucleon kits (Tepnel, Manchester, UK), and 944 samples were genotyped at the GSF Research Center for Environment and Health in Munich, Germany. Genotyping was performed using the Illumina HumanHap370CNV chip (Illumina, San Diego, CA, USA), with a total of 346 027 SNP markers. Quality control of the genotype data excluded markers with a call rate <98%, a minor allele frequency <2%, or deviation from Hardy-Weinberg equilibrium (P < 1 × 10−10), leaving 316 730 SNPs for analysis. Removal of individuals with a call rate <97% left 898 people for inclusion in the analysis.
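The marker- and sample-level filters described above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the function names and the genotype coding (0, 1, or 2 copies of one allele, `None` for a missing call) are hypothetical, and Hardy-Weinberg equilibrium is checked with the asymptotic 1-df chi-square test because the text does not state which HWE test was applied.

```python
import math
from collections import Counter

# Thresholds taken from the Methods text; everything else is illustrative.
MIN_SNP_CALL_RATE = 0.98
MIN_MAF = 0.02
MIN_HWE_P = 1e-10
MIN_SAMPLE_CALL_RATE = 0.97


def hwe_chisq_p(n_aa, n_ab, n_bb):
    """Asymptotic (1 df) chi-square P value for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chisq / 2))


def snp_passes_qc(genotypes):
    """genotypes: one entry per person, 0/1/2 allele copies or None if missing."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    call_rate = len(called) / len(genotypes)
    allele_freq = sum(called) / (2 * len(called))
    maf = min(allele_freq, 1 - allele_freq)
    counts = Counter(called)
    hwe_p = hwe_chisq_p(counts[0], counts[1], counts[2])
    return call_rate >= MIN_SNP_CALL_RATE and maf >= MIN_MAF and hwe_p >= MIN_HWE_P


def sample_passes_qc(person_genotypes):
    """person_genotypes: one entry per SNP for a single individual."""
    called = sum(g is not None for g in person_genotypes)
    return called / len(person_genotypes) >= MIN_SAMPLE_CALL_RATE
```

Markers are filtered first and individuals second in this sketch, mirroring the order in which the text reports the two steps; the actual order used in the study is not stated.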
Genome-wide associations between the biochemical phenotypes (adjusted for age and sex) and SNP markers were analyzed with the "mmscore" function of the GenABEL R package (44), using an additive model. This score test for family-based association accounts for pedigree structure and allows unbiased estimation of SNP allelic effects (45). No correction for multiple testing was applied. The relationship matrix used in this analysis was generated by the "ibs" (identity-by-state) function of GenABEL, which uses identity-by-state genotype sharing to estimate realized pairwise kinship coefficients, similarly to PLINK's --genome command. All SNPs that reached significance or appeared suggestive were visualized using Haploview software, which was also used to calculate linkage disequilibrium measures. The genotyping and statistical genetic methods needed to perform a GWAS of a quantitative biological trait in an isolated population, where familial relationships must be taken into account, were described in detail in our previous publications (18). Pearson correlation coefficients were calculated in SPSS, version 13 (SPSS Inc, Chicago, IL, USA).
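To illustrate how genotype sharing can be turned into realized pairwise kinship, the sketch below uses a common frequency-weighted estimator (half of a genomic relationship matrix). It is an assumption that this matches GenABEL's `ibs` computation in every detail, and the mmscore step that then uses the matrix in a polygenic model is not shown. The function name and the complete 0/1/2 genotype coding (no missing calls) are hypothetical.

```python
def genomic_kinship(genotypes):
    """Frequency-weighted kinship estimate from 0/1/2 genotype counts.

    genotypes[i][k] is the allele count of individual i at SNP k (no missing
    calls). phi[i][j] = 1/(2M) * sum_k (x_ik - 2p_k)(x_jk - 2p_k) / (2p_k(1 - p_k)),
    so in large samples unrelated pairs average about 0 and self-kinship about 0.5.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    freqs = [sum(row[k] for row in genotypes) / (2 * n) for k in range(m)]
    # Monomorphic SNPs carry no information and would divide by zero.
    usable = [k for k, p in enumerate(freqs) if 0 < p < 1]
    if not usable:
        raise ValueError("no polymorphic SNPs")
    phi = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            total = 0.0
            for k in usable:
                p = freqs[k]
                total += ((genotypes[i][k] - 2 * p) * (genotypes[j][k] - 2 * p)
                          / (2 * p * (1 - p)))
            phi[i][j] = phi[j][i] = total / (2 * len(usable))
    return phi
```

Weighting each SNP by 1/(2p(1 − p)) lets sharing of rare alleles count more than sharing of common ones, which is what distinguishes this from a raw identity-by-state count.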
Results
A total of 31 SNPs were associated with the investigated traits at the level of P < 1.0 × 10−5 (Table 1). SNP rs10444502 in the kinase suppressor of ras 2 gene (KSR2) on chromosome 12 was associated with total cholesterol levels (P = 6.1 × 10−6), while two other chromosome 12 SNPs (rs1493762 and rs10777332), located in a region with no known genes within 200 kilobases, showed a similar level of significance (Table 1, Figure 2). These two SNPs were in complete linkage disequilibrium (r2 = 1, D' = 1). The KSR2 finding is particularly interesting because rs10444502 and a second KSR2 SNP about 42 kilobases away, rs4767631, were both associated with LDL cholesterol levels (Table 1, Figure 3). rs4767631 and rs10444502 are in partial linkage disequilibrium (r2 = 0.426, D' = 0.766).
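The two linkage disequilibrium measures quoted above can differ substantially for the same pair of SNPs. As a hedged illustration (the function and the haplotype frequencies are hypothetical, not estimates from the Korčula data, and Haploview's haplotype-frequency estimation step is not reproduced), r2 and D' can be computed from one two-locus haplotype frequency and the two allele frequencies:

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r2) for alleles A and B at two biallelic loci.

    p_ab: frequency of the haplotype carrying both A and B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    # D' scales D by its maximum attainable value given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r2 is the squared correlation between the allelic states at the two loci.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2
```

For example, `ld_measures(0.2, 0.3, 0.2)` gives a D' of 1 but an r2 of about 0.58: allele B always occurs on haplotypes carrying A, yet the allele frequencies differ, so one SNP only partly predicts the other. This is one reason why D' and r2 for the same SNP pair, such as 0.766 and 0.426 for rs4767631 and rs10444502, need not agree.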
Table 1
A summary of single-nucleotide polymorphisms (SNPs) associated with biochemical traits at the suggestive significance level of P < 10−5*†
| SNP for trait | Chromosome | Position | Effect allele | Effect allele frequency Korčula | Effect allele frequency HapMap | P value | β | Standard error (β) | Gene |
|---|---|---|---|---|---|---|---|---|---|
| Total cholesterol: | |||||||||
| rs7678182 | 4 | 32117038 | A | 0.428 | 0.367 | 9.9 × 10−06 | 0.119 | 0.027 | no genes +/− 200kb |
| rs1493762 | 12 | 90604046 | C | 0.158 | 0.180 | 7.6 × 10−06 | -0.224 | 0.050 | no genes +/− 200kb |
| rs10777332 | 12 | 90688433 | A | 0.113 | 0.161 | 5.1 × 10−06 | -0.235 | 0.051 | no genes +/− 200kb |
| rs10444502 | 12 | 116838254 | C | 0.279 | 0.267 | 6.1 × 10−06 | -0.165 | 0.036 | kinase suppressor of ras 2 (KSR2) |
| rs8081810 | 17 | 17463669 | G | 0.169 | 0.083 | 9.8 × 10−06 | 0.222 | 0.050 | phosphatidylethanolamine N-methyltransferase (PEMT) |
| rs2839619 | 21 | 43309246 | G | 0.396 | 0.525 | 7.6 × 10−06 | 0.127 | 0.028 | PBX/knotted 1 homeobox 1 (PKNOX1) |
| LDL cholesterol: | |||||||||
| rs4767631 | 12 | 116796126 | A | 0.313 | 0.333 | 5.5 × 10−07 | -0.167 | 0.033 | kinase suppressor of ras 2 (KSR2) |
| rs10444502 | 12 | 116838254 | C | 0.280 | 0.267 | 6.4 × 10−07 | -0.182 | 0.036 | kinase suppressor of ras 2 (KSR2) |
| rs2839619 | 21 | 43309246 | G | 0.395 | 0.525 | 8.0 × 10−06 | 0.127 | 0.028 | PBX/knotted 1 homeobox 1 (PKNOX1) |
| HDL cholesterol: | |||||||||
| rs4599440 | 4 | 61648868 | A | 0.233 | 0.275 | 1.6 × 10−06 | 0.199 | 0.041 | 100 kb from latrophilin 3 precursor (LPHN3) |
| rs871392 | 12 | 40458249 | A | 0.152 | 0.133 | 3.0 × 10−06 | -0.255 | 0.054 | no genes +/− 200kb |
| rs7499892 | 16 | 55564091 | A | 0.164 | 0.175 | 9.0 × 10−06 | -0.235 | 0.053 | cholesteryl ester transfer protein, plasma (CETP) |
| Triglycerides: | |||||||||
| rs346923 | 4 | 53106886 | A | 0.132 | 0.075 | 1.6 × 10−06 | 0.276 | 0.057 | ubiquitin specific protease 46 (USP46) |
| rs10516430 | 4 | 99556904 | A | 0.283 | 0.283 | 5.9 × 10−06 | -0.163 | 0.036 | RAP1, GTP-GDP dissociation stimulator 1 (RAP1GDS1) |
| rs5982533 | X | 111533120 | G | 0.221 | 0.167 | 6.6 × 10−06 | -0.227 | 0.050 | zinc finger CCHC domain containing 16 (ZCCHC16) |
| Uric acid: | |||||||||
| rs10805346 | 4 | 9529445 | G | 0.409 | 0.492 | 8.4 × 10−09 | -0.162 | 0.028 | solute carrier family 2, member 9 (SLC2A9) |
| rs13129697 | 4 | 9536065 | C | 0.249 | 0.350 | 2.5 × 10−12 | -0.287 | 0.041 | solute carrier family 2, member 9 (SLC2A9) |
| rs737267 | 4 | 9543842 | A | 0.203 | 0.308 | 6.1 × 10−09 | -0.273 | 0.046 | solute carrier family 2, member 9 (SLC2A9) |
| rs4505821 | 4 | 9587192 | A | 0.207 | 0.175 | 4.1 × 10−06 | 0.215 | 0.046 | solute carrier family 2, member 9 (SLC2A9) |
| rs13131257 | 4 | 9590987 | A | 0.173 | 0.225 | 8.0 × 10−09 | -0.300 | 0.052 | solute carrier family 2, member 9 (SLC2A9) |
| rs6449213 | 4 | 9603313 | G | 0.163 | 0.242 | 1.2 × 10−09 | -0.330 | 0.054 | solute carrier family 2, member 9 (SLC2A9) |
| rs1014290 | 4 | 9610959 | G | 0.205 | 0.308 | 7.5 × 10−12 | -0.320 | 0.046 | solute carrier family 2, member 9 (SLC2A9) |
| rs733175 | 4 | 9659239 | G | 0.162 | 0.242 | 1.8 × 10−06 | -0.252 | 0.052 | solute carrier family 2, member 9 (SLC2A9) |
| rs6820756 | 4 | 9671947 | A | 0.191 | 0.725 | 1.4 × 10−06 | -0.229 | 0.047 | solute carrier family 2, member 9 (SLC2A9) |
| Albumin: | |||||||||
| rs10505955 | 12 | 24985488 | G | 0.477 | 0.467 | 9.5 × 10−06 | 0.104 | 0.023 | branched chain aminotransferase 1 cytosolic (BCAT1) |
| rs10502868 | 18 | 41407947 | G | 0.075 | 0.092 | 6.5 × 10−06 | -0.359 | 0.079 | solute carrier family 14, member 2 (SLC14A2) |
| Fibrinogen: | |||||||||
| rs1490453 | 4 | 149540796 | A | 0.170 | 0.208 | 3.1 × 10−06 | 0.249 | 0.053 | nuclear receptor subfamily 3, group C, member 2 (NR3C2) |
| rs12207601 | 6 | 102462560 | G | 0.161 | 0.136 | 2.4 × 10−06 | -0.257 | 0.054 | glutamate receptor 6 (GRIK2) |
| rs6044777 | 20 | 17316542 | A | 0.174 | 0.183 | 7.9 × 10−06 | 0.227 | 0.050 | proprotein convertase subtilisin/kexin type 2 (PCSK2) |
| rs1840485 | X | 6229033 | A | 0.239 | 0.208 | 3.2 × 10−06 | -0.231 | 0.049 | 73 kb from X-linked neuroligin 4 (NLGN4X) |
| rs7885458 | X | 6230474 | G | 0.238 | 0.208 | 2.9 × 10−06 | -0.233 | 0.049 | 74 kb from X-linked neuroligin 4 (NLGN4X) |
*Abbreviations: LDL – low-density lipoprotein; HDL – high-density lipoprotein.
†The table summarizes SNPs, their positions on the chromosomes, P values, effect size and direction (expressed as β and standard error of β), effect allele, and implicated gene; the information on each gene was obtained from ref. 46.
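The β and standard error columns come from an additive model, in which the trait is regressed on the number of copies of the effect allele (0, 1, or 2). The sketch below shows that per-allele coding with ordinary least squares on a trait assumed to be already adjusted for age and sex. It is a simplification: unlike the mmscore test used in this study, it ignores relatedness, so in a family-structured isolate its standard errors would be too small. The function name and toy data are hypothetical.

```python
import math


def additive_effect(dosages, trait):
    """Per-allele effect (beta), its standard error, and a two-sided P value.

    dosages: effect-allele counts (0, 1, 2) per person
    trait:   trait values, assumed already adjusted for age and sex
    Assumes unrelated individuals; P uses a normal approximation to the Wald test.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p


beta, se, p = additive_effect([0, 0, 1, 1, 2, 2], [0.0, 0.2, 0.5, 0.7, 1.0, 1.2])
```

A negative β, as for most SLC2A9 SNPs in Table 1, means that each additional copy of the effect allele is associated with a lower trait value. The ratio β/SE is the Wald z statistic; for rs13129697, −0.287/0.041 ≈ −7.0, which corresponds to a two-sided P value on the order of 10−12, in line with Table 1.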
Figure 2. Results of the genome-wide association study of total serum cholesterol levels (mmol/L), plotted using Haploview software, showing peaks on chromosomes 4, 12, 17, and 21 that exceed the suggestive threshold of P < 10−5.
Figure 3. Results of the genome-wide association study of serum low-density lipoprotein cholesterol levels (mmol/L), plotted using Haploview software, showing peaks on chromosomes 12 and 21 that exceed the suggestive threshold of P < 10−5.
Similarly, SNP rs2839619 in the PBX/knotted 1 homeobox 1 gene (PKNOX1) on chromosome 21 was associated with both total and LDL cholesterol levels. This makes KSR2 and PKNOX1 interesting candidates for replication and functional follow-up studies. The remaining possible associations with total cholesterol levels involved the phosphatidylethanolamine N-methyltransferase gene (PEMT; P = 9.8 × 10−6) and rs7678182 on chromosome 4, which has no known genes within 200 kilobases (P = 9.9 × 10−6).
Genetic variants associated with HDL levels are shown in Figure 4. The two strongest associations, on chromosomes 4 and 12, did not lie within 100 kilobases of any known gene. However, the third finding (rs7499892 on chromosome 16; P = 9.0 × 10−6) replicated a previously reported association between the cholesteryl ester transfer protein gene (CETP) and HDL, confirming the role of CETP in the regulation of plasma HDL levels. The correlation coefficient was 0.46 between LDL and HDL (P < 0.001), 0.53 between LDL and total cholesterol (P < 0.001), and 0.83 between HDL and total cholesterol (P < 0.001).
Figure 4. Results of the genome-wide association study of serum high-density lipoprotein cholesterol levels (mmol/L), plotted using Haploview software, showing peaks on chromosomes 4, 12, and 16 that exceed the suggestive threshold of P < 10−5.
We found 3 variants associated with triglyceride levels in the Korčula island population at the level P < 1.0 × 10−5 (Table 1, Figure 5): SNP rs346923 on chromosome 4, within the ubiquitin specific protease 46 gene (USP46, P = 1.6 × 10−6); SNP rs10516430 on chromosome 4, within the RAP1, GTP-GDP dissociation stimulator 1 gene (RAP1GDS1, P = 5.9 × 10−6); and SNP rs5982533 on chromosome X, within the zinc finger CCHC domain containing 16 gene (ZCCHC16, P = 6.6 × 10−6).
Figure 5. Results of the genome-wide association study of serum triglyceride levels (mmol/L), plotted using Haploview software, showing peaks on chromosomes 4 and X that exceed the suggestive threshold of P < 10−5.
Table 1 shows that 9 of the 31 SNP associations in this study (nearly one-third) implicated SLC2A9 in uric acid regulation, with P values ranging from 4.1 × 10−6 to 2.58 × 10−12. The highly significant peak on chromosome 4 (Figure 6) represents a convincing replication of previous findings in other populations. The linkage disequilibrium pattern of the SNPs associated with uric acid levels in the CEU population (of western European ancestry) is shown in Figure 7.
Figure 6. Results of the genome-wide association study of uric acid levels (µmol/L), plotted using Haploview software, showing a single, highly significant peak on chromosome 4 that exceeds the P < 10−5 threshold.
Figure 7. Linkage disequilibrium pattern of single nucleotide polymorphisms (SNPs) associated with uric acid levels in the CEU population (of western European ancestry), calculated using Haploview software and showing r2 values.
The genome-wide association study of albumin and fibrinogen conducted on the island of Korčula represents one of the first attempts to gain insight into the genetic regulation of these two traits. The most significant findings were possible roles of the branched chain aminotransferase 1, cytosolic gene (BCAT1) and the solute carrier family 14, member 2 gene (SLC14A2) in the regulation of albumin levels (Table 1, Figure 8). Fibrinogen yielded 5 results at the level P < 1.0 × 10−5 (Table 1, Figure 9). Two of them, located close to each other on chromosome X, did not lie within any known gene (the nearest, X-linked neuroligin 4, is 73-74 kilobases away). The other results highlighted possible roles of the nuclear receptor subfamily 3, group C, member 2 gene (NR3C2) on chromosome 4, the glutamate receptor 6 gene (GRIK2) on chromosome 6, and the proprotein convertase subtilisin/kexin type 2 gene (PCSK2) on chromosome 20.
Figure 8. Results of the genome-wide association study of serum albumin levels (g/L), plotted using Haploview software, showing peaks on chromosomes 12 and 18 that exceed the suggestive threshold of P < 10−5.
Discussion
Our study of nearly 1000 examinees focused on 7 standard biochemical measurements and revealed a total of 31 SNPs associated with the investigated traits at the level of P < 1.00 × 10−5. Nine of those SNPs (nearly one-third) implicated SLC2A9 in uric acid regulation, replicating previous findings in other populations. The remaining 22 associations had P values between 1.00 × 10−5 and 5.5 × 10−7. One of them replicated the association between CETP and HDL (47,48). Three recent GWASs identified nearly 20 new loci affecting total cholesterol, LDL cholesterol, HDL cholesterol, and triglycerides (18,49,50), and all 3 confirmed the association between the CETP gene and HDL cholesterol. Apart from CETP, however, the loci identified in these 3 studies differ from the 15 lipid-associated SNPs reported in our study. The replication of the SLC2A9 and CETP findings adds to the credibility of the other findings reported in this study and of the genome-wide association approach in general.
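To put the P < 1.00 × 10−5 threshold in context, a Bonferroni correction that treats the 316 730 analyzed SNPs as independent tests (a conservative assumption, since neighboring SNPs are in linkage disequilibrium) gives a per-SNP threshold of 0.05/316 730, roughly 1.6 × 10−7. The sketch below applies that threshold to the P values transcribed from Table 1; it is purely illustrative and was not part of the original analysis, which applied no correction. The names `TABLE_1` and `bonferroni_hits` are hypothetical.

```python
ALPHA = 0.05
N_SNPS_TESTED = 316_730

# (trait, SNP, P value) rows transcribed from Table 1.
TABLE_1 = [
    ("total cholesterol", "rs7678182", 9.9e-6),
    ("total cholesterol", "rs1493762", 7.6e-6),
    ("total cholesterol", "rs10777332", 5.1e-6),
    ("total cholesterol", "rs10444502", 6.1e-6),
    ("total cholesterol", "rs8081810", 9.8e-6),
    ("total cholesterol", "rs2839619", 7.6e-6),
    ("LDL cholesterol", "rs4767631", 5.5e-7),
    ("LDL cholesterol", "rs10444502", 6.4e-7),
    ("LDL cholesterol", "rs2839619", 8.0e-6),
    ("HDL cholesterol", "rs4599440", 1.6e-6),
    ("HDL cholesterol", "rs871392", 3.0e-6),
    ("HDL cholesterol", "rs7499892", 9.0e-6),
    ("triglycerides", "rs346923", 1.6e-6),
    ("triglycerides", "rs10516430", 5.9e-6),
    ("triglycerides", "rs5982533", 6.6e-6),
    ("uric acid", "rs10805346", 8.4e-9),
    ("uric acid", "rs13129697", 2.5e-12),
    ("uric acid", "rs737267", 6.1e-9),
    ("uric acid", "rs4505821", 4.1e-6),
    ("uric acid", "rs13131257", 8.0e-9),
    ("uric acid", "rs6449213", 1.2e-9),
    ("uric acid", "rs1014290", 7.5e-12),
    ("uric acid", "rs733175", 1.8e-6),
    ("uric acid", "rs6820756", 1.4e-6),
    ("albumin", "rs10505955", 9.5e-6),
    ("albumin", "rs10502868", 6.5e-6),
    ("fibrinogen", "rs1490453", 3.1e-6),
    ("fibrinogen", "rs12207601", 2.4e-6),
    ("fibrinogen", "rs6044777", 7.9e-6),
    ("fibrinogen", "rs1840485", 3.2e-6),
    ("fibrinogen", "rs7885458", 2.9e-6),
]


def bonferroni_hits(rows, alpha=ALPHA, n_tests=N_SNPS_TESTED):
    """Return the Bonferroni threshold and the (trait, SNP) pairs below it."""
    threshold = alpha / n_tests
    return threshold, [(trait, snp) for trait, snp, p in rows if p < threshold]


threshold, hits = bonferroni_hits(TABLE_1)
```

With this threshold, only 6 of the 31 rows pass, all of them SLC2A9 SNPs for uric acid; the two KSR2 associations with LDL cholesterol (P = 5.5 × 10−7 and 6.4 × 10−7) fall just short.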
Two nearby SNPs in the KSR2 gene on chromosome 12 were associated with LDL cholesterol levels, and one of them (rs10444502) was also associated with total cholesterol levels. This is an intriguing finding: very little is known about the function of this gene in humans, but mice in which it has been knocked out show a striking phenotype of extreme obesity (51). Similarly, SNP rs2839619 in the PKNOX1 gene on chromosome 21 was associated with both total and LDL cholesterol levels. This is another intriguing finding. A role for this gene in cancer development has long been hypothesized without strong supporting evidence, but a recent study found that PKNOX1 deficiency induced protection from diabetes and increased insulin sensitivity through a p160-mediated mechanism (52). In addition, the third gene implicated in cholesterol regulation, PEMT, encodes a liver-specific enzyme that converts phosphatidylethanolamine to phosphatidylcholine (53). Mice lacking the PEMT gene have recently been shown to have reduced plasma levels of phosphatidylcholine and of cholesterol in HDL (53). This makes all 3 genes, KSR2, PKNOX1, and PEMT, interesting candidates for the regulation of cholesterol levels. Further replication and functional studies will be needed to confirm their role in cholesterol regulation in humans.
Among the other findings, 7 associations were more than 100 kilobases away from the closest known gene, and their importance remains to be replicated and explained. The remaining 8 findings implied possible associations between three genes (the deubiquitinating enzyme gene USP46; the exchange factor SMG P21 stimulatory GDP/GTP exchange protein gene RAP1GDS1; and zinc finger CCHC domain containing 16, ZCCHC16) and triglycerides; two genes (branched-chain-amino-acid aminotransferase, cytosolic, BCAT1; and solute carrier family 14 (urea transporter), member 2, SLC14A2) and albumin; and three genes (nuclear receptor subfamily 3, group C, member 2, NR3C2; glutamate receptor, ionotropic, kainate 2, GRIK2; and proprotein convertase subtilisin/kexin type 2, PCSK2) and fibrinogen. No information is currently available on the functions of these genes that would help explain a possible role in regulating human biochemical traits. The BCAT1 gene has been identified through genome-wide association studies as conferring risk of type 2 diabetes in several populations (54), while SLC14A2 is known only as a predicted urea transporter (55).
Some of the identified genes were implicated in more than one trait, which suggests that they may act on a property shared by these traits. Others, such as the PEMT gene, represent promising and biologically plausible functional candidates. The main limitation of this study is its potentially low statistical power, a consequence of the limited population size of any genetic isolate. The study targeted approximately 1000 examinees, and the sample could be expanded in the future to increase statistical power. Additionally, studies performed in isolated populations should always seek replication to ensure that the findings are representative of wider human populations and not limited to the specific circumstances of a particular isolate. Finally, all variants identified in this study require functional follow-up and replication in other populations to establish their true significance for human biochemical traits.
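The power limitation can be made concrete with a standard normal approximation for a 1-df test of a SNP that explains a fraction r2 of trait variance in n unrelated people. This is an illustrative sketch, not a calculation from the original study: the function name and the example variance fractions are hypothetical, and because examinees in an isolate are related, the effective sample size would be smaller than the 898 genotyped individuals, so real power would be somewhat lower.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def approx_power(n, r2, alpha):
    """Approximate power of a two-sided 1-df association test.

    n:     number of (assumed unrelated) individuals
    r2:    fraction of trait variance explained by the SNP
    alpha: significance threshold for a single test
    """
    ncp = n * r2 / (1 - r2)  # non-centrality of the 1-df chi-square statistic
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = math.sqrt(ncp)
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


genome_wide_alpha = 0.05 / 316_730  # Bonferroni over the analyzed SNPs
for r2 in (0.01, 0.02, 0.05):
    print(f"r2={r2:.2f}: power ~ {approx_power(898, r2, genome_wide_alpha):.2f}")
```

Under these assumptions, a SNP explaining 1% of trait variance would have power of only around 1% at the Bonferroni threshold with 898 examinees, which illustrates why most associations reported here fall short of formal genome-wide significance.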
Acknowledgment
I. R. is the Editor for International Health Issues in the Croatian Medical Journal, and S. J. is the Dean of the University of Split School of Medicine, one of the owners of the journal. To ensure that any possible conflict of interest has been addressed, this article was reviewed according to best practice guidelines of international editorial organizations.
This work was supported by grant 108-1080315-0302 to IR and several other grants from the Croatian Ministry for Science, Education, and Sport to the Croatian co-authors. It was also supported by grants from the Medical Research Council UK to HC, AFW, and IR, and by European Commission FP6 STRP grant number 018947 (LSHG-CT-2006-01947) to HC. The authors collectively thank the very large number of individuals who helped organize, plan, and carry out the field work related to the project and data management.