Novel Associations of CPS1, MUT, NOX4 and DPEP1 with Plasma Homocysteine in a Healthy Population: A Genome Wide Evaluation of 13,974 Participants in the Women’s Genome Health Study
Abstract
Background
Homocysteine is a sulfur amino acid whose plasma concentration has been associated with the risk of cardiovascular diseases, neural tube defects and loss of cognitive function in epidemiological studies. While genetic variants of MTHFR and CBS are known to influence homocysteine concentration, common genetic determinants of homocysteine remain largely unknown.
Methods and Results
To address this issue comprehensively, we performed a genome wide association analysis, testing 336,469 SNPs in 13,974 healthy Caucasian women. While we confirm association with MTHFR (1p36.22; rs1801133; p=8.1 × 10−35) and CBS (21q22.3; rs6586282; p=3.2 × 10−10), we found novel associations with CPS1 (2q34; rs7422339; p=1.9 × 10−11), MUT (6p12.3; rs4267943; p=2.0 × 10−9), NOX4 (11q14.3; rs11018628; p=9.6 × 10−12) and DPEP1 (16q24.3; rs1126464; p=1.2 × 10−12). The associations at MTHFR, DPEP1 and CBS were replicated in an independent sample from the PROCARDIS study, whereas the association at CPS1 was only replicated among the women.
Conclusions
These associations offer new insights into the biochemical pathways involved in homocysteine metabolism and provide opportunities to better delineate the role of homocysteine in health and disease.
INTRODUCTION
Homocysteine is a non-protein-forming sulfur amino acid produced during the catabolism of methionine. Its concentration is tightly regulated and kept at low levels through catabolism by either remethylation or transsulfuration. The small amount of homocysteine found in plasma is the result of a cellular export mechanism that complements the remethylation and transsulfuration pathways in maintaining low intra-cellular concentration of this potentially cytotoxic and pro-oxidant amino acid1–3. Plasma homocysteine levels are influenced by genetic as well as environmental factors, such as age, sex, smoking status, intake of folate and intake of B vitamins. Homocysteine concentration has been epidemiologically correlated with the risk of cardiovascular diseases4, 5, neural tube defects 6 and loss of cognitive functions 7.
Despite heritability estimates ranging from 25% to 44% 8, 9, relatively little is known of the genetic determinants of homocysteine levels. Linkage scans have revealed linkage signals at 11q23, 12q14, 13q31 and 16q12, although these remain un-validated to date 8, 9. Rare homozygous defects in genes encoding for enzymes of homocysteine metabolism (i.e. CBS and MTHFR) lead to dramatically increased homocysteine concentration and premature occlusive vascular disease 1. However, few common polymorphisms have been unequivocally associated with homocysteine concentration. Among these, the strongest is the MTHFR SNP rs1801133 (C677T) correlated with reduced enzymatic activity and higher homocysteine levels 10. MTHFR encodes the enzyme methylenetetrahydrofolate reductase, which catalyses the conversion of 5,10-methylenetetrahydrofolate to 5-methyltetrahydrofolate, a cosubstrate for homocysteine remethylation to methionine by methionine synthase. Along with MTHFR, genetic variations at loci such as CBS11, 12 (cystathionine beta-synthase), MTR 13 (5-methyltetrahydrofolate-homocysteine methyltransferase) and MTRR 14 (5-methyltetrahydrofolate-homocysteine methyltransferase reductase) have also been reported to influence homocysteine concentration. In an effort to identify new genetic variants influencing homocysteine concentration, we performed a genome wide association analysis, testing 336,469 SNPs in 13,974 healthy women. Because homocysteine is a direct intermediary of metabolic pathways, this offers the possibility of improving our knowledge not only of homocysteine metabolism but also of the genetic architecture of metabolic traits in general. Furthermore, finding novel genetic associations with homocysteine concentration will provide opportunities to better delineate the role of homocysteine in health and disease.
METHODS
WGHS Study Participants
The study population derived from the Women’s Genome Health Study (WGHS)15. Briefly, participants in the WGHS include American women from the Women’s Health Study (WHS) with no prior history of cardiovascular disease, cancer, or other major chronic illness who also provided a baseline blood sample at the time of study enrollment from which genomic DNA was extracted16. Individuals with prevalent diabetes were excluded from analysis. The study was approved by the institutional review board of the Brigham and Women’s Hospital.
EDTA blood samples were obtained at the time of enrollment and stored in vapor phase liquid nitrogen (−170 °C). The concentration of homocysteine was determined using an enzymatic assay on the Hitachi 917 analyzer (Roche Diagnostics) using reagents and calibrators from Catch, Inc17. The sensitivity of the homocysteine assay is 0.24 μmol/L and its coefficient of variation is 1.8% at 15.6 μmol/L. Participants also completed a 131-item semiquantitative food frequency questionnaire at baseline 18 from which intake of folate, vitamin B6 and vitamin B12 was evaluated. Nutrient intake assessments made by use of the food frequency questionnaire used have been previously shown to be valid and reliable19, 20, with correlation coefficients of 0.49–0.55 for estimated folate and B-vitamin intake with this questionnaire method compared with measured plasma concentrations in apparently healthy female professionals similar to our study participants21, 22.
Genotyping
DNA samples were genotyped with the Infinium II technology from Illumina. Either the HumanHap300 Duo-Plus chip or the combination of the HumanHap300 Duo and I-Select chips was used. In either case, the custom content was identical and consisted of candidate SNPs chosen without regard to allele frequency to increase coverage of genetic variation with impact on biological function including metabolism, inflammation or cardiovascular diseases. Specifically, all UCSC coding non-synonymous SNPs, UCSC splice-site SNPs, UCSC SNPs with phenotype annotation, and SNPs from targeted cardiovascular diseases, diabetes, venous thromboembolism, blood pressure and inflammation candidate genes were selected. Furthermore, all SNPs with functional effects according to Online Mendelian Inheritance in Man (OMIM) were included, irrespective of genic region. Both OMIM and UCSC databases were accessed in December 2006. Genotyping at 318,237 HumanHap300 Duo SNPs and 45,571 custom content SNPs was attempted, for a total of 363,808 SNPs. Genetic context for all annotations is derived from human genome build 36.1 and dbSNP build 126.
SNPs with call rates <90% were excluded from further analysis. Likewise, all samples with percentage of missing genotypes higher than 2% were removed. Among retained samples, SNPs were further evaluated for deviation from Hardy-Weinberg equilibrium using an exact method23 and were excluded when the P-value was lower than 10−5. Samples were further validated by comparison of genotypes at 44 SNPs that had been previously ascertained using alternative technologies. SNPs with minor allele frequency >1% in Caucasians were used for analysis. After quality control, 307,805 HumanHap300 Duo SNPs and 28,664 custom content SNPs were left, for a total of 336,469 SNPs with a mean call rate of 99.6%.
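The SNP-level filters described above can be sketched in code. This is a minimal illustration and not the pipeline used in the study: the function names `hwe_exact_p` and `passes_qc` are hypothetical, and the exact test shown is one common formulation (summing the probabilities of every heterozygote count that is no more likely than the observed one), which is assumed rather than confirmed to match the cited exact method. At least one genotyped sample is assumed.

```python
def hwe_exact_p(n_het, n_hom_a, n_hom_b):
    """Exact Hardy-Weinberg P-value from genotype counts (illustrative sketch)."""
    n_hom_rare = min(n_hom_a, n_hom_b)
    n_hom_common = max(n_hom_a, n_hom_b)
    n = n_het + n_hom_rare + n_hom_common
    rare = 2 * n_hom_rare + n_het  # copies of the rarer allele

    # Relative probability of each possible heterozygote count, anchored at a
    # count near the expectation and filled outward by recurrence.
    probs = [0.0] * (rare + 1)
    mid = rare * (2 * n - rare) // (2 * n)
    if mid % 2 != rare % 2:  # heterozygote count must share parity with `rare`
        mid += 1
    probs[mid] = 1.0

    # Walk down toward fewer heterozygotes.
    het, hom_r = mid, (rare - mid) // 2
    hom_c = n - het - hom_r
    while het > 1:
        probs[het - 2] = probs[het] * het * (het - 1) / (4.0 * (hom_r + 1) * (hom_c + 1))
        het, hom_r, hom_c = het - 2, hom_r + 1, hom_c + 1

    # Walk up toward more heterozygotes.
    het, hom_r = mid, (rare - mid) // 2
    hom_c = n - het - hom_r
    while het <= rare - 2:
        probs[het + 2] = probs[het] * 4.0 * hom_r * hom_c / ((het + 2) * (het + 1))
        het, hom_r, hom_c = het + 2, hom_r - 1, hom_c - 1

    total = sum(probs)
    observed = probs[n_het]
    return min(1.0, sum(p for p in probs if p <= observed) / total)


def passes_qc(call_rate, maf, hwe_p,
              min_call_rate=0.90, min_maf=0.01, min_hwe_p=1e-5):
    """Apply the three SNP filters named in the text (thresholds from the text)."""
    return call_rate >= min_call_rate and maf > min_maf and hwe_p >= min_hwe_p
```

Under these thresholds a SNP with a Hardy-Weinberg P-value of 5.6 × 10−5, such as rs4267943 in Table 2, is retained because it lies above the 10−5 cut-off.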
Population Stratification
Because population stratification can result in inflated type I error in genome wide association analysis, a principal component analysis using 1443 ancestry informative SNPs was performed using PLINK24 in order to confirm self-reported ancestry. Briefly, these SNPs were chosen based on Fst > 0.4 in HapMap populations (YRI, CEU, CHB+JPT) and inter-SNP distance at least 500 kb in order to minimize linkage disequilibrium. Different ethnic groups were clearly distinguished with the first two components. Based on this analysis, 55 participants were excluded from further evaluation as they did not cluster with other Caucasians, leaving 13,974 for the current study population. Two additional steps were taken to rule out the possibility that residual stratification within Caucasians was responsible for the associations observed. First, association analysis was done with correction by genomic control. Second, a principal component analysis25 was performed in previously identified Caucasians (only) using 64,221 SNPs chosen to have pair-wise linkage disequilibrium lower than r2=0.2. The first ten components were then used as covariates in the association analysis. As adjustment by these covariates did not change the conclusions, we present analysis among Caucasian participants without further correction for sub-Caucasian ancestry.
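As an illustration of the genomic control step, the sketch below computes the usual median-based inflation factor and rescales the test statistics. The text does not describe the exact procedure, so the median-based estimator, the function names and the choice not to deflate when the factor is below 1 are all assumptions made for this example.

```python
import math
from statistics import NormalDist, median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def chi2_from_p(p):
    """Convert a two-sided P-value (0 < p <= 1) to a 1-df chi-square statistic."""
    z = NormalDist().inv_cdf(p / 2.0)
    return z * z


def p_from_chi2(chi2):
    """Upper-tail P-value of a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def genomic_control(p_values):
    """Return (inflation factor, genomic-control-adjusted P-values)."""
    chi2 = [chi2_from_p(p) for p in p_values]
    inflation = median(chi2) / CHI2_1DF_MEDIAN
    scale = max(inflation, 1.0)  # assumption: never make P-values smaller
    return inflation, [p_from_chi2(x / scale) for x in chi2]
```

An inflation factor close to 1 indicates that the bulk of the test statistics follow the null distribution, which is the pattern expected when residual stratification is negligible.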
Association Analysis
To identify common genetic variants influencing homocysteine levels, we first attempted to discover which loci significantly contributed to homocysteine concentration. Log-transformed plasma concentration of homocysteine was adjusted for age, body mass index, smoking status, hormone replacement therapy use, folate intake, vitamin B6 intake and vitamin B12 intake using a linear regression model in R. These covariates were chosen as they are potential environmental confounders of homocysteine levels, based on available literature26–29. This was done to reduce the impact of clinical covariates on homocysteine variance. The adjusted homocysteine values were then tested for association with SNP genotypes by linear regression in PLINK24, assuming an additive contribution of each minor allele. Beta coefficients of natural log-transformed adjusted homocysteine values are given using the major allele as the reference allele (major and minor alleles are presented in Supplementary Table 1). A conservative P-value cut-off of 5×10−8 was used to correct for the maximum of 1,000,000 independent statistical tests thought to correspond to all the common genetic variation of the human genome 30.
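The threshold arithmetic and the additive genetic model can be illustrated with a short sketch. The study used R and PLINK; the function below is only a stand-in that fits a simple linear regression of an already-adjusted phenotype on the minor allele count (0, 1 or 2). The name `additive_regression` is hypothetical, and the P-value uses a normal approximation that is reasonable only for large samples.

```python
import math

ALPHA = 0.05
N_INDEPENDENT_TESTS = 1_000_000
GENOME_WIDE_THRESHOLD = ALPHA / N_INDEPENDENT_TESTS  # Bonferroni: 5e-8


def additive_regression(minor_allele_counts, adjusted_phenotype):
    """Regress phenotype on minor allele count; return (beta, se, p)."""
    n = len(minor_allele_counts)
    mean_x = sum(minor_allele_counts) / n
    mean_y = sum(adjusted_phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in minor_allele_counts)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(minor_allele_counts, adjusted_phenotype))
    beta = sxy / sxx  # change in phenotype per copy of the minor allele
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2
              for x, y in zip(minor_allele_counts, adjusted_phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided, normal approximation
    return beta, se, p
```

With this coding the major allele homozygote is the reference group, which matches the convention stated above for the reported beta coefficients.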
Once any locus with genome wide significance was identified, a forward selection linear multiple regression model was used to further define the extent of the genetic association. Briefly, all genotyped SNPs within 100 kb of a SNP with genome wide association (i.e. P < 5×10−8) and passing quality control requirements were tested for possible incorporation into a multiple regression model. In stepwise fashion, a SNP was added to the model if it had the smallest P-value among all the SNPs not yet included in the model and if its P-value was lower than 5 × 10−8. SNPs selected by this algorithm were also used in haplotype analysis using WHAP 24, as implemented in PLINK 24.
PROCARDIS Validation Sample
A subset of 840 postinfarction patients participating in the PROCARDIS study for whom homocysteine data were available was used for replication of findings in the WGHS. Detailed descriptions of the PROCARDIS population can be found elsewhere31, 32. In brief, cases included in the present report had a diagnosis of myocardial infarction at or before the age of 65 years and reported having European ancestry. Recruitment was carried out in Germany, Italy, Sweden and the UK. The PROCARDIS protocol was approved by the Ethics Committees of the participating institutions and all subjects gave written informed consent.
DNA was extracted and quantified as described31. Genotyping of a genome-wide set of 1,054,559 SNPs was performed using the Illumina Infinium II Human 1M bead chip at two national centres (Centre National de Génotypage, Paris, France and the SNP Technology Platform, Uppsala, Sweden). Subjects with % genotype calls <95% were excluded from further analysis as were male samples with significant X-chromosomal heterozygosity and samples with a possible population bias, as indicated from the identity by state cluster analysis. Also, when relatedness between subjects was suspected (up to 2nd degree relationships), one individual from the pair was removed.
Blood samples were obtained in the fasting state using EDTA tubes supplemented with 4 g/l sodium fluoride as an inhibitor of erythrocyte metabolism. Total homocysteine (free and protein bound) concentration was determined immediately after thawing frozen plasma samples which had been stored at −80 °C by fluorescence high-performance liquid chromatography (HPLC) using reagents and calibrators from Chromsystems (Munich, Germany). Isocratic reversed-phase chromatography was performed on a Kontron (Neufahrn, Germany) liquid chromatograph interfaced with a model RF-535 fluorescence detector (Shimadzu, Kyoto, Japan) by using excitation and emission wavelengths of 385 and 515 nm, respectively. The coefficients of variation within and between days for the assay were ≤ 4.8%.
Association analysis in the validation sample was performed on log-transformed plasma concentration of homocysteine, using a linear regression model that adjusted for age, sex, body mass index, smoking status and country of origin. Other linear models used included either of the MTHFR rs1801133 and MTHFR rs17350396 SNPs or both the rs1801133 and rs17350396 SNPs. For the analysis of sex-specific SNP-homocysteine associations, a general linear model was used comparing the main effects of SNPs on homocysteine adjusted for age, body mass index, smoking status and country of origin within gender subgroups. Statistical analyses were performed in SPSS 16.0 for Windows®.
Statement of Responsibility
The authors had full access to the data and take responsibility for its integrity. All authors have read and agree to the manuscript as written.
RESULTS
Clinical characteristics of the 13,974 healthy women in the WGHS sample are provided in Table 1. Results of the genome wide association analysis of log-transformed adjusted homocysteine concentration are presented in Table 2. Twenty-nine SNPs at six different loci – MTHFR, CPS1, MUT, NOX4, DPEP1 and CBS - had association P-values lower than our previously defined genome wide significance threshold of 5 × 10−8: 15 SNPs at the MTHFR locus; 1 SNP at the CPS1 locus; 1 SNP at the MUT locus; 6 SNPs at the NOX4 locus; 3 SNPs at the DPEP1 locus; 1 SNP at the CBS locus. Genetic context of all these loci is presented in Figure 1 along with the − log10 transformed P-values. In addition, 8 SNPs had association P-values just above this threshold level (between 10−6 and 5 × 10−8), including the AKAP13 SNP rs2061821 on chromosome 15q25.3 (the remaining 7 SNPs were in the vicinity of one of the six previously identified loci). Among these SNPs, only rs4267943 (MUT; 6p12.3) deviated from Hardy-Weinberg equilibrium (p=0.00006), but visual inspection of the raw genotyping signal for this SNP did not reveal any obvious artifact. Furthermore, other SNPs at this locus (rs2501976, rs6458690 and rs9473558; see Table 2) did not deviate from Hardy-Weinberg equilibrium while being significantly associated with homocysteine level (P-values ranging from 2.1 × 10−7 to 5.2 × 10−8) and in linkage disequilibrium with rs4267943 (r2 > 0.95 for all three pairwise comparisons).
Figure 1
Genomic context for each of six loci with significant association with homocysteine concentration. (A) MTHFR locus (1p36.22); (B) CPS1 locus (2q34); (C) MUT locus (6p12.3); (D) NOX4 locus (11q14.3); (E) DPEP1 locus (16q24.3); (F) CBS locus (21q22.3). Upper panel: Genes from RefSeq release 25. Only one isoform is shown when multiple splicing variants are known. Lower panel: SNPs are shown according to their physical location and −log10 P-values for association with homocysteine (red dots). The red line represents the genome-wide significance threshold of 5 × 10−8. Also shown is the genetic distance in cM from the lowest P-value SNP (light grey line) along with the position of recombination hotspots (light grey vertical bars). Recombination rates and hotspots are based on HapMap data, as described by McVean et al.54 and Winckler et al.55.
Table 1
Clinical characteristics of the samples used.
| | WGHS (N=13,974) | PROCARDIS (N=840) |
|---|---|---|
| % Female Patients | 100% | 19.2% |
| Age (yrs.) | 54.7 (7.0) | 61.0 (7.0) |
| BMI (kg/m^2) | 25.8 (4.7) | 28.0 (4.2) |
| Menopause* | 54.6% | NA |
| HRT† | 44.9% | NA |
| Smoking | 11.6% | 25.0% |
| Homocysteine (μmol/L) | 11.4 (4.7) | 13.3 (6.0) |
| Folate Intake (μg/day) | 434 (227) | NA |
| Vitamin B6 Intake (mg/day) | 5.2 (17.4) | NA |
| Vitamin B12 Intake (μg/day) | 8.6 (9.6) | NA |
Results are given as mean (standard deviation) or percentage, as appropriate.
Table 2
SNPs with a P-Value Lower than 10−6 for Association with Homocysteine Concentration.
| SNP | Region | Position (kb) | Nearest Gene (Candidate Gene) | Function* | MAF† | HW‡ | Beta§ | P-Value§ |
|---|---|---|---|---|---|---|---|---|
| rs12564559 | 1p36.22 | 11748.1 | AGTRAP (MTHFR) | intron | 0.11 | 0.58 | −0.034 | 9.4E-09 |
| rs3737967 | 1p36.22 | 11770.0 | MTHFR | CNS (R492H) | 0.05 | 0.52 | −0.063 | 4.1E-13 |
| rs2274976 | 1p36.22 | 11773.5 | MTHFR | CNS (R594Q) | 0.05 | 0.35 | −0.062 | 1.9E-12 |
| rs1801133 | 1p36.22 | 11779.0 | MTHFR | CNS (A222V) | 0.33 | 0.02 | 0.048 | 8.1E-35 |
| rs17367504 | 1p36.22 | 11785.4 | MTHFR | intron | 0.16 | 0.26 | −0.035 | 5.4E-12 |
| rs2076003 | 1p36.22 | 11806.7 | CLCN6 (MTHFR) | intron | 0.05 | 0.72 | −0.062 | 9.4E-13 |
| rs7537765 | 1p36.22 | 11809.9 | CLCN6 (MTHFR) | intron | 0.16 | 0.29 | −0.035 | 2.6E-12 |
| rs198414 | 1p36.22 | 11823.4 | CLCN6 (MTHFR) | untranslated | 0.14 | 0.97 | −0.031 | 1.1E-08 |
| rs198358 | 1p36.22 | 11826.7 | CLCN6 (MTHFR) | - | 0.24 | 0.41 | −0.034 | 2.7E-15 |
| rs5065 | 1p36.22 | 11828.7 | NPPA (MTHFR) | CNS (*152R) | 0.15 | 0.74 | −0.028 | 9.0E-08 |
| rs5063 | 1p36.22 | 11830.2 | NPPA (MTHFR) | CNS (V32M) | 0.05 | 0.46 | −0.055 | 5.6E-10 |
| rs1999594 | 1p36.22 | 11881.8 | NPPB (MTHFR) | - | 0.46 | 0.77 | −0.025 | 8.5E-12 |
| rs2639453 | 1p36.22 | 11905.1 | KIAA2013 (MTHFR) | intron | 0.22 | 0.34 | 0.032 | 6.7E-13 |
| rs6682554 | 1p36.22 | 11909.2 | KIAA2013 (MTHFR) | - | 0.45 | 0.36 | 0.024 | 2.7E-10 |
| rs7550536 | 1p36.22 | 11989.6 | MFN2 (MTHFR) | intron | 0.40 | 0.70 | 0.021 | 1.4E-08 |
| rs7422339 | 2q34 | 211248.8 | CPS1 | CNS (T1406N) | 0.31 | 0.92 | 0.027 | 1.9E-11 |
| rs2501976 | 6p12.3 | 49478.9 | MUT | - | 0.37 | 0.15 | 0.021 | 5.3E-08 |
| rs6458690 | 6p12.3 | 49519.6 | MUT | intron | 0.37 | 0.27 | 0.021 | 5.9E-08 |
| rs9473558 | 6p12.3 | 49520.4 | MUT | CNS (R532H) | 0.37 | 0.17 | 0.020 | 2.1E-07 |
| rs4267943 | 6p12.3 | 49547.8 | CENPQ (MUT) | CNS (G63R) | 0.36 | 5.6E-05 | 0.024 | 2.0E-09 |
| rs317191 | 11q14.3 | 88730.5 | NOX4 | intron | 0.07 | 0.64 | −0.048 | 1.1E-10 |
| rs317150 | 11q14.3 | 88763.4 | NOX4 | intron | 0.08 | 0.63 | −0.041 | 5.3E-10 |
| rs7929532 | 11q14.3 | 88778.6 | NOX4 | intron | 0.07 | 0.53 | −0.048 | 4.4E-11 |
| rs9299894 | 11q14.3 | 88796.3 | NOX4 | intron | 0.07 | 0.14 | −0.050 | 9.8E-12 |
| rs317148 | 11q14.3 | 88810.2 | NOX4 | intron | 0.44 | 0.73 | −0.019 | 5.1E-07 |
| rs319016 | 11q14.3 | 88820.8 | NOX4 | intron | 0.47 | 0.32 | 0.019 | 1.8E-07 |
| rs10501705 | 11q14.3 | 88827.9 | NOX4 | intron | 0.07 | 0.27 | −0.050 | 1.0E-11 |
| rs11018628 | 11q14.3 | 88846.2 | NOX4 | intron | 0.07 | 0.18 | −0.050 | 9.6E-12 |
| rs2061821 | 15q25.3 | 83923.7 | AKAP13 | CNS (M452T) | 0.38 | 0.71 | 0.019 | 5.5E-07 |
| rs1126464 | 16q24.3 | 88231.9 | DPEP1 | CNS (E351Q) | 0.24 | 0.93 | −0.031 | 1.2E-12 |
| rs460879 | 16q24.3 | 88240.4 | CHMP1A (DPEP1) | intron | 0.46 | 0.19 | 0.026 | 3.8E-12 |
| rs459920 | 16q24.3 | 88258.3 | C16orf55 (DPEP1) | intron | 0.44 | 0.36 | 0.024 | 6.5E-11 |
| rs2377058 | 16q24.3 | 88262.3 | C16orf55 (DPEP1) | intron | 0.36 | 0.86 | −0.019 | 5.6E-07 |
| rs6586282 | 21q22.3 | 43351.6 | CBS | intron | 0.18 | 0.37 | −0.030 | 3.2E-10 |
To further define the extent of genetic associations at the 6 loci with association P-value lower than 5×10−8, we applied a forward model selection algorithm to each of them in order to identify SNPs non-redundantly associated with homocysteine. Briefly, 97 SNPs at MTHFR, 26 at CPS1, 20 at MUT, 32 at NOX4, 35 at DPEP1 and 34 at CBS were initially assessed for possible inclusion in a multiple linear regression model. Using a P-value cut-off of 5 × 10−8, 7 SNPs were selected, representing the lead SNP at each of the 6 loci considered plus one SNP (rs17350396) at the MTHFR locus (see Table 3). Interestingly, this latter SNP (rs17350396; MTHFR) was only marginally significant in univariable analysis (p=0.008), illustrating that its inclusion in the model and significant association are conditional on the genotypes at rs1801133 (MTHFR). Multiple regression beta coefficients and P-values of the 7 SNPs selected are shown in Table 3. Illustrated in Figure 2 are the quantile-quantile plots of association P-values before and after adjusting homocysteine concentration for the combined effect of these 7 SNPs. Among these SNPs, only rs1801133 (MTHFR) showed evidence (p=0.002) for non-additive effects of the minor allele as judged by a likelihood ratio test comparing the additive regression model to an alternative genotype model with an additional degree of freedom. Specifically, the association tended toward a recessive genetic model, with mean log-transformed adjusted homocysteine values of −0.023 (N=6246) for major allele homozygotes, 0.011 (N=6109) for heterozygotes and 0.087 (N=1619) for minor allele homozygotes. However, according to the Bayesian Information Criterion, the additive model still provided the best fit to the data as compared to purely recessive or dominant models. Importantly, use of a 2 degrees of freedom model did not change the result of the model selection algorithm and the genetic effect is therefore assumed to be additive for the remainder of this report.
No gene-gene interaction was observed between any of the model selected SNPs.
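The genotype-group means reported for rs1801133 are enough to illustrate both the additive estimate and the recessive tendency. The sketch below is illustrative only: it recomputes a per-allele slope from the three group means and sizes (which gives the same slope as ordinary least squares on individual-level data), and it measures how far the heterozygote mean sits from the midpoint of the two homozygote means. The function names are hypothetical, and the rounded means from the text are used as inputs.

```python
# (minor allele count, N, mean log-transformed adjusted homocysteine)
GENOTYPE_GROUPS = [
    (0, 6246, -0.023),
    (1, 6109, 0.011),
    (2, 1619, 0.087),
]


def additive_slope(groups):
    """Per-allele slope from group means, weighting each group by its size."""
    n = sum(size for _, size, _ in groups)
    mean_x = sum(x * size for x, size, _ in groups) / n
    mean_y = sum(y * size for _, size, y in groups) / n
    sxy = sum(size * (x - mean_x) * (y - mean_y) for x, size, y in groups)
    sxx = sum(size * (x - mean_x) ** 2 for x, size, _ in groups)
    return sxy / sxx


def dominance_deviation(groups):
    """Heterozygote mean minus the midpoint of the two homozygote means."""
    means = {x: y for x, _, y in groups}
    return means[1] - (means[0] + means[2]) / 2.0


beta = additive_slope(GENOTYPE_GROUPS)
deviation = dominance_deviation(GENOTYPE_GROUPS)
```

The slope obtained this way is approximately 0.048, in line with the univariable beta in Table 2, and the deviation is negative (the heterozygote mean lies closer to the major allele homozygotes), which is the pattern described in the text as tending toward a recessive model.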
Figure 2
The quantile-quantile plot of homocysteine association P-values is shown on the left. On the right, the same quantile-quantile plot is shown, but after adjusting homocysteine values for the 7 SNPs retained by the model selection algorithm (see text for details).
Table 3
Multiple Linear Regression Statistics of SNPs Retained by the Forward Model Selection Algorithm.
| SNP | Chromosome | Position (kb) | Nearest Gene (Candidate Gene) | Function* | MAF† | HW‡ | Beta§ | P-Value§ |
|---|---|---|---|---|---|---|---|---|
| rs1801133 | 1p36.22 | 11779.0 | MTHFR | CNS | 0.33 | 0.02 | 0.057 | 1.7E-44 |
| rs17350396 | 1p36.22 | 11823.4 | CLCN6 (MTHFR) | untranslated | 0.17 | 0.82 | 0.036 | 4.2E-12 |
| rs7422339 | 2q34 | 211248.8 | CPS1 | CNS | 0.31 | 0.92 | 0.027 | 1.9E-11 |
| rs4267943 | 6p12.3 | 49547.8 | CENPQ (MUT) | CNS | 0.36 | 0.00 | 0.024 | 2.0E-09 |
| rs11018628 | 11q14.3 | 88846.2 | NOX4 | intron | 0.07 | 0.18 | −0.050 | 9.6E-12 |
| rs1126464 | 16q24.3 | 88231.9 | DPEP1 | CNS | 0.24 | 0.93 | −0.031 | 1.2E-12 |
| rs6586282 | 21q22.3 | 43351.6 | CBS | intron | 0.18 | 0.37 | −0.030 | 3.2E-10 |
The 2 SNPs at the MTHFR locus selected by our algorithm were also used in haplotype analysis (Table 4). The estimate of the proportion of variance attributable to haplotypes, as well as their regression coefficients, is consistent with the linear model of these same SNPs, reinforcing the adequacy of a multiple regression model to explain the association (as compared to the haplotype analysis). Linkage disequilibrium between these two MTHFR SNPs was 0.097 for r2 and 0.978 for D’. The 2 SNPs at MTHFR collectively explained 1.3% of the total variance in homocysteine concentration, whereas the CPS1 SNP (rs7422339) explained 0.3%, the MUT SNP (rs4267943) 0.3%, the NOX4 SNP (rs11018628) 0.3%, the DPEP1 SNP (rs1126464) 0.3% and the CBS SNP (rs6586282) 0.2%. In comparison, clinical covariates accounted for 8.7% of the variance (Table 5), and together the candidate loci and the clinical variables accounted for 11.3% of total variance.
Table 4
Haplotype Analysis of rs1801133 and rs17350396 (MTHFR Locus).
| rs1801133 allele | rs17350396 allele | Frequency | Beta* | P-Value* |
|---|---|---|---|---|
| G | G | 0.17 | 0.013 | 1.1E-02 |
| A | A | 0.33 | 0.048 | 3.3E-35 |
| G | A | 0.50 | −0.051 | 8.6E-43 |
Omnibus (2 df) P-Value = 5.5 E-45
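The linkage disequilibrium between the two MTHFR SNPs can be approximated from the haplotype frequencies in Table 4. The sketch below is an illustration using the standard definitions of D, D’ and r2; the function name `ld_from_haplotypes` is hypothetical. It assumes the unobserved fourth haplotype (A at rs1801133 with G at rs17350396) has frequency 0, and it works from frequencies rounded to two decimals, so it can only approximate the reported r2 of 0.097 and D’ of 0.978.

```python
def ld_from_haplotypes(f_gg, f_ga, f_ag, f_aa):
    """Return (D, D', r2) from four two-SNP haplotype frequencies.

    Argument names give the allele at the first SNP, then the second SNP.
    """
    p_g1 = f_gg + f_ga  # frequency of G at the first SNP
    p_g2 = f_gg + f_ag  # frequency of G at the second SNP
    d = f_gg - p_g1 * p_g2
    if d >= 0:
        d_max = min(p_g1 * (1 - p_g2), (1 - p_g1) * p_g2)
    else:
        d_max = min(p_g1 * p_g2, (1 - p_g1) * (1 - p_g2))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_g1 * (1 - p_g1) * p_g2 * (1 - p_g2))
    return d, d_prime, r2


# Table 4 frequencies; the A-G haplotype is assumed absent.
d, d_prime, r2 = ld_from_haplotypes(f_gg=0.17, f_ga=0.50, f_ag=0.0, f_aa=0.33)
```

A D’ near 1 combined with a low r2 is what one expects when one of the four possible haplotypes is essentially absent while the two SNPs have quite different allele frequencies.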
Table 5
Partition of Homocysteine Variance According to Genetic and Clinical Variables.
| Category | Variable | Variable R2 | Category R2 |
|---|---|---|---|
| Clinical Covariates | Age | 0.0063 | 0.0871 |
| | Body Mass Index | 0.0020 | |
| | Smoking Status | 0.0226 | |
| | HRT Status | 0.0074 | |
| | Folate Intake | 0.0483 | |
| | Vitamin B6 Intake | 0.0002 | |
| | Vitamin B12 Intake | 0.0003 | |
| 1p36.22 (MTHFR) Locus | rs1801133 | 0.0095 | 0.0126 |
| | rs17350396 | 0.0030 | |
| 2q34 (CPS1) Locus | rs7422339 | 0.0027 | 0.0027 |
| 6p12.3 (MUT) Locus | rs4267943 | 0.0027 | 0.0027 |
| 11q14.3 (NOX4) Locus | rs11018628 | 0.0029 | 0.0029 |
| 16q24.3 (DPEP1) Locus | rs1126464 | 0.0029 | 0.0029 |
| 21q22.3 (CBS) Locus | rs6586282 | 0.0023 | 0.0023 |
| TOTAL | | | 0.1131 |
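The partition in Table 5 can be checked with simple arithmetic. The sketch below sums the category-level R2 values as printed; the variable names are illustrative. Because the inputs are rounded to four decimals, the recomputed total (0.1132) differs from the reported total (0.1131) in the last digit.

```python
# Category-level R2 values copied from Table 5.
CLINICAL_R2 = 0.0871
LOCUS_R2 = {
    "MTHFR": 0.0126,
    "CPS1": 0.0027,
    "MUT": 0.0027,
    "NOX4": 0.0029,
    "DPEP1": 0.0029,
    "CBS": 0.0023,
}

genetic_r2 = sum(LOCUS_R2.values())   # variance explained by the six loci
total_r2 = CLINICAL_R2 + genetic_r2   # clinical covariates plus loci

# Share of the total explained variance that is genetic.
genetic_share = genetic_r2 / total_r2
```

The six loci together account for about 2.6% of the variance in homocysteine concentration, roughly a quarter of what the model explains overall.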
Replication of the model selected SNPs was attempted in the PROCARDIS sample (see Table 1 for clinical characteristics of PROCARDIS participants). First, the two MTHFR SNPs, rs1801133 and rs17350396, were included in a multiple regression linear model that used homocysteine concentration as the dependent variable. Both SNPs were significantly associated with 2-sided P-values of 0.0001 (Beta=0.15) and 0.02 (Beta=0.09), respectively (and consistent direction of effect). Of note, when each SNP was included separately in the linear model, only rs1801133 was associated (P=0.0001) whereas rs17350396 was non-significant (P=0.433). Because the DPEP1 SNP rs1126464 did not pass genotyping quality control in the PROCARDIS sample, we tested for association with rs460879, the second most significant SNP at the DPEP1 locus in WGHS and the SNP with the strongest linkage disequilibrium based on D’ (P=3.8 × 10−12; D’=0.99 and r2=0.25 between rs1126464 and rs460879). The association P-value was 0.001 (Beta=0.12), with consistent direction of effect. The CBS SNP rs6586282 was also replicated with a P-value of 0.02 (Beta=−0.07), again with consistent direction of effect. Finally, while the CPS1 SNP rs7422339 was non-significant when men (N=658) and women (N=161) were considered together, the association was significant in the women considered on their own (P=0.009; Beta=0.2; direction of effect consistent) with a sex-SNP interaction P-value of 0.0003. No other sex interaction was noted. Associations with the MUT SNP rs4267943 and NOX4 SNP rs11018628 were non-significant (P>0.05).
DISCUSSION
Four loci – MTHFR, CBS, DPEP1 and CPS1 – have been identified and confirmed in this report for association with homocysteine levels, with the CPS1 association being sex-specific. While genetic variants of MTHFR and CBS are known to influence homocysteine metabolism, the other associations are novel. DPEP1 (dipeptidase 1) is a kidney membrane enzyme that is highly expressed in the proximal convoluted tubules 33. It hydrolyzes a variety of dipeptides and is implicated in renal metabolism of glutathione and its conjugates, such as leukotrienes 34. Because DPEP1 deficiency leads to increased urinary excretion of cysteine 35, 36, a downstream metabolite of homocysteine, we hypothesize that the association between the DPEP1 coding non-synonymous (E351Q) SNP rs1126464 and homocysteine concentration could be the result of changes in the renal handling of amino acids. While DPEP1 offers a clear hypothesis for the observed genetic association, the high level of linkage disequilibrium in this region precludes the exclusion of other genes as mechanistically linked to homocysteine metabolism (see Figure 1-E).
Carbamoylphosphate synthetase I (CPS1) is a nuclear encoded mitochondrial matrix enzyme that catalyses the first and rate-limiting step of the hepatic urea cycle. The hepatic urea cycle is responsible for the elimination of ammonia in the form of urea as well as the synthesis of arginine, a precursor of the potent vasodilator nitric oxide. CPS1 synthesizes carbamoylphosphate from bicarbonate, ATP and ammonia using N-acetylglutamate as a cofactor. Its genetic deficiency results in a rare autosomal recessive disease characterized by episodes of hyperammonemia in the neonatal period, with elevated plasma glutamine and low or absent citrulline 37. The CPS1 SNP rs7422339 associated with homocysteine in a sex-specific manner in our study encodes the substitution of asparagine for threonine (T1405N) in the region critical for N-acetyl-glutamate binding and results in 20–30% higher enzymatic activity 38. This variation has been shown to influence nitric oxide metabolite concentrations and vasodilation following agonist stimulation 39. Furthermore, the same variant has been associated with the risk of pulmonary hypertension in the newborn 40 as well as the risk of veno-occlusive disease after bone marrow transplantation 38–41. Interestingly, nitrous oxide irreversibly inactivates the cytosolic enzyme methionine synthase by oxidizing enzyme-bound vitamin B12 42, 43. Methionine synthase is a vitamin B12 dependent enzyme that catalyzes the synthesis of methionine and tetrahydrofolate from homocysteine and 5-methyltetrahydrofolate, the entry point of homocysteine into the remethylation pathway. Consistent with this pathway, humans and laboratory animals subjected to nitrous oxide have elevated levels of plasma homocysteine 44–48. The MUT and NOX4 loci were associated with homocysteine levels in the WGHS sample, but these associations were not replicated in the PROCARDIS sample.
While rs4267943 is a non-synonymous coding SNP (G63R) in the gene CENPQ (centromere protein Q), the most likely candidate gene for its association with homocysteine is MUT. CENPQ itself encodes a protein responsible for proper kinetochore function and mitotic progression 49. Although we cannot exclude an effect on CENPQ, we note that rs4267943 is approximately 10 kb upstream of the MUT transcription start site and is in high linkage disequilibrium (r2 > 0.95) with other MUT genetic variants, including a non-synonymous coding SNP rs9473558 (R532H; P=2.1 × 10−7). MUT encodes the mitochondrial enzyme methylmalonyl-CoA mutase and, as such, catalyses the isomerization of methylmalonyl-CoA into succinyl-CoA. While its catalytic activity is hardly related to homocysteine, MUT has frequently been associated with homocysteine metabolism because it is one of three vitamin B12 dependent enzymes, along with methionine synthase and leucine aminomutase 50. Moreover, based on the observation that patients with vitamin B12 deficiency but high blood folate have higher plasma methylmalonic acid concentration (than low vitamin B12, low folate patients), it has been proposed that the enzymatic activity of methionine synthase interferes with the activity of methylmalonyl-CoA mutase through diversion of vitamin B12 from the mitochondrion to the cytosol 51. The genetic association observed in WGHS (but not in PROCARDIS) reinforces this hypothesis by suggesting that methylmalonyl-CoA mutase activity itself could impact on methionine synthase activity (and therefore homocysteine concentration) through a similar mechanism.
NOX4 encodes a recently described NADPH oxidase that is highly expressed in the kidney 52. As such, it catalyses the formation of the free-radical superoxide using O2 as an electron acceptor and NADPH as the donor. The exact role of NOX4 in normal physiology is yet to be determined, but NOX4 expression has recently been correlated with increased albuminuria in adiponectin deficient rats 53. For these reasons, the most compelling explanation for the observed genetic association between the intronic NOX4 SNP rs11018628 and plasma homocysteine concentration is regulation of renal handling of homocysteine by NOX4. This association, however, requires further validation given its failure to replicate in the PROCARDIS sample.
Despite the confirmatory nature of the MTHFR association, we also identified a novel MTHFR variant (rs17350396) that non-redundantly influenced homocysteine concentration after taking into account the effect of the known MTHFR SNP rs1801133 (C677T; A222V), paving the way for more complete assessment of the impact of this locus on human disease. While genetic variants of CBS have previously been reported to influence homocysteine concentration 11, 12, the intronic CBS SNP rs6586282 described in this report is either in unremarkable or unknown linkage disequilibrium with these other variations. CBS catalyses the first step in the transsulfuration pathway of homocysteine catabolism1.
Our study has potential limitations. First, even though all associations in WGHS had P-values lower than the conservative 5 × 10−8 threshold and care was taken to correct for the potential effect of population stratification, we were not able to confirm the MUT and NOX4 findings in the separate PROCARDIS sample. This might be the result of the much smaller size of the PROCARDIS sample. For instance, the power to replicate the MUT association at a nominal level of significance of 0.05 in PROCARDIS was 27% while it was 29% for NOX4. Second, because of the nature of the WGHS and PROCARDIS samples, extension of these associations to non-Caucasian populations will require further testing, even if we have no a priori reason to believe there should be heterogeneity. Third, the overall variance explained by these associations (2.6%) is small as compared to heritability estimates of homocysteine levels8, 9 (ranging from 25% to 44%), suggesting that many other genetic variants remain to be discovered. Fourth, the observation of a significant sex-interaction for the CPS1 association hints at different genetic architectures of homocysteine levels in men and women. Our study might therefore have missed male-specific associations. Finally, although many of these novel associations offer tantalizing hypotheses regarding homocysteine metabolism, further biological experiments are needed to define the processes underlying them.
In this report, we found associations of homocysteine with genetic variation at the MTHFR, CPS1, MUT, NOX4, DPEP1 and CBS loci, with independent replication of the MTHFR, CPS1, DPEP1 and CBS findings. While associations at the MTHFR and CBS loci extend previous work done on the genetic basis of homocysteine plasma concentration, our other findings are entirely novel. It will be of considerable interest to study these associations in non-Caucasian populations, as well as to test these polymorphisms for association with cardiovascular diseases, neural tube defects and cognitive decline. These hypothesis-generating genetic observations pave the way for further biological studies of homocysteine in the pathophysiology of these diseases. Furthermore, these genetic observations provide new insights into the biochemical pathways involved in homocysteine metabolism.
Acknowledgments
We thank Lynda Rose for her review of the manuscript and helpful comments.
FUNDING SOURCES: This study was supported by grants from the National Heart, Lung, and Blood Institute and the National Cancer Institute (Bethesda, MD, USA; grants HL080467, HL043851 and CA047988), the Donald W Reynolds Foundation (Las Vegas, NV, USA), the Doris Duke Charitable Foundation (New York, NY, USA), and the Fondation Leducq (Paris, France), with collaborative scientific and genotyping support provided by Amgen Inc (Thousand Oaks, CA, USA). The PROCARDIS genome wide association study was supported by the European Commission (grant LSHM-2007-027273), the British Heart Foundation, the Knut and Alice Wallenberg Foundation, the Swedish Medical Research Council (project 8691), the Swedish Heart-Lung Foundation, the Karolinska Institutet and the Stockholm County Council (project 560183).
Footnotes
DISCLOSURES
None.


