PLoS Biol. Aug 2011; 9(8): e1001125.
Published online Aug 16, 2011. doi:  10.1371/journal.pbio.1001125
PMCID: PMC3156686

Combining Genome-Wide Association Mapping and Transcriptional Networks to Identify Novel Genes Controlling Glucosinolates in Arabidopsis thaliana

Greg Gibson, Academic Editor

Abstract

Background

Genome-wide association (GWA) is gaining popularity as a means to study the architecture of complex quantitative traits, partially due to the improvement of high-throughput low-cost genotyping and phenotyping technologies. Glucosinolate (GSL) secondary metabolites within Arabidopsis spp. can serve as a model system to understand the genomic architecture of adaptive quantitative traits. GSL are key anti-herbivory defenses that impart adaptive advantages within field trials. While little is known about how variation in the external or internal environment of an organism may influence the efficiency of GWA, GSL variation is known to be highly dependent upon the external stresses and developmental processes of the plant, making it an excellent model for studying conditional GWA.

Methodology/Principal Findings

To understand how development and environment can influence GWA, we conducted a study using 96 Arabidopsis thaliana accessions, >40 GSL phenotypes across three conditions (one developmental comparison and one environmental comparison) and ~230,000 SNPs. Developmental stage had dramatic effects on the outcome of GWA, with each stage identifying different loci associated with GSL traits. Further, while the molecular bases of numerous quantitative trait loci (QTL) controlling GSL traits have been identified, there is currently no estimate of how many additional genes may control natural variation in these traits. We developed a novel co-expression network approach to prioritize the thousands of GWA candidates and successfully validated a large number of these genes as influencing GSL accumulation within A. thaliana using single gene isogenic lines.

Conclusions/Significance

Together, these results suggest that complex traits imparting environmentally contingent adaptive advantages are likely influenced by up to thousands of loci that are sensitive to fluctuations in the environment or developmental state of the organism. Additionally, while GWA is highly conditional upon genetics, the use of additional genomic information can rapidly identify causal loci en masse.

Author Summary

Understanding how genetic variation can control phenotypic variation is a fundamental goal of modern biology. A major push has been made to use genome-wide association mapping in all organisms to rapidly identify the genes contributing to phenotypes such as disease and nutritional disorders. But a number of fundamental questions have not been answered about the use of genome-wide association: for example, how does the internal or external environment influence the genes found? Furthermore, the simple question of how many genes may influence a trait is unknown. Finally, a number of studies have identified significant false-positive and -negative issues within genome-wide association studies that are not solvable by direct statistical approaches. We have used genome-wide association mapping in the plant Arabidopsis thaliana to begin exploring these questions. We show that both external and internal environments significantly alter the identified genes, such that using different tissues can lead to the identification of nearly completely different gene sets. Given the large number of potential false-positives, we developed an orthogonal approach to filtering the possible genes, by identifying co-functioning networks using the nominal candidate gene list derived from genome-wide association studies. This allowed us to rapidly identify and validate a large number of novel and unexpected genes that affect Arabidopsis thaliana defense metabolism within phenotypic ranges that have been shown to be selectable within the field. These genes and the associated networks suggest that Arabidopsis thaliana defense metabolism is more consistent with the infinite gene hypothesis, according to which there is a vast number of causative genes controlling natural variation in this phenotype. It remains to be seen how frequently this is true for other organisms and other phenotypes.

Introduction

Biologists across fields possess a common need to identify the genetic variation causing natural phenotypic variation. Genome-wide association (GWA) studies are a promising route to associate phenotypes with genotypes, at a genome-wide level, using “unrelated” individuals [1]. In contrast to the traditional use of structured mapping populations derived from two parent genomes, GWA studies allow a wide sampling of the genotypes present within a species, potentially identifying a greater proportion of the variable loci contributing to polygenic traits. However, the uneven distribution of this increased genotypic diversity across populations (population structure), as well as the sheer number of statistical tests performed in a genome-wide scan, can cause detection of a high rate of “false-positive” genotype-phenotype associations that may make it difficult to distinguish loci that truly affect the tested phenotype [1][5]. Epistasis and natural selection can also lead to a high false-negative rate, wherein loci with experimentally validated effects on the focal trait are not detected by GWA tests [4][5].

Repeated detection of a genotype-phenotype association across populations or experiments has been proposed to increase support for the biological reality of that association, and has even been proposed as a requirement for validation of trait-phenotype associations [2]. However, replication across populations or experiments depends not only upon genotypes but also upon differences in environment and development that significantly influence quantitative traits [5][8]. Thus, validation of a significant association through replication, while at face value providing a stringent criterion for significance, may bias studies against detection of causal associations that show significant Genotype×Environment interactions [9]. In this study we employed replicated genotypes to test the conditionality of GWA results upon the environment or development stage within which the phenotype was measured.

Integrating GWA mapping results with additional forms of genome-scale data, such as transcript profiling or proteomics datasets, has also been proposed to strengthen support for detected gene-trait associations and reduce the incidence of false-positive associations [10]. To date, network approaches have largely focused upon comparing GWA results with natural variation in gene expression across genotypes in transcriptomic datasets (i.e., expression quantitative trait loci (eQTLs)) [11][13]. This requires that candidate genes show natural variation in transcript accumulation, which is not always the functional level at which biologically relevant variation occurs [14]. Another network approach maps GWA results onto previously generated interaction networks within a single genotype, such as a protein-protein interaction network, enhancing support for associations that cluster within the network [15]. This network filtering approach has yet to be tested with GWA data where the environment or tissue is varied.

To evaluate the influence of environmental or developmentally conditional genetics on GWA mapping and the utility of network filtering in identifying candidate causal genes, we focused on defense metabolism within the plant Arabidopsis thaliana. A. thaliana has become a key model for advancing genetic technologies and analytical approaches for studying complex quantitative genetics in wild species [16]. These advances include experiments testing the ability of genome resequencing and transcript profiling to elucidate the genetics of complex expression traits [17][19] and querying the complexity of genetic epistasis in laboratory and natural populations [20][26]. Additionally, A. thaliana has long provided a model system for applying concepts surrounding GWA mapping [3][5],[27][30].

As a model set of phenotypes, we used the products of two related A. thaliana secondary metabolite pathways, responsible for aliphatic and indolic glucosinolate (GSL) biosynthesis. These pathways have become useful models for quantitative genetics and ecology (Figure 1) [31]. Aliphatic, or methionine-derived, GSL are critical determinants of fitness for A. thaliana and related cruciferous species via their ability to defend against insect herbivory and non-host pathogens [32][35]. Indolic GSL, derived from tryptophan, play important roles in resistance to pathogens and aphids [36][40]. A. thaliana accessions display significant natural genetic variation controlling the production of type and amount of both classes of GSL, with direct impacts on plant fitness in the field [33],[41][47]. Additionally, GSL display conditional genetic variation dependent upon both the environment and developmental stage of measurement [48][51]. GSL thus provide an excellent model to explore the impact of conditional genetics upon GWA analysis.

Figure 1
GSL Biosynthesis and Cloned QTL.

While the evolutionary and ecological importance of GSL is firmly established, the nearly complete description of GSL biosynthetic pathways provides an additional practical advantage to studying these compounds [52][54]. A large number of QTL and genes controlling GSL natural variation have been cloned from A. thaliana using a variety of network biology approaches similar to network filtering in GWA studies (Figure 1) [55][59]. These provide a set of positive control genes of known natural variability and importance to GSL phenotypes, enabling empirical assessment of the level of false-positive and false-negative associations.

Within this study, we measure GSL phenotypes in two developmental stages and stress conditions/treatments using a collection of wild A. thaliana accessions to test the relative influence of these components upon GWA. In agreement with previous analyses from structured mapping populations, we found that differences in development have more impact on conditioning genetic variation in A. thaliana GSL accumulation. This is further supported by our observation that GWA-identified candidate genes show a non-random distribution across the three datasets with the GWA candidates from the two developmental stages analyzed overlapping less than expected. The large list of candidate genes identified via GWA was refined with a network co-expression approach, identifying a number of potential networks. A subset of loci from these networks was validated for effects on GSL phenotypes. Even for adaptive traits like GSL accumulation, these analyses suggest the influence of numerous small effect loci affecting the phenotype at levels that are potentially exposed to natural selection.

Results

GSL Analysis

We measured GSL from leaves of 96 A. thaliana accessions at 35 d post-germination [27][28] using either untreated leaves or leaves treated with AgNO3 (silver) to mimic pathogen attack. In addition, we measured seedling glucosinolates from the same accessions to provide a tissue comparison as well as a treatment comparison. Seedlings were measured at 2 d post-germination at a stage where the GSL are largely representative of the GSL present within the mature seed [48],[60]. GSL from both foliar and seedling tissue grown under these conditions have been measured in multiple independent QTL experiments that used recombinant inbred line (RIL) populations generated from subsets of these 96 accessions, thus providing independent corroboration of observed GSL phenotypes [41],[51],[61]. For the untreated leaves, this analysis detected 18 aliphatic GSL compounds and four indolic GSL compounds. These were combined with an additional 21 synthetic variables that describe discrete components of the biochemical pathway, for a total of 43 GSL traits for analysis [4],[61][62]. For the AgNO3-treated samples, we detected only 16 aliphatic GSL and four indolic GSL, but were also able to measure camalexin, which is related to indolic GSL (Table S3); in combination with derived measures, this provided us with 42 AgNO3-treated GSL traits [61]. For the seedling GSL samples, we detected 19 aliphatic GSLs, two indolic GSLs, and three seedling-specific phenylalanine GSLs (Table S4), which in combination with derived descriptive variables gave us a total of 46 GSL traits [61].

Genetic, Environmental, and Developmental Effects on GSL

Population stratification has previously been noted in this set of A. thaliana accessions, where eight subpopulations were proposed to describe the accessions' genetic differences [27][28]. Less explored is the joint effect of population structure and environmental factors, both external (exogenous treatment) and internal (tissue comparison) on GSL. We used our three glucosinolate datasets to test for potential confounding effects of environmental variation, population structure, and their various interaction terms upon the GSL phenotypes (Figure 2). On average, 36% (silver versus control) and 23% (seedling versus control) of phenotypic variance in GSL traits was solely attributable to accession. An additional 7% (silver versus control) and 14% (seedling versus control) of phenotypic variance was attributable to an interaction between accession and treatment or tissue. This suggests that, on average and given the statistical power of the experiments, 30%–50% of the detectable genetically controlled variance is stable across conditions, while at least 20% of the variance is conditional on treatment and/or tissue.
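The partitioning of phenotypic variance into accession, treatment, and accession×treatment components can be illustrated with a minimal sketch. It assumes a balanced two-way design with replicated genotypes and omits the population structure terms used in the study; the function name `partition_variance` and the toy values are hypothetical and are not taken from the paper's data or code.

```python
def partition_variance(data):
    """Two-way sums of squares for a balanced design.

    data[accession][treatment] is a list of replicate measurements.
    Returns the fraction of total variance for each term.
    This is a simplified illustration, not the model used in the study.
    """
    accessions = sorted(data)
    treatments = sorted(data[accessions[0]])
    r = len(data[accessions[0]][treatments[0]])  # replicates per cell (assumed equal)
    a, b = len(accessions), len(treatments)

    values = [y for acc in accessions for trt in treatments for y in data[acc][trt]]
    grand = sum(values) / len(values)
    cell = {(acc, trt): sum(data[acc][trt]) / r for acc in accessions for trt in treatments}
    acc_mean = {acc: sum(cell[(acc, trt)] for trt in treatments) / b for acc in accessions}
    trt_mean = {trt: sum(cell[(acc, trt)] for acc in accessions) / a for trt in treatments}

    ss = {
        "accession": b * r * sum((acc_mean[acc] - grand) ** 2 for acc in accessions),
        "treatment": a * r * sum((trt_mean[trt] - grand) ** 2 for trt in treatments),
        "interaction": r * sum(
            (cell[(acc, trt)] - acc_mean[acc] - trt_mean[trt] + grand) ** 2
            for acc in accessions for trt in treatments
        ),
        "residual": sum(
            (y - cell[(acc, trt)]) ** 2
            for acc in accessions for trt in treatments for y in data[acc][trt]
        ),
    }
    total = sum(ss.values())
    return {term: value / total for term, value in ss.items()}


# Hypothetical toy data: two accessions, two treatments, two replicates each.
toy = {
    "acc1": {"control": [1.0, 3.0], "silver": [3.0, 5.0]},
    "acc2": {"control": [5.0, 7.0], "silver": [9.0, 11.0]},
}
fractions = partition_variance(toy)
```

In this toy example most of the variance falls to the accession term and only a small share to the interaction, which mirrors the kind of contrast reported above, although the numbers themselves are invented.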

Figure 2
Analysis of variance of glucosinolates.

In contrast, population structure by itself accounted for 10%–15% of total variance in GSL (Figure 2). Interestingly, significantly less variance (<5%) could be attributed to interaction of treatment or tissue with population structure. This suggests that for GSL, large-effect polymorphisms that may be linked with population structure are stable across treatment and tissue while the polymorphisms with conditional effects are less related to the species demographic structure (Figure 2). This is consistent with QTL studies using RIL that find greater repeatability of large-effect QTL across populations and conditions than of treatment-dependent loci [41],[51],[61],[63]. This is further supported by the fact that we utilized replication of defined genotypes across all conditions and tissues and as such have better power to detect these effects than in systems where it is not possible to replicate genotypes. As such, controlling for population structure will reduce the number of false-positives detected but lead to an elevated false-negative rate, given this significant association between the measured phenotypes and population structure.

Interestingly, developmental effects (average of 15%) accounted for 3 times more of the variation in GSL than environmental effects (average 5%). In particular, only three GSL (two indolic GSL, I3M and 4MOI3M, and total indolic GSL) were affected more strongly by AgNO3-treatment than by accession (Table S1 and Figure S1), whereas 11 GSL traits were found to be influenced more by tissue type than accession (Table S2). This agrees with these indolic GSL being regulated by defense response [36],[64]. Similarly, twice as much GSL variation could be attributed to the interaction between accession and tissue type compared to the interaction between accession and AgNO3 treatment. Thus, it appears that intraspecific genetic variation has greater impact on GSL in relation to development than in response to simulated pathogen attack.

Genome-Wide Association Study

Using 229,940 SNPs available for this collection of 96 accessions, we conducted GWA-mapping for GSL traits in both the Seedling and Silver datasets using a maximum likelihood approach that accounts for genetic similarity (EMMA) [65]. This identified a large number of significant SNPs and genes for both datasets (Table 1). We tested the previously published criteria used to assess significance of candidate genes to ensure that different treatments or tissues did not bias the results produced under these criteria [4]. These criteria required ≥1 SNP, ≥2 SNPs, or ≥20% of SNPs within a gene to show significant association with a specific GSL trait. This test was independently repeated for all GSL traits in both datasets (Tables S5 and S6). As previously found using the control leaf GSL data, the more stringent ≥2 SNPs/gene criterion greatly decreased the overall number of significant genes identified while not overtly influencing the false-negative rate when using a set of GSL genes known to be naturally variable and causal within the 96 accessions (Tables 2 and 3). Interestingly, including multiple treatments and tissues did not allow us to decrease the high empirical false-negative rate (~75%) in identifying validated causal candidate genes (Table 3) [4],[31]. Using the ≥2 SNPs/gene criterion identified 898 genes for GSL accumulation in silver-treated leaves and 909 genes for the seedling GSL data. As previously found, the majority of these candidate genes were specific to a subset of GSL phenotypes and no gene was linked to all GSL traits within any dataset (Figure S2) [4].
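The three gene-level criteria (≥1 SNP, ≥2 SNPs, or ≥20% of SNPs within a gene) can be sketched as follows. This is only an illustration of how such thresholds could be applied to per-SNP significance calls for one trait; the function `call_candidate_genes`, its parameters, and the toy gene names are hypothetical, and the SNP-level test itself (EMMA) is not reproduced here.

```python
from collections import defaultdict


def call_candidate_genes(snp_hits, min_snps=None, min_fraction=None):
    """Return the genes passing a gene-level significance criterion.

    snp_hits: iterable of (gene_id, is_significant) pairs, one per SNP
              tested within a gene for a single trait.
    min_snps: minimum number of significant SNPs required (e.g. 1 or 2).
    min_fraction: minimum fraction of significant SNPs required (e.g. 0.20).
    Exactly one of min_snps or min_fraction should be given.
    """
    total = defaultdict(int)
    significant = defaultdict(int)
    for gene, is_sig in snp_hits:
        total[gene] += 1
        if is_sig:
            significant[gene] += 1

    candidates = set()
    for gene, n_tested in total.items():
        n_sig = significant[gene]
        if min_fraction is not None:
            passed = n_sig >= 1 and n_sig / n_tested >= min_fraction
        else:
            passed = n_sig >= min_snps
        if passed:
            candidates.add(gene)
    return candidates


# Hypothetical toy input: geneA has 2 of 3 significant SNPs,
# geneB has 1 of 6, and geneC has 0 of 2.
snps = (
    [("geneA", True), ("geneA", True), ("geneA", False)]
    + [("geneB", True)] + [("geneB", False)] * 5
    + [("geneC", False)] * 2
)
at_least_one = call_candidate_genes(snps, min_snps=1)
at_least_two = call_candidate_genes(snps, min_snps=2)
at_least_20pct = call_candidate_genes(snps, min_fraction=0.20)
```

In the toy input, the ≥1 SNP rule retains both geneA and geneB, whereas the two stricter rules retain only geneA, which illustrates how the stricter criteria shrink the candidate list.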

Table 1
GWA mapping summary.
Table 2
Using known GSL genes to estimate thresholds in GWA mapping.
Table 3
Recovery of known causal GSL genes in GWA mapping.

We estimated the variance explained by the candidate GWA genes identified in this study using a mixed polygenic model of inheritance for each phenotype within each dataset using the GenABEL package in R [66][67]. This showed that, on average, the candidate genes explained 37% of the phenotypic variation with a range of 1% to 99% (Table S10). Interestingly, if the phenotypes are separated into their rough biosynthetic classes of indolic, long-chain, or short-chain aliphatic [68], there is evidence for different levels of explained phenotypic variation where indolic has the highest percent variance at 45% while short-chain has the lowest at 25% (p = 0.001). This is not explainable by differential heritability as the short-chain aliphatic GSLs have the highest heritability in numerous studies including this one (Tables S1 and S2) [4],[41],[61]. This is instead likely due to the fact that short-chain aliphatic GSL show higher levels of multi-locus epistasis, which complicates the ability to estimate the explained variance within GWA studies [31],[41],[61].

Treatment and Tissue Contrasts

Previous work with untreated GSL leaf samples showed that candidate genes clustered in hotspots, with the two predominant hotspots surrounding the previously cloned AOP and MAM loci [4], where multiple polymorphisms surrounding the region of these two causal genes significantly associate with multiple GSL phenotypes. We plotted GWA-identified candidate genes for GSL accumulation from the silver and seedling datasets to see if treatment or tissue altered this pattern (Figure 3). Both datasets showed statistically significant (p<0.05; Figure 3) hotspots of candidate genes that clustered predominantly around the AOP and MAM loci with some minor treatment- or tissue-specific hotspots containing fewer genes. This phenomenon is observed across multiple GSL traits (Figure 3). The AOP and MAM hotspots are known to be generated by local blocks of linkage disequilibrium (LD) wherein a large set of non-causal genes are physically linked with the causal AOP2/3 and MAM1/3 genes [4]. Interestingly, while the silver and control leaf GWA datasets showed similar levels of clustering around the AOP and MAM loci, the hotspot at the MAM locus was much more pronounced than the AOP locus in the seedling GWA dataset (Figure 3), suggesting more seedling GSL traits are associated with the MAM locus. This agrees with QTL-mapping results in structured RIL populations of A. thaliana that have shown that the MAM/Elong locus has stronger effects upon seedling GSL phenotypes in comparison to leaves, whereas the effect of the AOP locus is stronger in leaves than seedlings [41],[62][63]. In addition, the relationship of GSL phenotypes across accessions is highly similar in the two leaf datasets, while the phenotypic relationships across accessions are shifted when comparing the seedling to the leaf (Figure 4). Together, this suggests greater similarity in the genetic variation affecting GSL phenotypic variation between the two leaf datasets than between leaf and seedling datasets, suggesting that GSL variation is impacted more by development than simulated pathogen attack. This is further supported by the analysis of variance (Figure 2).

Figure 3
Genomic hotspots of GWA positive candidate genes.
Figure 4
Clustering of traits from control, AgNO3-treated leaf, and seedling samples.

To further test if measuring the same phenotypes in different tissues or treatments will identify similar GWA mapping candidates, we investigated the overlap of GWA candidate genes identified across the three datasets. For this analysis we excluded genes within the known AOP and MAM LD blocks as previous research has shown that all of these genes except the AOP and MAM genes are likely false-positives and would bias our overlap analysis [4],[69][71]. The remaining GWA mapping candidate genes showed more overlap between the two leaf datasets than between leaf and seedling datasets (Figure 5). Interestingly, the overlap between GWA-identified candidate gene sets from seedling and leaf data was smaller than would be expected by chance (χ2 p<0.001 for all three sectors) (Figure 5). This suggests that outside of the AOP and MAM loci, distinct sets of genetic variants may contribute to the observed phenotypic diversity in GSL across these tissues, which agrees with QTL-mapping studies identifying distinct GSL QTL for seedling and leaf [41],[62][63]. As such, focusing simply on GWA mapping candidates independently identified in multiple treatments or tissues to call true significant associations will overlook genes whose genotype-to-phenotype association is conditional upon differences in the experiments. Similarly, the amount of phenotypic variance explained by the candidates differed between the datasets, with control and treated having the highest average explained variance, 39% and 41%, respectively. In contrast, the seedling dataset had the lowest explained variance at 32%, similarly suggesting that altering the conditions of the experiments will change commonly reported summary variables such as explained variance.
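The comparison of observed and chance-expected overlap between two candidate gene sets can be sketched with a 2×2 chi-square test. This is a minimal illustration under stated assumptions: both sets are drawn from a shared background of `n_background` genes, all expected cell counts are non-zero, and the test has one degree of freedom. The function `overlap_chi2` and the toy sets are hypothetical; the exact test layout behind Figure 5 is not given in the text.

```python
import math


def overlap_chi2(set_a, set_b, n_background):
    """Chi-square test (1 df) for overlap between two gene sets.

    Returns (observed_overlap, expected_overlap, chi2, p_value).
    Assumes both sets are subsets of a background of n_background genes.
    """
    a, b = len(set_a), len(set_b)
    both = len(set_a & set_b)
    # Rows: in A / not in A. Columns: in B / not in B.
    observed = [
        [both, a - both],
        [b - both, n_background - a - b + both],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n_background
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    expected_overlap = a * b / n_background
    return both, expected_overlap, chi2, p_value


# Hypothetical toy sets over a background of 1,000 genes.
leaf_genes = set(range(0, 200))
seedling_genes = set(range(190, 390))
obs, exp, chi2, p = overlap_chi2(leaf_genes, seedling_genes, 1000)
```

Here the two toy sets share 10 genes where 40 would be expected by chance, so the overlap is smaller than expected, the same direction of effect as described for the seedling and leaf candidates.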

Figure 5
Overlap of significant GWA genes between datasets.

Candidate Gene Network Filtering

GWA studies generally produce large lists of candidate genes, presumed to contain a significant fraction of false-positive associations. One proposed strategy refines these results by searching for enrichment of candidate genes within pre-defined proteomic or transcriptomic networks [15]. To test the applicability of this approach to our GWA study, we overlaid our list of 2,436 candidate genes (excluding genes showing proximal LD to the causal AOP2/3 and MAM1/2/3 genes [4]) that associated with at least one GSL phenotype in at least one of the three datasets (Figure 5) onto a previously published co-expression network [72].

If the network filtering approach is valid and there are true causal genes within the candidate gene lists, then the candidate genes should show tighter network linkages to previously validated causal genes than the average gene. Measuring the distances between all candidate genes to all known GSL causal genes within the co-expression network showed that, for all datasets, the GWA candidate genes were on average closer to known causal genes than non-candidates (Figure S4). Interestingly, the GWA mapping candidate genes actually showed closer linkages to the cysteine, homocysteine, and glutathione biosynthetic pathways than to the core GSL biosynthetic pathways, suggesting that natural variation in these pathways may impact A. thaliana secondary metabolism (Figure S4 and Dataset S1). The network proximity of GWA mapping candidates to known causal genes supports the utility of the network filtering approach in identifying true causal genes among the long list of GWA mapping candidate genes.
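The distance comparison described above can be sketched with breadth-first search on an unweighted co-expression graph. This is an illustrative simplification: the adjacency-dictionary representation, the function names, and the toy gene labels are assumptions, edges are treated as unweighted, and unreachable gene pairs are simply skipped.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Shortest path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def mean_distance_to_known(adjacency, genes, known_genes):
    """Mean network distance over all connected (gene, known gene) pairs."""
    total, count = 0, 0
    for known in known_genes:
        dist = bfs_distances(adjacency, known)
        for gene in genes:
            if gene != known and gene in dist:
                total += dist[gene]
                count += 1
    return total / count if count else float("nan")


# Hypothetical toy network: a chain K - c1 - c2 - n1 - n2.
network = {
    "K": ["c1"],
    "c1": ["K", "c2"],
    "c2": ["c1", "n1"],
    "n1": ["c2", "n2"],
    "n2": ["n1"],
}
known = {"K"}                 # stand-in for a validated causal gene
candidates = {"c1", "c2"}     # stand-ins for GWA candidates
non_candidates = {"n1", "n2"}

candidate_mean = mean_distance_to_known(network, candidates, known)
background_mean = mean_distance_to_known(network, non_candidates, known)
```

A smaller mean distance for candidates than for non-candidates, as in this toy chain, is the pattern that would support network filtering; a real analysis would also need a significance test, which is omitted here.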

Candidate Gene Network Filtering (Core Pathway Linkages)

To determine if this network filtering approach finds whole co-expression networks or isolated genes, we extended the co-expression network to include known and predicted GSL causal genes (Table S7). The largest network obtained from this analysis centered on the core-biosynthetic genes for the aliphatic and tryptophan derived GSL as well as sulfur metabolism genes (Figures 6 and S3). Interestingly, this large network linked to a defense signaling network represented by CAD1, PEN2, and EDS1 (Figure 6) [73].

Figure 6
Largest co-expression network of GWA candidates and known GSL core genes.

The defense signaling pathway associated with PEN2 and, more recently, CAD2 and EDS1 had previously been linked to altered GSL accumulation via both signaling and biosynthetic roles [36],[39],[74][75]. However, the current network analysis has identified new candidate participants in this network altering GSL accumulation. To test these predicted linkages, we obtained a mutant line possessing a T-DNA insertional disruption of the previously undescribed locus At4g38550, which is linked to both CAD1 and PEN2 (Figure 6, Table S9). This mutant had elevated levels of all aliphatic GSL within the rosette leaves as well as 4-methoxyindol-3-ylmethyl GSL, shown to mediate non-host resistance (Table S9) [36],[39]. These results suggest a role for At4g38550 in either defense responses or GSL accumulation.

Network analysis also identified several previously described (RML1) and novel candidate (ATSFGH, At1g06640, and At1g04770) genes that were associated with the core-biosynthetic part of the network. RML1 (synonymous with PAD2, CAD2), a biosynthetic enzyme for glutathione, has previously been shown to control GSL accumulation either via a signaling role or actual biosynthesis of glutathione [74][75]. To test if ATSFGH (S-formylglutathione hydrolase, At2g41530), At1g06640 (unknown 2-oxoacid dependent dioxygenase – 2-ODD), or At1g04770 (tetratricopeptide containing protein) may play a role in GSL accumulation, we obtained insertional mutants. This showed that the disruption of At1g06640 led to significantly increased accumulation of the short-chain methylsulfinyl GSL but not the corresponding methylthio or long-chain GSL (Table S9). In contrast, the AtSFGH mutant had elevated levels of all short-chain GSL along with a decreased accumulation of the long-chain 8-MTO GSL (Table S9). The At1g04770 mutant showed no altered GSL levels other than a significantly decreased accumulation of 8-MTO GSL (Table S9). This suggests that these genes alter GSL accumulation, although the specific molecular mechanism remains to be identified.

Interestingly, network membership is not sufficient to predict a GSL impact, as T-DNA disruption of homoserine kinase (At2g17265), a gene co-expressed with the GSL core but not a candidate from the GWA analysis, had no detectable impact upon GSL accumulation (Table S9).

Thus, the network filtering approach identified genes closely linked to the GSL biosynthetic network that can control GSL accumulation and are GWA-identified candidate genes.

Candidate Gene Network Filtering (Novel Networks)

The above analysis shows that GWA candidate genes which co-express with known GSL genes are likely to influence GSL accumulation. However, networks might influence GSL accumulation independent of co-expression with known GSL genes. To test this, we investigated several co-expression networks that involved solely GWA-identified candidate genes and genes not previously implicated in influencing GSL accumulation (Figure 7). Three of these networks included genes that affect natural variation in non-GSL phenotypes within A. thaliana, namely PHOTOTROPIN 2 (PHOT2), Erecta (ER) [76], and ELF3/GI (Figure 7) [77],[78]. The fourth network did not involve any genes previously linked to natural variation (Figure 7). We obtained A. thaliana seed stocks with mutations in a subset of genes for each of these three networks to test whether loss of function at these loci affects GSL accumulation.

Figure 7
Self-affiliated expression networks of GWA mapping significant candidates.

The largest network containing no previously known GSL-related genes that we examined is a blue light/gibberellin signaling pathway represented by PHOT2 (Figure 7A). This pathway had not been previously ascribed any role in GSL accumulation in A. thaliana. We tested this GWA-identified association by measuring GSL in the single and double PHOT1/PHOT2 mutants [79]. PHOT1 was included as it has been shown to function either redundantly or epistatically with PHOT2 [79]. The single phot1 or phot2 mutation had no significant effect upon GSL accumulation (Table S9). The double phot1/phot2 knockout plants showed a significant increase in the production of detected methylthio GSL as well as a decrease in the accumulation of 3-carbon GSL compared to control plants. Thus, it appears that GSL are influenced by the PHOT1/PHOT2 signaling pathway, possibly in response to blue light signaling (Table S9). This agrees with previous reports from Raphanus sativus that blue light controls GSL [80],[81].

The second non-GSL network we examined contains the ER gene (Figure 7B). The ER (Erecta) network and specifically the ER locus had previously been queried for the ability to alter GSL accumulation using two Arabidopsis RIL populations (Ler×Col-0 and Ler×Cvi) that segregate for a loss-of-function allele at the ER locus [41],[51],[63],[82][86]. In these analyses, the ER locus was linked to seed/seedling GSL accumulation in only one of the two populations and not linked to mature leaf GSL accumulation [41],[86]. Analysis of the ER mutant within the Col-0 genotype showed that the Erecta gene does influence GSL content within leaves as suggested by the GWA results (Table S9, Figure 7A). Plants with loss of function at Erecta showed increased levels of methylthio GSL, long-chain GSL, and 4-substituted indole GSL (Table S9). Interestingly, the ER network contains a number of chromatin remodeling genes. We obtained A. thaliana lines with loss-of-function mutations in three of these genes (Table S9) to test if the extended network also alters GSL accumulation. Mutation of two of the three genes (At5g18620/CHR17 and At4g02060/PRL) was associated with increased levels of short-chain aliphatic GSL and a corresponding decrease in long-chain aliphatic GSL (Table S9). This shows that the Erecta network has the capacity to influence GSL accumulation.

Two smaller networks containing the ELF3 and GI genes were of interest as these two genes are associated with natural variation in the A. thaliana circadian clock (Figure 7C) [77],[87],[88]. GSL analysis showed that both the elf3 and gi mutants had lower levels of aliphatic GSL than controls (Table S9). Comparing multiple gi mutants from both the Col-0 and Ler genetic backgrounds showed that only gi mutants in the Col-0 background altered GSL accumulation (Table S9). This suggests that gi's link to glucosinolates is epistatic to other naturally variable loci within the genome, as previously noted for natural GI alleles in relation to other phenotypes (Table S9) [78]. An analysis of the elf4 mutant, which has morphological similarities to elf3-1 but was not a GWA-identified candidate, showed that this mutation did not alter GSL accumulation. Thus, elf3/gi affects GSL via a more direct mechanism than altering plant morphology. Given that two genes in the circadian clock network directly affect GSL accumulation, and that the expression of these two genes is correlated with that of other genes in the network, it is reasonable to hypothesize that the circadian clock plays a role in GSL accumulation.

While the GSL phenotypes of the above laboratory-generated mutants suggest that variation in the circadian clock plays a role in GSL accumulation, they do not prove that the natural alleles at these genes affect GSL accumulation. To validate this, we leveraged germplasm developed in the course of previous research showing that natural variation at the ELF3 locus controls numerous phenotypes, including circadian clock periodicity and flowering time [77]. We utilized quantitative complementation lines to test if natural variation at ELF3 also generates differences in GSL content [77]. This showed that the ELF3 allele from the Bay-0 accession was associated with a higher level of short-chain aliphatic GSL accumulation in comparison to plants containing the Sha allele (Table S9). In contrast, both Bay-0 and Sha allele-bearing plants had elevated levels of 8-MTO GSL in comparison to Col-0 (Tables S8 and S9). Thus, ELF3 is a polymorphic locus that contains multiple distinct alleles that influence GSL content within the plant and the ELF3/GI network causes natural variation in GSL content.

The final network examined here, represented by CLPX (CLP protease), is likely involved in chlorophyll catabolism and possibly also chloroplast senescence [89]. This network is uncharacterized and has not previously been associated with GSL accumulation or natural variation in any phenotype, but participation in chloroplast degradation is suggested by transcriptional correlation of CLPX with several catabolism genes. Analysis of mutants deficient in function for two of these genes showed that both possessed increased aliphatic GSL in comparison to wild-type controls. These results suggest that natural variation in this putative network could influence GSL content in A. thaliana. The majority (12 of 13) of genes in this network show significant variation in transcript abundance across A. thaliana accessions, a significantly greater proportion than expected by chance (χ², p<0.001) [90]–[92], further suggesting that this network may contribute to GSL variation across the accessions.

Finally, we tested a single two-gene network found in the co-expression data wherein both genes had been annotated but not previously linked to GSL content. This network involved AtPTR3 (a putative peptide transporter, At5g46050) and DPL1 (a dihydrosphingosine lyase, At1g27980). T-DNA mutants in both genes appeared to be lethal as we could not identify homozygous progeny. However, comparison of the heterozygous progeny to wild-type homozygotes showed that mutants in both genes led to elevated levels of aliphatic GSL (Table S9). Thus, there are likely more networks that are causal for GSL variation within this dataset that remain to be tested.

Negative Network T-DNA Test

While GSL are considered “secondary” metabolites, these compounds are affected by many aspects of plant metabolism; thus, GSL phenotyping is sensitive to any genetic perturbation that affects plant physiology. As such, we identified six genes that were expressed in mature leaves but did not show any significant association of DNA sequence polymorphism with GSL phenotypes and were additionally not identified within any of the above co-expression networks. Insertional mutants disrupted at these loci were designated as random mutant controls (Table S9). Analyzing GSL within these six lines showed that on average 13%±4% of the GSL were affected in the random control mutant set even after correction for multiple testing. While this suggests that GSL may be generally sensitive to mutations affecting genes expressed within the leaf, this incidence of significant GSL effects is much lower than that observed for the T-DNA mutants selected to test GWA mapping-identified pathways (CLPX, 78%±11%; PTR3, 61%±6%; Erecta, 45%±10%; GSL, 46%±11%; ELF3/GI, 53%±17%). In all cases the mutants deficient in GWA pathway-identified gene function showed significantly greater numbers of altered GSL phenotypes than the negative control T-DNA mutant set (χ², p<0.001), suggesting that combining GWA-identified candidate genes with co-expression networks successfully identifies genes with the capacity to cause natural variation in GSL content. Identifying the specific mechanisms involved will require significant future research.

Discussion

The influence of conditional genetics, i.e. interaction of genotypes with environment or development, has been intensively studied within structured mapping populations and shown to exert considerable influence on the accumulation of small metabolites [20],[49]–[51],[93]–[94]. However, conditional effects have not been routinely included in GWA studies. In this report, we show extensive variation in the identification of GWA candidate genes that depends upon both Genotype×Environment and Genotype×Tissue interactions. The analysis of GSL accumulation in two different tissues showed a significant bias toward identifying different causal genes for the GSL phenotypes in the two tissues (Figure 5). As such, conditional genetics are likely to be as critical in GWA analyses as in QTL analyses using structured populations. This suggests that requiring replication of genotype-phenotype associations across environments or conditions as a condition for validation, as has been suggested for human GWA studies, may lead to a significant bias against loci that interact with the environment or development. Instead, methods should be developed to specifically target these loci.

Interestingly, developmental differences played a larger role than the AgNO3 treatment in influencing genetic variation across this collection of accessions, as displayed by the distribution of phenotypes and their variance across the datasets (Figures 2 and 3). The different developmental stages, seedling and mature leaf, showed a non-random distribution of GWA candidate genes with repulsion, such that a seedling candidate was less likely to be a leaf candidate gene than would be expected by random chance. This result has two implications. The first is that GSL are influenced by different genetic variation in the different developmental stages. This is not unexpected given the changing herbivore pressures that the plant will encounter over the course of its development. Production of different optimal GSL profiles for defense at each developmental stage likely is mediated by different genetic networks. The second implication is that a large number of genes may have the potential to influence GSL accumulation.

Network Proximity as a Method to Filter GWA Candidates

A limiting factor for the utility of GWA studies has been the preponderance of false-positive and false-negative associations which makes the accurate prediction of biologically valid genotype-phenotype associations very difficult. In this report, we describe the implementation and validation of a candidate gene co-expression filter that has given us a high success rate in candidate gene validation (>75%). The co-expression dataset is derived from transcript accumulation within a single A. thaliana accession (Col-0) across a wide range of developmental and environmental states [72]. This dataset has previously been used to show that genes showing co-expression often modulate the same phenotype, and may thus also function within the same pathway [57]–[59],[95]–[99]. This co-expression dataset provides a functional grouping of A. thaliana genes based upon non-genetic variation. This provides an orthogonal grouping to that provided by the GWA mapping which associates genes to phenotypes via natural genetic variation. This approach is similar to other filtering approaches that utilize complementary datasets to rank candidate genes [11],[100]–[102]. However, most of these other approaches utilize two databases, e.g. GWA and eQTL (expression quantitative trait loci), that are both based upon natural genetic variation and thus do not provide independent filters [11],[100]–[101]. In contrast to these other network approaches, our methodology does not rely upon a statistical rank or enrichment procedure which can be dominated by individual genes with high significance possibly due to GWA mapping artifacts [102]. Instead, our approach focuses upon relative network size to direct the researcher to the most interesting candidate networks. This approach is less susceptible to statistical artifacts and allows the user to input bait genes suggested by a priori knowledge [95],[103]–[104]. This approach should be useful in any system possessing genomic networks that are orthogonal to the GWA-identified candidate gene lists.

Number of Genes Determining a Phenotype's Level and Proximity of Effect

The use of multiple tissues and treatment conditions, as well as a large set of different but related GSL phenotypes, led to the identification of several thousand candidate genes. Even after decreasing this number by using the network expression filter approach, several hundred candidate genes of interest remained. Analysis of a set of these genes via plants bearing single gene mutations showed that disruption of many of these genes can alter the amount or pattern of GSL accumulation (Table S9 and Figures 6 and 7). Given the observation that the background genotype can influence the capacity to identify a mutational effect (see gi mutants in Ler vs. Col-0, Table S9), our estimate of tested genes influencing GSL accumulation is conservative. Given this, it is likely that a very large number of small to moderate effect loci influence GSL accumulation within A. thaliana, echoing recent findings regarding the genetics of human height and maize flowering time [105]–[106]. This suggests that the whole genome may have a pattern similar to that found in an analysis of a single Arabidopsis locus that identified several QTL for growth within a small section of the genome [70]. As such, it might be common for quantitative traits to be influenced by thousands of causal loci [107].

The potential existence of thousands of polymorphic genes influencing a phenotype raises a common concern that these effects actually represent indirect pleiotropy, where moderate to small effects of a locus upon a phenotype are not biologically significant and do not reflect direct molecular control of the trait. However, numerous studies on GSL variation within wild populations have shown that changes in GSL accumulation similar to those identified here have selective consequences in field studies [33]–[35],[43]–[45],[108]. As such, even if polymorphisms in these identified genes have indirect pleiotropic effects upon GSL accumulation, these changes have a strong potential to influence A. thaliana in natural settings. Thus, it may be more useful to consider, instead of indirect versus direct effects of a locus, a continuous distribution that describes the number of molecular steps required to link a particular gene to the most proximal controller of the phenotype (in this case, an enzyme in the biosynthetic pathway). This raises the distinct problem of adaptive constraint wherein natural variation at a locus is limited by its indirect consequences upon other phenotypes. For instance, a phototropin allele with a beneficial effect on seedling phototropic behavior may be limited in its selective advantage due to a deleterious effect on GSL accumulation [109]–[110]. While this possibility remains to be tested in natural populations, it invites the question of why these phenotypic linkages occur. Is there a benefit to the influence of these loci on GSL accumulation, or has insufficient time passed since the de novo evolution of GSL biosynthesis to generate the genetic modularity to bypass historical linkages between development and metabolism [111]?

Number of Genes Influencing a Phenotype and Validation Barriers

A more mundane but significant experimental challenge of generating a list of thousands of candidate genes potentially causing natural variation in a phenotype is validation. Even after our expression network filtering, we were left with hundreds of likely candidates that would take decades to rigorously validate. Given that it is likely that at least several hundred genes lead to natural variation in GSL accumulation [105]–[106], how do we validate the effects of natural alleles at these loci, and is it worth the effort? If it is not worth the effort for GSL accumulation, what deciding factors should determine when a single phenotype should be completely dissected (to the level of knowing all genes containing a causal link to natural variation within a phenotype)? Given the importance of quantitative variation in numerous agronomic and medically important phenotypes, this discussion needs to begin, because untested presumptions about the number of causal genes for a phenotype greatly influence current GWA research and associated strategies for avoiding false-positive and false-negative results [2],[65],[112].

GWA and Development

We identified significant differences in GSL accumulation between two different developmental stages and this led to the identification of GWA candidate genes. While previous work on structured mapping populations, such as RILs, has shown that each tissue may be viewed as a distinct genetic module for both development and biochemistry [41],[49]–[50],[113]–[114], this is one of the first reports about tissue differences in an unstructured population. This tissue specificity indicates that it is not possible to simply require a candidate gene to replicate across tissues to validate its GWA signature. Instead, each tissue has to be looked at as a potentially independent modular system [115]. Such modularity could be mediated by members of a gene family each acting in a limited set of tissues, either as a result of sub- or neo-functionalization [116]–[119]. Both sub- and neo-functionalization have played an important role in the evolution of GSL and other plant secondary metabolites [55],[69],[92],[96]. The impact of development on GWA remains to be tested across a broader range of tissues and developmental stages.

Conclusion

In this report, we show that GWA-mapping, like QTL-mapping using structured populations, is sensitive to interaction of genetic variation with the environment and the developmental stage of phenotype measurement. This has not often been considered as a critical factor influencing GWA studies, given the difficulty of obtaining replicated analyses within organisms such as humans. Future work incorporating systematic analysis of how GWA studies are influenced by developmental or environmental gradients will be critical to understanding how the genomic architecture of a species controls its phenotypes. We have developed and validated a new approach to identifying GWA candidate genes and shown that the use of orthogonal genomic network datasets can lead to a very high success rate in the biological validation of candidate genes. This new approach, in combination with the observation of conditional GWA results, suggests that large numbers of genes can have a causal connection to variation within GSL and other phenotypes.

Materials and Methods

Population, Treatment, and Growth Conditions

A previously described collection of 96 natural A. thaliana accessions was used to measure GSL accumulation for GWA mapping with existing SNP data from these same lines [3],[27]–[28],[120]. Seeds were imbibed and cold stratified at 4°C for 3 d to break dormancy. Seeds were planted in a randomized block design, with multiple seeds of each accession occupying an individual cell within 36-cell flats (approximately 100 cm³ soil volume per cell). Four plantings of the 96 accessions provided four independent replicates for each accession. At 1 wk of age, seedlings were thinned to leave one plant per cell and glucosinolates were extracted from 10 of the removed seedlings. For all experiments, plants were maintained under short day conditions in controlled environment growth chambers. At 35 d post-germination, two fully expanded mature leaves were harvested, digitally photographed, and one was directly analyzed for GSL content as described below [18],[121]. The other leaf was treated with 5 mM AgNO3 for 48 h prior to harvest for GSL analysis. AgNO3 induces plant responses to pathogens by interfering with ethylene hormone-signaling and inducing reactive oxygen species. We utilized AgNO3 as a treatment to estimate the effect of variation in plant defense response upon GWA mapping [122]–[124]. In total, these datasets contain four measurements per accession per tissue and treatment for a total of 301 assays of seedling GSL (Seedling Dataset), 374 assays of control leaf GSL (Ctl Dataset), and 375 assays of GSL following AgNO3 treatment of leaves (Silver Dataset). The data for the control dataset are reported elsewhere as the “2008 dataset” [4].

Analysis of GSL Content

GSL content of excised leaves and seedlings was measured using a previously described high-throughput analytical system [62],[69]. Briefly, for excised leaves, one leaf was removed from each plant, photographed, and placed in a 96-well microtiter plate with 500 µL of 90% methanol and one 3.8 mm stainless steel ball-bearing. Seedlings were removed from pots with forceps, gently cleaned with distilled water to remove soil, and similarly placed into 90% methanol in microtiter plates. Tissues were homogenized for 2 min in a paint shaker and centrifuged, and the supernatants were transferred to a 96-well filter plate with 50 µL of DEAE sephadex. The sephadex-bound GSL were eluted by overnight, room temperature incubation with sulfatase. Individual desulfo-GSL within each sample were separated and detected by HPLC-DAD, identified, and quantified by comparison to purified standards [125]. Tissue area for each leaf was digitally measured using ImageJ with scale objects included in each digital image [126]. The GSL traits are reported per cm² of leaf area for the mature leaf data or per seedling for the seedling data. There was no significant variation detected for leaf density within these accessions (unpublished data). In addition to the content of individual GSL, we developed a series of summation and ratio traits based on prior knowledge of the GSL pathways [127]. These ratios and summation traits allow us to isolate the effects of variation at individual steps of GSL biosynthesis from variation affecting the rest of the biosynthetic pathway [127].

Partitioning H² Between Structure and Accession

To estimate broad-sense heritability due to accession and population structure for the different metabolites, we evaluated the data using a model where the metabolite traits are $y_{sar} = \mu + S_s + A(S)_{sa} + T_t + R(T)_{tr} + T_t{:}S_s + T_t{:}A(S)_{sa} + \varepsilon_{sart}$, where $s = 1,\ldots,8$; $r = 1,\ldots,4$; $t = 1,2$; and $a = 1,\ldots,95$. The main effects are denoted as $S$, $A$, $T$, and $R$ and represent structure, accession, treatment (or tissue), and replicate block, respectively. Here, the variable $T$ may refer to (1) treatment, with the two levels being with or without AgNO3 treatment, or (2) tissue, with the two levels being mature leaves or seedlings. Population structure is represented as $s = 1,\ldots,8$, corresponding to eight distinct groups into which these 96 accessions have previously been assigned [27]–[28]. The error, $\varepsilon_{sart}$, is assumed to be normally distributed with mean 0 and variance $\sigma_{\varepsilon}^{2}$. Broad-sense heritability was estimated as the percent of total variance attributable to accession nested within structure and that for structure was estimated as the percent of total variance attributable to structure. The data were analyzed independently for the two treatments or conditions: control versus AgNO3 and control versus seedling (Figure 2; Tables S1 and S2).
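
As an illustration only, the idea of expressing each term as a percent of total variance can be sketched in Python with a sum-of-squares partition. The sketch keeps just two terms, structure and accession nested within structure, and assumes a balanced design; the function name, the input layout, and the omission of the treatment, replicate, and interaction terms are simplifying assumptions and do not reproduce the full model fitted in this study.

```python
from statistics import mean

def variance_fractions(records):
    """Percent of the total sum of squares assigned to structure and to
    accession nested within structure.

    records: list of (structure_group, accession, value) tuples.
    Hypothetical helper; treatment, replicate, and interaction terms
    from the full model are deliberately left out.
    """
    values = [v for _, _, v in records]
    grand = mean(values)
    ss_total = sum((v - grand) ** 2 for v in values)

    by_group, by_accession = {}, {}
    for group, accession, value in records:
        by_group.setdefault(group, []).append(value)
        by_accession.setdefault((group, accession), []).append(value)

    # Between-group sum of squares (structure).
    ss_structure = sum(
        len(vs) * (mean(vs) - grand) ** 2 for vs in by_group.values()
    )
    # Accession deviations are measured from their own group mean (nesting).
    ss_accession = sum(
        len(vs) * (mean(vs) - mean(by_group[group])) ** 2
        for (group, _), vs in by_accession.items()
    )
    return {
        "structure": 100 * ss_structure / ss_total,
        "accession_in_structure": 100 * ss_accession / ss_total,
    }

# Toy data: two structure groups, two accessions each, two replicates each.
toy = [
    (1, "A", 1), (1, "A", 3), (1, "B", 5), (1, "B", 7),
    (2, "C", 9), (2, "C", 11), (2, "D", 13), (2, "D", 15),
]
fractions = variance_fractions(toy)
```

In this toy example the remaining share of the total sum of squares is the within-accession residual. Unbalanced data or the full set of model terms would call for a proper ANOVA or mixed-model fit instead.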

Association Mapping

To conduct single-locus GWA mapping accounting for population structure, we adopted a previously published method, the efficient mixed-model association (EMMA) algorithm [65]. EMMA is a statistical mixed model [65] where each SNP is modeled as a fixed effect and population structure, represented as a genetic similarity matrix, is modeled as a random effect. Variance components for this mixed model were estimated directly using maximum likelihood as implemented in the R/EMMA package [65]. Within this model, the independent measures of each metabolite within each accession, obtained from the analysis of variance model $y_{sar} = \mu + A_a + R_r + \varepsilon_{sar}$, were directly incorporated as genetic averages for the accessions (Tables S3 and S4). Because GWA was performed independently for each of the three datasets and because EMMA accounts for population structure, the variables $S_s$, $T_t$, and $R_r$ were excluded in this model. The average GSL accumulation per accession for the control dataset is reported elsewhere as the “2008 experiment” [4]. The full results are available at http://www.plantsciences.ucdavis.edu/kliebenstein/supplementaldataset1.zip.

Calling Positive Genes for GWA Mapping

We utilized a previously reported criterion for calling significant gene-trait associations in these three datasets [4]. The p value distributions of the GWA analysis were not uniform. Accepting an inherently elevated false-positive rate, we identified SNPs within the bottom 0.1 percentile of each p value distribution, corresponding to each trait, as significant for EMMA. Given previous observations that multiple SNPs per gene are typically associated with a trait for true-positives [30], we developed a criterion for calling a significant association between a trait and a gene [4],[30]: requiring at least two significant SNPs within ±1 kb of a gene's coding region to call a gene significant. This approach optimized the ratio of empirical false-positive to false-negative associations. This criterion was independently applied to the GWA results from all tissues and conditions (Tables S5 and S6).
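
The calling rule can be illustrated with a minimal Python sketch for a single trait. It assumes SNPs on one chromosome, treats the lowest 0.1% of p values as significant, and calls a gene when at least two significant SNPs fall within ±1 kb of its coding region. The function names, input layout, and handling of tied p values are hypothetical choices made for the example.

```python
from bisect import bisect_left, bisect_right

def significant_snps(pvalues, fraction=0.001):
    """Indices of SNPs in the lowest `fraction` of the p value distribution."""
    n_keep = max(1, round(len(pvalues) * fraction))
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    return set(order[:n_keep])

def call_genes(snp_positions, pvalues, genes, flank=1000, min_snps=2,
               fraction=0.001):
    """Return genes with >= min_snps significant SNPs within +/- flank bp.

    snp_positions: base-pair position of each SNP (same order as pvalues).
    genes: dict mapping gene name -> (start, end) of the coding region.
    All inputs are assumed to refer to one chromosome and one trait.
    """
    sig_positions = sorted(
        snp_positions[i] for i in significant_snps(pvalues, fraction)
    )
    called = []
    for name, (start, end) in genes.items():
        # Count significant SNPs inside [start - flank, end + flank].
        lo = bisect_left(sig_positions, start - flank)
        hi = bisect_right(sig_positions, end + flank)
        if hi - lo >= min_snps:
            called.append(name)
    return called

# Toy example: 2,000 SNPs spaced 100 bp apart, two with very low p values.
positions = [i * 100 for i in range(2000)]
pvals = [0.5] * 2000
pvals[50], pvals[55] = 1e-8, 1e-7          # SNPs at 5,000 bp and 5,500 bp
toy_genes = {
    "geneA": (5200, 6000),      # both significant SNPs within +/- 1 kb
    "geneB": (100000, 101000),  # no significant SNPs nearby
    "geneD": (6400, 7000),      # only one significant SNP within +/- 1 kb
}
called_genes = call_genes(positions, pvals, toy_genes)
```

Because the cutoff is a percentile of each trait's own p value distribution, the same fraction of SNPs is retained for every trait regardless of how skewed that distribution is.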

Estimating Phenotypic Variance Controlled by GWA Candidates

We estimated the variance explained by the candidate GWA mapping genes identified in this study using the GenABEL package in R [66]–[67]. This was done using a mixed polygenic model of inheritance for each phenotype within each dataset. Only SNPs within 1 kb of significant genes were utilized.

Co-Expression Network Analyses

Co-expression data were obtained from ATTED II [72],[128]. We extracted correlation values for transcript levels of genes showing significant association in at least one of the three datasets (Tables S5 and S6) [4] as well as a list of genes with predicted or known roles in GSL metabolism or regulation (Table S7). This latter set of genes was included to act as “bait genes” that might catalyze network formation around a known causal gene [59],[95],[98]. GWA candidates located within previously identified regions surrounding the AOP and MAM loci were then excluded to reduce detection of false associations due to linkage with the causal AOP2/3 and MAM1/2/3 genes [4]. Co-expression networks were constructed between these genes using a Mutual Rank threshold of up to 15 [129]. Co-expression networks were visualized using Pajek [130].
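
For readers unfamiliar with the Mutual Rank (MR) measure, it is the geometric mean of the two directional correlation ranks for a gene pair. The Python sketch below builds network edges at a chosen MR cutoff from a small correlation matrix. It is a toy illustration: the function names are hypothetical, and ranks are computed only among the genes supplied, whereas database MR values are ranked against all genes measured.

```python
import math

def rank_matrix(corr):
    """ranks[a][b] = rank of gene b among all other genes when sorted by
    decreasing correlation with gene a (1 = most strongly correlated)."""
    n = len(corr)
    ranks = [[0] * n for _ in range(n)]
    for a in range(n):
        others = sorted((b for b in range(n) if b != a),
                        key=lambda b: -corr[a][b])
        for rank, b in enumerate(others, start=1):
            ranks[a][b] = rank
    return ranks

def mutual_rank_edges(corr, names, threshold=15):
    """Return (gene_a, gene_b, MR) for every pair with MR <= threshold."""
    n = len(corr)
    ranks = rank_matrix(corr)
    edges = []
    for a in range(n):
        for b in range(a + 1, n):
            # Geometric mean of the rank of b for a and the rank of a for b.
            mr = math.sqrt(ranks[a][b] * ranks[b][a])
            if mr <= threshold:
                edges.append((names[a], names[b], mr))
    return edges

# Toy symmetric correlation matrix for four genes.
toy_corr = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.8],
    [0.2, 0.3, 1.0, 0.4],
    [0.1, 0.8, 0.4, 1.0],
]
toy_names = ["g0", "g1", "g2", "g3"]
# A small cutoff is used here only because the toy matrix has four genes.
toy_edges = mutual_rank_edges(toy_corr, toy_names, threshold=1.5)
```

Using ranks rather than raw correlation coefficients means that a pair is linked only when each gene is among the other's closest co-expression partners.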

To test if GWA-identified candidate genes showed tighter linkage to known GSL networks than expected by chance, the shortest paths between each candidate or randomly selected control gene and all verified GSL genes within the full co-expression network were compared using the R/igraph package [67],[131]–[133]. This analysis was performed independently for candidate genes found in the control, silver, or seedling datasets as well as for all GSL genes and a subset of randomly selected genes that were not significantly associated with GSL phenotypes within the GWA mapping (Figure S4). This analysis generated a distribution of path distances linking the set of GWA mapping candidate genes to the known GSL genes. We also repeated the analysis by dividing the GSL genes into each of the specific biosynthetic pathways to test if any specific pathways showed reduced path distances to GWA mapping candidates (Tables S7 and S8) [50],[92],[99],[134]–[135].
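
The path-distance computation can be sketched with a breadth-first search over an unweighted, undirected network. This Python stand-in uses only the standard library rather than R/igraph; the function names and the toy gene labels are hypothetical, and unreachable gene pairs are simply skipped, which is an assumption about how disconnected genes would be handled.

```python
from collections import deque

def bfs_distances(adjacency, source):
    """Shortest path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def path_lengths(edges, query_genes, gsl_genes):
    """Collect shortest path lengths from each query gene to each GSL gene."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    lengths = []
    for query in query_genes:
        dist = bfs_distances(adjacency, query)
        # Skip GSL genes that cannot be reached from this query gene.
        lengths.extend(dist[g] for g in gsl_genes if g in dist and g != query)
    return lengths

# Toy network: one candidate close to the GSL genes, one control farther away.
toy_network = [
    ("cand1", "x"), ("x", "GSL1"), ("GSL1", "GSL2"),
    ("rand1", "y"), ("y", "z"), ("z", "x"),
]
candidate_lengths = path_lengths(toy_network, ["cand1"], ["GSL1", "GSL2"])
control_lengths = path_lengths(toy_network, ["rand1"], ["GSL1", "GSL2"])
```

The two resulting lists correspond to the candidate and control distance distributions that are then compared statistically.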

We conducted two statistical tests to compare the null distribution (distances from non-significant genes to known GSL genes) with the GWA mapping candidate distribution (distances from GWA candidate genes to known GSL genes). The Wilcoxon Rank Sum Test tests the probability of a location shift between the distribution of the shortest paths of all GWA mapping candidate genes (from one of the three datasets) to all known GSL genes and the distribution of the shortest paths of all non-significantly associated genes to all known GSL genes. The Ansari-Bradley Test examines the probability that the two aforementioned distributions are differently dispersed. Both statistical tests were conducted using the full GSL network list as well as each individual biosynthetic pathway (Tables S7 and S8).
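
The location comparison can be illustrated with a small Python implementation of the two-sided Wilcoxon rank-sum test. This sketch uses midranks for ties, a tie-corrected normal approximation, and no continuity correction, so its p values may differ slightly from those of statistical packages; it covers only the location test, not the Ansari-Bradley dispersion test, and the function name is hypothetical.

```python
import math
from collections import Counter

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (U statistic for x, approximate p value). Ties receive
    midranks and the variance is tie-corrected. No continuity
    correction is applied (an assumption of this sketch).
    """
    x, y = list(x), list(y)
    n1, n2 = len(x), len(y)
    pooled = sorted(x + y)

    # Assign each distinct value the average of the ranks it occupies.
    midrank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        midrank[pooled[i]] = (i + 1 + j) / 2
        i = j

    rank_sum_x = sum(midrank[v] for v in x)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2

    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u_x, 1.0
    z = (u_x - n1 * n2 / 2) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_value

# Toy path-length distributions: candidates sit closer than controls.
u_stat, p_val = rank_sum_test([1, 2, 2, 3], [3, 4, 4, 5, 5])
```

Path lengths are small integers, so ties are frequent, which is why the tie correction is included even in this minimal version.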

GWAS Candidate Gene Selection and Validation

We focused our validation efforts on a set of GWA-identified candidate gene co-expression networks that differed in the number of member genes (levels of membership). Criteria for selection of candidate genes from these networks for testing were connectedness (the gene had to show correlated expression levels [MR <16] with multiple candidate genes within the network) and availability of viable mutants. These mutants were either a pre-existing characterized mutant line or a homozygous T-DNA mutation within an early exon of the candidate gene available from the Arabidopsis Biological Resource Center (ABRC) [136]. For each network tested, we attempted to test at least four separate genes within the network for altered GSL accumulation. We obtained putative homozygous T-DNA mutants for 18 candidate genes and validated their homozygosity using a PCR assay. Primers for the assay were designed using the SALK SIGnAL iSect primer design tool (http://signal.salk.edu/tdnaprimers.2.html). Of the 18 T-DNA mutants surveyed, homozygous mutants could not be obtained for 11 mutants, likely from lethality. In these cases, heterozygote lines were allowed to self-pollinate, and homozygous seed stocks were obtained by single seed descent following PCR-based genotyping of the progeny. In the absence of a homozygous line, we tested GSL content within the adult rosette leaves within PCR-confirmed heterozygous individuals. We also obtained mutants deficient in function at the following loci: phototropin1/phototropin2 (phot1/phot2) (4 lines), Gigantea (gi) (8 alleles), Erecta (er) in Col-0, and early flowering 3-1 (elf3-1) [79],[137]–[140]. Plants were grown under 10 h of light for 5 wk using a randomized complete block design over two experiments with at least four biological replicates per experiment. Leaf area and GSL content of the first true leaf were obtained as described above. A Dunnett's t-test was conducted to test the statistical significance of differences in GSL content between the mutant and wild-type while correcting for multiple comparisons using the R/multcomp package (Table S9) [141]. GSL were measured in at least two biological replicates per genotype, averaging 17 total individual measurements per genotype across the two replicates (min = 8, max = 48) (Table S9). Only wild-type controls grown concurrently with the mutants were used for the statistical comparison.

Measuring Glucosinolate Accumulation between the Bay-0 and Sha ELF3 Alleles

We utilized previously generated quantitative complementation lines to validate that natural variation in the ELF3 locus did alter GSL accumulation [77]. elf3:Bay-0 and elf3:Sha transgenic T1 seeds were planted on soil alongside elf3-1 mutants and wild-type Col-0 as controls [77]. The extreme hypocotyl length and cotyledon color phenotypes of the elf3-1 mutants were assessed to distinguish transformed from untransformed plants [137]. Transformed plants were grown for 25 d in a 10 h photoperiod. At 25 d, leaf tissue was harvested from each plant and individually extracted and assayed via HPLC for glucosinolate composition and concentration as previously described [41],[69]. The experiment was replicated 5 times for a total of 41 elf3:Bay-0 and 44 elf3:Sha independent T1 plants. GSL differences between the two ELF3 alleles were tested as described above.

Supporting Information

Dataset S1

GWA network candidate results. This dataset contains the GWA network candidate output results in a .net file ready for import into Pajek.

(TXT)

Figure S1

Trait distributions from leaf-control, leaf-AgNO3, and seedling datasets. Distributions of total aliphatic (left) and total indolic (right) glucosinolates are shown as examples to illustrate the differences between the three datasets. Seedling glucosinolates are presented in amount per seedling to control for differences in cellular expansion.

(TIF)

Figure S2

VENN Diagram of positive calls and trait groups. VENN diagrams showing the numbers of GWAS significantly associated genes for each dataset, Silver, Control, and Seedling, are shown. The GSL were separated into four trait groups based on previous biochemical analysis; INDOLE, indolic GSL; OHBUT, 2-hydroxy-but-3-enyl GSL traits; LC, 7 and 8 C long methionine derived GSL; SC, 3 and 4 C long methionine derived GSL. Two Seedling specific trait groups were also included for seedling specific GSL; BZO, benzoyloxy GSL and Benzyl are phenylalanine derived GSL. The bottom right VENN diagram displays overlap between the four common trait groups and the seedling specific groups.

(TIF)

Figure S3

Core GSL co-expression network. The known or predicted GSL genes generate a core GSL co-expression network that is expanded in this presentation for legibility. The general biochemical functions of the four major clusters within this super network are labeled. Three of the major clusters are further magnified to provide gene identification.

(PDF)

Figure S4

Distributions of shortest distances between known GSL genes and GWA candidates. Shown are plots comparing the distributions of the shortest distances between known GSL genes and GWA candidates for the control (red), silver (green), and seedling (blue) datasets. For comparison similar distributions derived from non-GWA-candidates (all genes) are also shown (black lines). Pw is the Wilcoxon Rank Sum test p value comparing the probability of a location shift between the distribution of the shortest paths of all GWA candidate genes (from one of the three datasets) to the corresponding glucosinolate gene and the distribution of the shortest paths of all non-significantly associated genes to the corresponding glucosinolate gene. Pa is the Ansari-Bradley Test probability assessing the difference in dispersion between the two aforementioned distributions. This was done for all GSL genes as well as for each of the specific biosynthetic networks as defined.

(PDF)

Table S1

Estimates of variance components for GSL in AgNO3 study. For each glucosinolate trait the following model was examined: $y_{sar} \sim \mu + S_s + A(S)_{sa} + T_t + R(T)_{tr} + T_t{:}S_s + T_t{:}A(S)_{sa}$, where $\mu$ is the intercept, $S_s$ is the K [set membership] {1,…,8} value for the corresponding accession [27]–[28], $A(S)_{sa}$ is the effect of accession nested in structure, $T_t$ is the effect of AgNO3-treatment, and $R(T)_{tr}$ is the biological/technical replicate of the measure. The model was evaluated by combining the control (untreated mature leaves) and AgNO3-treated datasets. F, F-statistic of the model; P(F), nominal p value of the F-statistic; DF(num), numerator degrees of freedom; DF(denom), denominator degrees of freedom; R², fraction of total variance explained by the model; η²(x), partial R² of the corresponding predictor variable; and P(x), p value of the corresponding predictor variable.

(XLS)

Table S2

Estimates of variance components for GSL in seedling study. For each glucosinolate trait the following model was examined: $y_{sar} \sim \mu + S_s + A(S)_{sa} + T_t + R(T)_{tr} + T_t{:}S_s + T_t{:}A(S)_{sa}$, where $\mu$ is the intercept, $S_s$ is the K [set membership] {1,…,8} value for the corresponding accession [27]–[28], $A(S)_{sa}$ is the effect of accession nested in structure, $T_t$ is the effect of tissue type (mature leaves versus seedlings), and $R(T)_{tr}$ is the biological/technical replicate of the measure. The model was evaluated by combining the control (mature leaves) and seedling datasets. F, F-statistic of the model; P(F), nominal p value of the F-statistic; DF(num), numerator degrees of freedom; DF(denom), denominator degrees of freedom; R², fraction of total variance explained by the model; η²(x), partial R² of the corresponding predictor variable; and P(x), p value of the corresponding predictor variable.

(XLS)
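
As a rough illustration of the effect-size columns in Tables S1 and S2, the sketch below computes a model R² and a partial η² per term from sums of squares. It assumes the conventional definition, partial η²(x) = SS(x) / (SS(x) + SS(residual)), and assumes that the term sums of squares add up to the model sum of squares (as with sequential sums of squares). The source does not state the exact formula used, and the numbers and the function name `effect_sizes` are made up for illustration.

```python
def effect_sizes(ss_terms, ss_residual):
    """Return (R2, {term: partial eta squared}) from sums of squares.

    ss_terms: mapping of model term -> sum of squares (assumed additive).
    ss_residual: residual (error) sum of squares.
    """
    ss_total = sum(ss_terms.values()) + ss_residual
    r2 = 1 - ss_residual / ss_total
    partial_eta2 = {
        term: ss / (ss + ss_residual) for term, ss in ss_terms.items()
    }
    return r2, partial_eta2


# Hypothetical sums of squares for the six model terms (not real data).
ss_terms = {
    "S": 30.0,
    "A(S)": 35.0,
    "T": 10.0,
    "R(T)": 5.0,
    "T:S": 5.0,
    "T:A(S)": 5.0,
}
r2, partial_eta2 = effect_sizes(ss_terms, ss_residual=10.0)
```

Partial η² for a term ignores variance attributed to the other predictors, which is why the per-term values can sum to more than the overall R².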

Table S3

Genetic means of glucosinolate abundance per accession for silver-treated accessions. All metabolite values are in nmol per mg fresh weight tissue. Shown are the predicted means from four independent plants treated with silver nitrate per accession as per the statistical model: $y_{sar} = \mu + A_a + R_r + \varepsilon_{sar}$. Treated and untreated camalexin values are presented and are considered related to the indole GSL metabolites.

(XLS)

Table S4

Genetic means of glucosinolate abundance per accession for seedlings. All metabolite values are in nmol per seedling. Shown are the predicted means from four independent samples per accession as per the statistical model: $y_{sar} = \mu + A_a + R_r + \varepsilon_{sar}$.

(XLS)

Table S5

Gene-to-trait associations as identified using silver-treated samples. Logical table indicating whether each of 31,505 genes is significantly associated with each of the 46 traits within the silver-treated samples. AGI is the gene code, Chr is the chromosome, and Start and End are the positions of the gene in base pairs. For each trait, a gene is significantly associated if at least two SNPs within ±1 kb flanking the coding region have a p value in the bottom 0.1 percentile of the p value distribution. T, significant; F, not significant. Genes with no significant associations are not listed.

(XLS)

Table S6

Gene-to-trait associations as identified using seedling material. Logical table indicating whether each of 31,505 genes is significantly associated with each of the 46 traits within the seedling samples. AGI is the gene code, Chr is the chromosome, and Start and End are the positions of the gene in base pairs. For each trait, a gene is significantly associated if at least two SNPs within ±1 kb flanking the coding region have a p value in the bottom 0.1 percentile of the p value distribution. T, significant; F, not significant. Genes with no significant associations are not listed.

(XLS)
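
The gene-level significance rule used for Tables S5 and S6 can be sketched as follows. This is a minimal illustration under stated assumptions: the "±1 kb flanking" window is read as the gene span extended by 1 kb on each side, the threshold is taken as the largest p value among the lowest 0.1% of SNP p values for one trait, and the gene names, coordinates, p values, and the function `significant_genes` are all hypothetical.

```python
def significant_genes(genes, snps, flank=1000, fraction=0.001, min_hits=2):
    """Flag genes with at least `min_hits` top-ranked SNPs in or near them.

    genes: {gene_id: (chromosome, start, end)} in base pairs.
    snps: list of (chromosome, position, p_value) for one trait.
    fraction: share of SNPs counted as top-ranked (0.001 = 0.1 percentile).
    """
    pvals = sorted(p for _, _, p in snps)
    n_top = max(1, round(len(pvals) * fraction))
    cutoff = pvals[n_top - 1]  # largest p value still inside the top fraction
    hits = [(chrom, pos) for chrom, pos, p in snps if p <= cutoff]

    flags = {}
    for gene_id, (chrom, start, end) in genes.items():
        count = sum(
            1
            for hit_chrom, pos in hits
            if hit_chrom == chrom and start - flank <= pos <= end + flank
        )
        flags[gene_id] = count >= min_hits
    return flags


# Hypothetical data: 1,998 background SNPs plus two strongly associated SNPs.
snps = [(1, 100 * i + 50, 0.5) for i in range(1998)]
snps += [(1, 5200, 1e-8), (1, 6900, 1e-7)]

genes = {
    "GENE_A": (1, 5000, 6000),    # both top SNPs fall within 4,000..7,000
    "GENE_B": (1, 7800, 9000),    # only one top SNP within 6,800..10,000
    "GENE_C": (1, 50000, 52000),  # no top SNP nearby
}
flags = significant_genes(genes, snps)
```

Requiring two SNPs rather than one trades some sensitivity for protection against isolated spurious associations; the `min_hits` and `flank` parameters make that trade-off explicit.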

Table S7

Known and putative genes involved in the GSL pathway. List of genes either known or predicted to play a role in GSL metabolism and regulation. AGI, the AGI (Arabidopsis Genome Initiative) code for each gene; Pathway, the specific part of the GSL metabolic system in which the gene is thought to function; Pseudogene, whether or not the gene is predicted to be a pseudogene; Evidence, experimental evidence (Genetic or Biochemical) or sequence evidence based on homology to a validated GSL gene (Homology).

(XLS)

Table S8

GSL abbreviations.

(XLS)

Table S9

Mutant analysis for altered GSL accumulation. Chemical and statistical analysis for the various single gene mutants and genotypes queried within the manuscript. The wildtype to mutant comparison being conducted is shown in bold at the start of each subtable. The average value for mutant and control are shown in the top table for each mutant while the standard error is shown in the second table. The p value comparing the two genotypes is on the line labeled p value and n shows the number of independent plants measured per line.

(XLS)

Table S10

Estimated phenotypic variance determined by significant GWAS candidates. Abbreviations per glucosinolate are as described in Table S8. Percent phenotypic variations are as described in Materials and Methods. Analysis was conducted independently for each dataset.

(XLS)

Abbreviations

EMMA: efficient mixed-model association
eQTLs: expression quantitative trait loci
GSL: glucosinolate
GWA: genome-wide association
LD: linkage disequilibrium
QTL: quantitative trait loci
RIL: recombinant inbred line

Footnotes

The authors have declared that no competing interests exist.

This work was funded by National Science Foundation grants DBI 0642481 and MCB 0323759 to DJK. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Hirschhorn J. N, Daly M. J. Genome-wide association studies for common diseases and complex traits. Nature Reviews Genetics. 2005;6:95–108. [PubMed]
2. Spencer C. C, Su Z, Donnelly P, Marchini J. Designing genome-wide association studies: sample size, power, imputation, and the choice of genotyping chip. PLoS Genet. 2009;5:e1000477. doi: 10.1371/journal.pgen.1000477. [PMC free article] [PubMed]
3. Atwell S, Huang Y, Vilhjalmsson B. J, Willems G, Horton M, et al. Genome-wide association study of 107 phenotypes in a common set of Arabidopsis thaliana in-bred lines. Nature. 2010 In press. [PMC free article] [PubMed]
4. Chan E. K. F, Rowe H. C, Kliebenstein D. J. Understanding the evolution of defense metabolites in Arabidopsis thaliana using genome-wide association mapping. Genetics. 2010;185:991–1007. [PMC free article] [PubMed]
5. Chan E. K, Rowe H. C, Hansen B. G, Kliebenstein D. J. The complex genetic architecture of the metabolome. PLoS Genet. 2010;6:e1001198. doi: 10.1371/journal.pgen.1001198. [PMC free article] [PubMed]
6. Mackay T. F. C. The genetic architecture of quantitative traits. Annual Review Of Genetics. 2001;35:303–339. [PubMed]
7. Mackay T. F. C. Q&A: genetic analysis of quantitative traits. Journal of Biology. 2009;8:23. [PMC free article] [PubMed]
8. Manolio T. A, Collins F. S, Cox N. J, Goldstein D. B, Hindorff L. A, et al. Finding the missing heritability of complex diseases. Nature. 2009;461:747–753. [PMC free article] [PubMed]
9. Liu Y. J, Papasian C. J, Liu J. F, Hamilton J, Deng H. W. Is replication the gold standard for validating genome-wide association findings? PLoS ONE. 2008;3 doi: 10.1371/journal.pone.0004037. [PMC free article] [PubMed]
10. Hawkins R. D, Hon G. C, Ren B. Next-generation genomics: an integrative approach. Nature Reviews Genetics. 2010;11:476–486. [PMC free article] [PubMed]
11. Nicolae D. L, Gamazon E, Zhang W, Duan S. W, Dolan M. E, et al. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet. 2010;6 doi: 10.1371/journal.pgen.1000888. [PMC free article] [PubMed]
12. Su W. L, Sieberts S. K, Kleinhanz R. R, Lux K, Millstein J, et al. Assessing the prospects of genome-wide association studies performed in inbred mice. Mammalian Genome. 2010;21:143–152. [PubMed]
13. Wooten E. C, Iyer L. K, Montefusco M, Hedgepeth A. K, Payne D. D, et al. Application of gene network analysis techniques identifies AXIN1/PDIA2 and endoglin haplotypes associated with bicuspid aortic valve. PLoS ONE. 2010;5 doi: 10.1371/journal.pone.0008830. [PMC free article] [PubMed]
14. Filiault D. L, Wessinger C. A, Dinneny J. R, Lutes J, Borevitz J. O, et al. Amino acid polymorphisms in Arabidopsis phytochrome B cause differential responses to light. Proc Natl Acad Sci U S A. 2008;105:3157–3162. [PMC free article] [PubMed]
15. Baranzini S. E, Galwey N. W, Wang J, Khankhanian P, Lindberg R, et al. Pathway and network-based analysis of genome-wide association studies in multiple sclerosis. Human Molecular Genetics. 2009;18:2078–2090. [PMC free article] [PubMed]
16. Koornneef M, Alonso-Blanco C, Vreugdenhil D. Naturally occurring genetic variation in Arabidopsis thaliana. Annual Review of Plant Biology. 2004;55:141–172. [PubMed]
17. Clark R. M, Schweikert G, Toomajian C, Ossowski S, Zeller G, et al. Common sequence polymorphisms shaping genetic diversity in Arabidopsis thaliana. Science. 2007;317:338–342. [PubMed]
18. West M. A. L, Kim K, Kliebenstein D. J, van Leeuwen H, Michelmore R. W, et al. Global eQTL mapping reveals the complex genetic architecture of transcript level variation in Arabidopsis. Genetics. 2007;175:1441–1450. [PMC free article] [PubMed]
19. Keurentjes J. J. B, Fu J. Y, Terpstra I. R, Garcia J. M, van den Ackerveken G, et al. Regulatory network construction in Arabidopsis by using genome-wide gene expression quantitative trait loci. Proc Natl Acad Sci U S A. 2007;104:1708–1713. [PMC free article] [PubMed]
20. Rowe H. C, Kliebenstein D. J. Complex genetics control natural variation in Arabidopsis thaliana resistance to Botrytis cinerea. Genetics. 2008;180:2237–2250. [PMC free article] [PubMed]
21. Rowe H. C, Hansen B. G, Halkier B. A, Kliebenstein D. J. Biochemical networks and epistasis shape the Arabidopsis thaliana metabolome. Plant Cell. 2008;20:1199–1216. [PMC free article] [PubMed]
22. Caicedo A. L, Stinchcombe J. R, Olsen K. M, Schmitt J, Purugganan M. D. Epistatic interaction between Arabidopsis FRI and FLC flowering time genes generates a latitudinal cline in a life history trait. Proc Natl Acad Sci U S A. 2004;101:15670–15675. [PMC free article] [PubMed]
23. Malmberg R. L, Held S, Waits A, Mauricio R. Epistasis for fitness-related quantitative traits in Arabidopsis thaliana grown in the field and in the greenhouse. Genetics. 2005;171:2013–2027. [PMC free article] [PubMed]
24. Alcazar R, Garcia A. V, Parker J. E, Reymond M. Incremental steps toward incompatibility revealed by Arabidopsis epistatic interactions modulating salicylic acid pathway activation. Proc Natl Acad Sci U S A. 2009;106:334–339. [PMC free article] [PubMed]
25. Bomblies K, Lempe J, Epple P, Warthmann N, Lanz C, et al. Autoimmune response as a mechanism for a Dobzhansky-Muller-type incompatibility syndrome in plants. PLoS Biol. 2007;5:1962–1972. doi: 10.1371/journal.pbio.0050236. [PMC free article] [PubMed]
26. Bikard D, Patel D, Le Mette C, Giorgi V, Camilleri C, et al. Divergent evolution of duplicate genes leads to genetic incompatibilities within A. thaliana. Science. 2009;323:623–626. [PubMed]
27. Nordborg M, Borevitz J. O, Bergelson J, Berry C. C, Chory J, et al. The extent of linkage disequilibrium in Arabidopsis thaliana. Nature Genetics. 2002;30:190–193. [PubMed]
28. Nordborg M, Hu T. T, Ishino Y, Jhaveri J, Toomajian C, et al. The pattern of polymorphism in Arabidopsis thaliana. PLoS Biol. 2005;3:e196. doi: 10.1371/journal.pbio.0030196. [PMC free article] [PubMed]
29. Kim S, Plagnol V, Hu T. T, Toomajian C, Clark R. M, et al. Recombination and linkage disequilibrium in Arabidopsis thaliana. Nature Genetics. 2007;39:1151–1155. [PubMed]
30. Zhao K. Y, Aranzana M. J, Kim S, Lister C, Shindo C, et al. An Arabidopsis example of association mapping in structured samples. PLoS Genet. 2007;3 doi: 10.1371/journal.pgen.0030004. [PMC free article] [PubMed]
31. Kliebenstein D. J. A quantitative genetics and ecological model system: understanding the aliphatic glucosinolate biosynthetic network via QTLs. Phytochem Rev. 2009;8:243–254.
32. Fan J, Crooks C, Creissen G, Hill L, Fairhurst S, et al. Pseudomonas sax genes overcome aliphatic isothiocyanate–mediated non-host resistance in arabidopsis. Science. 2011;331:1185–1188. [PubMed]
33. Bidart-Bouzat M. G, Kliebenstein D. J. Differential levels of insect herbivory in the field associated with genotypic variation in glucosinolates in Arabidopsis thaliana. Journal of Chemical Ecology. 2008;34:1026–1037. [PubMed]
34. Lankau R. A, Kliebenstein D. J. Competition, herbivory and genetics interact to determine the accumulation and fitness consequences of a defence metabolite. Journal of Ecology. 2009;97:78–88.
35. Mauricio R. Costs of resistance to natural enemies in field populations of the annual plant Arabidopsis thaliana. American Naturalist. 1998;151:20–28. [PubMed]
36. Clay N. K, Adio A. M, Denoux C, Jander G, Ausubel F. M. Glucosinolate metabolites required for an arabidopsis innate immune response. Science. 2009;323:95–101. [PMC free article] [PubMed]
37. de Vos M, Kriksunov K. L, Jander G. Indole-3-acetonitrile production from indole glucosinolates deters oviposition by Pieris rapae. Plant Physiol. 2008;146:916–926. [PMC free article] [PubMed]
38. Kim J. H, Jander G. Myzus persicae (green peach aphid) feeding on Arabidopsis induces the formation of a deterrent indole glucosinolate. The Plant Journal. 2007;49:1008–1019. [PubMed]
39. Bednarek P, Pislewska-Bednarek M, Svatos A, Schneider B, Doubsky J, et al. A glucosinolate metabolism pathway in living plant cells mediates broad-spectrum antifungal defense. Science. 2009;323:101–106. [PubMed]
40. Pfalz M, Vogel H, Mitchell-Olds T, Kroymann J. Mapping of QTL for resistance against the crucifer specialist herbivore Pieris brassicae in a new Arabidopsis inbred line population, Da(1)-12×Ei-2. PLoS ONE. 2007;2:e578. [PMC free article] [PubMed]
41. Kliebenstein D. J, Gershenzon J, Mitchell-Olds T. Comparative quantitative trait loci mapping of aliphatic, indolic and benzylic glucosinolate production in Arabidopsis thaliana leaves and seeds. Genetics. 2001;159:359–370. [PMC free article] [PubMed]
42. Raybould A. F, Moyes C. L. The ecological genetics of aliphatic glucosinolates. Heredity. 2001;87:383–391. [PubMed]
43. Lankau R. A, Strauss S. Y. Mutual feedbacks maintain both genetic and species diversity in a plant community. Science. 2007;317:1561–1563. [PubMed]
44. Lankau R. A. Specialist and generalist herbivores exert opposing selection on a chemical defense. New Phytologist. 2007;175:176–184. [PubMed]
45. Lankau R. A, Strauss S. Y. Community complexity drives patterns of natural selection on a chemical Defense of Brassica nigra. American Naturalist. 2008;171:150–161. [PubMed]
46. Benderoth M, Textor S, Windsor A. J, Mitchell-Olds T, Gershenzon J, et al. Positive selection driving diversification in plant secondary metabolism. Proc Natl Acad Sci U S A. 2006;103:9118–9123. [PMC free article] [PubMed]
47. Bakker E. G, Traw M. B, Toomajian C, Kreitman M, Bergelson J. Low levels of polymorphism in genes that control the activation of defense response in Arabidopsis thaliana. Genetics. 2008;178:2031–2043. [PMC free article] [PubMed]
48. Brown P. D, Tokuhisa J. G, Reichelt M, Gershenzon J. Variation of glucosinolate accumulation among different organs and developmental stages of Arabidopsis thaliana. Phytochem. 2003;62:471–481. [PubMed]
49. Wentzell A. M, Boeye I, Zhang Z. Y, Kliebenstein D. J. Genetic networks controlling structural outcome of glucosinolate activation across development. PLoS Genet. 2008;4 doi: 10.1371/journal.pgen.1000234. [PMC free article] [PubMed]
50. Wentzell A. M, Kliebenstein D. J. Genotype, age, tissue, and environment regulate the structural outcome of glucosinolate activation. Plant Physiology. 2008;147:415–428. [PMC free article] [PubMed]
51. Kliebenstein D. J, Figuth A, Mitchell-Olds T. Genetic architecture of plastic methyl jasmonate responses in Arabidopsis thaliana. Genetics. 2002;161:1685–1696. [PMC free article] [PubMed]
52. Grubb C. D, Abel S. Glucosinolate metabolism and its control. Trends in Plant Science. 2006;11:89–100. [PubMed]
53. Wittstock U, Halkier B. A. Glucosinolate research in the Arabidopsis era. Trends Plant Sci. 2002;7:263–270. [PubMed]
54. Halkier B. A, Gershenzon J. Biology and biochemistry of glucosinolates. Annual Review of Plant Biology. 2006;57:303–333. [PubMed]
55. Li J, Hansen B. G, Ober J. A, Kliebenstein D. J, Halkier B. A. Subclade of flavin-monooxygenases involved in aliphatic glucosinolate biosynthesis. Plant Physiology. 2008;148:1721–1733. [PMC free article] [PubMed]
56. Hansen B. G, Kerwin R. E, Ober J. A, Lambrix V. M, Mitchell-Olds T, et al. A novel 2-oxoacid-dependent dioxygenase involved in the formation of the goiterogenic 2-hydroxybut-3-enyl glucosinolate and generalist insect resistance in arabidopsis. Plant Physiology. 2008;148:2096–2108. [PMC free article] [PubMed]
57. Sønderby I. E, Hansen B. G, Bjarnholt N, Ticconi C, Halkier B. A, et al. A systems biology approach identifies a R2R3 MYB gene subfamily with distinct and overlapping functions in regulation of aliphatic glucosinolates. PLoS ONE. 2007;2:e1322. doi: 10.1371/journal.pone.0001322. [PMC free article] [PubMed]
58. Hansen B. G, Kliebenstein D. J, Halkier B. A. Identification of a flavin-monooxygenase as the S-oxygenating enzyme in aliphatic glucosinolate biosynthesis in Arabidopsis. The Plant Journal. 2007;50:902–910. [PubMed]
59. Hirai M, Sugiyama K, Sawada Y, Tohge T, Obayashi T, et al. Omics-based identification of Arabidopsis Myb transcription factors regulating aliphatic glucosinolate biosynthesis. Proc Natl Acad Sci U S A. 2007;104:6478–6483. [PMC free article] [PubMed]
60. Kliebenstein D. J, D'Auria J. C, Behere A. S, Kim J. H, Gunderson K. L, et al. Characterization of seed-specific benzoyloxyglucosinolate mutations in Arabidopsis thaliana. The Plant Journal. 2007;51:1062–1076. [PubMed]
61. Wentzell A. M, Rowe H. C, Hansen B. G, Ticconi C, Halkier B. A, et al. Linking metabolic QTL with network and cis-eQTL controlling biosynthetic pathways. PLoS Genet. 2007;3:e162. doi: 10.1371/journal.pgen.0030162. [PMC free article] [PubMed]
62. Kliebenstein D. J, Kroymann J, Brown P, Figuth A, Pedersen D, et al. Genetic control of natural variation in Arabidopsis thaliana glucosinolate accumulation. Plant Physiol. 2001;126:811–825. [PMC free article] [PubMed]
63. Kliebenstein D. J, Pedersen D, Mitchell-Olds T. Comparative analysis of quantitative trait loci controlling glucosinolates, myrosinase and insect resistance in Arabidopsis thaliana. Genetics. 2002;161:325–332. [PMC free article] [PubMed]
64. Kliebenstein D. J, Rowe H. C, Denby K. J. Secondary metabolites influence Arabidopsis/Botrytis interactions: variation in host production and pathogen sensitivity. Plant Journal. 2005;44:25–36. [PubMed]
65. Kang H. M, Zaitlen N. A, Wade C. M, Kirby A, Heckerman D, et al. Efficient control of population structure in model organism association mapping. Genetics. 2008;178:1709–1723. [PMC free article] [PubMed]
66. Aulchenko Y. S, Ripke S, Isaacs A, Van Duijn C. M. GenABEL: an R library for genome-wide association analysis. Bioinformatics. 2007;23:1294–1296. [PubMed]
67. R Development Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2008.
68. Sønderby I. E, Geu-Flores F, Halkier B. A. Biosynthesis of glucosinolates - gene discovery and beyond. Trends in Plant Science. 2010;15:283–290. [PubMed]
69. Kliebenstein D, Lambrix V, Reichelt M, Gershenzon J, Mitchell-Olds T. Gene duplication and the diversification of secondary metabolism: side chain modification of glucosinolates in Arabidopsis thaliana. Plant Cell. 2001;13:681–693. [PMC free article] [PubMed]
70. Kroymann J, Mitchell-Olds T. Epistasis and balanced polymorphism influencing complex trait variation. Nature. 2005;435:95–98. [PubMed]
71. Textor S, Bartram S, Kroymann J, Falk K. L, Hick A, et al. Biosynthesis of methionine-derived glucosinolates in Arabidopsis thaliana: recombinant expression and characterization of methylthioalkylmalate synthase, the condensing enzyme of the chain-elongation cycle. Planta. 2004;218:1026–1035. [PubMed]
72. Obayashi T, Kinoshita K, Nakai K, Shibaoka M, Hayashi S, et al. ATTED-II: a database of co-expressed genes and cis elements for identifying co-regulated gene groups in Arabidopsis. Nucleic Acids Research. 2007;35:D863–D869. [PMC free article] [PubMed]
73. Glazebrook J. Contrasting mechanisms of defense against biotrophic and necrotrophic pathogens. Annual Review of Phytopathology. 2005;43:205–227. [PubMed]
74. Schlaeppi K, Bodenhausen N, Buchala A, Mauch F, Reymond P. The glutathione-deficient mutant pad2-1 accumulates lower amounts of glucosinolates and is more susceptible to the insect herbivore Spodoptera littoralis. Plant Journal. 2008;55:774–786. [PubMed]
75. Geu-Flores F, Nielsen M. T, Nafisi M, Moldrup M. E, Olsen C. E, et al. Glucosinolate engineering identifies gamma-glutamyl peptidase. Nature Chemical Biology. 2009;5:575–577. [PubMed]
76. Clarke J, Mithen R, Brown J, Dean C. QTL analysis of flowering time in Arabidopsis thaliana. Mol Gen Genet. 1995;248:278–286. [PubMed]
77. Jiménez-Gómez J. M, Wallace A, Maloof J. N. QTL and network analysis of the shade avoidance response in Arabidopsis. PLoS Genet. 2010;6(9):e1001100. doi: 10.1371/journal.pgen.1001100. [PMC free article] [PubMed]
78. Brock M. T, Tiffin P, Weinig C. Sequence diversity and haplotype associations with phenotypic responses to crowding: GIGANTEA affects fruit set in Arabidopsis thaliana. Molecular Ecology. 2007;16:3050–3062. [PubMed]
79. Briggs W. R, Christie J. M. Phototropins 1 and 2: versatile plant blue-light receptors. Trends in Plant Science. 2002;7:204–210. [PubMed]
80. Hasegawa T, Yamada K, Kosemura S, Yamamura S, Hasegawa K. Phototropic stimulation induces the conversion of glucosinolate to phototropism-regulating substances of radish hypocotyls. Phytochemistry. 2000;54:275–279. [PubMed]
81. Yamada K, Hasegawa T, Minami E, Shibuya N, Kosemura S, et al. Induction of myrosinase gene expression and myrosinase activity in radish hypocotyls by phototropic stimulation. Journal of Plant Physiology. 2003;160:255–259. [PubMed]
82. Lister C, Dean C. Recombinant inbred lines for mapping RFLP and phenotypic markers in Arabidopsis thaliana. Plant Journal. 1993;4:745–750.
83. Mithen R, Clarke J, Lister C, Dean C. Genetics of aliphatic glucosinolates. III. Side-chain structure of aliphatic glucosinolates in Arabidopsis thaliana. Heredity. 1995;74:210–215.
84. Magrath R, Bano F, Morgner M, Parkin I, Sharpe A, et al. Genetics of aliphatic glucosinolates. I. Side chain elongation in Brassica napus and Arabidopsis thaliana. Heredity. 1994;72:290–299.
85. Alonso-Blanco C, Peeters A. J. M, Koornneef M, Lister C, Dean C, et al. Development of an AFLP based linkage map of Ler, Col and Cvi Arabidopsis thaliana ecotypes and construction of a Ler/Cvi recombinant inbred line population. Plant Journal. 1998;14:259–271. [PubMed]
86. Keurentjes J. J. B, Fu J. Y, de Vos C. H. R, Lommen A, Hall R. D, et al. The genetics of plant metabolism. Nature Genetics. 2006;38:842–849. [PubMed]
87. Harmer S. L. The circadian system in higher plants. Annual Review of Plant Biology. 2009;60:357–377. [PubMed]
88. Edwards K. D, Lynn J. R, Gyula P, Nagy F, Millar A. J. Natural allelic variation in the temperature-compensation mechanisms of the Arabidopsis thaliana circadian clock. Genetics. 2005;170:387–400. [PMC free article] [PubMed]
89. Stanne T. M, Pojidaeva E, Andersson F. I, Clarke A. K. Distinctive types of ATP-dependent Clp proteases in cyanobacteria. Journal of Biological Chemistry. 2007;282:14394–14402. [PubMed]
90. Kliebenstein D. J, West M. A. L, Van Leeuwen H, Kyunga K, Doerge R. W, et al. Genomic survey of gene expression diversity in Arabidopsis thaliana. Genetics. 2006;172:1179–1189. [PMC free article] [PubMed]
91. Van Leeuwen H, Kliebenstein D. J, West M. A. L, Kim K. D, van Poecke R, et al. Natural variation among Arabidopsis thaliana accessions for transcriptome response to exogenous salicylic acid. Plant Cell. 2007;19:2099–2110. [PMC free article] [PubMed]
92. Kliebenstein D. J. A role for gene duplication and natural variation of gene expression in the evolution of metabolism. PLoS ONE. 2008;3:e1838. doi: 10.1371/journal.pone.0001838. [PMC free article] [PubMed]
93. Byrne P. F, McMullen M. D, Wiseman B. R, Snook M. E, Musket T. A, et al. Maize silk maysin concentration and corn earworm antibiosis: QTLs and genetic mechanisms. Crop Science. 1998;38:461–471.
94. Loudet O, Chaillou S, Krapp A, Daniel-Vedele F. Quantitative trait loci analysis of water and anion contents in interaction with nitrogen availability in Arabidopsis thaliana. Genetics. 2003;163:711–722. [PMC free article] [PubMed]
95. Saito K, Hirai M, Yonekura-Sakakibara K. Decoding genes with coexpression networks and metabolomics – ‘majority report by precogs.’ Trends in Plant Science 2008 [PubMed]
96. Yonekura-Sakakibara K, Tohge T, Niida R, Saito K. Identification of a flavonol 7-O-rhamnosyltransferase gene determining flavonoid pattern in Arabidopsis by transcriptome coexpression analysis and reverse genetics. Journal of Biological Chemistry. 2007;282:14932–14941. [PubMed]
97. Maruyama-Nakashita A, Nakamura Y, Tohge T, Saito K, Takahashi H. Arabidopsis SLIM1 is a central transcriptional regulator of plant sulfur response and metabolism. Plant Cell. 2006;18:3235–3251. [PMC free article] [PubMed]
98. Hirai M. Y, Klein M, Fujikawa Y, Yano M, Goodenowe D. B, et al. Elucidation of gene-to-gene and metabolite-to-gene networks in Arabidopsis by integration of metabolomics and transcriptomics. Journal Of Biological Chemistry. 2005;280:25590–25595. [PubMed]
99. Sønderby I. E, Burow M, Rowe H. C, Kliebenstein D. J, Halkier B. A. A complex interplay of three R2R3 MYB transcription factors determines the profile of aliphatic glucosinolates in Arabidopsis. Plant Physiol. 2010;153:348–363. [PMC free article] [PubMed]
100. Keller B, Martini S, Sedor J, Kretzler M. Linking variants from genome-wide association analysis to function via transcriptional network analysis. Seminars in Nephrology. 2010;30:177–184. [PMC free article] [PubMed]
101. Wheeler H. E, Metter E. J, Tanaka T, Absher D, Higgins J, et al. Sequential use of transcriptional profiling, expression quantitative trait mapping, and gene association implicates MMP20 in human kidney aging. PLoS Genet. 2009;5 doi: 10.1371/journal.pgen.1000685. [PMC free article] [PubMed]
102. Ballard D, Abraham C, Cho J, Zhao H. Y. Pathway analysis comparison using Crohn's disease genome wide association studies. Bmc Medical Genomics. 2010;3 [PMC free article] [PubMed]
103. Kliebenstein D. J. Belostotsky D, editor. Quantification of variation in expression networks. Plant Systems Biology: Humana Press. 2009. [PubMed]
104. Kliebenstein D. Quantitative genomics: analyzing intraspecific variation using global gene expression polymorphisms or eQTLs. Annual Review of Plant Biology. 2009;60:93–114. [PubMed]
105. Buckler E. S, Holland J. B, Bradbury P. J, Acharya C. B, Brown P. J, et al. The genetic architecture of maize flowering time. Science. 2009;325:714–718. [PubMed]
106. Yang J. A, Benyamin B, McEvoy B. P, Gordon S, Henders A. K, et al. Common SNPs explain a large proportion of the heritability for human height. Nature Genetics. 2010;42:565-U131. [PMC free article] [PubMed]
107. Fisher R. A. The correlation between relatives on the supposition of Mendelian inheritance. Transactions of the Royal Society of Edinburgh. 1918;52:399–433.
108. Moyes C. L, Raybould A. F. The role of spatial scale and intraspecific variation in secondary chemistry in host-plant location by Ceutorhynchus assimilis (Coleoptera: Curculionidae). Proc Biol Sci. 2001;268:1567–1573. [PMC free article] [PubMed]
109. Tiffin P, Rausher M. D. Genetic constraints and selection acting on tolerance to herbivory in the common morning glory Ipomoea purpurea. American Naturalist. 1999;154:700–716. [PubMed]
110. Kalisz S, Kramer E. M. Variation and constraint in plant evolution and development. Heredity. 2008;100:171–177. [PubMed]
111. Leroi A. M. The scale independence of evolution. Evol Dev. 2000;2:67–77. [PubMed]
112. Stich B, Yu J. M, Melchinger A. E, Piepho H. P, Utz H. F, et al. Power to detect higher-order epistatic interactions in a metabolic pathway using a new mapping strategy. Genetics. 2007;176:563–570. [PMC free article] [PubMed]
113. Sergeeva L. I, Keurentjes J. J. B, Bentsink L, Vonk J, van der Plas L. H. W, et al. Vacuolar invertase regulates elongation of Arabidopsis thaliana roots as revealed by QTL and mutant analysis. Proc Natl Acad Sci U S A. 2006;103:2994–2999. [PMC free article] [PubMed]
114. Edwards C, Weinig C. The quantitative-genetic and QTL architecture of trait integration and modularity in Brassica rapa across simulated seasonal settings. Heredity. 2010 In Press. [PMC free article] [PubMed]
115. Klingenberg C. P. Morphological integration and developmental modularity. Annual Review of Ecology Evolution and Systematics. 2008;39:115–132.
116. Ohno S. Evolution by gene duplication. New York: Springer-Verlag; 1970.
117. Hughes A. L. The evolution of functionally novel proteins after gene duplication. Proc R Soc Lond Ser B Biol Sci. 1994;256:119–124. [PubMed]
118. Fraser H. B, Hirsh A. E, Steinmetz L. M, Scharfe C, Feldman M. W. Evolutionary rate in the protein interaction network. Science. 2002;296:750–752. [PubMed]
119. Lynch M, Force A. The probability of duplicate gene preservation by subfunctionalization. Genetics. 2000;154:459–473. [PMC free article] [PubMed]
120. Borevitz J. O, Hazen S. P, Michael T. P, Morris G. P, Baxter I. R, et al. Genome-wide patterns of single-feature polymorphism in Arabidopsis thaliana. Proc Natl Acad Sci U S A. 2007;104:12057–12062. [PMC free article] [PubMed]
121. Kliebenstein D, West M, van Leeuwen H, Loudet O, Doerge R, et al. Identification of QTLs controlling gene expression networks defined a priori. BMC Bioinformatics. 2006;7:308. [PMC free article] [PubMed]
122. Glawischnig E, Hansen B. G, Olsen C. E, Halkier B. A. Camalexin is synthesized from indole-3-acetaldoxime, a key branching point between primary and secondary metabolism in Arabidopsis. Proc Natl Acad Sci U S A. 2004;101:8245–8250. [PMC free article] [PubMed]
123. Bohlmann H, Vignutelli A, Hilpert B, Miersch O, Wasternack C, et al. Wounding and chemicals induce expression of the Arabidopsis thaliana gene Thi2.1, encoding a fungal defense thionin, via the octadecanoid pathway. Febs Letters. 1998;437:281–286. [PubMed]
124. Epple P, Apel K, Bohlmann H. An arabidopsis-thaliana thionin gene is inducible via a signal-transduction pathway different from that for pathogenesis-related proteins. Plant Physiology. 1995;109:813–820. [PMC free article] [PubMed]
125. Reichelt M, Brown P. D, Schneider B, Oldham N. J, Stauber E, et al. Benzoic acid glucosinolate esters and other glucosinolates from Arabidopsis thaliana. Phytochem. 2002;59:663–671. [PubMed]
126. Abramoff M. D, Magelhaes P. J, Ram S. J. Image processing with ImageJ. Biophotonics International. 2004;11:36–42.
127. Kliebenstein D. J. Metabolomics and plant quantitative trait locus analysis - the optimum genetical genomics platform? In: Nikolau B. J, Wurtele E. S, editors. Concepts in plant metabolomics. Dordrecht, The Netherlands: Springer; 2007. pp. 29–45.
128. Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K. ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Research. 2009;37:D987–D991. [PMC free article] [PubMed]
129. Obayashi T, Kinoshita K. Rank of correlation coefficient as a comparable measure for biological significance of gene coexpression. DNA Research. 2009;16:249–260. [PMC free article] [PubMed]
130. Batagelj V, Mrvar A. Pajek - analysis and visualization of large networks. Graph Drawing Lecture Notes in Computer Science. 2002;2265:477–478.
131. Ferres L, Parush A, Li Z. H, Oppacher Y, Lindgaard G. Butz A, Fisher B, Kruger A, Olivier P, editors. Representing and querying line graphs in natural language: The iGraph system. Smart graphics, proceedings. 2006. pp. 248–253.
132. Csardi G, Nepusz T. The igraph software package for complex network research. InterJournal Complex Systems. 2006:1695.
133. Csardi G. igraph: routines for network analysis R package 2005
134. Zhang P. F, Foerster H, Tissier C. P, Mueller L, Paley S, et al. MetaCyc and AraCyc. Metabolic pathway databases for plant research. Plant Physiology. 2005;138:27–37. [PMC free article] [PubMed]
135. Mueller L. A, Zhang P. F, Rhee S. Y. AraCyc: a biochemical pathway database for Arabidopsis. Plant Physiology. 2003;132:453–460. [PMC free article] [PubMed]
136. Alonso J. M, Stepanova A. N, Leisse T. J, Kim C. J, Chen H. M, et al. Genome-wide insertional mutagenesis of Arabidopsis thaliana. Science. 2003;301:653–657. [PubMed]
137. Zagotta M. T, Shannon S, Jacobs C, Meeks-Wagner D. R. Early-flowering mutants of Arabidopsis thaliana. Australian Journal of Plant Physiology. 1992;19:411–418.
138. Fowler S, Lee K, Onouchi H, Samach A, Richardson K, et al. GIGANTEA: a circadian clock-controlled gene that regulates photoperiodic flowering in Arabidopsis and encodes a protein with several possible membrane-spanning domains. Embo Journal. 1999;18:4679–4688. [PMC free article] [PubMed]
139. Park D. H, Somers D. E, Kim Y. S, Choy Y. H, Lim H. K, et al. Control of circadian rhythms and photoperiodic flowering by the Arabidopsis GIGANTEA gene. Science. 1999;285:1579–1582. [PubMed]
140. Torii K. U, Mitsukawa N, Oosumi T, Matsuura Y, Yokoyama R, et al. The arabidopsis ERECTA gene encodes a putative receptor protein kinase with extracellular leucine-rich repeats. Plant Cell. 1996;8:735–746. [PMC free article] [PubMed]
141. Hothorn T, Bretz F, Westfall P. Simultaneous inference in general parametric models. Biometrical Journal. 2008;50:346–363. [PubMed]
142. Kroymann J, Textor S, Tokuhisa J. G, Falk K. L, Bartram S, et al. A gene controlling variation in Arabidopsis glucosinolate composition is part of the methionine chain elongation pathway. Plant Physiology. 2001;127:1077–1088. [PMC free article] [PubMed]
143. Kliebenstein D, Lambrix V, Reichelt M, Gershenzon J, Mitchell-Olds T. Gene duplication in the diversification of secondary metabolism: Tandem 2-oxoglutarate–dependent dioxygenases control glucosinolate biosynthesis in Arabidopsis. Plant Cell. 2001;13:681–693. [PMC free article] [PubMed]

Articles from PLoS Biology are provided here courtesy of Public Library of Science