Copyright © 2007, Cold Spring Harbor Laboratory Press

An Arabidopsis gene network based on the graphical Gaussian model

1 Physiological and Molecular Plant Biology Program, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA; 2 Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA; 3 Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA

4 Corresponding author. E-mail bohnerth/at/life.uiuc.edu; fax: (217) 333-5574.

Received March 31, 2007; Accepted September 5, 2007.

Abstract

We describe a gene network for the Arabidopsis thaliana transcriptome based on a modified graphical Gaussian model (GGM). Through partial correlation (pcor), GGM infers coregulation patterns between gene pairs conditional on the behavior of other genes. Regularized GGM calculated pcor between gene pairs among ~2000 input genes at a time. Regularized GGM coupled with iterative random samplings of genes was expanded into a network that covered the Arabidopsis genome (22,266 genes). This resulted in a network of 18,625 interactions (edges) among 6760 genes (nodes) with high confidence and connections representing ~0.01% of all possible edges. When queried for selected genes, locally coherent subnetworks, mainly related to metabolic functions and stress responses, emerged. Examples of networks for biochemical pathways, cell wall metabolism, and cold responses are presented. GGM displayed known coregulation pathways as subnetworks and added novel components to known edges. Finally, the network reconciled individual subnetworks in a topology joined at the whole-genome level and provided a general framework that can instruct future studies on plant metabolism and stress responses. The network model is included.

Remarkable conceptual and technical advances in genomics have generated exceptionally large data sets.
Global analyses of these collections of data may now be used to construct biological networks that systematically categorize all molecules and describe their functions and interactions (Barabasi and Oltvai 2004). Networks are emerging that, oriented to highlight different levels of complexity and placing emphasis on distinct regulatory, developmental, or metabolic “pathways,” can now integrate biological functions of cells, organs, and organisms (Brazhnik et al. 2002). Most advanced are gene networks analyzing large-scale microarray hybridizations that monitor transcriptome dynamics (de la Fuente et al. 2002; Yugi et al. 2005). Emerging also are networks extracted from protein–protein interactions or protein complexes (Ito et al. 2001; Gavin et al. 2002), regulatory networks based on ChIP-chip data, which describe the interactions between transcription factors and their targets (Lee et al. 2002; Buck and Lieb 2004), or metabolic networks elucidating effects of the dynamics of metabolites (Baxter et al. 2007; Martins et al. 2007). Synthetic lethal networks extract genetic interactions critical for an organism’s fitness (Tong et al. 2004; Pan et al. 2006). In contrast to single-cell organisms, network reconstruction of higher organisms has been restricted mainly due to limitations in data availability. Nevertheless, in a complex system such as the plant model Arabidopsis thaliana, expression profiles extracted from microarray data sets offer information on physiological status, in particular, because data from time series and from developmental, genetic intervention, or manipulative treatments are available (Schmid et al. 2005; Kilian et al. 2007). The assembly of a gene network depends on the mathematical models applied, which, ideally, should describe inferred causal relationships that govern the expression patterns and dynamics of a set of genes. 
In reality, networks are assembled according to coincidence or coregulation of genes and the magnitude of regulation or statistical significance of the coincidence (Brazhnik et al. 2002). Currently, the most widely used computational method involves calculating standard Pearson correlation coefficients (r) between pairs of genes. A pair of genes with r larger than a preselected threshold is considered to reveal functional interaction, influence, or dependence. Networks based on these interactions are termed relevance networks. However, such networks may lead to ambiguous results, especially when the network is heavily connected (Brazhnik et al. 2002). An alternative method, the graphical Gaussian model (GGM), uses partial correlations as the source for a robust assessment of a direct interaction between any gene pair (Whittaker 1990; Toh and Horimoto 2002). Unlike Pearson correlation, which records correlation between gene pairs without regard to other genes, partial correlation between two genes measures the degree of correlation remaining after removing the effects of other genes. Recent studies have demonstrated that GGM is a useful tool to infer conditional dependency structure and to reconstruct network-like associations among genes (Kishino and Waddell 2000; Toh and Horimoto 2002; Magwene and Kim 2004; Schäfer and Strimmer 2005b; Wille and Buhlmann 2006). Despite the potential intrinsic to GGM, its application to network inference had previously been restricted to a small number of genes (Kishino and Waddell 2000; Toh and Horimoto 2002) because of the generally small number of samples (n) available from microarray experiments. This number is typically much smaller than the number of genes (P). Classical GGM theory cannot accommodate settings for P >> n (Schäfer and Strimmer 2005a; Wille and Buhlmann 2006).
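To make the contrast between Pearson and partial correlation concrete, the following is a minimal sketch, not the authors' code. It relies on the standard identity that the partial correlation between variables i and j given all others equals −ω_ij / √(ω_ii ω_jj), where ω is the inverse covariance matrix. The three simulated "genes" a, b, and c, and the function name `partial_correlations`, are illustrative assumptions; the data form a chain a → b → c, so a and c are related only through b.

```python
import numpy as np

def partial_correlations(data):
    """Full-order partial correlations for the columns of `data` (samples x genes).

    Uses pcor_ij = -omega_ij / sqrt(omega_ii * omega_jj), where omega is the
    inverse covariance matrix. This classical estimate needs more samples than
    genes so that the covariance matrix can be inverted.
    """
    cov = np.cov(data, rowvar=False)
    omega = np.linalg.inv(cov)
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Simulated chain a -> b -> c (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
n = 5000
a = rng.normal(size=n)
b = a + 0.5 * rng.normal(size=n)
c = b + 0.5 * rng.normal(size=n)
data = np.column_stack([a, b, c])

pearson = np.corrcoef(data, rowvar=False)
pcor = partial_correlations(data)

print("Pearson r(a, c):", round(pearson[0, 2], 3))   # large: a and c co-vary
print("pcor(a, c | b): ", round(pcor[0, 2], 3))      # near zero: no direct link
print("pcor(a, b | c): ", round(pcor[0, 1], 3))      # clearly nonzero: direct link
```

In this toy setting a relevance network would connect a and c, whereas a GGM would retain only the two direct edges a–b and b–c.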
Recently, GGM with a limited-order partial correlation function, which estimates correlations conditional on one or two, but not all other genes, has been developed to infer gene networks from Arabidopsis and yeast transcript profiles (Magwene and Kim 2004; Wille et al. 2004). Another way to tackle the small sampling problem is to infer GGM with regularization and moderation (Schäfer and Strimmer 2005b). This shrinkage approach to graphical Gaussian modeling, implemented in the R package “GeneNet,” is applicable to data sets with P slightly larger than n (Schäfer and Strimmer 2005c; Schäfer et al. 2006). We have used this regularized GGM to build a gene network for A. thaliana, based on data from more than 2000 Affymetrix ATH1 microarray experiments deposited in the NASC database (Craigon et al. 2004). A pilot study evaluated the method for 2000 genes for which biologically meaningful interactions had been established in single-gene studies. Then, as an exploratory experiment, by using an iterative random sampling strategy, the model was expanded to cover >22,000 Arabidopsis genes, resulting in a network that included 6760 nodes (genes) connected by 18,625 significant edges (interactions).

Results and Discussion

Pilot experiment with 2000 genes

The data for construction of the model (Schäfer and Strimmer 2005c) represented 2466 Affymetrix ATH1 microarray slides deposited at NASC by August 2006. After excluding 421 slides (six with missing data and 415 potential outliers identified according to Persson et al. 2005; see Methods), 2045 chips remained for network construction. The selected conditions reported transcript changes in plants challenged by a spectrum of abiotic and biotic stresses and chemical treatments. In addition, transcript profiles from different tissues or developmental stages were included (Supplemental Table S1). A proof-of-concept experiment started with a collection of ~5000 named and, to some degree, analyzed genes in Arabidopsis.
This collection was filtered by a selection of genes with high regulation by biotic and abiotic stresses and tissue expression characteristics, which reduced the number to ~2000 genes. Partial correlation (pcor) was estimated for every gene pair among these genes using the “GeneNet” package (Schäfer et al. 2006).

Figure 1A
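The role of regularization in the pcor estimation can be illustrated with a simplified sketch. It is not the GeneNet implementation: the shrinkage intensity `lam` is a fixed, hand-picked value (an assumption of this example), whereas the shrinkage approach cited above estimates the intensity from the data. The sketch shrinks the sample correlation matrix toward the identity, which makes it invertible even when genes outnumber samples, and then derives partial correlations from the inverse.

```python
import numpy as np

def shrinkage_pcor(data, lam=0.2):
    """Partial correlations from a correlation matrix shrunk toward the identity.

    `lam` is a fixed shrinkage intensity chosen by hand for this sketch;
    it is not the data-driven estimate used by the cited method.
    """
    corr = np.corrcoef(data, rowvar=False)
    p = corr.shape[0]
    # Convex combination with the identity is positive definite for lam > 0.
    shrunk = (1.0 - lam) * corr + lam * np.eye(p)
    omega = np.linalg.inv(shrunk)
    omega = (omega + omega.T) / 2.0          # remove round-off asymmetry
    d = np.sqrt(np.diag(omega))
    pcor = -omega / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Hypothetical data with more genes (50) than samples (20).
rng = np.random.default_rng(1)
n_samples, n_genes = 20, 50
data = rng.normal(size=(n_samples, n_genes))

sample_corr = np.corrcoef(data, rowvar=False)
rank = np.linalg.matrix_rank(sample_corr)    # below n_genes, so not invertible
pcor = shrinkage_pcor(data)

print("rank of sample correlation matrix:", rank, "of", n_genes)
print("all shrinkage pcor values finite:", bool(np.all(np.isfinite(pcor))))
```

The unregularized matrix is rank deficient here, so a classical GGM estimate is undefined, while the shrunk matrix yields a finite pcor for every gene pair.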
A network for 22,200 Arabidopsis genes

This result provided motivation to expand the network by including ~22,200 genes of the Arabidopsis transcriptome, represented by 22,266 Affymetrix ATH1 probe sets; the discrepancy in numbers arises because some genes were represented by more than one probe set. GGM does not allow for computing the pcor of all input genes simultaneously, because the maximum number of genes that may be analyzed at one time depends on sampling numbers. An iterative process with 2000 iterations was adopted. In each iteration, 2000 genes were randomly selected and used as input for pcor estimation. On average, every gene pair was sampled 16.2 times, and the pcor with the lowest absolute value, representing the one with the largest amount of effects from other genes removed, was chosen as an estimation of the final pcor in the expanded network. Compared with the pilot experiment, these pcors were more narrowly concentrated around zero (Fig. 1B

Additionally, two random permutation experiments were conducted to evaluate potential false discovery rates. First, all 22,266 genes were permutated, followed by the analysis described before. After 1000 iterations, all final pcors were in the range from –0.0002 to +0.0004 and deemed insignificant. Second, 1000 genes were randomly chosen, permutated, combined with the remaining 21,266 genes, and subjected to the analysis with 2000 iterations, resulting in an overall pcor distribution similar to that in Figure 1B.

Overall network properties

The resulting network was not completely scale free, but exhibited scale-free behavior over a wide range. The average network connectivity for a node was 5.5.

Figure 3A
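The iterative sampling procedure described above (random gene subsets, pcor estimation per subset, and retention of the pcor with the lowest absolute value per gene pair) can be sketched on toy data as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the helper `pcor_matrix` uses a fixed shrinkage intensity in place of GeneNet, and the sizes (30 genes, subsets of 10, 200 iterations) are hypothetical stand-ins for the 22,266 genes, subsets of 2000, and 2000 iterations used in the study.

```python
import numpy as np

def pcor_matrix(data, lam=0.2):
    """Shrinkage-style pcor estimate (fixed `lam` is an assumption of this sketch)."""
    corr = np.corrcoef(data, rowvar=False)
    shrunk = (1.0 - lam) * corr + lam * np.eye(corr.shape[0])
    omega = np.linalg.inv(shrunk)
    omega = (omega + omega.T) / 2.0
    d = np.sqrt(np.diag(omega))
    return -omega / np.outer(d, d)

def iterative_min_abs_pcor(data, subset_size, n_iter, seed=0):
    """Keep, for every gene pair, the sampled pcor with the lowest absolute value."""
    rng = np.random.default_rng(seed)
    n_genes = data.shape[1]
    final = np.full((n_genes, n_genes), np.nan)     # NaN marks "never sampled"
    counts = np.zeros((n_genes, n_genes), dtype=int)
    for _ in range(n_iter):
        idx = rng.choice(n_genes, size=subset_size, replace=False)
        pc = pcor_matrix(data[:, idx])
        block = np.ix_(idx, idx)
        current = final[block]
        replace = np.isnan(current) | (np.abs(pc) < np.abs(current))
        final[block] = np.where(replace, pc, current)
        counts[block] += 1
    np.fill_diagonal(final, np.nan)                 # self-pairs are not edges
    return final, counts

# Toy dimensions (hypothetical), far smaller than in the study.
rng = np.random.default_rng(2)
n_genes, subset_size, n_iter = 30, 10, 200
data = rng.normal(size=(100, n_genes))
final, counts = iterative_min_abs_pcor(data, subset_size, n_iter)

off_diag = ~np.eye(n_genes, dtype=bool)
mean_samplings = counts[off_diag].mean()
expected = n_iter * subset_size * (subset_size - 1) / (n_genes * (n_genes - 1))
print("mean samplings per gene pair:", round(mean_samplings, 2))
print("expected from the formula:   ", round(expected, 2))
```

The mean number of times a pair is sampled follows directly from the subset size, the number of genes, and the number of iterations, which is why it can be checked against the closed-form expression in the last lines.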
The network seems to follow a truncated power-law distribution (Amaral et al. 2000), with a power-law regime at 1 ≤ k ≤ 11, where the network exhibits certain scale-free behavior, followed by a sharp drop off. Biological networks with similar connectivity distribution have been reported before (Jeong et al. 2001; Giot et al. 2003). A recent analysis indicated that most biological networks were not totally scale free, but rather might better be described as following a truncated power law, while certain scale-free features such as small world and centrality properties hold true (Khanin and Wit 2006). An evident qualitative feature of our network, characteristic of scale-free network models, was the presence of few nodes with many connections, which appeared to constitute major hubs and many nodes with very few connections. The final overall network (Fig. 3B
Modules in metabolism reveal coherent network subgraphs

By use of the kCores method in Carey and Long’s RBGL package (version 1.10.0) in Bioconductor, we identified coherent subgraphs (Gentleman et al. 2004). Easily identifiable among these subgraphs were networks assignable to defined metabolic processes.

Figure 4

Genes centered on APR1, one of three 5′-adenylylsulfate reductase genes in Arabidopsis, identified a coherence group associated with sulfur metabolism (Fig. 4A
In Figure 4B
Figure 4C
Figure 4D
Shown in Figure 4E
The selected seed genes for metabolic functions revealed a structure of the model (Fig. 4

Subnetworks describing cell wall biosynthesis and related processes

As another example, we analyzed placement of cellulose synthase genes, CESAn, in the network. Two major subnetworks were identified that covered eight CESA genes.

Figure 5

Of particular interest here were genes related to secondary cell wall synthesis.
Covered in Figure 5B
In addition to group I and II CESAs, CESA10, one of the cellulose synthases involved in the biosynthesis of primary cell walls (Beeckman et al. 2002), appeared in a subnetwork with relationships to epidermal cell development, including trichomes, root hairs, and seed coats (Supplemental Fig. S2J). Also clustered in separate, but closely related subnetworks were genes related to lignin and wax biosynthesis (Supplemental Fig. S2K,L). Gene modules related to cell wall synthesis showed substantial overlap with networks based on Pearson correlation coefficients, with the exception that GGM provided more complex structure insofar as additional nodes were inserted. Also, the highest correlations with genes reported by Pearson correlation were found only when the subgraphs were extended by several edges. For example, the cellulose synthases CESA4, CESA7 (IRX3), and CESA8, and additional genes in the synthesis of secondary cell walls (Fig. 5B

Arabidopsis responses to cold stress

To visualize a network for genes induced by cold stress, we extracted a subnetwork centered on CBF1, CBF2, DREB1A, and RAV1 (Fig. 6A
The center of the subnetwork was dominated by DREB-type transcription factors (Fig. 6B
Genes strongly induced by cold stress, and as well by a variety of other stress treatments (Fig. 6C
Figure 6D
Other cold stress-induced functions included genes related to the regulation of circadian rhythm (Fig. 6F

Comparison of GGM with a relevance network

Relevance networks based on standard Pearson correlation establish relationships different from GGM, without reference to other genes (Schäfer and Strimmer 2005c). Two genes may demonstrate the difference. ST3 and ST4 list the top 30 genes with the highest Pearson correlation in relationship to genes SQD2, a sulfolipid synthase (Fig. 4B

The complete Arabidopsis data set that had generated the GGM network was then used to construct a relevance network (Supplemental Data File S2). This analysis recovered 134,594 gene-pair interactions among 5745 genes with Pearson correlation coefficients greater than or equal to 0.80. We excluded 12 negative interactions, lower than −0.80, in this analysis.

Figure 7
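For comparison, a relevance network of the kind described above can be built by thresholding the Pearson correlation matrix. The sketch below is illustrative only: the function name `relevance_edges` and the five simulated genes are assumptions, and only the threshold of 0.80 on positive correlations is taken from the text. Genes 0 to 2 share a common driver, while genes 3 and 4 are independent.

```python
import numpy as np

def relevance_edges(data, threshold=0.80):
    """Return gene pairs (i, j), i < j, whose Pearson r is at least `threshold`."""
    corr = np.corrcoef(data, rowvar=False)
    i, j = np.triu_indices_from(corr, k=1)      # upper triangle, no self-pairs
    keep = corr[i, j] >= threshold              # positive correlations only
    return list(zip(i[keep].tolist(), j[keep].tolist()))

# Hypothetical expression data: samples in rows, genes in columns.
rng = np.random.default_rng(3)
n = 500
driver = rng.normal(size=n)
coregulated = [driver + 0.3 * rng.normal(size=n) for _ in range(3)]
independent = [rng.normal(size=n) for _ in range(2)]
data = np.column_stack(coregulated + independent)

edges = relevance_edges(data)
nodes = sorted({g for edge in edges for g in edge})
mean_degree = 2 * len(edges) / len(nodes)       # each edge touches two nodes

print("edges:", edges)
print("connected genes:", nodes)
print("mean degree:", mean_degree)
```

Every pair of coregulated genes passes the threshold, so the three genes form a fully connected cluster. This is the pattern of internally connected, highly populated nodes that distinguishes a relevance network from the sparser GGM.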
The relevance network showed node distribution more similar to power-law (Supplemental Fig. S3), but many highly connected nodes in this relevance network were connected internally. Supplemental Figure S4A shows a subnetwork for the 100 most connected nodes, with 1939 interactions. Among these interactions, 1936 were assigned pcors lower than 0.10 and deemed insignificant in GGM, because the corresponding gene pairs shared expression patterns with many other genes, which then explained the low number of highly connected nodes in the GGM network. Additionally, GGM required high similarity in expression pattern for a gene to become connected with a highly populated node. As observed, this constraint in highly connected nodes generated the truncated power-law distribution for the whole network (Amaral et al. 2000). However, GGM continued to identify potential hubs, as shown by the 100 most highly connected nodes. The relevance network sorted these nodes into three potential hubs, while GGM arrived at a much higher number (Supplemental Fig. S4). The model used is based on a shrinkage approach (Schäfer and Strimmer 2005c) that expanded classical GGM and performed well for the data set with P slightly larger than n, but was still limited, in that large transcriptomes could not be analyzed. By using iterations coupled with random sampling, our procedure allowed for expanding coverage to the genome level for Arabidopsis. The permutation experiments further indicated a low false discovery rate in this expanded network, whose biological significance was supported by case studies. We note, however, that the final pcor closely approached the 1998th-order partial correlation rather than a full partial correlation, because, in each iteration, only effects of 1998 other genes were removed for every gene pair. We present this GGM as an exploratory tool and heuristic model, whose significance is supported by the case studies outlined. 
GGM-based gene network structures at the genome level for Arabidopsis have not been presented before, but networks for selected pathways have been constructed (Wille et al. 2004; Nikiforova et al. 2005; Li and Gui 2006; Gutierrez et al. 2007). The models presented here, when queried for nodes in these pathways, revealed significant overlap (data not shown). Recently, coexpression patterns based on Pearson correlation coefficients to infer gene function have been a highly active field in Arabidopsis research, with approaches expanding into two directions. For one, focus on coexpression of genes in selected functions, such as glucosinolate biosynthesis, primary carbon and nitrogen, or secondary metabolism, showed considerable overlap with this GGM network (Williams and Bowles 2004; Gachon et al. 2005; Wei et al. 2006; Hirai et al. 2007). Typically, these studies relied heavily on prior knowledge, such as biochemically established pathway structures, which is not a requirement for the GGM presented here. A second approach established databases that may be queried with individual genes to extract information about coexpressed genes (Zimmermann et al. 2004; Aoki et al. 2007; Obayashi et al. 2007). For one example, the database ATTED-II (Obayashi et al. 2007) lists highly coexpressed genes for every gene. Querying our GGM to the extent of one edge from the seed gene will only reveal a few of the connections identified by ATTED-II, while additional connections appear when the query is extended to include additional edges. Interestingly, these models, based on Pearson correlations alone, have not presented a network for the entire genome, possibly because such a structure would be dominated by genes related to a few dominant functional categories, such as ribosome structure, photosynthesis and carbon fixation, or flowering, while networks of metabolism would be hidden within the immensity of interactions. The examples (Figs. 
4

Methods

Microarray data

All microarray data derived from Affymetrix ATH1 slides. The “Super Bulk Gene Download,” a file with all genes and experiments, was downloaded from NASCarrays (http://affymetrix.arabidopsis.info/narrays/help/usefulfiles.html). By August 2006, the file contained data from 2466 slides recorded as raw intensities. The corresponding experiments are summarized in Table 1. Six slides with missing data were removed and the remaining 2460 slides were subjected to quantile normalization. A method based on “deleted residuals” was used to screen for potential outlier chips (Persson et al. 2005). Briefly, studentized deleted residuals d* are calculated for each probe set in every chip. The d* from the same chip were expected to follow a t distribution. Problematic chips, whose d* deviated significantly from the t distribution, were to be excluded. The Kolmogorov-Smirnov (K-S) goodness-of-fit test was used to calculate the K-S D value to decide whether the d* from a chip fit the t distribution. With the K-S D value set at 0.10, we identified 415 chips, around 17% of all chips, as potential outliers. The raw intensity data (after quantile normalization) from the remaining 2045 chips were rounded to integers (for values ≥10) or to the first digit after the decimal (for values <10), and used for analysis. Of 22,810 Affymetrix ATH1 probe sets, 22,266 were annotated as actual Arabidopsis genes. Data from these 22,266 probe sets were used for the GGM network construction, including both the pilot experiment and the expanded network. We treated each probe set as an individual gene. For probe sets matching more than one gene, we used the name of one of the matched genes. Supplemental Table 2 lists probe sets, corresponding gene names, and annotations.

The pilot experiment

The shrinkage approach (Schäfer and Strimmer 2005c) was used to estimate partial correlation coefficients (pcor) of gene pairs among 2000 chosen genes.
The highest 0.01% and lowest 0.01% of pcors were excluded when building the null model. All calculations were conducted via the software package “GeneNet”, version 1.0.1 (Schäfer et al. 2006). Genes used in pilot experiments are listed in Supplemental Table 3.

The GGM network for the entire Arabidopsis genome

In total, 2000 iterations with random sampling were used to expand the network to cover the whole genome. In each iteration, 2000 genes were randomly selected and the “ggm.estimate.pcor” function in GeneNet package 1.0.1 was used to estimate the pcor between gene pairs. Pcors from all iterations were recorded. With an average of 3 min, 10 sec per iteration on a PC (Intel Core2 E6420 processor), one round of 2000 iterations consumed ~4 d. For each gene pair the pcor with the lowest absolute value was chosen as the final value. Supplemental Table 4 lists the significant interactions with absolute values of estimated pcors greater than or equal to 0.10 used to construct the network.

Permutation experiment

The raw intensity data set (after quantile normalization) with 22,266 genes from 2045 chips was used. For permutations of a gene, the intensity values for that gene in all 2045 chips were collected, and then randomly and nonrepeatedly assigned as the intensity values for that gene among the 2045 chips. In one experiment, the entire 22,266 genes were permutated, while in a second, 1000 randomly selected genes were permutated. The permutated data sets were then subjected to the analysis procedure described before.

Network layout and visualization

Three methods were used. For the complete network (Fig. 3B

Acknowledgments

We thank the members of NASCArrays and the laboratories providing data for contributing to the database. Advice by Dr. K. Strimmer is gratefully acknowledged. The work was supported by grants from the National Science Foundation Plant Genome Project (DBI-0223905) and University of Illinois at Urbana-Champaign institutional grants. S.M.
conceived the experimental approach and performed calculations. S.M., Q.G., and H.J.B. analyzed intermediary approaches to the problem and wrote the article.

Footnotes

[Supplemental material is available online at www.genome.org.]

Article published online before print. Article and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.6911207