Copyright © 2008, American Society of Plant Biologists

Identification and Characterization of Nucleotide-Binding Site-Leucine-Rich Repeat Genes in the Model Plant Medicago truncatula

Laboratoire des Interactions Plantes Microorganismes, UMR CNRS-INRA 442–2594, 31326 Castanet Tolosan, France (C.A.-T.); Departments of Plant Pathology and Plant Biology, University of Minnesota, St. Paul, Minnesota 55108 (C.A.-T., B.-B.W., N.D.Y.); Advanced Center for Genome Technology and Department of Chemistry and Biochemistry, University of Oklahoma, Norman, Oklahoma 73109 (M.S.O., B.A.R.); Department of Plant and Soil Sciences, University of Kentucky, Lexington, Kentucky 40546–0236 (H.Z.); and United States Department of Agriculture-Agricultural Research Service and Department of Agronomy, Iowa State University, Ames, Iowa 50011 (S.B.C.)

*Corresponding author; e-mail steven.cannon/at/ars.usda.gov.

Received June 25, 2007; Accepted October 19, 2007.

Abstract

The nucleotide-binding site (NBS)-Leucine-rich repeat (LRR) gene family accounts for the largest number of known disease resistance genes, and is one of the largest gene families in plant genomes. We have identified 333 nonredundant NBS-LRRs in the current Medicago truncatula draft genome (Mt1.0), likely representing 400 to 500 NBS-LRRs in the full genome, or roughly 3 times the number present in Arabidopsis (Arabidopsis thaliana). Although many characteristics of the gene family are similar to those described in other plant genomes, several evolutionary features are particularly pronounced in M. truncatula, including a high degree of clustering, evidence of significant numbers of ectopic translocations from clusters to other parts of the genome, a small number of more evolutionarily stable NBS-LRRs, and numerous truncations and fusions leading to novel domain compositions.
The gene family clearly has had a large impact on the structure of the genome, both through ectopic translocations (potentially, a means of seeding new NBS-LRR clusters), and through two extraordinarily large superclusters. Chromosome 6 encodes approximately 34% of all TIR-NBS-LRRs, while chromosome 3 encodes approximately 40% of all coiled-coil-NBS-LRRs. Almost all atypical domain combinations are in the TIR-NBS-LRR subfamily, with many occurring within one genomic cluster. This analysis shows the gene family not only is important functionally and agronomically, but also plays a structural role in the genome.

Plants have evolved sophisticated mechanisms to recognize and guard against pathogens. Interaction between hosts and pathogens triggers both localized and systemic resistance responses. Disease resistance frequently is governed by specific recognition between pathogen AVIRULENCE genes and corresponding plant disease RESISTANCE (R) genes. This type of gene-for-gene interaction usually is accompanied by a hypersensitive response leading to the restriction of pathogen growth. In the past decade, R genes have been cloned from numerous plant species, conferring resistance to a wide range of plant pathogens including bacteria, fungi, oomycetes, viruses, and nematodes (Dangl and Jones, 2001; Meyers et al., 2003; DeYoung and Innes, 2006). However, despite the wide range of pathogen taxa involved, R genes seem to encode a limited set of proteins consisting of conserved domains (for review, see Dangl and Jones, 2001).

The largest class of R genes encodes proteins with a nucleotide-binding site (NBS) and a Leu-rich repeat (LRR) region. This domain architecture is consistent with a role in pathogen recognition and defense response signaling. The NBS domain contains several conserved motifs typically found in ATP- or GTP-binding proteins and also present in several structurally related regulators of animal apoptosis (Traut, 1994).
In plant R proteins, the NBS region is a conserved domain that is responsible for the binding and hydrolysis of ATP and GTP (Tameling et al., 2002). LRRs are typically involved in protein-protein interactions, and various studies indicate that the LRR motif is at least partly responsible for recognition specificity (Kobe and Deisenhofer, 1995; Leister and Katagiri, 2000). Between the NBS and LRR domains, the ARC domain has recently been shown to play a role in recruiting the LRR domain to the N-terminal region and in switching the molecule between its inactive and active states (Rairdan and Moffett, 2006). Some studies support a guard hypothesis model for R gene function, in which NBS-LRR proteins guard plant targets (guardees) against pathogen effector proteins, and some R proteins have been shown to be activated upon interaction between pathogen virulence factors and guardee proteins (Axtell and Staskawicz, 2003; Mackey et al., 2003; Belkhadir et al., 2004).

The NBS-LRR family of R genes can be further divided into two subfamilies based on deduced N-terminal structural domains. One subfamily, termed TIR-NBS-LRR or TNL, encodes a domain with similarity to the intracellular signaling domains of the Drosophila Toll and mammalian INTERLEUKIN1 receptor, while the second, termed coiled-coil (CC)-NBS-LRR or CNL, codes for a putative CC domain in the N-terminal region. These two subfamilies can also be distinguished by the unique amino acid motifs found within the NBS domain itself (Meyers et al., 1999; Pan et al., 2000). Although the TIR domain interacts with effector molecules (Axtell and Staskawicz, 2003), the TNL and CNL subfamilies seem to require different downstream factors, mediated primarily via the different N-terminal domains in these families.
Among genes characterized to date, those from the TNL and CNL subfamilies depend on EDS1- or NPR1-type signaling pathways, respectively (Aarts et al., 1998; Glazebrook, 1999; Peart et al., 2002; Hu et al., 2005; Wiermer et al., 2005). Nevertheless, the variety of domain arrangements in this large, diverse family suggests that members of the family will likely participate in a range of signaling pathways.

Conservation of the NBS domain has been used to study the genomic architecture of this gene family. R genes are unevenly distributed in plant genomes and many reside in local multigene clusters. The clustered distribution of R genes provides a reservoir of genetic variation from which new specificities can evolve. Mechanisms like duplication, unequal crossing over, ectopic recombination, gene conversion, and diversifying selection have been proposed to contribute to the structure of R gene clusters and the evolution of resistance specificities (Michelmore and Meyers, 1998; Young, 2000; Sun et al., 2001). Moreover, the presence of conserved motifs primarily within the NBS domain has been used extensively to identify resistance gene homologs in model and crop species (Kanazin et al., 1996; Aarts et al., 1998; Penuela et al., 2002; Zhu et al., 2002; Ferrier-Cana et al., 2003; Yaish et al., 2004; Palomino et al., 2006). In species where R genes have been studied, this pattern of widespread multigene clusters is common (Young, 2000; Hulbert et al., 2001; Meyers et al., 2003).

In Arabidopsis (Arabidopsis thaliana), where R genes have been studied in detail, 149 NBS-LRR-encoding genes plus 58 related genes lacking LRRs have been identified (Meyers et al., 1999, 2003). Both CNL and TNL classes could be further subdivided based on specific motifs, intron position, and genomic distribution. Interestingly, CNL genes are widely distributed in both monocot and dicot species, while TNLs appear to be restricted to dicot species (Meyers et al., 1999).
A few TN genes (TNL but lacking an LRR domain) have been identified in rice (Oryza sativa) but differ greatly from typical TNL genes (Bai et al., 2002). Thus, it appears that the R gene arsenals of monocot and dicot species have diverged significantly during the evolution of these plant lineages.

M. truncatula is a self-fertile, annual, and diploid plant that has been selected as a model legume (Barker et al., 1990; Cook, 1999). Previously, we reported on the identification of NBS-encoding sequences in M. truncatula based on specific PCR amplification with primers designed from conserved regions of the NBS domain (Zhu et al., 2002). This earlier study identified 147 NBS-LRR sequences from M. truncatula (107 TNL and 40 CNL; Zhu et al., 2002). More recently, M. truncatula has become the target for hierarchical genome sequencing based on bacterial artificial chromosome clones (BACs). This ongoing effort to sequence the M. truncatula genome now enables a much more extensive and detailed examination of the NBS-LRRs in this model legume. In the public draft assembly, which is estimated to span approximately 60% of the euchromatic space of M. truncatula, we identified at least 333 NBS-LRR-encoding genes in the A17 genotype that is now being sequenced. Here we report an analysis of the evolution and genomic organization of these genes in M. truncatula based on genomic sequence data from the first large-scale genome assembly of the ongoing sequencing project (www.medicago.org/genome).

RESULTS

Genome Assembly, Gene Prediction, and Nomenclature

Genomic sequence consisted of the 1.0 draft genome assembly generated by the Medicago Genome Sequencing Consortium (MGSC; http://www.medicago.org/genome/downloads/Mt1/), with gene predictions from the International Medicago Genome Annotation Group (IMGAG; Town, 2006). The assembly consists of 1,826 BAC clones, in 348 sequence contigs and 275 scaffolds, altogether comprising 186.2 Mbp of nonredundant genome sequence.
The MGSC estimates that the assembly covers 38% to 47% of the entire M. truncatula genome, and captures a larger proportion (55%–58%) of genes (Mt1.0 fact sheet, http://www.medicago.org/genome/downloads/Mt1/Mt1.0.pdf).

The M. truncatula NBS-LRR genes in this study were identified from IMGAG annotated genes. Gene names (Supplemental Table S1) follow the IMGAG naming convention, as illustrated by gene AC148761_18.3: The characters before the underscore are the GenBank accession for the source BAC; the number after the underscore is the gene number within the BAC; and the number after the period is the version of this gene call. For convenience in tree figures, shorter, more informative names are used. Aliases to IMGAG names are provided in Supplemental Table S1. The format for short names is illustrated by Mt2g1873: genus and species in the first two characters, followed by chromosome number, then type (g = gene), then gene order in the pseudomolecule build (Mt1.0; http://www.medicago.org/genome/downloads/Mt1). These names are for use only in this article; the persistent names (in GenBank and EMBL) use the IMGAG format.

Identification of NBS-LRR Genes in M. truncatula

We used similarity searches based on extended NBS-LRR domains (see “Materials and Methods”) to identify NBS-LRR genes in the A17 ecotype genomic sequence. A total of 333 nonredundant sequences, consisting of 177 putative CNL and 156 TNL sequences, were used in subsequent analyses, as described in “Materials and Methods” (Supplemental Table S1). Thirty additional sequences that appear to be related to or derived from NBS-LRRs, but were too divergent for inclusion in phylogenetic analyses, are also shown in Supplemental Table S1. The 333 sequences included in phylogenetic analyses are distributed across all chromosomes of M. truncatula (Fig. 1).
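The two naming formats described above (IMGAG names such as AC148761_18.3 and short names such as Mt2g1873) are regular enough to split mechanically. The following is a minimal sketch, not part of the original analysis: the regular expressions, class names, and function names are illustrative assumptions inferred only from the examples given in this article.

```python
import re
from typing import NamedTuple


class ImgagName(NamedTuple):
    bac_accession: str  # GenBank accession of the source BAC
    gene_number: int    # gene number within the BAC
    version: int        # version of this gene call


class ShortName(NamedTuple):
    species: str        # genus/species code, e.g. "Mt"
    chromosome: int     # chromosome number in the pseudomolecule build
    feature_type: str   # "g" = gene
    order: int          # gene order along the pseudomolecule


# Assumed patterns, inferred from names such as AC148761_18.3 and Mt2g1873.
IMGAG_RE = re.compile(r"^([A-Z]{1,2}\d+)_(\d+)\.(\d+)$")
SHORT_RE = re.compile(r"^([A-Z][a-z])(\d)([a-z])(\d+)$")


def parse_imgag(name: str) -> ImgagName:
    """Split an IMGAG-style name into BAC accession, gene number, version."""
    match = IMGAG_RE.match(name)
    if match is None:
        raise ValueError(f"not an IMGAG-style name: {name!r}")
    accession, gene, version = match.groups()
    return ImgagName(accession, int(gene), int(version))


def parse_short(name: str) -> ShortName:
    """Split a short name into species, chromosome, type, and gene order."""
    match = SHORT_RE.match(name)
    if match is None:
        raise ValueError(f"not a short name: {name!r}")
    species, chrom, ftype, order = match.groups()
    return ShortName(species, int(chrom), ftype, int(order))


print(parse_imgag("AC148761_18.3"))
print(parse_short("Mt2g1873"))
```

Paired names in the text, such as Mt7g2240/AC169666_26.4, can be handled by splitting on the slash and passing each half to the matching parser.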
As M. truncatula has not been fully sequenced, we also attempted to estimate how many NBS-LRR genes may be missing in this study. We asked what proportion of a random set of expressed M. truncatula NBS genes is found in the Mt1.0 assembly. We find that proportion to be approximately 196/294 = 2/3, as follows. For the random set of expressed M. truncatula NBS genes (the denominator), we took the M. truncatula transcript assemblies and singletons (TA unigenes from http://plantta.tigr.org, release 2) that (1) have a tblastn E value of 1e-15 with some M. truncatula NBS gene, and (2) have higher similarity to an Arabidopsis NBS-LRR than to an Arabidopsis gene from any other gene family. Applying only the first criterion, we find 661 NBS-like M. truncatula sequences among the 55,182 TA unigenes. Applying the second criterion lowers the number to 294, because a large proportion of the initial candidates are actually more similar to some other Arabidopsis gene. Given the denominator (294), we then asked how many of these expressed sequences have a nearly identical match among the M. truncatula NBS genes in the assembly (196). Therefore, we estimate that the current genome sequence contains roughly two-thirds (196/294) of M. truncatula NBS-LRR genes. This is consistent with estimates that the Mt1.0 assembly covers approximately two-thirds of the euchromatic space of M. truncatula.

Genomic Distribution and Phylogenetic Analysis

Most NBS are physically clustered in the genome (Fig. 1).

Phylogenetic trees were constructed from 333 NBS sequences (177 CNLs and 156 TNLs). As previous studies have shown that phylogenies calculated from the NBS domain robustly distinguish the TNL and CNL subfamilies (Meyers et al., 1999), we constructed phylogenies separately for these subfamilies (CNL, Fig. 2, A and B).
Most NBS derive from relatively recent gene duplications and for the most part they are highly similar to other NBS in the same genomic clusters (although some of the observed sequence similarity may also be the result of gene conversion). Most legume NBS sequences are found at greater than 0.5 PAM units (accepted point mutations per site) from the coalescence points with nonlegume species. By measuring the phylogenetic distance between M. truncatula NBS sequences, we can assess how many have originated recently. In this context, a distance cutoff of 0.5 PAM units between M. truncatula sequences (or average distance of 0.25 PAM to the M. truncatula-M. truncatula coalescence point) is a reasonable indicator of nearness in that it is much shorter than the evolutionary distance to Arabidopsis or poplar sequences, and so represents gene duplications that most probably occurred within legumes. On average, each M. truncatula NBS is within 0.5 PAM of 9.2 other sequences, again indicating that many groups of sequences have high sequence similarity. This is evident in the trees in Figure 2.

Composition of Clusters and Evidence of Transposing Duplications

Most legume-specific clades are dominated by sequences from one chromosome (and usually from one or a small number of genomic clusters), but many also contain small numbers of sequences from other chromosomes. Specifically, 14 CNL and six TNL legume-specific clades are mixed (with sequences from multiple chromosomes). This is 80% of all clades. A mixed clade could arise in several ways: by chromosomal rearrangement (for example, breakage and fusion), by transposition, or by large-scale genomic duplication. In at least five clades, the origin of sequences from different chromosomes (or widely separated parts of one chromosome) can be traced to internal synteny in M. truncatula, the likely remnants of an early episode of polyploidy in the legumes (Cannon et al., 2006).
Examples are shown in Figure 2.

While some mixed clades, including TNL-1 just described, can be traced to internal segmental duplications and others are probably cryptic remnants of duplications, no longer apparent after rearrangements, there are also other mixed clades and clusters that are best explained as ectopic translocations. Supplemental Table S1 indicates 29 such cases: instances in which a clade consists of closely related sequences from one genomic cluster plus one sequence occurring in a distant part of the genome. These instances can be thought of as having donor regions (a cluster of related sequences in one part of the genome) and acceptor regions (the location of the related gene outside of the home cluster). Examples of such clades are shown in Figure 2A.

Probable transpositions (donations) do not seem to target particular locations. They seem to occur throughout the genome and not just in NBS-rich regions. For example, there are seven donations to chromosome 1, but there are only 11 NBS-LRRs, in total, on chromosome 1. These are mostly unclustered (five donations occur as singletons and the remainder occur in two clusters). There are, however, some instances of apparent donations into existing clusters. Several examples are the TNL genes in the large CNL cluster on chromosome 3 (Mt6g1868 → Mt3g5125, Mt2g3436 → Mt3g4772) and the CNL genes in TNL clusters on chromosome 5 (Mt8g385 → Mt5g2441, Mt8g683 → Mt5g5043).

While some clusters contain genes that appear to have come from other regions of the genome, a broader phenomenon is physical clusters that include divergent sequences (regardless of the origin). There are 26 instances of TNL and CNL genes falling within 100 kb of one another (Supplemental Table S1).
In fact, five pairs of CNL and TNL genes fall within single BACs: Mt7g2240/AC169666_26.4 (CNL-4) and Mt7g2242/AC169666_23.4 (TNL-2); Mt5g3987/CT963106_1.4 (CNL-9) and Mt5g3991/CT963106_17.4 (TNL-8); Mt3g812/CT963074_17.3 (TNL-4) and Mt3g833/CT963074_13.3 (CNL-3); Mt3g796/CT963132_6.3 (TNL-4) and Mt3g809/CT963132_9.3 (CNL-4); Mt3g654/CT967304_15.3 (CNL-2) and Mt3g655/CT967304_17.3 (TNL-8).

Domain Analyses

Protein domains of the 333 NBS-encoding genes in this study were predicted using Hidden Markov model (HMM) searches against Pfam v. 20 (Bateman et al., 2002; Eddy, 2003), followed by correction of likely prediction errors (e.g. fusions with adjacent transposon proteins). Domain arrangements were divided into putative structural categories according to the nature, number, and organization of their constituent domains, as listed in Supplemental Table S1 and counted in Table I. Unusual domain arrangements are also shown in Figure 2, A to C.
Comparisons between protein sequences within a single structural category revealed some likely inaccuracies in automated gene predictions and annotations. Probable misannotations were detected in 10 proteins (indicated with an asterisk in the domains column of Supplemental Table S1), and generally consisted of an additional exon in the C-terminal region. Such exons include HSP70, reverse transcriptase, MMR-HSR (GTPase), RNase H, and chaperone-associated domains. These are not included in the tally of domain classes in Table I. Pfam analyses could not identify the CC motif present in the N-terminal region, even though previous studies have demonstrated that the presence or absence of this motif is correlated with specific signatures in the NBS domain (Meyers et al., 1999, 2002). We used these NBS signatures as the basis for classifying sequences in the CNL subfamily.

A majority of the proteins examined belong to the canonical classes described in the literature (Meyers et al., 1999, 2003): CNLs or TNLs (Table I, classes 2 and 8). A minority of genes, however, have less typical domain arrangements. Interestingly, almost all of these are in the TNL subfamily (discussed later). In the CNL subfamily, the predominant unusual domain arrangement is a missing LRR; specifically, 25/177 (16%) lack the LRR (i.e. CN). The only other unusual class observed in the CNL subfamily is the result of a putative fusion with the Rpw8 domain in CU013515_1.4/Mt5g1164. The closest homolog of CU013515_1.4/Mt5g1164 in Arabidopsis (At5g66910) displays the same domain structure. The Rpw8 gene in Arabidopsis provides broad-spectrum mildew resistance (Xiao et al., 2001). There are six proteins with Rpw8 similarity in Arabidopsis; one of these (AT5G66910 = RPW8.1) is a C-terminal fusion with an NBS-LRR domain. Recent studies show that RPW8.1 from Arabidopsis is absent in the Arabidopsis lyrata genome (Orgil et al., 2007), presumably due to a deletion event (Xiao et al., 2001).
Both the CU013515_1.4/Mt5g1164 and At5g66910 proteins contain four LRR domains and are reciprocal top matches to each other. Thus, it appears that both have been retained as single-copy genes in their respective genomes over the approximately 100 million years since the last common ancestor of these plant lineages (Supplemental Data S3 and S17).

In contrast to the CNL, the TNL subfamily is highly diverse in terms of domain arrangements. Only 86/156 (55%) are typical TNL. The second and third most common classes are TN (27/156 = 17%) and NL (25/156 = 16%). One of the most intriguing sets of atypical domain arrangements is within clade TNL-8 (bottom of Fig. 2C).

An additional intriguing instance of a putative fusion in the TNL subfamily is a predicted protein with domains TNLTNL (AC126790_31.4/Mt6g1826; GenBank ID ABE83302.2). Such a fusion would not be unprecedented, as at least one gene with similar structure is present in the current manually annotated Arabidopsis peptides (At3g25510; The Institute for Genomic Research v.7). The Arabidopsis gene is not an ortholog, as it apparently results from an independent event. Reexamination of M. truncatula BAC AC126790.38 confirmed the initial prediction, and about one-third of the sequence has 100% cDNA support (with ESTs CX538931 and CX524109). Apart from the fusion, the gene structure and its five exons are not unusual, and the 3,123 nt of coding sequence occur within the 4,550 nt total gene region.

Motif Analyses

Analyses of motifs within NBS domains reveal additional features. Because some motifs in typical NBS domains are variable (NBS-A and -C, described in Meyers et al., 1999) while others are more conserved (NBS-B and -D), these motifs can be used to distinguish TNL from CNL sequences.
We identified short and highly conserved stretches within each domain configuration class (TNL, TNTNL, etc.; Table I; Supplemental Table S1) using Pfam and HMM domain analysis (Eddy, 2003) and MEME motif analysis (Bailey and Elkan, 1995) on aligned domain subgroups. In most cases, the conserved sequences we observed have been described previously (Meyers et al., 1999, 2002, 2003; Tuskan et al., 2006). As expected, MEME analyses revealed that P-loop, Kin-2, and GLPL motifs all are conserved among NBS gene family members. In contrast, NBS-A and NBS-C from TNL proteins are less conserved and these motifs could not be identified at all in domain classes 6 and 8 (Table I; Supplemental Table S1) due to high levels of diversity in amino acid sequences.

We also examined proteins with doubled NBS domains (classes 5, 9, and 11; NTNL, TNTNL, TNLTNL) and found that the two domains within a single protein usually are dissimilar in motif structure (Table I). In each case they have one truncated NBS domain, usually involving NBS-A and NBS-C motifs. Beyond differences in specific motifs, the two NBS domains found within a single gene could also be distinguished by their overall amino acid sequences. For example, Mt6g1826/AC126790_31.4, the only member of class 11, displays a notably high degree of difference between the two NBS-A motifs (Table I). Examination of the two TIR domains within a single protein (classes 9, 10, and 11) does not reveal dissimilarity as high as that observed between two adjacent NBS domains (data not shown).

In Silico Analysis of CNL and TNL Gene Expression Using EST Libraries

To assess which genes in this study have expression support, we compared the predicted genes against available ESTs (231,765 ESTs from 55 libraries, from GenBank in April, 2006). Because many NBS-LRR genes are similar to one another, only top matches were considered, after applying a high match stringency of at least 95% nucleotide identity between EST and genomic sequences.
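The top-match rule just described can be sketched as a small filter over alignment hits. This is an illustration only: the hit tuple layout (EST id, gene id, percent identity, alignment score), the use of score to pick the top match, and the function name are assumptions, not the pipeline used in the study.

```python
from collections import Counter


def est_support(hits, min_identity=95.0):
    """Count supporting ESTs per gene, keeping one top match per EST.

    hits: iterable of (est_id, gene_id, percent_identity, score) tuples
    (a hypothetical layout). Hits below min_identity are discarded first;
    each remaining EST is then assigned only to its highest-scoring gene,
    so an EST cannot support several similar NBS-LRR paralogs at once.
    """
    best = {}  # est_id -> (gene_id, score) of the best hit seen so far
    for est_id, gene_id, identity, score in hits:
        if identity < min_identity:
            continue
        if est_id not in best or score > best[est_id][1]:
            best[est_id] = (gene_id, score)
    return Counter(gene_id for gene_id, _ in best.values())


example_hits = [
    ("est1", "geneA", 99.0, 500),  # top match for est1
    ("est1", "geneB", 96.0, 450),  # passes identity, but not the top match
    ("est2", "geneB", 94.0, 400),  # below the 95% identity cutoff
    ("est3", "geneB", 97.5, 300),  # only hit for est3
]
print(est_support(example_hits))
```

Assigning each EST to a single gene is what keeps closely related paralogs in the same cluster from sharing expression evidence.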
At this threshold, 168 NBS genes in this study have EST support, representing 50.5% of predicted genes in the study (indicated by EST matches in the right-hand column of Fig. 2, A–C).

Approximate expression patterns, judged by counts of EST matches, vary substantially between clades and even between highly similar genes within the same clade (for example, in the CNL tree in Fig. 2A).

Pseudogenes

There are 49 unique sequences in Mt1.0 with stop codons, identified using CC or TIR consensus NBS sequences in tblastn queries against the Mt1.0 nucleotide chromosome assemblies, and filtered at E value 1e-10. This is probably an underestimate, as many pseudogene fragments may either fall below this level of significance or lack stop codons in the region of the query while not being predicted among the IMGAG gene calls in this assembly. Nevertheless, the stated criteria give values for comparison. Pseudogene counts per chromosome are (for chromosomes 0–8) 1, 2, 2, 4, 15, 6, 7, 5, and 7. Counts of CNL- and TNL-like pseudogenes are 22 and 27 (Supplemental Table S2). We also have observed that 91.8% of all predicted NBS pseudogenes are within 100 kb of another predicted NBS gene. These pseudogenes are not, however, distributed on the chromosomes in the same way as predicted NBS genes without stop codons. More pseudogenes are found on Mt4 than would be expected (15 observed versus 6.1 expected), and fewer are found on Mt3 than expected (four observed versus 13.5 expected). These differences are supported by a χ2 test for independence by chromosome, with a P value of 0.0033 (degrees of freedom = 8). The differences by chromosome are primarily due to the excess of pseudogenes on Mt4 (13 of 15 of which are TNL class) and the dearth of pseudogenes on Mt3 (most of which would have been expected to be CNL class, following the pattern of distribution of predicted NBS genes).
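The per-chromosome comparison above can be made concrete with the usual (observed − expected)² / expected term of a χ2 test. The sketch below uses only the numbers quoted in the text; expected counts are given there for Mt3 and Mt4 alone, so only those two contributions are computed and the full test statistic is not reproduced. Function and variable names are illustrative.

```python
def chi_square_contribution(observed, expected):
    """One chromosome's term in the chi-square sum: (O - E)^2 / E."""
    return (observed - expected) ** 2 / expected


# Pseudogene counts for chromosomes 0-8, as listed in the text.
observed_by_chromosome = [1, 2, 2, 4, 15, 6, 7, 5, 7]
total_pseudogenes = sum(observed_by_chromosome)

# Nine chromosome categories give 9 - 1 = 8 degrees of freedom.
degrees_of_freedom = len(observed_by_chromosome) - 1

# Expected values are stated in the text only for Mt4 and Mt3.
mt4_term = chi_square_contribution(observed_by_chromosome[4], 6.1)
mt3_term = chi_square_contribution(observed_by_chromosome[3], 13.5)

print(f"total pseudogenes: {total_pseudogenes}")
print(f"degrees of freedom: {degrees_of_freedom}")
print(f"Mt4 contribution: {mt4_term:.2f}")
print(f"Mt3 contribution: {mt3_term:.2f}")
```

The two terms shown come from the chromosomes the text singles out as departing most from expectation, which is why they are described as the main source of the difference.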
Further, all of the Mt4 TNL pseudogenes are near predicted TNL genes in the two large clusters of TNL genes with diverse domain arrangements (Fig. 1).

There also is evidence that some of the predicted pseudogenes may be expressed. Four of the 49 predicted pseudogenes match at least one EST at 99% to 100% identity over 58% to 89% of the genomic pseudogene length, and from 76% to 100% of the EST length (Supplemental Table S2). For example, TA38236_3880 (AL375406 AL375407) matches over 1,403 nucleotides, with one mismatch, and neither the genomic nor the EST sequence has extended open reading frames; each contains at least three stop codons in the 716 nt aligning region.

In Silico Analysis of the Promoter Regions of the NBS-LRR Genes

We identified promoter sequences in 2 kb windows upstream of predicted NBS-LRR genes (Supplemental Table S1). Four regulatory elements implicated in either response to pathogens or plant stress were identified as being overrepresented in the 2 kb region upstream of NBS-LRRs. The regulatory elements were: WBOX cassettes, associated with the WRKY transcription factors (Dong et al., 2003); CBF and DRE boxes (Sakuma et al., 2006); and the GCC motif associated with the ERF-type transcription factors (Ohme-Takagi et al., 2000). WBOX elements are the most numerous, averaging 8.6 for the CNL and 8.4 for the TNL subfamilies (Supplemental Table S1; Fig. 2, A–C).

DISCUSSION

Many aspects of the NBS-LRR disease resistance gene family have been extensively studied and described in other species. This study of NBS-LRRs in M. truncatula confirms many patterns observed in other plant species, but also clarifies some patterns and finds some features that differ at least quantitatively from those seen in other plants.
Analysis of overall localization, predicted domain structure, in silico gene expression, promoter regions, and molecular evolution reveals a number of striking features: (1) predominantly recently derived sequences, with most having originated through local duplications; (2) evidence that NBS-LRR clusters, which in many cases dominate multimegabase regions, have played an important role in genomic remodeling; (3) evidence of ectopic translocations of NBS-LRRs from many clusters to other parts of the genome; (4) surprisingly variable domain arrangements, primarily in the TNL subfamily; (5) several novel domain combinations that appear to have originated and proliferated within the legumes; (6) dramatically varying expression patterns, with expression varying both between and within clades; (7) surprising uniformity of promoter regions across the gene family; and (8) patterns of pseudogene distributions related to NBS gene distributions, but differing significantly between clusters.

Genomic Organization: Clustered, Donated Singletons, or Maintained Singletons

As is the case in other plant genomes, NBS genes predominantly are clustered physically in M. truncatula. This is clearly an outcome of the birth and death process that results from tandem duplication or contraction in a cluster. More intriguing, perhaps, are the exceptions. While most clusters are composed predominantly of closely related genes, most clusters also include distantly related strangers. While most NBS are found in clusters, some exist as singletons that in some cases have close homologs elsewhere in the genome, but in other cases appear to have been evolving independently. These exceptions have important implications for evolution of this family (and the genome), because although they are rare, they provide sources of novelty and change in the genome.
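The distinction between clustered genes and singletons can be made operational with a simple distance rule. The sketch below assumes single-linkage grouping, in which neighboring genes on the same chromosome join one cluster when they lie within 100 kb of each other (the distance used elsewhere in this article); the exact windowing procedure of the study is not specified here, and the input layout and function names are hypothetical.

```python
from collections import defaultdict


def cluster_genes(genes, max_gap=100_000):
    """Group genes into physical clusters by chromosome and position.

    genes: iterable of (gene_id, chromosome, position_bp) tuples.
    Consecutive genes on a chromosome stay in the same cluster while the
    gap to the previous gene is at most max_gap; a larger gap starts a
    new cluster. Singletons come back as clusters of size one.
    """
    by_chromosome = defaultdict(list)
    for gene_id, chromosome, position in genes:
        by_chromosome[chromosome].append((position, gene_id))

    clusters = []
    for chromosome in sorted(by_chromosome):
        current, last_position = [], None
        for position, gene_id in sorted(by_chromosome[chromosome]):
            if current and position - last_position > max_gap:
                clusters.append(current)
                current = []
            current.append(gene_id)
            last_position = position
        if current:
            clusters.append(current)
    return clusters


def fraction_clustered(clusters):
    """Fraction of all genes that sit in clusters of two or more."""
    total = sum(len(c) for c in clusters)
    clustered = sum(len(c) for c in clusters if len(c) >= 2)
    return clustered / total if total else 0.0


toy_genes = [
    ("a", 3, 10_000), ("b", 3, 60_000), ("c", 3, 150_000),  # one cluster
    ("d", 3, 400_000),                                      # singleton
    ("e", 6, 5_000),                                        # singleton
]
toy_clusters = cluster_genes(toy_genes)
print(toy_clusters, fraction_clustered(toy_clusters))
```

With real coordinates, the same output also yields cluster sizes, and checking each cluster for both TNL and CNL members would flag the phylogenetically mixed clusters.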
The pattern of clustered, related NBS sequences clearly is an outcome of the birth and death process that results from tandem duplication or contraction in a cluster (Michelmore and Meyers, 1998). In most such cases, the NBS genes all appear to be derived from local duplication events more recent than the split with poplar or Arabidopsis. Examples include clades CNL-2 and TNL-4. Although this pattern has been observed in other species described to date (Noel et al., 1999; Meyers et al., 2003; Monosi et al., 2004), the clusters on Mt3 and Mt6 are both exceptionally large and relatively recent, highlighting the rapid turnover of most of this gene family.

Expanded clusters account for a large proportion of NBS-LRRs in M. truncatula. Given a sliding window size of 100 kb, nearly 80% of all M. truncatula NBS genes reside in clusters. This compares to 61% of all NBS genes in clusters in Arabidopsis (Meyers et al., 2003). Nearly 50% of M. truncatula NBS-LRRs lie in clusters of five or more, and the largest single cluster, on chromosome Mt6, contains 14 members on just two BAC clones (AC148154 and AC127020).

Not only do M. truncatula NBS-LRRs tend to cluster, but many also lie in superclusters, such as the 82 NBS genes on the upper arm of chromosome 3 and the 57 NBS genes on the lower arm of chromosome 6. Interestingly, Mt6 also is more transposon dense than any other chromosome (Cannon et al., 2006; S.B. Cannon and N.D. Young, unpublished data). Such an association between NBS and transposons has been observed before (Graham et al., 2002), but it is not yet clear to what extent an association between NBS-LRR clusters and transposons is causal (in either direction) rather than merely associative. The NBS genes encoded on the Mt3 and Mt6 superclusters represent more than 5% of all the genes, NBS and non-NBS, found on the upper arm of Mt3 and the lower arm of Mt6.
At this scale, NBS superclusters probably played a central role in genomic remodeling during the evolution of these chromosome regions. Superclusters in M. truncatula resemble the situation in Arabidopsis, where 32 and 43 NBS-LRRs are found on chromosomes At-1 and At-5, respectively (Meyers et al., 1999, 2003), and in rice, where more than 25% of all NBS genes are located on chromosome 11 (Monosi et al., 2004).

Although most clusters are predominantly composed of similar sequences, many clusters also contain some phylogenetically distant NBS genes. Indeed, 26 of 120 M. truncatula clusters include both TNL and CNL members. The presence of heterogeneous NBS clusters in M. truncatula resembles the situation in rice (Monosi et al., 2004) and Arabidopsis, where 10 of 40 clusters are phylogenetically mixed (Baumgarten et al., 2003; Meyers et al., 2003). At least some NBS clusters in M. truncatula are the result of large-scale segmental duplication events, as indicated by shared combinations of phylogenetic clades within multiple physical clusters and by their localization within regions of demonstrated intragenomic synteny (Cannon et al., 2006).

Among the minority of NBS-LRRs that are singletons, some are closely related to sequences elsewhere in the genome. Although this is a small proportion of all NBS genes in the genome, these genes may play the role of pioneers, seeding new regions of the genome with NBS-LRRs, and potentially establishing new locations for future clusters. Examples of singletons with related genes elsewhere are the three Mt1 NBS-LRRs (pink) in clade TNL-4 (Fig. 2C).

The last class of NBS-LRRs consists of singletons that have no close relatives in the genome. Examples include the single-gene clade CNL-17 or the nine low-copy, unclustered genes in CNL-13 to CNL-15 (Fig. 2B).

Effects of Whole-Genome Duplication on the Gene Family

Several studies have described evidence of large-scale genomic duplication (possibly a whole-genome duplication [WGD]) early in the evolution of the legumes (Schlueter et al., 2004; Cannon et al., 2006). For at least some of the mixed clades, the genes from multiple chromosomes can be mapped to larger internal genomic duplications (duplication blocks). Few such mappings were evident in comparisons to all M. truncatula duplication blocks, for three reasons: NBS-LRR clusters are intrinsically rapidly evolving, which may erase evidence of synteny; synteny within M. truncatula duplication blocks is generally weak and degraded, suggesting significant gene loss and rearrangement in the genome following this early event; and the genome is not completely sequenced (approximately 60% of the euchromatin in the Mt1.0 draft release). Weak detection of M. truncatula duplication blocks may not be surprising considering that estimates of the timing of the WGD place it quite early, at 85 to 55 mya (Schlueter et al., 2004; Lavin et al., 2005). This pattern of high rates of loss of WGD evidence for the NBS-LRR family (relative to other large gene families) also has been described in Arabidopsis (Cannon et al., 2004).

Promoter Regions

In an evaluation of the 2,000 bp upstream of the NBS-LRR genes in M. truncatula, we found surprising uniformity in the numbers of four overrepresented cis-elements. This uniformity was found across all clades examined, in both TNL and CNL subfamilies. At least within clusters, similar regulatory elements might be expected if regulatory regions duplicate and undergo changes at rates similar to their associated genes. This would be consistent with the finding that tandemly duplicated genes in Arabidopsis have higher levels of conservation of cis-elements when compared to segmentally duplicated genes (Haberer et al., 2004).
The finding that all upstream regions of NBS genes examined had at least one WBOX motif indicates that this motif is important for regulation of most, if not all, NBS-LRR family genes. Counts of WBOX elements are not significantly different between CC and TIR subfamilies, or between most clades (with the largest difference, interestingly, occurring in the most diverse clade in terms of domain composition, TNL-8). This similarity (at least for WBOX elements) does not necessarily imply that all M. truncatula NBS genes are under similar regulation in all respects, but does suggest that most have at least some regulatory features in common. WBOX motifs are the binding sites of WRKY transcription factors, and pathogen elicitors and salicylic acid are rapid inducers of a large number of WRKY genes in various plants (Eulgem et al., 1999; Chen and Chen, 2000). In turn, WBOX motifs have been described upstream of the NPR1 gene (a positive regulator of inducible plant disease resistance; Yu et al., 2001) and upstream of most Arabidopsis pathogen-response genes (Chen and Chen, 2002; Li et al., 2004). WRKY genes also activate NBS-LRRs in Arabidopsis and grape (Vitis vinifera; Zheng et al., 2006, 2007; Marchive et al., 2007). The widespread presence of WBOX motifs upstream of so many NBS genes, however, indicates that fine regulatory control must be due to less conserved and presumably more variable factors.

Domain Structures and Expression Patterns

M. truncatula NBS genes show diverse domain combinations, although almost all of the diversity exists in the TIR subfamily. The only variants in the CNL subfamily (apart from variation in LRR repeat number and a possible fusion with an RPW8 homolog) are CN (no LRR) and CNL (the canonical structure). In contrast, there are nine domain arrangements in the TIR subfamily: N, NL, NT, NTNL, TN, TNL, TNLT, TNLTNL, TNTNL, and TTNL. The much greater domain diversity in the TNL subfamily compared with CNL might be explained in part by their exon-intron structure.
The CNL proteins mostly are encoded by a single exon, unlike TNLs, which usually are encoded by multiple exons (Meyers et al., 1999; Bai et al., 2002). NBS proteins lacking an LRR also occur in Arabidopsis, and have been suggested to play a role as adapter proteins (Meyers et al., 2003). Apparent exon additions or fusions occur in other genomes, including WRKY-related domains and some metallopeptidases in Arabidopsis (Meyers et al., 2003), as well as the BED/DUF1544 domain in poplar (Tuskan et al., 2006). Meyers et al. (2002) described unusual domain arrangements in Arabidopsis chromosomes 4 and 2, and hypothesized that the TNTNL gene must have been a fusion of a TN and a TNL gene. Poplar also contains instances of the apparent TNLT, TN, TNL, TNLT, and NL domain arrangements (Tuskan et al., 2006). Intriguingly, much of the structural diversity in TNL genes exists in a small number of clusters, suggesting a linkage between physical organization in the genome and the origin of novelty in gene structure. All but one of the unusual TNL domain arrangements (TNLTNL) are found in a single clade (Fig. 2B).

In general, expression patterns (at least as measured by counts of EST matches) are highly variable and are not strongly associated with domain structure or sequence similarity. This is especially striking on chromosome Mt6, where neighboring NBS genes frequently differ significantly in both expression and structure. For example, a cluster of six TNL genes on Mt6 (on a single BAC clone, AC126790) differ in EST counts ranging from 0 to 31. These same TNL genes also display four distinct domain combinations and differ in upstream WBOX counts, which range from 0 to 11.

Pseudogenes

An examination of pseudogenes supports rapid turnover of genes in this gene family and identifies some particularly active clusters that have generated both large numbers of diverse new genes and pseudogenes.
A relatively restrictive criterion for identifying pseudogenes finds 49, in comparison with the 333 predicted NBS genes reported here. This proportion is similar to that observed in the Arabidopsis TN and TIR-X subfamilies, which contain 47 genes and four pseudogenes, respectively (Meyers et al., 2002). It is possible that the ratio of pseudogenes to genes is relatively characteristic of a gene family, with families experiencing high rates of turnover also having large numbers of pseudogenes. The number of pseudogenes remaining at any given time would depend on the half-life of a pseudogene. The half-life of pseudogenes is thought to be relatively short, estimated at 8 to 9 million years in mouse and human (Sakai et al., 2007), and 14 million years in Drosophila (Petrov et al., 2000). Assuming similar rates in M. truncatula, the observed pseudogenes would all have died recently in comparison to the timeframe of the legumes, which originated approximately 65 mya (Sanderson et al., 2004). It also is interesting to note that at least some of the pseudogenes may be expressed, and therefore not under neutral selection. Four of the predicted pseudogenes have near-perfect (99%–100% identity) support from ESTs. In one such case, the full 716 nt EST contig length matches the genomic pseudogene, and both contain at least three stop codons. Expressed pseudogenes have been observed to regulate the messenger-RNA stability of the corresponding homologous coding gene (Hirotsune et al., 2003). Expressed NBS-LRR pseudogenes have been observed in pine (Pinus monticola; Liu and Ekramoddoullah, 2003) and rice (Monosi et al., 2004). Some mechanisms of gene turnover are suggested by the distribution of NBS pseudogenes in comparison to predicted NBS genes. Most (91.8%) of the pseudogenes are found within 100 kb of predicted NBS genes, suggesting that most turnover occurs within clusters. However, there is clearly a greater rate of turnover in some clusters than others.
A large excess of pseudogenes is present on Mt4, with 15 observed versus 6.0 expected if the 49 pseudogenes were distributed as are the 333 predicted NBS genes. Further, most (10) of the pseudogenes on Mt4 occur in the cluster that accounts for a large portion of domain diversity in the TNL subfamily (Table I; Fig. 2C).

CONCLUSION

The NBS-LRR gene family remains, despite a great deal of work on many fronts, fascinating and surprising. There was little reason to suspect, prior to the sequencing of M. truncatula, that there would be 3 times as many NBS-LRRs in this genome as in Arabidopsis, or that they would dominate large parts of two chromosomes (Mt3 and Mt6). Similarly, it was surprising to find such domain novelty, and to find that almost all the domain novelty exists in the TNL subfamily, and most of that within one genomic cluster. Besides raising more intriguing questions (e.g. precisely how do NBS-LRR translocations occur, and are frequencies different between genomes?), these findings have direct practical agronomic implications. The dramatic pace of birth and death in the family is emphasized by the fact that the large majority of M. truncatula NBS-LRRs exist in cluster nurseries, and will not have one-to-one correspondences to NBS-LRRs in other species. A striking counterexample exists, however, for a minority of genes, which seem to follow a different, more stable evolutionary trajectory.

MATERIALS AND METHODS

Identification of TNL and CNL Sequences and Pseudogenes in Medicago truncatula

We used the 1.0 draft genome assembly generated by the MGSC (http://medicago.org/genome/release1.0), with gene predictions from the IMGAG (Town, 2006). Candidate genes containing NBS domains were identified using blastp similarity (Altschul et al., 1997) at an E-value threshold of 1e-20 to the following CNL and TNL consensus sequences from plant extended NBS domains (Cannon et al., 2002, 2004).
Consensus of CNLs used for blast search was as follows: erpsestiVGletmleklwnrLledndvgivgiyGMGGVGKTTLatqifNdfdvkgehFdrviWVvVSkefnvekiqqdIlekLglgdeewlekteeekaaeienLfqlLegKkfLLvLDDvWekevdLdkigvpfPdrenGsKvlfTTRsesvavcgdmgvdxmevecLtpeeAWeLFqkkvfentlksdpeieelaKevvkkCgGLPLAlkVlGgllacKrtvqEWkraievlssslaaefsgmessilpvLklSYdnLppelKsCFLYcalFPEDykIekekLieyWiaEGfideseggetaedvGyeylgeLVrrsLleegdktdnetsrketVkMHDvvREmALwiaseegfkeviiVraGvglreipnvkswntvrRmSlmnneieelldspenpklrsLltlllqsnsh. Consensus of TNLs used for blast search was as follows: RDFddlVGiEaHlekmksLLcLdsdeVrMVGIwGPaGIGKTTIARALfsqLSssFqlsaFmenlrgsyStrpaglDeYsmKLhLQeqfLSkILnqkDikIhHLGvieERLkdqKVLIiLDDVDdleQLdALAketqWFGpGSRIIVTTeDkqLLkaHgInhIYeVgfPSkeeALqIFCrsAFgQnsPpdGFeeLAreVtkLaGnLPLGLrVlGSsLRGkskeeWedmLpRLrtsLDgkIekvLrvsYDgLhekDqaLFLhIACfFNgekvdyVkalLadsnLDVrqGLkvLadKSLIhisplgdgtieMHnLLqqLGReIVrkQsidePgKRqFLvDaeeIcdVLtdnTGtgsVlGIslDtseieeelnIsekAFegMrNLqFLriykksfrddgk. Pseudogenes were identified using the same consensus CNL and TNL sequences, using a tblastn search (Altschul et al., 1997) against the Mt1.0 nucleotide chromosome assemblies, at E-value 1e-10. Matches were considered to be pseudogenes if tblastn translations (relative to the consensus query sequences above) contained at least one stop codon. Candidate NBS-LRR proteins were provisionally assigned to either the CNL or TNL groups on the basis of similarity, then were aligned to a HMM calculated from a large collection of TNL and CNL extended NBS domains (Cannon et al., 2002, 2004). Consensus from NBS HMM used for whole-family NBS alignment was as follows: GKTTLAraVYNkiadhFeakcFlcvvrefsvkhxlkhlqkqlxxxxxkeikldnvleglsiilkrLsgKKvLLVLDDVwneeQLeaLaggldwxxpGSRIIITTRdkhvLsshgvvrxxtYevegLneeealeLFckkAFkgxxspvdpeYeeigkkiVkycgGLPL. 
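The pseudogene criterion described above (a tblastn match to a consensus query at E-value 1e-10 whose translation contains at least one stop codon) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Hit` record, its field names, and the function names are hypothetical, and the sketch assumes that each hit's translated subject segment has already been extracted, with `*` marking a stop codon.

```python
from dataclasses import dataclass
from typing import List

# E-value threshold stated in the text for the tblastn pseudogene search.
EVALUE_CUTOFF = 1e-10


@dataclass
class Hit:
    """One tblastn match (hypothetical record layout, for illustration only)."""
    query: str        # which consensus was used, e.g. "CNL" or "TNL"
    chromosome: str   # chromosome assembly that was hit
    start: int        # hit start on the chromosome
    end: int          # hit end on the chromosome
    evalue: float     # E-value reported for the match
    translation: str  # translated subject segment; '*' marks a stop codon


def is_pseudogene_candidate(hit: Hit) -> bool:
    """True if the hit passes the E-value cutoff and its translation has a stop."""
    return hit.evalue <= EVALUE_CUTOFF and "*" in hit.translation


def pseudogene_candidates(hits: List[Hit]) -> List[Hit]:
    """Keep only the hits that satisfy the stop-codon criterion."""
    return [h for h in hits if is_pseudogene_candidate(h)]


# Made-up example records, not data from the study.
example_hits = [
    Hit("TNL", "Mt4", 1000, 1900, 1e-40, "GKTTIA*ALF"),
    Hit("CNL", "Mt3", 5000, 5900, 1e-30, "GKTTLAQIF"),
    Hit("CNL", "Mt6", 100, 400, 1e-5, "GK*TL"),
]
candidates = pseudogene_candidates(example_hits)
```

With these made-up records only the Mt4 hit is retained: the Mt3 hit has no stop codon, and the Mt6 hit fails the E-value cutoff. A real analysis would also need to merge overlapping hits and exclude matches that fall within predicted genes, steps the sketch leaves out.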
Phylogenetic Analyses

Prior to phylogeny construction, only sequences containing at least 75% of the HMM match-state residues were retained for subsequent analysis, and indels and poorly aligning regions were removed by trimming regions outside the HMM match states. Also, although the IMGAG pseudomolecule assembly process removed most overlapping regions, some redundant sequence remains in the 1.0 draft in unfinished BAC clones. Phylogenies were calculated using parsimony and bootstrapped neighbor joining. Parsimony trees were calculated using protpars in the Phylip suite (PHYLIP [Phylogeny Inference Package] version 3.6; distributed by the author). The input sequence order was jumbled five times, and a topology was calculated based on each data order. One most-parsimonious tree was chosen at random to serve as the basis for branch length calculations. Maximum likelihood branch lengths were calculated on the parsimony topologies using TreePuzzle 5.2 (Schmidt et al., 2002). The model of substitution was that of Müller and Vingron (2000). Amino acid frequencies were calculated from the input trees, and rate heterogeneity was allowed with four γ rate categories. The neighbor joining calculation used the ClustalW implementation (Oliver et al., 2005), without Kimura distance correction, on the cleaned alignment from hmmalign, with 1,000 bootstrap replicates.

Domain and Motif Predictions

Domains were predicted using hmmpfam (Eddy, 2003) comparisons to Pfam v20 (Bateman et al., 2002) with an initial E-value cutoff of 0.1. Predicted NBS-LRR protein sequences were compared to the Pfam v20 HMMs using HMMER 2.3.2. Predictions of motifs were made using MEME and MAST (Bailey and Elkan, 1995).

In Silico Expression Analysis and Estimation of NBS-LRR Gene Number

Medicago truncatula EST and cDNA sequences were downloaded from the GenBank nucleotide database using the query txid3880[ORGN] AND “biomol mrna”[PROP].
All EST/cDNA sequences were mapped to Mt1.0 BAC sequences using GMAP (Wu and Watanabe, 2005). The alignments were processed and uploaded into a MySQL database using the ASIP pipeline (Wang and Brendel, 2006). We required >95% identity and 80% coverage for an EST/cDNA to be mapped. If an EST could be mapped to multiple genome locations, only the location with the best alignment score was considered. In this way, we minimized cross mapping of ESTs among duplicated genes. To estimate the NBS-LRR gene number in the EST collection, 55,182 Medicago Transcript Assemblies and singleton sequences (TA unigenes, release 2) were downloaded from http://plantta.tigr.org. Identified M. truncatula NBS-LRR protein sequences were used as query sequences to search against the TA unigenes by BLAST (Altschul et al., 1997), with an E-value threshold of 1e-15. All TA unigene hits were then searched against Arabidopsis (Arabidopsis thaliana) proteins by blastx (1e-15). Only those TA unigenes with a top match to Arabidopsis NBS-LRR genes were considered as candidates for expressed M. truncatula NBS-LRR genes. These expressed candidates were then searched against Mt1.0 BACs by blastn (1e-15) to determine the portion captured by the current M. truncatula genome.

Identification and Analysis of the Promoter Regions

For each predicted NBS gene, the 2 kb upstream region was selected according to the position of the gene provided by the IMGAG annotation (Medicago Sequencing Resources) on the BAC sequences of M. truncatula. The extracted sequences were screened against the PLACE database (Higo et al., 1999). Regulatory elements overrepresented in the dataset and known to be involved in regulation during the resistance response and under stressed conditions were selected for further analysis (Jang et al., 2006). Among them, WBOX [sequence TGAC(C/T)], CBF (GTCGAC), DRE [(G/A)CCGAC], and GCC boxes were retained for further analysis.
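Counting the cis-elements named above in an upstream region can be sketched as follows. This is only an illustration under stated assumptions, not a reimplementation of the PLACE screen: the function names are hypothetical, both strands are scanned by default (an assumption, since the text does not say how strands were handled), overlapping sites are counted separately, and the GCC box is omitted because the text gives no sequence for it.

```python
import re
from typing import Dict

# Motif definitions taken from the text, written as regular expressions.
MOTIFS = {
    "WBOX": "TGAC[CT]",   # TGAC(C/T)
    "CBF": "GTCGAC",
    "DRE": "[GA]CCGAC",   # (G/A)CCGAC
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (characters other than ACGT are kept)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_sites(seq: str, pattern: str) -> int:
    """Count matches on one strand; the lookahead lets overlapping sites count."""
    return len(re.findall(f"(?=({pattern}))", seq.upper()))


def count_motifs(upstream: str, both_strands: bool = True) -> Dict[str, int]:
    """Count each motif in an upstream region.

    With both_strands=True the reverse complement is scanned as well, so a
    palindromic motif such as GTCGAC is counted once per strand.
    """
    counts = {}
    for name, pattern in MOTIFS.items():
        n = count_sites(upstream, pattern)
        if both_strands:
            n += count_sites(reverse_complement(upstream), pattern)
        counts[name] = n
    return counts


# Made-up 15-bp fragment; a real input would be the 2 kb upstream region.
example_counts = count_motifs("AATGACCAAGTCAAT")
```

For the made-up fragment, one WBOX lies on the given strand (TGACC) and one on the reverse strand (TGACT), so the WBOX count is 2 and the other counts are 0. Passing `both_strands=False` restricts the scan to the sequence as supplied.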
Comparisons to Internal Genomic Duplications in Medicago

Comparisons of NBS-LRR gene duplications and large-scale genomic duplications were carried out using the Medicago genome pseudomolecule build (Mt1.0; http://www.medicago.org/genome/downloads/Mt1). Syntenic regions were predicted in three steps: National Center for Biotechnology Information blastp self comparisons at E-value 1e-10; filtering to consider only the top reciprocal best hit between each chromosome pair; and synteny prediction using DiagHunter (Cannon et al., 2003) with the following parameters: compress_factor 2500, use_orientation, near_main_diag 30, min_diag_len 3, min_diag_qual 30, and sensitivity 83. Duplications of NBS-LRR genes were compared against predicted synteny regions using OrthoParaMap (Cannon and Young, 2003) and manual evaluation.

Supplemental Data

The following materials are available in the online version of this article.
[Supplemental Data]
Acknowledgments

Thanks to Roxanne Denny for lab assistance and to Xiaohong Wang, Jayprakash Vasdewani, and Ethalinda Cannon for bioinformatics assistance.

Notes

1 This work was supported by the National Science Foundation (grant nos. 0321664 and 0321460 to N.D.Y.).

The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Steven B. Cannon (steven.cannon@ars.usda.gov).

[W] The online version of this article contains Web-only data.

[OA] Open Access articles can be viewed online without a subscription.

References