Copyright © 2009, American Society of Plant Biologists

Highly Diversified Molecular Evolution of Downstream Transcription Start Sites in Rice and Arabidopsis1[W][OA]

Division of Genome and Biodiversity Research, National Institute of Agrobiological Sciences, Tsukuba, Ibaraki 305–8602, Japan (T.T., T.I.); and Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Hokkaido 060–0814, Japan (K.O.K.)

*Corresponding author; e-mail taitoh/at/affrc.go.jp.

Received October 26, 2008; Accepted December 21, 2008.

Abstract

Alternative usage of transcription start sites (TSSs) is one of the key mechanisms to generate gene variation in eukaryotes. Here, we show diversified molecular evolution of TSSs in remotely related flowering plants, rice (Oryza sativa) and Arabidopsis (Arabidopsis thaliana), by comprehensive analyses of large collections of full-length cDNAs and genome sequences. We determined 45,917 representative TSSs within 23,445 loci of rice and 35,313 TSSs within 16,964 loci of Arabidopsis, about two TSSs per locus in either species. The nucleotide features around TSSs displayed distinct patterns when the most upstream TSSs were compared with downstream TSSs. We found that CG-skew and AT-skew were clearly different between upstream and downstream TSSs, and that this difference was commonly observed in rice and Arabidopsis. Relative entropy analysis revealed that the most upstream TSSs had retained canonical cis elements, whereas downstream TSSs showed atypical nucleotide features. Expression patterns were distinguishable between upstream and downstream TSSs. These results indicate that plant TSSs were generally diversified in downstream regions, resulting in the development of new gene expression patterns. Furthermore, our comparative analysis of TSS variation between the species showed a positive correlation between TSS number and gene conservation.
Rice and Arabidopsis might have evolved novel TSSs in an independent manner, which led to diversification of these two species.

While a complete genome sequence enables us to estimate the total number of genes in an organism, transcriptional activities of genes can be verified by either tiling array analyses or mapping ESTs and cDNAs onto a genome (Suzuki et al., 2001a; Yamada et al., 2003; Halasz et al., 2006; Li et al., 2007). The complicated gene structure of eukaryotes, which includes alternative transcripts, hampers precise computational predictions of exon-intron boundaries. Discoveries of alternative variants of genes have, therefore, been accomplished by the experimental verification of transcripts (Kim et al., 2005; Blencowe, 2006; Chen et al., 2007). Findings from a wide variety of alternative transcripts in higher eukaryotes led to the concept that the number of transcript variants, rather than the total number of genes, would better reflect the biological complexity of an organism (Brett et al., 2002). Therefore, to understand the relationship between genes and their functions, it is necessary to study transcript variation.

Alternative transcripts are mainly generated by two mechanisms: alternative splicing and alternative usage of transcription start sites (TSSs). Both mechanisms are known to play important roles in tissue-specific gene expression and functional variation, which have significant impact on biological processes (Landry et al., 2003; Iida and Go, 2006). Recent large-scale sequencing projects have produced a considerable number of 5′-end sequences of full-length cDNAs (FLcDNAs) from rice (Oryza sativa subsp. japonica; Satoh et al., 2007) and from Arabidopsis (Arabidopsis thaliana; Seki et al., 2002; Alexandrov et al., 2006). Therefore, in this paper, we attempt to elucidate the biological significance and evolution of alternative TSSs in plants. Past studies of TSS variation have focused mainly on mammals and fungi.
For example, in human, millions of 5′-end sequences of FLcDNAs were used to determine 269,774 TSSs, from which 30,964 TSS clusters of 14,628 genes were obtained (Kimura et al., 2006). It was shown that alternative promoters and the resultant alternative usage of first exons had created a large number of transcript variants (Kimura et al., 2006). In yeast, 5′-end sequences of FLcDNAs were mapped on the genome sequence, and numerous TSSs were also clearly determined. Over 90% of the analyzed yeast loci had more than two transcript variants derived from different TSSs (Miura et al., 2006). These results indicated that TSS variation could be observed widely in animals and fungi. However, a comparison of promoters between human and mouse revealed that of 5,463 genes, which contained putative alternative promoters, only 807 were evolutionarily conserved (Tsuritani et al., 2007). In addition, another study of Cap Analysis of Gene Expression data sets found that TSSs of the orthologous genes did not always reside at the equivalent locations in the human and mouse genomes (Frith et al., 2006). These observations have suggested flexibility and rapid turnover of TSSs during evolution. Despite the large amount of TSS information in animals and fungi, there is a paucity of TSS studies in plants. Therefore, analysis of TSSs in the plant lineage could add to knowledge about the evolution of TSSs in eukaryotes. Recent progress of genome and transcriptome sequencing in rice and Arabidopsis gives us an opportunity to investigate TSS variation in higher plants. RIKEN and Ceres have released over 200,000 5′-end sequences obtained from Arabidopsis FLcDNA clones (Seki et al., 2002; Alexandrov et al., 2006). In addition, more than 580,000 5′- or 3′-end sequences of rice FLcDNAs have become available (Satoh et al., 2007). This wealth of sequence information allows us to conduct identification and comparative analyses of TSSs in more than 10,000 loci of these plants. 
Previous studies have indicated increased CG-skew around TSSs in both plants and fungi but not in animals (Fujimori et al., 2005) and that the skewed prevalence of adenine at TSSs and of cytosine at −1 bp of TSSs were common characteristics of rice and Arabidopsis (Alexandrov et al., 2006). Yamamoto et al. (2007a, 2007b) conducted a comprehensive analysis of promoter regions to detect frequently observed octamers derived from TATA box, Y patch, and CpG and reported that different octamers could be used for different gene expression mechanisms.

In this study, we identified TSSs comprehensively by mapping transcripts to the genomes of rice and Arabidopsis and compared nucleotide signals around the TSSs. To our knowledge, this is the first attempt of a large-scale TSS comparison between higher plants based on FLcDNA sequences. We also conducted a comparative analysis of TSS variation and gene conservation to elucidate how TSSs and genes have evolved during plant evolution.

RESULTS

Identification and Clustering of TSSs

We mapped rice FLcDNAs and their 5′-end sequences onto the rice genome to identify TSSs by a previously described method (The Rice Annotation Project, 2007). As shown in Table I, 32,644 FLcDNAs and 173,881 5′-end sequences of rice, which were successfully mapped onto the rice genome sequence (International Rice Genome Sequencing Project build 4.0), were used for further analyses. Likewise, for Arabidopsis, 21,859 FLcDNAs and 255,302 5′-end sequences were mapped onto the Arabidopsis genome sequence. As a result, we identified 87,782 and 93,726 nonredundant TSSs in rice and Arabidopsis, respectively. After we removed possible read-through transcripts and TSSs in nonannotated genomic positions (for details, see “Materials and Methods”), 80,743 rice transcripts corresponded to 23,445 loci and 91,919 Arabidopsis transcripts corresponded to 16,964 loci. The average numbers of transcripts per locus were 3.44 in rice and 5.42 in Arabidopsis.
In addition to alternative TSSs, it is known that a TSS can fluctuate in animal and fungus genomes for biological reasons (Carninci et al., 2006; Kimura et al., 2006; Miura et al., 2006). A TSS may not be determined on a single site of a genome sequence; hence, the TSS number inferred from transcript mapping could be overestimated if all fluctuating TSSs are counted. Taking this possibility into account, we decided to define regions of TSSs by clustering closely located TSSs. When we calculated the distance between TSSs of the same locus, 79.8% and 86.9% of all distances between TSS pairs were less than 100 bp in rice and Arabidopsis, respectively (Supplemental Fig. S1). Since these small fluctuations can be caused either by biological processes or by experimental errors, we first checked the accuracy of 5′-end sequencing. A large number of 5′-end sequences that were obtained from the same clones were determined by two independent experiments: complete sequencing of the FLcDNA and partial 5′-end sequencing. If there are experimental errors, the TSS positions should vary even though they are derived from the same clone. We evaluated the TSS positions in 15,381 FLcDNAs of rice and in 19,865 FLcDNAs of Arabidopsis. Our results showed that approximately 10% to 20% of TSSs of the same clones were not identical (Table II; Supplemental Fig. S2), and 4.8% and 4.5% of TSS pairs were more than 5 bp apart in rice and Arabidopsis, respectively. Note that the distance should be 0, because the sequences were determined using the same clone. Therefore, not only biological fluctuations but also these experimental errors should be diminished by clustering closely mapped TSSs.
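The same-clone consistency check described above can be illustrated with a minimal sketch. It assumes each clone yields a pair of mapped 5′-end coordinates, one from complete sequencing and one from partial 5′-end sequencing; the function name `same_clone_discrepancy` and the input layout are hypothetical and not taken from the original analysis.

```python
def same_clone_discrepancy(pairs, tolerance=5):
    """Summarize disagreement between two 5'-end positions of the same clone.

    pairs: iterable of (full_length_position, five_prime_read_position),
           both on the same genomic coordinate system (assumed input layout).
    tolerance: distance in bp above which a pair counts as clearly discordant.
    """
    distances = [abs(a - b) for a, b in pairs]
    if not distances:
        raise ValueError("no clone pairs supplied")
    total = len(distances)
    return {
        # fraction of clones whose two positions differ at all
        "nonidentical": sum(d > 0 for d in distances) / total,
        # fraction of clones whose two positions differ by more than `tolerance` bp
        "beyond_tolerance": sum(d > tolerance for d in distances) / total,
    }


# Toy data: four clones, two agree exactly, one is 3 bp off, one is 10 bp off.
summary = same_clone_discrepancy([(100, 100), (200, 203), (300, 310), (400, 400)])
```

With real data, the two fractions would correspond to the "not identical" and "more than 5 bp apart" figures quoted in the text.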
To date, several criteria have been used for TSS clustering. For example, to analyze Cap Analysis of Gene Expression data, tags of 20- or 21-bp overlaps were clustered (Carninci et al., 2006), while Kimura et al. (2006) adopted a 500-bp interval between distinct TSSs, and, in yeast, a fixed 100-bp interval was used for clustering (Miura et al., 2006). In this study, we first estimated appropriate interval sizes (for details, see “Materials and Methods”), and they were determined to be 21 bp for rice and 27 bp for Arabidopsis. The TSSs were clustered by the single-linkage clustering method with these thresholds. As a result, the maximum cluster sizes were 133 and 193 bp in length, and the average sizes were 4.2 and 8.7 bp in rice and Arabidopsis, respectively. If we excluded single-member clusters, the average sizes were 14.0 bp in rice and 22.3 bp in Arabidopsis. Finally, we could define 45,917 TSS clusters within 23,445 loci in rice and 35,313 TSS clusters within 16,964 loci in Arabidopsis. On average, each locus had 1.96 and 2.08 TSS clusters in rice and Arabidopsis, respectively. The medians of the distances between TSSs were 149 bp in rice and 184 bp in Arabidopsis. The distribution of the average TSS distances in a locus showed similar patterns in the two species (Supplemental Fig. S3). Although hundreds of thousands of sequences were evaluated, the number of transcripts might not be saturated and there might be TSS variants missing in the cDNA libraries used. In fact, 50% and 58% of the clusters were determined by only one transcript in rice and Arabidopsis, respectively. This result suggests that more TSSs might be detected if we further collect cDNA clones. Thus, our estimates of the numbers of TSS clusters should be taken as lower limits, and there is likely to be more variation of TSSs than observed in this study. 
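For one-dimensional genomic coordinates, single-linkage clustering with a distance threshold reduces to chaining sorted positions whose neighbors lie within that threshold. The sketch below illustrates this under stated assumptions: positions belong to a single locus and strand, "within the threshold" is read as a gap of at most the threshold, and the threshold-selection helper encodes only one possible reading of the 1% criterion given in "Materials and Methods". The names `single_linkage_1d` and `first_flat_threshold` are hypothetical.

```python
def single_linkage_1d(positions, max_gap):
    """Cluster 1-D positions; neighbors at most `max_gap` bp apart share a cluster."""
    clusters = []
    for pos in sorted(set(positions)):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)   # chain onto the current cluster
        else:
            clusters.append([pos])     # gap too large: start a new cluster
    return clusters


def first_flat_threshold(positions, max_threshold, rate=0.01):
    """Smallest threshold d whose cluster count falls by less than `rate`
    relative to threshold d - 1 (an assumed reading of the 1% criterion).
    Returns None if no threshold up to `max_threshold` qualifies."""
    previous = len(single_linkage_1d(positions, 0))
    for d in range(1, max_threshold + 1):
        current = len(single_linkage_1d(positions, d))
        if previous and (previous - current) / previous < rate:
            return d
        previous = current
    return None


# Toy example with the rice threshold of 21 bp.
rice_like = single_linkage_1d([100, 105, 120, 200, 221, 260], 21)
```

Because clusters grow by chaining, a cluster can span more than the threshold itself, which is consistent with the maximum cluster sizes of 133 and 193 bp reported above.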
Prominent Nucleotide Features around TSSs

As described in “Materials and Methods,” we defined a representative TSS in each TSS cluster for further analyses. We refer to these representatives simply as TSSs, unless otherwise noted.

When there are two or more TSSs in a locus, cis elements of a downstream TSS may overlap with a transcribed region of an upstream TSS. In fact, 18.9% of downstream TSSs of rice were located within the protein-coding regions of transcripts initiated from upstream TSSs and resulted in truncated open reading frames (ORFs). The nucleotide compositions around the downstream TSSs might be distinct from those of the upstream TSSs, because the functional constraints of the transcribed regions of the upstream TSSs should create nucleotide biases. To assess this possibility, we separated the TSS data sets into the most upstream TSSs and the remaining downstream TSSs and analyzed their nucleotide features, such as CG-skew and AT-skew. When there was one TSS in a locus, it was included in the upstream TSS data set. First, we observed a strong peak of CG-skew in the upstream TSS data sets, whereas the downstream TSSs represented considerably reduced CG-skew in the two plants (Fig. 1, A and B
These comparisons of the nucleotide signals around TSSs between the upstream and downstream data sets suggested that in rice and Arabidopsis downstream transcription might be differently regulated from upstream transcription, which had canonical cis elements such as the TATA box.

Previous studies have defined TSSs as being distinct if they were separated by over 500 bp, so that the downstream transcription signals do not heavily overlap with the upstream transcripts (Kimura et al., 2006; Tsuritani et al., 2007). To examine the possibility that the weakened downstream signals were caused by overlapping upstream transcripts, we reanalyzed nucleotide signals of downstream TSSs that were located more than 500 bp away from any upstream TSSs. We observed decreased nucleotide signals at the same level as those of all downstream TSSs (data not shown). In addition, this tendency was not different in protein-coding regions, untranslated regions, and introns (Supplemental Fig. S5). These results suggest that the weakened nucleotide signal might be due to different transcriptional signals rather than to overlapping transcription with the upstream TSSs.

Relationship between TSS Diversification and Gene Expression Patterns

It is expected that use of alternative TSSs is related to gene expression patterns in differentiated tissues or in response to specific conditions (Macknight et al., 2002; Lee et al., 2006; Szecsi et al., 2006). We examined whether TSS diversification is correlated with patterns of alternative gene expression, using information from the rice transcript library. To exclude ambiguity and experimental errors, we focused on loci where TSSs were determined by more than five transcripts. As a result, 1,012 (48.5%) of 2,088 loci that had exactly two TSSs consisted of cDNAs that were obtained from different libraries. Hence, as expected, differential TSSs should result in variations of gene expression patterns.
For example, there are two TSSs in a locus named Os08g0199300 and annotated as “similar to YyaF/YCHF TRANSFAC/OBG family small GTPase plus RNA binding domain TGS” (Fig. 4
TSS Diversity Correlated with Protein Sequence Evolution

To elucidate the evolutionary significance of TSS diversification, we used two approaches to analyze the relationship between the numbers of TSSs per locus and the evolutionary conservation of protein sequences. We determined orthologs between rice and Arabidopsis by reciprocal best hits of BLASTP searches and calculated protein identity between the orthologs. We found a positive correlation between the number of rice TSSs and protein identity (Fig. 5
Next, we searched the UniProtKB database for homologous sequences of the rice and Arabidopsis proteins and classified them into four groups by their level of conservation. The ratio of conserved protein groups increased as the number of TSSs per locus grew (Supplemental Fig. S8). However, if the cDNA collection was insufficient, the number of TSSs of poorly expressed genes might be underestimated. To exclude the possibility of insufficient sampling of cDNAs, we used TSSs that were supported by five or more transcripts and confirmed that the same tendency was observed (data not shown). Therefore, highly variable TSSs seemed to be prevalent in conserved protein-coding genes of either rice or Arabidopsis.

DISCUSSION

To cluster TSSs that fluctuated for biological or experimental reasons, we used a threshold interval of 21 bp for rice and 27 bp for Arabidopsis. Since the resultant average sizes of the TSS clusters were much smaller, 4.2 bp in rice and 8.7 bp in Arabidopsis, than those initial intervals, it seems that fluctuating TSSs were clustered effectively and that there was little excessive clustering. The longer average size of Arabidopsis TSS clusters may be due to experimental errors, as we observed more discrepancies in Arabidopsis sequences obtained from the same clone compared with those of rice (Table II).

As each locus had on average two TSS clusters in either species, there should have been significant contribution of this TSS variation to these species. Indeed, TSS variants of several genes are known to be responsible for different expression patterns (Landry et al., 2003; Iida and Go, 2006). In this study, our large-scale analysis revealed that TSSs had been obtained from different libraries in about half of the loci that had two TSSs (Table III; Supplemental Table S1).
This observation is consistent with our finding that the nucleotide signatures are distinguishable between the upstream and downstream TSSs, as canonical signals, such as the TATA box motif, were clearly depicted in the upstream TSSs but were considerably diminished in the downstream TSSs. These results suggest that transcription from the upstream TSSs is, at least in part, under a common regulatory mechanism, while the downstream TSSs are generally regulated by specialized systems (Fig. 6
In addition to CG-skew, which characterizes plant and yeast TSSs (Fujimori et al., 2005), AT-skew was found to be another strong indicator of TSSs. It is of particular interest that the distributions of AT-skew were nearly identical between rice and Arabidopsis. The sharp contrast of the AT-skew patterns between the upstream and downstream TSSs also supports the aforementioned idea that TSS variations are related to expression differences. A possible application of this clear AT-skew is that, since the AT-skew has been conserved between these remotely related plant species, one may consider a generalized method by which TSSs can be predicted from newly sequenced genomic DNA of plants. Because plants and fungi share common nucleotide features around TSSs (Fujimori et al., 2005), the animal machinery might have evolved independently. A reason for the weak signals of downstream TSSs appears to be overlap with upstream protein-coding regions. Since protein-coding regions are under functional constraints, the nucleotide compositions and genomic positions of cis elements will be affected. For example, the TATA box frequently contains TAA, which is a stop codon and may prematurely terminate translation. The medians of the distances between TSSs were relatively small, 149 and 184 bp in rice and Arabidopsis, respectively, so that it was possible that the signals overlapping the upstream protein-coding region remained generally weak. However, even though we used only downstream TSSs separated from upstream TSSs by more than 500 bp, the signals were almost identical to those of all downstream TSSs (data not shown). Therefore, we concluded that the distinct signals of the downstream TSSs were not necessarily due to upstream coding regions but that they are intrinsic to the nature of the downstream TSSs. We should note that the downstream TSSs might produce a truncated protein whose function is deteriorated or lost. 
Thus, regulation by alternative TSS usage may be achieved in a loss-of-function manner, which is suggested to be of evolutionary importance (Oda et al., 2002; Tanaka et al., 2005). On the contrary, if a new TSS is generated in the upstream region, it would affect the downstream canonical transcriptional signals. Therefore, upstream TSS generation might have been suppressed during evolution (Fig. 6

It is intuitively plausible that, if protein sequences are highly diverged because of relaxed functional constraint, regulation of their expression becomes concordantly variable. However, our analyses revealed that the number of TSSs increased proportionally to the sequence conservation in both rice and Arabidopsis. Although it was expected that the gene function affected the number of TSSs, our functional categorization of the proteins by the Gene Ontology hierarchy showed no significant correlation between gene function and the number of TSSs (Supplemental Fig. S9). Since highly conserved proteins generally play essential roles, are used in a variety of tissues, and are regulated by complex processes, elaborate transcriptional regulation to control several TSSs might be required. Intriguingly, the TSSs that we identified were not necessarily conserved between rice and Arabidopsis. As shown in Figure 4, B and C

CONCLUSION

We determined TSSs in rice and Arabidopsis by large-scale computation and found that both species have, on average, two or more TSSs per locus. The nucleotide signals around TSSs were similar in these two plants, while they were quite different between the upstream and downstream TSSs. A positive correlation between TSS numbers and gene conservation was also observed. This study provides an insight for diversified transcriptional variation that is likely to have contributed to the evolution of plant species.

MATERIALS AND METHODS

Genome and cDNA Sequences

We used FLcDNAs and their 5′-end sequences for TSS determination (Supplemental Table S2).
The FLcDNAs and 5′-end sequences of rice (Oryza sativa; Kikuchi et al., 2003; Satoh et al., 2007) and the FLcDNAs and 5′-end sequences of Arabidopsis (Arabidopsis thaliana; Seki et al., 2002; Alexandrov et al., 2006) were retrieved from the GenBank/EMBL/DDBJ DNA databases. In addition, the Arabidopsis FLcDNAs sequenced by RIKEN were downloaded from the RIKEN Arabidopsis Genome Encyclopedia (http://rarge.gsc.riken.jp/archives/rafl/sequence/; Sakurai et al., 2005). The library information of the rice FLcDNA clones, which was derived from 41 different libraries including unknown resources, was provided by Dr. S. Kikuchi (personal communication). For the rice genome sequence, the International Rice Genome Sequencing Project genome sequence build 4 was used (http://rgp.dna.affrc.go.jp/IRGSP/download.html). The Arabidopsis genome sequence was downloaded from the National Center for Biotechnology Information's FTP site (ftp://ftp.ncbi.nih.gov/genomes/) as of August 13, 2004. ORFs and annotation data of rice were downloaded from the RAP-DB (http://rapdb.dna.affrc.go.jp/; The Rice Annotation Project, 2008). ORFs of Arabidopsis were retrieved from The Arabidopsis Information Resource (TAIR) 7 annotation data (http://www.arabidopsis.org/) as of June 19, 2007 (Rhee et al., 2003).

cDNA Mapping to Genome Sequences

Positions of transcripts on the genome sequences were determined by methods described previously (The Rice Annotation Project, 2007). We used 5′-end positions aligned by the est2genome program with the following options: gap open penalty, 8; mismatch penalty, 6 (Rice et al., 2000). Since the cDNA data sets included redundant sequences, which were determined as a full-length sequence and as a 5′-EST of the same clone, we used only the full-length cDNAs. We noticed that approximately 5% of the mapped transcripts contained an unaligned 5′-region of 7 bp or more, which were possibly derived from remaining vector sequences.
These unaligned regions were discarded from our analyses. We found that 764 RAP loci included nonoverlapping transcripts, which might be due to transcriptional read-through. These read-through candidates were not used in this study, because read-through transcripts lead to overestimation of alternative TSSs. Because 1,807 5′-end sequences of Arabidopsis did not correspond to any TAIR protein-coding regions, they were eliminated from our data sets.

Clustering of 5′-End Positions

We clustered 5′-end positions that fluctuated for biological or for experimental reasons. To determine an appropriate threshold for the distance between 5′-end positions to be clustered, the relationship between the distance and the total number of clusters was examined (Supplemental Fig. S10). The cluster number decreased gradually and monotonically as the distance increased. We adopted the threshold distance at which the rate of decrease in the total number of clusters was less than 1%: 21 bp for rice and 27 bp for Arabidopsis. Juxtaposed 5′-end positions within the threshold distance were clustered by the single-linkage clustering method. In each cluster, a single representative TSS was selected in the following order: (1) supported by a full-length sequence, (2) supported by the most clones, and (3) the most upstream 5′-TSS.

Calculation of CG-Skew and AT-Skew

We extracted genomic sequences that spanned the −250 to +350 bp region around each TSS. When an ambiguous nucleotide denoted by N existed in a sequence file, the sequence was eliminated from our analysis. CG-skew values [= (C − G)/(C + G)] were computed in a sliding window of 100 bp with 1-bp steps, where C stands for the total number of cytosines in the window and G stands for the total number of guanines.
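The sliding-window CG-skew calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `skew_profile` is hypothetical, a window containing neither base is assigned a skew of 0.0 (the text does not say how such windows were handled), and the AT-skew call assumes the analogous form (A − T)/(A + T), which is not spelled out in the surviving text.

```python
def skew_profile(sequence, first="C", second="G", window=100):
    """Skew (first - second) / (first + second) in a sliding window, 1-bp steps.

    Returns None for sequences containing the ambiguous base N, mirroring the
    exclusion rule in the text. Windows with neither base yield 0.0 (assumption).
    """
    seq = sequence.upper()
    if "N" in seq:
        return None
    profile = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        a = chunk.count(first)
        b = chunk.count(second)
        profile.append((a - b) / (a + b) if a + b else 0.0)
    return profile


# A 601-bp toy region (standing in for -250..+350 around a TSS)
# gives 502 window values at the default 100-bp window.
toy_region = "ACGT" * 150 + "A"
cg = skew_profile(toy_region)                        # CG-skew
at = skew_profile(toy_region, first="A", second="T") # AT-skew, assumed analogous form
```

Averaging such profiles position by position over all upstream or all downstream TSSs would give curves comparable in shape to those the text describes, although how window positions map onto the plotted axis is not recoverable from the text here.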
The position of a window in Figure 1

Calculation of the Relative Entropy at a Nucleotide Site

We represented nucleotide biases by relative entropy, modifying a previously reported method (Schneider and Stephens, 1990; Crooks et al., 2004). The relative entropy (R) at a particular nucleotide position is:

Sequence Analysis of Orthologs

The rice protein set we used was compared with the Arabidopsis protein set of TAIR. Homologs and orthologs were determined, as described elsewhere (The Rice Annotation Project, 2007). Homologous sequences of other organisms were identified by BLASTP searches against UniProtKB (release 10.2) downloaded as of April 9, 2007 (The UniProt Consortium, 2007). We adopted less than 10^−4 of the E value as a threshold. On the basis of the taxonomic groups to which the organisms of the homologs belonged, we categorized the rice and Arabidopsis proteins into (1) Oryzeae/Brassicaceae, (2) Liliopsidae/Eudicotyledons, (3) Viridiplantae, and (4) nonplant organisms (including fungi, animals, and prokaryotes).

Supplemental Data

The following materials are available in the online version of this article.
[Supplemental Data]
Acknowledgments

We thank H. Numa and H. Sakai for their suggestions; S. Kikuchi, M. Seki, and T. Sakurai for providing information about FLcDNA clones; the Rice Annotation Project members for rice genome annotation data; and Y.Y. Yamamoto for helpful discussions.

Notes

1This work was supported by the Ministry of Agriculture, Forestry, and Fisheries of Japan (Integrated Research Project for Plant, Insect, and Animal Using Genome Technology grant no. GD–1002 to T.T., T.I., and K.O.K. and Genomics for Agricultural Innovation grant no. GIR–1001 to T.T. and T.I.).

The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Takeshi Itoh (taitoh/at/affrc.go.jp).

[W]The online version of this article contains Web-only data.

[OA]Open Access articles can be viewed online without a subscription.