Copyright © 2005, Cold Spring Harbor Laboratory Press

Identification of functional transcription factor binding sites using closely related Saccharomyces species

1 Computational Biology Program, Washington University School of Medicine, St. Louis, Missouri 63110, USA
2 Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA
3 Corresponding author. E-mail jfay/at/genetics.wustl.edu; fax (314) 362-7855.

Received December 16, 2004; Accepted March 8, 2005.

Freely available online through the Genome Research Immediate Open Access option.

Abstract

Comparative genomics provides a rapid means of identifying functional DNA elements by their sequence conservation between species. Transcription factor binding sites (TFBSs) may constitute a significant fraction of these conserved sequences, but the annotation of specific TFBSs is complicated by the fact that these short, degenerate sequences may frequently be conserved by chance rather than functional constraint. To identify intergenic sequences that function as TFBSs, we calculated the probability of binding site conservation between Saccharomyces cerevisiae and its two closest relatives under a neutral model of evolution. We found that this probability is <5% for 134 of 163 transcription factor binding motifs, implying that we can reliably annotate binding sites for the majority of these transcription factors by conservation alone. Although our annotation relies on a number of assumptions, mutations in five of five conserved Ume6 binding sites and three of four conserved Ndt80 binding sites show Ume6- and Ndt80-dependent effects on gene expression. We also found that three of five unconserved Ndt80 binding sites show Ndt80-dependent effects on gene expression.
Together these data imply that although sequence conservation can be reliably used to predict functional TFBSs, unconserved sequences might also make a significant contribution to a species' biology.

The ability of the cell to tightly control the expression of thousands of genes under a wide array of developmental and environmental conditions is still poorly understood for all except a few well studied processes (e.g., Stanojevic et al. 1991; Yuh et al. 1998; Vershon and Pierce 2000). A major goal of computational biology is to extend these specific examples to the generalized set of rules that explain the complex system of regulation used by the cell. In Saccharomyces cerevisiae, various experimental approaches have been used to identify a large number of regulatory motifs and the transcription factors that bind them (e.g., Bowdish and Mitchell 1993; Strich et al. 1994; Ozsarac et al. 1997; Pierce et al. 2003). Additionally, two comparative genomics approaches have identified many new motifs by their conservation between different yeast species (Cliften et al. 2003; Kellis et al. 2003). The combined experimental and computational approaches provide a sufficient knowledgebase of cis-regulatory motifs in S. cerevisiae, such that it is now possible to begin the next challenge of identifying which transcription factor binding sites (TFBSs) in the genome are functional, and under what circumstances they regulate transcription.

Identifying functional TFBSs is a difficult task because most transcription factor binding motifs are short, degenerate sequences occurring frequently in the genome. Even in the compact S. cerevisiae genome, many instances of these binding sites are likely to be nonfunctional, spurious matches to the motif sequence. One approach to this problem is chromatin immunoprecipitation experiments, which can identify the promoters bound by a particular transcription factor (Lee et al. 2002; Martone et al. 2003; Harbison et al. 2004).
However, these data are limited by the conditions under which they are assayed. Alternatively, it has been shown that conservation of noncoding DNA between genomes is a good indicator of biological function (e.g., Loots et al. 2000; Bergman and Kreitman 2001; Boffelli et al. 2003; Frazer et al. 2004; Johnson et al. 2004; Woolfe et al. 2004), so it is plausible that functional and nonfunctional TFBSs may be distinguished by sequence conservation alone. One difficulty is that TFBSs may frequently be conserved by chance, rather than functional constraint. This frequency will depend on the amount of divergence between species. The observation that TFBSs are as conserved as their adjacent sequences (Cliften et al. 2003) implies that the Saccharomyces species are too closely related to identify functional TFBSs. However, an alternative explanation that must be considered is that both the TFBS and its flanking sequences are under functional constraint. This alternative can be tested using molecular evolutionary models, which provide a probabilistic framework in which constrained and unconstrained sequences can be distinguished (Li 1997). In these models, sequences that are functionally constrained between species by purifying selection can be identified as those with fewer substitutions than expected in the absence of any constraint. Using these methods, we estimated that ~40% of S. cerevisiae intergenic sequences are functionally constrained. Because of this, we developed a probabilistic method for calculating how often a particular TFBS would be conserved in the absence of any functional constraint. Using the synonymous rate to estimate the neutral rate of evolution, we found that the majority of TFBSs have a very low probability of being conserved among S. cerevisiae and its two closest relatives, S. paradoxus and S. mikatae. 
Experimental validation of multiple TFBSs illustrates that conservation among three closely related species is sufficient to predict functional TFBSs, making it possible to annotate the genome for functionally constrained binding sites for the majority of known transcription factors. Surprisingly, our annotation suggests that the TFBSs account for less than half of the functional constraint in noncoding sequences.

Results

Functional constraint in intergenic sequences

The identification of TFBSs by their sequence conservation between species requires that nonfunctional or unconstrained TFBSs are rarely conserved by chance. The probability that a nonfunctional binding site is conserved depends on the neutral substitution rate. We estimated the neutral substitution rate along the lineages leading to S. cerevisiae, S. paradoxus, and S. mikatae from the median synonymous substitution rate of 2098 coding sequence alignments of these species (Table 1). The total rate across the phylogeny is 0.83 substitutions per site and is a conservative estimate of the neutral rate, since synonymous sites have been shown to be under weak selective constraint (Akashi 2001). At this distance, the probability that a 10-base pair sequence is identical across the three species is 0.002 (Kimura 1980). This implies that TFBSs may rarely be conserved between these three species by chance.
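As a rough consistency check on the 10-bp figure above, suppose (a simplification of ours, not stated in the text) that sites evolve independently and share one per-site probability p of being identical in all three species, so that a w-bp sequence is identical with probability p^w. The sketch below inverts that relation for the quoted value of 0.002, which corresponds to p of about 0.54. The function name is illustrative.

```python
def per_site_identity(prob_sequence_identical: float, width: int) -> float:
    """Return p such that p ** width equals the sequence-level identity probability.

    Assumes independent, identically evolving sites (an illustrative
    simplification, not the substitution model used in the paper).
    """
    if not 0.0 < prob_sequence_identical <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    if width < 1:
        raise ValueError("width must be a positive integer")
    return prob_sequence_identical ** (1.0 / width)


# Quoted value: a 10-bp sequence is identical across the three species
# with probability 0.002.
p = per_site_identity(0.002, 10)
print(f"implied per-site identity probability: {p:.2f}")
```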
The fraction of functionally constrained intergenic sequences can be estimated from the ratio of the intergenic to synonymous substitution rate (Table 1) (Wong and Nielsen 2004). From 4188 alignments, the median intergenic substitution rate across the three yeast species is 0.57, which implies that 43% of intergenic sequences are functionally constrained. The extent to which conserved TFBSs can account for this constraint is discussed below.

Identification of functionally constrained Ndt80 and Ume6 binding sites using a neutral model of molecular evolution

The neutral substitution rate implies that functional TFBSs can be identified by sequence conservation alone, regardless of the conservation of the flanking sequences. We tested this hypothesis using two well characterized transcription factors, Ndt80 and Ume6. Both proteins regulate the expression of meiosis-specific genes in S. cerevisiae. Ume6 is known to repress genes during vegetative growth and activate genes during early meiosis (Bowdish and Mitchell 1993; Steber and Esposito 1995). Ume6 affects gene expression by binding to the consensus sequence TSGGCGGCTAW (Williams et al. 2002). Ndt80 activates genes expressed in the middle stages of meiosis (Chu and Herskowitz 1998) by binding to the consensus sequence YGNCACAAAW (Pierce et al. 2003). We used a position weight matrix (PWM) representation of these TFBSs, creating a probabilistic description of the nucleotide frequencies at each position of the motif. Our goal was to determine which sites matching these sequences function in Ndt80 or Ume6 transcriptional regulation.

If functional Ndt80 and Ume6 binding sites are also functional in other species, then they are likely to be under purifying selection and can be identified in S. cerevisiae by their conservation at orthologous positions in other Saccharomyces species. For each site identified as a match to the Ndt80 or Ume6 PWM (Hertz and Stormo 1999), we counted the number of differences observed between S. cerevisiae, S. paradoxus, and S. mikatae. The neutral model predicts that there will be an average of 4.8 and 5.3 differences across the three species for Ndt80 and Ume6, respectively, and that less than 1.5% of the Ndt80 and 0.8% of the Ume6 sites should have 0 or 1 difference between the three species. For both transcription factors, we found a substantial overrepresentation of sites with 0 or 1 difference, 63% of Ndt80 sites and 97% of Ume6 sites, suggesting that only a small number of the Ndt80 or Ume6 sites are consistent with a neutral model (Fig. 1).
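The neutral expectations quoted above can be approximately recovered with a simple binomial calculation. The sketch below assumes that each motif position differs independently with probability q equal to the mean number of differences divided by the motif width, and that the widths are the lengths of the consensus sequences given above (10 for Ndt80, 11 for Ume6). Both are our assumptions for illustration, and the function name is hypothetical.

```python
from math import comb


def prob_at_most_one_difference(mean_differences: float, width: int) -> float:
    """Binomial probability of observing 0 or 1 differing positions.

    Each of `width` positions is assumed to differ independently with
    probability q = mean_differences / width (an illustrative assumption).
    """
    q = mean_differences / width
    p = 1.0 - q
    return sum(comb(width, k) * q**k * p ** (width - k) for k in (0, 1))


# Mean differences under the neutral model are taken from the text;
# widths are the consensus sequence lengths (assumed to equal PWM widths).
ndt80 = prob_at_most_one_difference(4.8, 10)
ume6 = prob_at_most_one_difference(5.3, 11)
print(f"Ndt80: {ndt80:.4f}  Ume6: {ume6:.4f}")
```

Under these assumptions the two probabilities come out close to the 1.5% and 0.8% bounds stated in the text, which suggests the quoted figures are consistent with independent per-site divergence.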
To find the conserved TFBSs in the genome, we identified all sites that had a significant match to the PWM at orthologous positions in the three species. The advantage of this conservation test is that it does not rely on sequence conservation, which is not necessarily equal to TFBS conservation due to the degenerate nature of TFBSs (Moses et al. 2003). For example, we found that six of 20 Ndt80 sites and eight of 21 Ume6 sites with a single difference were not actually conserved sites, in that the sequence in either the S. paradoxus or S. mikatae genome was no longer a significant match to the PWM. Additionally, we identified a match to the Ndt80 site in the GAS4 promoter of S. cerevisiae that has three differences across the three species, but the sequence in each species is a match to the Ndt80 PWM. In total, we identified 59 conserved Ndt80 TFBSs and 63 conserved Ume6 TFBSs. The list of genes with conserved Ndt80 and Ume6 TFBSs can be found in the Supplemental material (Tables S1 and S2).

The fraction of conserved Ndt80 and Ume6 TFBSs that are functional depends on how often a TFBS is conserved in the absence of selective constraint. We modeled the probability of TFBS conservation as the likelihood that a TFBS observed in S. cerevisiae remains a match to the PWM given the neutral rate of evolution of 0.83 substitutions per site. Because there can be a number of sequences that match a PWM, we enumerated all possible evolutionary descendants of the observed sequence and tested each sequence for a significant match to the PWM. If it matched, we calculated the probability of it occurring given the neutral substitution rate. The total probability that a TFBS is conserved is the sum of the probability of all sequences that maintain the TFBS (Fig. 2).
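The enumeration described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a Jukes-Cantor substitution model in place of whatever model the paper used, treats a single descendant at distance t rather than the three-species phylogeny, and uses a toy 4-bp PWM with an arbitrary threshold. All names (`jc_probs`, `prob_conserved`, `TOY_PWM`) are hypothetical.

```python
import itertools
import math

BASES = "ACGT"


def jc_probs(t):
    """Jukes-Cantor probabilities over distance t (substitutions per site):
    (base unchanged, base replaced by one specific other base)."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay, 0.25 - 0.25 * decay


def pwm_score(seq, pwm):
    """Log2-odds score of seq against the PWM, equal background frequencies."""
    return sum(math.log2(column[base] / 0.25) for base, column in zip(seq, pwm))


def prob_conserved(observed, pwm, threshold, t):
    """Sum the probabilities of all descendants of `observed` that still
    score at or above `threshold` against the PWM."""
    p_same, p_diff = jc_probs(t)
    total = 0.0
    # Enumerate all 4**w candidate descendants; feasible for short motifs.
    for descendant in itertools.product(BASES, repeat=len(observed)):
        if pwm_score(descendant, pwm) < threshold:
            continue  # this descendant no longer matches the motif
        prob = 1.0
        for old, new in zip(observed, descendant):
            prob *= p_same if old == new else p_diff
        total += prob
    return total


def consensus_column(base, weight=0.85):
    """Toy PWM column: `weight` on one base, the rest shared equally."""
    rest = (1.0 - weight) / 3.0
    return {b: (weight if b == base else rest) for b in BASES}


TOY_PWM = [consensus_column(b) for b in "ACGT"]  # hypothetical 4-bp motif
print(prob_conserved("ACGT", TOY_PWM, threshold=5.0, t=0.83))
```

With this toy PWM and threshold only the exact consensus still matches, so the result equals the probability that all four positions are unchanged. A more degenerate PWM or a lower threshold admits more descendants and raises the neutral probability of conservation.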
For Ndt80 and Ume6, the probability that a single TFBS is conserved is 0.0058 and 0.0023, respectively. Given that 59 Ndt80 and 63 Ume6 sites are conserved, we expect that less than one of these conserved sites occurred by chance (Table 2). This low probability of TFBS conservation implies that conservation of Ndt80 and Ume6 binding sites across the three Saccharomyces species is a strong indicator that the binding site is under functional constraint, and we can therefore confidently predict that these conserved sites are functional in S. cerevisiae.
Mutation of conserved Ndt80 and Ume6 binding sites produces Ndt80- and Ume6-dependent effects on gene expression

The conservation of Ndt80 and Ume6 binding sites suggests that they are true TFBSs and should therefore affect the expression levels of their adjacent genes. To test our predictions, we randomly selected four promoters containing conserved Ndt80 sites and four promoters containing conserved Ume6 sites (Fig. 3).
Eight of nine mutants produced a significant change in expression levels during meiosis compared to the wild-type promoters from which they were derived (Fig. 4).
If the conserved Ndt80 and Ume6 binding sites are functional Ndt80 and Ume6 binding sites, the effects of mutations within these sites should be Ndt80- and Ume6-dependent. We measured expression levels in either an ndt80Δ or ume6Δ strain where appropriate (Fig. 4).

Mutations in Ndt80 binding sites that are not conserved affect gene expression

One assumption underlying the comparative genomic approach is that functional TFBSs will be conserved between closely related species (Cliften et al. 2003; Kellis et al. 2003). To test the importance of conservation in identifying functional regulatory sites, we tested five Ndt80 binding sites that are not conserved between the three species for Ndt80-specific regulatory effects (Supplemental Fig. 1). Of five unconserved sites, two (URA4 and YMR111C) showed an Ndt80-specific effect in sporulation medium, suggesting that these sites are functional Ndt80 binding sites in S. cerevisiae, despite their absence in the other species (Fig. 5).
Genome annotation of transcription factor binding sites

The low probability of TFBS conservation under a neutral model combined with the experimental validation of multiple Ndt80 and Ume6 binding sites suggests that the same method may be applied to all other known S. cerevisiae transcription factor binding motifs. We compiled a list of 163 unique motifs from three sources (Zhu and Zhang 1999; Cliften et al. 2003; Kellis et al. 2003). To increase the sensitivity of our analysis, we derived a PWM for each of these published motifs (see Methods). We calculated the probability of each of the 163 motifs being conserved in neutrally evolving sequence. The probability of a TFBS being conserved across the three species is <5% for 134 of 163 motifs, and <1% for 69 of 163 motifs (e.g., Table 2). As expected, the neutral probability was well correlated with information content, which is a function of the size and degeneracy of the TFBS (Fig. 6).
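To make the link between information content, motif size, and degeneracy concrete, the sketch below computes the information content of a PWM in bits, using the standard definition relative to a uniform background (2 bits minus the Shannon entropy of each column, summed over columns). The uniform background and the toy columns are our assumptions for illustration; the text does not specify how information content was computed.

```python
import math


def information_content(pwm):
    """Total information content (bits) of a PWM given as a list of
    {base: frequency} columns, relative to a uniform background."""
    total = 0.0
    for column in pwm:
        # Shannon entropy of the column; zero frequencies contribute nothing.
        entropy = -sum(f * math.log2(f) for f in column.values() if f > 0)
        total += 2.0 - entropy
    return total


# Toy columns (hypothetical): an invariant position, a two-fold degenerate
# position (such as W = A or T), and a fully degenerate position (N).
invariant = {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}
twofold = {"A": 0.5, "C": 0.0, "G": 0.0, "T": 0.5}
any_base = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

print(information_content([invariant, twofold, any_base]))
```

Longer motifs add columns and less degenerate positions add more bits per column, so both raise the information content, which is consistent with such motifs being less likely to be conserved by chance.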
We annotated the S. cerevisiae genome for conserved instances of the 134 TFBSs that have a low probability of conservation. Annotation of 2 Mb of aligned intergenic sequences produced 27,225 predicted TFBSs or 6.5 sites per intergenic sequence. However, most intergenic sequences showed much higher levels of sequence conservation than could be explained by a handful of conserved TFBSs. The ratio of intergenic to synonymous substitution rate implies that 43% of all intergenic sites are more functionally constrained than synonymous sites (Table 1). The 27,225 conserved TFBSs identified in this study only cover 17% of the intergenic sequences examined after accounting for overlap between sites. The function of the remaining conserved intergenic sequences has yet to be determined. The complete results for the 163 motifs are in Supplemental Table 4. GBrowse (Stein et al. 2002) formatted files containing TFBS annotation are available as Supplemental files.

Discussion

We estimated that ~40% of intergenic sequences in S. cerevisiae are functionally constrained. These sequences may function in transcriptional regulation, translational regulation, or may function as noncoding RNA genes. This estimate is comparable to a previous estimate that 34% of the intergenic sequences are functionally constrained between S. cerevisiae, S. paradoxus, S. mikatae, and S. bayanus (Chin et al. 2005). In the present study we examined the portion of this constraint that can be attributed to conservation of TFBSs. We found that 134 of 163 regulatory motifs should rarely be conserved by chance, and we identified 27,225 conserved instances of these sites in 4188 intergenic sequences. Together these conserved TFBSs account for less than half of the functionally constrained intergenic sequences. Mutations in eight of nine predicted Ume6 and Ndt80 binding sites confirm our prediction that functional TFBSs can be identified through sequence conservation alone.
However, as we discuss below, not all functional TFBSs may be identified. Additionally, we found that three of five Ndt80 binding sites that are not conserved produce Ndt80-dependent effects on expression when mutated. Whether or not these sites confer any biological differences between species remains an important unanswered question.

Identifying novel Ndt80 and Ume6 binding sites by conservation alone

We experimentally verified five novel Ume6 binding sites and three novel Ndt80 binding sites that were identified by their conservation among three closely related species. However, comparison of the predicted sites to those reported in the literature indicates that some previously known Ndt80 and Ume6 binding sites were not identified. As discussed below, there are a number of explanations for these false negatives. Our search of the literature found four experimentally identified Ume6 sites (Bowdish and Mitchell 1993; Bowdish et al. 1995; Strich et al. 1994), and eight Ndt80 sites (Ozsarac et al. 1997). For Ume6, we correctly identified the site in the HOP1 promoter. The IME2 promoter contains two conserved sites that fall just below the threshold for a match to the Ume6 PWM. The SPO13 promoter contains a Ume6 site, but there is no alignment available for its promoter. Of the eight Ndt80 sites, we correctly identified four (SPR3, SPS19, SMK1, and NDT80); SPS4 and SPS1 were not in our alignment data set, and DIT2 and CDC10 do not match the Ndt80 PWM, despite matching the middle sporulation element (CRCAAAW). In total, we correctly identified five of eight known Ndt80 or Ume6 TFBSs for which we have alignments. Thus, our strict motif model and stringent filter for alignment quality make our predictions a conservative estimate of the total Ndt80 and Ume6 regulons.
Annotation of transcription factor binding sites throughout the genome

We found that 134 of 163 known or putative transcription factor binding motifs have a sufficiently low probability of conservation to identify single sites that are under functional constraint. Annotating these constrained sites in the S. cerevisiae genome results in the prediction of 6.5 functional TFBSs per intergenic sequence. However, this annotation relies on a number of assumptions. First, we must assume mutational homogeneity across the genome. Direct estimates of mutation rates are consistent with a uniform mutation rate (Drake 1991), but it is also possible that rare mutational hot spots or cold spots may exist (Chuang and Li 2004). In yeast the synonymous substitution rate is nearly uniformly distributed across the genome (Chin et al. 2005). This suggests that mutational heterogeneity does not contribute to the divergence rate among the Saccharomyces genomes. A second assumption is that substitutions are independent of other positions within a TFBS. Although position independence is widely assumed within TFBSs, this has been proven false for some binding sites (Man and Stormo 2001; Bulyk et al. 2002). When calculating the probability that a TFBS is conserved between species, any dependencies between columns will reduce the number of ways in which a site could be maintained as a TFBS. Therefore, the position-independence assumption makes the probabilities calculated here a conservative estimate of the true probability that a binding site will be conserved in neutral sequence. A third assumption required for TFBS annotation is that TFBS conservation is due entirely to its functional importance in transcriptional regulation. This assumption may break down under conditions where overlap exists between TFBSs and other functional noncoding sequences. However, only 659 of the conserved TFBSs (2%) were found in 106 noncoding RNA genes (including tRNAs) present in the 4188 intergenic alignments.
It is also possible that TFBSs overlap with each other, creating ambiguity about which TFBS is responsible for the functional constraint. The 27,225 significantly conserved TFBSs identified in the three yeast genomes account for ~470 kb of noncoding sequence, of which ~122 kb (27%) is shared between two or more binding sites. Therefore, in many cases we cannot uniquely determine which TFBS is responsible for the functional constraint. Finally, we assume that the multiple sequence alignments are correct. Simulations have shown that when the number of substitutions is less than one per site, the multiple sequence alignment programs perform quite well (Pollard et al. 2004). We avoided misalignments by choosing closely related yeast species and filtering out any dubious alignments (see Methods). However, some errors due to misalignment may be unavoidable. Even if some misaligned sequences are included in our analysis, this will likely result in missing conserved TFBSs (false negatives), rather than identifying false positives.

The limitations of comparative genomics for identifying functional elements in noncoding DNA

The comparative genomic method applied here and in many other studies relies on the assumption that functional elements will be shared between species. However, it is hypothesized that the majority of biological differences between species are due to changes in gene regulation (Wilson et al. 1974), suggesting that there may be species-specific regulatory signals. The fraction of genes that are differentially regulated between species is not known, but there is evidence to suggest that this fraction may be significant (Dermitzakis and Clark 2002; Moses et al. 2003). Furthermore, many conserved noncoding sequences are only conserved in a subset of taxa (Frazer et al. 2004), suggesting that very closely related species are needed to identify more recently evolved functional sequences (Boffelli et al. 2003). In addition, TFBS turnover (Ludwig et al. 2000; Dermitzakis and Clark 2002; Costas et al. 2003) and misalignment (Pollard et al. 2004) limit the comparative method even when the biology of the two organisms is the same (Ludwig et al. 1998). Our data (Fig. 5

The cis-regulatory code

The cis-regulatory code is the set of rules that enables a cell to direct the expression program of each gene based on its cis-regulatory sequences. Identifying functional cis-regulatory sites in the S. cerevisiae genome will be a significant step towards describing this regulatory code. However, our results also suggest that although we may be able to annotate the location of the TFBSs, their effects on gene regulation may often be context-dependent, as shown by the two identical Ume6 sites with opposite effects on expression (Fig. 4).

Methods

Substitutions in coding and noncoding sequences

The alignments used for this analysis were downloaded from http://www.broad.mit.edu/annotation/fungi/comp_yeasts/downloads.html (Kellis et al. 2003). The intergenic alignments were filtered to remove any alignment with >10% Ns or 10% missing data in any one species. In the TFBS analysis described here, 4188 alignments were used; 2098 coding sequence alignments were used to generate the synonymous and nonsynonymous rate estimates. The synonymous, nonsynonymous, and intergenic substitution rates were estimated using PAML software (Yang 1997). The intergenic substitution rate was estimated using the Hasegawa, Kishino, and Yano substitution model (Hasegawa et al. 1985).

Motif models

The initial motif models were obtained from three sources. Cliften et al. (2003) published 65 motifs that were perfectly conserved at least five times between multiple yeast species. Kellis et al. (2003) also identified 71 motifs by their sequence conservation between species. The Kellis et al. (2003) motifs were downloaded from http://www.broad.mit.edu/annotation/fungi/comp_yeasts/motiflist.html.
The Saccharomyces cerevisiae Promoter Database (SCPD) (Zhu and Zhang 1999) contains 42 motifs identified by various experimental methods. There were a total of 163 motifs after we removed motifs that were exactly identical, as well as shorter motifs that were contained within longer motifs. A position weight matrix (PWM) was used to describe all motif models. Because the published motif models were only available as consensus sequences, we generated substitution-derived PWMs, based on the observation that the same positions that are degenerate within a species are degenerate between species (Moses et al. 2003). The substitution PWM is based on all instances of the consensus sequence with 0 or 1 difference across the three species (Supplemental Fig. 4). To avoid adding too much noise to the motif model, we tested whether the differences were uniformly distributed across the positions within the motif using a χ² test (d.f. equal to the motif width – 1). If the differences were uniformly distributed, we used a PWM representing the consensus sequence. The substitution-derived PWMs were found to be very similar to the original motif model except that we were able to refine the frequency of each base at degenerate positions for 91 of the 163 motifs. The PWMs used are available in Supplemental File 1. We used Patser (Hertz and Stormo 1999), with equal base frequencies, to scan all sequences for matches to a PWM. The use of local base frequencies may have biased the results, as 43% of the intergenic sequences are under functional constraint. Furthermore, using local base frequencies may lead to missing weak sites in regions of biased nucleotide composition (Dermitzakis et al. 2003). For a motif of width w, Patser scored each w-mer as the log-likelihood ratio of observing the sequence under the motif model versus the background nucleotide frequencies. If this score, S, was greater than the default threshold score, then the w-mer was a match to the motif.
The identification of TFBSs and the probability of TFBS conservation were dependent on this cutoff for PWM matches. As shown in Supplemental Figure 5, the choice of cutoff does not affect the overall percentage of TFBSs estimated to be functional.

Calculating the neutral expectation for TFBS conservation

We calculated the neutral expectation for TFBS conservation as the probability that a sequence matching the PWM in S. cerevisiae will remain a match to the PWM given a specified amount of evolutionary time. Starting from the highest possible scoring sequence for a particular PWM, we tested all possible sequences to which this initial sequence could evolve. For each descendant sequence, we tested whether or not this sequence was a match to the motif. If it was a match, we calculated the probability of observing this sequence (Fig. 2).
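The log-likelihood ratio scoring and cutoff described under Motif models can be illustrated with a short scan. This is a sketch of the general technique only: it does not call or reproduce Patser, the way Patser derives its default threshold is not modeled, and the PWM, threshold, and function names are hypothetical. Equal base frequencies (0.25 each) are used, as in the text.

```python
import math

BASES = "ACGT"


def log_odds_score(kmer, pwm, background=0.25):
    """Log-likelihood ratio of kmer under the PWM versus a uniform background."""
    return sum(math.log(column[base] / background) for base, column in zip(kmer, pwm))


def scan(sequence, pwm, threshold):
    """Return (start, kmer, score) for every window scoring above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        kmer = sequence[start:start + width]
        if any(base not in BASES for base in kmer):
            continue  # skip windows containing N or gap characters
        score = log_odds_score(kmer, pwm)
        if score > threshold:
            hits.append((start, kmer, score))
    return hits


def column(base, weight=0.85):
    """Toy PWM column favoring one base; frequencies are never zero."""
    rest = (1.0 - weight) / 3.0
    return {b: (weight if b == base else rest) for b in BASES}


TOY_PWM = [column(b) for b in "ACG"]  # hypothetical 3-bp motif
for start, kmer, score in scan("TTACGTT", TOY_PWM, threshold=2.0):
    print(start, kmer, round(score, 2))
```

Because both which sites are called and the neutral probability of conservation depend on the threshold, a scan like this would need to be rerun at several cutoffs to check the sensitivity of the results.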
Strains, media, and plasmids

Escherichia coli DH5α was used for all plasmid manipulations. All yeast strains were derivatives of S288C. DBY8268 (a/α, Δura3 EcoRV-StuI/ura3-52, ho/ho) was used to measure wild-type expression levels. Expression levels were also measured in ndt80Δ and ume6Δ strains obtained from the yeast deletion collection (Giaever et al. 2002), present in BY4743 (a/α, his3Δ1/his3Δ1, leu2Δ0/leu2Δ0, lys2Δ0/LYS2, MET15/met15Δ0, ura3Δ0/ura3Δ0). YEp357R is a yeast-bacteria shuttle vector maintained in yeast as an episomal plasmid carrying the β-galactosidase gene and the URA3 gene for selection in yeast (Myers et al. 1986). Yeast cultures were grown in complete minimal medium, minus uracil. Expression was measured in complete minimal medium or sporulation medium (1% potassium acetate).

Measuring expression from wild-type and mutant promoters

Promoters containing Ume6 or Ndt80 binding sites were cloned from S. cerevisiae strain S288C into YEp357R to generate a β-GAL fusion construct. The Ume6 or Ndt80 motifs were mutated using the QuikChange Site-Directed Mutagenesis Kit (Stratagene). The alignments of these binding sites and their flanking sequences can be found in Figure 3.

Acknowledgments

We thank Maia Dorsett and Hyun Seok Kim for helpful discussions about the experimental and computational design and procedures, Mark Johnston, Linda Riles, and Jim Dover for providing the deletion strains and plasmids, and Ting Wang, Sudhir Nayak, Sean Eddy, Mark Johnston, and Gary Stormo for helpful comments about the manuscript. S.W.D. is supported by NSF graduate fellowship #DGE-0202737.

Notes

[Supplemental material is available online at www.genome.org.]

Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.3578205. Article published online ahead of print in April 2005.