Copyright © 2007 by The National Academy of Sciences of the USA
Applied Mathematics, Evolution

Defining functional distance using manifold embeddings of gene ontology annotations

*Department of Mathematics, University of Minnesota, Minneapolis, MN 55455; and ‡Program in Bioinformatics, Boston University, Boston, MA 02215
†To whom correspondence may be addressed. E-mail: lerman/at/umn.edu or borya/at/bu.edu
Communicated by Ronald R. Coifman, Yale University, New Haven, CT, April 9, 2007.
Author contributions: G.L. and B.E.S. performed research and wrote the paper.
Received June 7, 2006.

Abstract

Although rigorous measures of similarity for sequence and structure are now well established, the problem of defining functional relationships has been particularly daunting. Here, we present several manifold embedding techniques to compute distances between Gene Ontology (GO) functional annotations and consequently estimate functional distances between protein domains. To evaluate accuracy, we correlate the functional distance to the well established measures of sequence, structural, and phylogenetic similarities. Finally, we show that manual classification of structures into folds and superfamilies is mirrored by proximity in the newly defined function space. We show how functional distances place structure–function relationships in biological context resulting in insight into divergent and convergent evolution. The methods and results in this paper can be readily generalized and applied to a wide array of biologically relevant investigations, such as accuracy of annotation transference, the relationship between sequence, structure, and function, or coherence of expression modules.

Keywords: kernel methods, diffusion geometry, domain evolution, functional annotation, homology modeling

One of the fundamental questions in biology deals with the inter-relationship between structure, function and evolution.
The need to precisely and quantitatively measure evolutionary relationships encouraged the development of robust and accurate sequence (1) and structure (2, 3) comparison methods. The importance of these algorithms to computational biology cannot be overstated. For example, the efficacy of transferring functional annotation depends on the precision of these sequence and structure comparison algorithms (4, 5). Although significant progress has been made in defining distance between sequences and structures, a rigorous understanding of functional distance is still limited. At first glance, the notion of functional distance is qualitative and subjective. The development of annotation systems that depict function in a machine-readable format was the first step in treating functional annotation rigorously. For example, the Gene Ontology (GO) (6) has become the gold standard for describing molecular functions of genes and proteins. However, the GO is not naturally amenable to measuring distance. One complication is an intrinsic bias in annotation, where large numbers of unrelated genes share the same annotation (e.g., ATPase), making those categories uninformative. Previous attempts at identifying functional relationships between genes focused mostly on calculating statistical over-representation of functional categories (7). These methods are well suited for quantifying coherence of function in sets of genes, but are not useful for exploring structure–function or sequence–function relationships. Recently, researchers have recognized the importance of measuring distance between annotations (8) and proposed a simple measure of distance using the shortest path algorithm (9). However, these kinds of distances lack resolution and are complicated by somewhat arbitrary characteristics of the ontology, e.g., when annotations on the same level differ in their degree of generality.
Accordingly, we show that functional metrics based on shortest path algorithms perform significantly worse than methods based on diffusion-type manifold embedding (10) proposed in this work. Defining distances between functional categories is integrally important due to potential insights into the coevolution of sequence, structure and function (11). For example, function broadly defined as all activities performed by a set of sequences that fold into a domain structure, can be represented as a weighted subgraph of the GO directed acyclic graph (DAG) (12). This representation of function was used to establish the importance of considering homology relationships in a phylogenetic context. In this paper, we introduce more accurate and sensitive functional distances based on diffusion-type manifold embeddings of GO annotations to explore the structure–function relationship in detail. Manifold embedding techniques are based on kernels (see definition in Materials and Methods), which have already been successfully applied to various problems in bioinformatics (13). In particular, computational approaches aimed at integrating various data sets have explored the effect of adding GO kernels for use in subsequent classification by SVM (14). Although our approach also employs kernels defined on GO, there are several fundamental differences. Most importantly, we apply these kernels to quantify functional distances as opposed to applications centered on classification of data into specific categories. Moreover, our approach naturally extends the notion of functional distance to protein domains by using the geometric interpretation of the manifold embedding (see Materials and Methods). Finally, we apply functional distances to exploring coevolution of sequence, structure, and function. Functional distances defined here via diffusion-type manifold embedding techniques allow for increased sensitivity and arbitrary levels of granularity. 
Using our measures of functional distance, we can estimate the average divergence of function with respect to structure, sequence or phylogenetic similarity. Although this is clearly an area of active research, we show that functional distances are already accurate enough to discover specific relationships between protein domain functions. Finally, we show how functional distances can be used to explore divergent as well as convergent evolution.

Results

Defining Functional Distance. The molecular function component of the GO represents functional annotations as nodes on a DAG (6). We can capitalize on the hierarchical structure of the DAG to define local distances between functional annotations. Consider that there are only 20 possible annotations at the top, and >2,000 on the fifth level of the Gene Ontology. Thus, comparison at the top level of the hierarchy will be, by design, less precise than at the bottom level. One way to address this is to model the inherent bias of the ontology by taking into account node usage (14). For example, consider a case where a large proportion of proteins are coannotated with a pair of GO terms; the distance between these nodes on the GO DAG will be large because their cooccurrence is not specifically correlated with shared function. Thus, the basic idea behind building an appropriate kernel is that GO terms shared by few protein sequences will be assigned small local distances or, equivalently, high values of local similarity. Conversely, general annotations appearing at the top of the ontology will be assigned large local distances (or small similarities). Using the intuition outlined above, we form a graph where weights represent local similarities and use several techniques of manifold and graph embedding to calculate global distances between functional annotations. Embedding strategies exploit the underlying geometry of the graph and can implicitly correct ambiguities in the ontology.
Finally, we use a global measure of distance between GO terms in combination with representation of domain function as a GO subgraph (12) to compute meaningful functional distances between protein domains [see Materials and Methods and supporting information (SI) Text]. Correlating Functional Distance with Sequence, Structure, and Phylogenetic Proximity. We use the well known correlations of function with sequence, structure (12), and phylogenetic profiles (15) to evaluate the efficacy of using manifold embedding to quantify functional relationships between domains. The embedding procedure involves defining local similarity weights as described above and using them to form a kernel (the types of kernels used here and their direct relation to the notion of manifold embedding are described in Materials and Methods). The choice of kernel is arbitrary, but integrally important in the definition of distance. Thus, we compared the performance of several kernels in their ability to accurately represent functional distance between protein domains. We report results for four different choices of kernels. The first three are formed by diffusion-type kernels, whereas the fourth is similar to previously proposed shortest distance between GO annotations (9). We use Z scores (2) from DALI (16) to quantify structural proximity, BLAST (1) for sequence similarity and mutual information (MI) between phylogenetic profiles (15) for phylogenetic similarity (see Materials and Methods). We find that functional distances between protein domains calculated using diffusion-type kernels correlate well with sequence alignment, structural proximity and phylogenetic similarity (Fig. 1
One thing to note from Fig. 1
We find that the differences in the observed correlations between functional distances derived from each kernel and sequence, structure and phylogenetic proximity measures can provide insight into the behavior of the kernel at different scales of resolution. The chosen diffusion-type kernels (pseudoinverse of the graph Laplacian, LLE and diffusion powers) emphasize different ranges of interaction between GO annotations and consequently result in range-specific resolutions. Specifically, the LLE kernel corresponds to a low power of diffusion and thus emphasizes shorter-range interactions between annotations. The diffusion kernel of power m = 7 represents a functional distance with good resolution at medium distances because it takes into account longer paths along the unified GO annotation graph. Consequently, the range of approximately linear correlation with sequence alignment shortens. Finally, the inverse Laplacian takes into account all powers of diffusion and thus incorporates all paths along the unified GO annotation graph. Therefore, it has good resolution at longer functional distances. Consistent with the explanation presented above, both the structural alignment and sequence alignment show increasingly sharp transitions when applying the LLE kernel, followed by the diffusion kernel with power m = 7 and finally the inverse Laplacian kernel (Table 1). Thus, manifold embedding of GO can produce functional distances at the needed resolution by choosing a kernel appropriate to the specific application. For maximum resolution at small functional distances, the LLE kernel is most appropriate, whereas maximum resolution at long distances can be achieved by using the inverse Laplacian kernel. However, as expected, the qualitative behavior of the correlations remains the same for all choices of diffusion kernels.

Building a Functional Domain Universe Graph.
Next, we wanted to explore whether our definition of functional distance that correlates on average with sequence, structural, and phylogenetic similarities (Fig. 1
Two things become immediately apparent from functional embedding of the protein domain universe. First, at short functional distances, domains sharing fold classification form clusters sharing common function. Second, at intermediate functional distances, clusters of domains with related functions are proximal on the graph. For example, DNA-binding domains form a cluster that is close to the cluster containing exonuclease domains and transcription factor domains. As another example, Rossmann fold domains performing oxidoreductase activity are separated by only one step from domains with dehydrogenase activity. Although the graph shows separation of domains by fold and function, the structure–function relationship is clearly multifaceted. Functional clusters are not entirely monochromatic, i.e., functions are usually fulfilled by domains of several different folds. Some folds are also multifunctional and appear in clusters that are far from each other, e.g., Ferredoxins. Other folds are more functionally exclusive and only participate in clusters that are in close proximity, e.g., TIM beta/alpha barrels, which mostly perform enzymatic functions. Finally, it appears that this representation of relationships between protein domain functions captures the separation of folds into functionally related superfamilies (17).

Exploring Structure–Function Coevolution. Interestingly, there are certain domains that link proximal clusters. These domains may represent the intermediates in the evolutionary path from one function to another. For example, consider two clusters (labeled B and E on Fig. 2 Clearly, sequence binding specificity is not explicitly described by GO. However, the 3-helical bundles are a remarkable example of how GO embedding and the subsequent graph theoretical treatment can uncover relationships between structures by placing their functions in biological context.
Subsequent application of evolutionary trace methods to the three families can uncover the residues responsible for the differential binding specificity of the 3-helical bundles and their mutational dynamics. Specificity of DNA binding in 3-helical bundle domains is an example of divergent evolution where sequences are related by common ancestry (22). On the other hand, convergent evolution is often defined as two proteins with no apparent homology performing the same function (22). An additional benefit of defining functional distances is that we can easily detect instances of convergence by examining domains with close functional distance and no structural similarity. For example, using functional distances, we easily confirmed the well documented case of convergence of tRNA synthetases [1pys (23) and 1a8h (24), F score = 0.001 and Z score < 2].

Discussion

Machine-readable representations of function, e.g., GO, are a necessary first step toward high-throughput functional annotation of data from whole-genome sequencing and structural genomics projects. Although these databases offer an intuitively appealing representation of function, they are not immediately amenable to accurate definitions of functional distance. Using nonlinear manifold embedding techniques, we were able to define distances between functional annotations and use those to quantify distances between protein domains. We find that diffusion kernels perform remarkably well in creating an accurate global distance metric applicable to quantifying functional relationships between protein domains. As an example of specific insights that can be uncovered using the proposed distance metric, we explore functional relationships between 3-helical bundle domains, which form two clusters in function space. These functional clusters turn out to be separable by the specificity of DNA binding. The family of sequences that are functionally similar to both clusters binds with intermediate specificity.
We were also able to confirm examples of convergence where domains sharing close functional proximity appear to have evolved independently. Further exploration of this representation of the protein domain universe will undoubtedly uncover many more insights into the relationship between evolution of structure and function. Kernel-based functional distance metrics have several important advantages over previously described methods (14), Euclidean measures (12), and shortest path algorithms (9). First, the diffusion-type manifold embedding techniques give rise to distances taking into account both the geometry of the ontology and intrinsic biases in annotations in a robust way (insensitive to small amounts of noise). In particular, distances between subgraphs of annotation (e.g., those representing protein domains) have a clear geometric interpretation. Secondly, manifold embedding learns distances between annotations, rather than using kernels for classifications or defining distances between genes. Consequently, this approach is more natural for evaluating and comparing relationships between sequence, structure, and function as opposed to previous metrics that focused on applying GO kernels as part of a heterogeneous dataset for classification of protein–protein interactions (14). As a result, these methods are significantly more general and can be applied in calculations of functional distances between arbitrary numbers of genes. Additionally, techniques presented here can be easily adapted to other ontologies. Finally, correlations with sequence, structure (2, 15) and phylogenetic proximity (Fig. 1 Having the ability to estimate “distance” in function space is fundamental to computational biology in the postgenomic era. A variety of computational tasks including assessment of annotation accuracy from homology modeling and module detection from microarray data can be facilitated by an accurate measurement of functional relationship between genes. 
Materials and Methods

The GO DAG (6) can be found at www.geneontology.org. For structural proximity calculations, we use the Dali domain dictionary (2). The list of domains (3306) can be found at romi.bu.edu/kernel_mapping/dali.txt. We use ASTRAL (25) to determine the SCOP (17) annotation for each domain. We use BLAST (1) to compare domain sequences. Matlab codes computing the following functional distances between annotations and protein domains can be found at www.math.umn.edu/~lerman/supp/protein_distance. More specific details of the methods are discussed in SI Text.

Annotating Each Structure as a Subgraph on the GO.

Local Similarities Between GO Annotations. Formally, we form a unified graph G whose nodes are all annotations of GO appearing in protein domains and whose edges are the union of all edges of subgraphs representing protein domains. The local similarity weight wij on an edge connecting annotations i and j is defined as follows: wij = 1/nij, where nij is the number of domain subgraphs containing that edge.

Similarities by Diffusion and LLE Kernels. A (positive definite) kernel K for the unified graph is a real symmetric matrix whose size is N, the number of vertices of the unified graph, and whose eigenvalues are nonnegative. Its elements Ki,j represent local similarities between corresponding graph nodes (i and j). The diffusion kernels are based on a local diffusion process on the unified graph. We first normalize the local similarity weights defined above by the degree matrix D, which is defined as follows:
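The kernel construction described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' Matlab code: the degree matrix is assumed to be the usual diagonal matrix of row sums, D_ii = Σ_j wij; the diffusion operator is assumed to be a lazy, symmetrically normalized random walk (chosen here only so that every power stays positive semidefinite); and all function names and the toy domain subgraphs are hypothetical.

```python
import numpy as np
from collections import Counter

def edge_weights(domain_subgraphs, n_nodes):
    """Build W with w_ij = 1 / n_ij, where n_ij is the number of
    domain subgraphs that contain the edge (i, j)."""
    counts = Counter()
    for edges in domain_subgraphs:
        # treat edges as undirected and count each one once per subgraph
        for i, j in {tuple(sorted(e)) for e in edges}:
            counts[(i, j)] += 1
    W = np.zeros((n_nodes, n_nodes))
    for (i, j), n_ij in counts.items():
        W[i, j] = W[j, i] = 1.0 / n_ij
    return W

def diffusion_kernel(W, m):
    """ASSUMED form: m-th power of the lazy walk (I + D^-1/2 W D^-1/2) / 2."""
    d = W.sum(axis=1)                          # diagonal of the degree matrix D
    safe_d = np.where(d > 0, d, 1.0)           # avoid dividing by zero degrees
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(safe_d), 0.0)
    A = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    A_lazy = 0.5 * (np.eye(len(W)) + A)        # eigenvalues lie in [0, 1]
    return np.linalg.matrix_power(A_lazy, m)

def inverse_laplacian_kernel(W):
    """Pseudoinverse of the (unnormalized) graph Laplacian L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    return np.linalg.pinv(L)

# Toy example: three annotations, two hypothetical domain subgraphs.
subgraphs = [[(0, 1), (1, 2)], [(0, 1)]]
W = edge_weights(subgraphs, n_nodes=3)
K7 = diffusion_kernel(W, m=7)
K_inv = inverse_laplacian_kernel(W)
```

In the toy graph, the edge shared by both subgraphs receives weight 1/2 and the edge that appears in only one receives weight 1, so rarely shared edges count as more similar.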
Distances Between Annotations and Their Relation to Manifold Embedding. Given a kernel K, we compute the distance d(x, y) between GO annotations x and y as follows:

$$d^2(x, y) = K(x, x) + K(y, y) - 2K(x, y). \tag{1}$$

The kernel can be written as $K(x, y) = \langle F(x), F(y) \rangle$, where F embeds the graph vertices into a Euclidean space (usually referred to as feature space). Consequently, Eq. 1 can be written as:

$$d(x, y) = \lVert F(x) - F(y) \rVert.$$
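The relation between a kernel, its distance, and the feature-space embedding can be checked numerically. The sketch below assumes the standard kernel-induced distance d²(x, y) = K(x, x) + K(y, y) − 2K(x, y) and builds one possible embedding F from the eigendecomposition of K; the function names and the toy kernel are hypothetical.

```python
import numpy as np

def kernel_distance_matrix(K):
    """Pairwise d(x, y) with d^2 = K(x, x) + K(y, y) - 2 K(x, y)."""
    diag = np.diag(K)
    sq = diag[:, None] + diag[None, :] - 2.0 * K
    return np.sqrt(np.clip(sq, 0.0, None))     # clip tiny negative round-off

def feature_embedding(K):
    """Rows F[x] such that K = F F^T (one valid choice of embedding)."""
    vals, vecs = np.linalg.eigh(K)             # K is symmetric
    vals = np.clip(vals, 0.0, None)            # guard against round-off
    return vecs * np.sqrt(vals)                # scale each eigenvector column

# Toy positive definite kernel on three annotations.
K = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])
D = kernel_distance_matrix(K)
F = feature_embedding(K)
```

For this toy kernel, the distance computed from K between annotations 0 and 2 equals the Euclidean distance between the rows F[0] and F[2], which is the geometric interpretation used in the text.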
Assuming that the graph approximates a low-dimensional manifold or another continuous geometric structure, we view the graph embedding, F, as an approximation to a corresponding manifold embedding. The embedding and its corresponding distance are determined by the choice of kernel, which reflects geometric properties of the underlying graph or manifold. Indeed, when applying the diffusion kernel of power m (10), the corresponding distances measure the rate of connectivity between vertices according to paths of length m. The distances obtained by the inverse Laplacian represent the expected time to travel from one vertex to another vertex and then back to the original vertex (27). The LLE distance is similar to a diffusion kernel with low powers. The corresponding LLE embedding tries to preserve local distances to nearest points along the graph (see SI Text). In the SI Text, we discuss efficient numerical evaluation of the functional distances for different kernels and large N. The geodesic distances were calculated using Dijkstra's algorithm on the global GO graph (with local distances nij). Distances Between Subgraphs Representing Protein Domain Functions. Phylogenetic Similarity Between Protein Domains Based on Phylogenetic Profiles (P Score). We evaluate the phylogenetic similarity between structures by BLASTing (1) the set of nonredundant sequences found to fold into each domain against all fully sequenced genomes. The similarity between any two domains is then just the empirical mutual information, MI, between their phylogenetic profiles (15). If x and y are two phylogenetic profiles, then
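As an illustration of the P score, the empirical mutual information between two profiles can be sketched as follows. The sketch assumes that each profile is a binary presence/absence vector over the same ordered list of genomes and that MI is measured in bits (log base 2); both choices, and the function name, are assumptions made for this example.

```python
import numpy as np

def mutual_information(x, y):
    """Empirical MI (in bits) between two binary phylogenetic profiles.

    MI = sum over (a, b) of p(a, b) * log2( p(a, b) / (p(a) * p(b)) ),
    with probabilities estimated as frequencies across genomes.
    """
    x = np.asarray(x, dtype=int)
    y = np.asarray(y, dtype=int)
    if x.shape != y.shape:
        raise ValueError("profiles must cover the same genomes")
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b))
            if p_ab > 0:                       # 0 * log(0) is taken as 0
                p_a = np.mean(x == a)
                p_b = np.mean(y == b)
                mi += p_ab * np.log2(p_ab / (p_a * p_b))
    return mi

# Identical profiles share one full bit; independent profiles share none.
mi_same = mutual_information([1, 0, 1, 0], [1, 0, 1, 0])
mi_indep = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```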
Curve Fitting (Fig. 1 All curve fitting was done using Origin 7 SR1 (www.originlab.com). Exponential decay was modeled using the equation
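For readers who want to repeat this kind of fit without Origin, a single-exponential decay can be estimated with a log-linear least-squares fit. The model used below, y = A·exp(−x/t) with no offset term, is an assumption made purely for illustration; it is not taken from the paper, and the function name and synthetic data are hypothetical.

```python
import numpy as np

def fit_exponential_decay(x, y):
    """Fit y = A * exp(-x / t) by linear regression on log(y).

    Returns (A, t). Requires strictly positive y values.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("log-linear fit needs strictly positive y values")
    slope, intercept = np.polyfit(x, np.log(y), 1)   # log y = intercept + slope * x
    return np.exp(intercept), -1.0 / slope

# Synthetic, noise-free data with A = 3 and t = 2.
x = np.arange(6, dtype=float)
y = 3.0 * np.exp(-x / 2.0)
amplitude, decay_length = fit_exponential_decay(x, y)
```

A log-linear fit weights small y values more heavily than a direct nonlinear fit would, so on noisy data the two approaches can give somewhat different parameters.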
Supporting Text
Acknowledgments

We thank Mark Green and Institute for Pure and Applied Mathematics (University of California, Los Angeles) for inviting us to participate in a proteomics workshop, where we first met and started our discussion that led to this paper. G.L. thanks Ronald R. Coifman, Stephane Lafon, and Mauro Maggioni for introducing him to diffusion geometries and for forwarding him some of their papers and software. B.S. thanks Eugene Shakhnovich, Nick Grishin, Tim Reddy, and Joe Mellor for fruitful discussions and critical reading of the manuscript. G.L. is supported by National Science Foundation Grant 0612608.

Footnotes

The authors declare no conflict of interest.
This article contains supporting information online at www.pnas.org/cgi/content/full/0702965104/DC1.

References

1. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Nucleic Acids Res. 1997;25:3389–3402.
2. Dietmann S, Holm L. Nat Struct Biol. 2001;8:953–957.
3. Shindyalov IN, Bourne PE. Nucleic Acids Res. 2001;29:228–229.
4. Sauder JM, Arthur JW, Dunbrack RL Jr. Proteins. 2000;40:6–22.
5. Gerstein M, Levitt M. Protein Sci. 1998;7:445–456.
6. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al. Nat Genet. 2000;25:25–29.
7. Berriz GF, King OD, Bryant B, Sander C, Roth FP. Bioinformatics. 2003;19:2502–2504.
8. Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan NJ, Chung S, Emili A, Snyder M, Greenblatt JF, Gerstein M. Science. 2003;302:449–453.
9. Lord PW, Stevens RD, Brass A, Goble CA. Bioinformatics. 2003;19:1275–1283.
10. Coifman RR, Lafon S, Lee AB, Maggioni M, Nadler B, Warner F, Zucker SW. Proc Natl Acad Sci USA. 2005;102:7432–7437.
11. Shakhnovich BE, Max Harvey J. J Mol Biol. 2004;337:933–949.
12. Shakhnovich BE. PLoS Comput Biol. 2005;1:e9.
13. Schölkopf B, Tsuda K, Vert J-P. Kernel Methods in Computational Biology. Cambridge, MA: MIT Press; 2004.
14. Ben-Hur A, Noble WS. Bioinformatics. 2005;21(Suppl 1):i38–i46.
15. Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO. Proc Natl Acad Sci USA. 1999;96:4285–4288.
16. Dietmann S, Park J, Notredame C, Heger A, Lappe M, Holm L. Nucleic Acids Res. 2001;29:55–57.
17. Andreeva A, Howorth D, Brenner SE, Hubbard TJ, Chothia C, Murzin AG. Nucleic Acids Res. 2004;32:D226–D229.
18. Tanaka Y, Nureki O, Kurumizaka H, Fukai S, Kawaguchi S, Ikuta M, Iwahara J, Okazaki T, Yokoyama S. EMBO J. 2001;20:6612–6618.
19. Tucker-Kellogg L, Rould MA, Chambers KA, Ades SE, Sauer RT, Pabo CO. Structure (London). 1997;5:1047–1054.
20. Yang W, Steitz TA. Cell. 1995;82:193–207.
21. Graham KS, Dervan PB. J Biol Chem. 1990;265:16534–16540.
22. Ponting CP, Russell RR. Annu Rev Biophys Biomol Struct. 2002;31:45–71.
23. Mosyak L, Reshetnikova L, Goldgur Y, Delarue M, Safro MG. Nat Struct Biol. 1995;2:537–547.
24. Sugiura I, Nureki O, Ugaji-Yoshikawa Y, Kuwabara S, Shimada A, Tateno M, Lorber B, Giege R, Moras D, Yokoyama S, Konno M. Structure (London). 2000;8:197–208.
25. Chandonia JM, Hon G, Walker NS, Lo Conte L, Koehl P, Levitt M, Brenner SE. Nucleic Acids Res. 2004;32:D189–D192.
26. Holm L, Sander C. Bioinformatics. 1998;14:423–429.
27. Ham J, Lee DD, Mika S, Scholkopf B. Proceedings of the Twenty-First International Conference on Machine Learning; Menlo Park, CA: AAAI Press; 2004. pp. 47–54.
28. Roweis ST, Saul LK. Science. 2000;290:2323–2326.
29. Kondor RI, Lafferty J. Machine Learning: Proceedings of the Nineteenth International Conference (ICML); San Francisco: Morgan Kaufmann; 2002. pp. 315–322.
30. Belkin M, Niyogi P. Neural Computation. 2003;15:1373–1396.
31. Memoli F, Sapiro G. Found Comp Math. 2005;5:313–347.
32. Dubuisson MP, Jain AK. Proceedings of the 12th IAPR; Los Alamitos, CA: IEEE Comp Soc Press; 1994. pp. 566–568.
33. Koonin EV, Mushegian AR, Bork P. Trends Genet. 1996;12:334–336.