Copyright © 2007 McLaughlin et al; licensee BioMed Central Ltd.

On the detection of functionally coherent groups of protein domains with an extension to protein annotation

Department of Chemistry and Biochemistry, Center for Theoretical Biological Physics, University of California, San Diego, 9500 Gilman Drive La Jolla, CA 92093-0359, USA

Corresponding author. William A McLaughlin: wimclaug/at/ucsd.edu; Ken Chen: kchen/at/watson.wustl.edu; Tingjun Hou: tihou/at/ucsd.edu; Wei Wang: wei-wang/at/ucsd.edu

Received March 21, 2007; Accepted October 16, 2007.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background

Protein domains coordinate to perform multifaceted cellular functions, and domain combinations serve as the functional building blocks of the cell. The available methods to identify functional domain combinations are limited in their scope, e.g. to the identification of combinations falling within individual proteins or within specific regions in a translated genome. Further effort is needed to identify groups of domains that span across two or more proteins and are linked by a cooperative function. Such functional domain combinations can be useful for protein annotation.

Results

Using a new computational method, we have identified 114 groups of domains, referred to as domain assembly units (DASSEM units), in the proteome of budding yeast Saccharomyces cerevisiae. The units participate in many important cellular processes such as transcription regulation, translation initiation, and mRNA splicing. Within the units the domains were found to function in a cooperative manner, and each domain contributed to a different aspect of the unit's overall function.
The member domains of DASSEM units were found to be significantly enriched among proteins contained in transcription modules, defined as genes sharing similar expression profiles and presumably similar functions. The observation further confirmed the functional coherence of DASSEM units. The functional linkages of units were found in both functionally characterized and uncharacterized proteins, which enabled the assessment of protein function based on domain composition.

Conclusion

A new computational method was developed to identify groups of domains that are linked by a common function in the proteome of Saccharomyces cerevisiae. These groups can either lie within individual proteins or span across different proteins. We propose that the functional linkages among the domains within the DASSEM units can be used as a non-homology-based tool to annotate uncharacterized proteins.

Background

Protein domains are sequential, structural, and functional units [1]. They perform and regulate catalysis, provide structural building blocks, and/or act as interaction mediators that link together cellular pathways. Protein domains can also be combined together to perform multifaceted functions [2-6]. For example, a DNA-binding domain can be combined with a dimerization domain to allow for cooperative DNA-binding [7]; and the SH2, SH3, and kinase domains can be combined to facilitate signal transduction [8]. A protein can be better characterized by the function of its domain combination rather than the functions of its individual domains. That deduction has been corroborated by the observation that function is better conserved across multi-domain proteins than across single domain proteins [9].

There are a variety of methods available for the identification of functional domain combinations based on protein sequence information, and these methods vary in both the scope of the combinations identified and in their applications.
As examples, domain combinations can be identified by finding domain fusion pairs [10,11], prevalently co-occurring protein domains within individual protein sequences [12], densely interconnected domains within the protein domain networks [6,13], and domains that co-occur along particular stretches of the translated genome [14]. The methods can therefore be limited to the identification of combinations that occur within individual proteins, within a densely linked domain network, or within particular genomic regions (see Discussion for details). Further effort is needed to automatically identify groups of domains that perform particular cellular functions and automatically provide annotation to these groups. For the present study, a systematic method was developed to automatically identify functionally coherent groups of protein domains, referred to as domain assembly units or DASSEM units, and their corresponding functions. The method employed a soft-margin clustering technique that was guided by singular value decomposition (SVD). SVD is often used to capture the significant variance in a large dataset, and here it was used to retrieve the highly prevalent domain combinations found in an adjacency matrix of proteins versus domains. The prevalent domain combinations were clustered such that, when necessary, a domain was assigned to multiple groups in order to reflect the fact it can participate in different functions. Note that the clustering method is similar in spirit to the fuzzy k-means clustering method used for the extraction of coherent expression patterns from microarray experiments [15]. The current method was applied to the protein/domain complement of Saccharomyces cerevisiae, and 114 functionally coherent groups of domains, referred to as domain assembly units (DASSEM units), were identified. 
The functions of the units included a broad range of cellular tasks such as chromatin modification, carbohydrate transport, translation, and ubiquitin-dependent protein catabolism. Within the units, the functional linkages among the domains were demonstrated in three ways. First, there was a significant enrichment of Gene Ontology (GO) terms in proteins contributing domains to the units, which suggested that the domains were used in a functionally coherent manner. Second, the domains of DASSEM units were shown by manual review to be utilized in a rational way to facilitate particular cellular processes. Third, DASSEM units overlapped significantly with transcription modules, defined as groups of genes that share the same expression pattern under a set of cellular conditions. Such overlap further confirmed the functional coherence of DASSEM units.

We found that the functional linkages within DASSEM units can allow for the prediction of protein function based on domain composition. Since the transfer of annotation from a DASSEM unit to a protein of unknown function does not require high sequence homology between the unknown protein and an annotated one [16], the method can be regarded as a non-homology-based method for functional annotation [17,18]. Non-homology-based methods for protein functional annotation include phylogenetic profiling [19-21], chromosome proximity [22,23], text mining [24,25], domain fusion pairs analysis [11,26,27], and combinations of these predictors such as mRNA co-expression and phylogenetic profiling [28]. Databases and annotation tools such as Prolinks [29], STRING [30], and Predictome provide a consolidation of methods for predicting the function of a protein using non-homology-based methods [26]. These methods have the property that a protein of unknown function can be annotated based on its biological context, i.e. how it relates to proteins of known function [16,31].
In a conceptually similar way, the DASSEM unit provides functional annotation by placing a protein in the context of a domain combination that has been functionally characterized. DASSEM units thereby provide an additional means to predict protein function. Also, the annotation of DASSEM units can partially overcome the inaccurate or incomplete annotation of individual domains.

Results

Derivation of domain assembly units

A domain assembly unit can be viewed as a group of domains that are linked together by domain fusion events or events that cause domains that function together to be placed in the same protein. For example, if domain A is fused with domain B in one protein while domain B is fused with domain C in a second protein, domains A and C can be functionally linked. If multiple instances are found where A is fused to B and B is fused to C, the functional inference between A and C can be strengthened, and as a consequence the two proteins containing domains A and C can be found to participate in the same biological process (see below for examples). The cycle can continue with more domain fusions and lead to a larger group of functionally linked domains/proteins. These groups are required for higher order cellular functions. Shown in Figure 1 [...]
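The A-B, B-C linkage argument above can be illustrated with a small sketch. It is not the SVD-guided clustering used in the study; it only groups domains that are connected through chains of co-occurrence within proteins (plain connected components). The function name `linked_domain_groups` and the toy proteins P1 to P3 are hypothetical.

```python
from collections import defaultdict

def linked_domain_groups(protein_domains):
    """Group domains connected by chains of co-occurrence within proteins.

    Illustration only: this is a connected-components search, not the
    SVD-guided soft clustering described in the Methods.
    """
    neighbors = defaultdict(set)
    for domains in protein_domains.values():
        for d in domains:
            # Every other domain in the same protein is a direct neighbor.
            neighbors[d].update(x for x in domains if x != d)

    seen, groups = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first walk over co-occurrence links
            d = stack.pop()
            if d in group:
                continue
            group.add(d)
            stack.extend(neighbors[d] - group)
        seen |= group
        groups.append(group)
    return groups

# Hypothetical toy data: A is fused with B in P1, B with C in P2.
proteins = {"P1": {"A", "B"}, "P2": {"B", "C"}, "P3": {"D"}}
groups = linked_domain_groups(proteins)
print(groups)
```

In this toy case A and C end up in the same group although they never share a protein, and D forms a single-domain group, comparable to the single-domain units reported in the text.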
We derived 114 DASSEM units from the domain content of protein sequences within the proteome of Saccharomyces cerevisiae. Shown in Table 1 are six example units; and full lists of the units along with all their functional annotation are available online in the supplementary material [33]. The number of domains per unit ranged from 1 to 11, and the average was 3.3. Twenty six of the units contained only one domain. These domains had a high prevalence in proteins but a relatively low degree of co-occurrence with other domains. Although technically not combinations, the annotation of these domains can provide insight into their function. An example is the DASSEM unit which consists of the amino acid permease domain [PFAM:PF00324]. All 22 proteins that contributed domains to the unit contain only this one domain, which implies that the domain may not require a cooperation with any other domain or it may be too large (~500 amino acids) to be fused with other domains within a protein sequence.
The functional annotation for each DASSEM unit was automatically generated by finding the Gene Ontology (GO) terms that were enriched (p-value < 0.05) across the proteins associated with the unit (Figure 1).

Cooperative nature of domains within DASSEM units

The domains within the DASSEM units cooperate with one another in order to achieve a particular function, which was made apparent through an examination of the functional roles of proteins associated with the units. Three example DASSEM units are described in the following in order to show how such cooperation may occur.

The first example, DASSEM 1 in Table 1, participates in the cell cycle and consists of three domains: the fork head associated (FHA) domain [PFAM:PF00498], the fork head domain [PFAM:PF00250], and the protein kinase domain [PFAM:PF00069]. The proteins associated with the unit are listed in Figure 1. With regard to the cooperation of the domains of the unit, they have been modeled to control the cell cycle progression through the G2/M phase [43]. Specifically, the model proposed that the phosphorylation of the protein Ndd1 by Cbl kinase allows it to bind to the FHA domain of the protein Fkh2. The binding of Ndd1 facilitates an active transcriptional complex formed between Ndd1, Fkh2 (which also contains the fork head DNA-binding domain), and the protein MCM1. The model has been confirmed in a detailed study using phosphorylation, binding, and transcription assays [44]. Note that as shown in Figure 1 [...]

A second DASSEM unit example is involved in the process of transcription regulation (p-value 8.13 × 10^-17) and consists of four domains: the Fungal Zn(2)-Cys(6) binuclear cluster domain [PFAM:PF00172], the Fungal specific transcription factor domain [PFAM:PF04082], the Gal4-like dimerization domain [PFAM:PF03902], and the PAS domain [PFAM:PF00989]. The first three domains are the components of the GAL4 family transcription factors [45].
The binuclear cluster domain is relatively small and binds zinc to provide high structural stability for DNA-recognition [46]. The transcription factor domain binds DNA in a sequence specific manner [47], and the dimerization domain provides an interface for dimerization via a leucine zipper so that two proteins can bind to DNA cooperatively [7]. The PAS domain is involved in sensing stimuli such as the redox state of the cell [48], and can regulate transcription factor activity by facilitating dimerization [49]. The DASSEM unit implies a model of how the PAS domain can regulate transcription: a conformational change in the PAS domain induced by a change in the redox state of the cell can allow the PAS domain to facilitate dimerization of GAL4-like transcription factors, which promotes the transcription of target genes.

A third example DASSEM unit combines domains of the ABC transporter domain cassette with the 4Fe-4S binding domain [PFAM:PF00037] and the metal-binding RNase L inhibitor domain [PFAM:PF04068]. The domains of the ABC transporter cassette hydrolyze ATP to facilitate the active transport of allocrites (ions or small molecules) against their concentration gradient through cellular membranes [50]. When the cassette is used in conjunction with the 4Fe-4S binding domain, it transports iron into the cell for the assembly of the 4Fe-4S cluster within the domain [51]. One of the functions of the 4Fe-4S cluster domain is to detect oxidatively damaged DNA [52,53], and when it is combined with the RNase L inhibitor domain, a role in DNA/RNA metabolism has been proposed [54]. The DASSEM unit pieces together a mechanism of iron transport and DNA repair: iron transport by an ABC transporter cassette allows for the assembly of the 4Fe-4S cluster, which in turn lends DNA-binding capability to proteins involved in oxidative repair of DNA.
DASSEM units are utilized in transcription modules

To further demonstrate the functional linkages among the domains within the DASSEM units, the units were shown to be utilized within transcription modules, defined as groups of genes that share the same expression pattern under a particular set of conditions and presumably have coherent functions [55,56]. For each of the 86 transcription modules defined by Ihmels et al., we first identified DASSEM units that contained domains also present in the module. A Venn diagram shows an example transcription module with the DASSEM units that had the highest overlap scores (Figure 2).
To further demonstrate the overlap of the DASSEM units with the transcription modules, the DASSEM units with the highest overlap scores with the transcription modules were examined. The distribution of the overlap scores for the highest overlapping DASSEM units with the modules is shown in Figure 3 [...]
The difference in the overlap between the original versus the randomized modules shown in Figure 3 [...]
As illustrated in Figure 1, [...]

To further validate the functional coherence of the DASSEM units, a comparison of the GO term enrichment for the DASSEM units, the transcription modules and the randomized modules was made. When all GO term categories were considered, the number of terms found below a p-value threshold of 0.05 was 2295 for the DASSEM units (median p-value 1.03 × 10^-4), 3346 for the transcription modules (median p-value 9.62 × 10^-5) and 240 for the randomized modules (median p-value 1.2 × 10^-2). The plot shown in Figure 5 [...]
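To make the notion of a GO term being "enriched" at a p-value threshold concrete, here is a minimal sketch of one common way to compute such a p-value: the upper tail of a hypergeometric distribution. The text does not state which statistical test was used, so the choice of test, the function name `enrichment_p_value`, and the example counts are all assumptions.

```python
from math import comb

def enrichment_p_value(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated proteins in a sample.

    population: proteins in the background set
    annotated:  background proteins carrying the GO term
    sample:     proteins associated with the unit (or module)
    hits:       sampled proteins carrying the GO term

    Assumption: a hypergeometric upper tail; the source does not name
    the test it used.
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Hypothetical counts: 10 of 100 background proteins carry the term,
# and 3 of the 5 proteins in a unit carry it.
p = enrichment_p_value(population=100, annotated=10, sample=5, hits=3)
print(p < 0.05)
```

A term would then be reported for a unit when this p-value falls below the chosen threshold (0.05 in the text). Correction for testing many terms is not shown here.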
DASSEM units can be used to annotate proteins of unknown function

Given that the domains within DASSEM units function together, we deduced that they would be useful for the annotation of uncharacterized proteins and for the prediction of new functions of proteins (see Figure 6). In the following we manually review five example proteins and their functional predictions. For the examples, the predictions were corroborated by evidence in the literature and by the identification of similar functional annotation for their interaction partners [58-60]. Note that the interaction partners were chosen such that they did not contain domains of the DASSEM unit. If they did, they may have been used to derive the unit, and there would be a circular argument. The analysis therefore provides an independent means of verifying the predicted annotation.

One example of the use of DASSEM units for annotation is the putative gene YBR025C. The protein product of the gene has two domains, MMR_HSR1 [PFAM:PF01926] and DUF933 [PFAM:PF06071], and contributes domains to the fourth DASSEM unit listed in Table 1. That unit is annotated as having GTPase activity, being involved in the process of ribosome-nucleus export, and localizing to the mitochondrion. Evidence that the YBR025C protein has these functions comes from its interaction partners Sen15, Dbp8, Rrp4 and Nup116. Sen15 is localized both to the nuclear membrane and to the mitochondrion [61,62], Dbp8 is involved in ribosome biogenesis [63], Rrp40 functions in ribosome assembly [64], and Nup116 is a subunit of the nuclear pore complex that allows for energy-dependent rRNA export from the nucleus [65].

The protein Fun30/YAL019W contains the Helicase C domain [PFAM:PF00271] and the SNF2_N domain [PFAM:PF00176]. Its associated unit (DASSEM unit 5 in Table 1) has the function of ATPase activity, is involved in the process of chromosome organization and biogenesis, and is located in the chromatin remodeling complex.
The protein has no known function based on the SGD database [42]. Evidence that Fun30 functions in the chromatin remodeling complex and has ATPase activity is that it has partial homology (35% sequence identity) to the protein Snf2, which is the catalytic subunit of the chromosome remodeling complex that has ATPase activity [66,67]. An interaction partner of Fun30 is the origin replication complex ORC5, the protein complex that initiates replication and is involved in chromatin silencing [68]. Fun30 also has a genetic interaction with Swc3, a component of the chromatin remodeling complex SWR1 [67].

A third example is the protein Stb4/YMR019W, which contains the fungal specific transcription factor domain [PFAM:PF04082] and the Zn(2)-Cys(6) binuclear cluster domain [PFAM:PF00172]. According to its associated DASSEM unit (the second unit in Table 1), the protein is a putative transcriptional regulator. The GO term annotation for the process category in SGD is biological process unknown. The Stb4 protein activates transcription in a two hybrid assay without fusion to the Gal4p activation domain [69]. Its interaction partners include TAF4 [70], a subunit of the TFIID protein involved in RNA polymerase II transcription initiation [71], and Sin3, which is part of a histone deacetylase complex that regulates transcription [69,72]. The evidence from the literature suggests that Stb4 is a transcription factor, which corroborates the annotation made by the DASSEM unit.

A fourth example is Dug2, which is implicated in small nuclear RNA binding by a DASSEM unit. Such binding is corroborated by the fact that one of its interaction partners is Utp15, a small nuclear RNA binding protein [73,74].

The fifth example is the ORF YHL010C which, according to a DASSEM unit, is involved in ubiquitin-dependent protein catabolism.
Additional evidence that the protein product of YHL010C has that function comes from the fact that it has remote homology (30% sequence identity) to the human protein Brap2, a ubiquitin E3 ligase [75].

Discussion

From the analysis of the functional linkages of DASSEM units and their utilization in transcription modules, a hierarchy of domain function was apparent. Domains combine to form functional units defined as the DASSEM units, and the DASSEM units are utilized together within transcription modules. Since the domains of a DASSEM unit can be contained within different proteins in a transcription module, DASSEM units represent a level of functional domain organization that goes beyond individual proteins. That level is necessary to provide more comprehensive functions since individual proteins are limited to approximately five domains [76].

The utilization of DASSEM units within transcription modules is in agreement with the results of domain fusion pair analysis [27], and extends those results. In the initial studies of domain fusion on the genomic scale [10,11], a functional linkage between two domains in separate proteins was implied when the two domains are contained within the same protein in a different species. The DASSEM units can be viewed as extended groups of domains linked by successive domain fusions that occurred throughout the evolutionary lineage of a single species. Further, Marcotte et al. deduced that one reason for domain fusion was to reduce the entropy of physical dissociation, and thereby increase a functional association [11]. In a similar way, a reduction in the entropy of physical dissociation of domains in DASSEM units may increase their functional productivity.

The domain combinations of DASSEM units are unique in different respects, and in the following some example methods for deriving domain combinations are discussed for comparison.
One method is to identify domain pairs or triplets enriched across different proteins, referred to as supradomains [12]. DASSEM units correspond to some of these supradomains, but can extend the combinations to include more functionally linked domains. For example, one supradomain contains the translation domain and the P-loop containing nucleotide triphosphate hydrolase domain [3]. These domains correspond to the Pfam domains GTP_EFTU protein synthesis factor domain [PFAM:PF00009] and GTP_EFTU_D2 elongation factor TU domain 2 [PFAM:PF03144] that are part of the sixth DASSEM unit listed in Table 1. In total that DASSEM unit consists of seven domains, of which six are contained within the proteins IF-2, EF-TU, and EF-G. These three proteins all bind to the same site on the ribosome [77]. Also, the LepA domain recently was shown to bind to the same site on the ribosome and be involved in back translation [78]. All of the domains of the DASSEM unit have now been demonstrated to form a functional group that binds to the same site on the ribosome. The DASSEM unit was able to find more of that functional group since it is not confined to domain combinations within individual proteins, while all the domains of supradomains must lie within an individual protein sequence.

A second method to find domain combinations is through microsyntenies of domains, i.e. their co-occurrence in small regions of the genome, found through a comparison of multiple genomes [14]. It was indicated that these combinations, referred to as domain teams, are suited for the analysis of prokaryotic genomes rather than eukaryotic genomes. Further, they require multiple genomes to extract a functionally relevant group of domains.

A third method to find domain combinations is through the study of domain networks. Each node in a domain network represents a domain, and an edge exists if two domains co-occur within a protein [6,13].
The groups of domains represented by the DASSEM units partially coincide with some of the clusters of domains within domain networks, but there are differences due to the way they were derived. The clusters from domain networks rely on there being a relatively large clustering coefficient, i.e. dense connections between domains within the group and/or the same interaction network being present in multiple species. In contrast, the present method delineates domain combinations found within a single species and does not require a relatively high number of links from each domain to the rest of the domains in the group. An example DASSEM unit was described in Figure 1.

A concurrent study used The Discovery of All Significant Substructure (DASS) algorithm to find combinations in biological data [79]. The algorithm may be able to identify functionally linked domain combinations which are similar to DASSEM units, depending on the level of significance chosen to ensure functional linkage of all the domains in the group. Currently, the algorithm has been applied to find domain combinations containing the SH2 and PDZ domains [79].

There are a number of limitations of the current analysis. For example, the ability of DASSEM units to predict the function of individual proteins was manually assessed and requires further validation. That validation is complicated by the fact that the function of DASSEM units can be distributed across more than one protein and may not yet be annotated at the individual protein level. In addition, for the current study a single proteome, Saccharomyces cerevisiae, was analyzed, and that may limit the number of the functional groups identified. However, the use of data only from S. cerevisiae ensured that relevant domain groups were found and that these groups were not contaminated by outside domains.

Conclusion

A method was developed to identify groups of functionally linked domains.
Knowledge of the groups furthers our understanding of what domains are utilized in a cooperative manner to perform a variety of cellular tasks. The groups can also provide a means to annotate uncharacterized proteins.

Methods

Data collection

The protein sequences and open reading frame (ORF) designations for proteins in Saccharomyces cerevisiae were retrieved from the UniProt database [80], which consisted of Swiss-Prot Release 47.6 and TrEMBL Release 30.6, both having the time stamp of 02-Aug-2005. These flat files were inserted into a MySQL database using a BioPerl module (Bio::SeqIO::swiss) [81]. An additional table contained the Pfam annotations downloaded from the Pfam resource [82]. An adjacency matrix, which is referred to as the A matrix, was then created consisting of proteins versus domains: n = 3,781 proteins by m = 1,753 domains. Within the matrix, if a protein contained one or more copies of a domain, the corresponding matrix entry was one; otherwise it was zero. Each row vector of the A matrix lists the domains contained by a protein and each column vector of the A matrix lists the proteins in which a domain is present.

Identification of domain combinations by a SVD guided clustering method

Each domain can be considered as a vector in the protein space (each dimension is a protein) and the coordinates of a domain in this space are decided by its occurrence in each protein. A domain combination, referred to as a domain assembly unit (DASSEM unit), is a cluster of domains in the protein space. Because of the large number of proteins, identification of domain clusters in such a high-dimension space is a challenge. We employed a soft-margin clustering method that was guided by singular value decomposition (SVD) [83]. SVD is a widely used spectral analysis technique to capture the significant variance of the data.
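The binary protein-by-domain A matrix and the view of a domain as a vector in protein space can be sketched as follows. This is a toy illustration, not the authors' Matlab or MySQL pipeline: the helper names `build_adjacency` and `domain_vector` are hypothetical, and the three proteins are a small subset taken from the annotation examples in the Results (YAL019W, YBR025C, YMR019W with the Pfam domains listed there).

```python
def build_adjacency(protein_domains):
    """Return (proteins, domains, A) with A[i][j] = 1 if protein i has domain j.

    Multiple copies of a domain in one protein still give an entry of one,
    matching the definition of the A matrix in the text.
    """
    proteins = sorted(protein_domains)
    domains = sorted({d for ds in protein_domains.values() for d in ds})
    column = {d: j for j, d in enumerate(domains)}
    A = [[0] * len(domains) for _ in proteins]
    for i, p in enumerate(proteins):
        for d in protein_domains[p]:
            A[i][column[d]] = 1
    return proteins, domains, A

def domain_vector(A, j):
    """Column j of A: the coordinates of one domain in protein space."""
    return [row[j] for row in A]

# Toy subset of the proteome, using domain assignments quoted in the Results.
toy = {
    "YAL019W": ["PF00271", "PF00176"],   # Fun30
    "YBR025C": ["PF01926", "PF06071"],
    "YMR019W": ["PF04082", "PF00172"],   # Stb4
}
proteins, domains, A = build_adjacency(toy)
for name, row in zip(proteins, A):
    print(name, row)
print(domains[0], domain_vector(A, 0))
```

Rows and columns are sorted only to make the layout reproducible; the ordering carries no biological meaning.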
The soft-margin clustering method was used so that each domain was assigned, if necessary, to multiple DASSEM units to reflect the fact that a domain may participate in different domain combinations to deliver different functions. SVD was performed on the A matrix using Matlab,

$$A = U S V^{T}$$
where $U$ (an $n$ by $n$ matrix) and $V^{T}$ (an $m$ by $m$ matrix; the superscript $T$ denotes the transposed matrix) are orthonormal, and $S$ is an $n$ by $m$ diagonal matrix containing the singular values $S_{11} > S_{22} > \dots > S_{mm}$, with $n > m$. Analogous to the SVD analyses on gene expression microarray data [84,85], the $i$th row of $V^{T}$, [...]

The proteins were clustered in the eigendomain space using the K-means algorithm, and a Gaussian mixture model was built upon these initial clusters in order to assign proteins to multiple clusters [86]. For the K-means step, we initialized the centroids of the clusters as follows. Proteins were first clustered using only the first two eigendomains, [...]

After the K-means clustering step, each protein was assigned to only one cluster. Since domains may participate in multiple combinations, it was important to allow flexible assignment of proteins, and their corresponding domains, to multiple clusters. We modeled each cluster obtained from K-means using a Gaussian distribution, centered at the centroid of the cluster. The variance of each Gaussian distribution was determined by maximizing the likelihood of the data, [...] where $K$ is the number of clusters, $n$ is the number of proteins considered in the analysis, [...]

Overlap between DASSEM units and transcription modules

To show how DASSEM units were utilized, the content of the units was compared to that found in groups of proteins in transcription modules. From a study by Ihmels et al., 86 transcription modules were obtained [55,56]. An overlap score, OS, was calculated for each DASSEM unit that contained domains that overlapped with a given transcription module.
The overlap score was the fraction of domains within a DASSEM unit that are in common with the transcription module multiplied by the fraction of domains within the transcription module that are in common with the DASSEM unit:

$$OS = \frac{N_c}{N_D} \times \frac{N_c}{N_M}$$

where $N_c$ is the number of domains that are in common between the DASSEM unit and the module, $N_M$ is the total number of domains in the module, and $N_D$ is the number of domains in the DASSEM unit. The overlap score was used to provide an overall measure of the degree of functional utilization of a DASSEM unit within a transcription module. It considers the degree to which the function of the DASSEM unit is utilized and the degree to which the function of the unit contributes to the module.

DASSEM units were ranked according to their overlap scores for each module. From the highest ranking DASSEM units, four DASSEM unit collections were generated for each module by combining the domains in units ranked one and two, units ranked one through three, units ranked one through four, and units ranked one through five. The cumulative overlap score of each collection with each module was calculated. The process of finding overlapping DASSEM units and collections was done for each of the 86 co-expression modules.

A randomization protocol was employed to compare the average overlap scores for the highest overlapping DASSEM units with the original modules versus random sets of proteins. The random sets of proteins were created by randomly redistributing the proteins among the transcription modules while keeping the number of proteins in each module constant. Since the proteins were randomized, the domains within a given protein remained together. The statistical significance of the overlap scores was estimated by the p-values of Student's t-tests comparing the average overlap scores of DASSEM units with the original transcription modules versus the random sets of proteins. The other control analyses are fully explained in the Results section.
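The overlap score, the ranking of units, the cumulative score of a collection, and the module randomization described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the toy unit and module contents, and the use of a seeded `random.Random` are hypothetical, and the Student's t-test step is not shown.

```python
import random

def overlap_score(unit_domains, module_domains):
    """OS = (N_c / N_D) * (N_c / N_M), as defined in the text."""
    unit, module = set(unit_domains), set(module_domains)
    if not unit or not module:
        return 0.0
    n_c = len(unit & module)
    return (n_c / len(unit)) * (n_c / len(module))

def rank_units(units, module_domains):
    """Units with a non-zero overlap, highest score first (ties by name)."""
    scored = [(overlap_score(d, module_domains), name) for name, d in units.items()]
    return sorted((s for s in scored if s[0] > 0), key=lambda t: (-t[0], t[1]))

def collection_score(ranked, units, module_domains, k):
    """Cumulative overlap score of the union of the k top-ranked units."""
    combined = set()
    for _, name in ranked[:k]:
        combined |= set(units[name])
    return overlap_score(combined, module_domains)

def shuffle_modules(modules, rng):
    """Redistribute proteins among modules, keeping each module's size.

    Whole proteins are moved, so the domains of a protein stay together.
    """
    names = sorted(modules)
    pool = [p for name in names for p in modules[name]]
    rng.shuffle(pool)
    shuffled, start = {}, 0
    for name in names:
        size = len(modules[name])
        shuffled[name] = pool[start:start + size]
        start += size
    return shuffled

# Hypothetical toy data.
units = {"unit_1": {"A", "B", "C", "D"}, "unit_2": {"E", "F"}, "unit_3": {"X"}}
module = {"A", "B", "E"}
ranked = rank_units(units, module)
top2 = collection_score(ranked, units, module, k=2)
random_modules = shuffle_modules({"m1": ["p1", "p2"], "m2": ["p3"]}, random.Random(0))
print(ranked, top2, random_modules)
```

For unit_1, two of its four domains are shared with the three-domain module, so OS = (2/4) × (2/3) = 1/3. The product penalizes both a large unit that only partly falls inside the module and a small unit that covers only a small part of the module.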
Authors' contributions

WAM participated in the design of the study, performed data analysis, interpreted the results and wrote the manuscript. KC participated in the design of the study and wrote the Matlab computer code used for clustering. TH participated in the design of the study and performed data analysis to test the utility of the clustering algorithm. WW participated in the design of the study, interpreted the results and wrote the manuscript. All authors read and approved the final manuscript.

Acknowledgements

We thank Jan Ihmels for kindly providing lists of proteins with the same expression pattern. WAM is supported by a NIH training grant (5 T32DK07233). TH is supported by a postdoctoral scholarship from the NSF PFC-sponsored Center for Theoretical Biological Physics (Grants No. PHY-0216576 and PHY-0225630) at UCSD. We thank Robert Shoemaker, Li Shen, Xiaolong Yu, and Jie Liu for helpful discussions.
[Nat Genet. 2002]Bioinformatics. 2004 Sep 1; 20(13):1993-2003.
[Bioinformatics. 2004]Bioinformatics. 2005 Apr 15; 21(8):1592-5.
[Bioinformatics. 2005]Bioinformatics. 2004 Dec 12; 20(18):3710-5.
[Bioinformatics. 2004]Nucleic Acids Res. 2004; 32(21):6414-24.
[Nucleic Acids Res. 2004]Proc Natl Acad Sci U S A. 2004 Mar 2; 101(9):2888-93.
[Proc Natl Acad Sci U S A. 2004]J Biol Chem. 1998 May 22; 273(21):12685-8.
[J Biol Chem. 1998]Mol Biol Cell. 2003 Aug; 14(8):3266-79.
[Mol Biol Cell. 2003]RNA. 2001 Sep; 7(9):1317-34.
[RNA. 2001]Yeast. 2004 Apr 30; 21(6):463-71.
[Yeast. 2004]RNA. 1998 Apr; 4(4):351-64.
[RNA. 1998]Nucleic Acids Res. 2004 Jan 1; 32(Database issue):D311-4.
[Nucleic Acids Res. 2004]Nucleic Acids Res. 1999 Aug 1; 27(15):3001-8.
[Nucleic Acids Res. 1999]Mol Cell. 2003 Dec; 12(6):1565-76.
[Mol Cell. 2003]Genes Dev. 2002 Mar 15; 16(6):659-72.
[Genes Dev. 2002]Mol Gen Genet. 1997 Oct; 256(4):376-86.
[Mol Gen Genet. 1997]Mol Cell Biol. 2002 Jul; 22(13):4723-38.
[Mol Cell Biol. 2002]J Biol Chem. 2000 May 5; 275(18):13895-900.
[J Biol Chem. 2000]Curr Genet. 2005 Jan; 47(1):1-17.
[Curr Genet. 2005]Nature. 2002 Jun 27; 417(6892):967-70.
[Nature. 2002]Nature. 2002 Jan 10; 415(6868):180-3.
[Nature. 2002]Nature. 2004 Jan 15; 427(6971):256-60.
[Nature. 2004]Genome Res. 1999 Jan; 9(1):17-26.
[Genome Res. 1999]Genome Biol. 2001; 2(9):RESEARCH0034.
[Genome Biol. 2001]Nature. 1999 Nov 4; 402(6757):86-90.
[Nature. 1999]Science. 1999 Jul 30; 285(5428):751-3.
[Science. 1999]J Mol Biol. 2004 Feb 20; 336(3):809-23.
[J Mol Biol. 2004]Science. 2003 Jun 13; 300(5626):1701-3.
[Science. 2003]Biol Chem. 2000 May-Jun; 381(5-6):377-87.
[Biol Chem. 2000]Cell. 2006 Nov 17; 127(4):721-33.
[Cell. 2006]Genome Res. 2005 Jun; 15(6):867-74.
[Genome Res. 2005]Genome Res. 2004 Mar; 14(3):343-53.
[Genome Res. 2004]BMC Evol Biol. 2005 Mar 23; 5(1):24.
[BMC Evol Biol. 2005]Curr Opin Struct Biol. 2004 Apr; 14(2):208-16.
[Curr Opin Struct Biol. 2004]Bioinformatics. 2007 Jan 1; 23(1):77-83.
[Bioinformatics. 2007]Nucleic Acids Res. 2004 Jan 1; 32(Database issue):D115-9.
[Nucleic Acids Res. 2004]Genome Res. 2002 Oct; 12(10):1611-8.
[Genome Res. 2002]Nucleic Acids Res. 2004 Jan 1; 32(Database issue):D138-41.
[Nucleic Acids Res. 2004]Proc Natl Acad Sci U S A. 2000 Jul 18; 97(15):8409-14.
[Proc Natl Acad Sci U S A. 2000]Proc Natl Acad Sci U S A. 2000 Aug 29; 97(18):10101-6.
[Proc Natl Acad Sci U S A. 2000]Bioinformatics. 2004 Dec 12; 20(18):3710-5.
[Bioinformatics. 2004]Nat Genet. 2002 Aug; 31(4):370-7.
[Nat Genet. 2002]Bioinformatics. 2004 Sep 1; 20(13):1993-2003.
[Bioinformatics. 2004]