Copyright © The Author 2008. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org

Context-dependent DNA recognition code for C2H2 zinc-finger transcription factors

Department of Genetics, Washington University School of Medicine, 660 S Euclid, Box 8232, St. Louis, MO 63110, USA
*To whom correspondence should be addressed.
Associate Editor: Ivo Hofacker
Received March 2, 2008; Revised June 4, 2008; Accepted June 24, 2008.

Abstract

Motivation: Modeling and identifying the DNA-protein recognition code is one of the most challenging problems in computational biology. Several quantitative methods have been developed to model DNA-protein interactions with specific focus on the C2H2 zinc-finger proteins, the largest transcription factor family in eukaryotic genomes. In many cases, they performed well, but the overall predictive accuracy of these methods is still limited. One of the major reasons is that all these methods used weight matrix models to represent DNA-protein interactions, assuming that all base-amino acid contacts contribute independently to the total free energy of binding.

Results: We present a context-dependent model for DNA–zinc-finger protein interactions that allows us to identify inter-positional dependencies in the DNA recognition code for C2H2 zinc-finger proteins. The degree of non-independence was detected by comparing the linear perceptron model with the non-linear neural net (NN) model for their predictions of DNA–zinc-finger protein interactions. This dependency is supported by the complex base-amino acid contacts observed in DNA–zinc-finger interactions from structural analyses.
Using extensive published qualitative and quantitative experimental data, we demonstrated that the context-dependent model developed in this study can significantly improve predictions of DNA binding profiles and free energies of binding for both individual zinc fingers and proteins with multiple zinc fingers when compared to previous position-independent models. This approach can be extended to other protein families with complex base-amino acid residue interactions, which would help to further understand transcriptional regulation in eukaryotic genomes.

Availability: The software is implemented as C programs and is available by request. http://ural.wustl.edu/softwares.html

Contact: stormo/at/ural.wustl.edu

1 INTRODUCTION

The specific interaction between transcription factors and their cognate DNA sites is critical for regulation of gene expression in cells. Identifying the rules that govern the relationship between the amino acid sequence of a transcription factor (TF) and its binding site specificity would be of great utility in molecular biology and has been sought after for many years (Pabo and Sauer, 1984; Seeman et al., 1976). However, unraveling the recognition code that specifies the amino acid-base interactions remains a very challenging problem. Early studies primarily tried to deduce a qualitative binding code from the solved crystal structures of DNA-protein complexes, but it soon became clear that there is no simple, universal recognition code (Matthews, 1988). More recently, several groups have developed methods to infer quantitative codes, where the goal is to model the binding energies to many different DNA sequences based on the protein sequence (Benos et al., 2002; Kaplan et al., 2005; Kono and Sarai, 1999; Mandel-Gutfreund and Margalit, 1998; Suzuki and Yagi, 1994).
Mandel-Gutfreund and Margalit used the data from 53 co-crystals and a simple log-odds scoring system to generate the base-amino acid interaction weight matrix model (Mandel-Gutfreund and Margalit, 1998), while Kono and Sarai derived pairwise potentials between base and amino acid by a statistical analysis of 52 protein-DNA complex structures (Kono and Sarai, 1999). Both studies assumed similar base-amino acid preferences for all proteins and at all binding positions. However, structural analysis of protein-DNA complexes clearly showed that these two assumptions are oversimplified (Choo and Klug, 1997; Luscombe et al., 2000). Suzuki and Yagi developed a model that took into account both position-specific interactions and the DNA binding geometries of proteins that belong to different protein families (Suzuki and Yagi, 1994), but with limited structural data and a simple, empirical scoring system this approach still has limited accuracy.

An alternative approach is to learn the recognition code from extensive in vitro selection data. Two groups have developed sophisticated statistical methods (Benos et al., 2002; Kaplan et al., 2005) to model DNA-protein interactions with specific focus on a single protein family, the C2H2 zinc-finger proteins. Based on statistical mechanics theory, Benos et al. developed an algorithm to estimate the probabilistic code for zinc-finger proteins (Benos et al., 2002). Kaplan et al. employed the expectation maximization (EM) algorithm to optimize the model for DNA–zinc-finger interactions (Kaplan et al., 2005). Both methods significantly improved the predictions of DNA-protein interactions compared to previous methods. In many cases, they can accurately predict DNA binding sites for given proteins. However, the overall accuracy of their predictions is still limited for at least two reasons. One is that there are limited data upon which to infer the model parameters.
And the other is that both methods assumed positional independence for DNA–zinc-finger protein interactions. Methods based on the independence assumption are simple, with a small number of parameters, making them easy to implement, but their predictions are limited by the degree of validity of this assumption. Benos et al. have shown that positional independence can be a reasonably good approximation for the DNA sites based on their analysis of a large set of affinity data for five zinc-finger proteins (Benos et al., 2002), but several studies have indicated that positional correlations do exist among the zinc-finger protein residue positions (Elrod-Erickson and Pabo, 1999; Michael Gromiha et al., 2004; Miller and Pabo, 2001), or over the base-amino acid contact positions (Liu and Stormo, 2005). Because the assumption of positional independence is likely to be oversimplified, we developed the context-dependent models for DNA–zinc-finger protein interactions described in this work.

One approach that takes the positional dependency into account is to extend the position-independent weight matrix model by adding extra parameters to capture interactions between positions (Barash et al., 2003; Zhou and Liu, 2004). However, considering only the dependencies between adjacent amino acid residues requires nearly 8000 additional parameters for just a single zinc finger. Such an approach is not currently possible because of the limited experimental data. We propose to overcome this difficulty by using the non-linear neural net (NN) model to represent DNA–zinc-finger interactions.
NNs are structured computational models with a long history in pattern recognition that have been extensively used in biology for such tasks as identifying signal peptides (Bendtsen et al., 2004), predicting protein secondary structure (Qian and Sejnowski, 1988), characterizing the yeast transcriptional network (Hart et al., 2006) and analyzing DNA-binding proteins and their binding residues (Ahmad et al., 2004). When comparing models for binary predictions, we found that the non-linear NN models significantly outperformed the linear perceptron model (equivalent to a weight matrix), suggesting that positional dependency is involved in DNA–zinc-finger interactions. The structures of DNA–zinc-finger protein complexes include a set of non-canonical zinc fingers that differ from the simple set of interactions used for the position-independent model and probably contribute to the limited accuracy of its predictions. Using the NN model, we can predict DNA binding profiles for any given C2H2 zinc-finger protein. By comparing our predictions with a large collection of published experimental data and with those predicted by previous methods (Benos et al., 2002; Kaplan et al., 2005), we demonstrate that integrating positional dependency into the modeling of DNA–zinc-finger interactions can significantly improve predictive performance.

The C2H2 zinc-finger proteins form the largest TF family in all completely sequenced eukaryotic model genomes. For instance, about 30% of all TFs in the human genome are C2H2 zinc-finger proteins (Messina et al., 2004). Various zinc-finger proteins have been demonstrated to play essential roles in regulating different biological processes, including cell growth, differentiation, development and tumorigenesis, through selective binding to particular DNA sites in the genome (Wolfe et al., 2000, 2001).
An improvement in the recognition code for zinc-finger proteins will enhance our ability to identify target genes for specific zinc-finger TFs and our modeling of the regulation of gene expression in eukaryotes.

2 METHODS

2.1 Datasets

The datasets used in this study include positive interaction data and negative non-interaction data. The interaction data were initially collected by Benos et al. (2002) from the published in vitro selection experiments for variants of the EGR proteins. There are a total of 1033 instances where each interaction pair contains a 10 bp long DNA site and the amino acid residues for three recognition helices in EGR proteins. With these raw data and the DNA binding model as shown in Figure 1
2.2 Sequence representation and transformation

For a binding site of length L, each DNA base N in the target site N1..NL is encoded with 4 binary digits, a = (0001), c = (0010), g = (0100) and t = (1000), while each amino acid residue A in the interacting amino acid residues, A1..AL, is represented in a similar way using the corresponding 20 binary digits. Each amino acid-base pair, AN1..ANL, is likewise represented with 80 binary digits, with a single 1 and the rest 0.

2.3 Modeling base-amino acid residue preferences for DNA–zinc-finger interactions

With the canonical binding model for zinc fingers as shown in Figure 1, the encoded amino acid-base pairs AN_i, for 1 ≤ i ≤ L, make up the sequence vector S_AN, which forms the input vector for the network model Λ. There is a weight vector, W, that assigns a weight to each element of the input vector. The output of the single output unit of the network, o(S_AN | W, Λ), for the given weights W of the network model Λ, is computed through a feed forward step with the sigmoid function:

o(S_AN | W, Λ) = 1 / (1 + exp(−W · S_AN))
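The encoding and the feed forward step described above can be sketched as follows. This is a minimal illustration rather than the ZifNet implementation: the amino acid ordering, the function names and the position of the 1 within each 80-digit block are assumptions made for the example.

```python
import math

BASES = "acgt"
# The ordering of the 20 amino acids is an assumption; any fixed order works.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def encode_pair(amino_acid, base):
    """Return an 80-element one-hot vector for one amino acid-base pair."""
    vec = [0] * (len(AMINO_ACIDS) * len(BASES))
    index = AMINO_ACIDS.index(amino_acid) * len(BASES) + BASES.index(base.lower())
    vec[index] = 1
    return vec

def encode_site(amino_acids, bases):
    """Concatenate the pair encodings for contact positions 1..L."""
    if len(amino_acids) != len(bases):
        raise ValueError("need one amino acid per base position")
    vec = []
    for aa, b in zip(amino_acids, bases):
        vec.extend(encode_pair(aa, b))
    return vec

def perceptron_output(inputs, weights):
    """Sigmoid of the weighted sum of the inputs (single output unit)."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-net))

# Three contact positions give a 240-element input with exactly three 1s.
s_an = encode_site("RER", "gcg")
print(len(s_an), sum(s_an), perceptron_output(s_an, [0.0] * len(s_an)))
```

Because each block contains a single 1, the weighted sum simply adds one weight per contact position, which is why the perceptron is equivalent to a weight matrix.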
To consider the context dependence in DNA–zinc-finger protein interactions, we employed a two-layer neural network to model the DNA–zinc-finger interaction. While keeping the same structure for both the input layer and the output layer as in the perceptron model, we added a hidden layer with a varied number of hidden units between them to capture the position-dependent interactions that were not captured by the perceptron model. We now have weights between the input vector and each of the j hidden nodes, and from each hidden node to the output node. The outputs for each hidden node, and for the final output node, are computed using the same scoring procedure. The model for DNA–zinc-finger protein interactions is optimized by minimizing the sum of squared errors (E²) between the target value and the computed network output for all training examples using the backpropagation algorithm (Mitchell, 1997; Rumelhart, 1994). The program package ZifNet used to build the perceptron and neural network models was written in the C programming language and is available upon request.

2.4 Cross validation to estimate model parameters

We used a cross validation procedure to optimize our models while the predictive performances were examined simultaneously. Each dataset consisting of both positive and negative data was randomly partitioned into three parts. 80% of the dataset was used to train the network model, 80% of the remaining dataset was used as the validation set to monitor an appropriate stopping point for gradient descent, and the remaining data was used to measure the prediction performances. The randomized partitions were repeated six times. The average of their predictive performances was used to assess the model performance. The predictive performance was measured with accuracy, sensitivity and specificity with the formulae shown below. We used the network output values 0.81 and 0.11 as stringent cutoffs for positive and negative pairs, respectively. Network outputs between those values are always considered false predictions.
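The partitioning and scoring scheme above can be sketched as follows. The sketch assumes the standard definitions of accuracy, sensitivity and specificity, treats the cutoffs as inclusive, and uses hypothetical function names; none of these details are confirmed by the text.

```python
import random

POS_CUTOFF = 0.81  # outputs at or above this count as predicted binding
NEG_CUTOFF = 0.11  # outputs at or below this count as predicted non-binding

def three_way_split(examples, seed=0):
    """80% training, 80% of the remainder validation, the rest for testing."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 4 // 5
    rest = items[n_train:]
    n_val = len(rest) * 4 // 5
    return items[:n_train], rest[:n_val], rest[n_val:]

def confusion_counts(outputs, labels):
    """Count TP, TN, FP, FN; outputs between the cutoffs are always wrong."""
    tp = tn = fp = fn = 0
    for out, label in zip(outputs, labels):
        if label == 1:
            if out >= POS_CUTOFF:
                tp += 1
            else:
                fn += 1
        else:
            if out <= NEG_CUTOFF:
                tn += 1
            else:
                fp += 1
    return tp, tn, fp, fn

def performance(tp, tn, fp, fn):
    """Standard accuracy, sensitivity and specificity (assumed definitions)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity

train, validation, test = three_way_split(range(100))
print(len(train), len(validation), len(test))
print(performance(*confusion_counts([0.95, 0.5, 0.05, 0.5], [1, 1, 0, 0])))
```

Under this reading, an output of 0.5 is penalized whichever class the example belongs to, which makes the reported performance a conservative estimate.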
2.5 Prediction of DNA binding models for C2H2 zinc-finger proteins

To estimate DNA binding profiles for a given zinc finger, we first used the NN model to compute the network output scores for all 64 possible triplet sites. The value of the output unit of the network model, o(A1..AL, N1..NL; W, Λ), for the given binary classification model Λ and its weights W, is bounded between 0 and 1 and is interpreted as the probability of binding, P(bound | AN1..ANL) (28). We chose the top 12 sites (approximately 20%) to calculate its weight matrix model using the formula below where a pseudo-count was introduced, as the additivity model was demonstrated to hold well for the DNA side of DNA–zinc-finger interactions (Benos et al., 2002).
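The procedure above can be sketched as follows. The sketch assumes that the 12 top-scoring triplets contribute equally (unweighted counts) and that a pseudo-count of 1 is added per base; the exact formula may differ. The `score` argument is a hypothetical stand-in for the trained network output for one finger.

```python
from itertools import product

BASES = "acgt"

def triplet_profile(score, top_n=12, pseudocount=1.0):
    """Build a 3-column base-probability profile from the top-scoring triplets.

    score: callable mapping a triplet string such as "gcg" to a number
           (a placeholder for the network output for a fixed zinc finger).
    """
    triplets = ["".join(p) for p in product(BASES, repeat=3)]  # all 64 sites
    top_sites = sorted(triplets, key=score, reverse=True)[:top_n]
    profile = []
    for position in range(3):
        counts = {b: pseudocount for b in BASES}
        for site in top_sites:
            counts[site[position]] += 1.0
        total = sum(counts.values())
        profile.append({b: counts[b] / total for b in BASES})
    return profile

# Toy score: number of positions matching "gcg" (not a real network output).
toy_score = lambda site: sum(x == y for x, y in zip(site, "gcg"))
for column in triplet_profile(toy_score):
    print(column)
```

The pseudo-count keeps every base probability above zero, so a base that is absent from the top sites is treated as unlikely rather than impossible.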
2.6 Assessment of the predictions

2.6.1 Comparison of predicted DNA binding profiles with experimentally determined profiles.

The DNA binding constants (Ka) for five zinc-finger proteins were downloaded from the website http://arep.med.harvard.edu/Bulyk/NAR2002supplementary/ (Bulyk et al., 2002). While the predicted DNA binding profiles for the 5 proteins were computed as described above, the experimentally determined profile, represented as the probability of binding for base b at position i, P(b,i), for each protein was calculated by the following formulae
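One plausible way to turn binding constants into such a profile is sketched below. It assumes that P(b,i) is the Ka-weighted fraction of sites carrying base b at position i; this weighting is an assumption made for illustration and may not match the formulae used in the study. The Ka values in the example are invented.

```python
BASES = "acgt"

def profile_from_ka(ka_by_site):
    """Convert {site: Ka} into a list of per-position base probabilities.

    Assumed definition: P(b, i) = sum of Ka over sites with base b at
    position i, divided by the sum of Ka over all sites.
    """
    sites = list(ka_by_site)
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = float(sum(ka_by_site.values()))
    profile = []
    for i in range(length):
        column = {b: 0.0 for b in BASES}
        for site, ka in ka_by_site.items():
            column[site[i].lower()] += ka
        profile.append({b: column[b] / total for b in BASES})
    return profile

# Invented numbers, for illustration only.
example = {"gcg": 3.0, "gct": 1.0}
for column in profile_from_ka(example):
    print(column)
```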
2.6.2 Assessment of different models with quantitative binding affinity data.

We collected 9 sets of binding constants (Ka) for 31 different zinc fingers (Bulyk et al., 2001; Elrod-Erickson and Pabo, 1999; Hamilton et al., 1998; Liu and Stormo, 2005; Segal et al., 1999). For the datasets from Bulyk et al. (2001), only binding constants of the preferred DNA binding sites for each of 5 proteins were used for assessment. We used the correlation coefficient between the experimentally determined energy differences and those predicted by different models for DNA–zinc-finger interactions to compare our model with the existing models. For any given protein sequence, each model predicts the binding energy (proportional to the output score) for all possible binding site sequences, which are used in the comparisons to the experimental energies.

2.6.3 Comparison of predicted DNA binding profiles for zinc-finger proteins with multiple fingers with those in the TRANSFAC database.

To predict DNA binding profiles for zinc-finger proteins with multiple fingers, we first determined the number of zinc-finger domains, and the key residues at positions −1, +3 and +6 in each domain, with the zinc-finger HMM model (Finn et al., 2006). After predicting the DNA binding profile for each individual finger with the method described above, we assembled them together from C-terminal to N-terminal, as binding of zinc fingers to DNA sites follows an anti-parallel fashion (Elrod-Erickson et al., 1996, 1998). The assembled DNA binding profiles were then compared with those in the TRANSFAC database.

3 RESULTS

3.1 Context dependencies in DNA–zinc-finger interactions

C2H2 zinc-finger proteins typically contain multiple fingers that make tandem contacts along the DNA. Since most zinc-finger proteins are believed to bind DNA in a modular fashion (Choo and Klug, 1997; Elrod-Erickson et al., 1996), we model DNA binding specificities for individual zinc fingers. In previous studies, Benos et al.
and Kaplan et al. have developed context-independent models to estimate DNA recognition preferences of C2H2 zinc-finger proteins (Benos et al., 2002; Kaplan et al., 2005) based on the canonical binding model of the DNA-protein complex of EGR1 (Fig. 1) (Elrod-Erickson et al., 1996; Pavletich and Pabo, 1991).

Using a cross validation procedure (described in Methods) we optimized models for DNA–zinc-finger interactions while the model errors and performances were simultaneously assessed. In the comparison shown in Figure 2, the NN model outperformed the perceptron model in accuracy (P-value = 0.01), sensitivity (P-value = 0.04) and specificity (P-value = 0.01). This indicates that interactions between base-amino acid contacting positions contribute to the affinity between the DNA sites and the zinc fingers.
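Why a hidden layer can capture what an additive model cannot is easiest to see in a toy case with two binary inputs, where binding occurs only if exactly one of the two features is present. No additive score passed through a sigmoid can produce this pattern: it would require w1 + b > 0, w2 + b > 0 and b < 0, which together force w1 + w2 + b > 0. The sketch below uses hand-picked, untrained weights that are purely illustrative and unrelated to the fitted zinc-finger models.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def two_layer_output(inputs, hidden_weights, hidden_biases, out_weights, out_bias):
    """Feed forward through one hidden layer, then a single output unit."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(out_weights, hidden)) + out_bias)

# Hidden unit 1 acts like OR, hidden unit 2 like AND; the output unit fires
# when OR is on and AND is off, i.e. when exactly one input is present.
HIDDEN_WEIGHTS = [[20.0, 20.0], [20.0, 20.0]]
HIDDEN_BIASES = [-10.0, -30.0]
OUT_WEIGHTS = [20.0, -20.0]
OUT_BIAS = -10.0

for pattern in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    out = two_layer_output(pattern, HIDDEN_WEIGHTS, HIDDEN_BIASES, OUT_WEIGHTS, OUT_BIAS)
    print(pattern, round(out, 4))
```

In the same spirit, a gap in performance between the perceptron and the NN on real data points to contributions that depend on combinations of contact positions rather than on each position alone.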
3.2 Physical basis of context dependencies for DNA–zinc-finger protein interactions

The inter-positional dependencies for DNA–zinc-finger interactions are consistent with the observed structures in many complexes. The program HBPLUS (Nucplot package) (Luscombe et al., 1998, 2000) was used to extract amino acid-DNA base contacts from each of more than 20 co-crystals of DNA-C2H2 zinc-finger protein complexes collected from the PDB database. Analysis of these structures indicated there are many variations from the canonical zinc fingers (Elrod-Erickson et al., 1998; Elrod-Erickson and Pabo, 1999; Wolfe et al., 2000) as shown in Figure 1
3.3 Estimation of DNA triplet binding profiles for individual C2H2 zinc fingers

While structural studies indicate that four amino acids from each finger may interact with four positions in the binding site, most phage display experiments were screened against various DNA triplets in the context of the zif268 binding site (Choo and Klug, 1994; Segal et al., 1999). This leads to a distinct lack of variability in the datasets for the bases in the fourth position of the canonical model (5′→3′) (Fig. 1).

Figure 4

Figure 5
3.4 Prediction of DNA-binding profiles for multiple zinc-finger transcription factors

Using the NN model for individual zinc fingers, we can predict the binding specificities of TFs with multiple fingers by simply concatenating the individual predictions together following the direction from C-terminal to N-terminal (Elrod-Erickson et al., 1996, 1998). For any given zinc-finger protein, we first used the Pfam zinc-finger HMM model (Finn et al., 2006) to determine the number of zinc-finger domains, and the key residues at positions −1, +3 and +6 responsible for recognition of DNA bases. To examine the reliability of this approach, we compared the predicted specificities to the DNA-binding profiles from the TRANSFAC database, which is a repository for transcription factors and their sites from many eukaryotes (Matys et al., 2006). It includes 48 non-redundant weight matrices for C2H2 zinc-finger TFs with numbers of zinc-finger domains ranging from 2 to 20. We chose all matrices based on at least 6 binding sites and with 2, 3 or 4 zinc fingers. The predicted sequence logos and those directly from the TRANSFAC database are shown in Figure 6.
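The concatenation step can be sketched as follows. The sketch assumes that each finger's profile is a list of three columns written 5′ to 3′ and that fingers are supplied in N-terminal to C-terminal order; the function names and toy profiles are hypothetical.

```python
BASES = "acgt"

def assemble_profile(finger_profiles_n_to_c):
    """Join per-finger triplet profiles into one site profile (5' to 3').

    Binding is anti-parallel, so the C-terminal finger contributes the
    5'-most triplet and the N-terminal finger the 3'-most triplet.
    """
    assembled = []
    for profile in reversed(finger_profiles_n_to_c):
        assembled.extend(profile)
    return assembled

def point_profile(triplet):
    """Toy profile that puts all probability on the given triplet."""
    return [{b: (1.0 if b == base else 0.0) for b in BASES} for base in triplet]

def consensus(profile):
    """Most probable base at each position."""
    return "".join(max(column, key=column.get) for column in profile)

# Toy fingers listed N-terminal first: finger 1, finger 2, finger 3.
fingers = [point_profile("aaa"), point_profile("ccc"), point_profile("ggg")]
print(consensus(assemble_profile(fingers)))
```

Simple concatenation treats each finger as an independent module, so any influence of one finger on the preferences of its neighbor is outside the scope of this step.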
4 DISCUSSION

We have presented a general model for DNA–zinc-finger protein interactions that is capable of estimating DNA binding specificities for C2H2 zinc-finger TFs. In comparison to previous quantitative models (Benos et al., 2002; Kaplan et al., 2005), this model takes context dependency into account. Evaluation with a large set of qualitative and quantitative experimental data demonstrates that integrating context dependency into the modeling of DNA-protein interactions can improve predictive accuracy.

Independence between positions is an assumption that is widely used in computational approaches that model binding sites and DNA-protein interactions, but the accuracy of this approximation remains controversial (Benos et al., 2002; Michael Gromiha et al., 2004; O'Flanagan et al., 2005; Tomovic and Oakeley, 2007). A typical method to assess positional independence is to statistically compare a set of experimentally measured free energies of binding (or binding affinities) with those estimated by either context-independent or context-dependent models (Benos et al., 2002; Liu and Stormo, 2005; O'Flanagan et al., 2005; Tomovic and Oakeley, 2007). In this study we compare the perceptron model, which assumes additivity between the binding site positions, with the two-layer NN model, which can incorporate non-independence between the positions. We found that the NN model significantly outperformed the perceptron model in a cross-validation study, suggesting that dependence between positions contributes significantly to DNA recognition by zinc-finger proteins. This finding agrees with many previous mutagenesis studies, quantitative binding affinity assays and structural analyses (Elrod-Erickson and Pabo, 1999; Liu and Stormo, 2005; Michael Gromiha et al., 2004; Miller and Pabo, 2001; Wolfe et al., 2000). As shown in Figures 1

While our model attained reasonably good predictions, there is still ample room for improvement.
Just like previous methods, our current model is limited by the lack of sufficient training data, and in particular would benefit from quantitative binding data. Additionally, due to the biased training data collected from phage display experiments, we could not consider the contribution of residues at position +2 to DNA binding for both individual fingers and proteins with multiple fingers. Although the specific role of position +2 in sequence specificity has not been well understood (Wolfe et al., 2000), ignoring the contribution of residues at position +2 in the modeling of DNA–zinc-finger protein interactions is obviously an oversimplification. With the development of high-throughput dsDNA microarray technology, bacterial one-hybrid systems and improved SELEX methods (Bulyk et al., 2001; Liu and Stormo, 2005; Meng et al., 2005; Roulet et al., 2002), binding site data are becoming increasingly easy and inexpensive to obtain, which will lead to more accurate modeling and facilitate understanding of the regulation of gene expression.

ACKNOWLEDGEMENTS

We thank David Granas for his examination of the distribution of key amino acid residues of zinc-finger proteins in the Pfam dataset. We also thank Ryan Christensen for his independent tests of our prediction performance and comparisons with other methods.

Funding: This work was supported by NIH grant HG00249 to G.D.S.

Conflict of Interest: none declared.