Tools for Interpreting Large-Scale Protein Profiling in Microbiology

1 Department of Chemical Engineering, University of Washington, Box 355014, Seattle, WA, 98195, USA
2 Department of Microbiology, University of Washington, Box 355014, Seattle, WA, 98195, USA
3 Department of Oral Biology and Center for Molecular Microbiology, College of Dentistry, Box 100424 JHMHSC, University of Florida, Gainesville, FL 32610, USA
*corresponding author, Email: mhackett/at/u.washington.edu

Contact information: Dr. Murray Hackett, Department of Chemical Engineering, Box 355014 University of Washington, Seattle, WA 98195, USA, Telephone: (206) 616-8071, Fax: (206) 616-5721

The publisher's final edited version of this article is available at J Dent Res.

Abstract

Quantitative proteome analysis of microbial systems generates large datasets that can be difficult and time consuming to interpret. Fortunately, many of the data display and gene clustering tools developed to analyze large transcriptome microarray datasets are also applicable to proteomes. Plots of abundance ratio versus total signal or spectral counts can highlight regions of random error and putative change. Displaying data in the physical order of the genes in the genome sequence can highlight potential operons. At a basic level of transcriptional organization, identifying operons can give insights into regulatory pathways as well as provide corroborating evidence for proteomic results. Classification and clustering algorithms can group proteins together by their abundance changes under different conditions, helping to identify interesting expression patterns, but often work poorly with noisy data like that typically generated in a large-scale proteome analysis. Biological interpretation can be aided more directly by overlaying differential protein abundance data onto metabolic pathways, indicating pathways with altered activities. 
More broadly, ontology tools detect altered levels of protein abundance for different metabolic pathways, molecular functions and cellular localizations. In practice, pathway analysis and ontology are limited by the level of database curation associated with the organism of interest.

Keywords: DAVID, Gene Ontology, GoMiner, bioinformatics, proteomics, Porphyromonas gingivalis, Methanococcus maripaludis, protein profiling, protein expression

INTRODUCTION

The advent of cDNA microarrays, allowing genome-wide transcriptional analysis, has resulted in a large increase in the volume of data available to researchers. Large datasets, while providing tremendous amounts of new information, pose problems for efficient analysis. The result has been the development of programs and techniques for analyzing microarray data. Until recently the complexities of dealing with large datasets were not as much of a concern for proteomics, which was limited by the number of proteins that could be resolved by a 2D gel. Approaches based on the analysis of proteolytic digests by 2D capillary HPLC coupled with tandem mass spectrometry (Washburn et al., 2001, 2002) identify more proteins with more mass spectra per protein, yielding a greater level of proteome coverage and sampling depth than can be accomplished with gel electrophoresis, thus producing microarray-sized datasets. The LC/MS-based approaches have been particularly useful for microbial systems, including those of interest to oral biologists (Lamont et al., 2006).

Manual in-depth inspection coupled with literature research is indispensable when interpreting whole genome data. However, data analysis tools can make the process easier and more efficient. The field of genome-wide data analysis is large and a discussion of every available program and technique is beyond the scope of this review. 
Rather, our goal is to provide a critical introduction to the types of data analysis tools available that would be of interest to researchers exploring proteomics data from microbial species, with an emphasis on those we have actually used. We describe the tools roughly in order of increasing complexity.

In order to give real world examples, we have used data from studies of two organisms, Methanococcus maripaludis S2 and Porphyromonas gingivalis ATCC 33277. M. maripaludis is an anaerobic Archaeon that obtains energy through the methanogenesis pathway, which reduces CO2 to methane employing H2 as an electron donor (Hendrickson et al., 2004). Our ongoing studies of M. maripaludis have implications for global carbon cycles, alternative energy sources, and an overall understanding of Archaeal biology. P. gingivalis is a Gram-negative facultative intracellular bacterial pathogen associated with periodontal disease and potentially with serious systemic conditions (Lamont and Jenkinson, 1998). P. gingivalis research, in the context of oral biofilms and the many other organisms present therein, provides a window into the role of microbial communities in human disease. These two organisms were chosen because of the authors’ familiarity with them but also as examples of microbial species commonly used in research that are not well-established model systems like Escherichia coli. As will be discussed, some tools are less tractable for less well-documented species.

The tools discussed in this paper are primarily focused on extracting a biological interpretation from large datasets. Prior to looking for biological meaning, quantitative proteomics data need to undergo several layers of processing. 
After initial acquisition with a mass spectrometer interfaced to some kind of high-resolution separations scheme (Washburn et al., 2001, 2002), the data need to be processed to assign peptide identifications to the fragments in a proteolytic digest (Eng et al., 1994), map the peptides back to their associated proteins (Tabb et al., 2002), and determine the statistical validity of the results, both qualitatively and quantitatively (Keller et al., 2002; Nesvizhskii et al., 2004; Choi et al., 2008; Käll et al., 2008). The earlier stages of data processing that create the input files for the tools reviewed here, with emphasis on the quantitative aspects, were recently reviewed with specific reference to microbial systems (Xia et al., 2007a).

As with transcriptome microarrays, whole genome proteomics is primarily comparative, looking for a change in protein abundance between two or more conditions. From the initial stages of data processing one generates a list of proteins, with an abundance ratio relative to a reference condition, and a measure of the data’s statistical strength. Statistical measurements of the likelihood of an abundance difference, such as a p-value or a q-value (Storey and Tibshirani, 2003), are needed to properly evaluate the results. The list of proteins with accompanying abundance ratios and statistical measurements is virtually identical to the final results tabulated from transcriptome microarrays, though representing protein rather than RNA levels, and most of the statistical and bioinformatic tools and concerns are the same, even though the physical detection method and the biological basis of each method are completely different. The protein abundance ratio lists serve as inputs into the various computational tools described in this review.

The point of view expressed by the authors is that of experimentalists in that we are users of data mining tools driven by biological questions, not statisticians or algorithm developers. 
Regarding nomenclature, we have followed the lead of most of the papers cited herein and used the terms “expression” and “abundance” somewhat interchangeably, keeping in mind that what is actually being measured is the net result of various biosynthetic and destructive processes that can contribute to a relative protein abundance measurement.

PLOTS OF ABUNDANCE RATIO VERSUS SIGNAL OR SPECTRAL COUNTS

Transcriptome microarray studies have long used M versus A plots to visualize the overall quality and uncertainty of their results. These are plots of the abundance ratio for each measurement, the M, against the overall signal strength of the measurement, the A (Quackenbush, 2002). Similar plots can be used for proteomics results (see Fig. 1A).
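As a minimal sketch of how the values behind such a plot could be computed for proteomics data, the following assumes that each protein has a positive signal (or spectral count) in both a test and a reference condition. It uses one common convention, M as the log2 ratio and A as the mean log2 signal; the function name `ma_values` and the example numbers are illustrative assumptions, not taken from the studies discussed here.

```python
import math


def ma_values(test, reference):
    """Return a list of (M, A) pairs, one per protein.

    M is the log2 abundance ratio (test / reference).
    A is the mean of the two log2 signals, a measure of overall signal strength.
    Both inputs are equal-length sequences of positive signals or counts.
    """
    if len(test) != len(reference):
        raise ValueError("test and reference must have the same length")
    pairs = []
    for t, r in zip(test, reference):
        if t <= 0 or r <= 0:
            # Zero counts have no defined log ratio; how to handle them
            # (pseudo-counts, filtering) is left to the analyst.
            raise ValueError("signals must be positive")
        log_t, log_r = math.log2(t), math.log2(r)
        pairs.append((log_t - log_r, 0.5 * (log_t + log_r)))
    return pairs


# Hypothetical spectral counts for three proteins.
example = ma_values([8, 40, 100], [2, 40, 50])
```

Plotting M against A for every protein then shows whether large ratios are confined to the low-signal region, where random error tends to dominate.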
PLOTS OF PROTEIN ABUNDANCE TRENDS IN GENOME ORDER

The simplest analysis tool, sometimes referred to as “beads-on-a-string”, is shown in Fig. 1B.

HEAT MAPS

A slightly different display method is that of “heat maps”. Heat maps display data in small cells colored to represent relative abundance values (Fig. 1C).

METABOLIC MAPS

While displaying relative abundance results in genome order can highlight regulated operons, it does not provide any direct information about the functional differences between the tested conditions. Overlaying the abundance ratio data onto metabolic maps can make changes in metabolic processes readily apparent. As an example, we analyzed P. gingivalis proteomics data (Xia et al., 2007b) using BioCyc (available for online use or download from www.biocyc.org). For this example we used the web-based interface (Fig. 2).
In addition to the metabolic map, BioCyc contains an extensive database and links to other databases to help analyze results. Clicking on any compound in a pathway will pull up a small map of the pathway results with the names of the compounds and proteins involved in the pathway. The small maps contain links to larger displays of the pathways, which in turn contain links to summary pages for all of the pathway’s substrates, products and enzymes. There are links to other summary pages and often other databases. Thus, while no pathway names appear on the metabolic map (Fig. 2) ...

The major drawback to BioCyc, and the main difference with tools such as MEV, is the need for extensive curation prior to use. In order to map the data to a metabolic pathway in BioCyc, two things need to occur. First, a metabolic map needs to exist. Only a limited number of organisms have pathway/genome databases in BioCyc and these are broken down into three levels of curation. The database for Escherichia coli K-12 is extensively curated and is the most accurate and complete database in BioCyc. Twenty other organisms (as of April 2008) are termed Tier 2 and have limited manual curation. Most of the databases, including those for P. gingivalis W83 and M. maripaludis S2, are Tier 3 and solely computationally derived. Without manual curation, these pathway maps are incomplete and often contain errors. In the case of M. maripaludis the metabolic map does not contain methanogenesis, its central metabolic pathway. As a result, BioCyc is presently of little use for examining M. maripaludis data.

The second hurdle is that the data have to be correlated to the BioCyc database to be properly displayed. Fortunately, BioCyc accepts the genome sequence gene designations or ORF numbers, e.g. PG2181, and a simple tab-delimited text file containing the gene designations and abundance ratios was used to upload the P. gingivalis data. While the metabolic map for P. gingivalis is not complete and is based on the genome of strain W83 rather than the experimental strain ATCC 33277, displaying the proteomics data on the map was still informative. Using BioCyc, we detected increased expression of thiamine biosynthetic genes in internalized versus control cells. Thiamine, also known as vitamin B1, is important for the pentose phosphate pathway (Schenk et al., 1998). P. gingivalis encodes proteins for the non-oxidative portion of the pentose phosphate pathway, which feeds into glycolysis. Both the non-oxidative pentose phosphate pathway and glycolysis showed some induction in internalized cells. These results were not as consistent as those for the thiamine biosynthetic pathway. This is not uncommon given the noisiness of the proteome data and the complexity of metabolic interactions. However, these data are consistent with the overall model of internalized cells experiencing higher nutrient levels, accompanied by an increase in metabolic activity and cell component synthesis (Xia et al., 2007b).

A cautionary example of the kinds of artifacts often seen in this type of data analysis is illustrated by the BioCyc display of the Calvin cycle in P. gingivalis (Fig. 2) ...

Another important consideration when dealing with web-based software resources like BioCyc is that they are constantly being updated. As this review goes to press, the P. gingivalis map has been changed and no longer shows the Calvin cycle. Such changes can appear without citation or other explanatory material, so the burden is often completely on the user to assess the accuracy of the pathway annotation.

For researchers interested in oral pathogens another source of metabolic maps, though not one that can overlay proteomics data, can be found at Los Alamos National Laboratory (http://www.oralgen.lanl.gov/). 
The LANL site contains a searchable database for oral pathogens including genome displays, sequence searches, and metabolic pathways drawn from the KEGG (Kyoto Encyclopedia of Genes and Genomes) database (http://genome.jp/kegg). For P. gingivalis the Los Alamos database had fewer errors than BioCyc but did not seem to be as complete. The Calvin cycle, mistakenly presented in BioCyc, was not listed among the metabolic pathways in the Los Alamos P. gingivalis database. However, when comparing the thiamine biosynthesis pathway discussed above we found that only a subset of the proteins displayed as part of the pathway by BioCyc appeared on the same pathway at the Los Alamos site. The missing proteins were in the database, and their description included the fact that they are part of thiamine biosynthesis, but they simply did not appear on the thiamine biosynthesis pathway display. This shows an advantage of using multiple databases and display programs. They provide an opportunity to cross check databases against each other, helping to overcome the weaknesses in any one pathway tool.

CLASSIFICATION AND CLUSTERING

The graphical displays described above are primarily useful for examining individual experiments. Researchers dealing with experiments across multiple conditions are often interested in the patterns of abundance ratios across the experiments and grouping results with similar patterns. The expectation is that similarity with respect to protein levels is often indicative of similarity with respect to function (Eisen et al., 1998). Mathematical methods for grouping similar results have a long history in statistics and the transcriptome microarray community has borrowed and built on these methods to help identify patterns and groups of co-expressed genes from their large datasets. 
The result has been the development of a large number of algorithms for this purpose (Raychaudhuri et al., 2001; Chen et al., 2002; Boutros and Okey, 2005; Allison et al., 2006; Datta and Datta, 2006). These methods fall into two categories, classification and clustering, which can also be described as supervised and unsupervised methods, respectively (Allison et al., 2006). It should be noted that while classification and clustering can be of value, their use is based on an assumption that transcriptome or proteome data naturally fall into distinct groups. Evidence indicates that they do not (Bryan, 2004). As seen in Fig. 3A, C
Classification requires the user to define training sets to be used as the basis of the groupings. Training sets consist of subsets of the data, either proteins or entire datasets, which have been defined as belonging to a specific group. This requires prior knowledge in order to properly assign members of the training set. Because the training sets are used to drive the analysis they are considered to be “supervising” the analysis (Raychaudhuri et al., 2001; Allison et al., 2006).

Classification algorithms are generally used in one of two ways in proteomics and functional genomics. One use is to classify entire datasets, for example whether a protein dataset was derived from an internalized intracellular pathogen or an externally grown control. The more common use is to classify subsets of proteins according to their abundance patterns across different datasets. If a particular set of proteins was of interest and known to be co-expressed, classification could be used to find other proteins in the dataset that matched their abundance pattern.

There are numerous classification and clustering algorithms incorporated into MEV. A comprehensive review of different classification techniques is beyond the scope of this paper. Other analysis packages can be found at http://ihome.cuhk.edu.hk/~b400559/arraysoft_mining_specific.html; classification and clustering algorithms are also available as part of commercial systems such as Elucidator from Rosetta Inpharmatics (http://www.rosettabio.com/products/elucidator/default.htm).

As an example of classifying proteins according to their abundance patterns, we conducted a classification to find proteins with patterns like those of the M. maripaludis flagella proteins (Fig. 4A).
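The idea of classifying proteins by their abundance patterns can be illustrated with a small K nearest neighbor sketch. This is not the MEV implementation; the Euclidean distance, the value of k, the labels, and the abundance profiles below are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def euclidean(x, y):
    """Euclidean distance between two abundance profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_classify(profile, training, k=3):
    """Assign `profile` the majority label among its k nearest training profiles.

    `training` is a list of (profile, label) pairs chosen by the user,
    which is what makes the method "supervised".
    """
    nearest = sorted(training, key=lambda item: euclidean(profile, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set: log2 abundance ratios across three conditions.
training = [
    ([1.0, 2.0, 3.0], "pattern of interest"),
    ([1.1, 2.1, 2.9], "pattern of interest"),
    ([0.9, 1.9, 3.1], "pattern of interest"),
    ([0.0, 0.1, -0.1], "background"),
    ([0.1, 0.0, 0.0], "background"),
    ([-0.1, 0.0, 0.1], "background"),
]

label = knn_classify([1.0, 2.0, 2.8], training)
```

The quality of the result depends entirely on the training set: a protein is assigned to whichever labeled group it sits closest to, whether or not that group is biologically meaningful.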
Classification techniques are subject to several sources of error. In brief, the major concerns are inappropriate choices for the training sets, over-fitting, and selection bias (Raychaudhuri et al., 2001; Ambroise and McLachlan, 2002; Allison et al., 2006). If the samples chosen for the training set are assigned to the set in error or cannot explain the phenotype of interest, then the classification is likely to fail. For the example (Fig. 4A) ...

Because it draws on pre-existing knowledge, classification tends to be used to ask specific questions. The discovery of patterns in the data, especially unexpected patterns, is the strength of clustering, or unsupervised algorithms. Unlike supervised methods, clustering does not require any prior knowledge of the proteins or conditions in question. Instead, the clusters are organized according to their similarity to all other proteins or conditions under consideration. Many different clustering algorithms have been developed (Sherlock, 2000; Raychaudhuri et al., 2001; Allison et al., 2006). As with the K nearest neighbor classification discussed above, they all use some measure of mathematical distance between the proteins or conditions to be compared. Hierarchical algorithms construct trees of the entire dataset, grouping proteins with other proteins based on the calculated distance. Some algorithms require the user to define the number of clusters into which the data will be grouped (Sherlock, 2000). The algorithm then uses the calculated distance to split the proteins into that many groups. Finally, some methods employ a user-defined minimum correlation for a cluster (Sherlock, 2000). Such algorithms use the calculated distances to construct groups where the proteins in each group possess the user-defined minimum correlation or higher. Unlike the other clustering techniques, minimum correlation algorithms will not necessarily cluster all the proteins in the dataset.
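To make the hierarchical approach concrete, the sketch below builds a tree by repeatedly merging the two closest clusters. It is a simplified illustration rather than the algorithm used by any particular program; the Pearson correlation distance, average linkage, and the four hypothetical protein profiles are assumptions chosen for the example.

```python
import math


def pearson_distance(x, y):
    """Distance defined as 1 minus the Pearson correlation of two profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 1.0  # a flat profile is treated as uncorrelated
    return 1.0 - cov / (sd_x * sd_y)


def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(pearson_distance(profiles[a], profiles[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def agglomerate(profiles):
    """Merge the two closest clusters until one remains.

    Returns the merge history as (distance, members) tuples, which is
    the information a dendrogram displays.
    """
    clusters = [[name] for name in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], profiles)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] + clusters[j]
        merges.append((d, merged))
        clusters[i] = merged
        del clusters[j]
    return merges


# Hypothetical log2 abundance ratios across three conditions.
profiles = {
    "protA": [0.1, 1.0, 2.1],
    "protB": [0.0, 0.9, 2.0],
    "protC": [2.0, 1.1, 0.1],
    "protD": [1.9, 1.0, 0.0],
}
tree = agglomerate(profiles)
```

Note that the procedure always produces a tree, even when the data contain no natural groups, which is one reason the output needs careful interpretation.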
For an example of a commonly employed clustering technique, we chose agglomerative hierarchical clustering. The agglomerative hierarchical clustering algorithm from the MEV program was applied to the M. maripaludis dataset (Fig. 4B).

ONTOLOGY TOOLS

The newest set of tools being developed for transcriptome and proteome analysis employs the concept of ontology. There are several definitions that apply in different fields, but here we are concerned with the concept of ontology as it has evolved in computer science and computational biology. For our purposes ontology can be thought of as an attempt to describe a part, or possibly all, of the universe of interest to a scientific discipline using a system of highly specific, hierarchical categories and their relationships (Mizoguchi, 2001). The categories and relationships must be so well defined as to be machine-readable and easily manipulated by computer. While a few other specialized biological ontologies have been developed, for transcriptome and proteome analysis the primary ontology is the Gene Ontology (GO) (http://www.geneontology.org/; Ashburner et al., 2000), which evolved from the human gene ontology (HUGO). Examples of the Gene Ontology are shown for both a P. gingivalis and an M. maripaludis protein (Fig. 5).
Most users will not interact directly with GO, but will use ontology tools that employ GO. An example of GO executed through an ontology tool, GoMiner, discussed in more detail below, is given for the M. maripaludis methanogenesis protein N5-methyltetrahydromethanopterin: Methyl transferase A (MtrA, MMP1564) (Fig. 5B).

There are several uses for ontologies, such as integrating the large number of different, autonomous databases of biological information or clearing up the semantic ambiguity in biological nomenclature (Schulze-Kremer, 2002). However, probably the most important aspect of ontologies for proteomics and functional genomics is that they provide a consistent structure for categorizing genes and proteins that yields readily to machine searchable annotation. A protein assigned to the “One-carbon compound biosynthetic process” will never be called “Biosynthetic process one-carbon compound” or “One-carbon compound process”. Thus all the proteins assigned to this category will have the exact same label for purposes of computation. The ontology tools take advantage of this machine searchable aspect to look for categories that show abundance change within a dataset (Zeeberg et al., 2003; Pavlidis et al., 2004; Subramanian et al., 2005; Huang et al., 2007). If a large number of proteins in the category “One-carbon compound biosynthetic process” displayed altered abundance in an experiment, the ontology tool would list “One-carbon compound biosynthetic process” as altered. The primary goal is to provide a more efficient way to mine large quantities of data.

Ontology tools fall into one of two general groups (Pavlidis et al., 2004). One type of tool looks for over-representation of changed proteins in a category. The user supplies the tool with two lists, one of all proteins in the experiment and the other of those proteins that are changed in the condition of interest. The ontology is used to determine the categories for each protein. 
Then the tool calculates how likely the number of changed proteins in each category is to arise by chance from the overall list. Those categories that are over-represented in changed proteins are then reported. The second type of tool uses what has been called “functional class scoring” (Pavlidis et al., 2002; Pavlidis et al., 2004). These tools take into consideration the statistical likelihood that any protein is changed in the condition of interest, most commonly given as a p-value. The tools then calculate the likelihood that any category is changed using the statistics for every protein in the category. The advantage of this method is that there is no need to group the proteins into changed and unchanged; instead, the program uses the statistical power of the results for all the proteins in a category to determine the likelihood that a category is altered. To our knowledge only two ontology tools use “functional class scoring”, ermineJ (http://www-bioinformatics-ubc-ca/ermineJ/; Pavlidis et al., 2004) and GOdist (http://basalganglia.huji.ac.il/links.htm; Ben-Shaul et al., 2005), and at this time neither program can easily be made to accept microbial proteomic datasets as inputs.

We tested two ontology tools that employ over-representation algorithms, GoMiner (http://discover.nci.nih.gov/gominer/; Zeeberg et al., 2003; Zeeberg et al., 2005) and DAVID (Database for Annotation, Visualization and Integrated Discovery, http://david.abcc.ncifcrf.gov/; Huang et al., 2007). More ontology tools can be found at http://www.geneontology.org/GO.tools.shtml. Like BioCyc, both of these tools require files identifying the proteins to be associated with the databases underlying the programs. While both tools employed a number of different protein identifiers, neither took the gene designations from the sequencing projects, e.g. PG2181. For GoMiner we obtained the Universal Protein Resource (UniProt) identity numbers for the proteins from http://www.ebi.ac.uk/trembl/. 
For DAVID, Entrez Gene numbers (http://www.ncbi.nlm.nih.gov/sites/entrez) were used. Even when using protein identifiers recognized by the tool, only a subset of P. gingivalis and M. maripaludis proteins could be mapped onto the databases. The Gene Ontology came out of the Human Gene Ontology and the consortium working on GO is primarily focused on eukaryotic organisms, although the J. Craig Venter Institute and the Plant-Associated Microbe Gene Ontology consortium are also associated with the project (http://www.geneontology.org/GO.consortium.shtml#fulllist). The consequence is that GO is not well curated for many microbial species. Thus, not every protein from these organisms has been entered into GO and not all of the entries are complete.

We analyzed the same P. gingivalis dataset used with BioCyc (Fig. 4) ...
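The over-representation calculation performed by tools of this type can be sketched with a one-sided hypergeometric test, which asks how likely it is to see at least the observed number of changed proteins in a category by chance. This is one common choice and not necessarily the exact statistic computed by GoMiner or DAVID; the function name `enrichment_p_value` and the counts used below are hypothetical.

```python
from math import comb


def enrichment_p_value(n_total, n_changed, n_category, n_overlap):
    """Probability of at least `n_overlap` changed proteins in a category by chance.

    n_total    -- all proteins in the experiment
    n_changed  -- proteins flagged as changed
    n_category -- proteins in the experiment assigned to the category
    n_overlap  -- changed proteins that fall in the category
    """
    upper = min(n_changed, n_category)
    # Sum the hypergeometric probabilities for n_overlap, n_overlap + 1, ..., upper.
    favorable = sum(
        comb(n_category, k) * comb(n_total - n_category, n_changed - k)
        for k in range(n_overlap, upper + 1)
    )
    return favorable / comb(n_total, n_changed)


# Hypothetical example: 1000 proteins detected, 100 changed,
# a category of 25 proteins of which 10 are changed.
p = enrichment_p_value(1000, 100, 25, 10)
```

Because a test like this is run for every category, the resulting p-values generally need a multiple-testing correction before categories are reported as altered.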
The P. gingivalis example also shows one of the difficulties in interpreting the output from an ontology program. Several of the under-expressed categories are subsets of each other. GoMiner may be reporting the same result several times. In addition to the lists of GO categories, GoMiner also has a hierarchical display of the results, making it easier to visualize the relationships among the different categories. The hierarchical display (not shown) indicated that the 4 iron, 4 sulfur cluster binding category is a member of the iron-sulfur cluster binding category that is in turn part of the metal cluster binding category. The output also gave the number of proteins in each category. The numbers of proteins in the three categories were very similar; in fact, both metal cluster binding and iron-sulfur cluster binding contained the same 25 proteins. Thus, the three iron categories are effectively the same result displayed three times. Users can try to cut down on this confusion by restricting GoMiner to use only certain levels of the ontology, but this requires knowledge of the overall ontology structure and the scope of the results of interest, and we did not use this function.

The Database for Annotation, Visualization and Integrated Discovery (DAVID) was also tested using the P. gingivalis and M. maripaludis datasets. The results were roughly similar to those obtained with GoMiner, but there were differences worth noting. DAVID only takes lists of proteins and lists of changed proteins, unlike GoMiner, which can also accept information about the direction of the abundance change for a protein. This means that to generate the same three types of analyses seen in GoMiner (overall change, increased abundance, and decreased abundance), three separate analyses have to be run in DAVID. While GoMiner uses only the Gene Ontology, DAVID employs the GO and other databases to provide categories. The databases can be selected by the user and include UniProt and KEGG as options. 
Thus DAVID is more likely to find a category for any submitted protein. The same dataset that yielded 868 categorized proteins using GO yielded a category for every protein, 1199, in DAVID. The protein annotations may also be more complete in databases other than GO. For example, GO only recognized 18 of the 36 methanogenesis proteins in a simulated M. maripaludis dataset. When this same dataset was run with DAVID, one of the over-represented categories was the KEGG folate pathway, which contains the methanogenesis pathway. This included a link to KEGG with the changed proteins highlighted, clearly showing that all of the methanogenesis proteins were identified and had altered abundance. However, there are negatives associated with using multiple databases. Just like BioCyc’s computational pathway assignments, trying to match every protein to a broad array of databases can yield some categories inconsistent with the known biology of the organism. Indeed, one category that DAVID found with increased abundance for internalized P. gingivalis was photosynthesis, as was seen with BioCyc. Until organisms like P. gingivalis are better represented in GO, however, the benefits of using DAVID are likely to outweigh the drawbacks.

CONCLUDING REMARKS

The wealth of data available from whole genome transcriptome analysis and whole cell proteomics presents opportunities to greatly increase our understanding of biological systems. However, interpreting large datasets can be a substantial undertaking. Combing through thousands of data points by eye is time consuming and prone to personal biases and missed results. To make analyzing large datasets more efficient, a number of interpretive aids have evolved. The simplest of the analysis tools help identify obvious patterns in the data. Beads-on-a-string, clustering, and classification can help identify groups of co-expressed proteins. 
Co-expression can give clues to the biological changes being examined and the role of unknown proteins. Visualization tools such as MEV are insensitive to the type of input and thus are readily adaptable to proteomics data, despite their origins in transcription analysis. With these tools, drawing biological conclusions from patterns is entirely in the hands of the user.

Other tools seek to directly provide biological interpretation. Mapping relative abundance data onto biochemical pathways can highlight changes in metabolism and biological processes. Ontology tools, the latest set of interpretive aids, try to identify biologically relevant categories that undergo change in the experiment. The more direct biological interpretation tools are heavily dependent on the accuracy and completeness of their underlying databases. They are generally more accurate and complete for a few well-studied organisms, such as humans or Escherichia coli. For less studied microorganisms the tools can still be useful, provided the user is careful about validating the results and is knowledgeable regarding the organism in question. None of these tools can be used naively with the expectation of generating valid information. BioCyc, GoMiner, DAVID, etc. are best viewed as tools to speed the process of discovery, not as replacements for skilled human judgment.

Acknowledgments

This work was supported by the NIDCR under RO1 DE014372 and RO1 DE011111, with additional support provided under RO1 GM074783. We also thank Todd Kitten, Qiangwei Xia, Tiansong Wang, Fred Taub, Gundula Bosch, and John A. Leigh for their assistance.