Mining gene functional networks to improve mass-spectrometry-based protein identification

1Department of Computer Sciences, 1 University Station C0500, 2Department of Chemistry and Biochemistry & Institute for Cellular and Molecular Biology, Center for Systems and Synthetic Biology, 2500 Speedway, The University of Texas at Austin, Austin, TX 78712 and 3Children's Cancer Research Institute, The University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA

*To whom correspondence should be addressed.

Associate Editor: Jonathan Wren

Received March 10, 2009; Revised June 26, 2009; Accepted July 19, 2009.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Motivation: High-throughput protein identification experiments based on tandem mass spectrometry (MS/MS) often suffer from low sensitivity and low-confidence protein identifications. In a typical shotgun proteomics experiment, it is assumed that all proteins are equally likely to be present. However, there is often other evidence to suggest that a protein is present and confidence in individual protein identification can be updated accordingly.

Results: We develop a method that analyzes MS/MS experiments in the larger context of the biological processes active in a cell. Our method, MSNet, improves protein identification in shotgun proteomics experiments by considering information on functional associations from a gene functional network. MSNet substantially increases the number of proteins identified in the sample at a given error rate.
We identify 8–29% more proteins than the original MS experiment when applied to yeast grown in different experimental conditions analyzed on different MS/MS instruments, and 37% more proteins in a human sample. We validate up to 94% of our identifications in yeast by presence in ground-truth reference sets.

Availability and Implementation: Software and datasets are available at http://aug.csres.utexas.edu/msnet

Contact: miranker/at/cs.utexas.edu, marcotte/at/icmb.utexas.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

1 INTRODUCTION

High-throughput protein identification in biological samples aids our understanding of complex cellular systems and their behavior. Mass spectrometry (MS)-based shotgun proteomics offers fast, high-throughput characterization of complex protein mixtures. Several thousand proteins may be identified in a sample using high-resolution MS/MS instruments and/or extensive biochemical fractionation (Brunner et al., 2007; Graumann et al., 2007), but standard approaches only identify a fraction of the expected proteins. A shotgun proteomics experiment typically proceeds by MS/MS analysis of peptides from proteolytically digested proteins, followed by in silico matching of the MS/MS spectra against a database of theoretical peptide spectra derived from protein sequences (Fig. 1).
Effective MS/MS protein identification is hindered by factors such as noisy spectra, low-concentration proteins, post-translational modifications and chemical properties that interfere with peptide ionization. For complex samples such as cell lysates, current MS search algorithms typically only match a small percentage (<20%) of all MS/MS spectra to real peptides, resulting in higher error rates and low recall at the protein level. As a result, only a fraction of the expected proteins are identified with confidence despite their presence in the biological sample, and the MS/MS identification scores of many other proteins fall below acceptable confidence thresholds.

MS/MS protein identification scoring schemes, such as BioWorks (ThermoFinnigan) and ProteinProphet (Nesvizhskii et al., 2003), assume that all proteins are equally likely to be present. In reality, other information may be available and can be used to influence the inferred probability of protein presence, thereby rescuing proteins that fall below confidence thresholds.

We use gene functional networks (Marcotte et al., 1999) as an external information source to analyze proteins in a sample in the context of the biological processes that are active in the cell. Given a list of proteins identified in an MS experiment (M), we determine a more complete list (M′) by considering the proteins that are expected to be present (or absent) based on their functional linkages to proteins in M. Each protein receives a revised identification score with contributions from both direct MS-based evidence and the MS evidence of its neighbors in the gene functional network. Since current gene networks can be incomplete, we intend for M′ to serve as a complement to M, rather than replace it as the authoritative list of expressed proteins. Our data integration approach has the potential to enable pathway-based interpretation of high-throughput MS/MS experiments that are otherwise run in isolation.
For instance, by integrating mass spectrometry data from yeast grown in rich medium with a published yeast functional network (Lee et al., 2007), we were able to confidently identify many proteins from ribosomal complexes and proteins involved in RNA binding, processing and degradation, thereby increasing the protein coverage in several active pathways (Section 4). When our method was applied to yeast grown in minimal medium, we increased the number of proteins identified in the reductive carboxylate cycle pathway (Ogata et al., 1999). In both cases, we expect the newly identified proteins to be present in the sample, but they were not identified with confidence by the MS analysis software, despite having at least one peptide identified per protein.

We demonstrate the applicability of MSNet to data from different organisms, mass spectrometers, MS analysis pipelines, and experimental conditions. We identify 8–29% more proteins on different yeast datasets at the same error rate, and evaluate the quality of protein identifications via ROC and precision–recall plots. In yeast grown in rich medium, analyzed on a high-resolution mass spectrometer, we identify 29% more proteins than the original MS analysis, 97% of which are present in a reference set derived from independent identification experiments. We also demonstrate direct applicability to the human proteome using a human functional gene network, reporting 37% more proteins than the original MS analysis.

2 METHODS

2.1 MSNet algorithm

MSNet introduces an additional stage of computational analysis to MS/MS shotgun protein identification (Fig. 1).

We use the yeast gene functional network developed by Lee et al. (2004, 2007), which spans >95% of the yeast genes. The network forms a graph G = (V, E) with |V| = N genes and |E| weighted edges (wij) between nodes.
The weight wij of an edge between two genes i and j is defined as the log of the likelihood odds ratio that there exists a link, and is determined by Bayesian integration of thousands of diverse experiments that estimate functional association e.g. mRNA co-expression, phylogenetic profiles, protein interaction experiments and co-citation in published literature (Lee et al., 2007). Intuitively, wij denotes the strength of a functional link between two genes. For human samples, we use a similarly constructed human gene network (Lee and Marcotte, manuscript in preparation). MSNet computes a score yi for each protein i, which represents how likely it is for i to be present in the sample given MS evidence for i and its functionally related proteins j. The MSNet score for protein i (Equation 2) is the convex combination of two terms: (i) the probability that the protein is present in the sample given evidence from a MS experiment (oi) and (ii) the weighted average of MSNet scores of i's immediate network neighbors j (Equation 4). We set oi to the MS protein probability generated by ProteinProphet (Nesvizhskii et al., 2003), but any posterior probability of protein presence given sample-specific experimental data may be used instead (see discussion in Section 4). Since yi is defined in terms of yj, we update scores iteratively. At each iteration t, the algorithm includes evidence from neighbors at path length=t.
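As a minimal sketch of the iterative update described above (not the authors' released implementation), the following Python function combines each protein's MS probability with the weighted average of its neighbors' current scores. The function name, the dictionary-based network representation, the stopping tolerance, the assumption of positive edge weights and the treatment of proteins without neighbors (their network term is set to zero) are all assumptions made for illustration. The default γ = 1/7 corresponds to the (1 − γ)/γ = 6 setting reported for yeast.

```python
def msnet_scores(weights, o, gamma=1 / 7, tol=1e-6, max_iter=1000):
    """Iterate y_i = gamma * o_i + (1 - gamma) * weighted_avg_j(y_j).

    weights : dict mapping protein -> {neighbor: positive edge weight}
              (hypothetical representation of an undirected network;
              every neighbor must also be a key of `o`)
    o       : dict mapping protein -> MS-based protein probability
    gamma   : weight on the direct MS evidence
    """
    y = dict(o)  # start from the MS probabilities, Y(0) = O
    for _ in range(max_iter):
        y_next = {}
        for i, o_i in o.items():
            nbrs = weights.get(i, {})
            total = sum(nbrs.values())
            # Weighted average of neighbor scores; assumed to be 0 for
            # proteins that have no neighbors in the network.
            if total > 0:
                avg = sum(w * y[j] for j, w in nbrs.items()) / total
            else:
                avg = 0.0
            y_next[i] = gamma * o_i + (1.0 - gamma) * avg
        # Largest change in any score during this iteration
        delta = max((abs(y_next[i] - y[i]) for i in o), default=0.0)
        y = y_next
        if delta < tol:
            break
    return y


# Toy example: A and B are linked, C is isolated.
toy_network = {"A": {"B": 1.0}, "B": {"A": 1.0}}
toy_ms = {"A": 0.9, "B": 0.2, "C": 0.2}
toy_scores = msnet_scores(toy_network, toy_ms, gamma=0.5)
```

In the toy example, B and C start with the same MS probability, but B ends with the higher score because it is linked to the confidently identified protein A.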
The MSNet score can be rewritten in vector notation using the weighted adjacency matrix U (N × N) and MS protein probability vector O (N × 1) to generate score vector Y (N × 1) (Equation 2).

The MSNet algorithm is closely related to diffusion algorithms like Google's PageRank (Langville and Meyer, 2006; Page et al., 1999). PageRank has been successfully used to determine a relevancy ranking of webpages based on the hyperlink structure of the web (Langville and Meyer, 2006). MSNet generates a ranking of proteins that is based not only on the link structure of a gene functional network, but also on per-protein relevance to a given sample. In Supplementary Appendix I, we show that MSNet is equivalent to a personalized (Page et al., 1999) or topic-sensitive variant of PageRank (Haveliwala, 2003) with two differences. First, PageRank is defined on a directed graph. Gene functional networks are undirected, so each edge must be interpreted as being bi-directional. A second related difference is that PageRank uses a column-normalized weight matrix H = U^T. We justify the use of U in Supplementary Appendix I, and show that it performs better in our domain in Supplementary Figure S6.

MSNet can be shown to converge to a unique solution irrespective of starting vector Y(0) (proof of convergence is in Supplementary Appendix I). In practice, MSNet converges within 10^−6 tolerance in tens of iterations (Equation 3). In our experiments, we initialize Y(0) = O.

Parameter (1 − γ)/γ weights the network's contribution to the MSNet score. We optimize γ in yeast by maximizing the area under the ROC curve (AUC) while maintaining similar error rates as the MS analysis across multiple datasets. AUC is not very sensitive to (1 − γ)/γ in the range [5,50] (see Supplementary Fig. S3). We set (1 − γ)/γ = 6 for yeast.

2.2 Evaluation methodology

In this section, we describe the MSNet evaluation framework, introduce the error measures used and describe how they are computed.
For a given mass spectrometry experiment and gene functional network, we calculate the MSNet protein identification score for every protein on a genome-wide scale. To test robustness to missing network links, the reported MSNet score is averaged across 10 runs of 10-fold cross-validation. We restrict our evaluation to proteins with at least one peptide identified in the MS experiment.

We use a 5% false discovery rate (FDR) (Storey and Tibshirani, 2003) to determine a high-confidence list of proteins. The FDR at a score t is the fraction of false instances among all identifications with score ≥t. We employ two approaches to estimate the FDR: (i) using a protein reference set as ground-truth to categorize proteins as true or false instances; (ii) generating true and false (null) score distributions independent of ground truth as described in detail below.

We conducted functional analysis of yeast proteins using SGD (Nash et al., 2007), FunSpec (Robinson et al., 2002) and FuncAssociate (Berriz et al., 2003), applying Bonferroni corrections.

2.2.1 Evaluation against a protein reference set

When a protein reference dataset is available, we use it to label a protein as a true instance (T) if it is present in the reference set, and as a false instance (F) otherwise. We estimate the FDR at score threshold s as FDRref = F/(T + F), where T and F count the true and false instances with score ≥s, i.e. the fraction of false instances among all proteins with score ≥s.

We also plot receiver operating characteristic (ROC) and precision–recall curves using the reference set to determine true and false instances. A ROC curve plots true positive rate (TPR) versus the false positive rate (FPR). A precision–recall curve plots (1 − FDR) (precision) versus TPR (recall). TPR at a score threshold t is the fraction of true instances with score ≥t. FPR at score threshold t is the fraction of false instances with score ≥t. FDR is defined above.
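The reference-set error measures defined above can be illustrated with a short Python sketch. The function name, the input representation (a dictionary of scores and a set of reference proteins) and the convention of returning 0 when a denominator is empty are assumptions made for this example, not details taken from the article.

```python
def reference_error_rates(scores, reference, threshold):
    """Compute FDRref, TPR, FPR and precision at one score threshold.

    scores    : dict mapping protein -> identification score
    reference : set of proteins treated as true instances
    threshold : score cutoff; proteins with score >= threshold are reported
    """
    true_all = [p for p in scores if p in reference]
    false_all = [p for p in scores if p not in reference]

    # T and F: true and false instances that pass the threshold
    t = sum(1 for p in true_all if scores[p] >= threshold)
    f = sum(1 for p in false_all if scores[p] >= threshold)

    # Assumed convention: a rate with an empty denominator is reported as 0
    fdr = f / (t + f) if (t + f) > 0 else 0.0
    tpr = t / len(true_all) if true_all else 0.0
    fpr = f / len(false_all) if false_all else 0.0
    return {"FDR": fdr, "TPR": tpr, "FPR": fpr, "precision": 1.0 - fdr}


# Toy example: three of the four scored proteins are in the reference set.
toy_scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.3}
toy_reference = {"a", "b", "d"}
rates = reference_error_rates(toy_scores, toy_reference, threshold=0.5)
```

Sweeping the threshold over all observed scores and collecting the (FPR, TPR) or (TPR, precision) pairs gives the points of a ROC or precision–recall curve, respectively.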
We also report the ROC AUC, the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one (Fawcett, 2006).

2.2.2 Evaluation independent of a protein reference set

When protein reference sets are unavailable, it is standard to compute error estimates by generating a null distribution of scores, and using the ratio of the areas of null and true distributions at scores ≥s as an estimate of the FDR at score threshold s. Though there has been extensive recent work on the estimation of FDRs at the peptide level (Choi and Nesvizhskii, 2008; Kall et al., 2008), there is no consensus at the protein identification level (Tabb, 2008). Our purpose, however, is to develop an error model for MSNet, and we do not address the reliability of MS error models in this article.

We generate an error model using a method we refer to as network-shuffling, similar to randomization or permutation tests used in statistical hypothesis testing. For a given dataset, we generate a null distribution of MSNet scores by running MSNet on a network where the labels on the nodes (protein names) are shuffled, such that proteins maintain features such as the MS protein identification score, but have a different set of network neighbors. This label-shuffling destroys any biological gene–gene association signal, while maintaining the total node degree (topology). We repeat the shuffling process multiple times and pool all generated scores to estimate the null score distribution. The true score distribution is generated by running MSNet on the original network. We plot density distributions for null and true scores (Supplementary Fig. S2) and estimate FDR as FDRshuff = Ns/Ts, where Ns is the area under the null distribution for scores ≥s and Ts is the area under the true distribution for scores ≥s. In this article, FDR refers to FDRshuff unless stated otherwise.
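The network-shuffling procedure can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: `score_fn` is a hypothetical callable standing in for any MSNet-style scorer that takes a network and a dictionary of MS probabilities and returns a dictionary of scores, the number of shuffles and the random seed are arbitrary, and the areas Ns and Ts are approximated by the fraction of pooled null scores and of true scores at or above the threshold.

```python
import random


def shuffled_null_scores(weights, o, score_fn, n_shuffles=10, seed=0):
    """Pool scores obtained after repeatedly shuffling node labels.

    weights  : network passed unchanged to score_fn (topology is preserved)
    o        : dict mapping protein -> MS-based probability
    score_fn : hypothetical scorer, score_fn(weights, o) -> dict of scores
    """
    rng = random.Random(seed)
    nodes = list(o)
    null_scores = []
    for _ in range(n_shuffles):
        permuted = nodes[:]
        rng.shuffle(permuted)
        # Protein permuted[k] is placed on node nodes[k]: it keeps its own
        # MS probability but inherits that node's network neighbors.
        o_shuffled = {node: o[protein] for node, protein in zip(nodes, permuted)}
        y = score_fn(weights, o_shuffled)
        null_scores.extend(y[node] for node in nodes)
    return null_scores


def fdr_shuff(true_scores, null_scores, s):
    """Estimate FDRshuff = Ns / Ts at score threshold s.

    Ns and Ts are approximated by the fractions of null and true scores
    that are >= s. Returns None when no true score reaches the threshold.
    """
    n_s = sum(1 for v in null_scores if v >= s) / len(null_scores)
    t_s = sum(1 for v in true_scores if v >= s) / len(true_scores)
    return n_s / t_s if t_s > 0 else None
```

A 5% FDR threshold could then be chosen as the lowest score s for which `fdr_shuff` stays at or below 0.05. Whether the estimate should be capped at 1 is left open in this sketch.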
2.3 Datasets

We evaluated MSNet on different organisms, experimental conditions and mass spectrometers (Table 1). MS/MS data were collected on low- and high-resolution mass spectrometers: ThermoFinnigan's Surveyor/DecaXP+ (LCQ) and LTQ-Orbitrap (ORBI). MS/MS protein identification was conducted using Bioworks 3.3 (ThermoFinnigan), PeptideProphet (Keller et al., 2002) and ProteinProphet (Nesvizhskii et al., 2003). We considered the entire yeast genome except for proteins annotated as ‘dubious’, since these proteins were not considered in the yeast network (Lee et al., 2007). All MS yeast experiments were the result of combined MS analysis of multiple injections of the sample. An identified protein was labeled as a true instance if it was present in the corresponding protein reference set (Table 1).
2.3.1 Yeast (rich medium)

Cell lysate from wild-type yeast grown in rich medium was analyzed on both LCQ and ORBI mass spectrometers. The LCQ data have been published previously (Lu et al., 2007).

2.3.2 Yeast (rich medium, polysomal fraction)

Cellular lysate was separated on a 7–47% sucrose gradient and fractions were monitored by UV absorbance for RNA content (Li et al., 2009). We chose the fraction containing 80S ribosomes for LC–MS/MS analysis on the LCQ.

2.3.3 Yeast (minimal medium)

We used MS/MS data on wild-type yeast grown in minimal medium (MOPS9), previously published (Lu et al., 2007), with cell lysate analyzed on an LCQ mass spectrometer.

2.3.4 Human

Protein extracts from human HEK293T cell lines were prepared for MS/MS analysis as described in the Supplement. We evaluated results using the shuffled network approach, since no comprehensive protein reference set was available for this dataset.

2.3.5 Availability

Yeast LCQ data have been previously published (Lu et al., 2007). Software and datasets are available at http://aug.csres.utexas.edu/msnet. Further details about sample preparation and protein reference sets are in the Supplement.

3 IMPLEMENTATION AND RESULTS

We demonstrate that incorporating functional association information can substantially boost correct identification of proteins in a shotgun proteomics experiment, across a range of sample conditions and mass spectrometers.

For each dataset in Table 1, we measured the number of proteins identified by MSNet at 5% FDR as compared to the original MS experiment at its 5% FDR. ProteinProphet (Nesvizhskii et al., 2003) computes FDR directly from protein probabilities, which the authors empirically show to be good estimates of the true posterior probability of protein presence.
MSNet consistently increased the number of identified proteins by 8–29% across yeast experiments (Table 2), and at least 94% of MSNet proteins were validated, either by presence in the reference set or by previous identification in the MS experiment (Fig. 2).
3.1 Yeast grown in rich medium

We tested the applicability of our method to whole-cell lysate samples using yeast grown in rich medium analyzed on high- and low-resolution mass spectrometers.

In Table 2, we report the number of proteins identified by MSNet for the yeast rich medium sample analyzed on the high-resolution LTQ-Orbitrap (Table 1, YPD-ORBI). MSNet reported 1835 identifications at 5% FDR, a 29% increase over the original MS experiment. We validated 96% of MSNet's 5% FDR proteins: 92% were present in the reference set and a further 4% were previously identified in the original MS experiment (Fig. 2).

We generated ROC and precision–recall plots for both MSNet and the original MS experiment, marking a protein as a true instance if it was present in the YPD* reference set (Table 1), and false otherwise. In a ROC plot (Fig. 2

MSNet improved performance even when the original MS experiment was limited by instrument resolution, as we observed on the same sample re-analyzed on a low-resolution mass spectrometer (Table 1, YPD-LCQ). MSNet reported 8% more proteins than the original MS experiment (Table 2) and increased AUC by 24% (Table 2, Supplementary Fig. S1). The new MSNet identifications were enriched for ribosomal proteins (P < 0.001).

3.2 Yeast grown in minimal medium

We expect our method to be applicable to yeast in different sample conditions, since the gene network was constructed by integrating diverse biological experiments. Indeed, when applied to yeast grown in minimal medium (Table 1, YMD-LCQ), MSNet identified 9% more proteins at 5% FDR (Table 2). The new MSNet identifications were enriched for ribosomal proteins (P < 0.001) as in the rich-medium yeast experiment, but also for proteins of small molecule biosynthesis (P < 0.001), e.g. carboxylic acid, amine or folate metabolism, which is expected for growth in minimal medium. MSNet increased AUC by 17% when evaluated against the YMD* reference set (Table 2, Supplementary Fig. S1).
3.3 Yeast polysomal fraction

We expect MSNet to be especially effective on smaller, focused protein preparations. Accordingly, we tested MSNet on a polysomal fraction of yeast grown in rich medium, fractionated on a sucrose density gradient (Table 1, YPD-LCQ-Fraction). Proteins in this sample were restricted to those co-fractionating with 80S ribosomes and were expected to be associated with ribosomal and translation functions. MSNet identified 16% more proteins at 5% FDR than the original MS experiment (Table 2). Ninety-four percent of MSNet identifications were validated, either by presence in the fractionation reference set or by previous identification in the MS experiment (Fig. 2).

3.4 Applicability to higher organisms

Finally, we tested MSNet in higher organisms by evaluating proteins expressed in human HEK293T cells analyzed on a high-resolution mass spectrometer (Table 1, Human-293T). We used a human gene functional network (Lee and Marcotte, manuscript in preparation). We considered 18 514 protein-coding genes present in the network, and report up to a 40% increase in the number of identified proteins at 5% FDR. We present a range of results in Table 2 with parameter (1 − γ)/γ varying in [6,10]. As in yeast (Section 2.1), this parameter may be optimized as reference sets for human data become available. The new 5% FDR MSNet proteins were not enriched for any functional category.

3.5 Performance on different MS/MS pipelines

We tested the applicability of MSNet to MS/MS data analyzed using different software pipelines. There are several issues with systematic testing and comparison of different MS pipelines. First, there is currently only one published, freely available analysis pipeline that generates protein-level probabilities and FDRs, i.e. the Trans-Proteomic Pipeline [TPP (Keller et al., 2002; Nesvizhskii et al., 2003)], which we used for our main results.
Second, a systematic comparison is non-trivial since each pipeline makes different statistical assumptions and the hypotheses are not independent. Third, any such effort also entails significant development to accommodate different data formats (Prince and Marcotte, 2008).

Nevertheless, we tested four pipelines: (i) TPP with SEQUEST (Bioworks) for spectral matching (used for main results); (ii) TPP with X!Tandem (Craig and Beavis, 2004) for spectral matching; (iii) CRUX for spectral matching (Park et al., 2008), Percolator (Kall et al., 2007) for peptide-matching and DTASelect (Tabb et al., 2002) for protein reports; and finally (iv) a simple average of protein probabilities from the above pipelines. Since DTASelect does not generate protein scores or FDR, we implemented a simple protein probability as the probability that at least one constituent peptide's identification was correct, as described by Nesvizhskii et al. (2003).

MSNet showed comparable performance across pipelines, with 10–12% higher AUC, and 7–12% more proteins at 5% FDR than the original analysis. The percentage increase in reported proteins depended on the coverage of the MS analysis software. As expected, the more proteins that were confidently identified at the MS stage, the fewer new identifications MSNet contributed (details are in Supplementary Tables S4–S5 and Supplementary Fig. S5).

4 DISCUSSION AND CONCLUSIONS

We have presented a method that improves the sensitivity and precision of protein identification by integrating functional linkage information into the computational analysis of MS shotgun proteomics experiments. Our methodology places MS experiments in a larger biological framework, where proteins expressed in a given cellular state may be readily analyzed in the context of their functionally related neighbors.

We have shown that integrating data sources from outside an MS experiment can improve the protein identification rate of current MS technology and software.
We increased the number of proteins identified at 5% FDR by 8–40%. We also improved performance against the original MS analysis in ROC and precision–recall plots, using our compilation of protein reference sets, showing a 10–24% increase in ROC-AUC. We also presented an evaluation methodology to generate null distributions and FDRs for MSNet using network-shuffling, independent of gold-standard reference sets. These null distributions may be used to compute any other desired error estimate (e.g. p- and q-values).

In two specific examples, we examine the immediate neighbors of two proteins identified by MSNet at 5% FDR in the proteome for yeast grown in rich medium. ARC40 is an essential subunit of the ARP2/3 complex (Fig. 3).
MSNet improves protein identification by both increasing the number of true identifications and reducing false identifications. Since MSNet produces a revised ranking of MS-identified proteins, some proteins can receive lower ranks than in the MS analysis and fall below MSNet's 5% FDR threshold, despite satisfying the MS 5% FDR threshold. There is some evidence that these demoted proteins might be false positive MS identifications: in yeast, the percentage of demoted proteins that can be validated by presence in the reference set is much smaller than the percentage of new MSNet proteins that can be validated similarly (Supplementary Table S6). In human, all demoted proteins were network singletons, i.e. they had no network neighbors. We list the demoted proteins for all experiments, as well as the union of MS and MSNet identifications, in Supplementary Table S6. Using the high-confidence list of MSNet identifications as a starting point, one may narrow the range of additional experiments that are run to validate the existence of computationally predicted proteins.

To the best of our knowledge, our method is the first to use gene networks to improve protein identification in shotgun proteomics. Gene functional networks have been widely used for predicting gene function. For example, Deng et al. (2003) modeled functional linkages as a Markov network, predicting a gene's function based on the functions of its neighbors. More recently, Wei and Pan (2008) used functional associations to learn per-gene mixing proportions in a spatially correlated mixture model to improve large-scale studies such as differential gene expression.

We have shown that MSNet is able to exploit a single organism-wide gene functional network to improve protein identification across different sample conditions, including different growth media and ranging from proteome-wide analysis to subcellular fractions.
In contrast to previous approaches using MS and mRNA expression data (Ramakrishnan et al., 2009), MSNet is easily applicable across datasets and experimental conditions, and does not depend on the availability of matching sample-specific data. MSNet is also directly applicable to smaller, focused protein preparations (Section 3.3) and to higher organisms, as we show for the proteome of cultured human cells. It is also possible to incorporate other sample-specific data when available by replacing the mass-spectrometry-specific term oi (Equation 1) with a probability conditioned on other data sources, e.g. LC separation profiles.

‘Omics’ integration approaches like MSNet will become increasingly powerful as functional association networks become broadly available, as for C. elegans (Lee et al., 2008), mouse (Guan et al., 2008; Kim et al., 2008; Pena-Castillo et al., 2008) and other organisms (Bowers et al., 2004; von Mering et al., 2003).

ACKNOWLEDGEMENTS

The authors thank Dan Boutz for mass-spectrometry assistance, Insuk Lee for pre-publication access to the human gene network, and Prof. William Noble and Lukas Kall for assistance with Percolator. They also thank Prof. Inderjit Dhillon, Prateek Jain and Raghu Meka for feedback on algorithm convergence, Martin Blom for discussions on network-shuffling, Lee Thompson for proofreading the convergence proofs and Lilyana Mihalkova, Prof. Raymond Mooney and Prof. William Press for useful discussions on relational learning.

Funding: National Science Foundation grant (DBI-0640923); the National Institutes of Health grants (GM067779, GM076536); the Welch (F-1515) and Packard Foundations grant; International Human Frontier Science Program support (to C.V.).

Conflict of Interest: none declared.