Figure 1. Steps in Sample Analysis by Proteomics
Introduction to Computational Proteomics

Fran Lewitter, Editor
Whitehead Institute, United States of America

* To whom correspondence should be addressed. E-mail: jcolinge/at/cemm.oeaw.ac.at

Copyright: © 2007 Colinge and Bennett. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Introduction

The proteome is defined as the protein complement of the genome, and proteomics involves the complete analysis of all the proteins in a given sample [1,2]. Several technologies are involved, and numerous questions concerning the proteins are addressed. What proteins are contained in a biological sample? At what concentration do the proteins exist? How do protein expression levels differ between samples? What are the posttranslational modifications (PTMs)? Where in the cell [3] or an organism [4] are the proteins localised? How do the proteins interact with other proteins or molecules [5,6]? The following discussion concentrates on computational aspects of protein identification. Characterization (identification of protein modifications), quantitation, and sample comparisons are also discussed briefly.

A typical proteomic experiment involves the analysis of complex samples, i.e., samples containing many proteins at varying concentrations [7]. Most of the currently available technology for identifying proteins from biological samples cannot cope with this complexity, and the majority of the low-abundance proteins are not observed. There are, however, a number of methods to separate the proteins contained in the original sample to obtain a simpler sample set that is amenable to in-depth analyses. Typical technologies are electrophoretic gels [8] and liquid chromatography [9] (LC) (see Figure 1).
A dominant and well-practiced technique in proteomics is referred to as the “bottom-up” approach. Proteins are digested into peptides (smaller components of the protein) by a proteolytic enzyme, e.g., trypsin. Analysis of the peptides is achieved by mass spectrometry (MS), and, from the data generated, the peptides (and subsequently the proteins) can be identified. The resultant mixture of peptides obtained from the digestion of several proteins is often highly complex, and a degree of separation can be achieved by peptide LC. Possible combinations of separation techniques are illustrated in Figure 1.

Mass spectrometers comprise three main components: an ion source, a fragmentation cell, and a mass analyzer. Each component is essentially independent from the others, and as such it is possible to combine the different technological aspects to produce different types of mass spectrometers.

To measure its molecular mass, a molecule must be ionised. This occurs in the ion source of the mass spectrometer. The source can be based either on electrospray ionization [10] (ESI), which is appropriate for liquid samples; or on matrix assisted laser desorption ionization [11] (MALDI), which is appropriate for samples that have been mixed with a matrix and crystallized on a metallic plate.

The most common types of mass analyzers used in proteomic laboratories are (i) ion trap (IT), where the radio frequency of the trap is varied and the ejected ions are detected; and (ii) time-of-flight (TOF) analyzers, where the time required for an ion to “fly” through an electric field-free region of the instrument is recorded and correlated to the mass of the ion.

Most current instruments include a fragmentation cell that uses an inert gas to break the peptides by collision-induced dissociation (CID). A fragmentation cell, however, is not always present (see next section), or fragmentation can occur “spontaneously” (in-source and post-source decay).
Mass spectrometers do not measure mass directly, but rather the mass-to-charge ratio (m/z). Hence the measurements obtained are dependent on the charge state(s) of the molecule.

Peptide Mass Fingerprinting

Separation of proteins by 2-D gel electrophoresis produces numerous spots that essentially contain one dominant protein. It is possible to enzymatically digest the protein in situ and measure peptide masses by MS. Historically, mass measurement of the digested proteins was performed with a matrix assisted laser desorption ionisation time-of-flight (MALDI-TOF) instrument. The ions generated by MALDI-TOF-MS are predominantly singly charged; therefore, the mass of the peptide can be easily calculated. Once the mass spectrum that is obtained has been signal-processed, a list of peptide experimental masses is generated (see the next section, Peak Detection). This mass list is also referred to as the experimental spectrum.

The data generated can be searched against a protein database by comparing each protein sequence with the experimental peptide mass list. The comparison requires computation of a theoretical mass spectrum by digesting the sequence in silico and calculating theoretical peptide masses. A score is computed to measure the correlation between experimental and theoretical data. The highest-scoring sequence is assumed to be correct [12–14] (see Figure 2).
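The database search just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the scoring used by any published PMF tool: the function names (`tryptic_digest`, `peptide_mass`, `pmf_search`), the 0.2 Da tolerance, and the use of a plain count of matched masses as the score are all hypothetical choices. The residue masses are standard monoisotopic values, and the cleavage rule is the usual trypsin rule (cleave after K or R unless the next residue is P).

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free N- and C-termini)
PROTON = 1.00728   # added for a singly charged ion [M+H]+


def tryptic_digest(sequence, missed_cleavages=0):
    """Cleave after K or R unless the next residue is P."""
    pieces, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])
    # Join neighbouring pieces to model incompletely cleaved peptides.
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(experimental, theoretical, tol=0.2):
    """Number of experimental masses with a theoretical mass within tol."""
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in experimental)


def pmf_search(experimental, database, tol=0.2):
    """Rank database entries by the number of matched peptide masses."""
    scores = {}
    for name, seq in database.items():
        theo = [peptide_mass(p) + PROTON for p in tryptic_digest(seq)]
        scores[name] = count_matches(experimental, theo, tol)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A simple match count treats every mass equally; practical scoring functions weight matches, for example by how frequent a peptide mass is in the database.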
The procedure described above, named peptide mass fingerprinting (PMF), relies on a site-specific enzyme that cleaves at precise locations in the proteins. For example, trypsin cleaves after both lysine and arginine residues, provided the next amino acid in the sequence is not a proline residue. Conceptually, PMF is straightforward and clearly introduces the principle of MS data identification by database searching. Nevertheless, when searching large databases, or when the number of available peptides is limited, the risk of false positive identification increases. The presence of modified (PTMs) or incompletely cleaved peptides further reduces PMF data specificity. Moreover, the experimental design may not be amenable to 2-D gel analysis, and as such the assumption that one protein is analyzed at a time is no longer valid. Therefore, an MS technology that allows more than single protein analysis and provides additional information on each peptide would be a marked improvement over PMF.

Peak Detection

The program extracting a list of masses from an experimental spectrum (usually provided by the MS instrument manufacturer) is essential in the identification of MS data. The performance of the algorithm and the quality of the data produced play an important role in both database searching and de novo sequencing. There are several methods to extract masses that range from straightforward local maximum detection to sophisticated wavelet analysis. An example is shown in Figure 3.
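The simplest of the methods mentioned above, local maximum detection, can be sketched as follows. This is only an illustration: the function name `detect_peaks` and the `min_intensity` threshold are assumptions, and real peak pickers additionally smooth the signal, remove the baseline, and handle isotopic patterns.

```python
def detect_peaks(mz, intensity, min_intensity=0.0):
    """Return (m/z, intensity) pairs at local maxima above a threshold.

    A point is kept when it is strictly higher than its left neighbour and
    at least as high as its right neighbour, so a flat-topped peak is
    reported once (at its left edge).
    """
    peaks = []
    for i in range(1, len(intensity) - 1):
        is_maximum = (intensity[i] > intensity[i - 1]
                      and intensity[i] >= intensity[i + 1])
        if is_maximum and intensity[i] >= min_intensity:
            peaks.append((mz[i], intensity[i]))
    return peaks
```

Raising `min_intensity` trades sensitivity for a shorter, cleaner mass list, which matters for the database search and de novo steps that consume this list.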
Tandem Mass Spectrometry

From the point of view of data processing, tandem mass spectrometry (MS/MS) can be introduced as an additional level to mass fingerprinting. Peptides can be broken into smaller molecules (fragments). As the fragmentation process is governed by certain rules, the set of fragment masses constitutes specific data. By taking advantage of such peptide-specific mass sets, it is possible to identify the peptides.

The peptide fragmentation process can be induced in many ways [15,16], e.g., by collision with an inert gas. A detailed explanation of the peptide fragmentation process is not within the scope of this paper. Nevertheless, briefly, two molecules (prefix and suffix) are created when a peptide is fragmented. As the fragmentation process can occur on multiple copies of the peptide, many (albeit not all) prefix and suffix ions are observed. Fragmentation, however, does not occur at arbitrary positions in the peptide: only well-defined ion types (a, b, c, x, y, z) are generally observed (see Figure 4).
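As an illustration of prefix and suffix fragments, the sketch below computes the singly charged b (prefix) and y (suffix) ion series of an unmodified peptide. The function name `fragment_ions` is hypothetical; the residue, water, and proton masses are standard monoisotopic values. Only b and y ions are covered, with no neutral losses or higher charge states.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Return (b, y): singly charged fragment m/z lists for i = 1 .. n-1.

    b[i-1] is the prefix of i residues plus a proton.
    y[i-1] is the suffix of i residues plus water plus a proton.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b, y


b_ions, y_ions = fragment_ions("MTDSK")
for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"b{i} = {b:.4f}   y{i} = {y:.4f}")
```

Complementary fragments b_i and y_(n-i) always sum to the neutral peptide mass plus two protons, which is a convenient sanity check on any fragment calculator.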
The ability to identify individual peptides enables the analysis of complex peptide mixtures, as the peptides can be readily separated by LC. Unlike PMF, this approach no longer requires that all peptides from a protein be contained within a single spectrum. A standard procedure is to analyze a liquid sample with an LC-ESI-MS/MS instrument in data-dependent mode. That is, the peptides are separated by an LC column, and the liquid phase containing the peptides is continuously introduced and ionized in the source of the mass spectrometer. The instrument in effect then “scans” the fluid for peptides by alternating between MS and MS/MS acquisitions. Peptide masses are acquired in MS mode, and a predefined number of the most intense peaks are selected for fragmentation in MS/MS mode. The instrument then returns to MS mode, and the alternating cycle continues. See Figure 1.

The flexibility obtained by the analytical procedure described above is exploited in shotgun proteomics [17]. Here, protein separation is not performed and the sample is digested in its entirety. The complete digest is then analysed by multidimensional peptide LC. Peptides from one single protein are dispersed over many LC fractions.

MS/MS Scoring Functions

The comparison of theoretical and experimental MS/MS spectra is performed by a scoring function, and the score (ideally complemented by a p-value) is used to recognize the correct peptide from a database. Reliable peptide identifications can then be considered for protein identification.

The most intuitive notion of score is provided by shared peak count (SPC), i.e., the number of masses shared by experimental and theoretical spectra within a given mass tolerance δ. In practice, SPC does not perform well. All matched masses are weighted identically, although some are more reliable (i.e., informative) than others.
For example, peptide fragmentation creates several fragment ion types (see Figure 4).

The SEQUEST [18] (Thermo Scientific, http://www.thermo.com) scoring function is heuristic in nature. In fact, SEQUEST uses two scoring functions. The initial one is used to rapidly determine the best 200 peptide candidates for each MS/MS spectrum, and a second function rescores the 200 hits. The computation of the initial score Sp is performed by the formula (simplified, no immonium ions)
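A form of this preliminary score that is often quoted from the original SEQUEST description is given below. It is stated here as an assumption rather than as the exact expression intended above, and the symbol names are illustrative: I_k is the intensity of the k-th matched experimental peak, m the number of matched fragment ions, β a bonus that grows with runs of consecutive matches in an ion series, and L the number of theoretical fragment ions of the candidate peptide. The immonium ion factor is omitted, in line with the simplification mentioned above.

```latex
S_p \;=\; \frac{\Bigl(\sum_{k \,\in\, \mathrm{matched}} I_k\Bigr)\; m \,(1+\beta)}{L}
```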
The final SEQUEST score computation is achieved by converting the theoretical fragment masses into an artificial spectrum and by computing a cross-correlation (a * e)(t) between the artificial spectrum a and the experimental mass list e, with delay t:
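In the form commonly quoted for SEQUEST, the correlation is (a * e)(t) = Σ_i a_i e_(i+t), and Xcorr is its value at t = 0 minus its mean over the delays −75 ≤ t ≤ 75. The sketch below follows that form; the unit-width binning, the ±75 window, and all function names are assumptions made for illustration, and the spectrum pre-processing that SEQUEST applies before correlation is not reproduced. The shared peak count (SPC) introduced earlier is included for comparison.

```python
def to_bins(peaks, n_bins, bin_width=1.0):
    """Turn a list of (mass, intensity) pairs into a dense intensity vector."""
    vector = [0.0] * n_bins
    for mass, inten in peaks:
        idx = int(round(mass / bin_width))
        if 0 <= idx < n_bins:
            vector[idx] = max(vector[idx], inten)
    return vector


def correlation(a, e, t):
    """(a * e)(t) = sum_i a[i] * e[i + t]; out-of-range terms count as zero."""
    n = len(a)
    return sum(a[i] * e[i + t] for i in range(n) if 0 <= i + t < n)


def xcorr(a, e, max_delay=75):
    """Correlation at zero delay minus the mean correlation over all delays."""
    delays = range(-max_delay, max_delay + 1)
    background = sum(correlation(a, e, t) for t in delays) / len(delays)
    return correlation(a, e, 0) - background


def shared_peak_count(exp_masses, theo_masses, delta):
    """Number of theoretical masses found in the experimental list within delta."""
    return sum(any(abs(t - m) <= delta for m in exp_masses) for t in theo_masses)
```

Subtracting the mean over shifted spectra penalises candidates that correlate with the experimental spectrum only by chance, which the plain count of shared peaks cannot do.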
In addition to the Xcorr score, SEQUEST exports several other factors, e.g., Sp and the difference between the best and second-best scores. Several authors have utilised this information to develop machine learning methods to detect patterns that are characteristic of correct and false matches [20–22]. These meta-scores are usually an improvement over Xcorr.

Mascot [23] (Matrix Science, http://www.matrixscience.com) scoring has been neither published nor patented. It involves the selection of two fragment ion types, where most fragment matches are observed, and a probability-based score is computed on the basis of these two fragment types only. Experimental mass list pre-processing is also part of the Mascot algorithm. The Mascot score is the negative logarithm of a p-value. The pre-processing and the selection of two fragment types are intended to obtain a more robust scoring system.

The last approach presented here is based on likelihood ratios [24]. It is assumed that the fragment matches constitute independent Bernoulli events, and the probability of these events depends on the fragment type θ ∈ S only, where S = {a, b, y, …} is the set of possible fragment types. This probability is denoted as p_θ. If s = a_1…a_n is a peptide sequence and a_i the constituent amino acids, then the probability of a correct match between s and an experimental spectrum is estimated by taking the product of p_θ for every matched fragment and of 1 − p_θ for every unmatched fragment. The null-model is identical with random fragment match probabilities r_θ. We find:
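Written out from the verbal description above, and using the sets S(s, i) and M(s, i) defined in the following paragraph, the likelihood ratio takes the form below. This is a reconstruction from the stated model (independent Bernoulli matches, with p_θ for correct and r_θ for random matches), not a verbatim formula.

```latex
L \;=\; \prod_{i=1}^{n}
\frac{\displaystyle\prod_{\theta \in M(s,i)} p_\theta
      \prod_{\theta \in S(s,i)\setminus M(s,i)} \bigl(1 - p_\theta\bigr)}
     {\displaystyle\prod_{\theta \in M(s,i)} r_\theta
      \prod_{\theta \in S(s,i)\setminus M(s,i)} \bigl(1 - r_\theta\bigr)}
```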
Here, S(s, i) ⊆ S is the set of fragment types ending at amino acid a_i, and M(s, i) ⊆ S(s, i) is the set of fragment types matching the experimental fragment mass. S(s, i) may be a proper subset of S because certain fragments are not always possible depending on their amino acid composition (neutral loss). Probabilities p_θ, θ ∈ S, are learnt from a set of correct matches. Probabilities of random fragment matches r_θ are learnt from random peptides. Preferably, only the fragment types with probabilities p_θ and r_θ sufficiently different are actually used in the scoring function L.

This approach can be extended by introducing more complex models that capture additional properties of correct and random peptide matches [25–27]. A hidden Markov model (HMM) was used to model sequences of consecutive fragment matches with mismatch tolerance (Figure 7).
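The basic likelihood ratio L (before the extensions just mentioned) is conveniently computed as a sum of logarithms. The sketch below assumes that S(s, i) and M(s, i) are supplied as Python sets per position; the function name and the example probabilities used with it are hypothetical and not learnt from any data.

```python
import math


def log_likelihood_ratio(possible, matched, p, r):
    """Return log L for one peptide-spectrum comparison.

    possible[i] : fragment types possible at position i, i.e. S(s, i)
    matched[i]  : subset of possible[i] found in the spectrum, i.e. M(s, i)
    p, r        : match probabilities per fragment type for correct and
                  random peptides, respectively (each strictly in (0, 1))
    """
    total = 0.0
    for s_i, m_i in zip(possible, matched):
        for theta in s_i:
            if theta in m_i:
                total += math.log(p[theta] / r[theta])
            else:
                total += math.log((1 - p[theta]) / (1 - r[theta]))
    return total
```

A matched fragment contributes a positive term when p_θ > r_θ and an unmatched one a negative term, so fragment types with p_θ close to r_θ add almost nothing, which is why such types are preferably left out of the score.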
Modified Peptides

It is possible that some amino acids are modified (PTMs, chemical modifications), resulting in mass shifts. Such changes in mass need to be taken into account to correctly compute theoretical MS/MS spectra. The simplest cases are fixed modifications, e.g., carboxyamidomethyl cysteine (+57.02146 Da). All cysteine residues in a protein are reduced (i.e., the disulfide bonds are broken) and alkylated, and the nominal amino acid mass is replaced by a shifted mass in all computations. There are also variable modifications that are not present systematically. In this case, it is necessary to compute several theoretical spectra to cover all eventualities (see Figure 4).

In practice it is not feasible to allow many variable modifications when searching mass spectrometric data against a database. Search space and search time increase markedly, as does the false positive rate.

Protein Identification

Obtaining reliable peptide identifications is an essential step toward reliable protein identifications; however, some additional aspects need to be taken into consideration. Most of the problems associated with protein identification are caused by peptides shared by several proteins; see Figure 8.
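Returning to the variable modifications discussed above, the growth of the search space can be made concrete with a short sketch. It enumerates every combination of modified sites for one variable modification on one residue type; the function name is hypothetical, and methionine oxidation (+15.99491 Da) is used only as a familiar example.

```python
from itertools import combinations


def modification_variants(peptide, residue, delta):
    """List every way to place a variable modification on a peptide.

    Returns (modified_positions, total_mass_shift) pairs, including the
    unmodified form. A peptide with k candidate sites yields 2**k variants.
    """
    sites = [i for i, aa in enumerate(peptide) if aa == residue]
    variants = []
    for k in range(len(sites) + 1):
        for chosen in combinations(sites, k):
            variants.append((chosen, k * delta))
    return variants


OXIDATION = 15.99491  # monoisotopic mass shift of one added oxygen, in Da
for positions, shift in modification_variants("MAMK", "M", OXIDATION):
    print(positions, round(shift, 5))
```

Each variant requires its own theoretical spectrum, and the counts multiply across modification types, which is why only a few variable modifications are normally allowed in a search.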
How to assign a score to a protein identification is an open question, as there are many options. A standard approach is simply to sum the highest score for each distinct peptide identified. Alternatively, it is possible to consider the multiplicity of spectra matched for each peptide as additional evidence [30]. Not to assign a score at all is also an option, and a list of trusted proteins is the only output in that case. A classical criterion to accept a protein identification is to detect two distinct peptides above a reasonable peptide score [31]. This approach generates very few false positive identifications.

The choice of protein database plays an important role in MS data identification. Classically, either comprehensive or curated databases have been utilised. As comprehensive databases, NCBInr (http://www.ncbi.nlm.nih.gov) and Ensembl [32] are those most frequently used, whereas commonly used curated databases are UniProtKB/Swiss-Prot [33] and International Protein Index (IPI) [34]. The latter integrates several curated databases and aims to include all alternative splice forms and active fragments. The IPI database offers a good combination of quality and exhaustiveness, which is crucial for proteomic data analysis.

Peptide De Novo Sequencing

The preceding sections described how MS data are searched against a protein sequence database. Situations also arise where such a database is not available or is inappropriate. A classical example is the analysis of a sample from an organism whose genome is not completely sequenced [35]. A more difficult example is the case where peptides are modified in an unexpected manner and hence are not found via the variable modifications specified during the database search.
As consideration of all possible modifications is not feasible, a method that predicts part of the unmodified peptide sequence would make it possible to search candidate peptides by homology before confirmation by MS/MS [36,37].

Predicting the peptide sequence directly from an MS/MS spectrum is known as de novo peptide sequencing. In practice this is not straightforward, and prediction of short reliable sections of the sequence (so-called sequence tags) is often more realistic. The sequence tags can be used either as incomplete but reliable sequences or for searching a database by allowing mismatches. Sequence tags from several peptides from the same protein can result in specific identification of the protein.

In the early days of de novo peptide sequencing, algorithms were developed that attempted to reconstruct peptide sequences by essentially considering all amino acid combinations. Such approaches are obviously not applicable to generic problems. Currently, researchers in the field investigate graph theoretic algorithms, Markov chain Monte Carlo heuristic optimization, or HMMs. Usually, a preliminary filtering of the experimental mass list is performed to remove noisy peaks.

A well-established method involves the computation of a spectrum graph G. Based on the masses in the experimental mass list, one vertex per mass is created; two vertices are linked provided the mass difference equals one amino acid mass within a given tolerance [38], and the edge is labelled with the corresponding amino acid (see Figure 9).
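The spectrum graph construction described above can be sketched as follows. The function name `spectrum_graph` and the 0.02 Da tolerance are assumptions; leucine and isoleucine have identical masses, so a single label `L` stands for both. Only single amino acid mass differences are accepted, and no noise filtering is applied.

```python
# Monoisotopic residue masses in Da; "L" stands for both L and I (isobaric).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def spectrum_graph(masses, tol=0.02):
    """Build a spectrum graph from a list of fragment masses.

    Vertices are the sorted masses. An edge (i, j, amino_acid) is created
    when masses[j] - masses[i] equals one residue mass within tol.
    """
    vertices = sorted(masses)
    edges = []
    for i, low in enumerate(vertices):
        for j in range(i + 1, len(vertices)):
            diff = vertices[j] - low
            for aa, residue_mass in RESIDUE_MASS.items():
                if abs(diff - residue_mass) <= tol:
                    edges.append((i, j, aa))
    return vertices, edges
```

With real spectra, noise peaks add many spurious vertices and edges, which is why the mass list is usually filtered before the graph is built.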
Given a spectrum, the problem of predicting the most plausible peptide sequence can be solved by finding a longest path in the spectrum graph [24,40,42]. The length of each edge is given by a scoring function that measures the fit between the additional theoretical masses yielded by the edge and the MS/MS data. Other algorithms use the spectrum graph to produce candidate peptide sequences that are progressively extended. This is typically achieved by iteratively growing and trimming a population of sequences [41]. It is also possible to combine C- and N-terminal partial sequences as obtained by a spectrum graph without computing one longest path [43].

A very different point of view is to define a scoring function and to optimize it over the space of all possible peptide sequences. The optimization is usually performed by a genetic algorithm [44,45]. A recent and innovative paper models spectral peaks as if the peaks were generated by a sequential process and hence applies an HMM [46].

Noisy peak filtering can be achieved by ad hoc methods that define noise according to a proportion of the total peaks or the total signal [39,41]. Alternatively, prediction of the type of each peak can be attempted, e.g., a, b, or y fragment ion. Peaks that result in a reliable prediction can be included for further computation [47].

Other Problems

Directly matching proteomic data against genome sequences has attracted significant attention because there is the potential to complement and correct genome annotations by MS data. This potential is indeed confirmed by new findings reported by several authors [48–50]. The problem of genome searching can be approached in different ways. The most challenging case is to search MS data against a eukaryotic genome, as peptides can be coded across exon/intron boundaries. One method is to use a gene prediction algorithm to obtain protein sequences that are searched as per a standard protein database.
An alternative method is to use de novo predictions and to search the predicted sequences by homology. Finally, it is possible to combine gene structure predictions and MS data searches to reveal and validate splice sites [51]. Sample comparison is essential in proteomics, and several methods have been developed to quantitatively evaluate datasets. With 2-D gels, spot volumes can provide semiquantitative information [8,52]. It is also possible to label peptides with specific reagents that alter the mass by a known value [53]. Two or more modified samples are pooled prior to LC-MS analysis. The mass shifts in the spectra indicate the origin of the peptide, and relative peak intensities provide quantitative information. Label-free methods have been introduced that require neither 2-D gels nor peptide modification. These methods either sum all the peak intensities of a given peptide during one LC-MS experiment [54] (extracted ion chromatogram) or count the number of spectra matching the peptides of a protein [55,56]. Alternatively, it is possible to use protein chips to measure protein concentration [57]. In each case, a protein can be assigned an expression profile across samples, and techniques similar to micro-array data analysis can be applied. Despite the great importance of PTMs for biological function, studies on a large scale are difficult [58,59]. In the context of computational analyses, comprehensive approaches toward general PTMs are difficult. Although many laboratories have undertaken detailed investigations of a specific modification in the quest to determine answers to a particular biological question, e.g., phosphorylation events in signalling pathways, most of these studies have involved manual or semi-automated annotation of the modification site(s), and data processing is more a matter of storing and visualizing. 
Bioinformatics subsequently contributes, in a systems biology approach, by utilising the information gained from such studies to assign function to the proteins and to reveal biological interactions.

There are also a number of interesting and important computational proteomic questions that are considered out of the scope of this introduction and are therefore not covered. These include protein structure elucidation via MS; glycan and lipid analysis; and direct profiling of samples by MS, i.e., metabolomics. Here masses, not necessarily peptides, are detected in each sample and are comparatively analysed.

Resources

InSilicoSpectro [60] is an open-source Perl project that implements many MS-related computations and contains numerous simple examples illustrating some of the presented concepts. Two elementary implementations of PMF and MS/MS database search in C++ are provided with example data (see Text S3 and Text S5). Phenyx is freely available at http://www.phenyx-ms.com and Mascot at http://www.matrixscience.com. Two open-source database search engines have been developed, OMSSA [61] and X!Tandem [62]. Several public MS/MS data repositories are accessible over the Internet, including Peptide Atlas (http://www.peptideatlas.org), Open Proteomics Database (http://bioinformatics.icmb.utexas.edu/OPD), and Pride (http://www.ebi.ac.uk/pride).

Conclusion

Proteomics plays an ever-increasing and pivotal role in biological research, and there is a range of technologies available that can generate large quantities of data. The analysis of such data opens new and challenging areas of interest for bioinformatics. In addition to the utilisation of classical methods and resources, new types of data require modelling and processing. Perhaps the best example is the mass spectrum itself, which contains continuous and discrete information simultaneously. Such issues are reflected in the difficulty of designing high-performance scoring functions and de novo sequencing algorithms.
To provide an introduction to this fascinating field of research, we have presented general concepts of proteomics. The central problem of MS data identification by database searching has been explained at an introductory level, and should allow any interested reader to grasp the fundamental concepts of this area of research.

Figure S1: Accepting Pairs of Amino Acid Masses in De Novo Sequencing. A spectrum graph generated with the same spectrum as in the paper (peptide MTDSK) but by allowing pairs of amino acid mass differences. Observe the massive increase in complexity. (13 KB PDF)

Figure S2: Noise in Mass Spectra Impacts De Novo Sequencing. A graph obtained based on a relatively small real spectrum for the peptide LRDQLGTAK by only accepting single amino acid mass differences (all the y fragments are present). This example shows why it is important to filter mass lists for noise prior to de novo prediction, since the spectrum becomes very complex otherwise. (45 KB PDF)

Text S1: computeMOWSEMatrix.cpp. A C++ program to implement the computation of the MOWSE matrix, which is used by the MOWSE PMF scoring function. (6 KB TXT)

Text S2: MOWSE Matrix. The MOWSE matrix computed by computeMOWSEMatrix.cpp (see Text S1). (10 KB TXT)

Text S3: pmfDBSearch.cpp. A C++ program implementing a minimal PMF database search algorithm. (18 KB TXT)

Text S4: A PMF Mass List. An example mass list for PMF searching (in pkl format, SWISS-PROT ID: ENO_YEAST). (1 KB TXT)

Text S5: msmsDBSearch.cpp. A C++ program implementing a minimal MS/MS database search algorithm. (23 KB TXT)

Text S6: An MS/MS Mass List. An example mass list for MS/MS searching (in mgf format, SWISS-PROT ID: ENO_YEAST). (22 KB TXT)

Acknowledgments

The authors thank Jörg Hau for the BSA MALDI-TOF spectrum used in Figure 3.
Footnotes

Jacques Colinge and Keiryn L. Bennett are with Ce-M-M-, the Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria.

Author contributions. JC taught this material several times in front of various audiences and KB brought additional expert MS knowledge. KB and JC wrote the paper.

Funding. JC was partially supported by an Austrian Proteomics Platform II (APP-II) Network grant of the GenAU Program of the Austrian Ministry of Research and Education (BM:BWK).

Competing interests. The authors have declared that no competing interests exist.