Copyright © 2007 Whistler et al; licensee BioMed Central Ltd.

A method for improving SELDI-TOF mass spectrometry data quality

Chronic Viral Diseases Branch, Centers for Disease Control and Prevention, 1600 Clifton Rd, G41, Atlanta, Georgia, 30329, USA

Corresponding author.
Toni Whistler: taw6/at/cdc.gov; Dominique Rollin: ddr2/at/cdc.gov; Suzanne D Vernon: sdv2/at/cdc.gov

Received June 25, 2007; Accepted September 5, 2007.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background
Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) is a powerful tool for rapidly generating high-throughput protein profiles from a large number of samples. However, the events that occur between the first and last sample run are likely to introduce technical variation in the results.

Methods
We fractionated and analyzed quality control and investigational serum samples on 3 Protein Chips and used statistical methods to identify poor-quality spectra and to identify and reduce technical variation.

Results
Using diagnostic plots, we were able to visually depict all spectra and to identify and remove those that were of poor quality. We detected a technical variation associated with when the samples were run (referred to as batch effect) and corrected for this variation using analysis of variance. These corrections increased the number of peaks that were reproducibly detected.

Conclusion
By removing poor-quality, outlier spectra, we were able to increase peak detection, and by reducing the variance introduced when samples are processed and analyzed in batches, we were able to increase the reproducibility of peak detection.

Background

Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) allows users to generate protein expression data rapidly from a large number of samples and has been used increasingly to identify diagnostic biomarkers of cancer [1-3], mental illness [4,5], and neurological disorders [6,7]. However, as with any analytic technique, its results must be reproducible if one is to have confidence in them.

Several challenges to implementing SELDI-TOF MS in routine clinical diagnostics have already been overcome [8-10]. These include challenges pertaining to biologic samples such as the characterization of sample donors (e.g., by age, sex, fasting status, diurnal rhythm) [11]; sample collection and handling [12,13]; and the effects of freezing, thawing, and storage on specimen stability [14]. Parameters of the SELDI-TOF MS technique that have been assessed range from its sample-processing and robotic-handling systems to its application of the energy-absorbing matrix [15-17]. Finally, many aspects of the technique designed to improve the calibration and quality of the spectra [10,18-21] and of peak detection and quantification [22-24] have made SELDI-TOF MS one of the most promising protein biomarker discovery methods.

Even though a variety of software packages can be used to analyze SELDI-TOF MS data, few are effective in averaging replicate spectra or identifying poor-quality spectra [25,26], and none are capable of analyzing and adjusting for the variation introduced when samples are processed and analyzed in batches. We demonstrate that conventional statistical approaches can be used to identify outlying spectra and correct for batch variation, as well as to increase the number of peaks detected by SELDI-TOF MS and improve the reproducibility of peak detection.

Results

To identify and remove poor-quality spectra, we assessed the degree of linear relationship among all spectra in each data set (a ProteinChip-fraction combination).
We then generated a pair-wise similarity matrix using the Pearson correlation coefficient on normalized intensity values of each spectrum. To visually depict the data, we drew a diagnostic plot of 1 minus the mean (1-mean) of Pearson correlation coefficients (x-axis) against the range of correlation coefficients (y-axis) (Figure 1).
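
The two coordinates of the diagnostic plot can be sketched in a few lines of dependency-free Python. This is an illustration only, not the Partek workflow used in the study: the function names and toy spectra are hypothetical, and excluding each spectrum's correlation with itself from the mean is an assumption the text does not settle. The 0.2 cut-off is the QC criterion stated in the Methods.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def diagnostic_points(spectra):
    """Return (1 - mean r, range of r) for each spectrum versus all others.

    Assumption: the self-correlation (always 1) is left out of the mean.
    """
    points = []
    for i, spectrum in enumerate(spectra):
        r = [pearson(spectrum, other)
             for j, other in enumerate(spectra) if j != i]
        points.append((1 - sum(r) / len(r), max(r) - min(r)))
    return points

def flag_outliers(points, cutoff):
    """Indices of spectra whose 1-mean value exceeds the cut-off."""
    return [i for i, (one_minus_mean, _) in enumerate(points)
            if one_minus_mean > cutoff]

# Toy data (hypothetical): 12 near-identical spectra plus one dissimilar one.
base = [1, 2, 3, 4, 5, 4, 3, 2]
spectra = [[v + 0.05 * (((k * (i + 1)) % 3) - 1) for i, v in enumerate(base)]
           for k in range(12)]
spectra.append([5, 1, 4, 1, 2, 5, 1, 4])  # poor-quality spectrum at index 12

points = diagnostic_points(spectra)
flagged = flag_outliers(points, cutoff=0.2)  # QC cut-off from the Methods
print(flagged)
```

On real data each spectrum would hold the 11,876 normalized intensity values, and the (1-mean, range) pairs would be plotted rather than only thresholded.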
Variation in analytic results is introduced when samples are processed and analyzed in different batches. To examine the extent of this batch effect, we used the nonparametric Kruskal-Wallis test to compare the normalized peak intensities in the spectra within a batch to the same peak (by mass-to-charge (m/z) value) in the spectra from all other batches. Our null hypothesis was that intensity means would be identical for each peak across the different batches. Using a corrected p-value of < 0.005 to calculate the number of peaks that were different in at least one batch, we found a statistically significant batch effect in at least 50% of peaks for each ProteinChip-fraction combination (Figure 2).
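
The per-peak batch test can be sketched as follows, as a hedged illustration rather than the software used in the study. The sketch implements the standard Kruskal-Wallis H statistic with a tie correction and its chi-square approximation using only the standard library; where SciPy is available, `scipy.stats.kruskal` does the same job. The peak names and intensities are invented, and the 0.005 threshold is applied here to uncorrected p-values; the bootstrap multiple-test correction described in the Methods is not reproduced.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)              # Q(1, x/2)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))   # Q(1/2, x/2)
    while a < df / 2.0:                          # Q(a+1) = Q(a) + term
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q

def kruskal_wallis(groups):
    """Return (H, p) for a list of groups, using average ranks for ties."""
    values = sorted((v, g) for g, grp in enumerate(groups) for v in grp)
    n = len(values)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and values[j][0] == values[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0             # mean of ranks i+1 .. j
        for k in range(i, j):
            rank_sums[values[k][1]] += avg_rank
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    if tie_term == n ** 3 - n:                   # every value identical
        return 0.0, 1.0
    h = 12.0 / (n * (n + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)) - 3.0 * (n + 1)
    h /= 1.0 - tie_term / (n ** 3 - n)
    return h, chi2_sf(h, len(groups) - 1)

# Hypothetical normalized intensities for two peaks across three batches.
peaks = {
    "peak_shifted": [[1, 2, 3, 4, 5, 6], [11, 12, 13, 14, 15, 16],
                     [21, 22, 23, 24, 25, 26]],
    "peak_stable": [[1, 4, 7, 10, 13, 16], [2, 5, 8, 11, 14, 17],
                    [3, 6, 9, 12, 15, 18]],
}
significant = [name for name, batches in peaks.items()
               if kruskal_wallis(batches)[1] < 0.005]
print(significant)
```

The fraction of peaks in `significant` is the quantity summarized per ProteinChip-fraction combination in the text.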
We used a 2-way analysis of variance (ANOVA) model to explore batch effect variation in the QC sample and a 3-way ANOVA model to explore batch effect variation in the investigational samples. The batch from which spectra were processed was the largest source of variation in both the QC and the investigational samples (Figure 2). As described in the Methods section, we used the Batch Remover tool (Partek Genomics Suite) to reduce the effects of batch variation. Hierarchical clustering of spectra in each data set showed that before we used the Batch Remover tool, each batch clustered as a distinct node (Figure 3).
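
To make the idea of batch removal concrete, the sketch below recenters one peak's intensities so that every batch shares the grand mean. This is a deliberately simpler stand-in, not the Partek Batch Remover, which the paper describes as based on a mixed-model ANOVA. It is only reasonable as an illustration for a balanced design in which the same QC sample is run in every batch. The function names and intensities are hypothetical, and the coefficient of variation is included because the paper uses it as its reproducibility measure.

```python
from statistics import mean, stdev

def remove_batch_effect(intensities, batches):
    """Shift each batch so its mean equals the grand mean (one peak).

    Assumption: batch acts as an additive offset on this peak's intensity.
    """
    grand = mean(intensities)
    batch_means = {
        b: mean(v for v, label in zip(intensities, batches) if label == b)
        for b in set(batches)
    }
    return [v - batch_means[b] + grand for v, b in zip(intensities, batches)]

def cv_percent(values):
    """Coefficient of variation, as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)

# Hypothetical QC intensities for one peak, three spectra per batch.
raw = [10.0, 11.0, 9.0, 20.0, 21.0, 19.0, 30.0, 31.0, 29.0]
batch = [1, 1, 1, 2, 2, 2, 3, 3, 3]

adjusted = remove_batch_effect(raw, batch)
cv_before = cv_percent(raw)
cv_after = cv_percent(adjusted)
print(round(cv_before, 1), round(cv_after, 1))
```

For investigational samples, simple centering like this could also remove biological differences that happen to align with batch, which is one reason a model-based approach with a balanced, randomized design is preferable.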
We assessed the quality of each spectrum by using the Pearson correlation coefficient to compare its 11,876 intensity measures with those of every other spectrum in the data set. Using the cut-off criteria we established of 1-mean > 0.2 for QC spectra and > 0.4 for specimen spectra, we obtained very similar results when we used peak intensities (fewer than 100 values per spectrum) to generate the correlation matrix. Before the outlier spectra and batch effect variance were removed, the correlation coefficients ranged from 0.75 to 0.95 in each full data set (Table 1A). Removing poor-quality spectra improved the correlations to 0.88 to 0.96 (Table 1B), and removing the batch effect improved them further to 0.95 to 0.99 (Table 1C). Duplicate spectra from individual samples showed a high degree of reproducibility, as demonstrated by a median Pearson correlation coefficient of 0.98 for the 207 pairs of spectra in the CMLS-F4 data set. Results for the other data sets were similar (results not shown).
To measure reproducibility, we calculated the coefficient of variation (CV) for the peak intensities of all spectra in each QC sample data set (Table 1). Similar data are available for the investigational samples [see Additional file 1]. The removal of low-quality spectra generally increased the number of peaks common to all spectra in that data set and reduced the average CV for the full spectrum (Table 1B). Batch removal produced a more dramatic effect (Table 1C): the number of peaks remained the same, but the average CV improved, as did the number of peaks in each data set with a CV < 30% (Table 1). For example, the CMLS-F5 QC serum data set started with 66 spectra with 54 peaks present; the average CV for specimens in this set was 70% (range: 11–274%, Table 1A). Using the diagnostic plot criteria, we removed two spectra, thereby reducing the CV range to 10–48% and the average CV to 24% (Table 1B). Removing the batch effect technical variance further reduced the CV range to 6–31% and the average CV to 13% (Table 1C). We obtained similar results with the specimen data sets [see Additional file 1]. For all data sets, the CVs for m/z values were within the 0.3% reported in the literature [19].

Discussion

Even though SELDI-TOF MS is designed as a high-throughput automated assay, large studies involving many biological samples are often divided into batches that are analyzed over several days to weeks. To detect any variability that may occur, analysts process pooled human serum (QC samples) with the study samples. In this study, we used an ANOVA model to assess technical variance in peak intensities that could be introduced by differences among sample batches, variations in the spot position of each sample on the ProteinChip, and variations in the ProteinChip array. We found that batch differences accounted for the largest source of technical variability in each data set, with variations in spot position and ProteinChip array contributing little.
Therefore, any analysis that ignores the variation associated with processing samples in different batches leaves a considerable amount of noise in the data. The balanced design of the experiments we conducted allowed us to reliably estimate the batch effect and then to remove that effect using the Partek Batch Remover (based on a mixed-model ANOVA). As only technical factors were included in the ANOVA model, the peak intensity data can then be used in further statistical analyses.

Hong et al. [27] identified the correlation matrix as an effective metric for identifying lower quality spectra. However, we found that this approach was less effective when used to establish one cut-off value for several large data sets. In an attempt to automate our decisions on which spectra should be included in our analysis, we drew on our knowledge of microarrays and presented our data in diagnostic plots [28] (Figure 1). QC data sets represent the same pooled serum sample run with each batch of investigational sera, so they directly evaluate the repeatability of measurements. Because repeatability is expected to be much higher in the QC data sets, we used a more stringent cut-off value for them (1-mean > 0.2) than for the investigational data sets (1-mean > 0.4). Good performance should be associated with low coefficients of variation for the peak intensities, as the data are all derived from the same pooled reference serum.

Table 1 illustrates the improvements in data quality and reproducibility resulting from the removal of outlying, poor-quality spectra and the removal of the technical batch effect. The average CVs for all data sets (except H50-F1) were ≤ 20% when all peaks were considered rather than just 3 to 7 major peaks as reported in some studies [29,30]; furthermore, more than 90% of all peaks in each data set, other than H50-F1, had peak intensity CVs < 30%.

Conclusion

In this study, we used a diagnostic plot to detect and discard low-quality spectra.
This method was easy to implement and effective in detecting outlier spectra. Our use of the model-based ANOVA to account for the technical variance introduced by batch processing of spectra further improved the data quality.

Methods

Samples

A reference or QC sample was prepared by pooling serum collected from 10 donors in Vacutainer tubes with no additives. This was processed, aliquoted and frozen in the same manner as study subject samples. Serum samples from 207 subjects (referred to as investigational sera) were collected during a clinical study of Chronic Fatigue Syndrome in Wichita, Kansas [31].

Serum fractionation

All of the experimental protocols were performed by a single laboratorian. To reduce sample complexity and increase the number of protein peaks detected, we performed anion exchange fractionation using the Expression Difference Mapping™ Kit – Serum Fractionation (Ciphergen Biosystems Inc., Fremont, CA, USA) and the robotic Biomek 2000 liquid handling system (Beckman Coulter, Fullerton, CA, USA). We collected six different fractions – pH 9 (F1), pH 7 (F2), pH 5 (F3), pH 4 (F4), pH 3 (F5) and organic (F6) – from investigational serum samples that were fractionated in 11 batches over a period of 7 months. Twenty investigational samples and 3 QC samples were processed in each batch and then frozen at -80°C. For each batch, we analyzed fractions in the same order and kept freezing times (2 to 11 days) and processing conditions constant.

Protein expression profiling

Aliquots of each fraction were bound in duplicate, with a randomized ProteinChip/spot position allocation scheme, to 3 different types of ProteinChip arrays: IMAC-Cu (metal binding), H50 (hydrophobic chemistry) and two CM10 (anionic chemistry) ProteinChip arrays. Of the two CM10 arrays, one received a high stringency (HS) wash using 50 mM HEPES, pH 7, performed before sample application to allow selective binding of proteins, and the other received a low stringency (LS) wash using 0.1 M sodium acetate, pH 4, also performed before sample application.
From previous studies [32], we know that F2 is not particularly informative, and F5 has many overlapping peaks present in F4 and/or F6. Therefore, we did not run these fractions in this study. For each ProteinChip array, the relevant QC fraction was present on one spot position. The details of ProteinChip processing have been described previously [32]. We used saturated sinapinic acid in 50% acetonitrile/0.5% trifluoroacetic acid as matrix and applied it using the robot. We read the ProteinChips in a PBSIIc mass spectrometer (Ciphergen Biosystems) using automated data collection protocols with previously optimized conditions [32]. We used data from the low mass range protocols (3,000 to 30,000 Daltons) in our analysis and calibrated for mass accuracy using the "all-in-one" protein standard II on NP20 ProteinChips (Ciphergen Biosystems). If greater accuracy is required at m/z < 8,000, the "all-in-one" peptide standard should be used, applied with the sinapinic acid matrix to keep data comparable. Instrument performance and evaluation are critical to spectrometer function, and complete details of the calibration, alignment and accuracy assessments performed routinely are outlined in a previous publication [32].

Using data from 15 fractions, we generated 414 investigational spectra and 66 QC spectra per ProteinChip-fraction, each of which we considered a data set. We had to make an unanticipated instrument adjustment, which involved a preventative maintenance service, between the sixth and seventh batches because of laboratory relocation.

Processing of spectral data sets

We used the QC serum sample to develop and evaluate data processing procedures, which we then used in processing data for the 207 investigational samples. We exported raw spectrum data files for each ProteinChip-fraction and processed them using the following calibration equation:
$$\frac{m/z}{U} = a\,(t - t_0)^2 + b$$
Where m/z is the mass-to-charge ratio, U is the voltage (20,000 for this data set), and t is the time-of-flight. For our mass calibration, we used the values a = 0.336302, b = 0, and t0 = 0.09, which we obtained from the calibration equation generated from the protein standard. The final spectrum, from m/z 3,000 to 30,000, generated 11,876 data points. We saved the m/z and intensity values as comma-separated values files.

We used SpecAlign [33] to pre-process each spectral data set of QC spectra (66 per ProteinChip-fraction) and specimen spectra (414 spectra per ProteinChip-fraction). We then followed the steps below to process the data:

1. Smooth the data using the Savitzky-Golay filter with a setting of 8.
2. Denoise the spectra using a wavelet transform with a threshold setting of 0.5.
3. View baseline subtraction using a window setting of 5.
4. Subtract baseline.
5. Rescale intensity values to positive.
6. Normalize intensity values using Total Ion Current.
7. Generate an average spectrum.
8. Align spectra using the combined Fast Fourier Transform (FFT)/Peak matching method on the full m/z range, with a scale of 1, a maximum shift of 20, looking ahead by 1, and using the average as a reference.
9. Export the processed data as a single file (to be used for correlation analysis).
10. Pick peaks with a baseline cut-off of 0.5, a window of 10, and a height ratio of 1.5.
11. Export peak intensity values for all spectra in a single file.

Statistical Analysis

We performed all statistical analysis using Partek Genomics Suite software, version 6.2 (Partek Inc., St. Charles, Missouri). To detect outlier spectra, we used full spectrum processed data consisting of 11,876 intensity values covering the m/z range from 3,000 to 30,000 (the data file exported in step 9 above). We also generated a similarity matrix using the Pearson correlation coefficient on all combinations of spectra within the data set.
We then calculated a mean correlation coefficient for each spectrum and visually depicted the coefficients on diagnostic plots [28]. Our cut-off criteria were 1-mean > 0.2 for the removal of QC spectra and > 0.4 for the removal of spectra from investigational samples.

After the quality assessment of the spectra and prior to the batch removal process, we used a 2-way ANOVA model to determine the variation in the data sets. Variation in the QC data sets attributable to the batch process is shown in Figure 2 (Batch).

We averaged all spectra with replicates and performed all statistical analyses using nonparametric tests: the Mann-Whitney test to compare 2 groups and the Kruskal-Wallis test to compare more than 2 groups. A bootstrap method was used to perform multiple test correction in the statistical tests. The bootstrap is used to determine the probability of obtaining a particular p-value by chance. Group labels are randomly re-assigned (with replacement) for a total of 2,000 iterations of the bootstrap. The bootstrap method does not assume that tests are independent. Hierarchical clustering was performed on the peak intensities of the spectra using a Spearman rank dissimilarity metric with average linkage.

Competing interests

The author(s) declare that they have no competing interests.

Authors' contributions

TW designed the experiments, developed the analytical approach, implemented the analysis and wrote the manuscript. DDR performed the laboratory experiments. SDV had the original idea for the study and assisted in the writing of the manuscript. All authors read and approved the manuscript.

Additional file 1

Summary data showing stages in the quality assessment of specimen spectra. Pearson correlation coefficients were calculated for the entire spectrum prior to peak detection; the coefficient for the entire data set is reported (Grand statistic). The coefficient of variation (CV) was calculated for peak intensities present in the entire spectrum.

Acknowledgements

We thank Dr Elizabeth Unger for her critical reading of the manuscript and her invaluable comments.