Data Brief. 2015 Dec 17;6:286-94. doi: 10.1016/j.dib.2015.11.063. eCollection 2016 Mar.

Spiked proteomic standard dataset for testing label-free quantitative software and statistical methods.

Author information

1. ProFi, Proteomic French Infrastructure, France; CEA, DSV, iRTSV, Laboratoire de Biologie à Grande Echelle, Grenoble F-38054, France; INSERM U1038, Grenoble F-38054, France; Université Grenoble, F-38054, France.
2. ProFi, Proteomic French Infrastructure, France; Laboratoire de Spectrométrie de Masse BioOrganique (LSMBO), IPHC, Université de Strasbourg, CNRS, UMR7178, 25 Rue Becquerel, 67087 Strasbourg, France.
3. ProFi, Proteomic French Infrastructure, France; CNRS UMR5089 Institut de Pharmacologie et de Biologie Structurale, 205 Route de Narbonne, 31077 Toulouse, France; Université de Toulouse, 118 Route de Narbonne, 31077 Toulouse, France.

Abstract

This data article describes a controlled, spiked proteomic dataset for which the "ground truth" of variant proteins is known. It is based on the LC-MS analysis of samples composed of a fixed background of yeast lysate and different spiked amounts of the UPS1 mixture of 48 recombinant proteins. It can be used to objectively evaluate bioinformatic pipelines for label-free quantitative analysis, and their ability to detect variant proteins with good sensitivity and a low false discovery rate in large-scale proteomic studies. More specifically, it can be useful for tuning software tool parameters, for testing new algorithms for label-free quantitative analysis, or for evaluating downstream statistical methods. The raw MS files can be downloaded from ProteomeXchange with identifier PXD001819. Starting from some raw files of this dataset, we also provide processed data obtained through various bioinformatics tools (including MaxQuant, Skyline, MFPaQ, IRMa-hEIDI and Scaffold) in different workflows, to exemplify the use of such data in the context of software benchmarking, as discussed in detail in the accompanying manuscript [1]. The experimental design used here for data processing takes advantage of the different spike levels introduced in the samples composing the dataset, and the processed data are merged into a single file to facilitate the evaluation and illustration of software tool results for the detection of variant proteins with different absolute expression levels and fold change values.
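
Because the spiked UPS1 proteins are the only true variants in this design, a list of proteins that a pipeline calls differential can be scored directly against them. The following is a minimal Python sketch of that scoring, not the procedure used in the accompanying manuscript. The function name `evaluate_detection`, the protein identifiers, and the example calls are all hypothetical and serve only to illustrate how sensitivity and the observed false discovery proportion could be computed from a known ground truth.

```python
def evaluate_detection(called, truth):
    """Score a set of proteins called 'variant' against the known spiked set.

    called: identifiers a pipeline reported as differentially abundant
    truth:  identifiers of the spiked proteins (the ground truth)
    """
    called = set(called)
    truth = set(truth)

    tp = len(called & truth)   # spiked proteins correctly called variant
    fp = len(called - truth)   # background proteins wrongly called variant
    fn = len(truth - called)   # spiked proteins that were missed

    # Sensitivity: fraction of true variants that were detected.
    sensitivity = tp / (tp + fn) if truth else float("nan")
    # Observed false discovery proportion: fraction of calls that are wrong.
    fdp = fp / (tp + fp) if called else 0.0

    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "fdp": fdp}


# Hypothetical identifiers, for illustration only (not taken from the dataset).
truth = {"UPS_A", "UPS_B", "UPS_C", "UPS_D"}
called = {"UPS_A", "UPS_B", "UPS_C", "YEAST_X"}

result = evaluate_detection(called, truth)
print(result)
```

In practice, `truth` would hold the 48 UPS1 protein identifiers and `called` the proteins passing a chosen statistical threshold in one pairwise comparison of spike levels, so that the two metrics can be compared across tools or parameter settings.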
