
Submission Guidelines

We aim to make data deposit procedures as straightforward as possible and will provide as much assistance as you require to get your data submitted to Peptidome. If you have problems or questions about the submission procedures described on this page, please e-mail us at peptidome@ncbi.nlm.nih.gov and one of our curators will quickly get back to you. Once you have assembled all required files, please transfer them to us using the Peptidome Submission Form.

A standard submission has three required components: a Metadata file that describes the experiment and associated files; the Data files (both raw data files and peptide identification output files); and the table of Results. A complete description of each component follows.

Metadata

Metadata refers to descriptive information about the overall experiment, the biological samples under examination, protocols used to generate the data, and references to associated data files.

Metadata is supplied to us by completing the 'Peptidome Metadata Template' spreadsheet in the NCBI_Peptidome_Submission_Template Excel file. Guidelines for the content of each field are provided both within the spreadsheet and in the following tables:

Data Files

Two types of data files should be supplied with each submission:

Raw data

Raw data files contain the MS1 and MS2 information from the instrument. The preferred raw data format is one of the standard XML formats (mzData, mzXML, or mzML), with each file containing both the MS1 and MS2 data from a single fraction. Less desirably, text formats (.mgf, .pkl, .sqt, .dta) may be accepted. Proprietary binary data from the manufacturers (e.g., .raw or .wiff) are not accepted. There should be one or more raw files per biological sample.
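As a rough pre-submission check, the format rules above could be sketched in Python as follows. The function name `classify_raw_file` and the category labels are hypothetical, and the sketch assumes that a file's extension reflects its actual format, which is not guaranteed (for example, an mzData file saved with a plain .xml extension would be reported as unknown).

```python
from pathlib import Path

# Extension groups taken from the guidelines above; matching is case-insensitive.
PREFERRED_XML = {".mzdata", ".mzxml", ".mzml"}
ACCEPTED_TEXT = {".mgf", ".pkl", ".sqt", ".dta"}
REJECTED_BINARY = {".raw", ".wiff"}


def classify_raw_file(file_name: str) -> str:
    """Return 'preferred', 'accepted', 'rejected', or 'unknown' for a raw data file.

    Hypothetical helper: it looks only at the file extension, not the content.
    """
    extension = Path(file_name).suffix.lower()
    if extension in PREFERRED_XML:
        return "preferred"
    if extension in ACCEPTED_TEXT:
        return "accepted"
    if extension in REJECTED_BINARY:
        return "rejected"
    return "unknown"


if __name__ == "__main__":
    for name in ["07FEB15_ABRF_FT_100a.mzXML", "fraction1.mgf", "fraction1.RAW"]:
        print(name, "->", classify_raw_file(name))
```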

Peptide identification output files

Peptide identification output files are the output of whichever search engine program was used to match the MS2 scans to peptides. We currently support Mascot DAT files, OMSSA ASN.1 or XML formatted files, or any search engine output that has been converted to PepXML. If no search engine was used, or if we don't yet support the engine you used, then your Results table must include references to spectrum files and the additional information that is usually extracted from the search engine output files, e.g. charge state, identification scores, etc. See below for an example.

Results

The Results tables describe your view of the final, processed results as discussed in any associated manuscript. Results tables list the proteins discovered in each Sample in the Study. For each protein, the peptides should be listed, and for each peptide, the matching spectrum files should be listed. Modifications can also be specified. If the matching spectrum file list is omitted, then every matching spectrum in the Peptide identification output files is assumed to be correct. Similarly, if the Results table contains only proteins, then all associated peptides and spectra will be gleaned from the Peptide identification output files. Note that this default may be too permissive: matches between peptides and spectra that do not pass standard criteria would be accepted unintentionally.

A spectrum file reference is written as a colon-separated pair, spectrum_file_name:spectrum_number. The spectrum file extension may be omitted from the file name. Lists of matching spectrum file references should be comma separated. The expected format is illustrated in the 'Results example' spreadsheet in the NCBI_Peptidome_Submission_Template, and in the following table:

| Protein | Peptide | Spectrum files |
| --- | --- | --- |
| CATA_MOUSE | FSTVAGESGSADTVRDPR | 07FEB15_ABRF_FT_100a:2171, 07FEB15_ABRF_FT_100a:2177, 07FEB15_ABRF_FT_100a:2183 |
| | GPLLVQDVVFTDEMAHFDR | 07FEB15_ABRF_FT_100a:3653, 07FEB15_ABRF_FT_100a:3660 |
| | GPLLVQDVVFTDEMAHFDRER | 07FEB15_ABRF_FT_100a:3231, 07FEB15_ABRF_FT_100a:3495, 07FEB15_ABRF_FT_100a:3499 |
| | LCENIAGHLKDAQLFIQK | 07FEB15_ABRF_FT_100a:2967, 07FEB15_ABRF_FT_100a:2968 |
| | LFAYPDTHR | 07FEB15_ABRF_FT_50a:2395 |
| | LVNADGEAVYCK | 07FEB15_ABRF_FT_100a:2151, 07FEB15_ABRF_FT_100a:2157, 07FEB15_ABRF_FT_100a:2161 |
| | VWPHKDYPLIPVGK | 07FEB15_ABRF_FT_100a:2768, 07FEB15_ABRF_FT_100a:2774, 07FEB15_ABRF_FT_50a:2808 |
| CATD_HUMAN | AIGAVPLIQGEYMIPCEK | 07FEB15_ABRF_FT_100a:3305, 07FEB15_ABRF_FT_100a:3310 |
| | FDGILGMAYPR | 07FEB15_ABRF_FT_100a:3258, 07FEB15_ABRF_FT_10a:3109, 07FEB15_ABRF_FT_10a:3111 |
| | ISVNNVLPVFDNLMQQK | 07FEB15_ABRF_FT_100a:3771, 07FEB15_ABRF_FT_100a:3775 |
| | LVDQNIFSFYLSR | 07FEB15_ABRF_FT_100a:3705, 07FEB15_ABRF_FT_100a:3711, 07FEB15_ABRF_FT_100a:3716 |
| | QVFGEATKQPGITFIAAK | 07FEB15_ABRF_FT_100a:2882 |
| | VSTLPAITLK | 07FEB15_ABRF_FT_100a:2857 |
| HBA3_PANTR | VGAHAGZYGAEALER | 07FEB15_ABRF_FT_100a:2245, 07FEB15_ABRF_FT_25a:2301, 07FEB15_ABRF_FT_50a:2289 |
| | VLSPADKTNVK | 07FEB15_ABRF_FT_100a:1892 |
| KCRM_HUMAN | FEEILTR | 07FEB15_ABRF_FT_5a_070216183448:2394 |
| | FKLNYKPEEEYPDLSK | 07FEB15_ABRF_FT_100a:2690 |
| | GQSIDDMIPAQK | 07FEB15_ABRF_FT_100a:2553, 07FEB15_ABRF_FT_10a:2535 |
| | GTGGVDTAAVGSVFDVSNADR | 07FEB15_ABRF_FT_100a:2990, 07FEB15_ABRF_FT_100a:3004 |
| | HPKFEEILTR | 07FEB15_ABRF_FT_100a:2455, 07FEB15_ABRF_FT_100a:2462, 07FEB15_ABRF_FT_100a:2468 |
| | LGSSEVEQVQLVVDGVK | 07FEB15_ABRF_FT_100a:3133, 07FEB15_ABRF_FT_100a:3139, 07FEB15_ABRF_FT_100a:3142 |
| | LNYKPEEEYPDLSK | 07FEB15_ABRF_FT_100a:2477 |
| | LSVEALNSLTGEFK | 07FEB15_ABRF_FT_100a:3339, 07FEB15_ABRF_FT_100a:3345, 07FEB15_ABRF_FT_100a:3350 |
| | LSVEALNSLTGEFKGK | 07FEB15_ABRF_FT_100a:3169, 07FEB15_ABRF_FT_100a:3174, 07FEB15_ABRF_FT_100a:3176 |
| | RGTGGVDTAAVGSVFDVSNADR | 07FEB15_ABRF_FT_100a:2885 |
| | SFLVWVNEEDHLR | 07FEB15_ABRF_FT_100a:3320, 07FEB15_ABRF_FT_100a:3327, 07FEB15_ABRF_FT_100a:3328 |
| | SIKGYTLPPHCSR | 07FEB15_ABRF_FT_50a:2176 |
| | TDLNHENLKGGDDLDPNYVLSSR | 07FEB15_ABRF_FT_100a:2735, 07FEB15_ABRF_FT_100a:2739 |
| | VLTLELYK | 07FEB15_ABRF_FT_100a:2980, 07FEB15_ABRF_FT_100a:2989, 07FEB15_ABRF_FT_25a:2979 |
| | VLTLELYKK | 07FEB15_ABRF_FT_25a:2753 |
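
For submitters who want to check their spectrum file lists before upload, the comma-separated, colon-paired format described above can be parsed with a few lines of Python. This is only an illustrative sketch: the function name `parse_spectrum_refs` is hypothetical, and it assumes that the spectrum number is a non-negative integer and that the file name itself contains no commas.

```python
from typing import List, Tuple


def parse_spectrum_refs(cell: str) -> List[Tuple[str, int]]:
    """Split a 'Spectrum files' cell into (file_name, spectrum_number) pairs.

    Hypothetical helper; raises ValueError on entries that do not look like
    file_name:number.
    """
    refs = []
    for item in cell.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma or an empty cell
        # Split on the last colon so the file name is kept intact.
        file_name, separator, number = item.rpartition(":")
        if not separator or not file_name or not number.isdigit():
            raise ValueError(f"malformed spectrum reference: {item!r}")
        refs.append((file_name, int(number)))
    return refs


if __name__ == "__main__":
    cell = "07FEB15_ABRF_FT_100a:3653, 07FEB15_ABRF_FT_100a:3660"
    print(parse_spectrum_refs(cell))
```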

(optional) Where no search output files are available, information about the peptide to spectrum match should be provided in additional columns as shown below. Note that if this information can be extracted from submitted files, then these columns are optional; otherwise they are required. There should always be one column listing the charge state, one for the theoretical precursor mass, and one or more columns for each relevant search score.

| Protein | Peptide | Spectrum files | Charge | Mass | Score: expect | Score: dot |
| --- | --- | --- | --- | --- | --- | --- |
| CATA_MOUSE | FSTVAGESGSADTVRDPR | 07FEB15_ABRF_FT_100a:2171 | 1 | 1850.88 | 0.897 | 0.78 |
| | FSTVAGESGSADTVRDPR | 07FEB15_ABRF_FT_100a:2177 | 2 | 1850.88 | 0.002 | 0.92 |
| | FSTVAGESGSADTVRDPR | 07FEB15_ABRF_FT_100a:2183 | 2 | 1850.88 | 1.034 | 0.76 |
| | GPLLVQDVVFTDEMAHFDR | 07FEB15_ABRF_FT_100a:3653 | 1 | 2188.06 | 0.764 | 0.74 |
| | GPLLVQDVVFTDEMAHFDR | 07FEB15_ABRF_FT_100a:3660 | 3 | 2188.06 | 0.234 | 0.84 |
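
When no search output files accompany a submission, a quick header check along these lines may help confirm that the extra columns are present. The sketch is not part of the Peptidome tooling: the function name `missing_psm_columns` is hypothetical, and the column labels ("Charge", "Mass", and a "Score:" prefix for score columns) are assumptions taken from the example table rather than a documented requirement.

```python
from typing import List, Sequence


def missing_psm_columns(header: Sequence[str]) -> List[str]:
    """Return the required peptide-to-spectrum-match columns absent from a header.

    Hypothetical helper. Assumes the labels used in the example table:
    'Charge', 'Mass', and at least one column starting with 'Score:'.
    """
    names = [name.strip() for name in header]
    missing = [column for column in ("Charge", "Mass") if column not in names]
    if not any(name.startswith("Score:") for name in names):
        missing.append("Score: <name>")
    return missing


if __name__ == "__main__":
    header = ["Protein", "Peptide", "Spectrum files", "Charge", "Mass",
              "Score: expect", "Score: dot"]
    print(missing_psm_columns(header))  # an empty list means nothing is missing
```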

Modifications

(optional) If the Peptide identification output files are in a supported format, then modification information need not be listed. Modifications are identified by their UNIMOD accession number. Fixed modifications for given residues are listed separately and are assumed to apply to all residues of that type. A modified peptide string is given for each applicable spectrum. In a modified peptide string, each modified residue is followed by its UNIMOD accession in parentheses; fixed modifications need not be listed there.

Example table listing fixed modifications:

| Modification | Residues |
| --- | --- |
| 5 | K, R, C |
| 34 | C, R |

Example table listing variable modifications:

| Peptide | Mod String | Spectrum File | Spectrum ID |
| --- | --- | --- | --- |
| LSVEALNSLTGEFK | LSV(18)EALNSL(24)TGEFK | 07FEB15_ABRF_FT_100a | 3350 |
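
To illustrate how a modified peptide string encodes variable modifications, the following Python sketch recovers the unmodified sequence and the modified positions from a string such as the one in the example. The function name `parse_mod_string` is hypothetical, and the sketch assumes single uppercase letters for residues and at most one numeric UNIMOD accession per residue; the guidelines do not say whether a residue may carry more than one.

```python
import re
from typing import List, Tuple

# One residue letter, optionally followed by a UNIMOD accession in parentheses.
_TOKEN = re.compile(r"([A-Z])(?:\((\d+)\))?")


def parse_mod_string(mod_string: str) -> Tuple[str, List[Tuple[int, str, int]]]:
    """Return (plain_peptide, [(position, residue, unimod_accession), ...]).

    Hypothetical helper; positions are 1-based within the peptide.
    """
    residues = []
    modifications = []
    consumed = 0
    for match in _TOKEN.finditer(mod_string):
        if match.start() != consumed:
            break  # unexpected characters between tokens
        consumed = match.end()
        residue, accession = match.groups()
        residues.append(residue)
        if accession is not None:
            modifications.append((len(residues), residue, int(accession)))
    if consumed != len(mod_string):
        raise ValueError(f"malformed modified peptide string: {mod_string!r}")
    return "".join(residues), modifications


if __name__ == "__main__":
    print(parse_mod_string("LSV(18)EALNSL(24)TGEFK"))
```

Comparing the recovered plain sequence with the Peptide column is one simple consistency check on a variable modifications table.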

Quantification

(optional) The following tables are used to denote any quantification value associated with the proteins, peptides or spectra listed above. Each quantification value is a single number per protein, peptide, and/or spectrum per sample. If any quantification values are supplied, then the quantification metadata field should describe the units (if applicable) and methodology.

The table below shows example protein quantification data:

| Protein | Value |
| --- | --- |
| ACH1_DROME | 0.453166424 |
| ACSA2_ACEXY | 0.42215327 |
| ANXA5_CHICK | 0.083062546 |
| ATPA_XYLFT | 0.675071009 |
| BID_HUMAN | 0.104248212 |
| CATG_HUMAN | 0.369605387 |

The table below shows example peptide quantification data:

| Peptide | Value |
| --- | --- |
| MGLRLSQLIDVNLK | 0.004854258 |
| FEELLTR | 0.437050859 |
| VLTEILASR | 0.409989692 |
| SSTIANIVR | 0.519375723 |
| LGRIEADSESQEDIIR | 0.641893347 |
| IFGSYDPR | 0.733304693 |
| NVNPVALPR | 0.248824725 |
| SSGVPPEVFTR | 0.244778981 |
| TIQNDIMLLQLSR | 0.087638226 |
| VSSFLPWIR | 0.387405287 |
| AFTECCVVASQLR | 0.580127074 |
| GGTNIITLLAVVK | 0.662884271 |

The table below shows example spectrum quantification data:

| Spectrum file | Spectrum id | Value |
| --- | --- | --- |
| 07FEB15_ABRF_FT_100a | 3350 | 0.577423865 |
| 07FEB15_ABRF_FT_50a | 241 | 0.773084765 |
| 07FEB15_ABRF_FT_100a | 9872 | 0.12518907 |
| 07FEB15_ABRF_FT_25a | 2313 | 0.353178653 |