We aim to make data deposit procedures as straightforward as possible and will provide as much assistance
as you require to get your data submitted to Peptidome. If you have problems or questions about the submission
procedures described on this page, please e-mail us at peptidome@ncbi.nlm.nih.gov,
and one of our curators will respond promptly. Once you have assembled all required files,
please transfer them to us using the Peptidome Submission Form.
A standard submission has three required components: a Metadata file that describes the
experiment and associated files; the Data files (both raw data files and peptide identification output files);
and the Results table. Each component is described in detail below.
Metadata refers to descriptive information about the overall experiment,
the biological samples under examination, protocols used to generate the data, and references to associated data files.
Metadata is supplied to us by completing the 'Peptidome Metadata Template' spreadsheet in the
NCBI_Peptidome_Submission_Template Excel file.
Guidelines for the content of each field are provided both within the spreadsheet and in the following tables:
Two types of data files should be supplied with each submission:

1. Raw data files containing the MS1 and MS2 information from the instrument. The preferred format is one of
the standard XML formats (mzData, mzXML, or mzML), with each file holding both the MS1 and MS2 data from a
single fraction. Text formats (.mgf, .pkl, .sqt, .dta) may be accepted but are less desirable.
Proprietary binary formats from instrument manufacturers (e.g., .raw or .wiff) are not accepted.
Supply one or more raw files per biological sample.
2. Peptide identification output files from each search engine used to match the MS2 scans to peptides.
We currently support Mascot DAT files, OMSSA files in ASN.1 or XML format, and output from any search engine
that has been converted to pepXML. If no search engine was used, or if your search engine is not yet supported,
your Results table must include the spectrum file references together with the information normally extracted
from search engine output files, such as charge state and identification scores. See below for an example.
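As a rough pre-submission check, the raw data format rules above can be expressed as an extension lookup. This is an illustrative sketch only: the function name `classify_raw_file` is hypothetical, and it assumes a format can be recognized from the file extension alone, which is not necessarily how Peptidome validates files (a compound name such as `.mzData.xml`, for example, would be reported as unknown).

```python
from pathlib import PurePath

# Hypothetical extension sets derived from the formats listed above;
# the exact extensions Peptidome recognizes are an assumption.
PREFERRED = {".mzdata", ".mzxml", ".mzml"}   # standard XML formats (MS1 + MS2)
ACCEPTED = {".mgf", ".pkl", ".sqt", ".dta"}  # text formats, less desirable
REJECTED = {".raw", ".wiff"}                 # proprietary vendor binaries


def classify_raw_file(filename: str) -> str:
    """Return 'preferred', 'accepted', 'rejected', or 'unknown' for a raw data file name."""
    suffix = PurePath(filename).suffix.lower()  # compare extensions case-insensitively
    if suffix in PREFERRED:
        return "preferred"
    if suffix in ACCEPTED:
        return "accepted"
    if suffix in REJECTED:
        return "rejected"
    return "unknown"


if __name__ == "__main__":
    for name in ["07FEB15_ABRF_FT_100a.mzXML", "run1.mgf", "run1.RAW", "notes.txt"]:
        print(name, "->", classify_raw_file(name))
```

A lookup table keeps the policy in one place, so adjusting the accepted formats only means editing the three sets.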
The Results tables describe your view of the final, processed results, as discussed in any associated manuscript.
They list the proteins identified in each Sample in the Study. For each protein, list the supporting peptides,
and for each peptide, list the matching spectrum files. Modifications can also be specified.
If the list of matching spectrum files is omitted, every matching spectrum in the Peptide identification output
files is assumed to be correct. Similarly, if the Results table contains only proteins, all associated peptides
and spectra are taken from the Peptide identification output files.
Note that this may be too permissive: peptide-spectrum matches that do not pass standard acceptance criteria would then be accepted unintentionally.
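The defaulting rules above can be sketched as a filter over the peptide-spectrum matches (PSMs) found in the search output. This is a hypothetical illustration, not Peptidome's implementation: the `accepted_psms` function and the nested-dictionary form of the Results table are assumptions, with `None` standing for an omitted peptide or spectrum list and each spectrum written as a `file:number` string.

```python
from typing import Dict, List, Optional, Tuple

# (protein, peptide, "file:number") as reported by the search engine output
PSM = Tuple[str, str, str]

# Hypothetical in-memory form of a Results table:
#   protein -> None                        only the protein was listed
#   protein -> {peptide: None}             peptide listed, spectrum list omitted
#   protein -> {peptide: ["file:number"]}  explicit spectrum list
Results = Dict[str, Optional[Dict[str, Optional[List[str]]]]]


def accepted_psms(results: Results, search_psms: List[PSM]) -> List[PSM]:
    """Keep the search-output PSMs that the Results table accepts, applying the defaults above."""
    accepted: List[PSM] = []
    for protein, peptide, spectrum in search_psms:
        if protein not in results:
            continue  # protein not reported in the Results table
        peptides = results[protein]
        if peptides is None:
            # Protein only: every peptide and spectrum from the search output is accepted.
            accepted.append((protein, peptide, spectrum))
            continue
        if peptide not in peptides:
            continue  # peptide not reported for this protein
        spectra = peptides[peptide]
        # Omitted spectrum list: every matching spectrum is accepted.
        if spectra is None or spectrum in spectra:
            accepted.append((protein, peptide, spectrum))
    return accepted
```

The `None` branches are where the permissiveness comes from: any PSM the search engine reported for that protein or peptide passes, whatever its score.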
Each matching spectrum is referenced as a colon-separated pair, spectrum_file_name:spectrum_number.
The file extension may be omitted from the spectrum file name. Multiple matching spectra are listed as comma-separated pairs.
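A minimal parser for this reference format might look like the following. The function name `parse_spectrum_refs` is hypothetical; it keeps each file name exactly as written (with or without an extension) and splits on the last colon, on the assumption that the spectrum number is always the final field.

```python
from typing import List, Tuple


def parse_spectrum_refs(cell: str) -> List[Tuple[str, int]]:
    """Split a comma-separated 'file:number' list into (file name, spectrum number) pairs."""
    refs: List[Tuple[str, int]] = []
    for item in cell.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty cells and trailing commas
        # rsplit on the last colon; a missing colon or non-numeric number raises ValueError
        file_name, number = item.rsplit(":", 1)
        refs.append((file_name.strip(), int(number)))
    return refs
```

For example, `parse_spectrum_refs("07FEB15_ABRF_FT_100a:2171, 07FEB15_ABRF_FT_50a:2808")` yields one pair per listed spectrum.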
The expected format is illustrated in the 'Results example' spreadsheet in
the NCBI_Peptidome_Submission_Template, and in the following table:
| Protein | Peptide | Spectrum files |
|---|---|---|
| CATA_MOUSE | FSTVAGESGSADTVRDPR | 07FEB15_ABRF_FT_100a:2171, 07FEB15_ABRF_FT_100a:2177, 07FEB15_ABRF_FT_100a:2183 |
|  | GPLLVQDVVFTDEMAHFDR | 07FEB15_ABRF_FT_100a:3653, 07FEB15_ABRF_FT_100a:3660 |
|  | GPLLVQDVVFTDEMAHFDRER | 07FEB15_ABRF_FT_100a:3231, 07FEB15_ABRF_FT_100a:3495, 07FEB15_ABRF_FT_100a:3499 |
|  | LCENIAGHLKDAQLFIQK | 07FEB15_ABRF_FT_100a:2967, 07FEB15_ABRF_FT_100a:2968 |
|  | LFAYPDTHR | 07FEB15_ABRF_FT_50a:2395 |
|  | LVNADGEAVYCK | 07FEB15_ABRF_FT_100a:2151, 07FEB15_ABRF_FT_100a:2157, 07FEB15_ABRF_FT_100a:2161 |
|  | VWPHKDYPLIPVGK | 07FEB15_ABRF_FT_100a:2768, 07FEB15_ABRF_FT_100a:2774, 07FEB15_ABRF_FT_50a:2808 |
| CATD_HUMAN | AIGAVPLIQGEYMIPCEK | 07FEB15_ABRF_FT_100a:3305, 07FEB15_ABRF_FT_100a:3310 |
|  | FDGILGMAYPR | 07FEB15_ABRF_FT_100a:3258, 07FEB15_ABRF_FT_10a:3109, 07FEB15_ABRF_FT_10a:3111 |
|  | ISVNNVLPVFDNLMQQK | 07FEB15_ABRF_FT_100a:3771, 07FEB15_ABRF_FT_100a:3775 |
|  | LVDQNIFSFYLSR | 07FEB15_ABRF_FT_100a:3705, 07FEB15_ABRF_FT_100a:3711, 07FEB15_ABRF_FT_100a:3716 |
|  | QVFGEATKQPGITFIAAK | 07FEB15_ABRF_FT_100a:2882 |
|  | VSTLPAITLK | 07FEB15_ABRF_FT_100a:2857 |
| HBA3_PANTR | VGAHAGZYGAEALER | 07FEB15_ABRF_FT_100a:2245, 07FEB15_ABRF_FT_25a:2301, 07FEB15_ABRF_FT_50a:2289 |
|  | VLSPADKTNVK | 07FEB15_ABRF_FT_100a:1892 |
| KCRM_HUMAN | FEEILTR | 07FEB15_ABRF_FT_5a_070216183448:2394 |
|  | FKLNYKPEEEYPDLSK | 07FEB15_ABRF_FT_100a:2690 |
|  | GQSIDDMIPAQK | 07FEB15_ABRF_FT_100a:2553, 07FEB15_ABRF_FT_10a:2535 |
|  | GTGGVDTAAVGSVFDVSNADR | 07FEB15_ABRF_FT_100a:2990, 07FEB15_ABRF_FT_100a:3004 |
|  | HPKFEEILTR | 07FEB15_ABRF_FT_100a:2455, 07FEB15_ABRF_FT_100a:2462, 07FEB15_ABRF_FT_100a:2468 |
|  | LGSSEVEQVQLVVDGVK | 07FEB15_ABRF_FT_100a:3133, 07FEB15_ABRF_FT_100a:3139, 07FEB15_ABRF_FT_100a:3142 |
|  | LNYKPEEEYPDLSK | 07FEB15_ABRF_FT_100a:2477 |
|  | LSVEALNSLTGEFK | 07FEB15_ABRF_FT_100a:3339, 07FEB15_ABRF_FT_100a:3345, 07FEB15_ABRF_FT_100a:3350 |
|  | LSVEALNSLTGEFKGK | 07FEB15_ABRF_FT_100a:3169, 07FEB15_ABRF_FT_100a:3174, 07FEB15_ABRF_FT_100a:3176 |
|  | RGTGGVDTAAVGSVFDVSNADR | 07FEB15_ABRF_FT_100a:2885 |
|  | SFLVWVNEEDHLR | 07FEB15_ABRF_FT_100a:3320, 07FEB15_ABRF_FT_100a:3327, 07FEB15_ABRF_FT_100a:3328 |
|  | SIKGYTLPPHCSR | 07FEB15_ABRF_FT_50a:2176 |
|  | TDLNHENLKGGDDLDPNYVLSSR | 07FEB15_ABRF_FT_100a:2735, 07FEB15_ABRF_FT_100a:2739 |
|  | VLTLELYK | 07FEB15_ABRF_FT_100a:2980, 07FEB15_ABRF_FT_100a:2989, 07FEB15_ABRF_FT_25a:2979 |
|  | VLTLELYKK | 07FEB15_ABRF_FT_25a:2753 |
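In the example, only the first row for each protein names the protein; the following rows leave the Protein column empty. A sketch of reading such rows into a protein → peptide → spectra structure is shown below, assuming (as the example suggests) that an empty Protein cell means the row belongs to the most recent protein above it. The function `group_results` is hypothetical and works on rows that have already been split into three cells.

```python
from typing import Dict, List, Optional


def group_results(rows: List[List[str]]) -> Dict[str, Dict[str, List[str]]]:
    """Group [protein, peptide, spectrum files] rows into protein -> peptide -> spectrum refs.

    Assumes an empty Protein cell continues the most recent protein above it.
    """
    grouped: Dict[str, Dict[str, List[str]]] = {}
    current: Optional[str] = None
    for protein, peptide, spectra in rows:
        if protein.strip():
            current = protein.strip()
        if current is None:
            raise ValueError(f"peptide {peptide!r} appears before any protein")
        refs = [ref.strip() for ref in spectra.split(",") if ref.strip()]
        # Nested setdefault merges repeated rows for the same protein and peptide.
        grouped.setdefault(current, {}).setdefault(peptide.strip(), []).extend(refs)
    return grouped
```

Raising an error for a leading empty Protein cell makes a malformed table fail loudly instead of silently attaching peptides to no protein.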
(optional) When no search output files are available, provide the peptide-spectrum match information
in additional columns, as shown below. These columns are optional if the information can be extracted
from the submitted files; otherwise they are required. Include one column for the charge state,
one for the theoretical precursor mass, and one column for each relevant search score.
| Protein | Peptide | Spectrum files | Charge | Mass | Score: expect | Score: dot |
|---|---|---|---|---|---|---|
| CATA_MOUSE | FSTVAGESGSADTVRDPR | 07FEB15_ABRF_FT_100a:2171 | 1 | 1850.88 | 0.897 | 0.78 |
|  | FSTVAGESGSADTVRDPR | 07FEB15_ABRF_FT_100a:2177 | 2 | 1850.88 | 0.002 | 0.92 |
|  | FSTVAGESGSADTVRDPR | 07FEB15_ABRF_FT_100a:2183 | 2 | 1850.88 | 1.034 | 0.76 |
|  | GPLLVQDVVFTDEMAHFDR | 07FEB15_ABRF_FT_100a:3653 | 1 | 2188.06 | 0.764 | 0.74 |
|  | GPLLVQDVVFTDEMAHFDR | 07FEB15_ABRF_FT_100a:3660 | 3 | 2188.06 | 0.234 | 0.84 |
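A quick way to check a Results table header before submitting without search output is sketched below. It assumes the column names follow the example exactly (`Charge`, `Mass`, and one `Score: <name>` column per score); the function `missing_psm_columns` is hypothetical and is not part of the submission template.

```python
from typing import List


def missing_psm_columns(header: List[str]) -> List[str]:
    """List the per-spectrum columns still needed when no search output files are submitted."""
    names = [h.strip() for h in header]
    missing = [col for col in ("Charge", "Mass") if col not in names]
    # At least one score column is expected, named like "Score: expect" in the example.
    if not any(name.startswith("Score:") for name in names):
        missing.append("Score: <name>")
    return missing
```

An empty result means the header has the charge, mass, and at least one score column.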
(optional) If the Peptide identification output files are in a supported format,
modification information need not be listed. Modifications are identified by their
Unimod accession number. Fixed modifications are listed separately, together with the residues
they modify, and are assumed to apply to every residue of that type. Variable modifications
are given as a modified peptide string for each applicable spectrum, in which each modified residue
is followed by its Unimod accession in parentheses; fixed modifications need not be repeated in these strings.
Example table listing fixed modifications:
| Modification | Residues |
|---|---|
| 5 | K, R, C |
| 34 | C, R |
Example table listing variable modifications:
| Peptide | Mod String | Spectrum File | Spectrum ID |
|---|---|---|---|
| LSVEALNSLTGEFK | LSV(18)EALNSL(24)TGEFK | 07FEB15_ABRF_FT_100a | 3350 |
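The modified peptide string notation can be parsed with a small regular expression, and the fixed-modification table can be expanded onto every matching residue. The sketch below is a hypothetical illustration (the names `parse_mod_string` and `fixed_mod_sites` are not part of any Peptidome tool); it assumes one-letter uppercase residue codes and purely numeric accessions.

```python
import re
from typing import Dict, List, Set, Tuple

# One residue letter, optionally followed by a Unimod accession in parentheses, e.g. "V(18)".
TOKEN = re.compile(r"([A-Z])(?:\((\d+)\))?")


def parse_mod_string(mod_string: str) -> Tuple[str, List[Tuple[int, str, int]]]:
    """Return the bare peptide and a list of (1-based position, residue, Unimod accession)."""
    residues: List[str] = []
    mods: List[Tuple[int, str, int]] = []
    i = 0
    while i < len(mod_string):
        match = TOKEN.match(mod_string, i)
        if match is None:
            raise ValueError(f"unexpected character at index {i} in {mod_string!r}")
        residue, accession = match.group(1), match.group(2)
        residues.append(residue)
        if accession is not None:
            mods.append((len(residues), residue, int(accession)))
        i = match.end()
    return "".join(residues), mods


def fixed_mod_sites(peptide: str, fixed: Dict[int, Set[str]]) -> List[Tuple[int, str, int]]:
    """Expand a fixed-modification table (accession -> residues) onto every matching residue."""
    return [
        (pos, residue, accession)
        for pos, residue in enumerate(peptide, start=1)
        for accession, targets in fixed.items()
        if residue in targets
    ]


if __name__ == "__main__":
    bare, variable = parse_mod_string("LSV(18)EALNSL(24)TGEFK")
    print(bare, variable)
    print(fixed_mod_sites(bare, {5: {"K", "R", "C"}, 34: {"C", "R"}}))
```

For the example row, the parsed bare sequence is LSVEALNSLTGEFK, which matches the Peptide column, so comparing the two is a cheap consistency check on each row.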
(optional) The following tables record any quantification values associated with the proteins, peptides, or spectra listed above. Each quantification value is a single number per protein, peptide, or spectrum per sample. If any quantification values are supplied, the quantification metadata field should describe the methodology and, where applicable, the units.
The table below shows example protein quantification data:
| Protein | Value |
|---|---|
| ACH1_DROME | 0.453166424 |
| ACSA2_ACEXY | 0.42215327 |
| ANXA5_CHICK | 0.083062546 |
| ATPA_XYLFT | 0.675071009 |
| BID_HUMAN | 0.104248212 |
| CATG_HUMAN | 0.369605387 |
The table below shows example peptide quantification data:
| Peptide | Value |
|---|---|
| MGLRLSQLIDVNLK | 0.004854258 |
| FEELLTR | 0.437050859 |
| VLTEILASR | 0.409989692 |
| SSTIANIVR | 0.519375723 |
| LGRIEADSESQEDIIR | 0.641893347 |
| IFGSYDPR | 0.733304693 |
| NVNPVALPR | 0.248824725 |
| SSGVPPEVFTR | 0.244778981 |
| TIQNDIMLLQLSR | 0.087638226 |
| VSSFLPWIR | 0.387405287 |
| AFTECCVVASQLR | 0.580127074 |
| GGTNIITLLAVVK | 0.662884271 |
The table below shows example spectrum quantification data:
| Spectrum file | Spectrum id | Value |
|---|---|---|
| 07FEB15_ABRF_FT_100a | 3350 | 0.577423865 |
| 07FEB15_ABRF_FT_50a | 241 | 0.773084765 |
| 07FEB15_ABRF_FT_100a | 9872 | 0.12518907 |
| 07FEB15_ABRF_FT_25a | 2313 | 0.353178653 |
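Because each spectrum should carry a single quantification value per sample, a loader can use the (file, spectrum id) pair as a dictionary key and reject duplicates. The sketch below assumes one sample's spectrum table has been exported as tab-separated text with the column headers shown above; the function `load_spectrum_quant` is hypothetical.

```python
import csv
import io
from typing import Dict, Tuple


def load_spectrum_quant(tsv_text: str) -> Dict[Tuple[str, int], float]:
    """Read one sample's spectrum quantification table into {(file, spectrum id): value}."""
    values: Dict[Tuple[str, int], float] = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        key = (row["Spectrum file"].strip(), int(row["Spectrum id"]))
        # Enforce the single-value-per-spectrum rule for this sample.
        if key in values:
            raise ValueError(f"duplicate quantification value for {key}")
        values[key] = float(row["Value"])
    return values
```

The same pattern applies to the protein and peptide tables, keyed on the protein or peptide string instead.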