We aim to make data deposit procedures as straightforward as possible and will provide as much assistance as you require to get your data submitted to Peptidome. If you have problems or questions about the submission procedures described on this page, please e-mail us at peptidome@ncbi.nlm.nih.gov and one of our curators will quickly get back to you. Once you have assembled all required files, please transfer them to us using the Peptidome Submission Form.
A standard submission has three required components: a Metadata file that describes the experiment and associated files; the Data files (both raw data files and peptide identification output files); and the table of Results. A complete description of each component follows:
Metadata refers to descriptive information about the overall experiment, the biological samples under examination, protocols used to generate the data, and references to associated data files.
Metadata is supplied to us by completing the 'Peptidome Metadata Template' spreadsheet in the NCBI_Peptidome_Submission_Template Excel file. Guidelines for the content of each field are provided both within the spreadsheet and in the following tables:
| This section describes the overall experiment. | |
|---|---|
| title | A unique title, less than 120 characters, that describes the overall study. |
| summary | A thorough description of the goals and objectives of this study. The abstract from the associated publication may be suitable. Include as much text as necessary to thoroughly describe the study. |
| overall design | Indicate how many Samples are analyzed, whether replicates are included, whether there are control and/or reference Samples, etc. |
| contributor | "Firstname,Initial,Lastname". Example: "John,H,Smith" or "Jane,Doe". Put each contributor on a separate line. |
| pubmed id | PubMed identifier (PMID) that references a published article describing this study. This can be e-mailed to us later once your data are published. |
| release date | Enter the date on which to release your data to the public. Use format MM-DD-YYYY. The release date may be at most one year after the day of submission, but it may be brought forward or pushed back at any time by e-mailing us. |
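The field rules in this section can be checked before submission. The following is a minimal sketch, not an official Peptidome tool: the function name `check_series` and its arguments are hypothetical, and "one year" is approximated as 365 days.

```python
from datetime import date, datetime, timedelta


def check_series(title, contributors, release_date, submitted=None):
    """Return a list of problems found in the series-level fields."""
    submitted = submitted or date.today()
    problems = []

    # The title must be shorter than 120 characters.
    if not title or len(title) >= 120:
        problems.append("title must be non-empty and under 120 characters")

    # Contributors are "Firstname,Initial,Lastname" or "Firstname,Lastname".
    for name in contributors:
        parts = name.split(",")
        if len(parts) not in (2, 3) or not all(p.strip() for p in parts):
            problems.append(f"contributor not in expected form: {name!r}")

    # The release date uses MM-DD-YYYY and may be at most one year out.
    try:
        release = datetime.strptime(release_date, "%m-%d-%Y").date()
    except ValueError:
        problems.append("release date must use MM-DD-YYYY")
    else:
        # Assumption: one year is treated as 365 days.
        if release > submitted + timedelta(days=365):
            problems.append("release date is more than one year after submission")

    return problems
```

An empty list means that none of these three checks failed; it does not imply that the submission is otherwise complete.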
| This section describes protocols and instrumentation details which are common to all Samples. Protocols which are applicable to specific Samples should be included in the SAMPLES section instead. | |
|---|---|
| Sample preparation protocols | |
| growth | Describe the conditions that were used to grow or maintain organisms or cells prior to protein preparation. |
| treatment | Describe any treatments applied to the biological material prior to extract preparation. |
| extract | Describe the protocol used to extract and prepare the protein. |
| separation | Describe the method(s) used to separate the protein mixtures (e.g., column chromatography, gel electrophoresis, capillary electrophoresis). |
| digestion | Specify the enzyme used to digest the sample, the duration and temperature of digestion, and whether in gel or in solution. |
| Instrumentation | |
| platform | Describe the generic instrument type using one of the following terms: |
| manufacturer | Specify the instrument vendor or manufacturer. |
| ion source | Specify the ionization source, e.g., ESI or MALDI, together with any additional details like sprayer or laser type and parameters, voltages, etc. |
| analyzer | Specify the analyzer components and parameters, e.g., quadrupole, time-of-flight, ion trap, etc. |
| detector | Specify the detector type and any additional parameters like sensitivity, etc. |
| setup | Describe any additional parameters and configuration of the entire instrument. |
| Data processing | |
| software | Specify the software and version used to identify the proteins/peptides. |
| parameters | Describe any parameters not specified in peptide identification output files. |
| quantification | Describe any protocol used to quantify the proteins or peptides. |
| This section lists and describes each of the biological Samples under investigation, as well as any protocols that are specific to individual Samples. | |
|---|---|
| sample ID | A unique identifier for this Sample. This identifier is used only as an internal reference within a given file and will not appear on final records. |
| title | Unique title that describes the Sample. We suggest that you use the convention: [biomaterial]-[condition(s)]-[replicate number], e.g., Muscle_exercised_60min_rep2. |
| organism | Scientific name of organism(s) from which the biological material was derived. |
| characteristics | List all available characteristics of the biological source, including factors not necessarily under investigation, e.g., Strain: C57BL/6 |
| description | Additional information not provided in the other fields, or paste in broad descriptions that cannot be easily dissected into the other fields. |
| This section lists all of the files associated with the experiment and their relationship to each other. Each Sample may have multiple rows, one for each file. | |
|---|---|
| sample ID | A unique identifier for this Sample. This identifier is used only as an internal reference within a given file and will not appear on final records. |
| results file | The name of the Results file or spreadsheet that lists all of the proteins, peptides, and matching scans for each Sample. See the Results template spreadsheet for the required format. Each Sample must have only one Results file. |
| fraction | An ordinal number for the gel slice, or an "x,y" coordinate for 2D gels. |
| raw file | The name of the file containing the instrument generated (raw) data for each fraction. |
| raw file type | The raw data type (e.g. mzData, mzXML, mzML, mgf, pkl, sqt, dta). |
| peptide identification output file | The name of the peptide identification output file that matches scans to peptides. There may be more than one output file per raw file. |
| peptide identification file type | The algorithm or method that generated the peptide identification output file, e.g. OMSSA, Mascot, X!Tandem, Sequest, NIST MS, PEAKS, manual inspection, etc. |
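Because each Sample may span several rows but must point to only one Results file, a quick consistency check over the rows of this section can catch mistakes early. The sketch below assumes the rows have been read into dictionaries keyed by the field names above; the function name is hypothetical.

```python
from collections import defaultdict


def samples_with_conflicts(rows):
    """Return sample IDs that do not reference exactly one Results file."""
    files = defaultdict(set)
    for row in rows:
        sample = row["sample ID"].strip()
        name = row.get("results file", "").strip()
        bucket = files[sample]  # registers the sample even if the cell is blank
        if name:
            bucket.add(name)
    # A sample is flagged when it names zero or several distinct Results files.
    return sorted(s for s, names in files.items() if len(names) != 1)
```

Repeating the same Results file name on every row of a Sample is treated as valid here; only distinct names count as a conflict.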
Two types of data files should be supplied with each submission:
The raw data files that contain the MS1 and MS2 information from the instrument. The preferred raw data format is one of the standard XML formats (mzData, mzXML, or mzML) that contain both the MS1 and MS2 data from a single fraction. Less desirably, text formats (.mgf, .pkl, .sqt, .dta) may be accepted. Proprietary binary data from the manufacturers (e.g., .raw or .wiff) are not accepted. There should be one or more raw files per biological sample.
The Peptide identification output files from any search engine program that was used to match the MS2 scans to the peptides. We currently support Mascot DAT files, OMSSA ASN.1 or XML formatted files, or any search engine output that has been converted to PepXML. If no search engine was used, or if we don't yet support the engine you used, then your Results table must include references to spectrum files and the additional information that is usually extracted from the search engine output files, e.g. charge state, identification scores, etc. See below for an example.
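The raw-file format preferences described above could be encoded as a small lookup, for example to pre-check the `raw file type` column before upload. This is only a sketch: the category labels and the function name are invented here, and the proprietary set holds only the two examples named in the text.

```python
PREFERRED_XML = {"mzdata", "mzxml", "mzml"}
ACCEPTED_TEXT = {"mgf", "pkl", "sqt", "dta"}
PROPRIETARY_EXAMPLES = {"raw", "wiff"}  # examples from the text, not exhaustive


def classify_raw_type(file_type):
    """Classify a raw file type string such as 'mzML' or '.mgf'."""
    key = file_type.strip().lstrip(".").lower()
    if key in PREFERRED_XML:
        return "preferred"
    if key in ACCEPTED_TEXT:
        return "accepted text format"
    if key in PROPRIETARY_EXAMPLES:
        return "not accepted"
    return "unknown"
```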
The Results tables describe your view of the final, processed results as discussed in any associated manuscript. Results tables list the proteins discovered in each Sample in the Study. For each protein, the peptides should be listed, and for each peptide, the matching spectrum files should be listed. Modifications can also be specified. If the matching spectrum file list is omitted, then every matching spectrum in the Peptide identification output files is assumed to be correct. Similarly, if the Results table contains only proteins, then all associated peptides and spectra will be gleaned from the Peptide identification output files. Note that this may be too permissive: matches between peptides and spectra that do not pass standard criteria may be unintentionally accepted.
A spectrum file name is written as a colon-separated pair, spectrum_file_name:spectrum_number. The spectrum file extension may be omitted from the file name. Lists of matching spectrum file names should be comma-separated. The expected format is illustrated in the 'Results example' spreadsheet in the NCBI_Peptidome_Submission_Template, and in the following table:
| Protein | Peptide | Spectrum files |
|---|---|---|
| CATA_MOUSE | FSTVAGESGSADTVRDPR | 07FEB15_ABRF_FT_100a:2171, 07FEB15_ABRF_FT_100a:2177, 07FEB15_ABRF_FT_100a:2183 |
| | GPLLVQDVVFTDEMAHFDR | 07FEB15_ABRF_FT_100a:3653, 07FEB15_ABRF_FT_100a:3660 |
| | GPLLVQDVVFTDEMAHFDRER | 07FEB15_ABRF_FT_100a:3231, 07FEB15_ABRF_FT_100a:3495, 07FEB15_ABRF_FT_100a:3499 |
| | LCENIAGHLKDAQLFIQK | 07FEB15_ABRF_FT_100a:2967, 07FEB15_ABRF_FT_100a:2968 |
| | LFAYPDTHR | 07FEB15_ABRF_FT_50a:2395 |
| | LVNADGEAVYCK | 07FEB15_ABRF_FT_100a:2151, 07FEB15_ABRF_FT_100a:2157, 07FEB15_ABRF_FT_100a:2161 |
| | VWPHKDYPLIPVGK | 07FEB15_ABRF_FT_100a:2768, 07FEB15_ABRF_FT_100a:2774, 07FEB15_ABRF_FT_50a:2808 |
| CATD_HUMAN | AIGAVPLIQGEYMIPCEK | 07FEB15_ABRF_FT_100a:3305, 07FEB15_ABRF_FT_100a:3310 |
| | FDGILGMAYPR | 07FEB15_ABRF_FT_100a:3258, 07FEB15_ABRF_FT_10a:3109, 07FEB15_ABRF_FT_10a:3111 |
| | ISVNNVLPVFDNLMQQK | 07FEB15_ABRF_FT_100a:3771, 07FEB15_ABRF_FT_100a:3775 |
| | LVDQNIFSFYLSR | 07FEB15_ABRF_FT_100a:3705, 07FEB15_ABRF_FT_100a:3711, 07FEB15_ABRF_FT_100a:3716 |
| | QVFGEATKQPGITFIAAK | 07FEB15_ABRF_FT_100a:2882 |
| | VSTLPAITLK | 07FEB15_ABRF_FT_100a:2857 |
| HBA3_PANTR | VGAHAGZYGAEALER | 07FEB15_ABRF_FT_100a:2245, 07FEB15_ABRF_FT_25a:2301, 07FEB15_ABRF_FT_50a:2289 |
| | VLSPADKTNVK | 07FEB15_ABRF_FT_100a:1892 |
| KCRM_HUMAN | FEEILTR | 07FEB15_ABRF_FT_5a_070216183448:2394 |
| | FKLNYKPEEEYPDLSK | 07FEB15_ABRF_FT_100a:2690 |
| | GQSIDDMIPAQK | 07FEB15_ABRF_FT_100a:2553, 07FEB15_ABRF_FT_10a:2535 |
| | GTGGVDTAAVGSVFDVSNADR | 07FEB15_ABRF_FT_100a:2990, 07FEB15_ABRF_FT_100a:3004 |
| | HPKFEEILTR | 07FEB15_ABRF_FT_100a:2455, 07FEB15_ABRF_FT_100a:2462, 07FEB15_ABRF_FT_100a:2468 |
| | LGSSEVEQVQLVVDGVK | 07FEB15_ABRF_FT_100a:3133, 07FEB15_ABRF_FT_100a:3139, 07FEB15_ABRF_FT_100a:3142 |
| | LNYKPEEEYPDLSK | 07FEB15_ABRF_FT_100a:2477 |
| | LSVEALNSLTGEFK | 07FEB15_ABRF_FT_100a:3339, 07FEB15_ABRF_FT_100a:3345, 07FEB15_ABRF_FT_100a:3350 |
| | LSVEALNSLTGEFKGK | 07FEB15_ABRF_FT_100a:3169, 07FEB15_ABRF_FT_100a:3174, 07FEB15_ABRF_FT_100a:3176 |
| | RGTGGVDTAAVGSVFDVSNADR | 07FEB15_ABRF_FT_100a:2885 |
| | SFLVWVNEEDHLR | 07FEB15_ABRF_FT_100a:3320, 07FEB15_ABRF_FT_100a:3327, 07FEB15_ABRF_FT_100a:3328 |
| | SIKGYTLPPHCSR | 07FEB15_ABRF_FT_50a:2176 |
| | TDLNHENLKGGDDLDPNYVLSSR | 07FEB15_ABRF_FT_100a:2735, 07FEB15_ABRF_FT_100a:2739 |
| | VLTLELYK | 07FEB15_ABRF_FT_100a:2980, 07FEB15_ABRF_FT_100a:2989, 07FEB15_ABRF_FT_25a:2979 |
| | VLTLELYKK | 07FEB15_ABRF_FT_25a:2753 |
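To make the spectrum reference format concrete, the sketch below splits a "Spectrum files" cell into (file name, spectrum number) pairs and expands Results rows into one record per spectrum. It assumes that the protein cell is left blank on rows that continue the protein named above them, as the example's layout suggests; the function names are hypothetical and this is not the parser Peptidome itself uses.

```python
def parse_spectrum_refs(cell):
    """Split 'file:number, file:number' into (file_name, spectrum_number) pairs."""
    refs = []
    for item in cell.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the last colon so the number is always the final component.
        name, _, number = item.rpartition(":")
        if not name or not number.isdigit():
            raise ValueError(f"not in file:number form: {item!r}")
        refs.append((name, int(number)))
    return refs


def expand_results(rows):
    """Yield (protein, peptide, file_name, spectrum_number) for each spectrum.

    rows is an iterable of (protein_cell, peptide, spectrum_files_cell);
    a blank protein cell inherits the protein from the previous row.
    """
    protein = None
    for protein_cell, peptide, spectra in rows:
        protein = protein_cell.strip() or protein
        for name, number in parse_spectrum_refs(spectra):
            yield protein, peptide, name, number
```

For example, the cell `07FEB15_ABRF_FT_50a:2395` yields the single pair `("07FEB15_ABRF_FT_50a", 2395)`.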
(optional) When no search output files are available, information about the peptide-to-spectrum match should be provided in additional columns, as shown below. These columns are optional if the information can be extracted from the submitted files; otherwise they are required. There should always be one column listing the charge state, one for the theoretical precursor mass, and one or more columns for the relevant search scores.
| Protein | Peptide | Spectrum files | Charge | Mass | Score: expect | Score: dot |
|---|---|---|---|---|---|---|
| CATA_MOUSE | FSTVAGESGSADTVRDPR | 07FEB15_ABRF_FT_100a:2171 | 1 | 1850.88 | 0.897 | 0.78 |
| | FSTVAGESGSADTVRDPR | 07FEB15_ABRF_FT_100a:2177 | 2 | 1850.88 | 0.002 | 0.92 |
| | FSTVAGESGSADTVRDPR | 07FEB15_ABRF_FT_100a:2183 | 2 | 1850.88 | 1.034 | 0.76 |
| | GPLLVQDVVFTDEMAHFDR | 07FEB15_ABRF_FT_100a:3653 | 1 | 2188.06 | 0.764 | 0.74 |
| | GPLLVQDVVFTDEMAHFDR | 07FEB15_ABRF_FT_100a:3660 | 3 | 2188.06 | 0.234 | 0.84 |
(optional) If the Peptide identification output files are in a supported format, then modification information need not be listed. Where modifications are listed, use the UNIMOD accession number. Fixed modifications are listed separately by residue and are assumed to apply to all residues of that type. For variable modifications, a modified peptide string is given for each applicable spectrum. In a modified peptide string, each modified residue is followed by its UNIMOD accession in parentheses; fixed modifications need not be listed there.
Example table listing fixed modifications:
| Modification | Residues |
|---|---|
| 5 | K, R, C |
| 34 | C, R |
Example table listing variable modifications:
| Peptide | Mod String | Spectrum File | Spectrum ID |
|---|---|---|---|
| LSVEALNSLTGEFK | LSV(18)EALNSL(24)TGEFK | 07FEB15_ABRF_FT_100a | 3350 |
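The modified peptide string notation can be read mechanically. The following sketch recovers the plain sequence and the modified positions from a Mod String. It assumes single-letter uppercase residue codes and numeric accessions, as in the example above; the function name is hypothetical.

```python
import re

# One residue letter, optionally followed by a numeric accession in parentheses.
_RESIDUE = re.compile(r"([A-Z])(?:\((\d+)\))?")


def parse_mod_string(mod_string):
    """Return (plain_sequence, {1-based residue position: UNIMOD accession})."""
    residues = []
    mods = {}
    index = 0
    while index < len(mod_string):
        match = _RESIDUE.match(mod_string, index)
        if match is None:
            raise ValueError(f"unexpected text at offset {index}: {mod_string!r}")
        residues.append(match.group(1))
        if match.group(2) is not None:
            # Positions count residues, not characters in the Mod String.
            mods[len(residues)] = int(match.group(2))
        index = match.end()
    return "".join(residues), mods
```

Comparing the recovered sequence with the Peptide column is a simple way to catch a Mod String that was typed against the wrong peptide.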
(optional) The following tables are used to denote any quantification value associated with the proteins, peptides or spectra listed above. Each quantification value is a single number per protein, peptide, and/or spectrum per sample. If any quantification values are supplied, then the quantification metadata field should describe the units (if applicable) and methodology.
The table below shows example protein quantification data:
| Protein | Value |
|---|---|
| ACH1_DROME | 0.453166424 |
| ACSA2_ACEXY | 0.42215327 |
| ANXA5_CHICK | 0.083062546 |
| ATPA_XYLFT | 0.675071009 |
| BID_HUMAN | 0.104248212 |
| CATG_HUMAN | 0.369605387 |
The table below shows example peptide quantification data:
| Peptide | Value |
|---|---|
| MGLRLSQLIDVNLK | 0.004854258 |
| FEELLTR | 0.437050859 |
| VLTEILASR | 0.409989692 |
| SSTIANIVR | 0.519375723 |
| LGRIEADSESQEDIIR | 0.641893347 |
| IFGSYDPR | 0.733304693 |
| NVNPVALPR | 0.248824725 |
| SSGVPPEVFTR | 0.244778981 |
| TIQNDIMLLQLSR | 0.087638226 |
| VSSFLPWIR | 0.387405287 |
| AFTECCVVASQLR | 0.580127074 |
| GGTNIITLLAVVK | 0.662884271 |
The table below shows example spectra quantification data:
| Spectrum file | Spectrum id | Value |
|---|---|---|
| 07FEB15_ABRF_FT_100a | 3350 | 0.577423865 |
| 07FEB15_ABRF_FT_50a | 241 | 0.773084765 |
| 07FEB15_ABRF_FT_100a | 9872 | 0.12518907 |
| 07FEB15_ABRF_FT_25a | 2313 | 0.353178653 |