RAId_DbS

RAId_DbS is software used to identify peptides from an MS/MS spectrum.

SYNTAX

RAId_DbS [-fp] [-ez] [-ng] [-cg] [-lb] [-ub] [-pt] [-dt] [-mc] [-sc] [-ns] [-as] [-ap] [-ud] [-db] [-f] [-ip] [-op]

DESCRIPTION

RAId_DbS is software designed to identify peptides in a specified protein database using the information contained in an MS/MS spectrum. One of the key features of RAId_DbS is that, for each identified peptide, it reports an E-value based on a theoretical distribution (Alves G., Ogurtsov Y.A., Yu Yi-Kuo. Biology Direct 2007, 2:25). Another feature is that, for each MS/MS spectrum, RAId_DbS reports a measure of the goodness of its model.

RAId_DbS has a unique database structure that incorporates information on post-translational modifications (PTMs), single amino acid polymorphisms (SAPs) and diseases associated with mutations. The information used to generate the annotated databases is obtained mainly from GenBank. Annotated databases for various species can be downloaded from ftp://ftp.ncbi.nlm.nih.gov/pub/RAId/Software/RAId_DbS/. This database structure lets users customize searches by creating their own database of PTM, SAP and disease information, which can be searched separately or together with a specified database.

OPTIONS

| -fp | Database formatting option. The -fp option generates the database used by RAId_DbS from a file of protein sequences in FASTA format. Usage: -fp /path/input_database_name /path/output_database_name. See Formatting Database under EXAMPLES for further details. |
| -ez | Enzyme option. Default value: -ez 1. -ez 1 = Trypsin (K,R); -ez 2 = Lys-C (K); -ez 3 = Arg-C (R); -ez 4 = GluC-Phosphate (E,D); -ez 5 = GluC-Bicarbonate (E); -ez 6 = Pepsin (L,F); -ez 7 = Chymotrypsin (F,Y,W). |
| -ng | Chemical group attached to the peptide N-terminus. Default value: -ng 1.007825. Any molecular weight (Da) can be specified after -ng. Examples: -ng 1.007825 = Hydrogen; -ng 43.01838 = Acetyl. |
| -cg | Chemical group attached to the peptide C-terminus. Default value: -cg 17.002739. -cg 17.002739 = Free Acid; -cg 16.01872 = Amide. |
| -lb | Lower bound of the precursor ion charge state. Default value: -lb 1. Allowed range: [1,9]. |
| -ub | Upper bound of the precursor ion charge state. Default value: -ub 3. Allowed range: [1,9]. |
| -pt | Parent ion mass tolerance (Da). Default value: -pt 1.0. RAId_DbS considers all masses within +/- 3*(parent ion mass tolerance). |
| -dt | Daughter (fragment) ion mass tolerance (Da). Default value: -dt 0.2. |
| -mc | Cysteine modification option: the chemical group attached to the cysteine side chain (residue masses in parentheses). Default value: -mc C00. -mc C00 = Unmodified Cysteine (103.009186 Da); -mc C31 = Carboxymethylation (161.014649 Da); -mc C32 = Carbamidomethylation (160.030646 Da); -mc C33 = Pyridylethylation (208.066421 Da). |
| -sc | Fragment ion series used to score peptides. Default value: -sc b,y. Any combination of the following series may be chosen: (b,a,c,b-NH3,b-H2O,b2,y,x,y-NH3,y-H2O,y2). |
| -ns | Search for novel single amino acid polymorphisms (SAPs). Default value: -ns 0. -ns 0 = do not search for novel SAPs; -ns 1 = search for novel SAPs. |
| -as | Number of annotated single amino acid polymorphisms (SAPs) allowed per peptide. Default value: -as 1. Allowed range: [0,2]. -as 0 = do not search for annotated SAPs; -as 1 = allow 1 annotated SAP per peptide; -as 2 = allow 2 annotated SAPs per peptide. |
| -ap | Number of annotated post-translational modifications (PTMs) allowed per peptide. Default value: -ap 2. Allowed range: [0,5]. -ap 0 = do not search for annotated PTMs; -ap 1 = allow 1 annotated PTM per peptide; -ap 2 = allow 2 annotated PTMs per peptide. |
| -ud | User-annotated database. Usage: -ud /path/user_database_name |
| -db | Annotated or unannotated protein database to search. Usage: -db /path/database_name |
| -f | MS/MS spectrum file name. Usage: -f file_name |
| -ip | Directory containing the MS/MS spectrum file. Usage: -ip /path/ |
| -op | Directory in which the search results are written. Usage: -op /path/ |
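
The options -ez, -ng, -cg, -lb, -ub and -pt together decide which database peptides are compared with a spectrum. The Python sketch below illustrates that bookkeeping under stated assumptions; it is not RAId_DbS's own implementation. It assumes each enzyme cuts after its listed residues with no proline rule and no missed cleavages, uses standard monoisotopic residue masses with unmodified cysteine (-mc C00), and applies the documented +/- 3*pt window to the neutral peptide mass derived from the precursor m/z for each charge in [lb, ub]. The helper names digest, peptide_mass, candidate_masses and matches are hypothetical.

```python
# Hypothetical sketch of how -ez, -ng, -cg, -lb, -ub and -pt interact.
# Assumptions (not taken from RAId_DbS itself): cleavage after every listed
# residue, no proline rule, no missed cleavages, unmodified cysteine, and the
# +/- 3*pt window applied to the neutral peptide mass.

# Cleavage residues from the -ez table.
ENZYMES = {
    1: "KR",   # Trypsin
    2: "K",    # Lys-C
    3: "R",    # Arg-C
    4: "ED",   # GluC-Phosphate
    5: "E",    # GluC-Bicarbonate
    6: "LF",   # Pepsin
    7: "FYW",  # Chymotrypsin
}

# Standard monoisotopic residue masses (Da); C uses the -mc C00 value.
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009186, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}

PROTON = 1.007276  # proton mass (Da)


def digest(sequence, ez=1):
    """Cut the protein C-terminal to every cleavage residue of enzyme ez."""
    sites = ENZYMES[ez]
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in sites:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # trailing piece without a cleavage residue
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide, ng=1.007825, cg=17.002739):
    """Neutral monoisotopic mass: residues + N-terminal group + C-terminal group."""
    return sum(RESIDUE_MASS[r] for r in peptide) + ng + cg


def candidate_masses(precursor_mz, lb=1, ub=3):
    """Neutral precursor mass implied by each charge state z in [lb, ub]."""
    return {z: z * (precursor_mz - PROTON) for z in range(lb, ub + 1)}


def matches(peptide, precursor_mz, pt=1.0, lb=1, ub=3,
            ng=1.007825, cg=17.002739):
    """True if the peptide mass lies within +/- 3*pt of any candidate mass."""
    m = peptide_mass(peptide, ng, cg)
    return any(abs(m - mass) <= 3 * pt
               for mass in candidate_masses(precursor_mz, lb, ub).values())
```

For example, digest("PEPTIDEKAAR") returns ["PEPTIDEK", "AAR"] with the default trypsin rule, and with the defaults -ng 1.007825 and -cg 17.002739 the two terminal groups add up to one water molecule (18.010564 Da).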

EXAMPLES

Executing RAId_DbS

RAId_DbS -ez 1 -dt 0.05 -pt 0.8 -ng 1.007825 -cg 17.002739 -lb 2 -ub 4 -mc C00 -sc b,y,b-H2O -ns 0 -as 1 -ap 2 -db $HOME/Human -f example.dta -ip $HOME -op $HOME

The example above would execute RAId_DbS using:

| -ez 1 | Trypsin as the enzyme |
| -dt 0.05 | Daughter ion mass tolerance of 0.05 Da |
| -pt 0.8 | Parent ion mass tolerance of 0.8 Da |
| -ng 1.007825 | Hydrogen as the N-terminal group |
| -cg 17.002739 | Free acid as the C-terminal group |
| -lb 2 -ub 4 | Precursor ion assumed to have a charge between 2 and 4 |
| -mc C00 | Unmodified cysteine |
| -sc b,y,b-H2O | b, y and b-H2O fragment series used to score peptides |
| -ns 0 | Search for novel SAPs switched off |
| -as 1 | Search for annotated SAPs switched on, one annotated SAP allowed per peptide |
| -ap 2 | Two annotated PTMs allowed per peptide |
| -db $HOME/Human | Database named Human, located in the $HOME directory |
| -f example.dta | MS/MS spectrum file example.dta |
| -ip $HOME | Directory (folder) containing the MS/MS file: $HOME |
| -op $HOME | Directory (folder) in which to place the search results: $HOME |

Formatting Database -fp
| -fp | Usage: -fp /path/input_database_filename /path/output_database_filename. After the database is formatted, three files are created: output_database_filename.def, output_database_filename.seq and output_database_filename.prs. |
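
Each search in the Executing RAId_DbS example names a single spectrum file (-f) and its directory (-ip), so searching many spectra means calling RAId_DbS once per file. A minimal Python sketch, assuming the RAId_DbS executable is on the PATH and that -ip accepts a directory written without a trailing slash (as $HOME is in the example); build_command and search_directory are hypothetical helpers that reuse the example's search parameters.

```python
import subprocess
from pathlib import Path

# Search parameters copied from the "Executing RAId_DbS" example.
PARAMS = ["-ez", "1", "-dt", "0.05", "-pt", "0.8", "-ng", "1.007825",
          "-cg", "17.002739", "-lb", "2", "-ub", "4", "-mc", "C00",
          "-sc", "b,y,b-H2O", "-ns", "0", "-as", "1", "-ap", "2"]


def build_command(database, spectrum, out_dir, exe="RAId_DbS"):
    """Return the argument list for searching one spectrum file."""
    spectrum = Path(spectrum)
    return [exe, *PARAMS,
            "-db", str(database),
            "-f", spectrum.name,          # file name only
            "-ip", str(spectrum.parent),  # directory holding the file
            "-op", str(out_dir)]


def search_directory(database, spectra_dir, out_dir, exe="RAId_DbS"):
    """Run one RAId_DbS search per .dta file in spectra_dir (hypothetical helper)."""
    for dta in sorted(Path(spectra_dir).glob("*.dta")):
        subprocess.run(build_command(database, dta, out_dir, exe), check=True)
```

Passing the arguments as a list to subprocess.run avoids shell quoting issues; check=True stops the batch at the first search that exits with an error.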

User Annotated Database -ud

To generate a user-specified database, the user needs to create two files: a FASTA file of sequences, in which the first word of each sequence descriptor is used as the sequence identifier (>seq_identifier), and a second file containing the user's expert knowledge related to PTMs, SAPs and diseases.

FASTA file example:

>Id_Seq1 Isoform alpha
MLLATLLLLLLGGALAHPDRIIFPNHACEDPPAVLLEVQGTLQRPLVRDSRTSPANCTWLILGSKEQTVT
IRFQKLHLACGSERLTLRSPLQPLISLCEAPPSPLQLPGGNVTITYSYAGARAPMGQGFLLSYSQDWLMC
LQEEFQCLNHRCVSAVQRCDGVDACGDGSDEAGCSSDPFPGLTPRPVPSLPCNVTLEDFYGVFSSPGYTH
...
>Id_Seq2 Isoform beta
MLLATVVVVTSGGALAHPDRIIFPNHACEDPPAVLLEVQGTLQRPLVRDSRTSPANCTWLILGSKEQTVT
IRFQKLHLACGSERLTLRSPLQPLISLCEAPPSPLQLPGGNVTITYSYAGARAPMGQGFLLSYSQDWLMC
LQEEFQCLNHRCVSAVQRCDGVDACGDGSDEAGCSSDPFPGLTPRPVPSLPCNVTLEDFYGVFSSPGYTH
...

Id_Seq1 and Id_Seq2 will be used as the sequence identifiers.

User knowledge file example:

>Id_Seq1
48 SAP R W deadly cancer
56 PTM N N08,N09,N10,N11,N12
111 PTM N N08,N09,N10,N11,N12
139 SAP M V diabetes
193 SAP N L,I,V
193 PTM N N08
299 PTM N N08,N09,N10,N11,N12
365 SAP A T color blind
434 SAP S C,T,V,P insulin dependent diabetes
558 SAP R H,P,W
>Id_Seq2
48 SAP R W deadly cancer
56 PTM N N08,N09,N10,N11,N12
111 PTM N N08,N09,N10,N11,N12
139 SAP M V diabetes
193 SAP N L,I,V
193 PTM N N08
299 PTM N N08,N09,N10,N11,N12
365 SAP A T color blind
434 SAP S C,T,V,P insulin dependent diabetes
558 SAP R H,P,W
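
A quick consistency check before running UserDb.pl is to confirm that the original residue given for each annotation matches the FASTA sequence at that position. The Python sketch below assumes the knowledge-file fields are whitespace-separated in the order position, SAP/PTM, original residue, comma-separated alternatives, optional disease name, and that the "..." placeholders in the FASTA example are replaced by full sequences. parse_fasta, parse_knowledge and check are hypothetical names, not part of RAId_DbS.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    position: int       # 1-based residue position
    kind: str           # "SAP" or "PTM"
    original: str       # residue expected at that position
    alternatives: list  # substituted residues or PTM keys such as N08
    disease: str        # free-text disease name, "" if absent


def parse_fasta(text):
    """Map the first word of each FASTA header (the identifier) to its sequence."""
    seqs, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            current = line[1:].split()[0]
            seqs[current] = []
        elif line and current is not None:
            seqs[current].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in seqs.items()}


def parse_knowledge(text):
    """Group annotation lines under the preceding >seq_identifier line."""
    table, current = {}, None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith(">"):
            current = fields[0][1:]
            table.setdefault(current, [])
            continue
        pos, kind, original, alts = fields[:4]  # assumes >= 4 fields
        table.setdefault(current, []).append(
            Annotation(int(pos), kind, original, alts.split(","),
                       " ".join(fields[4:])))
    return table


def check(seqs, table):
    """List (identifier, position, residue) for annotations that do not match."""
    problems = []
    for seq_id, annotations in table.items():
        seq = seqs.get(seq_id, "")
        for a in annotations:
            if a.position > len(seq) or seq[a.position - 1] != a.original:
                problems.append((seq_id, a.position, a.original))
    return problems
```

Against the first sequence line of Id_Seq1, position 48 is R and position 56 is N, so an annotation such as "48 SAP A W" would be reported as a mismatch while "56 PTM N ..." would pass.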

The above files have the following structure. Each block of the knowledge file begins with a >seq_identifier line matching an identifier in the FASTA file, followed by one line per annotation:
First column field = residue position.
Second column field = SAP or PTM.
Third column field = original residue in the sequence.
Fourth column field = either a list of possible SAPs (L,I,V) or a list of possible PTMs (N08,N09,N10,N11,N12).
Fifth column field = disease name, if any, at the given position.

Once the two files described above have been created, a formatted database that RAId_DbS can process is generated by executing the UserDb.pl script:

$ ./UserDb.pl fasta_file_name knowledge_file_name output_format_database

The output_format_database is the database that can be processed by RAId_DbS using the -ud option.

Directories

Directory PARAMETER_FILES

The directory "PARAMETER_FILES" should contain the following four files: PTMTABLE_REGULARAMINO, Parameter_File, Parameter_File_Tag and RANDOM_DB_RAID. "PTMTABLE_REGULARAMINO" is a file that contains amino acid residue information, including post-translational and other modifications. The user can add any new post-translational modification to this file, as long as the same annotation structure present in the file is kept, as shown below.
| Line Code | Description | |
| ID | Chemical Name of Amino Acid/PTM | |
| AC | Residue Key | |
| TG | Target Unmodified Amino Acid | |
| RW | Unmodified Amino Acid Molecular Weight | |
| MW | Modified Amino Acid Molecular Weight | |
| PA | Location of the Modification in the Amino Acid Residue | |
| PP | Position of the Amino Acid Residue in the Peptide |
| CF | Chemical Modification to the Amino Acid Residue | |
| MM | Monoisotopic Mass Difference MM=MW-RW | |
| KY | Other Common Names Used to Identify the Same Molecule | |
| LT | Other Terms Found in the Literature (Not Necessarily Correct Names) |

Example: adding an entry for the amino acid Glycine to the "PTMTABLE_REGULARAMINO" file.

| ID | Glycine | |
| AC | G00 | |
| TG | Glycine | |
| RW | 57.021465 | |
| MW | 57.021465 | |
| PA | Peptide Back-Bone | |
| PP | None | |
| CF | None | |
| MM | 0 | |
| KY | None | |
| LT | None | |
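
Before adding a new entry, it can be worth checking that every line code is present and that MM = MW - RW holds. The Python sketch below assumes each entry line consists of the two-letter code followed by whitespace and its value; this page shows the entries only as a table, so the exact on-disk layout is an assumption. parse_entry, missing_codes and mass_difference_ok are hypothetical helpers, not part of RAId_DbS.

```python
# Line codes from the PTMTABLE_REGULARAMINO description table.
REQUIRED_CODES = ("ID", "AC", "TG", "RW", "MW", "PA", "PP", "CF", "MM", "KY", "LT")


def parse_entry(text):
    """Map each line code to the text that follows it on the same line."""
    entry = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if not parts:
            continue
        entry[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return entry


def missing_codes(entry):
    """Return the required line codes absent from an entry."""
    return [code for code in REQUIRED_CODES if code not in entry]


def mass_difference_ok(entry, tol=1e-5):
    """Check the documented relation MM = MW - RW within a small tolerance (Da)."""
    rw, mw, mm = (float(entry[k]) for k in ("RW", "MW", "MM"))
    return abs((mw - rw) - mm) <= tol
```

For the Glycine example, RW and MW are both 57.021465, so MM = 0 passes the check; a modified residue such as carbamidomethyl cysteine (MW 160.030646, RW 103.009186) would need MM = 57.02146.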