RAId_DbS
RAId_DbS is a software tool that identifies peptides from an MS/MS spectrum.

SYNTAX
RAId_DbS [-fp] [-ez] [-ng] [-cg] [-lb] [-ub] [-pt] [-dt]
[-mc] [-sc] [-ns] [-as] [-ap] [-ud] [-db] [-f] [-ip] [-op]

DESCRIPTION
RAId_DbS is a software tool designed to identify peptides in a specified protein database using the information contained in an MS/MS spectrum. A key feature of RAId_DbS is that, for each identified peptide, it reports an E-value based on a theoretical distribution (Alves G., Ogurtsov A.Y., Yu Yi-Kuo. Biology Direct 2007, 2:25). In addition, for each MS/MS spectrum RAId_DbS reports a measure of the goodness of its model.

RAId_DbS has a unique database structure that incorporates information on post-translational modifications (PTMs), single amino acid polymorphisms (SAPs) and diseases associated with mutations. The information used to generate the annotated databases is obtained mainly from GenBank. Annotated databases for various species can be downloaded from ftp://ftp.ncbi.nlm.nih.gov/pub/RAId/Software/RAId_DbS/.

This database structure lets the user customize searches with specific information on PTMs, SAPs and diseases by creating a user database, which can be searched separately or together with a specified database.

OPTIONS
-fp Database formatting option.

The -fp option generates the database used by RAId_DbS from a file of protein sequences in FASTA format.

-fp /path/input_database_name /path/output_database_name

See "Formatting Database -fp" under EXAMPLES for further details.

-ez Enzyme options.
Default value: -ez 1
-ez 1 = Trypsin (K,R)
-ez 2 = Lys-C (K)
-ez 3 = Arg-C (R)
-ez 4 = GluC-Phosphate (E,D)
-ez 5 = GluC-Bicarbonate (E)
-ez 6 = Pepsin (L,F)
-ez 7 = Chymotrypsin (F,Y,W)

-ng Chemical group attached to the peptide N-terminus.
Default value: -ng 1.007825
The user can specify any molecular weight (in Da) after -ng.
Example:
-ng 1.007825 = Hydrogen.
-ng 43.01838 = Acetyl.

-cg Chemical group attached to the peptide C-terminus.
Default value: -cg 17.002739
-cg 17.002739 = Free Acid
-cg 16.01872 = Amide
-lb Lower bound on the charge state of the precursor ion.
Default value: -lb 1.
Allowed range: [1,9].

-ub Upper bound on the charge state of the precursor ion.
Default value: -ub 3.
Allowed range: [1,9].

-pt Parent ion mass tolerance (Da).
Default value: -pt 1.0.
RAId_DbS will look for all parent masses within +/- 3 * (parent ion mass tolerance).

-dt Daughter ion mass tolerance (Da.).
Default value: -dt 0.2.

-mc Cysteine modification options. Chemical group attached to the side chain of cysteine.
Default value: -mc C00 (unmodified cysteine, 103.009186 Da.).
-mc C00 = Unmodified Cysteine (103.009186 Da.).
-mc C31 = Carboxymethylation (161.014649 Da.).
-mc C32 = Carbamidomethylation (160.030646 Da.).
-mc C33 = Pyridylethylation (208.066421 Da.).

-sc Fragmentation series used to score peptides.
Default value: -sc b,y
Any combination of the following series may be chosen: (b,a,c,b-NH3,b-H2O,b2,y,x,y-NH3,y-H2O,y2)

-ns Search for novel single amino acid polymorphism (SAP).
Default value: -ns 0.
-ns 0 = does not search for SAP.
-ns 1 = does search for SAP.

-as Number of annotated Single Amino Acid Polymorphisms (SAP) allowed per peptide.
Default value: -as 1.
Allowed parameter range [0,2].
-as 0 = does not search for annotated SAP.
-as 1 = search for 1 annotated SAP per peptide.
-as 2 = search for 2 annotated SAP per peptide.

-ap Number of annotated Post-Translational Modifications (PTM) allowed per peptide.
Default value: -ap 2.
Allowed parameter range [0,5].
-ap 0 = does not search for annotated PTM.
-ap 1 = 1 annotated PTM per peptide.
-ap 2 = 2 annotated PTM per peptide.

-ud User annotated database.
-ud /path/user_database_name

-db Annotated or unannotated protein database to search.
-db /path/database_name

-f MS/MS spectrum file name.
-f file_name

-ip MS/MS spectrum file directory path.
-ip /path/

-op Output search results path.
-op /path/
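
As a rough illustration of the -pt note above, the following Python sketch computes the parent mass window implied by "+/- 3 * (parent ion mass tolerance)". The function name parent_mass_window and the example mass of 1500.0 Da are hypothetical; how RAId_DbS applies the window internally is not described here.

```python
def parent_mass_window(parent_mass, pt=1.0):
    """Return (low, high) bounds of the parent mass search window in Da.

    Assumes the window is parent_mass +/- 3 * pt, as stated for -pt.
    The default pt=1.0 mirrors the documented default of -pt.
    """
    half_width = 3.0 * pt
    return parent_mass - half_width, parent_mass + half_width


# Hypothetical parent mass of 1500.0 Da searched with -pt 0.8
low, high = parent_mass_window(1500.0, pt=0.8)
print(f"{low:.1f} .. {high:.1f}")
```

With -pt 0.8 the window is 2.4 Da on either side of the parent mass, so a tighter -pt value narrows the set of candidate peptides considered.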

EXAMPLES

Executing RAId_DbS

RAId_DbS -ez 1 -dt 0.05 -pt 0.8 -ng 1.007825 -cg 17.002739 -lb 2 -ub 4 -mc C00 -sc b,y,b-H2O -ns 0 -as 1 -ap 2 -db $HOME/Human -f example.dta -ip $HOME -op $HOME

The example above would execute RAId_DbS using:

Trypsin as the enzyme: -ez 1

Daughter ion mass tolerance of 0.05 Da: -dt 0.05

Parent ion mass tolerance of 0.8 Da: -pt 0.8

Hydrogen as the N-terminal group: -ng 1.007825

Free acid as the C-terminal group: -cg 17.002739

Precursor ion charge state assumed to be in the range [2,4]: -lb 2 -ub 4

Unmodified cysteine: -mc C00

Fragmentation series b, y and b-H2O used to score peptides: -sc b,y,b-H2O

Search for novel SAPs switched off: -ns 0

Search with one annotated SAP per peptide: -as 1

Two annotated PTMs allowed per peptide: -ap 2

Database named Human, located in the $HOME directory: -db $HOME/Human

MS/MS spectrum file example.dta: -f example.dta

MS/MS spectrum file read from the $HOME directory: -ip $HOME

Search results written to the $HOME directory: -op $HOME
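
For scripted runs, the same command line can be assembled programmatically. The Python sketch below is only an illustration: the helper build_raid_command is hypothetical, the executable is assumed to be reachable as "RAId_DbS" on the PATH, and only flags listed under OPTIONS are used.

```python
import os
import shlex


def build_raid_command(options, executable="RAId_DbS"):
    """Turn a mapping of documented flags to values into an argument list."""
    args = [executable]
    for flag, value in options.items():
        args.extend([flag, str(value)])
    return args


home = os.path.expanduser("~")  # stands in for $HOME in the example above
options = {
    "-ez": 1,            # trypsin
    "-dt": 0.05,         # daughter ion mass tolerance (Da)
    "-pt": 0.8,          # parent ion mass tolerance (Da)
    "-ng": 1.007825,     # N-terminal hydrogen
    "-cg": 17.002739,    # C-terminal free acid
    "-lb": 2,            # lowest precursor charge state
    "-ub": 4,            # highest precursor charge state
    "-mc": "C00",        # unmodified cysteine
    "-sc": "b,y,b-H2O",  # fragmentation series used for scoring
    "-ns": 0,            # no search for novel SAPs
    "-as": 1,            # one annotated SAP per peptide
    "-ap": 2,            # two annotated PTMs per peptide
    "-db": os.path.join(home, "Human"),
    "-f": "example.dta",
    "-ip": home,
    "-op": home,
}

command = build_raid_command(options)
print(shlex.join(command))  # shell-quoted command line, for inspection
```

Once RAId_DbS is installed, the resulting list could be passed to subprocess.run(command, check=True). Keeping the options in a mapping makes it easy to vary a single parameter, such as -pt, across runs.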


Formatting Database -fp

-fp /path/input_database_filename /path/output_database_filename

After the database is formatted, three files are created:

output_database_filename.def, full_output_db_filename.seq, full_output_db_filename.prs.

User Annotated Database -ud

To generate a user-specified database, the user needs to create two files: a FASTA file of sequences, in which the first word of each sequence descriptor is used as the sequence identifier (>seq_identifier), and a second file containing the user's expert knowledge of PTMs, SAPs and diseases.

FASTA file example:
>Id_Seq1 Isoform alpha
MLLATLLLLLLGGALAHPDRIIFPNHACEDPPAVLLEVQGTLQRPLVRDSRTSPANCTWLILGSKEQTVT
IRFQKLHLACGSERLTLRSPLQPLISLCEAPPSPLQLPGGNVTITYSYAGARAPMGQGFLLSYSQDWLMC
LQEEFQCLNHRCVSAVQRCDGVDACGDGSDEAGCSSDPFPGLTPRPVPSLPCNVTLEDFYGVFSSPGYTH
...
>Id_Seq2 Isoform beta
MLLATVVVVTSGGALAHPDRIIFPNHACEDPPAVLLEVQGTLQRPLVRDSRTSPANCTWLILGSKEQTVT
IRFQKLHLACGSERLTLRSPLQPLISLCEAPPSPLQLPGGNVTITYSYAGARAPMGQGFLLSYSQDWLMC
LQEEFQCLNHRCVSAVQRCDGVDACGDGSDEAGCSSDPFPGLTPRPVPSLPCNVTLEDFYGVFSSPGYTH
...
Here Id_Seq1 and Id_Seq2 are used as the sequence identifiers.
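
The identifier rule can be checked before running UserDb.pl. The Python sketch below reads FASTA lines and keys each sequence by the first word of its descriptor. The function read_fasta_identifiers is hypothetical, and the two sequences are shortened stand-ins for the example above, not complete proteins.

```python
def read_fasta_identifiers(lines):
    """Map each sequence identifier to its concatenated sequence.

    The identifier is the first word after '>' on a descriptor line.
    """
    sequences = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            if not words:
                raise ValueError("descriptor line without an identifier")
            current = words[0]
            sequences[current] = ""
        elif current is None:
            raise ValueError("sequence data before the first descriptor line")
        else:
            sequences[current] += line
    return sequences


# Shortened, illustrative records
fasta_lines = [
    ">Id_Seq1 Isoform alpha",
    "MLLATLLLLLLGGALAHPDR",
    "IIFPNHACEDPP",
    ">Id_Seq2 Isoform beta",
    "MLLATVVVVTSGGALAHPDR",
]
sequences = read_fasta_identifiers(fasta_lines)
print(list(sequences))
```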

User knowledge file example:
>Id_Seq1
48 SAP R W deadly cancer
56 PTM N N08,N09,N10,N11,N12
111 PTM N N08,N09,N10,N11,N12
139 SAP M V diabetes
193 SAP N L,I,V
193 PTM N N08
299 PTM N N08,N09,N10,N11,N12
365 SAP A T color blind
434 SAP S C,T,V,P insulin dependent diabetes
558 SAP R H,P,W
>Id_Seq2
48 SAP R W deadly cancer
56 PTM N N08,N09,N10,N11,N12
111 PTM N N08,N09,N10,N11,N12
139 SAP M V diabetes
193 SAP N L,I,V
193 PTM N N08
299 PTM N N08,N09,N10,N11,N12
365 SAP A T color blind
434 SAP S C,T,V,P insulin dependent diabetes
558 SAP R H,P,W

Each entry in the knowledge file has the following structure:
>seq_identifier
First column field = residue position.
Second column field = SAP or PTM.
Third column field = original residue in the sequence.
Fourth column field = either a list of possible SAPs (L,I,V) or a list of possible PTMs (N08,N09,N10,N11,N12).
Fifth column field = disease name, if any, at the given position.
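
To make the column layout concrete, here is a minimal Python sketch that parses knowledge-file lines into records. The names Annotation and parse_knowledge are hypothetical, and the parser assumes whitespace-separated columns with the disease name taking the rest of the line; UserDb.pl may be stricter or more lenient than this.

```python
from collections import namedtuple

Annotation = namedtuple(
    "Annotation", "position kind residue alternatives disease"
)


def parse_knowledge(lines):
    """Group annotation records by sequence identifier."""
    records = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
            continue
        if current is None:
            raise ValueError("annotation line before any >seq_identifier")
        # At most five fields; the fifth (disease) may contain spaces
        fields = line.split(None, 4)
        if len(fields) < 4 or fields[1] not in ("SAP", "PTM"):
            raise ValueError(f"malformed annotation line: {line!r}")
        disease = fields[4] if len(fields) == 5 else ""
        records[current].append(
            Annotation(int(fields[0]), fields[1], fields[2],
                       fields[3].split(","), disease)
        )
    return records


knowledge_lines = [
    ">Id_Seq2",
    "48 SAP R W deadly cancer",
    "56 PTM N N08,N09,N10,N11,N12",
    "434 SAP S C,T,V,P insulin dependent diabetes",
]
records = parse_knowledge(knowledge_lines)
for annotation in records["Id_Seq2"]:
    print(annotation)
```

A natural extension is to compare each record's original residue with the residue at that position in the FASTA sequence, which catches mismatched annotations early.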

Once the two files described above have been created, the user can generate a formatted database that RAId_DbS can process by executing the UserDb.pl script.

$ ./UserDb.pl fasta_file_name knowledge_file_name output_format_database

The output_format_database is the database that RAId_DbS can process using the -ud option.

Directories

Directory PARAMETER_FILES

The directory "PARAMETER_FILES" should contain the following four files: PTMTABLE_REGULARAMINO, Parameter_File, Parameter_File_Tag and RANDOM_DB_RAID.


"PTMTABLE_REGULARAMINO" is a file that contains information on amino acid residues, post-translational modifications and other modifications. The user can add any new post-translational modification to this file, provided the annotation structure already present in the file, shown below, is preserved.

Line Code   Description
ID          Chemical Name of Amino Acid/PTM
AC          Residue Key
TG          Target Unmodified Amino Acid
RW          Unmodified Amino Acid Molecular Weight
MW          Modified Amino Acid Molecular Weight
PA          Location of the Modification in the Amino Acid Residue
PP          Position of the Amino Acid Residue in the Peptide
CF          Chemical Modification to the Amino Acid Residue
MM          Monoisotopic Mass Difference MM=MW-RW
KY          Other Common Names Used to Identify the Same Molecule
LT          Other Terms Found in the Literature, Not Necessarily Correct Names

Example: adding a new amino acid entry, Glycine, to the "PTMTABLE_REGULARAMINO" file.
ID Glycine
AC G00
TG Glycine
RW 57.021465
MW 57.021465
PA Peptide Back-Bone
PP None
CF None
MM 0
KY None
LT None
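
Because MM is defined as MW-RW, a new entry can be sanity-checked before it is added to the file. The Python sketch below parses one record and verifies that relation. The helper names are hypothetical, and the parser assumes each line starts with a two-letter line code followed by whitespace and the value; how records are separated in the actual file is not covered here.

```python
def parse_ptm_record(lines):
    """Parse 'CODE value' lines of a single record into a dictionary."""
    record = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        code, _, value = line.partition(" ")
        record[code] = value.strip()
    return record


def mass_difference_is_consistent(record, tolerance=1e-6):
    """Check that MM equals MW - RW within a small tolerance (Da)."""
    expected = float(record["MW"]) - float(record["RW"])
    return abs(float(record["MM"]) - expected) <= tolerance


# The Glycine entry shown above
glycine_lines = [
    "ID Glycine",
    "AC G00",
    "TG Glycine",
    "RW 57.021465",
    "MW 57.021465",
    "PA Peptide Back-Bone",
    "PP None",
    "CF None",
    "MM 0",
    "KY None",
    "LT None",
]
record = parse_ptm_record(glycine_lines)
print(record["AC"], mass_difference_is_consistent(record))
```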

Maintained by Aleksey Y. Ogurtsov. Last update: November 16, 2009