Guide to running Spidey as a standalone program: what do all those options do?

Spidey has many runtime options whose function may or may not be clear. What follows is a guide to the different options, giving the string that describes the option, the command-line flag for the option, and a short description of the option's purpose and use.

 

Input file -- genomic sequence(s) (Required) type: File flag: -i

The plural in this option string is a little misleading, as Spidey currently accepts only a single genomic sequence at a time. The input to this option should be 1) the name of a file containing a FASTA, ASN.1, or GenBank flatfile genomic record, or 2) a genomic accession or gi, if your computer is on a network that can access GenBank.

Input file -- mRNA sequence(s) (Required) type: File flag: -m

The input to this option should be 1) the name of a file containing one or more FASTA, ASN.1, or GenBank flatfile sequences, 2) the name of a file containing one or more accessions or gi numbers, if your computer can access GenBank (you need to set 'Input file is a GI list' to T/t if your file has accessions/gi numbers), or 3) a single mRNA accession or gi, if your computer can access GenBank.
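As a minimal sketch, the two required options can be assembled into an argument list in Python. The executable name spidey and the file names genomic.fa and mrna.fa are assumptions made for illustration; substitute the name of the binary you downloaded and your own files.

```python
import shlex

def basic_spidey_args(genomic, mrna, exe="spidey"):
    """Build an argument list that uses only the two required options."""
    # -i: genomic sequence file (or accession/gi)
    # -m: mRNA sequence file (or a single accession/gi)
    return [exe, "-i", genomic, "-m", mrna]

# "spidey", "genomic.fa" and "mrna.fa" are placeholder names.
args = basic_spidey_args("genomic.fa", "mrna.fa")
print(shlex.join(args))
```

The list can be passed to subprocess.run(args) to launch the program; printing it first makes it easy to check the command before running it.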

Print alignment? 0=summary+alignments, 1=summary, 2=alignments, 3=summary&alignments in different files (Optional) type: Integer flag: -p

You can choose to print a text summary of the alignment, the alignment itself, or both (either together or in separate files). The output for choices 0-2 will go to the output file specified by the -o option, or to stdout if -o is not set. For choice #3, the summary will appear in the output file (-o) and the alignment will appear in the alignment file (-a).

Output file 1 (summary or summary+aln) (Optional) type: string flag: -o

This should be a string, representing the name of the file in which you want Spidey to print its output. If nothing is set here, Spidey will print to stdout (on a computer with no stdout, Spidey creates a file named stdout and prints there).

Output file 2 (alignments) (Optional) type: string flag: -a

If the 'print alignment' choice is #3, this option can direct the printed exon alignments to the file of your choice. If the 'print alignment' choice is #3 and this option is not set, the exon alignments will be printed to a file named spidey.aln.
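The way -p, -o, and -a interact can be summarized in a small Python sketch. This is only a restatement of the rules described above, not Spidey's own code; the function name output_destinations is hypothetical, and the sketch assumes that with choice #3 and no -o the summary still goes to stdout.

```python
def output_destinations(print_mode, out_file=None, aln_file=None):
    """Return (summary_dest, alignment_dest) for given -p, -o and -a values.

    None means that kind of output is not printed in that mode.
    """
    if print_mode not in (0, 1, 2, 3):
        raise ValueError("-p must be 0, 1, 2 or 3")
    # With no -o, output goes to stdout.
    main = out_file if out_file is not None else "stdout"
    if print_mode == 0:   # summary and alignments together
        return main, main
    if print_mode == 1:   # summary only
        return main, None
    if print_mode == 2:   # alignments only
        return None, main
    # Mode 3: summary to -o, alignments to -a (spidey.aln if -a is not set).
    return main, aln_file if aln_file is not None else "spidey.aln"
```

For example, output_destinations(3, "summary.txt") reports that the summary goes to summary.txt and the alignments to spidey.aln.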

Input file is a GI list (Optional) type: T/F flag: -G

If the mRNA file is an accession/GI list instead of a sequence file, you need to set this option to T/t so that Spidey knows to fetch these records.
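A short Python sketch of preparing such a list and the matching arguments follows. The one-identifier-per-line layout, the file name mrna_ids.txt, and the executable name spidey are all assumptions; this guide does not specify the exact layout of the list file.

```python
import os
import tempfile

def write_id_list(ids, path):
    """Write one accession or gi per line (this layout is an assumption)."""
    with open(path, "w") as handle:
        for identifier in ids:
            handle.write(f"{identifier}\n")

def gi_list_args(genomic, list_path, exe="spidey"):
    # -G T tells Spidey that the -m file holds accessions/gi numbers to fetch.
    return [exe, "-i", genomic, "-m", list_path, "-G", "T"]

# Placeholder file names; the accession is only an example identifier.
list_path = os.path.join(tempfile.mkdtemp(), "mrna_ids.txt")
write_id_list(["NM_004377.1"], list_path)
args = gi_list_args("genomic.fa", list_path)
```

Remember that fetching the records only works if your computer can access GenBank.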

Number of gene models (Optional) type: integer flag: -n

Spidey can return multiple models for each input mRNA. You can set the maximum number of models to return per mRNA; if not enough models fall above the cutoffs, some of the models may be 'no alignment found'. If this option is not set, the default is to return the single best model for each mRNA.

Organism (genomic sequence) (Optional) type: string flag: -r

Spidey needs to know what organism the genomic sequence comes from, so that it knows which splice matrices to use. The options are v for vertebrate, d for drosophila, p for plants, and c for C. elegans. The default is vertebrate.
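If you drive Spidey from a script, a lookup table keeps the one-letter codes readable. The sketch below is illustrative only; the dictionary keys and the function name organism_args are hypothetical, while the codes themselves are the ones listed above.

```python
# One-letter codes accepted by -r, keyed by a readable (hypothetical) name.
ORGANISM_CODES = {
    "vertebrate": "v",
    "drosophila": "d",
    "plant": "p",
    "c. elegans": "c",
}

def organism_args(organism="vertebrate"):
    """Return the -r arguments for an organism name, case-insensitively."""
    key = organism.strip().lower()
    if key not in ORGANISM_CODES:
        raise ValueError(f"unknown organism: {organism!r}")
    return ["-r", ORGANISM_CODES[key]]
```

Since vertebrate is the default, the -r arguments can simply be left out for vertebrate genomic sequence.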

First-pass e-value (Optional) type: real flag: -e

This is the expectation value cutoff for the first BLAST run that Spidey does. The lower (more stringent) the value, the faster the run, but the greater the chance of missing something important. Default is 0.0000001.

Second-pass e-value (Optional) type: real flag: -f

This is the expectation value for the second, more careful BLAST run that Spidey performs on regions that look promising from the first BLAST. Again, this is a balance between speed and sensitivity. Default is 0.001.

Third-pass e-value (Optional) type: real flag: -g

This is the expectation value cutoff for the third and final BLAST that Spidey does to fill in any remaining spaces between adjacent exons. Since this search is so constrained, speed is not really an issue, so this value has been set rather high -- the default is 10.
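The three cutoffs can be overridden independently. The Python sketch below builds the flags only for the passes you want to change and writes the values in plain decimal notation, the style used for the defaults above. The function name evalue_args is hypothetical.

```python
from decimal import Decimal

# Defaults as documented for the first, second and third BLAST pass.
DEFAULT_EVALUES = {"-e": "0.0000001", "-f": "0.001", "-g": "10"}

def evalue_args(first=None, second=None, third=None):
    """Return flags only for the passes whose cutoff is overridden."""
    args = []
    for flag, value in (("-e", first), ("-f", second), ("-g", third)):
        if value is None:
            continue  # leave Spidey's default in place
        if value <= 0:
            raise ValueError("e-value cutoffs must be positive")
        # Format without an exponent, e.g. 1e-10 -> 0.0000000001.
        args += [flag, format(Decimal(str(value)), "f")]
    return args
```

For instance, evalue_args(first=1e-10) tightens only the first pass and leaves the other two at their defaults.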

% identity cutoff (Optional) type: integer flag: -c

For quality control, you may want to return only those models which fall above a certain percent identity overall. In that case, set this value to something nonzero (the default value is 0) and Spidey will report 'no alignment' if it could not find any models above the cutoff.

% length coverage cutoff (Optional) type: integer flag: -l

If this option is set above zero (default is 0), Spidey will only return models in which the percentage of the length of the mRNA that is contained in the alignment is above the cutoff.
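To make the two cutoffs concrete, here is a rough Python sketch of the filtering they describe. It is not Spidey's implementation: the function names are hypothetical, and whether a value exactly equal to a cutoff passes is not stated in this guide, so the sketch assumes that it does.

```python
def percent_coverage(aligned_mrna_bases, mrna_length):
    """Percentage of the mRNA length that is contained in the alignment."""
    return 100.0 * aligned_mrna_bases / mrna_length

def keep_model(pct_identity, pct_coverage, identity_cutoff=0, coverage_cutoff=0):
    """Apply the -c and -l cutoffs to one model (both default to 0)."""
    # Assumption: a value equal to the cutoff is kept.
    return pct_identity >= identity_cutoff and pct_coverage >= coverage_cutoff

# A 1000-base mRNA with 900 bases in the alignment has 90% coverage.
coverage = percent_coverage(900, 1000)
```

With -c 95 and -l 80, a model at 98.5% identity and 90% coverage would be returned, while one at 98.5% identity and 70% coverage would not.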

interspecies alignment (Optional) type: T/F flag: -s

If this option is set to true (T/t), Spidey will adjust its gap opening and gap extension parameters to encourage longer, gappier alignments like those that you see between species. The default is false.

Print ASN.1 alignment? (Optional) type: T/F flag: -j

If this option is set to true (T/t), Spidey will print the ASN.1 Seq-align(s) corresponding to the mRNA alignment(s). See the next option for information about how to control where the output goes. Default is false.

File for asn.1 (Optional) type: string flag: -k

If the 'print ASN.1 alignment' option is true, this option can be used to set a file where the ASN.1 gets printed. If the 'print ASN.1 alignment' option is true and this option is not set, the ASN.1 output goes to a file named spidey.asn.

Is the input mRNA masked (lowercase)? (Optional) type: T/F flag: -w

If the mRNA input is masked FASTA sequence, you may want to retain that masking by setting this option to T/t. If this option is F/f (default), any lowercase characters in the mRNA input are treated as regular sequence characters.

Fetch the CDS and compute its results also? (Optional) type: T/F flag: -d

If your computer is network-aware or can otherwise access GenBank, Spidey can extract a CDS alignment from an mRNA alignment. If the mRNA record can be fetched (or if it has been given in ASN.1 instead of accession/gi), and there is a CDS annotated, this option may be useful. The default is false.

File with feature table (Optional) type: file flag: -t

If you have masked the mRNA sequences and have generated a table instead of lowercase masked output, you can feed this information to Spidey. The table format is:

sequence id    name of feature      start    stop
NM_004377.1    repetitive_region    12       40

Currently Spidey only recognizes repetitive_region features.
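A feature table like this can be generated from a list of masked intervals. The Python sketch below is a guess at a workable layout: the tab separator and the absence of a header line are assumptions, since this guide shows only the column order. The function name format_feature_table is hypothetical.

```python
def format_feature_table(rows, sep="\t"):
    """Format (sequence_id, start, stop) tuples as feature table lines.

    The separator is an assumption; only the column order comes from
    the guide. Every row is written as a repetitive_region feature,
    the only feature type Spidey currently recognizes.
    """
    lines = []
    for seq_id, start, stop in rows:
        if start > stop:
            raise ValueError(f"start {start} is after stop {stop}")
        lines.append(sep.join([seq_id, "repetitive_region", str(start), str(stop)]))
    return "\n".join(lines) + "\n"

table = format_feature_table([("NM_004377.1", 12, 40)])
```

Write the resulting text to a file and pass that file name to -t.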

Start of genomic interval desired (from) (Optional) type: integer flag: -F

If you know that the mRNA is contained in a certain interval of the genomic sequence, you may restrict Spidey's search to that interval to increase speed and sensitivity. If nothing is entered here, Spidey will search the genomic sequence starting at 0.

Stop of genomic interval desired (to) (Optional) type: integer flag: -T

If you know that the mRNA is contained in a certain interval of the genomic sequence, you may restrict Spidey's search to that interval to increase speed and sensitivity. If nothing is entered here, Spidey will search to the end of the genomic sequence.
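The two interval options can be set together or separately. A small Python sketch of building them, with basic sanity checks, follows; the function name interval_args is hypothetical, and the sketch does not assume anything about the coordinate convention beyond non-negative positions.

```python
def interval_args(start=None, stop=None):
    """Return -F/-T arguments for a genomic interval; either end may be omitted."""
    if start is not None and start < 0:
        raise ValueError("start must not be negative")
    if start is not None and stop is not None and start >= stop:
        raise ValueError("start must be smaller than stop")
    args = []
    if start is not None:
        args += ["-F", str(start)]   # omitted: search from the beginning
    if stop is not None:
        args += ["-T", str(stop)]    # omitted: search to the end
    return args
```

For example, interval_args(1000, 50000) restricts the search to that stretch of the genomic sequence.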

Make a multiple alignment of all input mRNAs (Optional) type: boolean flag: -u

If you have multiple mRNAs and they overlap on the genomic sequence, Spidey can print a multiple alignment of all mRNAs, exon by exon as they overlap. If the mRNAs do not overlap you will get an error message.

Use extra-large intron sizes (Optional) type: boolean flag: -X

The maximum intron size is set at 35kb, with the maximum first and last intron sizes at 100kb. If you have an mRNA with introns larger than these limits, set -X to T to get a maximum internal intron size of 120kb and a maximum first and last intron size of 240kb. Using this option will often result in significantly longer compute times.
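If you already have an idea of the intron sizes you expect, you can check them against these limits before paying the extra compute time. The Python sketch below is only an illustration: the function names are hypothetical, and it assumes that 1kb means 1000 bases and that the first and last entries of the list are the first and last introns.

```python
# Limits from the guide, assuming 1kb = 1000 bases.
DEFAULT_LIMITS = {"internal": 35_000, "terminal": 100_000}
LARGE_LIMITS = {"internal": 120_000, "terminal": 240_000}

def exceeds(intron_lengths, limits):
    """True if any intron is longer than the limit for its position."""
    last = len(intron_lengths) - 1
    for index, length in enumerate(intron_lengths):
        # The first and last introns get the larger "terminal" limit.
        kind = "terminal" if index in (0, last) else "internal"
        if length > limits[kind]:
            return True
    return False

def needs_extra_large(intron_lengths):
    """True if the default limits are too small for these introns."""
    return exceeds(intron_lengths, DEFAULT_LIMITS)
```

An mRNA with expected introns of 50kb, 40kb, and 90kb would need the extra-large setting, because the internal 40kb intron is over the 35kb limit.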

 
