Gap opening penalty and gap extension penalty for gaps inside of a sequence used in pairwise global alignment
in the progressive alignment stage.
Gap opening penalty and gap extension penalty for gaps at ends of a sequence used in pairwise global alignment
in the progressive alignment stage.
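To illustrate how an opening penalty and an extension penalty combine, the sketch below scores a single gap under an affine model. It assumes the common convention that a gap of n residues costs open + n * extend. The function name, the convention, and the numeric values are assumptions for illustration, not COBALT's implementation or its defaults.

```python
def gap_cost(length, gap_open, gap_extend):
    """Affine cost of one gap spanning `length` residues (assumed convention)."""
    if length <= 0:
        return 0
    return gap_open + length * gap_extend


# Hypothetical values: gaps inside a sequence cost more than gaps at its ends.
INTERNAL = {"gap_open": 11, "gap_extend": 1}
END = {"gap_open": 5, "gap_extend": 1}

internal_cost = gap_cost(3, **INTERNAL)  # three-residue gap inside a sequence
end_cost = gap_cost(3, **END)            # three-residue gap at a sequence end
```

Using lower end-gap penalties, as in this example, makes overhangs at the ends of sequences cheaper than breaks in the middle of an alignment.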
Use RPS-BLAST to find conserved domains in query sequences to guide alignment. Sequence matches to conserved
domains are converted into pairwise alignment constraints. Ranges of input sequences that match the same
conserved domain will be aligned to each other in the final multiple alignment. We strongly recommend checking
this box.
This box can be unchecked to decrease computation time if all sequences are expected to match the
same conserved domains or not to match any conserved domain. COBALT is optimized for cases where groups of
sequences do match the same domain (see Query Clustering below).
Note: Unchecking this box in other cases will result in a poorer alignment.
E-value threshold for accepting BLASTP hits in pairwise local alignment of input sequences. The accepted
matches are converted into pairwise alignment constraints. Pairwise locally aligned ranges of input
sequences will be aligned to one another in the multiple alignment. The E-value can be increased if very dissimilar
sequences are used.
Note: Changing this value can significantly affect the quality of the resulting alignment.
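The thresholding step described above can be sketched as a simple filter. The `Hit` record, its field names, and the example threshold are hypothetical and only illustrate the idea; they do not reflect COBALT's internal data structures or default values.

```python
from typing import NamedTuple, Tuple


class Hit(NamedTuple):
    """Hypothetical record for one pairwise local alignment hit."""
    query: str
    subject: str
    query_range: Tuple[int, int]
    subject_range: Tuple[int, int]
    evalue: float


def hits_to_constraints(hits, evalue_threshold):
    """Keep hits at or below the threshold and emit them as range pairs."""
    return [
        (h.query, h.query_range, h.subject, h.subject_range)
        for h in hits
        if h.evalue <= evalue_threshold
    ]


hits = [
    Hit("seq1", "seq2", (10, 80), (12, 82), 1e-20),
    Hit("seq1", "seq3", (40, 60), (5, 25), 0.5),
]
constraints = hits_to_constraints(hits, evalue_threshold=0.01)
```

Raising the threshold admits weaker hits, which gives more constraints for dissimilar sequences at the risk of including spurious matches.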
Identify conserved columns after the first iteration of progressive alignment and re-align input sequences using
this information. Unchecking this box will reduce computation time but will also result in a poorer alignment
(especially if the Use query clusters box is checked). We strongly recommend checking this box.
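As a rough illustration of what identifying conserved columns can mean, the sketch below marks ungapped columns in which a single residue reaches a minimum fraction. This criterion, the function name, and the `min_fraction` parameter are assumptions chosen for simplicity; the text above does not specify how COBALT scores column conservation.

```python
from collections import Counter


def conserved_columns(alignment, min_fraction=0.9):
    """Return indices of ungapped columns dominated by one residue.

    `alignment` is a list of equal-length aligned strings using '-' for gaps.
    """
    conserved = []
    for index, column in enumerate(zip(*alignment)):
        if "-" in column:
            continue  # assumed rule: gapped columns are never conserved
        top_count = Counter(column).most_common(1)[0][1]
        if top_count / len(column) >= min_fraction:
            conserved.append(index)
    return conserved


example = ["MKV-A", "MKVLA", "MRVLA"]
columns = conserved_columns(example)
```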
Reduce computation time by using clusters of similar sequences. The idea behind clustering is that constraints
do not contribute information to the alignment of very similar sequences, so the computationally intensive tasks
of identifying conserved domains and a consistent set of constraints can be avoided for many sequences. Clusters of
similar sequences are found using an alignment-free method based on k-mer counting. See Edgar RC, Nucleic Acids Res
32(1):380-5, 2004, PMID: 14729922 for k-mer counting-based sequence similarity.
Constraints will be computed only for cluster representatives. In-cluster sequences will be aligned using combined
local and global alignment. We recommend that the Find Conserved Columns and Recompute Alignment option (above) is
checked when this option is used.
We recommend using this option for aligning BLAST results and whenever a subset of input sequences that
share conserved domains is expected.
This option can be unchecked when aligning sequences that are not expected to share conserved domains and are
expected to have very short pairwise local alignments.
Number of letters in a word (k-mer) for k-mer count-based sequence similarity computation. Shorter words
make sequences appear more similar than longer words do.
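The effect of word size can be seen by counting the words two sequences share. The sketch below is a minimal illustration with made-up peptide strings and hypothetical function names; it counts shared words with multiplicity and is not COBALT's implementation.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_kmers(a, b, k):
    """Number of words common to both sequences, counted with multiplicity."""
    return sum((kmer_counts(a, k) & kmer_counts(b, k)).values())


a = "MKVLAAGIVG"
b = "MKVLSAGIVG"  # differs from `a` at a single position

shared_short = shared_kmers(a, b, 2)  # 7 of the 9 two-letter words match
shared_long = shared_kmers(a, b, 4)   # 3 of the 7 four-letter words match
```

A single substitution disrupts k overlapping words, so the shared fraction drops from 7/9 to 3/7 as k grows from 2 to 4.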
Maximum allowed distance between two sequences in a cluster. This threshold prevents COBALT from forming clusters
of unrelated sequences.
The distance between two sequences is computed from the fraction of words that appear in both sequences relative
to the number of all words in the longer sequence (similar to Edgar RC, Nucleic Acids Res
32(1):380-5, 2004, PMID: 14729922).
This distance overestimates the exponentially scaled percentage of different residues in aligned sequences (see the graphs
in the above paper for details).
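One way to turn the shared-word fraction into a distance is to subtract it from 1, as in the sketch below. The subtraction, the handling of sequences shorter than k, and the function name are assumptions; the exact formula COBALT uses is not given in this text.

```python
from collections import Counter


def kmer_distance(a, b, k):
    """Assumed distance: 1 - (shared words / words in the longer sequence)."""
    counts_a = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    counts_b = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    shared = sum((counts_a & counts_b).values())
    longer = max(sum(counts_a.values()), sum(counts_b.values()))
    if longer == 0:
        return 1.0  # assumption: no words at all means maximally distant
    return 1.0 - shared / longer


identical = kmer_distance("MKVLAAGIVG", "MKVLAAGIVG", 4)
one_change = kmer_distance("MKVLAAGIVG", "MKVLSAGIVG", 4)
unrelated = kmer_distance("AAAA", "CCCC", 2)
```

Under this assumed definition the value always lies between 0 (identical word content) and 1 (no shared words), matching the allowed range of the threshold.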
The allowed range for this threshold is 0 to 1. Smaller values result in more clusters and hence more
conserved domain-based constraints used in the multiple alignment. Larger values result in fewer clusters and
hence less conserved domain information used in the multiple alignment.
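The relationship between the threshold and the number of clusters can be sketched with a simple greedy procedure in which an item joins a cluster only if it lies within the maximum distance of every current member. The greedy strategy, the function name, and the toy numeric distance are assumptions for illustration, not COBALT's clustering algorithm.

```python
def cluster(items, distance, max_dist):
    """Greedy clustering: join the first cluster whose members are all close."""
    clusters = []
    for item in items:
        for members in clusters:
            if all(distance(item, m) <= max_dist for m in members):
                members.append(item)
                break
        else:
            # No existing cluster fits, so the item starts a new one.
            clusters.append([item])
    return clusters


def toy_distance(x, y):
    """Stand-in for a sequence distance, using plain numbers."""
    return abs(x - y)


points = [0.0, 0.1, 0.9, 1.0]
tight = cluster(points, toy_distance, 0.05)  # every item forms its own cluster
loose = cluster(points, toy_distance, 0.2)   # two clusters of two items each
```

With a representative taken from each cluster, the tighter threshold leaves four representatives for constraint computation and the looser one leaves two.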