Bioinformatics. 2000 Oct;16(10):915-22.

CAST: an iterative algorithm for the complexity analysis of sequence tracts.

Author information

  • Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Athens GR-15701, Greece.

Abstract

MOTIVATION:

Sensitive detection and masking of low-complexity regions in protein sequences. Filtered sequences can be used in sequence comparison without the risk of matching compositionally biased regions. The main advantage of the method over similar approaches is the selective masking of single residue types without affecting other, possibly important, regions.

RESULTS:

A novel algorithm for low-complexity region detection and selective masking. The algorithm is based on multiple-pass Smith-Waterman comparison, with infinite gap penalties, of the query sequence against twenty homopolymers, one per amino acid type. The output of the algorithm is both the masked query sequence for further analysis, e.g. database searches, and the list of low-complexity regions. The detection of low-complexity regions is highly specific for single residue types. It is shown that this approach is sufficient for masking database query sequences without generating false positives. The algorithm is benchmarked against widely available algorithms using the 210 genes of Plasmodium falciparum chromosome 2, a dataset known to contain a large number of low-complexity regions.
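
The idea described above can be illustrated with a minimal sketch. This is not the CAST implementation. It relies on one observation: when gaps are forbidden, the best local alignment of a query against a homopolymer of residue a is simply the highest-scoring contiguous segment of the query, with every position scored against a. The match and mismatch scores, the threshold, and the names toy_score, best_segment and cast_like_mask are all assumptions made for illustration; the abstract does not state the substitution matrix or the cutoff that CAST uses.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the twenty homopolymer residue types


def toy_score(a, b, match=2, mismatch=-1):
    """Hypothetical scoring: a stand-in for a real substitution matrix."""
    return match if a == b else mismatch


def best_segment(seq, residue, score=toy_score):
    """Best ungapped local alignment of seq against a homopolymer of `residue`.

    Returns (score, start, end) with `end` exclusive. With gaps forbidden this
    is a maximum-scoring-segment search over per-position scores.
    """
    best = (0, 0, 0)
    running, start = 0, 0
    for i, ch in enumerate(seq):
        if running <= 0:
            # A non-positive prefix can never help; restart the segment here.
            running, start = 0, i
        running += score(ch, residue)
        if running > best[0]:
            best = (running, start, i + 1)
    return best


def cast_like_mask(seq, threshold=10, score=toy_score):
    """Repeatedly mask the top-scoring single-residue region (assumed scheme).

    Each pass compares the sequence with all twenty homopolymers, takes the
    highest-scoring region, and replaces only the matching residue type in
    that region with 'X'. Stops when no region reaches `threshold`.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    residues = list(seq)
    regions = []
    while True:
        candidates = (best_segment(residues, aa, score) + (aa,) for aa in AMINO_ACIDS)
        top_score, start, end, aa = max(candidates, key=lambda c: c[0])
        if top_score < threshold:
            break
        regions.append((aa, start, end, top_score))
        for i in range(start, end):
            if residues[i] == aa:  # selective: other residue types are kept
                residues[i] = "X"
    return "".join(residues), regions


if __name__ == "__main__":
    masked, found = cast_like_mask("MKNNNNNDNNNNNKLAV")
    print(masked)
    print(found)
```

In this sketch the aspartate that interrupts the asparagine run stays unmasked, which mirrors the selective masking of single residue types described in the abstract. Each pass masks at least one residue, so the loop always terminates.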

AVAILABILITY:

CAST (version 1.0) executable binaries are available to academic users free of charge under license. Web site entry point, server and additional material: http://www.ebi.ac.uk/research/cgg/services/cast/

PMID:
11120681
[PubMed - indexed for MEDLINE]
Free full text