PLoS One. 2007 Jun 27;2(6):e579.

Indexing strategies for rapid searches of short words in genome sequences.

Author information

Ludwig Institute for Cancer Research, Bâtiment Génopode, Université de Lausanne, Lausanne, Switzerland; Swiss Institute of Bioinformatics, Bâtiment Génopode, Université de Lausanne, Lausanne, Switzerland. Christian.Iseli@lic.org

Abstract

Searching for matches between large collections of short (14-30 nucleotides) words and sequence databases comprising full genomes or transcriptomes is a common task in biological sequence analysis. We investigated the performance of simple indexing strategies for handling such tasks and developed two programs, fetchGWI and tagger, that index either the database or the query set. Either strategy outperforms megablast for searches with more than 10,000 probes. FetchGWI is shown to be a versatile tool for rapidly searching multiple genomes, whose performance is limited in most cases by the speed of access to the filesystem. We have made publicly available a Web interface for searching the human, mouse, and several other genomes and transcriptomes with oligonucleotide queries.
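The two strategies named in the abstract, indexing the database versus indexing the query set, can be illustrated with a minimal Python sketch. This is not the fetchGWI or tagger implementation: the function names, the toy sequence, and the probes are all hypothetical, and the sketch assumes exact matches, a single fixed word length k, one strand only, and data small enough to hold in memory.

```python
from collections import defaultdict


def index_database(sequence, k):
    """Map every k-mer in the sequence to its 0-based start positions."""
    index = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return index


def search_with_database_index(index, queries):
    """Strategy 1: look up each query word in a prebuilt database index."""
    return {q: index.get(q, []) for q in queries}


def search_with_query_index(sequence, queries, k):
    """Strategy 2: index the query set, then scan the sequence once."""
    wanted = set(queries)  # hash-based index over the query words
    hits = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        word = sequence[i:i + k]
        if word in wanted:
            hits[word].append(i)
    return {q: hits.get(q, []) for q in queries}


# Toy data (hypothetical); real queries would be 14-30 nucleotides long.
genome = "ACGTACGTTAGC"
probes = ["ACGT", "TAGC", "GGGG"]

db_hits = search_with_database_index(index_database(genome, 4), probes)
query_hits = search_with_query_index(genome, probes, 4)
```

Both functions return the same hit positions. They differ in where the cost falls: the database index is built once and can be reused for any number of query sets, while the query index is cheap to build but requires a full pass over the sequence for every search.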

PMID:
17593978
PMCID:
PMC1894650
DOI:
10.1371/journal.pone.0000579
[Indexed for MEDLINE]
Free PMC Article