Display Settings:

Format

Send to:

Choose Destination
We are sorry, but NCBI web applications do not support your browser and may not function properly. More information
    BMC Bioinformatics. 2008 Apr 15;9:197. doi: 10.1186/1471-2105-9-197.

    Accelerating string set matching in FPGA hardware for bioinformatics research.

    Source

    Institute of Digital Biology, Mississippi State University, Mississippi 39762, USA. yogi@cse.msstate.edu

    Abstract

    BACKGROUND:

    This paper describes techniques for accelerating the performance of the string set matching problem with particular emphasis on applications in computational proteomics. The process of matching peptide sequences against a genome translated in six reading frames is part of a proteogenomic mapping pipeline that is used as a case-study. The Aho-Corasick algorithm is adapted for execution in field programmable gate array (FPGA) devices in a manner that optimizes space and performance. In this approach, the traditional Aho-Corasick finite state machine (FSM) is split into smaller FSMs, operating in parallel, each of which matches up to 20 peptides in the input translated genome. Each of the smaller FSMs is further divided into five simpler FSMs such that each simple FSM operates on a single bit position in the input (five bits are sufficient for representing all amino acids and special symbols in protein sequences).

    RESULTS:

    This bit-split organization of the Aho-Corasick implementation enables efficient utilization of the limited random access memory (RAM) resources available in typical FPGAs. The use of on-chip RAM as opposed to FPGA logic resources for FSM implementation also enables rapid reconfiguration of the FPGA without the place and routing delays associated with complex digital designs.

    CONCLUSION:

    Experimental results show storage efficiencies of over 80% for several data sets. Furthermore, the FPGA implementation executing at 100 MHz is nearly 20 times faster than an implementation of the traditional Aho-Corasick algorithm executing on a 2.67 GHz workstation.

    PMID:
    18412963
    [PubMed - indexed for MEDLINE]
    PMCID:
    PMC2374783
    Free PMC Article

    Images from this publication.See all images (4)Free text

    Figure 1
    Figure 2
    Figure 3
    Figure 4

      Supplemental Content

      Icon for BioMed Central Icon for PubMed Central

      Save items

      Recent activity

      Your browsing activity is empty.

      Activity recording is turned off.

      Turn recording back on

      See more...
      Write to the Help Desk