Division of Mathematics, Computer Science and Statistics, University of Texas, San Antonio 78249-0664.
An efficient algorithm is described for finding matches, repeats and other word relations, allowing for errors, in large data sets of long molecular sequences. The algorithm hashes fixed-size words and uses a linked list to connect all occurrences of the same word. The average memory and run-time requirements both increase almost linearly with the total sequence length. Some results of the program's performance on a database of Escherichia coli DNA sequences are presented.
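The core data structure named in the abstract, a hash on fixed-size words plus a linked list through all occurrences of each word, can be illustrated with a minimal sketch. This is not the authors' program: the function names `build_word_index` and `occurrences`, the use of a Python dict as the hash table, and the array-based link list are all assumptions made for illustration, and the sketch handles exact word matches only, not the error-tolerant matching the abstract mentions.

```python
def build_word_index(seq, k):
    """Index every length-k word of seq (hypothetical helper).

    Returns (head, link):
      head[word] is the position of the last occurrence of word;
      link[i] is the position of the previous occurrence of the word
      starting at i, or -1 if there is none.
    Together they form one linked list per distinct word.
    """
    n_words = max(len(seq) - k + 1, 0)
    head = {}
    link = [-1] * n_words
    for i in range(n_words):
        word = seq[i:i + k]
        link[i] = head.get(word, -1)  # chain to the earlier occurrence
        head[word] = i                # this position is now the list head
    return head, link


def occurrences(head, link, word):
    """Walk the linked list for word and return its positions in ascending order."""
    positions = []
    i = head.get(word, -1)
    while i != -1:
        positions.append(i)
        i = link[i]
    positions.reverse()
    return positions


head, link = build_word_index("ACGTACGTTACG", 3)
print(occurrences(head, link, "ACG"))
```

Building the index visits each position once and stores one link per position, which is consistent with the near-linear growth in memory and run time that the abstract reports. Each word lookup then costs time proportional to the number of its occurrences.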