Using hybrid hierarchical K-means (HHK) clustering algorithm for protein sequence motif super-rule-tree (SRT) structure construction

Int J Data Min Bioinform. 2010;4(3):316-30. doi: 10.1504/ijdmb.2010.033523.

Abstract

Many motif-discovery algorithms require a fixed window size to be specified in advance. Because the window size is fixed, these approaches often report several similar motifs that differ only by a shift of a few positions or by a few mismatches. To address this mismatched-motif problem, we use the super-rule concept to construct a Super-Rule-Tree (SRT) with a modified Hybrid Hierarchical K-means (HHK) clustering algorithm, which requires no parameter set-up to identify the similarities and dissimilarities between motifs. Analysis of the motifs generated by our approach shows that they are significant not only at the sequence level but also in secondary structure similarity.
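The abstract does not describe the modified HHK procedure itself, so the following is only a minimal sketch of the underlying idea: motifs that differ by a small shift or a few mismatches can be grouped under a common parent node of a tree. It uses plain average-linkage agglomerative clustering as a stand-in, not the authors' algorithm. The function names (`shifted_distance`, `average_linkage`, `build_tree`), the `max_shift` parameter, and the toy sequences are all hypothetical and chosen for illustration; unlike the approach described above, this sketch does rely on a user-chosen parameter.

```python
from itertools import combinations


def shifted_distance(a, b, max_shift=2):
    """Smallest mismatch fraction between a and b over small relative shifts."""
    best = 1.0
    for shift in range(-max_shift, max_shift + 1):
        # Align a[i] with b[i + shift]; compare only the overlapping positions.
        pairs = [(a[i], b[i + shift]) for i in range(len(a))
                 if 0 <= i + shift < len(b)]
        if not pairs:
            continue
        mismatches = sum(x != y for x, y in pairs)
        best = min(best, mismatches / len(pairs))
    return best


def average_linkage(xs, ys, max_shift):
    """Mean pairwise shifted distance between two groups of motifs."""
    total = sum(shifted_distance(x, y, max_shift) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def build_tree(motifs, max_shift=2):
    """Bottom-up clustering; returns a nested tuple whose leaves are motifs."""
    # Each cluster is (subtree, members): subtree is a motif or a (left, right) pair.
    clusters = [(m, [m]) for m in motifs]
    while len(clusters) > 1:
        # Pick the two clusters with the smallest average linkage and merge them.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(
                clusters[p[0]][1], clusters[p[1]][1], max_shift),
        )
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


# Toy, invented motifs: two pairs that differ only by a one-position shift.
motifs = ["ACDEFGHIK", "CDEFGHIKL", "WWYYPPTTS", "WYYPPTTSQ"]
tree = build_tree(motifs)
print(tree)
```

In this toy run each shifted pair is merged first, so the two inner nodes play the role of higher-level rules that summarise near-duplicate motifs, and the root joins the two unrelated groups.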

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Amino Acid Motifs*
  • Amino Acid Sequence
  • Cluster Analysis
  • Pattern Recognition, Automated / methods
  • Protein Structure, Secondary
  • Proteins / chemistry*
  • Sequence Analysis, Protein / methods*

Substances

  • Proteins