Comput Biol Chem. 2007 Apr;31(2):138-42. Epub 2007 Feb 23.

Integrating subcellular location for improving machine learning models of remote homology detection in eukaryotic organisms.

Author information

Computational Biology & Bioinformatics, Pacific Northwest National Laboratory, Richland, WA 99352, USA.


A significant challenge in homology detection is to identify sequences that share a common evolutionary ancestor despite substantial primary sequence divergence. Remote homologs often have less than 30% sequence identity, yet still retain common structural and functional properties. We demonstrate a novel method for identifying remote homologs using a support vector machine (SVM) classifier trained by fusing sequence similarity scores and subcellular location prediction. SVMs have been shown to perform well in a variety of applications where binary classification of data is the goal. At the same time, data fusion methods have been shown to be highly effective in enhancing the discriminative power of data. Combining these two approaches in the application SVM-SimLoc resulted in identification of significantly more remote homologs (p-value < 0.006) than using either sequence similarity or subcellular location alone.
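The abstract does not describe how the two evidence sources are combined or which kernel is used, so the following is only a minimal sketch of one plausible reading: feature-level fusion by concatenating a vector of sequence similarity scores with a vector of predicted subcellular location probabilities, followed by a linear SVM fitted by subgradient descent on the hinge loss. The function names (`fuse`, `train_linear_svm`, `predict`), the three-compartment location vector, the toy data, and the hyperparameters are all assumptions made for illustration and are not taken from SVM-SimLoc.

```python
# Illustrative sketch only: the feature layout, toy data, and hyperparameters
# are assumptions, not the SVM-SimLoc implementation.

def fuse(similarity_scores, location_probs):
    """Feature-level fusion: concatenate the two evidence vectors."""
    return list(similarity_scores) + list(location_probs)


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit a linear SVM by subgradient descent on the L2-regularized hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (dot(w, xi) + b)
            # Shrink the weights (subgradient of the L2 penalty).
            w = [wj * (1.0 - lr * lam) for wj in w]
            # Inside the margin or misclassified: step along the hinge subgradient.
            if margin < 1.0:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w, b, x):
    """Return +1 (predicted homolog) or -1 (predicted non-homolog)."""
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical toy data: two similarity scores per sequence pair, plus
# predicted probabilities over three made-up compartments; label +1 = homolog.
examples = [
    ([0.9, 0.8], [1.0, 0.0, 0.0], +1),
    ([0.7, 0.9], [1.0, 0.0, 0.0], +1),
    ([0.2, 0.1], [0.0, 1.0, 0.0], -1),
    ([0.1, 0.3], [0.0, 0.0, 1.0], -1),
]
X = [fuse(scores, location) for scores, location, _ in examples]
y = [label for _, _, label in examples]

w, b = train_linear_svm(X, y)
predictions = [predict(w, b, x) for x in X]

# An unseen pair with high similarity and the same predicted compartment.
query = fuse([0.8, 0.85], [1.0, 0.0, 0.0])
query_label = predict(w, b, query)
```

Concatenation is the simplest fusion scheme; combining separate kernels per data source is another common option, and the abstract does not say which one the authors used.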

[Indexed for MEDLINE]
