Bio-support vector machines for computational proteomics

Zheng Rong Yang; Kuo-Chen Chou

doi:10.1093/bioinformatics/btg477

Bioinformatics. 2004 Mar 22;20(5):735-41. doi: 10.1093/bioinformatics/btg477. Epub 2004 Jan 29.

Authors

Zheng Rong Yang¹, Kuo-Chen Chou

Affiliation

¹ Department of Computer Science, Exeter University, Exeter EX4 4PT, UK. Z.R.Yang@exeter.ac.uk

PMID: 14751989

Abstract

Motivation: One of the most important issues in computational proteomics is to produce a prediction model for the classification or annotation of the biological function of novel protein sequences. To improve prediction accuracy, much attention has been paid to improving the performance of the algorithms used, but little to the fundamental issue of amino acid encoding, which arises because most existing pattern recognition algorithms are unable to recognize amino acids in protein sequences. Importantly, the most commonly used amino acid encoding method has a flaw that leads to large computational cost and recognition bias.
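
The abstract does not name the encoding method it criticizes. As a minimal sketch, assuming it refers to orthogonal (one-hot) encoding, the Python below shows how each residue expands into a 20-dimensional indicator vector, and how any two different residues end up equally dissimilar regardless of their biochemistry. The alphabet ordering, the helper names `one_hot_encode` and `dot`, and the example peptide are illustrative choices, not taken from the paper.

```python
# Assumption: "the most commonly used encoding" is one-hot (orthogonal) encoding.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, arbitrary order


def one_hot_encode(peptide):
    """Concatenate one 20-bit indicator vector per residue."""
    vector = []
    for residue in peptide:
        bits = [0] * len(AMINO_ACIDS)
        bits[AMINO_ACIDS.index(residue)] = 1
        vector.extend(bits)
    return vector


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# An 8-residue peptide chosen only for illustration.
encoded = one_hot_encode("SQNYPIVQ")
dimension = len(encoded)  # 8 residues * 20 bits per residue

# Different residues are orthogonal, however similar they are chemically.
leu_vs_ile = dot(one_hot_encode("L"), one_hot_encode("I"))
leu_vs_asp = dot(one_hot_encode("L"), one_hot_encode("D"))
```

Under this reading, the input dimension grows by a factor of 20 relative to the sequence length, and leucine is treated as being as far from isoleucine as it is from aspartate, which is one plausible interpretation of the computational cost and recognition bias mentioned above.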

Results: By replacing the kernel functions of support vector machines (SVMs) with amino acid similarity measurement matrices, we have modified SVMs into a new type of pattern recognition algorithm for analysing protein sequences, particularly for proteolytic cleavage site prediction. We refer to the modified SVMs as bio-support vector machines. When applied to the prediction of HIV protease cleavage sites, the new method showed a remarkable advantage in reducing model complexity and enhancing model robustness.
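
The general idea of scoring peptides with a residue similarity measure instead of a numeric encoding can be sketched as follows. This is not the authors' formulation: the abstract does not say which similarity matrix or kernel form was used, so the code substitutes a hypothetical toy similarity (2 for identical residues, 1 for residues in the same coarse physicochemical group, 0 otherwise) where a real substitution matrix would go. The grouping and the names `residue_similarity`, `peptide_similarity` and `gram_matrix` are assumptions made for illustration.

```python
# Hypothetical coarse grouping; a stand-in for a real amino acid similarity matrix.
GROUPS = {
    "hydrophobic": "AVLIMFWC",
    "polar": "STNQYGP",
    "positive": "KRH",
    "negative": "DE",
}
GROUP_OF = {aa: name for name, members in GROUPS.items() for aa in members}


def residue_similarity(a, b):
    """Toy similarity: 2 if identical, 1 if same group, 0 otherwise."""
    if a == b:
        return 2
    if GROUP_OF[a] == GROUP_OF[b]:
        return 1
    return 0


def peptide_similarity(x, y):
    """Sum position-wise residue similarities of two equal-length peptides."""
    if len(x) != len(y):
        raise ValueError("peptides must have the same length")
    return sum(residue_similarity(a, b) for a, b in zip(x, y))


def gram_matrix(peptides):
    """Pairwise similarity matrix, computed directly on the sequences."""
    return [[peptide_similarity(p, q) for q in peptides] for p in peptides]


# Illustrative peptides only; the second differs from the first by one I -> L swap.
peptides = ["SQNYPIVQ", "SQNYPLVQ", "DDDDKKKK"]
K = gram_matrix(peptides)
```

Here the two peptides that differ by a conservative substitution score almost as high against each other as against themselves, and no 160-dimensional vector is ever built. A matrix like `K` could then be passed to an SVM solver that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Note that an arbitrary substitution matrix is not guaranteed to yield a positive semidefinite kernel, so a real implementation would need to address that.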

Publication types

Comparative Study
Evaluation Study
Validation Study

MeSH terms

Algorithms
Amino Acid Sequence
Artificial Intelligence*
Binding Sites
Computational Biology / methods
Computing Methodologies
Databases, Protein
Molecular Sequence Data
Pattern Recognition, Automated*
Protein Binding
Proteins / chemistry*
Proteins / metabolism*
Proteome / chemistry
Proteome / metabolism
Proteomics / methods*
Sequence Alignment / methods*
Sequence Analysis, Protein / methods*
Structure-Activity Relationship

Substances

Proteins
Proteome