A protein mapping method based on physicochemical properties and dimension reduction

Comput Biol Med. 2015 Feb:57:1-7. doi: 10.1016/j.compbiomed.2014.11.012. Epub 2014 Nov 28.

Abstract

Background: The graphical mapping of a protein sequence is more difficult than that of a DNA sequence because there are twenty amino acids with complicated physicochemical properties. Nevertheless, the problem has attracted many researchers, who have proposed a variety of mapping methods, most of them based on only a few physicochemical properties. In this article, we develop a new, simple, and effective mapping method for protein sequences that takes additional physicochemical properties into account.

Methods: Based on 12 major physicochemical properties of amino acids and principal component analysis (PCA), we propose a simple and intuitive 2D graphical mapping method for protein sequences. We then extract a 20D vector from the graphical mapping and use it to characterize a protein sequence.
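The PCA step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract names neither the 12 properties nor the rule that turns residues into a curve, so the property table here is filled with random placeholder values, the names `placeholder_properties`, `residue_coordinates`, and `map_sequence` are hypothetical, and the cumulative-walk rule is assumed purely for illustration. The 20D descriptor is not specified in the abstract and is not reproduced.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def placeholder_properties(n_props=12, seed=0):
    """20 x n_props table of PLACEHOLDER values (not real measurements)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_props)] for _ in AMINO_ACIDS]

def standardize(rows):
    """Centre each property column and scale it to unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / (len(c) - 1))
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def covariance(rows):
    """Sample covariance matrix of already-centred rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigenvectors(matrix, k=2, iters=500):
    """Approximate the k leading eigenvectors by power iteration,
    orthogonalizing against the vectors already found."""
    d = len(matrix)
    found = []
    for idx in range(k):
        v = [1.0 if i == idx else 0.5 for i in range(d)]  # arbitrary start
        for _ in range(iters):
            w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
            for u in found:
                dot = sum(a * b for a, b in zip(w, u))
                w = [a - dot * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w))
            v = [a / norm for a in w]
        found.append(v)
    return found

def residue_coordinates(properties):
    """Project each amino acid onto the first two principal components."""
    z = standardize(properties)
    pcs = top_eigenvectors(covariance(z), k=2)
    return {aa: tuple(sum(x * p for x, p in zip(row, pc)) for pc in pcs)
            for aa, row in zip(AMINO_ACIDS, z)}

def map_sequence(seq, coords):
    """ASSUMED rule: each residue adds its 2D vector to a running point."""
    x = y = 0.0
    path = [(x, y)]
    for aa in seq:
        dx, dy = coords[aa]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path

coords = residue_coordinates(placeholder_properties())
curve = map_sequence("MKVLA", coords)  # arbitrary example sequence
```

With real property values substituted for the placeholders, `coords` would give one 2D point per amino acid, and `curve` would hold one more point than the sequence has residues. Whether such a walk satisfies the one-to-one and no-circuit properties depends on the actual mapping rule, which the abstract does not state.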

Results: The proposed graphical mapping has three important properties: it is one-to-one, it contains no circuits, and it gives good visualization. Because it draws on additional properties, the mapping carries more physicochemical information. The proposed method is then applied to two separate applications, and the results illustrate its utility.

Discussion: To validate the proposed method, we first compare the sequences of nine ND6 proteins. The similarity/dissimilarity matrix for the nine ND6 proteins correctly reveals their evolutionary relationship. Next, we apply the method to the cluster analysis of HA genes of influenza A (H1N1) isolates. The results are consistent with the known evolutionary history of the H1N1 virus. These two applications further illustrate the utility of the proposed method.
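A similarity/dissimilarity matrix of the kind mentioned above can be sketched as pairwise distances between per-sequence descriptor vectors. The abstract does not say which distance the authors use, so Euclidean distance is assumed here; the function names `distance_matrix` and `closest_pair`, the labels, and the two-component toy vectors are all hypothetical stand-ins for the 20D descriptors.

```python
import math

def distance_matrix(descriptors):
    """Pairwise Euclidean distances (ASSUMED metric) between named vectors."""
    names = list(descriptors)
    return {a: {b: math.dist(descriptors[a], descriptors[b]) for b in names}
            for a in names}

def closest_pair(matrix):
    """Return the two distinct labels separated by the smallest distance."""
    pairs = [(matrix[a][b], a, b) for a in matrix for b in matrix if a < b]
    return min(pairs)[1:]

# Toy vectors for illustration only; they are not derived from any protein.
toy = {"seq_A": [0.0, 0.0], "seq_B": [3.0, 4.0], "seq_C": [10.0, 0.0]}
matrix = distance_matrix(toy)
print(matrix["seq_A"]["seq_B"])  # 5.0
print(closest_pair(matrix))      # ('seq_A', 'seq_B')
```

Smaller entries indicate more similar sequences, so a matrix like this can be passed to any standard hierarchical clustering routine to group sequences.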

Keywords: Dimension reduction; Graphical representation; H1N1; ND6 protein; Physicochemical property; Protein sequence.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Cluster Analysis
  • Computational Biology / methods*
  • Humans
  • Mammals
  • Principal Component Analysis
  • Proteins / chemistry*
  • Sequence Analysis, Protein / methods*

Substances

  • Proteins