Large margin classifiers and Random Forests for integrated biological prediction

Int J Bioinform Res Appl. 2012;8(1-2):38-53. doi: 10.1504/IJBRA.2012.045975.

Abstract

Incorporating multiple sources of biological information is important for biological discovery. Genes, for example, have a multiview representation: they can be described by features such as sequence length and pairwise similarities, so feature types range from numerical to categorical. We propose a large margin Random Forests (RF) classification approach based on RF proximity kernels. Random Forests accommodate mixed data types naturally. On four biological datasets, the approach performs promisingly compared with other state-of-the-art methods, including Support Vector Machines (SVMs) and RF classifiers, and shows high potential for discovering the functional roles of biomolecules.
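The abstract does not spell out how the RF proximity kernel is constructed. A minimal sketch, assuming the common definition in which the proximity of two samples is the fraction of trees where both fall into the same leaf, might look like the following. The function name `proximity_kernel` and the leaf assignments are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def proximity_kernel(leaves_a, leaves_b):
    """Fraction of trees in which each pair of samples shares a leaf.

    leaves_a, leaves_b: one row per sample, one leaf id per tree.
    Assumed definition of RF proximity; the paper's exact kernel may differ.
    """
    n_trees = len(leaves_a[0])
    kernel = []
    for row_a in leaves_a:
        kernel_row = []
        for row_b in leaves_b:
            # Count the trees where both samples land in the same leaf.
            shared = sum(1 for la, lb in zip(row_a, row_b) if la == lb)
            kernel_row.append(shared / n_trees)
        kernel.append(kernel_row)
    return kernel


# Hypothetical leaf assignments: 3 samples, 4 trees.
leaves = [
    [1, 5, 2, 7],
    [1, 5, 3, 7],
    [4, 6, 3, 8],
]
K = proximity_kernel(leaves, leaves)
```

Because the forest splits on numerical and categorical features alike, such a kernel never compares raw feature values directly. With scikit-learn, for instance, the leaf ids could plausibly come from `RandomForestClassifier.apply`, and the resulting matrix could be passed to a large margin classifier such as `SVC(kernel="precomputed")`; whether this matches the authors' pipeline is an assumption.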

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Computational Biology / methods*
  • Gene Expression Profiling / methods
  • Pattern Recognition, Automated
  • Support Vector Machine