Phylogenetic inference in protein superfamilies: analysis of SH2 domains

Proc Int Conf Intell Syst Mol Biol. 1998;6:165-74.

Abstract

This work focuses on the inference of evolutionary relationships in protein superfamilies, and on the use of these relationships to identify key positions in the structure, to infer attributes on the basis of evolutionary distance, and to identify potential errors in sequence annotations. Relative entropy, a distance measure from information theory, is used in combination with Dirichlet mixture priors to estimate a phylogenetic tree for a set of proteins. This method infers key structural or functional positions in the molecule, and guides the tree topology to preserve these important positions within subtrees. Minimum-description-length principles are used to determine a cut of the tree into subtrees, which identifies the subfamilies in the data. The method is demonstrated on SH2-domain-containing proteins, resulting in a new subfamily assignment for Src2_drome and a suggested evolutionary relationship between Nck_human and Drk_drome, Sem5_caeel, Grb2_human and Grb2_chick.
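
As an illustration of the distance measure named in the abstract, the following minimal Python sketch computes relative entropy between the amino-acid distributions of two alignment columns. It is not the authors' implementation: it assumes a single symmetric Dirichlet prior (plain pseudocounts) in place of Dirichlet mixture priors, and the function names, the pseudocount value, and the toy columns are all hypothetical. Relative entropy is asymmetric, so the sketch also shows one common symmetrized form.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distribution(column, pseudocount=1.0):
    """Estimate amino-acid probabilities for one alignment column.

    Assumption: a single symmetric Dirichlet prior (add-pseudocount
    smoothing), a simplification of the Dirichlet mixture priors
    mentioned in the abstract. Gap and unknown characters are ignored.
    """
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def relative_entropy(p, q):
    """Relative entropy D(p || q) in bits. Asymmetric in p and q."""
    return sum(p[a] * math.log2(p[a] / q[a]) for a in AMINO_ACIDS)


def symmetric_divergence(p, q):
    """Sum of both directions, one simple way to symmetrize."""
    return relative_entropy(p, q) + relative_entropy(q, p)


# Toy columns (hypothetical data): the same position in three subfamilies.
conserved_w = column_distribution("WWWWW")
mostly_w = column_distribution("WWWWF")
conserved_a = column_distribution("AAAAA")

near = symmetric_divergence(conserved_w, mostly_w)
far = symmetric_divergence(conserved_w, conserved_a)
print(f"W vs mostly-W: {near:.3f} bits")
print(f"W vs A:        {far:.3f} bits")
```

Columns that conserve different residues yield a larger divergence than columns that largely agree, which is the kind of signal a tree-building procedure could use to keep subfamily-defining positions together within subtrees.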

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Artificial Intelligence
  • Bayes Theorem
  • Binding Sites / genetics
  • Evolution, Molecular
  • Humans
  • Likelihood Functions
  • Molecular Sequence Data
  • Phylogeny*
  • Proteins / chemistry
  • Proteins / classification*
  • Proteins / genetics*
  • Sequence Homology, Amino Acid
  • Software
  • src Homology Domains / genetics*

Substances

  • Proteins