Comput Biol Chem. 2014 Feb;48:64-70. doi: 10.1016/j.compbiolchem.2013.11.004. Epub 2013 Dec 1.

Subgrouping Automata: automatic sequence subgrouping using phylogenetic tree-based optimum subgrouping algorithm.

Author information

1. School of Chemical and Biological Engineering, Seoul National University, Seoul 151-742, Republic of Korea; School of Computational Sciences, Korea Institute of Advanced Study, Seoul 130-722, Republic of Korea.
2. School of Chemical and Biological Engineering, Seoul National University, Seoul 151-742, Republic of Korea.
3. Center for Advanced Computation, Korea Institute for Advanced Study, Seoul 130-722, Republic of Korea.
4. School of Computational Sciences, Korea Institute of Advanced Study, Seoul 130-722, Republic of Korea.
5. School of Chemical and Biological Engineering, Seoul National University, Seoul 151-742, Republic of Korea. Electronic address: byungkim@snu.ac.kr.

Abstract

Subgrouping the sequences in a given set enables informative tasks such as functional discrimination of sequence subsets and functional inference for unknown sequences. Because the appropriate identity threshold for subgrouping varies from one sequence set to another, it is highly desirable to have a robust algorithm that automatically identifies an optimal identity threshold and generates subgroups for a given sequence set. To this end, an automatic sequence subgrouping method named 'Subgrouping Automata' (SA) was constructed. First, the tree analysis module analyzes the structure of the phylogenetic tree and enumerates all possible subgroups at each node. The sequence similarity analysis module then calculates the average sequence similarity for all subgroups at each node. The representative sequence generation module finds a representative sequence for each subgroup using profile analysis and self-scoring. Once the average sequence similarities have been calculated for all nodes, SA searches for the node showing the statistically largest increase in sequence similarity, as measured by Student's t-value. The node with the maximum t-value, which marks the most significant difference in average sequence similarity between two adjacent nodes, is taken as the optimum subgrouping node in the phylogenetic tree. Further analysis showed that the optimum subgrouping node chosen by SA prevents both under-subgrouping and over-subgrouping.
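The node-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the t-statistic or how adjacent nodes are paired, so everything here is an assumption made for illustration: the function names `pooled_t` and `optimum_level`, the use of a pooled-variance two-sample Student's t, the representation of each node as a list of per-subgroup average similarities, and the example numbers. It is not the authors' implementation.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(prev, curr):
    """Two-sample Student's t (pooled variance) for mean(curr) - mean(prev).

    Assumes each sample has at least two values and nonzero pooled variance.
    """
    n1, n2 = len(prev), len(curr)
    pooled_var = ((n1 - 1) * variance(prev) + (n2 - 1) * variance(curr)) / (n1 + n2 - 2)
    return (mean(curr) - mean(prev)) / sqrt(pooled_var * (1 / n1 + 1 / n2))


def optimum_level(levels):
    """Return (index, t) of the level with the largest t-value versus its predecessor.

    `levels` is a hypothetical input: one list per tree node (ordered from the
    root outward), holding the average similarity of each subgroup at that node.
    """
    best_index, best_t = None, float("-inf")
    for i in range(1, len(levels)):
        t = pooled_t(levels[i - 1], levels[i])
        if t > best_t:
            best_index, best_t = i, t
    return best_index, best_t


# Made-up similarities: the jump between the second and third node is the sharpest.
levels = [
    [0.30, 0.34],
    [0.42, 0.40, 0.45],
    [0.81, 0.78, 0.84, 0.80],
    [0.85, 0.83, 0.88, 0.86, 0.84],
]
index, t_value = optimum_level(levels)
print(index, round(t_value, 2))
```

In this toy input the third node (index 2) is selected, because splitting further yields only a small additional gain in similarity, while stopping earlier leaves dissimilar sequences grouped together. This mirrors the under-subgrouping and over-subgrouping trade-off the abstract mentions.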

KEYWORDS:

Optimum subgrouping node; Phylogenetic tree; Protein family discrimination; Statistical analysis; Subgrouping

[Indexed for MEDLINE]
