PLoS One. 2014 Feb 24;9(2):e89890. doi: 10.1371/journal.pone.0089890. eCollection 2014.

Evaluation of sequence features from intrinsically disordered regions for the estimation of protein function.

Author information

1. School of Engineering and Physics, The University of the South Pacific, Suva, Fiji; Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia.
2. Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia; National Information and Communication Technology Australia (NICTA), Brisbane, Australia.
3. School of Engineering, Griffith University, Brisbane, Australia.
4. Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, Japan.

Abstract

With the exponential increase in the number of sequenced organisms, automated annotation of proteins is becoming increasingly important. Intrinsically disordered regions are known to play a significant role in protein function. Despite their abundance, especially in eukaryotes, they are rarely used to inform function prediction systems. In this study, we extracted seven sequence features from intrinsically disordered regions and developed a scheme that uses them to predict the Gene Ontology (GO) Slim terms associated with proteins. We evaluated the function prediction performance of each feature. Our results indicate that the features based on residue composition have the highest precision, while bigram probabilities, based on sequence profiles of intrinsically disordered regions obtained from PSI-BLAST, have the highest recall. Amino acid bigrams and features based on secondary structure show an intermediate level of precision and recall. Almost all features showed high prediction performance for GO Slim terms related to extracellular matrix, nucleus, RNA binding and DNA binding. However, feature performance varied significantly across GO Slim terms, emphasizing the need for a separate classifier optimized for the prediction of each functional term. These findings provide a first comprehensive and quantitative evaluation of sequence features in intrinsically disordered regions and will help in the development of a more informative protein function predictor.
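As a rough illustration of two of the feature types named in the abstract (residue composition and amino acid bigrams restricted to disordered regions), the following Python sketch may help. It is not the authors' implementation: the abstract does not give the exact feature definitions, so the normalisation used here, the function names, and the per-residue disorder mask (assumed to come from some external disorder predictor) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def disordered_segments(sequence, disorder_mask):
    """Return maximal runs of residues flagged as disordered.

    disorder_mask is a hypothetical per-residue sequence of 0/1 flags,
    assumed to be produced by an external disorder predictor.
    """
    segments, current = [], []
    for residue, flag in zip(sequence, disorder_mask):
        if flag:
            current.append(residue)
        elif current:
            segments.append("".join(current))
            current = []
    if current:
        segments.append("".join(current))
    return segments


def residue_composition(segments):
    """Fraction of each standard residue over all disordered residues."""
    counts = Counter(r for seg in segments for r in seg if r in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}


def bigram_frequencies(segments):
    """Relative frequency of adjacent residue pairs within each segment.

    Pairs are never formed across segment boundaries (an assumption).
    """
    counts = Counter()
    for seg in segments:
        for a, b in zip(seg, seg[1:]):
            if a in AMINO_ACIDS and b in AMINO_ACIDS:
                counts[a + b] += 1
    total = sum(counts.values())
    return {
        a + b: (counts[a + b] / total if total else 0.0)
        for a, b in product(AMINO_ACIDS, repeat=2)
    }


if __name__ == "__main__":
    # Toy sequence and mask, invented purely for this example.
    seq = "MKSSPEELLA"
    mask = [0, 1, 1, 1, 1, 0, 0, 1, 1, 0]
    segs = disordered_segments(seq, mask)
    comp = residue_composition(segs)
    bigrams = bigram_frequencies(segs)
    print(segs)
    print({aa: round(f, 3) for aa, f in comp.items() if f > 0})
    print({bg: f for bg, f in bigrams.items() if f > 0})
```

The resulting 20-dimensional composition vector and 400-dimensional bigram vector could then be fed to a per-term classifier, which fits the abstract's observation that each GO Slim term may need its own optimized classifier. The profile-based bigram probabilities mentioned in the abstract would additionally require a PSI-BLAST position-specific scoring matrix, which this sketch does not cover.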

PMID: 24587103
PMCID: PMC3933697
DOI: 10.1371/journal.pone.0089890
[Indexed for MEDLINE]
Free PMC Article
