Format

Send to

Choose Destination
Bioinformatics. 2015 May 15;31(10):1544-52. doi: 10.1093/bioinformatics/btu851. Epub 2015 Jan 8.

PANNZER: high-throughput functional annotation of uncharacterized proteins in an error-prone environment.

Author information

1
Department of Biosciences, University of Helsinki, 00014 Helsinki, Finland and Institute of Biotechnology, University of Helsinki, 00014 Helsinki, Finland.
2
Department of Biosciences, University of Helsinki, 00014 Helsinki, Finland and Institute of Biotechnology, University of Helsinki, 00014 Helsinki, Finland Department of Biosciences, University of Helsinki, 00014 Helsinki, Finland and Institute of Biotechnology, University of Helsinki, 00014 Helsinki, Finland.

Abstract

MOTIVATION:

The last decade has seen a remarkable growth in protein databases. This growth comes at a price: a growing number of submitted protein sequences lack functional annotation. Approximately 32% of sequences submitted to the most comprehensive protein database UniProtKB are labelled as 'Unknown protein' or alike. Also the functionally annotated parts are reported to contain 30-40% of errors. Here, we introduce a high-throughput tool for more reliable functional annotation called Protein ANNotation with Z-score (PANNZER). PANNZER predicts Gene Ontology (GO) classes and free text descriptions about protein functionality. PANNZER uses weighted k-nearest neighbour methods with statistical testing to maximize the reliability of a functional annotation.

RESULTS:

Our results in free text description line prediction show that we outperformed all competing methods with a clear margin. In GO prediction we show clear improvement to our older method that performed well in CAFA 2011 challenge.

PMID:
25653249
DOI:
10.1093/bioinformatics/btu851
[Indexed for MEDLINE]

Supplemental Content

Full text links

Icon for Silverchair Information Systems
Loading ...
Support Center