Genome Biol. 2016 Sep 7;17(1):184. doi: 10.1186/s13059-016-1037-6.

An expanded evaluation of protein function prediction methods shows an improvement in accuracy.

Jiang Y1, Oron TR2, Clark WT3, Bankapur AR4, D'Andrea D5, Lepore R5, Funk CS6, Kahanda I7, Verspoor KM8,9, Ben-Hur A7, Koo da CE10, Penfold-Brown D11,12, Shasha D13, Youngs N12,13,14, Bonneau R13,14,15, Lin A16, Sahraeian SM17, Martelli PL18, Profiti G18, Casadio R18, Cao R19, Zhong Z19, Cheng J19, Altenhoff A20,21, Skunca N20,21, Dessimoz C22,23,24, Dogan T25, Hakala K26,27, Kaewphan S26,27,28, Mehryary F26,27, Salakoski T26,28, Ginter F26, Fang H29, Smithers B29, Oates M29, Gough J29, Törönen P30, Koskinen P30, Holm L30,31, Chen CT32, Hsu WL32, Bryson K22, Cozzetto D22, Minneci F22, Jones DT22, Chapman S33, Bkc D33, Khan IK34, Kihara D34,35, Ofer D36, Rappoport N36,37, Stern A36,37, Cibrian-Uhalte E25, Denny P38, Foulger RE38, Hieta R25, Legge D25, Lovering RC38, Magrane M25, Melidoni AN38, Mutowo-Meullenet P25, Pichler K25, Shypitsyna A25, Li B2, Zakeri P39,40, ElShal S39,40, Tranchevent LC41,42,43, Das S44, Dawson NL44, Lee D44, Lees JG44, Sillitoe I44, Bhat P45, Nepusz T46, Romero AE47, Sasidharan R48, Yang H49, Paccanaro A47, Gillis J50, Sedeño-Cortés AE51, Pavlidis P52, Feng S1, Cejuela JM53, Goldberg T53, Hamp T53, Richter L53, Salamov A54, Gabaldon T55,56,57, Marcet-Houben M55,56, Supek F56,58,59, Gong Q60,61, Ning W60,61, Zhou Y60,61, Tian W60,61, Falda M62, Fontana P63, Lavezzo E62, Toppo S62, Ferrari C64, Giollo M64,65, Piovesan D64, Tosatto SC64, Del Pozo A66, Fernández JM67, Maietta P68, Valencia A68, Tress ML68, Benso A69, Di Carlo S69, Politano G69, Savino A69, Rehman HU70, Re M71, Mesiti M71, Valentini G71, Bargsten JW72, van Dijk AD72,73, Gemovic B74, Glisic S74, Perovic V74, Veljkovic V74, Veljkovic N74, Almeida-E-Silva DC75, Vencio RZ75, Sharan M76, Vogel J76, Kansakar L77, Zhang S77, Vucetic S77, Wang Z78, Sternberg MJ79, Wass MN80, Huntley RP25, Martin MJ25, O'Donovan C25, Robinson PN81, Moreau Y82, Tramontano A5, Babbitt PC83, Brenner SE17, Linial M84, Orengo CA44, Rost B53, Greene CS85, Mooney SD86, Friedberg I87,88, Radivojac P89.

Author information

1
Department of Computer Science and Informatics, Indiana University, Bloomington, IN, USA.
2
Buck Institute for Research on Aging, Novato, CA, USA.
3
Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA.
4
Department of Microbiology, Miami University, Oxford, OH, USA.
5
University of Rome, La Sapienza, Rome, Italy.
6
Computational Bioscience Program, University of Colorado School of Medicine, Aurora, CO, USA.
7
Department of Computer Science, Colorado State University, Fort Collins, CO, USA.
8
Department of Computing and Information Systems, University of Melbourne, Parkville, Victoria, Australia.
9
Health and Biomedical Informatics Centre, University of Melbourne, Parkville, Victoria, Australia.
10
Department of Biology, New York University, New York, NY, USA.
11
Social Media and Political Participation Lab, New York University, New York, NY, USA.
12
CY Data Science, New York, NY, USA.
13
Department of Computer Science, New York University, New York, NY, USA.
14
Simons Center for Data Analysis, New York, NY, USA.
15
Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA.
16
Department of Electrical Engineering and Computer Sciences, University of California Berkeley, Berkeley, CA, USA.
17
Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA, USA.
18
Biocomputing Group, BiGeA, University of Bologna, Bologna, Italy.
19
Computer Science Department, University of Missouri, Columbia, MO, USA.
20
ETH Zurich, Zurich, Switzerland.
21
Swiss Institute of Bioinformatics, Zurich, Switzerland.
22
Bioinformatics Group, Department of Computer Science, University College London, London, UK.
23
University of Lausanne, Lausanne, Switzerland.
24
Swiss Institute of Bioinformatics, Lausanne, Switzerland.
25
European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK.
26
Department of Information Technology, University of Turku, Turku, Finland.
27
University of Turku Graduate School, University of Turku, Turku, Finland.
28
Turku Centre for Computer Science, Turku, Finland.
29
University of Bristol, Bristol, UK.
30
Institute of Biotechnology, University of Helsinki, Helsinki, Finland.
31
Department of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland.
32
Institute of Information Science, Academia Sinica, Taipei, Taiwan.
33
Department of Computational Science and Engineering, North Carolina A&T State University, Greensboro, NC, USA.
34
Department of Computer Science, Purdue University, West Lafayette, IN, USA.
35
Department of Biological Sciences, Purdue University, West Lafayette, IN, USA.
36
Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel.
37
School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel.
38
Centre for Cardiovascular Genetics, Institute of Cardiovascular Science, University College London, London, UK.
39
Department of Electrical Engineering, STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, Leuven, Belgium.
40
iMinds Department Medical Information Technologies, Leuven, Belgium.
41
Inserm UMR-S1052, CNRS UMR5286, Cancer Research Centre of Lyon, Lyon, France.
42
Université de Lyon 1, Villeurbanne, France.
43
Centre Léon Bérard, Lyon, France.
44
Institute of Structural and Molecular Biology, University College London, London, UK.
45
Cerenode Inc., Boston, MA, USA.
46
Molde University College, Molde, Norway.
47
Department of Computer Science, Centre for Systems and Synthetic Biology, Royal Holloway University of London, Egham, UK.
48
Department of Molecular, Cell and Developmental Biology, University of California at Los Angeles, Los Angeles, CA, USA.
49
School of Mathematics, Statistics and Applied Mathematics, National University of Ireland, Galway, Ireland.
50
Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, New York, NY, USA.
51
Graduate Program in Bioinformatics, University of British Columbia, Vancouver, Canada.
52
Department of Psychiatry and Michael Smith Laboratories, University of British Columbia, Vancouver, Canada.
53
Department for Bioinformatics and Computational Biology-I12, Technische Universität München, Garching, Germany.
54
DOE Joint Genome Institute, Walnut Creek, CA, USA.
55
Bioinformatics and Genomics, Centre for Genomic Regulation, Barcelona, Spain.
56
Universitat Pompeu Fabra, Barcelona, Spain.
57
Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain.
58
Division of Electronics, Rudjer Boskovic Institute, Zagreb, Croatia.
59
EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation, Barcelona, Spain.
60
State Key Laboratory of Genetic Engineering, Collaborative Innovation Center of Genetics and Development, Department of Biostatistics and Computational Biology, School of Life Science, Fudan University, Shanghai, China.
61
Children's Hospital of Fudan University, Shanghai, China.
62
Department of Molecular Medicine, University of Padua, Padua, Italy.
63
Research and Innovation Center, Edmund Mach Foundation, San Michele all'Adige, Italy.
64
Department of Information Engineering, University of Padua, Padova, Italy.
65
Department of Biomedical Sciences, University of Padua, Padova, Italy.
66
Instituto De Genetica Medica y Molecular, Hospital Universitario de La Paz, Madrid, Spain.
67
Spanish National Bioinformatics Institute, Spanish National Cancer Research Institute, Madrid, Spain.
68
Structural and Computational Biology Programme, Spanish National Cancer Research Institute, Madrid, Spain.
69
Control and Computer Engineering Department, Politecnico di Torino, Torino, Italy.
70
National University of Computer & Emerging Sciences, Islamabad, Pakistan.
71
Anacleto Lab, Dipartimento di informatica, Università degli Studi di Milano, Milan, Italy.
72
Applied Bioinformatics, Bioscience, Wageningen University and Research Centre, Wageningen, Netherlands.
73
Biometris, Wageningen University, Wageningen, Netherlands.
74
Center for Multidisciplinary Research, Institute of Nuclear Sciences Vinca, University of Belgrade, Belgrade, Serbia.
75
Department of Computing and Mathematics FFCLRP-USP, University of Sao Paulo, Ribeirao Preto, Brazil.
76
Institute for Molecular Infection Biology, University of Würzburg, Würzburg, Germany.
77
Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA.
78
University of Southern Mississippi, Hattiesburg, MS, USA.
79
Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Imperial College London, London, UK.
80
School of Biosciences, University of Kent, Canterbury, Kent, UK.
81
Institut für Medizinische Genetik und Humangenetik, Charité - Universitätsmedizin Berlin, Berlin, Germany.
82
Department of Electrical Engineering ESAT-SCD and IBBT-KU Leuven Future Health Department, Katholieke Universiteit Leuven, Leuven, Belgium.
83
California Institute for Quantitative Biosciences, University of California San Francisco, San Francisco, CA, USA.
84
Department of Chemical Biology, The Hebrew University of Jerusalem, Jerusalem, Israel.
85
Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA, USA.
86
Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA.
87
Department of Microbiology, Miami University, Oxford, OH, USA. idoerg@iastate.edu.
88
Department of Computer Science, Miami University, Oxford, OH, USA. idoerg@iastate.edu.
89
Department of Computer Science and Informatics, Indiana University, Bloomington, IN, USA. predrag@indiana.edu.

Abstract

BACKGROUND:

A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging.

RESULTS:

We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2.

CONCLUSIONS:

The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology-specific and that different performance metrics can be used to probe the nature of accurate predictions, and it highlighted the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.

KEYWORDS:

Disease gene prioritization; Protein function prediction

PMID: 27604469
PMCID: PMC5015320
DOI: 10.1186/s13059-016-1037-6
[Indexed for MEDLINE]
Free PMC Article
