Proteins. 2019 Jun;87(6):520-527. doi: 10.1002/prot.25674. Epub 2019 Mar 9.

NetSurfP-2.0: Improved prediction of protein structural features by integrated deep learning.

Author information

1. Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark.
2. Department of Bio and Health Informatics, Technical University of Denmark, Kongens Lyngby, Denmark.
3. The Bioinformatics Centre, Department of Biology, University of Copenhagen, Copenhagen, Denmark.
4. Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kongens Lyngby, Denmark.
5. Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, Buenos Aires, Argentina.
6. Faculty of Applied Sciences, Centre of Excellence for Omics-Driven Computational Biodiscovery (COMBio), AIMST University, Kedah, Malaysia.

Abstract

The ability to predict local structural features of a protein from the primary sequence is of paramount importance for unraveling its function in the absence of experimental structural information. Two main factors affect the utility of potential prediction tools: their accuracy must enable extraction of reliable structural information on the proteins of interest, and their runtime must be low to keep pace with sequencing data being generated at a constantly increasing speed. Here, we present NetSurfP-2.0, a novel tool that can predict the most important local structural features with unprecedented accuracy and runtime. NetSurfP-2.0 is sequence-based and uses an architecture composed of convolutional and long short-term memory neural networks trained on solved protein structures. Using a single integrated model, NetSurfP-2.0 predicts solvent accessibility, secondary structure, structural disorder, and backbone dihedral angles for each residue of the input sequences. We assessed the accuracy of NetSurfP-2.0 on several independent test datasets and found it to consistently produce state-of-the-art predictions for each of its output features. We observe a correlation of 80% between predictions and experimental data for solvent accessibility, and a precision of 85% on secondary structure 3-class predictions. In addition to improved accuracy, the processing time has been optimized to allow predicting more than 1000 proteins in less than 2 hours, and complete proteomes in less than 1 day.
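The two headline figures above are per-residue agreement measures, and a small sketch may make them concrete. The code below is an illustration only, not the authors' evaluation pipeline: it assumes the solvent accessibility figure is a Pearson correlation and that the 3-class figure is the fraction of residues with a matching label (often called Q3). The function names and the toy values are hypothetical and do not come from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def q3_accuracy(predicted, observed):
    """Fraction of residues whose 3-class label (H, E, C) matches."""
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Toy per-residue values, invented purely for illustration.
pred_rsa = [0.10, 0.45, 0.80, 0.30, 0.05]  # predicted relative accessibility
obs_rsa = [0.15, 0.40, 0.70, 0.35, 0.10]   # "experimental" relative accessibility
pred_ss = "HHHECCEEC"                      # predicted 3-class labels
obs_ss = "HHHCCCEEC"                       # reference 3-class labels

if __name__ == "__main__":
    print(f"Pearson correlation: {pearson(pred_rsa, obs_rsa):.3f}")
    print(f"Q3 accuracy: {q3_accuracy(pred_ss, obs_ss):.3f}")
```

The stated throughput can be read the same way: more than 1000 proteins in under 2 hours (7200 seconds) corresponds to an average of less than 7.2 seconds per protein.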

KEYWORDS: deep learning; disorder; local structure prediction; secondary structure; solvent accessibility

PMID: 30785653
DOI: 10.1002/prot.25674
