Bioinformatics. 2014 Apr 15;30(8):1112-1119. Epub 2014 Jan 2.

SNPdryad: predicting deleterious non-synonymous human SNPs using only orthologous protein sequences.

Author information

Department of Computer Science, University of Toronto, Toronto, Ontario, Canada M5S 3G4; The Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada M5S 3E1; Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada M5S 3E1; and Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8.

Abstract

MOTIVATION:

Recent advances in genome sequencing have revealed an abundance of non-synonymous polymorphisms among human individuals; consequently, it is of immense interest and importance to predict whether such substitutions are functionally neutral or deleterious. The accuracy of such prediction algorithms depends on the quality of the multiple sequence alignment, which is used to infer how well an amino acid substitution is tolerated at a given position. Because orthologous protein sequences were scarce in the past, existing prediction algorithms all include sequences of protein paralogs in the alignment, which can dilute the conservation signal and affect prediction accuracy. However, we believe that, with the sequencing of a large number of mammalian genomes, it is now feasible to include only protein orthologs in the alignment and thereby improve prediction performance.
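To illustrate how paralogs can dilute a conservation signal, here is a minimal sketch that scores one alignment column by Shannon entropy. Entropy is an assumed, generic conservation measure chosen for illustration; the abstract does not say which scores SNPdryad uses. The helper names `column_entropy` and `get_column` and the toy sequences are hypothetical, invented for this example.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (in bits) of one alignment column; gap characters '-' are ignored.

    0.0 means the column is fully conserved; larger values mean more variability.
    """
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    # Sum of p * log2(1/p) over the observed residues (always >= 0).
    return sum((c / n) * math.log2(n / c) for c in Counter(residues).values())


def get_column(alignment: list[str], pos: int) -> str:
    """Collect the residue at `pos` from every aligned sequence."""
    return "".join(seq[pos] for seq in alignment)


# Hypothetical toy alignments (invented for illustration, not real data).
orthologs = ["MKWVL", "MKWVL", "MKWIL", "MRWVL"]
paralogs = ["MKFVL", "MKYVL"]

pos = 2  # the column holding W in every ortholog
ortho_only = column_entropy(get_column(orthologs, pos))
with_paralogs = column_entropy(get_column(orthologs + paralogs, pos))
print(f"orthologs only: {ortho_only:.3f} bits")
print(f"with paralogs:  {with_paralogs:.3f} bits")
```

Here the W column is invariant among the orthologs (0.000 bits), but adding the two paralogs raises its entropy to about 1.252 bits, so a position that looks strictly conserved across orthologs would appear only moderately conserved once paralogs are mixed in.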

RESULTS:

We have developed a novel prediction algorithm, named SNPdryad, which includes only protein orthologs when building a multiple sequence alignment. Among other innovations, SNPdryad uses several different conservation scoring schemes and uses Random Forest as the classifier. We tested SNPdryad on several datasets and found that it consistently outperformed other methods across several performance metrics, an improvement we attribute to the exclusion of paralogous sequences. We have also run SNPdryad on the complete human proteome, generating prediction scores for all possible amino acid substitutions.
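The following is a rough sketch of what the input to such a classifier might look like: a few conservation features per substitution, computed from an ortholog-only alignment, plus an enumeration of every possible single amino acid change in a query protein. The specific features (column entropy, pseudocount-smoothed residue frequencies, a log ratio, an "observed in orthologs" flag), the names `substitution_features` and `all_substitutions`, and the toy data are all assumptions for illustration, not SNPdryad's published scoring schemes. The Random Forest step itself is omitted; vectors like these could, for example, be passed to scikit-learn's `RandomForestClassifier`.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def substitution_features(alignment, pos, ref, alt, pseudocount=1.0):
    """Hypothetical conservation features for substituting `ref` -> `alt` at `pos`.

    `alignment` is a list of equal-length aligned sequences (orthologs only);
    gap characters '-' are ignored when counting residues.
    """
    residues = [seq[pos] for seq in alignment if seq[pos] != "-"]
    counts = Counter(residues)
    n = len(residues)
    # Pseudocount smoothing keeps unseen residues at a small non-zero frequency.
    denom = n + pseudocount * len(AMINO_ACIDS)
    f_ref = (counts[ref] + pseudocount) / denom
    f_alt = (counts[alt] + pseudocount) / denom
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values()) if n else 0.0
    return [
        entropy,                      # column variability (0.0 = fully conserved)
        f_ref,                        # smoothed frequency of the reference residue
        f_alt,                        # smoothed frequency of the substituted residue
        math.log2(f_alt / f_ref),     # negative when alt is rarer than ref
        1.0 if counts[alt] else 0.0,  # was alt ever seen in an ortholog?
    ]


def all_substitutions(query):
    """Yield (pos, ref, alt) for every single amino acid change in `query`."""
    for pos, ref in enumerate(query):
        for alt in AMINO_ACIDS:
            if alt != ref:
                yield pos, ref, alt


# Hypothetical ortholog-only alignment; row 0 stands in for the human sequence.
orthologs = ["MKWVL", "MKWVL", "MKWIL", "MRWVL"]
query = orthologs[0]

feature_rows = [
    (pos, ref, alt, substitution_features(orthologs, pos, ref, alt))
    for pos, ref, alt in all_substitutions(query)
]
print(len(feature_rows))  # 5 positions x 19 alternatives = 95
```

Each position of a protein made of standard residues yields 19 candidate substitutions, so scoring a whole proteome in this way means evaluating 19 times the total number of residues. The pseudocount is a common design choice that keeps the log ratio finite when the substituted residue never occurs in the orthologs.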

AVAILABILITY AND IMPLEMENTATION:

The algorithm and the prediction results can be accessed from the Web site: http://snps.ccbr.utoronto.ca:8080/SNPdryad/

CONTACT:

Zhaolei.Zhang@utoronto.ca

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.
