Cell Rep. 2015 Jul 14;12(2):183-9. doi: 10.1016/j.celrep.2015.06.031. Epub 2015 Jul 2.

Semi-supervised Learning Predicts Approximately One Third of the Alternative Splicing Isoforms as Functional Proteins.

Author information

1. Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 1AS, Canada; Department of Computer Science, University of Toronto, Toronto, ON M5S 3G4, Canada.
2. Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 1AS, Canada.
3. Department of Medical Biophysics, University of Toronto, Toronto, ON M5G 1L7, Canada.
4. Chair for Proteomics and Bioanalytics, TU Muenchen, Freising 85354, Germany.
5. Chair for Proteomics and Bioanalytics, TU Muenchen, Freising 85354, Germany; German Cancer Consortium (DKTK), Munich, Germany; German Cancer Research Center (DKFZ), Heidelberg, Germany; Center for Integrated Protein Science Munich, Munich, Germany; Bavarian Biomolecular Mass Spectrometry Center, Technische Universität München, Freising, Germany.
6. Lehrstuhl fuer Systembiologie der Pflanzen, TU Muenchen, Munich, Germany.
7. Frontier Research Core for Life Sciences, University of Toyama, Toyama 930-8555, Japan.
8. Department of Medical Biophysics, University of Toronto, Toronto, ON M5G 1L7, Canada; Princess Margaret Cancer Center, University Health Network, Toronto, ON M5T 2M9, Canada.
9. Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON M5S 1AS, Canada; Department of Computer Science, University of Toronto, Toronto, ON M5S 3G4, Canada; Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1AS, Canada. Electronic address: pi@kimlab.org.

Abstract

Alternative splicing acts on transcripts from almost all human multi-exon genes. Notwithstanding its ubiquity, fundamental ramifications of splicing on protein expression remain unresolved. The number and identity of spliced transcripts that form stably folded proteins remain the sources of considerable debate, due largely to low coverage of experimental methods and the resulting absence of negative data. We circumvent this issue by developing a semi-supervised learning algorithm, positive unlabeled learning for splicing elucidation (PULSE; http://www.kimlab.org/software/pulse), which uses 48 features spanning various categories. We validated its accuracy on sets of bona fide protein isoforms and directly on mass spectrometry (MS) spectra for an overall AU-ROC of 0.85. We predict that around 32% of "exon skipping" alternative splicing events produce stable proteins, suggesting that the process engenders a significant number of previously uncharacterized proteins. We also provide insights into the distribution of positive isoforms in various functional classes and into the structural effects of alternative splicing.

PMID: 26146086
DOI: 10.1016/j.celrep.2015.06.031
[Indexed for MEDLINE]
Free full text
