Cell Syst. 2018 Feb 28;6(2):180-191.e4. doi: 10.1016/j.cels.2017.12.007. Epub 2018 Jan 17.

Scikit-ribo Enables Accurate Estimation and Robust Modeling of Translation Dynamics at Codon Resolution.

Author information

1. Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA; Department of Applied Mathematics & Statistics, Stony Brook University, Stony Brook, NY 11794, USA.
2. Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA.
3. Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, MD 21205, USA.
4. Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA.
5. Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA; Departments of Computer Science and Biology, Johns Hopkins University, Baltimore, MD 21211, USA. Electronic address: mschatz@cs.jhu.edu.

Abstract

Ribosome profiling (Ribo-seq) is a powerful technique for measuring protein translation; however, sampling errors and biological biases are prevalent and poorly understood. Addressing these issues, we present Scikit-ribo (https://github.com/schatzlab/scikit-ribo), an open-source analysis package for accurate genome-wide A-site prediction and translation efficiency (TE) estimation from Ribo-seq and RNA sequencing data. Scikit-ribo accurately identifies A-site locations and reproduces codon elongation rates using several digestion protocols (r = 0.99). Next, we show that the commonly used TE estimation derived from reads per kilobase of transcript per million mapped reads (RPKM) is prone to biases, especially for low-abundance genes. Scikit-ribo introduces a codon-level generalized linear model with ridge penalty that correctly estimates TE, while accommodating variable codon elongation rates and mRNA secondary structure. This corrects the TE errors for over 2,000 genes in S. cerevisiae, which we validate using mass spectrometry of protein abundances (r = 0.81), and allows us to determine the Kozak-like sequence directly from Ribo-seq. We conclude with an analysis of coverage requirements needed for robust codon-level analysis and quantify the artifacts that can occur from cycloheximide treatment.
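The RPKM-derived TE that the abstract critiques can be illustrated with a minimal sketch. This is not Scikit-ribo's code or API; the function names and read counts below are hypothetical, and TE is taken here to be the conventional ratio of Ribo-seq RPKM to RNA-seq RPKM for a gene. The example suggests one reason such a ratio may be unstable for low-abundance genes: a single extra read changes it substantially.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count / (length_bp / 1_000) / (total_mapped_reads / 1_000_000)


def rpkm_te(ribo_count, rna_count, length_bp, ribo_total, rna_total):
    """Conventional TE: Ribo-seq RPKM divided by RNA-seq RPKM for one gene."""
    return (rpkm(ribo_count, length_bp, ribo_total)
            / rpkm(rna_count, length_bp, rna_total))


# Hypothetical 1,500 bp gene; 10 million mapped reads in each library.
LENGTH = 1_500
TOTAL = 10_000_000

# Well-covered gene: one extra RNA-seq read barely moves the ratio.
te_high = rpkm_te(3000, 1500, LENGTH, TOTAL, TOTAL)
te_high_shift = rpkm_te(3000, 1501, LENGTH, TOTAL, TOTAL)

# Low-abundance gene: the same one-read change shifts the ratio a lot.
te_low = rpkm_te(4, 2, LENGTH, TOTAL, TOTAL)
te_low_shift = rpkm_te(4, 3, LENGTH, TOTAL, TOTAL)

print(f"high coverage: {te_high:.3f} vs {te_high_shift:.3f}")
print(f"low coverage:  {te_low:.3f} vs {te_low_shift:.3f}")
```

A codon-level model with a ridge penalty, as described in the abstract, is one way to shrink such noisy per-gene estimates; the sketch above covers only the naive ratio.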

KEYWORDS: Ribo-seq; bioinformatics; machine learning; statistical method; translation

PMID: 29361467
PMCID: PMC5832574 [Available on 2019-02-28]
DOI: 10.1016/j.cels.2017.12.007
