J Biol Chem. 2018 Mar 30;293(13):4913-4927. doi: 10.1074/jbc.RA117.001052. Epub 2018 Jan 29.

A statistical model for improved membrane protein expression using sequence-derived features.

Author information

1 From the Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125.
2 From the Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125; clemons@caltech.edu.

Abstract

The heterologous expression of integral membrane proteins (IMPs) remains a major bottleneck in the characterization of this important protein class. IMP expression levels are currently unpredictable, which makes pursuing IMPs for structural and biophysical characterization challenging and inefficient. Experimental evidence demonstrates that changes within the nucleotide or amino acid sequence of a given IMP can dramatically affect expression levels, yet these observations have not led to generalizable approaches for improving expression. Here, we develop a data-driven statistical predictor named IMProve that, using only sequence information, increases the likelihood of selecting an IMP that expresses in Escherichia coli. The IMProve model, trained on experimental data, combines a set of sequence-derived features into an IMProve score, with higher scores indicating a higher probability of success. The model is rigorously validated against a variety of independent data sets that contain a wide range of experimental outcomes from various IMP expression trials. The results demonstrate that using the model can more than double the number of successfully expressed targets at any experimental scale. IMProve can immediately be used to identify favorable targets for characterization. Most notably, IMProve demonstrates for the first time that IMP expression levels can be predicted directly from sequence.
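The abstract describes the model only at a high level: sequence-derived features are combined into a single score, and higher-scoring targets are prioritized for expression trials. The sketch below illustrates that general idea with a simple weighted linear sum. It assumes two hypothetical features (GC content of the coding sequence and the fraction of residues in an illustrative hydrophobic set) and made-up weights; it is not the published IMProve feature set, weights, or training procedure, and the names `Candidate`, `score`, and `top_k` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# HYPOTHETICAL: illustrative residue set and weights, not taken from IMProve.
HYDROPHOBIC = set("AILMFVW")
WEIGHTS = {"gc_content": -1.0, "hydrophobic_fraction": 0.5}


@dataclass
class Candidate:
    name: str
    nucleotides: str  # coding sequence (DNA)
    protein: str      # translated amino acid sequence


def gc_content(nt: str) -> float:
    """Fraction of G or C bases in a DNA sequence."""
    nt = nt.upper()
    return sum(base in "GC" for base in nt) / len(nt)


def hydrophobic_fraction(aa: str) -> float:
    """Fraction of residues in the (hypothetical) hydrophobic set."""
    aa = aa.upper()
    return sum(res in HYDROPHOBIC for res in aa) / len(aa)


def features(c: Candidate) -> dict[str, float]:
    """Compute the hypothetical sequence-derived features for one candidate."""
    return {
        "gc_content": gc_content(c.nucleotides),
        "hydrophobic_fraction": hydrophobic_fraction(c.protein),
    }


def score(c: Candidate, weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of features; higher means predicted more likely to express."""
    f = features(c)
    return sum(w * f[name] for name, w in weights.items())


def top_k(candidates: list[Candidate], k: int) -> list[Candidate]:
    """Return the k highest-scoring candidates for experimental testing."""
    return sorted(candidates, key=score, reverse=True)[:k]


# Usage: rank a toy pool of candidates and keep the best two.
pool = [
    Candidate("A", "GGGCCCGGG", "LLL"),
    Candidate("B", "ATATATATA", "LLL"),
    Candidate("C", "GGGCCCGGG", "KKK"),
]
print([c.name for c in top_k(pool, 2)])  # → ['B', 'A']
```

Choosing the top k candidates corresponds to testing a fixed number of targets at a given experimental scale; in a real predictor the weights would be learned from expression data rather than set by hand.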

KEYWORDS:

computational biology; machine-learning; membrane biogenesis; membrane biophysics; membrane protein; prediction; protein expression; structural biology

PMID: 29378850
PMCID: PMC5880134 [Available on 2019-03-30]
DOI: 10.1074/jbc.RA117.001052
