Brief Bioinform. 2019 Jan 30. doi: 10.1093/bib/bbz007. [Epub ahead of print]

Disentangling the complexity of low complexity proteins.

Author information

1 Institute of Organismic and Molecular Evolution, Johannes Gutenberg University of Mainz, Mainz, Germany.
2 Department of Biomedical Science, University of Padova, Padova, Italy.
3 Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus.
4 Biological Computation and Process Laboratory, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas, Thessalonica, Greece.
5 MTA-ELTE Lendület Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary.
6 Centre de Biochimie Structurale, INSERM, CNRS, Université de Montpellier, Montpellier, France.
7 Institute of Informatics, Silesian University of Technology, Gliwice, Poland.
8 Center of New Technologies, University of Warsaw, Warsaw, Poland.
9 Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland.
10 Institute of Biochemistry and Biophysics, Warsaw, Poland.
11 Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary.
12 Centre de Recherche en Biologie Cellulaire de Montpellier, CNRS-UMR, Institut de Biologie Computationnelle, Université de Montpellier, Montpellier, France.
13 Institute of Bioengineering, University ITMO, St. Petersburg, Russia.
14 Earlham Institute, Norwich, UK.
15 ELIXIR Hub, Wellcome Genome Campus, Hinxton, UK.
16 CNR Institute of Neuroscience, Padova, Italy.

Abstract

There are multiple definitions for low complexity regions (LCRs) in protein sequences, with all of them broadly considering LCRs as regions with fewer amino acid types compared to an average composition. Following this view, LCRs can also be defined as regions showing composition bias. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, and more generally the overlaps between different properties related to LCRs, using examples. We argue that statistical measures alone cannot capture all structural aspects of LCRs and recommend the combined usage of a variety of predictive tools and measurements. While the methodologies available to study LCRs are already very advanced, we foresee that a more comprehensive annotation of sequences in the databases will enable the improvement of predictions and a better understanding of the evolution and the connection between structure and function of LCRs. This will require the use of standards for the generation and exchange of data describing all aspects of LCRs.
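As an illustration of the kind of statistical measure the abstract refers to, the sketch below scores sequence windows by the Shannon entropy of their amino acid composition, one commonly used proxy for low complexity. The abstract does not name a specific measure, so the choice of entropy, the function names, the window length of 12 and the threshold of 2.2 bits are all assumptions made for this example, not the authors' method.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    # p * log2(1/p) summed over the residue types present in the window.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def low_complexity_windows(seq: str, window: int = 12, threshold: float = 2.2):
    """Return (start, entropy) for every window whose entropy is below threshold.

    The defaults are illustrative assumptions, not values from the review.
    """
    hits = []
    for i in range(len(seq) - window + 1):
        h = shannon_entropy(seq[i:i + window])
        if h < threshold:
            hits.append((i, h))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence: a polyglutamine run followed by a diverse stretch.
    example = "QQQQQQQQQQQQ" + "ACDEFGHIKLMN"
    for start, h in low_complexity_windows(example):
        print(start, round(h, 3))
```

A score like this captures composition bias only. Consistent with the abstract's argument, it says nothing about whether a flagged region is disordered or forms an ordered repeat structure, so it would need to be combined with other predictors.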

PMID: 30698641
DOI: 10.1093/bib/bbz007
