Systematic Analysis of Primary Sequence Domain Segments for the Discrimination Between Class C GPCR Subtypes

Interdiscip Sci. 2018 Mar;10(1):43-52. doi: 10.1007/s12539-018-0286-3. Epub 2018 Feb 19.

Abstract

G-protein-coupled receptors (GPCRs) are a large and diverse super-family of eukaryotic cell membrane proteins that play an important physiological role as transmitters of extracellular signals. In this paper, we investigate class C, a member of this super-family that has attracted much attention in pharmacology. Because knowledge of the complete 3D crystal structure of class C receptors is limited, their primary amino acid sequences must be used for analytical purposes. Here, we provide a systematic analysis of distinct receptor sequence segments, defined by their topological location in the extracellular, transmembrane, or intracellular domains, with regard to their ability to differentiate between seven class C GPCR subtypes. We build on results from previous research that provided preliminary evidence of the potential use of separated domains of complete class C GPCR sequences as the basis for subtype classification. The use of the extracellular N-terminus domain alone was shown to result in only a minor decrease in subtype discrimination in comparison with the complete sequence, despite discarding much of the sequence information. In this paper, we describe the use of Support Vector Machine-based classification models to evaluate the subtype-discriminating capacity of specific topological sequence segments.
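
As an illustration of the segment-wise approach described above, the following minimal Python sketch cuts a sequence into topological segments and turns each segment into a fixed-length feature vector. It is not the authors' pipeline: the boundary format, the toy sequence, the helper names `split_by_topology` and `composition`, and the choice of amino acid composition as the feature representation are all assumptions made for this example. The abstract does not state which sequence transformation the study used.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector is comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def split_by_topology(sequence, boundaries):
    """Cut a sequence into named segments.

    `boundaries` is a hypothetical format: a list of (name, start, end)
    tuples with 0-based start and exclusive end.
    """
    return {name: sequence[start:end] for name, start, end in boundaries}


def composition(segment):
    """Return the relative frequency of each standard amino acid (20 values)."""
    counts = Counter(segment)
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        # Empty segment or no standard residues: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Toy data, invented for illustration only (not a real receptor sequence).
toy_sequence = "MKLLAAGGWWYYFFLLIIVVRRKKDDEE"
toy_boundaries = [
    ("N-terminus", 0, 8),    # extracellular
    ("TM1", 8, 20),          # transmembrane
    ("C-terminus", 20, 28),  # intracellular
]

segments = split_by_topology(toy_sequence, toy_boundaries)
features = {name: composition(seg) for name, seg in segments.items()}
```

Under these assumptions, the per-segment vectors for a labelled set of receptors could be passed to any support vector machine implementation (for example, scikit-learn's `SVC`), training one model per segment and comparing their cross-validated accuracies to gauge how much subtype information each segment carries.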

Keywords: Biocuration; G-protein-coupled receptors; Machine learning; Pharmaco-proteomics; Support vector machines.

MeSH terms

  • Amino Acid Sequence
  • Protein Domains
  • Receptors, G-Protein-Coupled / chemistry*
  • Sequence Alignment
  • Sequence Analysis, Protein / methods*
  • Support Vector Machine

Substances

  • Receptors, G-Protein-Coupled