Protein similarity networks reveal relationships among sequence, structure, and function within the Cupin superfamily

PLoS One. 2013 Sep 6;8(9):e74477. doi: 10.1371/journal.pone.0074477. eCollection 2013.

Abstract

The cupin superfamily is extremely diverse and includes catalytically inactive seed storage proteins, sugar-binding metal-independent epimerases, and metal-dependent enzymes possessing dioxygenase, decarboxylase, and other activities. Although numerous proteins of this superfamily have been structurally characterized, the functions of many of them have not been experimentally determined. We report the first use of protein similarity networks (PSNs) to visualize trends in sequence and structure and to make functional inferences in this remarkably diverse superfamily. PSNs provide a way to visualize relatedness of structure and sequence among a given set of proteins. Structure- and sequence-based clustering of cupin members reflects functional clustering. Networks based only on cupin domains and networks based on whole proteins provide complementary information. Domain clustering supports phylogenetic conclusions that the N- and C-terminal domains of bicupin proteins evolved independently. Interestingly, although many functionally similar enzymatic cupin members bind the same active site metal ion, the structure and sequence clustering does not correlate with the identity of the bound metal. We anticipate that the application of PSNs to this superfamily will inform experimental work and influence the functional annotation of databases.
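The general idea behind a similarity network can be illustrated with a minimal sketch. It is not the authors' pipeline: the protein names, the scores, and the thresholds below are all hypothetical, and the sketch assumes a generic pairwise similarity score where higher means more similar. Each protein is a node, an edge joins two proteins whose score meets a chosen threshold, and the connected components are the clusters one would see in the visualization.

```python
from collections import defaultdict

# Hypothetical pairwise similarity scores (higher = more similar).
# Names and values are invented for illustration only.
TOY_SCORES = {
    ("dioxygenase_A", "dioxygenase_B"): 0.8,
    ("storage_A", "storage_B"): 0.7,
    ("dioxygenase_A", "storage_A"): 0.2,
    ("epimerase_A", "dioxygenase_B"): 0.1,
}


def build_network(scores, threshold):
    """Return adjacency sets; an edge joins pairs scoring >= threshold."""
    graph = defaultdict(set)
    for (a, b), score in scores.items():
        # Touch both keys so that isolated proteins still appear as nodes.
        graph[a]
        graph[b]
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def clusters(graph):
    """Return the connected components of the network as sorted name lists."""
    seen = set()
    result = []
    for start in sorted(graph):
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        result.append(sorted(component))
    return result
```

Calling `clusters(build_network(TOY_SCORES, 0.5))` on the toy data separates the two dioxygenases, the two storage proteins, and the lone epimerase into three groups, while lowering the threshold to 0.1 merges everything into one component. This shows why the choice of threshold shapes which groupings a similarity network displays.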

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Amino Acid Sequence
  • Binding Sites
  • Carbohydrate Epimerases / chemistry
  • Carbohydrate Epimerases / genetics
  • Carbohydrate Epimerases / metabolism
  • Carboxy-Lyases / chemistry
  • Carboxy-Lyases / genetics
  • Carboxy-Lyases / metabolism
  • Cysteine Dioxygenase / chemistry
  • Cysteine Dioxygenase / genetics
  • Cysteine Dioxygenase / metabolism
  • Evolution, Molecular*
  • Mannose-6-Phosphate Isomerase / chemistry
  • Mannose-6-Phosphate Isomerase / genetics
  • Mannose-6-Phosphate Isomerase / metabolism
  • Models, Molecular*
  • Molecular Sequence Data
  • Multigene Family
  • Oxidoreductases / chemistry
  • Oxidoreductases / genetics
  • Oxidoreductases / metabolism
  • Plants / genetics*
  • Plants / metabolism
  • Protein Binding
  • Seed Storage Proteins / chemistry*
  • Seed Storage Proteins / genetics
  • Seed Storage Proteins / metabolism
  • Sequence Alignment
  • Structural Homology, Protein

Substances

  • Seed Storage Proteins
  • Oxidoreductases
  • Cysteine Dioxygenase
  • oxalate oxidase
  • Carboxy-Lyases
  • oxalate decarboxylase
  • Carbohydrate Epimerases
  • Mannose-6-Phosphate Isomerase

Grants and funding

This work was supported by the National Science Foundation (MCB-1041912) to EWM (http://www.nsf.gov). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.