Protein Domains and Macromolecular Structures:
  Tools for discovery of sequence/function/structure associations

Various data types, such as literature, nucleotide and protein sequences, and three-dimensional structures, are often submitted to public databases independently of each other by different research groups. Yet these data are related because they address the same topic through different research methods, and the information they contain together is greater than that of any one part.

To address this, the NCBI Protein Classification and Structure Group contributes to the broader NCBI effort to provide integrated access to previously disparate data through the Entrez retrieval system. Entrez uses computational methods to identify related data, making it possible to traverse from literature about diseases → gene locations and sequences → protein sequences → conserved domains and protein functions, 3D structures, and small molecules and their biological activities, and more.

The illustration below provides an example, starting with a PubMed article about the human CLCN1 gene and Becker-type myotonia, then linking to the protein sequence, its conserved domains, and a related 3D structure. If you would like to explore these paths interactively, open the PubMed record for PMID 7951242 shown below (or any other record of interest in PubMed or other Entrez databases), then use the "Related information" menu in the right margin of the display to select related data of interest and begin traversing through the Entrez system.

The exact appearance and position of links to related data in the live Entrez system may differ from the image below because the Entrez user interface is continually being enhanced. Links can appear as "Links" pull-down menus near the top of a display or as boxes ("ads") in the right margin of displays (for example, the "Find Related Data" pull-down menu in the right margin of search results pages, or the "Related information" box in the right margin of database record displays).

Example of data integration in Entrez through its Links feature. Starting with a single record in any Entrez database, you can use the Links menu to traverse to related data in other Entrez databases. This can facilitate biological discovery by identifying associations among previously disparate data.
Integrated access to previously disparate data

The NCBI Protein Classification and Structure Group contributes to the broader NCBI effort to provide integrated access to previously disparate data through the Entrez retrieval system. Identifying relationships among these data can, in turn, lead to new discoveries.

This system was developed because a biological question, such as the molecular mechanism underlying a human disease, is often approached from many angles by many different laboratories: some might focus on gene identification and sequencing, others on protein function and three-dimensional structure, and still others on associated genetic variations, gene expression profiles, or phenotypes. Because each group submits its data to a public database independently of the others, the resulting data sets, though related, may be scattered across a variety of databases.

Recognizing that the combined knowledge is greater than any one part, Entrez brings a wide range of data types from varying sources into a single search system and identifies relationships among records within an individual database and across databases. These associations are presented as Links in Entrez search results displays and in individual database records. Therefore, once you retrieve one data element in Entrez, such as a PubMed record for an article that reports the sequencing of a disease gene, the corresponding sequence data and much more are one click away, as shown in the illustration above.
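The same links can also be followed programmatically through ELink, the NCBI E-utilities service that returns records in one Entrez database linked to records in another. The sketch below is a minimal, hedged illustration rather than an official client: the helper names (`elink_url`, `parse_linked_ids`, `follow_link`) are hypothetical, it assumes the default XML output and default `cmd=neighbor` behavior, and it omits the `tool`, `email`, and `api_key` parameters and the request-rate limits that NCBI's E-utilities usage guidelines call for.

```python
from typing import Iterable, List
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Base URL of the NCBI E-utilities ELink service, the programmatic
# counterpart of the Entrez "Links" menus.
ELINK_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def elink_url(dbfrom: str, db: str, ids: Iterable[str]) -> str:
    """Build an ELink URL asking for records in `db` linked to `ids` in `dbfrom`.

    Hypothetical helper; comma-joined UIDs are percent-encoded by urlencode.
    """
    params = {"dbfrom": dbfrom, "db": db, "id": ",".join(ids)}
    return f"{ELINK_BASE}?{urlencode(params)}"


def parse_linked_ids(xml_text: str) -> List[str]:
    """Collect linked UIDs from an ELink XML response (assumes cmd=neighbor).

    Input UIDs sit under <IdList>; linked UIDs sit under <LinkSetDb>/<Link>/<Id>,
    so iterating over <Link> elements returns only the linked records.
    """
    root = ET.fromstring(xml_text)
    return [link.findtext("Id") for link in root.iter("Link")]


def follow_link(dbfrom: str, db: str, ids: Iterable[str]) -> List[str]:
    """Perform one hop through Entrez links (requires network access)."""
    ids = list(ids)
    if not ids:
        return []  # ELink needs at least one UID; nothing to follow.
    with urlopen(elink_url(dbfrom, db, ids)) as resp:
        return parse_linked_ids(resp.read().decode("utf-8"))
```

For example, `follow_link("pubmed", "protein", ["7951242"])` would request the protein records linked to the PubMed article from the illustration; passing the returned UIDs to `follow_link("protein", "cdd", ...)` or `follow_link("protein", "structure", ...)` continues the path toward conserved domains and 3D structures. Separating URL building and parsing from the network call keeps each step easy to inspect.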


Further reading:

  Ostell J. The Entrez Search and Retrieval System. In: The NCBI Handbook [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; 2002 Oct 9 [revised 2003 Aug 13]. Chapter 15. [cited 2008 Oct 02]. Available in the Entrez Bookshelf.
  Geer RC, Sayers EW. Entrez: making use of its power. Brief Bioinform. 2003 Jun;4(2):179-84.
Revised 25 August 2021