Display Settings:

Format

Send to:

Choose Destination
See comment in PubMed Commons below
J Mol Biol. 2005 May 20;348(5):1235-60. Epub 2005 Apr 2.

Progress of structural genomics initiatives: an analysis of solved target structures.

Author information

  • 1Department of Biochemistry and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK. annabel.todd@deshaw.com

Erratum in

  • J Mol Biol. 2005 Oct 28;353(3):760.

Abstract

The explosion in gene sequence data and technological breakthroughs in protein structure determination inspired the launch of structural genomics (SG) initiatives. An often stated goal of structural genomics is the high-throughput structural characterisation of all protein sequence families, with the long-term hope of significantly impacting on the life sciences, biotechnology and drug discovery. Here, we present a comprehensive analysis of solved SG targets to assess progress of these initiatives. Eleven consortia have contributed 316 non-redundant entries and 323 protein chains to the Protein Data Bank (PDB), and 459 and 393 domains to the CATH and SCOP structure classifications, respectively. The quality and size of these proteins are comparable to those solved in traditional structural biology and, despite huge scope for duplicated efforts, only 14% of targets have a close homologue (>/=30% sequence identity) solved by another consortium. Analysis of CATH and SCOP revealed the significant contribution that structural genomics is making to the coverage of superfamilies and folds. A total of 67% of SG domains in CATH are unique, lacking an already characterised close homologue in the PDB, whereas only 21% of non-SG domains are unique. For 29% of domains, structure determination revealed a remote evolutionary relationship not apparent from sequence, and 19% and 11% contributed new superfamilies and folds. The secondary structure class, fold and superfamily distributions of this dataset reflect those of the genomes. The domains fall into 172 different folds and 259 superfamilies in CATH but the distribution is highly skewed. The most populous of these are those that recur most frequently in the genomes. Whilst 11% of superfamilies are bacteria-specific, most are common to all three superkingdoms of life and together the 316 PDB entries have provided new and reliable homology models for 9287 non-redundant gene sequences in 206 completely sequenced genomes. From the perspective of this analysis, it appears that structural genomics is on track to be a success, and it is hoped that this work will inform future directions of the field.

PMID:
15854658
[PubMed - indexed for MEDLINE]
PubMed Commons home

PubMed Commons

0 comments
How to join PubMed Commons

    Supplemental Content

    Full text links

    Icon for Elsevier Science
    Loading ...
    Write to the Help Desk