J Struct Funct Genomics. 2010 Mar;11(1):51-9. doi: 10.1007/s10969-010-9086-7. Epub 2010 Apr 11.

High-throughput computational structure-based characterization of protein families: START domains and implications for structural genomics.

Department of Pharmacology, College of Physicians and Surgeons of Columbia University, Center for Computational Biology and Bioinformatics, 630 West 168th St. PH 7W 313, New York, NY 10032, USA.

Abstract

SkyLine, a high-throughput homology modeling pipeline tool, detects and models true sequence homologs of a given protein structure. Structures and models are stored in SkyBase with links to computational function annotation, as calculated by MarkUs. The SkyLine/SkyBase/MarkUs technology represents a novel structure-based approach that is more objective and versatile than other protein classification resources. This structure-centric strategy provides a multi-dimensional organization and coverage of protein space at the levels of family, function, and genome. The concept of "modelability", the ability to model sequences on related structures, provides a reliable criterion for membership in a protein family ("leverage") and underlies the unique success of this approach. The overall procedure is illustrated by its application to START domains, which comprise a Biomedical Theme for the Northeast Structural Genomics Consortium as part of the Protein Structure Initiative. START domains are typically involved in the non-vesicular transport of lipids. Although 19 experimentally determined structures are available, the family is highly diverse in sequence, its evolutionary hierarchy is not well determined, and the ligand-binding potential of many family members is unknown. The SkyLine/SkyBase/MarkUs approach provides significant insights and predicts: (1) many more family members (approximately 4,000) than any other resource; (2) the function of a large number of unannotated proteins; (3) instances of START domains in genomes from which they were thought to be absent; and (4) the existence of two types of novel proteins, those containing dual START domains and those containing N-terminal START domains.

PMID: 20383749
PMCID: PMC2881152
DOI: 10.1007/s10969-010-9086-7
[Indexed for MEDLINE]