A computational pipeline for protein structure prediction and analysis at genome scale

Bioinformatics. 2003 Oct 12;19(15):1985-96. doi: 10.1093/bioinformatics/btg262.

Abstract

Motivation: Experimental techniques alone cannot keep up with the production rate of protein sequences, while computational techniques for protein structure prediction have matured enough to provide reliable structural characterization of proteins at large scale. Integrating multiple computational tools for protein structure prediction can complement experimental techniques.

Results: We present an automated pipeline for protein structure prediction. The centerpiece of the pipeline is our threading-based protein structure prediction system, PROSPECT. The pipeline consists of a dozen tools covering identification of protein domains and signal peptides, protein triage to determine the protein type (membrane or globular), protein fold recognition, generation of atomic structural models, and validation of prediction results, among other steps. A prediction pipeline manager automatically selects the processing and prediction branches based on the identified characteristics of each protein. The pipeline has been implemented to run in a heterogeneous computational environment as a client/server system with a web interface. Genome-scale applications to Caenorhabditis elegans, Pyrococcus furiosus and three cyanobacterial genomes are presented.
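
The branching described above can be illustrated with a minimal sketch in Python. It is not the authors' implementation: every name in it (`ProteinRecord`, `triage`, `run_pipeline`, the stage labels) is hypothetical, and the hydrophobic-window rule is a toy stand-in for whatever triage method the real pipeline uses. The abstract does not say what happens to membrane proteins after triage, so that branch is left as a placeholder.

```python
from dataclasses import dataclass, field

# Assumption: a crude hydrophobic alphabet used only for this toy triage rule.
HYDROPHOBIC = set("AILMFVW")


@dataclass
class ProteinRecord:
    """Hypothetical container for one protein moving through the pipeline."""
    name: str
    sequence: str
    steps: list = field(default_factory=list)  # ordered log of stages run


def has_hydrophobic_stretch(sequence, window=19, threshold=0.8):
    """Return True if any window of the sequence is mostly hydrophobic.

    Toy heuristic (assumption), not the triage method of the real pipeline.
    Sequences shorter than the window never match.
    """
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        fraction = sum(residue in HYDROPHOBIC for residue in segment) / window
        if fraction >= threshold:
            return True
    return False


def triage(record):
    """Label the protein as 'membrane' or 'globular' and log the decision."""
    kind = "membrane" if has_hydrophobic_stretch(record.sequence) else "globular"
    record.steps.append(f"triage:{kind}")
    return kind


def run_pipeline(record):
    """Hypothetical pipeline manager: pick a branch from the triage result."""
    record.steps.append("domain_and_signal_peptide_scan")
    kind = triage(record)
    if kind == "membrane":
        # Placeholder: the source does not describe this branch.
        record.steps.append("membrane_branch")
    else:
        # Globular branch: stages named after those listed in the abstract.
        record.steps.append("fold_recognition")
        record.steps.append("model_building")
        record.steps.append("validation")
    return record


if __name__ == "__main__":
    soluble = run_pipeline(ProteinRecord("toy_globular", "MKTEEDKRSGNQ" * 3))
    spanning = run_pipeline(ProteinRecord("toy_membrane", "MKT" + "L" * 20 + "KRE"))
    print(soluble.steps)
    print(spanning.steps)
```

Recording each stage in `steps` keeps the manager's decisions inspectable, which is one plausible way to audit an automated branch choice across many proteins.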

Availability: The pipeline is available at http://compbio.ornl.gov/proteinpipeline/

Publication types

  • Evaluation Study
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Validation Study

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Caenorhabditis elegans Proteins / chemistry
  • Caenorhabditis elegans Proteins / metabolism
  • Computer Simulation
  • Computing Methodologies
  • Database Management Systems*
  • Databases, Protein*
  • Genome
  • Information Storage and Retrieval
  • Models, Molecular*
  • Molecular Sequence Data
  • Proteins / chemistry*
  • Proteins / classification*
  • Proteome / chemistry
  • Proteome / classification
  • Pyrococcus furiosus / chemistry
  • Pyrococcus furiosus / metabolism
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*
  • Software*
  • Systems Integration
  • User-Computer Interface

Substances

  • Caenorhabditis elegans Proteins
  • Proteins
  • Proteome