Poorly conserved ORFs in the genome of the archaea Halobacterium sp. NRC-1 correspond to expressed proteins

Bioinformatics. 2004 May 22;20(8):1248-53. doi: 10.1093/bioinformatics/bth075. Epub 2004 Feb 10.

Abstract

Motivation: A large fraction of open reading frames (ORFs) identified as 'hypothetical' proteins correspond either to 'conserved hypothetical' proteins, representing sequences homologous to ORFs of unknown function from other organisms, or to hypothetical proteins lacking any significant sequence similarity to other ORFs in the databases. Elucidating the functions and three-dimensional structures of such orphan ORFs, termed ORFans or poorly conserved ORFs (PCOs), is essential for understanding biodiversity. However, it has been claimed that many ORFans may not encode expressed proteins.

Results: A genome-wide experimental study of 'paralogous PCOs' in the halophilic archaeon Halobacterium sp. NRC-1 was conducted. Paralogous PCOs are ORFs with at least one homolog in the same organism, but with no clear homologs in other organisms. The results reveal that mRNA is synthesized for a majority of the Halobacterium sp. NRC-1 paralogous PCO families, including those comprising relatively short proteins, strongly suggesting that these paralogous PCOs correspond to true, expressed proteins. Hence, further computational and experimental studies aimed at characterizing PCOs in this and other organisms are warranted. Such efforts could shed light on the functions and origins of PCOs, thereby helping to explain the vast diversity observed in genetic material.
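
The definition of a paralogous PCO given above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state how homology was detected or what significance threshold was used, so the code assumes the hits have already been filtered for significance. The function names, the `Hit` type, and the ORF identifiers are all hypothetical.

```python
from typing import Dict, List, Tuple

# (hit ORF id, organism the hit comes from); hits are assumed to be
# pre-filtered for significance by whatever criterion a study chooses.
Hit = Tuple[str, str]


def is_paralogous_pco(orf_id: str, hits: List[Hit], own_organism: str) -> bool:
    """True if the ORF has a homolog in its own genome and none elsewhere."""
    has_paralog = any(
        organism == own_organism and hit_id != orf_id
        for hit_id, organism in hits
    )
    has_foreign_homolog = any(organism != own_organism for _, organism in hits)
    return has_paralog and not has_foreign_homolog


def paralogous_pcos(all_hits: Dict[str, List[Hit]], own_organism: str) -> List[str]:
    """Return the sorted ids of all ORFs that satisfy the definition."""
    return sorted(
        orf_id
        for orf_id, hits in all_hits.items()
        if is_paralogous_pco(orf_id, hits, own_organism)
    )


# Hypothetical toy input; the ORF identifiers are invented for illustration.
example_hits = {
    "orfA": [("orfB", "NRC-1")],                       # paralog only
    "orfB": [("orfA", "NRC-1")],                       # paralog only
    "orfC": [("orfD", "NRC-1"), ("x1", "other sp.")],  # also conserved elsewhere
    "orfD": [("orfC", "NRC-1"), ("x1", "other sp.")],
    "orfE": [],                                        # singleton ORFan, no paralog
}

result = paralogous_pcos(example_hits, "NRC-1")
```

In this toy input only `orfA` and `orfB` qualify: `orfC` and `orfD` have a homolog in another organism, and `orfE` has no homolog at all, which makes it a singleton ORFan and not a paralogous PCO.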

Publication types

  • Evaluation Study

MeSH terms

  • Bacterial Proteins / genetics*
  • Chromosome Mapping / methods*
  • Conserved Sequence / genetics*
  • Gene Expression Profiling / methods*
  • Genome, Archaeal
  • Genome, Bacterial
  • Halobacterium / genetics*
  • Open Reading Frames / genetics*
  • Sequence Alignment / methods
  • Sequence Analysis, DNA / methods*

Substances

  • Bacterial Proteins