Computational prediction of SEG (single exon gene) function in humans

Front Biosci. 2005 May 1:10:1382-95. doi: 10.2741/1627.

Abstract

Human genes are often interrupted by non-coding, intragenic sequences called introns, so the gene sequence is divided into exons (coding segments) and introns (non-coding segments). Consequently, the majority of human genes are multi-exon genes (MEG). However, a considerable number of single-exon genes (SEG) are also present in the human genome (approximately 12%). This fraction is sizeable, and it is important to probe their molecular function and cellular role. We therefore performed a genome-wide functional assignment of 3750 SEG sequences using PFAM (a protein family database), PROSITE (a database of biologically meaningful signatures or motifs) and SUPERFAMILY (a library covering all proteins of known three-dimensional structure). PFAM assigned 13% of SEG to trans-membrane receptor genes of the G-protein coupled receptor (GPCR) family and showed that a majority of SEG proteins have DNA-binding function. PROSITE identified 336 unique motif types in them, accounting for 25% of all known patterns, with a majority having PHOSPHORYLATION and ACETYLATION signals. SUPERFAMILY assigned 33% of SEG to the membrane all-alpha class (proteins containing alpha-helical structural elements, according to the SCOP (structural classification of proteins) definition). Functional assignment of SEG proteins at multiple levels (sequence signals, sequence families, 3D structures) using PFAM, PROSITE and SUPERFAMILY is expected to suggest their selective and predominant molecular functions in cellular systems. Their apparent roles in DNA binding, phosphorylation, acetylation and house-keeping are intriguing. The analysis also showed evidence of SEG expression and retro-transposition. However, this information is not sufficient to draw a definitive conclusion about the prevalent role played by these proteins in cellular biology. A complete understanding of SEG function will help to explore their role in the cellular environment. The datasets derived from these analyses are available at http://sege.ntu.edu.sg/wester/intronless/human/.
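
The motif-level assignment described above relies on scanning protein sequences for PROSITE-style patterns. As a minimal sketch of what such a scan involves (not the pipeline used in this study), the Python below converts a simple PROSITE pattern such as `[ST]-x-[RK]` into a regular expression and reports every match, including overlapping ones. The function names `prosite_to_regex` and `find_motifs` and the example sequence are hypothetical and introduced only for illustration. The sketch assumes only the basic pattern syntax (`x`, `[...]`, `{...}` and repeat counts) and does not handle the `<` and `>` terminus anchors.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern (e.g. '[ST]-x-[RK]') to a regex string.

    Assumption: only x, [..], {..} and repeat counts such as x(2) or x(2,4)
    are supported; terminus anchors '<' and '>' are not handled.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split an element into its core and an optional repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        # Bracketed sets and single letters are already valid regex.
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(core)
    return "".join(parts)


def find_motifs(sequence: str, pattern: str):
    """Return (1-based start, matched text) for all matches, overlaps included."""
    # A lookahead lets the scan report matches that overlap each other.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    toy_sequence = "MSTRKASDKE"  # made-up sequence, not from the study's dataset
    for start, hit in find_motifs(toy_sequence, "[ST]-x-[RK]"):
        print(start, hit)
```

Short patterns of this kind match very frequently by chance, which is one reason motif hits are usually interpreted alongside family-level and structure-level assignments rather than on their own.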

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology*
  • Databases, Nucleic Acid*
  • Exons / physiology*
  • Genome, Human*
  • Humans