A survey of metazoan selenocysteine insertion sequences

Biochimie. 2002 Sep;84(9):953-9. doi: 10.1016/s0300-9084(02)01441-4.

Abstract

The computational detection of novel selenoproteins in genomic sequences is usually achieved through identification of SECIS, a conserved secondary structure element found in the 3' UTR of animal selenoprotein mRNAs. Previous studies have used "descriptors" specifying the number of base pairs and the conserved nucleotides in SECIS to identify this element. A major drawback of the "descriptor" approach is that the number of detections in current genomic or transcript databases far exceeds the number of true selenoproteins. In this study, we instead use the ERPIN program to detect SECIS elements. ERPIN is based on a lod-score profile algorithm that takes a training set of aligned RNA sequences as input. Starting from an initial alignment of 44 animal SECIS sequences, we performed a series of iterative searches in which the training set was progressively enriched to 117 confirmed SECIS elements drawn from a large collection of metazoan species. About 200 high-scoring candidates were also detected. We show that ERPIN scores for these candidates can be converted into expect values, thus enabling their statistical evaluation. The most interesting SECIS candidates are presented.
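
The general idea of a lod-score profile built from an aligned training set can be illustrated with a minimal Python sketch. This is not ERPIN's implementation: it scores single-stranded columns only and ignores base-paired helices, and the uniform background, the pseudocount, the function names, and the toy alignment (made-up strings, not real SECIS sequences) are all assumptions chosen for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGU"


def build_lod_profile(alignment, background=None, pseudocount=1.0):
    """Return one dict per alignment column mapping base -> log2 odds score.

    Assumptions (not taken from the abstract): uniform background
    frequencies and an add-one pseudocount per base.
    """
    if background is None:
        background = {base: 0.25 for base in ALPHABET}
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        # Count only unambiguous bases; gaps and other symbols are skipped.
        counts = Counter(seq[col] for seq in alignment if seq[col] in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        profile.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background[base])
            for base in ALPHABET
        })
    return profile


def score_window(profile, window):
    """Sum the column scores for a window as wide as the profile."""
    return sum(col.get(base, 0.0) for col, base in zip(profile, window))


def best_hit(profile, sequence):
    """Slide the profile along the sequence; return (score, start) of the best window."""
    width = len(profile)
    if len(sequence) < width:
        return None
    return max(
        (score_window(profile, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Toy training set: hypothetical aligned strings, for illustration only.
toy_alignment = ["AUGA", "AUGA", "AUGA", "CUGA"]
toy_profile = build_lod_profile(toy_alignment)
toy_score, toy_start = best_hit(toy_profile, "GGGAUGACCC")
```

In an iterative search of the kind described above, windows scoring above a chosen cutoff would be inspected, and confirmed elements added to the alignment before the profile is rebuilt. The cutoff and the confirmation step are outside the scope of this sketch.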

MeSH terms

  • Animals
  • Base Sequence
  • Conserved Sequence*
  • Databases, Genetic
  • Humans
  • Molecular Sequence Data
  • Nucleic Acid Conformation
  • Proteins / genetics*
  • Selenocysteine / genetics*
  • Selenocysteine / metabolism
  • Selenoproteins
  • Sequence Alignment
  • Sequence Homology, Nucleic Acid
  • Software

Substances

  • Proteins
  • Selenoproteins
  • Selenocysteine