Nucleic Acids Res. 2002 Jan 1; 30(1): 335–340.

UTRdb and UTRsite: specialized databases of sequences and functional elements of 5′ and 3′ untranslated regions of eukaryotic mRNAs. Update 2002


The 5′- and 3′-untranslated regions (5′- and 3′-UTRs) of eukaryotic mRNAs are known to play a crucial role in the post-transcriptional regulation of gene expression, modulating nucleo-cytoplasmic mRNA transport, translation efficiency, subcellular localization and stability. UTRdb is a specialized, non-redundant database of 5′ and 3′ untranslated sequences of eukaryotic mRNAs. UTRdb entries are enriched with specialized information not present in the primary databases, including the presence of nucleotide sequence patterns whose functional role has already been demonstrated experimentally. All these patterns have been collected in the UTRsite database, which makes it possible to search any input sequence for annotated functional motifs. Furthermore, UTRdb entries have been annotated for the presence of repetitive elements. All the Internet resources we implemented for the retrieval and functional analysis of 5′- and 3′-UTRs of eukaryotic mRNAs are accessible at http://bighost.area.ba.cnr.it/BIG/UTRHome/.


The completion of the sequencing of the human genome and of the genomes of other organisms has opened new avenues for understanding the basic mechanisms of cell function. These processes rely largely on the spatially and temporally coordinated expression of genes, mediated by regulatory elements embedded in the non-coding part of the genome. Among non-coding regions, the 5′- and 3′-untranslated regions (5′- and 3′-UTRs) of eukaryotic mRNAs have often been shown experimentally to contain sequence elements crucial for many aspects of gene regulation and expression (17).

The main functional roles demonstrated so far for 5′- and 3′-UTR sequences are: (i) control of mRNA cellular and subcellular localization (4,7–9); (ii) control of mRNA stability (1,10,11); and (iii) control of mRNA translation efficiency (12–14).

Several regulatory signals have already been identified in 5′- or 3′-UTR sequences. They usually correspond to short oligonucleotide tracts, which may also fold into specific secondary structures, and act as binding sites for various regulatory proteins.

The analysis of large collections of functionally equivalent sequences (15,16), such as 5′- and 3′-UTR sequences, can be very useful for defining their structural and compositional features and for searching for putative function-associated sequence patterns (17–19). For this reason we constructed UTRdb, a specialized, non-redundant collection of 5′- and 3′-UTR sequences from eukaryotic mRNAs.

UTRdb entries have been enriched with specialized information not present in the primary databases, including the presence of sequence patterns that experimental evidence has shown to play a functional role. In addition, because ∼10% of mammalian mRNAs contain repetitive elements in their UTRs (20), and these are usually not annotated in the original records, we have added this information to our database as well.

We also created UTRsite, a collection of functional sequence patterns located in 5′- or 3′-UTR sequences. It can be useful both for the automatic annotation of anonymous sequences generated by sequencing projects and for finding previously undetected signals in known gene sequences.


The specialized database of UTR sequences was generated by UTRdb_gen, a computer program we devised for this task. Eight sequence collections were generated for each of the 5′- and 3′-UTR sets, one for each eukaryotic division of the EMBL/GenBank nucleotide database, namely: (i) human, (ii) rodent, (iii) other mammal, (iv) other vertebrate, (v) invertebrate, (vi) plant, (vii) fungi and (viii) virus.

UTRdb_gen generates the various UTRdb collections automatically by accurately parsing the Feature Table of the relevant EMBL entries. Although ‘5′UTR’ and ‘3′UTR’ are valid feature keys for EMBL/GenBank entries, only a small percentage of entries are annotated with them. Indeed, of the approximately 250 000 primary entries from which UTRdb_gen was able to extract 5′- or 3′-UTR sequences, only 12% contained the 5′UTR or 3′UTR feature key. UTRdb_gen can define UTRs even when these keys are absent from the primary entry, by applying a predefined syntactic parsing of other relevant feature keys, such as mRNA, CDS, exon and intron.
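When no UTR keys are present, the simplest case of deriving UTRs from other feature keys is an mRNA/cDNA entry with a single CDS. The UTRs then lie on either side of the CDS. The sketch below illustrates this idea only. It is not UTRdb_gen's actual parser. The helper name `utr_spans_from_feature_table` and the `UTRSpans` record are hypothetical. The sketch handles only single-line, forward-strand `start..end` CDS locations and leaves out `join()`, `complement()` and continuation lines. The first and last bases of the entry equal the cap and poly(A) sites only if the cDNA is full length, which is why completeness is annotated separately. The `<` and `>` partial-location markers show when a start or stop codon falls outside the entry.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical pattern for a single-line, forward-strand CDS location such as
# "45..1230", "<1..300" or "12..>900". join(), complement() and locations
# continued on the next FT line are deliberately not handled.
CDS_LOCATION = re.compile(r"^FT\s+CDS\s+(<?)(\d+)\.\.(>?)(\d+)\s*$")


@dataclass
class UTRSpans:
    five_prime: Optional[Tuple[int, int]]   # 1-based inclusive; None if not derivable
    three_prime: Optional[Tuple[int, int]]
    cds_start_partial: bool                 # "<" marker: start codon not in the entry
    cds_end_partial: bool                   # ">" marker: stop codon not in the entry


def utr_spans_from_feature_table(ft_lines: List[str], seq_length: int) -> Optional[UTRSpans]:
    """Derive UTR coordinates from the first simple CDS feature of an mRNA/cDNA entry."""
    for line in ft_lines:
        m = CDS_LOCATION.match(line)
        if m is None:
            continue
        start_partial, start = m.group(1) == "<", int(m.group(2))
        end_partial, end = m.group(3) == ">", int(m.group(4))
        # 5'-UTR: bases before the start codon (the start codon itself is excluded).
        five = (1, start - 1) if (not start_partial and start > 1) else None
        # 3'-UTR: bases after the stop codon; the EMBL CDS span includes the stop codon.
        three = (end + 1, seq_length) if (not end_partial and end < seq_length) else None
        return UTRSpans(five, three, start_partial, end_partial)
    return None
```

For a 1500-nt entry with `FT   CDS             45..1230`, this yields a 5′-UTR at 1..44 and a 3′-UTR at 1231..1500.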

UTRdb_gen automatically annotates the generated UTR entries with specialized information such as the completeness or incompleteness of the UTR, the number of exons it spans and a cross-reference to the primary database entry. A cross-reference between 5′- and 3′-UTR sequences from the same mRNA has also been established.
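Two of these annotations reduce to simple bookkeeping. The number of spanned exons is the number of exon intervals that overlap the UTR interval. The 5′/3′ cross-reference links UTR entries that share a parent accession. A minimal sketch follows, under stated assumptions: the UTR and exon coordinates share one coordinate system, and `count_spanned_exons`, `pair_utrs` and the "5UTR"/"3UTR" labels are hypothetical names, not UTRdb's.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive


def count_spanned_exons(utr: Interval, exons: List[Interval]) -> int:
    """Number of exons overlapping the UTR (both in the same coordinate system)."""
    u_start, u_end = utr
    return sum(1 for e_start, e_end in exons if e_start <= u_end and u_start <= e_end)


def pair_utrs(records: Iterable[Tuple[str, str, str]]) -> Dict[str, Tuple[str, str]]:
    """Link 5'- and 3'-UTR entries derived from the same parent mRNA accession.

    records: (utr_id, parent_accession, kind), kind being the hypothetical
    label "5UTR" or "3UTR".
    """
    by_parent: Dict[str, Dict[str, str]] = defaultdict(dict)
    for utr_id, parent, kind in records:
        by_parent[parent][kind] = utr_id
    # Only parents contributing both a 5'- and a 3'-UTR get a cross-reference.
    return {
        parent: (kinds["5UTR"], kinds["3UTR"])
        for parent, kinds in by_parent.items()
        if "5UTR" in kinds and "3UTR" in kinds
    }
```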

A further interface between UTRdb_gen and the BLAST engine (parameters: expect < 10⁻⁵, minimum length = 50 nt, percentage identity > 95%) adds information about the position and identity of any vector sequence that may contaminate UTR entries.
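The three BLAST cut-offs act together as a filter on the hits of UTR entries against a vector database. A hit is reported only if it passes all three. The sketch below makes several assumptions. It assumes the hits come in the tabular format of the modern BLAST+ suite (`-outfmt 6`, 12 default columns), which postdates this paper. It takes the alignment length column as the "length". The names `Hit`, `parse_tabular` and `vector_contamination` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Hit:
    query: str        # UTR entry identifier
    subject: str      # vector sequence identifier
    identity: float   # percent identity
    length: int       # alignment length (nt)
    q_start: int
    q_end: int
    evalue: float


def parse_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse BLAST+ default tabular output (-outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]), int(f[6]), int(f[7]), float(f[10])))
    return hits


def vector_contamination(hits: Iterable[Hit], max_evalue: float = 1e-5,
                         min_length: int = 50, min_identity: float = 95.0
                         ) -> Dict[str, List[Tuple[str, int, int]]]:
    """Report (vector, start, end) for every hit that passes all three cut-offs."""
    flagged: Dict[str, List[Tuple[str, int, int]]] = {}
    for h in hits:
        if h.evalue < max_evalue and h.length >= min_length and h.identity > min_identity:
            span = (h.subject, min(h.q_start, h.q_end), max(h.q_start, h.q_end))
            flagged.setdefault(h.query, []).append(span)
    return flagged
```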

Redundancy was removed from the UTR entries with the CLEANUP program (21). CLEANUP generates non-redundant collections automatically and very quickly by removing entries whose similarity to, and overlap with, a longer entry in the database exceed user-defined thresholds. For UTRdb, the CLEANUP cut-offs were 95% for similarity and 90% for overlap.
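The removal criterion can be made concrete with a greedy pass that visits entries from longest to shortest. Each entry is kept only if no already-kept (longer or equal-length) entry covers it at or above both cut-offs. This is a brute-force stand-in, not CLEANUP itself, which relies on its own much faster comparison strategy (21). Several details below are assumptions: the comparison is an ungapped placement of the shorter sequence on the longer one, overlap is measured relative to the shorter sequence, and `is_redundant` and `remove_redundancy` are hypothetical names.

```python
from typing import Dict, List


def is_redundant(short: str, long_: str,
                 min_similarity: float = 0.95, min_overlap: float = 0.90) -> bool:
    """True if some ungapped placement of `short` on `long_` meets both cut-offs.

    overlap    = aligned length / len(short)
    similarity = identical positions / aligned length
    """
    n, m = len(short), len(long_)
    for offset in range(-(n - 1), m):          # short[i] is compared with long_[i + offset]
        lo, hi = max(0, -offset), min(n, m - offset)
        aligned = hi - lo
        if aligned <= 0 or aligned / n < min_overlap:
            continue
        identical = sum(1 for i in range(lo, hi) if short[i] == long_[i + offset])
        if identical / aligned >= min_similarity:
            return True
    return False


def remove_redundancy(entries: Dict[str, str],
                      min_similarity: float = 0.95, min_overlap: float = 0.90) -> List[str]:
    """Greedy cleaning: visit entries longest first; keep one only if no kept entry covers it."""
    ordered = sorted(entries, key=lambda uid: len(entries[uid]), reverse=True)
    kept: List[str] = []
    for uid in ordered:
        if not any(is_redundant(entries[uid], entries[k], min_similarity, min_overlap)
                   for k in kept):
            kept.append(uid)
    return kept
```

Processing the longest entries first means that, of two redundant entries, the longer one survives, as described above.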

The specialized information included in UTR entries is generated by two programs: (i) UTRnote, which annotates the location of experimentally defined patterns collected in UTRsite, and (ii) UTRrepeat, which uses RepeatMasker to annotate repetitive elements present in the Repbase database (19). UTRsite entries describe the regulatory elements present in UTRs whose functional role has been established experimentally. Each UTRsite entry is compiled from information reported in the literature and reviewed by scientists working experimentally on the functional characterization of the relevant UTR regulatory element.


Table 1 summarizes UTRdb (release 15.0), which contains a total of 247 548 entries and 64 060 991 nt. On average, >35% of entries were found to be redundant and were removed from the database. Vector contamination was found in 188 5′-UTR entries and 196 3′-UTR entries.

Table 1.
Number of entries (N) and nucleotide length (L) of UTRdb collections (release 15.0) after redundancy cleaning

5′-UTR sequences were defined as the mRNA region spanning from the cap site to the start codon (excluded), whereas 3′-UTR sequences were defined as the mRNA region spanning from the stop codon (excluded) to the poly(A) site.

A sample UTRdb entry is shown in Figure 1. UTRdb entries are formatted according to the EMBL database format.

Figure 1
Sample entry of UTRdb. Specialized information not present in the primary EMBL/GenBank database is shown in bold with active crosslinks with other databases underlined. The ‘UT’ line reports information regarding the relevant ...

Table 2 lists the functional patterns and repetitive elements included in UTRsite; more entries will be added in future releases. A sample UTRsite entry is shown in Figure 2. Functional patterns, defined on the basis of information reported in the literature and/or advice from experts in the field, were described using the pattern description syntax of the PatSearch program (22).
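PatSearch syntax itself is not reproduced here. The sketch below imitates, in Python, the kind of combined primary- and secondary-structure constraint such a pattern can express: a stem of a fixed number of base pairs (Watson-Crick or G-U) closing a loop of a given sequence, where N matches any base. The default CAGUGN loop is only loosely modeled on the IRE apical loop. It leaves out the other IRE features, so it is not the UTRsite IRE definition. `find_hairpins` and its defaults are hypothetical.

```python
from typing import List

# Base pairs accepted in the stem: Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def loop_matches(loop: str, pattern: str) -> bool:
    """Compare a loop with a pattern in which 'N' matches any base."""
    return len(loop) == len(pattern) and all(p == "N" or p == b for p, b in zip(pattern, loop))


def find_hairpins(seq: str, loop_pattern: str = "CAGUGN", stem_len: int = 5) -> List[int]:
    """Return 0-based start positions of perfectly paired stem-loops (no bulges or mismatches)."""
    rna = seq.upper().replace("T", "U")
    loop_len = len(loop_pattern)
    hits = []
    for i in range(stem_len, len(rna) - loop_len - stem_len + 1):
        if not loop_matches(rna[i:i + loop_len], loop_pattern):
            continue
        left = rna[i - stem_len:i]
        right = rna[i + loop_len:i + loop_len + stem_len]
        # The k-th base of the 5' strand pairs with the k-th base from the end of the 3' strand.
        if all((left[k], right[stem_len - 1 - k]) in PAIRS for k in range(stem_len)):
            hits.append(i - stem_len)
    return hits
```

A real pattern language adds mismatch and bulge allowances and variable stem and loop lengths. This fixed-geometry version only shows how sequence and pairing constraints combine.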

Figure 2
Sample entry of UTRsite describing the ‘Iron responsive element (IRE)’ (24). The IRE functional pattern, which consists of both primary and secondary structure information, is described in the ‘Pattern’ section according ...
Table 2.
Functional patterns included so far in UTRsite. For each pattern the number of hits with non-redundant UTRdb entries is also reported


UTRdb and UTRsite are publicly available by anonymous FTP (ftp://area.ba.cnr.it/pub/embnet/database/utr/). All the Internet resources we implemented for the retrieval and functional analysis of 5′- and 3′-UTR sequences are accessible at http://bighost.area.ba.cnr.it/BIG/UTRHome/. These include SRS retrieval (23) of UTRdb and UTRsite, which is also available at the EBI World Wide Web server (http://srs.ebi.ac.uk:80/), UTRscan and UTRblast. UTRscan allows users to search submitted sequences for any of the patterns collected in UTRsite. UTRblast allows database searches against the fully annotated UTRdb entries.
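A UTRscan-style query scans a submitted sequence against a library of named motifs and reports each match with its coordinates. The sketch below uses Python regular expressions as a stand-in for PatSearch patterns. The single motif shown is the AUUUA pentamer at the core of AU-rich elements (26). It is illustrative only and is not the UTRsite ARE entry. `scan` and `MOTIFS` are hypothetical names. A zero-width lookahead lets overlapping occurrences, such as the two pentamers in AUUUAUUUA, both be reported.

```python
import re
from typing import Dict, List, Tuple

# Illustrative motif library; plain regular expressions, not PatSearch syntax
# and not UTRsite pattern definitions.
MOTIFS: Dict[str, str] = {
    "ARE_pentamer": "AUUUA",   # core pentamer of AU-rich elements
}


def scan(seq: str, motifs: Dict[str, str] = MOTIFS) -> List[Tuple[str, int, int]]:
    """Return (motif name, start, end) hits, 1-based inclusive, sorted by start."""
    rna = seq.upper().replace("T", "U")
    hits = []
    for name, pattern in motifs.items():
        # The lookahead matches zero characters, so overlapping occurrences are all found.
        for m in re.finditer(f"(?=({pattern}))", rna):
            hits.append((name, m.start() + 1, m.start() + len(m.group(1))))
    return sorted(hits, key=lambda h: h[1])
```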


The important role that UTRs of eukaryotic mRNAs may play in gene regulation and expression is now widely recognized. Indeed, experimental studies have demonstrated that sequence motifs located in the UTRs are involved in crucial biological functions.

The large number of functionally equivalent sequences stored in UTRdb now makes it possible to study their structural and compositional features and to apply statistical methods to identify significant signals. Prior removal of redundancy is, however, necessary to avoid artefacts caused by redundant sequences. Although statistical significance does not necessarily imply biological significance, it may provide useful indications for further experimental work, such as site-directed mutagenesis.
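One simple statistical test compares the observed count of a motif in a non-redundant UTR collection with the count expected from base composition. The following is a sketch under explicit assumptions, not the method used by UTRdb or PatSearch. It assumes independent, identically distributed nucleotides with the collection's own composition, and it approximates the count distribution with a Poisson law. That approximation ignores the overlap dependencies of self-overlapping motifs. `motif_enrichment` and its helpers are hypothetical names. Redundant copies would inflate the observed count, which is the artefact mentioned above.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def base_composition(seqs: List[str]) -> Dict[str, float]:
    counts = Counter(b for s in seqs for b in s)
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


def count_overlapping(seq: str, motif: str) -> int:
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def motif_enrichment(seqs: List[str], motif: str) -> Tuple[int, float, float]:
    """Return (observed, expected, Poisson upper-tail p-value) under an i.i.d. model."""
    rna = [s.upper().replace("T", "U") for s in seqs]
    motif = motif.upper().replace("T", "U")
    comp = base_composition(rna)
    observed = sum(count_overlapping(s, motif) for s in rna)
    p_word = math.prod(comp.get(b, 0.0) for b in motif)
    expected = sum(max(len(s) - len(motif) + 1, 0) for s in rna) * p_word
    return observed, expected, poisson_sf(observed, expected)
```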

UTRdb will be updated with each new EMBL database release, and UTRsite will be continuously updated with new entries describing functional patterns whose biological role has been demonstrated experimentally.


For revision of UTRsite entries we would like to thank Jim Malter (APP 3′-UTR stability control element), Alain Krol (SECIS), Matthias Hentze (IRE, 15-LOX DICE and msl-2), Bill Marzluff (histone stem–loop structure), Ann-Bin Shyu (ARE), Arturo Verrotti (CPE), Elizabeth Goodwin (TGE), Roger Kaspar (ribosomal protein mRNA TOP), Danuta Radzioch (TNF mRNA translation repression element), Ruben Boado (GLUT1 mRNA stabilizing element), Zendra E. Zehner (Vimentin 3′-UTR mRNA element), Shu-Yun Le (IRES), Anne Ephrussi (BRE), Howy Jacobs (rpmS12), Allen Miller (BYDV), John Parsch (adh DRE). This work was supported by Ministero dell’Istruzione e Ricerca, Italy [projects: Bioinformatics and Genomic Research (COFIN99), Programma ‘Biotecnologie’ (legge 95/95 – 5%), Programma ‘Studio di geni di interesse biomedico e agroalimentare’ (CEGBA)].


1. Decker C.J. and Parker,R. (1994) Mechanism of mRNA degradation in eukaryotes. Trends Biochem. Sci., 19, 336–340.
2. Kaufman R.J. (1994) Control of gene expression at the level of translation initiation. Curr. Opin. Biotechnol., 5, 550–557.
3. Klausner R.D., Rouault,T.A. and Harford,J.B. (1993) Regulating the fate of mRNA: the control of cellular iron metabolism. Cell, 72, 19–28.
4. Singer R.H. (1992) The cytoskeleton and mRNA localization. Curr. Opin. Cell Biol., 4, 15–19.
5. Wilhelm J.E. and Vale,R.D. (1993) RNA on the move: the mRNA localization pathway. J. Cell Biol., 123, 269–274.
6. McCarthy J.E.G. and Kollmus,H. (1995) Cytoplasmic mRNA–protein interactions in eukaryotic gene expression. Trends Biochem. Sci., 20, 191–197.
7. Bashirullah A., Cooperstock,R.L. and Lipshitz,H.D. (1998) RNA localization in development. Annu. Rev. Biochem., 67, 335–394.
8. Johnston D. (1995) The intracellular localization of messenger RNAs. Cell, 81, 161–170.
9. Jansen R.P. (2001) mRNA localization: message on the move. Nat. Rev. Mol. Cell. Biol., 2, 247–256.
10. Beelman C.A. and Parker,R. (1995) Degradation of mRNA in eukaryotes. Cell, 81, 179–183.
11. Mitchell P. and Tollervey,D. (2001) mRNA turnover. Curr. Opin. Cell Biol., 13, 320–325.
12. Curtis D., Lehman,R. and Zamore,P.D. (1995) Translational regulation in development. Cell, 81, 171–178.
13. Sonenberg N. (1994) mRNA translation: influence of the 5′ and 3′ untranslated regions. Curr. Opin. Genet. Dev., 4, 310–315.
14. Macdonald P. (2001) Diversity in translational regulation. Curr. Opin. Cell Biol., 13, 326–331.
15. Mengeritsky G. and Smith,T.F. (1987) Recognition of characteristic patterns in sets of functionally equivalent DNA sequences. Comput. Appl. Biosci., 3, 223–227.
16. Konopka A.K. (1994) In Smith,D.W. (ed.), Informatics and Genome Projects. Academic Press, San Diego, CA.
17. Pesole G., Liuni,S., Grillo,G. and Saccone,C. (1997) Structural and compositional features of untranslated regions of eukaryotic mRNAs. Gene, 205, 95–102.
18. Pesole G., Grillo,G. and Liuni,S. (1996) Databases of mRNA untranslated regions for Metazoa. Comput. Chem., 20, 141–144.
19. Pesole G., Fiormarino,G. and Saccone,C. (1994) Sequence analysis and compositional properties of untranslated regions of human mRNAs. Gene, 140, 219–225.
20. Makalowski W., Zhang,J. and Boguski,M. (1996) Comparative analysis of 1196 orthologous mouse and human full-length mRNA and protein sequences. Genome Res., 6, 846–857.
21. Grillo G., Attimonelli,M., Liuni,S. and Pesole,G. (1996) CLEANUP: a fast computer program for removing redundancies from nucleotide sequence databases. Comput. Appl. Biosci., 12, 1–8.
22. Pesole G., Liuni,S. and D’Souza,M. (2000) PatSearch: a pattern matcher software that finds functional elements in nucleotide and protein sequences and assesses their statistical significance. Bioinformatics, 16, 439–450.
23. Etzold T., Ulyanov,A. and Argos,P. (1996) SRS: information retrieval system for molecular biology data banks. Methods Enzymol., 266, 114–128.
24. Hentze M.W. and Kuhn,L.C. (1996) Molecular control of vertebrate iron metabolism: mRNA-based regulatory circuits operated by iron, nitric oxide, and oxidative stress. Proc. Natl Acad. Sci. USA, 93, 8175–8182.
25. Williams A.S. and Marzluff,W.F. (1995) The sequence of the stem and flanking sequences at the 3′ end of histone mRNA are critical determinants for the binding of the stem–loop binding protein. Nucleic Acids Res., 23, 654–662.
26. Chen C. and Shyu,A. (1995) AU-rich elements: characterization and importance in mRNA degradation. Trends Biochem. Sci., 20, 465–470.
27. Goodwin E.B., Okkema,P.G., Evans,T.C. and Kimble,J. (1993) Translational regulation of tra-2 by its 3′-untranslated region controls sexual identity in C. elegans. Cell, 75, 329–339.
28. Hubert N., Walczak,R., Sturchler,C., Schuster,C., Westhof,E., Carbon,P. and Krol,A. (1996) RNAs mediating cotranslational insertion of selenocysteine in eukaryotic selenoproteins. Biochimie, 78, 590–596.
29. Walczak R., Westhof,E., Carbon,P. and Krol,A. (1996) A novel RNA structural motif in the selenocysteine insertion element of eukaryotic selenoprotein mRNAs. RNA, 2, 367–379.
30. Fagegaltier D., Lescure,A., Walczak,R., Carbon,P. and Krol,A. (2000) Structural analysis of new local features in SECIS RNA hairpins. Nucleic Acids Res., 28, 2679–2689.
31. Zaidi S.H.E. and Malter,J.S. (1994) Amyloid precursor protein mRNA stability is controlled by a 29-base element in the 3′-untranslated region. J. Biol. Chem., 269, 24007–24013.
32. Verrotti A., Thompson,S., Wreden,C., Strickland,S. and Wickens,M. (1996) Evolutionary conservation of sequence elements controlling cytoplasmic polyadenylylation. Proc. Natl Acad. Sci. USA, 93, 9027–9032.
33. Amaldi F. and Pierandrei-Amaldi,P. (1997) TOP genes: a translationally controlled class of genes including those coding for ribosomal proteins. Prog. Mol. Subcell. Biol., 18, 1–17.
34. Kaspar R.L., Kakegawa,T., Cranston,H., Morris,D.R. and White,M.W. (1992) A regulatory cis element and a specific binding factor involved in the mitogenic control of murine ribosomal protein L32 translation. J. Biol. Chem., 267, 508–514.
35. Morris D.R., Kakegawa,T., Kaspar,R.L. and White,M.W. (1993) Polypyrimidine tracts and their binding proteins: regulatory sites for posttranscriptional modulation of gene expression. Biochemistry, 32, 2931–2937.
36. Hel Z., Di Marco,S. and Radzioch,D. (1998) Characterization of the RNA binding proteins forming complexes with a novel putative regulatory region in the 3′-UTR of TNF-α mRNA. Nucleic Acids Res., 26, 2803–2812.
37. Zehner Z.E., Shepherd,R.K., Gabryszuk,J., Fu,T.F., Al-Ali,M. and Holmes,W.M. (1997) RNA–protein interactions within the 3′ untranslated region of vimentin mRNA. Nucleic Acids Res., 25, 3362–3370.
38. Boado R.J. and Pardridge,W.M. (1998) Ten nucleotide cis element in the 3′-untranslated region of the GLUT1 glucose transporter mRNA increases gene expression via mRNA stabilization. Brain Res. Mol. Brain Res., 59, 109–113.
39. Le S.Y. and Maizel,J.V.,Jr (1997) A common RNA structural motif involved in the internal initiation of translation of cellular mRNAs. Nucleic Acids Res., 25, 362–369.
40. Gebauer F., Corona,D.F., Preiss,T., Becker,P.B. and Hentze,M.W. (1999) Translational control of dosage compensation in Drosophila by sex-lethal: cooperative silencing via the 5′ and 3′ UTRs of msl-2 mRNA is independent of the poly(A) tail. EMBO J., 18, 6146–6154.
41. Mariottini P., Shah,Z.H., Toivonen,J.M., Bagni,C., Spelbrink,J.N., Amaldi,F. and Jacobs,H.T. (1999) Expression of the gene for mitoribosomal protein S12 is controlled in human cells at the levels of transcription, RNA splicing, and translation. J. Biol. Chem., 274, 31853–31862.
42. Castagnetti S., Hentze,M.W., Ephrussi,A. and Gebauer,F. (2000) Control of oskar mRNA translation by Bruno in a novel cell-free system from Drosophila ovaries. Development, 127, 1063–1068.
43. Kim-Ha J., Kerr,K. and Macdonald,P.M. (1995) Translational regulation of oskar mRNA by bruno, an ovarian RNA-binding protein, is essential. Cell, 81, 403–412.
44. Guo L., Allen,E. and Miller,W.A. (2000) Structure and function of a cap-independent translation element that functions in either the 3′ or the 5′ untranslated region. RNA, 6, 1808–1820.
45. Parsch J., Stephan,W. and Tanda,S. (1999) A highly conserved sequence in the 3′-untranslated region of the drosophila Adh gene plays a functional role in Adh expression. Genetics, 151, 667–674.
46. Ostareck-Lederer A., Ostareck,D., Standart,N. and Thiele,B. (1994) Translation of 15-lipoxygenase mRNA is inhibited by a protein that binds to a repeated sequence in the 3′ untranslated region. EMBO J., 13, 1476–1481.
47. Kozak M. (1999) Initiation of translation in prokaryotes and eukaryotes. Gene, 234, 187–208.
