Copyright © 2009 Firth and Atkins; licensee BioMed Central Ltd.

A case for a CUG-initiated coding sequence overlapping torovirus ORF1a and encoding a novel 30 kDa product

1 BioSciences Institute, University College Cork, Cork, Ireland
2 Department of Human Genetics, University of Utah, Salt Lake City, UT 84112-5330, USA

Corresponding author. Andrew E Firth: A.Firth/at/ucc.ie; John F Atkins: j.atkins/at/ucc.ie

Received August 9, 2009; Accepted September 8, 2009.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

The genus Torovirus (order Nidovirales) includes a number of species that infect livestock. These viruses have a linear positive-sense ssRNA genome of ~25-30 kb, encoding a large polyprotein that is expressed from the genomic RNA, and several additional proteins expressed from a nested set of 3'-coterminal subgenomic RNAs. In this brief report, we describe the bioinformatic discovery of a new, apparently coding, ORF that overlaps the 5' end of the polyprotein coding sequence, ORF1a, in the +2 reading frame. The new ORF has a strong coding signature and, in fact, is more conserved at the amino acid level than the overlapping region of ORF1a. We propose that the new ORF utilizes a non-AUG initiation codon - namely a conserved CUG codon in a strong Kozak context - upstream of the ORF1a AUG initiation codon, resulting in a novel 258 amino acid protein, dubbed '30K'.

Findings

The genus Torovirus belongs to the family Coronaviridae in the order Nidovirales. Species include Bovine torovirus, Equine torovirus and Porcine torovirus.
As with other members of the order Nidovirales, these viruses have a linear positive-sense ssRNA genome encoding a large replicase polyprotein that is expressed from the genomic RNA (ORF1a and, via ribosomal frameshifting, an ORF1a-ORF1b fusion product), and a number of other proteins - including the structural proteins - which are translated from a nested set of 3'-coterminal subgenomic RNAs (Figure 1A).
Overlapping genes are common in RNA viruses, where they serve as a mechanism to optimize the coding potential of compact genomes. However, annotation of overlapping genes can be difficult using conventional gene-finding software [7]. Recently we have been using a number of complementary approaches to systematically identify new overlapping genes in virus genomes [7-11]. When we applied these methods to the toroviruses, we found strong evidence for a new coding sequence overlapping the 5'-terminal region of ORF1a (Figure 1).

Relatively little sequence data is available for the relevant 5'-terminal region of the torovirus genome. In fact, there are only two non-identical sequences in GenBank (tblastn [12] of translated NC_007447 ORF1a; 2 Aug 2009) for the region of interest: [GenBank:NC_007447] - Breda virus or Bovine torovirus (derived from [GenBank:AY427798]) [5], and [GenBank:DQ310701] - Berne virus or Equine torovirus [4]. However, these two viruses are reasonably divergent (mean nucleotide identity within ORF1a ~68%), thus providing robust statistics for comparative methods of gene prediction. The NC_007447 and DQ310701 ORF1a amino acid sequences were aligned with CLUSTALW [13] and back-translated to produce a nucleotide sequence alignment, which was analyzed with a number of techniques.

The first piece of evidence for an overlapping coding sequence is the presence of an unusually long open reading frame (229 codons; hereafter ORFX) at the 5' end of ORF1a but in the +2 reading frame relative to ORF1a (Figure 1B).
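The idea of looking for an unusually long stop-free stretch in an alternative reading frame can be illustrated with a minimal Python sketch. It is not the procedure used in this report (which worked on the two-sequence alignment); the function name `longest_open_run` and the toy sequence are hypothetical, and the sketch simply assumes the standard stop codons UAA, UAG and UGA.

```python
# Minimal sketch: longest stop-free codon run in a given forward frame.
# `longest_open_run` and the toy sequence are illustrative assumptions,
# not part of the analysis described in the text.

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def longest_open_run(seq, frame):
    """Return (start, end, n_codons) of the longest stop-free run.

    `frame` is 0, 1 or 2; coordinates are 0-based with `end` exclusive.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    # Last position covered by complete codons in this frame.
    usable_end = frame + ((len(seq) - frame) // 3) * 3
    best = (frame, frame, 0)
    run_start = frame
    for pos in range(frame, usable_end, 3):
        if seq[pos:pos + 3] in STOP_CODONS:
            n = (pos - run_start) // 3
            if n > best[2]:
                best = (run_start, pos, n)
            run_start = pos + 3  # next run begins after the stop codon
    # Close the final run, which is not terminated by a stop codon.
    n = (usable_end - run_start) // 3
    if n > best[2]:
        best = (run_start, usable_end, n)
    return best


toy = "ATGTAACCCGGGAAATTTTGA"
for frame in range(3):
    print(frame, longest_open_run(toy, frame))
```

In the toy sequence, frame 0 is interrupted by two stop codons, whereas frames 1 and 2 are open throughout. In a real genome, a long open run in a frame that overlaps a known gene is only suggestive and needs supporting evidence such as the conservation analyses described in the text.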
Next, the ORF1a alignment was analysed for conservation at synonymous sites, as described in [11] (but inspired by ref. [14]). The procedure takes into account whether synonymous site codons are 1-, 2-, 3-, 4- or 6-fold degenerate and the differing probabilities of transitions and transversions. There was a striking, and highly statistically significant (p < 10^-17 for the total conservation within ORFX), peak in ORF1a-frame synonymous site conservation at the 5' end of the alignment, corresponding precisely to the conserved open reading frame, ORFX (Figure 1B).

Finally, we analysed the alignment with MLOGD - a gene-finding program which was designed specifically for identifying overlapping coding sequences, and which includes explicit models for sequence evolution in multiply-coding regions [7,8] (Figure 1B).

In Breda virus (NC_007447), the annotated ORF1a AUG initiation codon is at nucleotide coordinates 859..861 and the first ORFX-frame AUG codon is at coordinates 1110..1112. However, leaky scanning to this AUG codon is unlikely, due to intervening AUG codons in the ORF1a frame (1 in NC_007447, 3 in DQ310701; Figure 2).

Initiation at the upstream CUG codon would give ORFX the nucleotide coordinates 774..1547 in NC_007447 and 776..1549 in DQ310701, resulting in a 258 amino acid product with a molecular mass of 30 kDa which, for want of a better designation, we tentatively name '30K'. The full predicted amino acid sequences are shown in Figure 3.
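The coordinate arithmetic behind these figures can be checked with a short Python sketch. The helper `orf_stats` is hypothetical, the coordinates are taken as 1-based and inclusive (GenBank style), and the mean residue mass of about 110 Da is a common rule of thumb rather than a value from this report, so the mass estimate is only a rough sanity check.

```python
# Minimal sketch: sanity-check the ORFX coordinates quoted in the text.
# `orf_stats` is a hypothetical helper; coordinates are assumed to be
# 1-based and inclusive, as in GenBank feature locations.

MEAN_RESIDUE_MASS_DA = 110  # rule-of-thumb assumption, not from the text


def orf_stats(start, end, ref_start=None):
    """Return length, codon count and (optionally) frame offset of an ORF."""
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    stats = {"length_nt": length_nt, "codons": length_nt // 3}
    if ref_start is not None:
        # Offset of this ORF's frame relative to the reference start codon.
        stats["frame_offset"] = (start - ref_start) % 3
    return stats


# Breda virus (NC_007447): ORFX 774..1547, ORF1a AUG at 859..861.
breda = orf_stats(774, 1547, ref_start=859)
# Berne virus (DQ310701): ORFX 776..1549 (ORF1a start not quoted here).
berne = orf_stats(776, 1549)

print(breda)  # {'length_nt': 774, 'codons': 258, 'frame_offset': 2}
print(berne)  # {'length_nt': 774, 'codons': 258}

# The first ORFX-frame AUG (1110..1112) should be in frame with 774.
print((1110 - 774) % 3)  # 0

# Rough mass estimate for 258 residues, in kDa.
print(258 * MEAN_RESIDUE_MASS_DA / 1000)  # 28.38
```

Both coordinate ranges span 774 nt, or 258 codons, and the frame offset of 2 relative to the ORF1a AUG is consistent with the +2 reading frame described above. The rough estimate of about 28 kDa is in line with the quoted 30 kDa, since the true mass depends on the actual amino acid composition.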
It is expected that a large proportion of ribosomes should scan past the CUG codon and initiate at the ORF1a AUG codon - thus allowing synthesis of the replicase polyprotein - though the additional possibility that the CUG-initiation efficiency may be temporally regulated as part of the virus lifecycle cannot currently be discounted [16,22].

Overlapping genes are difficult to identify and are often overlooked. However, it is important to be aware of such genes as early as possible in order to avoid confusion (otherwise functions of the overlapping gene may be wrongly ascribed to the gene it overlaps), and also so that the functions of the overlapping gene may be investigated in their own right. We hope that presentation of this bioinformatic analysis will help fulfil these goals. Initial verification of the ORFX product could be by means of immunoblotting with ORFX-specific antibodies, bearing in mind, however, that it may be expressed at relatively low levels.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

AEF carried out the bioinformatic analysis and wrote the manuscript. Both authors edited and approved the final manuscript.

Acknowledgements

This work was supported by National Institutes of Health Grant R01 GM079523 and an award from Science Foundation Ireland, both to JFA.