Copyright © 2009 Sabath et al; licensee BioMed Central Ltd.
A potentially novel overlapping gene in the genomes of Israeli acute paralysis virus and its relatives
Department of Biology and Biochemistry, University of Houston, Houston, TX 77204, USA
Corresponding author.
Niv Sabath: nsabath/at/uh.edu; Nicholas Price: price4890/at/gmail.com; Dan Graur: dgraur/at/uh.edu
Received July 2, 2009; Accepted September 17, 2009.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
The Israeli acute paralysis virus (IAPV) is a honeybee-infecting virus that was found to be associated with colony collapse disorder. The IAPV genome contains two genes encoding a structural and a nonstructural polyprotein. We applied a recently developed method for the estimation of selection in overlapping genes to detect purifying selection and, hence, functionality. We provide evolutionary evidence for the existence of a functional overlapping gene, which is translated in the +1 reading frame of the structural polyprotein gene. Conserved orthologs of this putative gene, which we provisionally call pog (predicted overlapping gene), were also found in the genomes of a monophyletic clade of dicistroviruses that includes IAPV, acute bee paralysis virus, Kashmir bee virus, and Solenopsis invicta (red imported fire ant) virus 1.
Background
Colony collapse disorder (CCD) is a syndrome characterized by the mass disappearance of honeybees from hives [1]. CCD imperils a global resource estimated at approximately $200 billion [2]. For example, it has been estimated that up to 35% of hives in the US may have been affected [3].
Many culprits have been suggested as causal factors of CCD, among them fungal, bacterial, and protozoan diseases, external and internal parasites, in-hive chemicals, agricultural insecticides, genetically modified crops, climatic factors, changed cultural practices, and the spread of cellular phones [1]. The Israeli acute paralysis virus (IAPV), a positive-strand RNA virus belonging to the family Dicistroviridae, was found to be strongly correlated with CCD [4]. It was first isolated in Israel [5], but was later found to have a worldwide distribution [4,6,7]. The genome of IAPV contains two long open reading frames (ORFs) separated by an intergenic region. The 5' ORF encodes a non-structural polyprotein; the 3' ORF encodes a structural polyprotein [5]. The non-structural polyprotein contains several signature sequences for helicase, protease, and RNA-dependent RNA polymerase [5]. The structural polyprotein, which is located downstream of the non-structural polyprotein, encodes two (and possibly more) capsid proteins.
Overlapping genes are easily missed by annotation programs [8], as evidenced by the fact that several overlapping genes were only detected by using the signatures of purifying selection [9-13]. Here, we apply a recently developed method for the detection of selection in overlapping reading frames [14] to the genome of IAPV and its relatives.
Results and Discussion
In the fourteen completely sequenced dicistroviral genomes (Table 1), we identified 43 same-strand overlapping ORFs at least 60 codons long on the positive strand. Ten overlapping ORFs were found in concordant genomic locations in two or more genomes. The concordant overlapping ORFs were assigned to three orthologous clusters (Table 2). The overlapping ORFs in all three clusters are phase-1 overlaps, i.e., shifted by one nucleotide relative to the reading frames of the known polyprotein genes. Two of the orthologous clusters overlap the gene encoding the nonstructural polyprotein and one overlaps the reading frame of the structural polyprotein. (In Appendix 1, we present the results concerning the overlapping ORFs on the negative strand. We note, however, that dicistroviruses are not known to be ambisense [15].)
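The ORF scan summarized above can be illustrated with a minimal sketch. It is not the authors' code: the function name, the 0-based coordinates, and the definition of an ORF as a stop-free run of codons (rather than a run that must begin with a start codon) are assumptions made here for illustration only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_overlapping_orfs(genome, gene_start, gene_end, min_codons=60):
    """Find stop-free codon runs in the two alternative forward frames.

    Hypothetical helper. Coordinates are 0-based and end-exclusive.
    Returns (start, end, shift) tuples, where shift (1 or 2) is the offset
    of the ORF's frame relative to the frame of the known gene.
    Assumption: an ORF is any stop-free run, with no start codon required.
    """
    seq = genome.upper().replace("U", "T")
    hits = []
    for shift in (1, 2):
        first = (gene_start + shift) % 3  # first codon position in this frame
        run_start = first
        # Walk codon by codon; a stop codon or the sequence end closes a run.
        for pos in range(first, len(seq) + 3, 3):
            codon = seq[pos:pos + 3]
            at_end = len(codon) < 3
            if at_end or codon in STOP_CODONS:
                n_codons = (pos - run_start) // 3
                overlaps_gene = run_start < gene_end and pos > gene_start
                if n_codons >= min_codons and overlaps_gene:
                    hits.append((run_start, pos, shift))
                run_start = pos + 3
                if at_end:
                    break
    return hits
```

With `min_codons=60` this mirrors the length threshold stated in the text. Concordance across genomes and the clustering of orthologs are separate steps that the sketch does not cover.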
We identified a strong signature of purifying selection in cluster A that contains overlapping ORFs from four genomes: IAPV, Acute bee paralysis virus (ABPV), Kashmir bee virus (KBV), and Solenopsis invicta virus 1 (SINV-1) [16-18]. This ORF overlaps the 5' end of the structural polyprotein gene (Figure 1A).
An additional indication for selection on these ORFs was obtained by comparing the degrees of conservation of the hypothetical protein sequences of the overlapping ORFs against the protein sequences of the known genes (structural and nonstructural polyproteins, Table 3). The degree of amino-acid conservation and, hence, sequence identity between orthologous protein-coding genes is influenced, ceteris paribus, by the intensity of purifying selection. If both overlapping genes are under similar strengths of selection, the amino-acid sequence identity of one pair of homologous genes would be similar to that of the overlapping pair. On the other hand, if a functional gene overlaps a non-functional ORF, the amino-acid identity between the hypothetical protein sequences of the non-functional ORFs would be much lower than that between the two homologous overlapping functional genes. We found that the amino-acid sequence identity between pairs of overlapping ORFs in cluster A is only slightly lower than that of the known gene (maximum of 12% difference between IAPV and SINV-1 in cluster A, Table 3). In contrast, the amino-acid sequence identity between ORF pairs in clusters B and C is much lower than that between the pairs of known genes (maximum of 44% difference between CrPV and DCV in cluster C, Table 3). The signature of purifying selection on the ORFs in cluster A suggests that they may encode functional proteins. We provisionally term this gene pog (predicted overlapping gene).
In Figure 1, [...] An examination of the DNA alignment of pog (Figure 2) [...]
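The identity comparison used for clusters A to C can be sketched as follows. This is an illustrative assumption rather than the authors' procedure: the function names are hypothetical, and identity is computed here over aligned columns in which neither sequence has a gap, a convention the text does not specify.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap.

    Assumption: both inputs come from the same alignment and have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def identity_difference(known_a, known_b, orf_a, orf_b):
    """Identity of the known-gene pair minus identity of the overlapping-ORF pair.

    A small difference is the pattern the text associates with a functional
    overlapping ORF; a large difference points to a non-functional one.
    """
    return percent_identity(known_a, known_b) - percent_identity(orf_a, orf_b)
```

Applied to the known-gene and overlapping-ORF alignments of two genomes, `identity_difference` would return the kind of percentage-point difference the text reports from Table 3.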
A protein motif search resulted in several matches, all with a weak score. Two patterns were found in all four proteins: (1) a signature of rhodopsin-like GPCRs (G protein-coupled receptors), and (2) a protein kinase C phosphorylation site (Figure 3).
Conclusion
In this note, we provide evolutionary evidence (purifying selection) for the existence of a functional overlapping gene, pog, in the genomes of IAPV, ABPV, KBV, and SINV-1. To our knowledge, this putative gene, whose coding region overlaps the structural polyprotein, has not been described in the literature before.
Methods
Sequence Data, Processing, and Analysis
Fourteen completely sequenced dicistrovirid genomes were obtained from NCBI (Table 1). Each genome was scanned for the presence of overlapping ORFs. We used BLASTP [26] with the protein sequences of the known genes to identify matches of orthologous overlapping ORFs (E value < 10^-6). Matching overlapping ORFs were assigned to clusters. Within each cluster, we aligned the amino-acid orthologs by using the sequences of the known genes as references. If the alignment length of the overlapping sequence exceeded 60 amino acids, and if the amino-acid sequence identity among the hypothetical genes within a cluster was higher than 65%, we tested for selection on the hypothetical gene (see below). We aligned the protein sequences of the two polyproteins with ClustalW [27] as implemented in the MEGA package [28]. Alignment quality was confirmed using HoT [29]. We reconstructed two phylogenetic trees (one for each polyprotein) by applying the neighbor-joining method [30], as implemented in the MEGA package [28]. Trees were rooted by the mid-point rooting method [31], and the confidence of each branch was estimated by bootstrap with 1000 replications.
Detection of Selection in Overlapping Genes
We used the method of Sabath et al. [14] for the simultaneous estimation of selection intensities in overlapping genes. This method uses a maximum-likelihood framework to fit a Markov model of codon substitution to data from two aligned homologous overlapping sequences. To predict functionality of an ORF that overlaps a known gene, we modified an existing approach for predicting functionality in non-overlapping genes [32]. Given two aligned orthologous overlapping sequences, we estimate the likelihood of two hierarchical models. In model 1, there is no selection on the ORF. In model 2, the ORF is assumed to be under selection. The likelihood-ratio test is used to test whether model 2 fits the data significantly better than model 1, in which case the ORF is predicted to be under selection and most probably functional.
Motifs
We looked for motifs within the inferred protein sequences encoded by the overlapping ORF by using the motif search server http://motif.genome.jp/ and the My-Hits server http://hits.isb-sib.ch/cgi-bin/PFSCAN with the following motif databases: PRINTS [33], PROSITE [34], and Pfam [35]. We used PSIPRED [20] to predict secondary structure, and MEMSAT [21] to predict transmembrane protein topology.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
NS carried out the analysis and wrote the draft manuscript. NP performed the motif search. DG and NP contributed to the interpretation of the results and the final version. All authors have read and approved the manuscript.
Appendix 1
Overlapping ORFs on the negative strand
In the fourteen completely sequenced dicistrovirus genomes (Table 1), we identified 240 overlapping ORFs at least 60 codons long on the negative strand. Of the 240 ORFs, 113 were found in concordant genomic locations in two or more genomes. The concordant overlapping ORFs were assigned to 29 clusters (Additional file 1). There are 9, 1, and 19 clusters in phase 0, 1, and 2, respectively. The cluster size ranges from 2 to 9.
In two clusters, 5 and 10, both in phase 2, there is a weak signature of selection. However, this signature seems to be a false positive, driven by the unique structure of the opposite-strand phase-2 overlap (Additional file 2). In this structure, codon positions one and two of one gene match codon positions two and one of the overlapping gene. As a result, most changes are either synonymous in both overlapping genes or nonsynonymous in both, which occasionally produces a false signal of purifying selection on the overlapping ORF. In addition, one of the clusters (cluster 10) does not constitute a monophyletic clade, and is, therefore, unlikely to be functional. We therefore conclude that dicistroviruses most probably do not encode proteins on the negative strand.
Additional file 1
Clusters of orthologous overlapping ORFs on the negative strands of dicistrovirid genomes. (163K, DOC)
Additional file 2
The corresponding codon positions of overlapping genes in opposite-strand phase-2. First and second codon positions, in which ~5% and 0% of the changes are synonymous, are marked in red. Third codon positions, in which ~70% of the changes are synonymous, are marked in blue. (45K, PPT)
Acknowledgements
We thank Dr. Ilan Sela and an anonymous reviewer for their comments. This work was supported in part by US National Library of Medicine Grant LM010009-01 to Dan Graur and Giddy Landan and by the Small Grants Program of the University of Houston.