J Virol. Nov 2004; 78(22): 12576–12590.
PMCID: PMC525058

Functional Genomics Analysis of Singapore Grouper Iridovirus: Complete Sequence Determination and Proteomic Analysis

Abstract

Here we report the complete genome sequence of Singapore grouper iridovirus (SGIV). Sequencing of the random shotgun and restriction endonuclease genomic libraries showed that the entire SGIV genome consists of 140,131 bp. One hundred sixty-two open reading frames (ORFs) on the sense and antisense DNA strands, encoding polypeptides of 41 to 1,268 amino acids, were identified. Computer-assisted analyses of the deduced amino acid sequences revealed that 77 of the ORFs exhibited homologies to known virus genes, 23 of which matched functional iridovirus proteins. Forty-two putative conserved domains or signatures were detected in the National Center for Biotechnology Information CD-Search database and PROSITE database. An assortment of enzyme activities involved in DNA replication, transcription, nucleotide metabolism, cell signaling, etc., were identified. Viruses were cultured on a cell line derived from the embryonated egg of the grouper Epinephelus tauvina, isolated, and purified by sucrose gradient ultracentrifugation. The protein extract from the purified virions was analyzed by polyacrylamide gel electrophoresis followed by in-gel digestion of protein bands. Matrix-assisted laser desorption ionization-time of flight mass spectrometry and database searching led to identification of 26 proteins. Twenty of these represented novel or previously unidentified genes, which were further confirmed by reverse transcription-PCR (RT-PCR) and DNA sequencing of their respective RT-PCR products.

Iridoviruses are animal viruses that infect only invertebrates and poikilothermic vertebrates, such as insects, fish, amphibians, and reptiles (40). They have been implicated as causative agents of serious systemic diseases among cultured and ornamental fish, as well as wild fish. Within the family Iridoviridae, four genera of DNA-containing viruses are currently known to infect invertebrates (Iridovirus and Chloriridovirus) and cold-blooded vertebrates (Ranavirus and Lymphocystivirus) (36). A major characteristic feature of all iridoviruses is the large icosahedral viral particle (120 to 300 nm) present in the cytoplasm. Generally, isolates from fish tend to be larger (200 to 300 nm) than both amphibian and invertebrate viruses (120 to 200 nm). To date, the genome sequences of five iridoviruses have been published: Lymphocystis disease virus (LCDV) (genus Lymphocystivirus) (33), Chilo iridescent virus (CIV) (genus Iridovirus) (18), Tiger frog virus (TFV) (genus Ranavirus) (13), Infectious spleen and kidney necrosis virus (ISKNV) (genus unassigned) (14), and Ambystoma tigrinum virus (ATV) (genus Ranavirus) (19).

Iridovirus pathogens have been regarded as a cause of serious systemic diseases among feral, cultured, and ornamental fish in recent years. Mortality rates of 30 to 100% due to systemic iridovirus infection have been observed. Histopathological signs in iridovirus-infected fish may include enlargement of cells and necrosis of the renal and splenic hematopoietic tissues (28). In 1994, a novel viral disease called sleepy grouper disease (SGD) resulted in significant economic losses in Singapore marine net cage farms. The novel iridovirus, of the genus Ranavirus and designated Singapore grouper iridovirus (SGIV), was finally isolated in 1998 from brown-spotted grouper (6, 29). It was subsequently grown in a cell line derived from grouper (Epinephelus tauvina) embryonated eggs, with good resultant titers (9), and this culture was used as a source to purify SGIV. The physicochemical properties of SGIV have been reported previously (28). At the molecular level, only a partial sequence encoding the highly conserved major capsid protein in SGIV has been reported (28). Due to its relevance in the aquaculture industry, it is important to study the molecular mechanism of viral infection and virus-host interaction in grouper. As an initial part of these studies, we have determined the complete genomic sequence of SGIV. We have also confirmed the authenticity of some open reading frames (ORFs) using the proteomic approach and reverse transcription-PCR (RT-PCR).

MATERIALS AND METHODS

Virus infection, purification, and genomic DNA extraction.

Grouper embryonic cells from the brown-spotted grouper Epinephelus tauvina (5) were cultured in Eagle's minimum essential medium containing 10% fetal bovine serum, 0.116 M NaCl, 100 IU of penicillin G/ml and 100 μg of streptomycin sulfate/ml. Culture media were equilibrated with HEPES to a final concentration of 5 mM and adjusted to pH 7.4 with NaHCO3. Virus was inoculated onto confluent monolayers of the grouper cell line at a multiplicity of infection of approximately 0.1. When the cytopathic effect was sufficient, the medium containing SGIV was harvested and centrifuged at 12,000 × g for 30 min at 4°C. The pellet comprising the virus was resuspended in the culture medium and ultrasonicated. The suspension containing the lysate, virus, and cellular debris was then centrifuged at 4,000 × g for 20 min at 4°C. The supernatant was layered onto a cushion of 35% sucrose and centrifuged at 210,000 × g for 1 h at 4°C. The pellet, resuspended in TN buffer (50 mM Tris-HCl [pH 7.4], 150 mM NaCl), was overlaid onto 30, 40, 50, and 60% (m/v) sucrose gradients and centrifuged at 210,000 × g for 1 h at 4°C. Virus bands, present in 50% sucrose, were aspirated, sonicated briefly, and reloaded onto sucrose gradients. The lowest band (50% sucrose) was individually aspirated and spun down at 100,000 × g. The purity of the virus was examined by negative staining under transmission electron microscopy (JEOL 100 CXII) and was shown to be sufficient for isolation of the genomic DNA, construction of shotgun and restriction libraries, and proteomic analysis. The genomic DNA of SGIV was treated with proteinase K and N-lauroylsarcosine, followed by phenol-chloroform extraction and alcohol precipitation (16).

Construction of libraries.

Soluble genomic DNA was quantified by spectrophotometry (UV-1600; Shimadzu). Sixty micrograms of genomic DNA was diluted with TM buffer (5 mM Tris-HCl [pH 8.0], 1.5 mM MgCl2) to a final volume of 200 μl and ultrasonicated (3-s bursts) using an ultrasonic liquid processor (model XL2020; Misonix Inc., Farmingdale, N.Y.). The appropriate viral DNA fragments (500 to 800 bp) were excised from a 1.0% agarose gel and extracted using the QIAquick gel extraction kit (QIAGEN). Genomic DNA fragments were end repaired with T4 DNA polymerase, followed by phosphorylation with T4 polynucleotide kinase. DNA fragments were purified using a High Pure PCR product purification kit (Roche) before the next enzymatic reaction. Sonicated fragments were ligated by incubation at 16°C overnight to the pUC19 vector, which had been prelinearized with SmaI and then dephosphorylated. After purification, the chimeric plasmids were transformed into electrocompetent DH5α cells. More than 1,000 recombinants were selected from the library by the blue/white screening assay. To construct the restriction library, DNA fragments were obtained by restriction digestion with BamHI and cloned into the corresponding site of the pBluescriptII KS(+) vector. Both libraries were used to scaffold the SGIV genome.

Assembly and analysis of SGIV genome.

Sequencing of the viral fragments was carried out following the standard protocol supplied by Applied Biosystems. All cycle sequencing products were loaded onto the ABI PRISM 3100 genetic analyzer to acquire nucleotide sequences from both directions. Before the scaffolds were created, high-throughput BLAST analysis was performed for all nucleotide sequences to eliminate contamination reads, followed by vector screening with the InterPhace program (University of Washington). A software package, Vector NTI Suite 7.1 (InforMax Inc., Frederick, Mass.), was applied to create the contigs, assemble the genome, identify ORFs, analyze presumptive genes, and draw the genomic map. The whole genome was also submitted to http://www.softberry.com (Softberry Inc., Mount Kisco, N.Y.) for identification of all potential ORFs. These ORFs were searched against the mirror site of National Center for Biotechnology Information (NCBI) nucleotide database at the Singapore Bioinformatics Institute. The presumptive genes were submitted to the NCBI network service to search for conserved domains. Protein motifs were analyzed by using the PROSITE database, release 18.17 (8). Signal peptides and signal anchors were predicted with SignalP V2.0 (24, 25). Signal anchors exist in certain membrane proteins (type II membrane proteins) attaching to the membrane by an N-terminal sequence which shares many characteristics with a signal peptide sequence but is not cleaved. Transmembrane domains were predicted with TMpred (15).

Mass spectrometric analysis of SGIV proteins.

The protein pellet of the lower band from sucrose gradient ultracentrifugation was separated by one-dimensional sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE). Thirty-nine well-separated protein bands were excised, reduced, alkylated, and digested with trypsin (31). To extract the peptides, the gel particles were twice treated with 20 mM NH4HCO3 and 5% formic acid in 50% acetonitrile, respectively. All supernatants were combined and dried in a vacuum centrifuge. Dried peptides were dissolved in 3 to 20 μl of 0.1% trifluoroacetic acid in 50% acetonitrile. Dissolved peptides (0.5 μl) were spotted onto a target plate, followed by an equal volume of 10-mg/ml α-cyano-4-hydroxycinnamic acid in 50% acetonitrile-0.1% trifluoroacetic acid. After the spots had dried, the target plate was loaded into a Voyager-DE STR BioSpectrometry workstation mass spectrometer (PerSeptive Biosystems, Framingham, Mass.). Mass spectra were acquired in positive-ion reflector mode with an accelerating voltage of 20.5 kV, a grid voltage of 73.5%, and a delay time of 380 ns. The resulting peptide mass fingerprints were searched against the SGIV ORF database using the AutoMS-Fit search program (version 1.2.18; PerSeptive Biosystems).

RT-PCR.

Total RNA was extracted from viral cultures at different infective stages using an RNeasy Mini kit (QIAGEN). After treatment of the total RNA with RNase-free DNase I (QIAGEN), gene-specific primers were used to amplify the target genes by using the OneStep RT-PCR kit (QIAGEN). All steps were performed according to the manufacturer's manual. Briefly, cDNA was reverse transcribed at 50°C for 30 min. The PCR amplification segment was started with an initial heating step at 95°C for 15 min, which activates the HotStarTaq DNA polymerase and simultaneously inactivates the Omniscript and Sensiscript reverse transcriptases. After the activation of the HotStarTaq DNA polymerase, PCR amplification reactions were performed for 30 cycles under conditions of 95°C for 30 s, 51 to 58°C for 15 s, and 72°C for 1 min per cycle. The annealing temperature was optimized for different target genes. RT-PCR products were analyzed on a 1% agarose gel and also subjected to nucleotide sequencing.

Virus abbreviations.

ALIV, African lampeye iridovirus; ATV, Ambystoma tigrinum virus; BIV, Bohle iridovirus; BVDV, bovine viral diarrhea virus; CIV, Chilo iridescent virus; CV, chlorella virus; CZIV, Costelytra zealandica iridescent virus; EHDV, epizootic hemorrhagic disease virus; EHNV, epizootic hematopoietic necrosis virus; EHV-1, equine herpesvirus 1; FPV, fowlpox virus; FV3, frog virus 3; GIV, grouper iridovirus; GSIV, giant seaperch iridovirus; HVAV, Heliothis virescens ascovirus; IMRV, Ictalurus melas ranavirus; ISKNV, infectious spleen and kidney necrosis virus; LBIV, largemouth bass iridovirus; LCDV-1, lymphocystis disease virus 1; LYCIV, large yellow croaker iridovirus; MSEPV, Melanoplus sanguinipes entomopoxvirus; OMRV, Oncorhynchus mykiss ranavirus; PBCV, Paramecium bursaria chlorella virus; RGV, Rana grylio virus; RRV, Regina ranavirus; RSBI, red sea bream iridovirus; SBIV, sea bass iridovirus; SCV, Siniperca chuatsi virus; SFAV, Spodoptera frugiperda ascovirus; SGIV, Singapore grouper iridovirus; SIV, Simulium iridescent virus; SOV, Sciaenops ocellatus virus; TFV, tiger frog virus; TIV, Tipula iridescent virus; WIV, Wiseana iridescent virus.

RESULTS AND DISCUSSION

Determination of the SGIV genome sequence.

We set out to generate 8× to 9× genome coverage of the SGIV genome. The bulk of the sequence coverage (2,065 passing reads) resulted from the shotgun library. However, 214 passing reads from the restriction library provided important intermediate-range linking information for assembly. Thirteen contigs ranging from 28,106 to 651 bp were scaffolded with the Contig Express program (CEP) of the Vector NTI suite 7.1. Final gaps were directly sequenced off the genomic DNA with custom synthetic primers and closed by 50 passing reads. In total, 2,329 cycle sequencing reaction products (free of contamination reads) from both random shotgun and restriction libraries were used to assemble the SGIV genome. Most of the genome (98.4%) was compiled by sequencing at least three times. Only 1.6% of the genome was assembled from a single recombinant. One hundred percent of the genome sequence was constructed from sequencing in both directions. Like other iridoviruses, SGIV has a double-stranded DNA genome which is circularly permuted (30, 11). The whole SGIV genome consists of 140,131 bp with a G+C content of 48.64% (Fig. 1), which is slightly less than that of TFV (55.01%), ISKNV (54.78%), and ATV (54.02%) but substantially more than that of LCDV-1 (29.07%) and CIV (28.63%).
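The read tally and the G+C figure quoted above are simple to recompute. The following is a minimal sketch, assuming the genome is available as a plain string; the function names `gc_content` and `mean_read_length_for` are illustrative and are not part of the software used in the study.

```python
# Illustrative helpers only; names are hypothetical, not from the study.

def gc_content(seq: str) -> float:
    """Return the G+C percentage of a DNA sequence, ignoring ambiguous bases."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted

# Passing reads reported in the text, grouped by source.
reads = {"shotgun": 2065, "restriction": 214, "gap_closure": 50}
total_reads = sum(reads.values())

GENOME_BP = 140_131  # genome length reported in the text

def mean_read_length_for(coverage: float) -> float:
    """Mean usable read length (bp) implied by a given fold coverage."""
    return coverage * GENOME_BP / total_reads
```

Under these assumptions, the 8× to 9× target corresponds to a mean usable read length of roughly 480 to 540 bp per passing read.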

FIG. 1.
Organization of the SGIV genome. The SGIV genome is shown in a linear format. A total of 162 ORFs, predicted by the FGENESV program (available through: http://www.softberry.com), supplemented with Vector NTI suite 7.1, are indicated by their locations, ...

Coding capacity of the viral genomic DNA sequence.

Prediction of presumptive genes was carried out by using the viral gene prediction program available at http://www.softberry.com, supplemented with Vector NTI suite 7.1. One hundred sixty-two presumptive ORFs were identified to code for proteins ranging from 41 to 1,268 amino acids on the sense (R) and antisense (L) DNA strands (Table 1). Computer-assisted analyses of the deduced amino acid sequences revealed that 23 of the ORFs share high levels of identity to iridovirus proteins which have been described previously to have specific biological functions. Fifty-one ORFs are homologous to other iridovirus genes, for which the corresponding proteins and their respective functions remain unknown. Additionally, three ORFs show weak homologies to genes of other viruses. Forty-two conserved domains, motifs, or signatures are identified from the NCBI CD-Search database and the PROSITE database (Table 1). A number of genes of SGIV are shown to be present in the ATV, TFV, LCDV, CIV, and ISKNV genomes. These include genes for the DNA polymerase, the DNA repair protein, the two largest subunits of DNA-dependent RNA polymerase II, the TFIIS, RNase III, ATPase, etc. (Table 1). There is no evidence of introns, and both strands are shown to contain ORFs. Three pairs of ORFs partially overlap other ORFs.
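To make the idea of ORFs on the sense (R) and antisense (L) strands concrete, here is a naive six-frame scan. It is only a sketch: the gene prediction program used in the study relies on its own models, which are not described here, and the 41-codon minimum simply mirrors the shortest product reported above. The names `find_orfs`, `revcomp`, and `_scan` are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def _scan(strand_seq: str, min_aa: int):
    """Yield (start, end) of ATG...stop ORFs, 0-based half-open, on one strand."""
    n = len(strand_seq)
    for frame in range(3):
        start = None
        for i in range(frame, n - 2, 3):
            codon = strand_seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                # Codons before the stop, i.e. the length of the product.
                if (i - start) // 3 >= min_aa:
                    yield start, i + 3
                start = None

def find_orfs(seq: str, min_aa: int = 41):
    """Return (strand, start, end) tuples in forward-strand coordinates.

    'R' marks the given (sense) strand and 'L' the reverse complement,
    following the naming used in the text.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = [("R", s, e) for s, e in _scan(seq, min_aa)]
    orfs += [("L", n - e, n - s) for s, e in _scan(revcomp(seq), min_aa)]
    return sorted(orfs, key=lambda orf: orf[1])
```

A scan like this treats the sequence as linear and does not handle ORFs that span the arbitrary start point of a circularly permuted genome.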

TABLE 1.
Listing of potential expressed ORFs in SGIV

Repetitive regions.

The analysis of the genome showed the presence of 17 repetitive regions distributed throughout the genome. In total, these occupy 2.6% of the SGIV genome, varying in size from 31 to 1,119 bp. These regions encompass eight perfect and nine imperfect repetitive sequences whose match percentages range from 80 to 99% (Table 2). No homologies between those repeats were detected. The base composition of 12 repeats is found to be more than 65% G+C. The longest perfect repetitive region, consisting of 11.4 copies of a 63-bp period, is identified at positions 99529 to 100248 in the genome, 277 bp upstream of the start codon of the largest subunit of DNA-dependent RNA polymerase II (ORF104L). The biological function of these repetitive sequences remains to be determined. However, “junk DNA” intergenic sequences have been found to exert control over recombination, DNA replication, and gene expression. Many repeats act as binding sites for proteins or as structural elements on the level of RNA (35).
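The copy number quoted for the longest repeat follows directly from its coordinates and period. A short arithmetic check, with hypothetical helper names and assuming the reported positions are 1-based and inclusive:

```python
GENOME_BP = 140_131  # genome length reported in the text

def copy_number(start: int, end: int, period: int) -> float:
    """Copies of a `period`-bp unit in the 1-based inclusive span start..end."""
    return (end - start + 1) / period

# Longest perfect repeat: positions 99529 to 100248, 63 bp per period.
copies = copy_number(99529, 100248, 63)

# Approximate total length of all repetitive regions (2.6% of the genome).
repeat_bp = 0.026 * GENOME_BP
```

The span is 720 bp, which gives about 11.4 copies, and 2.6% of the genome corresponds to roughly 3.6 kb of repetitive sequence.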

TABLE 2.
Positions of repetitive sequences in SGIV genome

DNA replication and repair.

Iridovirus replication occurs in two phases: a nuclear phase and a cytoplasmic phase. A functional nucleus is an essential cellular component for virus replication. After viral DNA is synthesized in the cell nucleus, the majority of viral DNA is transported to the cytoplasm where the packaging of DNA into the viral capsid occurs (40). SGIV encodes homologs of proteins involved in DNA replication, such as DNA polymerase (ORF128R), DNA repair protein (ORF097L), ATP-GTP binding protein (ORF052L), DNA binding/packing protein (ORF116R), and two helicases (ORF060R and ORF152R) containing highly conserved domains for DNA recombination and repair besides replication (Fig. 2).

FIG. 2.
Sequence alignment of selective SGIV ORF060R, ORF152L, and ORF076L with other known proteins. The homologous regions are shaded (black represents identical, grey represents conservative). The positions of the amino acid sequence are indicated on the left ...

ORF146L encodes a putative NTPase/helicase-like protein which could be a primase whose continual activity is required at the DNA replication fork. It catalyzes the synthesis of short molecules used as primers for DNA polymerase. ORF025L encodes a putative DNA binding motif—the so-called SAP motif (named after SAF-A/B, Acinus and PIAS)—which is found in a number of chromatin-associated proteins. It binds specifically to DNA elements called scaffold/matrix attachment regions, which are chromatin regions that bind to the nuclear matrix. Two proteins containing the SAP motif, SAF-A and Acinus, are targets of caspase cleavage during apoptosis, followed by chromatin degradation typical of programmed cell death (3). During apoptosis, SAF-A is cleaved in a caspase-dependent way. The cleavage occurs within the bipartite DNA-binding domain, resulting in the loss of DNA-binding activity and the concomitant detachment of SAF-A from nuclear structural sites. On the other hand, the cleavage does not compromise the association of SAF-A with hnRNP complexes, indicating that the function of SAF-A in the RNA metabolism is not affected during apoptosis (10). It may be inferred that the detachment of SAF-A, caused by the apoptotic proteolysis of its DNA-binding domain, could contribute to nuclear breakdown during host cell apoptosis.

Transcription and mRNA biogenesis.

The putative SGIV gene products that are related to DNA transcription comprise the two largest subunits of DNA-dependent RNA polymerase II (ORF073L and ORF104L), one transcription elongation factor, TFIIS (ORF085R), and one RNase III enzyme (ORF084L).

In addition, ORF063L exhibits similarity to one of the rat transcription factors which are important for transcriptional initiation. It may normally act to repress transcription at a variety of loci and may also play a role in chromatin structure or assembly (32). ORF061R encodes a TFIIF-interacting CTD phosphatase motif. It includes an NLI-interacting factor involved in RNA polymerase II regulation. ORF102L contains a fusion protein domain consisting of ubiquitin at the N terminus and ribosomal protein L40 at the C terminus. It also contains a zinc finger-like domain and is located in the cytoplasm (4). Ubiquitin is a highly conserved nuclear and cytoplasmic protein that has a major role in targeting cellular proteins for degradation by the 26S proteasome. It is also involved in the maintenance of chromatin structure, the regulation of gene expression, and the stress response.

Nucleotide metabolism.

Predicted proteins required for nucleotide transport and metabolism include the α and β subunits of ribonucleoside-diphosphate reductase (ORF064R and ORF047L), a ubiquitous cytosolic enzyme with a key role in DNA synthesis, as it catalyzes the biosynthesis of deoxyribonucleotides. ORF049L encodes a dUTPase which is critical for the fidelity of DNA replication and repair. It decreases the intracellular concentration of dUTP so that uracil cannot be incorporated into DNA (7). Purine nucleoside phosphorylase, which is involved in nucleotide transport and metabolism and exists widely in mammals, is encoded by ORF076L and is identified here for the first time in the family Iridoviridae (Fig. 2).

Cell signaling.

ORF078L and ORF081L encode two protein kinases that share a conserved catalytic core common to both serine/threonine and tyrosine protein kinases. There are a number of conserved regions in the catalytic domain of protein kinases. The protein corresponding to ORF067L belongs to the family of deoxynucleoside kinases that consists of various cytidine, guanosine, adenosine, and thymidine kinases (which also phosphorylate deoxyuridine and deoxycytidine). These enzymes catalyze the production of deoxynucleotide 5′-monophosphate from a deoxynucleoside.

Immune evasion function.

ORF028L, ORF029L, ORF031L, ORF033L, ORF035L, and ORF131R encode homologs of the immunoglobulin (Ig)-like domains. Cellular members of the Ig superfamily include secreted and membrane-bound receptors and cell adhesion proteins (ORF029L and ORF035L) (39). ORF005L encodes a homolog of a mammalian amino acid transporter. It also contains a C-type lectin signature which may bind to major histocompatibility complex (MHC) class I complex antigens and may promote or inhibit immune activity through intracellular signaling pathways. Thus, it is possible that ORF005L may interfere with normal immune surveillance or host responses (2). ORF068L contains an Ig-MHC signature ([FY]-x-C-x-[VA]-x-H). It is known that Ig constant domains and a single extracellular domain in each type of MHC chain are related. These homologous domains are approximately 100 amino acids long and include a conserved intradomain disulfide bond (26). These genes may function in host immune evasion, immune modulation, and aspects of cell and/or tissue tropism or perform other cellular functions (2).
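The Ig-MHC signature above is written in PROSITE pattern syntax, in which `x` stands for any residue and square brackets list the residues allowed at a position. A minimal sketch of converting such a pattern into a regular expression follows; `prosite_to_regex` is a hypothetical helper that covers only the common elements (`x`, `[..]`, `{..}`, and repeat counts) and ignores the anchors `<` and `>`.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into Python regex syntax."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"    # excluded residues
        # "[..]" classes and single residues are already valid regex.
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(core + repeat)
    return "".join(parts)

IG_MHC = prosite_to_regex("[FY]-x-C-x-[VA]-x-H")

def find_signature(protein: str):
    """Return 0-based start positions of the Ig-MHC signature in a protein."""
    return [m.start() for m in re.finditer(IG_MHC, protein.upper())]
```

Short signatures of this kind can match by chance, so a hit is only a hint of a related domain, not evidence of function.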

ORF070R encodes a thiol oxidoreductase that catalyzes the formation of disulfide bonds. The correct formation of disulfide bonds is important for the folding and function of many secretory and membrane proteins. Organisms from all kingdoms of life have evolved a diverse range of thiol oxidoreductases (21).

ORF155R exhibits homology to a mammalian semaphorin. The sema domain occurs in semaphorins, which are a large family of secreted and transmembrane proteins, some of which function as repellent signals during axon guidance. Sema domains also occur in the hepatocyte growth factor receptor (41).

ORF053R encodes a prokaryotic membrane lipoprotein lipid attachment site. To our knowledge, this is the first report of this motif in an iridovirus. Membrane lipoproteins are synthesized with a precursor signal peptide, which is cleaved by a specific lipoprotein signal peptidase (signal peptidase II). The peptidase recognizes a conserved sequence and cuts upstream of a cysteine residue to which a glyceride-fatty acid lipid is attached (12).

Cellular function.

ORF003L is similar to 3-β-hydroxysteroid dehydrogenase from TFV and from poxviruses. This enzyme catalyzes the oxidative conversion of both 3-β-hydroxysteroid and ketosteroids, playing a critical role in biosynthesis of all classes of steroid hormones. ORF130L encodes a TonB-dependent receptor motif. TonB interacts with outer membrane receptor proteins that carry out high-affinity binding and energy-dependent uptake of specific substrates into the periplasmic space. These substrates are either poorly permeative through porin channels or are encountered at very low concentrations. In the absence of TonB, these receptors bind to their substrates but do not carry out active transport. ORF115R encodes a homolog of a Bak protein, a member of the B-cell lymphoma 2 (Bcl-2) family (32% identity over 152 amino acids). Bcl-2 and related cytoplasmic proteins are key regulators of apoptosis, the cell suicide program critical for development, tissue homeostasis, and protection against pathogens. Bcl-2 family members are essential for maintenance of major organ systems to prevent a cellular apoptotic response to viral infection (1). ORF019R contains a glycoprotein hormone β chain signature. The function of ORF019R in the viral replication cycle is unknown.

Phylogenetic analysis.

Iridoviruses are large cytoplasmic DNA viruses, each type of which has a specific insect or vertebrate host (38). One of the unifying features of this virus group is the presence of a major capsid protein (MCP) that is approximately 50 kDa in size. MCP is a suitable target for the study of viral evolution, since it contains highly conserved domains but is sufficiently diverse to distinguish closely related iridovirus isolates (34). The amino acid sequences of the known MCPs are used in comparative analyses to elucidate the phylogenetic relationships between different cytoplasmic DNA viruses.

ORF072R encodes SGIV MCP. Phylogenetic analysis indicated that SGIV is distinct from all known iridoviruses (Fig. 3), but it is much closer to the genus Ranavirus. Within the MCP, amino acid identities of 73.0 (BIV), 72.8 (TFV), 72.8 (FV3), 72.4 (ATV), and 72.1% (EHNV) are noted. However, it only shows amino acid identities of 52.2 (LCDV), 45.7 (CIV), and 44.4% (ISKNV). This suggested that SGIV is a novel member of the genus Ranavirus within the family Iridoviridae. Generally, viruses with sequence identities within a given gene of less than 80% are considered members of different species rather than strains of the same species (37). The conserved protein sequence of the ATPase was also used to determine the relationship of SGIV with other iridoviruses (Fig. 3). The phylogenetic tree of ATPase supports the view that SGIV is a novel species of the genus Ranavirus.
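Percent identity figures such as those above are computed from pairwise alignments. The sketch below shows one common convention, counting identical residues over all aligned columns; the text does not state which convention was used, so the denominator here is an assumption, and `percent_identity` is a hypothetical helper. The identity values in the dictionary are those reported above.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences ('-' marks a gap).

    Assumption: the denominator is the number of alignment columns,
    excluding columns where both sequences have a gap.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no alignable columns")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)

# MCP amino acid identities to SGIV reported in the text (percent).
MCP_IDENTITY = {
    "BIV": 73.0, "TFV": 72.8, "FV3": 72.8, "ATV": 72.4, "EHNV": 72.1,
    "LCDV": 52.2, "CIV": 45.7, "ISKNV": 44.4,
}
SPECIES_THRESHOLD = 80.0  # identities below this suggest a distinct species (37)

below_threshold = [virus for virus, pct in MCP_IDENTITY.items()
                   if pct < SPECIES_THRESHOLD]
```

Every reported MCP identity falls below the 80% threshold, which is consistent with treating SGIV as a distinct species.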

FIG. 3.
Phylogenetic relationship of SGIV with representative iridoviruses. The analysis was based on the multiple alignments of the protein sequences of the major capsid protein and ATPase of iridoviruses. (A) SGIV, ORF072R, accession no. ...

Relationship of SGIV to other iridoviruses.

Conservation of synteny and of gene order can give insights to assess structural conservation among the viral genomes within the family Iridoviridae. Conservation of synteny refers to a pair of genomes in which at least some of the genes are located at similar map positions regardless of the gene order or the presence of intervening genes. When the evolutionary distance is large, scrambling of the gene order and the presence of nonsyntenic intervening genes become frequent (23). Therefore, it is necessary to account for these features when studying iridovirus evolution.

To make comparisons between SGIV and five other iridovirus genomes (ATV, TFV, LCDV, ISKNV, and CIV), we shifted the starting coordinates and set the start codon (ATG) of the MCP gene as the first base for all viral genomes. We also swapped the sense and antisense strands of the ATV, LCDV, ISKNV, and CIV genomes so that the MCP gene has the same orientation in every genome. However, none of the annotated ORFs were affected.
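The coordinate shift described above amounts to rotating a circular sequence and, where the MCP gene lies on the opposite strand, taking the reverse complement first. A minimal sketch, with hypothetical function names and 0-based indices chosen for illustration:

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def reorient(genome: str, atg_index: int, strand: str = "+") -> str:
    """Rewrite a circular genome so that it starts at the A of the MCP ATG.

    atg_index is the 0-based forward-strand position of the A of the start
    codon when strand is "+", or of the base paired with that A when the
    gene lies on the "-" strand.
    """
    genome = genome.upper()
    if strand == "-":
        genome = revcomp(genome)
        atg_index = len(genome) - 1 - atg_index
    return genome[atg_index:] + genome[:atg_index]
```

For example, `reorient("GGATGCCTAA", 2)` and `reorient("TTAGGCATCC", 7, "-")` both return `"ATGCCTAAGG"`. Annotated ORF coordinates would need the same transformation, which is not shown here.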

Comparing the SGIV genome to the LCDV, ISKNV, or CIV genome does not show possible clustering of genes in spite of the fact that SGIV shares 43, 22, or 29 real or annotated ORFs with the LCDV, ISKNV, or CIV genome, respectively. Although only 20 ORFs of SGIV reveal similarities to those of the TFV genome, it appears that some genes are located at similar map positions. In contrast, comparison of the SGIV genome with those of other iridoviruses shows that SGIV is much closer to ATV than to other iridoviruses whose genomes are known. The sequenced genomes of the two closely related iridoviruses SGIV and ATV were compared with emphasis on genome organization and coding capacity (Fig. 4). The genome size and ORF number of SGIV are much larger than those of ATV, which has a genome of 106,332 bp and contains 91 ORFs. Seventy-one ORFs of SGIV and ATV showed close homologies. There were some discrepancies in annotation, but inspection of DNA sequences showed that the corresponding genes are always present. Twenty-two corresponding ORFs between these two genomes are putative genes, but all remaining ORFs have no known function (Table 1). At least eight regions of conserved synteny containing more than three genes or annotated ORFs were also examined. Interestingly, TFIIS, RNase III, and one ORF (SGIV 086R, ATV 023L, and TFV 087R) are arranged in succession among SGIV, ATV, and TFV (Fig. 4). This cluster of genes may become a useful gene marker to distinguish unknown viruses from the genus Ranavirus. Scrambling of gene blocks was also observed between these two genomes. Two continuous conserved regions (blocks 4 and 5) in the SGIV genome were located at two separate gene blocks in the ATV genome, between which blocks 2 and 6 were inserted. Orthologous genes between SGIV and ATV are quite similar in sequence conservation and also in gene order. Conserved linkages between SGIV and ATV indicate that they evolved from a common ancestor.

FIG. 4.
Conserved segments between the SGIV and ATV genomes. Both genomes are linearized and shifted so that the gene encoding MCP is the start point. Only linked genes or annotated ORFs are indicated. Straight lines represent the gene linkages between two species. Black ...

Identification of SGIV proteins by MALDI-TOF MS and RT-PCR.

Purified viral proteins of SGIV extracted from the lowest band (50% sucrose) were separated by SDS-PAGE (Fig. 5). Thirty-nine clearly defined bands were excised and subjected to reduction, alkylation, tryptic digestion, and mass spectrometric analysis by matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) mass spectrometry. Peak lists of tryptic peptide molecular weights of each band were searched against the 162-ORF database of SGIV to identify the proteins and corresponding genes. Twenty-six proteins, covering 5 to 67% of amino acid sequences, were matched with the theoretical SGIV ORF database by using the AutoMS-Fit search program (Table 3). Of those proteins matched in this study, only six are known viral proteins; these are MCP (ORF072R), DNA polymerase (ORF128R), two proteins relevant to DNA replication (ORFs 052L and 060R), RNase III (ORF084L), and tyrosine protein kinase (ORF078L). Several SDS-PAGE bands in the low-molecular-mass area (bands 35 to 39 and molecular masses around 10 kDa) matched ORF052L and ORF060R. However, the molecular weight search scores were quite low, and the identities of these proteins cannot be confirmed from the data. Matching of multiple SDS-PAGE bands to ORF052L and ORF060R may be explained by possible degradation of these large proteins during virus purification, since no protease inhibitors were used during these procedures. We were able to verify 12 SGIV genes which exhibited homologies to genes from other iridoviruses but whose biological functions remain to be established. Another eight SGIV genes of unknown function, showing no homologies to any other viruses, were also verified.
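Peptide mass fingerprinting compares observed peptide masses with those predicted from an in silico digest of each candidate ORF, and the sequence coverage is the fraction of residues that fall within matched peptides. The sketch below illustrates only the digest and coverage steps, assuming the usual trypsin rule (cleavage after K or R, except before P) and no missed cleavages; it is not the AutoMS-Fit algorithm, and the function names are hypothetical.

```python
def tryptic_digest(protein: str):
    """Return (start, end, peptide) tuples for a complete tryptic digest.

    Assumed rule: cleave C-terminal to K or R unless the next residue is P.
    Coordinates are 0-based, half-open.
    """
    protein = protein.upper()
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append((start, i + 1, protein[start:i + 1]))
            start = i + 1
    if start < len(protein):
        peptides.append((start, len(protein), protein[start:]))
    return peptides

def sequence_coverage(protein_length: int, matched_spans) -> float:
    """Percentage of residues covered by matched (start, end) spans."""
    covered = set()
    for start, end in matched_spans:
        covered.update(range(start, end))
    return 100.0 * len(covered) / protein_length
```

For the toy sequence `MAKRPLLKGGR`, the digest yields `MAK`, `RPLLK`, and `GGR`; if only `RPLLK` were matched, the coverage would be 5 of 11 residues, or about 45%.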

FIG. 5.
SDS-PAGE of SGIV proteins. Viral proteins were purified and separated via one-dimensional SDS-PAGE. Thirty-nine visible gel-separated protein bands were excised and digested enzymatically, and their mass spectra were obtained and automatically searched ...
TABLE 3.
Identification of SGIV proteins corresponding to ORFs by MS

Mass spectrometry is a powerful, high-throughput technique for identifying proteins. It has been applied to analyze the proteome of shrimp white spot syndrome virus (17). The completion of the genomic DNA sequence of SGIV greatly facilitated the discovery of new proteins by the proteomic approach, which proved to be an effective and sensitive way of discovering SGIV proteins. We have analyzed the SGIV proteome by one-dimensional gel electrophoresis. Two-dimensional gel analysis will be used later to identify more novel proteins.

All 20 novel genes mentioned above were further checked and verified at the RNA level by RT-PCR. Total RNA (including virus and host) was extracted at 0, 6, 12, 24, 48, and 72 h postinfection. Several genes (i.e., ORF090R and ORF093R) were already transcribed early, at 12 h after the cell line was inoculated (data not shown). All novel genes were detected by RT-PCR after 48 h of infection (Fig. 6). Full lengths of 14 novel genes were amplified by reverse transcriptase and HotStarTaq DNA polymerase (Fig. 6A and B). However, only partial sequences of ORF012L (2,107 to 3,075 bp), ORF039L (1 to 900 bp), ORF046L (17 to 747 bp), ORF050L (9 to 600 bp), ORF055R (12 to 588 bp), and ORF057L (7 to 832 bp) were amplified (Fig. 6C). Furthermore, the RT-PCR products were sequenced to confirm their authenticity.

FIG. 6.
Amplification of 20 novel genes of SGIV via RT-PCR. Total RNA (harvested after 48 h of infection) was isolated by using the RNeasy Mini kit and amplified by using the OneStep RT-PCR kit. Full lengths of 14 genes were amplified (A and B). Partial sequences ...

Prediction of potential novel proteins.

The existence of an ORF in genomic data does not necessarily imply the existence of a functional gene. Despite the advances in bioinformatics, it is difficult to predict genes accurately from the genomic data alone (27). Although the genome sequence of SGIV will ease the problem of gene prediction through comparative genomics, the success rate for correct prediction of the primary structure is still low. Therefore, verification of a gene product by proteomic methods is an important first step in annotating the genome. We predicted the secondary structures of the 20 novel proteins identified by MS in this study by using the protein secondary structure prediction program PSIPRED (20, 22). We found that the two proteins encoded by ORF046L and ORF050L consisted of random coils. ORF012L encoded a protein containing only α helices. The other 17 proteins were categorized as α/β proteins. Transmembrane regions and their orientation were also predicted via TMpred on the ISREC server and are listed in Table 1. We intend to elucidate the three-dimensional structures of these novel proteins by structural biology methods and to study their functions by using small interfering RNA and other related technologies.
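The grouping of predicted proteins into structural classes can be sketched as follows. This is only an illustration, assuming a PSIPRED-style per-residue string in which H denotes helix, E denotes strand, and C denotes coil. The function name `structural_class` and the simple presence or absence rule are hypothetical; the criteria actually used to assign the classes reported above are not stated in the text.

```python
def structural_class(ss):
    """Classify a per-residue secondary structure string (H/E/C).

    Hypothetical rule for illustration: the class depends only on whether
    any helix (H) or strand (E) residues are predicted at all.
    """
    ss = ss.upper()
    unknown = set(ss) - set("HEC")
    if not ss or unknown:
        raise ValueError("expected a non-empty string of H, E, and C")
    has_helix = "H" in ss
    has_strand = "E" in ss
    if has_helix and has_strand:
        return "alpha/beta"
    if has_helix:
        return "all-alpha"
    if has_strand:
        return "all-beta"
    return "random coil"


if __name__ == "__main__":
    # Toy prediction strings, not actual PSIPRED output for SGIV ORFs.
    for toy in ("CCCCCCCC", "CCHHHHHHCC", "CHHHHCCEEEECC"):
        print(toy, structural_class(toy))
```

A practical classifier would more likely apply minimum-length or percentage thresholds to helix and strand content, since a single short predicted element is weak evidence for a structural class.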

CONCLUSION

We report the complete genome sequence of SGIV. Genomic analysis of SGIV provided fundamental knowledge of viral functions, such as DNA replication and transcription, nucleotide metabolism, protein processing, manipulation of cellular responses, and virus-host interaction. We compared the SGIV genome with five other iridovirus genomes at the DNA and protein levels. Besides the conserved and known proteins, we also identified 20 novel proteins by using the proteomic approach. Proteomic analysis showed evidence of novel proteins detected at the posttranscriptional level. Our studies will provide important information on the molecular mechanisms of virus-host interactions and will have a broad impact on future strategies for the design of specific inhibitors or drugs to control these pathogens in general.

Acknowledgments

We greatly appreciate Shashikant Joshi for modification of the manuscript. We thank Yunhan Hong for helpful discussions. Swarup Sanjay's suggestions regarding the construction of the shotgun library are acknowledged. We are grateful to Xianhui Wang for advice on mass spectrometry and Yun Ping Lim for her assistance in the bioinformatic work.

This work was financially supported by the grant “Establishment of a Laboratory of Excellence in Aquatic and Marine Biotechnology (LEAMB)” to Choy Leong Hew.

REFERENCES

1. Adams, J. M., and S. Cory. 1998. The Bcl-2 protein family: arbiters of cell survival. Science 281:1322-1326.
2. Afonso, C. L., E. R. Tulman, Z. Lu, L. Zsak, G. F. Kutish, and D. L. Rock. 2000. The genome of fowlpox virus. J. Virol. 74:3815-3831.
3. Ahn, J. S., and M. C. Whitby. 2003. The role of the SAP motif in promoting Holliday junction binding and resolution by SpCCE1. J. Biol. Chem. 278:29121-29129.
4. Chan, Y. L., K. Suzuki, and I. G. Wool. 1995. The carboxyl extensions of two rat ubiquitin fusion proteins are ribosomal proteins S27a and L40. Biochem. Biophys. Res. Commun. 215:682-690.
5. Chew-Lim, M., G. H. Ngoh, M. K. Ng, J. M. Lee, P. Chew, J. Li, Y. C. Chan, and J. L. C. Howe. 1994. Grouper cell line for propagating grouper viruses. Singap. J. Prim. Ind. 22:113-116.
6. Chua, F. H. C., M. L. Ng, K. L. Ng, J. J. Loo, and J. Y. Wee. 1994. Investigation of outbreaks of a novel disease, ‘sleepy grouper disease,’ affecting the brown-spotted grouper, Epinephelus tauvina Forskal. J. Fish Dis. 17:417-427.
7. Eklund, H., U. Uhlin, M. Färnegårdh, D. T. Logan, and P. Nordlund. 2001. Structure and function of the radical enzyme ribonucleotide reductase. Prog. Biophys. Mol. Biol. 77:177-268.
8. Falquet, L., M. Pagni, P. Bucher, N. Hulo, C. J. Sigrist, K. Hofmann, and A. Bairoch. 2002. The PROSITE database, its status in 2002. Nucleic Acids Res. 30:235-238.
9. Gibson-Kueh, S., P. Netto, G. H. Ngoh-Lim, S. F. Chang, L. L. Ho, Q. W. Qin, F. H. C. Chua, M. L. Ng, and H. W. Ferguson. 2003. The pathology of systemic iridoviral disease in fish. J. Comp. Pathol. 129:111-119.
10. Gohring, F., B. L. Schwab, P. Nicotera, M. Leist, and F. O. Fackelmayer. 1997. The novel SAR-binding domain of scaffold attachment factor A (SAF-A) is a target in apoptotic nuclear breakdown. EMBO J. 16:7361-7371.
11. Goorha, R., and K. G. Murti. 1982. The genome of frog virus 3, an animal DNA virus, is circularly permuted and terminally redundant. Proc. Natl. Acad. Sci. USA 79:248-252.
12. Hayashi, S., and H. C. Wu. 1990. Lipoproteins in bacteria. J. Bioenerg. Biomembr. 22:451-471.
13. He, J. G., L. Lu, M. Deng, H. H. He, S. P. Weng, X. H. Wang, S. Y. Zhou, Q. X. Long, X. Z. Wang, and S. M. Chan. 2002. Sequence analysis of the complete genome of an iridovirus isolated from the tiger frog. Virology 292:185-197.
14. He, J. G., M. Deng, S. P. Weng, Z. Li, S. Y. Zhou, Q. X. Long, X. Z. Wang, and S. M. Chan. 2001. Complete genome analysis of the mandarin fish infectious spleen and kidney necrosis iridovirus. Virology 291:126-139.
15. Hofmann, K., and W. Stoffel. 1993. TMbase - a database of membrane spanning proteins segments. Biol. Chem. Hoppe-Seyler 374:166. http://www.ch.embnet.org/software/TMPRED_form.html.
16. Huang, C. H., L. R. Zhang, J. H. Zhang, L. C. Xiao, Q. J. Wu, D. H. Chen, and J. K. K. Li. 2001. Purification and characterization of white spot syndrome virus (WSSV) produced in an alternate host: crayfish, Cambarus clarkii. Virus Res. 76:115-125.
17. Huang, C. H., X. B. Zhang, Q. S. Lin, X. Xu, Z. H. Hu, and C. L. Hew. 2002. Proteomic analysis of shrimp white spot syndrome viral proteins and characterization of a novel envelope protein VP466. Mol. Cell. Proteomics 1:223-231.
18. Jakob, N. J., K. Muller, U. Bahr, and G. Darai. 2001. Analysis of the first complete DNA sequence of an invertebrate iridovirus: coding strategy of the genome of Chilo iridescent virus. Virology 286:182-196.
19. Jancovich, J. K., J. H. Mao, V. G. Chinchar, C. Wyatt, S. T. Case, S. Kumar, G. Valente, S. Subramanian, E. W. Davidson, J. P. Collins, and B. L. Jacobs. 2003. Genomic sequence of a ranavirus (family Iridoviridae) associated with salamander mortalities in North America. Virology 316:90-103.
20. Jones, D. T. 1999. Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292:195-202.
21. Kadokura, H., and J. Beckwith. 2001. The expanding world of oxidative protein folding. Nat. Cell Biol. 3:E247-E249.
22. McGuffin, L. J., K. Bryson, and D. T. Jones. 2000. The PSIPRED protein structure prediction server. Bioinformatics 16:404-405. http://bioinf.cs.ucl.ac.uk/psipred/psiform.html.
23. Nadeau, J. H., and D. Sankoff. 1998. The lengths of undiscovered conserved segments in comparative maps. Mamm. Genome 9:491-495.
24. Nielsen, H., and A. Krogh. 1998. Prediction of signal peptides and signal anchors by a hidden Markov model, p. 122-130. In Proceedings of the 6th International Conference on Intelligent Systems for Molecular Biology. AAAI Press, Menlo Park, Calif.
25. Nielsen, H., J. Engelbrecht, S. Brunak, and G. von Heijne. 1997. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng. 10:1-6.
26. Orr, H. T., D. Lancet, R. J. Robb, J. A. Lopez de Castro, and J. L. Strominger. 1979. The heavy chain of human histocompatibility antigen HLA-B7 contains an immunoglobulin-like region. Nature 282:266-270.
27. Pandey, A., and M. Mann. 2000. Proteomics to study genes and genomes. Nature 405:837-846.
28. Qin, Q. W., S. F. Chang, G. H. Ngoh-Lim, S. Gibson-Kueh, C. Shi, and T. J. Lam. 2003. Characterization of a novel ranavirus isolated from grouper Epinephelus tauvina. Dis. Aquat. Org. 53:1-9.
29. Qin, Q. W., T. J. Lam, Y. M. Sin, H. Shen, S. F. Chang, G. H. Ngoh, and C. L. Chen. 2001. Electron microscopic observations of a marine fish iridovirus isolated from brown-spotted grouper, Epinephelus tauvina. J. Virol. Methods 98:17-24.
30. Schnitzler, P., J. B. Soltau, M. Fischer, M. Reisner, J. Scholz, H. Delius, and G. Darai. 1987. Molecular cloning and physical mapping of the genome of insect iridescent virus type 6: further evidence for circular permutation of the viral genome. Virology 160:66-74.
31. Shevchenko, A., M. Wilm, O. Vorm, and M. Mann. 1996. Mass spectrometric sequencing of protein silver-stained gels. Anal. Chem. 68:850-858.
32. Swanson, M. S., E. A. Malone, and F. Winston. 1991. SPT5, an essential gene important for normal transcription in Saccharomyces cerevisiae, encodes an acidic nuclear protein with a carboxy-terminal repeat. Mol. Cell. Biol. 11:3009-3019.
33. Tidona, C. A., and G. Darai. 1997. The complete DNA sequence of lymphocystis disease virus. Virology 230:207-216.
34. Tidona, C. A., P. Schnitzler, R. Kehm, and G. Darai. 1998. Is the major capsid protein of iridoviruses a suitable target for the study of viral evolution? Virus Genes 16:59-66.
35. van Belkum, A., S. Scherer, L. van Alphen, and H. Verbrugh. 1998. Short-sequence DNA repeats in prokaryotic genomes. Microbiol. Mol. Biol. Rev. 62:275-293.
36. van Regenmortel, M. H., C. M. Fauquet, and D. H. L. Bishop. 2000. Virus taxonomy. Seventh report of the International Committee on Taxonomy of Viruses. Academic Press, New York, N.Y.
37. Ward, C. W. 1993. Progress towards a higher taxonomy of viruses. Res. Virol. 144:419-453.
38. Webby, R., and J. Kalmakoff. 1998. Sequence comparison of the major capsid protein gene from 18 diverse iridoviruses. Arch. Virol. 143:1949-1966.
39. Williams, A. F., and A. N. Barclay. 1988. The immunoglobulin superfamily - domains for cell surface recognition. Annu. Rev. Immunol. 6:381-405.
40. Williams, T. 1996. The iridoviruses. Adv. Virus Res. 46:345-412.
41. Winberg, M. L., J. N. Noordermeer, L. Tamagnone, P. M. Comoglio, M. K. Spriggs, M. Tessier-Lavigne, and C. S. Goodman. 1998. Plexin A is a neuronal semaphorin receptor that controls axon guidance. Cell 95:903-916.

Articles from Journal of Virology are provided here courtesy of American Society for Microbiology (ASM)