Synthesis of Gag and Gag-Pro-Pol Proteins

The full-length transcript from the integrated provirus serves as the mRNA for synthesis of Gag and Gag-Pro-Pol proteins. This RNA species is identical to the viral genome, copies of which are packaged into progeny virions. Translation takes place on free polyribosomes in the cytosolic spaces of the cell, but it is not known whether the sites used for the production of viral proteins are random or specific compartments. The marked clustering of immature M-PMV particles (ICAPs) in the cytoplasm does suggest that the latter might occur (Rhee and Hunter 1987, 1991).

Translation Initiation

Several lines of evidence suggest that the RNA molecules used for the synthesis of Gag and Gag-Pro-Pol are not the same molecules that are packaged into virions during assembly (see below Viral RNA Packaging, Identification of Genomic RNA for Packaging); i.e., Gag proteins do not appear to aggregate around and capture the RNA contained in the polyribosome from which they emerged, but rather bind to and ultimately encapsidate free transcripts elsewhere. Hence, there must be a regulatory mechanism that allows some of the viral RNAs to escape recognition and binding of ribosomes, but little is known about how this works. In the case of hepatitis B virus, binding of ribosomes to the RNA prevents packaging (Nassal et al. 1990).

Ribosome Binding

Because retroviral RNAs are capped at their 5′ ends, it has long been thought that the mechanism for initiation of translation is similar to that used for most cellular mRNAs; once bound, the ribosomes would scan toward the 3′ end until they encountered the first AUG in a favorable context: (A/G)CCAUGG (Kozak 1989, 1992). Although in some retroviruses (e.g., HIV) the first codon of gag is in fact the first AUG of the RNA, the 5′ leader sequences of many retroviruses have two features that seem to be inconsistent with this mechanism of translation initiation: open reading frames (ORFs) and extensive secondary structure.
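The scanning rule described above can be illustrated with a short sketch. It is a simplification under stated assumptions: only the purine at position -3 and the G at position +4 of the quoted (A/G)CCAUGG context are scored, the function names are hypothetical, and the leader sequence is an invented toy, not a retroviral sequence.

```python
import re

def classify_augs(rna):
    """Return (index, context) for every AUG, scanning 5' to 3'.

    Assumption: a context counts as 'favorable' when the base at -3 is a
    purine and the base at +4 is G; the CC at -2/-1 is not scored.
    """
    rna = rna.upper().replace("T", "U")
    hits = []
    for m in re.finditer(r"(?=AUG)", rna):  # lookahead finds overlapping sites
        i = m.start()
        minus3 = rna[i - 3] if i >= 3 else ""
        plus4 = rna[i + 3] if i + 3 < len(rna) else ""
        favorable = minus3 in ("A", "G") and plus4 == "G"
        hits.append((i, "favorable" if favorable else "weak"))
    return hits

def first_favorable_aug(rna):
    """Index of the first AUG in a favorable context, or None."""
    for pos, context in classify_augs(rna):
        if context == "favorable":
            return pos
    return None

# Hypothetical toy leader: an upstream AUG in a weak context precedes
# an AUG in a favorable context.
TOY_LEADER = "GGUUUAUGCCCUAAGGACCAUGGCU"
print(classify_augs(TOY_LEADER))
print(first_favorable_aug(TOY_LEADER))
```

In this toy case a scanning ribosome would be expected to pass the first AUG more readily than the second. The upstream AUGs in retroviral leaders are of interest precisely because many of them lie in favorable contexts.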

The initiation codon used to express gag is typically not the first AUG from the 5′ end of the viral RNA, and, in many cases, the upstream AUGs are in contexts favorable for translation. In ASLV, for example, there are three small ORFs situated ahead of the gag AUG. The lengths and positions of these ORFs are conserved, but the sequences of the encoded peptides are not (Bizub et al. 1984; Hackett et al. 1991). Nevertheless, the first AUG has been shown to direct the synthesis of a seven-amino-acid peptide in vitro (Hackett et al. 1986) and in vivo (Peterson et al. 1989). Thus, for ASLV gag to be translated following the usual method of ribosome binding, a mechanism of leaky scanning or efficient translational reinitiation would have to take place, as is the case for certain cellular genes where the ORFs have a regulatory role in gene expression (e.g., GCN4 in yeast; Hinnebusch 1984; Mueller and Hinnebusch 1986). However, detailed mutational analyses of the ASLV ORFs have shown that they are not especially important for regulating translation of gag (Donzé and Spahr 1992; Moustakas et al. 1993a,b), although their translation appears to be critical for viral replication. Thus, it is not obvious how ribosomes would bypass these ORFs.

The second reason to consider mechanisms of initiation other than typical ribosome scanning is the high degree of stable secondary structure present in the leader of retroviral RNAs (Tounekti et al. 1992; Baudin et al. 1993; Mougle et al. 1993). These complex structures are needed for many aspects of the replication cycle, including reverse transcription, RNA packaging, and dimerization. The problem is that stable secondary structures such as those found in the leaders of retroviral RNAs will inhibit the progression of scanning ribosomes when placed between the 5′ end of an RNA and its initiation codon (Kozak 1986).

Recently, evidence has been found for an internal ribosome entry site (IRES) within the RNA of MLV (Berlioz and Darlix 1995), suggesting that ribosomes may be able to avoid sequences upstream of gag. IRES sequences were first discovered in poliovirus (Sonenberg and Meerovitch 1990), and the site in MLV appears to have essentially the same properties. In particular, translation of the MLV gag gene has been shown to be both cap-independent and capable of directing synthesis in the context of bicistronic messenger RNAs. Further work is needed to determine whether IRESs are present in other retroviruses.

Unusual Initiation Sites and Glycosylated Gag Proteins

A non-AUG initiation codon (CUG) can also be used to initiate translation upstream of gag (Prats et al. 1989; for review, see Corbin et al. 1994). In the case of MLV, this leads to an amino-terminal extension of Gag that provides a hydrophobic signal for targeting Gag to the ER, where it is translocated across the membrane and glycosylated. The extra sequence is not removed but serves to anchor the Gag molecule in the membrane, with the majority of the protein extending into the lumen of the ER (Evans et al. 1977; Edwards and Fan 1979; Saris et al. 1983; Pillemer et al. 1986). These molecules are glycosylated in the ER and are most descriptively referred to as glyco-Gag proteins. They are expressed on the surface of the cell but not incorporated into progeny virions, the production of which is directed by the normal, unglycosylated form of Gag. Hence, glyco-Gag proteins can be considered as nonstructural retroviral proteins.

Synthesis of glyco-Gag can be blocked by mutating the CUG initiation codon. In cell culture, the mutants replicate as well as the wild-type virus, indicating that the modified Gag protein is not absolutely essential for replication (Fan et al. 1983; Schwartzberg et al. 1983). Other experiments have demonstrated the importance of the glyco-Gag protein for the spread of pathogenic murine viruses in infected animal systems (Corbin et al. 1994; Portis et al. 1994; see also Chapter 10). This is particularly interesting since HIV-1 has also been shown to express glycosylated forms of Gag (Pinter et al. 1992).

Modifications of Gag Proteins

Gag proteins are subjected to many modifications both during and after their synthesis. In some cases, these changes are critical for viral replication, whereas in others they are not. The most dramatic change, of course, is PR-mediated proteolytic processing, which occurs late in assembly to generate the mature cleavage products (see below Maturation of Viral Particles). Well before that, however, two other modifications occur.

Amino-terminal Modifications

The Gag proteins of most retroviruses (as well as their Gag-Pro-Pol proteins) are cotranslationally modified at their amino termini by the addition of myristate, which is therefore also present at the amino terminus of MA (Fig. 5). This rare 14-carbon fatty acid is always attached to a terminal glycine, which is encoded by the second codon of gag and exposed after removal of the initiator methionine (Henderson et al. 1983; Wilcox et al. 1987; Schultz et al. 1988). Myristate is required for the binding of Gag to the plasma membrane. Replacement of the critical glycine with another amino acid prevents myristylation, and such mutants are invariably defective for particle formation (see, e.g., Rein et al. 1986; Rhee and Hunter 1987; Bryant and Ratner 1990). It should be emphasized, however, that the mere presence of myristate at the amino terminus of a Gag protein is not sufficient for membrane binding (Rhee and Hunter 1990b; J.W. Wills et al. 1991); therefore, flanking amino acid residues must make an essential contribution (Resh 1994). Direct evidence for this has been obtained from experiments in which myristylated peptides were used to assess membrane interactions in vitro. These studies demonstrated that the 14 carbons of myristate are not sufficient for strong membrane association, in contrast to palmitylated peptides, which contain 16 carbons that are sufficient for strong membrane association (Peitzsch and McLaughlin 1993). Consequently, myristate is especially well suited for proteins that need to cycle on and off membranes in response to additional factors that affect their binding. Indeed, there are now many examples of myristylated proteins whose release from the membrane is induced by phosphorylation, including the MARCKS protein and pp60c-src (Taniguchi and Maneti 1993; Walker et al. 1993; Resh 1994; McLaughlin and Aderem 1995). 
Hence, the presence of myristate does not necessarily prevent MA proteins from leaving the membrane to play additional parts in replication when the virus infects a new cell (Chapter 5).
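The sequence requirement stated above (a glycine encoded by the second codon, exposed by removal of the initiator methionine) can be written as a one-line check. The sketch below is only a necessary-condition test under that assumption; it does not predict membrane binding, which the text notes also depends on flanking residues. The function name and both peptide sequences are hypothetical.

```python
def has_myristylation_glycine(protein):
    """True if the protein starts with Met followed by Gly.

    Assumption: removal of the initiator Met would expose Gly at the
    amino terminus, which is the attachment site described in the text.
    This is a necessary condition only, not a predictor of myristylation.
    """
    return len(protein) >= 2 and protein[0] == "M" and protein[1] == "G"

# Hypothetical peptides: a Gly-2 amino terminus and a Gly-to-Ala substitution
TOY_WILD_TYPE = "MGQTLSKA"
TOY_G2A_MUTANT = "MAQTLSKA"

print(has_myristylation_glycine(TOY_WILD_TYPE))
print(has_myristylation_glycine(TOY_G2A_MUTANT))
```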

The importance of elements other than myristate for membrane binding is best illustrated by the many retroviruses that replicate without it. These include bovine immunodeficiency virus (BIV; Tobin et al. 1994), equine infectious anemia virus (EIAV; Henderson et al. 1987), ASLV (Schwartz et al. 1983), and visna virus (Sonigo et al. 1985). Among these, the Gag protein of BIV is the only one for which no modification of any type has been found at the amino terminus (Tobin et al. 1994). The others are all blocked for amino-terminal sequencing, and therefore are modified in some way. In the case of ASLV, an acetyl group is added to the amino-terminal methionine (Palmiter et al. 1978), but this 2-carbon fatty acid is too short to provide a stable membrane interaction. Replacement of acetate, and the methionine to which it is attached, with myristate (i.e., by substituting glycine at the second position to create a site for myristylation) does not interfere with budding or infectivity (Erdie and Wills 1990). In contrast, the structural proteins of certain yeast viruses (e.g., L-A double-stranded virus) exhibit an absolute requirement of N-acetyltransferase for their intracytoplasmic assembly and infectivity (Tercero and Wickner 1992; Tercero et al. 1993). Thus, it remains possible that some retroviruses will be found that require acetylation of Gag.


The only other modification known to take place on Gag proteins is phosphorylation. The relevance of this modification has been difficult to ascertain in part because the Gag cleavage products that are phosphorylated differ widely from virus to virus. The best evidence for a role comes from HIV-1, where very low levels of tyrosine and serine phosphorylation on MA are associated with a loss of membrane binding for a small fraction of the molecules. These phosphorylated MA proteins are associated with the integration complex during viral entry (Gallay et al. 1995a,b, 1996; Bukrinskaya et al. 1996). Phosphorylation is proposed to reveal the nuclear localization signal contained within HIV-1 MA protein (Bukrinsky et al. 1993a,b) and to enable the preintegration complex to enter the nucleus of nondividing cells (von Schwedler et al. 1994; see also Chapter 5).

The MA protein of ASLV is also phosphorylated on tyrosine at low levels (<1%), although the significance of this modification has not been tested (T.D. Nelle and J.W. Wills, unpubl.). In ASLV, there is a single site of phosphorylation on serine (Pepinsky et al. 1986) at Ser-106. Substitution of this residue with alanine eliminates all readily detectable phosphorylation of Gag; however, the specific infectivity of the mutant is fully wild type, implying that serine modification is not important for the replication of ASLV (T.D. Nelle and J.W. Wills, unpubl.).

Synthesis of the Gag-Pro-Pol Fusion Protein

During the replication of retroviruses, large numbers of Gag molecules must be generated to serve as precursors to the structural proteins of the virions. However, the enzymes encoded by the pro and pol genes (PR, RT, and IN) are, in most cases, needed in smaller numbers to carry out their catalytic functions. Retroviruses have developed a mechanism that permits expression of the Gag protein at high levels relative to the protein sequences encoded in the pro and pol genes, while retaining coregulated expression. This linkage results from the use of the same initiation codon in the same mRNA to express the gag, pro, and pol genes. Translation of this RNA leads occasionally to synthesis of a fusion protein that is usually called the Gag-Pol precursor but is now more appropriately called the Gag-Pro-Pol precursor (Jamjoom et al. 1977; Oppermann et al. 1977; Hayman 1978). Typically, 10–20 Gag molecules are made for every molecule of Gag-Pro-Pol. This permits the same mechanism that targets the Gag precursor to the site of virion assembly also to direct the Gag-Pro-Pol precursor.

In all retroviruses, the gag gene is positioned at the 5′ end of the viral genome, upstream of the pro and pol genes. The Gag-Pro-Pol precursor is generated using a strategy in which the termination codon that defines the 3′ terminus of the gag reading frame is bypassed, allowing translation to continue into the adjacent pro and pol reading frames. Bypass of the termination codon occurs by one of two mechanisms. The first mechanism (used by the mammalian type-C retroviruses) is readthrough (termination) suppression, in which the gag termination codon is occasionally misread as a sense codon. Translation then continues past the termination codon and into the pro-pol reading frame. The second mechanism, used by most retroviruses, is ribosomal frameshifting. Here, occasional ribosomes slip backward one nucleotide (–1 frameshift, i.e., in the 5′ direction) during translation of gag; thus, the ribosome leaves the gag reading frame (with its downstream termination codon) and shifts into an overlapping portion of the pro-pol reading frame. Although the pro gene always lies between gag and pol, the exact arrangement of the reading frames varies in different retroviral genomes (Fig. 5). In HIV and other lentiviruses, pro lies in the pol reading frame. In ASLV, pro is translated as part of gag (Bennett et al. 1991), and, in some retroviruses (e.g., M-PMV, MMTV, and human T-cell leukemia virus, HTLV-1), pro lies in a separate reading frame distinct from both the gag and pol reading frames. In these latter cases, there are two –1 frameshifts, one to create a Gag-Pro fusion protein, and a second further downstream to generate the full-length Gag-Pro-Pol precursor protein (for reviews, see Jacks 1990; Hatfield et al. 1992; Levin et al. 1993).

Readthrough (Termination) Suppression

During MLV replication, one Gag-Pro-Pol precursor is synthesized for every 10–20 Gag polyproteins (Jamjoom et al. 1977). The potential for readthrough suppression in the synthesis of the Gag-Pro-Pol proteins was first demonstrated by in vitro translation experiments using viral RNA. Inclusion of a yeast amber suppressor transfer RNA increased the amount of Gag-Pro-Pol synthesized relative to Gag using Moloney MLV (Mo-MLV) RNA as template (Philipson et al. 1978). However, direct sequencing of the amino-terminal region of the Mo-MLV and feline leukemia virus (FeLV) proteases (Yoshinaka et al. 1985a,b) was required to show that termination suppression was the mechanism used during viral replication. The amino terminus of the mature viral protease, generated by protease-mediated processing of the Gag-Pro-Pol precursor, is encoded within the gag gene (Shinnick et al. 1981) four codons upstream of the gag termination codon. Glutamine is incorporated into PR at the position encoded by the gag termination codon (Yoshinaka et al. 1985a,b). Thus, the amber (UAG) termination codon is occasionally decoded by glutamine tRNA, presumably by misreading the first base of the termination codon so that it functions as a glutamine-encoding CAG codon (Fig. 6). An equivalent amber codon is found in the sequence of a variety of different mammalian type-C retroviruses. Although the amber codon is highly conserved, if either of the other two termination codons is introduced at this position in the Mo-MLV genome, the resulting viruses replicate reasonably well (Feng et al. 1989b; Jones et al. 1989). In vitro suppression of a UAA codon during translation of a mini-Mo-MLV gag-pro also leads to the insertion of glutamine, but suppression of a UGA codon results in the insertion of arginine, cysteine, or tryptophan (Feng et al. 1990). 
Sequences responsible for suppression in MLV reside within a 300-nucleotide segment spanning the suppression site (Panganiban 1988), indicating that the viral factors responsible for suppression are cis-acting and that uninfected cells have the potential to carry out translational suppression in the absence of any viral proteins (Feng et al. 1989a).
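The decoding event described above, in which the amber codon is occasionally read as glutamine so that translation continues, can be sketched as follows. This is an illustration only: the RNA is an invented toy sequence, the function name is hypothetical, and the `readthrough` flag simply forces suppression of the first UAG, whereas in infected cells only a minority of ribosomes read through.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(rna, readthrough=False):
    """Translate from index 0 until a stop codon.

    If readthrough is True, the first UAG is decoded as glutamine (Q),
    as if misread as CAG, and translation continues to the next stop.
    """
    protein = []
    suppressed = False
    for i in range(0, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        aa = CODON_TABLE[codon]
        if aa == "*":
            if readthrough and codon == "UAG" and not suppressed:
                protein.append("Q")
                suppressed = True
                continue
            break
        protein.append(aa)
    return "".join(protein)

# Hypothetical toy message: a short ORF ending in UAG, followed in frame
# by more codons and a UAA stop
TOY_RNA = "AUGGCUUAGGGAGGUCAGUAA"
print(translate(TOY_RNA))                    # terminated product
print(translate(TOY_RNA, readthrough=True))  # readthrough product
```

The two outputs correspond, by analogy, to the Gag product and to the extended fusion product that share the same initiation codon.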

Figure 6. MLV readthrough suppression. This figure shows the predicted structure of the RNA around the gag-pol junction, including the proposed pseudoknot downstream from the amber termination codon (shown in boldface; Levin et al. 1993). S1 and S2 are stem structures, ...

The cis-acting region that regulates suppression has been further defined by deletion and mutation analysis. In the case of MLV, the controlling determinants consist of two regions immediately downstream from the termination codon. The first region is a purine-rich sequence immediately 3′ of the amber codon, whereas the second region starts at nucleotide 8 downstream from the amber codon and extends to nucleotide 57 downstream (Honigman et al. 1991; N.M. Wills et al. 1991; Feng et al. 1992). The region from bases 8 to 57 is believed to form a pseudoknot (Fig. 6), a complex RNA structure in which bases in the loop of a proximal stem-loop pair with bases in an adjacent downstream region to create a second stem (Pleij et al. 1985). A C-rich sequence in the first loop of MLV RNA could pair with a G-rich region just downstream from the stem-loop; these homopolymer tracts are well conserved in mammalian type-C viruses (ten Dam et al. 1990). Many positions within the two stems and loops of the proposed pseudoknot are sensitive to mutation (Honigman et al. 1991; N.M. Wills et al. 1991, 1994; Felsenstein and Goff 1992). However, mutational analysis of the putative stem 2 region has not fully resolved its contribution to readthrough suppression (Honigman et al. 1991; Felsenstein and Goff 1992). The presumed effect of these sequences is to slow translation and to allow enhanced competition of the suppressor tRNA with the translation release factor/termination mechanism at the position of the amber termination codon. Analogous structures have a critical role in regulating ribosomal frameshifting (see below).

Frameshift Suppression

Two observations with ASLV suggested the need for an alternative mechanism to readthrough suppression for joining the gag, pro, and pol reading frames: (1) Contrary to what was observed with MLV, suppression of the ASLV gag termination codon does not lead to the synthesis of a Gag-Pro-Pol polyprotein (Weiss et al. 1978) and (2) the sequence of ASLV shows that gag and pro are in the same reading frame, whereas pol is in a different reading frame that slightly overlaps pro (Schwartz et al. 1983).

Like MLV, ASLV synthesizes about 5% as much Gag-Pro-Pol fusion protein as Gag-Pro (or Gag) protein (Oppermann et al. 1977; Hayman 1978). Two alternative mechanisms were originally considered for synthesis of Gag-Pro-Pol fusion proteins: RNA splicing and ribosomal frameshifting. The latter explanation was strongly favored by in vitro translation experiments in which a truncated viral RNA, generated in vitro by transcription of cloned viral DNA, was able to direct the synthesis of both Gag-Pro and Gag-Pro-Pol-like molecules (Jacks and Varmus 1985). A similar strategy was used to demonstrate two frameshifting events in the synthesis of the MMTV Gag-Pro-Pol fusion protein (Jacks et al. 1987; Moore et al. 1987).

Direct sequencing of a partially processed protein product showed both the position and direction (–1) of the first (i.e., gag-pro) MMTV frameshift (Hizi et al. 1987). The efficiency of frameshifting in vitro with MMTV RNA is higher at the first frameshift site (∼25%) than that seen with ASLV, showing that cis-acting viral sequences control the efficiency of frameshifting and that equivalent amounts of the full-length Gag-Pro-Pol precursor can be synthesized even though two frameshifting events are required (Dickson and Atterwill 1979; Jacks et al. 1987; Moore et al. 1987). Analogous frameshifting in the synthesis of Gag-Pro or Gag-Pro-Pol polyproteins has been demonstrated for HIV-1 (Jacks et al. 1988b; Wilson et al. 1988), BLV (Yoshinaka et al. 1986), HTLV-1 (Nam et al. 1988, 1993), HTLV-2 (Mador et al. 1989; Falk et al. 1993), FIV (Morikawa and Bishop 1992), and simian retrovirus-1 (SRV-1) (ten Dam et al. 1994). In all retroviruses that rely on frameshifting, the overlap between the gag and pol gene, or between the gag and pro genes and the pro and pol genes, is in the –1 direction, implying that the ribosome must back up by one base to continue translation in the alternate frame. In some retroelements, such as Ty1 of yeast, frameshifting must occur in the +1 direction, implying a rather different mechanism (Chapter 8).
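The arithmetic behind two consecutive frameshifts can be made explicit with a small sketch. It rests on assumptions not stated in the text: the two shifts are treated as independent events, efficiencies are expressed as the fraction of ribosomes that shift, and the 5% target for full-length precursor is borrowed from the ASLV figure purely for illustration. The function names are hypothetical.

```python
def product_fractions(f1, f2):
    """Fractions of ribosomes yielding Gag, Gag-Pro, and Gag-Pro-Pol.

    f1: efficiency of the gag-pro frameshift
    f2: efficiency of the pro-pol frameshift (assumed independent of f1)
    """
    gag = 1 - f1
    gag_pro = f1 * (1 - f2)
    gag_pro_pol = f1 * f2
    return gag, gag_pro, gag_pro_pol

def required_second_shift(target_gag_pro_pol, f1):
    """Second-site efficiency needed to reach a target Gag-Pro-Pol fraction."""
    return target_gag_pro_pol / f1

# Illustrative numbers: ~25% at the first site (from the text) and an
# assumed overall target of ~5% full-length precursor
f2_needed = required_second_shift(0.05, 0.25)
print(f2_needed)
print(product_fractions(0.25, f2_needed))
```

Under these assumptions, a first-site efficiency of about 25% would need to be paired with a second-site efficiency of about 20% to give roughly one full-length precursor per 20 ribosomes.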

Mutational analysis of the frameshift region of ASLV and HIV-1 (Jacks et al. 1988a,b; Wilson et al. 1988) and direct sequence analysis of transframe protein products of MMTV, ASLV, and HIV-1 (Hizi et al. 1987; Jacks et al. 1988a,b) provided the initial evidence demonstrating that –1 slippage occurs after a specific codon is read. After translocation, the nucleotide in the third position of the last codon read in the original (0) reading frame becomes the first-position nucleotide of the first codon in the new (–1) reading frame (Fig. 7). Mutational analysis has demonstrated the importance of a seven-nucleotide region (known as a slippery or shifty sequence) representing the final two codons in the original reading frame and the preceding nucleotide (Jacks et al. 1988a,b).

Figure 7. Frameshift suppression in the synthesis of Gag-Pro-Pol. Shown are the nucleotide sequences at the frameshift site and the amino acids encoded in the Gag-Pro-Pol precursors of the indicated viruses. The upper amino acid sequence is read from either the ...

The model suggests that –1 slippage occurs as follows: (1) tRNA binding/decoding at the downstream A (acceptor) site on the ribosome in the 0 reading frame with continued occupancy of the upstream P (polymerization) site by the tRNA carrying the nascent polypeptide chain; (2) –1 slippage of the tRNAs in both the A and P sites so that they are now bound to codons in the –1 reading frame; (3) peptidyl transfer of the nascent polypeptide chain to the tRNA in the A site; and (4) translocation of the tRNA from the A to the P site and continued translation in the –1 reading frame (Jacks et al. 1988a; Jacks 1990). One aspect of this model is that tRNA anticodon base pairing to the mRNA after the –1 slippage could stabilize the shifted translation apparatus; the potential for forming such base pairs is a feature of the heptanucleotide sequences found at the frameshift sites (Fig. 7). The heptanucleotide sequences at the frameshift sites have the general feature 5′-X XXA AAC-3′ or X XXU UUU/A (where X is a specific base, the underlined base is read in both the original and –1 frames, and the spacing designates codons in the original reading frame). In each motif, base pairing of two nucleotides in the anticodon of each of the tRNAs decoding the last two 0 reading frame codons is maintained after the –1 shift. Thus, this model accounts for the seven-nucleotide region needed at the frameshifting site, i.e., the last two codons are read in the original (0) reading frame, and the –1 slippage of the two tRNAs results in base pairing with the adjacent (5′) nucleotide. Frameshifting must, by definition, occur at a site where two reading frames overlap. The amount of overlap can vary between just a few nucleotides to several hundred nucleotides.
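The bookkeeping of the –1 shift described above can be sketched in a few lines. The code below is an illustration under stated assumptions: it looks for a heptanucleotide of the form X XXY YYZ whose last two codons lie in the 0 frame, reads those codons normally, and then re-reads the final nucleotide Z as the first base of the next codon. It models only the shifted product (in reality a minority of ribosomes shift), it ignores the downstream secondary structure, and the RNA is an invented toy sequence. Function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate_from(rna, start):
    """Translate in the frame beginning at `start` until a stop codon."""
    out = []
    for j in range(start, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[j:j + 3]]
        if aa == "*":
            break
        out.append(aa)
    return "".join(out)

def find_slippery(rna):
    """Index of the first X XXY YYZ heptamer with XXY and YYZ in the 0 frame."""
    for codon_start in range(0, len(rna) - 5, 3):
        i = codon_start - 1  # the heptamer begins one base before codon XXY
        if i < 0:
            continue
        h = rna[i:i + 7]
        if len(h) == 7 and h[0] == h[1] == h[2] and h[3] == h[4] == h[5]:
            return i
    return None

def translate_with_minus1_shift(rna):
    """Product made by a ribosome that shifts -1 at the first slippery site."""
    i = find_slippery(rna)
    if i is None:
        return translate_from(rna, 0)
    shift_point = i + 7  # first base after codon YYZ in the 0 frame
    upstream = translate_from(rna[:shift_point], 0)
    if len(upstream) * 3 != shift_point:
        return upstream  # a stop codon was reached before the slippery site
    # After the shift, the last base of YYZ is read again as position 1
    return upstream + translate_from(rna, shift_point - 1)

# Hypothetical toy message: AUG GCA AAU UUA UAG ... in the 0 frame
TOY_RNA = "AUGGCAAAUUUAUAGGGCCUGA"
print(translate_from(TOY_RNA, 0))            # 0-frame product
print(translate_with_minus1_shift(TOY_RNA))  # transframe product
```

In the toy message the 0 frame terminates at UAG immediately after the heptamer A AAU UUA, whereas the shifted ribosome continues in the overlapping frame, which mirrors the relationship between Gag and the transframe fusion product.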

The role of downstream secondary structure in promoting synthesis of the fusion protein was first shown in the context of frameshifting (Fig. 8). Deletion mutagenesis and incorporation of mutations that disrupt putative base pairing (along with compensatory mutations) have been used to demonstrate a role in frameshifting for a downstream hairpin for ASLV (Jacks et al. 1988a) and for HTLV-2 (Falk et al. 1993; Kollmus et al. 1994). The heptanucleotide slippery sequence of HIV-1 can mediate a basal level of frameshifting under some conditions (Wilson et al. 1988; Reil et al. 1993); however, a downstream hairpin is required for efficient frameshifting in vivo (Parkin et al. 1992; Cassan et al. 1994; Kollmus et al. 1994). The more complex structure of a pseudoknot appears to have an equivalent role in frameshifting for MMTV (Chamorro et al. 1992), for FIV (Morikawa and Bishop 1992), and for SRV-1 (ten Dam et al. 1994). These structures lie within ten nucleotides downstream from the frameshift site and, although their precise (true) function is unknown, it was initially suggested that they potentiate frameshifting by causing ribosomes to pause over the heptanucleotide sequence (Jacks et al. 1988a). The real mechanism is likely to be more complex, since mutational analysis of the MMTV pseudoknot indicates that special features of the pseudoknot structure, rather than just the presence of secondary structure, have an important role in the efficiency of frameshifting (Chen et al. 1995). In this regard, the presence of an unpaired A has been proposed to reduce the coaxial stacking of the two helices of the pseudoknot as a contributing feature of the frameshifting signal (Fig. 8C) (Chen et al. 1995; Shen and Tinoco 1995). The presence of secondary structures immediately downstream from the frameshift site can be predicted for a wide range of retroviruses (Le et al. 1989; Jacks 1990; ten Dam et al. 1990).

Figure 8. Secondary structure downstream from frameshifting sites. (A) Hairpins downstream from the ASLV, HIV-1, and HTLV-2 frameshift sites. (Modified from Jacks 1990.) (B) Pseudoknots downstream from the MMTV, FIV, and SRV-1 frameshift sites. Elements of secondary ...

Frameshifting in the –1 direction plays a part in the expression of genes from other viruses in addition to retroviruses. A role for a pseudoknot in –1 frameshifting was first described for the coronavirus avian infectious bronchitis virus (Brierley et al. 1989), and –1 frameshifting is known to occur for at least one cellular gene, the dnaX gene of Escherichia coli (Flower and McHenry 1990; Tsuchihashi and Kornberg 1990). The Ty elements of yeast also use frameshifting to regulate gene expression, but, curiously, in these retrotransposons, the frameshift occurs in the +1 direction (see Chapter 8).

Recent evidence has revealed yet another mechanism for expression of the pro and pol genes of the spumavirus, human foamy virus (HFV). In contrast to the situation in all other retroviruses, a Gag-Pro-Pol precursor cannot be detected in HFV-infected cells. Additionally, a mutation in the PR-coding domain results in the expression of a 120-kD protein apparently corresponding to a Pro-Pol precursor, since it reacts with an RT antiserum but not with a Gag antiserum (Konvalinka et al. 1995a). This observation implies that the HFV pro and pol domains are expressed independently of gag. Support for this conclusion is provided by identification of a spliced RNA that functions as a pro-pol mRNA (Yu et al. 1996a). Because the Pro-Pol precursor is not fused to Gag in this case, additional protein targeting signals, or specific recognition of genome RNA, must be required to direct it to the site of particle assembly.