NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Coffin JM, Hughes SH, Varmus HE, editors. Retroviruses. Cold Spring Harbor (NY): Cold Spring Harbor Laboratory Press; 1997.


The Integration Reaction

Integrase Recognizes Specific Features at the Ends of the Viral DNA Molecule and Forms Stable Complexes with Viral DNA

An overview of the molecular events in retroviral DNA integration is shown in Figure 8. The colinearity of the unintegrated and integrated forms of proviral DNA implicated sequences at the ends of the viral DNA molecule as the sites at which the integration machinery would act. The presence of similar or identical sequences, extending 5–15 base pairs from the two ends of the viral DNA, suggested that these sequences might be recognized by the protein that mediated integration. The earliest biochemical studies of the ASLV integrase, before its role in integration was established, identified an endonuclease activity (Grandgenett et al. 1978). That activity appeared disturbingly indifferent to the circle-junction sequences that were initially used as candidate substrates, providing scant biochemical support for its role in integration (Duyk et al. 1983, 1985; Grandgenett and Vora 1985; Grandgenett et al. 1986; Cobrinik et al. 1987). Nevertheless, genetic experiments provided clear evidence that integrase, and the inverted repeat sequences at the viral DNA ends, played a part in integration (see above). These experiments also supported the view that only the sequences in the immediate vicinity of the viral DNA ends were specifically required for integration (Colicelli and Goff 1985, 1988; Cobrinik et al. 1991).

Figure 8. Schematic outline of the principal steps in retroviral DNA integration.

The in vivo genetic results were corroborated and extended by biochemical data, once the ends of the linear DNA, rather than the circle-junction, were recognized as the correct DNA substrates for integration, and faithful biochemical assays for integration were developed.

The basic conclusions from numerous experiments addressing the essential features of the viral DNA substrate for integration are listed below:


The attachment sites need to be located at the ends of a DNA molecule. The sequences that determine the viral DNA sites for attachment to host DNA are found both at the termini and at the internal edges of the long terminal repeats, yet only the terminal sites are used as attachment sites for integration. Extending the natural ends of the viral DNA by three or more base pairs severely impairs their use as substrates for integration, either in vivo or in vitro (Colicelli and Goff 1988; Craigie et al. 1990; Katz et al. 1990; Vink et al. 1991a; Leavitt et al. 1992). How integrase distinguishes the terminal location of the attachment sites remains to be established. The ability of ends to be distorted—in particular, the lower cost in free energy of melting terminal as compared to internal base pairs—may be the decisive factor in distinguishing ends from internal sites. In vitro experiments suggest that artificial disruption of normal complementarity between the three terminal base pairs makes a model viral DNA substrate more active for integration (Scottoline et al. 1997).


The single most important sequence feature in specifying the viral attachment site is a CA/TG dinucleotide pair, invariably found precisely at the site of joining to host DNA (see Fig. 1). This sequence is virtually always positioned exactly two base pairs from the end of the linear precursor. When its position relative to the DNA end is altered slightly by mutation or in a synthetic substrate, the site of 3′-end processing and DNA joining moves correspondingly, so that it always corresponds to the 3′-OH of the conserved A (Colicelli and Goff 1988; Roth et al. 1989; Vora et al. 1990; Bushman and Craigie 1991; Vink et al. 1991a; Leavitt et al. 1992; Murphy et al. 1993; Chow and Brown 1994b). The same dinucleotide pair is found, at a frequency much greater than expected by chance, at the corresponding position in transposable elements of diverse provenance (Howe and Berg 1989; Fayet et al. 1990; Polard and Chandler 1995). The basis for this sequence conservation is enigmatic. In vivo, most mutations that alter either of these two base pairs result in severe defects in replication, although at least one such mutant, in which the CATT sequence at the U5 end of MLV is altered to TATA, replicates virtually normally (Roth et al. 1989). Likewise, sequence alterations or chemical modifications affecting these two base pairs substantially impair end processing, integration, and disintegration in vitro, but they do not completely abolish activity (Craigie et al. 1990; Sherman and Fyfe 1990; Bushman and Craigie 1991; LaFemina et al. 1991; Vink et al. 1991a; Chow et al. 1992; Leavitt et al. 1992; Sherman et al. 1992; Chow and Brown 1994b; van den Ent et al. 1994). Because activity is reduced but not abolished, these conserved bases appear to be crucial for recognition of viral DNA ends by integrase, but they are not likely to participate directly in catalysis. Their precise role in the reaction, and the reason for their striking phylogenetic conservation, remain to be discovered.
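The positional rule described above (processing always occurs at the 3′-OH of the conserved A, wherever the CA lies near the end) can be illustrated as a toy string operation. This is a minimal sketch, not a model of the enzyme: it assumes the processed strand is written 5′→3′, that the cut falls immediately after the CA closest to the 3′ end, and that only the last 6 nucleotides are searched (an arbitrary, hypothetical cutoff). The function name and the example sequences are hypothetical.

```python
def process_3prime_end(strand: str, window: int = 6) -> tuple[str, str]:
    """Toy model of retroviral 3'-end processing (hypothetical helper).

    `strand` is the viral DNA strand whose 3' end carries the conserved CA,
    written 5'->3'. The cut is placed at the 3'-OH of the A in the CA
    dinucleotide closest to the 3' end, searching only the last `window`
    nucleotides. Returns (processed strand, released 3'-terminal nucleotides).
    """
    s = strand.upper()
    start = max(0, len(s) - window)
    idx = s.rfind("CA", start)  # right-most CA lying entirely within the window
    if idx == -1:
        raise ValueError("no CA dinucleotide near the 3' end")
    cut = idx + 2  # position just after the conserved A
    return s[:cut], s[cut:]


# Usual case: CA sits two nucleotides from the end, so a dinucleotide is released.
print(process_3prime_end("GTGTGGAAAATCTCTAGCAGT"))   # ('GTGTGGAAAATCTCTAGCA', 'GT')
# Moving CA one position inward moves the cut with it: three nucleotides are released.
print(process_3prime_end("GTGTGGAAAATCTCTAGCAGTA"))  # ('GTGTGGAAAATCTCTAGCA', 'GTA')
```

The second call mirrors the observation that the processing site tracks the CA rather than a fixed distance from the end, as with the three nucleotides removed from the HIV-2 U5 end.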


Sequences internal to CA, extending up to 15 base pairs from the termini, have significant but less important roles (Colicelli and Goff 1985, 1988; Katzman et al. 1989; Bushman and Craigie 1990, 1991; Craigie et al. 1990; Roth et al. 1990; Vink et al. 1990; LaFemina et al. 1991; Leavitt et al. 1992; Sherman et al. 1992; Murphy et al. 1993; van den Ent et al. 1994; Vicenzi et al. 1994). Model viral DNA oligonucleotides shorter than 15 base pairs are used with reduced efficiency by RSV integrase (Katzman et al. 1989), and 32-base-pair substrates are used more efficiently than 21-base-pair substrates by HIV-1 integrase (Bushman and Craigie 1991), suggesting that binding interactions extend inward at least 14–21 base pairs from the viral DNA end. No simple rules or consensus sequences have emerged to account for the variations in activity that result from differences at these internal sites. There is no evidence that sequences farther than 15 base pairs from the ends have any role in the substrate specificity of MLV, HIV, or RSV integrases, and most of the specificity appears to reside in the terminal eight base pairs (Katzman et al. 1989; Bushman and Craigie 1991; LaFemina et al. 1991; Leavitt et al. 1992; Sherman et al. 1992; Murphy et al. 1993; van den Ent et al. 1994). The U5 and U3 ends of many retroviruses differ in their subterminal sequences, and in some cases these differences lead to consistent differences in the in vitro substrate activity of model oligonucleotides corresponding to the U5 or U3 end (Bushman and Craigie 1991; LaFemina et al. 1991; Leavitt et al. 1992; Sherman et al. 1992; Vora et al. 1994). There is no evidence, however, to suggest that the differences between the U3 and U5 termini play a significant part in integration.
It is likely that the differences reflect the superimposition of distinct requirements imposed on the U3 or U5 terminal sequences relating to their roles in viral assembly, DNA synthesis, or other steps in the life cycle in which the LTRs, or coding regions that overlap them, play a key part (Cobrinik et al. 1991; Vicenzi et al. 1994).


The two base pairs distal to the conserved CA/TG, which are removed in the integration process, appear at first to be a gratuitous and inconvenient feature of the viral DNA ends. In vivo, they are ephemeral, existing only during the interval between completion of DNA synthesis and integration (see Fig. 3). Their existence creates a requirement for an additional step in the integration process, the 3′-end processing step, which would otherwise be unnecessary. Yet this two (or, rarely, three)-base-pair extension is a universal characteristic of retroviral genomes. The identity of the terminal two base pairs does not appear to be critical; they vary among retroviral species, and substitutions generally have little effect on replication (Colicelli and Goff 1985, 1988; Roth et al. 1989, 1990). However, deletion of these two base pairs is lethal (Colicelli and Goff 1985). In vitro, the two bases at each 5′ end, which remain after 3′-end processing, are critical for stable binding between integrase and the processed viral DNA ends (Ellison and Brown 1994). Since the interval between DNA synthesis and integration can be many hours, this role may explain why these two base pairs are essential. However, other possible functions, including a role during viral DNA synthesis, or in the final 5′-end joining step of the integration process (Chow et al. 1992; Kulkosky et al. 1995; Roe et al. 1997), have not been excluded. The U5 end of HIV-2 is exceptional in that three rather than the usual two base pairs are removed between reverse transcription and integration (Whitcomb and Hughes 1991).


Functional interactions of integrase with the two viral DNA ends are coordinated. In vivo, processing of either end may depend on the simultaneous presence of specific sequences at both ends. Mutations that altered conserved bases at the right end of the MLV DNA molecule have been shown to impair 3′-end processing by integrase, not only at the altered end, but also at the unaltered left end of the viral DNA molecule (Murphy and Goff 1992). The basis for this long-range effect remains to be determined. It has yet to be reproduced in vitro or with other retroviruses. Two leading possibilities are (1) that it reflects an important role of simultaneous interactions with both viral DNA ends in the formation or stability of the preintegration complex, analogous to the requirement for both ends of bacteriophage Mu in forming a stable, active complex with the Mu transposase (Baker and Mizuuchi 1992; Aldaz et al. 1996), or (2) that it reflects an allosteric effect of binding one viral DNA end on the activity of the active site that cleaves the other end.

The specificity of integrase for the sequence and structure of viral DNA ends in vivo is generally matched by the specificity observed in assays of enzyme activity in vitro. However, the specificity of DNA binding, whether measured directly or by competition in an activity assay, appears inadequate to account for the specificity manifested in vivo. The apparent dissociation constant (Kd) for binding to nonspecific DNA sequences is typically no more than approximately tenfold higher than that for binding to model viral DNA ends (van Gent et al. 1991; Schauer and Billich 1992; Hazuda et al. 1994b; Dotan et al. 1995). Since the approximately 10-kb viral genome provides a vast excess of such nonspecific sequences, which could presumably compete for integrase binding, the efficient localization of integrase to the proper sites at the ends of the viral DNA cannot be accounted for on the basis of the weak discrimination observed with model substrates in vitro. What allows integrase in vivo to avoid nonproductive binding to internal sequences, or directs it to the ends? Possible explanations for this discrepancy include the following: (1) Standard in vitro conditions may be suboptimal for detecting the high inherent binding specificity of integrase. (2) In vivo, much of the viral genome could be protected against integrase binding, perhaps by another DNA-binding protein such as the viral NC protein, by a cellular protein (Lee and Craigie 1994; Farnet and Bushman 1997), or by steric restrictions imposed by the ordered structure of the preintegration complex (Bowerman et al. 1989; Farnet and Haseltine 1991b; Bukrinsky et al. 1993b; Karageorgos et al. 1993; Lee and Craigie 1994). This possibility would also be consistent with observations suggesting that the viral genome is protected in vivo against being used as a target DNA for integration (Brightman et al. 1990; Lee and Coffin 1990; Farnet and Haseltine 1991a; Roe et al. 1993; Lee and Craigie 1994).
(3) Cooperation between integrase and reverse transcriptase could somehow deliver integrase directly to the viral DNA ends at the completion of DNA synthesis.
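Why a tenfold preference is inadequate can be seen with a back-of-the-envelope calculation. The sketch below rests on simplifying assumptions that are not from the text and should be read as such: every base pair of a ~10-kb genome is treated as an independent, equally accessible nonspecific site; integrase is dilute, so the occupancy of each site is proportional to 1/Kd; and the two viral DNA ends are the only specific sites. The function name and default values are hypothetical.

```python
def fraction_at_ends(n_ends: int = 2,
                     n_internal: int = 10_000,
                     kd_specific: float = 1.0,
                     kd_nonspecific: float = 10.0) -> float:
    """Fraction of bound integrase expected at the viral DNA ends (toy model).

    Assumes dilute protein (occupancy of each site proportional to 1/Kd),
    independent sites, and equal access to every site. Kd values are in
    arbitrary but consistent units; only their ratio matters.
    """
    specific = n_ends / kd_specific            # weight of the two end sites
    nonspecific = n_internal / kd_nonspecific  # weight of all internal sites
    return specific / (specific + nonspecific)


print(f"{fraction_at_ends():.4f}")                 # 0.0020, i.e. about 0.2% at the ends
print(fraction_at_ends(kd_nonspecific=5_000.0))    # 0.5
```

Under these assumptions, even a 5000-fold preference would put only half of the bound integrase on the ends, which is why explanations such as protection of internal sequences or direct delivery of integrase to the ends are worth considering.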

Upon entry of the virion core into the newly infected cell, the viral DNA substrate for integrase does not yet exist. Yet, in view of its low abundance in a typical infection, and its high nonspecific DNA-binding activity, integrase presumably needs to maintain its association with the replication intermediate before and during viral DNA synthesis. As noted above, there is circumstantial evidence for an interaction between integrase and reverse transcriptase in ALV and in the MLV virion. Such an interaction could provide for the stable association between integrase and the viral replication intermediate during DNA synthesis. At present, however, there is no clear experimental evidence for any stable, specific interaction between integrase and any other protein component of the virion core. The relatively nonspecific nucleic-acid-binding activity of integrase, although it exhibits a preference for DNA over RNA under the usual experimental conditions, provides another possible mechanism for maintaining this association (Grandgenett et al. 1978; Allen et al. 1995). Destabilization of the association of integrase with early replication intermediates might account for the in vivo replication defects of some integrase mutants that appear to have perfectly normal activity in standard in vitro assays (Cannon et al. 1994; Shin et al. 1994; Taddeo et al. 1994; Wiskerchen and Muesing 1995).

The timing of integrase binding to the viral DNA ends has yet to be established. The 3′ ends are the last parts of the viral DNA molecule to be synthesized, so final assembly of integrase onto these sites must follow completion of DNA synthesis. The time course of 3′-end processing in vivo suggests that it generally occurs within approximately 1 hour after the completion of viral DNA synthesis, but it may be much more rapid (Roth et al. 1989; Roe et al. 1997). Thus, integrase probably binds its substrate very soon after the substrate is synthesized.

Once integrase binds to the viral DNA ends, it needs to maintain its association with the viral DNA for a period that may extend for many hours. Although other viral or cellular proteins in the preintegration complex may contribute to this stable association, in vitro experiments suggest that purified integrase alone can form very stable complexes with the ends of viral DNA molecules (Ellison and Brown 1994; van den Ent et al. 1994).

Juxtaposition of two viral DNA ends is essential to ensure their coordinated integration. The ability of sequences at one viral DNA end to influence 3′-end processing at the other implies that this juxtaposition precedes 3′-end processing, i.e., that it is a very early event in the integration process (Murphy and Goff 1992). Purified integrase can mediate the concerted integration (or disintegration) of two viral DNA ends in vitro (Bushman et al. 1990; Craigie et al. 1990; Katz et al. 1990; Chow and Brown 1994a; Mazumder et al. 1994; Vora et al. 1994). Thus, integrase alone is sufficient for proper juxtaposition of the two ends of the viral DNA molecule. The possibility that other components of the preintegration complex can stabilize the juxtaposed ends is not ruled out, however. Indeed, the efficiency with which viral DNA ends are juxtaposed by purified integrase in vitro is far lower than that of the in vivo process (Craigie et al. 1990; Bushman and Craigie 1991; Vora et al. 1994).

The 3′-end processing and DNA-joining steps in the integration process reflect a common phosphate transesterification activity (Fig. 9). The DNA-joining step of integration, which involves the formation of new phosphodiester bonds joining the viral and host DNAs, proceeds without an extrinsic source of chemical energy (Brown et al. 1987). This suggested that the energy from the target DNA bonds that need to be broken in this step is used to form the new bonds that join the viral and target DNAs. Such a concerted DNA cleavage-ligation reaction could, in principle, proceed either via a protein-DNA covalent intermediate, as occurs in bacteriophage λ integration, or by a direct attack of the viral 3′-OH on a target DNA phosphodiester bond, as occurs in bacteriophage Mu integration (Mizuuchi and Adzuma 1991). Examination of the stereochemical course of the joining reaction showed that the target DNA phosphate group that transfers from a target 3′-OH to the viral 3′-OH undergoes inversion in the process (Engelman et al. 1991) (Fig. 9C). This is the expected result for a direct transesterification mechanism, which involves a single nucleophilic displacement at the phosphorus. A reaction involving a covalent protein-DNA intermediate, by contrast, requires two successive displacements, each of which inverts the configuration, and would therefore be expected to proceed with net retention of the stereochemical configuration of the phosphate.

Figure 9. Chemistry of the reactions catalyzed by integrase. (A) 3′-end cleavage. This reaction is brought about by a direct nucleophilic attack on an internucleotide phosphate by an OH group, usually in a water molecule, but alcohols, glycerol, and even …

Fortuitously, in the presence of Mn++, a major product of the viral 3′-end processing reaction is a cyclic dinucleotide in which the 3′-OH of the viral DNA is joined to the 5′-P of the penultimate nucleotide, allowing the stereochemistry of this step to be determined as well (Engelman et al. 1991). The stereochemical configuration of the phosphate group was found to have undergone inversion during the cyclization reaction, implying that the 3′-end processing step also involves a direct nucleophilic attack, in this case on the penultimate phosphodiester bond of the viral 3′ end.
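The stereochemical argument used for both the joining and the processing steps reduces to counting in-line displacements at the phosphorus: each one inverts the configuration, so an odd number of transfers gives net inversion and an even number gives net retention. The toy function below encodes only that parity rule; its name and the step counts passed to it (one for direct transesterification, two for a route through a covalent protein-DNA intermediate) are illustrative assumptions, not data.

```python
from enum import Enum


class Outcome(Enum):
    INVERSION = "inversion"
    RETENTION = "retention"


def net_configuration(n_displacements: int) -> Outcome:
    """Net stereochemical outcome at a stereochemically labeled phosphorus.

    Assumes every step is an in-line nucleophilic displacement, each of which
    inverts the configuration; an even number of inversions cancels out.
    """
    if n_displacements < 1:
        raise ValueError("need at least one displacement")
    return Outcome.INVERSION if n_displacements % 2 else Outcome.RETENTION


# Direct transesterification: a single attack on the phosphate.
print(net_configuration(1))  # Outcome.INVERSION
# Covalent protein-DNA intermediate: protein attacks first, then the DNA 3'-OH.
print(net_configuration(2))  # Outcome.RETENTION
```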

Locally, the transition state in all three of the reactions catalyzed by integrase is probably quite similar, but the relationship of the phosphate undergoing substitution to the appended DNA molecules varies among the reactions. This phosphate group can belong to the viral DNA (in the 3′-end processing reaction) (Fig. 9A), the target DNA (in the DNA joining step; Fig. 9B), or the junction between the two (in the disintegration reaction). In two of the three reactions, end processing and disintegration, the 3′-OH of the conserved subterminal A is the leaving group; however, in the DNA joining step, this same hydroxyl is the attacking nucleophile. The variations in the arrangement of DNA surrounding the phosphate at the center of the reactions suggest a requirement for flexibility in the viral DNA end, and in the target DNA site, to allow accommodation of variable arrangements of DNA in the active site. The importance of conformational flexibility in the viral DNA substrate is supported by the observation that noncomplementarity of the terminal base pairs increases the rate of end processing (Scottoline et al. 1997) and that the 3′-OH at the viral DNA end can serve as the attacking nucleophile in the 3′-end processing reaction (Engelman et al. 1991).
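The relationships described in the preceding paragraph can be summarized in a small lookup structure. This is only a restatement of the text in Python form, under one stated assumption: the nucleophile listed for disintegration (a target DNA 3′-OH) follows from treating disintegration as the chemical reverse of the joining step. The names `Reaction`, `REACTIONS`, and `role_of_conserved_a` are hypothetical.

```python
from dataclasses import dataclass

A_OH = "3'-OH of the conserved A"


@dataclass(frozen=True)
class Reaction:
    name: str
    phosphate_in: str   # DNA to which the attacked phosphate belongs
    nucleophile: str    # hydroxyl that attacks the phosphate
    leaving_group: str  # hydroxyl released when the bond breaks


REACTIONS = (
    Reaction("3'-end processing", "viral DNA",
             "water (or another OH, including the terminal viral 3'-OH)", A_OH),
    Reaction("DNA joining", "target DNA", A_OH, "target 3'-OH"),
    # Assumption: disintegration is treated as the chemical reverse of joining.
    Reaction("disintegration", "viral-target junction", "target 3'-OH", A_OH),
)


def role_of_conserved_a(r: Reaction) -> str:
    """Role played by the 3'-OH of the conserved A in reaction `r`."""
    if r.nucleophile == A_OH:
        return "nucleophile"
    if r.leaving_group == A_OH:
        return "leaving group"
    return "not involved"


for r in REACTIONS:
    print(f"{r.name}: phosphate in {r.phosphate_in}; "
          f"A 3'-OH acts as {role_of_conserved_a(r)}")
```

The loop makes the central point of the paragraph explicit: the same hydroxyl is the leaving group in two reactions and the attacking nucleophile in the third.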

Target Site Selection for Retroviral Integration Is Multifactorial, Varies among Retroviruses, and Is Influenced Both by Intrinsic Features of the DNA and by Proteins Bound to Target DNA

There was once a heated debate as to whether there might be only a very small number of possible target sites. Now it is clear that the number of potential targets is enormous, perhaps including virtually all points in the genome (Pryciak et al. 1992a; Withers-Ward et al. 1994). The extent to which specific sites in the genome are preferred as targets and the basis for this preference appear to vary among retroviruses, but all viruses show preferences for some target sites over others. At the single-nucleotide level of resolution, individual sites are used as targets at rates that vary over a range that probably exceeds 1000-fold, even within a sequence of a few thousand base pairs (Shih et al. 1988; Kitamura et al. 1992; Pryciak et al. 1992a,b; Pryciak and Varmus 1992b; Withers-Ward et al. 1994).

The degree to which integration specificity in vivo is determined by local features versus regional differences is not clear. However, the influence of local features has been a more experimentally tractable question. Determination of the sequences of hundreds of target sites for integration has clearly established that local homology between viral and target DNA sequences plays no significant part in target site selection. Comparisons of the integration patterns into a DNA sequence in vivo, and as a naked molecule in vitro, show that the target DNA sequence is one important determinant of integration specificity (Pryciak and Varmus 1992b). Weak consensus sequences for preferred integration sites have been deduced, but their ability to account for the observed pattern of integration is poor, suggesting that DNA structural features determined indirectly by base sequence (e.g., dimensions of the major and minor groove, flexibility, and bend angles) may be more important than nucleotide identity per se in determining target selection (Pryciak et al. 1992b; Pruss et al. 1994b).

In vivo, most of the potential target sites for integration are assembled into nucleosomes. This structure profoundly affects the ability of a DNA to serve as a target. Nucleosomal DNA is used in preference to naked DNA as a target for MLV or HIV integration (Pryciak et al. 1992a,b; Pryciak and Varmus 1992b). Moreover, the pattern of integration into nucleosomal DNA targets in vitro indicates that phosphodiester bonds at specific positions in the nucleosome are highly preferred as integration targets (Fig. 10) (Pryciak and Varmus 1992b; Pruss et al. 1994a). In general, at sites where the major groove is facing directly outward from the nucleosome particle, phosphodiester bonds flanking that major groove are most accessible as integration targets, implying that integrase binds to the face of the double helix toward which the two target phosphodiester bonds are most exposed (Pryciak and Varmus 1992b). The most highly preferred sites on the nucleosome are those where the major groove dividing the target phosphodiester bonds on the two DNA strands is widest (Pruss et al. 1994a). This interpretation is supported and extended by in vitro experiments using more defined artificial methods to induce sharp bends in the target DNA (Muller and Varmus 1994). In these experiments, integration occurred preferentially at sites where the double helix was sharply bent, and particularly into phosphodiester bonds flanking widened major grooves.

Figure 10. Nucleosome structure influences local target site specificity of integration. (A) Distribution of targets used by MLV integration complexes in an in vitro assay using a 1453-base-pair minichromosome target. The positions of nucleosomes are indicated by …

Whereas histones bound to target DNA promote integration, other DNA-binding proteins can apparently occlude potential target sites (Pryciak and Varmus 1992b; H.P. Muller et al. 1993; Bushman 1994; Muller and Varmus 1994). Sequence- or structure-specific DNA-binding proteins may have a role in selecting specific sites for integration. The best-studied example of this phenomenon among retrotransposons is in Ty3 integration, which is directed to sites immediately adjacent to the transcription start sites of RNA polymerase III transcription units, by interactions between basal transcription factors and the Ty3 integration machinery (Chalker and Sandmeyer 1990, 1992, 1993; Kirchner et al. 1995). Among retroviruses, the best examples of such a phenomenon are artificial ones (Bushman 1994; Goulaouic and Chow 1996; Katz et al. 1996). However, the discovery of a specific protein-protein interaction between HIV-1 integrase and a putative mammalian transcriptional regulatory protein has led to the suggestion that this might provide a mechanism for directing integration to sites that are especially favorable for transcription (Kalpana et al. 1994).

It is not clear to what extent local effects of this kind can account for the pattern of integration over the whole genome. Indeed, the distribution of integration sites across the whole genome remains to be adequately characterized. For RSV, the evidence is somewhat contradictory. One set of experiments suggested that certain rare sites were used as integration targets at a frequency one-million-fold greater than the average site in the whole genome (Shih et al. 1988). Yet, a second, more recent, set of experiments, using a virtually identical experimental system, led to the conclusion that there was little variation across the genome in the frequency with which individual intervals were used as integration targets (Withers-Ward et al. 1994). Moreover, the previously reported putative “hot spots” for RSV integration appeared to be no “hotter” than average sites in the genome. Other, more anecdotal data suggest that some genes are used more (Frankel et al. 1985; King et al. 1985) or less (Hubbard et al. 1994) frequently as integration targets than would be expected if integration targets were uniformly distributed in the genome. Analysis of a small sample of sequenced target sites for HIV-1 integration reveals highly repetitive elements in close proximity to the integration site at a greater than expected frequency (Stevens and Griffith 1994). Several reports have suggested that MLV integration sites map within a few hundred base pairs of DNase-hypersensitive sites, and near transcriptionally active regions and CpG islands, at a rate significantly greater than would be expected by chance, lending weight to the popular notion that integration events are directed to sites favorable for transcription (Vijaya et al. 1986; Rohdewohld et al. 1987; Mooslehner et al. 1990; Scherdin et al. 1990).

Final Stages of the Integration Process

The initial joining step in integration yields a viral genome joined to host DNA by only one strand at each end (see Fig. 8) (Fujiwara and Mizuuchi 1988; Brown et al. 1989). To complete the process leading to a mature provirus, the gaps flanking the viral DNA must be filled in by extending the free 3′ end of the target DNA, the mismatched viral 5′ end must be trimmed, and the resulting 3′ and 5′ ends must be ligated (Fig. 11). This maturation process remains completely unexplored. Does the virus, after completing the numerous steps leading up to establishment of a provirus, leave this ultimate step in the integration process to chance? The cell has the enzymatic wherewithal to carry out this maturation process without the participation of viral functions. It is nevertheless conceivable that virally encoded proteins might direct the necessary cellular enzymes to this substrate or even act directly to repair the gaps. The intriguing possibility that integrase could have a direct role in gap repair is raised by the observation that integrase can repair discontinuities in a DNA strand when they include a 5′ single-stranded tail, a structure expected as an intermediate in the gap repair process (Chow et al. 1992; Kulkosky et al. 1995; Roe et al. 1997).
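The repair steps just listed have a predictable consequence for the target sequence. Because the two viral 3′ ends are joined to target phosphodiester bonds that are staggered on opposite strands, filling in the two single-stranded gaps copies the short stretch between them, leaving it as a direct repeat on both sides of the provirus. The sketch below tracks the top strand only; the stagger is a parameter because its length differs among retroviruses (for example, 4 bp for MLV, 5 bp for HIV-1, and 6 bp for ASLV). The function name and sequences are hypothetical, and the provirus string is assumed to be viral DNA that has already been processed and trimmed.

```python
def repair_and_join(target: str, cut: int, stagger: int, provirus: str) -> str:
    """Top-strand sequence of a mature provirus and its flanks (toy model).

    The top-strand attack is on the phosphodiester between target[cut - 1]
    and target[cut]; the bottom-strand attack lies `stagger` bp further along.
    Gap filling copies target[cut:cut + stagger] on both sides, so the final
    product carries that segment as a direct repeat flanking the provirus.
    """
    if not 0 <= cut <= len(target) - stagger:
        raise ValueError("cut site and stagger must fall within the target")
    left = target[:cut + stagger]  # upstream flank, ending in the copied segment
    right = target[cut:]           # downstream flank, starting with the same segment
    return left + provirus + right


product = repair_and_join("TTTTACGTAGGGG", cut=4, stagger=5, provirus="ltr...ltr")
print(product)  # TTTTACGTAltr...ltrACGTAGGGG (ACGTA appears on both sides)
```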

Figure 11. Joining the 5′ end of viral DNA to target DNA. Following initial joining of the viral 3′ ends to target DNA, the gap flanking the viral 5′ end needs to be repaired by DNA synthesis (1), which may continue into the proviral DNA, displacing …

The stability of the complex between integrase and viral DNA presents an obvious problem after integration. How does the complex disassemble, to allow the provirus to become an ordinary resident of the host genome, and ultimately to be replicated along with cellular DNA? Following bacteriophage Mu transposition, disassembly of the corresponding Mu integration complex is catalyzed by a specific host-encoded molecular chaperone, the E. coli ClpX protein (Levchenko et al. 1995). It is likely that host factors play a similar part in postintegration disassembly of the retroviral integration complex. Indeed, this is one candidate role for the host SWI2/SNF2 complex, a component of which, Ini1, may interact with HIV-1 integrase (Kalpana et al. 1994). This complex has been implicated in other processes involving remodeling of nucleoprotein complexes (Hirschhorn et al. 1992).

The possibility that proteins of the preintegration complex can play a part after integration is completed also merits consideration. Retroviral integration creates an opportunity rarely encountered in cells by introducing a DNA molecule into the genome with neither nucleosomes nor (apparently) transcription factors prebound. This may be a propitious moment for the virus to promote the assembly of an optimal constellation of transcription factors for expression of the provirus. The higher expression levels of proviruses introduced into the genome by the viral integration process, as compared to transfected proviral DNA, have usually been attributed to preferential targeting of integration to genomic regions favorable for transcription (Vijaya et al. 1986; Rohdewohld et al. 1987; Mooslehner et al. 1990; Scherdin et al. 1990). There is scant experimental evidence for this model, however (Reddy et al. 1991; Withers-Ward et al. 1994). The phenomenon could equally well reflect a viral mechanism that actively exploits the opportunity to set up a favorable transcriptional program immediately following integration. The possibility that viral proteins in the preintegration complex might promote assembly of transcription factors onto viral DNA prior to or immediately after its integration has not yet been investigated. Indeed, the reported interaction between HIV-1 integrase and the Ini1 component of the SWI2/SNF2 complex (Kalpana et al. 1994), which has a key role in gene activation (Hirschhorn et al. 1992), may suggest an early role for viral proteins in promoting transcription of the newly integrated provirus.

Influence of Host Cellular Proteins on Integration

Several lines of evidence, discussed in the previous section, point to a role for cellular proteins in modulating the target site specificity of integration.

In vitro, proteins present in cytoplasmic extracts from uninfected cells can repress autointegration (Lee and Craigie 1994) and stimulate integration by preintegration complexes into a separate target DNA molecule (Fujiwara and Craigie 1989; Lee and Craigie 1994; Farnet and Bushman 1997), raising the possibility that host proteins could have a similar role in vivo.

Fv1 restriction, the only well-documented example of a genetic host-range restriction that operates between entry into the cell and integration, is perhaps the clearest example of an effect of cellular components on the integration process. Replication of certain strains of MLV is inhibited in strains of mice or cell lines that carry particular alleles of the Fv1 gene (Hartley et al. 1970; Pincus et al. 1975; Jolicoeur 1979). The cellular function of the Fv1 gene in the absence of infection is unknown. The two best-studied alleles of this gene, Fv1n and Fv1b, determine the ability of alternative MLV strains to establish a provirus upon infection of a mouse cell (Hartley et al. 1970; Pincus et al. 1975). MLV strains are designated N-tropic, B-tropic, or NB-tropic depending on their ability to replicate in cells of different Fv1 genotypes. Cells carrying the Fv1n allele do not allow establishment of a provirus by B-tropic MLV strains. Cells carrying the Fv1b allele restrict N-tropic MLV strains. Cells carrying both Fv1 alleles restrict both N- and B-tropic viruses. NB-tropic MLV strains are not restricted by either Fv1 allele, although phenotypically mixed viruses carrying both determinants are restricted by both alleles. Viral N/B tropism has been mapped to the CA coding region of the gag gene (Hopkins et al. 1976, 1977; Boone et al. 1983; DesGroseillers and Jolicoeur 1983). A simple interpretation of the Fv1 restriction phenomenon is that the host Fv1 gene product interacts adversely with the cognate CA protein of an infecting subviral particle. Although they tend to corroborate the in vitro evidence suggesting that CA is an essential part of the MLV preintegration complex, in vivo studies of Fv1 restriction do not specifically implicate integration as the sole target for the inhibitory action of the Fv1 product (Jolicoeur and Rassart 1980, 1981; Yang et al. 1980b; Chinsky and Soeiro 1981; Pryciak and Varmus 1992a).
In an Fv1-restricted infection, there is a marked reduction in recovery of full-length linear DNA from the soluble cytoplasmic fraction when infected cells are fractionated (Pryciak and Varmus 1992a). The levels of full-length linear DNA in a fraction that includes the nuclear contents and associated insoluble material from the cytoplasm are virtually identical in Fv1-restricted and -permissive infections (Pryciak and Varmus 1992a). Moreover, processing of the 3′ ends of the viral DNA by integrase is not impaired in vivo by Fv1 restriction, and viral nucleoprotein complexes isolated from cells infected by the cognate Fv1-restricted virus appear normal in structure and mediate integration of the viral DNA with normal efficiency when assayed in vitro (Pryciak and Varmus 1992a). Yet integration of this DNA in vivo is markedly impaired by Fv1 restriction. In addition, circularization of the viral DNA, a process that is independent of integration and appears to occur only after the viral DNA enters the nucleus, is severely impaired by Fv1 restriction (Jolicoeur and Rassart 1980, 1981; Yang et al. 1980a,b; Chinsky and Soeiro 1981). These features closely parallel the cell cycle restriction of MLV integration, suggesting that, just as cell cycle arrest can block entry of viral intermediates into the nucleus, Fv1 restriction may block an essential step preceding integration, perhaps also entry into the nucleus (Pryciak and Varmus 1992a; Roe et al. 1993). The Fv1 gene has recently been cloned, and it appears to be derived from the gag gene of an endogenous retrovirus not closely related to MLV (Best et al. 1996). Although the significance of this relationship, if any, remains to be established, the isolation of this long-sought gene provides the means for a detailed biochemical and genetic dissection of the molecular basis of Fv1 restriction.

Integrase as a Target for Antiviral Agents

The apparent dependence of HIV-1 viremia and depression of CD4+ T-cell counts on new cycles of infection, even in late stages of AIDS, suggests that effective inhibitors of any essential step in the viral life cycle could slow or halt progression of HIV-1 disease. As an essential step in the retroviral life cycle, mediated by a virally encoded protein, integration is therefore an attractive target for antiviral chemotherapy. Indeed, the lack of any known analogous activity essential to cells raises the hope that an antiviral agent directed at integrase might be relatively free of cytotoxic side effects.

When HIV was first recognized as an important human pathogen, the search for antiviral agents directed at integrase was hampered by a lack of knowledge about the biochemistry of the integration process, in contrast to the extensive body of knowledge about DNA polymerases and proteases. As understanding of the integration process has advanced, the search for drugs that target it has begun in earnest, and compounds that inhibit integration in vitro have been identified by several groups (Cushman and Sherman 1992; Carteau et al. 1993a,b, 1994; Fesen et al. 1993, 1994; Mouscadet et al. 1994; Allen et al. 1995; Cushman et al. 1995; LaFemina et al. 1995; Mazumder et al. 1995; Puras Lutzke et al. 1995; Raghavan et al. 1995; Robinson et al. 1996a,b; Hazuda et al. 1997).

The choice of a strategy for developing an antiviral agent directed at integrase is not straightforward, however. The entire allotment of integrase molecules in the virion, about 50–100, is required to catalyze a total of only four chemical steps (3′-end processing and strand transfer at each of the two viral DNA ends), and may have a long interval in which to do so. It is therefore not clear that an inhibitor of the catalytic activity of integrase could prevent the eventual completion of the integration process unless it were extremely potent or irreversible. On the other hand, integrase has the potential to become lethal to the virus. Its endonuclease activity is ordinarily highly specific. If that specificity could be undermined, however, cleavage of the viral DNA at an incorrect site could abort the integration process. Similarly, integration of the viral DNA ends into an internal site in the viral DNA, rather than into cellular DNA, would be fatal to the virus. The long interval between completion of viral DNA synthesis and integration, although it may place great demands on an inhibitor of integrase, ironically makes the virus highly vulnerable to any agent that could undermine the usual specificity of the enzyme, suggesting that this might be a useful strategy to explore. To date, there has been no systematic search for compounds that alter the specificity of the enzyme, and none has been reported.

At this writing, numerous compounds have already been identified as inhibitors of integrase by in vitro screens using purified recombinant HIV-1 integrase. Many of the inhibitors identified to date bind to DNA, often by intercalation. Not all of the reported inhibitors appear to bind DNA, however, and no general statements can yet be made regarding either their structures or mechanisms of action. Although the clinical potential of any of these compounds as antiviral agents remains to be determined, the rapid progress in our understanding of the molecular details of integration and the availability of convenient assays with which to screen for drugs targeted at integrase provide grounds for optimism that integrase will be a target of useful new anti-HIV therapeutics.

Copyright © 1997, Cold Spring Harbor Laboratory Press.
Bookshelf ID: NBK19394