Genome Engineering: Drosophila melanogaster and beyond
Abstract
A central challenge to investigating biological phenomena is the development of techniques to modify genomic DNA with nucleotide precision that can be transmitted through the germ line. Recent years have brought a boon in these technologies, now collectively known as genome engineering. Defined genomic manipulations at the nucleotide level enable a variety of reverse engineering paradigms, providing new opportunities to interrogate diverse biological functions. These genetic modifications include controlled removal, insertion, and substitution of genetic fragments, both small and large. Small fragments up to a few kilobases (e.g., single nucleotide mutations, small deletions, or gene tagging at single or multiple gene loci) to large fragments up to megabase resolution can be manipulated at single loci to create deletions, duplications, inversions, or translocations of substantial sections of whole chromosome arms. A specialized substitution of chromosomal portions that presumably are functionally orthologous between different organisms through syntenic replacement, can provide proof of evolutionary conservation between regulatory sequences. Large transgenes containing endogenous or synthetic DNA can be integrated at defined genomic locations, permitting an alternative proof of evolutionary conservation, and sophisticated transgenes can be used to interrogate biological phenomena. Precision engineering can additionally be used to manipulate the genomes of organelles (e.g., mitochondria). Novel genome engineering paradigms are often accelerated in existing, easily genetically tractable model organisms, primarily because these paradigms can be integrated in a rigorous, existing technology foundation. The Drosophila melanogaster fly model is ideal for these types of studies. 
Owing to its small genome of just four chromosomes, the large number of cutting-edge genetic technologies available, its short life cycle, and its inexpensive maintenance requirements, the fly is exceptionally amenable to complex genetic analysis using advanced genome engineering. Thus, highly sophisticated methods developed in the fly model can be used in nearly any sequenced organism. Here, we summarize different ways to perform precise, heritable genome engineering using integrases, recombinases, and DNA nucleases in D. melanogaster.
2. Introduction
Genome engineering is collectively defined as the technologies and methods used to modify an organism’s genetic material in a defined manner. So far, extensive genome engineering has been performed in prokaryotes and single-celled eukaryotes1,2; e.g., synthetic versions of the entire genomes of the bacteria Mycoplasma genitalium3 and Mycoplasma mycoides3, and a full chromosome of the yeast Saccharomyces cerevisiae4,5, have been generated and manipulated. Primarily due to their greater complexity, similar manipulations in larger, multicellular animals have historically lagged behind. In recent years, however, our ability to modify the genomes of higher organisms has expanded tremendously, becoming ever more precise and defined to nucleotide resolution6,7. While these efforts have largely been directed toward the nuclear genome, some can also target DNA-containing organelles (e.g., mitochondria)8–10. Importantly, changes introduced by genome engineering allow scientists to better recapitulate known genetic dysfunctions and interrogate their consequences at any biological level (Fig. 1).
A. The biology level. Mutant phenotypes and DNA modifications can be analyzed at systems-, tissue-, cell-, subcellular-, and molecular levels. B. The DNA level. DNA modifications can be performed at genome-, chromosome-, subchromosome-, chromatin-, or nucleotide-levels. C. Methods to perform genome engineering at the different DNA levels. Currently, engineering technologies to manipulate genomes or chromosomes in toto are unavailable for Drosophila melanogaster or any other multicellular eukaryotic organism. On the other hand, numerous methods are available to precisely engineer chromosomal sections, gene environments, and gene properties. Note: Drawings for panel B were acquired by modifying free vector templates available from www.vector.me.
Most whole-animal genome engineering paradigms focus on well-studied model organisms, both invertebrates, e.g., the worm Caenorhabditis elegans11 and the fly Drosophila melanogaster12–16, and vertebrates, e.g., the mouse Mus musculus17,18, the rat Rattus norvegicus19,20, and the zebrafish Danio rerio21,22. Due to evolutionary conservation of cellular and developmental signaling pathways, findings from model organisms often provide insight into the function (and malfunction) of other animal species that are less amenable to genome engineering23. Ideal model organisms are easily amenable to experimental manipulation, exhibit a short life-cycle, and require minimal (and reasonably inexpensive) maintenance24,25. Moreover, they frequently have a substantial pre-existing experimental toolkit encompassing a wide portfolio of techniques for somatic genetic manipulation, germline mutagenesis, genome engineering, differential cellular labeling, whole tissue analysis, and systems physiology interrogation23. Hence, small model organisms, notably C. elegans11, and D. melanogaster12–16, provide valuable platforms for sophisticated biological studies, including those relevant to human disease12,26,27.
The fly in particular is a powerful model system that has contributed enormously to our understanding of genetics, developmental biology, neurobiology, and more recently, to factors leading to human discomfort and disease12,27–29. Importantly, many pathways responsible for disease and mutant phenotype ontogenesis (e.g., notch, hedgehog and hippo) were first discovered and characterized in D. melanogaster28,30. The fly has also been a powerful model to study intrinsic processes that constitute disease and mutant phenotype formation12,31,32, such as DNA instability, tissue growth, cell invasion, and other well-known hallmarks of cancer33–35. Several of these hallmarks also have significant relevance in other diseases36. Importantly, sequencing of the entire Drosophila genome37, recently updated to its sixth version38, accelerated the development of several cutting-edge genetic technology innovations that can be adapted and combined to create ever more powerful genetic paradigms, including tools for genetic manipulation, genome engineering, and disease modeling12–16,39. Moreover, systems pharmacology using flies can identify optimal therapeutic drugs12,31,40,41. Finally, in vivo genome-wide analysis of tractable disease development using complex genetic backgrounds within defined, labeled cell populations is most advanced in flies42,43. Based on these advantages, Drosophila is currently unsurpassed as a multicellular organism for gene discovery, genetic manipulation, and disease modeling12,15,16,32. Recent genome engineering innovations founded on recombinases44, integrases45–47, and programmable nucleases39,48,49, have made it possible to further expand the paradigms for sophisticated targeted genome manipulations.
Herein we discuss aspects of precision genome engineering, i.e., genome manipulation at nucleotide resolution, in the fly D. melanogaster. When introduced in the germ line, such manipulations can be maintained over generations through convenient crossing schemes. We will first examine the different components of precision engineering paradigms, summarizing key enzymes (recombinases, integrases, and a variety of nucleases) used to edit the genome and the mechanisms by which they do so. Next, we review how genome engineering is experimentally achieved, outlining methods commonly used to introduce genetic modifications. Finally, we summarize various outcomes of genome engineering paradigms, reviewing changes both small (e.g., single nucleotide substitutions or kb-sized deletions or insertions) and large (e.g., multi-kilobase inversions or translocations), as well as specialized modifications such as syntenic replacement. We also examine paradigms geared toward the addition of novel genetic material via site-specific transgenesis and briefly touch on the manipulation of organellar genomes.
3. Molecular Players and Reaction Outcomes in Genome Engineering
Precise genome engineering can be carried out by a number of enzymes, including a plethora of recombinases44 and integrases45, and a variety of nucleases48. Although these enzymes catalyze different types of reactions, they all exhibit one key feature: the ability to bind DNA at a predetermined recognition site. The specificity of a given enzyme for its recognition site is critical to the success of genome engineering since it determines specificity of the enzymatic reaction. For recombinases, integrases, and meganucleases, the recognition site is defined by the enzyme itself. In contrast, recognition sites for zinc finger and transcription activator-like effector nucleases are user-defined and set by varying the enzyme’s amino acid sequence. RNA-guided nucleases are also custom designed by the researcher but use a complementary RNA molecule to target the DNA sequence of interest.
To precisely edit the genome, engineering enzymes must catalyze one of two types of reactions. Recombinases and integrases, which induce recombination and integration events, respectively, catalyze an exchange reaction between two DNA templates, each containing an enzyme-specific recognition site44–47. In contrast, nucleases cut DNA at a precise location and exploit cellular repair mechanisms to complete genome modification39,48,49.
3.1 Recombinases and Integrases
As enzymes for precision genome engineering, recombinases and integrases are very similar. They both use short, enzyme-defined recognition sites, and both catalyze an exchange reaction between two DNA substrates44–47. In both recombination and integration events, the DNA exchange reforms the recognition sites, though they may be nonfunctional for future exchange events. Numerous recombinases and integrases are available for DNA manipulation.
A. Recombinases
Recombinases were one of the first enzymes used for precise genome engineering and have been used extensively in a number of model organisms, including the fly14,44,50. As their name suggests, recombinases catalyze a recombination reaction between two DNA templates, each containing a 30–40 bp enzyme-specific recognition site, e.g., minimal LOcus of crossing (X) over, P1 (LoxP) and FLP recombinase Recognition Target (FRT) sites are 34 bp long44,51. These recognition sites contain perfect inverted repeats flanking an asymmetric spacer (Fig. 2A and Table 1). Under normal conditions the recognition sites are identical and the reactions they catalyze bidirectional. That is, since the outcome of any recombination event is reformation of the same two recognition sites, recombination can proceed in both “forward” and “reverse” directions44 (Fig. 2A and 2B). However, recognition sites can be altered to drive the reaction directionality (Table 1). For example, recombination between a site with a mutation in its left-side inverted repeat and one with a mutation in its right-side inverted repeat will produce two new recognition sites—one inverted repeat double mutant site and one wild type—that can no longer recombine52,53. Such sites are called inverted repeat mutants51 (Fig. 2B). Non-compatible or orthogonal (i.e., mutually exclusive) recognition sites have also been engineered for some recombinase family members54–57 (Table 1). These sites contain mutations in the asymmetric spacer of the recombinase recognition site and are therefore called spacer mutants51 (Fig. 2B). Spacer mutant sites allow multiple, simultaneous independent recombination events using the same enzyme58 (Fig. 2B).
A. Molecular reaction performed by the Cre recombinase. B. Comparison of mutant and wild type recombination recognition sites. Inverted repeat mutant- and spacer mutant recombination recognition sites are indicated. C. Outcomes of recombination reactions using wild type-, inverted repeat mutant-, and spacer mutant recombination sites. Recombination reactions can result in inversion, excision, integration, or recombinase-mediated cassette exchange (RMCE) events. Reaction directionality, i.e., uni- or bidirectional, using different recombination site classes is indicated.
Table 1
Recombinases.
| Recombinase | Target Site | Sequencea |
|---|---|---|
| Cre | LoxP | 5’-ATAACTTCGTATAatgtatgcTATACGAAGTTAT-3’ |
| Cre | Lox2272 | 5’-ATAACTTCGTATAaagtatccTATACGAAGTTAT-3’ |
| Cre | Loxm2 | 5’-ATAACTTCGTATAagaaaccaTATACGAAGTTAT-3’ |
| Cre | LoxLE | 5’-TACCGTTCGTATAatgtatgcTATACGAAGTTAT-3’ |
| Cre | LoxRE | 5’-ATAACTTCGTATAatgtatgcTATACGAACGGTA-3’ |
| Fre | LoxH | 5’-ATATATACGTATAtatgtctaTATACGTATATAT-3’ |
| Tre | LoxLTR | 5’-ACAACATCCTATTacaccctaTATGCCAACATGG-3’ |
| FLP | FRT | 5’-GAAGTTCCTATTCtctagaaaGTATAGGAACTTC-3’ |
| FLP | F3 | 5’-GAAGTTCCTATTCttcaaataGTATAGGAACTTC-3’ |
| FLP | F5 | 5’-GAAGTTCCTATTCttcaaaagGTATAGGAACTTC-3’ |
| FLP | FRT-LE | 5’-GAAGTTCATATTCtctagaaaGTATAGGAACTTC-3’ |
| FLP | FRT-RE | 5’-GAAGTTCCTATTCtctagaaaGTATATGAACTTC-3’ |
| mFlp5 | mFRT71 | 5’-GAAGTTTCTATTCtctagaaaGTATAGAAACTTC-3’ |
| Dre | Rox | 5’-TAACTTTAAATAATgccaATTATTTAAAGTTA-3’ |
| KD | KDRT | 5’-AAACGATATCAGACATTTGTCTGATAATgcttcATTATCAGACAAATGTCTGATATCGTTT-3’ |
| B2 | B2RT | 5’-GAGTTTCATTAAGGAATaactaATTCCCTAATGAAACTC-3’ |
| B3 | B3RT | 5’-GGTTGCTTAAGAATaagtaATTCTTAAGCAACC-3’ |
| R | RSRT | 5’-TTGATGAAAGAATAacgTATTCTTTCATCAA-3’ |
| VCre | VloxP | 5’-TCAATTTCTGAGAactgtcatTCTCGGAAATTGA-3’ |
| SCre | SloxP | 5’-CTCGTGTCCGATAactgtaatTATCGGACATGAT-3’ |
| Vika | Vox | 5’-AATAGGTCTGAGAacgcccatTCTCAGACGTATT-3’ |
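The site anatomy and the inverted repeat mutant logic described above can be checked directly on the Table 1 sequences. The following is a minimal sketch, not a model of the enzymatic mechanism: it assumes the table's notation (uppercase arms, lowercase spacer) and treats recombination as a simple swap of right arms between two sites that share a spacer. The helper names `split_site` and `recombine` are hypothetical and introduced only for illustration.

```python
from typing import Tuple

# Sequences copied from Table 1 (uppercase = inverted repeat arms, lowercase = spacer).
LOXP = "ATAACTTCGTATAatgtatgcTATACGAAGTTAT"
LOX_LE = "TACCGTTCGTATAatgtatgcTATACGAAGTTAT"   # left arm mutated
LOX_RE = "ATAACTTCGTATAatgtatgcTATACGAACGGTA"   # right arm mutated
LOX2272 = "ATAACTTCGTATAaagtatccTATACGAAGTTAT"  # spacer mutant


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence as uppercase text."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def split_site(site: str) -> Tuple[str, str, str]:
    """Split a site into (left arm, spacer, right arm) using the case convention."""
    lower = [i for i, ch in enumerate(site) if ch.islower()]
    start, end = lower[0], lower[-1] + 1
    return site[:start], site[start:end], site[end:]


def recombine(site_a: str, site_b: str) -> Tuple[str, str]:
    """Simplified exchange: swap the right arms of two sites with identical spacers."""
    left_a, spacer_a, right_a = split_site(site_a)
    left_b, spacer_b, right_b = split_site(site_b)
    if spacer_a != spacer_b:
        # Assumption of this sketch: sites with different spacers never recombine.
        raise ValueError("spacers differ; sites are treated as orthogonal")
    return left_a + spacer_a + right_b, left_b + spacer_b + right_a


left, spacer, right = split_site(LOXP)
arms_are_inverted_repeats = right == reverse_complement(left)
spacer_is_asymmetric = reverse_complement(spacer) != spacer.upper()
double_mutant, wild_type = recombine(LOX_LE, LOX_RE)
```

Under these assumptions, recombining LoxLE with LoxRE yields one site carrying both arm mutations and one site identical to LoxP, while pairing LoxP with Lox2272 is rejected because their spacers differ.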
The most commonly used recombinases in eukaryotic genome engineering are Cre (Causes REcombination), which catalyzes recombination between LoxP sites, and FLP (FLiPpase), which induces recombination between FRT sites51 (Table 1). Within the fly community, Cre usage got off to a slow start59,60, partly due to toxicity61–63. This toxicity can be remediated by adding an inducible hormone binding domain (e.g., estrogen receptor)62, including light inducible conditional protein domains that catalyze Cre complementation64, or limiting availability by redirecting the recombinase to the proteasome through a fused PEST domain63. Despite these challenges, several experimental paradigms have been developed using Cre in flies, e.g., chromosomal manipulations65, transgenesis66, gene targeting61, and conditional activation63,67. FLP on the other hand, was quickly adopted by the fly community14,42,50. Additional recombinases recently tested for in vivo genome modification in flies include Dre (recognizing DNA recombinase sites known as rox sites)63,68,69, KD (KDRT sites)63, B2 (B2RT sites)63, B3 (B3RT sites)63, and R (RSRT sites)63 (Table 1). Although not yet tested in flies, other recombinases such as Vika70, SCre and VCre71,72, will likely provide additional opportunities for genome engineering (Table 1).
To further expand the number of recombinases available for genome engineering, existing recombinases can be evolved to recognize novel sites not identified by their normal counterparts. Novel versions of Cre (e.g., Fre and Tre recognizing LoxH and LoxLTR sites, respectively)73,74 and FLP (e.g., mFlp5 recognizing mFRT71 sites)75,76 have been engineered through molecular evolution to function orthogonally to their normal counterparts (Table 1). Only mFlp5 has been used in flies to date75.
B. Integrases
Like recombinases, integrases catalyze an exchange reaction, an integration, between two DNA substrates, each carrying an enzyme recognition site45–47. Unlike recombinase sites, however, integrase recognition sites contain a short integration core flanked by imperfect inverted repeats that are not identical45 (Fig. 3A and Table 2). Upon integration, these sites, known as attB (i.e., ATTachment site used by Bacteria) and attP (i.e., ATTachment site used by Phage), form the hybrid sites attL (i.e., ATTachment site at the Left, containing the 5’ half of attB and 3’ half of attP) and attR (i.e., ATTachment site at the Right, containing the 5’ half of attP and 3’ half of attB). The attL and attR sites are not substrates for integrase, conveniently driving the reaction in one direction45 (Fig. 3A). The reverse reaction, an excision, can occur but only in the presence of an appropriate excisionase45,77. Importantly, each integrase recognizes its own cognate attB and attP sites45,78 (Table 2). However, as with recombinases, orthogonal recognition sites (i.e., spacer mutants) have been engineered for some integrase family members, allowing them to catalyze multiplex integration reactions79 (Fig. 3B and Table 2).
A. Molecular reaction performed by the ΦC31 integrase. B. Comparison of mutant and wild type integration recognition sites. Inverted repeat mutant- and spacer mutant integration recognition sites are indicated. C. Outcomes of integration reactions using wild type- and spacer mutant integration sites. Integration reactions can result in inversion, excision, integration, or integrase-mediated cassette exchange (IMCE) events. Reaction directionality, i.e., unidirectional, using different integration site classes is indicated.
Table 2
Integrases.
| Integrase | Target Site | Sequencea |
|---|---|---|
| ΦC31 | AttBTT | 5’-TGCGGGTGCCAGGGCGTGCCCttGGGCTCCCCGGGCGCGTACTCC-3’ |
| ΦC31 | AttPTT | 5’-GTGCCCCAACTGGGGTAACCTttGAGTTCTCTCAGTTGGGGG-3’ |
| ΦC31 | AttBCT | 5’-TGCGGGTGCCAGGGCGTGCCCctGGGCTCCCCGGGCGCGTACTCC-3’ |
| ΦC31 | AttPCT | 5’-GTGCCCCAACTGGGGTAACCTctGAGTTCTCTCAGTTGGGGG-3’ |
| ΦC31 | AttBGT | 5’-TGCGGGTGCCAGGGCGTGCCCgtGGGCTCCCCGGGCGCGTACTCC-3’ |
| ΦC31 | AttPGT | 5’-GTGCCCCAACTGGGGTAACCTgtGAGTTCTCTCAGTTGGGGG-3’ |
| ΦC31 | AttBCA | 5’-TGCGGGTGCCAGGGCGTGCCCcaGGGCTCCCCGGGCGCGTACTCC-3’ |
| ΦC31 | AttPCA | 5’-GTGCCCCAACTGGGGTAACCTcaGAGTTCTCTCAGTTGGGGG-3’ |
| ΦC31 | AttBCC | 5’-TGCGGGTGCCAGGGCGTGCCCccGGGCTCCCCGGGCGCGTACTCC-3’ |
| ΦC31 | AttPCC | 5’-GTGCCCCAACTGGGGTAACCTccGAGTTCTCTCAGTTGGGGG-3’ |
| ΦC31 | AttBTC | 5’-TGCGGGTGCCAGGGCGTGCCCtcGGGCTCCCCGGGCGCGTACTCC-3’ |
| ΦC31 | AttPTC | 5’-GTGCCCCAACTGGGGTAACCTtcGAGTTCTCTCAGTTGGGGG-3’ |
| Bxb1 | AttB | 5’-TCGGCCGGCTTGTCGACGACGgcggtctcCGTCGTCAGGATCATCCGGGC-3’ |
| Bxb1 | AttP | 5’-GTCGTGGTTTGTCTGGTCAACCACCgcggtctcAGTGGTGTACGGTACAAACCCCGAC-3’ |
| R4 | AttB | 5’-GCGCCCAAGTTGCCCATGACCATGCCgaagcagtggtaGAAGGGCACCGGCAGACAC-3’ |
| R4 | AttP | 5’-AGGCATGTTCCCCAAAGCGATACCACTTgaagcagtggtaCTGCTTGTGGGTACACTCTGCGGGTGATGA-3’ |
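To illustrate how an attB and an attP site give rise to the hybrid attL and attR sites, the sketch below joins half-sites from the ΦC31 entries in Table 2. It is a simplified string model under stated assumptions: the lowercase dinucleotide is taken as the integration core, attL is assumed to carry the 5' half of attB and attR the 5' half of attP (a common convention), and sites with different cores are assumed never to react. The function names `split_att` and `integrate` are hypothetical.

```python
from typing import Tuple

# PhiC31 sites copied from Table 2 (lowercase = core dinucleotide).
ATTB_TT = "TGCGGGTGCCAGGGCGTGCCCttGGGCTCCCCGGGCGCGTACTCC"
ATTP_TT = "GTGCCCCAACTGGGGTAACCTttGAGTTCTCTCAGTTGGGGG"
ATTP_CT = "GTGCCCCAACTGGGGTAACCTctGAGTTCTCTCAGTTGGGGG"


def split_att(site: str) -> Tuple[str, str, str]:
    """Split an att site into (5' half, core, 3' half) using the case convention."""
    lower = [i for i, ch in enumerate(site) if ch.islower()]
    start, end = lower[0], lower[-1] + 1
    return site[:start], site[start:end], site[end:]


def integrate(att_b: str, att_p: str) -> Tuple[str, str]:
    """Return the hybrid (attL, attR) sites formed from a matching attB/attP pair."""
    b_left, b_core, b_right = split_att(att_b)
    p_left, p_core, p_right = split_att(att_p)
    if b_core != p_core:
        # Assumption of this sketch: only pairs with identical cores react.
        raise ValueError("core dinucleotides differ; pair is treated as orthogonal")
    att_l = b_left + b_core + p_right   # 5' half of attB, 3' half of attP
    att_r = p_left + p_core + b_right   # 5' half of attP, 3' half of attB
    return att_l, att_r


att_l, att_r = integrate(ATTB_TT, ATTP_TT)
```

Because neither hybrid site equals attB or attP, the products are no longer recognized as a substrate pair in this model, which mirrors the unidirectionality described in the text. A mismatched pair such as `integrate(ATTB_TT, ATTP_CT)` raises `ValueError`.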
The ΦC31 integrase is the most frequently used integrase for genome engineering14,15,77. Effective in many species, e.g., mouse and zebrafish80–82, ΦC31 has been most extensively used in flies, where numerous genome-wide collections have been generated for RNAi83, genomic rescue84–86, cellular labeling87, and overexpression studies88,89. Bxb1 integrase was recently tested for in vivo genome modification in D. melanogaster90, but its use is not yet widespread. Many other integrases exist (e.g., ΦBT1, TG1, R4, ΦRv1, TP901-1, A118 and ΦMR11)45–47,91; although they have not been tested in multicellular eukaryotes, including flies, they similarly provide tremendous opportunity to explore novel integration paradigms.
C. Reaction Outcomes
Reactions induced by recombinases and integrases can produce a variety of genetic alterations, including DNA deletions, duplications, inversions, translocations, insertions, and replacements51. The outcome of a particular recombination or integration event largely depends on properties of the enzyme recognition sequences. Recognition site orientation (e.g., direct or inverted), location (i.e., where and on which chromosome(s)), and composition (i.e., wild type, inverted repeat- or spacer mutants) all influence reaction outcomes (illustrated for inversions, excisions, and integrations, Fig. 2C and 3C). While exchange reactions commonly occur between one pair of recognition sites, two pairs of sites can also be used for a total of four recognition sites. When two recognition sites on the endogenous chromosome are paired with two sites on an incoming plasmid, replacement occurs. This reaction is commonly defined as Recombinase- or Integrase-Mediated Cassette Exchange (RMCE or IMCE)92–95 (Fig. 2C and 3C).
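The dependence of the outcome on site orientation can be made concrete with a small string model. The sketch below is illustrative only: it assumes a single linear molecule carrying exactly two LoxP sites, with the first site in the forward orientation, and it reports an excision for sites in direct orientation and an inversion for sites in inverted orientation. Translocations, integrations, and cassette exchange, which involve more than one molecule, are not modeled. The function name `recombine_two_sites` is hypothetical.

```python
from typing import Tuple

LOXP = "ATAACTTCGTATAATGTATGCTATACGAAGTTAT"  # LoxP from Table 1, written in uppercase


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def recombine_two_sites(dna: str, site: str = LOXP) -> Tuple[str, str]:
    """Return (outcome, product) for a molecule with two sites, the first one forward."""
    first = dna.find(site)
    if first == -1:
        raise ValueError("no forward-oriented site found")
    after_first = first + len(site)
    direct = dna.find(site, after_first)
    inverted = dna.find(reverse_complement(site), after_first)
    if direct != -1:
        # Direct repeats: the intervening segment and one site are excised.
        return "excision", dna[:after_first] + dna[direct + len(site):]
    if inverted != -1:
        # Inverted repeats: the intervening segment is flipped, both sites remain.
        middle = dna[after_first:inverted]
        return "inversion", dna[:after_first] + reverse_complement(middle) + dna[inverted:]
    raise ValueError("expected a second recognition site")


direct_substrate = "AAAA" + LOXP + "GGGTTT" + LOXP + "CCCC"
inverted_substrate = "AAAA" + LOXP + "GGGTTT" + reverse_complement(LOXP) + "CCCC"
excision_result = recombine_two_sites(direct_substrate)
inversion_result = recombine_two_sites(inverted_substrate)
```

In this model the excision product retains a single site, whereas the inversion product retains both sites, consistent with the statement that wild type sites can recombine again.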
D. Advantages and Disadvantages
Since recombinases and integrases recognize very small sequence elements, they are ideal tools for genetic engineering14,96. However, these enzymes rely on the presence of their recognition sites, which must themselves be incorporated into the genome through transposition66,92,97–103, P element transposon replacement (i.e., transposon backbone mediated gene conversion)104, or nuclease-catalyzed genome engineering105,106. Moreover, these target recognition sequences persist after the exchange reaction, potentially affecting the surrounding genome. Additionally, pseudo-sites, endogenous sequences with some homology to wild type recognition sites, have been identified for both enzyme families in eukaryotic genomes and can permit undesired recombination exchanges or integration events98,107,108. Finally, when expressed at high levels, recombinases62, and integrases86 can have detrimental off-target effects. Keeping these hurdles in mind, recombinases and integrases are highly useful reagents for genome engineering.
3.2 Nucleases
A variety of nucleases are used in the genome engineering field. These enzymes stimulate genome editing by cutting a designated nucleotide sequence and inducing DNA repair. Historically, transposases (i.e., enzymes that catalyze the mobilization of transposons from one location to another) initiated the field of targeted genome engineering at defined loci109–111. However, transposases (e.g., P transposase) require target sites to be “pre-activated” with a corresponding transposon insertion (i.e., a P element in this case). Here, we focus on nucleases that do not rely on a transposon previously integrated at the intended cutting site. Currently, the most frequently used protein-guided nucleases include MegaNucleases (MNs)112, Zinc Finger Nucleases (ZFNs), and Transcription Activator-Like Effector Nucleases (TALENs)48, whereas RNA-Guided Nucleases (RGNs) are conveniently guided by an RNA sequence113. Here we describe each of these nuclease families.
A. Meganucleases
Meganucleases, also known as homing nucleases, are rare-cutting endonucleases that recognize a long cognate recognition site of roughly 18–30 bp (e.g., I-SceI and I-CreI recognize 18 bp and 22 bp target sites, respectively) (Fig. 4A). These sites are rare in eukaryotes, making MNs ideal candidates for targeted cutting of synthetic fragments introduced into these genomes114. Many MNs function as dimers with each monomer generating a single cut in complementary DNA strands, ultimately forming a Double Stranded Break (DSB)115. Although numerous MNs are available, I-SceI and I-CreI are the most frequently used for in vivo genome engineering in flies114,116.
A. The I-SceI meganuclease and recognition of its DNA target site. B. Zinc finger nucleases and DNA recognition through an array of stitched zinc fingers that each recognizes a unique DNA triplet. C. Transcription activator-like effector (TALE) nucleases and DNA recognition through an array of stitched TALE repeats that each recognize a unique single DNA base pair. D. RNA-guided nucleases and DNA recognition through base pairing of an RNA guide and its DNA target site. E. Nickases based on zinc finger binding domains (Left), TALE binding domains (Middle), and RNA guided nucleases (Right).
B. Zinc Finger Nucleases
A zinc finger (ZF) is a DNA-binding protein motif consisting of approximately 30 amino acids stabilized by a zinc ion. Each zinc finger binds to just three nucleotides, with different ZFs having higher affinities for certain DNA triplets117. A number of in vitro assembly methods have been developed to combine individual ZF motifs into ZF repeat arrays118. These arrays bind longer DNA sequences and can be customized to a specific target117 (Fig. 4B). Fusion of a ZF array to the catalytic domain of FokI results in an artificial restriction enzyme better known as a zinc finger nuclease119. As a type IIs restriction enzyme, FokI cuts DNA at a defined position outside of its recognition sequence120. FokI is an obligate dimer, necessitating that ZFNs be designed and implemented in pairs with each nuclease targeting one of the complementary DNA strands121. Although the usefulness of ZFNs has been extensively characterized in flies119,122–128, they unfortunately failed to become fully adopted into the mainstream genetic toolbox by the Drosophila community.
C. Transcription Activator-Like Effector Nucleases
Similar to ZFNs, TALENs are hybrid proteins composed of a DNA-binding domain, in this case obtained from a naturally occurring transcription factor family called Transcription Activator-Like Effectors (TALEs), and the FokI nuclease domain129–131, which converts TALEs into potent genome editors. The most commonly used TALE repeats are from Xanthomonas spp.132. A TALE contains a central DNA-binding domain consisting of a series of tandem repeats obtained from a natural TALE132. A single repeat motif is ~34 amino acids long, only two of which contribute to DNA binding sequence specificity133,134. Unlike the remainder of the repeat, these two amino acids are hypervariable and are thus termed the Repeat Variable Diresidues (RVDs). The four most common RVDs, NI (Asn-Ile), NN (Asn-Asn), NG (Asn-Gly) and HD (His-Asp), account for each of the four nucleotides (A, G, T, and C, respectively)135. These RVD-nucleotide associations are not exclusive, and most pairs can incorporate mismatches135. Nevertheless, multiple TALE repeats can be serially linked to form a synthetic DNA binding protein that recognizes up to two dozen nucleotides with high specificity135 (Fig. 4C). TALE domain repeats can be constructed using a number of modular assembly methods136, enabling construction of a custom TALE repeat protein in a matter of days129–131. As with ZFNs, fusion of a TALE to the FokI nuclease domain requires that TALENs be used in pairs129–131, while fusion with a monomeric nuclease domain can negate this constraint137,138. Although TALENs have been adopted for gene editing paradigms within the Drosophila community106,123,139–144, their use has been limited, most likely because of the discovery of RGNs.
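The one-repeat-per-nucleotide logic of TALE arrays lends itself to a short lookup example. The sketch below uses only the four RVD-nucleotide pairings stated above and ignores everything else that matters in a real design (for example, the non-exclusive binding of some RVDs noted in the text, repeat assembly, and spacing between paired TALENs). The function names are hypothetical.

```python
from typing import List

# RVD assignments as stated in the text: NI->A, NN->G, NG->T, HD->C.
RVD_FOR_BASE = {"A": "NI", "G": "NN", "T": "NG", "C": "HD"}
BASE_FOR_RVD = {rvd: base for base, rvd in RVD_FOR_BASE.items()}


def design_rvd_array(target: str) -> List[str]:
    """Return one RVD per nucleotide of the target, in 5' to 3' order."""
    return [RVD_FOR_BASE[base] for base in target.upper()]


def decode_rvd_array(rvds: List[str]) -> str:
    """Return the nucleotide sequence nominally recognized by an RVD array."""
    return "".join(BASE_FOR_RVD[rvd] for rvd in rvds)


example_array = design_rvd_array("GATTACA")
```

For the hypothetical 7-nucleotide target `GATTACA`, the array reads NN, NI, NG, NG, NI, HD, NI, and decoding it recovers the target sequence.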
D. RNA-Guided Nucleases
The adaptive immune systems of eubacteria and archaea, better known as CRISPR-Cas (Clustered Regularly Interspaced Short Palindromic Repeats–CRISPR-ASsociated proteins) systems, have recently become highly attractive tools for genome engineering113,145–149. The simplest CRISPR-Cas, the Cas9 nuclease complex (e.g., from Streptococcus pyogenes), has been adapted to serve as an RGN, a highly versatile precision genome engineering tool. In contrast to ZFNs and TALENs that use a protein-nucleotide recognition code, Cas9 RGNs target the genome using a guide RNA molecule that contains 20 nucleotides complementary to the DNA sequence of interest150–152 (Fig. 4D). In addition to RNA/DNA heteroduplex formation, Cas9 nuclease activity depends on the presence of a short Protospacer Adjacent Motif (PAM)153,154. The PAM must be located immediately downstream of the target DNA recognition sequence. In nature, bacteria use the PAM sequence, which is species-specific, to distinguish self from non-self. Cas9 RGNs contain two nuclease domains allowing them to cut both DNA strands and generate DSBs despite functioning as monomers151,154. RGN nuclease activity can be inactivated while maintaining DNA binding. Consequently, inactive RGNs can be fused to the catalytic domain of FokI, allowing them to function as dimers similar to ZFNs and TALENs155,156. The use of RGNs in nuclease-mediated applications has recently exploded, including in fruit flies for numerous gene editing paradigms157–176.
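The targeting rule described above, a 20-nucleotide sequence followed immediately by a PAM, can be expressed as a simple sequence scan. The sketch below assumes the NGG PAM of the S. pyogenes Cas9 and searches both strands of an input sequence. It reports candidate sites only and makes no attempt to score efficiency or off-target risk. The function name `find_cas9_targets` and the example sequence are hypothetical.

```python
import re
from typing import List, Tuple


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}
    return "".join(pairs[base] for base in reversed(seq))


def find_cas9_targets(dna: str, guide_len: int = 20) -> List[Tuple[str, int, str, str]]:
    """List (strand, start, protospacer, PAM) for every site followed by an NGG PAM.

    `start` is the 0-based index, on the given strand, of the leftmost base
    covered by the protospacer plus PAM.
    """
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    hits = []
    for match in pattern.finditer(dna):
        hits.append(("+", match.start(), match.group(1), match.group(2)))
    rc = reverse_complement(dna)
    for match in pattern.finditer(rc):
        start = len(dna) - (match.start() + guide_len + 3)
        hits.append(("-", start, match.group(1), match.group(2)))
    return hits


example = "TT" + "ACGTACGTACGTACGTACGT" + "TGG" + "AA"
forward_hits = find_cas9_targets(example)
reverse_hits = find_cas9_targets(reverse_complement(example))
```

In the hypothetical example, the single candidate is the 20-nucleotide repeat followed by `TGG`. Scanning the reverse complement of the same sequence reports the same site on the minus strand.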
E. Nuclease Reactions
All engineered DNA nucleases (i.e., MNs, ZFNs, TALENs, and RGNs) function in a similar manner: specificity elements direct nuclease function to a designated DNA target sequence that is subsequently cut and repaired. Previously, nuclease-induced DNA breaks were all described as DSBs, but nucleases can also be designed to generate single stranded DNA nicks (Fig. 4E). For example, ZFNs and TALENs function as obligate dimers and must be designed in pairs; however, inactivating the FokI catalytic domain in one half of the pair will induce a DNA nick at the target site rather than a DSB177–180. Alternatively, ZFs and TALEs can be directly fused to a nickase instead of FokI, a strategy that was recently used to combine a TALE with the monomeric DNA mismatch repair endonuclease MutH181. Finally, the two nuclease activities of Cas9 RGNs can be independently inactivated to stimulate nicking on the DNA strand of choice151,152,182,183. The latter strategy was recently explored in flies170.
Once DNA is cut, there are two major pathways for its repair: Non-Homologous End Joining (NHEJ) and Homologous Recombination (HR)184 (Fig. 5A). DSBs can be repaired by either mechanism, while single stranded DNA nicks are usually repaired by HR. During NHEJ, cut strands religate, but ligation can be inaccurate. Moreover, repetitive cutting followed by religation will eventually select for NHEJ events that destroy the original cut site, resulting in insertions or deletions (i.e., indels), unless the nuclease is titrated away by loss of expression, degradation or other means. Because the end products of NHEJ are unpredictable and intrinsically mutagenic, often leading to the generation of knockout strains, this process should be avoided for precise genome engineering39. Conversely, HR uses a homologous DNA repair template to repair DSBs, making this process highly accurate185. Under normal cellular conditions the repair template is the endogenous homologous chromosome. However, the cellular machinery can be tricked into incorporating engineered sequences through a process called “gene targeting”, described below. Since NHEJ is more efficient than HR, several strategies have been developed to skew the balance toward HR. This can conveniently be accomplished by interfering with genes involved in the NHEJ pathway (e.g., Ligase4), through mutation (Lig4 is located on the X chromosome)126,127,176 or knockdown of the Lig4 transcript by RNAi or shRNAi159.
A. DNA repair mechanisms after a double stranded cut. Non-homologous end joining (Left), and homologous recombination using an oligonucleotide (Middle) or double stranded DNA fragment (Right) are indicated. B. Homology configuration of a double stranded DNA template during homologous recombination. Ends-in (i.e., insertional) and ends-out (i.e., replacement) gene targeting are illustrated.
F. Gene Targeting and Exogenous Templates
Gene targeting is the modification of an endogenous genomic sequence through HR with an exogenous DNA fragment rather than its homologous chromosomal counterpart. This technique has been widely used in various multicellular model organisms and can be exploited to generate precise genetic modifications, including deletions, targeted insertions, and point mutations39,139. The exogenous targeting template can take many forms (e.g., oligonucleotides, PCR fragments, in vitro-prepared circular and linearized plasmids, or in vivo-generated targeting donors) as long as some homology with the intended DNA target is present (Fig. 5A). Oligonucleotides have been used with ZFNs, TALENs, and RGNs to generate targeted point mutations105,167,186,187, deletions157, or insertions105,162,163,167,188,189. PCR fragments159,164, and in vitro-prepared circular plasmids124,125,127,140,161,169,175 have also been combined with TALENs and/or RGNs to generate insertions140,159,161,169,175, deletions124,164,175, and point mutations125,175. Interestingly, in vitro-linearized plasmid donors, almost exclusively used in mouse ES cell gene targeting190, do not work in flies, at least using ZFNs124, while targeting donors generated in vivo have been successfully used exclusively by D. melanogaster researchers for the same purposes114,160.
Experimental gene targeting occurs in one of two ways: as insertional gene targeting (i.e., ends-in targeting)191–194 or as replacement gene targeting (i.e., ends-out targeting)191,195 (Fig. 5B). Ends-in and ends-out refer to the arrangement of the targeting template during HR. Ends-in gene targeting results in site-directed insertion of the entire targeting vector by a not yet fully understood mechanism that depends on vector linearization192,193. The outcome of an ends-in event is a localized tandem duplication of the targeted region. This duplication is often resolved by nuclease-mediated cutting (i.e., using I-CreI) within the integrated fragment, followed by tandem repeat reduction through HR194. Conversely, ends-out gene targeting occurs as a synthesis-dependent strand annealing event between homology arms present in the targeting template (e.g., a gene deletion construct) and the target DNA sequence126,196–198. Ends-out targeting thus results in substitution of endogenous material with exogenous DNA195, including the introduction of additional sequences for screening or selection markers as part of the targeting template90,114,140,159,163,164,167,169,199,200.
G. Advantages and Disadvantages of Nucleases
Each nuclease has its own advantages and disadvantages. Meganuclease recognition sites are not common in eukaryotes, requiring their introduction into the genome prior to enzyme use115,201. This property makes MNs very useful for in vivo linearization of gene targeting constructs, as illustrated extensively for D. melanogaster114. A major advantage of ZFNs, TALENs, and RGNs is that they are programmable, i.e., their action is targeted to sites determined by the researcher48. A primary drawback of ZFNs is the time needed to identify optimal repeat domains for superior target sequence affinity and specificity118. Constructing and validating TALEN assemblies can also be problematic due to their size and repetitive nature135. Fortunately, recoding of existing amino acid sequences using gene synthesis can ameliorate this issue202. Alternatively, TALE-like domains that are not perfect direct repeats have been discovered, which should facilitate assembly of multi-repeat containing nucleases203–205. Finally, both ZFN and TALEN strategies require that two new proteins be designed and generated for each target sequence118,135. In comparison, RGNs rely on simple RNA synthesis, making them convenient tools to simultaneously target multiple independent loci (i.e., multiplex genome engineering)150,152. Multiplex genome engineering of up to 4 independent loci was recently demonstrated in flies162,169. However, unlike ZFNs and TALENs, the function of RGNs is somewhat restricted by the requirement for a PAM sequence next to the target site (e.g., the Streptococcus pyogenes PAM is NGG)113. This hurdle could be overcome by combining Cas9 systems from different bacterial species, each recognizing orthogonal PAM sequences, e.g., Streptococcus thermophilus153,154,206, Neisseria meningitidis153, Treponema denticola153,207 and Staphylococcus aureus206,208, or by generating mutant versions of S. pyogenes Cas9 that recognize orthogonal PAM sequences206.
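The PAM restriction on RGN target choice can be made concrete with a short sketch. The Python below scans a DNA string for candidate S. pyogenes Cas9 sites, i.e., a protospacer immediately followed by an NGG PAM. The function names, the 20-nucleotide protospacer length, and the example sequences are illustrative assumptions, not part of any published design tool.

```python
import re

def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_ngg_targets(seq, protospacer_len=20):
    """List (start, protospacer, pam) for every NGG PAM on the given strand.

    Assumption: the protospacer is `protospacer_len` nt long and lies
    immediately 5' of the PAM. A lookahead is used so that overlapping
    candidates are all reported.
    """
    seq = seq.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]

if __name__ == "__main__":
    forward = "CCA" + "T" * 20            # no NGG on this strand
    print(find_ngg_targets(forward))
    # The opposite strand must be scanned separately.
    print(find_ngg_targets(reverse_complement(forward)))
```

Scanning both strands matters because a locus without a usable NGG on one strand may still carry one on the other; a Cas9 ortholog with a different PAM would only require a different pattern in this sketch.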
Off-target events are a major concern for all programmable nucleases209–211. Bioinformatic analysis can identify sites most likely to suffer from off-target events212–215. Such reactions can be functionally validated by mismatch analysis (e.g., Surveyor nuclease test)216 or next generation sequencing210,214, thus helping researchers choose the best possible sequence to target for genome engineering. Alternatively, off-target sites can be physically captured using genome-wide, unbiased identification of DSBs enabled by sequencing (i.e., GUIDE-seq)217.
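As a rough illustration of the bioinformatic step, the sketch below ranks candidate off-target sites for a guide by counting mismatches against every same-length window that is followed by an NGG PAM. This is a deliberately simplified assumption: it scans one strand only and weights all mismatch positions equally, which is not necessarily how the cited prediction tools work. All names and sequences are hypothetical.

```python
def count_mismatches(guide, site):
    """Number of positions at which two equal-length sequences differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have equal length")
    return sum(g != s for g, s in zip(guide, site))

def rank_candidate_sites(guide, sequence, max_mismatches=3):
    """Return (mismatches, start, site) tuples, best matches first.

    Only windows directly followed by an NGG PAM are considered, and only
    the given strand is scanned (illustrative simplifications).
    """
    guide, sequence = guide.upper(), sequence.upper()
    n = len(guide)
    candidates = []
    for start in range(len(sequence) - n - 2):
        # Positions n+1 and n+2 after the window start must be "GG" (NGG PAM).
        if sequence[start + n + 1:start + n + 3] != "GG":
            continue
        site = sequence[start:start + n]
        mm = count_mismatches(guide, site)
        if mm <= max_mismatches:
            candidates.append((mm, start, site))
    return sorted(candidates)

if __name__ == "__main__":
    guide = "ACGT" * 5
    near_match = "T" + guide[1:]          # one mismatch at position 0
    genome = guide + "AGG" + "TTTT" + near_match + "CGG"
    for mm, start, site in rank_candidate_sites(guide, genome):
        print(start, mm, site)
```

Sites with zero mismatches other than the intended target, or with very few mismatches, would be the first candidates to validate experimentally by mismatch analysis or sequencing.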
4. Practical Aspects of Genome Engineering
To produce heritable changes, genome engineering must occur during early development, prior to germ cell formation or directly within the germ line precursors. In flies, this can be achieved through three experimental approaches. First, editing tools are physically introduced via embryo transformation by microinjection, requiring outcrossing to ensure germ line transmission and identify unique events. Second, in vivo remobilization allows gene targeting with complex targeting constructs that were previously integrated into the genome. Finally, in vivo upgrading allows novel material to be incorporated into previously established “docking” sites. The latter two paradigms are performed through genetic crosses, conveniently circumventing any physical manipulation.
4.1 Genome Engineering by Embryo Microinjection
The first step for any kind of genome engineering in flies is always embryo microinjection218,219. Under this paradigm, fertilized embryos are injected at a very specific developmental stage, the multinucleated syncytial one-cell stage, just before cellularization. This timing maximizes the number of germ cell precursors that can be transformed before cellular membranes block access to the injected material218,219 (Fig. 6A). Injected animals must be outcrossed to identify germ line transmission of the engineered changes to the next generation. This strategy has been successful for many germ line-based manipulations in insects219.
Figure 6. A. Syncytial microinjection performed during early embryonic stages in the fly D. melanogaster. B. In vivo remobilization during a classical gene targeting experiment in the fly D. melanogaster. F, FRT site; LA, left homology arm; RA, right homology arm. C. In vivo upgrading of transposon insertion sites using the InSITE system in the fly D. melanogaster. L, LoxP site.
Microinjections are generally performed as “co-injections” or “direct” injections. During co-injection, two components are introduced: a template (e.g., oligonucleotide, PCR fragment, circular or linearized plasmid) and a catalyst (e.g., recombinase, integrase, or nuclease with or without guide RNA). The catalyst can be provided in trans as plasmid DNA in which a promoter drives expression of the catalyst, e.g., ΦC31 integrase220, Cre recombinase66, Flp recombinase99, TALEN140, or Cas9105,162,163,174. Such plasmids are commonly known as “helper” plasmids219. Alternatively, mRNA encoding the catalyst can be injected, e.g., ΦC31 integrase92,98,103,221, Bxb1 integrase90, ZFN124,222, TALEN106,123,141,143,144,175, or RGN157,158,175. Finally, purified protein can be injected, e.g., Cas9166, but this is rarely performed. Co-injections limit catalyst activity over time (i.e., through dilution), which is often advantageous219. In “direct” injections, by contrast, the catalyst is expressed from a genomic source and only the genome engineering template is injected, without a helper plasmid; this requires that the catalyst coding sequence be established in the genome before any genome engineering experiment can be performed97,164,165,167,168,171. So far, “direct” injections have been done for ΦC31 integrase97, ZFN119,122,125, and the Cas9 RGN164,165,167,168,171. While this approach simplifies the procedure tremendously, care must be taken to remove the catalyst-bearing chromosome by genetic outcrossing219.
Catalyst expression using helper plasmids or genomic sources can be driven by a variety of promoters. Constitutive expression can be achieved by using a CMV174, Copia140, or Act5C167 promoter. Temporal control can be achieved by using an inducible promoter, e.g., the Hsp70 heat-shock promoter65,105,119,122,125,162,163,223. Alternatively, expression can be restricted spatially or regulated by normal endogenous physiology by using promoters from the germline-specific genes vasa97,164,171,173 and nanos97,165,168, directing expression in the embryonic germ cell precursors (i.e., pole cells), or the cystoblast-specific gene bag of marbles160, directing expression in ovarian germline stem cell progenitors (i.e., cystoblasts). Guide RNAs required for RGN function can be delivered as purified RNA species175,186 or can be expressed from helper plasmids driven by U6 small RNA promoters (i.e., the RNA polymerase III promoters of the U6:1, U6:2, and U6:3 snRNA genes)105,162,163,167,168, with the U6:3 promoter being optimal167. Interestingly, Cas9 and gRNA can be expressed from a single bicistronic helper plasmid, which simplifies injection procedures tremendously162.
4.2 In Vivo Remobilization of Gene Targeting Templates
Facing an inability to directly transform embryos with linearized targeting fragments (i.e., as performed in mouse embryonic stem cells)190, an alternative approach for precision genome engineering in flies was developed192,195. This new strategy uses crossing schemes to remobilize previously introduced transgenesis constructs from one “donor” location to the desired target locus, also known as the “acceptor” location191. In this paradigm, a gene targeting template containing one or two internal MN sites is flanked by two directly-oriented recombinase recognition sites. The targeting fragment is integrated into the genome via either transposase-191, or ΦC31 integrase-mediated transgenesis224,225 and subsequently recircularized out of the genome in the presence of an inducible recombinase (i.e., FLP)192,195. The circularized template is then linearized with an inducible MN (i.e., I-SceI) to generate reactive DSBs and trigger endogenous DNA repair. Placement of the MN recognition site(s) determines whether ends-in192,193 or ends-out195 targeting events will occur. A single cut in the middle of the homology region produces ends-in gene targeting and a tandem duplication of the region of interest192. This duplication can be resolved to a single copy through tandem repeat reduction with a second nuclease (i.e., I-CreI)194. Two cut sites outside the region of homology produce an exchange event through ends-out gene targeting, a much more commonly used experimental scenario160,195 (Fig. 6B). Gene targeting through in vivo remobilization has been used extensively in the D. melanogaster community114,199,200,226 and can be expanded by including ZFN-122 or RGN-mediated cutting at the target locus160. Recently, this method was extensively improved through the inclusion of cystoblast targeted expression of catalysts (i.e., Flp, I-SceI, and Cas9)160.
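The relationship between MN site placement and targeting outcome described above can be summarized in a small sketch. The function below takes hypothetical coordinates along the donor construct (between the two recombinase recognition sites) and returns the expected targeting mode. The coordinate scheme, names, and the "undetermined" fallback are assumptions made purely for illustration.

```python
def classify_targeting(homology_start, homology_end, cut_sites):
    """Predict the gene targeting mode from meganuclease cut-site placement.

    Coordinates are positions along the donor construct (hypothetical units).
    - one cut inside the homology region            -> "ends-in"
    - two cuts, one on each side of the homology    -> "ends-out"
    - anything else                                 -> "undetermined"
    """
    inside = [c for c in cut_sites if homology_start < c < homology_end]
    left = [c for c in cut_sites if c <= homology_start]
    right = [c for c in cut_sites if c >= homology_end]
    if len(cut_sites) == 1 and len(inside) == 1:
        return "ends-in"      # expected product: tandem duplication at the target
    if len(cut_sites) == 2 and len(left) == 1 and len(right) == 1:
        return "ends-out"     # expected product: replacement of the target sequence
    return "undetermined"

if __name__ == "__main__":
    print(classify_targeting(100, 900, [500]))       # single internal I-SceI site
    print(classify_targeting(100, 900, [50, 950]))   # two flanking I-SceI sites
```

In practice the ends-in product would still need to be resolved by a second nuclease step, as noted above; the sketch only captures the initial design decision.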
4.3 In Vivo Upgrading of Large Loci Collections
Due to some difficulties associated with in vivo remobilization-mediated gene targeting, several innovative methods were developed. The first uses FLP recombinase, resulting in remobilization of a transgene from its initial insertion site to a second FRT-containing site227. Successful events are easily identified by phenotypic screening since many transgenes contain the dominant eye color marker white+, which has varied expression in response to position effects219. Because the bidirectional nature of the FLP recombinase reaction may immediately excise the novel integration event, a second innovative method was developed using the unidirectional ΦC31 integrase, Integrase Swappable In vivo Targeting Element (InSITE)220 (Fig. 6C). Using highly efficient in vivo manipulation (i.e., FLP excises the donor and ΦC31 integrates it at the acceptor location), InSITE can simultaneously upgrade large collections of acceptor elements and is particularly attractive for its ability to easily incorporate future genetic tools. More recently, a derivative of InSITE, “Trojan exons”, was established to adapt a large collection of characterized MiMIC insertions towards gene-specific expression of binary activators228–230. “Trojan exons” was subsequently adapted towards endogenous protein tagging231. The InSITE and “Trojan exons” approaches still require a large collection of characterized custom insertion loci to be generated, necessitating a substantial upfront workload229–231.
5. Genome Engineering Paradigms
Precision genome engineering can occur on many levels. Individual genes may be targeted with small modifications such as a single nucleotide exchange, microdeletion, or tag insertion. Larger fragments containing regulatory elements (e.g., enhancers, silencers, and insulators) may be targeted with similar modifications232, or even orthologous sequence exchange, a method to detect evolutionary conservation of gene regulation across species233–236. Manipulation of larger chromosomal segments can be used to study the effect of chromosomal rearrangements237–239. Additionally, precision genome engineering is not restricted to manipulation of endogenous loci. Defined introduction of exogenous material can also be achieved through site-specific transgenesis219. While the bulk of engineering protocols focus on targeted manipulation of the nuclear genome, mitochondrial DNA can also be modified, though current engineering paradigms are fairly limited8–10.
5.1 Localized Genome Engineering
Localized genome engineering is the small scale manipulation of DNA at the “gene” level. Experiments target a single open reading frame or a localized feature within its vicinity (Fig. 7). These modifications include deletions (e.g., full or partial gene deletions)124,157,175,240,241, null or conditional alleles61,242, point mutations (i.e., hypomorphic-, null-, or disease-causing mutations)105,167,175,186,187,243–246, substitutions (i.e., replacing one gene with its paralog, or genomic portion with its cDNA counterpart)240,241,247, or tagged genes (e.g., using peptides tags, fluorescent protein tags, or cellular localization signals)175,229,230,247–251. Reporter alleles can also be generated, allowing endogenous gene expression patterns to be documented directly via a colorimetric marker (e.g., predominantly β-galactosidase activity in mice)252, or indirectly via a binary activation system (e.g., GAL4/UAS, QF/QUAS, and LexA/LexO)220,224,225,228,253–255. Gene-specific alleles have been incorporated using many experimental approaches including microinjection218,219, in vivo remobilization114,200,226, and in vivo upgrading220,228.
5.2 Chromosomal Exchanges and Syntenic Replacements
Occasionally, replacement of an entire locus is desired, i.e., the open reading frame plus its adjacent regulatory elements (Fig. 8A). Such substitutions can be easily achieved in flies using Captured Segment Exchange (CSE)232,256 (Fig. 8B). CSE allows exchange of a large genomic segment located between FLP recombinase and/or ΦC31 integrase recognition sites, with a new sequence that was manipulated in bacterial cells. Given the abundance of FLP and ΦC31 recognition sites throughout the fly genome229,230,257,258, CSE can be used to explore gene functions for most of the fly genome. Although only used to replace one version of the teashirt gene with mutated ones so far, CSE could be used to perform syntenic replacements, i.e., the exchange of presumably functionally orthologous DNA from other organisms233–236. These replacements may provide evidence of evolutionary conservation between species234 or permit functional resurrection of genetic material extracted from preserved tissue samples259,260. Although current syntenic applications are biased toward humanization233,235,236, opportunities exist to perform general “species-zation” within the fly community; 12 Drosophila species have been sequenced and can be used for genetic fragment exchange to demonstrate the presence or absence of evolutionary conservation261,262. The availability of fosmid84,261,263–265 and Bacterial Artificial Chromosome (BAC) libraries85,266–269 covering most of the genomes of these species should facilitate these endeavors.
5.3 Genome Engineering of Chromosomal Rearrangements
The manipulation of large chromosomal sections can be used to generate sizable chromosomal rearrangements, forming deletions, duplications, inversions and translocations. Chromosomal rearrangements can be engineered using site-specific recombination or gene targeting.
A. Deletions
Deletions, also known as deficiencies or excisions, are chromosomal rearrangements in which a substantial chromosome section is missing238. Because a deletion involves loss of genetic material, these rearrangements represent true null alleles for genes within the deleted fragment. Hence, they can be ideal “loss of function” control alleles for other mutations whose phenotypic nature is unclear or ambiguous.
There are two strategies to generate precise deletions in multicellular eukaryotes: site-specific recombination239 and HR270 (Fig. 9A). Site-specific recombination is more widely used and results in the deletion of genetic material located between two recognition sites (e.g., FRT) oriented in a direct fashion. These recognition sites can be located in cis on the same chromosome, or in trans on sister chromatids239. The site-specific FLP/FRT recombination system has been used extensively in flies and is the basis of a deletion collection that provides almost full (~98%) genome-wide coverage238,257,271,272. Currently, all deletions in this collection have been generated by bringing together two independent FRT-containing transposon insertion sites in trans and inducing recombination in vivo via the FLP recombinase. In addition to providing deficiencies for nearly the entire fly genome, this resource is an efficient and inexpensive way to determine the genetic location of unmapped lethal mutations by absence of complementation238.
Figure 9. A. Generation of a deletion using recombination sites located in cis. B. Generation of a reciprocal deletion and duplication using recombination sites located in trans. C. Generation of an inversion using recombination sites located in cis. Such inversions can be similarly generated using shared homology instead. D. Generation of an engineered balancer chromosome using recombination sites located in cis. Reconstitution of 5’- and 3’-parts of a marker results in visualization of the inversion event. E. Generation of a translocation between two non-homologous chromosomes using recombination sites located in trans. Such translocations can be similarly generated using shared homology instead. F. Generation of a translocation between two homologous chromosomes using recombination sites located in trans during mitotic recombination. This experimental paradigm can be further developed to include strategies for clonal marking; here, Mosaic Analysis with a Repressible Cell Marker (MARCM) is illustrated (Lee and Luo, 1999).
Deletions can also be generated by ends-out gene targeting273 and a derivative of ends-in gene targeting has been used to generate a gene deletion using HR270. Unlike ends-out targeting, ends-in targeting does not leave exogenous sequence at the target locus. Nuclease-based engineering can also generate precise deletions274,275. Such deletions require the generation of two DSBs followed by NHEJ between the two DSBs275, or gene conversion using a homologous template encompassing the intended deletion274. However, although HR was proven in cell culture, these methods have not been adapted to fruit flies yet.
B. Duplications
Chromosomal duplications are a major mechanism of generating new genetic material during molecular evolution276,277. In the laboratory, duplications are ideal experimental reagents that can be used as “rescue” fragments by complementing null or hypomorphic alleles of the same gene86,237.
To date, duplications have only been engineered using site-specific recombinases (Fig. 9B)239,271. Conveniently, they can be recovered as the reciprocal chromosomal rearrangement of deletions generated via site-specific recombination between two target recognition sequences located in trans. Precise duplications have been generated in D. melanogaster using the FLP/FRT recombination system239,271. As with deletions, duplication strategies are expected to combine readily with nuclease approaches: nucleases could be used to place recombination sites at chosen target loci in the genome, followed by inter-chromosomal recombination between those sites.
C. Inversions and Engineered Balancer Chromosomes
An inversion is yet another rearrangement, one in which a chromosome segment is reversed end to end. Inversions occur when a single chromosome undergoes breakage and rearrangement within itself. An inversion does not typically involve loss of genetic information but merely rearranges the linear gene sequence. Hence, inversions only produce phenotypic effects when DNA break points are located within genes or other important genetically encoded features, e.g., disrupting interactions between enhancer and promoter fragments277. Lowered fertility can be observed in individuals with a heterozygous inversion due to formation of abnormal germ cells, which result from recombination events between inverted and non-inverted chromatids.
A specialized inversion event uniquely seen in experimental biology is the balancer chromosome. Balancer chromosomes were first developed using imprecise strategies (i.e., irradiation) in D. melanogaster, and are invaluable genetic reagents278. Balancer chromosomes contain three critical features: (i) inversions that eliminate the propagation of meiotic recombination events, (ii) recessive lethal (or sterile) mutations that affect reproductive viability or fitness of homozygous animals, and (iii) a dominant marker that can be followed from one generation to another in heterozygotes278. These three features allow populations of flies carrying heterozygous mutations to be stably maintained without the need for constant screening278.
Precise inversions can be generated using site-specific recombination systems with recombinase recognition sites located in cis (Fig. 9C). This technique has produced defined inversions in D. melanogaster via FLP/FRT239. Interestingly, a recombinase strategy (i.e., Cre/LoxP) has been used to generate balancer chromosomes in mice279–281 (Fig. 9D). As described above for flies, these balancer chromosomes contain inversions as well as two engineered recessive lethal mutations and a dominant coat color marker. A similar genome engineering paradigm might be used to generate balancer chromosomes for unbalanced regions of the fly genome, or novel versions incorporating innovative features.
Precise inversions can additionally be generated by nuclease strategies but require some degree of homology at the cut sites. Consequently, such rearrangements have thus far only been created using a MN (e.g., I-SceI), which generated DSBs in two cis transposons, both containing an MN target recognition site and sharing homologous sequences65. In the absence of homology, nuclease-induced inversions catalyzed by TALENs and RGNs might be used but will contain indels as a result of NHEJ fusion of DSBs275.
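The cis outcomes of site-specific recombination discussed in this section can be illustrated with a toy model. In the sketch below a chromosome is a list of labeled blocks, and recombinase sites are written as ">" or "<" to indicate orientation. Two sites in direct orientation give a deletion (one site remains), whereas two sites in opposite orientation give an inversion; the inverted-orientation rule is general recombinase behavior assumed here rather than something spelled out in the text. The representation and names are hypothetical, and the orientation of individual genes inside the inverted block is not tracked.

```python
def recombine_cis(chromosome):
    """Recombine two recombinase sites on one chromosome (toy model).

    `chromosome` is a list such as ["g1", ">", "g2", "g3", ">", "g4"], where
    ">" and "<" mark recognition sites and their orientation. Returns
    (outcome, new_chromosome).
    """
    sites = [i for i, block in enumerate(chromosome) if block in (">", "<")]
    if len(sites) != 2:
        raise ValueError("exactly two recombinase sites are required")
    a, b = sites
    if chromosome[a] == chromosome[b]:
        # Direct orientation: the intervening segment is excised,
        # leaving a single site behind on the chromosome.
        return "deletion", chromosome[:a + 1] + chromosome[b + 1:]
    # Opposite orientation: the intervening segment is reversed in place.
    between = chromosome[a + 1:b]
    return "inversion", chromosome[:a + 1] + between[::-1] + chromosome[b:]

if __name__ == "__main__":
    print(recombine_cis(["g1", ">", "g2", "g3", ">", "g4"]))
    print(recombine_cis(["g1", ">", "g2", "g3", "<", "g4"]))
```

The same bookkeeping extended to two chromosomes would describe the trans outcomes (reciprocal deletion and duplication, or translocation), which are omitted here to keep the sketch short.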
D. Translocations and Mitotic Recombination
Classic chromosome translocations result from sequence exchange between non-homologous chromosomes282,283. Such translocations can be balanced (exchange occurs without loss or gain of information), or unbalanced (exchange results in deleted or duplicated genes). A translocation that occurs between two otherwise separate loci may produce abnormal gene fusions. Such catastrophic fusions are common in certain cancers282.
As with other chromosomal rearrangements, precise translocations have mainly been generated using site-specific recombination239 (Fig. 9E). Precise translocations require recombinase recognition sites that are located in trans, and have been created in D. melanogaster using FLP/FRT239. Nuclease strategies can also be used to generate precise translocations. However, as with inversions, the DSBs necessary for rearrangement have until now only been introduced in transposons, exploiting their shared homology to prevent NHEJ and indel formation65. Programmable nucleases have been used to generate translocations at defined sites, but only through NHEJ284–286. Such strategies remain to be explored in flies.
Specialized translocations can also occur between homologous chromosomes as a result of mitotic recombination. If a cell is heterozygous for a given gene mutation, mitotic recombination permits two genetically distinct daughter cells to be created following cell division: one cell will be homozygous wild type and one will be homozygous mutant. This phenomenon has been exploited in D. melanogaster using the FLP/FRT recombinase system in a technique known as mosaic analysis287 (Fig. 9F). In this paradigm, controlled FLP expression induces recombination between centromere-proximal FRT sites on homologous chromosomes. Any genes located distal to the FRT site are exchanged as part of this reaction. Hence, tissue- or temporal-specific recombinase expression results in homozygous mutant cells in an otherwise heterozygous animal. By incorporating sophisticated genetic gadgets, this technology allows differential labeling of both daughter cells and their descendants, which has been extensively illustrated in flies288–292. Currently, genes that are located between the available FRT sites and the centromere cannot be interrogated with this technology. Hence, extra effort will be required to generate novel FRT sites that are located as proximal as possible to the centromere, relative to the known genome sequence38.
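The positional constraint on mitotic recombination lends itself to a simple check. The sketch below splits a set of genes on one chromosome arm into those distal to an FRT site (which can be made homozygous in clones) and those between the FRT site and the centromere (which cannot). Gene names and distances are invented placeholders, with position expressed as distance from the centromere in arbitrary units.

```python
def split_by_frt(genes, frt_distance):
    """Partition genes on one arm by their position relative to an FRT site.

    `genes` maps a gene name to its distance from the centromere;
    `frt_distance` is the distance of the FRT site from the centromere.
    Returns (distal, proximal) as sorted lists of gene names.
    """
    distal = sorted(name for name, d in genes.items() if d > frt_distance)
    proximal = sorted(name for name, d in genes.items() if d <= frt_distance)
    return distal, proximal

if __name__ == "__main__":
    # Hypothetical arm: positions are placeholders, not real map data.
    arm = {"geneA": 0.5, "geneB": 2.0, "geneC": 7.5}
    distal, proximal = split_by_frt(arm, frt_distance=1.0)
    print("clonable (distal to FRT):", distal)
    print("not addressable (proximal to FRT):", proximal)
```

Moving the FRT site closer to the centromere (a smaller `frt_distance`) shrinks the second list, which is the motivation for generating more proximal FRT insertions.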
5.4 Precise Genomic Transgene Addition
Genome engineering also includes the addition of ectopic material to the fly genome219. For precise engineering, these sequences must be introduced by site-specific integration to avoid unwanted gene disruption, multi-insertion events, and position effect variegation, a phenomenon in which transgene expression varies due to differences in the surrounding genomic environment219. Site-specific integration is most easily achieved through integrase-based paradigms or nuclease-mediated transgene targeting since both of these methods produce stable integration events219,293. Integrase-based methods rely on the previous introduction of “docking” or “landing” sites into benign genomic locations that contain an appropriate attP recognition sequence, allowing subsequent site-specific integration of transgenes containing attB sites219. Integration can be achieved using “typical” single site-specific transgenesis, which is the more common method involving a single pair of integrase recognition sites97,98,100,102,103,294 (Fig. 10A), or by using two pairs of integrase recognition sites, better known as IMCE92,232 (Fig. 10B). Two pairs of orthogonally acting recombination sites may also be used, resulting in RMCE66,99 (Fig. 10C).
Figure 10. A. Site-specific integration using the ΦC31 integrase and a single set of attachment sites. MCS, multiple cloning site. B. Site-specific integration using the ΦC31 integrase and a double set of attachment sites resulting in integrase-mediated cassette exchange (IMCE). Integration can occur bidirectionally (i.e., in both orientations). C. Bidirectional recombinase-mediated cassette exchange (RMCE) using orthogonal LoxP or FRT sites.
Site-specific transgenesis can be done using both small and large plasmids. Small plasmids typically encompass expression modules (i.e., RNAi or open reading frames encoding endogenous cDNA or fluorescent markers), or small genomic fragments. Such plasmids can be easily generated and modified by in vitro cloning methods83,87–89,97,295,296. Conversely, large plasmids often include extensive genomic fragments84–86,103. Genome-wide transgene coverage for site-specific integration can be generated as genomic DNA libraries using fosmids or BACs as plasmid backbones84,85. The transgene vector size limit of fosmids is ~48 kb, defined by phage packaging constraints84, while the insert limit of BACs is primarily defined by limitations of DNA isolation and stability of the DNA insert in E. coli85,266. Genomic DNA libraries covering most of the genome are available for several fly species and other insects (https://bacpac.chori.org/)84,85,261,263–269. DNA libraries can be upgraded with elements required for site-specific transgenesis by plasmid retrofitting of fosmids and BACs103,264, or they can be generated in a plasmid backbone that is fully transgenesis compatible84,85. Moreover, recombineering technology allows easy modification of fosmid and BAC plasmids, allowing incorporation of deletions, protein tags, gene tags, as well as point mutations15,84,85,297.
Integrase strategies for targeted integration of large transgenes were pioneered in D. melanogaster using the ΦC31 integrase97,98. Interestingly, ΦC31 integrase-mediated site-specific integration can integrate very large fragments (i.e., 146 kb)85,86,103,298, even up to 220 kb (Unpublished data, KJTV), and has been used to generate a close to full BAC TransgeneOmic complement of two entire D. melanogaster chromosomes, the X chromosome86, and the 4th “dot” chromosome (Unpublished data, KJTV)299.
In contrast to integrase strategies, nuclease assisted site-specific integration is only in its infancy293. It would be attractive to combine both technologies by introducing attP sites in the genome using nuclease strategies followed by integration of very large DNA transgenes using integrase-directed site-specific integration.
5.5 Combining Gene Targeting and Transgene Addition
Due to the significant labor needed for successful gene targeting, several methods have been developed that combine an initial labor intensive gene targeting stage with a second relatively easy site-specific integration step. These methods have been primarily developed in D. melanogaster200,242,300–303 and include Site-specific Integrase-mediated Repeated Targeting (SIRT)301, Integrase-Mediated Approach for Gene Knockout (IMAGO)242, RMCE303, Genomic Engineering (GE)200, and In Situ Integration for Repeated Targeting (InSIRT)302.
SIRT was developed to facilitate downstream modification following ends-in gene targeting301 (Fig. 11A). In this method, ends-in targeting introduces an attP site for ΦC31, which is then used to integrate novel engineered alleles contained within an attB plasmid, allowing limitless manipulation of the target locus for structure/function analysis301. A derivative of the method was used to perform long-range targeted manipulation by inserting a novel attP site 70 kb from the initial attP site, making 140 kb of the genome surrounding any existing attP site available for manipulation304. SIRT and its derivatives are fairly labor intensive. Similar methods were developed to generate novel engineered alleles after ends-out gene targeting. One such method, known as GE or InSIRT, is based on regular ΦC31-mediated transgenesis followed by Cre reduction200,300,302 (Fig. 11B). These methods are less labor intensive than SIRT but still require substantial work. Alternatively, IMAGO, likely the easiest method, is founded on ΦC31-catalyzed IMCE242,303,305 (Fig. 11C). A derivative of IMAGO uses existing MiMIC insertions to integrate targeting donors directly at the acceptor locus. After donor release with an MN (I-SceI and/or I-CreI), homologous recombination occurs at both sides of the transposon insertion site, resulting in a targeted allele and cleanup of the original MiMIC transposon insertion251. Integrase-based engineering has also expanded to include the Bxb1 enzyme for subsequent genome manipulations90. These combined gene targeting/recombinase/integrase strategies have allowed the creation of knock-in90,242,300 and conditional knockout alleles61,242, small deletions200,302, point mutations200,300, deletions251, and insertions200,248,251,300. Each of these strategies can easily take advantage of the current nuclease revolution. Nuclease strategies can be used to integrate attP sites using gene targeting, followed by targeted insertion of modified transgenic constructs61,176,306.
Figure 11. A. Site-specific Integrase-mediated Repeated Targeting (SIRT) in the fly D. melanogaster. MA, middle homology arm. B. Genomic engineering (GE)/In situ integration for repeated targeting (InSIRT) in D. melanogaster. C. Integrase-mediated cassette exchange (IMCE)/Integrase-mediated approach for gene knockout (IMAGO) used in D. melanogaster.
5.6 Organelle Engineering
A specialized form of precision genome engineering in animal models involves the modification of mitochondrial DNA. Mitochondrial genomes are often less than 20 kb in length, yet their mutations can contribute to cellular dysfunction and disease307. Until recently, high copy number, random segregation, and heteroplasmy (i.e., variation in the DNA sequence of different mitochondria in the same cell or organism) confounded genetic analysis of these genomes. Moreover, experiments involving mitochondrial DNA editing are constrained by two technical challenges: the lack in mitochondria of the DNA repair pathways that are naturally present in the nucleus (i.e., NHEJ and HR), and the difficulty of delivering repair templates to that organelle. Nonetheless, researchers have been able to manipulate organellar DNA by targeting Type II restriction enzymes (e.g., XhoI) specifically to the mitochondria through addition of mitochondrial signal sequences10. Similarly, programmable nucleases (e.g., TALENs) have been targeted to mitochondria308, but this strategy has yet to be applied in vivo. However, due to the absence of repair pathways, targeted cutting results in depletion rather than repair of the targeted, unwanted genome10,308. Additionally, recombinase- and integrase-based manipulations have yet to be developed for the mitochondrial genome, although they have been extensively applied to chloroplasts, a plant-specific organelle that also has its own DNA309. Hence, numerous opportunities exist to devise ingenious ways to manipulate the mitochondrial genome.
6. Future Directions
Detailed genomic sequence annotation is available for most classical model systems (e.g., D. melanogaster)37,38. Using a plethora of in vitro and in vivo phenotypic screening paradigms, establishing the functional relevance of each of their genes is now within reach. Moreover, genomic sequences of many other insects are currently available310–313, and the pace at which additional species are sequenced continues to accelerate. Thus, the ability to design and conduct experiments that interrogate the genome at virtually any level in any organism is on the horizon. Indeed, any sequenced species can now be repurposed as a genetically modifiable model system314,315. Researchers can therefore select the organism best suited to their particular biological question rather than relying on more traditional models. Nonetheless, the availability of well-established model organisms (e.g., D. melanogaster) allows sophisticated genome engineering paradigms to be established that can then be extrapolated to more exotic choices.
For classic model organisms, extrapolation of a given method can quickly result in substantial resource collections that are generally geared toward one application. Even when upgrading capabilities are incorporated into the “master” allele collection, the location of the insertion site is defined by the experimental system (e.g., transposon jumping)220,230 or the experimenter’s choice (e.g., Cas9-guided insertion)61,176,228, and is therefore not customizable unless a new integration event is generated. Although such collections may rapidly advance scientific progress by allowing genome-wide interrogation of specific biological phenomena, they are useful only if they are easily accessible to every lab. In reality, such access is often problematic. The maintenance of, and accessibility to, such collections, especially genome-wide in vivo libraries, is nearly impossible. Moreover, generation of novel tools is currently so fast-paced that by the time a genome-wide in vivo collection has been generated, the tool has already been improved, replaced, or even challenged316,317, even when “upgrading” capabilities are included220,230. Hence, current trends in technology evolution should perhaps shift toward rapid and easy dissemination of highly specialized methods through detailed protocol papers (e.g., Nature Protocols, www.nature.com/nprot/), video articles (e.g., Journal of Visualized Experiments, http://www.jove.com/), specialized knowledge distribution networks (e.g., Insect Genetic Technologies Research Coordination Network, http://igtrcn.org/), or other means. Access to specialized methodology using the least expensive hardware would facilitate the generation of precisely customized alleles in any research lab, enabling highly specific experimental paradigms. Consequently, the scientific community would shift away from large in vivo genome-wide libraries that are generated for a custom purpose. Public repositories could then focus on maintaining and distributing a large variety of novel, highly flexible reagent tools rather than the very few, often highly customized, in vivo collections that currently dominate such repositories.
7. Conclusions
We have summarized the most common tools and methods used for genome engineering in fruit flies. Recombinases and integrases allow researchers to generate virtually any chromosomal rearrangement, and the growing number of available enzymes allows ever more sophisticated experimental paradigms to be designed. Although recombination recognition sequences were traditionally introduced by random transposition or laborious gene targeting paradigms, they can now be placed virtually anywhere using easily programmable nucleases. Moreover, RGNs allow researchers to incorporate multiple genome changes at once, i.e., in a multiplex fashion. Finally, nucleases also allow researchers to generate precisely engineered alleles through gene targeting using constructs with homology arms, PCR fragments, or even oligonucleotides. Since many of the paradigms discussed in this overview are still in the early stages of development, most if not all will be improved over time. We are witnessing a technological revolution at several levels that will not only facilitate biological research in fruit flies, but can also be extrapolated to other existing and emerging insect model organisms. Perhaps in the near future, we will witness the experimental manipulation of entire chromosomes and genomes of a multicellular eukaryotic organism. Let’s keep our fingers crossed that this will be the fruit fly Drosophila melanogaster.
Acknowledgements
We apologize to those whose work we did not cite due to focus and space limitations. The Venken team’s research is supported by startup funds kindly provided by Baylor College of Medicine, the Albert and Margaret Alkek Foundation, and the McNair Medical Institute, as well as grants from the March of Dimes Foundation (#1-FY14-315), the Cancer Prevention and Research Institute of Texas (R1313), and the National Institutes of Health (1R21HG006726, 1R21GM110190, and R01GM109938).
Abbreviations
| attB | ATTachment site used by Bacteria |
| attL | ATTachment site at the Left |
| attP | ATTachment site used by Phage |
| attR | ATTachment site at the Right |
| BAC | Bacterial Artificial Chromosome |
| Cas | CRISPR-ASsociated proteins |
| Cre | Causes REcombination |
| CRISPR | Clustered Regularly Interspaced Short Palindromic Repeats |
| DSB | Double Stranded Break |
| FLP | FLiPpase |
| FRT | FLP recombinase Recognition Target |
| GE | Genomic Engineering |
| HR | Homologous Recombination |
| IMAGO | Integrase-Mediated Approach for Gene KnockOut |
| IMCE | Integrase-Mediated Cassette Exchange |
| InSIRT | IN Situ Integration for Repeated Targeting |
| InSITE | Integrase Swappable In vivo Targeting Element |
| LoxP | LOcus of crossing (X) over, P1 |
| MARCM | Mosaic Analysis with a Repressible Cell Marker |
| MN | MegaNuclease |
| NHEJ | Non-Homologous End Joining |
| PAM | Protospacer Adjacent Motif |
| RGN | RNA-Guided Nuclease |
| RMCE | Recombinase-Mediated Cassette Exchange |
| RVD | Repeat Variable Diresidue |
| SIRT | Site-specific Integrase-mediated Repeated Targeting |
| TALE | Transcription Activator-Like Effector |
| TALEN | Transcription Activator-Like Effector Nuclease |
| ZF | Zinc Finger |
| ZFN | Zinc Finger Nuclease |