Proc Natl Acad Sci U S A. Mar 28, 2000; 97(7): 3718–3723.
Plant Biology

Random GFP::cDNA fusions enable visualization of subcellular structures in cells of Arabidopsis at a high frequency


We describe a general approach for identifying components of subcellular structures in a multicellular organism by exploiting the ability to generate thousands of independent transformants in Arabidopsis thaliana. A library of Arabidopsis cDNAs was constructed so that the cDNAs were inserted at the 3′ end of the green fluorescent protein (GFP) coding sequence. The library was introduced en masse into Arabidopsis by Agrobacterium-mediated transformation. Fluorescence imaging of 5,700 transgenic plants indicated that ≈2% of lines expressed a fusion protein with a different subcellular distribution than that of soluble GFP. About half of the markers identified were targeted to peroxisomes or other subcellular destinations by non-native coding sequence (i.e., out-of-frame cDNAs). This observation suggests that some targeting signals are of sufficiently low information content that they can be generated frequently by chance. The potential of the approach for identifying markers with unique dynamic processes is demonstrated by the identification of a GFP fusion protein that displays a cell-cycle regulated change in subcellular distribution. Our results indicate that screening GFP-fusion protein libraries is a useful approach for identifying and visualizing components of subcellular structures and their associated dynamics in higher plant cells.

Much of our current understanding of cellular structure has been derived from the use of methods that create static images or that obscure structure in individual cells, such as the analysis of fixed tissue specimens or fractionated cellular constituents. More representative kinds of information that allow cellular structures and dynamics to be observed in their native states can be obtained from observations of living cells. The use of green fluorescent protein (GFP) facilitates the construction of cytological markers for live cell biological studies as chimeric proteins composed of GFP and a protein of interest (1, 2). This approach can be highly successful but typically requires prior knowledge of protein localization or targeting signals (3).

We have explored a high-throughput approach to examining subcellular organization by producing large numbers of transgenic plants that express random GFP::cDNA fusions and observing them by using fluorescence microscopy. In principle, this strategy should facilitate the identification of large numbers of targeted GFP fusion proteins that can be used for the direct identification of novel features of subcellular organization. The ease with which Arabidopsis can be transformed by Agrobacterium facilitated the production of large numbers of independent transgenic lines. In addition, Arabidopsis has large cells that are readily amenable to microscopic observation of subcellular structure. To assess the utility of the approach, we have analyzed 5,700 transgenic plants to determine the frequency with which useful fusions can be isolated, the spectrum of structures marked, and the identity of the fusion protein sequences isolated.

The approach we have taken is similar, in principle, to a GFP marking strategy performed by using the fission yeast Schizosaccharomyces pombe (4). In that study, random GFP::genomic DNA fusions were prescreened for inducible cytotoxicity and then were examined microscopically for interesting localization features. In addition, during the course of this work, a similar study reported the use of GFP::cDNA fusion libraries to mark subcellular structure in mammalian tissue culture cells (5).

Our approach is complementary to these studies in that the use of a multicellular organism permits the identification of fusion proteins that exhibit differential localization in different cells, intracellular structures, or other localization dependent on multicellularity. Also, by exploiting the extensive genome sequence information for Arabidopsis, we have observed that many out-of-frame fusions result in GFP localization to various subcellular structures because of the low information content of some targeting signals. Overall, our results indicate that random GFP::cDNA fusion libraries can be used to efficiently mark a wide variety of subcellular structures and dynamic processes in plant cells. In addition to facilitating marker isolation, the approach is effective for exploring the existence of novel subcellular structures, domains, and dynamic processes in living cells.

Materials and Methods

Vector Construction.

pEGAD was constructed in three major steps: (i) A binary Ti plasmid with the Basta herbicide resistance locus was constructed by using a pBI121 derivative, pBIMC (Deanne Falcone, University of Kentucky), as a starting point. A SphI fragment containing the NPTII gene and the T-DNA right border was removed from pBIMC and was replaced with a PCR fragment containing the T-DNA right border and several unique cloning sites. A Basta herbicide resistance gene was isolated from pDHB321.1 (David Bouchez, Institut National de la Recherche Agronomique) as an EcoRI/ClaI fragment, was blunted with the Klenow fragment of DNA polymerase I, and was inserted into the nascent pEGAD precursor at a Klenow-blunted EcoRI site adjacent to the T-DNA left border. The remaining EcoRI and HindIII sites in this derivative were destroyed, yielding pBasta. (ii) A DNA fragment containing multicloning sites (MCS) and an (Ala)10 flexible linker was synthesized with BamHI and BspEI overhangs and was ligated to pEGFP-C1 (CLONTECH) digested with BspEI and BamHI. The (Ala)10-MCS synthetic linker was made by annealing the following synthetic oligonucleotides in vitro [(Ala)10MCS(+) 5′-CCGGAGCTGCGGCCGCTGCCGCTGCGGCAGCGGCCGAATTCCCCGGGCTCGAGAAGCTTG; (Ala)10MCS(−) 5′-GATCCAAGCTTCTCGAGCCCGGGGAATTCGGCCGCTGCCGCAGCGGCAGCGGCCGCAGCT]. (iii) The final vector, pEGAD, was constructed by subcloning the EGFP-(Ala)10MCS cassette into pBasta, driving GFP expression from a CaMV 35S promoter present in the parent plasmid pBI121. The deduced sequence of pEGAD was deposited in GenBank (accession no. AF218816). A diagram of the vector is presented in the supplemental data on the PNAS web site, www.pnas.org. The (Ala)10 linker was modeled after that used by Doyle and Botstein (6). The EGFP gene used in these studies contains the chromophore described by Cormack et al. (7), and the codon optimization described by Haas et al. (8) and Chiu et al. (9).
The expression plasmid and strains described here are available from the Arabidopsis Biological Resource Center.
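The design of the (Ala)10-MCS linker can be checked in silico. The following is a minimal sketch, not part of the original methods: it tests that the two oligonucleotides quoted above pair over their full length apart from 4-nt 5′ overhangs, and that the linker region reads as ten alanine codons. The function names and the assumed reading-frame offset (codons starting at the sixth base of the (+) strand, as would follow from an in-frame TCC GGA BspEI junction) are illustrative assumptions.

```python
# Oligonucleotides as quoted in the text (line-break hyphens removed).
ALA10_MCS_PLUS = "CCGGAGCTGCGGCCGCTGCCGCTGCGGCAGCGGCCGAATTCCCCGGGCTCGAGAAGCTTG"
ALA10_MCS_MINUS = "GATCCAAGCTTCTCGAGCCCGGGGAATTCGGCCGCTGCCGCAGCGGCAGCGGCCGCAGCT"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def annealed_duplex(plus, minus, overhang=4):
    """Return (plus 5' overhang, double-stranded core, minus 5' overhang).

    Raises ValueError if the two strands do not pair perfectly once the
    assumed 5' overhangs are set aside.
    """
    core_plus = plus[overhang:]
    if reverse_complement(minus[overhang:]) != core_plus:
        raise ValueError("strands are not complementary outside the overhangs")
    return plus[:overhang], core_plus, minus[:overhang]


def linker_codons(plus, first_codon_index=5, n_codons=10):
    """Split the linker into codons; the frame offset is an assumption."""
    end = first_codon_index + 3 * n_codons
    return [plus[i:i + 3] for i in range(first_codon_index, end, 3)]


left, duplex, right = annealed_duplex(ALA10_MCS_PLUS, ALA10_MCS_MINUS)
print(left, right, len(duplex))
# Every GCN codon encodes alanine.
print(all(codon.startswith("GC") for codon in linker_codons(ALA10_MCS_PLUS)))
```

Under these assumptions the overhangs come out as CCGG and GATC, matching the BspEI and BamHI ends described in the text, and the duplex contains the EcoRI and HindIII sites later used for cDNA cloning.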

cDNA Library Construction.

Two cDNA libraries were made from an equal-weight mixture of poly(A) mRNA isolated from 3-day-old etiolated seedlings, 3-day-old etiolated seedlings treated with 1-amino-cyclopropane-1-carboxylate (10 μM), and Petri plate-grown callus tissue. First strand cDNA was primed by using an equimolar ratio of two phosphorylated primers, (T)15 and (N)6TT, using 5 μg of mRNA with Superscript II reverse transcriptase (GIBCO/BRL). Reactions were performed according to the manufacturer's instructions with the exception that methyl-dCTP (Boehringer Mannheim) was used in place of dCTP. After first strand synthesis, reverse transcriptase was heat-inactivated and second strand cDNA was synthesized by the addition of RNase H, Escherichia coli DNA polymerase I, E. coli DNA ligase, β-NAD, and ammonium sulfate to the first strand reaction. cDNA was ligated to a phosphorylated linker made by annealing the synthetic oligonucleotide 5′ PO4 GCTTGAATTCAAGC. When ligated to the cDNA, this linker generates a unique HindIII site at the 3′ end of the cDNA, enabling directional cloning. Second strand cDNA was digested with EcoRI and HindIII and was gel purified. cDNA fragments 0.3–1.5 kb were ligated to pEGAD digested with EcoRI and HindIII. Ligations were transformed into ultracompetent XL10-Gold E. coli (Stratagene). Each library had a complexity of ≈40,000 clones.

Transformation of Libraries into Arabidopsis.

A modified version of in planta transformation was developed for Arabidopsis shotgun transformation. Agrobacterium tumefaciens (strain GV3101) was transformed with pEGAD cDNA libraries by electroporation, generating sufficient colonies to represent 3-fold library coverage. Bacteria were scraped off selective plates, were washed in infiltration medium, and were resuspended in infiltration medium to a final OD600 of 0.5. This plate amplification step was introduced to avoid misrepresentations of library diversity that could occur from growth in liquid culture. Arabidopsis plants were submerged in Agrobacterium solutions for 2–5 min. Use of plate-grown Agrobacterium for infections gave transformation rates similar to Agrobacterium grown in liquid culture; ≈0.5–4% of T1 plants were transgenic, and, of these, 0.1–1% expressed detectable levels of GFP. Three to five percent of the primary transgenics (T1) in these populations had a visible morphological phenotype, suggesting the populations could be used to identify phenotypes induced by dominant negative fusion proteins, cosuppression, or other mechanisms. Infiltration medium is 1× Murashige and Skoog salts, 5% sucrose, 0.02% Silwet detergent, and 10 μg/liter benzylaminopurine.
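The meaning of "3-fold library coverage" can be made concrete with a short calculation. This is a rough sketch under a simplifying assumption that is not stated in the text: every clone in the ≈40,000-clone library is equally abundant, so the number of colonies carrying a given clone is approximately Poisson distributed. The function names are illustrative.

```python
import math


def fraction_represented(library_size, colonies):
    """Expected fraction of distinct clones sampled at least once.

    Assumes equally abundant clones (Poisson approximation).
    """
    mean_copies = colonies / library_size
    return 1.0 - math.exp(-mean_copies)


def fold_coverage_for(target_fraction):
    """Fold coverage needed to reach a target represented fraction."""
    return -math.log(1.0 - target_fraction)


library_size = 40_000            # complexity reported for each library
colonies = 3 * library_size      # "3-fold library coverage"
print(f"{fraction_represented(library_size, colonies):.3f}")  # 0.950
print(f"{fold_coverage_for(0.99):.2f}")
```

Under this idealized model, 3-fold coverage captures about 95% of clones; real libraries have unequal clone abundance, so rare clones would be represented less often than this estimate suggests.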

Isolation and Microscopic Screening of GFP::cDNA Transgenic Seedlings.

Seeds from plants inoculated with Agrobacterium were germinated on agar-solidified media containing Murashige and Skoog salts (MS agar) and were screened for plants expressing GFP by using a Leica (Deerfield, IL) dissecting microscope equipped with a mercury lamp and epifluorescence filter set. GFP+ T1 seedlings were transferred onto microscope coverslips and were imaged on a Nikon inverted fluorescence microscope equipped with a Nikon 60× 1.2 numerical aperture water immersion objective and a Bio-Rad MRC 1024 confocal head. Plants possessing non-wild-type GFP distributions were transferred from coverslips to MS agar Petri plates to recover from the imaging process. After 1–2 weeks, seedlings were transferred to soil for seed production. Seed isolated from these lines was germinated and retested for marker phenotypes by using confocal microscopy.

Sequence Analysis of GFP::cDNA Fusions.

Primers derived from the pEGAD vector sequence [EGAD(+) GCGCGATCACATGGTCCT, EGAD(−) TCCTCGAGATCAGTTATCTAG] were used to amplify inserts from transgenic plants. Single insert lines produced a single major amplification product. Fusion sequences were determined by sequencing PCR products with a GFP primer [EGAD(seq+) CTCGGCATGGACGAGCTG] adjacent to the cDNA fusion junction and the EGAD(−) primer, which is adjacent to the 3′ cloning junction.

Whole-Mount Immunocytochemistry.

Whole-mount immunocytochemistry was performed as in the work by Boudronk et al. (10), with some modifications. Tissue samples were incubated overnight in a 1:50 dilution of primary antisera raised against cottonseed catalase [provided by Dick Trelease (University of Arizona)], Arabidopsis cofilin [Rose Biochemicals, (www.rosebiotech.com)], or no antibody. The primary antibody was detected by TRITC-conjugated secondary antibody (Vector Laboratories). Confocal images of GFP and Texas Red fluorescence were acquired sequentially, using identical instrument settings for experimental and control samples.

Results and Discussion

Libraries of plants expressing GFP::cDNA fusions were constructed in a two-step process. Initially, cDNA was synthesized from Arabidopsis poly(A)+ mRNA and was ligated directionally into a plant transformation vector (pEGAD) downstream of the gene for an enhanced GFP variant lacking a stop codon; approximately one-third of cDNA fusions to GFP are expected to be translated in their native coding frame using this approach. This cloning strategy allows for the isolation of protein domains and carboxy terminal sequences sufficient to cause GFP relocalization. pEGAD cDNA libraries were transformed en masse into A. tumefaciens, and these Agrobacterium populations were used to infect the Columbia wild type of Arabidopsis. The progeny of infected plants were subsequently screened by fluorescence microscopy to identify transgenic plants expressing GFP.

A pilot screen of 5,700 transgenic seedlings was performed to establish the frequency with which GFP::cDNA fusion proteins were directed to new subcellular locations. The GFP variant used in these studies normally localizes to the cytoplasm and nucleoplasm of Arabidopsis cells (Fig. 1 A and B). Conventional fluorescence microscopy was used to screen for seedlings differing from this wild-type pattern. When identified, these lines were subsequently imaged by using confocal microscopy for further characterization. Lines with altered GFP localization were rescued and retested for GFP localization patterns in the subsequent generation. More than 50 seedlings could be screened per hour, a high throughput facilitated by the ease with which Arabidopsis transgenics can be isolated and their small size, which allows 30–40 young seedlings to be mounted on a single microscope slide.

Figure 1
Canonical images of marker classes. (A and B) Wild type: Hypocotyl epidermal cells of transgenic seedlings expressing pEGAD GFP; nuclei and cytoplasmic strands are evident. Shown for comparison are a single confocal optical section (A) and a brightest ...

Of the 5,700 lines screened, 120 displayed heritable, non-wild-type GFP distribution patterns, representing a success rate of 1 per 45 plants screened. These markers were grouped into 17 phenotypic classes defining similar patterns of subcellular GFP localization, as visualized by confocal microscopy (Table 1). Fig. 1 shows canonical images for many of the phenotypic classes identified. Because it is conceivable that different subcellular compartments may be visually indistinguishable at the level of confocal imaging, descriptive aspects of microscopic phenotypes were frequently chosen to name phenotypic classes rather than inferred subcellular localizations. Localization patterns consistent with several major subcellular compartments or domains were identified: plasma membrane (Fig. 1C), sites of contact between neighboring cells (Fig. 1D), endoplasmic reticulum (Fig. 1E), vacuolar membrane (Fig. 1F), the nucleus (Fig. 1I), the nucleolus (Fig. 1J), condensed chromosomes (Fig. 1K), the peroxisome (Fig. 2A), and preferential retention in the cytosol (exclusion from the nucleoplasm) (Figs. 1H and 4A). In addition, several structures were marked that will require further characterization to establish their identity or authenticity as native subcellular structures (Fig. 1 G, L, and M).

Table 1
Marker classes identified by random GFP::cDNA screening
Figure 2
Dynamics of torus marker. Sequential images were acquired at 0.75-second intervals from a hypocotyl cell of EGAD line C2, which expresses an out-of-frame fusion protein. The time series shows a Torus structure adopting a tubular morphology (marked by ...
Figure 4
GFP::GF14 shows dynamic nuclear localization. (A) Confocal image of hypocotyl cells expressing GFP::GF14. GFP::GF14 localizes to the nuclei of cells undergoing cytokinesis (yellow arrows) but not to the nuclei of most interphase cells ...

In principle, the screening approach described here may be useful for identifying components of diverse subcellular compartments. Because the full genome sequence of Arabidopsis will soon be available (11), it will be possible to access the complete coding sequence of each fusion protein by simply sequencing the cDNA in the corresponding fusion constructs. However, the utility for this purpose depends on the frequency with which random GFP::cDNA fusions retain localization patterns faithful to their native states. To gain insight into this aspect of the method, we sequenced the cDNAs responsible for the localization patterns in 109 of the 120 marker lines. PCR amplification using primers that flank the pEGAD cDNA cloning site was used to isolate cDNAs from marker lines. Twenty-three lines contained more than a single PCR amplification product and were excluded from this analysis. Fifty-six of the eighty-six cDNAs from single insert transgenic lines were found to have significant homology to characterized Arabidopsis gene products by using blastn and blastx searches of GenBank. This set of 56 cDNAs with homology to characterized genes was used as a dataset to make inferences about the general molecular features of the GFP-fusion protein markers identified.

One generalization that can be made from this analysis is that GFP can be directed to many subcellular locations by fusion to non-native protein sequences that are created by out-of-frame translation. This is illustrated by markers in the “Torus” phenotypic class, where 13 of the 14 markers with homology to characterized genes were generated by out-of-frame cDNA fusions to GFP (Table 1). In the Torus lines, GFP is targeted to a torus-shaped structure ≈1 μm in diameter (Fig. 2). This organelle contains a central region lacking GFP fluorescence and is remarkably dynamic, frequently adopting transient tubular morphologies (Fig. 2; supplemental data). One Torus marker, J5, was created by an in-frame translational fusion to the carboxy terminal 30 amino acids of an Arabidopsis homolog of a Cucumis sativa protein purified from peroxisomes (12) and a related sequence from Brassica napus (GenBank accession no. AJ000886) (Table 4) (11). The carboxy terminal three amino acids of these three proteins and most of the Torus markers resemble the canonical tri-amino acid peroxisomal targeting sequence SKL* (Table 2) (13). In addition, one of the carboxy terminal sequences identified, SRL*, has been shown to be required for peroxisomal localization of a short-chain dehydrogenase/reductase in mammalian cells (14). Collectively, these observations suggest that the organelle tagged by the Torus class markers is the plant peroxisome. This conclusion is supported by immunolocalization experiments that show that catalase, a peroxisome resident, and GFP colocalize in one of the Torus marker lines (Fig. 3). Because some of the carboxy terminal tripeptide sequences in Table 2 are highly divergent from proposed peroxisomal targeting sequences (13, 14), sequences internal to the carboxy terminus may also act as peroxisomal targeting sequences (13).
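The notion that an SKL-like carboxy-terminal tripeptide could arise by chance can be illustrated with a small calculation. The sketch below is illustrative only: the degenerate pattern [SAC][KRH][LM] is a commonly quoted generalization of SKL that is assumed here and is not taken from Table 2, the function names are hypothetical, and the chance estimate assumes that all 20 amino acids are equally likely at each position.

```python
import math
import re

# Assumed PTS1-like consensus (a generalization of SKL); not from this paper.
PTS1_LIKE = re.compile(r"[SAC][KRH][LM]$")


def has_pts1_like_terminus(protein):
    """True if the last three residues match the assumed consensus.

    A trailing '*' (stop symbol, as in 'SKL*') is ignored.
    """
    return bool(PTS1_LIKE.search(protein.rstrip("*")))


def chance_probability(allowed_per_position, alphabet=20):
    """Probability that a random peptide matches, assuming uniform residues."""
    p = 1.0
    for n_allowed in allowed_per_position:
        p *= n_allowed / alphabet
    return p


p_match = chance_probability([3, 3, 2])   # 18 / 8000 = 0.00225
bits = -math.log2(p_match)                # information content of the motif

print(has_pts1_like_terminus("MAAASKL*"), has_pts1_like_terminus("MAAAGGG"))
print(f"{p_match:.5f}", f"{bits:.1f} bits")
```

A motif of under 9 bits is short enough that a library of tens of thousands of random reading frames would be expected to produce it repeatedly, which is consistent in spirit with the frequent recovery of out-of-frame Torus markers; the actual frequency depends on codon composition and on how often an out-of-frame peptide terminates immediately after such a tripeptide.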

Table 2
Carboxyterminal sequence of Torus markers
Figure 3
Torus structures react with an antibody directed against a peroxisomal protein. Shown are confocal images of whole-mount Arabidopsis seedling tissue probed with anticatalase serum (catalase, upper row) and serum from a control rabbit (serum control, lower ...

The high frequency with which non-native coding sequences direct GFP to this organelle suggests that some of its targeting sequences are degenerate in that they can be generated at a high frequency by chance. Interestingly, several other subcellular destinations were also labeled by non-native protein fusions (Table 1), suggesting the possibility that minimal targeting sequences for some other cellular destinations may also be of low information content. A larger collection of out-of-frame fusions will be required to explore further the question of chance targeting to nonperoxisomal destinations in Arabidopsis. It was previously reported that, in Saccharomyces cerevisiae, replacement of the signal sequence of invertase with random peptides led to correct targeting in ≈20% of the cases (15). Although our analysis suggests that a high percentage of GFP::cDNA fusion proteins display artifactual localization, they can frequently be recognized by comparing the reading frame of the fusion protein to the reading frame deduced from the results of the Arabidopsis genome sequencing project.
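The reading-frame comparison described above can be sketched computationally. This is a simplified illustration rather than the procedure used in this study: it assumes the GFP reading frame enters the cDNA insert at its first base (a real junction also contains linker bases), that sequences are uppercase A/C/G/T only, and that the annotated protein is known. The toy sequences and function names are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate from the first base, stopping at the first stop codon."""
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


def is_native_frame(insert_dna, annotated_protein, min_len=4):
    """True if the insert, read in the GFP frame, encodes part of the native protein."""
    fused = translate(insert_dna)
    return len(fused) >= min_len and fused in annotated_protein


# Toy example: a hypothetical coding sequence and three possible cDNA start points.
toy_cds = "ATGGCTTCTAAACTGTAA"
toy_protein = translate(toy_cds)                      # "MASKL"
for start in (3, 4, 5):
    print(start, is_native_frame(toy_cds[start:], toy_protein))
```

In the toy example only one of the three consecutive start points preserves the native frame, which mirrors the expectation that about one-third of random fusions are in frame; the other two produce unrelated or prematurely terminated peptides.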

Excluding the Torus class, 29 of 42 markers were caused by in-frame fusion proteins. Where sequence homology was available, the general trend among this class was that in-frame markers displayed localization patterns consistent with published data or suggested by sequence homology (Table 3). For example, some members of the cell surface group are homologous to water channels (PIPs) purified from plasma membranes (16). Members of the vacuolar membrane phenotypic class are homologous to proteins experimentally localized to vacuolar membranes (17–19), and members of the nuclear group are similar to described nuclear proteins (20) (Table 3). These observations imply that random screening can be used to isolate both markers of structures and components native to those structures. Ultimately, an accurate estimate of the percentage of faithfully localized proteins will require detailed analyses of many additional lines.

Table 3
In-frame fusion proteins identified as markers using random GFP::cDNA fusion screening

Sufficiently large collections of markers generated by this and future screens could be used to help extract protein targeting information by searching for peptide motifs shared by similar markers. The limited collection of markers isolated in this initial study allowed identification of sequence similarity shared by the majority of torus markers, the largest class we recovered. We have also noticed sequence similarities in other marker classes. For example, some of the nucleolar tags are short peptides rich in arginine and lysine (results not presented). The diversity of markers that can be identified with the method explored in this paper should allow similar analyses to be performed for a wide variety of subcellular locations.

Perhaps the greatest utility of the approach described here is its ability to facilitate the identification of novel features of subcellular structure and dynamic processes. Although our collection of markers represents a small percentage of the localization classes one might expect to find by microscopic screening, they have revealed many intriguing aspects of plant cell biology and have been useful for exploring aspects of plant subcellular dynamics. A number of examples illustrating this point are provided as supplemental data. One interesting observation we have made is that plant cells expressing a fusion between GFP and the carboxy terminus of GF14, a 14-3-3 protein, show differential accumulation in the cell nucleus. This pattern of localization is regulated in part by cell cycle state. The fusion protein is excluded from most nuclei; however, it accumulates in nuclei just after completion of nuclear division and again departs from the nuclei shortly before cytokinesis is complete (Fig. 4; D.W.E. and S.R.C., unpublished work). The function of this redistribution in the context of cytokinesis remains to be determined; however, its identification suggests the feasibility of future screens aimed directly at identifying proteins with regulated changes in subcellular distribution. Directed screens for these kinds of markers may be particularly valuable because changes in subcellular distribution are a frequent form of regulation in many signaling events.

Collectively, our results demonstrate that random GFP::cDNA fusions efficiently generate novel in vivo subcellular tags for Arabidopsis. This approach should be applicable to other organisms in which large scale transformation is possible. The spectrum of the method could be enhanced by using normalized cDNA or constructing libraries of amino terminal fusions to GFP, modifications that could allow rare markers or markers requiring amino terminal targeting information to be identified. The expression of random localization tags in living cells may enable screens for proteins on the basis of predefined dynamic properties such as the redistribution of markers in response to signals like wounding, infection, or phytohormones. Kinetic images, technical details, and additional information are available in the supplemental data and at our web site (deepgreen.stanford.edu).


We thank Stephan Schmidheiny for the generous gift of the Leica dissecting microscope that made this work possible. We thank Marie-Theres Hauser for advice on immunocytochemistry and Farhah Assaad, Wolfgang Lukowitz, Joe Ogas, Stewart Gillmor, and Seung Yung Rhee for valuable discussion during the course of this work. We also thank Dick Trelease for the catalase antibody, Deanne Falcone for pBIMC, and David Bouchez for pDHB321.1. S.R.C. was partially supported by a U.S. Department of Energy/National Science Foundation/U.S. Department of Agriculture TRI-Agency Training Grant. This work was supported in part by a grant (DE-FG02-97ER20133) from the U.S. Department of Energy Biological Energy Research Program.


Abbreviations: GFP, green fluorescent protein; MCS, multicloning site; TBS, Tris-buffered saline; ER, endoplasmic reticulum.


Data deposition: The sequence reported in this paper has been deposited in the GenBank database (accession no. AF218816).


1. Chalfie M, Tu Y, Euskirchen G, Ward W W, Prasher D C. Science. 1994;263:802–805.
2. Tsien R Y. Annu Rev Biochem. 1998;67:509–544.
3. Cubitt A B, Heim R, Adams S R, Boyd A E, Gross L A, Tsien R Y. Trends Biochem Sci. 1995;20:448–455.
4. Sawin K E, Nurse P. Proc Natl Acad Sci USA. 1996;93:15146–15151.
5. Rolls M M, Stein P A, Taylor S S, Ha E, McKeon F, Rapoport T A. J Cell Biol. 1999;146:29–43.
6. Doyle T, Botstein D. Proc Natl Acad Sci USA. 1996;93:3886–3891.
7. Cormack B P, Valdivia R H, Falkow S. Gene. 1996;173:33–38.
8. Haas J, Park E C, Seed B. Curr Biol. 1996;6:315–324.
9. Chiu W L, Niwa Y, Zeng W, Hirano T, Kobayashi H, Sheen J. Curr Biol. 1996;6:325–330.
10. Boudronk K, Dolan L, Shaw P J. J Cell Sci. 1998;111:3687–3694.
11. Somerville C, Somerville S. Science. 1999;285:380–383.
12. Preisigmuller R, Guhnemannschafer K, Kindl H. J Biol Chem. 1994;269:20475–20481.
13. Subramani S. Annu Rev Cell Biol. 1993;9:445–478.
14. Fransen M, VanVeldhoven P P, Subramani S. Biochem J. 1999;340:561–568.
15. Kaiser C A, Preuss D, Grisafi P, Botstein D. Science. 1987;235:312–317.
16. Kammerloher W, Fischer U, Piechottka G P, Schaffner A R. Plant J. 1994;6:187–199.
17. Johnson K D, Hofte H, Chrispeels M J. Plant Cell. 1990;2:525–532.
18. Hofte H, Hubbard L, Reizer J, Ludevid D, Herman E M, Chrispeels M J. Plant Physiol. 1992;99:561–570.
19. Chrispeels M J, Agre P. Trends Biochem Sci. 1994;19:421–425.
20. Kobayashi K, Kanno S, Smit R, Vanderhorst G T J, Takao M, Yasui A. Nucleic Acids Res. 1998;26:5086–5092.
21. Haseloff J, Siemering K R, Prasher D C, Hodge S. Proc Natl Acad Sci USA. 1997;94:2122–2127.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences