Proc Natl Acad Sci U S A. Mar 19, 2002; 99(6): 3701–3705.
Published online Feb 19, 2002. doi:  10.1073/pnas.042700299
PMCID: PMC122587
From the Cover

A spliceosomal intron in Giardia lamblia


Short introns occur in numerous protist lineages, but there are no reports of intervening sequences in the protists Giardia lamblia and Trichomonas vaginalis, which may represent the deepest known branches in the eukaryotic line of descent. We have discovered a 35-bp spliceosomal intron in a gene encoding a putative [2Fe-2S] ferredoxin of G. lamblia. The Giardia intron contains a canonical splice site at its 3′ end (AG), a noncanonical splice site at its 5′ end (CT), and a branch point sequence that fits the yeast consensus sequence of TACTAAC except for the first nucleotide (AACTAAC). We have also identified several G. lamblia genes with spliceosomal peptides, including homologues of eukaryote-specific spliceosomal peptides (Prp8 and Prp11), several DExH-box RNA-helicases that have homologues in eubacteria, but serve essential functions in the splicing of introns in eukaryotes, and 11 predicted archaebacteria-like Sm and like-Sm core peptides, which coat small nuclear RNAs. Phylogenetic analyses show the Giardia Sm core peptides are the products of multiple, ancestral gene duplications followed by divergence, but they retain strong similarity to Sm and like-Sm peptides of other eukaryotes. Although we have documented only a single intron in Giardia, it likely has other introns and fully functional, spliceosomal machinery. If introns were added during eukaryotic evolution (the introns-late hypothesis), then these results push back the date of this event before the branching of G. lamblia.

The origin of eukaryotic introns has been debated since the discovery of spliceosomal introns nearly 25 years ago (1). The introns-early hypothesis suggests protein-encoding genes developed in the last universal common ancestor (LUCA) through the coalescence of minigenes, which are now represented by exons (exon theory of genes) (2, 3). Introns were used to shuffle small exons together to make more complex genes. In support of the introns-early hypothesis, some intron/exon borders statistically correlate with the ends of secondary structural elements of proteins (e.g., α-helices and β-sheets) (4). The introns-late hypothesis suggests introns were absent from LUCA, but arose in eukaryotes when spliceosomal enzymes inserted DNA sequences into continuous protein-coding regions (5, 6). In support of the introns-late hypothesis, alignments of homologous genes reveal a greater diversity of intron positions than might be possible in a single ancestral gene, and most introns appear to be phylogenetically restricted and therefore recent (7–9). Protists that we hypothesize to represent the deepest branches in eukaryotic phylogenies, e.g., Giardia lamblia and Trichomonas vaginalis, appear to have no introns, whereas moderately deep branches, e.g., Entamoeba histolytica and Euglena gracilis, have few introns (10–14).

Spliceosomes, which remove introns from pre-mRNAs in the nucleus, are small nuclear RNA-protein complexes (snRNPs) composed of small nuclear RNAs (snRNAs) and nearly 100 associated peptides that are conserved across evolution (1, 15–18). Seven Sm core peptides form hetero-oligomeric rings around snRNAs, which resemble the homo-oligomeric rings around archaeal RNAs (19–21). In addition, some Sm core peptides (SmB, SmD1, and SmD3) have a C-terminal tail composed of basic residues, which appear to be important for binding the snRNAs and for reimportation of the snRNPs into the nucleus (22, 23). Other spliceosomal peptides, which are named Prp for conditional yeast mutants affecting precursor RNA processing, have eubacterial homologues (e.g., DExH-box RNA helicases) (24, 25). Single DExH-box RNA-dependent helicases (e.g., Prp2p, Prp16p, Prp22p, and Prp43p) are involved in catalyzing the splicing reaction or in releasing spliced products. Brr2p, which is a fusion protein composed of two DExH-box RNA helicases, is involved in the unwinding of U4/U6 (26). Eukaryote-specific spliceosomal peptides include Prp8p, which has been used to argue for the presence of spliceosomal introns in T. vaginalis (27, 28).

In this study we identified a 35-bp intron in a gene encoding a putative [2Fe-2S] ferredoxin of G. lamblia, which is a deep-branching eukaryote that causes diarrhea (13, 14, 29–31). We also identified G. lamblia genes encoding spliceosomal peptides, which are eukaryote-specific (Prp8 and Prp11) or have homologues in eubacteria (DExH-box RNA helicases) and archaea [Sm and like-Sm (Lsm) core peptides] (16–28, 32, 33).

Materials and Methods

Identification of the Intron in the [2Fe-2S] Ferredoxin Gene of G. lamblia.

The trichomonad hydrogenosomal ferredoxin used as a query in a tblastn search identified a putative [2Fe-2S] ferredoxin gene from >54,000 shotgun sequences in the G. lamblia genome project single-pass read database (14, 30, 31, 34). Because this G. lamblia gene had an in-frame stop codon, primers were designed to test the possibility that an intron was removed from mRNAs. The sense primer was 5′-GAGACTCCATGTCTCTAC-3′, and the anti-sense primer was 5′-CGCTTGCAAGAATGTCAC-3′. PCR was performed from genomic DNA, whereas reverse transcription (RT)-PCR was performed from mRNAs extracted with CsCl. RT-PCR and PCR products with [2Fe-2S] ferredoxin primers were purified from agarose gels, cloned into TA vectors, and sequenced by dideoxynucleotide methods. The predicted peptide from the spliced product was compared with protein sequences in GenBank by using blastp and compared with the Conserved Domain Database at GenBank by using rps-blast (34).
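The reasoning in this step (an in-frame stop codon in the genomic sequence that disappears once a short segment is spliced out) can be illustrated with a minimal Python sketch. It is not the procedure used in the study, which relied on RT-PCR, cloning, and sequencing. The function names and the toy sequences below are invented for illustration and are not the published ferredoxin sequences.

```python
# Toy illustration only: EXON1, TOY_INTRON, and EXON2 are invented sequences,
# not the G. lamblia ferredoxin gene.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_inframe_stop(seq, frame=0):
    """Return the index of the first stop codon in the given frame, or None."""
    seq = seq.upper()
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None

def candidate_introns(genomic, cdna):
    """Return every (start, end) span whose removal from genomic yields cdna.

    Assumes at most one contiguous segment was removed. More than one span
    can be returned when the bases flanking the junction are repeated.
    """
    genomic, cdna = genomic.upper(), cdna.upper()
    gap = len(genomic) - len(cdna)
    if gap <= 0:
        return []
    return [
        (start, start + gap)
        for start in range(len(cdna) + 1)
        if genomic[:start] + genomic[start + gap:] == cdna
    ]

EXON1 = "ATGGCTGAC"
TOY_INTRON = "CTATGATTAACTAACTTTAG"   # hypothetical 20-nt intron
EXON2 = "GGTTGCTAA"
TOY_GENOMIC = EXON1 + TOY_INTRON + EXON2
TOY_CDNA = EXON1 + EXON2
```

With these toy inputs, `first_inframe_stop(TOY_GENOMIC)` falls inside the invented intron, whereas in `TOY_CDNA` the first stop is the terminal codon, and `candidate_introns(TOY_GENOMIC, TOY_CDNA)` recovers the removed span.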

Identification of G. lamblia and E. histolytica Genes Encoding Putative Spliceosomal Peptides.

G. lamblia shotgun sequences deposited in the High Throughput Genomic Sequences database at the National Center for Biotechnology Information (NCBI) were searched by using tblastn and Saccharomyces cerevisiae spliceosomal peptides (Brr2p, Prp2p, Prp8p, Prp11p, Prp16p, Prp22p, Prp43p, SmB, SmD1, SmD2, SmD3, SmE, SmF, SmG, and Lsm1 to Lsm8). These query sequences were obtained through the NCBI's Entrez server (15–28, 32–34). Shotgun sequences were assembled into contigs by using blastn (34). As a control, E. histolytica shotgun genomic DNA sequences deposited in the Genome Survey Sequences database at the NCBI were searched by using the same set of S. cerevisiae spliceosomal peptides (10, 11, 34, 35).

Phylogenetic Analyses of Sm Core Peptides.

The predicted G. lamblia and E. histolytica Sm core peptides were aligned to other eukaryotic Sm core peptides by using clustalw, and regions of ambiguous alignment were removed by using the seqlab program (Genetics Computer Group, Madison, WI) (36). Phylogenetic relationships were inferred by using distance and parsimony methods by the computer programs tree-puzzle and the phylip package (37, 38). Pairwise distances were computed by using tree-puzzle under the Dayhoff model, with the inclusion of estimated amino acid frequencies, estimated proportion of invariant sites, and estimation of among-site variation for the remaining sites according to a gamma distribution (Γ) (eight gamma rate categories) (39). The optimal tree was inferred by using the Fitch–Margoliash algorithm with global rearrangements and 100 random-addition replicates (40). Bootstrap values were obtained by using 100 resampled datasets with puzzleboot (www.tree-puzzle.de/#puzzleboot) and phylip's protpars program for parsimony (37).
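The bootstrap step rests on resampling alignment columns with replacement and rebuilding a tree from each pseudo-alignment. The Python sketch below shows only the resampling idea. It is not the puzzleboot or phylip implementation, and the function names, the dictionary layout, and the fixed random seed are assumptions made for illustration.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample the columns of an alignment with replacement.

    alignment maps sequence names to equal-length aligned strings.
    The same sampled columns are applied to every sequence.
    """
    names = list(alignment)
    if not names:
        return {}
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(alignment[name][c] for c in columns)
        for name in names
    }

def bootstrap_replicates(alignment, n=100, seed=0):
    """Return n resampled pseudo-alignments (100 mirrors the text above)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n)]
```

Each replicate would then be passed to a distance or parsimony program, and the support for a clade is the fraction of replicate trees that contain it.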

Results and Discussion

The G. lamblia [2Fe-2S] Ferredoxin Gene Contains a 35-bp Intron.

We discovered the first example of an intron in the G. lamblia genome from blast analyses of >54,000 single-pass shotgun sequences that identified a putative hydrogenosome-like [2Fe-2S] ferredoxin gene (30, 31). Inspection of sequence alignments in the blast report revealed the occurrence of an in-frame stop codon in the Giardia hydrogenosome-like [2Fe-2S] ferredoxin-coding region. We designed primers to test the possibility that the in-frame stop codon resided within an intron. RT-PCR products from Giardia mRNA, with primers spanning the coding region of the [2Fe-2S] ferredoxin gene, were composed of two bands, the larger of which matched the sequences of the PCR product from G. lamblia genomic DNA and the shotgun sequences (Fig. 1A). In contrast, the smaller RT-PCR product lacked the 35-bp intron from the [2Fe-2S] ferredoxin mRNA. The Giardia intron contained a canonical splice site at its 3′ end (AG), a noncanonical splice site at its 5′ end (CT), and a branch point sequence that fits the yeast consensus sequence of TACTAAC except for the first nucleotide (AACTAAC) (1). In contrast, there was no polypyrimidine tract upstream of the 3′ AG. The resulting [2Fe-2S] ferredoxin ORF encoded a 133-aa protein, which contained four conserved Cys residues that form the iron–sulfur binding site (Fig. 1B) (30, 31). No G. lamblia intron has been identified previously (14, 29).
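The intron features described here (terminal dinucleotides and a branch point that differs from the yeast TACTAAC consensus at one position) can be checked mechanically. The Python sketch below is one way to do so. The 35-nt sequence is an invented toy that merely shares the reported CT/AG ends and the AACTAAC motif, not the published intron, and the one-mismatch threshold is an assumption.

```python
YEAST_BRANCH_CONSENSUS = "TACTAAC"

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def describe_intron(intron, consensus=YEAST_BRANCH_CONSENSUS, max_mismatch=1):
    """Summarize splice-site dinucleotides and the best branch-point match."""
    intron = intron.upper()
    k = len(consensus)
    best = None  # (position, mismatches, motif); first minimum wins
    for i in range(len(intron) - k + 1):
        window = intron[i:i + k]
        d = hamming(window, consensus)
        if best is None or d < best[1]:
            best = (i, d, window)
    if best is not None and best[1] > max_mismatch:
        best = None
    return {
        "length": len(intron),
        "five_prime": intron[:2],
        "three_prime": intron[-2:],
        "canonical_5": intron[:2] == "GT",
        "canonical_3": intron[-2:] == "AG",
        "branch_point": best,
    }

# Hypothetical 35-nt toy intron: CT...AACTAAC...AG (not the real sequence).
TOY_INTRON = "CT" + "ATGTTTGATTCC" + "AACTAAC" + "ACCGTTTAAACA" + "AG"
```

For the toy sequence, `describe_intron(TOY_INTRON)` reports a noncanonical 5′ end, a canonical 3′ end, and a branch-point candidate one mismatch away from the consensus.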

Figure 1
Demonstration of an intron in the G. lamblia [2Fe-2S] ferredoxin gene by RT-PCR and the sequencing of the RT-PCR product. (A) Lane 1, PCR with G. lamblia RNA with no RT reaction (negative control) using primers flanking the [2Fe-2S] ...

The G. lamblia Genome Predicts Spliceosomal Peptides That Are Eukaryote-Specific or Are Homologous to Eubacterial Peptides.

The discovery of an intron in the Giardia hydrogenosome-like [2Fe-2S] ferredoxin prompted a search for G. lamblia genes encoding spliceosomal peptides that work in concert to remove introns from mRNAs (15–33). We identified a putative G. lamblia Prp8p, which is a eukaryote-specific spliceosomal peptide used to argue for the presence of introns in trichomonads (27, 28). The predicted G. lamblia Prp8p showed a 27% identity with the S. cerevisiae Prp8p. The predicted G. lamblia Prp11p, which is another eukaryote-specific spliceosomal peptide, showed a 30% identity with the S. cerevisiae Prp11p (32). Searches of the G. lamblia shotgun sequences identified six DExH-box RNA helicases (the expected number) that are present in spliceosomes and have homologues in ribosomes of eukaryotes and eubacteria (24–26). Two predicted G. lamblia Brr2p, which each contain two DExH-box RNA helicases fused head-to-tail with each other, showed ≈30% identity with S. cerevisiae Brr2p (26). Four predicted G. lamblia DExH-box helicases, each of which contained a single helicase domain, showed 25–31% identity with the most similar S. cerevisiae Prp2p, Prp16p, Prp22p, or Prp43p (24, 25). Because the predicted G. lamblia DExH-box RNA helicases were relatively divergent from those of other eukaryotes, it was not possible to use phylogenetic methods to infer the identity of each Giardia helicase. In contrast, the G. lamblia Sm core peptides, which appear to result from extensive duplication and divergence of archaeal genes, retain sufficient phylogenetic signal for inferring evolutionary history.
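Percent identity values such as those quoted above are computed from a pairwise alignment. The Python sketch below shows one simple convention, in which columns containing a gap in either sequence are excluded from the denominator. Conventions differ on this point, so the value need not match the figure reported by a given search program, and the function name and gap character are assumptions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over ungapped alignment columns.

    Both inputs must already be aligned and therefore equal in length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != gap and y != gap
    ]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)
```

For example, the toy pair `"MKT-AL"` and `"MRTGAL"` has five ungapped columns, four of which are identical.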

G. lamblia Sm Spliceosomal Peptides Resemble Those of Other Eukaryotes.

Sm and Lsm peptides form heptameric rings, which surround spliceosomal snRNAs and mRNAs (19–23, 33). The shotgun sequences of G. lamblia contained 11 putative Sm or Lsm peptides, whereas the E. histolytica sequences (positive control) contained 10 putative Sm/Lsm peptides (of 14 theoretical peptides total) (10, 11, 14, 32, 35). Six putative G. lamblia Sm peptides (all except SmG or SmE) that are aligned with those of S. cerevisiae in Fig. 2A contained the conserved Sm domain (pfam01423) (19–21, 36). In addition, G. lamblia SmB, SmD1, and SmD3 peptides had C-terminal tails with numerous basic amino acids, which are important for binding snRNA and targeting snRNP complexes from the cytosol to the nucleus (22, 23). Phylogenetic analyses, which included Sm peptides of G. lamblia, E. histolytica, S. cerevisiae, Trypanosoma brucei, Drosophila melanogaster, and Homo sapiens, supported our tentative identification of G. lamblia Sm peptides with the exception of a putative Sm peptide that could correspond to either GlSmE or GlSmG. This Sm peptide failed to branch with any particular clade (Fig. 2B) (37–40). Classifications of G. lamblia SmD2, SmD3, and SmF were supported by bootstrap analyses, as were placements of most of the amoebic and trypanosome Sm peptides (20, 37). These results suggest that Sm peptides were reasonably well-developed at the time that Giardia and other protists branched from the main eukaryotic trunk (13, 19–21). In the same way, chaperonin peptides, which result from duplications and divergence of heat-shock protein genes called CCT, are well developed in G. lamblia (41).

Figure 2
(A) Alignment of Sm peptides of G. lamblia (Gl) with those of S. cerevisiae (Sc). Green marks where the sequences are strongly similar, whereas red indicates weaker similarity. The basic residues at the C termini of SmB, SmD1 and SmD3 peptides are marked ...

Origin of Introns in Eukaryotes.

The debate concerning the origins of spliceosomal introns has focused on which organisms have introns and what the introns look like (1–9). The occurrence of introns in Giardia and its basal position in molecular evolution studies, e.g., phylogenetic analyses of rRNA genes (42), translation initiation factors (43), and elongation factors (44), suggest that spliceosomal machinery evolved very early, possibly in the last common eukaryotic ancestor (5–9, 13). But alternative interpretations of molecular trees argue that deep-branching protists are merely artifacts of “long branch attraction” between rapidly evolving basal eukaryotic branches and the distantly related archaeal and bacterial outgroups (45, 46). This has led to the “Big Bang hypothesis,” which states that all extant eukaryotes are descendants of a sudden evolutionary radiation that occurred ≈1,000 million years ago (Mya). This interpretation is not without problems. The paleontological record offers evidence of 1,800- to 2,100-Mya eukaryotic microfossils (47, 48) and 2,700-Mya Archaean molecular signatures in the form of steranes that are attributed to eukaryotes (49). To explain the disparity between the Big Bang hypothesis and the paleontological record, Philippe et al. (46) hypothesize that only a single early eukaryotic lineage survived extinction events that preceded the ≈1,000-Mya evolutionary radiation of plants, animals, fungi, and all other protists. If this scenario is correct, the thread of life leading to contemporary eukaryotes is incredibly thin and difficult to explain. In molecular phylogenies, long, unbroken basal branches are characteristic of extinction events. This phenomenon is well known for mammals, birds, and angiosperms, but has not been documented for protists. Eukaryotic microbial communities consist of associations of thousands of species separated by large evolutionary distances relative to that of angiosperms, birds, or mammals. Their mutual extinction would require global events encompassing an enormous variety of the earth's environments and ecological diversity. Finally, analyses of protein clocks do not support the Big Bang hypothesis. Comparisons of 57 sets of amino acid sequences suggest a 2,000- to 3,500-Mya divergence between eukaryotes and prokaryotes followed by the ≈1,500-Mya divergence of protists and the ≈1,200-Mya separation of plants from animals plus fungi (50, 51).

Our discovery of an intron in the Giardia putative [2Fe-2S] ferredoxin and the occurrence of proteins involved in spliceosomal machinery cannot resolve which of these theories about eukaryotic evolutionary history is correct. However, it would appear that spliceosomal machinery was present in the last common ancestor to extant eukaryotes. To date, only a single intron has been identified and demonstrated from a G. lamblia shotgun sequence database that includes ≈5,800 ORFs (G. Olsen, H.G.M., and M.L.S., unpublished data). Because it is unlikely that Giardia has carried all of the spliceosomal peptide machinery required for processing a single intron, we expect to find other examples of introns in the G. lamblia genome. Although fragmentary and preliminary, our results suggest that the spliceosomal machinery is conserved across eukaryotes and may, in some cases (e.g., Sm core peptides), be used to study the evolution of introns (15–28). Because spliceosomal peptides are encoded by genes that derive from eubacteria and archaebacteria and by genes that are eukaryote-specific, it is unlikely that all of these genes were present in prokaryotic ancestors and subsequently lost, as suggested by the introns-early hypothesis (2, 3).


This work was supported by National Institutes of Health Grants AI33492 (to J.S.), AI43273 (to M.L.S.), and AI46516 (to B.J.L.). Additional support was provided by the G. Unger Vetlesen Foundation and LI-COR Biotechnology.


Abbreviations: snRNA, small nuclear RNA; snRNP, small nuclear ribonucleoprotein; RT, reverse transcription.


Data deposition: The sequence of the G. lamblia [2Fe-2S] ferredoxin has been deposited in the GenBank database (accession no. AF393829).

See commentary on page 3359.


1. Sharp P A. Cell. 1994;77:805–815.
2. Darnell J E, Doolittle W F. Proc Natl Acad Sci USA. 1986;83:1271–1275.
3. Gilbert W, de Souza S J, Long M. Proc Natl Acad Sci USA. 1997;94:7698–7703.
4. de Souza S J, Long M, Klein R L, Roy S, Lin S, Gilbert W. Proc Natl Acad Sci USA. 1998;95:5094–5099.
5. Cavalier-Smith T. Trends Genet. 1991;7:145–148.
6. Palmer J D, Logsdon J M. Curr Opin Genet Dev. 1991;1:470–477.
7. Cho G, Doolittle R F. J Mol Evol. 1997;44:573–584.
8. Stoltzfus A, Logsdon J M, Jr, Palmer J D, Doolittle W F. Proc Natl Acad Sci USA. 1997;94:10739–10744.
9. Tarrio R, Rodriguez-Trelles F, Ayala F J. Proc Natl Acad Sci USA. 1998;95:1658–1662.
10. Lohia A, Samuelson J. Gene. 1993;127:203–207.
11. Willhoeft U, Campos-Gongora E, Touzni S, Bruchhaus I, Tannich E. Protist. 2001;152:149–156.
12. Breckenridge D G, Watanabe Y, Greenwood S J, Gray M W, Schnare M N. Proc Natl Acad Sci USA. 1999;96:852–856.
13. Sogin M L, Silberman J D. Int J Parasitol. 1998;28:11–20.
14. McArthur A G, Morrison H G, Nixon J E J, Passamaneck N Q E, Kim U, Hinkle G, Crocker M K, Holder M E, Farr R, Reich C I, et al. FEMS Microbiol Lett. 2000;189:271–273.
15. Staley J P, Guthrie C. Cell. 1998;92:315–326.
16. Kambach C, Walke S, Nagai K. Curr Opin Struct Biol. 1999;9:222–230.
17. Mount S M, Salz H K. J Cell Biol. 2000;150:F37–F43.
18. Kaufer N F, Potashkin J. Nucleic Acids Res. 2000;28:3003–3010.
19. Mura C, Cascio D, Sawaya M R, Eisenberg D S. Proc Natl Acad Sci USA. 2001;98:5532–5537.
20. Palfi Z, Lucke S, Lahm H-W, Lane W S, Kruft V, Bragado-Nilsson E, Seraphin B, Bindereif A. Proc Natl Acad Sci USA. 2000;97:8967–8972.
21. Salgado-Garrido J, Bragado-Nilsson E, Kandels-Lewis S, Seraphin B. EMBO J. 1999;18:3451–3462.
22. Bordonne R. Mol Cell Biol. 2000;20:7943–7954.
23. Zhang D, Abovich N, Rosbash M. Mol Cell. 2001;7:319–329.
24. de la Cruz J, Kressler D, Linder P. Trends Biochem Sci. 1999;24:192–198.
25. Luking A, Stahl U, Schmidt U. Crit Rev Biochem Mol Biol. 1998;33:259–296.
26. Van Nues R W, Beggs J D. Genetics. 2001;157:1451–1467.
27. Achsel T, Ahrens K, Brahms H, Teigelkamp S, Luhrmann R. Mol Cell Biol. 1998;18:6756–6766.
28. Fast N M, Doolittle W F. Mol Biochem Parasitol. 1999;99:275–278.
29. Adam R D. Clin Microbiol Rev. 2001;14:447–475.
30. Cammack R. Adv Inorg Chem. 1992;38:281–322.
31. Johnson P J, d'Oliveira C E, Gorrell T E, Müller M. Proc Natl Acad Sci USA. 1990;87:6097–6101.
32. Wiest D K, O'Day C L, Abelson J. J Biol Chem. 1996;271:33268–33276.
33. He W, Parker R. Curr Opin Cell Biol. 2000;12:346–350.
34. Altschul S F, Madden T L, Schaffer A A, Zhang J, Zhang Z, Miller W, Lipman D J. Nucleic Acids Res. 1997;25:3389–3402.
35. Bhattacharya A, Satish S, Bagchi A, Bhattacharya S. Int J Parasitol. 2000;30:401–410.
36. Thompson J D, Higgins D G, Gibson T J. Nucleic Acids Res. 1994;22:4673–4680.
37. Felsenstein J. Cladistics. 1989;5:164–166.
38. Strimmer K, von Haeseler A. Mol Biol Evol. 1996;13:964–969.
39. Dayhoff M O, Schwartz R M, Orcutt B C. In: Atlas of Protein Sequence and Structure. Dayhoff M O, editor. Vol. 5, Suppl. 3. Silver Spring, MD: Natl. Biomed. Res. Found.; 1978. pp. 345–352.
40. Fitch W M, Margoliash E. Science. 1967;155:279–284.
41. Archibald J M, Logsdon J M, Jr, Doolittle W F. Mol Biol Evol. 2000;17:1456–1466.
42. Sogin M L, Gunderson J H, Elwood H J, Alonso R A, Peattie D A. Science. 1989;243:75–77.
43. Keeling P J, Fast N M, McFadden G I. J Mol Evol. 1998;47:649–655.
44. Hashimoto T, Nakamura Y, Kamaishi T, Hasegawa M. Arch Protistenkd. 1997;148:287–295.
45. Philippe H, Germot A. Mol Biol Evol. 2000;17:830–834.
46. Philippe H, Germot A, Moreira D. Curr Opin Genet Dev. 2000;10:596–601.
47. Knoll A H. Science. 1992;256:622–627.
48. Han T, Runnegar B. Science. 1992;257:232–235.
49. Brocks J J, Logan G A, Buick R, Summons R E. Science. 1999;285:1033–1036.
50. Feng D, Cho G, Doolittle R F. Proc Natl Acad Sci USA. 1997;94:13028–13033.
51. Gu X. Mol Biol Evol. 1997;14:861–866.
