Proc Natl Acad Sci U S A. Jun 6, 2000; 97(12): 6619–6624.

Anopheles gambiae pilot gene discovery project: Identification of mosquito innate immunity genes from expressed sequence tags generated from immune-competent cell lines


Together with AIDS and tuberculosis, malaria is at the top of the list of devastating infectious diseases. However, molecular genetic studies of its major vector, Anopheles gambiae, are still quite limited. We have conducted a pilot gene discovery project to accelerate progress in the molecular analysis of vector biology, with emphasis on the mosquito's antimalarial immune defense. A total of 5,925 expressed sequence tags were determined from normalized cDNA libraries derived from immune-responsive hemocyte-like cell lines. The 3,242 expressed sequence tag-containing cDNA clones were grouped into 2,380 clone clusters, potentially representing unique genes. Of these, 1,118 showed similarities to known genes from other organisms, but only 27 were identical to previously known mosquito genes. We identified 38 candidate genes, based on sequence similarity, that may be implicated in immune reactions including antimalarial defense; 19 of these were shown experimentally to be inducible by bacterial challenge, lending support to their proposed involvement in mosquito immunity.

Transmission of the malaria parasite, Plasmodium, involves two complex and obligatory life cycles in the vector mosquito as well as in the human host. Interruption of either cycle would attenuate the spread of the disease. The prospect of control strategies based on transmission blocking in the vector (1, 2) has energized studies on the molecular genetics of Anopheles gambiae. Special attention has been directed toward the main organs with which the parasite interacts during its development in the mosquito, the midgut and the salivary glands (3–6). The observation that the parasite is destroyed completely in refractory mosquito strains but also sustains substantial losses in fully susceptible mosquitoes (7) has recently drawn attention to the study of the mosquito's innate immune system (8–15). Immune reactions induced by malaria infection correlate with the life cycle of Plasmodium in the vector mosquito; they have been demonstrated at the molecular level in the midgut and salivary gland epithelia, in hemocytes, and in the fat body, a liver analogue in insects (11).

Difficulties in rearing malaria mosquitoes under laboratory conditions and the limited amount of biological material that can be obtained from mosquito organs are obstacles to the isolation of A. gambiae genes. Gene cloning from this organism started nearly a decade ago and has generated only ≈450 putative coding sequences in the public protein databases. Massive sequencing of cDNAs from source-specific libraries of other organisms has proven to be a powerful approach to gene discovery (16). Putative functions can be proposed for the discovered genes either through homology searches of global databases or by mass expression profiling with the recently developed cDNA microarray technologies (17). A powerful stimulus to the study of parasites has already resulted from genomics projects, including the expressed sequence tag (EST) projects of the worms Brugia malayi and Schistosoma sp. and the genome sequencing projects of the protozoa Plasmodium falciparum and Leishmania major (18–21). In a pilot attempt to evaluate the efficiency of mass cDNA sequencing for gene discovery in A. gambiae, ESTs were generated from random clones of normalized cDNA libraries. Our special interest in immunity genes led us to the choice of recently established immune-responsive hemocyte-like cell lines, which are known to express high levels of various immune markers including antimicrobial peptides, putative recognition molecules, serine proteases and their inhibitors (serpins), and prophenoloxidases (PPO) (refs. 10, 15, and 22 and A. Danielli, personal communication). A significant number of clone clusters were similar to genes that encode proteins known to operate in invertebrate and vertebrate innate defense mechanisms; half of them were shown experimentally to be immune responsive.


Cell Cultures and Immune Challenge.

The previously described cell lines 4A-3A and 4A-3B were cultured in Schneider (Sigma) medium supplemented with 10% (vol/vol) BSA at 27°C as described (15) and harvested at a confluent growth phase. The 4A-3B cell line was challenged with 10 μg/ml lipopolysaccharide (Sigma) 6 h before harvest.

RNA Extraction and Library Construction.

Extraction of mRNA was performed with the Oligotex Direct mRNA Maxi kit (Qiagen, Chatsworth, CA), and 7 μg of mRNA was used for construction of the cDNA libraries. cDNAs were cloned directionally into a phagemid vector (pT7T3-Pac), and the libraries were normalized as described (23).


The plasmid libraries were electroporated into a DH10-α Escherichia coli strain, and DNA extracted from randomly selected clones was subjected to automated sequencing.

Sequence Analysis.

ESTs were checked for vector sequence contaminants. Sequence analysis against databases was performed with the blastx software (24). The clone sequences were subjected to an all-against-all sequence comparison where clones sharing at least one EST with 97% or greater identity over a 100-bp region were grouped together in the same cluster. The database entry keywords of the homologue sequences were used for the grouping of clone clusters in functional classes with a modified version of the euclid software (25, 26).
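The grouping rule described above amounts to single-linkage clustering: two clones fall into the same cluster whenever any of their ESTs match, and clusters merge transitively. The following is a minimal sketch of that idea, not the software actually used in this study. The input format (one tuple per pairwise EST alignment), the function name `cluster_clones`, and the reading of "over a 100-bp region" as an aligned length of at least 100 bp are all assumptions made for illustration.

```python
from collections import defaultdict


def cluster_clones(clones, est_hits, min_identity=97.0, min_overlap=100):
    """Group clones into clusters by single linkage over pairwise EST hits.

    clones: iterable of clone IDs (every clone named in est_hits must appear).
    est_hits: iterable of (clone_a, clone_b, percent_identity, overlap_bp),
        one tuple per pairwise EST alignment (hypothetical input format).
    """
    parent = {c: c for c in clones}

    def find(x):
        # Walk up to the root of x's set, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, overlap in est_hits:
        # Assumed thresholds: >= 97% identity over >= 100 aligned bp.
        if identity >= min_identity and overlap >= min_overlap:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for c in parent:
        clusters[find(c)].add(c)
    return sorted(sorted(members) for members in clusters.values())


# Illustrative data only; these clone IDs and alignments are made up.
example_clones = ["c1", "c2", "c3", "c4"]
example_hits = [
    ("c1", "c2", 98.5, 120),  # passes both thresholds
    ("c2", "c3", 97.0, 100),  # passes exactly at both thresholds
    ("c3", "c4", 99.0, 60),   # overlap too short
    ("c1", "c4", 90.0, 300),  # identity too low
]
example_clusters = cluster_clones(example_clones, example_hits)
```

Because linkage is transitive, c1 and c3 end up together even though they share no direct hit. The same property means that a single chimeric clone could merge two unrelated clusters, which is one reason cluster counts are best read as estimates of gene number.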

Reverse Transcription–PCR Expression Assays.


Results and Discussion

Source Libraries and EST Sequencing.

Two cell lines with overlapping but distinct immune expression profiles, 4A-3A and 4A-3B, were used as the starting material. The 4A-3A cell line expresses high levels of PPO transcripts, and 4A-3B strongly expresses various other immune markers (15). The latter line, 4A-3B, was challenged with lipopolysaccharide, a potent bacterial immune elicitor, for 6 h before mRNA extraction for enrichment of immune gene transcripts (10, 15). Normalized, directionally cloned oligo(dT)-primed cDNA libraries were constructed as described (23), and randomly selected clones were sequenced. The average insert size was estimated as 1.5 kilobases by PCR amplification of inserts from 100 randomly selected clones.

A total of 3,242 clones were sequenced, mostly from both ends, generating 5,925 ESTs with an average length of 375 bp (range: 22–1,114 bp; 375 × 5,925 = ≈2.2 megabases of total sequence generated; Table 1). The generated ESTs corresponded to 2,380 clone clusters potentially representing individual genes, suggesting that the overall redundancy of the libraries may be only ≈27%. However, failure to detect overlaps between partial cDNA clones cannot be excluded. Normalization of the libraries did suppress the highly abundant messages and consequently increased the number of discovered genes.
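The summary figures in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes that redundancy is defined as the fraction of sequenced clones that did not found a new cluster, that is, 1 − clusters/clones. The text does not state the definition explicitly, but this one reproduces the reported ≈27%. The function name `est_summary` is illustrative.

```python
def est_summary(n_clones, n_ests, mean_len_bp, n_clusters):
    """Derive total sequence, ESTs per clone, and library redundancy.

    Redundancy is assumed here to be 1 - clusters/clones; this definition
    is an inference from the reported numbers, not a quote from the paper.
    """
    total_bp = n_ests * mean_len_bp
    return {
        "total_megabases": total_bp / 1e6,
        "ests_per_clone": n_ests / n_clones,
        "redundancy_percent": 100.0 * (1 - n_clusters / n_clones),
    }


# Values taken from the text: 3,242 clones, 5,925 ESTs, 375 bp mean length,
# 2,380 clone clusters.
summary = est_summary(n_clones=3242, n_ests=5925, mean_len_bp=375, n_clusters=2380)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```

The ratio of ESTs to clones (a little under two) is consistent with most, but not all, clones having been sequenced from both ends.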

Table 1
EST statistics

Sequence Similarities.

blastx analysis comparing all clone clusters against a nonredundant database generated from swissprot and sptrembl (25) revealed that 1,118 clone clusters (47% of the total) are significantly similar (E value < 10⁻⁴) to known genes. Of these, 57 clusters showed the highest similarity to other known A. gambiae genes, but only 27 were identical. Thus, the vast majority of the clone clusters can be considered putative, previously unidentified A. gambiae genes. Of the clone clusters that showed significant blastx hits, only 99 (8.9%) showed similarity to known insect genes alone. For 45.7% of the clone clusters, similarities were comparable for both insect and mammalian sequences. For 36.4%, they were highest to mammalian sequences, and for 9%, they were highest to genes from other organisms (Fig. 1A). The similarities suggested putative functions for 654 clone clusters, which were grouped in nine distinct functional groups (26) as indicated in Fig. 1B.
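The significance filter described here can be sketched as follows: for each clone cluster, keep only the hit with the lowest E value, and only if it falls below the 10⁻⁴ cutoff. This is an illustration of the thresholding step under simplifying assumptions, not the pipeline used by the authors. The three-field hit tuples, the cluster and subject IDs, and the function names are hypothetical.

```python
def best_significant_hits(hits, max_evalue=1e-4):
    """Return {cluster_id: (subject_id, evalue)} for the lowest-E hit per cluster.

    hits: iterable of (cluster_id, subject_id, evalue) tuples, a hypothetical
    simplification of tabular blastx output.
    """
    best = {}
    for cluster_id, subject_id, evalue in hits:
        if evalue >= max_evalue:
            continue  # not significant under the E < 1e-4 cutoff
        if cluster_id not in best or evalue < best[cluster_id][1]:
            best[cluster_id] = (subject_id, evalue)
    return best


def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


# Made-up hits for illustration only.
example_blast_hits = [
    ("cluster_A", "subject_1", 3e-20),
    ("cluster_A", "subject_2", 1e-35),  # better hit for cluster_A
    ("cluster_B", "subject_3", 0.5),    # above the cutoff, discarded
    ("cluster_C", "subject_4", 9e-5),   # just below the cutoff, kept
]
best_hits = best_significant_hits(example_blast_hits)

# Proportions reported in the text, recomputed from the stated counts.
share_with_hits = percent(1118, 2380)
share_insect_only = percent(99, 1118)
```

Recomputing the proportions from the stated counts gives values that round to the reported 47% and 8.9%.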

Figure 1
Distribution of clone clusters in gene classes based on blastx E values (24). Of the 2,380 clone clusters, 1,262 (53%) did not show significant similarity (E < 10⁻⁴) to genes in the nonredundant swissprot and sptrembl databases. ...

Potential Immunity Genes.

As many as 38 clone clusters showed significant similarities to classes of known innate immunity genes, reflecting the origin of the cDNAs from hemocyte-like cell lines (15). Blood cells from both vertebrates and invertebrates are known to play key roles in defensive innate immunity mechanisms, such as melanization leading to encapsulation, coagulation and complement cascades, phagocytosis, and production of antimicrobial peptides. These 38 clusters are discussed below in the order of their presentation in Tables 2–4. For each cluster, inference of putative function was based on consideration of several matches with high scores. The encoded gene products were classified into three broad groups, as follows.

Table 2
Putative serine proteases and serpins
Table 4
Other putative immunity proteins

Putative Serine Proteases and Serpins.

Seven clusters (I.1–I.7) encode putative homologues of serine proteases that bear “clip-domains,” a common feature of regulatory immunity proteases (27). Notably, similarities were detected to the A. gambiae immune-responsive 14D serine protease (28), to clotting, coagulation, and complement factors, and to PPO-activating enzymes. One cluster (I.8) is identical to the chitin-binding domain of a multidomain, modular immune-responsive serine protease of A. gambiae characterized by others (29, 30). Four clusters (I.9–I.12) encode putative serpins with similarities to those of mammals, insects, and other invertebrates, including a coagulation inhibitor. One cluster (I.10) is identical to a previously isolated mosquito serpin (GenBank accession nos. AJ271352 and AJ271353). Serine proteases and their inhibitors are of interest as potential components of the regulated cascade/amplification reactions of blood clotting, complement, and other immune responses. Examples are the known regulators of PPO and coagulation cascades that have been isolated and characterized from the moths Bombyx mori and Manduca sexta (27, 31).

Putative Adhesive Proteins.

Proteins encoded by 15 clone clusters resemble adhesive proteins, including proteins capable of recognizing and binding to microorganisms. Two of these (II.1 and II.2) resemble lectins, including a rat intracellular mannose-binding lectin (32) and an immune-inducible galactose-binding lectin of the mosquito (6, 10). Lectins are involved frequently in opsonization and aggregation of microorganisms through their carbohydrate-binding domains (33). Three distinct mosquito homologues of the B. mori Gram-negative bacteria-binding protein (GNBP), one of them previously isolated and characterized in A. gambiae as an immune marker (10), are represented by the clusters II.3–II.5. GNBPs show similarities to the β-1,3 glucan-binding region of glucanases and are likely components of the PPO-activation cascade (31), as is the peptidoglycan recognition protein (PGRP), which is highly similar to cluster II.6. The PPO cascades can be triggered by diverse microbial surface components such as lipopolysaccharide, β-1,3 glucan, and peptidoglycan; the latter moiety is believed to trigger the PPO cascade in B. mori through binding to PGRP (34). The products of clusters II.7 and II.8 resemble domains found, respectively, in hemomucin (35), the putative Drosophila opsonin, and the multidomain Drosophila scavenger receptor C1 (36). Cluster II.9 encodes the previously isolated mosquito CD36 (accession no. Q17012), a homologue of the Drosophila croquemort protein that is involved in phagocytosis of apoptotic cells (37). Cluster II.10 is similar to chitin-binding domains of diverse proteins such as chitinases, mucins, and peritrophins. Finally, clusters II.11–II.15 encode proteins with putative microorganism-binding fibrinogen-like domains. Such domains have been encountered previously in two additional infection-responsive A. gambiae genes (G.D., unpublished material), a crab innate immunity lectin that can agglutinate bacteria and enhance defensin activity, and the vertebrate putative phagocytosis mediators, the ficolins (38, 39).

Other Putative Immune Proteins.

Cluster III.1 encodes a putative mosquito antimicrobial peptide, cecropin (40), different from that characterized by others (22). One cluster, III.2, corresponds to a recently isolated A. gambiae infection-responsive peptide gene of unknown function (accession no. AJ237664). III.3 encodes a member of the complement/α-2-macroglobulin family, other members of which are immune-responsive in Anopheles and Drosophila (M. Lagueux, E. Levashina, and L. Moita, personal communication). Clusters III.4–III.6 encode proteins potentially involved in intracellular immune signaling pathways (41) including putative homologues of the Pelle-associated protein Pellino, of the IκB-like Cactus factor, and of an NFκB motif-binding phosphoprotein (42). Clusters III.7 and III.8 correspond to the previously characterized PPO2 and PPO5 genes (15, 43). Finally, clusters III.9–III.11 correspond to components involved in iron metabolism and regulation: IRP (iron regulatory protein) and ferritin, both of which are implicated in immunity (44–46).

Infection Responsiveness.

The 38 putative immune-related clone clusters were subjected to an experimental test of their response to immune challenge. Cell lines 4A-3A and 4A-3B were cocultured for 8 h with heat-killed bacteria, and a reverse transcription–PCR assay was used to detect changes in mRNA prevalence. Indeed, transcripts of one previously known (II.3; not shown) and 18 not previously examined clusters were immune induced, and one, with similarity to domains found in the putative Drosophila scavenger receptor homologue, was repressed by exposure to heat-killed bacteria (Fig. 2). It is notable that some members belonging to the same protein family are inducible, and others are not. For example, of the eight putative serine proteases, only five are inducible. Similarly, one of four putative serpins and four of five fibrinogen-like domain proteins are inducible. Differential induction specificities between the two cell lines were noted for some genes, e.g., cluster I.8 is up-regulated by bacterial challenge in cell line 4A-3A but not in 4A-3B. The present number of immune-inducible clone clusters is likely to be an underestimate of the prevalence of immune-related sequences in the collection. Some proteins that are known to be implicated in defense reactions are translationally rather than transcriptionally regulated (e.g., PPO and IRP); they may also be synthesized constitutively and released from the cell or posttranslationally activated on microbial challenge (27). Furthermore, the 1,262 clone clusters that as yet showed no similarity to database entries have not been examined for inducibility.
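The per-family induction counts quoted in this paragraph can be tabulated to make the variation within families explicit. The sketch below uses only the counts stated in the text; the dictionary layout and variable names are illustrative, and families for which the text gives no counts are omitted rather than guessed.

```python
# (inducible, total) per protein family, exactly as stated in the text.
family_counts = {
    "serine proteases": (5, 8),
    "serpins": (1, 4),
    "fibrinogen-like domain proteins": (4, 5),
}

# Fraction of each family that was induced by bacterial challenge.
induced_fraction = {
    family: induced / total for family, (induced, total) in family_counts.items()
}

# Overall: 1 previously known + 18 newly examined induced clusters out of 38.
overall_induced = (1 + 18) / 38

for family, fraction in induced_fraction.items():
    print(f"{family}: {fraction:.3f}")
print(f"all 38 candidate clusters: {overall_induced:.3f}")
```

The overall fraction of one half matches the statement in the introduction that half of the candidate clusters were shown experimentally to be immune responsive.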

Figure 2
Reverse transcription–PCR expression assays of putative immunity genes in naïve and bacterially challenged cell lines. Expression levels of the 38 A. gambiae putative immunity genes (Tables 2–4) were assayed by ...

Concluding Remarks.

This pilot gene discovery project multiplied several fold the number of A. gambiae gene sequences that had been deposited in the public databases during a decade of increasing interest in the molecular genetics of this major malaria vector. The 8 previously identified and the 30 newly discovered putative immunity genes will contribute significantly to the dissection of A. gambiae innate immunity; their potential involvement in the mosquito's antiparasitic defense mechanisms is a matter to be addressed experimentally. The constantly increasing amount of expressed DNA that is sequenced from other organisms will permit identification of more homologues within the generated set of A. gambiae ESTs, thus increasing the number of mosquito genes with putative functions. Such an increase would be particularly useful for the 1,262 clone clusters that did not yield significant blastx hits to date. The normalized cDNA libraries that we have constructed from the cell lines remain a promising source for additional gene discovery through further large-scale EST determination. Similar, normalized libraries could be constructed from whole mosquitoes (or from isolated tissues such as midgut and salivary glands that are involved in the malaria life cycle), permitting discovery of important genes that may not be expressed significantly in the cell lines. Systematic expression analysis of the already available clone clusters with cDNA-microarray technology is expected to reveal additional immune-responsive components, developmentally regulated genes, or genes that may be induced by parasitic infections.

Table 3
Putative adhesive proteins


We thank Ryan Kinkaid, Vladan Miljovic, Greg Doonan, Robert Brown, Christy Smith, Jurgen Zimmermann, Barbara Schwager, Monica Benes, and Holger Erfle for sequencing; Brad Johnson and Keith Crouch for template production; M. Andrade for assistance with analysis of the sequences; A. Danielli, M. Lagueux, E. Levashina, and L. Moita for permission to refer to unpublished material; and H.-M. Müller for the hemocyte-like cell lines. This investigation received financial assistance from the United Nations Development Program/World Bank/World Health Organization Special Program for Research and Training in Tropical Diseases.


Abbreviation: EST, expressed sequence tag.

Note Added in Proof


The recent criticism of Wang et al. (47) that normalization/subtractive hybridization can lead to systematic loss of rare mRNA sequences bearing long poly(A) tails does not apply to the methods used in this study (23). These have been optimized to reduce the length of poly(A) tails to 26 ± 12 nucleotides as verified by sequencing; hybrids of that length do not get subtracted by HAP chromatography.


1. Collins F H. Parasitol Today. 1994;10:370–371.
2. Curtis C F. Parasitol Today. 1994;10:371–374.
3. Ribeiro J M, Nussenzveig R H, Tortorella G. J Med Entomol. 1994;31:747–753.
4. Shen Z, Dimopoulos G, Kafatos F C, Jacobs-Lorena M. Proc Natl Acad Sci USA. 1999;96:6510–6515.
5. Arcà B, Lombardo F, de Lara Capurro Guimarães M, della Torre A, Dimopoulos G, James A A, Coluzzi M. Proc Natl Acad Sci USA. 1999;96:1516–1521.
6. Dimopoulos G, Richman A, della Torre A, Kafatos F C, Louis C. Proc Natl Acad Sci USA. 1996;93:13066–13071.
7. Beier J C. Annu Rev Entomol. 1998;43:519–543.
8. Collins F H, Sakai R K, Vernick K D, Paskewitz S, Seeley D C, Miller L H, Collins W E, Campbell C C, Gwadz R W. Science. 1986;234:607–610.
9. Vernick K D, Fujioka H, Seeley D C, Tandler B, Aikawa M, Miller L H. Exp Parasitol. 1995;80:583–595.
10. Dimopoulos G, Richman A, Müller H-M, Kafatos F C. Proc Natl Acad Sci USA. 1997;94:11508–11513.
11. Dimopoulos G, Seeley D, Wolf A, Kafatos F C. EMBO J. 1998;17:6115–6123.
12. Barillas-Mury C, Charlesworth A, Gross I, Richman A, Hoffmann J A, Kafatos F C. EMBO J. 1996;15:4691–4701.
13. Zheng L, Cornel A J, Wang R, Erfle H, Voss H, Ansorge W, Kafatos F C, Collins F H. Science. 1997;276:425–428.
14. Richman A, Dimopoulos G, Seeley D, Kafatos F C. EMBO J. 1997;16:6114–6119.
15. Müller H-M, Dimopoulos G, Blass C, Kafatos F C. J Biol Chem. 1999;274:11727–11735.
16. Adams M D, Kelley J M, Gocayne J D, Dubnick M, Polymeropoulos M H, Xiao H, Merril C R, Wu A, Olde B, Moreno R F, et al. Science. 1991;252:1651–1656.
17. Duggan D J, Bittner M, Chen Y, Meltzer P, Trent J M. Nat Genet. 1999;21(Suppl.):10–14.
18. Lawson D. Parasitology. 1999;118:S15–S18.
19. Williams S A, The Filarial Genome Project, Johnston D A. The Schistosome Genome Project. Parasitology. 1999;118:S19–S38.
20. Myler P J, Audleman L, deVos T, Hixson G, Kiser P, Lemley C, Magness C, Rickel E, Sisk E, Sunkin S, et al. Proc Natl Acad Sci USA. 1999;96:2902–2906.
21. Gardner M J, Tettelin H, Carucci D J, Cummings L M, Aravind L, Koonin E V, Shallom S, Mason T, Yu K, Fujii C, et al. Science. 1998;282:1126–1132.
22. Vizioli J, Bulet P, Lowenberger C, Blass C, Müller H-M, Dimopoulos G, Hoffmann J, Kafatos F C, Richman A. Insect Mol Biol. 2000;9:75–84.
23. Bonaldo M F, Lennon G, Soares M B. Genome Res. 1996;6:791–806.
24. Altschul S F, Madden T L, Schaffer A A, Zhang J, Zhang Z, Miller W, Lipman D J. Nucleic Acids Res. 1997;25:3389–3402.
25. Bairoch A, Apweiler R. Nucleic Acids Res. 2000;28:45–48.
26. Tamames J, Ouzounis C, Casari G, Sander C, Valencia A. Bioinformatics. 1998;14:542–543.
27. Muta T, Iwanaga S. Curr Opin Immunol. 1996;8:41–47.
28. Paskewitz S M, Reese-Stardy S, Gorman M J. Insect Mol Biol. 1999;8:329–337.
29. Gorman M, Andreeva O V, Paskewitz S. Insect Biochem Mol Biol. 2000;30:35–46.
30. Danielli A, Loukeris T G, Lagueux M, Müller H-M, Richman A, Kafatos F C. Proc Natl Acad Sci USA. 2000;97: in press.
31. Söderhäll K, Cerenius L. Curr Opin Immunol. 1998;10:23–28.
32. Lahtinen U, Hellman U, Wernstedt C, Saraste J, Pettersson R F. J Biol Chem. 1996;271:4031–4037.
33. Ham P J. In: Advances in Disease Vector Research. Harris K, editor. New York: Springer; 1992. pp. 101–149.
34. Yoshida H, Kinoshita K, Ashida M. J Biol Chem. 1996;271:13854–13860.
35. Theopold U, Samakovlis C, Erdjument-Bromage H, Dillon N, Axelsson B, Schmidt O, Tempst P, Hultmark D. J Biol Chem. 1996;271:12708–12715.
36. Pearson A, Lux A, Krieger M. Proc Natl Acad Sci USA. 1995;92:4056–4060.
37. Franc N C, Heitzler P, Ezekowitz R A, White K. Science. 1999;284:1991–1994.
38. Lu J. BioEssays. 1997;19:509–518.
39. Gokudan S, Muta T, Tsuda R, Koori K, Kawahara T, Seki N, Mizunoe Y, Wai S N, Iwanaga S, Kawabata S-I. Proc Natl Acad Sci USA. 1999;96:10086–10091.
40. Hoffmann J A, Reichhart J-M. Trends Cell Biol. 1997;7:309–316.
41. Grosshans J, Schnorrer F, Nusslein-Volhard C. Mech Dev. 1999;81:127–138.
42. Ostrowski J, Van Seuningen I, Seger R, Rauch C T, Sleath P R, McMullen B A, Bomsztyk K. J Biol Chem. 1994;269:17626–17634.
43. Jiang H, Wang Y, Korochkina S E, Benes H, Kanost M R. Insect Biochem Mol Biol. 1997;27:693–699.
44. Weiss G, Wachter H, Fuchs D. Immunol Today. 1995;16:495–500.
45. Rouault T, Klausner R. Curr Top Cell Regul. 1999;35:1–19.
46. Dunkov B C, Zhang D, Choumarou K, Winzerling J J, Law J H. Arch Insect Biochem Physiol. 1995;29:293–307.
47. Wang S M, Fears S C, Zhang L, Chen J-J, Rowley J D. Proc Natl Acad Sci USA. 2000;97:4162–4167.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences