Nucleic Acids Res. Jan 1, 2004; 32(Database issue): D334–D338.
PMCID: PMC308849

Full-malaria 2004: an enlarged database for comparative studies of full-length cDNAs of malaria parasites, Plasmodium species

Abstract

Full-malaria (http://fullmal.ims.u-tokyo.ac.jp), a database for full-length cDNAs from the human malaria parasite, Plasmodium falciparum, has been updated in at least three respects. (i) We added 8934 sequences generated from new libraries, so that our collection of 11 424 full-length cDNAs covers 1375 (25%) of the estimated 5409 parasite genes. (ii) All of our full-length cDNAs and GenBank EST sequences were mapped to genomic sequences together with publicly available annotated genes and other predictions. This precisely determined the gene structures and the positions of the transcriptional start sites, which are indispensable for the identification of promoter regions. (iii) A total of 4257 cDNA sequences were newly generated from the murine malaria parasite, Plasmodium yoelii yoelii. The genome/cDNA sequences were compared with those of P.falciparum at both the nucleotide and amino acid levels, and the sequence alignment for each gene is presented graphically. This part of the database serves as a versatile platform to elucidate the function(s) of malaria genes by a comparative genomic approach. It should also be noted that all of the cDNAs represented in this database are supported by physical cDNA clones, which are publicly and freely available and should serve as indispensable resources for functional analyses of malaria genomes.

INTRODUCTION

Malaria is the most devastating parasitic disease in the world; it kills more than a million people every year. Plasmodium falciparum is the causative agent of the lethal form of malaria in humans. Thus, the recent completion of the genome sequencing for P.falciparum, ~23 Mb on 14 chromosomes (seven finished and seven unfinished), has been a great milestone, which provides invaluable information about this organism (1–5). Mass spectrometry and oligonucleotide array techniques have been utilized to characterize ~5000 candidate genes (6,7). However, these techniques depend upon the correct annotation of the gene structure. Furthermore, to understand the mechanism(s) by which the parasite controls expression of genes throughout its complicated life cycle, the elucidation of transcription factors and binding motifs is mandatory.

Full-malaria started as a database for full-length cDNA clones produced from the erythrocyte-stage parasite of P.falciparum using the oligo-capping method, while the genome sequencing efforts were concurrently underway (8,9). It consisted of 5′ one-pass information, supported by corresponding physical plasmid clones, which are deposited at MR4 (http://www.malaria.mr4.org/).

NEW FEATURES

In this update, we made two additional libraries from P.falciparum and determined 8934 sequences. Originally we used a full-length enriched library from erythrocyte-stage parasites of P.falciparum and reported 5′-end one-pass sequences of 2490 random clones (8). The two additional libraries were produced from parasites grown under different condition(s), bringing the total number of sequenced clones to 11 424. The determined sequences were compared with the genome nucleotide sequences and displayed on the graphical map along with annotated genes and genes predicted by three different software packages (PlasmoDB). In total, 1375 genes were represented by full-length clones. Their physical plasmids are available for various experiments (Table 1).
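The counts quoted above can be checked with a few lines of arithmetic. This is only a worked check of the published figures; the variable names are ours.

```python
# Counts quoted in the text.
original_clones = 2490     # 5'-end one-pass sequences in the first release (8)
new_sequences = 8934       # sequences added in this update
genes_with_clones = 1375   # genes represented by full-length clones
estimated_genes = 5409     # estimated number of parasite genes

total_clones = original_clones + new_sequences
coverage = genes_with_clones / estimated_genes

print(total_clones)        # 11424
print(f"{coverage:.1%}")   # 25.4%
```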

Table 1.
The numbers of predicted annotated genes and genes represented by full-length clones are shown for Plasmodium falciparum and Plasmodium yoelii

As the genome sequences became publicly available, all the cDNA sequences were mapped onto the 14 chromosomes using the BLAT and sim4 programs (10,11), and the exact alignments are presented graphically.
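As an illustration of how a 5′ end can be read off such a mapping, the following minimal sketch parses rows of the standard 21-column PSL format written by BLAT, keeps the hit with the most matching bases for each clone, and reports the genomic coordinate of the clone's 5′ end. It is not the pipeline used for Full-malaria: the sim4 refinement step is omitted, the clone names and rows are invented, and it assumes a nucleotide query read in the sense orientation from its first base.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple

class Hit(NamedTuple):
    clone: str
    chrom: str
    strand: str
    t_start: int                   # 0-based start on the chromosome
    t_end: int                     # 0-based, exclusive end
    matches: int
    blocks: List[Tuple[int, int]]  # aligned blocks, 1-based inclusive

def parse_psl_line(line: str) -> Hit:
    """Parse one data row of a 21-column PSL file."""
    f = line.rstrip("\n").split("\t")
    sizes = [int(x) for x in f[18].rstrip(",").split(",")]
    t_starts = [int(x) for x in f[20].rstrip(",").split(",")]
    blocks = [(s + 1, s + n) for s, n in zip(t_starts, sizes)]
    return Hit(clone=f[9], chrom=f[13], strand=f[8],
               t_start=int(f[15]), t_end=int(f[16]),
               matches=int(f[0]), blocks=blocks)

def best_hits(lines: Iterable[str]) -> Dict[str, Hit]:
    """Keep, for each clone, the hit with the most matching bases."""
    best: Dict[str, Hit] = {}
    for line in lines:
        if not line[:1].isdigit():  # skip PSL header and blank lines
            continue
        hit = parse_psl_line(line)
        if hit.clone not in best or hit.matches > best[hit.clone].matches:
            best[hit.clone] = hit
    return best

def five_prime_end(hit: Hit) -> int:
    """1-based genomic position of the clone's 5' end (assumptions above)."""
    return hit.t_start + 1 if hit.strand == "+" else hit.t_end

# Invented example rows: a spliced plus-strand clone and a minus-strand clone.
rows = [
    ["300", "0", "0", "0", "0", "0", "1", "200", "+", "clone_A", "300", "0",
     "300", "chr12", "2000000", "1000", "1500", "2", "100,200,", "0,100,",
     "1000,1300,"],
    ["250", "0", "0", "0", "0", "0", "0", "0", "-", "clone_B", "250", "0",
     "250", "chr12", "2000000", "5000", "5250", "1", "250,", "0,", "5000,"],
]
hits = best_hits("\t".join(r) for r in rows)
for name, hit in sorted(hits.items()):
    print(name, hit.chrom, hit.strand, five_prime_end(hit), hit.blocks)
```

The gap between two consecutive blocks of a hit corresponds to an intron, so the block list doubles as a rough exon structure for comparison with the annotated gene.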

The chromosome map is viewed by choosing the chromosome number and the positions of both ends of the region of interest, or by searching for the Full-malaria clone name or the annotated gene name (Fig. 1). The magnification level can easily be changed. Alternatively, BLASTN will search for similar sequences within the database, enabling the location of the gene to be determined. For each of the genes, hydropathy plot analysis and motif searches (Pfam: http://www.ebi.ac.uk/interpro/) were performed based on the deduced amino acid sequences and the results are represented graphically. Prediction of protein subcellular localization is also possible, using PSORT, PSORTII (http://psort.ims.u-tokyo.ac.jp) and SubLoc (http://www.bioinfo.tsinghua.edu.cn/SubLoc/eu_batchpredict.htm) (Fig. 1).
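For readers unfamiliar with hydropathy plots, the sketch below computes a sliding-window hydropathy profile from a deduced amino acid sequence. The text does not state which scale or window size the database uses; the Kyte-Doolittle scale and a window of nine residues are assumptions made for this example, and the test sequence is invented.

```python
from typing import List

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(protein: str, window: int = 9) -> List[float]:
    """Mean hydropathy of every window of `window` consecutive residues.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    if window < 1 or window > len(scores):
        raise ValueError("window must be between 1 and the sequence length")
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

# Invented toy sequence: a hydrophobic stretch followed by a charged one.
profile = hydropathy_profile("IIIIIKKKKK", window=5)
print([round(value, 2) for value in profile])
```

Plotting the profile against residue position gives the familiar curve in which sustained positive stretches suggest membrane-spanning or otherwise buried segments.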

Figure 1
A view of the map showing a region of chromosome 12 (1800001–182000). The scale in the center shows the position within the P.falciparum genome sequence. Structures of the annotated genes and genes predicted by Genefinder, GlimmerM ...

We incorporated EST sequence data downloaded from GenBank and mapped them onto the chromosomes. Interestingly, some Full-malaria clones and ESTs represent different sets of genes. Using both Full-malaria cDNAs and ESTs, numerous modifications in gene structures were identified, including the existence of non-coding exon(s), alternative splicing events, correction of splicing and even the identification of hitherto unknown genes. A summary of the statistics from the current Full-malaria database is shown in Table 1.

Furthermore, in order to provide a useful platform for the comparative genomics of Plasmodium species, we constructed a full-length cDNA library from the murine malaria parasite Plasmodium yoelii, which was propagated in vivo. By random sequencing analysis, we determined 4257 5′-end one-pass sequences. We also mapped those cDNA sequences along with the 5×-coverage draft genome sequences of this organism (12) (Fig. 1, upper part). Comparison of the contig nucleotide sequences of P.yoelii with the amino acid sequences of annotated genes of P.falciparum using TBLASTN successfully aligned 1740 contigs with 4136 genes (Figs 1 and 2). Synteny is conserved in all P.yoelii genes at the genomic level, except for one contig in which the gene order is reversed.
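One simple way to test whether gene order is preserved between a P.yoelii contig and a P.falciparum chromosome is to sort the matched genes by their position on the contig and check whether their chromosomal positions rise or fall monotonically. The sketch below illustrates that idea only; it is not the procedure used to build the database, the function name and coordinates are hypothetical, and a contig assembled on the opposite strand would also register as reversed, so strand information would have to be taken into account in practice.

```python
from typing import List, Tuple

def gene_order(hits: List[Tuple[int, int]]) -> str:
    """Classify gene order for one contig.

    `hits` holds (position on contig, position on chromosome) pairs,
    one per matched gene, all on the same chromosome.
    """
    if len(hits) < 2:
        return "undetermined"
    # Walk along the contig and record where each gene sits on the chromosome.
    genome_positions = [genome for _, genome in sorted(hits)]
    steps = [b - a for a, b in zip(genome_positions, genome_positions[1:])]
    if all(step > 0 for step in steps):
        return "collinear"
    if all(step < 0 for step in steps):
        return "reversed"
    return "rearranged"

# Invented coordinates for three contigs.
print(gene_order([(100, 50000), (900, 53000), (2000, 61000)]))  # collinear
print(gene_order([(100, 61000), (900, 53000), (2000, 50000)]))  # reversed
print(gene_order([(100, 50000), (900, 61000), (2000, 53000)]))  # rearranged
```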

Figure 2
The results of TBLASTN are shown in table and graphic view. A click of the Lalign button will show the results of Lalign (as in Fig. 3).

The sequence alignments were further analyzed at the nucleotide level using Lalign (13). These results are shown in the P.falciparum chromosome map, and a click on the P.yoelii contig box will display the details of these comparisons (Fig. 3). Synteny is also quite well preserved between these two species at the nucleotide level. The locations of full-length clones are mostly in accordance with the predicted gene structures. Comparison of the promoter regions of both species is of great interest.

Figure 3
Similarity of the local nucleotide sequences is shown as red lines. A click on the Redraw button will show a new picture of the alignment at a different level.

Comparative analysis of full-length cDNAs of P.falciparum and conservation of amino acid sequences with P.yoelii revealed that the start sites of some of the annotated genes are incorrectly predicted. The actual gene may start from a position further downstream. Some very large annotated genes seem to represent two or more genes. Indeed, exact information on full-length cDNAs supported by physical full-length cDNA clones is indispensable for precise annotation of the correct gene structures. For further information regarding genes whose annotation appears to need revision, please refer to our database (http://fullmal.ims.u-tokyo.ac.jp/annotation); the details of this issue will be described elsewhere (J. Watanabe, M. Sasaki, Y. Suzuki and S. Sugano, in preparation). Expansion of the comparative analysis to the genome sequences and full-length cDNAs of other apicomplexan organisms will also be useful for investigations of evolution and for analysis of the pathogenicity of the respective parasites.

ACKNOWLEDGEMENTS

We thank DYNACOM Co., Ltd for providing experienced technical assistance. Nucleotide sequences and gene predictions were downloaded from PlasmoDB (http://plasmoDB.org). This database has been constructed and maintained with the support of a Grant-in-Aid for Publication of Scientific Research Results from the Japan Society for the Promotion of Science.

REFERENCES

1. Gardner,M.J., Hall,N., Fung,E., White,O., Berriman,M., Hyman,R.W., Carlton,J.M., Pain,A., Nelson,K.E., Bowman,S. et al. (2002) Genome sequence of the human malaria parasite Plasmodium falciparum. Nature, 419, 498–511.
2. Florens,L., Washburn,M.P., Raine,J.D., Anthony,R.M., Grainger,M., Haynes,J.D., Moch,J.K., Muster,N., Sacci,J.B., Tabb,D.L. et al. (2002) A proteomic view of the Plasmodium falciparum life cycle. Nature, 419, 520–526.
3. Hall,N., Pain,A., Berriman,M., Churcher,C., Harris,B., Harris,D., Mungall,K., Bowman,S., Atkin,R., Baker,S. et al. (2002) Sequence of Plasmodium falciparum chromosomes 1, 3–9 and 13. Nature, 419, 527–531.
4. Gardner,M.J., Shallom,S.J., Carlton,J.M., Salzberg,S.L., Nene,V., Shoaibi,A., Ciecko,A., Lynn,J., Rizzo,M., Weaver,B. et al. (2002) Sequence of Plasmodium falciparum chromosomes 2, 10, 11 and 14. Nature, 419, 531–534.
5. Hyman,R.W., Fung,E., Conway,A., Kurdi,O., Mao,J., Miranda,M., Nakao,B., Rowley,D., Tamaki,T., Wang,F. et al. (2002) Sequence of Plasmodium falciparum chromosome 12. Nature, 419, 534–537.
6. Lasonder,E., Ishihama,Y., Andersen,J.S., Vermunt,A.M., Pain,A., Sauerwein,R.W., Eling,W.M., Hall,N., Waters,A.P., Stunnenberg,H.G. et al. (2002) Analysis of the Plasmodium falciparum proteome by high-accuracy mass spectrometry. Nature, 419, 537–542.
7. Le Roch,K.G., Zhou,Y., Blair,P.L., Grainger,M., Moch,J.K., Haynes,J.D., De la Vega,P., Holder,A.A., Batalov,S., Carucci,D.J. et al. (2003) Discovery of gene function by expression profiling of the malaria parasite life cycle. Science, 301, 1503–1508.
8. Watanabe,J., Sasaki,M., Suzuki,Y. and Sugano,S. (2001) FULL-malaria: a database for a full-length enriched cDNA library from human malaria parasites, Plasmodium falciparum. Nucleic Acids Res., 29, 70–71.
9. Suzuki,Y. and Sugano,S. (2003) Construction of a full-length enriched and a 5′-end enriched cDNA library using the oligo-capping method. Methods Mol. Biol., 221, 73–91.
10. Florea,L., Hartzell,G., Zhang,Z., Rubin,G.M. and Miller,W. (1998) A computer program for aligning a cDNA sequence with a genomic DNA sequence. Genome Res., 8, 967–974.
11. Kent,W.J. (2002) BLAT—the BLAST-like alignment tool. Genome Res., 12, 656–664.
12. Carlton,J.M., Angiuoli,S.V., Suh,B.B., Kooij,T.W., Pertea,M., Silva,J.C., Ermolaeva,M.D., Allen,J.E., Selengut,J.D., Koo,H.L. et al. (2002) Genome sequence and comparative analysis of the model rodent malaria parasite Plasmodium yoelii yoelii. Nature, 419, 512–519.
13. Huang,X., Miller,W., Schwartz,S. and Hardison,R.C. (1992) Parallelization of a local similarity algorithm. Comput. Appl. Biosci., 8, 155–165.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press