Mol Biochem Parasitol. 2001 Dec;118(2):201-10.

Profiling the malaria genome: a gene survey of three species of malaria parasite with comparison to other apicomplexan species.

Author information

Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA.


We have undertaken the first comparative pilot gene discovery analysis of approximately 25,000 random genomic and expressed sequence tags (ESTs) from three species of Plasmodium, the infectious agent that causes malaria. A total of 5482 genome survey sequences (GSSs) and 5582 ESTs were generated from mung bean nuclease (MBN) and cDNA libraries, respectively, of the ANKA line of the rodent malaria parasite Plasmodium berghei, and 10,874 GSSs were generated from MBN libraries of the Salvador I and Belem lines of Plasmodium vivax, the most geographically widespread human malaria pathogen. These tags, together with 2438 Plasmodium falciparum sequences present in GenBank, were used to perform first-pass assembly and transcript reconstruction, and non-redundant consensus sequence datasets were created. The datasets were compared against public protein databases, and more than 1000 putative new Plasmodium proteins were identified based on sequence similarity. Homologs of previously characterized Plasmodium genes were also identified, increasing the number of P. vivax and P. berghei sequences in public databases at least 10-fold. Comparative studies with other species of Apicomplexa identified interesting homologs of possible therapeutic or diagnostic value. A gene prediction program, Phat, was used to predict probable open reading frames for proteins in all three datasets. Predicted and non-redundant BLAST-matched proteins were submitted to InterPro, an integrated database of protein domains, signatures and families, for functional classification. Thus, a partial predicted proteome was created for each species. This first comparative analysis of Plasmodium protein coding sequences represents a valuable resource for further studies on the biology of this important pathogen.

[Indexed for MEDLINE]
