Nucleic Acids Res. Jan 2009; 37(Database issue): D526–D530.
Published online Sep 29, 2008. doi:  10.1093/nar/gkn631
PMCID: PMC2686445

GiardiaDB and TrichDB: integrated genomic resources for the eukaryotic protist pathogens Giardia lamblia and Trichomonas vaginalis

Abstract

GiardiaDB (http://GiardiaDB.org) and TrichDB (http://TrichDB.org) house the genome databases for Giardia lamblia and Trichomonas vaginalis, respectively, and represent the latest additions to the EuPathDB (http://EuPathDB.org) family of functional genomic databases. GiardiaDB and TrichDB employ the same framework as other EuPathDB sites (CryptoDB, PlasmoDB and ToxoDB), supporting fully integrated and searchable databases. Genomic-scale data available via these resources may be queried based on BLAST searches, annotation keywords and gene ID searches, GO terms, sequence motifs and other protein characteristics. Functional queries may also be formulated, based on transcript and protein expression data from a variety of platforms. Phylogenetic relationships may also be interrogated. The ability to combine the results from independent queries, and to store queries and query results for future use facilitates complex, genome-wide mining of functional genomic data.

INTRODUCTION

The amitochondriate protists Giardia lamblia (G. intestinalis; G. duodenalis) and Trichomonas vaginalis are ubiquitous microaerophilic parasites. Giardia lamblia, a major source of enteric infection in humans and potential bioterrorism agent (category B priority pathogen), is spread through fecal–oral transmission of highly stable cysts, with manifestations such as diarrhea, cramps, bloating, weight loss and malabsorption in symptomatic cases (1). Trichomonas vaginalis is the causative agent of trichomoniasis, and is considered the most common nonviral sexually transmitted disease of humans, with ~170 million cases annually (2). This parasite infects the urogenital epithelia of both sexes, causing inflammation (although men are usually asymptomatic) and increased risk of HIV infection.

The 12 Mb G. lamblia genome (3) and ~160 Mb T. vaginalis genome (4) have been deposited in GenBank, and are also accessible at GiardiaDB (http://GiardiaDB.org) and TrichDB (http://TrichDB.org), respectively, along with both manually curated and automatically generated annotation, and a variety of functional genomics data. Data may be accessed and queried directly via the individual genome sites or through the Eukaryotic Pathogen DataBase portal (EuPathDB: http://EuPathDB.org) (5), which also accommodates other eukaryotic pathogen databases including CryptoDB (Cryptosporidium spp.) (6), PlasmoDB (Plasmodium spp.) (7) and ToxoDB (Toxoplasma gondii) (8).

DATA CONTENT OF CURRENT RELEASES

GiardiaDB

GiardiaDB release 1.1 is based on the 12 Mb genome of the WBC6 clinical isolate of G. lamblia. The sequence is distributed among 306 contigs, assembled on 92 scaffolds (supercontigs), with an average depth of coverage of 11×. A total of 4976 genes have been annotated, including 4889 protein coding genes, 61 tRNAs and 17 rRNAs. In addition, 1611 genes have been flagged as ‘deprecated’ (demoted) in this release of GiardiaDB as they appear unlikely to represent true genes, based on incompatibility with longer gene models, published data, or alternative models for which functional evidence is available.

Transcript and proteomic expression data sets are both available for analysis through GiardiaDB. These include expressed sequence tag (EST) evidence from the trophozoite life stage (3, and data deposited in dbEST; http://www.ncbi.nlm.nih.gov/dbEST/), and ten serial analysis of gene expression (SAGE) data sets, representing time points distributed throughout the Giardia parasite life cycle: trophozoites, encystation, cyst and excystation (9). Understanding various parasite life stages will be critical for vaccine and therapy development strategies, making the SAGE time series particularly valuable. SAGE data may be queried for evidence of gene expression at one or more life stages, and also based on relative levels of expression at different stages (differential expression). As part of the Pathogen Functional Genomics Resource Center (http://pfgrc.tigr.org/index.php/microarray/available_microarrays.html), the J. Craig Venter Institute (JCVI) has developed a microarray based on the WBC6 genome, which is available to Giardia research groups. Results from one completed microarray study identifying genes that are up- or downregulated in response to stress (heat or DTT) are available in the current database (A. Hehl et al., unpublished). Trophozoites during log-phase growth are also represented by mass spectrometry-based proteomics data, with peptide assignments made using SEQUEST (10) and DTASelect (D. Ratner et al., unpublished data). These data have been used to further validate gene calls and restore a small number of genes from ‘deprecated’ status.
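The idea behind a differential-expression query on SAGE data can be illustrated with a minimal sketch. It is not the method used by GiardiaDB, which is not described here: the normalization to tags per million, the pseudocount, the fold-change threshold, and all gene names, stage names and counts below are assumptions chosen for illustration only.

```python
# Illustrative sketch only: hypothetical genes, counts and thresholds.
# GiardiaDB's actual differential-expression calculation may differ.

def tags_per_million(count, library_size):
    """Scale a raw SAGE tag count to tags per million sequenced tags."""
    return count * 1_000_000 / library_size


def differentially_expressed(counts, library_sizes, stage_a, stage_b,
                             min_fold=2.0, pseudocount=1.0):
    """Return {gene: fold change} for genes whose normalized expression
    differs between stage_a and stage_b by at least min_fold in either
    direction. The pseudocount avoids division by zero."""
    hits = {}
    for gene, by_stage in counts.items():
        a = tags_per_million(by_stage.get(stage_a, 0), library_sizes[stage_a])
        b = tags_per_million(by_stage.get(stage_b, 0), library_sizes[stage_b])
        fold = (a + pseudocount) / (b + pseudocount)
        if fold >= min_fold or fold <= 1.0 / min_fold:
            hits[gene] = fold
    return hits


# Hypothetical library sizes (total tags) and per-gene tag counts.
library_sizes = {"trophozoite": 50_000, "cyst": 100_000}
counts = {
    "gene1": {"trophozoite": 10, "cyst": 5},   # higher in trophozoites
    "gene2": {"trophozoite": 5, "cyst": 10},   # equal after normalization
    "gene3": {"trophozoite": 0, "cyst": 40},   # detected in cysts only
}

hits = differentially_expressed(counts, library_sizes, "trophozoite", "cyst")
for gene, fold in sorted(hits.items()):
    print(f"{gene}\t{fold:.3f}")
```

Normalizing by library size matters because the raw counts for gene2 differ twofold while its relative abundance is the same in both libraries.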

Because Giardia has provided an extraordinarily valuable window into the evolution of eukaryotic cells, GiardiaDB also provides precomputed phylogenetic trees for 1441 genes. Giardial genes and homologs in other eukaryotic organisms were aligned using MUSCLE (11) and phylogenetic relationships inferred using MrBayes (12). While such trees cannot be computed dynamically, these precomputed trees provide a starting point for further analysis.

TrichDB

TrichDB release 1.0 is based on the 160 Mb genome sequence of the G3 isolate of T. vaginalis (4) distributed among many contigs and scaffolds. The annotated genome comprises 59 672 protein-coding genes (only 65 of which contain introns), 1136 RNA-coding genes (668 rRNA and 438 tRNA) and 38 201 ‘repeat genes’ (protein-coding genes present in high copy number). Transcript expression includes data from 11 T. vaginalis EST libraries, courtesy of TvXpress at the Chang Gung Bioinformatics Center (http://tvxpress.cgu.edu.tw/). These represent ESTs from four isolates, grown under different conditions, and from various cell cycle stages.

AVAILABLE QUERIES AND DATA-MINING TOOLS

TrichDB and GiardiaDB are both accessed via the standard EuPathDB web interface, providing a wide variety of tools for genomic database mining. In addition to BLAST (13) and pattern/motif similarity searches, users can identify genes based on genomic position; common name or keyword; gene attributes (such as gene type, or number of exons); evidence of transcript expression including ESTs (both TrichDB and GiardiaDB), SAGE tags, microarray and proteomics (GiardiaDB only); gene product annotation (such as GO function, or EC enzyme number); and predicted cellular location (based on signal peptide and transmembrane predictions). Figure 1A and B illustrate a set of queries supported in GiardiaDB (TrichDB is very similar). Query results are returned as a tabular list, with columns that users can sort, or manipulate by adding or removing attributes to be displayed (Figure 1C). Clicking on any gene identifier links to the gene record page, providing all the information associated with the gene of interest. The right-hand panel of Figure 1E shows a representative gene record page from GiardiaDB.

Figure 1.
Screenshots of queries available through GiardiaDB (similar queries are also available for TrichDB). (A) starting with the home page of GiardiaDB (or TrichDB), users may access ‘Queries & Tools’ via the green navigation bar. ( ...

The EuPathDB infrastructure also provides a set of tools leveraging these basic queries, enabling users to perform higher level operations on their query results. For example, users may use the query history functionality to combine results using the Boolean operators (AND, OR, NOT) (Figure 1D), allowing them to identify genes that possess a specified combination of attributes, such as putative kinase genes expressed in trophozoites for which either proteomics or EST evidence is available. Investigators interested in drug target discovery may wish to search for genes with EST, microarray or proteomics evidence for expression that appear likely to encode small soluble proteins assigned EC numbers or GO terms associated with catalytic activity, and that lack evident orthologs in humans or other mammals. Similar queries against TrichDB might be further refined by asking for protein coding genes that are not highly repeated. The resulting candidate list could be expanded or restricted by adding further criteria or refining parameters.
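Conceptually, combining query results with AND, OR and NOT corresponds to intersection, union and difference on sets of gene identifiers. The following minimal sketch illustrates this logic only; it does not reflect how the EuPathDB query history is implemented, and the gene identifiers and result sets are hypothetical.

```python
# Illustrative sketch only: hypothetical gene IDs standing in for the
# results of four independent queries.
kinase_annotated = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
est_evidence = {"GENE_A", "GENE_C", "GENE_E"}
proteomics_evidence = {"GENE_B", "GENE_F"}
human_orthologs = {"GENE_C"}


def combine(kinases, est, proteomics, excluded):
    """Kinases AND (EST OR proteomics) NOT excluded."""
    expressed = est | proteomics            # OR  -> set union
    supported = kinases & expressed         # AND -> set intersection
    return supported - excluded             # NOT -> set difference


candidates = combine(kinase_annotated, est_evidence,
                     proteomics_evidence, human_orthologs)
print(sorted(candidates))  # ['GENE_A', 'GENE_B']
```

Because each intermediate result is itself a set, further criteria can be applied to `candidates` in the same way, which mirrors the stepwise refinement described above.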

At any point, users may download the results of any query in various formats, including a detailed report that contains all of the data stored for each gene record, enabling further bioinformatics analysis. FASTA format allows users to simply retrieve transcript and/or protein sequences, along with flanking genomic sequences if desired. For example, users might wish to identify a set of genes and download all sequences that lie within 1000 bp upstream of each. Optional registration and logins enable users to retain their query history over time, so that these results can be further refined, combined with additional queries, or re-run at a later date. Registered users may also submit comments on any gene or sequence entity in the database, providing support for community annotation of these parasite genomes. User comments (labeled as such) are immediately available to other users of the database, and indexed for retrieval using keyword searches.
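For readers who prefer to work offline with a downloaded contig and gene coordinates, the upstream-sequence example can be sketched as follows. This is an illustration under stated assumptions, not the database's own download code: coordinates are assumed to be 1-based and inclusive, the function names are hypothetical, and the toy contig and gene positions are invented.

```python
# Illustrative sketch only: toy sequence and coordinates, hypothetical
# helper names. Coordinates are assumed 1-based and inclusive.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream(contig, start, end, strand, length=1000):
    """Return up to `length` bases upstream of a gene, reported in the
    gene's own orientation. The result is shorter than `length` when
    the gene lies near the end of the contig."""
    if strand == "+":
        lo = max(0, start - 1 - length)
        return contig[lo:start - 1]
    if strand == "-":
        hi = min(len(contig), end + length)
        return reverse_complement(contig[end:hi])
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text with wrapped lines."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


contig = "AACCGTTTGACA"
genes = [("geneF", 5, 8, "+"), ("geneR", 5, 8, "-")]
records = [(f"{name}_upstream", upstream(contig, s, e, strand, length=3))
           for name, s, e, strand in genes]
print(to_fasta(records), end="")
```

Reporting minus-strand regions as the reverse complement keeps every upstream sequence in the same 5'-to-3' orientation relative to its gene, which is usually what motif searches expect.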

THE EuPathDB PORTAL

EuPathDB provides a unified query interface for TrichDB and GiardiaDB, as well as other pathogen databases including CryptoDB (supporting three species of Cryptosporidium), PlasmoDB (six Plasmodium species) and ToxoDB (three Toxoplasma gondii strains, and the closely related species Neospora caninum). In support of functional and evolutionarily relevant studies, the organism parameter facilitates searches for ‘anaerobic protists’ (Cryptosporidium, Giardia, Trichomonas) or ‘apicomplexans’ (Cryptosporidium, Plasmodium, Toxoplasma), in addition to ‘all organisms’ or any user-defined subsets of species. All queries available on the component websites are available in EuPathDB, enabling users to leverage orthologous relationships between organisms to identify genes based on data types that may not be available for their organism of primary interest. (Ortholog functionality was not available for T. vaginalis at the time of manuscript submission, but is scheduled for the autumn 2008 release of TrichDB.)
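Leveraging orthology in this way amounts to mapping a gene list from one organism onto the members of the same ortholog groups in another. The sketch below illustrates that mapping only; the data structures, group names and gene identifiers are hypothetical and do not represent the EuPathDB or OrthoMCL schema.

```python
# Illustrative sketch only: hypothetical ortholog groups and gene IDs.

def transform_by_orthology(gene_ids, gene_to_group, group_members,
                           target_organism):
    """Map genes to their orthologs in target_organism via shared
    ortholog groups. Genes without a group are skipped."""
    result = set()
    for gene in gene_ids:
        group = gene_to_group.get(gene)
        if group is None:
            continue
        result.update(group_members[group].get(target_organism, ()))
    return result


# Ortholog group -> organism -> member genes (all names invented).
group_members = {
    "OG_1": {"organism_X": ["X_001"], "organism_Y": ["Y_101", "Y_102"]},
    "OG_2": {"organism_X": ["X_002"]},
}
gene_to_group = {
    gene: group
    for group, by_organism in group_members.items()
    for genes in by_organism.values()
    for gene in genes
}

# Genes in organism_X with some evidence type lacking in organism_Y.
with_evidence = {"X_001", "X_002", "X_003"}
orthologs = transform_by_orthology(with_evidence, gene_to_group,
                                   group_members, "organism_Y")
print(sorted(orthologs))  # ['Y_101', 'Y_102']
```

Here X_002 has no ortholog in the target organism and X_003 belongs to no group, so neither contributes to the result.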

FUTURE DIRECTIONS

GiardiaDB is expected to grow substantially over the coming year, as next-generation sequencing and assembly technologies are now being applied to three new genomes, including a second assemblage A isolate and two assemblage B isolates (14). The ability to query across related genomes will likely identify both a core set of Giardia genes, and genes that appear to be strain-specific. Proteomic data sets corresponding to various life cycle stages and subcellular fractions (particularly the ESV excretory/secretory vesicles) are also anticipated (Gillin et al., Tachezy et al., personal communication).

The next release of TrichDB is expected to incorporate several new data sets and data upgrades including proteomic, phosphoproteomic, microRNA and microarray data, new EST libraries, and transposable element annotation and categorization. MicroRNA and EST data are also expected for other trichomonads including T. tenax, T. foetus and Pentatrichomonas hominis. Additionally, we anticipate loading and providing access to the genomic sequence and annotation for a second strain of T. vaginalis (TO16). The scheduled incorporation of T. vaginalis into the OrthoMCL database of orthologous proteins (http://OrthoMCL.org) (15) will allow users to leverage orthology relationships between T. vaginalis and other protozoan parasites.

FUNDING

Federal funds from the National Institute of Allergy and Infectious Diseases; Department of Health and Human Services, National Institutes of Health (HHSN266200400037C). Funding for open access charge: National Institutes of Health (HHSN266200400037C).

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

The authors wish to thank members of the Giardia and Trichomonas research communities for their willingness to share genomic-scale datasets, often prior to publication, and for numerous comments and suggestions that have helped to improve the functionality of GiardiaDB and TrichDB. We also thank past and present staff associated with the ApiDB-BRC project, and our research laboratory colleagues whose contributions have facilitated the creation and maintenance of this database resource.

REFERENCES

1. Kucik CJ, Martin GL, Sortor BV. Common intestinal parasites. Am. Fam. Physician. 2004;69:1161–1168.
2. Global Prevalence and Incidence of Selected Curable Sexually Transmitted Infections: Overview and Estimates. (World Health Organization, Geneva, 2001). http://www.who.int/docstore/hiv/GRSTI/006.htm (July 2008, last date accessed)
3. Morrison HG, McArthur AG, Gillin FD, Aley SB, Adam RD, Olsen GJ, Best AA, Cande WZ, Chen F, Cipriano MJ, et al. Genomic minimalism in the early diverging intestinal parasite Giardia lamblia. Science. 2007;317:1921–1926.
4. Carlton JM, Hirt RP, Silva JC, Delcher AL, Schatz M, Zhao Q, Wortman JR, Bidwell SL, Alsmark UC, Besteiro S, et al. Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis. Science. 2007;315:207–212.
5. Aurrecoechea C, Heiges M, Wang H, Wang Z, Fischer S, Rhodes P, Miller J, Kraemer E, Stoeckert CJ, Jr, Roos DS, et al. ApiDB: integrated resources for the apicomplexan bioinformatics resource center. Nucleic Acids Res. 2007;35:D427–D430.
6. Heiges M, Wang H, Robinson E, Aurrecoechea C, Gao X, Kaluskar N, Rhodes P, Wang S, He CZ, Su Y, et al. CryptoDB: a Cryptosporidium bioinformatics resource update. Nucleic Acids Res. 2006;34:D419–D422.
7. Bahl A, Brunk B, Crabtree J, Fraunholz MJ, Gajria B, Grant GR, Ginsburg H, Gupta D, Kissinger JC, Labo P, et al. PlasmoDB: the Plasmodium genome resource. A database integrating experimental and computational data. Nucleic Acids Res. 2003;31:212–215.
8. Gajria B, Bahl A, Brestelli J, Dommer J, Fischer S, Gao X, Heiges M, Iodice J, Kissinger JC, Mackey AJ, et al. ToxoDB: an integrated Toxoplasma gondii database resource. Nucleic Acids Res. 2008;36:D553–D556.
9. Palm D, Weiland M, McArthur AG, Winiecka-Krusnell J, Cipriano MJ, Birkeland SR, Pacocha SE, Davids B, Gillin F, Linder E, et al. Developmental changes in the adhesive disk during Giardia differentiation. Mol. Biochem. Parasitol. 2005;141:199–207.
10. Yates JR, III, Eng JK, McCormack AL, Schieltz D. Method to correlate tandem mass spectra of modified peptides to amino acid sequences in the protein database. Anal. Chem. 1995;67:1426–1436.
11. Edgar RC. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 2004;5:113.
12. Huelsenbeck JP, Ronquist F. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics. 2001;17:754–755.
13. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J. Mol. Biol. 1990;215:403–410.
14. Thompson RC, Monis PT. Variation in Giardia: implications for taxonomy and epidemiology. Adv. Parasitol. 2004;58:69–137.
15. Chen F, Mackey AJ, Stoeckert CJ, Jr, Roos DS. OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res. 2006;34:D363–D368.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
