Nucleic Acids Res. Jan 2011; 39(Database issue): D596–D600.
Published online Oct 6, 2010. doi:  10.1093/nar/gkq869
PMCID: PMC3013766

Pseudomonas Genome Database: improved comparative analysis and population genomics capability for Pseudomonas genomes

Abstract

Pseudomonas is a metabolically diverse genus of bacteria known for its flexibility and for lifestyles that range from free-living to pathogenic in a wide range of hosts. The Pseudomonas Genome Database (http://www.pseudomonas.com) integrates completely sequenced Pseudomonas genome sequences and their annotations with genome-scale, high-precision computational predictions and manually curated annotation updates. The latest release implements an ability to view sequence polymorphisms in P. aeruginosa PAO1 versus other reference strains, incomplete genomes and single gene sequences. This aids analysis of phenotypic variation between closely related isolates and strains, as well as wider population genomics and evolutionary studies. The wide range of tools for comparing Pseudomonas annotations and sequences now includes a strain-specific access point for viewing high-precision computational predictions, including updated, more accurate protein subcellular localization and genomic island predictions. Views link to genome-scale experimental data as well as comparative genomics analyses that incorporate robust genus-geared methods for predicting and clustering orthologs. These analyses can be exploited for identifying putative essential and core Pseudomonas genes or identifying large-scale evolutionary events. The Pseudomonas Genome Database aims to provide a continually updated, high-quality source of genome annotations, specifically tailored for Pseudomonas researchers, but using an approach that may be implemented for other genus-level research communities.

INTRODUCTION

Pseudomonas is a genus of bacteria known for its metabolic capacity and ability to occupy a wide range of environments as free-living soil microorganisms or as serious opportunistic pathogens. The genus is widely studied at the genome level for several reasons: the mosaic-like structure of its members’ genomes can influence niche and degree of virulence (1–3), elements can be horizontally transferred between strains (4), and species can have notable broad-based antimicrobial resistance.

The Pseudomonas Community Annotation Project (PseudoCAP) was originally formed to meet the need for a conservative, peer-reviewed annotation of the Pseudomonas aeruginosa PAO1 genome sequence using an internet-based approach coupled with community-assisted genome annotation. This led to the development of the Pseudomonas Genome Database (http://www.pseudomonas.com), which initially was specific for P. aeruginosa (5–7). As development of the database continued in parallel with an increase in the number of Pseudomonas genomes being sequenced, PseudoCAP recognized the importance of capitalizing on insight obtained through comparative genome analysis (8). By developing comparative analyses for a specific genus like Pseudomonas, methods can be tailored to the level of divergence within that genus, generating higher precision comparisons between closely related species that would otherwise be difficult to achieve when hosting data from more diverse taxonomic sources (8). Databases including the Tuberculosis Database Project (9) and the Burkholderia Genome Database (10) similarly focus on providing tools that facilitate comparison of multiple genomes from smaller taxonomic groups, along with resources geared more specifically to the interests of the associated research community.

An overview is provided here of the latest updates to the Pseudomonas Genome Database, including a framework for viewing polymorphisms in closely-related P. aeruginosa genome sequences, strain-specific portals for accessing whole-genome data (including experimental data) and updated high-precision computational predictions and comparative genome analyses based on robust methods for predicting and clustering orthologs. The Pseudomonas Genome Database aims to provide a continually updated, curated source of genome annotations, improved precise computational predictions, genome-scale experimental data and integrated annotations from external databases, tailored specifically for the Pseudomonas research community, but using a framework that could be utilized by other similar research communities and groups.

NEW FEATURES

Ability to view polymorphisms/SNPs in closely-related Pseudomonas genomes

Next-generation sequencing technologies have already demonstrated that P. aeruginosa reference strain PAO1 isolates are undergoing microevolution contributing to multiple phenotypic differences (11). With the release of incompletely sequenced genomes outpacing completed genomes, it has become important to apply a comprehensive yet conservative approach to comparing reference genomes against these kinds of sequences. As a solution, we developed a framework for aligning partial genome sequences or single genes against the PAO1 reference sequence and indexing positions where polymorphisms are located. The MUMmer 3.22 (12) program NUCmer was used to align genomic DNA or cDNA sequences from incomplete P. aeruginosa genomes or individual polymorphic sequences available at the National Center for Biotechnology Information (NCBI), whereas the MUMmer program show-snps was used to identify SNPs and indels in non-repetitive regions. Once identified at the genomic level, polymorphisms were classified according to type of mutation (e.g. missense, silent, etc.) and location within the gene’s nucleotide and protein sequences, and the downstream effect on amino acid properties was recorded. Polymorphisms in genomic DNA, cDNA and amino acid sequences were also assigned a standardized description following recommendations made by the Human Genome Variation Society (13), a standard adopted by other SNP databases including dbSNP (14) and Ensembl (15). Access to this data is currently under the ‘polymorphisms’ tab of any P. aeruginosa PAO1 gene (or ‘gene card’) page, where an overview of point mutations is presented along with an image of the specific sequence containing hyperlinked mutations at each site. The nucleotides at these sites are colored in order to distinguish synonymous and non-synonymous mutations from insertions and deletions, and a single strain can be selected from an adjacent list in order to apply a filter that hides all other strains.
Links are provided to more detailed point mutation descriptions, experimental details, sequence or alignment downloads and the effects that a mutation would have on amino acid properties. A link to a GBrowse genome viewer (16) representation of the local genomic landscape is also provided, in which all known SNPs/indels are annotated and colored by type. This presentation can be quite useful for visualizing regions undergoing a putatively higher degree of selection, as represented by a clustering of non-synonymous mutations (for example, see OprD at http://www.pseudomonas.com/getAnnotation.do?locusID=PA0958). Finally, we offer additional SNP information including the origin of the SNP call (e.g. experimental method such as restriction digest or computational prediction based on sequence alignment) to help provide users with a level of confidence regarding a given result. This framework can be easily expanded in the future to handle a wider range of Pseudomonas species and incorporate additional information for SNP validation and quality scores.
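The classification step described above can be illustrated with a minimal Python sketch. It is not the database's own code: the function name `classify_substitution`, the returned field names and the toy coding sequence are all assumptions made for illustration. The sketch takes a single-base substitution in an in-frame coding sequence, reports whether it is silent, missense or nonsense, and builds a substitution string of the general form `c.<position><ref>><alt>` in the style of the HGVS recommendations.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(cds, pos, alt):
    """Classify a single-base substitution in an in-frame coding sequence.

    cds: coding DNA sequence (A/C/G/T), starting at the first codon position
    pos: 1-based nucleotide position of the substitution within cds
    alt: the substituted base
    Hypothetical helper for illustration only.
    """
    cds = cds.upper()
    alt = alt.upper()
    idx = pos - 1
    ref = cds[idx]
    if ref == alt:
        raise ValueError("alternate base equals reference base")
    offset = idx % 3                  # position within the codon (0, 1 or 2)
    start = idx - offset              # index of the first base of the codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        kind = "silent"
    elif alt_aa == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return {
        "type": kind,
        "cdna": f"c.{pos}{ref}>{alt}",   # HGVS-style substitution string
        "codon_number": start // 3 + 1,
        "ref_aa": ref_aa,
        "alt_aa": alt_aa,
    }


if __name__ == "__main__":
    toy_cds = "ATGGAAACC"  # made-up sequence encoding Met-Glu-Thr
    print(classify_substitution(toy_cds, 5, "T"))
    print(classify_substitution(toy_cds, 6, "G"))
```

Indels, substitutions in non-coding regions and the amino acid property comparison mentioned in the text are outside the scope of this sketch.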

New search options including a portal to strain-specific analyses and experimental data, plus expanded comparative genomic analysis capabilities

All annotations may be searched using either the simple or advanced Boolean-based search tools, which have recently been enhanced with an option to filter the list of genes returned by a specified size range or by the protein subcellular localization of the gene products. Other filters limit results to genes encoding known drug targets (based on manually curated data) or to putatively essential genes as determined by saturation transposon mutagenesis [currently P. aeruginosa only (17–19)].

We have also recently introduced a higher-level portal to strain-specific, whole-genome analyses and experimental data. By linking from the home page, one can go to an overview of any reference strain’s chromosome or plasmids with a summary of their genes, broken down by feature type or its product’s primary subcellular localization. It also serves as a starting point for other strain-related searches including browsing by function, localization, virulence factors, drug targets and genes located in genomic islands or by performing sequence searches. Under the ‘Tools’ tab, it facilitates an IUPAC-formatted DNA motif search against other Pseudomonas reference genomes to return a list of genomic features containing that motif which can be applicable to finding putative transcription factor binding sites and other interesting motifs. The ‘Experimental data’ tab provides a starting point for identifying expression data studies and linking to respective data at NCBI's Gene Expression Omnibus (20) or ArrayExpress (21) or to sequence read data at NCBI's Trace Archive or Short Reads Archive (22). Finally, the strain-specific access page provides an entry point to whole-genome comparative analyses including a high-precision set of putative orthologs and sets of human homologs which are of particular relevance to researchers investigating Pseudomonas-specific drug targets. The orthologous genes set consists of Pseudomonas genes mapped to their respective orthologs in other Pseudomonas species using our Pseudomonas Orthologous Genes (POGs) classification (8) with further assessment by a high-precision method called Ortholuge which examines phylogenetic distance ratios between two comparison species and an outgroup species (23). This Ortholuge method was also recently improved (24).
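To make the idea of an IUPAC-formatted DNA motif search concrete, the following Python sketch expands IUPAC ambiguity codes into a regular expression and scans both strands of a sequence. It is an illustration under stated assumptions, not the search implemented on the website: the function names, the returned tuple format and the example sequence are hypothetical, and the real tool reports genomic features rather than raw coordinates.

```python
import re

# IUPAC nucleotide ambiguity codes expanded to regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}
# Complement of each IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif):
    """Reverse complement of a motif that may contain IUPAC codes."""
    return motif.upper().translate(COMPLEMENT)[::-1]


def iupac_to_regex(motif):
    """Translate an IUPAC motif into a regular-expression string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(sequence, motif):
    """Return sorted (1-based start, strand) pairs for every match.

    Minus-strand hits are reported by the leftmost coordinate on the
    plus strand. A lookahead is used so overlapping matches are found.
    """
    sequence = sequence.upper()
    hits = set()
    for strand, query in (("+", motif), ("-", reverse_complement(motif))):
        pattern = re.compile("(?=(" + iupac_to_regex(query) + "))")
        for match in pattern.finditer(sequence):
            hits.add((match.start() + 1, strand))
    return sorted(hits)


if __name__ == "__main__":
    # Made-up sequence and motif, for illustration only.
    print(find_motif("CCTTGACAGGTGTCAAGG", "TTGACR"))
```

Mapping each hit back to an annotated feature would then require the genome's feature coordinates, which are not modelled here.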

We have recently added a search feature that enables users to perform their own comparative analysis of putatively orthologous genes. This should help address questions such as which orthologous genes are present in one set of Pseudomonas genomes but absent from another, or which genes in one strain have no orthologs in the other strains. To perform this analysis, users simply select from a list of genomes the species for which they would like orthologs returned. They can optionally limit results to putatively essential genes or genes with/without human homologs, and can further restrict results to those matching multiple keywords specified in a Boolean search form.
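The two kinds of question mentioned above reduce to set operations over ortholog groups. The Python sketch below shows one way this could be expressed; the data layout (a dictionary mapping each group identifier to the strains and locus tags it contains), the function names and all identifiers in the toy data are assumptions for illustration and do not reflect the database's internal schema.

```python
def groups_present_only_in(ortholog_groups, include, exclude):
    """Ortholog groups with a member in every 'include' strain and in no
    'exclude' strain.

    ortholog_groups: dict mapping group id -> dict of strain -> locus tags
    (hypothetical layout for illustration).
    """
    include = set(include)
    exclude = set(exclude)
    selected = {}
    for group_id, members in ortholog_groups.items():
        strains = set(members)
        if include <= strains and strains.isdisjoint(exclude):
            selected[group_id] = {s: members[s] for s in sorted(include)}
    return selected


def genes_without_orthologs(ortholog_groups, strain, others):
    """Locus tags from 'strain' whose group has no member in any 'others'."""
    others = set(others)
    unique = []
    for members in ortholog_groups.values():
        if strain in members and others.isdisjoint(members):
            unique.extend(members[strain])
    return sorted(unique)


if __name__ == "__main__":
    # Toy data with made-up group ids, strain names and locus tags.
    toy_groups = {
        "group1": {"strainA": ["A_0001"], "strainB": ["B_0001"], "strainC": ["C_0001"]},
        "group2": {"strainA": ["A_0002"], "strainB": ["B_0002"]},
        "group3": {"strainA": ["A_0003"]},
    }
    print(groups_present_only_in(toy_groups, ["strainA", "strainB"], ["strainC"]))
    print(genes_without_orthologs(toy_groups, "strainA", ["strainB", "strainC"]))
```

Additional filters such as essentiality or human homology would simply be further predicates applied to each group before it is returned.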

Precise computational predictions added: improved protein subcellular localization prediction and identification of genomic islands

Our desire to make high quality Pseudomonas annotations available has led to the integration of computational predictions based on high-precision methods. We have recently updated our computational predictions of protein subcellular localization data based on PSORTb version 3.0 software (25) containing enhancements over our very successful PSORTb version 2.0 (26). PSORTb 3.0 adds several new sub-category localizations related to bacterial organelles while differentiating proteins targeted to a host cell. The new version also exhibits higher sensitivity and genome prediction coverage compared with the previous version, with an average of 15% genome coverage increase for Gram-negative species over PSORTb 2.0. For P. aeruginosa annotations, additional experimentally demonstrated localizations in P. aeruginosa or highly similar proteins from closely related species are made available in place of PSORTb predictions and are assigned a value based on the degree of confidence in the localization.

Pseudomonas genomes are a mosaic of horizontally-transferred genes (e.g. arising from genomic islands) which play an important role in its species’ adaptation to environmental niches. To identify genomic islands (GIs), our lab developed IslandViewer (27), a computational tool integrating two sequence composition GI prediction methods, SIGI-HMM (28) and IslandPath-DIMOB (29), with a comparative GI prediction method, IslandPick (30). Views of the data in IslandViewer have been integrated into the Pseudomonas Genome Database from a ‘browse genomic island’ page and from individual strain summary pages. These sections link to IslandViewer website pages containing circular chromosome images overlaid with details of genomic islands identified by the various methods with links to download lists of more detailed results about genes within islands, for further analysis.
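When several prediction methods each report genomic island coordinates, a common first step in comparing them is to take the union of overlapping regions and record which methods support each region. The Python sketch below illustrates that idea only; it is not a description of how IslandViewer combines its results, and the function name, the input layout and the coordinates are assumptions.

```python
def merge_island_predictions(predictions):
    """Merge overlapping island intervals reported by different methods.

    predictions: dict mapping method name -> list of (start, end) tuples,
    using 1-based inclusive coordinates (hypothetical layout).
    Returns a list of (start, end, supporting_methods) tuples, sorted by start.
    """
    intervals = sorted(
        (start, end, method)
        for method, coords in predictions.items()
        for start, end in coords
    )
    merged = []
    for start, end, method in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: extend it and note the method.
            merged[-1][1] = max(merged[-1][1], end)
            merged[-1][2].add(method)
        else:
            merged.append([start, end, {method}])
    return [(s, e, sorted(methods)) for s, e, methods in merged]


if __name__ == "__main__":
    # Made-up coordinates and placeholder method names.
    toy = {
        "method_a": [(100, 500), (2000, 2500)],
        "method_b": [(400, 900)],
        "method_c": [(2400, 3000), (5000, 5200)],
    }
    for region in merge_island_predictions(toy):
        print(region)
```

Regions supported by more than one method can then be prioritized for closer inspection, while single-method regions remain available for review.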

Integration of analyses from external sources

For all genomes, we now provide precise operon predictions based on the Database of Prokaryotic Operons (DOOR), rated one of the best programs for operon prediction (31,32). We also incorporated updated Rho-independent transcription terminator predictions using TransTermHP (33) and computationally-identified inverted repeats (palindromes) using EMBOSS’s Palindrome software (34). The database also hosts data from three transposon mutant libraries based on P. aeruginosa PAO1 (17,18) and P. aeruginosa PA14 (19) and will continue to add relevant data from these and other strains as it arises in order to form a foundation for identifying putative essential genes.

Curated updates to genome annotations

Our continual updates to genome annotations come from a variety of sources including in-house curation, submission from members of the Pseudomonas research community, curators belonging to other Pseudomonas sequencing projects and directly from NCBI, where many genome centers directly submit their annotation updates. Since 2001, more than 2000 annotation updates have been made to the P. aeruginosa PAO1 annotation through a variety of curation methods described earlier (7). Annotation updates are also made to other Pseudomonas genomes in the database through the process of contacting curators at other genome centers including those responsible for the ongoing annotation of the P. syringae DC3000 (35), P. aeruginosa PA14 (2) and P. aeruginosa LESB58 (36) reference strains. The quality of annotations available and flexibility of ways to view results is widely recognized. Large annotation databases, including the Comprehensive Microbial Resource (J. Craig Venter Institute) (37), UniProt (38) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) (39), as well as Pseudomonas-specific resources including the P. syringae project and Systomonas (40) link to views of data on our website or have requested our curated data sets of updated gene annotation information.

FUTURE DEVELOPMENT

In addition to continually updating genome annotations with a focus on P. aeruginosa curation, we will continue to add new strains and update other Pseudomonas species annotations from publicly available repositories and Pseudomonas genome sequencing centers. We plan to extend the scope of the database to include a system for viewing and interrogating microarray expression data and RNA-seq-based transcriptome data in a more sophisticated manner. We are also developing a parallel interaction database using the InnateDB framework we previously developed for human and mouse innate immunity (and larger proteome) protein–protein interactions (41). Throughout these and other future efforts, we aim to continue to provide a high quality, user-friendly and powerful resource for the Pseudomonas research community.

AVAILABILITY

All features of this database are fully accessible to the public. The source code is freely available under the GNU GPL license.

FUNDING

The Cystic Fibrosis Foundation (Cystic Fibrosis Foundation Therapeutics) with additional support for some tool development by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the SFU Community Trust; Michael Smith Foundation for Health Research (MSFHR), Junior Graduate Studentship Award (to M.D.W.); F.S.L.B. is a MSFHR Senior Scholar and R.E.W.H. holds a Canada Research Chair. Funding for open access charge: Cystic Fibrosis Foundation and SFU Community Trust Endowment Fund.

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

We thank Dr Jens Klockgether (Hannover Medical School, Germany) and Dr Shannan Ho Sui (SFU, Canada) for their feedback on the website presentation of SNP data. We also thank all 150 community annotation update participants (listed at http://www.pseudomonas.com/researchList.jsp) for their valuable contributions and all the Pseudomonas genome projects, without which this database would not be possible.

REFERENCES

1. Mathee K, Narasimhan G, Valdes C, Qiu X, Matewish JM, Koehrsen M, Rokas A, Yandava CN, Engels R, Zeng E, et al. Dynamics of Pseudomonas aeruginosa genome evolution. Proc. Natl Acad. Sci. USA. 2008;105:3100–3105. [PMC free article] [PubMed]
2. Lee DG, Urbach JM, Wu G, Liberati NT, Feinbaum RL, Miyata S, Diggins LT, He J, Saucier M, Deziel E, et al. Genomic analysis reveals that Pseudomonas aeruginosa virulence is combinatorial. Genome Biol. 2006;7:R90. [PMC free article] [PubMed]
3. Joardar V, Lindeberg M, Jackson RW, Selengut J, Dodson R, Brinkac LM, Daugherty SC, Deboy R, Durkin AS, Giglio MG, et al. Whole-genome sequence analysis of Pseudomonas syringae pv. phaseolicola 1448A reveals divergence among pathovars in genes involved in virulence and transposition. J. Bacteriol. 2005;187:6488–6498. [PMC free article] [PubMed]
4. Qiu X, Gurkar AU, Lory S. Interstrain transfer of the large pathogenicity island (PAPI-1) of Pseudomonas aeruginosa. Proc. Natl Acad. Sci. USA. 2006;103:19830–19835. [PMC free article] [PubMed]
5. Brinkman FS, Hancock RE, Stover CK. Sequencing solution: use volunteer annotators organized via internet. Nature. 2000;406:933. [PubMed]
6. Stover CK, Pham XQ, Erwin AL, Mizoguchi SD, Warrener P, Hickey MJ, Brinkman FS, Hufnagle WO, Kowalik DJ, Lagrou M, et al. Complete genome sequence of Pseudomonas aeruginosa PAO1, an opportunistic pathogen. Nature. 2000;406:959–964. [PubMed]
7. Winsor GL, Lo R, Sui SJ, Ung KS, Huang S, Cheng D, Ching WK, Hancock RE, Brinkman FS. Pseudomonas aeruginosa genome database and PseudoCAP: facilitating community-based, continually updated, genome annotation. Nucleic Acids Res. 2005;33:D338–D343. [PMC free article] [PubMed]
8. Winsor GL, Van Rossum T, Lo R, Khaira B, Whiteside MD, Hancock RE, Brinkman FS. Pseudomonas genome database: facilitating user-friendly, comprehensive comparisons of microbial genomes. Nucleic Acids Res. 2009;37:D483–D488. [PMC free article] [PubMed]
9. Reddy TB, Riley R, Wymore F, Montgomery P, DeCaprio D, Engels R, Gellesch M, Hubble J, Jen D, Jin H, et al. TB database: an integrated platform for tuberculosis research. Nucleic Acids Res. 2009;37:D499–D508. [PMC free article] [PubMed]
10. Winsor GL, Khaira B, Van Rossum T, Lo R, Whiteside MD, Brinkman FS. The Burkholderia genome database: facilitating flexible queries and comparative analyses. Bioinformatics. 2008;24:2803–2804. [PMC free article] [PubMed]
11. Klockgether J, Munder A, Neugebauer J, Davenport CF, Stanke F, Larbig KD, Heeb S, Schock U, Pohl TM, Wiehlmann L, et al. Genome diversity of Pseudomonas aeruginosa PAO1 laboratory strains. J. Bacteriol. 2010;192:1113–1121. [PMC free article] [PubMed]
12. Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL. Versatile and open software for comparing large genomes. Genome Biol. 2004;5:R12. [PMC free article] [PubMed]
13. Ogino S, Gulley ML, den Dunnen JT, Wilson RB. Association for Molecular Pathology Training and Education Committee. Standard mutation nomenclature in molecular diagnostics: practical and educational challenges. J. Mol. Diagn. 2007;9:1–6. [PMC free article] [PubMed]
14. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001;29:308–311. [PMC free article] [PubMed]
15. Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J, Curwen V, Down T, et al. The Ensembl genome database project. Nucleic Acids Res. 2002;30:38–41. [PMC free article] [PubMed]
16. Stein LD, Mungall C, Shu S, Caudy M, Mangone M, Day A, Nickerson E, Stajich JE, Harris TW, Arva A, et al. The generic genome browser: a building block for a model organism system database. Genome Res. 2002;12:1599–1610. [PMC free article] [PubMed]
17. Jacobs MA, Alwood A, Thaipisuttikul I, Spencer D, Haugen E, Ernst S, Will O, Kaul R, Raymond C, Levy R, et al. Comprehensive transposon mutant library of Pseudomonas aeruginosa. Proc. Natl Acad. Sci. USA. 2003;100:14339–14344. [PMC free article] [PubMed]
18. Lewenza S, Falsafi RK, Winsor G, Gooderham WJ, McPhee JB, Brinkman FS, Hancock RE. Construction of a mini-Tn5-luxCDABE mutant library in Pseudomonas aeruginosa PAO1: a tool for identifying differentially regulated genes. Genome Res. 2005;15:583–589. [PMC free article] [PubMed]
19. Liberati NT, Urbach JM, Miyata S, Lee DG, Drenkard E, Wu G, Villanueva J, Wei T, Ausubel FM. An ordered, nonredundant library of Pseudomonas aeruginosa strain PA14 transposon insertion mutants. Proc. Natl Acad. Sci. USA. 2006;103:2833–2838. [PMC free article] [PubMed]
20. Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C, Kim IF, Soboleva A, Tomashevsky M, Marshall KA, et al. NCBI GEO: archive for high-throughput functional genomic data. Nucleic Acids Res. 2009;37:D885–D890. [PMC free article] [PubMed]
21. Parkinson H, Kapushesky M, Kolesnikov N, Rustici G, Shojatalab M, Abeygunawardena N, Berube H, Dylag M, Emam I, Farne A, et al. ArrayExpress update–from an archive of functional genomics experiments to the atlas of gene expression. Nucleic Acids Res. 2009;37:D868–D872. [PMC free article] [PubMed]
22. Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K, Chetvernin V, Church DM, Dicuccio M, Federhen S, et al. Database resources of the national center for biotechnology information. Nucleic Acids Res. 2010;38:D5–D16. [PMC free article] [PubMed]
23. Fulton DL, Li YY, Laird MR, Horsman BG, Roche FM, Brinkman FS. Improving the specificity of high-throughput ortholog prediction. BMC Bioinformatics. 2006;7:270. [PMC free article] [PubMed]
24. Min JE, Whiteside MD, Brinkman FSL, McNeney B, Graham J. A statistical approach to high-throughput screening of predicted orthologs. Comput. Stat. Data Anal. 2011;55:935–943.
25. Yu NY, Wagner JR, Laird MR, Melli G, Rey S, Lo R, Dao P, Sahinalp SC, Ester M, Foster LJ, et al. PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes. Bioinformatics. 2010;26:1608–1615. [PMC free article] [PubMed]
26. Gardy JL, Laird M, Chen F, Rey S, Walsh CJ, Tusnády GE, Ester M, Brinkman FSL. PSORT-B v.2.0: expanded prediction of bacterial protein subcellular localization and insights gained from comparative proteome analysis. Bioinformatics. 2005;21:617–623. [PubMed]
27. Langille MG, Brinkman FS. IslandViewer: an integrated interface for computational identification and visualization of genomic islands. Bioinformatics. 2009;25:664–665. [PMC free article] [PubMed]
28. Waack S, Keller O, Asper R, Brodag T, Damm C, Fricke WF, Surovcik K, Meinicke P, Merkl R. Score-based prediction of genomic islands in prokaryotic genomes using hidden markov models. BMC Bioinformatics. 2006;7:142. [PMC free article] [PubMed]
29. Hsiao W, Wan I, Jones SJ, Brinkman FS. IslandPath: aiding detection of genomic islands in prokaryotes. Bioinformatics. 2003;19:418–420. [PubMed]
30. Langille MG, Hsiao WW, Brinkman FS. Evaluation of genomic island predictors using a comparative genomics approach. BMC Bioinformatics. 2008;9:329. [PMC free article] [PubMed]
31. Dam P, Olman V, Harris K, Su Z, Xu Y. Operon prediction using both genome-specific and general genomic information. Nucleic Acids Res. 2007;35:288–298. [PMC free article] [PubMed]
32. Brouwer RW, Kuipers OP, van Hijum SA. The relative value of operon predictions. Brief Bioinform. 2008;9:367–375. [PubMed]
33. Kingsford CL, Ayanbule K, Salzberg SL. Rapid, accurate, computational discovery of rho-independent transcription terminators illuminates their relationship to DNA uptake. Genome Biol. 2007;8:R22. [PMC free article] [PubMed]
34. Rice P, Longden I, Bleasby A. EMBOSS: the European molecular biology open software suite. Trends Genet. 2000;16:276–277. [PubMed]
35. Buell CR, Joardar V, Lindeberg M, Selengut J, Paulsen IT, Gwinn ML, Dodson RJ, Deboy RT, Durkin AS, Kolonay JF, et al. The complete genome sequence of the Arabidopsis and tomato pathogen Pseudomonas syringae pv. tomato DC3000. Proc. Natl Acad. Sci. USA. 2003;100:10181–10186. [PMC free article] [PubMed]
36. Winstanley C, Langille MG, Fothergill JL, Kukavica-Ibrulj I, Paradis-Bleau C, Sanschagrin F, Thomson NR, Winsor GL, Quail MA, Lennard N, et al. Newly introduced genomic prophage islands are critical determinants of in vivo competitiveness in the liverpool epidemic strain of Pseudomonas aeruginosa. Genome Res. 2009;19:12–23. [PMC free article] [PubMed]
37. Peterson JD, Umayam LA, Dickinson T, Hickey EK, White O. The comprehensive microbial resource. Nucleic Acids Res. 2001;29:123–125. [PMC free article] [PubMed]
38. UniProt Consortium. The universal protein resource (UniProt) 2009. Nucleic Acids Res. 2009;37:D169–D174. [PMC free article] [PubMed]
39. Kanehisa M, Araki M, Goto S, Hattori M, Hirakawa M, Itoh M, Katayama T, Kawashima S, Okuda S, Tokimatsu T, et al. KEGG for linking genomes to life and the environment. Nucleic Acids Res. 2008;36:D480–D484. [PMC free article] [PubMed]
40. Choi C, Munch R, Leupold S, Klein J, Siegel I, Thielen B, Benkert B, Kucklick M, Schobert M, Barthelmes J, et al. SYSTOMONAS–an integrated database for systems biology analysis of Pseudomonas. Nucleic Acids Res. 2007;35:D533–D537. [PMC free article] [PubMed]
41. Lynn DJ, Winsor GL, Chan C, Richard N, Laird MR, Barsky A, Gardy JL, Roche FM, Chan TH, Shah N, et al. InnateDB: facilitating systems-level analyses of the mammalian innate immune response. Mol. Syst. Biol. 2008;4:218. [PMC free article] [PubMed]
