Copyright © 2008 The Author(s)

dictyBase—a Dictyostelium bioinformatics resource update

1 dictyBase, Northwestern University Biomedical Informatics Center and Center for Genetic Medicine, Chicago, IL 60611, USA
2 Faculty of Computer and Information Science, University of Ljubljana, Slovenia
3 Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA

*To whom correspondence should be addressed. Tel: +1 312 503 2303; Fax: +1 312 503 5603; Email: r-chisholm/at/northwestern.edu
Correspondence may also be addressed to Pascale Gaudet. Tel: +1 312 503 2303; Fax: +1 312 503 5603; Email: pgaudet/at/northwestern.edu

Received September 10, 2008; Revised October 14, 2008; Accepted October 15, 2008.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

dictyBase (http://dictybase.org) is the model organism database for Dictyostelium discoideum. It houses the complete genome sequence, ESTs and the entire body of literature relevant to Dictyostelium. This information is curated to provide accurate gene models and functional annotations, with the goal of fully annotating the genome. This dictyBase update describes the annotations and features implemented since 2006, including improved strain and phenotype representation, integration of predicted transcriptional regulatory elements, protein domain information, biochemical pathways, improved searching and a wiki tool that allows members of the research community to provide annotations.

INTRODUCTION

Dictyostelium discoideum is a eukaryotic microorganism that exhibits a rather unusual life cycle characterized by a unicellular stage and a facultative multicellular stage, earning it the nickname ‘social amoeba’. This relatively complex behavior for a simple organism makes it an informative system in which to study cellular processes relevant to higher eukaryotes such as cell motility, cell to cell signaling, interspecies interactions and mechanism of drug action, to name a few recent important scientific contributions made using Dictyostelium (1).

dictyBase (http://dictybase.org) is the manually annotated genome database for Dictyostelium. It contains the entire 34 Mb nuclear genome sequence of the commonly used haploid laboratory strain, AX4 (2), the 55-kb mitochondrial genome (3), the extrachromosomal ribosomal RNA genes (4) and over 162 000 EST sequences (5, Urushihara, H., unpublished data). In addition, all relevant literature is integrated in the database, linked to the appropriate genes and used to annotate gene product functions, strains and mutant phenotypes, and to associate gene ontology terms with gene products.

Here, we describe the new annotations and features that have been implemented in dictyBase since our last report in 2006 (6): a new system for the annotation of strains and phenotypes, the integration of predicted transcriptional regulatory elements, the display of protein domains on the Gene Page and the annotation of biochemical pathways with the dictyCyc tool based on the Pathway Tools software (7). We have also improved search abilities and are now providing a wiki for researchers to share information about Dictyostelium genes with other users.

NEW DATA AND ANNOTATIONS

Gene annotations in dictyBase as of September 2008 are shown in Table 1. In addition to the extraction of biological information from the literature, one of the priorities at dictyBase is to manually review every gene model.
All available evidence (ESTs, published sequences, sequence similarity) is taken into account to produce the best possible gene model, which is labeled ‘Curated Model’. More than 40% of the predicted genes have been individually inspected. During the gene model curation process, we occasionally encounter pseudogenes as well as splice variants, two types of genes that are difficult to detect with gene prediction software.
Several types of nonprotein coding genes have been annotated. We have analyzed the Dictyostelium genome for putative tRNA genes using tRNAscan-SE software (8). We also collaborated with the Soderbom group (9) to perform a fully automated load of nonprotein coding RNAs such as snoRNAs, signal recognition particle RNAs and snRNAs, most of which have been experimentally verified. The genome browser can be configured to view these various RNAs by turning on the tRNA and the ncRNA tracks.

STRAINS AND PHENOTYPES

Mutational analyses are widely used to study a gene's function or elucidate important biological pathways. Moreover, many diseases are caused by mutations in genes, and model organisms often provide essential insight into the molecular mechanisms of diseases. Mutations in conserved genes often cause similar phenotypes in all organisms that share comparable processes, while other phenotypic manifestations are distinctive to certain organisms. Thus, phenotype annotation poses a unique challenge: the annotations need to use a similar structure in order to be shared while describing the anatomical and behavioral features specific to every organism.

To provide consistent annotations, dictyBase employs a precomposed phenotype ontology based on the EQ syntax developed by the National Center for Biomedical Ontology (10). There are two parts to the phenotype ontology: the entity (E) changed in the mutant, and a quality (Q) describing that modification. For example, a ‘small spore’ phenotype qualifies the spore (entity) as having ‘decreased size’ (quality) (Figure 1
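
The entity and quality pairing can be sketched as a small data type. This is only an illustration of the EQ idea, not the dictyBase schema or ontology format; the class name `EQPhenotype` and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EQPhenotype:
    """A precomposed phenotype term: an entity (E) plus a quality (Q)."""

    label: str    # human-readable precomposed term
    entity: str   # what is changed in the mutant
    quality: str  # how it is changed

    def describe(self) -> str:
        return f"{self.label}: {self.entity} has {self.quality}"


# The 'small spore' example from the text.
small_spore = EQPhenotype(label="small spore", entity="spore", quality="decreased size")
print(small_spore.describe())
```

Keeping the entity and the quality as separate fields is what would let annotations from different organisms be compared on one part (for example, the quality) while the other part stays organism-specific.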
Because phenotypes are characteristics of strains rather than genes, the database schema was modified so that phenotypes are linked to strains, which in turn are associated with the appropriate gene(s). Collecting all available information, dictyBase curators annotate strains from the published literature; subsequently, they annotate the phenotypes displayed by the mutant strain. For example, the dhkK− strain is associated with the phenotypes ‘aberrant slug migration’, ‘delayed culmination’ and ‘delayed gene expression’ (Figure 1

REGULATORY ELEMENTS

Putative transcriptional regulatory elements have been identified by analyzing genome-based motif information from promoter regions and expression data of about 3600 genes measured in wild-type cells and in fourteen different mutant strains. For every gene, the relation between promoter structure and gene expression in wild-type and mutant cells was identified using a data mining approach called rule-based clustering. This method finds groups of similarly expressed genes with distinct structural similarities in their regulatory regions. It uses a heuristic search, similar to that of the well-known CN2 algorithm (14). Regulatory element patterns are presented as logical expressions that include the assertions on the presence of regulatory elements, their orientation, their distance to other elements and to the first ATG. Details on the methodology can be found at http://dictybase.org/promoters/query.html. The predicted transcriptional regulatory elements are integrated within the dictyBase Genome Browser, presented as an additional Genome Browser track called ‘Putative TF binding sites’ (Figure 2
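
To make the notion of a regulatory element pattern as a logical expression more concrete, the sketch below evaluates one such pattern against a promoter. It does not implement rule-based clustering or the CN2-like search; it only shows how a conjunction of assertions about presence, orientation and distance might be checked. The motif names, thresholds and the `MotifHit` type are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotifHit:
    name: str          # motif identifier (hypothetical)
    atg_distance: int  # bases upstream of the first ATG
    strand: str        # "+" or "-"


def has_motif(hits, name, strand=None, max_atg_distance=None):
    """True if some hit matches the name and the optional constraints."""
    for h in hits:
        if h.name != name:
            continue
        if strand is not None and h.strand != strand:
            continue
        if max_atg_distance is not None and h.atg_distance > max_atg_distance:
            continue
        return True
    return False


def within(hits, name_a, name_b, max_gap):
    """True if some hit of name_a lies within max_gap bases of some hit of name_b."""
    return any(
        abs(a.atg_distance - b.atg_distance) <= max_gap
        for a in hits if a.name == name_a
        for b in hits if b.name == name_b
    )


def example_rule(hits):
    # Conjunction of three assertions: presence with a given orientation,
    # distance to the first ATG, and distance between two elements.
    return (
        has_motif(hits, "motifA", strand="+")
        and has_motif(hits, "motifB", max_atg_distance=300)
        and within(hits, "motifA", "motifB", max_gap=100)
    )


promoter = [MotifHit("motifA", 220, "+"), MotifHit("motifB", 150, "-")]
print(example_rule(promoter))
```

A gene whose promoter satisfies the expression would belong to the group the pattern describes; the actual method additionally requires that genes in the group be similarly expressed.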

PROTEIN DOMAINS

dictyBase gene pages now contain a graphical display of InterPro protein domains (15); Figure 3

IMPROVED ACCESS TO DATA: dictyMart and DOWNLOADS PAGE

We have expanded the database fields that can be searched using the ‘Search dictyBase’ tool to allow searching of ESTs, gene descriptions, plasmids, strains and phenotypes, in addition to gene names, gene product names, gene descriptions, Gene Ontology terms, dictyBase IDs, GenBank accession numbers, authors, colleagues and web pages. For complex queries, we have used BioMart (18), the open source package from the European Bioinformatics Institute (EBI), to create dictyMart, which allows users to combine search criteria to generate custom data sets. dictyMart provides a graphical interface that allows searching for a gene list based on a defined set of dictyBase IDs, common Gene Ontology annotations or chromosomal location. The query output can be specified by selecting several combinations of gene names, identifiers and functional annotations or protein and DNA sequences. In the latter case, it is possible to view only upstream regions, coding sequences, or the complete genomic region. dictyMart is accessible at http://dictybase.org/biomart/martview.

In addition to custom data sets acquired through dictyMart, dictyBase has an extensive download environment where a large collection of up-to-date data can be readily accessed (http://dictybase.org/Downloads). Data include gene information such as gene names and protein products, sequences and sequence annotations (GFF3 format), protein domains, mutant phenotypes, GO annotations and publications.

dictyCyc: BIOCHEMICAL PATHWAYS

dictyCyc provides visualization of predicted biochemical pathways in dictyBase. Pathways are assigned based on curated gene product names matching entries in the MetaCyc database (7). These matches are used to generate associations between Dictyostelium genes and biochemical pathways shared among different organisms.
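
The name-based assignment can be illustrated with a toy sketch. This is a deliberate simplification under stated assumptions, not the Pathway Tools prediction procedure: it matches normalized product names exactly against enzyme names listed for each pathway. The function names, gene identifiers and input dictionaries are made up for illustration.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def assign_pathways(gene_products, pathway_enzymes):
    """Map each gene to the pathways whose enzyme names match its product name."""
    by_enzyme = defaultdict(set)
    for pathway, enzymes in pathway_enzymes.items():
        for enzyme in enzymes:
            by_enzyme[normalize(enzyme)].add(pathway)
    return {
        gene: sorted(by_enzyme.get(normalize(product), set()))
        for gene, product in gene_products.items()
    }


# Toy inputs; the identifiers are placeholders, not dictyBase IDs.
gene_products = {"gene_1": "Hexokinase", "gene_2": "unknown protein"}
pathway_enzymes = {"glycolysis": ["hexokinase", "pyruvate kinase"]}

print(assign_pathways(gene_products, pathway_enzymes))
```

Because the match depends entirely on the product name, the quality of the pathway assignments in a scheme like this follows directly from how carefully gene product names are curated.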
dictyCyc includes software to generate a graphical depiction of the pathways featuring all the reactions, reactants, enzymes and protein complexes involved in a pathway as well as the genes encoding each known protein subunit. The Gene Page shows the names of biochemical pathways in which a gene's product is involved (Supplementary Figure S1A). The link takes the user to a clickable graphical interface displaying all other enzymes in the pathway and linking to genes encoding those enzymes (Supplementary Figure S1B). Pathways are predicted and displayed by the Pathway Tools software from SRI (7) and the dictyCyc main page is at http://dictybase.org/Dicty_Info/dictycyc_info.html. This page can also be accessed from the Biochemical Pathways link under the Research Tools menu of the dictyBase page.

COMMUNITY ANNOTATIONS

A new feature in dictyBase is the ability for members of the Dictyostelium research community to directly and immediately add annotations to genes in dictyBase. Each gene page is linked to a corresponding wiki page and a link in red alerts the user that a community annotation is available. The wiki can also be directly accessed at http://wiki.dictybase.org. Users have already entered information such as predicted protein function, suggestions for gene and protein nomenclature and unpublished experimental data such as pictures from mutants. We have found this to be a valuable forum for community input into dictyBase and whenever possible, curators use this information to improve the gene's annotations. The community annotation section was developed using the MediaWiki software, the same wiki software used to develop Wikipedia. The utility of wiki as an annotation tool in biology has recently been a topic receiving much discussion (19–21).

CONCLUSION/FUTURE DIRECTIONS

dictyBase provides a focal point for the integration of all information related to Dictyostelium research. The data presented aim to be comprehensive, accurate and easy to access.
We will continue to provide new data sets and tools for the research community. Another goal is to expand the scope of external resources where Dictyostelium genes and gene products are represented, which currently includes the Gene Ontology, GenBank, UniProt and orthology analysis tools (InParanoid, OrthoMCL). As the availability of other amoebae genome sequences is close at hand, dictyBase looks forward to becoming the central genome database resource for those other amoebae. We have already created the infrastructure to house multiple genomes under the dictyBase umbrella and plan to develop tools for comparative genomics.

FUNDING

National Institutes of Health (GM64426 and HG0022) (R.L.C.), (P01-HD39691) (T.C., B.Z., G.S.) and the Slovenian Research Agency (P2-0209, J2-9699) (T.C., B.Z.) (partial). Funding for open access charge: National Institutes of Health.

Conflict of interest statement. None declared.

Supplementary Data are available at NAR Online.

REFERENCES

1. Gaudet P, Fey P, Chisholm RL. Emerging Model Organisms. Cold Spring Harbor, NY: Cold Spring Harbor Laboratories Press; 2008. Dictyostelium discoideum: The Social Amoeba.
2. Eichinger L, Pachebat JA, Glockner G, Rajandream MA, Sucgang R, Berriman M, Song J, Olsen R, Szafranski K, Xu Q, et al. The genome of the social amoeba Dictyostelium discoideum. Nature. 2005;435:43–57.
3. Ogawa S, Yoshino R, Angata K, Iwamoto M, Pi M, Kuroe K, Matsuo K, Morio T, Urushihara H, Yanagisawa K, et al. The mitochondrial DNA of Dictyostelium discoideum: complete sequence, gene content and genome organization. Mol. Gen. Genet. 2000;263:514–519.
4. Sucgang R, Chen G, Liu W, Lindsay R, Lu J, Muzny D, Shaulsky G, Loomis W, Gibbs R, Kuspa A. Sequence and structure of the extrachromosomal palindrome encoding the ribosomal RNA genes in Dictyostelium. Nucleic Acids Res. 2003;31:2361–2368.
5. Urushihara H, Morio T, Tanaka Y. The cDNA sequencing project. Methods Mol. Biol. 2006;346:31–49.
6. Chisholm RL, Gaudet P, Just EM, Pilcher KE, Fey P, Merchant SN, Kibbe WA. dictyBase, the model organism database for Dictyostelium discoideum. Nucleic Acids Res. 2006;34:D423–D427.
7. Caspi R, Foerster H, Fulcher CA, Kaipa P, Krummenacker M, Latendresse M, Paley S, Rhee SY, Shearer AG, Tissier C, et al. MetaCyc Database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Res. 2008;36:D623–D631.
8. Lowe TM, Eddy SR. A program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25:955–964.
9. Aspegren A, Hinas A, Larsson P, Larsson A, Soderbom F. Novel non-coding RNAs in Dictyostelium discoideum and their expression during development. Nucleic Acids Res. 2004;32:4646–4656.
10. Mabee PM, Ashburner M, Cronk Q, Gkoutos GV, Haendel M, Segerdell E, Mungall C, Westerfield M. Phenotype ontologies: the bridge between genomics and evolution. Trends Ecol. Evol. 2007;22:345–350.
11. Consortium TGO. The Gene Ontology project in 2008. Nucleic Acids Res. 2008;36:D440–D444.
12. Degtyarenko K, deMatos P, Ennis M, Hastings J, Zbinden M, McNaught A, Alcántara R, Darsow M, Guedj M, Ashburner M. ChEBI: a database and ontology for chemical entities of biological interest. Nucleic Acids Res. 2008;36:D344–D350.
13. Gaudet P, Williams JG, Fey P, Chisholm RL. An anatomy ontology to represent biological knowledge in Dictyostelium discoideum. BMC Genomics. 2008;9:130–141.
14. Clark P, Niblett T. The CN2 induction algorithm. Machine Learning. 1989;3:261–283.
15. Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Buillard V, Cerutti L, Copley R, et al. New developments in the InterPro database. Nucleic Acids Res. 2007;35:D224–D228.
16. Consortium TU. The Universal Protein Resource (UniProt). Nucleic Acids Res. 2007;35:D193–D197.
17. Dowell RD, Jokerst RM, Day A, Eddy SR, Stein L. The distributed annotation system. BMC Bioinformatics. 2001;2:7–13.
18. Kasprzyk A, Keefe D, Smedley D, London D, Spooner W, Melsopp C, Hammond M, Rocca-Serra P, Cox T, Birney E. EnsMart: a generic system for fast and flexible access to biological data. Genome Res. 2004;14:160–169.
19. Giles J. Key biology databases go wiki. Nature. 2007;445:691.
20. Osborne JD, Lin S, Kibbe WA. Other riffs on cooperation are already showing how well a wiki could work. Nature. 2007;445:856.
21. Mons B, Ashburner M, Chichester C, van Mulligen E, Weeber M, den Dunnen J, van Ommen GJ, Musen M, Cockerill M, Hermjakob H, et al. Calling on a million minds for community annotation in WikiProteins. Genome Biol. 2008;9:R89.