SynDB: a Synapse protein DataBase based on synapse ontology

Copyright © 2006 The Author(s)

Center for Bioinformatics, National Laboratory of Protein Engineering and Plant Genetic Engineering, College of Life Sciences, Peking University, Beijing 100871, P.R. China
1 Institute of Molecular Medicine, Peking University, Beijing 100871, P.R. China
2 Center for Basic Neuroscience, UT Southwestern Medical Center, Dallas, TX 75235, USA
3 Department of Medicine, Stanford, CA 94305, USA
4 Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA

*To whom correspondence should be addressed. Tel: +86 10 6276 4970; Fax: +86 10 6275 2438; Email: weilp/at/mail.cbi.pku.edu.cn
*Correspondence may also be addressed to Zhuan Zhou. Tel: +86 10 6276 4986; Fax: +86 10 6275 3212; Email: zzhou/at/pku.edu.cn

The authors wish it to be known that, in their opinion, the first three authors should be regarded as joint First Authors.

Received August 15, 2006; Revised October 6, 2006; Accepted October 6, 2006.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

A synapse is the junction across which a nerve impulse passes from an axon terminal to a neuron, muscle cell or gland cell. The functions and molecular components of the synapse are essential to almost all neurobiological processes. To describe synaptic structures and functions, we have developed Synapse Ontology (SynO), a hierarchical representation that includes 177 terms with hundreds of synonyms and branches up to eight levels deep. We associated 125 additional protein keywords and 109 InterPro domains with these SynO terms.
Using a combination of automated keyword searches, domain searches and manual curation, we collected 14 000 non-redundant synapse-related proteins, including 3000 in human. We extensively annotated the proteins with information about sequence, structure, function, expression, pathways, interactions and disease associations, and with hyperlinks to external databases. The data are stored and presented in the Synapse protein DataBase (SynDB, http://syndb.cbi.pku.edu.cn). SynDB can be interactively browsed by SynO, Gene Ontology (GO), domain families, species, chromosomal locations or Tribe-MCL clusters. It can also be searched by text (including Boolean operators) or by sequence similarity. SynDB is the most comprehensive database to date for synaptic proteins.

INTRODUCTION

Recent developments in genomics, proteomics and systems biology have significantly impacted fields such as oncology and immunology (1–5). These approaches are beginning to be applied to neuroscience research, where they generate exponentially increasing amounts of data (6–11) and call for efficient databases. However, neuroinformatics databases at the molecular level are currently limited. For instance, databases listed in the Society for Neuroscience Database Gateway (NDG, http://ndg.sfn.org/eavObList.aspx?cl=81) principally contain imaging, anatomic or clinical data, while few focus on genes or proteins and their functions.

The synapse is a specialized intercellular junction between neurons, or between neurons and other excitable cells such as muscle cells. It plays a key role in the information processing in the nervous system that underlies many neurobiological processes, including neurotransmission, learning and memory. Defects in synaptic activity are associated with many neurological disorders, including Alzheimer's disease (12). The synapse has also been proposed as an excellent candidate for large-scale systems biology studies (7,8,13).
There is a critical need for a focused yet comprehensive database resource for the synapse ‘proteome’. Creating such a database is non-trivial, because the proteins involved in synaptic activities are numerous and diverse, and information about them is scattered across multiple heterogeneous sources. No simple keyword search and no small set of domains can retrieve all of these proteins. These complexities may explain why no such database has been reported thus far. Here, we present the Synapse protein DataBase (SynDB, http://syndb.cbi.pku.edu.cn) as an information hub for synapse-related proteins.

CONSTRUCTION OF SYNAPSE ONTOLOGY

An ontology is defined as a ‘specification of a conceptualization’ (14). It describes a domain using a collection of concepts or terms, together with the hierarchical relationships between those terms. To formally describe synaptic functions and structures, we extensively reviewed three sources of information: (i) three classic textbooks, Synapses (15), Principles of Neural Science (16) and Ion Channels of Excitable Membranes (17); (ii) 115 recent (2000–2006) review papers published in Nature Reviews Neuroscience and Annual Review of Neuroscience; and (iii) relevant terms in two general ontologies, Gene Ontology (GO) (18) and Medical Subject Headings (MeSH) (19). By reviewing these resources and iteratively organizing the information, we constructed the first synapse ontology (SynO), a hierarchical description of synaptic structures and functions.

SynO has two top-level categories: structure and function. Structure is divided into categories such as presynaptic compartment, postsynaptic compartment and glia; function is divided into categories such as transmitter release and endocytosis, synapse formation, and signal transduction in the postsynaptic neuron. In total, SynO contains 177 terms with hundreds of synonyms, and its branches are up to eight levels deep. Like GO, SynO is constructed as a directed acyclic graph (DAG).
In such a graph, terms are represented by vertices and the relationships between terms by directed edges; a term may have more than one parent, but no chain of edges ever leads from a term back to itself. We used DAG-edit (20) to input, manage and update SynO (Figure 1).
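To make the DAG property concrete, the Python sketch below checks that a set of (child, parent) edges contains no cycle and reports how many levels its deepest branch spans, the quantity behind the statement that SynO is up to eight levels deep. It is an illustration only: the edges form a hypothetical fragment built from category names mentioned in this article, not the actual SynO file, and the function name `ontology_depth` is our own. The check uses Kahn's topological sort, in which a term is processed only after all of its parents; terms that are never processed indicate a cycle.

```python
from collections import defaultdict, deque

# Hypothetical SynO fragment as (child, parent) pairs; term names come from
# the article, but these edges are illustrative, not the real SynO file.
SYNO_FRAGMENT = [
    ("presynaptic compartment", "structure"),
    ("postsynaptic compartment", "structure"),
    ("glia", "structure"),
    ("transmitter release and endocytosis", "function"),
    ("synapse formation", "function"),
    ("signal transduction in the postsynaptic neuron", "function"),
    ("synaptic vesicle cycling", "transmitter release and endocytosis"),
]


def ontology_depth(edges):
    """Return the number of levels in the deepest branch of a DAG ontology.

    Raises ValueError if the (child, parent) edges contain a cycle.
    """
    children = defaultdict(list)
    n_parents = defaultdict(int)
    terms = set()
    for child, parent in edges:
        children[parent].append(child)
        n_parents[child] += 1
        terms.update((child, parent))

    # Top-level terms (no parents) are on level 1.
    level = {t: 1 for t in terms if n_parents[t] == 0}
    queue = deque(level)
    processed = 0
    while queue:
        term = queue.popleft()
        processed += 1
        for child in children[term]:
            # A term with several parents sits one level below its deepest parent.
            level[child] = max(level.get(child, 0), level[term] + 1)
            n_parents[child] -= 1
            if n_parents[child] == 0:  # all parents done: safe to process
                queue.append(child)

    # Terms on (or below) a cycle never reach zero pending parents.
    if processed != len(terms):
        raise ValueError("cycle detected: the ontology is not a DAG")
    return max(level.values())


print(ontology_depth(SYNO_FRAGMENT))  # 3
```

Adding an edge such as ("function", "synaptic vesicle cycling") would close a loop, and the function would then raise `ValueError` instead of returning a depth.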
We developed a Perl script to generate a list of search keywords from SynO, including and expanding on SynO terms and synonyms. For each SynO term consisting of more than one word, the script specified which words could be expanded and whether the word order could vary, and all possible combinations were generated automatically. The expanded list of search keywords was used in the next step.

ASSOCIATION OF PROTEINS

We searched the InterPro database using the search keywords and retrieved 400 protein domains. Through careful manual screening, we identified 109 of these domains as being involved in synaptic activities and assigned each to the most appropriate SynO term. Using the mapping between InterPro and UniProt (22), we retrieved over 5000 proteins and associated them with SynO terms. We then searched UniProt for additional protein entries containing the search keywords. Domain-based searches tend to have a high false-negative rate (as not all domains can be modeled), whereas keyword-based searches tend to have a high false-positive rate; we therefore imposed both automated and manual quality control. For example, entries containing ‘immune’ or ‘immunological’ were removed, because ‘immunological synapse’ is a term for a process in the immune system and occurs in hundreds of protein entries. In another example, thousands of false-positive entries were removed because they were annotated as having been submitted by a company named Synapse. After manual review of thousands of entries, we retrieved over 10 000 proteins and assigned them to SynO terms. We combined the two sets of proteins and removed redundant entries following the strategy of the International Protein Index (23). Two UniProt proteins from the same species were considered redundant if they were ≥95% identical over ≥95% of the length of the shorter sequence, based on pair-wise BLASTP comparisons of all sequences in that species.
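The keyword expansion, automated filtering and redundancy rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Perl implementation: the word-variant table, the example term "vesicle fusion", the entry fields `description`, `submitter` and `source`, and all function names are hypothetical, and BLASTP identity and alignment length are assumed to be already parsed into numbers. The representative choice follows the tie-breaking rule stated next in the text (Swiss-Prot before TrEMBL, then the longer sequence).

```python
from itertools import permutations, product


# --- Keyword expansion (the variant table is hypothetical) ---------------
def expand_term(words, variants, flexible_order):
    """Every keyword string for one multi-word term.

    words          -- the term split into words
    variants       -- maps a word to its allowed expansions (others kept as-is)
    flexible_order -- True if the words may appear in any order
    """
    orders = permutations(words) if flexible_order else [tuple(words)]
    keywords = set()
    for order in orders:
        choices = [variants.get(w, [w]) for w in order]
        for combo in product(*choices):
            keywords.add(" ".join(combo))
    return keywords


# --- Automated false-positive filter (field names are hypothetical) ------
EXCLUDED_WORDS = ("immune", "immunological")


def passes_automated_filter(entry):
    """Drop 'immunological synapse' hits and entries from a company named Synapse."""
    text = entry["description"].lower()
    if any(word in text for word in EXCLUDED_WORDS):
        return False
    return entry["submitter"].strip().lower() != "synapse"


# --- Redundancy test and representative choice ----------------------------
def is_redundant(identity, aligned_length, len_a, len_b):
    """>=95% identity over >=95% of the shorter sequence.

    The caller compares only sequences from the same species; identity is a
    fraction (0 to 1) taken from a pair-wise BLASTP hit.
    """
    return identity >= 0.95 and aligned_length >= 0.95 * min(len_a, len_b)


def pick_representative(group):
    """Prefer Swiss-Prot over TrEMBL, then the longer sequence."""
    return max(group, key=lambda p: (p["source"] == "Swiss-Prot", len(p["sequence"])))


print(sorted(expand_term(["vesicle", "fusion"], {"vesicle": ["vesicle", "vesicles"]}, True)))
# ['fusion vesicle', 'fusion vesicles', 'vesicle fusion', 'vesicles fusion']
```

The automated filter only removes obvious false positives; as the text notes, manual review was still needed on top of it.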
Among redundant proteins, we selected Swiss-Prot sequences over TrEMBL sequences; among sequences from the same data source, we selected longer sequences over shorter ones. The resulting SynDB contains 14 000 non-redundant proteins, including 3000 in human, and is the most comprehensive collection of synapse-related proteins to date.

ANNOTATIONS AND WEB INTERFACE DESIGN

To enhance SynDB's utility as an information resource, we developed parsers in Perl to retrieve extensive information on protein sequences, expression, protein–protein and protein–small molecule interactions, disease associations and literature references. Known 3D structures or potential structure templates were retrieved by pair-wise BLASTP comparison between SynDB proteins and the non-redundant set of proteins with known structures in PDB_SELECT_25 (24). In addition, cross-references to ModBase (25) are provided. Potential metabolic pathways involving each protein were identified by running the KOBAS system against the KEGG database (26,27). Table 1 shows the protein features and related external databases. The information for each protein is integrated and presented in a single graphical web page, as illustrated by the SynDB entry page for Huntingtin Interacting Protein 1 (HIP1) (Figure 2).
We implemented six interactive browsing options in SynDB. Users can browse synapse proteins by ‘SynO’ or ‘GO’, displayed as hierarchical trees, and can zoom in on a particular branch of the ontology by clicking the ‘+’ sign to expand it. For example, a user interested in ‘transmitter release and endocytosis’ may expand this category and focus on ‘synaptic vesicle cycling’ (Figure 3).
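A tree browser of this kind typically needs to collect the proteins filed under a term and all of its sub-terms when a branch is opened, assuming that a protein annotated to a sub-term is also listed under its ancestors (the convention GO calls the true path rule). The article does not describe how SynDB's browser is implemented, so the breadth-first sketch below is only an illustration: the hierarchy reuses category names from this article, while the protein IDs P1 to P4 and the function name are hypothetical.

```python
from collections import deque

# Term hierarchy (parent -> children) using names from the article;
# the protein IDs P1 to P4 are hypothetical placeholders.
CHILDREN = {
    "function": ["transmitter release and endocytosis", "synapse formation"],
    "transmitter release and endocytosis": ["synaptic vesicle cycling"],
}
ANNOTATIONS = {
    "transmitter release and endocytosis": {"P1"},
    "synaptic vesicle cycling": {"P2", "P3"},
    "synapse formation": {"P4"},
}


def proteins_in_branch(term, children, annotations):
    """Proteins annotated to `term` or to any term below it (breadth-first).

    The `seen` set stops a sub-term reachable through two parents, which a
    DAG allows, from being visited twice.
    """
    seen, found = {term}, set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        found.update(annotations.get(current, ()))
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return found


print(sorted(proteins_in_branch("transmitter release and endocytosis", CHILDREN, ANNOTATIONS)))
# ['P1', 'P2', 'P3']
```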
SynDB supports text searches with Boolean operators, as well as searches by amino acid or nucleotide sequence similarity using BLAST. Information in SynDB is stored in a MySQL relational database comprising over 100 tables. Sequences can be downloaded directly from the web, and the complete database is available from the authors. We will keep SynO up to date by regularly reviewing the latest literature as well as comments from users and collaborators. Perl scripts will automatically update the sequences in SynDB, followed by manual review.

DISCUSSION

The brain is a complex and subtle network of neurons that communicate with each other via synapses. Chemical synapses are asymmetric contacts that play key roles in information processing and storage, behavior and disease. To better organize the wealth of synapse-related information and facilitate understanding of synapses, we developed SynDB, an online database for the synapse proteome. SynDB aims to enable systematic studies of synaptic functions and structures at the proteomic level. A focused ontology is essential for the development of such a database because of the numerous and diverse proteins involved. Beyond general-purpose ontologies such as MeSH and GO (18,19), focused ontologies such as SynO are important because they can provide more specific, complete and finer-grained information to scientists, such as neuroscientists interested in synaptic function. In fact, of the 177 SynO terms, only 24 were derived from MeSH and GO. In its first year online, SynDB received over 600 000 external hits (excluding search engine crawlers). SynDB is intended to serve as a repository of current knowledge and a potential starting point for experimental design or in silico data mining.

Acknowledgments

This work was supported by grants from the China National High-tech 863 Program to L.W. and 973 Program (2006CB500800) and NSFC to Z.Z.
We thank Drs Peace Cheng, Tim Qing-Rong Liu and John Reiland for valuable suggestions. Funding to pay the Open Access publication charges for this article was provided by the China Ministry of Education ‘Program of Introducing Talents of Discipline to Universities’ (B06001).

Conflict of interest statement. None declared.

REFERENCES

1. Buetow K.H. The NCI Center for Bioinformatics (NCICB): building a foundation for in silico biomedical research. Cancer Invest. 2004;22:117–122.
2. Coleman W.B. Cancer bioinformatics: addressing the challenges of integrated postgenomic cancer research. Cancer Invest. 2004;22:161–163.
3. Nakagawara A., Ohira M. Comprehensive genomics linking between neural development and cancer: neuroblastoma as a model. Cancer Lett. 2004;204:213–224.
4. Lefranc M.P. IMGT-ONTOLOGY and IMGT databases, tools and Web resources for immunogenetics and immunoinformatics. Mol. Immunol. 2004;40:647–660.
5. Rammensee H.G. Immunoinformatics: bioinformatic strategies for better understanding of immune function. Introduction. Novartis Found. Symp. 2003;254:1–2.
6. Boguski M.S., Jones A.R. Neurogenomics: at the intersection of neurobiology and genome sciences. Nature Neurosci. 2004;7:429–433.
7. Choudhary J., Grant S.G. Proteomics in postgenomic neuroscience: the end of the beginning. Nature Neurosci. 2004;7:440–445.
8. Grant S.G. Systems biology in neuroscience: bridging genes to cognition. Curr. Opin. Neurobiol. 2003;13:577–582.
9. Skuse D. Genetics and genomics of neurobehavioral disorders. J. Child Psychol. Psychiat. 2004;45:1180–1181.
10. Hamacher M., Klose J., Rossier J., Marcus K., Meyer H.E. ‘Does understanding the brain need proteomics and does understanding proteomics need brains?’—second HUPO HBPP workshop hosted in Paris. Proteomics. 2004;4:1932–1934.
11. Insel T.R., Volkow N.D., Landis S.C., Li T.K., Battey J.F., Sieving P. Limits to growth: why neuroscience needs large-scale science. Nature Neurosci. 2004;7:426–427.
12. Nelson P.G. Activity-dependent synapse modulation and the pathogenesis of Alzheimer disease. Curr. Alzheimer Res. 2005;2:497–506.
13. Husi H., Grant S.G. Construction of a protein–protein interaction database (PPID) for synaptic biology. In: Kotter R., editor. Neuroscience Databases: A Practical Guide. Boston/Dordrecht/London: Kluwer Academic Publishers; 2002. pp. 1–62.
14. Gruber T.R. A translation approach to portable ontology specifications. Knowledge Acquisition. 1993;5:199–220.
15. Cowan W.M., Südhof T.C., Stevens C.F., Davies K. Synapses. Baltimore and London: The Johns Hopkins University Press; 2000.
16. Kandel E.R., Schwartz J.H., Jessell T.M. Principles of Neural Science. 4th edn. New York, NY: McGraw-Hill Companies; 2000.
17. Hille B. Ion Channels of Excitable Membranes. 3rd edn. Sunderland, Massachusetts: Sinauer Associates, Inc.; 2001.
18. Ashburner M., Ball C.A., Blake J.A., Botstein D., Butler H., Cherry J.M., Davis A.P., Dolinski K., Dwight S.S., Eppig J.T., et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genet. 2000;25:25–29.
19. Lipscomb C.E. Medical subject headings (MeSH). Bull. Med. Libr. Assoc. 2000;88:265–266.
20. Smith B., Ceusters W., Klagges B., Kohler J., Kumar A., Lomax J., Mungall C., Neuhaus F., Rector A.L., Rosse C. Relations in biomedical ontologies. Genome Biol. 2005;6:R46.
21. Mulder N.J., Apweiler R., Attwood T.K., Bairoch A., Bateman A., Binns D., Bradley P., Bork P., Bucher P., Cerutti L., et al. InterPro, progress and status in 2005. Nucleic Acids Res. 2005;33:D201–D205.
22. Apweiler R., Bairoch A., Wu C.H., Barker W.C., Boeckmann B., Ferro S., Gasteiger E., Huang H., Lopez R., Magrane M., et al. UniProt: the Universal Protein knowledgebase. Nucleic Acids Res. 2004;32:D115–D119.
23. Kersey P.J., Duarte J., Williams A., Karavidopoulou Y., Birney E., Apweiler R. The International Protein Index: an integrated database for proteomics experiments. Proteomics. 2004;4:1985–1988.
24. Boberg J., Salakoski T., Vihinen M. Selection of a representative set of structures from Brookhaven Protein Data Bank. Proteins. 1992;14:265–276.
25. Pieper U., Eswar N., Braberg H., Madhusudhan M.S., Davis F.P., Stuart A.C., Mirkovic N., Rossi A., Marti-Renom M.A., Fiser A., et al. MODBASE, a database of annotated comparative protein structure models, and associated resources. Nucleic Acids Res. 2004;32:D217–D222.
26. Mao X., Cai T., Olyarchuk J.G., Wei L. Automated genome annotation and pathway identification using the KEGG Orthology (KO) as a controlled vocabulary. Bioinformatics. 2005;21:3787–3793.
27. Wu J., Mao X., Cai T., Luo J., Wei L. KOBAS server: a web-based platform for automated annotation and pathway identification. Nucleic Acids Res. 2006;34:W720–W724.
28. Hamosh A., Scott A.F., Amberger J., Bocchini C., Valle D., McKusick V.A. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2002;30:52–55.
29. Enright A.J., Van Dongen S., Ouzounis C.A. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002;30:1575–1584.
30. Niimura Y., Nei M. Evolution of olfactory receptor genes in the human genome. Proc. Natl Acad. Sci. USA. 2003;100:12235–12240.
31. Young J.M., Kambere M., Trask B.J., Lane R.P. Divergent V1R repertoires in five species: amplification in rodents, decimation in primates, and a surprisingly small repertoire in dogs. Genome Res. 2005;15:231–240.