Copyright: © 2007 Larson, Fong, Gupta, Condit, Bug, Martone.

A Formal Ontology of Subcellular Neuroanatomy

1 National Center for Microscopy and Imaging Research, University of California, USA
2 San Diego Supercomputer Center, University of California, USA
3 Laboratory for Bioimaging and Anatomical Informatics, Drexel University College of Medicine, USA

Edited by: Jan G. Bjaalie, International Neuroinformatics Coordination Facility, Stockholm, Sweden; University of Oslo, Norway
Reviewed by: Jose L. Mejino, Structural Informatics Group, University of Washington, USA; Jan G. Bjaalie, International Neuroinformatics Coordination Facility, Stockholm, Sweden; University of Oslo, Norway
*Correspondence: Maryann E. Martone, National Center for Microscopy and Imaging Research, Center for Research in Biological Structure and Development of Neurosciences, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0608, USA. e-mail: mmartone/at/ucsd.edu
Received September 1, 2007; Accepted October 7, 2007.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract

The complexity of the nervous system requires high-resolution microscopy to resolve the detailed 3D structure of nerve cells and supracellular domains. The analysis of such imaging data to extract cellular surfaces and cell components often requires the combination of expert human knowledge with carefully engineered software tools.
In an effort to make better tools to assist humans in this endeavor, to create a more accessible and permanent record of their data, and to aid the process of constructing complex and detailed computational models, we have created a core of formalized knowledge about the structure of the nervous system and have integrated that core into several software applications. In this paper, we describe the structure and content of a formal ontology whose scope is the subcellular anatomy of the nervous system (SAO), covering nerve cells, their parts, and interactions between these parts. Many applications of this ontology to image annotation, content-based retrieval of structural data, and integration of shared data across scales and researchers are also described.

Keywords: neuroanatomy, neuronal cell types, electron microscopy, subcellular anatomy, data integration

Introduction

In neuroscience, scientifically relevant complexity occurs at every spatial and temporal scale that is currently open to examination. Unfortunately, our current complement of experimental and analytical techniques generally locks an investigation into a very limited dimensional range, leading to a fragmented and incomplete view of nervous systems across scales. This fundamental “multiscale problem” of neuroscience is, at its core, a problem of information integration. One indication of the extreme difficulty of information integration in the neurosciences is the conspicuous lack of any widely practiced automated methods for integrating information among major classes of neuroscientific data: structural, functional, and behavioral. Many tools have been developed to provide infrastructure to organize and analyze brain data, resulting in large part from the Human Brain Project, funded through the US National Institutes of Health (Huerta et al., 1993; Koslow and Huerta, 1997).
Such tools have included databases for storing primary data (e.g., CCDB; Martone et al., 2003, WebQTL; Wang et al., 2003, etc.), knowledge bases for derived information (e.g., BAMS; Bota et al., 2005 and CoCoMac; Stephan et al., 2001), and tools for performing novel analyses of brain data and mining the literature (e.g., Textpresso; Muller et al., 2004). However, the integration of diverse types of information still occurs largely through the efforts of individuals who examine the data and construct the necessary bridges between different data based on their knowledge of neuroscience. The grand challenge of neuroinformatics is the creation of systems that seamlessly integrate data across spatial and temporal scales such that information, for example, about white matter bundles derived from diffusion tensor imaging can be analyzed in context with electrophysiological data recorded from the neurons whose axons make up the bundles. The difficulty of performing this type of integration from data alone is illustrated in Figure 1.
In this paper, we describe specific steps toward creating generic information bridges by constructing a formal ontology designed to provide the knowledge necessary to integrate data acquired across multiple scales in structural neuroscience. An ontology is a formal representation of knowledge in a domain (Gruber, 1993). It defines the inter-related set of concepts representing a knowledge area and the common terms used to describe them, for example, “neuron is a cell” and “cell has part plasma membrane.” A critical aspect of modern ontologies is the encoding of these entities and relationships in a standard form where the semantics of the domain are machine interpretable using open source tools and software libraries. Ontologies are used by people, databases, and applications to share information in a semantically precise way within and across particular domains (Gruber, 1993). The ontology for subcellular anatomy (SAO) focuses on the spatial scale that has come to be known as the “mesoscale,” roughly defined as the dimensional range encompassing macromolecular complexes, subcellular structures up to the level of cells and cellular networks. The SAO describes neurons, glia, their parts, and how these parts come together to create the dense feltwork of processes that characterizes the nervous system. The SAO was constructed through the Cell Centered Database (CCDB) project (Martone et al., 2002, 2003, 2007), an on-line resource for disseminating data derived from light and electron microscopic imaging. The CCDB project, as its name implies, takes the view that the cell should provide the rallying point for information integration in biological tissues. Thus, the SAO starts with the cell and models how cell parts, including molecules, fit into coarser levels of anatomy. 
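To make the idea of machine-interpretable statements concrete, the two example assertions above (“neuron is a cell” and “cell has part plasma membrane”) can be sketched as subject, relation, object triples. The following is a minimal illustration in plain Python, not the SAO's actual OWL encoding; the relation names `is_a` and `has_part` and the helper functions are assumptions introduced here.

```python
# Toy triple store: (subject, relation, object). The names are illustrative
# placeholders, not actual SAO identifiers or OWL syntax.
TRIPLES = {
    ("neuron", "is_a", "cell"),
    ("cell", "has_part", "plasma membrane"),
}


def superclasses(term, triples):
    """Return term followed by every class reachable through is_a links."""
    found = [term]
    for current in found:  # the list grows while we walk it
        for subject, relation, obj in triples:
            if subject == current and relation == "is_a" and obj not in found:
                found.append(obj)
    return found


def inherited_parts(term, triples):
    """Collect has_part values asserted on term or on any of its superclasses."""
    classes = superclasses(term, triples)
    return sorted(
        obj for subject, relation, obj in triples
        if relation == "has_part" and subject in classes
    )


print(inherited_parts("neuron", TRIPLES))
```

Because a neuron is a cell, the part asserted for cell is also returned for neuron, which is the kind of conclusion a program can draw once the statements are encoded in a standard form.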
This view contrasts with the approaches of many ontologies that start at the level of gross anatomy and traverse down to the level of the cell, for example, the Foundational Model of Anatomy (FMA) (Rosse and Mejino, 2003) and BAMS (Bota et al., 2005). The SAO was built as a reference ontology with the ultimate goal of describing data, principally derived from light and electron microscopy, through the use of multiple annotation applications. It is built using the Web Ontology Language (OWL; http://www.w3.org/TR/owl-features), a W3C open standard for ontologies. Version 1.0 of the SAO was presented in Fong et al. (2007), which concentrated on the use of OWL and the associated tools for its construction. In this paper, we present an updated version (1.2) of the SAO, provide considerably greater detail on the design principles from a neuroscience point of view, describe new examples of reasoning, and describe new examples of data that are marked up using the SAO. We also briefly illustrate how it is being used as the semantic “glue” that binds together an environment of tools capable of annotating disparate types of structural data from imaging studies of the nervous system.

Materials and Methods

The primary source for subcellular anatomy used for the construction of the SAO was Peters et al. (1991), The Fine Structure of the Nervous System, Ed 2, the standard reference for neuronal ultrastructure. Additions and modifications to this framework were also made from more recent literature. The source of each entity in the ontology is indicated as an annotation to the concept. As a way to keep epistemological distinctions clear, we adopted as an organizing framework the Basic Formal Ontology version 1.0 (BFO 1.0; Grenon, 2003) (Figure 2).
The SAO is available for download and browsing at http://ccdb.ucsd.edu/SAO and has been incorporated into the BioPortal, a resource maintained by the National Center for Biomedical Ontologies (http://www.bioontology.org/ncbo/faces/index.xhtml). The SAO is expressed in OWL DL. OWL is a vocabulary extension of the Resource Description Framework (RDF) and is derived from the DAML+OIL Web Ontology Language. Together with RDF and other components, these technologies underpin the growing semantic web (Neumann and Prusak, 2007). One of the goals of the semantic web is to create tools for achieving highly interoperable data resources. The SAO was composed using Protégé version 3, an open source authoring tool for OWL ontologies (Noy et al., 2001). The OWL standard is designed as a kind of description logic, which means that an application domain described in OWL is automatically described using formal logic-based semantics. One benefit of this is that tools like Protégé and additional reasoning tools such as Pellet (Evren et al., 2005) and Swoop (Kalyanpur et al., 2005) can identify statements that are logically inconsistent. It also supports machine-based inferencing to generate new knowledge and to provide classification. The other major benefit is the machine-readability of OWL, which can be expressed as an XML document. This means that arbitrary software applications can take advantage of the knowledge and data that are encoded in an ontology as their underlying data model. It also means that ontologies written in OWL can be automatically imported and cross-linked by other ontologies. An OWL ontology contains a series of classes, properties, and annotations. The classes are simply the entities that are organized in a top-down hierarchical graph structure (Figure 2A). In the OWL language, all properties are first-class entities, meaning they exist independently of the classes they are used to describe.
Consequently, whether using properties as attributes, or as relations, the same underlying logical mechanism is invoked. Therefore, OWL properties do not have the facility to distinguish between structural properties (i.e., attributes) and relationships between classes (i.e., relations). Instead, structural properties are defined through the use of OWL restrictions, which we have used throughout the SAO. These can be seen in Figure 3.
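As a rough illustration of what a restriction expresses, consider a rule of the form “a neuron may only have regional parts of a neuron.” The sketch below checks such a universal (“only”) restriction against a handful of instance assertions in plain Python. It is a deliberately simplified, closed-world check and not how a description logic reasoner such as Pellet operates; the class `OnlyRestriction`, the property name `has_regional_part`, and all instance names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OnlyRestriction:
    """Universal restriction: every filler of `prop` must be in `filler_class`."""
    on_class: str
    prop: str
    filler_class: str


# Hypothetical instance data: instance -> asserted class, plus
# (subject, property, object) assertions between instances.
INSTANCE_TYPES = {
    "cell_1": "neuron",
    "dendrite_1": "neuronal regional part",
    "endfoot_1": "astrocytic regional part",
}
ASSERTIONS = [
    ("cell_1", "has_regional_part", "dendrite_1"),
    ("cell_1", "has_regional_part", "endfoot_1"),
]


def violations(restriction, types, assertions):
    """List assertions whose object falls outside the restriction's filler class."""
    return [
        (subject, prop, obj)
        for subject, prop, obj in assertions
        if types.get(subject) == restriction.on_class
        and prop == restriction.prop
        and types.get(obj) != restriction.filler_class
    ]


rule = OnlyRestriction("neuron", "has_regional_part", "neuronal regional part")
print(violations(rule, INSTANCE_TYPES, ASSERTIONS))
```

Only the assertion that attaches an astrocytic endfoot to a neuron is reported, which mirrors the kind of inconsistency a reasoner can flag when restrictions are present in the ontology.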
In constructing the SAO, we have tried to adhere to best practices recommended by the OBO Foundry project (Smith et al., 2005). These practices include unique identifiers for each concept, re-use of existing ontologies where possible, and provision of human-readable definitions that are consistent with the machine interpretable definitions encoded within the ontology. The SAO follows the principle of single inheritance as recommended by Smith et al. (2005). Single inheritance results in an “is a” hierarchy that is a simple tree, where children have only one parent. By assigning “part of” relationships, we utilize some of the features of OWL to cross-cut the “is a” hierarchy such that new hierarchies can be generated. Examples of this concept will be illustrated in the Results section. For the SAO, we incorporated several existing ontologies using the owl:imports mechanism of OWL within Protégé 3. In this way, we do not reinvent content that is already substantially covered in other ontologies. The import mechanism allows wholesale incorporation of existing ontologies into the SAO while maintaining the integrity and source of the original ontology. In addition to the BFO, we imported an extensive set of annotation properties from the BIRNLex (www.nbirn.net/birnlex). Entities may be added to a merged resource, but existing entities may not be deleted or modified, nor may the class structure be changed. Additional resources of relevance that were not encoded in OWL, for example, the cell component hierarchy from Gene Ontology, were imported manually and cross referenced to the appropriate identifiers.
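The difference between the single-parent “is a” tree and a hierarchy generated from “part of” relationships can be sketched with two small mappings. This is an illustrative toy in plain Python; the class labels and the particular “part of” assertions are placeholders chosen for the example and are not taken from the SAO files.

```python
# Illustrative single-parent "is a" tree (child -> parent). Labels are
# placeholders, not SAO identifiers.
IS_A = {
    "dendritic spine": "regional part of neuron",
    "dendrite": "regional part of neuron",
    "regional part of neuron": "regional part of cell",
}

# Illustrative "part of" assertions (part -> whole) that cut across the tree.
PART_OF = {
    "dendritic spine": "dendrite",
    "dendrite": "neuron",
}


def chain(term, parent_of):
    """Follow a child -> parent mapping upward; assumes the mapping is acyclic."""
    path = [term]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path


print(" -> ".join(chain("dendritic spine", IS_A)))
print(" -> ".join(chain("dendritic spine", PART_OF)))
```

The same term sits in two different hierarchies: one by kind, one by containment. Keeping the first a simple tree while deriving the second from properties is the design choice described above.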
Results

Structure of the SAO

Classes

The high level structure of the SAO is illustrated in Figure 2B.

Cell

We have included a set of cell types found in the nervous system (Figure S1) that include neurons and glial cells, as well as other classes of cells that one would encounter in structural studies of the nervous system, for example, vascular cells, endothelial cells, muscle cells, and macrophages. The class “Nerve Cell” contains neurons and glia, that is, cells that are derived from the neuroepithelium. We also include neuronal stem cell under this category. The SAO lists neurons (Figure S2) according to common names reflecting a mixture of classification criteria, for example, morphology (“pyramidal neuron”) or proper names (“Purkinje neuron”). The SAO utilizes these names merely as labels that were assigned to cells and does not further classify cell types into subtrees based on these names, except in instances where the hierarchy is fairly straightforward, for example, layer 3 cortical pyramidal neuron is a cortical pyramidal neuron. The name chosen is meant to have meaning to a neuroscientist and not to express the importance of a particular criterion for classification. In other words, we chose the label “layer 3 cortical pyramidal neuron” because we believe that there is a class of cell defined by a set of properties, not because we think its location in layer 3 is its defining characteristic. We deliberately chose to keep the cell classification flat because the SAO can be used to classify neurons along multiple dimensions according to their specific properties (see Subsection User-Defined Reclassification and Query). Instead, we have focused on providing a comprehensive model of subcellular parts and how these parts relate to the parent cell. As we discuss in a later section, we utilize the relationships between cell parts and features to infer hierarchies as they are required.
The SAO organizes glial cell types (Figure S3) from a morphological perspective rather than from a strict lineage perspective. Macroglial cells include astrocytes, ependymoglial cells, oligodendrocytes, and NG2 cells, according to classifications outlined in recent literature, for example, Reichenbach and Wolberg (2005). The reference from which a particular entity was drawn is included as an annotation property for that entity. The SAO does not aim to provide a comprehensive list of nerve cells as this domain is covered in other resources, for example, BAMS (Bota et al., 2005) and the Cell Type Ontology (Bard et al., 2005). Because the SAO is meant to be applied to data, we anticipate that users will add cell types from these resources to the SAO as they are encountered.

Part of cell

The SAO comprises two main classes of cell parts, following the structure of the FMA: regional part and component part. Regional part of cell is elaborated under the BFO concept Fiat Object Part. A fiat object part is a part of an object that possesses at least one boundary where there is no obvious physical discontinuity or landmark structure. For example, the transition between a dendrite and the cell soma has no clear boundary. Regional parts of neurons include processes, such as dendrites and axons, the cell soma, and protrusions such as dendritic spines. Regional parts of glia include the cell soma and glial processes such as astrocytic endfeet and myelinating processes. Each of these regional parts may in turn be further subdivided into finer parcellations. For example, dendrites are divided into trunk, that is, the primary dendrite emanating from the cell soma, branches, and terminal specializations. Component parts are considered to be independent objects and represent the building blocks common to all cells, for example, plasma membrane, mitochondrion.
Components are largely drawn from the Gene Ontology cell component hierarchy (Gene Ontology Consortium, 2002), with additional neuron-specific parts such as post-synaptic density added when necessary.

Molecules

Macromolecules are also elaborated within SAO under the independent continuant class. Just as with cell types, the SAO does not contain an exhaustive list of macromolecules, because we anticipate that these entities are covered in other resources. As molecules are encountered in biological data, they may be added to the SAO. Because the SAO is designed for annotation of data, we include separate entities for the RNA, DNA, and protein forms of a molecular entity. In this way, users can capture the target of a labeling study according to the molecular species localized and assign the species to the correct subcellular compartment.

Properties

We have devised three major groups of properties in the SAO: part of, morphological, and spatial relationships, again largely following the model of the FMA. Regional parts are assigned to each cell class using restrictions, for example, neurons may only have neuronal regional parts. The geometrical relationships among cell parts are specified by relationships such as continuous with, for example, dendrites are continuous with the cell soma; dendritic spines are continuous with dendrites. Thus, each regional part is assumed to belong to a parent cell. Although some properties are assigned at the level of cell class, for example, morphological type, most are assigned at the level of cell part. In this way, cell components and macromolecules are assigned to the particular part of the nerve cell in which they are found. Similarly, because nerve cells are large and may span many brain regions, the property has anatomical location, designed to situate the cell within a regional part of the nervous system, is assigned separately to each part of the cell.
The SAO thus differs from most anatomical ontologies, for example, BAMS (Bota et al., 2005) where anatomical location is assigned at the level of cell class. We have employed ‘restrictions’ within OWL to associate regional parts with the appropriate cell class. Thus, a neuron may only have regional parts of a neuron; an astrocyte may only have regional parts of an astrocyte. In contrast, component parts may be found in any cell. Although certain neuronal classes are distinguished by features such as a characteristic number of dendrites, the presence of spines or a myelinated axon, we have largely avoided creating many restrictions along these lines. Unlike gross anatomy, we usually have very few examples of a given class from which to infer these types of rules and there tends to be considerable variation within and across species of these parameters. We therefore have chosen to create a fairly generic model of a neuron in the SAO which can be used to describe individual instances of neuronal cell classes in a standard way. The SAO places molecules within their cellular contexts through the has molecular constituent property and its inverse is molecular constituent of. This property is defined as a special type of has part. Most of these molecules will be localized using techniques such as immunocytochemistry and in situ hybridization. Molecules may be assigned to any aspect of the cell, both regional and component parts, and at whatever level of granularity can be determined from the technique. An exception to this rule is the assignment of neurotransmitter. Because neurotransmitter has traditionally been one of the defining properties of a neuron to most neuroscientists, we included the property has neurotransmitter as a special type of has molecular constituent and assigned it at the level of cell class. 
In theory, we should be able to derive the neurotransmitter from a consideration of the types of molecules located within the synaptic region, but because techniques such as immunocytochemistry often determine neurotransmitter indirectly, for example, through the localization of a synthetic or degradative enzyme for a neurotransmitter, and because determination of a neurotransmitter usually involves additional physiological or pharmacological criteria, we decided to assign this as a simple property for now. Through the properties has anatomical location, the SAO situates cells and parts of cells into higher order brain regions. The SAO divides anatomical localization into three categories: has general anatomical location; has specific anatomical location; has atlas location. General anatomical location is assigned to the level of the cell class and is meant to encode the generally known location of a cell class. This property again was included for expediency, because neuroscientists are so used to naming individual cells as parts of anatomical regions, even though only the cell soma may be located there. The level of specification may be fairly coarse in this case, for example, Purkinje cell has general anatomical location cerebellar cortex. Specific anatomical location is meant to be assigned at the instance level and is intended to be assigned at as fine a level of granularity as possible, for example, my Purkinje cell dendrite has specific anatomical location outer third of cerebellar molecular layer. If known, anatomical location can be recorded as a set of atlas coordinates through the has atlas anatomical location property. This property type contains the atlas referenced, the coordinates, and the reference point from which the coordinates are derived, for example, bregma. Currently, the SAO assigns anatomical location in the form of free text. 
We are in the process of changing the anatomical location to an object property that is drawn from the BIRNLex anatomical ontology, which in turn draws its anatomical entities largely from the Neuronames hierarchy (Dubach and Bowden, 2002).

Supracellular structures

One of the biggest challenges in constructing the SAO was to provide the specification of supracellular entities like the Node of Ranvier and the synapse. Although these entities are treated by other ontologies (e.g., Zhang et al., 2007) as if they are independent entities, in fact neither of these objects exists independently within complex tissue. Rather, they represent sites where certain configurations of subcellular objects are found (e.g., neuropil, synapses, glomeruli, and the Node of Ranvier) and where certain functions are presumed to occur. Thus, although we classified synapses and Nodes as objects in preliminary versions of the SAO, starting in v1.0 we utilized the structure of the BFO to classify supracellular domains through the object aggregate and site classes. An object aggregate in BFO 1.0 is defined as “an independent continuant that is a mereological sum of separate objects and possesses non-connected boundaries. Examples: a heap of stones, a group of commuters on the subway, a collection of random bacteria, a flock of geese, the patients in a hospital.” A site is defined as “an independent continuant consisting of a characteristic spatial shape in relation to some arrangement of other continuants and of the medium which is enclosed in whole or in part by this characteristic spatial shape. Sites are entities that can be occupied by other continuants.” The BFO further clarifies sites in this way: “In BFO, ‘site’ allows for a so-called relational view of space which is different from the view corresponding to the class ‘spatial region.’ Space and ‘spatial region’ entities are entities in their own rights which exist independently of any entities which can be located at them.
This view of space is sometimes called ‘absolutist’ or ‘the container view.’ In BFO, the class ‘site’ allows for a so-called relational view of space, that is to say, a view according to which spatiality is a matter of relative location between entities and not a matter of being tied to space. The bridge between these two views is secured through the fact that while instances of ‘site’ are not ‘spatial region’ entities, they are nevertheless spatial entities.” (BFO 1.1; http://www.ifomis.org/bfo/1.1). We considered supracellular domains as object aggregates because they represent a somewhat ad hoc grouping of cell parts into higher order structures. However, many of these ad hoc groupings are given special designations because they are believed to be the locations at which a particular function occurs. For example, the Node of Ranvier is the site of action potential propagation down the axon; the synapse is the site at which neurotransmission occurs. The location of that function is inferred because of the presence of one or more molecules or cell components that have been demonstrated to be involved in the expression of these dynamic processes. The synapse is modeled using the object aggregate and site classes (Figure 4).
Anatomical qualities

Version 1.2 of the SAO includes a more extensive list of morphological qualities under the dependent continuant class that are used to modify objects within the SAO (Figure S4). Generic morphological qualifiers such as “round” or “spherical” are imported into SAO through the Phenotype and Trait Ontology (Gkoutos et al., 2005). However, we included a set of qualities that were specific for subcellular anatomy, for example, spine shapes (mushroom, thin, stubby), nuclear shape (round, lobulated, indented), and cell soma shape (pyramidal, fusiform). We elected in most cases not to precoordinate these terms with the independent continuants they describe, because these qualities can be assigned at the time of annotation. By precoordination, we mean the creation of a set of independent continuants which incorporate the qualifier, for example, mushroom-shaped spine; lobulated nucleus. Precoordination was used for morphological classes that required unique identification like spine classes, where the designation of mushroom shape confers a set of unique properties to that class. We chose not to precoordinate when the qualifier was considered descriptive of an instance and not necessarily indicative of a member of a distinct class. In these cases, we apply the qualifier to the instance, for example, instance of nucleus with morphological quality “indented” at the time of annotation. In this way, we do not have to generate large numbers of classes that differ on what might be a superficial detail. Additional qualities that are assigned to each object are morphometric quantities (such as length and surface area), orientation, and polarity.

Annotation properties

Annotation properties contain information about the ontology entities. We imported the annotation properties from the BIRNLex, a lexicon developed for the Biomedical Informatics Research Network (BIRN) project (www.nbirn.net/birnlex).
These properties cover lexical entities such as definitions, synonyms, alternative spellings, and the curation status of each entity. The label assigned to the class name is also an annotation property. The BIRNLex, in turn, imported many entities from the Simple Knowledge Organization System (SKOS; http://www.w3.org/2004/02/skos/), a set of RDF properties and classes for describing the entities in a knowledge resource. The definition property provides a human-readable definition for each entity in the SAO. We believe that such definitions are critical for human annotators to reference when using ontology class terms to describe data, because the equivalence between the descriptions of objects observed in an investigation and the ontology elements provides the ontology with its semantic power. Thus, a human must clearly understand the way the term is defined in the ontology in order to apply it. Because of the somewhat artificial and complicated structure imposed on some entities (see Figures 3 …

User-defined reclassification and query

To illustrate how properties in OWL can be used to infer additional hierarchies from the SAO, we constructed some OWL classes which reclassify the neuron cell types based on the properties assigned to them by the SAO. We classified neurons based on neurotransmitter, morphological type, or the presence of spines simply by defining, in OWL using Protégé, that each category should include any cell that has the defining property of that category (e.g., that the neuron was known to use glutamate or GABA as a neurotransmitter). After defining these categories, we used the open source ontology reasoner Pellet (Sirin et al., 2007) to transform the flat version of the SAO neuron type hierarchy in Figure 5A …
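The reclassification idea, in which a category is defined by a property and membership is then computed rather than asserted, can be sketched without an OWL reasoner. The following plain Python toy stands in for the defined classes and for the classification step that Pellet performs in the actual workflow; the property keys, category names, and the three example neurons are illustrative assumptions, not an excerpt of the SAO.

```python
# Flat list of neuron types with a few asserted properties (illustrative data).
NEURONS = {
    "Purkinje neuron": {"neurotransmitter": "GABA", "has_spines": True},
    "cortical pyramidal neuron": {"neurotransmitter": "glutamate", "has_spines": True},
    "cerebellar granule cell": {"neurotransmitter": "glutamate", "has_spines": False},
}

# Each defined category is a membership test over the asserted properties.
DEFINED_CLASSES = {
    "GABAergic neuron": lambda props: props.get("neurotransmitter") == "GABA",
    "glutamatergic neuron": lambda props: props.get("neurotransmitter") == "glutamate",
    "spiny neuron": lambda props: props.get("has_spines") is True,
}


def classify(neurons, defined_classes):
    """Group neuron types under every defined category whose test they pass."""
    return {
        category: sorted(name for name, props in neurons.items() if test(props))
        for category, test in defined_classes.items()
    }


for category, members in classify(NEURONS, DEFINED_CLASSES).items():
    print(f"{category}: {', '.join(members)}")
```

A neuron type can land in more than one computed category (here, a Purkinje neuron is both GABAergic and spiny) even though the asserted hierarchy stays flat.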
SAO as semantic “glue”

In order to use the standard names of the SAO to annotate images in different data formats, the SAO is itself used as a data exchange format between three image annotation software applications. To apply the ontology to actual data, we have incorporated annotation with the SAO into our routine segmentation tools for light and electron microscopy. We have created a programmatic interface to the OWL ontology that may be called by Jinx, our 3D segmentation tool for electron tomography data. Through Jinx, users describe the objects contained in electron microscopic volumes of neural tissue as instances of the SAO, rather than as a set of user-defined objects with no relationship among them. The application of SAO captures each object and allows the definition of related objects. Instances of the SAO are then stored in a large instance store, which we call the Cellular Knowledge Base (Fong et al., 2007), where they can be queried (Chen et al., 2006). The data files used to generate the instances are stored in the CCDB, which tracks their experimental and data provenance. We are in the process of incorporating SAO into additional analysis tools for analyzing neuronal branching patterns and for annotation of spatially varying signals using our GIS-based brain atlas, the SMART Atlas (Martone et al., 2007b). The SAO and Cellular Knowledge Base architecture enable us to integrate these different data types through the shared semantic representation of biologically significant elements. For example, the image of a dendritic tree generated with two-photon fluorescent microscopy (Figure 1A) … By structuring the SAO in OWL, we have made its encoded knowledge available to OWL reasoners and RDF query engines. Consequently, we use instances stored in the Cellular Knowledge Base and the knowledge encoded in the ontology to determine which molecular constituents are found in the Node of Ranvier, and at which sites on the Node each is found.
We can also query about the glial cell types associated with the Node, and how the parts of the glial cells relate to the different parts of the Node.

Discussion

We created an OWL ontology representing the subcellular anatomy of the nervous system to provide the necessary scaffold for integrating molecular and anatomical data through accurate description of mesoscale anatomy. By codifying it in OWL, we have enabled algorithmic query and analysis of that knowledge. Moreover, we have enabled the use of formalized knowledge as a standard for making connections between data formats, making connections between other ontologies, and as a data exchange format for image annotation tools. This scaffold is amenable both to tool development and to semantically driven information exchange across the field. It also provides individual researchers a means to perform reasoner-based quality control and inferential analysis of annotated neuroimages. Applying formal semantic representation techniques to neuroanatomical structure has been preliminarily addressed in the macroscopic domain (Martin et al., 2001; Mechouche et al., 2006); little exists in the mesoscopic neuroanatomical domain as yet. A Synapse Ontology was recently constructed (Zhang et al., 2007), but it does not situate synapses in their cellular and tissue contexts, nor is it built on top of community-shared foundational ontologies. Our motivation for creating the SAO was to provide the necessary tools for describing the types of subcellular and supracellular entities located in the dimensional range now termed the mesoscale. The SAO is designed as a reference ontology, defined by Brinkley et al.
(2006) in the following way: “Unlike application ontologies, reference ontologies are not designed for any specific application, but are intended to be re-used in multiple application contexts […] Reference ontologies are designed according to strict ontological principles, whereas application ontologies are designed according to the viewpoint of an end-user in a particular domain.” We elected to tackle the more difficult task of creating a reference ontology with formal semantics, because we believe that such resources are needed to build models of mesoscale structures that combine information from multiple domains and to be able to utilize information obtained at the mesoscale at coarser and finer scales of granularity.

Through application of the ontology, researchers can work in a narrow dimensional range, but their observations are immediately linked across scales. For example, a researcher segmenting a reconstruction derived from electron tomography may make the observation that the endoplasmic reticulum of a dendritic spine from a Purkinje cell expresses the IP3 receptor. Through the SAO, the following inferences can be made: there exists a Purkinje cell dendrite that expresses the IP3 receptor; the cell class Purkinje cell expresses the IP3 receptor; the cerebellar cortex expresses the IP3 receptor; and the cerebellum expresses the IP3 receptor.

The SAO is meant to describe structure, not function or dynamic processes, following the parcellation of biomedical reality established by the BFO. However, although we try to adhere as much as possible to this distinction within the formal class structure of the ontology, many of the labels applied to SAO classes have a functional flavor, for example, ‘chemical synapse’.
Where possible, we tried to remove entities that mixed a structure with a function (for example, myelinating oligodendrocyte) or with a physiological state (for example, activated microglia). However, we also felt in some cases that it was important to assign the labels that are commonly employed by the community. Although these labels appear in the figures and text provided in this paper, SAO classes are actually identified using semantically neutral numeric labels (e.g., SAO class sao1507566336 has the preferred label Post-synaptic Component). The human-readable preferred label is assigned as an annotation property, as are a variety of lexical term variants, such as alternate labels, abbreviations, synonyms, acronyms, and so on. This practice is standard in the ontology community, and although it at times makes working with the ontology cumbersome for humans, because of the need to associate the label with the class, we find it philosophically appealing. The entity is the same entity regardless of what we call it, that is, “a rose by any other name would smell as sweet.” So the fact that our neuron labels reflect mixtures of classification schemes does not impact the class structure of the SAO; rather, the class of neuron to which the label is applied is defined by the set of properties assigned to it.

Ultimately, the goal of anatomy is to provide the structural substrate for mapping function and understanding the structural constraints on dynamic processes. Anatomy is a mature discipline with a rich history. Many structures for which no functional property is known have been described, and continue to be described, particularly in electron microscopy. The classic view of structure-function relationships assumes that structural differences reflect functional differences as well. However, mapping function onto structure is a complex issue that is currently beyond the domain of the SAO.
We chose to adhere to a strict structural approach to keep the SAO scope tractable. We also, however, believe that by not mixing structural and functional classes together, it will be easier in the future to utilize the SAO within a functional ontology. As an example, the term synapse, as is recounted in all introductory textbooks, was a functional concept introduced by Sherrington to describe the transmission of information between cells. The morphological correlate of the synapse was described by Palay and colleagues using electron microscopy in the 1950s, and is also familiar to beginning students of neuroscience. SAO currently provides a formal description of the set of entities to describe the morphological correlates of what are assumed to be the sites and machinery for synaptic transmission in the nervous system. Although the labels employed, pre-synaptic and post-synaptic compartment, do have functional significance, the precise mapping of the functional aspects onto the morphological correlate is not straightforward. Though these familiar functional labels date back to work on the cellular physiological correlate of Sherrington's synapse first described by Katz and colleagues in the 1940s, as a recent paper indicating evidence for “ectopic release” from the chick ciliary ganglion synapse illustrates (Coggan et al., 2005), our understanding of neural signaling at the cellular level continues to evolve. If release of neurotransmitter can occur at sites other than the active zone visualized in electron micrographs, then the functions associated with a synapse cannot be restricted to this domain. However, by modeling a synapse as a site where objects, and eventually dynamic processes, are located, the definition of a synapse can expand as our functional understanding of synaptic transmission expands. We believe that mapping of function onto structure will be one of the greatest challenges faced by those who are creating ontologies for biomedical science. 
Reasoning and inference with OWL

Biological objects are complex entities that do not fit neatly into single hierarchies. We have chosen to follow the recommended practice of single inheritance for all SAO classes, even when that means providing a very flat hierarchy with minimal utility for classification purposes. However, the power of OWL as an ontology formalism is that it not only enables us to explicitly express the complex qualities and inter-relatedness of entities; the standard tools built around the OWL formalism also allow us to automatically infer multiple valid hierarchies for an entity, depending on what is required. For complex entities such as neuronal classes, we can use the OWL inference engine to infer hierarchies based on neurotransmitter, morphological properties, anatomical location, or circuit type (Figure 5).

We have only begun to experiment with the power of OWL to infer new knowledge about objects that is not explicitly encoded in the ontology, which allows information to be inferred across scales. In Larson and Martone (2007), we provide an example of this cross-scale reasoning using OWL and rules about how cell parts relate to cells and brain regions. In this example, we showed how annotation of a synapse between a terminal of a thalamocortical axon and the dendritic spine of a cortical neuron, observed through axonal tracing and electron microscopy, could be used to infer knowledge about regional brain connectivity. Through relationships encoded in SAO, we inferred from the presence of a labeled axon terminal that there must be a neuron in the thalamus that has an axon projecting to the cortex. From the presence of a spine, we inferred that there existed a neuron in cortex to which the spine belonged. From the local observation that an axon terminal synapsed on a dendritic spine, we could infer that thalamic cells synapse with cortical cells, and that thalamus projects to cortex.
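The chain of inferences in this example can be sketched with a few hand-written rules over subject-predicate-object facts. This is only an illustrative approximation written in plain Python, not the OWL and rule machinery used in Larson and Martone (2007): the predicate names, the instance names (`terminal1`, `spine1`), the generated `neuron_of_...` identifiers, and the function `infer_connectivity` are all assumptions made for the sketch.

```python
# Hypothetical observations from one annotated synapse; names are illustrative.
observed = {
    ("terminal1", "type", "axon_terminal"),
    ("spine1", "type", "dendritic_spine"),
    ("terminal1", "labeled_from", "thalamus"),  # tracer injected in thalamus
    ("terminal1", "located_in", "cortex"),
    ("spine1", "located_in", "cortex"),
    ("terminal1", "synapses_on", "spine1"),
}


def infer_connectivity(facts):
    """Apply simple cross-scale rules and return the enlarged set of facts."""
    facts = set(facts)
    types = {s: o for s, p, o in facts if p == "type"}

    for s, p, o in list(facts):
        # Rule 1: a labeled axon terminal implies a neuron in the labeled region.
        if p == "labeled_from" and types.get(s) == "axon_terminal":
            neuron = f"neuron_of_{s}"
            facts |= {(s, "part_of", neuron), (neuron, "located_in", o)}
        # Rule 2: a dendritic spine implies a neuron in the region containing it.
        if p == "located_in" and types.get(s) == "dendritic_spine":
            neuron = f"neuron_of_{s}"
            facts |= {(s, "part_of", neuron), (neuron, "located_in", o)}

    owner = {s: o for s, p, o in facts if p == "part_of"}
    region = {s: o for s, p, o in facts if p == "located_in"}

    for s, p, o in list(facts):
        # Rule 3: a synapse between parts implies a synapse between their cells.
        if p == "synapses_on" and s in owner and o in owner:
            pre, post = owner[s], owner[o]
            facts.add((pre, "synapses_with", post))
            # Rule 4: synapsing cells imply a projection between their regions.
            if pre in region and post in region:
                facts.add((region[pre], "projects_to", region[post]))
    return facts


inferred = infer_connectivity(observed) - observed
for fact in sorted(inferred):
    print(fact)
```

The difference from the approach described in the text is that these rules are hard-coded in the program, whereas encoding the relationships in the ontology lets a general-purpose reasoner derive the same consequences without custom code.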
While the reasoning itself does not provide new insight about brain function, we show here that a computational algorithm was able to infer the same logical cross-scale consequences of the subcellular arrangement of cell parts as would a neuroscientist, without our having to write custom code to embed that knowledge in the program.

Application of the ontology

In construction of the SAO, we have attempted to provide a formal structure for describing data, balancing the needs for a “top-down” versus a “bottom-up” approach. By top-down, we mean that the biological theory governing a domain is used to classify data products; by bottom-up, we mean that we do not impose prior knowledge constraints on interpreting data but let the data speak for themselves (Murphy, 2005). OWL classes are essentially descriptive templates that constrain the possible properties and relationships which instances may have. As such, we only encode knowledge into the class level when we are sure that it ought to constrain all further instances that may be seen. This criterion enforces a certain amount of rigor when describing the properties of biological entities. What are those things that must always be true of a biological entity? Unlike the case of gross anatomy, where we can be reasonably certain of the canonical form taken by the human body, for example, we do not believe that we are at the stage with subcellular anatomy where we can comfortably define such canonical forms. Thus, although we sacrifice some of the reasoning power of OWL through the minimal placement of restrictions on the classes, we designed version 1.2 of the SAO to serve as the basis by which such rules can be derived from the instances. When describing data, we apply the ontology only down to the level of granularity of which we are reasonably certain.
For example, if we know the type of neuron we are describing, we can assign instances of properties to that specific class; if we do not, we can assign the observed properties to the class “neuron.” Using the reasoning power of OWL, it may turn out that the properties of this unidentified neuron are equivalent to a known class, but that can be inferred from the actual instance. In this way, the structure of the OWL standard forces the SAO to make careful and conservative descriptions about subcellular anatomy while still allowing a place for uncertainty.

Instances within the SAO also serve another important function by allowing us to annotate the biological description of a piece of data with the data and experimental properties from which it was derived. Entities within SAO are not directly observable by humans but must be imaged through a device such as a microscope and recorded in some form on a particular medium. Biologists are well aware that how a specimen was prepared, imaged, and analyzed will impact the types of observations that are made. In many cases, subcellular structures that are observed under certain conditions, for example, chemical fixation, are determined to be artifactual when recorded under different conditions. Most experimentalists are uncomfortable with knowledge management systems that attempt to divorce the biological reality from the methods used for acquisition, visualization, and analysis, because these methods largely determine the form that the reality will take. We must recognize, however, that the entities that we are attempting to describe in the SAO are assumed to transcend any technique. That is, we are assuming that there is such a thing as a dendrite, even though its properties can only be described in a specific experimental context.
So, although the SAO itself does not assign technique or data type to the biological entity, for each instance of the entity, we provide a link to the experimental evidentiary context and the data type from which it was derived (e.g., this “instance” of dendrite was stained with a Golgi stain and imaged in a light microscope).

Through the construction of the SAO, we have made progress toward the goals of building information bridges in neuroscience in three broad areas: formalization, externalization, and standardization. By formalization, we mean the process of describing concepts in a fully explicit manner in order to clarify and sharpen the meanings of the terms being used. The lengths to which we have gone to either find or impose structure on implicit concepts in subcellular anatomy reflect the absence of prior efforts to bring them into a single cohesive framework. Such a framework is important for the growing community interested in producing detailed computational models of structure and function in the nervous system. It is vitally important that experimental neuroscientists be able to communicate with this community and provide increased levels of explanation of their experimental systems. Providing a formal way of communicating these explanations makes it much easier to begin the modeling process. Ontologies in general, and the SAO in particular, are crucial “connective tissue” to help place these goals within reach for neuroscience.

In order for formalized information to be used by software applications, the information must be capable of externalization. By externalization, we mean to draw attention to the ability to transform the information into “code,” as opposed to the translation of abstract concepts into an explicit representation that only humans can read.
Once knowledge has been formalized and subsequently codified into a computer-readable form, that knowledge becomes externalized as an entity that is capable of interacting programmatically with other knowledge. This makes information much more flexible than if it resided on the printed page, and it allows algorithms to answer questions for us, saving time and effort. The process of constructing an OWL ontology formalizes the knowledge it contains, but encoding it in OWL and saving it on a computer in its underlying RDF/XML format externalizes the information for other systems to digest and manipulate via standard open source code frameworks. Through externalization, we are able to remix knowledge into other forms. It allows us to generate diagrams, to view it in different software interfaces (e.g., Jinx), to reclassify hierarchies on demand, and to run rule-based reasoning or other automatic inferencing mechanisms. The benefits of this are obvious in the context of the goals of data sharing and model construction. Externalization is also needed in order to construct algorithms that are capable of assisting neuroscientists in their own work, such as by guiding them in a literature search or suggesting the name of a structure they are segmenting.

Once an information bridge has been formalized, and also externalized, it can be used for the final important purpose of standardization. In this context, the aspect of standardization that we focus on is the ability of OWL ontologies to serve as semantic “glue” that allows disparate data, ontologies, and applications to interoperate. The strategy we have employed in our knowledge environment is to leverage the externalized knowledge in the SAO by embedding it in tools that have first contact with primary data.
By embedding the SAO in these tools, we not only give the user easy access to SAO terms for annotating their data, but also make the tools more intelligent, minimizing the amount of implied knowledge that a user must contribute.

Future directions

We are continuing to develop the SAO, to apply it to the type of biological data contained within the CCDB, and to refine the structure of the ontology. Current development is focused on a set of entities to describe cellular inclusions observed in neurodegenerative disease, and on entities from subcellular anatomy in domains outside of neuroscience. We welcome any feedback or contributions to the ontology from the biological community, and are working on a web-based interface through the NCBO BioPortal (http://www.bioontology.org/ncbo/faces/index.xhtml) that will facilitate this process.

The process of ontology construction is laborious and contains many fits and starts that leave legacy errors within the ontology. Besides the complicated nature of the domain, we face additional challenges in developing the SAO using emerging community standards (e.g., the BFO) that are themselves still developing. Consequently, we periodically have to refactor the ontology as new versions of the constituent ontologies come on-line. However, we believe that it is important for neuroscience ontologies to align themselves as much as possible with the broader life sciences community, because ultimately we hope to be able to integrate neuroscience with the broader domains.

The act of formalizing knowledge is to make explicit what was once implicit, and in so doing to clarify the boundaries of definitions. Giving something a name gives power over it (Winston, 1992). Once we have assigned appropriate labels, the creation of a system of axioms that interrelate the labeled entities gives us additional power to describe the interactions between the entities.
This practice has been at the heart of scientific understanding since the beginning of history. The poster child of formalization is mathematics itself, which is a system where the entities are variables, and the system of axioms consists of mathematical operations. The impact of mathematics, a precise and consistent means of communicating ideas, was to provide extraordinary leverage to thinkers throughout history to build truths upon truths in the service of understanding. A key example of this was the expression in calculus of the fundamental relationships between electric fields, magnetic fields, electric charge, and electric current by Maxwell's equations. It required the formal language of calculus to clarify and distill the knowledge of those physical concepts. As such, we see our attempt to formalize the concepts of the structure and function of the brain with ontologies, whose underpinnings are first-order logic, to be part of a broader pattern in the history of science. The issues we have explored through our formalization efforts might be considered to be part of a larger movement underway to develop formal means to describe biological entities.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The Supplemental Data for this article can be found online at http://ccdb.ucsd.edu/SAO/Larson2007/. Supplemental files: Figure S1 (128K, tif); Figure S2 (444K, tif); Figure S3 (265K, tif); Figure S4 (281K, tif).

Acknowledgments

This work was supported by NIH grants NIDA DA016602 (CCDB), NINDS RO1NS058296, NCRR RR04050, and RR08605. The Bioinformatics Research Network is supported by NIH grants RR08605-08S1 (BIRN-CC) and RR021760 (Mouse BIRN).
The Protégé resource is supported by grant LM007885 from the United States National Library of Medicine. The authors wish to thank Eric A. Bushong for his help with glial cell types and Sarah M. Maynard for her help with biological entities relevant to neurodegenerative diseases. References