Nucleic Acids Res. Jan 2009; 37(Database issue): D251–D260.
Published online Sep 12, 2008. doi:  10.1093/nar/gkn568
PMCID: PMC2686464

OKCAM: an ontology-based, human-centered knowledgebase for cell adhesion molecules

Abstract

‘Cell adhesion molecules’ (CAMs) are essential elements of cell/cell communication that are important for proper development and plasticity of a variety of organs and tissues. In the brain, appropriate assembly and tuning of neuronal connections is likely to require appropriate function of many cell adhesion processes. Genetic studies have linked and/or associated CAM variants with psychiatric, neurologic, neoplastic, immunologic and developmental phenotypes. However, despite increasing recognition of their functional and pathological significance, no systematic study has enumerated CAMs or documented their global features. We now report compilation of 496 human CAM genes in six gene families based on manual curation of protein domain structures, Gene Ontology annotations, and 1487 NCBI Entrez annotations. We map these genes onto a cell adhesion molecule ontology that contains 850 terms, up to seven levels of depth and provides a hierarchical description of these molecules and their functions. We develop OKCAM, a CAM knowledgebase that provides ready access to these data and ontologic system at http://okcam.cbi.pku.edu.cn. We identify global CAM properties that include: (i) functional enrichment, (ii) over-represented regulation modes and expression patterns and (iii) relationships to human Mendelian and complex diseases, and discuss the strengths and limitations of these data.

INTRODUCTION

‘Cell adhesion molecules’ play central roles in much of the connection and communication between cells and their synapses (1). Cell adhesion-related communication is essential for many aspects of the proper development of a variety of organs and tissues (1). This cellular communication also plays substantial roles in the plasticity of cell recognition processes in the developed organism (2).

Cell adhesion molecules (CAMs) may be especially important in the brain. The brain requires proper connections of many trillions of synapses to develop properly, as well as substantial plasticity in many of these synapses to facilitate learning and memory. The dynamics of neuronal synaptic recognition, connection and disconnection appear to make substantial contributions to disorders that display mnemonic features, including addictions and autism (3,4). Current physiologic and cell biologic studies have implicated CAMs as good candidates to play important roles in synapse adhesion (1,5), neuronal connectivity and communication (1), signal transduction (5–8) and proper arrangement of pre-synaptic active zones and postsynaptic densities at classical synapses (9,10).

Current genetic studies have linked and/or associated variants in cell adhesion molecule genes with psychiatric, neurologic, neoplastic, immunologic and developmental phenotypes. The importance of CAMs in learning and memory-associated disorders is demonstrated in recent genome wide association studies (11). Vulnerabilities to addictions are associated with variants in CAM genes in studies of several independent samples (12–14). Genetic variants of the CAM genes NRXN1 and CNTNAP2 have been associated with autism (4,15). Variants in neuregulin have been associated with vulnerability to schizophrenia (16). Variants in an adhesion-like protein KIAA0319 have been associated with dyslexia (17,18).

These data underscore the importance of cell adhesion molecules in both Mendelian and complex disorders of brain and other organs and suggest that a more comprehensive view of these genes and molecules would be valuable. However, there is currently no systematic study that enumerates: (i) the number of genes and gene families that function as CAMs; (ii) common and/or global CAM functions, including those that might extend beyond their cell/cell recognition functions; (iii) common CAM genetic variants that might provide individual differences in CAM structures and functions; (iv) over-represented regulation modes and expression patterns and (v) CAM associations with diseases, especially with brain disorders.

We now report compilation of a list of 496 human CAM genes and construction of a corresponding cell adhesion molecule ontology (CAMO) to systematically address these questions. Detailed annotations on CAM genes are provided. Global properties of CAM genes, overrepresented types of variation, overrepresented regulation modes and expression patterns, and disease associations are identified. We report a knowledgebase for cell adhesion molecules (OKCAM) that provides ready access to these data and the associated ontologic system that we describe here.

IDENTIFICATION OF HUMAN CAM GENES AND RODENT HOMOLOGS

CAMs were identified based on compilation of data from manual curation of protein domain structures, Gene Ontology annotations, and 1487 annotation entries from keyword queries based on NCBI Entrez Gene annotations (Figure 1). First, we identified features of common protein domains for CAM families based on common motifs from cadherin, immunoglobulin/FibronectinIII (IgFn), integrin, neurexin, neuroligin and catenin families. Using these features, we developed Perl scripts to retrieve and standardize related InterPro domain architectures and the proteins that contain such architectures (19). After manual curation, 44 types of protein domains with 202 detailed domain architectures were identified. These included 532 human proteins that map onto 218 human genes. We used similar protocols to identify cell adhesion gene lists for rat and mouse; these genes were then further mapped to the human genome using Homologene (20). We next extracted CAMs using the Gene Ontology term ‘cell adhesion’ (GO:0007155) (21). We focus on curated entries; entries that are identified only by annotations that display Evidence Code IEA (Inferred from Electronic Annotation) are noted in Supplementary Table 7. Two hundred eighteen human proteins were identified, which mapped onto 196 human genes. Finally, we manually curated 1487 annotation entries selected from results of the Entrez Gene query ‘adhesion AND Homo sapiens [organism]’ (20). This approach added 136 more human genes to the list of cell adhesion molecules. In total, we thus identified 496 unique human CAM genes and their homologs in other species.
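The three-source compilation described above amounts to taking the union of several gene lists while keeping track of which source supports each gene. The following is a minimal Python sketch of that idea, not the authors' Perl pipeline. The function name `merge_gene_sources`, the source labels and the assignment of gene symbols to sources are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def merge_gene_sources(sources):
    """Union several gene sets and record which sources support each gene.

    `sources` maps a source label to a set of gene symbols.
    Returns a dict: gene symbol -> set of supporting source labels.
    """
    provenance = defaultdict(set)
    for source_name, genes in sources.items():
        for gene in genes:
            provenance[gene].add(source_name)
    return dict(provenance)


# Toy inputs: the symbol-to-source assignments below are invented placeholders,
# not the curated OKCAM lists.
sources = {
    "domain_architecture": {"CDH1", "CDH2", "NRXN1", "NLGN1"},
    "go_cell_adhesion": {"CDH1", "NRXN1", "ITGB1"},
    "entrez_keyword": {"ITGB1", "CTNNB1"},
}

provenance = merge_gene_sources(sources)
unique_genes = sorted(provenance)
print(len(unique_genes), "unique genes")
for gene in unique_genes:
    print(gene, sorted(provenance[gene]))
```

Keeping the provenance of each gene, rather than only the final union, makes it possible to report how many genes each source contributes beyond the others, as the text does for the Entrez Gene keyword query.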

Figure 1.
Collection of Human CAMs. CAMs were compiled by integrating Gene Ontology annotations, domain structure information and keywords query against NCBI Entrez Gene annotations. Four hundred and ninety-six unique human genes were identified as CAMs (additional ...

Meta-data about the domain architectures for CAMs in nonhuman species provided information about CAM evolutionary histories. Across the 113 types of protein domains assessed in our dataset, 705 detailed domain architectures were noted. Among these, only 44 domains with 202 domain architectures were identified in all three species: human, rat and mouse. For example, in the cadherin superfamily, only one human gene encodes a protein with enzymatic activity, though several dozen cadherins with enzymatic activities are found in bacteria and yeast. Several categories with large numbers of domain architectures that can be detected in lower species, including Caenorhabditis elegans, Drosophila melanogaster and Danio rerio, are totally absent from human, rat and/or mouse. These categories include ‘IgCAM-like cadherins’ that display 29 such domain architectures, ‘cadherins with Leucine-rich structures’ that display two such domain architectures, ‘toxin-related cadherins’ that display 36 such domain architectures and ‘cadherins with surface anchor structures’ that display seven such domain architectures. In striking contrast, 119 of the 123 ‘cadherin’ genes that can be identified in humans fall into the category of ‘simple cadherins’, which includes genes with only simple combinations of cadherin prodomains, cadherin domains and cadherin cytoplasmic domains. Although 79%, but not all, of the proteins that we identify in this study display characteristic InterPro domains, the domain architecture patterns we identify do imply the specification of the CAMs in mammals.

DATA ANNOTATIONS

To elucidate the functions of CAMs, detailed annotations were given to each CAM gene. These data allow interpretation of features of each CAM at five levels: gene family and basic information, genetics, regulation, expression, and Mendelian or complex disease linkage/association.

Information about gene family and basic characterization comes from NCBI Entrez gene annotations (20), Gene Ontology (21), InterPro domains (19), protein interaction databases (22–24), knowledgebases for molecular pathways including KEGG (25), BioCarta and Pathway Interaction Database (PID) and the NCBI PubMed database (20). Genetic variations in these genes, including chromosome recombination hotspots (26), SNPs (20), insertion/deletions (27), chromosomal translocations (27) and CNVs (27), were retrieved from the UCSC Genome Browser Database (26), HapMap (28), NCBI dbSNP database (20) and Database of Genomic Variants (27), respectively. Information about potential or actual modes of regulation was annotated based on the presence of experimentally validated transcription factor binding sites (TFBS) (29), experimentally validated (30,31) and putative miRNA targets (32), noncoding RNA loci (33), cis/trans-natural antisense transcripts (NATs) (34,35), alternative splicing and post-translational modifications (36) from databases that included TransFac (29), Argonaute (31), TarBase (30), PicTar (32), NatsDB (34,35), NONCODE (33) and dbPTM (36). Information about mRNA expression levels came from: (i) integrated human expressed sequence tag profiles based on developmental stages and tissue distributions, as deposited in Unigene (20) and (ii) mouse brain region expression profiles described in the Allen Brain Atlas (37), with mapping of these data to human orthologs using Homologene (20). We integrated gene expression information at peptide/protein levels by collecting expressed proteins and peptides deposited in the PRIDE database (38). To assess potential disease linkages or associations, we integrated OMIM (20) and genome-wide association datasets (39), from public data deposited in the Genetic Association Database (39) and an additional 12 in-house genome wide association datasets.

Full descriptions of the annotation statistics are provided in Table 1. These annotations, extending from genome to post-translational modification, provide a novel avenue for studies of the global properties of CAM genes, overrepresented types of variation, overrepresented regulation modes and expression patterns, and disease associations, as we discuss in the following sections.

Table 1.
Annotations for CAM genes

CONSTRUCTION OF A CAMO

We iteratively organized the information and knowledge for CAMs to construct a novel CAMO. CAMO was constructed as a directed acyclic graph (DAG) using DAG-Edit (40) to input, manage and update data, as shown in the screenshot (Supplementary Figure 1). We annotated each term with name, definition and source references. We added its relationship to other terms based on manual reviews of domain architecture and functional annotations at the five levels noted above.

In a DAG, vertices represent terms and directed edges represent the relationships between terms, and no path of edges leads from a term back to itself. CAMO thus provides a hierarchical description of functions and properties of CAMs with five top-level categories: CAM gene families, CAM genetics, CAM regulation, CAM expression and CAM diseases. Each top-level term is further divided into several categories to describe the functions in detail (Figure 2). In toto, CAMO has 850 terms with up to seven levels of depth. We mapped the 496 human genes that function in cell adhesion onto CAMO, providing a novel systematic description of CAMs (Figure 2). CAMO thus provides more specific, complete and resolved information about CAMs to scientists, especially to neuroscientists, than is available in general-purpose ontologies such as MeSH (41) and Gene Ontology (21).
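The DAG structure described above can be illustrated with a small Python sketch that stores parent-to-child edges, rejects any edge that would introduce a cycle, and reports the number of levels below a term. This is an assumption-laden illustration rather than the DAG-Edit data model: the class `OntologyDAG`, the root label "CAMO" and the exact parent/child placement of the example terms are hypothetical.

```python
from collections import defaultdict


class OntologyDAG:
    """Minimal directed acyclic graph of ontology terms (illustrative only)."""

    def __init__(self):
        self.children = defaultdict(set)  # parent term -> set of child terms
        self.terms = set()

    def _reachable(self, start, target):
        """Return True if `target` can be reached from `start` along edges."""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.children.get(node, ()))
        return False

    def add_relation(self, parent, child):
        """Add a parent -> child edge, refusing edges that would close a cycle."""
        if parent == child or self._reachable(child, parent):
            raise ValueError(f"edge {parent!r} -> {child!r} would create a cycle")
        self.children[parent].add(child)
        self.terms.update((parent, child))

    def depth(self, root):
        """Number of levels on the longest path starting at `root`."""
        kids = self.children.get(root, ())
        return 1 + max((self.depth(kid) for kid in kids), default=0)


# Hypothetical fragment of a hierarchy; term placement is assumed for the example.
dag = OntologyDAG()
dag.add_relation("CAMO", "CAM diseases")
dag.add_relation("CAMO", "CAM gene families")
dag.add_relation("CAM diseases", "psychiatric disorders")
dag.add_relation("psychiatric disorders", "drug addiction")

print(dag.depth("CAMO"))
```

Because a DAG allows a term to have more than one parent, the same term can appear under several branches without being duplicated, which a strict tree would not permit.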

Figure 2.
Structure of CAMO. CAMO provides a hierarchical description of functions and properties of CAMs with five top-level categories (A): CAM expression (B), CAM diseases (C), CAM genetics (D), CAM gene families (E) and CAM regulation (F). Each top-level term ...

OKCAM WEB INTERFACE DESIGN

We developed a PostgreSQL database termed ‘OKCAM (Ontology-based Knowledgebase for Cell Adhesion Molecules)’ to manage the CAM gene list, annotations and ‘CAMO’. We implemented a web-based user interface of this database that uses PHP and PHP/SQL query scripts. Cross-references to key external databases were included to integrate functional information about CAM genes. These external databases provide annotations for CAM gene families, CAM genetics and genomics, CAM regulation modes and expression patterns, and relationships between CAMs and human diseases (Figure 3).

Figure 3.
Structure of OKCAM Web Server. Several interactive browsing options were implemented to facilitate user queries of OKCAM. These include ontology overview (A), full gene list overview (B), chromosomal overview (C), text search (D) and BLAST search (D). ...

The information for each CAM gene is integrated and presented in a single graphical web page. For example, the OKCAM entry page for cadherin 1 (CDH1) (http://okcam.cbi.pku.edu.cn/entry-info.php?id=999) shows that CDH1 is located on chromosome 16 in a chromosome region that contains a recombination hotspot, copy number variations and insertion/deletions (‘CAM genetics information’). CDH1 transcripts are relatively highly expressed in adult (‘developmental stage’), mammary gland (‘tissue distribution’) and cerebral cortex (‘brain region’). Translation products are also expressed in placenta/blood serum (‘protein expression’). CDH1 is implicated in neoplasia by genomewide association studies and OMIM annotations (‘CAM disease’). Potential CDH1 regulatory modes include alternative splicing regulation, cis-NATs regulation, miRNA regulation as well as post-translational modifications (‘CAM regulation’). Links to the original databases and other resources facilitate information tracing.

We implemented four interactive browsing options in OKCAM to facilitate user queries. Users can browse cell adhesion genes by ‘CAMO’, displayed as hierarchical trees on the homepage. They can zoom in on a particular branch of the ontology by clicking the ‘+’ sign to expand the branch. For example, a user interested in ‘psychiatric disorders’ may expand this category, focus on ‘drug addiction’ and see the 49 CAM genes currently mapped on this term by clicking the number that follows this term (Figures 2 and 3). A ‘Chromosomal Overview’ browser supports browsing the CAM genes by clicks on chromosomal locations marked by ‘+++’ (Figure 3). A text search interface facilitates database queries that use either gene IDs or names. A fourth interface supports sequence searching based on BLAST nucleotide and amino acid sequence similarities. Each interactive browsing interface returns CAM gene/gene lists that meet query requirements. Users can then obtain further detailed annotation by clicking on the gene name (Figure 3). A download page makes all data, database schema and PostgreSQL commands available at http://okcam.cbi.pku.edu.cn/download.php.
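Browsing by ontology term, as described above, can be thought of as a join between a gene table, a term table and a gene-to-term mapping table. The sketch below shows such a lookup in Python, using an in-memory SQLite database as a stand-in for PostgreSQL. The table names, column names, numeric identifiers and gene-to-term mappings are all hypothetical; the actual OKCAM schema is the one distributed on the download page and is not reproduced here.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL backend; the schema is assumed.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, symbol TEXT NOT NULL);
    CREATE TABLE term (term_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE gene_term (
        gene_id INTEGER REFERENCES gene(gene_id),
        term_id INTEGER REFERENCES term(term_id),
        PRIMARY KEY (gene_id, term_id)
    );
    """
)

# Toy rows: identifiers are arbitrary and mappings are illustrative only.
conn.executemany("INSERT INTO gene VALUES (?, ?)",
                 [(1, "NRXN1"), (2, "CNTNAP2"), (3, "CDH1")])
conn.executemany("INSERT INTO term VALUES (?, ?)",
                 [(10, "autism"), (11, "neoplasia")])
conn.executemany("INSERT INTO gene_term VALUES (?, ?)",
                 [(1, 10), (2, 10), (3, 11)])


def genes_for_term(connection, term_name):
    """Return the sorted gene symbols mapped onto one ontology term."""
    rows = connection.execute(
        """
        SELECT g.symbol
        FROM gene AS g
        JOIN gene_term AS gt ON gt.gene_id = g.gene_id
        JOIN term AS t ON t.term_id = gt.term_id
        WHERE t.name = ?
        ORDER BY g.symbol
        """,
        (term_name,),
    ).fetchall()
    return [symbol for (symbol,) in rows]


print(genes_for_term(conn, "autism"))
```

The count displayed next to each term in an ontology browser would then simply be the length of the list returned for that term.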

APPLICATIONS OF OKCAM

The comprehensive annotations and ontology system of OKCAM facilitate studies of the global properties of the CAM genes, overrepresented types of variation, overrepresented regulation modes and expression patterns, and disease associations.

GLOBAL FEATURES OF CAMs

CAMs in our dataset were annotated using Gene Ontology (GO) (21) and the pathway databases KEGG (25), BioCarta and Pathway Interaction Database (PID). We can thus identify significantly enriched Gene Ontology terms and pathways using DAVID (42) and KOBAS (43,44), respectively. We selected the functional categories that were more likely to be biologically meaningful by calculating the statistical significance of each functional category in the input set of genes versus all annotated genes in the human genome. There was statistically significant enrichment for CAM genes in 16 ‘molecular function’ terms (Supplementary Table 1), 11 ‘subcellular localization’ terms (Supplementary Table 2) and 45 ‘biological processes’ terms (Supplementary Table 3), when compared to corresponding data for the whole genome.

Identification of functional enrichment for several of the ‘molecular function’ and ‘subcellular localization’ terms is reassuring. This identification provides relatively little additional information, however, since CAMs do function as ‘adhesion molecules’. Most are well documented to sit within (or be anchored to) plasma membranes. However, there is also significant enrichment for other molecular functions that might not have been so readily anticipated, including calcium binding, protein kinase, and protein phosphatase activities (Supplementary Tables 1 and 4). The significant overrepresentation of CAM localizations within receptor complexes and extracellular matrix is also of interest (Supplementary Table 2). It is interesting that the CAMs identified in this work are overrepresented in not only ‘cell adhesion’ but also in biological processes that include signal transduction, responses to external stimuli, cell motility, migration, and nervous system development (Supplementary Table 3). Reassuringly, the molecular pathway enrichment analyses that used each of the three different pathway databases provided results that implicated their roles in largely similar functional pathways (Supplementary Table 5).

Data from OKCAM annotations for protein interactions allowed us to develop a molecular network based on proteins that could interact with the CAMs identified here (Supplementary Figure 2). As for other established biological networks (45,46), the connectivity distribution of the network that we nominated in this way appears to follow scale-free rules. CAMs appear to interact with each other to form a relatively tight ‘core’ that interrelates with hundreds of other signal transduction genes. Focus on the ‘hub nodes’ in this apparent network (Supplementary Figure 2) may even help to elucidate novel CAM roles in signal transduction that come from its partnerships with other signaling molecules.
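One simple way to inspect the connectivity of an interaction network like the one described above is to tabulate how many nodes have each degree; in a scale-free network a few hub nodes have many partners while most nodes have few. The following Python sketch computes that tabulation from an edge list. The function `degree_distribution` and the toy edge list are hypothetical and do not reproduce the OKCAM interaction data.

```python
from collections import Counter


def degree_distribution(edges):
    """Return (degree per node, number of nodes per degree) for an undirected graph.

    Self-loops are ignored and duplicate edges are counted once.
    """
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for edge in unique_edges:
        for node in edge:
            degree[node] += 1
    return degree, Counter(degree.values())


# Toy undirected edge list used only to exercise the function.
edges = [
    ("CDH1", "CTNNB1"),
    ("CDH1", "CTNND1"),
    ("CDH1", "CTNNA1"),
    ("CTNNB1", "CTNNA1"),
    ("CTNNB1", "CDH1"),  # duplicate of the first edge, counted once
]

degree, distribution = degree_distribution(edges)
hubs = [node for node, _ in degree.most_common(1)]
print(hubs, dict(distribution))
```

On real data, plotting the number of nodes against degree on log-log axes is the usual first check for an approximately power-law tail.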

CAM REGULATORY MODES

Mapping the CAMs in our dataset onto CAMO and detailed gene structural/regulatory terms allows us to identify specific potential regulatory modes for these CAMs. We can then perform Monte Carlo analyses to test whether these structural/regulatory modes are overrepresented among CAMs. At the human genomic level, both recombination ‘hotspots’ (Monte Carlo P = 0.024) and copy number variations (Monte Carlo P < 0.0001) are over-represented in chromosome regions that contain CAM genes. Indeed, ‘cell adhesion molecule’ is the GO category that is most enriched in the genes that overlap with 1447 copy number variants identified using Affymetrix 500 K and whole genome TilePath (WGTP) reagents (47). There is a more modest but still significant 1.42-fold enrichment for CAM genes in chromosomal regions that contain both copy number variations and recombination hotspots (P = 0.07). By contrast, we detected no significant difference in the densities of single nucleotide polymorphism (SNP) distributions in chromosomal regions that contain CAM genes versus the whole genome (P > 0.5).
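A Monte Carlo test of the kind mentioned above can be sketched as follows: draw many random gene sets of the same size as the CAM set, count how many genes in each random set carry the feature of interest (for example, overlap with a copy number variant), and report the fraction of random sets that match or exceed the observed count. The text does not specify the sampling scheme, so this Python sketch assumes simple uniform sampling of genes without replacement. The function name, the toy genome and the feature set are hypothetical.

```python
import random


def monte_carlo_enrichment(genome, gene_set, feature_set, n_iter=10000, seed=0):
    """Empirical one-sided P-value for over-representation of a feature.

    genome      : iterable of all gene identifiers (the sampling universe)
    gene_set    : set of genes being tested (e.g. CAM genes)
    feature_set : set of genes carrying the feature (e.g. overlapping a CNV)
    Returns (observed overlap, empirical P-value with add-one correction).
    """
    rng = random.Random(seed)
    universe = list(genome)
    observed = len(gene_set & feature_set)
    at_least_as_extreme = 0
    for _ in range(n_iter):
        sample = rng.sample(universe, len(gene_set))
        if sum(gene in feature_set for gene in sample) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least_as_extreme + 1) / (n_iter + 1)


# Toy universe of 1000 genes: 10% carry the feature, but 40 of the 50 test genes do.
genome = range(1000)
feature_genes = set(range(100))
cam_like = set(range(40)) | set(range(500, 510))

observed, p_value = monte_carlo_enrichment(genome, cam_like, feature_genes, n_iter=2000)
print(observed, p_value)
```

With this estimator the smallest reportable P-value is 1/(n_iter + 1), so the number of iterations bounds the resolution of the result. A more careful analysis might also match random sets on gene length or chromosomal region, which this sketch does not attempt.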

When we tested potential overrepresentation of transcriptional regulatory modes using hypergeometric tests, we found that the potential for miRNA regulation was significantly enriched for CAM genes when compared to the whole genome (P < 0.0001). In contrast, no over-represented transcription factor regulation for CAM genes was detected using either low-scale experimentally validated data (P = 0.37) or ChIP-chip data (P = 0.51). There was no significant over- or under-representation of CAMs among genes involved in either cis- or trans-NAT (35) regulation (P > 0.5 for each).
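The hypergeometric test used above asks how likely it is to see at least k annotated genes in a set of n genes drawn from a genome of N genes, of which K carry the annotation. With X denoting the number of annotated genes in the draw, the upper-tail probability is P(X >= k) = sum over i from k to min(K, n) of C(K, i) C(N - K, n - i) / C(N, n). The Python sketch below evaluates this sum exactly with integer binomial coefficients; the function name and the example numbers are hypothetical and are not taken from the paper.

```python
from math import comb


def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N genes, K annotated, n drawn).

    N : total genes in the background (e.g. whole genome)
    K : background genes carrying the annotation (e.g. predicted miRNA targets)
    n : size of the tested gene set (e.g. CAM genes)
    k : tested genes carrying the annotation
    """
    total = comb(N, n)
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Invented example numbers, chosen only to exercise the function.
p = hypergeom_upper_tail(N=1000, K=100, n=50, k=12)
print(f"P(X >= 12) = {p:.4g}")
```

Using exact integer arithmetic avoids the numerical underflow that can occur when very small tail probabilities are computed from floating-point terms.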

We can also seek overrepresentation of CAM alternative splicing by compiling the alternative splicing isoforms for each human gene mapped on CAMO and plotting the distributions of the numbers of isoforms for (i) CAMs versus (ii) all human genes (Supplementary Figure 3). The overall distributions appear similar. However, genes that utilize a wealth of alternative transcripts, those that encode ~40–50 alternatively spliced isoforms, are over-represented in the dataset that encodes CAMs. These genes provide an apparently distinct ‘peak’ in the distribution curve (Supplementary Figure 3). This analysis agrees with our previous work that has characterized multiple alternative splicing events in specific addiction-associated CAMs (13).

We integrated post-translational modification (PTM) data to identify possible contributions of this regulatory mode to CAM functions. On the basis of the experimentally validated PTM data deposited in dbPTM, the 496 CAM genes are candidates for involvement in glycosylation (334 genes), phosphorylation (114 genes), amidation (22 genes), palmitoylation (eight genes), methylation (three genes), farnesylation (two genes), myristoylation (two genes), sulfation (one gene) and acetylation (one gene). There is a highly significant enrichment for CAM N-linked glycosylation (331 genes, P < 0.0001), but not for O-linked glycosylation (10 genes). No significant over- or under-representation was detected for other modes of post translational modification.

On the basis of the OKCAM annotations and CAMO, we identified a list of regulatory modes for cell adhesion molecules. These analyses identified both expected and unexpected CAM regulatory modes. First, the data document the overrepresentation of CNVs within CAM genes, in ways that were suggested in even some of the initial descriptions of CNVs (48). Documenting a 1.4-fold enrichment for CAM genes in chromosomal regions that contain both copy number variations and recombination hotspots both supports these initial observations and provides a possible mechanism for the abundance of CNVs in CAM genes. Secondly, although many papers have described many alternative splicing isoforms for CAMs, it was somewhat surprising to note that the largest diversity of alternative transcripts (e.g. ~40–50) was selectively over-represented among CAM genes.

CAM EXPRESSION PATTERNS

Integration of data from human expressed sequence tags (EST) derived from brain libraries and mouse brain atlas expression profiles provided strong levels of agreement that support use of this comparative approach (Supplementary Table 6). We thus analyzed CAM expression patterns and levels in 17 mouse brain regions, based on Allen Brain Atlas profiles from murine brains. For each brain region, we used the program R to plot the density curves that illustrate the frequency distributions of expression levels for (i) CAMs and (ii) all human genes expressed in this brain region (Supplementary Figure 4). For 16 of the 17 brain regions, the expression distribution curves for the two datasets merged. In these brain regions, CAM genes taken as a group appear to be expressed in ways that are not markedly different from those of other brain-expressed genes. However, in the cerebral cortex, CAM genes with the highest expression levels appear to be over-represented. There is thus an additional peak in the CAM distribution curve that is not found when all other genes are examined (Supplementary Figure 4). While much prior data documents expression of many CAMs in cerebral cortex, the specificity of the relatively richer expression of CAMs in this brain region provides a novel observation.

CAM DISEASE ASSOCIATIONS

We assessed potential relationships between CAM variants and disease using data from OMIM, public GWAS data and our in-house datasets. These data nominate 167 human CAMs as likely to contain variants that could contribute to individual differences in vulnerability to disorders in brain and a variety of other organs (Figure 4). CAMs were identified by association and/or linkage findings in disorders of the nervous system (91 genes), immune system (30 genes), metabolism (29 genes), cardiovascular system (28 genes), skin and connective tissues (26 genes), musculoskeletal system (25 genes) and hyperplasia and/or tumors (23 genes). When assessed in relation to specific disorders or narrower classes of disorders, there were relatively large numbers of cell adhesion molecules implicated in substance dependence (49 genes), Alzheimer's disease (42 genes), tumors (21 genes), heart disease (20 genes), bipolar disorder (18 genes), autoimmune diseases (19 genes) and diabetes mellitus (17 genes). The number of CAMs whose variants are tentatively implicated in nervous system phenotypes is larger than anticipated by chance (Figure 4). The distribution of findings in other disorders is similar to that displayed by all genes, when comparing data from either OMIM or GWA datasets.

Figure 4.
Distribution of CAMs in OMIM and GWA. OMIM, GWA and/or our in-house GWA data implicate variants in at least 167 (of the 496) CAM genes in various diseases. Data from OMIM share disease distribution patterns with those from GWA studies.

DISCUSSION

‘Cell adhesion molecules’ are increasingly recognized as ‘cell adhesion receptors’, since many of their functions are not just ‘cell glue’ but rather are more consistent with roles in cell–cell and cell–matrix interactions and in molecular recognition events that transduce signals. The computational approaches that we use here to define and characterize a universe of ‘cell adhesion’ molecules provide both expected and unexpected results. These results should be assessed in light of the strengths and limitations of the approaches used here, and the strengths and limitations of the underlying datasets employed for these analyses. We also discuss details of the strengths and limitations of these data in Supplementary Text 1.

We have attempted to provide as comprehensive a list of human CAM genes, annotations and ontology-based CAM knowledgebase as possible. However, it is clear that there will be rapid progress in the study of these molecules and of cell adhesion mechanisms. The OKCAM database provides means for integrating new data and updating knowledge, in ways that should facilitate better understanding of the global and specific CAM properties. As CAM genomic features, regulatory modes, expression patterns and disease associations become clearer, we hope that OKCAM will become even more comprehensive and useful.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR online.

FUNDING

National Institutes of Health Intramural Research Program (NIDA), NIH grants P50CA/DA84718; China Scholarship Council (C.Y.L.); China National High-tech 863 Programs (2006AA02A312, 2006AA02Z334); 973 Programs (2007CB946904). Funding for open access charge: NIDA/IRP grants P50CA/DA84718.

Conflict of interest statement. None declared.

ACKNOWLEDGEMENTS

We thank Drs T. Drgon, A. Hishimoto, Y. Zhang and X. Yu for insightful suggestions. We are grateful to Shuqi Zhao, Xizeng Mao, Zhi-Yu Peng and Qi-Yao Li for assistance with OKCAM Web Server development.

REFERENCES

1. Yamada S, Nelson WJ. Synapses: sites of cell recognition, adhesion, and functional specification. Annu. Rev. Biochem. 2007;76:267–294. [PMC free article] [PubMed]
2. Takeichi M, Abe K. Synaptic contact dynamics controlled by cadherin and catenins. Trends Cell. Biol. 2005;15:216–221. [PubMed]
3. Hishimoto A, Liu QR, Drgon T, Pletnikova O, Walther D, Zhu XG, Troncoso JC, Uhl GR. Neurexin 3 polymorphisms are associated with alcohol dependence and altered expression of specific isoforms. Hum. Mol. Genet. 2007;16:2880–2891. [PubMed]
4. Kim HG, Kishikawa S, Higgins AW, Seong IS, Donovan DJ, Shen Y, Lally E, Weiss LA, Najm J, Kutsche K, et al. Disruption of neurexin 1 associated with autism spectrum disorder. Am. J. Hum. Genet. 2008;82:199–207. [PMC free article] [PubMed]
5. Shapiro L, Love J, Colman DR. Adhesion molecules in the nervous system: structural insights into function and diversity. Annu. Rev. Neurosci. 2007;30:451–474. [PubMed]
6. Stoker AW. Protein tyrosine phosphatases and signalling. J. Endocrinol. 2005;185:19–33. [PubMed]
7. Salinas PC, Price SR. Cadherins and catenins in synapse development. Curr. Opin. Neurobiol. 2005;15:73–80. [PubMed]
8. Hirano S, Suzuki ST, Redies C. The cadherin superfamily in neural development: diversity, function and interaction with other molecules. Front. Biosci. 2003;8:d306–355. [PubMed]
9. Song JY, Ichtchenko K, Sudhof TC, Brose N. Neuroligin 1 is a postsynaptic cell-adhesion molecule of excitatory synapses. Proc. Natl Acad. Sci. USA. 1999;96:1100–1105. [PMC free article] [PubMed]
10. Dityatev A, Dityateva G, Schachner M. Synaptic strength as a function of post- versus presynaptic expression of the neural cell adhesion molecule NCAM. Neuron. 2000;26:207–217. [PubMed]
11. Butcher LM, Meaburn E, Dale PS, Sham P, Schalkwyk LC, Craig IW, Plomin R. Association analysis of mild mental impairment using DNA pooling to screen 432 brain-expressed single-nucleotide polymorphisms. Mol. Psychiatry. 2005;10:384–392. [PubMed]
12. Johnson C, Drgon T, Liu QR, Walther D, Edenberg H, Rice J, Foroud T, Uhl GR. Pooled association genome scanning for alcohol dependence using 104,268 SNPs: validation and use to identify alcoholism vulnerability loci in unrelated individuals from the collaborative study on the genetics of alcoholism. Am. J. Med. Genet. B Neuropsychiatr. Genet. 2006;141B:844–853. [PMC free article] [PubMed]
13. Liu QR, Drgon T, Johnson C, Walther D, Hess J, Uhl GR. Addiction molecular genetics: 639,401 SNP whole genome association identifies many ‘cell adhesion’ genes. Am. J. Med. Genet. B Neuropsychiatr. Genet. 2006;141:918–925. [PubMed]
14. Uhl GR, Liu QR, Drgon T, Johnson C, Walther D, Rose JE, David SP, Niaura R, Lerman C. Molecular genetics of successful smoking cessation: convergent genome-wide association study results. Arch. Gen. Psychiatry. 2008;65:683–693. [PMC free article] [PubMed]
15. Arking DE, Cutler DJ, Brune CW, Teslovich TM, West K, Ikeda M, Rea A, Guy M, Lin S, Cook EH, et al. A common genetic variant in the neurexin superfamily member CNTNAP2 increases familial risk of autism. Am. J. Hum. Genet. 2008;82:160–164. [PMC free article] [PubMed]
16. Munafo MR, Attwood AS, Flint J. Neuregulin 1 genotype and schizophrenia. Schizophr. Bull. 2008;34:9–12. [PMC free article] [PubMed]
17. Velayos-Baeza A, Toma C, da Roza S, Paracchini S, Monaco AP. Alternative splicing in the dyslexia-associated gene KIAA0319. Mamm. Genome. 2007;18:627–634. [PubMed]
18. Paracchini S, Thomas A, Castro S, Lai C, Paramasivam M, Wang Y, Keating BJ, Taylor JM, Hacking DF, Scerri T, et al. The chromosome 6p22 haplotype associated with dyslexia reduces the expression of KIAA0319, a novel gene involved in neuronal migration. Hum. Mol. Genet. 2006;15:1659–1666. [PubMed]
19. Mulder N, Apweiler R. InterPro and InterProScan: Tools for Protein Sequence Classification and Comparison. Methods Mol. Biol. 2007;396:59–70. [PubMed]
20. Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, Dicuccio M, Edgar R, Federhen S, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2008;36:D13–D21. [PMC free article] [PubMed]
21. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 2000;25:25–29. [PMC free article] [PubMed]
22. Breitkreutz BJ, Stark C, Reguly T, Boucher L, Breitkreutz A, Livstone M, Oughtred R, Lackner DH, Bahler J, Wood V, et al. The BioGRID Interaction Database: 2008 update. Nucleic Acids Res. 2008;36:D637–D640. [PMC free article] [PubMed]
23. Bader GD, Betel D, Hogue CW. BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res. 2003;31:248–250. [PMC free article] [PubMed]
24. Mishra GR, Suresh M, Kumaran K, Kannabiran N, Suresh S, Bala P, Shivakumar K, Anuradha N, Reddy R, Raghavan TM, et al. Human protein reference database–2006 update. Nucleic Acids Res. 2006;34:D411–D414. [PMC free article] [PubMed]
25. Kanehisa M, Araki M, Goto S, Hattori M, Hirakawa M, Itoh M, Katayama T, Kawashima S, Okuda S, Tokimatsu T, et al. KEGG for linking genomes to life and the environment. Nucleic Acids Res. 2008;36:D480–D484. [PMC free article] [PubMed]
26. Karolchik D, Kuhn RM, Baertsch R, Barber GP, Clawson H, Diekhans M, Giardine B, Harte RA, Hinrichs AS, Hsu F, et al. The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res. 2008;36:D773–D779. [PMC free article] [PubMed]
27. Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, Scherer SW, Lee C. Detection of large-scale variation in the human genome. Nat. Genet. 2004;36:949–951. [PubMed]
28. Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, et al. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007;449:851–861. [PMC free article] [PubMed]
29. Wingender E, Dietze P, Karas H, Knuppel R. TRANSFAC: a database on transcription factors and their DNA binding sites. Nucleic Acids Res. 1996;24:238–241. [PMC free article] [PubMed]
30. Sethupathy P, Corda B, Hatzigeorgiou AG. TarBase: A comprehensive database of experimentally supported animal microRNA targets. RNA. 2006;12:192–197. [PMC free article] [PubMed]
31. Shahi P, Loukianiouk S, Bohne-Lang A, Kenzelmann M, Kuffer S, Maertens S, Eils R, Grone HJ, Gretz N, Brors B. Argonaute–a database for gene regulation by mammalian microRNAs. Nucleic Acids Res. 2006;34:D115–D118. [PMC free article] [PubMed]
32. Krek A, Grun D, Poy MN, Wolf R, Rosenberg L, Epstein EJ, MacMenamin P, da Piedade I, Gunsalus KC, Stoffel M, et al. Combinatorial microRNA target predictions. Nat. Genet. 2005;37:495–500. [PubMed]
33. He S, Liu C, Skogerbo G, Zhao H, Wang J, Liu T, Bai B, Zhao Y, Chen R. NONCODE v2.0: decoding the non-coding. Nucleic Acids Res. 2008;36:D170–D172. [PMC free article] [PubMed]
34. Zhang Y, Li J, Kong L, Gao G, Liu QR, Wei L. NATsDB: Natural Antisense Transcripts DataBase. Nucleic Acids Res. 2007;35:D156–D161. [PMC free article] [PubMed]
35. Zhang Y, Liu XS, Liu QR, Wei L. Genome-wide in silico identification and analysis of cis natural antisense transcripts (cis-NATs) in ten species. Nucleic Acids Res. 2006;34:3465–3475. [PMC free article] [PubMed]
36. Lee TY, Huang HD, Hung JH, Huang HY, Yang YS, Wang TH. dbPTM: an information repository of protein post-translational modification. Nucleic Acids Res. 2006;34:D622–D627. [PMC free article] [PubMed]
37. Lein ES, Hawrylycz MJ, Ao N, Ayres M, Bensinger A, Bernard A, Boe AF, Boguski MS, Brockway KS, Byrnes EJ, et al. Genome-wide atlas of gene expression in the adult mouse brain. Nature. 2007;445:168–176. [PubMed]
38. Jones P, Cote RG, Martens L, Quinn AF, Taylor CF, Derache W, Hermjakob H, Apweiler R. PRIDE: a public repository of protein and peptide identifications for the proteomics community. Nucleic Acids Res. 2006;34:D659–D663. [PMC free article] [PubMed]
39. Lin BK, Clyne M, Walsh M, Gomez O, Yu W, Gwinn M, Khoury MJ. Tracking the epidemiology of human genes in the literature: the HuGE Published Literature database. Am. J. Epidemiol. 2006;164:1–4. [PubMed]
40. Smith B, Ceusters W, Klagges B, Kohler J, Kumar A, Lomax J, Mungall C, Neuhaus F, Rector AL, Rosse C. Relations in biomedical ontologies. Genome Biol. 2005;6:R46. [PMC free article] [PubMed]
41. Lipscomb CE. Medical Subject Headings (MeSH). Bull. Med. Libr. Assoc. 2000;88:265–266. [PMC free article] [PubMed]
42. Huang da W, Sherman BT, Tan Q, Kir J, Liu D, Bryant D, Guo Y, Stephens R, Baseler MW, Lane HC, et al. DAVID Bioinformatics Resources: expanded annotation database and novel algorithms to better extract biology from large gene lists. Nucleic Acids Res. 2007;35:W169–W175. [PMC free article] [PubMed]
43. Wu J, Mao X, Cai T, Luo J, Wei L. KOBAS server: a web-based platform for automated annotation and pathway identification. Nucleic Acids Res. 2006;34:W720–W724. [PMC free article] [PubMed]
44. Mao X, Cai T, Olyarchuk JG, Wei L. Automated genome annotation and pathway identification using the KEGG Orthology (KO) as a controlled vocabulary. Bioinformatics. 2005;21:3787–3793. [PubMed]
45. Dunker AK, Cortese MS, Romero P, Iakoucheva LM, Uversky VN. Flexible nets. The roles of intrinsic disorder in protein interaction networks. FEBS J. 2005;272:5129–5148. [PubMed]
46. Han JD, Bertin N, Hao T, Goldberg DS, Berriz GF, Zhang LV, Dupuy D, Walhout AJ, Cusick ME, Roth FP, et al. Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature. 2004;430:88–93. [PubMed]
47. Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero MH, Carson AR, Chen W, et al. Global variation in copy number in the human genome. Nature. 2006;444:444–454. [PMC free article] [PubMed]
48. Jakobsson M, Scholz SW, Scheet P, Gibbs JR, VanLiere JM, Fung HC, Szpiech ZA, Degnan JH, Wang K, Guerreiro R, et al. Genotype, haplotype and copy-number variation in worldwide human populations. Nature. 2008;451:998–1003. [PubMed]

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press