Cancer Inform. 2008; 6: 47–50.
Published online Mar 31, 2008.
PMCID: PMC2623291

MCAM: A Database to Accelerate the Identification of Functional Cell Adhesion Molecules

Abstract

In the post-genomic era, computational identification of cell adhesion molecules (CAMs) has become important in defining new targets for the diagnosis and treatment of various diseases, including cancer. The lack of a comprehensive CAM-specific database restricts our ability to identify and characterize novel CAMs. Therefore, we developed a comprehensive mammalian cell adhesion molecule (MCAM) database. The current version is an interactive Web-based database that provides the resources needed to search mouse-, human- and rat-specific CAMs, their sequence information, and characteristics such as gene functions and virtual gene expression patterns in normal and tumor tissues as well as cell lines. Moreover, the MCAM database can be used for various bioinformatics and biological analyses, including identifying CAMs involved in cell-cell interactions and in the homing of lymphocytes, hematopoietic stem cells and malignant cells to specific organs using data from high-throughput experiments. Furthermore, the database can be used for training and testing existing transmembrane (TM) topology prediction methods specifically for CAM sequences. The database is freely available online at http://app1.unmc.edu/mcam.

Keywords: cell adhesion molecules, cancer, gene ontology, virtual gene expression, database, organ-specific homing, classification of cell adhesion molecules

Introduction

Cell adhesion molecules (CAMs) are transmembrane (TM) glycoprotein receptors that mediate selective cell-cell or cell-matrix interactions. By spanning the membrane, these molecules function as links between the intracellular and extracellular environments of cells.1 In addition to adherence, the direct cell-cell or cell-matrix interactions mediated by CAMs play vital roles in various cellular processes including embryogenesis, hematopoiesis, angiogenesis, cellular growth and differentiation, migration, invasion, tumorigenesis and metastasis.1–3

Current biochemical and cell biology techniques have helped in the identification and characterization of several CAMs involved in various functions. However, in the post-genomic era, accelerating the identification process requires a combination of high-throughput experimental and computational biology approaches. Unfortunately, the current resources for CAMs are dispersed across the Web, and retrieving all relevant information for each CAM individually from such disparate resources is highly inefficient and labor intensive. Therefore, a consolidated database for CAMs that provides sequences and information including gene expression profiles will facilitate research on CAMs. To the best of our knowledge, there is no such CAM-specific database available with cross-references to other sources, including virtual gene expression databases. This motivated us to curate a consolidated record of available CAM sequences, including their annotated information.

Design of the Database

Data collection

The MCAM database is a collection of functionally active CAMs curated from two different sources, the GO database and the Entrez Gene database. Construction of the database is shown in Figure 1. We searched the GO database at several points in time (releases dated 2003-10-01 to 2007-01-01) with keywords appropriate for CAMs, which were selected from the list of biological processes and molecular functions in the GO database. GO entries obtained from these searches were downloaded and parsed using custom C++ scripts (available online) and used to populate the database. The extracted gene symbols were used as queries for Batch Gene Finder (http://cgap.nci.nih.gov/Genes/BatchGeneFinder) to obtain a list of GenBank4 accession numbers for the CAM entries. The accession numbers were used to obtain sequences from NCBI.
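
The keyword-based selection of GO entries and extraction of gene symbols described above can be sketched as follows. This is a minimal illustration in Python, not the authors' C++ scripts: the keyword list, the simplified three-column tab-delimited layout, and the sample rows are all assumptions made for the example.

```python
import csv
import io

# Hypothetical keywords; the article does not list the exact search terms used.
CAM_KEYWORDS = ("cell adhesion", "cell-matrix adhesion")


def extract_cam_symbols(tsv_text, keywords=CAM_KEYWORDS):
    """Return sorted, unique gene symbols whose GO term name contains a keyword."""
    symbols = set()
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        term = row["go_term"].lower()
        if any(keyword in term for keyword in keywords):
            symbols.add(row["gene_symbol"])
    return sorted(symbols)


# Illustrative rows in an assumed, simplified layout (not a real GO download).
SAMPLE = (
    "gene_symbol\tgo_id\tgo_term\n"
    "Lsamp\tGO:0007155\tcell adhesion\n"
    "Cdh1\tGO:0098609\tcell-cell adhesion\n"
    "Actb\tGO:0005200\tstructural constituent of cytoskeleton\n"
    "Lsamp\tGO:0005886\tplasma membrane\n"
)

if __name__ == "__main__":
    print(extract_cam_symbols(SAMPLE))
```

The resulting symbol list is the kind of input that would then be submitted to a batch lookup tool to retrieve accession numbers.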

Figure 1
A schematic representation showing the construction of the MCAM database.

In addition to data from the GO database, the NCBI Entrez Gene database was searched using keywords related to CAMs. Sequences from the RefSeq database5 were obtained through the links from the Entrez Gene database entries. Similarly, entries from UniGene6 and Online Mendelian Inheritance in Man™ (OMIM) (Jan 2007)7 were downloaded by following the respective links through the Entrez database. Protein sequences from the Entrez,8 PIR (release 80)9 and UniProtKB/Swiss-Prot10 databases were also downloaded. The records for each entry were parsed and imported into Microsoft Excel using custom Visual Basic scripts (available online) embedded in Microsoft Excel.

For every CAM entry, the hyperlinks to GeneCards,11 GeneAtlas,12 CGAP — Gene Finder Tool13 and UniGene expression14 were also provided.

Using the gene symbols from mouse as queries, the human and rat CAMs were collected using Batch Gene Finder from CGAP and GeneInfoViz,15 respectively.

Evaluation of data and classification of CAMs

The annotations of the UniProtKB/Swiss-Prot entries, such as ontologies, keywords and the feature table viewer, were evaluated manually for the presence of terms related to CAMs. Entries that lacked CAM-related annotations in UniProtKB/Swiss-Prot were validated manually as CAMs using PubMed literature searches. Entries not validated as CAMs were removed from the database. Furthermore, each CAM was classified as an integrin, immunoglobulin-like, cadherin or selectin family member using the UniProtKB/Swiss-Prot annotations and literature searches.
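
A first-pass version of this classification could be automated with simple keyword matching before manual review. The sketch below is only an illustration in Python: the keyword table and the `classify_cam` helper are hypothetical, and the curation described above was manual and literature-based rather than rule-based.

```python
# Hypothetical keyword table mapping each family to annotation terms.
FAMILY_KEYWORDS = {
    "integrin": ("integrin",),
    "immunoglobulin-like": ("immunoglobulin",),
    "cadherin": ("cadherin",),
    "selectin": ("selectin",),
}


def classify_cam(annotation):
    """Return the single matching family, or 'unclassified' if none or several match."""
    text = annotation.lower()
    hits = [
        family
        for family, keywords in FAMILY_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]
    # Ambiguous or empty matches are left for manual literature review.
    return hits[0] if len(hits) == 1 else "unclassified"


if __name__ == "__main__":
    for note in ("Integrin beta-1 precursor", "Protocadherin 7", "Membrane protein"):
        print(note, "->", classify_cam(note))
```

Returning "unclassified" for ambiguous annotations keeps the automated pass conservative, so that only unambiguous entries bypass the manual check.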

Implementation

The data from Microsoft Excel were imported into a Microsoft Access database, and the Web interface was implemented using ColdFusion MX 7 and HTML 4.0. The database contains 22 tables that hold data from the different sources for mouse, human and rat CAMs (available online).

Contents and Web Interface

MCAM contents

The latest release (Version 3.0, dated 24 January 2007) of the MCAM database includes information for CAMs from 298 GO database entries. The number of entries included in the database corresponding to GO terms from the various database sources is listed in Table 1. The total number of entries included 863 from GenBank, 714 from GenPept, 874 from UniGene, 639 from UniProtKB/Swiss-Prot, and 693 from PIR. The number of entries curated per species is summarized in Table 2. The numbers of entries differ between sources because some data sources, such as PIR, contained redundant entries. CAMs have also been classified into protein superfamilies, and the number of entries in each class is shown in Table 3.

Table 1
Number of entries from different database sources associated with GO terms.
Table 2
Number of entries from different database sources for mouse, human and rat.
Table 3
Superfamily classification of cell adhesion molecules and the number of entries in each class. The number of proteins whose classification is not known is also shown.

Web interface

The contents of the MCAM database can be searched using gene symbol, gene name or accession number. A search using gene name can be performed with either full-text or partial-text queries. The text queries are case insensitive, and searches using accession numbers cover GenBank, GenPept, UniGene, UniProtKB/Swiss-Prot, PIR and OMIM.

For example, a search for the limbic system-associated membrane protein can be conducted using the gene symbol “lsamp” (case insensitive) or the gene name (either partial or full). The results include the gene symbol, gene name and synonymous gene names; the nucleotide (GenBank), protein (GenPept), SPRT (UniProtKB/Swiss-Prot), PIR, OMIM and UniGene accession numbers; and sequence data. Hyperlinks to NCBI GenBank, GenPept, OMIM, UniGene and UniProtKB/Swiss-Prot database entries are provided to retrieve further information about each CAM using the accession number as the query. Hyperlinks to GeneAtlas, GeneCards and NCBI Homologue database entries are provided with the gene symbol as the query. A literature search link to PubMed is provided using the gene symbol as a keyword. Virtual expression data for normal and cancer tissues and cell lines are provided through the Cancer Genome Anatomy Project (CGAP), and data for normal adult and embryonic tissues are provided through UniGene Expression hyperlinks. Functions of each CAM are provided through the GO database process and function terms.
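
The search behavior described above (case-insensitive matching on gene symbol and accession number, partial matching on gene name) can be sketched with an in-memory SQLite table. This is an illustration only: the actual site uses Microsoft Access and ColdFusion, and the table name, column names and placeholder accession values below are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table with two illustrative rows (placeholder accessions)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cam (gene_symbol TEXT, gene_name TEXT, genbank_acc TEXT)"
    )
    conn.executemany(
        "INSERT INTO cam VALUES (?, ?, ?)",
        [
            ("Lsamp", "limbic system-associated membrane protein", "ACC0001"),
            ("Cdh1", "cadherin 1", "ACC0002"),
        ],
    )
    return conn


def search_cams(conn, query):
    """Match symbol or accession exactly (ignoring case) or gene name partially."""
    # SQLite's LIKE is case-insensitive for ASCII text by default.
    # Note: '%' or '_' inside the query are treated as wildcards in this sketch.
    rows = conn.execute(
        "SELECT gene_symbol FROM cam "
        "WHERE gene_symbol = ? COLLATE NOCASE "
        "OR gene_name LIKE ? "
        "OR genbank_acc = ? COLLATE NOCASE "
        "ORDER BY gene_symbol",
        (query, f"%{query}%", query),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    for q in ("lsamp", "LIMBIC", "acc0002"):
        print(q, "->", search_cams(conn, q))
```

Passing the query as a bound parameter, rather than formatting it into the SQL string, keeps the sketch safe against malformed input.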

Discussion and Future Updates

The MCAM database is a Web-based, consolidated and searchable database of mammalian-specific CAMs. It can be used for various bioinformatics and biological analyses, including identifying CAMs involved in cell-cell interactions and in the homing of lymphocytes, hematopoietic stem cells and malignant cells to specific organs. It serves the research community by cataloguing information on CAMs available from many different databases.

With the growing amount of data from high-throughput technologies such as phage display peptide libraries, our online MCAM database is critical for the identification of novel CAMs that are responsible for organ-specific homing of tumor cells. For example, local Basic Local Alignment Search Tool (BLAST)16 searches can be performed using any short oligonucleotides or peptides as queries, with the CAM sequences available from the Download page as the input database. Once the CAMs are identified, information including the expression and functional profiles of the proteins can be searched using the online MCAM database. We have identified 25 novel and known tumor-specific CAMs by BLAST searches using the sequence data available from the MCAM database and seven-amino-acid peptides as queries.17
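
A local search of this kind could be set up roughly as follows. The sketch assumes the current NCBI BLAST+ command-line tools (`makeblastdb` and `blastp` with the `blastp-short` task), which may differ from the BLAST version the authors ran; the file names, database name and E-value threshold are hypothetical, and the article does not report the parameters used. The function only assembles the command lines and does not execute them.

```python
def blast_short_peptide_commands(cam_fasta, query_fasta, db_name="mcam_db", evalue=1000):
    """Build BLAST+ command lines for searching short peptides against CAM sequences.

    All arguments are hypothetical file or database names supplied by the user.
    """
    # Step 1: index the downloaded CAM protein sequences as a local BLAST database.
    make_db = ["makeblastdb", "-in", cam_fasta, "-dbtype", "prot", "-out", db_name]
    # Step 2: search with settings suited to very short queries; a permissive
    # E-value is assumed here because short peptides rarely score significantly.
    search = [
        "blastp", "-task", "blastp-short",
        "-query", query_fasta,
        "-db", db_name,
        "-evalue", str(evalue),
        "-outfmt", "6",
        "-out", "hits.tsv",
    ]
    return make_db, search


if __name__ == "__main__":
    for command in blast_short_peptide_commands("mcam_proteins.fasta", "peptides.fasta"):
        print(" ".join(command))
```

Each returned list can be passed to `subprocess.run` once BLAST+ is installed; the tabular output in `hits.tsv` then lists candidate CAMs to look up in the online database.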

The MCAM database may also serve as a gene list for designing CAM-specific oligonucleotide or cDNA probes for microarray experiments that examine the expression profiles of CAMs in various disease processes. Furthermore, the evolutionary conservation of each CAM gene across the mouse, human and rat genomes can be studied using the MCAM database. Finally, the MCAM database can serve as a test or training dataset for identifying TM proteins, especially CAMs. This database therefore facilitates nucleotide and protein sequence analysis of CAMs, assisting in CAM-specific genomics and proteomics experiments.

Acknowledgments

This work was supported in part by the Molecular Therapeutics Program, Nebraska Department of Health and Human Services, and by grant CA72781 (R.K.S.) from the National Cancer Institute, National Institutes of Health. We thank Dr. Etsuko Moriyama of the University of Nebraska, Lincoln for her critical evaluation of the website and review of the manuscript. We also thank Eric Haas, Michelle Varney, Thomas J. Wilson and Wade Junker for their critical review of the manuscript, and Atul Rayamajhi and Steve V. Pera for their help in updating the website.

References

1. Lukas Z, Dvorak K. Adhesion molecules in biology and oncology. Acta Vet. Brno. 2004;73:93–104.
2. Stitziel NO, Mar BG, Liang J, Westbrook CA. Membrane-associated and secreted genes in breast cancer. Cancer Res. 2004;64:8682–7. [PubMed]
3. Zhou J, Sargiannidou I, Tuszynski GP. The role of adhesive proteins in the hematogenous spread of cancer. In Vivo. 2000;14:199–208. [PubMed]
4. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank. Nucl. Acids Res. 2006;34:D16–D20. [PMC free article] [PubMed]
5. Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucl. Acids Res. 2005;33:D501–D504. [PMC free article] [PubMed]
6. Wheeler DL, et al. Database resources of the National Center for Biotechnology. Nucl. Acids Res. 2003;31:28–33. [PMC free article] [PubMed]
7. Hamosh A, et al. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucl. Acids Res. 2002;30:52–5. [PMC free article] [PubMed]
8. Wheeler DL, et al. Database resources of the National Center for Biotechnology. Nucl. Acids Res. 2003;31:28–33. [PMC free article] [PubMed]
9. Barker WC, et al. The PIR-International Protein Sequence Database. Nucl. Acids Res. 1999;27:39–43. [PMC free article] [PubMed]
10. Boeckmann B, et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucl. Acids Res. 2003;31:365–70. [PMC free article] [PubMed]
11. Rebhan M, Chalifa-Caspi V, Prilusky J, Lancet D. GeneCards: encyclopedia for genes, proteins and diseases. Weizmann Institute of Science, Bioinformatics Unit and Genome Center (Rehovot, Israel); 1997.
12. Carson JP, Thaller C, Eichele G. A transcriptome atlas of the mouse brain at cellular resolution. Current Opinion in Neurobiology. 2002;12:562–5. [PubMed]
13. Strausberg RL, Buetow KH, Emmert-Buck MR, Klausner RD. The Cancer Genome Anatomy Project: building an annotated gene index. Trends in Genetics. 2000;16:103–6. [PubMed]
14. Wheeler DL, et al. Database resources of the National Center for Biotechnology. Nucl. Acids Res. 2003;31:28–33. [PMC free article] [PubMed]
15. Zhou M, Cui Y. GeneInfoViz: constructing and visualizing gene relation networks. In Silico Biology. 2004;4:323–33. [PubMed]
16. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic Local Alignment Search Tool. Journal of Molecular Biology. 1990;215:403–10. [PubMed]
17. Sadanandam A, Varney ML, Kanarsky L, Ali H, Mosley RL, Singh RK. Identification of functional adhesion molecules with potential role in metastasis by a combination of in vivo phage display and in silico analysis. OMICS. 2007;11:41–57. [PubMed]

Articles from Cancer Informatics are provided here courtesy of Libertas Academica
