Copyright © 2006 Higgs et al; licensee BioMed Central Ltd.

An online database for brain disease research

1 Elashoff Consulting LLC, Germantown, MD 20876, USA
2 Stanley Medical Research Institute, Bethesda, MD 20814-2142, USA

Corresponding author. # Contributed equally.

Brandon W Higgs: brandon/at/elashoffconsulting.com; Michael Elashoff: michael/at/elashoffconsulting.com; Sam Richman: sam/at/elashoffconsulting.com; Beata Barci: barcib/at/stanleyresearch.org

Received February 2, 2006; Accepted April 4, 2006.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background
The Stanley Medical Research Institute online genomics database (SMRIDB) is a comprehensive web-based system for understanding the genetic effects of human brain disease (i.e. bipolar disorder, schizophrenia, and depression). This database contains fully annotated clinical metadata and gene expression patterns generated within 12 controlled studies across 6 different microarray platforms.

Description
A thorough collection of gene expression summaries is provided, inclusive of patient demographics, disease subclasses, regulated biological pathways, and functional classifications.

Conclusion
The combination of database content, structure, and query speed offers researchers an efficient tool for data mining of brain disease, complete with information such as cross-platform comparisons, biomarker elucidation for target discovery, and lifestyle/demographic associations with brain diseases.

Background
Brain disease studies based on genome-wide microarray measurements are traditionally challenging compared with other disease areas.
The biological results are often hindered by the statistical problems of small sample sizes, small effect sizes, and patient-to-patient variability [1-3]. In addition, clinical information for patients is typically sparse, so unrecorded clinical covariates, rather than the primary disease, can confound or obscure many of the gene expression patterns and trends. Corrections using such clinical information can greatly improve inference when determining markers for disease and when elucidating patterns within the disease.

Technical problems in microarray data can also affect the analyses. Meaningful results are often limited by the difficulty of platform-to-platform comparisons and by the overall organization and presentation of large data sets and results. Studies conducted on disparate platforms are inherently more difficult to analyze than those conducted on the same platform [4]. Cross-platform comparisons are challenging because of differences in, among other things, scaling and sensitivity, which reduce reproducibility [5-8]. Large data sets and comprehensive results summaries present another challenge: analytical and bioinformatics information (e.g. expression profiles, gene summary information, pathway diagrams, and fold change comparisons) must be organized into a user-friendly format to facilitate efficient data mining. A relational web-based tool that logically combines all of these factors can enhance researchers' ability to determine the underlying genomic patterns in brain disease.

The SMRIDB is an online data warehouse and analytical system designed to aid researchers in understanding the biological associations both between and within the brain disorders of schizophrenia, bipolar disorder, and major depression.
This open source database combines genomic patterns of brain disease with patient clinical metadata in a user-friendly query interface, enabling efficient data mining for biomarker discovery and for elucidating the biological mechanisms of brain disease. The metadata includes a full summary of clinical history for each patient, with hyperlinks to disease-level information, so that demographic- and lifestyle-associated effects can be determined as they relate to brain disorders. The genomic data has been compiled from 12 separate labs (identified as studies), each data set generated from brain tissue isolated from two controlled populations of 165 patients diagnosed with one of the three brain disorders (plus unaffected control brain tissue). This genomic data has been generated across 6 separate human array platforms (Affymetrix hgu133a, hgu133plus, and hgu95av2; Agilent; Codelink; and a custom cDNA array), providing patterns, trends, and analytical inferences that are not limited by platform dependencies.

Construction and content

Bioinformatics mappings
NCBI's Database for Annotation, Visualization and Integrated Discovery (DAVID 2.0) was used as the standard source for gene annotation information [9]. The primary fields extracted from DAVID are LocusLink, gene symbol, and gene summary. Additional annotations include gene product mappings to the Kyoto Encyclopedia of Genes and Genomes (KEGG) for pathways and to the Gene Ontology Consortium (GO) for GO terms/classes. For Affymetrix arrays, queries were based on the Affymetrix probe ID (AFFYID). For other arrays, the GenBank accessions (GENBANK) were used.

Individual study-level analysis
For each of the individual studies, a series of analyses was performed. Each array (representing a single patient) was subjected to a quality control (QC) analysis of chip-level parameters (e.g. scaling factor, gene calls, control gene ratios, average correlation) with respect to the reference distribution for those parameters across the arrays. This QC analysis is represented with both graphical representations (e.g. heatmaps, scatter plots, and histograms (Figure 1))
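The idea of checking one chip-level parameter against its reference distribution across arrays can be sketched as follows. This is an illustrative sketch only: the article does not state which decision rule was used, so the modified z-score (median and median absolute deviation), the 3.5 cutoff, the function name `flag_outlier_arrays`, and the sample values are all assumptions.

```python
from statistics import median


def flag_outlier_arrays(values, threshold=3.5):
    """Return IDs of arrays whose QC parameter lies far from the reference distribution.

    values: dict mapping array ID -> one chip-level parameter (e.g. scaling factor).
    The modified z-score rule is an assumption, not the documented SMRIDB method.
    """
    med = median(values.values())
    mad = median(abs(v - med) for v in values.values())
    if mad == 0:
        return []  # no spread in the reference distribution, nothing to flag
    return [
        array_id
        for array_id, v in values.items()
        if abs(0.6745 * (v - med) / mad) > threshold
    ]


# Hypothetical scaling factors for five arrays from one study.
scaling_factors = {"A01": 1.1, "A02": 0.9, "A03": 1.0, "A04": 1.2, "A05": 4.8}
flagged = flag_outlier_arrays(scaling_factors)
```

A median-based rule is used here because a single extreme array would otherwise inflate a mean/standard-deviation reference and mask itself; the same function could be applied to each QC parameter in turn.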
The three disease classes were analyzed to provide a list of discriminating genes (adjusted for the demographic terms that met the criteria of significance for that gene) or markers indicative of disease (Figure 3)
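Adjusting a disease effect for a demographic term can be illustrated with a small linear-model sketch. It assumes a model of the form expression ~ disease + covariate with a single covariate (for example age); the article does not give the model, the significance criteria, or the software used, and the function names below are hypothetical.

```python
def _residuals(y, x):
    """Residuals from a simple least-squares regression of y on x (with intercept)."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    return [yi - mean_y - slope * (xi - mean_x) for xi, yi in zip(x, y)]


def adjusted_disease_effect(expression, is_case, covariate):
    """Disease coefficient from the model expression ~ disease + covariate.

    Uses the Frisch-Waugh-Lovell identity: remove the covariate from both the
    expression values and the disease indicator, then regress residual on residual.
    """
    res_expr = _residuals(expression, covariate)
    res_case = _residuals([float(c) for c in is_case], covariate)
    return sum(d * e for d, e in zip(res_case, res_expr)) / sum(d * d for d in res_case)


# Hypothetical data: expression depends on both disease status and age.
ages = [30, 40, 50, 60, 35, 45, 55, 65]
is_case = [0, 0, 0, 0, 1, 1, 1, 1]
expression = [2.0 + 0.5 * d + 0.1 * a for d, a in zip(is_case, ages)]
effect = adjusted_disease_effect(expression, is_case, ages)
```

In this constructed example the cases are slightly older than the controls, so an unadjusted difference in group means would overstate the disease effect, while the adjusted coefficient recovers the value built into the data.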
Pathway/GO details page
The pathway/GO details page contains a comprehensive summary of the gene expression profiles for each gene that is mapped to the associated pathway or GO term within each separate disease class. A confidence interval boxplot is provided within each disease comparison, inclusive of every gene mapped to that pathway or GO term queried in the study (Figure 7),
Gene details page
For every probe across the 6 array platforms, primary annotations were determined such that each probe is mapped to either a gene name or an EST identifier (refer to the Bioinformatics mappings section for mapping criteria). Each gene summary page therefore contains probe-level information for all 6 array platforms and 12 studies within the database. In addition to general bioinformatics annotations (e.g. biological summary, LocusLink ID, PubMed search link, and gene symbol) and pathway/GO mappings (associations with the gene that link to pathway/GO-centric pages), this page contains gene expression summaries for every probe that maps to this gene across all studies (Figure 8).
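The mapping rule described in the Bioinformatics mappings section (AFFYID for Affymetrix arrays, GENBANK for the others) and the per-gene grouping of probes can be sketched as below. The record layout, the platform labels, the identifiers, and the annotation table are hypothetical placeholders rather than the SMRIDB schema.

```python
from collections import defaultdict

# Affymetrix platforms named in the article; the string labels are an assumed encoding.
AFFYMETRIX_PLATFORMS = {"hgu133a", "hgu133plus", "hgu95av2"}


def annotation_key(probe):
    """Choose the lookup key: AFFYID for Affymetrix arrays, GENBANK otherwise."""
    if probe["platform"] in AFFYMETRIX_PLATFORMS:
        return ("AFFYID", probe["affy_id"])
    return ("GENBANK", probe["genbank"])


def group_probes_by_gene(probes, annotation):
    """Collect probe names under the gene each one maps to; unmapped probes are skipped."""
    by_gene = defaultdict(list)
    for probe in probes:
        gene = annotation.get(annotation_key(probe))
        if gene is not None:
            by_gene[gene].append(probe["probe_name"])
    return dict(by_gene)


# Placeholder identifiers, not real probe IDs or accessions.
probes = [
    {"probe_name": "p1", "platform": "hgu133a", "affy_id": "AFFY_0001", "genbank": None},
    {"probe_name": "p2", "platform": "Agilent", "affy_id": None, "genbank": "GB_0001"},
    {"probe_name": "p3", "platform": "Codelink", "affy_id": None, "genbank": "GB_9999"},
]
annotation = {("AFFYID", "AFFY_0001"): "GENE1", ("GENBANK", "GB_0001"): "GENE1"}
grouped = group_probes_by_gene(probes, annotation)
```

Keying the annotation table on the pair (identifier type, identifier) keeps Affymetrix probe IDs and GenBank accessions in separate namespaces, so a gene page can list probes from every platform under one gene.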
Cross-platform analysis
To date, making comparisons across disparate gene expression platforms has been very difficult [5-8]. Chip manufacturing differences such as probe selection, processing protocols, and spot normalization algorithms contribute to variability that can distort mRNA transcript abundance measurements and introduce inconsistencies that hinder cross-platform comparisons. Some success has been demonstrated by reducing the problem to the most consistent sequence-verified gene annotations between two platforms (e.g. UniGene cluster membership) and examining correlations, ratio values, or gene calls, although the sensitivity and global statistical inference of such approaches remain a challenge [7,10-12]. The cross-platform comparisons within the SMRIDB are based on scaled representations of the individual study-level analyses, combined across studies to extract biological patterns and relationships. These cross-platform results are provided for both the gene level (Figure 10)
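One way to picture "scaled representations" of study-level results is to put each study's per-gene fold changes on a common scale before combining them. The sketch below divides each study's log fold changes by that study's standard deviation and averages the scaled values per gene. This particular scaling and averaging scheme, the function names, and the sample numbers are assumptions for illustration; the article does not specify the method actually used.

```python
from statistics import mean, pstdev


def scale_study(fold_changes):
    """Divide one study's log fold changes by their standard deviation (sign is kept)."""
    sd = pstdev(fold_changes.values())
    if sd == 0:
        raise ValueError("study has no variation in fold change")
    return {gene: fc / sd for gene, fc in fold_changes.items()}


def combine_studies(studies):
    """Average the scaled scores per gene over the studies that measured that gene."""
    scaled = [scale_study(s) for s in studies]
    genes = set().union(*(s.keys() for s in scaled))
    return {g: mean([s[g] for s in scaled if g in s]) for g in genes}


# Hypothetical log fold changes from two platforms with very different dynamic ranges.
study_a = {"G1": 1.0, "G2": 0.0, "G3": -1.0}
study_b = {"G1": 10.0, "G2": 0.0, "G3": -10.0, "G4": 0.0}
combined = combine_studies([study_a, study_b])
```

Without the scaling step, the study with the larger dynamic range would dominate a plain average; after scaling, each study contributes on a comparable footing and genes missing from one platform are still reported from the others.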
Utility and discussion
The user interface was constructed to enable intuitive navigation and efficient data mining. The main site contains the primary index for the database's 4 general areas: Patients, Studies, Genes, and Analysis. Each is a gateway to a distinct focus area, and the areas are linked to one another, for example clinical information to genomics results and individual study content to cross-platform combined analyses. The Genes tab contains an open text search engine (with partial matches) that enables queries by gene, LocusLink, or pathway for any single or combined study results.

The intended users of the database include any genomics researchers facing the persistent challenges of sensitivity in biomarker discovery and of cross-platform microarray comparisons. However, the content within the SMRIDB is primarily designed for biologists, clinical researchers, bioinformaticians, and scientists in the field of brain disease. The size and scope of the SMRIDB make it a unique contribution to genomics-based brain disease research. With combined gene expression profile summaries across 12 studies and 6 platforms, there is greater confidence in scientific findings such as biomarkers for disease, biological functional roles, and regulated pathways than in results obtained from any one individual study.

Conclusion
The SMRIDB is a comprehensive data mining tool that enables researchers to elucidate the biological mechanisms of bipolar disorder, schizophrenia, and depression. A diverse patient population combines with data generated across 6 microarray platforms and 12 studies to provide robust results that enhance the understanding of brain disease.

Availability and requirements
The SMRIDB can be accessed at https://www.stanleygenomics.org. All users must register (name and email address) to obtain a username and password.

Authors' contributions
BWH and ME conducted the data analysis and were involved in drafting the manuscript.
SR developed the web services and database backend. BB collected and catalogued the clinical information and samples. All authors read and approved the final manuscript.

Acknowledgements
Postmortem brain tissue was donated by The Stanley Medical Research Institute's brain collection courtesy of Drs. Michael B. Knable, E. Fuller Torrey, Maree J. Webster, Serge Weis, and Robert H. Yolken.