Environ Health Perspect. 2004 March; 112(4): 495–505.
PMCID: PMC1241904
Research Article
Database development in toxicogenomics: issues and efforts.
William B Mattes, Syril D Pettit, Susanna-Assunta Sansone, Pierre R Bushel, and Michael D Waters
Pfizer Inc, Groton, Connecticut, USA. wmattes@genelogic.com
Abstract
The marriage of toxicology and genomics has created not only opportunities but also novel informatics challenges. As with the larger field of gene expression analysis, toxicogenomics faces the problems of probe annotation and data comparison across different array platforms. Toxicogenomics studies are generally built on standard toxicology studies generating biological end point data, and as such, one goal of toxicogenomics is to detect relationships between changes in gene expression and in those biological parameters. These challenges are best addressed through data collection into a well-designed toxicogenomics database. A successful publicly accessible toxicogenomics database will serve as a repository for data sharing and as a resource for analysis, data mining, and discussion. It will offer a vehicle for harmonizing nomenclature and analytical approaches and serve as a reference for regulatory organizations to evaluate toxicogenomics data submitted as part of registrations. Such a database would capture the experimental context of in vivo studies with great fidelity such that the dynamics of the dose response could be probed statistically with confidence. This review presents the collaborative efforts between the European Molecular Biology Laboratory-European Bioinformatics Institute ArrayExpress, the International Life Sciences Institute Health and Environmental Sciences Institute, and the National Institute of Environmental Health Sciences National Center for Toxicogenomics Chemical Effects in Biological Systems knowledge base. The goal of this collaboration is to establish public infrastructure on an international scale and examine other developments aimed at establishing toxicogenomics databases.
In this review we discuss several issues common to such databases: the requirement for identifying minimal descriptors to represent the experiment, the demand for standardizing data storage and exchange formats, the challenge of creating standardized nomenclature and ontologies to describe biological data, the technical problems involved in data upload, the necessity of defining parameters that assess and record data quality, and the development of standardized analytical approaches.