Nucleic Acids Res. Jan 1, 2002; 30(1): 163–165.

PharmGKB: the Pharmacogenetics Knowledge Base


The Pharmacogenetics Knowledge Base (PharmGKB; http://www.pharmgkb.org/) contains genomic, phenotype and clinical information collected from ongoing pharmacogenetic studies. Tools to browse, query, download, submit, edit and process the information are available to registered research network members. A subset of the tools is publicly available. PharmGKB currently contains over 150 genes under study, 14 Coriell populations and a large ontology of pharmacogenetics concepts. The pharmacogenetic concepts and the experimental data are interconnected by a set of relations to form a knowledge base of information for pharmacogenetic researchers. The information in PharmGKB, and its associated tools for processing that information, are tailored for leading-edge pharmacogenetics research. The PharmGKB project was initiated in April 2000 and the first version of the knowledge base went online in February 2001.


Variability in response to pharmaceutical drugs poses a significant problem for physicians, patients and pharmaceutical companies (1). The variation has several known causes including environmental factors, ongoing medical conditions and inherited genetic differences. The field of pharmacogenetics (in this paper, the terms ‘pharmacogenetics’ and ‘pharmacogenomics’ are used interchangeably) studies response variability related to inherited genetic differences. The National Institutes of Health (NIH) is sponsoring the Pharmacogenetics Research Network (http://www.nigms.nih.gov/pharmacogenetics/) to accumulate, store and process pharmacogenetics-related results. The Pharmacogenetics Knowledge Base (PharmGKB; http://www.pharmgkb.org/) (2) provides a central repository for data collected by laboratories in the research network and provides tools for submitting, editing, viewing and processing that information. In the future, we anticipate having to restrict access to some data (e.g. clinical records), but at the time of writing this paper all of the submitted data is available to the public.

PharmGKB is organized as a knowledge base (3) with an ontology (4) that contains a network of genetic, clinical and cellular phenotype knowledge, interconnected by relations and organized by levels of abstraction. Experimental data and results are added to the knowledge base as instances of terms in the ontology (5). This organization will allow more complex data processing and inferencing than traditional databases allow.
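The idea of storing experimental data as instances of ontology terms arranged by level of abstraction can be illustrated with a minimal sketch. This is not the PharmGKB implementation: the `Term` class, the helper functions, the three-term hierarchy and the gene names are all hypothetical, chosen only to show how a query posed at a general term can pick up data recorded under more specific terms.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A hypothetical ontology term; `parent` points one level up in abstraction."""
    name: str
    parent: "Term | None" = None
    instances: list = field(default_factory=list)

    def ancestors(self):
        """Yield this term's ancestors, from most specific to most general."""
        term = self.parent
        while term is not None:
            yield term
            term = term.parent


def is_a(term, candidate):
    """True if `term` is `candidate` itself or a descendant of it."""
    return term is candidate or any(a is candidate for a in term.ancestors())


def instances_of(terms, candidate):
    """Collect data attached to `candidate` or to any term below it."""
    found = []
    for term in terms:
        if is_a(term, candidate):
            found.extend(term.instances)
    return found


# Hypothetical slice of an ontology: two kinds of variant under one general term.
variant = Term("Variant")
snp = Term("SNP", parent=variant)
insertion = Term("Insertion", parent=variant)
terms = [variant, snp, insertion]

# Experimental results are stored as instances of ontology terms (made-up data).
snp.instances.append({"gene": "GENE_A", "position": 1234, "alleles": "A/G"})
insertion.instances.append({"gene": "GENE_B", "position": 88, "inserted": "TT"})

all_variants = instances_of(terms, variant)  # includes the SNP and the insertion
only_snps = instances_of(terms, snp)         # includes the SNP only
```

A flat table of SNPs could answer the second query but not the first without knowing in advance which tables hold variants; the hierarchy carries that knowledge in the data model itself.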


Much of the current research in the Pharmacogenetics Research Network focuses on identifying single nucleotide polymorphisms (SNPs) in people. PharmGKB contains SNP information and provides several tools, described below, both to submit information to the dbSNP database (6) and to access information in it. PharmGKB also accepts other variants such as insertions, deletions and repeats. The knowledge base content pertaining to genetics includes genes, proteins, reference sequences, regions of interest, haplotypes, coordinate systems and populations of individuals.

PharmGKB also models information about cellular phenotypes, including pharmacokinetics and enzyme kinetics information. PharmGKB will also include clinical content, such as descriptions of drugs, information about clinical studies, drug regimens, observations of drug kinetics and various clinical parameters specific to particular studies. The most commonly reported data at this time is clinical data relating to drug metabolism.

Currently, researchers can submit 38 different types of data, covering genomic information, drugs, diseases, populations and so on. Submissions are made individually through web forms or in uploaded files containing PharmGKB-defined XML elements. Once a submission has been validated and accepted, it is merged into the existing body of PharmGKB knowledge, where it is made accessible. This process is shown in Figure 1.

Figure 1
The flow of data into and out of PharmGKB.
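The validate-then-merge flow for uploaded XML files can be sketched as follows. The element names (`submission`, `gene`, `drug`, `name`, `symbol`), the required-field table and the dictionary used as a knowledge base are assumptions made for illustration; the actual PharmGKB-defined XML elements and validation rules are not given in the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical required fields per submission type; the real PharmGKB
# XML elements are not specified in the text.
REQUIRED_FIELDS = {
    "gene": {"name", "symbol"},
    "drug": {"name"},
}


def non_empty_fields(element):
    """Map each child tag to its text, skipping empty children."""
    return {
        child.tag: child.text.strip()
        for child in element
        if (child.text or "").strip()
    }


def validate(element):
    """Return a list of problems; an empty list means the element is accepted."""
    required = REQUIRED_FIELDS.get(element.tag)
    if required is None:
        return [f"unknown submission type: {element.tag}"]
    missing = required - set(non_empty_fields(element))
    return [f"{element.tag}: missing <{name}>" for name in sorted(missing)]


def merge(knowledge_base, element):
    """Add a validated element to the knowledge base, keyed by (type, name)."""
    fields = non_empty_fields(element)
    key = (element.tag, fields["name"])
    # A new key creates an object; an existing key is extended with new fields.
    knowledge_base.setdefault(key, {}).update(fields)


def submit(knowledge_base, xml_text):
    """Validate each element in an uploaded file and merge the accepted ones."""
    rejected = []
    for element in ET.fromstring(xml_text.strip()):
        problems = validate(element)
        if problems:
            rejected.extend(problems)
        else:
            merge(knowledge_base, element)
    return rejected


# Made-up upload: two valid elements and one gene without a symbol.
upload = """
<submission>
  <gene><name>Example gene</name><symbol>EXG1</symbol></gene>
  <drug><name>tamoxifen</name></drug>
  <gene><name>Incomplete gene</name></gene>
</submission>
"""

kb = {}
problems = submit(kb, upload)
```

In this sketch a rejected element does not block the rest of the file; whether PharmGKB accepts partial uploads or rejects the whole file is not stated in the text.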

Each data submission contains data elements that are merged into the knowledge base in various ways. Some elements are grouped to form new knowledge base objects. Other elements are added to existing objects. Related objects are linked together to form a network of information. For example, genes are related to the proteins they express, to the research centers studying them, and to the drugs that interact with them. In addition, drugs are related to other kinds of information, including the diseases they treat. Data retrieval programs find related information by traversing the network of objects starting from a known object. One advantage of this representation is that the same information can be accessed from different starting objects, allowing queries to be formulated from different viewpoints. There are different types of relationships among the objects, including ‘expresses’, as in ‘a gene expresses a protein’, and ‘studied by’, as in ‘a gene is studied by a research center’. There are currently over 600 different relationships used in PharmGKB. A part of the network of PharmGKB information is illustrated in Figure 2.

Figure 2
Some of the relationships among data objects in PharmGKB.
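Retrieval by traversing a network of related objects from a known starting object can be sketched with a small breadth-first search. The `RelationNetwork` class, the object names and the inverse relation labels are hypothetical; only ‘expresses’ and ‘studied by’ are relationship names taken from the text, and this is not a description of how PharmGKB stores or traverses its data.

```python
from collections import defaultdict, deque


class RelationNetwork:
    """Hypothetical store of objects linked by named relations, in both directions."""

    def __init__(self):
        self._edges = defaultdict(list)  # object -> [(relation, other object)]

    def relate(self, subject, relation, obj, inverse):
        """Record `subject relation obj` and the inverse reading of the same link."""
        self._edges[subject].append((relation, obj))
        self._edges[obj].append((inverse, subject))

    def related(self, start, max_hops=2):
        """Breadth-first traversal: {object: list of relations leading to it}."""
        paths = {start: []}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            if len(paths[current]) == max_hops:
                continue  # do not expand beyond the hop limit
            for relation, neighbor in self._edges[current]:
                if neighbor not in paths:
                    paths[neighbor] = paths[current] + [relation]
                    queue.append(neighbor)
        del paths[start]
        return paths


net = RelationNetwork()
net.relate("GENE_A", "expresses", "PROTEIN_A", inverse="expressed by")
net.relate("GENE_A", "studied by", "CENTER_1", inverse="studies")
net.relate("DRUG_X", "interacts with", "GENE_A", inverse="interacts with")
net.relate("DRUG_X", "treats", "DISEASE_Y", inverse="treated by")

# The same links are reachable from different starting objects.
from_gene = net.related("GENE_A")
from_disease = net.related("DISEASE_Y", max_hops=3)
```

Starting from the gene, the disease is found two hops away through the drug; starting from the disease, the protein is found three hops away through the drug and the gene. Storing each link in both directions is one simple way to let queries be posed from either viewpoint.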

Research centers can request that specific categories of knowledge, such as cardiovascular drugs or receptor proteins, be modeled so that data in, or relating to, that category can be submitted to PharmGKB. This ongoing process requires considerable effort by both the PharmGKB staff and the requesting investigator in order for the modeling to be done correctly. Once the modeling is complete, the research center often must reformat their collected data to match the PharmGKB model before submitting it. PharmGKB has extensively modeled several categories of basic sequence-related genomic information, enzyme kinetics, pharmacokinetics and some miscellaneous information including literature citations and external databases.


Researchers in the Pharmacogenetics Research Network are studying a variety of genotype–phenotype relationships including genotypic effects on the metabolism, pharmacokinetics, and clinical efficacy of drugs as well as on the function of metabolizing enzymes and drug targets in cell systems. Drugs currently under study include tamoxifen and other anti-cancer agents, asthma medications, antiarrhythmics, antidepressants, statins and ACE inhibitors. PharmGKB contains both abstract and detailed information in all of these areas, and the relationships between the ontological terms form a rich web of information. Details of ongoing research are available at http://www.pharmgkb.org/information.html.


PharmGKB provides a number of tools. KBQuery allows the viewer to formulate complex queries on the knowledge base and displays the results in tabular form, which can be downloaded. KBEditor allows selected users to edit the knowledge base. The dbSNP submission program automatically submits any new PharmGKB SNPs to the dbSNP database, while a dbSNP surveillance program generates weekly reports to research network members of new submissions to dbSNP involving their genes of interest. Similarly, the PubMed surveillance program generates weekly reports to research network members of new publications related to their areas of interest. For researchers interested in comparative genomics, PharmGKB contains an interface to VISTA (7) that allows genes in PharmGKB to be compared with the corresponding genes in mouse.


The information in PharmGKB is related to data in other online databases such as dbSNP, LocusLink and OMIM. The automatic submission and surveillance tools allow PharmGKB both to contribute to external databases and to correlate its information with theirs.


We thank Shuo Liu, Yueyi Liu, Charity Y. Lu and Marshall R. Mayberry III for their assistance in developing the PharmGKB software. PharmGKB is financially supported by grants from the National Institute of General Medical Sciences (NIGMS), the National Human Genome Research Institute (NHGRI) and the National Library of Medicine (NLM) within the National Institutes of Health (NIH), and by the Pharmacogenetics Research Network and Stanford University’s Children’s Health Initiative (Russ Altman, PI). This work is supported by the NIH/NIGMS Pharmacogenetics Research Network and Database (U01GM61374; Russ Altman, PI).


1. Evans,W.E. and Relling,M.V. (1999) Pharmacogenomics: translating functional genomics into rational therapeutics. Science, 286, 487–491.
2. Klein,T.E., Chang,J.T., Cho,M.K., Easton,K.L., Fergerson,R., Hewett,M., Lin,Z., Liu,Y., Liu,S., Oliver,D.E. et al. (2001) Integrating genotype and phenotype information: an overview of the PharmGKB project. Pharmacogenom. J., 1, 167–170.
3. Karp,P.D., Ouzounis,C. and Paley,S.M. (1996) HinCyc: a knowledge base of the complete genome and metabolic pathways of H. influenzae. In Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology. AAAI Press, Menlo Park, CA, pp. 116–124.
4. Hafner,C.D. and Fridman,N. (1996) Ontological foundations for biology knowledge models. In Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology. AAAI Press, Menlo Park, CA, pp. 78–87.
5. Rubin,D.L., Hewett,M., Oliver,D.E., Klein,T.E. and Altman,R.B. (2002) Automating data acquisition into ontologies from pharmacogenetics relational data sources using declarative object definitions and XML. In Altman,R.B., Dunker,A.K., Hunter,L., Lauderdale,K. and Klein,T.E. (eds), Proceedings of the Seventh Pacific Symposium on Biocomputing. World Scientific Publishing Co. Pte. Ltd, Singapore, vol. 7, pp. 88–99.
6. Sherry,S.T., Ward,M.H., Kholodov,M., Baker,J., Phan,L., Smigielski,E.M. and Sirotkin,K. (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res., 29, 308–311.
7. Mayor,C., Brudno,M., Schwartz,J.R., Poliakov,A., Rubin,E.M., Frazer,K.A., Pachter,L.S. and Dubchak,I. (2000) VISTA: visualizing global DNA sequence alignments of arbitrary length. Bioinformatics, 16, 1046–1047.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press