Logo of bioinfoLink to Publisher's site
Bioinformatics. 2008 Aug 1; 24(15): 1733–1734.
Published online 2008 Jul 2. doi:  10.1093/bioinformatics/btn307
PMCID: PMC2638865

ChemmineR: a compound mining framework for R


Motivation: Software applications for structural similarity searching and clustering of small molecules play an important role in drug discovery and chemical genomics. Here, we present the first open-source compound mining framework for the popularstatistical programming environment R. The integration with a powerful statistical environment maximizes the flexibility, expandability and programmability of the provided analysis functions.

Results: We discuss the algorithms and compound mining utilities provided by the R package ChemmineR. It contains functions for structural similarity searching, clustering of compound libraries with a wide spectrum of classification algorithms and various utilities for managing complex compound data. It also offers a wide range of visualization functions for compound clusters and chemical structures. The package is well integrated with the online ChemMine environment and allows bidirectional communications between the two services.

Availability: ChemmineR is freely available as an R package from the ChemMine project site: http://bioweb.ucr.edu/ChemMineV2/chemminer

Contact: ude.rcu@ekrig.samoht


  • Carhart R, et al. Atom pairs as molecular features in structure-activity studies: definition and applications. J. Chem. Inf. Comput. Sci. 1985;25:64–73.
  • Chen J, et al. ChemDB: a public database of small molecules and related chemoinformatics resources. Bioinformatics. 2005;21:4133–4139. [PubMed]
  • Chen X, Reynolds C. Performance of similarity measures in 2D fragment-based similarity searching: comparison of structural descriptors and similarity coefficients. J. Chem. Inf. Comput. Sci. 2002;42:1407–1414. [PubMed]
  • Gedeck P, et al. QSAR–how good is it in practice? Comparison of descriptor sets on an unbiased cross section of corporate data sets. J. Chem. Inf. Model. 2006;46:1924–1936. [PubMed]
  • Gentleman R, et al. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. New York: Springer; 2005.
  • Girke T, et al. ChemMine. A compound mining database for chemical genomics. Plant Physiol. 2005;138:573–577. [PMC free article] [PubMed]
  • Guha R, et al. The Blue obelisk-interoperability in chemical informatics. J. Chem. Inf. Model. 2006;46:991–998. [PubMed]
  • Holliday JD, et al. Analysis and display of the size dependence of chemical similarity coefficients. J. Chem. Inf. Comput. Sci. 2003;43:819–828. [PubMed]
  • Irwin JJ, Shoichet BK. ZINC–a free database of commercially available compounds for virtual screening. J. Chem. Inf. Model. 2005;45:177–182. [PMC free article] [PubMed]
  • Lang DT, et al. rggobi: interface between R and GGobi. 2007 R package version 2.1.7.
  • O'Boyle NM, et al. Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit. Chem. Cent. J. 2008;2:1–7. [PMC free article] [PubMed]
  • R Development Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2008. ISBN 3-900051-07-0.
  • Raymond J, et al. Heuristics for similarity searching of chemical graphs using a maximum common edge subgraph algorithm. J. Chem. Inf. Comput. Sci. 2002;42:305–316. [PubMed]
  • Seiler KP, et al. ChemBank: a small-molecule screening and cheminformatics resource database. Nucleic Acids Res. 2008;36(Database issue):351–359. [PMC free article] [PubMed]

Articles from Bioinformatics are provided here courtesy of Oxford University Press
PubReader format: click here to try


Save items

Related citations in PubMed

See reviews...See all...

Cited by other articles in PMC

See all...


  • PubMed
    PubMed citations for these articles

Recent Activity

Your browsing activity is empty.

Activity recording is turned off.

Turn recording back on

See more...