Structural diversity of the MLSMR library captured by smaller curated compound sets. (A) Mean number of neighbors per compound in the MLSMR library for compounds in the validation set (1,982 unique compounds as judged by canonical SMILES strings), the LOPAC collection (1,264 unique compounds), and 100 random subsets of the MLSMR library at each of the sizes 1,000, 2,000, 3,000, 4,000, and 5,000. Two compounds are considered neighbors if their Tanimoto coefficient is greater than 0.3, computed using 1,024-bit ECFP_4 fingerprints generated in Pipeline Pilot (SciTegic, San Diego, CA). For the validation and LOPAC collections, error bars are the standard error of the number of neighbors per compound; for random subsets, error bars represent a single SD of the mean number of neighbors per compound across the 100 samples of a given size. (B) Number of unique neighbors in the MLSMR library for the validation set, LOPAC, random subsets of the MLSMR library, and a greedy optimization for maximum coverage. Error bars (a single SD) for the random subsets are too small to be visible. (C) The mean numbers of neighbors per compound in the validation and LOPAC sets are significantly different as assessed by a two-sample t-test (P = 3.2603 × 10^-42). The triple asterisks indicate P < 0.001. Error bars represent the standard error for each set. (D) Composition of MLSMR and validation set compounds. LOPAC, Library of Pharmacologically Active Compounds; MLSMR, Molecular Libraries Small Molecule Repository; SD, standard deviation. Color images available online at www.liebertonline.com/adt.
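
The neighbor counts in panels A and B and the greedy coverage curve in panel B can be illustrated with a minimal Python sketch. It is not the Pipeline Pilot protocol used for the figure: fingerprints are represented here as sets of on-bit indices, the five-compound TOY_LIBRARY and all function names are hypothetical, and a compound that is itself in the library is assumed to count as its own neighbor, which the caption does not specify. The greedy step shown is the standard maximum-coverage heuristic, assumed to be similar in spirit to the optimization in panel B.

```python
from typing import Dict, FrozenSet, List, Set

Fingerprint = FrozenSet[int]  # indices of the bits that are set


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared on-bits / on-bits in either fingerprint."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def neighbors(query: Fingerprint, library: Dict[str, Fingerprint],
              threshold: float = 0.3) -> Set[str]:
    """Library compounds whose similarity to the query exceeds the threshold."""
    return {name for name, fp in library.items()
            if tanimoto(query, fp) > threshold}


def mean_neighbors(subset: Dict[str, Fingerprint],
                   library: Dict[str, Fingerprint],
                   threshold: float = 0.3) -> float:
    """Panel A quantity: mean number of library neighbors per subset compound."""
    counts = [len(neighbors(fp, library, threshold)) for fp in subset.values()]
    return sum(counts) / len(counts)


def unique_neighbors(subset: Dict[str, Fingerprint],
                     library: Dict[str, Fingerprint],
                     threshold: float = 0.3) -> Set[str]:
    """Panel B quantity: distinct library compounds covered by the subset."""
    covered: Set[str] = set()
    for fp in subset.values():
        covered |= neighbors(fp, library, threshold)
    return covered


def greedy_max_coverage(library: Dict[str, Fingerprint], k: int,
                        threshold: float = 0.3) -> List[str]:
    """Pick up to k compounds, each time adding the one that covers the most
    not-yet-covered library compounds (ties broken by name order)."""
    neighbor_sets = {name: neighbors(fp, library, threshold)
                     for name, fp in library.items()}
    chosen: List[str] = []
    covered: Set[str] = set()
    for _ in range(k):
        candidates = [n for n in sorted(neighbor_sets) if n not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda n: len(neighbor_sets[n] - covered))
        if not neighbor_sets[best] - covered:
            break  # nothing new can be covered
        chosen.append(best)
        covered |= neighbor_sets[best]
    return chosen


# Hypothetical toy data: two clusters, {A, B, C} and {D, E}.
TOY_LIBRARY: Dict[str, Fingerprint] = {
    "A": frozenset({1, 2, 3, 4}),
    "B": frozenset({1, 2, 3, 5}),
    "C": frozenset({1, 2, 6, 7}),
    "D": frozenset({10, 11, 12}),
    "E": frozenset({10, 11, 13}),
}
```

On the toy data, the greedy selection with k = 2 picks one compound from each cluster, so two compounds cover all five library members, whereas two compounds drawn from the same cluster would cover at most three.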