J Chem Inf Model. 2011 Sep 26;51(9):2345-51. doi: 10.1021/ci200235e. Epub 2011 Sep 7.

Anatomy of high-performance 2D similarity calculations.

Author information

Department of Computer Science, Stanford University, Stanford, California 94305, United States.

Abstract

Similarity measures based on the comparison of dense bit vectors of two-dimensional chemical features are a dominant method in chemical informatics. For large-scale problems, including compound selection and machine learning, computing the intersection between two dense bit vectors is the overwhelming bottleneck. We describe efficient implementations of this primitive as well as example applications using features of modern CPUs that allow 20-40× performance increases relative to typical code. Specifically, we describe fast methods for population count on modern x86 processors and cache-efficient matrix traversal and leader clustering algorithms that alleviate memory bandwidth bottlenecks in similarity matrix construction and clustering. The speed of our 2D comparison primitives is within a small factor of that obtained on GPUs and does not require specialized hardware.
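The abstract gives no code. The C below is a minimal sketch of the primitives it names, not the authors' implementation. It covers popcount-based intersection of dense bit vectors, a similarity score built on that intersection, a tiled (cache-blocked) similarity-matrix fill, and plain sequential leader clustering. Several details are assumptions. The similarity measure is taken to be the Tanimoto (Jaccard) coefficient, a common choice that the abstract does not name. The 1024-bit fingerprint width (FP_WORDS), the tile size (TILE), and every function name are hypothetical. `__builtin_popcountll` is a GCC/Clang builtin that compiles to the x86 POPCNT instruction when that instruction is enabled (for example with `-mpopcnt`). The clustering loop is the textbook leader algorithm, not the cache-efficient variant the paper describes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 1024-bit fingerprints stored as 16 64-bit words,
   packed back to back in one array. */
#define FP_WORDS 16
/* Hypothetical tile size: 64 fingerprints x 128 bytes = 8 KiB per tile. */
#define TILE 64

/* Population count of one dense bit vector. */
static int fp_popcount(const uint64_t *a) {
    int n = 0;
    for (size_t i = 0; i < FP_WORDS; i++)
        n += __builtin_popcountll(a[i]);   /* POPCNT with -mpopcnt on x86 */
    return n;
}

/* |A & B|: the intersection primitive that dominates the cost. */
static int fp_intersect(const uint64_t *a, const uint64_t *b) {
    int n = 0;
    for (size_t i = 0; i < FP_WORDS; i++)
        n += __builtin_popcountll(a[i] & b[i]);
    return n;
}

/* Tanimoto similarity c / (|A| + |B| - c), using precomputed counts so that
   only the intersection is computed per pair. Two empty vectors score 1.0,
   a convention assumed here. */
static double tanimoto(const uint64_t *a, int count_a,
                       const uint64_t *b, int count_b) {
    int c = fp_intersect(a, b);
    int denom = count_a + count_b - c;
    return denom == 0 ? 1.0 : (double)c / denom;
}

/* Fill the n x n matrix out[i*n + j] tile by tile, so that a block of TILE
   fingerprints is reused from cache while compared against another block. */
static void similarity_matrix(const uint64_t *fps, const int *counts,
                              size_t n, double *out) {
    for (size_t ib = 0; ib < n; ib += TILE)
        for (size_t jb = 0; jb < n; jb += TILE)
            for (size_t i = ib; i < ib + TILE && i < n; i++)
                for (size_t j = jb; j < jb + TILE && j < n; j++)
                    out[i * n + j] = tanimoto(fps + i * FP_WORDS, counts[i],
                                              fps + j * FP_WORDS, counts[j]);
}

/* Sequential leader clustering: fingerprint i joins the first existing leader
   with similarity >= threshold, otherwise it becomes a new leader.
   leaders needs room for n entries; assignment[i] receives the index of the
   leader fingerprint for i. Returns the number of clusters. */
static size_t leader_cluster(const uint64_t *fps, const int *counts, size_t n,
                             double threshold, size_t *leaders,
                             size_t *assignment) {
    size_t nleaders = 0;
    for (size_t i = 0; i < n; i++) {
        const uint64_t *fi = fps + i * FP_WORDS;
        size_t j;
        for (j = 0; j < nleaders; j++) {
            size_t l = leaders[j];
            if (tanimoto(fi, counts[i], fps + l * FP_WORDS, counts[l]) >= threshold)
                break;
        }
        if (j == nleaders)          /* no leader close enough: start a cluster */
            leaders[nleaders++] = i;
        assignment[i] = leaders[j];
    }
    return nleaders;
}
```

Precomputing each fingerprint's popcount once leaves a single AND-plus-popcount pass per pair, which matches the abstract's point that the intersection is the bottleneck. Tiling changes only the order in which pairs are visited, not the results. One way to build the sketch with GCC or Clang is `cc -O2 -mpopcnt`.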

PMID: 21854053
PMCID: PMC4839782
DOI: 10.1021/ci200235e
[Indexed for MEDLINE]
Free PMC Article
