J Cheminform. 2011 Aug 8;3(1):29. doi: 10.1186/1758-2946-3-29.

Multiple search methods for similarity-based virtual screening: analysis of search overlap and precision.


Information School, University of Sheffield, Portobello Street, Sheffield S1 4DP, UK.



Data fusion methods are widely used in virtual screening, and make the implicit assumption that the more often a molecule is retrieved in multiple similarity searches, the more likely it is to be active. This paper tests the correctness of this assumption.
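The assumption can be made concrete with a simple count-based fusion rule: a molecule's fused score is the number of searches that retrieve it. The sketch below is a minimal illustration under stated assumptions, not the fusion rule used in the paper. The function name `fuse_by_frequency`, the `top_k` cutoff, and the tie-break on best rank position are all hypothetical choices, and each input hit list is assumed to hold unique molecule IDs ordered best first.

```python
from collections import Counter


def fuse_by_frequency(hit_lists, top_k):
    """Hypothetical count-based fusion: rank molecules by how many searches retrieve them.

    hit_lists: one ranked list of unique molecule IDs (best first) per search.
    top_k: only the first top_k hits of each search count as "retrieved".
    Ties in retrieval count are broken by the best (smallest) rank position seen.
    """
    counts = Counter()
    best_rank = {}
    for hits in hit_lists:
        for pos, mol in enumerate(hits[:top_k]):
            counts[mol] += 1
            best_rank[mol] = min(pos, best_rank.get(mol, pos))
    # More retrievals first; among equals, the molecule ranked highest anywhere first.
    return sorted(counts, key=lambda m: (-counts[m], best_rank[m]))


searches = [["a", "b", "c"], ["b", "a", "d"], ["b", "e", "a"]]
print(fuse_by_frequency(searches, top_k=2))  # ['b', 'a', 'e']
```

Under this rule "b" (retrieved by all three searches) outranks "a" (two searches), which is exactly the "more often retrieved, more likely active" assumption that the paper sets out to test.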


Sets of 25 searches, using either the same reference structure with 25 different similarity measures (similarity fusion) or 25 different reference structures with the same similarity measure (group fusion), show that large numbers of unique molecules are retrieved by just a single search, but that the number of unique molecules decreases very rapidly as more searches are considered. This rapid decrease is accompanied by a rapid increase in the fraction of those retrieved molecules that are active. The relationship between the number of different molecules retrieved and the number of searches carried out is approximately linear on a log-log plot, and a rationale for this power-law behaviour is provided.
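One plausible way to tabulate this kind of overlap is sketched below, assuming that each search yields a set of retrieved molecule IDs (for example its top-ranked hits) and that a molecule's count is the number of searches whose set contains it; the paper's exact counting protocol may differ. The functions `overlap_profile` and `loglog_slope` are hypothetical. A power law $N(n) \approx C\,n^{-\alpha}$ gives $\log N = \log C - \alpha \log n$, so the least-squares slope of $\log N$ against $\log n$ estimates $-\alpha$.

```python
import math
from collections import Counter


def overlap_profile(hit_lists, actives):
    """For each n, count molecules retrieved by exactly n searches and the fraction active.

    hit_lists: one collection of retrieved molecule IDs per search.
    actives: set of molecule IDs known to be active.
    Returns {n: (num_molecules, fraction_active)} sorted by n.
    """
    freq = Counter()
    for hits in hit_lists:
        freq.update(set(hits))  # a molecule counts at most once per search
    by_n = {}
    for mol, n in freq.items():
        total, act = by_n.get(n, (0, 0))
        by_n[n] = (total + 1, act + (mol in actives))
    return {n: (total, act / total) for n, (total, act) in sorted(by_n.items())}


def loglog_slope(profile):
    """Least-squares slope of log(num_molecules) vs log(n); needs at least two distinct n."""
    xs = [math.log(n) for n in profile]
    ys = [math.log(total) for total, _ in profile.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


hits = [{"a", "b", "c", "d"}, {"a", "b", "e"}, {"a", "f"}]
profile = overlap_profile(hits, actives={"a", "b"})
print(profile)  # {1: (4, 0.0), 2: (1, 1.0), 3: (1, 1.0)}
print(loglog_slope(profile) < 0)  # True: fewer molecules as n grows
```

In this toy data the pattern mirrors the one described: many molecules are found by a single search, few by several, and the fraction active rises with n.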


Using multiple searches provides a simple way of increasing the precision of a similarity search, and thus provides a justification for the use of data fusion methods in virtual screening.
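The precision gain can be illustrated by keeping only the molecules retrieved by at least a chosen number of searches and measuring the fraction that are active. This is a minimal sketch assuming set-valued search results and a known set of actives; `consensus_precision` and its `min_searches` threshold are hypothetical names, not an interface described in the paper.

```python
from collections import Counter


def consensus_precision(hit_lists, actives, min_searches):
    """Precision of the molecules retrieved by at least min_searches searches.

    Returns (number_selected, precision); precision is None if nothing is selected.
    """
    freq = Counter()
    for hits in hit_lists:
        freq.update(set(hits))  # a molecule counts at most once per search
    selected = [m for m, n in freq.items() if n >= min_searches]
    if not selected:
        return 0, None
    n_active = sum(m in actives for m in selected)
    return len(selected), n_active / len(selected)


hits = [{"a", "b", "c", "d"}, {"a", "b", "e"}, {"a", "f"}]
for m in (1, 2, 4):
    print(m, consensus_precision(hits, {"a", "b"}, m))
# 1 (6, 0.3333333333333333)
# 2 (2, 1.0)
# 4 (0, None)
```

Raising the threshold trades the number of molecules selected for a higher fraction of actives, which is the trade-off the conclusion describes.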
