Bioinformatics. 2014 Apr 1;30(7):996-1002. doi: 10.1093/bioinformatics/btt623. Epub 2013 Nov 9.

Parallel content-based sub-image retrieval using hierarchical searching.

Author information

Division of Biomedical Informatics, Department of Biostatistics and Department of Computer Science, University of Kentucky, Lexington, KY, Center for Biomedical Imaging and Informatics, The Cancer Institute of New Jersey, Rutgers University, New Brunswick, NJ, Center for Comprehensive Informatics, Emory University, Atlanta, GA and Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA, USA.



The capacity to systematically search through large image collections and ensembles and to detect regions exhibiting similar morphological characteristics is central to pathology diagnosis. Unfortunately, the primary methods used to search digitized whole-slide histopathology specimens are slow and prone to inter- and intra-observer variability. The central objective of this research was to design, develop and evaluate a content-based image retrieval system that helps doctors search quickly and reliably for prostate image patches with similar content.


Given a representative image patch (sub-image), the algorithm returns a ranked ensemble of image patches, drawn from the entire whole-slide histology section, that exhibit the most similar morphologic characteristics. This is accomplished in two stages. First, hierarchical searching is performed using a newly developed hierarchical annular histogram (HAH). Second, the set of candidates is refined by computing a color histogram from eight equally divided segments within each square annular bin defined in the original HAH. A demand-driven master-worker parallelization approach is employed to speed up the search: the query patch is broadcast to all worker processes, and the master process dynamically assigns each worker an image, which the worker searches before returning a ranked list of the similar patches it contains.
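The two-stage search can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: square annular bins are taken to be rings of equal Chebyshev distance from the patch center, the "eight equally divided segments" are modeled as 45-degree angular sectors, colors are quantized to 8 levels per RGB channel, "hierarchical searching" is modeled as accumulating an L1 distance ring by ring from the center outward and discarding a candidate once it exceeds a threshold `limit`, and candidate windows are scanned on a fixed `stride`. All function names and parameters are hypothetical.

```python
import numpy as np

LEVELS = 8  # assumed quantization levels per RGB channel (8**3 = 512 color bins)


def quantize(patch, levels=LEVELS):
    """Map each uint8 RGB pixel to a single color-bin index in [0, levels**3)."""
    q = (patch.astype(np.int64) * levels) // 256
    return q[..., 0] * levels * levels + q[..., 1] * levels + q[..., 2]


def rings_and_sectors(n):
    """Square-ring index (Chebyshev distance to the center) and 45-degree sector per pixel."""
    c = (n - 1) / 2.0
    y, x = np.mgrid[0:n, 0:n]
    dy, dx = y - c, x - c
    ring = np.floor(np.maximum(np.abs(dy), np.abs(dx))).astype(int)
    angle = np.arctan2(dy, dx) % (2 * np.pi)
    sector = np.minimum((angle // (np.pi / 4)).astype(int), 7)
    return ring, sector


def hah(patch, levels=LEVELS):
    """Stage-1 feature (assumed form): one normalized color histogram per square ring."""
    ring, _ = rings_and_sectors(patch.shape[0])
    h = np.zeros((ring.max() + 1, levels ** 3))
    np.add.at(h, (ring.ravel(), quantize(patch, levels).ravel()), 1)
    return h / h.sum(axis=1, keepdims=True)


def segmented_hah(patch, levels=LEVELS):
    """Stage-2 feature (assumed form): a color histogram for each of 8 sectors per ring."""
    ring, sector = rings_and_sectors(patch.shape[0])
    h = np.zeros((ring.max() + 1, 8, levels ** 3))
    np.add.at(h, (ring.ravel(), sector.ravel(), quantize(patch, levels).ravel()), 1)
    return h / np.maximum(h.sum(axis=2, keepdims=True), 1)  # empty sectors stay zero


def hierarchical_distance(q, c, limit):
    """L1 distance accumulated ring by ring, center outward; None once it exceeds limit."""
    d = 0.0
    for r in range(q.shape[0]):
        d += np.abs(q[r] - c[r]).sum()
        if d > limit:
            return None  # pruned early: the remaining outer rings are never compared
    return d


def search_image(image, query, stride, limit, top_k):
    """Two-stage search of one image for square windows similar to `query`."""
    n = query.shape[0]
    q1, q2 = hah(query), segmented_hah(query)
    survivors = []
    for y in range(0, image.shape[0] - n + 1, stride):
        for x in range(0, image.shape[1] - n + 1, stride):
            cand = image[y:y + n, x:x + n]
            if hierarchical_distance(q1, hah(cand), limit) is not None:
                survivors.append((y, x, cand))
    # Stage 2: re-rank only the surviving candidates with the finer sector histograms.
    scored = sorted((np.abs(q2 - segmented_hah(c)).sum(), y, x) for y, x, c in survivors)
    return [(y, x, float(d)) for d, y, x in scored[:top_k]]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
    query = image[16:48, 24:56]  # cut the query from the image itself
    print(search_image(image, query, stride=8, limit=1.0, top_k=5)[0])  # prints (16, 24, 0.0)
```

Because the rings are compared from the center outward, most dissimilar windows are rejected after only a few ring comparisons, and the more expensive sector histograms are computed only for the survivors.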
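The demand-driven master-worker scheme can be illustrated with Python threads and a shared task queue. This is a hypothetical sketch rather than the paper's code: threads stand in for separate worker processes, the single shared `query` object plays the role of the broadcast, and a worker takes a new image only after finishing its previous one, which is what makes the scheduling demand-driven. The final merge of per-image lists into one global ranking is an assumed master step, and `search_fn` is any per-image routine that returns `(distance, location)` pairs.

```python
import queue
import threading


def master_worker_search(images, query, search_fn, n_workers=4, top_k=40):
    """Demand-driven master-worker search (hypothetical sketch using threads)."""
    tasks = queue.Queue()
    results = queue.Queue()
    for idx in range(len(images)):
        tasks.put(idx)  # master: one task per image

    def worker():
        while True:
            try:
                idx = tasks.get_nowait()  # take the next image only when idle
            except queue.Empty:
                return  # no work left: this worker retires
            # Each worker returns its own ranked list for the image it searched.
            results.put((idx, search_fn(images[idx], query)))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Master (assumed step): merge the per-image lists into one global ranking.
    merged = []
    while not results.empty():
        idx, hits = results.get()
        merged.extend((dist, idx, loc) for dist, loc in hits)
    merged.sort()
    return merged[:top_k]


if __name__ == "__main__":
    def toy_search(img, q):
        # Toy "image" = list of numbers; a hit is (|value - query|, position).
        return sorted((abs(v - q), i) for i, v in enumerate(img))

    images = [[5, 9, 1], [4, 7], [10, 3, 6]]
    print(master_worker_search(images, 5, toy_search, n_workers=2, top_k=3))
    # prints [(0, 0, 0), (1, 1, 0), (1, 2, 2)]
```

Since a fast worker simply returns for more work, images of very different sizes balance across workers without any static partition of the slide set.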


The algorithm was tested on digitized hematoxylin and eosin (H&E) stained prostate cancer specimens and achieved excellent image retrieval performance: the recall rate within the top 40 ranked retrieved image patches is ∼90%.
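For reference, recall at a cutoff of 40 is conventionally the number of relevant patches found among the top 40 results divided by the total number of relevant patches for the query. The sketch below assumes that standard definition; the exact ground-truth protocol behind the reported figure is not given here, and `recall_at_k` is a hypothetical helper.

```python
def recall_at_k(ranked_ids, relevant_ids, k=40):
    """Recall@k: share of the ground-truth relevant patches found in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("recall is undefined without at least one relevant patch")
    retrieved = set(ranked_ids[:k])  # a patch listed twice is counted once
    return len(retrieved & relevant) / len(relevant)


if __name__ == "__main__":
    ranking = list(range(100))          # hypothetical ranked patch ids, best first
    ground_truth = {3, 10, 39, 40, 75}  # hypothetical relevant patch ids
    print(recall_at_k(ranking, ground_truth))  # prints 0.6 (3 of 5 are in the top 40)
```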


Both the testing data and source code can be downloaded from

[Indexed for MEDLINE]
Free PMC Article
