Evaluation of information-theoretic similarity measures for content-based retrieval and detection of masses in mammograms

Georgia D. Tourassi(a) and Brian Harrawood
Digital Advanced Imaging Laboratories, Department of Radiology, Duke University Medical Center, Durham, North Carolina 27705

Swatee Singh, Joseph Y. Lo, and Carey E. Floyd
Digital Advanced Imaging Laboratories, Department of Radiology, Duke University Medical Center, Durham, North Carolina 27705 and Department of Biomedical Engineering, Duke University, Durham, North Carolina 27710

(Received 14 June 2006; revised 3 November 2006; accepted for publication 6 November 2006; published 18 December 2006)

The purpose of this study was to evaluate image similarity measures employed in an information-theoretic computer-assisted detection (IT-CAD) scheme. The scheme was developed for content-based retrieval and detection of masses in screening mammograms. The study is aimed toward an interactive clinical paradigm in which physicians query the proposed IT-CAD scheme on mammographic locations that are either visually suspicious or indicated as suspicious by other cuing CAD systems. The IT-CAD scheme provides an evidence-based second opinion for query mammographic locations using a knowledge database of mass and normal cases. In this study, eight entropy-based similarity measures were compared with respect to retrieval precision and detection accuracy using a database of 1820 mammographic regions of interest. The IT-CAD scheme was then validated on a separate database for false-positive reduction of progressively more challenging visual cues generated by an existing in-house mass detection system. The study showed that the image similarity measures fall into one of two categories: one category is better suited to the retrieval of semantically similar cases, while the second is more effective for knowledge-based decisions regarding the presence of a true mass in the query location.
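The abstract scores each similarity measure on two criteria: how well it retrieves cases of the same type as the query (retrieval precision) and how well the retrieved cases support a mass/normal decision (detection accuracy). This excerpt does not define the eight entropy-based measures, so the following Python sketch assumes plain mutual information, I(A;B) = H(A) + H(B) - H(A,B), estimated from a 16-bin joint intensity histogram, as one representative example. The helper names (`mutual_information`, `retrieve`, `precision_at_k`, `mass_vote`), the bin count, k, and the majority-vote decision score are hypothetical illustrations, not the IT-CAD scheme's actual measures or decision rule.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Illustrative sketch only: mutual information stands in for the study's
# eight entropy-based measures (an assumption, not the authors' definitions).
Image = Sequence[Sequence[int]]   # 2-D grayscale ROI, intensities in [0, levels)
Ranked = List[Tuple[float, str]]  # (similarity, label) pairs, best first


def quantize(img: Image, levels: int, bins: int) -> List[int]:
    """Flatten an ROI and map each intensity to one of `bins` histogram bins."""
    return [min(v * bins // levels, bins - 1) for row in img for v in row]


def entropy(counts: Counter, n: int) -> float:
    """Shannon entropy in bits of the empirical distribution in `counts`."""
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(a: Image, b: Image, levels: int = 256, bins: int = 16) -> float:
    """Estimate I(A;B) = H(A) + H(B) - H(A,B) from a joint intensity histogram."""
    qa, qb = quantize(a, levels, bins), quantize(b, levels, bins)
    if len(qa) != len(qb):
        raise ValueError("ROIs must contain the same number of pixels")
    n = len(qa)
    return (entropy(Counter(qa), n) + entropy(Counter(qb), n)
            - entropy(Counter(zip(qa, qb)), n))


def retrieve(query: Image, database: List[Tuple[Image, str]], k: int) -> Ranked:
    """Score every knowledge-database ROI against the query; keep the top k."""
    if k < 1:
        raise ValueError("k must be at least 1")
    scored = [(mutual_information(query, roi), label) for roi, label in database]
    scored.sort(key=lambda s: s[0], reverse=True)  # stable: ties keep database order
    return scored[:k]


def precision_at_k(top: Ranked, query_label: str) -> float:
    """Retrieval precision: fraction of retrieved cases sharing the query's label."""
    return sum(label == query_label for _, label in top) / len(top)


def mass_vote(top: Ranked) -> float:
    """Hypothetical decision score: fraction of 'mass' cases among the top k."""
    return sum(label == "mass" for _, label in top) / len(top)


if __name__ == "__main__":
    a = [[0, 255], [0, 255]]
    b = [[255, 0], [255, 0]]  # intensity inverse of a: fully dependent on a
    c = [[0, 0], [255, 255]]  # empirically independent of a
    print(mutual_information(a, b))  # 1.0 (bits)
    print(mutual_information(a, c))  # 0.0
    db = [(b, "mass"), (c, "normal"), (a, "mass")]
    top = retrieve(a, db, k=2)
    print(precision_at_k(top, "mass"), mass_vote(retrieve(a, db, k=3)))
```

Because both criteria are computed from the same ranked list, trying a different entropy-based measure in this sketch only requires replacing `mutual_information`.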
In addition, the IT-CAD scheme yielded a substantial reduction in false-positive detections while maintaining a high detection rate for malignant masses. © 2007 American Association of Physicists in Medicine. [DOI: 10.1118/1.2401667]

I. INTRODUCTION

There is conflicting evidence regarding the clinical impact of computer-assisted detection (CAD) systems for the diagnostic interpretation of screening mammograms. For the most part, retrospective studies suggest that CAD technology has a positive impact on early breast cancer detection (e.g., Refs. 1–5). There are, however, several retrospective⁶⁻⁸ and prospective⁹⁻¹³ studies that produced contradictory conclusions. Although it is recognized that more prospective studies are needed on the topic, it is well known that radiologists often dismiss correct CAD cues. The radiologists' reluctance to trust CAD is mainly attributed to the higher-than-desired false-positive rate.¹¹ These observations are particularly true for the detection of masses, a far more challenging task than the detection of calcifications.

While the true clinical benefit of CAD is still debated,¹⁴ CAD research continues in an effort to improve diagnostic performance and clinical integration.¹⁵ For example, the currently used "black-box" CAD paradigm is rather limited. A CAD system that is more interactive and capable of justifying the visual cues it provides may support the radiologists' cognitive process more effectively. Moreover, as clinical image libraries grow rapidly in radiology, contemporary CAD systems should be able to capitalize on accumulating image data without requiring painstaking retraining or recalibration.

Content-based image retrieval (CBIR) could facilitate the development of a new generation of interactive CAD technology that takes advantage of the vast amounts of digital image data generated in clinical practice.
The main objective of CBIR research is to develop a user-friendly framework that allows users to interact with digital image libraries effectively.¹⁶ CBIR has been identified as an important research direction in radiology to facilitate clinical decision support for medical image interpretation.¹⁷,¹⁸

Shifting the CAD paradigm to incorporate image retrieval capabilities is a challenging proposition. The primary task of CBIR in the clinical arena is to help radiologists retrieve images with similar visual content. Medical image retrieval has traditionally been based on text describing the patient's clinical data and the medical condition depicted in the patient's imaging studies. These textual descriptors are used as keywords for searching the medical image library. Several researchers have recognized the need for more sophisticated image retrieval methods that capture the visual content of images more effectively than textual descriptors. Consequently, CBIR has evolved toward feature-based similarity assessment. Images are compared and retrieved based on low-level image features that describe the color, shape, texture, and spatial arrangement of important objects (e.g., organs, tumors, etc.) identified in the medical images. Nevertheless, low-level image features are often ineffective in

Med. Phys. 34(1), January 2007   0094-2405/2007/34(1)/140/11/$23.00   © 2007 Am. Assoc. Phys. Med.   140