Evaluation of information-theoretic similarity measures for content-based
retrieval and detection of masses in mammograms
Georgia D. Tourassi^a) and Brian Harrawood
Digital Advanced Imaging Laboratories, Department of Radiology, Duke University Medical Center,
Durham, North Carolina 27705
Swatee Singh, Joseph Y. Lo, and Carey E. Floyd
Digital Advanced Imaging Laboratories, Department of Radiology, Duke University Medical Center,
Durham, North Carolina 27705 and Department of Biomedical Engineering, Duke University,
Durham, North Carolina 27710
(Received 14 June 2006; revised 3 November 2006; accepted for publication 6 November 2006;
published 18 December 2006)
The purpose of this study was to evaluate image similarity measures employed in an information-theoretic
computer-assisted detection (IT-CAD) scheme. The scheme was developed for content-based
retrieval and detection of masses in screening mammograms. The study is aimed toward an
interactive clinical paradigm where physicians query the proposed IT-CAD scheme on mammo-
graphic locations that are either visually suspicious or indicated as suspicious by other cuing CAD
systems. The IT-CAD scheme provides an evidence-based, second opinion for query mammo-
graphic locations using a knowledge database of mass and normal cases. In this study, eight
entropy-based similarity measures were compared with respect to retrieval precision and detection
accuracy using a database of 1820 mammographic regions of interest. The IT-CAD scheme was
then validated on a separate database for false positive reduction of progressively more challenging
visual cues generated by an existing, in-house mass detection system. The study showed that the
image similarity measures fall into one of two categories; one category is better suited to the
retrieval of semantically similar cases while the second is more effective with knowledge-based
decisions regarding the presence of a true mass in the query location. In addition, the IT-CAD
scheme yielded a substantial reduction in false-positive detections while maintaining high detection
rate for malignant masses. © 2007 American Association of Physicists in Medicine.
[DOI: 10.1118/1.2401667]
I. INTRODUCTION
There is conflicting evidence regarding the clinical impact of
computer-assisted detection (CAD) systems for the diagnostic
interpretation of screening mammograms. For the most
part, retrospective studies suggest that CAD technology has a
positive impact on early breast cancer detection (e.g., Refs.
1–5). There are, however, several retrospective (Refs. 6–8) and
prospective (Refs. 9–13) studies that produced contradictory conclusions.
Although it is recognized that more prospective studies
are needed on the topic, it is well known that radiologists
often dismiss correct CAD cues. The radiologists’ reluctance
to trust CAD is mainly attributed to the higher than desired
false positive rate (Ref. 11). The above observations are particularly
true for the detection of masses, a far more challenging task
than the detection of calcifications.
While the true clinical benefit of CAD is still debated (Ref. 14),
CAD research continues in an effort to improve diagnostic
performance and clinical integration (Ref. 15). For example, the currently
used “black-box” CAD paradigm is rather limited. A
CAD system that is more interactive and capable of justifying
the visual cues it provides may support radiologists’ cognitive
process more effectively. Moreover, as clinical image
libraries grow rapidly in Radiology, contemporary CAD sys-
tems should be able to capitalize on accumulating image data
without requiring painstaking retraining or recalibration.
Content-based image retrieval (CBIR) could facilitate the
development of a new generation of interactive CAD technology
that takes advantage of the vast amounts of digital
image data generated in clinical practice. The main objective
of CBIR research is to develop a user-friendly framework
that allows users to interact with digital image libraries
effectively (Ref. 16). CBIR has been identified as an important research
direction in Radiology to facilitate clinical decision
support for medical image interpretation (Refs. 17 and 18).
Shifting the CAD paradigm to incorporate image retrieval
capabilities is a challenging proposition. The primary task of
CBIR in the clinical arena is to help radiologists retrieve
images with similar visual content. Medical image retrieval
has traditionally been based on text describing the patient
clinical data and medical condition depicted in the patient’s
imaging studies. These textual descriptors are used as key-
words for searching the medical image library. Several re-
searchers have recognized the need for more sophisticated
image retrieval methods that capture the visual content of
images more effectively than textual descriptors. Conse-
quently, CBIR has evolved toward feature-based similarity
assessment. Images are compared and retrieved based on
low-level image features that describe the color, shape, texture,
and spatial arrangement of important objects (i.e., organs,
tumors, etc.) identified in the medical images. Nevertheless,
low-level image features are often ineffective in
140 Med. Phys. 34 (1), January 2007 0094-2405/2007/34(1)/140/11/$23.00 © 2007 Am. Assoc. Phys. Med.