RAFIKI: Retrieval-Based Application for
Imaging and Knowledge Investigation
Marcos R. Nesso-Jr., Mirela T. Cazzolato, Lucas C. Scabora, Paulo H. Oliveira,
Gabriel Spadon, Jessica A. de Souza, Willian D. Oliveira, Daniel Y. T. Chino,
Jose F. Rodrigues-Jr., Agma J. M. Traina, Caetano Traina-Jr.
Institute of Mathematics and Computer Sciences - University of São Paulo
São Carlos, São Paulo - 13566-590 Brazil
Email: {marcosnesso, mirelac, lucascsb, pholiveira, spadon, jessicasouza}@usp.br,
{willian, chinodyt, junio, agma, caetano}@icmc.usp.br
Abstract—Medical exams, such as CT scans and mammo-
grams, are obtained and stored every day in hospitals all
over the world, including images, patient data, and medical
reports. It is paramount to have tools and systems to improve
computer-aided diagnoses based on such huge volumes of stored
information. Content-Based Image Retrieval (CBIR) is a
powerful paradigm to help reach this goal, providing
physicians with intelligent retrieval tools that present them with
similar or complementary cases, in which visual characteristics
enrich textual data. By comparing against
previous cases, physicians can obtain a more comprehensive
understanding of the case they are working on. Current hospital
systems do not carry native CBIR functionalities yet, relying on
add-on subsystems, which often do not adhere to the existing
relational database infrastructures. In this work, we propose
RAFIKI, a software prototype that extends the Relational
Database Management System (RDBMS) PostgreSQL, providing
native support for CBIR functionalities, modular extensibility,
and seamless integration for data science tools, such as Python
and R. We show the applicability of our system by evaluating
three clinical scenarios, performing queries over a real-world
image dataset of lung exams. Our results show real potential
for promoting informed decision-making from the physician’s
perspective. In addition, the system exhibited higher performance
than previous systems found in the literature.
Moreover, RAFIKI contributes a model of how
to combine CBIR concepts and relational data, providing
a powerful design for further development of theoretical and
practical concepts and tools.
Index Terms—Index; Metric Access Method; CBIR; RDBMS.
I. INTRODUCTION
Over the years, the amount of image data collected at health
care institutions has continuously increased. Such data are
often represented in custom, specific formats, requiring proper
tools and management techniques. Among these tools, the Picture
Archiving and Communication System (PACS) operates as an
interface between the medical equipment and the workstation
where specialists analyze the patients’ data, since it organizes
and communicates the data. The problem with PACS is
that, in the current state of the art, such systems do not perform
content-based queries natively; instead, tools working with PACS
require external add-ons to enable searching images and exams
based on their description. Content-based querying has huge
potential to aid physicians in diagnoses and decision-making
processes [1]. However, it must be provided by another kind
of system, one that encompasses Content-Based Image Retrieval
(CBIR). CBIR systems are capable of retrieving images according
to a degree of similarity, using distance functions to compare
pairs of stored images represented by feature vectors, which
consist of a low-level representation of the images’ visual
content [2]. CBIR systems can use Metric Access Methods
(MAMs), which are specialized index structures that
improve query performance [3], [4]. It would be valuable to
be able to perform similarity queries within legacy software
infrastructures. Our approach is to enhance existing off-the-shelf
Relational Database Management Systems (RDBMS) to
provide CBIR functionalities [5].
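To make the notion of a similarity query concrete, the following is a minimal Python sketch, not RAFIKI's implementation. It assumes that each image has already been reduced to a numeric feature vector and that the Euclidean distance is an acceptable distance function; the function names and image identifiers are hypothetical. Both queries are answered by a linear scan, whereas a MAM would avoid comparing the query against every stored vector.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same dimensionality")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_query(
    features: Dict[str, Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[str, float]]:
    """Return the k stored images closest to the query (linear scan)."""
    scored = [(image_id, euclidean(vec, query)) for image_id, vec in features.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


def range_query(
    features: Dict[str, Sequence[float]], query: Sequence[float], radius: float
) -> List[Tuple[str, float]]:
    """Return every stored image within `radius` of the query (linear scan)."""
    hits = [(image_id, euclidean(vec, query)) for image_id, vec in features.items()]
    hits = [pair for pair in hits if pair[1] <= radius]
    hits.sort(key=lambda pair: pair[1])
    return hits


if __name__ == "__main__":
    # Hypothetical two-dimensional feature vectors, for illustration only.
    stored = {
        "img_a": [0.0, 0.0],
        "img_b": [3.0, 4.0],
        "img_c": [1.0, 1.0],
        "img_d": [6.0, 8.0],
    }
    print(knn_query(stored, [0.0, 0.0], k=2))
    print(range_query(stored, [0.0, 0.0], radius=5.0))
```

The k-nearest-neighbor query returns a fixed number of results, while the range query returns all images within a given distance threshold, so the size of its answer depends on the data.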
The existing approaches being studied to support similarity
operations in RDBMS are limited. Some tools are rigid in
terms of indexing, employing MAMs in a layer external to
the RDBMS. To execute queries containing both similarity
and traditional predicates, these tools require an additional
step to merge external similarity results with those retrieved
directly from the RDBMS through traditional conditions [6]
[7]. Other tools are rigid in terms of the queries they execute.
That is, they include similarity support in the RDBMS, which
in turn focuses only on similarity queries, becoming unable to
support queries with both similarity and traditional predicates
[8]. Some tools are based on commercial models rather than
open-source alternatives. These approaches depend on the
original features provided by the vendor, preventing users from
extending their functionality [9]. Furthermore, the existing
tools rarely explore the integration of RDBMS with data
analysis tools [10]. Such integration would help
physicians and researchers obtain more insights from the
medical data, which in turn would better support
diagnoses and other decision-making processes.
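The additional merge step required when the similarity layer sits outside the RDBMS can be sketched as follows. This is an illustrative Python example under stated assumptions, not code from any of the cited tools: the table `exams`, its columns, the in-memory SQLite database standing in for the RDBMS, and the dictionary standing in for the external similarity layer are all hypothetical.

```python
import math
import sqlite3
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def external_range_query(
    index: Dict[int, Sequence[float]], query: Sequence[float], radius: float
) -> List[int]:
    """Stand-in for a similarity layer that lives outside the RDBMS."""
    return [exam_id for exam_id, vec in index.items() if euclidean(vec, query) <= radius]


def merged_query(
    conn: sqlite3.Connection,
    index: Dict[int, Sequence[float]],
    query: Sequence[float],
    radius: float,
    min_age: int,
) -> List[int]:
    """Answer a query that has one similarity and one traditional predicate."""
    # Step 1: similarity predicate, answered outside the database.
    similar = set(external_range_query(index, query, radius))
    # Step 2: traditional predicate, answered by the RDBMS.
    rows = conn.execute(
        "SELECT exam_id FROM exams WHERE patient_age >= ? ORDER BY exam_id",
        (min_age,),
    ).fetchall()
    # Step 3: the extra merge step, keeping only ids that satisfy both.
    return [exam_id for (exam_id,) in rows if exam_id in similar]


def build_demo() -> Tuple[sqlite3.Connection, Dict[int, Sequence[float]]]:
    """Create a toy relational table and a toy external feature store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE exams (exam_id INTEGER PRIMARY KEY, patient_age INTEGER)")
    conn.executemany(
        "INSERT INTO exams VALUES (?, ?)", [(1, 34), (2, 61), (3, 70), (4, 58)]
    )
    index = {1: [0.1, 0.2], 2: [0.2, 0.1], 3: [0.9, 0.9], 4: [0.15, 0.15]}
    return conn, index


if __name__ == "__main__":
    conn, index = build_demo()
    print(merged_query(conn, index, query=[0.1, 0.1], radius=0.2, min_age=50))
```

In this sketch the two predicates are evaluated by separate components, so neither can use the other to narrow its search, and the application has to intersect the two result sets itself.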
This paper presents RAFIKI (Retrieval-based Application
for Imaging and Knowledge Investigation), an RDBMS-based
similarity retrieval system that provides both physicians and
researchers with a wider range of similarity operations over
medical data stored in RDBMS. Our system, built over the
Kiara framework [5], extracts low-level features from raw
images, employs distance functions over them, and
uses the MAM Slim-tree [4] in order to improve query
2018 IEEE 31st International Symposium on Computer-Based Medical Systems
2372-9198/18/$31.00 ©2018 IEEE
DOI 10.1109/CBMS.2018.00020
Copyright IEEE