RAFIKI: Retrieval-Based Application for Imaging and Knowledge Investigation

Marcos R. Nesso-Jr., Mirela T. Cazzolato, Lucas C. Scabora, Paulo H. Oliveira, Gabriel Spadon, Jessica A. de Souza, Willian D. Oliveira, Daniel Y. T. Chino, Jose F. Rodrigues-Jr., Agma J. M. Traina, Caetano Traina-Jr.
Institute of Mathematics and Computer Sciences - University of São Paulo
São Carlos, São Paulo - 13566-590 Brazil
Email: {marcosnesso, mirelac, lucascsb, pholiveira, spadon, jessicasouza}@usp.br, {willian, chinodyt, junio, agma, caetano}@icmc.usp.br

Abstract—Medical exams, such as CT scans and mammograms, are obtained and stored every day in hospitals all over the world, including images, patient data, and medical reports. It is paramount to have tools and systems to improve computer-aided diagnoses based on such huge volumes of stored information. Content-Based Image Retrieval (CBIR) is a powerful paradigm to help reach this goal, providing physicians with intelligent retrieval tools that present them with similar or complementary cases, in which visual characteristics enrich textual data. By comparatively inspecting previous cases, the physician can obtain a more comprehensive understanding of the case at hand. Current hospital systems do not yet carry native CBIR functionalities, relying instead on add-on subsystems, which often do not adhere to the existing relational database infrastructures. In this work, we propose RAFIKI, a software prototype that extends the Relational Database Management System (RDBMS) PostgreSQL, providing native support for CBIR functionalities, modular extensibility, and seamless integration with data science tools, such as Python and R. We show the applicability of our system by evaluating three clinical scenarios, performing queries over a real-world image dataset of lung exams. Our results indicate real potential for promoting informed decision-making from the physician's perspective.
In addition, the system exhibited higher performance than previous systems found in the literature. Moreover, RAFIKI contributes a model that establishes how to put together CBIR concepts and relational data, providing a powerful design for further development of theoretical and practical concepts and tools.

Index Terms—Index; Metric Access Method; CBIR; RDBMS.

I. INTRODUCTION

Over the years, the amount of image data collected at health care institutions has continuously increased. Such data are often represented in custom, specific formats, requiring proper tools and management techniques. Among these tools, the Picture Archiving and Communication System (PACS) operates as an interface between the medical equipment and the workstations where specialists analyze the patients' data, since it organizes and communicates the data. The problem with PACS is that, in the current state of the art, they do not perform content-based queries natively; instead, tools working with PACS require external add-ons to enable searching images and exams based on their description.

Content-based querying has huge potential to aid physicians in diagnoses and decision-making processes [1]. However, it must be provided by another kind of system, one that encompasses Content-Based Image Retrieval (CBIR). CBIR systems can retrieve images according to their degree of similarity, using distance functions to compare pairs of stored images expressed as feature vectors, which consist of a low-level representation of the images' visual content [2]. CBIR systems can use Metric Access Methods (MAMs), which are specialized index structures that improve query performance [3] [4]. It would be valuable to perform similarity queries within legacy software infrastructures. Our approach is to enhance existing off-the-shelf Relational Database Management Systems (RDBMS) to provide CBIR functionalities [5].
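To make the notion of similarity retrieval over feature vectors concrete, the following is a minimal sketch in Python, not the implementation used in RAFIKI. It assumes the Euclidean distance as the distance function and answers two common similarity queries, k-nearest-neighbor and range, with a plain sequential scan. The function names (`euclidean`, `knn_query`, `range_query`), the image identifiers, and the three-dimensional feature vectors are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same dimensionality")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_query(
    features: Dict[str, Sequence[float]], query: Sequence[float], k: int
) -> List[Tuple[str, float]]:
    """Return the k stored images closest to the query, nearest first."""
    ranked = sorted(
        (euclidean(vec, query), image_id) for image_id, vec in features.items()
    )
    return [(image_id, dist) for dist, image_id in ranked[:k]]


def range_query(
    features: Dict[str, Sequence[float]], query: Sequence[float], radius: float
) -> List[Tuple[str, float]]:
    """Return every stored image within `radius` of the query, nearest first."""
    hits = sorted(
        (dist, image_id)
        for image_id, vec in features.items()
        if (dist := euclidean(vec, query)) <= radius
    )
    return [(image_id, dist) for dist, image_id in hits]


# Hypothetical toy data: each exam is reduced to a 3-dimensional feature vector.
features = {
    "exam_a": [0.0, 0.0, 0.0],
    "exam_b": [3.0, 4.0, 0.0],
    "exam_c": [1.0, 0.0, 0.0],
    "exam_d": [6.0, 8.0, 0.0],
}
query = [0.0, 0.0, 0.0]

nearest = knn_query(features, query, k=2)
within = range_query(features, query, radius=5.0)
```

The sequential scan compares the query against every stored vector. A MAM is designed to avoid part of these comparisons by exploiting the triangle inequality of the metric to prune regions of the data space that cannot contain an answer.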
The existing approaches being studied to support similarity operations in RDBMS are limited. Some tools are rigid in terms of indexing, employing MAMs in a layer external to the RDBMS. To execute queries containing both similarity and traditional predicates, these tools require an additional step to merge the external similarity results with those retrieved directly from the RDBMS through traditional conditions [6] [7]. Other tools are rigid in terms of the queries they execute. That is, they include similarity support in the RDBMS, but that support focuses only on similarity queries and cannot handle queries with both similarity and traditional predicates [8]. Some tools are based on commercial models rather than open-source alternatives. These approaches depend on the original features provided by the vendor, preventing users from extending their functionality [9]. Furthermore, the existing tools rarely explore the integration of RDBMS with data analysis tools [10]. Such an integration would help physicians and researchers obtain more insights from the medical data, which in turn would better support diagnoses and other decision-making processes.

2018 IEEE 31st International Symposium on Computer-Based Medical Systems. 2372-9198/18/$31.00 ©2018 IEEE. DOI 10.1109/CBMS.2018.00020

This paper presents RAFIKI (Retrieval-based Application for Imaging and Knowledge Investigation), an RDBMS-based similarity retrieval system that provides both physicians and researchers with a wider range of similarity operations over medical data stored in RDBMS. Our system, built over the Kiara framework [5], extracts low-level features from raw images, applies distance functions over them, and uses the Slim-tree MAM [4] to improve query