Modern Federated Database Systems: An Overview

Leonardo Guerreiro Azevedo, Elton Figueiredo de Souza Soares, Renan Souza and Marcio Ferreira Moreno
IBM Research, Brazil

Keywords: Federated Database, Polyglot Database, Multistore, Polystore, Multidatabase, Heterogeneous Data Stores, NoSQL, DBaaS, Distributed File System, Data Processing Frameworks.

Abstract: Modern applications typically manipulate datasets with diverse models, usages, and storage systems. “One size fits all” approaches are not sufficient for heterogeneous data, storage systems, and schemas. The rise of new kinds of data stores and processing, like NoSQL data stores, distributed file systems, and new data processing frameworks, brought new possibilities to meet this scenario’s requirements. However, semantic, schema, and storage heterogeneity, autonomy, and distributed processing are still among the main concerns when building data-driven applications. This work surveys the literature to give an overview of the state of the art in modern federated database systems. It presents the background, characterizes existing tools, outlines guidelines to follow when building solutions, and points out research challenges to consider in future work. It provides fundamentals for researchers and practitioners in the area.

1 INTRODUCTION

Many modern applications, e.g., in medical informatics and intelligent transportation, manipulate diverse datasets with different models and usages. “One size fits all” is not effective in such scenarios. Forcing data that follow different data models into a single database with a single data model may degrade performance, and running ETL (Extract-Transform-Load) processes to load all data into a single database may be very expensive (Stonebraker et al., 2007). Besides, manual data curation and maintenance of the ETL pipelines (due to adaptations caused by, e.g., domain evolution) are labor-intensive (Tan et al., 2017; Bondiombouy and Valduriez, 2016; Stonebraker, 2015).

The problem of accessing heterogeneous data sources has been studied in the context of multidatabase and data integration systems (Kolev et al., 2016a). Several new data management solutions have emerged, such as distributed file systems (e.g., GFS, the Google File System, and HDFS, the Hadoop Distributed File System), NoSQL data stores (e.g., MongoDB, AllegroGraph, Neo4j, Titan, Dynamo, BigTable, Redis), and new data processing frameworks (e.g., Spark), as well as hybrid systems (multi-model, e.g., OrientDB and ArangoDB; or NewSQL, e.g., Google F1 and LeanXcale). The RDBMS (Relational Database Management System) has also evolved to manage different kinds of data (e.g., multimedia objects, XML documents, spatial data). One example is IBM DB2 (footnote 3), which was built on a standard SQL engine but has evolved into a hybrid data management system for structured and unstructured data. Nevertheless, using a single DBMS usually results in a loss of performance and flexibility for specific applications. For instance, a column-oriented DBMS is one order of magnitude better for Online Analytical Processing (OLAP) workloads than an RDBMS (Özsu and Valduriez, 2020), while an SDBMS (Stream Database Management System) is more efficient for stream data, which an RDBMS does not even support (Nayak et al., 2013). Thus, a variety of data-processing architectures may be required for specialized markets (Stonebraker et al., 2007).

Schema, semantic, and data-source heterogeneity, as well as autonomy and distributed processing, are still concerns (Tan et al., 2017). A federated system arises as a solution. It is middleware that provides a seamless interface to heterogeneous data systems, with an independent data model and (perhaps) data schemas (Stonebraker, 2015). This work gives an overview of the state of the art of new-generation federated systems. It is organized as follows. Section 2 presents the main concepts. Section 3 characterizes existing tools.
3 https://www.ibm.com/analytics/db2

Azevedo, L., Soares, E., Souza, R. and Moreno, M.
Modern Federated Database Systems: An Overview.
DOI: 10.5220/0009795402760283
In Proceedings of the 22nd International Conference on Enterprise Information Systems (ICEIS 2020) - Volume 1, pages 276-283
ISBN: 978-989-758-423-7
Copyright © 2020 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved