Towards A Model-Driven Design Tool for Big Data Architectures

Michele Guerriero, Saeed Tajfar, Damian A. Tamburri, Elisabetta Di Nitto
Politecnico di Milano
Dipartimento di Elettronica, Informazione e Bioingegneria
via Golgi 42, Milano, 20133 - Italy
[michele.guerriero,damianandrew.tamburri,elisabetta.dinitto]@polimi.it

ABSTRACT
Big Data technologies are rapidly becoming a key enabler for modern industries. However, the entry costs inherent to “going Big” are considerable, ranging from the learning curve to renting or buying infrastructure. A key component of these costs is the time spent learning about and designing with the many Big Data frameworks (e.g., Spark, Storm, HadoopMR) on the market. To reduce these costs while decreasing time-to-market, we advocate the use of Model-Driven Engineering (MDE), i.e., software engineering by means of models and their automated manipulation. This paper outlines a tool architecture to support MDE for Big Data applications and illustrates it with a case study.

CCS Concepts
• Software Engineering → Big Data; Model-Driven Development;

Keywords
Big Data Applications Design; MDE; meta-models; model transformation; architecture framework; design tool

1. INTRODUCTION
Big Data technologies have rapidly achieved widespread adoption for many reasons, e.g., the versatility with which they foster innovative products through direct analysis of varied user content (e.g., tweets, blog posts, likes, pictures). However, designing and developing Big Data applications is still a considerable problem since: (a) it involves many side costs, such as the learning curve for the chosen technological frameworks; (b) it requires balancing infrastructural and corporate governance costs [17] against (non-trivial) development and deployment costs; (c) it most likely incurs additional costs for the trial-and-error experiments needed to reach the desired performance.
We argue that a relevant part of these costs can be saved by tackling the design, development and deployment of Big Data applications with Model-Driven Engineering (MDE) [15]. MDE essentially promotes the use of models as a means to develop code quickly and flexibly. MDE takes place mainly by means of meta-modelling (i.e., devising a “model of a model”) and model transformation (i.e., manipulating models in an automated way). By means of MDE, a considerable part of the effort required to design, develop and deploy Big Data applications would be reduced to modelling Big Data jobs using ad-hoc meta-models (e.g., for the technological frameworks to be considered) and manipulating them with model transformations. Subsequently, by refining these standard models, designers can produce a complete deployable application image based on the desired technological specifications (e.g., Hadoop/MapReduce, Storm, Spark) by means of model-to-text transformation (e.g., with technologies such as XText¹ or JET²).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
BIGDSE’16, May 16 2016, Austin, TX, USA
© 2016 ACM. ISBN 978-1-4503-4152-3/16/05...$15.00
DOI: http://dx.doi.org/10.1145/2896825.2896835

This paper takes an essential step towards providing model-driven design facilities for Big Data by offering three novel contributions: (a) an architecture in support of model-driven Big Data design (see Sec.
2); (b) the introduction and discussion of a series of meta-models for supporting said design activity (see Sec. 3); (c) an evaluation of the above using an illustrative case study (see Sec. 4).

2. MODEL-DRIVEN BIG DATA DESIGN: AN ARCHITECTURE
This section elaborates on the architectural details behind our proposed solution. The preliminary research idea and its conceptual foundations stem from the state of the art [5]. In particular, in [5] Casale et al. articulate a preliminary exploration of a model-driven architecture solution, formulated as part of the DICE EU project³, for elaborating and further refining a data-intensive application by means of three abstraction layers, consistent with the Model-Driven Architecture framework [7]. Quoting from [5]: “Models in DICE shall be formulated at three levels of abstraction, called DPIM (DICE Platform Independent Model), DTSM (DICE Technology Specific Model), DDSM (DICE Deployment Specific Model): (a) the DPIM model corresponds to the OMG MDA PIM layer and describes the behaviour of the application as a graph that expresses the dependencies between computations and data; (b) a DTSM, consists of a refinement of the DPIM and includes technology specific concepts and frame-

¹ http://www.eclipse.org/Xtext/
² https://eclipse.org/modeling/m2t/?project=jet
³ http://www.dice-h2020.eu/