Architecture Design of Pattern Detection System for Smart Cities Datasets

Valentina-Camelia Bojan, Ionut-Gabriel Raducu, Florin Pop, Mariana Mocanu, Valentin Cristea
Computer Science Department, Faculty of Automatic Control and Computers, University Politehnica of Bucharest
Bucharest, Romania
Emails: bojan.valentinacamelia@gmail.com, gabi.raducu25@gmail.com, florin.pop@cs.pub.ro, mariana.mocanu@cs.pub.ro, valentin.cristea@cs.pub.ro

Abstract—Nowadays, there is growing interest in the research and development of systems, applications, tools and frameworks for ‘smart’ environments. We no longer want to bother with useless actions or decisions, because we want to spend our time on more valuable activities. This is only one reason to lay the foundations for, and then build, a platform able to extract patterns and useful information from the data measured by the devices that monitor a ‘smart’ environment. Such a platform would become the main reason for the environment to be a ‘smart’ one and for more people to understand the value of such an environment. In this paper we investigate the need for a global and generic platform able to work with many types of datasets and with various systems, and to serve different ‘Smart cities’ applications. Through this platform we aim to unify the need of all ‘Smart cities’ systems to obtain and use mined data, that is, patterns extracted from the generated (measured) raw data.

Index Terms—data analysis; distributed processing; machine learning; pattern recognition; time series; smart cities.

I. INTRODUCTION

We live in a society where the concept ‘information means power’ is becoming more and more popular for any business that relies on its customers. Similarly, for us as people, another concept starts to apply very well: ‘information means comfort’, because the more we know in advance, the less time we have to spend searching for specific pieces of information.
The question that arises is: ‘How can we have any valuable information just a click away?’ This desire marks the beginnings of ‘Smart cities’ technologies, systems and ideas, and the tool for achieving all of them is big data. To obtain valuable data, we have to store every piece of information that we work with and that we give as output, opinion or decision. We have to accept being a small part of the big system that is a city, or even the entire world, so that we can complete the smart ensemble. Big data is a term that describes datasets whose size makes them impossible to process, analyze, query and manage using traditional tools, frameworks and techniques. Doug Laney used a ‘3V’ model to describe big data. The three V’s are volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources) [1]. Using big data often means being ‘smart’, because applications that use big data rely on machine learning systems that provide the necessary tools to dig into the data, filter out the noise and extract the information of interest. As Mayer-Schönberger says in his book, big data does not ask why; it simply detects patterns [2].

The purpose of this research is the analysis of data for pattern recognition. Because this is such a broad domain, we target only ‘Smart cities’ systems, which often work with time series data. The Internet of Things (IoT) produces an enormous amount of data every day through the huge number of sensors that generate thousands of measurements each second [3], [4]. Almost every object in our everyday life has the ability to emit data [5]. Smart meters in plants, smart shirts for athletes, smart watches that monitor us, smartphones and medical sensors are only a few examples. Regardless of the system or the problem that we want to solve, we need to analyze huge amounts of structured data.
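To make the idea of filtering out noise and extracting information of interest from sensor time series more concrete, here is a minimal sketch of one simple technique: flagging a reading as unusual when its z-score, computed against a trailing window of earlier readings, exceeds a threshold. This is not the platform or algorithm proposed in this paper; the helper name `detect_anomalies`, the window size, the threshold of 3 and the synthetic temperature data are all illustrative assumptions.

```python
from statistics import mean, stdev
from typing import List, Sequence


def detect_anomalies(readings: Sequence[float], window: int = 24,
                     threshold: float = 3.0) -> List[int]:
    """Return the indices of readings that deviate strongly from recent history.

    Hypothetical helper: each reading is compared with the mean and sample
    standard deviation of the `window` readings immediately before it.
    """
    if window < 2:
        raise ValueError("window must be at least 2 to compute a standard deviation")
    anomalies: List[int] = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]  # trailing window, excludes reading i
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is treated as unusual.
            if readings[i] != mu:
                anomalies.append(i)
            continue
        if abs(readings[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Synthetic example: a repeating temperature pattern with one injected spike
# (for instance, a faulty sensor reading) at index 40.
temperatures = [20.0 + 0.5 * (i % 4) for i in range(48)]
temperatures[40] = 35.0
print(detect_anomalies(temperatures, window=24))  # prints [40]
```

Using only past readings keeps the check applicable to streaming sensor data; a real ‘Smart cities’ deployment would likely need models that also account for daily or seasonal patterns.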
As far as storage is concerned, big data is supported by the companies that develop cloud storage solutions. Meanwhile, the domain of analysis is open to research and improvement, and this is where we focus. There is no unique data: everything has a repetitive part, and we intend to find it in order to supply valuable data at valuable moments. None of this is possible unless it is supported by a strong, scalable, efficient and effective platform whose architecture relies on components suited to ‘Smart cities’ purposes. In this paper we propose such an architecture and show how it can be used to solve some real problems.

‘Smart cities’ do not mean only storing the data generated by our devices, visualizing it or making the devices react to certain signals. To build a smart city, we need a system able to think for us, one that knows how to analyze our data and provide the information that brings us value. For example, what value does a system bring if it only stores temperature values and shows their evolution? The answer is: no value. Everything changes for a system that is able to predict the weather. However, this requires algorithms for pattern discovery and for integration with other parameters of interest. Therefore, the main motivation of this research is to show the strength of a ‘Smart cities’ system that relies on machine learning techniques integrated into a cloud-based infrastructure. Through this platform we want to explore big data in order to detect unusual values reported by the sensors, which can be useful in finding the causes of a failure and preventing it in the future [3]. The detection of patterns in the measured values can also be used