Big Data Platform Development with a Domain Specific Language for Telecom Industries

Cüneyt Şenbalcı, Serkan Altuntaş, Zeki Bozkus, Taner Arsan*
Kadir Has University, Computer Engineering Department, Cibali Kampusu, 34083 Istanbul, Turkey
* arsan@khas.edu.tr

Abstract – This paper introduces a special big data analysis platform with a Domain Specific Language (DSL) for the telecom industry. The platform proposes a new kind of domain-specific system for processing and visualizing large data files for telecom organizations, and it has three main parts: the Domain Specific Language (DSL), a Parallel Processing/Analyzing Platform for Big Data, and an Integrated Result Viewer. In addition to these main parts, a Distributed File Descriptor (DFD) is designed to pass information between these modules and to organize their communication. To establish the benefits of this domain-specific solution, the standard framework of the big data concept is examined carefully. Big data systems rely on special infrastructure and tools for storing, processing, and analyzing data. This infrastructure can be grouped into four parts: infrastructure, programming models, high-performance schema-free databases, and processing/analyzing tools. Although the Big Data concept offers many advantages, such systems are still very difficult for many enterprises to manage. Therefore, this study proposes a new higher-level language, called the DSL, which helps enterprises process big data without writing complex low-level parallel processing code, together with a new kind of result viewer; the paper also presents a Big Data solution system called Petaminer.

Keywords – Big Data Analysis, Domain Specific Language, Parallel Processing and Analyzing

I. INTRODUCTION

Using this platform, data processing/analyzing and the visualization of results can be performed easily without writing complex queries or C or Java code.
This also prevents spending too much time on basic operations. Global data usage has been increasing exponentially over the last ten years [1]. Looking at the types of these data, a huge amount is generated in our daily life: customer feedback, call detail records, billing, social media, emails, web server logs, and databases are among the most important sources of information for enterprises. Collecting and correlating different kinds of data is not an easy operation, and traditional storage and processing systems cannot handle these large datasets [2]. All such large datasets are called Big Data.

Infrastructure forms the base of the Big Data concept. Distributed parallel processing and storage tools are placed on this structure, which is generally known as distributed servers or the cloud. These are easy-to-manage virtual systems served as IaaS by big companies such as Google, Amazon, and Microsoft. Traditional programming models were not designed for large datasets. Therefore, Google developed a new kind of programming framework called MapReduce [3]. This programming model divides a problem into multiple tasks and solves them in parallel. The most important Big Data analyzing and processing tools, such as Hadoop [4], Hive [5], Pig [6], Cascading, Cascalog, Sawzall, and Dremel, use this algorithm to solve complex tasks efficiently.

When it comes to large datasets, relational database management systems are not enough to achieve this target. Therefore, high-performance schema-free databases, generally known as NoSQL databases, are designed to perform data operations efficiently and rapidly. BigTable, HBase, Cassandra, MongoDB, and CouchDB are some of the most popular NoSQL database systems. When companies deal with big data, it is very hard to extract the valuable information in it, so it is very important to process and analyze big data on distributed parallel systems.
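The MapReduce model mentioned above can be sketched with a minimal, single-machine word-count example. This is only an illustration of the programming model; real frameworks such as Hadoop distribute the map and reduce phases across many nodes and handle fault tolerance:

```python
from collections import defaultdict
from itertools import chain

# Map phase: each input record (here, a line of text) is turned into (key, value) pairs.
def map_phase(line):
    return [(word, 1) for word in line.split()]

# Shuffle phase: group all emitted values by key.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: combine the values for each key into a single result.
def reduce_phase(key, values):
    return key, sum(values)

lines = ["call detail record", "call record"]
pairs = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'call': 2, 'detail': 1, 'record': 2}
```

Because the map calls are independent of each other, and the reduce calls are independent per key, both phases can run in parallel across machines, which is what makes the model scale to large datasets.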
Special tools such as Hive, Pig, Mahout [7], and R exist to overcome this issue. Although the Big Data concept offers many advantages, such systems are still very difficult for many enterprises to manage. Therefore, this study proposes a new higher-level language, called the DSL, which helps enterprises process big data without writing complex low-level parallel processing code, together with a new kind of result viewer; the paper also presents a Big Data solution system called Petaminer.

Traditional databases, management and analysis tools, algorithms, and processes cannot easily be applied to these large datasets [8]. The Big Data concept has three main characteristics: Volume, Variety, and Velocity. The volume of data stored for analysis is increasing at an extreme rate; today, there are almost 2.7 zettabytes of data in the digital world [9]. The Big Data concept also has to deal with variety: data is not only structured but also semi-structured and unstructured, which is not suitable for storage in an RDBMS.
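The variety characteristic can be illustrated concretely. Semi-structured records from different telecom sources carry different fields, so no single fixed relational schema fits all of them, whereas a schema-free (document-oriented) store accepts each record as-is. The sketch below uses plain Python dictionaries as stand-ins for documents in a store such as MongoDB; the record fields are invented for illustration only:

```python
# Hypothetical semi-structured records from different telecom sources:
# each one has a different shape, so no single fixed table schema fits all three.
call_record = {"caller": "+90-212-0000", "callee": "+90-216-0000", "duration_s": 143}
feedback    = {"customer_id": 42, "text": "Coverage dropped in Cibali", "rating": 2}
server_log  = {"ts": "2013-01-01T10:00:00", "path": "/billing", "status": 500}

# A relational schema would force all three into one set of columns (mostly NULLs);
# a schema-free collection simply keeps each record with its own fields.
collection = [call_record, feedback, server_log]

# Queries then filter on whichever fields a record happens to have.
errors = [r for r in collection if r.get("status", 200) >= 500]
print(len(errors))  # 1
```

This is why the paper groups high-performance schema-free databases as a separate pillar of the Big Data infrastructure: they trade rigid schemas for the flexibility that heterogeneous telecom data requires.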