J. ICT Res. Appl., Vol. 10, No. 2, 2016, 95-109

Received October 1st, 2015, 1st Revision November 2nd, 2015, 2nd Revision February 2nd, 2016, Accepted for publication March 14th, 2016.
Copyright © 2016 Published by ITB Journal Publisher, ISSN: 2337-5787, DOI: 10.5614/itbj.ict.res.appl.2016.10.2.1

High Performance CDR Processing with MapReduce

Mulya Agung* & A. Imam Kistijantoro
School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Jalan Ganesha No. 10, Bandung 40132, Indonesia
*E-mail: agung@tritronik.com

Abstract. A call detail record (CDR) is a data record produced by telecommunication equipment that contains the transaction log of a call. CDRs carry valuable information for several domains, such as billing, fraud detection and analytics. In practice, however, these uses face a big data challenge: billions of CDRs are generated every day and the processing systems are expected to deliver results in a timely manner. The capacity of our current production system is not enough to meet these needs. Therefore, a better performing system based on MapReduce and running on a Hadoop cluster was designed and implemented. This paper presents an analysis of the previous system and the design and implementation of the new system, called MS2. It also provides empirical evidence to demonstrate the efficiency and linearity of MS2. Tests have shown that MS2 reduces overhead by 44% and is nearly twice as fast as the previous system. In benchmarks against several related technologies for large-scale data processing, MS2 also performed better in the case of CDR batch processing. Running on a cluster consisting of eight CPU cores and two conventional disks, MS2 is able to process 67,000 CDRs/second.

Keywords: call detail records; Hadoop; high performance; Java EE; MapReduce; telecommunication mediation.
1 Introduction

CDRs generated by low-level telecommunication equipment must be prepared before they can be processed by high-level applications. In telecommunications, the system that handles this preprocessing stage is called the mediation system. Because the generated CDRs are large in volume and the results must be available quickly for various purposes, the preprocessing stage in the mediation system is a big data challenge. As discussed in [1], this kind of application requires techniques beyond conventional computation to achieve acceptable performance. For many years, a scalable mediation system called MS1 has been running in production to perform CDR processing for one of the biggest telecommunication providers in Indonesia. However, due to the fast growth of subscribers, using it at the current scale becomes increasingly