J. ICT Res. Appl., Vol. 10, No. 2, 2016, 95-109
Received October 1st, 2015, 1st Revision November 2nd, 2015, 2nd Revision February 2nd, 2016, Accepted for publication March 14th, 2016.
Copyright © 2016 Published by ITB Journal Publisher, ISSN: 2337-5787, DOI: 10.5614/itbj.ict.res.appl.2016.10.2.1
High Performance CDR Processing with MapReduce
Mulya Agung* & A. Imam Kistijantoro
School of Electrical Engineering and Informatics,
Institut Teknologi Bandung, Jalan Ganesha No. 10, Bandung 40132, Indonesia
*E-mail: agung@tritronik.com
Abstract. A call detail record (CDR) is a data record produced by
telecommunication equipment that logs the details of a call transaction. It
contains information that is valuable in several domains, such as billing,
fraud detection and analytics. In practice, however, meeting these needs is a
big data challenge: billions of CDRs are generated every day and the
processing systems are expected to deliver results in a timely manner. The
capacity of our current production system is not sufficient to meet these needs.
Therefore, a better-performing system based on MapReduce and running on a
Hadoop cluster was designed and implemented. This paper presents an analysis
of the previous system and the design and implementation of the new system,
called MS2. Empirical evidence is also provided to demonstrate the
efficiency and linearity of MS2. Tests have shown that MS2 reduces overhead by
44% and nearly doubles performance compared to the previous system.
Benchmarks against several related technologies for large-scale data
processing also showed that MS2 performs better for CDR batch
processing. When run on a cluster consisting of eight CPU cores and two
conventional disks, MS2 is able to process 67,000 CDRs/second.
Keywords: call detail records; Hadoop; high performance; Java EE; MapReduce;
telecommunication mediation.
1 Introduction
CDRs generated by low-level telecommunication equipment must first be
prepared before they can be processed by high-level applications. In
telecommunications, the system that handles this preprocessing stage is called
the mediation system. Due to the large volume of generated CDRs and the need
to deliver results quickly for various purposes, the preprocessing stage in
the mediation system is a big data challenge. As discussed in [1], achieving
acceptable performance in this kind of application requires techniques other
than conventional computation.
For many years, a scalable mediation system called MS1 has been running in
production to perform CDR processing for one of the
biggest telecommunication providers in Indonesia. However, due to the fast
growth of subscribers, using it at the current scale becomes increasingly