An effective hybrid learning system for telecommunication churn prediction
Ying Huang, Tahar Kechadi
School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland

Keywords: Classification; Telecommunication; Churn prediction; Hybrid model; Clustering; Rule induction

Abstract
Customer churn has emerged as a critical issue for Customer Relationship Management and customer retention in the telecommunications industry. Churn prediction is therefore necessary and valuable for retaining customers and reducing losses. Moreover, high predictive accuracy and good interpretability of the results are two key measures of a classification model. Many studies have shown that classification methods based on a single model may not be good enough to achieve satisfactory results. To obtain more accurate predictions, we present a novel hybrid model-based learning system that integrates supervised and unsupervised techniques for predicting customer behaviour. The system combines a modified k-means clustering algorithm and a classic rule induction technique (FOIL). Three sets of experiments were carried out on telecom datasets. The first set verifies that weighted k-means clustering leads to better data partitioning; the second evaluates the classification results and compares them with other well-known modelling techniques; the third compares the proposed hybrid-model system with several other recently proposed hybrid classification approaches. We also performed a comparative study on a set of benchmarks obtained from the UCI repository. All the results show that the hybrid model-based learning system is very promising and outperforms the existing models.
© 2013 Elsevier Ltd. All rights reserved.

1. Introduction
With the recent evolution of the Information and Communication Technology (ICT) sector, numerous new and attractive services have been introduced, putting huge pressure on traditional services. Customer churn has emerged as one of the major issues in Customer Relationship Management (CRM) in telecommunication services around the world, for both wireless providers and long-distance carriers. For instance, in the U.S., telecom providers of long-distance and international services have been experiencing churn rates of 45% to 70% for some years (Mattison, 2001). In such a fiercely competitive environment, it is very important for telecom operators to retain their existing customers, because acquiring new customers is much more expensive. Consequently, predicting which customers are likely to stop their subscription and switch to competitors (churn) is critical. Predicting potential churners and successfully retaining them, especially the valuable ones, can substantially increase the profitability of a company.

In the telecommunications industry, operators usually capture transactional data, which reflect service usage, together with some static data such as subscribers' personal information and contract details. Data mining (DM) methods have emerged as a good alternative for studying customer behaviour. Various DM techniques, such as decision trees, logistic regression, support vector machines, artificial neural networks and inductive rule learning, have been applied to predict customer behaviour (Huang, Huang, & Kechadi, 2011; Hwang, Jung, & Suh, 2004; Larivière & Poel, 2005; Wei & Chiu, 2002; Xia & Jin, 2008). Most of the existing predictive modelling techniques applied to customer churn are based on supervised learning; very few are based on unsupervised learning. In addition, most classifiers use a single model (i.e., only one data mining technique).
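To make the contrast with single-model classifiers concrete, the hybrid idea summarised in the abstract (a modified, weighted k-means partitioning of the data combined with a rule learner) can be illustrated by a minimal cluster-then-classify sketch. It uses only the Python standard library and rests on assumptions that are not taken from this paper: the per-feature weights are simply supplied by the caller (the paper's own weighting scheme is not given in this section), centroids are initialised by farthest-first traversal, and a majority-class rule in each cluster stands in for FOIL. The names `weighted_kmeans`, `farthest_first` and `ClusterThenClassify` are hypothetical.

```python
from collections import Counter


def weighted_dist2(x, y, w):
    """Squared Euclidean distance with per-feature weights w (hypothetical weighting)."""
    return sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w))


def farthest_first(points, k, w):
    """Deterministic initialisation (our choice): start at points[0], then repeatedly
    add the point farthest from the centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(weighted_dist2(p, c, w) for c in centroids))
        centroids.append(list(far))
    return centroids


def weighted_kmeans(points, k, w, iters=100):
    """Lloyd's k-means with a feature-weighted distance; returns (centroids, labels)."""
    centroids = farthest_first(points, k, w)
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: weighted_dist2(p, centroids[c], w))
                      for p in points]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


class ClusterThenClassify:
    """Hypothetical hybrid: partition the training set, then fit one model per cluster.
    A majority-class rule stands in for the per-cluster FOIL rule learner."""

    def __init__(self, k, weights):
        self.k, self.w = k, weights

    def fit(self, X, y):
        self.centroids, labels = weighted_kmeans(X, self.k, self.w)
        fallback = Counter(y).most_common(1)[0][0]  # used for empty clusters
        self.cluster_class = []
        for c in range(self.k):
            ys = [yi for yi, lab in zip(y, labels) if lab == c]
            self.cluster_class.append(Counter(ys).most_common(1)[0][0] if ys else fallback)
        return self

    def predict(self, x):
        # Route the instance to its nearest centroid and use only that cluster's model.
        c = min(range(self.k), key=lambda j: weighted_dist2(x, self.centroids[j], self.w))
        return self.cluster_class[c]


if __name__ == "__main__":
    # Toy data (invented for illustration): two well-separated customer groups.
    X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
    y = ["stay", "stay", "stay", "churn", "churn", "churn"]
    model = ClusterThenClassify(k=2, weights=(1.0, 1.0)).fit(X, y)
    print(model.predict((0.0, 0.3)))  # -> stay
    print(model.predict((5.1, 5.0)))  # -> churn
```

Because each new instance is routed to its nearest centroid, it is classified only by the model built from the training instances most similar to it rather than from the whole training set. Replacing the majority rule with a genuine rule learner such as FOIL, and the caller-supplied weights with a learned weighting, would be the natural next steps towards the kind of system the abstract describes.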
Many single model-based classifiers can predict potential churners to a large extent. However, for some of these techniques the accuracy is not good enough, and for others there is still room for improvement; a hybrid model is a good alternative for achieving better classification performance. Moreover, the entire set of training instances is usually used to build prediction models. However, it may be more effective to predict a new data instance from a subset of training instances that are more similar to it than the remaining training instances are.

0957-4174/$ - see front matter © 2013 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.eswa.2013.04.020
Corresponding author. Tel.: +353 873145218. E-mail address: ying.huang.1@ucdconnect.ie (Y. Huang).
Expert Systems with Applications 40 (2013) 5635–5647
Contents lists available at SciVerse ScienceDirect
Expert Systems with Applications
journal homepage: www.elsevier.com/locate/eswa

The advantages of the proposed model over the other modelling techniques commonly used for churn prediction concern the following aspects. Firstly, the prediction model