Ensemble methods with non-negative matrix factorization for non-payment prevention system

RYSZARD SZUPILUK 1,2, PIOTR WOJEWNIK 1,2, TOMASZ ZĄBKOWSKI 1,3
1 Polska Telefonia Cyfrowa Ltd., Al. Jerozolimskie 181, 02-222 Warsaw
2 Warsaw School of Economics, Al. Niepodleglosci 162, 02-554 Warsaw
3 Warsaw Agricultural University, Nowoursynowska 159, 02-787 Warsaw
POLAND

Abstract: A well-designed and reliable system for preventing non-payment events is very important for a telecom company. Monitoring is especially needed when a client exceeds the level of his usual payments, which can lead to financial problems. In this paper, we propose a system that describes the client's behaviour and signals possible problems in advance. In this approach we apply novel ensemble methods to integrate information from many models predicting the customer's behaviour. The ensemble methods are based on non-negative matrix factorization, which allows the fundamental prediction components to be identified. A practical experiment with the prevention system confirmed the proposed procedure.

Key-Words: Ensemble methods, non-negative matrix factorization, customer data modeling, prediction.

1 Introduction

One of the most important tasks in a company where information systems are introduced is the online monitoring of individual customer behaviour. In this paper we focus on a prevention system for non-payment events. A key issue is to build an appropriate model describing the client's behaviour. Assuming that different methods model customer behaviour in slightly different ways, it seems natural to integrate and use the information generated by many models [7]. From an analytical point of view, the presented methodology can be treated as an ensemble method for prediction improvement. Ensemble methods usually combine a few models by mixing their results or parameters [1,8,16].
In this paper we propose an alternative concept based on the assumption that prediction results contain latent destructive and constructive components common to all the model results [14,15]. Eliminating the destructive ones should improve the final results. To find the latent components we apply a multidimensional decomposition with non-negative matrix factorization (NMF) [6,11]. The method is described in the framework of a flexible system for adapting and managing the dunning process, but it can be applied as an ensemble method to any regression problem.

2 Prediction improvement

We assume that we test many models, e.g. neural networks, for predicting customer behaviour. Next, we assume that each prediction result includes two types of components: constructive ones, associated with the target, and destructive ones, associated with inaccurate learning data, individual properties of the models, missing data, imprecise parameter estimation, distribution assumptions, etc. We collect the individual model results $\mathbf{x}_i \in \mathbb{R}^{N \times 1}$, $i = 1, \dots, m$, where $N$ is the number of observations, and treat them as a multivariate variable $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m]^T$, $\mathbf{X} \in \mathbb{R}^{m \times N}$. In a similar way we assume that the set of basis components is represented by $\mathbf{S} = [\mathbf{s}_1, \mathbf{s}_2, \dots, \mathbf{s}_n]^T$, $\mathbf{S} \in \mathbb{R}^{n \times N}$. The relation between the observed prediction results and the latent basis components is expressed by

$$\mathbf{X} = \mathbf{A}\mathbf{S}, \tag{1}$$

which means a factorisation of matrix $\mathbf{X}$ into the basis components matrix $\mathbf{S}$ and the mixing matrix $\mathbf{A}$. Our aim is to find a mixing matrix $\mathbf{A}$ and an unknown set of basis components such that matrix $\mathbf{S}$ (after reordering its rows) can be described as

$$\mathbf{S} = [\hat{\mathbf{s}}_1, \hat{\mathbf{s}}_2, \dots, \hat{\mathbf{s}}_p, \mathbf{s}_{p+1}, \mathbf{s}_{p+2}, \dots, \mathbf{s}_n]^T, \tag{2}$$

where $\hat{\mathbf{s}}_i \in \mathbb{R}^{N \times 1}$ is the $i$-th constructive component and $\mathbf{s}_i \in \mathbb{R}^{N \times 1}$ is the $i$-th destructive component.
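The factorisation in (1) can be illustrated with a minimal sketch. It assumes that the model outputs are non-negative and uses the standard multiplicative update rules for the squared Frobenius error. The function name `nmf`, the toy data, the mixing weights and the choice of n = 2 components are illustrative assumptions only, not the procedure used in the experiments of this paper.

```python
import numpy as np


def nmf(X, n_components, n_iter=500, eps=1e-9, seed=0):
    """Approximate a non-negative X (m x N) as A (m x n) @ S (n x N).

    Hypothetical helper: plain multiplicative updates, which keep
    A and S non-negative and do not increase ||X - A S||_F.
    """
    rng = np.random.default_rng(seed)
    m, N = X.shape
    A = rng.random((m, n_components)) + eps
    S = rng.random((n_components, N)) + eps
    for _ in range(n_iter):
        # Update the basis components S, then the mixing matrix A.
        S *= (A.T @ X) / (A.T @ A @ S + eps)
        A *= (X @ S.T) / (A @ S @ S.T + eps)
    return A, S


# Toy data (assumption): m = 3 model predictions over N = 200 observations,
# each a mix of one "target-like" signal and one common disturbance.
rng = np.random.default_rng(1)
N = 200
target = rng.random(N) + 0.5
disturbance = rng.random(N)
X = np.vstack([target + w * disturbance for w in (0.1, 0.3, 0.5)])

A, S = nmf(X, n_components=2)
rel_error = np.linalg.norm(X - A @ S) / np.linalg.norm(X)
print(A.shape, S.shape, rel_error)
```

Here each row of `X` plays the role of one model result, each row of `S` is one latent basis component, and `A` holds the mixing coefficients. The sketch only performs the decomposition; it does not decide which rows of `S` are constructive or destructive.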
After the basis components are classified into destructive and constructive ones, we can reject the destructive

Proceedings of the 11th WSEAS International Conference on SYSTEMS, Agios Nikolaos, Crete Island, Greece, July 23-25, 2007