Mixtures of Monotone Networks for Prediction

Marina Velikova, Hennie Daniels, and Ad Feelders

Abstract— In many data mining applications, it is a priori known that the target function should satisfy certain constraints imposed by, for example, economic theory or a human decision maker. In this paper we consider partially monotone prediction problems, where the target variable depends monotonically on some of the input variables but not on all. We propose a novel method to construct prediction models in which monotone dependences with respect to some of the input variables are preserved by construction. Our method belongs to the class of mixture models. The basic idea is to convolute monotone neural networks with weight (kernel) functions to make predictions. By using simulation and real case studies, we demonstrate the application of our method. To obtain a sound assessment of the performance of our approach, we use standard neural networks with weight decay and partially monotone linear models as benchmark methods for comparison. The results show that our approach outperforms partially monotone linear models in terms of accuracy. Furthermore, the incorporation of partial monotonicity constraints not only leads to models that are in accordance with the decision maker's expertise, but also considerably reduces the model variance in comparison to standard neural networks with weight decay.

Keywords— mixture models, monotone neural networks, partially monotone models, partially monotone problems.

I. INTRODUCTION

In many data mining applications, it is a priori known that the target function should satisfy certain constraints imposed by, for example, economic theory or a human decision maker. In many cases, however, the final model obtained by data mining techniques alone does not meet these constraints. The algorithms therefore have to be modified (enhanced) to enforce the constraints strictly.
One type of constraint, which is common in many decision problems, is the monotonicity constraint, stating that the greater an input is, the greater the output must be, all other inputs being equal. There is a wide range of applications where monotonicity properties hold. Well-known examples include credit loan approval, the dependence of labor wages on age and education, investment decisions, hedonic price models, and selection and evaluation tasks ([1], [2]). Several data mining techniques have been developed that incorporate monotonicity constraints, such as neural networks ([3], [4], [5], [6]), rational cubic interpolation of one-dimensional functions ([7]), decision trees ([8], [9], [10]), etc. However, the main assumption underlying most of these methods is that the function (output) being estimated is monotone in all inputs (so-called total monotonicity). In practice, of course, this is not always the case.

In this paper we consider partially monotone problems, where we assume that the target variable depends monotonically on some of the input variables but not on all. For example, common sense suggests that the house price has a monotone increasing dependence on the number of rooms and the total house area, whereas for the number of floors this dependence does not necessarily hold.

Manuscript received April 25, 2006. Marina Velikova is with the Center for Economic Research, Tilburg University, The Netherlands (phone: +31 13 466 8721; fax: +31 13 466 3069; e-mail: M.Velikova@uvt.nl). Hennie Daniels is with the Center for Economic Research, Tilburg University, The Netherlands, and the ERIM Institute of Advanced Management Studies, Erasmus University Rotterdam, Rotterdam, The Netherlands (e-mail: daniels@uvt.nl). Ad Feelders is with the Department of Information and Computing Sciences, Utrecht University, The Netherlands (e-mail: ad@cs.uu.nl).
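The partial monotonicity constraint described above can be made concrete with a small check on data: a pair of points violates monotonicity in a chosen subset of inputs if one point dominates the other on those inputs, agrees on all remaining inputs, yet has a smaller target value. The sketch below is illustrative only and is not taken from the paper; the function and variable names are our own.

```python
import numpy as np

def monotonicity_violations(X, y, mono_idx):
    """Count ordered pairs (i, j) that violate partial monotonicity:
    point i dominates point j on the inputs in mono_idx, the two
    points are equal on all remaining inputs, yet y[i] < y[j]."""
    n = len(y)
    rest = [k for k in range(X.shape[1]) if k not in mono_idx]
    violations = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # comparable pair: equal on the non-monotone inputs,
            # dominating on the monotone ones
            if np.all(X[i, rest] == X[j, rest]) and \
               np.all(X[i, mono_idx] >= X[j, mono_idx]):
                if y[i] < y[j]:
                    violations += 1
    return violations

# toy data: target should increase in column 0; column 1 is unconstrained
X = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]])
y = np.array([0.5, 0.7, 0.4])   # last point breaks monotonicity
print(monotonicity_violations(X, y, mono_idx=[0]))  # -> 2
```

Counting such violations gives a rough empirical check of whether an assumed monotone dependence is plausible in a data set before imposing it as a constraint.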
Such prior knowledge about monotone relationships can be incorporated as constraints in data mining algorithms in order to improve the accuracy or interpretability of the derived models, as well as to reduce their variance on new data.

It is known that non-monotone functions can often be represented as compositions of monotone functions; for example, unimodal probability distribution functions are monotone increasing on the left side of the mode point and monotone decreasing on the right side ([6]). This implies that we can first construct a number of monotone models corresponding to the monotone regions of the non-monotone function, and then combine these local monotone models to obtain the global model.

The paper is organized as follows. In the next section we introduce the notation and definitions related to monotonicity that are needed for the follow-up discussion. The main contribution of this paper is the approach for partial monotonicity presented in Section IV-A. The approach is based on the convolution of kernel functions and a special type of monotone neural networks, introduced in Section III. In Section IV-B we present the design and the results of simulation studies carried out to test the performance of the proposed approach for partial monotonicity. Section IV-C demonstrates the application of the approach in a real case study of predicting abalone age. Concluding remarks are given in Section V.

II. NOTATION AND DEFINITIONS

Let x denote the vector of independent variables, which takes values in a k-dimensional input space X, and let ℓ denote the dependent variable, which takes values in a one-dimensional space L. We assume that a data set D = (x, ℓ_x) of N points in X × L is given.

For monotone problems, we assume that the data are generated by a process with the following properties:

ℓ_x = f(x) + ε    (1)

International Journal of Computational Intelligence Volume 3 Number 3 204
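The idea of combining local monotone models with kernel weights can be sketched as follows. This is a minimal illustration under our own assumptions, not the paper's actual method: it uses Gaussian kernels over the non-monotone inputs and linear submodels with non-negative weights as stand-ins for the monotone networks of Section III. Because the normalized kernel weights are non-negative and depend only on the non-monotone inputs x_f, the resulting mixture is monotone in the monotone inputs x_m by construction.

```python
import numpy as np

def monotone_submodel(w, b):
    """Stand-in for a local monotone model: a linear function with
    non-negative weights, hence non-decreasing in x_m by construction."""
    w = np.abs(np.asarray(w, dtype=float))
    return lambda x_m: float(np.dot(x_m, w) + b)

def gaussian_kernel(center, width):
    """Kernel (weight) function defined over the non-monotone inputs x_f."""
    center = np.asarray(center, dtype=float)
    return lambda x_f: float(np.exp(-np.sum((np.asarray(x_f) - center) ** 2)
                                    / (2.0 * width ** 2)))

def mixture_predict(x_m, x_f, submodels, kernels):
    """Global prediction: normalized kernel weights (functions of x_f
    only) combined with the local monotone models. Non-negative weights
    independent of x_m preserve monotonicity in x_m."""
    w = np.array([k(x_f) for k in kernels])
    w = w / w.sum()
    out = np.array([m(x_m) for m in submodels])
    return float(w @ out)

# two local models, one per monotone region of the non-monotone input
submodels = [monotone_submodel([1.0], 0.0), monotone_submodel([2.0], -1.0)]
kernels = [gaussian_kernel([0.0], 1.0), gaussian_kernel([3.0], 1.0)]

# the prediction is non-decreasing in the monotone input x_m
lo = mixture_predict(np.array([1.0]), np.array([1.5]), submodels, kernels)
hi = mixture_predict(np.array([2.0]), np.array([1.5]), submodels, kernels)
print(lo <= hi)  # True
```

The key design point carried over from the paper is that the kernel weights never see the monotone inputs, so any convex combination of monotone local models stays monotone globally.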