Tree Augmented Naive Bayes for Regression using Mixtures of Truncated Exponentials. Application to Higher Education Management ⋆ Antonio Fern´ andez, Mar´ ıa Morales, and Antonio Salmer´on Department of Statistics and Applied Mathematics, University of Almer´ ıa, Carrera de Sacramento s/n, E-04120 Almer´ ıa, Spain, {afalvarez,maria.morales,antonio.salmeron}@ual.es Abstract. In this paper we explore the use of Tree Augmented Naive Bayes (TAN) in regression problems where some of the independent vari- ables are continuous and some others are discrete. The proposed solution is based on the approximation of the joint distribution by a Mixture of Truncated Exponentials (MTE). The construction of the TAN structure requires the use of the conditional mutual information, which cannot be analytically obtained for MTEs. In order to solve this problem, we introduce an unbiased estimator of the conditional mutual information, based on Monte Carlo estimation. We test the performance of the pro- posed model in a real life context, related to higher education manage- ment, where regression problems with discrete and continuous variables are common. 1 Introduction In real life applications, it is common to ﬁnd problems in which the goal is to predict the value of a variable of interest depending on the values of some other observable variables. If the variable of interest is discrete, we are faced with a classiﬁcation problem, whilst if it is continuous, it is usually called a regression problem. In classiﬁcation problems, the variable of interest is called class and the observable variables are called features, while in regression frameworks, the variable of interest is called dependent variable and the observable ones are called independent variables. Bayesian networks [8, 12] have been previously used for classiﬁcation and regression purposes. More precisely, naive Bayes models have been applied to regression problems under the assumption that the joint distribution of the in- dependent variables and the dependent variable is multivariate Gaussian [7]. If the normality assumption is not fulﬁlled, the problem of regression with naive Bayes models has been approached using kernel densities to model the condi- tional distribution in the Bayesian network [5], but the obtained results are poor. Furthermore, the use of kernels introduce a high complexity in the model, which ⋆ This work has been supported by the Spanish Ministry of Education and Science, project TIN2004-06204-C03-01 and by Junta de Andaluc´ ıa, project P05-TIC-00276.