Global and Componentwise Extrapolation for Accelerating Data Mining from Large Incomplete Data Sets with the EM Algorithm

Chun-Nan Hsu, Han-Shen Huang, Bo-Hou Yang
Institute of Information Science, Academia Sinica, Nankang, Taipei, Taiwan
{chunnan,hanshen,ericyang}@iis.sinica.edu.tw

Abstract

The Expectation-Maximization (EM) algorithm is one of the most popular algorithms for data mining from incomplete data. However, when applied to large data sets with a large proportion of missing data, the EM algorithm may converge slowly. The triple jump extrapolation method can effectively accelerate the EM algorithm by substantially reducing the number of iterations required for EM to converge. There are two options for the triple jump method: global extrapolation (TJEM) and componentwise extrapolation (CTJEM). We tried these two methods on a variety of probabilistic models and found that, in general, global extrapolation yields better performance, but there are cases where componentwise extrapolation yields very high speedup. In this paper, we investigate when componentwise extrapolation should be preferred. We conclude that CTJEM should be preferred when the Jacobian of the EM mapping is diagonal or block diagonal. We show how to determine whether a Jacobian is diagonal or block diagonal and experimentally confirm our claim. In particular, we show that CTJEM is especially effective for the semi-supervised Bayesian classifier model given a highly sparse data set.

1. Introduction

The Expectation-Maximization (EM) algorithm [4] is one of the most popular algorithms for data mining from incomplete data. Given an incomplete data set, the EM algorithm iteratively searches for the parameter vector θ that maximizes the log-likelihood of the data. However, when applied to large data sets with a large number of parameters to estimate, the EM algorithm may converge slowly.
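As a concrete illustration of this iterative search, the following is a minimal EM sketch for a hypothetical one-dimensional two-component Gaussian mixture, where the component labels play the role of the missing data. The model, the data, and all numbers here are illustrative assumptions, not an example from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical incomplete-data problem: 1-D observations from a
# two-component Gaussian mixture; the component labels are missing.
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 200)])

def em_step(x, w, mu1, mu2):
    """One EM iteration for a two-component mixture with unit variances."""
    # E-step: posterior responsibility of component 1 for each point
    # (the common normalizing constants of the unit-variance Gaussians cancel)
    p1 = w * np.exp(-0.5 * (x - mu1) ** 2)
    p2 = (1.0 - w) * np.exp(-0.5 * (x - mu2) ** 2)
    r = p1 / (p1 + p2)
    # M-step: re-estimate the mixing weight and the two component means
    return r.mean(), (r * x).sum() / r.sum(), ((1 - r) * x).sum() / (1 - r).sum()

w, mu1, mu2 = 0.5, -1.0, 1.0      # initial guess theta_0
for _ in range(50):               # theta_{t+1} = M(theta_t)
    w, mu1, mu2 = em_step(x, w, mu1, mu2)
print(w, mu1, mu2)                # approaches roughly 0.6, -2.0, 3.0
```

Each pass of the loop applies the EM mapping θ_{t+1} = M(θ_t) once; viewing EM as this fixed-point iteration is what makes the extrapolation methods discussed next applicable.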
If the data sets also contain a large proportion of missing data, or the model contains a large number of hidden variables, convergence can be even slower.

Aitken's acceleration is one of the most commonly used methods to speed up fixed-point iteration [2]. Since the EM algorithm can be considered a fixed-point iteration method, Aitken's acceleration can be applied to accelerate EM [9, 10]. However, the multivariate version of Aitken's acceleration requires computing or approximating the Jacobian matrix of the EM mapping, which can be intractable. Many variants of Aitken's acceleration have therefore been proposed that approximate it as an extrapolation method. One of these is the triple jump extrapolation method (TJEM) [7, 5, 15]. The idea is to estimate the extrapolation rate from the previous two estimates of the parameter vector. The triple jump extrapolation method can effectively accelerate the EM algorithm by substantially reducing the number of iterations required for EM to converge. Another benefit of the triple jump method is that it can be easily integrated with existing EM packages for any probabilistic model. It can even be combined with other extrapolation-based acceleration methods, such as parameterized EM (pEM) [1] and adaptive overrelaxed EM (aEM) [14], to further accelerate convergence [6].

The triple jump method can extrapolate the parameter vector either with a single extrapolation rate for all dimensions or with different extrapolation rates for different dimensions. We refer to the former approach as global extrapolation and the latter as componentwise extrapolation. Componentwise extrapolation of the EM algorithm is referred to as the componentwise triple jump EM algorithm (CTJEM). Hesterberg [5] proposed a global extrapolation method, while Huang et al.
[7] described a componentwise extrapolation method, though in that method many dimensions can be extrapolated together as a sub-vector. We tried these two methods on a variety of probabilistic models with synthesized data and found that, in general, global extrapolation yields better performance, but there are cases where componentwise extrapolation yields very high speedup. In some