A Framework for Tuning Posterior Entropy in Unsupervised Learning

Rajhans Samdani (rsamdan2@illinois.edu)
Ming-Wei Chang (minchang@microsoft.com)
Dan Roth (danr@illinois.edu)

Abstract

We present a general framework for unsupervised and semi-supervised learning containing a graded spectrum of Expectation Maximization (EM) algorithms, which we call Unified Expectation Maximization (UEM). UEM allows us to tune the entropy of the inferred posterior distribution during the E-step to impact the quality of learning. Furthermore, UEM covers existing algorithms such as standard EM and hard EM, as well as constrained versions of EM such as Constraint-Driven Learning (Chang et al., 2007) and Posterior Regularization (Ganchev et al., 2010). Within the UEM framework, we can adapt the learning procedure to the data, the initialization point, and supervision signals such as constraints. Experiments on POS tagging, information extraction, and word alignment show that the best-performing algorithm in the UEM family is often a new algorithm that was not previously available, exhibiting the benefits of the UEM framework.

Presented at the International Conference on Machine Learning (ICML) workshop on Inferning: Interactions between Inference and Learning, Edinburgh, Scotland, UK, 2012. Copyright 2012 by the author(s)/owner(s).

1. Introduction

Expectation Maximization (EM) (Dempster et al., 1977) is arguably the most widely used algorithm for unsupervised and semi-supervised learning (McCallum et al., 1998; Nigam et al., 2000; Brown et al., 1993; Klein & Manning, 2004). Recently, EM algorithms that incorporate constraints on structured output spaces have also been proposed (Chang et al., 2007; Ganchev et al., 2010; Mann & McCallum, 2008). Several variations of EM (e.g., hard EM) exist in the literature, and choosing a suitable variation is often very task-specific.

In this paper, we focus on the variations of EM that learn different models by inferring the posterior distribution differently during the E-step. We observe that, while parameter regularization in supervised learning is very well studied, to the extent that tuning the regularization penalty is now standard practice, the effect of regularizing the inferred posterior distribution in EM is relatively unexplored, and EM algorithms that attempt it are few and far between. Some works have shown that for certain tasks, hard EM is more suitable than regular EM (Spitkovsky et al., 2010). When constraints are incorporated into E-step inference, Posterior Regularization (PR) (Ganchev et al., 2010) corresponds to standard EM, while Constraint-Driven Learning (CoDL) (Chang et al., 2007) corresponds to hard EM. The problem of choosing a good variation of EM remains elusive, as does the possibility of simple and better alternatives.

In this paper, we approach these concerns from a novel perspective. We present a unified framework for EM, Unified Expectation Maximization (UEM) (Samdani et al., 2012), that gives an explicit handle on the entropy of the distribution inferred during the E-step of learning. UEM provides a continuous spectrum of EM algorithms parameterized by a single temperature-like tuning parameter. Furthermore, UEM covers existing versions of the EM algorithm, including standard EM, hard EM, PR, and CoDL.

Using UEM, we can modulate the entropy of the inferred distribution to better fit the given data, initialization, and constraints. Existing EM implementations can be extended to UEM with minimal changes, which makes it an easy way for practitioners to extract better performance from their systems. We conduct experiments on unsupervised POS tagging, unsupervised word alignment, and semi-supervised information extraction, and show that choosing the right UEM variation outperforms existing EM algorithms by a significant margin.
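To make the temperature-like tuning concrete, a minimal sketch of a tempered E-step posterior is shown below. It assumes the posterior takes the form q(y) ∝ p(y)^(1/γ), which interpolates between standard EM and hard EM; the function name `tempered_posterior` and this exact parameterization are our illustration rather than code from the paper.

```python
import math

def tempered_posterior(log_p, gamma):
    """Tempered E-step posterior: q(y) proportional to p(y)^(1/gamma).

    gamma = 1    recovers the standard EM posterior q = p,
    gamma -> 0   approaches the hard-EM (argmax) posterior,
    gamma -> inf approaches the uniform distribution.
    """
    # Scale log-probabilities by 1/gamma, then renormalize
    # with the usual max-subtraction trick for numerical stability.
    scaled = [lp / gamma for lp in log_p]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

log_p = [math.log(0.6), math.log(0.3), math.log(0.1)]
print(tempered_posterior(log_p, 1.0))    # standard EM: recovers [0.6, 0.3, 0.1]
print(tempered_posterior(log_p, 0.01))   # near hard EM: mass concentrates on the argmax
print(tempered_posterior(log_p, 1000.0)) # high entropy: close to uniform
```

With a single scalar γ, an existing E-step can thus be reused almost unchanged: compute the usual log-posteriors, divide by γ, and renormalize, which is one way to read the paper's claim that existing EM implementations are easy to extend.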