The Knowledge Engineering Review, Vol. 00:0, 1–24. © 2004, Cambridge University Press
DOI: 10.1017/S000000000000000
Printed in the United Kingdom

A Comparative Study of Pivot Selection Strategies for Unsupervised Domain Adaptation

XIA CUI, NOOR AL-BAZZAZ, DANUSHKA BOLLEGALA, FRANS COENEN
University of Liverpool, Liverpool L69 3BX, United Kingdom
E-mail: xia.cui@liverpool.ac.uk, noorbahjattayfor@yahoo.com, danushka.bollegala@liverpool.ac.uk, coenen@liverpool.ac.uk

Abstract

Selecting pivot features that connect a source domain to a target domain is an important first step in unsupervised domain adaptation (UDA). Although different strategies, such as the frequency of a feature in a domain (Blitzer et al., 2006) and mutual (or pointwise mutual) information (Blitzer et al., 2007; Pan et al., 2010), have been proposed in prior work on domain adaptation (DA) for selecting pivots, it remains unknown (a) how the pivots selected using existing strategies differ, and (b) how the pivot selection strategy affects the performance of the target DA task. In this paper, we perform a comparative study covering different strategies that use both labelled (available for the source domain only) and unlabelled (available for both the source and target domains) data for selecting pivots for UDA. Our experiments show that in most cases pivot selection strategies that use labelled data outperform their unlabelled counterparts, emphasising the importance of the source domain's labelled data for UDA. Moreover, pointwise mutual information (PMI) and frequency-based pivot selection strategies obtain the best performance in two state-of-the-art UDA methods.

1 Introduction

Domain Adaptation (DA) considers the problem of adapting a model trained using data from one domain (i.e. source) to a different domain (i.e. target).
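To make the frequency-based strategy mentioned in the abstract concrete, the following is a minimal sketch of selecting pivots as features that occur frequently in both domains, in the spirit of Blitzer et al. (2006). The function name, thresholds, and ranking criterion here are illustrative choices, not the exact formulation evaluated in this paper.

```python
from collections import Counter

def frequent_pivots(source_docs, target_docs, k=500, min_count=10):
    """Select pivot features occurring frequently in both domains.

    source_docs, target_docs: lists of tokenised documents (lists of strings).
    Returns up to k features, ranked so that good pivots are frequent
    in the source AND the target domain. Threshold values are illustrative.
    """
    src = Counter(w for doc in source_docs for w in doc)
    tgt = Counter(w for doc in target_docs for w in doc)
    # Keep only features that clear the frequency threshold in BOTH domains.
    common = [w for w in src
              if w in tgt and src[w] >= min_count and tgt[w] >= min_count]
    # Rank by the smaller of the two domain frequencies, so a feature must
    # be frequent in both domains (not just one) to rank highly.
    common.sort(key=lambda w: min(src[w], tgt[w]), reverse=True)
    return common[:k]
```

Ranking by the minimum of the two domain counts (rather than, say, their sum) is one simple way to penalise features that are frequent in only one domain; such features would make poor pivots because they cannot connect the two feature spaces.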
DA methods have been successfully applied to many natural language processing (NLP) tasks such as Part-of-Speech (POS) tagging (Blitzer et al., 2006; Kübler and Baucom, 2011; Liu and Zhang, 2012; Schnabel and Schütze, 2013), sentiment classification (Blitzer et al., 2007; Li and Zong, 2008; Pan et al., 2010; Zhang et al., 2015; Bollegala et al., 2015), and machine translation (Koehn and Schroeder, 2007). Depending on the availability of labelled data for the target domain, DA methods are categorised into two groups: supervised domain adaptation (SDA) methods, which assume the availability of (potentially small) labelled data for the target domain, and unsupervised domain adaptation (UDA) methods, which do not. In this paper, we focus on UDA, which is technically more challenging than SDA due to the unavailability of labelled training instances for the target domain. UDA is more attractive in real-world DA tasks because it obviates the need to label target domain data.

One of the fundamental challenges in UDA is the mismatch of features between the source and target domains. Because in UDA labelled data is available only for the source domain, even if we learn a highly accurate predictor from the source domain's labelled data, the learnt model is often useless for making predictions in the target domain: the features seen by the predictor in the source domain's labelled training instances might not occur at all in the target domain's test instances. Even in cases where there is some overlap between the source and target domain feature spaces, the discriminative power of those common features might vary across the two domains. For example, the word lightweight often expresses a positive sentiment for mobile electronic devices such as mobile phones, laptop computers, or handheld cameras, whereas the same word is associated with a negative sentiment in movie reviews, because