Active Supervised Domain Adaptation

Avishek Saha, Piyush Rai, Hal Daumé III, Suresh Venkatasubramanian, and Scott L. DuVall§

School of Computing, University of Utah
{avishek,piyush,suresh}@cs.utah.edu
Department of Computer Science, University of Maryland CP
hal@umiacs.umd.edu
§ VA SLC Healthcare System & University of Utah
scott.duvall@hsc.utah.edu

Abstract. In this paper, we harness the synergy between two important learning paradigms: active learning and domain adaptation. We show how active learning in a target domain can leverage information from a different but related source domain. Our proposed framework, Active Learning Domain Adapted (Alda), uses source domain knowledge to transfer information that facilitates active learning in the target domain. We propose two variants of Alda: a batch variant, B-Alda, and an online variant, O-Alda. Empirical comparisons with numerous baselines on real-world datasets establish the efficacy of the proposed methods.

Key words: active learning, domain adaptation, batch, online

1 Introduction

We consider the supervised¹ domain adaptation setting [9], where we have a large amount of labeled data from a source domain, a large amount of unlabeled data from a target domain, and a small budget for acquiring labels in the target domain. We show how, beyond its use in the usual domain adaptation sense, the information from the source domain can also be used to selectively query for labels in the target domain (instead of choosing them randomly, as is the common practice). We achieve this by first training the best possible classifier on the source without using target labels: for instance, by training a supervised classifier on the labeled source data, or by applying an unsupervised adaptation technique that also uses the unlabeled target data.
Then, we use this learned hypothesis in various ways to leverage the source domain information when we are additionally given a fixed budget for acquiring extra labeled target data (i.e., the active learning setting [12]).

Authors contributed equally.
¹ We define supervised domain adaptation as having labeled data in both source and target, unsupervised domain adaptation as having labeled data in only source, and semi-supervised domain adaptation as having labeled data in source and both labeled and unlabeled data in target.
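The two-step recipe described in the introduction (train on the source, then spend a small label budget in the target under the guidance of the source hypothesis) can be illustrated with a small sketch. This is not the Alda algorithm itself. It assumes a linear perceptron as the classifier, smallest-margin uncertainty sampling as the query rule, and synthetic two-dimensional data; all function names (`train_perceptron`, `select_queries`, `make_domain`) and the `oracle` dictionary standing in for a human annotator are hypothetical.

```python
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_perceptron(data, w=None, epochs=10):
    """Plain perceptron. `data` is a list of (features, label), label in {-1, +1}.
    Passing `w` warm-starts training from an existing hypothesis."""
    w = list(w) if w is not None else [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            if y * dot(w, x) <= 0:  # mistake: move w toward the correct side
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w

def select_queries(w_source, unlabeled_target, budget):
    """Assumed query rule: pick the `budget` target points on which the
    source hypothesis is least confident (smallest absolute margin)."""
    ranked = sorted(unlabeled_target, key=lambda x: abs(dot(w_source, x)))
    return ranked[:budget]

def make_domain(n, shift, rng):
    """Hypothetical toy domain: the label is the sign of x0 + shift * x1."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1), 1.0]  # last entry is a bias term
        y = 1 if x[0] + shift * x[1] > 0 else -1
        data.append((x, y))
    return data

rng = random.Random(0)
source = make_domain(200, 0.2, rng)   # labeled source data
target = make_domain(200, 0.6, rng)   # related target domain with a shifted boundary

# Step 1: train on the source only; no target labels are used.
w_source = train_perceptron(source)

# Step 2: use the source hypothesis to decide which target labels to buy.
budget = 20
target_pool = [x for x, _ in target]
queried = select_queries(w_source, target_pool, budget)

# The oracle stands in for the annotator who answers label queries.
oracle = {tuple(x): y for x, y in target}
labeled_target = [(x, oracle[tuple(x)]) for x in queried]

# Step 3: adapt the source hypothesis using only the queried target labels.
w_target = train_perceptron(labeled_target, w=w_source)
```

Warm-starting from `w_source` is one simple way to reuse the source hypothesis; replacing `select_queries` with random sampling gives the random-selection baseline that the introduction contrasts against.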