Prior Knowledge Driven Domain Adaptation

Gourab Kundu kundu2@illinois.edu
Ming-Wei Chang mchang21@illinois.edu
Dan Roth danr@illinois.edu
Computer Science Department, University of Illinois at Urbana-Champaign, IL 61801

Presented at the ICML 2011 Workshop on Combining Learning Strategies to Reduce Label Cost, Bellevue, WA, USA, 2011. Copyright 2011 by the author(s)/owner(s).

Abstract

The performance of a natural language system trained on one domain often drops significantly when it is tested on another domain. Domain adaptation therefore remains one of the most important challenges in natural language processing. While many different domain adaptation frameworks have been proposed, they have ignored one natural resource: prior knowledge about the new domain. In this paper, we propose a new adaptation framework called Prior knowledge Driven Adaptation (PDA), which takes advantage of knowledge about the target domain to adapt the model. We empirically study the effects of incorporating prior knowledge in different ways. On the task of part-of-speech tagging, we show that prior knowledge yields a 42% error reduction when adapting from news text to biomedical text. On the task of semantic role labeling, when adapting from one news domain to another, prior knowledge gives a 25% error reduction on instances of be verbs (unseen in the training domain) and a 9% error reduction over instances of all verbs.

1. Introduction

Domain adaptation is an important issue for statistical learning based systems. In natural language processing (NLP) tasks, for example, statistical models trained on labeled data from one domain perform well on that domain, but their performance degrades severely when they are tested on a different domain. For example, all systems in the CoNLL 2005 shared task on Semantic Role Labeling (SRL) (Carreras & Màrquez, 2005) show a performance degradation of almost 10% or more when tested on a different domain.
For the task of part-of-speech (POS) tagging, performance drops by almost 9% when systems trained on the Wall Street Journal (WSJ) domain are tested on the Biomedical domain (Blitzer et al., 2006). Since labeling is expensive and time-consuming, there is a need to adapt a model trained on a large amount of labeled data from one domain (the source domain) to a new domain (the target domain) that may have very little or no labeled data.

One important resource that has largely been ignored in domain adaptation efforts is prior knowledge about the target domain. Such prior knowledge is easy to collect, and it is available for many domains. Prior knowledge may concern the content of the domain, such as its vocabulary, the structure of its sentences, or its style of text. For example, transcribed text usually lacks the capitalization present in manually written text, and this information is often known a priori. Prior knowledge may also be available about annotation differences between the source and the target domains. For example, the names of all entities are annotated as proper nouns in the WSJ domain (WSJWiki), whereas the names of genes are annotated as common nouns in the Biomedical domain (BioIEWiki). As another example, in the CoNLL 2007 shared task on domain adaptation for dependency parsing (Nivre et al., 2007), there were significant annotation differences between the source and target domain data. The participating teams did not have access to any labeled data in the target domain, so they could not learn a model to account for the annotations of the target domain. In the end, no team could improve results substantially above those obtained by applying the source domain model directly to the target domain data.

Over the years, many adaptation frameworks have been proposed in the literature. Some of them focus on how to use a small amount of labeled data from the