Task Guided Representation Learning using Compositional Models for Zero-shot Domain Adaptation

Shuang Liu, 1 Mete Ozay 2
1 RIKEN Center for AIP

Abstract

Zero-shot domain adaptation (ZDA) methods aim to transfer knowledge about a task learned in a source domain to a target domain, while data from the target domain are not available. In this work, we address learning feature representations which are invariant to and shared among different domains, considering task characteristics, for ZDA. To this end, we propose a method for task-guided ZDA (TG-ZDA) which employs multi-branch deep neural networks to learn feature representations exploiting their domain invariance and shareability properties. The proposed TG-ZDA models can be trained end-to-end without requiring synthetic tasks and data generated from estimated representations of target domains. The proposed TG-ZDA has been examined using benchmark ZDA tasks on image classification datasets. Experimental results show that our proposed TG-ZDA outperforms state-of-the-art ZDA methods for different domains and tasks.

Introduction

In real-world applications, the distribution of the training data of a machine learning algorithm obtained from a source domain D_sr may diverge from that of its test data obtained from a target domain D_t. This problem is called domain shift. Machine learning models trained under domain shift suffer a significant performance drop. For example, a model trained for detecting vehicles using a dataset collected on sunny days (i.e., from a source domain D_sr) performs well for inference on another dataset of vehicles collected on sunny days (i.e., from the same domain D_sr). However, the trained model may perform poorly on a dataset of vehicles collected on rainy days (i.e., from a target domain D_t). This problem becomes acute for deep neural networks (DNNs), which require massive amounts of data for training.
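As a toy illustration of the performance drop under domain shift (a minimal sketch, not part of the paper's method; the data, classifier, and shift below are invented for illustration), consider a nearest-centroid classifier fit on source-domain samples and evaluated on target-domain samples whose feature distribution has been translated:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(mean, n=200, std=0.3):
    # Draw n two-dimensional points around a class mean.
    return rng.normal(mean, std, size=(n, 2))

# Source domain: two well-separated classes.
src0, src1 = sample([0.0, 0.0]), sample([3.0, 0.0])

# Fit a nearest-centroid classifier on source-domain data only.
c0, c1 = src0.mean(axis=0), src1.mean(axis=0)

def predict(X):
    # Assign each point to the closer source centroid.
    d0 = np.linalg.norm(X - c0, axis=1)
    d1 = np.linalg.norm(X - c1, axis=1)
    return (d1 < d0).astype(int)

def accuracy(X0, X1):
    return 0.5 * ((predict(X0) == 0).mean() + (predict(X1) == 1).mean())

# Fresh test data from the same source domain: near-perfect accuracy.
src_acc = accuracy(sample([0.0, 0.0]), sample([3.0, 0.0]))

# Target domain: the same classes translated along the discriminative
# axis (a simple covariate shift); the classifier is left unchanged.
shift = np.array([2.0, 0.0])
tgt_acc = accuracy(sample([0.0, 0.0]) + shift, sample([3.0, 0.0]) + shift)

print(f"source accuracy: {src_acc:.2f}, target accuracy: {tgt_acc:.2f}")
```

The same model, evaluated without any adaptation, degrades sharply on the shifted target domain; ZDA methods aim to avoid this drop without access to any target-domain data at training time.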
A solution to this problem is fine-tuning models pre-trained on D_sr with data collected from D_t. However, this solution demands a considerable amount of labeled training data from D_t, which may not be available in many applications, e.g. medical informatics. To overcome this problem, domain adaptation (DA) methods aim to train a model on a dataset from D_sr such that the trained model performs well on a test dataset from D_t. DA methods which train models using unlabelled data from D_t are called unsupervised domain adaptation (UDA) methods (Bousmalis et al. 2017; Lee et al. 2019; Chen et al. 2020). UDA methods require a large amount of unlabelled data from D_t to train models. However, this assumption may not hold in various real-world applications. For instance, access to D_t may be limited, or D_t may be updated dynamically, as in lifelong learning and autonomous driving. To train models without using labels or unlabelled data from D_t, zero-shot domain adaptation (ZDA) methods were proposed (Peng, Wu, and Ernst 2018; Wang and Jiang 2019).

ZDDA (Kutbi, Peng, and Wu 2021) is a DNN based method proposed for ZDA. ZDDA models are trained in three phases across four modules, and are therefore inconvenient to deploy in various applications. CoCoGAN (Wang and Jiang 2021) and its variations (Wang and Jiang 2020; Wang, Cheng, and Jiang 2021) extended CoGAN (Liu and Tuzel 2016) to synthesize samples in missing domains for solving the ZDA problem. However, they cannot guarantee that their proposed GAN learns the real distribution of the missing domains. HGnet (Xia and Ding 2020) aims to learn domain invariant features, but does not incorporate feature shareability among domains and across tasks, which is demonstrated to be beneficial for ZDA.

We address the ZDA problem (Figure 1) employing invariance (Zhao et al.
2019) and shareability (Liang, Hu, and Feng 2020) of hypotheses of feature representations in a compositional framework, motivated by recent advances in DA theory (Creager, Jacobsen, and Zemel 2021; Kamath et al. 2021; Li et al. 2021) and observations in deep learning theory (LeCun, Bengio, and Hinton 2015; Saxe, McClelland, and Ganguli 2019; Bau et al. 2020; Lampinen and McClelland 2020; Sejnowski 2020; Poggio, Banburski, and Liao 2020). Our contributions are summarized as follows:

• We propose a ZDA method called Task-guided Multi-branch Zero-shot Domain Adaptation (TG-ZDA), depicted in Figure 2, which takes advantage of invariance and shareability of learned features by the guidance of auxiliary tasks (Figure 1). In TG-ZDA, we first estimate hypotheses of domain invariant and shareable features. To transfer knowledge among domains governing auxiliary tasks, we learn task-aware hypotheses. Then, we integrate them in composite hypotheses by employing a novel end-to-end trainable DNN, since hierarchical compositional functions can be approximated by DNNs without incurring the curse of dimensionality (Poggio, Banburski, and Liao 2020). TG-ZDA does not

arXiv:2109.05934v1 [cs.CV] 13 Sep 2021