[Re] Domain Generalization using Causal Matching

Richard Jiles
Dept. of Computer Science, Iowa State University
Ames, Iowa, USA
rdjiles@iastate.edu

Mohna Chakraborty
Dept. of Computer Science, Iowa State University
Ames, Iowa, USA
mohnac@iastate.edu

Reproducibility Summary

Scope of Reproducibility

We reproduced the results of the paper "Domain Generalization Using Causal Matching." The standard supervised learning framework assumes that the labels encountered at test time have already appeared during training. Real-world settings may violate this assumption: in e-commerce, for instance, new products with new labels are released every day, and those labels may not have been part of model training. A generalizable framework should be able to detect such unseen labels; a framework that cannot will face challenges in open domains and thus may not generalize.

Methodology

We used the open-source code of the paper. The authors provide detailed instructions on their GitHub page for reproducing the results. We reproduced almost every table in the main text and a few from the appendix. Where our results did not match the reported ones, we investigated the cause and propose possible explanations. For the extensions, we wrote additional functions to check the paper's claims on other open-source benchmark datasets. We trained the models mainly on the publicly available GPUs offered by Google Colab and on GPU-equipped desktop computers.

Results

Most of our results closely match those reported in the original paper for the Rotated-MNIST [17], Fashion-MNIST [27], PACS [18, 28], and Chest X-ray [3] datasets. In some cases, described later, we obtained quantitatively better results than those reported in the paper; by investigating the root cause of these mismatches, we suggest a possible explanation for the gap. We also performed additional experiments, with the necessary modifications, on the Rotated-MNIST and Rotated Fashion-MNIST datasets (a construction sketch appears at the end of this summary).

What was easy

The authors' GitHub page hosts the open-source code, which is well organized into multiple files and therefore easy to follow. The experiments described in the paper use widely adopted open-source benchmark datasets, so each individual experiment was relatively easy to set up. Moreover, since most of the hyperparameters are specified in the scripts, little tuning was needed for most experiments.

What was difficult

Although each experiment is relatively simple to run, the sheer number of experiments made reproduction demanding. In particular, each experiment in the original setting requires training a network for a large number of iterations. With restricted access to computational resources and limited time, we occasionally changed the settings, sacrificing some granularity. These changes did not affect the interpretability of the final results.

Communication with original authors

We emailed the authors and received prompt responses to our questions about the provided Jupyter reproduction notebooks. Some tables report multiple runs of the same technique, but it was unclear how to execute the alternative runs.
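As referenced in the Results section, the sketch below illustrates one way the per-angle domains for the Rotated-MNIST and Rotated Fashion-MNIST extensions can be built with PyTorch and torchvision. It is a minimal sketch, not the authors' implementation: the helper name make_rotated_domains is ours, and the angle grid (0 to 90 degrees in 15-degree steps) is the commonly used protocol, which may differ from the exact settings in the original code.

    import torch
    from torch.utils.data import TensorDataset
    from torchvision import datasets, transforms

    ANGLES = [0, 15, 30, 45, 60, 75, 90]  # one domain per rotation angle (assumed grid)

    def make_rotated_domains(root="data", train=True, fashion=False, angles=ANGLES):
        """Return {angle: TensorDataset} with every image rotated by `angle` degrees."""
        cls = datasets.FashionMNIST if fashion else datasets.MNIST
        base = cls(root, train=train, download=True)  # yields (PIL image, label) pairs
        to_tensor = transforms.ToTensor()
        domains = {}
        for angle in angles:
            # PIL's rotate keeps the 28x28 canvas and zero-fills the corners.
            imgs = torch.stack([to_tensor(img.rotate(angle)) for img, _ in base])
            labels = torch.tensor([label for _, label in base])
            domains[angle] = TensorDataset(imgs, labels)
        return domains

    # Example: train on the intermediate rotations and hold out the extremes,
    # mirroring the common Rotated-MNIST evaluation protocol.
    domains = make_rotated_domains()
    train_domains = {a: d for a, d in domains.items() if a not in (0, 90)}
    test_domains = {a: domains[a] for a in (0, 90)}

Packing each rotation angle into its own TensorDataset makes it straightforward to hold out any single angle as the unseen test domain, which is how domain generalization is evaluated in these experiments.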