Proceedings of Machine Learning Research – Under Review 2021
Short Paper – MIDL 2021 submission

Domain Adaption for Homogenizing CT Scans using Auto-Encoders for Cross-Dataset Medical Image Analysis

Mohammadreza Amirian *1,2 amir@zhaw.ch
1 ZHAW School of Engineering, 8400 Winterthur, Switzerland
2 Ulm University, Institute of Neural Information Processing, 89081 Ulm, Germany
Javier A. Montoya-Zegarra *1 mony@zhaw.ch
Ahmet Selman Bozkir *1 bozk@zhaw.ch
Marco Calandri 3 marco.calandri@unito.it
3 University of Turin, Department of Oncology, 10124 Turin, Italy
Friedhelm Schwenker 2 friedhelm.schwenker@uni-ulm.de
Thilo Stadelmann 1,4 stdm@zhaw.ch
4 Fellow, ECLT European Centre for Living Technology, 30123 Venice, Italy

Editors: Under Review for MIDL 2021

Abstract

Medical imaging research profits from data unification and homogenization methods that merge global datasets, reducing annotation effort and improving the generalization of trained models to unseen datasets. In this paper, we explicitly address dataset variability using two public datasets and propose an architecture, built on the idea of deep auto-encoders, that aims to erase the differences between CT scans from different sources while introducing only minimal changes. The proposed trainable preprocessing architecture (PrepNet) (i) is jointly trained on the SARS-COVID-2 and UCSD COVID-CT datasets and (ii) maintains discriminant features for downstream diagnosis.

Keywords: Adaptive preprocessing, domain adaptation, auto-encoder

1. Introduction

A major challenge in rolling out machine-learned models to a broad user base is the variability of data encountered in the real world. Models can only be expected to work well on data whose distribution is similar to that used for training, yet differences, e.g. in the image acquisition setup, routinely hinder the applicability of a once-developed model in novel settings.
This paper illustrates the negative effects of such a failure to adapt between different datasets in the context of COVID-19 diagnosis. We address domain adaptation of medical image analysis methods by proposing a CNN for preprocessing 2D CT scans: the model is trained to fool a classifier that discriminates between various CT scanning datasets, thus aiming to remove the cross-dataset variability. We evaluate the performance of the suggested method on the exemplary use case of predicting COVID-19 positive cases, chosen because of the global variability in the respective datasets and the ample opportunities for comparison. The methodology is inspired by generative adversarial learning (Schmidhuber, 2020). Our contribution is twofold: (i) we propose a novel trainable preprocessing CNN architecture (PrepNet) with a dual training objective that is capable of equalizing the variability of different CT-scanner technologies in the image domain, see Figure 1 (right); (ii) we validate this model by showing the transferability of its diagnostic capabilities between different CT technologies based on common public datasets.

* Contributed equally
© 2021 M. Amirian, J.A. Montoya-Zegarra, A.S. Bozkir, M. Calandri, F. Schwenker & T. Stadelmann.
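
The dual training objective named in the introduction can be illustrated with a minimal sketch. The exact loss terms are not stated in this part of the paper, so everything below is an assumption made for illustration: a mean-squared reconstruction term stands in for "only minimal changes", a cross-entropy against a uniform target stands in for "fooling" the dataset classifier, and the names `reconstruction_loss`, `confusion_loss`, `dual_objective`, and the `weight` parameter are hypothetical.

```python
import math


def reconstruction_loss(original, processed):
    """Mean squared error between input pixels and preprocessed pixels.

    Assumed stand-in for the auto-encoder idea of changing the scan minimally.
    """
    assert len(original) == len(processed)
    return sum((o - p) ** 2 for o, p in zip(original, processed)) / len(original)


def confusion_loss(dataset_probs):
    """Cross-entropy between the dataset classifier's output and a uniform target.

    Assumed stand-in for "fooling" the classifier: the value is smallest when
    the classifier assigns equal probability to every source dataset.
    """
    eps = 1e-12  # guards against log(0)
    return -sum(math.log(p + eps) for p in dataset_probs) / len(dataset_probs)


def dual_objective(original, processed, dataset_probs, weight=1.0):
    """Weighted sum of both terms; `weight` is a hypothetical trade-off factor."""
    return reconstruction_loss(original, processed) + weight * confusion_loss(dataset_probs)


# Toy example: a four-pixel "scan" and a slightly altered preprocessed version.
scan = [0.2, 0.5, 0.9, 0.4]
processed = [0.25, 0.5, 0.85, 0.4]

fooled = dual_objective(scan, processed, [0.5, 0.5])      # classifier is undecided
detected = dual_objective(scan, processed, [0.99, 0.01])  # classifier spots the source
print(fooled < detected)
```

Under these assumptions, the objective is lower when the dataset classifier cannot tell the two sources apart, while the reconstruction term penalizes any preprocessing that strays far from the input scan. In practice both terms would be computed on network outputs and minimized by gradient descent; this sketch only shows how the two goals combine into one number.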