Proceedings of Machine Learning Research – Under Review 2021 Short Paper – MIDL 2021 submission
Domain Adaption for Homogenizing CT Scans using
Auto-Encoders for Cross-Dataset Medical Image Analysis
Mohammadreza Amirian [*, 1, 2], amir@zhaw.ch
Javier A. Montoya-Zegarra [*, 1], mony@zhaw.ch
Ahmet Selman Bozkir [*, 1], bozk@zhaw.ch
Marco Calandri [3], marco.calandri@unito.it
Friedhelm Schwenker [2], friedhelm.schwenker@uni-ulm.de
Thilo Stadelmann [1, 4], stdm@zhaw.ch
[1] ZHAW School of Engineering, 8400 Winterthur, Switzerland
[2] Ulm University, Institute of Neural Information Processing, 89081 Ulm, Germany
[3] University of Turin, Department of Oncology, 10124 Turin, Italy
[4] Fellow, ECLT European Centre for Living Technology, 30123 Venice, Italy
Editors: Under Review for MIDL 2021
Abstract
Medical imaging research profits from data unification and homogenization methods to
merge global datasets in order to reduce annotation effort and improve generalization of
trained models to unseen datasets. In this paper, we explicitly address dataset variability
using two public datasets and propose an architecture that aims to erase the differences
between CT scans from different sources while introducing only minimal changes to the
images by leveraging the idea of deep auto-encoders. The proposed trainable preprocessing
architecture (PrepNet) (i) is jointly trained on the SARS-COVID-2 and UCSD COVID-CT
datasets and (ii) maintains discriminant features for downstream diagnosis.
Keywords: Adaptive preprocessing, domain adaptation, auto-encoder
1. Introduction
A major challenge in rolling out machine-learned models to a broad user base is the
variability of data encountered in the real world. Models can only be expected to work well
on data whose distribution is similar to that of the training data, yet differences, e.g.
in the image acquisition setup, routinely hinder the applicability of a once-developed
model in novel settings. This paper illustrates the negative effects of such a failure to
adapt between different datasets using the example of COVID-19 diagnosis.
We address domain adaptation of medical image analysis methods by proposing a CNN
for preprocessing 2D CT scans: the model is trained to fool a classifier that discriminates
between various CT scanning datasets, thus aiming to remove the cross-dataset variability.
We evaluate the performance of the suggested method on the exemplary use case of
predicting COVID-19 positive cases, chosen because of the global variability in the
respective datasets and the ample opportunities for comparison. The methodology is inspired
by generative adversarial learning (Schmidhuber, 2020). Our contribution is twofold: (i) we
propose a novel trainable preprocessing CNN architecture (PrepNet) with a dual training
objective that is capable of equalizing the variability of different CT-scanner
technologies in the image domain, see Figure 1 (right); (ii) we validate this model by
showing the transferability of its diagnostic capabilities between different CT
technologies based on common public datasets.
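To make the dual training objective more concrete, the following is a minimal sketch in plain Python and not the authors' implementation. It assumes that the "minimal changes" requirement is expressed as a mean squared error between the input scan and the preprocessed scan, and that "fooling" the dataset classifier is expressed as a cross-entropy against a uniform target over the datasets. The function names, the weighting factor `weight`, and both loss choices are hypothetical and serve illustration only.

```python
import math


def reconstruction_loss(original, preprocessed):
    """Mean squared error between two flattened images (assumed fidelity term)."""
    return sum((o - p) ** 2 for o, p in zip(original, preprocessed)) / len(original)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confusion_loss(dataset_logits):
    """Cross-entropy between the dataset classifier's output and a uniform target.

    Assumed adversarial term: it is smallest (log K for K datasets) when the
    classifier cannot tell which dataset a preprocessed scan came from.
    """
    probs = softmax(dataset_logits)
    return -sum(math.log(p) for p in probs) / len(probs)


def dual_objective(original, preprocessed, dataset_logits, weight=1.0):
    """Hypothetical combined loss: stay close to the input, confuse the classifier."""
    return reconstruction_loss(original, preprocessed) + weight * confusion_loss(dataset_logits)


if __name__ == "__main__":
    scan = [0.0, 0.5, 1.0, 0.25]          # toy flattened CT slice
    slightly_changed = [0.0, 0.45, 1.0, 0.3]
    # Classifier is confident about the source dataset: high penalty.
    print(dual_objective(scan, slightly_changed, [3.0, -3.0]))
    # Classifier is undecided between the two datasets: low penalty.
    print(dual_objective(scan, slightly_changed, [0.0, 0.0]))
```

Under these assumptions, the preprocessing network would be rewarded both for leaving the scan nearly unchanged and for producing outputs on which the dataset classifier is undecided. In a real training setup, the two terms would be computed on network outputs by a deep learning framework, and the dataset classifier would be trained in alternation with the preprocessing network.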
* Contributed equally
© 2021 M. Amirian, J.A. Montoya-Zegarra, A.S. Bozkir, M. Calandri, F. Schwenker & T. Stadelmann.