ODIM: An Efficient Method to Detect Outliers via the Inlier-Memorization Effect of Deep Generative Models

Dongha Kim 1, Jaesung Hwang 2, Jongjin Lee 3, Kunwoong Kim 3, Yongdai Kim 3

1 Department of Statistics, Sungshin Women's University  2 SK Telecom  3 Department of Statistics, Seoul National University. Correspondence to: Yongdai Kim <ydkim0903@gmail.com>. Preprint.

Abstract

Identifying whether a given sample is an outlier is an important problem in various real-world domains. This study addresses the unsupervised outlier detection problem, in which the training data contain outliers but no label information about inliers and outliers is given. We propose a powerful and efficient learning framework that identifies outliers in a training data set using deep neural networks. We start with a new observation called the inlier-memorization (IM) effect: when we train a deep generative model on data contaminated with outliers, the model memorizes inliers before outliers. Exploiting this finding, we develop a new method called outlier detection via the IM effect (ODIM). The ODIM requires only a few updates and is therefore computationally efficient, tens of times faster than other deep-learning-based algorithms. Moreover, the ODIM filters out outliers successfully regardless of the data type, including tabular, image, and sequential data. We empirically demonstrate the superiority and efficiency of the ODIM by analyzing 20 data sets.

1. Introduction

An outlier (also called an anomaly) is an observation that differs significantly from other observations, and outlier detection (OD) is the task of identifying outliers in a given data set. OD has wide applications such as fraud detection, fault detection, and defect detection in images. OD is also used as a pre-processing step in supervised learning to filter out anomalous training samples, which may degrade the performance of a predictive model.

OD problems can generally be categorized into three areas. 1) Supervised outlier detection (SOD) requires label information about whether each training sample is an inlier (also called normal) or an outlier and solves a two-class classification task. A limitation of SOD is that fully labeled data sets are rarely available in practice. 2) Semi-supervised outlier detection (SSOD) refers to methods that assume all training data are inliers and construct patterns or models for the inliers. SSOD can be interpreted as a one-class classification task since no information about outliers is used during training. As with SOD, data sets composed only of inliers are uncommon (Chandola et al., 2009; Chalapathy & Chawla, 2019). 3) Unsupervised outlier detection (UOD) deals with the most realistic situation, in which the training data include some outliers but no label information about anomalousness is available. Most anomaly detection tasks in practice fall under UOD, since information about outliers in massive data is rarely known in advance.

In this study, we propose a novel algorithm for UOD problems. Our algorithm is motivated by the so-called memorization effect observed in noisy-label problems (Arpit et al., 2017; Jiang et al., 2018). The goal of noisy-label problems is to learn an accurate classifier when some of the class labels in the training data are contaminated. When standard supervised learning algorithms are applied to such mislabeled data, an interesting phenomenon called the memorization effect is observed: correctly labeled data are learned earlier and mislabeled data are learned later in the training phase of deep neural networks. The memorization effect makes it possible to detect mislabeled data by comparing per-sample losses at an early stage of training. The aim of this paper is to apply the memorization effect to UOD problems and to develop a novel algorithm that detects outliers with high accuracy as well as high computational efficiency.
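As a rough illustration of this idea (an illustrative toy of our own, not the paper's ODIM algorithm), the sketch below fits a one-parameter model to contaminated data with only a few gradient steps and then ranks samples by their per-sample loss; samples the model has not yet fit well receive large losses and are flagged as suspicious. The data-generating setup and all names are hypothetical.

```python
import random

random.seed(0)

# Toy contaminated data: inliers cluster near 0, a few outliers sit far away.
inliers = [random.gauss(0.0, 1.0) for _ in range(95)]
outliers = [random.gauss(10.0, 1.0) for _ in range(5)]
data = inliers + outliers

# A one-parameter "model": fit the mean mu by SGD on the squared loss.
mu, lr = 0.0, 0.01
for _ in range(10):                  # deliberately few passes: early training
    for x in data:
        mu += lr * 2.0 * (x - mu)    # gradient step on (x - mu)^2

# Per-sample losses after early training: samples far from what the model
# has fit so far (the outliers) keep much larger losses than the inliers.
losses = [(x - mu) ** 2 for x in data]
threshold = sorted(losses)[94]       # treat the 95 smallest losses as inliers
flagged = [x for x, l in zip(data, losses) if l > threshold]
```

In this toy setting, the five samples drawn near 10 end up in `flagged`, since their squared losses dominate those of the inliers after only a few updates.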
A study already exists that utilizes the memorization effect for UOD problems. Wang et al. (2019a) observed that while a deep discriminative model is trained via the self-supervised learning framework, the model memorizes inliers first and outliers later in the training phase, and named this phenomenon the inlier-priority effect. Generating more than a hundred artificial classes with a pre-specified annotation strategy, they proposed a method called E3-Outlier,

arXiv:2301.04257v1 [stat.ML] 11 Jan 2023
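As a rough, hypothetical sketch of such a pre-specified annotation strategy (not the actual E3-Outlier implementation), one can turn a fixed set of transformations into pseudo-class labels: each sample is transformed in every way, and the index of the transformation serves as the artificial class. The helper and the toy transformations below are assumptions for illustration only.

```python
# Hypothetical pseudo-label generation: transformation index = pseudo-class.
def make_pseudo_labels(samples, transforms):
    dataset = []
    for x in samples:
        for label, t in enumerate(transforms):
            dataset.append((t(x), label))  # (transformed sample, pseudo-class)
    return dataset

# Toy stand-ins for image transformations (identity, negation, reversal).
transforms = [
    lambda x: x,
    lambda x: [-v for v in x],
    lambda x: x[::-1],
]
pseudo = make_pseudo_labels([[1, 2, 3], [4, 5, 6]], transforms)
```

A discriminative model trained to predict these pseudo-classes then yields per-sample scores whose early-training behavior separates inliers from outliers, in the spirit of the inlier-priority effect.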