A new effective and efficient measure for outlying aspect mining

Durgesh Samariya 1, Sunil Aryal 2, and Kai Ming Ting 1,3

1 School of Science, Engineering and Information Technology, Federation University, Churchill, Australia
{d.samariya,kaiming.ting}@federation.edu.au
2 School of IT, Deakin University, Geelong, Australia
sunil.aryal@deakin.edu.au
3 National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
tingkm@nju.edu.cn

arXiv:2004.13550v2 [cs.LG] 2 May 2020

Abstract. Outlying Aspect Mining (OAM) aims to find the subspaces (a.k.a. aspects) in which a given query is an outlier with respect to a given dataset. Existing OAM algorithms use traditional distance/density-based outlier scores to rank subspaces. Because these scores depend on the dimensionality of subspaces, they cannot be compared directly between subspaces of different dimensionality. Z-score normalisation has been used to make them comparable, but it requires computing the outlier scores of all instances in each subspace. This adds significant computational overhead on top of already expensive density estimation, making OAM algorithms infeasible to run on large and/or high-dimensional datasets. We also discover that Z-score normalisation is inappropriate for OAM in some cases. In this paper, we introduce a new score called SiNNE, which is independent of the dimensionality of subspaces. This enables scores in subspaces of different dimensionalities to be compared directly without any additional normalisation. Our experimental results reveal that SiNNE produces better results than, or at least the same results as, existing scores, and that it significantly improves the runtime of an existing OAM algorithm based on beam search.

Keywords: Outlying aspect mining, Dimensionality-unbiased score, Outlier explanation, Nearest neighbor ensemble

1 Introduction

Real-world datasets often contain anomalous data, a.k.a. outliers, which do not conform to the rest of the data. Reference [3] formally defines an outlier as "An observation (or a subset of observations) which appears to be inconsistent with the remainder of that set of data". Outlier Detection (OD) is an important data mining task that deals with detecting outliers in datasets automatically. A wide range of OD algorithms has been proposed. While those algorithms are good at detecting outliers, they cannot explain why