Available online at www.sciencedirect.com

Sensors and Actuators B 129 (2008) 643–651

Smell similarity on the basis of gas sensor array measurements

K. Brudzewski a,∗, S. Osowski a,b, K. Wolinska a, J. Ulaczyk a

a Warsaw University of Technology, Warsaw, Poland
b Military University of Technology, Warsaw, Poland

Received 23 May 2007; received in revised form 3 September 2007; accepted 4 September 2007
Available online 21 September 2007

Abstract

This paper discusses the problem of assessing the similarity of smells on the basis of the gas sensor signals used in an electronic nose system. We have compared measures of similarity based on the geometrical description, the information theoretic approach and the statistical Kolmogorov–Smirnov test. Our main task was to develop measures that are compatible with the human perception of smells and that provide a wide margin between similar and dissimilar smells. The results concerning recognition of similar and dissimilar smells, presented and discussed in the paper, suggest that the Kolmogorov–Smirnov measure is the most compatible with the human perception of smells and provides the widest margin.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Electronic nose; Smell similarity; Sensor array

1. Introduction

The question of similarity of smell is an important research subject in the computer recognition of aroma by an electronic nose system [1–7], because it is a very practical problem with wide applicability in the cosmetic and food industries, especially for tracing the aging of products. In spite of its great usefulness, no fully accepted measure of similarity of aroma has been defined yet. The most commonly used measure is based on the distance between patterns of sensor signals representing the aroma in the feature space. Proximity of two objects means that they are similar in the affinity sense. However, resolution is an important problem.
On a normalized similarity scale (the range 0–1), with 1 denoting the highest possible similarity (two identical smells), the measure should take values across the whole range rather than being limited to a small sub-range of it.

The aim of this paper is to study how similarity measures can be determined and to compare them with human perception. The results will show that human and algorithmic similarity measures vary substantially in nature, but can be grouped in a cohesive way. The similarity measures considered in the paper can be separated into three main groups.

∗ Corresponding author. E-mail address: brudz@ch.pw.edu.pl (K. Brudzewski).

The first one is based on the geometrical approach. The problem of similarity of two smells, represented by sensor signals organized in vector form, can be defined as similarity between the vectors of data. The geometrical similarity takes into account the distances between the vectors of data samples belonging to different groups of data. In the classical geometric approach, we determine the Euclidean distances between the data samples belonging to different groups for all possible combinations. The mean of these distances is used in the definition of similarity. The other popular geometric method, called the vector space approach (VSA), relies on clustering the data vectors first. After clustering, all vectors belonging to the same cluster are represented by their prototype vector (the so-called centroid). The similarity between two clusters may be measured by the cosine of the angle between the centroids representing them or by the Euclidean distance between the centroids. The problem of how to estimate the number of clusters, groups, dimensions, etc. is a pervasive one in multivariate analysis. If there are no a priori theoretical reasons, such decisions tend to remain somewhat arbitrary.
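The two geometric quantities described above can be illustrated with a short sketch. It is not the authors' implementation: the function names and the sensor readings are hypothetical, each smell group is treated as a single cluster (so no clustering step is shown), and the mapping of these quantities onto the 0–1 similarity scale is omitted because this passage does not specify it.

```python
import math
from itertools import product


def mean_pairwise_distance(group_a, group_b):
    """Mean Euclidean distance over all cross-group pairs of sample vectors."""
    distances = [math.dist(a, b) for a, b in product(group_a, group_b)]
    return sum(distances) / len(distances)


def centroid(group):
    """Component-wise mean of a group of equal-length vectors."""
    n = len(group)
    return [sum(component) / n for component in zip(*group)]


def centroid_cosine(group_a, group_b):
    """Cosine of the angle between the centroids of two groups."""
    ca, cb = centroid(group_a), centroid(group_b)
    dot = sum(x * y for x, y in zip(ca, cb))
    return dot / (math.hypot(*ca) * math.hypot(*cb))


def centroid_distance(group_a, group_b):
    """Euclidean distance between the centroids of two groups."""
    return math.dist(centroid(group_a), centroid(group_b))


# Hypothetical two-sensor readings for two smells (illustrative values only).
smell_a = [[3.0, 4.0], [3.0, 4.0]]
smell_b = [[0.0, 0.0], [6.0, 8.0]]

print(mean_pairwise_distance(smell_a, smell_b))
print(centroid_cosine(smell_a, smell_b))
print(centroid_distance(smell_a, smell_b))
```

In this made-up example both groups share the same centroid, so the centroid-based quantities report a perfect match while the mean pairwise distance does not. This shows that the two geometric descriptions can disagree on the same data.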
In cluster analysis and multi-dimensional scaling, decisions based upon visual inspection of results are common. The results presented in the paper show the weaknesses and limitations of both geometrical measures, especially their small margins (small ranges of the similarity measure for similar and dissimilar smells).

To counteract this disadvantage, we have considered measures based on statistical principles, which take into account the distributional similarity between groups of data samples. All sensor

0925-4005/$ – see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.snb.2007.09.050