Comparison of Euclidean Distance Function and Manhattan Distance Function Using K-Medoids

Md. Mohibullah, Student (M.Sc. - Thesis), Department of Computer Science & Engineering, Comilla University, Comilla, Bangladesh
Md. Zakir Hossain, Assistant Professor, Department of Computer Science & Engineering, Comilla University, Comilla, Bangladesh
Mahmudul Hasan, Assistant Professor, Department of Computer Science & Engineering, Comilla University, Comilla, Bangladesh

Abstract--Clustering is a form of unsupervised learning. K-medoids is one of the partitioning clustering algorithms, and it is also distance based. The distance measure is an important component of a clustering algorithm, as it determines how the distances between data points are computed. In this paper, a comparison between the Euclidean distance function and the Manhattan distance function is made using K-medoids. The comparison is carried out on an instance of a data set containing seven objects. The simulation results are presented in the result section of this paper.

Keywords-- Clustering, K-medoids, Manhattan distance function, Euclidean distance function.

I. INTRODUCTION

Unsupervised learning operates on a given set of records (e.g. observations or variables) that carry no class attribute, and organizes them into groups without advance knowledge of the definitions of those groups [1]. Clustering is one of the most important unsupervised learning techniques. Clustering, also known as cluster analysis, aims to organize a collection of data items into clusters such that items within a cluster are more "similar" to each other than they are to items in other clusters [2]. Clustering methods can be divided into two basic types: hierarchical and partitioning clustering [3]. There are many partition-based algorithms, such as K-means, K-medoids and Fuzzy C-means. The K-means method uses the centroid to represent a cluster, and it is sensitive to outliers.
This means that a data object with an extremely large value may distort the distribution of the data. The K-medoids method overcomes this problem by using medoids, rather than centroids, to represent clusters. A medoid is the most centrally located data object in a cluster [4].

II. THE REASON BEHIND CHOOSING THE K-MEDOIDS ALGORITHM

1. K-medoids is more flexible. First, K-medoids can be used with any similarity measure. K-means, however, may fail to converge: it should only be used with distances that are consistent with the mean. For example, absolute Pearson correlation must not be used with K-means, but it works well with K-medoids.

2. The medoid is robust. Second, the medoid used by K-medoids is roughly comparable to the median. It is a more robust estimate of a representative point than the mean used in K-means.

III. K-MEDOIDS ALGORITHM (PAM - PARTITIONING AROUND MEDOIDS)

Algorithm [4, 6]

Input:
K: the number of clusters
D: a data set containing n objects

Output: a set of K clusters.

Method:
1. Compute the distance (cost) so as to associate each data point with its nearest medoid, using the Manhattan distance and/or the Euclidean distance.
2. For each medoid m:
   2.1. For each non-medoid data point o:
      2.1.1. Swap m and o and compute the total cost of the configuration.
3. Select the configuration with the lowest cost.
4. Repeat steps 1 to 3 until there is no change in the medoids.

International Journal of Computer Science and Information Security (IJCSIS), Vol. 13, No. 10, October 2015, https://sites.google.com/site/ijcsis/ ISSN 1947-5500
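The PAM steps above can be sketched in a few lines of Python. The following is a minimal illustration only, not the authors' implementation: the seven-point data set, the choice of k = 2, and the naive initialization (first k objects as medoids) are assumptions made for demonstration. Both distance functions discussed in the paper are included so the same routine can be run with either.

```python
# Minimal PAM (Partitioning Around Medoids) sketch. Illustrative only;
# data set, k, and initialization are assumptions, not from the paper.
from itertools import product

def manhattan(a, b):
    # Manhattan (city-block) distance: sum of absolute coordinate differences
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    # Euclidean (straight-line) distance
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def total_cost(data, medoids, dist):
    # Step 1: associate each point with its nearest medoid, sum the distances
    return sum(min(dist(p, m) for m in medoids) for p in data)

def pam(data, k, dist):
    medoids = list(data[:k])              # naive initial medoids (assumption)
    best = total_cost(data, medoids, dist)
    improved = True
    while improved:                       # step 4: repeat until no change
        improved = False
        # steps 2-3: try every (medoid, non-medoid) swap, keep the cheapest
        for i, o in product(range(k), data):
            if o in medoids:
                continue
            candidate = medoids[:i] + [o] + medoids[i + 1:]
            cost = total_cost(data, candidate, dist)
            if cost < best:
                best, medoids, improved = cost, candidate, True
    return medoids, best

# Hypothetical seven-object data set of 2-D points (for illustration only)
points = [(2, 6), (3, 4), (3, 8), (4, 7), (6, 2), (6, 4), (7, 3)]
print(pam(points, 2, manhattan))
print(pam(points, 2, euclidean))
```

Running the routine twice, once per distance function, mirrors the comparison this paper performs: the swap loop is identical, and only the cost computed in step 1 changes.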