Journey on Image Clustering Based on Color Composition

Achmad Nizar Hidayanto*, Elisabeth Martha Koeanan*

Abstract—Image clustering is the process of grouping images based on their similarity. It usually relies on color, texture, edge, or shape features, or on a mixture of them. This research explores image clustering using color composition. Three main components must be considered: the color space, the image representation (feature extraction), and the clustering method itself. We aim to find which combination of these components produces the best clustering results by combining various techniques from each of the three. The color spaces considered are RGB, HSV, and L*a*b*. The image representations are Histogram and Gaussian Mixture Model (GMM), and the clustering methods are K-Means and Agglomerative Hierarchical Clustering. The experimental results show that the GMM representation works better with the RGB and L*a*b* color spaces, whereas the Histogram works better with HSV. The experiments also show that K-Means performs better than Agglomerative Hierarchical Clustering for image clustering.

Keywords—Image clustering, feature extraction, RGB, HSV, L*a*b*, Gaussian Mixture Model (GMM), histogram, Agglomerative Hierarchical Clustering (AHC), K-Means, Expectation-Maximization (EM).

I. INTRODUCTION

Indonesia is a country rich in cultural heritage. One example is Batik cloth, which comes in various patterns and colors. As part of cultural preservation, there is a growing need for a repository that serves as a reference collection of Batik. Such a repository requires functions such as image retrieval to help users automatically search for a particular cloth. However, retrieving images from a repository is quite time consuming, as the system must process a large amount of image data.
To improve efficiency and give images better semantics, some researchers, such as Chen [1], Liu [2], Guan [3], Kim [4], Park [5], Liu [6], and Fakouri [7], apply a clustering algorithm to organize images before they are retrieved. Image clustering is the process of grouping images based on their similarity. Once images are clustered, the retrieval process does not need to examine them one by one to match the user query. The system only needs to compare the user query with the centroid of each cluster and then return all images that belong to the matched cluster.

This research explores several components of image clustering methods in order to find the components that improve the quality of the clustering results. Image clustering based on image content usually relies on color composition, texture, edge, or shape, or on a mixture of them. This research focuses on image clustering using color, as previous research [8] shows that color composition produces the best results in Batik retrieval.

*Faculty of Computer Science, University of Indonesia

The three main components of image clustering considered in this research are the color space, the image representation (feature extraction), and the clustering method. The color spaces are RGB, HSV, and L*a*b*. The image representations are Histogram and Gaussian Mixture Model (GMM), and the clustering methods are K-Means and Agglomerative Hierarchical Clustering. We expect this research to identify the combination of methods, one from each component, that produces the best clustering results.

Our contributions in this research are twofold:
• First, we compare image clustering algorithms based on color composition in a comprehensive manner, taking into account the components that influence them: color space, image representation, and clustering method.
• Second, we evaluate which combination produces the best image clustering results.

In the next section we present a brief overview of the background and related work. In Sections 3, 4, and 5, we present the theoretical foundations of this research: color space, image representation, and clustering methods, respectively. In Section 6, we describe our experimental scenarios and analyze their results. Finally, we summarize our contributions and outline future work in Section 7.

Figure 1. Illustration of Image Clustering
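
The retrieval idea described in the introduction (compare the query with the cluster centroids, then return every image in the matched cluster) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in this research: the function names, the coarse 2-bins-per-channel RGB histogram, the naive K-Means initialisation, and the four toy "images" (short lists of RGB pixels) are all hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def color_histogram(pixels, bins=2):
    """Normalized joint RGB histogram; pixels is a list of (r, g, b) in 0..255."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each channel value to a bin index in 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[ri * bins * bins + gi * bins + bi] += 1.0
    return [h / len(pixels) for h in hist]


def nearest(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda k: dist(vec, centroids[k]))


def k_means(features, k, iterations=10):
    """Plain K-Means. Assumption: the first k features serve as initial centroids."""
    centroids = [list(f) for f in features[:k]]
    labels = [0] * len(features)
    for _ in range(iterations):
        labels = [nearest(f, centroids) for f in features]
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def retrieve(query_feature, centroids, labels):
    """Return indices of all images in the cluster whose centroid matches the query."""
    best = nearest(query_feature, centroids)
    return [i for i, lab in enumerate(labels) if lab == best]


# Hypothetical toy data: each "image" is a short list of RGB pixels.
red = [(250, 10, 10)] * 4
blue = [(10, 10, 250)] * 4
mostly_red = [(240, 20, 20)] * 3 + [(10, 10, 250)]
mostly_blue = [(20, 20, 240)] * 3 + [(250, 10, 10)]

features = [color_histogram(img) for img in (red, blue, mostly_red, mostly_blue)]
centroids, labels = k_means(features, k=2)

query = color_histogram([(255, 0, 0)] * 4)  # a pure red query image
matches = retrieve(query, centroids, labels)
print(matches)  # indices of the images in the matched cluster
```

The query is compared against only two centroids rather than all four images, which is the efficiency argument made above. With a large repository the saving grows with the ratio of images to clusters, at the cost of returning the whole matched cluster rather than a ranked list.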