International Journal of Computer Applications (0975 – 8887)
Volume 57 – No. 15, November 2012

Content based Color Image Clustering

Manish Maheshwari
Mahesh Motwani, PhD.
Rajiv Gandhi Technical University, Bhopal, Madhya Pradesh, India
Sanjay Silakari, PhD.

ABSTRACT
Never before in history has image data been generated at such high volumes as it is today. If images are analyzed properly, they can reveal useful information to users. Image mining deals with the extraction of implicit knowledge, image data relationships, or other patterns not explicitly stored in the images. Image clustering involves extracting features from image databases and then applying a data mining algorithm to group the images. In this paper, a data mining approach to clustering images using color and texture features is proposed. Three techniques are proposed to extract the color feature: Color Moments, the Block Truncation Coding algorithm, and a histogram method. To extract the texture feature, the concept of the Gray Level Co-occurrence Matrix is extended and applied to color images. The K-means clustering algorithm is applied to group the images.

Keywords
Image Retrieval, Histogram, Color Moments, Gray Level Co-occurrence Matrix, K-Means

1. INTRODUCTION
Rapid progress in information technology for multimedia systems has led to a sharp increase in the use of digital images. These image collections hold a great deal of information that is potentially useful in a variety of applications such as crime prevention, military, home entertainment, education, cultural heritage, medical diagnosis, and the World Wide Web [1, 2]. The major challenge is how to make use of this information effectively and efficiently. Exploring and analyzing the enormous volume of image data is becoming difficult. An image database containing raw image data cannot be used directly for retrieval. Raw image data need to be processed, and descriptions based on the properties inherent in the images themselves must be generated.
Color and texture are two very important attributes used in image analysis. First-order image properties can be successfully determined using color information. Texture generally describes a second-order property of surfaces and scenes measured over image intensities. These inherent properties of the images are stored in a feature database, which is used for retrieval and grouping.

Clustering is a method of grouping data objects into different groups, such that similar data objects belong to the same group and dissimilar data objects belong to different groups [3, 4]. Image clustering consists of two steps: the first is feature extraction and the second is grouping. For each image in a database, a feature vector capturing certain essential properties of the image is computed and stored in a feature base. A clustering algorithm is then applied over these extracted features to form the groups.

In this paper, a data mining approach to clustering images based on color and texture features is proposed. To extract the color feature, three techniques are used separately. Color Moments are used to calculate the mean, standard deviation and skewness. A new approach, Block Truncation Image Mining (BTIM), is proposed, using the concept of Block Truncation Coding to extract the color feature. A new histogram quantization algorithm, Histogram Image Mining (HIM), is proposed to calculate a 54-color histogram. The Gray Level Co-occurrence Matrix is a texture analysis technique that has been defined for grayscale images. To extract the texture feature, we propose a simple extension of this technique to color images, referred to as Co-occurrence Matrix Image Mining (CMIM). The co-occurrence matrix of each component of the RGB image is calculated and features are extracted from it. The K-means clustering algorithm is applied over these extracted features to form groups of the images.

2. IMAGE RETRIEVAL
In image retrieval, feature extraction is the process of interacting with images and extracting meaningful information from them.
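As a concrete illustration of feature extraction, the color moments named in the introduction (mean, standard deviation and skewness) could be computed per RGB channel roughly as follows. This is a minimal sketch, not the authors' implementation: the function name `color_moments`, the use of NumPy, and the choice of the cube root of the third central moment as the skewness measure are all assumptions made for illustration.

```python
import numpy as np


def color_moments(image):
    """Return a 9-element feature vector for an H x W x 3 image:
    [mean_R, mean_G, mean_B, std_R, std_G, std_B, skew_R, skew_G, skew_B].

    Hypothetical helper for illustration only.
    """
    # Flatten the spatial dimensions so each row is one RGB pixel.
    pixels = np.asarray(image, dtype=np.float64).reshape(-1, 3)

    mean = pixels.mean(axis=0)
    centered = pixels - mean

    # Population standard deviation (second central moment, square-rooted).
    std = np.sqrt((centered ** 2).mean(axis=0))

    # Assumption: skewness taken as the signed cube root of the third
    # central moment, so it stays in the same units as mean and std.
    skew = np.cbrt((centered ** 3).mean(axis=0))

    return np.concatenate([mean, std, skew])


if __name__ == "__main__":
    # Synthetic 4 x 4 RGB image standing in for a database image.
    rng = np.random.default_rng(0)
    demo = rng.integers(0, 256, size=(4, 4, 3))
    print(color_moments(demo))
```

Computing one such vector per image and stacking the vectors row by row would give the kind of feature base over which a clustering algorithm can then be run.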
The measurements or properties used to classify objects are called features, and the types or categories into which they are classified are called classes. Low-level visual features such as color, texture and shape are often employed to search for relevant images based on a query image. An n-dimensional feature vector represents an image, where n is the number of extracted features selected.

Color information is the most widely used feature for image retrieval because of its strong correlation with the underlying image objects. A commonly used color space is RGB, because most digital images are acquired and represented in this space. However, because RGB space is not perceptually uniform, color spaces such as HSV (Hue, Saturation, and Value), HSL (Hue, Saturation and Luminance), CIE L*u*v* and CIE L*a*b* tend to be more appropriate for calculating color similarities. The color histogram [1, 5, 6] is a common and very popular color feature used in many image retrieval systems. The color distribution of images can be characterized mathematically by color moments [7]. Color Coherence Vectors (CCV) have been proposed to incorporate spatial information into a color histogram representation [8].

3. HISTOGRAM
The brightness histogram h_f(z) of an image gives the frequency of the brightness value z in the image; the histogram of an image with L gray levels is represented by a one-dimensional array with L elements. The histogram usually provides global information about the image. It is invariant to translation and to rotation around the viewing axis, and varies slowly with changes of view angle and scale. However the huge number of colors involved in high resolution images induces prohibitive computation costs