Block truncation coding with color clumps: A novel feature extraction technique for content based image classification

SUDEEP THEPADE 1, RIK DAS 2,* and SAURAV GHOSH 3

1 Department of Information Technology, Pimpri Chinchwad College of Engineering, Pune 411044, India
2 Department of Information Technology, Xavier Institute of Social Service, Ranchi, Jharkhand 834001, India
3 A.K. Choudhury School of Information Technology, University of Calcutta, Kolkata 700 009, India
e-mail: sudeepthepade@gmail.com; rikdas78@gmail.com; sauravghoshcu@gmail.com

MS received 15 June 2014; revised 5 January 2016; accepted 21 March 2016

Abstract. This paper explores the principle of block truncation coding (BTC) as a means of feature extraction for content based image classification. A variation of block truncation coding, named BTC with color clumps, has been implemented in this work to generate feature vectors. Classification performance with the proposed feature extraction technique has been compared to that of existing techniques. Two widely used public datasets, the Wang dataset and the Caltech dataset, have been used to analyse and compare classification performance on four different metrics. The study has established BTC with color clumps as an effective alternative to existing feature extraction methods. The experiments were carried out in RGB color space. Two different categories of classifiers, viz. the K Nearest Neighbor (KNN) classifier and the RIDOR classifier, were used to measure classification performance. A paired t test was conducted to establish the statistical significance of the findings. The classifier algorithms were evaluated in receiver operating characteristic (ROC) space.

Keywords. Classification; retrieval; color clumps; threshold; RGB; t test.

1. Introduction

The massive amount of image data generated every day is stored and maintained in digital form in image databases.
These huge datasets may be explored to find correlations among the images and to use such correlations to create a limited number of image categories inside the image database [1]. Generating image groups provides a crisp digest of the image content, which can serve as an effective means of image database management [2]. Thus, designing an efficient algorithm for extracting feature vectors from images is a fundamental necessity for image classification. A new BTC based feature extraction algorithm, named BTC with Color Clumps, is proposed in this work for content based image classification. This paper compares the classification performance of existing feature extraction techniques with that of the proposed one in two different classifier environments. A paired t test was conducted to establish the statistical significance of the results. The quantitative comparison has established the superiority of the proposed technique. The objective of the paper is to introduce a novel feature extraction technique. The novel technique considerably decreases the feature vector size and makes it independent of image dimension. The proposed method reduces the average time of feature extraction from each image in the dataset. The improvement in classification performance is statistically significant.

2. Related work

The primary step in classification is feature extraction from image data out of an assorted mix of images. Threshold selection has been considered an efficient tool for feature extraction from images after binarization. Selecting a threshold is imperative for differentiating the background of an image from its foreground. A number of factors, including ambient illumination, variance of gray levels within the object and the background, inadequate contrast, etc., can influence the threshold selection process [35].
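As an illustration of how thresholding can yield features after binarization, the following minimal Python sketch binarizes each color channel at its own mean and summarizes the pixels on each side of the threshold. It is a generic mean-threshold example under stated assumptions, not the BTC with color clumps method proposed in this paper. The function names `mean_threshold_features` and `rgb_feature_vector`, and the representation of a channel as a list of rows, are hypothetical choices made only for this sketch.

```python
from statistics import mean


def mean_threshold_features(channel):
    """Binarize one color channel at its mean and summarize each side.

    `channel` is a 2-D list of intensities (rows of pixel values);
    this layout is an assumption made for the sketch.
    Returns (threshold, bitmap, upper_mean, lower_mean).
    """
    pixels = [p for row in channel for p in row]
    threshold = mean(pixels)
    # Bitmap: 1 where the pixel is at or above the threshold, else 0.
    bitmap = [[1 if p >= threshold else 0 for p in row] for row in channel]
    upper = [p for p in pixels if p >= threshold]
    lower = [p for p in pixels if p < threshold]
    # A constant channel has no pixel below its mean; fall back to the threshold.
    upper_mean = mean(upper) if upper else threshold
    lower_mean = mean(lower) if lower else threshold
    return threshold, bitmap, upper_mean, lower_mean


def rgb_feature_vector(red, green, blue):
    """Concatenate (upper_mean, lower_mean) of the R, G and B channels."""
    features = []
    for channel in (red, green, blue):
        _, _, upper_mean, lower_mean = mean_threshold_features(channel)
        features.extend([upper_mean, lower_mean])
    return features
```

In this sketch the feature vector always has six entries, whatever the image size, because each channel contributes only two summary values.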
At the outset, threshold selection can be carried out with three different techniques, namely mean threshold selection, local threshold selection and global threshold selection. Image classification has been executed earlier by considering mean threshold selection and using block

*For correspondence
Sādhanā Vol. 41, No. 9, September 2016, pp. 939–958 © Indian Academy of Sciences
DOI 10.1007/s12046-016-0535-2