A Bag of Words Based Approach for Classification of HEp-2 Cell Images

Shahab Ensafi *†, Shijian Lu †, Ashraf A. Kassim * and Chew Lim Tan ‡

* Electrical & Computer Engineering, National University of Singapore, Email: {shahab.ensafi, ashraf}@nus.edu.sg
† Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), Email: {stuse,slu}@i2r.a-star.edu.sg
‡ School of Computing, National University of Singapore, Email: tancl@comp.nus.edu.sg

Abstract—In this work we present an automatic HEp-2 cell image classification technique that exploits image representations at different spatial scales and sparse coding of SIFT and SURF features. The proposed method is applied to the ICIP2013 dataset in the I3A workshop, which is held at the ICPR 2014 conference. Experiments are designed to measure the accuracy on the training set using cross-validation. Additionally, the prior information on positive and intensity levels of cells is used to boost the overall performance. Finally, different numbers of dictionary learning iterations are studied to find the optimum one.

I. INTRODUCTION

Diagnosis of Autoimmune Diseases (AD) that affect the neuromuscular system, the hepatobiliary system, vasculitic syndromes, etc. plays an important role in AD treatment. With the increasing occurrence of AD, an automated Computer Aided Diagnosis (CAD) system is required for its benefits of lower cost, faster diagnosis, and better diagnostic repeatability. Indirect Immunofluorescence (IIF) is a technique for diagnosing several autoimmune diseases, where antibodies are first stained in a tissue and then bound to a fluorescent chemical compound. In the case of antinuclear antibodies (ANAs), the antibodies bind to the nucleus and exhibit different visual patterns that can be captured and visualized in microscope images.
In this regard, IIF is applied to Human Epithelial Cells type 2 (HEp-2 cells), where the presence of AD can be determined by classifying the captured visual cell patterns.

A number of HEp-2 cell classification techniques have been reported in recent years. Nosaka et al. [1], the winner of the ICPR 2012 cell classification contest, made use of an extension of local binary patterns (LBP) for feature selection and linear support vector machines (SVM) for cell classification [2]. Faraki et al. [3] exploited Fisher tensors on the Riemannian manifold, which produce a covariance matrix of the same size for every region in the image. They learned a Bag of Words (BoW) dictionary using the k-means algorithm and used SVM for classification. Shen et al. [4] followed the BoW method by pooling the gradient features based on the intensity orders of local grid points. Additionally, Wiliem et al. [5] used the Cell Pyramid Matching (CPM) method, which is composed of regional histograms of visual words coupled with the Multiple Kernel Learning framework.

This paper presents an accurate and efficient HEp-2 cell classification system that can be exploited for computer-aided AD diagnosis. The proposed system makes a number of novel contributions. First, speeded up robust features (SURF), which capture different visual characteristics but have rarely been used for cell classification, are introduced and integrated with the widely used SIFT features to help improve the cell classification accuracy greatly. Second, dictionary learning is investigated for sparse coding of visual features. Our study shows that the number of dictionary learning iterations is closely correlated with the cell classification accuracy: the best accuracy is obtained at a certain number of iterations, which can be neither too large nor too small. Third, a multi-scale sparse coding scheme is implemented to exploit the sparse nature of the cell image data, and a max pooling strategy is adopted to handle the loose spatial information.
Both strategies help lower the reconstruction error and reduce the computational cost significantly. Experiments on a publicly available dataset show that the proposed system improves the cell classification accuracy greatly.

II. METHOD

As illustrated in Fig. 1, the method has two stages: training and testing. In the training stage, the masked cell images are used to extract the grid SIFT and SURF features. Dictionary learning then starts from samples drawn from each image. In this stage, sparse coding is used to capture the sparse property of patch-based feature extraction. Then, by applying the max-pooling procedure to the scaled sparse codes and vectorizing the histograms, the final feature vector of each image is obtained. Finally, a multiclass linear SVM classifier is trained, which is also used in the testing stage. The proposed method is similar to that of Ensafi et al. [6], with a modified feature extraction stage in which grid SURF features are added.

A. Feature Extraction

We use SIFT [7] and SURF [8] features to capture the appearance characteristics of the different types of HEp-2 cells. Unlike SIFT, which builds on image pyramids, SURF uses the Hessian matrix instead of image down-sampling and smoothing for feature detection. As a result, SURF demonstrates better performance in the presence of illumination changes, whereas SIFT performs better in the presence of image rotation and blur [9]. The two types of features therefore complement each other, and their combination produces features with better representativeness and discriminability.
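
To make the pooling step of the training stage in Section II more concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that each grid patch already has a sparse code (for example, computed from its SIFT or SURF descriptor against a learned dictionary) and a pixel position. The function name max_pool_pyramid, the grid levels, and the use of absolute values before taking the maximum are illustrative assumptions that are not stated in the text above.

```python
from typing import List, Sequence, Tuple


def max_pool_pyramid(
    codes: Sequence[Sequence[float]],
    positions: Sequence[Tuple[float, float]],
    image_size: Tuple[int, int],
    levels: Sequence[int] = (1, 2, 4),  # assumed grid resolutions per scale
) -> List[float]:
    """Pool per-patch sparse codes into one image-level feature vector.

    codes:      one sparse code (K coefficients) per grid patch
    positions:  (x, y) centre of each patch, in pixels
    image_size: (width, height) of the cell image
    """
    if not codes:
        raise ValueError("at least one sparse code is required")
    width, height = image_size
    k = len(codes[0])
    feature: List[float] = []
    for n in levels:
        # One pooled vector per cell of the n x n spatial grid.
        cells = [[0.0] * k for _ in range(n * n)]
        for code, (x, y) in zip(codes, positions):
            # Map the patch centre to its grid cell, clamping the border.
            col = min(int(x * n / width), n - 1)
            row = min(int(y * n / height), n - 1)
            cell = cells[row * n + col]
            for j, value in enumerate(code):
                # Max pooling over absolute coefficients (an assumption).
                cell[j] = max(cell[j], abs(value))
        for cell in cells:
            feature.extend(cell)  # vectorize by concatenating all cells
    return feature


# Hypothetical toy input: four patches, dictionary of three atoms.
codes = [[0.0, 0.9, 0.0], [-0.4, 0.0, 0.0], [0.0, 0.2, 0.7], [0.0, 0.0, -0.1]]
positions = [(10, 10), (50, 10), (10, 50), (50, 50)]
feature = max_pool_pyramid(codes, positions, image_size=(64, 64), levels=(1, 2))
print(len(feature))  # 15 = 3 atoms x (1 + 4) cells
```

The resulting vector has a fixed length regardless of how many patches an image contains, which is what allows a linear SVM to be trained on it. Taking the maximum within each cell keeps only the strongest response per dictionary atom, so the exact position of a patch inside a cell does not matter.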