ACEEE Int. J. on Signal & Image Processing, Vol. 02, No. 01, Jan 2011
© 2011 ACEEE
DOI: 01.IJSIP.02.01.195

Feature Selection using Stepwise ANOVA Discriminant Analysis for Mammogram Mass Classification

B. Surendiran 1, A. Vadivel 2
1 surendiran@gmail.com, 2 vadi@nitt.edu

Abstract: In this paper, a feature selection method using stepwise Analysis Of Variance (ANOVA) Discriminant Analysis (DA) is used for classifying mammogram masses. This approach combines 17 shape and margin properties of the mass regions and classifies the masses as benign or malignant using ANOVA DA. ANOVA DA provides the Wilks' lambda statistic for each feature along with its level of significance. In ANOVA DA, the discriminating power of each feature is estimated with respect to the grouping class variable. Principal component analysis (PCA) performs feature extraction, but it does not consider the grouping class variable. The experiment is performed on 300 mammogram images from the DDSM database. Stepwise ANOVA DA and PCA are both used as dimension reduction methods to reduce the number of features. Feature selection using stepwise ANOVA DA performs better because it analyses the data according to the grouping class variable. Stepwise ANOVA DA based feature selection yields a reduced feature set with high classification accuracy.

Keywords: Discriminant analysis, Digital Mammogram, Shape and margin properties, Classifying Mass as Benign or Malignant, Stepwise ANOVA, PCA

I. INTRODUCTION

Breast cancer is a leading cause of death among women. Every 3 minutes a woman is diagnosed with breast cancer, and every 13 minutes a woman dies from it [1]. Mammography is one of the best-known techniques for early breast cancer detection. Breast cancer death rates have been dropping steadily since 1995 due to earlier detection and increased use of mammography [1].
Computer Aided Detection (CAD) systems have been developed to aid radiologists in diagnosing cancer from digital mammograms, and they have been reported to improve the breast cancer diagnostic accuracy rate by 14.2% [2]. In the breast, both malignant and benign masses are abnormal growths of cells. Malignant masses are cancerous tumors, whereas benign masses are non-cancerous. According to the Breast Imaging Reporting and Data System (BI-RADS), masses are characterized by shape, size, margins (borders) and density [3]. Benign masses are round or oval in shape and have smooth, circumscribed margins. Malignant masses have an irregular shape and ill-defined, microlobulated or spiculated margins. It has been observed that shape and margin characteristics can be used effectively for classifying masses as either benign or malignant. Some known approaches that classify abnormalities using shape and margin properties based on the BI-RADS system have given accurate results [4, 5]. Thus, in this paper, mass shape and margin properties are given high importance. These simple yet effective geometric shape and margin properties describe masses in the way radiologists view them in mammograms.

Researchers have proposed various features for classifying masses in mammograms. Statistical features such as uniformity, smoothness and third moment, which rely on the gray values or histogram of the mass, have been used for classification [6]. However, the gray values of a mammogram tend to change due to over-enhancement or in the presence of noise. Most existing work has concentrated on classifying a mass as normal or abnormal using shape features [7, 8]. However, most previous approaches that classify a mass as benign or malignant do not achieve a high classification rate. In [9], a complex Bayesian neural network classifier with five statistical measures was used to classify the masses.
That test was carried out on a small dataset containing only 17 sample mammograms and achieved a maximum accuracy of 81%.

In this paper, 17 shape and margin properties are used for classifying a mass as either benign or malignant. It has been observed that not all of the properties are equally important. The dimension, that is, the number of features, can therefore be reduced, which simplifies the classification. Dimension reduction techniques fall into two groups: feature selection and feature extraction. PCA is the most commonly used feature extraction method in the literature [10-12]. The main disadvantage of PCA is that it does not consider the grouping class variable. A feature selection method based on stepwise ANOVA discriminant analysis is therefore compared with PCA. The main advantage of ANOVA based feature selection is that ANOVA estimates the Wilks' lambda statistic based on the grouping class variable. It performs feature selection rather than feature extraction, without much loss in classification accuracy. Stepwise ANOVA DA is found to give better results than PCA.

This paper is organized as follows. In Section II, we present feature extraction using shape properties. In Section III, we discuss the ANOVA discriminant analysis classification method. In Section IV, we present the experimental results using PCA and the stepwise ANOVA DA feature selection method. In Section V, we conclude the paper.

Authors' affiliation: Multimedia Information Retrieval Group, National Institute of Technology, Tiruchirappalli, India

II. MASS SHAPE AND MARGIN FEATURE EXTRACTION

A. Mass Shape Characteristics

According to the BI-RADS system, the shapes of masses are characterized as round, oval, lobular, and irregular. Similarly, the margins of masses are characterized as circumscribed, obscured, microlobulated, and spiculated. Benign masses have round, oval and lobular