ACEEE Int. J. on Signal & Image Processing, Vol. 02, No. 01, Jan 2011
© 2011 ACEEE
DOI: 01.IJSIP.02.01.195
Feature Selection using Stepwise ANOVA
Discriminant Analysis for Mammogram Mass
Classification
B. Surendiran¹, A. Vadivel²
¹surendiran@gmail.com, ²vadi@nitt.edu
Abstract—In this paper, a feature selection method using
stepwise Analysis Of Variance (ANOVA) Discriminant
Analysis (DA) is used for classifying mammogram masses.
This approach combines the 17 shape and margin properties
of the mass regions and classifies the masses as benign or
malignant using ANOVA DA. ANOVA DA provides the Wilks'
lambda statistic for each feature along with its level of significance.
In ANOVA DA, the discriminating power of each feature is
estimated with respect to the grouping class variable. Principal
component analysis (PCA) performs feature extraction, but it
does not consider the grouping class variable. The experiment
is performed on 300 mammogram images from the DDSM database.
Stepwise ANOVA DA and PCA are both used to reduce the
number of features. Feature selection using stepwise ANOVA DA
performs better because it analyses the data according to the
grouping class variable. Stepwise ANOVA DA based feature
selection yields a reduced feature set with high classification
accuracy.
Keywords—Discriminant analysis, Digital Mammogram,
Shape and margin properties, Classifying Mass as Benign or
Malignant, Stepwise ANOVA, PCA
I. INTRODUCTION
Breast cancer is the leading cause of death in the female
population. Every 3 minutes a woman is diagnosed with
breast cancer, and every 13 minutes a woman dies from
breast cancer [1]. Mammography is one of the best-known
techniques for early breast cancer detection. Breast cancer
death rates have been dropping steadily since 1995 due to
earlier detection and increased use of mammography [1].
Computer Aided Detection (CAD) systems have been
developed to aid radiologists in diagnosing cancer from
digital mammograms, and they improve the breast cancer
diagnostic accuracy rate by 14.2% [2].
In the breast, both malignant and benign masses are abnormal
growths of cells. Malignant masses are cancerous tumors,
whereas benign masses are non-cancerous. According to the Breast
Imaging Reporting and Data System (BI-RADS), masses
are characterized by shape, size, margins (borders) and
density [3]. Benign masses are round or oval in shape and
have smooth, circumscribed margins. Malignant masses
have an irregular shape and ill-defined, microlobulated or
spiculated margins. Shape and margin characteristics can
therefore be used effectively for classifying masses as
either benign or malignant. Some of the known approaches
that classify abnormalities using shape and margin
properties based on the BI-RADS system have given
accurate results [4, 5]. Thus, in this paper, mass shape and
margin properties are given high importance. These simple
yet effective geometric shape and margin properties
describe the masses in the way radiologists view them in
mammograms.
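To make the contrast between smooth and spiculated outlines concrete, the sketch below computes circularity, 4πA/P², for two synthetic contours. Circularity is a widely used shape descriptor, but this section does not state that it is one of the 17 properties used in the paper; the function names and both contours are hypothetical and serve only as an illustration.

```python
import math


def polygon_area_perimeter(points):
    """Shoelace area and perimeter of a closed polygon given as (x, y) pairs."""
    twice_area = 0.0
    perimeter = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the contour
        twice_area += x0 * y1 - x1 * y0
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return abs(twice_area) / 2.0, perimeter


def circularity(points):
    """4*pi*A / P^2: equals 1 for a circle, smaller for irregular outlines."""
    area, perimeter = polygon_area_perimeter(points)
    return 4.0 * math.pi * area / perimeter ** 2


def radial_contour(radii):
    """Build a closed contour from radii sampled at equally spaced angles."""
    n = len(radii)
    return [(r * math.cos(2 * math.pi * k / n), r * math.sin(2 * math.pi * k / n))
            for k, r in enumerate(radii)]


# Synthetic outlines (assumptions, not DDSM data):
round_mass = radial_contour([1.0] * 64)        # smooth, nearly circular outline
spiky_mass = radial_contour([1.0, 0.4] * 8)    # star-like, spiculated outline

round_score = circularity(round_mass)
spiky_score = circularity(spiky_mass)
```

Under these assumptions the round outline scores close to 1 and the star-like outline scores far lower, which matches the qualitative BI-RADS description of benign and malignant masses.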
Researchers have proposed various features for
classifying masses in mammograms. Statistical features
such as uniformity, smoothness and third moments, which
use the gray values or histogram of the mass, have been used
for classifying masses [6]. However, the gray values of a
mammogram tend to change due to over-enhancement or in
the presence of noise. Most existing work has
concentrated on classifying a mass as normal or abnormal
using shape features [7, 8], and most previous
approaches that classify a mass as benign or malignant
have not achieved a high classification rate. In [9], a
complex Bayesian Neural Network classifier with 5
statistical measures was used to classify the masses.
The test was carried out on a small dataset containing
only 17 sample mammograms and achieved a maximum
accuracy of 81%.
In this paper, 17 shape and margin properties are used
for classifying a mass as either benign or malignant. Not
all of these properties are equally important, so the
dimension (number of features) can be reduced, which
simplifies the classification. Dimension reduction
techniques are of two kinds: feature selection and feature
extraction. PCA is a commonly used feature extraction
method in the literature [10-12]. The main disadvantage of
PCA is that it does not consider the grouping class
variable. A feature selection method using stepwise
ANOVA discriminant analysis is compared with PCA. The
main advantage of ANOVA based feature selection is that
ANOVA estimates the Wilks' lambda statistic based on the
grouping class variable. It selects the essential features
rather than extracting new ones, without much loss in
classification accuracy. Stepwise ANOVA DA is found
to give better results than PCA. This paper is
organized as follows. Section II presents feature
extraction using shape properties. Section III discusses
the ANOVA discriminant analysis classification
method. Section IV presents the experimental results
using PCA and the stepwise ANOVA DA feature selection
method. Section V concludes the paper.
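As a minimal sketch of why the grouping class variable matters, the code below computes the univariate Wilks' lambda of a single feature, which is the within-group sum of squares divided by the total sum of squares. Values near 0 indicate that the class means differ strongly, and values near 1 indicate no separation. This covers only per-feature screening, not the full stepwise procedure or its significance test. The function names, feature names and toy values are assumptions made for illustration and are not taken from the DDSM experiments.

```python
from statistics import fmean


def wilks_lambda(values, labels):
    """Univariate Wilks' lambda: within-group SS divided by total SS."""
    grand_mean = fmean(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        return 1.0  # a constant feature cannot separate the classes

    # Collect the feature values of each class (the grouping variable).
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    within_ss = 0.0
    for members in groups.values():
        group_mean = fmean(members)
        within_ss += sum((v - group_mean) ** 2 for v in members)

    return within_ss / total_ss


def rank_features(table, labels):
    """Return (name, lambda) pairs sorted so the most discriminating is first."""
    scores = {name: wilks_lambda(column, labels) for name, column in table.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Toy data invented for illustration only.
labels = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]
table = {
    "feature_a": [1.0, 1.2, 0.8, 3.0, 3.2, 2.8],  # class means differ clearly
    "feature_b": [1.0, 3.0, 2.0, 1.0, 3.0, 2.0],  # identical in both classes
}
ranking = rank_features(table, labels)
```

In this toy case `feature_a` is ranked first because almost all of its variance lies between the two classes, while `feature_b` has a lambda of 1. PCA would look only at the variance of each column and ignore `labels` entirely.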
II. MASS SHAPE AND MARGIN FEATURE EXTRACTION
A. Mass Shape Characteristics
According to the BI-RADS system, the shapes of masses
are characterized as round, oval, lobular, and irregular.
Similarly, the margins of masses are characterized as
circumscribed, obscured, micro-lobulated, and
spiculated. Benign masses have round, oval and lobular
Multimedia Information Retrieval Group, National Institute of Technology, Tiruchirappalli, India