758 IEEE SENSORS JOURNAL, VOL. 14, NO. 3, MARCH 2014
Dealing With Redundant Features and Inconsistent
Training Data in Electronic Nose:
A Rough Set Based Approach
Anil Kumar Bag, Bipan Tudu, Nabarun Bhattacharyya, and Rajib Bandyopadhyay
Abstract—In many applications of the electronic nose, the
instrument is trained with data generated by human experts prior to
its deployment in the field. Quite often, these data are conflicting
and inaccurate and thus the performance of an electronic nose is
degraded. Moreover, degradation of its performance may also be
due to the presence of redundant features or sensors in the array.
While deploying an electronic nose for a specific application, it
is observed that some of the sensors may not be required and
only a subset of the sensor array contributes to the decision,
which implies that optimization of the sensor array is also very
important. To obtain a consistent and precise data set, both the
conflicting data and irrelevant features must be removed.
Rough set theory is capable of dealing with such imprecise,
inconsistent data sets. In this paper, a rough-set-based
algorithm is applied to remove the conflicting training
patterns and to optimize the sensor array of an electronic nose
instrument used for sensing the aroma of black tea samples.
Index Terms— Black tea, electronic nose, feature selection,
reduct, rough set, sensor array.
I. INTRODUCTION
AN ELECTRONIC nose [1] comprises a sensor array
with its associated electronic circuitry, an odour delivery
system and a pattern recognition unit. The knowledge base
in the pattern recognition unit consists of feature information
obtained from the electrical or other types of responses
produced by the sensors in the array. These sensor responses,
expressed as numerical data patterns, contain a signature that
is related to some features of the exposed substance. The
sensors in any application-specific instrument, such as an electronic
nose or electronic tongue, should be sufficiently reliable, robust,
selective and reversible to guarantee satisfactory classification.
Unfortunately, limitations in data collection, the high complexity
of sensory inputs, transient effects, and equipment failure
make it difficult to train the classifier with data that have
the desired characteristics in a statistically sufficient way.
Manuscript received June 6, 2013; revised September 25, 2013; accepted
October 8, 2013. Date of publication October 17, 2013; date of current version
January 10, 2014. This work was supported in part by the Department of
Science and Technology and in part by the National Tea Research Foundation,
Tea Board, Government of India. The associate editor coordinating the review
of this paper and approving it for publication was Dr. Ashish Pandharipande.
A. K. Bag is with the Department of Applied Electronics and Instrumenta-
tion Engineering, Future Institute of Engineering and Management, Calcutta
700150, India (e-mail: anilkumarbag@gmail.com).
B. Tudu and R. Bandyopadhyay are with the Department of Instrumentation
and Electronics Engineering, Jadavpur University, Calcutta 700098, India
(e-mail: bt@iee.jusl.ac.in; rb@iee.jusl.ac.in).
N. Bhattacharyya is with the Centre for Development of Advanced Com-
puting, Calcutta 700091, India (e-mail: nabarun.bhattacharya@cdac.in).
Digital Object Identifier 10.1109/JSEN.2013.2286110
On many occasions, there may be some redundant features in
the training patterns. In addition, an electronic nose
deployed for a particular application is trained with
data provided by human experts, and these patterns often contain
conflicting data due to human error. For example, when the
electronic nose is used for evaluation of tea quality, the quality
scores assigned by the human tea tasters are the target patterns
in the training data set. The quality scores are purely subjective
in nature and depend upon the mood and professional acumen
of the tea taster. Thus, the training patterns may contain
a number of irrelevant or redundant features, as well as patterns
with conflicting decisions, which make the representation
of the information inconsistent. Such a data set not only increases
time complexity, but also degrades classification accuracy. As
the effective information for classification often lies within
a lower dimensional feature space, feature extraction or
dimensionality reduction has proven to be a crucial step in
analytical methods and applications [2], [3]. The aim of
this work is to develop a strategy based on rough set theory
that discovers the relevant features or attributes and
filters out conflicting data.
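The two problems named above can be illustrated with a minimal sketch. It is not the algorithm developed in this paper, only a brute-force illustration under stated assumptions: sensor responses are taken to be already discretised, each training pattern is a (features, label) pair, and the toy values and the function names `find_conflicts`, `drop_conflicts` and `dispensable_features` are hypothetical. A pattern is treated as conflicting when another pattern has identical feature values but a different label, and a feature is treated as individually dispensable when removing it creates no new conflict.

```python
from collections import defaultdict


def find_conflicts(patterns, feature_idx):
    """Group patterns by their values on the chosen features and return
    only the groups whose members carry more than one decision label."""
    groups = defaultdict(list)
    for features, label in patterns:
        key = tuple(features[i] for i in feature_idx)
        groups[key].append(label)
    return {key: labels for key, labels in groups.items()
            if len(set(labels)) > 1}


def drop_conflicts(patterns):
    """Remove every pattern that shares all feature values with a
    pattern carrying a different label."""
    all_idx = range(len(patterns[0][0]))
    bad = find_conflicts(patterns, all_idx)
    return [(f, y) for f, y in patterns if tuple(f) not in bad]


def dispensable_features(patterns):
    """Return indices of features whose individual removal leaves the
    (already consistent) pattern set free of conflicts."""
    n = len(patterns[0][0])
    result = []
    for j in range(n):
        rest = [i for i in range(n) if i != j]
        if not find_conflicts(patterns, rest):
            result.append(j)
    return result


# Hypothetical toy data: three discretised sensor responses per sample,
# with a taster-assigned quality label.
patterns = [
    ((1, 0, 2), "good"),
    ((1, 0, 2), "poor"),  # same responses as above, different label
    ((0, 1, 2), "poor"),
    ((2, 1, 2), "good"),
    ((2, 0, 2), "good"),
]

conflicts = find_conflicts(patterns, range(3))
consistent = drop_conflicts(patterns)
dispensable = dispensable_features(consistent)
print(conflicts)    # {(1, 0, 2): ['good', 'poor']}
print(dispensable)  # [1, 2]
```

In this toy set, the first two patterns conflict and are discarded. On the remaining patterns, sensor 0 alone separates the two labels, so sensors 1 and 2 are each dispensable. Testing features one at a time does not by itself show which subsets can be removed together; that is the question attribute reduction in rough set theory addresses.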
Rough set theory (RST) [4]–[6] was proposed by Z. Pawlak
in the early 1980s and has since received growing attention in
artificial intelligence and the cognitive sciences,
especially in machine learning, knowledge acquisition,
knowledge discovery from databases, decision analysis,
expert systems, inductive reasoning, data mining [7], [8] and
pattern recognition. It also enables creation of classification
rules from large datasets and has successfully been applied
in different fields like medical diagnosis [9], [10], stock
market prediction [11], insurance market analysis [12], etc.
In addition, rough-set-based feature selection has been applied
to a quantitative structure-activity relationship (QSAR) data
set, with a support vector machine as the classifier [13].
Another important feature of RST is attribute reduction
[14]–[17]. The idea behind rough set theory is the
observation that uncertainty and imprecision in a
knowledge base induce vague decisions, and that this vagueness
may be caused by the granularity with which the
information is represented. Knowledge representation in rough set theory
is carried out via an information system, a table of
OBJECT → ATTRIBUTE VALUE relationships. A
tight granularity of representation in the information
system requires similar objects to fall in the same equivalence
class, which leads to a consistent rule base. Thus, it is
important to filter data in the knowledge base in order to
1530-437X © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.