Available Online at www.ijcsmc.com
International Journal of Computer Science and Mobile Computing
A Monthly Journal of Computer Science and Information Technology
ISSN 2320–088X
IJCSMC, Vol. 2, Issue. 5, May 2013, pg.169 – 174
RESEARCH ARTICLE
© 2013, IJCSMC All Rights Reserved
Improved Data Reduction Technique in Data Mining
Pritesh Vora¹, Bhavesh Oza²
¹ Information Technology, Gujarat Technological University, L.D. College of Engineering, Ahmedabad, India
² Computer Engineering Department, Gujarat Technological University, L.D. College of Engineering, Ahmedabad, India
¹ pritesh2212@gmail.com; ² bhavesh_oza_2001@yahoo.com
Abstract— In data mining, data reduction is an important issue today. Data sets are huge, yet much of
the data is irrelevant to the objective or redundant, which leads to higher processing power
consumption and to wrong results. Data reduction means reducing the data without compromising its
integrity. Decision trees, attribute subset selection, clustering, and data cube aggregation are
different techniques commonly used for data reduction. A decision tree is a highly effective structure
that gives the possible outcomes. In a decision tree, each branch node represents a choice between
alternatives and each leaf node represents a decision or classification. Here we present the generalized
algorithm and apply the decision tree technique for reliable outcomes.
Key Terms: - Data mining; Decision tree; Data reduction
I. INTRODUCTION
Today, computer technology keeps developing and the degree of informatization keeps rising, so people find
that the data they need is buried in the mass of data now available. Data mining is the process of
extracting important information and knowledge from a large database (mass data) [1]. In such data,
information and knowledge are implicit: people do not know them in advance, but they are potentially useful.
At present, the decision tree is an important data mining method. Decision trees are commonly used in
decision analysis, in data mining and machine learning, to create knowledge structures that guide the
decision-making process. Accessing a large amount of data in a database is time consuming, and maintaining
it is very difficult. A database contains much irrelevant, noisy, and duplicate data. Pre-processing this
data improves its quality and makes it more feasible to operate on, so data reduction techniques must be
applied to remove such data.
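As a rough illustration of the kind of pre-processing described above, the following Python sketch removes exact duplicate records and then drops attributes that carry no distinguishing information. It is not the method proposed in this paper: the function name `reduce_records`, the sample records, and the rule that a constant-valued attribute counts as irrelevant are all assumptions made only for this example.

```python
def reduce_records(records):
    """Drop exact duplicate records, then drop constant-valued attributes.

    Assumption for this sketch: every record is a dict with hashable values,
    and an attribute is treated as irrelevant when it has the same value in
    every remaining record.
    """
    # Step 1: remove duplicate records, keeping the first occurrence.
    seen = set()
    unique = []
    for row in records:
        key = tuple(sorted(row.items()))  # order-independent fingerprint
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # With fewer than two records, no attribute can be judged constant.
    if len(unique) < 2:
        return unique

    # Step 2: find attributes whose value never changes across records.
    first = unique[0]
    constant = {
        attr for attr in first
        if all(row.get(attr) == first[attr] for row in unique)
    }

    # Step 3: keep only the attributes that vary.
    return [
        {attr: value for attr, value in row.items() if attr not in constant}
        for row in unique
    ]


# Hypothetical sample data: one duplicate row and one constant attribute.
records = [
    {"outlook": "sunny", "windy": False, "source": "db1", "play": "no"},
    {"outlook": "sunny", "windy": False, "source": "db1", "play": "no"},
    {"outlook": "rainy", "windy": True, "source": "db1", "play": "no"},
    {"outlook": "overcast", "windy": False, "source": "db1", "play": "yes"},
]
reduced = reduce_records(records)
for row in reduced:
    print(row)
```

In this sample, the second record is discarded as a duplicate and the `source` attribute is discarded because it never varies. Real data would usually also need handling for noisy values, which this sketch does not attempt.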
II. DECISION TREE AND ID3
A decision tree provides a highly effective structure that gives an idea of the possible outcomes. A
decision tree is a tree in which each branch node represents a choice between a number of alternatives, and
each leaf node represents a classification or decision. Every decision tree begins with what is termed a root
node, considered to be the parent of every other node. Each node in the tree evaluates an attribute in the
data and determines which path to follow [1]. Typically, the decision test is based on comparing a value
against some constant. Classification using a decision tree is performed by routing from the root node until
arriving at a leaf node. A more general definition of the decision tree, written in stepwise form: