Volume 2, No. 08, October 2013 ISSN – 2278-1080
The International Journal of Computer Science & Applications (TIJCSA)
RESEARCH PAPER
Available Online at http://www.journalofcomputerscience.com/
© 2013, http://www.journalofcomputerscience.com - TIJCSA All Rights Reserved
A Hybrid Approach Using I² and MC4.5 Algorithms
for Mining Multidimensional Data Sets (HIIMC4.5)
S.Santhosh kumar
Research Scholar, PRIST University, Thanjavur
Lecturer, Department of Computer Science
Government College for Women (A)
Kumbakonam, Tamil Nadu, India
Santhoshsundar@yahoo.com
Dr.E.Ramaraj
Director, Computer Center
Alagappa University
Karaikudi
India
eramaraj@rediffmail.com
Abstract
This paper presents a semi-supervised learning approach that combines clustering and classification.
We develop a hybrid model by combining two algorithms from our earlier work. In large databases, a
preliminary categorisation of the data is needed before searching so that the data can be mined
efficiently. The proposed hybrid model is compared with an existing hybrid model, also from our earlier
work, for identifying the closest data patterns in large databases. The new hybrid technique addresses
the limitations of the existing hybrid model. Implemented on different data sets, it yields accurate
classification predictions with a lower error rate.
Key Terms: C4.5 Classifier, k-means, MC4.5, I² Clustering.
1. Introduction
Semi-supervised learning (SSL) [1] is a machine learning technique that handles labelled and
unlabelled data simultaneously. It is an emerging field of data mining that has gained
popularity since 2005. The primary advantage of SSL is its cost effectiveness. Processing
labelled data requires knowledge, skill and technique, and is therefore expensive. SSL makes
it possible to exploit unlabelled data together with a small amount of labelled data. Large
databases, such as those in banking and medicine, contain huge amounts of data. Extracting
the particular (labelled) data is more expensive and more complex, whereas acquiring
unlabelled data is relatively inexpensive. In such situations, semi-supervised learning can be
of great practical value. Many SSL models have been developed for different data types. The
generative model is a familiar one; it combines classification and clustering
techniques based on the joint distribution of the data. Following this combinational approach, we have
adopted our proposed hybrid model, which is a combination of the k-means algorithm and C4.5