Indonesian Journal of Electrical Engineering and Computer Science
Vol. 3, No. 3, September 2016, pp. 546 ~ 553
DOI: 10.11591/ijeecs.v3.i2.pp546-553
Received April 2, 2016; Revised July 25, 2016; Accepted August 10, 2016
Mining Association Rules: A Case Study on Benchmark Dense Data
Mustafa Man¹, Wan Aezwani Wan Abu Bakar², Zailani Abdullah³, Masila Abd Jalil⁴, Tutut Herawan⁵
¹,²,⁴ School of Informatics and Applied Mathematics, Universiti Malaysia Terengganu,
21030 Kuala Terengganu, Terengganu, Malaysia
³ Faculty of Entrepreneurship and Business, Universiti Malaysia Kelantan,
16100 Kota Bharu, Kelantan, Malaysia
⁵ Department of Information Systems, Faculty of Computer Science and Information Technology,
University of Malaya, Lembah Pantai, 50603 Kuala Lumpur, Malaysia
Corresponding author, e-mail: mustafaman@umt.edu.my, beny2194@yahoo.com, zailania@umk.edu.my,
masita@umt.edu.my, tutut@um.edu.my
Abstract
Data mining is the process of discovering knowledge and previously unknown patterns from large
amounts of data. Association rule mining (ARM) has become popular because it can uncover new
patterns that support predictions on a wide range of issues. Since frequent itemset mining was
first introduced, it has received major attention among researchers, and various efficient and
sophisticated algorithms have been proposed for it. Among the best-known
algorithms are Apriori and FP-Growth. In this paper, we explore these algorithms and compare their
results in generating association rules from benchmark dense datasets. The datasets are taken from
the frequent itemset mining data repository. The two algorithms are implemented in Rapid Miner 5.3.007,
and their performance results are compared. FP-Growth is found to be the better algorithm under
the support-confidence framework.
Keywords: data mining, association rule mining (ARM), frequent pattern mining (FPM), rapid miner,
apriori, fp growth
Copyright © 2016 Institute of Advanced Engineering and Science. All rights reserved.
1. Introduction
Data mining is the research area in which huge datasets in databases and data
repositories are scoured and mined to find novel and useful patterns. Association analysis is one of
the four (4) core data mining tasks, besides cluster analysis, predictive modeling and anomaly
detection [1]. The task of Association Rule Mining (ARM) is to discover whether frequent
itemsets or patterns exist in a database and, if so, whether interesting relationships between these
frequent itemsets can reveal new patterns that inform the next step of decision making.
Finding frequent itemsets or patterns (as shown in Figure 1) is a big challenge and has
a strong and long-standing tradition in data mining. It is a fundamental part of many data mining
applications, including market basket analysis, web link analysis, genome analysis and
molecular fragment mining [2]. The idea of mining association rules originates from the analysis
of market basket data [3]. An example of a simple rule is “A customer who buys bread and butter will
also tend to buy milk, with support s% and confidence c%”. The applicability of such rules to business
problems has made association rule mining a popular mining method.
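As a minimal sketch of how the two percentages in such a rule can be computed, the Python snippet below evaluates the bread-and-butter rule on a toy basket list. Reading s as support and c as confidence follows the standard support-confidence framework and is an assumption here; the transactions and the helper names `support` and `confidence` are hypothetical and are not taken from the benchmark datasets.

```python
# Hypothetical toy baskets for illustration only; not data from the paper.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    joint = support(set(antecedent) | set(consequent), transactions)
    base = support(antecedent, transactions)
    return joint / base if base > 0 else 0.0


# Rule: {bread, butter} -> {milk}
s = support({"bread", "butter", "milk"}, transactions)
c = confidence({"bread", "butter"}, {"milk"}, transactions)
print(f"support = {s:.0%}, confidence = {c:.0%}")
```

On these five baskets the rule holds in two of five transactions, and two of the three bread-and-butter baskets also contain milk.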
ARM that relates to frequent patterns is called Frequent Pattern Mining (FPM). The
state-of-the-art algorithms in FPM are based on either a horizontal or a vertical data
format. Most previous frequent pattern mining techniques work on the horizontal format of their
data repositories but suffer from requiring many database scans. However, there is an
emerging trend in which some research works focus on the
vertical data format, and the rule mining results are quite promising. Apriori [3, 4], which relies on
the horizontal format, and FP-Growth [5], which relies on the vertical format, are among the best-known
algorithms in FPM. Whether the data format is horizontal or vertical, both still suffer from
huge memory consumption [3-5] on larger datasets.
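The difference between the two layouts can be sketched in a few lines of Python. The example assumes the usual meaning of the terms: a horizontal layout maps each transaction id to its items, and a vertical layout maps each item to the set of transaction ids containing it. The data and the function names are hypothetical, and the snippet illustrates the data formats only; it is not an implementation of Apriori or FP-Growth.

```python
from collections import defaultdict

# Hypothetical horizontal layout: transaction id -> items in that transaction.
horizontal = {
    1: ["a", "b", "c"],
    2: ["a", "c"],
    3: ["b", "c"],
    4: ["a", "b", "c"],
}


def to_vertical(horizontal):
    """Invert tid -> items into item -> set of tids."""
    vertical = defaultdict(set)
    for tid, items in horizontal.items():
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)


def support_horizontal(itemset, horizontal):
    """Support count via a full scan over all transactions."""
    wanted = set(itemset)
    return sum(1 for items in horizontal.values() if wanted <= set(items))


def support_vertical(itemset, vertical):
    """Support count via intersection of the items' tid-sets."""
    tidsets = [vertical.get(item, set()) for item in itemset]
    return len(set.intersection(*tidsets)) if tidsets else 0


vertical = to_vertical(horizontal)
print(support_horizontal(["a", "b"], horizontal))
print(support_vertical(["a", "b"], vertical))
```

Both functions return the same count. The horizontal version rescans every transaction for each candidate itemset, while the vertical version intersects precomputed tid-sets, at the cost of holding those sets in memory.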