International Journal of Electrical and Computer Engineering (IJECE)
Vol. 8, No. 6, December 2018, pp. 4477~4485
ISSN: 2088-8708, DOI: 10.11591/ijece.v8i6.pp4477-4485
Journal homepage: http://iaescore.com/journals/index.php/IJECE
Postdiffset Algorithm in Rare Pattern: An Implementation via
Benchmark Case Study
Mustafa Man¹, Wan Aezwani Wan Abu Bakar², Masita Masila Abd. Jalil³, Julaily Aida Jusoh⁴
¹,³ School of Informatics & Applied Mathematics, Universiti Malaysia Terengganu, Malaysia
²,⁴ Faculty of Informatic and Computing, Universiti Sultan Zainal Abidin, Malaysia
Article Info
Article history:
Received Apr 10, 2018
Revised Jun 12, 2018
Accepted Jun 30, 2018
ABSTRACT
Frequent and infrequent itemset mining are active topics in data mining.
The Association Rule (AR) patterns they generate help decision makers and
business policy makers anticipate the items likely to be wanted next across
a wide variety of applications. Frequent itemsets cover the items that are
most often purchased or used, whereas infrequent itemsets cover items that
occur rarely, also called rare items. AR mining remains one of the most
prominent areas in data mining; it aims to extract interesting correlations,
patterns, associations or causal structures among sets of items in
transaction databases or other data repositories. The database structure
used by association rule mining algorithms is based on either a horizontal
or a vertical data format. Both formats have been widely discussed, with
several example algorithms for each. Approaches based on the horizontal
format suffer from huge candidate generation and multiple database scans,
which result in higher memory consumption. Vertical approaches have been
proposed to overcome this issue. One of the established algorithms in the
vertical data format is Eclat (Equivalence Class Transformation). Because of
its ‘fast intersection’, in this paper we analyze the fundamental Eclat and
the Eclat variants diffset and sortdiffset. In response to the vertical data
format and as a continuation of the Eclat extensions, we propose postdiffset,
a new member of the Eclat variants that uses the tidset format in the first
loop and the diffset format in later loops. We present the performance of
the postdiffset algorithm prior to its implementation in mining infrequent
or rare itemsets. Postdiffset outperforms diffset and sortdiffset by 23% and
84% on the mushroom dataset, and by 94% and 99% on the retail dataset.
Keyword:
Association rule mining
Eclat algorithm
Frequent itemset
Infrequent itemset
Vertical database
Copyright © 2018 Institute of Advanced Engineering and Science.
All rights reserved.
Corresponding Author:
Wan Aezwani Wan Abu Bakar,
Faculty of Informatic and Computing, Universiti Sultan Zainal Abidin,
Besut Campus, 22200 Besut, Terengganu, Malaysia.
Email: wanaezwani@unisza.edu.my
1. INTRODUCTION
The main objective of association rule mining is to find correlations, associations or causal
structures among sets of items in a data repository. In other words, it allows the discovery of implicative
and interesting tendencies in databases. Frequent itemset and infrequent itemset mining are critical fields in
association rule mining. They are widely used across a variety of domains such as market basket
analysis, remedial, biology, banking or retail services [1], [21]. Frequent or infrequent itemsets may
contribute to big data generation. Undoubtedly, the critical issues regarding memory space consumption and
data storage capacity will significantly effect prior to frequent or infrequent generation of