International Journal of Recent Technology and Engineering (IJRTE)
ISSN: 2277-3878, Volume-8 Issue-2, July 2019
3885
Published By:
Blue Eyes Intelligence Engineering
& Sciences Publication
Retrieval Number: A1920058119/19©BEIESP
DOI: 10.35940/ijrte.A1920.078219
Abstract: Closed itemsets are frequent itemsets that uniquely
determine the exact frequency of all frequent itemsets. Closed
itemsets reduce the massive output to a much smaller size without
redundancy. In this paper, we present PSS-MCI, an efficient
candidate-generation-based approach for mining all closed itemsets.
It enumerates closed itemsets using a hash tree, candidate
generation, and superset and subset checking. It uses a partition-based
strategy to avoid unnecessary computation for itemsets
that are not useful. Using an efficient algorithm, it determines
all closed itemsets from a single scan over the database. However,
several unnecessary itemsets are still hashed into the buckets. To
overcome this limitation, heuristics are incorporated into
PSS-MCI. Empirical evaluation and results show that
PSS-MCI outperforms all candidate-generation-based and other
approaches, and that PSS-MCI discovers all closed itemsets.
Index Terms: data mining, frequent itemsets, closed itemset,
minimum support.
I. INTRODUCTION
Nowadays, huge amounts of data are collected from various
sources and made available to everyone. Due to the complexity of
the data and the needs of various applications, extracting
interesting information from such collections is an active research
area. Data mining is an active research area concerned with retrieving
hidden, valuable and previously unknown information from a large
collection of data or a database. Within data mining, Frequent Itemset
Mining (FIM) is one of the most popular techniques; it
aims at extracting highly correlated itemsets, as hidden
knowledge, from a transactional database. FIM is formally
formulated as follows: given a list of transactions and a minimum support
threshold, find all itemsets whose occurrence count is at least the
minimum support count. The goal of FIM is thus to find the Frequent
Itemsets (FI), i.e., the sets of items whose occurrence reaches
the minimum support over all transactions. One of the basic
applications is market basket analysis [3], where each
transaction corresponds to the set of products purchased by a
customer. To analyze purchase behavior, one finds the sets of
products that occur together in at least a minimum threshold
percentage of transactions. FIM maps to many real application
scenarios and to related problems such as
Frequent Episode Mining, Sequential Pattern Mining,
Classification and Clustering. Several approaches
have been proposed for FIM [4]; they fall into two groups:
CGAT (candidate generation and test), and approaches without
candidate generation, such as FP-Growth [13]. The best-known
algorithm in the first category is Apriori [2, 3], which relies on
the Apriori heuristic, i.e., the anti-monotone property of support. The
second category is based on a tree rather than on
candidates: the entire database is represented as a tree,
which is mined recursively to extract all frequent itemsets.
Revised Manuscript Received on July 05, 2019.
U. Mohan Srinivas, Research Scholar, CSE, ANUCET, Guntur, India.
E. Srinivasa Reddy, Prof. and Dean, CSE, ANU, UCET, Guntur, India.
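To make the candidate-generation-and-test (CGAT) idea concrete, here is a minimal Python sketch of an Apriori-style level-wise search. It assumes transactions fit in memory as Python sets and that the minimum support is given as an absolute count; the function name `apriori` and the basket data are hypothetical, and this is a textbook illustration rather than the PSS-MCI algorithm. Each level joins frequent (k-1)-itemsets into k-itemset candidates, prunes any candidate that has an infrequent (k-1)-subset (the anti-monotone property), and then counts the survivors with one pass over the data.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Level-wise frequent itemset mining (Apriori-style sketch).

    transactions: list of sets of items; min_count: absolute minimum support.
    Returns a dict mapping frozenset itemsets to their support counts.
    """
    # Level 1: count single items in one pass over the data.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        prev = list(current)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k:
                    candidates.add(union)
        # Prune step (anti-monotone property): all (k-1)-subsets must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Test step: one database pass per level counts the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical market-basket toy data (not Table 1), minimum count 2.
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
frequent = apriori(baskets, 2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On this toy data, {milk, butter} occurs only once, so the 3-itemset candidate {bread, milk, butter} is pruned without ever being counted. The sketch scans the database once per level, which is the multi-scan cost that a single-scan approach aims to avoid.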
However, the number of FIs extracted from large
databases can be huge, which requires a large storage area and
more computation. For example, Table 1 records a
list of 4 transactions. With a minimum support of
min-sup = 50% (count = 2), the frequent itemsets are {a:3, b:3, c:2, ab:2, ac:2,
ad:2, bc:2, bd:2, abc:2}. By the definition of a closed
itemset (no proper superset has the same support), the supports of {c:2, ab:2, ac:2, bc:2} can be
determined from {abc:2}. Hence {c:2, ab:2, ac:2, bc:2} are
considered redundant.
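The redundancy argument can be checked mechanically. The Python sketch below takes the frequent itemsets and supports listed above for Table 1 (the table's transactions are not reproduced here) and keeps an itemset only if no proper superset has the same support. Because support is anti-monotone, a superset with equal support is itself frequent, so searching within the frequent list suffices, assuming that list is complete. The helper name `closed_itemsets` is hypothetical.

```python
def closed_itemsets(supports):
    """Return the itemsets with no proper superset of equal support.

    supports: dict mapping frozenset itemsets to support counts, assumed to
    contain every frequent itemset.
    """
    closed = {}
    for itemset, count in supports.items():
        # An itemset is absorbed if some strict superset has the same count.
        absorbed = any(
            itemset < other and other_count == count
            for other, other_count in supports.items()
        )
        if not absorbed:
            closed[itemset] = count
    return closed


# Frequent itemsets and supports as listed in the text for Table 1
# (min-sup = 50%, count = 2); each string key is split into single items.
listed = {
    frozenset(name): count
    for name, count in {
        "a": 3, "b": 3, "c": 2, "ab": 2, "ac": 2,
        "ad": 2, "bc": 2, "bd": 2, "abc": 2,
    }.items()
}

closed = closed_itemsets(listed)
redundant = sorted("".join(sorted(s)) for s in listed if s not in closed)
print(redundant)  # ['ab', 'ac', 'bc', 'c']
```

The itemsets flagged as redundant are exactly {c, ab, ac, bc}, matching the text; the closed itemsets a:3, b:3, ad:2, bd:2 and abc:2 remain.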
Table 1: Sample Transactional Database
As a result, several condensed representations of FIs have
been proposed to reduce their number without losing
knowledge [8]. One such alternative is the set of
Maximal Itemsets: frequent itemsets that have no frequent
superset. This representation greatly reduces the size of the
output. Algorithms for mining maximal itemsets include MaxClique, Mafia [6], Pincer
search [16], MaxMiner [Bayardo 98], DepthProject [3],
GenMax [12] and FPMax [12]. All of these
algorithms extract the complete set of maximal itemsets.
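The loss of support information can be illustrated on the list of frequent itemsets given for Table 1 earlier in this section. This Python sketch, with the hypothetical helper `maximal_itemsets`, keeps the itemsets that have no frequent proper superset, and then shows that the maximal itemsets only bound the support of a subset from below.

```python
def maximal_itemsets(frequent):
    """Return the frequent itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < other for other in frequent)}


# Frequent itemsets and supports as listed in the text for Table 1 (count = 2).
listed = {
    frozenset(name): count
    for name, count in {
        "a": 3, "b": 3, "c": 2, "ab": 2, "ac": 2,
        "ad": 2, "bc": 2, "bd": 2, "abc": 2,
    }.items()
}

maximal = maximal_itemsets(listed)
print(sorted("".join(sorted(s)) for s in maximal))  # ['abc', 'ad', 'bd']

# Even if the counts of the maximal itemsets are kept, they only give a lower
# bound for a subset: support(X) >= max support of any maximal superset of X.
a = frozenset("a")
lower_bound_a = max(listed[m] for m in maximal if a <= m)
print(lower_bound_a, listed[a])  # 2 3
```

Here a has support 3, but the maximal itemsets containing it (ad:2 and abc:2) only guarantee a support of at least 2.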
However, these algorithms needed multiple database scans when
main memory was small, and too many candidate itemsets
were generated at each pass. Moreover, the exact support of every frequent
itemset cannot be recovered from the maximal itemsets alone. This limitation
led to the notion of Closed Itemsets
(CI). A frequent itemset is closed if none of its proper supersets has
the same support. Research on mining closed itemsets includes top-down
approaches [7, 5, 20], bottom-up approaches, and Pincer search [16, 17],
which combines both. These
approaches have shown that their output retains the information of all the frequent
itemsets. However, they also need multiple database scans
when main memory is small, and too many candidate
itemsets are generated at each pass.
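The closed-itemset definition can also be tested directly against the transactions with the closure operator commonly used in the closed-itemset literature: the closure of X is the intersection of all transactions containing X, X is closed exactly when it equals its closure, and X has the same support as its closure. The Python sketch below assumes transactions are in-memory sets; the helpers `closure`, `is_closed` and `support` and the basket data are hypothetical, and this is not claimed to be how PSS-MCI performs its superset and subset checks.

```python
def closure(itemset, transactions):
    """Return the intersection of all transactions that contain `itemset`.

    Returns None when no transaction contains the itemset (support 0).
    """
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return None
    # frozenset.intersection accepts sets or frozensets as arguments.
    return frozenset(covering[0]).intersection(*covering[1:])


def is_closed(itemset, transactions):
    """An itemset is closed iff it equals its own closure."""
    return closure(itemset, transactions) == frozenset(itemset)


def support(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


# Hypothetical toy baskets (not Table 1).
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]

butter = frozenset({"butter"})
print(sorted(closure(butter, baskets)))  # ['bread', 'butter']
print(is_closed(butter, baskets))  # False
print(support(butter, baskets), support(closure(butter, baskets), baskets))  # 2 2
```

Here {butter} is not closed because every basket containing butter also contains bread, so {bread, butter} has the same support (2) and absorbs it.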
Mining Closed Item sets using Partition Based
Single Scan Algorithm
U. Mohan Srinivas, E. Srinivasa Reddy
Contributions:
In this paper, we propose a novel approach called the Partition
Based Single Scan approach for Mining Closed
Itemsets (PSS-MCI). A hash table is used to
capture the Possible Frequent