International Journal of Electrical & Computer Science IJECS-IJENS Vol:14 No:03 31
1412503-7474-IJECS-IJENS © June 2014 IJENS
Apriori Based: Mining Infrequent and Non-Present
Item Sets from Transactional Data Bases
Sujatha Kamepalli¹, Raja Sekhara Rao Kurra² and Sundara Krishna.Y.K.³
¹Research Scholar, CSE Department, Krishna University,
Machilipatnam, Andhra Pradesh, India
sujatha101012@gmail.com
²Director, Sri Prakash College of Engineering,
Thuni, Andhra Pradesh, India
krr_it@yahoo.co.in
³Professor, CSE Department, Krishna University,
Machilipatnam, Andhra Pradesh, India
yksk2010@gmail.com
Abstract-- Item set mining has been an active area of research
due to its successful application in various data mining scenarios
including finding association rules. Though most of the past work
has been on finding frequent item sets, infrequent item set mining
has demonstrated its utility in web mining, bioinformatics and
other fields. In this paper, we propose a new method based on
the Apriori algorithm to find infrequent item sets and non-present
item sets. Finally, we analyze the behavior of our proposed
method on a transactional database.
Index Terms-- Data mining, frequent item sets, infrequent item
sets, non-present item sets, Apriori.
1. INTRODUCTION
Frequent pattern mining was first proposed by
Agrawal et al. (1993) [2] for market basket analysis in the
form of association rule mining. It analyses customer buying
habits by finding associations between the different items that
customers place in their “shopping baskets”. For instance, if
customers are buying milk, how likely are they to also
buy cereal (and what kind of cereal) on the same trip to the
supermarket? Such information can lead to increased sales by
helping retailers do selective marketing and arrange their shelf
space [3]. Patterns that are rarely found in a database are often
considered uninteresting and are eliminated using the
support measure. Such patterns are known as infrequent
patterns [4]. An infrequent pattern is an item set or a rule
whose support is less than the minimum support threshold.
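The definition above can be illustrated with a minimal Python sketch. It is not the method proposed in this paper; it only shows how the support of an item set may be computed and compared against a threshold. The function names `support` and `is_infrequent`, the sample transactions, and the threshold value 0.4 are assumptions chosen for illustration, and support is taken here as the fraction of transactions containing the item set.

```python
from typing import FrozenSet, Iterable, Sequence


def support(itemset: Iterable[str], transactions: Sequence[FrozenSet[str]]) -> float:
    """Return the fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0  # convention assumed here: empty database gives support 0
    hits = sum(1 for t in transactions if items <= t)  # `<=` is the subset test
    return hits / len(transactions)


def is_infrequent(itemset: Iterable[str],
                  transactions: Sequence[FrozenSet[str]],
                  min_support: float) -> bool:
    """An item set is infrequent if its support is below the minimum support threshold."""
    return support(itemset, transactions) < min_support


# Hypothetical market-basket data, for illustration only.
transactions = [
    frozenset({"milk", "cereal", "bread"}),
    frozenset({"milk", "cereal"}),
    frozenset({"milk", "bread"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk", "cereal", "butter"}),
]
MIN_SUPPORT = 0.4  # assumed threshold

for candidate in ({"milk", "cereal"}, {"bread", "butter"}):
    label = "infrequent" if is_infrequent(candidate, transactions, MIN_SUPPORT) else "frequent"
    print(sorted(candidate), support(candidate, transactions), label)
```

With this sample data, {milk, cereal} appears in three of the five transactions and is frequent at the assumed threshold, while {bread, butter} appears in only one and is therefore infrequent.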
Mining frequent item sets has found extensive
utilization in various data mining applications including
consumer market-basket analysis [13], inference of patterns
from web page access logs [17], and iceberg-cube computation
[14]. Extensive research has, therefore, been conducted in
finding efficient algorithms for frequent item set mining,
especially in finding association rules [15]. However,
significantly less attention has been paid to the mining of
infrequent item sets, even though it has important uses in
(i) mining of negative association rules from infrequent item
sets [18], (ii) statistical disclosure risk assessment where rare
patterns in anonymous census data can lead to statistical
disclosure [16], (iii) fraud detection where rare patterns in
financial or tax data may suggest unusual activity associated
with fraudulent behavior [16], and (iv) bioinformatics where
rare patterns in microarray data may suggest genetic disorders
[16].
Although a vast majority of infrequent patterns are
uninteresting, some of them might be useful to the analysis,
particularly those that correspond to negative correlations in
data. For example, the sale of DVDs and VCRs together is low
because any customer who buys a DVD will most likely not
buy a VCR and vice versa. Such negatively correlated patterns
are useful to help identify competing items, which are items
that can be substituted for one another. Examples of competing
items include tea versus coffee, butter versus margarine,
regular versus diet soda, and desktop versus laptop computers.
Some infrequent patterns may also suggest the occurrence of
interesting rare events or exceptional situations in the data. For
example, if {Fire = Yes} is frequent but {Fire = Yes, Alarm =
On} is infrequent, then the latter is an interesting infrequent
pattern because it may indicate faulty alarm systems. To detect
such unusual situations, the expected support of a pattern must
be determined, so that, if a pattern turns out to have a
considerably lower support than expected, it is declared an
interesting infrequent pattern [5]. The mining task that focuses
on discovering frequent patterns from the databases is called
frequent pattern mining. In frequent pattern mining, only
frequent patterns are returned while infrequent patterns are
simply discarded without further consideration. This is
because the most valuable information is carried by the
frequent patterns, and the infrequent patterns cannot adequately
reflect the typical characteristics of the data because of their
rare occurrence [8]. However, since the late 1990s, more and
more researchers have realized the importance of infrequent
patterns with the increasing demands from applications of
anomaly detection, especially in medicine [6], genetics [10],
molecular biology [7] and network security [11]. In these