Copyright © 2018 Authors. This is an open access article distributed under the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
International Journal of Engineering & Technology, 7 (4.36) (2018) 533-541
Website: www.sciencepubco.com/index.php/IJET
Research paper
Finding Efficient Positive and Negative Itemsets Using
Interestingness Measures
P. Asha 1*, T. Prem Jacob 2, A. Pravin 3
1 Asst. Prof., Department of Computer Science and Engineering, Sathyabama Institute of Science and Technology, Chennai.
2 Asst. Prof., Department of Computer Science and Engineering, Sathyabama Institute of Science and Technology, Chennai.
3 Asst. Prof., Department of Computer Science and Engineering, Sathyabama Institute of Science and Technology, Chennai.
*Corresponding author E-mail: ashapandian225@gmail.com
Abstract
Data gathering techniques have multiplied, bringing in unstructured data alongside well-defined data formats. Mining these data and
extracting useful patterns is difficult. Various data mining algorithms have been put forth for this purpose. The patterns generated by
association rule mining (ARM) algorithms are large in number. Most ARM work focuses on positive rule mining, and very little of the
literature has focused on rare itemset mining. This work aims at retrieving the rare itemsets that are of most interest to the user by
utilizing various interestingness measures. Both positive and negative itemset mining are addressed in this work.
Keywords: Association rule, positive association rules, negative association rules, interestingness measures.
1. Introduction
Data mining has attracted interest because a wide range of raw
data is available in which useful information is scarce. There is
therefore a need to extract meaningful information, in a form the
user can understand, from large amounts of data. Data mining is an
analysis process in which patterns are mined from large databases
or repositories. To retrieve useful information, preprocessing
steps such as data cleaning and integration are applied, followed
by data selection and transformation, after which mining is
performed; this overall process is popularly known as Knowledge
Discovery in Databases (KDD). Mining is performed on relational
databases, data warehouses, transaction databases and advanced
databases such as OODB, ORDB, multimedia, spatial and web
databases. Mining can be applied in various fields such as
sales/marketing, healthcare/insurance, banking/finance, medicine,
biomedicine, transportation and telecommunication. It is a
powerful tool for finding patterns and relationships among data,
which helps in discovering hidden information in large and
seemingly useless datasets. However, data mining cannot work
without human effort, nor can it judge the value of the mined
information for a given need. Thus, to ensure meaningful data
mining results, the researcher must understand the available
data.
An association rule can also be called a pattern. It has two
constituents, an antecedent and a consequent, which correspond to
the "if" and "then" parts of a statement respectively. The
antecedent P is an item found in the data, whereas the consequent
Q is an item found in combination with the antecedent P. Patterns
that are mined must be meaningful. Such meaningful patterns are
discovered through interestingness measures. The most important
measures are support, which indicates how frequently the itemset
P or Q occurs in the dataset, and confidence, which indicates how
often the if/then statement (P→Q) has been found to be true.
Other measures are Laplace, Pearson coefficient, conviction, P-S
measure, interest factor, chi-square test, lift and leverage. Some
of these measures are discussed, and an efficient method is
determined by comparison in this study. The relationships between
these meaningful patterns can be identified through a traditional
method called Association Rule Mining (ARM). ARM is an active
research topic concerned with mining association rules. The
strength of an association rule is determined by interestingness
measures. Mined rules have to satisfy user-specified minimum
values of support and confidence. Well-established algorithms such
as Apriori, Eclat and FP-growth are available to generate
association rules. Association rules help us to analyze and
predict customer behavior. They also play a leading role in
market basket analysis and many other real-time applications.
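As a minimal sketch of the measures named above, the following Python fragment computes support, confidence, lift and leverage for a single rule P→Q using their standard textbook definitions. The transaction list, the item names and the function names are assumptions made purely for illustration; they are not taken from the study.

```python
# Hypothetical toy transactions; the item names are illustrative only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(p, q, transactions):
    """support(P and Q together) / support(P); assumes support(P) > 0."""
    return support(set(p) | set(q), transactions) / support(p, transactions)


def lift(p, q, transactions):
    """confidence(P -> Q) / support(Q); assumes support(Q) > 0."""
    return confidence(p, q, transactions) / support(q, transactions)


def leverage(p, q, transactions):
    """support(P and Q together) - support(P) * support(Q)."""
    joint = support(set(p) | set(q), transactions)
    return joint - support(p, transactions) * support(q, transactions)


if __name__ == "__main__":
    p, q = {"bread"}, {"milk"}
    print("support(P)       =", support(p, transactions))
    print("confidence(P->Q) =", confidence(p, q, transactions))
    print("lift(P->Q)       =", lift(p, q, transactions))
    print("leverage(P->Q)   =", leverage(p, q, transactions))
```

A rule would be retained only if its support and confidence both reach the user-specified minimum values. A lift below 1 or a negative leverage, as in this toy data, suggests that P and Q occur together less often than independence would predict.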
ARM finds relationships between the two itemsets of a pattern
P→Q, in which P and Q are disjoint itemsets. Likewise, ARM can be
used to mine k-itemsets. Most existing research is on mining
positive association rules of the form P→Q, but we also focus on
mining negative association rules of the forms P→~Q, ~P→Q and
~P→~Q, so the absence of itemsets is also considered in our study.
Different ARM methods are discussed that mine frequent and
infrequent itemsets, from which both positive and negative rules
are derived. Strong rules can be mined by applying interestingness
measures. Major uses of negative association rule mining are fraud
detection and the identification of genetic disorders in
bioinformatics. The primary issue in mining negative rules is the
identification of appropriate patterns in a large database, which
makes it a challenging task. The study also discusses various
methods whose main objective is to reduce the number of rules
generated and the number of database scans.
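To make the four rule forms concrete, the sketch below counts support and confidence for P→Q, P→~Q, ~P→Q and ~P→~Q directly from a transaction list. It assumes the common convention that ~P holds for a transaction when the transaction does not contain every item of P. The data, the item names and the function `rule_measures` are hypothetical and serve only as an illustration, not as the method used in the study.

```python
# Hypothetical toy transactions; the item names are illustrative only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def contains(itemset, transaction):
    """True when the transaction holds every item of `itemset`."""
    return set(itemset) <= transaction


def rule_measures(p, q, transactions, negate_p=False, negate_q=False):
    """Return (support, confidence) for a possibly negated rule.

    negate_p / negate_q select ~P / ~Q, read here as "the itemset is
    absent from the transaction" (an assumed convention).
    Confidence is None when no transaction matches the antecedent.
    """
    n = len(transactions)
    # Transactions matching the (possibly negated) antecedent.
    ante = [t for t in transactions if contains(p, t) != negate_p]
    # Of those, the ones that also match the (possibly negated) consequent.
    both = [t for t in ante if contains(q, t) != negate_q]
    supp = len(both) / n
    conf = len(both) / len(ante) if ante else None
    return supp, conf


if __name__ == "__main__":
    p, q = {"bread"}, {"milk"}
    forms = {
        "P -> Q": (False, False),
        "P -> ~Q": (False, True),
        "~P -> Q": (True, False),
        "~P -> ~Q": (True, True),
    }
    for name, (neg_p, neg_q) in forms.items():
        print(name, rule_measures(p, q, transactions, neg_p, neg_q))
```

Counting each form directly costs one pass over the data per rule. Because the four supports always sum to 1, the negative supports could instead be derived from the positive ones without further scans.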
2. Review on existing work
Rakesh et al. compared the Apriori and Apriori_tid algorithms for
Association Rule Mining and combined the advantages of both
algorithms in what was termed Apriori_hybrid. Apriori