International Journal of Electrical & Computer Science IJECS-IJENS Vol:14 No:03 31
1412503-7474-IJECS-IJENS © June 2014 IJENS

Apriori Based: Mining Infrequent and Non-Present Item Sets from Transactional Data Bases

Sujatha Kamepalli 1, Raja Sekhara Rao Kurra 2 and Sundara Krishna.Y.K. 3

1 Research Scholar, CSE Department, Krishna University Machilipatnam, Andhra Pradesh, India. sujatha101012@gmail.com
2 Director, Sri Prakash College of Engineering, Thuni, Andhra Pradesh, India. krr_it@yahoo.co.in
3 Professor, CSE Department, Krishna University Machilipatnam, Andhra Pradesh, India. yksk2010@gmail.com

Abstract-- Item set mining has been an active area of research due to its successful application in various data mining scenarios, including finding association rules. Though most past work has focused on finding frequent item sets, infrequent item set mining has demonstrated its utility in web mining, bioinformatics and other fields. In this paper, we propose a new method based on the Apriori algorithm to find infrequent item sets and non-present item sets. Finally, we analyze the behavior of our proposed method on a transactional database.

Index Terms-- Data mining, frequent item sets, infrequent item sets, non-present item sets, Apriori.

1. INTRODUCTION

Frequent pattern mining was first proposed by Agrawal et al. (1993) [2] for market basket analysis in the form of association rule mining. It analyses customer buying habits by finding associations between the different items that customers place in their “shopping baskets”. For instance, if customers are buying milk, how likely are they to also buy cereal (and what kind of cereal) on the same trip to the supermarket? Such information can lead to increased sales by helping retailers do selective marketing and arrange their shelf space [3]. Patterns that are rarely found in a database are often considered uninteresting and are eliminated using the support measure.
Such patterns are known as infrequent patterns [4]. An infrequent pattern is an item set or a rule whose support is less than the minimum support threshold.

Mining frequent item sets has found extensive use in various data mining applications, including consumer market-basket analysis [13], inference of patterns from web page access logs [17], and iceberg-cube computation [14]. Extensive research has, therefore, been conducted on efficient algorithms for frequent item set mining, especially for finding association rules [15]. However, significantly less attention has been paid to mining infrequent item sets, even though it has important uses in (i) mining of negative association rules from infrequent item sets [18], (ii) statistical disclosure risk assessment, where rare patterns in anonymous census data can lead to statistical disclosure [16], (iii) fraud detection, where rare patterns in financial or tax data may suggest unusual activity associated with fraudulent behavior [16], and (iv) bioinformatics, where rare patterns in microarray data may suggest genetic disorders [16].

Although the vast majority of infrequent patterns are uninteresting, some of them might be useful for analysis, particularly those that correspond to negative correlations in the data. For example, the sale of DVDs and VCRs together is low because a customer who buys a DVD will most likely not buy a VCR, and vice versa. Such negatively correlated patterns help identify competing items, that is, items that can be substituted for one another. Examples of competing items include tea versus coffee, butter versus margarine, regular versus diet soda, and desktop versus laptop computers. Some infrequent patterns may also suggest the occurrence of interesting rare events or exceptional situations in the data.
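The support-based definition above can be illustrated with a minimal sketch. It is not the method proposed in this paper: the toy transactions, the function names `support` and `classify_pairs`, and the threshold value are hypothetical, and treating a non-present item set as one with a support of zero is an assumption made for illustration only.

```python
from itertools import combinations

# Hypothetical toy transactions (not taken from the paper).
TRANSACTIONS = [
    {"milk", "cereal", "bread"},
    {"milk", "cereal"},
    {"milk", "bread"},
    {"tea", "bread"},
    {"coffee", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def classify_pairs(transactions, min_support):
    """Label every 2-item set as frequent, infrequent, or non-present."""
    items = sorted(set().union(*transactions))
    labels = {}
    for pair in combinations(items, 2):
        s = support(pair, transactions)
        if s >= min_support:
            labels[pair] = "frequent"
        elif s > 0:
            # Occurs at least once, but below the minimum support threshold.
            labels[pair] = "infrequent"
        else:
            # Assumption: "non-present" means the pair never occurs together.
            labels[pair] = "non-present"
    return labels


labels = classify_pairs(TRANSACTIONS, min_support=0.4)
```

The sketch enumerates all pairs by brute force, which is only practical for a handful of items; it is meant to show the three categories, not an efficient search strategy.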
As an example of such an exceptional situation, if {Fire = Yes} is frequent but {Fire = Yes, Alarm = On} is infrequent, then the latter is an interesting infrequent pattern because it may indicate faulty alarm systems. To detect such unusual situations, the expected support of a pattern must be determined, so that a pattern whose support turns out to be considerably lower than expected is declared an interesting infrequent pattern [5].

The mining task that focuses on discovering frequent patterns in databases is called frequent pattern mining. In frequent pattern mining, only frequent patterns are returned, while infrequent patterns are discarded without further consideration. The rationale is that the most valuable information is carried by the frequent patterns, whereas infrequent patterns, because of their rare occurrence, cannot adequately reflect the typical characteristics of the data [8]. However, since the late 1990s, more and more researchers have realized the importance of infrequent patterns, driven by increasing demand from anomaly detection applications, especially in medicine [6], genetics [10], molecular biology [7] and network security [11]. In these