A Probabilistic Mining Method for Finding Association Rules

Tzung-Pei Hong, Ming-Wen Tsai and Been-Chian Chien
Institute of Information Engineering
I-Shou University
Kaohsiung, 84008, Taiwan, R.O.C.
E-mail: tphong@csa500.isu.edu.tw

ABSTRACT

A data-mining algorithm based on the adjusted residual from statistics is proposed to find interesting itemsets and association rules from transactions. Several heuristics are proposed to reduce the calculation and search complexity of the derivation process. The proposed algorithm first detects the relevant item-value patterns. These patterns are then combined and simplified by the heuristics, and are finally converted into a set of rules.

1. INTRODUCTION

A process of knowledge discovery in databases (KDD) generally consists of three phases [8][15]: pre-processing, data mining and post-processing. The pre-processing phase includes understanding the domain and goal of an application, and selecting an appropriate dataset, or focusing on a subset of variables or data samples, on which discovery is to be performed. The data-mining phase applies specific mining algorithms to extract patterns or rules from a set of data in a particular representation. The post-processing phase interprets the discovered patterns so that they are understandable to human beings, and may also provide visualization of the extracted patterns.

This paper focuses on mining association rules from transactions. An association rule is an expression $X \Rightarrow Y$, where $X$ is a set of items and $Y$ is a single item. The rule means that, in the set of transactions, if all the items in $X$ exist in a transaction, then $Y$ also exists in that transaction with a high probability. Agrawal et al. proposed several mining algorithms based on the concept of large itemsets to find association rules from transactions [1][2][3][4]. Their approach is interesting and creative, and it can efficiently mine the association rules implicit in the transactions.
It cannot, however, detect association rules in which the items in the antecedent part are important to the item in the conclusion part, but the frequencies of these items are not large enough for the rules to be mined out. Rules of this kind are specific to the transactions and important to decision-makers, even though the items in them occur infrequently.

Hou et al. presented a method [12][13] that used residual analysis in statistics [10] to find relevant patterns. It adopted several heuristics to find complex relevant patterns and to learn classification rules. Although Hou et al.'s method can successfully learn classification rules, it considers only the relations between items and classes. In data mining from transactions, relations among the items themselves are usually needed to provide useful implicit information, which leads to additional complexity. In this paper, we thus extend Hou et al.'s method by using the concept of equivalent classes to mine association rules in databases.

The remaining parts of this paper are organized as follows. The notation and definitions used in this paper are introduced in Section 2. The heuristics suitable for data mining from transactions are stated in Section 3. The proposed data-mining algorithm is described in Section 4. An example illustrating the proposed algorithm is given in Section 5. Conclusions and future work are given in Section 6.

2. NOTATION AND DEFINITIONS

Let $A = \{A_1, A_2, \ldots, A_m\}$ be a set of binary attributes, called items. Let $D$ be a set of $N$ transactions, where each transaction $T$ is a set of items such that $T \subseteq A$. An association rule is an implication of the form $X \Rightarrow A_i$, where $X \subset A$ and $X \cap \{A_i\} = \emptyset$. The measure of the adjusted residual [10] is used here to find the patterns underlying this set of transactions. Let $A_i$ and $A_j$ be the $i$-th and the $j$-th items, $1 \le i, j \le m$.
Also let $C_{A_i}$ and $C_{A_j}$ be the numbers of transactions that contain items $A_i$ and $A_j$, respectively, and let $C_{(A_i, A_j)}$ be the number of transactions that contain both items $A_i$ and $A_j$, $i \neq j$. The adjusted residual $r_{A_i A_j}$ between the items $A_i$ and $A_j$ is defined as: