(IJCSIS) International Journal of Computer Science and Information Security, Vol. 12, No. 9, September 2014

Finding Untraced Fuzzy Association Rules

F. A. Mazarbhuiya
College of Computer Science, Albaha University, Albaha, KSA

Abstract: Fuzzy association rules are rules of the form "If X is A then Y is B", where X and Y are sets of attributes and A, B are fuzzy sets that describe X and Y respectively. In most fuzzy association rule mining problems, the fuzziness is specified by users. Users usually specify the fuzziness based on their understanding of the problem and on their ability to express the fuzziness in natural language. However, some fuzziness cannot be expressed in natural language because of its limitations. In this paper we propose a method of extracting fuzzy association rules that cannot be traced by the usual methods.

Keywords: Fuzzy set, Association rules, Fuzzy interval, Certainty factor, Significance factor, Between Operation.

I. INTRODUCTION

The problem of association rule mining was defined by Agrawal et al. [4]. Binary association rule mining finds relationships among the presence of various items within baskets. A generalization of binary association rules is motivated by the fact that a dataset is usually not restricted to binary attributes but also contains attributes with values ranging on ordered scales, such as cardinal or ordinal attributes. Quantitative association rules were defined for dealing with quantitative attributes [5]. In quantitative association rules, attribute values are specified by means of subsets, which are typically intervals with hard boundaries. This is done by discretizing the domains of quantitative attributes into intervals. Generalizing from hard-boundary intervals to soft-boundary intervals has given rise to fuzzy association rules. A method for computing fuzzy association rules has been described in [1].
Fuzzy association rules are more understandable to humans because of the linguistic terms associated with fuzzy sets. The known fuzzy association rule mining techniques may, however, miss some interesting rules in the process, as will be shown here. In this paper, we propose a method that can extract these missing rules.

The paper is organized as follows. In Section II, we briefly discuss related work. In Section III, we review some definitions of basic terms and describe the notations and symbols generally used in association rule mining. In Section IV, we discuss the problem that may arise in the known methods and then describe how to extract the missing rules. Finally, in Section V, we provide a conclusion and lines for future research.

II. RELATED WORKS

Replacing crisp sets (intervals) by fuzzy sets (intervals) leads to fuzzy (quantitative) association rules. Thus, a fuzzy association rule is understood as a rule of the form A → B, where A and B are now fuzzy subsets rather than crisp subsets of the domains D_X and D_Y of two attributes X and Y respectively. Each attribute is associated with several fuzzy sets. In other words, an attribute X is now replaced by a number of fuzzy attributes rather than by a number of binary attributes. Each element contributes a vote between 0 and 1, both inclusive, to the fuzzy attributes. The approach taken in [1], [2], [6] to generalize the support-confidence measure for fuzzy association rules is to replace the set-theoretic operations, namely Cartesian product and cardinality, by the corresponding fuzzy set-theoretic operations. In [1] the terms significance and certainty are used instead of support and confidence, which are usually used in non-fuzzy situations.

III. TERMS AND NOTATIONS USED

A. Some basic definitions, terms and notations related to fuzziness

Let E be the universe of discourse. A fuzzy set A in E is characterized by a membership function A(x) lying in [0, 1].
A(x) for x ∈ E represents the grade of membership of x in A. Thus a fuzzy set A is defined as A = {(x, A(x)), x ∈ E}.

Fuzzy intervals are special fuzzy numbers satisfying the following:
1. there exists an interval [a, b] ⊂ R such that A(x_0) = 1 for all x_0 ∈ [a, b], and
2. A(x) is piecewise continuous.

A fuzzy interval can be thought of as a fuzzy number with a flat region. A fuzzy interval A is denoted by A = [a, b, c, d] with a < b < c < d, where A(a) = A(d) = 0 and A(x) = 1 for all x ∈ [b, c]. A(x) for x ∈ [a, b] is known as the left reference function, and A(x) for x ∈ [c, d] is known as the right reference function. The left reference function is non-decreasing and the right reference function is non-increasing (see e.g. [3]).

B. Some basic definitions related to association rules

Consider a set I = {i_1, i_2, …, i_m} of items, and let a transaction t (data record) be a subset of I, i.e. t ⊆ I. Let D_X = {t ∈ D | X ⊆ t} denote the set of transactions in the database D that contain the items X ⊆ I. The cardinality of this set, i.e. |D_X|, is called the support of X in D. Given a minimum threshold σ, X is said to be frequent if |D_X| ≥ σ. An association rule is a rule of the form A → B, where A, B ⊆ I and |D_{A∪B}| / |D_A| ≥ ρ, where ρ is another user-defined threshold. The support of an association rule A → B is |D_{A∪B}|. Sometimes the support is calculated as a fraction of the size of the dataset under consideration. In that case we have

supp(A → B) = |D_{A∪B}| / |D|.

The confidence is the proportion of correct applications of the rule:

conf(A → B) = |D_{A∪B}| / |D_A|.

Rather than looking at a transaction t as a subset of items, it can also be seen as a sequence (x_1, x_2, …, x_m) of values of binary variables X, with domain D_X = {0, 1}, where x_j = 1 if the jth item, i_j, is contained in t, and x_j = 0 otherwise. The association rule mining problem has been extended to handle relational tables rather than transactions of items.
In this case the problem is transformed into a binary one in the usual way. However, a database may contain quantitative attributes (such as age or salary), and in such cases transforming it into a binary one will not be possible because of the large size of the underlying domain, e.g. the integers. The discrete interval method [5] divides the domain of a quantitative attribute into discrete intervals. Each element contributes support to its own interval. In fact, each interval A = [x_1, x_2] again defines a binary attribute X_A, given by X_A(x) = 1 if x ∈ A and 0 otherwise. In other words, each quantitative attribute X is replaced by k binary attributes X_{A_i} such that $X \subseteq \bigcup_{i=1}^{k} A_i$.

C. Significance factor

The significance factor is calculated by first summing up the votes of all records with respect to the specified item set and then dividing the sum by the total number of records. Let A be a set of fuzzy sets defined on a set of attributes X. Then the significance factor of the pair <X, A> is calculated as

$$\text{Significance} = \frac{\sum_{t_i \in D} \prod_{x_j \in X} \{\alpha_{a_j}(t_i[x_j])\}}{|D|}$$
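As an illustration of the significance factor, the following Python sketch sums, over all records, the product of the votes for each attribute in X and divides by the number of records. It rests on several assumptions that are not part of the text above: the vote α_{a_j}(t_i[x_j]) is taken to be the plain membership grade of the record's value in the fuzzy set a_j, each fuzzy set is a fuzzy interval [a, b, c, d] with linear left and right reference functions, and the names `trapezoid` and `significance`, the attribute names, and the sample data are all hypothetical.

```python
from math import prod  # available in Python 3.8+


def trapezoid(a, b, c, d):
    """Membership function of a fuzzy interval [a, b, c, d], a < b < c < d.

    Assumption: the left and right reference functions are linear.
    """
    def grade(x):
        if x <= a or x >= d:
            return 0.0                  # A(a) = A(d) = 0, and 0 outside [a, d]
        if x < b:
            return (x - a) / (b - a)    # left reference function, non-decreasing
        if x <= c:
            return 1.0                  # flat region [b, c]
        return (d - x) / (d - c)        # right reference function, non-increasing
    return grade


def significance(records, fuzzy_sets):
    """Sum over records of the product of grades, divided by the record count.

    records    -- list of dicts mapping attribute name -> numeric value
    fuzzy_sets -- dict mapping each attribute in X to its membership function
    """
    if not records:
        raise ValueError("records must be non-empty")
    total = sum(
        prod(grade(t[x]) for x, grade in fuzzy_sets.items())
        for t in records
    )
    return total / len(records)


# Hypothetical data: three records with two quantitative attributes.
records = [
    {"age": 30, "salary": 50},   # grades 1.0 and 1.0 -> vote 1.0
    {"age": 22, "salary": 35},   # grades 0.4 and 0.5 -> vote 0.2
    {"age": 60, "salary": 80},   # grades 0.0 and 0.0 -> vote 0.0
]
fuzzy_sets = {
    "age": trapezoid(20, 25, 35, 45),      # hypothetical fuzzy set on age
    "salary": trapezoid(30, 40, 60, 70),   # hypothetical fuzzy set on salary
}
result = significance(records, fuzzy_sets)
print(round(result, 3))  # 0.4
```

With these sample values the votes are 1.0, 0.2 and 0, so the significance is 1.2 / 3 = 0.4. A pair <X, A> would then be compared against a user-specified minimum significance, in the same way that support is compared against σ in the crisp case.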