Optimized Fuzzy Association Rule Mining for Quantitative Data

Hui Zheng, Jing He, Guangyan Huang, Yanchun Zhang

Abstract— With the advance of computing and electronic technology, quantitative data, for example continuous data (i.e., sequences of floating-point numbers), have become vital and have wide applications, such as the analysis of sensor data streams and financial data streams. However, existing association rule mining methods generally discover association rules from discrete variables, such as boolean data ('0' and '1') and categorical data ('sunny', 'cloudy', 'rainy', etc.), and very few deal with quantitative data. In this paper, a novel optimized fuzzy association rule mining (OFARM) method is proposed to mine association rules from quantitative data. The advantages of the proposed algorithm are threefold: 1) a novel method that enhances the smoothness and flexibility of the membership functions of fuzzy sets; 2) optimization of the fuzzy sets and their partition points with multiple objective functions after the quantitative data are categorized; and 3) a two-level iteration that filters frequent itemsets and fuzzy association rules. The new method is verified on three different data sets, and the results demonstrate the effectiveness and potential of the developed scheme.

Index Terms—Quantitative Association Rule; Fuzzy Sets; Optimized Partition Points; Objective Function

I. INTRODUCTION

With the advance of computing and electronic technology, how to analyse quantitative data, for example continuous data (i.e., sequences of floating-point numbers), has become a crucial issue, for instance in the analysis of sensor data streams and financial data streams. However, classical methods for association rule mining concern only non-quantitative variables such as binary and categorical data objects. Binary variables are also called boolean variables, whose values are either 0 (false) or 1 (true).
Categorical variables are often labelled with category names, such as "sunny", "cloudy" and "rainy". Categorical values are also often represented as integers, which can be regarded as groups of binary values. In contrast, the values of quantitative variables are usually represented by floating-point numbers; they differ so much from binary and categorical variables that conventional association rule approaches are not suitable for them [1]. Therefore, several methods have been proposed to convert quantitative attributes into categorical data objects so that these classical methods can be applied. Frawley et al. propose a mining approach that partitions quantitative attributes into intervals [3]. Subsequently, other approaches try to discover interval conditions on quantitative attributes using association rules with clustering [4], rule templates [5], specific interest measures [6] and genetic algorithms [7]. Most studies focus on how to group quantitative data into different sets. In all of the above methods, the quantitative data are transformed into categorical data so that association rules can then be mined. However, as direct discretization methods, they reduce the precision of the data objects, and if the original values are close to the partition points, the sharp boundary problem emerges when they are mapped into discrete intervals: a very small change in value can move a data object into a different interval.

Hui Zheng is with the University of Chinese Academy of Sciences, Beijing, China. Jing He, Guangyan Huang and Yanchun Zhang are with the Centre for Applied Informatics, College of Engineering and Science, Victoria University, Melbourne, Australia (email: zhenghui12b@mails.ucas.ac.cn; {Jing.He,Guangyan.Huang,Yanchun.Zhang}@vu.edu.au). This work was supported by the National Natural Science Foundation of China under Project 61332013 and the Australian Research Council under Project LP100200682.
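To make the sharp boundary problem of direct discretization concrete, the following minimal Python sketch maps a quantitative attribute onto crisp intervals with fixed partition points. The cut points `CUTS`, the labels `LABELS`, and the helpers `discretize` and `to_items` are hypothetical illustrations (the attribute is assumed to be something like systolic blood pressure in mmHg); this is not the procedure of any of the cited methods [3]-[7].

```python
import bisect

# Hypothetical partition points and interval labels, for illustration only.
CUTS = [120.0, 140.0]
LABELS = ["low", "medium", "high"]


def discretize(value: float, cuts=CUTS, labels=LABELS) -> str:
    """Map a real value to exactly one crisp interval label.

    bisect_right returns how many cut points are <= value, so a value equal
    to a cut point falls into the interval that starts at that cut.
    """
    return labels[bisect.bisect_right(cuts, value)]


def to_items(values, attr="x"):
    """Turn a numeric column into boolean items such as 'x=medium'."""
    return [f"{attr}={discretize(v)}" for v in values]


if __name__ == "__main__":
    # Readings 0.2 apart get different items; readings ~20 apart share one.
    for v in (119.9, 120.1, 139.9):
        print(v, discretize(v))
```

With these cut points, 119.9 and 120.1 become different items although they differ by only 0.2, whereas 120.1 and 139.9 become the same item although they are almost 20 apart. Once the values are replaced by such items, the mining step can no longer see how close a value was to a partition point.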
Real-world applications, however, usually need to retain this advantage of quantitative values while reducing the sharp boundaries introduced by the partitioning process. Fuzzy association rule mining is a suitable method that preserves more of the information in quantitative data. For instance, in Lee et al.'s paper [8], fuzzy sets are first introduced as an extension of association rules; they keep the precision of quantitative data and diminish the sharp boundaries while dividing the intervals to change fuzzy transactions into crisp ones. Delgado et al. [9] then propose a general model for discovering association rules, using the definitions of certainty factors and very strong rules to obtain proper fuzzy association rules. Beyond these, many researchers have presented methods to improve fuzzy association rule mining: Dubois et al. [10] develop an assessment approach that partitions the data into two groups using a given rule: those against the rule (the counterexamples) and those that are irrelevant; De Cock et al. [11] introduce new quality measures that identify the set of positive as well as the set of negative examples; and other papers [12][13] apply extra measures (such as clustering and classification) to modify fuzzy association rule methods. As in the fuzzy association rule mining methods mentioned in these papers, in the fuzzy context one can extend the boolean values 0 and 1 (indicating absence and presence) to the interval [0, 1], so that whether a tuple contains an item is characterized by a membership degree. Consider a blood pressure test as an example. Suppose a patient takes a blood pressure test, and a doctor tries to determine whether the patient's blood pressure is high. Note that blood pressure is measured quantitatively, i.e., what the doctor measures is a real number, not a boolean or binary value. For example, the classification of blood pressure for adults normally uses two criteria: systolic and diastolic pressure.
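The extension of boolean presence (0 or 1) to a membership degree in [0, 1] can be sketched as follows. This is a minimal Python illustration under stated assumptions: the trapezoidal membership shape, the hypothetical item names `sbp=high` and `dbp=high`, their parameters, and the use of the minimum to combine the degrees of several items (one common t-norm; the product is another) are all illustrative choices, not the OFARM method proposed in this paper.

```python
from typing import Callable, Dict, List, Tuple

Membership = Callable[[float], float]


def trapezoid(a: float, b: float, c: float, d: float) -> Membership:
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c], linear between.

    Assumes a < b <= c < d.
    """
    def mu(x: float) -> float:
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:
            return (x - a) / (b - a)  # rising edge
        return (d - x) / (d - c)      # falling edge
    return mu


# Hypothetical fuzzy items: item name -> (attribute it reads, membership).
FUZZY_ITEMS: Dict[str, Tuple[str, Membership]] = {
    "sbp=high": ("sbp", trapezoid(110, 120, 139, 149)),
    "dbp=high": ("dbp", trapezoid(75, 80, 89, 94)),
}


def degree(item: str, row: Dict[str, float]) -> float:
    """Degree in [0, 1] to which a tuple contains a fuzzy item."""
    attr, mu = FUZZY_ITEMS[item]
    return mu(row[attr])


def fuzzy_support(itemset: List[str], rows: List[Dict[str, float]]) -> float:
    """Mean over tuples of the minimum membership across the itemset."""
    if not rows:
        return 0.0
    return sum(min(degree(i, r) for i in itemset) for r in rows) / len(rows)


if __name__ == "__main__":
    rows = [
        {"sbp": 125, "dbp": 85},
        {"sbp": 115, "dbp": 92},
        {"sbp": 150, "dbp": 70},
    ]
    print(fuzzy_support(["sbp=high"], rows))  # → 0.5
    print(fuzzy_support(["sbp=high", "dbp=high"], rows))
```

In this toy data, the second tuple (systolic 115) contributes 0.5 to the support of `sbp=high` rather than the 0 that a crisp interval starting at 120 would assign, so a reading just below a partition point is no longer discarded entirely.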
If the systolic pressure falls into the interval 120-139 mmHg and the diastolic pressure falls into 80-89 mmHg, the patient will be diagnosed with prehypertension. But this hard cutoff might not be appropriate for all adult patients. Imagine if the patient only