Association Rules in STULONG and Natural Language *

Petr Strossa¹ and Jan Rauch²

¹ kizips@vse.cz, Department of Information and Knowledge Engineering
² rauch@vse.cz, EuroMISE – Kardio
Faculty of Informatics and Statistics, University of Economics, W. Churchill Sq. 4, 130 67 Prague, Czech Republic

Abstract. A system of association rules (ARs) concerning the STULONG data set is described. Examples of particular association rules are given. It is shown that the association rules found can be formulated as reasonable sentences of a natural language (NL). A limited language model for formulating association rules in an NL (English or Czech) is described. This model consists mainly of a set of formulation patterns, which depend only slightly on the subject domain, and tables of NL expressions (verb phrases, noun phrases, etc.) for data columns and their values. The morphological problems and their solutions suitable for the limited subject domain, both for English and Czech, are described.

1 Introduction

This paper presents the results of a data mining activity concerning the STULONG data set. STULONG consists of two data matrices, Entry and Control. The data matrix Entry concerns the entry examination of 1,419 men and contains the results of observing 219 attributes. The data matrix Control contains the results of observing 66 attributes at 10,610 control examinations.

The goal of the data mining was to find as many interesting association rules (ARs) concerning the Entry data matrix as possible. We used the 4ft-Miner procedure, which is part of the LISp-Miner system (see http://lispminer.vse.cz/). The main features of the 4ft-Miner procedure are described in section 2.

One of the problems related to the described activity is the large number of attributes of the data matrix Entry. It results in a large number of ARs that can be interesting from the point of view of the data owner. It is necessary to represent the results as a system of ARs.
Ranking the whole set of resulting rules is not sufficient. The resulting rules must be grouped in a natural way corresponding to the structure and properties of the attributes. A system of ARs satisfying this requirement is outlined in section 3. Examples of ARs are also given.

* The work described here has been supported by projects LN00B107 and ZA471011 of the Ministry of Education of the Czech Republic and by the EU grant no. IST-1999-11495: Data Mining and Decision Support for Business Competitiveness: Solomon European Virtual Enterprise.
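
The grouping requirement stated in the Introduction (rules grouped by the structure of the attributes rather than ranked as one flat list) can be illustrated with a minimal sketch. It is not the system described in this paper: the `Rule` type, the `group_rules` function, the attribute names, the group labels, and the confidence scores are all hypothetical placeholders invented for illustration, and none of the values are STULONG results.

```python
from collections import defaultdict
from typing import NamedTuple


class Rule(NamedTuple):
    """Hypothetical stand-in for an association rule."""
    antecedent: str      # attribute on the left-hand side
    succedent: str       # attribute on the right-hand side
    confidence: float    # placeholder interestingness score


def group_rules(rules, attribute_group):
    """Group rules by (antecedent group, succedent group).

    Rules are ranked by confidence only inside each group,
    instead of ranking the whole rule set at once.
    """
    grouped = defaultdict(list)
    for rule in rules:
        key = (attribute_group[rule.antecedent],
               attribute_group[rule.succedent])
        grouped[key].append(rule)
    for members in grouped.values():
        members.sort(key=lambda r: r.confidence, reverse=True)
    return dict(grouped)


# Invented placeholder attributes, groups, and scores (assumptions only).
ATTRIBUTE_GROUP = {"a1": "G1", "a2": "G1", "b1": "G2", "c1": "G3"}
RULES = [
    Rule("a1", "b1", 0.70),
    Rule("a2", "b1", 0.90),
    Rule("a1", "c1", 0.60),
]

if __name__ == "__main__":
    for key, members in group_rules(RULES, ATTRIBUTE_GROUP).items():
        print(key, [f"{r.antecedent} -> {r.succedent}" for r in members])
```

Keying each group on the pair of attribute groups keeps related rules together, so a reader interested in one combination of attribute groups does not have to scan a single global ranking.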