Integration of data selection and classification by fuzzy logic

Miroslav Hudec a,*, Mirko Vujošević b

a INFOSTAT – Institute of Informatics and Statistics, Dúbravská cesta 3, 84524 Bratislava, Slovakia
b Faculty of Organizational Sciences, University of Belgrade, Jove Ilića 154, 11000 Beograd, Serbia

Keywords: Fuzzy queries; Generalized Logical Condition; Fuzzy classification; Integration

Abstract

A concept of integrating fuzzy data selection and classification by the fuzzy Generalized Logical Condition (GLC) is presented in this paper. The GLC, which extends SQL queries with fuzzy logic, was developed for the purpose of fuzzy data selection. To classify data by generating fuzzy queries from fuzzy rules, an extension of the GLC was created. The proposed methodology integrates data selection and data classification into one entity, while access to relational databases remains unchanged. The approach is demonstrated on data from the municipal and urban statistical database. Data selection and classification problems can often be described more naturally in terms of natural language than by crisp numbers.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

Data selection (database queries) and data classification are two processes often used in data processing and analysis. Each object (database record, tuple) is characterized by its own vector of attributes. According to these attributes, objects are selected or classified for different purposes. The goal of data selection is to separate relevant objects from irrelevant ones, in other words to extract them from a database for further use. In classification, objects are assigned to several classes, which gives a better overview of all objects and allows a particular action to be taken on objects from a chosen class.
Selection and classification become particularly important when an information system stores a large number of records and attributes. Collected data hold no significant value unless they are analysed and searched for additional useful information.

Fuzzy set theory brings a paradigm for working with the gradation, uncertainty and ambiguity described by linguistic terms when sharply defined selection and classification criteria cannot be created. Fuzzy sets and fuzzy logic directly employ expert knowledge by means of approximate reasoning and linguistic terms in order to select and classify data. Moreover, users feel more comfortable using linguistic terms instead of precisely specified numerical constraints when expressing query conditions or a classification problem.

The Structured Query Language (SQL) uses two-valued logic to select data from relational databases. Although SQL is a very powerful tool, it cannot satisfy the need for data selection based on linguistic terms (vague predicates such as altitude near 200 m, high length of roads) and on degrees of matching query conditions. To overcome this limitation, different approaches based on fuzzy logic have been proposed and many fuzzy query implementations have been designed, e.g. Kacprzyk and Zadrożny (1995), Rasmussen and Yager (1997), Bosc and Pivert (2000) and Wang, Lee, and Chen (2007). In order to bring the benefits of fuzzy logic to users and to make the querying tool easy to use, the fuzzy Generalized Logical Condition (GLC), capable of incorporating linguistic terms into the where part of an SQL query, has been created and described in Hudec (2009). Another way of improving database queries is to ask for data using natural-language sentences and crisp Boolean conditions (Papadakis, Kefalas, & Stilianakakis, 2011).

A number of classification methodologies based on fuzzy logic have been discussed in the literature.
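As a rough illustration of the contrast drawn above between a two-valued SQL condition and a vague predicate such as altitude near 200 m, the following Python sketch compares a crisp interval test with a degree of matching in [0, 1]. The triangular and linear membership functions, their parameters, the sample records and all function names are assumptions made for this illustration only; the sketch does not reproduce the GLC.

```python
def near(value, center, spread):
    """Triangular membership (assumed shape): 1 at `center`,
    falling linearly to 0 at a distance of `spread` from it."""
    if spread <= 0:
        raise ValueError("spread must be positive")
    return max(0.0, 1.0 - abs(value - center) / spread)


def high(value, lower, upper):
    """Linearly increasing membership (assumed shape): 0 up to `lower`,
    1 from `upper` onward, linear in between."""
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    if value <= lower:
        return 0.0
    if value >= upper:
        return 1.0
    return (value - lower) / (upper - lower)


def fuzzy_select(records, degree_of, threshold=0.0):
    """Return (record, degree) pairs whose degree exceeds `threshold`,
    ordered from the best match to the weakest one."""
    scored = [(record, degree_of(record)) for record in records]
    selected = [(record, degree) for record, degree in scored if degree > threshold]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


# Hypothetical records; names and altitudes are invented for the example.
municipalities = [
    {"name": "A", "altitude_m": 200},
    {"name": "B", "altitude_m": 230},
    {"name": "C", "altitude_m": 260},
    {"name": "D", "altitude_m": 320},
]

# Crisp condition: a record either satisfies the interval or it does not.
crisp = [m["name"] for m in municipalities if 150 <= m["altitude_m"] <= 250]

# Fuzzy condition "altitude near 200 m": every record gets a matching degree.
fuzzy = fuzzy_select(municipalities, lambda m: near(m["altitude_m"], 200, 100))

for record, degree in fuzzy:
    print(record["name"], round(degree, 2))
```

In this sketch the crisp interval rejects record C outright because it lies just beyond the boundary, whereas the fuzzy predicate keeps it with a lower matching degree than A and B. Only record D, which is far from 200 m, is excluded by both.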
The main advantage of a fuzzy classification compared to a crisp one is that an element is not limited to a single class but can be assigned to several classes. Research on generating fuzzy queries from classification rules is an interesting area with many possibilities for implementation. Two solutions were outlined in the paper: the fuzzy Classification Query Language (fCQL) used in customer relationship management (Werro, Meier, Mezger, & Schindler, 2005) and the generation of fuzzy queries from weighted fuzzy classifier rules (Branco, Evsukoff, & Ebecken, 2005). A comparison between fuzzy and crisp classification was outlined in Meier, Werro, Albrecht, and Sarakinos (2005). These papers also contain information about other research and implementations in this field.

Expert Systems with Applications 39 (2012) 8817–8823. ISSN 0957-4174. doi:10.1016/j.eswa.2012.02.009. Journal homepage: www.elsevier.com/locate/eswa
* Corresponding author. E-mail address: hudec@infostat.sk (M. Hudec).

Classification methods usually use database queries only as supporting tools to select relevant data for classification tasks. Our preliminary research has shown that fuzzy queries and the GLC could be used for data selection as well as for data classification (Hudec