Neha Jain, Seema Shukla / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 3, May-Jun 2012, pp.1444-1451

Fuzzy Databases Using Extended Fuzzy C-Means Clustering

Neha Jain*, Seema Shukla**
*(Department of Computer Science, JSS Academy of Technical Education, Noida, India)
**(Department of Computer Science, JSS Academy of Technical Education, Noida, India)

ABSTRACT
In recent years, the Fuzzy Relational Database and its queries have gradually become a new research topic. Fuzzy Structured Query Language (FSQL) is used to retrieve data from a fuzzy database because traditional Structured Query Language (SQL) is ill-suited to handling uncertain and vague queries. The proposed model enables naïve users to retrieve relevant results for non-crisp queries and improves the relevance of the results provided by Fuzzy C-Means (FCM) through the use of extended Fuzzy C-Means (EFCM). The extended fuzzy clustering algorithm is based on the Gustafson-Kessel (GK) algorithm. Fuzzy C-Means and Gustafson-Kessel are both well-known fuzzy clustering algorithms. The Gustafson-Kessel algorithm is needed because the clustering results of the traditional Fuzzy C-Means algorithm are less stable and all clusters are restricted to a spherical shape. The Gustafson-Kessel algorithm is useful for forming clusters of different geometrical shapes. Both algorithms are compared on the basis of cluster validity measures, which indicate that the Gustafson-Kessel algorithm performs better than the Fuzzy C-Means algorithm.

Keywords - Fuzzy C-Means, Fuzzy Databases, Fuzzy Systems, Gustafson-Kessel

1. Introduction
The database is among the most important parts of any organization. It is used for storing and retrieving data. Generally, Structured Query Language (SQL) is used for maintaining the data.
Although Structured Query Language is a very powerful tool of the Relational Database Management System (RDBMS), it has some limitations with respect to the data. In traditional databases, data is stored in numeric and alphanumeric formats. The user must therefore know the exact requirements, that is, the precise bounds of the data being sought. Only then is the output precise. In the real world, however, users are often uncertain about their requirements. When a user expresses such thoughts as a query, a great deal of ambiguity, uncertainty and vagueness arises. Handling this uncertainty or approximation requires another type of SQL, and so Fuzzy Structured Query Language (FSQL) was developed.

Fuzzy relational databases extend the conventional relational database model to allow for the representation of imprecise data. In general, each value in a crisp relational database is taken from a specified domain and is strongly typed; thus, the data is essentially homogeneous across all rows in the relation. Fuzzy relational databases, however, may allow heterogeneous data for an attribute. To establish its theoretical validity, fuzzy relational database theory is based on fuzzy set theory, which extends classical set theory. Zadeh (1965) is credited with developing both fuzzy logic and fuzzy set theory as a way to model the imprecision and uncertainty inherent in both the world and language.

Clustering is a mathematical tool that attempts to discover structures or patterns in a data set, where the objects inside each cluster show a certain degree of similarity. Clustering is useful for databases in the data storage and retrieval process. For example, when a query is made for the address of a person, the archived data can be clustered according to various criteria, e.g. by similar street names, within the same zip code, or by similar last name. Cluster analysis has been studied extensively. Fuzzy clustering is an extension of cluster analysis.
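
To make the difference between crisp and fuzzy clustering concrete, the following is a minimal sketch, not the implementation used in this paper. It assumes the standard Fuzzy C-Means membership formula with fuzzifier m = 2, one-dimensional data, and fixed cluster centers. The function name `fcm_memberships`, the data points, and the centers are hypothetical and chosen only for illustration. Each point receives a degree of membership in every cluster, and the degrees for one point sum to 1.

```python
def fcm_memberships(points, centers, m=2.0):
    """Return u[i][k]: the degree to which points[k] belongs to cluster i.

    Assumes the standard FCM membership formula with fixed centers;
    m > 1 is the fuzzifier (larger m gives softer memberships).
    """
    exponent = 2.0 / (m - 1.0)
    u = [[0.0] * len(points) for _ in centers]
    for k, x in enumerate(points):
        dists = [abs(x - c) for c in centers]
        hits = [i for i, d in enumerate(dists) if d == 0.0]
        if hits:
            # The point coincides with one or more centers:
            # share full membership among them and skip the ratio formula.
            for i in hits:
                u[i][k] = 1.0 / len(hits)
            continue
        for i, d_i in enumerate(dists):
            u[i][k] = 1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
    return u


# Hypothetical 1-D data with two assumed cluster centers.
points = [1.0, 2.0, 5.0, 8.0, 9.0]
centers = [1.5, 8.5]
u = fcm_memberships(points, centers)
for k, x in enumerate(points):
    print(x, [round(u[i][k], 3) for i in range(len(centers))])
```

In this sketch the point 5.0 lies at the same distance from both centers and so belongs to each cluster with degree 0.5, whereas a crisp clustering would have to assign it to exactly one cluster.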
Many fuzzy clustering algorithms for finding similarity in data and grouping it are defined in the literature. The Fuzzy C-Means algorithm and the Gustafson-Kessel algorithm are two of them, and both are very useful with databases. The proposed approach is an extension of the Fuzzy C-Means (FCM) algorithm. The Gustafson-Kessel (GK) algorithm is an extension of FCM that can detect clusters of different orientation and shape in a data set by employing a norm-inducing matrix for each cluster. The GK algorithm is required because the FCM algorithm has some limitations. FCM uses a single norm-inducing matrix A for all clusters, with the downside that all clusters have the same shape and orientation. When the data contain clusters of different shapes, FCM is therefore unsuitable. Gustafson and Kessel extended FCM by employing an adaptive distance norm for each cluster in order to detect different geometrical shapes in data sets. Each cluster i has its own norm-inducing matrix A_i, which determines the distance norm for that cluster. The Euclidean norm used in FCM is thus replaced by a Mahalanobis distance norm.

2. Fuzzy Systems, Fuzzy Databases and Clustering
This section introduces the basics of fuzzy systems and fuzzy databases and then describes the concepts of clustering. The fuzzy clustering algorithms Fuzzy C-Means and Gustafson-Kessel are described in detail. The section then explains the concept of cluster validity measurement indexes.

2.1 Fuzzy Systems
Fuzzy logic [1],[4],[5] is a form of many-valued logic