International Journal of Engineering Innovation & Research
Volume 1, Issue 2, ISSN: 2277-5668
Copyright © 2012 IJEIR, All rights reserved

Automatic Shape Annotation Using Rough Sets and Decision Trees

Manoj P. Patil
Department of Computer Science, North Maharashtra University, Jalgaon (M. S.)
mpp145@gmail.com

Satish R. Kolhe
Department of Computer Science, North Maharashtra University, Jalgaon (M. S.)
srkolhe2000@gmail.com

Abstract
Automatic image annotation assigns tags to images by analyzing their contents. Shape is the most important feature of images; tagging images on the basis of this feature can be termed automatic shape annotation. In this paper, novel classifiers based on two machine learning techniques, Rough Sets (RS) and Decision Trees (DT), are presented to classify shape images of a standard dataset for annotation purposes. Shape-based features are extracted and organized to form a shape feature vector. The Rough Set Exploration System (RSES) is used to develop decision tree based and rough set based classifiers for the tagging of shapes. The results obtained using these classifiers are presented and discussed. The RS classifier significantly improves the annotation performance.

Keywords
Automatic image annotation, shape features, decision tree, rough sets.

I. INTRODUCTION

The description of object shape is an important task in image analysis and pattern recognition. The shapes occurring in images are also of remarkable significance in image retrieval [1]. The ever-growing number of images generated every day is the reason to develop, evaluate and implement sophisticated automatic annotation systems that retrieve images from large databases based on their content rather than on manual annotations.
Although computers are still a long way from identifying and textually describing image concepts in the way humans do, it is possible to train computers on large, previously annotated image databases in order to learn the associations between visual image data and their textual descriptions [2]. Such automatic image annotation systems have received intensive attention in the image information retrieval literature since the area emerged, and consequently a broad range of techniques has been proposed.

The algorithms used in these systems perform four tasks: feature extraction, feature selection, training of the annotation system, and annotation of new images. The extraction task transforms the rich content of images into a set of features. Feature extraction is a special form of dimensionality reduction. The end result of the extraction process is a set of features, commonly called a feature vector, which forms a representation of the image. A subset of the generated features is then selected. Feature selection reduces the number of features provided to train the system. The features that are likely to assist in discrimination are selected and used in the annotation task. Features that are similar across shapes and therefore cannot discriminate between them are discarded.

Among generic image features such as color and texture that are used to achieve the classification objective, shape is considered the most promising for the identification of entities in an image [3]. Shape is a fundamental image feature and one of the most important features used in image annotation and retrieval. This feature alone provides the capability to recognize and classify objects and to retrieve similar images on the basis of their contents [4].

Among classification algorithms, the decision tree is the most commonly used because it is easy to understand and cheap to implement.
A decision tree provides a modeling technique that is easy for humans to comprehend and simplifies the classification process [5]. A decision tree can be constructed from a set of instances by a divide-and-conquer strategy. If all the instances belong to the same class, the tree is a leaf with that class as its label. Otherwise, a test is chosen that has different outcomes for at least two of the instances, and the instances are partitioned according to the outcome. The tree has as its root a node specifying the test, and for each outcome in turn the corresponding sub-tree is obtained by applying the same procedure to the subset of instances with that outcome.

Rough set theory can be regarded as a new mathematical tool for imperfect data analysis. Rough set philosophy is founded on the assumption that some information (data, knowledge) is associated with every object of the universe of discourse. Objects characterized by the same information are indiscernible (similar) in view of the available information about them. The indiscernibility relation generated in this way is the mathematical basis of rough set theory. Any set of all mutually indiscernible (similar) objects is called an elementary set and forms a basic granule (atom) of knowledge about the universe. Any union of elementary sets is referred to as a crisp (precise) set; otherwise the set is rough (imprecise, vague).

In this paper, automatic annotation of shapes using decision tree and rough set techniques is discussed. A novel classifier using Rough Sets (RS) is presented to classify shape images of a standard dataset for annotation purposes. Shape features are extracted from the input images and classification is then performed. Decision tree generation, discretization and rule extraction for rough sets are accomplished using RSES. Classifiers using decision tree and rough set techniques are formulated in RSES. The use of various machine learning techniques for classification is described in Section 2.
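The divide-and-conquer construction of a decision tree described in the Introduction can be illustrated with a minimal Python sketch. It is not the RSES implementation used in this paper. The function names `build_tree` and `classify`, the attribute values and the shape labels are hypothetical, and the test is simply the first attribute that separates at least two instances, whereas practical algorithms usually rank candidate tests with a criterion such as information gain.

```python
from collections import Counter


def build_tree(instances, labels, n_features):
    """Build a decision tree by divide and conquer.

    A leaf is a class label; an internal node is a dict holding the
    index of the tested attribute and one sub-tree per outcome.
    """
    # All instances belong to the same class: the tree is a leaf.
    if len(set(labels)) == 1:
        return labels[0]

    # Choose a test with different outcomes for at least two instances.
    # Assumption: take the first such attribute (no ranking criterion).
    for feature in range(n_features):
        outcomes = {x[feature] for x in instances}
        if len(outcomes) > 1:
            break
    else:
        # No attribute separates the instances: fall back to the majority class.
        return Counter(labels).most_common(1)[0][0]

    # Partition the instances by outcome and recurse on every subset.
    children = {}
    for value in outcomes:
        xs = [x for x in instances if x[feature] == value]
        ys = [y for x, y in zip(instances, labels) if x[feature] == value]
        children[value] = build_tree(xs, ys, n_features)
    return {"feature": feature, "children": children}


def classify(tree, x):
    """Follow the tests from the root until a leaf (class label) is reached."""
    while isinstance(tree, dict):
        tree = tree["children"][x[tree["feature"]]]
    return tree


# Hypothetical discretized shape features: (compactness, corner count).
instances = [("high", "none"), ("high", "many"), ("low", "none"), ("low", "many")]
labels = ["circle", "star", "ellipse", "rectangle"]
tree = build_tree(instances, labels, n_features=2)
```

In this sketch `classify` raises a `KeyError` for an attribute value that was never seen during construction; a complete classifier would need a policy for such cases.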
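The rough set notions introduced in the Introduction (indiscernibility, elementary sets, crisp and rough sets) can likewise be made concrete with a small Python sketch. The lower and upper approximations used here are the standard rough set constructions and are not spelled out in the text above. The object identifiers, attribute names and values are hypothetical, and the code is independent of RSES.

```python
from collections import defaultdict


def elementary_sets(objects, attributes):
    """Group objects that carry identical information on the given attributes."""
    granules = defaultdict(set)
    for obj_id, info in objects.items():
        key = tuple(info[a] for a in attributes)
        granules[key].add(obj_id)
    return list(granules.values())


def approximations(objects, attributes, target):
    """Return the lower and upper approximation of the set `target`."""
    lower, upper = set(), set()
    for granule in elementary_sets(objects, attributes):
        if granule <= target:   # granule lies entirely inside the target
            lower |= granule
        if granule & target:    # granule overlaps the target
            upper |= granule
    return lower, upper


def is_crisp(objects, attributes, target):
    """A set is crisp exactly when it is a union of elementary sets."""
    lower, upper = approximations(objects, attributes, target)
    return lower == upper


# Hypothetical shape objects described by two discretized attributes.
objects = {
    "s1": {"compactness": "high", "corners": "none"},
    "s2": {"compactness": "high", "corners": "none"},
    "s3": {"compactness": "low", "corners": "many"},
    "s4": {"compactness": "low", "corners": "many"},
    "s5": {"compactness": "high", "corners": "many"},
}
attrs = ["compactness", "corners"]

crisp_target = {"s1", "s2"}   # exactly one elementary set
rough_target = {"s1", "s5"}   # cuts through the granule {s1, s2}
lower, upper = approximations(objects, attrs, rough_target)
```

Here `s1` and `s2` are indiscernible, so any set that contains one of them without the other cannot be written as a union of elementary sets and is therefore rough.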