Object Recognition based on Visual Grammars and Bayesian Networks

Elías Ruiz, L. Enrique Sucar
National Institute of Astrophysics, Optics and Electronics
Computer Science Department
{elias ruiz, esucar}@inaoep.mx

Abstract

A novel proposal for object recognition based on relational grammars and Bayesian networks is presented. Based on a Symbol-Relation grammar, an object is represented as a hierarchy of features and spatial relations. This representation is transformed into a Bayesian network structure whose parameters are learned from examples. Recognition is then based on probabilistic inference in the Bayesian network representation. Preliminary results in modeling natural objects are presented.

I Introduction

Most current object recognition systems are focused on recognizing certain types of objects and do not consider their structure. This implies several limitations: (i) the systems are difficult to generalize to other types of objects, (ii) they are not robust to noise and occlusions, and (iii) the model is difficult to interpret. This paper proposes a model that builds a hierarchical representation of a visual object in order to perform object recognition tasks, based on a visual grammar [Ferrucci et al., 1996] and Bayesian networks (BNs). We propose incorporating a visual grammar to develop an interpretable hierarchical model: starting from basic elements (obtained by any image segmentation algorithm), it constructs more complex forms through composition rules defined in the grammar, in order to recognize objects in a certain context (e.g., images of objects indoors). The importance of the hierarchical approach is that it yields a model more robust to noise and occlusions, and the BN model can work with incomplete evidence. In addition, the model expresses the grammar in a way understandable to a human, so the model can be interpreted and even its structure modified.
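The kind of inference the paper relies on can be illustrated with a minimal sketch: a tiny hand-built Bayesian network whose binary root node stands for the object and whose children stand for observable parts. All node names and probabilities below are illustrative assumptions for a hypothetical "tree" object, not parameters learned by the paper's method.

```python
# Conditional probability of observing each part, given that the object
# is present (True) or absent (False). Numbers are invented for the sketch.
CPT = {
    "trunk": {True: 0.9, False: 0.2},
    "crown": {True: 0.7, False: 0.1},
}
PRIOR_OBJECT = 0.5  # assumed prior on object presence


def posterior_object(evidence):
    """Posterior P(object | evidence), by direct enumeration.

    evidence: dict mapping part name -> bool (part observed present/absent).
    """
    scores = {}
    for obj in (True, False):
        p = PRIOR_OBJECT if obj else 1.0 - PRIOR_OBJECT
        for part, present in evidence.items():
            p_present = CPT[part][obj]
            p *= p_present if present else 1.0 - p_present
        scores[obj] = p
    # Normalize over the two hypotheses (object present / absent).
    return scores[True] / (scores[True] + scores[False])


if __name__ == "__main__":
    # Both parts detected -> high posterior; incomplete or negative
    # evidence lowers it but still yields a well-defined answer.
    print(posterior_object({"trunk": True, "crown": True}))
    print(posterior_object({"trunk": True}))
```

Because the posterior is computed over whatever evidence is supplied, missing detections (occluded parts) simply drop out of the product, which is the property the paper exploits for robustness to occlusion.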
There are several works using a hierarchical approach [Chang et al., 2011; Felzenszwalb, 2011; Melendez et al., 2010; Zhu and Mumford, 2006]. We propose Symbol-Relation grammars (SR-grammars) because they can represent relationships between elements using predicate logic; the transformation into a BN can deal with uncertainty, and inference in the BN performs the object recognition task.

Figure 1: Training of the model. Starting from training images and a description of the object in terms of the lexicon and the visual grammar, the model generates a BN structure with its parameters.

Figure 2: Testing stage. The test image is segmented with the visual dictionary and correspondences between regions and the visual lexicon are obtained. The algorithm then evaluates subsets of those regions, with their spatial relationships, as candidates to be evaluated in the previously trained PGM in order to do inference. At the end, the PGM indicates whether there is an object in the image.

II Methods

The proposed method comprises two phases: (i) model construction and transformation into a BN (Fig. 1); and (ii) image pre-processing and object recognition using probabilistic inference (Fig. 2). Next we briefly describe the main steps of our method.

A Segmentation and Lexicon

Segmentation is performed with a simple RGB quantization (32 levels) and edge extraction using Gabor filters. Small regions are fused with neighboring regions. The idea is to use a simple segmentation algorithm. These regions define a visual dictionary. Every region is described using several shape and color features, and regions with similar features are considered candidates for terminal elements of our grammar. All the terminal elements are described in a "Lexicon".

B SR-Grammars and Spatial Relationships

The object representation is based on SR-grammars [Ferrucci et al., 1996], which can incorporate spatial relationships between elements.
In our work, these relationships determine

Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence
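The spatial relationships attached to SR-grammar rules can be sketched as simple predicates over segmented regions. In the sketch below, regions are reduced to axis-aligned bounding boxes `(xmin, ymin, xmax, ymax)` in image coordinates (y axis pointing down); the relation names and their geometric definitions are illustrative assumptions, not the paper's actual predicate set.

```python
def above(r1, r2):
    """True if region r1 lies entirely above region r2 (image coords)."""
    return r1[3] <= r2[1]  # r1 ymax <= r2 ymin


def left_of(r1, r2):
    """True if region r1 lies entirely to the left of region r2."""
    return r1[2] <= r2[0]  # r1 xmax <= r2 xmin


def overlaps(r1, r2):
    """True if the two bounding boxes intersect."""
    return not (left_of(r1, r2) or left_of(r2, r1)
                or above(r1, r2) or above(r2, r1))


if __name__ == "__main__":
    # Hypothetical regions for a "tree": the crown sits above the trunk.
    crown = (10, 0, 50, 40)   # xmin, ymin, xmax, ymax
    trunk = (25, 40, 35, 90)
    print(above(crown, trunk))
```

A grammar rule such as "tree -> crown above trunk" would then fire only on pairs of candidate regions for which the predicate holds, pruning the subsets of regions passed to the PGM for inference.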