C. Arcelli et al. (Eds.): IWVF4, LNCS 2059, pp. 85–100, 2001.
© Springer-Verlag Berlin Heidelberg 2001
A Fragment-Based Approach to Object Representation
and Classification
Shimon Ullman, Erez Sali, and Michel Vidal-Naquet
The Weizmann Institute of Science
Rehovot 76100, Israel
Shimon@wisdom.weizmann.ac.il
Abstract. The task of visual classification is the recognition of an object in the
image as belonging to a general class of similar objects, such as a face, a car, a
dog, and the like. This is a fundamental and natural task for biological visual
systems, but it has proven difficult for artificial computer vision systems to
perform. The main reason for this difficulty is the variability of
shape within a class: different objects vary widely in appearance, and it is
difficult to capture the essential shape features that characterize the members of
one category and distinguish them from another, such as dogs from cats.
In this paper we describe an approach to classification using a fragment-based
representation. In this approach, objects within a class are represented in terms
of common image fragments that are used as building blocks for representing a
large variety of different objects that belong to a common class. The fragments
are selected from a training set of images based on a criterion of maximizing
the mutual information of the fragments and the class they represent. For the
purpose of classification the fragments are also organized into types, where
each type is a collection of alternative fragments, such as different hairline or
eye regions for face classification. During classification, the algorithm detects
fragments of the different types, and then combines the evidence for the
detected fragments to reach a final decision. Experiments indicate that it is
possible to trade off the complexity of fragments with the complexity of the
combination and decision stage, and this tradeoff is discussed.
The method differs from previous part-based methods in its use of class-specific
object fragments of varying complexity, in its method of selecting fragments,
and in its organization of fragments into types. Experimental results
on detecting face and car views show that the fragment-based approach can
generalize well to a variety of novel image views within a class while
maintaining low misclassification error rates. We briefly discuss relationships
between the proposed method and properties of parts of the primate visual
system involved in object perception.
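As a rough illustration of the selection criterion described in the abstract, the sketch below ranks candidate fragments by the mutual information between a binary "fragment detected" variable and a binary class label. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the function names, the fragment names, the binary detection values, and the toy data are all hypothetical, and the step that decides whether a fragment is present in an image is assumed to have been done already.

```python
import math


def mutual_information(detected, labels):
    """Estimate I(F; C) in bits from paired binary samples.

    detected[i] is 1 if the (hypothetical) fragment was found in image i,
    labels[i] is 1 if image i belongs to the class.
    """
    n = len(labels)
    mi = 0.0
    for f in (0, 1):
        p_f = sum(1 for d in detected if d == f) / n
        for c in (0, 1):
            p_c = sum(1 for l in labels if l == c) / n
            joint = sum(
                1 for d, l in zip(detected, labels) if d == f and l == c
            ) / n
            # Terms with zero joint probability contribute nothing.
            if joint > 0:
                mi += joint * math.log2(joint / (p_f * p_c))
    return mi


def select_fragments(detections, labels, k):
    """Return the names of the k candidates with the highest estimated MI."""
    ranked = sorted(
        detections,
        key=lambda name: mutual_information(detections[name], labels),
        reverse=True,
    )
    return ranked[:k]


# Toy training set (assumed): four class images followed by four non-class images.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
detections = {
    "eye_region": [1, 1, 1, 1, 0, 0, 0, 0],  # fires exactly on class images
    "hairline": [1, 1, 1, 0, 0, 0, 0, 1],    # informative but imperfect
    "background": [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the class
}

selected = select_fragments(detections, labels, k=2)
print(selected)
```

In this toy setting the fragment that fires on every class image and on no other image carries the most information about the class, while the one that fires independently of the class carries none and is not selected.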
1 Introduction
The general task of visual object recognition can be divided into two related, but
somewhat different tasks – classification and identification. Classification is
concerned with the general description of an object as belonging to a natural class of
similar objects, such as a face or a dog. Identification is a more specific level of