Semantic Grouping of Visual Features
Alexandra Teynor and Hans Burkhardt
Department of Computer Science, Albert-Ludwigs-Universität Freiburg, Germany
{teynor, Hans.Burkhardt}@informatik.uni-freiburg.de
Abstract
Many current object class models build on visual
parts that constitute an object. However, visually differ-
ent entities may actually refer to the same object part.
This may be harmful for part based object class mod-
els. We present a method how visually distinct parts
with the same semantic role can be associated by cre-
ating groupings based on the similarity of their occur-
rence distributions. Experimental results verify that
more compact class representations can be built based
on these groupings, which lead to improved classifica-
tion performance and/or reduced classification time.
1. Introduction
A common technique for the recognition of object
classes is the use of part dictionaries or “visual code-
books”. These codebooks contain a variety of possi-
ble image structures. Whenever visual codebooks are
created, e.g., by clustering appearance features from local image patches, we only obtain a visual, not a semantic grouping of object parts. The variety in the visual appearance of semantically equal object parts is due to several reasons. First, there is natural intra-class variability. Second, we have to deal with different poses: a mouth might be open, shut, or smiling and showing the teeth. Other reasons exist as well:
current feature extraction methods often rely on interest point detectors, which do not always fire at the same locations on different object instances. This can result in shifted local windows for the same object part.
So an eye might not always occur at the center of a lo-
cal window, but also slightly shifted to the left or right.
The features extracted from such shifted windows can
be quite different. Invariance towards such shifts might be incorporated into the local features, but some very successful descriptors, such as SIFT by Lowe [5], deliberately consider not only the frequency of certain structures but also their location. These types of features are affected by shifts in the detected structure.
Depending on the classification strategy, treating semantically similar parts separately can be harmful. In "bag-of-features" type approaches especially, parts with the same role are assigned to different dictionary entries. Distances between part histograms are typically computed in a bin-by-bin fashion, so performance can degrade if semantically similar parts are not related.
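The effect of bin-by-bin distances can be illustrated with a small sketch (the histograms and the grouping of bins {0, 1} are invented for illustration): two images of the same class match visually distinct codebook entries for the same semantic part, which inflates their histogram distance until those bins are merged.

```python
import numpy as np

# Two hypothetical part histograms for the same object class.
# Bins 0 and 1 stand for visually distinct codebook entries that
# both depict the same semantic part (e.g. an eye); image A happened
# to match bin 0, image B bin 1.
hist_a = np.array([4.0, 0.0, 2.0, 1.0])
hist_b = np.array([0.0, 4.0, 2.0, 1.0])

def l1(h1, h2):
    """Bin-by-bin L1 distance between two histograms."""
    return np.abs(h1 - h2).sum()

# Without grouping, the shared semantic part inflates the distance.
print(l1(hist_a, hist_b))  # 8.0

# Merging the semantically grouped bins (here: {0, 1}) removes
# the spurious difference between the two images.
groups = [[0, 1], [2], [3]]

def merge(hist, groups):
    """Sum each group of bins into a single merged bin."""
    return np.array([hist[g].sum() for g in groups])

print(l1(merge(hist_a, groups), merge(hist_b, groups)))  # 0.0
```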
In this work, we present a novel way to perform a semantic grouping of object parts. Parts with a different visual appearance but the same semantic role are associated via the similarity of their occurrence distributions given the object class.
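The grouping idea can be sketched as follows. The spatial occurrence histograms, the chi-square distance, the greedy merging, and the threshold are all assumptions for illustration; the excerpt does not specify the exact distance or grouping procedure used.

```python
import numpy as np

# Hypothetical spatial occurrence histograms: for each codebook entry,
# a normalized histogram over a coarse grid of object-relative
# coordinates, accumulated from training detections. Entries 0 and 1
# are visually distinct but occur at the same object location;
# entry 2 occurs elsewhere.
occ = np.array([
    [0.70, 0.20, 0.05, 0.05],  # part 0
    [0.65, 0.25, 0.05, 0.05],  # part 1 (same semantic role as 0)
    [0.05, 0.05, 0.30, 0.60],  # part 2
])

def chi2(p, q, eps=1e-10):
    """Chi-square distance between two normalized histograms."""
    return 0.5 * np.sum((p - q) ** 2 / (p + q + eps))

# Greedy grouping: merge parts whose occurrence distributions are
# closer than an (assumed) threshold.
threshold = 0.1
n = len(occ)
group = list(range(n))
for i in range(n):
    for j in range(i + 1, n):
        if chi2(occ[i], occ[j]) < threshold:
            gj = group[j]
            group = [group[i] if g == gj else g for g in group]

print(group)  # parts 0 and 1 share a group label, part 2 stays separate
```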
2. Related work
Previous work on the semantic grouping of visual structures has been performed by Leibe [4] and by Epshtein and Ullman [1]. Leibe combines visual parts by co-location and co-activation clustering. His approach is similar to ours in that he also tries to associate parts that occur at the same location in an image, but he uses a weighted variant of the Hausdorff distance to combine visual parts. He does not apply his procedure to part-frequency-based object class models, as he advocates a Hough-transform-like voting method.
Epshtein and Ullman use the context of parts in a prob-
abilistic framework. They identify the geometric rela-
tion of parts co-occurring with a basic “root fragment”,
and search for similar constellations in test images. Our
approach does not need a root fragment, but creates a
number of groupings based on the desired similarity of
the occurrence distributions.
3. Method
The basic idea is that object parts with the same se-
mantic meaning occur at the same location(s) on an
object. For example, the mouth is always located in
978-1-4244-2175-6/08/$25.00 ©2008 IEEE