Syntactic Image Parsing using Ontology and Semantic Descriptions
Ifeoma Nwogu
University of Rochester
Rochester, NY
inwogu@cs.rochester.edu
Venu Govindaraju
University at Buffalo, SUNY
Buffalo, NY
govind@buffalo.edu
Chris Brown
University of Rochester
Rochester, NY
brown@cs.rochester.edu
Abstract
We present an ontology-guided, symbol-based image
parser that uses semantic, spoken-language descriptions
of entities in images, as well as the real-world spatial
relationships defined between these entities. Our parsing
approach explicitly describes objects and the relationships
between them with linguistically meaningful modes of
colors, textures and coarse expressions of shapes.
The image parser is built on a syntactic image grammar-
based framework and performs a (near) global optimization
using superpixels as an initial set of subpatterns. It
hypothesizes the entities in images using their local
semantic attributes and verifies them globally using their
more global features and their relative spatial locations.
Evaluations
of the parser are performed on selected images which we
make publicly available along with their manual segmenta-
tions and our labeling results.
1. Introduction
In our image parsing approach, given a 2D image of the
real world, we first decide whether the image is in the class
of outdoor or indoor images. Within these classes exists a set
of subdomains (or locations) as described by Zhu et al. [26].
Some examples of subdomains in their indoor class include
bathroom, hall, and office, while examples in the outdoor
class include street, highway, cityview, and seashore.
Numerous approaches have been presented in the literature
that accomplish this task of scene categorization quite suc-
cessfully [11, 13, 24].
We therefore assume a given subdomain and explicitly
build an ontology to define the entities we expect to find
in it. Within the ontology, the parts-relationships (or
mereology) of the entities are defined; for example, the sky
is made up of both clouds and clear sky. The relationships
between the entities in the vocabulary are also defined in
the ontology, in the form of a syntactic tree that specifies
how pairs of entities can relate to each other (topological
relationships). Finally, the ontology describes any
constraints imposed, to ensure that only certain syntax
relations are allowed. For example, if the sea and sky touch (in 2D
images), due to perspective geometry, the border between
them is a (near) straight line of zero slope. Such syntax rela-
tions make up the grammatical rules for parsing. Hence, we
capture and formally record human-based knowledge about
entities and their parts, their attributes and relationships us-
ing an ontology.
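
The three ingredients described above (parts, pairwise relations, and constraints on those relations) can be sketched as a small data structure. This is only an illustrative sketch, not the representation used by the parser: the class `Ontology`, the method `allows`, the helper `is_near_horizontal`, the relation name "touches", and both tolerance values are assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Hypothetical container for one subdomain's entities and syntax rules."""
    parts: dict = field(default_factory=dict)        # entity -> set of part names
    relations: dict = field(default_factory=dict)    # (a, b) -> allowed relation names
    constraints: dict = field(default_factory=dict)  # (a, b, relation) -> border predicate

    def allows(self, a, b, relation, border):
        """True if `a relation b` is permitted and the border meets any constraint."""
        if relation not in self.relations.get((a, b), set()):
            return False
        check = self.constraints.get((a, b, relation))
        return check is None or check(border)


def is_near_horizontal(border, slope_tol=0.05, residual_tol=1.0):
    """Least-squares line fit: accept a near-straight border of near-zero slope.

    `border` is a non-empty list of (x, y) points; the tolerances are assumed values.
    """
    n = len(border)
    mean_x = sum(x for x, _ in border) / n
    mean_y = sum(y for _, y in border) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in border)
    if sxx == 0:  # all points share one x coordinate: a vertical border
        return False
    slope = sum((x - mean_x) * (y - mean_y) for x, y in border) / sxx
    # Largest vertical distance of any border point from the fitted line.
    worst = max(abs((y - mean_y) - slope * (x - mean_x)) for x, y in border)
    return abs(slope) <= slope_tol and worst <= residual_tol


def build_seashore_ontology():
    """Encode only the sky/sea examples given in the text."""
    return Ontology(
        parts={"sky": {"clouds", "clear sky"}},
        relations={("sky", "sea"): {"touches"}},
        constraints={("sky", "sea", "touches"): is_near_horizontal},
    )
```

Under these assumptions, a call such as `build_seashore_ontology().allows("sky", "sea", "touches", border)` accepts a sky/sea adjacency only when the shared border fits a straight, level line within the chosen tolerances.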
The proposed approach is summarized as follows:
given any input image (assumed to be in one of the
pre-established subdomains), the image is over-segmented to
yield a set of "superpixels", locally coherent groupings of
image pixels that preserve the structure necessary for image
parsing. Every entity in the image is therefore made up of
one or more superpixels or subpatterns. A graph is defined
over the superpixels and a Markov random field (MRF) is
imposed on the resulting irregular graph. Low-level seman-
tic features are extracted from the superpixels and these rep-
resent the pattern primitives that will initiate the parsing
process. Grammar production rules, in the form of ontological
rules, are used for syntactically reconstructing the scene
from the graph of superpixels. Each superpixel stochasti-
cally takes on different entity labels and the sequence of
labels that yields the best explanation for the image scene,
according to the syntax relations, represents the final labels
of the parsed image.
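
The labeling step can be illustrated with a toy example. The sketch below scores a label assignment over a superpixel graph as a sum of per-superpixel feature costs plus a penalty for each neighbouring pair that violates a syntax relation, and searches by random single-superpixel relabeling with greedy acceptance. That search is a simple stand-in chosen for brevity, not the optimization used by the parser. All names (`energy`, `parse`, `UNARY`, `EDGES`, `ALLOWED`, `VOCABULARY`) and all numeric costs are assumptions made for illustration.

```python
import random

VOCABULARY = ["sky", "sea"]

# Assumed feature costs for three superpixels stacked top to bottom:
# a lower cost means the superpixel's local features fit that label better.
UNARY = [
    {"sky": 0.1, "sea": 0.9},  # top
    {"sky": 0.2, "sea": 0.8},  # middle
    {"sky": 0.9, "sea": 0.1},  # bottom
]

# Graph edges as (upper, lower) superpixel indices.
EDGES = [(0, 1), (1, 2)]

# Assumed syntax rule: permitted (upper label, lower label) pairs.
# ("sea", "sky") is absent, so sea directly above sky is penalized.
ALLOWED = {("sky", "sky"), ("sky", "sea"), ("sea", "sea")}


def energy(labels, unary, edges, allowed, penalty=1.0):
    """Feature cost of every superpixel plus a penalty per violated edge."""
    cost = sum(unary[i][label] for i, label in enumerate(labels))
    for upper, lower in edges:
        if (labels[upper], labels[lower]) not in allowed:
            cost += penalty
    return cost


def parse(unary, edges, allowed, vocabulary, iterations=200, seed=0):
    """Randomly relabel one superpixel at a time, keeping strict improvements."""
    rng = random.Random(seed)
    labels = [vocabulary[-1]] * len(unary)  # arbitrary starting assignment
    best = energy(labels, unary, edges, allowed)
    for _ in range(iterations):
        i = rng.randrange(len(labels))
        candidate = list(labels)
        candidate[i] = rng.choice(vocabulary)
        cost = energy(candidate, unary, edges, allowed)
        if cost < best:
            labels, best = candidate, cost
    return labels, best


labels, cost = parse(UNARY, EDGES, ALLOWED, VOCABULARY)
```

Greedy acceptance can stall in a local minimum on larger graphs, which is one reason a practical parser would use a more careful stochastic search; the sketch is meant only to show how local feature costs and global syntax relations combine into a single score.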
We summarize our contributions as follows:
(i) where many approaches perform entity and scene
categorization in an ad hoc manner (categorizing unrelated
entities such as tigers, computer screens, grand pianos, trees,
snow, and cars in one algorithm), we present a formal
approach to modeling real-world entities and their re-
lationships, using the ontology of subdomains. The
advantage here is that our model can be extended uni-
versally to eventually encompass most real-world en-
tities;
(ii) by implementing our parser formally, we pre-define a
vocabulary of entities, their syntax relations and the
rules of production. Hence, given any unconstrained
41 978-1-4244-7030-3/10/$26.00 ©2010 IEEE