A Vision-Based Architecture for Intent Recognition Alireza Tavakkoli, Richard Kelley, Christopher King, Mircea Nicolescu, Monica Nicolescu, and George Bebis Department of Computer Science and Engineering University of Nevada, Reno, USA {tavakkol, rkelley, cjking, mircea, monica, bebis}@cse.unr.edu Abstract. Understanding intent is an important aspect of communica- tion among people and is an essential component of the human cognitive system. This capability is particularly relevant for situations that involve collaboration among multiple agents or detection of situations that can pose a particular threat. We propose an approach that allows a physical robot to detect the intentions of others based on experience acquired through its own sensory-motor abilities. It uses this experience while taking the perspective of the agent whose intent should be recognized. The robot’s capability to observe and analyze the current scene employs a novel vision-based technique for target detection and tracking, using a non-parametric recursive modeling approach. Our intent recognition method uses a novel formulation of Hidden Markov Models (HMM’s) de- signed to model a robot’s experience and its interaction with the world while performing various actions. 1 Introduction The ability to understand the intent of others is critical for the success of com- munication and collaboration between people. The general principle of under- standing intentions that we propose in this work is inspired from psychological evidence of a Theory of Mind [1], which states that people have a mechanism for representing, predicting and interpreting each other’s actions. This mechanism, based on taking the perspective of others [2], gives people the ability to infer the intentions and goals that underlie action [3]. We base our work on these ﬁndings and we take an approach that uses the observer’s own learned experience to detect the intentions of the agent or agents it observes. When matched with our own past experiences, these sensory observations become indicative of what our intentions would be in the same situation. The proposed system models the interactions with the world, acquired from visual information. This information is used in a novel formulation of Hidden Markov Models (HMMs) adapted to suit our needs. The distinguishing feature in our HMMs is that they model not only transitions between discrete states, but also the way in which the parameters encoding the goals of an activity change