Weakly Interacting Object Tracking in Indoor Environments

Kao-Wei Wan† and Chieh-Chih Wang†‡
†Department of Computer Science and Information Engineering
‡Graduate Institute of Networking and Multimedia
National Taiwan University, Taipei, Taiwan
Email: leo@robotics.csie.ntu.edu.tw, bobwang@ntu.edu.tw

Tu T. Ton
National Services, Transport Planning
Parsons Brinckerhoff Australia Pty Limited
Sydney, NSW, Australia
Email: tton@pb.com.au

Abstract: Interactions between targets have been exploited to solve the occlusion problem in multitarget tracking, but not to provide higher level scene understanding. As indoor environments are less constrained than urban areas, interactions in indoor environments are weaker and more varied. Weak interactions make scene interaction modeling and neighboring object interaction modeling challenging. In this paper, a place-driven scene interaction model is proposed to represent long-term interactions in indoor environments. To deal with complicated short-term interactions, the neighboring object interaction model consists of three short-term interaction models: following, approaching and avoidance. The moving model, the stationary process model and these two interaction models are integrated to accomplish weakly interacting object tracking. In addition, higher level scene understanding such as unusual activity recognition and important place identification is accomplished straightforwardly. The experimental results using data from a laser scanner demonstrate the feasibility and robustness of the proposed approaches.

I. INTRODUCTION

Multiple moving object tracking, or multitarget tracking, is a key prerequisite for automating many useful robotics applications. Classical approaches such as the multiple hypothesis tracking (MHT) algorithm [1] and the joint probabilistic data association (JPDA) approach [2] have been applied extensively.
However, only a few works have addressed the observation and motion modeling of interactions among the tracked objects and the scene. Khan et al. [3] proposed a Markov chain Monte Carlo (MCMC)-based particle filter to track interacting ants, in which interactions are modeled through a Markov random field motion prior. Their interaction potential is based only on static poses, which cannot provide higher level scene understanding. Smith et al. [4] adopted a simple interaction model to penalize object overlapping. Sullivan and Carlsson [5] proposed constructing an interaction graph and then applying a two-stage clustering scheme to label the identity of the target. Instead of modeling or understanding interactions explicitly, these studies use the term "interaction" to describe situations in which the target and adjacent objects share common measurements and cannot be correctly labeled. In these existing approaches, interactions represent negative information.

Wang et al. [6] proposed a variable structure multiple model (VSMM) estimation framework [7] with a scene interaction model and a neighboring object interaction model to perform multiple interacting object tracking in urban areas using a laser scanner. In this framework, interactions provide positive information. The scene interaction model and the neighboring object interaction model take into account, respectively, the long-term and short-term interactions between the tracked object and its surroundings. This approach not only solves the data association problem but also provides higher level scene understanding.

As moving objects in urban areas obey strict traffic rules, the interactions in urban areas are stronger than those in indoor environments. Weaker interactions make scene interaction modeling and neighboring object interaction modeling more challenging, as objects have more freedom to move and the interactions can take more varied forms.
In this paper, we propose to accomplish weakly interacting object tracking by exploiting a place-driven scene interaction model and a neighboring object interaction model consisting of three short-term interaction models. The basic maneuver model, the stationary process model and these two interaction models are fused via a digraph switching algorithm in the VSMM estimation framework. In addition, higher level scene understanding such as unusual activity recognition and important place identification is accomplished straightforwardly through the proposed interacting object tracking framework. The performance of the proposed approaches is evaluated with manually labeled ground truth data.

The remainder of the paper is organized as follows. Section II reviews the VSMM estimation framework and describes our approach to integrating the basic maneuver and interaction models. The scene interaction model and the neighboring object interaction model are described in Sections III and IV, respectively. In Section V, we demonstrate that the proposed approaches are able to solve the difficult occlusion problem and to provide higher level scene understanding. The experimental results and performance evaluation are given in Section VI. Finally, conclusions and future work are given in Section VII.

II. VARIABLE-STRUCTURE MULTIPLE MODEL ESTIMATION

In this section, we briefly review the theoretical foundations of the variable-structure multiple model (VSMM) estimation framework, and describe in detail our approach to integrating the moving models, the stationary model, the scene interaction model and the neighboring object interaction model.
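To make the idea of a variable model set concrete, the following minimal Python sketch shows a generic multiple-model probability update in which a digraph restricts which models are active at each step. It is an illustration only, not the algorithm of this paper or of [7]: the model names, the `ADJACENCY` digraph, the `stay` self-transition weight and the activation `threshold` are all assumptions chosen for the example, and the per-model measurement likelihoods are taken as given inputs rather than computed from filters.

```python
# Hypothetical model names, one per model family mentioned in the text.
MODELS = ["moving", "stationary", "scene_interaction", "neighbor_interaction"]

# Hypothetical digraph: ADJACENCY[m] lists the models reachable from model m
# in one time step (every model can also remain itself).
ADJACENCY = {
    "moving": ["moving", "stationary", "scene_interaction", "neighbor_interaction"],
    "stationary": ["stationary", "moving"],
    "scene_interaction": ["scene_interaction", "moving", "stationary"],
    "neighbor_interaction": ["neighbor_interaction", "moving"],
}


def active_model_set(prev_probs, threshold=0.05):
    """Union of the successors of every model whose probability is at least threshold."""
    active = set()
    for name, prob in prev_probs.items():
        if prob >= threshold:
            active.update(ADJACENCY[name])
    return sorted(active, key=MODELS.index)


def update_model_probs(prev_probs, likelihoods, stay=0.8, threshold=0.05):
    """One model-probability update restricted to the active model set.

    prev_probs:  dict mapping model name to its previous probability
    likelihoods: dict mapping model name to its measurement likelihood
    stay:        assumed weight for remaining in the same model; the rest
                 is split evenly among the other successors in the digraph
    """
    active = active_model_set(prev_probs, threshold)
    predicted = {name: 0.0 for name in active}

    # Propagate probability mass along the digraph edges.
    for src, prob in prev_probs.items():
        successors = ADJACENCY[src]
        for dst in successors:
            if dst not in predicted:
                continue  # destination was not activated; its mass is dropped
            if dst == src:
                weight = stay
            else:
                weight = (1.0 - stay) / (len(successors) - 1)
            predicted[dst] += prob * weight

    # Weight by the measurement likelihoods and renormalize.
    unnormalized = {name: predicted[name] * likelihoods[name] for name in active}
    total = sum(unnormalized.values())
    if total <= 0.0:
        raise ValueError("no active model explains the measurement")
    return {name: value / total for name, value in unnormalized.items()}


if __name__ == "__main__":
    prev = {"stationary": 1.0}
    lik = {name: 0.5 for name in MODELS}
    print(update_model_probs(prev, lik))
```

In this sketch, an object that was certainly stationary activates only the stationary and moving models, so the interaction models are neither evaluated nor assigned probability until the digraph makes them reachable.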