Relational Graph Mining for Learning Events from Video

Muralikrishna Sridhar and Anthony G Cohn and David C Hogg¹

Abstract. In this work, we represent complex video activities as one large activity graph and propose a constraint-based graph mining technique to discover a partonomy of classes of subgraphs corresponding to event classes. Events are defined as subgraphs of the activity graph that represent what we regard as interesting interactions, that is, subgraphs in which all objects are actively engaged and which occur frequently in the activity graph. Subgraphs with these two properties are mined using a level-wise algorithm, and then partitioned into equivalence classes which we regard as event classes. Moreover, a taxonomy of these event classes naturally emerges from the level-wise mining procedure. Experimental results in an aircraft turnaround apron scenario show that the proposed technique has considerable potential for characterizing and mining events from video.

1 Introduction

An important problem in computer vision is to learn a high level understanding of complex activities from videos, starting with low level visual analysis. Such an understanding involves learning the events which are the natural building blocks of activities, and also their structural partonomic relationships. Complex activities are usually composed of multiple events that may occur in parallel, and overlapping events may share participating objects. Complex activities also contain spurious and missing objects and spatial relationships, arising either from instability in image processing or from coincidental occurrences. We address the problem of unsupervised discovery of an event partonomy from such complex video scenes.

An important problem in graph mining is to mine interesting subgraphs from a graph database or a single graph.
Several techniques [2] have been developed to mine subgraphs that are interesting either because of their frequency or because they satisfy certain constraints. In this work, we represent activities as a single large activity graph. The key hypothesis is that events (in contrast to noise and coincidental occurrences) correspond to interesting subgraphs of this activity graph, which we therefore call event graphs.

Our earlier work [9] introduced a relational qualitative spatio-temporal representation called an activity graph to represent interactions between all objects in a scene. Two measures of interestingness, frequency and a manually defined focus mechanism, were used to drive the mining process for discovering event graphs. We have very recently improved the representation in [11] with a more robust variable-free activity graph and a generic focus mechanism called interactivity, both of which we adopt in this work. In [11], we focussed on learning the most probable interpretation of a video using a generative model. In this work, we adopt a complementary graph mining approach of learning an event partonomy by characterizing events as sufficiently frequent and interactive subgraphs of the activity graph. The underlying hypothesis is that non-events, which may be observation noise or coincidences, do not tend to possess these systematic properties. This hypothesis is validated on a large video data set capturing activities on an aircraft apron.

Figure 1. Aircraft handling scenario. The highlighted ellipses show some groups of interacting objects. (Figure labels: Trolley, Loader & Plane; Plane Puller & Plane; Bridge & Plane; Interactions.)

¹ University of Leeds, UK, {krishna,agc,dch}@comp.leeds.ac.uk. This work is supported by the EPSRC (EP/D061334/1) and the EU FP7 (Project 214975, Co-Friend). We also thank colleagues in the Co-Friend project.

This paper presents a more formal treatment of the graph mining technique that was very briefly introduced in our recent short paper [10], which also introduces an HMM for robustly computing the activity graph. This paper also formalizes the interactivity measure in terms of graphs; it was originally formulated in terms of tracks [11].

2 Related Work

Much previous work on event analysis represents activities as propositional sequences rather than in a more expressive relational form such as logic or graphs. Sequential representations have been used for unsupervised learning of events using standard frameworks such as pattern recognition techniques [14], graphical models [13] and grammars [7]. However, activities that are composed of events happening in parallel or with shared objects are challenging to mine with sequence-based representations [3], or even with those that use logical sequences [1], since sequences do not form a natural representation for such parallel, overlapping activities.

These problems are addressed in [9], where we introduced a relational graph-based representation of activities for representing interactions between objects. This representation has been further modified in [11] into a more robust variable-free activity graph to which graph mining frameworks can be directly applied. In [11] we also introduced a generic focus mechanism called interactivity. We adopt both in this work. The following paragraphs provide an overview of graph mining approaches related to this work.

Much of the initial work on graph-based learning [15] focussed on frequent subgraphs since the isomorphism of graphs is combinatorially expensive [6]. Despite this restriction, many solutions that efficiently search the space of candidate frequent graphs have been