Sri Lanka Association for Artificial Intelligence (SLAAI)
Proceedings of the Second Annual Session, 19th February 2005 – Colombo

Towards Building a Cognitive Vision System for Learning in Behavioural Models from Symbolic Data using Qualitative Spatio-Temporal Relations

D. D. M. Ranasinghe
Department of Mathematics & Computer Science, The Open University of Sri Lanka, Nawala, Nugegoda, Sri Lanka
menaka_dul@yahoo.com

A. G. Cohn
School of Computing, University of Leeds, LS2 9JT, UK
agc@comp.leeds.ac.uk

A. S. Karunananda
Department of Information Systems and Computing, Brunel University, Uxbridge, Middlesex, UB8 3PH, UK
asoka.karunananda@brunel.ac.uk

Abstract

Research has been carried out to develop a cognitive vision computer system capable of perceiving, reasoning and learning through visual inputs. Our approach is based on the assumption that robust qualitative spatio-temporal relations extracted from visual data can be used to successfully implement cognitive vision systems that behave like humans, with reasoning and learning abilities. In our ongoing research, the task of placing covers on a dinner table has been analysed, and some basic robust qualitative spatio-temporal relations, such as rightof, leftof and frontof, have been identified. Prolog has been used to implement the analysis of spatio-temporal relations, while the extraction of relevant rules from these relations has been implemented with the help of Progol, a many-sorted inductive logic programming system that implements learning from examples. The final goal of this project is a computer-based cognitive vision system capable of learning from a more comprehensive set of examples and of carrying out complex reasoning on spatio-temporal visual data to learn behavioural models.

1.
Introduction

One of the fundamental abilities of human beings is to recognise, learn and conceptualise knowledge from visual inputs. We categorise the objects we happen to see in the environment into semantic groups and extract relevant knowledge about them. These semantic groups can be generated according to what knowledge we intend to build from the visual inputs. Another unique feature of humans is that, given the same visual input, the knowledge generated at different points in time can differ. This is because humans interpret what they see in the light of the diverse prior knowledge and experience they have of the world; this gives rise to a continuous process of incremental and adaptive learning. In addition, humans have the capability of applying robust concepts/rules that they have already acquired in order to learn and adapt to new situations. This whole process of cognition can be interpreted as the generation of knowledge on the basis of perception, reasoning, learning and prior models of things.

In this research we intend to adapt this fundamental human ability to learn from visual inputs in the process of building computer systems that behave/reason like humans. We argue that in an unfamiliar environment humans tend to abstract qualitative relations among the constituents of the environment, as these seem to be more robust. This is a key feature that can be exploited to build autonomous cognitive vision systems, and at present it has not been exploited to a great extent [9,12,13]. Therefore, in our research we exploit qualitative spatio-temporal relations in extracting robust concepts/rules. These learned spatio-temporal
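As a rough illustration of the kind of qualitative abstraction described above, the sketch below derives relations such as rightof, leftof and frontof from quantitative object positions. The predicate names come from our abstract, but the coordinate convention, thresholds and example table setting are illustrative assumptions only; the actual implementation in this work is written in Prolog, not Python.

```python
# Sketch: deriving qualitative spatial relations (rightof, leftof,
# frontof) from quantitative object positions on a table surface.
# Assumed (not from the paper): x grows to the viewer's right,
# y grows towards the viewer; positions are in arbitrary table units.

def rightof(a, b):
    """a is to the right of b, from the viewer's perspective."""
    return a["x"] > b["x"]

def leftof(a, b):
    """a is to the left of b, from the viewer's perspective."""
    return a["x"] < b["x"]

def frontof(a, b):
    """a is nearer the viewer (the table edge) than b."""
    return a["y"] > b["y"]

def relations(objects):
    """Enumerate every qualitative relation that holds between
    each ordered pair of distinct objects."""
    facts = []
    for name_a, a in objects.items():
        for name_b, b in objects.items():
            if name_a == name_b:
                continue
            for rel in (rightof, leftof, frontof):
                if rel(a, b):
                    facts.append((rel.__name__, name_a, name_b))
    return facts

# A hypothetical single cover: plate in the middle, knife to its
# right, fork to its left.
cover = {
    "plate": {"x": 0.0, "y": 0.0},
    "knife": {"x": 1.0, "y": 0.0},
    "fork":  {"x": -1.0, "y": 0.0},
}

for fact in relations(cover):
    print(fact)
```

Ground facts of this kind, expressed as Prolog atoms, are the sort of symbolic input from which an inductive logic programming system such as Progol can generalise rules about correct table settings.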