11. Cognitively Motivated Novelty Detection in Video Data Streams

James M. Kang, Muhammad Aurangzeb Ahmad, Ankur Teredesai, and Roger Gaborski

Summary. Automatically detecting novel events in video data streams is an extremely challenging task. In recent years, machine-based parametric learning systems have been quite successful in exhaustively capturing novelty in video, provided the novelty filters are well defined in constrained environments. Some important questions, however, remain: How close are such systems to human perception? Can results derived from comparing human perception with machine novelty help tasks such as storing (indexing) and retrieving novel events in large video repositories? This chapter presents a quantitative experimental evaluation of human-based vs. machine-based novelty systems. A machine-based system for detecting novel events in video data streams is first described. The issues of designing an indexing strategy or “Manga” (a comic-book representation is termed “manga” in Japanese) to effectively determine the “most-representative” novel frames for a video sequence are then discussed. The evaluation of human-based vs. machine-based novelty is quantified by metrics based on the location of novel events, the number of novel events, etc. Machine-based novelty detection uses only low-level image features, without semantic processing such as object detection, to keep the computational load to a minimum.

11.1 Introduction

Extracting novelty from video streams is gaining attention because large amounts of video are readily available and the means of automatically extracting important details from such media remain insufficient. Different ways to summarize video based on novel or important aspects of the video are being explored by a wide range of industries [9, 17, 24].
Businesses that use video conferencing are interested in ways to capture important sections of meetings and make an outline of each meeting available for future reference. Likewise, security and surveillance industries are looking for ways to detect novel events in huge streams of seemingly unimportant video data. We explore ways to generate a cluster index of video frames, based on image features within the frames. Human novelty detection is then compared against a machine-based novelty detection technique. An example of such a comparison is shown in Figure 11.1. The frames in the figure are the “representative novel frames” of a cluster found for both the human and the machine. Differences and similarities between