Mining Video Editing Rules in Video Streams

Yuya Matsuo
Department of Computer and System Engineering, Kobe University
Nada, Kobe 657-8501, Japan
yuya@ai.cs.scitec.kobe-u.ac.jp

Miki Amano
Department of Computer and System Engineering, Kobe University
Nada, Kobe 657-8501, Japan
miki@ai.cs.scitec.kobe-u.ac.jp

Kuniaki Uehara
Department of Computer and System Engineering, Kobe University
Nada, Kobe 657-8501, Japan
uehara@ai.cs.scitec.kobe-u.ac.jp

ABSTRACT
Data mining is a technique for discovering useful patterns, or patterns of special interest, as explicit knowledge from a vast quantity of data. Video editing involves many editing patterns, and different patterns, chosen according to the editor's preferences, achieve different effects. Discovering these patterns is useful because it reveals each editor's skills and allows them to be reused when editing new video material. In this paper, we propose methods for extracting editing rules from video streams by means of data mining techniques. New video material can then be edited by applying the extracted rules, and the edited video may achieve the same quality as the video from which the patterns were extracted.

Keywords
Data Mining, Video Grammar, Video Material, Video Editing, Shot Size.

1. INTRODUCTION
A huge amount of multimedia information, including video, is becoming prevalent as a result of advances in multimedia computing technologies and high-speed networks. Owing to its high information and entertainment value, video is rapidly becoming one of the most popular media.

A video editor connects fragments of video material so that they carry a certain meaning. As the video material itself has no meaning, editing is necessary to make the video more meaningful and attractive. However, there are various ways of joining the fragments. Moreover, for an edited video to convey the editor’s intention precisely to a viewer, it must obey some universal rules.
We call these rules “video grammar”. Professional video editors, such as broadcasting station staff, use these rules. In documentary films, variety shows, and other TV programs, the number of editing rules is limited. However, depending on the editor’s preferences, the edited video produces a different effect even when the same video material is edited.

We may be able to find patterns particular to each video type. For example, in a scene where two speakers A and B are talking, a shot containing both A and B often appears every three shots to show who is talking to whom. Such periodicity can be discovered. Apart from periodic patterns, there may also be continuous patterns that extend over multiple cuts. We try to extract such editing rules particular to each type of video content. To extract these rules, metadata such as shot size and camera work has to be extracted from the video and indexed to it. Until now, this extraction of editing rules has been done manually. In this paper, we propose methods for automatically extracting the editing rules specific to each video by using data mining.

2. VIDEO GRAMMAR
The video grammar is a set of rules that define how shots are connected. The rules are described in the same manner as a conventional sentence grammar. The basic element is a group of shots to which the video grammar is applied.

We first explain the definition of a shot. A cut is defined as a physically continuous section that begins when the camera starts recording and ends when it stops. A shot, on the other hand, is defined as a logically continuous section within a cut over which the shot size or camera work is uniquely defined, as shown in Fig. 1. Therefore, one cut includes one or more shots.

Figure 1: An example of a cut including three shots.

The shot size is selected according to the distance from the camera to the objects.
The shot size is classified into loose shot (LS), medium shot (MS) and tight shot (TS), as shown in Fig. 2. Compared with MS, TS is taken closer to the object and LS is taken farther from it. A full shot is a shot in which all the objects are included, and it is used as a master shot in the editing process. The following video grammar rule concerns these shot sizes:

Rule (1): Two shots whose shot sizes are extremely different, such as TS and LS, cannot be connected to each other.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Multimedia’02, December 1-6, 2002, Juan-les-Pins, France. Copyright 2002 ACM 1-58113-620-X/02/0012…$5.00.
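
As an illustration of Rule (1) in Section 2, the following minimal Python sketch checks a sequence of shot sizes for forbidden connections. It is not the implementation used in this paper. The names ShotSize, MAX_STEP, violates_rule_1 and find_violations are hypothetical, and the sketch assumes that "extremely different" means more than one step apart on the scale LS, MS, TS, which matches the TS and LS example given with the rule.

```python
from enum import IntEnum


class ShotSize(IntEnum):
    """Shot sizes ordered from farthest to closest camera distance."""
    LS = 0  # loose shot
    MS = 1  # medium shot
    TS = 2  # tight shot


# Assumption: sizes more than one step apart count as "extremely different".
MAX_STEP = 1


def violates_rule_1(first: ShotSize, second: ShotSize) -> bool:
    """Return True if connecting the two shots would break Rule (1)."""
    return abs(first - second) > MAX_STEP


def find_violations(sequence: list[ShotSize]) -> list[int]:
    """Return the index i of every connection (i, i + 1) that breaks Rule (1)."""
    return [
        i
        for i, (a, b) in enumerate(zip(sequence, sequence[1:]))
        if violates_rule_1(a, b)
    ]


sequence = [ShotSize.LS, ShotSize.MS, ShotSize.TS, ShotSize.LS]
print(find_violations(sequence))  # [2]: the TS -> LS connection
```

Encoding the shot sizes as an ordered enumeration keeps the check to a single comparison. A different notion of "extremely different" could be tried by changing MAX_STEP.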