Multimedia Tools and Applications, 25, 59–83, 2005
© 2005 Springer Science + Business Media, Inc. Manufactured in The Netherlands.

Generating Semantic Descriptions of Broadcasted Sports Videos Based on Structures of Sports Games and TV Programs

NAOKO NITTA naoko@comm.eng.osaka-u.ac.jp
NOBORU BABAGUCHI babaguchi@comm.eng.osaka-u.ac.jp
Department of Communication Engineering, Osaka University, 2-1 Yamada-Oka, Suita, Osaka 565-0871, Japan

TADAHIRO KITAHASHI kt@ksc.kwansei.ac.jp
Department of Informatics, Kwansei Gakuin University, 2-1 Gakuen, Sanda, Hyogo 669-1337, Japan

Abstract. This paper presents a model for representing a broadcasted sports video semantically and proposes a method for automatically generating semantic descriptions of significant scenes. A representation of a video should convey its semantic content as accurately as possible. Our model structures the video and specifies suitable semantic descriptions for video segments, paying attention to the structure of both the sports game and the sports TV program. To obtain the elements of these semantic descriptions, the proposed method extracts information about the plays and their related players from the closed-caption stream by searching for key phrases. These textual descriptions are then attached to the proper portions of the video by finding the corresponding video segments through template matching on the image stream. We discuss experimental results of our method and the potential for integrating these results into the standardized MPEG-7 description tools.

Keywords: video content analysis, broadcasted sports video, closed-caption, intermodal collaboration, MPEG-7

1. Introduction

The continuous increase in the amount of multimedia data has created a strong need for a novel framework of simple but meaningful representation that enables efficient multimedia applications.
As a tool to realize this representation, MPEG-7, formally known as the Multimedia Content Description Interface, became an international standard for describing multimedia data in 2001. MPEG-7 allows descriptions of audio-visual content at different perceptual and semantic levels, but does not specify how to obtain these descriptions or what kinds of descriptions are needed for a specific task. Therefore, aiming at effective content-based retrieval and summarization systems, we propose a way to represent a broadcasted sports video (hereafter simply called a sports video) clearly and concisely, and present a method for automatically generating the semantic descriptions necessary for the proposed representation.

Videos have typical structures depending on their genres. For instance, a news video can be considered as a sequence of units, each of which starts with an image frame presenting the anchor person, followed by a variety of news. A drama video can be considered as an assembly of semantically interrelated scenes with in-between Commercial Message (CM)