November 14, 2006 18:18 WSPC/Trim Size: 9.75in x 6.5in for Review Volume Ch13 SpokenDocRet

CHAPTER 13

SPOKEN DOCUMENT RETRIEVAL AND SUMMARIZATION

Berlin Chen,† Hsin-Min Wang‡ and Lin-Shan Lee§
† National Taiwan Normal University, Taipei
‡ Academia Sinica, Taipei
§ National Taiwan University, Taipei
E-mail: berlin@csie.ntnu.edu.tw, whm@iis.sinica.edu.tw, lslee@gate.sinica.edu.tw

Huge, continually increasing quantities of multimedia content including speech information are filling up our computers, networks and lives. It is obvious that speech is one of the most important sources of information for multimedia content, as it is the speech of the content that tells us of the subjects, topics and concepts. As a result, the associated spoken documents of the multimedia content will be key for content retrieval and browsing. Substantial efforts along with very encouraging results for spoken document transcription, retrieval, and summarization have been reported. This chapter presents a concise yet comprehensive overview of information retrieval and automatic summarization technologies that have been developed in recent years for efficient spoken document retrieval and browsing applications. An example prototype system for voice retrieval of Chinese broadcast news collected in Taiwan will be introduced as well.

1. Introduction

Speech is the primary and most convenient means of communication between humans.[1] In the future of networks, digital content over the network will include all the information relating to our daily life activities, from real-time information to knowledge archives, from work environments to private services. Naturally, the most attractive form of content is multimedia, including speech which carries the information that tells us of the subjects, topics and concepts of the multimedia content. As a result, the spoken documents associated with the network content will be key in retrieval and browsing activities.[2]

At the same time, the rapid development of network and wireless technologies is making it possible for people to access network content not only from offices and homes, but from anywhere, at any time with the use of small, hand-held devices such as personal digital assistants (PDAs) and cell phones. Today, our access to the network is primarily text-based. Users need to enter instructions by keying in words or texts, and the network or search engine in turn offers text materials for the user to select. These users therefore interact with the network or search engine and