Automated closed-captioning of live TV broadcast news in French

Julie Brousseau, Jean-François Beaumont, Gilles Boulianne, Patrick Cardinal, Claude Chapdelaine, Michel Comeau, Frédéric Osterrath, Pierre Ouellet
Centre de recherche informatique de Montréal (CRIM)
Montréal, Canada
Julie.Brousseau@crim.ca

Abstract

This paper describes the system currently under development at CRIM whose aim is to provide real-time closed captioning of live TV broadcast news in Canadian French. The project is carried out in collaboration with TVA Network, a national TV broadcaster, and the RQST (a Québec association which promotes the use of subtitling). The automated closed-captioning system will use CRIM's transducer-based large-vocabulary French recognizer and will be fully integrated with the broadcaster's existing equipment and working methods. The first "on-air" use will take place in February 2004.

1. Introduction

In Canada, one person in ten (10%) suffers from a hearing impairment. In Québec, this problem affects more than 750,000 people [1]. While subtitling is increasingly available in English (about 90% of televised content), barely 60% of French-language broadcast news is subtitled. For live news broadcasting (live interviews or special reports) the percentage is lower still.

The restricted accessibility of information for French-speaking deaf and hearing-impaired viewers is in large part due to the current lack of technologies dedicated to the production of live French closed-captioning. At present, only one Canadian TV broadcaster is capable of generating real-time closed captions, using a system developed in the late '80s and based on stenography [2]. Subtitles are obtained by an experienced stenographer feeding an automatic computer-based transcription system. Reported performance was good enough at the time (5% WER) to use the system live.
While this approach offered a viable solution at the time, it is fast becoming obsolete for several reasons, foremost among them the absence of stenography teaching in Canada. Since the system requires operation by a trained professional, this situation will ultimately lead to a human-resources problem. Other important factors, such as the impossibility of updating internal models running on an obsolete development platform (DOS), also make this approach difficult to maintain.

The federal government agency that oversees Canadian televisual content (CRTC) is aware of the situation and has begun to act by compelling Canadian TV broadcasters to improve the quantity and quality of their closed-captioning, particularly in the area of live broadcasts.

In this context, a joint project involving TVA Network, the RQST and CRIM's speech recognition team started in April 2002. The aim of the project is to adapt CRIM's transducer-based large-vocabulary French speech recognizer to produce real-time subtitles for live TV broadcast news. The first "on-air" use of the system is planned for February 2004. Two major design features lead us to believe that adequate levels of performance will be achieved for commercial use. First, the system is based on a re-speak method [3]: the acoustic environment, widely variable in broadcast news, is ipso facto controlled. Second, the speech recognizer will have access to prior information such as the relevant news topic, which means that topic-dependent language models can be used.

The next section outlines the system design considerations taken into account during the various stages of development. We also report on phonetic phenomena particular to the re-speaking task (Section 3). Section 4 describes the acoustic and language models used for a preliminary evaluation.
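The paper does not specify how the topic prior is injected into the recognizer; one common way to realize topic-dependent language models is to interpolate a topic-specific n-gram model with a background model. The sketch below is a hypothetical illustration of that idea: the class name, the dictionary-based model representation, and the interpolation weight are all assumptions, not details from the paper.

```python
# Hypothetical sketch: linear interpolation of a topic-specific
# language model with a background model. All names and the weight
# below are illustrative assumptions.

class InterpolatedLM:
    """P(w | h) = lam * P_topic(w | h) + (1 - lam) * P_background(w | h)."""

    def __init__(self, topic_probs, background_probs, lam=0.7):
        self.topic = topic_probs            # dict: (history, word) -> prob
        self.background = background_probs  # dict: (history, word) -> prob
        self.lam = lam

    def prob(self, history, word):
        p_t = self.topic.get((history, word), 0.0)
        p_b = self.background.get((history, word), 0.0)
        return self.lam * p_t + (1.0 - self.lam) * p_b

# A "sports" topic model sharpens probabilities for on-topic words
# while the background model still covers general vocabulary.
topic = {(("le",), "match"): 0.2}
background = {(("le",), "match"): 0.01, (("le",), "gouvernement"): 0.05}
lm = InterpolatedLM(topic, background, lam=0.5)
```

Selecting the topic model from the newscast rundown before each story would give the recognizer the prior information the paper mentions.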
Finally, details on how performance is evaluated are given in Section 5, along with preliminary results.

2. System design

Since the proposed solution is to be used by the broadcaster's current closed-captioning staff, its integration with the broadcaster's existing equipment and working methods must be carefully thought out. Our primary aim is to maintain system flexibility and ease of use for the closed-captioning staff, allowing the system to adapt progressively and evolve with newscasting trends.

The automatic closed-captioning process is as follows: audio from the newscaster is fed to a re-speaker, who repeats the spoken content. The repeated audio is then sent over a computer network to the speech recognition system, which in turn produces transcriptions that are filtered and formatted before being fed to the broadcaster's closed-caption encoder.

The system is based on a client-server approach. While the server mainly provides transcriptions of spoken content, the client provides a logon procedure protecting the user interface and the audio sampling environment. To guarantee constant digitizing quality across variations in computer hardware, recordings are made with an industry-standard USB microphone. Ideal recording conditions are maintained by an automatic gain control.

After each login, users go through a session adaptation procedure that also validates recording quality. Users are prompted to repeat a set of pre-determined phonetically balanced sentences, from which an acoustic model likelihood is computed.

To keep the speech recognition system flexible and able to track evolving news coverage, users are allowed to enter out-of-vocabulary words in a controlled fashion. Both the interface, which provides a vocabulary update wizard, and the speech recognition server are designed to make use

EUROSPEECH 2003 - GENEVA 1245
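The paper states that an automatic gain control keeps recording conditions ideal but does not describe it. A minimal sketch of one standard AGC scheme, assuming block-wise processing of normalized float samples, is shown below; the target level and smoothing factor are illustrative assumptions.

```python
import math

# Hypothetical sketch of an automatic gain control step: scale each
# audio block so its RMS level approaches a target. TARGET_RMS and
# ALPHA are illustrative assumptions, not values from the paper.

TARGET_RMS = 0.1   # desired RMS for normalized [-1, 1] samples
ALPHA = 0.9        # smoothing factor to avoid abrupt gain jumps

def rms(samples):
    """Root-mean-square level of one audio block."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def agc(blocks, gain=1.0):
    """Yield gain-corrected audio blocks (lists of float samples)."""
    for block in blocks:
        level = rms(block)
        if level > 0:
            desired = TARGET_RMS / level
            # Move the gain gradually toward the level that would
            # bring this block exactly to TARGET_RMS.
            gain = ALPHA * gain + (1 - ALPHA) * desired
        yield [s * gain for s in block]
```

The smoothing keeps the gain from pumping on short pauses, which matters for a re-speaker working in long continuous takes.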
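The session adaptation step computes an acoustic model likelihood from the repeated calibration sentences; the paper does not say how that value is used to validate recording quality. One plausible reading, sketched below under assumed names and an assumed threshold, is to average the per-frame acoustic log-likelihoods and flag the session if the average falls below a floor.

```python
# Hypothetical sketch of recording-quality validation from the session
# adaptation step. The threshold and function names are illustrative
# assumptions; the paper only says a likelihood is computed.

LIKELIHOOD_FLOOR = -75.0  # assumed per-frame log-likelihood threshold

def session_ok(frame_loglikes):
    """frame_loglikes: per-frame acoustic log-likelihoods obtained by
    aligning the phonetically balanced calibration sentences against
    the acoustic model. Returns True if the setup looks usable."""
    avg = sum(frame_loglikes) / len(frame_loglikes)
    return avg >= LIKELIHOOD_FLOOR

# A misconfigured microphone or wrong gain typically drags the
# average log-likelihood down, so the session would be flagged
# before going on air, e.g. session_ok([-90.0, -85.0, -95.0]).
```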
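The controlled entry of out-of-vocabulary words through the vocabulary update wizard can be pictured as adding lexicon entries that are only accepted when their pronunciation uses phones the recognizer knows. The phone inventory, lexicon structure, and function below are illustrative assumptions, not the actual wizard's interface.

```python
# Hypothetical sketch of a controlled vocabulary update: a new word is
# accepted only if its pronunciation uses phones from the recognizer's
# inventory. The phone set and lexicon layout are illustrative.

PHONES = {"a", "b", "d", "e", "f", "i", "k", "l", "m", "n",
          "o", "p", "r", "s", "t", "u", "v", "z"}

lexicon = {}  # word -> list of pronunciations (tuples of phones)

def add_word(word, pronunciation):
    """Add one pronunciation for `word`, given as space-separated phones."""
    phones = tuple(pronunciation.split())
    if not phones or any(p not in PHONES for p in phones):
        raise ValueError(f"unknown phone in pronunciation for {word!r}")
    lexicon.setdefault(word.lower(), []).append(phones)

# A newly coined proper noun can be added before the newscast:
add_word("tva", "t e v e a")
```

Rejecting malformed pronunciations at entry time is what keeps the update "controlled": the recognition server only ever sees entries it can compile into its search network.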