Shallow Semantic Analysis of ASR Transcripts Associated with Video Shots*

Manuel Alcántara Plá & Thierry Declerck
DFKI GmbH
{manuel.alcantara, declerck}@dfki.de

Abstract

We investigate the role that semantic analysis can play in the structuring of transcriptions automatically taken from video recordings. The main problems that arise when working with this kind of corpus are reported, and semantic analyses focused on nouns and verbs are proposed to solve them.

1 Introduction

We investigate the role that shallow semantic analysis can play in the analysis and structuring of ASR Video Transcripts (AVT) in relation to the video content. We have used the TRECVid 2002 corpus, which is domain-general and includes many different speakers. The AVT were automatically generated from the audio files and annotated in MPEG-7 XML.

The use of AVT is intended to help provide semantic metadata for several tasks, most of them related to the need to identify the content of videos [7, 8]. Some of these tasks, such as video content abstraction or video segmentation, are burning topics in the field of multimedia corpora, and low-level visual/audio features do not seem to yield results accurate enough to bridge the semantic gap between systems and users [4].

We describe in section 2 the most salient problems in analyzing AVT and the special characteristics of the corpus we are working with; in sections 3 and 4 we propose methods to extract as much information from the AVT as possible through the semantic analysis of nouns and verbs. Since

* Research supported by the European Commission, contract FP6-027026, Knowledge Space of semantic inference for automatic annotation and retrieval of multimedia content (K-Space), and partially funded by a MEC/Fulbright grant of the Spanish Government.