Extended Hidden Vector State Parser

Jan Švec¹ and Filip Jurčíček²

¹ Center of Applied Cybernetics, Department of Cybernetics, Faculty of Applied Sciences, University of West Bohemia, Pilsen, 306 14, Czech Republic
honzas@kky.zcu.cz
² Cambridge University Engineering Department, Cambridge CB2 1PZ, United Kingdom
fj228@cam.ac.uk

Abstract. A key component of a spoken dialogue system is the spoken language understanding module. There are many approaches to the design of the understanding module, and one of the most promising is statistical semantic parsing. This paper presents a combination of three modifications of the hidden vector state (HVS) parser, a very popular method for statistical semantic parsing, and shows that these changes are almost independent. The proposed changes to the HVS parser form the extended hidden vector state (EHVS) parser. Parsing performance increases from 47.7% to 63.1%, measured as the exact match between the reference and hypothesis semantic trees on the Human-Human Train Timetable corpus. Despite the increased performance, the complexity of the EHVS parser increases only linearly. The EHVS parser therefore preserves the simplicity and robustness of the baseline HVS parser.

1 Introduction

The goal of this paper is to briefly describe a set of modifications of the hidden vector state (HVS) parser and to show that these changes are almost independent. Each of the described modifications, used alone, significantly improves parsing performance. The idea is to incorporate these modifications into a single statistical model, and we expect the combined model to yield even better results. The HVS parser consists of two statistical models: the semantic model and the lexical model (see below). In the following sections we describe three techniques that improve the performance of the parser by modifying these models.
First, we use a data-driven initialization of the lexical model of the HVS parser based on negative examples, which are collected automatically from the semantic corpus. Second, we address the inability of the HVS parser to process left-branching language structures. The baseline HVS parser pushes concepts implicitly during state transitions, which limits the class of generated semantic trees to right-branching trees only. To overcome this constraint, we introduce an explicit push operation into the semantic model, which extends the class of parseable trees to left-branching trees, right-branching trees, and their combinations.

V. Matoušek and P. Mautner (Eds.): TSD 2009, LNAI 5729, pp. 403–410, 2009. © Springer-Verlag Berlin Heidelberg 2009