Natural Language Understanding by Combining Statistical Methods and Extended Context-Free Grammars

Stefan Schwärzler, Joachim Schenk, Frank Wallhoff, and Günther Ruske
Institute for Human-Machine Communication, Technische Universität München, 80290 Munich, Germany
{sts,joa,waf,rus}@mmk.ei.tum.de

Abstract. This paper introduces a novel framework for speech understanding using extended context-free grammars (ECFGs) by combining statistical methods and rule-based knowledge. Using only first-level labels considerably reduces the annotation effort. We derive hierarchical non-deterministic automata from the ECFGs, which are transformed into transition networks (TNs) representing all kinds of labels. A sequence of recognized words is hierarchically decoded with a Viterbi algorithm. In experiments, the difference between a hand-labeled treebank annotation and our approach is evaluated. The conducted experiments show the superiority of our proposed framework. Compared to a hand-labeled baseline system (= 100%), we achieve a 95.4% acceptance rate for complete sentences and 97.8% for words. This corresponds to an accuracy rate of 95.1%, an error rate of 4.9%, and an F1-measure of 95.6% on a corpus of 1,300 sentences.

1 Introduction

In this paper, we address the problem of developing a simple, yet powerful speech understanding system based on manually derived domain-specific grammars. Contrary to existing grammar-based speech understanding systems (e.g., the Nuance Toolkit platform), not only are grammar decisions included, but information from grammar and word-label connections is combined as well and decoded by a Viterbi algorithm.
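To illustrate the kind of Viterbi decoding referred to above, the following is a minimal toy sketch, not the paper's model: words are treated as emissions of hypothetical semantic "concept" states in a small HMM, and the most likely concept sequence is recovered by dynamic programming. All state names and probabilities here are invented for illustration.

```python
import math

# Hypothetical concept labels and toy HMM parameters (illustration only).
states = ["GREETING", "DATE", "TIME"]
start_p = {"GREETING": 0.6, "DATE": 0.3, "TIME": 0.1}
trans_p = {
    "GREETING": {"GREETING": 0.2, "DATE": 0.5, "TIME": 0.3},
    "DATE":     {"GREETING": 0.1, "DATE": 0.5, "TIME": 0.4},
    "TIME":     {"GREETING": 0.1, "DATE": 0.3, "TIME": 0.6},
}
emit_p = {
    "GREETING": {"hello": 0.8, "monday": 0.1, "noon": 0.1},
    "DATE":     {"hello": 0.1, "monday": 0.8, "noon": 0.1},
    "TIME":     {"hello": 0.1, "monday": 0.1, "noon": 0.8},
}

def viterbi(words):
    # delta[s]: best log-probability of any state path ending in state s.
    delta = {s: math.log(start_p[s]) + math.log(emit_p[s][words[0]])
             for s in states}
    backptrs = []
    for w in words[1:]:
        new_delta, ptr = {}, {}
        for s in states:
            # Best predecessor state for s at this position.
            best = max(states, key=lambda p: delta[p] + math.log(trans_p[p][s]))
            new_delta[s] = (delta[best] + math.log(trans_p[best][s])
                            + math.log(emit_p[s][w]))
            ptr[s] = best
        delta = new_delta
        backptrs.append(ptr)
    # Backtrace from the best final state through the stored pointers.
    state = max(states, key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return list(reversed(path))

print(viterbi(["hello", "monday", "noon"]))
# -> ['GREETING', 'DATE', 'TIME']
```

The paper's decoder works analogously but over hierarchical transition networks derived from ECFGs rather than a flat state set.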
A two-pass approach is often adopted, in which a domain-specific language model is constructed and used for speech recognition in the first pass, and an understanding model obtained with various learning algorithms is applied in the second pass to "understand" the output of the speech recognizer. Another approach handles recognition and understanding at the same time [?,2]. For this purpose, so-called "concepts" are defined, which represent a piece of information on the lexical as well as on the semantic level. In this way, all the statistical methods from speech recognition can be utilized at all hierarchy levels; this concerns especially the stochastic modeling offered by Hidden Markov Models. The hierarchies consist of transition networks (TNs) whose nodes either represent terminal symbols or refer to other TNs [3]. This approach uses

Note: Both authors contributed equally to this paper.
G. Rigoll (Ed.): DAGM 2008, LNCS 5096, pp. 254–263, 2008. © Springer-Verlag Berlin Heidelberg 2008