A Machine Learning Approach to Speech Act Classification Using Function Words

James O’Shea, Zuhair Bandar and Keeley Crockett
Department of Computing and Mathematics
Manchester Metropolitan University
United Kingdom
{z.bandar, k.crockett, j.d.oshea}@mmu.ac.uk

Abstract. This paper presents a novel technique for the classification of sentences as Dialogue Acts, based on structural information contained in function words. It focuses on classifying sentences as questions or non-questions, a generally useful task in agent-based systems. The proposed technique extracts salient features by replacing each function word with a numeric token and each content word with a standard numeric wildcard token. The Decision Tree, a well-established classification technique, has been chosen for this work. Experiments provide evidence of potential for highly effective classification, with a significant achievement on a challenging dataset, before any optimisation of feature extraction has taken place.

Keywords: Dialogue Act, Speech Act, Classification, Semantic Similarity, Decision Tree.

Introduction

Dialogue Act (DA) classification is an established element of research in the field of Dialogue Management [1-6]. This work is motivated by the application of DA classification to natural language interaction with Dialogue Systems (DSs) [7] and Robots [8].

The majority of current dialogue systems use Pattern Matching (PM) or Natural Language Processing (NLP) to analyse and answer a user utterance. PM systems have been reported as the best for developing dialogue systems that seem coherent and intelligent to users [9]. They scale to large numbers of users because they do not require preprocessing stages, but they are labour intensive and therefore costly to develop and maintain. NLP systems have a substantial theoretical basis but require a chain of computationally intensive and error-prone stages such as part-of-speech (POS) tagging, syntactic repair and parsing.
This rules them out for web-based systems that must service many users in real time. Short Text Semantic Similarity (STSS) offers an alternative approach to PM and NLP. A user utterance (a unit of dialogue containing a communicative action [1]) is