Logo-DM – A Speech Therapy Optimization Data Mining System

Mirela Danubianu, Adina Luminita Barala
Faculty of Electrical Engineering and Computer Science
Stefan cel Mare University of Suceava
Suceava, Romania
e-mail: mdanub@eed.usv.ro; adina@eed.usv.ro

Abstract— This paper presents Logo-DM, a prototype data mining system intended to help optimize personalized speech therapy. It uses data collected by the TERAPERS system, which was implemented at the “Stefan cel Mare” University of Suceava to assist speech therapists in the treatment of children suffering from dyslalia. Several data mining methods have been applied to these data. The resulting patterns help specialists carry out their current activity more efficiently. They can also provide knowledge that improves the support offered by TERAPERS by raising the quality of its embedded expert system.

Keywords-computer-based speech therapy; data mining; classification; association rules.

I. INTRODUCTION

Speech impairment, one of the most common problems in childhood, can become a source of integration problems in the community during adulthood. This is one of the reasons why special attention has been paid to speech therapy. A speech disorder can be corrected if it is discovered and properly treated in due time. However, therapy is a complex process that must be adapted to each child.

Since 1990-2000, computer-assisted speech therapy has become a frequent practice. In this context, many Computer-Based Speech Therapy (CBST) tools and systems have been developed. For example, IBM developed the Speechviewer III system [1]. While users perform several speech actions, Speechviewer III creates an interactive visual model of speech. Another project is the ICATIANI device, developed by the TLATOA Speech Processing Group, CENTIA Universidad de las Américas, Puebla Cholula, Pue, México [2]. It uses sounds and graphics to support the practice of Mexican Spanish pronunciation.
The third example, Articulation Tutor (ARTUR) [3], provides an integrated speech therapy system. It contains two main components: an intuitive graphical interface named Wizard-of-Oz and a virtual speech tutor named Artur. Using audio (the user's utterance) and video (facial data) information, the system can recognize and reproduce mispronunciations. ARTUR then suggests the correct pronunciation (audio data) and the correct position of the speech elements (virtual articulator model).

The use of these systems has allowed researchers and practitioners to collect a considerable volume of data related to children's particularities, therapeutic paths, and results. Contrary to expectations, however, a large amount of data does not automatically lead to a significant increase in the volume and quality of information, because traditional data processing tools are not applicable. For these reasons, modern methods that aim to discover new and potentially useful patterns in large volumes of data have been implemented. This process is called Knowledge Discovery in Databases (KDD) [4]. Its central step is data mining, which involves applying algorithms that, with acceptable performance, produce a particular enumeration of patterns from the data.

In 2008, the TERAPERS system was implemented at the Research Center in Computer Science of the “Stefan cel Mare” University of Suceava. It is a CBST that aims to assist the personalized therapy of dyslalia, an articulation disorder found in a significant percentage of children from the age of 3-4 years. It is the first CBST developed for the Romanian language. During its use, data on a few hundred cases were collected. This was the starting point for the idea of optimizing personalized speech therapy with data mining techniques.

The purpose of this paper is to give an overview of the Logo-DM system, a dedicated data mining system that aims to optimize the personalized therapy of Romanian children suffering from dyslalia.
This system is designed so that speech therapists can easily discover useful patterns. They may use Logo-DM to analyze datasets obtained by integrating the data collected in all speech therapy offices that use TERAPERS.

Section II presents some basic concepts related to Knowledge Discovery in Databases and the position of the data mining stage within this process. Section III discusses speech disorders and their implications for the individual's development. It also highlights the complexity of speech therapy. Section IV briefly describes the Logo-DM system. Finally, Section V contains some conclusions and future work.

II. KNOWLEDGE DISCOVERY IN DATABASES PROCESS AND DATA MINING

The concept of Knowledge Discovery in Databases was developed in response to the emergence of very large volumes of data that could not be analyzed with traditional database techniques. It aims to identify “valid, novel, potentially useful, and understandable patterns in data” [4], and is a complex, interactive and iterative process.

Copyright (c) IARIA, 2013. ISBN: 978-1-61208-311-7
IMMM 2013 : The Third International Conference on Advances in Information Mining and Management