www.editada.org
International Journal of Combinatorial Optimization Problems and Informatics, 13(2), May-Aug 2022, 65–75. ISSN: 2007-1558

© Editorial Académica Dragón Azteca S. de R.L. de C.V. (EDITADA.ORG), All rights reserved.

English mispronunciation detection module using a Transformer network integrated into a chatbot

Marcos E. Martinez-Quezada, J. Patricia Sánchez-Solís, Gilberto Rivera, Rogelio Florencia*, Francisco López-Orozco

División Multidisciplinaria de Ciudad Universitaria / Universidad Autónoma de Ciudad Juárez, Chih 32500, Mexico
*Correspondence: rogelio.florencia@uacj.mx

Abstract. Companies today need up-to-date information to remain competitive. Applications based on speech recognition allow access to data stored in databases. However, these applications work properly only when the user pronounces well, a skill that most people do not have. This paper proposes the architecture of an English mispronunciation detection module integrated into a chatbot. Users submit audio of the phrases whose pronunciation they want to evaluate, and the module returns the mispronounced words, helping them practice their English pronunciation. The proposed architecture consists of an Automatic Speech Recognizer (ASR) model, based on a Transformer network, that converts the audio signal to text, and a string alignment algorithm that identifies mispronounced words using the Levenshtein distance. The Transformer network was trained on the LibriSpeech and L2-ARCTIC datasets. The module was evaluated using the Accuracy metric, reaching 90%, and the Character Error Rate metric, reaching 9.5%. Additionally, its performance was evaluated with a group of real users, showing promising results.

Keywords: Mispronunciation detection, Automatic Speech Recognition, Transformer network.
Article Info
Received: August 31, 2021
Accepted: October 23, 2021

1 Introduction

Business Intelligence refers to the tools and strategies used in the processing, analysis, and visualization of data to support decision-making in companies [1]. Real-time access to up-to-date information, stored on company servers or in the cloud, could allow decision-makers to carry out business operations with greater certainty and obtain favorable returns for their companies. In this sense, applications based on speech recognition are intended to answer queries that users express in natural language [2]. However, for these applications to perform well, the pronunciation of the users is a key element, and it is a skill that most people do not have. Moreover, most of these applications have been developed for English, which is the predominant language in this globalized world.

Pronunciation is often the most difficult skill to develop when learning a second language. Interaction with other people is key to developing speech skills. However, a learning partner is not always available, which may delay the improvement of this skill [3]. Several tools, such as websites and apps, can help learners in language learning. Chatbots are one such tool and have been well received in second-language learning [4]. Additionally, Automatic Speech Recognition (ASR) systems are often used in the mispronunciation detection task [5]. Considering that frequent interaction with a chatbot could allow users to improve their pronunciation skills, integrating chatbots with ASR systems could be useful for focused pronunciation practice.
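
To make the alignment step named in the abstract concrete, the following is a minimal sketch of how mispronounced words might be flagged by aligning the phrase the user intended to say with the ASR transcript. It is an illustration under stated assumptions, not the authors' implementation: the function names (`edit_table`, `flag_mispronounced`, `cer`), the word-level granularity of the alignment, and the lowercase/whitespace tokenization are all hypothetical choices. The ASR model itself is outside the sketch; the transcript is simply passed in as a string.

```python
def edit_table(ref, hyp):
    """Return the Levenshtein dynamic-programming table for two sequences.

    table[i][j] is the edit distance between ref[:i] and hyp[:j].
    """
    n, m = len(ref), len(hyp)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        table[i][0] = i  # delete all i reference items
    for j in range(m + 1):
        table[0][j] = j  # insert all j hypothesis items
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            table[i][j] = min(
                table[i - 1][j] + 1,         # deletion
                table[i][j - 1] + 1,         # insertion
                table[i - 1][j - 1] + cost,  # match or substitution
            )
    return table


def flag_mispronounced(reference, transcript):
    """List reference words that the transcript substituted or dropped.

    Assumption: tokenization is lowercase + whitespace split (hypothetical).
    """
    ref = reference.lower().split()
    hyp = transcript.lower().split()
    table = edit_table(ref, hyp)
    flagged = []
    i, j = len(ref), len(hyp)
    # Walk back through the table to recover one optimal alignment.
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and ref[i - 1] == hyp[j - 1]
        if i > 0 and j > 0 and table[i][j] == table[i - 1][j - 1] + (0 if same else 1):
            if not same:
                flagged.append(ref[i - 1])  # substituted word
            i, j = i - 1, j - 1
        elif i > 0 and table[i][j] == table[i - 1][j] + 1:
            flagged.append(ref[i - 1])      # word missing from the transcript
            i -= 1
        else:
            j -= 1                          # extra word in the transcript
    flagged.reverse()
    return flagged


def cer(reference, transcript):
    """Character Error Rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_table(reference, transcript)[-1][-1] / len(reference)


if __name__ == "__main__":
    print(flag_mispronounced("I think the weather is nice",
                             "I sink the weather is nice"))
    print(cer("kitten", "sitting"))
```

In this sketch a word is flagged when the alignment substitutes it or drops it; extra words in the transcript are ignored because they do not correspond to any word the user was asked to say. Whether the original module treats insertions the same way is not stated in this section.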