International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 08 Issue: 05 | May 2021 www.irjet.net p-ISSN: 2395-0072
© 2021, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 3874

Language Translation by Stand-Alone Voice Cloning: A Multispeaker Text-To-Speech Synthesis Approach based on Transfer Learning

Sakshi Bhajikhaye [2], Dr Sonali Ridhorkar [1], Vidhi Gautam [2], Mamta Soni [2], Mayank Badole [2], Adarsh Kant [2], Pranjali Rewatkar [2]

[1] HOD, Department of Computer Science, G.H. Raisoni Academy of Engineering and Technology, RTMNU, Maharashtra, India 440033.
[2] Student, Department of Computer Science, G.H. Raisoni Academy of Engineering and Technology, RTMNU, Maharashtra, India 440033.

Abstract - Stand Alone Language Translator is a speech-to-speech translation application for Android mobile phones. It translates speech in a source language into the target language and delivers it in a human voice that matches the source speaker's voice. Stand Alone Language Translator includes three modules: Speech Recognition, Language Translation, and Speech Synthesis. The Speech Recognition module captures speech from the mobile user through the microphone, recognizes it, and converts it into text; the text is then sent to the Language Translation module along with a sample of the voice for further processing. The Language Translation module contains a library for both languages: when it receives text, it converts the text from one language to the other language selected by the user and sends the translated text to the last module. The Speech Synthesis module acts as the text-to-speech converter: it receives the translated text.
This module converts the translated text into speech and delivers the output to the user in the same voice as the source speaker. Thus, the language translation application works by integrating these three modules to deliver the translated speech to the user.

Key Words: End-to-end speech-to-speech translation, Speech Recognition, Language Translation, Speech Synthesizer, Multilingual speech

1. INTRODUCTION

In the current era, the global scenario increases the demand for communication among speakers of different languages. Stand Alone Language Translator (SALT) enables communication between people who speak different languages. With SALT, a user can speak and have their words translated automatically into the other person's language. The translator is intended to convey the original tone and intent of the source language in the target language.

Automatic speech-to-speech translation consists of three separate technologies: technology to recognize speech (speech recognition), technology to translate the recognized words (language translation), and technology to synthesize speech in the other person's language (speech synthesis). Speech-to-speech translation systems are often used in specific situations, such as supporting conversations in non-native languages. The demand for trans-lingual conversations triggered by IT technologies has boosted research activity on speech-to-speech translation.

The work proposed here is a mobile application for the Android platform that translates real-time speech in one language into the required target language. A good speech-to-speech translation system is characterized by its ability to preserve the fluency and meaning of the original speech input.

2.
REVIEW

CASMACAT is a modular, web-based translation workbench that offers advanced functionalities for computer-aided translation and the scientific study of human translation. MateCat is a tool whose objective is to improve the integration of machine translation and human translation within the so-called computer-aided translation framework. It provides translators with text editors that can manage several document formats and suitably arrange their content into text segments ready to be translated [3].

Curriculum learning (CL) may help avoid poor local minima, speed up training convergence, and improve generalization. These advantages have been demonstrated empirically in various tasks, including shape recognition, object classification, and language modeling [1].

The advantage of statistical machine translation (SMT) is that it does not require a deep syntactic understanding of the source and target languages [8].

Voice Translator is an Android mobile application that helps the user translate one language into another over a Bluetooth connection, making it possible to converse with people who speak different languages [7].

The main goal of stand-alone voice conversion is to modify an utterance from the source speaker so that it matches the frequency of the target speaker while keeping the linguistic content unchanged [4].

3. PROBLEM DEFINITION

Consider a dataset of utterances grouped by speaker, where u_ij denotes the j-th utterance of the i-th speaker. Utterances are represented in the waveform domain. We then denote by x_ij the log-mel spectrogram of the utterance u_ij. A log-mel spectrogram is a deterministic, non-