International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 08 Issue: 05 | May 2021 www.irjet.net p-ISSN: 2395-0072
© 2021, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 3874
Language Translation by Stand-Alone Voice Cloning: A Multispeaker
Text-To-Speech Synthesis Approach based on Transfer Learning
Sakshi Bhajikhaye [2], Dr Sonali Ridhorkar [1], Vidhi Gautam [2], Mamta Soni [2], Mayank
Badole [2], Adarsh Kant [2], Pranjali Rewatkar [2]
[1] HOD, Department of Computer Science, G.H. Raisoni Academy of Engineering and Technology, RTMNU,
Maharashtra, India 440033.
[2] Student, Department of Computer Science, G.H. Raisoni Academy of Engineering and Technology, RTMNU,
Maharashtra, India 440033.
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract - Stand Alone Language Translator is a
speech-to-speech translation application for Android mobile
phones. It translates speech in a source language into a
target language and delivers the output in a human voice that
matches the voice of the source speaker. The application
comprises three modules: Speech Recognition, Language
Translation, and Speech Synthesis. The Speech Recognition
module captures the user's speech through the microphone,
converts it into text, and sends the text, along with a
sample of the voice, to the Language Translation module for
further processing. The Language Translation module contains
a library for both languages; when it receives the text, it
converts it from the source language into the language
selected by the user and sends the translated text to the
last module. The Speech Synthesis module acts as a
text-to-speech converter: it converts the translated text
into speech and delivers the output to the user in the same
voice as the source speaker. The application thus works by
integrating these three modules to produce the translated
speech output.
Key Words: End-to-end speech-to-speech translation,
Speech Recognition, Language Translation, Speech
Synthesizer, Multilingual speech
1. INTRODUCTION
In the current global scenario, there is a growing demand for
communication among speakers of different languages.
Stand Alone Language Translator (SALT) enables
communication between people who speak different
languages: a user can speak and have their words translated
automatically into the other person's language. The
translator is intended to convey the original tone and intent
of the source language in the target language. Automatic
speech-to-speech translation technology consists of three
separate technologies: technology to recognize speech
(speech recognition), technology to translate the recognized
words (language translation), and technology to synthesize
speech in the other person's language (speech synthesis).
Speech-to-speech translation systems are often used in
specific situations, such as supporting conversations in
non-native languages. The demand for trans-lingual
conversations triggered by information technology has
boosted research on speech-to-speech translation. The work
proposed here is a mobile application for the Android
platform that translates real-time speech in one language
into the required target language. A good speech-to-speech
translation system is characterized by its ability to keep
the fluency and meaning of the original speech input intact.
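The three-stage flow described above (speech recognition, then language translation, then speech synthesis) can be sketched as a simple function pipeline. The sketch below is only an illustration under stated assumptions, not the application's implementation: the names `recognize`, `translate`, `synthesize`, `speech_to_speech`, and `TOY_LEXICON` are hypothetical, each stage is a toy stub (audio is treated as UTF-8 text, translation is a word lookup, synthesis returns a record rather than a waveform), and the voice sample is handed straight to the synthesis stage for simplicity.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Recognized:
    text: str            # transcript of the source speech
    voice_sample: bytes  # short sample of the speaker's voice


@dataclass
class Speech:
    text: str         # what is spoken
    language: str     # language of the spoken text
    voice_ref: bytes  # voice the output should imitate


# Hypothetical toy lexicon standing in for a real translation model.
TOY_LEXICON: Dict[Tuple[str, str], Dict[str, str]] = {
    ("en", "es"): {"hello": "hola", "friend": "amigo"},
}


def recognize(audio: bytes) -> Recognized:
    """Stub recognizer: treats the audio bytes as UTF-8 text."""
    return Recognized(text=audio.decode("utf-8"), voice_sample=audio[:16])


def translate(text: str, source: str, target: str) -> str:
    """Stub translator: word-by-word lookup; unknown words pass through."""
    lexicon = TOY_LEXICON.get((source, target), {})
    return " ".join(lexicon.get(word.lower(), word) for word in text.split())


def synthesize(text: str, language: str, voice_sample: bytes) -> Speech:
    """Stub synthesizer: returns a record instead of a waveform."""
    return Speech(text=text, language=language, voice_ref=voice_sample)


def speech_to_speech(audio: bytes, source: str, target: str) -> Speech:
    """Chain the three stages in the order given in the text."""
    recognized = recognize(audio)
    translated = translate(recognized.text, source, target)
    return synthesize(translated, target, recognized.voice_sample)


result = speech_to_speech(b"hello friend", "en", "es")
print(result.text)
```

Keeping each stage behind a plain function boundary means any stub could be swapped for a real recognizer, translator, or voice-cloning synthesizer without changing `speech_to_speech`.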
2. REVIEW
CASMACAT is a modular, web-based translation workbench
that offers advanced functionalities for computer-aided
translation and the scientific study of human translation.
MateCat is a tool whose objective is to improve the
integration of machine translation and human translation
within the so-called computer-aided translation framework.
It provides translators with text editors that can manage
several document formats and suitably arrange their content
into text segments ready to be translated [3]. Curriculum
learning (CL) might help avoid poor local minima, speed up
training convergence, and improve generalization. These
advantages have been empirically demonstrated in various
tasks, including shape recognition, object classification, and
language modeling [1]. The advantage of statistical machine
translation (SMT) is that it does not require a deep syntactic
understanding of the source and target languages [8]. Voice
Translator is an Android mobile application that helps the
user translate one language into another over a Bluetooth
connection, which makes it possible to talk with people who
speak a different language [7]. The main goal of stand-alone
voice conversion is to modify an utterance from the source
speaker, while keeping the linguistic content unchanged, so
that it matches the frequency of the target speaker [4].
3. PROBLEM DEFINITION
Let us consider a dataset of utterances grouped by speaker,
where we denote the jth utterance of the ith speaker
as uij. Utterances are represented in the waveform domain.
We then denote by xij the log-mel spectrogram of the
utterance uij. A log-mel spectrogram is a deterministic, non-