978-1-7281-5761-0/20/$31.00 ©2020 IEEE

Voice Antispoofing System Vulnerabilities Research

Aleksandr M. Sinitca¹, Nikita V. Efimchik, Evgeniy D. Shalugin, Vladimir A. Toropov
Faculty of Computer Science and Technology
Saint Petersburg Electrotechnical University "LETI"
St. Petersburg, Russia
¹amsinitca@etu.ru

Konstantin Simonchik
ID R&D Inc.
New York, USA
simonchik@idrnd.net

Abstract— The problem of protecting information systems from various types of spoofing has recently become increasingly relevant. This article presents a study of a voice anti-spoofing system aimed at finding its vulnerabilities to text-to-speech attacks. As part of the study, a new test dataset for voice anti-spoofing systems was created; it contains about 150,000 audio recordings of more than 15,000 phrases in 25 languages, produced by 8 TTS engines. The study showed that detection quality varies with the voice used by the text-to-speech engine and that the detector is vulnerable to signal noise, which reveals specific characteristics of the detector. The results can be used to improve the detection of speech produced by text-to-speech engines.

Keywords— antispoofing; vulnerabilities; text-to-speech

I. INTRODUCTION

Speech synthesis and modeling technologies are developing rapidly and now make it possible to create voice recordings that are almost indistinguishable from real ones. Services that convert text into speech are called Text-to-Speech (TTS) services. Consequently, protecting systems from attacks that use synthesized speech has become one of the most pressing problems today. Many researchers are developing algorithms that can distinguish a synthesized voice from a real one. These algorithms must be tested thoroughly to confirm that they actually work, and the quality and diversity of the test data matter as much as the testing itself. In this study, six products were used to convert text into speech: IBM Cloud API, Google Cloud Platform, Baidu TTS, Amazon Polly, Yandex SpeechKit, and MaryTTS.
For IBM Cloud API and Google Cloud Platform, deep neural network engines were used in addition to conventional speech engines.

II. SYSTEM UNDER TEST

No mature, ready-made open-source solutions for protecting voice systems from spoofing are available. There are so-called "anti-fraud" solutions that analyze audio, voice, behavior, and metadata to produce risk assessments of calls and customer credentials. However, while such solutions imply that anti-spoofing components may be present, their descriptions nowhere state whether these components actually exist or, if they do, how they work. Moreover, such solutions are largely based on telephony methods. There are many commercial anti-spoofing solutions, such as those from Microsoft, Nuance, STC, and others, but their systems are not openly accessible. There are also freely available articles describing solutions submitted to various competitions, such as the ASVspoof competition [1]. However, finished products based on these solutions are not publicly available.

The study tested a system built with the most common techniques for constructing anti-spoofing systems. The system that showed the best results in the ASVspoof 2019 competition [2] was chosen as the target for studying the vulnerabilities of anti-spoofing systems. The system was accessed through its Python API.

III. TESTING METHODOLOGY

Testing was carried out in 3 stages. First, a corpus of texts in N different languages was collected. Next, all the prepared phrases were converted into audio recordings (in WAV format) using the TTS services listed above. These recordings were then processed by the system under test, which classified each one as either human speech or synthesized speech. In the next stage, a subset of the recordings that did not pass the test (i.e., those the system classified as synthesized) was selected, and white noise was added to them.
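The noise-augmentation step, together with the subsequent re-test of the augmented recordings, can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the paper does not specify the noise level or generator, so the sketch assumes additive Gaussian white noise scaled to a target signal-to-noise ratio (SNR) in decibels, with each recording held as a list of float samples. The function names and the `is_spoof` callable, which stands in for the detector's Python API, are hypothetical.

```python
import math
import random


def add_white_noise(samples, snr_db, rng=None):
    """Return a copy of `samples` with additive Gaussian white noise.

    Hypothetical helper: the noise variance is chosen so that the ratio of
    mean signal power to noise power equals `snr_db` decibels.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    if rng is None:
        rng = random.Random()
    # Mean signal power: P_s = (1/N) * sum(x_i^2)
    signal_power = sum(x * x for x in samples) / len(samples)
    # SNR_dB = 10 * log10(P_s / P_n)  =>  P_n = P_s / 10^(SNR_dB / 10)
    noise_power = signal_power / 10 ** (snr_db / 10)
    noise_std = math.sqrt(noise_power)
    # Independent zero-mean Gaussian samples have a flat (white) spectrum.
    return [x + rng.gauss(0.0, noise_std) for x in samples]


def measured_snr_db(clean, noisy):
    """SNR in dB of `noisy` relative to the `clean` reference signal."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


def noise_bypass_rate(recordings, is_spoof, snr_db, seed=0):
    """Share of recordings flagged as synthesized that pass after noising.

    `is_spoof` is a hypothetical callable standing in for the detector's
    Python API: it takes a list of samples and returns True for "synthesized".
    """
    rng = random.Random(seed)
    # Keep only the recordings the detector rejected as synthesized.
    flagged = [r for r in recordings if is_spoof(r)]
    if not flagged:
        return 0.0
    # Add white noise to each flagged recording and run the detector again.
    bypassed = sum(
        1 for r in flagged if not is_spoof(add_white_noise(r, snr_db, rng))
    )
    return bypassed / len(flagged)


if __name__ == "__main__":
    # A 1-second 440 Hz tone at 16 kHz stands in for a TTS recording.
    rate = 16000
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
    noisy = add_white_noise(tone, snr_db=20, rng=random.Random(0))
    print(f"measured SNR: {measured_snr_db(tone, noisy):.1f} dB")
```

Specifying the noise by SNR rather than by a fixed amplitude keeps the perturbation proportional to each recording's loudness. Real WAV recordings could be decoded into such a sample list with Python's standard `wave` module.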
The augmented audio recordings were again processed by the system under test. As a result, an SDK was developed to automate the testing of anti-spoofing systems, and the tests conducted with it were used to analyze the attack resistance of the system under study, which is built on the most popular anti-spoofing methods.

To search for vulnerabilities in the anti-spoofing system through the text-to-speech attack vector, it was decided to create a custom dataset from publicly available texts in a set of major languages, and also to reuse the corpus of the translation dataset built from European Parliament materials [3], which contains 20 language pairs. In addition, the Data-Baker TTS dataset was used for Chinese. Taking into account the languages and voices available in the TTS engines considered, the following dataset was obtained:
• IBM Cloud API: de: 540, en: 2496, es: 1508, fr: 784, it: 542, ja: 97, pt: 748. Total: 6715
• Google Cloud API: ar-XA: 600, cs-CZ: 468, da-DK: 522, de-DE: 1560, el-GR: 532, en-GB: 2496, en-US: 3120, es-ES: 377, fi-FI: 528, fr-FR: 3136, hu-HU: 462, it-IT: 2168, ja-JP: 776, nl-NL: 2670, pl-PL: 2390, pt-