SFF Anti-Spoofer: IIIT-H Submission for Automatic Speaker Verification Spoofing and Countermeasures Challenge 2017

K N R K Raju Alluri, Sivanand Achanta, Sudarsana Reddy Kadiri, Suryakanth V Gangashetty, and Anil Kumar Vuppala

Speech Processing Laboratory, KCIS
International Institute of Information Technology, Hyderabad, India
{raju.alluri, sivanand.a, sudarsanareddy.kadiri}@research.iiit.ac.in, {svg, anil.vuppala}@iiit.ac.in

Abstract

The ASVspoof 2017 challenge is about detecting replayed speech and distinguishing it from genuine human speech. The proposed system exploits the fact that replayed speech signals, unlike original recordings, pass through multiple channels. This channel information is typically embedded in low signal-to-noise ratio regions. A speech signal processing method with high spectro-temporal resolution is required to extract robust features from such regions. Single frequency filtering (SFF) is one such technique, which we propose to use for replay attack detection. While an SFF-based feature representation is used at the front-end, a Gaussian mixture model and a bi-directional long short-term memory model are investigated at the back-end as classifiers. The experimental results on the ASVspoof 2017 dataset show that the SFF-based representation is very effective in detecting replay attacks. Score-level fusion of the back-end classifiers further improved the performance of the system, which indicates that the two classifiers capture complementary information.

Index Terms: Spoofing, countermeasures, replay attack, Gaussian mixture model, bi-directional long short-term memory, single frequency filtering.

1. Introduction

Recent advances in speech technology have made automatic speaker verification (ASV) a reliable biometric solution for many applications such as e-commerce and telephone banking [1, 2]. A general assumption in ASV is that it is the authorized user who produces the speech signal presented to the verification system for access.
However, this may not be true in all cases. For example, an unauthorized user may gain access to the verification system by imitating the authorized speaker's voice. This manipulation is known as a spoofing attack. The current state-of-the-art ASV systems [3, 4] are robust to session and channel variations. However, they are vulnerable to spoofing attacks [5]. Four types of spoofing attack have been reported in the literature [5]: impersonation, voice conversion (VC), speech synthesis (SS), and replay. In the present study, the focus is on developing countermeasures for replay attacks.

A survey of studies on spoofing attacks for ASV is presented in detail in [5]. According to this survey, most of the countermeasures were developed with prior knowledge of the spoofing attacks. However, this may not hold in practice, where the nature of the attacks cannot be known in advance. Also, most of the works were conducted on non-standard databases, and hence the results are not comparable.

With the aim of setting up standard datasets and common evaluation protocols, a special session [6] on spoofing countermeasures for ASV was conducted at INTERSPEECH 2013. Because spoofing can be effected using high-quality SS and VC, collaborating with the respective communities led to a rich and standard data set on which robust countermeasures could be evaluated. As a part of the series, the second special session was conducted at INTERSPEECH 2015 [7]. The organisers came up with a standard text-independent data set and a common protocol to deal with VC and SS attacks. Several researchers have developed countermeasures for these attacks; the results are summarised in [8]. As replay attacks are not included in the ASVspoof 2015 data set, researchers from Idiap came up with a more generalised database, AVspoof¹ [9], which contains VC, SS and replay attacks.
At Biometrics Theory, Applications and Systems (BTAS) 2016 [10], a special session was conducted on a speaker anti-spoofing challenge, for which a replay database collected from AVspoof was provided. Several studies have been reported on the BTAS 2016 corpus [10, 11, 12, 13]. The AVspoof corpus was collected with a few recording devices in a controlled environment with varying acoustics. For practical scenarios, there is a need for a more generalised replay corpus. In the current special session on the ASV spoofing and countermeasures (ASVspoof 2017) challenge², the organisers provided a new text-dependent replay corpus [14], which is more diverse in nature than AVspoof for replay attack detection.

The majority of the successful countermeasures for replay attack detection reported in the literature are based on non-conventional features such as inverted mel frequency cepstral coefficients (IMFCC) [10], rectangular frequency cepstral coefficients (RFCC) [11] and constant Q cepstral coefficients (CQCC) [12, 13], coupled with a Gaussian mixture model (GMM). In this study, we use the recently proposed single frequency filtering cepstral coefficients (SFFCC) [15] for the task of replay attack detection. A Gaussian mixture model (GMM) and a bi-directional long short-term memory (BLSTM) model are used as classifiers. As these two classifiers are distinct in nature, i.e., the GMM is a generative model whereas the BLSTM is a discriminative model, we fuse the two classifier scores with logistic regression to benefit from their complementary nature.

The paper is organised as follows. Section 2 describes the proposed approach in detail, including the front-end features and the back-end classifiers used. The experimental setup, with the detailed feature representation and classifier parameters, is presented in Section 3. Results are discussed in Section 4. Finally, the conclusion of the study is presented in Section 5.
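The score-level fusion of the GMM and BLSTM classifiers mentioned above can be illustrated with a minimal sketch. It is an assumption-laden example, not the system described in this paper: the score values are made up for illustration, the names `train_fusion` and `fuse` are hypothetical, and plain batch gradient descent stands in for whatever logistic regression toolkit was actually used. The idea is only to show how two per-utterance scores are mapped to a single fused score by a learned linear combination.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fuse(w, b, scores):
    # Fused score: weighted sum of the classifier scores plus a bias
    # (the log-odds of "genuine" under the fitted logistic model).
    return sum(wi * si for wi, si in zip(w, scores)) + b


def train_fusion(dev_scores, dev_labels, lr=0.1, epochs=5000):
    # Fit logistic regression weights by batch gradient descent on the
    # mean cross-entropy loss. Label 1 = genuine, label 0 = replay.
    n_dim = len(dev_scores[0])
    n = len(dev_scores)
    w = [0.0] * n_dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_dim
        grad_b = 0.0
        for s, y in zip(dev_scores, dev_labels):
            err = sigmoid(fuse(w, b, s)) - y
            for i in range(n_dim):
                grad_w[i] += err * s[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical development-set scores: (GMM score, BLSTM score) per utterance.
# These numbers are invented for illustration only.
dev_scores = [
    (2.0, 0.5), (1.5, 1.0), (0.3, 1.8), (1.0, 0.2),      # genuine
    (-1.0, -0.5), (-0.4, -1.5), (-2.0, 0.1), (0.2, -1.2)  # replay
]
dev_labels = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = train_fusion(dev_scores, dev_labels)

# A fused score above 0 favours "genuine", below 0 favours "replay".
for s, y in zip(dev_scores, dev_labels):
    print(s, y, round(fuse(w, b, s), 3))
```

In this toy data, some utterances are separated mainly by one classifier's score and some by the other's, which is the situation in which a learned combination can help over either score alone. In practice the fusion weights would be trained on held-out development scores and then applied unchanged to evaluation scores.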
¹ https://www.idiap.ch/dataset/avspoof
² http://www.spoofingchallenge.org

Copyright 2017 ISCA. INTERSPEECH 2017, August 20–24, 2017, Stockholm, Sweden. http://dx.doi.org/10.21437/Interspeech.2017-676