SFF Anti-Spoofer: IIIT-H Submission for Automatic Speaker Verification Spoofing and Countermeasures Challenge 2017

K N R K Raju Alluri, Sivanand Achanta, Sudarsana Reddy Kadiri, Suryakanth V Gangashetty, and Anil Kumar Vuppala

Speech Processing Laboratory, KCIS
International Institute of Information Technology, Hyderabad, India
{raju.alluri, sivanand.a, sudarsanareddy.kadiri}@research.iiit.ac.in, {svg, anil.vuppala}@iiit.ac.in

Abstract

The ASVspoof 2017 challenge concerns distinguishing replayed speech from genuine human speech. The proposed system exploits the fact that replayed speech signals pass through multiple channels, unlike original recordings. This channel information is typically embedded in low signal-to-noise ratio regions. A speech signal processing method with high spectro-temporal resolution is required to extract robust features from such regions. Single frequency filtering (SFF) is one such technique, which we propose to use for replay attack detection. While an SFF-based feature representation is used at the front end, Gaussian mixture model and bi-directional long short-term memory models are investigated as back-end classifiers. Experimental results on the ASVspoof 2017 dataset show that the SFF-based representation is very effective in detecting replay attacks. Score-level fusion of the back-end classifiers further improves the performance of the system, which indicates that the two classifiers capture complementary information.

Index Terms: spoofing, countermeasures, replay attack, Gaussian mixture model, bi-directional long short-term memory, single frequency filtering.

1. Introduction

Recent advances in speech technology have made automatic speaker verification (ASV) a reliable biometric solution for many applications such as e-commerce and telephone banking [1, 2]. A general assumption in ASV is that the authorized user presents the speech signal to the verification system for access.
However, this may not hold in all cases. For example, an unauthorized user may gain access to the verification system by imitating the authorized speaker's voice. Such manipulation is known as a spoofing attack. Current state-of-the-art ASV systems [3, 4] are robust to session and channel variations; however, they are vulnerable to spoofing attacks [5]. Four types of spoofing attack are reported in the literature [5]: impersonation, voice conversion (VC), speech synthesis (SS), and replay. The present study focuses on developing countermeasures for replay attacks.

A detailed survey of studies on spoofing attacks for ASV is presented in [5]. According to this survey, most countermeasures were developed with prior knowledge of the spoofing attacks. However, this may not hold in practice, where the nature of an attack cannot be known in advance. Also, most of the studies were conducted on non-standard databases, and hence their results are not comparable.

With the aim of setting up standard datasets and common evaluation protocols, a special session [6] on spoofing countermeasures for ASV was held at INTERSPEECH 2013. Because spoofing can be carried out with high-quality SS and VC, collaboration with the respective communities led to a rich, standard dataset on which robust countermeasures could be evaluated. As part of the series, the second special session was held at INTERSPEECH 2015 [7]. The organisers provided a standard text-independent dataset and a common protocol to deal with VC and SS attacks. Several researchers have developed countermeasures for these attacks; the results are summarised in [8]. As replay attacks are not included in the ASVspoof 2015 dataset, researchers from Idiap created a more general database, AVspoof¹ [9], which contains VC, SS and replay attacks.
At Biometrics: Theory, Applications and Systems (BTAS) 2016 [10], a special session was held on a speaker anti-spoofing challenge, providing a replay database collected from AVspoof. Several studies have been reported on the BTAS 2016 corpus [10, 11, 12, 13]. The AVspoof corpus was collected with few recording devices in a controlled environment with varying acoustics. Practical scenarios call for a more general replay corpus. For the current special session, the ASV spoofing and countermeasures (ASVspoof 2017) challenge², the organisers provided a new text-dependent replay corpus [14], which is more diverse than AVspoof for replay attack detection.

The majority of successful countermeasures for replay attack detection reported in the literature are based on non-conventional features such as inverted mel frequency cepstral coefficients (IMFCC) [10], rectangular frequency cepstral coefficients (RFCC) [11] and constant Q cepstral coefficients (CQCC) [12, 13], coupled with a Gaussian mixture model (GMM). In this study, we use the recently proposed single frequency filtering coefficients (SFFCC) [15] for replay attack detection. A GMM and a bi-directional long short-term memory (BLSTM) model are used as classifiers. As these two classifiers are different in nature, i.e., the GMM is a generative model whereas the BLSTM is a discriminative model, we fuse the two classifier scores with logistic regression to benefit from their complementary nature.

The paper is organised as follows. Section 2 describes the proposed approach in detail, including the front-end features and back-end classifiers. The experimental setup, with the detailed feature representation and classifier parameters, is presented in Section 3. Results are discussed in Section 4. Finally, conclusions are presented in Section 5.
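
As a rough illustration of the score-level fusion mentioned in the introduction, the following minimal sketch fits a logistic regression on the scores of two classifiers and uses the resulting linear combination as the fused score. It is not the authors' implementation: the function names (`fit_fusion`, `fuse`), the plain gradient-descent training, the learning rate and epoch count, and the toy development scores are all assumptions made here for illustration only.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_fusion(gmm_scores, blstm_scores, labels, lr=0.5, epochs=2000):
    """Fit (w1, w2, b) so that sigmoid(w1*s_gmm + w2*s_blstm + b)
    approximates P(genuine). Hypothetical helper, batch gradient descent."""
    w1 = w2 = b = 0.0
    n = len(labels)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for s1, s2, y in zip(gmm_scores, blstm_scores, labels):
            # Gradient of the cross-entropy loss w.r.t. the logit.
            err = sigmoid(w1 * s1 + w2 * s2 + b) - y
            g1 += err * s1
            g2 += err * s2
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b


def fuse(s_gmm, s_blstm, params):
    # Fused score is the logit of the fitted model (higher = more genuine).
    w1, w2, b = params
    return w1 * s_gmm + w2 * s_blstm + b


# Toy development scores (made up): label 1 = genuine, 0 = replayed.
dev_gmm = [1.2, 0.8, 0.3, -0.2, -0.9, -1.1]
dev_blstm = [0.9, 0.1, 0.7, -0.6, 0.2, -0.8]
dev_labels = [1, 1, 1, 0, 0, 0]

params = fit_fusion(dev_gmm, dev_blstm, dev_labels)
fused = [fuse(a, b, params) for a, b in zip(dev_gmm, dev_blstm)]
```

In such a setup the fusion weights would normally be estimated on a held-out development set and then applied unchanged to evaluation scores; the learned weights indicate how much each classifier contributes to the fused decision.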

¹ https://www.idiap.ch/dataset/avspoof
² http://www.spoofingchallenge.org

Copyright © 2017 ISCA. INTERSPEECH 2017, August 20–24, 2017, Stockholm, Sweden. http://dx.doi.org/10.21437/Interspeech.2017-676