DETECTING SPLICING IN DIGITAL AUDIOS USING LOCAL NOISE LEVEL ESTIMATION

Xunyu Pan, Xing Zhang and Siwei Lyu
Computer Science Department, University at Albany, SUNY
Albany, NY 12222, USA
{xypan,xz654242,lsw}@cs.albany.edu

ABSTRACT

One common form of tampering in digital audio signals is known as splicing, where sections from one audio are inserted into another audio. In this paper, we propose an effective splicing detection method for audios. Our method achieves this by detecting abnormal differences in the local noise levels in an audio signal. The local noise levels are estimated using an observed property of audio signals: they tend to have kurtosis close to a constant in the band-pass filtered domain. We demonstrate the efficacy and robustness of the proposed method using both synthetic and realistic audio splicing forgeries.

Index Terms— Digital Forensics, Audio Splicing, Local Noise Level Estimation

1. INTRODUCTION

Digital audios have become ubiquitous with the popularity of the internet and portable digital devices such as personal music players and smartphones. Meanwhile, the rapid development of low-cost and sophisticated editing software has made it much easier for untrained users to modify audio files. Several recent cases of audio forgery have drawn public attention, including the alleged tampering of the recorded audio of actor Mel Gibson [7] and the controversy over the authenticity of audio files claimed to contain the voice of Osama Bin Laden [8]. The increasing number of forged audios calls for more effective tools for authenticating digital audios and detecting forgeries in them.

In this paper, we describe a new method that can be applied to detect a common form of tampering in digital audio signals known as splicing, where sections from one audio are inserted into another audio. Our method achieves this by detecting abnormal differences in the local noise levels in an audio signal.
The estimation of local noise levels is based on an observed property of audio signals: they tend to have kurtosis close to a constant in the band-pass filtered domain. The variance of the noise in the audio signal is estimated by minimizing an objective function that has a closed-form optimal solution. We examine the inconsistency of noise levels within the audio file, which is used to detect the location and length of suspicious audio clips. We also report the robustness and effectiveness of our method using both synthetic and realistic audio forgeries with splicing tampering.

2. PREVIOUS WORK

Digital watermarking may be used to protect the authenticity of audio [3, 11]. However, applying digital watermarking requires particular hardware/software support that most non-professional digital audio recording devices lack.

In recent years, several active forensic detection methods for audio signals have been developed [4, 5, 6, 12]. For instance, acoustic devices such as microphones are identified [4, 5] by extracting background features of the audio stream. A similar forensic tool based on the amount of sound reverberation, which depends on the shape and composition of the room where the audio signal was recorded, is proposed in [6]. In another work [12], digital tampering in MP3 audio data is identified by checking the inconsistency of frame offsets. However, most of these methods assume some knowledge of the recording device or the specific file format. On the other hand, more general forgery detection methods may be obtained by using common statistical properties of digital audios that are independent of specific recording devices or file formats. In [10], local noise levels are estimated by computing the second and fourth moments of each local signal block. However, that method assumes that the kurtosis values of the original signal are known, which is hard to satisfy in practice.
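To make the estimation idea concrete, the sketch below combines per-channel second and fourth moments (the statistics mentioned for [10]) with a closed-form least-squares fit under the kurtosis-constancy assumption described in the introduction. It is a minimal illustration under our own assumptions, not the authors' implementation: the noise is additive white Gaussian, the band-pass channels are the coefficients of an orthonormal frame-wise DCT (the actual filter bank is not specified in this section), and the fitted model is sqrt(κ̃_k - 3) ≈ sqrt(κ - 3)(σ̃_k² - σ²)/σ̃_k², where κ̃_k and σ̃_k² are the kurtosis and variance of noisy channel k. This relation follows because Gaussian noise adds σ² to each channel's variance but contributes nothing to its fourth cumulant. Writing a = sqrt(κ - 3) and b = aσ² makes the model linear in (a, b), so the least-squares minimizer has a closed form. The function names (dct_matrix, estimate_noise_variance, local_noise_variances, synthetic_splice) and the synthetic test signal are hypothetical.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th cosine basis function."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row uses the smaller normalization
    return c


def estimate_noise_variance(x, frame_len=32):
    """Closed-form noise variance estimate under kurtosis constancy (sketch).

    Assumes x = clean + white Gaussian noise, and that the clean signal has
    the same super-Gaussian kurtosis in every band-pass DCT channel.
    """
    n_frames = len(x) // frame_len
    frames = np.asarray(x[: n_frames * frame_len], dtype=float)
    frames = frames.reshape(n_frames, frame_len)
    # Channel k holds the k-th DCT coefficient of every frame; drop k = 0 (DC).
    chans = (frames @ dct_matrix(frame_len).T)[:, 1:]
    chans = chans - chans.mean(axis=0)
    var = np.mean(chans ** 2, axis=0)               # second moments
    kurt = np.mean(chans ** 4, axis=0) / var ** 2   # fourth moment / var^2
    # Model: u_k = a - b * v_k with a = sqrt(kappa - 3), b = a * sigma^2.
    u = np.sqrt(np.clip(kurt - 3.0, 0.0, None))
    v = 1.0 / var
    # Ordinary least squares for slope (-b) and intercept (a), in closed form.
    cov_uv = np.mean((u - u.mean()) * (v - v.mean()))
    b = -cov_uv / v.var()
    a = u.mean() + b * v.mean()
    if a <= 0:
        return 0.0
    # Noise variance must lie between 0 and the smallest channel variance.
    return float(np.clip(b / a, 0.0, var.min()))


def local_noise_variances(x, segment_len, frame_len=32):
    """Noise variance estimate for each non-overlapping segment of x."""
    return [
        estimate_noise_variance(x[s : s + segment_len], frame_len)
        for s in range(0, len(x) - segment_len + 1, segment_len)
    ]


def synthetic_splice(n_frames, noise_stds, frame_len=32, seed=0):
    """Hypothetical test signal: Laplacian DCT coefficients (kurtosis 6 in
    every channel), one segment of n_frames frames per entry of noise_stds."""
    rng = np.random.default_rng(seed)
    scales = 2.0 / (1.0 + 0.2 * np.arange(frame_len))  # decaying spectrum
    segments = []
    for std in noise_stds:
        coeffs = rng.laplace(size=(n_frames, frame_len)) * scales
        clean = (coeffs @ dct_matrix(frame_len)).ravel()  # inverse DCT
        segments.append(clean + std * rng.standard_normal(clean.size))
    return np.concatenate(segments)


if __name__ == "__main__":
    n_frames, frame_len = 6000, 32
    audio = synthetic_splice(n_frames, [0.3, 0.8], frame_len=frame_len)
    # True per-segment noise variances are 0.09 and 0.64.
    print(local_noise_variances(audio, n_frames * frame_len, frame_len))
```

Running the estimator on consecutive segments yields a per-segment noise profile; under this sketch's assumptions, a segment whose estimated variance departs sharply from its neighbors is a splicing candidate. The frame length trades the number of channels (more points for the fit) against the number of samples per channel (more stable kurtosis estimates).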
Our proposed work is most closely related to the work in [13], where the optimal values for the kurtosis of the original clean signal and the variance of the noise are sought simultaneously by minimizing an objective function, assuming scale invariance of signal kurtosis. In contrast, our method has an efficient implementation based on a closed-form solution and can be extended to estimate local noise levels.

3. METHOD

3.1. Kurtosis Constancy

The audio kurtosis κ, which represents the peakedness of the distribution of the signal sample values x, is defined as κ =