Alpha-trimmed Image Estimation for JPEG Steganography Detection

Mei-Ching Chen, Sos S. Agaian, C. L. Philip Chen, and Benjamin M. Rodriguez∗†

∗Department of Electrical and Computer Engineering, The University of Texas at San Antonio, San Antonio, TX, U.S.A.
†Space Department, Johns Hopkins University Applied Physics Laboratory, Laurel, MD, U.S.A.
axf710@my.utsa.edu

Abstract—In information security, steganalysis has been an important topic since evidence first indicated that steganography was being used for covert communication. Among digital files, JPEG images are produced by numerous devices because of the format's compression capability and compatibility. A large number of JPEG steganography tools are also freely available online, which has spawned significant research in JPEG steganalysis. This paper introduces an image estimation technique that uses the alpha-trimmed mean to distinguish clean images from steganographic images. The hidden information is treated as additive noise in the image. The alpha-trimmed method estimates steganographic messages within images in the spatial domain and provides flexibility for classifying various steganography methods that operate in the JPEG compression domain. For three JPEG steganography methods, each applied with three embedded message files to an image data set, the proposed method yields better separability between the clean and steganographic classes. In comparisons with two existing methods, the proposed method increases classification accuracy by as much as 32%.

Index Terms—Alpha-trimmed mean, image estimation, JPEG steganalysis, feature generation

I. INTRODUCTION

Information security is, and will continue to be, a serious issue. Digital steganography has been one of the main vehicles used to secure data. With steganography, secret information is imperceptibly hidden within signals.
Signals containing embedded messages are stored or transmitted through public channels with no indication that pertinent information is hidden. On the other hand, the use of steganography as a means of concealment in computer and cyber crime has given rise to the problem of steganalysis [1]. The goal of image steganography detection is to determine whether a given image potentially contains secret data. For image steganalysis, approximation techniques can be used to estimate certain characteristics of an image in order to detect anomalies. This includes estimating anomalies in the image pixel values or in the coefficient values of a transform domain. The predicted pixel values or coefficients, with or without the original values, can be used to generate features capable of separating input images into various categories. Related problems include detecting the existence of steganographic content, identifying the steganography method used, and extracting the covert message [2].

Numerous sources generate digital images, and the majority of devices create and store them as JPEG files, a widely used compressed image format [3]. Because a large number of online freeware tools generate steganographic JPEG images, it is necessary to properly detect three forms of JPEG embedding:

• steganographic messages hidden within the file header
• steganographic messages hidden within the coefficients
• steganographic messages hidden within the footer

This paper focuses on steganography detection for JPEG images in which the steganography method embeds the secret message within the JPEG coefficients. Due to the characteristics of JPEG images, information hidden in the JPEG coefficients is spread throughout the spatial-domain pixel values without visually distorting the image. Hence, the hidden message is considered additive noise in the spatial domain.
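To make the additive-noise argument concrete, here is a minimal standard-library Python sketch; it is an illustration under stated assumptions, not the authors' implementation. It assumes an orthonormal 8 × 8 inverse DCT with JPEG's scaling, a hypothetical quantization step of 10 at coefficient (1, 2), and a toy smooth cover block, and it ignores rounding and clipping of decompressed pixels. Because the inverse DCT is linear, changing one quantized coefficient by ±1 adds the step size times a single basis function to the block, so all 64 pixels move by a small amount. The helpers `embedding_noise`, `alpha_trimmed_mean`, and `alpha_trimmed_filter`, the 3 × 3 window, alpha = 0.25, and the trimming convention (drop ⌊α·n⌋ sorted samples from each end) are hypothetical choices; the paper's own definition of the alpha-trimmed mean is given in Section II following [8].

```python
import math
from typing import List

Image = List[List[float]]
N = 8  # JPEG block size


def idct_8x8(coeffs: Image) -> Image:
    """Orthonormal 2-D inverse DCT of one 8x8 block (same scaling as JPEG's IDCT)."""
    def scale(k: int) -> float:
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            total = 0.0
            for u in range(N):
                for v in range(N):
                    total += (scale(u) * scale(v) * coeffs[u][v]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[x][y] = total
    return out


def embedding_noise(u: int, v: int, q_step: float, change: int = 1) -> Image:
    """Hypothetical helper: pixel change caused by adding `change` to quantized coefficient (u, v).

    The IDCT is linear, so stego - cover equals the IDCT of a block holding
    change * q_step at (u, v) and zeros elsewhere (rounding/clipping ignored).
    """
    delta = [[0.0] * N for _ in range(N)]
    delta[u][v] = change * q_step
    return idct_8x8(delta)


def alpha_trimmed_mean(values: List[float], alpha: float) -> float:
    """Sort, drop floor(alpha * n) samples from each end, and average the rest.

    Assumed convention: 0 <= alpha < 0.5. alpha = 0 gives the plain mean;
    alpha near 0.5 approaches the median.
    """
    if not 0.0 <= alpha < 0.5:
        raise ValueError("alpha must lie in [0, 0.5)")
    ordered = sorted(values)
    d = int(alpha * len(ordered))
    kept = ordered[d:len(ordered) - d]
    return sum(kept) / len(kept)


def alpha_trimmed_filter(img: Image, size: int = 3, alpha: float = 0.25) -> Image:
    """Estimate each pixel by the alpha-trimmed mean of its size x size neighbourhood.

    Border pixels use only the neighbours inside the image (an assumed border policy).
    """
    h, w, r = len(img), len(img[0]), size // 2
    est = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [img[a][b]
                      for a in range(max(0, i - r), min(h, i + r + 1))
                      for b in range(max(0, j - r), min(w, j + r + 1))]
            est[i][j] = alpha_trimmed_mean(window, alpha)
    return est


if __name__ == "__main__":
    noise = embedding_noise(u=1, v=2, q_step=10.0)
    changed = sum(1 for row in noise for p in row if abs(p) > 1e-9)
    print(changed)  # → 64: one coefficient change touches every pixel of the block

    cover = [[100.0 + x + y for y in range(N)] for x in range(N)]  # toy smooth block
    stego = [[cover[x][y] + noise[x][y] for y in range(N)] for x in range(N)]
    estimate = alpha_trimmed_filter(stego, size=3, alpha=0.25)
    residual = [[stego[x][y] - estimate[x][y] for y in range(N)] for x in range(N)]
    print(round(max(abs(p) for row in residual for p in row), 3))
```

The residual (stego minus estimate) is the kind of noise estimate from which features could then be computed. In this sketch the trimming parameter sets the trade-off: with alpha = 0 the filter reduces to a moving average, and as alpha approaches 0.5 it approaches a median filter.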
Modeling the hidden message as additive noise is the basis for developing an approximation technique for steganographic images.

In existing image feature generation methods for steganalysis, the approximation techniques used to estimate pixel or coefficient values are based on cropping in the spatial domain [4], [5], regression in the wavelet domain [6], and coefficient comparisons in the JPEG domain [7]. This paper presents a spatial-domain estimation technique, alpha-trimmed mean filter estimation. This method estimates the small amounts of noise that are spread throughout the spatial domain and concentrated in the low- and mid-band coefficients of the quantized JPEG DCT blocks. Statistics are applied to both the original images and the predicted images to calculate a set of features. These statistics include a global histogram, individual histograms of low-frequency coefficients, coefficient frequencies, coefficient variation, blockiness, and the co-occurrence matrix of the coefficients [4].

Proceedings of the 2009 IEEE International Conference on Systems, Man, and Cybernetics San Antonio, TX, USA - October 2009 978-1-4244-2794-9/09/$25.00 ©2009 IEEE 4718

The paper is organized as follows. Section II gives background on two feature generation methods, DCT features [4] and Markov features [7], as well as on the alpha-trimmed mean [8], which will be used to estimate a given image. The proposed method, including image estimation and the statistical measurements used to generate features, is described in Section III. Section IV describes the classifier used here [9]. In addition, this section describes cross validation