Application of the Generic Feature Selection Measure in Detection of Web Attacks

Hai Thanh Nguyen¹, Carmen Torrano-Gimenez², Gonzalo Alvarez², Slobodan Petrović¹, and Katrin Franke¹

¹ Norwegian Information Security Laboratory, Gjøvik University College, Norway
{hai.nguyen, katrin.franke, slobodan.petrovic}@hig.no
² Instituto de Física Aplicada, Consejo Superior de Investigaciones Científicas
{carmen.torrano,gonzalo}@iec.csic.es

Abstract. Feature selection for filtering HTTP traffic in Web application firewalls (WAFs) is an important task. We focus on the Generic-Feature-Selection (GeFS) measure [4], which was successfully tested on low-level packet filters, i.e., on the KDD CUP’99 dataset. However, the performance of the GeFS measure in analyzing high-level HTTP traffic is still unknown. In this paper we study the GeFS measure for WAFs. We conduct experiments on the publicly available ECML/PKDD-2007 dataset. Since this dataset does not target any real Web application, we additionally generate our new CSIC-2010 dataset. We analyze the statistical properties of both datasets to provide more insight into their nature and quality. Subsequently we determine appropriate instances of the GeFS measure for feature selection. We use different classifiers to test the detection accuracies. The experiments show that we can remove 63% of irrelevant and redundant features from the original dataset, while reducing the detection accuracy of WAFs by only 0.12%.

Key words: Web attack detection, Web application firewall, intrusion detection systems, feature selection, machine learning algorithms.

1 Introduction

Web attacks pose many serious threats to the modern Internet. The number of Web attacks is steadily increasing; consequently, Web application firewalls (WAFs) [8] need to be increasingly effective. One approach to improving the effectiveness of WAFs is to apply feature selection methods.
Reducing the number of relevant traffic features without a negative effect on detection accuracy is a goal that greatly increases the available processing time of WAFs and reduces the required system resources. As there exist many feature selection algorithms (see, for example, [2,3]), the question that arises is which ones could be applied in intrusion detection in general and in Web attack detection in particular. Most of the feature selection work in intrusion detection practice is still done manually, and the quality of selected features depends