Application of the Generic Feature Selection Measure in Detection of Web Attacks

Hai Thanh Nguyen 1, Carmen Torrano-Gimenez 2, Gonzalo Alvarez 2, Slobodan Petrović 1, and Katrin Franke 1

1 Norwegian Information Security Laboratory, Gjøvik University College, Norway
{hai.nguyen, katrin.franke, slobodan.petrovic}@hig.no
2 Instituto de Física Aplicada, Consejo Superior de Investigaciones Científicas
{carmen.torrano,gonzalo}@iec.csic.es

Abstract. Feature selection for filtering HTTP traffic in Web application firewalls (WAFs) is an important task. We focus on the Generic-Feature-Selection (GeFS) measure [4], which was successfully tested on low-level packet filters, i.e., on the KDD CUP'99 dataset. However, the performance of the GeFS measure in analyzing high-level HTTP traffic is still unknown. In this paper we study the GeFS measure for WAFs. We conduct experiments on the publicly available ECML/PKDD-2007 dataset. Since this dataset does not target any real Web application, we additionally generate our new CSIC-2010 dataset. We analyze the statistical properties of both datasets to provide more insight into their nature and quality. Subsequently, we determine appropriate instances of the GeFS measure for feature selection. We use different classifiers to test the detection accuracies. The experiments show that we can remove 63% of irrelevant and redundant features from the original dataset while reducing the detection accuracy of WAFs by only 0.12%.

Key words: Web attack detection, Web application firewall, intrusion detection systems, feature selection, machine learning algorithms.

1 Introduction

Web attacks pose many serious threats to the modern Internet. The number of Web attacks is steadily increasing; consequently, Web application firewalls (WAFs) [8] need to be more and more effective. One approach to improving the effectiveness of WAFs is to apply feature selection methods.
Reducing the number of relevant traffic features without degrading detection accuracy greatly increases the available processing time of WAFs and reduces the required system resources. Since many feature selection algorithms exist (see, for example, [2,3]), the question arises which of them could be applied to intrusion detection in general and to Web attack detection in particular. Most of the feature selection work in intrusion detection practice is still done manually, and the quality of the selected features depends